I've written my second BC7 block encoder. My first was written in a straightforward way to gain experience with the format. My second was more focused on competing against the Fast ISPC Texture Compressor, but without using any SIMD, and was over 30x faster than my first attempt.
The BC7 encoders I've studied seem to be hyper focused on RGB PSNR metrics, which is just the wrong metric for many types of textures. Encoding authors that treat input textures as opaque arrays of 4x4 vectors are at a disadvantage in this domain. RGB PSNR tends to spread the error equally between the channels, which isn't what we want on sRGB textures. Instead, it's desirable to tradeoff a small amount of additional R/B error for less G error. This is what perceptual codecs like JPEG do: they transform the input into YCbCr space, then downsample and quantize the hell out of the CbCr coefficients because preserving chroma is a waste of bits.
Many other BC1 block compression codecs support weighted RGB metrics because in BC1 not doing so visually looks worse on sRGB photos/albedo textures/etc. Encoders using perceptual metrics look better on color gradients and with highly saturated blocks. Heavy usage of perceptual metrics dates back to at least NVidia's original nvdxt compressor, and it wasn't possible for crunch to compete against nvdxt without supporting perceptual metrics. The squish library recommends using perceptual metrics by default, because BC1 without perceptual metrics looks worse.
Anyhow, etc2comp by John Brooks takes things a step further and supports computing error metrics in weighted YCbCr space. Compared to vanilla RGB weighted metrics, this looks better in my experience writing Basis (especially with ETC1). I'm currently using weights (128,64,16).
Here's the REC 709 luma PSNR of 31 test textures encoded with ispc_texcomp (slow/highest quality - uses 7 modes) and my non-SIMD encoder in perceptual mode using just 4 modes:
The overall average PSNR for ispc_texcomp was 48.57, mine was 50.4. Even with ispc_texcomp's massive mode and SIMD advantages it does worse on this metric. ispc_texcomp doesn't support optimizing for perceptual metrics, which puts it at a huge disadvantage on many texture types.
I re-encoded the textures with linear metrics. My encoder used 6 modes: 0, 1, 3, 4, 5, and 6 (including all component rotations and the index flag).
Note that ispc_texcomp in its best/slowest profile is pretty slow, and is much easier to compete against without SIMD code. It's just trying way too hard. It's faster in its lower quality "basic" profile, but it still doesn't support perceptual metrics so it'll continue to fight up a very steep hill.
For benchmarking, I ran each encoder in a single thread, and called ispc_texcomp with 64 blocks at a time.
Other findings: ispc_texcomp has a very weak mode 0 encoder, and it's weaker than it should be on grayscale textures. I'll blog examples soon.