For the first time, the backend of any encoder I've written supports explicit selector RDO. It basically tries to balance luma noise vs. bitrate. I've discovered a large number of improvements to crunch's basic approach. Each time I get tempted to try a completely different algorithm, I find another optimization.
Here's a random example of what selector RDO looks like on kodim18. At these quality settings, Basis ETC1 intermediate format gets 1.52 bits/texel, RDO compression (with the exact same quality level) on the ETC1 .KTX file+LZMA gets 1.83 bits/texel, and plain ETC1+LZMA is 3.33 bits/texel.
Delta selector RDO Enabled: 1.52 bits/texel Luma SSIM: .867811 (plain non-RDO ETC1 Luma SSIM is .981066)
RGB delta image (biased to 128):
The two textures above share the exact same endpoint/selector codebooks, the same per-macroblock mode values (flip and differential bits), and the same block color "endpoints". The delta selectors in the 2nd image were chosen to balance introduced noise vs. bitrate. I'm still tuning this algorithm but it's amazing to find improvements like this with so little code. This improvement and others like it will also be available in Basis's support for the DXT formats.
We also support an "ETC1S" mode, that doesn't support per-block tiling or 4:4:4 "individual" mode colors. It transcodes to standard ETC1 blocks. It gets 1.1 bits/texel, Luma SSIM .8616 using the same selector/endpoint codebook sizes. This mode has sometimes noticeable 4x4 block artifacts, but otherwise can perform extremely well in a rate-distortion sense vs. full ETC1 mode.
ETC1S mode has better behavior in a rate-distortion sense vs. full ETC1 mode, so there's still a lot of work to do.
The encoder's backend hasn't been tuned hardly at all yet, there are some missing optimization steps (that crunch used), and the selector codebook compression stage needs to be redone. The ETC1 per-block "mode" bits are also pretty expensive right now, so I need to implement a form of explicit "mode bit" RDO that purposely biases the mode bits in the front end to achieve better compression.