I've changed the lookup table used to convert to DXT1. Each cell in the 256K-entry table (32*32*32*8: one cell for each 5:5:5 base color and 3-bit intensity table index in my ETC1 subset format) now contains 10 entries, one for each range of ETC1 selectors that a block can actually use:
{ 0, 0 },
{ 1, 1 },
{ 2, 2 },
{ 3, 3 },
{ 0, 3 },
{ 1, 3 },
{ 2, 3 },
{ 0, 2 },
{ 0, 1 },
{ 1, 2 }
Each pair is the { lowest, highest } selector actually used in the block. The first 4 entries account for blocks that only use a single selector, so they get encoded into a single color. The next entry accounts for blocks which use all selectors, then { 1, 3 } accounts for blocks which only use selectors 1,2,3, etc.
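As a rough illustration of how such a table could be addressed (the cell ordering, function names, and flat layout here are assumptions made for the sketch, not the converter's actual layout):

```python
# Hypothetical flat layout: cells ordered by (r, g, b, intensity), 10 rows per cell.
SELECTOR_RANGES = [
    (0, 0), (1, 1), (2, 2), (3, 3),   # single-selector blocks
    (0, 3), (1, 3), (2, 3),
    (0, 2), (0, 1), (1, 2),
]

NUM_CELLS = 32 * 32 * 32 * 8          # 5:5:5 base color x 3-bit intensity index
ROWS_PER_CELL = len(SELECTOR_RANGES)  # 10


def cell_index(r5, g5, b5, inten):
    """Index of the cell for a 5:5:5 base color and intensity table index."""
    if not (0 <= r5 < 32 and 0 <= g5 < 32 and 0 <= b5 < 32 and 0 <= inten < 8):
        raise ValueError("component out of range")
    return ((r5 * 32 + g5) * 32 + b5) * 8 + inten


def row_offset(r5, g5, b5, inten, range_row):
    """Flat offset of one selector-range row inside the whole table."""
    if not 0 <= range_row < ROWS_PER_CELL:
        raise ValueError("range_row out of range")
    return cell_index(r5, g5, b5, inten) * ROWS_PER_CELL + range_row
```

The 10 rows are exactly the pairs with lowest <= highest over selectors 0..3 (4 single-selector pairs plus 6 proper ranges), so the whole table holds 262,144 * 10 = 2,621,440 rows.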
So for example, when converting from ETC1, if only selector 2 was actually used in a block, the ETC1->DXT1 converter uses a set of DXT1 low/high colors optimized for that particular use case. If all selectors were used, it uses entry #4 (counting from zero, i.e. { 0, 3 }), etc. The downsides to this technique are the extra CPU expense in the ETC1->DXT1 converter to determine the range of used selectors, and the extra memory to hold a larger table.
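The extra per-block work amounts to a min/max scan over the block's 16 selectors. A minimal sketch, assuming the selectors are already available as values 0..3 ordered from darkest to brightest (the function name and input form are made up for illustration):

```python
# Row order copied from the list in the post.
SELECTOR_RANGES = [
    (0, 0), (1, 1), (2, 2), (3, 3),
    (0, 3), (1, 3), (2, 3),
    (0, 2), (0, 1), (1, 2),
]
RANGE_TO_ROW = {rng: row for row, rng in enumerate(SELECTOR_RANGES)}


def used_selector_row(selectors):
    """Return the table row for the lowest/highest selector used in a block.

    `selectors` is a sequence of per-pixel selector values in 0..3
    (16 of them for a 4x4 block).
    """
    lo = min(selectors)
    hi = max(selectors)
    if lo < 0 or hi > 3:
        raise ValueError("selectors must be in 0..3")
    return RANGE_TO_ROW[(lo, hi)]
```

For example, a block that only uses selector 2 maps to row 2, and a block that touches both selector 0 and selector 3 maps to row 4.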
Note the ETC1 encoder is still not aware at all that its output will also be DXT1 coded. That's the next experiment. I don't think using this larger lookup table is necessary; a smaller table should hopefully be OK if the ETC1 subset encoder is aware of the DXT1 artifacts it's introducing in each trial. Another idea is to use a simple table most of the time, and only access the larger/deeper conversion table on blocks which use the brighter ETC1 intensity table indices (the ones with more error, like 5-7).
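That last idea might look something like the sketch below. The threshold of 5 comes from the "5-7" above; treating the { 0, 3 } row as the "simple" fallback is purely an assumption, as are all of the names.

```python
SELECTOR_RANGES = [
    (0, 0), (1, 1), (2, 2), (3, 3),
    (0, 3), (1, 3), (2, 3),
    (0, 2), (0, 1), (1, 2),
]
RANGE_TO_ROW = {rng: row for row, rng in enumerate(SELECTOR_RANGES)}

FULL_RANGE_ROW = RANGE_TO_ROW[(0, 3)]  # assumed stand-in for the "simple" table
DEEP_TABLE_MIN_INTEN = 5               # intensity indices 5-7 use the deep table


def choose_row(inten, selectors):
    """Pick a table row, scanning selectors only for high-intensity blocks."""
    if inten < DEEP_TABLE_MIN_INTEN:
        # Cheap path: skip the selector scan entirely.
        return FULL_RANGE_ROW
    return RANGE_TO_ROW[(min(selectors), max(selectors))]
```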
ETC1 (subset):
Error: Max: 80, Mean: 3.802, MSE: 30.247, RMSE: 5.500, PSNR: 33.324
Error: Max: 73, Mean: 3.939, MSE: 32.218, RMSE: 5.676, PSNR: 33.050
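For reference, the RMSE and PSNR figures in the two error lines above follow from the MSE using the usual 8-bit definitions. A quick check (the function name is just for illustration):

```python
import math


def rmse_and_psnr(mse, peak=255.0):
    """RMSE and PSNR (in dB) derived from a mean squared error."""
    rmse = math.sqrt(mse)
    psnr = 10.0 * math.log10(peak * peak / mse)
    return rmse, psnr


for mse in (30.247, 32.218):
    rmse, psnr = rmse_and_psnr(mse)
    print(f"MSE: {mse:.3f}, RMSE: {rmse:.3f}, PSNR: {psnr:.3f}")
```

Going from 33.324 to 33.050 is a difference of about 0.27 dB between the two lines.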
I experimented with allowing the DXT1 optimizer (used to build the lookup table) to use 3-color blocks. This is actually a big deal for this use case, because the transparent selector's color is black (0,0,0). ETC1's saturation to 0 or 255 after adding the intensity table values creates "strange" block colors (away from the block's colorspace line), and this trick allows the DXT1 optimizer to work around that issue better. I'm not using this trick above, though.
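To make the saturation problem concrete, here's a small sketch that computes the four ETC1 block colors for a 5:5:5 base color, next to the two DXT1 palette modes. The modifier values are the standard ETC1 intensity tables, listed here from darkest to brightest rather than in raw bitstream order. The DXT1 part takes a plain flag for 3-color mode (a real block selects the mode by how the two 16-bit endpoints compare) and uses simple integer division, so exact rounding may differ from a given decoder.

```python
# Standard ETC1 intensity modifier tables, sorted dark -> bright.
ETC1_MODIFIERS = [
    (-8, -2, 2, 8), (-17, -5, 5, 17), (-29, -9, 9, 29), (-42, -13, 13, 42),
    (-60, -18, 18, 60), (-80, -24, 24, 80), (-106, -33, 33, 106),
    (-183, -47, 47, 183),
]


def expand5(c):
    """Expand a 5-bit component to 8 bits by bit replication."""
    return (c << 3) | (c >> 2)


def clamp255(v):
    return max(0, min(255, v))


def etc1_block_colors(r5, g5, b5, inten):
    """The four colors of an ETC1 subblock, each component clamped to 0..255."""
    base = (expand5(r5), expand5(g5), expand5(b5))
    return [tuple(clamp255(c + m) for c in base) for m in ETC1_MODIFIERS[inten]]


def dxt1_palette(c0, c1, three_color):
    """DXT1 palette from two 8-bit RGB endpoints (approximate rounding)."""
    if three_color:
        mid = tuple((a + b) // 2 for a, b in zip(c0, c1))
        return [c0, c1, mid, (0, 0, 0)]  # last entry is transparent black
    c2 = tuple((2 * a + b) // 3 for a, b in zip(c0, c1))
    c3 = tuple((a + 2 * b) // 3 for a, b in zip(c0, c1))
    return [c0, c1, c2, c3]
```

With base color (31, 16, 2) and intensity table 5, `etc1_block_colors` gives (175, 52, 0), (231, 108, 0), (255, 156, 40), (255, 212, 96). Blue pins at 0 in the two darker colors and red pins at 255 in the two brighter ones, so the four colors no longer sit on one straight line, which is what 4-color DXT1 interpolation would need. The 3-color palette's extra black entry gives the optimizer one more off-line point to work with.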
I started seriously looking at the BC7 texture format's details today. It's complex, but nowhere near as complex as ASTC. I'm very tempted to try converting my ETC1 subset to that format next.
Also, if you're wondering why I'm working on this stuff: I want to write one .CRN-like encoder that supports efficient transcoding into as many GPU formats as possible. It's a lot of work to write these encoders, and the idea of that work's value getting amplified across a huge range of platforms and devices is very appealing. A universal format's quality won't be the best, but it may be practical to add a losslessly encoded "fixup" chunk to the end of the universal file. This could improve quality for a specific GPU format.