Saturday, April 21, 2018

RDO BC7 encoder planning

I'm building my first RDO BC7/BC6H encoders for our product (Basis), and I'm trying to decide which BC7 modes to implement first. I've decided on modes 1+6 for opaque textures. For alpha, I've boiled down the options to modes 1+5+6, 1+4+6, or maybe 1+4+6+7.

For alpha I know I'll need either 4 and/or 5 because they are the only modes with uncorrelated alpha support. Mode 1 isn't optional because it supports 2 subsets, greatly reducing ugly block artifacts. The 3 subset modes aren't used much in practice and aren't necessary. Mode 6 is optional but boosts quality on simple blocks due to its awesome 7777.1 bit endpoint precision and beefy 4-bit selectors. The bare minimum modes for a decent opaque/alpha encoder seems to be 1+4 or 1+5, but adding 6 and 7 will improve quality especially with alpha textures.

Others have suggested that I do a RDO BC7 encoder using only a single subset mode (6 seems perfect), but the result would have BC1-style block artifacts. I've already built a 2 subset RDO encoder for ETC1 in Basis and it isn't that much harder to support 2 subsets vs. 1.

RDO GPU texture encoders try to optimize the encoded output data so that, after the output is LZ compressed (and the output pretty much always is!), you get as much quality per compressed bit as you can. The output bit patterns matter a lot. If you can reuse a bit pattern that's occurred in a nearby block, without impacting quality too much, an LZ codec can exploit this to issue a match vs. literals (saving bits and improving quality per compressed output bit). There's more to it than that, because you also want the ability to control the output quality vs. compressed size tradeoff in an artist-friendly way.

My RDO BC1 encoder in Basis uses Lagrangian optimization, and this is simplified because in BC1 the bit patterns nicely line up to byte boundaries which modern LZ codecs like. The endpoints and selectors aren't munged together, and a large chunk of the output entropy is in the selector data.

Anyhow, here's some of the data I used to make these decisions about BC7. I have a lot more data than this which I may post later.

image corpus was kodim+7 others
For comparison ispc_basic (using 7 modes) gets 46.54 dB
Encoder was set to extremely high quality (better than ispc_slow)

Single mode:

mode enctime   dB               
1    83        45.8 
3    91.2      43.4 
6    12.1      42.9 
2    12.9      42.7 
0    19.6      42.34
4    38.56     41.48
5    14.4      38.45 

Key mode combinations:
                 
1+6  86.37     46.27
1+5  11.161    45.79
1+6  21.9      45.7 
1+4  137.6     45.38

This data definitely isn't ideal, because I used my ispc encoder which was heavily tuned for performance. So modes 1 and 3 tried all 64 partitions in this test, while modes 0 and 2 only tried the best predicted partition. But it matches up with earlier data I generated from a more brute force encoder.

Note that the mode 4 and 5 encoders used all rotation/index selector options, which skews the output a bit. I doubt I'll be supporting these mode 4/5 options. Two 1+6 entries are for the encoder set to very high vs. much lower quality levels.

Some earlier encoding error data for just kodim18, using an earlier C++ version of my encoder:

Best of any mode error: 12988.968550

Best of modes 0 and 1: 13186.750320
Best of modes 1 and 6: 13440.390471
Best of modes 1 and 2: 13521.855790
Best of modes 1 and 4: 13565.064541
Best of modes 1 and 3: 13589.143902
Best of modes 1 and 5: 13592.709517
Best of modes 1 and 7: 13604.182592
Best of modes 1 and 1: 13605.033774
Best of modes 0 and 6: 14099.723969
Best of modes 0 and 3: 15046.616630
Best of modes 2 and 6: 15383.155463
Best of modes 0 and 4: 15563.136445
Best of modes 0 and 5: 15665.245609
Best of modes 0 and 2: 15892.424359
Best of modes 3 and 6: 15977.876955

Best of modes 0 and 1 and 6: 13037.149688
Best of modes 0 and 1 and 4: 13160.175075
Best of modes 0 and 1 and 2: 13168.570462
Best of modes 0 and 1 and 3: 13171.807469
Best of modes 0 and 1 and 5: 13176.588633
Best of modes 0 and 1 and 7: 13186.504616
Best of modes 0 and 0 and 1: 13186.750320
Best of modes 0 and 1 and 1: 13186.750320
Best of modes 1 and 2 and 6: 13360.493404
Best of modes 1 and 4 and 6: 13400.774007
Best of modes 1 and 5 and 6: 13429.100640
Best of modes 1 and 3 and 6: 13433.822985
Best of modes 1 and 6 and 7: 13439.954762
Best of modes 1 and 1 and 6: 13440.390471
Best of modes 1 and 6 and 6: 13440.390471
Best of modes 1 and 2 and 4: 13489.904966
Best of modes 1 and 2 and 3: 13508.004738
Best of modes 1 and 2 and 5: 13511.406144
Best of modes 1 and 2 and 2: 13521.855790
Best of modes 1 and 1 and 2: 13521.855790

The 1+6 combination is pretty powerful. Only a quarter dB separates it from ispc_basic on this corpus. Also, mode 1's bit packing is nicely aligned to bytes (more or less) - perfect for RDO encoding!

byte bits
0    2 6  mode+partition
1    6 2  endpoints
2    4 4  endpoints
3    2 6  endpoints
4    6 2  endpoints
5    4 4  endpoints
6    2 6  endpoints
7    6 2  endpoints
8    4 4  endpoints
9    2 6  endpoints
10   2 6  pbits+selectors
11   8    selectors
12   8    selectors
13   8    selectors 
14   8    selectors
15   8    selectors

Note that most of mode 1's output is endpoint data, not selector data, so partition/pbit biasing, endpoint quantization and bit pattern optimization will be critical.

No comments:

Post a Comment