Saturday, May 12, 2018

Graphing our BC7 encoder's quality vs. encoding time for opaque textures

I encoded a 4k test texture to BC7 (non-RDO) 5000 times, using random codec settings for each trial encode. The test texture contains random blocks chosen from a few thousand input textures. I recorded all the results to a CSV file.

The stages of our codec can be configured in many ways. It's not immediately obvious or intuitive which settings are actually valuable, so we must run tests like this. Here are the settings:

- Which modes to try
- Max # of partitions to examine for each mode (up to 16 or 64, depending on the mode)
- Max # of partitions to evaluate after estimation, for each mode
- Various endpoint refinement settings ("uber" setting, iterative endpoint refinement trials)
- pbit refinement mode
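
As a rough illustration of this kind of experiment, here is a minimal Python sketch that draws random settings for each trial and logs one CSV row per encode. Everything in it is an assumption made for illustration: the field names, the value ranges, and the `encode` callback are hypothetical and are not our codec's actual API. The `fake_encode` stub returns dummy numbers only so the sketch runs on its own.

```python
import csv
import io
import random

# Hypothetical column names; the real CSV layout is not described in this post.
FIELDS = ["modes", "max_partitions_estimated", "max_partitions_evaluated",
          "uber_level", "refinement_trials", "pbit_mode",
          "seconds", "rgb_psnr_db"]

def random_settings(rng):
    """Draw one random codec configuration (all ranges are assumptions)."""
    # Non-empty random subset of BC7 modes 0..7; fall back to mode 6 if empty.
    modes = [m for m in range(8) if rng.random() < 0.5] or [6]
    # Partitions scored by the estimator (a real encoder would cap this at
    # 16 for modes that only have 16 partitions).
    estimated = rng.randint(1, 64)
    return {
        "modes": " ".join(str(m) for m in modes),
        "max_partitions_estimated": estimated,
        # Never evaluate more partitions than were estimated.
        "max_partitions_evaluated": rng.randint(1, estimated),
        "uber_level": rng.randint(0, 4),
        "refinement_trials": rng.randint(0, 3),
        "pbit_mode": rng.choice(["fast", "exhaustive"]),
    }

def run_trials(num_trials, encode, out_file, seed=0):
    """Run num_trials random encodes and write one CSV row per trial."""
    rng = random.Random(seed)
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    for _ in range(num_trials):
        settings = random_settings(rng)
        seconds, psnr = encode(settings)  # caller-supplied encoder hook
        writer.writerow({**settings, "seconds": seconds, "rgb_psnr_db": psnr})

def fake_encode(settings):
    """Placeholder encoder: returns dummy (seconds, PSNR), not real results."""
    return 0.0, 0.0

buf = io.StringIO()
run_trials(5, fake_encode, buf)
rows = list(csv.DictReader(io.StringIO(buf.getvalue())))
```

Seeding the generator keeps a run reproducible, which helps when a surprising sample on the graph needs to be re-encoded and inspected.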

Here's the resulting time vs. quality (RGB avg. PSNR) graph:

I examined this graph to help come up with strong codec profiles. Some key takeaways after examining the Pareto frontier of this graph (the uppermost points on the convex hull from left to right):
  • Ironically, BC7 mode 0 (the mostly ignored 3 subset mode) is at its highest value in some of our fastest codec settings. Our 2 fastest settings use just mode 6. After this, we add mode 0 but just examine the first 1 or 4 partitions in our partition estimator. This combo is strong! I intuitively knew that mode 0 had unique value if the pbits were handled correctly. 3 subsets with 3-bit indices is a powerful combination. (If our partition estimator were faster we could afford to check more partitions in this mode before another set of codec settings becomes better.)
  • Mode 0 is only valuable if you limit the # of partitions examined during estimation to the first 1 or 4 (and just evaluate the best). In this case, when combined with mode 6, it's a uniquely powerful combination. With every other practical set of encoder settings (anything below 219 secs), mode 0 is of no value.
  • Mode 4 is totally useless for opaque textures (for all settings below 401 secs of compute). Note we currently always evaluate all the rotations and index settings for mode 4 when it's enabled. This is possibly a minor weakness in our encoder, so I'm going to fix this and regenerate the graph.
  • Mode 6 is always Pareto optimal, i.e. it's always enabled in every optimal setting of the codec across the entire frontier. It doesn't have subsets, but it does have large 777.1 endpoints and 4-bit selectors. Mode 1 is also very strong, and is used across the entire frontier beyond the very fastest settings.
  • There's a very steep quality barrier at around 25-30 secs of compute. Beyond this inflection point only minor quality gains (of around 0.2-0.4 dB) can be made, and only with large increases in encoding CPU time. Every BC7 codec I've seen hits a wall like this, sooner or later.
  • The sweet spot for our codec, at the beginning of this steep wall, is around 21 seconds of compute (~42x slower than mode 6 only). This config uses modes 1,3,6, limits the max # of partitions examined by the estimator to 42 for modes 1/3, and only evaluates the single best partition for modes 1/3. This config also enables a more exhaustive pbit refinement mode, and a deeper endpoint refinement mode. Note we use the same estimated partition in modes 1/3 (they're both 2 subset modes), which is probably why this combo stands out.
  • I'm not getting as many data samples on the frontier as I would like. Most samples have no value. I either need a more efficient way of computing random parameters, or I need to just use more samples.
  • It's interesting to explore the non-Pareto optimal solutions. Mode 5 only with pbit searching is a really bad performer (fast but very low quality).
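
For readers who want to pull frontier points out of a results file like this, here is a minimal Python sketch of a plain Pareto (non-dominated) filter over (seconds, PSNR) pairs. It is not the exact procedure I used: it does not compute a convex hull, so it may keep more points than the hull-based description above. The function name and the sample values are made up for illustration and are not taken from the actual CSV.

```python
def pareto_frontier(points):
    """Return the non-dominated (seconds, psnr_db) pairs, sorted by time.

    A point is kept only if no other point is at least as fast
    and has strictly higher PSNR.
    """
    frontier = []
    best_psnr = float("-inf")
    # Sort by time ascending; on ties, visit the highest PSNR first.
    for seconds, psnr in sorted(points, key=lambda p: (p[0], -p[1])):
        if psnr > best_psnr:
            frontier.append((seconds, psnr))
            best_psnr = psnr
    return frontier

# Illustrative placeholder samples only (not measured results).
samples = [(1.0, 40.0), (2.0, 39.0), (2.0, 42.0), (5.0, 41.5), (9.0, 43.0)]
frontier = pareto_frontier(samples)
print(frontier)
```

Every sample that is slower than some kept point without beating its PSNR is dropped, which is why most random trials never show up on the frontier.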
