I examined this graph to help come up with strong codec profiles. Some key takeaways after examining the Pareto frontier of this graph (the uppermost points on the convex hull from left to right):
- Ironically BC7 mode 0 (the mostly ignored 3 subset mode) is at its highest value in some of our fastest codec settings. Our 2 fastest settings use just mode 6. After this, we add mode 0 but just examine the first 1 or 4 partitions in our partition estimator. This combo is strong! I intuitively knew that mode 0 had unique value if the pbits were handled correctly. 3 subsets with 3-bit indices is a powerful combination. (If our partition estimator was faster we could afford to check more partitions in this mode before another set of codec settings becomes better.)
- Mode 0 is only valuable if you limit the # of partitions examined during estimation to the first 1 or 4 (and just evaluate the best). In this case, when combined with 6 it's a uniquely powerful combination. With every other practical set of encoder settings (anything below 219 secs), mode 0 is of no value.
- Mode 4 is totally useless for opaque textures (for all settings below 401 secs of compute). Note we currently always evaluate all the rotations and index settings for mode 4 when it's enabled. This is possibly a minor weakness in our encoder, so I'm going to fix this and regenerate the graph.
- Mode 6 is always Pareto optimal, i.e. it's always enabled in every optimal setting of the codec across the entire frontier. It doesn't have subsets, but it does have large 777.1 endpoints and 4-bit selectors. Mode 1 is also very strong, and is used across the entire frontier beyond the very fastest settings.
- There's a very steep quality barrier at around 25-30 secs of compute. Beyond this inflection point only minor quality gains (of around .2-.4 dB) can be made - but only with large increases in encoding CPU time. Every BC7 codec I've seen hits a wall like this, sooner or later.
- The sweetspot for our codec, at the beginning of this steep wall, is around 21 seconds of compute (~42x slower than mode 6 only). This config uses modes 1,3,6, limits the max # of partitions examined by the estimator to 42 for 1/3, and only evaluates the single best partition for modes 1/3. This mode also enables a more exhaustive pbit refinement mode, and a deeper endpoint refinement mode. Note we use the same estimated partition in mode 1/3 (they're both 2 subset modes), which is probably why this combo stands out.
- I'm not getting enough data samples on the frontier as I would like. Most samples have no value. I either need a more efficient way of computing random parameters, or I need to just use more samples.
- It's interesting to explore the non-Pareto optimal solutions. Mode 5 only with pbit searching is a really bad performer (fast but very low quality).