Thursday, March 30, 2023

EA/Microsoft Neural Network GPU Texture Compression Patents

Both Microsoft and EA have patented various ways of bolting neural networks onto GPU texture compressors (ASTC/BC7/BC6H), in order to accelerate determining the compression parameters (mode, partition, etc.):

EA: https://patents.google.com/patent/US10930020B2/en

Microsoft: https://patents.google.com/patent/US10504248B2/en

What this boils down to: techniques like TexNN are potentially patented. This work was done in 2017/2018, but it wasn't made public until 2019:
https://cs.unc.edu/~psrihariv/publication/texnn/

Over a decade ago, I and others researched bolting real-time GPU texture compressors onto the output of existing lossy codecs (JPEG, etc.). The results suffer because you're stacking two generations of lossy artifacts. Also, existing image compressors aren't designed with normal maps and other texture types in mind.

If you're working on a platform with limited computing resources (the web, mobile), or with limited or no SIMD (the web), real-time recompression faces additional energy and resource constraints.

Both my "crunch" library and Basis Universal bypass the need for fast real-time texture compression entirely. They compress directly in the GPU texture domain. This approach is used by Unity and many AAA video games:
https://www.mobygames.com/person/190072/richard-geldreich/

I've been researching the next series of codecs for Basis Universal. This is why I wrote RDO PNG, QOI and LZ4, and bc7enc_rdo.

I am strongly anti-software-patent. All the EA/MS patents will do is push developers towards open solutions, which will likely be better in the long run anyway. Their patents add nothing and cover obvious ideas. These corporations incentivize their developers to patent everything they can get their hands on (via bonuses, etc.), which ultimately results in gaming the system by patenting trivial or obvious ideas. Ultimately this slows innovation and encourages patent wars, which is bad for everybody.

It's possible to speed up a BC7 encoder without using neural networks. An encoder can first try the first 16 partition patterns and find which one is best. That result can then be used to predict which of the more complex patterns are likely to improve on it. See the table and code here; this works:
https://github.com/richgel999/bc7enc_rdo/blame/master/bc7enc.cpp#L1714
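A minimal sketch of the two-level idea, assuming a hypothetical trial-encode callback that returns the error of a given partition pattern. The prediction table below is purely illustrative; the real table lives in bc7enc.cpp at the link above:

```cpp
#include <cstdint>
#include <cfloat>

// Hypothetical trial encoder: returns the error of encoding `block` with
// 2-subset BC7 using partition pattern `pat` (0..63). In a real encoder
// this would run an actual endpoint/index search.
using TrialFn = float (*)(const uint8_t* block, uint32_t pat);

// Illustrative prediction table (NOT the one in bc7enc.cpp): for each of
// the first 16 patterns, a couple of higher patterns worth trying when
// that pattern wins level 1.
static const uint32_t kPredicted[16][2] = {
    {16, 48}, {17, 33}, {18, 34}, {19, 35},
    {20, 36}, {21, 37}, {22, 38}, {23, 39},
    {24, 40}, {25, 41}, {26, 42}, {27, 43},
    {28, 44}, {29, 45}, {30, 46}, {31, 47},
};

uint32_t pick_partition(const uint8_t* block, TrialFn trial)
{
    uint32_t best_pat = 0;
    float best_err = FLT_MAX;

    // Level 1: exhaustively try the first 16 partition patterns.
    for (uint32_t p = 0; p < 16; p++) {
        float e = trial(block, p);
        if (e < best_err) { best_err = e; best_pat = p; }
    }

    // Level 2: only try the handful of higher patterns predicted by the
    // level 1 winner, instead of all 64.
    for (uint32_t c : kPredicted[best_pat]) {
        float e = trial(block, c);
        if (e < best_err) { best_err = e; best_pat = c; }
    }
    return best_pat;
}
```

The payoff is that instead of 64 trial encodes per block, you do 16 plus a small fixed number of predicted candidates.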

It's potentially possible to use a 3- or 4-level hierarchy to determine the BC7 partition pattern; bc7enc.cpp only uses 2 levels. This would reduce the number of partition patterns to examine to just a handful.

To cut down the number of BC7 modes to check, you can first rule out mode 7, because it's only useful for blocks containing alpha. Then try mode 6; if it's good enough, stop. Otherwise try mode 1; if that's good enough, stop, and so on. Only a subset of blocks need the complex multiple-subset modes. In many cases the 3-subset modes can be ignored entirely with little noticeable impact on quality. The component "rotation" feature is usually low value.
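The early-out ordering above can be sketched as follows, again assuming a hypothetical per-mode trial callback. The mode order and the idea of a "good enough" threshold follow the text; the specific threshold value and the exact list of fallback modes are illustrative, not bc7enc's:

```cpp
#include <cstdint>
#include <cfloat>

// Hypothetical per-mode trial encoder: returns the error of encoding
// `block` in the given BC7 mode. A real encoder would perform the actual
// endpoint/index search here.
using ModeTrialFn = float (*)(const uint8_t* block, uint32_t mode);

// Early-out mode selection for opaque blocks: mode 7 is skipped up front
// (alpha only), mode 6 is tried first, then mode 1, then other modes,
// stopping as soon as the error is "good enough". The fallback list and
// threshold semantics are illustrative.
uint32_t pick_opaque_mode(const uint8_t* block, ModeTrialFn trial,
                          float good_enough_err)
{
    static const uint32_t kModeOrder[] = { 6, 1, 0, 3 }; // cheapest first
    uint32_t best_mode = 6;
    float best_err = FLT_MAX;

    for (uint32_t m : kModeOrder) {
        float e = trial(block, m);
        if (e < best_err) { best_err = e; best_mode = m; }
        if (best_err <= good_enough_err)
            break; // good enough: skip the remaining, costlier modes
    }
    return best_mode;
}
```

Most blocks exit after the first one or two trials, so the expensive multiple-subset modes only run on the minority of blocks that actually need them.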

These optimizations unfortunately cause divergence in SIMD encoders. Neural network encoders suffer from the same problem. They also must be trained, and if the texture being compressed doesn't resemble the training data, they can hit severe and unpredictable quality cliffs.

TexNN was first published here: