Wednesday, April 29, 2026

Large block size ASTC has been misunderstood

Most game developers misunderstand ASTC: the largest block sizes were intended for the highest resolution content (4k). GPU shader deblocking is easy, and at the largest block sizes it's essentially free, because the bottleneck on mobile/tablet GPUs is typically memory bandwidth, not ALU or cached texture fetches.

The largest block sizes collapse memory consumption and bandwidth enormously (1.28 bpp for 10x10, 0.89 bpp for 12x12). The four extra deblocking samples at block boundaries are going to be dirt cheap, because they sample into a neighboring block and are extremely likely to hit the texture cache.
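To make that concrete: every ASTC block occupies 128 bits regardless of how many texels it covers, so the footprint math is trivial. A minimal sketch (the 4096x4096 texture is just an illustrative size):

```cpp
#include <cstdint>
#include <cstdio>

// Every ASTC block is 128 bits (16 bytes) regardless of its texel
// footprint, so bits per pixel = 128 / (block_w * block_h).
static uint64_t astc_level_bytes(uint32_t w, uint32_t h, uint32_t bw, uint32_t bh)
{
    uint64_t blocks_x = (w + bw - 1) / bw;
    uint64_t blocks_y = (h + bh - 1) / bh;
    return blocks_x * blocks_y * 16;
}

int main()
{
    const uint32_t w = 4096, h = 4096;
    printf("RGBA8      : %llu bytes\n", (unsigned long long)w * h * 4);                      // 64 MiB
    printf("ASTC 4x4   : %llu bytes\n", (unsigned long long)astc_level_bytes(w, h, 4, 4));   // 16 MiB
    printf("ASTC 10x10 : %llu bytes\n", (unsigned long long)astc_level_bytes(w, h, 10, 10)); // ~2.6 MiB
    printf("ASTC 12x12 : %llu bytes\n", (unsigned long long)astc_level_bytes(w, h, 12, 12)); // ~1.8 MiB
}
```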

Once you add a well-specified form of deblocking, the next step is to make an encoder that is deblocking aware. Then you can heavily exploit the filter, just as modern image and video codecs have done for decades with their in-loop deblocking filters. A sketch of the idea follows.
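What "deblocking aware" looks like inside the trial loop, as a hedged sketch: `Candidate` and `apply_deblock` here are hypothetical stand-ins, not any shipping encoder's API. The only point is where the error gets measured: after the decoder-side filter, not against the raw decoded texels, so the encoder is free to lean on the filter to hide block-edge error.

```cpp
#include <cfloat>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: one candidate block encoding and its decoded texels.
// In a real encoder this would be an actual ASTC mode/partition/weight trial.
struct Candidate { std::vector<float> decoded; };

// Placeholder for the exact filter the runtime shader applies; the real one
// blends block-edge texels with their cross-edge neighbors.
static std::vector<float> apply_deblock(const std::vector<float>& texels)
{
    return texels;
}

static float mse(const std::vector<float>& a, const std::vector<float>& b)
{
    float err = 0.0f;
    for (size_t i = 0; i < a.size(); i++) {
        float d = a[i] - b[i];
        err += d * d;
    }
    return err / (float)a.size();
}

// Deblocking-aware selection: score each candidate *after* the decoder-side
// filter - the same trick video codecs play with their in-loop filters.
static size_t pick_best(const std::vector<Candidate>& candidates,
                        const std::vector<float>& original)
{
    size_t best = 0;
    float best_err = FLT_MAX;
    for (size_t i = 0; i < candidates.size(); i++) {
        float err = mse(apply_deblock(candidates[i].decoded), original);
        if (err < best_err) { best_err = err; best = i; }
    }
    return best;
}

int main() {}
```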

Unfortunately, the deployment story for ASTC has so far been pretty spotty: there are few full-format encoders available because the format is so complex.

Intel's encoder in ispc_texcomp (now archived and no longer supported) didn't support 2-4 partition subsets (!!), only handled block sizes up to 8x8, and was broken (misusing or underutilizing modes in our analysis).

ARM's LDR encoder (astcenc) is good natively, though its WASM story seems weaker, but it isn't deblocking aware and doesn't support supercompression.

Beyond ~2k resolutions, ASTC 12x12 can look exceptional with a very simple deblocking shader (1+4 bilinear or trilinear samples, taken only at block edges, compatible with mipmapping and filtering), and the memory savings are huge.
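Here's a minimal C++ model of one plausible "1+4 samples" filter, under stated assumptions: `sample_bilinear` is a hypothetical stand-in for the hardware fetch (a real implementation would be a few lines of GLSL/HLSL), the one-texel falloff and blend weights are illustrative, mip selection is omitted, and none of this is the exact filter the post has in mind.

```cpp
#include <algorithm>
#include <cmath>

struct Rgba { float r, g, b, a; };

// Hypothetical stand-in for the hardware bilinear fetch, so the sketch
// compiles standalone; a real shader would just call texture()/Sample().
static Rgba sample_bilinear(float u, float v) { (void)u; (void)v; return { 0, 0, 0, 1 }; }

static Rgba mix(const Rgba& a, const Rgba& b, float t)
{
    return { a.r + (b.r - a.r) * t, a.g + (b.g - a.g) * t,
             a.b + (b.b - a.b) * t, a.a + (b.a - a.a) * t };
}

// 1 center sample always; 4 neighbor samples contribute only within one
// texel of a block edge, blending toward the cross-edge average.
static Rgba deblock_sample(float u, float v, float tex_w, float tex_h)
{
    const float block = 12.0f; // ASTC 12x12
    float x = u * tex_w, y = v * tex_h;
    float fx = std::fmod(x, block), fy = std::fmod(y, block);

    // Edge weights: 0 in the block interior, rising to 0.5 at the edge itself.
    float wx = 0.5f * std::max(0.0f, 1.0f - std::min(fx, block - fx));
    float wy = 0.5f * std::max(0.0f, 1.0f - std::min(fy, block - fy));

    float du = 1.0f / tex_w, dv = 1.0f / tex_h;
    Rgba c = sample_bilinear(u, v);
    if (wx > 0.0f)
        c = mix(c, mix(sample_bilinear(u - du, v), sample_bilinear(u + du, v), 0.5f), wx);
    if (wy > 0.0f)
        c = mix(c, mix(sample_bilinear(u, v - dv), sample_bilinear(u, v + dv), 0.5f), wy);
    return c;
}

int main()
{
    Rgba c = deblock_sample(0.5f, 0.5f, 4096.0f, 4096.0f);
    (void)c;
}
```

Because the weights fall to zero one texel from the edge, the vast majority of fragments take zero extra fetches, and the ones that do pull from a neighboring block that's very likely already sitting in the texture cache.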

Some related info is here:
