Modern GPU texture compressors have a secret (but dangerous) superpower: prefiltering (blurring). Sometimes an encoder badly overfits edges, causing overall perceptual quality to collapse. One way to overcome this is to blur the input and encode the block again. This is what we do in our HDR encoder on the very worst blocks (as measured by SSIM).
It's paradoxical: blurring can boost perceptual quality.
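Here's a minimal sketch of the idea in C++, assuming a hypothetical encode_and_decode() round-trip callback and a simplified single-window SSIM. The thresholds, blur kernel, and quality metric the real encoder uses are more involved than this:

```cpp
// Sketch: prefilter (blur) the worst-scoring blocks and re-encode them.
// encode_and_decode() stands in for any lossy block encoder + decoder round trip.
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

using Block = std::vector<float>;  // N*N luma samples in [0,1]

// Simplified single-window SSIM over one block (no Gaussian weighting).
static float block_ssim(const Block& a, const Block& b)
{
    const float C1 = 0.01f * 0.01f, C2 = 0.03f * 0.03f;
    const size_t n = a.size();
    float ma = 0, mb = 0;
    for (size_t i = 0; i < n; i++) { ma += a[i]; mb += b[i]; }
    ma /= n; mb /= n;
    float va = 0, vb = 0, cov = 0;
    for (size_t i = 0; i < n; i++)
    {
        va += (a[i] - ma) * (a[i] - ma);
        vb += (b[i] - mb) * (b[i] - mb);
        cov += (a[i] - ma) * (b[i] - mb);
    }
    va /= n; vb /= n; cov /= n;
    return ((2 * ma * mb + C1) * (2 * cov + C2)) /
           ((ma * ma + mb * mb + C1) * (va + vb + C2));
}

// Very mild 3x3 box blur (the prefilter), clamped at the block edges.
static Block blur3x3(const Block& src, int w, int h)
{
    Block dst(src.size());
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            float sum = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                {
                    int sx = std::min(std::max(x + dx, 0), w - 1);
                    int sy = std::min(std::max(y + dy, 0), h - 1);
                    sum += src[sy * w + sx];
                }
            dst[y * w + x] = sum / 9.0f;
        }
    return dst;
}

// If the encoded block scores badly vs. the original, blur the input and
// try again; keep whichever result is perceptually closer to the original.
Block encode_with_prefilter(const Block& orig, int w, int h, float ssim_threshold,
                            const std::function<Block(const Block&)>& encode_and_decode)
{
    Block best = encode_and_decode(orig);
    if (block_ssim(orig, best) >= ssim_threshold)
        return best;

    Block blurred = blur3x3(orig, w, h);
    Block retry = encode_and_decode(blurred);

    // Judge the retry against the *original* pixels, not the blurred input.
    if (block_ssim(orig, retry) > block_ssim(orig, best))
        best = retry;
    return best;
}
```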
Wednesday, April 29, 2026
Large block size ASTC has been misunderstood
Most game developers misunderstand ASTC: the largest block sizes were intended for the highest-resolution content (4K). GPU shader deblocking is easy, and at the largest block sizes it's essentially free, because the bottleneck on mobile/tablet GPUs is typically memory bandwidth, not ALU work or cached texture fetches.
The largest block sizes collapse memory consumption/bandwidth enormously: every ASTC block is 128 bits, so 10x10 blocks come out to 1.28 bpp and 12x12 blocks to roughly 0.89 bpp. The four extra samples a deblocking shader takes at block boundaries are going to be dirt cheap, because they land in a neighboring block and are extremely likely to hit the texture cache.
Once you add a form of well-specified deblocking, the next step is to make an encoder that is deblocking aware. Then you can heavily exploit deblocking - just like all modern image/video codecs have done for decades.
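There's no single blessed ASTC deblocking filter today, so treat the following as an illustration only: a CPU-side sketch of what a trivial shader-style deblocking pass could look like, blending a few extra taps across each block seam. The tap pattern and weights here are placeholders, not a spec:

```cpp
// Sketch of a shader-style deblocking pass over a decoded ASTC image.
// For texels in the outermost ring of each block, blend in the sample just
// across the block boundary. The extra taps land in the neighboring block,
// which is almost certainly already sitting in the texture cache.
#include <algorithm>
#include <vector>

struct Image { int w, h; std::vector<float> texels; };  // single channel for brevity

static float fetch(const Image& img, int x, int y)
{
    x = std::min(std::max(x, 0), img.w - 1);
    y = std::min(std::max(y, 0), img.h - 1);
    return img.texels[y * img.w + x];
}

Image deblock(const Image& src, int block_w, int block_h, float strength = 0.25f)
{
    Image dst = src;
    for (int y = 0; y < src.h; y++)
        for (int x = 0; x < src.w; x++)
        {
            float sum = fetch(src, x, y);
            float wsum = 1.0f;

            // Left/right block edges: one extra horizontal tap across the seam.
            if (x % block_w == 0)           { sum += strength * fetch(src, x - 1, y); wsum += strength; }
            if (x % block_w == block_w - 1) { sum += strength * fetch(src, x + 1, y); wsum += strength; }

            // Top/bottom block edges: one extra vertical tap across the seam.
            if (y % block_h == 0)           { sum += strength * fetch(src, x, y - 1); wsum += strength; }
            if (y % block_h == block_h - 1) { sum += strength * fetch(src, x, y + 1); wsum += strength; }

            dst.texels[y * src.w + x] = sum / wsum;
        }
    return dst;
}
```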
Unfortunately, the deployment story for ASTC so far has been pretty spotty: there are few full-format encoders available because the format is so complex.
- Intel's encoder in ispc_texcomp (now archived and no longer supported) didn't support 2-4 subsets (!!), only handled block sizes up to 8x8, and was broken (misusing/underutilizing modes in our analysis).
- ARM's LDR encoder is good (in native builds; the WASM story seems weaker), but it isn't deblocking aware and doesn't support supercompression.
Petascale changes everything
Let's say you have multiple petabytes of JPEG/WebP/etc. content and you need (not want, because your competitors are already doing it) to add GPU texture support to your app. What do you do?
- Encode everything 2-3 times without supercompression (BC7 + ASTC, maybe ETC1 too): utterly impractical, it explodes overall content size by 8x-16x or more. Even a single format (say ASTC) without supercompression is impractical; it still explodes content size (see the rough numbers sketched after this list).
- Use supercompression to a universal texture format and transcode on device (native or plain WASM): adds another ~1-1.5 petabytes. The tech is entirely free, has no driver dependencies, and is standardized by Khronos.
- Use compute shaders and try to transcode on device: now you're endlessly chasing ever-changing mobile GPU driver bugs until the end of time, and outlier devices can't use your app reliably. You're also stuck with large textures in VRAM, because you can't exploit the largest ASTC block sizes (beyond 6x6, and even 6x6 sacrifices quality due to compute shader issues). Supercompressed solutions can readily exploit ASTC 8x8 (2 bpp in VRAM), 10x10 (1.28 bpp) and 12x12 (0.89 bpp), while compute shader solutions are limited to the smallest block sizes (3.56-8 bpp) and have to make sacrifices (such as disabling dual-plane support in some scenarios) to even achieve that.
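Some back-of-the-envelope numbers behind the size-explosion claim. The ~1 bpp average for the JPEG/WebP sources is an assumption for illustration; the 128 bits per block is simply what ASTC and BC7 define:

```cpp
// Back-of-the-envelope: why shipping raw block-compressed formats alongside
// ~1 bpp JPEG/WebP sources explodes content size on disk.
#include <cstdio>

int main()
{
    const double source_bpp = 1.0;     // assumed average JPEG/WebP rate
    const double block_bits = 128.0;   // ASTC and BC7 blocks are always 128 bits

    struct { const char* name; int bw, bh; } fmts[] = {
        { "BC7 / ASTC 4x4", 4, 4 },
        { "ASTC 6x6",       6, 6 },
        { "ASTC 8x8",       8, 8 },
        { "ASTC 10x10",    10, 10 },
        { "ASTC 12x12",    12, 12 },
    };

    for (auto& f : fmts)
    {
        double bpp = block_bits / (f.bw * f.bh);
        printf("%-16s %.2f bpp  (%.1fx the ~1 bpp source)\n",
               f.name, bpp, bpp / source_bpp);
    }
    // Shipping BC7 *and* raw ASTC 4x4 next to the sources is ~16x the source
    // size; even a single 8 bpp format is ~8x, matching the 8x-16x figure above.
    return 0;
}
```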
Thursday, April 9, 2026
XUASTC's next step: Intra-prediction of weight grids
Binomial has shown that image compression and GPU texture compression aren't separate fields. They're the same field, and the tools from one transfer directly to the other.
XUASTC is currently using JPEG-style DCT (from 1992) on ASTC weight grids:
https://github.com/BinomialLLC/basis_universal/wiki/XUASTC-LDR-Weight-Grid-DCT
We ported JPEG-style coding into ASTC, even preserving how libjpeg-style [1-100] Q factors are used to calculate quantization tables. (Our quantization table is the standard luminance JPEG table, with simple adaptive quantization added on top.)
This works, but it means the DCT has to carry the entire weight signal (just like JPEG). At the very lowest quality factors (Q levels 1-25 or so), the lowest spatial frequencies suffer (again, just like JPEG).
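For reference, this is the libjpeg-style quality scaling in question (the same rule libjpeg uses in jpeg_quality_scaling()), applied to the standard Annex K luminance table. XUASTC's adaptive quantization layer, and how the table maps onto non-8x8 weight grids, are not shown here:

```cpp
// libjpeg-style quality scaling of the standard JPEG luminance quantization
// table (ITU-T T.81 Annex K).
#include <algorithm>
#include <cstdio>

static const int k_std_luma_quant[64] = {
    16, 11, 10, 16,  24,  40,  51,  61,
    12, 12, 14, 19,  26,  58,  60,  55,
    14, 13, 16, 24,  40,  57,  69,  56,
    14, 17, 22, 29,  51,  87,  80,  62,
    18, 22, 37, 56,  68, 109, 103,  77,
    24, 35, 55, 64,  81, 104, 113,  92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103,  99
};

// Same scaling rule as libjpeg's jpeg_quality_scaling(): Q in [1,100].
void make_quant_table(int quality, int table[64])
{
    quality = std::min(std::max(quality, 1), 100);
    const int scale = (quality < 50) ? (5000 / quality) : (200 - quality * 2);

    for (int i = 0; i < 64; i++)
    {
        int q = (k_std_luma_quant[i] * scale + 50) / 100;
        table[i] = std::min(std::max(q, 1), 255);  // baseline JPEG range
    }
}

int main()
{
    int q25[64], q100[64];
    make_quant_table(25, q25);    // a low Q factor, where banding shows up
    make_quant_table(100, q100);
    printf("DC quant step: %d at Q=25 vs. %d at Q=100\n", q25[0], q100[0]);
    return 0;
}
```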
The next step is to port WebP-style intra-prediction into the weight grid domain. We can easily predict a block's weight grid from nearby blocks, then code the weight residuals with the DCT. It's the logical follow-on, and it'll push our bitrates even lower. While seemingly everyone else is distracted by neural techniques, we're targeting billions of already-shipped, hyper-efficient hardware decoders.
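As a rough illustration of what that could look like (none of this is the shipped XUASTC design; the prediction mode and helper names here are purely hypothetical), a DC-style predictor over weight grids might be sketched like this:

```cpp
// Hypothetical sketch: intra-predict an ASTC weight grid from its
// already-decoded neighbors, then form the residual that a DCT + quantizer
// would carry instead of the raw weights.
#include <cstddef>
#include <vector>

struct WeightGrid
{
    int w, h;
    std::vector<float> weights;  // dequantized ASTC weights, e.g. in [0,1]
};

// DC prediction, roughly in the spirit of WebP/H.264 intra modes: predict the
// whole grid as the average of the left neighbor's rightmost column and the
// top neighbor's bottom row (falling back to 0.5 when a neighbor is missing).
WeightGrid predict_dc(const WeightGrid* left, const WeightGrid* top, int w, int h)
{
    float sum = 0.0f;
    int n = 0;
    if (left)
        for (int y = 0; y < left->h; y++) { sum += left->weights[y * left->w + (left->w - 1)]; n++; }
    if (top)
        for (int x = 0; x < top->w; x++) { sum += top->weights[(top->h - 1) * top->w + x]; n++; }

    const float dc = n ? (sum / n) : 0.5f;
    return WeightGrid{ w, h, std::vector<float>(size_t(w) * h, dc) };
}

// Residual = actual weights minus the prediction; this (not the raw weight
// signal) is what the DCT and quantizer would then have to represent.
std::vector<float> weight_residual(const WeightGrid& actual, const WeightGrid& pred)
{
    std::vector<float> r(actual.weights.size());
    for (size_t i = 0; i < r.size(); i++)
        r[i] = actual.weights[i] - pred.weights[i];
    return r;
}
```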