Modern GPU texture compressors have a secret (but dangerous) superpower: prefiltering (blurring). Sometimes an encoder way overfits edges, causing overall perceptual quality to collapse. One way to overcome this is to blur the input and encode the block again. This is what we do in HDR on the very worst blocks (as measured by SSIM).
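A minimal sketch of that blur-and-retry fallback (not Basis's actual code; `encode`, `quality`, and `threshold` are hypothetical stand-ins for the block encoder, the perceptual metric such as SSIM, and its cutoff):

```python
def blur_block(block, w, h):
    """3x3 box blur over one block's texels (single channel, row-major).

    A deliberately simple prefilter: each output texel is the average of
    its in-bounds 3x3 neighborhood."""
    out = []
    for y in range(h):
        for x in range(w):
            acc, n = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < w and 0 <= ny < h:
                        acc += block[ny * w + nx]
                        n += 1
            out.append(acc / n)
    return out

def encode_with_fallback(block, w, h, encode, quality, threshold):
    """If the first encode scores too poorly, blur the input and retry."""
    packed = encode(block)
    if quality(block, packed) < threshold:
        packed = encode(blur_block(block, w, h))  # prefiltered retry
    return packed
```

The point of the prefilter is simply to remove the high-frequency detail the encoder was overfitting, so the second encode spends its bits on the block's broad structure instead.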
It's paradoxical: blurring can boost perceptual quality.

About the author: Co-owner of Binomial LLC, working on GPU texture interchange. Open source developer, Open Geospatial Consortium member, graphics programmer, former video game developer. Worked previously at SpaceX (Starlink), Valve, Ensemble Studios (Microsoft), and DICE Canada.
Wednesday, April 29, 2026
Large block size ASTC has been misunderstood
Most game developers misunderstand ASTC: the largest block sizes were intended for the largest-resolution content (4k). GPU shader deblocking is easy: at the largest block sizes it's essentially free, because the bottleneck on mobile/tablets is typically memory bandwidth, not ALU or cached texture fetches.
The largest block sizes collapse memory consumption/bandwidth enormously (0.89-1.28 bpp for 12x12 or 10x10). The four extra samples at block boundaries are extremely likely to hit the texture cache (because they sample into a neighboring block), so they're dirt cheap.
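The arithmetic behind those numbers is simple: every ASTC block occupies exactly 128 bits regardless of its footprint, so bits per texel fall as the footprint grows. A one-liner, purely for illustration:

```python
def astc_bpp(w, h):
    """Bits per texel for a w x h ASTC block (all blocks are 128 bits)."""
    return 128.0 / (w * h)

# 4x4 -> 8.0 bpp, 6x6 -> ~3.56, 8x8 -> 2.0, 10x10 -> 1.28, 12x12 -> ~0.89
```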
Once you add a form of well-specified deblocking, the next step is to make an encoder that is deblocking aware. Then you can heavily exploit deblocking - just like all modern image/video codecs have done for decades.
Unfortunately the deployment story with ASTC so far has been pretty spotty: There are few available full-format encoders because the format is so complex.
Intel's encoder in ispc_texcomp (now archived/no longer supported) didn't support 2-4 partition subsets (!!), only went up to 8x8, and was broken (misusing/underutilizing modes in our analysis).
ARM's LDR encoder is good (in native, WASM story seems weaker) but it isn't deblocking aware and doesn't support supercompression.
Petascale changes everything
Let's say you have multiple petabytes of JPEG/WebP/etc. content and you need (not want - your competitors are already doing it) to add GPU texture support to your app. What do you do?
- Encode 2-3 times without supercompression (BC7+ASTC, maybe ETC1 too): utterly impractical, explodes overall content size (8x-16x or more). Even one format (say ASTC) without supercompression is impractical - it explodes content size.
- Use supercompression to a universal texture format, transcode on device (native or plain WASM): Adds another ~1-1.5 petabytes. The tech is entirely free, has no driver dependencies and is standardized by Khronos.
- Use compute shaders, try to transcode on device: but now you're endlessly chasing ever-changing mobile GPU driver bugs until the end of time, outliers can't use your app reliably. You're also stuck with large textures in VRAM because you can't exploit the largest ASTC block sizes (beyond 6x6, and even 6x6 sacrifices quality due to compute shader issues). Supercompressed solutions can readily exploit ASTC 8x8 (2bpp in VRAM), 10x10 (1.28 bpp) and 12x12 (0.89 bpp), while compute shader solutions are limited to the smallest block sizes (3.56 bpp - 8 bpp) and have to make sacrifices (such as disabling dual plane support in some scenarios) to even achieve that.
Tuesday, April 21, 2026
Thursday, April 9, 2026
XUASTC's next step: Intra-prediction of weight grids
Binomial has shown that image compression and GPU texture compression aren't separate fields. They're the same field, and the tools from one transfer directly to the other.
XUASTC is currently using JPEG-style DCT (from 1992) on ASTC weight grids:
https://github.com/BinomialLLC/basis_universal/wiki/XUASTC-LDR-Weight-Grid-DCT
We ported JPEG-style coding into ASTC, even preserving how libjpeg-style [1-100] Q factors are used to calculate quantization tables. (Our quantization table is the standard luminance JPEG table, with simple adaptive quantization added on top.)
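For reference, the libjpeg-style Q-to-table mapping we preserve works roughly like this (a sketch of the standard `jpeg_quality_scaling` behavior, without our adaptive-quantization layer on top):

```python
def scale_quant_table(base_table, q):
    """Scale a base quantization table by a libjpeg-style Q in [1, 100].

    Q < 50 scales the table up (coarser); Q > 50 scales it down (finer);
    entries are clamped to the valid [1, 255] range."""
    q = max(1, min(100, q))
    scale = 5000 // q if q < 50 else 200 - q * 2
    return [max(1, min(255, (b * scale + 50) // 100)) for b in base_table]
```

At Q=50 the base table passes through unchanged; at Q=100 every divisor collapses to 1 (near-lossless quantization of the DCT coefficients).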
This works, but it means the DCT has to carry the entire weight signal (just like JPEG). At the very lowest quality factors (Q levels 1-25 or so), the lowest spatial frequencies suffer (again, just like JPEG).
The next step is to port WebP-style intra-prediction into the weight grid domain. We can easily predict weight grids from nearby blocks, then code the weight residuals using DCT. It's the logical next step, and it'll push our bitrates even lower. While seemingly everyone is distracted by neural techniques, we're targeting billions of already shipped, hyper-efficient hardware decoders.
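To make the idea concrete, here is a toy sketch of one such predictor (DC-style replication from the left neighbor's edge; the real WebP-derived mode set is richer, and the names below are our own):

```python
def predict_from_left(left_grid, w, h):
    """Predict a w x h weight grid by replicating the left neighbor
    block's rightmost weight column across each row."""
    pred = []
    for y in range(h):
        edge = left_grid[y * w + (w - 1)]   # rightmost weight in row y
        pred.extend([edge] * w)
    return pred

def weight_residual(grid, pred):
    """The residual that would then be DCT-coded instead of raw weights."""
    return [g - p for g, p in zip(grid, pred)]
```

When neighboring blocks are similar, the residual is near zero, so the DCT no longer has to carry the entire weight signal - only the part prediction missed.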
Thursday, April 2, 2026
First XUASTC LDR 4x4 rate-distortion graphs
Monday, March 9, 2026
Basis Universal v2.1 library wiki mirror
It's been automatically converted to HTML and mirrored outside of GitHub here: Home.
A static mirror of the GitHub repo is here.
Sunday, March 8, 2026
The KTX-Software repo has been forked
Binomial LLC has forked the Khronos Group's KTX-Software repo, to use as a staging ground for next-generation GPU texture compression technology:
https://github.com/BinomialLLC/KTX-Software-Binomial-Fork/
Wednesday, March 4, 2026
Updated: "ASTC GPU Texture Decoding: Software Decoders, Spec Issues, and ARM Errata"
https://github.com/BinomialLLC/basis_universal/wiki/List-of-Available-Software-ASTC-Decoders
Sunday, March 1, 2026
.ASTC (the File Format): No Longer a Black Box
The basisu command line tool has a new option, -peek, which opens any standard ARM LDR/HDR .ASTC texture file, unpacks each block, and computes a bunch of statistics about the exact ASTC configurations the blocks used.
This is how we found out that Intel's ispc_texcomp's ASTC encoder is, for all practical purposes, broken.
Monday, February 23, 2026
Sunday, February 22, 2026
ARM ASTC Decoding Errata
See here:
https://documentation-service.arm.com/static/67ca1a5ece2747241fced502
More general info on the real-world complexities of decoding ASTC is here:
https://github.com/BinomialLLC/basis_universal/wiki/List-of-Available-Software-ASTC-Decoders
Wednesday, February 18, 2026
Some things we've learned about GPU textures at planetary scale
1. ASTC is now the king: In billions of devices. Everything else=fallback, including BC7.
To us, BC7 is essentially a greatly simplified ASTC, but with some p-bits.
2. At multi-petabyte (planetary) scales: Supercompression bitrate=Matters enormously.
Notably, game developers (who have been using compressed textures the longest) don't work at scales this large, so the approaches and techniques they assume are correct or standard may not apply at all at extreme scales.
The tradeoffs game developers have made in the past are no longer aligned with modern hardware and network realities.
3. GPU Drivers=super sketchy.
This means for us: no driver usage, no compute. The largest vendors, who already deal with endless GPU driver bugs, don't want even more exposure in critical texture decompression/transcoding paths. If it fails for even ~0.1% of customers in the wild, it's unusable.
4. All 14 ASTC block sizes are important in large scale deployment scenarios, not just 2 or 3.
This includes 12x12, which at 4k-8k is quite usable.
5. When mipmap- and filtering-compatible deblocking of the larger ASTC block sizes is trivial to do in a tiny pixel shader, it makes no sense not to deblock: the cost of not doing so is ~2x-8x more bitrate and bandwidth.
6. Unfortunately, LDR ASTC decoding actually isn't always bitwise exact. (BC7 wins for this, at least.)
The ASTC specification was so dense and complex even the vendors (including ARM itself!) couldn't get it right.
7. WASM SIMD isn't everywhere (and even where it is, some very big vendors won't allow it to be used or enabled), so we can't depend on SIMD. This means less searching, more math, and better algorithms in our encoders - or we can't ship.
8. Everything must be fuzzed. That means the obvious things like block decoders, all decompressors, etc. but it also includes encoders. Trust no data.
Tuesday, February 10, 2026
ASTC Texture Sampling with Deblocking in a Simple Pixel Shader
Deblocking is a standard feature in modern image/video codecs, and now developers can benefit from deblocking on GPU textures, either while transcoding to other formats like BC7, or while sampling ASTC textures directly.
This demo with source code shows how to sample ASTC textures (or really any GPU texture format, of any block size) with deblocking applied in a simple pixel shader. It's intended for the larger ASTC block sizes, i.e. beyond 6x6. It greatly reduces block artifacts, which allows larger block sizes to be used across a wider range of content, which ultimately lowers bitrates, memory bandwidth, and download sizes. It's fully compatible with mipmap filtering.
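The core idea, reduced to 1D for clarity (a toy model of boundary blending, not the repo's shader; the real version works in 2D with normalized texture coordinates and hardware filtering):

```python
def deblock_1d(texels, block_w, band=1.0):
    """Blend texels near block boundaries with the texel just across the
    edge. The blend weight ramps from 0 inside the block toward 0.5 at
    the boundary itself, smoothing the step a block seam creates."""
    out = [float(t) for t in texels]
    n = len(texels)
    for x in range(n):
        fx = (x % block_w) + 0.5                 # texel center within its block
        if fx < band and x > 0:                  # near the left block edge
            w = 0.5 * (1.0 - fx / band)
            out[x] = (1.0 - w) * texels[x] + w * texels[x - 1]
        elif block_w - fx < band and x + 1 < n:  # near the right block edge
            w = 0.5 * (1.0 - (block_w - fx) / band)
            out[x] = (1.0 - w) * texels[x] + w * texels[x + 1]
    return out
```

Interior texels pass through untouched; only texels within the boundary band take the one extra (cache-friendly) sample from the neighboring block.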
https://github.com/BinomialLLC/basis_universal/tree/master/shader_deblocking
This is a form of "GPU texture compression-aware shading" or "GPU format-informed reconstruction".
Saturday, January 31, 2026
Thursday, January 29, 2026
JPEG for ASTC
See the wiki page in the Basis Universal v2.0 repo here:
https://github.com/BinomialLLC/basis_universal/wiki/JPEG-for-ASTC
Tuesday, January 27, 2026
bc7f: A New Real-Time Analytical BC7 Encoder
bc7f: Prediction, Not Search
The portable, non-SIMD bc7f encoder relies on an analytical, statistics-driven error model rather than iterative search. This full-featured (all BC7 modes, all mode features, all dual-plane channels, all partition patterns), strictly bounded O(1) real-time encoder exploits simple closed-form expressions to predict which BC7 mode family (4/5, 0/2, 1/3/7, or 6) is worth considering. It then estimates the block's SSE/MSE for each candidate using lightweight block statistics derived from covariance analysis, together with the mode's weight and endpoint quantization characteristics. All of this happens before any BC7 mode is actually encoded. In purely analytical mode, bc7f predicts, encodes the input to a single BC7 mode configuration (without any decoding or error measurement), and returns.
BC7 block decoding is an affine interpolation between quantized endpoints using quantized weights, which allows first-order error propagation to be modeled directly. For a given block, the encoder computes basic statistics such as the covariance of the input texels; the principal axis derived from the covariance is used both for endpoint fitting and to estimate the orthogonal least-squares (“line fit”) residual error as trace(covariance) − λ₁. Quantization noise from endpoints and weights is modeled independently using uniform quantization assumptions, with endpoint error contributing an additive term and weight/index error contributing a span-dependent term proportional to the squared endpoint distance. These closed-form estimates are sufficient to predict relative SSE across BC7 mode families, partitions, and dual-plane configurations without trial encodes. As a result, bc7f can select parameters and emit a single BC7 block in strictly bounded time, producing deterministic, high-quality results without brute-force search or refinement.
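A toy Python rendering of that three-term error model (our own simplified illustration, not bc7f's actual code; BC7's interpolation weight tables and per-mode details are glossed over):

```python
def predict_block_sse(texels, endpoint_bits, weight_levels):
    """Closed-form SSE estimate for one single-subset, single-plane fit.

    Three additive terms, mirroring the model described above:
      1. orthogonal line-fit residual: (trace(cov) - lambda_1) * n
      2. endpoint quantization noise:  uniform, step^2 / 12 per channel
      3. weight quantization noise:    span-dependent, (span/steps)^2 / 12
    texels: list of (r, g, b) tuples in [0, 255]."""
    n = len(texels)
    mean = [sum(t[c] for t in texels) / n for c in range(3)]
    cov = [[sum((t[a] - mean[a]) * (t[b] - mean[b]) for t in texels) / n
            for b in range(3)] for a in range(3)]
    trace = cov[0][0] + cov[1][1] + cov[2][2]

    # Principal eigenvalue of the covariance via power iteration.
    v = [1.0, 1.0, 1.0]
    for _ in range(64):
        w = [sum(cov[r][c] * v[c] for c in range(3)) for r in range(3)]
        norm = max(1e-12, sum(x * x for x in w) ** 0.5)
        v = [x / norm for x in w]
    lam1 = sum(v[r] * sum(cov[r][c] * v[c] for c in range(3)) for r in range(3))

    line_sse = max(0.0, trace - lam1) * n          # off-axis residual

    ep_step = 255.0 / (2 ** endpoint_bits - 1)     # endpoint quantizer step
    ep_sse = n * 3 * (ep_step ** 2) / 12.0

    # Span of the block along the principal axis drives the weight term.
    proj = [sum((t[c] - mean[c]) * v[c] for c in range(3)) for t in texels]
    span = max(proj) - min(proj)
    wt_step = span / max(1, weight_levels - 1)
    wt_sse = n * (wt_step ** 2) / 12.0

    return line_sse + ep_sse + wt_sse
```

Running such a predictor over candidate mode configurations and keeping the argmin is what lets a one-shot encoder pick its mode, partition, and plane setup without any trial encodes.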
bc7f is significantly faster than bc7e.ispc Level 1, yet because it exploits the entire BC7 format, it isn't as brittle. It's a "one-shot", non-AbS (analysis by synthesis), but full-featured encoder. The follow-up, "bc7g", is in the works, and it will be released as open source as well.
Binomial first developed these techniques for our full-featured (all block size) ASTC encoder, which is vastly more complex, and later used them to implement bc7f. We expect these predictive, analytical encoding techniques to be rapidly adopted.