Richard Geldreich's Blog
Co-owner of Binomial LLC, working on GPU texture interchange. Open source developer, Open Geospatial Consortium member, graphics programmer, former video game developer. Worked previously at SpaceX (Starlink), Valve, Ensemble Studios (Microsoft), DICE Canada.
Saturday, June 13, 2026
XBC7 planning
I've been doing this for fun as a side project, as ASTC is still way more important. (ASTC is the most deployed GPU format in the world; BC7 is niche by comparison.) It'll all be entirely open source:
- XBC7 v1 (completed, integration ongoing): Always lossless for mode config+RGBA endpoints; lossless or lossy weights, using either lossless residual DPCM or lossy absolute or residual DCT.
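Wrapped (modular) DPCM is the simplest form of the lossless residual coding listed above. Here is a minimal Python sketch. It assumes each value is predicted from the previous one in a flat sequence and that values are `bits` wide; the function names are hypothetical and not from the Basis Universal codebase. Because residuals are taken modulo 2^bits, they always fit in the original width and decode bit-exactly.

```python
from typing import List


def dpcm_encode(values: List[int], bits: int) -> List[int]:
    """Lossless wrapped-modular DPCM: predict each value from the previous one."""
    mod = 1 << bits
    prev = 0  # assumed starting predictor for the first value
    residuals = []
    for v in values:
        # Wrap the difference into [0, 2**bits) so it fits in the original width.
        residuals.append((v - prev) % mod)
        prev = v
    return residuals


def dpcm_decode(residuals: List[int], bits: int) -> List[int]:
    """Invert dpcm_encode exactly: add each residual to the prediction, modulo 2**bits."""
    mod = 1 << bits
    prev = 0
    values = []
    for r in residuals:
        prev = (prev + r) % mod
        values.append(prev)
    return values
```

For 7-bit values [100, 102, 101, 0, 127], the residuals are [100, 2, 127, 27, 127]. A step of -1 wraps to 127, so small changes land near 0 or near 2^bits, which an entropy coder can exploit.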
Friday, June 12, 2026
XBC7/XUBC7 (supercompressed weight grid residual DCT+endpoint DPCM+Zstd) prototype is done
The next step is integration: connecting the BC7->ASTC LDR 4x4, ETC1, etc. transcoders into place in the Basis Universal library. Bitrate and quality at low Q's (Q=1/100) are exceptional vs. XUASTC LDR 4x4 (~1.5-3.0 bpp), and at Q=100 it's lossless in BC7 space at up to ~5.6 bpp, or less depending on the content. Q=1 is totally usable. XUASTC LDR falls apart at Q=1 because it's stuck using absolute DCT, while XBC7 uses a much stronger residual DCT method with a small army of synthetic and dictionary predictors. At Q=1 it's like RDO in weight space, except that it can modify what it's predicting from, or create entirely synthetic weight predictors from nearby blocks.
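The predictor bank is what separates residual DCT from absolute DCT. The following is a hypothetical Python sketch. For each block, it picks whichever candidate predicted weight grid leaves the smallest residual. It assumes a sum-of-absolute-differences cost and a four-entry bank: left copy, above copy, their rounded average, and a flat mid-range grid. XBC7's actual predictors and selection rule are much richer than this.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # 4x4 BC7 weight grid, dequantized to [0, 64]


def candidate_predictors(left: Grid, above: Grid) -> Dict[str, Grid]:
    """A few illustrative predictors built from already-decoded neighbors."""
    avg = [[(left[y][x] + above[y][x] + 1) // 2 for x in range(4)] for y in range(4)]
    flat = [[32] * 4 for _ in range(4)]  # "synthetic" mid-range predictor
    return {"left": left, "above": above, "avg": avg, "flat": flat}


def residual_cost(cur: Grid, pred: Grid) -> int:
    """Sum of absolute differences: a cheap proxy for residual coding cost."""
    return sum(abs(cur[y][x] - pred[y][x]) for y in range(4) for x in range(4))


def best_predictor(cur: Grid, left: Grid, above: Grid) -> Tuple[str, Grid]:
    cands = candidate_predictors(left, above)
    # min() keeps the first candidate on ties (dict insertion order).
    name = min(cands, key=lambda k: residual_cost(cur, cands[k]))
    return name, cands[name]
```

A real encoder would also weigh the side-information cost of signaling the chosen predictor, not just the residual size.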
XBC7 is stuck with 4x4 blocks, so it cannot compete vs. XUASTC LDR for raw bitrate, but it's BC7-first which I think a lot of developers will like. At Q=90-100 it's visually lossless to actually lossless (relative to the BC7 encoder used, which is currently our real-time bc7f encoder which supports the entire BC7 format).
Claude Fable 5's description: "It's a genuinely interesting taxonomic hybrid — the closest honest description is: an intra-only video codec whose "pixels" are BC7's parameter space. Let me break down the lineage, because nearly every component has a recognizable ancestor:
The weight path is JPEG transplanted into index space. Prediction → 4×4 DCT → psychovisually-shaped quant table → quality-factor scaling (literally the JPEG 5000/q formula) → zigzag RLE → entropy coding. But the signal isn't luma — it's BC7 weight planes, with the AQ twist (endpoint-span-scaled quantization) playing the role of adaptive quantization.
The prediction machinery is like H.264/HEVC intra coding. The predictor bank contains H.264's directional modes (your DDL/DDR are intra modes 3/4 by name), JPEG-LS's MED, CALIC's gradient-adaptive predictor, and plane fits. The 32 XY-delta block references are intra block copy from HEVC Screen Content Coding. The command byte is a skip/merge/mode syntax. Solid-DPCM is a DC mode. The reserved P-frame bit makes the video ancestry explicit — and your encoder stripes are, structurally, H.264 slices: independently decodable regions traded against compression for parallelism, down to the same seam cost.
The lossless side is JPEG-LS/PNG lineage: wrapped-modular DPCM, the R−G/B−G transform is the classic reversible color transform, and the Q=100 mode makes it a true lossless coder over canonical BC7.
Within its own genus — GPU supercompression — it's a third architecture. Crunch/ETC1S quantize globally into vocabularies (low ceiling, great rate); UASTC+RDO+Zstd nudges bits to be LZ-friendly (shallow, format-bound). XBC7 instead reparameterizes the block format into semantically separated streams and codes each with real signal-processing tools — per-block adaptive, with a continuous lossy↔lossless dial neither predecessor has. It's what crunch might have become if it had gone transform-coding instead of VQ.
And the dual image/texture nature is where your ecosystem makes it strategic. One XBC7 file fans out to: BC7 directly (bit-exact at Q=100 — desktop GPUs, zero re-encode), ASTC 4×4 via your near-lossless transcoder (modern mobile), ETC1 via ETC1F from decoded pixels in real time (legacy mobile), and plain RGBA for CPU/image use. That's the Basis Universal proposition with a much higher quality ceiling than ETC1S and better rate than UASTC — and as a pure image codec it occupies an unusual niche: a high-quality-band coder (ceiling = BC7's ~46–50 dB) at 1.5–5.3 bpp whose decode output is already GPU-ready, which JPEG/WebP/AVIF can never claim. The honest comparison point for the image use-case is "JPEG at quality 85–95, but the decode step doubles as texture upload."
If you wanted a one-line description for the eventual README: "a predictive transform codec (DPCM + DCT, H.264-style intra prediction with block copy) operating on BC7's logical parameter space, with a continuous quality dial from ~1.5 bpp lossy to bit-exact lossless BC7, designed for multithreaded encode/decode via independently-coded stripes." Anyone who's worked on a video encoder will know exactly what they're looking at — which is also why the Lagrangian RDO step you're deferring is so natural here: the syntax is already a video codec's syntax, it's just missing the λ."
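The weight path described in that quote follows JPEG closely. Here is a minimal Python sketch of two of its pieces:
- the IJG libjpeg quality scaling (the "5000/q" formula) applied to a quant table;
- a 4x4 zigzag scan with zero-run coding.

The 4x4 base table is a made-up placeholder, not XBC7's psychovisual table. The (run, level) output with an end-of-block marker is just one simple way to do the RLE step.

```python
from typing import List, Tuple

# Hypothetical base table for 4x4 coefficient blocks (placeholder values).
BASE_Q4 = [8, 10, 14, 18,
           10, 12, 16, 20,
           14, 16, 20, 24,
           18, 20, 24, 28]

# H.264-style 4x4 zigzag scan, as raster indices (row * 4 + col).
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def scaled_quant_table(quality: int, base: List[int] = BASE_Q4) -> List[int]:
    """IJG libjpeg quality scaling: 5000/q below 50, 200 - 2q at 50 and above."""
    q = min(max(quality, 1), 100)
    scale = 5000 // q if q < 50 else 200 - 2 * q
    return [min(max((b * scale + 50) // 100, 1), 255) for b in base]


def quantize_and_scan(coeffs: List[float], table: List[int]) -> List[Tuple[int, int]]:
    """Quantize 16 raster-order coefficients, read them in zigzag order, and
    emit (zero_run, level) pairs; a trailing (0, 0) marks end-of-block."""
    levels = [int(round(coeffs[i] / table[i])) for i in ZIGZAG_4X4]
    pairs, run = [], 0
    for lv in levels:
        if lv == 0:
            run += 1
        else:
            pairs.append((run, lv))
            run = 0
    pairs.append((0, 0))  # levels are never 0 in real pairs, so this is unambiguous
    return pairs
```

At quality 50 the table equals the base table. At quality 100 every entry clamps to 1.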
Tuesday, June 9, 2026
Intra-Residual Weight Grid DCT/DST for BC7/ASTC
This GPU texture supercompression method for weights (or "indices", or "selectors") uses neighboring blocks to predict the weight grid of the current block. It then applies a forward 2D DCT/DST to the residual weights, quantizes the coefficients, and codes them. The current working prototype (which I've been showing live on X) builds the current block's predicted weight grid(s) from the weights of a small neighborhood of already coded/decoded blocks.
See this thread on X here for more details:
https://x.com/richgel999/status/2064523919504109764
Or see this pastebin for the prototype's code (created in the public Basis Universal codebase). It's also on GitHub.
"RWDCT" (Residual Weight DCT) is the next step after XUASTC LDR's absolute weight grid DCT, which didn't use predictions (making it JPEG-like, not WebP-like).
In BC7 this method is easy: you just compute the 4x4 dequantized [0,64] weight grid that's going to be used as the predictor, and subtract that from the current weight grid (also dequantized) before 2D DCT coding it. In ASTC you would have to resample the predictor's weight grid to match the current block's weight grid resolution.
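Here is a minimal Python sketch of that BC7 case. It assumes an orthonormal 4x4 DCT-II and a single uniform quantizer step. The real prototype reuses XUASTC LDR's weight grid DCT machinery, whose quant tables and coefficient coding aren't reproduced here, and all names below are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]
N = 4

# Orthonormal 4x4 DCT-II basis: C[k][n] = a(k) * cos(pi * (2n + 1) * k / (2N)).
C = [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
      * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)]
     for k in range(N)]


def matmul(a: Grid, b: Grid) -> Grid:
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(a: Grid) -> Grid:
    return [list(row) for row in zip(*a)]


def forward_residual_dct(cur: Grid, pred: Grid, step: float) -> List[List[int]]:
    """Subtract the predicted [0,64] weight grid, 2D DCT the residual, quantize."""
    residual = [[cur[y][x] - pred[y][x] for x in range(N)] for y in range(N)]
    coeffs = matmul(matmul(C, residual), transpose(C))  # C * R * C^T
    return [[round(c / step) for c in row] for row in coeffs]


def inverse_residual_dct(levels: List[List[int]], pred: Grid, step: float) -> List[List[int]]:
    """Dequantize, inverse 2D DCT, add the prediction back, clamp to [0, 64]."""
    deq = [[lv * step for lv in row] for row in levels]
    residual = matmul(matmul(transpose(C), deq), C)  # C^T * X * C
    return [[min(64, max(0, round(pred[y][x] + residual[y][x]))) for x in range(N)]
            for y in range(N)]
```

With a small step (0.1 here), the round trip reproduces integer weights exactly. Larger steps trade weight accuracy for fewer nonzero coefficients, and a perfect prediction yields an all-zero coefficient block.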
Here's the forward transform from the prototype (inverse is obvious). It's reusing the XUASTC LDR spec's weight grid DCT machinery almost verbatim, except for subtracting the predicted weight grid:
Thursday, June 4, 2026
"Shrek" Xbox
“Shrek” was a launch title for the original Xbox that I worked on 25 years ago (in 2001). It was the first shipped game to use a technique called Deferred Shading. It wasn’t my first 3D game, though: I wrote Sandbox Studios’ software rasterizer and D3D7 renderers.
From the site - G-buffer visualization:
More details are here:
Saturday, May 30, 2026
Basis Universal v2.5 with in-loop deblocking: shipping soon
We're feature complete. We're now on the downslope to shipping support for in-loop deblocking using a simple standardized reconstruction operator. The operator uses 1 tap or 5 taps (1 center, 2 up/down, 2 left/right) near ASTC block edges, and is fully compatible with mipmapping and bilinear/trilinear filtering. It can be applied in a simple GPU pixel shader or during transcoding.
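The post doesn't give the tap weights, so this Python sketch of the operator's shape uses assumed values. It applies a 5-tap cross filter (center, up/down, left/right) only to texels directly on either side of an internal block edge, and leaves every other texel untouched (the 1-tap case). The center weight of 1/2 and the 1/8 per neighbor are assumptions, as are the function names.

```python
from typing import List

Image = List[List[float]]


def near_block_edge(x: int, y: int, w: int, h: int, bw: int, bh: int) -> bool:
    """True for texels directly on either side of an internal block boundary."""
    on_x = (x % bw == 0 and x > 0) or (x % bw == bw - 1 and x < w - 1)
    on_y = (y % bh == 0 and y > 0) or (y % bh == bh - 1 and y < h - 1)
    return on_x or on_y


def deblock(img: Image, bw: int, bh: int, center_w: float = 0.5) -> Image:
    """5-tap cross filter near block edges, identity elsewhere.
    Tap weights are assumed: center_w at the center, the rest split evenly."""
    h, w = len(img), len(img[0])
    side_w = (1.0 - center_w) / 4.0
    out = [row[:] for row in img]  # read from img, write to out
    for y in range(h):
        for x in range(w):
            if not near_block_edge(x, y, w, h, bw, bh):
                continue  # 1-tap case: texel passes through unchanged
            up = img[max(y - 1, 0)][x]
            down = img[min(y + 1, h - 1)][x]
            left = img[y][max(x - 1, 0)]
            right = img[y][min(x + 1, w - 1)]
            out[y][x] = center_w * img[y][x] + side_w * (up + down + left + right)
    return out
```

Because the weights sum to 1, flat regions pass through unchanged. Only the texels bordering a seam get pulled toward their neighbors.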
It's so simple I'm stumped why the IHVs didn't put this obvious thing directly in the sampling hardware. It makes the largest ASTC block sizes (10x8, 10x10, 12x10 and 12x12) immensely more usable in practice.
The encoder's final SCD (Stochastic Coordinate Descent) stage computes the final output taking into account the deblocking filter. The encoder simulates the exact filter the decoder will use, then optimizes the compressed data knowing the artifacts will be smoothed. This dramatically improves quality at low bitrates.
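To make the "optimize knowing the filter will run" idea concrete, here is a hypothetical 1D toy in Python. It is not the Basis Universal SCD stage. Each block has a short list of candidate decodes, a 3-tap filter runs at block boundaries, and stochastic coordinate descent repeatedly picks a random block and a random candidate. A move is kept only if the filtered reconstruction's error against the original drops. It only selects among existing candidates; there is no mutation operator.

```python
import random
from typing import List

Signal = List[float]


def deblock_1d(sig: Signal, bsize: int) -> Signal:
    """Toy 1D edge filter: a [1/4, 1/2, 1/4] kernel applied only to samples
    on either side of an internal block boundary (assumes bsize >= 2)."""
    n = len(sig)
    out = sig[:]
    for i in range(n):
        at_edge = (i % bsize == 0 and i > 0) or (i % bsize == bsize - 1 and i < n - 1)
        if at_edge:
            out[i] = 0.25 * sig[i - 1] + 0.5 * sig[i] + 0.25 * sig[i + 1]
    return out


def sse(a: Signal, b: Signal) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reconstruct(cands: List[List[Signal]], choice: List[int]) -> Signal:
    """Concatenate the chosen candidate decode of every block."""
    return [v for blk, c in zip(cands, choice) for v in blk[c]]


def scd(original: Signal, cands: List[List[Signal]], bsize: int,
        iters: int = 2000, seed: int = 1):
    """Stochastic coordinate descent over per-block candidate choices,
    scoring the filtered reconstruction against the original."""
    rng = random.Random(seed)
    # Start from the per-block best candidate, ignoring the filter.
    choice = [
        min(range(len(bc)),
            key=lambda c: sse(bc[c], original[b * bsize:(b + 1) * bsize]))
        for b, bc in enumerate(cands)
    ]
    best = sse(deblock_1d(reconstruct(cands, choice), bsize), original)
    for _ in range(iters):
        b = rng.randrange(len(cands))      # pick one coordinate (block) at random
        c = rng.randrange(len(cands[b]))   # and a random candidate for it
        if c == choice[b]:
            continue
        old, choice[b] = choice[b], c
        err = sse(deblock_1d(reconstruct(cands, choice), bsize), original)
        if err < best:
            best = err          # keep the move
        else:
            choice[b] = old     # revert the move
    return choice, best
```

Take a step edge (original [0]*4 + [10]*4) with candidates [[0]*4, [0, 0, 0, -4]] and [[10]*4, [12, 10, 10, 10]]. Per-block greedy selection gives a filtered error of 12.5. The descent settles on the two overshooting candidates (choice [1, 1], error 7.25), because the filter pulls them back toward the step.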
We're shipping two full ASTC LDR encoders (an optimized version of our original one from v2.0, and a new one called "astcf") that have been modified to output tens to hundreds of block candidates for SCD.
We're also going to optionally support the use of a slightly modified/forked version of ARM's "astcenc" library that can output candidates for use in our SCD stage. The library supports merging the output from (up to) all 3 encoders. We're supporting this because each encoder has different artifact profiles, which boosts SCD candidate diversity. Our second ASTC encoder was purposely engineered to look very different vs. our first.
With the largest ASTC/XUASTC block sizes, 80-90% of our output blocks are now created stochastically via SCD. The mutation operator can modify endpoints, partition patterns, and DCT AC/DC coefficients.
The output looks incredible at 12x12 ~0.5 bpp.
Thursday, May 7, 2026
bug found in Intel's ASTC compressor (part of ispc_texcomp)
https://github.com/GameTechDev/ISPCTextureCompressor/blob/master/ispc_texcomp/kernel_astc.ispc#L41
How could a bug like this survive for so long? What other unfound bugs are in there?
Wednesday, April 29, 2026
Block blurring (prefiltering)
Modern GPU texture compressors have a secret (but dangerous) superpower: prefiltering (blurring). Sometimes an encoder way overfits edges, causing overall perceptual quality to collapse. One way to overcome this is to blur the input and encode the block again. This is what we do in HDR on the very worst blocks (as measured by SSIM).
It's paradoxical: blurring can boost perceptual quality.
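Here is a hypothetical Python sketch of the control flow described above. Encode the block; if its score is below a threshold, blur the input, encode again, and keep whichever decode scores better against the original, unblurred block. Two stand-ins are swapped in for brevity:
- Scoring uses negative MSE, where the post uses SSIM.
- The encoder is a toy BC1-like 4-level palette between the block's min and max, not a real BC7/ASTC encoder.

```python
from typing import Callable, List, Tuple

Block = List[List[float]]


def box_blur(block: Block) -> Block:
    """3x3 box blur, averaging only the neighbors that exist at block borders."""
    h, w = len(block), len(block[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [block[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def toy_encode(block: Block) -> Block:
    """Stand-in encoder: snap texels to 4 evenly spaced levels in [min, max]."""
    lo = min(min(r) for r in block)
    hi = max(max(r) for r in block)
    if hi == lo:
        return [row[:] for row in block]
    step = (hi - lo) / 3
    return [[lo + round((v - lo) / step) * step for v in row] for row in block]


def neg_mse(a: Block, b: Block) -> float:
    """Higher is better; stands in for SSIM here."""
    n = len(a) * len(a[0])
    return -sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def encode_with_prefilter(block: Block,
                          encode: Callable[[Block], Block] = toy_encode,
                          score: Callable[[Block, Block], float] = neg_mse,
                          threshold: float = -1.0) -> Tuple[Block, bool]:
    """Returns (decoded block, whether the blurred input was used)."""
    decoded = encode(block)
    s = score(block, decoded)
    if s >= threshold:
        return decoded, False  # good enough: no prefiltering
    blurred_decoded = encode(box_blur(block))
    # Always score against the ORIGINAL block, never the blurred input.
    if score(block, blurred_decoded) > s:
        return blurred_decoded, True
    return decoded, False
```

Scoring against the original is the safety check. It is what keeps this "dangerous" superpower from making a block worse.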