For validation purposes we are creating 100% standard ASTC data from the UASTC blocks, unpacking these ASTC blocks using an open source ASTC decoder (the one in Basis Universal), then computing RGB/RGBA average PSNR.
UASTC->ASTC: 45.14 (always lossless)
Original->Near-optimal BC1: 36.96 (stb_dxt STB_DXT_HIGHQUAL)
This ASTC subset's quality is on average only ~1.5 dB lower than near-optimal BC7 for opaque content, but it's about 8.2 dB higher than near-optimal BC1 (45.14 vs. 36.96). Both RGB and RGBA content look *really* good. Our experience building several production BC7 encoders helped guide us to the right ASTC modes.
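For reference, an average PSNR over 8-bit channels can be computed as in the minimal Python sketch below. It assumes each image is a flat sequence of 8-bit channel values (RGB or RGBA interleaved) and that the squared error is averaged over all channels. The function name `psnr` is illustrative, and this is not the measurement code that produced the numbers above.

```python
import math

def psnr(original, decoded, max_value=255):
    """Average PSNR in dB over all channel values of two equally sized images."""
    if len(original) != len(decoded):
        raise ValueError("images must have the same number of channel values")
    if not original:
        raise ValueError("images must not be empty")
    # Mean squared error over every channel of every pixel.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value * max_value / mse)
```

Passing only the RGB channels gives an RGB PSNR, while passing all four channels gives an RGBA PSNR.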
As this is a universal GPU texture compression system it will support ALL LDR GPU texture formats, like Basis Universal does. Here's the plan for the other formats:
1. All blocks are always LDR 4x4 pixels, and all UASTC modes use integer weight bits for compatibility with BC7.
2. Only uses Color Endpoint Mode (CEM) 8 or 12 (RGB/RGBA Direct) to simplify the encoder/transcoder. The other CEMs don't help enough to justify the added complexity.
3. CEM 8 and 12 support Blue Contraction, which is never utilized in UASTC. Instead, we swap the subset's endpoints if the MSB of the last weight index is 1 (exactly like BC7). This guarantees the last weight index has an MSB of 0, so we don't need to store it in the packed block format.
The UASTC->ASTC transcoder needs to check the dequantized endpoints to see if blue contraction would kick in. If so, it'll need to invert the weight indices and swap the subset's endpoints.
4. The 2 and 3 subset modes are constrained to only use the 2/3-subset partition patterns that are common to both ASTC and BC7, which we've documented on our blog and on Twitter. Total of 60 patterns (30+11+19).
5. Mode 7 uses a 3-subset BC7 mode, but only a 2-subset ASTC mode. Two of the BC7 subset endpoints are set to equal colors to simplify the 3-subset partition pattern into a 2-subset pattern that's compatible with ASTC. This gives us 19 more useful partitions.
6. Opaque encodings get transcoded to BC7 modes 1,2,3,5,6. Alpha encodings transcode to BC7 modes 5,6,7. BC7 modes 0 and 4 are unused.
8. BC7 and ASTC interpolate endpoints in a similar way, except ASTC endpoints are scaled up to 16 bits before interpolation and then only the top 8 bits are used. This is a surprisingly minor difference that a good encoder can work around by choosing the lowest overall BC7 error from the hundreds/thousands of possible UASTC configurations/partition patterns/endpoints/etc.
10. A driver could easily transcode UASTC texture data to ASTC or BC7 completely transparently to the user. The blocks are completely independent and the transcode step can be done 4-8 blocks at a time with SIMD operations.
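The transcoder check described in item 3 can be sketched as follows. This is a minimal Python illustration, not the transcoder's actual code. It assumes the ASTC rule that direct RGB/RGBA endpoints are decoded with blue contraction when the RGB sum of the second endpoint is less than that of the first. The helper name and argument layout are hypothetical.

```python
def avoid_blue_contraction(e0, e1, weights, num_levels):
    """Return (e0, e1, weights) ordered so ASTC decodes them without blue contraction.

    e0, e1     -- dequantized 8-bit endpoints, (R, G, B) or (R, G, B, A)
    weights    -- weight indices of the texels in this subset, each in [0, num_levels)
    num_levels -- number of weight quantization levels (2, 4, 8 or 16 in UASTC)
    """
    # Assumed ASTC rule: blue contraction applies when sum(RGB of e1) < sum(RGB of e0).
    if sum(e1[:3]) < sum(e0[:3]):
        # Swap the endpoints and mirror every index so the decoded texels stay the same.
        e0, e1 = e1, e0
        weights = [num_levels - 1 - w for w in weights]
    return e0, e1, weights
```

For example, with endpoints (200, 200, 200) and (10, 10, 10) and 2-bit weights [0, 3, 1, 2], the sketch returns the endpoints swapped and the weights [3, 0, 2, 1].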
UASTC Mode #, Dual Plane Flag, Texel Weights BISE Range Index (# quant levels), # Subsets, Endpoint BISE Range Index (# quant levels), BC7 Target Mode
0. DualPlane: 0, WeightRange: 8 (16), Subsets: 1, EndpointRange: 19 (192) MODE6 RGB
1. DualPlane: 0, WeightRange: 2 (4), Subsets: 1, EndpointRange: 20 (256) MODE3
2. DualPlane: 0, WeightRange: 5 (8), Subsets: 2, EndpointRange: 8 (16) MODE1
3. DualPlane: 0, WeightRange: 2 (4), Subsets: 3, EndpointRange: 7 (12) MODE2
4. DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 12 (40) MODE3
5. DualPlane: 0, WeightRange: 5 (8), Subsets: 1, EndpointRange: 20 (256) MODE6 RGB
6. DualPlane: 1, WeightRange: 2 (4), Subsets: 1, EndpointRange: 18 (160) MODE5 RGB
7. DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 12 (40) MODE2
Alpha (CEM 12):
9. DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 8 (16) MODE7
10. DualPlane: 0, WeightRange: 8 (16), Subsets: 1, EndpointRange: 13 (48) MODE6
11. DualPlane: 1, WeightRange: 2 (4), Subsets: 1, EndpointRange: 13 (48) MODE5
12. DualPlane: 0, WeightRange: 5 (8), Subsets: 1, EndpointRange: 19 (192) MODE6
13. DualPlane: 1, WeightRange: 0 (2), Subsets: 1, EndpointRange: 20 (256) MODE5
14. DualPlane: 0, WeightRange: 2 (4), Subsets: 1, EndpointRange: 20 (256) MODE6
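For readers who want the table in machine-readable form, here is a minimal Python sketch that transcribes only the modes listed above. The tuple layout and names are illustrative, and it assumes modes 0-7 are the opaque modes. It also cross-checks the BC7 target modes against item 6.

```python
from collections import namedtuple

# Field values are the quant level counts in parentheses above, not the BISE range indices.
Mode = namedtuple("Mode", "dual_plane weight_levels subsets endpoint_levels bc7_mode")

# UASTC mode index -> parameters, transcribed from the table above.
OPAQUE_MODES = {  # assumed to be the opaque modes
    0: Mode(0, 16, 1, 192, 6),
    1: Mode(0, 4, 1, 256, 3),
    2: Mode(0, 8, 2, 16, 1),
    3: Mode(0, 4, 3, 12, 2),
    4: Mode(0, 4, 2, 40, 3),
    5: Mode(0, 8, 1, 256, 6),
    6: Mode(1, 4, 1, 160, 5),
    7: Mode(0, 4, 2, 40, 2),
}
ALPHA_MODES = {  # CEM 12
    9: Mode(0, 4, 2, 16, 7),
    10: Mode(0, 16, 1, 48, 6),
    11: Mode(1, 4, 1, 48, 5),
    12: Mode(0, 8, 1, 192, 6),
    13: Mode(1, 2, 1, 256, 5),
    14: Mode(0, 4, 1, 256, 6),
}

def bc7_targets(modes):
    """Sorted list of the distinct BC7 modes a group of UASTC modes transcodes to."""
    return sorted({m.bc7_mode for m in modes.values()})
```

Here `bc7_targets(OPAQUE_MODES)` yields [1, 2, 3, 5, 6] and `bc7_targets(ALPHA_MODES)` yields [5, 6, 7], matching item 6.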
Once you have the UASTC mode index, endpoint values, weight indices, and optionally the partition index and component rotation fields extracted from the UASTC block, unpacking proceeds in exactly the same way as with a standard ASTC block. It uses the same ASTC endpoint value dequantization method, the 2/3/4-bit texel indices are converted to [0,64] interpolation weights in the same way, and the endpoints are interpolated as 16-bit values. See in particular sections 18.11-18.20 in the Khronos ASTC data format specification:
There are so few partition patterns that a decoder could use lookup tables, or it could use the ASTC pattern generator function (in section 18.21) with the correct seeds. The UASTC format stores partition pattern indices, not 10-bit seeds, to save space.
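To make the unpacking arithmetic concrete, the following Python sketch shows how a power-of-two weight index can be expanded to a [0,64] weight and how two 8-bit endpoint components are interpolated as 16-bit values, with a BC7-style 8-bit interpolation alongside for comparison. It is a simplified illustration based on the ASTC specification's linear LDR decode rules as we understand them (sRGB decoding is not modeled), and the function names are hypothetical.

```python
def astc_unquantize_weight(index, bits):
    """Expand a 1- to 4-bit weight index to the [0, 64] range."""
    # Replicate the index bits until at least 6 bits are filled, keep the top 6.
    w, filled = 0, 0
    while filled < 6:
        w = (w << bits) | index
        filled += bits
    w >>= filled - 6
    # Values above 32 are bumped by one so the maximum (63) becomes 64.
    return w + 1 if w > 32 else w

def astc_interpolate(e0, e1, w):
    """Interpolate two 8-bit endpoint components with an ASTC weight in [0, 64]."""
    c0 = (e0 << 8) | e0  # scale up to 16 bits by byte replication
    c1 = (e1 << 8) | e1
    c = (c0 * (64 - w) + c1 * w + 32) >> 6
    return c >> 8        # only the top 8 bits are used

def bc7_interpolate(e0, e1, w):
    """BC7-style interpolation performed directly on 8-bit endpoint components."""
    return (e0 * (64 - w) + e1 * w + 32) >> 6
```

In this sketch `astc_interpolate(0, 1, 32)` gives 0 while `bc7_interpolate(0, 1, 32)` gives 1, which is the kind of small rounding difference item 8 refers to.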
This is an RDO codec, so we're depending on a good LZ codec for compression. To implement multiple quality levels the current plan is to use an LZ dictionary simulator, bit price estimator, and Lagrangian optimization to choose block selector bytes which have been recently emitted into the output data stream. The quality level will control the error threshold used to choose "good enough" selectors which we've already sent (so they'll be cheap for LZ to encode). We've implemented this before in Basis BC1, but that was with already quantized selectors. So there will be some things to figure out.
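As a rough illustration of the Lagrangian step only, the sketch below picks, from a set of candidate encodings for one block, the one that minimizes distortion plus lambda times the estimated bit cost. The candidate list, the bit-price estimates, and all names are hypothetical, and the LZ dictionary simulator and bit price estimator mentioned above are not modeled.

```python
def choose_candidate(candidates, lam):
    """Pick the candidate with the lowest Lagrangian cost D + lam * R.

    candidates -- iterable of (name, distortion, estimated_bits) tuples
    lam        -- Lagrange multiplier; larger values favor cheaper encodings
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# Hypothetical candidates for a single block.
candidates = [
    ("new selectors", 10.0, 32.0),  # lowest error, but expensive to code
    ("recent match", 14.0, 6.0),    # reuses recently emitted bytes, slightly higher error
]
```

With `lam = 0.1` the sketch keeps "new selectors" (cost 13.2 vs. 14.6), while `lam = 1.0` switches to "recent match" (cost 20.0 vs. 42.0), which is how a quality setting could trade error for compressibility.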
This system is designed to be compatible with, and to explicitly exploit, KTX2's support for RDO compression on top of block-based formats.
A really good BC1-5 encoder takes into account the approximations that the various GPU vendors' hardware actually uses when decoding the blocks: