Monday, January 27, 2020

UASTC1 block format encoding, revision 5

UASTC1 (subsequently referred to as "UASTC" in this doc) is a 15-mode, 4x4 pixel, LDR-only subset of the ASTC specification with a simpler 128-bit block format. It can be quickly and losslessly transcoded to the standard ASTC block format, quickly transcoded to BC7 with very low quality loss (0.75 dB RGB PSNR on average), or re-encoded to high quality ETC1, ETC2 EAC A8, or BC1-5 with a small amount of per-pixel work. There are 8 opaque modes, 1 solid color mode, and 6 alpha modes. These UASTC modes each map to one of 6 BC7 modes (all except 0 and 4).

UASTC is the first high quality universal (or virtual) block-based texture format that supports block partitioning along with the ability to be efficiently transcoded to multiple GPU texture formats. Transcoding can be done either on the CPU or with a GPU compute shader. Transcoding UASTC to ASTC or BC7 does not involve any pixel-level operations. UASTC's fields directly correspond to ASTC's fields whenever possible.

See the previous post for a high-level description of each UASTC mode and encoder performance/features.

The UASTC block format has hint bits to accelerate transcoding to various texture formats. There are up to two BC1 hint bits per block which direct the UASTC->BC1 transcoder to reuse the UASTC endpoint and/or weight indices (appropriately scaled) for faster real-time compression. On average ~60% of UASTC blocks don't need PCA, and ~30% don't need real-time BC1 encoding at all. There are also hint bits to accelerate transcoding to high quality ETC1 and ETC2 EAC A8.

This post shows how each mode is laid out in a 128-bit UASTC block at the bit level. Bits are written starting from the beginning of the block (at the first byte's LSB) working "down" towards bit 128. The mode field is always first and is stored at bit 0 in the block (bit 0 of byte 0).
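As a minimal sketch of this bit ordering, a field can be pulled out of a block given its bit offset and size. The helper name read_bits is hypothetical and not part of any UASTC library; it only illustrates the LSB-first convention described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reads num_bits (1-32) starting at bit_offset from a
// 16-byte UASTC block. Bit 0 is the LSB of byte 0, bit 8 is the LSB of byte 1,
// and so on up to bit 127.
static uint32_t read_bits(const uint8_t block[16], uint32_t bit_offset, uint32_t num_bits)
{
    assert(num_bits >= 1 && num_bits <= 32 && bit_offset + num_bits <= 128);
    uint32_t value = 0;
    for (uint32_t i = 0; i < num_bits; i++)
    {
        const uint32_t bit_index = bit_offset + i;
        const uint32_t bit = (block[bit_index >> 3] >> (bit_index & 7)) & 1u;
        value |= bit << i; // the first bit read becomes the LSB of the result
    }
    return value;
}
```

With a helper like this, a mode layout entry such as "ETC1BIAS: 14 5" would be read as read_bits(block, 14, 5).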

This is a snapshot of the current encoding. This may change somewhat over the next few weeks.

Planned Additions for UASTC2


We're going to add several "LA" (Luminance Alpha) modes, primarily for XY tangent space normal maps. We may also add a BC1-like mode to give the encoder more rate distortion options. To extend the format, we will emit UASTC1 mode index 15, then emit additional Huffman coded mode bits.

License


The UASTC specification and block format is explicitly not copyrighted by any entity, and to our knowledge is patent free. It may be used for any purpose whatsoever, including commercial purposes. The author of this work hereby waives all claim of copyright (economic and moral) in this work and immediately places it in the public domain; it may be used, distorted or destroyed in any manner whatsoever without further attribution or notice to the creator.

Field Definitions:


Mode: Huffman coded mode index (2-5 bits). The mode index range is [0,15]. One mode index (15) is reserved for future expansion. The first bit of the Huffman code is the LSB, which is stored in bit 0 of byte 0 of the UASTC block.

The Huffman codes and code lengths for each mode index are:

{ 0x1, 4 }, { 0xD, 5 }, { 0x1D, 5 }, { 0x3, 5 },
{ 0x13, 5 }, { 0xB, 5 }, { 0x1B, 5 }, { 0x7, 5 },
{ 0x17, 5 }, { 0xF, 5 }, { 0x2, 3 }, { 0x0, 2 },
{ 0x6, 3 }, { 0x1F, 5 }, { 0x9, 4 }, { 0x5, 4 }
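Because the codes are stored LSB first, the mode can be found by comparing the low bits of byte 0 against each code. The following is a simple, unoptimized sketch; the names HuffCode, kModeCodes and decode_mode are ours, and a real transcoder would more likely use a small lookup table.

```cpp
#include <cassert>
#include <cstdint>

struct HuffCode { uint8_t code; uint8_t len; };

// Codes and lengths copied from the table above, indexed by mode.
static const HuffCode kModeCodes[16] = {
    { 0x1, 4 }, { 0xD, 5 }, { 0x1D, 5 }, { 0x3, 5 },
    { 0x13, 5 }, { 0xB, 5 }, { 0x1B, 5 }, { 0x7, 5 },
    { 0x17, 5 }, { 0xF, 5 }, { 0x2, 3 }, { 0x0, 2 },
    { 0x6, 3 }, { 0x1F, 5 }, { 0x9, 4 }, { 0x5, 4 }
};

// Hypothetical helper: returns the mode index encoded in the low bits of
// byte 0 and writes the code length to *code_len. Returns -1 if no code
// matches (which should not happen for a prefix-free, complete code).
static int decode_mode(uint8_t first_byte, uint32_t* code_len)
{
    assert(code_len != nullptr);
    for (int mode = 0; mode < 16; mode++)
    {
        const uint32_t mask = (1u << kModeCodes[mode].len) - 1u;
        if ((first_byte & mask) == kModeCodes[mode].code)
        {
            *code_len = kModeCodes[mode].len;
            return mode;
        }
    }
    return -1;
}
```

The returned code length is also the bit offset of the first field after the mode, which matches the "Mode: 0 N" entries in the per-mode layouts.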

BC1H0, BC1H1: BC1 transcoding acceleration hints:

BC1H0: If set the transcoder can scale the first subset's UASTC endpoints to BC1 (5,6,5) endpoints, and then scale (or copy) the UASTC weight indices to BC1 2-bit weights. This skips the expensive PCA and least squares steps involved in real-time BC1 encoding.

BC1H1: If set the transcoder scales (or copies) the UASTC weight indices to BC1 2-bit weights. Least squares (1 or 2 iterations) can then immediately be used to compute the BC1 endpoints. This skips the expensive PCA step.

All modes (except for solid color) have BC1H0, and most modes have BC1H1.

ETC1F, ETC1D, ETC1I0, ETC1I1: 8-bits of ETC1 transcode hints (flipped subblocks flag bit, differential encoding flag bit, intensity table index 0, intensity table index 1).

These hints are used by the transcoder to quickly create ETC1 blocks from the unpacked UASTC texels. To use them, the transcoder computes each 4x2 or 2x4 subblock's average color, quantizes them to 555:333 or 444:444 bits, then computes the selectors in luma space. No other work is necessary (because all the hard work was done in the UASTC encoder).

ETC1BIAS: A 5-bit field indicating how to bias each ETC1 subset's computed block color. The encoder chooses the bias value that results in the lowest overall ETC1 error. See the ETC1 bias helper function near the end of this document.

ETC2TM: 8-bits of ETC2 EAC A8 transcode hints (4-bit table, 4-bit multiplier)

This is similar to how ETC1 blocks are packed, except these hints are for the alpha portion of ETC2 EAC A8 blocks. These bits are only present in modes 9-14 (the alpha modes).

ETQ: Packed endpoint trits/quints values. A simplified form of BISE is used in UASTC, see:
https://www.khronos.org/registry/DataFormat/specs/1.1/dataformat.1.1.html#astc-integer-sequence-encoding

See the "UASTC BISE Endpoint Ranges table" below for the # of trits or quints for each endpoint range. Some of the ranges don't have trits or quints, so there will be no ETQ fields.

We store the trits/quints first, followed by each value's bits. The bit interleaving and trit/quint rearranging and preprocessing in section 18.2 aren't used. Instead the encoded trits/quints are stored in UASTC as-is.

For quints, each encoded value is up to 7-bits: quint2*25+quint1*5+quint0, and similar for trits except each encoded value is up to 8-bits. When the number of endpoint values isn't a multiple of 5 or 3 values, the size of the final code is the minimum # of bits necessary to represent the encoded value (to save bits).
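To make the sizing rule concrete, here is a small sketch that computes the size of each ETQ field for a given number of endpoint values. The function name etq_field_sizes is hypothetical; the rule it applies (full groups of 5 trits or 3 quints, then a final code just wide enough for the leftover values) is the one described above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the bit size of each ETQ field for num_values
// endpoint values. base is 3 for trits (groups of 5) or 5 for quints
// (groups of 3).
static std::vector<uint32_t> etq_field_sizes(uint32_t num_values, uint32_t base)
{
    assert(base == 3 || base == 5);
    const uint32_t group_size = (base == 3) ? 5 : 3;
    std::vector<uint32_t> sizes;
    for (uint32_t first = 0; first < num_values; first += group_size)
    {
        const uint32_t left = num_values - first;
        const uint32_t n = (left < group_size) ? left : group_size;
        // The largest packed value for n digits is base^n - 1.
        uint32_t max_value = 1;
        for (uint32_t i = 0; i < n; i++)
            max_value *= base;
        max_value -= 1;
        // Minimum number of bits needed to represent max_value.
        uint32_t bits = 0;
        while ((1u << bits) <= max_value)
            bits++;
        sizes.push_back(bits);
    }
    return sizes;
}
```

For example, 6 trit values give fields of 8 and 2 bits, and 18 trit values give 8, 8, 8 and 5 bits, which agrees with the ETQ entries in the mode 0 and mode 3 layouts.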

EBITS: Endpoint bits (one set of bits per ASTC endpoint value). See the "UASTC BISE Endpoint Ranges table" below for the # of bits for each endpoint range. Endpoint order is the same as ASTC's: RL, RH, GL, GH, BL, BH, etc. Max of 18 values (RGB 3-subsets: 3*2*3).

To retrieve the endpoint values, you extract the trits/quints from the encoded ETQ values, shift each one left the appropriate number of bits (depending on the UASTC mode's endpoint range) and logically OR in the EBITS values.
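The step above can be sketched as follows. This assumes the first value of a group sits in the least significant digit of the packed code (as in quint2*25+quint1*5+quint0) and that trits use the analogous base-3 ordering; the function name rebuild_endpoint is ours.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rebuilds one quantized (not yet dequantized) endpoint.
// etq       - the packed trit/quint code covering this value
// index     - position of the value inside that code (0 = least significant digit)
// base      - 3 for trits, 5 for quints
// ebits     - this value's EBITS field
// num_ebits - EBITS width for the mode's endpoint range
static uint32_t rebuild_endpoint(uint32_t etq, uint32_t index, uint32_t base,
                                 uint32_t ebits, uint32_t num_ebits)
{
    assert(base == 3 || base == 5);
    assert(ebits < (1u << num_ebits));
    uint32_t digit = etq;
    for (uint32_t i = 0; i < index; i++)
        digit /= base; // skip the lower digits
    digit %= base;     // isolate this value's trit or quint
    return (digit << num_ebits) | ebits;
}
```

The result is still an ASTC quantized value, so it has to go through the unquantization step described next before it can be used as a [0,255] endpoint.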

Endpoint values are a sequence of integers that must be dequantized to [0,255] by following the ASTC spec in section 18.13, see:
https://www.khronos.org/registry/DataFormat/specs/1.1/dataformat.1.1.html#astc-endpoint-unquantization

WEIGHTS: Encoded weight indices. Just like BC7, the weight index of each subset's "anchor" texel always has an MSB of 0, so these weights can be encoded with one less bit than the others. (UASTC doesn't use Blue Contraction so we can use this trick.)

Weights are always encoded as plain bits (no BISE necessary). Weight ordering is the same as ASTC's (raster order, left to right/top to bottom scanline). In dual plane mode, the ordering is also ASTC's: p0 p1, p0 p1, p0 p1, etc. (two weight indices per texel).

The weights are dequantized to 6-bit interpolation values in the same way as ASTC's:
https://www.khronos.org/registry/DataFormat/specs/1.1/dataformat.1.1.html#_weight_unquantization

And the endpoints are interpolated in the same way as ASTC's:
https://www.khronos.org/registry/DataFormat/specs/1.1/dataformat.1.1.html#astc_weight_application
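For readers who don't want to dig through the spec, here is a sketch of the interpolation for one 8-bit channel. It reflects our reading of the LDR, non-sRGB path in the ASTC rules linked above (endpoints expanded to 16 bits, [0,64] weights, rounding, top 8 bits kept); the function name is hypothetical, so check it against the spec before relying on it.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of ASTC-style LDR interpolation for one channel (non-sRGB case).
// l, h: dequantized [0,255] endpoints; w: dequantized [0,64] weight.
static uint32_t astc_interpolate_8bit(uint32_t l, uint32_t h, uint32_t w)
{
    assert(l <= 255 && h <= 255 && w <= 64);
    l = (l << 8) | l; // expand to 16 bits by bit replication
    h = (h << 8) | h;
    const uint32_t k = (l * (64 - w) + h * w + 32) >> 6; // rounded lerp
    return k >> 8;    // keep the top 8 bits
}
```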

PAT: Index into the common BC7/ASTC partition pattern table. This table contains BC7 pattern indices, ASTC pattern seeds, and permutation/flip flags which indicate how to map ASTC pattern subset indices to BC7's. There are three tables and 60 total partition patterns.

A UASTC decoder can either use ASTC's partition pattern generator or BC7's partition tables. To map ASTC's partition patterns to BC7's, the pattern subset indices are either used as-is, inverted, permuted, and/or combined to get BC7 partition pattern subset indices (see the tables/example code at the very bottom). These simple transformations correspond to changing the order of the encoded BC7 endpoints, or setting 2 endpoints in a 3-subset BC7 block to the same color/alpha values. Every ASTC pattern included in the below common tables maps to a BC7 pattern without loss (i.e. there is no subset "crosstalk" when mapping a UASTC to a BC7 pattern).

COMPSEL: ASTC's Color Component Selector (CCS) field. Only present in Dual Plane modes.
This maps to BC7 mode 5's 2-bit component rotation field. The CCS value must be remapped and the endpoint RGBA components reordered when transcoding to BC7.

ASTC and BC7 handle dual plane mode component rotations slightly differently. In ASTC, if the CCS field is 0 for red, the red component still appears in its usual position in the endpoint values, and the decoder uses the 2nd plane to separately interpolate the red (or green, or blue, etc.) channel. In BC7, if the component rotation field is 1 (red), the red component is swapped with alpha at encode time, the alpha component is always interpolated with the 2nd plane's weight indices by the decoder, and the components are then swapped after endpoint interpolation. To losslessly transcode ASTC dual plane modes to BC7, you have to swap the appropriate ASTC endpoint channel with the alpha channel.
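A minimal sketch of the remap and swap, assuming the usual encodings (ASTC CCS: 0=R, 1=G, 2=B, 3=A; BC7 rotation: 0=none, 1=R, 2=G, 3=B). The function names are ours and the endpoints are shown as plain RGBA byte arrays for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Maps an ASTC CCS value to the BC7 component rotation field.
// CCS 3 (alpha) maps to rotation 0 because BC7 already interpolates
// alpha with the 2nd plane's weights when no rotation is applied.
static uint32_t ccs_to_bc7_rotation(uint32_t ccs)
{
    assert(ccs < 4);
    return (ccs + 1) & 3;
}

// Swaps the CCS channel with alpha in a pair of RGBA endpoints so the
// separately interpolated channel ends up in BC7's alpha slot.
static void swap_ccs_with_alpha(uint8_t low[4], uint8_t high[4], uint32_t ccs)
{
    assert(ccs < 4);
    if (ccs == 3)
        return; // alpha already uses the 2nd plane
    std::swap(low[ccs], low[3]);
    std::swap(high[ccs], high[3]);
}
```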

Other notes:


- The number of color components is 3 for modes [0,7], or 4 for modes [8,14].
- The number of subsets is [1,3].
- The total number of endpoint values is num_comps * 2 * num_subsets.
- The number of planes is either 1 or 2.
- The total number of weight values is either 16 (non-dual plane modes) or 32 (dual plane modes).
- Dual plane modes always have 1 subset in UASTC.
- Weight indices are always 1, 2, 3, or 4-bits for compatibility with BC7. BISE is not used at all for weight indices, only endpoints.
- Various endpoint value ordering examples (UASTC and ASTC use the same endpoint orderings):
1 subset RGB: RL0 RH0 GL0 GH0 BL0 BH0
1 subset RGBA: RL0 RH0 GL0 GH0 BL0 BH0 AL0 AH0

2 subset RGB: RL0 RH0 GL0 GH0 BL0 BH0 RL1 RH1 GL1 GH1 BL1 BH1
2 subset RGBA: RL0 RH0 GL0 GH0 BL0 BH0 AL0 AH0 RL1 RH1 GL1 GH1 BL1 BH1 AL1 AH1

In dual plane mode, the UASTC components are NOT reordered like they would be in BC7. The COMPSEL field corresponds to the ASTC CCS field, which indicates which color component to separately interpolate with the 2nd plane weight indices:

Dual plane RGB: RL0 RH0 GL0 GH0 BL0 BH0
Dual plane RGBA: RL0 RH0 GL0 GH0 BL0 BH0 AL0 AH0

- Transcoding UASTC->ASTC is always a 100% lossless operation. The endpoints may need to be swapped (and the corresponding weight indices inverted) to disable blue contraction, but this is always a lossless transformation.
- The primary source of loss when transcoding UASTC->BC7 is mapping UASTC endpoints to BC7 endpoints. This is done using a simple scale with optional optimal p-bit computation. The UASTC weight indices are either copied as-is, or converted to the closest corresponding BC7 weight indices using a lookup table. The partition patterns are lossless, the weight tables are the same for 2/3-bits and very similar for 4-bits, and the endpoint interpolation method is nearly the same (16-bits in UASTC/ASTC, 8-bits with BC7, and both formats use [0,64] weights with rounding in the linear interpolation).
- Unlike ASTC, the weights are not stored in reverse bit order starting from the end of the block. Instead they are stored immediately following the endpoint bits in regular (LSB first) bit order.
- The CEM field(s) are always 8 (RGB Direct) for modes 0-7, and 12 (RGBA Direct) for 9-14. Blue Contraction isn't supported (i.e. the UASTC endpoints can be in arbitrary order, which we exploit to free up index bits like BC7 does). Mode 8 is void-extent.
- Weight index packing is similar to BC7's: Each subset's endpoints are swapped as necessary so the first weight index (that uses that subset) MSB is 0. If the mode uses a single subset, the first weight index MSB must be 0. For multiple subset modes, the first weight index written for each subset in the partition pattern must have an MSB of 0. The "anchor" texel indices for each pattern can be precomputed and stored in a table, like with BC7. This saves 1, 2, or 3 bits in the packed block which can be repurposed for other uses. In dual plane modes (which are always one subset), the first two weight indices must have an MSB of 0.
- We have not attempted to optimize the block format for efficient hardware RTL implementation.
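The counting rules in the notes above can be checked with a couple of small helpers. This is only a sketch with hypothetical function names; the results can be compared against the "weight bits" totals in the per-mode layouts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: packed size of the WEIGHTS field in bits.
static uint32_t total_weight_bits(uint32_t bits_per_weight, uint32_t num_subsets, bool dual_plane)
{
    assert(bits_per_weight >= 1 && bits_per_weight <= 4);
    assert(num_subsets >= 1 && num_subsets <= 3);
    assert(!dual_plane || num_subsets == 1); // dual plane modes always have 1 subset
    const uint32_t num_weights = dual_plane ? 32 : 16;
    // One anchor per subset, or one anchor per plane in dual plane modes.
    // Each anchor saves one bit because its MSB is always 0.
    const uint32_t num_anchors = dual_plane ? 2 : num_subsets;
    return num_weights * bits_per_weight - num_anchors;
}

// Hypothetical helper: total number of endpoint values in a block.
static uint32_t total_endpoint_values(uint32_t num_comps, uint32_t num_subsets)
{
    assert(num_comps == 3 || num_comps == 4);
    return num_comps * 2 * num_subsets;
}
```

For example, a 4-bit, 1-subset, single plane mode needs 63 weight bits, and a 2-bit dual plane mode needs 62.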

Modes:

The format of "WeightRange" and "EndpointRange" is Range: Index (# of Quant Levels).
Format is "field: bit_offset num_bits"

**** Mode: 0 (CEM 8)
DualPlane: 0, WeightRange: 8 (16), Subsets: 1, EndpointRange: 19 (192) - BC7 MODE6 RGB

Mode: 0 4
BC1H0: 4 1
BC1H1: 5 1
ETC1F: 6 1
ETC1D: 7 1
ETC1I0: 8 3
ETC1I1: 11 3
ETC1BIAS: 14 5
ETQ: 19 8
ETQ: 27 2
EBITS: 29 6
EBITS: 35 6
EBITS: 41 6
EBITS: 47 6
EBITS: 53 6
EBITS: 59 6
WEIGHTS: 65 63
Total bits: 128, endpoint bits: 46, weight bits: 63

**** Mode: 1 (CEM 8)
DualPlane: 0, WeightRange: 2 (4), Subsets: 1, EndpointRange: 20 (256) - BC7 MODE3

Mode: 0 5
BC1H0: 5 1
BC1H1: 6 1
ETC1F: 7 1
ETC1D: 8 1
ETC1I0: 9 3
ETC1I1: 12 3
ETC1BIAS: 15 5
EBITS: 20 8
EBITS: 28 8
EBITS: 36 8
EBITS: 44 8
EBITS: 52 8
EBITS: 60 8
WEIGHTS: 68 31
Total bits: 99, endpoint bits: 48, weight bits: 31

**** Mode: 2 (CEM 8)
DualPlane: 0, WeightRange: 5 (8), Subsets: 2, EndpointRange: 8 (16) - BC7 MODE1

Mode: 0 5
BC1H0: 5 1
BC1H1: 6 1
ETC1F: 7 1
ETC1D: 8 1
ETC1I0: 9 3
ETC1I1: 12 3
ETC1BIAS: 15 5
PAT: 20 5
EBITS: 25 4
EBITS: 29 4
EBITS: 33 4
EBITS: 37 4
EBITS: 41 4
EBITS: 45 4
EBITS: 49 4
EBITS: 53 4
EBITS: 57 4
EBITS: 61 4
EBITS: 65 4
EBITS: 69 4
WEIGHTS: 73 46
Total bits: 119, endpoint bits: 48, weight bits: 46

**** Mode: 3 (CEM 8)
DualPlane: 0, WeightRange: 2 (4), Subsets: 3, EndpointRange: 7 (12) - BC7 MODE2

Mode: 0 5
BC1H0: 5 1
BC1H1: 6 1
ETC1F: 7 1
ETC1D: 8 1
ETC1I0: 9 3
ETC1I1: 12 3
ETC1BIAS: 15 5
PAT: 20 4
ETQ: 24 8
ETQ: 32 8
ETQ: 40 8
ETQ: 48 5
EBITS: 53 2
EBITS: 55 2
EBITS: 57 2
EBITS: 59 2
EBITS: 61 2
EBITS: 63 2
EBITS: 65 2
EBITS: 67 2
EBITS: 69 2
EBITS: 71 2
EBITS: 73 2
EBITS: 75 2
EBITS: 77 2
EBITS: 79 2
EBITS: 81 2
EBITS: 83 2
EBITS: 85 2
EBITS: 87 2
WEIGHTS: 89 29
Total bits: 118, endpoint bits: 65, weight bits: 29

**** Mode: 4 (CEM 8)
DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 12 (40) - BC7 MODE3

Mode: 0 5
BC1H0: 5 1
BC1H1: 6 1
ETC1F: 7 1
ETC1D: 8 1
ETC1I0: 9 3
ETC1I1: 12 3
ETC1BIAS: 15 5
PAT: 20 5
ETQ: 25 7
ETQ: 32 7
ETQ: 39 7
ETQ: 46 7
EBITS: 53 3
EBITS: 56 3
EBITS: 59 3
EBITS: 62 3
EBITS: 65 3
EBITS: 68 3
EBITS: 71 3
EBITS: 74 3
EBITS: 77 3
EBITS: 80 3
EBITS: 83 3
EBITS: 86 3
WEIGHTS: 89 30
Total bits: 119, endpoint bits: 64, weight bits: 30

**** Mode: 5 (CEM 8)
DualPlane: 0, WeightRange: 5 (8), Subsets: 1, EndpointRange: 20 (256) - BC7 MODE6 RGB

Mode: 0 5
BC1H0: 5 1
BC1H1: 6 1
ETC1F: 7 1
ETC1D: 8 1
ETC1I0: 9 3
ETC1I1: 12 3
ETC1BIAS: 15 5
EBITS: 20 8
EBITS: 28 8
EBITS: 36 8
EBITS: 44 8
EBITS: 52 8
EBITS: 60 8
WEIGHTS: 68 47
Total bits: 115, endpoint bits: 48, weight bits: 47

**** Mode: 6 (CEM 8)
DualPlane: 1, WeightRange: 2 (4), Subsets: 1, EndpointRange: 18 (160) - BC7 MODE5 RGB

Mode: 0 5
BC1H0: 5 1
BC1H1: 6 1
ETC1F: 7 1
ETC1D: 8 1
ETC1I0: 9 3
ETC1I1: 12 3
ETC1BIAS: 15 5
COMPSEL: 20 2
ETQ: 22 7
ETQ: 29 7
EBITS: 36 5
EBITS: 41 5
EBITS: 46 5
EBITS: 51 5
EBITS: 56 5
EBITS: 61 5
WEIGHTS: 66 62
Total bits: 128, endpoint bits: 44, weight bits: 62

**** Mode: 7 (CEM 8)
DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 12 (40) - BC7 MODE2

Mode: 0 5
BC1H0: 5 1
BC1H1: 6 1
ETC1F: 7 1
ETC1D: 8 1
ETC1I0: 9 3
ETC1I1: 12 3
ETC1BIAS: 15 5
PAT: 20 5
ETQ: 25 7
ETQ: 32 7
ETQ: 39 7
ETQ: 46 7
EBITS: 53 3
EBITS: 56 3
EBITS: 59 3
EBITS: 62 3
EBITS: 65 3
EBITS: 68 3
EBITS: 71 3
EBITS: 74 3
EBITS: 77 3
EBITS: 80 3
EBITS: 83 3
EBITS: 86 3
WEIGHTS: 89 30
Total bits: 119, endpoint bits: 64, weight bits: 30

**** Mode: 8 (Void-Extent)
Void-Extent: Solid Color RGBA (BC7 MODE5 or MODE6)

Mode: 0 5
R: 5 8
G: 13 8
B: 21 8
A: 29 8
ETC1D: 37 1
ETC1I: 38 3
ETC1S: 41 2
ETC1R: 43 5
ETC1G: 48 5
ETC1B: 53 5

**** Mode: 9 (CEM 12)
DualPlane: 0, WeightRange: 2 (4), Subsets: 2, EndpointRange: 8 (16) - BC7 MODE7

Mode: 0 5
BC1H0: 5 1
BC1H1: 6 1
ETC1F: 7 1
ETC1D: 8 1
ETC1I0: 9 3
ETC1I1: 12 3
ETC1BIAS: 15 5
ETC2TM: 20 8
PAT: 28 5
EBITS: 33 4
EBITS: 37 4
EBITS: 41 4
EBITS: 45 4
EBITS: 49 4
EBITS: 53 4
EBITS: 57 4
EBITS: 61 4
EBITS: 65 4
EBITS: 69 4
EBITS: 73 4
EBITS: 77 4
EBITS: 81 4
EBITS: 85 4
EBITS: 89 4
EBITS: 93 4
WEIGHTS: 97 30
Total bits: 127, endpoint bits: 64, weight bits: 30

**** Mode: 10 (CEM 12)
DualPlane: 0, WeightRange: 8 (16), Subsets: 1, EndpointRange: 13 (48) - BC7 MODE6

Mode: 0 3
BC1H0: 3 1
ETC1F: 4 1
ETC1D: 5 1
ETC1I0: 6 3
ETC1I1: 9 3
ETC2TM: 12 8
ETQ: 20 8
ETQ: 28 5
EBITS: 33 4
EBITS: 37 4
EBITS: 41 4
EBITS: 45 4
EBITS: 49 4
EBITS: 53 4
EBITS: 57 4
EBITS: 61 4
WEIGHTS: 65 63
Total bits: 128, endpoint bits: 45, weight bits: 63

**** Mode: 11 (CEM 12)
DualPlane: 1, WeightRange: 2 (4), Subsets: 1, EndpointRange: 13 (48) - BC7 MODE5

Mode: 0 2
BC1H0: 2 1
ETC1F: 3 1
ETC1D: 4 1
ETC1I0: 5 3
ETC1I1: 8 3
ETC2TM: 11 8
COMPSEL: 19 2
ETQ: 21 8
ETQ: 29 5
EBITS: 34 4
EBITS: 38 4
EBITS: 42 4
EBITS: 46 4
EBITS: 50 4
EBITS: 54 4
EBITS: 58 4
EBITS: 62 4
WEIGHTS: 66 62
Total bits: 128, endpoint bits: 45, weight bits: 62

**** Mode: 12 (CEM 12)
DualPlane: 0, WeightRange: 5 (8), Subsets: 1, EndpointRange: 19 (192) - BC7 MODE6

Mode: 0 3
BC1H0: 3 1
ETC1F: 4 1
ETC1D: 5 1
ETC1I0: 6 3
ETC1I1: 9 3
ETC2TM: 12 8
ETQ: 20 8
ETQ: 28 5
EBITS: 33 6
EBITS: 39 6
EBITS: 45 6
EBITS: 51 6
EBITS: 57 6
EBITS: 63 6
EBITS: 69 6
EBITS: 75 6
WEIGHTS: 81 47
Total bits: 128, endpoint bits: 61, weight bits: 47

**** Mode: 13 (CEM 12)
DualPlane: 1, WeightRange: 0 (2), Subsets: 1, EndpointRange: 20 (256) - BC7 MODE5

Mode: 0 5
BC1H0: 5 1
BC1H1: 6 1
ETC1F: 7 1
ETC1D: 8 1
ETC1I0: 9 3
ETC1I1: 12 3
ETC1BIAS: 15 5
ETC2TM: 20 8
COMPSEL: 28 2
EBITS: 30 8
EBITS: 38 8
EBITS: 46 8
EBITS: 54 8
EBITS: 62 8
EBITS: 70 8
EBITS: 78 8
EBITS: 86 8
WEIGHTS: 94 30
Total bits: 124, endpoint bits: 64, weight bits: 30

**** Mode: 14 (CEM 12)
DualPlane: 0, WeightRange: 2 (4), Subsets: 1, EndpointRange: 20 (256) - BC7 MODE6

Mode: 0 4
BC1H0: 4 1
BC1H1: 5 1
ETC1F: 6 1
ETC1D: 7 1
ETC1I0: 8 3
ETC1I1: 11 3
ETC1BIAS: 14 5
ETC2TM: 19 8
EBITS: 27 8
EBITS: 35 8
EBITS: 43 8
EBITS: 51 8
EBITS: 59 8
EBITS: 67 8
EBITS: 75 8
EBITS: 83 8
WEIGHTS: 91 31
Total bits: 122, endpoint bits: 64, weight bits: 31


UASTC BISE Weight Index Ranges table:

Range    Bits Trits Quints       UASTC Modes       Quant. Levels
0        1                       13                2
2        2                       1 3 4 6 7 9 11 14 4
5        3                       2 5 12            8
8        4                       0 10              16

(UASTC weights do not use trits or quints.)


UASTC BISE Endpoint Ranges table:

Range    Bits Trits Quints       UASTC Modes   Quant. Levels
7        2    1                  3             12
8        4                       2 9           16
12       3          1            4 7           40
13       4    1                  10 11         48
18       5          1            6             160
19       6    1                  0 12          192
20       8                       1 5 13 14     256


UASTC/BC7 2-subset partition pattern table:


const uint32_t TOTAL_ASTC_BC7_COMMON_PARTITIONS2 = 30;

struct
{
  int m_bc7_pattern;
  int m_astc_seed;
// if true, invert the BC7 pattern's subset index to match ASTC's subset index
  bool m_invert;
} g_uastc_bc7_common_partitions2[TOTAL_ASTC_BC7_COMMON_PARTITIONS2] =

{
  { 0, 28, false  }, { 1, 20, false }, { 2, 16, true }, { 3, 29, false },
  { 4, 91, true }, { 5, 9, false }, { 6, 107, true }, { 7, 72, true },
  { 8, 149, false }, { 9, 204, true }, { 10, 50, false }, { 11, 114, true },
  { 12, 496, true }, { 13, 17, true }, { 14, 78, false }, { 15, 39, true }, 
  { 17, 252, true }, { 18, 828, true }, { 19, 43, false }, { 20, 156, false }, 
  { 21, 116, false }, { 22, 210, true }, { 23, 476, true }, { 24, 273, false },
  { 25, 684, true }, { 26, 359, false }, { 29, 246, true }, { 32, 195, true },
  { 33, 694, true }, { 52, 524, true }
};

// UASTC pattern table for the 2-subset modes
const uint8_t g_uastc_patterns2[TOTAL_ASTC_BC7_COMMON_PARTITIONS2][16] =
{
   { 0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1 }, { 0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1 }, 
   { 1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0 }, { 0,0,0,1,0,0,1,1,0,0,1,1,0,1,1,1 },
   { 1,1,1,1,1,1,1,0,1,1,1,0,1,1,0,0 }, { 0,0,1,1,0,1,1,1,0,1,1,1,1,1,1,1 }, 
   { 1,1,1,0,1,1,0,0,1,0,0,0,0,0,0,0 }, { 1,1,1,1,1,1,1,0,1,1,0,0,1,0,0,0 },
   { 0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1 }, { 1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0 }, 
   { 0,0,0,0,0,0,0,1,0,1,1,1,1,1,1,1 }, { 1,1,1,1,1,1,1,1,1,1,1,0,1,0,0,0 },
   { 1,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0 }, { 1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0 }, 
   { 0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1 }, { 1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0 },
   { 1,0,0,0,1,1,1,0,1,1,1,1,1,1,1,1 }, { 1,1,1,1,1,1,1,1,0,1,1,1,0,0,0,1 }, 
   { 0,1,1,1,0,0,1,1,0,0,0,1,0,0,0,0 }, { 0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0 },
   { 0,0,0,0,1,0,0,0,1,1,0,0,1,1,1,0 }, { 1,1,1,1,1,1,1,1,0,1,1,1,0,0,1,1 }, 
   { 1,0,0,0,1,1,0,0,1,1,0,0,1,1,1,0 }, { 0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0 },
   { 1,1,1,1,0,1,1,1,0,1,1,1,0,0,1,1 }, { 0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0 }, 
   { 1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1 }, { 1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0 },
   { 1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0 }, { 1,0,0,1,0,0,1,1,0,1,1,0,1,1,0,0 }

};


UASTC/BC7 3-subset partition pattern table:


const uint32_t TOTAL_ASTC_BC7_COMMON_PARTITIONS3 = 11;

struct
{
  uint8_t m_bc7;
  uint16_t m_astc;

// maps ASTC to BC7 subset indices using g_astc_bc7_subset_index_perm_tables[][]
  uint8_t m_astc_to_bc7_perm;
} g_uastc_bc7_common_partitions3[TOTAL_ASTC_BC7_COMMON_PARTITIONS3] =
{
  { 4, 260, 0 },  { 8, 74, 5 },  { 9, 32, 5 },  { 10, 156, 2 },
  { 11, 183, 2 },  { 12, 15, 0 },  { 13, 745, 4 },  { 20, 0, 1 },
  { 35, 335, 1 },  { 36, 902, 5 },  { 57, 254, 0 }
};


const uint8_t g_astc_bc7_subset_index_perm_tables[6][3] = 
{
{ 0, 1, 2 }, { 1, 2, 0 }, { 2, 0, 1 }, { 2, 1, 0 }, { 0, 2, 1 }, { 1, 0, 2 }
};

// UASTC pattern table for the 3-subset modes
const uint8_t g_uastc_patterns3[TOTAL_ASTC_BC7_COMMON_PARTITIONS3][16] =
{
   { 0,0,0,0,0,0,0,0,1,1,2,2,1,1,2,2 }, { 1,1,1,1,1,1,1,1,0,0,0,0,2,2,2,2 }, 
   { 1,1,1,1,0,0,0,0,0,0,0,0,2,2,2,2 }, { 1,1,1,1,2,2,2,2,0,0,0,0,0,0,0,0 },
   { 1,1,2,0,1,1,2,0,1,1,2,0,1,1,2,0 }, { 0,1,1,2,0,1,1,2,0,1,1,2,0,1,1,2 }, 
   { 0,2,1,1,0,2,1,1,0,2,1,1,0,2,1,1 }, { 2,0,0,0,2,0,0,0,2,1,1,1,2,1,1,1 },
   { 2,0,1,2,2,0,1,2,2,0,1,2,2,0,1,2 }, { 1,1,1,1,0,0,0,0,2,2,2,2,1,1,1,1 }, 
   { 0,0,2,2,0,0,1,1,0,0,1,1,0,0,2,2 }

};

UASTC/BC7 2-subset partition pattern table (mapped to the BC7 3-subset patterns, used only in UASTC mode 7):


const uint32_t TOTAL_BC7_3_ASTC2_COMMON_PARTITIONS = 19;

struct
{
  uint8_t m_bc73;
  uint16_t m_astc2;
  // [0,5] - how to modify the BC7 3-subset pattern to match the ASTC pattern (LSB=invert). See convert_subset_index_3_to_2().
  uint8_t k;
} g_bc73_uastc2_common_partitions[TOTAL_BC7_3_ASTC2_COMMON_PARTITIONS] =
{
{ 10, 36, 4 }, { 11, 48, 4 }, { 0, 61, 3 }, { 2, 137, 4 },
{ 8, 161, 5 }, { 13, 183, 4 }, { 1, 226, 2 }, { 33, 281, 2 },
{ 40, 302, 3 }, { 20, 307, 4 }, { 21, 479, 0 }, { 58, 495, 3 },
{ 3, 593, 0 }, { 32, 594, 2 }, { 59, 605, 1 }, { 34, 799, 3 },
{ 20, 812, 1 }, { 14, 988, 4 }, { 31, 993, 3 }
};

uint32_t convert_subset_index_3_to_2(uint32_t p, uint32_t k)
{
    assert(k < 6);
    switch (k >> 1)
    {
    case 0:
        if (p <= 1)
            p = 0;
        else 
            p = 1;
        break;
    case 1:
        if (p == 0)
            p = 0;
        else 
            p = 1;
        break;
    case 2:
        if ((p == 0) || (p == 2))
            p = 0;
        else 
            p = 1;
        break;
    }
    if (k & 1)
        p = 1 - p;
    return p;
}


// UASTC pattern table for UASTC mode 7 (2 subset UASTC, 3-subset BC7)
const uint8_t g_bc7_3_uastc2_patterns2[TOTAL_BC7_3_ASTC2_COMMON_PARTITIONS][16] =
{
   { 0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0 }, { 0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0 }, 
   { 1,1,0,0,1,1,0,0,1,0,0,0,0,0,0,0 }, { 0,0,0,0,0,0,0,1,0,0,1,1,0,0,1,1 },
   { 1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1 }, { 0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0 }, 
   { 0,0,0,1,0,0,1,1,1,1,1,1,1,1,1,1 }, { 0,1,1,1,0,0,1,1,0,0,1,1,0,0,1,1 },
   { 1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0 }, { 0,1,1,1,0,1,1,1,0,0,0,0,0,0,0,0 }, 
   { 0,0,0,0,0,0,0,0,1,1,1,0,1,1,1,0 }, { 1,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0 },
   { 0,1,1,1,0,0,1,1,0,0,0,0,0,0,0,0 }, { 0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1 }, 
   { 1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0 }, { 1,1,0,0,1,1,0,0,1,1,0,0,1,0,0,0 },
   { 1,1,1,1,1,1,1,1,1,0,0,0,1,0,0,0 }, { 0,0,1,1,0,1,1,0,1,1,0,0,1,0,0,0 }, 
   { 1,1,1,1,0,1,1,1,0,0,0,0,0,0,0,0 }

};


UASTC Weight Tables


These 6-bit weight tables are used for endpoint interpolation in UASTC. They are the same as ASTC's.
const uint32_t g_astc_bc7_weights1[2] = { 0, 64 };
const uint32_t g_astc_bc7_weights2[4] = { 0, 21, 43, 64 };
const uint32_t g_astc_bc7_weights3[8] = { 0, 9, 18, 27, 37, 46, 55, 64 };
const uint32_t g_bc7_weights4[16] = { 0, 4, 9, 13, 17, 21, 26, 30, 34, 38, 43, 47, 51, 55, 60, 64 };
const uint32_t g_astc_weights4[16] = { 0, 4, 8, 12, 17, 21, 25, 29, 35, 39, 43, 47, 52, 56, 60, 64 };

Note BC7 and ASTC use the same 2 and 3 bit weight tables, while the 4-bit tables are slightly different. A UASTC encoder can work around this difference by evaluating the combined ASTC and transcoded BC7 error and choosing the mode/partition pattern/compsel/etc. configuration that minimizes the overall error.


UASTC Partition Pattern Anchor Index Tables


The texel indices in these tables indicate, for each common UASTC/BC7 pattern, which weight indices are stored with one less bit than normal. The texel indices are computed as x+y*4. 

The MSB's of weight indices stored with one less bit must be 0. If they aren't, the endpoints corresponding to that subset are swapped and that subset's weights are inverted before packing the UASTC block. BC7 uses a similar concept.

Anchor weight indices are applied to weight indices in both planes in dual plane UASTC modes.

The first texel is always an anchor.
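Here is a sketch of the swap/invert step for a single plane block. The function name normalize_anchor is hypothetical. Inverting a weight index means replacing it with (max index - index), which gives the same interpolated color once the subset's endpoints are swapped because the weight tables are symmetric around 32.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: makes the anchor weight of one subset have an MSB of 0.
// Returns true when the caller must also swap that subset's low/high
// endpoints to keep the decoded texels unchanged.
static bool normalize_anchor(uint8_t weights[16], const uint8_t pattern[16],
                             uint32_t subset, uint32_t anchor_texel, uint32_t weight_bits)
{
    assert(anchor_texel < 16 && weight_bits >= 1 && weight_bits <= 4);
    assert(pattern[anchor_texel] == subset);
    const uint32_t max_index = (1u << weight_bits) - 1u;
    if ((weights[anchor_texel] >> (weight_bits - 1)) == 0)
        return false; // MSB is already 0, nothing to do
    for (uint32_t i = 0; i < 16; i++)
    {
        if (pattern[i] == subset)
            weights[i] = (uint8_t)(max_index - weights[i]); // invert
    }
    return true;
}
```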

const uint8_t g_uastc_pattern2_anchors[TOTAL_ASTC_BC7_COMMON_PARTITIONS2][2] = 
{
   { 0, 2 }, { 0, 3 }, { 1, 0 }, { 0, 3 }, { 7, 0 }, { 0, 2 }, { 3, 0 }, 
   { 7, 0 }, { 0, 11 }, { 2, 0 }, { 0, 7 }, { 11, 0 }, { 3, 0 }, { 8, 0 }, 
   { 0, 4 }, { 12, 0 }, { 1, 0 }, { 8, 0 }, { 0, 1 }, { 0, 2 }, { 0, 4 }, 
   { 8, 0 }, { 1, 0 }, { 0, 2 }, { 4, 0 }, { 0, 1 }, { 4, 0 }, { 1, 0 }, 
   { 4, 0 }, { 1, 0 }
};

const uint8_t g_uastc_pattern3_anchors[TOTAL_ASTC_BC7_COMMON_PARTITIONS3][3] =
{
   { 0, 8, 10 },  { 8, 0, 12 }, { 4, 0, 12 }, { 8, 0, 4 }, { 3, 0, 2 }, 
   { 0, 1, 3 }, { 0, 2, 1 }, { 1, 9, 0 }, { 1, 2, 0 }, { 4, 0, 8 }, { 0, 6, 2 }
};

const uint8_t g_bc7_3_uastc2_patterns2_anchors[TOTAL_BC7_3_ASTC2_COMMON_PARTITIONS][2] =
{
   { 0, 4 }, { 0, 2 }, { 2, 0 }, { 0, 7 }, { 8, 0 }, { 0, 1 }, { 0, 3 }, 
   { 0, 1 }, { 2, 0 }, { 0, 1 }, { 0, 8 }, { 2, 0 }, { 0, 1 }, { 0, 7 }, 
   { 12, 0 }, { 2, 0 }, { 9, 0 }, { 0, 2 }, { 4, 0 }
};
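The entries in the anchor tables above match the first texel (in raster order) assigned to each subset in the corresponding pattern table, so they can be regenerated rather than stored. A minimal sketch, using a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: finds each subset's anchor as the first texel
// (raster order, index = x + y*4) assigned to that subset.
static void compute_anchors(const uint8_t pattern[16], uint32_t num_subsets, uint8_t anchors[3])
{
    assert(num_subsets >= 1 && num_subsets <= 3);
    for (uint32_t s = 0; s < num_subsets; s++)
    {
        anchors[s] = 0xFF; // sentinel: subset not present in the pattern
        for (uint32_t i = 0; i < 16; i++)
        {
            if (pattern[i] == s)
            {
                anchors[s] = (uint8_t)i;
                break;
            }
        }
    }
}
```

For instance, the 2-subset pattern { 1,1,1,1,1,1,1,0,1,1,1,0,1,1,0,0 } yields anchors { 7, 0 }, the same as its entry in g_uastc_pattern2_anchors.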


UASTC to BC7 Mode Conversion Table


UASTC Mode   BC7 Mode    Implementation Notes
0            6           
1            3           set both endpoints to same colors/pbits
2            1           
3            2
4            3
5            6           convert weights from 3 to 4 bits
6            5           swap CCS component with alpha
7            2           two endpoints will be set to same colors
8            5 or 6      choose mode with lowest error
9            7
10           6
11           5           swap CCS component with alpha
12           6           convert weights from 3->4 bits
13           5           convert weights from 1->2 bits
14           6           convert weights from 2->4 bits

UASTC to BC7 Weight Index Conversion


If the UASTC mode's number of weight index bits matches the BC7 mode's index bits, then nothing needs to be done to convert UASTC weight indices to BC7 indices. In the cases that differ, use these tables to translate UASTC weight indices to BC7 indices:

UASTC 1-bit to BC7 2-bit conversion table: { 0, 3 }
UASTC 2-bit to BC7 4-bit conversion table: { 0, 5, 10, 15 }
UASTC 3-bit to BC7 4-bit conversion table: { 0, 2, 4, 6, 9, 11, 13, 15 }
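Applying these tables is a straight lookup per weight index. The sketch below uses hypothetical names (kWeights1To2, convert_weights, etc.) and converts the indices in place.

```cpp
#include <cassert>
#include <cstdint>

// Conversion tables copied from the text above.
static const uint8_t kWeights1To2[2] = { 0, 3 };
static const uint8_t kWeights2To4[4] = { 0, 5, 10, 15 };
static const uint8_t kWeights3To4[8] = { 0, 2, 4, 6, 9, 11, 13, 15 };

// Hypothetical helper: converts UASTC weight indices to BC7 indices in place.
static void convert_weights(uint8_t* weights, uint32_t count, uint32_t uastc_bits, uint32_t bc7_bits)
{
    if (uastc_bits == bc7_bits)
        return; // indices can be copied as-is
    const uint8_t* table = nullptr;
    if (uastc_bits == 1 && bc7_bits == 2) table = kWeights1To2;
    else if (uastc_bits == 2 && bc7_bits == 4) table = kWeights2To4;
    else if (uastc_bits == 3 && bc7_bits == 4) table = kWeights3To4;
    assert(table != nullptr); // only the three listed conversions are supported
    for (uint32_t i = 0; i < count; i++)
    {
        assert(weights[i] < (1u << uastc_bits));
        weights[i] = table[weights[i]];
    }
}
```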

BC7 requires the "anchor" indices to have MSB's of 0. The UASTC->BC7 transcoder needs to check the anchor MSB's and swap the appropriate endpoints, because the UASTC anchor indices aren't the same as BC7's.

UASTC to BC7 Endpoint Conversion


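First dequantize the UASTC endpoints to [0,255] using the method outlined here:
https://www.khronos.org/registry/DataFormat/specs/1.1/dataformat.1.1.html#astc-endpoint-unquantization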

The ASTC/UASTC endpoint ranges that use trits or quints must be decoded/dequantized using the parameters in Table 158. (Beware, in case you're not familiar with ASTC: The encoded values of ranges using trits/quints do not dequantize to [0,255] values in a monotonic order. This is why you need to apply the parameters in Table 158 to unquantize the encoded value. This may seem unintuitive, but this is supposed to simplify ASTC decoding hardware.)

Next, for modes with p-bits, divide the dequantized endpoints by 255.0 and compute the optimal quantized BC7 endpoint colors/p-bits using the helper functions included below. 

If the mode doesn't have p-bits, then scale the dequantized endpoint value to the desired number of BC7 endpoint bits, with rounding. To convert 8-bit endpoint components to 5-bits compute (value*31+127)/255, and to convert 8-bits to 7 compute (value*127+127)/255. 
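The two formulas above are the same rounding scale with a different maximum value, so they can be written as one small helper. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Scales an 8-bit endpoint component to the given number of bits with
// rounding. bits=5 gives (value*31+127)/255, bits=7 gives (value*127+127)/255.
static uint32_t scale_8bit_to_n_bits(uint32_t value, uint32_t bits)
{
    assert(value <= 255 && bits >= 1 && bits <= 8);
    const uint32_t max_value = (1u << bits) - 1u;
    return (value * max_value + 127) / 255;
}
```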

The UASTC encoder assumes this is how the BC7 endpoints will be computed during transcoding. If you do something else the encoder's error computation won't be accurate.

Computing Optimal BC7 P-Bits


To compute the optimal shared or unique BC7 p-bits, we use the following two helper functions. (This is tricky enough, and almost always messed up in the BC7 encoders that we've seen, that we're just including the exact code we use.) Our UASTC encoder assumes the p-bits are computed in this way.

Inputs:
total_comps: 3 or 4
comp_bits: bits in BC7 output color (7 for mode 6, etc.)
xl/xh: the normalized [0,1] input vectors (dequantize ASTC's endpoints and divide by 255.0)

Outputs: 
bestMinColor, bestMaxColor: quantized colors to pack into BC7 output
best_pbits[2]: p-bits to pack into BC7 output

// Determines the best shared pbits to use to encode xl/xh
static void determine_shared_pbits(
   uint32_t total_comps, uint32_t comp_bits, float xl[4], float xh[4],
   color_quad_u8& bestMinColor, color_quad_u8& bestMaxColor, uint32_t best_pbits[2])
{
   const uint32_t total_bits = comp_bits + 1;
   assert(total_bits >= 4 && total_bits <= 8);

   const int iscalep = (1 << total_bits) - 1;
   const float scalep = (float)iscalep;

   float best_err = 1e+9f;

   for (int p = 0; p < 2; p++)
   {
      color_quad_u8 xMinColor, xMaxColor;
      for (uint32_t c = 0; c < 4; c++)
      {
         xMinColor.m_c[c] = (uint8_t)(clampi(((int)((xl[c] * scalep - p) / 2.0f + .5f)) * 2 + p, p, iscalep - 1 + p));
         xMaxColor.m_c[c] = (uint8_t)(clampi(((int)((xh[c] * scalep - p) / 2.0f + .5f)) * 2 + p, p, iscalep - 1 + p));
      }

      color_quad_u8 scaledLow, scaledHigh;

      for (uint32_t i = 0; i < 4; i++)
      {
         scaledLow.m_c[i] = (xMinColor.m_c[i] << (8 - total_bits));
         scaledLow.m_c[i] |= (scaledLow.m_c[i] >> total_bits);
         assert(scaledLow.m_c[i] <= 255);

         scaledHigh.m_c[i] = (xMaxColor.m_c[i] << (8 - total_bits));
         scaledHigh.m_c[i] |= (scaledHigh.m_c[i] >> total_bits);
         assert(scaledHigh.m_c[i] <= 255);
      }

      float err = 0;
      for (uint32_t i = 0; i < total_comps; i++)
         err += squaref((scaledLow.m_c[i] / 255.0f) - xl[i]) + squaref((scaledHigh.m_c[i] / 255.0f) - xh[i]);

      if (err < best_err)
      {
         best_err = err;
         best_pbits[0] = p;
         best_pbits[1] = p;
         for (uint32_t j = 0; j < 4; j++)
         {
            bestMinColor.m_c[j] = xMinColor.m_c[j] >> 1;
            bestMaxColor.m_c[j] = xMaxColor.m_c[j] >> 1;
         }
      }
   }
}

// Determines the best unique pbits to use to encode xl/xh
static void determine_unique_pbits(
   uint32_t total_comps, uint32_t comp_bits, float xl[4], float xh[4], 
   color_quad_u8 &bestMinColor, color_quad_u8 &bestMaxColor, uint32_t best_pbits[2])
{
   const uint32_t total_bits = comp_bits + 1;
   const int iscalep = (1 << total_bits) - 1;
   const float scalep = (float)iscalep;

   float best_err0 = 1e+9f;
   float best_err1 = 1e+9f;

   for (int p = 0; p < 2; p++)
   {
      color_quad_u8 xMinColor, xMaxColor;

      for (uint32_t c = 0; c < 4; c++)
      {
         xMinColor.m_c[c] = (uint8_t)(clampi(((int)((xl[c] * scalep - p) / 2.0f + .5f)) * 2 + p, p, iscalep - 1 + p));
         xMaxColor.m_c[c] = (uint8_t)(clampi(((int)((xh[c] * scalep - p) / 2.0f + .5f)) * 2 + p, p, iscalep - 1 + p));
      }

      color_quad_u8 scaledLow, scaledHigh;
      for (uint32_t i = 0; i < 4; i++)
      {
         scaledLow.m_c[i] = (xMinColor.m_c[i] << (8 - total_bits));
         scaledLow.m_c[i] |= (scaledLow.m_c[i] >> total_bits);
         assert(scaledLow.m_c[i] <= 255);

         scaledHigh.m_c[i] = (xMaxColor.m_c[i] << (8 - total_bits));
         scaledHigh.m_c[i] |= (scaledHigh.m_c[i] >> total_bits);
         assert(scaledHigh.m_c[i] <= 255);
      }

      float err0 = 0, err1 = 0;
      for (uint32_t i = 0; i < total_comps; i++)
      {
         err0 += squaref(scaledLow.m_c[i] - xl[i] * 255.0f);
         err1 += squaref(scaledHigh.m_c[i] - xh[i] * 255.0f);
      }

      if (err0 < best_err0)
      {
         best_err0 = err0;
         best_pbits[0] = p;

         bestMinColor.m_c[0] = xMinColor.m_c[0] >> 1;
         bestMinColor.m_c[1] = xMinColor.m_c[1] >> 1;
         bestMinColor.m_c[2] = xMinColor.m_c[2] >> 1;
         bestMinColor.m_c[3] = xMinColor.m_c[3] >> 1;
      }

      if (err1 < best_err1)
      {
         best_err1 = err1;
         best_pbits[1] = p;

         bestMaxColor.m_c[0] = xMaxColor.m_c[0] >> 1;
         bestMaxColor.m_c[1] = xMaxColor.m_c[1] >> 1;
         bestMaxColor.m_c[2] = xMaxColor.m_c[2] >> 1;
         bestMaxColor.m_c[3] = xMaxColor.m_c[3] >> 1;
      }
   }
}
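
The two functions above depend on a few small helpers that are not listed in this post. The following is a minimal sketch of what they might look like, assuming color_quad_u8 is a plain 4-byte struct and that clampi() and squaref() have the obvious meanings. quantize_with_pbit() is a hypothetical name introduced here only to isolate the per-component rounding expression that both functions share; it is not part of the original code.

```cpp
#include <cstdint>

// Assumed helper definitions (not shown in the post).
struct color_quad_u8 { uint8_t m_c[4]; };

static inline int clampi(int v, int lo, int hi) { return (v < lo) ? lo : ((v > hi) ? hi : v); }
static inline float squaref(float v) { return v * v; }

// Hypothetical single-component version of the quantization expression used in
// determine_shared_pbits() and determine_unique_pbits().
// x:          normalized [0,1] endpoint component
// total_bits: comp_bits + 1 (color bits plus the p-bit)
// p:          candidate p-bit (0 or 1)
// Returns the total_bits-wide value nearest to x whose LSB equals p.
static inline int quantize_with_pbit(float x, uint32_t total_bits, int p)
{
   const int iscalep = (1 << total_bits) - 1;
   const float scalep = (float)iscalep;
   return clampi(((int)((x * scalep - p) / 2.0f + .5f)) * 2 + p, p, iscalep - 1 + p);
}
```

For example, with total_bits = 8 an input of 1.0 quantizes to 255 when p = 1 but only to 254 when p = 0, which is why the choice of p-bit has to be evaluated against the actual endpoint values rather than fixed up front.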

ETC1 Transcoding


For solid color UASTC blocks: ETC1 transcode hints directly follow the 32-bit RGBA block color. These hints indicate which color mode (differential or individual), intensity table index, selector, and block color to use to encode the block with the lowest error.

For non-solid color blocks: The UASTC block must first be unpacked to pixels. After this, the only expensive work required to transcode to ETC1 is computing the subblock average colors and the texel indices. 

There are ETC1 hint fields which indicate how the ETC1 subblocks are flipped, which block color mode to use, each subblock's intensity table index, and how to bias the computed quantized subblock block colors. 

To compute each subblock's quantized block color: First compute each subblock's average texel color. Then quantize the average subblock colors to 4 or 5 bits/component (depending on the ETC1 block color mode), with rounding. Next, apply the ETC1 bias indicated by the 5-bit UASTC ETC1BIAS field (see the apply_etc1_bias() function below) to each quantized subblock color. Finally, encode the quantized subblock colors in the ETC1 block. In differential mode the second subblock's differential color may need to be clamped to fit into 3 bits/component.
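
The averaging, quantization, and differential clamping steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the transcoder's actual code: the rgb8 type and all three function names are hypothetical, the 4x4 block is assumed to be stored in row-major order, and the exact rounding used by a real transcoder may differ. The bias step is omitted here.

```cpp
#include <cstdint>

struct rgb8 { uint8_t r, g, b; }; // hypothetical pixel type

// Average color of one ETC1 subblock of a 4x4 block (row-major pixels).
// flip == false: two 2x4 subblocks side by side (subblock 0 is the left half).
// flip == true:  two 4x2 subblocks stacked (subblock 0 is the top half).
static rgb8 subblock_average(const rgb8 pixels[16], bool flip, uint32_t subblock)
{
   uint32_t sum_r = 0, sum_g = 0, sum_b = 0;
   for (uint32_t y = 0; y < 4; y++)
   {
      for (uint32_t x = 0; x < 4; x++)
      {
         const uint32_t s = flip ? (y >> 1) : (x >> 1);
         if (s != subblock)
            continue;
         const rgb8& p = pixels[y * 4 + x];
         sum_r += p.r; sum_g += p.g; sum_b += p.b;
      }
   }
   // Each subblock has 8 texels; adding 4 rounds to nearest (an assumption).
   return rgb8{ (uint8_t)((sum_r + 4) / 8), (uint8_t)((sum_g + 4) / 8), (uint8_t)((sum_b + 4) / 8) };
}

// Quantizes an 8-bit component to [0, limit] with rounding.
// limit is 15 for individual mode (4 bits) or 31 for differential mode (5 bits).
static uint32_t quantize_component(uint32_t v, uint32_t limit)
{
   return (v * limit + 127) / 255;
}

// Differential mode: the second subblock's color is stored as a signed 3-bit
// delta in [-4, 3] relative to the first, so clamp it into that range.
static uint32_t clamp_differential(uint32_t c0, uint32_t c1)
{
   int delta = (int)c1 - (int)c0;
   if (delta < -4) delta = -4;
   if (delta > 3) delta = 3;
   return (uint32_t)((int)c0 + delta);
}
```

In this sketch the limit passed to quantize_component() is the same limit value that apply_etc1_bias() expects.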

The last step is computing the texel indices. This step can be accelerated by computing the errors in a luma space with RGB component weights of (1,1,1), i.e. by comparing R+G+B sums instead of full RGB distances.
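
One way to pick a texel index in this luma space is sketched below. It assumes the subblock's base color has already been expanded to 8 bits/component, and it uses the standard ETC1 intensity modifier tables with each row sorted from darkest to brightest. The function and table names are hypothetical, and the returned index is a position in the sorted row, not the 2-bit selector encoding stored in the ETC1 block, which uses a different order.

```cpp
#include <climits>
#include <cstdint>
#include <cstdlib>

// ETC1 intensity modifier tables, each row sorted from darkest to brightest.
static const int g_etc1_inten_tables[8][4] = {
   { -8, -2, 2, 8 },     { -17, -5, 5, 17 },   { -29, -9, 9, 29 },     { -42, -13, 13, 42 },
   { -60, -18, 18, 60 }, { -80, -24, 24, 80 }, { -106, -33, 33, 106 }, { -183, -47, 47, 183 }
};

static inline int clamp255(int v) { return (v < 0) ? 0 : ((v > 255) ? 255 : v); }

// Returns the sorted-table index (0..3) of the subblock color whose R+G+B sum
// is closest to the texel's R+G+B sum.
static uint32_t pick_selector_luma(int r, int g, int b,
   int base_r, int base_g, int base_b, uint32_t inten_table)
{
   const int texel_y = r + g + b;

   uint32_t best = 0;
   int best_err = INT_MAX;
   for (uint32_t s = 0; s < 4; s++)
   {
      const int m = g_etc1_inten_tables[inten_table][s];
      // Luma of the candidate subblock color, with per-component clamping as in ETC1 decoding.
      const int y = clamp255(base_r + m) + clamp255(base_g + m) + clamp255(base_b + m);
      const int err = std::abs(texel_y - y);
      if (err < best_err)
      {
         best_err = err;
         best = s;
      }
   }
   return best;
}
```

Since the four candidate lumas are the same for every texel in a subblock, they only need to be computed once per subblock, leaving a handful of scalar comparisons per texel.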


color_rgba apply_etc1_bias(color_rgba block_color, uint32_t bias, uint32_t limit, uint32_t subblock)
{
   for (uint32_t c = 0; c < 3; c++)
   {
      static const int s_divs[3] = { 1, 3, 9 };

      int delta = 0;

      switch (bias)
      {
      case 2: delta = subblock ? 0 : ((c == 0) ? -1 : 0); break;
      case 5: delta = subblock ? 0 : ((c == 1) ? -1 : 0); break;
      case 6: delta = subblock ? 0 : ((c == 2) ? -1 : 0); break;

      case 7: delta = subblock ? 0 : ((c == 0) ? 1 : 0); break;
      case 11: delta = subblock ? 0 : ((c == 1) ? 1 : 0); break;
      case 15: delta = subblock ? 0 : ((c == 2) ? 1 : 0); break;

      case 18: delta = subblock ? ((c == 0) ? -1 : 0) : 0; break;
      case 19: delta = subblock ? ((c == 1) ? -1 : 0) : 0; break;
      case 20: delta = subblock ? ((c == 2) ? -1 : 0) : 0; break;

      case 21: delta = subblock ? ((c == 0) ? 1 : 0) : 0; break;
      case 24: delta = subblock ? ((c == 1) ? 1 : 0) : 0; break;
      case 8: delta = subblock ? ((c == 2) ? 1 : 0) : 0; break;

      case 10: delta = -2; break;

      case 27: delta = subblock ? 0 : -1; break;
      case 28: delta = subblock ? -1 : 1; break;
      case 29: delta = subblock ? 1 : 0; break;
      case 30: delta = subblock ? -1 : 0; break;
      case 31: delta = subblock ? 0 : 1; break;

      default:
         delta = ((bias / s_divs[c]) % 3) - 1;
         break;
      }
      
      int v = block_color[c];
      if (v == 0)
      {
         if (delta == -2)
            v += 3;
         else
            v += delta + 1;
      }
      else if (v == (int)limit)
      {
         v += (delta - 1);
      }
      else
      {
         v += delta;
         if ((v < 0) || (v > (int)limit))
            v = (v - delta) - delta;
      }

      assert(v >= 0);
      assert(v <= (int)limit);

      block_color[c] = (uint8_t)v;
   }

   return block_color;
}
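
For ETC1BIAS values that fall through to the default case, the expression ((bias / s_divs[c]) % 3) - 1 treats the bias as a 3-digit base-3 number, with one digit per color component mapped to a delta of -1, 0, or +1. The following sketch isolates just that decoding so it can be checked by hand. bias_deltas and decode_default_bias() are hypothetical names introduced for illustration, and the sketch deliberately ignores both the special-cased bias values and the clamping at 0 and limit that apply_etc1_bias() performs.

```cpp
#include <cstdint>

struct bias_deltas { int d[3]; }; // hypothetical: per-component R, G, B deltas

// Decodes a default-case ETC1BIAS value into per-component deltas in [-1, 1].
// Only meaningful for bias values that are not special-cased in apply_etc1_bias().
static bias_deltas decode_default_bias(uint32_t bias)
{
   static const uint32_t s_divs[3] = { 1, 3, 9 };

   bias_deltas result;
   for (uint32_t c = 0; c < 3; c++)
      result.d[c] = (int)((bias / s_divs[c]) % 3) - 1;
   return result;
}
```

Under this decoding, bias 13 yields deltas of (0, 0, 0), bias 0 yields (-1, -1, -1), and bias 26 yields (+1, +1, +1).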



References:

https://www.khronos.org/registry/DataFormat/specs/1.1/dataformat.1.1.html#ASTC
https://docs.microsoft.com/en-us/windows/win32/direct3d11/bc7-format-mode-reference
https://rockets2000.wordpress.com/2015/05/19/bc7-partitions-subsets/
https://github.com/KhronosGroup/OpenGL-Registry/blob/master/extensions/EXT/EXT_texture_compression_bptc.txt
https://www.khronos.org/registry/DataFormat/specs/1.1/dataformat.1.1.html#ETC1
https://www.khronos.org/registry/DataFormat/specs/1.1/dataformat.1.1.html#ETC2
https://docs.microsoft.com/en-us/windows/win32/direct3d10/d3d10-graphics-programming-guide-resources-block-compression#bc1
