Given a particular 4x4 pixel block, what does the error of all possible ETC1 5:5:5 base color+3-bit intensity encodings look like? The resulting 4D visualization could inspire better optimization algorithms.
To compute these images, I created an ETC1 block in differential mode (5:5:5 base color with a 3:3:3 delta), set the base color to R,G,B, set the delta color to (0,0,0), and set both subblocks' intensity table indices to the same value from 0-7. I then encoded the source pixels (by finding the optimal selector for each pixel), decoded them, and computed the overall block error as a perceptually weighted R,G,B color distance.
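Here's a minimal Python sketch of that per-candidate error computation, assuming the standard ETC1 intensity modifier tables and a weighted squared RGB distance. The channel weights in `WEIGHTS` are hypothetical placeholders, not necessarily the ones behind these images, and the function names are made up for illustration. With a zero delta and identical table indices, both subblocks decode identically, so the whole 4x4 block can be treated as one set of 16 pixels.

```python
# Standard ETC1 intensity modifier tables, listed in ascending order.
# (The spec maps selectors to modifiers in a different order, which doesn't
# matter here because each pixel simply takes the best of the four.)
ETC1_MODIFIERS = [
    (-8, -2, 2, 8),
    (-17, -5, 5, 17),
    (-29, -9, 9, 29),
    (-42, -13, 13, 42),
    (-60, -18, 18, 60),
    (-80, -24, 24, 80),
    (-106, -33, 33, 106),
    (-183, -47, 47, 183),
]

# Hypothetical perceptual weights for R, G, B (placeholder values).
WEIGHTS = (2, 4, 1)


def expand5(v):
    """Expand a 5-bit component to 8 bits by bit replication."""
    return (v << 3) | (v >> 2)


def block_colors(r5, g5, b5, table):
    """The 4 colors reachable from a 5:5:5 base color and an intensity table."""
    base = (expand5(r5), expand5(g5), expand5(b5))
    return [tuple(max(0, min(255, c + m)) for c in base)
            for m in ETC1_MODIFIERS[table]]


def weighted_dist(a, b):
    """Weighted squared RGB distance."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(WEIGHTS, a, b))


def block_error(pixels, r5, g5, b5, table):
    """Pick the optimal selector for each pixel and sum the weighted error."""
    colors = block_colors(r5, g5, b5, table)
    return sum(min(weighted_dist(p, c) for c in colors) for p in pixels)
```

Sweeping `block_error` over all 32x32x32 base colors and all 8 tables produces the kind of 4D error volume being visualized here.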
These visualizations are linearly scaled: the brightest value (255) is maximum error, and black is zero error. The blocks used to compute each visualization are here too:
Finding the "best" block color and intensity table index to use in a subblock is basically a 4D search through functions like the ones above. Hill climbing optimization seems useful, except for those pesky local minima. For fun, I've already tried random-restart hill climbing, and it works, but there's got to be a better way.
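A minimal sketch of random-restart hill climbing over this 4D lattice (5-bit R, G, B plus a 3-bit table index). It assumes `err` is any function of an `(r5, g5, b5, table)` tuple, such as a wrapper around a block error function; the neighborhood (a step of +/-1 along one coordinate) and the restart count are illustrative choices, not necessarily the ones used in the experiment mentioned above.

```python
import random

# Upper bounds of the 4D search space: 5-bit R, G, B and a 3-bit table index.
BOUNDS = (31, 31, 31, 7)


def neighbors(p):
    """Yield points differing from p by +/-1 in exactly one coordinate."""
    for axis, hi in enumerate(BOUNDS):
        for step in (-1, 1):
            v = p[axis] + step
            if 0 <= v <= hi:
                yield p[:axis] + (v,) + p[axis + 1:]


def hill_climb(err, start):
    """Steepest-descent hill climbing; returns the local minimum reached."""
    best, best_e = start, err(start)
    while True:
        scored = [(err(q), q) for q in neighbors(best)]
        cand_e, cand = min(scored)
        if cand_e >= best_e:
            return best, best_e
        best, best_e = cand, cand_e


def random_restart_hill_climb(err, restarts=16, seed=0):
    """Run hill climbing from random starts and keep the best local minimum."""
    rng = random.Random(seed)
    best, best_e = None, float("inf")
    for _ in range(restarts):
        start = tuple(rng.randint(0, hi) for hi in BOUNDS)
        p, e = hill_climb(err, start)
        if e < best_e:
            best, best_e = p, e
    return best, best_e
```

Each restart only guarantees a local minimum; the restarts are what give it a chance of escaping the bad basins.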
rg_etc1 starts at the block's average color and scans outwards along the RGB axes, trying to find better colors. It always tries all 8 intensity tables every time it tries a candidate color (which in retrospect seems wildly inefficient, but hey I wrote it over a weekend years ago). It also has several refinement steps. One of them factors in the selectors of the best color found so far, in an attempt to improve the current block color. rg_etc1 ran circles around Mali's reference encoder, from what I remember, which was my goal.
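A rough Python sketch of that kind of axis scan (not rg_etc1's actual code): quantize the average color to 5:5:5, try candidates offset along each RGB axis, and evaluate all 8 intensity tables at every candidate. `err`, `radius`, and the rounding in `quantize5` are assumptions for illustration. With the default radius this already costs 25 colors x 8 tables = 200 error evaluations per subblock, before any refinement.

```python
def quantize5(c):
    """Approximate nearest 5-bit value for an 8-bit component."""
    return (c * 31 + 127) // 255


def axis_scan(err, avg_rgb, radius=4):
    """Scan outward from the quantized average color along each RGB axis,
    evaluating all 8 intensity tables at every candidate color."""
    base = tuple(quantize5(c) for c in avg_rgb)
    candidates = [base]
    for axis in range(3):
        for d in range(1, radius + 1):
            for sign in (-1, 1):
                v = base[axis] + sign * d
                if 0 <= v <= 31:
                    candidates.append(base[:axis] + (v,) + base[axis + 1:])
    best, best_e = None, float("inf")
    for color in candidates:
        for table in range(8):
            e = err(color + (table,))
            if e < best_e:
                best, best_e = color + (table,), e
    return best, best_e
```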
Co-owner of Binomial LLC, working on GPU texture interchange. Open source developer, graphics programmer, former video game developer. Worked previously at SpaceX (Starlink), Valve, Ensemble Studios (Microsoft), DICE Canada.
Friday, September 16, 2016
ETC1 texture format visualizations
I've been thinking about how to improve my ETC1 block encoder's quality. What little curiosities lie inside this seemingly simple format?
Hmm: out of all possible ETC1 subblock colors in 5:5:5 differential mode (32x32x32 base colors times 8 intensity tables), how many involve clamping R, G, and/or B to 0 or 255? Turns out, 72% (189704 out of 262144) of the possibilities involve clamping one or more components. That's much more often than I thought!
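This count is easy to reproduce by brute force. A minimal Python sketch, assuming the standard ETC1 intensity tables and 5-to-8-bit expansion by bit replication, which checks every base color/table combination for any decoded component outside [0,255]:

```python
# Standard ETC1 intensity modifier tables.
ETC1_MODIFIERS = [
    (-8, -2, 2, 8),
    (-17, -5, 5, 17),
    (-29, -9, 9, 29),
    (-42, -13, 13, 42),
    (-60, -18, 18, 60),
    (-80, -24, 24, 80),
    (-106, -33, 33, 106),
    (-183, -47, 47, 183),
]


def expand5(v):
    """Expand a 5-bit component to 8 bits by bit replication."""
    return (v << 3) | (v >> 2)


def count_clamped():
    """Count 5:5:5 base color + table combinations where at least one of the
    4 decoded colors has a component outside [0, 255] before clamping."""
    expanded = [expand5(v) for v in range(32)]
    clamped = total = 0
    for mods in ETC1_MODIFIERS:
        for r in expanded:
            for g in expanded:
                for b in expanded:
                    total += 1
                    if any(not 0 <= c + m <= 255 for c in (r, g, b) for m in mods):
                        clamped += 1
    return clamped, total
```

Since the modifiers are symmetric, only each table's largest magnitude matters, which is why every base color clamps with table 7 (+/-183).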
Here's a bitmap visualizing when the clamping occurs on any of the 4 block colors encoded by each 5:5:5 base color/3-bit intensity table combination. White pixels signify that one or more color components had to be clamped, and black signifies no clamping:
The basic assumption that the colors of each ETC1 subblock lie nicely spread out along a single colorspace line isn't always accurate, due to [0,255] clamping. So any optimization techniques written with this assumption in mind could be missing better solutions. This also impacts converting ETC1 to other formats like DXT1, because both endpoints of each colorspace line in DXT1 are separately encoded. Is this really a big deal? I dunno, but it's good to know.
Anyhow, here's a visualization of all possible subblock colors. First, there are 4 images, one for each subblock color [0,3]. The 2-bit ETC1 selectors basically select a color from one of these images.
Within an image, there are 8 rows, one for each ETC1 intensity table. Within a row, there are 32 small "tiles", one per blue value, and within each little 32x32 tile red varies along X and green along Y.
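The layout above can be expressed as a mapping from image pixel to encoding parameters. A small Python sketch, assuming row 0 is at the top and that the four images use each table's modifiers in ascending order (which may not match the selector order used for the actual images); the function names are hypothetical:

```python
# Standard ETC1 intensity modifier tables, ascending order (assumed image order).
ETC1_MODIFIERS = [
    (-8, -2, 2, 8),
    (-17, -5, 5, 17),
    (-29, -9, 9, 29),
    (-42, -13, 13, 42),
    (-60, -18, 18, 60),
    (-80, -24, 24, 80),
    (-106, -33, 33, 106),
    (-183, -47, 47, 183),
]

TILE = 32                             # each tile: red along X, green along Y
WIDTH, HEIGHT = 32 * TILE, 8 * TILE   # 32 blue tiles per row, 8 table rows


def expand5(v):
    """Expand a 5-bit component to 8 bits by bit replication."""
    return (v << 3) | (v >> 2)


def pixel_params(x, y):
    """Map an image pixel to (r5, g5, b5, table)."""
    table, g5 = divmod(y, TILE)
    b5, r5 = divmod(x, TILE)
    return r5, g5, b5, table


def subcolor_pixel(k, x, y):
    """Decoded (clamped) color k, 0-3, shown at pixel (x, y) of image k."""
    r5, g5, b5, table = pixel_params(x, y)
    m = ETC1_MODIFIERS[table][k]
    return tuple(max(0, min(255, expand5(c) + m)) for c in (r5, g5, b5))
```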
FasTC library
This library, which supports a bunch of common formats (ETC1, DXT, PVRTC, etc.), though not all of them for encoding yet, looks great:
https://github.com/GammaUNC/FasTC
Thursday, September 15, 2016
Google's new ETC2 codec looks awesome
I've worked with many of the authors of this at one time or another:
Building a blazing fast ETC2 compressor
Repo:
https://github.com/google/etc2comp
(I can't believe the Mali encoder was only single threaded!)
Wednesday, September 14, 2016
ETC1 principal axis optimization
One potential (probably minor) optimization to ETC1 encoding: determine the principal axis of the entire texture, rotate the texture's RGB pixels (treating them as 3D vectors) so this axis is aligned with the grayscale axis, then compress the texture as usual. The pixel shader can undo the rotation with a handful of trivial instructions.
ETC1 uses colorspace lines constrained to be parallel to the grayscale axis, which this optimization exploits.
etcpak
etcpak is a very fast but low-quality ETC1 (and a little bit of ETC2) compressor:
https://bitbucket.org/wolfpld/etcpak/wiki/Home
It's the fastest open source ETC1 encoder that I'm aware of.
Notice the lack of any PSNR/MSE/SSIM statistics anywhere (that I can see). Also, the developer doesn't seem to get that the other tools/libraries he compares his stuff against were optimized for quality, not raw speed. In particular, rg_etc1 (and crunch's ETC1 support) was tuned to compete against the reference encoder along both the quality and perf. axes.
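For reference, a minimal sketch of the usual RGB MSE/PSNR computation, averaged over all components of all pixels with a peak value of 255. Some tools report per-channel or luma PSNR instead, so numbers from different tools aren't always directly comparable.

```python
import math


def mse(a, b):
    """Mean squared error over all RGB components of two equal-size pixel lists."""
    diffs = [(x - y) ** 2 for pa, pb in zip(a, b) for x, y in zip(pa, pb)]
    return sum(diffs) / len(diffs)


def psnr(a, b):
    """Peak signal-to-noise ratio in dB for 8-bit data."""
    e = mse(a, b)
    return float("inf") if e == 0 else 10 * math.log10(255 ** 2 / e)
```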
Anyhow, there are some interesting things to learn from etcpak:
- Best quality doesn't always matter. It obviously depends on your use case. If you have 10 gigs of textures to compress then iteration speed can be very important.
- The value spectrum spans from highest quality/slow encode (to ship final assets) to crap quality/fast as hell encode (favoring iteration speed).
- Visually, the ETC1/2 formats are nicely forgiving. Even a low quality ETC1 encoder produces decent enough looking output for many use cases.
Sunday, September 11, 2016
Idea for next texture compression experiment
Right now, I've got a GPU texture in a simple ETC1 subset that is easily converted to most other GPU formats:
Base color: 15-bits, 5:5:5 RGB
Intensity table index: 3-bits
Selectors: 2-bits/texel
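At 15 + 3 + 16x2 = 50 bits per 4x4 block, this base data fits in a single integer. A Python sketch with a hypothetical bit layout (color in the low bits, then the table index, then the 16 selectors), chosen only for illustration:

```python
def pack_base(r5, g5, b5, table, selectors):
    """Pack color, table index and 16 2-bit selectors into one 50-bit integer.
    Hypothetical layout: bits 0-14 color (R, G, B), 15-17 table, 18-49 selectors."""
    if not all(0 <= c <= 31 for c in (r5, g5, b5)) or not 0 <= table <= 7:
        raise ValueError("color components must be 5-bit, table index 3-bit")
    if len(selectors) != 16 or not all(0 <= s <= 3 for s in selectors):
        raise ValueError("expected 16 selectors in [0, 3]")
    bits = r5 | (g5 << 5) | (b5 << 10) | (table << 15)
    for i, s in enumerate(selectors):
        bits |= s << (18 + 2 * i)
    return bits


def unpack_base(bits):
    """Inverse of pack_base."""
    r5, g5, b5 = bits & 31, (bits >> 5) & 31, (bits >> 10) & 31
    table = (bits >> 15) & 7
    selectors = [(bits >> (18 + 2 * i)) & 3 for i in range(16)]
    return r5, g5, b5, table, selectors
```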
Most importantly, this is a "single subset" encoding, using BC7 terminology. BC7 supports between 1 and 3 subsets per block. A subset is just a colorspace line represented by two R,G,B endpoint colors.
This format is easily converted to DXT1 using a table lookup. It's also the "base" of the universal GPU texture format I've been thinking about, because it's the data needed for DXT1 support. The next step is to experiment with attempting to refine this base data to better take advantage of the full ETC1 specification. So let's try adding two subsets to each block, with two partitions (again using BC7 terminology), top/bottom or left/right, which are supported by both ETC1 and BC7.
For example, we can code this base color, then delta code the 2 subset colors relative to this base. We'll also add a couple more intensity indices, which can be delta coded against the base index. Another bit can indicate which ETC1 block color encoding "mode" should be used (individual 4:4:4 4:4:4 or differential 5:5:5 3:3:3) to represent the subset colors in the output block.
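A sketch of how that mode bit might be decided, assuming simple rounding to 5 and 4 bits. ETC1's differential mode can only represent the second color as a 3-bit signed delta (-4..3 per channel) from the first 5:5:5 color, so when the subset colors are too far apart the encoder has to fall back to individual 4:4:4 mode. Names are hypothetical; a real encoder would more likely try both modes and keep whichever gives lower error.

```python
def quantize(c, bits):
    """Round an 8-bit component to `bits` bits (simple rounding approximation)."""
    maxv = (1 << bits) - 1
    return (c * maxv + 127) // 255


def choose_etc1_color_mode(c0, c1):
    """Use differential mode if the 5-bit quantized subset colors differ by at
    most the 3-bit signed delta range [-4, 3]; otherwise individual 4:4:4."""
    q0 = [quantize(c, 5) for c in c0]
    q1 = [quantize(c, 5) for c in c1]
    delta = [b - a for a, b in zip(q0, q1)]
    if all(-4 <= d <= 3 for d in delta):
        return "differential", q0, delta
    return "individual", [quantize(c, 4) for c in c0], [quantize(c, 4) for c in c1]
```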
In DXT1 mode, we can ignore this extra delta coded data and just convert the basic (single subset) base format. In ETC1/BC7/ASTC modes, we can use the extra information to support 2 subsets and 2 partitions.
Currently, the idea is to share the same selector indices between the single subset (DXT1) and two subset (BC7/ASTC/full ETC1) encodings. This will constrain how well this idea works, but I think it's worth trying out.
To add more quality to the 2 subset mode, we can delta code (maybe with some fancy per-pixel prediction) another array of selectors in some way. We can also add support for more partitions (derived from BC7's or ASTC's), too.