Friday, September 16, 2016

ETC1 texture format visualizations

I've been thinking about how to improve my ETC1 block encoder's quality. What little curiosities lie inside this seemingly simple format?

Hmm: Out of all possible ETC1 subblock colors in 5:5:5 differential mode, how many involve clamping R, G, and/or B to 0 or 255? Turns out, 72% (189704 out of 262144) of the possibilities involve clamping one or more components. That's much more often than I thought!
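
For anyone who wants to reproduce that count, here's a minimal Python sketch. It assumes the standard ETC1 intensity modifier tables and the usual 5-bit to 8-bit expansion (top 3 bits replicated into the low bits); the function names are made up for illustration and don't come from any particular encoder. Each component clamps independently, so it counts the 5-bit values that stay in range for a given table and cubes that, rather than looping over all 262144 combinations.

```python
# Standard ETC1 intensity modifier tables, one row per 3-bit table index.
ETC1_INTENSITY_TABLES = [
    (-8, -2, 2, 8),
    (-17, -5, 5, 17),
    (-29, -9, 9, 29),
    (-42, -13, 13, 42),
    (-60, -18, 18, 60),
    (-80, -24, 24, 80),
    (-106, -33, 33, 106),
    (-183, -47, 47, 183),
]


def expand5(c5):
    """Expand a 5-bit component to 8 bits by replicating its top 3 bits."""
    return (c5 << 3) | (c5 >> 2)


def count_clamped_combinations():
    """Return (clamped, total) over all 5:5:5 base color / table combinations."""
    clamped = 0
    total = 0
    for table in ETC1_INTENSITY_TABLES:
        # 5-bit values that stay inside [0, 255] for all 4 modifiers of this table.
        safe = sum(
            1 for c5 in range(32)
            if all(0 <= expand5(c5) + m <= 255 for m in table)
        )
        total += 32 ** 3
        # A combination avoids clamping only if R, G, and B are all "safe".
        clamped += 32 ** 3 - safe ** 3
    return clamped, total


clamped, total = count_clamped_combinations()
print(clamped, total, round(100.0 * clamped / total, 1))
```

Under those assumptions this prints 189704 262144 72.4, which agrees with the figure above. Note that the largest table (modifiers up to 183) can never avoid clamping, whatever the base color.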

Here's a bitmap visualizing when the clamping occurs on any of the 4 block colors encoded by each 5:5:5 base color/3-bit intensity table combination. White pixels signify that one or more color components had to be clamped, and black signifies no clamping:


The basic assumption that each ETC1 subblock color lies nicely spread out along a single colorspace line isn't accurate, due to [0,255] clamping. So any optimization techniques written with this assumption in mind could be missing better solutions. Also, this impacts converting ETC1 to other formats like DXT1, because both endpoints of each colorspace line in DXT1 are separately encoded. Is this really a big deal? I dunno, but it's good to know.

Anyhow, here's a visualization of all possible subcolors. First, there are 4 images, one for each subblock color [0,3]. The 2-bit ETC1 selectors basically select a color from one of these images.

Within an image, there are 8 rows, one for each of the ETC1 intensity tables. Within a row, there are 32 small "tiles" for blue, and within each little 32x32 tile is red (X) and green (Y).
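
The layout described above can be sketched as a small Python function that maps an image pixel back to the subblock color it shows. This is only an illustration: the function names are hypothetical, and it assumes the color index [0,3] picks modifiers in sorted order (darkest to brightest), which is not how the raw ETC1 selector bits are ordered.

```python
ETC1_INTENSITY_TABLES = [
    (-8, -2, 2, 8), (-17, -5, 5, 17), (-29, -9, 9, 29), (-42, -13, 13, 42),
    (-60, -18, 18, 60), (-80, -24, 24, 80), (-106, -33, 33, 106),
    (-183, -47, 47, 183),
]

IMAGE_WIDTH = 32 * 32   # 32 blue tiles, each 32 pixels wide (red along X)
IMAGE_HEIGHT = 8 * 32   # 8 intensity table rows, each 32 pixels tall (green along Y)


def expand5(c5):
    """Expand a 5-bit component to 8 bits by replicating its top 3 bits."""
    return (c5 << 3) | (c5 >> 2)


def clamp255(v):
    return max(0, min(255, v))


def pixel_at(x, y, color_index):
    """Return the 8-bit RGB subblock color shown at (x, y) of image color_index."""
    table, g5 = divmod(y, 32)   # row of tiles = intensity table, Y in tile = green
    b5, r5 = divmod(x, 32)      # tile within row = blue, X in tile = red
    modifier = ETC1_INTENSITY_TABLES[table][color_index]
    return tuple(clamp255(expand5(c5) + modifier) for c5 in (r5, g5, b5))


def render_subcolor_image(color_index):
    """Build one full image as a list of rows of RGB tuples."""
    return [[pixel_at(x, y, color_index) for x in range(IMAGE_WIDTH)]
            for y in range(IMAGE_HEIGHT)]


print(pixel_at(33, 32, 2))
```

Each of the 4 images comes out at 1024x256 pixels with this layout.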





FasTC library

This library looks great. It supports a bunch of common formats (ETC1, DXT, PVRTC, etc.), although not all of them for encoding yet:

https://github.com/GammaUNC/FasTC

Thursday, September 15, 2016

Google's new ETC2 codec looks awesome

I've worked with many of the authors of this at one time or another:

Building a blazing fast ETC2 compressor

Repo:
https://github.com/google/etc2comp

(I can't believe the Mali encoder was only single threaded!)

Wednesday, September 14, 2016

ETC1 principal axis optimization

One potential (probably minor) optimization to ETC1 encoding: determine the principal axis of the entire texture, rotate the texture's RGB pixels (by treating them as 3D vectors) so this axis is aligned along the grayscale axis, then compress the texture as usual. The pixel shader can undo the rotation using a trivial handful of instructions.

ETC1 uses colorspace lines constrained to be parallel to the grayscale axis, which this optimization exploits.
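
Here's a rough Python sketch of the math, not a tested encoder feature. It estimates the principal axis with power iteration on the RGB covariance matrix and builds a rotation onto the grayscale axis with Rodrigues' formula. All names are hypothetical, and the sketch ignores one practical detail: rotated values can leave [0,255], so a real implementation would also need an offset/scale step that the shader undoes.

```python
import math

GRAY_AXIS = (1 / math.sqrt(3),) * 3


def principal_axis(pixels, iterations=64):
    """Estimate the dominant direction of RGB pixels via power iteration."""
    n = len(pixels)
    mean = [sum(p[i] for p in pixels) / n for i in range(3)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pixels) / n
            for j in range(3)] for i in range(3)]
    v = [0.6, 0.64, 0.48]  # arbitrary unit-length starting guess
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:   # degenerate data (or an unlucky starting guess)
            break
        v = [x / norm for x in w]
    # The axis sign is arbitrary; pick the one pointing toward gray.
    if sum(a * g for a, g in zip(v, GRAY_AXIS)) < 0:
        v = [-x for x in v]
    return v


def rotation_to_gray(axis):
    """3x3 rotation taking the unit vector 'axis' onto the grayscale axis."""
    a, g = axis, GRAY_AXIS
    k = (a[1] * g[2] - a[2] * g[1],
         a[2] * g[0] - a[0] * g[2],
         a[0] * g[1] - a[1] * g[0])          # cross product a x g
    c = sum(x * y for x, y in zip(a, g))     # cosine of the angle, >= 0 here
    K = [[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    return [[(1.0 if i == j else 0.0) + K[i][j] + K2[i][j] / (1.0 + c)
             for j in range(3)] for i in range(3)]


def transform(matrix, v):
    return [sum(matrix[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(matrix):
    return [[matrix[j][i] for j in range(3)] for i in range(3)]


# A reddish gradient: its principal axis is (1, 0, 0).
pixels = [(t, 10, 10) for t in range(0, 256, 16)]
axis = principal_axis(pixels)
rotation = rotation_to_gray(axis)
print(transform(rotation, axis))  # should land on the gray axis
```

Because the matrix is a pure rotation, its inverse is just its transpose, which is why the shader side only needs a few multiply-adds (three dot products) per pixel.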

etcpak

etcpak is a very fast, but low quality ETC1 (and a little bit of ETC2) compressor:

https://bitbucket.org/wolfpld/etcpak/wiki/Home

It's the fastest open source ETC1 encoder that I'm aware of.

Notice the lack of any PSNR/MSE/SSIM statistics anywhere (that I can see). Also, the developer doesn't seem to get that the other tools/libraries he compares his stuff against were optimized for quality, not raw speed. In particular, rg_etc1 (and crunch's ETC1 support) was tuned to compete against the reference encoder along both the quality and perf. axes.

Anyhow, there are some interesting things to learn from etcpak:

  • Best quality doesn't always matter. It obviously depends on your use case. If you have 10 gigs of textures to compress then iteration speed can be very important.
  • The value spectrum spans from highest quality/slow encode (to ship final assets) to crap quality/fast as hell encode (favoring iteration speed). 
  • Visually, the ETC1/2 formats are nicely forgiving. Even a low quality ETC1 encoder produces decent enough looking output for many use cases.

Sunday, September 11, 2016

Idea for next texture compression experiment

Right now, I've got a GPU texture in a simple ETC1 subset that is easily converted to most other GPU formats:

Base color: 15-bits, 5:5:5 RGB
Intensity table index: 3-bits
Selectors: 2-bits/texel

Most importantly, this is a "single subset" encoding, using BC7 terminology. BC7 supports 1 to 3 subsets per block. A subset is just a colorspace line represented by two R,G,B endpoint colors.
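
To make the base format concrete, here's a small Python sketch that decodes one such block into 16 texels. The BaseBlock container and function names are hypothetical, and it assumes selectors index the intensity table's modifiers in sorted order (0 = darkest, 3 = brightest) rather than using the raw ETC1 selector bit ordering.

```python
from dataclasses import dataclass
from typing import List, Tuple

ETC1_INTENSITY_TABLES = [
    (-8, -2, 2, 8), (-17, -5, 5, 17), (-29, -9, 9, 29), (-42, -13, 13, 42),
    (-60, -18, 18, 60), (-80, -24, 24, 80), (-106, -33, 33, 106),
    (-183, -47, 47, 183),
]

# 15-bit base color + 3-bit table index + 16 texels at 2 bits each.
BITS_PER_BLOCK = 15 + 3 + 16 * 2


@dataclass
class BaseBlock:
    base555: Tuple[int, int, int]  # 5:5:5 RGB base color
    table: int                     # intensity table index, 0..7
    selectors: List[int]           # 16 values in 0..3, row-major 4x4


def expand5(c5):
    """Expand a 5-bit component to 8 bits by replicating its top 3 bits."""
    return (c5 << 3) | (c5 >> 2)


def decode_base_block(block):
    """Return the 16 decoded RGB texels of a single-subset block."""
    # The 4 block colors: base color plus each modifier, clamped to [0, 255].
    colors = [
        tuple(max(0, min(255, expand5(c5) + m)) for c5 in block.base555)
        for m in ETC1_INTENSITY_TABLES[block.table]
    ]
    return [colors[s] for s in block.selectors]


block = BaseBlock(base555=(16, 16, 16), table=0, selectors=[0, 1, 2, 3] * 4)
texels = decode_base_block(block)
print(BITS_PER_BLOCK, texels[:4])
```

That works out to 50 bits per 4x4 block before any entropy coding.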

This format is easily converted to DXT1 using a table lookup. It's also the "base" of the universal GPU texture format I've been thinking about, because it's the data needed for DXT1 support. The next step is to experiment with attempting to refine this base data to better take advantage of the full ETC1 specification. So let's try adding two subsets to each block, with two partitions (again using BC7 terminology), top/bottom or left/right, which are supported by both ETC1 and BC7.

For example, we can code this base color, then delta code the 2 subset colors relative to this base. We'll also add a couple more intensity indices, which can be delta coded against the base index. Another bit can indicate which ETC1 block color encoding "mode" should be used (individual 4:4:4 4:4:4 or differential 5:5:5 3:3:3) to represent the subset colors in the output block.
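
As a sketch of how that mode bit could be chosen: ETC1's differential mode stores the second subblock color as a signed 3-bit delta per component, so it only works when every delta fits in [-4, 3]; otherwise the block falls back to individual 4:4:4 colors. The function below is a hypothetical illustration that takes both subset colors at 5-bit precision, and the 5-bit to 4-bit step is plain truncation, which a real encoder would likely replace with something smarter.

```python
def choose_etc1_color_mode(color_a5, color_b5):
    """Pick an ETC1 color mode for two 5:5:5 subset colors.

    Returns ("differential", base555, delta333) when every component delta
    fits the signed 3-bit range, else ("individual", a444, b444).
    """
    deltas = tuple(b - a for a, b in zip(color_a5, color_b5))
    if all(-4 <= d <= 3 for d in deltas):
        return ("differential", tuple(color_a5), deltas)
    # Fall back to two independent 4:4:4 colors (crude truncation, lossy).
    a444 = tuple(c >> 1 for c in color_a5)
    b444 = tuple(c >> 1 for c in color_b5)
    return ("individual", a444, b444)


print(choose_etc1_color_mode((10, 10, 10), (12, 9, 13)))
print(choose_etc1_color_mode((10, 10, 10), (20, 10, 10)))
```

The first call stays in differential mode; the second has a red delta of 10, which is out of range, so it drops to individual mode.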

In DXT1 mode, we can ignore this extra delta coded data and just convert the basic (single subset) base format. In ETC1/BC7/ASTC modes, we can use the extra information to support 2 subsets and 2 partitions.

Currently, the idea is to share the same selector indices between the single subset (DXT1) and two subset (BC7/ASTC/full ETC1) encodings. This will constrain how well this idea works, but I think it's worth trying out.

To add more quality to the 2 subset mode, we can delta code (maybe with some fancy per-pixel prediction) another array of selectors in some way. We can also add support for more partitions (derived from BC7's or ASTC's), too.

Saturday, September 10, 2016

Hierarchical clustering

One of the key algorithms in crunch is determining how to group together block endpoints into clusters. Crunch uses a bottom up clustering approach at the 8x8 pixel (or 2x2 DXTn block) "macroblock" level, then it switches to top down. The top down method is extremely sensitive to the vectors chosen to represent each block during the clusterization step. The algorithm crunch uses to compute representative vectors (used only during clusterization) was refined and tweaked over time. Badly chosen representative vectors cause the clustering step to produce crappy clusters (i.e. nasty artifacts).

Anyhow, an alternative approach would be entirely bottom up. I think this method could require less tweaking. Some reading:

https://en.wikipedia.org/wiki/Hierarchical_clustering

https://onlinecourses.science.psu.edu/stat505/node/143

Also Google "agglomerative hierarchical clustering". Here's a YouTube video describing it.
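
For reference, here's a naive Python sketch of the entirely bottom up idea. It is not how crunch works: it uses centroid linkage (merge the two clusters whose centroids are closest), which is just one of several possible linkage choices, and it rescans every pair on each merge, so it's only practical for small inputs. The function name and the input vectors are made up for illustration.

```python
def agglomerative_clusters(vectors, target_count):
    """Merge the closest pair of clusters until target_count clusters remain.

    Returns a list of clusters, each a list of indices into 'vectors'.
    """
    target_count = max(1, target_count)
    clusters = [[i] for i in range(len(vectors))]  # start: one cluster per vector

    def centroid(members):
        dim = len(vectors[0])
        return [sum(vectors[i][d] for i in members) / len(members)
                for d in range(dim)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    while len(clusters) > target_count:
        centroids = [centroid(c) for c in clusters]
        best = None
        # Find the pair of clusters with the closest centroids.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist2(centroids[i], centroids[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical endpoint vectors: three dark colors and two bright ones.
endpoints = [(0, 0, 0), (1, 1, 1), (2, 0, 1), (250, 250, 250), (255, 254, 253)]
print(agglomerative_clusters(endpoints, 2))
```

With this input, the dark and bright endpoints end up in separate clusters without any representative vectors having to be tuned by hand.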