Saturday, September 3, 2016

ETC1 block color clusterization experiment

Intro


ETC1 is a well thought out, elegant little GPU format. In my experience a few years ago writing a production-quality ETC1 block encoder, I found it to be far less fiddly than DXT1. Both use 64 bits to represent a 4x4 texel block, or 4 bits per texel.

I've been very curious how hard it would be to add ETC1/2 support to crunch. Also, many people have asked about ETC1 support, which is guaranteed to be available on OpenGL ES 2.0 compatible Android devices. crunch currently only supports the DXT1/5/N (3DC) texture formats. crunch's higher level classes are highly specific to the DXT formats, so adding a new format is not trivial.

One of the trickier (and key) problems in adding a new GPU format to crunch is figuring out how to group blocks (using some form of cluster analysis) so they can share the same endpoints. GPU formats like DXT1 and ETC1 are riddled with block artifacts, and bad groupings can greatly amplify them. crunch for DXT has an endpoint clusterization algorithm that was refined over many tens of thousands of real-life game textures and satellite photography. I've just begun experimenting with ETC1, and so far I'm very impressed with how well behaved and versatile it is.

Note this experiment was conducted in a new data compression codebase I've been building, which is much larger than crunch's.

ETC1 Texture Compression


Unlike DXT1, which only supports 3 or 4 unique block colors, the ETC1 format supports up to 8 unique block colors. It divides the block into two 4x2 or 2x4 pixel "subblocks". A single "flip" bit controls whether the subblocks are oriented horizontally or vertically. Each subblock has 4 colors, for 8 total.

The 4 subblock colors are created by taking the subblock's base color and adding to it one of 4 signed grayscale offsets from an intensity table, with the results clamped to [0, 255]. Each subblock has a 3-bit index that selects which intensity table to apply. The intensity tables are constant and defined by the spec.

To encode the two subblock base colors, ETC1 supports two modes: an "individual" mode, where each color is encoded to 4:4:4, and a "differential" mode, where the first color is encoded to 5:5:5 and the second color is a two's complement encoded 3:3:3 delta relative to the first. The delta is applied to the 5-bit components, before the base color is scaled up to 8 bits.
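To make the differential path concrete, here's a minimal decode-side sketch of how a subblock's 4 colors are derived. It assumes a valid block (the 5-bit sums stay in [0, 31]); get_subblock_colors() is an illustrative helper, not rg_etc1's API, and the colors are listed from lowest to highest rather than in selector bit order:

```cpp
#include <algorithm>
#include <cstdint>

// The 8 ETC1 intensity modifier tables, from the OES_compressed_ETC1_RGB8_texture spec.
static const int g_etc1_modifiers[8][4] = {
    {   -8,  -2,  2,   8 }, {  -17,  -5,  5,  17 },
    {  -29,  -9,  9,  29 }, {  -42, -13, 13,  42 },
    {  -60, -18, 18,  60 }, {  -80, -24, 24,  80 },
    { -106, -33, 33, 106 }, { -183, -47, 47, 183 }
};

struct Color { uint8_t r, g, b; };

// Expand a 5-bit component to 8 bits by replicating the high bits.
static inline int expand5(int v) { return (v << 3) | (v >> 2); }

// Derive the 4 colors of a differential-mode subblock. The 3:3:3 delta
// (each component in [-4, 3], zero for the first subblock) is added to the
// 5-bit base components, then the result is expanded to 8 bits and offset
// by the selected intensity table, clamping to [0, 255].
void get_subblock_colors(int r5, int g5, int b5,   // base color, 5 bits/component
                         int dr, int dg, int db,   // 3:3:3 delta
                         int table,                // intensity table index, 0..7
                         Color out[4])
{
    const int r = expand5(r5 + dr), g = expand5(g5 + dg), b = expand5(b5 + db);
    for (int i = 0; i < 4; i++) {
        const int m = g_etc1_modifiers[table][i];
        out[i].r = (uint8_t)std::min(255, std::max(0, r + m));
        out[i].g = (uint8_t)std::min(255, std::max(0, g + m));
        out[i].b = (uint8_t)std::min(255, std::max(0, b + m));
    }
}
```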

From an encoding perspective, individual mode is most useful when the two subblocks have wildly different colors (favoring color diversity over encoding precision), while differential mode is most useful when encoding precision matters more than diversity.

Each pixel is represented using a 2-bit selector, just like DXT1, except that in ETC1 the set of colors a selector picks from depends on which subblock the pixel lies in.

So that's ETC1 in a nutshell. In practice, from what I remember, its quality is a little lower than DXT1's, but not by much. Its artifacts look more pleasant to me than DXT1's (obviously subjective). Each ETC1 block is represented by 2 colorspace lines that are always parallel to the grayscale axis. By comparison, a DXT1 block has only a single line, but it can point in any direction, which perhaps gives DXT1 a slight advantage.

ETC1 Endpoint Clusterization


The goal here is to figure out how to reduce the total number of unique endpoints (or block colors and intensity table indices) in an ETC1-encoded image without murdering the quality. This is just an early experiment, so let's simplify the ETC1 format itself to keep things manageable. This experiment always uses differential block color mode, with the delta color set to (0,0,0). So each subblock is represented using the same 5:5:5 color and the same intensity table, and the flip bit is always false. Obviously this is going to lower quality, but let's see what happens. Note that this simplified format is still 100% compatible with existing ETC1 decoders; we're just limiting ourselves to a simpler subset of the format.

Here's the original image (kodim18 - because I remember this image being a pain to handle well in crunch for DXT1):


Here's the image encoded using high quality ETC1 compression (using rg_etc1, slow mode, perceptual colorspace metrics):


Delta:

Grayscale delta histogram:


Error: Max:  56, Mean: 2.827, MSE: 16.106, RMSE: 4.013, PSNR: 36.061

So an encoding that takes advantage of all ETC1 features comes in at 36.061 dB.
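For reference, these error stats follow the usual definitions. Here's a minimal sketch, assuming the metrics are computed over all 8-bit RGB components (the exact weighting my tool applies isn't shown here):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct ErrorStats { int max_err; double mean, mse, rmse, psnr; };

// Accumulate per-component absolute deltas between two 8-bit images, then
// derive the mean, MSE, RMSE, and PSNR (PSNR = 10*log10(255^2 / MSE)).
ErrorStats compute_error(const uint8_t* a, const uint8_t* b, size_t n)
{
    ErrorStats s = { 0, 0, 0, 0, 0 };
    double sum = 0, sum2 = 0;
    for (size_t i = 0; i < n; i++) {
        const int d = std::abs((int)a[i] - (int)b[i]);
        s.max_err = std::max(s.max_err, d);
        sum += d;
        sum2 += (double)d * d;
    }
    s.mean = sum / n;
    s.mse  = sum2 / n;
    s.rmse = std::sqrt(s.mse);
    s.psnr = s.mse > 0 ? 10.0 * std::log10(255.0 * 255.0 / s.mse) : 999.0;
    return s;
}
```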

Here's the encoding using just diff mode, no flipping, with a (0,0,0) delta color:


Delta:


Grayscale delta histogram:


Error: Max:  74, Mean: 3.638, MSE: 27.869, RMSE: 5.279, PSNR: 33.680

So we've lost 2.38 dB by limiting ourselves to this simpler subset of ETC1. The reduction in quality is obviously visible, but by no means fatal for the purposes of this quick experiment. 

In this experiment, each ETC1 block contains only 4 unique colors (a single colorspace line, with "low" and "high" endpoints and 2 intermediate colors). Here's a visualization of the "low" and "high" endpoints in this image:



Now let's clusterize these block color endpoints using 6D tree-structured VQ (vector quantization). The output of this step is a series of clusters, where each cluster contains one or more block indices. The idea is that blocks with similar endpoint vectors will be placed into the same cluster. This is similar to the process used by crunch for DXT1. It's much like generating an RGB color palette from an array of image colors, except we're dealing with 6D vectors instead of 3D color vectors, and instead of using the output palette directly, all we really care about is how the input vectors are grouped.
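Here's a sketch of that process. The 6D vector for each block is just its "low" and "high" endpoint colors concatenated (reusing get_subblock_colors() from the earlier sketch), and plain k-means stands in here for the tree-structured VQ the experiment actually uses:

```cpp
#include <array>
#include <cfloat>
#include <vector>

typedef std::array<float, 6> Vec6;

// Build a block's 6D endpoint vector: the lowest and highest of its 4 colors
// (base + smallest modifier, base + largest modifier), using a zero delta
// since that's all our subset allows.
Vec6 make_endpoint_vec(int r5, int g5, int b5, int table)
{
    Color c[4];
    get_subblock_colors(r5, g5, b5, 0, 0, 0, table, c);
    return Vec6{ (float)c[0].r, (float)c[0].g, (float)c[0].b,
                 (float)c[3].r, (float)c[3].g, (float)c[3].b };
}

static float dist2(const Vec6& a, const Vec6& b)
{
    float d = 0;
    for (int i = 0; i < 6; i++) { const float t = a[i] - b[i]; d += t * t; }
    return d;
}

// Group the blocks' 6D vectors into clusters. The real experiment uses
// tree-structured VQ (top-down cluster splitting); k-means is shown as a
// simpler stand-in with the same end result: an assignment of each block
// to a cluster of blocks with similar endpoints.
std::vector<int> cluster_endpoints(const std::vector<Vec6>& vecs,
                                   std::vector<Vec6>& centroids, int iters)
{
    std::vector<int> assign(vecs.size(), 0);
    for (int it = 0; it < iters; it++) {
        // Assignment step: nearest centroid per block.
        for (size_t i = 0; i < vecs.size(); i++) {
            float best = FLT_MAX;
            for (size_t c = 0; c < centroids.size(); c++) {
                const float d = dist2(vecs[i], centroids[c]);
                if (d < best) { best = d; assign[i] = (int)c; }
            }
        }
        // Update step: move each centroid to the mean of its members.
        std::vector<Vec6> sum(centroids.size(), Vec6{});
        std::vector<int> count(centroids.size(), 0);
        for (size_t i = 0; i < vecs.size(); i++) {
            for (int j = 0; j < 6; j++) sum[assign[i]][j] += vecs[i][j];
            count[assign[i]]++;
        }
        for (size_t c = 0; c < centroids.size(); c++)
            if (count[c])
                for (int j = 0; j < 6; j++) centroids[c][j] = sum[c][j] / count[c];
    }
    return assign;
}
```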

Here's a visualization of the cluster endpoint centroid vectors after generating 32 clusters:



Once we have the image organized into block clusters containing similar endpoints, we use an internal helper class within rg_etc1 to find the near-optimal 5:5:5 endpoint and intensity table that best represent all the pixels within each cluster. We can then create an ETC1-compatible texture by processing each block cluster and selecting the optimal selectors to use for each pixel.
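Once a cluster's base color and intensity table are fixed, the per-pixel selector step is simple. Here's a sketch (the endpoint/table search itself is done by rg_etc1's internal optimizer and isn't shown; Color is reused from the earlier sketch):

```cpp
#include <climits>

// Given the cluster's 4 colors (derived from its optimized 5:5:5 base color
// and intensity table), pick the least-error 2-bit selector for each pixel
// of a 4x4 block, using plain squared distance in RGB space.
void pick_selectors(const Color pixels[16], const Color cluster_colors[4],
                    uint8_t selectors[16])
{
    for (int i = 0; i < 16; i++) {
        int best_err = INT_MAX, best_s = 0;
        for (int s = 0; s < 4; s++) {
            const int dr = pixels[i].r - cluster_colors[s].r;
            const int dg = pixels[i].g - cluster_colors[s].g;
            const int db = pixels[i].b - cluster_colors[s].b;
            const int err = dr * dr + dg * dg + db * db;
            if (err < best_err) { best_err = err; best_s = s; }
        }
        selectors[i] = (uint8_t)best_s;
    }
}
```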

Let's see what this texture looks like, and the PSNR, after limiting the number of unique endpoints.

ETC1 (subset) with 64 unique endpoints:



Error: Max: 110, Mean: 5.865, MSE: 70.233, RMSE: 8.380, PSNR: 29.665


ETC1 (subset) with 256 unique endpoints:



Error: Max:  93, Mean: 4.624, MSE: 45.889, RMSE: 6.774, PSNR: 31.514


ETC1 (subset) with 512 unique endpoints:



Error: Max:  87, Mean: 4.225, MSE: 38.411, RMSE: 6.198, PSNR: 32.286

ETC1 (subset) with 1024 unique endpoints:



Error: Max:  87, Mean: 3.911, MSE: 32.967, RMSE: 5.742, PSNR: 32.950

ETC1 (subset) with 4096 unique endpoints:



Error: Max:  87, Mean: 3.642, MSE: 28.037, RMSE: 5.295, PSNR: 33.654


Next Steps


This experiment shows one way to clusterize the endpoint optimization process for a limited subset of the ETC1 format. This first step must be mastered before crunch for ETC1 can be written.

The clusterization step outlined here isn't aware of flipping, or of the fact that each block can have 2 block colors, and we haven't even looked at the selectors yet. A production encoder will need to support more features of the ETC1 format. Note that crunch for DXT1 doesn't support 3-color blocks and works just fine, so it's possible we don't need to support every encoding feature.

Some next steps:

- Figure out how to best clusterize the full format: expand the format subset to include two block colors, flipping, and both encoding modes. Is 6D clusterization good enough, or is 12D needed?
- Selector clusterization
- ETC1-specific refinement stages: refine the clusterized selectors based off the clusterized endpoints, then refine the clusterized endpoints based off the clusterized selectors, possibly repeating (see the sketch after this list).
- crunch-style tiling ("macroblocking") will most likely be needed to get bitrate down to JPEG+real-time encoding competitive levels.
- ETC2 support
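To illustrate the refinement idea from the third bullet, here's a rough alternating-optimization sketch. All of the type and function names here (Cluster, optimize_cluster_endpoint(), pick_cluster_selectors(), etc.) are hypothetical, not part of any existing codebase:

```cpp
#include <cstdint>
#include <vector>

struct Cluster; // a cluster's block indices, base color, and table (hypothetical)
struct Image;   // source pixels (hypothetical)

// Hypothetical helpers: re-derive a cluster's 5:5:5 base color + intensity
// table from the pixels currently assigned to it, and re-pick selectors
// against the new colors, returning the cluster's summed squared error.
void     optimize_cluster_endpoint(Cluster& c, const Image& img);
uint64_t pick_cluster_selectors(Cluster& c, const Image& img);

// Alternate the two steps until the total error stops improving.
void refine(std::vector<Cluster>& clusters, const Image& img, int max_passes)
{
    uint64_t prev_err = UINT64_MAX;
    for (int pass = 0; pass < max_passes; pass++) {
        uint64_t err = 0;
        for (Cluster& c : clusters) {
            optimize_cluster_endpoint(c, img);
            err += pick_cluster_selectors(c, img);
        }
        if (err >= prev_err) break; // converged (or oscillating); stop
        prev_err = err;
    }
}
```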

(Currently, I'm conducting these experiments in my spare time, in between VR and optimization contracts. If you're really interested in accelerating development of crunch for a specific GPU format please contact info@binomial.info.)
