Saturday, September 17, 2016

Quick etcpak quality test

etcpak is a useful and really fast ETC1 (and some of 2) texture compressor. There is no such thing as a free lunch however, and there are some tradeoffs involved here. Quick example:

Original (kodim03):


crnlib in ETC1 uber mode (8.067 seconds):


RGB: Error: Max:  88, Mean: 2.086, MSE: 9.770, RMSE: 3.126, PSNR: 38.232, SSIM: 0.982703
Y: Error: Max:  34, Mean: 1.304, MSE: 3.750, RMSE: 1.936, PSNR: 42.391, SSIM: 0.982703

etcpak, ETC1 mode only (.006 seconds):


RGB: Error: Max:  80, Mean: 2.492, MSE: 12.757, RMSE: 3.572, PSNR: 37.073, SSIM: 0.980072
Y: Error: Max:  49, Mean: 1.494, MSE: 4.996, RMSE: 2.235, PSNR: 41.144, SSIM: 0.980072

Note I've integrated etcpak directly into my project, and used the BlockData class directly. This thing is *fast*, even without threading!

crnlib has several lower quality settings that are much faster (and still higher quality than etcpak), but nowhere near the speed of etcpak. I've not been focused on pure speed, but on quality and unique features like RDO and intermediate formats like .CRN.

I think the primary value of etcpak is its high performance and relatively compact code size (especially for an ETC2-aware compressor). On many textures/images it'll look perfectly fine. Next up is ETC2Comp, limited to ETC1 mode.

ETC1 with 3D/4D random-restart hill climbing

For fun, I implemented a full ETC1 block encoder using random-restart hill climbing, to see how it behaves compared to my current custom optimizer (the one in rg_etc1). This method works surprisingly well and is quite simple. (Note I'm switching to luma PSNR, because I've been using perceptually weighted color distance. My previous posts used average RGB PSNR.)

The number of attempts per block is fixed. The first 4D hill climb always starts at the subblock's average color, with an intensity table index of 3. The second 4D hill climb starts at a random color/intensity. (In differential mode, the 2nd subblock's hill climb position is constrained to lie near the first one, otherwise we can't code it.) Eventually, it switches from 4D to 3D hill climbing, by randomly climbing only within the best found intensity plane.

The nearly-best ETC1 encoding (using rg_etc1 - not hill climbing) was 38.053 dB:
Y: Error: Max:  38, Mean: 2.181, MSE: 10.181, RMSE: 3.191, PSNR: 38.053, SSIM: 0.983632

- 1 hill climb, 33.998 dB:


Y: Error: Max:  35, Mean: 3.943, MSE: 25.901, RMSE: 5.089, PSNR: 33.998, SSIM: 0.933979

- 2 hill climbs, 37.808 dB:



Y: Error: Max:  33, Mean: 2.281, MSE: 10.770, RMSE: 3.282, PSNR: 37.808, SSIM: 0.980324

- 4 hill climbs, 37.818 dB:

 Y: Error: Max:  33, Mean: 2.280, MSE: 10.748, RMSE: 3.278, PSNR: 37.818, SSIM: 0.980280

- 16 hill climbs, 37.919 dB:

Y: Error: Max:  38, Mean: 2.241, MSE: 10.499, RMSE: 3.240, PSNR: 37.919, SSIM: 0.981631

That 2nd random 4D hill climb helps a lot. Quality quickly plateaus however, at least on this image, and subsequent climbs don't add much. Very interestingly to me, even just 4 climbs nearly matches the quality of my hand-tuned ETC1 optimizer.

Friday, September 16, 2016

Visualizing ETC1 block encoding error as a 4D function

Given a particular 4x4 pixel block, what does the error of all possible ETC1 5:5:5 base color+3-bit intensity encodings look like? The resulting 4D visualization could inspire better optimization algorithms.

To compute these images, I created an ETC1 block in differential mode (5:5:5 base color with a 3:3:3 delta), set the base color to R,G,B, the diff color to (0,0,0), and set both subblock intensity table values to the same index from 0-7. I then encoded the source pixels (by finding the optimal selectors for each pixel), decoded them, and computed the overall block error (as perceptually R,G,B weighted color distance).

These visualizations are linear, where the brightest value (255) is max error, black is 0 error. The blocks used to compute each visualization are here too:










Finding the "best" block color+intensity table index to use in a subblock is basically a 4D search through functions like above. Hill climbing optimization seems useful, except for those pesky local minimums. For fun, I've already tried random-restart hill climbing, and it works, but there's got to be a better way.

rg_etc1 starts at the block's average color and scans outwards along the RGB axes, trying to find better colors. It always tries all 8 intensity tables every time it tries a candidate color (which in retrospect seems wildly inefficient, but hey I wrote it over a weekend years ago). It also has several refinement steps. One of them factors in the selectors of the best color found so far, in an attempt to improve the current block color. rg_etc1 ran circles around Mali's reference encoder, from what I remember, which was my goal.

ETC1 texture format visualizations

I've been thinking about how to improve my ETC1 block encoder's quality. What little curiosities lie inside this seemingly simple format?

Hmm: Out of all possible ETC1 subblock colors in 5:5:5 differential mode, how many involve clamping R, G, and/or B to 0 or 255? Turns out, 72% (189704 out of 262144) of the possibilities involve clamping one or more components. That's much more often than I thought!

Here's a bitmap visualizing when the clamping occurs on any of the 4 block colors encoded by each 5:5:5 base color/3-bit intensity table combination. White pixels signify that one or more color components had to be clamped, and black signifies no clamping:


The basic assumption that each ETC1 subblock color lies nicely spread out along a single colorspace line isn't accurate, due to [0,255] clamping. So any optimization techniques written with this assumption in mind could be missing better solutions. Also, this impacts converting ETC1 to other formats like DXT1, because both endpoints of each colorspace line in DXT1 are separately encoded. Is this really a big deal? I dunno, but it's good to know.

Anyhow, here's a visualization of all possible subcolors. First, there are 4 images, one for each subblock color [0,3]. The 2-bit ETC1 selectors basically selector a color from one of these images.

Within an image, there are 8 rows, one for each of the ETC1 intensity tables. Within a row, there are 32 small "tiles" for blue, and within each little 32x32 tile is red (X) and green (Y).





FasTC library

This library, which supports a bunch of common (ETC1, DXT, PVRTC, etc.) formats (not all for encoding yet though) looks great:

https://github.com/GammaUNC/FasTC

Thursday, September 15, 2016

Google's new ETC2 codec looks awesome

I've worked with many of the authors of this at one time or another:

Building a blazing fast ETC2 compressor

Repo:
https://github.com/google/etc2comp

(I can't believe the Mali encoder was only single threaded!)

Wednesday, September 14, 2016

ETC1 principle axis optimization

One possible potential (probably minor) optimization to ETC1 encoding: determine the principle axis of the entire texture, rotate the texture's RGB pixels (by treating them as 3D vectors) so this axis is aligned along the grayscale axis, then compress the texture as usual. The pixel shader can undo the rotation using a trivial handful of instructions.

ETC1 uses colorspace lines constrained to be parallel to the grayscale axis, which this optimization exploits.