Note: I'm consolidating my ETC1/2 benchmarks to one post here.
Co-owner of Binomial LLC, working on GPU texture interchange. Open source developer, graphics programmer, former video game developer. Worked previously at SpaceX (Starlink), Valve, Ensemble Studios (Microsoft), DICE Canada.
Sunday, September 18, 2016
Saturday, September 17, 2016
Let's evaluate the current state of ETC1/2 compression libraries
For regular block encoders (not RDO or crunch-style systems), I think I need to plot this the way I would a Pareto frontier for lossless compressors, with the Y axis being some measure of quality and the X axis being encoding speed, across a wide range of test textures. Perhaps I can normalize the quality achieved by each encoder at its various settings against the highest achievable quality for each image.
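Here's a rough sketch of what that plot's data prep could look like. The function names are made up for illustration, and dividing PSNR values is only one possible way to normalize quality. The two data points are the kodim03 RGB results quoted elsewhere on this page.

```python
def normalize_quality(psnr_by_encoder):
    """Scale each encoder's PSNR on one image by the best PSNR seen on that image."""
    best = max(psnr_by_encoder.values())
    return {name: psnr / best for name, psnr in psnr_by_encoder.items()}

def pareto_frontier(points):
    """Keep the (name, seconds, quality) points that no faster-or-equal point beats on quality."""
    frontier = []
    best_quality = float("-inf")
    # Walk from fastest to slowest; a point survives only if it improves quality.
    for name, seconds, quality in sorted(points, key=lambda p: (p[1], -p[2])):
        if quality > best_quality:
            frontier.append((name, seconds, quality))
            best_quality = quality
    return frontier

# kodim03, RGB PSNR and encode time as reported in the etcpak quality test.
psnr = {"crnlib uber": 38.232, "etcpak": 37.073}
seconds = {"crnlib uber": 8.067, "etcpak": 0.006}

quality = normalize_quality(psnr)
points = [(name, seconds[name], quality[name]) for name in psnr]
for name, secs, q in pareto_frontier(points):
    print(f"{name}: {secs} s, normalized quality {q:.4f}")
```

With only two encoders both points land on the frontier, since each one wins on one axis. The filter only starts to matter once slower-and-worse settings are added.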
As far as I can tell so far, nobody's beating the quality/performance of etcpak, at its performance point. It's going to be fascinating to compare etcpak vs. ETC2Comp. Let's see how these two compare for pure ETC1 encoding, which is available across a huge range of devices. I'll compare against crnlib's ETC1 block encoder in multithreaded mode, which was released before either etcpak or ETC2Comp.
On 30Hz console games
That framerate feels incredibly low to me now. I've worked on 60Hz and 30Hz console titles, and the optimization efforts required felt very different. Keeping a smooth, hypnotic 60Hz was sometimes extremely tricky. Now, with VR, 30Hz seems so incredibly antiquated.
Quick etcpak quality test
etcpak is a useful and really fast ETC1 (and partial ETC2) texture compressor. There is no such thing as a free lunch, however, and there are some tradeoffs involved here. Quick example:
Original (kodim03):
crnlib in ETC1 uber mode (8.067 seconds):
RGB: Error: Max: 88, Mean: 2.086, MSE: 9.770, RMSE: 3.126, PSNR: 38.232, SSIM: 0.982703
Y: Error: Max: 34, Mean: 1.304, MSE: 3.750, RMSE: 1.936, PSNR: 42.391, SSIM: 0.982703
etcpak, ETC1 mode only (0.006 seconds):
RGB: Error: Max: 80, Mean: 2.492, MSE: 12.757, RMSE: 3.572, PSNR: 37.073, SSIM: 0.980072
Y: Error: Max: 49, Mean: 1.494, MSE: 4.996, RMSE: 2.235, PSNR: 41.144, SSIM: 0.980072
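For anyone checking the stats above: the RMSE and PSNR columns follow directly from the MSE column, assuming 8-bit samples (peak value 255). A minimal sketch, with helper names of my own choosing:

```python
import math

def rmse_from_mse(mse):
    """Root mean squared error."""
    return math.sqrt(mse)

def psnr_from_mse(mse, peak=255.0):
    """PSNR in dB for samples with the given peak value; infinite for identical images."""
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)

# RGB MSE values from the two encodings above.
for label, mse in (("crnlib uber", 9.770), ("etcpak", 12.757)):
    print(f"{label}: RMSE {rmse_from_mse(mse):.3f}, PSNR {psnr_from_mse(mse):.3f} dB")
```

So the 1.16 dB RGB PSNR gap between the two encoders corresponds to etcpak having about 30% higher mean squared error on this image.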
Note I've integrated etcpak directly into my project, and used the BlockData class directly. This thing is *fast*, even without threading!
crnlib has several lower quality settings that are much faster (and still higher quality than etcpak), but nowhere near the speed of etcpak. I've not been focused on pure speed, but on quality and unique features like RDO and intermediate formats like .CRN.
I think the primary value of etcpak is its high performance and relatively compact code size (especially for an ETC2-aware compressor). On many textures/images it'll look perfectly fine. Next up is ETC2Comp, limited to ETC1 mode.
ETC1 with 3D/4D random-restart hill climbing
For fun, I implemented a full ETC1 block encoder using random-restart hill climbing, to see how it behaves compared to my current custom optimizer (the one in rg_etc1). This method works surprisingly well and is quite simple. (Note I'm switching to luma PSNR, because I've been using perceptually weighted color distance. My previous posts used average RGB PSNR.)
The number of attempts per block is fixed. The first 4D hill climb always starts at the subblock's average color, with an intensity table index of 3. The second 4D hill climb starts at a random color/intensity. (In differential mode, the 2nd subblock's hill climb position is constrained to lie near the first one, otherwise we can't code it.) Eventually, it switches from 4D to 3D hill climbing, by randomly climbing only within the best found intensity plane.
The nearly-best ETC1 encoding (using rg_etc1 - not hill climbing) was 38.053 dB:
Y: Error: Max: 38, Mean: 2.181, MSE: 10.181, RMSE: 3.191, PSNR: 38.053, SSIM: 0.983632
- 1 hill climb, 33.998 dB:
Y: Error: Max: 35, Mean: 3.943, MSE: 25.901, RMSE: 5.089, PSNR: 33.998, SSIM: 0.933979
- 2 hill climbs, 37.808 dB:
Y: Error: Max: 33, Mean: 2.281, MSE: 10.770, RMSE: 3.282, PSNR: 37.808, SSIM: 0.980324
- 4 hill climbs, 37.818 dB:
Y: Error: Max: 33, Mean: 2.280, MSE: 10.748, RMSE: 3.278, PSNR: 37.818, SSIM: 0.980280
- 16 hill climbs, 37.919 dB:
Y: Error: Max: 38, Mean: 2.241, MSE: 10.499, RMSE: 3.240, PSNR: 37.919, SSIM: 0.981631
That 2nd random 4D hill climb helps a lot. Quality quickly plateaus, however, at least on this image, and subsequent climbs don't add much. Very interestingly to me, even just 4 climbs nearly match the quality of my hand-tuned ETC1 optimizer.
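To make the idea concrete, here's a minimal sketch of 4D random-restart hill climbing for a single subblock in isolation. It is a simplification, not the encoder used for the numbers above: it uses plain squared RGB error instead of perceptual weighting, ignores the differential-mode constraint between subblocks, and skips the later 3D stage. The function names and test pixels are made up. The modifier values are the ETC1 intensity tables.

```python
import random

# ETC1 intensity modifier tables, indexed by table index then selector.
ETC1_TABLES = [
    (-8, -2, 2, 8), (-17, -5, 5, 17), (-29, -9, 9, 29), (-42, -13, 13, 42),
    (-60, -18, 18, 60), (-80, -24, 24, 80), (-106, -33, 33, 106), (-183, -47, 47, 183),
]

def expand5(c):
    """Expand a 5-bit component to 8 bits by bit replication."""
    return (c << 3) | (c >> 2)

def subblock_error(pixels, r5, g5, b5, table):
    """Squared RGB error when every pixel picks its best of the 4 subblock colors."""
    base = (expand5(r5), expand5(g5), expand5(b5))
    colors = [tuple(min(255, max(0, c + m)) for c in base) for m in ETC1_TABLES[table]]
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in colors) for p in pixels)

def hill_climb(pixels, start):
    """Step +/-1 along R, G, B or table index while the error keeps dropping."""
    pos, err = start, subblock_error(pixels, *start)
    limits = (31, 31, 31, 7)
    improved = True
    while improved:
        improved = False
        for axis in range(4):
            for step in (-1, 1):
                cand = list(pos)
                cand[axis] += step
                if not 0 <= cand[axis] <= limits[axis]:
                    continue
                cand_err = subblock_error(pixels, *cand)
                if cand_err < err:
                    pos, err, improved = tuple(cand), cand_err, True
    return pos, err

def random_restart(pixels, climbs, rng):
    """First climb starts at the average color with table 3; the rest start at random."""
    n = len(pixels)
    avg = tuple(round(sum(p[i] for p in pixels) / n * 31 / 255) for i in range(3))
    best = hill_climb(pixels, avg + (3,))
    for _ in range(climbs - 1):
        start = (rng.randrange(32), rng.randrange(32), rng.randrange(32), rng.randrange(8))
        cand = hill_climb(pixels, start)
        if cand[1] < best[1]:
            best = cand
    return best

# Made-up 4x2 subblock, just to exercise the code.
pixels = [(90, 120, 60), (95, 125, 70), (100, 130, 75), (110, 140, 90),
          (92, 118, 64), (98, 128, 72), (105, 135, 80), (115, 150, 95)]
for climbs in (1, 2, 4, 16):
    print(climbs, random_restart(pixels, climbs, random.Random(1234)))
```

Because the best result so far is always kept, adding climbs can never make the error worse for a given random seed. It just costs more time.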
Friday, September 16, 2016
Visualizing ETC1 block encoding error as a 4D function
Given a particular 4x4 pixel block, what does the error of all possible ETC1 5:5:5 base color+3-bit intensity encodings look like? The resulting 4D visualization could inspire better optimization algorithms.
To compute these images, I created an ETC1 block in differential mode (5:5:5 base color with a 3:3:3 delta), set the base color to R,G,B, the delta color to (0,0,0), and set both subblock intensity table values to the same index from 0-7. I then encoded the source pixels (by finding the optimal selector for each pixel), decoded them, and computed the overall block error (as perceptually weighted R,G,B color distance).
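The procedure above can be sketched roughly as follows. This is an approximation under stated assumptions: the weights default to (1, 1, 1) because the perceptual weights aren't given here, the function names are invented, and scaling each 2D slice by its own peak error is a guess at how the images were normalized.

```python
# ETC1 intensity modifier tables, indexed by table index then selector.
ETC1_TABLES = [
    (-8, -2, 2, 8), (-17, -5, 5, 17), (-29, -9, 9, 29), (-42, -13, 13, 42),
    (-60, -18, 18, 60), (-80, -24, 24, 80), (-106, -33, 33, 106), (-183, -47, 47, 183),
]

def expand5(c):
    """Expand a 5-bit component to 8 bits by bit replication."""
    return (c << 3) | (c >> 2)

def block_error(pixels, r5, g5, b5, table, weights=(1, 1, 1)):
    """Error of a block whose two subblocks share one base color and one table."""
    base = (expand5(r5), expand5(g5), expand5(b5))
    colors = [tuple(min(255, max(0, c + m)) for c in base) for m in ETC1_TABLES[table]]
    total = 0
    for p in pixels:
        # "Encoding" a pixel means picking the selector with the lowest error.
        total += min(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, c)) for c in colors)
    return total

def error_slice(pixels, b5, table, weights=(1, 1, 1)):
    """32x32 grid (rows = green, columns = red) of errors, scaled linearly to 0..255."""
    raw = [[block_error(pixels, r5, g5, b5, table, weights) for r5 in range(32)]
           for g5 in range(32)]
    peak = max(max(row) for row in raw)
    if peak == 0:
        return [[0] * 32 for _ in range(32)]
    return [[round(255 * e / peak) for e in row] for row in raw]

# A made-up solid gray block; real blocks give far more interesting shapes.
block = [(107, 107, 107)] * 16
grid = error_slice(block, b5=13, table=0)
print(min(min(row) for row in grid), max(max(row) for row in grid))
```

Stacking the 32 blue slices for each of the 8 tables gives the full 4D function for one block.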
These visualizations are linear, where the brightest value (255) is max error, black is 0 error. The blocks used to compute each visualization are here too:
Finding the "best" block color+intensity table index to use in a subblock is basically a 4D search through functions like those above. Hill climbing optimization seems useful, except for those pesky local minima. For fun, I've already tried random-restart hill climbing, and it works, but there's got to be a better way.
rg_etc1 starts at the block's average color and scans outwards along the RGB axes, trying to find better colors. It always tries all 8 intensity tables every time it tries a candidate color (which in retrospect seems wildly inefficient, but hey I wrote it over a weekend years ago). It also has several refinement steps. One of them factors in the selectors of the best color found so far, in an attempt to improve the current block color. rg_etc1 ran circles around Mali's reference encoder, from what I remember, which was my goal.
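A selector-driven refinement of this kind could look something like the sketch below. To be clear, this is a guess at the general idea and not a transcription of rg_etc1's code: with the selectors held fixed and clamping ignored, the least-squares base color is simply the mean of each pixel minus its selected modifier. The function name and test data are hypothetical.

```python
def refine_base_color(pixels, selectors, modifiers):
    """Return a 5:5:5 base color refit to the given selectors (clamping ignored)."""
    n = len(pixels)
    # For each channel, average (pixel - modifier chosen for that pixel).
    ideal = [sum(p[ch] - modifiers[s] for p, s in zip(pixels, selectors)) / n
             for ch in range(3)]
    # Quantize the 8-bit ideal color back to 5 bits per channel.
    return tuple(min(31, max(0, round(c * 31 / 255))) for c in ideal)

# Made-up subblock built from base color 107 and intensity table 1.
modifiers = (-17, -5, 5, 17)
selectors = [0, 1, 2, 3, 0, 1, 2, 3]
pixels = [(107 + modifiers[s],) * 3 for s in selectors]
print(refine_base_color(pixels, selectors, modifiers))
```

In practice the refit color would be re-evaluated with freshly chosen selectors, and kept only if the total error actually drops.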
ETC1 texture format visualizations
I've been thinking about how to improve my ETC1 block encoder's quality. What little curiosities lie inside this seemingly simple format?
Hmm: Out of all possible ETC1 subblock colors in 5:5:5 differential mode, how many involve clamping R, G, and/or B to 0 or 255? Turns out, 72% (189704 out of 262144) of the possibilities involve clamping one or more components. That's much more often than I thought!
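That count is easy to reproduce. The sketch below assumes "clamping" means base + modifier falls outside [0,255] for at least one component of at least one of the 4 subblock colors, and that the 5-bit base is expanded to 8 bits by bit replication. Since the same modifier is added to all three components, only the largest modifier in each table matters.

```python
# Largest modifier magnitude in each of the 8 ETC1 intensity tables.
ETC1_MAX_MODIFIER = (8, 17, 29, 42, 60, 80, 106, 183)

def expand5(c):
    """Expand a 5-bit component to 8 bits by bit replication."""
    return (c << 3) | (c >> 2)

def count_clamped():
    """Return (clamped, total) over all 5:5:5 base color / intensity table combinations."""
    clamped = total = 0
    for big in ETC1_MAX_MODIFIER:
        # 5-bit component values that stay inside [0,255] for every modifier in this table.
        safe = sum(1 for c in range(32) if expand5(c) - big >= 0 and expand5(c) + big <= 255)
        total += 32 ** 3
        clamped += 32 ** 3 - safe ** 3   # at least one of R, G, B is not "safe"
    return clamped, total

clamped, total = count_clamped()
print(clamped, total)   # 189704 262144
```

Note that table 7 (modifiers up to 183) clamps for every single base color, and even table 0 clamps whenever a component is 0 or 31.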
Here's a bitmap visualizing when the clamping occurs on any of the 4 block colors encoded by each 5:5:5 base color/3-bit intensity table combination. White pixels signify that one or more color components had to be clamped, and black signifies no clamping:
The basic assumption that each ETC1 subblock color lies nicely spread out along a single colorspace line isn't accurate, due to [0,255] clamping. So any optimization techniques written with this assumption in mind could be missing better solutions. Also, this impacts converting ETC1 to other formats like DXT1, because both endpoints of each colorspace line in DXT1 are separately encoded. Is this really a big deal? I dunno, but it's good to know.
Anyhow, here's a visualization of all possible subcolors. First, there are 4 images, one for each subblock color [0,3]. The 2-bit ETC1 selectors basically select a color from one of these images.
Within an image, there are 8 rows, one for each of the ETC1 intensity tables. Within a row, there are 32 small "tiles" for blue, and within each little 32x32 tile is red (X) and green (Y).
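In code, the layout described above maps a base color and table index to a pixel position roughly like this. The left-to-right and top-to-bottom ordering is my assumption for illustration, and the function name is invented.

```python
def visualization_xy(r5, g5, b5, table):
    """Pixel position of one base color/table combination in a 1024x256 image."""
    x = b5 * 32 + r5      # 32 blue tiles per row, red runs along X inside a tile
    y = table * 32 + g5   # 8 rows of tiles, green runs along Y inside a tile
    return x, y

print(visualization_xy(5, 6, 7, 2))   # (229, 70)
```

Each of the 4 images is then 1024x256 pixels: 32 tiles of 32 pixels across, 8 rows of 32 pixels down.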