Saturday, October 15, 2016

2D Haar Wavelet Transform on GPU texture selector indices

I've been very busy refining my new ETC1 compressor, so I haven't been posting much recently. Today I decided to do something different and played around with 2D Haar 4x4 and 8x8 transforms on ETC1 selector bits. I first tried this years ago on DXT1/BC1 while writing crunch, but unfortunately I never published or used the results.

To apply the Haar transform to selector indices, I prepare the input samples by adding .5 to each selector index (ETC1 selectors range over [0,3]), apply the forward transform, uniformly quantize the coefficients, then apply the inverse transform and truncate the resulting values back to the [0,3] selector range. (The input samples must be shifted by .5, or it won't work.)
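A minimal C++ sketch of that round trip follows. It assumes the selectors of an 8x8 tile are stored in a row-major array and uses the simple average/half-difference form of the Haar transform; the post doesn't say which normalization it uses, and all names here are hypothetical. One plausible reason the .5 shift matters: reconstructed values carry small errors, and truncating something like 1.9999 would yield selector 1 instead of 2, whereas centering each selector in its unit interval leaves room for error in both directions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kN = 8;                      // 8x8 tile of selectors (hypothetical layout)
using Tile = std::array<float, kN * kN>;   // row-major samples / coefficients

// Full multi-level 1D Haar (average / half-difference form) on n samples spaced `stride` apart.
void haar1d_forward(float* v, int n, int stride) {
  assert(n <= kN);
  float tmp[kN];
  for (int len = n; len > 1; len /= 2) {
    const int half = len / 2;
    for (int i = 0; i < half; ++i) {
      const float a = v[(2 * i) * stride], b = v[(2 * i + 1) * stride];
      tmp[i] = (a + b) * 0.5f;         // low-pass: average
      tmp[half + i] = (a - b) * 0.5f;  // high-pass: half difference
    }
    for (int i = 0; i < len; ++i) v[i * stride] = tmp[i];
  }
}

// Exact inverse of haar1d_forward: rebuild each level from its averages and differences.
void haar1d_inverse(float* v, int n, int stride) {
  assert(n <= kN);
  float tmp[kN];
  for (int len = 2; len <= n; len *= 2) {
    const int half = len / 2;
    for (int i = 0; i < half; ++i) {
      const float avg = v[i * stride], diff = v[(half + i) * stride];
      tmp[2 * i] = avg + diff;
      tmp[2 * i + 1] = avg - diff;
    }
    for (int i = 0; i < len; ++i) v[i * stride] = tmp[i];
  }
}

// Separable 2D transform: rows, then columns. The inverse runs in reverse order.
void haar2d_forward(Tile& t) {
  for (int y = 0; y < kN; ++y) haar1d_forward(&t[y * kN], kN, 1);
  for (int x = 0; x < kN; ++x) haar1d_forward(&t[x], kN, kN);
}

void haar2d_inverse(Tile& t) {
  for (int x = 0; x < kN; ++x) haar1d_inverse(&t[x], kN, kN);
  for (int y = 0; y < kN; ++y) haar1d_inverse(&t[y * kN], kN, 1);
}

// Selector -> sample: add .5 so each value sits in the middle of its unit interval.
Tile selectors_to_samples(const std::array<uint8_t, kN * kN>& sel) {
  Tile t{};
  for (int i = 0; i < kN * kN; ++i) t[i] = sel[i] + 0.5f;
  return t;
}

// Sample -> selector: truncate toward zero, then clamp to the ETC1 selector range [0,3].
std::array<uint8_t, kN * kN> samples_to_selectors(const Tile& t) {
  std::array<uint8_t, kN * kN> sel{};
  for (int i = 0; i < kN * kN; ++i) {
    int s = static_cast<int>(t[i]);
    s = s < 0 ? 0 : (s > 3 ? 3 : s);
    sel[i] = static_cast<uint8_t>(s);
  }
  return sel;
}
```

Without quantization the round trip is exact in this sketch, because every intermediate value is a small dyadic fraction that a float represents exactly.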

The quantization stage scales each floating-point coefficient by 4 (to get 2 bits to the right of the binary point, which in my experiments is just enough for 4x4) and converts it to an integer. This integer is then divided by a quantization value, converted back to float, and divided by 4.
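Here's a sketch of that quantization step in C++, applied per coefficient with the first 8x8 matrix from this post. The post doesn't spell out the dequantization, so this sketch assumes the integer quotient is multiplied back by the quantization value before the conversion to float (without that step the reconstructed coefficients would shrink by a factor of q). Function names are hypothetical.

```cpp
#include <array>
#include <cassert>

// First quantization matrix from the post, row-major (index = y * 8 + x).
constexpr std::array<int, 64> kQuant = {
  1, 1, 1, 2, 2, 3, 3, 4,
  1, 1, 2, 2, 3, 3, 4, 4,
  1, 2, 2, 3, 3, 4, 4, 5,
  2, 2, 3, 3, 4, 4, 5, 5,
  2, 3, 3, 4, 4, 5, 5, 6,
  3, 3, 4, 4, 5, 5, 6, 6,
  3, 4, 4, 5, 5, 6, 6, 7,
  4, 4, 5, 5, 6, 6, 7, 7,
};

// Scale by 4 (2 fractional bits), convert to integer, divide by q.
// ASSUMPTION: dequantize by multiplying the quotient back by q, then convert
// to float and divide by 4.
float quantize_coefficient(float coef, int q) {
  assert(q > 0);
  int c = static_cast<int>(coef * 4.0f);  // truncates toward zero
  c /= q;                                 // integer division, truncates toward zero
  c *= q;                                 // assumed dequantization step
  return static_cast<float>(c) / 4.0f;
}

// Quantize a whole 8x8 coefficient tile in place.
void quantize_tile(std::array<float, 64>& coefs, const std::array<int, 64>& quant) {
  for (int i = 0; i < 64; ++i) coefs[i] = quantize_coefficient(coefs[i], quant[i]);
}
```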

For this uniform quantization matrix:
  1   1   1   2   2   3   3   4
  1   1   2   2   3   3   4   4
  1   2   2   3   3   4   4   5
  2   2   3   3   4   4   5   5
  2   3   3   4   4   5   5   6
  3   3   4   4   5   5   6   6
  3   4   4   5   5   6   6   7
  4   4   5   5   6   6   7   7

I get this ETC1 image after 8x8 Haar transform+quantization+inverse transform:


The original ETC1 compressed texture (before Haar filtering):


Selector visualization:


1x difference image (the delta between the original and filtered ETC1 images):


There is error in the high frequencies, which is exactly what's expected given the quantization matrix above.

Here's a more aggressive quantization matrix:

  2   4   6   8  10  12  14  16
  4   6   8  10  12  14  16  18
  6   8  10  12  14  16  18  20
  8  10  12  14  16  18  20  22
 10  12  14  16  18  20  22  24
 12  14  16  18  20  22  24  26
 14  16  18  20  22  24  26  28
 16  18  20  22  24  26  28  30

ETC1 image:


Selector visualization:


An even more aggressive quantization matrix:

  3   6   9  12  15  18  21  24
  6   9  12  15  18  21  24  27
  9  12  15  18  21  24  27  30
 12  15  18  21  24  27  30  33
 15  18  21  24  27  30  33  36
 18  21  24  27  30  33  36  39
 21  24  27  30  33  36  39  42
 24  27  30  33  36  39  42  45


Selector visualization:


I have some ideas on how the 4x4 Haar transform could be very useful in Basis, but they are just ideas right now. I find it amazing that the selectors can be transformed and manipulated in the frequency domain like this.
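The three quantization matrices in this post appear to follow a single pattern, q(x, y) = k(1 + x + y), with k = 1/2 (integer division, floored at 1), 2, and 3 respectively. A small hypothetical C++ generator that reproduces them:

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Builds an 8x8 matrix with q(x, y) = max(1, num * (1 + x + y) / den) (integer division).
// (num, den) = (1, 2), (2, 1), (3, 1) give the three tables above.
std::array<int, 64> make_quant_matrix(int num, int den) {
  assert(num > 0 && den > 0);
  std::array<int, 64> m{};
  for (int y = 0; y < 8; ++y)
    for (int x = 0; x < 8; ++x)
      m[y * 8 + x] = std::max(1, num * (1 + x + y) / den);
  return m;
}
```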

Saturday, October 8, 2016

RDO ETC1 texture compression tool output

Here's what my current experimental compression tool outputs to stdout while compressing a single image. I've begun to experiment with different perceptual metrics, such as PSNR-HVS and PSNR-HVSM. (I'm somewhat leery of SSIM/MS-SSIM for this problem domain, but I still compute it.)

texexp -out kodim23.ktx -e 2048 -s 8192 -adaptive -file kodak\kodim23.png

Source filename: kodak\kodim23.png 768x512
Force ETC1S: 0 NumEndpointClusters: 2048 NumSelectorClusters: 8192 Adaptive: 1
Num failed 555 packing: 565 out of 24576 blocks, 692 out of 2048 clusters

clustered  RGB: Error: Max: 109, Mean: 2.925, MSE: 20.664, RMSE: 4.546, PSNR: 34.979, SSIM: 0.928872
clustered    R: Error: Max:  63, Mean: 3.007, MSE: 20.729, RMSE: 4.553, PSNR: 34.965, SSIM: 0.929768
clustered    G: Error: Max:  52, Mean: 2.049, MSE: 9.793, RMSE: 3.129, PSNR: 38.222, SSIM: 0.960504
clustered    B: Error: Max: 109, Mean: 3.719, MSE: 31.472, RMSE: 5.610, PSNR: 33.152, SSIM: 0.896344
clustered    Y: Error: Max:  53, Mean: 1.661, MSE: 6.830, RMSE: 2.613, PSNR: 39.787, SSIM: 0.971262

best_etc1    RGB: Error: Max:  61, Mean: 2.235, MSE: 11.225, RMSE: 3.350, PSNR: 37.629, SSIM: 0.946851
best_etc1      R: Error: Max:  50, Mean: 2.245, MSE: 10.829, RMSE: 3.291, PSNR: 37.785, SSIM: 0.952394
best_etc1      G: Error: Max:  35, Mean: 1.529, MSE: 5.260, RMSE: 2.294, PSNR: 40.921, SSIM: 0.973678
best_etc1      B: Error: Max:  61, Mean: 2.931, MSE: 17.587, RMSE: 4.194, PSNR: 35.679, SSIM: 0.914481
best_etc1      Y: Error: Max:  33, Mean: 1.231, MSE: 3.618, RMSE: 1.902, PSNR: 42.547, SSIM: 0.981130

etcpak_etc1  RGB: Error: Max: 113, Mean: 2.494, MSE: 14.325, RMSE: 3.785, PSNR: 36.570, SSIM: 0.940693
etcpak_etc1    R: Error: Max:  60, Mean: 2.637, MSE: 14.934, RMSE: 3.864, PSNR: 36.389, SSIM: 0.938706
etcpak_etc1    G: Error: Max:  65, Mean: 1.875, MSE: 8.011, RMSE: 2.830, PSNR: 39.094, SSIM: 0.964159
etcpak_etc1    B: Error: Max: 113, Mean: 2.970, MSE: 20.031, RMSE: 4.476, PSNR: 35.114, SSIM: 0.919215
etcpak_etc1    Y: Error: Max:  64, Mean: 1.497, MSE: 5.630, RMSE: 2.373, PSNR: 40.625, SSIM: 0.975777

clustered_s  RGB: Error: Max: 109, Mean: 3.065, MSE: 21.885, RMSE: 4.678, PSNR: 34.729, SSIM: 0.918469
clustered_s    R: Error: Max:  63, Mean: 3.136, MSE: 21.956, RMSE: 4.686, PSNR: 34.715, SSIM: 0.919488
clustered_s    G: Error: Max:  52, Mean: 2.241, MSE: 11.118, RMSE: 3.334, PSNR: 37.670, SSIM: 0.948531
clustered_s    B: Error: Max: 109, Mean: 3.818, MSE: 32.581, RMSE: 5.708, PSNR: 33.001, SSIM: 0.887388
clustered_s    Y: Error: Max:  53, Mean: 1.874, MSE: 8.115, RMSE: 2.849, PSNR: 39.038, SSIM: 0.959831

ETC1/2 block histogram:
ETC1_DIFFERENTIAL: 23576
ETC1_INDIVIDUAL: 1000
ETC2_T: 0
ETC2_H: 0
ETC2_PLANAR: 0

Total blocks: 24576, ETC1S: 22391 (91.109%), Diff: 23576 (95.931%), Indiv: 1000 (4.069%), Flip: 9005 (36.641%)

Wrote file kodim23.ktx

clustered_s LZMA compressed from 196676 to 89489 bytes, 1.820658 bits/texel
Best ETC1 LZMA compressed from 196676 to 130830 bytes, 2.661743 bits/texel
etcpak ETC1 LZMA compressed from 196676 to 117666 bytes, 2.393921 bits/texel

OpenCV SSIM:
R:     0.919503
G:     0.948527
B:     0.887368
Avg:   0.918466
709 L: 0.9599

basislib:
RGB Total   Error: Max: 109, Mean: 9.195, MSE: 65.655, RMSE: 8.103, PSNR: 29.958
RGB Average Error: Max: 109, Mean: 3.065, MSE: 21.885, RMSE: 4.678, PSNR: 34.729, SSIM: 0.918469
Luma        Error: Max:  53, Mean: 1.845, MSE: 7.865, RMSE: 2.805, PSNR: 39.174, SSIM: 0.959831
Red         Error: Max:  63, Mean: 3.136, MSE: 21.956, RMSE: 4.686, PSNR: 34.715, SSIM: 0.919488
Green       Error: Max:  52, Mean: 2.241, MSE: 11.118, RMSE: 3.334, PSNR: 37.670, SSIM: 0.948531
Blue        Error: Max: 109, Mean: 3.818, MSE: 32.581, RMSE: 5.708, PSNR: 33.001, SSIM: 0.887388

PSNR-HVS:  85.836
PSNR-HVSM: 90.828

Experiment succeeded.
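The summary numbers above are consistent with the usual 8-bit definitions: PSNR = 10 log10(255^2 / MSE), RMSE = sqrt(MSE), and bits/texel = compressed bytes * 8 / (width * height). For example, an MSE of 20.664 gives 34.979 dB, and 89489 bytes over 768x512 texels gives 1.820658 bits/texel. The "RGB Total" MSE (65.655) is the sum of the R, G, and B MSEs, and "RGB Average" is a third of that. A sketch of these formulas in C++ (not the tool's own code):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Mean squared error between two 8-bit channel buffers of n samples.
double mse(const uint8_t* a, const uint8_t* b, std::size_t n) {
  assert(n > 0);
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
    sum += d * d;
  }
  return sum / static_cast<double>(n);
}

// PSNR in dB for 8-bit data (peak value 255).
double psnr_from_mse(double mse_value) {
  assert(mse_value > 0.0);
  return 10.0 * std::log10(255.0 * 255.0 / mse_value);
}

double rmse_from_mse(double mse_value) { return std::sqrt(mse_value); }

// Compressed size in bits divided by the number of texels.
double bits_per_texel(std::size_t compressed_bytes, int width, int height) {
  return static_cast<double>(compressed_bytes) * 8.0 / (static_cast<double>(width) * height);
}
```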

The tool outputs over a dozen debug images. Here's some of the compressor prototype's output:





ETC1S visualization (white=ETC1 subset differential, green=ETC1 subset individual, black=full ETC1). The "ETC1 subset" format is a simplified form of ETC1 where both subblocks are constrained to use the same block colors.
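A hypothetical C++ helper that classifies an ETC1 block this way, based on the standard ETC1 bit layout (diff bit at bit 33, base colors in the first three bytes). The post only mentions block colors, so this sketch checks colors and ignores the intensity tables and the flip bit:

```cpp
#include <cassert>
#include <cstdint>

enum class Etc1sKind { kNone, kDifferential, kIndividual };

// Classifies an 8-byte ETC1 block (as stored) by whether both subblocks share one color.
Etc1sKind classify_etc1s(const uint8_t b[8]) {
  assert(b != nullptr);
  const bool diff = (b[3] >> 1) & 1;  // diff bit = bit 33 = byte 3, bit 1
  if (diff) {
    // Differential (555 + 333): each of bytes 0..2 holds a 5-bit base in the top
    // bits and a 3-bit signed delta in the low bits. Zero deltas = identical colors.
    if ((b[0] & 7) == 0 && (b[1] & 7) == 0 && (b[2] & 7) == 0) return Etc1sKind::kDifferential;
  } else {
    // Individual (444 444): each of bytes 0..2 holds subblock 0's 4-bit component
    // in the high nibble and subblock 1's in the low nibble.
    if ((b[0] >> 4) == (b[0] & 15) && (b[1] >> 4) == (b[1] & 15) && (b[2] >> 4) == (b[2] & 15))
      return Etc1sKind::kIndividual;
  }
  return Etc1sKind::kNone;
}
```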



Quantized selectors:


Differential vs. individual mode visualization (black=individual 444 444, white=differential 555 333):


Block flip visualization:


Quantized subblock 0 and 1 intensity tables:



Quantized subblock 0 and 1 block colors:



Friday, October 7, 2016

RDO ETC1 texture compression prototype

I've now got a basic ETC1 RDO compressor working. Clusterization is now used on both the block colors/intensity table indices and selectors. This compressor supports the entire ETC1 format: 2 subblocks per block, flipping, and both differential and individual block color modes.
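The post doesn't describe the clustering algorithm. As an illustration only, here is one Lloyd (k-means) iteration over 4x4 selector vectors, a common starting point for this kind of vector quantization. The squared selector-index distance is an assumption made to keep the sketch short; an RDO compressor would more likely measure the decoded pixel error. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using Selectors = std::array<uint8_t, 16>;  // one 4x4 block's selectors, each 0..3

// ASSUMPTION: squared distance in selector-index space, not pixel space.
int selector_distance(const Selectors& a, const Selectors& b) {
  int d = 0;
  for (int i = 0; i < 16; ++i) {
    const int e = int(a[i]) - int(b[i]);
    d += e * e;
  }
  return d;
}

std::size_t nearest_cluster(const Selectors& s, const std::vector<Selectors>& codebook) {
  assert(!codebook.empty());
  std::size_t best = 0;
  int best_d = std::numeric_limits<int>::max();
  for (std::size_t c = 0; c < codebook.size(); ++c) {
    const int d = selector_distance(s, codebook[c]);
    if (d < best_d) { best_d = d; best = c; }
  }
  return best;
}

// One Lloyd iteration: assign each block to its nearest codebook entry, then
// replace each entry with the rounded mean of its members. Returns the assignments.
std::vector<std::size_t> lloyd_iteration(const std::vector<Selectors>& blocks,
                                         std::vector<Selectors>& codebook) {
  std::vector<std::size_t> assignment(blocks.size());
  std::vector<std::array<int, 16>> sums(codebook.size(), std::array<int, 16>{});
  std::vector<int> counts(codebook.size(), 0);
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    const std::size_t c = nearest_cluster(blocks[i], codebook);
    assignment[i] = c;
    ++counts[c];
    for (int j = 0; j < 16; ++j) sums[c][j] += blocks[i][j];
  }
  for (std::size_t c = 0; c < codebook.size(); ++c) {
    if (counts[c] == 0) continue;  // leave empty clusters unchanged
    for (int j = 0; j < 16; ++j)
      codebook[c][j] = static_cast<uint8_t>((sums[c][j] + counts[c] / 2) / counts[c]);
  }
  return assignment;
}
```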

Here's kodim14 using only 256 unique selector vectors and 256 block colors/intensity table indices:


This is just a bare-minimum prototype. It doesn't support crunch-style macroblock tiling, or required features like mipmaps, texture arrays, etc. It's a proof-of-principle prototype showing that crunch-style RDO compression is totally doable in the full ETC1 format.

Here are more examples. I have PSNR and SSIM stats, which I'm going to focus on next.

16 block color, 16 selector clusters:

32, 32:


64, 64:

128, 128:

512, 512:

4096:

512 block color, 3072 selector clusters:


ETC1 block color clusterization progress

I've got block color ("endpoint") clusterization working pretty well with the full ETC1 format. (Not just a subset, like in last month's endpoint clusterization experiment.)

Here are some quick examples, using only 256 unique block color/intensity table values for each image, and RGB avg. error metrics. There are actually two tables for each image, one for differential and another for individual mode, each built from the same 256 clusters. The tables are closely related, so it's possible to store the block colors in 555 format and use them as predictors to delta code the 444 block colors.
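One hypothetical way to use the 555 table as a predictor for the 444 table: expand each 5-bit component to 8 bits with the standard ETC1 bit replication, pick the 4-bit value whose expansion is nearest, and code only the difference. The post doesn't specify the actual prediction scheme, so treat this C++ sketch as an assumption:

```cpp
#include <cassert>
#include <cstdint>

// Standard ETC1 expansions of 5-bit and 4-bit components to 8 bits.
inline int expand5(int c5) { return (c5 << 3) | (c5 >> 2); }
inline int expand4(int c4) { return (c4 << 4) | c4; }  // equals c4 * 17

// HYPOTHETICAL predictor: the 4-bit value whose expansion (a multiple of 17)
// is closest to the expanded 5-bit value, i.e. round(v / 17).
inline int predict4_from5(int c5) {
  assert(c5 >= 0 && c5 <= 31);
  const int v = expand5(c5);
  return (v + 8) / 17;
}

// Residual to entropy code for one 444 component; small when the tables agree.
inline int delta_444(int c4, int c5) { return c4 - predict4_from5(c5); }
```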

This is the first (and trickiest) major step toward full ETC1 CRN/RDO support in Basis (the successor to crunch). In practice I think 256 unique endpoints is too few, but I purposely limited the number of clusters to get a feel for how well the current algorithm works.











kodim18 at 256 endpoint clusters, with tile and differential bit visualizations:







Wednesday, October 5, 2016

ETC1 optimization notes

I've been optimizing this function:

std::pair<etc1_bits, error> = ETC1Encode(pixels, options).

This actually gives me a really fast way of accurately computing this:

error = ETC1Distance(pixelsA, pixelsB, options).

I'm seriously considering a SIMD implementation next. I wrote one for DXT1 just for fun last week.

I need this distance function to be fast in order to justify another series of bottom-up clusterization experiments, and to work on improving the clusterization process itself.

Monday, October 3, 2016

ETC2 planar block only output created with etcpak

Bartosz Taudul (etcpak author) sent these ETC2 planar block only encodings in a reply to my previous post. For planar-only they look amazing!

Note: I've verified these images myself by hacking etcpak's ProcessRGB_ETC2() function to immediately "return result.first" after it calls Planar( src ), which makes it return only planar blocks. I confirmed this by generating a histogram of the ETC1/2 modes used by all the encoded blocks.
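A histogram like that can be computed from the standard ETC2 RGB8 mode rules: if the diff bit is clear, the block is in individual mode; otherwise, if a base+delta color component overflows [0,31], the block is T (red overflows), H (green), or planar (blue), and if nothing overflows it's differential. A C++ sketch with hypothetical names, not the code behind this post:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum Etc2Mode { kIndividual, kDifferential, kT, kH, kPlanar, kNumModes };

// True if the 5-bit base plus 3-bit signed delta packed in this byte (ETC1
// differential layout) falls outside [0, 31]. ETC2 reuses these otherwise
// invalid encodings to signal the T, H, and planar modes.
static bool overflows(uint8_t byte) {
  const int base = byte >> 3;
  int delta = byte & 7;
  if (delta >= 4) delta -= 8;  // sign-extend the 3-bit two's complement delta
  const int sum = base + delta;
  return sum < 0 || sum > 31;
}

// Mode of an 8-byte ETC2 RGB8 block. Punch-through alpha (RGB8A1) reinterprets
// the diff bit and is not handled here.
Etc2Mode etc2_mode(const uint8_t* b) {
  if (((b[3] >> 1) & 1) == 0) return kIndividual;
  if (overflows(b[0])) return kT;       // red overflow
  if (overflows(b[1])) return kH;       // green overflow
  if (overflows(b[2])) return kPlanar;  // blue overflow
  return kDifferential;
}

// Counts modes over a tightly packed array of 8-byte blocks.
std::array<std::size_t, kNumModes> mode_histogram(const std::vector<uint8_t>& data) {
  assert(data.size() % 8 == 0);
  std::array<std::size_t, kNumModes> hist{};
  for (std::size_t off = 0; off < data.size(); off += 8) ++hist[etc2_mode(&data[off])];
  return hist;
}
```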

Hey GPU texture format engineers: come on, give us more basis functions to play with! I'm starting to look more deeply at ETC2-encoded textures, and in some textures a surprising number of blocks use planar mode rather than the other ETC2 modes.











He also says that etcpak uses planar blocks quite often (blue indicates a planar block):











Sunday, October 2, 2016

ETC2 texture compression using exclusively planar blocks

The usual explanation given for planar blocks is that they are intended for smoothly varying blocks (see my previous post on planar blocks). So it seems natural to model the block's pixels as three planes (separately R, G, and B) when trying to create a trial planar encoding.

etc2comp tries fitting several lines along the edges of the block to compute a trial plane definition, then it "twiddles" the quantized coordinates to minimize the quantization error. From what I've been told, in practice planar blocks aren't actually used that much in ETC2 compression, which seems sad to me.

I realized while looking at the planar mode's unpack function that the "offset" or "origin" component is equivalent to the DC component in the DCT. And, the H and V components are equivalent to two of the lowest frequency basis vectors (not counting DC) in the 4x4 DCT (circled in red):


So planar blocks actually support three basis vectors (including the DC component, in the upper left). So why not try encoding each block in a test image as an ETC2-style planar block and see what the results look like? In other words, look at planar blocks from the perspective of transform coding.
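For reference, here is the planar reconstruction for one channel in C++, following the formula in the ETC2 specification. O alone gives a constant (DC) block, while H - O and V - O scale horizontal and vertical ramps. This sketch assumes the 676 colors have already been expanded to 8 bits:

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// c(x, y) = clamp((x * (H - O) + y * (V - O) + 4 * O + 2) >> 2, 0, 255)
// o, h, v: one channel's already-expanded 8-bit planar colors. Output is row-major.
// (>> on a negative int is an arithmetic shift: guaranteed in C++20, universal in practice.)
std::array<int, 16> planar_decode_channel(int o, int h, int v) {
  assert(o >= 0 && o <= 255 && h >= 0 && h <= 255 && v >= 0 && v <= 255);
  std::array<int, 16> px{};
  for (int y = 0; y < 4; ++y)
    for (int x = 0; x < 4; ++x)
      px[y * 4 + x] = std::clamp((x * (h - o) + y * (v - o) + 4 * o + 2) >> 2, 0, 255);
  return px;
}
```

Note that O is the reconstructed color at pixel (0,0), and H and V are the values the ramps would reach at x = 4 and y = 4.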

In this experiment, I find the average block color, set the planar "O" color to that, then subtract it from the block. I then take the dot (inner) product of the leftover pixel values with the H basis vector, then with the V basis vector, and encode the results into the ETC2 planar block's H and V colors. Finally, I unpack the block with the same code I use to decode actual ETC2 planar block data. (Note: because this is only a quick, fun Sunday-night experiment, I'm using 888 colors for O, H, and V rather than 676, but I think the results should hold up in 676 with careful quantization/twiddling.)
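The post doesn't give its exact formulas, so the following is a least-squares variant of this projection, sketched in C++ with hypothetical names. The centered ramps x - 1.5 and y - 1.5 are orthogonal to each other and to the constant over a 4x4 block, so dotting the mean-removed pixels with each ramp and dividing by its squared norm (20) gives the best-fit slopes. Because the planar O color is the value at pixel (0,0) rather than the block mean, this sketch converts the mean and slopes into O, H, and V instead of using the mean as O directly, so it may differ from the experiment's code. Like the experiment, it keeps 8-bit colors (no 676 quantization).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

struct PlanarChannel { int o, h, v; };

// Fits one channel of a 4x4 block (row-major) to mean + sx*(x - 1.5) + sy*(y - 1.5).
PlanarChannel fit_planar_channel(const std::array<float, 16>& px) {
  float mean = 0.0f;
  for (float p : px) mean += p;
  mean /= 16.0f;  // DC term

  float dot_h = 0.0f, dot_v = 0.0f;
  for (int y = 0; y < 4; ++y)
    for (int x = 0; x < 4; ++x) {
      const float r = px[y * 4 + x] - mean;  // residual after removing DC
      dot_h += r * (x - 1.5f);
      dot_v += r * (y - 1.5f);
    }

  // Each centered ramp has squared norm 4 * (2.25 + 0.25 + 0.25 + 2.25) = 20.
  const float sx = dot_h / 20.0f, sy = dot_v / 20.0f;
  const float o = mean - 1.5f * (sx + sy);  // fitted value at pixel (0,0)
  auto to8 = [](float f) {
    return static_cast<int>(std::clamp(std::lround(f), 0L, 255L));
  };
  // H and V are the fitted plane extrapolated to x = 4 and y = 4.
  PlanarChannel result{to8(o), to8(o + 4.0f * sx), to8(o + 4.0f * sy)};
  assert(result.o >= 0 && result.o <= 255);
  return result;
}
```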

The results are interesting. (See my newer post for much better looking planar-only encodings created by etcpak.) Remember, only planar blocks are used:

DC+H+V:



DC+H+V:


DC only:



DC+H+V:


DC only:

Rest are DC+H+V: