This is amazingly well done:
http://gamma.cs.unc.edu/GST/
Code:
https://github.com/GammaUNC/GST
Paper:
http://gamma.cs.unc.edu/GST/gst.pdf
Check this awesome timeline out:
I no longer feel so alone out here. I've been working on "Supercompressed Texture" technology for about a decade now, before I knew it would be named "Supercompressed Textures". The first title I was involved in that used GPU transcoding of compressed textures was for the PS2 version of World Series Baseball 2k3 (2003). It was designed by Blue Shift's then-CTO, John Brooks. This technology was then licensed to Electronic Arts for use in their PS3 titles.
And, my first Xbox 360 title (Halo Wars) relied on a real-time supercompressed texture decompression system I wrote in '06-'07, so the title would fit into memory at all. (crunch was actually my 2nd attempt at this approach, not my first.) So this tech has been around for years, being used behind closed doors in a low key way. It's like the academic world is just now catching on. In the professional game development world, this is advanced but still "old school" technology now.
My main bit of feedback about this paper, so far: The description of how the selector compression actually works is kinda muddled. (What's the "prefix sum" all about?) Also, it looks like crunch was used at the maximum quality level (255), not a tuned level or a number of levels. Crunch quality level 255 just isn't used in practice, to my knowledge. The codebooks at that level are huge and the image quality is unnecessarily high. Also, can I speed up crunch's CPU transcoder by 2-3x? Oh yes!
Another thing I noticed: Because GST doesn't support lossy endpoint quantization (like crunch does), I think its rate distortion performance is more limited than crunch's. crunch should be able to target lower bitrates than GST, is my guess. GST's main way of controlling the quality vs. rate tradeoff is its lossy dictionary-based selector compression method, while crunch can smoothly vary the quality of both the endpoints and selectors.
Next up: Universal Supercompressed Textures with either CPU or GPU decoding. (Isn't it obvious? We need to abstract away all of these crazy formats behind good technologies and shared tools.)
Co-owner of Binomial LLC, working on GPU texture interchange. Open source developer, graphics programmer, former video game developer. Worked previously at SpaceX (Starlink), Valve, Ensemble Studios (Microsoft), DICE Canada.
Sunday, October 16, 2016
Saturday, October 15, 2016
2D Haar Wavelet Transform on GPU texture selector indices
I've been very busy refining my new ETC1 compressor, so I haven't been posting much recently. Today I decided to do something different, so I've been playing around with the 2D Haar 4x4 and 8x8 transforms on ETC1 selector bits. I first did this years ago while writing crunch on DXT1/BC1, but I unfortunately didn't publish or use the results.
To use the Haar transform on selector indices, I prepare the input samples by adding .5 to each selector index (which lie in the range [0,3] in ETC1), do the transform, uniformly quantize, then do the inverse transform and truncate the resulting values back to the [0,3] selector range. (You must shift the input samples by .5 or it won't work.)
The quantization stage scales the floating point coefficient by 4 (to get 2 bits to the right of the decimal point, which in experiments is just enough for 4x4) and converts it to an integer. This integer is then divided by a quantization value, then it's converted to float and divided by 4.
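Here's a minimal sketch of the steps above for a single 4x4 block of selectors. It is not the code I actually ran: the type and function names are made up for illustration, the transform is assumed to be the pyramid-style orthonormal 2D Haar (with that scaling every coefficient of a 4x4 block is a multiple of 1/4, which fits the "2 bits" observation), and the multiply back by the quantization value before the inverse transform is an assumed dequantization step.

```cpp
#include <array>
#include <cassert>

using Grid4f = std::array<std::array<float, 4>, 4>;
using Grid4i = std::array<std::array<int, 4>, 4>;

// One level of a 2D orthonormal Haar step on the top-left n x n region.
// The 2x2 butterfly scaled by 1/2 is its own inverse, so only the
// gather/scatter layout differs between the two directions.
static void haar_level(Grid4f& g, int n, bool inverse) {
    const int h = n / 2;
    Grid4f t = g;  // entries outside the n x n region are carried over
    for (int i = 0; i < h; ++i) {
        for (int j = 0; j < h; ++j) {
            if (!inverse) {
                const float a = g[2 * i][2 * j], b = g[2 * i][2 * j + 1];
                const float c = g[2 * i + 1][2 * j], d = g[2 * i + 1][2 * j + 1];
                t[i][j] = (a + b + c + d) * 0.5f;          // low/low
                t[i][j + h] = (a - b + c - d) * 0.5f;      // horizontal detail
                t[i + h][j] = (a + b - c - d) * 0.5f;      // vertical detail
                t[i + h][j + h] = (a - b - c + d) * 0.5f;  // diagonal detail
            } else {
                const float ll = g[i][j], hl = g[i][j + h];
                const float lh = g[i + h][j], hh = g[i + h][j + h];
                t[2 * i][2 * j] = (ll + hl + lh + hh) * 0.5f;
                t[2 * i][2 * j + 1] = (ll - hl + lh - hh) * 0.5f;
                t[2 * i + 1][2 * j] = (ll + hl - lh - hh) * 0.5f;
                t[2 * i + 1][2 * j + 1] = (ll - hl - lh + hh) * 0.5f;
            }
        }
    }
    g = t;
}

// Shift by .5, transform, quantize with 2 fractional bits, inverse transform,
// then truncate back to the [0,3] selector range.
Grid4i haar_filter_selectors(const Grid4i& sel, const Grid4i& quant) {
    Grid4f g{};
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x)
            g[y][x] = static_cast<float>(sel[y][x]) + 0.5f;

    haar_level(g, 4, false);
    haar_level(g, 2, false);

    for (int y = 0; y < 4; ++y) {
        for (int x = 0; x < 4; ++x) {
            assert(quant[y][x] > 0);
            const int fixed_pt = static_cast<int>(g[y][x] * 4.0f);  // 2 fractional bits
            const int q = fixed_pt / quant[y][x];                   // integer divide
            // Assumption: scale back by the quantization value before inverting.
            g[y][x] = static_cast<float>(q * quant[y][x]) / 4.0f;
        }
    }

    haar_level(g, 2, true);
    haar_level(g, 4, true);

    Grid4i out{};
    for (int y = 0; y < 4; ++y) {
        for (int x = 0; x < 4; ++x) {
            const int v = static_cast<int>(g[y][x]);  // truncate
            out[y][x] = v < 0 ? 0 : (v > 3 ? 3 : v);
        }
    }
    return out;
}
```

With a quantization value of 1 everywhere, this sketch returns the selectors unchanged. Larger values start discarding detail coefficients first.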
For this uniform quantization matrix:
1 1 1 2 2 3 3 4
1 1 2 2 3 3 4 4
1 2 2 3 3 4 4 5
2 2 3 3 4 4 5 5
2 3 3 4 4 5 5 6
3 3 4 4 5 5 6 6
3 4 4 5 5 6 6 7
4 4 5 5 6 6 7 7
I get this ETC1 image after 8x8 Haar transform+quantization+inverse transform:
The original ETC1 compressed texture (before Haar filtering):
Selector visualization:
1x difference image (the delta between the original and filtered ETC1 images):
There is error in high frequencies, which is exactly what is to be expected given the above quantization matrix.
Here's a more aggressive quantization matrix:
2 4 6 8 10 12 14 16
4 6 8 10 12 14 16 18
6 8 10 12 14 16 18 20
8 10 12 14 16 18 20 22
10 12 14 16 18 20 22 24
12 14 16 18 20 22 24 26
14 16 18 20 22 24 26 28
16 18 20 22 24 26 28 30
ETC1 image:
Selector visualization:
An even more aggressive quantization matrix:
3 6 9 12 15 18 21 24
6 9 12 15 18 21 24 27
9 12 15 18 21 24 27 30
12 15 18 21 24 27 30 33
15 18 21 24 27 30 33 36
18 21 24 27 30 33 36 39
21 24 27 30 33 36 39 42
24 27 30 33 36 39 42 45
Selector visualization:
I have some ideas on how the 4x4 Haar transform could be very useful in Basis, but they are just ideas right now. I find it amazing that the selectors can be transformed and manipulated in the frequency domain like this.
Saturday, October 8, 2016
RDO ETC1 texture compression tool output
Here's what my current experimental compression tool outputs to stdout while compressing a single image. I've begun to experiment with different perceptual metrics, such as PSNR-HVS and PSNR-HVSM. (I'm somewhat leery of SSIM/MS-SSIM for this problem domain, but I still compute it.)
texexp -out kodim23.ktx -e 2048 -s 8192 -adaptive -file kodak\kodim23.png
Source filename: kodak\kodim23.png 768x512
Force ETC1S: 0 NumEndpointClusters: 2048 NumSelectorClusters: 8192 Adaptive: 1
Num failed 555 packing: 565 out of 24576 blocks, 692 out of 2048 clusters
clustered RGB: Error: Max: 109, Mean: 2.925, MSE: 20.664, RMSE: 4.546, PSNR: 34.979, SSIM: 0.928872
clustered R: Error: Max: 63, Mean: 3.007, MSE: 20.729, RMSE: 4.553, PSNR: 34.965, SSIM: 0.929768
clustered G: Error: Max: 52, Mean: 2.049, MSE: 9.793, RMSE: 3.129, PSNR: 38.222, SSIM: 0.960504
clustered B: Error: Max: 109, Mean: 3.719, MSE: 31.472, RMSE: 5.610, PSNR: 33.152, SSIM: 0.896344
clustered Y: Error: Max: 53, Mean: 1.661, MSE: 6.830, RMSE: 2.613, PSNR: 39.787, SSIM: 0.971262
best_etc1 RGB: Error: Max: 61, Mean: 2.235, MSE: 11.225, RMSE: 3.350, PSNR: 37.629, SSIM: 0.946851
best_etc1 R: Error: Max: 50, Mean: 2.245, MSE: 10.829, RMSE: 3.291, PSNR: 37.785, SSIM: 0.952394
best_etc1 G: Error: Max: 35, Mean: 1.529, MSE: 5.260, RMSE: 2.294, PSNR: 40.921, SSIM: 0.973678
best_etc1 B: Error: Max: 61, Mean: 2.931, MSE: 17.587, RMSE: 4.194, PSNR: 35.679, SSIM: 0.914481
best_etc1 Y: Error: Max: 33, Mean: 1.231, MSE: 3.618, RMSE: 1.902, PSNR: 42.547, SSIM: 0.981130
etcpak_etc1 RGB: Error: Max: 113, Mean: 2.494, MSE: 14.325, RMSE: 3.785, PSNR: 36.570, SSIM: 0.940693
etcpak_etc1 R: Error: Max: 60, Mean: 2.637, MSE: 14.934, RMSE: 3.864, PSNR: 36.389, SSIM: 0.938706
etcpak_etc1 G: Error: Max: 65, Mean: 1.875, MSE: 8.011, RMSE: 2.830, PSNR: 39.094, SSIM: 0.964159
etcpak_etc1 B: Error: Max: 113, Mean: 2.970, MSE: 20.031, RMSE: 4.476, PSNR: 35.114, SSIM: 0.919215
etcpak_etc1 Y: Error: Max: 64, Mean: 1.497, MSE: 5.630, RMSE: 2.373, PSNR: 40.625, SSIM: 0.975777
clustered_s RGB: Error: Max: 109, Mean: 3.065, MSE: 21.885, RMSE: 4.678, PSNR: 34.729, SSIM: 0.918469
clustered_s R: Error: Max: 63, Mean: 3.136, MSE: 21.956, RMSE: 4.686, PSNR: 34.715, SSIM: 0.919488
clustered_s G: Error: Max: 52, Mean: 2.241, MSE: 11.118, RMSE: 3.334, PSNR: 37.670, SSIM: 0.948531
clustered_s B: Error: Max: 109, Mean: 3.818, MSE: 32.581, RMSE: 5.708, PSNR: 33.001, SSIM: 0.887388
clustered_s Y: Error: Max: 53, Mean: 1.874, MSE: 8.115, RMSE: 2.849, PSNR: 39.038, SSIM: 0.959831
ETC1/2 block histogram:
ETC1_DIFFERENTIAL: 23576
ETC1_INDIVIDUAL: 1000
ETC2_T: 0
ETC2_H: 0
ETC2_PLANAR: 0
Total blocks: 24576, ETC1S: 22391 (91.109%), Diff: 23576 (95.931%), Indiv: 1000 (4.069%), Flip: 9005 (36.641%)
Wrote file kodim23.ktx
clustered_s LZMA compressed from 196676 to 89489 bytes, 1.820658 bits/texel
Best ETC1 LZMA compressed from 196676 to 130830 bytes, 2.661743 bits/texel
etcpak ETC1 LZMA compressed from 196676 to 117666 bytes, 2.393921 bits/texel
OpenCV SSIM:
R: 0.919503
G: 0.948527
B: 0.887368
Avg: 0.918466
709 L: 0.9599
basislib:
RGB Total Error: Max: 109, Mean: 9.195, MSE: 65.655, RMSE: 8.103, PSNR: 29.958
RGB Average Error: Max: 109, Mean: 3.065, MSE: 21.885, RMSE: 4.678, PSNR: 34.729, SSIM: 0.918469
Luma Error: Max: 53, Mean: 1.845, MSE: 7.865, RMSE: 2.805, PSNR: 39.174, SSIM: 0.959831
Red Error: Max: 63, Mean: 3.136, MSE: 21.956, RMSE: 4.686, PSNR: 34.715, SSIM: 0.919488
Green Error: Max: 52, Mean: 2.241, MSE: 11.118, RMSE: 3.334, PSNR: 37.670, SSIM: 0.948531
Blue Error: Max: 109, Mean: 3.818, MSE: 32.581, RMSE: 5.708, PSNR: 33.001, SSIM: 0.887388
PSNR-HVS: 85.836
PSNR-HVSM: 90.828
Experiment succeeded.
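For anyone wanting to sanity check the numbers in the log above, the PSNR, RMSE and bits/texel columns follow from the MSE, the compressed size and the image dimensions. This is a small sketch with made-up function names (it is not lifted from the tool), assuming 8-bit samples with a peak value of 255:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// PSNR in dB for 8-bit samples, computed from the mean squared error.
double psnr_from_mse(double mse) {
    assert(mse > 0.0);
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}

// RMSE is just the square root of the MSE.
double rmse_from_mse(double mse) {
    assert(mse >= 0.0);
    return std::sqrt(mse);
}

// Bits per texel: compressed size in bits divided by the texel count.
double bits_per_texel(std::size_t compressed_bytes, unsigned width, unsigned height) {
    assert(width > 0 && height > 0);
    return 8.0 * static_cast<double>(compressed_bytes) /
           (static_cast<double>(width) * static_cast<double>(height));
}
```

For example, the "clustered RGB" MSE of 20.664 works out to roughly 34.98 dB, and 89489 bytes over 768x512 texels is roughly 1.82 bits/texel, matching the log.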
The tool outputs over a dozen debug images. Here's some of the compressor prototype's output:
ETC1S visualization (white=ETC1 subset differential, green=ETC1 subset individual, black=full ETC1). The "ETC1 subset" format is a simplified form of ETC1 where both subblocks are constrained to use the same block colors.
Quantized selectors:
Friday, October 7, 2016
RDO ETC1 texture compression prototype
I've now got a basic ETC1 RDO compressor working. Clusterization is now used on both the block colors/intensity table indices and selectors. This compressor supports the entire ETC1 format: 2 subblocks per block, flipping, and both differential and individual block color modes.
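To give a feel for what "clusterization" of selectors means, here's a rough sketch of the final assignment step only: each block's 16 selectors get replaced by the closest entry in a small codebook. The names are hypothetical, and the squared distance on raw selector indices is an assumption made to keep the example short. The post doesn't describe the actual metric or how the codebook is built.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Selectors = std::array<int, 16>;  // one 2-bit selector per texel in a 4x4 block

// Returns the index of the codebook entry closest to `s`.
// Assumption: squared distance on selector indices stands in for the real metric.
std::size_t nearest_codeword(const Selectors& s, const std::vector<Selectors>& codebook) {
    assert(!codebook.empty());
    std::size_t best = 0;
    long best_dist = -1;
    for (std::size_t i = 0; i < codebook.size(); ++i) {
        long dist = 0;
        for (std::size_t k = 0; k < s.size(); ++k) {
            const long diff = s[k] - codebook[i][k];
            dist += diff * diff;
        }
        if (best_dist < 0 || dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}
```

With only 256 codebook entries, every block in the image ends up storing one of 256 possible selector patterns, which is what makes the output so compressible.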
Here's kodim14 using only 256 unique selector vectors and 256 block colors/intensity table indices:
This is just a bare-minimum prototype. It doesn't support crunch-style macroblock tiling, or required things like mipmaps, texture arrays, etc. It's a proof-of-principle prototype showing that crunch-style RDO compression is totally doable in the full ETC1 format.
Here are more examples. I have PSNR and SSIM stats, which I'm going to focus on next.
16 block color, 16 selector clusters:
32, 32:
64, 64:
128, 128:
512, 512:
4096:
512 block color, 3072 selector clusters:
ETC1 block color clusterization progress
I've got block color ("endpoint") clusterization working pretty well with the full ETC1 format. (Not just a subset, like in last month's endpoint clusterization experiment.)
Here are some quick examples, using only 256 unique block color/intensity table values for each image, and RGB avg. error metrics. There are actually two tables for each image, one for differential and another for individual mode, each built from the same 256 clusters. The tables are closely related, so it's possible to store the block colors in 555 format and use them as predictors to delta code the 444 block colors.
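A sketch of how the 555-as-predictor idea could look. The predictor here (drop the low bit of each 5-bit component) and all of the names are assumptions for illustration only. The actual predictor isn't settled in this post.

```cpp
#include <array>
#include <cassert>

using Color = std::array<int, 3>;  // R, G, B components

// Assumed predictor: drop the low bit of each 5-bit component to get 4 bits.
Color predict_444_from_555(const Color& c555) {
    Color p{};
    for (int i = 0; i < 3; ++i) {
        assert(c555[i] >= 0 && c555[i] <= 31);
        p[i] = c555[i] >> 1;
    }
    return p;
}

// Residual that would be entropy coded in place of the raw 444 color.
Color delta_444(const Color& c555, const Color& c444) {
    const Color p = predict_444_from_555(c555);
    Color d{};
    for (int i = 0; i < 3; ++i) d[i] = c444[i] - p[i];
    return d;
}

// Decoder side: prediction plus residual recovers the 444 color exactly.
Color reconstruct_444(const Color& c555, const Color& delta) {
    const Color p = predict_444_from_555(c555);
    Color c{};
    for (int i = 0; i < 3; ++i) c[i] = p[i] + delta[i];
    return c;
}
```

Since both tables come from the same clusters, the residuals should mostly be small values near zero, which is what makes them cheap to code.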
This is the first (and trickiest) major step to full ETC1 CRN/RDO support in Basis (the successor to crunch). In practice I think 256 unique endpoints is too few, but I've purposely limited the number of clusters to get a feel for how well the current algorithm works.
kodim18 at 256 endpoint clusters, with tile and differential bit visualizations:
Wednesday, October 5, 2016
ETC1 optimization notes
I've been optimizing this function:
std::pair<etc1_bits, error> = ETC1Encode(pixels, options).
Which actually gives me a really fast way of accurately computing this:
error = ETC1Distance(pixelsA, pixelsB, options).
I'm seriously considering a SIMD implementation next. I wrote one for DXT1 just for fun last week.
I need this distance function to be fast in order to justify another series of bottom->up clusterization experiments, and on improving the clusterization process itself.
Monday, October 3, 2016
ETC2 planar block only output created with etcpak
Bartosz Taudul (etcpak author) sent these ETC2 planar block only encodings in a reply to my previous post. For planar-only they look amazing!
Note: I've verified these images myself by hacking etcpak's ProcessRGB_ETC2() function to immediately "return result.first" after it calls Planar( src ). It returns all planar blocks in this case. I've verified this by generating a histogram of the used ETC1/2 modes in all the encoded blocks.
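For reference, a mode histogram like the one mentioned above can be built straight from the encoded blocks. This is a sketch, not etcpak's code or my tool's code: the names are made up, and it assumes the input is raw 8-byte ETC2 RGB blocks with any file header already stripped. The rule it relies on is that ETC2 signals its T, H and planar modes through differential-mode blocks whose base+delta sum overflows the 5-bit range in R, G or B respectively.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class EtcMode { Individual = 0, Differential, T, H, Planar };

// True if the 5-bit base plus the signed 3-bit delta packed in `byte`
// falls outside [0,31].
static bool overflows(std::uint8_t byte) {
    const int base = byte >> 3;
    int delta = byte & 7;
    if (delta >= 4) delta -= 8;  // 3-bit two's complement
    const int sum = base + delta;
    return sum < 0 || sum > 31;
}

// Classifies one 8-byte ETC2 RGB block.
EtcMode classify_block(const std::uint8_t* block) {
    const bool diff_bit = (block[3] & 2) != 0;
    if (!diff_bit) return EtcMode::Individual;
    if (overflows(block[0])) return EtcMode::T;       // red overflow
    if (overflows(block[1])) return EtcMode::H;       // green overflow
    if (overflows(block[2])) return EtcMode::Planar;  // blue overflow
    return EtcMode::Differential;
}

// Counts how many blocks use each mode, indexed by the EtcMode value.
std::array<std::size_t, 5> mode_histogram(const std::vector<std::uint8_t>& data) {
    assert(data.size() % 8 == 0);
    std::array<std::size_t, 5> hist{};
    for (std::size_t i = 0; i < data.size(); i += 8)
        ++hist[static_cast<std::size_t>(classify_block(&data[i]))];
    return hist;
}
```

For a planar-only encoding, every block should land in the Planar bucket.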
Hey GPU texture format engineers: Come on, give us more basis functions to play with! I'm starting to look more deeply at ETC2 encoded textures, and a surprising number of blocks in some textures are using planar mode vs. the other ETC2 modes.