Monday, February 8, 2021

bc7enc_rdo encoding examples

Compress kodim08.png to kodim08.dds (with no mips) using two BC7 modes (1+6):

Highest Quality Mode (uses Modes 1+6)

This mode is like ispc_texcomp's or bc7e's BC7 compressor. bc7enc_rdo currently only uses modes 1/6 on opaque blocks, and modes 5/6/7 on alpha blocks.

bc7enc.exe -o -u4 kodim08.png

...
BC7 mode histogram:
0: 0
1: 8703
2: 0
3: 0
4: 0
5: 0
6: 15873
7: 0

Pre-RDO output data size: 393216, LZ (Deflate) compressed file size: 378097, 7.69 bits/texel
Processing time: 0.113000 secs
...
Output data size: 393216, LZ (Deflate) compressed file size: 378097, 7.69 bits/texel
Wrote DDS file kodim08.dds
Luma Max error: 13 RMSE: 1.279412 PSNR 45.991 dB, PSNR per bits/texel: 59787.033452
RGB Max error: 37 RMSE: 2.065000 PSNR 41.832 dB, PSNR per bits/texel: 54381.448887
RGBA Max error: 37 RMSE: 1.805041 PSNR 43.001 dB, PSNR per bits/texel: 55900.685976

So the output .DDS file compressed to 7.69 bits/texel using miniz (a stock Deflate with non-optimal parsing, so a few percent worse than zopfli or 7za's Deflate). The RGB PSNR was 41.8 dB and the RGBA PSNR was 43 dB. It used mode 1 around half as often as mode 6.

Notice the pre-RDO compressed size equals the output's compressed size (7.69 bits/texel). There was no RDO, and nothing in particular was done to reduce the entropy of the encoded output. Deflate can't find many 3+ byte matches, so the output is mostly just Huffman coded and stays quite close to 8 bits/texel. To Deflate, and to most other LZ compressors, it's basically noise.
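The bits/texel figures in these logs can be reproduced from the file sizes. Uncompressed BC7 stores each 4x4 block in 16 bytes (128 bits), i.e. exactly 8 bits/texel, so a 393216-byte BC7 payload holds 393216 texels. The sketch below does that arithmetic and then uses Python's zlib (a different Deflate implementation than miniz, so exact sizes won't match the tool's) to show that noise-like bytes stay near 8 bits/byte while repetitive bytes collapse. The helper names are illustrative, not part of bc7enc_rdo.

```python
import random
import zlib

BC7_BYTES_PER_BLOCK = 16  # one 4x4 BC7 block = 128 bits
TEXELS_PER_BLOCK = 16


def bits_per_texel(compressed_bytes: int, bc7_payload_bytes: int) -> float:
    """Deflate-compressed size expressed in bits per texel of the BC7 image."""
    texels = bc7_payload_bytes // BC7_BYTES_PER_BLOCK * TEXELS_PER_BLOCK
    return compressed_bytes * 8 / texels


def deflate_bits_per_byte(data: bytes) -> float:
    """Bits per input byte after zlib (Deflate) compression at level 9."""
    return len(zlib.compress(data, 9)) * 8 / len(data)


# Sizes taken from the first encode's log.
print(f"{bits_per_texel(378097, 393216):.2f} bits/texel")

# Noise-like data: Deflate finds almost no matches, so it stays near 8 bits/byte.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(65536))
# Highly repetitive data: the same 16-byte "block" over and over.
repeated = bytes(range(16)) * 4096

print(f"noise:    {deflate_bits_per_byte(noise):.2f} bits/byte")
print(f"repeated: {deflate_bits_per_byte(repeated):.2f} bits/byte")
```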

Reduced Entropy Mode (-e option)


This mode is as fast as before. It only causes the encoder to weight modes, p-bits, etc. differently so the output data is naturally more compressible by entropy/LZ coders:

bc7enc -o -u4 -zc2048 kodim08.png -e

BC7 mode histogram:
0: 0
1: 3385
2: 0
3: 0
4: 0
5: 0
6: 21191
7: 0
Pre-RDO output data size: 393216, LZ (Deflate) compressed file size: 352693, 7.18 bits/texel
Processing time: 0.116000 secs
Output data size: 393216, LZ (Deflate) compressed file size: 352693, 7.18 bits/texel
Wrote DDS file kodim08.dds
Luma  Max error:  18 RMSE: 1.368621 PSNR 45.405 dB, PSNR per bits/texel: 63277.507753
RGB   Max error:  48 RMSE: 2.456375 PSNR 40.325 dB, PSNR per bits/texel: 56197.596592
RGBA  Max error:  48 RMSE: 2.152539 PSNR 41.472 dB, PSNR per bits/texel: 57795.900335

The RGB PSNR dropped by 1.5 dB (from 41.8 dB to 40.3 dB, so less signal and more distortion), but the compressibility went up: the output is now 7.18 bits/texel instead of the previous 7.69! Notice also that the "PSNR per bits/texel" value (the compressibility index I use to monitor the encoder's effectiveness) for RGB is now 56197 vs. the previous 54381.
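The mode histograms show the effect of -e: mode 1 usage fell from 8703 to 3385 blocks while mode 6 rose from 15873 to 21191. As a minimal sketch of how "weighting modes" can shift choices like this, each candidate mode's error can be multiplied by a per-mode weight before picking the cheapest one. The weights, the error values and the pick_mode() helper are all hypothetical; bc7enc_rdo's actual -e weighting (which also covers p-bits and other choices) is not reproduced here.

```python
from typing import Dict, Optional

# Hypothetical per-mode error multipliers (> 1.0 makes a mode less likely to be chosen).
MODE_WEIGHTS = {1: 1.15, 6: 1.0}


def pick_mode(errors: Dict[int, float], weights: Optional[Dict[int, float]] = None) -> int:
    """Return the mode with the lowest (optionally weighted) error."""
    weights = weights or {}
    return min(errors, key=lambda mode: errors[mode] * weights.get(mode, 1.0))


# Made-up per-mode squared errors for one opaque block.
block_errors = {1: 100.0, 6: 108.0}

print(pick_mode(block_errors))                # unweighted: lowest error wins
print(pick_mode(block_errors, MODE_WEIGHTS))  # weighted: mode 6 wins despite higher error
```

The weighted choice trades a little quality on this block for a more uniform stream of mode 6 blocks; which weights actually help compression is something the real encoder was tuned for, not something this toy decides.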


Rate Distortion Optimization with the Entropy Reduction Transform (-e -z#)


Now let's enable all the tools the encoder has to reduce the encoded output data's entropy. This mode is slower, but it's trivially threadable, and you can scale down the total amount of compute by reducing the window size using "-zc#":

bc7enc -o -u4 -zc2048 kodim08.png -e -z.5

BC7 mode histogram:
0: 0
1: 4028
2: 0
3: 0
4: 0
5: 0
6: 20548
7: 0
Pre-RDO output data size: 393216, LZ (Deflate) compressed file size: 354192, 7.21 bits/texel
rdo_total_threads: 40
Using an automatically computed smooth block error scale of 19.375000
lambda: 0.500000
Lookback window size: 2048
Max allowed RMS increase ratio: 10.000000
Max smooth block std dev: 18.000000
Smooth block max MSE scale: 19.375000
Total modified blocks: 21589 87.85%
Total RDO postprocess time: 2.765000 secs
Processing time: 2.846000 secs
Output data size: 393216, LZ (Deflate) compressed file size: 316364, 6.44 bits/texel
Wrote DDS file kodim08.dds
Luma  Max error:  41 RMSE: 2.749131 PSNR 39.347 dB, PSNR per bits/texel: 61131.435585
RGB   Max error:  48 RMSE: 3.286210 PSNR 37.797 dB, PSNR per bits/texel: 58723.280910
RGBA  Max error:  48 RMSE: 2.861897 PSNR 38.998 dB, PSNR per bits/texel: 60588.948928

First, I set the lookback window size to 2KB using "-zc2048" to increase compression. This is the window of previously encoded blocks that the compressor draws byte sequences from when inserting them into each output block. The default is only 256 bytes, which is way faster (.42 seconds vs. 2.92 on my system).

Notice the RGB PSNR has dropped to 37.8 dB, but the compressed file is now only 6.44 bits/texel. The compressibility index (PSNR per bits/texel) is 58723, significantly higher than in the previous two encodes, so the encoder has been able to squeeze more signal into the output bits (once they are LZ compressed).
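The PSNR and "PSNR per bits/texel" values in the three logs are consistent with PSNR = 20·log10(255/RMSE) and an index of 10,000 × PSNR ÷ bits/texel. That scaling factor was inferred by checking it against the logged numbers, not read from the bc7enc_rdo source, so treat it as an assumption. The sketch recomputes the RGB figures for all three encodes:

```python
import math

TEXELS = 393216  # 393216-byte BC7 payload at exactly 8 bits/texel


def psnr_from_rmse(rmse: float, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for 8-bit channels."""
    return 20.0 * math.log10(peak / rmse)


def bits_per_texel(compressed_bytes: int, texels: int) -> float:
    return compressed_bytes * 8 / texels


def psnr_per_bpt(psnr: float, bpt: float) -> float:
    # Assumed scaling: 10,000 * PSNR / (bits per texel); it matches the logs above.
    return 10000.0 * psnr / bpt


# (label, RGB RMSE, Deflate-compressed size in bytes), copied from the three logs.
runs = [
    ("default", 2.065000, 378097),
    ("-e", 2.456375, 352693),
    ("-e -z.5", 3.286210, 316364),
]

for label, rmse, size in runs:
    psnr = psnr_from_rmse(rmse)
    bpt = bits_per_texel(size, TEXELS)
    print(f"{label:8s} PSNR {psnr:.3f} dB, {bpt:.2f} bits/texel, index {psnr_per_bpt(psnr, bpt):.0f}")
```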

The -z option directly sets lambda, which controls the rate-distortion tradeoff. The higher this value, the more likely the encoder is to replace a block's bytes with a previous block's bytes (either entirely or partially), which increases distortion but reduces entropy.
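Here is a toy sketch of the Lagrangian decision lambda controls, assuming a cost of the form J = D + λ·R. For each block it tries the original bytes plus versions where the whole block, or one 8-byte half, is copied from a previous block inside the lookback window, and keeps the cheapest. Everything here is a simplification, not bc7enc_rdo's algorithm: the rate model (8 bits per literal byte, a flat 24 bits per match of 3+ bytes), the identity "decoder" standing in for a real BC7 decoder, measuring distortion against the decoded original block instead of the source texels, and all function names are hypothetical. Lambda values in this toy don't correspond to -z values.

```python
from typing import Callable, List, Sequence

BLOCK = 16  # bytes per BC7 block


def sq_err(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of squared differences between two decoded blocks."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def estimate_bits(block: bytes, history: bytes, min_match: int = 3) -> int:
    """Toy LZ rate model: 8 bits per unmatched byte, a flat 24 bits per
    match of >= min_match bytes that occurs somewhere in `history`."""
    bits, i = 0, 0
    while i < len(block):
        match_len = 0
        for n in range(len(block) - i, min_match - 1, -1):  # longest match first
            if block[i:i + n] in history:
                match_len = n
                break
        if match_len:
            bits, i = bits + 24, i + match_len
        else:
            bits, i = bits + 8, i + 1
    return bits


def rdo_block(original: bytes, history: bytes, lam: float, window: int,
              decode: Callable[[bytes], Sequence[int]]) -> bytes:
    """Return the candidate block minimizing J = D + lam * R."""
    recent = history[-window:]  # only the last `window` bytes are searched
    ref = decode(original)
    best, best_cost = original, lam * estimate_bits(original, recent)
    for p in range(0, len(recent) - BLOCK + 1, BLOCK):
        prev = recent[p:p + BLOCK]
        # Substitute the whole block, or only its first or second half.
        for start, length in ((0, 16), (0, 8), (8, 8)):
            cand = original[:start] + prev[start:start + length] + original[start + length:]
            cost = sq_err(decode(cand), ref) + lam * estimate_bits(cand, recent)
            if cost < best_cost:
                best, best_cost = cand, cost
    return best


def rdo_encode(blocks: List[bytes], lam: float, window: int = 256,
               decode: Callable[[bytes], Sequence[int]] = list) -> bytes:
    """Process blocks in order; each may copy bytes from earlier output."""
    out = bytearray()
    for block in blocks:
        out += rdo_block(block, bytes(out), lam, window, decode)
    return bytes(out)


b0 = bytes(range(16))
b1 = bytes(range(15)) + bytes([16])  # differs from b0 in its last byte only
print(rdo_encode([b0, b1], lam=1.0) == b0 + b0)   # high lambda: copy the previous block
print(rdo_encode([b0, b1], lam=0.05) == b0 + b1)  # low lambda: keep the original
```

A larger window gives each block more candidates to try, which is why raising -zc costs time roughly in proportion to the window size in a search like this one.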

RDO compression using MSE as the internal error metric is difficult on smooth or flat regions, because small errors that barely move MSE are very visible there. The RDO encoder tries to automatically scale up the computed MSEs of smooth blocks (using a simple linear function of each block's maximum color channel standard deviation), but the settings are conservative. You'll notice a message like this printed when you use -z:

Using an automatically computed smooth block error scale of 19.375000
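The -z log prints "Max smooth block std dev: 18.000000" and "Smooth block max MSE scale: 19.375000". A minimal sketch of a linear scale built from those two parameters follows, assuming the scale ramps from the max value for a perfectly flat block down to 1.0 once the block's largest per-channel (population) standard deviation reaches the threshold. The exact curve bc7enc_rdo uses may differ, and the function names are hypothetical.

```python
import statistics
from typing import List, Tuple


def max_channel_std_dev(texels: List[Tuple[int, int, int]]) -> float:
    """Largest population standard deviation over the R, G and B channels of a 4x4 block."""
    return max(statistics.pstdev(channel) for channel in zip(*texels))


def smooth_block_mse_scale(max_std_dev: float,
                           max_smooth_std_dev: float = 18.0,
                           max_scale: float = 19.375) -> float:
    """Linear ramp (assumed): max_scale for flat blocks, 1.0 for blocks at or
    above max_smooth_std_dev. The block's MSE is multiplied by this value."""
    t = min(max_std_dev / max_smooth_std_dev, 1.0)
    return max_scale + (1.0 - max_scale) * t


flat = [(128, 128, 128)] * 16
busy = [(0, 0, 0), (255, 255, 255)] * 8

for name, block in (("flat", flat), ("busy", busy)):
    sd = max_channel_std_dev(block)
    print(f"{name}: std dev {sd:.1f}, MSE scale {smooth_block_mse_scale(sd):.3f}")
```

With max_scale set to 1.0 the ramp is flat at 1.0 for every block, which matches the description of -zb1.0 disabling smooth block handling.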

By default the command line tool tries to compute a max smooth block factor based on the supplied lambda setting. No single calculation or set of settings works perfectly on all input textures, but the formula in the code works OK for most textures at low-ish lambdas. (For an example of a difficult texture the current formula/settings don't handle so well, try encoding kodim03 at lambdas 1-3.) I tried to tune smooth block handling so that at lambdas at or near 1 the output looks OK on textures with smooth gradients, skies, etc.

You can use the -zb# option to manually set the max smooth block scale factor to a higher value. Values from -zb30 to -zb100 work well, but you'll need to experiment. -zb1.0 disables all smooth block handling, so only the plain MSE is plugged into the lambda calculation.
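To see why raising -zb protects smooth blocks, assume the per-block cost has the form J = scale·MSE + λ·bits, where scale is the smooth block factor (equal to the -zb maximum for a perfectly flat block). The numbers below are made up: a substitution that adds 2.0 MSE and saves 40 bits at λ = 0.5. Under these assumptions any scale above λ·bits/MSE = 10 keeps the original block. The helper name is hypothetical.

```python
def prefers_substitution(mse_increase: float, bits_saved: float,
                         lam: float, mse_scale: float) -> bool:
    """True if replacing the block lowers J = mse_scale * MSE + lam * bits."""
    return mse_scale * mse_increase < lam * bits_saved


# Hypothetical flat block: substitution adds 2.0 MSE and saves 40 bits at lambda 0.5.
for zb in (1.0, 19.375, 30.0, 100.0):
    print(f"-zb{zb}: substitute = {prefers_substitution(2.0, 40, 0.5, zb)}")
```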
