Saturday, January 23, 2021

How to benchmark or use the UASTC encoder in the Basis Universal library

UASTC is a subset of LDR ASTC 4x4: the block size is always 4x4, the bitrate is always 8bpp, and the quality is very high. If your engine/product/benchmark supports BC7 or LDR ASTC 4x4, trying out our UASTC encoder/transcoder (without using .basis or .KTX2 at all) is pretty simple:

Compile/link in the Basis Universal encoder and transcoder .cpp files (or put them into libs). Call basisu_encoder_init() at startup.

To encode 4x4 blocks to the 8bpp UASTC format, call encode_uastc():
https://github.com/BinomialLLC/basis_universal/blob/master/encoder/basisu_uastc_enc.h
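Before you can call a per-block encoder you need to chop your image into 4x4 blocks. Here's a minimal sketch of that step, assuming a tightly packed RGBA8 image and an encoder that consumes 16 RGBA pixels (64 bytes) per block in row-major order. The names extract_block_4x4() and num_blocks_4x4() are hypothetical helpers, not part of Basis Universal; check basisu_uastc_enc.h for the exact input encode_uastc() expects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Number of 4x4 blocks needed to cover `dim` pixels (rounds up).
inline uint32_t num_blocks_4x4(uint32_t dim) { return (dim + 3) / 4; }

// Copies the 4x4 block at block coordinates (bx, by) out of a tightly packed
// RGBA8 image into out[64], row-major. Pixels past the right/bottom edge are
// clamped to the nearest valid pixel, so images whose dimensions are not
// multiples of 4 still produce full blocks.
// Assumption: the encoder wants 16 RGBA pixels (64 bytes) per block.
inline void extract_block_4x4(const uint8_t* rgba, uint32_t width, uint32_t height,
                              uint32_t bx, uint32_t by, uint8_t out[64])
{
    assert(rgba && width > 0 && height > 0);
    for (uint32_t y = 0; y < 4; ++y)
    {
        const uint32_t sy = std::min(by * 4 + y, height - 1);
        for (uint32_t x = 0; x < 4; ++x)
        {
            const uint32_t sx = std::min(bx * 4 + x, width - 1);
            const std::size_t src = (static_cast<std::size_t>(sy) * width + sx) * 4;
            std::memcpy(out + (y * 4 + x) * 4, rgba + src, 4);
        }
    }
}
```

Loop by over num_blocks_4x4(height) and bx over num_blocks_4x4(width), and hand each 64-byte block to the encoder. Clamping at the edges is just one common choice; other padding schemes work too.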

To decode UASTC blocks to raw 32bpp pixels, call

bool unpack_uastc(const uastc_block& blk, color32* pPixels, bool srgb);

Set the "srgb" flag to always false right now, because that's what the UASTC encoder assumes it will be set to. (We're fixing this for the Feb. release.)

Or you can call transcode_uastc_to_bc7() or transcode_uastc_to_astc(), then unpack those blocks yourself (ASTC will always be equal or higher quality than BC7 because UASTC is a pure subset of LDR 4x4 ASTC):

https://github.com/BinomialLLC/basis_universal/blob/master/transcoder/basisu_transcoder_uastc.h

There's an optional RDO post processor in there too that you can call on arrays of UASTC blocks, but it's pretty basic right now. See uastc_rdo().

The advantage of UASTC is that you can transcode it at run-time to basically any texture format. There are very high quality transcoders to BC1-5, ETC1/2, BC7, etc. It even supports PVRTC1. The disadvantage is a slight drop in quality vs. best BC7/ASTC, but not much, and slower encoding. We even throw in a free RDO encoder (as a simple post processor) for UASTC.

Tuesday, September 15, 2020

LZHAM and crunch are now Public Domain software

As of 9/15/2020, acting as the full legal owner of the LZHAM and crunch data compression libraries, I (acting as an individual) have placed these libraries into the Public Domain. For jurisdictions that don't recognize releasing Public Domain software, there are unlicense-style fallback clauses:


Thanks to Cowles & Thompson, a law firm in Dallas, TX for making this Public Domain release possible.

Wednesday, August 26, 2020

LZHAM and "crunch" IP will be placed into the Public Domain on 9/15/2020

As the owner of the "LZHAM" and "crunch" free open source software IP, I have decided to place these two works into the Public Domain in the United States, expressly waiving copyright protection. Once this is done this software will no longer be my or anyone's IP (i.e. it will NOT BE INTELLECTUAL PROPERTY, OR ANYONE'S PROPERTY). The upload placing these works into the Public Domain will occur on 9/15/2020 around noon EST.

This public domain declaration and anti-copyright waiver (somewhat derived from the unlicense and CC0) will be distributed along with the software:

THIS SOFTWARE IS IN THE PUBLIC DOMAIN

THIS IS FREE AND UNENCUMBERED SOFTWARE EXPLICITLY AND OVERTLY RELEASED AND CONTRIBUTED TO THE PUBLIC DOMAIN, PERMANENTLY, IRREVOCABLY AND UNCONDITIONALLY WAIVING ANY AND ALL CLAIM OF COPYRIGHT, IN PERPETUITY ON SEPTEMBER 15, 2020.

1. FALLBACK CLAUSES

THIS SOFTWARE MAY BE FREELY USED, DERIVED FROM, EXECUTED, LINKED WITH, MODIFIED AND DISTRIBUTED FOR ANY PURPOSE, COMMERCIAL OR NON-COMMERCIAL, BY ANYONE, FOR ANY REASON, WITH NO ATTRIBUTION, IN PERPETUITY.

THE AUTHOR OR AUTHORS OF THIS WORK HEREBY OVERTLY, FULLY, PERMANENTLY, IRREVOCABLY AND UNCONDITIONALLY FORFEITS AND WAIVES ALL CLAIM OF COPYRIGHT (ECONOMIC AND MORAL), ANY AND ALL RIGHTS OF INTEGRITY, AND ANY AND ALL RIGHTS OF ATTRIBUTION. ANYONE IS FREE TO COPY, MODIFY, ENHANCE, OPTIMIZE, PUBLISH, USE, COMPILE, DECOMPILE, ASSEMBLE, DISASSEMBLE, DOWNLOAD, UPLOAD, TRANSMIT, RECEIVE, SELL, FORK, DERIVE FROM, LINK, LINK TO, CALL, REFERENCE, WRAP, THUNK, ENCODE, ENCRYPT, TRANSFORM, STORE, RETRIEVE, DISTORT, DESTROY, RENAME, DELETE, BROADCAST, OR DISTRIBUTE THIS SOFTWARE, EITHER IN SOURCE CODE FORM, IN A TRANSLATED FORM, AS A LIBRARY, AS TEXT, IN PRINT, OR AS A COMPILED BINARY OR EXECUTABLE PROGRAM, OR IN DIGITAL FORM, OR IN ANALOG FORM, OR IN PHYSICAL FORM, OR IN ANY OTHER REPRESENTATION, FOR ANY PURPOSE, COMMERCIAL OR NON-COMMERCIAL, AND BY ANY MEANS, WITH NO ATTRIBUTION, IN PERPETUITY.

2. ANTI-COPYRIGHT WAIVER AND STATEMENT OF INTENT

IN JURISDICTIONS THAT RECOGNIZE COPYRIGHT LAWS, THE AUTHOR OR AUTHORS OF THIS SOFTWARE OVERTLY, FULLY, PERMANENTLY, IRREVOCABLY AND UNCONDITIONALLY DEDICATE, FORFEIT, AND WAIVE ANY AND ALL COPYRIGHT INTEREST IN THE SOFTWARE TO THE PUBLIC DOMAIN. WE MAKE THIS DEDICATION AND WAIVER FOR THE BENEFIT OF THE PUBLIC AT LARGE AND TO THE DETRIMENT OF OUR HEIRS AND SUCCESSORS. WE INTEND THIS DEDICATION AND WAIVER TO BE AN OVERT ACT OF RELINQUISHMENT IN PERPETUITY OF ALL PRESENT AND FUTURE RIGHTS TO THIS SOFTWARE UNDER COPYRIGHT LAW. WE INTEND THIS SOFTWARE TO BE FREELY USED, COMPILED, EXECUTED, MODIFIED, PUBLISHED, DERIVED FROM, OR DISTRIBUTED BY ANYONE, FOR ANY COMMERCIAL OR NON-COMMERCIAL USE, WITH NO ATTRIBUTION, IN PERPETUITY.

3. NO WARRANTY CLAUSE

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHOR OR AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE, OR DERIVING FROM THE SOFTWARE, OR LINKING WITH THE SOFTWARE, OR CALLING THE SOFTWARE, OR EXECUTING THE SOFTWARE, OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 

4. FINAL ANTI-COPYRIGHT AND INTENT FALLBACK CLAUSE

SHOULD ANY PART OF THIS PUBLIC DOMAIN DECLARATION, OR THE FALLBACK CLAUSES, OR THE ANTI-COPYRIGHT WAIVER FOR ANY REASON BE JUDGED LEGALLY INVALID OR INEFFECTIVE UNDER APPLICABLE LAW, THEN THE PUBLIC DOMAIN DECLARATION, THE FALLBACK CLAUSES, AND ANTI-COPYRIGHT WAIVER SHALL BE PRESERVED TO THE MAXIMUM EXTENT PERMITTED BY LAW TAKING INTO ACCOUNT THE ABOVE STATEMENT OF INTENT.

Thursday, April 16, 2020

Yet another BC1 encoder benchmark

This benchmark compares stb_dxt v1.09, icbc, rgbcx v1.12, original crunch, and Unity's optimized variant of crunch across a diverse assortment of 100 textures (not just images). Both 4 and 3 color blocks can be used, but transparent texels are not utilized to get black/dark texels in this benchmark.



Same benchmark except this time with 3-color transparent texels used for black or dark texels in rgbcx (purple samples):


Here's an update, now with nvdxt.exe (black sample) and ispc_texcomp (brown sample). Note that the nvdxt.exe time is approximate because I had to spawn nvdxt.exe and it loads a .png and saves a .dds file. I did spawn it twice, once without timing it, then immediately again timing it.


nvdxt.exe command line:

nvdxt.exe -nomipmap -quality_highest -rms_threshold 50 -file image.png -output nvcompressed.dds -dxt1c -weight 1.0 1.0 1.0


Thursday, April 9, 2020

BC1 encoding initial endpoint determination benchmark

Benchmark of BC1 encoders using different methods to determine the initial endpoints: 

stb_dxt.h PCA: 35.754 dB, .551 us/block
rgbcx.h PCA: 35.794 dB, .651 us/block
rgbcx.h PCA+inset: 35.925 dB, .640 us/block
rgbcx.h 2D LS+inset+opt round: 35.920 dB, .541 us/block
rgbcx.h bounds+inset+XY covar: 35.836 dB, .472 us/block

This is across 100 textures, so even small avg. improvements are significant. Amazingly, the inset method (a few lines of code) buys rgbcx.h PCA .131 dB! All encoders should be doing this. You *must* pay attention to every little detail in these texture encoders.
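For anyone reproducing numbers like these, here's a minimal sketch of the usual RGB PSNR calculation for 8-bit data. The helper names rgb_mse() and psnr_from_mse() are hypothetical, and the post doesn't say exactly how the per-texture results were averaged, so treat this as one reasonable way to get a dB figure rather than the exact method behind the list above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Mean squared error over the R, G and B channels of two RGBA8 buffers.
// Alpha is ignored, which is an assumption that suits opaque BC1 content.
inline double rgb_mse(const uint8_t* a, const uint8_t* b, std::size_t num_pixels)
{
    assert(a && b && num_pixels > 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < num_pixels; ++i)
    {
        for (int c = 0; c < 3; ++c)
        {
            const double d = double(a[i * 4 + c]) - double(b[i * 4 + c]);
            sum += d * d;
        }
    }
    return sum / (double(num_pixels) * 3.0);
}

// PSNR in dB for 8-bit data (peak value 255). Identical images give +infinity.
inline double psnr_from_mse(double mse)
{
    if (mse <= 0.0)
        return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

Because PSNR is logarithmic, a gain of .131 dB works out to roughly 3% lower MSE (10^(.0131) is about 1.03).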

Quality is performance in competitive texture block encoding, so even small boosts in quality allow us to dial down the number of total orderings to check for the same average quality. This leads to a more competitive encoder.

Methods:

- bounds+inset+XY covar method is Castano's/van Waveren's. 
All encoders should be applying the "inset" method described in this paper, because from a quantization perspective it makes perfect sense.

- 2D LS is Humus's method, ported to mostly integer math, with added inset+optimal rounding to 565: 

- stb_dxt.h and rgbcx.h PCA is 3D integer PCA (3x3 covar+4 power iters, pick 2 colors along principal axis). 

- PCA+inset+optimal rounding does PCA, picks 2 colors, then lerps the 2 colors by 1/16 or 15/16, then optimal rounds to 565.
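The inset and optimal rounding steps described in the list above can be sketched as follows. This is an illustration under stated assumptions, not rgbcx.h's actual code: the function names are hypothetical, it works in floats, and "optimal rounding" is taken to mean picking the 5- or 6-bit code whose bit-replicated 8-bit expansion lands closest to the target value.

```cpp
#include <cassert>
#include <cmath>

// Expands a 5- or 6-bit value to 8 bits by bit replication.
inline int expand_to_8(int q, int bits)
{
    assert(bits == 5 || bits == 6);
    return (q << (8 - bits)) | (q >> (2 * bits - 8));
}

// Returns the `bits`-wide code whose 8-bit expansion is closest to v.
// Brute force for clarity; a real encoder would use a table or closed form.
inline int round_optimal(float v, int bits)
{
    int best = 0;
    float best_err = 1e9f;
    for (int q = 0; q < (1 << bits); ++q)
    {
        const float err = std::fabs(float(expand_to_8(q, bits)) - v);
        if (err < best_err) { best_err = err; best = q; }
    }
    return best;
}

// Insets two 8-bit-range RGB endpoints by lerping at 1/16 and 15/16,
// then rounds each component to 5:6:5.
inline void inset_and_quantize(const float lo[3], const float hi[3],
                               int qlo[3], int qhi[3])
{
    static const int bits[3] = { 5, 6, 5 };
    for (int c = 0; c < 3; ++c)
    {
        const float a = lo[c] + (hi[c] - lo[c]) * (1.0f / 16.0f);
        const float b = lo[c] + (hi[c] - lo[c]) * (15.0f / 16.0f);
        qlo[c] = round_optimal(a, bits[c]);
        qhi[c] = round_optimal(b, bits[c]);
    }
}
```

Optimal rounding matters at the edges of each quantization bin. An 8-bit value of 7 truncates to the 5-bit code 0 (which decodes to 0), while round_optimal() picks code 1 (which decodes to 8).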

Wednesday, April 8, 2020

AMD GPU BC1 decoding lookup tables

Here are the lookup tables you can use to determine how AMD GPUs decode BC1 textures: https://pastebin.com/raw/LSgn0ent

These tables were gathered straight from a Radeon RX 580 by using a small D3D9 app that rendered a textured BC1 quad with point sampling and did a CPU readback. I used this same D3D9 app on an NVidia 1080 and the pixels I read back exactly matched what the NV BC1 formulas on the web predicted, so I'm confident in the approach.

For selectors 0 and 1, the 5->8 and 6->8 endpoint conversion just uses bitshifts/OR's (same as ideal BC1). For 4-color selector 2, use the tables. For selector 3, just invert the low/high endpoints. (I've verified you can do this.) For 3-color selector 2, use the tables.

To access the tables, use [color0_component*32+color1_component], or *64 for 6-bits:
Block Compression (Direct3D 10) - Win32 apps, docs.microsoft.com
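Here's a rough sketch of how the indexing and selector rules described above fit together for one color component of a 4-color block. It assumes the selector 2 table has already been loaded into a flat byte array; the function names and the sel2_table parameter are hypothetical, and the actual table contents have to come from the pastebin data.

```cpp
#include <cassert>
#include <cstdint>

// Selectors 0 and 1: plain 5->8 and 6->8 expansion with shifts/ORs.
inline uint8_t expand5(uint32_t c) { assert(c < 32); return uint8_t((c << 3) | (c >> 2)); }
inline uint8_t expand6(uint32_t c) { assert(c < 64); return uint8_t((c << 2) | (c >> 4)); }

// Flat table index: c0*32+c1 for 5-bit components, c0*64+c1 for 6-bit ones.
inline uint32_t table_index(uint32_t c0, uint32_t c1, uint32_t bits)
{
    assert(bits == 5 || bits == 6);
    const uint32_t n = 1u << bits;
    assert(c0 < n && c1 < n);
    return c0 * n + c1;
}

// Decodes one component of a 4-color block.
// sel2_table is a hypothetical flat array holding the selector 2 results
// for this component width (32*32 or 64*64 entries).
inline uint8_t decode_component_4color(const uint8_t* sel2_table, uint32_t c0,
                                       uint32_t c1, uint32_t bits, uint32_t selector)
{
    assert(sel2_table && selector < 4);
    switch (selector)
    {
    case 0:  return bits == 5 ? expand5(c0) : expand6(c0);
    case 1:  return bits == 5 ? expand5(c1) : expand6(c1);
    case 2:  return sel2_table[table_index(c0, c1, bits)];
    default: return sel2_table[table_index(c1, c0, bits)]; // selector 3: endpoints swapped
    }
}
```

Red and blue would use the 5-bit table and green the 6-bit one. 3-color blocks need their own selector 2 table and aren't covered by this sketch.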

Converting the tables to formulas sounds like an interesting puzzle.

Example showing exactly how to use the tables to decode AMD BC1:


Tuesday, April 7, 2020

CPU BC1 Encoding Pareto Frontier

rgbcx.h now defines the BC1 Pareto Frontier for high quality CPU BC1 encoding (i.e. it's stronger than all other available practical high quality CPU encoders for both performance and quality):


Data:


I didn't include AMD Compressonator's encoder because in previous benchmarks (conducted by others) it was beaten by a weaker version of rgbcx.h for both perf. and quality.

The overall CPU BC1 Pareto frontier is defined by ispc_texcomp (at low quality: ~33.1 dB) and rgbcx for any higher quality level. We're going to need SIMD to compete against ispc_texcomp BC1 (a weak stb_dxt clone), which is my next major goal.
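If you want to work out a frontier like this from your own measurements, here's a minimal sketch. The Sample struct and pareto_frontier() are hypothetical names. A sample is kept only if no other sample is both at least as fast and at least as good, with lower us/block and higher dB counted as better.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <string>
#include <vector>

struct Sample
{
    std::string name;
    double us_per_block; // lower is better
    double psnr_db;      // higher is better
};

// Returns the non-dominated samples, ordered from fastest to slowest.
inline std::vector<Sample> pareto_frontier(std::vector<Sample> s)
{
    // Sort fastest first; on equal time, put the higher quality sample first.
    std::sort(s.begin(), s.end(), [](const Sample& a, const Sample& b) {
        if (a.us_per_block != b.us_per_block)
            return a.us_per_block < b.us_per_block;
        return a.psnr_db > b.psnr_db;
    });

    std::vector<Sample> frontier;
    double best_psnr = -std::numeric_limits<double>::infinity();
    for (const Sample& x : s)
    {
        assert(x.us_per_block >= 0.0);
        // Everything already visited is at least as fast, so x survives only
        // if it strictly beats the best quality seen so far.
        if (x.psnr_db > best_psnr)
        {
            frontier.push_back(x);
            best_psnr = x.psnr_db;
        }
    }
    return frontier;
}
```

For instance, feeding it the five (us/block, dB) pairs listed in the April 9 post keeps three of them: bounds+inset+XY covar, 2D LS+inset+opt round, and PCA+inset. The other two are each beaten on both axes by 2D LS+inset+opt round.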

To get rgbcx to compete against icbc for max. quality I had to add prioritized cluster fit support for 3-color blocks (not just 4).

It's possible to permit rgbcx to go to even higher quality levels by enlarging the total ordering tables. They're currently limited to 32 entries per total ordering.

I think rgbcx.h's max quality is slightly higher than icbc's HQ mode because prioritized cluster fit can afford to do optimal rounding and evaluate accurate MSE errors in every trial. Regular cluster fit can't afford to do so because it has to evaluate so many total orderings.

Links:
rgbcx: https://github.com/richgel999/bc7enc
libsquish: https://github.com/richgel999/libsquish
icbc: https://github.com/castano/icbc/blob/master/icbc.h