Tuesday, September 15, 2020

LZHAM and crunch are now Public Domain software

As of 9/15/2020, the LZHAM and crunch data compression libraries are in the Public Domain. For jurisdictions that don't recognize releasing Public Domain software, there are unlicense-style fallback clauses:


Thanks to Cowles & Thompson, a law firm in Dallas, TX, for making this Public Domain release possible.

Wednesday, August 26, 2020

LZHAM and "crunch" IP will be placed into the Public Domain on 9/15/2020

As the owner of the "LZHAM" and "crunch" free open source software IP, I have decided to place these two works into the Public Domain in the United States, expressly waiving copyright protection. Once this is done, this software will no longer be my or anyone's IP (i.e. it will NOT BE INTELLECTUAL PROPERTY, OR ANYONE'S PROPERTY). The upload placing these works into the Public Domain will occur on 9/15/2020 around noon EST.

This public domain declaration and anti-copyright waiver (somewhat derived from the unlicense and CC0) will be distributed along with the software:

THIS SOFTWARE IS IN THE PUBLIC DOMAIN

THIS IS FREE AND UNENCUMBERED SOFTWARE EXPLICITLY AND OVERTLY RELEASED AND CONTRIBUTED TO THE PUBLIC DOMAIN, PERMANENTLY, IRREVOCABLY AND UNCONDITIONALLY WAIVING ANY AND ALL CLAIM OF COPYRIGHT, IN PERPETUITY ON SEPTEMBER 15, 2020.

1. FALLBACK CLAUSES

THIS SOFTWARE MAY BE FREELY USED, DERIVED FROM, EXECUTED, LINKED WITH, MODIFIED AND DISTRIBUTED FOR ANY PURPOSE, COMMERCIAL OR NON-COMMERCIAL, BY ANYONE, FOR ANY REASON, WITH NO ATTRIBUTION, IN PERPETUITY.

THE AUTHOR OR AUTHORS OF THIS WORK HEREBY OVERTLY, FULLY, PERMANENTLY, IRREVOCABLY AND UNCONDITIONALLY FORFEITS AND WAIVES ALL CLAIM OF COPYRIGHT (ECONOMIC AND MORAL), ANY AND ALL RIGHTS OF INTEGRITY, AND ANY AND ALL RIGHTS OF ATTRIBUTION. ANYONE IS FREE TO COPY, MODIFY, ENHANCE, OPTIMIZE, PUBLISH, USE, COMPILE, DECOMPILE, ASSEMBLE, DISASSEMBLE, DOWNLOAD, UPLOAD, TRANSMIT, RECEIVE, SELL, FORK, DERIVE FROM, LINK, LINK TO, CALL, REFERENCE, WRAP, THUNK, ENCODE, ENCRYPT, TRANSFORM, STORE, RETRIEVE, DISTORT, DESTROY, RENAME, DELETE, BROADCAST, OR DISTRIBUTE THIS SOFTWARE, EITHER IN SOURCE CODE FORM, IN A TRANSLATED FORM, AS A LIBRARY, AS TEXT, IN PRINT, OR AS A COMPILED BINARY OR EXECUTABLE PROGRAM, OR IN DIGITAL FORM, OR IN ANALOG FORM, OR IN PHYSICAL FORM, OR IN ANY OTHER REPRESENTATION, FOR ANY PURPOSE, COMMERCIAL OR NON-COMMERCIAL, AND BY ANY MEANS, WITH NO ATTRIBUTION, IN PERPETUITY.

2. ANTI-COPYRIGHT WAIVER AND STATEMENT OF INTENT

IN JURISDICTIONS THAT RECOGNIZE COPYRIGHT LAWS, THE AUTHOR OR AUTHORS OF THIS SOFTWARE OVERTLY, FULLY, PERMANENTLY, IRREVOCABLY AND UNCONDITIONALLY DEDICATE, FORFEIT, AND WAIVE ANY AND ALL COPYRIGHT INTEREST IN THE SOFTWARE TO THE PUBLIC DOMAIN. WE MAKE THIS DEDICATION AND WAIVER FOR THE BENEFIT OF THE PUBLIC AT LARGE AND TO THE DETRIMENT OF OUR HEIRS AND SUCCESSORS. WE INTEND THIS DEDICATION AND WAIVER TO BE AN OVERT ACT OF RELINQUISHMENT IN PERPETUITY OF ALL PRESENT AND FUTURE RIGHTS TO THIS SOFTWARE UNDER COPYRIGHT LAW. WE INTEND THIS SOFTWARE TO BE FREELY USED, COMPILED, EXECUTED, MODIFIED, PUBLISHED, DERIVED FROM, OR DISTRIBUTED BY ANYONE, FOR ANY COMMERCIAL OR NON-COMMERCIAL USE, WITH NO ATTRIBUTION, IN PERPETUITY.

3. NO WARRANTY CLAUSE

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHOR OR AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE, OR DERIVING FROM THE SOFTWARE, OR LINKING WITH THE SOFTWARE, OR CALLING THE SOFTWARE, OR EXECUTING THE SOFTWARE, OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 

4. FINAL ANTI-COPYRIGHT AND INTENT FALLBACK CLAUSE

SHOULD ANY PART OF THIS PUBLIC DOMAIN DECLARATION, OR THE FALLBACK CLAUSES, OR THE ANTI-COPYRIGHT WAIVER FOR ANY REASON BE JUDGED LEGALLY INVALID OR INEFFECTIVE UNDER APPLICABLE LAW, THEN THE PUBLIC DOMAIN DECLARATION, THE FALLBACK CLAUSES, AND ANTI-COPYRIGHT WAIVER SHALL BE PRESERVED TO THE MAXIMUM EXTENT PERMITTED BY LAW TAKING INTO ACCOUNT THE ABOVE STATEMENT OF INTENT.

Thursday, April 16, 2020

Yet another BC1 encoder benchmark

Encoders tested: stb_dxt v1.09, icbc, rgbcx v1.12, original crunch, and Unity's optimized variant of crunch. Both 4-color and 3-color blocks can be used, but 3-color transparent texels are not used for black/dark texels in this benchmark. The test set is a diverse assortment of 100 textures (not just images).



Same benchmark except this time with 3-color transparent texels used for black or dark texels in rgbcx (purple samples):


Here's an update, now with nvdxt.exe (black sample) and ispc_texcomp (brown sample). Note that the nvdxt.exe time is approximate because I had to spawn nvdxt.exe as a separate process, which loads a .png and saves a .dds file. I spawned it twice: once without timing it, then immediately again while timing it.


nvdxt.exe command line:

nvdxt.exe -nomipmap -quality_highest -rms_threshold 50 -file image.png -output nvcompressed.dds -dxt1c -weight 1.0 1.0 1.0


Thursday, April 9, 2020

BC1 encoding initial endpoint determination benchmark

Benchmark of BC1 encoders using different methods to determine the initial endpoints: 

stb_dxt.h PCA: 35.754 dB, 0.551 us/block
rgbcx.h PCA: 35.794 dB, 0.651 us/block
rgbcx.h PCA+inset: 35.925 dB, 0.640 us/block
rgbcx.h 2D LS+inset+opt round: 35.920 dB, 0.541 us/block
rgbcx.h bounds+inset+XY covar: 35.836 dB, 0.472 us/block

This is averaged across 100 textures, so even small average improvements are significant. Amazingly, the inset method (a few lines of code) buys rgbcx.h PCA 0.131 dB! All encoders should be doing this. You *must* pay attention to every little detail in these texture encoders.
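Here's a minimal sketch of the inset step, assuming the two candidate endpoints are still unquantized floats in 0..255 (i.e. before rounding to 565). The struct and function names are hypothetical, not rgbcx.h's or stb_dxt.h's API. It simply moves each endpoint 1/16 of the way toward the other, which is the same as the 1/16 and 15/16 lerps described under Methods.

```cpp
#include <cassert>

// Hypothetical unquantized endpoint color, components in 0..255.
struct Color3f { float r, g, b; };

// Linear interpolation between two colors: a + (b - a) * t.
static Color3f lerp_color(const Color3f& a, const Color3f& b, float t)
{
    return { a.r + (b.r - a.r) * t, a.g + (b.g - a.g) * t, a.b + (b.b - a.b) * t };
}

// Inset: pull both endpoints toward each other by 1/16 of the segment,
// lo' = lerp(lo, hi, 1/16) and hi' = lerp(lo, hi, 15/16).
// Intended to run on the float endpoints, before quantizing to 565.
static void inset_endpoints(Color3f& lo, Color3f& hi)
{
    const Color3f new_lo = lerp_color(lo, hi, 1.0f / 16.0f);
    const Color3f new_hi = lerp_color(lo, hi, 15.0f / 16.0f);
    lo = new_lo;
    hi = new_hi;
}
```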

Quality is performance in competitive texture block encoding, so even small boosts in quality allow us to dial down the number of total orderings to check for the same average quality. This leads to a more competitive encoder.
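The quality figures in these benchmarks are in dB. The posts don't spell out the exact metric, so here's a minimal sketch assuming the common definition: PSNR = 10 * log10(255^2 / MSE), with the MSE taken over the RGB channels of the original and decoded texture. The function name and the interleaved 8-bit pixel layout are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// PSNR (dB) between an original and a decoded image stored as interleaved
// 8-bit channels. 'stride' is bytes per pixel (e.g. 4 for RGBA8) and only the
// first 'channels' channels are compared (3 = RGB, the usual choice for BC1).
double psnr_db(const uint8_t* orig, const uint8_t* decoded, size_t num_pixels,
               size_t stride, size_t channels)
{
    assert(num_pixels > 0 && channels > 0 && channels <= stride);
    double sum_sq = 0.0;
    for (size_t i = 0; i < num_pixels; ++i)
        for (size_t c = 0; c < channels; ++c) {
            const double d = double(orig[i * stride + c]) - double(decoded[i * stride + c]);
            sum_sq += d * d;
        }
    const double mse = sum_sq / double(num_pixels * channels);
    if (mse == 0.0)
        return std::numeric_limits<double>::infinity(); // identical images
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

Per-texture values like this would then be averaged over the test set; whether the posts average PSNR or MSE across textures isn't stated.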

Methods:

- bounds+inset+XY covar is Castano's/van Waveren's method. All encoders should be applying the "inset" method described in their paper, because from a quantization perspective it makes perfect sense.

- 2D LS is Humus's method, ported to mostly integer math, with added inset+optimal rounding to 565: 

- stb_dxt.h and rgbcx.h PCA is 3D integer PCA (3x3 covariance + 4 power iterations, then pick 2 colors along the principal axis).

- PCA+inset+optimal rounding does PCA, picks 2 colors, lerps between them at 1/16 and 15/16 (the inset), then optimally rounds each to 565.
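A floating-point sketch of the PCA+inset+optimal rounding pipeline described above follows. Several details are assumptions rather than what stb_dxt.h or rgbcx.h actually do: it uses floats instead of integer math, seeds the power iteration with the covariance column that has the largest diagonal entry, picks the two texels with the minimum and maximum projection onto the axis, and interprets "optimal rounding" as choosing the 5- or 6-bit code whose bit-replicated 8-bit expansion is closest to the value. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

struct Rgb { float r, g, b; };

static int expand5(int q) { return (q << 3) | (q >> 2); }
static int expand6(int q) { return (q << 2) | (q >> 4); }

// "Optimal" rounding (assumed meaning): brute-force the code whose
// bit-replicated 8-bit expansion is closest to v.
static int round_optimal(float v, int bits)
{
    assert(bits == 5 || bits == 6);
    const int max_q = (1 << bits) - 1;
    int best = 0;
    float best_err = std::numeric_limits<float>::max();
    for (int q = 0; q <= max_q; ++q) {
        const float err = std::fabs(float(bits == 5 ? expand5(q) : expand6(q)) - v);
        if (err < best_err) { best_err = err; best = q; }
    }
    return best;
}

static uint16_t pack565(const Rgb& c)
{
    return uint16_t((round_optimal(c.r, 5) << 11) | (round_optimal(c.g, 6) << 5) |
                    round_optimal(c.b, 5));
}

// Principal axis of 16 texels: 3x3 covariance, then 4 power iterations.
static Rgb principal_axis(const Rgb px[16])
{
    Rgb m = {0, 0, 0};
    for (int i = 0; i < 16; ++i) { m.r += px[i].r; m.g += px[i].g; m.b += px[i].b; }
    m = {m.r / 16, m.g / 16, m.b / 16};
    float c[3][3] = {};
    for (int i = 0; i < 16; ++i) {
        const float d[3] = {px[i].r - m.r, px[i].g - m.g, px[i].b - m.b};
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) c[j][k] += d[j] * d[k];
    }
    // Seed: covariance column with the largest diagonal entry (an assumption).
    int s = 0;
    for (int j = 1; j < 3; ++j) if (c[j][j] > c[s][s]) s = j;
    float v[3] = {c[0][s], c[1][s], c[2][s]};
    for (int it = 0; it < 4; ++it) {
        float w[3];
        for (int j = 0; j < 3; ++j) w[j] = c[j][0] * v[0] + c[j][1] * v[1] + c[j][2] * v[2];
        const float len = std::sqrt(w[0] * w[0] + w[1] * w[1] + w[2] * w[2]);
        if (len < 1e-8f) break; // solid-color block: no meaningful axis
        for (int j = 0; j < 3; ++j) v[j] = w[j] / len;
    }
    return {v[0], v[1], v[2]};
}

// PCA + inset + optimal rounding: take the texels with the min/max projection
// onto the principal axis, inset them by 1/16, and round each to 565.
static void pca_endpoints_565(const Rgb px[16], uint16_t& lo565, uint16_t& hi565)
{
    const Rgb a = principal_axis(px);
    int imin = 0, imax = 0;
    float pmin = std::numeric_limits<float>::max(), pmax = -pmin;
    for (int i = 0; i < 16; ++i) {
        const float p = px[i].r * a.r + px[i].g * a.g + px[i].b * a.b;
        if (p < pmin) { pmin = p; imin = i; }
        if (p > pmax) { pmax = p; imax = i; }
    }
    Rgb lo = px[imin], hi = px[imax];
    const Rgb d = {(hi.r - lo.r) / 16, (hi.g - lo.g) / 16, (hi.b - lo.b) / 16};
    lo = {lo.r + d.r, lo.g + d.g, lo.b + d.b};
    hi = {hi.r - d.r, hi.g - d.g, hi.b - d.b};
    lo565 = pack565(lo);
    hi565 = pack565(hi);
}
```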

Wednesday, April 8, 2020

AMD GPU BC1 decoding lookup tables

Here are the lookup tables you can use to determine how AMD GPUs decode BC1 textures: https://pastebin.com/raw/LSgn0ent

These tables were gathered straight from a Radeon RX 580 by using a small D3D9 app that rendered a textured BC1 quad with point sampling and did a CPU readback. I used this same D3D9 app on an NVidia 1080 and the pixels I read back exactly matched what the NV BC1 formulas on the web predicted, so I'm confident in the approach.

For selectors 0 and 1, the 5->8 and 6->8 endpoint conversion just uses bit shifts and ORs (same as ideal BC1). For 4-color selector 2, use the tables. For selector 3, just swap the low/high endpoints and look up the selector 2 table. (I've verified you can do this.) For 3-color selector 2, use the tables.
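For selectors 0 and 1 the decoded color is just the expanded endpoint. A minimal sketch of that standard BC1 bit-replication expansion (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// 5-bit -> 8-bit: shift left by 3, then replicate the top 3 bits into the low bits.
static inline uint8_t expand5(uint32_t v) { assert(v < 32); return uint8_t((v << 3) | (v >> 2)); }
// 6-bit -> 8-bit: shift left by 2, then replicate the top 2 bits into the low bits.
static inline uint8_t expand6(uint32_t v) { assert(v < 64); return uint8_t((v << 2) | (v >> 4)); }

// Unpack a 565 endpoint (R in bits 15..11, G in 10..5, B in 4..0) to 8-bit RGB.
static void unpack565(uint16_t c, uint8_t rgb[3])
{
    rgb[0] = expand5((c >> 11) & 31u);
    rgb[1] = expand6((c >> 5) & 63u);
    rgb[2] = expand5(c & 31u);
}
```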

To access the tables, use [color0_component*32+color1_component], or *64 for 6-bit components (see Microsoft's "Block Compression (Direct3D 10)" documentation on docs.microsoft.com for the BC1 color0/color1 layout).
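Here's a hedged sketch of how a decoder might use tables indexed that way for one component of a 4-color block. The in-memory layout (`AmdBc1Sel2Tables`, one byte per entry) and all names are assumptions. `fill_placeholder` only fills the tables with an idealized (2*e0 + e1 + 1) / 3 blend so the sketch runs; that is not AMD's behavior, so the real measured values from the pastebin link would have to be loaded instead. The 3-color selector 2 tables would presumably be indexed the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

static uint8_t expand5(uint32_t v) { return uint8_t((v << 3) | (v >> 2)); }
static uint8_t expand6(uint32_t v) { return uint8_t((v << 2) | (v >> 4)); }

// HYPOTHETICAL layout for the 4-color selector 2 tables: one 8-bit result per
// (color0, color1) component pair, indexed [c0 * 32 + c1] for 5-bit R/B and
// [c0 * 64 + c1] for 6-bit G.
struct AmdBc1Sel2Tables {
    std::vector<uint8_t> t5 = std::vector<uint8_t>(32 * 32);
    std::vector<uint8_t> t6 = std::vector<uint8_t>(64 * 64);
};

// Decode one component of a 4-color block (color0 > color1).
static uint8_t decode_4color_component(const AmdBc1Sel2Tables& t, uint32_t c0, uint32_t c1,
                                       uint32_t selector, bool six_bit)
{
    const uint32_t n = six_bit ? 64 : 32;
    assert(c0 < n && c1 < n && selector < 4);
    const std::vector<uint8_t>& sel2 = six_bit ? t.t6 : t.t5;
    switch (selector) {
    case 0:  return six_bit ? expand6(c0) : expand5(c0); // plain bit replication
    case 1:  return six_bit ? expand6(c1) : expand5(c1);
    case 2:  return sel2[c0 * n + c1];                   // table lookup
    default: return sel2[c1 * n + c0];                   // selector 3: endpoints swapped
    }
}

// PLACEHOLDER fill with an idealized blend so the sketch runs.
// These are NOT the measured AMD values.
static void fill_placeholder(AmdBc1Sel2Tables& t)
{
    for (uint32_t a = 0; a < 64; ++a)
        for (uint32_t b = 0; b < 64; ++b) {
            t.t6[a * 64 + b] = uint8_t((2 * expand6(a) + expand6(b) + 1) / 3);
            if (a < 32 && b < 32)
                t.t5[a * 32 + b] = uint8_t((2 * expand5(a) + expand5(b) + 1) / 3);
        }
}
```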

Converting the tables to formulas sounds like an interesting puzzle.

Example showing exactly how to use the tables to decode AMD BC1:


Tuesday, April 7, 2020

CPU BC1 Encoding Pareto Frontier

rgbcx.h now defines the BC1 Pareto Frontier for high quality CPU BC1 encoding (i.e. it's stronger than all other available practical high quality CPU encoders for both performance and quality):


Data:


I didn't include AMD Compressonator's encoder because in previous benchmarks (conducted by others) it was beaten by a weaker version of rgbcx.h for both perf. and quality.

The overall CPU BC1 Pareto frontier is defined by ispc_texcomp (at low quality: ~33.1 dB) and rgbcx for any higher quality level. We're going to need SIMD to compete against ispc_texcomp BC1 (a weak stb_dxt clone), which is my next major goal.
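To make "Pareto frontier" concrete: an encoder setting is on the frontier if no other setting is both at least as fast and at least as high quality (and strictly better in one of the two). A minimal sketch of extracting the frontier from (time, PSNR) samples; the `Sample` struct and function name are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Sample { std::string name; double us_per_block; double psnr_db; };

// Returns the non-dominated samples, ordered from fastest to slowest.
std::vector<Sample> pareto_frontier(std::vector<Sample> s)
{
    // Sort by time ascending; for equal time, higher quality first.
    std::sort(s.begin(), s.end(), [](const Sample& a, const Sample& b) {
        return a.us_per_block != b.us_per_block ? a.us_per_block < b.us_per_block
                                                : a.psnr_db > b.psnr_db;
    });
    std::vector<Sample> front;
    double best_psnr = -1e300;
    for (const Sample& x : s)
        if (x.psnr_db > best_psnr) { // beats everything at least as fast
            front.push_back(x);
            best_psnr = x.psnr_db;
        }
    return front;
}
```

Sorting by time and keeping each sample that beats the best quality seen so far is O(n log n), which is plenty for a handful of encoder settings.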

To get rgbcx to compete against icbc for max. quality I had to add prioritized cluster fit support for 3-color blocks (not just 4).

It's possible to permit rgbcx to go to even higher quality levels by enlarging the total ordering tables. They're currently limited to 32 entries per total ordering.

I think rgbcx.h's max quality is slightly higher than icbc's HQ mode because prioritized cluster fit can afford to do optimal rounding and evaluate accurate MSE errors in every trial. Regular cluster fit can't afford to do so because it has to evaluate so many total orderings.
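For context, every cluster fit variant spends its time on the same inner step: for one total ordering (how many of the texels, sorted along a fit axis, fall into each of the 4 selector clusters), solve the least-squares endpoints. With 16 texels there are C(19, 3) = 969 such orderings for a 4-color block, which is why evaluating all of them is expensive and why evaluating only a short prioritized list leaves room for extra work per trial. The sketch below is a generic least-squares solve, not rgbcx.h's code; the names and double-precision math are assumptions, and it skips the 565 rounding and error evaluation that would follow.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { double r, g, b; };

// Texels are sorted along the fit axis; counts[k] texels go to cluster k, whose
// weight on color0 is 1, 2/3, 1/3, 0. Minimizes sum |a*e0 + b*e1 - x|^2 with
// b = 1 - a via the 2x2 normal equations. Returns false if they are singular.
bool solve_endpoints(const std::array<Vec3, 16>& sorted, const std::array<int, 4>& counts,
                     Vec3& e0, Vec3& e1)
{
    static const double w0[4] = {1.0, 2.0 / 3.0, 1.0 / 3.0, 0.0};
    assert(counts[0] + counts[1] + counts[2] + counts[3] == 16);
    double aa = 0, ab = 0, bb = 0;
    Vec3 ax = {0, 0, 0}, bx = {0, 0, 0};
    int i = 0;
    for (int k = 0; k < 4; ++k)
        for (int j = 0; j < counts[k]; ++j, ++i) {
            const double a = w0[k], b = 1.0 - a;
            const Vec3& x = sorted[i];
            aa += a * a; ab += a * b; bb += b * b;
            ax.r += a * x.r; ax.g += a * x.g; ax.b += a * x.b;
            bx.r += b * x.r; bx.g += b * x.g; bx.b += b * x.b;
        }
    const double det = aa * bb - ab * ab;
    if (std::fabs(det) < 1e-9) return false; // e.g. every texel in one cluster
    const double inv = 1.0 / det;
    e0 = {(bb * ax.r - ab * bx.r) * inv, (bb * ax.g - ab * bx.g) * inv, (bb * ax.b - ab * bx.b) * inv};
    e1 = {(aa * bx.r - ab * ax.r) * inv, (aa * bx.g - ab * ax.g) * inv, (aa * bx.b - ab * ax.b) * inv};
    return true;
}
```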

Links:
rgbcx: https://github.com/richgel999/bc7enc
libsquish: https://github.com/richgel999/libsquish
icbc: https://github.com/castano/icbc/blob/master/icbc.h

Saturday, April 4, 2020

New BC1 benchmark

Optimizing BC1 encoding is still useful and interesting because the same core algorithms are used in BC7 and ASTC/UASTC encoders. Most improvements made to BC1 encoding carry over nicely to the 2-bit and 3-bit selector modes of other formats.

Here's my latest benchmark:



The highest-quality samples (above 37 dB) are rgbcx in 3-color block mode, where it can use transparent black (selector 3) for opaque black or very dark texels. (The only other BC1 encoder that might support this mode is the one in NVidia Texture Tools, but I'm not sure.) This technically turns opaque textures into textures with a useless alpha channel, but if the engine or shader just ignores alpha, this mode performs exceptionally well in the average case. The flags are cEncodeBC1Use3ColorBlocksForBlackPixels | cEncodeBC1Use3ColorBlocks.

This mode is super useful because it allows the 3-color block encoder to focus the endpoints on the brighter texels within the block, potentially greatly increasing quality. Blocks with very dark or black texels are common in practice.
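A rough sketch of the idea, assuming a simple "max channel <= 4" test for near-black texels (the cutoff and all names are hypothetical, and the bounding box stands in for a real endpoint fit). Near-black texels get selector 3, which in BC1's 3-color mode (color0 <= color1) decodes to transparent black, and only the remaining texels influence the endpoints. This is not rgbcx.h's implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

struct Texel { uint8_t r, g, b; };

// HYPOTHETICAL cutoff: a texel counts as "black" if no channel exceeds this.
constexpr uint8_t kBlackCutoff = 4;

static bool is_near_black(const Texel& t)
{
    return std::max({t.r, t.g, t.b}) <= kBlackCutoff;
}

// Gives near-black texels selector 3 and returns the bounding box of the other
// texels, which is all the endpoint fit gets to see. The other selectors are
// left at -1, to be chosen (0, 1 or 2) once the endpoints are known.
// Returns false if every texel is near-black.
static bool split_black_texels(const std::array<Texel, 16>& px, std::array<int, 16>& sel,
                               Texel& lo, Texel& hi)
{
    bool any = false;
    lo = {255, 255, 255};
    hi = {0, 0, 0};
    for (int i = 0; i < 16; ++i) {
        if (is_near_black(px[i])) { sel[i] = 3; continue; }
        sel[i] = -1;
        any = true;
        lo = {std::min(lo.r, px[i].r), std::min(lo.g, px[i].g), std::min(lo.b, px[i].b)};
        hi = {std::max(hi.r, px[i].r), std::max(hi.g, px[i].g), std::max(hi.b, px[i].b)};
    }
    return any;
}

// BC1 decodes a block in 3-color mode when color0 <= color1 (compared as
// 16-bit values), so order the packed 565 endpoints that way.
static void order_for_3color_mode(uint16_t& color0, uint16_t& color1)
{
    if (color0 > color1) std::swap(color0, color1);
}
```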

If your engine supports ignoring the alpha channel in sampled BC1 textures, then you should be using an encoder that supports this mode.

Data:

rgbcx.h flags:

- h is cEncodeBC1HighQuality
- ut is cEncodeBC1UseLikelyTotalOrderings
- ub is cEncodeBC1Use3ColorBlocksForBlackPixels
- 3 is cEncodeBC1Use3ColorBlocks

From the benchmarks I've seen, it appears NVidia Texture Tools BC1 is around the same perf. as libsquish at slightly higher quality:


I believe this was rgbcx using 10 total orderings (the default setting). The max is 32, and every additional total ordering increases average quality. So at higher settings rgbcx is likely competitive against nvtt while being faster.

I'm currently working on integrating NVTT into my test app.