Wednesday, August 3, 2016

RAD's groundbreaking lossless compression product benchmarked

Intro


Progress in practical lossless compression has been painfully stagnant in recent years, but the state of the art is now rapidly changing: several recently announced open source codecs (such as Brotli and Zstd) offer high ratios and fast decompression, and RAD Game Tools has just released several new codecs as part of its Oodle data compression product.

My first impression after benchmarking these new codecs was "what the hell, this can't be right", and after running the benchmarks again and double-checking everything, my attitude changed to "this is the new state of the art, and open source codecs have a lot of catching up to do".

This post uses the same benchmarking tool and data corpus that I used in my previous post.

Updated Aug. 5th: Changed compiler optimization settings from /O2 to /Ox and disabled exceptions, re-ran the benchmark and regenerated the graphs, and added codec versions and benchmarking machine info.

Codec Settings


All benchmarking was done under Win 10 x64, 64-bit executable, in a single process with no multithreading.
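
To make the throughput numbers concrete, here's a minimal sketch (not my actual benchmarking tool; the decompress_fn callback is a hypothetical stand-in for whichever codec is under test) of how single-threaded decompression throughput can be timed on Windows:

    // Minimal sketch, not the actual benchmark harness.
    #include <windows.h>

    // Hypothetical stand-in for a codec's decompression entry point;
    // returns the number of decompressed bytes produced.
    typedef size_t (*decompress_fn)(const void *comp, size_t comp_size,
                                    void *out, size_t out_capacity);

    // Returns decompression throughput in MB/sec. averaged over 'runs' passes.
    double measure_decomp_throughput(decompress_fn decompress,
                                     const void *comp, size_t comp_size,
                                     void *out, size_t out_capacity, int runs)
    {
        LARGE_INTEGER freq, start, end;
        QueryPerformanceFrequency(&freq);

        size_t total_bytes = 0;
        QueryPerformanceCounter(&start);
        for (int i = 0; i < runs; i++)
            total_bytes += decompress(comp, comp_size, out, out_capacity);
        QueryPerformanceCounter(&end);

        double secs = double(end.QuadPart - start.QuadPart) / double(freq.QuadPart);
        return (double(total_bytes) / (1024.0 * 1024.0)) / secs;
    }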

  • lz4 (v1.74): level 8, LZ4_compressHC2_limitedOutput() for compression, LZ4_decompress_safe() for decompression. (Note: LZ4_decompress_fast() is faster, but it's inherently unsafe because it doesn't bounds-check against malformed streams; I personally would never use the dangerous fast() version in a project. See the sketch after this list.)
  • lzham (lzham_codec_devel on github): level "uber", dict size 2^26
  • brotli (latest on github as of Aug. 1, 2016): level 10, dict size 2^24
  • bitknit (standalone library provided by RAD): BitKnit_Level_VeryHigh
  • Zstd (v0.8.0): ZSTD_MAX_CLEVEL
  • Oodle library version: v2.3.0
  • Kraken: OodleLZ_CompressionLevel_Optimal2
  • Mermaid: OodleLZ_CompressionLevel_Optimal2
  • Selkie: OodleLZ_CompressionLevel_Optimal2
  • zlib (v1.2.8): level 9
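
For reference, here's a minimal sketch of the LZ4 HC round trip described above. I'm using the current LZ4_compress_HC() entry point rather than the older LZ4_compressHC2_limitedOutput() name from the version I benchmarked; both fill the same role:

    // Minimal LZ4 HC round-trip sketch using the modern API names; the
    // benchmark itself called the older LZ4_compressHC2_limitedOutput().
    #include <cstring>
    #include <vector>
    #include "lz4.h"    // LZ4_compressBound, LZ4_decompress_safe
    #include "lz4hc.h"  // LZ4_compress_HC

    bool lz4hc_round_trip(const char *src, int src_size)
    {
        // Worst-case compressed size, so compression can never overflow.
        std::vector<char> comp(LZ4_compressBound(src_size));

        int comp_size = LZ4_compress_HC(src, comp.data(), src_size,
                                        (int)comp.size(), 8 /* level, as benchmarked */);
        if (comp_size <= 0)
            return false;

        // LZ4_decompress_safe() validates the stream and bounds-checks every
        // write; LZ4_decompress_fast() skips those checks for extra speed,
        // which is exactly why I won't use it on untrusted data.
        std::vector<char> out(src_size);
        int out_size = LZ4_decompress_safe(comp.data(), out.data(),
                                           comp_size, src_size);
        return out_size == src_size && memcmp(src, out.data(), src_size) == 0;
    }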

Data corpus used: LZHAM corpus, only files 1KB or larger, 22,827 total files

Benchmarking machine:


Compiler and optimization settings:

Visual Studio 2015 Community Edition, 14.0.25424.00 Update 3

/GS- /GL /W4 /Gy /Zc:wchar_t /Zi /Gm- /Ox /Zc:inline /fp:precise /WX- /Zc:forScope /arch:AVX /Gd /Oi /MT

Totals


Sorted by highest to lowest ratio (ratio = original size / compressed size):

Original    5,374,152,762

brotli      1,671,777,094   (3.21x)
lzham       1,676,729,104   (3.21x)
kraken      1,685,750,158   (3.19x)
zstd        1,686,207,733   (3.19x)
bitknit     1,707,850,562   (3.15x)
mermaid     1,834,845,751   (2.93x)
zlib        1,963,751,711   (2.74x)
selkie      1,989,554,820   (2.70x)
lz4         2,131,656,949   (2.52x)

Ratio vs. Decompression Throughput - Overview




On the far left there's LZHAM (dark gray), which at this point is looking pretty slow. (For a time, it was the decompression speed leader of the high ratio codecs, being 2-3x faster than LZMA.) Moving roughly left to right, there's Brotli (brown), zlib (light blue), Zstd (dark green), BitKnit (dark blue), Kraken (red), then a cluster of very fast codecs (LZ4 - yellow, Selkie - purple, Mermaid - light green, and even a sprinkling of Kraken - red).

Notes:
  • Kraken is just amazingly strong. It has a very high ratio with groundbreaking decompression performance; there is nothing else like it in the open source world. Kraken's decompressor runs circles around the other high-ratio codecs (LZHAM, Brotli, Zstd) and is even faster than zlib! (A hedged usage sketch follows these notes.)
  • Mermaid and Selkie combine the best of both worlds, being as fast or faster than LZ4 to decompress, but with compression ratios competitive or better than zlib! 
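
For the curious, here's roughly what driving Kraken looks like. This is a hedged sketch based on my understanding of the Oodle 2.x headers, not code from my benchmark; the real OodleLZ_Compress() and OodleLZ_Decompress() prototypes take additional (defaulted) parameters for options, dictionaries, and scratch memory that I'm omitting here:

    // Hedged Oodle Kraken round-trip sketch; parameter lists are simplified
    // by relying on the headers' C++ default arguments.
    #include <cstdint>
    #include <vector>
    #include "oodle2.h"

    bool kraken_round_trip(const void *raw, intptr_t raw_len)
    {
        // Worst-case compressed buffer size (v2.3-era single-argument form).
        std::vector<char> comp(OodleLZ_GetCompressedBufferSizeNeeded(raw_len));

        intptr_t comp_len = OodleLZ_Compress(OodleLZ_Compressor_Kraken,
                                             raw, raw_len, comp.data(),
                                             OodleLZ_CompressionLevel_Optimal2);
        if (comp_len <= 0)
            return false;

        std::vector<char> out((size_t)raw_len);
        intptr_t got = OodleLZ_Decompress(comp.data(), comp_len,
                                          out.data(), raw_len);
        return got == raw_len;  // returns the raw byte count on success
    }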

Ratio vs. Decompression Throughput - High decompression throughput (LZ4, Mermaid, Selkie)



* LZ4 note: LZ4's decompression performance depends on whether the data was compressed with the HC or non-HC version of the compressor. I used the HC version for this post, which appears to produce output that decompresses a bit faster, presumably because there's less compressed data to process in HC streams.



Ratio vs. Decompression Throughput - High ratio codecs



LZHAM, Brotli, Kraken, BitKnit, and Zstd, with zlib and lz4 included for reference purposes. Kraken is starting to give LZ4 some competition, which is pretty amazing considering Kraken is a high ratio codec!


Ratio vs. Compression Throughput - All codecs


For the first time on my blog, here's a ratio vs. compression throughput scatter graph.



Notes:

  • LZHAM's compressor in single threaded mode is very slow. (Compression throughput was never a priority of LZHAM.) Brotli's compressor is also a slowpoke.
  • Interestingly, most of the other compressors cluster closely together in the 5-10 MB/sec region.
  • zlib and lz4 are both very fast. lz4 isn't a surprise, but I'm a little surprised by how much zlib stands apart from the others. 
  • There's definitely room here for compression speed improvements in the other codecs. (The compression level is the knob that trades this off; the level-sweep sketch below shows one way to map it out.)
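
To give a feel for how much the level knob matters, here's a minimal sketch that sweeps Zstd's one-shot compressor across its levels and reports the size each one achieves (the top of the sweep corresponds to the max level I benchmarked); pairing this with timing gives you a per-codec ratio-vs-speed curve:

    // Minimal sketch: sweep zstd's one-shot compressor across its levels to
    // see the ratio side of the ratio-vs-compression-speed tradeoff.
    #include <cstdio>
    #include <vector>
    #include "zstd.h"

    void sweep_zstd_levels(const void *src, size_t src_size)
    {
        std::vector<char> comp(ZSTD_compressBound(src_size));

        for (int level = 1; level <= ZSTD_maxCLevel(); level++)
        {
            size_t comp_size = ZSTD_compress(comp.data(), comp.size(),
                                             src, src_size, level);
            if (ZSTD_isError(comp_size))
                continue;  // skip levels that fail (shouldn't happen here)
            printf("level %2d: %zu -> %zu bytes (%.3fx)\n", level,
                   src_size, comp_size, double(src_size) / double(comp_size));
        }
    }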

Conclusion


The open source world should be inspired by the amazing progress RAD has made here. If you're working on a product that needs lossless compression, RAD's Oodle product offers a massive amount of value; there's nothing else out there like it. I can't stress enough how big a deal RAD's new lossless codecs are. Their existence alone clearly demonstrates that there is still room for large improvements in this field.

Thanks to RAD Game Tools for providing me with a drop of their Oodle Data Compression library to benchmark, and to Charles Bloom for providing feedback on the codec settings and benchmark approach.


14 comments:

  1. Have you benched lrzip? How does it compare to the above?

    1. Sorry I have not. However, looking at its readme, it appears to ultimately leverage an already existing compressor (like lzo, gzip, or lzma), and these codecs are now more or less obsolete from a performance/ratio perspective.

    2. This compressor comparison test is quite interesting, and if you're building custom compression tools, probably quite relevant (although I think memory usage and total compression/decompression time might also be factors to measure/consider). But for real-world usage, you might be better off considering lrzip (or lbzip2), as they are parallel/multithreaded apps (lrzip also has a nice progress bar when used interactively, which I appreciated).

      I recently had to compress many multi-gigabyte files, and after trying a bunch of tools out there, I ended up using lbzip2 - besides having pretty decent compression and being very fast, it generates archives that are binary compatible w/ regular bzip2 (and it also gets speed boosts even when decompressing regular bzip2 files). It worked so well that I symlinked bzip2 to lbzip2 on all my systems. I made a couple quick notes from that search.

    3. http://www.gstatic.com/b/brotlidocs/brotli-2015-09-22.pdf
      This is a fairly old benchmark that showed Brotli's sweet spots. Brotli 1 is its compression-speed sweet spot, Brotli 9 is its throughput sweet spot, and Brotli 11 is its on-disk size sweet spot. Level 10 is one of its badly performing spots, targeted more at compression size than throughput, so of course it did not perform well.

      Basically you have just done Truth Lies and Statistics.

      --Brotli's compressor is also a slowpoke-- of course it was: levels 10 and 11 both use methods that trade performance for a smaller on-disk size. 11 outperforms 10 by a bit.

      For zstd's max, ZSTD_MAX_CLEVEL, you need to read what it is. Again, this is an attempt to max out on-disk compression at the cost of performance.

      The lzham settings are for max compression, not max performance, as well.

      So you basically crippled three of the open source compression options by not knowing what you were choosing.

      In most cases with compression you are after throughput, and with open source compressors you don't set them to max. The insane part is that the performance difference between Brotli 9 and Brotli 10 is massive. Yes, it's quite a steep loss for quite minor compression gains.

    4. Thanks for your feedback oiaohm, although it sounds overly negative and confrontational. Of course, I have the tools to just benchmark each codec at *every* possible setting it supports. I may just do this. I highly doubt any of my conclusions will fundamentally change. All practical open source codecs I'm aware of are now behind the state of the art.

      At the end of the day, I need to choose a set of settings which is reasonably representative of each codec, otherwise the graphs just get insane. In practice, users are going to select a single setting (usually they just go "max compression", assuming that setting doesn't take insanely long to compress) and run with it. Perhaps these open source codecs should explicitly offer a setting that intelligently balances compression ratio against decompression throughput.

      Brotli 11 outperforms 10, but 11 is just way too slow to use in practice. It would be the slowest compressing codec out of all the ones I benchmark (much slower than lzham, which is already too slow), greatly increasing how long it takes to complete a benchmark run. I've seen that PDF; it's overly generous about Brotli's real-life performance IMHO. The delta between Brotli and Kraken is extremely large: even speeding Brotli's decompressor up by 2x (!) would still place it behind Kraken.

      The lzham settings are purposely set to max compression. That's the point. I wrote this codec, and all other settings purposely cripple the compressor (and aren't intended to explicitly speed up decompression). If using a lower setting does speed up decompression, it's a side effect - it's not explicitly part of lzham's design.


    5. Users don't just choose max compression. Users who would be willing to buy a compression product would first generate a Pareto frontier using their test data. Then they might discover their data contains a lot of natural language text and choose a PPM compressor instead of a general-purpose compressor. For my data I discovered that if zstd is available, LZ4 is only useful at its fastest setting.

    6. "Perhaps these open source codecs should explicitly offer a setting that intelligently balances compression against decompression throughput."

      Most open-source compression codecs come with a command-line binary configured with a default compression setting which provides the lion's share of the compression without becoming excessively slow. But you're right, there are a few black sheep, and you've picked several of them for your comparison.

      Brotli (brotli) comes with no settings at all, nor even hints as to acceptable values for the 'quality' flag, lol. On a scale of 1..10 of usability it ranks itself at 1.

      Zopfli (zopfli) defaults to 15 of 1..1000 iterations (surely they are joking) and provides gzip, zlib, and deflate modes.

      LZ4 (liblz4-tool) comes configured for "fastest" compression because speed is the market they are trying to appeal to.

      LZHAM - I'll defer to the author for how he'd like to represent his product ;-)

      lzop (lzop) defaults to 3 of 1..9 because it "favors speed over compression ratio"
      lzip (lzip) defaults to 6 of 1..9
      lzma (lzma) "
      xz (xz) "
      gzip (gzip) "

      bzip2 (bzip2) defaults to 9 of 1 to 9, but the speed doesn't vary much regardless

      Compressors people have mentioned (or probably will) but which are not codecs:

      zstd (zstd) defaults to 1 of 1..21, so its fastest setting, and I'm unsure why that is. This isn't really a codec... this is an archiver.

      lrzip (lrzip) is not a codec but actually a bit of a swiss army knife leveraging tar, lzma, gzip, lzo, bzip2, and zpaq, and a sort of block sorting pre-processing stage.

      rzip (rzip) is an old version of lrzip using bzip2 compression, optimised for compressing CD-ROM images or folders of highly redundant data such as the Linux kernel sources. rzip radically outperformed .tar.bz2 at that time.

      7zip (7z; p7zip-full) changes defaults based on the files you pass it. For very small files, i.e. 64k and smaller, it will use deflate. For larger files it will use lzma with a relatively small dictionary, and it will scale the dictionary size up to 32M as the input file set gets larger. It's really pretty intuitive. But this is not a 'codec' either.

      Nano-zip (nz) is an lzma/ppmd-based archiver with probably the best overall performance for general consumer use, but again this is not a codec.

    7. http://richg42.blogspot.com/2016/08/few-notes-about-previous-post.html

      Please show me the Brotli, Zstd, and/or LZHAM compressor settings that speed up decompression by 5x or more. In the meantime, I'm standing by my post here.

  2. I assume the corpus you used isn't available anywhere for download? I could at least give you a compression ratio for lrzip achieved on Linux, though it has not been coded with Windows in mind (though there's a slight chance it works under MinGW).

    1. I could release a good chunk of my corpus, but some of the data I can't release.

  3. My experience with RAD Game Tools is that for their video coding products, their algorithms aren't revolutionary at all, but they put a lot of work into developing efficient implementations. They are not shy about using machine code and often use optimized multithreading.

    For the codecs under discussion here, obviously the compression ratios show the algorithms must be good. However, the high speed of the codecs suggests RAD has been putting a lot of work into optimized implementations again. Now that's in general a good thing; I like fast stuff :) It is also a stark contrast to how the open source community develops code. For open source code, portability and readability of the implementation are the first priority, which can even be a reason not to apply performance optimizations. You have to keep this in mind while doing these comparisons.

    1. Basically RAD's effort goes into understanding CPU instruction sets and compiling on every platform. That's cool too. They are not academia. And when they have interesting results they publish them: https://github.com/rygorous/ryg_rans/ http://arxiv.org/abs/1402.3392

    2. There are some open source exceptions that are "fast" (lz4fast and density come to mind), though I guess these people don't have the time and resources to make it as cool as the closed source stuff.

  4. comparison to lzturbo please? Too bad this stuff isn't open source
