Wednesday, August 3, 2016

RAD's groundbreaking lossless compression product benchmarked

Intro


Progress in practical lossless compression has been painfully stagnant in recent years, but the state of the art is now rapidly changing: several recently announced open source codecs (such as Brotli and Zstd) offer high ratios and fast decompression, and RAD Game Tools has just released several new codecs as part of its Oodle data compression product.

My first impression after benchmarking these new codecs was "what the hell, this can't be right", and after running the benchmarks again and double-checking everything, my attitude changed to "this is the new state of the art, and open source codecs have a lot of catching up to do".

This post uses the same benchmarking tool and data corpus that I used in my previous post.

Updated Aug. 5th: Changed compiler optimization settings from /O2 to /Ox and disabled exceptions, re-ran the benchmark and regenerated the graphs, and added codec versions and benchmarking machine info.

Codec Settings


All benchmarking was done under Win 10 x64, 64-bit executable, in a single process with no multithreading.
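
To make the throughput numbers concrete, here's a minimal sketch (not my actual benchmarking tool; the decompress_fn callback is a hypothetical stand-in for whichever codec is under test) of how single-threaded decompression throughput can be timed on Windows:

    // Minimal sketch, not the actual benchmark harness.
    #include <windows.h>

    // Hypothetical stand-in for a codec's decompression entry point;
    // returns the number of decompressed bytes produced.
    typedef size_t (*decompress_fn)(const void *comp, size_t comp_size,
                                    void *out, size_t out_capacity);

    // Returns decompression throughput in MB/sec. averaged over 'runs' passes.
    double measure_decomp_throughput(decompress_fn decompress,
                                     const void *comp, size_t comp_size,
                                     void *out, size_t out_capacity, int runs)
    {
        LARGE_INTEGER freq, start, end;
        QueryPerformanceFrequency(&freq);

        size_t total_bytes = 0;
        QueryPerformanceCounter(&start);
        for (int i = 0; i < runs; i++)
            total_bytes += decompress(comp, comp_size, out, out_capacity);
        QueryPerformanceCounter(&end);

        double secs = double(end.QuadPart - start.QuadPart) / double(freq.QuadPart);
        return (double(total_bytes) / (1024.0 * 1024.0)) / secs;
    }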

  • lz4 (v1.74): level 8, LZ4_compressHC2_limitedOutput() for compression, LZ4_decompress_safe() for decompression. (Note: LZ4_decompress_fast() is faster, but it's inherently unsafe because it doesn't bounds-check against malformed streams; I personally would never use the dangerous fast() version in a project. See the sketch after this list.)
  • lzham (lzham_codec_devel on github): level "uber", dict size 2^26
  • brotli (latest on github as of Aug. 1, 2016): level 10, dict size 2^24
  • bitknit (standalone library provided by RAD): BitKnit_Level_VeryHigh
  • Zstd (v0.8.0): ZSTD_MAX_CLEVEL
  • Oodle library version: v2.3.0
  • Kraken: OodleLZ_CompressionLevel_Optimal2
  • Mermaid: OodleLZ_CompressionLevel_Optimal2
  • Selkie: OodleLZ_CompressionLevel_Optimal2
  • zlib (v1.2.8): level 9
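
For reference, here's a minimal sketch of the LZ4 HC round trip described above. I'm using the current LZ4_compress_HC() entry point rather than the older LZ4_compressHC2_limitedOutput() name from the version I benchmarked; both fill the same role:

    // Minimal LZ4 HC round-trip sketch using the modern API names; the
    // benchmark itself called the older LZ4_compressHC2_limitedOutput().
    #include <cstring>
    #include <vector>
    #include "lz4.h"    // LZ4_compressBound, LZ4_decompress_safe
    #include "lz4hc.h"  // LZ4_compress_HC

    bool lz4hc_round_trip(const char *src, int src_size)
    {
        // Worst-case compressed size, so compression can never overflow.
        std::vector<char> comp(LZ4_compressBound(src_size));

        int comp_size = LZ4_compress_HC(src, comp.data(), src_size,
                                        (int)comp.size(), 8 /* level, as benchmarked */);
        if (comp_size <= 0)
            return false;

        // LZ4_decompress_safe() validates the stream and bounds-checks every
        // write; LZ4_decompress_fast() skips those checks for extra speed,
        // which is exactly why I won't use it on untrusted data.
        std::vector<char> out(src_size);
        int out_size = LZ4_decompress_safe(comp.data(), out.data(),
                                           comp_size, src_size);
        return out_size == src_size && memcmp(src, out.data(), src_size) == 0;
    }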

Data corpus used: LZHAM corpus, only files 1KB or larger, 22,827 total files

Benchmarking machine:


Compiler and optimization settings:

Visual Studio 2015 Community Edition, 14.0.25424.00 Update 3

/GS- /GL /W4 /Gy /Zc:wchar_t /Zi /Gm- /Ox /Zc:inline /fp:precise /WX- /Zc:forScope /arch:AVX /Gd /Oi /MT

Totals


Sorted by highest to lowest ratio (ratio = original size / compressed size):

Original    5,374,152,762

brotli      1,671,777,094   (3.21x)
lzham       1,676,729,104   (3.21x)
kraken      1,685,750,158   (3.19x)
zstd        1,686,207,733   (3.19x)
bitknit     1,707,850,562   (3.15x)
mermaid     1,834,845,751   (2.93x)
zlib        1,963,751,711   (2.74x)
selkie      1,989,554,820   (2.70x)
lz4         2,131,656,949   (2.52x)

Ratio vs. Decompression Throughput - Overview




On the far left there's LZHAM (dark gray), which at this point is looking pretty slow. (For a time, it was the decompression speed leader of the high ratio codecs, being 2-3x faster than LZMA.) Moving roughly left to right, there's Brotli (brown), zlib (light blue), Zstd (dark green), BitKnit (dark blue), Kraken (red), then a cluster of very fast codecs (LZ4 - yellow, Selkie - purple, Mermaid - light green, and even a sprinkling of Kraken - red).

Notes:
  • Kraken is just amazingly strong. It has a very high ratio with groundbreaking decompression performance; there is nothing else like it in the open source world. Kraken's decompressor runs circles around the other high-ratio codecs (LZHAM, Brotli, Zstd) and is even faster than zlib! (A hedged usage sketch follows these notes.)
  • Mermaid and Selkie combine the best of both worlds, being as fast or faster than LZ4 to decompress, but with compression ratios competitive or better than zlib! 
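
For the curious, here's roughly what driving Kraken looks like. This is a hedged sketch based on my understanding of the Oodle 2.x headers, not code from my benchmark; the real OodleLZ_Compress() and OodleLZ_Decompress() prototypes take additional (defaulted) parameters for options, dictionaries, and scratch memory that I'm omitting here:

    // Hedged Oodle Kraken round-trip sketch; parameter lists are simplified
    // by relying on the headers' C++ default arguments.
    #include <cstdint>
    #include <vector>
    #include "oodle2.h"

    bool kraken_round_trip(const void *raw, intptr_t raw_len)
    {
        // Worst-case compressed buffer size (v2.3-era single-argument form).
        std::vector<char> comp(OodleLZ_GetCompressedBufferSizeNeeded(raw_len));

        intptr_t comp_len = OodleLZ_Compress(OodleLZ_Compressor_Kraken,
                                             raw, raw_len, comp.data(),
                                             OodleLZ_CompressionLevel_Optimal2);
        if (comp_len <= 0)
            return false;

        std::vector<char> out((size_t)raw_len);
        intptr_t got = OodleLZ_Decompress(comp.data(), comp_len,
                                          out.data(), raw_len);
        return got == raw_len;  // returns the raw byte count on success
    }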

Ratio vs. Decompression Throughput - High decompression throughput (LZ4, Mermaid, Selkie)



* LZ4 note: LZ4's decompression performance depends on whether the data was compressed with the HC or non-HC version of the compressor. I used the HC version for this post, which appears to produce output that decompresses a bit faster, presumably because there's less compressed data to process in HC streams.



Ratio vs. Decompression Throughput - High ratio codecs



LZHAM, Brotli, Kraken, BitKnit, and Zstd, with zlib and lz4 included for reference purposes. Kraken is starting to give LZ4 some competition, which is pretty amazing considering Kraken is a high ratio codec!


Ratio vs. Compression Throughput - All codecs


For the first time on my blog, here's a ratio vs. compression throughput scatter graph.



Notes:

  • LZHAM's compressor in single threaded mode is very slow. (Compression throughput was never a priority of LZHAM.) Brotli's compressor is also a slowpoke.
  • Interestingly, most of the other compressors cluster closely together in the 5-10 MB/sec region.
  • zlib and lz4 are both very fast. lz4 isn't a surprise, but I'm a little surprised by how much zlib stands apart from the others. 
  • There's definitely room here for compression speed improvements in the other codecs. (The compression level is the knob that trades this off; the level-sweep sketch below shows one way to map it out.)
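
To give a feel for how much the level knob matters, here's a minimal sketch that sweeps Zstd's one-shot compressor across its levels and reports the size each one achieves (the top of the sweep corresponds to the max level I benchmarked); pairing this with timing gives you a per-codec ratio-vs-speed curve:

    // Minimal sketch: sweep zstd's one-shot compressor across its levels to
    // see the ratio side of the ratio-vs-compression-speed tradeoff.
    #include <cstdio>
    #include <vector>
    #include "zstd.h"

    void sweep_zstd_levels(const void *src, size_t src_size)
    {
        std::vector<char> comp(ZSTD_compressBound(src_size));

        for (int level = 1; level <= ZSTD_maxCLevel(); level++)
        {
            size_t comp_size = ZSTD_compress(comp.data(), comp.size(),
                                             src, src_size, level);
            if (ZSTD_isError(comp_size))
                continue;  // skip levels that fail (shouldn't happen here)
            printf("level %2d: %zu -> %zu bytes (%.3fx)\n", level,
                   src_size, comp_size, double(src_size) / double(comp_size));
        }
    }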

Conclusion


The open source world should be inspired by the amazing progress RAD has made here. If you're working on a product that needs lossless compression, RAD's Oodle product offers a massive amount of value; there's nothing else out there like it. I can't stress enough how big a deal RAD's new lossless codecs are. Their existence alone clearly demonstrates that there is still room for large improvements in this field.

Thanks to RAD Game Tools for providing me with a drop of their Oodle Data Compression library to benchmark, and to Charles Bloom for providing feedback on the codec settings and benchmark approach.


14 comments:

  1. Have you benched lrzip? How does it compare to the above?

    1. Sorry I have not. However, looking at its readme, it appears to ultimately leverage an already existing compressor (like lzo, gzip, or lzma), and these codecs are now more or less obsolete from a performance/ratio perspective.

    2. This compressor comparison test is quite interesting, and if you're building custom compression tools, probably quite relevant (although I think memory usage and total compression/decompression time might also be factors to measure/consider). But for real-world usage, you might be better off considering lrzip (or lbzip2), as they are parallel/multithreaded apps (lrzip also has a nice progress bar when used interactively, which I appreciated).

      I recently had to compress many multi-gigabyte files, and after trying a bunch of tools out there, I ended up using lbzip2 - besides having pretty decent compression and being very fast, it generates archives that are binary compatible w/ regular bzip2 (and it also gets speed boosts even when decompressing regular bzip2 files). It worked so well that I symlinked bzip2 to lbzip2 on all my systems. I made a couple quick notes from that search.

    3. http://www.gstatic.com/b/brotlidocs/brotli-2015-09-22.pdf
      This is a fairly old benchmark that showed Brotli's sweet spots. Brotli 1 is its compression-speed sweet spot, Brotli 9 is its throughput sweet spot, and Brotli 11 is its on-disk size sweet spot. Level 10 is one of its badly performing spots, targeted more at compression size than throughput, so of course it did not perform well.

      Basically you have just done Truth Lies and Statistics.

      --Brotli's compressor is also a slowpoke-- of course it was: levels 10 and 11 both use methods that trade performance for a smaller on-disk size. 11 outperforms 10 by a bit.

      For zstd's max, ZSTD_MAX_CLEVEL, you need to read what it is. Again, this is an attempt to max out on-disk compression at the cost of performance.

      The lzham settings are for max compression, not max performance, as well.

      So you basically crippled three of the open source compression options by not knowing what you were choosing.

      In most cases with compression you are after throughput, and with open source compressors you don't set them to max. The insane part is that the performance difference between Brotli 9 and Brotli 10 is massive. Yes, it's quite a steep loss for quite minor compression gains.

    4. Thanks for your feedback oiaohm, although it sounds overly negative and confrontational. Of course, I have the tools to just benchmark each codec at *every* possible setting it supports. I may just do this. I highly doubt any of my conclusions will fundamentally change. All practical open source codecs I'm aware of are now behind the state of the art.

      At the end of the day, I need to choose a set of settings which is reasonably representative of each codec, otherwise the graphs just get insane. In practice, users are going to select a single setting (usually they just go "max compression", assuming that setting doesn't take insanely long to compress) and run with it. Perhaps these open source codecs should explicitly offer a setting that intelligently balances compression ratio against decompression throughput.

      Brotli 11 outperforms 10, but 11 is just way too slow to use in practice. It would be the slowest compressing codec out of all the ones I benchmark (much slower than lzham, which is already too slow), greatly increasing how long it takes to complete a benchmark run. I've seen that PDF; it's overly generous about Brotli's real-life performance IMHO. The delta between Brotli and Kraken is extremely large: even speeding Brotli's decompressor up by 2x (!) would still place it behind Kraken.

      The lzham settings are purposely set to max compression. That's the point. I wrote this codec, and all other settings purposely cripple the compressor (and aren't intended to explicitly speed up decompression). If using a lower setting does speed up decompression, it's a side effect - it's not explicitly part of lzham's design.


    5. Users don't just choose max compression. Users who would be willing to buy a compression product would first generate a Pareto frontier using their test data. Then they might discover their data contains a lot of natural language text and choose a PPM compressor instead of a general-purpose compressor. For my data I discovered that if zstd is available, LZ4 is only useful at its fastest setting.

    6. "Perhaps these open source codecs should explicitly offer a setting that intelligently balances compression against decompression throughput."

      Most open-source compression codecs come with a command-line binary configured with a default compression setting which provides the lion's share of the compression without becoming excessively slow. But you're right, there are a few black sheep, and you've picked several of them for your comparison.

      Brotli (brotli) comes with no settings at all, nor even hints as to acceptable values for the 'quality' flag, lol. On a scale of 1..10 of usability it ranks itself at 1.

      Zopfli (zopfli) defaults to 15 of 1..1000 iterations (surely they are joking) and provides gzip, zlib, and deflate modes.

      LZ4 (liblz4-tool) comes configured for "fastest" compression because speed is the market they are trying to appeal to.

      LZHAM - I'll defer to the author for how he'd like to represent his product ;-)

      lzop (lzop) defaults to 3 of 1..9 because it "favors speed over compression ratio"
      lzip (lzip) defaults to 6 of 1..9
      lzma (lzma) "
      xz (xz) "
      gzip (gzip) "

      bzip2 (bzip2) defaults to 9 of 1 to 9, but the speed doesn't vary much regardless

      Compressors people have mentioned (or probably will) but which are not codecs:

      zstd (zstd) defaults to 1 of 1..21, so its fastest setting, and I'm unsure why that is. This isn't really a codec... this is an archiver.

      lrzip (lrzip) is not a codec but actually a bit of a swiss army knife leveraging tar, lzma, gzip, lzo, bzip2, and zpaq, and a sort of block sorting pre-processing stage.

      rzip (rzip) is an old version of lrzip using bzip2 compression, optimised for compressing CD-ROM images or folders of highly redundant data such as the Linux kernel sources. rzip radically outperformed .tar.bz2 at that time.

      7zip (7z; p7zip-full) changes defaults based on the files you pass it. For very small files, i.e. 64k and smaller, it will use deflate. For larger files it will use lzma with a relatively small dictionary, and it will scale the dictionary size up to 32M as the input file set gets larger. It's really pretty intuitive. But this is not a 'codec' either.

      Nano-zip (nz) is an lzma/ppmd-based archiver with probably the best overall performance for general consumer use, but again this is not a codec.

    7. http://richg42.blogspot.com/2016/08/few-notes-about-previous-post.html

      Please show me the Brotli, Zstd, and/or LZHAM compressor settings that speed up decompression by 5x or more. In the meantime, I'm standing by my post here.

  2. I assume the corpus you used isn't available anywhere for download? I could at least give you a compression ratio for lrzip achieved on Linux, though it has not been coded with Windows in mind (though there's a slight chance it works under MinGW).

    1. I could release a good chunk of my corpus, but some of the data I can't release.

  3. My experience with RAD Game Tools is that for their video coding products, their algorithms aren't revolutionary at all, but they put a lot of work into developing efficient implementations. They are not shy about using machine code and often use optimized multithreading.

    For the codecs under discussion here, obviously the compression ratios show the algorithms must be good. However, the high speed of the codecs suggests RAD has been putting a lot of work into optimized implementations again. Now that's in general a good thing; I like fast stuff :) It is also a stark contrast to how the open source community develops code. For open source code, portability and readability of the implementation are the first priority, which can even be a reason not to apply performance optimizations. You have to keep this in mind while doing these comparisons.

    1. Basically RAD's effort goes into understanding CPU instruction sets and compiling on every platform. That's cool too. They are not academia. And when they have interesting results they publish them: https://github.com/rygorous/ryg_rans/ http://arxiv.org/abs/1402.3392

    2. There are some open source exceptions that are "fast" (lz4fast and density come to mind), though I guess these people don't have the time and resources to make it as cool as the closed source stuff.

  4. comparison to lzturbo please? Too bad this stuff isn't open source
