Friday, August 5, 2016
Brotli levels 0-10 vs. Oodle Kraken
For codec version info, compiler settings, etc. see this previous post.
This graph demonstrates that varying Brotli's compression level from 0 to 10 noticeably impacts its decompression throughput. (Level 11 is just too slow to complete the benchmark overnight.) As I expected, at none of these settings is it able to compete against Kraken.
Interestingly, at its lowest settings (0 and 1), Brotli outputs compressed data that is extremely (and surprisingly) slow to decode. (I've highlighted these settings in yellow and green below.) I'm not sure if this is intentional, but with a slowdown this large I would avoid these Brotli settings (and use something like zlib or LZ4 instead if you need that much throughput).
Level Compressed Size (bytes)
0 2144016081
1 2020173184
2 1963448673
3 1945877537
4 1905601392
5 1829657573
6 1803865722
7 1772564848
8 1756332118
9 1746959367
10 1671777094
Original 5374152762
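To make the sweep concrete, here's a minimal sketch of how one might compress at each Brotli quality level and decode the result, assuming the Brotli C API's one-shot BrotliEncoderCompress()/BrotliDecoderDecompress() entry points (this isn't the actual benchmark tool, and older Brotli releases expose slightly different headers; the timing itself is elided to a comment):

/* Minimal sketch: sweep Brotli quality 0..10, assuming the one-shot C API
   from brotli/encode.h and brotli/decode.h. Buffer handling is simplified. */
#include <brotli/encode.h>
#include <brotli/decode.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static void sweep_brotli_levels(const uint8_t *input, size_t input_size)
{
    size_t bound = BrotliEncoderMaxCompressedSize(input_size);
    uint8_t *encoded = (uint8_t *)malloc(bound);
    uint8_t *decoded = (uint8_t *)malloc(input_size);

    for (int quality = 0; quality <= 10; quality++) {
        size_t encoded_size = bound;
        if (!BrotliEncoderCompress(quality, 24 /* lgwin: 2^24 window */,
                                   BROTLI_MODE_GENERIC, input_size, input,
                                   &encoded_size, encoded))
            continue;

        size_t decoded_size = input_size;
        /* Wrap this call with a high resolution timer to get the decode
           throughput for this quality level. */
        BrotliDecoderDecompress(encoded_size, encoded, &decoded_size, decoded);

        printf("quality %2d: %zu compressed bytes\n", quality, encoded_size);
    }

    free(encoded);
    free(decoded);
}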
Thursday, August 4, 2016
Few notes about the previous post
This rant is mostly directed at the commenters who claimed I hobbled the open source codecs (including my own!) by not selecting the "proper" settings:
Please look closely at the red dots. Those represent Kraken. Now, this is a log10/log2 graph (log10 on the throughput axis). Kraken's decompressor is almost an order of magnitude faster than Brotli's. Specifically, it's around 5-8x faster, just from eyeing the graph. No amount of tweaking Brotli's settings is going to speed it up this much. Sorry everyone. I've benchmarked Brotli at settings 0-10 (11 is just too slow) overnight and I'll post the results tomorrow, just to be sure.
There is only a single executable file. The codecs are statically linked into this executable. All open source codecs were compiled with Visual Studio 2015 with optimizations enabled. They all use the same exact compiler settings. I'll update the previous post tomorrow with the specific settings.
I'm not releasing my data corpus; neither does Squeeze Chart. This is to prevent codec authors from tweaking their algorithms to perform well on a specific corpus while neglecting general purpose performance. It's just a large mix of data I collected over time that was useful for developing and testing LZHAM. I didn't develop this corpus with any specific goals in mind, and it just happens to be useful as a compressor benchmark. (The reasoning goes: if it was good enough to tune LZHAM, it should be good enough for newer codecs.)
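For the curious, the throughput measurement itself is conceptually simple. Here's a rough sketch of the kind of per-file decode timing that produces numbers like these on Windows; the codec_t struct and function names are illustrative (they're not from the actual benchmark executable), and each codec's decode entry point is statically linked into the same binary as described above:

/* Illustrative per-file decode throughput measurement (Windows). */
#include <windows.h>
#include <stdint.h>

typedef size_t (*decode_fn)(const uint8_t *src, size_t src_size,
                            uint8_t *dst, size_t dst_capacity);

typedef struct { const char *name; decode_fn decode; } codec_t;

static double measure_decode_throughput(const codec_t *codec,
                                        const uint8_t *compressed, size_t compressed_size,
                                        uint8_t *decompressed, size_t original_size)
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);

    size_t got = codec->decode(compressed, compressed_size, decompressed, original_size);

    QueryPerformanceCounter(&t1);
    double secs = (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
    if (got != original_size || secs <= 0.0)
        return 0.0; /* decode failure or timer glitch */

    /* Throughput is reported in uncompressed bytes per second. */
    return (double)original_size / secs;
}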
Wednesday, August 3, 2016
RAD's groundbreaking lossless compression product benchmarked
Intro
Progress in practical lossless compression has been painfully slow in recent years. The state of the art is now changing rapidly, with several newly announced open source codecs (such as Brotli and Zstd) offering high ratios and fast decompression. Recently, RAD Game Tools released several new codecs as part of its Oodle data compression product.
My first impression after benchmarking these new codecs was "what the hell, this can't be right", and after running the benchmarks again and double checking everything my attitude changed to "this is the new state of the art, and open source codecs have a lot of catching up to do".
This post uses the same benchmarking tool and data corpus that I used in this one.
Updated Aug. 5th: Changed compiler optimization settings from /O2 to /Ox and disabled exceptions, re-ran the benchmark and regenerated the graphs, added codec versions and benchmarking machine info.
Codec Settings
All benchmarking was done under Win 10 x64, 64-bit executable, in a single process with no multithreading.
- lz4 (v1.74): level 8, LZ4_compressHC2_limitedOutput() for compression, LZ4_decompress_safe() for decompression. (Note: LZ4_decompress_fast() is faster, but it's inherently unsafe; I personally would never use the dangerous fast() version in a project.) A short usage sketch follows the setup info below.
- lzham (lzham_codec_devel on github): level "uber", dict size 2^26
- brotli (latest on github as of Aug. 1, 2016): level 10, dict size 2^24
- bitknit (standalone library provided by RAD): BitKnit_Level_VeryHigh
- Zstd (v0.8.0): ZSTD_MAX_CLEVEL
- Oodle library version: v2.3.0
- Kraken: OodleLZ_CompressionLevel_Optimal2
- Mermaid: OodleLZ_CompressionLevel_Optimal2
- Selkie: OodleLZ_CompressionLevel_Optimal2
- zlib (v1.2.8): level 9
Data corpus used: LZHAM corpus, only files 1KB or larger, 22,827 total files
Benchmarking machine:
Compiler and optimization settings:
Visual Studio 2015 Community Edition, 14.0.25424.00 Update 3
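As referenced in the lz4 entry above, here's a short sketch of the HC round trip used for this benchmark. The function names are the legacy ones from the lz4.h/lz4hc.h headers of this era (newer lz4 releases deprecate them in favor of LZ4_compress_HC()), and error handling is kept minimal:

/* Sketch of the lz4hc path: compress at HC level 8, decode with the safe decoder. */
#include <lz4.h>
#include <lz4hc.h>
#include <stdlib.h>

static int roundtrip_lz4hc(const char *input, int input_size)
{
    int bound = LZ4_compressBound(input_size);
    char *compressed = (char *)malloc((size_t)bound);
    char *decompressed = (char *)malloc((size_t)input_size);

    int compressed_size = LZ4_compressHC2_limitedOutput(input, compressed,
                                                        input_size, bound,
                                                        8 /* HC level */);

    /* LZ4_decompress_safe() validates the stream and bounds-checks all writes;
       LZ4_decompress_fast() skips those checks, which is why it's avoided here. */
    int decoded_size = LZ4_decompress_safe(compressed, decompressed,
                                           compressed_size, input_size);

    free(compressed);
    free(decompressed);
    return decoded_size == input_size;
}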
Totals
Sorted by highest to lowest ratio (total compressed bytes; for example, Brotli's overall ratio works out to 5374152762 / 1671777094, or roughly 3.21:1):
brotli 1671777094
lzham 1676729104
kraken 1685750158
zstd 1686207733
bitknit 1707850562
mermaid 1834845751
zlib 1963751711
selkie 1989554820
lz4 2131656949
Ratio vs. Decompression Throughput - Overview
On the far left there's LZHAM (dark gray), which at this point is looking pretty slow. (For a time, it was the decompression speed leader of the high ratio codecs, being 2-3x faster than LZMA.) Moving roughly left to right, there's Brotli (brown), zlib (light blue), Zstd (dark green), BitKnit (dark blue), Kraken (red), then a cluster of very fast codecs (LZ4 - yellow, Selkie - purple, Mermaid - light green, and even a sprinkling of Kraken - red).
Notes:
- Kraken is just amazingly strong. It has a very high ratio with groundbreaking decompression performance. There is nothing else like it in the open source world. Kraken's decompressor runs circles around the other high-ratio codecs (LZHAM, Brotli, Zstd) and is even faster than zlib!
- Mermaid and Selkie combine the best of both worlds, being as fast or faster than LZ4 to decompress, but with compression ratios competitive or better than zlib!
Ratio vs. Decompression Throughput - High decompression throughput (LZ4, Mermaid, Selkie)
* LZ4 note: LZ4's decompression performance depends on whether the data was compressed with the HC or non-HC version of the compressor. I used the HC version for this post, which appears to output data that decompresses a bit faster. I'm guessing that's because there's less compressed data to process in HC streams.
Ratio vs. Decompression Throughput - High ratio codecs
Ratio vs. Compression Throughput - All codecs
For the first time on my blog, here's a ratio vs. compression throughput scatter graph.
- LZHAM's compressor in single threaded mode is very slow. (Compression throughput was never a priority of LZHAM.) Brotli's compressor is also a slowpoke.
- Interestingly, most of the other compressors cluster closely together in the 5-10 MB/sec region.
- zlib and lz4 are both very fast. lz4 isn't a surprise, but I'm a little surprised by how much zlib stands apart from the others.
- There's definitely room here for compression speed improvements in the other codecs.
Conclusion
The open source world should be inspired by the amazing progress RAD has made here. If you're working on a product that needs lossless compression, RAD's Oodle product offers a massive amount of value. There's nothing else out there like it. I can't stress enough how big a deal RAD's new lossless codecs are. Just their existence clearly demonstrates that there is still room for large improvements in this field.
Thanks to RAD Game Tools for providing a drop of their Oodle Data Compression library for me to benchmark, and to Charles Bloom for providing feedback on the codec settings and benchmark approach.
lz4hc vs. lz4 performance on the LZHAM test corpus
Both use LZ4_decompress_safe(). lz4hc uses LZ4_compressHC2_limitedOutput(), lz4 uses LZ4_compress_limitedOutput().
22,827 total files, all files >= 1KB.
total 5374152762
lz4hc 2199213331
lz4 2575990728