Thursday, December 3, 2015

One test showing the performance of miniz vs. zlib

miniz (was here, now migrating to GitHub here) is my single-source-file zlib alternative. It's a complete from-scratch reimplementation, and my 5th Deflate/Inflate implementation so far. It has an extremely fast, real-time Deflate-compatible compressor, and for fun the entire decompressor lives in a single C function. From this post by Tom Alexander:

miniz vs zlib

For this final test, we will use the code from the above test which is using read and only a single thread. This should be enough to compare the raw performance of miniz vs zlib by comparing our binary vs zcat.
Type                        Time
fzcat (modified for test)   64.25435829162598
zcat                        109.0133900642395

Conclusions

So it seems that the benefit of mmap vs read isn't as significant as I expected. The benefit could theoretically be more significant on a machine with multiple processes reading the same file, but I'll leave that as an exercise for the reader.
miniz turned out to be significantly faster than zlib even when both are used in the same fashion (single-threaded and read). Additionally, using the copious amounts of RAM available to machines today allowed us to speed everything up even more with threading.
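
For readers who want a concrete picture of the "single-threaded and read" setup, here is a minimal sketch of that kind of loop. It is an assumption-laden illustration, not the code from fzcat or zcat: it uses Python's standard zlib binding (so it exercises zlib, not miniz), the function name and chunk size are hypothetical, and it handles only a single gzip member.

```python
import zlib

CHUNK_SIZE = 1 << 16  # hypothetical read size, not taken from the post


def count_decompressed_bytes(path, chunk_size=CHUNK_SIZE):
    """Stream-decompress a gzip file on one thread using plain read() calls.

    Returns the number of decompressed bytes, which is enough for timing
    the decompressor without also measuring output I/O.
    """
    # wbits=31 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(31)
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(decompressor.decompress(chunk))
    # Drain anything still buffered inside the decompressor.
    total += len(decompressor.flush())
    return total
```

Wrapping a call to this function in `time.perf_counter()` readings gives a rough wall-clock figure comparable in spirit to the table above, though absolute numbers will differ from the post's.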

8 comments:

  1. "For this final test, we will use the code from the above test which is using read and only a single thread. This should be enough to compare the raw performance of miniz vs zlib by comparing our binary vs zcat."

    Uh. Apples to oranges?.. I thought miniz should be slower than zlib.

    Replies
    1. It's been a few years, but if I remember correctly miniz's real-time compression mode (level 1, I think) is faster than zlib's; it's on the decompression Pareto frontier or really close.

      Sadly I don't recall how miniz's decompressor fared vs. zlib's. I would bet miniz's code size is smaller, though.

    2. For what it's worth, if someone wants to write a plugin for Squash I'd be happy to include it, that would let it appear in the next version of the Squash Benchmark. It looks like it would mostly just be a matter of copying the zlib plugin and changing some prefixes…

    3. I can write a plugin as time permits.

      Also, my next weekend goal is to get Squash into 7-zip, as the ultimate custom codec plugin. It supports streaming, which is the requirement. Then I can just add plugins to Squash and not worry about messing inside of 7-zip plugins.

    4. :o that would be very cool.

      I was under the impression that each codec would need a unique integer instead of a string for identification in 7zip archives, is that accurate? Of course I'm happy to add support for a "7zip-id" field to the plugin ini files (and an API to access it from Squash), but it would require some sort of coordination with the 7zip people…

      Also, if you want to host it in the Squash repository I'd be willing (I'd give you access, of course). That way I could help fix anything I break (the Squash API isn't stable quite yet), and we would know about breaks right away (I abuse Travis CI and AppVeyor pretty heavily).

    5. 7-zip lets us store a small 5 byte or whatever header struct that I think it stores in its central directory. So we can store any info we want there.

    6. I forgot to mention here that I went ahead and put together a miniz plugin. Flushing is currently disabled due to https://github.com/richgel999/miniz/issues/48 but that won't cause any problems for the benchmark; it should be included in the next run (probably later this month, I'm hoping to have 0.8 ready for release soon so I can run the benchmark while I'm out of town for Christmas).

  2. I recently had some performance issues in the Squash benchmark when mmap was used, which have disappeared since I started using MAP_HUGETLB… it might be worth looking into that before abandoning mmap. FWIW, if you're on Linux, `perf` can be quite helpful here; it certainly was for me.

    That said, in Squash I currently only use mmap for codecs which don't support streaming so I can avoid allocating enough room for the entire input and output in RAM. I haven't looked into using it for streaming I/O (though I do have a bug open about the idea).

    If you do abandon mmap, there are two things I've been meaning to investigate for Squash: the first idea (which I got from Lasse Collin) is to use posix_fadvise with POSIX_FADV_SEQUENTIAL, and possibly POSIX_FADV_WILLNEED. The second is using the POSIX AIO API (see `man 7 aio`) to ask the OS to read another block of data before you start processing the current one.
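
The posix_fadvise idea mentioned above could be sketched roughly as follows. This is an illustrative sketch only, not code from Squash or fzcat: it uses Python's `os.posix_fadvise` wrapper around the POSIX call, the function name and chunk size are hypothetical, and the hint is skipped on platforms where the call is unavailable.

```python
import os


def read_sequentially(path, chunk_size=1 << 20):
    """Yield chunks of a file after hinting sequential access to the kernel."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # posix_fadvise is only exposed on some Unix platforms.
        if hasattr(os, "posix_fadvise"):
            # offset=0 and len=0 apply the advice to the whole file.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        # Runs when the generator is exhausted or closed early.
        os.close(fd)
```

The advice is only a hint, so whether it helps is something to measure on the target system rather than assume.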
