Wednesday, August 3, 2016

RAD's groundbreaking lossless compression product benchmarked

Intro


Progress in practical lossless compression has been painfully stagnant in recent years. The state of the art is now changing rapidly, with several recently announced open source codecs (such as Brotli and Zstd) offering high ratios and fast decompression. Most recently, RAD Game Tools released several new codecs as part of its Oodle data compression product.

My first impression after benchmarking these new codecs was "what the hell, this can't be right", and after running the benchmarks again and double checking everything my attitude changed to "this is the new state of the art, and open source codecs have a lot of catching up to do".

This post uses the same benchmarking tool and data corpus as my previous benchmarking post.

Updated Aug. 5th: Changed compiler optimization settings from /O2 to /Ox and disabled exceptions, re-ran the benchmark and regenerated the graphs, and added codec versions and benchmarking machine info.

Codec Settings


All benchmarking was done under Win 10 x64, 64-bit executable, in a single process with no multithreading.
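The benchmark tool itself isn't included in this post, but the per-file measurement is conceptually simple. Here's a rough sketch (my illustration, not the actual tool, using zlib as a stand-in codec) of best-of-N single-threaded decompression timing:

```python
import time
import zlib  # stand-in codec; lz4/zstd/Oodle calls would slot in the same way

def decompress_mbps(compressed: bytes, original_size: int, runs: int = 5) -> float:
    """Best-of-N single-threaded decompression throughput, in MB/sec."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        out = zlib.decompress(compressed)
        best = min(best, time.perf_counter() - start)
    assert len(out) == original_size  # sanity-check the round trip
    return (original_size / (1024 * 1024)) / best

payload = b"example payload for timing " * 8192  # ~216 KB test input
speed = decompress_mbps(zlib.compress(payload, 9), len(payload))
```

Taking the best of several runs (rather than the average) helps filter out OS scheduling noise, which matters when you're timing thousands of small files in a single process.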

  • lz4 (v1.74): level 8, LZ4_compressHC2_limitedOutput() for compression, LZ4_decompress_safe() for decompression (Note: LZ4_decompress_fast() is faster, but it's inherently unsafe. I personally would never use the dangerous fast() version in a project.)
  • lzham (lzham_codec_devel on github): level "uber", dict size 2^26
  • brotli (latest on github as of Aug. 1, 2016): level 10, dict size 2^24
  • bitknit (standalone library provided by RAD): BitKnit_Level_VeryHigh
  • Zstd (v0.8.0): ZSTD_MAX_CLEVEL
  • Oodle library version: v2.3.0
  • Kraken: OodleLZ_CompressionLevel_Optimal2
  • Mermaid: OodleLZ_CompressionLevel_Optimal2
  • Selkie: OodleLZ_CompressionLevel_Optimal2
  • zlib (v1.2.8): level 9

Data corpus used: LZHAM corpus, only files 1KB or larger, 22,827 total files
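Reproducing that file selection is straightforward. A minimal sketch (my code, assuming the corpus is an ordinary directory tree on disk):

```python
import os

def corpus_files(root: str, min_size: int = 1024) -> list:
    """Walk the corpus directory, keeping only files 1 KB or larger."""
    kept = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) >= min_size:
                kept.append(path)
    return sorted(kept)  # deterministic order, so runs are comparable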

Benchmarking machine:


Compiler and optimization settings:

Visual Studio 2015 Community Edition, 14.0.25424.00 Update 3

/GS- /GL /W4 /Gy /Zc:wchar_t /Zi /Gm- /Ox /Zc:inline /fp:precise /WX- /Zc:forScope /arch:AVX /Gd /Oi /MT

Totals


Sorted by highest to lowest ratio:

Original    5374152762

brotli      1671777094 
lzham       1676729104 
kraken      1685750158 
zstd        1686207733 
bitknit     1707850562 
mermaid     1834845751 
zlib        1963751711 
selkie      1989554820 
lz4         2131656949 
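The ratios these totals imply are easy to derive (a quick sketch using the numbers above, where ratio is original size divided by compressed size, so higher is better):

```python
ORIGINAL = 5374152762  # total corpus size in bytes

compressed_totals = {
    "brotli":  1671777094,
    "lzham":   1676729104,
    "kraken":  1685750158,
    "zstd":    1686207733,
    "bitknit": 1707850562,
    "mermaid": 1834845751,
    "zlib":    1963751711,
    "selkie":  1989554820,
    "lz4":     2131656949,
}

# ratio = original / compressed, so higher is better
ratios = {name: ORIGINAL / size for name, size in compressed_totals.items()}
```

So the high-ratio cluster (brotli, lzham, kraken, zstd) all land near 3.2:1, while lz4 trails at roughly 2.5:1.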

Ratio vs. Decompression Throughput - Overview




On the far left there's LZHAM (dark gray), which at this point is looking pretty slow. (For a time, it was the decompression speed leader of the high ratio codecs, being 2-3x faster than LZMA.) Moving roughly left to right, there's Brotli (brown), zlib (light blue), Zstd (dark green), BitKnit (dark blue), Kraken (red), then a cluster of very fast codecs (LZ4 - yellow, Selkie - purple, Mermaid - light green, and even a sprinkling of Kraken - red).

Notes:
  • Kraken is just amazingly strong. It has a very high ratio with groundbreaking decompression performance. There is nothing else like it in the open source world. Kraken's decompressor runs circles around the other high-ratio codecs (LZHAM, Brotli, Zstd) and is even faster than zlib!
  • Mermaid and Selkie combine the best of both worlds, being as fast as or faster than LZ4 to decompress, but with compression ratios competitive with or better than zlib's! 

Ratio vs. Decompression Throughput - High decompression throughput (LZ4, Mermaid, Selkie)



* LZ4 note: LZ4's decompression performance depends on whether the data was compressed with the HC or non-HC version of the compressor. I used the HC version for this post, which appears to output data that decompresses a bit faster. I'm guessing that's because there's less compressed data to process in HC streams.



Ratio vs. Decompression Throughput - High ratio codecs



LZHAM, Brotli, Kraken, BitKnit, and Zstd, with zlib and lz4 included for reference purposes. Kraken is starting to give LZ4 some competition, which is pretty amazing considering Kraken is a high ratio codec!


Ratio vs. Compression Throughput - All codecs


For the first time on my blog, here's a ratio vs. compression throughput scatter graph.



Notes:

  • LZHAM's compressor in single threaded mode is very slow. (Compression throughput was never a priority of LZHAM.) Brotli's compressor is also a slowpoke.
  • Interestingly, most of the other compressors cluster closely together in the 5-10 MB/sec region.
  • zlib and lz4 are both very fast. lz4 isn't a surprise, but I'm a little surprised by how much zlib stands apart from the others. 
  • There's definitely room here for compression speed improvements in the other codecs.

Conclusion


The open source world should be inspired by the amazing progress RAD has made here. If you're working on a product that needs lossless compression, RAD's Oodle product offers a massive amount of value; there's nothing else out there like it. I can't stress enough how big a deal RAD's new lossless codecs are. Their existence alone clearly demonstrates that there is still room for large improvements in this field.

Thanks to RAD Game Tools for providing a drop of their Oodle Data Compression library for me to benchmark, and to Charles Bloom for providing feedback on the codec settings and benchmarking approach.


lz4hc vs. lz4 performance on the LZHAM test corpus


Both use LZ4_decompress_safe(). lz4hc uses LZ4_compressHC2_limitedOutput(), lz4 uses LZ4_compress_limitedOutput().

22,827 total files, all files >= 1KB.

total 5374152762
lz4hc 2199213331
lz4   2575990728
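A quick check of what these totals imply (the HC streams come out roughly 15% smaller, which is consistent with the guess earlier in the post that HC-compressed data decompresses faster simply because there's less of it to process):

```python
TOTAL = 5374152762   # original corpus size in bytes
LZ4HC = 2199213331   # compressed total, HC compressor
LZ4   = 2575990728   # compressed total, non-HC compressor

hc_savings = 1.0 - LZ4HC / LZ4   # HC streams are ~14.6% smaller
hc_ratio   = TOTAL / LZ4HC       # ~2.44:1
fast_ratio = TOTAL / LZ4         # ~2.09:1
```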


Sunday, July 31, 2016

New lossless compression benchmarks on the way

I've been benchmarking several new lossless codecs from RAD Game Tools: Kraken, Selkie, and Mermaid. (How does RAD think up these odd but cool sounding names?) Stay tuned!


Thursday, July 21, 2016

enet networking library

I switched over all the low-level networking in a VR app I've been working on to enet today. It's a UDP-based networking library that supports optional reliable, in-order packet delivery, packet fragmentation (so very large packets can be sent over UDP), and multiple channels.

The API was super easy to use, the code is written in C, and the thing just works. Compiling it was as easy as dropping the .c files into the project and hitting Build.

I love libraries like this.

Sunday, July 3, 2016

Welcome to "The Hunger Games"

Pretty much required reading if you're going to work (and stand out!) at a self-organizing company:

The Hunger Games


Sunday, June 5, 2016

What a company-wide "reorg" looks like in a flat, manager-less company

Working at a bunch of companies over the years has given me a lot of interesting perspective. I really enjoy trying to describe how processes in top-down companies can be done in non-hierarchical ("no boss" or self-organizing) companies. Let's try to describe, say to a hierarchical company employee, what a company-wide "reorg" could look like in a no-manager company.

I first heard the word "reorg" in relation to how Microsoft periodically reorganizes its corporate structure to "better align the company to new corporate-level goals and strategies". (That's a joke.) Issuing a company-wide reorg in a hierarchical company is very much an executive-level decision. It's a top-down directive that the company intentionally follows, like a military maneuver. It just happens, you know about it as an employee, and you must go with the flow.

But what does a deep reorg actually look like in a non-hierarchical, manager-less company? The CEO can't just come cruising in totally reorg'ing the place. (Remember, the CEO is not your boss in companies like this!) Such a traumatic "mass adjustment of resources" is just not in the culture. (Small-scale "horse trades" occur all the time in manager-less companies. I'm talking about a deep, planned reorganization that impacts a large chunk of the company.)

Well, here's one way you can reorg a manager-less company. This approach assumes the actor(s) attempting to pull off the reorg have the power to form new teams and make internal/external hires.

First, you need to form a small team around some new product or technology. Do it just below the radar (internally). It needs to show promise and be a rising-star type project. You should work to get as much strategic press exposure about this new team's work as possible.

Next, you start internally recruiting and externally hiring for that new team. You optimize the external hiring process to streamline it, to accept some candidates as contractors (who you may eventually hire) and some as immediate full-time employees. For the internal recruits, you only hire those internal developers who are the most passionate about the new project's goals or its technology. Hiring on the new team must be done carefully, because it's ultimately part of a greater company-wide sorting and reorganization process.

If the new project becomes large enough, it creates a rift of sorts in the organization. The new team gains power and size over time, and an entire ecosystem of friendly teams can form around it. The company self-organizes into a market of teams around the new project, plus a block of "deprecated teams" that may not be aligned with the reorg's goals.

These deprecated teams can be reduced in size by letting go of internal developers over time. Anyone the company doesn't want long-term can be quickly moved onto a deprecated team. To minimize shock to the deprecated team's product (which may need to remain live), the team can fall back to possibly cheaper external contractors as it internally shrinks. Ultimately, the product can be put on long-term life support with minimal internal cost.

Now, if you are a developer in a company like this, and you want to survive the reorg, you should be asking yourself right about now "am I on a deprecated team?". If you are, you better learn the company's new religion quickly or you may be pushed out. (Or, you need to visibly work on background projects that support post-reorg goals or needs.)

If you are a senior team member in this scenario, and you want your team to not become deprecated, you need to quickly figure out how to transition your product into the "new era" so it remains relevant.

Thursday, May 19, 2016

We Need to Collectively Renegotiate

I'm sitting here watching season 2 of Halt and Catch Fire. This season wipes the slate mostly clean and starts over at an early-80's garage-style software service startup in Texas. At first, I pushed back at the idea of a real-time online gaming service built on early-80's Commodore-era computers, disk storage, and modem technology. Then I realized everything they are showing here was more or less technologically feasible, or at worst was at the very edge of that era's hardware/software technology.

While watching this I had another interesting realization. Lots of my previous posts are really my way of telling every full-time software engineer I can reach to basically "wake up".

Let's mentally model the current employment situation as a 2D simulation. See all those little dots? Those are the full-time software developers working at corporations. Let's hit fast forward. Wow, that's weird! All these super valuable programmers keep going to and from the same bland corporate company nodes to work every morning. Their working conditions sometimes really suck and they are generally underpaid. These corporations have even been known to illegally cooperate with each other (i.e. conspire) to keep compensation to a minimum.

We've been interacting with lots of clients, some very well known in their fields, and most paint a similar picture: Their view is that too many engineers are "locked up" inside these corporations. It's actually very hard to find good software developers. There is room in the system for more software consultants, little consulting companies with amazing programmers like Blue Shift.

So here's my idea:

Now let's try upping the communication, empathy, independent organization and trust levels across all these agents in the simulation and see what happens. A bunch of smaller companies pop up and start offering their services to a potentially huge array of clients. They can negotiate for the best pay and conditions possible in this changed economy.

To pull this off in the real world, what we need to do is start talking, trusting, and cooperating with each other much more, especially across teams and companies. We all have a common interest here that totally transcends pretty much any corporate NDA. Collectively, we as software engineers have way too much power and value in the system to be working as atomized individuals competing with each other for scraps.

We can leave these corporations to form our own consulting or product companies. This will force the market to reorganize itself. Do this and working conditions and compensation levels can be organically pressured upwards. We actually have the power to do this if we would just organize and communicate more effectively.

Personally I believe even just a small number of programmers doing this can have a surprising economic and perhaps even a cultural impact.

In practice, doing this isn't that hard. I've started three companies so far, in between working at various companies. The first one did very early deferred shading research for Microsoft, the second one created crunch, and the third (Binomial) is consulting oriented.

To start: While still employed, work on building a community of other engineers at various companies. Up your visibility by making sure your code and work is easily found online, attend every event you can, give presentations, teach and help people, and be as public as possible. Save up 6 months or whatever of finances, find some friends and make the leap.

And if you fail? No big deal, just sign up for another full-time gig for a while, one that likely pays more, because this collective renegotiation strategy results in higher average wages, and because by changing companies you might even get a raise for being more experienced now!

To find clients, tap into your network and offer your services. This will free up amazing teams to work across companies, instead of them being locked up inside a few corporate fortresses.

Another fallback strategy if your new company fails is to get acqui-hired by a larger company, skipping the ridiculous interview process many companies use. Just develop a cool piece of technology that you think one or more companies would be interested in.

(So, think I'm crazy? I have a bunch of detailed mental models here, built up over time by working at several large strategically placed software companies in several states. This can work. We just need to organize better and teach each other how to do it.)