Sunday, February 1, 2015

LZHAM 1.0 integrated into 7zip command line and GUI

I integrated the LZHAM codec into the awesome open source 7zip archiver a few years ago for testing purposes, but I was hesitant to release it because I was still frequently changing LZHAM's bitstream. The bitstream is now locked, so here it is in case anyone else out there finds it useful. (Updated: See my next post for a codec plugin compatible with the 7zip 9.38 beta.)

Important note: Please do *not* depend on this for anything important; it's for testing purposes only. There are very likely bugs in here, and the LZHAM codec ID will be changing. I'll be releasing an official codec within the next week or two.

Here's the full source code and prebuilt Windows x86 binaries in the "bin" directory:
http://www.tenacioussoftware.com/7zipsrc_release_lzham_1_0.7z

Here are new binaries fixing a decompressor reinit() problem in the original release (email me if you want the source; the 7zip-related portions are unchanged, I just merged over the latest version of LZHAM):
http://www.tenacioussoftware.com/7zipsrc_release3_lzham_1_0.7z

Note that I haven't updated the makefiles yet, just the VS 2008 project files. This has only been tested by me, and I'm not an expert on the very large 7zip codebase, so buyer beware. I did most of this work several years ago, so this is undoubtedly an outdated version of 7zip.
I've only been able to compile the 32-bit version of 7zip so far, so the max dictionary size is limited to 64MB. (Important note: I'm not trying to fork or break 7zip in any way. This is *only* for testing and fooling around, and any archives it makes in LZHAM mode shouldn't be distributed.)

I'll be merging my changes over into the latest version of 7zip, probably next weekend. Also, LZHAM is statically linked in at the moment; I'll be changing this to load LZHAM as a DLL.

Here are some example command line usages (you can also select LZHAM in the GUI). The -mx method may range from 1-9, just like LZMA, and is internally converted to the LZHAM settings listed below. You can use the "-md=16M" or "-md=128K" option to override the dictionary size. The -mmt=on/off option controls threading, which is on by default, and the new -mz=on option controls deterministic parsing, which defaults to *off*. A couple of example combinations follow the list.

-mx=X:
7z method 1: LZHAM method 0, dict size 2^20
7z method 2: LZHAM method 1, dict size 2^21
7z methods 3-4: LZHAM method 2, dict size 2^22
7z methods 5-6: LZHAM method 3, dict size 2^23
7z methods 7-8: LZHAM method 4, dict size 2^26
7z method 9: LZHAM method 4, extreme parsing, dict size 2^26 (can be very slow!)
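
For example, combining the options above (the archive and file names here are just placeholders):

7z a archive.7z -m0=LZHAM -mx=8 -md=16M -mmt=on *.dll
7z a archive.7z -m0=LZHAM -mx=9 -mz=on *.bin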

In practice, beware of using anything more than -mx=8 ("Maximum" in the GUI) unless you have a very powerful machine and some patience. Also, unless you're on a Core i7 or Xeon, LZHAM's compressor will seem very slow, because the compressor is totally hamstrung on single-core CPUs. (LZHAM is focused on decompression speed combined with very high ratios, so compression speed takes a back seat.)

Example usage:

E:\dev\lzham\7zipsrc\bin>7z -m0=LZHAM -mx=9 a temp *.dll

7-Zip 9.20 (LZHAM v1.0) Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
Scanning

Creating archive temp.7z

Compressing 7z.dll

Everything is Ok

E:\dev\lzham\7zipsrc\bin>7z -slt l temp2.7z

7-Zip 9.20 (LZHAM v1.0) Copyright (c) 1999-2010 Igor Pavlov 2010-11-18

Listing archive: temp2.7z

--
Path = temp2.7z
Type = 7z
Method = LZHAM BCJ
Solid = -
Blocks = 1
Physical Size = 487287
Headers Size = 122

----------
Path = 7z.dll
Size = 1268736
Packed Size = 487165
Modified = 2015-01-31 01:13:33
Attributes = ....A
CRC = 000E5D5E
Encrypted = -
Method = BCJ LZHAM:[1017030000]
Block = 0

7zip GUI:

[screenshot: LZHAM selected as the compression method in the 7zip GUI]

Sunday, January 25, 2015

LZHAM v1.0 released on github

Here: https://github.com/richgel999/lzham_codec

I haven't merged over the Xcode project yet, but it's fully compatible with OSX now. Also, LZHAM v1.0 is not backwards compatible with the previous alpha version.

Saturday, January 24, 2015

Windows 10: An Arrow Aimed Straight at Steam

I find this very interesting news, and if you're not paying attention you should:

Phoronix: Windows 10 To Be A Free Upgrade: What Linux Users Need To Know

PC World: Windows 10's new features: Cortana, a 'Spartan' browser, Xbox streaming, and more

Rock, Paper, Shotgun: Is Windows 10 Good For PC Gamers Or XBone Owners?

I think Microsoft's strategy here is surprisingly well thought out. Their execs finally figured out that "to know your enemy you must become your enemy". Windows 10 is free. It has a new state-of-the-art graphics API (DirectX 12) created by the best graphics specialists, real software engineers, and real testers in the business. It has awesome developer tools (Visual Studio) that actually work, with real CPU/GPU debugging and profiling support built in. All of your existing apps and games still work. And they're pulling out all the stops with the Halo/Xbox branding, right down into the OS and browser.

They just need to make the Windows 10 App Store not suck: continue to use the Xbox brand as a lever, carefully feed and nourish the ecosystem, listen to their customers, and undercut the living hell out of Steam. Steam itself started out as a total pile of crap, but Valve listened to their customers, fixed the problems over time, gave customers good deals, shipped apps you couldn't get anywhere else at the right prices, and built and nourished the community. Microsoft can do all the same things, and perhaps they've finally figured this out.

It's now all down to execution: recovering from some obviously boneheaded moves (sometimes fueled by excessive Redmond Kool-Aid drinking, like the botched Windows 8 UI and missing-Start-Menu disasters), recognizing and quickly recovering from the inevitable new boneheaded moves, and sustaining the effort over the long term (something Microsoft has definitely not been very good at outside of its core brands). Competition is great.

Friday, January 23, 2015

LZHAM v1.0 is being tested on iOS/OSX/Linux/Win

Currently testing it on a few machines using random codec settings with ~3.5 million files. We also just switched over our title's bundle decompression step from LZMA to LZHAM, so the decompressor will be tested on many iOS devices too.

I've also tested the compression portion of the code on iOS, but I won't be able to get much coverage there before releasing it. Honestly the decompressor is much more interesting for mobile devices (that's really the whole point of LZHAM).

I'll be porting and testing LZHAM on Android within a week or so - should be easy by this point.

LZHAM v1.0 vs. LZMA decompression perf. on iPhone 6+

I borrowed a coworker's iPhone 6+ and reran my bundle compression benchmarking app. According to Wikipedia, it's a 1.4 GHz dual-core ARMv8-A.

LZHAM is 2.3x-9x faster on this device, unless the bundle's compressed size is < 1000 bytes. The compressed size threshold where LZHAM becomes faster is lower than what I'm used to seeing; I'm not sure exactly why yet.

1. Bundles sorted by LZHAM vs. LZMA decompression speedup (slowest on left):


2. Bundles sorted by LZMA compressed size (smallest on left), with relative decompression speedup in blue:


Wednesday, January 21, 2015

First LZHAM iOS stats with Unity asset bundle data

Got everything (both the compressor and decompressor) working. It was surprisingly easy; I only had one misaligned load to deal with in the compressor's match finder, caused by an #ifdef problem.
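
For reference, the usual portable fix for that kind of misaligned access is to load through memcpy() instead of casting the pointer to a wider type. A minimal sketch of the pattern (illustrative only, not LZHAM's actual match finder code):

#include <cstdint>
#include <cstring>

// Portable unaligned 32-bit load: compilers turn the memcpy() into a
// single unaligned load on x86/x64, and into safe byte accesses on
// strict-alignment ARM targets, instead of faulting like a casted
// pointer dereference can.
static inline uint32_t read_unaligned_u32(const void* p)
{
    uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}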

I combined 3 of our larger Unity asset bundles into a single .TAR file, and here are the current results on my iPhone 4 (800 MHz A4 CPU, 512MB RAM):

LZHAM Compressed from 15209984 to 4999552 bytes
LZHAM Comp time: 112.771710, BPS: 134874.110168
LZHAM Decomp time: 0.895846, BPS: 16978346.767638

For compression, I used a 16MB dictionary, highest compression (level 4) with normal parsing. Compression is slow, but LZHAM is designed for offline use, so as long as it works at all I'm not sweating it for now.

Decompression is around 47 cycles per byte (800 MHz / ~17 MB/s) on these bundle files, which contain a variety of Unity asset data.

Now the LZMA stats (level 9, 16MB dictionary, default tuning options):

LZMA compressed from 15209984 to 4726211 bytes
LZMA Comp time: 41.805776, BPS: 363824.942238
LZMA Decomp time: 1.993880, BPS: 7628334.723455

LZMA decompression was ~105 cycles/byte.

So LZHAM decompresses this data 2.2x faster. Its ratio is slightly lower, but this can be somewhat compensated for by enabling LZHAM's better parser and compressing offline (with a multicore desktop CPU). This helps a little: 4960935 bytes. By using more frequent Huff table updates (level 3 vs. the default 8) and extreme parsing, I get 4942383 compressed bytes, but decompression is ~18% slower. I'm going to graph all of this data next.
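
For anyone playing with these knobs through LZHAM's C API, the settings above map to fields in lzham_compress_params, roughly like this (a minimal sketch from memory; verify the field and enum names against lzham.h):

#include <cstring>
#include "lzham.h"

// Sketch: one-shot compression at the settings discussed above (16MB
// dictionary, highest level, extreme parsing, faster Huff table updates).
lzham_compress_status_t compress_buffer(const lzham_uint8* src, size_t src_len,
                                        lzham_uint8* dst, size_t* dst_len)
{
    lzham_compress_params params;
    memset(&params, 0, sizeof(params));
    params.m_struct_size = sizeof(params);
    params.m_dict_size_log2 = 24;                   // 2^24 = 16MB dictionary
    params.m_level = LZHAM_COMP_LEVEL_UBER;         // highest compression level
    params.m_table_update_rate = 3;                 // more frequent updates (default is 8)
    params.m_compress_flags = LZHAM_COMP_FLAG_EXTREME_PARSING;

    return lzham_compress_memory(&params, dst, dst_len, src, src_len, NULL);
}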

For reference, my iPhone 4's CPU is ~13.6x slower for compression and ~8.5x slower for decompression vs. my Core i7 3.3 GHz desktop CPU (comparing absolute wall time, no multithreading, same settings and file data, etc.).

Update: Here are the testing results after compressing & decompressing all of our uncompressed asset bundles on my iPhone 4. I limited LZHAM's compressor to a dictionary size of 8MB, less frequent table updating (table update speed of 12 vs the default 8), and normal parsing, which limited its ratio a bit vs. running it on desktop.

LZHAM is slower on a few files totaling ~0.2% of the data (~320k out of 172MB); beyond that it ranges from 1.8x to 4.8x faster. (Note: I'm currently regenerating this graph so LZHAM's dictionary size matches LZMA's.)

1. Red=Speedup, Blue=LZMA compressed size, sorted by compressed size.


2. Red: Speedup, Blue: LZMA_comp_size/LZHAM_comp_size, sorted by speedup.


Tuesday, January 20, 2015

LZHAM v1.0 vs. LZMA decomp. perf on a large corpus of files

LZHAM isn't always faster than LZMA. LZHAM has a higher startup cost (which I've reduced a bunch since the alpha, but it's still there), and it must update several large Huffman tables at periodic intervals. The previous alpha versions had way too many Huff tables, which really dragged the codec down on some files. The new version has only a handful of tables, I've reduced the default table update interval, and you can now fine-tune the update interval. This graph was generated at update speed 20 (fewest updates/fastest).

To visualize where LZHAM is faster or slower, I ran a test app on a corpus of 21,702 files, all >= 1024 bytes (to reduce the sheer number of them) and timed how long LZHAM vs. LZMA took to decompress each file. This is a mix of game assets from various titles, the usual standard corpus files (calgary etc.), XML/JSON/binary JSON/source files, random WAV/BMP/TGA/JPG/MP3's, executables+DLL's from popular installs, etc. Just random stuff.

I tossed any results where LZMA expanded the data, because in these cases LZMA is up to ~60x slower than LZHAM. LZHAM has special handling for uncompressed data and LZMA does not, so LZMA just bogs down really badly there. There are still many cases in this graph where LZMA bogs down terribly on nearly incompressible files; LZHAM can win massively in these cases because it chooses which 512k blocks to store uncompressed.
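
The stored-block idea is simple: compress each 512k block, compare the result against the raw bytes, and keep whichever is smaller. A toy illustration of the selection logic (the helpers here are hypothetical, and this is not LZHAM's actual bitstream code):

#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers, for illustration only:
size_t compress_block(const uint8_t* src, size_t n, uint8_t* dst); // returns compressed size
void   emit_stored(const uint8_t* p, size_t n);                    // writes a raw block
void   emit_compressed(const uint8_t* p, size_t n);                // writes a compressed block

// Per-block stored/compressed selection, as described above.
void compress_with_stored_fallback(const uint8_t* src, size_t src_len, uint8_t* temp)
{
    const size_t kBlockSize = 512 * 1024;
    for (size_t ofs = 0; ofs < src_len; ofs += kBlockSize)
    {
        const size_t n = std::min(kBlockSize, src_len - ofs);
        const size_t comp_len = compress_block(src + ofs, n, temp);
        if (comp_len >= n)
            emit_stored(src + ofs, n);      // compression didn't help: store raw bytes
        else
            emit_compressed(temp, comp_len);
    }
}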


Here's the resulting graph showing LZMA's vs. LZHAM's decompression time on a 3.3 GHz Core i7, sorted by LZMA's compressed file size. The blue line is the speedup, where less than 1.0 means LZMA was faster, and greater than 1.0 means LZHAM was faster.

The red line is the compressed file size on a log scale. This corpus has a ton of small (<4kb) files.

This graph shows that LZHAM v1.0 is pretty much always slower than LZMA when the compressed file size is <= ~2400 bytes; LZHAM can be only ~20% as fast in these cases. At around ~13k LZHAM is usually faster, and the greater the amount of compressed data, the higher the likelihood that LZHAM is faster. You can estimate the threshold amount of original/source data if you know your data's average compression ratio.

Basically, LZHAM sucks on small blocks. I can make this somewhat better by reducing the startup cost and optionally allowing the user to disable Huff table updating (or just slow it down even more). Another alternative is to have the compressor intelligently break the input stream into a handful (or just 2) of carefully chosen LZHAM blocks and issue an "update all Huff tables now" command to the decompressor at the block boundaries.

Note all of my timings include the time LZHAM takes to allocate its work memory and initialize its internal data structures. The dictionary size for LZHAM was always 64MB in this test, while the dict. size for LZMA was tuned to be the first pow2 >= the source file size, so it's possible LZHAM is at a bit of a disadvantage here due to extra memory allocation costs relative to LZMA. I'm running another test (in LZHAM's unbuffered mode) to find out if this makes any difference.
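
For clarity, here's how the per-file LZMA dictionary size was chosen (the smallest power of 2 that covers the source file):

#include <cstdint>

// Smallest power of 2 >= n, used to size LZMA's dictionary per file.
static uint64_t next_pow2(uint64_t n)
{
    uint64_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}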

(Thanks to John Brooks for giving me some feedback on this graph.)

Update: Here's a new, less noisy graph with the following differences:
- Filtered out all files with an LZMA compression ratio of less than 2% (because we know LZMA totally sucks at these low ratios)
- Switched LZHAM into unbuffered mode, for a minor decompression speed boost.
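
For reference, unbuffered mode is selected through the decompression parameters, roughly like this (a sketch from memory; verify the names against lzham.h):

#include <cstring>
#include "lzham.h"

// Sketch: one-shot decompression in unbuffered mode, where the decompressor
// writes directly into the caller's output buffer instead of maintaining its
// own dictionary buffer (only usable when the full output fits in memory).
lzham_decompress_status_t decompress_unbuffered(const lzham_uint8* src, size_t src_len,
                                                lzham_uint8* dst, size_t* dst_len,
                                                lzham_uint32 dict_size_log2)
{
    lzham_decompress_params params;
    memset(&params, 0, sizeof(params));
    params.m_struct_size = sizeof(params);
    params.m_dict_size_log2 = dict_size_log2;
    params.m_decompress_flags = LZHAM_DECOMP_FLAG_OUTPUT_UNBUFFERED;

    return lzham_decompress_memory(&params, dst, dst_len, src, src_len, NULL);
}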