I combined together 3 of our larger Unity asset bundles together into a single .TAR file and here are the current results on my iPhone 4 (800 MHz A4 CPU - 512MB RAM):
LZHAM Compressed from 15209984 to 4999552 bytes
LZHAM Comp time: 112.771710, BPS: 134874.110168
LZHAM Decomp time: 0.895846, BPS: 16978346.767638
Decompression is around 47 cycles per byte on these bundles files, which contain a variety of Unity asset data.
Now LZMA stats (level 9 16MB dictionary, default tuning options):
LZMA compressed from 15209984 to 4726211 bytes
LZMA Comp time: 41.805776, BPS: 363824.942238LZMA Decomp time: 1.993880, BPS: 7628334.723455
LZMA decompression was ~105 cycles/byte.
So LZHAM decompresses this data 2.2x faster. Its ratio is slightly lower, but this can be somewhat compensated for by enabling LZHAM's better parser and compressing offline (with a multicore desktop CPU). This helps a little: 4960935 bytes. By using more frequent Huff table updates (level 3 vs. the default 8) and extreme parsing, I get 4942383 compressed bytes, but decompression is ~18% slower. I'm going to graph all of this data next.
For reference, my iPhone 4's CPU is ~13.6x slower for compression and ~8.5x slower for decompression vs. my Core i7 3.3 GHz desktop CPU (comparing absolute wall time, no multithreading, same settings and file data, etc.).
Update: Here are the testing results after compressing & decompressing all of our uncompressed asset bundles on my iPhone 4. I limited LZHAM's compressor to a dictionary size of 8MB, less frequent table updating (table update speed of 12 vs the default 8), and normal parsing, which limited its ratio a bit vs. running it on desktop.
LZHAM is slower on a few files totaling ~.2% of the data (~320k out of 172MB), from there it rises to between 1.8x-4.8x faster. (Note I'm currently regenerating this graph so LZHAM's dictionary size matches LZMA's.)
1. Red=Speedup, Blue=LZMA compressed size, sorted by compressed size.
2. Red: Speedup, Blue: LZMA_comp_size/LZHAM_comp_size, sorted by speedup.