Friday, January 16, 2015

LZHAM v1.0 progress

Ported to OSX, and exposed several new compression/decompression parameters to allow the user to configure some of the codec's inner workings: literal/delta_literal bitmasks (or number of literal/delta literal bits - less versatile but simpler), the Huff table max interval between updates, and the rate at which the Huff table update interval slows between updates. These settings are absolutely critical to the decompressor's performance, memory, and CPU cache utilization.

The very early results are promising: 25-30% faster decoding (Core i7) and much less memory usage (still determining) by just tuning the settings (less tables/slower updating), with relatively little impact on compression ratio. The ratio reduction is only a fraction of 1% on the few files I've tested. (Disclaimer: I've only just got this working. These results do make sense -- it takes a bunch of CPU to update the Huff decode tables.)

Also, by reducing the # of Huff tables the decompressor shouldn't bog down nearly so much on mostly incompressible data. The user can currently select between 1-64 tables (separately for literals and delta literals, for up to 128 total tables). The codec supports prediction orders between 0-2, with 2 programmable predictor bitmasks for literals/delta_literals. (I'm not really sure exposing separate masks for literals vs. delta literals is useful, but after my experience optimizing LZMA's options with Unity asset data I'm now leaning to just exposing all sorts of stuff and let the caller figure it out.)

I'm also going to expose dictionary position related bitmasks to feed into the various predictions, just like LZMA, because they are valuable on real-life game data.

Annoyingly, when I lower the compression ratio decompression can get dramatically faster. I believe this has to do with a different mix of the decode loop exercised by the lower ratio bitstream, but I'm not really sure yet (and I don't remember if I figured out why 3+ years ago). I'll be writing a guide on how to tune the various settings to speed up LZHAM's decompressor.

On the downside, the user has more knobs to turn to make max use of the codec.


Wednesday, January 14, 2015

More LZHAM notes

It's been a while since I've made any major changes to LZHAM (except for minor cmake related stuff). This was a codec I wrote over a few nights and weekends while I was also working my day job. I eventually had to let active dev on LZHAM go to sleep because I got "sidetracked" shipping Portal 2. The codec's alpha has already been successfully deployed in several products, such as Planetside 2 and Titanfall, which isn't bad for a few nights of R&D and implementation work.

I covered what I was thinking of doing with LZHAM in this blog post. I have more interest in improving it again. For the types of products I'm now working on, what matters a lot is the title's retention rate, from first starting the product to the customer actually getting into real gameplay. Slow downloads or updates, loading screens, etc. equals lost users. Lost users=lower monetization. We actually measure the retention rate of every aspect of this in the field. So things like background downloading, streaming, proper organization of asset data into Unity asset bundles, and of course good data compression matter massively to us.

Anyhow, some ideas for LZHAM decompression startup and throughput improvements which I can do pretty quickly:

- After much testing on our game data, I now realize I underestimated how useful the various LZMA settings are. Right now LZHAM always uses the upper 3 MSB's of the prev. two literals for literal/delta literal contexts. Allow the user to control all of this: which prev. literal(s) (if any), say up to 8 bytes back, and which bits from those literals, separately for each type of prediction (literals/delta literals).

- In my quest to get LZHAM's ratio up to be similar to LZMA I made several tradeoffs which can greatly impact decompression perf, especially on uncompressible data. Right now the codec must always init and manage 64*2 Huffman tables. Allow the user to reduce or even increase the # of tables.

- LZHAM was designed for "solid" compression, where you give the codec dozens to hundreds of MB's containing many assets, and you don't restart/reinit the codec in between assets. It's like a slow to start drag racer. So it can suck on small files.

I'm not honestly 100% sure what to do about this yet that won't kill decompression perf. The way LZHAM updates Huffman tables seems like an albatross here. Amortized over many MB's it typically works fine, but on small files they can't be updated (adapted) quickly enough. Less tables are probably good here.

I could just integrate something like miniz into the codec, and try using it on each internal compressor block and using whatever is better. But that seems horrible.

- The Huffman table update frequency needs to be better tuned. If I can't think of anything smarter, allow the user to control the update schedule.

Note if you are very serious about fast, high ratio compression and decompression, Rad's Oodle product is very good. Given what I know about it, it's the best (fastest, highest compression, and most scalable/portable) production class lossless codec I know of.

Friday, January 9, 2015

Improved Unity asset bundle file compression

Download Times Matter


Sean Cooper, Ryan Inselmann and I have been building a custom lossless archiver designed specifically for Unity asset bundle files. The archiver itself uses several well-known techniques in the hardcore archiving/game repacking world. This post is mostly about how we've begun to tune the LZMA settings used by the archiver to be more effective on Unity asset bundle data. We'll cover the actual archiver in a later post.

LZMA has several knobs you can turn to potentially improve compression:
http://stackoverflow.com/questions/3057171/lzma-compression-settings-details

The LZHAM codec (my faster to decode alternative to LZMA) isn't easily tunable in the same way as LZMA yet, which is a flaw. I totally regret this because tuning these options works:

Total asset bundle asset data (iOS): 197,514,230
Unity's built-in compression: 91,692,327
Our archiver, un-tuned LZMA: 76,049,530
Our archiver, tuned LZMA (48 option trials): 75,231,764

Our archiver, tuned LZMA (225 option trials): 74,154,817

LZ codecs with untunable models/settings are much less interesting to me now. (Yet one more thing to work on in LZHAM.)

Here are the optimal settings we've found for each of our Unity asset classes on iOS. So for example, textures are compressed using different LZMA options (3, 3, 3) vs. animation clips (1, 2, 2).

Best LZMA settings after trying all 225 options. (In case of ties the compressor just selects the lowest lc,lp,pb settings - I just placed a triply nested for() loop around calling LZMA and it chooses the first best as the "best".)

Class (id): lc lp pb

GameObject (1): 8 0 0
Light (108): 2 2 2
Animation (111): 8 2 2
MonoScript (115): 2 0 0
LineRenderer (120): 0 0 0
SphereCollider (135): 0 0 0
SkinnedMeshRenderer (137): 8 4 2
AssetBundle (142): 0 2 2
WindZone (182): 0 0 0
ParticleSystem (198): 0 2 3
ParticleSystemRenderer (199): 8 3 3
Camera (20): 0 2 2
Material (21): 3 0 1
SpriteRenderer (212): 0 0 0
MeshRenderer (23): 8 4 2
Texture2D (28): 8 2 3
MeshFilter (33): 8 4 1
Transform (4): 6 2 2
Mesh (43): 0 0 1
MeshCollider (64): 1 4 1
BoxCollider (65): 7 4 2
AnimationClip (74): 0 2 2
AudioSource (82): 0 0 0
AudioClip (83): 8 0 0
Avatar (90): 2 0 2
AnimatorController (91): 2 0 2
? (95): 7 4 1
TrailRenderer (96): 0 0 0

Tuesday, January 6, 2015

Utils/tools to help transition to OSX from Windows

I've developed software under Kubuntu and Windows for years now. I found adapting to Kubuntu from a Windows world to be amazingly easy (ignoring the random Linux driver installation headaches). After a few days mucking around with some keyboard shortcuts and relearning Bash and I was all set. 

These days, the center of gravity in the mobile development world is OSX due to iOS's market share, so it's time I bit the bullet and dive into the Apple world.

I'll admit, transitioning to OSX has been mind numbingly painful at times. (Apple, why do you persist on using such a wonky keyboard layout? Where's my ctrl+left??! No alt+tab?? wtf? ARGH.) I mentioned this to Matt Pritchard, a long time OSX/iOS developer, and he passed along a list of OSX utilities and tools that can help make the transition easier for long-time Windows developers.

Monday, January 5, 2015

BonusXP's hybrid office arrangement

After my Valve experience I'm now deeply interested in how companies arrange and maintain the actual space their employees work in. The first thing I do when I enter a studio is look beyond the reception area and poke around to see how much importance and thought went into their actual workspace. This can tell you a lot about a company. Ignore the marketing and just observe.

BonusXP (makers of Monster Crew and Cave Mania, a growing indie game developer near Dallas in Allen, TX) used feedback from everyone at the studio to decide how their new office would be arranged. They decided on a hybrid mix of a low-density open plan surrounded by normal offices, with some key desk additions to cut down on visual distractions and give devs a better sense of control over their environment.

They also have a pretty sweet optional work from home on Friday policy.






Key things about this space that I see:
  • Tidy: no rats nests, shabby furniture, or giant collections of 2000 nerf guns and action figures.
  • Opaque glass dividers on each desk.
  • Small offices with windows surrounding the open office area.
  • Real desks with actual storage space.
  • Low density and lots of space around each mini "pod" of desks. 
  • Blue color theme, which triggers a relaxing response.
They are working on a secret project with Stardock. You can read more about BonusXP on their blog.

Sunday, January 4, 2015

Robot Entertainment knows how to make open offices work better

There are precious few public pics available of Robot Entertainment's (The Orcs Must Die people near Dallas, TX - one of the post-Ensemble Studios companies) offices. This is a shame, because they've got a very smartly laid out space. (Update: Found more pics.)







The important elements:
  • Discipline based "pod" organization and half partitions above eye height.
  • Disruptive foot traffic and audio/visual noise mostly confined within a single pod.
  • Pervasive availability of usable and visible whiteboards
  • High ceilings
  • No power cords ducktaped to the floor
  • Lower density desk layout. 
  • Small desks with whiteboards encourages small meetings
Borderline genius considering how little most studios, even insanely successful ones, think about this stuff.

And at this studio you get an entire Beer Garden in your offices. No high school lunch room cafeteria here!




Another major difference between Robot's offices verses the spaces of companies with dehumanizing, industrial scale open layouts is persistence and presence. At Robot's (and the old Ensemble Studios) offices, employees can bring in books, references, games, etc. and arrange them about their office in actual physical book cases and shelves. Sometimes Kindle doesn't cut it, especially for historical references. You can't practically do this at companies that attach wheels to your little desks and force you to play bumper cars every quarter.



For completeness, here's their lobby and meeting area:




Some impressions I get about this space by just looking at the pics: openness, welcoming, and team oriented.

Microsoft's vs. Valve's digs

I can't stand Microsoft's terrible "Modern" UI, but I'll give them props for having an actual Office Innovation Team that actually thinks about this stuff. (Total side note to the OIT people: Choose a different title, it's virtually impossible to search for it.)

From Pics of MS Building 4:




Contrast the above to this (from here):


Now where do you want to work?

You know, you can tell a lot about a company and its culture by just looking at photos of their offices.