The very early results are promising: 25-30% faster decoding (Core i7) and much less memory usage (still determining) by just tuning the settings (less tables/slower updating), with relatively little impact on compression ratio. The ratio reduction is only a fraction of 1% on the few files I've tested. (Disclaimer: I've only just got this working. These results do make sense -- it takes a bunch of CPU to update the Huff decode tables.)
Also, by reducing the # of Huff tables the decompressor shouldn't bog down nearly so much on mostly incompressible data. The user can currently select between 1-64 tables (separately for literals and delta literals, for up to 128 total tables). The codec supports prediction orders between 0-2, with 2 programmable predictor bitmasks for literals/delta_literals. (I'm not really sure exposing separate masks for literals vs. delta literals is useful, but after my experience optimizing LZMA's options with Unity asset data I'm now leaning to just exposing all sorts of stuff and let the caller figure it out.)
I'm also going to expose dictionary position related bitmasks to feed into the various predictions, just like LZMA, because they are valuable on real-life game data.
Annoyingly, when I lower the compression ratio decompression can get dramatically faster. I believe this has to do with a different mix of the decode loop exercised by the lower ratio bitstream, but I'm not really sure yet (and I don't remember if I figured out why 3+ years ago). I'll be writing a guide on how to tune the various settings to speed up LZHAM's decompressor.