Over the past week I've increased the decompressor's average speed by roughly 1.5%-2% by enabling Huffman decoding acceleration tables on the distance LSB symbol table.
I'm currently testing several changes which decrease the time it takes for the decompressor to initialize, and increase avg. throughput on tiny (<10KB) streams:
- The initial set of Huffman tables is very simple because the symbol frequencies are all 1's. The decompressor now quickly detects this case and bypasses the more general canonical Huffman codesize computation step.
- The decompressor bypasses the construction of Huffman decoding acceleration tables near the beginning of the stream. It computes the approx. cost of constructing the accel. tables and decoding the next X symbols using those tables, versus just decoding the next X symbols without accel. tables, and chooses whichever is predicted to be cheaper. The Huff decode acceleration tables can be rather big (up to 2048 entries), so skipping them is only a win at the beginning of streams (when the tables are rapidly updated).
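To illustrate the first optimization: when every symbol frequency is 1, the optimal code is a near-balanced tree that uses only two adjacent code lengths, so the lengths can be written down in closed form instead of running the full codesize algorithm. Here's a hedged sketch of that fast path (the function name and exact form are mine, not the decompressor's actual code):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical fast path: when all n symbol frequencies are equal, the
// Huffman code lengths follow directly. With k = floor(log2(n)):
//   a = 2^(k+1) - n symbols get length k,
//   b = 2n - 2^(k+1) symbols get length k+1,
// which satisfies the Kraft equality: a*2^-k + b*2^-(k+1) = 1.
std::vector<uint8_t> equal_freq_code_lengths(uint32_t n)
{
    assert(n >= 2);
    uint32_t k = 0;
    while ((2u << k) <= n)       // after the loop, k = floor(log2(n))
        ++k;
    uint32_t a = (2u << k) - n;  // count of symbols with the shorter length k
    std::vector<uint8_t> lens(n);
    for (uint32_t i = 0; i < n; ++i)
        lens[i] = static_cast<uint8_t>(i < a ? k : k + 1);
    return lens;
}
```

For example, 6 equally likely symbols get lengths {2,2,3,3,3,3}, and any power-of-two count gets a flat code (4 symbols all get length 2), skipping the canonical codesize computation entirely.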
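The build-vs-skip decision for the acceleration tables can be sketched as a simple cost comparison. All constants and names below are illustrative assumptions, not the decompressor's actual tuning values:

```cpp
#include <cstdint>

// Hypothetical cost model for deciding whether building a Huffman decode
// acceleration table pays off before the next table update.
struct AccelCostModel
{
    uint32_t table_entries;        // accel table size (up to 2048 entries)
    uint32_t build_cost_per_entry; // approx. cost to fill one table entry
    uint32_t fast_decode_cost;     // approx. cost per symbol with the table
    uint32_t slow_decode_cost;     // approx. cost per symbol without it
};

// Returns true if building the table is predicted to be cheaper over the
// next `symbols_until_update` symbols (i.e. until the table is rebuilt).
bool should_build_accel_table(const AccelCostModel& m,
                              uint32_t symbols_until_update)
{
    uint64_t with_table = (uint64_t)m.table_entries * m.build_cost_per_entry
                        + (uint64_t)symbols_until_update * m.fast_decode_cost;
    uint64_t without_table =
        (uint64_t)symbols_until_update * m.slow_decode_cost;
    return with_table < without_table;
}
```

Early in the stream the tables are rebuilt frequently, so `symbols_until_update` is small and the up-front build cost dominates, which is why skipping the tables wins there but not later on.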
x86 timing results on a 64 byte test file (file to file decompression, 64MB dictionary):
Decomp only: .170ms

x86 timing results on a 3087 byte test file:
Decomp only: .132ms