For fun, let's try path finding into the future and see if we can add some more API's that expose more possibilities. How about these API's, which enable the caller to explore the solution space more deeply:
- CompressPush(): Push compressor's internal state
- CompressPop(): Pop compressor's internal state
- CompressQuery(): Determine how many bits it would take to compress a blob
- CompressContinue(): Commit some data generating some compressed output
Once we have these new API's (push/pop/query, we already have commit) we can use the compressor to explore data graphs in order to compose the smallest compressed output.
The Current Situation
There are two classes of nodes here that represent different concepts.
The blue nodes (A, B, C, etc.) represent internal compressor states, and the black nodes represent some data you've given to the compressor. Black nodes represent calls to CompressContinue() with some data to compress. You put in some data by calling CompressContinue(), and the compressor moves from internal states A all the way to F, and at the end you have some compressed data that will recover the data blobs input in nodes G, H, I, etc. at decompression time. Whenever the compressor moves from one blue node to the next it'll hand you some compressed bits.
Now let's introduce CompressQuery()
Now the black nodes represent "trials". In this graph, the compressor starts in state A, and we conduct three trials labeled B, C, and D using the CompressQuery() API to determine the cheapest path to take (i.e. the path with the highest compression ratio). After figuring out how to get into compressor state E in the cheapest way (i.e. the fewest amount of compressed bits), we "commit" the data we want to really compress by calling CompressContinue(), which takes us into state E (and also gives us a blob of compressed data). We repeat the process for trials F, G, H, which gets the compressor into state I, etc. At Y we have fully compressed the input data stream and we're done.
CompressQuery() is a good, logical first step but it's too shallow. It's just a purely local optimization tool.
Let's go further: push/pop the compressor state
Imagine something like this:
At compressor state A we first push the compressor's internal state. Now conduct three trials (C, D, E), giving us compressor state G, etc. At L we're done, so we pop back to node A and explore the bottom forest. Once we've found the best solution we pop back to A and commit the black nodes with the best results to get the final compressed data.
Do you mean like media filters that winrar uses but more specialized to your data?
ReplyDeleteBTW I would love it if you could somehow work together with 7-zip to get LZHAM in the GUI.
For 1st question, yes.
ReplyDeleteAbout 7-zip: Yes, this is one of my goals, I will contact Igor and ask him. I did this myself in the 7-zip codebase from years ago, and it wasn't that hard. But now we have a new plugin API in 7-zip, and that would need to be hooked up. Igor is the right person to do it. (I would do it and give him the code, but I don't have the free time.)
Right now we have two 7-zip codec plugins written (LZHAM and Zstd), Zstd one is coming. After that, the squash universal library.