Sunday, April 2, 2023

The Dark Horse of the Image Codec World: Near-Lossless Image Formats Using Ultra-Fast LZ Codecs

I think simple ultra-high speed lossy (or near-lossless) image codecs, built from the new generation of fast LZ codecs, are going to become more relevant in the future.

Computing bottlenecks change over time. As disk space, disk bandwidth, and internet bandwidth increase, older image codecs that squeeze every last bit out of the resulting file become less valuable for many use cases. Eventually, websites or products using noticeably lossy compressed images will be less attractive. The bandwidth savings from overly lossy image codecs will become meaningless, and the CPU/user time and battery or grid energy spent on complex decompression steps will be wasted.

Eventually, much simpler codecs with lower (weaker) compression ratios that introduce less distortion but have blazingly fast decompression rates are going to become more common. This core concept motivates this work.

One way to construct a simple lossless or lossy image codec with fast decompression is to combine a custom encoding tool with the popular lossless LZ4 codec. LZ4's compression and decompression are extremely fast, and the library is reliable, updated often, extensively fuzz tested, and very simple to use.

To make it lossy, the encoder needs to precondition the image data so that when it's subsequently compressed by LZ4, the proportion of 4+ byte matches vs. literals is increased compared to the original image data. I've been constructing LZ preconditioners, and building new LZ codecs that lend themselves to this preconditioning step, over the past year.
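To make the "matches vs. literals" idea concrete, here's a toy sketch that counts how many bytes a greedy LZ77-style parse can cover with 4+ byte matches. The function name `match_stats` and the brute-force search are my own illustration; LZ4's real match finder is hash-based and much faster, so treat the numbers as a rough proxy only.

```python
def match_stats(data: bytes, min_match: int = 4, window: int = 65535):
    """Greedy brute-force LZ77-style parse.

    Returns (literal_bytes, matched_bytes). This is a toy stand-in for a
    real LZ4 match finder, not a reimplementation of it.
    """
    literals = matched = 0
    pos, n = 0, len(data)
    while pos < n:
        best = 0
        # Try every earlier position inside the window as a match source.
        for cand in range(max(0, pos - window), pos):
            length = 0
            # Overlapping matches are allowed, as in LZ codecs.
            while pos + length < n and data[cand + length] == data[pos + length]:
                length += 1
            best = max(best, length)
        if best >= min_match:
            matched += best
            pos += best
        else:
            literals += 1
            pos += 1
    return literals, matched


# Six slightly noisy RGB pixels vs. the same pixels snapped to one color.
original = bytes([10, 20, 30, 11, 20, 30, 10, 21, 30,
                  10, 20, 31, 10, 20, 30, 11, 21, 30])
preconditioned = bytes([10, 20, 30] * 6)

print(match_stats(original))
print(match_stats(preconditioned))
```

Snapping every pixel to a single color is far cruder than anything a real preconditioner would do, but it shows the effect: small changes to the pixel data can turn most of the literals into matches.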

Such a codec will not be able to compete against JPEG, WebP, JPEG 2000, etc. for perceived quality per bit. However, it'll be extremely fast to decode, very simple, and will likely not bloat executables because the LZ4 library is already present in many codebases. Using LZ4 introduces no new security risks.

This LZ preconditioning step must be done in a way that minimizes obvious visible artifacts, as perceived by the Human Visual System (HVS). This tradeoff, increasing distortion in exchange for a lower bitrate, is a classic application of rate-distortion theory. This is well known in video coding, and now in GPU texture encoding (which I introduced in 2009 with my "crunch" compression library).
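Rate-distortion tradeoffs are commonly expressed as a Lagrangian cost, J = D + lambda * R, where D is the distortion, R is the rate in bits, and lambda sets the exchange rate between them. A minimal sketch of picking between coding choices this way follows. The `Candidate` type and the example numbers are hypothetical, and this is the textbook formulation rather than rdopng's exact model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    distortion: float  # error introduced by this choice (hypothetical units)
    rate_bits: float   # estimated bits needed to code this choice


def best_candidate(candidates, lam):
    """Return the candidate minimizing J = D + lam * R."""
    return min(candidates, key=lambda c: c.distortion + lam * c.rate_bits)


choices = [
    Candidate("keep literals", distortion=0.0, rate_bits=32.0),
    Candidate("force a match", distortion=6.0, rate_bits=16.0),
]

print(best_candidate(choices, lam=0.1).name)
print(best_candidate(choices, lam=1.0).name)
```

A small lambda favors quality (the exact literals win), while a larger lambda favors bitrate (the distorting match wins).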

The rdopng tool on GitHub supports creating lossy LZ4 compressed RGB/RGBA images using a simple rate-distortion model. (Lossless is next, but I wanted to solve the harder problem first.) During the preconditioning step, the LZ4 rate in bits is approximated using a sliding dictionary and a match finder. For each potential match replacement which would introduce distortion into the lookahead buffer, the preconditioner approximates the introduced visual error by computing color error distances in a scaled Oklab perceptual colorspace. (Oklab is one of the most powerful colorspaces I've used for this sort of work. There are better colorspaces for compression, but Oklab is simple to use and well-documented.)
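For reference, here's a small sketch of a color error distance in Oklab, using the published sRGB to Oklab conversion. The per-channel `scale` weights are placeholders I made up to illustrate the "scaled" part; the weights rdopng actually uses aren't given here.

```python
def srgb_to_linear(c8: int) -> float:
    """Convert one 8-bit sRGB channel to linear light in [0, 1]."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def srgb_to_oklab(r8: int, g8: int, b8: int):
    """8-bit sRGB -> (L, a, b) using the published Oklab matrices."""
    r, g, b = (srgb_to_linear(v) for v in (r8, g8, b8))
    l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b
    m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b
    s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b
    # Cube roots; inputs are non-negative for in-gamut sRGB.
    l_, m_, s_ = (v ** (1.0 / 3.0) for v in (l, m, s))
    return (
        0.2104542553 * l_ + 0.7936177850 * m_ - 0.0040720468 * s_,
        1.9779984951 * l_ - 2.4285922050 * m_ + 0.4505937099 * s_,
        0.0259040371 * l_ + 0.7827717662 * m_ - 0.8086757660 * s_,
    )


def oklab_distance(p, q, scale=(1.0, 1.0, 1.0)):
    """Weighted Euclidean distance between two 8-bit sRGB colors in Oklab.

    `scale` is a hypothetical per-channel weighting, not rdopng's values.
    """
    a, b = srgb_to_oklab(*p), srgb_to_oklab(*q)
    return sum((w * (x - y)) ** 2 for w, x, y in zip(scale, a, b)) ** 0.5


print(oklab_distance((200, 30, 30), (201, 30, 30)))
print(oklab_distance((200, 30, 30), (30, 200, 30)))
```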

Perceptually, distortions introduced into regions of images surrounded by high frequency details are less noticeable than those in regions containing smooth or gradient features. Before preconditioning, the encoder computes two error scaling masks which indicate which areas of the image contain large or small gradients/smooth regions. These scaling masks suppress the introduction of distortions (by using longer or more matches) if doing so would be too noticeable to the HVS. This step has a large impact on bitrate and can be improved.
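As a rough illustration of the masking idea, the sketch below builds a single per-pixel error scale from local luma activity: errors in flat regions are counted at full weight, while errors next to strong detail are discounted. The 3x3 neighborhood, the `1 / (1 + activity / 16)` mapping, and the function name are all assumptions for the example. The actual encoder computes two masks, and their construction isn't described here.

```python
def activity_mask(luma, width, height, softness=16.0):
    """Per-pixel error scale from local activity (a simplified, single mask).

    luma: row-major list of 0..255 values.
    Returns values in (0, 1]: 1.0 in flat areas, smaller near strong detail.
    """
    scales = []
    for y in range(height):
        for x in range(width):
            center = luma[y * width + x]
            activity = 0
            # Largest absolute difference to any neighbor in a 3x3 window.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < width and 0 <= ny < height:
                        activity = max(activity, abs(luma[ny * width + nx] - center))
            scales.append(1.0 / (1.0 + activity / softness))
    return scales


flat = [100] * 9
edge = [0, 0, 255,
        0, 0, 255,
        0, 0, 255]
print(activity_mask(flat, 3, 3))
print(activity_mask(edge, 3, 3))
```

A distortion estimate would then be multiplied by this scale before it enters the rate-distortion decision, so the same color error costs more in a smooth gradient than beside a hard edge.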

To speed up encoding, the preconditioner only examines a window region above and to the left of the lookahead buffer. LZ4's unfortunate minimum match size of 4 bytes complicates encoding of 24bpp RGB images, because a match can never cover just a single 3-byte pixel. Encoding is not very fast due to this search process, but it's possible to thread it by working on different regions of the image in parallel. The encoder is a proof of principle and a testing ground, and not as fast as it could be, but it works.
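One way to picture the restricted search is as a list of byte distances back to pixels above and to the left of the current pixel in a row-major buffer. The sketch below is only my reading of that idea: the window size, the function names, and the choice to think in whole pixels are assumptions. The second helper shows why 24bpp is awkward: the shortest match that covers whole pixels is 6 bytes (two pixels), while for 32bpp RGBA it's 4 bytes (one pixel).

```python
def candidate_distances(x, y, width, bytes_per_pixel, win_w=4, win_h=2):
    """Byte distances from pixel (x, y) back to pixels up to win_w columns
    to the left and win_h rows above, staying inside the image."""
    stride = width * bytes_per_pixel
    dists = []
    for dy in range(win_h + 1):
        for dx in range(win_w + 1):
            if dx == 0 and dy == 0:
                continue  # the pixel itself is not a match source
            if x - dx < 0 or y - dy < 0:
                continue  # outside the image
            dists.append(dy * stride + dx * bytes_per_pixel)
    return dists


def min_whole_pixel_match(bytes_per_pixel, min_match=4):
    """Smallest match length >= min_match that is a whole number of pixels."""
    pixels = -(-min_match // bytes_per_pixel)  # ceiling division
    return pixels * bytes_per_pixel


print(candidate_distances(1, 1, width=8, bytes_per_pixel=3, win_w=1, win_h=1))
print(min_whole_pixel_match(3), min_whole_pixel_match(4))
```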

The encoder also supports angular error metrics for encoding tangent space normal maps.
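An angular error metric measures the angle between the original and the distorted normal rather than a color difference. Here's a minimal sketch, assuming the common convention where each 8-bit channel maps to [-1, 1] via `c / 127.5 - 1`. The exact unpacking rdopng uses may differ.

```python
import math


def decode_normal(r8: int, g8: int, b8: int):
    """Unpack an 8-bit RGB texel to a unit-length vector (assumed convention)."""
    v = [c / 127.5 - 1.0 for c in (r8, g8, b8)]
    length = math.sqrt(sum(c * c for c in v)) or 1.0  # avoid dividing by zero
    return tuple(c / length for c in v)


def angular_error_degrees(p, q) -> float:
    """Angle in degrees between the normals stored in texels p and q."""
    a, b = decode_normal(*p), decode_normal(*q)
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # clamp against rounding error
    return math.degrees(math.acos(dot))


# A flat tangent-space normal vs. a slightly tilted one.
print(angular_error_degrees((128, 128, 255), (140, 128, 250)))
```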

LZ4I images are trivial to decode in any language. An LZ4I image consists of a simple header followed by the LZ4 compressed 24bpp RGB (R first) or RGBA pixel data.
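To show how little work the payload takes, here's a sketch of a decoder for the standard LZ4 block format in plain Python. It skips the LZ4I header entirely, since the header layout isn't reproduced in this post, and it is not hardened against malformed input. Real code should call the LZ4 library's `LZ4_decompress_safe()` instead.

```python
def lz4_block_decompress(src: bytes) -> bytes:
    """Decode one raw LZ4 block (no frame header, no LZ4I header)."""
    out = bytearray()
    pos, n = 0, len(src)
    while pos < n:
        token = src[pos]
        pos += 1

        # High nibble: literal run length, extended by 255-valued bytes.
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                extra = src[pos]
                pos += 1
                lit_len += extra
                if extra != 255:
                    break
        if pos + lit_len > n:
            raise ValueError("truncated literal run")
        out += src[pos:pos + lit_len]
        pos += lit_len
        if pos >= n:
            break  # the final sequence carries literals only

        # 2-byte little-endian match offset, counted back from the output end.
        offset = src[pos] | (src[pos + 1] << 8)
        pos += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")

        # Low nibble: match length minus the 4-byte minimum match.
        match_len = (token & 15) + 4
        if (token & 15) == 15:
            while True:
                extra = src[pos]
                pos += 1
                match_len += extra
                if extra != 255:
                    break

        # Copy byte by byte so overlapping matches repeat correctly.
        start = len(out) - offset
        for i in range(match_len):
            out.append(out[start + i])
    return bytes(out)


# Hand-built block: 4 literals "abcd", an 8-byte match at offset 4,
# then a final literal-only sequence "hello".
block = bytes([0x44]) + b"abcd" + bytes([0x04, 0x00]) + bytes([0x50]) + b"hello"
print(lz4_block_decompress(block))
```

After decompression the output is just interleaved pixel bytes, so for 24bpp data pixel `i` lives at `data[3 * i : 3 * i + 3]` with R first.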

Some example encodings:

Lossless original PNG image, 19.577 bits/pixel PNG, or 20.519 bits/pixel LZ4I:

Lossy LZ4I: 42.630 dB Rec. 709 luma (Y) PSNR, 12.985 bits/pixel, 1.1 gigapixels/sec. decompression rate using LZ4_decompress_safe() (on a mobile Core i7 1065G7 at 1.3GHz base clock):

Biased delta image:

Greyscale histogram of biased delta image:

A more lossy LZ4I encoding, 38.019 dB Rec. 709 luma (Y) PSNR, 8.319 bits/pixel:

Biased delta image:

Greyscale delta image histogram:

Lossless original: 14.779 bits/pixel PNG, 16.551 bits/pixel LZ4I:

Lossy LZ4I: 45.758 dB Rec. 709 luma (Y) PSNR, 7.433 bits/pixel (less than half the size vs. lossless LZ4I!):

Biased delta image:
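For anyone wanting to reproduce the kinds of figures quoted in the examples above, here's a sketch of how bits/pixel, a Rec. 709 luma PSNR, and a biased delta image can be computed. The bias of 128 and the use of unrounded luma are assumptions on my part, so results may not match rdopng's output digit for digit.

```python
import math


def bits_per_pixel(compressed_bytes: int, width: int, height: int) -> float:
    """Compressed size expressed in bits per image pixel."""
    return compressed_bytes * 8.0 / (width * height)


def luma709(r, g, b) -> float:
    """Rec. 709 luma from 8-bit (gamma-encoded) RGB."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def luma_psnr_db(orig, coded) -> float:
    """PSNR of Rec. 709 luma; orig/coded are equal-length lists of (r, g, b)."""
    mse = sum((luma709(*a) - luma709(*b)) ** 2 for a, b in zip(orig, coded)) / len(orig)
    return float("inf") if mse == 0 else 10.0 * math.log10(255.0 ** 2 / mse)


def biased_delta(orig, coded, bias=128):
    """Per-channel (coded - orig + bias), clamped to 0..255 for display."""
    return [
        tuple(max(0, min(255, c - o + bias)) for o, c in zip(a, b))
        for a, b in zip(orig, coded)
    ]


a = [(10, 10, 10), (200, 100, 50)]
b = [(12, 9, 10), (200, 100, 50)]
print(bits_per_pixel(1000, 32, 32))
print(luma_psnr_db(a, b))
print(biased_delta(a, b))
```

In a biased delta image, mid-grey (128) means "no change", which is why the histograms of these images cluster tightly around the middle.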

rdopng also supports lossy QOI and PNG encoding. QOI is particularly attractive for lossy compression because the encoding search space is tiny; however, decompression is slower. Lossy QOI encoding is extremely fast vs. PNG/LZ4.

It's also possible to construct specialized preconditioners for other LZ codecs, such as LZMA, Brotli, Zstandard, or LZSSE. Note the LZ4 preconditioner demonstrated here is universal (i.e. it's compatible with any LZ codec) because it just introduces more or longer matches, but it doesn't exploit the individual LZ commands supported by each codec.

LZSSE is particularly attractive as a preconditioning target because it's 30-40% faster than LZ4 and has a higher ratio. This is next. A format that uses tiled decompression and multiple threads is also a no-brainer. Ultimately I think LZ or QOI-like variants will be very strong contenders in the future.
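To sketch the tiled idea: split the image into bands of rows, compress each band independently, and hand the bands to a thread pool on decode. Python's standard library has no LZ4 or LZSSE binding, so the example below uses `zlib` purely as a stand-in codec. The tile shape and function names are likewise made up for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def encode_tiles(pixels: bytes, stride: int, rows_per_tile: int):
    """Compress independent horizontal bands (zlib stands in for an LZ codec)."""
    tile_bytes = stride * rows_per_tile
    return [
        zlib.compress(pixels[i:i + tile_bytes])
        for i in range(0, len(pixels), tile_bytes)
    ]


def decode_tiles(tiles, workers: int = 4) -> bytes:
    """Decompress tiles in parallel; map() returns results in tile order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, tiles))


width, height, bpp = 64, 64, 3
image = bytes((x * 7 + y * 13) & 0xFF for y in range(height) for x in range(width * bpp))
tiles = encode_tiles(image, stride=width * bpp, rows_per_tile=16)
print(len(tiles), decode_tiles(tiles) == image)
```

Because each tile has its own dictionary, matches can't reach across tile boundaries, so the ratio drops a little in exchange for parallel decoding.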
