Richard Geldreich's Blog: Vectorized interleaved Range Coding using SSE 4.1

Monday, April 24, 2023

Vectorized interleaved Range Coding using SSE 4.1

To steer clear of the current (and upcoming) ANS/rANS entropy coding patent minefield, we're using vectorized Range Coding instead. Here's a 24-bit SSE 4.1 example using 16 interleaved streams. The example decoder gets 550-700 megabytes/sec. with 8-bit alphabets on the various Intel/AMD CPUs I've tried:

https://github.com/richgel999/sserangecoding

More on the rANS patent situation (from early 2022):

https://www.theregister.com/2022/02/17/microsoft_ans_patent/

This decoder design is practical on any CPU or GPU that supports fast hardware integer or float division. It explicitly uses 24-bit registers to sidestep issues with float divides. I've put much less work into optimizing the encoder; its key step (the post-encode byte swizzle) is the next bottleneck to address.
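
To make the 24-bit point concrete, here is a minimal scalar sketch. It rests on my reading of why 24 bits helps: an IEEE 754 single-precision float has a 24-bit significand, so any integer below 2^24 converts to float exactly, and a float divide followed by truncation then matches integer division. The helper names `div24_via_float` and `count_div24_mismatches` are hypothetical and are not taken from the sserangecoding repository, which may structure its divide differently.

```cpp
#include <cstdint>

// Hypothetical helper (not from the sserangecoding repo): divides two
// integers below 2^24 using a single-precision float divide.
// Assumes IEEE 754 floats with round-to-nearest and b != 0.
inline uint32_t div24_via_float(uint32_t a, uint32_t b)
{
    // Both conversions are exact because a and b fit in 24 bits.
    // The cast back to integer truncates toward zero.
    return static_cast<uint32_t>(static_cast<float>(a) / static_cast<float>(b));
}

// Compares the float-based divide against integer division on
// pseudo-random 24-bit operand pairs and returns the mismatch count.
inline uint32_t count_div24_mismatches(uint32_t num_trials)
{
    uint32_t state = 0x12345678u; // arbitrary LCG seed
    uint32_t mismatches = 0;

    for (uint32_t i = 0; i < num_trials; ++i)
    {
        state = state * 1664525u + 1013904223u;
        const uint32_t a = state >> 8; // top 24 bits: 0 .. 2^24 - 1

        state = state * 1664525u + 1013904223u;
        // Vary the divisor's magnitude and force it to be nonzero.
        const uint32_t b = ((state >> 8) >> (i % 24u)) | 1u;

        if (div24_via_float(a, b) != a / b)
            ++mismatches;
    }
    return mismatches;
}
```

Calling `count_div24_mismatches` with a large trial count is a cheap sanity check on a new target before relying on its float divider. With operands wider than 24 bits the conversion to float is no longer exact, so the equivalence should not be assumed there.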
