Monday, April 23, 2018

DirectXTex BC7 3 subset wierdness

I brought Microsoft's DirectXTex project (latest code) into my test project, to see how it fairs vs. ispc_texcomp and my encoder. Unfortunately, it appears broken. Across 31 test images (kodim and others):

DirectXTex BC7:

No flags (0): 9972.6 secs 44.41 dB

BC_FLAGS_USE_3SUBSETS: 13534.6 secs 44.25 dB

This is wrong. Quality should go up with 3 subset modes enabled, not down. I'm tempted to go figure out what's wrong in there myself, but it's a lot of code.

By comparison, my ispc encoder gets 477.1 secs 46.72 dB (using high quality settings). ispc_texcomp is in the same ballpark. With 3 subset modes disabled, I get 45.96 dB (as expected - the 3 subset modes are useful!).

I verified that the flag is doing something. Here's the BC7 mode histogram for kodim01 with the flags parameter to DirectX::D3DXEncodeBC7() set to 0:

0 17968 0 1752 0 0 4856 0

With 3 subsets enabled:

1435 16647 26 1675 0 0 4793 0

The source looks nice and readable, and as a library it was dead-simple to get it building and linked against my stuff. But it doesn't appear to be a production-ready encoder, it's more like a sample.

I'm calling it from multiple threads using OpenMP (it's too slow to benchmark otherwise). It makes my 20 core Xeon workstation crawl for a while, it's that slow.

Also, there is no need to disable 3 subset modes in a properly written encoder. Some timings with my encoder: 487 secs (3 subsets enabled) vs. 476 secs (3 subsets disabled). In another test at lower quality: 215.9 secs (3 subsets enabled) vs. 199.5 secs (disabled). This was on a 20 core Xeon workstation/40 threads/31 images (kodim+others). 

The extra cost of a 3 subset mode isn't a big deal (3 endpoint optimizations) once you've estimated which partition(s) to examine more deeply. Partition estimation is fast and simple with SIMD, and a nice property of 3 subset modes is that the # of pixels fed to the endpoint optimizer per subset is rather low (enabling 3-subset specific optimizations). If your pbit handling is correct these modes are quite valuable.

No comments:

Post a Comment