Richard Geldreich's Blog: BC7 showdown: ispc

Saturday, April 21, 2018

BC7 showdown: ispc_texcomp vs. my ispc encoder

This benchmark compares the Fast ISPC Texture Compressor's BC7 encoder vs. my new ispc-vectorized encoder. I've only just begun to profile and optimize it, but it's already looking really strong. To create this encoder, I studied all the other available BC7 encoders and leveraged everything I learned while creating crunch's BC1 high-quality encoder and Basis's new ETC1 and universal format encoders. This is a non-RDO encoding test, i.e. what matters here is how much quality each encoder can achieve per unit time.

The benchmark was conducted on a 20-core Xeon workstation across 31 test images (the first 24 are the kodim images). Both encoders use AVX instructions. The quality metric is RGB average PSNR, with perceptual mode disabled. The PSNRs below are averages across the entire set. The timings are the total CPU time spent only in calls to the encoder functions (summed across all threads). OpenMP was used for threading, and each encoder was called with 64 blocks per function call.
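For readers unfamiliar with the metric, here is a minimal sketch of one common way to compute an RGB PSNR and average it over a set of images. The post does not give the exact formula used in the benchmark, so the definition below (one MSE pooled over all R, G and B samples, peak value 255, arithmetic mean of the per-image PSNRs) is an assumption, and the names `rgb_psnr` and `average_psnr` are hypothetical.

```python
import math

def rgb_psnr(original, encoded, peak=255.0):
    """PSNR in dB, pooling the squared error over all R, G, B samples.

    Each image is a flat sequence of (r, g, b) tuples with 8-bit channels.
    This pooled-MSE definition is an assumption, not taken from the post.
    """
    if len(original) != len(encoded) or not original:
        raise ValueError("images must be non-empty and the same size")
    squared_error = 0.0
    for (r0, g0, b0), (r1, g1, b1) in zip(original, encoded):
        squared_error += (r0 - r1) ** 2 + (g0 - g1) ** 2 + (b0 - b1) ** 2
    mse = squared_error / (3 * len(original))
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)

def average_psnr(image_pairs):
    """Arithmetic mean of per-image PSNRs over (original, encoded) pairs."""
    return sum(rgb_psnr(a, b) for a, b in image_pairs) / len(image_pairs)
```

Averaging per-image PSNRs, rather than pooling the error over the whole set, gives every image equal weight regardless of its size. Which of the two the benchmark used is not stated.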

I'm currently focusing on ispc_texcomp's basic and slow profiles:

ispc_texcomp:
basic profile: 100.5 secs, 46.54 dB, .4631 dB/sec
slow profile: 355.29 secs, 46.77 dB, .1316 dB/sec

My encoder:
uber 0: 56.7 secs, 46.49 dB, .8199 dB/sec
uber 1: 86.4 secs, 46.72 dB, .5407 dB/sec
uber 2: 129.1 secs, 46.79 dB, .3624 dB/sec
uber 2 (2 refinement passes, 16 max 1,3 partitions): 161.9 secs, 46.84 dB, .2893 dB/sec
uber 3 (2 refinement passes, 32 max 1,3 partitions, pbit search): 215.2 secs, 46.91 dB, .2180 dB/sec
uber 4 (2 refinement passes, 64 max 1,3 partitions, pbit search): 292.5 secs, 46.96 dB, .1605 dB/sec

The dB/sec. values are a simple measure of encoder efficiency. ispc_texcomp's slow profile at .1316 dB/sec. is working very hard for very little quality per unit time compared to its basic profile. The efficiency of both encoders decreases as the quality is improved, but ispc_texcomp falls off very rapidly above basic and mine falls off later. I believe a whole texture encoder like etc2comp's can more efficiently get through the quality barrier here.
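The efficiency figures are just average PSNR divided by total CPU seconds. The short sketch below recomputes them from the timings and PSNRs listed above. The `marginal_db_per_sec` helper is not from the post: it is a hypothetical illustration of how little each extra second buys when moving from one setting to a slower one.

```python
# (label, CPU seconds, average RGB PSNR in dB), copied from the results above
results = [
    ("ispc_texcomp basic", 100.5, 46.54),
    ("ispc_texcomp slow", 355.29, 46.77),
    ("uber 0", 56.7, 46.49),
    ("uber 1", 86.4, 46.72),
    ("uber 2", 129.1, 46.79),
]

def db_per_sec(seconds, psnr_db):
    """Efficiency as used in the post: average PSNR per CPU second."""
    return psnr_db / seconds

def marginal_db_per_sec(slow, fast):
    """Extra dB gained per extra CPU second when moving from fast to slow.

    Both arguments are (label, seconds, psnr_db) tuples. This derived
    figure is an illustration and does not appear in the post.
    """
    return (slow[2] - fast[2]) / (slow[1] - fast[1])

for label, seconds, psnr_db in results:
    print(f"{label}: {db_per_sec(seconds, psnr_db):.4f} dB/sec")

# ispc_texcomp slow vs. basic: 0.23 dB for roughly 255 extra CPU seconds
print(f"slow over basic: {marginal_db_per_sec(results[1], results[0]):.5f} dB/sec")
```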

What this boils down to: If you use ispc_texcomp, definitely avoid the slow profile (the tiny gain in quality isn't worth it). And it's definitely possible to compete against ispc_texcomp using plain RGB metrics.

2 comments:

Unknown, April 30, 2018 at 4:35 PM
Hi Rich, your result is amazing. Is there any chance you can share your code?
Rich Geldreich, May 1, 2018 at 11:04 PM
Thanks Hai. I open sourced a strong subset of my C prototype encoder:
http://richg42.blogspot.com/2018/04/new-bc7-encoder-open-sourced.html