Monday, September 19, 2016

ETC1 and ETC1/2 Texture Compressor Benchmark

(This is a "sticky" blog post. I'll keep this page up to date as interesting or important events happen. Examples: When a new practical ETC encoder gets released, or when ETC codecs are significantly updated.)

The main purpose behind this particular benchmark is to conduct a deep survey of every known practical ETC1/2 encoder, so I can be sure basislib's ETC1 and universal encoders are very high quality. I want to closely understand where this space is at, and where it's going. This is exactly what I did while writing crunch. I need a very high quality, stable, and scalable ETC1/2 block parameter optimizer that works with potentially many thousands of input pixels. rg_etc1's internal ETC1 optimizer is the only thing I have right now that solves this problem.

I figured this data would be very useful to other developers, so here's a highest achievable quality benchmark of the following four practical ETC1/2 compressors:

  • etc2comp: A full-featured ETC1/2 encoder developed by engineers at Blue Shift and sponsored by Google. Supports both RGB and perceptual error metrics.
  • etcpak: Extremely fast, ETC1 and partial ETC2 (planar blocks only), RGB error metrics only
  • Intel ISPC Texture Compressor: A very fast ETC1 compressor, RGB error metrics only
  • basislib ETC1: An updated version of my open source ETC1 block encoder, rg_etc1. Supports both RGB and perceptual error metrics (unlike rg_etc1).

The test files were ~1,500 .PNG textures from the larger test corpus I used to tune crunch. Each texture was compressed using each encoder, then unpacked using a version of rg_etc1 modified to support the 3 new ETC2 block types (planar, T, and H).

Benchmarking like this is surprisingly tricky. The APIs to all the encoders are different, most are not well documented, and even exactly how to compute PSNR isn't well defined, because there are multiple definitions, each with slightly different equations. Please see the "developer feedback" notes below.
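To illustrate how two reasonable-sounding "RGB PSNR" definitions can disagree, here is a minimal C++ sketch. It is not the code used for this benchmark. The names (`RgbImage`, `psnr_rgb_pooled`, `psnr_rgb_mean_of_channels`) and the tightly packed 8-bit RGB layout are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Interleaved 8-bit RGB, 3 bytes per pixel, no row padding (an assumption of this sketch).
using RgbImage = std::vector<std::uint8_t>;

// Converts a mean squared error on 8-bit samples to PSNR in dB.
inline double psnr_from_mse(double mse) {
    if (mse == 0.0) return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10((255.0 * 255.0) / mse);
}

// Mean squared error of a single channel (0 = R, 1 = G, 2 = B).
inline double channel_mse(const RgbImage& a, const RgbImage& b, std::size_t channel) {
    assert(a.size() == b.size() && !a.empty() && a.size() % 3 == 0);
    double sum = 0.0;
    for (std::size_t i = channel; i < a.size(); i += 3) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += d * d;
    }
    return sum / static_cast<double>(a.size() / 3);
}

// Definition A: pool the squared error of all three channels, convert to dB once.
inline double psnr_rgb_pooled(const RgbImage& a, const RgbImage& b) {
    const double mse =
        (channel_mse(a, b, 0) + channel_mse(a, b, 1) + channel_mse(a, b, 2)) / 3.0;
    return psnr_from_mse(mse);
}

// Definition B: compute a PSNR per channel, then average the three dB values.
inline double psnr_rgb_mean_of_channels(const RgbImage& a, const RgbImage& b) {
    return (psnr_from_mse(channel_mse(a, b, 0)) +
            psnr_from_mse(channel_mse(a, b, 1)) +
            psnr_from_mse(channel_mse(a, b, 2))) / 3.0;
}
```

Because the logarithm is applied at a different point, definition B never reports a lower value than definition A for the same pair of images, and it becomes infinite as soon as any one channel is lossless. Numbers from two tools are only comparable if both use the same definition.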

I've sanity checked these results by writing .KTX files, converting them to .PNG using Mali's GPU Texture Compression Tool (which thankfully worked, because the .KTX format is iffy when it comes to interchange), then computing PSNR's using ImageMagick's "compare" tool. Thanks to John Brooks at Blue Shift for helping me verify the data for etc2comp, and helping me track down and fix the effort=100.0 issue in the first release of this benchmark.

I also have performance statistics, which I'll cover in a future post. The perf. data I have for etcpak isn't usable for accurate timing right now, because the etcpak code I'm calling is only single threaded and includes some I/O.

This first graph compares all four compressors in ETC1 mode, using RGB (average) PSNR.

Error Metric: Avg. RGB


The next graph enables ETC2 support in the encoders that support it, currently just etc2comp and etcpak:


etc2comp in ETC2 mode really shines at the lower quality levels. Below approximately 32 dB, the minimum expected quality improvement from ETC2 appears to be significant. Above ~32 dB, the minimum expected improvement drops a bit, closer to ETC1's quality level. (Which seems to make sense, as ETC2 was designed to better handle blocks that ETC1 is weak at.)

etcpak doesn't support T and H blocks, so it suffers a lot here. This is why it's very important to pay attention to benchmarks like this one, because quality (even in ETC2-capable or aware compressors) can vary widely between libraries.

Error Metric: Perceptual


Developer Feedback

  • ISPC: I had to copy ispc.exe into your project directory for it to build in my VS2015 solution. That brought down the "out of the box" experience of getting your stuff into my solution. On the upside, your API was dead simple to figure out and was very "pure" - as it should be. (However, you should rename "stride" to "stride_in_bytes". I've seen at least one programmer get it wrong and I had to help them.)
  • etcpak: Can you add a single API to do compression with multithreading, like etc2comp? And have it return a double of how much time it takes to actually execute, excluding file I/O stuff. Your codec is so fast that I/O times will seriously skew the statistics.
  • etc2comp: Hey, ETC1 is still extremely important. Intel, basislib, and rg_etc1 all have higher ETC1 quality than etc2comp. Also, could you add some defines like this to etc.h so developers know how to correctly call the public Etc::Encode() API:



  • 9/20: I fixed etc2comp's "effort" setting, added Intel's compressor, and removed the perceptual graphs (for now) to speed things up.
  • 9/20: Changed title and purpose of this post to a sticky benchmark page. I'm now moving into the public texture compression benchmarking space - why not? It's fun!


  1. Nice work Rich!

    One thing to look at is how PSNR is being calculated. While developing ETC2comp, the engineers at Blue Shift, Inc. found that different codecs often calculate PSNR very differently from each other and their reported PSNR values could not be reliably compared against each other.

    For that reason, when comparing codecs, we typically use a numeric encoding (not perceptual) for each codec, and then calculate PSNR on the decoded images using ImageMagick, which is a robust, neutral way to evaluate quality.

    Do you know if your PSNR calculations match those of ImageMagick?


    1. This post has both RGB (average) and 601 Luma PSNR. I'm using the same open source PSNR code I used to create crunch in these posts. I use RGB average, separate R,G,B, and Luma (either 601 or 709) PSNR to compare codecs. (I also display max component error and several other per-component metrics, like RMSE, etc.) While creating crunch I had to pay attention to Luma PSNR to be competitive against the other DXT encoders, which can be extremely perceptual.

      Sadly, it's 2016 and we still don't have standards for how to compute PSNR. I've been using a newer version of this code from crunch, which supports a variety of metrics:

      See error_metrics::compute(). "average_component_error" is set to true. It basically creates a histogram and computes the metrics from that. It uses these formulas to compute PSNR:

      Luma PSNR is even more problematic. Do they use 601 or 709 luma, for example? I've been using 601 for years but I also support 709 in my latest code. My posts have been using 601. 709 has a higher green weight, so it's even more important for an encoder to favor green vs. the other channels. In perceptual mode crunch favors green massively and it massively de-emphasizes blue. (It has to, to be competitive vs. NVDXT.)
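      As a rough illustration of the 601 vs. 709 difference, the sketch below computes luma PSNR with either set of weights. The weights are the standard Rec. 601 and Rec. 709 luma coefficients. Everything else (the names, applying the weights directly to 8-bit values, not rounding luma to an integer) is an assumption of this example and not necessarily what crunch or any other tool does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct LumaWeights { double r, g, b; };

// Standard luma coefficients. Rec. 709 weights green more and blue less than Rec. 601.
constexpr LumaWeights kRec601{0.299, 0.587, 0.114};
constexpr LumaWeights kRec709{0.2126, 0.7152, 0.0722};

// Weighted luma of one interleaved RGB pixel (kept as a double, not rounded).
inline double luma(const std::uint8_t* px, const LumaWeights& w) {
    return w.r * px[0] + w.g * px[1] + w.b * px[2];
}

// Luma PSNR in dB between two tightly packed 8-bit RGB images of equal size.
inline double luma_psnr(const std::vector<std::uint8_t>& a,
                        const std::vector<std::uint8_t>& b,
                        const LumaWeights& w) {
    assert(a.size() == b.size() && !a.empty() && a.size() % 3 == 0);
    const std::size_t pixels = a.size() / 3;
    double sum = 0.0;
    for (std::size_t i = 0; i < pixels; ++i) {
        const double d = luma(&a[i * 3], w) - luma(&b[i * 3], w);
        sum += d * d;
    }
    const double mse = sum / static_cast<double>(pixels);
    if (mse == 0.0) return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10((255.0 * 255.0) / mse);
}
```

      With these weights, the same green-channel error scores worse under 709 than under 601, and the same blue-channel error scores better, so the choice of luma definition changes which encoder looks best.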

      I'll try ImageMagick.

    2. Yes, I've verified that my PSNRs are computed exactly the same way as ImageMagick's. I've also exported .KTX files, loaded them into Mali's Texture Tool, saved .PNGs and computed PSNRs - they match mine exactly.

  2. I'm not sure it makes sense to show ETC2comp at Effort=1 (lowest quality) unless encode speed is also being measured. I guess it might be worth comparing all codecs at their lowest quality setting to show their quality range.


    1. Yes, I'm also measuring encoding speed, although I've been saving that for another post. Super fast encoding is valuable too, so I figured I would include effort=1 for reference purposes.

      Also, I already had the effort=1 graph (by accident!) so I figured I would keep it.

    2. Good! When you get to that later quality-vs-speed post, try ETC2comp at Effort=40 too. While developing ETC2comp, the dev team found that this was a good sweet spot. It's also the default Effort setting.


    3. I've still got the graph, but I want to keep this simple. I'll do a deeper Pareto Frontier analysis next.

  3. > etcpak: Can you add a single API to do compression with multithreading, like etc2comp? And have it return a double of how much time it takes to actually execute, excluding file I/O stuff. Your codec is so fast that I/O times will seriously skew the statistics.

    etcpak has a benchmark mode, which prints the image load time and the mean compression time measured over NUMCPU * 10 runs. In benchmark mode the compressed output is written to heap-allocated memory, so there's no I/O dependency. The normal mode of operation has one, because it writes to mmap-ed memory.
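    The general idea (time only the encode calls, write into a preallocated heap buffer, and report the mean over many runs) can be sketched as below. This is not etcpak's code or API. `EncodeFn` and `mean_encode_seconds` are hypothetical names, and the caller picks the run count (for example NUMCPU * 10).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical encoder callback: reads decoded 32-bit pixels, writes 64-bit compressed blocks.
using EncodeFn = std::function<void(const std::vector<std::uint32_t>& pixels,
                                    std::vector<std::uint64_t>& blocks)>;

// Returns the mean wall-clock time in seconds of one encode call.
// The pixels are already decoded and the output buffer is allocated before the
// clock starts, so image loading and file I/O are excluded from the measurement.
inline double mean_encode_seconds(const EncodeFn& encode,
                                  const std::vector<std::uint32_t>& pixels,
                                  std::size_t block_count,
                                  std::size_t runs) {
    assert(runs > 0);
    std::vector<std::uint64_t> blocks(block_count);  // heap-allocated output buffer

    using Clock = std::chrono::steady_clock;  // monotonic, suited to interval timing
    const auto start = Clock::now();
    for (std::size_t i = 0; i < runs; ++i) {
        encode(pixels, blocks);
    }
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return elapsed.count() / static_cast<double>(runs);
}
```

    Returning a double number of seconds matches the request quoted above. Averaging over many runs matters for a very fast codec, where a single run can be too short to time reliably.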

    Image load time is an important metric in etcpak, as it is a bottleneck compared to compression speed. I was able to optimize it slightly by removing checksum calculations in libpng and zlib, for a nice 12% speed boost.

    Making an API shouldn't be too hard, just look at Application.cpp:131-152 for an implementation of benchmark mode.

    Implementation of job scheduling is another important thing to take into consideration. My stupid simple TaskDispatch manages to slightly outperform Intel's Cilk and isn't as prone to interference from other processes as MSVC's std::async.