Saturday, October 1, 2016

ETC1 encoder performance on kodim18 at various quality/effort levels

I'm trying to get a handle on how the available ETC1 compressors perform, using their public APIs, at their various quality or effort levels. This is only for a single image (kodim18 - my usual for quick tests like this).

First, here's the performance of etc2comp on kodim18, in ETC1-only mode, multithreading enabled, RGB Avg PSNR metrics, on effort levels [0,100] in increments of 5:


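Efforts between roughly 40-65 seem to be the sweet spot. Effort=100 is obviously wasteful: in the log below it takes about 6.5X longer than effort 65 (3.62 vs. 0.56 secs) for only 0.158 dB more PSNR.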
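One rough way to put a number on "sweet spot" is to look at how much PSNR each extra second of encoding time buys. The sketch below is only an illustration, not part of the benchmark harness: the three rows are copied from the ETC1-only etc2comp log further down, and the helper name db_per_second is made up for this example.

```python
# (effort, seconds, PSNR in dB) copied from the ETC1-only etc2comp log in this post.
runs = [
    (40, 0.217227, 35.605),
    (65, 0.560707, 35.873),
    (100, 3.619217, 36.031),
]

def db_per_second(a, b):
    """Extra PSNR gained per extra second when moving from run a to run b.

    Hypothetical helper for illustration only.
    """
    return (b[2] - a[2]) / (b[1] - a[1])

# Compare each run against the next higher effort level in the list.
for a, b in zip(runs, runs[1:]):
    print(f"effort {a[0]:>3} -> {b[0]:>3}: {db_per_second(a, b):.3f} dB/sec")
```

On these numbers, going from effort 40 to 65 buys roughly 0.78 dB per extra second, while going from 65 to 100 buys only about 0.05 dB per extra second.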

Here's another graph (can you tell I'm practicing my Excel graphing skills?), this time comparing the time and quality of various ETC1 (and now ETC2 - for etc2comp) compressors at different encoder quality/effort settings:



Important Notes:
  • To be fair to Intel's ETC1 encoder (function CompressBlocksETC1()), which is not multithreaded, I added another bar labeled "ispc_etc1 MT" which has the total CPU time divided by 20 (to roughly match the speedup I'm seeing using 40 threads in the other natively multithreaded encoders). 
  • basislib is now using a variant of cluster fit (see previous posts). basislib_1 is lowest quality, basislib_3 is highest. Notice that basislib_3 is only ~2X slower than Intel's SIMD code, but basislib doesn't use any SIMD at all.
  • basislib and etc2comp both use 40 threads
  • etcpak's timings are currently single-threaded, because I'm still using a single-threaded entrypoint inside the code (BlockData::Process()). It's on my TODO list to fix this. IMHO, unless you need a real-time ETC1 encoder, it trades off too much quality. However, if you need a real-time encoder and don't mind the loss in quality, it's your best bet. If the author added a few optional SIMD-optimized cluster fit trials in there it would probably kick ass.
To get an idea of how efficiently etc2comp currently scans the ETC1 search space for kodim18, let's see what effort level (and how much CPU time) etc2comp needs to approximately match two other encoders' quality levels:
  • basislib Cluster Fit (64 trials out of 165): 0.115 secs, 35.917 dB
    Quality first matched or exceeded at etc2comp effort 70: 1.095 secs, 35.953 dB
  • Intel ISPC: 1.03 secs, 35.969 dB
    Quality first matched or exceeded at etc2comp effort 80: 1.83 secs, 35.992 dB
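The "first effort level that matches or exceeds a target PSNR" lookup can be reproduced from the log with a few lines of Python. This is a minimal sketch, assuming the ETC1-only etc2comp results are typed in by hand as (effort, seconds, PSNR) tuples; the names ETC1_RUNS and first_effort_at_or_above are invented for this example and are not part of any of the encoders.

```python
# (effort, seconds, PSNR in dB) copied from the ETC1-only etc2comp log in this post.
ETC1_RUNS = [
    (0, 0.051663, 34.883), (5, 0.083509, 35.196), (10, 0.106361, 35.331),
    (15, 0.133278, 35.421), (20, 0.193460, 35.479), (25, 0.162790, 35.517),
    (30, 0.182370, 35.552), (35, 0.196609, 35.583), (40, 0.217227, 35.605),
    (45, 0.248881, 35.634), (50, 0.361306, 35.782), (55, 0.379762, 35.803),
    (60, 0.522357, 35.867), (65, 0.560707, 35.873), (70, 1.095014, 35.953),
    (75, 1.166479, 35.959), (80, 1.829960, 35.992), (85, 1.904691, 36.001),
    (90, 2.709250, 36.021), (95, 2.802099, 36.022), (100, 3.619217, 36.031),
]

def first_effort_at_or_above(target_db, runs=ETC1_RUNS):
    """Return the lowest-effort (effort, seconds, psnr) row whose PSNR
    is >= target_db, or None if no effort level reaches the target."""
    for effort, secs, psnr in runs:  # rows are listed in increasing effort order
        if psnr >= target_db:
            return effort, secs, psnr
    return None

# Targets: basislib cluster fit (35.917 dB) and Intel ISPC (35.969 dB).
for name, target in [("basislib", 35.917), ("ispc_etc1", 35.969)]:
    print(name, first_effort_at_or_above(target))
```

With the targets above this picks effort 70 for the basislib result and effort 80 for the Intel ISPC result, matching the figures quoted in the list.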

perceptual: 0 etc2: 0 rec709: 1
Source filename: kodak\kodim18.png 512x768
--- basislib Quality: 4
basislib pack time: 0.115
basislib ETC image Error: Max:  56, Mean: 2.865, MSE: 16.648, RMSE: 4.080, PSNR: 35.917, SSIM: 0.965767

--- etc2comp effort: 0
etc2comp time: 0.051663
etc2comp Error: Max:  64, Mean: 3.186, MSE: 21.123, RMSE: 4.596, PSNR: 34.883, SSIM: 0.958817
--- etc2comp effort: 5
etc2comp time: 0.083509
etc2comp Error: Max:  64, Mean: 3.129, MSE: 19.658, RMSE: 4.434, PSNR: 35.196, SSIM: 0.959313
--- etc2comp effort: 10
etc2comp time: 0.106361
etc2comp Error: Max:  64, Mean: 3.092, MSE: 19.052, RMSE: 4.365, PSNR: 35.331, SSIM: 0.959794
--- etc2comp effort: 15
etc2comp time: 0.133278
etc2comp Error: Max:  64, Mean: 3.063, MSE: 18.661, RMSE: 4.320, PSNR: 35.421, SSIM: 0.960250
--- etc2comp effort: 20
etc2comp time: 0.193460
etc2comp Error: Max:  64, Mean: 3.042, MSE: 18.416, RMSE: 4.291, PSNR: 35.479, SSIM: 0.960595
--- etc2comp effort: 25
etc2comp time: 0.162790
etc2comp Error: Max:  64, Mean: 3.027, MSE: 18.256, RMSE: 4.273, PSNR: 35.517, SSIM: 0.960869
--- etc2comp effort: 30
etc2comp time: 0.182370
etc2comp Error: Max:  64, Mean: 3.012, MSE: 18.108, RMSE: 4.255, PSNR: 35.552, SSIM: 0.961207
--- etc2comp effort: 35
etc2comp time: 0.196609
etc2comp Error: Max:  64, Mean: 2.998, MSE: 17.980, RMSE: 4.240, PSNR: 35.583, SSIM: 0.961578
--- etc2comp effort: 40
etc2comp time: 0.217227
etc2comp Error: Max:  64, Mean: 2.987, MSE: 17.888, RMSE: 4.229, PSNR: 35.605, SSIM: 0.961854
--- etc2comp effort: 45
etc2comp time: 0.248881
etc2comp Error: Max:  64, Mean: 2.970, MSE: 17.771, RMSE: 4.216, PSNR: 35.634, SSIM: 0.962461
--- etc2comp effort: 50
etc2comp time: 0.361306
etc2comp Error: Max:  59, Mean: 2.916, MSE: 17.175, RMSE: 4.144, PSNR: 35.782, SSIM: 0.963669
--- etc2comp effort: 55
etc2comp time: 0.379762
etc2comp Error: Max:  59, Mean: 2.902, MSE: 17.091, RMSE: 4.134, PSNR: 35.803, SSIM: 0.964149
--- etc2comp effort: 60
etc2comp time: 0.522357
etc2comp Error: Max:  59, Mean: 2.882, MSE: 16.840, RMSE: 4.104, PSNR: 35.867, SSIM: 0.964800
--- etc2comp effort: 65
etc2comp time: 0.560707
etc2comp Error: Max:  59, Mean: 2.878, MSE: 16.818, RMSE: 4.101, PSNR: 35.873, SSIM: 0.964974
--- etc2comp effort: 70
etc2comp time: 1.095014
etc2comp Error: Max:  59, Mean: 2.857, MSE: 16.512, RMSE: 4.063, PSNR: 35.953, SSIM: 0.965366
--- etc2comp effort: 75
etc2comp time: 1.166479
etc2comp Error: Max:  59, Mean: 2.852, MSE: 16.490, RMSE: 4.061, PSNR: 35.959, SSIM: 0.965534
--- etc2comp effort: 80
etc2comp time: 1.829960
etc2comp Error: Max:  59, Mean: 2.842, MSE: 16.362, RMSE: 4.045, PSNR: 35.992, SSIM: 0.965769
--- etc2comp effort: 85
etc2comp time: 1.904691
etc2comp Error: Max:  59, Mean: 2.836, MSE: 16.329, RMSE: 4.041, PSNR: 36.001, SSIM: 0.966037
--- etc2comp effort: 90
etc2comp time: 2.709250
etc2comp Error: Max:  59, Mean: 2.829, MSE: 16.255, RMSE: 4.032, PSNR: 36.021, SSIM: 0.966277
--- etc2comp effort: 95
etc2comp time: 2.802099
etc2comp Error: Max:  59, Mean: 2.827, MSE: 16.251, RMSE: 4.031, PSNR: 36.022, SSIM: 0.966315
--- etc2comp effort: 100
etc2comp time: 3.619217
etc2comp Error: Max:  59, Mean: 2.825, MSE: 16.216, RMSE: 4.027, PSNR: 36.031, SSIM: 0.966349

--- etcpak time: 0.006
etcpak Error: Max:  90, Mean: 3.464, MSE: 25.640, RMSE: 5.064, PSNR: 34.042, SSIM: 0.950396

--- ispc_etc time: 1.033881
ispc_etc1 Error: Max:  56, Mean: 2.866, MSE: 16.450, RMSE: 4.056, PSNR: 35.969, SSIM: 0.965412
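For readers cross-checking the log: the RMSE and PSNR columns appear consistent with the usual definitions for 8-bit data, RMSE = sqrt(MSE) and PSNR = 10*log10(255^2 / MSE). The sketch below is my reading of the printed numbers, not the actual code of the test harness, and the function name rmse_and_psnr is made up for this example.

```python
import math

def rmse_and_psnr(mse, peak=255.0):
    """Derive RMSE and PSNR (in dB) from a mean squared error.

    Assumes 8-bit samples (peak value 255) and the standard PSNR definition.
    """
    rmse = math.sqrt(mse)
    psnr = 10.0 * math.log10(peak * peak / mse)
    return rmse, psnr

# MSE values copied from the log above: basislib and etcpak.
for name, mse in [("basislib", 16.648), ("etcpak", 25.640)]:
    rmse, psnr = rmse_and_psnr(mse)
    print(f"{name}: RMSE {rmse:.3f}, PSNR {psnr:.3f} dB")
```

Because the logged MSE values are already rounded to three decimals, the recomputed figures can differ from the log in the last digit.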


ETC2 enabled:

perceptual: 0 etc2: 1 rec709: 1
Source filename: kodak\kodim18.png 512x768
--- basislib Quality: 1
basislib pack time: 0.036
basislib ETC image Error: Max:  73, Mean: 3.168, MSE: 20.910, RMSE: 4.573, PSNR: 34.927, SSIM: 0.959441
--- basislib Quality: 2
basislib pack time: 0.054
basislib ETC image Error: Max:  56, Mean: 2.945, MSE: 17.772, RMSE: 4.216, PSNR: 35.634, SSIM: 0.964359
--- basislib Quality: 3
basislib pack time: 0.107
basislib ETC image Error: Max:  56, Mean: 2.865, MSE: 16.648, RMSE: 4.080, PSNR: 35.917, SSIM: 0.965767
--- etc2comp effort: 0
etc2comp time: 0.086306
etc2comp Error: Max:  64, Mean: 3.158, MSE: 20.810, RMSE: 4.562, PSNR: 34.948, SSIM: 0.959069
--- etc2comp effort: 5
etc2comp time: 0.179816
etc2comp Error: Max:  59, Mean: 3.087, MSE: 19.044, RMSE: 4.364, PSNR: 35.333, SSIM: 0.959701
--- etc2comp effort: 10
etc2comp time: 0.247190
etc2comp Error: Max:  59, Mean: 3.046, MSE: 18.396, RMSE: 4.289, PSNR: 35.484, SSIM: 0.960256
--- etc2comp effort: 15
etc2comp time: 0.287620
etc2comp Error: Max:  59, Mean: 3.013, MSE: 17.977, RMSE: 4.240, PSNR: 35.584, SSIM: 0.960751
--- etc2comp effort: 20
etc2comp time: 0.339065
etc2comp Error: Max:  59, Mean: 2.990, MSE: 17.709, RMSE: 4.208, PSNR: 35.649, SSIM: 0.961162
--- etc2comp effort: 25
etc2comp time: 0.383907
etc2comp Error: Max:  59, Mean: 2.971, MSE: 17.515, RMSE: 4.185, PSNR: 35.697, SSIM: 0.961507
--- etc2comp effort: 30
etc2comp time: 0.432019
etc2comp Error: Max:  59, Mean: 2.954, MSE: 17.352, RMSE: 4.166, PSNR: 35.737, SSIM: 0.961876
--- etc2comp effort: 35
etc2comp time: 0.480186
etc2comp Error: Max:  59, Mean: 2.938, MSE: 17.210, RMSE: 4.149, PSNR: 35.773, SSIM: 0.962279
--- etc2comp effort: 40
etc2comp time: 0.516155
etc2comp Error: Max:  59, Mean: 2.925, MSE: 17.107, RMSE: 4.136, PSNR: 35.799, SSIM: 0.962590
--- etc2comp effort: 45
etc2comp time: 0.565827
etc2comp Error: Max:  59, Mean: 2.911, MSE: 17.009, RMSE: 4.124, PSNR: 35.824, SSIM: 0.963044
--- etc2comp effort: 50
etc2comp time: 1.124057
etc2comp Error: Max:  59, Mean: 2.892, MSE: 16.852, RMSE: 4.105, PSNR: 35.864, SSIM: 0.963703
--- etc2comp effort: 55
etc2comp time: 1.192462
etc2comp Error: Max:  59, Mean: 2.880, MSE: 16.772, RMSE: 4.095, PSNR: 35.885, SSIM: 0.964164
--- etc2comp effort: 60
etc2comp time: 1.713074
etc2comp Error: Max:  59, Mean: 2.851, MSE: 16.424, RMSE: 4.053, PSNR: 35.976, SSIM: 0.964913
--- etc2comp effort: 65
etc2comp time: 1.828673
etc2comp Error: Max:  59, Mean: 2.846, MSE: 16.398, RMSE: 4.049, PSNR: 35.983, SSIM: 0.965099
--- etc2comp effort: 70
etc2comp time: 2.461853
etc2comp Error: Max:  59, Mean: 2.836, MSE: 16.274, RMSE: 4.034, PSNR: 36.016, SSIM: 0.965358
--- etc2comp effort: 75
etc2comp time: 2.608303
etc2comp Error: Max:  59, Mean: 2.831, MSE: 16.247, RMSE: 4.031, PSNR: 36.023, SSIM: 0.965534
--- etc2comp effort: 80
etc2comp time: 3.383624
etc2comp Error: Max:  59, Mean: 2.820, MSE: 16.156, RMSE: 4.019, PSNR: 36.047, SSIM: 0.965855
--- etc2comp effort: 85
etc2comp time: 3.719689
etc2comp Error: Max:  59, Mean: 2.814, MSE: 16.125, RMSE: 4.016, PSNR: 36.056, SSIM: 0.966079
--- etc2comp effort: 90
etc2comp time: 4.675509
etc2comp Error: Max:  59, Mean: 2.808, MSE: 16.072, RMSE: 4.009, PSNR: 36.070, SSIM: 0.966264
--- etc2comp effort: 95
etc2comp time: 4.619700
etc2comp Error: Max:  59, Mean: 2.806, MSE: 16.068, RMSE: 4.008, PSNR: 36.071, SSIM: 0.966293
--- etc2comp effort: 100
etc2comp time: 4.771136
etc2comp Error: Max:  59, Mean: 2.805, MSE: 16.064, RMSE: 4.008, PSNR: 36.072, SSIM: 0.966309
--- etcpak time: 0.045
etcpak Error: Max:  90, Mean: 3.458, MSE: 25.593, RMSE: 5.059, PSNR: 34.050, SSIM: 0.950551
ispc_etc time: 1.034064
ispc_etc1 Error: Max:  56, Mean: 2.866, MSE: 16.450, RMSE: 4.056, PSNR: 35.969, SSIM: 0.965412

9 comments:

  1. Awesome work as usual Rich!

    A few thoughts:

    1) Estimating a 40x speedup for a multithreaded Intel encoder is probably too generous. I'd expect closer to a 20x speedup on a 20-core CPU using 40 threads.

    2) While ETC2Comp is viable for ETC1-only encoding, it's really designed for the more complex ETC2 formats so it can handle Alpha textures, normal maps, and HDR (11-bit-per-component) textures.

    I'm not sure why users would choose to encode with ETC1 only as ETC2 adds quality and features.

    3) The Effort feature of ETC2Comp allows users to control the full range between 'high-speed' and 'high-quality' encoding, but as your benchmark graphs and research have shown, if it was possible to also add cluster-fit techniques to ETC2Comp, then both speed and quality would likely see a nice boost!

    -JB

    Replies
    1. Hey JB - Yea, the 40x speedup thing was a last minute estimate at 2:30am, because I wanted to be fair. I'll change it to 20x.

      About ETC1: It's supported by a huge range of deployed devices, so it's still very important to many of the developers I speak with.

    2. Also, many (most?) blocks in ETC2 textures use ETC1 mode, right? So speeding up ETC1 encoding will also help developers using etc2comp in ETC2 mode as well.

    3. > Also, many (most?) blocks in ETC2 textures use ETC1 mode, right?

      It depends. If the images being encoded are limited to 8-bit RGB, then yes, both ETC1 and ETC2 block types will be used and improvements to ETC1 encoding quality and speed will help.

      However, if the images being encoded include RGBA alpha textures, normal maps, high dynamic range, or 1-or-2-channel data tables, then nearly every encoded block will be of a newer ETC2 type.

      For these image types, a classic ETC1-block-format encoder does not really help.

      -JB

    4. So what you're saying is: it boils down to the distribution of used texture types. If a product is targeting ETC1-only capable devices (which is a huge number of currently deployed devices - several hundreds of millions), all of these map types (normal maps, alpha textures, tables, etc.) must be encoded into ETC1 blocks (or perhaps some uncompressed format). For alpha textures, developers do things like use atlases or use multiple ETC1 textures. Importantly, there are many products that require compressed textures where "advanced" things like normal maps, HDR, 1/2 channel tables are not used or valuable.

      If a dev is targeting ETC2, they are free to pick whatever ETC1/ETC2/EAC format can get the job done with an acceptable amount of quality. Now whether or not ETC1 matters so much depends on the distribution of these asset types, and on the decisions the graphics programmers make when deciding which texture format (ETC1, ETC2, or some EAC encoding) they use for each texture type.

      My main point is, graphics programmers will be biased in their decision here. If high quality ETC2/EAC texture encoding is much slower than high-quality ETC1 encoding, there will be a tendency to stay away from ETC2/EAC (because it takes much longer to encode) and instead choose formats which are extremely fast to encode at acceptable quality. It's all about effective product value delivered vs. developer cost+pain.

    5. Also, I can just enable ETC2 mode and create a second graph. Now we can see how much CPU time it takes to encode multithreaded Intel ETC1 (or basislib_3) vs. etc2comp ETC2 to approximately the same PSNR (or SSIM, etc.).

      ETC2 is supposed to be approx. 1 dB better than ETC1 (according to one Ericsson presentation), so presumably etc2comp will have to expend "less" effort to compete against ETC1 at the same quality.

      I think this would make for an interesting graph!

    6. I've added some ETC2 effort levels into the graph.

    7. The additional ETC2 graph entries are great! I think they show that ETC2Comp has room to improve on both ETC1 and ETC2 RGB block types. However the idea I was trying to get at before was that the majority of ETC2 block types supported by ETC2Comp will not be tested using a corpus of RGB-only textures.

      The key thing I realized while making ETC2Comp is that the primary value of ETC2 is not that it can add 1dB on RGB textures, but that it can 'level up' the ETC format and allow encoding the full range of texture types used by developers.

      ETC1 is great for LDR RGB textures with no alpha, but that's about it. ETC2 moves into the league of BC1-7 and ASTC.

      The analogy I would use is that ETC2 is similar to BC1-7, and evaluating an ETC2 encoder based purely on its ETC1 quality is similar to evaluating a BC1-7 encoder based only on its DXT1 (non-alpha) quality.

      It's much easier to optimize for a single-format, and while opaque RGB is the most common texture type in many use-cases, the value of these encoders is in their ability to handle the full spectrum of image and numerical data encoded in textures by modern games & applications.

      Opaque-only RGB is a great place to start benchmarking and you are doing awesome work by graphing and analyzing all these encoders. But by limiting the corpus and tests to RGB non-alpha, it's really exercising only a fraction of ETC2Comp's abilities and design.

      If you (or anyone) were bored and wanted to compare ETC2 vs BC1-7 vs ASTC across RGB, RGBA, normal maps, HDR, and numeric vector data, then we'd have a comprehensive understanding of quality & speed.

      But I doubt you are that bored. :)

      -JB

    8. Yup, totally understand. Seriously, etc2comp sets a new baseline for a high quality GPU texture compression library, which is why I'm benchmarking it so much. It supports ETC1/ETC2/EAC, which accounts for all the texture types needed by modern developers. In that way, it's awesome.
