I'm taking a quick break from RDO BC7. I've been working on it for too long and I need to mix things up.
I've been experimenting with high-quality PVRTC encoding for several years, off and on. I've finally found an algorithm that is simple, fast, and in most cases beats PVRTexTool's approach. (PVRTexTool is the "standard" high-quality production encoder for PVRTC. To my knowledge it's the best available.) In the cases I can find where PVRTexTool does better, the quality delta is small (< 0.5 dB).
I know PVRTC is doomed long term (ASTC is far better), but it's still pervasive on iOS devices.
Useful references:
http://roartindon.blogspot.com/2014/08/pvr-texture-compression-exploration.html
http://jcgt.org/published/0003/04/07/paper-lowres.pdf
It's a three-phase algorithm:
1. Compute endpoints using van Waveren's approximation: For each 4x4 block compute the RGB(A) bounds of that block. Set the low endpoint to the floor() of the bounds (with correct rounding to 554), and set the high endpoint to the ceil() of the bounds (again with correct rounding to 555).
An alternative is to use Intensity Dilation (see the link to the paper), which may lead to better results. But this is far simpler and it's what Lim successfully uses in his real-time encoder.
One trick you can use for slightly higher quality is to try a pass with the low/high endpoints inverted (use the high bounds for the first endpoint with ceil(), and the low bounds for the second endpoint with floor()). Choose the ordering that minimizes the overall error. Once the endpoint order is set in place in PVRTC it can be difficult for this algorithm to change it (because all blocks influence all other blocks directly/indirectly).
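A minimal sketch of step 1, for illustration only. It assumes 8-bit RGB input, opaque blocks (low endpoint stored as 554, high endpoint as 555), and that quantized channels expand back to 8 bits by bit replication. The function names are hypothetical and not taken from any existing encoder.

```python
# Sketch of step 1: per-block RGB bounds, then conservative quantization.
# Assumption: 4- and 5-bit channels expand to 8 bits by bit replication.

LOW_BITS = (5, 5, 4)   # low endpoint: RGB 554
HIGH_BITS = (5, 5, 5)  # high endpoint: RGB 555


def expand_to_8(q, bits):
    """Expand a 4- or 5-bit channel value to 8 bits (bit replication)."""
    if bits == 5:
        return (q << 3) | (q >> 2)
    if bits == 4:
        return (q << 4) | q
    raise ValueError("only 4- and 5-bit channels are handled here")


def quantize_floor(v8, bits):
    """Largest quantized value whose 8-bit expansion is <= v8."""
    q = v8 >> (8 - bits)
    # Plain truncation can overshoot after expansion (e.g. 248 -> 31 -> 255).
    while q > 0 and expand_to_8(q, bits) > v8:
        q -= 1
    return q


def quantize_ceil(v8, bits):
    """Smallest quantized value whose 8-bit expansion is >= v8."""
    q = v8 >> (8 - bits)
    max_q = (1 << bits) - 1
    while q < max_q and expand_to_8(q, bits) < v8:
        q += 1
    return q


def initial_endpoints(block_pixels):
    """block_pixels: the 16 (r, g, b) tuples of one 4x4 block."""
    lo = [min(p[c] for p in block_pixels) for c in range(3)]
    hi = [max(p[c] for p in block_pixels) for c in range(3)]
    low = tuple(quantize_floor(lo[c], LOW_BITS[c]) for c in range(3))
    high = tuple(quantize_ceil(hi[c], HIGH_BITS[c]) for c in range(3))
    return low, high
```

The inverted-order pass would apply quantize_ceil to the upper bounds for the first endpoint and quantize_floor to the lower bounds for the second, with the bit depths swapped to match.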
2. Now go and select the optimal modulation values for each pixel using these endpoints (factoring in the PVRTC endpoint interpolation, of course).
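Here is a rough sketch of the per-pixel modulation search. It assumes the two endpoint colors have already been bilinearly interpolated to this pixel's position (a and b below), that the block uses the standard 4bpp mode with weights 0, 3/8, 5/8 and 1 (no punch-through), and that working in plain 8-bit integers is close enough. A real decoder interpolates at a different internal precision, so treat this as an approximation. All names are hypothetical.

```python
# Sketch of step 2: pick the modulation index that minimizes squared error.
# Assumption: standard 4bpp mode, weights expressed in eighths.
MOD_WEIGHTS = (0, 3, 5, 8)


def modulate(a, b, w):
    """Blend interpolated endpoints a and b with weight w/8."""
    return tuple((ca * (8 - w) + cb * w) // 8 for ca, cb in zip(a, b))


def best_modulation(pixel, a, b):
    """Return (index, squared_error) of the best modulation value."""
    best_index, best_err = 0, None
    for index, w in enumerate(MOD_WEIGHTS):
        candidate = modulate(a, b, w)
        err = sum((x - y) ** 2 for x, y in zip(candidate, pixel))
        if best_err is None or err < best_err:
            best_index, best_err = index, err
    return best_index, best_err
```

An exhaustive search over four candidates is cheap. Faster encoders could approximate it by projecting the pixel onto the a-to-b axis, at some cost in quality.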
The results at this point are usually a little better than PVRTexTool in "Lower" quality, at least visually. The results so far should be equivalent or slightly better than Lim's encoder (depending on how much you approximate the modulation value search).
Interestingly, the results up to this point are acceptable for some use cases already. The output is too banded and high contrast areas will be smeared out, but the distortion introduced up to this point is predictable and stable.
3. For each block in raster order: Now use 1x1 block least squares optimization (using normal equations) separately on each component to solve for the best low/high endpoints to use for each block. A single block impacts 7x7 pixels (or 3x3 blocks) in PVRTC 4bpp mode.
The surrounding endpoints, modulation values, and output pixels are constants, and the only unknowns are the endpoints, so this is fairly straightforward. This is just like how it's done in BC1 and BC7 encoders, except we're dealing with larger matrices (7x7 instead of 4x4) and we need to carefully factor in the endpoint interpolation.
For solving, the equation is Ax=b, where A is a 49x2 matrix (7x7 pixels=49), x is a 2x1 matrix (the low and high endpoint values we're solving for), and b is a 49x1 matrix containing the desired output values (which are the desired RGB output pixel values minus the interpolated and weighted contribution from the surrounding constant endpoints). The A matrix contains the per-pixel modulation weights multiplied by the amount the endpoint influences the result (factoring in endpoint interpolation).
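A sketch of the per-component solve using the normal equations (A^T A x = A^T b), which reduce to a 2x2 system with a closed-form solution. The construction of each row of A is an interpretation of the description above: k is how strongly the center block's endpoints influence the pixel after bilinear interpolation, and m is the pixel's modulation fraction (0, 3/8, 5/8 or 1). The function names are hypothetical.

```python
# Sketch of the 1x1 least squares solve for one color component.

def row_for_pixel(k, m):
    """Coefficients of (low, high) for one pixel.

    k: interpolation weight of the center block at this pixel (assumed 0..1)
    m: modulation fraction chosen for this pixel
    """
    return (k * (1.0 - m), k * m)


def solve_endpoints(rows, targets):
    """Solve min ||A x - b||^2 for x = (low, high) via the normal equations.

    rows: list of (wl, wh) pairs, one per affected pixel (49 for 4bpp)
    targets: desired value minus the neighbors' contribution, per pixel
    Returns None if the system is (near) singular, e.g. when every pixel
    uses the same modulation value.
    """
    s_ll = s_lh = s_hh = t_l = t_h = 0.0
    for (wl, wh), b in zip(rows, targets):
        s_ll += wl * wl
        s_lh += wl * wh
        s_hh += wh * wh
        t_l += wl * b
        t_h += wh * b
    det = s_ll * s_hh - s_lh * s_lh
    if abs(det) < 1e-9:
        return None
    low = (s_hh * t_l - s_lh * t_h) / det
    high = (s_ll * t_h - s_lh * t_l) / det
    return low, high
```

The returned values are unconstrained floats, so they still need clamping and rounding to the endpoint precision before the error is re-evaluated.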
After you've done 1x1 least squares on each component, the results are rounded to 554/555. Then you find the optimal modulation values for the affected 7x7 block of pixels, and only accept the results if the overall error has been reduced.
You can "twiddle" the modulation values in various ways before doing the least squares calculations, just like BC1/BC7 encoders do. I've tried incrementing the lowest modulation value and/or decrementing the higher modulation value, and seeing if the results are any better. This works well.
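One possible reading of the twiddling step, sketched below: take the modulation indices of the affected pixels, and generate variants where every pixel at the lowest index used is bumped up by one, every pixel at the highest index used is bumped down by one, or both. Each variant would then be fed to the least squares solve and kept only if the final error improves. This interpretation and the function name are assumptions.

```python
# Sketch of modulation "twiddling" before the least squares solve.

def twiddle_variants(mods, max_index=3):
    """Return candidate modulation index lists, original first."""
    lowest, highest = min(mods), max(mods)

    def bump_lowest(values):
        return [min(v + 1, max_index) if v == lowest else v for v in values]

    def drop_highest(values):
        return [max(v - 1, 0) if v == highest else v for v in values]

    variants = [list(mods)]
    if lowest < highest:
        variants.append(bump_lowest(mods))
        variants.append(drop_highest(mods))
        # Applying both only makes sense if the two groups stay distinct.
        if highest - lowest > 1:
            variants.append(drop_highest(bump_lowest(mods)))
    return variants
```

For example, twiddle_variants([0, 1, 3, 3]) yields the original list plus three perturbed candidates.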
Step 3 can be repeated multiple times to improve quality further. 3-5 refinement iterations seem to be enough. You can vary the block processing order for slightly higher quality.
There are definitely many other improvements, but this is the basic idea. Each step is simple, and all steps are vectorizable and threadable.
PVRTexTool uses 2x2 SVD, as far as I know, but this seems unnecessary, and seems to lead to noticeable stipple-like artifacts being introduced in many cases. (Check out the car door below.) Also, PVRTexTool's handling of gradients seems questionable (perhaps endpoint rounding issues?).
Quick example encodings:
Original:
PVRTexTool 4.19.0, "Very High Quality":
RGB Average Error: PSNR: 36.245, SSIM: 0.979442
Luma Error: PSNR: 36.841, SSIM: 0.984710
New encoder (using perceptual colorspace metrics, so it's trying to optimize for lower luma error):
RGB Average Error: PSNR: 36.728, SSIM: 0.976032
Luma Error: PSNR: 37.827, SSIM: 0.990251
Original:
PVRTexTool Very High:
RGB Average Error: PSNR: 41.809, SSIM: 0.993144
Luma Error: PSNR: 41.943, SSIM: 0.993875
New encoder (perceptual mode):
RGB Average Error: PSNR: 41.730, SSIM: 0.991800
Luma Error: PSNR: 43.419, SSIM: 0.997416
Original:
PVRTexTool high quality:
RGB Average Error: PSNR: 27.640, SSIM: 0.954125
Luma Error: PSNR: 30.292, SSIM: 0.964433
New encoder (RGB metrics):
RGB Average Error: PSNR: 29.523, SSIM: 0.957067
Luma Error: PSNR: 32.702, SSIM: 0.974145