While working on an encoder I conduct a lot of experiments (probably thousands over time) to improve it. To check for regressions or silly bugs, you must use some sort of error metrics otherwise it'll be impossible to make forward progress just by examining the output with your eye.
Here's what I commonly use:
Total time: 0.124466, encode-only CPU time: 3.288052
Compressed size: 360041, 7.325053 bpp
BC7 Mode histogram:
6 22 2 0 3809 2754 17936 0
RGB Total Error: Max: 53, Mean: 5.220, MSE: 21.670, RMSE: 4.655, PSNR: 34.772, CRCA: 0x76C0BC5D CRCB: 0x9A5C47D1
RGB Average Error: Max: 53, Mean: 1.740, MSE: 7.223, RMSE: 2.688, PSNR: 39.543, SSIM: 0.985854, CRCA: 0x76C0BC5D CRCB: 0x9A5C47D1
Luma Error: Max: 35, Mean: 1.123, MSE: 3.048, RMSE: 1.746, PSNR: 43.290, SSIM: 0.993128, CRCA: 0x351EFAEF CRCB: 0xA29288A7
Red Error: Max: 52, Mean: 1.766, MSE: 7.544, RMSE: 2.747, PSNR: 39.355, SSIM: 0.987541, CRCA: 0xC673D5EA CRCB: 0x8623AA59
Green Error: Max: 40, Mean: 1.553, MSE: 5.462, RMSE: 2.337, PSNR: 40.758, SSIM: 0.989642, CRCA: 0x2BF4294F CRCB: 0x418E5EE1
Blue Error: Max: 53, Mean: 1.900, MSE: 8.664, RMSE: 2.944, PSNR: 38.753, SSIM: 0.980377, CRCA: 0x76C0BC5D CRCB: 0x9A5C47D1
Alpha Error: Max: 31, Mean: 1.039, MSE: 3.308, RMSE: 1.819, PSNR: 42.935, SSIM: 0.976243, CRCA: 0x524049DA CRCB: 0x7492A642
I report some timings (two timings because it may be a threaded encode), the compressed size of the data (using LZMA) to get a feel for the encoder's output entropy, the mode histogram on those modes that support this concept, and then a bunch of metrics.
I report a bunch of variations of PSNR (RGB Total and RGB Average) because there really are no rock solid standards for this and every author or tool seems to do this in a different way. I sometimes also use RGBA PSNR to compare my results against other tools.
Sometimes, an encoder bug will subtly break just a few blocks. That's where the Max errors are really handy.
Luma error is what I pay attention to during perceptual encoding tests, and RGB Average for non-perceptual tests. SSIM and PSNR are highly correlated in my experience for block compression work, which I'll show in a future blog post.
I also use PSNR per second or SSIM per second metrics (or some variation).
The CRC's are there to detect when the output (or input) data is exactly the same across runs.
I have a tool which can compare two encoders across a large corpus of test textures, which is really handy for finding regressions.
I also test with large textures containing random blocks gathered from thousands of test textures. For example (blogger.com has resampled it, here's the original 4K PNG):
These corpus textures can be generated using crunch in the corpus_gen mode.
I've found that testing like this is a reliable way of gauging the overall strength of an encoder vs. another. Graphics devs will instantly say "but wait why aren't you looking at minimizing block artifacts by looking at adjacent blocks!" That turns block compression into a global optimization problem (like PVRTC), and it's already hard enough as it is to do in a practical amount of CPU time. Also with BC7, the error is already pretty low (45-60 dB), and BC1-style block artifacts in a properly written BC7 encoder at high quality settings are uncommon.