"A comprehensive assessment of the structural similarity index"
"In this paper, it is shown, both empirically and analytically, that the index is directly related to the conventional, and often unreliable, mean squared error. In the first evaluation, the two metrics are statistically compared with one another. Then, in the second, a pair of functions that algebraically connects the two is derived. These results suggest a much closer relationship between the structural similarity index and mean squared error."
"This research, however, appears to be the first to directly consider the statistical relationships between the two methods. As well, this work develops a pair of mathematical functions that directly link the two. Given these findings, one is left to question whether the structural similarity index is ready for widespread adoption."Interesting! I get the feeling there's more to SSIM than meets the eye. Unfortunately, this paper is behind a paywall. Another quote from the paper:
"These findings suggest a reasonably significant level of correlation between the SSIM and MSE. Values range from r = 0.6364 to r = 1.0000, with an average of r = 0.9116 and a variance of 0.007. An average this large, along with a small variance, suggests that most of the correlations are decidedly significant. Clearly, when ordering coded images, the SSIM and MSE often choose similar arrangements. Results such as this are likely a sign of a deeper relationship between the two methods."Hmm, okay. So MSE and SSIM are highly correlated. The paper even has simple algorithms to convert between MSE<->SSIM. Perhaps I could use these algorithms to help optimize my SSIM code. (Just joking.) From the conclusion:
"Collectively, these findings suggest that the performance of the SSIM is perhaps much closer to that of the MSE than some might claim. Consequently, one is left to question the legitimacy of many of the applications of the SSIM."Got it. Here's another interesting paper, this one not behind a paywall:
"Mystery behind similarity measures MSE and SSIM"
"We see that it is based on the same sample moments and correlation coefficient as MSE. So this is the first observation/property or mystery revealed about MSE and SSIM: both measures are composed of the same parameters which are only combined in a different way."
"So the third observation for SSIM is its instability around zero point (0,0) and the fourth one – it can be used only for data of the same sign. The authors of SSIM solve these problems by introducing small constants and restricting the usage to non-negative data only, respectively."
"The fifth observation for Dice measure and thus for SSIM too is that it depends on the absolute values of input parameters. First, it is insensitive at all if one of the parameters is equal 0. Secondly, its sensitivity is decreasing by the increase of absolute parameter values."Hmm, none of that sounds great to me. They go on to introduce their own metric they call CMSC, and claim "all proposed measures are free of drawbacks of MSE and SSIM and thus are more suitable as objective similarity/quality measures not only for the images but any signals."
John Brooks at Blue Shift experimented with using SSIM in his new ETC1/2 encoder, etc2comp. In a conversation about SSIM, he said that:
"It [SSIM] becomes insensitive in high-contrast areas. SSIM is all about matching contrast & structure. But Block Truncation Coding by its nature is increasing contrast because it posterizes color transitions to 4 selector values. This made the encoder freak out and try to reduce contrast to compensate, making the encoding look crappy. I think it might be the right tool for high-level jobs, but was a poor tool for driving low-level encoder behavior."
"BTC trades 16 shades for 4 which means sharper transitions and more contrast when measured against the original. It also usually means less structure than the original due to posterizing 16-to-4. But neither artifact can be controlled by the encoder as they are a result of the encoding, so it's very hard to navigate the encoding search space when SSIM is so outside its design parameters."
Sounds pretty reasonable to me. I'm going to be doing some testing using a ETC1 encoder optimized for SSIM very soon. Let's see what happens.