The window size was only 128 bytes (8 BC7 blocks). 3 byte matches is the minimum Deflate match length. 16 byte matches replicate entire BC7 blocks. Not sure why there's a noticeable peak at 10 bytes.
Entire block replacements are super valuable at these lambdas. The ERT in bc7enc_rdo weights matches of any sort way more than literals. If some nearby previous block is good enough it makes perfect sense to use it.
One thing I think would be easily added to the transform: If there's a match at the end of the previous block, try to continue/extend it by weighting the bytes following the copied bytes in the window a little more cheaply (to coax the transform towards extending the match).