Here are the visualizations:
After studying the clusterizations visualized as massive PNG files, I saw a lot of nonsensical things. The algorithm worked, but some clusters were surprisingly large (in 6D space for endpoints or 16D space for selectors), so unrelated blocks ended up lumped into the same cluster.
To fix this, I started using Lloyd's algorithm at a higher level, so the codebook could be refined over several iterations:
1. Create a candidate codebook (as crunch does).
2. Reassign each input block to the best codebook entry (by trying them all and computing the error of each), creating a new clusterization.
3. Compute new codebook entries (by optimizing the endpoints, or by selecting the best selectors to use for each cluster, factoring in the block endpoints).
4. Repeat steps 2-3 X times. As long as both steps pick the best choice for the same error metric, each iteration lowers the overall error, or at least never raises it.
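The loop above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the actual encoder: blocks are stood in for by plain 6D vectors, the error is squared Euclidean distance, and a new codebook entry is just the mean of its cluster (a real encoder would optimize endpoints or pick selectors here instead). The names `assign`, `recompute`, and `refine` are hypothetical.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance; an assumed stand-in for the real block error.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, codebook):
    # Step 2: try every codebook entry for each point and keep the best one.
    return [min(range(len(codebook)), key=lambda k: sq_dist(p, codebook[k]))
            for p in points]

def recompute(points, labels, codebook):
    # Step 3: replace each entry with the mean of the points assigned to it.
    new_codebook = []
    for k, old in enumerate(codebook):
        members = [p for p, l in zip(points, labels) if l == k]
        if not members:
            new_codebook.append(old)  # empty cluster: keep the old entry
            continue
        new_codebook.append(tuple(sum(m[d] for m in members) / len(members)
                                  for d in range(len(old))))
    return new_codebook

def total_error(points, labels, codebook):
    return sum(sq_dist(p, codebook[l]) for p, l in zip(points, labels))

def refine(points, codebook, iterations):
    # Step 4: repeat steps 2-3, recording the total error after each pass.
    history = []
    for _ in range(iterations):
        labels = assign(points, codebook)
        codebook = recompute(points, labels, codebook)
        history.append(total_error(points, labels, codebook))
    labels = assign(points, codebook)  # final clusterization for the final codebook
    return codebook, labels, history

# Usage: 200 random 6D points, with the first 8 points as the candidate codebook.
random.seed(0)
points = [tuple(random.random() for _ in range(6)) for _ in range(200)]
codebook, labels, history = refine(points, points[:8], iterations=5)
```

The values in `history` never increase from one pass to the next (up to floating-point rounding), because each step is the best possible choice for the same squared-error metric.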
You also need to insert steps to identify redundant codebook entries and delete them. If the codebook becomes too small, you can find the cluster with the worst error and split it into two or more clusters.
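Here is one simple way these two maintenance steps could look, again as a hedged sketch rather than what the encoder really does. It assumes codebook entries are hashable tuples, that "redundant" means exactly equal, and that a split is done by seeding a new entry at the worst cluster's farthest member; the next reassignment pass then divides that cluster. `remove_duplicates` and `split_worst` are hypothetical names.

```python
def sq_dist(a, b):
    # Squared Euclidean distance; an assumed stand-in for the real block error.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def remove_duplicates(codebook):
    # Drop exact duplicate entries while preserving first-seen order.
    return list(dict.fromkeys(codebook))

def split_worst(points, labels, codebook):
    # Sum the error of each cluster to find the worst one.
    errors = [0.0] * len(codebook)
    for p, l in zip(points, labels):
        errors[l] += sq_dist(p, codebook[l])
    worst = max(range(len(codebook)), key=errors.__getitem__)
    if errors[worst] == 0.0:
        return list(codebook)  # every cluster is already exact; nothing to split
    # Seed a new entry at the member farthest from the worst cluster's entry.
    members = [p for p, l in zip(points, labels) if l == worst]
    farthest = max(members, key=lambda p: sq_dist(p, codebook[worst]))
    return list(codebook) + [farthest]

# Usage: remove a duplicate entry, then grow the codebook back by one entry.
codebook = remove_duplicates([(0.0, 0.0), (0.0, 0.0), (10.0, 10.0)])
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (10.0, 14.0)]
labels = [0, 0, 1, 1]
codebook = split_worst(points, labels, codebook)
```

After a split, the labels are stale, so a reassignment pass (step 2) has to run before the codebook is recomputed again.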
Also, whenever you decide to use a different endpoint or selector to code a block, you have changed the clusterization, so you should recompute the codebook from the clusterization actually in use. Optimizations like selector RDO change the final clusterization, so the same applies after they run.