I think this is at least vaguely similar (I honestly have no idea how close) to the new supercompressed texture algorithm that John Brooks, CEO of Blue Shift, is working on:
Instead of crunch-style top-down clusterization (VQ) on the color endpoints and selector vectors, with GPU format specific optimization and refinement stages, you could convert a pixel-wise codec (something like JPEG but hopefully better!) to directly output GPU block data. In ETC1, each subblock's colorspace line passes through the subblock's base color and runs parallel to the grayscale axis, because the same intensity modifier is added to R, G and B. As a result, the selector optimization/decision process in ETC1 is driven very strongly by the importance of luma vs. chroma error. (And it's possible to trivially convert ETC1 data into high quality DXT1, so none of this is exclusive to ETC1.)
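To make the ETC1 point concrete, here is a minimal Python sketch of how a subblock's four colors are formed and how a selector gets picked. The modifier values are the standard ETC1 intensity tables, listed here in sorted order rather than in the bitstream's selector order. The helper names (subblock_colors, best_selector) and the plain unweighted RGB error metric are my own assumptions for illustration, not anything taken from crunch or a real encoder.

```python
# Standard ETC1 intensity modifier tables, sorted from darkest to brightest.
# (The real bitstream maps selector bits to these in a different order.)
ETC1_MODIFIERS = [
    (-8, -2, 2, 8),
    (-17, -5, 5, 17),
    (-29, -9, 9, 29),
    (-42, -13, 13, 42),
    (-60, -18, 18, 60),
    (-80, -24, 24, 80),
    (-106, -33, 33, 106),
    (-183, -47, 47, 183),
]

def clamp8(v):
    """Clamp an integer to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, v))

def subblock_colors(base, table):
    """Return the four RGB colors of a subblock.

    The same modifier is added to R, G and B, so the colors sit on a line
    through `base` that is parallel to the grayscale axis (before clamping).
    """
    return [tuple(clamp8(c + m) for c in base) for m in ETC1_MODIFIERS[table]]

def best_selector(pixel, colors):
    """Hypothetical helper: index of the color with the lowest squared RGB error."""
    errors = [sum((p - c) ** 2 for p, c in zip(pixel, col)) for col in colors]
    return errors.index(min(errors))

colors = subblock_colors((100, 120, 140), 2)
# A pixel close to the line and a pixel with very different chroma can both
# land on the same selector: only the brightness offset from the base matters.
near = best_selector((110, 125, 150), colors)
far_chroma = best_selector((200, 50, 140), colors)
```

With no clamping in play, minimizing squared RGB error is the same as picking the modifier closest to the pixel's average offset from the base color, which is why the selector choice behaves like a brightness-only decision and any chroma error has to be absorbed by the base color.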
The compressor can send "mode" bits to the decompression/transcoder system, depending on the importance of each bit of compressed data. The actual importance is related to a balance between bitrate and desired decompression/encoding time. If something is just way too expensive to do in the transcoder, then just do the computation in the encoder offline and send the results as some arithmetically coded bits with the right contexts.
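As a rough illustration of why sending extra "mode" bits can be cheap, here is a small Python sketch that estimates the ideal coded size of a bit stream under simple adaptive, per-context probability models. This is only a cost estimate (the sum of -log2(p) over the bits), not an actual arithmetic coder, and the AdaptiveBit class, the coded_size function and the context labels are hypothetical names I made up for this example.

```python
import math

class AdaptiveBit:
    """Frequency-count model for one binary context (illustrative only)."""

    def __init__(self):
        # Start both symbols at a count of 1 so no probability is ever zero.
        self.counts = [1, 1]

    def cost(self, bit):
        """Ideal cost in bits of coding `bit` under the current estimate."""
        return -math.log2(self.counts[bit] / sum(self.counts))

    def update(self, bit):
        """Adapt the model after the bit has been coded."""
        self.counts[bit] += 1

def coded_size(bits, contexts):
    """Estimate total coded size in bits, using one adaptive model per context."""
    models = {}
    total = 0.0
    for bit, ctx in zip(bits, contexts):
        model = models.setdefault(ctx, AdaptiveBit())
        total += model.cost(bit)
        model.update(bit)
    return total

# 100 mode bits that are always 0, all coded in a single context.
always_zero = coded_size([0] * 100, ["mode"] * 100)

# The same number of bits alternating 0/1 is costly in one shared context,
# but nearly free if the context (here, the bit position parity) predicts it.
alternating = [i % 2 for i in range(100)]
shared_ctx = coded_size(alternating, ["mode"] * 100)
split_ctx = coded_size(alternating, [i % 2 for i in range(100)])
```

The point of the sketch is that a decision the encoder makes offline only costs real bitrate when the transcoder's context can't predict it, so with the right contexts, a highly skewed mode bit costs a small fraction of a bit per block.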
And the encoder can make all key RDO decisions keeping in mind the destination GPU format (ETC1, DXT1, ASTC, etc.) and the transcoder's strengths/weaknesses.
It's also possible to combine crunch's clusterization+refinement algorithm with this pixel-wise approach to handle the chroma problem in this format: use VQ on the luma/chroma endpoint information, and a pixel-wise approach on the luma to drive the selector decision process. I can imagine so many possibilities here!