In my experiments, a simple but usable subset of ETC1 can be easily converted to DXT1, BC7, and ATC. And after studying the standard, it very much looks like the full ETC1 format can be converted into BC7 with very little loss. (And when I say "converted", I mean using very little CPU, basically just some table lookup operations over the endpoint and selector entries.)
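As a rough illustration of the table-lookup idea, here is a minimal Python sketch that converts one restricted ETC1 block to DXT1 endpoints and selectors. The post does not define the subset, so the sketch assumes one: a single 5-bit-per-channel base color and a single intensity table for the whole block, with selectors already sorted from darkest (0) to brightest (3) rather than in ETC1's raw bit encoding. The names `TABLE5`, `TABLE6`, `SELECTOR_MAP`, and `convert_block` are hypothetical. The mapping is deliberately naive: it only matches the darkest and brightest ETC1 colors and ignores the fact that ETC1's inner modifiers are not evenly spaced the way DXT1's interpolated colors are.

```python
# ETC1 intensity modifier tables, sorted from darkest to brightest.
ETC1_MODIFIERS = [
    (-8, -2, 2, 8), (-17, -5, 5, 17), (-29, -9, 9, 29), (-42, -13, 13, 42),
    (-60, -18, 18, 60), (-80, -24, 24, 80), (-106, -33, 33, 106),
    (-183, -47, 47, 183),
]


def clamp255(v):
    return max(0, min(255, v))


def expand5(c5):
    # Expand a 5-bit component to 8 bits by bit replication.
    return (c5 << 3) | (c5 >> 2)


def quantize(v8, bits):
    # Round an 8-bit component to the nearest value representable in `bits` bits.
    max_v = (1 << bits) - 1
    return (v8 * max_v + 127) // 255


def build_table(bits):
    # table[intensity][base5] -> (low, high) endpoint component, precomputed once.
    table = []
    for mods in ETC1_MODIFIERS:
        row = []
        for c5 in range(32):
            base = expand5(c5)
            low = quantize(clamp255(base + mods[0]), bits)
            high = quantize(clamp255(base + mods[3]), bits)
            row.append((low, high))
        table.append(row)
    return table


TABLE5 = build_table(5)  # red and blue (5 bits each in DXT1's 565 endpoints)
TABLE6 = build_table(6)  # green (6 bits)

# Sorted ETC1 selector -> DXT1 selector, assuming color0 is the bright endpoint.
# DXT1 4-color mode: 0 = color0, 1 = color1, 2 = 2/3 color0, 3 = 2/3 color1.
SELECTOR_MAP = (1, 3, 2, 0)


def convert_block(base555, intensity, selectors):
    """Return (color0, color1, dxt1_selectors) using only table lookups."""
    r5, g5, b5 = base555
    r_low, r_high = TABLE5[intensity][r5]
    g_low, g_high = TABLE6[intensity][g5]
    b_low, b_high = TABLE5[intensity][b5]
    color0 = (r_high << 11) | (g_high << 5) | b_high
    color1 = (r_low << 11) | (g_low << 5) | b_low
    if color0 == color1:
        # Equal endpoints select DXT1's 3-color mode; selector 0 is still color0.
        return color0, color1, [0] * len(selectors)
    return color0, color1, [SELECTOR_MAP[s] for s in selectors]


color0, color1, dxt1_selectors = convert_block((16, 16, 16), 0, [0, 1, 2, 3] * 4)
```

A better converter would presumably choose endpoints that minimize error across all four ETC1 colors, but the per-block cost would still be a handful of lookups.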
ASTC seems (at first glance) to be about as powerful as BC7, so converting the full ETC1 format to ASTC with very little loss should be possible. (Unfortunately, ASTC is so dense and complex that I haven't had time to determine this for sure yet.)
So I'm pretty confident now that a universal format could be compatible with ASTC, BC7, DXT1, ETC1, and ATC. The only other major format that I can't fit into this scheme easily is my old nemesis, PVRTC.
Obviously this format won't look as good as the output of a dedicated, single-format encoder. So what? There are many valuable use cases that don't require super high quality levels. This scheme purposely gives up some quality in exchange for easier interchange and distribution.
Additionally, with a crunch-style encoding method, only the endpoint (and possibly the selector) codebook entries (of which there are usually only hundreds, possibly up to a few thousand in a single texture) would need to be converted to the target format. So the GPU format conversion step doesn't actually need to be insanely fast.
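To make the cost argument concrete, here is a minimal sketch of transcoding at the codebook level. It is not crunch's actual data layout: the function `transcode`, its parameters, and the toy converters are hypothetical, and it assumes endpoint and selector entries can be converted independently of each other, which may not hold for every target format.

```python
def transcode(endpoint_codebook, selector_codebook, block_indices,
              convert_endpoint, convert_selectors):
    """Convert each codebook entry once, then assemble blocks by lookup.

    block_indices is a list of (endpoint_index, selector_index) pairs,
    one pair per block.
    """
    # The expensive format conversion runs once per codebook entry.
    endpoints_out = [convert_endpoint(e) for e in endpoint_codebook]
    selectors_out = [convert_selectors(s) for s in selector_codebook]
    # Per-block work is just two list lookups.
    return [(endpoints_out[ei], selectors_out[si]) for ei, si in block_indices]


# Toy usage: 3 endpoint entries and 2 selector entries shared by 6000 blocks.
calls = {"endpoint": 0, "selector": 0}


def toy_convert_endpoint(entry):
    calls["endpoint"] += 1
    return ("converted", entry)


def toy_convert_selectors(entry):
    calls["selector"] += 1
    return tuple(3 - s for s in entry)


endpoint_codebook = [(10, 20), (30, 40), (50, 60)]
selector_codebook = [(0, 1, 2, 3), (3, 3, 0, 0)]
block_indices = [(i % 3, i % 2) for i in range(6000)]

blocks = transcode(endpoint_codebook, selector_codebook, block_indices,
                   toy_convert_endpoint, toy_convert_selectors)
```

In this toy run the converters are called five times in total, no matter how many blocks the texture has, which is why the per-entry conversion can afford to be relatively slow.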
Another idea is to just unify ASTC and BC7, two very high quality formats. The drop in quality due to unification would be much less significant with this combination. (But how valuable is this combo?)