I've worked with many of the authors of this at one time or another:
Building a blazing fast ETC2 compressor
Repo:
https://github.com/google/etc2comp
(I can't believe the Mali encoder was only single threaded!)
Co-owner of Binomial LLC, working on GPU texture interchange. Open source developer, graphics programmer, former video game developer. Worked previously at SpaceX (Starlink), Valve, Ensemble Studios (Microsoft), DICE Canada.
Thursday, September 15, 2016
Wednesday, September 14, 2016
ETC1 principal axis optimization
One possible (probably minor) optimization to ETC1 encoding: determine the principal axis of the entire texture, rotate the texture's RGB pixels (by treating them as 3D vectors) so this axis is aligned along the grayscale axis, then compress the texture as usual. The pixel shader can undo the rotation using a trivial handful of instructions.
ETC1 uses colorspace lines constrained to be parallel to the grayscale axis, which this optimization exploits.
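Here's a rough sketch of what that could look like, in plain Python. None of this comes from an actual encoder: the function names (`principal_axis`, `rotation_to_gray`, `rotate`, `unrotate`) are made up for illustration, the axis is estimated with simple power iteration on the RGB covariance matrix, and the rotation is built with Rodrigues' formula. I'm also assuming the rotation is done about the mean color, and I'm ignoring the fact that rotated values can fall outside 0..255 and would need rescaling or clamping before encoding.

```python
import math

GRAY = 1.0 / math.sqrt(3.0)  # each component of the unit grayscale axis

def principal_axis(pixels, iterations=64):
    """Return (mean, unit axis) of a list of (r, g, b) pixels via power iteration."""
    n = len(pixels)
    mean = [sum(p[i] for p in pixels) / n for i in range(3)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pixels) / n
            for j in range(3)] for i in range(3)]
    # Slightly asymmetric starting guess (an arbitrary choice for this sketch).
    v = [0.6, 0.58, 0.55]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        length = math.sqrt(sum(x * x for x in w))
        if length < 1e-12:
            break  # flat image: no dominant axis, keep the current guess
        v = [x / length for x in w]
    length = math.sqrt(sum(x * x for x in v))
    return mean, [x / length for x in v]

def rotation_to_gray(axis):
    """Build a 3x3 rotation matrix taking the unit vector `axis` onto the gray axis."""
    a = list(axis)
    if sum(a) < 0:  # the axis sign is arbitrary; pick the half facing gray
        a = [-x for x in a]
    c = (a[0] + a[1] + a[2]) * GRAY  # cosine of the angle between axis and gray
    # v = a x gray (cross product), then Rodrigues: R = I + K + K^2 / (1 + c)
    v = [(a[1] - a[2]) * GRAY, (a[2] - a[0]) * GRAY, (a[0] - a[1]) * GRAY]
    K = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    return [[(1.0 if i == j else 0.0) + K[i][j] + K2[i][j] / (1.0 + c)
             for j in range(3)] for i in range(3)]

def rotate(R, pixel, mean):
    """Rotate a pixel about the mean color (done before compression)."""
    d = [pixel[i] - mean[i] for i in range(3)]
    return [sum(R[i][j] * d[j] for j in range(3)) + mean[i] for i in range(3)]

def unrotate(R, pixel, mean):
    """Inverse rotation (what the shader would do): multiply by the transpose."""
    d = [pixel[i] - mean[i] for i in range(3)]
    return [sum(R[j][i] * d[j] for j in range(3)) + mean[i] for i in range(3)]
```

Since the inverse of a rotation is just its transpose, the shader-side undo amounts to one 3x3 matrix multiply plus an offset.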
etcpak
etcpak is a very fast, but low quality ETC1 (and a little bit of ETC2) compressor:
https://bitbucket.org/wolfpld/etcpak/wiki/Home
It's the fastest open source ETC1 encoder that I'm aware of.
Notice the lack of any PSNR/MSE/SSIM statistics anywhere (that I can see). Also, the developer doesn't seem to get that the other tools/libraries he compares his stuff against were optimized for quality, not raw speed. In particular, rg_etc1 (and crunch's ETC1 support) was tuned to compete against the reference encoder along both the quality and perf. axes.
Anyhow, there are some interesting things to learn from etcpak:
- Best quality doesn't always matter. It obviously depends on your use case. If you have 10 gigs of textures to compress then iteration speed can be very important.
- The value spectrum spans from highest quality/slow encode (to ship final assets) to crap quality/fast as hell encode (favoring iteration speed).
- Visually, the ETC1/2 formats are nicely forgiving. Even a low quality ETC1 encoder produces decent enough looking output for many use cases.
Sunday, September 11, 2016
Idea for next texture compression experiment
Right now, I've got a GPU texture in a simple ETC1 subset that is easily converted to most other GPU formats:
Base color: 15-bits, 5:5:5 RGB
Intensity table index: 3-bits
Selectors: 2-bits/texel
Most importantly, this is a "single subset" encoding, using BC7 terminology. BC7 supports between 1-3 subsets per block. A subset is just a colorspace line represented by two R,G,B endpoint colors.
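To make the base format concrete, here's a minimal decoding sketch in Python. The intensity modifier values are the standard ETC1 tables. Everything else is an assumption for illustration: the function names are made up, and the selectors are taken to be in sorted order (0 = darkest, 3 = brightest) rather than in ETC1's raw bit encoding.

```python
# Standard ETC1 intensity modifier tables, sorted from darkest to brightest.
ETC1_INTENSITY_TABLES = [
    (-8, -2, 2, 8), (-17, -5, 5, 17), (-29, -9, 9, 29), (-42, -13, 13, 42),
    (-60, -18, 18, 60), (-80, -24, 24, 80), (-106, -33, 33, 106),
    (-183, -47, 47, 183),
]

def expand5(c):
    """Expand a 5-bit component to 8 bits by bit replication."""
    return (c << 3) | (c >> 2)

def clamp255(x):
    return max(0, min(255, x))

def block_colors(base555, table_index):
    """Return the 4 colors of the block's single subset (one colorspace line)."""
    r, g, b = (expand5(c) for c in base555)
    return [(clamp255(r + m), clamp255(g + m), clamp255(b + m))
            for m in ETC1_INTENSITY_TABLES[table_index]]

def decode_block(base555, table_index, selectors):
    """Decode 16 sorted-order selectors (2 bits each) into RGB texels."""
    colors = block_colors(base555, table_index)
    return [colors[s] for s in selectors]
```

Because the same modifier is added to R, G and B, the four colors always lie on a line parallel to the grayscale axis.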
This format is easily converted to DXT1 using a table lookup. It's also the "base" of the universal GPU texture format I've been thinking about, because it's the data needed for DXT1 support. The next step is to experiment with attempting to refine this base data to better take advantage of the full ETC1 specification. So let's try adding two subsets to each block, with two partitions (again using BC7 terminology), top/bottom or left/right, which are supported by both ETC1 and BC7.
For example, we can code this base color, then delta code the 2 subset colors relative to this base. We'll also add a couple more intensity indices, which can be delta coded against the base index. Another bit can indicate which ETC1 block color encoding "mode" should be used (individual 4:4:4 4:4:4 or differential 5:5:5 3:3:3) to represent the subset colors in the output block.
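A tiny Python sketch of the delta coding and mode selection, under some loud assumptions. The helper names are hypothetical, the colors are all held at 5-bit precision, and the mode rule here is the simplest one I can think of: use differential mode whenever the second subset color is within ETC1's 3-bit signed delta range (-4..3 per channel) of the first, otherwise fall back to individual 4:4:4. A real encoder would more likely pick the mode by comparing the resulting error.

```python
def delta_code_subsets(base555, subset_a, subset_b):
    """Per-channel deltas of both subset colors relative to the base color."""
    da = tuple(a - c for a, c in zip(subset_a, base555))
    db = tuple(b - c for b, c in zip(subset_b, base555))
    return da, db

def choose_etc1_mode(subset_a, subset_b):
    """'differential' if subset_b - subset_a fits in 3 signed bits per channel."""
    fits = all(-4 <= cb - ca <= 3 for ca, cb in zip(subset_a, subset_b))
    return "differential" if fits else "individual"

def nearest4(c5):
    """Nearest 4-bit value to a 5-bit component, compared after 8-bit expansion."""
    target = (c5 << 3) | (c5 >> 2)
    return min(range(16), key=lambda q: abs(((q << 4) | q) - target))

def to_444(color555):
    """Quantize a 5:5:5 color to 4:4:4 for individual mode."""
    return tuple(nearest4(c) for c in color555)
```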
In DXT1 mode, we can ignore this extra delta coded data and just convert the basic (single subset) base format. In ETC1/BC7/ASTC modes, we can use the extra information to support 2 subsets and 2 partitions.
Currently, the idea is to share the same selector indices between the single subset (DXT1) and two subset (BC7/ASTC/full ETC1) encodings. This will constrain how well this idea works, but I think it's worth trying out.
To add more quality to the 2 subset mode, we can delta code (maybe with some fancy per-pixel prediction) another array of selectors in some way. We can also add support for more partitions (derived from BC7's or ASTC's), too.
Saturday, September 10, 2016
Hierarchical clustering
One of the key algorithms in crunch is determining how to group together block endpoints into clusters. Crunch uses a bottom up clustering approach at the 8x8 pixel (or 2x2 DXTn block) "macroblock" level, then it switches to top down. The top down method is extremely sensitive to the vectors chosen to represent each block during the clusterization step. The algorithm crunch uses to compute representative vectors (used only during clusterization) was refined and tweaked over time. Badly chosen representative vectors cause the clustering step to produce crappy clusters (i.e. nasty artifacts).
Anyhow, an alternative approach would be entirely bottom up. I think this method could require less tweaking. Some reading:
https://en.wikipedia.org/wiki/Hierarchical_clustering
https://onlinecourses.science.psu.edu/stat505/node/143
Also Google "agglomerative hierarchical clustering". Here's a YouTube video describing it.
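For reference, a naive bottom up (agglomerative) clusterer fits in a few lines of Python. This is only a sketch and not what crunch does: the function name is made up, it uses centroid linkage with squared Euclidean distance (my choice, other linkages exist), and it's roughly O(n^3), so it's only good for playing with small inputs.

```python
def agglomerative_cluster(vectors, target_clusters):
    """Merge the two clusters with the closest centroids until
    `target_clusters` remain. Returns lists of indices into `vectors`."""
    dims = len(vectors[0])
    clusters = [[i] for i in range(len(vectors))]

    def centroid(members):
        n = len(members)
        return [sum(vectors[m][d] for m in members) / n for d in range(dims)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    while len(clusters) > max(1, target_clusters):
        cents = [centroid(c) for c in clusters]
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist2(cents[i], cents[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters
```

Every input vector starts as its own cluster, so there are no initial representative vectors to tune, which is the appeal here.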
Friday, September 9, 2016
Few more random thoughts on a "universal" GPU texture format
In my experiments, a simple but usable subset of ETC1 can be easily converted to DXT1, BC7, and ATC. And after studying the standard, it very much looks like the full ETC1 format can be converted into BC7 with very little loss. (And when I say "converted", I mean using very little CPU, just basically some table lookup operations over the endpoint and selector entries.)
ASTC seems to be (at first glance) around as powerful as BC7, so converting the full ETC1 format to ASTC with very little loss should be possible. (Unfortunately ASTC is so dense and complex that I don't have time to determine this for sure yet.)
So I'm pretty confident now that a universal format could be compatible with ASTC, BC7, DXT1, ETC1, and ATC. The only other major format that I can't fit into this scheme easily is my old nemesis, PVRTC.
Obviously this format won't look as good compared to a dedicated, single format encoder's output. So what? There are many valuable use cases that don't require super high quality levels. This scheme purposely trades off a drop in quality for gains in interchange.
Additionally, with a crunch-style encoding method, only the endpoint (and possibly the selector) codebook entries (of which there are usually only hundreds, possibly up to a few thousand in a single texture) would need to be converted to the target format. So the GPU format conversion step doesn't actually need to be insanely fast.
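A sketch of what converting just the endpoint codebook could look like, for the simple ETC1 subset to DXT1 case. This is a naive stand-in for a real precomputed lookup table, and the names are hypothetical: it takes each entry's darkest and brightest ETC1 colors as the two DXT1 endpoints, rounds them to 5:6:5, and remaps sorted-order ETC1 selectors (0 = darkest) to DXT1's selector order. An optimized table would pick endpoints that minimize error across all four colors.

```python
ETC1_INTENSITY_TABLES = [
    (-8, -2, 2, 8), (-17, -5, 5, 17), (-29, -9, 9, 29), (-42, -13, 13, 42),
    (-60, -18, 18, 60), (-80, -24, 24, 80), (-106, -33, 33, 106),
    (-183, -47, 47, 183),
]

# DXT1 4-color mode: 0 = color0, 1 = color1, 2 = 2/3 color0, 3 = 1/3 color0.
# With color0 = brightest, sorted ETC1 selectors (dark..bright) map like this:
ETC1_TO_DXT1_SELECTOR = (1, 3, 2, 0)

def expand5(c):
    return (c << 3) | (c >> 2)

def clamp255(x):
    return max(0, min(255, x))

def pack565(rgb):
    """Round an 8-bit RGB color to a packed 16-bit 5:6:5 value."""
    r = (rgb[0] * 31 + 127) // 255
    g = (rgb[1] * 63 + 127) // 255
    b = (rgb[2] * 31 + 127) // 255
    return (r << 11) | (g << 5) | b

def convert_endpoint_entry(base555, table_index):
    """Return DXT1 (color0, color1) for one ETC1 subset endpoint entry."""
    table = ETC1_INTENSITY_TABLES[table_index]
    base = [expand5(c) for c in base555]
    low = [clamp255(c + table[0]) for c in base]
    high = [clamp255(c + table[3]) for c in base]
    return pack565(high), pack565(low)

def convert_codebook(endpoint_codebook):
    """Convert every (base555, table_index) entry once, not once per block."""
    return [convert_endpoint_entry(b, t) for b, t in endpoint_codebook]

def remap_selectors(selectors, color0, color1):
    """Map sorted ETC1 selectors to DXT1 selectors for the given endpoints."""
    if color0 == color1:
        # Equal endpoints put DXT1 in 3-color mode; use color0 everywhere.
        return [0] * len(selectors)
    return [ETC1_TO_DXT1_SELECTOR[s] for s in selectors]
```

The work is proportional to the codebook size (hundreds to a few thousand entries), not the number of blocks, which is why the conversion doesn't need to be especially fast.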
Another idea is to just unify ASTC and BC7, two very high quality formats. The drop in quality due to unification would be relatively much less significant with this combination. (But how valuable is this combo?)
Some memories
I remember a few years ago at one company, I was explaining and showing one of my early graphics API tracing/replaying demos (on a really cool 1st person game made by some company in Europe) to a couple "senior" engineers there. I described my plan and showed them the demo.
Both of them said it wasn't interesting, and implied I should stop now and not show what I was working on to the public.
Thanks to these two engineers, I knew for sure I had something valuable! And it turned out, this tool (and tools like it) was very useful and valuable to developers. I later showed this tool to the public and received amazingly positive feedback.
I had learned from many previous experiences that, at this particular company, resistance to new ideas was usually a sign. The harder they resisted, the more useful and interesting the technology probably was. The company had horribly stagnated, and the engineers there were, as a group, optimizing for yearly stack ranking slots (and their bonuses) and not for the actual needs of the company.