Wednesday, September 14, 2016

etcpak

etcpak is a very fast, but low quality ETC1 (and a little bit of ETC2) compressor:

https://bitbucket.org/wolfpld/etcpak/wiki/Home

It's the fastest open source ETC1 encoder that I'm aware of.

Notice the lack of any PSNR/MSE/SSIM statistics anywhere (that I can see). Also, the developer doesn't seem to get that the other tools/libraries he compares his stuff against were optimized for quality, not raw speed. In particular, rg_etc1 (and crunch's ETC1 support) was tuned to compete against the reference encoder along both the quality and perf. axes.

Anyhow, there are some interesting things to learn from etcpak:

  • Best quality doesn't always matter. It obviously depends on your use case. If you have 10 gigs of textures to compress then iteration speed can be very important.
  • The value spectrum spans from highest quality/slow encode (to ship final assets) to crap quality/fast as hell encode (favoring iteration speed). 
  • Visually, the ETC1/2 formats are nicely forgiving. Even a low quality ETC1 encoder produces decent enough looking output for many use cases.

Sunday, September 11, 2016

Idea for next texture compression experiment

Right now, I've got a GPU texture in a simple ETC1 subset that is easily converted to most other GPU formats:

Base color: 15-bits, 5:5:5 RGB
Intensity table index: 3-bits
Selectors: 2-bits/texel

Most importantly, this is a "single subset" encoding, using BC7 terminology. BC7 supports 1-3 subsets per block. A subset is just a colorspace line represented by two R,G,B endpoint colors.
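
To make that concrete, here's a minimal sketch of how the base format might be packed (the struct and field names are mine, not a finalized layout):

```cpp
#include <cstdint>

// Hypothetical packing of the single-subset base format described above
// (names and layout are mine). 15 bits of base color + 3 bits of intensity
// table index, plus a 2-bit selector for each of the 16 texels in a 4x4 block.
struct BaseBlock
{
    uint16_t m_base_color;  // 5:5:5 RGB, 15 bits used
    uint8_t  m_inten_table; // ETC1 intensity table index, 0-7
    uint32_t m_selectors;   // 16 texels * 2 bits
};
```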

This format is easily converted to DXT1 using a table lookup. It's also the "base" of the universal GPU texture format I've been thinking about, because it's the data needed for DXT1 support. The next step is to experiment with attempting to refine this base data to better take advantage of the full ETC1 specification. So let's try adding two subsets to each block, with two partitions (again using BC7 terminology), top/bottom or left/right, which are supported by both ETC1 and BC7.
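
A sketch of what that lookup could look like, building on the BaseBlock sketch above (the table itself would be precomputed offline; all names here are placeholders):

```cpp
#include <cstdint>

// Hypothetical entry in a precomputed ETC1->DXT1 table: for each (intensity
// table, 5:5:5 base color) pair, the closest DXT1 endpoint pair plus a
// 4-entry remap from ETC1 selectors to DXT1 selectors. The table (8 x 32768
// entries) would be built offline by minimizing error at each lattice point.
struct DXT1Match
{
    uint16_t m_low, m_high; // DXT1 5:6:5 endpoints
    uint8_t  m_selector_remap[4];
};

void convert_block_to_dxt1(const BaseBlock& in, const DXT1Match table[8][32768],
                           uint16_t out_endpoints[2], uint8_t out_selectors[16])
{
    const DXT1Match& m = table[in.m_inten_table & 7][in.m_base_color & 0x7FFF];
    out_endpoints[0] = m.m_low;
    out_endpoints[1] = m.m_high;
    for (uint32_t i = 0; i < 16; i++)
        out_selectors[i] = m.m_selector_remap[(in.m_selectors >> (i * 2)) & 3];
}
```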

For example, we can code this base color, then delta code the 2 subset colors relative to this base. We'll also add a couple more intensity indices, which can be delta coded against the base index. Another bit can indicate which ETC1 block color encoding "mode" should be used (individual 4:4:4 4:4:4 or differential 5:5:5 3:3:3) to represent the subset colors in the output block.
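
One possible shape for this extra per-block data (all field widths here are guesses, just to make the idea concrete):

```cpp
#include <cstdint>

// Hypothetical extension data appended to the base block for the
// two-subset/two-partition mode. The DXT1 path ignores all of this.
struct SubsetExtension
{
    uint8_t m_partition;       // 0 = top/bottom split, 1 = left/right split
    uint8_t m_etc1_mode;       // 0 = individual 4:4:4, 1 = differential 5:5:5 3:3:3
    int8_t  m_delta_rgb[2][3]; // per-subset R,G,B deltas vs. the base color
    int8_t  m_delta_inten[2];  // per-subset intensity index deltas vs. the base index
};
```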

In DXT1 mode, we can ignore this extra delta coded data and just convert the basic (single subset) base format. In ETC1/BC7/ASTC modes, we can use the extra information to support 2 subsets and 2 partitions.

Currently, the idea is to share the same selector indices between the single subset (DXT1) and two subset (BC7/ASTC/full ETC1) encodings. This will constrain how well this idea works, but I think it's worth trying out.

To squeeze more quality out of the 2 subset mode, we can delta code (maybe with some fancy per-pixel prediction) a second array of selectors. We can also add support for more partitions (derived from BC7's or ASTC's).

Saturday, September 10, 2016

Hierarchical clustering

One of the key algorithms in crunch is determining how to group block endpoints into clusters. Crunch uses a bottom-up clustering approach at the 8x8 pixel (or 2x2 DXTn block) "macroblock" level, then switches to top-down. The top-down method is extremely sensitive to the vectors chosen to represent each block during the clusterization step. The algorithm crunch uses to compute these representative vectors (used only during clusterization) was refined and tweaked over time. Badly chosen representative vectors cause the clustering step to produce crappy clusters (i.e. nasty artifacts).

Anyhow, an alternative approach would be entirely bottom up. I think this method could require less tweaking. Some reading:

https://en.wikipedia.org/wiki/Hierarchical_clustering

https://onlinecourses.science.psu.edu/stat505/node/143

Also Google "agglomerative hierarchical clustering". Here's a YouTube video describing it.
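
For reference, here's a minimal sketch of the naive agglomerative approach: start with one cluster per block vector and repeatedly merge the closest pair (centroid linkage here; a production version on thousands of vectors would want a priority queue or approximate nearest neighbor search instead of this quadratic scan):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

typedef std::array<float, 6> vec6; // e.g. two endpoint RGB colors per block

static float dist2(const vec6& a, const vec6& b)
{
    float d = 0.0f;
    for (size_t i = 0; i < a.size(); i++) { float t = a[i] - b[i]; d += t * t; }
    return d;
}

struct Cluster { vec6 centroid; std::vector<uint32_t> members; };

// Merge the closest pair of clusters until only target_clusters remain.
std::vector<Cluster> agglomerate(const std::vector<vec6>& points, size_t target_clusters)
{
    std::vector<Cluster> clusters(points.size());
    for (size_t i = 0; i < points.size(); i++)
        clusters[i] = { points[i], { (uint32_t)i } };

    while (clusters.size() > target_clusters)
    {
        // Find the closest pair of cluster centroids.
        size_t best_a = 0, best_b = 1;
        float best_d = std::numeric_limits<float>::max();
        for (size_t a = 0; a < clusters.size(); a++)
            for (size_t b = a + 1; b < clusters.size(); b++)
            {
                float d = dist2(clusters[a].centroid, clusters[b].centroid);
                if (d < best_d) { best_d = d; best_a = a; best_b = b; }
            }

        // Merge b into a; new centroid is the member-count weighted average.
        Cluster& ca = clusters[best_a];
        Cluster& cb = clusters[best_b];
        const float wa = (float)ca.members.size(), wb = (float)cb.members.size();
        for (size_t i = 0; i < 6; i++)
            ca.centroid[i] = (ca.centroid[i] * wa + cb.centroid[i] * wb) / (wa + wb);
        ca.members.insert(ca.members.end(), cb.members.begin(), cb.members.end());
        clusters.erase(clusters.begin() + best_b);
    }
    return clusters;
}
```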

Friday, September 9, 2016

Few more random thoughts on a "universal" GPU texture format

In my experiments, a simple but usable subset of ETC1 can be easily converted to DXT1, BC7, and ATC. And after studying the standard, it very much looks like the full ETC1 format can be converted into BC7 with very little loss. (And when I say "converted", I mean using very little CPU, just basically some table lookup operations over the endpoint and selector entries.)

ASTC seems to be (at first glance) around as powerful as BC7, so converting the full ETC1 format to ASTC with very little loss should be possible. (Unfortunately ASTC is so dense and complex that I don't have time to determine this for sure yet.)

So I'm pretty confident now that a universal format could be compatible with ASTC, BC7, DXT1, ETC1, and ATC. The only other major format that I can't fit into this scheme easily is my old nemesis, PVRTC.

Obviously this format won't look as good as a dedicated, single-format encoder's output. So what? There are many valuable use cases that don't require super high quality levels. This scheme purposely trades off a drop in quality for gains in interchange.

Additionally, with a crunch-style encoding method, only the endpoint (and possibly the selector) codebook entries (of which there are usually only hundreds, possibly up to a few thousand in a single texture) would need to be converted to the target format. So the GPU format conversion step doesn't actually need to be insanely fast.
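
A sketch of why that works out (the types and the per-entry converter here are hypothetical stand-ins): the conversion loop runs over the codebook, not over the blocks:

```cpp
#include <cstdint>
#include <vector>

struct Etc1Endpoint { uint16_t m_color555; uint8_t m_inten_table; };
struct Dxt1Endpoint { uint16_t m_low, m_high; };

// Assumed per-entry converter (e.g. the table lookup from the earlier posts);
// declared here as a stand-in.
Dxt1Endpoint convert_endpoint_to_dxt1(const Etc1Endpoint& e);

// The endpoint codebook typically holds hundreds to a few thousand entries,
// so this loop runs once per texture, not once per block. Each block then
// just references an index into the converted codebook.
std::vector<Dxt1Endpoint> transcode_endpoint_codebook(const std::vector<Etc1Endpoint>& codebook)
{
    std::vector<Dxt1Endpoint> out;
    out.reserve(codebook.size());
    for (const Etc1Endpoint& e : codebook)
        out.push_back(convert_endpoint_to_dxt1(e));
    return out;
}
```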

Another idea is to just unify ASTC and BC7, two very high quality formats. The drop in quality due to unification would be relatively much less significant with this combination. (But how valuable is this combo?)

Some memories

I remember a few years ago at one company, I was explaining and showing one of my early graphics API tracing/replaying demos (on a really cool 1st person game made by some company in Europe) to a couple "senior" engineers there. I described my plan and showed them the demo.

Both of them said it wasn't interesting, and implied I should stop now and not show what I was working on to the public.

Thanks to these two engineers, I knew for sure I had something valuable! And it turned out, this tool (and tools like it) was very useful and valuable to developers. I later showed this tool to the public and received amazingly positive feedback.

I had learned from many previous experiences that, at this particular company, resistance to new ideas was usually a sign. The harder they resisted, the more useful and interesting the technology probably was. The company had horribly stagnated, and the engineers there were, as a group, optimizing for yearly stack ranking slots (and their bonuses) and not for the actual needs of the company.

Wednesday, September 7, 2016

ETC1->DXT1 encoding table error visualization

Here are two visualizations of the overall DXT1 encoding error due to using this table, assuming each selector is used equally (which is not always true). This is the lookup table referred to in my previous post.

Each small 32x32 pixel tile in this image visualizes an R,G slice of the 3D lattice; there are 32 tiles for B (left to right), and 8 rows overall. The first row of tiles is for ETC1 intensity table 0, the second for table 1, and so on.

First visualization, where the max error in each individual tile is scaled to white:


Second visualization, visualizing max overall encoding error relative to all tiles:


Hmm - the last row (representing ETC1 intensity table 7) is approximated the worst in DXT1.
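
For reference, a sketch of how the lattice maps to pixels and how the two normalizations differ (names are mine; the err values are assumed to be precomputed elsewhere):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the layout described above: a (32*32) x (8*32) =
// 1024x256 grayscale image. x selects the B tile and R within it; y selects
// the intensity-table row and G within it. "err" is the precomputed DXT1
// approximation error at that lattice point; "max_err" is either the
// per-tile max (first image) or the global max (second image).
void write_error_pixel(std::vector<uint8_t>& image, uint32_t inten_table,
                       uint32_t r, uint32_t g, uint32_t b, float err, float max_err)
{
    const uint32_t image_width = 32 * 32; // 32 B tiles, each 32 pixels wide
    const uint32_t x = b * 32 + r, y = inten_table * 32 + g;
    const float v = (max_err > 0.0f) ? (err / max_err) * 255.0f : 0.0f;
    image[y * image_width + x] = (uint8_t)std::min(255.0f, v + 0.5f);
}
```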


Monday, September 5, 2016

More thoughts on a universal GPU texture interchange format

Just some random thoughts:

I still think the idea of a universal GPU texture compression standard is fascinating and useful. Something that can be efficiently transcoded to 2 or more major vendor formats, without sacrificing too much along the quality or compression ratio axes. Developers could just encode to this standard interchange format and ship to a large range of devices without worrying about whether GPU Y supports arcane texture format Z. (This isn't my idea, it's from Won Chun at RAD.)

Imagine, for example, a format that can be efficiently transcoded to ASTC, with an alternate mode in the transcoder that outputs BC7 as a fallback. Interestingly, imagine if this GPU texture interchange format looked a bit better (and/or transcoded more quickly) when transcoded into one of the GPU formats versus the other. This situation seems very possible in some of the designs of a universal format I've been thinking about.
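
The transcoder's API could be as simple as something like this (purely hypothetical; nothing here is a real library):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical API shape for such a transcoder: one compressed universal
// input, several possible GPU block format outputs.
enum class TargetFormat { ASTC_4x4, BC7, DXT1, ETC1 };

// Returns false if the requested target isn't supported; a "fallback"
// target (e.g. BC7 where ASTC isn't available) is just another enum value.
bool transcode_universal_texture(const uint8_t* universal_data, size_t universal_size,
                                 TargetFormat target,
                                 uint8_t* output_blocks, size_t output_capacity);
```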

Now imagine, in a few years' time, a large set of universal GPU textures gets used and stored by developers, and distributed into the wild on the web. Graphics and rendering code samples even start getting distributed in this interchange format. A situation like this would apply pressure on whichever GPU vendor has the inferior format to either dump it or create a newer format more compatible with efficient transcoding.

To put it simply, a universal format could help fix this mess of GPU texture formats we have today.