In my experiments, a simple but usable subset of ETC1 can be easily converted to DXT1, BC7, and ATC. And after studying the standard, it very much looks like the full ETC1 format can be converted into BC7 with very little loss. (And when I say "converted", I mean using very little CPU, just basically some table lookup operations over the endpoint and selector entries.)
ASTC seems to be (at first glance) around as powerful as BC7, so converting the full ETC1 format to ASTC with very little loss should be possible. (Unfortunately ASTC is so dense and complex that I don't have time to determine this for sure yet.)
So I'm pretty confident now that a universal format could be compatible with ASTC, BC7, DXT1, ETC1, and ATC. The only other major format that I can't fit into this scheme easily is my old nemesis, PVRTC.
Obviously this format won't look as good compared to a dedicated, single format encoder's output. So what? There are many valuable use cases that don't require super high quality levels. This scheme purposely trades off a drop in quality for gains in interchange.
Additionally, with a crunch-style encoding method, only the endpoint (and possibly the selector) codebook entries (of which there are usually only hundreds, possibly up to a few thousand in a single texture) would need to be converted to the target format. So the GPU format conversion step doesn't actually need to be insanely fast.
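To make the point about only converting codebook entries concrete, here's a minimal Python sketch. The names `transcode_blocks` and `fake_convert`, and the tuple layout of a codebook entry, are hypothetical placeholders for illustration, not crunch's actual data structures.

```python
def transcode_blocks(endpoint_codebook, block_endpoint_indices, convert_endpoint):
    """Convert each unique codebook entry once, then fan the results out to blocks."""
    # Expensive step: runs once per codebook entry (hundreds to a few thousand).
    converted = [convert_endpoint(entry) for entry in endpoint_codebook]
    # Cheap step: one list lookup per block.
    return [converted[i] for i in block_endpoint_indices]


# Hypothetical stand-in for a real ETC1 -> target format endpoint conversion.
calls = []

def fake_convert(entry):
    calls.append(entry)
    return ("converted", entry)


codebook = [(10, 20, 30, 2), (31, 0, 5, 7)]   # assumed layout: (r5, g5, b5, intensity table)
block_indices = [0, 1, 1, 0, 0, 1]            # one codebook index per block
out = transcode_blocks(codebook, block_indices, fake_convert)
```

Six blocks get transcoded here, but the conversion function only runs twice, once per codebook entry.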
Another idea is to just unify ASTC and BC7, two very high quality formats. The drop in quality due to unification would be relatively much less significant with this combination. (But how valuable is this combo?)
Co-owner of Binomial LLC, working on GPU texture interchange. Open source developer, graphics programmer, former video game developer. Worked previously at SpaceX (Starlink), Valve, Ensemble Studios (Microsoft), DICE Canada.
Friday, September 9, 2016
Some memories
I remember a few years ago at one company, I was explaining and showing one of my early graphics API tracing/replaying demos (on a really cool 1st person game made by some company in Europe) to a couple "senior" engineers there. I described my plan and showed them the demo.
Both of them said it wasn't interesting, and implied I should stop now and not show what I was working on to the public.
Thanks to these two engineers, I knew for sure I had something valuable! And it turned out, this tool (and tools like it) was very useful and valuable to developers. I later showed this tool to the public and received amazingly positive feedback.
I had learned from many previous experiences that, at this particular company, resistance to new ideas was usually a sign. The harder they resisted, the more useful and interesting the technology probably was. The company had horribly stagnated, and the engineers there were, as a group, optimizing for yearly stack ranking slots (and their bonuses) and not for the actual needs of the company.
Wednesday, September 7, 2016
ETC1->DXT1 encoding table error visualization
Here are two visualizations of the overall DXT1 encoding error due to using this table, assuming each selector is used equally (which is not always true). This is the lookup table referred to in my previous post.
Each small 32x32 pixel tile in this image visualizes a R,G slice of the 3D lattice, there are 32 tiles for B (left to right), and there are 8 rows overall. The first row of tiles is for ETC intensity table 0, the second 1, etc.
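The tile layout described above can be sketched as a coordinate mapping. This is only an illustration: which of R and G runs along x inside a tile is an assumption on my part, and `lattice_to_pixel` is a made-up helper name.

```python
TILE = 32        # each tile is 32x32 pixels (one R,G slice)
NUM_B = 32       # 32 tiles per row, one per 5-bit B value
NUM_TABLES = 8   # one row of tiles per ETC1 intensity table

IMAGE_SIZE = (NUM_B * TILE, NUM_TABLES * TILE)


def lattice_to_pixel(table, r, g, b):
    """Map an (intensity table, 5:5:5 color) lattice point to image coordinates."""
    # Assumption: R runs along x and G runs along y inside each tile.
    x = b * TILE + r
    y = table * TILE + g
    return x, y
```

Under these assumptions the full image is 1024x256 pixels.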
First visualization, where the max error in each individual tile is scaled to white:
Second visualization, visualizing max overall encoding error relative to all tiles:
Hmm - the last row (representing ETC1 intensity table 7) is approximated the worst in DXT1.
Monday, September 5, 2016
More thoughts on a universal GPU texture interchange format
Just some random thoughts:
I still think the idea of a universal GPU texture compression standard is fascinating and useful. Something that can be efficiently transcoded to 2 or more major vendor formats, without sacrificing too much along the quality or compression ratio axes. Developers could just encode to this standard interchange format and ship to a large range of devices without worrying about whether GPU Y supports arcane texture format Z. (This isn't my idea, it's from Won Chun at RAD.)
Imagine, for example, a format that can be efficiently transcoded to ASTC, with an alternate mode in the transcoder that outputs BC7 as a fallback. Interestingly, imagine if this GPU texture interchange format looked a bit better (and/or transcoded more quickly) when transcoded into one of the GPU formats versus the other. This situation seems very possible in some of the designs of a universal format I've been thinking about.
Now imagine, in a few years' time, a large set of universal GPU textures gets used and stored by developers, and distributed into the wild on the web. Graphics or rendering code samples even start getting distributed using this interchange format. A situation like this would apply pressure to the other GPU vendor with the inferior format to either dump their format or create a newer format more compatible with efficient transcoding.
To put it simply, a universal format could help fix this mess of GPU texture formats we have today.
Visualizing ETC1 texture compression
The ETC1 format consists of two block colors, two intensity table selectors, two mode bits ("diff" and "flip"), and 16 2-bit selectors. Here are some simple visualizations of what this encoded data looks like.
The original image (kodim14):
The ETC1 encoded image (using rg_etc1 in slow mode - modified to use perceptual colorspace metrics):
Here's the selector image (the 2-bit selectors have been scaled up to 0-255):
Subblock 0's intensity, scaled from 0-7 to 0-255:
Subblock 1's color, expanded to 8,8,8:
Subblock 1's intensity, scaled from 0-7 to 0-255:
The "flip" mode bits (white=flipped):
Saturday, September 3, 2016
ETC1 block color clusterization experiment
Intro
ETC1 is a well thought out, elegant little GPU format. In my experience a few years ago writing a production quality ETC1 block encoder, I found it to be far less fiddly than DXT1. Both use 64 bits to represent a 4x4 texel block, or 4 bits per texel.
I've been very curious how hard it would be to add ETC1/2 support to crunch. Also, many people have asked about ETC1 support, which is guaranteed to be available on OpenGL ES 2.0 compatible Android devices. crunch currently only supports the DXT1/5/N (3DC) texture formats. crunch's higher level classes are highly specific to the DXT formats, so adding a new format is not trivial.
One of the trickier (and key) problems in adding a new GPU format to crunch is figuring out how to group blocks (using some form of cluster analysis) so they can share the same endpoints. GPU formats like DXT1 and ETC1 are riddled with block artifacts, and bad groupings can greatly amplify them. crunch for DXT has an endpoint clusterization algorithm that was refined over many tens of thousands of real-life game textures and satellite photography. I've just begun experimenting with ETC1, and so far I'm very impressed with how well behaved and versatile it is.
Note this experiment was conducted in a new data compression codebase I've been building, which is much larger than crunch's.
ETC1 Texture Compression
Unlike DXT1, which only supports 3 or 4 unique block colors, the ETC1 format supports up to 8 unique block colors. It divides up the block into either two 4x2 or 2x4 pixel "subblocks". A single "flip" bit controls whether or not the subblocks are oriented horizontally or vertically. Each subblock has 4 colors, for 8 total.
The 4 subblock colors are created by taking the subblock's base color and adding to it 4 grayscale colors from an intensity table. Each subblock has 3 bits that select which intensity table to apply. The intensity tables are constant and part of the spec.
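Here's a small Python sketch of how the 4 subblock colors fall out of a base color and an intensity table. The modifier values are the ones from the ETC1 spec, listed here in sorted order (most negative first); note the 2-bit pixel indices stored in the bitstream use a different ordering than this sorted listing. The function names are just for illustration.

```python
# ETC1 intensity modifier tables, sorted from most negative to most positive.
ETC1_INTENSITY_TABLES = [
    (-8, -2, 2, 8),
    (-17, -5, 5, 17),
    (-29, -9, 9, 29),
    (-42, -13, 13, 42),
    (-60, -18, 18, 60),
    (-80, -24, 24, 80),
    (-106, -33, 33, 106),
    (-183, -47, 47, 183),
]


def clamp255(v):
    return max(0, min(255, v))


def subblock_colors(base_rgb8, table_index):
    """Return the 4 colors of a subblock, darkest first.

    base_rgb8 is the base color already expanded to 8 bits per channel.
    The same grayscale modifier is added to R, G and B, then clamped.
    """
    return [tuple(clamp255(c + m) for c in base_rgb8)
            for m in ETC1_INTENSITY_TABLES[table_index]]
```

For example, `subblock_colors((128, 64, 250), 1)` gives `[(111, 47, 233), (123, 59, 245), (133, 69, 255), (145, 81, 255)]`. Because every modifier is gray, the 4 colors always sit on a line parallel to the grayscale axis (until clamping kicks in).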
To encode the two block colors, ETC1 supports two modes: an "individual" mode, where each color is encoded to 4:4:4, or a "differential" mode, where the first color is 5:5:5 and the second color is a two's complement encoded 3:3:3 delta relative to the base color. The delta is applied before the base color is scaled to 8 bits.
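As a rough sketch, the two block color modes can be decoded like this in Python. The function names are mine; the 4-bit and 5-bit components are scaled to 8 bits by bit replication, and the delta components are assumed to already be sign-extended to the range -4..3.

```python
def expand4(c):
    """Scale a 4-bit component to 8 bits by bit replication."""
    return (c << 4) | c


def expand5(c):
    """Scale a 5-bit component to 8 bits by bit replication."""
    return (c << 3) | (c >> 2)


def decode_individual(color1_444, color2_444):
    """Individual mode: two independent 4:4:4 colors."""
    return (tuple(expand4(c) for c in color1_444),
            tuple(expand4(c) for c in color2_444))


def decode_differential(base_555, delta_333):
    """Differential mode: 5:5:5 base color plus a signed 3:3:3 delta (-4..3)."""
    # The delta is added in 5-bit space, before scaling to 8 bits.
    second_555 = tuple(b + d for b, d in zip(base_555, delta_333))
    if any(c < 0 or c > 31 for c in second_555):
        raise ValueError("base + delta must stay within 0..31")
    return (tuple(expand5(c) for c in base_555),
            tuple(expand5(c) for c in second_555))
```

For example, `decode_differential((31, 16, 0), (-4, 3, 0))` returns `((255, 132, 0), (222, 156, 0))`.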
From an encoding perspective, individual mode is most useful when the two subblocks have wildly different colors (favoring color diversity vs. encoding precision), and delta mode is most useful when encoding precision is more useful than diversity.
Each pixel is represented using 2-bit selectors, just like DXT1. Except in ETC1, the color selected depends on which subblock the pixel is within.
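In code, the subblock lookup might look something like the sketch below. The helper names are hypothetical, and the selectors here are plain 0..3 indices into each subblock's 4-color palette rather than the raw bit pattern stored in the block.

```python
def subblock_of(x, y, flip):
    """Return 0 or 1: which subblock the texel at (x, y) of the 4x4 block is in."""
    # flip=False: two 2-wide, 4-tall subblocks side by side (split on x).
    # flip=True:  two 4-wide, 2-tall subblocks stacked vertically (split on y).
    return (y // 2) if flip else (x // 2)


def decode_texel(x, y, flip, selectors, subblock_palettes):
    """selectors is a 4x4 list indexed [y][x] holding values 0..3.

    subblock_palettes holds two 4-entry color lists, one per subblock.
    """
    palette = subblock_palettes[subblock_of(x, y, flip)]
    return palette[selectors[y][x]]
```

So the same selector value can decode to two different colors within one block, depending on which side of the split the pixel lands on.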
So that's ETC1 in a nutshell. In practice, from what I remember its quality is a little lower than DXT1, but not by much. Its artifacts look more pleasant to me than DXT1's (obviously subjective). Each ETC1 block is represented by 2 colorspace lines that are always parallel to the grayscale axis. By comparison, with DXT1, there's only a single line, but it can be in any direction, and perhaps that gives it a slight advantage.
ETC1 Endpoint Clusterization
The goal here is to figure out how to reduce the total number of unique endpoints (or block colors and intensity table indices) in an ETC1 encoded image without murdering the quality. This is just an early experiment, so let's try simplifying the ETC1 format itself to keep things simple. This experiment always uses differential block color mode, with the delta color set to (0,0,0). So each subblock is represented using the same 5:5:5 color, and the same intensity table. The flip bit is always false. Obviously, this is going to lower quality, but let's see what happens. Note this simplified format is still 100% compatible with existing ETC1 decoders. We're just limiting ourselves to a simpler subset.
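To show how little of the format this subset actually uses, here's a sketch that packs the first 4 bytes of such a block (the remaining 4 bytes hold the selectors and are omitted). The bit layout follows my reading of the ETC1 spec for differential mode; `pack_subset_header` is a made-up name, not part of rg_etc1.

```python
def pack_subset_header(r5, g5, b5, table):
    """Pack bytes 0-3 of an ETC1 block restricted to the simplified subset.

    Subset rules: differential mode, (0,0,0) delta, flip bit clear, and the
    same intensity table index for both subblocks.
    """
    if any(not 0 <= c <= 31 for c in (r5, g5, b5)):
        raise ValueError("color components must be 5-bit values")
    if not 0 <= table <= 7:
        raise ValueError("intensity table index must be 0..7")
    diff_bit, flip_bit = 1, 0
    return bytes([
        r5 << 3,   # 5-bit red in the high bits, 3-bit delta (0) in the low bits
        g5 << 3,   # same for green
        b5 << 3,   # same for blue
        (table << 5) | (table << 2) | (diff_bit << 1) | flip_bit,
    ])
```

For example, `pack_subset_header(31, 0, 1, 7)` yields the bytes `F8 00 08 FE`.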
Here's the original image (kodim18 - because I remember this image being a pain to handle well in crunch for DXT1):
Here's the image encoded using high quality ETC1 compression (using rg_etc1, slow mode, perceptual colorspace metrics):
Delta:
Error: Max: 56, Mean: 2.827, MSE: 16.106, RMSE: 4.013, PSNR: 36.061
So the ETC1 encoding that takes advantage of all ETC1 features scores 36.061 dB PSNR.
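For reference, the statistics on the "Error:" lines in this post are related in the usual way: RMSE is the square root of MSE, and PSNR is 10*log10(255^2/MSE). Here's a minimal Python sketch. I'm assuming the stats are gathered over flat lists of 8-bit samples; the function names are just for illustration and this isn't the tool that produced the numbers here.

```python
import math


def psnr_from_mse(mse, peak=255.0):
    """PSNR in dB for 8-bit samples; infinite when the images are identical."""
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)


def error_stats(original, compressed):
    """original and compressed are equal-length flat sequences of 8-bit samples."""
    diffs = [abs(a - b) for a, b in zip(original, compressed)]
    n = len(diffs)
    mse = sum(d * d for d in diffs) / n
    return {
        "max": max(diffs),
        "mean": sum(diffs) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "psnr": psnr_from_mse(mse),
    }
```

Plugging in the MSE of 16.106 reported above gives an RMSE of about 4.013 and a PSNR of about 36.061 dB, matching the figures on the error line.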
Here's the encoding using just diff mode, no flipping, with a (0,0,0) delta color:
Delta:
So we've lost 2.38 dB by limiting ourselves to this simpler subset of ETC1. The reduction in quality is obviously visible, but by no means fatal for the purposes of this quick experiment.
In this experiment, each ETC1 block only contains 4 unique colors (or a single colorspace line, with "low" and "high" endpoints and 2 intermediate colors). Here's a visualization of the "low" and "high" endpoints in this image:
Now let's clusterize these block color endpoints, using 6D tree structured VQ (vector quantization) to perform the clusterization. The output of this step consists of a series of clusters, and each cluster contains one or more block indices. The idea is, blocks with similar endpoint vectors will be placed into the same cluster. This is similar to the process crunch uses for DXT1. It's much like generating an RGB color palette from an array of image colors, except we're dealing with 6D vectors instead of 3D color vectors, and instead of using the output palette directly, all we really care about is how the input vectors are grouped.
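Here's a toy Python sketch of the general idea. It is not crunch's implementation: it's a simple variant that repeatedly splits the cluster with the highest total squared error in two using a few 2-means iterations. Each input vector is assumed to be a block's (low R, low G, low B, high R, high G, high B) endpoints, and all names are made up.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_error(vectors, indices):
    c = mean([vectors[i] for i in indices])
    return sum(sq_dist(vectors[i], c) for i in indices)


def split(vectors, indices, iterations=8):
    """Split one cluster in two with a few 2-means iterations."""
    pts = [vectors[i] for i in indices]
    # Seed with the point farthest from the centroid, and the point farthest from it.
    a = max(pts, key=lambda p: sq_dist(p, mean(pts)))
    b = max(pts, key=lambda p: sq_dist(p, a))
    left, right = list(indices), []
    for _ in range(iterations):
        new_left = [i for i in indices if sq_dist(vectors[i], a) <= sq_dist(vectors[i], b)]
        new_right = [i for i in indices if sq_dist(vectors[i], a) > sq_dist(vectors[i], b)]
        if not new_left or not new_right:
            break
        left, right = new_left, new_right
        a, b = mean([vectors[i] for i in left]), mean([vectors[i] for i in right])
    return left, right


def tsvq_clusterize(vectors, max_clusters):
    """Return a list of clusters, each a list of block indices."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < max_clusters:
        errors = [total_error(vectors, c) for c in clusters]
        worst = max(range(len(clusters)), key=errors.__getitem__)
        if errors[worst] == 0:
            break  # every cluster already holds identical vectors
        left, right = split(vectors, clusters[worst])
        if not right:
            break
        clusters[worst:worst + 1] = [left, right]
    return clusters
```

Note the function returns the groupings (lists of block indices), not the centroids, since the grouping is the part we care about here.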
Here's a visualization of the cluster endpoint centroid vectors after generating 32 clusters:
Once we have the image organized into block clusters containing similar endpoints, we use an internal helper class within rg_etc1 to find the near-optimal 5:5:5 endpoint and intensity table to represent all the pixels within each cluster. We can now create an ETC1-compatible texture by processing each block cluster and selecting the optimal selectors to use for each pixel.
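To give a feel for this step, here's a deliberately naive Python stand-in. It is not rg_etc1's optimizer: it just tries 5:5:5 colors near the cluster's mean color with each of the 8 intensity tables, uses plain (non-perceptual) squared RGB error, and picks the closest of the 4 colors for each pixel. All function names are hypothetical.

```python
# ETC1 intensity modifier tables, sorted from most negative to most positive.
ETC1_INTENSITY_TABLES = [
    (-8, -2, 2, 8), (-17, -5, 5, 17), (-29, -9, 9, 29), (-42, -13, 13, 42),
    (-60, -18, 18, 60), (-80, -24, 24, 80), (-106, -33, 33, 106), (-183, -47, 47, 183),
]


def palette(base555, table):
    """The 4 colors produced by a 5:5:5 base color and an intensity table."""
    base8 = [(c << 3) | (c >> 2) for c in base555]
    return [tuple(max(0, min(255, c + m)) for c in base8)
            for m in ETC1_INTENSITY_TABLES[table]]


def sq_err(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_pixels(pixels, base555, table):
    """Pick the closest palette entry per pixel; return (total error, selectors)."""
    pal = palette(base555, table)
    selectors = [min(range(4), key=lambda s: sq_err(p, pal[s])) for p in pixels]
    return sum(sq_err(p, pal[s]) for p, s in zip(pixels, selectors)), selectors


def optimize_cluster(pixels, radius=1):
    """Search 5:5:5 colors within `radius` of the quantized mean, over all tables.

    Returns (error, base555, table, selectors) for the best candidate found.
    """
    n = len(pixels)
    center = [round(sum(p[i] for p in pixels) / n * 31 / 255) for i in range(3)]
    best = None
    steps = range(-radius, radius + 1)
    for table in range(8):
        for dr in steps:
            for dg in steps:
                for db in steps:
                    base = (center[0] + dr, center[1] + dg, center[2] + db)
                    if any(c < 0 or c > 31 for c in base):
                        continue
                    err, sels = encode_pixels(pixels, base, table)
                    if best is None or err < best[0]:
                        best = (err, base, table, sels)
    return best
```

In the experiment, `pixels` would be every pixel from every block in one cluster, so all of those blocks end up sharing a single endpoint and intensity table while keeping their own selectors.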
Let's see what this texture looks like, and the PSNR, after limiting the number of unique endpoints.
ETC1 (subset) with 64 unique endpoints:
Error: Max: 110, Mean: 5.865, MSE: 70.233, RMSE: 8.380, PSNR: 29.665
ETC1 (subset) with 256 unique endpoints:
Error: Max: 93, Mean: 4.624, MSE: 45.889, RMSE: 6.774, PSNR: 31.514
ETC1 (subset) with 512 unique endpoints:
Error: Max: 87, Mean: 4.225, MSE: 38.411, RMSE: 6.198, PSNR: 32.286
ETC1 (subset) with 1024 unique endpoints:
Error: Max: 87, Mean: 3.911, MSE: 32.967, RMSE: 5.742, PSNR: 32.950
ETC1 (subset) with 4096 unique endpoints:
Error: Max: 87, Mean: 3.642, MSE: 28.037, RMSE: 5.295, PSNR: 33.654
Next Steps
This experiment shows one way to clusterize the endpoint optimization process in a limited subset of the ETC1 format. This first step must be mastered before crunch for ETC1 can be written.
The clusterization step outlined here isn't aware of flipping, or that each block can have 2 block colors, and we haven't even looked at the selectors yet. A production encoder will need to support more features of the ETC1 format. Note that crunch for DXT1 doesn't support 3 color blocks and works just fine, so it's possible we don't need to support every encoding feature.
Some next steps:
- Figure out how to best clusterize the full format. Expand the format subset to include two block colors, flipping, and both encodings. Is 6D clusterization good enough, or is 12D needed?
- Selector clusterization
- ETC1 specific refinement stages: refine endpoints based off the clusterized endpoints, then refine the clusterized endpoints based off the clusterized selectors, possibly repeat.
- crunch-style tiling ("macroblocking") will most likely be needed to get bitrate down to JPEG+real-time encoding competitive levels.
- ETC2 support
(Currently, I'm conducting these experiments in my spare time, in between VR and optimization contracts. If you're really interested in accelerating development of crunch for a specific GPU format please contact info@binomial.info.)
Monday, August 29, 2016
Good article: Why software patents are evil
I have been attacked (at a time in my life when the last thing I needed was more stress!) by a patent holder before, so hey I hate software patents:
http://www.infoworld.com/article/2619609/open-source-software/why-software-patents-are-evil.html