Monday, October 29, 2018

Existing BC7 codecs don't handle textures with decorrelated alpha channels well

Developers aren't getting the alpha quality they could be getting if they had better BC7 codecs. I noticed while working on our new non-RDO BC7 codec that existing BC7 codecs don't handle textures with decorrelated alpha signals well. They wind up trashing the alpha channel when the A signal doesn't resemble the signal in RGB. I didn't have time to investigate the issue until now. I'm guessing most developers either don't care, or they use simple (correlated) alpha channels, or multiple textures.

Some codecs allow the user to specify individual RGBA channel weightings. (ispc_texcomp isn't one of them.) This doesn't work well in practice, and users will rarely fiddle with the weightings anyway. You have to weight A so highly that RGB gets trashed.

Here's an example using a well-known CPU BC7 codec:

RGB image: kodim18
Alpha image: kodim17

Encoded using Intel's ispc_texcomp to BC7 profile alpha_slow:

RGB Average Error: Max:  40, Mean: 1.868, MSE: 7.456, RMSE: 2.731, PSNR: 39.406
Luma        Error: Max:  26, Mean: 1.334, MSE: 3.754, RMSE: 1.938, PSNR: 42.386
Alpha       Error: Max:  36, Mean: 1.932, MSE: 7.572, RMSE: 2.752, PSNR: 39.339

Encoded RGB:

Encoded A:

Experimental RDO BC7 codec (quantization disabled) with support for decorrelated alpha. Uses only modes 4, 6, and 7:

M4: 72.432457%, M6: 17.028809%, M7: 10.538737%
RGB Average Error: Max:  65, Mean: 2.031, MSE: 8.871, RMSE: 2.978, PSNR: 38.651
Luma        Error: Max:  34, Mean: 1.502, MSE: 4.887, RMSE: 2.211, PSNR: 41.241
Alpha       Error: Max:  29, Mean: 1.601, MSE: 5.703, RMSE: 2.388, PSNR: 40.570

Encoded RGB:

Encoded A:

Zoomed in comparison:



This experimental codec separately measures per-block RGB average and alpha PSNR. It prefers mode 4, and switches to modes 6 or 7 using this logic:

const float M7_RGB_THRESHOLD = 1.0f;
const float M7_A_THRESHOLD = 40.0f;
const float M7_A_DERATING = 12.0f;

const float M6_RGB_THRESHOLD = 1.0f;
const float M6_A_THRESHOLD = 40.0f;
const float M6_A_DERATING = 7.0f;

if ((m6_rgb_psnr > (math::maximum(m4_rgb_psnr, m7_rgb_psnr) + M6_RGB_THRESHOLD)) &&
    (m6_a_psnr > M6_A_THRESHOLD) &&
    (m6_a_psnr > math::maximum(m4_a_psnr, m7_a_psnr) - M6_A_DERATING))
{
    block_modes[block_index] = 6;
}
else if ((m7_rgb_psnr > (m4_rgb_psnr + M7_RGB_THRESHOLD)) &&
         (m7_a_psnr > (m4_a_psnr - M7_A_DERATING)) &&
         (m7_a_psnr > M7_A_THRESHOLD))
{
    block_modes[block_index] = 7;
}
else
{
    block_modes[block_index] = 4;
}

What this basically does: we only use mode 6 or 7 when its RGB PSNR beats the competing modes by a threshold (1 dB). Mode 6 has to beat the better of modes 4 and 7, and mode 7 has to beat mode 4. But we only do this if we don't degrade alpha quality too much (by less than 7 dB for mode 6, or less than 12 dB for mode 7), and if alpha quality is above a minimum threshold (40 dB).
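To make the inputs to that logic concrete, here is a minimal sketch of how per-block RGB-average and alpha PSNR could be measured separately. The `Rgba8` and `BlockPsnr` types, the function names, and the 100 dB cap for a perfect match are all assumptions for illustration; they are not taken from the experimental codec.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical pixel type, used only for this sketch.
struct Rgba8 { uint8_t r, g, b, a; };

// Converts a mean squared error on 8-bit samples to PSNR in dB.
// A perfect match is capped at 100 dB (an arbitrary choice) to avoid infinity.
inline float mse_to_psnr(double mse)
{
    assert(mse >= 0.0);
    if (mse == 0.0)
        return 100.0f;
    const double psnr = 10.0 * std::log10((255.0 * 255.0) / mse);
    return static_cast<float>(psnr < 100.0 ? psnr : 100.0);
}

struct BlockPsnr { float rgb; float a; };

// Measures RGB-average PSNR and alpha PSNR independently for one 4x4 block,
// so a mode that helps RGB but hurts alpha can be detected.
inline BlockPsnr measure_block(const Rgba8 (&orig)[16], const Rgba8 (&decoded)[16])
{
    double rgb_sse = 0.0, a_sse = 0.0;
    for (int i = 0; i < 16; ++i)
    {
        const int dr = orig[i].r - decoded[i].r;
        const int dg = orig[i].g - decoded[i].g;
        const int db = orig[i].b - decoded[i].b;
        const int da = orig[i].a - decoded[i].a;
        rgb_sse += dr * dr + dg * dg + db * db;
        a_sse += da * da;
    }
    // RGB error is averaged over 16 pixels x 3 channels, alpha over 16 pixels.
    return { mse_to_psnr(rgb_sse / (16.0 * 3.0)), mse_to_psnr(a_sse / 16.0) };
}
```

Calling `measure_block()` once per candidate mode (4, 6 and 7) would give values playing the role of `m4_rgb_psnr`, `m4_a_psnr` and so on in the selection logic.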

PSNR doesn't really capture the extra distortion being introduced here. It can help to load the alpha images into Photoshop or whatever and compare them closely as layers.

RGB PSNR went down a little with my experimental RDO BC7 codec, but alpha went up. Visually, alpha is greatly improved. My ispc non-RDO BC7 code currently has the same issue as Intel's ispc_texcomp with decorrelated alpha.

Tech company patterns

First off, I'm not finished with this blog post. Don't take it too seriously. I'm not referring to any specific company here, i.e. this is not about any one company. Some of these patterns aren't even from me. It's incomplete. There are definitely more patterns. I do strongly believe you can lump most tech companies into a number of general categories because that's good for business. There are good companies too.

A company can be a blend of multiple patterns. For each pattern's attributes I'm using examples from real-life companies I (or friends/ex-coworkers) have seen over the decades. The majority were inspired by multiple companies.

When you interview or interact with a company it's a good idea to figure out what makes the company tick. Knowing patterns like this can help figure this out. Once you get some experience with this, you can figure out how a company is actually run or structured fairly quickly. I have shown this post to a few tech business people and they were like "Yup, I know of several companies that fit each category. This is obvious stuff and spot on."

1. The Famous Company Pattern 
Massive subsidies from digital distribution or a large popular engine.
Due to intense developer marketing efforts this is a company everyone yearns to work for.
It doesn't matter how bad the company's working conditions are rumored to be. Developers still yearn to someday work for the famous company as a career goal.
Let them call you - *always*. Can be dehumanizing and super abusive.
The company doesn't really need you unless you work on the core product that makes tons of cash. They may be "collecting" employees to keep them away from competitors. Or, they may be hiring employees as a way to give the wealthy insiders something interesting to do.
Results don't really matter at all because the company is going to make money anyway.
The less results matter, the more you should expect things such as: Soviet-like purges. Insane, power mad staff. No accountability. Toxic teams.
Possibly lots of crunching, micromanagement.
May have a cult-like pseudo-leader/godhead used for PR/marketing.

2. The Megacorp Pattern
Massive Dilbert-like internal politics between and within groups. Can be decent if you find the right group that you fit into.
Results only matter due to internal politics and constant reorgs/layoffs, not due to any intrinsic need for profits.
Great for those who want a somewhat stable lifestyle, if you can tolerate the politics and culture.
Workers turn the megacorp into their corporate tribe and absolutely obsess over internal politics at virtually all times. (If you go to a bar and hang out with employees from megacorps, and all they talk about is company politics, well there you go. It becomes their obsession.)
Culture can appear absolutely batshit insane from the outside (some megacorps can be very insular).
May have illegally colluded with other megacorps to not hire each other's employees to keep wages and horizontal movement between firms down.
Company may be full of fiefdoms. Lots of shenanigans to protect skilled workers from the insane review system and yearly stack ranking.
Company may strategically spread rumors and PR that the "Company has Changed" and things are so much better now. This is possible, but be wary.
Insiders who get fired sometimes get massive payouts from other insiders on their way out.

3. The Acquired Company Pattern
Firm acquired by large megacorp
Key dynamic is how well the company and its management integrate into the new one.
The former insiders/founders become mid-level management at the megacorp.
Firm may try to keep its unique identity and culture but usually fails.
Resistance is futile: Either the firm ultimately becomes fully absorbed into the megacorp or it will be shut down.
Don't join until you figure out what the megacorp actually thinks of the acquired company and its mid-level managers.
The former owners will be super tight.
The relationship between the former owners and corporate can turn into "us vs. them", which isn't healthy for studio stability.
Be wary if the acquired studio is cooking the books and has secret passion projects going in the background (like with pattern 8). If the studio is lying to corporate about who is working on what they'll eventually figure this out and heads will roll. The larger the secret projects, the more danger the studio is in.
If the acquired company is geographically far away from the corporate mothership, be very wary. If the mothership becomes unhappy with the former owners (now mid-level management) the acquired company will be laid off.
Company morale can drop over time once the company is acquired and the workers collectively realize that they now work for a faceless megacorp. This can lead to the Phoenix Pattern, below.
Insiders at the acquired company can become insiders at the new megacorp.

4. The Legendary Company Pattern
Products are legendary and set the bar.
There are two types of employees: The "Old Guard" that worked on the earlier legendary products, and everyone else.
Can be a very good choice if you can fit in and get shit done.
Don't expect to become an insider anytime soon if ever. Only old guard workers can be insiders.

5. The Silicon Valley Startup Pattern
Founders and investors get in bed with each other.
Founders can appear absolutely batshit on social media, during public presentations etc.
CEO and closest insiders can be very tight knit. They will cover each other's asses.
For the gamblers. The earlier you get in, the higher the probability you'll get good stock.
Non-savvy founders eventually get pushed out or lose power.
These startups come and go constantly, so if you work for one that almost inevitably goes bust just move your desk one block away.
If the startup is actually located in Silicon Valley: Employees may walk at the slightest issue and take a job (along with all your company know-how/IP/experience/etc.) at the megacorp next door. The company is only able to recruit the talent that isn't already working for the megacorp (or fully funded startup) up the street.
Talk to ex-employees before joining. If they had to sign NDAs or got threatened if they talked, avoid. If the company has lots of turnover, avoid.

6. The Self-Funded Startup Pattern
Formed by a small, passionate group of insiders wanting to recapture past glories or just be independent.
Can be good, but don't expect it to last when the insiders break up.
Founders can be super passionate about their project and will continue investing in it even after it becomes obvious to everyone else that it's never going to make a dime.
These startups have lifespans of a few years or so unless they have a big success.
Commonly seen in combination with patterns 7, 8, 9.
Investigate the backgrounds of the owners and obviously avoid if there are any red flags: multiple lawsuits, scams, forced ex-employees to sign NDAs, etc.

7. Single Publisher/Throw-Away Sweatshop Pattern
Beholden to a single publisher or customer. Publisher/primary and only customer is abusive. Company ignores it because it has no choice.
You will be treated like dogs. Crunch is expected.
Founders think they are making good money, but because they go for long periods of time without any income while still working they actually aren't.
Company has zero leverage with its publisher because it doesn't have any alternatives.
Can be OK if you work there hourly, but avoid full-time contracts because you will be crunched to death and treated badly.
Darth Vader-like publisher will break all the rules, recruit your best staff, make changes to your team or contract, etc. at will - because it can.
Unlike the Multiple Publisher Pattern, you will be interacting with the publisher's employees and they will treat you like shit.
If the firm is bought out the insiders will become mid-level managers at the new company (pattern 3). As the company has zero leverage, alternatives, or its own IP it'll be more like a mass "acqui-hire". If you aren't a company insider near the top before the buyout don't expect to earn much from the buyout.
Several healthier variants of this pattern are possible. The key dynamics are the relationship with the single publisher and the team's talent and fame. Another single publisher variant is possible where the team is just so overwhelmingly famous that they can choose virtually any publisher they want.

8. Multiple Publisher Pattern
Company keeps multiple products in the pipeline with multiple publishers in an attempt to spread around risk and give the company some negotiation leverage.
Firm tends to lie to each publisher about who is working on what. (Publishers know this, too.)
Publisher(s) are kept at arm's length and generally aren't allowed to interact with employees - always through managers. This is to prevent the flow of too much company information back to the publisher.
May have secret independent passion-projects in the background covertly funded with publisher funds.
Fragile. If one team fails, the company is in trouble and expect layoffs. If two or more teams fail, the company is toast.
Always talk to all teams in the firm to build a picture of how healthy each product is.
Can be great places to work as long as you realize it probably won't last unless one or more products hits big or the company is bought out.
If the firm is bought out by a publisher the company switches to pattern 3 as the insider owners become mid-level managers at the new company.

9. The Phoenix/Small Town Pattern
Company formed after mass layoff or some other type of company trauma (a purge, or low morale after a megacorp acquisition).
Two groups: Insiders and Outsiders. Insiders are *tight*. Outsiders will never become insiders - new insiders will always be brought in and ordained as management.
Eventual Buyout Mentality: You will be constantly told that the company will eventually be sold and you'll become rich off your stock - just like last time.
Local shadowy investors prop the company up during hard times.
Stock is absolutely, totally worthless unless the Insiders love you during a buyout.
If you piss the insiders off but are still valuable, they will mess with your stock during the buyout to shortchange you.
Unstable until established. Buyout may never actually happen.
Small-town environment may make the company somewhat shady. Horizontal movement between tech companies in the same small town is virtually impossible due to illegal collusion between companies to not compete over employees.
If the company folds a new company will be formed sometimes literally across the street and the best laid off employees will be instantly hired. They'll be handed some fake stock and told they'll be wealthy someday once the new company is sold. (Right.)
The company actually exists to make the insiders wealthy and to give the upper management a decent lifestyle. Everyone else is expendable.

10. Wealthy Dictator Pattern
No "Insiders": There's the dictator-like owner, upper management, and everyone else.
Always meet and interview with the owner first. Avoid if they give you the creeps or a bad vibe.
Company is an expression of the owner's weird development philosophies. It's basically the owner's hobby or side company.
Best avoided if in a small city/town.
Check the background of the owner and figure out where their funding came from. If they are scam artists, have lots of ex-employees suing them, or have otherwise shady backgrounds, obviously avoid.

11. The World Domination Pattern
This large decentralized organization pattern was designed - not evolved. It follows a well-thought-out template and a plan.
The company controls an engine or a software product used by a large ecosystem of content creators.
At War with competitive engine companies, which the company absolutely hates.
Funded with large amounts of investor capital and through support contracts with large firms.
Massive, sprawling corporation consisting of multiple smaller firms spread over the entire globe.
The engine company workers actually wind up secretly hating the developers who use their engine.
Joining as a single developer gets you nowhere. It's best to be acquired by the firm as a small group and given your own office. The company actively looks for these small groups to hire/acquire.
Can be a good gig in the right city but don't expect to get anywhere. It's just a job on a large sprawling piece of engine software nobody fully understands anymore.
Company can employ talent virtually anywhere on the globe.
It's hinted in whispers that the eventual buyout will make them all wealthy. (Right.)
Workers generally treated like crap. Contractors (especially in Eastern Europe or Russia) are massively underpaid and undervalued.
Company has a firm process and procedure for doing things and that's it.
Upper management layer is cult-like and very tight.
Each office has its own strange brand of small town-esque politics and culture.

12. The Master Psychological Manipulator Pattern
The owner has graduated from writing code to Programming Programmers. Owner is a master psychological manipulator. He locks in employees by doing things like co-signing their mortgages.
Possibly combined with pattern 10 (wealthy dictator).
Employees are afraid of the owner, and afraid of what happens if they leave.
You will be so well manipulated by companies following this pattern that everything will feel amazing and alright until the trap is sprung and you're in so deep you're afraid to leave.
All employees are ultimately disposable interchangeable cogs. The firm is constantly on the lookout for key "10x" engineers who can keep the product(s) functioning.
Somehow the firm is actually not profitable and has almost gone under, but was bailed out by friends from other companies pumping in cash for strategic reasons.
The owner and his closest insider friends have their own strange subculture. It can be almost impossible to comprehend them while hearing them talk to each other.
The owner/founder is reclusive and rarely comes into the office. This causes stress among the employees due to a leadership vacuum. Alternately, you are so micro-managed and watched you can't breathe.
Special individual agreements are secretly struck with each employee. Some employees are paid massively more than others.
The firm's software isn't very good, but it has good marketing and appears stable from the outside.
Like other companies listed above, the owner appears to be absolutely batshit if you actually listen to them. Probably technically disconnected because they don't code themselves anymore. They are unable to run projects with more than a small handful of programmers at a time because their primary skill is manipulation of individuals, not project management.
Not recommended unless you're game to being psychologically profiled and manipulated.

Tuesday, July 24, 2018

This is why we're working on Basis.

Here's a very interesting graph of game install/on-device sizes from The Cost of Games:


This is a *log* graph. Notice the overall trend. Most of this data is texture data.

And so this is why our product is so valuable.

Thursday, July 12, 2018

A little ETC1S history

I've been talking about ETC1S for several years. I removed some of my earlier posts (to prevent others from stealing our work - which does happen) but they are here:
https://web.archive.org/web/20160913170247/http://richg42.blogspot.com/

We also covered our work with ETC1S and a universal texture format at CppCon 2016:

Just in case there's any confusion, we shipped our first ETC1S encoder to Netflix early last year, and developed all the universal stuff from 2016 to early 2018.

Sunday, July 8, 2018

Basis status update

I sent this as a reply to someone by email, but it makes a good blog post too. Here's what Basis does today (i.e. this is what we ship for OSX/Windows/Linux):

1. RDO BC1-5: Like crunch's, slower but higher quality/smaller files (supports up to 32K codebooks, LZ-specific RDO optimizations - crunch is limited to only 8K codebooks, no LZ RDO)

This competes against crunch's original BC1-5 RDO solution, which is extremely fast (I wrote it for max speed) but lower quality for the same bitrate. The decrease in bitrate for the same quality completely depends on the content and the LZ codec you use, but it can be as high as 20% according to one large customer. On the other hand, for some textures it'll only be a few percent.

crunch's RDO is limited to 8K codebooks, so Basis can be used where crunch cannot due to quality concerns.

Some teams prefer fast encoding at lower quality, and some prefer higher quality (especially on normal maps) at lower speed. We basically gave away the lower quality option in crunch.

2. RDO ETC1: Up to 32K codebooks, no LZ-specific RDO optimizations yet.
Crunch doesn't support ETC1 RDO.
You could compress to ETC1 .CRN, then unpack that to .KTX, to give you a "poor man's" equivalent to direct ETC1 RDO, but you'll still be limited to 8K codebooks max (which is too low quality for many normal maps and some albedo textures).

3. .basis: universal (supports transcoding to BC1-5, BC7, PVRTC1 4bpp opaque, ETC1, more formats on the way)
crunch doesn't support this.
We provide all of the C++ decoder/transcoder source code, which builds using emscripten.

.basis started as a custom ETC1 system we wrote for Netflix, then I figured out how to make it universal. Note that I recently open sourced the key ETC1S->BC1 transcoding technique in crunch publicly (to help the Khronos universal GPU texture effort along by giving them the key info they needed to implement their own solution):

4. Non-RDO BC7: superior to ispc_texcomp's. Written in ispc.

I'm currently working on RDO BC7 and better support for PVRTC. We are building a portfolio of encoders for all the formats, as fast as we can. We're going to keep adding encoders over the next few years.

Our intention is not to compete against crunch (that's commercial suicide). I put a ton of value into crunch, and after Alexander optimized .CRN more its value went through the roof. A bunch of large teams are using it on commercial products because it works so well.


Sunday, June 17, 2018

PVRTC encoding examples

This is "testpat.png", which I got somewhere on the web. It's a surprisingly tricky image to encode to PVRTC. The gradients, various patterns, the transitions between these regions and even the constant-color areas are hard to handle in PVRTC. (Sorry, there is lena in there. I will change this to something else eventually.)

Note my encoder used clamp addressing for both encoding and decoding but PVRTexTool used wrap (not that it matters with this image). Here's the .pvr file for testpat.

Original

BC1: 47.991 Y PSNR

PVRTexTool "Best Quality": 41.943 Y PSNR

Experimental encoder (bounding box, precomputed tables, 1x1 block LS): 44.914 Y PSNR

Here's delorean (resampled to .25 original size):

Original

BC1: 43.293 Y PSNR, .997308 Y SSIM

PVRTexTool "Best Quality": 40.440 Y PSNR, .996007 Y SSIM

Experimental encoder: 42.891 Y PSNR, .997021 Y SSIM

Interestingly, on delorean you can see that PVRTC's handling of smooth gradients is clearly superior vs. BC1 with a strong encoder.

Here's xmen_1024:

Original

BC1: 37.757 Y PSNR, .984543 Y SSIM
BC1 (AMD Compressonator quality=1): 37.306 Y PSNR, .978997 Y SSIM

PVRTexTool "Best Quality": 36.762 Y PSNR, .976023 Y SSIM

Experimental encoder: 37.314 Y PSNR, .9812 Y SSIM

"Y" is REC 709 Luma, SSIM was computed using OpenCV. The images marked "BC1" were compressed using crunch (uber quality, perceptual mode), which is a bit better than AMD Compressonator's output.

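For readers who want to reproduce "Y PSNR" figures like the ones above, here is a minimal sketch of a Rec. 709 luma PSNR computation. It assumes interleaved 8-bit RGB input and applies the luma weights directly to the stored values without rounding. The tool that produced the numbers in this post may differ in those details, so treat this as an approximation rather than the exact metric used.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Rec. 709 luma from 8-bit R, G, B (weights applied to the stored values).
inline double luma709(uint8_t r, uint8_t g, uint8_t b)
{
    return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Y PSNR in dB between two interleaved RGB images of num_pixels pixels each.
// Returns +infinity when the luma of both images is identical.
inline double luma_psnr(const uint8_t* a, const uint8_t* b, std::size_t num_pixels)
{
    assert(num_pixels > 0);
    double sse = 0.0;
    for (std::size_t i = 0; i < num_pixels; ++i)
    {
        const double ya = luma709(a[i * 3], a[i * 3 + 1], a[i * 3 + 2]);
        const double yb = luma709(b[i * 3], b[i * 3 + 1], b[i * 3 + 2]);
        sse += (ya - yb) * (ya - yb);
    }
    const double mse = sse / static_cast<double>(num_pixels);
    if (mse == 0.0)
        return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10((255.0 * 255.0) / mse);
}
```
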
Tuesday, June 12, 2018

Real-time PVRTC encoding for a universal GPU texture format system

Here's one way to support PVRTC in a universal GPU texture format system that transcodes from a block based format like ETC1S.

First, study this PVRTC code:
https://bitbucket.org/jthlim/pvrtccompressor/src/default/PvrTcEncoder.cpp

Unfortunately, this library has several key bugs, but its core texture encoding approach is sound for real-time use.

Don't use its decompressor; it's not bit accurate vs. the GPU and doesn't unpack alpha properly. Use this "official" decoder as a reference instead:

https://github.com/google/swiftshader/blob/master/third_party/PowerVR_SDK/Tools/PVRTDecompress.h

Function EncodeRgb4Bpp() has two passes:

1. The first pass computes RGB(A) bounding boxes for each 4x4 block: 

    for(int y = 0; y < blocks; ++y) {
        for(int x = 0; x < blocks; ++x) {
            ColorRgbBoundingBox cbb;
            CalculateBoundingBox(cbb, bitmap, x, y);
            PvrTcPacket* packet = packets + GetMortonNumber(x, y);
            packet->usePunchthroughAlpha = 0;
            packet->SetColorA(cbb.min);
            packet->SetColorB(cbb.max);
        }
    }
Most importantly, SetColorA() must floor and SetColorB() must ceil. Note that the alpha version of the code in this library (function EncodeRgba4Bpp()) is very wrong: it assumes alpha 7=255, which is incorrect (it's actually (7*2)*255/15 or 238). 
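The floor/ceil requirement and the alpha arithmetic can be sketched as follows. This is only an illustration, assuming endpoints are expanded to 8 bits by bit replication. The helper names (`expand_to_8`, `quantize_floor`, `quantize_ceil`, `expand_alpha3_to_8`) are hypothetical and are not part of the library linked above.

```cpp
#include <cassert>

// Expands a 4- or 5-bit endpoint component to 8 bits by bit replication.
inline int expand_to_8(int v, int bits)
{
    assert(bits == 4 || bits == 5);
    return (v << (8 - bits)) | (v >> (2 * bits - 8));
}

// Largest code whose 8-bit expansion is <= c (what the low endpoint needs).
inline int quantize_floor(int c, int bits)
{
    assert(c >= 0 && c <= 255);
    int q = c >> (8 - bits);
    if (expand_to_8(q, bits) > c)
        --q;
    return q;
}

// Smallest code whose 8-bit expansion is >= c (what the high endpoint needs).
inline int quantize_ceil(int c, int bits)
{
    assert(c >= 0 && c <= 255);
    int q = c >> (8 - bits);
    if (expand_to_8(q, bits) < c)
        ++q;
    return q;
}

// 3-bit alpha is widened to 4 bits with a zero low bit, then replicated to
// 8 bits, so the maximum 3-bit value 7 becomes 14 * 17 = 238 rather than 255.
inline int expand_alpha3_to_8(int a3)
{
    assert(a3 >= 0 && a3 <= 7);
    const int a4 = a3 << 1;
    return (a4 << 4) | a4;
}
```

Flooring the low endpoint and ceiling the high one keeps the decoded endpoints outside (or on) the block's bounding box, so every pixel can still be reached by interpolating between them.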

This pass can be done while decoding ETC1S blocks during transcoding. The endpoint/modulation values need to be saved to a temporary buffer.

It's possible to swap the low and high endpoints and get an encoding that results in less error (I believe because the endpoint encoding precision of blue isn't symmetrical - it's 4/5 not 5/5), but you have to encode the image twice so it doesn't seem worth the trouble.

2. Now that the per-block endpoints are computed, you can compute the per-pixel modulation values. This function is quite optimizable without requiring vector code (which doesn't work on the Web yet):

for(int y = 0; y < blocks; ++y) {
    for(int x = 0; x < blocks; ++x) {
        const unsigned char (*factor)[4] = PvrTcPacket::BILINEAR_FACTORS;
        const ColorRgba<unsigned char>* data = bitmap.GetData() + y * 4 * size + x * 4;

        uint32_t modulationData = 0;

        for(int py = 0; py < 4; ++py) {
            const int yOffset = (py < 2) ? -1 : 0;
            const int y0 = (y + yOffset) & blockMask;
            const int y1 = (y0+1) & blockMask;

            for(int px = 0; px < 4; ++px) {
                const int xOffset = (px < 2) ? -1 : 0;
                const int x0 = (x + xOffset) & blockMask;
                const int x1 = (x0+1) & blockMask;

                const PvrTcPacket* p0 = packets + GetMortonNumber(x0, y0);
                const PvrTcPacket* p1 = packets + GetMortonNumber(x1, y0);
                const PvrTcPacket* p2 = packets + GetMortonNumber(x0, y1);
                const PvrTcPacket* p3 = packets + GetMortonNumber(x1, y1);

                ColorRgb<int> ca = p0->GetColorRgbA() * (*factor)[0] +
                                   p1->GetColorRgbA() * (*factor)[1] +
                                   p2->GetColorRgbA() * (*factor)[2] +
                                   p3->GetColorRgbA() * (*factor)[3];

                ColorRgb<int> cb = p0->GetColorRgbB() * (*factor)[0] +
                                   p1->GetColorRgbB() * (*factor)[1] +
                                   p2->GetColorRgbB() * (*factor)[2] +
                                   p3->GetColorRgbB() * (*factor)[3];

                const ColorRgb<unsigned char>& pixel = data[py*size + px];
                ColorRgb<int> d = cb - ca;
                ColorRgb<int> p{pixel.r*16, pixel.g*16, pixel.b*16};
                ColorRgb<int> v = p - ca;

                // PVRTC uses weightings of 0, 3/8, 5/8 and 1
                // The boundaries for these are 3/16, 1/2 (=8/16), 13/16
                int projection = (v % d) * 16;
                int lengthSquared = d % d;
                if(projection > 3*lengthSquared) modulationData++;
                if(projection > 8*lengthSquared) modulationData++;
                if(projection > 13*lengthSquared) modulationData++;

                modulationData = BitUtility::RotateRight(modulationData, 2);

                factor++;
            }
        }

        PvrTcPacket* packet = packets + GetMortonNumber(x, y);
        packet->modulationData = modulationData;
    }
}

The code above interpolates the endpoints in full RGB(A) space, which isn't necessary. You can sum each channel into a single value (like Luma, but just R+G+B), interpolate that instead (much faster in scalar code), then decide which modulation values to use in 1D space. Also, you can unroll the innermost px/py loops using macros or whatever.
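A rough sketch of the 1D variant described in the previous paragraph is shown below. It assumes each block's endpoints have already been reduced to a single R+G+B sum and that the four bilinear weights add up to 16, as in the library code. The function names are hypothetical, and the thresholds are the same 3/16, 8/16 and 13/16 boundaries used above.

```cpp
#include <cassert>
#include <cstdint>

// Bilinearly weights the R+G+B endpoint sums of the four neighbouring blocks.
// The result is scaled by 16 because the weights add up to 16.
inline int interpolate_sum(const int (&sums)[4], const unsigned char (&factor)[4])
{
    assert(factor[0] + factor[1] + factor[2] + factor[3] == 16);
    return sums[0] * factor[0] + sums[1] * factor[1] +
           sums[2] * factor[2] + sums[3] * factor[3];
}

// Picks a 2-bit modulation value (0..3) in 1D.
// pixel_sum is the pixel's R+G+B; ca16 and cb16 are the interpolated low and
// high endpoint sums, already scaled by 16 (see interpolate_sum).
inline uint32_t select_modulation_1d(int pixel_sum, int ca16, int cb16)
{
    const int64_t d = static_cast<int64_t>(cb16) - ca16;
    const int64_t v = 16 * static_cast<int64_t>(pixel_sum) - ca16;
    // 64-bit math: the products can exceed the range of a 32-bit int.
    const int64_t projection = v * d * 16;
    const int64_t length_squared = d * d;

    uint32_t modulation = 0;
    if (projection > 3 * length_squared) ++modulation;
    if (projection > 8 * length_squared) ++modulation;
    if (projection > 13 * length_squared) ++modulation;
    return modulation;
}
```

Compared with the full RGB version, this replaces two three-component interpolations and two dot products per pixel with a couple of scalar multiplies.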

Encoding from ETC1S simplifies things somewhat because, for each block, you can precompute the R+G+B values to use for each of the 4 possible input selectors.
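As an illustration of that precomputation, here is a small sketch that builds the four R+G+B sums for one ETC1S block. It assumes the block is described by a 5:5:5 base color and one intensity table index, and it orders the ETC1 modifier tables from darkest to brightest. The raw ETC1 selector bits use a different order, so a remap is assumed to happen elsewhere. The struct and function names are hypothetical.

```cpp
#include <cassert>

// ETC1 intensity modifier tables, ordered from darkest to brightest.
static const int kEtc1Modifiers[8][4] = {
    {   -8,  -2,  2,   8 }, {  -17,  -5,  5,  17 },
    {  -29,  -9,  9,  29 }, {  -42, -13, 13,  42 },
    {  -60, -18, 18,  60 }, {  -80, -24, 24,  80 },
    { -106, -33, 33, 106 }, { -183, -47, 47, 183 }
};

inline int clamp255(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

// R+G+B of the decoded color for each of the 4 selectors in one block.
struct SelectorSums { int sum[4]; };

// r5/g5/b5 are the 5-bit base color components, table is the intensity index.
inline SelectorSums precompute_selector_sums(int r5, int g5, int b5, int table)
{
    assert(table >= 0 && table < 8);
    assert(r5 >= 0 && r5 < 32 && g5 >= 0 && g5 < 32 && b5 >= 0 && b5 < 32);

    // Expand 5-bit components to 8 bits by bit replication.
    const int r = (r5 << 3) | (r5 >> 2);
    const int g = (g5 << 3) | (g5 >> 2);
    const int b = (b5 << 3) | (b5 >> 2);

    SelectorSums out;
    for (int s = 0; s < 4; ++s)
    {
        const int m = kEtc1Modifiers[table][s];
        // Each channel is clamped separately, as in ETC1 decoding.
        out.sum[s] = clamp255(r + m) + clamp255(g + m) + clamp255(b + m);
    }
    return out;
}
```

With the table built once per block, the per-pixel work in the modulation pass reduces to a lookup by selector instead of decoding and summing three channels.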

That's basically it. If you combine this post with my previous one, you've got a nice real-time PVRTC encoder usable in WebAssembly/asm.js (i.e. it doesn't need vector ops to be fast). Quality is surprisingly good for a real-time encoder, especially if you add the optional 3rd pass described in my other post.

Opaque is tougher to handle, but the basic concepts are the same.

The encoder in this library doesn't support punch-through alpha, which is quite valuable and easy to encode in my testing.