Wednesday, May 18, 2016

A consultant's perspective on working full-time for a corporation

I've been working as a software consultant at Binomial LLC (a company Stephanie Hurlburt and I just started) for just long enough that my perspective has begun to shift away from my former full-time mentality. I'm obviously no expert at this, but I'm learning as fast as possible, and I intend to share everything I learn.

Let's compare what it's like to work full-time somewhere from a consultant's perspective. This is something I couldn't do before, and few if any of my coworkers ever talked about things like this:

Note: Obviously, not all corporations are bad places to work. I'm very much generalizing here from multiple past experiences at medium (say >50 people) to large corporations. There are some very nice companies out there too.

- Corporations are Control Freaks


With full-timing, your "client" (your employer) is extremely controlling, even in open office and "no manager" type companies.

Your control fetish "client" probably requires you to commute every day, sit in an office somewhere in a noisy non-optimal environment, and bang out code like a machine every day of the week for weeks and months at a time.

Some clients actually encourage the other "consultants" (your coworkers) to use peer pressure on you so you work longer than is sustainable or healthy. Or, they just require you to overwork yourself.

Some clients record and monitor all of your network traffic (or your emails, instant message traffic etc.), monitor when you enter and leave the workplace, etc.

You know what? Working for ultra-controlling clients like this sounds terrifying from my new perspective.

- Corporations and recruiters work together to suppress wages and opportunities


I know many programmers who are completely locked up within the corporate full-timing paradigm. When they inevitably lose their jobs (due to layoffs, random purges, teams or companies failing, etc.) they are thrown to the full-timing wolves that orbit all the major tech companies.

Now you need to put yourself on the market, start talking with recruiters, and dust off your elite whiteboarding skills - fast.

With consulting, a good goal is to build up a community of potential clients over time and keep this community as happy as possible. You're not dependent on any single client this way, and you gain access to more potential projects this way. Price fixing is unlikely if your client community is diverse.

- Corporations can harbor, even encourage Toxic Teams


A few years ago at one company, I got tossed onto a horribly abusive, brutal, and kinda nutty team right out of the starting gate. That was a super intense, Alice in Wonderland type experience that I will never repeat. (Why did I stay on this team for around a year, even though I should have left? Because I loved the company's products and this team was trying to build a new engine for future products. I eventually moved to another team that was actually shipping something real.)

With consulting, we can first sign a short-term contract (from days to a few months depending on the scope) to feel each other out. If it doesn't work out, no big deal; the damage (to both parties) is minimized. Working for a new company as a consultant isn't the stressful, life-changing event that changing full-time jobs is.

- Corporations Demand Intense and Extreme Loyalty


These "clients" want to basically own your life.

Working full-time at a corporation, you negotiate, then sign a contract. That's usually it. Truly renegotiating can be tough, because to have a real negotiation you must be willing to leave. This is a type of client that demands total absolute loyalty! Many corporations even claim ownership over all your work, even things you do on your own time at home.

As a consultant, you can potentially renegotiate in between every project. It's not that big of a deal.

- Corporations are Work-A-Holics


At a corp, if I want to take time off I must check to see if I'm allowed first. At many places you only get a handful of weeks off once every year. If you get sick, or have some life changing event you need to deal with, you're potentially screwed.

This seems analogous to a "toxic client" in consulting, one that is super jealous and extremely demanding of your time! 

- Corporations limit your freedom


At one of my previous companies, I could theoretically wheel my desk anywhere and pick and choose what team I worked with. In practice, office politics and super-cliquish teams massively constrained this freedom. At another company, I was stuck on the same team basically forever, because the org structure had ossified around us over many years.

As a consultant I can pick and choose who I want to work with. Obviously, I tend to choose nicer clients, because at the end of the day this is a social process and I personally like working with friendly and no-ego teams. My market of teams to work with is potentially massive now. I can wheel my little virtual desk anywhere in the world.

Friday, May 13, 2016

More company culture quotes from Disrupted

Interesting quotes from Disrupted: My Misadventure in the Start-Up Bubble (which I'm only 1/3rd through):
"I've been warned that at a place like HubSpot the worst thing you can say is anything that was done at your last company is something we should think about doing here. Even if your last company was Google or Apple, nobody at HubSpot wants to be told, especially by some newcomer, some outsider, that there might be a better way. HubSpot is HubSpot. It's unique. It's different. HubSpot has its own way of doing things. We're rethinking everything. We're challenging all the assumptions. We're not just making software, we're reinventing the way companies do business."
A funny take on HubSpot's usage of the DISC personality test:
"Managers, people like Zack, get the same training that I'm getting, but then they go to an extra class where they learn how to use DISC when they are managing people. Try to imagine the calamity of that: Zack, age twenty-eight, with no management experience, gets training from Dave, a weekend rock guitarist, on how to apply a set of fundamentally unsound psychological principles as a way to manipulate the people who report to him."
It's crazy what people will do for money.

Thursday, April 28, 2016

Hiring Group Dynamics

There are several interesting hiring-related phenomena I've seen at various companies. I think some of the most exaggerated hiring behavior emerges at "flat" companies with yearly bonuses based (partially) on the data gathered during peer feedback.

Here's a description of one category of emergent behavior I noticed when the programmers have nearly free rein over the hiring process and who ultimately gets hired:

Want a good bonus? Never hire new competition!

You would think the programmers doing the hiring would always be fair and unbiased in their assessments of each candidate's abilities, right? And they would, of course, always optimize for adding value to the company by making good hires. The company programmers involved in the process would choose good candidates for each opening, irrespective of politics, or concerns over their future positions or bonuses, etc.

In practice, I think especially at companies with massive yearly bonuses, the company's programmers will band together unofficially and make it practically impossible for potential competition to enter the company, make waves, and possibly eclipse the old guard. We have a classic conflict of interest situation here. This tendency to embargo your competition is especially effective when hiring specialists, such as graphics programmers, although I've seen it happen regardless of specialty.

At one well-known company, I watched around a dozen experienced graphics programmers get rejected in our interview process. Each time, without exception it was a NO HIRE, even though we were in dire need of graphics programmers. A few of the names were pretty well known in graphics circles, so my jaw dropped after several of these NO HIRE interviews.

I was involved in some of these interviews. Almost every time, these candidates would do sometimes incredible things during the whiteboard interview, but somehow one or two graphics programmers would always find some other reason to be thumbs down. (I didn't say anything at the time, because I was afraid doing so would have made enemies and hurt my career at this company. I was basically incentivized to say nothing by the peer-feedback-based bonus system.)

Eventually, upper management quietly noticed that despite our company's dire need of graphics engineers, we weren't hiring any. This company had a major threat to its primary profit-generating product looming in its future, and the counter to this competitive threat involved some very specialized graphics engineering. The CEO had to step in, subvert the completely broken hiring process, and just start hiring graphics contractors almost sight unseen.

Unfortunately, these graphics contractors had virtually no path to full-time employment, so they got treated like 3rd class citizens at best and all were eventually pushed out. (Sometimes years later, even after delivering massive amounts of value to some teams.)

Anyhow, how do I know all this stuff? At this particular company, I somehow fell through the cracks and was interviewed and hired as a generalist programmer, not a graphics specialist. Eventually, the old graphics guard basically got lazy and shied away from the company's toughest graphics problems (or actually shipping anything involving new graphics code), but somebody had to do this "dirty" graphics work. The non-graphics programmers figured things out and started sending graphics work my way, and I started asking myself "why are we not hiring any graphics programmers?!"

Unfortunately I'm terrible at saying "no" to requests for help, so this resulted in a lot of work.

Turns out, that refined, "fair" hiring machine that management was so proud of was a total joke.

Sunday, April 10, 2016

Tips for Interviewing at Software Companies

Here's another blog post in the "Rich goes off the rails and reveals a bunch of shit he's learned over the years while working as a corporate programmer" category.

Companies Must be Continually Reminded that the Interview Goes Both Ways


Many corps have an internal company culture that places the company in a position of superiority relative to job candidates. These companies feel they can choose whoever they want from a seemingly endless supply of potential employees, so who cares how candidates are treated, right? The reasoning goes "we'll just hire someone else" if a candidate pushes back.

We need to collectively turn the tables on companies like this. Let's give them a powerful form of feedback. Let's exercise our right to "route around" bad companies and not apply or accept job offers from corporations that act like we are replaceable cogs. (Alternately, let's all talk between ourselves and collectively compare notes and boost our compensation rates and working conditions. They can't stop us!)

I'm hoping this blog post will help make people more able to discern the good companies from the bad ones. For the record, I do believe there are many good companies out there, but for every good company there seems to be a bunch of bad ones.

Remember: We Write the Software Which Runs the System


Here's a key concept to internalize: We write the software that literally drives this entire system. Food production and distribution, electricity production and distribution, telecommunications, government, finance, trucking, planes, etc. It's all run by computers one way or the other, and we write the code that makes these computers work. Without computers this entire system crumbles into the dark ages.

As time goes by more and more of the system is becoming automated and computerized. This means we as programmers collectively have the power in these relationships with corporations, but we haven't effectively organized ourselves or figured out how to best exercise our power yet. We now have the technology to instantly communicate between ourselves, which if we all start using it can lead to massive changes in a relatively short period of time.

Interview Tips


Some corps are exquisitely designed to extract as much "value" from you as quickly as possible, your health and sanity be damned. Above all, I want to help other programmers avoid places like this. When applying for a position at a software corp, keep these things in mind:

- Follow your instincts.

Ask a lot of questions. Learn how to interpret body language. Are you treated with respect? Are your questions answered in a straightforward way?

Remember, the hiring pipelines of these companies are tuned to take advantage of the macro-level psychological profile of "typical" programmers. Get educated, fast. These companies are not your friends. They will try to get into your brain and "bend" you psychologically in order to make you conform and "fit in" to their brand of corporate utopia.

Trust your gut feelings during the interview! If you feel disrespected or not taken seriously, don't ignore it. It's not just in your head. Run away! You won't grow there, and it'll be a dehumanizing place.

- "You need us to achieve success, you are nothing so follow us!"

Run away fast! I've seen this tactic applied against dozens of developers after one company collapsed in Dallas. Sadly, it worked with a bunch of people and they ultimately all got screwed.

- Deeply analyze any critique given to you during the interview

Sometimes critique that seems merit-based is really bias in disguise. It's very important that if you get bad feedback, you stand back and think "Is this true? Or is there just something wrong with this company (elitism, sexism, they just didn't want to hire you, etc)?"

- How much is your time worth?

Ultimately, you are selling your time for digital digits in some bank computer somewhere. You will not get this time back, period. It's worth a lot, probably much more than you think.

How much income does the company actually make given your time? Some companies make millions of dollars per software developer, yet pay only a fraction of this to you.

Remember, this is a market and market principles apply here. By increasing our communication levels and giving feedback to the market (by routing around bad companies, demanding higher pay during negotiations, pushing back during interviews, etc.) we can collectively raise our salaries, compensation packages, and improve our working conditions.

- Are you gambling your time away trying to get your stock "lottery ticket"?

In this scenario, you're willing to be underpaid, but you're hoping the company will sell out in X months or years and make you millions. Just beware that this is a form of gambling with your time and finances. 

I've seen a few companies exquisitely exploit and continually encourage this "gambler mindset" in their workers in order to suppress wages. Be careful!

- Admit that you probably suck at negotiating

Generally, in my experience programmers make horrible negotiators. The most important thing is to approach the negotiation with the proper mindset. They need you, not vice versa, and they probably have much more money (and the capability to pay you) than you suspect.

This is a topic definitely worthy of another blog post.

- Learn how to recognize negative psychological traits like sociopathy and narcissism.

Some companies are full of sociopathic monsters whose job description (and honestly, their corporate duty) is to exploit you as much as possible.

Learn to recognize the signals. They will try to get into your head, quickly build a mental model of you, then play off your willingness to not rock the boat or be seen as a "troublemaker" to the corporation. They will find subtle ways to threaten you, if needed, to keep you in line.

Narcissists can be especially horrible to work around, especially when they are in management. Learn the signs.

- Beware of code words like "Elite", "10Xer", "no bosses", "scheduled crunch", etc.

"Elite" - Programmers willing to work endlessly in sometimes horrific conditions are labeled "Elite". Yes, we have a very screwed up system here, where people who get exploited and are worked until exhaustion are labeled "Elite". Avoid companies like the plague that use this word in their job descriptions or recruiting emails. 

(Attention recruiters: Please stop sending me emails with the word "elite" anywhere in them. Thanks!)

"10X programmer" - Someone who hacks up some shit (sometimes actually using stolen ideas), and then depends on other practicing programmers to do the actual work of making these (sometimes very sub-optimal) ideas actually shippable. These programmers tend to publicly take full public credit for things they only partially worked on or thought up. We don't need more of these "10x" assholes, instead we need to completely reboot our programmer culture so the very concept of a "10x'er" is totally alien.

"No bosses" - This recruiting shtick from 1999 means there are many powerful bosses in hiding, and/or that everyone is effectively your boss. Also avoid companies that advertise this like the plague. It's a recruiting tactic designed to attract programmers who had bad bosses in the past. (Believe it or not, there are many very good managers out there!) 

Also, without managers, you will be directly exposed to the many wolves out there who can make your programming life a living hell. A good manager will shield their programmers from endless bullshit and insanity.

"Crunching" - This means the company externalizes the cost to your health and sanity of working endless hours. They may have bad planning, or bad business models, whatever. Avoid companies that crunch endlessly like the plague unless you just don't care about your health.

"Scheduled Crunch" or "we occasionally crunch" - Again, any company that schedules crunch just doesn't understand or care how to plan, or is ignoring the cost of working crazy hours on your health.

- Ask: Does the corp own all your work?

Can you work on stuff at home, like open source software or your own commercial software? Avoid companies like the plague that don't let you work on your own stuff.

- Ask or check for insane contract clauses

Is there a non-disparagement clause? If you quit, must you wait X months or whatever before you can work on something else? Push back and avoid companies that do questionable shit like this.

- Who contacted you first?

If the company reached out to you first, did a recruiter contact you, or an actual programmer at the company? 

Ideally, a real-life programmer reached out to you. Only a programmer working at the company can genuinely answer your questions and give you a real idea about the working conditions and types of problems you will be working on there.

Remember, recruiters are just part of the company's "hiring pipeline". The pipeline is basically "X Programmers In -> Y Programmers Out". You are just a replaceable number to these companies. Recruiters will say anything to get you to sign on the dotted line.

- Is there a whiteboard interview?

Is there a whiteboard interview? Push back and say no. I've helped hire at several successful companies (easily over 100 people over the years) that didn't use whiteboard interviews at all. Anyone saying "this is just the way things are done" doesn't have perspective and is part of the serious problems our industry has.

After taking and giving way too many of these whiteboard interviews, I think they are total fucking bullshit. Whiteboard interviews test a candidate's ability to scribble uncompilable pseudo-code on a chalkboard (!) while being faced down by multiple adversarial programmers. (Who in some cases would rather be doing anything else, and who don't want more internal programmer competition in the first place!)

Some companies give programmers whiteboard questions from various books verbatim, like this one. This is just downright ridiculous, a waste of time for everyone involved, and a total demonstration that the process is completely bogus.

Whiteboard interviews are extremely stressful to candidates. I've seen amazing programmers just lock up and become dysfunctional in these conditions. We are testing candidates for the wrong abilities. I refuse to take part in any more of this insanity.

Some programmers use these bogus hazing ritual-like whiteboard interviews to help drive down the applicant's ego while simultaneously driving up their egos. This is a huge red flag -- avoid these programmers and the companies who employ them.

- "We only hire senior programmers"

Let's translate: We don't help train new programmers. The phrase "programmer empathy" isn't on our radar here. We probably treat each other like complete trash. We actually assume you are an idiot until you battle your way into a position of respect. Avoid companies like this!

Remember, all senior programmers at one time were junior programmers.

- What's the company's culture?

Ask lots of questions of people who work there. Is the official company message great, but when you pull programmers aside they actually hate working there? Search the web for reviews of the company, search LinkedIn for former employees, and ask them about the company.

- Talk to the executives

Are they sociopaths? Raging narcissists? Ask them what they look for in programmer candidates, and see how they respond. Do they treat you with respect?

I've known execs who thought programmers were literally crazy, and trust me the companies they ran were not healthy.

- Look around the office

Little things can give you a lot of information. Is the office a mess? How much space and privacy do employees have? Is the environment quiet or loud?

- Does this company give proper attribution for ideas it uses?

I'm throwing this out here because I've noticed one very well known VR company outright steal ideas from its competitors or academics working in the space. Explicitly ask the company about its attribution policy.

Personally, I will never involve myself in any way with people or corporations who outright steal ideas for personal or corporate gain. (I can't believe I have to even say this. We have fallen to the level of stealing and re-branding ideas from each other!)

- Master your fears

What do you fear? Recognize it and look past it, because these companies are designed to exploit your fears and use them against you as a weapon.

Saturday, April 9, 2016

Why I left Unity

I left Unity about a month ago. It was a short, but amazing and enlightening, experience. I felt that my job there was done (or here). My first little contribution to the world of VR.

Instead of pivoting onto some new project or something, I've decided to basically pivot my entire life. Currently, it seems a large subset of the industry has rearranged itself around VR/AR, so I figure my timing couldn't be better. Unsurprisingly, pivoting your entire life isn't easy but I believe it's for the better. Working a full-time software job endlessly bulldozing countless lines of anonymous C/C++ code isn't on my list of happy things to do anymore. Screw that!

I'm now living in Seattle, a city I've only visited like 3 times in my entire 6 years living in Washington. Holy shit, what an amazing city! Bellevue is such a barren wasteland compared to Seattle it's ridiculous. I'm avoiding the eastside completely because the place reminds me of a mental prison. I've got some particularly bad memories there. I can't even look at downtown anymore without thinking of various horrible memories. I'm now a much happier independent consultant here in Seattle.

You know, just thinking here: I must be a bunch of game/software industry execs worst nightmare. "Oh shit, Rich is some crazy ass SOB, he's going to spill all of our endless secrets and damage our hiring and spotless reputations!" Yes, I've seen a lot of shit in my career. A lot of it the world should know about -- someday. If it makes you feel any better, I'm almost as public as I possibly can be, and I'm staying that way. I really love blogging.

I've been thinking a lot about the word "counterculture", especially in relation to the software industry. I know exactly what the "mainstream", or "normal" software developer culture is. The corporate software development and game industry culture I've seen so far is immature, exploitative, abusive, and downright dehumanizing. Where are the alternative software developer cultures?

Imagine the kinds of amazing new software that could be created in alternative cultural environments. To really improve this industry we need to upgrade our culture, not our hardware, comp-sci curriculum, or office arrangements. We need to fix the root problem, which is to escape from this insane mainstream culture and create something healthier.

Sunday, March 27, 2016

Quotes from "My Year In Startup Hell"

I loved this article on Fortune.com (excerpted from "Disrupted: My Misadventure in the Start-Up Bubble"). It covers so many corporate company culture things I've seen or experienced in my game development career in a single article. The article is from a startup perspective, but much of this applies to many other more mature companies. These quotes especially struck a chord with me:
"Arriving here feels like landing on some remote island where a bunch of people have been living for years, in isolation, making up their own rules and rituals and religion and language—even, to some extent, inventing their own reality. This happens at all organizations, but for some reason tech startups seem to be especially prone to groupthink. Every tech startup seems to be like this. Believing that your company is not just about making money, that there is a meaning and a purpose to what you do, that your company has a mission, and that you want to be part of that mission—that is a big prerequisite for working at one of these places."
On people that get fired:
"Dharmesh’s culture code incorporates elements of HubSpeak. For example, it instructs that when someone quits or gets fired, the event will be referred to as “graduation.” In my first month at HubSpot I’ve witnessed several graduations, just in the marketing department. We’ll get an email from Cranium saying, “Team, just letting you know that Derek has graduated from HubSpot, and we’re excited to see how he uses his superpowers in his next big adventure!” Only then do you notice that Derek is gone, that his desk has been cleared out. Somehow Derek’s boss will have arranged his disappearance without anyone knowing about it. People just go up in smoke, like Spinal Tap drummers."
On what I call "Reality Shaping":
"The ideal HubSpotter is someone who exhibits a quality known as GSD, which stands for “get shit done.” This is used as an adjective, as in “Courtney is always in super-GSD mode.” The people who lead customer training seminars are called inbound marketing professors and belong to the faculty at HubSpot Academy. Our software is magical, such that when people use it—wait for it—one plus one equals three. Halligan and Dharmesh first introduced this alchemical concept at HubSpot’s annual customer conference, with a huge slide behind them that said “1 + 1 = 3.” Since then it has become an actual slogan at the company. People use the concept of one plus one equals three as a prism through which to evaluate new ideas. One day Spinner, the woman who runs PR, tells me, “I like that idea, but I’m not sure that it’s one-plus-one-equals-three enough.”
This is so true:
"Another thing I’m learning in my new job is that while people still refer to this business as the “tech industry,” in truth it is no longer really about technology at all. “You don’t get rewarded for creating great technology, not anymore,” says a friend of mine who has worked in tech since the 1980s, a former investment banker who now advises startups. “It’s all about the business model. The market pays you to have a company that scales quickly. It’s all about getting big fast. Don’t be profitable, just get big."
 Yup:
"On top of the fun stuff you create a mythology that attempts to make the work seem meaningful. Supposedly millennials don’t care so much about money, but they’re very motivated by a sense of mission. So, you give them a mission. You tell your employees how special they are and how lucky they are to be here. You tell them that it’s harder to get a job here than to get into Harvard and that because of their superpowers they have been selected to work on a very important mission to change the world. You make a team logo. You give everyone a hat and a T-shirt. You make up a culture code and talk about creating a company that everyone can love. You dangle the prospect that some might get rich."
Umm yea I know the feeling:
"Training takes place in a tiny room, where for two weeks I sit shoulder to shoulder with 20 other new recruits, listening to pep talks that start to sound like the brainwashing you get when you join a cult. It’s everything I ever imagined might take place inside a tech company, only even better."
On the office environment:
"Everyone works in vast, open spaces, crammed next to one another like seamstresses in Bangladeshi shirt factories, only instead of being hunched over sewing machines people are hunched over laptops. Nerf-gun battles rage, with people firing weapons from behind giant flat-panel monitors, ducking and rolling under desks. People hold standing meetings and even walking meetings, meaning the whole group goes for a walk and the meeting takes place while you’re walking."
Personally, I've learned over the years that I really need to live solidly in reality. I can't switch back and forth from "Corporate Enforced Reality #57" to "Real-World Ground-Level Reality" every single day of the week and stay happy and healthy long term.

I've learned that I don't need to associate myself with any corporation to be a "real" developer. If you aren't treated with respect by someone because you aren't associated with a company label, that person may very well not be a person you want to associate yourself with.

Here's a little rant: I think having to rigidly conform to a corporate mental model (like the insane one described at the company above) to earn money is demeaning and even dehumanizing. Folks, 1+1 does not equal 3. This is insane.

Somewhere back in time I must have fallen into some jacked parallel universe where treating workers like utterly replaceable mind controllable automatons is normal, accepted, encouraged, and even something to be proud of. In the universe of 1984, 1+1=3.

And the money the company gives you? It's just binary bits in some bank computer. There are plenty of ways of making money that don't involve becoming mentally insane. Trading off sanity for a bi-weekly set of binary digits added to your account balance actually isn't the greatest idea in my experience. Staying at a job you are unhappy with thinking that, eventually, you will achieve true long lasting happiness during "retirement" is also a pretty extreme way of living life. There are other paths to happiness.

My brain now pushes back when I think about living and working this way again. It basically says "hey that's super unhealthy and unsustainable!". I used to be irrationally fearful of working outside of a corporate bubble. Fear is your worst enemy and can make you much more manipulatable to others.

A long time ago, at Ensemble Studios in Dallas, I was completely wrapped up in our company's special little super-insular "tribal" company culture. The company collapsed overnight and we all learned what had actually been occurring at the corporate level for the previous 6-9 months. It became super clear that we had all been living in a fairy-tale, corporate-enforced reality bubble. Even many of my "company friends" evaporated overnight into non-friends. It was ugly: even the very personalities of formerly awesome coworkers instantly changed.

My ego was tied solidly into this company's culture and products. After it collapsed I had to carefully hit the "ego reset button". I had to strongly resist automatically following the locally exploitative paths laid out for me after Ensemble collapsed.

Monday, January 18, 2016

Compression ratio plots of zlib competitors Brotli and BitKnit

Continuing yesterday's post, here are the compression ratios for all 12k data points (files between 256 bytes and 128 MB) in the LZHAM vs. LZMA test file corpus, for various codecs: LZ4HC L8, Brotli L9, Rad's BitKnit VeryHigh, LZHAM m4, and of course zlib L9.

Vertical Axis = Compression ratio (higher is more compression)
Horizontal Axis = File, sorted purely by zlib's compression ratio
Color = Codec (using the same color coding as my previous post)

The data points have been sorted by zlib's compression ratio, which is why the green line is so nice and smooth. These are the same data points as yesterday's scatter graphs.


LZ4HC vs. BitKnit vs. zlib:



LZ4HC vs. Brotli vs. zlib:



Brotli vs. BitKnit vs. zlib:



Here are a couple bonus plots, this time for LZHAM vs. LZ4HC or Brotli:



Comments:


It's clear from looking at these plots that simply stating "codec X has a higher ratio than codec Y" is at best a gross approximation. It depends highly on the file's content, the file's size, and even how well a file's content resembles what the codec's designers tuned it to handle best.

For example, Brotli has special optimizations (such as a precomputed static dictionary) for textual data. Also, like zlib, it uses entropy coding tables (the Huffman symbol code lengths) precomputed by the compressor, which can give it the edge on smaller files vs. codecs that use purely adaptive table updating approaches like LZHAM. (Also, I would imagine that the more stationary the data source, the more precomputed Huffman tables make sense.)

Another advantage Brotli shares with zlib due to its usage of precomputed Huffman tables: It doesn't need to spend valuable CPU time computing Huffman code lengths during decompression. LZHAM struggles to do this quickly, particularly on small files (less than approx. 4-8KB) where most decompression time is spent computing Huffman code lengths (and not actually decompressing the file!).
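To make that per-block cost concrete, here is a minimal Python sketch of the canonical code assignment that Deflate-family decompressors perform each time a new set of code lengths arrives (this is the algorithm from RFC 1951, section 3.2.2; the function name and test data are illustrative, not taken from any codec discussed above):

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codes from per-symbol code lengths
    (RFC 1951, section 3.2.2). lengths[sym] == 0 means 'symbol unused'."""
    max_len = max(lengths)
    # 1) Count how many codes exist of each length.
    bl_count = [0] * (max_len + 1)
    for l in lengths:
        if l:
            bl_count[l] += 1
    # 2) Compute the smallest code value for each length.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # 3) Assign consecutive code values to symbols in symbol order.
    codes = {}
    for sym, l in enumerate(lengths):
        if l:
            codes[sym] = next_code[l]
            next_code[l] += 1
    return codes

# RFC 1951's own example: symbols A..H with lengths (3,3,3,3,3,2,4,4)
# yield F=00, A=010 ... H=1111.
codes = canonical_codes([3, 3, 3, 3, 3, 2, 4, 4])
```

A decompressor that receives precomputed lengths (zlib, Brotli) runs this once per block; a semi-adaptive codec like LZHAM has to rebuild tables as its statistics update, which is where the small-file overhead comes from.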

It's also possible to design a codec to be very strong at handling binary files. Apparently, BitKnit is tuned more in this direction. It still handles text files well, but it makes some intelligent design tradeoffs that favor really high and symmetrical compression/decompression performance with only a small sacrifice to text file ratios. This tradeoff makes a lot of sense, particularly in the game development context where a lot of data files are in various special binary formats.

Interestingly, Brotli and BitKnit seem to flip flop back and forth as to who is best ratio-wise. There are noticeable clusters of files where Brotli is slightly better, then clusters with BitKnit. I'll be analyzing these clusters soon to attempt to see what's going on. I believe this helps show that this data file corpus has a decent amount of interesting variety.

Finally, Brotli's compression ratio is just about always at least as good as zlib's (or extremely close). IMO, the Brotli team's Zopfli roots are showing strongly here.

Next Steps:


I need to break down these ratio graphs into clusters somehow, so we can show results for "small text files" vs. "large binary files", etc. As yesterday's compressed totals showed, Brotli and BitKnit have approximately equal compression power across the entire corpus, but there are categories of data where one codec is better than the other.

Looking into the future, it may be a good idea for the next major compressor to support both precomputed (Brotli-style) and semi-adaptive (LZHAM and presumably BitKnit-style) entropy table updating approaches.

Thanks to Blue Shift's CEO, John Brooks, for suggesting to chart this way.

Sunday, January 17, 2016

zlib in serious danger of becoming obsolete

Intro


I’m now starting to deeply analyze the performance of two new general purpose data compression codecs. One is Google’s Brotli codec, another is a brand new codec from Rad Game Tools named “BitKnit”. Both codecs are attempting to displace zlib, which is used by the Linux kernel, and is one of the most used compression libraries in the world. So I’m paying very close attention to what’s going on here.

To put things into perspective, in the lossless compression world we’re lucky to see a significant advancement every 5-10 years. Now, we have two independently implemented codecs that are giving zlib serious competition on multiple axes: throughput, ratio, and even code size.

Summary


I’m now using what I think is a very interesting and insightful approach to deeply analyze the practical performance characteristics of lossless codecs. As I learned while working on the Steam Linux/SteamOS project, robust benchmarking can be extremely difficult in practice. So I'm still gathering and analyzing the data, and tweaking how it’s graphed. What I’ve seen so far looks very interesting for multiple reasons.

First, it's looking pretty certain that both BitKnit and Brotli compete extremely well against zlib's decompression performance, but at much higher (LZMA/LZHAM-like) compression ratios. Amazingly, BitKnit’s compressor is also extremely fast, around the same speed as zlib’s. (By comparison, at maximum compression levels, both Brotli’s and LZHAM's compressors are pretty slow.) The graphs in this post only focus on decompression throughput, however. I’m saving the compression throughput analysis for another post.

One rough way of judging the complexity of a compressor vs. others is to compare the number of lines of code in each implementation. BitKnit, at 2,700 lines of code (including comments), is smaller than LZ4 (3,306, no comments), zlib (23,845, no comments, incl. ~3k lines of asm), and LZHAM (11,651, no comments). Brotli's is rather large at 47,919 lines (no comments), but some fraction of this consists of embedded static tables.

Interestingly to me, BitKnit’s decompressor uses around half the temporary work RAM of LZHAM's (16k vs. 34k).

New Benchmarking Approach


While writing and analyzing LZHAM I started with a tiny set (like 5-10) of files for early testing. I spent a huge amount of time optimizing the compressor to excel on large text XML files such as enwik8/9, which are popular in the lossless data compression world. I consider this a serious mistake, so I've been rethinking how to best benchmark these systems.

The new codec analysis approach I’m using runs each decompressor on thousands of test corpus files, then I plot the resulting (throughput, ratio) data pairs in Excel using scatter graphs. The data points are colored by codec, and the points are transparent so regions with higher density (or with data points from multiple overlapping codecs) are more easily visualized. This is far better than what the Squash Compression Benchmark does IMO, because at a single glance you can see the results on thousands of (hopefully interesting) files, instead of the results on only a single file at a time from a tiny set of corpus files.

I generated these scatter graphs on 12k data files from the final LZHAM vs. LZMA corpus. There is some value in using these data files, because I used this same test corpus to analyze LZHAM to ensure it was competitive against LZMA. This corpus consists of a mix of game data, traditional textual data, every other compression corpus I could get my hands on, some artificial XML/JSON/UBJ test data, and lots of other interesting stuff I’ve encountered on the various projects I’ve worked on over the years. (Unfortunately, I cannot publicly distribute the bulk of these data files until someone comes up with an approach that allows me to share the corpus in a one way encrypted manner that doesn’t significantly impact throughput or ratio. I have no idea how this could really be done, or even if it's possible.)

The Data


The X axis is decompression throughput (right=faster), and the Y axis is compression ratio (higher=better ratio or more compression). The very bottom of the graph is the uncompressible line (ratio=1.0).

Color code:

Black/Gray = LZHAM
Red = Brotli
Green = zlib
Blue = BitKnit
Yellow = LZ4

Totals for 11,999 files (including uncompressible files):

Uncomp:   2,499,169,096
lz4:      1,167,777,908 2.14
zlib:     1,044,180,362 2.39
brotli:     918,949,263 2.72
bitknit:    898,621,908 2.78
lzham:      882,723,287 2.83

Totals after removing 1,330 files with a zlib compression ratio less than 1.1 (i.e. uncompressible files):

Uncomp:    2,147,530,394
lz4:         815,221,536 2.63
zlib:        693,090,474 3.1
brotli:      568,461,065 3.78
bitknit:     547,869,148 3.92
lzham:       532,235,143 4.03

This is a log2 log2 plot, basically an overview of the data:


This is a zoomed linear plot, looking more closely at the uncompressible (ratio=1) or nearly uncompressible (ratio very close to but not 1) regions:



This log2 log2 plot is limited to just LZHAM vs. BitKnit:


Finally, another log2 log2 plot showing just BitKnit vs. zlib:



Current Observations


Fabian Giesen (Rad) and I have noticed several interesting things about these scatter plots:

- The data points with a ratio of 1 (or extremely close to 1) show how well the algorithm handles uncompressible data, which is hopefully near memcpy() performance.

(Note LZMA’s data will be very interesting, because it doesn’t have good handling for uncompressible data.)

There are a handful (around 50-60, depending on the codec) of data points with ratios slightly below 1 (0.963-0.999). The majority are small (287 bytes to 1 KB) uncompressible files.

- Slightly "above" this ratio (very close to ratio 1, but not quite), literal handling dominates the decompressor's workload. There are distinct clusters on and near the ratio=1 line for each compressor.

LZHAM actually does kinda well here vs. the others, but it falls apart rapidly as the ratio increases.

- Notice the rough pushed down "<" shape of each algorithm's plot. LZHAM's is pretty noticeable. At the bottom right (ratios at/close to 1.0), literals dominate the decompression time.

Interpreting this as if all algorithms are plain LZ with discrete literals and matches:
As you go "up" to higher ratios, the decompressor has to process more and more matches, which (at least in LZHAM) are more expensive to handle vs. literals. After the “bend”, as you go up and to the right the matches grow increasingly numerous and longer (on average).

- LZHAM has an odd little cluster of data points to the right (on the ratio ~3 line) where it almost keeps up with BitKnit. Wonder what that is exactly? (Perhaps lots of easily decoded rep matches?)

- Notice zlib’s throughput stops increasing and plateaus as the ratio increases - why is that? Somebody needs to dive into zlib’s decompressor and see why it’s not scaling here.

I need to add my implementation of zlib’s core API (miniz) to see how well it compares.

Important Notes:

- The x64 benchmark command line app was run on Win10, 2x Xeon E5-2690 V2 3.0GHz (20 cores/40 threads). Benchmark app is single threaded.

- All test corpus files are between 256 bytes and 127.7 MB

- All algorithms were directly linked into the executable, and the decompressors were invoked in the same way

- Each decompressor is invoked repeatedly in a loop until 10ms has elapsed; this is done 4 times and the shortest average runtime is taken

- Brotli was limited to comp level 9, as level 10 can be too slow for rapid experimentation, 16MB dictionary size (the largest it supports)

- LZHAM was limited to using its regular parser (not its "best of X arrivals" parser, i.e. the "extreme parsing" flag was disabled), 64MB dictionary size.

- LZ4HC compressor, level 8

- zlib's asm modules were not used in this run. It'll be interesting to see what difference the asm optimizations make.
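The timing scheme in the notes above can be sketched in a few lines. This is a Python stand-in for the real single-threaded C harness (names like `bench_decompress` are mine), mirroring the "loop until 10ms, repeat 4 times, keep the best average" methodology:

```python
import time
import zlib

def bench_decompress(decompress, payload, min_time=0.01, runs=4):
    """Call decompress(payload) in a loop until min_time seconds elapse;
    do this `runs` times and return the best average seconds per call."""
    best = float("inf")
    for _ in range(runs):
        calls, start = 0, time.perf_counter()
        while True:
            decompress(payload)
            calls += 1
            elapsed = time.perf_counter() - start
            if elapsed >= min_time:
                break
        best = min(best, elapsed / calls)
    return best

data = b"example " * 4096
compressed = zlib.compress(data, 9)
avg = bench_decompress(zlib.decompress, compressed)
ratio = len(data) / len(compressed)  # one (throughput, ratio) data point
```

Each file run this way contributes one (throughput, ratio) pair to the scatter plots.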

Next Step


The next major update will also show LZMA, Zstd, and miniz. I'm also going to throw some classes from crunch in here to clusterize the samples, so we can get a better idea of how well each algorithm performs on different classes of data.

I feel strongly that scatter graphs like these can be used to help intelligently guide the design and implementation of new practical compressors.

A big thanks to Fabian “ryg” Giesen at Rad Game Tools for giving me access to "BitKnit" for analysis. BitKnit is going to be released in the next major version of Oodle, Rad’s network and general purpose data compression product.

Thursday, December 17, 2015

Light Indexed Deferred Rendering

I can't find this information anywhere on the net, so I'll put it here. I've always been fascinated by the various alternatives to pure deferred shading, so I was really excited when I first encountered a light indexed variant.

Backstory: Halo Wars

I tired of the challenges inherent in pure deferred rendering approaches by around 2005, so I purposely used several deferred rendering alternatives on Halo Wars. HW combined X360 specific multi-light forward lighting for precise lighting (or for lights that required dynamic shadowing and/or cookies), along with an approximate voxel-based method for much cheaper, lower priority "bulk" lights.

In the voxel based approximate approach, one volume texture contained the total incoming light at each voxel's center (irradiance), and the other contained the weighted average of all the light vectors from all the lights that influenced each voxel (with the total weight in alpha). From this data, it was possible to compute approximate exit radiances at any point/normal. This "voxel light buffering" system was very efficient and scalable, so it was also used to illuminate terrain chunks and meshes by most omni/spot lights in the game. As an added bonus, it was easy and cheap to illuminate particles using this approach too.
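A rough sketch of the reconstruction step, in Python rather than the shipped shader code (this is my after-the-fact recollection; the function and parameter names are invented): with the per-voxel irradiance and weighted-average light vector, approximate diffuse exit radiance falls out of a single N·L:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v)) or 1.0  # avoid divide-by-zero
    return tuple(x / length for x in v)

def voxel_exit_radiance(irradiance, summed_light_vec, normal):
    """Approximate diffuse exit radiance at a surface point inside a voxel.
    irradiance: RGB total incoming light stored at the voxel center.
    summed_light_vec: weighted average of incoming light directions.
    Using N·L with the averaged direction approximates summing N·L per light."""
    n_dot_l = max(dot(normalize(summed_light_vec), normalize(normal)), 0.0)
    return tuple(c * n_dot_l for c in irradiance)
```

A surface facing the averaged light direction receives the full stored irradiance; one facing away receives nothing, which is the graceful degradation mentioned below when many lights hit one voxel.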

We couldn't use multi-light forward on everything because the GPU fillrate cost from each added light was too high, and our object vs. light overlap check (just simple bounding box vs. sphere or cone) was very rough. Especially on terrain chunks, the extra pixel cost was brutal.

I had the GPU update the voxel textures each frame. Lights became as cheap as super low-res sprites, so we could have hundreds of them. The voxel texture's bounds in worldspace always covered the terrain in the view frustum (up to a certain depth and world height). The resolution was either 64x64x16 or 128x128x16 (from memory).

I found this approach interesting because it could achieve both diffuse and specular lighting, and because it visually degraded well when more than one light hit a voxel. Anyhow, after Ensemble collapsed I put all this stuff into the back of my mind.

Source 2

When I first came to Valve after HW shipped (late 2009), I kept encouraging the graphics programmers to look for another viable alternative to deferred rendering. One day, in early to middle 2010, I came in to work and Shanon Drone showed me his amazing prototype implementation of a voxel based forward rendering variant of what is known as Light Indexed Deferred Lighting. (So what would the name be? Voxel Based Light Indexed Forward Lighting?)

From what I remember, his approach voxelized in projection space. Each voxel contained an integer index into a list of lights that overlapped with that voxel's worldspace bounding box. The light lists were stored in a separate texture. The CPU (using SIMD instructions) was used to update the voxel texture, and the CPU also filled in the lightlist texture.

The scene was rendered just like with forward lighting, except the pixel shader fetched from the voxel light index texture, and then iterated over all the lights using a dynamic loop. The light parameters were fetched from the lightlist texture. I don't remember where the # of lights per voxel was stored, it could have been in either texture.
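In sketch form, the per-pixel work looks roughly like this (a Python rendition of my recollection, with invented light and falloff details; the real version was a pixel shader with a dynamic loop):

```python
import math

def shade_pixel(voxel_id, voxel_lists, lights, position, normal):
    """Forward-shade one pixel: look up the light index list for the pixel's
    voxel, then accumulate only those lights (simple diffuse point lights)."""
    total = [0.0, 0.0, 0.0]
    for light_index in voxel_lists[voxel_id]:
        light_pos, color, radius = lights[light_index]
        to_light = [p - q for p, q in zip(light_pos, position)]
        dist = math.sqrt(sum(c * c for c in to_light)) or 1.0
        n_dot_l = max(sum(n * c / dist for n, c in zip(normal, to_light)), 0.0)
        falloff = max(1.0 - dist / radius, 0.0)  # invented linear falloff
        for i in range(3):
            total[i] += color[i] * n_dot_l * falloff
    return total

# One white light one unit above the surface, radius 2; voxel 7 sees only it.
lights = [((0.0, 1.0, 0.0), (1.0, 1.0, 1.0), 2.0)]
voxel_lists = {7: [0]}
pixel = shade_pixel(7, voxel_lists, lights, (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

The key property is that the loop count per pixel is bounded by the voxel's light list, not by the total light count in the scene.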

The new approach was ported and tested on X360, and I remember we switched to packing the light indices (or something - it's been a long time!) into an array of constant registers to make the approach practical at all on X360.

I have no idea if this is published anywhere, but I remember the day very clearly as an internal breakthrough. It's been many years now, so I doubt anyone will mind me mentioning it.

Published material on light indexed deferred/forward rendering techniques (or similar variants):

I'll update this blog post as I run across links:

By Damian Trebilco:

Data Compression in Dungeon Boss

Dungeon Boss (DB) is a Unity iOS/Android title I helped Boss Fight Entertainment ship this year. I recently wrote a blog post basically containing a brain dump of all the low-level memory related things we learned in the trenches while trying to make the product stable.

Due to the sheer amount of content and code, DB couldn't have shipped at all on our target devices without aggressively utilizing every built-in Unity compression option we could get our hands on:

- Audio: We found and modified a custom editor script that allowed DB's sound designer to fine tune the music and FX audio bitrates to the absolutely minimum needed. We gave more bits to music vs. sound effects. (It also helped that DB's sound effects purposely have a classic 8-bit feel.)

- Meshes: We used Unity's built-in mesh compression feature as aggressively as we could. This caused a few issues on map meshes that we had to work around by hand.

- Animation: We tuned our FBX Importer compression settings several times to find the best balance between visible artifacts and memory/file savings.

- Texture compression: At the last minute, we had to switch the majority of our textures to PVRTC 2bpp to free up as much memory headroom as possible. Without doing this DB was too unstable on lower end iOS devices. Quality wise, 2bpp looked amazingly good (much better than I expected).

- Asset bundles: We disabled Unity 4.6's built-in LZMA asset bundle compression, grouped the assets by type, then we packed the raw type-grouped asset data using a better compressor (LZHAM). Thankfully, there were enough Unity modding tools available, and Unity was flexible enough, that we could do this without having access to the source code of Unity itself.

Also see this page in the Unity docs for more info on how to examine and reduce file sizes of Unity builds.

Sunday, December 13, 2015

Binary delta compression and seed dictionaries as universal OS services

Some late Saturday night compression ideas:

Brotli's inclusion of a seed dictionary that helps it gain higher ratios on textual data, and our recent work on offline and real-time binary delta compression (patching), has given me this idea:

First, Brotli's static dictionary isn't "breaking" any rules at all. What they've done makes a lot of sense. They've tailored their codec for a very common use case (text), a place where it will have a lot of value. The seed dictionary is a legitimate part of the decompressor, except instead of pure executable code they also tacked on some interesting "universal" data to it.

Which got me thinking, the OS itself should just include a universal seed dictionary, or a set of them for different data types. A small set of seed data, like even just 4-16MB, would be fine for starters. This data can be automatically distributed to clients as part of an OS update or upgrade. It could contain natural language phrases, common x86 opcode sequences, XML, common JPEG quantization tables/markers/headers, common GLSL/HLSL keywords, etc. Once Linux does it, everybody will do it sooner or later.

Normally, when you want to compress some data for either real-time transmission or archival purposes, you would use plain non-delta based compression. You can't assume the remote side has your seed dictionary, right? But this is a weak if not flawed argument, because the remote side already has a compatible decompressor. They can surely download a little more data to have a good, stable universal seed dictionary too. And if they only download it once (through OS updates), it's not a big cost.

Alternately, we can adopt a new codec that has a good built-in selection of universal seed data, and embed that into the OS. But the seed data should be available to OS processes as well, so they can compress/decompress data against it without having to carry around and distribute their own private seed dictionaries.
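Python's zlib bindings already expose exactly this mechanism via the `zdict` parameter, so the idea is easy to demonstrate (the seed text here is a toy stand-in for a real OS-distributed dictionary):

```python
import zlib

# Toy stand-in for a shared, OS-distributed seed dictionary.
seed = b"the quick brown fox jumps over the lazy dog. " * 4
message = b"the quick brown fox jumps over the lazy dog."

# Plain compression: the codec starts with an empty window.
plain = zlib.compress(message, 9)

# Seeded compression: both sides must hold the identical zdict.
comp = zlib.compressobj(level=9, zdict=seed)
seeded = comp.compress(message) + comp.flush()

decomp = zlib.decompressobj(zdict=seed)
round_trip = decomp.decompress(seeded) + decomp.flush()
```

Because the whole message already exists in the seed window, the seeded stream is mostly match tokens against the dictionary and comes out smaller than the plain stream, despite carrying an extra dictionary-ID field in its header.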

Other related thoughts:


Obviously, static dictionaries don't help in all cases. No harm done, just don't use them in those cases.

This method can be applied to networking too. The seed dictionary can be made part of the actual protocol. Packets can be delta compressed against app specific or universal seed dictionaries.

New codecs should just support seed dictionaries as required features. New LZ algorithms should be competitive against bsdiff in binary patching scenarios.

In the open source world, we need something better than bsdiff, which requires huge amounts of RAM on large files (and is just unreliable). Another interesting area of work would be figuring out the "best" universal seed dictionary to use for an OS service like this.

Plain lossless data compression and binary delta compression should not be separate concepts or tools: let's unify them. LZ-Sub shows one potential way forward.

Friday, December 11, 2015

Why by-convention read-only shared state is bad

I'm going to switch topics from my usual stuff to something else I find interesting: multithreading.

Multithreading Skinning on Age of Empires 3


Back at Ensemble Studios, I remember working on the multithreaded rendering code in both Age of Empires 3 and Halo Wars. At the time (2003-2004 timeframe), multithreading was a new thing for game engines. Concepts like shared state, producer consumer queues, TLS, semaphores/events, condition variables, job systems, etc. were new things to many game engineers. (And lockfree approaches were considered very advanced, like Area 51 grade stuff.) Age of Empires 3 was a single threaded engine, but we did figure out how to use threading for at least two things I'm aware of: the loading screen, and jobified skinning.

On Age3, it was easy to thread the SIMD skinning code used to render all skinned meshes, because I analyzed the rendering code and determined that the input data we needed for "jobified" skinning was guaranteed to be stable during the execution of the skinning job. (The team also said it seemed "impossible" to do this, which of course just made me more determined to get this feature working and shipped.)

The key thing about making it work was this: I sat down and said to the other rendering programmer who owned this code, "do not change this mesh and bone data in between this spot and this spot in your code -- or the multithreaded skinning code will randomly explode!". We then filled in the newly introduced main thread bubble with some totally unrelated work that couldn't possibly modify our mesh/bone data (or so we thought!), and in our little world all was good.

This approach worked, but it was dangerous and fragile because it introduced a new constraint into the codebase that was hard to understand or even notice by just locally studying the code. If someone would have changed how the mesh renderer worked, or changed the data that our skinning jobs accessed, they could have introduced a whole new class of bugs (such as Heisenbugs) into the system without knowing it. Especially in a large rapidly changing codebase, this approach is a slippery slope.

We were willing to do it this way in the Age3 codebase because very few people (say 2-3 at most in the entire company) would ever touch this very deep rendering code. I made sure the other programmer who owned the system had a mental model of the new constraint, also made sure there were some comments and little checks in there, and all was good.

Importantly, Ensemble had extremely low employee turnover, and this low-level engine code rarely changed. Just in case the code exploded in a way I couldn't predict on a customer's machine, I put in a little undocumented command line option that disabled it completely.

Age3 and Halo Wars made a lot of use of the fork and join model for multithreading. Halo Wars used the fork and join model with a global system, and we allowed multiple threads (particularly the main and render threads) to fork and join independently.



See those sections where you have multiple tasks stacked up in parallel? Those parts of the computation are going to read some data, and if it's some global state it means you have a potential conflict with the main thread. Anything executed in those parallel sections of the overall computation must follow your newly introduced conventions on which shared state data must remain stable (and when), or disaster ensues.

Complexity Management


Looking back now, I realize this approach of allowing global access to shared state that is supposed to be read-only and stable by convention or assumption is not a sustainable practice. Eventually, somebody somewhere is going to introduce a change using only their "local" mental model of the code that doesn’t factor in our previously agreed upon data stability constraint, and they will break the system in a way they don't understand. Bugs will come in, the engine will crash in some way randomly, etc. Imagine the damage to a codebase if this approach was followed by dozens of systems, instead of just one.

Creating and modifying large scale software is all about complexity management. Our little Age3 threaded skinning change made the engine more complex in ways that were hard to understand, talk about, or reason about. When you are making decisions about how to throw some work onto a helper thread, you need to think about the overall complexity and sustainability of your approaches at a very high level, or your codebase is eventually going to become buggy and practically unchangeable.

One solution: RCU


One sustainable engineering solution to this problem is Read Copy Update (RCU, used in the Linux kernel). With RCU, there are simple clear policies to understand, like "you must bump up/down the reference to any RCU object you want to read”, etc. RCU does introduce more complexity, but now it’s a first class engine concept that other programmers can understand and reason about.

RCU is a good approach for sharing large amounts of data, or large objects. But for little bits of shared state, like a single int or whatever, the overhead is pretty high.

Another solution: Deep Copy


The other sustainable approach is to deep copy the global data into a temporary job-specific context structure, so the job won't access the data as shared/global state. The deep copy approach is even simpler, because the job just has its own little stable snapshot of the data which is guaranteed to stay stable during job execution.
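A tiny Python sketch of the deep-copy pattern (the "skinning" job and the field names here are invented for illustration, not from the Age3 codebase):

```python
import copy
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "global" engine state that the main thread owns and mutates.
shared_state = {"bones": [1.0, 2.0, 3.0], "scale": 2.0}

def skin_job(ctx):
    # The job reads only its private snapshot, never the shared dict.
    return [b * ctx["scale"] for b in ctx["bones"]]

with ThreadPoolExecutor(max_workers=1) as pool:
    # Deep-copy the state into a job-specific context before kickoff...
    ctx = copy.deepcopy(shared_state)
    future = pool.submit(skin_job, ctx)
    # ...so the main thread may freely mutate the original in the meantime.
    shared_state["bones"][0] = 999.0
    result = future.result()
```

The job's result is deterministic no matter what the main thread does to `shared_state` after the snapshot, which is exactly the stability guarantee the by-convention approach could only hope for.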

Unfortunately, implementing the deep copy approach can be taxing. What if a low-level helper system used by job code (that previously accessed some global state) doesn’t easily have access to your job context object? You need a way to somehow get this data “down” into deeper systems.

The two approaches I’m aware of are TLS, or (for lack of a better phrase) “value plumbing”. TLS may be too slow, and is crossing into platform specific land. The other option, value plumbing, means you just pass the value (or a pointer to your job context object) down into any lower level systems that need it. This may involve passing the context pointer or value data down several layers through your callstacks.

But, you say, this plumbing approach is ugly because of all these newly introduced method variables! But, it is a sustainable approach in the long term, like RCU. Eventually, you can refactor and clean up the plumbing code to make it cleaner if needed, assuming it’s a problem at all. You have traded off some code ugliness for a solution that is reliable in the long term.


Conclusion


Sometimes, when doing large scale software engineering, you must make changes to codebases that may seem ugly (like value plumbing) as a tradeoff for higher sustainability and reliability. It can be difficult to make these choices, so some higher level thinking is needed to balance the long term pros and cons of all approaches.

Thursday, December 10, 2015

Okumura's "ar002" and reverse engineering the internals of Katz's PKZIP

Sometime around 1992, still in high school, I upgraded from my ancient 16-bit OS/9 computer to a 80286 DOS machine with a hard drive. I immediately got onto some local BBS's at the blistering speed of 1200 baud and discovered this beautiful little gem of a compression program written in the late 80's by Haruhiko Okumura at Matsusaka University:

http://oku.edu.mie-u.ac.jp/~okumura/compression/ar002/

奥村晴彦

I studied every bit of ar002 intensely, especially the Huffman related code. I was impressed and learned tons of things from ar002. It used a complex (to a teenager) tree-based algorithm to find and insert strings into a sliding dictionary, and I sensed there must be a better way because its compressor felt fairly slow. (How slow? We're talking low single digit kilobytes per second on a 12MHz machine if I remember correctly, or thousands of CPU cycles per input byte!)

So I started focusing on alternative match finding algorithms, with all experiments done in 8086 real mode assembly for performance because compilers at the time weren't very good. On these old CPU's a human assembly programmer could run circles around C compilers, and every cycle counted.

Having very few programming related books (let alone very expensive compression books!), I focused instead on whatever was available for free. This was stuff downloaded from BBS's, like the archiving tool PKZIP.EXE. PKZIP held a special place, because it was your gateway into this semi-secretive underground world of data distributed on BBS's by thousands of people. PKZIP's compressor internally used mysterious, patented algorithms that accomplished something almost magical (to me) with data. The very first program you needed to download to do much of anything was PKZIP.EXE. To me, data compression was a key technology in this world.

Without PKZIP you couldn't do anything with the archives you downloaded, they were just gibberish. After downloading an archive on your system (which took forever using Z-Modem), you would manually unzip it somewhere and play around with whatever awesome goodies were inside the archive.

Exploring the early data communication networks at 1200 or 2400 baud was actually crazy fun, and this tool and a good terminal program was your gateway into this world. There were other archivers like LHA and ARJ, but PKZIP was king because it had the best practical compression ratio for the time, it was very fast compared to anything else, and it was the most popular.

This command line tool advertised awesomeness. The help text screamed "FAST!". So of course I became obsessed with cracking this tool's secrets. I wanted to deeply understand how it worked and make my own version that integrated everything I had learned through my early all-assembly compression experiments and studying ar002 (and little bits of LHA, but ar002's source was much more readable).



I used my favorite debugger of all time, Borland's Turbo Debugger, to single step through PKZIP:



PKZIP was written by Phil Katz, who was a tortured genius in my book. In my opinion his work at the time is under appreciated. His Deflate format was obviously very good for its time to have survived all the way into the internet era.


I single stepped through all the compression and decompression routines Phil wrote in this program. It was a mix of C and assembly. PKZIP came with APPNOTE.TXT, which described the datastream format at a high level. Unfortunately, at the time it lacked some key information about how to decode each block's Huffman codelengths (which were themselves Huffman coded!), so you had to use reverse engineering techniques to figure out the rest. Also most importantly, it only covered the raw compressed datastream format, so the algorithms were up to you to figure out.

The most important thing I learned while studying Phil's code: the entire sliding dictionary was literally moved down in memory every ~4KB of data to compress. (Approximately 4KB - it's been a long time.) I couldn't believe he didn't use a ring buffer approach to eliminate all this data movement. This little realization was key to me: PKZIP spent many CPU cycles just moving memory around!
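To make the contrast concrete, here's a sketch of the two approaches (illustrative constants only; PKZIP's actual window and slide sizes may have differed):

```c
/* Two ways to maintain an LZ sliding dictionary over a stream.
   Illustrative sketch only -- not PKZIP's actual code. */
#include <string.h>

#define DICT_SIZE  32768   /* 32KB Deflate-style window */
#define SLIDE_STEP 4096    /* slide every ~4KB, as PKZIP appeared to */

/* PKZIP-style: physically slide the window down in memory, which
   costs a large memmove() every SLIDE_STEP input bytes... */
void slide_window(unsigned char *dict, int *pos)
{
    if (*pos >= DICT_SIZE + SLIDE_STEP) {
        memmove(dict, dict + SLIDE_STEP, DICT_SIZE);
        *pos -= SLIDE_STEP;
    }
}

/* ...versus a ring buffer: never move data, just wrap the index.
   (DICT_SIZE must be a power of 2 for the mask trick to work.) */
int ring_advance(int pos)
{
    return (pos + 1) & (DICT_SIZE - 1);
}
```

The ring buffer approach does complicate match copying and dictionary bookkeeping (positions wrap), which is presumably why the simpler slide-the-memory scheme was tempting, but it eliminates the bulk data movement entirely.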

PKZIP's match finder, dictionary updater (which used a simple rolling hash), and linked list updater functions were all written in assembly and called from higher-level C code. The assembly code was okay, but as a demoscener I knew it could be improved (or at least equaled) with tighter code and some changes to the match finder and dictionary updating algorithms.

Phil's core loops used 32-bit instructions on 386 CPUs, but strangely he constantly disabled and re-enabled interrupts around the code sequences using those instructions. I'm guessing he was trying to work around interrupt handlers or TSR's that borked the high words of 32-bit registers.

To fully reverse engineer the format, I had to feed very tiny files into PKZIP's decompressor and single step through the code near the beginning of blocks. If you paid very close attention, you could build a mental model of what the assembly code was doing relative to the datastream format described in APPNOTE.TXT. I remember it being slow, difficult work (in an unheated attic, on top of things).

I wrote my own 100% compatible codec in all-assembly using what I thought were better algorithms, and it was more or less competitive against PKZIP. Compression ratio wise, it was very close to Phil's code. I started talking about compression and PKZIP on a Fidonet board somewhere, and this led to the code being sublicensed for use in EllTech Development's "Compression Plus" compression/archiving middleware product.

For fun, here's the ancient real-mode assembly code to my final Deflate compatible compressor. Somewhere at the core of my brain there is still a biologically-based x86-compatible assembly code optimizer. Here's the core dictionary updating code, which scanned through new input data and updated its hash-based search accelerator:

;--------------- HASH LOOP
ALIGN 4
HDLoopB:
                ReptCount = 0

                REPT    16

                Mov     dx, [bp+2]

                Shl     bx, cl
                And     bh, ch
                Xor     bl, dl
                Add     bx, bx
                Mov     ax, bp
                XChg    ax, [bx+si]
                Mov     es:[di+ReptCount], ax
                Inc     bp

                Shl     bx, cl
                And     bh, ch
                Xor     bl, dh
                Add     bx, bx
                Mov     ax, bp
                XChg    ax, [bx+si]
                Mov     es:[di+ReptCount+2], ax
                Inc     bp

                ReptCount = ReptCount + 4

                ENDM

                Add     di, ReptCount

                Dec     [HDCounter]
                Jz      HDOdd
                Jmp     HDLoopB
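Re-expressed in C, the unrolled loop above is doing something like this (my reconstruction; the real shift amounts, masks, and byte offsets differ):

```c
/* What the unrolled assembly is doing: for each input byte, fold it
   into a rolling hash, then swap the current position into the hash
   table while saving the old occupant as that position's chain link
   (the XChg ax, [bx+si] in the asm). Constants are assumptions. */
#define HASH_SHIFT 3
#define HASH_MASK  0x3FFF

unsigned update_dictionary(const unsigned char *src, int num_bytes,
                           unsigned short *hash_tab, /* HASH_MASK+1 entries */
                           unsigned short *links,    /* one per position */
                           unsigned hash, unsigned pos)
{
    for (int i = 0; i < num_bytes; i++) {
        hash = ((hash << HASH_SHIFT) ^ src[i]) & HASH_MASK;
        links[pos] = hash_tab[hash];          /* old head becomes chain link */
        hash_tab[hash] = (unsigned short)pos; /* position becomes new head */
        pos++;
    }
    return hash;
}
```

The match finder then walks the links[] chain from any head to find earlier occurrences of the same hash, verifying each candidate with a byte compare.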

As an exercise while learning C, I ported the core hash-based LZ algorithms I used from all-assembly to C. I uploaded two variants to local BBS's as "prog1.c" and "prog2.c". These little educational compression programs were referenced in the paper "A New Approach to Dictionary-Based Lossless Compression" (2004), and the code is still floating around on the web.

I rewrote my PKZIP-compatible codec in C, using similar techniques, and this code was later purchased by Microsoft (when they bought Ensemble Studios). It was used by Age of Empires 1/2, which sold tens of millions of copies (back when games shipped in boxes this was a big deal). I then optimized this code to scream on the Xbox 360 CPU, and this variant shipped on Halo Wars, Halo 3, and Forza 2. So if you've played any of these games, somewhere in there you were running a very distant variant of the above ancient assembly code.

Eventually the Deflate compressed bitstream format created by Phil Katz made its way into an internet standard, RFC 1951.

A few years ago, moonlighting while at Valve, I wrote an entirely new Deflate compatible codec called "miniz", this time with a zlib compatible API. It lives in a single source code file, and the entire stream-capable decompressor lives in a single C function. I first wrote it in C++, then quickly realized this wasn't such a hot idea (after feedback from the community), so I ported it to pure C. It's now on Github here. In its lowest compression mode miniz also sports a very fast real-time compressor. miniz's single-function decompressor has been successfully compiled and executed on very old processors, like the 65xxx series.

Conclusion


I don't think it's well known that the work of compression experts and experimenters in Japan significantly influenced early compression programmers in the US. There's a very interesting early history of Japanese data compression work on Haruhiko Okumura's site here. Back then Japanese and American compression researchers would communicate and share knowledge:
"At one time a programmer for PK and I were in close contact. We exchanged a lot of ideas. No wonder PKZIP and LHA are so similar."
I'm guessing he's referring to Phil Katz? (Who else could it have been?)

Also, I believe Phil Katz deserves more respect for creating Inflate and Deflate. The algorithm obviously works, and the compressed data format is an internet standard, so it will basically live forever. These days Deflate seems relatively simple, but at the time it was anything but simple to practitioners in the field. Amazingly, good old Deflate still has some legs, as Zopfli and 7-zip's optimizing Deflate-compatible compressors demonstrate.

Finally, I personally learned a lot from studying the code written by these inspirational early data compression programmers. I'll never get to meet Phil Katz, but perhaps one day I could meet Dr. Okumura and say thanks.

The awesomeness that is ClangFormat

While working on the Linux project at Valve we started using this code formatting tool from the LLVM folks:

http://clang.llvm.org/docs/ClangFormat.html

Mike Sartain (now at Rad) recommended we try it. ClangFormat is extremely configurable, and can handle more or less any convention you could want. It's a top notch tool. Here's vogl's .clang-format configuration file to see how easy it is to configure.
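For a flavor of what a .clang-format file looks like, here's a small illustrative example (these are example settings, not vogl's actual configuration):

```yaml
# Illustrative .clang-format settings (not vogl's actual file)
BasedOnStyle: LLVM
IndentWidth: 4
UseTab: Never
ColumnLimit: 120
BreakBeforeBraces: Allman
PointerAlignment: Right
AllowShortIfStatementsOnASingleLine: false
```

Drop a file like this at the root of the repo and ClangFormat picks it up automatically for everything beneath it.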

One day on the vogl GL debugger project, I finally got tired of trying to enforce, or even worry about, the lowest-level bits of coding conventions: things like whether there should be a space before/after pointers, spaces after commas, bracket styles, switch styles, etc.

I find codebases with too much style diversity increase the mental tax of parsing code, especially for newcomers. Even worse, for both newcomers and old timers, you need to switch styles on the fly as you modify code spread throughout various files (to ensure the new code fits in with its local neighborhood). After some time in such a codebase you do eventually build a mental model and internalize each local neighborhood's little style oddities, but this adds no real value I can think of; it only subtracts as far as I can tell.

I have grown immune to most styles now, but switching between X different styles to write locally conforming code seems like an inefficient use of programmer time.

In codebases like this, telling newcomers to use a single, neat, company-sanctioned style convention isn't practically achievable. A codebase's style diversity tends to increase over time unless you stop it early. Your codebase is stuck in a style vortex.

So sometimes it makes sense to just blow away the current borked style and do a drive-by ClangFormat on all files and check them back in. Of course this can make diff'ing and 3-way merging harder, but after a while that issue mostly becomes moot as churn occurs. It's a traumatic thing, yes, but doable.

Next, you can integrate ClangFormat into your checkin process, or into the editor. Require all submitted code to be passed through ClangFormat first; detecting divergence can be part of the pull request process, or something similar.

On new codebases, be sure to figure this out early or you'll get stuck in the style diversity vortex.

Microsoft's old QuickBASIC product from the 80's would auto-format to a standard style as lines were being entered and edited. 

Perhaps in the far future (or a current alternate universe), the local coding style can just be an editor, diff-time, and grep-time only thing. Let's ditch text completely and switch to some sort of parse tree or AST that also preserves human things like comments.

Then, the editor just loads and converts to text, you edit there, and it converts back to binary for saving and checkin. With this approach, it should be possible to switch to different code views to visualize and edit the code in different ways. (I'm sure this has all been thought of 100 times already, and it ended in tears each time it was tried.)
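As a toy illustration of the idea (a hypothetical render_if helper, not any real tool's API): store the structure once, and let the editor render it in whatever style the reader prefers.

```c
/* Toy sketch of "store the AST, render per-reader style": the same
   tiny statement tree pretty-printed in two brace styles. A real
   system would also have to preserve comments, macros, etc. */
#include <stdio.h>

typedef struct { const char *cond, *body; } if_stmt_t;

/* Render the statement into buf using either Allman or K&R braces. */
void render_if(const if_stmt_t *s, int allman, char *buf, size_t buf_size)
{
    if (allman)
        snprintf(buf, buf_size, "if (%s)\n{\n    %s\n}\n", s->cond, s->body);
    else
        snprintf(buf, buf_size, "if (%s) {\n    %s\n}\n", s->cond, s->body);
}
```

The point is that the brace style stops being a property of the checked-in artifact and becomes a property of the view.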

As a plus, once the code has been pre-parsed perhaps builds can go a little faster.

Anyhow, here we are in 2015. We individually have personal super computers, and yet we're still editing text on glass TTY's (or their LCD equivalents). I now have more RAM than the hard drives I owned until 2000 or so. Hardware technology has improved massively, but software technology hasn't caught up yet.

Monday, December 7, 2015

The future of GPU texture compression

Google engineers were the first to realize the value of crunch (original site here), my advanced lossy texture compression library and command line toolset for DXTc textures that was recently integrated into Unity 5.x. Here's Brandon Jones at Google describing how crunch can be used in WebGL apps in his article "Saving Bandwidth and Memory with WebGL and Crunch", from the book "HTML5 Game Development Insights".

While I was writing crunch I was only thinking "this is something useful for console and PC game engines". I had no idea it could be useful for web or WebGL apps. Thinking back, I should have sat down and asked myself "what other software technology, or what other parts of the stack, will need to deal with compressed GPU textures"? (One lesson learned: Learn Javascript!)

Anyhow, crunch is an example of the new class of compression solutions opened up by collapsing parts of the traditional game data processing tool chain into single, unified solutions.

So let's go forward and break down some artificial barriers and combine knowledge across a few different problem domains:

- Game development art asset authoring methods
- Game engine build pipelines/data preprocessing
- GPU texture compression
- Data compression
- Rate distortion optimization

The following examples are for DXTc, but they apply to other formats like PVRTC/ETC/etc. too. (Of course, many companies have different takes on the pipelines described here. These are just vanilla, general examples. Id Software's megatexturing and Allegorithmic's tech use very different approaches.)

The old way





The previous way of creating DXTc GPU textures was (this example is the Halo Wars model):

1. Artists save texture or image as an uncompressed file (like .TGA or .PNG) from Photoshop etc.

2. We call a command line tool which grinds away to compress the image to the highest quality achievable at a fixed bitrate (the lossy GPU texture compression step). Alternatively, we can use the GPU to accelerate the process to near-real time. (Both solve the same problem.)

This is basically a fixed rate lossy texture compressor with a highly constrained standardized output format compatible with GPU texture hardware.

Now we have a .DDS file, stored somewhere in the game's repo.

3. To ship a build, the game's asset bundle or archive system losslessly packs the texture, using LZ4, LZO, Deflate, LZX, LZMA, etc. - this data gets shipped to end users
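The "fixed bitrate" part of step 2 is the defining constraint: in DXT1, for example, every 4x4 pixel block is exactly 64 bits (two RGB565 endpoint colors plus sixteen 2-bit selectors), so the encoder's only job is quality. Here's a sketch of how one opaque block's 4-color palette is derived from its endpoints:

```c
/* DXT1/BC1 palette derivation for the opaque (c0 > c1) block mode:
   two RGB565 endpoints plus two interpolated colors at 1/3 and 2/3. */
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb_t;

static rgb_t unpack565(uint16_t c)
{
    rgb_t out;
    out.r = (uint8_t)(((c >> 11) & 31) * 255 / 31);
    out.g = (uint8_t)(((c >> 5) & 63) * 255 / 63);
    out.b = (uint8_t)((c & 31) * 255 / 31);
    return out;
}

/* Build the 4-entry palette each pixel's 2-bit selector indexes. */
void dxt1_palette(uint16_t c0, uint16_t c1, rgb_t pal[4])
{
    pal[0] = unpack565(c0);
    pal[1] = unpack565(c1);
    pal[2].r = (uint8_t)((2 * pal[0].r + pal[1].r) / 3);
    pal[2].g = (uint8_t)((2 * pal[0].g + pal[1].g) / 3);
    pal[2].b = (uint8_t)((2 * pal[0].b + pal[1].b) / 3);
    pal[3].r = (uint8_t)((pal[0].r + 2 * pal[1].r) / 3);
    pal[3].g = (uint8_t)((pal[0].g + 2 * pal[1].g) / 3);
    pal[3].b = (uint8_t)((pal[0].b + 2 * pal[1].b) / 3);
}
```

64 bits per 16 pixels is a fixed 4 bits/pixel, regardless of content, which is exactly why the lossless pack in step 3 (or crunch's approach below) is where the rate savings have to come from.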

The Current New Way




The "current" new way is a little less complex (at the high level) because we delete the lossless compression step in stage 3. Step 2 now borrows a "new" concept from the video compression world, Rate Distortion Optimization (RDO), and applies it to GPU texture compression:

1. Artists selects a JPEG-style quality level and saves texture or image as an uncompressed file (like .TGA or .PNG) from Photoshop etc.

2. We call a command line tool called "crunch" that combines lossy clusterized DXTc compression with VQ, followed by a custom optimizing lossless backend coder. Now we have a .CRN file at some quality level, stored somewhere in the game's repo.

3. To ship a build, the game's asset bundle or archive system stores the .CRN file uncompressed (because it was already compressed in step 2) - this data gets shipped to end users

The most advanced game engines, such as Unity and some other AAA in-house game engines, do it more or less this way now.
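The RDO idea at the heart of step 2 boils down to a few lines: rather than always choosing the lowest-error encoding for a block, choose the one minimizing distortion + lambda * rate. (Toy sketch with a hypothetical candidate struct; crunch's real machinery is far more involved.)

```c
/* Rate distortion optimization in miniature: each candidate is one
   possible encoding of a block, with its measured distortion (e.g.
   block MSE) and an estimate of its entropy-coded size in bits. */
typedef struct {
    double distortion;
    double bits;
} candidate_t;

int rdo_pick(const candidate_t *cands, int n, double lambda)
{
    int best = 0;
    double best_cost = cands[0].distortion + lambda * cands[0].bits;
    for (int i = 1; i < n; i++) {
        double cost = cands[i].distortion + lambda * cands[i].bits;
        if (cost < best_cost) { best_cost = cost; best = i; }
    }
    return best; /* larger lambda trades quality for smaller files */
}
```

The artist-facing "quality level" is essentially a knob on lambda: at lambda = 0 you get pure quality, and as lambda grows, visually cheap blocks start sharing codebook entries and the compressed stream shrinks.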

The Other New Way (that nobody knows about)

1. The artist hits a "Save" button, and a preview window pops up. The artist can tune various compression options in real time to find the best balance between lossy compression artifacts and file size. (Why not? It's their art. This is also the standard web way, but with JPEG.) The "OK" button saves a .CRN and .PNG file simultaneously.

2. To ship a build, the game's asset bundle or archive system stores the .CRN file uncompressed (because it's already been compressed) - this data gets shipped to end users

But step #1 seems impossible, right? crunch's compression engine is notoriously slow, even on 20-core Xeon machines. Most teams build .CRN data in the cloud using hundreds to thousands of machines. I jobified the hell out of crunch's compressor, but it's still very slow.

Internally, the crunch library has a whole "secret" set of methods and classes that enable this way forward. (Interested? Start looking in the repo in this file here.)

The Demo of real-time crunch compression


Here's a Windows demo showing crunch-like compression done in real-time. It's approximately 50-100x faster than the command line tool's compression speed. (I still have the source to this demo somewhere, let me know if you would like it released.) 

This demo utilizes the internal classes in crnlib to do all the heavy lifting. All the real code is already public. These classes don't output a .CRN file, though; they just output plain .DDS files, which are then assumed to be losslessly compressed later. But there's no reason a fast and simple (non-optimizing) .CRN backend couldn't be tacked on; the core concepts are all the same.

One of the key techniques used to speed up the compression process in the QDXT code demonstrated in this demo is jobified Tree Structured VQ (TSVQ), described here.
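To give a flavor of TSVQ (a toy 1-D sketch, not crnlib's implementation): recursively split the training set into two child centroids, so quantizing a vector becomes a log-depth walk down a tree instead of a full codebook search. One split step looks like:

```c
/* One TSVQ split step, in one dimension: partition the points around
   the parent centroid and compute the two child centroids. Real TSVQ
   works on multi-dimensional vectors, splits along the principal
   axis, and refines each split with Lloyd iterations. */
void tsvq_split(const double *pts, int n, double *left_c, double *right_c)
{
    double mean = 0.0, sl = 0.0, sr = 0.0;
    int i, nl = 0, nr = 0;

    for (i = 0; i < n; i++) mean += pts[i];
    mean /= n;

    for (i = 0; i < n; i++) {
        if (pts[i] < mean) { sl += pts[i]; nl++; }
        else               { sr += pts[i]; nr++; }
    }
    *left_c  = nl ? sl / nl : mean;
    *right_c = nr ? sr / nr : mean;
}
```

Splitting recursively until each leaf is "good enough" gives both the codebook and a fast approximate search structure at the same time, and the per-split work parallelizes nicely across jobs.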

GPU texture compression tools: What we really want




The engineers working on GPU texture compression don't always have a full mental model of how texture assets are actually used by game makers. Their codecs are typically optimized either for the highest possible quality (without taking eons to compress), or for the fastest compression time with minimal to no quality loss (relative to offline compression). These tools completely ignore the key distribution problems their customers face, and they don't let artists control the quality-vs-filesize tradeoff the way 25-year-old standard formats such as JPEG do.

Good examples of this class of tools:

Intel: Fast ISPC Texture Compressor

NVidia: GPU Accelerated Texture Compression

libsquish

These are awesome, high quality GPU texture compression tools/libs, with lots of customers. Unfortunately they solve the wrong problem.

What we really want, are libraries and tools that give us additional options that help solve the distribution problem, like rate distortion optimization. (As an extra bonus, we want new GPU texture formats compatible with specialized techniques like VQ/clusterization/etc. But now I'm probably asking for too much.)

The GPU vendors are the best positioned to bridge the artificial divides described earlier. This is some very specialized technology, and the GPU format engineers just need to learn more about compression, machine learning, entropy coding, etc. When you design a new GPU texture format, make sure you release something like crunch for it, or it'll be a second-class format to your customers.

Now, the distant future




Won Chun (then at Google, now at Rad) came up with a great idea a few years back. What the web and game engine worlds could really use is a "Universal Crunch" format. A GPU agnostic "download anywhere" format, that can be quickly transcoded into any other major format, like DXTc, or PVRTC, or ASTC, etc. Such a texture codec would be quite an undertaking, but I've been thinking about it for years and I think it's possible. Some quality tradeoffs would have to be made, of course, but if you like working on GPU texture compression problems, or want to commercialize something in this space, perhaps go in this direction.