Sunday, February 7, 2021

Lagrangian multiplier based RDO encoding early outs

Some minor observations about Lagrangian multiplier based RDO (with BC7 RDO+Deflate or LZ4):

We're optimizing to find lowest t (sometimes called j), given many hundreds/thousands of ways of encoding a BC7 block:

float t = trial_mse * smooth_block_error_scale + trial_lz_bits * lambda;

For each trial block, we compute its MSE and estimate its LZ bits using a simple Deflate/LZ4-like model.

If we already have a potential solution for a block (the best found so far), given the trial block's MSE and the current best_t we can compute how many bits (maximum) a new trial encoding would take to be an improvement. If the number of computed threshold bits is ridiculous (like negative, or just impossible to achieve with Deflate on a 128-bit block input), we can immediately throw out that trial block:

threshold_trial_lz_bits = (best_t - trial_mse * smooth_block_error_scale ) / lambda

Same for MSE: if we already have a solution, we can compute the MSE threshold where it's impossible for a trial to be an improvement:

threshold_trial_mse  = (best_t - (trial_lz_bits * lambda)) /  smooth_block_error_scale

This seems less valuable because running the LZ simulator to compute trial_lz_bits is likely more expensive than computing a trial block's MSE. We could plug in a lowest possible estimate for trial_lz_bits, and use that as a threshold MSE. Another interesting thing about this: trials are very likely to always have an MSE >= than the best found encoding for a block.

Using simple formulas like this results in large perf. improvements (~2x).

No comments:

Post a Comment