Sunday, February 9, 2014

picojpeg: Decoding Lena on a Tandy Color Computer 3

Got a small grayscale version of Lena decoding to the 640x192x4 (HSCREEN 4) graphics mode using my picojpeg.c module. Here it is on an old LCD monitor hooked up to the CoCo's composite output:


In case you didn't know, the Color Computer is a classic 8/16-bit personal home computer from Tandy Corp./Radio Shack. The particular model I'm using (CoCo 3) was released in 1986. (As a kid I always wanted a CoCo 3, but I was stuck with a 16KB CoCo 2 and by the time I could afford to upgrade I had moved on to the PC.)

This is using a 2x2 ordered dither matrix, zoomed by 2x horizontally (displayed as 256x128), using composite out as a sort of inherent low pass filter. The BASIC program currently sets the graphics mode, programs the grayscale palette, and then jumps into the C code's _start function:

10 CLEAR12,9984
20 LOADM"TEST.BIN
30 HSCREEN4
40 PALETTE0,0
50 PALETTE1,16
60 PALETTE2,32
70 PALETTE3,48
80 EXEC9991
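
A rough sketch of that kind of 2x2 ordered dither, written in Python for readability rather than the 6809 C that actually runs on the CoCo. The threshold matrix, the rounding, and the function names are illustrative assumptions, not the decoder's real code:

```python
# Illustrative 2x2 ordered dither from 8-bit gray down to 4 levels (palette
# slots 0-3). The matrix and rounding are assumptions, not the decoder's code.
BAYER_2X2 = ((0, 2),
             (3, 1))

def dither_pixel(gray, x, y):
    """Map an 8-bit gray value at screen position (x, y) to a level 0..3."""
    scaled = gray * 3                  # 0..765, i.e. 3 steps of 255
    base, frac = divmod(scaled, 255)   # lower level and remainder within the step
    threshold = (BAYER_2X2[y & 1][x & 1] * 255 + 127) // 4
    return min(3, base + (1 if frac > threshold else 0))

def dither_row_zoomed(row, y):
    """Dither one row of gray pixels, doubling each pixel horizontally."""
    out = []
    for sx, gray in enumerate(row):
        for dx in (0, 1):              # 2x horizontal zoom
            out.append(dither_pixel(gray, sx * 2 + dx, y))
    return out

if __name__ == "__main__":
    print(dither_row_zoomed([0, 128, 255], 0))
    print(dither_row_zoomed([0, 128, 255], 1))
```

Here the threshold is looked up at screen resolution, so the two copies of each zoomed pixel can land on different levels. Whether the real code dithers before or after the zoom is not something this sketch claims to know.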

The C code (compiled using gcc6809 under Linux) disables interrupts, maps the graphics pages into the ROM area (starting at 0x8000, ending at 0xFEFF or thereabouts) using the MMU0 registers, then decodes the ~4KB JPEG and plots the pixels as each JPEG MCU block is decoded. It currently takes about 45 seconds to decode the 128x128 image, which beats the PIC18F microcontroller I was using a few years ago to originally test this code (that thing would take 10-20 minutes!).
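
Plotting into a 4-color mode means packing four 2-bit pixels into each byte, so a 640-pixel row takes 160 bytes and the full 640x192 screen takes 30,720 bytes, which fits in the mapped window described above. A hedged sketch of the addressing math, in Python for readability. The bit order (leftmost pixel in the top two bits) and all names here are assumptions, not the decoder's actual code:

```python
WIDTH, HEIGHT = 640, 192
PIXELS_PER_BYTE = 4                        # 2 bits per pixel
BYTES_PER_ROW = WIDTH // PIXELS_PER_BYTE   # 160

def new_framebuffer():
    """Allocate a zeroed packed framebuffer (30,720 bytes)."""
    return bytearray(BYTES_PER_ROW * HEIGHT)

def plot(framebuffer, x, y, color):
    """Store a 2-bit color (0..3) at (x, y) in the packed framebuffer."""
    offset = y * BYTES_PER_ROW + x // PIXELS_PER_BYTE
    # Assumed bit order: leftmost pixel of each byte lives in bits 7-6.
    shift = (3 - (x % PIXELS_PER_BYTE)) * 2
    mask = 0x03 << shift
    framebuffer[offset] = (framebuffer[offset] & ~mask & 0xFF) | ((color & 0x03) << shift)
```

On the real machine the read-modify-write per pixel is exactly the kind of thing that would be worth optimizing, for example by building a whole byte before storing it.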

Everything is completely unoptimized - I'm just glad to get it working at all. Using a CoCo to prototype/test this kind of stuff is actually much nicer overall than working on a tiny microcontroller. The CoCo community has compiled a large suite of documentation over the years, and there's a very nice set of tools available. Not to mention I've got built-in (albeit limited) graphics, 128KB of RAM, a 6-bit DAC for sound output, and various forms of input. The 6809 is a surprisingly powerful 8/16-bit CPU to work on - if you push it right.

This poor 1986 CoCo3 has seen better days (the case is an all-yellow wreck). I need to find a machine in better shape:


Zoomed in further, with a 4x4 dither matrix:


picojpeg on a Tandy Color Computer 3 (6809 processor)



I've gotten my picojpeg.c module decoding JPEG images on the (almost) 30-year-old CoCo 3. Here's my first 16x16 decoded image on a 6809 CPU. (For all I know, it could be the first JPEG to be decoded on this particular CPU.) The upper nibble of each pixel's red component was written to the corresponding character of the default low-res 32x16 text screen. It's currently damn ugly, but this is good enough to verify the pixels are being decoded in this test image.

For the compiler, I'm using gcc6809, compiled under Ubuntu 13.10 (with several evil workarounds to get the compiler to build at all), ToolShed to convert the .bin file to a CoCo .dsk image file (running via Wine), and DriveWire 4 (running on another machine with Win7) to transfer the .dsk file to the CoCo. The .dsk image also has a small MS BASIC program to LOADM the .bin file and EXEC it at the right start address.

Unfortunately, I can't run it again without restarting the machine and booting up DriveWire again -- no idea why yet. If I hit reset and EXEC the decoder again it does something but the output image is bogus. This process is terribly slow because after restarting the machine I must CLOADM the DriveWire 4 client over the cassette port (hooked up to my PC's sound output). The DriveWire booting process uses Winamp to play back the cassette sound file!



The test JPEG, enlarged by 8x and converted to PNG:


The next step is to figure out why I can't restart it without shutting down, and then switch it over to a CoCo 3 16-color graphics mode and add some sort of dithered output. After that comes lots of 6809 specific optimization. It'll still be slow as molasses, even after tuning it, but it's fun anyway.

Figuring out how to build gcc6809 under Ubuntu 13.10, then getting this sucker to output CoCo compatible .bin files, was a pain. Figuring out how to enable gcc compiler optimizations without causing aslink to core dump was even more fun. I'll post all the things I had to do soon. It's pretty cool to have a modern C compiler capable of targeting the machine you started with.

Sunday, February 2, 2014

Few tips from Mike Sartain on QtCreator Projects, Ninja warning/error parsing

Mike has been improving how we use QtCreator on vogl a lot recently:

QtCreator projects

QtCreator and Ninja warning/error parsing

QtCreator's support for custom builds (in our case, cmake+ninja) is surprisingly flexible.

Friday, January 17, 2014

GL Game Devs: Please Send Us Your Game Traces!

We're going through all games in the Steam Linux catalog to ensure they are compatible with our new GL tracer/debugger toolset (VOGL). But this is backwards looking: what we really want is to ensure our full-stream tracing and state snapshot/restore code is compatible and correct with your new (most likely unreleased) GL code.

So if you would like to help right now, please use apitrace (on any "big" GL platform: Linux/OSX/Windows) to make a 30-90 second trace of your app's gameplay:
https://github.com/apitrace/apitrace

The more representative the trace is of your actual (end-user) GL usage, the better. We're especially interested in advanced GL 3.x/4.x usage patterns, or any new extensions you use. We'll add your trace to our private correctness and regression testing repo. We'll replay your trace, capture it with our tools, and take it from there.

E-mail me (richg at valvesoftware dot com) with the trace's location. For very large traces that are too big for Dropbox, etc., we'll set you up with a private FTP site.

Also, if your game is already on Steam Linux and you're actively debugging GL issues I can move your app to the top of the testing list.

Saturday, January 11, 2014

VOGL OpenGL Tracer/Debugger - Bonus Content

There's a bunch of content I wanted to get into our Steam Dev Days presentation on our new OpenGL tracer/debugger (VOGL), but you can only cram so much into a 20 minute presentation with 5-10 minutes set aside for demos. Here's the missing content:

Dev Environment

  • All code written and tested on Linux - Not a port
  • Distros we're developing on: Kubuntu 13.10, Ubuntu 12.04, Linux Mint
  • IDE: QtCreator v3.0.0
  • Building: cmake+ninja
  • Compiler: clang v3.3
    • gcc v4.6 works too, but is very slow
  • chroots used to standardize our build environment across dev machines
  • Source control: Mercurial, TortoiseHG


QtCreator v3.0.0: IDE, gdb debugger front-end, integrated source control:




First Non-Divergent Replay: Portal


  • 5.4 megacalls, 1.9 GB trace file




VOGL's Current GL Compatibility

  • GL v1 - v3.3, core or compatibility contexts, partial support for GL v4.x (full 4.x later this year)
  • Tracer: 2,652 functions, almost all auto-genned. Replayable: 1,498 functions, ~45% auto-genned
  • Fully supported extensions:
  • AMD_draw_buffers_blend, ARB_blend_func_extended, ARB_color_buffer_float, ARB_copy_buffer, ARB_create_context, ARB_debug_output, ARB_draw_buffers, ARB_draw_buffers_blend, ARB_draw_elements_base_vertex, ARB_framebuffer_object, ARB_get_proc_address, ARB_get_program_binary, ARB_gpu_shader_fp64, ARB_instanced_arrays, ARB_internalformat_query, ARB_internalformat_query2, ARB_map_buffer_range, ARB_multisample, ARB_multitexture, ARB_occlusion_query, ARB_point_parameters, ARB_program_interface_query, ARB_provoking_vertex, ARB_sample_shading, ARB_shader_atomic_counters, ARB_shader_objects, ARB_sync, ARB_texture_buffer_object, ARB_texture_compression, ARB_texture_multisample, ARB_texture_storage, ARB_texture_storage_multisample, ARB_timer_query, ARB_transpose_matrix, ARB_uniform_buffer_object, ARB_vertex_array_object, ARB_vertex_buffer_object, ARB_vertex_program, ARB_vertex_shader, ARB_vertex_type_2_10_10_10_rev, ARB_viewport_array, ARB_window_pos, EXT_bindable_uniform, EXT_blend_color, EXT_blend_equation_separate, EXT_blend_func_separate, EXT_blend_minmax, EXT_compiled_vertex_array, EXT_cull_vertex, EXT_depth_bounds_test, EXT_draw_buffers2, EXT_draw_instanced, EXT_draw_range_elements, EXT_fog_coord, EXT_framebuffer_blit, EXT_framebuffer_multisample, EXT_framebuffer_object, EXT_geometry_shader4, EXT_gpu_program_parameters, EXT_gpu_shader4, EXT_multi_draw_arrays, EXT_multisample, EXT_paletted_texture, EXT_point_parameters, EXT_polygon_offset, EXT_provoking_vertex, EXT_secondary_color, EXT_stencil_two_side, EXT_subtexture, EXT_swap_control, EXT_texture3D, EXT_texture_buffer_object, EXT_texture_integer, EXT_texture_object, EXT_timer_query, GREMEDY_frame_terminator, GREMEDY_string_marker, NV_vertex_program4, SGI_swap_control

  • Partial support for many more extensions, prioritized by usage and importance. ARB/EXT higher priority vs. vendor specific.
  • sharelist support
    • Replayer class is currently single threaded and automatically issues MakeCurrent()’s as needed.
    • Trace packets have timestamps and a global call counter, replayer issues them in “wall clock” time order.
  • Lots of support for old-school GL API’s:
    • ARB assembly language shaders (ARB_vertex_program and ARB_fragment_program)
    • Client side arrays (CSA’s):
      • Set via glVertexAttribPointer, glNormalPointer, glTexCoordPointer, glInterleavedArrays, etc.
    • glBegin/glEnd
    • Display lists:
      • Currently only support the most popular usages: whitelist of ~500 funcs, non-recursive, only texture binding
    • Fixed function pipeline:
      • Lights, texgen, texenv, materials, matrix stacks, etc.
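
The sharelist bullets above describe a single-threaded replayer that interleaves packets from several contexts by their global call counter and switches contexts as needed. A minimal sketch of that ordering idea in Python. The packet fields (`call_counter`, `context`, `func`) and the `make_current`/`issue` callbacks are hypothetical stand-ins, not VOGL's actual classes:

```python
import heapq

def replay_in_order(streams, make_current, issue):
    """Replay per-context packet streams in global call-counter order.

    streams: iterable of lists, each already sorted by 'call_counter'.
    make_current(ctx) is called only when the active context changes.
    """
    current = None
    merged = heapq.merge(*streams, key=lambda p: p["call_counter"])
    for packet in merged:
        if packet["context"] != current:
            current = packet["context"]
            make_current(current)      # hypothetical MakeCurrent() wrapper
        issue(packet)                  # hypothetical "call GL" hook
```

Merging on the global counter keeps the original interleaving between contexts, which matters when shared objects are created on one context and used on another.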

Completed and Short-Term Goals


  • Completed goals:
    • Survey all existing solutions, determine strengths/weaknesses of each.
    • Build database of all known GL API’s. No single definitive source found - so we combine:
      • Old Khronos .spec, apitrace's glapi.py, and Fournier's "gl-spec-parser" web scraper.
      • New Khronos XML spec not used yet - wasn’t available at the time.
    • Create 32/64-bit tracer SO and replayer tool based off a common set of reusable C++ classes
      • Replayer class accepts arbitrary packets from any source (even generated on the fly)
      • State snapshot/restore classes for generating GL state snapshots and restoring them
        • Snapshot classes just make standard GL calls, no knowledge of tracing or replaying
        • Any GL state that needs shadowing is handled by the tracer or replayer itself
      • All state objects serializable/deserializable to JSON/UBJ+binary files
    • Test tools on a variety of real and synthetic call streams, get as many apps to work as possible, iterate
  • Current goals:
    • Finish GL 3.x support before moving to 4.x - only a handful of GL 3.x state to snapshot/restore remaining at this point
    • Trace editor UI - needs a lot of love
    • Build library of app traces, implement continuous automated regression testing
    • Profile and optimize GL replayer class, build driver benchmarking tool
      • We’re already ~90% faster than apitrace’s replayer in -benchmark mode on Metro Last Light


Longer Term Goals


  • Trace editor/debugger UI
    • Full control over server: Click button to launch app on Steambox, another to capture frame, etc.
    • Full display of GL state vector, state vector diff’ing between two calls
    • Obvious things: Live editing, vertex/pixel history, CPU and GPU profiling, etc.
  • On the fly tracing
    • Tracer records state snapshots every X seconds, also continuously records ring buffer of GL trace packets
    • User clicks to save trace containing previous X seconds of gameplay to new trace file
  • Faster looping and seeking through huge traces (“DVR” replayer mode)
    • Lazy program shader compilation and linking during state restore/playback
    • Automatic keyframe generation during tracing or playback - use worker thread to write snapshots
    • Generate delta state snapshot objects, for fast seeking between keyframes/faster frame looping
  • Replayer perf: Fully multithreaded playback pipeline
    • For each context: thread A decodes packets and composes x86 opcodes to call GL, thread B execs this
  • Really long term: Vendor neutral shader debugging using standard GL calls
    • Compile shader to AST, insert atomic append ops that dump shader IP+variable state after each op to a huge buffer (only on selected vertex/pixel), output AST as GLSL, run draw with this shader
    • UI reads and parses buffer, now we have all the info we need to simulate shader stepping in a UI
    • Easier said than done, but we believe it’s inevitable that someone is going to do this right.
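
The on-the-fly tracing goal above (periodic state snapshots plus a ring buffer of packets) could look roughly like this. It is only a sketch under assumed names (`RollingTrace`, `add_snapshot`, `add_packet`, `save`), with snapshots and packets treated as opaque values and timestamps in seconds:

```python
from collections import deque

class RollingTrace:
    """Keep enough snapshots and packets to save the last `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.snapshots = deque()   # (time, state) pairs, oldest first
        self.packets = deque()     # (time, packet) pairs, oldest first

    def add_snapshot(self, t, state):
        self.snapshots.append((t, state))
        self._prune(t)

    def add_packet(self, t, packet):
        self.packets.append((t, packet))

    def _prune(self, now):
        # Keep the newest snapshot that is at least `window` seconds old (the
        # replay starting point) and drop everything recorded before it.
        while len(self.snapshots) > 1 and self.snapshots[1][0] <= now - self.window:
            self.snapshots.popleft()
        start = self.snapshots[0][0]
        while self.packets and self.packets[0][0] < start:
            self.packets.popleft()

    def save(self):
        """Return (starting snapshot, packets recorded since it)."""
        if not self.snapshots:
            return None, []
        start, state = self.snapshots[0]
        return state, [p for t, p in self.packets if t >= start]
```

The saved trace always starts at a snapshot, so it can cover slightly more than the requested window.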


JSON Support Details


  • Binary traces convertible to JSON traces and vice versa
    • Binary traces are dumped to: one JSON file per frame, loose files for large data blobs contained in trace packets, and the trace’s .zip archive.
    • JSON traces guaranteed lossless vs. binary: float/double’s coded as hex strings when needed.
    • .zip archive can be unzipped and deleted, trace is still replayable from loose files, and loose files take precedence over archive files.
  • Direct playback and debugging of JSON traces
  • JSON traces and blob files designed to be manually editable
    • Replayer does its best to fix up/adjust GL call parameters as needed.
    • Many fields optional, and we’ve tried to avoid “magic” keys or field interdependencies.
  • Binary traces also make extensive use of the Universal Binary JSON (UBJ) format: http://ubjson.org/
  • The voglcommon lib contains helper classes to read/write traces and packets, or you can read the JSON data yourself.
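
To illustrate the "hex strings when needed" idea from the lossless bullet above, here is one way it could work for 32-bit floats. The exact rule and string format VOGL uses are not given here, so the `0x%08X` encoding, the short-decimal check, and the function names are all assumptions:

```python
import math
import struct

def _bits32(value):
    """Return the IEEE-754 single-precision bit pattern of value."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def encode_float32(value):
    """Return a JSON-safe value that decodes back to the same 32-bit float."""
    bits = _bits32(value)
    if math.isfinite(value):
        short = float("%g" % value)        # short, human-friendly decimal
        if _bits32(short) == bits:
            return short                   # decimal text is already lossless
    return "0x%08X" % bits                 # fall back to the raw bit pattern

def decode_float32(encoded):
    """Inverse of encode_float32; accepts a number or a hex string."""
    if isinstance(encoded, str):
        return struct.unpack("<f", struct.pack("<I", int(encoded, 16)))[0]
    return struct.unpack("<f", struct.pack("<f", encoded))[0]
```

With this rule a value like 0.25 stays a plain number, while 1/3 (as a 32-bit float) and infinities end up as hex strings.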





Minimal OpenGL JSON Trace

Sample JSON full-stream OpenGL trace, visualized as a graph:



Source:
// draw_triangle.json - Draws 1 white triangle on a gray background
{
   "meta" : { "cur_frame" : 0, "eof" : true }, "sof" : { "pointer_sizes" : 4 },

   "packets" : [
    { "func" : "glXCreateContext", "context" : "0x0", "params" : { "dpy" : "0x1", "vis" : "0x1", "shareList" : "0x0", "direct" : true }, "return" : "0x1" },
    { "func" : "glXMakeCurrent", "context" : "0x0", "params" : { "dpy" : "0x1", "drawable" : "0x1", "context" : "0x1" }, "return" : true },

    { "func" : "glViewport", "params" : { "x" : 0, "y" : 0, "width" : 400, "height" : 200 } },

    { "func" : "glClearColor", "params" : { "red" : 0.25, "green" : 0.25, "blue" : 0.25, "alpha" : 1.0 } },
    { "func" : "glClear", "params" : { "mask" : "0x4000" } },

    { "func" : "glMatrixMode", "params" : { "mode" : "GL_PROJECTION" } },
    { "func" : "glLoadIdentity" },

    { "func" : "glMatrixMode", "params" : { "mode" : "GL_MODELVIEW" } },
    { "func" : "glLoadIdentity" },

    { "func" : "glColor3f", "params" : { "red" : 1.0, "green" : 1.0, "blue" : 1.0 } },
    { "func" : "glScalef", "params" : { "x" : 0.2, "y" : 0.2, "z" : 1.0 } },
    { "func" : "glTranslatef", "params" : { "x" : -1.5, "y" : 0.0, "z" : 0.0 } },

    { "func" : "glBegin", "params" : { "mode" : "GL_TRIANGLES" } },
    { "func" : "glVertex2f", "params" : { "x" : 0.0, "y" : 4.0 } },
    { "func" : "glVertex2f", "params" : { "x" : 4.0, "y" : 0.0 } },
    { "func" : "glVertex2f", "params" : { "x" : 0.0, "y" : 0.0 } },
    { "func" : "glEnd" },

    { "func" : "glXSwapBuffers", "params" : { "dpy" : "0x1", "drawable" : "0x1" } }
   ]
}
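
If you'd rather read a trace like this yourself than go through voglcommon, something along these lines is enough to walk the packets. It's a sketch in Python using only the standard json module. The helper names and the small excerpt are made up for illustration, and since strict JSON has no comments, whole-line `//` comments are dropped before parsing:

```python
import json

def load_trace(text):
    """Parse a JSON trace, ignoring whole-line // comments."""
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("//")]
    return json.loads("\n".join(lines))

def list_calls(trace):
    """Return (func, params) pairs for every packet; params may be absent."""
    return [(p["func"], p.get("params", {})) for p in trace["packets"]]

# Small excerpt in the same shape as the trace above (illustrative only).
EXCERPT = """
// excerpt of draw_triangle.json
{
  "packets" : [
    { "func" : "glClear", "params" : { "mask" : "0x4000" } },
    { "func" : "glLoadIdentity" }
  ]
}
"""

if __name__ == "__main__":
    for func, params in list_calls(load_trace(EXCERPT)):
        print(func, params)
```

Note that handles and bitmasks are hex strings in the trace, so something like `int(params["mask"], 16)` is needed to get the numeric value back.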

Output from voglreplay draw_triangle.json -dump_screenshots:


OpenGL State Snapshot JSON File - Visualized as a Graph

Been using graphviz's dot tool to visualize JSON files containing OpenGL state snapshots. This tree only contains high-level GL state. The large texture, buffer, shader, etc. data is written to loose binary files and referenced by unique filenames in the JSON data, which isn't represented here.

I had to limit the max # of values per JSON array/object to 20 otherwise dot falls over and takes hours to complete. I wish I knew of a faster graph visualization tool.
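
For reference, the conversion can be approximated with a few lines of Python that emit dot text and cap the number of children per array/object. This is a sketch, not the script used for the image above: the node naming, labels, and the `max_children` parameter are assumptions.

```python
import json

def json_to_dot(value, max_children=20):
    """Render a parsed JSON value as Graphviz dot text, capping fan-out."""
    lines = ["digraph json {"]
    counter = [0]

    def add_node(label):
        name = "n%d" % counter[0]
        counter[0] += 1
        # json.dumps gives a double-quoted, escaped string that dot accepts.
        lines.append("  %s [label=%s];" % (name, json.dumps(str(label))))
        return name

    def walk(val, label):
        node = add_node(label)
        if isinstance(val, dict):
            children = list(val.items())
        elif isinstance(val, list):
            children = [("[%d]" % i, v) for i, v in enumerate(val)]
        else:
            leaf = add_node(json.dumps(val))   # scalar value as its own node
            lines.append("  %s -> %s;" % (node, leaf))
            return node
        for key, child in children[:max_children]:
            lines.append("  %s -> %s;" % (node, walk(child, key)))
        if len(children) > max_children:
            more = add_node("... %d more" % (len(children) - max_children))
            lines.append("  %s -> %s;" % (node, more))
        return node

    walk(value, "root")
    lines.append("}")
    return "\n".join(lines)
```

The output can be written to a file and rendered with something like `dot -Tsvg state.dot -o state.svg`.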