Saturday, January 11, 2014

VOGL OpenGL Tracer/Debugger - Bonus Content

There's a bunch of content I wanted to get into our Steam Dev Days presentation on our new OpenGL tracer/debugger (VOGL), but you can only cram so much into a 20 minute presentation with 5-10 minutes set aside for demos. Here's the missing content:

Dev Environment

  • All code written and tested on Linux - Not a port
  • Distros we're developing on: Kubuntu 13.10, Ubuntu 12.04, Linux Mint
  • IDE: QtCreator v3.0.0
  • Building: cmake+ninja
  • Compiler: clang v3.3
    • gcc v4.6 works too, but is very slow
  • chroots used to standardize our build environment across dev machines
  • Source control: Mercurial, TortoiseHG


QtCreator v3.0.0: IDE, gdb debugger front-end, integrated source control:




First Non-Divergent Replay: Portal


  • 5.4 megacalls, 1.9 GB trace file




VOGL's Current GL Compatibility

  • GL v1 - v3.3, core or compatibility contexts, partial support for GL v4.x (full 4.x later this year)
  • Tracer: 2,652 functions, almost all auto-genned. Replayable: 1,498 functions, ~45% auto-genned
  • Fully supported extensions:
  • AMD_draw_buffers_blend, ARB_blend_func_extended, ARB_color_buffer_float, ARB_copy_buffer, ARB_create_context, ARB_debug_output, ARB_draw_buffers, ARB_draw_buffers_blend, ARB_draw_elements_base_vertex, ARB_framebuffer_object, ARB_get_proc_address, ARB_get_program_binary, ARB_gpu_shader_fp64, ARB_instanced_arrays, ARB_internalformat_query, ARB_internalformat_query2, ARB_map_buffer_range, ARB_multisample, ARB_multitexture, ARB_occlusion_query, ARB_point_parameters, ARB_program_interface_query, ARB_provoking_vertex, ARB_sample_shading, ARB_shader_atomic_counters, ARB_shader_objects, ARB_sync, ARB_texture_buffer_object, ARB_texture_compression, ARB_texture_multisample, ARB_texture_storage, ARB_texture_storage_multisample, ARB_timer_query, ARB_transpose_matrix, ARB_uniform_buffer_object, ARB_vertex_array_object, ARB_vertex_buffer_object, ARB_vertex_program, ARB_vertex_shader, ARB_vertex_type_2_10_10_10_rev, ARB_viewport_array, ARB_window_pos, EXT_bindable_uniform, EXT_blend_color, EXT_blend_equation_separate, EXT_blend_func_separate, EXT_blend_minmax, EXT_compiled_vertex_array, EXT_cull_vertex, EXT_depth_bounds_test, EXT_draw_buffers2, EXT_draw_instanced, EXT_draw_range_elements, EXT_fog_coord, EXT_framebuffer_blit, EXT_framebuffer_multisample, EXT_framebuffer_object, EXT_geometry_shader4, EXT_gpu_program_parameters, EXT_gpu_shader4, EXT_multi_draw_arrays, EXT_multisample, EXT_paletted_texture, EXT_point_parameters, EXT_polygon_offset, EXT_provoking_vertex, EXT_secondary_color, EXT_stencil_two_side, EXT_subtexture, EXT_swap_control, EXT_texture3D, EXT_texture_buffer_object, EXT_texture_integer, EXT_texture_object, EXT_timer_query, GREMEDY_frame_terminator, GREMEDY_string_marker, NV_vertex_program4, SGI_swap_control

  • Partial support for many more extensions, prioritized by usage and importance. ARB/EXT higher priority vs. vendor specific.
  • sharelist support
    • Replayer class is currently single threaded and automatically issues MakeCurrent()’s as needed.
    • Trace packets have timestamps and a global call counter, replayer issues them in “wall clock” time order.
  • Lots of support for old-school GL API’s:
    • ARB assembly language shaders (ARB_vertex_program and ARB_fragment_program)
    • Client side arrays (CSA’s):
      • Set via glVertexAttribPointer, glNormalPointer, glTexCoordPointer, glInterleavedArrays, etc.
    • glBegin/glEnd
    • Display lists:
      • Currently only support the most popular usages: whitelist of ~500 funcs, non-recursive, only texture binding
    • Fixed function pipeline:
      • Lights, texgen, texenv, materials, matrix stacks, etc.
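
The sharelist bullets above describe a single-threaded replayer that issues packets in "wall clock" order and switches contexts as needed. Here is a minimal sketch of that idea in Python. The Packet record and its field names are hypothetical (VOGL's real packet layout isn't shown in this post), and each per-thread stream is assumed to be already sorted:

```python
import heapq
from collections import namedtuple

# Hypothetical packet record, for illustration only.
Packet = namedtuple("Packet", "timestamp call_counter context func")

def replay_order(per_thread_streams):
    """Merge already-sorted per-thread streams into one stream ordered by
    (timestamp, global call counter)."""
    return heapq.merge(*per_thread_streams,
                       key=lambda p: (p.timestamp, p.call_counter))

def replay(per_thread_streams):
    """List the calls a single-threaded replayer would issue, inserting a
    MakeCurrent whenever the next packet belongs to a different context."""
    issued = []
    current = None
    for pkt in replay_order(per_thread_streams):
        if pkt.context != current:
            issued.append(("MakeCurrent", pkt.context))
            current = pkt.context
        issued.append((pkt.func, pkt.context))
    return issued

thread_a = [Packet(1.0, 0, "ctx1", "glClear"),
            Packet(3.0, 2, "ctx1", "glDrawArrays")]
thread_b = [Packet(2.0, 1, "ctx2", "glTexImage2D")]
calls = replay([thread_a, thread_b])
```

The global call counter acts as a tie-breaker when two packets carry the same timestamp.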

Completed and Short-Term Goals


  • Completed goals:
    • Survey all existing solutions, determine strengths/weaknesses of each.
    • Build database of all known GL API’s. No single definitive source found - so we combine:
      • Old Khronos .spec, apitrace's glapi.py, and Fournier's "gl-spec-parser" web scraper.
      • New Khronos XML spec not used yet - wasn’t available at the time.
    • Create 32/64-bit tracer SO and replayer tool based off a common set of reusable C++ classes
      • Replayer class accepts arbitrary packets from any source (even generated on the fly)
      • State snapshot/restore classes for generating GL state snapshots and restoring them
        • Snapshot classes just make standard GL calls, no knowledge of tracing or replaying
        • Any GL state that needs shadowing is handled by the tracer or replayer itself
      • All state objects serializable/deserializable to JSON/UBJ+binary files
    • Test tools on a variety of real and synthetic call streams, get as many apps to work as possible, iterate
  • Current goals:
    • Finish GL 3.x support before moving to 4.x - only a handful of GL 3.x state to snapshot/restore remaining at this point
    • Trace editor UI - needs a lot of love
    • Build library of app traces, implement continuous automated regression testing
    • Profile and optimize GL replayer class, build driver benchmarking tool
      • We’re already ~90% faster than apitrace’s replayer in -benchmark mode on Metro Last Light


Longer Term Goals


  • Trace editor/debugger UI
    • Full control over server: Click button to launch app on Steambox, another to capture frame, etc.
    • Full display of GL state vector, state vector diff’ing between two calls
    • Obvious things: Live editing, vertex/pixel history, CPU and GPU profiling, etc.
  • On the fly tracing
    • Tracer records state snapshots every X seconds, also continuously records ring buffer of GL trace packets
    • User clicks to save trace containing previous X seconds of gameplay to new trace file
  • Faster looping and seeking through huge traces (“DVR” replayer mode)
    • Lazy program shader compilation and linking during state restore/playback
    • Automatic keyframe generation during tracing or playback - use worker thread to write snapshots
    • Generate delta state snapshot objects, for fast seeking between keyframes/faster frame looping
  • Replayer perf: Fully multithreaded playback pipeline
    • For each context: thread A decodes packets and composes x86 opcodes to call GL, thread B execs this
  • Really long term: Vendor neutral shader debugging using standard GL calls
    • Compile shader to AST, insert atomic append ops that dump shader IP+variable state after each op to a huge buffer (only on selected vertex/pixel), output AST as GLSL, run draw with this shader
    • UI reads and parses buffer, now we have all the info we need to simulate shader stepping in a UI
    • Easier said than done, but we believe it’s inevitable that someone is going to do this right.
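
The "on the fly tracing" goal above (periodic state snapshots plus a ring buffer of packets) can be sketched roughly like this. This is only an illustration of the bookkeeping: TraceRing, its parameters, and the take_snapshot callback are made-up names, not VOGL classes.

```python
from collections import deque

class TraceRing:
    """Keeps periodic state snapshots plus the packets recorded since the
    oldest retained snapshot, so the last `window` seconds can be saved."""

    def __init__(self, window, snapshot_interval):
        self.window = window
        self.snapshot_interval = snapshot_interval
        self.segments = deque()  # each: {"time", "snapshot", "packets"}

    def record(self, now, packet, take_snapshot):
        # Start a new segment when none exists or the interval has elapsed.
        last = self.segments[-1] if self.segments else None
        if last is None or now - last["time"] >= self.snapshot_interval:
            self.segments.append(
                {"time": now, "snapshot": take_snapshot(), "packets": []})
        self.segments[-1]["packets"].append(packet)
        # A segment ends where the next one begins; drop segments that end
        # before the start of the window.
        while (len(self.segments) > 1
               and self.segments[1]["time"] <= now - self.window):
            self.segments.popleft()

    def save(self):
        """Return (snapshot, packets): replayable from the first packet."""
        packets = [p for seg in self.segments for p in seg["packets"]]
        return self.segments[0]["snapshot"], packets
```

Because a saved trace always starts at a snapshot, it can cover slightly more than the requested window, never less (once enough has been recorded).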


JSON Support Details


  • Binary traces convertible to JSON traces and vice versa
    • Binary traces are dumped to: one JSON file per frame, loose files for large data blobs contained in trace packets, and the trace’s .zip archive.
    • JSON traces guaranteed lossless vs. binary: float/double’s coded as hex strings when needed.
    • .zip archive can be unzipped and deleted, trace is still replayable from loose files, and loose files take precedence over archive files.
  • Direct playback and debugging of JSON traces
  • JSON traces and blob files designed to be manually editable
    • Replayer does its best to fix up/adjust GL call parameters as needed.
    • Many fields optional, and we’ve tried to avoid “magic” keys or field interdependencies.
  • Binary traces also make extensive use of the Universal Binary JSON (UBJ) format: http://ubjson.org/
  • The voglcommon lib contains helper classes to read/write traces and packets, or you can read the JSON data yourself.
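
To illustrate the "hex strings when needed" point above, here is a rough sketch of one way to keep doubles lossless in a text format. The 9-digit threshold and the "0x..." string form are my assumptions for the example, not VOGL's actual encoding:

```python
import math
import struct

def encode_double(x):
    """Return a JSON token that decodes to exactly the same double."""
    short = "%.9g" % x
    if math.isfinite(x) and float(short) == x:
        return short  # short decimal form round-trips: plain JSON number
    # Otherwise fall back to the raw IEEE-754 bits as a hex string.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return '"0x%016X"' % bits

def decode_double(token):
    """Inverse of encode_double."""
    if token.startswith('"0x'):
        bits = int(token.strip('"'), 16)
        return struct.unpack("<d", struct.pack("<Q", bits))[0]
    return float(token)
```

Values like 0.25 stay human-editable, while something like 1.0/3.0 (or NaN/infinity, which JSON numbers can't express) falls back to the hex form.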



Minimal OpenGL JSON Trace

Sample JSON full-stream OpenGL trace, visualized as a graph:



Source:
// draw_triangle.json - Draws 1 white triangle on a gray background
{
   "meta" : { "cur_frame" : 0, "eof" : true }, "sof" : { "pointer_sizes" : 4 },
   
   "packets" : [
    { "func" : "glXCreateContext", "context" : "0x0", "params" : { "dpy" : "0x1", "vis" : "0x1", "shareList" : "0x0", "direct" : true  }, "return" : "0x1" },
    { "func" : "glXMakeCurrent", "context" : "0x0", "params" : { "dpy" : "0x1", "drawable" : "0x1", "context" : "0x1" }, "return" : true },
   
    { "func" : "glViewport", "params" : { "x" : 0, "y" : 0, "width" : 400, "height" : 200 } },
   
    { "func" : "glClearColor", "params" : { "red" : 0.25, "green" : .25, "blue" : .25, "alpha" : 1. } },
    { "func" : "glClear", "params" : { "mask" : "0x4000" } },
   
    { "func" : "glMatrixMode", "params" : { "mode" : "GL_PROJECTION" }, },

    { "func" : "glLoadIdentity" },
   
    { "func" : "glMatrixMode", "params" : { "mode" : "GL_MODELVIEW" } },
    { "func" : "glLoadIdentity" },
     
    { "func" : "glColor3f", "params" : { "red" : 1., "green" : 1., "blue" : 1. }, },
    { "func" : "glScalef", "params" : { "x" : 0.2, "y" : 0.2, "z" : 1. } },
    { "func" : "glTranslatef", "params" : { "x" : -1.5, "y" : 0., "z" : 0. } },

    { "func" : "glBegin", "params" : { "mode" : "GL_TRIANGLES" } },
    { "func" : "glVertex2f", "params" : { "x" : 0., "y" : 4. } },
    { "func" : "glVertex2f", "params" : { "x" : 4., "y" : 0. }, },
    { "func" : "glVertex2f", "params" : { "x" : 0., "y" : 0. } },
    { "func" : "glEnd" },

    { "func" : "glXSwapBuffers", "params" : {"dpy" : "0x1", "drawable" : "0x1" } }
   ]
}
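
Since the traces are plain JSON, they can also be generated by a script. Here's a small sketch that builds the same triangle trace with Python's json module. The field names are copied from the sample above; the gl() helper is made up for this example, and I haven't verified the output against voglreplay:

```python
import json

def gl(func, **params):
    """Build one packet; "params" is omitted for calls with no arguments."""
    packet = {"func": func}
    if params:
        packet["params"] = params
    return packet

def triangle_trace():
    create = gl("glXCreateContext", dpy="0x1", vis="0x1",
                shareList="0x0", direct=True)
    create.update({"context": "0x0", "return": "0x1"})
    current = gl("glXMakeCurrent", dpy="0x1", drawable="0x1", context="0x1")
    current.update({"context": "0x0", "return": True})
    packets = [
        create, current,
        gl("glViewport", x=0, y=0, width=400, height=200),
        gl("glClearColor", red=0.25, green=0.25, blue=0.25, alpha=1.0),
        gl("glClear", mask="0x4000"),
        gl("glMatrixMode", mode="GL_PROJECTION"), gl("glLoadIdentity"),
        gl("glMatrixMode", mode="GL_MODELVIEW"), gl("glLoadIdentity"),
        gl("glColor3f", red=1.0, green=1.0, blue=1.0),
        gl("glScalef", x=0.2, y=0.2, z=1.0),
        gl("glTranslatef", x=-1.5, y=0.0, z=0.0),
        gl("glBegin", mode="GL_TRIANGLES"),
        gl("glVertex2f", x=0.0, y=4.0),
        gl("glVertex2f", x=4.0, y=0.0),
        gl("glVertex2f", x=0.0, y=0.0),
        gl("glEnd"),
        gl("glXSwapBuffers", dpy="0x1", drawable="0x1"),
    ]
    return {"meta": {"cur_frame": 0, "eof": True},
            "sof": {"pointer_sizes": 4},
            "packets": packets}

text = json.dumps(triangle_trace(), indent=2)
```

Unlike the hand-written file, the generated text is strict JSON (no comments, no bare ".25" or "1." literals).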

Output from voglreplay draw_triangle.json -dump_screenshots:


OpenGL State Snapshot JSON File - Visualized as a Graph

Been using graphviz's dot tool to visualize JSON files containing OpenGL state snapshots. This tree only contains high-level GL state. The large texture, buffer, shader, etc. data is written to loose binary files and referenced by unique filenames in the JSON data, which isn't represented here.

I had to limit the max # of values per JSON array/object to 20, otherwise dot falls over and takes hours to complete. I wish I knew of a faster graph visualization tool.
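
For anyone who wants to try something similar, here is a rough sketch of a JSON-to-dot converter with the same 20-children cap. This is not the script I used, just an illustration; json_to_dot and its node naming are invented for the example:

```python
import json

MAX_CHILDREN = 20  # cap per JSON array/object, as described above

def json_to_dot(value, name="root"):
    """Return Graphviz dot source for a parsed JSON value."""
    lines = ["digraph snapshot {"]
    counter = [0]

    def node(label):
        node_id = "n%d" % counter[0]
        counter[0] += 1
        # json.dumps yields a double-quoted, escaped label string.
        lines.append("  %s [label=%s];" % (node_id, json.dumps(str(label))))
        return node_id

    def walk(val, label):
        if isinstance(val, dict):
            children = list(val.items())
        elif isinstance(val, list):
            children = list(enumerate(val))
        else:
            # Leaf: show key and value in a single node.
            return node("%s = %s" % (label, json.dumps(val)))
        me = node(label)
        for key, child in children[:MAX_CHILDREN]:
            child_id = walk(child, key)
            lines.append("  %s -> %s;" % (me, child_id))
        if len(children) > MAX_CHILDREN:
            extra = node("... %d more" % (len(children) - MAX_CHILDREN))
            lines.append("  %s -> %s;" % (me, extra))
        return me

    walk(value, name)
    lines.append("}")
    return "\n".join(lines)
```

Write the result to a file and render it with something like: dot -Tpng snapshot.dot -o snapshot.png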


Saturday, November 9, 2013

Some Advice When Starting a New Job

I think this advice would be helpful working almost anywhere:

- Make some friends
- Earn some respect
- Ask some questions
- Don't piss off your company's customers

Saturday, October 26, 2013

QtCreator's Python Debug Visualizers

Peter Lohrmann wrote QtCreator debug visualizers in Python for some key classes used by the Linux OpenGL debugger project we've been working on together. He recently blogged about the details here.

So far we've got visualizers for our dynamic_string and vector classes. (Like many/most game devs, we use our own custom containers to minimize our reliance on the C++ runtime and "standard" libraries, but that's another story.) Before, to visualize the contents of vectors in QtCreator, I had to muck around in the mud with the watch window and type in the object's name, followed by the pointer and the # of elements to view. Our dynamic_string class uses the small string optimization (not the super optimized version that Thatcher describes here, just something basic to get the job done). So it's been a huge pain to visualize strings, or basically anything in the watch/locals window.

The below pic shows the new debug visualizers in action on a vector of vectors containing dynamic_strings.  Holy shit, it just works!

I'm not a big fan of Python, but this is valuable and cool enough to make it worth my while to learn it.


Here's the code. Almost all of this is Peter's work, I've just tweaked the vector dumper to fix some things. I'm a total Python newbie so it's possible I screwed something up here, but this is working much better than I expected already. It's amazing how something simple like this on Linux can make me so happy.

You can find a bunch of QtCreator's debug visualizer code here: ~/qtcreator-2.8.0/share/qtcreator/dumper

In my ~/.gdbinit file:

python
execfile('/home/richg/dev/raddebugger/src/crnlib/gdb-dumpers.py')
end
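
One caveat: execfile() only exists in Python 2. If your gdb embeds Python 3 (that depends on how your distro built it), something along these lines should do the same job inside the python ... end block. run_script is just a name I picked for this sketch:

```python
import os

def run_script(path, namespace=None):
    """Python 3 stand-in for execfile(path): run the file's code in the
    given namespace (defaults to this module's globals)."""
    if namespace is None:
        namespace = globals()
    path = os.path.expanduser(path)
    with open(path) as f:
        code = compile(f.read(), path, "exec")
    exec(code, namespace)
    return namespace
```

In .gdbinit you would then call run_script() with the path to gdb-dumpers.py instead of execfile().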

And here's my /home/richg/dev/raddebugger/src/crnlib/gdb-dumpers.py file:

#!/usr/bin/python

# This file contains debug dumpers / helpers / visualizers so that certain crnlib
# classes can be more easily inspected by gdb and QtCreator.

def qdump__crnlib__dynamic_string(d, value):
    # dynamic_string uses the small string optimization: the characters live
    # in m_small.m_buf when m_small.m_flag is 1, otherwise in m_dyn.m_pStr.
    dyn = value["m_dyn"]
    small = value["m_small"]
    length = value["m_len"]  # renamed so the len() builtin isn't shadowed
    small_flag = small["m_flag"]
    d.putAddress(value.address)
    buf = dyn["m_pStr"]
    if small_flag == 1:
        buf = small["m_buf"]
    p = buf.cast(lookupType("unsigned char").pointer())
    strPrefix = "[%d] " % int(length)
    text = "'" + p.string(length=int(length)) + "'"
    # Summary line shown in the watch/locals window: "[len] 'contents'"
    d.putValue(strPrefix + text)
    d.putNumChild(3)
    with Children(d):
        d.putSubItem("m_len", length)
        # Only the active representation shows the string, the other is marked <ignored>.
        with SubItem(d, "m_small"):
            d.putValue(text if small_flag == 1 else "<ignored>")
            d.putNumChild(2)
            with Children(d):
                d.putSubItem("m_flag", small_flag)
                with SubItem(d, "m_buf"):
                    d.putValue(text if small_flag == 1 else "<ignored>")
        with SubItem(d, "m_dyn"):
            d.putValue("<ignored>" if small_flag == 1 else text)
            d.putNumChild(2)
            with Children(d):
                with SubItem(d, "m_buf_size"):
                    d.putValue("<ignored>" if small_flag == 1 else dyn["m_buf_size"])
                with SubItem(d, "m_pStr"):
                    d.putValue("<ignored>" if small_flag == 1 else text)

def qdump__crnlib__vector(d, value):
    # Convert the gdb.Value counters to plain ints before using them with min()/range().
    size = int(value["m_size"])
    capacity = int(value["m_capacity"])
    data = value["m_p"]
    maxDisplayItems = 100  # cap on how many elements get expanded
    innerType = d.templateArgument(value.type, 0)
    p = gdb.Value(data.cast(innerType.pointer()))
    d.putValue('Size: {} Capacity: {} Data: {}'.format(size, capacity, data))
    d.putNumChild(size)
    numDisplayItems = min(maxDisplayItems, size)
    if d.isExpanded():
        with Children(d, size, maxNumChild=numDisplayItems, childType=innerType, addrBase=p, addrStep=p.dereference().__sizeof__):
            # Walk the element pointer across the first numDisplayItems entries.
            for i in range(0, numDisplayItems):
                d.putSubItem(i, p.dereference())
                p += 1

Saturday, October 19, 2013

A Shout-Out to QtCreator 2.8.x on Linux

So this is a little post about C/C++ IDEs; apart from the browser, the IDE is the key piece of software I live in most of the day. I know a lot of Windows-centric developers who swear by Visual Studio, and up until recently I was one of the VS faithful. I'm going to try to sell you on trying something else, especially if you develop on Linux or OSX (it's available for Windows too).

I think I've finally found a reasonable cross platform VS alternative for C/C++ development that doesn't require shelling out hundreds (or thousands) of dollars every time MS tweaks (or totally screws up) the UI or adds some compiler options. I've been using QtCreator full-time now for 6 months and I think it's awesome. I would buy it in a heartbeat, but it's a free download and it's even open source.

A bit of the background behind my need for a VS alternative: For more than a decade I've been using Visual Studio (since VC5 I think), and various other IDE's from Borland/Watcom/MS before that. When I started working on a new Linux OpenGL debugger (about 6 months ago) all the Linux devs around me were using text editors, cgdb, etc. There was no way in hell I was going back to only a text editor (even the goodness that is Sublime) for editing, gdb cmd line for debugging, and another command line for compiling, etc. It's been a long time since my DOS development days and I'm just too old to do that again on the PC. (On embedded platforms I can tolerate crappy or no IDE's, but not on a full-blown modern desktop!) So I began an exhaustive and somewhat desperate search for a real Linux IDE with a useful debugger that doesn't suck.

I experimented with a bunch of packages (such as CodeBlocks, CodeLite, Eclipse, KDevelop, etc.) and even some stand-alone debuggers (like ddd, cgdb) on some of my open source projects and settled on the amazing QtCreator 2.8.x. It's a full blown C/C++ IDE with surprisingly few rough edges. It's got all the usual stuff you would expect: editor, project manager (with optional support for things like cmake), integrated source control, an Intellisense-equivalent that just works and doesn't randomly slow the IDE to a crawl like in VS, C/C++ refactoring, and nice gdb/lldb frontends that don't require you to know anything about obscure gdb commands. I've been using it to compile with either clang v3.3 (using Mike Sartain's instructions, which make it trivial to switch between clang and gcc) or gcc v4.6. The whole product is super polished, and I find myself happier using it than VS and its fleet of unreliable (but pretty much necessary on real projects) 3rd party plugins like Visual Assist, Incredibuild, etc. that make the whole thing a buggy and unstable mess.

QtCreator's name can be misleading. It's not just for Qt stuff, although it's obviously designed to be great for Qt dev/debugging too. I use it to debug command line and OpenGL apps, either starting them from within QtCreator or attaching to the process remotely. It's got built-in support for Mercurial (hg), Git, Perforce, SVN, etc. although I've only used its hg and p4 support.


Visual Studio since 2012 has apparently gone almost completely batshit, so I've been delaying upgrading for as long as possible even before my VS divorce. I was hoping the saner and more tasteful hands at MS would rein in the "modern app" idiocy and fix things, but I've lost hope. Although with Ballmer (who's obviously been completely out of touch) finally being put out to pasture, maybe they can turn the ship around.

Here are a few more screenshots of QtCreator Linux in action. I'm using KDE Plasma desktop installed under Ubuntu v12.04 x64. (If you've just installed Ubuntu for the first time and have no Linux desktop preferences yet, do yourself a favor and just go install Kubuntu instead.) If you want to try it out, be sure to download the version from Qt's website (not the Ubuntu software center, which was really outdated the last time I checked). Also check if your distro requires disabling ptrace hardening before you debug anything. I also had to change the default terminal used for running/debugging apps to something else, so under Tools->Environment->Terminal: "/usr/bin/xterm -sl 1999999 -fg white -bg black -geometry 200x60 -e"

We've also just added custom debug visualizers for our most important container classes to QtCreator, but I've not had a chance to play with this stuff yet.

Source control configuration:


Debugging:


More debugging:



Sunday, October 13, 2013

The big miniz zip64 merge

Currently merging miniz v1.15 into my in-progress zip64 branch which is being used/tested in the Linux OpenGL debugger engine project I've been working on. The left pane is v1.15, right is the new version with zip64.


Fun times. The new version still needs to be C-ified in a few places. I'm actually liking the purity of C for some strange reason, it's amazing how fast it compiles vs. C++. I'm so used to glacially slow compiles that when I used TCC (Tiny C Compiler) again it appeared to complete so fast that I thought it had silently crashed.