Friday, April 4, 2014

vogl support for Unreal Engine 4

We're extremely excited that Epic is porting Unreal Engine 4 to Linux -- see the official announcement or some press here and here. Once we heard UE4 Linux was coming we pretty much dropped everything to ensure vogl can handle UE4 callstreams. The latest code on github now supports full-stream tracing/replaying and trimming of UE4 callstreams in either GL3 or GL4 mode. UI support for UE4 is still in the early stages, but now that we can snapshot/restore UE4 and continue to play back the callstream without diverging, it's only a matter of time before the UI comes up to speed.

UE4's OpenGL renderer is the most advanced we've worked with so far. It has provided us with valuable real-world test cases for several modern GPU features we've not had traces to validate our code against, such as compute shaders and cubemap arrays. We'll be making UE4 GL callstreams part of our regression test suite going forward.

Here are some shots of a trace of UE4's test game being replayed in voglreplay64's --interactive mode (which relies on state snapshotting/restoring):




Here's a trimmed trace loaded in the editor:


Known problems:

  • UI: Peter Lohrmann just added a dropdown that lets you select which context's state to view. This code is hot off the presses and is a bit fiddly at the moment. Also, UE4 uses several texture formats that the vogl UI can't display right now (LunarG is helping us fix this, see below.)
  • Snapshotting UE4 during tracing is currently unsupported (but snapshotting during replaying works), because the tracer can't snapshot state while any buffers are mapped. (We also have this problem with the Steam Big Picture renderer.) We have a fix in the works.
  • We're seeing several query-related warnings/errors while snapshotting and replaying UE4 callstreams. (This problem is in vogl's replayer, not UE4.) These need to be investigated, but they don't seem to cause the replayer to diverge.
  • There are several "zombie" buffer objects that have been deleted on one context but remain bound on another, which causes the snapshot system to report handle remapping errors on these objects during snapshotting. These buffers don't appear to be actually referenced after they are deleted, so this doesn't cause the replay to diverge. We've got some ideas on how to improve vogl's handling of this scenario (which is unfortunately very easy to do by accident in GL).

Other news:

LunarG has provided us with the first drop of their universal OpenGL texture format converter/transformer module, which will be going open source soon. This module allows us to convert any type of OpenGL/KTX texture data to various canonical formats (such as 8-bit or float RGBA) in a driver independent manner, with the optional transforms we need to build a good texture/framebuffer viewer UI. The current vogl UI uses some temporary and very incomplete stand-in code to convert textures to formats Qt accepts, so we're really looking forward to switching to LunarG's solution.

Finally, John McDonald recently joined Valve and the SteamOS team and is currently getting up to speed on the vogl codebase.


Friday, March 21, 2014

couple vogl debugger/editor UI screenshots

vogl's UI (being worked on by Peter Lohrmann) has come far in the past month. I used it today while debugging what seemed to be a replay bug in Xonotic (reported by a dev named blackout24 on github). I first trimmed a single frame that clearly showed the problem, then played back this trimmed trace in an endless loop to verify the issue still showed up in the trim. I manually trimmed the source trace using voglreplay64, but I think initial support for doing all this directly from the UI just went in.

The UI helped me quickly pinpoint the first draw affected by the rendering problem. I then drilled down and examined all the GL state, textures, shaders, etc. on and around this draw. Clicking on a GL command that already had a snapshot was fast (only around a second in a debug build), while commands without snapshots took around 3-4 seconds. I still dumped the trimmed trace to JSON+loose files, more out of habit than anything, but using the UI was much faster than doing things the old way (which involved dumping massive amounts of PNGs on each major event, then using voglreplay -find and/or grep on huge JSON files).

Here's the pinpointed draw showing the problem (a completely opaque foliage billboard that should have been rendered transparently):


Depth and stencil buffers are currently displayed by mapping their individual bytes directly to image components - we're working on that.

Here's the foliage texture. I enabled alpha blending in the UI to double check the texture's alpha channel was reasonable:


Xonotic replay showing the problem, with the powerful QtCreator IDE in the background:


Turns out the problem was caused by Xonotic's usage of alpha to coverage on a multisampled default framebuffer. We don't currently support automatically enabling multisampling on default framebuffers during replaying. (We do of course support MSAA renderbuffers/textures/FBO's, but not on the default framebuffer yet.)

For now, I added a "-msaa X" command line option to the replayer to enable MSAA on the default framebuffer until we address this. This is crappy, but the vast majority of GL apps just don't enable MSAA this way and we have bigger fish to fry at the moment. (Also, I don't want to touch vogl's GLX/X-Windows related code until we abstract it away into SDL or something.)

Wednesday, March 19, 2014

vogl's tracer/replayer now supports the Steam Linux client

Steam's Big Picture mode is one of the last remaining Valve OpenGL apps that vogl didn't support until now. (The desktop client's GL callstream has worked for months.) The fixes for Big Picture are now all pushed to our github repo.

Here's a Big Picture ("10ft") replay in interactive mode after pausing (which involves a full state snapshot, context teardown, and state restore) and continuing playback:


Replaying 10ft traces on NVidia technically works, but a driver bug (which NVidia is checking out) makes playback extremely slow on my box. So all tracing and replaying in these shots was done on an AMD 57xx series part using the closed source fglrx driver.

The desktop ("2ft") GL callstream is looking good too, but compared to 10ft I have hardly spent any time looking at it. (I used the "-lock_window_dimensions -width 2560 -height 1600" cmd line options to replay this trace for 10ft, so the window is much bigger than needed for 2ft):


There are some known remaining issues, but none of them are show stoppers for debugging purposes. I'll soon be adding this trace to the shiny new regression test system Mike Sartain is working on.

- The replayer's auto window resize logic is almost useless on Steam traces because it creates so many trampoline contexts (associated with tiny windows) during startup and mode changes. So you must currently replay using "-lock_window_dimensions -width X -height Y".

- Can't make single/multi-frame snapshots of 10ft during tracing, only replaying. This isn't a big deal, because you can make a full-stream trace and just trim the frames you want to look at.

This problem is caused by the 10ft renderer keeping several buffers mapped all the time. I have a safe and easy fix coming that might address this issue (but it'll only work when the app keeps the entire buffer mapped).

- Can only debug 10ft on AMD until the NVidia driver bug is fixed

- The UI has not been tested on 10ft traces yet. Peter Lohrmann just added better support for debugging traces containing multiple contexts (specifically to help 10ft debugging along) which I'll try soon.

- The 10ft renderer deletes textures while they are still bound to FBO's (and keeps the FBO's around)

This causes various problems for the snapshot code because it can't retrieve the texture attachment handles in these FBO's (we just get 0's for the GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME), and (the last time I checked) we can't reliably retrieve information on these deleted textures on all drivers. This seems to be a very rare pattern (that I've never seen in any game titles, just 10ft). After asking around it turns out this problem is not a showstopper for 10ft because it always rebinds a new texture to the same attachment point before it ever renders to the FBO again.

All that red text in the below screenshot is due to this issue, but the output should still be correct.


You don't need to do anything special to trace Steam:

./trace.sh /usr/bin/steam

trace.sh is our example tracing script, see here. The example script causes the tracer to wait for a keypress, but you may not see the "waiting for keypress" message - just press any key if the app appears to stop.


Saturday, March 15, 2014

Trying SmartGitHg build v6 preview 4

We've been using Mercurial+TortoiseHg for the previous year (with hosting on Bitbucket), but the open source mainstream uses git so we're now switching vogl over to it exclusively. I gather most Linux devs primarily use command line tools, which is fine and all (I obviously do too when needed) but I want to find good GUI's for this stuff. The last time I had to use CLI tools for version control was 1997 under DOS.

There's an added bonus to being obstinate and pushing to find and use good native Linux GUI's for our major devtools: devs porting from the Windows/OSX/console worlds already have huge piles of solid GUI-based tools, and we need to find reasonably competitive native Linux alternatives. (When I say "native", I mean "not under Wine". I use Wine every day to run some old non-critical Windows programs I just can't find Linux alternatives for that I like, such as the Boxer Text Editor and Paint Shop Pro. Wine seems to run older Win32 apps better than Windows 7/8 itself these days!)

So I'm on the lookout for a git UI that is at least as good as TortoiseHg for doing the basics. I found a good Visual Studio alternative (QtCreator) a year ago after a wide search involving around a half dozen other Linux IDE's. I knew QtCreator was a good product after using its debugger for 20 minutes. It's by no means perfect but I've not had to use gdb/cgdb once since switching to it.

SmartGitHg has been on my radar, so I'm trying it out. It's commercial but has a 30 day trial (and is free for non-commercial use). This thing could be unusable -- I have no idea yet.



http://www.syntevo.com/smartgithg/early-access

It needs openjdk-7-jre to run, which I installed first using the Muon package manager (under Ubuntu 13.10+KDE). The UI seems more complex, but cleaner, than TortoiseHg's. If you're already familiar with Mercurial/thg it seems pretty easy to map over the concepts and accomplish the basics. I just pushed a trivial change up using it (added a link to the vogl wiki). I'll keep trying UI's if necessary until something works for 90% of the things devs do (add files, check in, push, pull, merge, resolve conflicts, browse history, etc).

Friday, March 14, 2014

Completed another round of testing on AMD's (fglrx) driver

I fixed a number of issues specific to AMD's driver - changelist notes are here. Mike should hopefully push these changes out to github tonight or tomorrow at the latest. (3/15: These changes are live on github - thanks Mike!)

Here's Dota2 replaying on fglrx in -interactive mode. Also, our regression test suite is now working (for the first time!) on AMD, which is pretty exciting.

The GL API callstream involved is hairy - it's kinda amazing that it works at all:

- I traced Dota2 using apitrace on an NVidia 780 to a .trace file
- I played this back on AMD's fglrx using glretrace, then intercepted its output using libvogltrace to a vogl .bin trace file
- I then played this new trace using voglreplay. The regression test suite verifies that the backbuffer CRC's seen during tracing vs. replaying are the same (if they differ, the test fails).
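
The pass/fail rule in that last step boils down to a per-frame comparison. Here's a minimal sketch of the idea, not vogl's actual code: the function names are made up for illustration, the CRCs are assumed to be available as plain lists, and zlib.crc32 merely stands in for whatever CRC the test suite really computes.

```python
import zlib


def backbuffer_crc(pixels: bytes) -> int:
    """Stand-in CRC over raw backbuffer bytes (assumption: not vogl's real CRC)."""
    return zlib.crc32(pixels) & 0xFFFFFFFF


def first_divergent_frame(trace_crcs, replay_crcs):
    """Return the index of the first frame whose CRCs differ, or None if all match."""
    for frame, (expected, actual) in enumerate(zip(trace_crcs, replay_crcs)):
        if expected != actual:
            return frame
    # All shared frames match; a different frame count still counts as a failure.
    if len(trace_crcs) != len(replay_crcs):
        return min(len(trace_crcs), len(replay_crcs))
    return None
```

A run passes only when first_divergent_frame() returns None; any other result is the frame where the replay stopped matching the trace.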

So we're mixing two different drivers and tracing/replaying frameworks in this test. The yellow warnings in the screenshot below are caused by missing uniforms, which are optimized differently by the AMD driver's compiler vs. NVidia (so some program uniforms are missing on AMD, which should be harmless).


Wednesday, March 12, 2014

Notes on current vogl limitations

All debuggers have limitations. Most of the time, you don't really know what they are until they pop up while you're trying to debug something (usually at the worst time, after wasting many hours). So here's a list of vogl limitations/issues I've been compiling (which will go up on the wiki once it's set up):

Note: All this is on the vogl wiki now: https://github.com/ValveSoftware/vogl/wiki
  • We don't support LD_PRELOAD-style tracing on Optimus setups. 
I would like to support it, but honestly it's challenging enough to do this on vanilla desktop stacks. Once you throw 2 drivers in there all bets are off with all the tracers I've tried. Any help in this area would be great.

We do support manually loading our tracer (libvogltrace32/64.so) on Optimus, but it's not something I've had the time to test much. To do this, manually load libvogltrace and dlsym() the gliGetProcAddressRAD() function (to be renamed to voglGetProcAddress()).

  • Can't take state snapshots during tracing or replaying while any buffers are currently mapped. 
This is typically not a problem because almost all apps just map a buffer, poke around inside the mapped region (reading and/or writing with the CPU), then unmap and move on.

I'm currently working on removing this restriction during replaying (which is easy because we fully control all GL contexts during replaying), but reliably removing this limitation during tracing in all scenarios seems challenging.

  • PBO (pixel pack and unpack buffers) not supported in the current github drop
This is already implemented and is being tested with Steam 10ft traces. I'll hopefully push it up by the end of the week.

  • GL 4.x is not supported for full-stream or snapshotting
There's a lot of GL 4.x stuff that will work, but it's not been a priority to support the latest bleeding edge stuff. Almost all shipped GL products I'm seeing only use GL 3.x, at best. Interestingly, the biggest/most ported releases tend to use a very conservative set of GL v2/v3.

  • Cubemap arrays not supported for snapshotting yet (but are OK for full-stream)
Here's the list of texture types we can snapshot: 1D, 2D, RECTANGLE, CUBE_MAP, 1D_ARRAY, 2D_ARRAY, 3D, 2D_MULTISAMPLE, and 2D_MULTISAMPLE_ARRAY. Incomplete textures are OK, but you'll get a warning if you haven't properly set GL_TEXTURE_MAX_LEVEL (which you most definitely should always do because not doing so is unreliable in practice).

  • Abuse of GL handles+multiple contexts
Sadly GL handles behave in interesting and obscure ways once you introduce sharelists. So before you delete textures (and most other objects) you should make sure they are not bound on other contexts, otherwise you're going down a direction that you'll probably regret (and that will give vogl headaches). vogl will give you errors on this scenario when you try to snapshot. For example:

Let's say I create a second context that shares with my first context. I gen a texture (handle=1), bind it on both contexts, call glTexStorage() to initialize it, then delete the texture on the 1st context. Everything appears as expected on the 1st context: the texture becomes auto-unbound, glIsTexture() reports false, and I can't retrieve the texture's width anymore (using glGetTexLevelParameteriv()). All nice and neat.

But on the 2nd context, the texture remains bound and glIsTexture() returns false, yet I can still retrieve the texture's width. If I call glGenTextures(), handle 1 gets immediately reused, even though it's still bound (as reported by glGet() on GL_TEXTURE_BINDING_2D) and even though I can retrieve texture 1's width. At this point handle 1 means two different things (!) on this specific context, which is most wonderful. If I then rebind texture handle 1 (which was just re-genned) I can no longer retrieve the width.
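
One way an app can stay out of this mess is to keep its own record of which handles are bound where, and check it before deleting anything. The sketch below is purely hypothetical app-side bookkeeping (the BindingTracker class and its method names are invented here); it doesn't call GL and isn't something vogl provides.

```python
class BindingTracker:
    """Hypothetical app-side record of which texture handles are bound on which context."""

    def __init__(self):
        self.bound = {}  # context name -> set of bound texture handles

    def bind(self, ctx, handle):
        self.bound.setdefault(ctx, set()).add(handle)

    def unbind(self, ctx, handle):
        self.bound.get(ctx, set()).discard(handle)

    def contexts_still_binding(self, handle, deleting_ctx):
        """Contexts other than deleting_ctx that still have handle bound."""
        return sorted(
            ctx
            for ctx, handles in self.bound.items()
            if ctx != deleting_ctx and handle in handles
        )
```

If contexts_still_binding() returns anything, unbind the texture on those contexts first and only then delete it.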

  • Can't snapshot textures after they are deleted (but still bound elsewhere)
We support snapshotting shaders that have been attached to programs and then immediately deleted. We also support snapshotting programs that have been deleted but are still bound. These are pretty common GL patterns we've seen in a few major titles. At program link time we make a deep copy of all attached shaders (called the "link time snapshot" in the code), so we can guarantee we can snapshot and recreate the program's actual linked state no matter what the app does with the shaders after linking.

However, there are other scenarios (such as binding a texture to a FBO, then deleting the texture but keeping it bound to the FBO) that we don't fully support for snapshotting. This scenario may never be fully supported: the last time I tried I couldn't query state of deleted (but still bound) textures on at least one driver, and we're not going to deeply shadow all texture state to work around this. Luckily, I've only ever seen this done purposely in one app so far, and the attached texture was not actually used for rendering purposes after the deletion. (They kept it attached to keep their hands on the GPU memory so the driver wouldn't reclaim it.)

vogl will spit out an error and typically try to continue snapshotting when it encounters a handle attached to an object that has been deleted (and we've lost track of). You'll get a handle remap error, because we won't know how to remap the handle from the GL replay domain back into the trace domain. The snapshot may cause the replayer to diverge, though.
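
The "link time snapshot" idea for shaders is easy to show in miniature. This is only an illustration of the deep-copy-at-link-time technique under made-up types (Shader and Program here are toy classes, not vogl's data structures):

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Shader:
    source: str
    deleted: bool = False


@dataclass
class Program:
    attached: list = field(default_factory=list)
    link_time_snapshot: list = field(default_factory=list)

    def link(self):
        # Deep copy, so later edits or deletions of the shaders can't alter the record.
        self.link_time_snapshot = copy.deepcopy(self.attached)
```

After link(), the app can detach, modify, or delete its shaders and the snapshot still describes exactly what was linked.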

  • During replaying the default (GLX) framebuffer is always 32-bit RGBA, no MSAA, with a 24/8 depth stencil buffer. 
On the todo list, but this hasn't been a problem so far. Apps that use MSAA tend to use renderbuffers or maybe MSAA textures, probably because this is more portable (vs. mucking around with the default GLX framebuffer's setup). It's possible for an app replay to diverge if the default framebuffer has a configuration that it didn't have during tracing, but in practice I haven't seen this happen.

  • Replay window auto-resizing can be a problem in some apps
Unlike apitrace, we only use a single replay window and resize it as needed. The auto-resize logic can get stuck resizing too much. This problem pops up most often in GLUT/FreeGLUT apps. We can capture/replay them, but the replayer's window code tends to get confused by the GLUT UI window activity. It'll still replay properly, but slowly as the replayer auto-resizes the replay window. 

If the window auto-resizes too much use "-lock_window_dimensions -width X -height Y" on the voglreplay command line to lock the replay window to a fixed size.

We may switch to apitrace-style multiple windows, or maybe pbuffers, to work around this (needs investigation).

  • We can't snapshot inside of glBegin/glEnd regions.
We didn't think it was worth the extra complexity to be able to snapshot/restore an incomplete glBegin sequence, so either snapshot right before or right after the region. (Hey, at least we support snapshotting apps that use glBegin at all!)

  • Display list limitations
No recursion, and no resources other than textures can be bound inside a display list. We do support around 400 API's inside of display lists. GL display lists are ancient API's at this point, so I don't think we'll do much more in this area unless a big title from the past uses them. (We do already support Doom3's usage of GL display lists, though.)

  • Be careful deleting contexts that share lists with other contexts
We support tracing/replaying/snapshotting/restoring the state of multiple contexts. vogl has the concept of "root" contexts and "sharelist groups". A sharelist group is 2 or more contexts that share objects, and the first context created in this group (that doesn't, and can't, share with anything else) is marked as the "root" context for that group.

vogl can't snapshot state if the "root" context of a sharelist group is destroyed while other leaf contexts are still present. Either snapshot immediately after all the leaf contexts are destroyed, or reorder your context deletions so the root gets killed last. In 99% of cases none of this matters; most apps just delete all their contexts at once or just leak them at exit.
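
To make the root/leaf rule concrete, here's a small model of it. It's a sketch under assumptions, not vogl's implementation: SharelistTracker and its methods are invented names, and a context that shares with nothing is simply treated as its own root.

```python
class SharelistTracker:
    """Toy model of sharelist groups: each context remembers its group's root."""

    def __init__(self):
        self.root_of = {}  # context -> root context of its sharelist group
        self.live = set()  # contexts that have not been destroyed

    def create(self, ctx, share_with=None):
        # A context sharing with another joins that context's group.
        root = self.root_of[share_with] if share_with is not None else ctx
        self.root_of[ctx] = root
        self.live.add(ctx)

    def destroy(self, ctx):
        self.live.discard(ctx)

    def can_snapshot(self):
        # Snapshotting is blocked while any live context has lost its root.
        return all(self.root_of[ctx] in self.live for ctx in self.live)
```

Destroying the root first leaves can_snapshot() false until the leaves are gone too; destroying leaves first never blocks it.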

  • Forking while tracing
I've encountered problems with this on some apps (mostly Mono ones I think). Needs investigation, we haven't tested it.

  • Try to delete your contexts when exiting
We've got several hooks in there to make sure the trace is properly flushed and closed when apps exit and leak their contexts. These hooks work most of the time, but it's best if you properly tear down your contexts when you exit.

The replayer does support unflushed traces (with no trace archive at the end), but there are no guarantees.

Also, not properly tearing down your contexts before exiting actually makes it very difficult for us to fully flush any in-progress asynchronous PBO readbacks (used for real-time JPEG capturing).

  • UI limitations
The entire UI is still very, very new. The texture, renderbuffer, and default framebuffer viewer in particular is very basic. It has little support for viewing traces that have multiple contexts.

Peter Lohrmann is working on improving the UI. We're currently using it to help us debug the debugger itself, which is progress, but there's a bunch of work left before I would try using it to debug a title.

  • Driver compat
I've tested the most on NVidia, a moderate amount on AMD, and (unfortunately) very little on Intel's open source driver so far. (Not purposely - it's just a time limitation.) We mostly ping-pong between NVidia and AMD as driver bugs pop up and we wait for the vendor to provide us with fixes. A developer at LunarG is now helping us get vogl working on Intel's open source driver.

  • Program binary gotchas
If you trace a 32-bit app that uses program binaries, on at least 1 driver (NVidia) you must replay using the 32-bit replayer (same for 64-bit). You can forcefully disable the app's usage of program binaries while tracing using --vogl_disable_gl_program_binary. This flag causes the tracer to remove the GL_ARB_get_program_binary extension string, and it'll also force the driver to always fail links with program bins (in case you don't check the string).

We've gone back and forth with always disabling program binaries by default in the tracer, but at the end of the day we take the policy of changing the app's behavior during tracing as little as possible unless you have purposely chosen to override something.

Note program binaries are usually *extremely* fragile, so traces containing program binaries may only be replayable on the exact driver version you captured them on.
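
Hiding an extension from an app amounts to filtering its name out of the extension list the app sees. A rough sketch of that filtering step, assuming the classic space-separated GL_EXTENSIONS string (strip_extension is a made-up helper, and this says nothing about how the tracer actually hooks the query):

```python
def strip_extension(extensions: str, name: str = "GL_ARB_get_program_binary") -> str:
    """Return the space-separated extension string with one extension removed."""
    return " ".join(ext for ext in extensions.split() if ext != name)
```

Matching whole tokens rather than substrings matters here, since extension names are frequently prefixes of one another.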

  • Can't take a snapshot while tracing if other threads have contexts current
We take the snapshot immediately after the next glXSwapBuffers() call. The tracer will attempt to make each context current on the same thread that calls glXSwapBuffers() so it can take a snapshot, but it won't be able to do this if the app has a context current on another thread. So don't leave your contexts current across swaps if you want to take a snapshot. (We couldn't think of a reliable/robust way around this limitation.)

To snapshot during tracing, write a file named "__trigger_capture__" to the app's current directory and the tracer will immediately take a snapshot. You can take as many snapshots as you want while tracing. (Of course, you can't have specified "--vogl_tracefile X" on your command line, which would have put the tracer into full-stream mode.) I'll better document this within a day or so; for now, just search the code in vogl_intercept.cpp.
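
Dropping the trigger file from a script might look like the sketch below. It assumes an empty file is enough (this post doesn't say whether the contents matter), and request_snapshot is just an illustrative helper name.

```python
from pathlib import Path

TRIGGER_NAME = "__trigger_capture__"


def request_snapshot(app_cwd):
    """Create the trigger file in the traced app's current directory."""
    trigger = Path(app_cwd) / TRIGGER_NAME
    trigger.touch()  # assumption: an empty file is sufficient
    return trigger
```

Note that app_cwd has to be the traced app's working directory, which isn't necessarily the directory you launched the trace script from.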


  • Replayer whitelist
If the tracer encounters a GL/GLX function that it knows the replayer won't be able to handle, it'll give you an error. The call will be written to the trace as best the tracer can, and the call will go directly to the driver, but the replayer will ignore it (after spitting out an error message). When you exit the traced app, you'll get a list of non-whitelisted funcs that were actually called during tracing. The func whitelist is the union of the API's contained in two files:
https://github.com/ValveSoftware/vogl/blob/master/glspec/gl_glx_whitelisted_funcs.txt
https://github.com/ValveSoftware/vogl/blob/master/glspec/gl_glx_simple_replay_funcs.txt

You can still try to replay this trace, but it may diverge or horribly fail. To see a more detailed whitelist, run the "voglgen" tool with the -debug option in the glspec directory.

Some of the newer GL debug-related funcs aren't in the whitelist yet; I'll be adding them very soon.

You'll get warnings if you call GetProcAddress() on GL/GLX functions that are not in the whitelist. This is typically harmless; most apps use GL extension libraries that retrieve the addresses of hundreds to thousands of GL funcs they never actually call.
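
The whitelist check itself is just a set union followed by a set difference. A minimal sketch, assuming each whitelist file holds one function name per line (that format is a guess, and load_func_list and non_whitelisted are hypothetical helpers rather than part of vogl):

```python
def load_func_list(text: str) -> set:
    """Parse a whitelist file's text into a set of function names (one per line assumed)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def non_whitelisted(called, *func_lists) -> list:
    """Return the called functions that appear in none of the given lists."""
    whitelist = set().union(*func_lists)
    return sorted(set(called) - whitelist)
```

Feeding it the contents of both whitelist files plus the names of the functions an app actually called gives the same kind of list the tracer prints at exit.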