Here it is:
https://github.com/ValveSoftware/vogl/wiki
Co-owner of Binomial LLC, working on GPU texture interchange. Open source developer, graphics programmer, former video game developer. Worked previously at SpaceX (Starlink), Valve, Ensemble Studios (Microsoft), DICE Canada.
Thursday, March 13, 2014
Wednesday, March 12, 2014
Notes on current vogl limitations
All debuggers have limitations. Most of the time, you don't really know what they are until they pop up while you're trying to debug something (usually at the worst time, after wasting many hours). So here's a list of vogl limitations/issues I've been compiling (which will go up on the wiki once it's set up):
Note: All this is on the vogl wiki now: https://github.com/ValveSoftware/vogl/wiki
- We don't support LD_PRELOAD-style tracing on Optimus setups.
We do support manually loading our tracer (libvogltrace32/64.so) on Optimus, but it's not something I've had the time to test much. To do this, manually load libvogltrace and dlsym() the gliGetProcAddressRAD() function (to be renamed to voglGetProcAddress()).
- Can't take state snapshots during tracing or replaying while any buffers are currently mapped.
I'm currently working on removing this restriction during replaying (which is easy because we fully control all GL contexts during replaying), but reliably removing this limitation during tracing in all scenarios seems challenging.
- PBO (pixel pack and unpack buffers) not supported in the current github drop
This is already implemented and is being tested with Steam 10ft traces. I'll hopefully push it up by the end of the week.
- GL 4.x is not supported for full-stream or snapshotting
- Cubemap arrays not supported for snapshotting yet (but are OK for full-stream)
Here's the list of texture types we can snapshot: 1D, 2D, RECTANGLE, CUBE_MAP, 1D_ARRAY, 2D_ARRAY, 3D, 2D_MULTISAMPLE, and 2D_MULTISAMPLE_ARRAY. Incomplete textures are OK, but you'll get a warning if you haven't properly set GL_TEXTURE_MAX_LEVEL (which you most definitely should always do because not doing so is unreliable in practice).
- Abuse of GL handles+multiple contexts
Let's say you create a second context that shares with your first context. It gens a texture (handle=1), binds it on both contexts, calls glTexStorage() to initialize it, then deletes the texture on the 1st context. Everything appears as expected on the 1st context: the texture becomes auto-unbound, glIsTexture() reports false, and I can't retrieve the texture's width anymore (using glGetTexLevelParameteriv()). All nice and neat.
But on the 2nd context, the texture remains bound, glIsTexture() returns false, but I can still retrieve the texture's width. If I call glGenTextures() handle 1 gets immediately reused, even though it's still bound (as reported by glGet() on GL_TEXTURE_BINDING_2D) and even though I can retrieve texture 1's width. At this point handle 1 means two different things (!) on this specific context, which is most wonderful. If I then rebind texture handle 1 (which was just re-genned) I can no longer retrieve the width.
- Can't snapshot textures after they are deleted (but still bound elsewhere)
However, there are other scenarios (such as binding a texture to a FBO, then deleting the texture but keeping it bound to the FBO) that we don't fully support for snapshotting. This scenario may never be fully supported: the last time I tried I couldn't query state of deleted (but still bound) textures on at least one driver, and we're not going to deeply shadow all texture state to work around this. Luckily, I've only ever seen this done purposely in one app so far, and the attached texture was not actually used for rendering purposes after the deletion. (They kept it attached to keep their hands on the GPU memory so the driver wouldn't reclaim it.)
vogl will spit out an error and typically try to continue snapshotting when it encounters a handle attached to an object that has been deleted (and we've lost track of). You'll get a handle remap error, because we won't know how to remap the handle from the GL replay domain back into the trace domain. The snapshot may cause the replayer to diverge, though.
- During replaying the default (GLX) framebuffer is always 32-bit RGBA, no MSAA, with a 24/8 depth stencil buffer.
- Replay window auto-resizing can be a problem in some apps
Unlike apitrace, we only use a single replay window and resize it as needed. The auto-resize logic can get stuck resizing too much. This problem pops up most often in GLUT/FreeGLUT apps. We can capture/replay them, but the replayer's window code tends to get confused by the GLUT UI window activity. It'll still replay properly, but slowly as the replayer auto-resizes the replay window.
If the window auto-resizes too much use "-lock_window_dimensions -width X -height Y" on the voglreplay command line to lock the replay window to a fixed size.
We may switch to apitrace-style multiple windows, or maybe pbuffers, to work around this (needs investigation).
- We can't snapshot inside of glBegin/glEnd regions.
- Display list limitations
No recursion is allowed, and no resources other than textures can be bound inside a display list. We do support around 400 APIs inside of display lists. GL display lists are ancient APIs at this point, so I don't think we'll do much more in this area unless a big title from the past uses them. (We do already support Doom3's usage of GL display lists, though.)
- Be careful deleting contexts that share lists with other contexts
We support tracing/replaying/snapshotting/restoring the state of multiple contexts. vogl has the concept of "root" contexts and "sharelist groups". A sharelist group is 2 or more contexts that share objects, and the first context created in this group (that doesn't, and can't, share with anything else) is marked as the "root" context for that group.
vogl can't snapshot state if the "root" context of a sharelist group is destroyed while other leaf contexts are still present. Either snapshot immediately after all the leaf contexts are destroyed, or reorder your context deletions so the root gets killed last. In 99% of cases none of this matters; most apps just delete all their contexts at once or just leak them at exit.
To snapshot during tracing, write a file named "__trigger_capture__" to the app's current directory and the tracer will immediately take a snapshot. You can take as many snapshots as you want while tracing. (Of course, you can't have specified "--vogl_tracefile X" on your command line, which would have put the tracer into full-stream mode.) I'll document this better within a day or so; for now, just search the code in vogl_intercept.cpp.
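For example, from another terminal (this assumes the traced app's working directory is your current directory):

```shell
# Ask the vogl tracer to take a state snapshot of the running app:
touch __trigger_capture__
```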
- Forking while tracing
I've encountered problems with this on some apps (mostly Mono ones I think). Needs investigation; we haven't tested it.
- Try to delete your contexts when exiting
We've got several hooks in there to make sure the trace is properly flushed and closed when apps exit and leak their contexts. These hooks work most of the time, but it's best if you properly tear down your contexts when you exit.
The replayer does support unflushed traces (with no trace archive at the end), but there are no guarantees.
Also, not properly tearing down your contexts before exiting actually makes it very difficult for us to fully flush any in-progress asynchronous PBO readbacks (used for real-time JPEG capturing).
- UI limitations
The entire UI is still very, very new. The texture, renderbuffer, and default framebuffer viewer in particular is very basic. It has little support for viewing traces that have multiple contexts.
Peter Lohrmann is working on improving the UI. We're currently using it to help us debug the debugger itself, which is progress, but there's a bunch of work left before I'd try using it to debug a title.
- Driver compat
- Program binary gotchas
We've gone back and forth with always disabling program binaries by default in the tracer, but at the end of the day we take the policy of changing the app's behavior during tracing as little as possible unless you have purposely chosen to override something.
Note program binaries are usually *extremely* fragile, so traces containing program binaries may only be replayable on the exact driver version you captured them on.
- Can't take a snapshot while tracing if other threads have contexts current
- Replayer whitelist
https://github.com/ValveSoftware/vogl/blob/master/glspec/gl_glx_whitelisted_funcs.txt
https://github.com/ValveSoftware/vogl/blob/master/glspec/gl_glx_simple_replay_funcs.txt
You can still try to replay this trace, but it may diverge or horribly fail. To see a more detailed whitelist, run the "voglgen" tool with the -debug option in the glspec directory.
Some of the newer GL debug related funcs aren't in the whitelist yet; I'll be adding them very soon.
You'll get warnings if you call GetProcAddress() on GL/GLX functions that are not in the whitelist. This is typically harmless; most apps use GL extension libraries that retrieve the addresses of hundreds to thousands of GL funcs they never actually call.
zip64 version of the miniz library released as part of the vogl codebase
miniz is my (mostly) drop-in zlib replacement library:
http://code.google.com/p/miniz
Anyhow, the version of miniz on Google Code only supports zip32, but I added full support for zip64 and a bunch of other features in my spare time last year. I used vogl to test the new code, which you can find the source to here:
https://github.com/ValveSoftware/vogl/blob/master/src/voglcore/vogl_miniz_zip.cpp
https://github.com/ValveSoftware/vogl/blob/master/src/voglcore/vogl_miniz.cpp
The files are marked ".cpp" but it's just plain C code. I need to re-run the latest new code through a C compiler again, but there shouldn't be anything in there that C can't handle. If there is I'll fix it. zip64 was a real pain to fully implement, and next time I will definitely choose a cleaner archive format.
I need to extract this code from the vogl codebase (should be relatively easy as miniz is an independent blob of code) and do a standalone release at some point.
miniz is probably one of my most popular open source libraries. Between all the Microsoft games that used my earlier lossless codecs (Age of Empires 1/2, Halo 3 and I think one of its sequels, Forza 2, Halo Wars) and miniz my compression code has found its way into a bunch of shipped products. One of my other compression libs (picojpeg) is now in orbit on the Skycube nano-satellite, which should be fully deployed from the ISS by the end of the month after its shakedown period is over. I do compression stuff purely for the fun of it so it's pretty cool to see what people wind up doing with it.
vogl GL debugger source is on github
We promised at Steam Dev Days we would open source the project, so here it is:
https://github.com/ValveSoftware/vogl
Creating an OpenGL debugger that handles both full-stream tracing *and* state snapshotting (with compat profile support to boot!) is a surprisingly massive undertaking for ~3 devs, so please bear with us. We're knee deep in fleshing out the UI and improving the tracer/replayer to be fully compatible with GL v3.3 (4.x will be later this year). Please file bug reports on github and send us trace logs (or apitrace/vogl traces), etc. and we'll do our best to make it work with your app.
We'll be posting more instructions and our current TODO list on the wiki soon.
We're currently in the process of adding PBO support (done, testing it right now), and we've added the ability to snapshot while buffers are mapped during replaying. (Both things are needed to trace/replay/snapshot Steam 10ft.)
Tuesday, March 11, 2014
togl D3D9->OpenGL layer source release on github
This is a raw dump of the togl layer right from DoTA2:
https://github.com/ValveSoftware/ToGL
This is old news by now; I think the press picked up on this even before I heard it was finally released. I really wish we had the time to package it better (so you could actually compile it!) with some examples, etc. There's a ton of practical Linux GL driver know-how packed all over this code -- if you look carefully. Every Valve Source1 title ultimately goes through this layer on Linux/Mac. (The Mac ports do use a different, much earlier branch, however. At some point the Linux and Mac branches merged back together, but I don't know if that occurred in time for DoTA's version.)
We talked a lot about what we learned while working on this layer at GDC last year:
Porting Source to Linux: Valve's Lessons Learned
https://www.youtube.com/watch?v=btNVfUygvio
Or here:
https://developer.nvidia.com/gdc-2013
There's a lot of history to this particular code. This layer was first started by the Mac team, then later ported from Mac to Linux by the Steam team, and then finally ported by the Linux team to Windows (!) so we could actually debug it. (Because the best available GL debuggers at the time were Windows-only. We are working to correct that with vogl.) John McDonald, Rick Johnson, Mike Sartain, Pierre-Loup Griffais and I then got our hands on it (at various times) to push it down the correctness (for Source1) and performance axes. I spent many months agonizing over this layer's per-batch flush path: tweaking, profiling (with Rad's awesome Telemetry tool), optimizing, and testing it to run the Source1 engine correctly, quickly, and reliably on the drivers available for Linux.
The code is far from perfect: many parts are more like a battleground in there. It's optimized for results, and the primary metrics for success were perf vs. Windows and Source1 correctness, sometimes to the detriment of other factors. A lot of experiments were conducted, some blind alleys were backed out of, and we learned *a lot* about the true state of OpenGL drivers during the effort. If you want to see how to stay in the "fast lanes" of multiple Linux GL drivers simultaneously it might be worth checking out. (Most of the Linux drivers share common codebases with the Windows GL drivers, so a lot of what's in there is relevant to Windows GL too.)
(The first version of this post stated there was another version of togl that supported both Mac and Linux, and had all the SM3 fixes I made for various projects. Turns out the version on github is the very latest version, because all the togl branches were merged back into Dota2 at some point.)
Saturday, March 8, 2014
Finished Up Support for the Hitachi 6309 CPU
I've added Hitachi 6309 support to my disassembler, monitor interrupt handlers, and monitor client. So I'm now able to switch over my 6809 test app to use the 6309's "native" mode, which is something like 15-30% faster. I can single step over 6309 code sequences and display/modify the extra registers (E/F/V). I verified my disassembler by using the a09 6809/6309 cross assembler to assemble a bunch of test 6309 code, disassemble it, then diff the results vs. the original code. Lomont's 6309 Reference and The 6309 Book are the best references I've found.
The 6309 is amazingly powerful for its time. You've got some 32-bit ops, fast CPU memory transfers, hardware division/multiplication, various register to register ops, and two more 16-bit regs to play with over the 6809 (W and V, although V is limited to exchanges/transfers). W is a hybrid register, useful as a pointer or general purpose register. I've written a good deal of real mode (16-bit segmented) 8086/80286 assembly back in the day, and I really like the feel of 6309 assembly.
Unfortunately, the assembler used by gcc6809 (as6809) doesn't support the 6309. The gcc6809 package comes with a 6309 assembler (as6309), but it doesn't compile out of the box. I got it to compile but it's very clear that whoever worked on it never finished it. I made a quick stab at fixing up as6309 but to be honest the C code in there is like assembly code (with unfathomable 2-3 letter variable names and obfuscated program flow), and I don't have time to get into it for a hobby project.
So for now, I'm using the a09 assembler (which does support the 6309) to create position-independent code (at address 0) in simple .bin files, which I then convert to as6809 assembly source files. The .s files contain nothing but ".byte 0xXX" statements and the symbols. To get the symbols, I manually place a small symbol table at the end of the .bin file; a custom command line tool automatically locates and parses it and converts the a09-assembled .bin file to a .s file:
code
org $0
;------------------------------------------------------------------------------
; void _math_muli_16_16_32(int16 left_value, int16 right_value, int32 *pResult)
;------------------------------------------------------------------------------
; x - int16 left value
; stack:
; 2,3 - int16 right value
; 4,5 - int32* result_ptr
_math_muli_16_16_32:
right_val = 2
result_ptr = 4
tfr x,d
muld right_val, s
stq [result_ptr, s]
rts
;------------------------------------------------------------------------------
; Define public symbols, processed by cc3monitor -a09 <src_filename.bin>
;------------------------------------------------------------------------------
fcb 0x12, 0x35, 0xFF, 0xF0
fcc "_math_muli_16_16_32$"
fcw _math_muli_16_16_32
This gets converted to an .s file which the gcc6809 tool chain likes:
.module asmhelpers
.area .text
.globl _math_muli_16_16_32
_math_muli_16_16_32:
.byte 0x1F
.byte 0x10
.byte 0x11
.byte 0xAF
.byte 0x62
.byte 0x10
.byte 0xED
.byte 0xF8
.byte 0x4
.byte 0x39
This is a cheesy hack but works fine (for a hobby project).
Monday, March 3, 2014
Source level debugger and monitor app for 6809/6309 CPU
I've been working on a Linux OpenGL debugger for about a year now, so I figured it would be fun and educational to create a low-level CPU debugger just to learn more about the problem domain. (I'll eventually use all this stuff to remotely debug on various tiny microcontrollers, so there's some practical value in all this work too.) To make the effort more interesting (and achievable in my spare time), I'm doing it for the simple 6809/6309 CPU's and interfacing it to an old 8-bit computer (Tandy CoCo3) over a serial port. (Yes, I could emulate all this stuff, but there's not nearly as much fun in that. I want to work with *real* hardware!)
I first wrote a small monitor program for the 6809, so I could remotely control and debug program execution over the CoCo3's "bit banging" serial port. There's a bit of assembly to handle the stack manipulation, but otherwise it's written entirely in C using gcc6809. This monitor function lives in a single SWI (software interrupt) handler and only supports very basic ops: read/write memory, read/write the main program's registers (which are popped/pushed on the main program's stack in the SWI handler), ping, "trampoline" (copy memory from source to destination and transfer control to the specified address), or return from the SWI interrupt handler and continue main program execution. The monitor also hooks FIRQ and enables the RS-232 port's CD (carrier detect) level sensitive interrupt so I can remotely trigger asynchronous breakpoints by toggling the DTR pin. (My DB9->CoCo serial cable is wired so DTR from the PC is hooked up to the CoCo's CD pin.)
I first wrote a small monitor program for the 6809 so I could remotely control and debug program execution over the CoCo3's "bit banging" serial port. Apart from a bit of assembly to handle the stack manipulation, it's written entirely in C using gcc6809. This monitor lives in a single SWI (software interrupt) handler and only supports very basic ops: read/write memory, read/write the main program's registers (which are popped/pushed on the main program's stack in the SWI handler), ping, "trampoline" (copy memory from source to destination and transfer control to the specified address), and return from the SWI interrupt handler to continue main program execution. The monitor also hooks FIRQ and enables the RS-232 port's CD (carrier detect) level-sensitive interrupt so I can remotely trigger asynchronous breakpoints by toggling the DTR pin. (My DB9->CoCo serial cable is wired so DTR from the PC is hooked up to the CoCo's CD pin.)
With this design I can remotely do pretty much anything I want with the machine being debugged. Once the remote machine is running the monitor I can write a new program to memory and start it (even overwriting the currently executing program and monitor using the trampoline command), examine and modify memory/registers, implement new debugging features, etc. without having to modify (and then possibly debug) the 6809 monitor function itself.
The client app is written in C++ in the VOGL codebase and supports the usual monitor-type commands, plus a bunch of commands for debugging, 6809/6309 disassembly, loading DECB (Microsoft Disk Extended Color BASIC) .bin files into memory, dumping memory to .bin files, etc. It supports both assembly and simple source level debugging. You can single step by instructions or lines (step into, step over, or step out), get callstacks with symbols, and print parameters and local variables. I'm parsing the debug STAB information generated by gcc in the assembly .S files, and the NOICE debug information generated by aslink to get type and symbol address information.
Robust callstacks are surprisingly tough to get working. The S register is constantly manipulated by the compiler, and there's no stack base register when optimizations are enabled, so it's hard to reliably determine the return addresses without some extra information to help the process along. To get callstacks I modified gcc6809 to optionally insert a handful of prolog/epilog instructions into each generated function (two at the beginning and one at the end). The prolog sequence stores the current value of the S register into a separate 256-byte stack located at absolute address 0x100. (It stores a word, but the stack pointer is only decremented by a single byte because I only care about the lowest byte of the stack register; my stacks are <= 256 bytes.) The debugger reads this stack of "stack pointers" to figure out what the S register was at the beginning of each function, and from that it can determine where the return PCs are located in the real system hardware stack.
The 6809 code to do this uses no registers, just a single global pointer at absolute address 0xFE and indirect addressing:
0x0628 7A 00 FF _main: DEC $00FF (m15+0xF0)
0x062B 10 EF 9F 00 FE STS [$00FE (m15+0xEF)]
0x0630 34 40 PSHS U
0x0632 33 E4 LEAU , S
0x0634 func: _main line: test.c(101):
coco3_init();
0x0634 BD 0E 9E JSR _coco3_init ($0E9E)
0x0637 func: _main line: test.c(103):
monitor_start();
0x0637 BD 08 03 JSR _monitor_start ($0803)
0x063A func: _main line: test.c(105):
coco3v_text_init();
0x063A BD 25 64 JSR _coco3v_text_init ($2564)
0x063D func: _main line: test.c(106):
core_printf("start\r\n");
0x063D 8E 06 20 LDX #$0620
0x0640 34 10 PSHS X
0x0642 BD 1D 5C JSR _core_printf ($1D5C)
0x0645 32 62 LEAS 2, S
0x0647 func: _main line: test.c(108):
test_func();
0x0647 BD 03 8F JSR _test_func ($038F)
0x064A func: _main line: test.c(110):
core_hault();
0x064A BD 1D F7 JSR _core_hault ($1DF7)
0x064D func: _main line: test.c(112):
return 0;
0x064D 8E 00 00 LDX #$0000
0x0650 7C 00 FF INC $00FF (m15+0xF0)
0x0653 func: _main line: test.c(113):
}
0x0653 35 C0 PULS PC, U
Some pics of the monitor client app, showing source level disassembly, callstacks, symbols, etc. The monitor's serial protocol is mostly synchronous and I'm paranoid about checksumming everything (because bit banging at 115200 baud is not 100% robust on this hardware).
Here's the physical hardware running a heap test program. The cross-platform C codebase compiles both on the PC using clang and on the CoCo using gcc6809. I keep it cross-platform because it's still *much* easier to debug on the PC using QtCreator than to debug remotely using my monitor app. Using the monitor to debug problems, even with symbols, makes me totally appreciate how good QtCreator's debugger actually is!