Monday, March 3, 2014

Source level debugger and monitor app for 6809/6309 CPU

I've been working on a Linux OpenGL debugger for about a year now, so I figured it would be fun and educational to create a low-level CPU debugger just to learn more about the problem domain. (I'll eventually use all this stuff to remotely debug on various tiny microcontrollers, so there's some practical value in all this work too.) To make the effort more interesting (and achievable in my spare time), I'm doing it for the simple 6809/6309 CPUs and interfacing it to an old 8-bit computer (Tandy CoCo3) over a serial port. (Yes, I could emulate all this stuff, but there's not nearly as much fun in that. I want to work with *real* hardware!)

I first wrote a small monitor program for the 6809, so I could remotely control and debug program execution over the CoCo3's "bit banging" serial port. Apart from a bit of assembly to handle the stack manipulation, it's written entirely in C using gcc6809. This monitor function lives in a single SWI (software interrupt) handler and only supports very basic ops: read/write memory, read/write the main program's registers (which are popped/pushed on the main program's stack in the SWI handler), ping, "trampoline" (copy memory from source to destination and transfer control to the specified address), or return from the SWI interrupt handler and continue main program execution. The monitor also hooks FIRQ and enables the RS-232 port's CD (carrier detect) level sensitive interrupt so I can remotely trigger asynchronous breakpoints by toggling the DTR pin. (My DB9->CoCo serial cable is wired so DTR from the PC is hooked up to the CoCo's CD pin.)
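The handler boils down to a small read-dispatch loop over the serial link. Here's a rough host-side sketch of that loop; the opcode values, packet layout, and helper names are all made up for illustration (the post doesn't give the real wire format), and the serial link is simulated with flat buffers so the sketch can run anywhere:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command opcodes -- the real monitor's byte values aren't
   given in the post; these are placeholders for illustration. */
enum { CMD_PING = 1, CMD_READ_MEM = 2, CMD_WRITE_MEM = 3, CMD_RESUME = 4 };

/* Simulated serial link: on the CoCo these would be the bit-banged
   RS-232 get/put routines. */
static const uint8_t *rx_buf; static size_t rx_pos;
static uint8_t tx_buf[64];    static size_t tx_pos;
static uint8_t serial_get(void) { return rx_buf[rx_pos++]; }
static void serial_put(uint8_t b) { tx_buf[tx_pos++] = b; }

static uint8_t target_mem[256]; /* stand-in for the target's address space */

/* One iteration of the SWI handler's command loop. Returns 0 on CMD_RESUME
   (i.e. fall out of the handler and RTI back to the interrupted program),
   1 otherwise. */
static int monitor_dispatch(void)
{
    uint8_t cmd = serial_get();
    switch (cmd) {
    case CMD_PING:
        serial_put(0xAA); /* arbitrary ack byte */
        return 1;
    case CMD_READ_MEM: {
        uint8_t addr = serial_get(), len = serial_get(), sum = 0;
        for (uint8_t i = 0; i < len; i++) {
            uint8_t b = target_mem[(uint8_t)(addr + i)];
            serial_put(b);
            sum += b; /* simple additive checksum over the payload */
        }
        serial_put(sum);
        return 1;
    }
    case CMD_WRITE_MEM: {
        uint8_t addr = serial_get(), len = serial_get();
        for (uint8_t i = 0; i < len; i++)
            target_mem[(uint8_t)(addr + i)] = serial_get();
        return 1;
    }
    default:
        return 0; /* CMD_RESUME */
    }
}
```

On the real machine this loop runs inside the SWI handler until told to resume, which is what makes everything else (trampolining new code in, poking registers on the saved stack frame) possible without touching the monitor itself.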

With this design I can remotely do pretty much anything I want with the machine being debugged. Once the remote machine is running the monitor I can write a new program to memory and start it (even overwriting the currently executing program and monitor using the trampoline command), examine and modify memory/registers, implement new debugging features, etc. without having to modify (and then possibly debug) the 6809 monitor function itself.

The client app is written in C++ in the VOGL codebase and supports the usual monitor-type commands, plus a bunch of commands for debugging, 6809/6309 disassembly, loading DECB (Microsoft Disk Extended Color BASIC) .bin files into memory, dumping memory to .bin files, etc. It supports both assembly and simple source level debugging. You can single step by instructions or lines (step into, step over, or step out), get callstacks with symbols, and print parameters and local variables. I'm parsing the debug STAB information generated by gcc in the assembly .S files, and the NOICE debug information generated by aslink to get type and symbol address information.

Robust callstacks are surprisingly tough to get working. The S register is constantly manipulated by the compiler and there's no stack base register when optimizations are enabled. So it's hard to reliably determine the return addresses without some extra information to help the process along. To get callstacks I modified gcc6809 to optionally insert a handful of prolog/epilog instructions into each generated function (2 at the beginning and 1 at the end). The prolog sequence stores the current value of the S register into a separate 256-byte stack located at absolute address 0x100. (It stores a word, but the stack pointer is only decremented by a single byte because I only care about the lowest byte of the stack register. My stacks are <= 256 bytes.) The debugger reads this stack of "stack pointers" to figure out what the S register was at the beginning of each function. It can then determine where the return PCs are located in the real system hardware stack.

The 6809 code to do this uses no registers, just a single global pointer at absolute address 0xFE and indirect addressing:

0x0628 7A 00 FF         _main:                  DEC   $00FF (m15+0xF0)          
0x062B 10 EF 9F 00 FE                           STS   [$00FE (m15+0xEF)]        
0x0630 34 40                                    PSHS  U                         
0x0632 33 E4                                    LEAU  , S                       
0x0634 func: _main line: test.c(101):
    coco3_init();
0x0634 BD 0E 9E                                 JSR   _coco3_init ($0E9E)       
0x0637 func: _main line: test.c(103):
    monitor_start();
0x0637 BD 08 03                                 JSR   _monitor_start ($0803)    
0x063A func: _main line: test.c(105):
    coco3v_text_init();
0x063A BD 25 64                                 JSR   _coco3v_text_init ($2564) 
0x063D func: _main line: test.c(106):
    core_printf("start\r\n");
0x063D 8E 06 20                                 LDX   #$0620                    
0x0640 34 10                                    PSHS  X                         
0x0642 BD 1D 5C                                 JSR   _core_printf ($1D5C)      
0x0645 32 62                                    LEAS  2, S                      
0x0647 func: _main line: test.c(108):
    test_func();
0x0647 BD 03 8F                                 JSR   _test_func ($038F)        
0x064A func: _main line: test.c(110):
    core_hault();
0x064A BD 1D F7                                 JSR   _core_hault ($1DF7)       
0x064D func: _main line: test.c(112):
    return 0;
0x064D 8E 00 00                                 LDX   #$0000                    
0x0650 7C 00 FF                                 INC   $00FF (m15+0xF0)          
0x0653 func: _main line: test.c(113):
}

0x0653 35 C0                                    PULS  PC, U                     
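Given that shadow stack, the debugger-side unwind is simple: each saved S was captured right after JSR pushed the return address, so it points straight at a return PC in the hardware stack. Here's a rough sketch, with target memory simulated as a flat array; the shadow stack's assumed empty value (SHADOW_EMPTY) and the high byte of S (STACK_HI) are made-up constants the post doesn't specify, and page-boundary wrap is ignored for brevity:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical debugger-side helpers: read target memory over the
   monitor's serial protocol (simulated here with a flat array). */
static uint8_t target_mem[65536];
static uint8_t  rd8(uint16_t a)  { return target_mem[a]; }
static uint16_t rd16(uint16_t a) { return (uint16_t)(rd8(a) << 8 | rd8(a + 1)); } /* 6809 is big-endian */

#define SHADOW_PTR   0x00FE /* word-sized pointer into the shadow stack */
#define SHADOW_EMPTY 0x80   /* assumed initial low byte of that pointer */
#define STACK_HI     0x3F   /* assumed high byte of S; stacks are <= 256 bytes */

/* Walk the shadow stack of saved S low bytes and collect return PCs.
   Because entries are one byte apart but STS stored a word, the byte
   that survives at slot i+1 is the low byte of the S saved for the
   frame whose slot is i. */
static int unwind(uint16_t *pcs, int max)
{
    uint8_t lo = rd8(SHADOW_PTR + 1); /* current (post-decrement) low byte */
    int n = 0;
    for (uint8_t i = lo; i != SHADOW_EMPTY && n < max; i++) {
        uint16_t saved_s = (uint16_t)(STACK_HI << 8) | rd8(0x0100 + (uint8_t)(i + 1));
        pcs[n++] = rd16(saved_s); /* saved S points at the return PC */
    }
    return n;
}
```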

Some pics of the monitor client app, showing source level disassembly, callstacks, symbols, etc. The monitor's serial protocol is mostly synchronous and I'm paranoid about checksumming everything (because bit banging at 115200 baud is not 100% robust on this hardware).





Here's the physical hardware running a heap test program. The cross platform C codebase compiles on both the PC using clang, and on the CoCo using gcc6809. I'm doing this cross platform because it's still *much* easier to debug on the PC using QtCreator vs. remotely debugging using my monitor app. Using the monitor to debug problems, even with symbols, makes me totally appreciate how good QtCreator's debugger actually is!


Monday, February 17, 2014

CoCo 3 Upgrades: Hitachi 6309 CPU, 512KB RAM, PS2 keyboard

Installed a bunch of CoCo 3 upgrades from Cloud-9 Tech over the weekend:

  • I upgraded the old 68B09 CPU to the powerful Hitachi 63C09. This involved desoldering the old CPU and replacing it with a socket. I also put in a Pro-Tector+ board to protect the CPU from the inevitable torture I have planned for this thing (once I get all of my electronics gear out of storage and back in one place).




  • The Cloud-9 512K Triad upgrade board (the blue triangle) was trivial to install by comparison. Following the instructions, I removed the four old (128K) RAM chips, snipped a couple capacitors, and plugged it in:


  • Cloud-9 also sells a nice PS2-keyboard upgrade board which was an easy install (no soldering):



I enabled 6309 native mode (15%+ faster vs. 6809 mode) and tested it with my gcc6809 compiled test program. Here it is outputting text to the 40x24 text mode:



I'm currently scrolling the text screen up using a simple C routine. It's so embarrassingly slow right now (even in 1.89MHz 6309 native mode) that you can kinda see the scroll function move the lines up the screen. But this is fine for simple printf()-style debug output. I'm using this BSD-licensed tiny printf() for embedded applications.
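The scroll routine itself is about as simple as C gets -- something along these lines (a sketch against a host-side buffer, not the actual code; on the CoCo the pointer would target the 40x24 text screen):

```c
#include <assert.h>
#include <string.h>

#define COLS 40
#define ROWS 24

/* Host-side stand-in for the 40x24 text screen memory. */
static unsigned char screen[ROWS * COLS];

/* Scroll the text screen up one line and blank the bottom row: the
   straightforward C version, no unrolling or 6309 block-move tricks,
   which is why it's visibly slow on real hardware. */
static void scroll_up(void)
{
    memmove(screen, screen + COLS, (ROWS - 1) * COLS);
    memset(screen + (ROWS - 1) * COLS, ' ', COLS);
}
```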

I've also compiled the DriveWire 4 assembly 115kbps/230kbps I/O routines into this test app, so I can do disk I/O without relying on OS-9 or the BASIC ROM routines. My plan going forward is to continue completely "taking over" the machine and just do my own thing (no OS at all). It should be easy to code up a DriveWire compatible disk I/O module (here's the DriveWire protocol specification).

Monday, February 10, 2014

The Color Computer 3's 256 Color (Artifacting) Mode

I've begun playing around with the CoCo3's 256-color artifact color mode (more info here, here, and here, and a YouTube video here). It only works on the composite or RF modulator outputs. You basically set the 640x192x4 mode, fill the 4-color palette with grayscale entries (palette values 0, 16, 32, 63), and then treat each byte as a separate pixel. (Normally, each byte would contain 4 separate pixels in this mode.)
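Plotting in this mode is trivial compared to the normal packed 2-bit case. A sketch of both, against a host-side stand-in for the frame buffer (the function names are mine, not from the post):

```c
#include <assert.h>
#include <stdint.h>

#define ROW_BYTES 160 /* 640 pixels x 2bpp = 160 bytes per scanline */

/* Host-side stand-in for the 640x192x4 frame buffer. */
static uint8_t fb[192 * ROW_BYTES];

/* Artifact trick: each whole byte becomes one of 256 "colors", giving an
   effective 160x192 chunky-pixel mode -- one store per pixel. */
static void plot256(int x, int y, uint8_t color)
{
    fb[y * ROW_BYTES + x] = color;
}

/* Normal-mode plot for comparison: four 2-bit pixels packed per byte,
   leftmost pixel in the high bits, so every plot is a read-modify-write. */
static void plot4(int x, int y, uint8_t color2bit)
{
    uint8_t *p = &fb[y * ROW_BYTES + x / 4];
    int shift = 6 - 2 * (x % 4);
    *p = (uint8_t)((*p & ~(3 << shift)) | ((color2bit & 3) << shift));
}
```

The one-store-per-pixel layout is exactly why the mode would have been so convenient for late-80's software: no masking, no shifting, no palette-swapping tricks.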

The various grayscale signal patterns cause different sorts of chroma channel "leakage", leading to the below colors. (The X axis is the least significant nibble in this shot.) This shot was taken on an old LCD monitor (Samsung 150MP), hooked up to the composite input. The actual colors were more vibrant/unique than they appear in this photo.


It's a pretty sweet mode, and what's amazing to me is that the programmers who worked on this platform in the late 80's mostly ignored it (probably because it didn't work on RGB monitors). Some pretty sweet things could have been done with it, because 1 pixel per byte is quite convenient and it didn't need any funky tricks like frame flipping or mucking with video palette registers in an interrupt handler. I still remember seeing 256-color photos for the first time in the late 80's (at a Radio Shack store on a Tandy 1000), and being utterly amazed at the detail.

Apparently Tandy engineers were planning on including a real (not artifacted) 256 color mode in the updated CoCo 3, but the execs didn't want their little CoCo line to compete against their big Tandy 1000's. So in a ridiculously shortsighted decision they nixed the idea. However, there were rumors that this mode was actually implemented in the CoCo3's "GIME" graphics chip anyway but just not documented. The fascinating history and technical details can be read about here. Through some sleuthing the author even tracked down and interviewed the hardware engineer who created the CoCo 3's GIME chip, John Prickett.

I've decoded a color JPEG using this mode. These first results are totally crude, but hey it's progress. My palette currently sucks -- I built it from the above image taken on an iPhone just to get things rolling. I didn't use dithering, which would certainly help, but the palette entries are the real problem right now. I've not been able to find an "official" palette or conversion algorithm anywhere yet, so I'm going to somehow either compute one or (maybe better) find a video capture card somewhere and just sample it.


Sunday, February 9, 2014

picojpeg: Decoding Lena on a Tandy Color Computer 3

Got a small grayscale version of Lena decoding to the 640x192x4 (HSCREEN 4) graphics mode using my picojpeg.c module. Here it is on an old LCD monitor hooked up to the CoCo's composite output:


In case you didn't know, the Color Computer is a classic 8/16-bit personal home computer from Tandy Corp./Radio Shack. The particular model I'm using (CoCo 3) was released in 1986. (As a kid I always wanted a CoCo 3, but I was stuck with a 16KB CoCo 2 and by the time I could afford to upgrade I had moved on to the PC.)

This is using a 2x2 ordered dither matrix, zoomed by 2x horizontally (displayed as 256x128), using composite out as a sort of inherent low pass filter. The BASIC program currently sets the graphics mode, programs the grayscale palette, and then jumps into the C code's _start function:

10 CLEAR12,9984
20 LOADM"TEST.BIN
30 HSCREEN4
40 PALETTE0,0
50 PALETTE1,16
60 PALETTE2,32
70 PALETTE3,48
80 EXEC9991
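The 2x2 ordered dither mentioned above can be sketched like this (my own illustration of the technique, not the actual decoder code):

```c
#include <assert.h>
#include <stdint.h>

/* Classic 2x2 Bayer threshold matrix. */
static const uint8_t bayer2[2][2] = { { 0, 2 },
                                      { 3, 1 } };

/* Map an 8-bit gray value at (x, y) to one of the 4 grayscale palette
   levels with ordered dithering. The matrix adds a position-dependent
   bias of 0..3/4 of one quantization step (step = 256/4 = 64), so values
   between levels flicker spatially between the two nearest levels. */
static uint8_t dither2x2(int x, int y, uint8_t gray)
{
    unsigned v = gray + bayer2[y & 1][x & 1] * 16; /* 16 = 64/4 */
    unsigned level = v / 64;
    return (uint8_t)(level > 3 ? 3 : level);
}
```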

The C code (compiled using gcc6809 under Linux) disables interrupts, maps the graphics pages into the ROM area (starting at 0x8000, ending at 0xFEFF or thereabouts) using the MMU0 registers, then decodes the ~4KB JPEG and plots the pixels as each JPEG MCU block is decoded. It currently takes about 45 seconds to decode the 128x128 image, which beats the PIC18F microcontroller I was using a few years ago to originally test this code (that thing would take 10-20 minutes!).
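The MMU mapping step might look something like this. The GIME's task-0 MMU registers live at $FFA0-$FFA7, one per 8KB slot of the CPU address space; the page numbers and function name here are hypothetical, and the register file is simulated as a host array so the sketch runs anywhere:

```c
#include <assert.h>
#include <stdint.h>

/* On real hardware this would be (volatile uint8_t *)0xFFA0, the GIME's
   task-0 MMU registers. Simulated as a host array for illustration. */
static uint8_t mmu_regs[8];
static volatile uint8_t *const MMU0 = mmu_regs;

/* Map four consecutive 8KB physical pages into CPU slots 4..7, i.e.
   0x8000-0xFFFF, replacing the BASIC ROM area so the decoder can write
   pixels straight into the graphics pages. (The GIME keeps the I/O page
   at $FF00-$FFFF mapped regardless, hence "ending at 0xFEFF or
   thereabouts".) Interrupts must be disabled first, as in the post. */
static void map_graphics(uint8_t first_page)
{
    for (int slot = 4; slot < 8; slot++)
        MMU0[slot] = (uint8_t)(first_page + (slot - 4));
}
```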

Everything is completely unoptimized - I'm just glad to get it working at all. Using a CoCo to prototype/test this kind of stuff is actually much nicer overall than working on a tiny microcontroller. The CoCo community has compiled a large suite of documentation over the years, and there's a very nice set of tools available. Not to mention I've got built-in (albeit limited) graphics, 128KB of RAM, a 6-bit DAC for sound output, and various forms of input. The 6809 is a surprisingly powerful 8/16-bit CPU to work on - if you push it right.

This poor 1986 CoCo3 has seen better days (the case is an all-yellow wreck). I need to find a machine in better shape:


Zoomed in further, with a 4x4 dither matrix:


picojpeg on a Tandy Color Computer 3 (6809 processor)



I've gotten my picojpeg.c module decoding JPEG images on the (almost) 30 year old CoCo 3. Here's my first 16x16 decoded image on a 6809 CPU. (For all I know, it could be the first JPEG to be decoded on this particular CPU.) The upper nibble of the red component was written to each character of the default low-res 32x16 text screen. It's currently damn ugly, but this is good enough to verify the pixels are being decoded in this test image.

For the compiler, I'm using gcc6809, compiled under Ubuntu 13.10 (with several evil workarounds to get the compiler to build at all), ToolShed to convert the .bin file to a CoCo .dsk image file (running via Wine), and DriveWire 4 (running on another machine with Win7) to transfer the .dsk file to the CoCo. The .dsk image also has a small MS BASIC program to LOADM the .bin file and EXEC it at the right start address.

Unfortunately, I can't run it again without restarting the machine and booting up DriveWire again -- no idea why yet. If I hit reset and EXEC the decoder again it does something but the output image is bogus. This process is terribly slow because after restarting the machine I must CLOADM the DriveWire 4 client over the cassette port (hooked up to my PC's sound output). The DriveWire booting process uses Winamp to play back the cassette sound file!



The test JPEG, enlarged by 8x and converted to PNG:


The next step is to figure out why I can't restart it without shutting down, and then switch it over to a CoCo 3 16-color graphics mode and add some sort of dithered output. After that comes lots of 6809 specific optimization. It'll still be slow as molasses, even after tuning it, but it's fun anyway.

Figuring out how to build gcc6809 under Ubuntu 13.10, and then getting it to output CoCo-compatible .bin files, was a pain. Figuring out how to enable gcc compiler optimizations without causing aslink to core dump was even more fun. I'll post all the things I had to do soon. It's pretty cool to have a modern C compiler capable of targeting the machine you started with.

Sunday, February 2, 2014

A few tips from Mike Sartain on QtCreator projects and Ninja warning/error parsing

Mike has been improving how we use QtCreator on vogl a lot recently:

QtCreator projects

QtCreator and Ninja warning/error parsing

QtCreator's support for custom builds (in our case, cmake+ninja) is surprisingly flexible.

Friday, January 17, 2014

GL Game Devs: Please Send Us Your Game Traces!

We're going through all games in the Steam Linux catalog to ensure they are compatible with our new GL tracer/debugger toolset (VOGL). But this is backwards looking: what we really want is to ensure our full-stream tracing and state snapshot/restore code is compatible and correct with your new (most likely unreleased) GL code.

So if you would like to help right now, please use apitrace (on any "big" GL platform: Linux/OSX/Windows) to make a 30-90 second trace of your app's gameplay:
https://github.com/apitrace/apitrace

The more representative the trace is of your actual (end-user) GL usage, the better. We're especially interested in advanced GL 3.x/4.x usage patterns, or any new extensions you use. We'll add your trace to our private correctness and regression testing repo. We'll replay your trace, capture it with our tools, and take it from there.

E-mail me (richg at valvesoftware dot com) with the trace's location. For very large traces that are too big for Dropbox, etc., we'll set you up with a private FTP site.

Also, if your game is already on Steam Linux and you're actively debugging GL issues I can move your app to the top of the testing list.