I first wrote a small monitor program for the 6809, so I could remotely control and debug program execution over the CoCo3's "bit banging" serial port. There's a bit of assembly to handle the stack manipulation, but otherwise it's written entirely in C using gcc6809. This monitor function lives in a single SWI (software interrupt) handler and only supports very basic ops: read/write memory, read/write the main program's registers (which are popped/pushed on the main program's stack in the SWI handler), ping, "trampoline" (copy memory from a source address to a destination address, then transfer control to the specified address), or return from the SWI handler and continue main program execution. The monitor also hooks FIRQ and enables the RS-232 port's CD (carrier detect) level-sensitive interrupt so I can remotely trigger asynchronous breakpoints by toggling the DTR pin. (My DB9->CoCo serial cable is wired so DTR from the PC is hooked up to the CoCo's CD pin.)
With this design I can remotely do pretty much anything I want with the machine being debugged. Once the remote machine is running the monitor I can write a new program to memory and start it (even overwriting the currently executing program and monitor using the trampoline command), examine and modify memory/registers, implement new debugging features, etc. without having to modify (and then possibly debug) the 6809 monitor function itself.
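To make the "tiny monitor, smart client" split concrete, here's a rough sketch of how a client-side "upload and run" feature could be composed from just the write-memory and trampoline ops. This is not the real monitor or client code: `SimTarget`, the `mon_*` functions, `upload_and_run`, the staging address, and the 64-byte chunk size are all hypothetical, and the target is simulated as a 64KB array instead of being reached over the serial link.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simulated 6809 target: 64KB of memory plus the PC the monitor would resume at.
struct SimTarget {
    std::vector<uint8_t> mem = std::vector<uint8_t>(0x10000, 0);
    uint16_t pc = 0;
};

// Hypothetical stand-in for the monitor's "write memory" op.
void mon_write_mem(SimTarget& t, uint16_t addr, const std::vector<uint8_t>& data) {
    assert(static_cast<size_t>(addr) + data.size() <= t.mem.size());
    std::copy(data.begin(), data.end(), t.mem.begin() + addr);
}

// Hypothetical stand-in for the monitor's "read memory" op.
std::vector<uint8_t> mon_read_mem(const SimTarget& t, uint16_t addr, uint16_t len) {
    assert(static_cast<size_t>(addr) + len <= t.mem.size());
    return std::vector<uint8_t>(t.mem.begin() + addr, t.mem.begin() + addr + len);
}

// Hypothetical stand-in for "trampoline": copy a block, then transfer control.
void mon_trampoline(SimTarget& t, uint16_t src, uint16_t dst, uint16_t len, uint16_t entry) {
    assert(static_cast<size_t>(src) + len <= t.mem.size());
    assert(static_cast<size_t>(dst) + len <= t.mem.size());
    std::memmove(t.mem.data() + dst, t.mem.data() + src, len);  // ranges may overlap
    t.pc = entry;
}

// Client-side feature built only from the primitives above: stage the image in
// scratch memory in small chunks, then let the trampoline move it into place.
void upload_and_run(SimTarget& t, const std::vector<uint8_t>& image,
                    uint16_t staging, uint16_t load_addr, uint16_t entry) {
    const size_t kChunk = 64;  // assumed packet payload size
    for (size_t off = 0; off < image.size(); off += kChunk) {
        const size_t n = std::min(kChunk, image.size() - off);
        std::vector<uint8_t> chunk(image.begin() + off, image.begin() + off + n);
        mon_write_mem(t, static_cast<uint16_t>(staging + off), chunk);
    }
    mon_trampoline(t, staging, load_addr, static_cast<uint16_t>(image.size()), entry);
}
```

Staging first matters because the final load address may be where the running program (or the monitor itself) currently lives, so nothing there can be touched until the last step.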
The client app is written in C++ in the VOGL codebase and supports the usual monitor-type commands, plus a bunch of commands for debugging, 6809/6309 disassembly, loading DECB (Microsoft Disk Extended Color BASIC) .bin files into memory, dumping memory to .bin files, etc. It supports both assembly and simple source-level debugging. You can single step by instructions or lines (step into, step over, or step out), get callstacks with symbols, and print parameters and local variables. To get type and symbol address information, I'm parsing the STABS debug information generated by gcc in the assembly .S files, and the NoICE debug information generated by aslink.
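For reference, a DECB .bin file is a sequence of 5-byte headers: a 0x00 byte, a big-endian length, and a big-endian load address, followed by that many data bytes, and finally a 0xFF postamble header whose address field is the exec address. A minimal loader sketch, assuming that layout (the `DecbSegment`, `DecbImage`, and `parse_decb` names are made up for illustration and are not from the actual client):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DecbSegment {
    uint16_t load_addr;
    std::vector<uint8_t> data;
};

struct DecbImage {
    std::vector<DecbSegment> segments;
    uint16_t exec_addr = 0;
    bool ok = false;  // true only if a postamble was found
};

// Parses the raw bytes of a DECB .bin file into load segments plus an exec address.
DecbImage parse_decb(const std::vector<uint8_t>& file) {
    DecbImage img;
    size_t pos = 0;
    while (pos + 5 <= file.size()) {
        const uint8_t type = file[pos];
        const uint16_t len = static_cast<uint16_t>((file[pos + 1] << 8) | file[pos + 2]);
        const uint16_t addr = static_cast<uint16_t>((file[pos + 3] << 8) | file[pos + 4]);
        pos += 5;
        if (type == 0xFF) {  // postamble: addr is the exec address
            img.exec_addr = addr;
            img.ok = true;
            return img;
        }
        if (type != 0x00 || pos + len > file.size())
            return img;  // unknown block type or truncated data
        img.segments.push_back(
            {addr, std::vector<uint8_t>(file.begin() + pos, file.begin() + pos + len)});
        pos += len;
    }
    return img;  // ran out of bytes before seeing a postamble
}
```

Each segment can then be pushed to the target with ordinary write-memory commands before starting execution at `exec_addr`.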
Robust callstacks are surprisingly tough to get working. The S register is constantly manipulated by the compiler and there's no stack base register when optimizations are enabled, so it's hard to reliably determine the return addresses without some extra information to help the process along. To get callstacks I modified gcc6809 to optionally insert a handful of prolog/epilog instructions into each generated function (two at the beginning and one at the end). The prolog sequence stores the current value of the S register into a separate 256-byte stack located at absolute address 0x100. (It stores a word, but the stack pointer is only decremented by a single byte, so each new entry's low byte overwrites the previous entry's high byte. That's fine because I only care about the lowest byte of the stack register: my stacks are <= 256 bytes.) The debugger reads this stack of "stack pointers" to figure out what the S register was at the beginning of each function. It can then determine where the return PCs are located in the real system hardware stack.
The 6809 code to do this uses no registers, just a single global pointer at absolute address 0xFE and indirect addressing:
0x0628 7A 00 FF _main: DEC $00FF (m15+0xF0)
0x062B 10 EF 9F 00 FE STS [$00FE (m15+0xEF)]
0x0630 34 40 PSHS U
0x0632 33 E4 LEAU , S
0x0634 func: _main line: test.c(101):
0x0634 BD 0E 9E JSR _coco3_init ($0E9E)
0x0637 func: _main line: test.c(103):
0x0637 BD 08 03 JSR _monitor_start ($0803)
0x063A func: _main line: test.c(105):
0x063A BD 25 64 JSR _coco3v_text_init ($2564)
0x063D func: _main line: test.c(106):
0x063D 8E 06 20 LDX #$0620
0x0640 34 10 PSHS X
0x0642 BD 1D 5C JSR _core_printf ($1D5C)
0x0645 32 62 LEAS 2, S
0x0647 func: _main line: test.c(108):
0x0647 BD 03 8F JSR _test_func ($038F)
0x064A func: _main line: test.c(110):
0x064A BD 1D F7 JSR _core_hault ($1DF7)
0x064D func: _main line: test.c(112):
0x064D 8E 00 00 LDX #$0000
0x0650 7C 00 FF INC $00FF (m15+0xF0)
0x0653 func: _main line: test.c(113):
0x0653 35 C0 PULS PC, U
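On the debugger side, walking this shadow stack could look roughly like the sketch below. It works on a 64KB memory snapshot and relies on two facts about the 6809: words are big-endian, and S at function entry points at the return address that JSR just pushed. Everything else is an assumption for illustration: the function names, the `empty_ptr` parameter (the value the pointer at 0xFE holds when no instrumented function is active), and the shortcut of taking the high byte of S from the current S register, which only works if the hardware stack doesn't cross a 256-byte page.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint16_t kShadowPtrAddr = 0x00FE;  // global pointer used by the prolog/epilog

// Reads a big-endian 16-bit word from the snapshot.
static uint16_t read_be16(const std::vector<uint8_t>& mem, uint16_t addr) {
    return static_cast<uint16_t>((mem[addr] << 8) | mem[static_cast<uint16_t>(addr + 1)]);
}

// Returns the return PCs of all active instrumented functions, innermost first.
// empty_ptr is an assumed parameter: the shadow pointer's value with nothing pushed.
std::vector<uint16_t> recover_return_pcs(const std::vector<uint8_t>& mem,
                                         uint16_t current_s, uint16_t empty_ptr) {
    assert(mem.size() == 0x10000);
    std::vector<uint16_t> pcs;
    const uint16_t ptr = read_be16(mem, kShadowPtrAddr);
    // Only the pointer's low byte is ever decremented, one byte per call.
    const unsigned depth = static_cast<uint8_t>(empty_ptr - ptr);
    for (unsigned i = 0; i < depth; ++i) {
        // The surviving low bytes of S sit at ptr+1, ptr+2, ... (innermost first).
        const uint8_t s_lo = mem[static_cast<uint16_t>(ptr + 1 + i)];
        // Assumption: the whole hardware stack shares current_s's high byte.
        const uint16_t s_at_entry = static_cast<uint16_t>((current_s & 0xFF00) | s_lo);
        pcs.push_back(read_be16(mem, s_at_entry));
    }
    return pcs;
}
```

Each recovered PC can then be looked up in the symbol table to turn the list into a named callstack.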
Here are some pics of the monitor client app, showing source-level disassembly, callstacks, symbols, etc. The monitor's serial protocol is mostly synchronous and I'm paranoid about checksumming everything (because bit banging at 115200 baud is not 100% robust on this hardware).
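As an illustration of the "checksum everything" idea, here's a minimal sketch of a checksummed packet. The frame layout (command byte, length byte, payload, 16-bit checksum) and the choice of Fletcher-16 are assumptions made for the example, not the monitor's actual wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fletcher-16 over a byte buffer (two running sums, each modulo 255).
uint16_t fletcher16(const uint8_t* p, size_t n) {
    uint16_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < n; ++i) {
        s1 = static_cast<uint16_t>((s1 + p[i]) % 255);
        s2 = static_cast<uint16_t>((s2 + s1) % 255);
    }
    return static_cast<uint16_t>((s2 << 8) | s1);
}

// Hypothetical frame: [cmd][len][payload...][checksum hi][checksum lo].
std::vector<uint8_t> encode_frame(uint8_t cmd, const std::vector<uint8_t>& payload) {
    assert(payload.size() <= 255);
    std::vector<uint8_t> f;
    f.push_back(cmd);
    f.push_back(static_cast<uint8_t>(payload.size()));
    f.insert(f.end(), payload.begin(), payload.end());
    const uint16_t ck = fletcher16(f.data(), f.size());
    f.push_back(static_cast<uint8_t>(ck >> 8));
    f.push_back(static_cast<uint8_t>(ck & 0xFF));
    return f;
}

// Returns false (leaving cmd/payload untouched) if the frame is short,
// has an inconsistent length, or fails the checksum.
bool decode_frame(const std::vector<uint8_t>& f, uint8_t& cmd, std::vector<uint8_t>& payload) {
    if (f.size() < 4) return false;
    const size_t len = f[1];
    if (f.size() != len + 4) return false;
    const uint16_t want = static_cast<uint16_t>((f[len + 2] << 8) | f[len + 3]);
    if (fletcher16(f.data(), len + 2) != want) return false;
    cmd = f[0];
    payload.assign(f.begin() + 2, f.begin() + 2 + len);
    return true;
}
```

With a synchronous protocol, a failed `decode_frame` on either end can simply be answered by resending the request, since only one command is ever in flight.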
Here's the physical hardware running a heap test program. The cross-platform C codebase compiles on both the PC using clang, and on the CoCo using gcc6809. I'm keeping it cross-platform because it's still *much* easier to debug on the PC using QtCreator vs. remotely debugging using my monitor app. Using the monitor to debug problems, even with symbols, makes me totally appreciate how good QtCreator's debugger actually is!