We promised at Steam Dev Days we would open source the project, so here it is:
https://github.com/ValveSoftware/vogl
Creating an OpenGL debugger that handles both full-stream tracing *and* state snapshotting (with compat profile support to boot!) is a surprisingly massive undertaking for ~3 devs, so please bear with us. We're knee deep in fleshing out the UI and improving the tracer/replayer to be fully compatible with GL v3.3 (4.x will come later this year). Please file bug reports on github and send us trace logs (or apitrace/vogl traces), etc., and we'll do our best to make it work with your app.
We'll be posting more instructions and our current TODO list on the wiki soon.
We're currently in the process of adding PBO support (done, testing it right now), and we've added the ability to snapshot while buffers are mapped during replaying. (Both things are needed to trace/replay/snapshot Steam 10ft.)
Co-owner of Binomial LLC, working on GPU texture interchange. Open source developer, graphics programmer, former video game developer. Worked previously at SpaceX (Starlink), Valve, Ensemble Studios (Microsoft), DICE Canada.
Wednesday, March 12, 2014
Tuesday, March 11, 2014
togl D3D9->OpenGL layer source release on github
This is a raw dump of the togl layer right from DoTA2:
https://github.com/ValveSoftware/ToGL
This is old news by now; I think the press picked up on this even before I heard it was finally released. I really wish we had the time to package it better (so you could actually compile it!) with some examples, etc. There's a ton of practical Linux GL driver know-how packed all over this code -- if you look carefully. Every Valve Source1 title ultimately goes through this layer on Linux/Mac. (The Mac ports do use a different, much earlier branch, however. At some point the Linux and Mac branches merged back together, but I don't know if that occurred in time for DoTA's version.)
We talked a lot about what we learned while working on this layer at GDC last year:
Porting Source to Linux: Valve's Lessons Learned
https://www.youtube.com/watch?v=btNVfUygvio
Or here:
https://developer.nvidia.com/gdc-2013
There's a lot of history to this particular code. This layer was first started by the Mac team, then later ported from Mac to Linux by the Steam team, and then finally ported by the Linux team to Windows (!) so we could actually debug it. (Because the best available GL debuggers at the time were Windows-only. We are working to correct that with vogl.) John McDonald, Rick Johnson, Mike Sartain, Pierre-Loup Griffais and I then got our hands on it (at various times) to push it down the correctness (for Source1) and performance axes. I spent many months agonizing over this layer's per-batch flush path: tweaking, profiling (with Rad's awesome Telemetry tool), optimizing, and testing it to run the Source1 engine correctly, quickly, and reliably on the drivers available for Linux.
The code is far from perfect: many parts are more like a battleground in there. It's optimized for results, and the primary metrics for success were perf vs. Windows and Source1 correctness, sometimes to the detriment of other factors. A lot of experiments were conducted, some blind alleys were backed out of, and we learned *a lot* about the true state of OpenGL drivers during the effort. If you want to see how to stay in the "fast lanes" of multiple Linux GL drivers simultaneously it might be worth checking out. (Most of the Linux drivers share common codebases with the Windows GL drivers, so a lot of what's in there is relevant to Windows GL too.)
(The first version of this post stated there was another version of togl that supported both Mac and Linux, and had all the SM3 fixes I made for various projects. Turns out the version on github is the very latest version, because all the togl branches were merged back into Dota2 at some point.)
Saturday, March 8, 2014
Finished Up Support for the Hitachi 6309 CPU
I've added Hitachi 6309 support to my disassembler, monitor interrupt handlers, and monitor client. So I'm now able to switch over my 6809 test app to use the 6309's "native" mode, which is something like 15-30% faster. I can single step over 6309 code sequences and display/modify the extra registers (E/F/V). I verified my disassembler by using the a09 6809/6309 cross assembler to assemble a bunch of test 6309 code, disassemble it, then diff the results vs. the original code. Lomont's 6309 Reference and The 6309 Book are the best references I've found.
The 6309 is amazingly powerful for its time. You've got some 32-bit ops, fast CPU memory transfers, hardware division/multiplication, various register to register ops, and two more 16-bit regs to play with over the 6809 (W and V, although V is limited to exchanges/transfers). W is a hybrid register, useful as a pointer or general purpose register. I've written a good deal of real mode (16-bit segmented) 8086/80286 assembly back in the day, and I really like the feel of 6309 assembly.
Unfortunately, the assembler used by gcc6809 (as6809) doesn't support the 6309. The gcc6809 package comes with a 6309 assembler (as6309), but it doesn't compile out of the box. I got it to compile but it's very clear that whoever worked on it never finished it. I made a quick stab at fixing up as6309 but to be honest the C code in there is like assembly code (with unfathomable 2-3 letter variable names and obfuscated program flow), and I don't have time to get into it for a hobby project.
So for now, I'm using the a09 assembler (which does support the 6309) to create position independent code (at address 0) in simple .bin files, which I then convert to as6809-compatible .s assembly source files. The .s files contain nothing but ".byte 0xXX" statements and the symbols. To get the symbols, I manually place a small symbol table at the end of each .bin file; a custom command line tool locates and parses this table while converting the a09-assembled .bin file to a .s file:
org $0
;------------------------------------------------------------------------------
; void _math_muli_16_16_32(int16 left_value, int16 right_value, int32 *pResult)
;------------------------------------------------------------------------------
; x - int16 left value
; stack:
; 2,3 - int16 right value
; 4,5 - int32* result_ptr
_math_muli_16_16_32:
right_val = 2
result_ptr = 4
tfr x,d
muld right_val, s
stq [result_ptr, s]
rts
;------------------------------------------------------------------------------
; Define public symbols, processed by cc3monitor -a09 <src_filename.bin>
;------------------------------------------------------------------------------
fcb 0x12, 0x35, 0xFF, 0xF0
fcc "_math_muli_16_16_32$"
fcw _math_muli_16_16_32
This gets converted to an .s file which the gcc6809 tool chain likes:
.module asmhelpers
.area .text
.globl _math_muli_16_16_32
_math_muli_16_16_32:
.byte 0x1F
.byte 0x10
.byte 0x11
.byte 0xAF
.byte 0x62
.byte 0x10
.byte 0xED
.byte 0xF8
.byte 0x4
.byte 0x39
This is a cheesy hack but works fine (for a hobby project).
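For the curious, the conversion boils down to something like this. This is a simplified C sketch, not the actual cc3monitor code; it assumes the symbol table begins at the marker bytes shown above (fcb 0x12, 0x35, 0xFF, 0xF0) and that each entry is a '$'-terminated name followed by a big-endian address word (the fcw), and it only handles a symbol at offset 0:

```c
#include <stdio.h>
#include <string.h>

/* The 4-byte marker placed before the hand-built symbol table. */
static const unsigned char kMarker[4] = { 0x12, 0x35, 0xFF, 0xF0 };

/* Returns the offset of the symbol table marker, or -1 if not found. */
int find_symbol_table(const unsigned char *bin, int size)
{
    for (int i = 0; i + 4 <= size; i++)
        if (memcmp(bin + i, kMarker, 4) == 0)
            return i;
    return -1;
}

/* Emits the .s file: .globl/label lines for each symbol, then one
   ".byte" statement per code byte. Only symbols at offset 0 get a
   label here (the code was assembled position independent at org $0). */
void emit_s_file(FILE *out, const unsigned char *bin, int size)
{
    int tab = find_symbol_table(bin, size);
    int code_size = (tab >= 0) ? tab : size;

    fprintf(out, "\t.module asmhelpers\n\t.area .text\n");

    if (tab >= 0) {
        int p = tab + 4;
        while (p < size) {
            char name[64]; int n = 0;
            while (p < size && bin[p] != '$' && n < 63)
                name[n++] = (char)bin[p++];
            name[n] = 0;
            p++;                                     /* skip the '$' */
            if (p + 2 > size) break;
            unsigned addr = (bin[p] << 8) | bin[p + 1]; /* big-endian fcw */
            p += 2;
            fprintf(out, "\t.globl %s\n", name);
            if (addr == 0)                           /* symbol at blob start */
                fprintf(out, "%s:\n", name);
        }
    }
    for (int i = 0; i < code_size; i++)
        fprintf(out, "\t.byte 0x%X\n", bin[i]);
}
```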
Monday, March 3, 2014
Source level debugger and monitor app for 6809/6309 CPU
I've been working on a Linux OpenGL debugger for about a year now, so I figured it would be fun and educational to create a low-level CPU debugger just to learn more about the problem domain. (I'll eventually use all this stuff to remotely debug on various tiny microcontrollers, so there's some practical value in all this work too.) To make the effort more interesting (and achievable in my spare time), I'm doing it for the simple 6809/6309 CPUs and interfacing it to an old 8-bit computer (Tandy CoCo3) over a serial port. (Yes, I could emulate all this stuff, but there's not nearly as much fun in that. I want to work with *real* hardware!)
I first wrote a small monitor program for the 6809, so I could remotely control and debug program execution over the CoCo3's "bit banging" serial port. There's a bit of assembly to handle the stack manipulation, but it's written entirely in C otherwise using gcc6809. This monitor function lives in a single SWI (software interrupt) handler and only supports very basic ops: read/write memory, read/write main program's registers (which are popped/pushed on the main program's stack in the SWI handler), ping, "trampoline" (copy memory from source/destination and transfer control to the specified address), or return from the SWI interrupt handler and continue main program execution. The monitor also hooks FIRQ and enables the RS-232 port's CD (carrier detect) level sensitive interrupt so I can remotely trigger asynchronous breakpoints by toggling the DTR pin. (My DB9->CoCo serial cable is wired so DTR from the PC is hooked up to the CoCo's CD pin.)
With this design I can remotely do pretty much anything I want with the machine being debugged. Once the remote machine is running the monitor I can write a new program to memory and start it (even overwriting the currently executing program and monitor using the trampoline command), examine and modify memory/registers, implement new debugging features, etc. without having to modify (and then possibly debug) the 6809 monitor function itself.
The client app is written in C++ in the VOGL codebase and supports the usual monitor-type commands, plus a bunch of commands for debugging, 6809/6309 disassembly, loading DECB (Microsoft Disk Extended Color BASIC) .bin files into memory, dumping memory to .bin files, etc. It supports both assembly and simple source level debugging. You can single step by instructions or lines (step into, step over, or step out), get callstacks with symbols, and print parameters and local variables. I'm parsing the debug STAB information generated by gcc in the assembly .S files, and the NOICE debug information generated by aslink to get type and symbol address information.
Robust callstacks are surprisingly tough to get working. The S register is constantly manipulated by the compiler and there's no stack base register when optimizations are enabled. So it's hard to reliably determine the return addresses without some extra information to help the process along. To get callstacks I modified gcc6809 to optionally insert a handful of prolog/epilog instructions into each generated function (2 at the beginning and 1 at the end). The prolog sequence stores the current value of the S register into a separate 256-byte stack located at absolute address 0x100. (It stores a word, but the stack pointer is only decremented by a single byte because I only care about the lowest byte of the stack register. My stacks are <= 256 bytes.) The debugger reads this stack of "stack pointers" to figure out what the S register was at the beginning of each function. It can then determine where the return PCs are located in the real system hardware stack.
The 6809 code to do this uses no registers, just a single global pointer at absolute address 0xFE and indirect addressing:
0x0628 7A 00 FF _main: DEC $00FF (m15+0xF0)
0x062B 10 EF 9F 00 FE STS [$00FE (m15+0xEF)]
0x0630 34 40 PSHS U
0x0632 33 E4 LEAU , S
0x0634 func: _main line: test.c(101):
coco3_init();
0x0634 BD 0E 9E JSR _coco3_init ($0E9E)
0x0637 func: _main line: test.c(103):
monitor_start();
0x0637 BD 08 03 JSR _monitor_start ($0803)
0x063A func: _main line: test.c(105):
coco3v_text_init();
0x063A BD 25 64 JSR _coco3v_text_init ($2564)
0x063D func: _main line: test.c(106):
core_printf("start\r\n");
0x063D 8E 06 20 LDX #$0620
0x0640 34 10 PSHS X
0x0642 BD 1D 5C JSR _core_printf ($1D5C)
0x0645 32 62 LEAS 2, S
0x0647 func: _main line: test.c(108):
test_func();
0x0647 BD 03 8F JSR _test_func ($038F)
0x064A func: _main line: test.c(110):
core_hault();
0x064A BD 1D F7 JSR _core_hault ($1DF7)
0x064D func: _main line: test.c(112):
return 0;
0x064D 8E 00 00 LDX #$0000
0x0650 7C 00 FF INC $00FF (m15+0xF0)
0x0653 func: _main line: test.c(113):
}
0x0653 35 C0 PULS PC, U
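The debugger-side walk over this shadow stack looks roughly like this. It's a C sketch, not the actual VOGL-side code: it assumes the shadow stack's base is 0x1FF with the 2-byte top-of-stack pointer at 0xFE/0xFF (the prolog's DEC $00FF / STS [$00FE] above), and mem[] stands in for reads of target memory over the serial monitor. Since each push decrements the pointer by one byte but stores a word, only the low byte of each saved S survives; the high byte (the stack page) is constant because the stacks are <= 256 bytes:

```c
#include <stdint.h>

/* Reads a big-endian 16-bit word from (a copy of) target memory. */
static uint16_t read_word(const uint8_t *mem, uint16_t addr)
{
    return (uint16_t)((mem[addr] << 8) | mem[addr + 1]);
}

/* Walks the shadow "stack of stack pointers" at 0x100..0x1FF, writing
   up to max_frames return PCs into pcs[]. Each recovered S value is
   the value of S at function entry, which points at that function's
   return PC on the real hardware stack. Returns the frame count. */
int walk_callstack(const uint8_t *mem, uint8_t stack_page,
                   uint16_t *pcs, int max_frames)
{
    uint16_t top = read_word(mem, 0x00FE);   /* current shadow-stack top */
    int frames = 0;
    /* Low bytes of the saved S values sit at top+1, top+2, ... up to
       the base. Each push clobbered the previous entry's high byte,
       but that's fine: the stack page is constant and known. */
    for (uint16_t p = (uint16_t)(top + 1); p <= 0x01FF && frames < max_frames; p++) {
        uint16_t saved_s = (uint16_t)((stack_page << 8) | mem[p]);
        pcs[frames++] = read_word(mem, saved_s);  /* return PC on hw stack */
    }
    return frames;
}
```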
Some pics of the monitor client app, showing source level disassembly, callstacks, symbols, etc. The monitor's serial protocol is mostly synchronous and I'm paranoid about checksumming everything (because bit banging at 115200 baud is not 100% robust on this hardware).
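The post doesn't describe the actual wire format, but the checksumming is in this spirit. Here's a hypothetical framing sketch in C: every packet carries an 8-bit checksum chosen so the whole frame sums to zero mod 256, which makes receiver-side validation trivial:

```c
#include <stdint.h>
#include <stddef.h>

/* Two's complement of the 8-bit byte sum, so frame bytes sum to zero. */
uint8_t checksum8(const uint8_t *p, size_t n)
{
    uint8_t sum = 0;
    while (n--)
        sum = (uint8_t)(sum + *p++);
    return (uint8_t)(0x100 - sum);
}

/* Builds cmd | len | payload | csum into out; returns the frame length. */
size_t build_frame(uint8_t cmd, const uint8_t *payload, uint8_t len, uint8_t *out)
{
    out[0] = cmd;
    out[1] = len;
    for (uint8_t i = 0; i < len; i++)
        out[2 + i] = payload[i];
    out[2 + len] = checksum8(out, (size_t)(2 + len));
    return (size_t)(3 + len);
}

/* Receiver side: a frame is valid iff all its bytes sum to zero mod 256. */
int frame_ok(const uint8_t *frame, size_t n)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + frame[i]);
    return sum == 0;
}
```

Any single corrupted byte flips the sum, which is about all you can ask of a one-byte checksum; it's cheap enough to compute on a 1.89MHz CPU inside the bit-banged receive loop.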
Here's the physical hardware running a heap test program. The cross platform C codebase compiles on both the PC using clang, and on the CoCo using gcc6809. I'm doing this cross platform because it's still *much* easier to debug on the PC using QtCreator vs. remotely debugging using my monitor app. Using the monitor to debug problems, even with symbols, makes me totally appreciate how good QtCreator's debugger actually is!
Monday, February 17, 2014
CoCo 3 Upgrades: Hitachi 6309 CPU, 512KB RAM, PS2 keyboard
Installed a bunch of CoCo 3 upgrades from Cloud-9 Tech over the weekend:
- I upgraded the old 68B09 CPU to the powerful Hitachi 63C09. This involved desoldering the old CPU and replacing it with a socket. I also put in a Pro-Tector+ board to protect the CPU from the inevitable torture I have planned for this thing (once I get all of my electronics gear out of storage and back in one place).
- The Cloud-9 512K Triad upgrade board (the blue triangle) was trivial to install by comparison. Following the instructions, I removed the four old (128K) RAM chips, snipped a couple capacitors, and plugged it in:
- Cloud-9 also sells a nice PS2-keyboard upgrade board which was an easy install (no soldering):
I enabled 6309 native mode (15%+ faster vs. 6809 mode) and tested it with my gcc6809 compiled test program. Here it is outputting text to the 40x24 text mode:
I'm currently scrolling the text screen up using a simple C routine. It's so embarrassingly slow right now (even at 1.89MHz 6309 native mode) that you can kinda see the scroll function move the lines up the screen. But this is fine for simple printf()-style debug output. I'm using this BSD-licensed tiny printf() for embedded applications.
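A scroll routine along these lines can be sketched in C like so. The buffer layout is an assumption: on the CoCo 3's 40-column text mode each character cell takes two bytes (character + attribute), so a 40x24 screen is 80 bytes per row, and `screen` is a placeholder for wherever the text screen happens to be mapped:

```c
#include <string.h>
#include <stdint.h>

#define COLS      40
#define ROWS      24
#define ROW_BYTES (COLS * 2)   /* char + attribute byte per cell */

/* Scrolls the text screen up one row and blanks the bottom row. */
void scroll_up(uint8_t *screen, uint8_t fill_attr)
{
    /* Move rows 1..23 up one row in a single block copy... */
    memmove(screen, screen + ROW_BYTES, (ROWS - 1) * ROW_BYTES);

    /* ...then blank the bottom row with spaces + the given attribute. */
    uint8_t *last = screen + (ROWS - 1) * ROW_BYTES;
    for (int i = 0; i < COLS; i++) {
        last[i * 2]     = ' ';
        last[i * 2 + 1] = fill_attr;
    }
}
```

One big block copy per scroll is usually the first thing to try before resorting to hand-tuned assembly (the 6309's TFM block-transfer instruction is the obvious next step).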
I've also compiled in the DriveWire 4 assembly 115kbps/230kbps I/O routines into this test app, so I can do disk I/O without relying on OS/9 or the BASIC ROM routines. My plan going forward is to continue completely "taking over" the machine and just do my own thing (no OS at all). It should be easy to code up a DriveWire compatible I/O disk module (here's the DriveWire protocol specification).
Monday, February 10, 2014
The Color Computer 3's 256 Color (Artifacting) Mode
I've begun playing around with the CoCo3's 256-color artifact color mode (more info here, here, and here, and a Youtube video here). It only works on the composite or RF modulator outputs. You basically set the 640x192x4 mode, fill the 4-color palette with grayscale entries (palette entries 0, 16, 32, 63), and then treat each byte as a separate pixel. (Normally, each byte would contain 4 separate pixels in this mode.)
The various grayscale signal patterns cause different sorts of chroma channel "leakage", leading to the below colors. (The X axis is the least significant nibble in this shot.) This shot was taken on an old LCD monitor (Samsung 150MP), hooked up to the composite input. The actual colors were more vibrant/unique than they appear in this photo.
It's a pretty sweet mode, and what's amazing to me is that the programmers from the late '80s who worked on this platform mostly ignored it (probably because it didn't work on RGB monitors). Some pretty sweet things could have been done with it, because 1 pixel per byte is quite convenient and it didn't need any funky tricks like frame flipping or mucking with video palette registers in an interrupt handler. I still remember seeing some 256-color photos in the late '80s for the first time (at a Radio Shack store on a Tandy 1000), and being utterly amazed at the detail.
Apparently Tandy engineers were planning on including a real (not artifacted) 256 color mode in the updated CoCo 3, but the execs didn't want their little CoCo line to compete against their big Tandy 1000's. So in a ridiculously shortsighted decision they nixed the idea. However, there were rumors that this mode was actually implemented in the CoCo3's "GIME" graphics chip anyway but just not documented. The fascinating history and technical details can be read about here. Through some sleuthing the author even tracked down and interviewed the hardware engineer who created the CoCo 3's GIME chip, John Prickett.
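The addressing math for this mode is simple. In the 640x192 4-color mode each byte normally holds four 2-bit pixels, so a row is 640/4 = 160 bytes; treating each byte as one "pixel" yields an effective 160x192 image with 256 artifact colors. A minimal sketch (with `gfx` as a placeholder for wherever the graphics pages are mapped into the CPU's address space):

```c
#include <stdint.h>

#define ARTIFACT_W 160   /* bytes per row = artifact pixels per row */
#define ARTIFACT_H 192

/* In this mode one byte is one artifact-color pixel, so plotting is
   just a byte store at y * 160 + x. */
static inline void put_artifact_pixel(uint8_t *gfx, int x, int y, uint8_t color)
{
    gfx[y * ARTIFACT_W + x] = color;
}
```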
I've decoded a color JPEG using this mode. These first results are totally crude, but hey it's progress. My palette currently sucks -- I built it from the above image taken on an iPhone just to get things rolling. I didn't use dithering, which would certainly help, but the palette entries are the real problem right now. I've not been able to find an "official" palette or conversion algorithm anywhere yet, so I'm going to somehow either compute one or (maybe better) find a video capture card somewhere and just sample it.
Sunday, February 9, 2014
picojpeg: Decoding Lena on a Tandy Color Computer 3
Got a small grayscale version of Lena decoding to the 640x192x4 (HSCREEN 4) graphics mode using my picojpeg.c module. Here it is on an old LCD monitor hooked up to the CoCo's composite output:
In case you didn't know, the Color Computer is a classic 8/16-bit personal home computer from Tandy Corp./Radio Shack. The particular model I'm using (CoCo 3) was released in 1986. (As a kid I always wanted a CoCo 3, but I was stuck with a 16KB CoCo 2 and by the time I could afford to upgrade I had moved on to the PC.)
This is using a 2x2 ordered dither matrix, zoomed by 2x horizontally (displayed as 256x128), using composite out as a sort of inherent low pass filter. The BASIC program currently sets the graphics mode, programs the grayscale palette, and then jumps into the C code's _start function:
10 CLEAR12,9984
20 LOADM"TEST.BIN
30 HSCREEN4
40 PALETTE0,0
50 PALETTE1,16
60 PALETTE2,32
70 PALETTE3,48
80 EXEC9991
The C code (compiled using gcc6809 under Linux) disables interrupts, maps the graphics pages into the ROM area (starting at 0x8000, ending at 0xFEFF or thereabouts) using the MMU0 registers, then decodes the ~4KB JPEG and plots the pixels as each JPEG MCU block is decoded. It currently takes about 45 seconds to decode the 128x128 image, which beats the PIC18F microcontroller I was using a few years ago to originally test this code (that thing would take 10-20 minutes!).
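The MMU part of that setup can be sketched like this, based on the documented GIME interface as I understand it: the task-0 MMU registers at $FFA0-$FFA7 each map one 8KB physical RAM block into the corresponding 8KB CPU region ($0000, $2000, ... $E000), so mapping blocks into slots 4-7 covers the $8000-$FDFF ROM area. On real hardware these would be volatile writes to $FFA0+slot; here the register file is passed in as a pointer so the logic is testable:

```c
#include <stdint.h>

/* Which task-0 MMU slot covers a given CPU address? (8KB regions.) */
static inline int mmu_slot_for(uint16_t cpu_addr)
{
    return cpu_addr >> 13;   /* 0x2000-byte granularity */
}

/* Map physical 8KB block 'block' into CPU region 'slot'.
   On hardware, mmu_regs would be (volatile uint8_t *)0xFFA0. */
static inline void mmu_map(volatile uint8_t *mmu_regs, int slot, uint8_t block)
{
    mmu_regs[slot] = block;
}
```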
Everything is completely unoptimized - I'm just glad to get it working at all. Using a CoCo to prototype/test this kind of stuff is actually much nicer overall than working on a tiny microcontroller. The CoCo community has compiled a large suite of documentation over the years, and there's a very nice set of tools available. Not to mention I've got built-in (albeit limited) graphics, 128KB of RAM, a 6-bit DAC for sound output, and various forms of input. The 6809 is a surprisingly powerful 8/16-bit CPU to work on - if you push it right.
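The post doesn't spell out the exact dither scaling, but a 2x2 ordered dither down to the 4 available gray levels goes something like this standard formulation (2x2 Bayer matrix, thresholds spread across one quantization step):

```c
#include <stdint.h>

/* Classic 2x2 Bayer threshold matrix. */
static const uint8_t bayer2[2][2] = { { 0, 2 }, { 3, 1 } };

/* Quantizes an 8-bit gray value to one of 4 levels (0..3), nudging
   values near a step boundary up or down based on screen position. */
uint8_t dither4(uint8_t gray, int x, int y)
{
    int t = (bayer2[y & 1][x & 1] * 85) / 4;  /* spread over a step of 255/3 */
    int q = (gray + t) / 85;
    return (uint8_t)(q > 3 ? 3 : q);
}
```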
This poor 1986 CoCo3 has seen better days (the case is an all-yellow wreck). I need to find a machine in better shape:
Zoomed in further, with a 4x4 dither matrix: