From 80a8ea19ca417b64ff5160db85ed9758a8ee1d58 Mon Sep 17 00:00:00 2001
From: Michiel Van Der Kolk
Date: Thu, 17 Mar 2005 13:41:05 +0000
Subject: Source documentation of gnuboy (all there is anyways...) Helps with understanding the code.

git-svn-id: svn://svn.rockbox.org/rockbox/trunk@6195 a1c6a512-1295-4272-9138-f99709370657
---
 apps/plugins/rockboy/HACKING | 472 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 472 insertions(+)
 create mode 100644 apps/plugins/rockboy/HACKING

(limited to 'apps')

diff --git a/apps/plugins/rockboy/HACKING b/apps/plugins/rockboy/HACKING
new file mode 100644
index 0000000000..3efd85ed9b
--- /dev/null
+++ b/apps/plugins/rockboy/HACKING
@@ -0,0 +1,472 @@
+
+HACKING ON THE GNUBOY SOURCE TREE
+
+
+ BASIC INFO
+
+In preparation for the first release, I'm putting together a simple
+document to aid anyone interested in playing around with or improving
+the gnuboy source. First of all, before working on anything, you
+should know my policies as maintainer. I'm happy to accept contributed
+code, but there are a few guidelines:
+
+* Obviously, all code must be able to be distributed under the GNU
+GPL. This means that your terms of use for the code must be equivalent
+to or weaker than those of the GPL. Public domain and MIT-style
+licenses are perfectly fine for new code that doesn't incorporate
+existing parts of gnuboy, e.g. libraries, but anything derived from or
+built upon the GPL'd code can only be distributed under GPL. When in
+doubt, read COPYING.
+
+* Please stick to a coding and naming convention similar to the
+existing code. I can reformat contributions if I need to when
+integrating them, but it makes it much easier if that's already done
+by the coder. In particular, indentations are a single tab (char 9), and
+all symbols are all lowercase, except for macros which are all
+uppercase.
+
+* All code must be completely deterministic and consistent across all
+platforms. This results in the two following rules...
+
+* No floating point code whatsoever. Use fixed point or better yet
+exact analytical integer methods as opposed to any approximation.
+
+* No threads. Emulation with threads is a poor approximation if done
+sloppily, and it's slow anyway even if done right since things must be
+kept synchronous. Also, threads are not portable. Just say no to
+threads.
+
+* All non-portable code belongs in the sys/ or asm/ trees. #ifdef
+should be avoided except for general conditionally-compiled code, as
+opposed to little special cases for one particular cpu or operating
+system. (e.g. #ifdef USE_ASM is ok, #ifdef __i386__ is NOT!)
+
+* That goes for *nix code too. gnuboy is written in ANSI C, and I'm
+not going to go adding K&R function declarations or #ifdef's to make
+sure the standard library is functional. If your system is THAT
+broken, fix the system, don't "fix" the emulator.
+
+* Please no feature-creep. If something can be done through an
+external utility or front-end, or through clever use of the rc
+subsystem, don't add extra code to the main program.
+
+* On that note, the modules in the sys/ tree serve the singular
+purpose of implementing calls necessary to get input and display
+graphics (and eventually sound). Unlike in poorly-designed emulators,
+they are not there to give every different target platform its own gui
+and different set of key bindings.
+
+* Furthermore, the main loop is not in the platform-specific code, and
+it will never be. Windows people, put your code that would normally go
+in a message loop in ev_refresh and/or sys_sleep!
+
+* Commented code is welcome but not required.
+
+* I prefer asm in AT&T syntax (the style used by *nix assemblers and
+likewise DJGPP) as opposed to Intel/NASM/etc style. If you really must
+use a different style, I can convert it, but I don't want to add extra
+dependencies on nonstandard assemblers to the build process. Also,
+portable C versions of all code should be available.
+
+* Have fun with it.
If my demands stifle your creativity, feel free to
+fork your own projects. I can always adapt and merge code later if
+your rogue ideas are good enough. :)
+
+OK, enough of that. Now for the fun part...
+
+
+ THE SOURCE TREE STRUCTURE
+
+[documentation]
+README - general information related to using gnuboy
+INSTALL - compiling and installation instructions
+HACKING - this file, obviously
+COPYING - the gnu gpl, grants freedom under condition of preserving it
+
+[build files]
+Version - doubles as a C and makefile include, identifies version number
+Rules - generic build rules to be included by makefiles
+Makefile.* - system-specific makefiles
+configure* - script for generating *nix makefiles
+
+[non-portable code]
+sys/*/* - hardware and software platform-specific code
+asm/*/* - optimized asm versions of some code, not used yet
+asm/*/asm.h - header specifying which functions are replaced by asm
+asm/i386/asmnames.h - #defines to fix _ prefix brain damage on DOS/Windows
+
+[main emulator stuff]
+main.c - entry point, event handler...basically a mess
+loader.c - handles file io for rom and ram
+emu.c - another mess, basically the frame loop that calls state.c
+debug.c - currently just cpu trace, eventually interactive debugging
+hw.c - interrupt generation, gamepad state, dma, etc.
+mem.c - memory mapper, read and write operations
+fastmem.h - short static functions that will inline for fast memory io
+regs.h - macros for accessing hardware registers
+save.c - savestate handling
+
+[cpu subsystem]
+cpu.c - main cpu emulation
+cpuregs.h - macros for cpu registers and flags
+cpucore.h - data tables for cpu emulation
+asm/i386/cpu.s - entire cpu core, rewritten in asm
+
+[graphics subsystem]
+fb.h - abstract framebuffer definition, extern from platform-specifics
+lcd.c - main control of refresh procedure
+lcd.h - vram, palette, and internal structures for refresh
+asm/i386/lcd.s - asm versions of a few critical functions
+lcdc.c - lcdc phase transitioning
+
+[input subsystem]
+input.h - internal keycode definitions, etc.
+keytables.c - translations between key names and internal keycodes
+events.c - event queue
+
+[resource/config subsystem]
+rc.h - structure defs
+rccmds.c - command parser/processor
+rcvars.c - variable exports and command to set rcvars
+rckeys.c - keybindings
+
+[misc code]
+path.c - path searching
+split.c - general purpose code to split strings into argv-style arrays
+
+
+ OVERVIEW OF PROGRAM FLOW
+
+The initial entry point is main() in main.c, which will process the
+command line, call the system/video initialization routines, load the
+rom/sram, and pass control to the main loop in emu.c. Note that the
+system-specific main() hook has been removed since it is not needed.
+
+There have been significant changes to gnuboy's main loop since the
+original 0.8.0 release. The former state.c is no more, and the new
+code that takes its place, in lcdc.c, is now called from the cpu loop,
+which although slightly unfortunate for performance reasons, is
+necessary to handle some strange special cases.
+
+Still, unlike some emulators, gnuboy's main loop is not the cpu
+emulation loop. Instead, a main loop in emu.c which handles video
+refresh, polling events, sleeping between frames, etc.
calls
+cpu_emulate passing it an ideal number of cycles to run. The actual
+number of cycles for which the cpu runs will vary slightly depending
+on the length of the final instruction processed, but it should never
+be more than 8 or 9 beyond the ideal cycle count passed, and the
+actual number will be returned to the calling function in case it
+needs this information. The cpu code now takes care of all timer and
+lcdc events in its main loop, so the caller no longer needs to be
+aware of such things.
+
+Note that all cycle counts are measured in CGB double speed MACHINE
+cycles (2**21 Hz), NOT hardware clock cycles (2**23 Hz). This is
+necessary because the cpu speed can be switched between single and
+double speed during a single call to cpu_emulate. When running in
+single speed or DMG mode, all instruction lengths are doubled.
+
+As for the LCDC state, things are much simpler now. No more huge
+glorious state table, no more P/Q/R, just a couple simple functions.
+Aside from the number of cycles left before the next state change, all
+the state information fits nicely in the locations the Game Boy itself
+provides for it -- the LCDC, STAT, and LY registers.
+
+If the special cases for the last line of VBLANK look strange to you,
+good. There's some weird stuff going on here. According to documents
+I've found, LY changes from 153 to 0 early in the last line, then
+remains at 0 until the end of the first visible scanline. I don't
+recall finding any roms that rely on this behavior, but I implemented
+it anyway.
+
+That covers the basics.
As for flow of execution, here's a simplified
+call tree that covers most of the significant function calls taking
+place in normal operation:
+
+ main                sys/
+  \_ real_main       main.c
+      |_ sys_init    sys/
+      |_ vid_init    sys/
+      |_ loader_init loader.c
+      |_ emu_reset   emu.c
+      \_ emu_run     emu.c
+          |_ cpu_emulate                 cpu.c
+          |   |_ div_advance             cpu.c *
+          |   |_ timer_advance           cpu.c *
+          |   |_ lcdc_advance            cpu.c *
+          |   |   \_ lcdc_trans          lcdc.c
+          |   |       |_ lcd_refreshline lcd.c
+          |   |       |_ stat_change     lcdc.c
+          |   |       |   \_ lcd_begin   lcd.c
+          |   |       \_ stat_trigger    lcdc.c
+          |   \_ sound_advance           cpu.c *
+          |_ vid_end                     sys/
+          |_ sys_elapsed                 sys/
+          |_ sys_sleep                   sys/
+          |_ vid_begin                   sys/
+          \_ doevents                    main.c
+
+ (* included in cpu.c so they can inline; also in cpu.s)
+
+
+ MEMORY READ/WRITE MAP
+
+Whenever possible, gnuboy avoids emulating memory reads and writes
+with a function call. To this end, two pointer tables are kept -- one
+for reading, the other for writing. They are indexed by bits 12-15 of
+the address in Game Boy memory space, and yield a base pointer from
+which the whole address can be used as an offset to access Game Boy
+memory with no function calls whatsoever. For regions that cannot be
+accessed without function calls, the pointer in the table is NULL.
+
+For example, reading from address addr can be accomplished by testing
+to make sure mbc.rmap[addr>>12] is not NULL, then simply reading
+mbc.rmap[addr>>12][addr].
+
+And for the disbelievers in this optimization, here are some numbers
+to compare. First, FFL2 with memory tables disabled:
+
+  %   cumulative   self              self     total
+ time   seconds   seconds    calls  us/call  us/call  name
+ 28.69      0.57     0.57                             refresh_2
+ 13.17      0.84     0.26  4307863     0.06     0.06  mem_read
+ 11.63      1.07     0.23                             cpu_emulate
+
+Now, with memory tables enabled:
+
+ 38.86      0.66     0.66                             refresh_2
+  8.42      0.80     0.14   156380     0.91     0.91  spr_enum
+  6.76      0.91     0.11   483134     0.24     1.31  lcdc_trans
+  6.16      1.02     0.10                             cpu_emulate
+ .
+ .
+ .
+  0.59      1.61     0.01   216497     0.05     0.05  mem_read
+
+As you can see, not only does mem_read take up (proportionally) 1/20
+as much time, since it is rarely called, but the main cpu loop in
+cpu_emulate also runs considerably faster with all the function call
+overhead and cache misses avoided.
+
+These tests were performed on a K6-2/450 with the assembly cores
+enabled; your mileage may vary. Regardless, however, I think it's clear
+that using the address mapping tables is quite a worthwhile
+optimization.
+
+
+ LCD RENDERING CORE DESIGN
+
+The LCD core presently used in gnuboy is very much a high-level one,
+performing the task of rasterizing scanlines as many independent steps
+rather than one big loop, as is often seen in other emulators and the
+original gnuboy LCD core. In some ways, this is a bit of a tradeoff --
+there's a good deal of overhead in rebuilding the tile pattern cache
+for roms that change their tile patterns frequently, such as full
+motion video demos. Even still, I consider the method we're presently
+using far superior to generating the output display directly from the
+gameboy tiledata -- in the vast majority of roms, tiles are changed so
+infrequently that the overhead is irrelevant. Even if the tiles are
+changed rapidly, the only chance for overhead beyond what would be
+present in a monolithic rendering loop lies in (host cpu) cache misses
+and the possibility that we might (tile pattern) cache a tile that has
+changed but that will never actually be used, or that will only be
+used in one orientation (horizontally and vertically flipped versions
+of all tiles are cached as well). Such tile caching issues could be
+addressed in the long term if they cause a problem, but I don't see it
+hurting performance too significantly at the present.
As for host cpu
+cache miss issues, I find that putting multiple data decoding and
+rendering steps together in a single loop harms performance much more
+significantly than building a 256k (pattern) cache table, on account
+of interfering with branch prediction, register allocation, and so on.
+
+Well, with those justifications given, let's proceed to the steps
+involved in rendering a scanline:
+
+updatepatpix() - updates tile pattern cache.
+
+tilebuf() - reads gb tile memory according to its complicated tile
+addressing system which can be changed via the LCDC register, and
+outputs nice linear arrays of the actual tile indices used in the
+background and window on the present line.
+
+Before continuing, let me explain the output format used by the
+following functions. There is a byte array scan.buf, accessible by
+macro as BUF, which is the output buffer for the line. The structure
+of this array is simple: it is composed of 6 bpp gameboy color
+numbers, where the bits 0-1 are the color number from the tile, bits
+2-4 are the (cgb or dmg) palette index, and bit 5 is 0 for background
+or window, 1 for sprite.
+
+What is the justification for using a strange format like this, rather
+than raw host color numbers for output? Well, believe it or not, it
+improves performance. It's already necessary to have the gameboy color
+numbers available for use in sprite priority. And, when running in
+mono gb mode, building this output data is VERY fast -- it's just a
+matter of doing 64 bit copies from the tile pattern cache to the
+output buffer.
+
+Furthermore, using a unified output format like this eliminates the
+need to have separate rendering functions for each host color depth or
+mode. We just call a one-line function to apply a palette to the
+output buffer as we copy it to the video display, and we're done. And,
+if you're not convinced about performance, just do some profiling.
+You'll see that the vast majority of the graphics time is spent in the
+one-line copy function (render_[124] depending on bytes per pixel),
+even when using the fast asm versions of those routines. That is to
+say, any overhead in the following functions is for all intents and
+purposes irrelevant to performance. With that said, here they are:
+
+bg_scan() - expands the background layer to the output buffer.
+
+wnd_scan() - expands the window layer.
+
+spr_scan() - expands the sprites. Note that this requires spr_enum()
+to have been called already to build a list of which sprites are
+visible on the current scanline and sort them by priority.
+
+It should be noted that the background and window functions also have
+color counterparts, which are considerably slower due to merging of
+palette data. At this point, they're staying down around 8% time
+according to the profiler, so I don't see a major need to rewrite them
+anytime soon. It should be considered, however, that a different
+intermediate format could be used for gbc, or that asm versions of
+these two routines could be written, in the long term.
+
+Finally, some notes on palettes. You may be wondering why the 6 bpp
+intermediate output can't be used directly on 256-color display
+targets. After all, that would give a huge performance boost. The
+problem, however, is that the gameboy palette can change midscreen,
+whereas none of the presently targeted host systems can handle such a
+thing, much less do it portably. For color roms, using our own
+internal color mappings in addition to the host system palette is
+essential. For details on how this is accomplished, read palette.c.
+
+Now, in the long term, it MAY be possible to use the 6 bpp color
+"almost" directly for mono roms. Note that I say almost. The idea is
+this. Using the color number as an index into a table is slow. It
+takes an extra read and causes various pipeline stalls depending on
+the host cpu architecture.
But, since there are relatively few
+possible mono palettes, it may actually be possible to set up the host
+palette in a clever way so as to cover all the possibilities, then use
+some fancy arithmetic or bit-twiddling to convert without a lookup
+table -- and this could presumably be done 4 pixels at a time with
+32bit operations. This area remains to be explored, but if it works,
+it might end up being the last hurdle to getting realtime emulation
+working on very low-end systems like i486.
+
+
+ SOUND
+
+Rather than processing sound after every few instructions (and thus
+killing the cache coherency), we update sound in big chunks. Yet this
+in no way affects precise sound timing, because sound_mix is always
+called before reading or writing a sound register, and at the end of
+each frame.
+
+The main sound module interfaces with the system-specific code through
+one structure, pcm, and a few functions: pcm_init, pcm_close, and
+pcm_submit. While the first two should be obvious, pcm_submit needs
+some explaining. Whenever realtime sound output is operational,
+pcm_submit is responsible for timing, and should not return until it
+has successfully processed all the data in its input buffer (pcm.buf).
+On *nix sound devices, this typically means just waiting for the write
+syscall to return, but on systems such as DOS where low level IO must
+be handled in the program, pcm_submit needs to delay until the current
+position in the DMA buffer has advanced sufficiently to make space for
+the new samples, then copy them.
+
+For special sound output implementations like write-to-file or the
+dummy sound device, pcm_submit should write the data immediately and
+return 0, indicating to the caller that other methods must be used for
+timing. On real sound devices that are presently functional,
+pcm_submit should return 1, regardless of whether it buffered or
+actually wrote the sound data.
+
+And yes, for unices without OSS, we hope to add piped audio output
+soon.
Perhaps Sun audio device and a few others as well.
+
+
+ OPTIMIZED ASSEMBLY CODE
+
+A lot can be said on this matter. Nothing has been said yet.
+
+
+ INTERACTIVE DEBUGGER
+
+Apologies, there is no interactive debugger in gnuboy at present. I'm
+still working out the design for it. In the long run, it should be
+integrated with the rc subsystem, kinda like a cross between gdb and
+Quake's ever-famous console. Whether it will require a terminal device
+or support the graphical display remains to be determined.
+
+In the meantime, you can use the debug trace code already
+implemented. Just "set trace 1" from your gnuboy.rc or the command
+line. Read debug.c for info on how to interpret the output, which is
+condensed as much as possible and not quite self-explanatory.
+
+
+ PORTING
+
+On all systems on which it is available, the gnu compiler should
+probably be used. Writing code specific to non-free compilers makes it
+impossible for free software users to actively contribute. On the
+other hand, compiler-specific code should always be kept to a minimum,
+to make porting to or from non-gnu compilers easier.
+
+Porting to new cpu architectures should not be necessary. Just make
+sure you unset IS_LITTLE_ENDIAN in the makefiles to enable the big
+endian default if the target system is big endian. If you do have
+problems building on certain cpus, however, let us know. Eventually,
+we will also want asm cpu and graphics code for popular host cpus, but
+this can wait, since the c code should be sufficiently fast on most
+platforms.
+
+The bulk of porting efforts will probably be spent on adding support
+for new operating systems, and on systems with multiple video (or
+sound, once that's implemented) architectures, new interfaces for
+those. In general, the operating system interface code goes in a
+directory under sys/ named for the os (e.g. sys/nix/ for *nix
+systems), and display interfaces likewise go in their respective
+directories under sys/ (e.g.
sys/x11/ for the x window system
+interface).
+
+For guidelines in writing new system and display interface modules, I
+recommend reading the files in the sys/dos/, sys/svga/, and sys/nix/
+directories. These are some of the simpler versions (aside from the
+tricky dos keyboard handling), as opposed to all the mess needed for
+x11 support.
+
+Also, please be aware that the existing system and display interface
+modules are somewhat primitive; they are designed to be as quick and
+sloppy as possible while still functioning properly. Eventually they
+will be greatly improved.
+
+Finally, remember your obligations under the GNU GPL. If you produce
+any binaries that are compiled strictly from the source you received,
+and you intend to release those, you *must* also release the exact
+sources you used to produce those binaries. This is not pseudo-free
+software like Snes9x where binaries usually appear before the latest
+source, and where the source only compiles on one or two platforms;
+this is true free software, and the source to all binaries always
+needs to be available at the same time or sooner than the
+corresponding binaries, if binaries are to be released at all. This of
+course applies to all releases, not just new ports, but from
+experience I find that ports people usually need the most reminding.
+
+
+ EPILOGUE
+
+That's it for now. More info will eventually follow. Happy hacking!
+
+
+
+
+
+
+
+
+
+
+
+
+
--
cgit v1.2.3