-rw-r--r-- apps/plugins/rockboy/HACKING 472
1 files changed, 472 insertions, 0 deletions
diff --git a/apps/plugins/rockboy/HACKING b/apps/plugins/rockboy/HACKING
new file mode 100644
index 0000000000..3efd85ed9b
--- /dev/null
+++ b/apps/plugins/rockboy/HACKING
@@ -0,0 +1,472 @@
HACKING ON THE GNUBOY SOURCE TREE


  BASIC INFO

In preparation for the first release, I'm putting together a simple
document to aid anyone interested in playing around with or improving
the gnuboy source. First of all, before working on anything, you
should know my policies as maintainer. I'm happy to accept contributed
code, but there are a few guidelines:

* Obviously, all code must be able to be distributed under the GNU
GPL. This means that your terms of use for the code must be equivalent
to or weaker than those of the GPL. Public domain and MIT-style
licenses are perfectly fine for new code that doesn't incorporate
existing parts of gnuboy, e.g. libraries, but anything derived from or
built upon the GPL'd code can only be distributed under GPL. When in
doubt, read COPYING.

* Please stick to a coding and naming convention similar to the
existing code. I can reformat contributions if I need to when
integrating them, but it makes it much easier if that's already done
by the coder. In particular, indentation is a single tab (char 9), and
all symbols are all lowercase, except for macros which are all
uppercase.

* All code must be completely deterministic and consistent across all
platforms. This results in the two following rules...

* No floating point code whatsoever. Use fixed point or better yet
exact analytical integer methods as opposed to any approximation.

* No threads. Emulation with threads is a poor approximation if done
sloppily, and it's slow anyway even if done right since things must be
kept synchronous. Also, threads are not portable. Just say no to
threads.

* All non-portable code belongs in the sys/ or asm/ trees. #ifdef
should be avoided except for general conditionally-compiled code, as
opposed to little special cases for one particular cpu or operating
system. (i.e. #ifdef USE_ASM is ok, #ifdef __i386__ is NOT!)

* That goes for *nix code too. gnuboy is written in ANSI C, and I'm
not going to go adding K&R function declarations or #ifdef's to make
sure the standard library is functional. If your system is THAT
broken, fix the system, don't "fix" the emulator.

* Please no feature-creep. If something can be done through an
external utility or front-end, or through clever use of the rc
subsystem, don't add extra code to the main program.

* On that note, the modules in the sys/ tree serve the singular
purpose of implementing calls necessary to get input and display
graphics (and eventually sound). Unlike in poorly-designed emulators,
they are not there to give every different target platform its own gui
and different set of key bindings.

* Furthermore, the main loop is not in the platform-specific code, and
it will never be. Windows people, put your code that would normally go
in a message loop in ev_refresh and/or sys_sleep!

* Commented code is welcome but not required.

* I prefer asm in AT&T syntax (the style used by *nix assemblers and
likewise DJGPP) as opposed to Intel/NASM/etc style. If you really must
use a different style, I can convert it, but I don't want to add extra
dependencies on nonstandard assemblers to the build process. Also,
portable C versions of all code should be available.

* Have fun with it. If my demands stifle your creativity, feel free to
fork your own projects. I can always adapt and merge code later if
your rogue ideas are good enough. :)

OK, enough of that. Now for the fun part...


  THE SOURCE TREE STRUCTURE

[documentation]
README - general information related to using gnuboy
INSTALL - compiling and installation instructions
HACKING - this file, obviously
COPYING - the gnu gpl, grants freedom under condition of preserving it

[build files]
Version - doubles as a C and makefile include, identifies version number
Rules - generic build rules to be included by makefiles
Makefile.* - system-specific makefiles
configure* - script for generating *nix makefiles

[non-portable code]
sys/*/* - hardware and software platform-specific code
asm/*/* - optimized asm versions of some code, not used yet
asm/*/asm.h - header specifying which functions are replaced by asm
asm/i386/asmnames.h - #defines to fix _ prefix brain damage on DOS/Windows

[main emulator stuff]
main.c - entry point, event handler...basically a mess
loader.c - handles file io for rom and ram
emu.c - another mess, basically the frame loop that calls state.c
debug.c - currently just cpu trace, eventually interactive debugging
hw.c - interrupt generation, gamepad state, dma, etc.
mem.c - memory mapper, read and write operations
fastmem.h - short static functions that will inline for fast memory io
regs.h - macros for accessing hardware registers
save.c - savestate handling

[cpu subsystem]
cpu.c - main cpu emulation
cpuregs.h - macros for cpu registers and flags
cpucore.h - data tables for cpu emulation
asm/i386/cpu.s - entire cpu core, rewritten in asm

[graphics subsystem]
fb.h - abstract framebuffer definition, extern from platform-specifics
lcd.c - main control of refresh procedure
lcd.h - vram, palette, and internal structures for refresh
asm/i386/lcd.s - asm versions of a few critical functions
lcdc.c - lcdc phase transitioning

[input subsystem]
input.h - internal keycode definitions, etc.
keytables.c - translations between key names and internal keycodes
events.c - event queue

[resource/config subsystem]
rc.h - structure defs
rccmds.c - command parser/processor
rcvars.c - variable exports and command to set rcvars
rckeys.c - keybindings

[misc code]
path.c - path searching
split.c - general purpose code to split strings into argv-style arrays


  OVERVIEW OF PROGRAM FLOW

The initial entry point is main() in main.c, which will process the
command line, call the system/video initialization routines, load the
rom/sram, and pass control to the main loop in emu.c. Note that the
system-specific main() hook has been removed since it is not needed.

There have been significant changes to gnuboy's main loop since the
original 0.8.0 release. The former state.c is no more, and the new
code that takes its place, in lcdc.c, is now called from the cpu loop,
which although slightly unfortunate for performance reasons, is
necessary to handle some strange special cases.

Still, unlike some emulators, gnuboy's main loop is not the cpu
emulation loop. Instead, a main loop in emu.c, which handles video
refresh, polling events, sleeping between frames, etc., calls
cpu_emulate, passing it an ideal number of cycles to run. The actual
number of cycles for which the cpu runs will vary slightly depending
on the length of the final instruction processed, but it should never
be more than 8 or 9 beyond the ideal cycle count passed, and the
actual number will be returned to the calling function in case it
needs this information. The cpu code now takes care of all timer and
lcdc events in its main loop, so the caller no longer needs to be
aware of such things.

Note that all cycle counts are measured in CGB double speed MACHINE
cycles (2**21 Hz), NOT hardware clock cycles (2**23 Hz). This is
necessary because the cpu speed can be switched between single and
double speed during a single call to cpu_emulate. When running in
single speed or DMG mode, all instruction lengths are doubled.

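To make the unit convention concrete, here is a minimal sketch of the
conversions involved. The helper names (to_emu_cycles and
emu_cycles_to_clocks) are made up for this example and are not taken
from the gnuboy source; only the two rates and the doubling rule come
from the paragraph above.

```c
/* Hypothetical helpers, for illustration only. */

/* 2**23 Hz hardware clock / 2**21 Hz counting unit = 4 clocks per unit. */
enum { CLOCKS_PER_UNIT = (1 << 23) / (1 << 21) };

/* Convert an instruction length, given in machine cycles of the
 * currently selected cpu speed, to the double speed units used for
 * all cycle counts. In single speed or DMG mode the length doubles. */
int to_emu_cycles(int machine_cycles, int double_speed)
{
	return double_speed ? machine_cycles : machine_cycles * 2;
}

/* Convert a count in double speed units to 2**23 Hz clock ticks. */
int emu_cycles_to_clocks(int emu_cycles)
{
	return emu_cycles * CLOCKS_PER_UNIT;
}
```

So an instruction that takes 4 machine cycles counts as 4 units in
double speed mode and as 8 units in single speed or DMG mode.
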
As for the LCDC state, things are much simpler now. No more huge
glorious state table, no more P/Q/R, just a couple simple functions.
Aside from the number of cycles left before the next state change, all
the state information fits nicely in the locations the Game Boy itself
provides for it -- the LCDC, STAT, and LY registers.

If the special cases for the last line of VBLANK look strange to you,
good. There's some weird stuff going on here. According to documents
I've found, LY changes from 153 to 0 early in the last line, then
remains at 0 until the end of the first visible scanline. I don't
recall finding any roms that rely on this behavior, but I implemented
it anyway.

That covers the basics. As for flow of execution, here's a simplified
call tree that covers most of the significant function calls taking
place in normal operation:

  main                                     sys/
   \_ real_main                            main.c
       |_ sys_init                         sys/
       |_ vid_init                         sys/
       |_ loader_init                      loader.c
       |_ emu_reset                        emu.c
       \_ emu_run                          emu.c
           |_ cpu_emulate                  cpu.c
           |   |_ div_advance              cpu.c *
           |   |_ timer_advance            cpu.c *
           |   |_ lcdc_advance             cpu.c *
           |   |   \_ lcdc_trans           lcdc.c
           |   |       |_ lcd_refreshline  lcd.c
           |   |       |_ stat_change      lcdc.c
           |   |       |   \_ lcd_begin    lcd.c
           |   |       \_ stat_trigger     lcdc.c
           |   \_ sound_advance            cpu.c *
           |_ vid_end                      sys/
           |_ sys_elapsed                  sys/
           |_ sys_sleep                    sys/
           |_ vid_begin                    sys/
           \_ doevents                     main.c

  (* included in cpu.c so they can inline; also in cpu.s)


  MEMORY READ/WRITE MAP

Whenever possible, gnuboy avoids emulating memory reads and writes
with a function call. To this end, two pointer tables are kept -- one
for reading, the other for writing. They are indexed by bits 12-15 of
the address in Game Boy memory space, and yield a base pointer from
which the whole address can be used as an offset to access Game Boy
memory with no function calls whatsoever. For regions that cannot be
accessed without function calls, the pointer in the table is NULL.

For example, reading from address addr can be accomplished by testing
to make sure mbc.rmap[addr>>12] is not NULL, then simply reading
mbc.rmap[addr>>12][addr].

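The lookup just described can be sketched as follows. This is a
simplified illustration, not the real code: only mbc.rmap comes from
the text, while the cut-down struct, the flat 64k backing array
gb_space, mem_map_init, mem_read_slow and readb are assumptions made
so the example stands alone. Real code keeps separate rom/ram buffers
and stores base pointers biased by each region's start address.

```c
#include <stddef.h>

typedef unsigned char byte;

/* Hypothetical cut-down stand-in for the mbc structure. */
struct mbc_sketch {
	byte *rmap[16];	/* read map, indexed by bits 12-15 of the address */
};

struct mbc_sketch mbc;

/* Assumption for this sketch: one flat array backs the whole address
 * space, so the base pointer of every mapped page is the array itself
 * and the whole address can be used as the offset. */
byte gb_space[0x10000];

/* Stand-in for the slow path that handles io registers and the like. */
byte mem_read_slow(int addr)
{
	(void)addr;
	return 0xFF;
}

/* Map pages 0x0-0x7 (the rom area) for direct reads and leave the
 * rest NULL so that reads there go through the function call. */
void mem_map_init(void)
{
	int i;
	for (i = 0; i < 16; i++)
		mbc.rmap[i] = (i < 8) ? gb_space : NULL;
}

/* Fast read: one table lookup, one test, one indexed load. */
byte readb(int addr)
{
	byte *base = mbc.rmap[(addr >> 12) & 0xF];
	return base ? base[addr & 0xFFFF] : mem_read_slow(addr);
}
```

A write map would work the same way, with NULL entries for anything
that must not be written directly (rom, io registers).
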
And for the disbelievers in this optimization, here are some numbers
to compare. First, FFL2 with memory tables disabled:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 28.69      0.57     0.57                             refresh_2
 13.17      0.84     0.26  4307863     0.06     0.06  mem_read
 11.63      1.07     0.23                             cpu_emulate

Now, with memory tables enabled:

 38.86      0.66     0.66                             refresh_2
  8.42      0.80     0.14   156380     0.91     0.91  spr_enum
  6.76      0.91     0.11   483134     0.24     1.31  lcdc_trans
  6.16      1.02     0.10                             cpu_emulate
   .
   .
   .
  0.59      1.61     0.01   216497     0.05     0.05  mem_read

As you can see, not only does mem_read take up (proportionally) 1/20
as much time, since it is rarely called, but the main cpu loop in
cpu_emulate also runs considerably faster with all the function call
overhead and cache misses avoided.

These tests were performed on a K6-2/450 with the assembly cores
enabled; your mileage may vary. Regardless, however, I think it's clear
that using the address mapping tables is quite a worthwhile
optimization.


  LCD RENDERING CORE DESIGN

The LCD core presently used in gnuboy is very much a high-level one,
performing the task of rasterizing scanlines as many independent steps
rather than one big loop, as is often seen in other emulators and the
original gnuboy LCD core. In some ways, this is a bit of a tradeoff --
there's a good deal of overhead in rebuilding the tile pattern cache
for roms that change their tile patterns frequently, such as full
motion video demos. Even still, I consider the method we're presently
using far superior to generating the output display directly from the
gameboy tiledata -- in the vast majority of roms, tiles are changed so
infrequently that the overhead is irrelevant. Even if the tiles are
changed rapidly, the only chance for overhead beyond what would be
present in a monolithic rendering loop lies in (host cpu) cache misses
and the possibility that we might (tile pattern) cache a tile that has
changed but that will never actually be used, or that will only be
used in one orientation (horizontally and vertically flipped versions
of all tiles are cached as well). Such tile caching issues could be
addressed in the long term if they cause a problem, but I don't see it
hurting performance too significantly at the present. As for host cpu
cache miss issues, I find that putting multiple data decoding and
rendering steps together in a single loop harms performance much more
significantly than building a 256k (pattern) cache table, on account
of interfering with branch prediction, register allocation, and so on.

Well, with those justifications given, let's proceed to the steps
involved in rendering a scanline:

updatepatpix() - updates tile pattern cache.

tilebuf() - reads gb tile memory according to its complicated tile
addressing system which can be changed via the LCDC register, and
outputs nice linear arrays of the actual tile indices used in the
background and window on the present line.

Before continuing, let me explain the output format used by the
following functions. There is a byte array scan.buf, accessible by
macro as BUF, which is the output buffer for the line. The structure
of this array is simple: it is composed of 6 bpp gameboy color
numbers, where the bits 0-1 are the color number from the tile, bits
2-4 are the (cgb or dmg) palette index, and bit 5 is 0 for background
or window, 1 for sprite.

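For illustration, the bit layout just described can be written out as
a few macros and a packing function. The names here (PIX_COLOR,
PIX_PAL, PIX_IS_SPR, pix_pack) are invented for this sketch and are
not taken from lcd.c; only the bit positions come from the text.

```c
/* Hypothetical accessors for one byte of the 6 bpp line buffer. */
#define PIX_COLOR(p)   ((p) & 0x03)		/* bits 0-1: color number */
#define PIX_PAL(p)     (((p) >> 2) & 0x07)	/* bits 2-4: palette index */
#define PIX_IS_SPR(p)  (((p) >> 5) & 0x01)	/* bit 5: 1 for sprite */

/* Build one 6 bpp value from its three fields. */
unsigned char pix_pack(unsigned color, unsigned pal, unsigned is_sprite)
{
	return (unsigned char)((color & 3u)
		| ((pal & 7u) << 2)
		| ((is_sprite & 1u) << 5));
}
```

For example, color 3 from palette 5 on a sprite packs to 0x37, and a
background pixel using color 0 of palette 0 is simply 0.
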
What is the justification for using a strange format like this, rather
than raw host color numbers for output? Well, believe it or not, it
improves performance. It's already necessary to have the gameboy color
numbers available for use in sprite priority. And, when running in
mono gb mode, building this output data is VERY fast -- it's just a
matter of doing 64 bit copies from the tile pattern cache to the
output buffer.

Furthermore, using a unified output format like this eliminates the
need to have separate rendering functions for each host color depth or
mode. We just call a one-line function to apply a palette to the
output buffer as we copy it to the video display, and we're done. And,
if you're not convinced about performance, just do some profiling.
You'll see that the vast majority of the graphics time is spent in the
one-line copy function (render_[124] depending on bytes per pixel),
even when using the fast asm versions of those routines. That is to
say, any overhead in the following functions is for all intents and
purposes irrelevant to performance. With that said, here they are:

bg_scan() - expands the background layer to the output buffer.

wnd_scan() - expands the window layer.

spr_scan() - expands the sprites. Note that this requires spr_enum()
to have been called already to build a list of which sprites are
visible on the current scanline and sort them by priority.

It should be noted that the background and window functions also have
color counterparts, which are considerably slower due to merging of
palette data. At this point, they're staying down around 8% time
according to the profiler, so I don't see a major need to rewrite them
anytime soon. It should be considered, however, that a different
intermediate format could be used for gbc, or that asm versions of
these two routines could be written, in the long term.

Finally, some notes on palettes. You may be wondering why the 6 bpp
intermediate output can't be used directly on 256-color display
targets. After all, that would give a huge performance boost. The
problem, however, is that the gameboy palette can change midscreen,
whereas none of the presently targeted host systems can handle such a
thing, much less do it portably. For color roms, using our own
internal color mappings in addition to the host system palette is
essential. For details on how this is accomplished, read palette.c.

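To give an idea of what applying a palette while copying a line
amounts to, here is a minimal sketch for a host with 2 bytes per
pixel. It is a guess at the general shape of such a routine, not a
copy of the real render_2; the table name pal2 and the function name
render_2_sketch are assumptions made for the example.

```c
/* Hypothetical lookup table: one host pixel value for each of the 64
 * possible 6 bpp intermediate values. Something else (the palette
 * code) is assumed to keep it up to date. */
unsigned short pal2[64];

/* Copy cnt pixels from the 6 bpp line buffer to the display,
 * translating each one through the table on the way. */
void render_2_sketch(unsigned short *dest, const unsigned char *src, int cnt)
{
	while (cnt--)
		*dest++ = pal2[*src++ & 0x3F];
}
```

Since the table is indexed by the intermediate value rather than by
the host palette, it can be rewritten whenever the gameboy palette
changes midscreen without touching lines already drawn.
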
Now, in the long term, it MAY be possible to use the 6 bpp color
"almost" directly for mono roms. Note that I say almost. The idea is
this. Using the color number as an index into a table is slow. It
takes an extra read and causes various pipeline stalls depending on
the host cpu architecture. But, since there are relatively few
possible mono palettes, it may actually be possible to set up the host
palette in a clever way so as to cover all the possibilities, then use
some fancy arithmetic or bit-twiddling to convert without a lookup
table -- and this could presumably be done 4 pixels at a time with
32bit operations. This area remains to be explored, but if it works,
it might end up being the last hurdle to getting realtime emulation
working on very low-end systems like i486.


  SOUND

Rather than processing sound after every few instructions (and thus
killing the cache coherency), we update sound in big chunks. Yet this
in no way affects precise sound timing, because sound_mix is always
called before reading or writing a sound register, and at the end of
each frame.

The main sound module interfaces with the system-specific code through
one structure, pcm, and a few functions: pcm_init, pcm_close, and
pcm_submit. While the first two should be obvious, pcm_submit needs
some explaining. Whenever realtime sound output is operational,
pcm_submit is responsible for timing, and should not return until it
has successfully processed all the data in its input buffer (pcm.buf).
On *nix sound devices, this typically means just waiting for the write
syscall to return, but on systems such as DOS where low level IO must
be handled in the program, pcm_submit needs to delay until the current
position in the DMA buffer has advanced sufficiently to make space for
the new samples, then copy them.

For special sound output implementations like write-to-file or the
dummy sound device, pcm_submit should write the data immediately and
return 0, indicating to the caller that other methods must be used for
timing. On real sound devices that are presently functional,
pcm_submit should return 1, regardless of whether it buffered or
actually wrote the sound data.

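The return value convention is easiest to see with the dummy device.
The following is only a sketch: pcm.buf is the one field named above,
while the len and pos fields, the struct name, and the helper
frame_needs_sleep are assumptions made up for this example.

```c
/* Hypothetical cut-down version of the pcm structure. */
struct pcm_sketch {
	unsigned char *buf;	/* sample buffer (named in the text) */
	int len;		/* assumption: capacity of buf in bytes */
	int pos;		/* assumption: bytes currently filled */
};

struct pcm_sketch pcm;

/* Dummy sound device: throw the samples away immediately and return
 * 0, telling the caller that this device provides no timing. */
int pcm_submit(void)
{
	pcm.pos = 0;
	return 0;
}

/* Caller side: if pcm_submit did not take care of timing, the frame
 * loop has to fall back to sleeping on its own. */
int frame_needs_sleep(void)
{
	return !pcm_submit();
}
```

A real device would instead block in pcm_submit until everything in
pcm.buf has been handed to the hardware or driver, then return 1.
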
And yes, for unices without OSS, we hope to add piped audio output
soon. Perhaps Sun audio device and a few others as well.


  OPTIMIZED ASSEMBLY CODE

A lot can be said on this matter. Nothing has been said yet.


  INTERACTIVE DEBUGGER

Apologies, there is no interactive debugger in gnuboy at present. I'm
still working out the design for it. In the long run, it should be
integrated with the rc subsystem, kinda like a cross between gdb and
Quake's ever-famous console. Whether it will require a terminal device
or support the graphical display remains to be determined.

In the meantime, you can use the debug trace code already
implemented. Just "set trace 1" from your gnuboy.rc or the command
line. Read debug.c for info on how to interpret the output, which is
condensed as much as possible and not quite self-explanatory.


  PORTING

On all systems on which it is available, the gnu compiler should
probably be used. Writing code specific to non-free compilers makes it
impossible for free software users to actively contribute. On the
other hand, compiler-specific code should always be kept to a minimum,
to make porting to or from non-gnu compilers easier.

Porting to new cpu architectures should not be necessary. Just make
sure you unset IS_LITTLE_ENDIAN in the makefiles to enable the big
endian default if the target system is big endian. If you do have
problems building on certain cpus, however, let us know. Eventually,
we will also want asm cpu and graphics code for popular host cpus, but
this can wait, since the c code should be sufficiently fast on most
platforms.

The bulk of porting efforts will probably be spent on adding support
for new operating systems, and on systems with multiple video (or
sound, once that's implemented) architectures, new interfaces for
those. In general, the operating system interface code goes in a
directory under sys/ named for the os (e.g. sys/nix/ for *nix
systems), and display interfaces likewise go in their respective
directories under sys/ (e.g. sys/x11/ for the x window system
interface).

For guidelines in writing new system and display interface modules, I
recommend reading the files in the sys/dos/, sys/svga/, and sys/nix/
directories. These are some of the simpler versions (aside from the
tricky dos keyboard handling), as opposed to all the mess needed for
x11 support.

Also, please be aware that the existing system and display interface
modules are somewhat primitive; they are designed to be as quick and
sloppy as possible while still functioning properly. Eventually they
will be greatly improved.

Finally, remember your obligations under the GNU GPL. If you produce
any binaries that are compiled strictly from the source you received,
and you intend to release those, you *must* also release the exact
sources you used to produce those binaries. This is not pseudo-free
software like Snes9x where binaries usually appear before the latest
source, and where the source only compiles on one or two platforms;
this is true free software, and the source to all binaries always
needs to be available at the same time or sooner than the
corresponding binaries, if binaries are to be released at all. This of
course applies to all releases, not just new ports, but from
experience I find that ports people usually need the most reminding.


  EPILOGUE

That's it for now. More info will eventually follow. Happy hacking!