Diffstat (limited to 'firmware/malloc/THOUGHTS')
-rw-r--r-- | firmware/malloc/THOUGHTS | 170 |
1 files changed, 0 insertions, 170 deletions
diff --git a/firmware/malloc/THOUGHTS b/firmware/malloc/THOUGHTS deleted file mode 100644 index 27517361da..0000000000 --- a/firmware/malloc/THOUGHTS +++ /dev/null | |||
@@ -1,170 +0,0 @@ | |||
1 | ==================================== | ||
2 | Memory Allocation Algorithm Theories | ||
3 | ==================================== | ||
4 | |||
5 | GOAL | ||
6 | It is intended to be a 100% working memory allocation system. It should be | ||
7 | capable of replacing an ordinary Operating System's own routines. It should | ||
8 | work well in a multitasking, shared memory, non-virtual memory environment | ||
9 | without clogging the memory. Primarily aimed at small machines, CPUs and | ||
10 | memory amounts. | ||
11 | |||
12 | I use a best-fit algorithm with a slight overhead in order to increase speed | ||
13 | a lot. It should remain scalable and work well with very large amounts of | ||
14 | memory and free/used memory blocks too. | ||
15 | |||
16 | TERMINOLOGY | ||
17 | |||
18 | FRAGMENT - one of the small, identically sized parts of a larger BLOCK. | ||
19 | FRAGMENTS are _not_ allocated when traversed in lists etc | ||
20 | BLOCK - large memory area. BLOCKS used for FRAGMENTS are linked in a | ||
21 | list, one list for each FRAGMENT size supported. | ||
22 | TOP - head struct that holds information about and points to a chain | ||
23 | of BLOCKS for a particular FRAGMENT size. | ||
24 | CHUNK - a contiguous area of free memory | ||
25 | |||
26 | MEMORY SYSTEM | ||
27 | |||
28 | We split the system in two parts. One part allocates small memory amounts | ||
29 | and one part allocates large memory amounts, but all allocations are done | ||
30 | "through" the small-part-system. There is an option to use only the small | ||
31 | system (and thus use the OS for large blocks) or the complete package. | ||
32 | |||
33 | ############################################################################## | ||
34 | SMALL SIZE ALLOCATIONS | ||
35 | ############################################################################## | ||
36 | |||
37 | Keywords for this system are 'Deferred Coalescing' and 'quick lists'. | ||
38 | |||
39 | ALLOC | ||
40 | |||
41 | * Small allocations are "aligned" upwards to a set of preset sizes. In the | ||
42 | current implementation I use 20, 28, 52, 116, 312, 580, 1016, 2032 bytes. | ||
43 | Memory allocations of these sizes are referred to as FRAGMENTS. | ||
44 | (The reason for these specific sizes is the requirement that they must be | ||
45 | 32-bit aligned and fit as well as possible within 4064 bytes.) | ||
46 | |||
47 | * Allocations larger than 2032 will get a BLOCK for that allocation only. | ||
48 | |||
49 | * Each of these sizes has its own TOP. When a FRAGMENT is requested, a | ||
50 | larger BLOCK will be allocated and divided into many FRAGMENTS (all of the | ||
51 | same size). TOP points to a list of BLOCKS that contain FRAGMENTS of | ||
52 | the same size. Each BLOCK has a 'number of free FRAGMENTS' counter and so | ||
53 | has each TOP (for the entire chain). | ||
54 | |||
55 | * A BLOCK is around 4064 bytes plus the size of the information header. This | ||
56 | size is adjusted to make the allocation of the big block not require more | ||
57 | than 4096 bytes. (This is not obvious unless you know how the big-block | ||
58 | system works: the BMALLOC system adds an extra header of 12 bytes and the | ||
59 | header for the FRAGMENT BLOCK is 20 bytes in a general 32-bit environment, | ||
60 | so 4064 + 20 + 12 = 4096.) | ||
61 | |||
62 | * In case the allocation of a BLOCK fails when a FRAGMENT is required, the | ||
63 | next size of FRAGMENTS will be checked for a free FRAGMENT. Only when the | ||
64 | larger size lists have been tested without success will it fail for real. | ||
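The size rounding and the BLOCK arithmetic described above can be sketched
as follows. This is only an illustration under stated assumptions: the helper
names (frag_class, frags_per_block, block_total_size) are hypothetical and do
not come from the original source, and the sketch ignores whether the
per-FRAGMENT header is counted inside the preset sizes.

```c
#include <assert.h>
#include <stddef.h>

/* FRAGMENT sizes quoted in the text; every entry is a multiple of 4. */
static const size_t frag_sizes[] = { 20, 28, 52, 116, 312, 580, 1016, 2032 };
#define NUM_FRAG_SIZES (sizeof frag_sizes / sizeof frag_sizes[0])

/* Figures quoted in the text for a general 32-bit environment. */
#define BLOCK_PAYLOAD     4064u /* bytes available for FRAGMENTS */
#define FRAG_BLOCK_HEADER 20u   /* header of a FRAGMENT BLOCK */
#define BMALLOC_HEADER    12u   /* header added by the big-block system */

/* Hypothetical helper: index of the smallest preset size that holds
 * `size`, or -1 when the request needs a BLOCK of its own (> 2032). */
static int frag_class(size_t size)
{
    for (size_t i = 0; i < NUM_FRAG_SIZES; i++)
        if (size <= frag_sizes[i])
            return (int)i;
    return -1;
}

/* Hypothetical helper: how many FRAGMENTS of class `cls` fit in one BLOCK. */
static size_t frags_per_block(int cls)
{
    assert(cls >= 0 && (size_t)cls < NUM_FRAG_SIZES);
    return BLOCK_PAYLOAD / frag_sizes[cls];
}

/* Total bytes one FRAGMENT BLOCK costs the big-block system. */
static size_t block_total_size(void)
{
    return BLOCK_PAYLOAD + FRAG_BLOCK_HEADER + BMALLOC_HEADER;
}
```

With these numbers a 21-byte request lands in the 28-byte class, a BLOCK
holds 203 FRAGMENTS of 20 bytes or 2 of 2032 bytes, and the total comes to
exactly 4096 bytes.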
65 | |||
66 | FREE | ||
67 | |||
68 | * When FRAGMENTS are freed so that a BLOCK becomes non-used, it is returned | ||
69 | to the system. | ||
70 | |||
71 | * FREEing a FRAGMENT adds the buffer in LIFO order. That means that on the | ||
72 | next request for a FRAGMENT from the same list, the last freed buffer will | ||
73 | be returned first. | ||
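A minimal sketch of the LIFO behaviour, assuming the free list link is stored
inside the unused FRAGMENT itself. The struct and function names are
hypothetical, and the real TOP holds more information than this stand-in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical link stored in the first bytes of a free FRAGMENT. */
struct free_frag {
    struct free_frag *next;
};

/* Hypothetical stand-in for the per-size list head kept in a TOP. */
struct frag_list {
    struct free_frag *head;
    size_t free_count;
};

/* FREE: push the buffer on the front of the list. */
static void frag_push(struct frag_list *l, void *mem)
{
    assert(mem != NULL);
    struct free_frag *f = mem;
    f->next = l->head;
    l->head = f;
    l->free_count++;
}

/* ALLOC: pop the most recently freed buffer, or NULL if the list is empty. */
static void *frag_pop(struct frag_list *l)
{
    struct free_frag *f = l->head;
    if (f == NULL)
        return NULL;
    l->head = f->next;
    l->free_count--;
    return f;
}
```

Both operations are constant time, and the buffer handed out is the one most
recently touched.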
74 | |||
75 | REALLOC | ||
76 | |||
77 | * REALLOCATION of a FRAGMENT first checks whether the new size would fit | ||
78 | within the same FRAGMENT and would still use the same FRAGMENT size. If | ||
79 | so, the same pointer is returned. | ||
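The REALLOC check can be expressed as a single comparison: the new size,
rounded up to a preset size, must equal the size of the FRAGMENT already
held. The sketch below assumes that reading; frag_round_up and
realloc_can_stay are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* FRAGMENT sizes quoted in the text. */
static const size_t frag_sizes[] = { 20, 28, 52, 116, 312, 580, 1016, 2032 };

/* Preset size a request is rounded up to, or 0 if it is too large
 * to be served as a FRAGMENT. */
static size_t frag_round_up(size_t size)
{
    for (size_t i = 0; i < sizeof frag_sizes / sizeof frag_sizes[0]; i++)
        if (size <= frag_sizes[i])
            return frag_sizes[i];
    return 0;
}

/* True when a FRAGMENT of `frag_size` bytes can be kept for `new_size`:
 * the new size fits and would not map to a different preset size. */
static bool realloc_can_stay(size_t frag_size, size_t new_size)
{
    /* frag_size must itself be one of the preset sizes. */
    assert(frag_round_up(frag_size) == frag_size);
    return frag_round_up(new_size) == frag_size;
}
```

For a 52-byte FRAGMENT, a new size of 40 keeps the pointer, while 20 (a
smaller class) and 60 (a larger class) both force a move.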
80 | |||
81 | OVERHEAD | ||
82 | |||
83 | Yes, there is an overhead on small allocations (internal fragmentation). | ||
84 | Yet, I do believe that small allocations more often than larger ones are | ||
85 | used dynamically. I believe that a large overhead is not a big problem if it | ||
86 | remains only for a while. The big gain is with the extreme speed we can GET | ||
87 | and RETURN small allocations. This has yet to be proven. I am open to other | ||
88 | systems of dealing with the small ones, but I don't believe in using the | ||
89 | same system for all sizes of allocations. | ||
90 | |||
91 | IMPROVEMENT | ||
92 | |||
93 | An addition to the above described algorithm is the `save-empty-BLOCKS-a- | ||
94 | while-afterwards`. It will be used when the last used FRAGMENT within a | ||
95 | BLOCK is freed. The BLOCK will then not get returned to the system until "a | ||
96 | few more" FRAGMENTS have been freed in case the last [few] freed FRAGMENTS | ||
97 | are allocated yet again (and thus prevent the huge overhead of making | ||
98 | FRAGMENTS in a BLOCK). The "only" drawback of such a SEBAWA concept is | ||
99 | that it would mean an even bigger overhead... | ||
100 | |||
101 | HEADERS (in allocated data) | ||
102 | |||
103 | FRAGMENTS - 32-bit pointer to its parent BLOCK (lowest bit must be 0) | ||
104 | BLOCK - 32-bit size (lowest bit must be 1 to separate this from | ||
105 | FRAGMENTS) | ||
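The lowest bit of the header word is what lets the free routine tell the two
kinds of allocation apart. A sketch under assumptions: the exact encoding is
not given in the text, so this version assumes the BLOCK size is even and is
stored with bit 0 set as a tag. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a BLOCK header word: the size with the lowest bit forced to 1.
 * Assumption: sizes are even, so bit 0 is free to act as a tag. */
static uint32_t make_block_header(uint32_t size)
{
    assert((size & 1u) == 0);
    return size | 1u;
}

/* A header with bit 0 set belongs to a BLOCK; with bit 0 clear it is a
 * FRAGMENT header, i.e. a 32-bit aligned pointer to the parent BLOCK. */
static bool header_is_block(uint32_t header)
{
    return (header & 1u) != 0;
}

/* Recover the size from a BLOCK header word. */
static uint32_t block_header_size(uint32_t header)
{
    assert(header_is_block(header));
    return header & ~1u;
}
```

Since FRAGMENTS are 32-bit aligned, a pointer to a parent BLOCK always has
its two lowest bits clear, so the tag can never collide with a valid pointer.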
106 | |||
107 | ############################################################################## | ||
108 | LARGER ALLOCATIONS | ||
109 | ############################################################################## | ||
110 | |||
111 | A large block is needed in two cases: when the requested size is larger | ||
112 | than the largest FRAGMENT size supported, so that the allocation is made | ||
113 | for this memory area alone, or when a BLOCK is allocated to fit lots of FRAGMENTS. | ||
114 | |||
115 | * We add memory to the "system" with the add_pool() function call. It | ||
116 | specifies the start and size of the new block of memory that will be | ||
117 | used in this memory allocation system. Several add_pool() calls are | ||
118 | supported and they may or may not add contiguous memory. | ||
119 | |||
120 | * All blocks are allocated aligned to BLOCKSIZE (sometimes referred to | ||
121 | as 'grain size'), 64 bytes in my implementation. Reports tell us there is | ||
122 | no real gain in increasing the alignment size. | ||
123 | |||
124 | * We link *all* pieces of memory (AREAS), free or not free. We keep the list | ||
125 | in address order and thus when a FREE() occurs we know instantly if there | ||
126 | are FREE CHUNKS wall-to-wall. No list traversal is needed. This requires some | ||
127 | extra space in every allocated BLOCK. The new CHUNK still has to be put in | ||
128 | the right place in the size-sorted list/tree. All memory areas, allocated or | ||
129 | not, contain the following header: | ||
130 | - size of this memory area (31 bits) | ||
131 | - FREE status (1 bit) | ||
132 | - pointer to the next AREA closest in memory (32 bits) | ||
133 | - pointer to the prev AREA closest in memory (32 bits) | ||
134 | (12 bytes in total) | ||
135 | |||
136 | * Sort all FREE CHUNKS in size-order. We use a SPLAY TREE algorithm for | ||
137 | maximum speed. Data/structs used for the size-sorting functions are kept | ||
138 | in a separate abstraction layer, since they do not really change | ||
139 | anything (except execution speed). | ||
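The 12-byte AREA header and the BLOCKSIZE alignment can be sketched as below.
The assumptions are loud ones: the text does not say which bit carries the
FREE status, so this sketch picks the top bit, and it stores the two
neighbour links as 32-bit address values so that the layout stays 12 bytes
even when compiled on a 64-bit host. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCKSIZE     64u          /* grain size quoted in the text */
#define AREA_FREE_BIT 0x80000000u  /* assumption: FREE status in the top bit */

/* Header present in every AREA, allocated or free. */
struct area_header {
    uint32_t size_and_free; /* 31 bits of size plus 1 FREE bit */
    uint32_t next;          /* address of the next AREA in memory */
    uint32_t prev;          /* address of the previous AREA in memory */
};

/* Round a size up to the next multiple of BLOCKSIZE. */
static uint32_t align_to_blocksize(uint32_t n)
{
    return (n + (BLOCKSIZE - 1u)) & ~(BLOCKSIZE - 1u);
}

/* Store size and FREE status in the shared 32-bit word. */
static void area_set(struct area_header *a, uint32_t size, bool is_free)
{
    assert((size & AREA_FREE_BIT) == 0);
    a->size_and_free = size | (is_free ? AREA_FREE_BIT : 0u);
}

static uint32_t area_size(const struct area_header *a)
{
    return a->size_and_free & ~AREA_FREE_BIT;
}

static bool area_is_free(const struct area_header *a)
{
    return (a->size_and_free & AREA_FREE_BIT) != 0;
}
```

A request for 65 bytes is rounded up to 128, and the size of the header
struct is the 12 bytes the list above adds up to.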
140 | |||
141 | ALLOC (RSIZE - requested size, aligned properly) | ||
142 | |||
143 | * Fetch a CHUNK that RSIZE fits within. If the found CHUNK is larger than | ||
144 | RSIZE, split it and return an area of RSIZE bytes to the caller. Link the | ||
145 | new (remaining) CHUNK into the list/tree. | ||
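The fetch-and-split step can be illustrated with a deliberately simplified
model. Note the substitution: the text uses a splay tree for the size-sorted
lookup, while this sketch uses a plain linear scan over an array of free
CHUNK descriptors, and it ignores header overhead. The names (struct chunk,
alloc_best_fit, NO_FIT) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NO_FIT ((size_t)-1)

/* Simplified descriptor of one FREE CHUNK: start offset and size. */
struct chunk {
    size_t start;
    size_t size;
};

/* Best fit: take the smallest CHUNK that RSIZE fits within, hand out its
 * first `rsize` bytes and leave the remainder as a smaller CHUNK.
 * Returns the start of the allocated area, or NO_FIT on failure. */
static size_t alloc_best_fit(struct chunk *free_chunks, size_t n, size_t rsize)
{
    assert(rsize > 0);
    size_t best = NO_FIT;
    for (size_t i = 0; i < n; i++) {
        if (free_chunks[i].size < rsize)
            continue;
        if (best == NO_FIT || free_chunks[i].size < free_chunks[best].size)
            best = i;
    }
    if (best == NO_FIT)
        return NO_FIT;

    size_t start = free_chunks[best].start;
    free_chunks[best].start += rsize; /* remainder begins after the area */
    free_chunks[best].size -= rsize;  /* size 0 means fully consumed */
    return start;
}
```

In a tree-based version the remainder would have to be removed and inserted
again, because its sort key (the size) has changed.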
146 | |||
147 | FREE (AREA - piece of memory that is returned to the system) | ||
148 | |||
149 | * Since the allocated BLOCK has kept its link-pointers, we can without | ||
150 | checking any list instantly see if there are any FREE CHUNKS that are | ||
151 | wall-to-wall with the AREA (both sides). If the AREA *is* wall-to-wall | ||
152 | with one or two CHUNKS, it or they are unlinked from the lists, enlarged | ||
153 | and re-linked into the lists. | ||
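The wall-to-wall merge on FREE can be sketched with an address-ordered,
doubly linked list of AREAS. This is an illustration, not the original code:
the names are hypothetical, the unlink and re-link in the size-sorted tree is
left out, and list neighbours are assumed to be physically adjacent (with
several non-contiguous add_pool() regions a real implementation would have to
stop at pool boundaries).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified AREA descriptor, kept in address order. */
struct area {
    size_t size;
    bool is_free;
    struct area *next; /* next AREA closest in memory */
    struct area *prev; /* previous AREA closest in memory */
};

/* Merge the AREA following `a` into `a`. */
static void absorb_next(struct area *a)
{
    struct area *b = a->next;
    a->size += b->size;
    a->next = b->next;
    if (b->next != NULL)
        b->next->prev = a;
}

/* Mark `a` free and merge it with free neighbours on both sides.
 * Returns the resulting (possibly enlarged) FREE CHUNK. */
static struct area *area_free(struct area *a)
{
    assert(!a->is_free);
    a->is_free = true;
    if (a->next != NULL && a->next->is_free)
        absorb_next(a);
    if (a->prev != NULL && a->prev->is_free) {
        a = a->prev;
        absorb_next(a);
    }
    return a;
}
```

Only the two direct neighbours are inspected, so the merge costs the same no
matter how many AREAS exist.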
154 | |||
155 | REALLOC | ||
156 | |||
157 | * There IS NO realloc() of large blocks; reallocations are performed in the | ||
158 | previous layer (dmalloc). | ||
159 | |||
160 | |||
161 | ############################################################################## | ||
162 | FURTHER READING | ||
163 | ############################################################################## | ||
164 | |||
165 | * "Dynamic Storage Allocation: A Survey and Critical Review" (Paul R. Wilson, | ||
166 | Mark S. Johnstone, Michael Neely, David Boles) | ||
167 | ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps | ||
168 | |||
169 | * "A Memory Allocator" (Doug Lea) | ||
170 | http://g.oswego.edu/dl/html/malloc.html | ||