diff options
Diffstat (limited to 'apps/plugins/pdbox/dbestfit-3.3/thoughts')
-rw-r--r-- | apps/plugins/pdbox/dbestfit-3.3/thoughts | 340 |
1 files changed, 340 insertions, 0 deletions
diff --git a/apps/plugins/pdbox/dbestfit-3.3/thoughts b/apps/plugins/pdbox/dbestfit-3.3/thoughts new file mode 100644 index 0000000000..8dbd8b9979 --- /dev/null +++ b/apps/plugins/pdbox/dbestfit-3.3/thoughts | |||
@@ -0,0 +1,340 @@ | |||
1 | ===================================== | ||
2 | Memory Allocation Algorithm Theories. | ||
3 | ===================================== | ||
4 | |||
5 | GOAL | ||
6 | It is intended to be a 100% working memory allocation system. It should be | ||
7 | capable of replacing an ordinary Operating System's own routines. It should | ||
8 | work good in a multitasking, shared memory, non-virtual memory environment | ||
9 | without clogging the memory. Primary aimed for small machines, CPUs and | ||
10 | memory amounts. | ||
11 | |||
12 | I use a best-fit algorithm with a slight overhead in order to increase speed | ||
13 | a lot. It should remain scalable and work good with very large amount of | ||
14 | memory and free/used memory blocks too. | ||
15 | |||
16 | TERMINOLOGY | ||
17 | |||
18 | FRAGMENT - small identically sized parts of a larger BLOCK, they are when | ||
19 | travered in lists etc _not_ allocated. | ||
20 | BLOCK - large memory area, if used for FRAGMENTS, they are linked in a | ||
21 | lists. One list for each FRAGMENT size supported. | ||
22 | TOP - head struct that holds information about and points to a chain | ||
23 | of BLOCKS for a particular FRAGMENT size. | ||
24 | CHUNK - a contiguous area of free memory | ||
25 | |||
26 | MEMORY SYSTEM | ||
27 | |||
28 | We split the system in two parts. One part allocates small memory amounts | ||
29 | and one part allocates large memory amounts, but all allocations are done | ||
30 | "through" the small-part-system. There is an option to use only the small | ||
31 | system (and thus use the OS for large blocks) or the complete package. | ||
32 | |||
33 | ############################################################################## | ||
34 | SMALL SIZE ALLOCATIONS | ||
35 | ############################################################################## | ||
36 | |||
37 | Keywords for this system is 'Deferred Coalescing' and 'quick lists'. | ||
38 | |||
39 | ALLOC | ||
40 | |||
41 | * Small allocations are "aligned" upwards to a set of preset sizes. In the | ||
42 | current implementation I use 20, 28, 52, 116, 312, 580, 812, 2028 bytes. | ||
43 | Memory allocations of these sizes are refered to as FRAGMENTS. | ||
44 | (The reason for these specific sizes is the requirement that they must be | ||
45 | 32-bit aligned and fit as good as possible within 4060 bytes.) | ||
46 | |||
47 | * Allocations larger than 2028 will get a BLOCK for that allocation only. | ||
48 | |||
49 | * Each of these sizes has it's own TOP. When a FRAGMENT is requested, a | ||
50 | larger BLOCK will be allocated and divided into many FRAGMENTS (all of the | ||
51 | same size). TOP points to a list with BLOCKS that contains FRAGMENTS of | ||
52 | the same size. Each BLOCK has a 'number of free FRAGMENTS' counter and so | ||
53 | has each TOP (for the entire chain). | ||
54 | |||
55 | * A BLOCK is around 4060 bytes plus the size of the information header. This | ||
56 | size is adjusted to make the allocation of the big block not require more | ||
57 | than 4096 bytes. (This might not be so easy to be sure of, if you don't | ||
58 | know how the big-block system works, but the BMALLOC system uses an | ||
59 | extra header of 12 bytes and the header for the FRAGMENT BLOCK is 20 bytes | ||
60 | in a general 32-bit unix environment.) | ||
61 | |||
62 | * In case the allocation of a BLOCK fails when a FRAGMENT is required, the | ||
63 | next size of FRAGMENTS will be checked for a free FRAGMENT. First when the | ||
64 | larger size lists have been tested without success it will fail for real. | ||
65 | |||
66 | FREE | ||
67 | |||
68 | * When FRAGMENTS are freed so that a BLOCK becomes non-used, it is returned | ||
69 | to the system. | ||
70 | |||
71 | * FREEing a fragment adds the buffer in a LIFO-order. That means that the | ||
72 | next request for a fragment from the same list, the last freed buffer will | ||
73 | be returned first. | ||
74 | |||
75 | REALLOC | ||
76 | |||
77 | * REALLOCATION of a FRAGMENT does first check if the new size would fit | ||
78 | within the same FRAGMENT and if it would use the same FRAGMENT size. If it | ||
79 | does and would, the same pointer is returned. | ||
80 | |||
81 | OVERHEAD | ||
82 | |||
83 | Yes, there is an overhead on small allocations (internal fragmentation). | ||
84 | Yet, I do believe that small allocations more often than larger ones are | ||
85 | used dynamically. I believe that a large overhead is not a big problem if it | ||
86 | remains only for a while. The big gain is with the extreme speed we can GET | ||
87 | and RETURN small allocations. This has yet to be proven. I am open to other | ||
88 | systems of dealing with the small ones, but I don`t believe in using the | ||
89 | same system for all sizes of allocations. | ||
90 | |||
91 | IMPROVEMENT | ||
92 | |||
93 | An addition to the above described algorithm is the `save-empty-BLOCKS-a- | ||
94 | while-afterwards`. It will be used when the last used FRAGMENT within a | ||
95 | BLOCK is freed. The BLOCK will then not get returned to the system until "a | ||
96 | few more" FRAGMENTS have been freed in case the last [few] freed FRAGMENTS | ||
97 | are allocated yet again (and thus prevent the huge overhead of making | ||
98 | FRAGMENTS in a BLOCK). The "only" drawback of such a SEBAWA concept is | ||
99 | that it would mean an even bigger overhead... | ||
100 | |||
101 | HEADERS (in allocated data) | ||
102 | |||
103 | FRAGMENTS - 32-bit pointer to its parent BLOCK (lowest bit must be 0) | ||
104 | BLOCK - 32-bit size (lowest bit must be 1 to separate this from | ||
105 | FRAGMENTS) | ||
106 | |||
107 | ############################################################################## | ||
108 | LARGER ALLOCATIONS | ||
109 | ############################################################################## | ||
110 | |||
111 | If the requested size is larger than the largest FRAGMENT size supported, | ||
112 | the allocation will be made for this memory area alone, or if a BLOCK is | ||
113 | allocated to fit lots of FRAGMENTS a large block is also desired. | ||
114 | |||
115 | * We add memory to the "system" with the add_pool() function call. It | ||
116 | specifies the start and size of the new block of memory that will be | ||
117 | used in this memory allocation system. Several add_pool() calls are | ||
118 | supported and they may or may not add contiguous memory. | ||
119 | |||
120 | * Make all blocks get allocated aligned to BLOCKSIZE (sometimes referred to | ||
121 | as 'grain size'), 64 bytes in my implementation. Reports tell us there is | ||
122 | no real gain in increasing the size of the align. | ||
123 | |||
124 | * We link *all* pieces of memory (AREAS), free or not free. We keep the list | ||
125 | in address order and thus when a FREE() occurs we know instanstly if there | ||
126 | are FREE CHUNKS wall-to-wall. No list "travels" needed. Requires some | ||
127 | extra space in every allocated BLOCK. Still needs to put the new CHUNK in | ||
128 | the right place in size-sorted list/tree. All memory areas, allocated or | ||
129 | not, contain the following header: | ||
130 | - size of this memory area | ||
131 | - FREE status | ||
132 | - pointer to the next AREA closest in memory | ||
133 | - pointer to the prev AREA closest in memory | ||
134 | (12 bytes) | ||
135 | |||
136 | * Sort all FREE CHUNKS in size-order. We use a SPLAY TREE algorithm for | ||
137 | maximum speed. Data/structs used for the size-sorting functions are kept | ||
138 | in an abstraction layer away from this since it is really not changing | ||
139 | anything (except executing speed). | ||
140 | |||
141 | ALLOC (RSIZE - requested size, aligned properly) | ||
142 | |||
143 | * Fetch a CHUNK that RSIZE fits within. If the found CHUNK is larger than | ||
144 | RSIZE, split it and return the RSIZE to the caller. Link the new CHUNK | ||
145 | into the list/tree. | ||
146 | |||
147 | FREE (AREA - piece of memory that is returned to the system) | ||
148 | |||
149 | * Since the allocated BLOCK has kept its link-pointers, we can without | ||
150 | checking any list instantly see if there are any FREE CHUNKS that are | ||
151 | wall-to-wall with the AREA (both sides). If the AREA *is* wall-to-wall | ||
152 | with one or two CHUNKS that or they are unlinked from the lists, enlarged | ||
153 | and re-linked into the lists. | ||
154 | |||
155 | REALLOC | ||
156 | |||
157 | * There IS NO realloc() of large blocks, they are performed in the higher | ||
158 | layer (dmalloc). | ||
159 | |||
160 | |||
161 | ############################################################################## | ||
162 | FURTHER READING | ||
163 | ############################################################################## | ||
164 | |||
165 | * "Dynamic Storage Allocation: A Survey and Critical Review" (Paul R. Wilson, | ||
166 | Mark S. Johnstone, Michael Neely, David Boles) | ||
167 | ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps | ||
168 | |||
169 | * "A Memory Allocator" (Doug Lea) | ||
170 | http://g.oswego.edu/dl/html/malloc.html | ||
171 | ===================================== | ||
172 | Memory Allocation Algorithm Theories. | ||
173 | ===================================== | ||
174 | |||
175 | GOAL | ||
176 | It is intended to be a 100% working memory allocation system. It should be | ||
177 | capable of replacing an ordinary Operating System's own routines. It should | ||
178 | work good in a multitasking, shared memory, non-virtual memory environment | ||
179 | without clogging the memory. Primary aimed for small machines, CPUs and | ||
180 | memory amounts. | ||
181 | |||
182 | I use a best-fit algorithm with a slight overhead in order to increase speed | ||
183 | a lot. It should remain scalable and work good with very large amount of | ||
184 | memory and free/used memory blocks too. | ||
185 | |||
186 | TERMINOLOGY | ||
187 | |||
188 | FRAGMENT - small identically sized parts of a larger BLOCK, they are when | ||
189 | travered in lists etc _not_ allocated. | ||
190 | BLOCK - large memory area, if used for FRAGMENTS, they are linked in a | ||
191 | lists. One list for each FRAGMENT size supported. | ||
192 | TOP - head struct that holds information about and points to a chain | ||
193 | of BLOCKS for a particular FRAGMENT size. | ||
194 | CHUNK - a contiguous area of free memory | ||
195 | |||
196 | MEMORY SYSTEM | ||
197 | |||
198 | We split the system in two parts. One part allocates small memory amounts | ||
199 | and one part allocates large memory amounts, but all allocations are done | ||
200 | "through" the small-part-system. There is an option to use only the small | ||
201 | system (and thus use the OS for large blocks) or the complete package. | ||
202 | |||
203 | ############################################################################## | ||
204 | SMALL SIZE ALLOCATIONS | ||
205 | ############################################################################## | ||
206 | |||
207 | Keywords for this system is 'Deferred Coalescing' and 'quick lists'. | ||
208 | |||
209 | ALLOC | ||
210 | |||
211 | * Small allocations are "aligned" upwards to a set of preset sizes. In the | ||
212 | current implementation I use 20, 28, 52, 116, 312, 580, 812, 2028 bytes. | ||
213 | Memory allocations of these sizes are refered to as FRAGMENTS. | ||
214 | (The reason for these specific sizes is the requirement that they must be | ||
215 | 32-bit aligned and fit as good as possible within 4060 bytes.) | ||
216 | |||
217 | * Allocations larger than 2028 will get a BLOCK for that allocation only. | ||
218 | |||
219 | * Each of these sizes has it's own TOP. When a FRAGMENT is requested, a | ||
220 | larger BLOCK will be allocated and divided into many FRAGMENTS (all of the | ||
221 | same size). TOP points to a list with BLOCKS that contains FRAGMENTS of | ||
222 | the same size. Each BLOCK has a 'number of free FRAGMENTS' counter and so | ||
223 | has each TOP (for the entire chain). | ||
224 | |||
225 | * A BLOCK is around 4060 bytes plus the size of the information header. This | ||
226 | size is adjusted to make the allocation of the big block not require more | ||
227 | than 4096 bytes. (This might not be so easy to be sure of, if you don't | ||
228 | know how the big-block system works, but the BMALLOC system uses an | ||
229 | extra header of 12 bytes and the header for the FRAGMENT BLOCK is 20 bytes | ||
230 | in a general 32-bit unix environment.) | ||
231 | |||
232 | * In case the allocation of a BLOCK fails when a FRAGMENT is required, the | ||
233 | next size of FRAGMENTS will be checked for a free FRAGMENT. First when the | ||
234 | larger size lists have been tested without success it will fail for real. | ||
235 | |||
236 | FREE | ||
237 | |||
238 | * When FRAGMENTS are freed so that a BLOCK becomes non-used, it is returned | ||
239 | to the system. | ||
240 | |||
241 | * FREEing a fragment adds the buffer in a LIFO-order. That means that the | ||
242 | next request for a fragment from the same list, the last freed buffer will | ||
243 | be returned first. | ||
244 | |||
245 | REALLOC | ||
246 | |||
247 | * REALLOCATION of a FRAGMENT does first check if the new size would fit | ||
248 | within the same FRAGMENT and if it would use the same FRAGMENT size. If it | ||
249 | does and would, the same pointer is returned. | ||
250 | |||
251 | OVERHEAD | ||
252 | |||
253 | Yes, there is an overhead on small allocations (internal fragmentation). | ||
254 | Yet, I do believe that small allocations more often than larger ones are | ||
255 | used dynamically. I believe that a large overhead is not a big problem if it | ||
256 | remains only for a while. The big gain is with the extreme speed we can GET | ||
257 | and RETURN small allocations. This has yet to be proven. I am open to other | ||
258 | systems of dealing with the small ones, but I don`t believe in using the | ||
259 | same system for all sizes of allocations. | ||
260 | |||
261 | IMPROVEMENT | ||
262 | |||
263 | An addition to the above described algorithm is the `save-empty-BLOCKS-a- | ||
264 | while-afterwards`. It will be used when the last used FRAGMENT within a | ||
265 | BLOCK is freed. The BLOCK will then not get returned to the system until "a | ||
266 | few more" FRAGMENTS have been freed in case the last [few] freed FRAGMENTS | ||
267 | are allocated yet again (and thus prevent the huge overhead of making | ||
268 | FRAGMENTS in a BLOCK). The "only" drawback of such a SEBAWA concept is | ||
269 | that it would mean an even bigger overhead... | ||
270 | |||
271 | HEADERS (in allocated data) | ||
272 | |||
273 | FRAGMENTS - 32-bit pointer to its parent BLOCK (lowest bit must be 0) | ||
274 | BLOCK - 32-bit size (lowest bit must be 1 to separate this from | ||
275 | FRAGMENTS) | ||
276 | |||
277 | ############################################################################## | ||
278 | LARGER ALLOCATIONS | ||
279 | ############################################################################## | ||
280 | |||
281 | If the requested size is larger than the largest FRAGMENT size supported, | ||
282 | the allocation will be made for this memory area alone, or if a BLOCK is | ||
283 | allocated to fit lots of FRAGMENTS a large block is also desired. | ||
284 | |||
285 | * We add memory to the "system" with the add_pool() function call. It | ||
286 | specifies the start and size of the new block of memory that will be | ||
287 | used in this memory allocation system. Several add_pool() calls are | ||
288 | supported and they may or may not add contiguous memory. | ||
289 | |||
290 | * Make all blocks get allocated aligned to BLOCKSIZE (sometimes referred to | ||
291 | as 'grain size'), 64 bytes in my implementation. Reports tell us there is | ||
292 | no real gain in increasing the size of the align. | ||
293 | |||
294 | * We link *all* pieces of memory (AREAS), free or not free. We keep the list | ||
295 | in address order and thus when a FREE() occurs we know instanstly if there | ||
296 | are FREE CHUNKS wall-to-wall. No list "travels" needed. Requires some | ||
297 | extra space in every allocated BLOCK. Still needs to put the new CHUNK in | ||
298 | the right place in size-sorted list/tree. All memory areas, allocated or | ||
299 | not, contain the following header: | ||
300 | - size of this memory area | ||
301 | - FREE status | ||
302 | - pointer to the next AREA closest in memory | ||
303 | - pointer to the prev AREA closest in memory | ||
304 | (12 bytes) | ||
305 | |||
306 | * Sort all FREE CHUNKS in size-order. We use a SPLAY TREE algorithm for | ||
307 | maximum speed. Data/structs used for the size-sorting functions are kept | ||
308 | in an abstraction layer away from this since it is really not changing | ||
309 | anything (except executing speed). | ||
310 | |||
311 | ALLOC (RSIZE - requested size, aligned properly) | ||
312 | |||
313 | * Fetch a CHUNK that RSIZE fits within. If the found CHUNK is larger than | ||
314 | RSIZE, split it and return the RSIZE to the caller. Link the new CHUNK | ||
315 | into the list/tree. | ||
316 | |||
317 | FREE (AREA - piece of memory that is returned to the system) | ||
318 | |||
319 | * Since the allocated BLOCK has kept its link-pointers, we can without | ||
320 | checking any list instantly see if there are any FREE CHUNKS that are | ||
321 | wall-to-wall with the AREA (both sides). If the AREA *is* wall-to-wall | ||
322 | with one or two CHUNKS that or they are unlinked from the lists, enlarged | ||
323 | and re-linked into the lists. | ||
324 | |||
325 | REALLOC | ||
326 | |||
327 | * There IS NO realloc() of large blocks, they are performed in the higher | ||
328 | layer (dmalloc). | ||
329 | |||
330 | |||
331 | ############################################################################## | ||
332 | FURTHER READING | ||
333 | ############################################################################## | ||
334 | |||
335 | * "Dynamic Storage Allocation: A Survey and Critical Review" (Paul R. Wilson, | ||
336 | Mark S. Johnstone, Michael Neely, David Boles) | ||
337 | ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps | ||
338 | |||
339 | * "A Memory Allocator" (Doug Lea) | ||
340 | http://g.oswego.edu/dl/html/malloc.html | ||