Module oxischeme::heap
[-]
[+]
[src]
The heap
module provides memory management for our Scheme implementation.
Allocation
Scheme has a variety of types that must be allocated on the heap: cons cells, strings, procedures, and vectors (currently unimplemented).
Oxischeme does not allocate each individual object directly from the OS, which would have unnecessary bookkeeping overhead. Instead, we allocate objects from an "arena" which contains a pre-allocated object pool reserved for allocating future objects from. We keep track of an arena's un-used objects with a "free list" of indices into this pool. When we allocate a new object, we remove the first entry from this list and return a pointer to the object at that entry's index in the object pool. Garbage collection adds new entries to the free list when reclaiming dead objects. When allocating, if our existing arenas' pools are already at capacity (ie, all of their free lists are empty), then we allocate another arena from the OS, add it to our set of arenas, and allocate from its object pool. During garbage collection, if an arena is empty, its memory is returned to the OS.
Garbage Collection
Any type that is heap-allocated must be garbage collected so that the memory of no-longer used instances of that type can be reclaimed for reuse. This provides the illusion of infinite memory, and frees Scheme programmers from manually managing allocations and frees. We refer to GC-managed types as "GC things". Note that a GC thing does not need to be a Scheme value type: activations are also managed by the GC, but are not a first class Scheme value.
Any structure that has references to a garbage collected type must
participate in garbage collection by telling the garbage collector about
all of the GC things it is holding alive. Participation is implemented via
the Trace
trait. Note that the set of types that participate in garbage
collection is not the same as the set of all GC things. Some GC things do not
participate in garbage collection: strings do not hold references to any
other GC things.
A "GC root" is a GC participant that is always reachable. For example, the global activation is a root because global variables must always be accessible.
We use a simple mark and sweep garbage collection algorithm. In the mark phase, we start from the roots and recursively trace every reachable object in the heap graph, adding them to our "marked" set. If a GC thing is not reachable, then it is impossible for the Scheme program to use it in the future, and it is safe for the garbage collector to reclaim it. The unreachable objects are the set of GC things that are not in the marked set. We find these unreachable objects and return them to their respective arena's free list in the sweep phase.
Rooting
Sometimes it is necessary to temporarily root GC things referenced by pointers on the stack. Garbage collection can be triggered by allocating any GC thing, and it isn't always clear which rust functions (or other functions called by those functions, or even other functions called by those functions called from the first function, and so on) might allocate a GC thing and trigger collection. The situation we want to avoid is a rust function using a temporary variable that references a GC thing, then calling another function which triggers a collection and collects the GC thing that was referred to by the temporary variable, and the temporary variable is now a dangling pointer. If the rust function accesses it again, that is undefined behavior: it might still get the value it was pointing at, or it might be a segfault, or it might be a freshly allocated value used by something else! Not good!
Here is what this scenario looks like in psuedo code:
let a = pointer_to_some_gc_thing; function_which_can_trigger_gc(); // Oops! A collection was triggered and dereferencing this pointer leads // to undefined behavior! *a;
There are two possible solutions to this problem. The first is conservative garbage collection, where we walk the stack and if anything on the stack looks like it might be a pointer and if coerced to a pointer happens to point to a GC thing in the heap, we assume that it is a pointer and we consider the GC thing that may or may not actually be pointed to by a variable on the stack a GC root. The second is precise rooting. With precise rooting, it is the responsibility of the rust function's author to explicitly root and unroot pointers to GC things used in variables on the stack.
Oxischeme uses precise rooting. Precise rooting is implemented with the
Rooted<GcThingPtr>
smart pointer type, which roots its referent upon
construction and unroots it when the smart pointer goes out of scope and is
dropped.
Using precise rooting and Rooted
, we can solve the dangling pointer
problem like this:
{ // The pointed to GC thing gets rooted when wrapped with `Rooted`. let a = Rooted::new(heap, pointer_to_some_gc_thing); function_which_can_trigger_gc(); // Dereferencing `a` is now safe, because the referent is a GC root! *a; } // `a` goes out of scope, and its referent is unrooted.
Tips for working with precise rooting if your function allocates GC things, or calls other functions which allocate GC things:
Accept GC thing parameters as
&Rooted<T>
or&mut Rooted<T>
to ensure that callers properly root them.Accept a
&mut Heap
parameter and returnRooted<T>
for getters and methods that return GC things. This greatly alleviates potential foot-guns, as a caller would have to explicitly unwrap the smart pointer and store that in a new variable to cause a dangling pointer. It also cuts down onRooted<T>
construction boiler plate.Always root GC things whose lifetime spans a call which could trigger a collection!
When in doubt, Just Root It!
Structs
Arena | An arena from which to allocate |
ArenaPtr | A pointer to a |
ArenaSet | A set of |
Heap | The scheme heap and GC runtime, containing all allocated cons cells, activations, procedures, and strings (including strings for symbols). |
Rooted | A smart pointer wrapping the pointer type |
Enums
GcThing | The union of the various types that are GC things. |
Statics
DEFAULT_ACTIVATIONS_CAPACITY | The default capacity of activations per arena. |
DEFAULT_CONS_CAPACITY | The default capacity of cons cells per arena. |
DEFAULT_PROCEDURES_CAPACITY | The default capacity of procedures per arena. |
DEFAULT_STRINGS_CAPACITY | The default capacity of strings per arena. |
Traits
ToGcThing | A trait for types that can be coerced to a |
Trace | The |
Type Definitions
IterGcThing | An iterable of |
RootedStringPtr | A rooted pointer to a string on the heap. |
StringPtr | A pointer to a string on the heap. |