Module oxischeme::heap [-] [+] [src]

The heap module provides memory management for our Scheme implementation.

Allocation

Scheme has a variety of types that must be allocated on the heap: cons cells, strings, procedures, and vectors (currently unimplemented).

Oxischeme does not allocate each individual object directly from the OS, which would have unnecessary bookkeeping overhead. Instead, we allocate objects from an "arena" which contains a pre-allocated object pool reserved for allocating future objects from. We keep track of an arena's un-used objects with a "free list" of indices into this pool. When we allocate a new object, we remove the first entry from this list and return a pointer to the object at that entry's index in the object pool. Garbage collection adds new entries to the free list when reclaiming dead objects. When allocating, if our existing arenas' pools are already at capacity (ie, all of their free lists are empty), then we allocate another arena from the OS, add it to our set of arenas, and allocate from its object pool. During garbage collection, if an arena is empty, its memory is returned to the OS.

Garbage Collection

Any type that is heap-allocated must be garbage collected so that the memory of no-longer used instances of that type can be reclaimed for reuse. This provides the illusion of infinite memory, and frees Scheme programmers from manually managing allocations and frees. We refer to GC-managed types as "GC things". Note that a GC thing does not need to be a Scheme value type: activations are also managed by the GC, but are not a first class Scheme value.

Any structure that has references to a garbage collected type must participate in garbage collection by telling the garbage collector about all of the GC things it is holding alive. Participation is implemented via the Trace trait. Note that the set of types that participate in garbage collection is not the same as the set of all GC things. Some GC things do not participate in garbage collection: strings do not hold references to any other GC things.

A "GC root" is a GC participant that is always reachable. For example, the global activation is a root because global variables must always be accessible.

We use a simple mark and sweep garbage collection algorithm. In the mark phase, we start from the roots and recursively trace every reachable object in the heap graph, adding them to our "marked" set. If a GC thing is not reachable, then it is impossible for the Scheme program to use it in the future, and it is safe for the garbage collector to reclaim it. The unreachable objects are the set of GC things that are not in the marked set. We find these unreachable objects and return them to their respective arena's free list in the sweep phase.

Rooting

Sometimes it is necessary to temporarily root GC things referenced by pointers on the stack. Garbage collection can be triggered by allocating any GC thing, and it isn't always clear which rust functions (or other functions called by those functions, or even other functions called by those functions called from the first function, and so on) might allocate a GC thing and trigger collection. The situation we want to avoid is a rust function using a temporary variable that references a GC thing, then calling another function which triggers a collection and collects the GC thing that was referred to by the temporary variable, and the temporary variable is now a dangling pointer. If the rust function accesses it again, that is undefined behavior: it might still get the value it was pointing at, or it might be a segfault, or it might be a freshly allocated value used by something else! Not good!

Here is what this scenario looks like in psuedo code:

let a = pointer_to_some_gc_thing;
function_which_can_trigger_gc();
// Oops! A collection was triggered and dereferencing this pointer leads
// to undefined behavior!
*a;

There are two possible solutions to this problem. The first is conservative garbage collection, where we walk the stack and if anything on the stack looks like it might be a pointer and if coerced to a pointer happens to point to a GC thing in the heap, we assume that it is a pointer and we consider the GC thing that may or may not actually be pointed to by a variable on the stack a GC root. The second is precise rooting. With precise rooting, it is the responsibility of the rust function's author to explicitly root and unroot pointers to GC things used in variables on the stack.

Oxischeme uses precise rooting. Precise rooting is implemented with the Rooted<GcThingPtr> smart pointer type, which roots its referent upon construction and unroots it when the smart pointer goes out of scope and is dropped.

Using precise rooting and Rooted, we can solve the dangling pointer problem like this:

{
    // The pointed to GC thing gets rooted when wrapped with `Rooted`.
    let a = Rooted::new(heap, pointer_to_some_gc_thing);
    function_which_can_trigger_gc();
    // Dereferencing `a` is now safe, because the referent is a GC root!
    *a;
}
// `a` goes out of scope, and its referent is unrooted.

Tips for working with precise rooting if your function allocates GC things, or calls other functions which allocate GC things:

Accept GC thing parameters as &Rooted<T> or &mut Rooted<T> to ensure that callers properly root them.
Accept a &mut Heap parameter and return Rooted<T> for getters and methods that return GC things. This greatly alleviates potential foot-guns, as a caller would have to explicitly unwrap the smart pointer and store that in a new variable to cause a dangling pointer. It also cuts down on Rooted<T> construction boiler plate.
Always root GC things whose lifetime spans a call which could trigger a collection!
When in doubt, Just Root It!

Structs

Arena	An arena from which to allocate `T` objects from.
ArenaPtr	A pointer to a `T` instance in an arena.
ArenaSet	A set of `Arena`s. Manages allocating and deallocating additional `Arena`s from the OS, depending on the number of objects requested and kept alive by the mutator.
Heap	The scheme heap and GC runtime, containing all allocated cons cells, activations, procedures, and strings (including strings for symbols).
Rooted	A smart pointer wrapping the pointer type `T`. It keeps its referent rooted while the smart pointer is in scope to prevent dangling pointers caused by a garbage collection within the pointers lifespan. For more information see the module level documentation about rooting.

Enums

GcThing

The union of the various types that are GC things.

Statics

DEFAULT_ACTIVATIONS_CAPACITY	The default capacity of activations per arena.
DEFAULT_CONS_CAPACITY	The default capacity of cons cells per arena.
DEFAULT_PROCEDURES_CAPACITY	The default capacity of procedures per arena.
DEFAULT_STRINGS_CAPACITY	The default capacity of strings per arena.

Traits

ToGcThing	A trait for types that can be coerced to a `GcThing`.
Trace	The `Trace` trait allows GC participants to inform the collector of their references to other GC things.

Type Definitions

IterGcThing	An iterable of `GcThing`s.
RootedStringPtr	A rooted pointer to a string on the heap.
StringPtr	A pointer to a string on the heap.

Modules

Functions