Tagless Garbage Collection

One key issue with GarbageCollection is identification of the root set--the set of objects which make the roots (sources) of the object dependency graph. In general, the root set consists of all references located in TheStack (or stacks, or wherever ActivationRecords are stored), cached in registers, and in the static object area (static objects have unlimited extent), which are live (will have an effect on the future computation of the program). Any object reachable from the root set is (possibly) live, and is not reclaimed by the collector. Any object not reachable from the root set is garbage and may be reclaimed.

Note that there likely will be objects which are reachable by the root set, but not live (and thus could be reclaimed by the collector); this is called SemanticGarbage?. Determining the SemanticGarbage? in a program is in general undecidable; thus reachability is used as an approximation. (Determining what is reachable from the root set is simple graph traversal; and all live objects are reachable.)

Several strategies for determining the root set: One is to treat all suitably-aligned objects in TheStack, etc., as a potential live reference; this is commonly known as ConservativeGarbageCollection?. Languages with unsafe typing such as C/C++ are limited to this sort of collection--obviously, this will cause many "false alarms" as ints on the stack are treated like potential pointers. In languages which make a clearer distinction between references and non-references, it is possible to exclude ints.

One way to do this is to maintain (in the ActivationRecord) a tag map--essentially a vector or bits indicating whether or not a given item in TheStack is a reference or not. These bits are typically set/cleared at runtime--if a pointer is loaded onto TheStack, the corresponding tag bit is set; if a pointer goes out of scope or is replaced with an int, the bit can be cleared. This adds quite a bit of runtime overhead.

A better way is tagless GarbageCollection--wherein the compiler knows what elements of a given stack frame are pointers and what aren't, and can free the runtime code from having to deal with keeping track of this.

Separate from the issue of whether tags are used or not, is the issue of liveness. Knowing what is a pointer and what isn't is a significant improvement, but we can still do better. At any given time, some of the pointers in a stack frame might be dead--unable to contribute to the program any further. For example:

        Foo s = new Foo(1);
        this.my_s_ = s;
        // hundreds of lines of computation that don't involve s anymore

After the statement this.my_s_ = s; the pointer s is dead. It has no effect on the computation. (Note that the object pointed to by s is not necessarily garbage; it is referenced by the my_s_ attribute of the enclosing object). The garbage collector can--in theory--safely ignore s when determining the root set, if it is invoked after s becomes dead. By doing so, the garbage collector can become both faster (fewer vertices in the dependency graph to traverse) and more efficient (less likelihood that examining the dependencies of a dead reference will keep SemanticGarbage? from being collected).

However, in most languages (those that don't do explicit PointerKilling), the pointer s will continue to be considered "active" (and part of the RootSet?) until the enclosing block goes out of scope. If the block does extensive computation (calls a ray-tracer, say, or streams in an MPEG off the net--not returning until these operations complete)--the object pointed to by s may remain live until long after it becomes SemanticGarbage?.

Benjamin Goldberg was doing research on tagless garbage collection with programming languages with StaticTyping (using TypeInference) when he discovered a rather interesting thing: If in a language with certain properties, you can use compile-time analysis to recreate the types of most references on the stack. Those references which you cannot determine the types for via this analysis are SemanticGarbage?--they need not be handled by the garbage collector.

A cool discovery, if you ask me.

The paper is:

Goldberg, Benjamin. Tag-free garbage collection for strongly-typed programming languages. ACM SIGPLAN notices 26, 6 (1991) 165-176.

View edit of March 24, 2007 or FindPage with title or text search