Tortoise And Hare

Nickname for "Floyd's Cycle-Finding Algorithm", invented by Robert W. Floyd in 1967; see my update to the Wikipedia entry http://en.wikipedia.org/wiki/Floyd%27s_cycle-finding_algorithm.

A simple technique for detecting infinite loops inside lists, symlinks, state machines, etc.

For example, it can detect a loop inside a linked list. The algorithm is particularly interesting because it runs in O(n) time and O(1) space, which appears to be optimal; more obvious algorithms use O(n) space.


There is an alternative version (essentially Brent's cycle-finding algorithm) in which the hare races ahead for 2^n steps, then the tortoise teleports to the hare's position and n is increased by 1. It is more efficient (lower multiplicative constant) and only slightly more complex to code.

Algorithm A, the original:
    tortoise = head
    rabbit = head
    length = 0
    if rabbit == null: return length
    rabbit, length = step(rabbit), length+1
    if rabbit == null: return length
    while rabbit != tortoise:
        rabbit, length = step(rabbit), length+1
        if rabbit==null: return length
        rabbit, length = step(rabbit), length+1
        if rabbit==null: return length
        tortoise = step(tortoise)
    return infinity
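Algorithm A transcribes almost directly into Python. The sketch below is mine, not from the original page: it assumes a minimal Node class (hypothetical) in which step(node) is simply node.next, and it returns None in place of the pseudocode's infinity.

```python
class Node:
    """Minimal singly linked list node (illustrative only)."""
    def __init__(self, value):
        self.value = value
        self.next = None  # step(node) is just node.next here


def floyd_length(head):
    """Algorithm A: return the list's length, or None if it loops forever."""
    tortoise = rabbit = head
    length = 0
    if rabbit is None:
        return length
    rabbit, length = rabbit.next, length + 1
    if rabbit is None:
        return length
    while rabbit is not tortoise:
        rabbit, length = rabbit.next, length + 1
        if rabbit is None:
            return length
        rabbit, length = rabbit.next, length + 1
        if rabbit is None:
            return length
        tortoise = tortoise.next
    return None  # the pseudocode's "infinity": the list never ends
```

On a five-node acyclic list this returns 5; tie the tail back into the list and it returns None instead of running forever.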

Algorithm B, the leaping version:
    tortoise = head
    rabbit = head
    length,gap = 0, 0
    if rabbit == null: return "Finite list of length "+string(length)
    rabbit, length, gap = step(rabbit), length+1, gap+1
    if rabbit == null: return "Finite list of length "+string(length)
    limit = 1
    while rabbit != tortoise:
        if gap==limit: tortoise, gap, limit = rabbit, 0, limit*2
        rabbit, length, gap = step(rabbit), length+1, gap+1
        if rabbit==null: return "Finite list of length "+string(length)
    return "List with loop of size "+string(gap)
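Algorithm B translates just as directly. Again the Node class is a hypothetical stand-in of mine for whatever structure supplies the step operation:

```python
class Node:
    """Minimal singly linked list node (illustrative only)."""
    def __init__(self, value):
        self.value = value
        self.next = None


def brent_describe(head):
    """Algorithm B: report the list's length, or the size of its loop."""
    tortoise = rabbit = head
    length = gap = 0
    if rabbit is None:
        return "Finite list of length " + str(length)
    rabbit, length, gap = rabbit.next, length + 1, gap + 1
    if rabbit is None:
        return "Finite list of length " + str(length)
    limit = 1
    while rabbit is not tortoise:
        if gap == limit:  # teleport the tortoise to the rabbit, double the limit
            tortoise, gap, limit = rabbit, 0, limit * 2
        rabbit, length, gap = rabbit.next, length + 1, gap + 1
        if rabbit is None:
            return "Finite list of length " + str(length)
    return "List with loop of size " + str(gap)
```

When the loop exits, gap is exactly the number of steps the rabbit has taken since the last teleport, which is the loop size p.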

In each, length is how long the list is so far, as traversed by the rabbit. In the first, if the rabbit takes 2n steps, the tortoise takes n. In the second, the tortoise never takes any steps of its own; it only teleports. In linked lists, stepping to the next node isn't expensive, but this loop-finding algorithm can also be used where the list is implicit (such as in the Pollard Rho and Elliptic Curve methods of factoring integers) and stepping is expensive. Algorithm B then wins by a substantial margin.

Here's the complete analysis:

Suppose there is no loop and the list is of length 2n. Then algorithm A takes 3n steps, and algorithm B takes 2n steps. Now suppose there is a loop, and let t be the number of steps taken to reach the loop and let p be the size of the loop.

Algorithm A: when the tortoise has taken i steps the rabbit has taken 2i+1, so the two first meet at the smallest i >= t with i+1 divisible by p. That i is less than t+p, and each pass through the loop costs three steps, so the total is fewer than 3(t+p) steps.

Algorithm B: the loop is detected on the first teleport that both lands the tortoise inside the loop (2^k - 1 >= t) and leaves the limit at least as large as the loop (2^k >= p); the rabbit then walks p further steps to rejoin the tortoise. Since the smallest such 2^k is less than 2 max(t+1, p), the rabbit takes at most 2 max(t, p) + p steps in total, and the tortoise takes none.

Therefore algorithm B never takes more steps than algorithm A and almost always takes fewer. Algorithm B is significantly harder to analyse, and should perhaps be introduced second in a teaching context.
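As a sanity check on that claim (this experiment is mine, not from the original page), one can count calls to the step function on an implicit list, here the Pollard-Rho-style iteration x -> x^2 + 1 (mod n):

```python
def floyd_steps(x0, f):
    """Count f-evaluations until Algorithm A detects the cycle of x -> f(x)."""
    calls = 0

    def step(x):
        nonlocal calls
        calls += 1
        return f(x)

    tortoise, rabbit = x0, step(x0)
    while rabbit != tortoise:
        rabbit = step(step(rabbit))
        tortoise = step(tortoise)
    return calls


def brent_steps(x0, f):
    """Count f-evaluations until Algorithm B detects the cycle of x -> f(x)."""
    calls = 0

    def step(x):
        nonlocal calls
        calls += 1
        return f(x)

    tortoise, rabbit = x0, step(x0)
    gap, limit = 1, 1
    while rabbit != tortoise:
        if gap == limit:  # teleport and double the limit
            tortoise, gap, limit = rabbit, 0, limit * 2
        rabbit, gap = step(rabbit), gap + 1
    return calls
```

For every modulus tried, the Algorithm B count never exceeds the Algorithm A count, matching the claim above.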

A comment about the tortoise in the second version: copying the rabbit's position into the tortoise (aka teleporting the tortoise to the rabbit) is about the same complexity as following a link in a linked list. However, the teleportation is far less expensive than a step in the context of integer factoring algorithms.

It occurs to me that I don't see a rationale for doubling being the most efficient way to grow the limit. Assuming at least that a multiplicative change is most efficient, a heuristic argument would be that multiplying by e would be best, since infinity is too much and (1 + infinitesimal) is too little, and e is the usual minimum of such things.

Do the analysis. I've done the *2 case, surely you can work out what gives the minimum expected time.

That would be fair. But you're probably the better mathematician of us two, and it is typical of me to be lazy like this: to offer a heuristic motivation rather than an actual proof. :-) (This is undoubtedly how I ended up in the commercial world rather than academia. I do appreciate the importance of proofs, but I tend to suffer from the FrameProblem: getting tangled up in possible issues that an expert mathematician knows can simply be ignored.)

Well, if t is really small then we want the multiplier to be really big, and if t is really big then we want the multiplier to be really small. It depends on the assumptions you make about how the cycles have arisen, and therefore what the likely relationship is. For practical purposes a multiplier of 2 would seem a very good choice.

Is there an analog to this algorithm for detecting cycles in an arbitrary graph?

If you have a deterministic algorithm for iterating through the nodes in your graph then just follow it and you'll find a cycle or a dead-end. If you find a dead-end you don't know that there are no cycles. If your graph can't be iterated in a deterministic manner then I think you're out of luck.

You can always find cycles by taking the n-th power of the adjacency matrix and looking at the diagonal: a non-zero diagonal entry means a closed walk of length n through that vertex.
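A minimal sketch of that matrix approach (my own illustration, with hypothetical helper names); note it costs O(|V|^2) space and far more time, so it is nowhere near the O(1)-space tortoise and hare:

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def has_cycle(adj):
    """True iff the directed graph with adjacency matrix adj has a cycle.

    A non-zero diagonal entry of adj^n counts closed walks of length n,
    and every closed walk contains a cycle, so checking n = 1..|V| suffices.
    """
    n = len(adj)
    power = adj
    for _ in range(n):
        if any(power[i][i] for i in range(n)):
            return True
        power = mat_mul(power, adj)
    return False
```

A triangle 0 -> 1 -> 2 -> 0 is reported as cyclic (its cube has ones on the diagonal), while any DAG's adjacency matrix is nilpotent and never shows a non-zero diagonal.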

Are dead-ends a fundamental problem - wouldn't backtracking (e.g. depth-first) iteration avoid them? Are we talking about directed or undirected graphs?

The entire point of the algorithm on this page is that it doesn't use back-tracking. The question was whether this algorithm works. There are other algorithms, including some which back-track, but that wasn't the question. When you ask a question like that I no longer know how to be helpful. You either don't understand the algorithm or you're asking a different question entirely.

Sorry, I interpreted "analog" differently. The depth-first graph version seems analogous to me since it finds cycles in a data structure by "racing" to see if a hare overtakes a tortoise. At O(n) time and space it is still an efficient algorithm, but not quite as good as the original.
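The depth-first version alluded to here can be sketched with the classic three-colour DFS (my illustration, not the original poster's code): a back edge to a node still on the recursion stack closes a cycle, in O(V+E) time and space.

```python
def dfs_has_cycle(graph):
    """Detect a cycle in a directed graph given as {node: [successors]}.

    Every node must appear as a key.  A back edge to a "grey" node
    (one still on the recursion stack) closes a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in graph}

    def visit(v):
        colour[v] = GREY
        for w in graph[v]:
            if colour[w] == GREY:          # back edge: w is an ancestor
                return True
            if colour[w] == WHITE and visit(w):
                return True
        colour[v] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in graph)
```

For example, {0: [1], 1: [2], 2: [0]} is reported cyclic, while the DAG {0: [1, 2], 1: [2], 2: []} is not.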

I can't think of a deterministic O(1) space algorithm to find cycles in graphs. I'd be interested to see one.


CategoryAlgorithm

EditText of this page (last edited January 31, 2009) or FindPage with title or text search