Tail Call Optimization (TCO)

Normally, when a function call is made from inside another function, a second stack frame is allocated to separately manage the variables/state of that other function invocation. Not only does this allocation cost some processing time, but it also takes up some extra memory.

A call stack chain typically has at most 10-15 jumps from one function to another and another. In those scenarios, the memory usage is not likely any kind of practical problem.

However, when you consider recursive programming (a function calling itself repeatedly) — or mutual recursion with two or more functions calling each other — the call stack could easily be hundreds, thousands, or more levels deep. You can probably see the problems that could cause, if memory usage grows unbounded.

JavaScript engines have to set an arbitrary limit to prevent such programming techniques from crashing the browser (and device) by running them out of memory. That’s why we get the frustrating “RangeError: Maximum call stack size exceeded” thrown if the limit is hit.

Warning: The limit of call stack depth is not controlled by the specification. It’s implementation dependent, and will vary between browsers and devices. You should never code with strong assumptions of exact observable limits, as they may very well change from release to release.

Certain patterns of function calls, called tail calls, can be optimized in a way to avoid the extra allocation of stack frames. If the extra allocation can be avoided, there’s no reason to arbitrarily limit the call stack depth, so the engines can let them run unbounded.

A tail call is a return statement with a function call, where nothing has to happen after the call except returning its value.

This optimization can only be applied in strict mode. Yet another reason to always be writing all your code as strict!

Here’s a function call that is not in tail position:

  1. "use strict";
  2. function foo(x) {
  3. return x * 2;
  4. }
  5. function bar(x) {
  6. // not a tail call
  7. return 1 + foo( x );
  8. }
  9. bar( 10 ); // 21

The 1 + .. has to be performed after the foo(x) call completes, so the state of that bar(..) invocation needs to be preserved.

But the following snippet demonstrates calls to foo(..) and bar(..) where both are in tail position, as they’re the last thing to happen in their code path (other than the return):

  1. "use strict";
  2. function foo(x) {
  3. return x * 2;
  4. }
  5. function bar(x) {
  6. x = x + 1;
  7. if (x > 10) {
  8. return foo( x );
  9. }
  10. else {
  11. return bar( x + 1 );
  12. }
  13. }
  14. bar( 5 ); // 24
  15. bar( 15 ); // 32

In this program, bar(..) is clearly recursive, but foo(..) is just a regular function call. In both cases, the function calls are in proper tail position. The x + 1 is evaluated before the bar(..) call, and whenever that call finishes, all that happens is the return.

Proper Tail Calls (PTC) of these forms can be optimized — called tail call optimization (TCO) — so that the extra stack frame allocation is unnecessary. Instead of creating a new stack frame for the next function call, the engine just reuses the existing stack frame. That works because a function doesn’t need to preserve any of the current state, as nothing happens with that state after the PTC.

TCO means there’s practically no limit to how deep the call stack can be. That trick slightly improves regular function calls in normal programs, but more importantly opens the door to using recursion for program expression even if the call stack could be tens of thousands of calls deep.

We’re no longer relegated to simply theorizing about recursion for problem solving, but can actually use it in real JavaScript programs!

As of ES6, all PTC should be optimizable in this way, recursion or not.
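
For instance, a simple forwarding call qualifies just the same as a recursive one. Here’s a quick sketch (the function names are made up purely for illustration) of a PTC with no recursion at all:

    "use strict";

    function double(x) {
        return x * 2;
    }

    function halveThenDouble(x) {
        x = x / 2;
        // the double(..) call is in tail position: nothing happens after
        // it except returning its value, so it's a PTC -- no recursion needed
        return double( x );
    }

    halveThenDouble( 21 );    // 21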

Tail Call Rewrite

The hitch, however, is that only PTC can be optimized; non-PTC will still work of course, but will cause stack frame allocation as they always did. You’ll have to be careful about structuring your functions with PTC if you expect the optimizations to kick in.

If you have a function that’s not written with PTC, you may find the need to manually rearrange your code to be eligible for TCO.

Consider:

  1. "use strict";
  2. function foo(x) {
  3. if (x <= 1) return 1;
  4. return (x / 2) + foo( x - 1 );
  5. }
  6. foo( 123456 ); // RangeError

The call to foo(x-1) isn’t a PTC because its result has to be added to (x / 2) before returning.

However, to make this code eligible for TCO in an ES6 engine, we can rewrite it as follows:

  1. "use strict";
  2. var foo = (function(){
  3. function _foo(acc,x) {
  4. if (x <= 1) return acc;
  5. return _foo( (x / 2) + acc, x - 1 );
  6. }
  7. return function(x) {
  8. return _foo( 1, x );
  9. };
  10. })();
  11. foo( 123456 ); // 3810376848.5

If you run the previous snippet in an ES6 engine that implements TCO, you’ll get the 3810376848.5 answer as shown. However, it’ll still fail with a RangeError in non-TCO engines.

Non-TCO Optimizations

There are other techniques to rewrite the code so that the call stack isn’t growing with each call.

One such technique is called trampolining, which amounts to having each partial result represented as a function that either returns another partial result function or the final result. Then you can simply loop until you stop getting a function, and you’ll have the result. Consider:

  1. "use strict";
  2. function trampoline( res ) {
  3. while (typeof res == "function") {
  4. res = res();
  5. }
  6. return res;
  7. }
  8. var foo = (function(){
  9. function _foo(acc,x) {
  10. if (x <= 1) return acc;
  11. return function partial(){
  12. return _foo( (x / 2) + acc, x - 1 );
  13. };
  14. }
  15. return function(x) {
  16. return trampoline( _foo( 1, x ) );
  17. };
  18. })();
  19. foo( 123456 ); // 3810376848.5

This reworking required minimal changes to factor out the recursion into the loop in trampoline(..):

  1. First, we wrapped the return _foo .. line in the return function partial(){ .. } function expression.
  2. Then we wrapped the _foo(1,x) call in the trampoline(..) call.

The reason this technique doesn’t suffer the call stack limitation is that each of those inner partial(..) functions is just returned back to the while loop in trampoline(..), which runs it and then iterates again. In other words, partial(..) doesn’t recursively call itself; it just returns another function. The stack depth remains constant, so it can run as long as it needs to.

Trampolining expressed in this way uses the closure that the inner partial() function has over the x and acc variables to keep the state from iteration to iteration. The advantage is that the looping logic is pulled out into a reusable trampoline(..) utility function, which many libraries provide versions of. You can reuse trampoline(..) multiple times in your program with different trampolined algorithms.
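
To make that reuse concrete, here’s a minimal sketch (sumTo(..) is a made-up example, not from the text) that drives an entirely different trampolined computation through the same trampoline(..) utility:

    "use strict";

    // same trampoline(..) utility as before
    function trampoline( res ) {
        while (typeof res == "function") {
            res = res();
        }
        return res;
    }

    // a different trampolined algorithm: summing the integers 1..n
    function sumTo(n) {
        function _sum(acc,x) {
            if (x > n) return acc;
            return function partial(){
                return _sum( acc + x, x + 1 );
            };
        }
        return trampoline( _sum( 0, 1 ) );
    }

    sumTo( 123456 );    // 7620753696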

Of course, if you really wanted to deeply optimize (and the reusability wasn’t a concern), you could discard the closure state and inline the state tracking of acc into just one function’s scope along with a loop. This technique is generally called recursion unrolling:

  1. "use strict";
  2. function foo(x) {
  3. var acc = 1;
  4. while (x > 1) {
  5. acc = (x / 2) + acc;
  6. x = x - 1;
  7. }
  8. return acc;
  9. }
  10. foo( 123456 ); // 3810376848.5

This expression of the algorithm is simpler to read, and will likely perform the best (strictly speaking) of the various forms we’ve explored. That may seem like a clear winner, and you may wonder why you would ever try the other approaches.

There are some reasons why you might not want to always manually unroll your recursions:

  • Instead of factoring out the trampolining (loop) logic for reusability, we’ve inlined it. This works great when there’s only one example to consider, but as soon as you have a half dozen or more of these in your program, there’s a good chance you’ll want some reusability to keep things shorter and more manageable.
  • The example here is deliberately simple enough to illustrate the different forms. In practice, there are many more complications in recursion algorithms, such as mutual recursion (two or more functions calling each other); see the sketch just after this list.

    The farther you go down this rabbit hole, the more manual and intricate the unrolling optimizations are. You’ll quickly lose all the perceived value of readability. The primary advantage of recursion, even in the PTC form, is that it preserves the algorithm readability, and offloads the performance optimization to the engine.
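
For a taste of that, here’s a deliberately tiny mutual recursion sketch written with PTC (isEven(..)/isOdd(..) are illustrative, not from earlier in the text). Even for something this small, unrolling the two functions into a single loop by hand is already less mechanical than the earlier single-function example:

    "use strict";

    function isOdd(n) {
        if (n === 0) return false;
        return isEven( n - 1 );     // PTC
    }

    function isEven(n) {
        if (n === 0) return true;
        return isOdd( n - 1 );      // PTC
    }

    isEven( 100000 );    // true in a TCO engine; RangeError otherwise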

If you write your algorithms with PTC, the ES6 engine will apply TCO to let your code run in constant stack depth (by reusing stack frames). You get the readability of recursion with most of the performance benefits and no limitations of run length.

Meta?

What does TCO have to do with meta programming?

As we covered in the “Feature Testing” section earlier, you can determine at runtime what features an engine supports. This includes TCO, though determining it is quite brute force. Consider:

  1. "use strict";
  2. try {
  3. (function foo(x){
  4. if (x < 5E5) return foo( x + 1 );
  5. })( 1 );
  6. TCO_ENABLED = true;
  7. }
  8. catch (err) {
  9. TCO_ENABLED = false;
  10. }

In a non-TCO engine, the recursive loop will fail out eventually, throwing an exception caught by the try..catch. Otherwise, the loop completes easily thanks to TCO.

Yuck, right?

But how could meta programming around the TCO feature (or rather, the lack thereof) benefit our code? The simple answer is that you could use such a feature test to decide to load a version of your application’s code that uses recursion, or an alternative one that’s been converted/transpiled to not need recursion.
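
A rough sketch of that decision might look like the following; loadScript(..) is a placeholder for whatever script loader your application already uses (not a built-in), and the file names are made up:

    if (TCO_ENABLED) {
        // recursion-friendly build for TCO engines
        loadScript( "app-recursive.js" );
    }
    else {
        // pre-transpiled/unrolled build for non-TCO engines
        loadScript( "app-unrolled.js" );
    }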

Self-Adjusting Code

But here’s another way of looking at the problem:

  1. "use strict";
  2. function foo(x) {
  3. function _foo() {
  4. if (x > 1) {
  5. acc = acc + (x / 2);
  6. x = x - 1;
  7. return _foo();
  8. }
  9. }
  10. var acc = 1;
  11. while (x > 1) {
  12. try {
  13. _foo();
  14. }
  15. catch (err) { }
  16. }
  17. return acc;
  18. }
  19. foo( 123456 ); // 3810376848.5

This algorithm works by attempting to do as much of the work with recursion as possible, but keeping track of the progress via scoped variables x and acc. If the entire problem can be solved with recursion without an error, great. If the engine kills the recursion at some point, we simply catch that with the try..catch and then try again, picking up where we left off.

I consider this a form of meta programming in that you are probing during runtime the ability of the engine to fully (recursively) finish the task, and working around any (non-TCO) engine limitations that may restrict you.

At first (or even second!) glance, my bet is this code seems much uglier to you compared to some of the earlier versions. It also runs a fair bit slower (on larger runs in a non-TCO environment).

The primary advantage, other than it being able to complete any size task even in non-TCO engines, is that this “solution” to the recursion stack limitation is much more flexible than the trampolining or manual unrolling techniques shown previously.

Essentially, _foo() in this case is a sort of stand-in for practically any recursive task, even mutual recursion. The rest is the boilerplate that should work for just about any algorithm.

The only “catch” is that to be able to resume in the event of a recursion limit being hit, the state of the recursion must be in scoped variables that exist outside the recursive function(s). We did that by leaving x and acc outside of the _foo() function, instead of passing them as arguments to _foo() as earlier.

Almost any recursive algorithm can be adapted to work this way. That means it’s the most widely applicable way of leveraging TCO with recursion in your programs, with minimal rewriting.
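
As a rough sketch of that adaptation (sumTo(..) is again a made-up task, not something from the text), only the scoped state and the body of the inner recursive function change; the while/try..catch boilerplate stays the same:

    "use strict";

    // same self-adjusting pattern applied to a different task:
    // summing the integers from 1 up to x
    function sumTo(x) {
        function _sum() {
            if (x > 0) {
                acc = acc + x;
                x = x - 1;
                return _sum();      // PTC
            }
        }

        var acc = 0;

        while (x > 0) {
            try {
                _sum();
            }
            catch (err) { }
        }

        return acc;
    }

    sumTo( 123456 );    // 7620753696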

This approach still uses a PTC, meaning that this code will progressively enhance from running using the loop many times (recursion batches) in an older browser to fully leveraging TCO’d recursion in an ES6+ environment. I think that’s pretty cool!