Chapter 3: Organization - Generators - 《You Don't Know JS: ES6 & Beyond（1st edition）》

Generators

All functions run to completion, right? In other words, once a function starts running, it finishes before anything else can interrupt.

At least that’s how it’s been for the whole history of JavaScript up to this point. As of ES6, a new somewhat exotic form of function is being introduced, called a generator. A generator can pause itself in mid-execution, and can be resumed either right away or at a later time. So it clearly does not hold the run-to-completion guarantee that normal functions do.

Moreover, each pause/resume cycle in mid-execution is an opportunity for two-way message passing, where the generator can return a value, and the controlling code that resumes it can send a value back in.

As with iterators in the previous section, there are multiple ways to think about what a generator is, or rather what it’s most useful for. There’s no one right answer, but we’ll try to consider several angles.

Note: See the Async & Performance title of this series for more information about generators, and also see Chapter 4 of this current title.

Syntax

The generator function is declared with this new syntax:

function *foo() {
    // ..
}

The position of the * is not functionally relevant. The same declaration could be written as any of the following:

function *foo()  { .. }
function* foo()  { .. }
function * foo() { .. }
function*foo()   { .. }
..

The only difference here is stylistic preference. Most other literature seems to prefer function* foo(..) { .. }. I prefer function *foo(..) { .. }, so that’s how I’ll present them for the rest of this title.

My reason is purely didactic in nature. In this text, when referring to a generator function, I will use *foo(..), as opposed to foo(..) for a normal function. I observe that *foo(..) more closely matches the * positioning of function *foo(..) { .. }.

Moreover, as we saw in Chapter 2 with concise methods, there’s a concise generator form in object literals:

var a = {
    *foo() { .. }
};

I would say that with concise generators, *foo() { .. } is rather more natural than * foo() { .. }. That further argues for consistently using *foo().

Consistency eases understanding and learning.
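To make the concise form concrete, here is a minimal sketch of an object literal with a generator method. The names `counter` and `upTo` are made up for this illustration; they don't come from anywhere else in this title.

```javascript
// hypothetical object with a concise generator method
var counter = {
    *upTo(max) {
        for (var i = 1; i <= max; i++) {
            yield i;
        }
    }
};

var collected = [];
for (var v of counter.upTo( 3 )) {
    collected.push( v );
}
console.log( collected );    // [1, 2, 3]
```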

Executing a Generator

Though a generator is declared with *, you still execute it like a normal function:

foo();

You can still pass it arguments, as in:

function *foo(x,y) {
    // ..
}
foo( 5, 10 );

The major difference is that executing a generator, like foo(5,10), doesn’t actually run the code in the generator. Instead, it produces an iterator that will control the generator to execute its code.

We’ll come back to this later in “Iterator Control,” but briefly:

function *foo() {
    // ..
}
var it = foo();
// to start/advance `*foo()`, call
// `it.next(..)`
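One way to convince yourself that calling a generator doesn't run its body is to track a flag. This is just a sketch; the names `lazy`, `ranBody`, and friends are invented for the illustration.

```javascript
var ranBody = false;

function *lazy() {
    ranBody = true;        // only runs once the iterator is advanced
    yield "started";
}

var lazyIt = lazy();
var ranAfterCall = ranBody;         // nothing in `*lazy()` has run yet
var firstResult = lazyIt.next();    // runs up to the first `yield`
var ranAfterNext = ranBody;

console.log( ranAfterCall, ranAfterNext, firstResult.value );
// false true "started"
```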

`yield`

Generators also have a new keyword you can use inside them, to signal the pause point: yield. Consider:

function *foo() {
    var x = 10;
    var y = 20;
    yield;
    var z = x + y;
}

In this *foo() generator, the operations on the first two lines would run at the beginning, then yield would pause the generator. If and when resumed, the last line of *foo() would run. yield can appear any number of times (or not at all, technically!) in a generator.

You can even put yield inside a loop, and it can represent a repeated pause point. In fact, a loop that never completes just means a generator that never completes, which is completely valid, and sometimes entirely what you need.

yield is not just a pause point. It’s an expression that sends out a value when pausing the generator. Here’s a while..true loop in a generator that for each iteration yields a new random number:

function *foo() {
    while (true) {
        yield Math.random();
    }
}
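A never-ending generator like this is only a problem if the consumer never stops asking. The sketch below uses a made-up `take(..)` helper (not a built-in) to pull a fixed number of values and then simply walk away:

```javascript
function *randoms() {
    while (true) {
        yield Math.random();
    }
}

// hypothetical helper: pull `n` values, then stop asking
function take(it, n) {
    var vals = [];
    for (var i = 0; i < n; i++) {
        vals.push( it.next().value );
    }
    return vals;
}

var firstThree = take( randoms(), 3 );
console.log( firstThree.length );    // 3
```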

The yield .. expression not only sends a value (yield without a value is the same as yield undefined) but also receives (i.e., is replaced by) the eventual resumption value. Consider:

function *foo() {
    var x = yield 10;
    console.log( x );
}

This generator will first yield out the value 10 when pausing itself. When you resume the generator — using the it.next(..) we referred to earlier — whatever value (if any) you resume with will replace/complete the whole yield 10 expression, meaning that value will be assigned to the x variable.

A yield .. expression can appear anywhere a normal expression can. For example:

function *foo() {
    var arr = [ yield 1, yield 2, yield 3 ];
    console.log( arr, yield 4 );
}

*foo() here has four yield .. expressions. Each yield results in the generator pausing to wait for a resumption value that’s then used in the various expression contexts.
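To see those four pauses in action, here's a sketch that drives the same generator body by hand (renamed `*fourYields()` just for this illustration). The strings passed to `next(..)` are arbitrary placeholder values.

```javascript
function *fourYields() {
    var arr = [ yield 1, yield 2, yield 3 ];
    console.log( arr, yield 4 );
}

var it4 = fourYields();
var yielded = [];

yielded.push( it4.next().value );         // starts the generator; 1
yielded.push( it4.next( "a" ).value );    // "a" completes `yield 1`; 2
yielded.push( it4.next( "b" ).value );    // "b" completes `yield 2`; 3
yielded.push( it4.next( "c" ).value );    // "c" completes `yield 3`; 4

var final4 = it4.next( "d" );             // logs: ["a","b","c"] "d"
                                          // { value: undefined, done: true }
```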

yield is not technically an operator, though when used like yield 1 it sure looks like it. Because yield can be used all by itself as in var x = yield;, thinking of it as an operator can sometimes be confusing.

Technically, yield .. is of the same “expression precedence” — similar conceptually to operator precedence — as an assignment expression like a = 3. That means yield .. can basically appear anywhere a = 3 can validly appear.

Let’s illustrate the symmetry:

var a, b;
a = 3;                  // valid
b = 2 + a = 3;          // invalid
b = 2 + (a = 3);        // valid
yield 3;                // valid
a = 2 + yield 3;        // invalid
a = 2 + (yield 3);      // valid

Note: If you think about it, it makes a sort of conceptual sense that a yield .. expression would behave similar to an assignment expression. When a paused yield expression is resumed, it’s completed/replaced by the resumption value in a way that’s not terribly dissimilar from being “assigned” that value.

The takeaway: if you need yield .. to appear in a position where an assignment like a = 3 would not itself be allowed, it needs to be wrapped in a ( ).

Because of the low precedence of the yield keyword, almost any expression after a yield .. will be computed first before being sent with yield. Only the ... spread operator and the , comma operator have lower precedence, meaning they’d bind after the yield has been evaluated.

So just like with multiple operators in normal statements, another case where ( ) might be needed is to override (elevate) the low precedence of yield, such as the difference between these expressions:

yield 2 + 3;            // same as `yield (2 + 3)`
(yield 2) + 3;            // `yield 2` first, then `+ 3`

Just like = assignment, yield is also “right-associative,” which means that multiple yield expressions in succession are treated as having been ( .. ) grouped from right to left. So, yield yield yield 3 is treated as yield (yield (yield 3)). A “left-associative” interpretation like ((yield) yield) yield 3 would make no sense.

Just like with operators, it’s a good idea to use ( .. ) grouping, even if not strictly required, to disambiguate your intent if yield is combined with other operators or yields.
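Here's a small sketch that exercises those precedence and associativity rules at runtime. The generator name `*precedence()` and the resumption values are made up for the illustration.

```javascript
function *precedence() {
    var a = yield 2 + 3;        // same as `yield (2 + 3)`
    var b = (yield 2) + 3;      // `yield 2` first, then `+ 3`
    var c = yield yield 3;      // same as `yield (yield 3)`
    return [ a, b, c ];
}

var pit = precedence();
var p1 = pit.next();        // { value: 5, done: false }
var p2 = pit.next( 10 );    // a = 10
                            // { value: 2, done: false }
var p3 = pit.next( 10 );    // b = 10 + 3
                            // { value: 3, done: false }
var p4 = pit.next( "x" );   // inner `yield 3` completes as "x",
                            // which the outer `yield` sends out
                            // { value: "x", done: false }
var p5 = pit.next( "y" );   // c = "y"
                            // { value: [10, 13, "y"], done: true }
```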

Note: See the Types & Grammar title of this series for more information about operator precedence and associativity.

`yield *`

In the same way that the * makes a function declaration into a function * generator declaration, a * makes yield into yield *, which is a very different mechanism, called yield delegation. Grammatically, yield *.. will behave the same as a yield .., as discussed in the previous section.

yield * .. requires an iterable; it then invokes that iterable’s iterator, and delegates its own host generator’s control to that iterator until it’s exhausted. Consider:

function *foo() {
    yield *[1,2,3];
}

Note: As with the * position in a generator’s declaration (discussed earlier), the * positioning in yield * expressions is stylistically up to you. Most other literature prefers yield* .., but I prefer yield *.., for very symmetrical reasons as already discussed.

The [1,2,3] value produces an iterator that will step through its values, so the *foo() generator will yield those values out as it’s consumed. Another way to illustrate the behavior is in yield delegating to another generator:

function *foo() {
    yield 1;
    yield 2;
    yield 3;
}
function *bar() {
    yield *foo();
}

The iterator produced when *bar() calls *foo() is delegated to via yield *, meaning whatever value(s) *foo() produces will be produced by *bar().
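As a quick check that both forms of delegation produce the same sequence, this sketch collects the values with the ... spread operator. The generator names are changed only so both versions can sit side by side.

```javascript
function *fromArray() {
    yield *[1,2,3];
}

function *inner() {
    yield 1;
    yield 2;
    yield 3;
}

function *outer() {
    yield *inner();
}

var viaArray = [ ...fromArray() ];
var viaGenerator = [ ...outer() ];

console.log( viaArray );        // [1, 2, 3]
console.log( viaGenerator );    // [1, 2, 3]
```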

Whereas with yield .. the completion value of the expression comes from resuming the generator with it.next(..), the completion value of the yield *.. expression comes from the return value (if any) from the delegated-to iterator.

Built-in iterators generally don’t have return values, as we covered at the end of the “Iterator Loop” section earlier in this chapter. But if you define your own custom iterator (or generator), you can design it to return a value, which yield *.. would capture:

function *foo() {
    yield 1;
    yield 2;
    yield 3;
    return 4;
}
function *bar() {
    var x = yield *foo();
    console.log( "x:", x );
}
for (var v of bar()) {
    console.log( v );
}
// 1 2 3
// x: 4

While the 1, 2, and 3 values are yielded out of *foo() and then out of *bar(), the 4 value returned from *foo() is the completion value of the yield *foo() expression, which then gets assigned to x.
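The same capture works when the delegated-to iterator is hand-written rather than produced by a generator. In this sketch, `countdown` and `*launch()` are hypothetical names; the point is only that the value on the final `done: true` result becomes the completion value of yield *.

```javascript
// hypothetical hand-written iterable (not a generator)
var countdown = {
    [Symbol.iterator]: function() {
        var n = 3;
        return {
            next: function() {
                if (n > 0) {
                    return { value: n--, done: false };
                }
                // the `value` on the final result acts
                // as the iterator's return value
                return { value: "liftoff", done: true };
            }
        };
    }
};

var captured;

function *launch() {
    captured = yield *countdown;
}

var seen = [ ...launch() ];
console.log( seen, captured );    // [3, 2, 1] "liftoff"
```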

Because yield * can call another generator (by way of delegating to its iterator), it can also perform a sort of generator recursion by calling itself:

function *foo(x) {
    if (x < 3) {
        x = yield *foo( x + 1 );
    }
    return x * 2;
}
foo( 1 );

The result from foo(1) and then calling the iterator’s next() to run it through its recursive steps will be 24. The first *foo(..) run has x at value 1, which is x < 3. x + 1 is passed recursively to *foo(..), so x is then 2. One more recursive call results in x of 3.

Now, because x < 3 fails, the recursion stops, and return 3 * 2 gives 6 back to the previous call’s yield *.. expression, which is then assigned to x. Another return 6 * 2 returns 12 back to the previous call’s x. Finally 12 * 2, or 24, is returned from the completed run of the *foo(..) generator.

Iterator Control

Earlier, we briefly introduced the concept that generators are controlled by iterators. Let’s fully dig into that now.

Recall the recursive *foo(..) from the previous section. Here’s how we’d run it:

function *foo(x) {
    if (x < 3) {
        x = yield *foo( x + 1 );
    }
    return x * 2;
}
var it = foo( 1 );
it.next();                // { value: 24, done: true }

In this case, the generator doesn’t really ever pause, as there’s no yield .. expression. Instead, yield * just keeps the current iteration step going via the recursive call. So, just one call to the iterator’s next() function fully runs the generator.

Now let’s consider a generator that will have multiple steps and thus multiple produced values:

function *foo() {
    yield 1;
    yield 2;
    yield 3;
}

We already know we can consume an iterator, even one attached to a generator like *foo(), with a for..of loop:

for (var v of foo()) {
    console.log( v );
}
// 1 2 3

Note: The for..of loop requires an iterable. A generator function reference (like foo) by itself is not an iterable; you must execute it with foo() to get the iterator (which is also an iterable, as we explained earlier in this chapter). You could theoretically extend the GeneratorPrototype (the prototype of all generator functions) with a Symbol.iterator function that essentially just does return this(). That would make the foo reference itself an iterable, which means for (var v of foo) { .. } (notice no () on foo) will work.
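If you're curious what that theoretical extension might look like, here's a rough sketch. It reaches the shared prototype via Object.getPrototypeOf(..) on a generator function, and the variable name `GeneratorFunctionProto` is just a label chosen for this example. Patching a built-in this way affects every generator function in the program, so treat it as a curiosity rather than a recommendation.

```javascript
function *foo() {
    yield 1;
    yield 2;
    yield 3;
}

// the object that generator functions like `foo` inherit from
var GeneratorFunctionProto = Object.getPrototypeOf( foo );

// illustration only: make every generator function an iterable
GeneratorFunctionProto[Symbol.iterator] = function() {
    return this();
};

var vals = [];
for (var v of foo) {        // notice no `()` on `foo`
    vals.push( v );
}
console.log( vals );        // [1, 2, 3]

// undo the patch so the rest of the program is unaffected
delete GeneratorFunctionProto[Symbol.iterator];
```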

Let’s instead iterate the generator manually:

function *foo() {
    yield 1;
    yield 2;
    yield 3;
}
var it = foo();
it.next();                // { value: 1, done: false }
it.next();                // { value: 2, done: false }
it.next();                // { value: 3, done: false }
it.next();                // { value: undefined, done: true }

If you look closely, there are three yield statements and four next() calls. That may seem like a strange mismatch. In fact, there will always be one more next() call than yield expression, assuming all are evaluated and the generator is fully run to completion.

But if you look at it from the opposite perspective (inside-out instead of outside-in), the matching between yield and next() makes more sense.

Recall that the yield .. expression will be completed by the value you resume the generator with. That means the argument you pass to next(..) completes whatever yield .. expression is currently paused waiting for a completion.

Let’s illustrate this perspective this way:

function *foo() {
    var x = yield 1;
    var y = yield 2;
    var z = yield 3;
    console.log( x, y, z );
}

In this snippet, each yield .. is sending a value out (1, 2, 3), but more directly, it’s pausing the generator to wait for a value. In other words, it’s almost like asking the question, “What value should I use here? I’ll wait to hear back.”

Now, here’s how we control *foo() to start it up:

var it = foo();
it.next();                // { value: 1, done: false }

That first next() call is starting up the generator from its initial paused state, and running it to the first yield. At the moment you call that first next(), there’s no yield .. expression waiting for a completion. If you passed a value to that first next() call, it would currently just be thrown away, because no yield is waiting to receive such a value.

Note: An early proposal for the “beyond ES6” timeframe would let you access a value passed to an initial next(..) call via a separate meta property (see Chapter 7) inside the generator.

Now, let’s answer the currently pending question, “What value should I assign to x?” We’ll answer it by sending a value to the next next(..) call:

it.next( "foo" );        // { value: 2, done: false }

Now, the x will have the value "foo", but we’ve also asked a new question, “What value should I assign to y?” And we answer:

it.next( "bar" );        // { value: 3, done: false }

Answer given, another question asked. Final answer:

it.next( "baz" );        // "foo" "bar" "baz"
                        // { value: undefined, done: true }

Now it should be clearer how each yield .. “question” is answered by the next next(..) call, and so the “extra” next() call we observed is always just the initial one that starts everything going.

Let’s put all those steps together:

var it = foo();
// start up the generator
it.next();                // { value: 1, done: false }
// answer first question
it.next( "foo" );        // { value: 2, done: false }
// answer second question
it.next( "bar" );        // { value: 3, done: false }
// answer third question
it.next( "baz" );        // "foo" "bar" "baz"
                        // { value: undefined, done: true }
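To confirm that a value sent with the very first next(..) call goes nowhere, here's a sketch with a shorter generator that returns what it received. The name `*ask()` and the "ignored" string are made up for the illustration.

```javascript
function *ask() {
    var x = yield 1;
    var y = yield 2;
    return [ x, y ];
}

var askIt = ask();
var r1 = askIt.next( "ignored" );    // no `yield` is waiting yet, so
                                     // "ignored" is thrown away
                                     // { value: 1, done: false }
var r2 = askIt.next( "foo" );        // x = "foo"
                                     // { value: 2, done: false }
var r3 = askIt.next( "bar" );        // y = "bar"
                                     // { value: ["foo","bar"], done: true }
```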

You can think of a generator as a producer of values, in which case each iteration is simply producing a value to be consumed.

But in a more general sense, perhaps it’s appropriate to think of generators as controlled, progressive code execution, much like the tasks queue example from the earlier “Custom Iterators” section.

Note: That perspective is exactly the motivation for how we’ll revisit generators in Chapter 4. Specifically, there’s no reason that next(..) has to be called right away after the previous next(..) finishes. While the generator’s inner execution context is paused, the rest of the program continues unblocked, including the ability for asynchronous actions to control when the generator is resumed.
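As a taste of that idea, here's a minimal sketch in which the resumption comes from a timer callback instead of the next line of code. It assumes an environment that provides setTimeout(..) (browsers, Node.js), and the names `*stepsLater()`, `log`, and "response" are invented for the example; Chapter 4 covers the more robust patterns.

```javascript
var log = [];

function *stepsLater() {
    log.push( "step 1" );
    var data = yield "need data";
    log.push( "step 2: " + data );
}

var laterIt = stepsLater();
var pendingRequest = laterIt.next().value;    // "need data"

// resume the paused generator at some later time
setTimeout( function(){
    laterIt.next( "response" );
    console.log( log );
    // ["step 1", "program continues", "step 2: response"]
}, 0 );

// the generator is paused, but the program is not blocked
log.push( "program continues" );
```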

Early Completion

As we covered earlier in this chapter, the iterator attached to a generator supports the optional return(..) and throw(..) methods. Both of them have the effect of aborting a paused generator immediately.

Consider:

function *foo() {
    yield 1;
    yield 2;
    yield 3;
}
var it = foo();
it.next();                // { value: 1, done: false }
it.return( 42 );        // { value: 42, done: true }
it.next();                // { value: undefined, done: true }

return(x) is kind of like forcing a return x to be processed at exactly that moment, such that you get the specified value right back. Once a generator is completed, either normally or early as shown, it no longer processes any code or returns any values.

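In addition to return(..) being callable manually, it’s also called automatically by the ES6 constructs that consume iterators, such as the for..of loop and array destructuring, when they stop consuming before the iterator is exhausted (for example, with a break out of a for..of loop). If the iteration instead runs all the way to the end, the generator simply completes on its own, and any finally clause runs as part of that normal completion.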

The purpose of this capability is so the generator can be notified if the controlling code is no longer going to iterate over it, so that it can perhaps do any cleanup tasks (freeing up resources, resetting status, etc.). Identical to a normal function cleanup pattern, the main way to accomplish this is to use a finally clause:

function *foo() {
    try {
        yield 1;
        yield 2;
        yield 3;
    }
    finally {
        console.log( "cleanup!" );
    }
}
for (var v of foo()) {
    console.log( v );
}
// 1 2 3
// cleanup!
var it = foo();
it.next();                // { value: 1, done: false }
it.return( 42 );        // cleanup!
                        // { value: 42, done: true }
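The automatic return(..) call is easiest to observe when a for..of loop is exited before the generator is exhausted. In this sketch, the names `*guarded()`, `cleanups`, and `seenEarly` are made up, and a counter stands in for the console.log(..) so the effect can be inspected afterward.

```javascript
var cleanups = 0;

function *guarded() {
    try {
        yield 1;
        yield 2;
        yield 3;
    }
    finally {
        cleanups++;        // stands in for real cleanup work
    }
}

var seenEarly = [];
for (var v of guarded()) {
    seenEarly.push( v );
    if (v == 2) {
        break;            // `for..of` calls `return()` on the iterator
    }
}

console.log( seenEarly, cleanups );    // [1, 2] 1
```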

Warning: Do not put a yield statement inside the finally clause! It’s valid and legal, but it’s a really terrible idea. It acts in a sense as deferring the completion of the return(..) call you made, as any yield .. expressions in the finally clause are respected to pause and send messages; you don’t immediately get a completed generator as expected. There’s basically no good reason to opt in to that crazy bad part, so avoid doing so!

In addition to the previous snippet showing how return(..) aborts the generator while still triggering the finally clause, it also demonstrates that a generator produces a whole new iterator each time it’s called. In fact, you can use multiple iterators attached to the same generator concurrently:

function *foo() {
    yield 1;
    yield 2;
    yield 3;
}
var it1 = foo();
it1.next();                // { value: 1, done: false }
it1.next();                // { value: 2, done: false }
var it2 = foo();
it2.next();                // { value: 1, done: false }
it1.next();                // { value: 3, done: false }
it2.next();                // { value: 2, done: false }
it2.next();                // { value: 3, done: false }
it2.next();                // { value: undefined, done: true }
it1.next();                // { value: undefined, done: true }

Early Abort

Instead of calling return(..), you can call throw(..). Just like return(x) is essentially injecting a return x into the generator at its current pause point, calling throw(x) is essentially like injecting a throw x at the pause point.

Other than the exception behavior (we cover what that means to try clauses in the next section), throw(..) produces the same sort of early completion that aborts the generator’s run at its current pause point. For example:

function *foo() {
    yield 1;
    yield 2;
    yield 3;
}
var it = foo();
it.next();                // { value: 1, done: false }
try {
    it.throw( "Oops!" );
}
catch (err) {
    console.log( err );    // Oops!
}
it.next();                // { value: undefined, done: true }

Because throw(..) basically injects a throw .. in replacement of the yield 1 line of the generator, and nothing handles this exception, it immediately propagates back out to the calling code, which handles it with a try..catch.

Unlike return(..), the iterator’s throw(..) method is never called automatically.

Of course, though not shown in the previous snippet, if a try..finally clause was waiting inside the generator when you call throw(..), the finally clause would be given a chance to complete before the exception is propagated back to the calling code.
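Here's a sketch of that case, with a try..finally waiting inside the generator when throw(..) is called. The `events` array and the generator name `*guardedThrow()` are invented so the ordering can be checked.

```javascript
var events = [];

function *guardedThrow() {
    try {
        yield 1;
        yield 2;
    }
    finally {
        events.push( "cleanup!" );
    }
}

var throwIt = guardedThrow();
throwIt.next();                // { value: 1, done: false }

try {
    throwIt.throw( "Oops!" );
}
catch (err) {
    events.push( "caught: " + err );
}

console.log( events );         // ["cleanup!", "caught: Oops!"]
```

The finally clause runs first, and only then does the exception reach the calling code.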

Error Handling

As we’ve already hinted, error handling with generators can be expressed with try..catch, which works in both inbound and outbound directions:

function *foo() {
    try {
        yield 1;
    }
    catch (err) {
        console.log( err );
    }
    yield 2;
    throw "Hello!";
}
var it = foo();
it.next();                // { value: 1, done: false }
try {
    it.throw( "Hi!" );    // Hi!
                        // { value: 2, done: false }
    it.next();
    console.log( "never gets here" );
}
catch (err) {
    console.log( err );    // Hello!
}

Errors can also propagate in both directions through yield * delegation:

function *foo() {
    try {
        yield 1;
    }
    catch (err) {
        console.log( err );
    }
    yield 2;
    throw "foo: e2";
}
function *bar() {
    try {
        yield *foo();
        console.log( "never gets here" );
    }
    catch (err) {
        console.log( err );
    }
}
var it = bar();
try {
    it.next();            // { value: 1, done: false }
    it.throw( "e1" );    // e1
                        // { value: 2, done: false }
    it.next();            // foo: e2
                        // { value: undefined, done: true }
}
catch (err) {
    console.log( "never gets here" );
}
it.next();                // { value: undefined, done: true }

When *foo() calls yield 1, the 1 value passes through *bar() untouched, as we’ve already seen.

But what’s most interesting about this snippet is that when *foo() calls throw "foo: e2", this error propagates to *bar() and is immediately caught by *bar()’s try..catch block. The error doesn’t pass through *bar() like the 1 value did.

*bar()’s catch then does a normal output of err ("foo: e2") and then *bar() finishes normally, which is why the { value: undefined, done: true } iterator result comes back from it.next().

If *bar() didn’t have a try..catch around the yield *.. expression, the error would of course propagate all the way out, and on the way through it still would complete (abort) *bar().
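For comparison, here's a sketch of that unguarded variation. The names `*innerGen()` and `*outerGen()` are placeholders; the only structural change is that the delegating generator has no try..catch around its yield *.

```javascript
var reached = false;
var caughtOutside;

function *innerGen() {
    yield 1;
    throw "inner: e";
}

function *outerGen() {
    yield *innerGen();        // no `try..catch` here
    reached = true;           // never gets here
}

var outIt = outerGen();
outIt.next();                 // { value: 1, done: false }

try {
    outIt.next();
}
catch (err) {
    caughtOutside = err;      // "inner: e"
}

var afterAbort = outIt.next();    // { value: undefined, done: true }
```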

Transpiling a Generator

Is it possible to represent a generator’s capabilities prior to ES6? It turns out it is, and there are several great tools that do so, including most notably Facebook’s Regenerator tool (https://facebook.github.io/regenerator/).

But just to better understand generators, let’s try our hand at manually converting. Basically, we’re going to create a simple closure-based state machine.

We’ll keep our source generator really simple:

function *foo() {
    var x = yield 42;
    console.log( x );
}

To start, we’ll need a function called foo() that we can execute, which needs to return an iterator:

function foo() {
    // ..
    return {
        next: function(v) {
            // ..
        }
        // we'll skip `return(..)` and `throw(..)`
    };
}

Now, we need some inner variable to keep track of where we are in the steps of our “generator”’s logic. We’ll call it state. There will be three states: 0 initially, 1 while waiting to fulfill the yield expression, and 2 once the generator is complete.

Each time next(..) is called, we need to process the next step, and then increment state. For convenience, we’ll put each step into a case clause of a switch statement, and we’ll hold that in an inner function called nextState(..) that next(..) can call. Also, because x is a variable across the overall scope of the “generator,” it needs to live outside the nextState(..) function.

Here it is all together (obviously somewhat simplified, to keep the conceptual illustration clearer):

function foo() {
    function nextState(v) {
        switch (state) {
            case 0:
                state++;
                // the `yield` expression
                return 42;
            case 1:
                state++;
                // `yield` expression fulfilled
                x = v;
                console.log( x );
                // the implicit `return`
                return undefined;
            // no need to handle state `2`
        }
    }
    var state = 0, x;
    return {
        next: function(v) {
            var ret = nextState( v );
            return { value: ret, done: (state == 2) };
        }
        // we'll skip `return(..)` and `throw(..)`
    };
}

And finally, let’s test our pre-ES6 “generator”:

var it = foo();
it.next();                // { value: 42, done: false }
it.next( 10 );            // 10
                        // { value: undefined, done: true }

Not bad, huh? Hopefully this exercise solidifies in your mind that generators are actually just simple syntax for state machine logic. That makes them widely applicable.
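The exercise skipped return(..), but one possible way to bolt it onto the same state machine is sketched below. This is an assumption-laden simplification, not how Regenerator or any real engine does it: it just jumps to the completed state, whereas a real generator would also have to run any pending finally clauses. The function is renamed `fooWithReturn()` to keep it distinct from the version above.

```javascript
function fooWithReturn() {
    function nextState(v) {
        switch (state) {
            case 0:
                state++;
                // the `yield` expression
                return 42;
            case 1:
                state++;
                // `yield` expression fulfilled
                x = v;
                console.log( x );
                // the implicit `return`
                return undefined;
            // no need to handle state `2`
        }
    }

    var state = 0, x;

    return {
        next: function(v) {
            var ret = nextState( v );
            return { value: ret, done: (state == 2) };
        },
        // assumed simplification: skip straight to "complete"
        return: function(v) {
            state = 2;
            return { value: v, done: true };
        }
    };
}

var smIt = fooWithReturn();
var sm1 = smIt.next();           // { value: 42, done: false }
var sm2 = smIt.return( 7 );      // { value: 7, done: true }
var sm3 = smIt.next();           // { value: undefined, done: true }
```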

Generator Uses

So, now that we much more deeply understand how generators work, what are they useful for?

We’ve seen two major patterns:

Producing a series of values: This usage can be simple (e.g., random strings or incremented numbers), or it can represent more structured data access (e.g., iterating over rows returned from a database query).

Either way, we use the iterator to control a generator so that some logic can be invoked for each call to next(..). Normal iterators on data structures merely pull values without any controlling logic.

Queue of tasks to perform serially: This usage often represents flow control for the steps in an algorithm, where each step requires retrieval of data from some external source. The fulfillment of each piece of data may be immediate, or may be asynchronously delayed.

From the perspective of the code inside the generator, the details of sync or async at a yield point are entirely opaque. Moreover, these details are intentionally abstracted away, so as not to obscure the natural sequential expression of steps with such implementation complications. Abstraction also means the implementations can be swapped/refactored often without touching the code in the generator at all.
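Both patterns can be sketched in a few lines. Everything named here (`*ids()`, `*taskSteps()`, `run(..)`, and the lookup table of answers) is hypothetical; in particular, `run(..)` answers each request synchronously from a table, but the generator's code would read the same if the answers arrived asynchronously.

```javascript
// pattern 1: producing a series of values (incremented IDs)
function *ids() {
    var id = 1;
    while (true) {
        yield id++;
    }
}

var idIt = ids();
var firstId = idIt.next().value;     // 1
var secondId = idIt.next().value;    // 2

// pattern 2: serial steps, each waiting on outside data
function *taskSteps() {
    var user = yield "load user";
    var orders = yield "load orders for " + user;
    return user + " has " + orders.length + " orders";
}

// hypothetical driver: answers each request from a lookup table
function run(gen, answers) {
    var it = gen();
    var res = it.next();
    while (!res.done) {
        res = it.next( answers[res.value] );
    }
    return res.value;
}

var summary = run( taskSteps, {
    "load user": "kyle",
    "load orders for kyle": [ "a", "b" ]
} );
console.log( summary );    // "kyle has 2 orders"
```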

When generators are viewed in light of these uses, they become a lot more than just a different or nicer syntax for a manual state machine. They are a powerful abstraction tool for organizing and controlling orderly production and consumption of data.