Chapter 1: Asynchrony: Now & Later - Concurrency - 《You Don't Know JS: Async & Performance（1st edition）》

Concurrency

Let’s imagine a site that displays a list of status updates (like a social network news feed) that progressively loads as the user scrolls down the list. To make such a feature work correctly, (at least) two separate “processes” will need to be executing simultaneously (i.e., during the same window of time, but not necessarily at the same instant).

Note: We’re using “process” in quotes here because they aren’t true operating system–level processes in the computer science sense. They’re virtual processes, or tasks, that represent a logically connected, sequential series of operations. We’ll simply prefer “process” over “task” because terminology-wise, it will match the definitions of the concepts we’re exploring.

The first “process” will respond to onscroll events (making Ajax requests for new content) as they fire when the user has scrolled the page further down. The second “process” will receive Ajax responses back (to render content onto the page).

Obviously, if a user scrolls fast enough, you may see two or more onscroll events fired during the time it takes to get the first response back and process it, and thus you’re going to have onscroll events and Ajax response events firing rapidly, interleaved with each other.

Concurrency is when two or more “processes” are executing simultaneously over the same period, regardless of whether their individual constituent operations happen in parallel (at the same instant on separate processors or cores) or not. You can think of concurrency then as “process”-level (or task-level) parallelism, as opposed to operation-level parallelism (separate-processor threads).

Note: Concurrency also introduces an optional notion of these “processes” interacting with each other. We’ll come back to that later.

For a given window of time (a few seconds’ worth of a user scrolling), let’s visualize each independent “process” as a series of events/operations:

“Process” 1 (onscroll events):

onscroll, request 1
onscroll, request 2
onscroll, request 3
onscroll, request 4
onscroll, request 5
onscroll, request 6
onscroll, request 7

“Process” 2 (Ajax response events):

response 1
response 2
response 3
response 4
response 5
response 6
response 7

It’s quite possible that an onscroll event and an Ajax response event could be ready to be processed at exactly the same moment. For example, let’s visualize these events in a timeline:

onscroll, request 1
onscroll, request 2          response 1
onscroll, request 3          response 2
                             response 3
onscroll, request 4
onscroll, request 5
onscroll, request 6          response 4
onscroll, request 7
                             response 6
                             response 5
                             response 7

But, going back to our notion of the event loop from earlier in the chapter, JS is only going to be able to handle one event at a time, so either onscroll, request 2 is going to happen first or response 1 is going to happen first, but they cannot happen at literally the same moment. Just like kids at a school cafeteria, no matter what crowd they form outside the doors, they’ll have to merge into a single line to get their lunch!

Let’s visualize the interleaving of all these events onto the event loop queue.

Event Loop Queue:

onscroll, request 1   <--- Process 1 starts
onscroll, request 2
response 1            <--- Process 2 starts
onscroll, request 3
response 2
response 3
onscroll, request 4
onscroll, request 5
onscroll, request 6
response 4
onscroll, request 7   <--- Process 1 finishes
response 6
response 5
response 7            <--- Process 2 finishes

“Process 1” and “Process 2” run concurrently (task-level parallel), but their individual events run sequentially on the event loop queue.

By the way, notice how response 6 and response 5 came back out of expected order?

The single-threaded event loop is one expression of concurrency (there are certainly others, which we’ll come back to later).
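To make the single-line merging concrete, here is a toy model of the queue. It is only a sketch: the `queue` array, the `handled` array, and the `enqueue(..)` helper are hypothetical names for illustration, and a real engine's event loop is considerably more involved than a plain array drained in a loop.

```javascript
// Toy model only: a plain array standing in for the event loop queue.
var queue = [];
var handled = [];

// Hypothetical helper: an event from either "process" lands on the
// one shared queue in the order it becomes ready.
function enqueue(processName, label) {
    queue.push( { process: processName, label: label } );
}

// Events from the two "processes" become ready in an interleaved order.
enqueue( "Process 1", "onscroll, request 1" );
enqueue( "Process 1", "onscroll, request 2" );
enqueue( "Process 2", "response 1" );
enqueue( "Process 1", "onscroll, request 3" );
enqueue( "Process 2", "response 2" );

// Drain the queue: exactly one event is handled per iteration,
// never two at the same moment.
while (queue.length > 0) {
    var evt = queue.shift();
    handled.push( evt.process + ": " + evt.label );
}

console.log( handled.join( "\n" ) );
```

Both “processes” make progress over the same window of time, yet every entry in `handled` was processed strictly one after another.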

Noninteracting

As two or more “processes” are interleaving their steps/events concurrently within the same program, they don’t necessarily need to interact with each other if the tasks are unrelated. If they don’t interact, nondeterminism is perfectly acceptable.

For example:

var res = {};
function foo(results) {
    res.foo = results;
}
function bar(results) {
    res.bar = results;
}
// ajax(..) is some arbitrary Ajax function given by a library
ajax( "http://some.url.1", foo );
ajax( "http://some.url.2", bar );

foo() and bar() are two concurrent “processes,” and it’s nondeterminate which order they will be fired in. But we’ve constructed the program so it doesn’t matter what order they fire in, because they act independently and as such don’t need to interact.

This is not a “race condition” bug, as the code will always work correctly, regardless of the ordering.
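One way to convince yourself of this is to force both orderings and compare the outcomes. The sketch below assumes a hypothetical `fakeAjax(..)` stand-in (a timer, not a real network call) and a hypothetical `run(..)` wrapper; the `pending` counter exists only so the demo knows when to report, and is not part of the pattern itself.

```javascript
// Hypothetical stand-in for `ajax(..)`: calls `cb` with a fake payload
// after `delay` milliseconds. No real network request is made.
function fakeAjax(url, delay, cb) {
    setTimeout( function(){
        cb( "data from " + url );
    }, delay );
}

// Runs the noninteracting pattern with the given delays, then hands
// the finished `res` object to `done`.
function run(delay1, delay2, done) {
    var res = {};
    var pending = 2;    // demo bookkeeping only

    function finish() {
        pending--;
        if (pending === 0) {
            done( res );
        }
    }
    function foo(results) {
        res.foo = results;
        finish();
    }
    function bar(results) {
        res.bar = results;
        finish();
    }

    fakeAjax( "http://some.url.1", delay1, foo );
    fakeAjax( "http://some.url.2", delay2, bar );
}

function report(label) {
    return function(res){
        console.log( label, res.foo, "|", res.bar );
    };
}

run( 5, 15, report( "url.1 answers first:" ) );
run( 15, 5, report( "url.2 answers first:" ) );
```

Whichever callback fires first, `res.foo` and `res.bar` end up holding the same values, because each callback writes only to its own property.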

Interaction

More commonly, concurrent “processes” will by necessity interact, indirectly through scope and/or the DOM. When such interaction occurs, you need to coordinate it to prevent “race conditions,” as described earlier.

Here’s a simple example of two concurrent “processes” that interact because of implied ordering, which is only sometimes broken:

var res = [];
function response(data) {
    res.push( data );
}
// ajax(..) is some arbitrary Ajax function given by a library
ajax( "http://some.url.1", response );
ajax( "http://some.url.2", response );

The concurrent “processes” are the two response() calls that will be made to handle the Ajax responses. They can happen in either-first order.

Let’s assume the expected behavior is that res[0] has the results of the "http://some.url.1" call, and res[1] has the results of the "http://some.url.2" call. Sometimes that will be the case, but sometimes they’ll be flipped, depending on which call finishes first. There’s a pretty good likelihood that this nondeterminism is a “race condition” bug.

Note: Be extremely wary of assumptions you might tend to make in these situations. For example, it’s not uncommon for a developer to observe that "http://some.url.2" is “always” much slower to respond than "http://some.url.1", perhaps by virtue of what tasks they’re doing (e.g., one performing a database task and the other just fetching a static file), so the observed ordering seems to always be as expected. Even if both requests go to the same server, and it intentionally responds in a certain order, there’s no real guarantee of what order the responses will arrive back in the browser.
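The flipped outcome is easy to reproduce once the timing is under your control. This sketch assumes a hypothetical `fakeAjax(..)` stand-in that uses timers instead of real requests; the delays are arbitrary and chosen so that the second request answers first.

```javascript
// Hypothetical stand-in for `ajax(..)`: responds after `delay` ms with
// an object that records which URL it came from.
function fakeAjax(url, delay, cb) {
    setTimeout( function(){
        cb( { url: url, body: "data from " + url } );
    }, delay );
}

var res = [];

function response(data) {
    res.push( data );
}

// The first request happens to be the slower one this time.
fakeAjax( "http://some.url.1", 20, response );
fakeAjax( "http://some.url.2", 5, response );

setTimeout( function(){
    console.log( res[0].url );  // "http://some.url.2"
    console.log( res[1].url );  // "http://some.url.1"
}, 50 );
```

Nothing in `response(..)` ties a result to a position, so arrival order alone decides where each result lands.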

So, to address such a race condition, you can coordinate ordering interaction:

var res = [];
function response(data) {
    if (data.url == "http://some.url.1") {
        res[0] = data;
    }
    else if (data.url == "http://some.url.2") {
        res[1] = data;
    }
}
// ajax(..) is some arbitrary Ajax function given by a library
ajax( "http://some.url.1", response );
ajax( "http://some.url.2", response );

Regardless of which Ajax response comes back first, we inspect the data.url (assuming one is returned from the server, of course!) to figure out which position the response data should occupy in the res array. res[0] will always hold the "http://some.url.1" results and res[1] will always hold the "http://some.url.2" results. Through simple coordination, we eliminated the “race condition” nondeterminism.
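If the response does not echo its URL, a variation on the same idea is to decide the slot when the request is made rather than when the response arrives. This is a swapped-in closure-based technique, not the approach from the listing above; `fakeAjax(..)` and `responseFor(..)` are hypothetical names used only for this sketch.

```javascript
// Hypothetical stand-in for `ajax(..)`: responds after `delay` ms.
function fakeAjax(url, delay, cb) {
    setTimeout( function(){
        cb( "data from " + url );
    }, delay );
}

var res = [];

// Returns a callback that always writes into the slot chosen at
// request time, no matter when the response shows up.
function responseFor(index) {
    return function response(data) {
        res[index] = data;
    };
}

// The first request is deliberately the slower one.
fakeAjax( "http://some.url.1", 20, responseFor( 0 ) );
fakeAjax( "http://some.url.2", 5, responseFor( 1 ) );

setTimeout( function(){
    console.log( res[0] );  // "data from http://some.url.1"
    console.log( res[1] );  // "data from http://some.url.2"
}, 50 );
```

The trade-off is one extra closure per request in exchange for not depending on what the server includes in its response.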

The same reasoning from this scenario would apply if multiple concurrent function calls were interacting with each other through the shared DOM, like one updating the contents of a <div> and the other updating the style or attributes of the <div> (e.g., to make the DOM element visible once it has content). You probably wouldn’t want to show the DOM element before it had content, so the coordination must ensure proper ordering interaction.

Some concurrency scenarios are always broken (not just sometimes) without coordinated interaction. Consider:

var a, b;
function foo(x) {
    a = x * 2;
    baz();
}
function bar(y) {
    b = y * 2;
    baz();
}
function baz() {
    console.log(a + b);
}
// ajax(..) is some arbitrary Ajax function given by a library
ajax( "http://some.url.1", foo );
ajax( "http://some.url.2", bar );

In this example, whether foo() or bar() fires first, it will always cause baz() to run too early (either a or b will still be undefined, so a + b evaluates to NaN), but the second invocation of baz() will work, as both a and b will be available.

There are different ways to address such a condition. Here’s one simple way:

var a, b;
function foo(x) {
    a = x * 2;
    if (a && b) {
        baz();
    }
}
function bar(y) {
    b = y * 2;
    if (a && b) {
        baz();
    }
}
function baz() {
    console.log( a + b );
}
// ajax(..) is some arbitrary Ajax function given by a library
ajax( "http://some.url.1", foo );
ajax( "http://some.url.2", bar );

The if (a && b) conditional around the baz() call is traditionally called a “gate,” because we’re not sure what order a and b will arrive, but we wait for both of them to get there before we proceed to open the gate (call baz()).
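One caveat with a truthiness test such as `a && b`: a legitimate value of `0` is falsy, so a gate written that way would never open for it. A minimal sketch of a stricter gate follows, assuming hypothetical `fakeAjax(..)` and `gateIsOpen()` helpers and an `output` array added only so the result can be inspected.

```javascript
// Hypothetical stand-in for `ajax(..)`: delivers `value` after `delay` ms.
function fakeAjax(value, delay, cb) {
    setTimeout( function(){
        cb( value );
    }, delay );
}

var a, b;
var output = [];

// The gate checks "has it arrived yet?" rather than "is it truthy?"
function gateIsOpen() {
    return a !== undefined && b !== undefined;
}

function foo(x) {
    a = x * 2;
    if (gateIsOpen()) {
        baz();
    }
}

function bar(y) {
    b = y * 2;
    if (gateIsOpen()) {
        baz();
    }
}

function baz() {
    output.push( a + b );
    console.log( a + b );
}

fakeAjax( 0, 20, foo );     // `a` becomes 0, which is falsy
fakeAjax( 21, 5, bar );     // `b` becomes 42 and arrives first
// logs 42 exactly once, after both values have arrived
```

With the `a && b` version of the gate, this particular pair of responses would never call `baz()` at all.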

Another concurrency interaction condition you may run into is sometimes called a “race,” but more correctly called a “latch.” It’s characterized by “only the first one wins” behavior. Here, nondeterminism is acceptable, in that you are explicitly saying it’s OK for the “race” to the finish line to have only one winner.

Consider this broken code:

var a;
function foo(x) {
    a = x * 2;
    baz();
}
function bar(x) {
    a = x / 2;
    baz();
}
function baz() {
    console.log( a );
}
// ajax(..) is some arbitrary Ajax function given by a library
ajax( "http://some.url.1", foo );
ajax( "http://some.url.2", bar );

Whichever one (foo() or bar()) fires last will not only overwrite the assigned a value from the other, but it will also duplicate the call to baz() (likely undesired).

So, we can coordinate the interaction with a simple latch, to let only the first one through:

var a;
function foo(x) {
    if (a == undefined) {
        a = x * 2;
        baz();
    }
}
function bar(x) {
    if (a == undefined) {
        a = x / 2;
        baz();
    }
}
function baz() {
    console.log( a );
}
// ajax(..) is some arbitrary Ajax function given by a library
ajax( "http://some.url.1", foo );
ajax( "http://some.url.2", bar );

The if (a == undefined) conditional allows only the first of foo() or bar() through, and the second (and indeed any subsequent) calls would just be ignored. There’s just no virtue in coming in second place!
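To watch the latch at work, the same pattern can be run against timers. The `fakeAjax(..)` helper and the `calls` array below are hypothetical additions for this sketch; the latch itself is unchanged.

```javascript
// Hypothetical stand-in for `ajax(..)`: delivers `value` after `delay` ms.
function fakeAjax(value, delay, cb) {
    setTimeout( function(){
        cb( value );
    }, delay );
}

var a;
var calls = [];     // records every baz() call, for inspection only

function foo(x) {
    if (a == undefined) {
        a = x * 2;
        baz();
    }
}

function bar(x) {
    if (a == undefined) {
        a = x / 2;
        baz();
    }
}

function baz() {
    calls.push( a );
    console.log( a );
}

fakeAjax( 10, 20, foo );    // slower: arrives second and is ignored
fakeAjax( 10, 5, bar );     // faster: wins, so `a` becomes 5
// logs 5 exactly once
```

Swap the two delays and `foo()` wins instead, logging 20 once.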

Note: In all these scenarios, we’ve been using global variables for simplistic illustration purposes, but there’s nothing about our reasoning here that requires it. As long as the functions in question can access the variables (via scope), they’ll work as intended. Relying on lexically scoped variables (see the Scope & Closures title of this book series), and in fact global variables as in these examples, is one obvious downside to these forms of concurrency coordination. As we go through the next few chapters, we’ll see other ways of coordination that are much cleaner in that respect.

Cooperation

Another expression of concurrency coordination is called “cooperative concurrency.” Here, the focus isn’t so much on interacting via value sharing in scopes (though that’s obviously still allowed!). The goal is to take a long-running “process” and break it up into steps or batches so that other concurrent “processes” have a chance to interleave their operations into the event loop queue.

For example, consider an Ajax response handler that needs to run through a long list of results to transform the values. We’ll use Array#map(..) to keep the code shorter:

var res = [];
// `response(..)` receives array of results from the Ajax call
function response(data) {
    // add onto existing `res` array
    res = res.concat(
        // make a new transformed array with all `data` values doubled
        data.map( function(val){
            return val * 2;
        } )
    );
}
// ajax(..) is some arbitrary Ajax function given by a library
ajax( "http://some.url.1", response );
ajax( "http://some.url.2", response );

If "http://some.url.1" gets its results back first, the entire list will be mapped into res all at once. If it’s a few thousand records or fewer, this is not generally a big deal. But if it’s, say, 10 million records, that can take a while to run (several seconds on a powerful laptop, much longer on a mobile device, etc.).

While such a “process” is running, nothing else in the page can happen: no other response(..) calls, no UI updates, not even user events like scrolling, typing, button clicking, and the like. That’s pretty painful.

So, to make a more cooperatively concurrent system, one that’s friendlier and doesn’t hog the event loop queue, you can process these results in asynchronous batches, after each one “yielding” back to the event loop to let other waiting events happen.

Here’s a very simple approach:

var res = [];
// `response(..)` receives array of results from the Ajax call
function response(data) {
    // let's just do 1000 at a time
    var chunk = data.splice( 0, 1000 );
    // add onto existing `res` array
    res = res.concat(
        // make a new transformed array with all `chunk` values doubled
        chunk.map( function(val){
            return val * 2;
        } )
    );
    // anything left to process?
    if (data.length > 0) {
        // async schedule next batch
        setTimeout( function(){
            response( data );
        }, 0 );
    }
}
// ajax(..) is some arbitrary Ajax function given by a library
ajax( "http://some.url.1", response );
ajax( "http://some.url.2", response );

We process the data set in maximum-sized chunks of 1,000 items. By doing so, we ensure a short-running “process,” even if that means many more subsequent “processes,” as the interleaving onto the event loop queue will give us a much more responsive (performant) site/app.
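A scaled-down run makes the interleaving visible. This sketch shrinks the batch size to 3, calls `response(..)` directly instead of through an Ajax library, and adds a hypothetical `log` array plus one unrelated timer that stands in for some other waiting event.

```javascript
var res = [];
var log = [];   // records what ran and in which order, for inspection only

function response(data) {
    // tiny batches so the effect shows up with only a few items
    var chunk = data.splice( 0, 3 );

    res = res.concat(
        chunk.map( function(val){
            return val * 2;
        } )
    );
    log.push( "batch of " + chunk.length );

    // anything left to process?
    if (data.length > 0) {
        // yield to the event loop before the next batch
        setTimeout( function(){
            response( data );
        }, 0 );
    }
}

// Some other "process" that is waiting for its turn.
setTimeout( function(){
    log.push( "other event" );
}, 0 );

response( [ 1, 2, 3, 4, 5, 6, 7 ] );

setTimeout( function(){
    console.log( res.join( "," ) );     // 2,4,6,8,10,12,14
    console.log( log.join( " -> " ) );
}, 50 );
```

The other event typically shows up in `log` between batches rather than after the last one, which is exactly the opening that a single unchunked `map(..)` call would not provide.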

Of course, we’re not interaction-coordinating the ordering of any of these “processes,” so the order of results in res won’t be predictable. If ordering was required, you’d need to use interaction techniques like those we discussed earlier, or ones we will cover in later chapters of this book.

We use the setTimeout(..0) (hack) for async scheduling, which basically just means “stick this function at the end of the current event loop queue.”

Note: setTimeout(..0) is not technically inserting an item directly onto the event loop queue. The timer will insert the event at its next opportunity. For example, two subsequent setTimeout(..0) calls would not be strictly guaranteed to be processed in call order, so it is possible to see various conditions like timer drift where the ordering of such events isn’t predictable. In Node.js, a similar approach is process.nextTick(..). Despite how convenient (and usually more performant) it would be, there’s not a single direct way (at least yet) across all environments to ensure async event ordering. We cover this topic in more detail in the next section.
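The difference between the two scheduling calls can be observed directly in Node.js. This sketch is Node-only (it relies on the `process` global) and uses a hypothetical `order` array to record what ran when.

```javascript
// Node.js only: `process.nextTick(..)` is not available in browsers.
var order = [];

setTimeout( function(){
    order.push( "setTimeout(..0)" );
}, 0 );

process.nextTick( function(){
    order.push( "process.nextTick" );
} );

order.push( "synchronous code" );

setTimeout( function(){
    console.log( order.join( " -> " ) );
    // synchronous code -> process.nextTick -> setTimeout(..0)
}, 20 );
```

The `process.nextTick(..)` callback runs as soon as the currently executing code finishes, before the timer callback gets its turn, even though the timer was requested first.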