Microperformance

OK, until now we’ve been dancing around various microperformance issues and generally looking disfavorably upon obsessing about them. I want to take just a moment to address them directly.

The first thing you need to get more comfortable with when thinking about performance benchmarking your code is that the code you write is not always the code the engine actually runs. We briefly looked at that topic back in Chapter 1 when we discussed statement reordering by the compiler, but here we’re going to suggest the compiler can sometimes decide to run different code than you wrote, not just in different orders but different in substance.

Let’s consider this piece of code:

    var foo = 41;

    (function(){
        (function(){
            (function(baz){
                var bar = foo + baz;
                // ..
            })(1);
        })();
    })();

You may think about the foo reference in the innermost function as needing to do a three-level scope lookup. We covered in the Scope & Closures title of this book series how lexical scope works, and the fact that the compiler generally caches such lookups, so that referencing foo from different scopes doesn’t really cost anything extra in practice.

But there’s something deeper to consider. What if the compiler realizes that foo isn’t referenced anywhere else but that one location, and it further notices that the value never is anything except the 41 as shown?

Isn’t it quite possible and acceptable that the JS compiler could decide to just remove the foo variable entirely, and inline the value, such as this:

    (function(){
        (function(){
            (function(baz){
                var bar = 41 + baz;
                // ..
            })(1);
        })();
    })();

Note: Of course, the compiler could probably also do a similar analysis and rewrite with the baz variable here, too.
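
Taken one step further, since baz only ever receives the value 1 here, the engine could conceivably fold the whole expression down to a constant. This is purely speculative, to illustrate the point; no engine is obligated to perform this exact rewrite:

    (function(){
        (function(){
            (function(){
                var bar = 42;
                // ..
            })();
        })();
    })();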

When you begin to think about your JS code as being a hint or suggestion to the engine of what to do, rather than a literal requirement, you realize that a lot of the obsession over discrete syntactic minutiae is most likely unfounded.

Another example:

    function factorial(n) {
        if (n < 2) return 1;
        return n * factorial( n - 1 );
    }

    factorial( 5 );     // 120

Ah, the good old-fashioned “factorial” algorithm! You might assume that the JS engine will run that code mostly as is. And to be honest, it might; I’m not really sure.

But as an anecdote, the same code expressed in C and compiled with advanced optimizations would result in the compiler realizing that the call factorial(5) can just be replaced with the constant value 120, eliminating the function and call entirely!

Moreover, some engines employ a technique called “unrolling recursion,” where the engine realizes that the recursion you’ve expressed can actually be done “easier” (i.e., more optimally) with a loop. It’s possible the preceding code could be rewritten by a JS engine to run as:

    function factorial(n) {
        if (n < 2) return 1;

        var res = 1;
        for (var i=n; i>1; i--) {
            res *= i;
        }

        return res;
    }

    factorial( 5 );     // 120

Now, let’s imagine that in the earlier snippet you had been worried about whether n * factorial(n-1) or n *= factorial(--n) runs faster. Maybe you even did a performance benchmark to try to figure out which was better. But you’d be missing the fact that, in the bigger context, the engine may not run either line of code, because it may unroll the recursion!

Speaking of --, --n versus n-- is often cited as one of those places where you can optimize by choosing the --n version, because theoretically it requires less effort down at the assembly level of processing.

That sort of obsession is basically nonsense in modern JavaScript. That’s the kind of thing you should be letting the engine take care of. You should write the code that makes the most sense. Compare these three for loops:

    // Option 1
    for (var i=0; i<10; i++) {
        console.log( i );
    }

    // Option 2
    for (var i=0; i<10; ++i) {
        console.log( i );
    }

    // Option 3
    for (var i=-1; ++i<10; ) {
        console.log( i );
    }

Even if you have some theory where the second or third option is more performant than the first option by a tiny bit, which is dubious at best, the third loop is more confusing because you have to start with -1 for i to account for the fact that ++i pre-increment is used. And the difference between the first and second options is really quite irrelevant.

It’s entirely possible that a JS engine may see a place where i++ is used and realize that it can safely replace it with the ++i equivalent, which means your time spent deciding which one to pick was completely wasted and the outcome moot.
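
The reason that substitution is safe: i++ and ++i only differ when the expression’s result is actually used, and in a typical for loop’s update clause the result is simply discarded. The variable names below are just for illustration:

    var i = 5;
    var j = 5;

    var a = i++;    // a: 5, i: 6 (post-increment evaluates to the old value)
    var b = ++j;    // b: 6, j: 6 (pre-increment evaluates to the new value)

    // as bare statements the results are discarded,
    // so these two lines have identical effects:
    i++;
    ++j;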

Here’s another common example of silly microperformance obsession:

    var x = [ .. ];

    // Option 1
    for (var i=0; i < x.length; i++) {
        // ..
    }

    // Option 2
    for (var i=0, len = x.length; i < len; i++) {
        // ..
    }

The theory here goes that because the length of the x array ostensibly doesn’t change, you should cache it in the variable len to avoid paying the cost of x.length being consulted on each iteration of the loop.

If you run performance benchmarks comparing x.length usage to caching its value in a len variable, you’ll find that while the theory sounds nice, in practice any measured differences are statistically insignificant.

In fact, in some engines like v8, it can be shown (http://mrale.ph/blog/2014/12/24/array-length-caching.html) that you could make things slightly worse by pre-caching the length instead of letting the engine figure it out for you. Don’t try to outsmart your JavaScript engine; when it comes to performance optimization, you’ll probably lose.

Not All Engines Are Alike

The different JS engines in various browsers can all be “spec compliant” while having radically different ways of handling code. The JS specification doesn’t require anything performance-related, with one exception: ES6’s “Tail Call Optimization,” covered later in this chapter.

Each engine is free to decide which operations will receive its optimization attention, perhaps trading off lesser performance on other operations. It can be very tenuous to find an approach for an operation that reliably runs faster in all browsers.

There’s a movement among some in the JS dev community, especially those who work with Node.js, to analyze the specific internal implementation details of the v8 JavaScript engine and make decisions about writing JS code that is tailored to take best advantage of how v8 works. You can actually achieve a surprisingly high degree of performance optimization with such endeavors, so the payoff for the effort can be quite high.

Some commonly cited examples (https://github.com/petkaantonov/bluebird/wiki/Optimization-killers) for v8:

  • Don’t pass the arguments variable from one function to any other function, as such “leakage” slows down the function implementation.
  • Isolate a try..catch in its own function. Browsers struggle to optimize any function containing a try..catch, so moving that construct into its own function contains the de-optimization harm while letting the surrounding code remain optimizable (see the sketch after this list).
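
For instance, here’s a minimal sketch of that second tip; the function names and the JSON-parsing scenario are hypothetical, purely for illustration:

    // any de-optimization from `try..catch` is contained
    // to this small function
    function tryParse(str) {
        try {
            return JSON.parse( str );
        }
        catch (err) {
            return null;
        }
    }

    // the surrounding "hot" code has no `try..catch` of its own,
    // so the engine remains free to optimize it
    function processAll(records) {
        var results = [];
        for (var i=0; i<records.length; i++) {
            results.push( tryParse( records[i] ) );
        }
        return results;
    }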

But rather than focus on those tips specifically, let’s sanity check the v8-only optimization approach in a general sense.

Are you genuinely writing code that only needs to run in one JS engine? Even if your code is entirely intended for Node.js right now, is the assumption that v8 will always be the used JS engine reliable? Is it possible that someday a few years from now, there’s another server-side JS platform besides Node.js that you choose to run your code on? What if what you optimized for before is now a much slower way of doing that operation on the new engine?

Or what if your code always stays running on v8 from here on out, but v8 decides at some point to change the way some set of operations works such that what used to be fast is now slow, and vice versa?

These scenarios aren’t just theoretical, either. It used to be that it was faster to put multiple string values into an array and then call join("") on the array to concatenate the values than to just use + concatenation directly with the values. The historical reason for this is nuanced, but it has to do with internal implementation details about how string values were stored and managed in memory.

As a result, “best practice” advice disseminated across the industry at the time, suggesting developers always use the array join(..) approach. And many followed.
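
In concrete terms, the two contrasted approaches look something like this; the HTML-building scenario is just an illustration:

    var names = [ "frank", "sally", "jo" ];

    // approach 1: collect the pieces in an array, `join(..)` once at the end
    var parts = [ "<ul>" ];
    for (var i=0; i<names.length; i++) {
        parts.push( "<li>", names[i], "</li>" );
    }
    parts.push( "</ul>" );
    var html1 = parts.join( "" );

    // approach 2: direct `+` / `+=` concatenation as you go
    var html2 = "<ul>";
    for (var i=0; i<names.length; i++) {
        html2 += "<li>" + names[i] + "</li>";
    }
    html2 += "</ul>";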

Except, somewhere along the way, the JS engines changed approaches for internally managing strings, and specifically put in optimizations for + concatenation. They didn’t slow down join(..) per se, but they put more effort into helping + usage, as it was still quite a bit more widespread.

Note: The practice of standardizing or optimizing some particular approach based mostly on its existing widespread usage is often called (metaphorically) “paving the cowpath.”

Once that new approach to handling strings and concatenation took hold, unfortunately all the code out in the wild that was using array join(..) to concatenate strings was then sub-optimal.

Another example: at one time, the Opera browser differed from other browsers in how it handled the boxing/unboxing of primitive wrapper objects (see the Types & Grammar title of this book series). As such, their advice to developers was to use a String object instead of the primitive string value if properties like length or methods like charAt(..) needed to be accessed. This advice may have been correct for Opera at the time, but it was exactly the opposite of what was best for the other major contemporary browsers, as they had optimizations specifically for the string primitives and not their object wrapper counterparts.

I think these various gotchas are at least possible, if not likely, for code even today. So I’m very cautious about making wide-ranging performance optimizations in my JS code based purely on engine implementation details, especially if those details are only true of a single engine.

The reverse is also something to be wary of: you shouldn’t necessarily change a piece of code to work around one engine’s difficulty in running it with acceptable performance.

Historically, IE has borne the brunt of many such frustrations, given that there have been plenty of scenarios in older IE versions where it struggled with some performance aspect that other major browsers of the time seemed not to have much trouble with. The string concatenation discussion we just had was actually a real concern back in the IE6 and IE7 days, where it was possible to get better performance out of join(..) than +.

But it’s troublesome to suggest that just one browser’s trouble with performance is justification for using a code approach that quite possibly could be sub-optimal in all other browsers. Even if the browser in question has a large market share for your site’s audience, it may be more practical to write the proper code and rely on the browser to update itself with better optimizations eventually.

“There is nothing more permanent than a temporary hack.” Chances are, the code you write now to work around some performance bug will probably outlive the performance bug in the browser itself.

In the days when a browser only updated once every five years, that was a tougher call to make. But as it stands now, browsers across the board are updating at a much more rapid interval (though obviously the mobile world still lags), and they’re all competing to optimize web features better and better.

If you run across a case where a browser does have a performance wart that others don’t suffer from, make sure to report it to them through whatever means you have available. Most browsers have open public bug trackers suitable for this purpose.

Tip: I’d only suggest working around a performance issue in a browser if it was a really drastic show-stopper, not just an annoyance or frustration. And I’d be very careful to check that the performance hack didn’t have noticeable negative side effects in another browser.

Big Picture

Instead of worrying about all these microperformance nuances, we should instead be looking at big-picture types of optimizations.

How do you know what’s big picture or not? First, you have to understand whether your code is running on a critical path. If it’s not on the critical path, chances are your optimizations are not worth much.

Ever heard the admonition, “that’s premature optimization!”? It comes from a famous quote from Donald Knuth: “premature optimization is the root of all evil.” Many developers cite this quote to suggest that most optimizations are “premature” and are thus a waste of effort. The truth is, as usual, more nuanced.

Here is Knuth’s quote, in context:

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. [emphasis added]

(http://web.archive.org/web/20130731202547/http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf, Computing Surveys, Vol 6, No 4, December 1974)

I believe it’s a fair paraphrasing to say that Knuth meant: “non-critical path optimization is the root of all evil.” So the key is to figure out if your code is on the critical path — you should optimize it! — or not.

I’d even go so far as to say this: no amount of time spent optimizing critical paths is wasted, no matter how little is saved; but no amount of optimization on noncritical paths is justified, no matter how much is saved.

If your code is on the critical path, such as a “hot” piece of code that’s going to be run over and over again, or in UX-critical places where users will notice, like an animation loop or CSS style updates, then you should spare no effort in trying to employ relevant, measurably significant optimizations.

For example, consider a critical path animation loop that needs to coerce a string value to a number. There are of course multiple ways to do that (see the Types & Grammar title of this book series), but which one if any is the fastest?

    var x = "42";   // need number `42`

    // Option 1: let implicit coercion automatically happen
    var y = x / 2;

    // Option 2: use `parseInt(..)`
    var y = parseInt( x, 10 ) / 2;

    // Option 3: use `Number(..)`
    var y = Number( x ) / 2;

    // Option 4: use the `+` unary operator
    var y = +x / 2;

    // Option 5: use the `|` (bitwise OR) operator
    var y = (x | 0) / 2;

Note: I will leave it as an exercise to the reader to set up a test if you’re interested in examining the minute differences in performance among these options.
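
If you do want to run that experiment, one reasonable setup uses a benchmarking library such as Benchmark.js (https://benchmarkjs.com). This is only a sketch, and it assumes that library is already loaded:

    var suite = new Benchmark.Suite();

    suite
        .add( "implicit coercion", function(){
            var y = "42" / 2;
        } )
        .add( "parseInt(..)", function(){
            var y = parseInt( "42", 10 ) / 2;
        } )
        .add( "Number(..)", function(){
            var y = Number( "42" ) / 2;
        } )
        .add( "unary +", function(){
            var y = +"42" / 2;
        } )
        .add( "| 0", function(){
            var y = ("42" | 0) / 2;
        } )
        .on( "cycle", function(evt){
            console.log( String( evt.target ) );
        } )
        .run();

Bear in mind that the engine may well optimize away these trivially dead snippets, which is exactly the kind of distortion this chapter keeps warning about, so treat any numbers you get with suspicion.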

When considering these different options, as they say, “One of these things is not like the others.” parseInt(..) does the job, but it also does a lot more — it parses the string rather than just coercing. You can probably guess, correctly, that parseInt(..) is a slower option, and you should probably avoid it.

Of course, if x can ever be a value that needs parsing, such as "42px" (like from a CSS style lookup), then parseInt(..) really is the only suitable option!
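
A quick comparison makes the distinction concrete:

    parseInt( "42px", 10 );     // 42 (parsing stops at the first non-digit)
    Number( "42px" );           // NaN (coercion requires the whole string to be numeric)
    +"42px";                    // NaN (same coercion rules as `Number(..)`)

These are standard JS results, shown here just to underline why parsing and coercing are not interchangeable.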

Number(..) is also a function call. From a behavioral perspective, it’s identical to the + unary operator option, but it may in fact be a little slower, requiring more machinery to execute the function. Of course, it’s also possible that the JS engine recognizes this behavioral symmetry and just handles the inlining of Number(..)’s behavior (aka +x) for you!

But remember, obsessing about +x versus x | 0 is in most cases likely a waste of effort. This is a microperformance issue, and one that you shouldn’t let dictate/degrade the readability of your program.

While performance is very important in critical paths of your program, it’s not the only factor. Among several options that are roughly similar in performance, readability should be another important concern.