The Art of Graceful Reloading

Author: Roberto De Ioris

The following article is language-agnostic and, albeit uWSGI-specific, some of its initial considerations apply to other application servers and platforms too.

All of the described techniques assume a modern (>= 1.4) uWSGI release with the master process enabled.

What is a “graceful reload”?

During the life-cycle of your webapp you will reload it hundreds of times.

You need reloading for code updates, you need reloading for changes in the uWSGI configuration, you need reloading to reset the state of your app.

Basically, reloading is one of the simplest, most frequent and most dangerous operations you perform.

So, why “graceful”?

Take a traditional (and highly suggested) architecture: a proxy/load balancer (like nginx) forwards requests to one or more uWSGI daemons listening on various addresses.

If you manage your reloads as “stop the instance, start the instance”, the time slice between the two phases will result in a brutal disservice for your customers.

The main trick for avoiding it is: not closing the file descriptors mapped to the uWSGI daemon addresses and abusing the Unix fork() behaviour (read: file descriptors are inherited by default) to exec() the uwsgi binary again.
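Outside of uWSGI, the mechanism can be sketched in a few lines of Python (a minimal sketch, not uWSGI’s actual implementation; note that Python 3 marks descriptors non-inheritable by default, so we flip that back):

    import os
    import socket
    import sys

    if len(sys.argv) > 1:
        # we were exec()'d: adopt the listening socket we inherited
        s = socket.socket(fileno=int(sys.argv[1]))
    else:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(('127.0.0.1', 3031))
        s.listen(100)
    # file descriptors survive exec() only if marked inheritable (Python >= 3.4)
    os.set_inheritable(s.fileno(), True)
    # ... serve requests; to reload, re-exec without ever closing the socket:
    # os.execv(sys.executable, [sys.executable, __file__, str(s.fileno())])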

The result is your proxy enqueuing requests to the socket until the latter is able to accept() them again, with the user/customer only seeing a little slowdown in the first response (the time required for the app to be fully loaded again).

Another important step of graceful reload is to avoid destroying workers/threads that are still managing requests. Obviously requests could be stuck, so you should have a timeout for running workers (in uWSGI it is called the “worker’s mercy” and it has a default value of 60 seconds).

These kinds of tricks are pretty easy to accomplish, and basically all of the modern servers/application servers do it (more or less).

But, as always, the world is an ugly place, a lot of problems arise, and the “inherited sockets” approach is often not enough.

Things go wrong

We have seen that holding the uWSGI sockets alive allows the proxy webserver to enqueue requests without spitting out errors to the clients. This is true only if your app restarts fast, and, sadly, this may not always happen.

Frameworks like Ruby on Rails or Zope start up really slowly by default, your app could start up slowly by itself, or your machine could be so overloaded that every process spawn (fork()) takes ages.

In addition to this, your site could be so famous that even if your app restarts in a couple of seconds, the queue of your sockets could fill up, forcing the proxy server to raise an error.

Do not forget, your workers/threads that are still running requests could block the reload (for various reasons) for longer than your proxy server can tolerate.

Finally, you could have made an application error in your just-committed code, so uWSGI will not start, or will start sending wrong things or errors…

Reloads (brutal or graceful) can easily fail.

The listen queue

Let’s start with the dream of every webapp developer: success.

Your app is visited by thousands of clients and you obviously make money with it. Unfortunately, it is a very complex app and requires 10 seconds to warm up.

During graceful reloads, you expect new clients to wait 10 seconds (best case) before they start seeing content, but, unfortunately, you have hundreds of concurrent requests: the first 100 customers will wait during the server warm-up, while the others will get an error from the proxy.

This happens because the default size of uWSGI’s listen queue is 100 slots. Before you ask: it is a conservative value, chosen to stay within the maximum allowed by default by your kernel.

Each operating system has a default limit (Linux has 128, for example), so before increasing it you need to increase your kernel limit too.

So, once your kernel is ready, you can increase the listen queue to the maximum number of users you expect to enqueue during a reload.

To increase the listen queue you use the --listen <n> option, where <n> is the maximum number of slots.

To raise kernel limits, you should check your OS docs. Some examples:

  • sysctl kern.ipc.somaxconn on FreeBSD
  • /proc/sys/net/core/somaxconn on Linux.
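For example, on Linux you might raise the kernel ceiling first and then mirror it in uWSGI (4096 is purely illustrative; size it to the backlog you expect during a reload):

    # raise the kernel's maximum listen backlog (Linux)
    sysctl -w net.core.somaxconn=4096

and then in the uWSGI config:

    [uwsgi]
    master = true
    ; must not exceed the kernel limit raised above
    listen = 4096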

Note

This is only one of the reasons to tune the listen queue, but do not blindly set it to huge values as a way to increase availability.

Proxy timeouts

This is another thing you need to check if your reloads take a lot of time.

Generally, proxies allow you to set two timeouts:

  • connect — the maximum amount of time the proxy will wait for a successful connection.
  • read — the maximum amount of time the server will wait for data before giving up.

When tuning reloads, only the connect timeout matters. This timeout comes into play in the time slice between uWSGI’s bind to an interface (or the inheritance of it) and the call to accept().
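With nginx in front, for example, the relevant directives for a uwsgi-protocol upstream look like this (values and address are illustrative):

    location / {
        include uwsgi_params;
        uwsgi_pass 127.0.0.1:3031;
        # give the backend up to 60 seconds to accept() again after a reload
        uwsgi_connect_timeout 60s;
        uwsgi_read_timeout 60s;
    }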

Waiting instead of errors is good, no errors and no waiting is even better

This is the focus of this article. We have seen how to increase the tolerance of your proxy during application server reloading. The customers will wait instead of getting scary errors, but we all want to make money, so why force them to wait?

We want zero-downtime and zero-wait.

Preforking VS lazy-apps VS lazy

This is one of the controversial choices of the uWSGI project.

By default uWSGI loads the whole application in the first process and, after the app is loaded, it fork()s itself multiple times. This is the common Unix pattern; it may greatly reduce the memory usage of your app, allows a lot of funny tricks, and on some languages may bring you a lot of headaches.

Despite its name, uWSGI was born as a Perl application server (it was not called uWSGI and it was not open source), and in the Perl world preforking is generally the blessed way.

This is not true for a lot of other languages, platforms and frameworks, so before you start dealing with uWSGI you should choose how to manage fork() in your stack.

Seen from the “graceful reloading” point of view, preforking speeds things up dramatically: your app is loaded only one time, and spawning additional workers will be really fast. Avoiding disk access for each worker of your stack decreases startup times, especially for frameworks or languages that do a lot of disk access to find modules.

Unfortunately, the preforking approach forces you to reload the whole stack whenever you make code changes instead of reloading only the workers.

In addition to this, your app could require preforking, or could completely crash under it, because of the way it has been developed.

lazy-apps mode instead loads your application one time per worker. It will require about O(n) time to load it (where n is the number of workers), will very probably consume more memory, but will run in a more consistent and clean environment.

Remember: lazy-apps is different from lazy. The first one only instructs uWSGI to load the application one time per worker, while the second is more invasive (and generally discouraged) as it changes a lot of internal defaults.
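As a minimal sketch, switching to per-worker loading is a single option (the app path is a placeholder):

    [uwsgi]
    master = true
    workers = 4
    ; load the application once per worker instead of once in the master
    lazy-apps = true
    wsgi-file = /srv/app/myapp.wsgi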

The following approaches will show you how to accomplish zero-downtime/wait reloads in both preforking and lazy modes.

Note

Each approach has pros and cons, choose carefully.

Standard (default/boring) graceful reload (aka SIGHUP)

To trigger it, you can:

  • send SIGHUP to the master
  • write r to The Master FIFO
  • use the --touch-reload option
  • call the uwsgi.reload() API.
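For example, from a shell (paths are illustrative, assuming a pidfile and a master FIFO are configured):

    # any one of these triggers the standard graceful reload
    kill -HUP `cat /var/run/uwsgi.pid`
    echo r > /var/run/master.fifo
    touch /srv/app/reload.txt    # if started with --touch-reload=/srv/app/reload.txt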

In preforking and lazy-apps mode, it will:

  • Wait for running workers.
  • Close all of the file descriptors except the ones mapped to sockets.
  • Call exec() on itself.

In lazy mode, it will:

  • Wait for running workers.

  • Restart all of them (this means you cannot change uWSGI options during this kind of reload).

Warning

lazy is discouraged!

Pros:

  • easy to manage
  • no corner-case problems
  • no inconsistent states
  • basically full reset of the instance.

Cons:

  • the ones we have seen before
  • listen queue filling up
  • stuck workers
  • potentially long waiting times.

Workers reloading in lazy-apps mode

Requires the --lazy-apps option.

To trigger it, write w to The Master FIFO.

It will wait for running workers and then restart each of them.
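If you prefer file-based triggering, a sketch combining this reload mode with its touch option might look like the following (the file path is an example):

    [uwsgi]
    master = true
    workers = 4
    lazy-apps = true
    ; touching this file restarts only the workers, not the whole instance
    touch-workers-reload = /srv/app/reload-workers.txt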

Pros:

  • avoids restarting the whole instance.

Cons:

  • no user-experience improvements over the standard graceful reload; it is only a shortcut for situations where code updates do not imply instance reconfiguration.

Chain reloading (lazy apps)

Requires the --lazy-apps option.

To trigger it, write c to The Master FIFO or use the --touch-chain-reload option.

This is the first approach that improves the user experience. When triggered, it will restart one worker at a time, and the next worker is not reloaded until the previous one is ready to accept new requests.
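A minimal sketch of triggering it from a shell (the FIFO path is an example):

    # restart the workers one by one, waiting for each to be ready
    echo c > /var/run/master.fifo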

Pros:

  • potentially highly reduces waiting time for clients
  • reduces the load on the machine during reloads (no multiple processes loading the same code).

Cons:

  • only useful for code updates
  • you need a good number of workers to get a better user experience.

Zerg mode

Requires a zerg server or a zerg pool.

To trigger it, run the instance in zerg mode.

This is the first approach that uses multiple instances of the same application to improve the user experience.

Zerg mode works by making use of the venerable “fd passing over Unix sockets” technique.

Basically, an external process (the zerg server/pool) binds to the various sockets required by your app. Your uWSGI instance, instead of binding by itself, asks the zerg server/pool to pass it the file descriptor. This means multiple unrelated instances can ask for the same file descriptors and work together.

Zerg mode was born to improve auto-scalability, but soon became one of the most loved approaches for zero-downtime reloading.

Now, examples.

Spawn a zerg pool exposing 127.0.0.1:3031 to the Unix socket /var/run/pool1:

    [uwsgi]
    master = true
    zerg-pool = /var/run/pool1:127.0.0.1:3031

Now spawn one or more instances attached to the zerg pool:

    [uwsgi]
    ; this will give access to 127.0.0.1:3031 to the instance
    zerg = /var/run/pool1

When you want to update code or options, just spawn a new instance attached to the zerg, and shut down the old one when the new one is ready to accept requests.
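A manual swap might look like this (file names are illustrative, assuming each instance is given its own Master FIFO):

    # spawn the new instance attached to the same zerg pool
    uwsgi --ini app.ini --master-fifo /var/run/new-instance.fifo
    # once it accepts requests, gracefully shut down the old one
    echo q > /var/run/old-instance.fifo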

The so-called “zerg dance” is a trick for automating this kind of reload. There are various ways to accomplish it; the objective is to automatically “pause” or “destroy” the old instance when the new one is fully ready and able to accept requests. More on this below.

Pros:

  • potentially the silver bullet
  • allows instances with different options to cooperate for the same app.

Cons:

  • requires an additional process
  • can be hard to master
  • a reload requires a copy of the whole uWSGI stack.

The Zerg Dance: Pausing instances

We all make mistakes, so sysadmins must hone their skills in fast disaster recovery. Focusing only on avoiding mistakes is a waste of time; unfortunately, we are all human.

Rolling back deployments could be your life-saver.

We have seen how zerg mode allows us to have multiple instances asking for the same sockets. In the previous section we used it to spawn a new instance working together with the old one. Now, instead of shutting down the old instance, why not “pause” it? A paused instance is like the standby mode of your TV: it consumes very few resources, but you can bring it back very quickly.

“Zerg Dance” is the battle name for the procedure of continuously swapping instances during reloads. Every reload results in a “sleeping” instance and a running one. Each following reload destroys the old sleeping instance, turns the old running one into the sleeping one, and so on.

There are literally dozens of ways to accomplish the “Zerg Dance”; the fact that you can easily use scripts in your reloading procedures makes this approach extremely powerful and customizable.

Here we will see the one that requires zero scripting. It could be the least versatile (and requires at least uWSGI 1.9.21), but should be a good starting point for improvements.

The Master FIFO is the best way to manage instances instead of relying on Unix signals. Basically, you write single-char commands to govern the instance.

The funny thing about Master FIFOs is that you can have many of them configured for your instance and swap one for another very easily.
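For reference, a few of the single-char commands used in this article, written from a shell (the FIFO path is an example):

    echo r > /var/run/master.fifo    # graceful reload
    echo w > /var/run/master.fifo    # reload only the workers
    echo q > /var/run/master.fifo    # graceful shutdown
    echo Q > /var/run/master.fifo    # brutal shutdown
    echo p > /var/run/master.fifo    # pause/resume the instance
    echo 1 > /var/run/master.fifo    # make FIFO slot 1 the active one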

An example will clarify things.

We spawn an instance with 3 Master FIFOs: new (the default one), running and sleeping:

    [uwsgi]
    ; fifo '0'
    master-fifo = /var/run/new.fifo
    ; fifo '1'
    master-fifo = /var/run/running.fifo
    ; fifo '2'
    master-fifo = /var/run/sleeping.fifo
    ; attach to zerg
    zerg = /var/run/pool1
    ; other options ...

By default the “new” one will be active (read: will be able to process commands).

Now we want to spawn a new instance that, once ready to accept requests, will put the old one in sleeping mode. To do it, we will use uWSGI’s advanced hooks. Hooks allow you to “make things” at various phases of uWSGI’s life cycle. When the new instance is ready, we want to force the old instance to start listening on the sleeping FIFO and to enter “pause” mode:

    [uwsgi]
    ; fifo '0'
    master-fifo = /var/run/new.fifo
    ; fifo '1'
    master-fifo = /var/run/running.fifo
    ; fifo '2'
    master-fifo = /var/run/sleeping.fifo
    ; attach to zerg
    zerg = /var/run/pool1

    ; hooks

    ; destroy the currently sleeping instance
    if-exists = /var/run/sleeping.fifo
    hook-accepting1-once = writefifo:/var/run/sleeping.fifo Q
    endif =
    ; force the currently running instance to become the sleeping one (slot 2) and place it in pause mode
    if-exists = /var/run/running.fifo
    hook-accepting1-once = writefifo:/var/run/running.fifo 2p
    endif =
    ; force this instance to become the running one (slot 1)
    hook-accepting1-once = writefifo:/var/run/new.fifo 1

The hook-accepting1-once phase is run once per instance, soon after the first worker is ready to accept requests. The writefifo command allows writing to a FIFO without failing if the other peer is not connected (this is different from a simple write command, which would fail or completely block when dealing with bad FIFOs).

Note

Both features have been added only in uWSGI 1.9.21; with older releases you can use the --hook-post-app option instead of --hook-accepting1-once, but you will lose the “once” feature, so it will work reliably only in preforking mode.

Instead of writefifo you can use the shell variant: exec:echo <string> > <fifo>.
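A hedged sketch of that fallback, reusing the FIFO layout above (and, as noted, reliable only when preforking):

    ; pre-1.9.21: shell out once the app is loaded
    hook-post-app = exec:echo 1 > /var/run/new.fifo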

Now start instances with the same config file over and over again. If all goes well, you should always end up with two instances, one sleeping and one running.

Finally, if you want to bring back a sleeping instance, just do:

    # destroy the running instance
    echo Q > /var/run/running.fifo

    # unpause the sleeping instance and set it as the running one
    echo p1 > /var/run/sleeping.fifo

Pros:

  • truly zero-downtime reload.

Cons:

  • requires high-level uWSGI and Unix skills.

SO_REUSEPORT (Linux >= 3.9 and BSDs)

On recent Linux kernels and modern BSDs you may try the --reuse-port option. This option allows multiple unrelated instances to bind to the same network address. You may see it as a kernel-level zerg mode. Basically, all of the zerg approaches can be followed.

Once you add --reuse-port to your instance, all of the sockets will have the SO_REUSEPORT flag set.
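A minimal sketch (the address is an example):

    [uwsgi]
    master = true
    ; every socket created by this instance gets the SO_REUSEPORT flag,
    ; so a second identical instance can bind to the same address
    reuse-port = true
    socket = 127.0.0.1:3031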

Pros:

  • similar to zerg mode, could be even easier to manage.

Cons:

  • requires kernel support
  • could lead to inconsistent states
  • you lose the ability to use TCP addresses as a way to prevent accidental multiple instances running.

The Black Art (for rich and brave people): master forking

To trigger it, write f to The Master FIFO.

This is the most dangerous of the ways to reload, but once mastered, it could lead to pretty cool results.

The approach is: call fork() in the master, close all of the file descriptors except the socket-related ones, and exec() a new uWSGI instance.

You will end up with two identical uWSGI instances working on the same set of sockets.

The scary thing about it is how easy it is to trigger (just write a single char to the Master FIFO)…
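That single char, from a shell (the FIFO path is an example):

    # fork the master: you now have two instances on the same sockets
    echo f > /var/run/master.fifo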

With a bit of mastery you can implement the zerg dance on top of it.

Pros:

  • does not require kernel support nor an additional process
  • pretty fast.

Cons:

  • a whole copy for each reload
  • inconsistent states all over the place (pidfiles, logging, etc.; the Master FIFO commands could help fix them).

Subscription system

This is probably the best approach when you can count on multiple servers. You add the “fastrouter” between your proxy server (e.g., nginx) and your instances.

Instances will “subscribe” to the fastrouter, which will pass requests from the proxy server (nginx) to them, load balancing and constantly monitoring them all.

Subscriptions are simple UDP packets that instruct the fastrouter which domain maps to which instance or instances.
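For context, a sketch of the fastrouter side (addresses and ports are examples):

    [uwsgi]
    ; accept requests from the proxy server on this address
    fastrouter = 192.168.0.1:3032
    ; accept subscription packets from instances on UDP port 4040
    fastrouter-subscription-server = 192.168.0.1:4040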

As you can subscribe, you can unsubscribe too, and this is where the magic happens:

    [uwsgi]
    subscribe-to = 192.168.0.1:4040:unbit.it
    unsubscribe-on-graceful-reload = true
    ; all of the required options ...

Adding unsubscribe-on-graceful-reload forces the instance to send an “unsubscribe” packet to the fastrouter, so no requests will be sent to it until it comes back.

Pros:

  • low-cost zero-downtime
  • a KISS approach (finally).

Cons:

  • requires a subscription server (like the fastrouter), which introduces overhead (even if we are talking about microseconds).

Inconsistent states

Sadly, most of the approaches involving copies of the whole instance (like the Zerg Dance or master forking) lead to inconsistent states.

Take, for example, an instance writing pidfiles: when starting a copy of it, that pidfile will be overwritten.

If you carefully plan your configurations, you can avoid inconsistent states, but thanks to The Master FIFO you can manage some of them (read: the most common ones):

  • l command will reopen logfiles
  • P command will update all of the instance pidfiles.
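For example (the FIFO path is illustrative):

    echo l > /var/run/master.fifo    # reopen logfiles
    echo P > /var/run/master.fifo    # rewrite pidfiles with the current PIDs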

Fighting inconsistent states with the Emperor

If you manage your instances with the Emperor, you can use its features to avoid (or reduce the number of) inconsistent states.

Giving each instance a different symbolic link name will allow you to map files (like pidfiles or logs) to different paths:

    [uwsgi]
    logto = /var/log/%n.log
    safe-pidfile = /var/run/%n.pid
    ; and so on ...

The safe-pidfile option works similarly to pidfile but performs the write a little later in the loading process. This avoids overwriting the value when app loading fails, with the consequent loss of a valid PID number.

Dealing with ultra-lazy apps (like Django)

Some applications or frameworks (like Django) may load the vast majority of their code only at the first request. This means that customers will continue to experience slowdowns during reloads even when using things like zerg mode or similar.

This problem is hard to solve (impossible?) in the application server itself, so you should find a way to force your app to load itself ASAP. A good trick (read: works with Django) is to call the entry-point function (like the WSGI callable) in the app itself:

    def application(environ, sr):
        sr('200 OK', [('Content-Type', 'text/plain')])
        yield "Hello"

    application({}, lambda x, y: None)  # call the entry-point function

You may need to pass CGI vars in the environ to make a true request: it depends on the WSGI app.
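A minimal sketch of such a warm-up request with a slightly more realistic environ (assuming / is a routable path in your app; the keys are the usual WSGI/CGI minimum):

    import io
    import sys

    fake_environ = {
        'REQUEST_METHOD': 'GET',
        'PATH_INFO': '/',
        'QUERY_STRING': '',
        'SERVER_NAME': 'localhost',
        'SERVER_PORT': '80',
        'SERVER_PROTOCOL': 'HTTP/1.1',
        'wsgi.version': (1, 0),
        'wsgi.url_scheme': 'http',
        'wsgi.input': io.BytesIO(b''),
        'wsgi.errors': sys.stderr,
        'wsgi.multithread': False,
        'wsgi.multiprocess': True,
        'wsgi.run_once': False,
    }

    # iterate the response so generator-based apps really execute their body
    for chunk in application(fake_environ, lambda status, headers: None):
        pass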

Finally: Do not blindly copy & paste!

Please, turn on your brain and try to adapt the configs shown here to your needs, or invent new ones.

Each app and system is different from the others.

Experiment before making a choice.

References

  • The Master FIFO
  • Hooks
  • Zerg mode
  • The uWSGI FastRouter
  • uWSGI Subscription Server