Dealing with NaNs

Having a model yielding NaNs or Infs is quite common if some of the tiny components in your model are not set properly. NaNs are hard to deal with because sometimes they are caused by a bug or error in the code, sometimes by the numerical stability of your computational environment (library versions, etc.), and sometimes by the algorithm itself. Here we try to outline common issues which cause the model to yield NaNs, as well as provide nails and hammers to diagnose them.

Check Hyperparameters and Weight Initialization

Most frequently, the cause is that some of the hyperparameters, especially the learning rate, are set incorrectly. A high learning rate can blow up your whole model into NaN outputs even within one epoch of training. So the first and easiest fix is to lower it. Keep halving your learning rate until you start to get reasonable output values.
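As a rough sketch of that strategy (train_one_epoch below is a hypothetical placeholder for your own training loop, not a Theano function):

    import numpy as np

    learning_rate = 0.1
    while learning_rate > 1e-6:
        losses = train_one_epoch(learning_rate)  # hypothetical: your training code
        if not np.any(np.isnan(losses)):
            break                                # outputs look reasonable; keep this rate
        learning_rate /= 2.0                     # NaNs appeared: halve and retry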

Other hyperparameters may also play a role. For example, do your training algorithms involve regularization terms? If so, are their corresponding penalties set reasonably? Search a wider hyperparameter space with a few (one or two) training epochs each to see if the NaNs disappear, as in the sketch below.
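A minimal sketch of such a coarse search, assuming a hypothetical train helper that runs a couple of epochs and returns the final loss:

    import itertools
    import numpy as np

    # Try a small grid of learning rates and L2 penalties, two epochs each
    for lr, l2 in itertools.product([1e-1, 1e-2, 1e-3], [0.0, 1e-4, 1e-2]):
        loss = train(lr=lr, l2_penalty=l2, n_epochs=2)  # hypothetical helper
        print(lr, l2, "NaN" if np.isnan(loss) else "ok")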

Some models can be very sensitive to the initialization of weight vectors. If those weights are not initialized in a proper range, then it is not surprising that the model ends up yielding NaNs.
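For instance, a minimal sketch of range-aware initialization using Glorot/Xavier-style uniform bounds (the layer sizes here are arbitrary, and the exact bound depends on your activation function):

    import numpy as np
    import theano

    rng = np.random.RandomState(1234)
    n_in, n_out = 784, 500                     # arbitrary layer sizes
    bound = np.sqrt(6.0 / (n_in + n_out))      # Glorot uniform bound
    W = theano.shared(
        rng.uniform(low=-bound, high=bound, size=(n_in, n_out))
           .astype(theano.config.floatX),
        name='W')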

Run in NanGuardMode, DebugMode, or MonitorMode

If adjusting hyperparameters doesn't work for you, you can still get help from Theano's NanGuardMode. Change the mode of your Theano function to NanGuardMode and run it again. NanGuardMode will monitor all input/output variables in each node and raise an error if NaNs are detected. For how to use NanGuardMode, please refer to nanguardmode. Using optimizer_including=alloc_empty_to_zeros with NanGuardMode could be helpful to detect NaNs; for more information, please refer to NaN Introduced by AllocEmpty.
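For example, compiling a function under NanGuardMode might look like the following sketch (the function itself is just a placeholder):

    import numpy as np
    import theano
    import theano.tensor as T
    from theano.compile.nanguardmode import NanGuardMode

    x = T.matrix('x')
    w = theano.shared(np.random.randn(5, 7).astype(theano.config.floatX))
    y = T.dot(x, w)
    # Raise an error as soon as a NaN, Inf, or very large value appears
    f = theano.function([x], y,
                        mode=NanGuardMode(nan_is_error=True,
                                          inf_is_error=True,
                                          big_is_error=True))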

DebugMode can also help. Run your code in DebugMode with the flag mode=DebugMode,DebugMode.check_py=False. This will give you a clue about which op is causing the problem, and then you can inspect that op in more detail. For details on using DebugMode, please refer to debugmode.
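You can also select DebugMode when compiling, as in this sketch; equivalently, run your unmodified script with THEANO_FLAGS='mode=DebugMode,DebugMode.check_py=False'.

    import theano
    import theano.tensor as T

    x = T.dvector('x')
    f = theano.function([x], T.log(x), mode='DebugMode')
    f([1.0, 2.0])  # each op is now checked at every call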

Theano’s MonitorMode provides another helping hand. It can be used to step through the execution of a function. You can inspect the inputs and outputs of each node being executed when the function is called. For how to use it, please check “How do I Step through a Compiled Function?”.
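Following the example from that FAQ entry, a sketch of printing every node's inputs and outputs as the function executes:

    import theano
    import theano.tensor as T

    def inspect_inputs(i, node, fn):
        print(i, node, "inputs:", [inp[0] for inp in fn.inputs])

    def inspect_outputs(i, node, fn):
        print(i, node, "outputs:", [out[0] for out in fn.outputs])

    x = T.dscalar('x')
    f = theano.function([x], [5 * x],
                        mode=theano.compile.MonitorMode(
                            pre_func=inspect_inputs,
                            post_func=inspect_outputs))
    f(3)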

Numerical Stability

After you have located the op which causes the problem, it may turn out that the NaNs yielded by that op are related to numerical issues. For example, 1 / log(p(x) + 1) may result in NaNs for those nodes that have learned to yield a low probability p(x) for some input x.
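A common remedy is to clip the value away from the dangerous region before the unstable op. A hedged sketch (eps is an assumed small constant, not a value prescribed by Theano):

    import theano.tensor as T

    p_raw = T.vector('p_raw')        # the model's probability output
    eps = 1e-7                       # assumed small constant
    p = T.clip(p_raw, eps, 1.0)      # keep p away from 0
    y = 1.0 / T.log(p + 1.0)         # denominator now bounded away from 0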

Algorithm Related

In the most difficult situations, you may go through all the above steps and find nothing wrong. If these methods fail to uncover the cause, there is a good chance that something is wrong with your algorithm. Go back to the mathematics and check that everything is derived correctly.

CUDA-Specific Option

The Theano flag nvcc.fastmath=True can generate NaNs. Don't set this flag while debugging NaNs.

NaN Introduced by AllocEmpty

AllocEmpty is used by many operations such as scan to allocate memory without properly clearing it. The reason is that the allocated memory will subsequently be overwritten. However, this can sometimes introduce NaNs, depending on the operation and what was previously stored in the memory it is working on. For instance, trying to zero out memory using a multiplication before applying an operation can produce NaNs if NaNs are already present in the memory, since 0 * NaN => NaN.
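A tiny NumPy demonstration of that pitfall:

    import numpy as np

    buf = np.array([1.0, np.nan, 3.0])  # stands in for uninitialized memory
    print(buf * 0.0)                    # -> [ 0. nan  0.]; the NaN survives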

Using optimizer_including=alloc_empty_to_zeros replaces AllocEmpty by Alloc{0}, which is helpful to diagnose where NaNs come from. Please note that when running in NanGuardMode, this optimizer is not included by default. Therefore, it might be helpful to use them both together.
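One way to combine them is to set both flags before Theano is imported, as in this sketch; passing the same string through the THEANO_FLAGS environment variable on the command line works equally well.

    import os
    # THEANO_FLAGS is read when Theano is imported, so set it first
    os.environ['THEANO_FLAGS'] = ('mode=NanGuardMode,'
                                  'optimizer_including=alloc_empty_to_zeros')
    import theano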