Interpolating data

xarray offers flexible interpolation routines, which have a similar interfaceto our indexing.

Note

interp requires scipy installed.

Scalar and 1-dimensional interpolation

Interpolating a DataArray works mostly like labeledindexing of a DataArray,

  1. In [1]: da = xr.DataArray(np.sin(0.3 * np.arange(12).reshape(4, 3)),
  2. ...: [('time', np.arange(4)),
  3. ...: ('space', [0.1, 0.2, 0.3])])
  4. ...:
  5.  
  6. # label lookup
  7. In [2]: da.sel(time=3)
  8. Out[2]:
  9. <xarray.DataArray (space: 3)>
  10. array([ 0.42738 , 0.14112 , -0.157746])
  11. Coordinates:
  12. time int64 3
  13. * space (space) float64 0.1 0.2 0.3
  14.  
  15. # interpolation
  16. In [3]: da.interp(time=2.5)
  17. Out[3]:
  18. <xarray.DataArray (space: 3)>
  19. array([0.700614, 0.502165, 0.258859])
  20. Coordinates:
  21. * space (space) float64 0.1 0.2 0.3
  22. time float64 2.5

Similar to the indexing, interp() also accepts anarray-like, which gives the interpolated result as an array.

  1. # label lookup
  2. In [4]: da.sel(time=[2, 3])
  3. Out[4]:
  4. <xarray.DataArray (time: 2, space: 3)>
  5. array([[ 0.973848, 0.863209, 0.675463],
  6. [ 0.42738 , 0.14112 , -0.157746]])
  7. Coordinates:
  8. * time (time) int64 2 3
  9. * space (space) float64 0.1 0.2 0.3
  10.  
  11. # interpolation
  12. In [5]: da.interp(time=[2.5, 3.5])
  13. Out[5]:
  14. <xarray.DataArray (time: 2, space: 3)>
  15. array([[0.700614, 0.502165, 0.258859],
  16. [ nan, nan, nan]])
  17. Coordinates:
  18. * space (space) float64 0.1 0.2 0.3
  19. * time (time) float64 2.5 3.5

To interpolate data with a numpy.datetime64() coordinate you can pass a string.

  1. In [6]: da_dt64 = xr.DataArray([1, 3],
  2. ...: [('time', pd.date_range('1/1/2000', '1/3/2000', periods=2))])
  3. ...:
  4.  
  5. In [7]: da_dt64.interp(time='2000-01-02')
  6. Out[7]:
  7. <xarray.DataArray ()>
  8. array(2.)
  9. Coordinates:
  10. time datetime64[ns] 2000-01-02

The interpolated data can be merged into the original DataArrayby specifying the time periods required.

  1. In [8]: da_dt64.interp(time=pd.date_range('1/1/2000', '1/3/2000', periods=3))
  2. Out[8]:
  3. <xarray.DataArray (time: 3)>
  4. array([1., 2., 3.])
  5. Coordinates:
  6. * time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03

Interpolation of data indexed by a CFTimeIndex is alsoallowed. See Non-standard calendars and dates outside the Timestamp-valid range for examples.

Note

Currently, our interpolation only works for regular grids.Therefore, similarly to sel(),only 1D coordinates along a dimension can be used as theoriginal coordinate to be interpolated.

Multi-dimensional Interpolation

Like sel(), interp()accepts multiple coordinates. In this case, multidimensional interpolationis carried out.

  1. # label lookup
  2. In [9]: da.sel(time=2, space=0.1)
  3. Out[9]:
  4. <xarray.DataArray ()>
  5. array(0.973848)
  6. Coordinates:
  7. time int64 2
  8. space float64 0.1
  9.  
  10. # interpolation
  11. In [10]: da.interp(time=2.5, space=0.15)
  12. Out[10]:
  13. <xarray.DataArray ()>
  14. array(0.601389)
  15. Coordinates:
  16. time float64 2.5
  17. space float64 0.15

Array-like coordinates are also accepted:

  1. # label lookup
  2. In [11]: da.sel(time=[2, 3], space=[0.1, 0.2])
  3. Out[11]:
  4. <xarray.DataArray (time: 2, space: 2)>
  5. array([[0.973848, 0.863209],
  6. [0.42738 , 0.14112 ]])
  7. Coordinates:
  8. * time (time) int64 2 3
  9. * space (space) float64 0.1 0.2
  10.  
  11. # interpolation
  12. In [12]: da.interp(time=[1.5, 2.5], space=[0.15, 0.25])
  13. Out[12]:
  14. <xarray.DataArray (time: 2, space: 2)>
  15. array([[0.888106, 0.867052],
  16. [0.601389, 0.380512]])
  17. Coordinates:
  18. * time (time) float64 1.5 2.5
  19. * space (space) float64 0.15 0.25

interp_like() method is a useful shortcut. Thismethod interpolates an xarray object onto the coordinates of another xarrayobject. For example, if we want to compute the difference betweentwo DataArray s (da and other) staying on slightlydifferent coordinates,

  1. In [13]: other = xr.DataArray(np.sin(0.4 * np.arange(9).reshape(3, 3)),
  2. ....: [('time', [0.9, 1.9, 2.9]),
  3. ....: ('space', [0.15, 0.25, 0.35])])
  4. ....:

it might be a good idea to first interpolate da so that it will stay on thesame coordinates of other, and then subtract it.interp_like() can be used for such a case,

  1. # interpolate da along other's coordinates
  2. In [14]: interpolated = da.interp_like(other)
  3.  
  4. In [15]: interpolated
  5. Out[15]:
  6. <xarray.DataArray (time: 3, space: 3)>
  7. array([[0.786691, 0.911298, nan],
  8. [0.912444, 0.788879, nan],
  9. [0.347678, 0.069452, nan]])
  10. Coordinates:
  11. * time (time) float64 0.9 1.9 2.9
  12. * space (space) float64 0.15 0.25 0.35

It is now possible to safely compute the difference other - interpolated.

Interpolation methods

We use scipy.interpolate.interp1d() for 1-dimensional interpolation andscipy.interpolate.interpn() for multi-dimensional interpolation.

The interpolation method can be specified by the optional method argument.

  1. In [16]: da = xr.DataArray(np.sin(np.linspace(0, 2 * np.pi, 10)), dims='x',
  2. ....: coords={'x': np.linspace(0, 1, 10)})
  3. ....:
  4.  
  5. In [17]: da.plot.line('o', label='original')
  6. Out[17]: [<matplotlib.lines.Line2D at 0x7f3425653710>]
  7.  
  8. In [18]: da.interp(x=np.linspace(0, 1, 100)).plot.line(label='linear (default)')
  9. Out[18]: [<matplotlib.lines.Line2D at 0x7f34257677b8>]
  10.  
  11. In [19]: da.interp(x=np.linspace(0, 1, 100), method='cubic').plot.line(label='cubic')
  12. Out[19]: [<matplotlib.lines.Line2D at 0x7f3425653470>]
  13.  
  14. In [20]: plt.legend()
  15. Out[20]: <matplotlib.legend.Legend at 0x7f3424676c18>

_images/interpolation_sample1.pngAdditional keyword arguments can be passed to scipy’s functions.

  1. # fill 0 for the outside of the original coordinates.
  2. In [21]: da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={'fill_value': 0.0})
  3. Out[21]:
  4. <xarray.DataArray (x: 10)>
  5. array([ 0. , 0. , 0. , 0.813798, 0.604023, -0.604023,
  6. -0.813798, 0. , 0. , 0. ])
  7. Coordinates:
  8. * x (x) float64 -0.5 -0.2778 -0.05556 0.1667 ... 0.8333 1.056 1.278 1.5
  9.  
  10. # extrapolation
  11. In [22]: da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={'fill_value': 'extrapolate'})
  12. Out[22]:
  13. <xarray.DataArray (x: 10)>
  14. array([-2.892544, -1.606969, -0.321394, 0.813798, 0.604023, -0.604023,
  15. -0.813798, 0.321394, 1.606969, 2.892544])
  16. Coordinates:
  17. * x (x) float64 -0.5 -0.2778 -0.05556 0.1667 ... 0.8333 1.056 1.278 1.5

Advanced Interpolation

interp() accepts DataArrayas similar to sel(), which enables us more advanced interpolation.Based on the dimension of the new coordinate passed to interp(), the dimension of the result are determined.

For example, if you want to interpolate a two dimensional array along a particular dimension, as illustrated below,you can pass two 1-dimensional DataArray s witha common dimension as new coordinate.advanced indexing and interpolation For example:

  1. In [23]: da = xr.DataArray(np.sin(0.3 * np.arange(20).reshape(5, 4)),
  2. ....: [('x', np.arange(5)),
  3. ....: ('y', [0.1, 0.2, 0.3, 0.4])])
  4. ....:
  5.  
  6. # advanced indexing
  7. In [24]: x = xr.DataArray([0, 2, 4], dims='z')
  8.  
  9. In [25]: y = xr.DataArray([0.1, 0.2, 0.3], dims='z')
  10.  
  11. In [26]: da.sel(x=x, y=y)
  12. Out[26]:
  13. <xarray.DataArray (z: 3)>
  14. array([ 0. , 0.42738 , -0.772764])
  15. Coordinates:
  16. x (z) int64 0 2 4
  17. y (z) float64 0.1 0.2 0.3
  18. Dimensions without coordinates: z
  19.  
  20. # advanced interpolation
  21. In [27]: x = xr.DataArray([0.5, 1.5, 2.5], dims='z')
  22.  
  23. In [28]: y = xr.DataArray([0.15, 0.25, 0.35], dims='z')
  24.  
  25. In [29]: da.interp(x=x, y=y)
  26. Out[29]:
  27. <xarray.DataArray (z: 3)>
  28. array([ 0.556264, 0.634961, -0.466433])
  29. Coordinates:
  30. x (z) float64 0.5 1.5 2.5
  31. y (z) float64 0.15 0.25 0.35
  32. Dimensions without coordinates: z

where values on the original coordinates(x, y) = ((0.5, 0.15), (1.5, 0.25), (2.5, 0.35)) are obtained by the2-dimensional interpolation and mapped along a new dimension z.

If you want to add a coordinate to the new dimension z, you can supplyDataArray s with a coordinate,

  1. In [30]: x = xr.DataArray([0.5, 1.5, 2.5], dims='z', coords={'z': ['a', 'b','c']})
  2.  
  3. In [31]: y = xr.DataArray([0.15, 0.25, 0.35], dims='z',
  4. ....: coords={'z': ['a', 'b','c']})
  5. ....:
  6.  
  7. In [32]: da.interp(x=x, y=y)
  8. Out[32]:
  9. <xarray.DataArray (z: 3)>
  10. array([ 0.556264, 0.634961, -0.466433])
  11. Coordinates:
  12. x (z) float64 0.5 1.5 2.5
  13. y (z) float64 0.15 0.25 0.35
  14. * z (z) <U1 'a' 'b' 'c'

For the details of the advanced indexing,see more advanced indexing.

Interpolating arrays with NaN

Our interp() works with arrays with NaNthe same way thatscipy.interpolate.interp1d andscipy.interpolate.interpn do.linear and nearest methods return arrays including NaN,while other methods such as cubic or quadratic return all NaN arrays.

  1. In [33]: da = xr.DataArray([0, 2, np.nan, 3, 3.25], dims='x',
  2. ....: coords={'x': range(5)})
  3. ....:
  4.  
  5. In [34]: da.interp(x=[0.5, 1.5, 2.5])
  6. Out[34]:
  7. <xarray.DataArray (x: 3)>
  8. array([ 1., nan, nan])
  9. Coordinates:
  10. * x (x) float64 0.5 1.5 2.5
  11.  
  12. In [35]: da.interp(x=[0.5, 1.5, 2.5], method='cubic')
  13. Out[35]:
  14. <xarray.DataArray (x: 3)>
  15. array([nan, nan, nan])
  16. Coordinates:
  17. * x (x) float64 0.5 1.5 2.5

To avoid this, you can drop NaN by dropna(), andthen make the interpolation

  1. In [36]: dropped = da.dropna('x')
  2.  
  3. In [37]: dropped
  4. Out[37]:
  5. <xarray.DataArray (x: 4)>
  6. array([0. , 2. , 3. , 3.25])
  7. Coordinates:
  8. * x (x) int64 0 1 3 4
  9.  
  10. In [38]: dropped.interp(x=[0.5, 1.5, 2.5], method='cubic')
  11. Out[38]:
  12. <xarray.DataArray (x: 3)>
  13. array([1.190104, 2.507812, 2.929688])
  14. Coordinates:
  15. * x (x) float64 0.5 1.5 2.5

If NaNs are distributed randomly in your multidimensional array,dropping all the columns containing more than one NaNs bydropna() may lose a significant amount of information.In such a case, you can fill NaN by interpolate_na(),which is similar to pandas.Series.interpolate().

  1. In [39]: filled = da.interpolate_na(dim='x')
  2.  
  3. In [40]: filled
  4. Out[40]:
  5. <xarray.DataArray (x: 5)>
  6. array([0. , 2. , 2.5 , 3. , 3.25])
  7. Coordinates:
  8. * x (x) int64 0 1 2 3 4

This fills NaN by interpolating along the specified dimension.After filling NaNs, you can interpolate:

  1. In [41]: filled.interp(x=[0.5, 1.5, 2.5], method='cubic')
  2. Out[41]:
  3. <xarray.DataArray (x: 3)>
  4. array([1.308594, 2.316406, 2.738281])
  5. Coordinates:
  6. * x (x) float64 0.5 1.5 2.5

For the details of interpolate_na(),see Missing values.

Example

Let’s see how interp() works on real data.

  1. # Raw data
  2. In [42]: ds = xr.tutorial.open_dataset('air_temperature').isel(time=0)
  3.  
  4. In [43]: fig, axes = plt.subplots(ncols=2, figsize=(10, 4))
  5.  
  6. In [44]: ds.air.plot(ax=axes[0])
  7. Out[44]: <matplotlib.collections.QuadMesh at 0x7f34255b9da0>
  8.  
  9. In [45]: axes[0].set_title('Raw data')
  10. Out[45]: Text(0.5, 1.0, 'Raw data')
  11.  
  12. # Interpolated data
  13. In [46]: new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.dims['lon'] * 4)
  14.  
  15. In [47]: new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.dims['lat'] * 4)
  16.  
  17. In [48]: dsi = ds.interp(lat=new_lat, lon=new_lon)
  18.  
  19. In [49]: dsi.air.plot(ax=axes[1])
  20. Out[49]: <matplotlib.collections.QuadMesh at 0x7f3425426668>
  21.  
  22. In [50]: axes[1].set_title('Interpolated data')
  23. Out[50]: Text(0.5, 1.0, 'Interpolated data')

_images/interpolation_sample3.pngOur advanced interpolation can be used to remap the data to the new coordinate.Consider the new coordinates x and z on the two dimensional plane.The remapping can be done as follows

  1. # new coordinate
  2. In [51]: x = np.linspace(240, 300, 100)
  3.  
  4. In [52]: z = np.linspace(20, 70, 100)
  5.  
  6. # relation between new and original coordinates
  7. In [53]: lat = xr.DataArray(z, dims=['z'], coords={'z': z})
  8.  
  9. In [54]: lon = xr.DataArray((x[:, np.newaxis]-270)/np.cos(z*np.pi/180)+270,
  10. ....: dims=['x', 'z'], coords={'x': x, 'z': z})
  11. ....:
  12.  
  13. In [55]: fig, axes = plt.subplots(ncols=2, figsize=(10, 4))
  14.  
  15. In [56]: ds.air.plot(ax=axes[0])
  16. Out[56]: <matplotlib.collections.QuadMesh at 0x7f3424d20240>
  17.  
  18. # draw the new coordinate on the original coordinates.
  19. In [57]: for idx in [0, 33, 66, 99]:
  20. ....: axes[0].plot(lon.isel(x=idx), lat, '--k')
  21. ....:
  22.  
  23. In [58]: for idx in [0, 33, 66, 99]:
  24. ....: axes[0].plot(*xr.broadcast(lon.isel(z=idx), lat.isel(z=idx)), '--k')
  25. ....:
  26.  
  27. In [59]: axes[0].set_title('Raw data')
  28. Out[59]: Text(0.5, 1.0, 'Raw data')
  29.  
  30. In [60]: dsi = ds.interp(lon=lon, lat=lat)
  31.  
  32. In [61]: dsi.air.plot(ax=axes[1])
  33. Out[61]: <matplotlib.collections.QuadMesh at 0x7f3424d46f60>
  34.  
  35. In [62]: axes[1].set_title('Remapped data')
  36. Out[62]: Text(0.5, 1.0, 'Remapped data')

_images/interpolation_sample4.png