Weather and climate data

xarray can leverage metadata that follows the Climate and Forecast (CF) conventions when it is present. Examples include automatic labelling of plots with descriptive names and units (see Plotting) and support for non-standard calendars used in climate science through the cftime module (see Non-standard calendars and dates outside the Timestamp-valid range). There are also a number of geosciences-focused projects that build on xarray (see Xarray related projects).

CF-compliant coordinate variables

MetPy adds a metpy accessor that allows accessing coordinates with appropriate CF metadata using the generic names x, y, vertical and time. There is also a cartopy_crs attribute that provides projection information, parsed from the appropriate CF metadata, as a Cartopy projection object. See the MetPy documentation for more information.

Non-standard calendars and dates outside the Timestamp-valid range

Through the standalone cftime library and a custom subclass of pandas.Index, xarray supports a subset of the indexing functionality enabled through the standard pandas.DatetimeIndex for dates from non-standard calendars commonly used in climate science, or for dates using a standard calendar but outside the Timestamp-valid range (approximately between years 1678 and 2262).
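
The bounds quoted above follow from how nanosecond-precision timestamps are stored: as a signed 64-bit count of nanoseconds since 1970-01-01. The sketch below uses only the Python standard library to recover the approximate limits; the variable names are illustrative and the result is truncated to microseconds.

```python
from datetime import datetime, timedelta

# A nanosecond-precision timestamp is a signed 64-bit integer counting
# nanoseconds since the Unix epoch (1970-01-01).
MAX_NS = 2**63 - 1
EPOCH = datetime(1970, 1, 1)

# timedelta has microsecond resolution, so convert ns -> us (floor division).
span = timedelta(microseconds=MAX_NS // 1000)

latest = EPOCH + span
earliest = EPOCH - span

# The representable span is roughly 292 years on either side of 1970.
print(earliest.date(), latest.date())  # 1677-09-21 2262-04-11
```

The first complete year inside the range is therefore 1678, and the range ends partway through 2262.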

Note

As of xarray version 0.11, by default, cftime.datetime objects will be used to represent times (either in indexes, as a CFTimeIndex, or in data arrays with dtype object) if any of the following are true:

  • The dates are from a non-standard calendar

  • Any dates are outside the Timestamp-valid range.

Otherwise pandas-compatible dates from a standard calendar will be represented with the np.datetime64[ns] data type, enabling the use of a pandas.DatetimeIndex or arrays with dtype np.datetime64[ns] and their full set of associated features.
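
The rule in this note can be summarised as a small decision function. This is a hypothetical helper for illustration only, not xarray's decoding code: the set of "standard" calendar names and the whole-year bounds are assumptions made for the sketch.

```python
# Calendar names assumed here to count as "standard" (Gregorian-style).
STANDARD_CALENDARS = {"standard", "gregorian", "proleptic_gregorian"}

# Conservative whole-year bounds of the Timestamp-valid range (assumption:
# only the year is inspected, so the partial years at the edges are excluded).
MIN_YEAR, MAX_YEAR = 1678, 2261


def needs_cftime(calendar, years):
    """Hypothetical helper: True if the rule above would pick cftime.datetime."""
    non_standard = calendar.lower() not in STANDARD_CALENDARS
    out_of_range = any(not (MIN_YEAR <= year <= MAX_YEAR) for year in years)
    return non_standard or out_of_range


print(needs_cftime("noleap", [2000, 2001]))          # True: non-standard calendar
print(needs_cftime("standard", [1, 2]))              # True: years out of range
print(needs_cftime("proleptic_gregorian", [2000]))   # False: np.datetime64[ns] is used
```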

For example, you can create a DataArray indexed by a time coordinate with dates from a no-leap calendar and a CFTimeIndex will automatically be used:

    # Assumes numpy and xarray have already been imported as np and xr.
    In [1]: from itertools import product

    In [2]: from cftime import DatetimeNoLeap

    In [3]: dates = [DatetimeNoLeap(year, month, 1) for year, month in
       ...:          product(range(1, 3), range(1, 13))]
       ...:

    In [4]: da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')

xarray also includes a cftime_range() function, which enables creating a CFTimeIndex with regularly-spaced dates. For instance, we can create the same dates and DataArray we created above using:

    In [5]: dates = xr.cftime_range(start='0001', periods=24, freq='MS', calendar='noleap')

    In [6]: da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')
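
To make concrete what a no-leap calendar implies, the following standard-library sketch rebuilds the same 24 month starts as plain (year, month, day) tuples and computes their day of year. The helper name is hypothetical; the only assumption is that every year has 365 days, with February fixed at 28.

```python
from itertools import product

# Month lengths in a no-leap (365-day) calendar: February always has 28 days.
NOLEAP_MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def noleap_dayofyear(month, day):
    """Day of year (1-based) for a date in a no-leap calendar."""
    return sum(NOLEAP_MONTH_LENGTHS[: month - 1]) + day


# Same (year, month) pairs as the DatetimeNoLeap example: years 1-2, months 1-12.
month_starts = [(year, month, 1) for year, month in product(range(1, 3), range(1, 13))]

doy = [noleap_dayofyear(month, day) for _, month, day in month_starts]

print(len(month_starts))  # 24
print(doy[:12])  # [1, 32, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335]
```

Because no year ever gains a 29 February, the second year repeats exactly the same day-of-year values as the first.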

With strftime() we can also easily generate formatted strings from the datetime values of a CFTimeIndex directly, or through the dt accessor for a DataArray, using the same formatting as the standard datetime.strftime convention.

    In [7]: dates.strftime('%c')
    Out[7]:
    Index(['Tue Jan 1 00:00:00 1', 'Fri Feb 1 00:00:00 1',
           'Fri Mar 1 00:00:00 1', 'Mon Apr 1 00:00:00 1',
           'Wed May 1 00:00:00 1', 'Sat Jun 1 00:00:00 1',
           'Mon Jul 1 00:00:00 1', 'Thu Aug 1 00:00:00 1',
           'Sun Sep 1 00:00:00 1', 'Tue Oct 1 00:00:00 1',
           'Fri Nov 1 00:00:00 1', 'Sun Dec 1 00:00:00 1',
           'Wed Jan 1 00:00:00 2', 'Sat Feb 1 00:00:00 2',
           'Sat Mar 1 00:00:00 2', 'Tue Apr 1 00:00:00 2',
           'Thu May 1 00:00:00 2', 'Sun Jun 1 00:00:00 2',
           'Tue Jul 1 00:00:00 2', 'Fri Aug 1 00:00:00 2',
           'Mon Sep 1 00:00:00 2', 'Wed Oct 1 00:00:00 2',
           'Sat Nov 1 00:00:00 2', 'Mon Dec 1 00:00:00 2'],
          dtype='object')

    In [8]: da['time'].dt.strftime('%Y%m%d')
    Out[8]:
    <xarray.DataArray 'strftime' (time: 24)>
    array([' 10101', ' 10201', ' 10301', ' 10401', ' 10501', ' 10601',
           ' 10701', ' 10801', ' 10901', ' 11001', ' 11101', ' 11201',
           ' 20101', ' 20201', ' 20301', ' 20401', ' 20501', ' 20601',
           ' 20701', ' 20801', ' 20901', ' 21001', ' 21101', ' 21201'],
          dtype=object)
    Coordinates:
      * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

For data indexed by a CFTimeIndex, xarray currently supports:

  • Partial datetime string indexing:
    In [9]: da.sel(time='0001')
    Out[9]:
    <xarray.DataArray 'foo' (time: 12)>
    array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
    Coordinates:
      * time     (time) object 0001-01-01 00:00:00 ... 0001-12-01 00:00:00

    In [10]: da.sel(time=slice('0001-05', '0002-02'))
    Out[10]:
    <xarray.DataArray 'foo' (time: 10)>
    array([ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
    Coordinates:
      * time     (time) object 0001-05-01 00:00:00 ... 0002-02-01 00:00:00
  • Access of basic datetime components via the dt accessor (in this case just “year”, “month”, “day”, “hour”, “minute”, “second”, “microsecond”, “season”, “dayofyear”, and “dayofweek”):
    In [11]: da.time.dt.year
    Out[11]:
    <xarray.DataArray 'year' (time: 24)>
    array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
    Coordinates:
      * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

    In [12]: da.time.dt.month
    Out[12]:
    <xarray.DataArray 'month' (time: 24)>
    array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6,
           7, 8, 9, 10, 11, 12])
    Coordinates:
      * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

    In [13]: da.time.dt.season
    Out[13]:
    <xarray.DataArray 'season' (time: 24)>
    array(['DJF', 'DJF', 'MAM', 'MAM', 'MAM', 'JJA', 'JJA', 'JJA', 'SON', 'SON',
           'SON', 'DJF', 'DJF', 'DJF', 'MAM', 'MAM', 'MAM', 'JJA', 'JJA', 'JJA',
           'SON', 'SON', 'SON', 'DJF'], dtype='<U3')
    Coordinates:
      * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

    In [14]: da.time.dt.dayofyear
    Out[14]:
    <xarray.DataArray 'dayofyear' (time: 24)>
    array([ 1, 32, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 1, 32,
           60, 91, 121, 152, 182, 213, 244, 274, 305, 335])
    Coordinates:
      * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

    In [15]: da.time.dt.dayofweek
    Out[15]:
    <xarray.DataArray 'dayofweek' (time: 24)>
    array([1, 4, 4, 0, 2, 5, 0, 3, 6, 1, 4, 6, 2, 5, 5, 1, 3, 6, 1, 4, 0, 2, 5, 0])
    Coordinates:
      * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00
  • Group-by operations based on datetime accessor attributes (e.g. by month of the year):
    In [16]: da.groupby('time.month').sum()
    Out[16]:
    <xarray.DataArray 'foo' (month: 12)>
    array([12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34])
    Coordinates:
      * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
  • Interpolation using cftime.datetime objects:
    In [17]: da.interp(time=[DatetimeNoLeap(1, 1, 15), DatetimeNoLeap(1, 2, 15)])
    Out[17]:
    <xarray.DataArray 'foo' (time: 2)>
    array([0.451613, 1.5 ])
    Coordinates:
      * time     (time) object 0001-01-15 00:00:00 0001-02-15 00:00:00
  • Interpolation using datetime strings:
    In [18]: da.interp(time=['0001-01-15', '0001-02-15'])
    Out[18]:
    <xarray.DataArray 'foo' (time: 2)>
    array([0.451613, 1.5 ])
    Coordinates:
      * time     (time) object 0001-01-15 00:00:00 0001-02-15 00:00:00
  • Differentiation:
    In [19]: da.differentiate('time')
    Out[19]:
    <xarray.DataArray 'foo' (time: 24)>
    array([3.733572e-07, 3.943755e-07, 3.943755e-07, 3.796819e-07, 3.796819e-07,
           3.796819e-07, 3.796819e-07, 3.733572e-07, 3.796819e-07, 3.796819e-07,
           3.796819e-07, 3.796819e-07, 3.733572e-07, 3.943755e-07, 3.943755e-07,
           3.796819e-07, 3.796819e-07, 3.796819e-07, 3.796819e-07, 3.733572e-07,
           3.796819e-07, 3.796819e-07, 3.796819e-07, 3.858025e-07])
    Coordinates:
      * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00
  • Serialization:
    In [20]: da.to_netcdf('example-no-leap.nc')

    In [21]: xr.open_dataset('example-no-leap.nc')
    Out[21]:
    <xarray.Dataset>
    Dimensions:  (time: 24)
    Coordinates:
      * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00
    Data variables:
        foo      (time) int64 ...
  • And resampling along the time dimension for data indexed by a CFTimeIndex:
    In [22]: da.resample(time='81T', closed='right', label='right', base=3).mean()
    Out[22]:
    <xarray.DataArray 'foo' (time: 12428)>
    array([ 0., nan, nan, ..., nan, nan, 23.])
    Coordinates:
      * time     (time) object 0001-01-01 00:03:00 ... 0002-12-01 00:30:00
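
The partial datetime string selections in the list above can be read as "expand the string to the full span it names, then select inclusively". The sketch below illustrates those semantics with plain tuples in a no-leap calendar. Both helpers are hypothetical, handle only the 'YYYY' and 'YYYY-MM' forms, and are not how xarray implements the feature.

```python
from itertools import product

NOLEAP_MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def partial_bounds(text):
    """Expand 'YYYY' or 'YYYY-MM' into inclusive (first, last) date tuples.

    Hypothetical helper for a no-leap calendar; dates are (year, month, day).
    """
    parts = [int(p) for p in text.split("-")]
    year = parts[0]
    if len(parts) == 1:  # the string names a whole year
        return (year, 1, 1), (year, 12, 31)
    month = parts[1]  # the string names a whole month
    return (year, month, 1), (year, month, NOLEAP_MONTH_LENGTHS[month - 1])


def select(dates, values, start, stop=None):
    """Keep values whose date falls in the span implied by the partial strings."""
    lower = partial_bounds(start)[0]
    upper = partial_bounds(stop if stop is not None else start)[1]
    return [v for d, v in zip(dates, values) if lower <= d <= upper]


dates = [(y, m, 1) for y, m in product(range(1, 3), range(1, 13))]
values = list(range(24))

year_one = select(dates, values, "0001")              # like da.sel(time='0001')
may_to_feb = select(dates, values, "0001-05", "0002-02")  # like the slice example

print(year_one)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(may_to_feb)  # [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
```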

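The values returned by interp and differentiate in the list above can be reproduced by hand, which makes explicit that both depend on the elapsed time between dates in the no-leap calendar. This is a plain-Python sketch, not xarray's implementation. It assumes linear interpolation, a derivative expressed per second, a one-sided difference at the first point, and the second-order finite difference for unevenly spaced data (as documented for numpy.gradient) at interior points.

```python
SECONDS_PER_DAY = 86400

# Elapsed days from 0001-01-01 to the first three month starts in a
# no-leap calendar (January has 31 days, February 28).
t_days = [0, 31, 59]
values = [0, 1, 2]  # the da values at those dates


def lerp(t, t0, t1, v0, v1):
    """Linear interpolation between (t0, v0) and (t1, v1)."""
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# 0001-01-15 is 14 days after 0001-01-01; 0001-02-15 is 14 days after 0001-02-01.
jan15 = lerp(14, t_days[0], t_days[1], values[0], values[1])
feb15 = lerp(31 + 14, t_days[1], t_days[2], values[1], values[2])
print(round(jan15, 6), feb15)  # 0.451613 1.5

# Edge point (0001-01-01): one-sided difference over January, per second.
d_first = (values[1] - values[0]) / ((t_days[1] - t_days[0]) * SECONDS_PER_DAY)

# Interior point (0001-02-01): second-order difference for uneven spacing,
# with hd the spacing to the left and hs the spacing to the right.
hd = (t_days[1] - t_days[0]) * SECONDS_PER_DAY
hs = (t_days[2] - t_days[1]) * SECONDS_PER_DAY
d_feb = (hd**2 * values[2] + (hs**2 - hd**2) * values[1] - hs**2 * values[0]) / (
    hs * hd * (hd + hs)
)
print(f"{d_first:.6e} {d_feb:.6e}")  # 3.733572e-07 3.943755e-07
```

The interpolated value 14/31 and the derivative 1/(31 × 86400) both come directly from January having 31 days, so changing the calendar changes the result.
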
Note

For some use-cases it may still be useful to convert from a CFTimeIndex to a pandas.DatetimeIndex, despite the difference in calendar types. The recommended way of doing this is to use the built-in to_datetimeindex() method:

    In [23]: modern_times = xr.cftime_range('2000', periods=24, freq='MS', calendar='noleap')

    In [24]: da = xr.DataArray(range(24), [('time', modern_times)])

    In [25]: da
    Out[25]:
    <xarray.DataArray (time: 24)>
    array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
           18, 19, 20, 21, 22, 23])
    Coordinates:
      * time     (time) object 2000-01-01 00:00:00 ... 2001-12-01 00:00:00

    In [26]: datetimeindex = da.indexes['time'].to_datetimeindex()

    In [27]: da['time'] = datetimeindex

However, in this case one should take care to perform only operations that do not depend on differences between dates. Operations that do depend on them (e.g. differentiation, interpolation, or upsampling with resample) could introduce subtle and silent errors due to the difference in calendar types between the dates encoded in your data and the dates stored in memory.
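
As a concrete illustration of such a silent error, consider February 2000: the no-leap data above has 28 days between 2000-02-01 and 2000-03-01, but once the labels are stored in a pandas.DatetimeIndex the same two labels are 29 days apart, because 2000 is a leap year in the standard calendar. The standard-library sketch below compares the linear-interpolation weight for 2000-02-15 under both calendars; the variable names are illustrative.

```python
from datetime import date

# In a no-leap calendar, February 2000 has 28 days.
noleap_feb_days = 28

# In the standard (Gregorian) calendar, 2000 is a leap year, so the same
# two month-start labels are 29 days apart.
gregorian_feb_days = (date(2000, 3, 1) - date(2000, 2, 1)).days

# Linear-interpolation weight for 2000-02-15 between the two month starts.
elapsed = 14  # days from the 1st to the 15th
weight_noleap = elapsed / noleap_feb_days
weight_gregorian = elapsed / gregorian_feb_days

print(gregorian_feb_days)  # 29
print(weight_noleap, round(weight_gregorian, 4))  # 0.5 0.4828
```

Neither calculation raises an error, but only the first reflects the calendar in which the data were produced.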