Pandas arrays

For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.

For some data types, pandas extends NumPy’s type system.

Kind of Data         Pandas Data Type   Scalar      Array
TZ-aware datetime    DatetimeTZDtype    Timestamp   Datetime data
Timedeltas           (none)             Timedelta   Timedelta data
Period (time spans)  PeriodDtype        Period      Timespan data
Intervals            IntervalDtype      Interval    Interval data
Nullable Integer     Int64Dtype, …      (none)      Nullable integer
Categorical          CategoricalDtype   (none)      Categorical data
Sparse               SparseDtype        (none)      Sparse data

Pandas and third-party libraries can extend NumPy’s type system (see Extension types). The top-level array() function can be used to create a new array, which may be stored in a Series, in an Index, or as a column in a DataFrame.

array(data, dtype, numpy.dtype, …): Create an array.
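
As a minimal sketch of the above, assuming a pandas version in which the top-level function is available as pd.array, the following creates two extension arrays and stores them in a Series and a DataFrame. The variable names and sample values are illustrative only.

```python
import pandas as pd

# Nullable integers: the missing value is kept without converting to float.
ints = pd.array([1, 2, None], dtype="Int64")

# Categorical data built from plain strings.
cats = pd.array(["a", "b", "a"], dtype="category")

# The resulting arrays can back a Series or a DataFrame column.
s = pd.Series(ints)
df = pd.DataFrame({"n": ints, "c": cats})

print(ints.dtype)      # Int64
print(s.dtype)         # Int64
print(df.dtypes["c"])  # category
```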

Datetime data

NumPy cannot natively represent timezone-aware datetimes. Pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.

Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data.

Timestamp: Pandas replacement for the Python datetime.datetime object.
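
A short illustration of the scalar type, using an arbitrary example date. It sketches the difference between a naive and an aware Timestamp and reads a few of the properties listed below.

```python
import datetime

import pandas as pd

naive = pd.Timestamp("2019-03-10 12:30:00")
aware = pd.Timestamp("2019-03-10 12:30:00", tz="UTC")

# Timestamp is a datetime.datetime subclass.
print(isinstance(naive, datetime.datetime))  # True

# A naive Timestamp carries no time zone; an aware one does.
print(naive.tz is None)  # True
print(aware.tzname())    # UTC

# Calendar properties (Monday=0, so Sunday is 6).
print(naive.dayofweek)     # 6
print(naive.is_leap_year)  # False
```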

Properties

Timestamp.asm8: Return numpy datetime64 format in nanoseconds.
Timestamp.day
Timestamp.dayofweek: Return day of the week.
Timestamp.dayofyear: Return the day of the year.
Timestamp.days_in_month: Return the number of days in the month.
Timestamp.daysinmonth: Return the number of days in the month.
Timestamp.fold
Timestamp.hour
Timestamp.is_leap_year: Return True if year is a leap year.
Timestamp.is_month_end: Return True if date is last day of month.
Timestamp.is_month_start: Return True if date is first day of month.
Timestamp.is_quarter_end: Return True if date is last day of the quarter.
Timestamp.is_quarter_start: Return True if date is first day of the quarter.
Timestamp.is_year_end: Return True if date is last day of the year.
Timestamp.is_year_start: Return True if date is first day of the year.
Timestamp.max
Timestamp.microsecond
Timestamp.min
Timestamp.minute
Timestamp.month
Timestamp.nanosecond
Timestamp.quarter: Return the quarter of the year.
Timestamp.resolution: Return the resolution, the smallest difference between two times that can be represented by a Timestamp object.
Timestamp.second
Timestamp.tz: Alias for tzinfo.
Timestamp.tzinfo
Timestamp.value
Timestamp.week: Return the week number of the year.
Timestamp.weekofyear: Return the week number of the year.
Timestamp.year

Methods

Timestamp.astimezone(self, tz): Convert tz-aware Timestamp to another time zone.
Timestamp.ceil(self, freq[, ambiguous, …]): Return a new Timestamp ceiled to this resolution.
Timestamp.combine(date, time): Combine a date and a time into a datetime with the same date and time fields.
Timestamp.ctime(): Return ctime() style string.
Timestamp.date(): Return date object with same year, month and day.
Timestamp.day_name(self[, locale]): Return the day name of the Timestamp with specified locale.
Timestamp.dst(): Return self.tzinfo.dst(self).
Timestamp.floor(self, freq[, ambiguous, …]): Return a new Timestamp floored to this resolution.
Timestamp.freq
Timestamp.freqstr: Return the frequency string of the Timestamp.
Timestamp.fromordinal(ordinal[, freq, tz]): Convert an ordinal to a Timestamp. Note that, by definition, the ordinal itself cannot carry any tz information.
Timestamp.fromtimestamp(ts): Return the local time in tz that corresponds to a POSIX timestamp.
Timestamp.isocalendar(): Return a 3-tuple containing ISO year, week number, and weekday.
Timestamp.isoformat(self[, sep])
Timestamp.isoweekday(): Return the day of the week represented by the date.
Timestamp.month_name(self[, locale]): Return the month name of the Timestamp with specified locale.
Timestamp.normalize(self): Normalize Timestamp to midnight, preserving tz information.
Timestamp.now([tz]): Return new Timestamp object representing current time local to tz.
Timestamp.replace(self[, year, month, day, …]): Implement datetime.replace, with support for nanoseconds.
Timestamp.round(self, freq[, ambiguous, …]): Round the Timestamp to the specified resolution.
Timestamp.strftime(): Return a string formatted according to a strftime()-style format.
Timestamp.strptime(string, format): Function is not implemented.
Timestamp.time(): Return time object with same time but with tzinfo=None.
Timestamp.timestamp(): Return POSIX timestamp as float.
Timestamp.timetuple(): Return time tuple, compatible with time.localtime().
Timestamp.timetz(): Return time object with same time and tzinfo.
Timestamp.to_datetime64(): Return a numpy.datetime64 object with ‘ns’ precision.
Timestamp.to_numpy(): Convert the Timestamp to a NumPy datetime64.
Timestamp.to_julian_date(self): Convert Timestamp to a Julian Date.
Timestamp.to_period(self[, freq]): Return a period of which this timestamp is an observation.
Timestamp.to_pydatetime(): Convert a Timestamp object to a native Python datetime object.
Timestamp.today(cls[, tz]): Return the current time in the local timezone.
Timestamp.toordinal(): Return proleptic Gregorian ordinal.
Timestamp.tz_convert(self, tz): Convert tz-aware Timestamp to another time zone.
Timestamp.tz_localize(self, tz[, ambiguous, …]): Convert naive Timestamp to local time zone, or remove timezone from tz-aware Timestamp.
Timestamp.tzname(): Return self.tzinfo.tzname(self).
Timestamp.utcfromtimestamp(ts): Construct a naive UTC datetime from a POSIX timestamp.
Timestamp.utcnow(): Return a new Timestamp representing UTC day and time.
Timestamp.utcoffset(): Return self.tzinfo.utcoffset(self).
Timestamp.utctimetuple(): Return UTC time tuple, compatible with time.localtime().
Timestamp.weekday(): Return the day of the week represented by the date.
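
The time zone methods listed above can be sketched as follows. The date is arbitrary, and a fixed +02:00 offset from the standard library is an assumption made here so that the example does not rely on a named zone.

```python
import datetime

import pandas as pd

ts = pd.Timestamp("2019-03-10 12:34:56")

# Attach a time zone to a naive Timestamp (the wall-clock time is unchanged).
utc = ts.tz_localize("UTC")

# Convert to another zone; both values refer to the same instant.
plus2 = utc.tz_convert(datetime.timezone(datetime.timedelta(hours=2)))

# Passing None removes the time zone and keeps the local wall-clock time.
local_naive = plus2.tz_localize(None)

print(plus2.hour)   # 14
print(local_naive)  # 2019-03-10 14:34:56

# Flooring to whole days matches normalize() for this value.
print(ts.floor("D") == ts.normalize())  # True
```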

A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of a DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.

If the data are tz-aware, then every value in the array must have the same timezone.

arrays.DatetimeArray(values[, dtype, freq, copy]): Pandas ExtensionArray for tz-naive or tz-aware datetime data.
DatetimeTZDtype([unit, tz]): An ExtensionDtype for timezone-aware datetime data.
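
The dtype rules described above can be checked with a small sketch. It assumes nanosecond resolution is requested explicitly and reaches the backing array through Series.array; the sample dates are illustrative.

```python
import numpy as np
import pandas as pd

# Timezone-naive data uses the plain NumPy datetime64[ns] dtype.
naive = pd.Series(["2019-01-01", "2019-01-02"], dtype="datetime64[ns]")

# Localizing attaches one time zone to every value in the array.
aware = naive.dt.tz_localize("UTC")

print(naive.dtype)  # datetime64[ns]
print(aware.dtype)  # datetime64[ns, UTC]

print(naive.dtype == np.dtype("datetime64[ns]"))    # True
print(isinstance(aware.dtype, pd.DatetimeTZDtype))  # True
print(type(aware.array).__name__)                   # DatetimeArray
```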

Timedelta data

NumPy can natively represent timedeltas. Pandas provides Timedelta for symmetry with Timestamp.

Timedelta: Represents a duration, the difference between two dates or times.
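
For illustration, subtracting two Timestamp values yields a Timedelta. The dates below are arbitrary and the properties used are among those listed in this section.

```python
import pandas as pd

start = pd.Timestamp("2019-01-01 00:00:00")
end = pd.Timestamp("2019-01-02 06:30:00")

# The difference between two Timestamps is a Timedelta.
delta = end - start

print(delta)                   # 1 days 06:30:00
print(delta.days)              # 1
print(delta.seconds)           # 23400 (the part smaller than one day)
print(delta.total_seconds())   # 109800.0
print(delta.components.hours)  # 6
```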

Properties

Timedelta.asm8: Return a numpy timedelta64 array scalar view.
Timedelta.components: Return a namedtuple-like object of the components.
Timedelta.days: Number of days.
Timedelta.delta: Return the timedelta in nanoseconds (ns), for internal compatibility.
Timedelta.freq
Timedelta.is_populated
Timedelta.max
Timedelta.microseconds: Number of microseconds (>= 0 and less than 1 second).
Timedelta.min
Timedelta.nanoseconds: Return the number of nanoseconds (n), where 0 <= n < 1 microsecond.
Timedelta.resolution: Return a string representing the lowest timedelta resolution.
Timedelta.seconds: Number of seconds (>= 0 and less than 1 day).
Timedelta.value
Timedelta.view(): Array view compatibility.

Methods

Timedelta.ceil(self, freq): Return a new Timedelta ceiled to this resolution.
Timedelta.floor(self, freq): Return a new Timedelta floored to this resolution.
Timedelta.isoformat(): Format Timedelta as an ISO 8601 duration like P[n]Y[n]M[n]DT[n]H[n]M[n]S, where each [n] is replaced by the corresponding value.
Timedelta.round(self, freq): Round the Timedelta to the specified resolution.
Timedelta.to_pytimedelta(): Convert a pandas Timedelta object into a Python timedelta object.
Timedelta.to_timedelta64(): Return a numpy.timedelta64 object with ‘ns’ precision.
Timedelta.to_numpy(): Convert the Timedelta to a NumPy timedelta64.
Timedelta.total_seconds(): Total duration of timedelta in seconds (to ns precision).

A collection of timedeltas may be stored in a TimedeltaArray.

arrays.TimedeltaArray(values[, dtype, freq, …]): Pandas ExtensionArray for timedelta data.
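
A brief sketch of a collection of timedeltas. Building the values with pd.to_timedelta and reading the backing array through Series.array are choices made for this example rather than the only route.

```python
import pandas as pd

# Parse duration strings and store them in a Series.
tds = pd.Series(pd.to_timedelta(["1 days", "12 hours"]))

print(type(tds.array).__name__)         # TimedeltaArray
print(tds.dt.total_seconds().tolist())  # [86400.0, 43200.0]
```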

Timespan data

Pandas represents spans of times as Period objects.

Period

Period: Represents a period of time.
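
As a hedged example, a monthly Period can be inspected and converted as follows. The month chosen is arbitrary.

```python
import pandas as pd

# A Period covers a whole span of time, here the month of March 2019.
p = pd.Period("2019-03", freq="M")

print(p.start_time)     # 2019-03-01 00:00:00
print(p.days_in_month)  # 31

# Convert to a daily Period anchored at the start of the month.
print(p.asfreq("D", how="start"))  # 2019-03-01

# Integer arithmetic moves by whole periods.
print(p + 1)  # 2019-04
```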

Properties

Period.day: Get day of the month that a Period falls on.
Period.dayofweek: Day of the week the period lies in, with Monday=0 and Sunday=6.
Period.dayofyear: Return the day of the year.
Period.days_in_month: Get the total number of days in the month that this period falls on.
Period.daysinmonth: Get the total number of days of the month that the Period falls in.
Period.end_time
Period.freq
Period.freqstr
Period.hour: Get the hour of the day component of the Period.
Period.is_leap_year
Period.minute: Get minute of the hour component of the Period.
Period.month
Period.ordinal
Period.quarter
Period.qyear: Fiscal year the Period lies in according to its starting-quarter.
Period.second: Get the second component of the Period.
Period.start_time: Get the Timestamp for the start of the period.
Period.week: Get the week of the year on the given Period.
Period.weekday: Day of the week the period lies in, with Monday=0 and Sunday=6.
Period.weekofyear
Period.year

Methods

Period.asfreq(): Convert Period to desired frequency, either at the start or end of the interval.
Period.now()
Period.strftime(): Returns the string representation of the Period, depending on the selected fmt.
Period.to_timestamp(): Return the Timestamp representation of the Period at the target frequency at the specified end (how) of the Period.

A collection of periods may be stored in an arrays.PeriodArray. Every period in a PeriodArray must have the same freq.

arrays.PeriodArray(values[, freq, dtype, copy]): Pandas ExtensionArray for storing Period data.
PeriodDtype: An ExtensionDtype for Period data.
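
A minimal sketch of period data in a Series. It assumes pd.period_range as a convenient way to build periods that share one frequency.

```python
import pandas as pd

# Three consecutive monthly periods, all with the same freq.
periods = pd.period_range("2019-01", periods=3, freq="M")
s = pd.Series(periods)

print(s.dtype)                         # period[M]
print(isinstance(s.dtype, pd.PeriodDtype))  # True
print(type(s.array).__name__)          # PeriodArray
```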

Interval data

Arbitrary intervals can be represented as Interval objects.

Interval: Immutable object implementing an Interval, a bounded slice-like interval.

Properties

Interval.closed: Whether the interval is closed on the left side, right side, both or neither.
Interval.closed_left: Check if the interval is closed on the left side.
Interval.closed_right: Check if the interval is closed on the right side.
Interval.is_empty: Indicates if an interval is empty, meaning it contains no points.
Interval.left: Left bound for the interval.
Interval.length: Return the length of the Interval.
Interval.mid: Return the midpoint of the Interval.
Interval.open_left: Check if the interval is open on the left side.
Interval.open_right: Check if the interval is open on the right side.
Interval.overlaps(): Check whether two Interval objects overlap.
Interval.right: Right bound for the interval.
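
The effect of the closed side on membership and overlap can be sketched with arbitrary integer bounds:

```python
import pandas as pd

# (0, 5]: open on the left, closed on the right.
iv = pd.Interval(left=0, right=5, closed="right")

print(0 in iv)    # False
print(5 in iv)    # True
print(iv.length)  # 5
print(iv.mid)     # 2.5

# Intervals overlap only if they share at least one point.
print(iv.overlaps(pd.Interval(5, 10, closed="left")))   # True, both contain 5
print(iv.overlaps(pd.Interval(5, 10, closed="right")))  # False, (5, 10] excludes 5
```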

A collection of intervals may be stored in an arrays.IntervalArray.

arrays.IntervalArray: Pandas array for interval data that are closed on the same side.
IntervalDtype: An ExtensionDtype for Interval data.
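
One possible way to build such an array is the IntervalArray.from_breaks constructor, which this sketch assumes. Every interval it creates is closed on the same side.

```python
import pandas as pd

# Breaks 0, 1, 2, 3 produce the intervals (0, 1], (1, 2], (2, 3].
arr = pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])

print(len(arr))     # 3
print(arr.closed)   # right
print(isinstance(arr.dtype, pd.IntervalDtype))  # True

# The array can be stored in a Series like any other extension array.
s = pd.Series(arr)
print(s.iloc[0])    # (0, 1]
```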

Nullable integer

numpy.ndarray cannot natively represent integer data with missing values. Pandas provides this through arrays.IntegerArray.

arrays.IntegerArray(values, mask[, copy]): Array of integer (optional missing) values.
Int8Dtype: An ExtensionDtype for int8 integer data.
Int16Dtype: An ExtensionDtype for int16 integer data.
Int32Dtype: An ExtensionDtype for int32 integer data.
Int64Dtype: An ExtensionDtype for int64 integer data.
UInt8Dtype: An ExtensionDtype for uint8 integer data.
UInt16Dtype: An ExtensionDtype for uint16 integer data.
UInt32Dtype: An ExtensionDtype for uint32 integer data.
UInt64Dtype: An ExtensionDtype for uint64 integer data.
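
To illustrate the difference from NumPy-backed data, this sketch compares the default behaviour with a nullable integer dtype, then builds an IntegerArray directly from values and a boolean mask. The sample values are illustrative.

```python
import numpy as np
import pandas as pd

# A NumPy-backed Series falls back to float64 in order to hold NaN.
as_float = pd.Series([1, 2, None])

# With a nullable integer dtype the values stay integers.
as_int = pd.Series([1, 2, None], dtype="Int64")

print(as_float.dtype)               # float64
print(as_int.dtype)                 # Int64
print(type(as_int.array).__name__)  # IntegerArray

# Direct construction: True in the mask marks a missing value.
arr = pd.arrays.IntegerArray(
    np.array([1, 2, 0], dtype="int64"),
    np.array([False, False, True]),
)
print(arr.isna().tolist())  # [False, False, True]
```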

Categorical data

Pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a pandas.api.types.CategoricalDtype.

CategoricalDtype([categories]): Type for categorical data with the categories and orderedness.
CategoricalDtype.categories: An Index containing the unique categories allowed.
CategoricalDtype.ordered: Whether the categories have an ordered relationship.

Categorical data can be stored in a pandas.Categorical.

Categorical(values[, categories, ordered, …]): Represent a categorical variable in classic R / S-plus fashion.

The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already:

Categorical.from_codes(codes[, categories, …]): Make a Categorical type from codes and categories or dtype.
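
For example, with illustrative category labels, each integer code selects the category at that position:

```python
import pandas as pd

cat = pd.Categorical.from_codes(
    codes=[0, 1, 0, 2],
    categories=["low", "mid", "high"],
    ordered=True,
)

print(list(cat))           # ['low', 'mid', 'low', 'high']
print(cat.codes.tolist())  # [0, 1, 0, 2]
print(cat.ordered)         # True
```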

The dtype information is available on the Categorical.

Categorical.dtype: The CategoricalDtype for this instance.
Categorical.categories: The categories of this categorical.
Categorical.ordered: Whether the categories have an ordered relationship.
Categorical.codes: The category codes of this categorical.

np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so the categories and order information are not preserved!

Categorical.__array__(self[, dtype]): The numpy array interface.
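
A small sketch of that conversion, with illustrative values. The result is a plain ndarray of the values, and the unused category "c" and the ordering are no longer attached to it.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["b", "a", "b"], categories=["a", "b", "c"], ordered=True)

values = np.asarray(cat)

print(type(values).__name__)          # ndarray
print(values.tolist())                # ['b', 'a', 'b']
print(hasattr(values, "categories"))  # False
```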

A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(…, dtype=dtype) where dtype is either

  • the string 'category'
  • an instance of CategoricalDtype.
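
Both options can be sketched as follows. The sample values and the category order passed to CategoricalDtype are illustrative.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(["a", "b", "a"])

# Option 1: the string 'category' infers the categories from the data.
by_name = s.astype("category")

# Option 2: a CategoricalDtype instance fixes the categories and their order.
dtype = CategoricalDtype(categories=["b", "a"], ordered=True)
by_dtype = pd.Series(["a", "b", "a"], dtype=dtype)

print(by_name.cat.categories.tolist())   # ['a', 'b']
print(by_dtype.cat.categories.tolist())  # ['b', 'a']
print(by_dtype.cat.ordered)              # True
```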

If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.

Sparse data

Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as a SparseArray.

SparseArray(data[, sparse_index, index, …]): An ExtensionArray for storing sparse data.
SparseDtype(dtype, numpy.dtype, …): Dtype for data stored in SparseArray.

The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor for more.
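
A minimal sketch of sparse storage and the accessor. It assumes the array class is reachable as pd.arrays.SparseArray, which may differ from the name shown above depending on the pandas version; the data is illustrative.

```python
import pandas as pd

# Only the values that differ from fill_value are stored explicitly.
arr = pd.arrays.SparseArray([0, 0, 1, 0, 0, 2], fill_value=0)
s = pd.Series(arr)

print(isinstance(s.dtype, pd.SparseDtype))  # True
print(s.sparse.npoints)                     # 2 (explicitly stored values)
print(s.sparse.fill_value)                  # 0
print(s.sparse.to_dense().tolist())         # [0, 0, 1, 0, 0, 2]
```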