Timezones

There are four distinct timezone components that relate to Apache Superset:

  1. The timezone that the underlying data is encoded in.
  2. The timezone of the database engine.
  3. The timezone of the Apache Superset backend.
  4. The timezone of the Apache Superset client.

If a temporal field (DATETIME, TIME, TIMESTAMP, etc.) does not explicitly define a timezone, it defaults to the timezone of the component it resides in.
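As a minimal illustration of this defaulting rule (standard library only), the sketch below interprets one naive value under a hypothetical timezone for each component. The fixed offsets, including UTC-5 for the engine and UTC+13 standing in for New Zealand Daylight Time on the client, are assumptions chosen purely for illustration, and `as_epoch_ms` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# A temporal value with no explicit timezone (a "naive" datetime).
naive = datetime(2022, 1, 1, 0, 0, 0)

# Hypothetical timezones for each component; fixed offsets keep the
# example deterministic (UTC+13 stands in for New Zealand Daylight Time).
components = {
    "data": timezone.utc,
    "database engine": timezone(timedelta(hours=-5)),
    "Superset backend": timezone.utc,
    "Superset client": timezone(timedelta(hours=13)),
}


def as_epoch_ms(value: datetime, default_tz: timezone) -> int:
    """Interpret a naive value in the component's default timezone and
    return milliseconds since the Unix epoch (which is always UTC)."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=default_tz)
    return round(value.timestamp() * 1000)


for name, tz in components.items():
    # Same wall-clock value, three different instants.
    print(f"{name:>16}: {as_epoch_ms(naive, tz)}")
```

The same wall-clock value resolves to three different instants, which is why the components need to agree.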

To make the problem somewhat tractable, and given that Apache Superset has no control over either how the data is ingested (1) or the timezone of the client (4), it is highly recommended from a consistency standpoint that both (2) and (3) be configured to use the same timezone, preferably UTC, to ensure that temporal fields without an explicit timezone are not coerced into the wrong timezone. In fact, Apache Superset currently makes implicit assumptions that timestamps are in UTC, so configuring (3) with a non-UTC timezone could be problematic.
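The following sketch shows what "incorrectly coerced" means in practice. `normalize_to_utc` is a hypothetical helper, not a Superset function: it attaches an assumed timezone to naive values and converts everything to UTC. If the backend's assumption disagrees with the engine's actual timezone, the value silently shifts.

```python
from datetime import datetime, timedelta, timezone


def normalize_to_utc(value: datetime, assumed_tz: timezone = timezone.utc) -> datetime:
    """Return an aware UTC datetime.

    Naive values are assumed to be in `assumed_tz` (the timezone the
    backend believes the database engine uses); aware values are converted.
    """
    if value.tzinfo is None:
        value = value.replace(tzinfo=assumed_tz)
    return value.astimezone(timezone.utc)


# A naive value returned by a database engine configured for UTC.
from_db = datetime(2022, 1, 1, 0, 0)

# Engine and backend agree on UTC: the value is preserved.
ok = normalize_to_utc(from_db)

# Backend wrongly assumes UTC+13 (hypothetical misconfiguration):
# the value is coerced to an instant 13 hours earlier than intended.
skewed = normalize_to_utc(from_db, timezone(timedelta(hours=13)))

print(ok.isoformat())      # 2022-01-01T00:00:00+00:00
print(skewed.isoformat())  # 2021-12-31T11:00:00+00:00
print(ok - skewed)         # 13:00:00
```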

To maintain data consistency regardless of the client's timezone, the Apache Superset backend tries to ensure that any timestamp sent to the client has an explicit timezone encoded within it, or a semi-explicit one, as in the case of Epoch time, which is always relative to UTC.
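A rough illustration of the distinction, using only the standard library. The payload keys are hypothetical and do not reflect Superset's actual response format; the point is that only the first two entries can be placed on the timeline without knowing the client's timezone.

```python
import json
from datetime import datetime, timezone

ts = datetime(2022, 1, 1, tzinfo=timezone.utc)

payload = {
    # Explicit: ISO 8601 string carrying its UTC offset.
    "explicit": ts.isoformat(),
    # Semi-explicit: Epoch milliseconds, implicitly relative to UTC.
    "epoch_ms": round(ts.timestamp() * 1000),
    # Ambiguous: wall-clock string without any timezone information.
    "naive": ts.replace(tzinfo=None).strftime("%Y-%m-%d %H:%M:%S"),
}

print(json.dumps(payload))
# {"explicit": "2022-01-01T00:00:00+00:00", "epoch_ms": 1640995200000, "naive": "2022-01-01 00:00:00"}
```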

The challenge, however, lies with the slew of database engines that Apache Superset supports, the various inconsistencies between their Python Database API (DB-API) implementations, and the fact that we use Pandas to read SQL into a DataFrame before serializing it to JSON. Regrettably, Pandas ignores the DB-API type_code and by default relies on the underlying Python type returned by the DB-API. Currently only a subset of the supported database engines work correctly with Pandas, i.e., timestamps without an explicit timezone are serialized to JSON with the server timezone, thus guaranteeing that the client displays timestamps consistently irrespective of its timezone.
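The dependence on the returned Python type can be sketched with the standard library's sqlite3 driver, which reports None for every type_code in `cursor.description`. The converter name `dt` and the `to_json_value` helper are hypothetical; the helper only mimics type-driven serialization and is not Pandas' actual code.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical converter: columns tagged "[dt]" in the query are parsed
# into datetime objects by the driver (sqlite3 passes the raw bytes).
sqlite3.register_converter("dt", lambda raw: datetime.fromisoformat(raw.decode()))


def to_json_value(value):
    """Mimic type-driven serialization (not Pandas' actual code):
    datetimes become Epoch milliseconds, with naive values treated as UTC;
    anything else is passed through unchanged."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        return round(value.timestamp() * 1000)
    return value


conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_COLNAMES)
sql = "SELECT '2022-01-01 00:00:00' AS {}"

# Same SQL value; only the Python type returned by the driver differs.
plain = conn.execute(sql.format("ts"))
type_code = plain.description[0][1]  # sqlite3 always reports None here
as_str = plain.fetchone()[0]
as_dt = conn.execute(sql.format('"ts [dt]"')).fetchone()[0]
conn.close()

print(type_code)                                     # None
print(type(as_str).__name__, to_json_value(as_str))  # str 2022-01-01 00:00:00
print(type(as_dt).__name__, to_json_value(as_dt))    # datetime 1640995200000
```

With no usable type_code, the serialized form is decided entirely by whether the driver hands back a `str` or a `datetime`.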

For example, the following compares MySQL and Presto:

```python
import pandas as pd
from sqlalchemy import create_engine

# MySQL: the DB-API returns a datetime.datetime, which Pandas stores as datetime64.
pd.read_sql_query(
    sql="SELECT TIMESTAMP('2022-01-01 00:00:00') AS ts",
    con=create_engine("mysql://root@localhost:3306"),
).to_json()

# Presto: the DB-API returns a plain string, which Pandas leaves untouched.
pd.read_sql_query(
    sql="SELECT TIMESTAMP '2022-01-01 00:00:00' AS ts",
    con=create_engine("presto://localhost:8080"),
).to_json()
```

The first outputs {"ts":{"0":1640995200000}}, an Epoch timestamp that is implicitly UTC by definition, whereas the second outputs {"ts":{"0":"2022-01-01 00:00:00.000"}}, a string without an explicit timezone. JavaScript therefore treats them differently:

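```javascript
// Evaluated in a client whose local timezone is New Zealand Daylight Time (UTC+13).
new Date(1640995200000)
// > Sat Jan 01 2022 13:00:00 GMT+1300 (New Zealand Daylight Time)
new Date("2022-01-01 00:00:00.000")
// > Sat Jan 01 2022 00:00:00 GMT+1300 (New Zealand Daylight Time)
```

The Epoch value is rendered as the correct instant (1 PM local time), whereas the timezone-less string is parsed as midnight in the client's local timezone, 13 hours earlier than intended.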
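To reproduce the discrepancy outside the browser, the following sketch emulates a client fixed at UTC+13 (an assumption standing in for New Zealand Daylight Time). The `render` helper is hypothetical: it treats Epoch milliseconds as absolute instants and timezone-less strings as client-local wall-clock time, mirroring the JavaScript behavior in the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical client timezone: UTC+13 (New Zealand Daylight Time).
client_tz = timezone(timedelta(hours=13))


def render(value) -> str:
    """Display a JSON timestamp the way a browser client would:
    Epoch milliseconds are absolute instants, while naive strings are
    interpreted as the client's local time."""
    if isinstance(value, int):
        dt = datetime.fromtimestamp(value / 1000, tz=client_tz)
    else:
        dt = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=client_tz)
    return dt.strftime("%Y-%m-%d %H:%M %z")


mysql_ts = render(1640995200000)               # MySQL result
presto_ts = render("2022-01-01 00:00:00.000")  # Presto result

print(mysql_ts)   # 2022-01-01 13:00 +1300
print(presto_ts)  # 2022-01-01 00:00 +1300
```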