Caching

Superset uses Flask-Caching for caching. To configure it, provide a custom cache config in your superset_config.py that complies with the Flask-Caching specifications. Flask-Caching supports various caching backends, including Redis, Memcached, SimpleCache (in-memory), and the local filesystem. Custom cache backends are also supported. See the Flask-Caching documentation for specifics. The following cache configurations can be customized:

  • Metadata cache (optional): CACHE_CONFIG
  • Charting data queried from datasets (optional): DATA_CACHE_CONFIG
  • SQL Lab query results (optional): RESULTS_BACKEND (see Async Queries via Celery for details)
  • Dashboard filter state (required): FILTER_STATE_CACHE_CONFIG
  • Explore chart form data (required): EXPLORE_FORM_DATA_CACHE_CONFIG

Note that the Dashboard and Explore caches are required. If they are undefined, Superset falls back to a built-in cache that stores data in the metadata database. While a dedicated cache is recommended, the built-in cache can also be used to cache other data. For example, to use the built-in cache to store chart data, use the following config:

    DATA_CACHE_CONFIG = {
        "CACHE_TYPE": "SupersetMetastoreCache",
        "CACHE_KEY_PREFIX": "superset_results",  # make sure this string is unique to avoid collisions
        "CACHE_DEFAULT_TIMEOUT": 86400,  # 60 seconds * 60 minutes * 24 hours
    }

To use a dedicated cache store, install the matching Python client library:

  • Redis (recommended): we recommend the redis Python package.
  • Memcached: we recommend the pylibmc client library, as python-memcached does not handle storing binary data correctly.

Both of these libraries can be installed using pip.
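
As a rough sketch, a Redis-backed setup in superset_config.py might look like the following. The CACHE_TYPE, CACHE_REDIS_URL, CACHE_KEY_PREFIX, and CACHE_DEFAULT_TIMEOUT keys come from Flask-Caching. The REDIS_URL constant and the redis_cache helper are hypothetical names introduced only for this example, and the host, database numbers, prefixes, and timeouts are placeholder assumptions to adapt to your deployment.

```python
# superset_config.py (illustrative values only)

# Hypothetical local Redis instance; not a Superset setting.
REDIS_URL = "redis://localhost:6379"


def redis_cache(prefix: str, db: int, timeout: int) -> dict:
    """Hypothetical helper: build a Flask-Caching config dict for Redis."""
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_REDIS_URL": f"{REDIS_URL}/{db}",
        "CACHE_KEY_PREFIX": prefix,  # unique per cache to avoid collisions
        "CACHE_DEFAULT_TIMEOUT": timeout,  # seconds
    }


CACHE_CONFIG = redis_cache("superset_metadata_", db=1, timeout=300)
DATA_CACHE_CONFIG = redis_cache("superset_data_", db=2, timeout=86400)
FILTER_STATE_CACHE_CONFIG = redis_cache("superset_filter_", db=3, timeout=86400)
EXPLORE_FORM_DATA_CACHE_CONFIG = redis_cache("superset_explore_", db=4, timeout=86400)
```

Giving each cache its own key prefix keeps entries from different caches from colliding, even if they end up sharing one Redis database.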

For chart data, Superset resolves the cache timeout along a “timeout search path”: it checks the slice’s (chart’s) configuration first, then the datasource’s, then the database’s, and finally falls back to the global default defined in DATA_CACHE_CONFIG.
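
The lookup order can be illustrated with a small standalone function. This is a minimal sketch of the idea, not Superset's actual implementation: the function name resolve_cache_timeout and its parameters are hypothetical, and it assumes that an unset timeout is represented as None.

```python
from typing import Optional


def resolve_cache_timeout(
    chart_timeout: Optional[int],
    datasource_timeout: Optional[int],
    database_timeout: Optional[int],
    default_timeout: int,
) -> int:
    """Return the first timeout that is set, most specific level first."""
    for candidate in (chart_timeout, datasource_timeout, database_timeout):
        if candidate is not None:  # assumption: None means "not configured"
            return candidate
    # Nothing set at any level: use the global default,
    # e.g. DATA_CACHE_CONFIG["CACHE_DEFAULT_TIMEOUT"].
    return default_timeout


# The chart has no timeout, so the datasource's value wins.
print(resolve_cache_timeout(None, 600, 3600, 86400))  # 600
# Nothing is configured, so the global default is used.
print(resolve_cache_timeout(None, None, None, 86400))  # 86400
```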

Caching Thumbnails

This is an optional feature that can be turned on by enabling its feature flags in the config:

    FEATURE_FLAGS = {
        "THUMBNAILS": True,
        "THUMBNAILS_SQLA_LISTENERS": True,
    }

This feature requires a cache system and Celery workers. All thumbnails are stored in the cache and are processed asynchronously by the workers.

An example config where images are stored on S3 could be:

    from flask import Flask
    from s3cache.s3cache import S3Cache

    ...

    class CeleryConfig(object):
        broker_url = "redis://localhost:6379/0"
        imports = ("superset.sql_lab", "superset.tasks", "superset.tasks.thumbnails")
        result_backend = "redis://localhost:6379/0"
        worker_prefetch_multiplier = 10
        task_acks_late = True

    CELERY_CONFIG = CeleryConfig

    def init_thumbnail_cache(app: Flask) -> S3Cache:
        return S3Cache("bucket_name", "thumbs_cache/")

    THUMBNAIL_CACHE_CONFIG = init_thumbnail_cache

    # Async selenium thumbnail task will use the following user
    THUMBNAIL_SELENIUM_USER = "Admin"

With the above example, cache keys for dashboards will be superset_thumb__dashboard__{ID}. You can override the base URL used by Selenium with:

    WEBDRIVER_BASEURL = "https://superset.company.com"

Additional Selenium WebDriver configuration can be set using WEBDRIVER_CONFIGURATION. You can also implement a custom function to authenticate Selenium. The default function uses the flask-login session cookie. Here’s an example of a custom function signature:

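    from selenium.webdriver.remote.webdriver import WebDriver

    def auth_driver(driver: WebDriver, user: "User") -> WebDriver:
        return driver  # authenticate the driver as `user` before returning it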

Then reference it in the configuration:

    WEBDRIVER_AUTH_FUNC = auth_driver
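
To make the shape of such a function more concrete, here is a minimal sketch that authenticates by attaching a pre-issued session cookie. It is an assumption-laden illustration rather than Superset's default behavior: BASE_URL, SERVICE_COOKIE, the "/login/" path, and the DriverLike protocol are hypothetical names chosen for this example, and the cookie value is a placeholder. Only the get and add_cookie methods of the Selenium WebDriver are used. The protocol stands in for the real WebDriver type so the sketch runs without Selenium installed.

```python
from typing import Any, Protocol


class DriverLike(Protocol):
    """The subset of the Selenium WebDriver interface used below."""

    def get(self, url: str) -> None: ...
    def add_cookie(self, cookie_dict: dict) -> None: ...


# Hypothetical values; in practice these would match WEBDRIVER_BASEURL
# and a session cookie issued for the thumbnail user.
BASE_URL = "https://superset.company.com"
SERVICE_COOKIE = {"name": "session", "value": "<placeholder-token>"}


def auth_driver(driver: DriverLike, user: Any) -> DriverLike:
    # Selenium only accepts cookies for the domain that is currently
    # loaded, so open a page on the Superset host first.
    driver.get(BASE_URL + "/login/")
    driver.add_cookie(SERVICE_COOKIE)
    return driver


WEBDRIVER_AUTH_FUNC = auth_driver
```

In a real deployment, the cookie would be generated per user rather than hard-coded. The `user` argument is left unused here only to keep the sketch short.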