Installing Superset Locally Using Docker Compose

The fastest way to try Superset locally is with Docker and Docker Compose on a Linux or Mac OSX computer. Superset does not officially support Windows, so a VM workaround is described below.

1. Install Docker Engine and Docker Compose

Mac OSX

Install Docker for Mac, which includes the Docker engine and a recent version of docker compose out of the box.

Once you have Docker for Mac installed, open up the preferences pane for Docker, go to the “Resources” section and increase the allocated memory to 6GB. With only the 2GB of RAM allocated by default, Superset will fail to start.
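As a rough way to confirm that the new memory limit took effect, the sketch below reads the total memory Docker reports through docker info and compares it with a threshold. The function names are hypothetical. Docker may report slightly less than the value set in the preferences pane, so the sketch compares against 5 GiB rather than 6; that threshold is a guess, not a documented figure.

```shell
# Succeed if a byte count is at least the given number of GiB.
has_min_memory() {
  local bytes="$1"
  local min_gib="$2"
  [ "$bytes" -ge $(( min_gib * 1024 * 1024 * 1024 )) ]
}

# Compare the memory Docker reports against a threshold (default 5 GiB,
# an assumed value chosen to tolerate Docker reporting a bit under 6GB).
check_docker_memory() {
  local min_gib="${1:-5}"
  local bytes
  bytes="$(docker info --format '{{.MemTotal}}')" || return 2
  if has_min_memory "$bytes" "$min_gib"; then
    echo "Docker memory looks sufficient: ${bytes} bytes"
  else
    echo "Docker reports only ${bytes} bytes; consider raising the limit" >&2
    return 1
  fi
}

# Usage: check_docker_memory
```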

Linux

Install Docker on Linux by following Docker’s instructions for whichever flavor of Linux suits you. Because docker compose is not installed as part of the base Docker installation on Linux, once you have a working engine, follow the docker compose installation instructions for Linux.

Windows

Unfortunately, Superset is not officially supported on Windows. One option for Windows users who want to try Superset locally is to install an Ubuntu Desktop VM via VirtualBox and follow the Docker on Linux instructions inside that VM. We recommend assigning at least 8GB of RAM to the virtual machine and provisioning a hard drive of at least 40GB, so that there is enough space for both the OS and all of the required dependencies. Docker Desktop recently added support for Windows Subsystem for Linux (WSL) 2, which may be another option.

2. Clone Superset’s GitHub repository

Clone Superset’s repo in your terminal with the following command:

  git clone https://github.com/apache/superset.git

Once that command completes successfully, you should see a new superset folder in your current directory.

3. Launch Superset Through Docker Compose

Navigate to the folder you created in step 2:

  cd superset

When working on the master branch, run the following command to start Superset in development mode using docker compose:

  docker compose up

Tip

When running in development mode, the superset-node container needs to finish building assets before the UI can render properly. If you just want to try out Superset without making any code changes, follow the steps documented for production or for a specific version below.

When working on the master branch, run the following commands to start Superset in production mode using docker compose:

  docker compose -f docker-compose-non-dev.yml pull
  docker compose -f docker-compose-non-dev.yml up

Alternatively, you can also run a specific version of Superset by first checking out the branch/tag, and then starting docker compose with the TAG variable. For example, to run the 3.0.0 version, run the following commands on Linux-based systems:

  git checkout 3.0.0
  TAG=3.0.0 docker compose -f docker-compose-non-dev.yml pull
  TAG=3.0.0 docker compose -f docker-compose-non-dev.yml up

If you are using Docker Desktop for Windows, run the following commands instead (the set syntax shown is for Command Prompt):

  git checkout 3.0.0
  set TAG=3.0.0
  docker compose -f docker-compose-non-dev.yml pull
  docker compose -f docker-compose-non-dev.yml up

Tip

Note that some configuration is mandatory for production instances of Superset. In particular, Superset will not start without a user-specified value of SECRET_KEY in a Superset config file or SUPERSET_SECRET_KEY as an environment variable. Please see Configuring Superset for more details.
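One possible way to produce a value for SUPERSET_SECRET_KEY is to generate random bytes and base64-encode them, as sketched below. The function name generate_secret_key is hypothetical, and the key length of 42 bytes is an illustrative choice. How the value reaches the container (for instance through one of the environment files under docker/) depends on your setup and is not shown here.

```shell
# Generate a random, base64-encoded value suitable for use as a secret key.
generate_secret_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -base64 42
  else
    # Fallback when openssl is not installed.
    head -c 42 /dev/urandom | base64
  fi
}

# Export the key so that processes started from this shell can read it.
SUPERSET_SECRET_KEY="$(generate_secret_key)"
export SUPERSET_SECRET_KEY
echo "Generated a key of ${#SUPERSET_SECRET_KEY} characters"
```

Store the generated value somewhere safe: changing it later can make data encrypted with the previous key unreadable.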

Caution

All of the content belonging to a Superset instance - charts, dashboards, users, etc. - is stored in its metadata database. In production, this database should be backed up. The default installation with docker compose will store that data in a PostgreSQL database contained in a Docker volume, which is not backed up. To avoid risking data loss, either use a managed database for your metadata (recommended) or perform your own regular backups by extracting and storing the contents of the default PostgreSQL database from its volume (here’s an example of how to dump and restore).
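For the do-it-yourself backup route, a minimal sketch might look like the following. It assumes the PostgreSQL service is named db and that both the database user and the database are named superset; check your compose file and environment files before relying on those names. The function names are hypothetical.

```shell
# Dump the metadata database from the running compose stack into a file.
# Assumed names: service "db", user "superset", database "superset".
backup_metadata() {
  local out="${1:-superset-metadata-$(date +%Y%m%d-%H%M%S).sql}"
  docker compose -f docker-compose-non-dev.yml exec -T db \
    pg_dump -U superset superset > "$out" || return 1
  echo "$out"
}

# Load a dump produced by backup_metadata back into the same database.
restore_metadata() {
  docker compose -f docker-compose-non-dev.yml exec -T db \
    psql -U superset superset < "$1"
}

# Usage:
#   backup_metadata                # writes a timestamped .sql file
#   restore_metadata my-dump.sql   # replays it into the database
```

The -T flag disables pseudo-TTY allocation so that the dump can be redirected cleanly. Copy the resulting file off the machine; a backup that lives next to the Docker volume protects against little.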

You should see a wall of logging output from the containers being launched on your machine. Once this output slows, you should have a running instance of Superset on your local machine! To avoid the wall of text on future runs, add the -d option to the end of the docker compose up command.
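The detached workflow can be wrapped in a few small helpers, sketched here for the non-dev compose file. The function names and the COMPOSE_YML variable are hypothetical conveniences, not part of Superset.

```shell
# Compose file used by all helpers below; change it for development mode.
COMPOSE_YML="docker-compose-non-dev.yml"

# Start the stack in the background instead of streaming logs.
superset_up()   { docker compose -f "$COMPOSE_YML" up -d; }

# Show which containers are running and their state.
superset_ps()   { docker compose -f "$COMPOSE_YML" ps; }

# Follow the logs on demand, starting from the last 100 lines.
superset_logs() { docker compose -f "$COMPOSE_YML" logs -f --tail 100; }

# Stop and remove the containers; named volumes are kept.
superset_down() { docker compose -f "$COMPOSE_YML" down; }
```

Running superset_up followed by superset_ps gives a quick view of container state without the log output, and superset_logs brings the output back when something needs debugging.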

Configuring Docker Compose

The following is for users who want to configure how Superset runs in Docker Compose; otherwise, you can skip to the next section.

You can install additional Python packages and apply config overrides by following the steps described in docker/README.md.

You can configure the Docker Compose environment variables for dev and non-dev mode in docker/.env and docker/.env-non-dev respectively. These files set the environment for most containers in the Docker Compose setup; some variables affect several containers, while others affect only one.

One important variable is SUPERSET_LOAD_EXAMPLES, which determines whether the superset_init container populates the metadata database with example data and visualizations. These examples are helpful for learning and testing Superset but are unnecessary for experienced users and production deployments. Loading them can take a few minutes and a good amount of CPU, so you may want to disable it on a resource-constrained device.
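Editing the environment file by hand works fine, but the change can also be scripted. The helper below is a hypothetical sketch that sets a KEY=VALUE line in an env file, replacing any existing assignment. The value no for disabling the examples is an assumption; confirm the accepted values in your copy of the docker/ directory.

```shell
# Set KEY=VALUE in an env file, replacing an existing line or appending one.
set_env_var() {
  local file="$1"
  local key="$2"
  local value="$3"
  local tmp
  [ -f "$file" ] || return 1
  tmp="$(mktemp)" || return 1
  # Keep every line except a previous assignment of the key.
  grep -v "^${key}=" "$file" > "$tmp"
  # Append the new assignment at the end.
  echo "${key}=${value}" >> "$tmp"
  # Write back in place so the original file permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (the value "no" is assumed):
#   set_env_var docker/.env-non-dev SUPERSET_LOAD_EXAMPLES no
```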

Note

Users often want to connect to other databases from Superset. Currently, the easiest way to do this is to modify the docker-compose-non-dev.yml file and add your database as a service that the other services depend on (via x-superset-depends-on). Others have attempted to set network_mode: host on the Superset services, but this generally breaks the installation, because the configuration relies on the Docker Compose DNS resolver for the service names. If you have a good solution for this, let us know!

Note

Superset uses Scarf Gateway to collect telemetry data. Knowing the installation counts for different Superset versions informs the project’s decisions about patching and long-term support. Scarf purges personally identifiable information (PII) and provides only aggregated statistics.

To opt out of this data collection for packages downloaded through the Scarf Gateway by your docker compose-based installation, edit the x-superset-image: line in your docker-compose.yml and docker-compose-non-dev.yml files, replacing apachesuperset.docker.scarf.sh/apache/superset with apache/superset to pull the image directly from Docker Hub.

To disable the Scarf telemetry pixel, set the SCARF_ANALYTICS environment variable to False in your terminal and/or in your docker/.env and docker/.env-non-dev files.
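Both opt-out steps can be scripted if you prefer not to edit the files by hand. This is a sketch only: the function name use_docker_hub_image is hypothetical, and it performs a plain text substitution of the image name described above.

```shell
SCARF_IMAGE="apachesuperset.docker.scarf.sh/apache/superset"
HUB_IMAGE="apache/superset"

# Rewrite a compose file so the image is pulled directly from Docker Hub.
use_docker_hub_image() {
  local file="$1"
  local tmp
  tmp="$(mktemp)" || return 1
  # Substitute the Scarf Gateway image name, then write the result back.
  sed "s|${SCARF_IMAGE}|${HUB_IMAGE}|g" "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Disable the telemetry pixel for commands run from this terminal.
export SCARF_ANALYTICS=False

# Usage:
#   use_docker_hub_image docker-compose.yml
#   use_docker_hub_image docker-compose-non-dev.yml
```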

4. Log in to Superset

Your local Superset instance includes a Postgres server to store your data and comes pre-loaded with some example datasets that ship with Superset. You can now access Superset in your web browser at http://localhost:8088. Note that many browsers now default to https; if yours does, make sure it uses http.
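If the page does not load right away, the containers may still be starting. The following sketch polls the instance with curl until it answers. The function name wait_for_superset and the retry values are hypothetical, and the /health path is an assumption about your Superset version; any URL that returns a success status once the server is up would do.

```shell
# Poll a URL until it returns a success status or the attempts run out.
wait_for_superset() {
  local url="${1:-http://localhost:8088/health}"
  local attempts="${2:-30}"
  local delay="${3:-5}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; -sS hides progress but keeps errors.
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      echo "Superset is responding at $url"
      return 0
    fi
    sleep "$delay"
    i=$(( i + 1 ))
  done
  echo "Superset did not respond after $attempts attempts" >&2
  return 1
}

# Usage: wait_for_superset
```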

Log in with the default username and password:

  username: admin
  password: admin

5. Connecting Superset to your local database instance

When you run Superset using docker or docker compose, it runs in its own docker container, as if it were running on a separate machine entirely. Attempts to connect to your local database with the hostname localhost therefore won’t work, because localhost refers to the docker container Superset is running in and not to your actual host machine. Fortunately, docker provides an easy way to access network resources on the host machine from inside a container, and we will use this capability to connect to our local database instance.

The instructions here are for connecting to postgresql (running on your host machine) from Superset (running in its docker container). Other databases may need slightly different configuration, but the gist is the same and boils down to two steps:

  1. (Mac users may skip this step) Configure the local postgresql/database instance to accept public incoming connections. By default, postgresql only allows incoming connections from localhost, and under Docker, unless you use --network=host, localhost refers to different endpoints on the host machine and in a docker container. Allowing postgresql to accept connections from Docker involves making one-line changes to the files postgresql.conf and pg_hba.conf; helpful guides tailored to your OS and PG version are easy to find on the web. For Docker it suffices to whitelist only the IPs 172.0.0.0/8 instead of *, but in any case be warned that doing this on a production database may have disastrous consequences, as you are opening your database to the public internet.
  2. Instead of localhost, try using host.docker.internal (Mac users, Ubuntu) or 172.18.0.1 (Linux users) as the hostname when connecting to the database. This is a Docker internal detail: on Mac systems, Docker Desktop creates a DNS entry for the hostname host.docker.internal that resolves to the correct address for the host machine, whereas on Linux this is not the case (at least by default). If neither of these two hostnames works, find the address to use by running ifconfig or ip addr show and looking at the IP address of the docker0 interface that Docker should have created. Alternatively, if you do not see a docker0 interface, run docker network inspect bridge (with sudo if needed) and note the IP address in the "Gateway" entry.
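The lookup described in step 2 can be sketched as two small helpers for Linux hosts. The function names are hypothetical; the first reads the gateway of Docker's default bridge network and the second reads the IPv4 address of the docker0 interface.

```shell
# Print the gateway address of Docker's default bridge network.
docker_bridge_gateway() {
  docker network inspect bridge \
    --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}'
}

# Print the IPv4 address assigned to the docker0 interface, if it exists.
docker0_address() {
  ip -4 addr show docker0 | awk '/inet /{split($2, a, "/"); print a[1]}'
}

# Usage (prefix with sudo if your user cannot talk to the Docker daemon):
#   docker0_address || docker_bridge_gateway
```

Use the address printed by either helper as the hostname in the database connection form in place of localhost.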