Getting familiar with the code and tools

Now that you have an issue you want to fix, an enhancement to add, or documentation to improve, you need to learn how to work with GitHub and the pandas code base.

Version control, Git, and GitHub

For new contributors, working with Git is one of the more daunting aspects of contributing to pandas. It can quickly become overwhelming, but sticking to the guidelines below will help keep the process straightforward and mostly trouble-free. As always, if you are having difficulties, please feel free to ask for help.

The code is hosted on GitHub. To contribute you will need to sign up for a free GitHub account. We use Git for version control to allow many people to work together on the project.

Some great resources for learning Git:

Getting started with Git

GitHub has instructions for installing Git, setting up your SSH key, and configuring Git. All of these steps need to be completed before you can work seamlessly between your local repository and GitHub.
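Part of configuring Git is setting the name and email recorded on your commits. If you want to check that from a script, here is a minimal Python sketch, assuming git is on your PATH. The helpers git_config_value and missing_identity_keys are hypothetical names that simply wrap git config --get:

```python
import shutil
import subprocess


def git_config_value(key):
    """Return the value of a Git config key, or None if it is unset."""
    if shutil.which("git") is None:
        raise RuntimeError("git is not installed or not on PATH")
    # `git config --get` exits with a nonzero status when the key is not set.
    result = subprocess.run(
        ["git", "config", "--get", key],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def missing_identity_keys():
    """List which of user.name / user.email still need to be configured."""
    return [
        key
        for key in ("user.name", "user.email")
        if git_config_value(key) is None
    ]
```

An empty list means both keys are set; anything listed can be filled in with git config --global user.name "Your Name" and the matching user.email command.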

Forking

You will need your own fork to work on the code. Go to the pandas project page and hit the Fork button. You will want to clone your fork to your machine:

    git clone https://github.com/your-user-name/pandas.git pandas-yourname
    cd pandas-yourname
    git remote add upstream https://github.com/pandas-dev/pandas.git

This creates the directory pandas-yourname and connects your repository to the upstream (main project) pandas repository.
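To double-check that both remotes are wired up, you can inspect the output of git remote -v. A minimal Python sketch, assuming git is on your PATH; parse_remotes and list_remotes are hypothetical helper names, and only the fetch URL of each remote is kept:

```python
import subprocess


def parse_remotes(output):
    """Map remote names to fetch URLs, given the text of `git remote -v`."""
    remotes = {}
    for line in output.splitlines():
        # Each line looks like "<name>\t<url> (fetch)" or "<name>\t<url> (push)".
        name, _, rest = line.partition("\t")
        url, _, kind = rest.rpartition(" ")
        if kind == "(fetch)":
            remotes[name] = url
    return remotes


def list_remotes(repo_dir="."):
    """Run `git remote -v` in repo_dir and parse its output."""
    result = subprocess.run(
        ["git", "remote", "-v"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_remotes(result.stdout)
```

Run from inside pandas-yourname, list_remotes() should contain both origin (your fork) and upstream (pandas-dev/pandas).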

Creating a development environment

To test out code changes, you’ll need to build pandas from source, which requires a C compiler and a Python environment. If you’re making documentation changes, you can skip to Contributing to the documentation, but you won’t be able to build the documentation locally before pushing your changes.

Installing a C Compiler

Pandas uses C extensions (mostly written using Cython) to speed up certain operations. To install pandas from source, you need to compile these C extensions, which means you need a C compiler. This process depends on which platform you’re using. Follow the CPython contributing guidelines for getting a compiler installed. You don’t need to do any of the ./configure or make steps; you only need to install the compiler.

For Windows developers, the following links may be helpful.

Let us know if you have any difficulties by opening an issue or reaching out on Gitter.
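As a rough first check that a compiler is reachable, a minimal Python sketch could look on your PATH for common compiler executables. This is an assumption-laden heuristic, not the pandas build's own detection: find_c_compiler is a hypothetical helper, and finding an executable does not prove it can build the extensions.

```python
import shutil
import sysconfig


def find_c_compiler():
    """Return the path of a C compiler found on PATH, or None."""
    candidates = []
    # On POSIX builds, CC records the compiler Python itself was built with,
    # e.g. "gcc -pthread"; it is typically None on Windows.
    configured = sysconfig.get_config_var("CC")
    if configured:
        candidates.append(configured.split()[0])
    # Common compiler executables; "cl" is MSVC, which is usually only on
    # PATH inside a Visual Studio developer command prompt.
    candidates += ["cc", "gcc", "clang", "cl"]
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

If this returns None, follow the CPython compiler instructions before trying to build pandas.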

Creating a Python Environment

Now that you have a C compiler, create an isolated pandas development environment:

  • Install either Anaconda or miniconda
  • Make sure your conda is up to date (conda update conda)
  • Make sure that you have cloned the repository
  • cd to the pandas source directory

We’ll now kick off a three-step process:

  • Install the build dependencies
  • Build and install pandas
  • Install the optional dependencies

    # Create and activate the build environment
    conda env create -f ci/environment-dev.yaml
    conda activate pandas-dev

    # or with older versions of Anaconda:
    source activate pandas-dev

    # Build and install pandas
    python setup.py build_ext --inplace -j 4
    python -m pip install -e .

    # Install the rest of the optional dependencies
    conda install -c defaults -c conda-forge --file=ci/requirements-optional-conda.txt
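The -j 4 flag asks build_ext to compile extensions with four parallel jobs. If you prefer to script the build step, here is a minimal Python sketch, assuming you want one job per CPU core instead of a fixed four; build_ext_command and build_and_install are hypothetical helper names that run the same two build commands listed above:

```python
import os
import subprocess
import sys


def build_ext_command(jobs=None):
    """Return the in-place build command with -j set to the job count."""
    # Default to one job per CPU core; os.cpu_count() can return None.
    jobs = jobs or os.cpu_count() or 1
    return [sys.executable, "setup.py", "build_ext", "--inplace", "-j", str(jobs)]


def build_and_install(source_dir="."):
    """Build the C extensions in place, then do an editable install."""
    subprocess.run(build_ext_command(), cwd=source_dir, check=True)
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-e", "."],
        cwd=source_dir,
        check=True,
    )
```

Using sys.executable ensures both commands run with the interpreter of the currently active environment.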

At this point you should be able to import pandas from your locally built version:

    $ python  # start an interpreter
    >>> import pandas
    >>> print(pandas.__version__)
    0.22.0.dev0+29.g4ad6d4d74
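To confirm that the interpreter picked up your checkout rather than some other pandas installation, you can compare the module's file path with your source directory. A minimal sketch; imported_from is a hypothetical helper:

```python
import importlib
from pathlib import Path


def imported_from(module_name, source_dir):
    """Return True if module_name is imported from somewhere inside source_dir."""
    module = importlib.import_module(module_name)
    module_path = Path(module.__file__).resolve()
    return Path(source_dir).resolve() in module_path.parents
```

Run from inside pandas-yourname after the editable install (python -m pip install -e .), imported_from("pandas", ".") should return True.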

Creating the pandas-dev environment this way does not touch any of your existing environments, nor any existing Python installation.

To view your environments:

    conda info -e

To return to your root environment:

    conda deactivate

See the full conda documentation for more details.
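If you want to check for the pandas-dev environment from a script, conda can print its environment list as JSON. A minimal sketch, assuming conda is on your PATH and that conda info --envs --json returns an object with an envs list of environment paths; conda_env_names and has_env are hypothetical helpers:

```python
import json
import os
import subprocess


def conda_env_names(json_text):
    """Return environment names from `conda info --envs --json` output."""
    data = json.loads(json_text)
    # Named environments live in <root>/envs/<name>, so the last path
    # component is the name; the root environment shows up as its install
    # directory (e.g. "miniconda3") rather than "base".
    return [os.path.basename(os.path.normpath(path)) for path in data.get("envs", [])]


def has_env(name):
    """Ask conda whether an environment called name exists."""
    result = subprocess.run(
        ["conda", "info", "--envs", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return name in conda_env_names(result.stdout)
```

For example, has_env("pandas-dev") should be True once conda env create -f ci/environment-dev.yaml has succeeded.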

Creating a Python Environment (pip)

If you aren’t using conda for your development environment, follow these instructions. You’ll need to have at least Python 3.5 installed on your system.

    # Create a virtual environment
    # Use an ENV_DIR of your choice. We'll use ~/virtualenvs/pandas-dev
    # Any parent directories should already exist
    python3 -m venv ~/virtualenvs/pandas-dev

    # Activate the virtualenv
    . ~/virtualenvs/pandas-dev/bin/activate

    # Install the build dependencies
    python -m pip install -r ci/requirements_dev.txt

    # Build and install pandas
    python setup.py build_ext --inplace -j 4
    python -m pip install -e .

    # Install additional dependencies
    python -m pip install -r ci/requirements-optional-pip.txt
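The first two steps can also be done from Python with the standard library venv module. A minimal sketch, assuming the same ~/virtualenvs/pandas-dev location; create_dev_env and in_virtualenv are hypothetical helper names:

```python
import sys
import venv
from pathlib import Path


def create_dev_env(env_dir="~/virtualenvs/pandas-dev", with_pip=True):
    """Create a virtual environment, like `python3 -m venv <env_dir>`."""
    env_path = Path(env_dir).expanduser()
    venv.create(str(env_path), with_pip=with_pip)
    return env_path


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix
```

After activating the environment, python -c "import sys; print(sys.prefix != sys.base_prefix)" should print True, which is the same check in_virtualenv performs.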

Creating a branch

You want your master branch to reflect only production-ready code, so create a feature branch for making your changes. For example:

    git branch shiny-new-feature
    git checkout shiny-new-feature

The above can be simplified to:

    git checkout -b shiny-new-feature

This changes your working directory to the shiny-new-feature branch. Keep any changes in this branch specific to one bug or feature so it is clear what the branch brings to pandas. You can have many shiny-new-features and switch between them using the git checkout command.
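Before committing, it can help to confirm which branch you are on. A minimal Python sketch, assuming git is on your PATH; current_branch and switch_to_new_branch are hypothetical helpers around git symbolic-ref and git checkout -b:

```python
import subprocess


def current_branch(repo_dir="."):
    """Return the name of the branch currently checked out in repo_dir."""
    # `git symbolic-ref --short HEAD` prints e.g. "shiny-new-feature";
    # it fails on a detached HEAD, which is not on any branch.
    result = subprocess.run(
        ["git", "symbolic-ref", "--short", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def switch_to_new_branch(name, repo_dir="."):
    """Equivalent to `git checkout -b <name>`; returns the new current branch."""
    subprocess.run(
        ["git", "checkout", "-b", name],
        cwd=repo_dir,
        capture_output=True,
        check=True,
    )
    return current_branch(repo_dir)
```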

When creating this branch, make sure your master branch is up to date with the latest upstream master version. To update your local master branch, you can do:

    git checkout master
    git pull upstream master --ff-only
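Putting the update and the branch creation together gives a fixed sequence of commands. A minimal Python sketch, assuming the remote is named upstream and the base branch is master as in the commands above; feature_branch_commands and run_commands are hypothetical helpers:

```python
import subprocess


def feature_branch_commands(branch, base="master", remote="upstream"):
    """Commands that refresh the base branch, then branch off it."""
    return [
        ["git", "checkout", base],
        # --ff-only refuses to create a merge commit: the pull fails if your
        # local master has diverged from upstream instead of merging.
        ["git", "pull", remote, base, "--ff-only"],
        ["git", "checkout", "-b", branch],
    ]


def run_commands(commands, repo_dir="."):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)
```

For example, run_commands(feature_branch_commands("shiny-new-feature")) from inside pandas-yourname updates master and then creates the feature branch from it.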

When you want to update the feature branch with changes in master after you created the branch, check the section on updating a PR.