Version control

Why would you use version control software and hosting (such as GitHub)?

  • Easier to collaborate: Version control makes it easier to work on the same code simultaneously, while everyone still has a well-defined version of the software (in contrast to a Google Docs or shared file system type of setup). Moreover, version control hosting websites such as GitHub provide ways to communicate in a more structured way, such as in code reviews, about commits and about issues.
  • Reproducibility: By using version control, you never lose previous versions of the software. This also gives you a log of changes and allows you to understand what happened.
  • Backup: Version control is usually pushed to an external, shared server, which immediately provides a backup.
  • Integration: Version control software and hosting make it easier to integrate with other tools that support modern software development, such as continuous integration (automatically running tests, building documentation, checking code style), bug trackers, and code review infrastructure (commenting on code).

GitHub

The Netherlands eScience Center uses GitHub for version control.
To keep our code transparent and findable, the preferred code hosting platform is
GitHub and the version control system is git. The repository should preferably be public
from the start.

By default, an eScience Research Engineer is expected to create a new GitHub
organization for each project and create repositories in there.
However, a new repository should be made in the Netherlands eScience Center
GitHub organization (https://github.com/NLeSC) when the repository is used in
multiple projects.

Policy

  • No repositories which the Netherlands eScience Center is paying for should be in
    personal accounts. They SHOULD always be in either the Netherlands eScience
    Center GitHub organization or in a project based GitHub organization.
  • GitHub supports two-factor authentication.
    This SHOULD be enabled for your account.
  • Project based GitHub organizations
    • MUST have at least two owners that are Netherlands eScience Center employees
    • MUST be registered at https://nlesc.github.io/, to keep track of all the project organizations
    • Private repositories can be created. They are free when GitHub’s education discount is requested. NOTE: The Netherlands eScience Center IP policy applies to any software we contribute to, so the repository SHOULD become open source at some point. To prevent private repositories from remaining unnecessarily private forever, please add a brief statement in the README of your repository, clarifying:
      • Why is this repository private?
      • On which date can this repository be made public?
      • Who should be consulted if we would like to make the repository public in the future?
  • Netherlands eScience Center GitHub organization (https://github.com/NLeSC)
    • Only Netherlands eScience Center employees are members
    • All members have permission to create new repositories
    • Collaborators SHOULD be used to grant access to non-members
    • A limited number of slots for private repositories is available, but using them is discouraged
    • To prevent private repositories from remaining unnecessarily private forever, please add a brief statement in the README of your repository, clarifying:
      • Why is this repository private?
      • On which date can this repository be made public?
      • Who should be consulted if we would like to make the repository public in the future?

Version control from the beginning of the project

It is highly recommended to start using version control on day one of the project.

Use git as version control system

Other version control systems can be used if the project does not start in the
eScience Center and does not use git, or when the prevailing version control
system in the particular community is not git. Even then, switching to git
should be considered (especially if Subversion or another centralised system
is used).

Git documentation:

Choose one branching model

A branching model describes how the project deals with different versions of the codebase, like releases and various development versions, and how to accept code contributions.
Make the choice explicit in the contribution guidelines, and link to documentation on how to get started with it.
Our default choice is the GitHub flow branching model.

GitHub flow is a very simple and sane branching model. It supports collaboration and is based on pull requests, and therefore relies
heavily on GitHub. The Pro Git book describes in detail the workflow of collaborating on a project
with git branches, forks and GitHub in its Contributing to a Project chapter.
Other, more complicated models could be used if necessary, but we should strive for simplicity and uniformity
within the eScience Center, since that will enhance collaboration between the engineers. Learning a new branching model should not stand in the way
of contributions.
You can learn more about those other models from the Atlassian page.
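
As an illustration of how little there is to learn, the local part of one GitHub flow iteration can be sketched as a short list of git commands. This is a minimal sketch, not an official recipe: the function name `github_flow_commands`, the branch name and the remote name `origin` are assumptions made for the example. The script only prints the commands, it does not run them. Opening the pull request, reviewing and merging then happen on GitHub.

```python
import shlex


def github_flow_commands(branch, message, remote="origin"):
    """Return the git commands for the local part of one GitHub flow
    iteration, in order. The names used here are illustrative only."""
    return [
        # Create a feature branch from the current commit and switch to it.
        ["git", "checkout", "-b", branch],
        # Stage all changes in the working tree.
        ["git", "add", "--all"],
        # Record the staged changes with a commit message.
        ["git", "commit", "-m", message],
        # Publish the branch so a pull request can be opened on GitHub.
        ["git", "push", "--set-upstream", remote, branch],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in github_flow_commands("fix-readme-typo", "Fix typo in README"):
        print(shlex.join(command))
```

Keeping the commands as argument lists means they could be passed to `subprocess.run` unchanged if you do want to execute them.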

Repositories should be public

A public code repository has several benefits:

  • It makes your code findable.
  • It is a central point for users and collaborators.
  • It shows your code to the world, allowing (re)use and enabling you to get credit for your work.
  • It is usually not hosted on your laptop, and hence provides an external backup.

Unless code cannot be open (e.g. when working with commercial partners, or when
there are competitiveness issues) it should be in a public online repository. In
case the code uses data that cannot be open, an engineer should try to keep
sensitive parts outside of the main codebase. If you accidentally included
copyrighted files in your repository, you need to remove them from the HEAD as
well as from history. There is a gist here that explains how.
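
To give an idea of what removing a file from history involves, the sketch below builds one commonly used command sequence based on `git filter-branch`. It is not necessarily the procedure the gist describes. The function name `purge_file_commands` and the remote name `origin` are assumptions made for the example. The script only prints the commands. Rewriting history changes commit hashes, so coordinate with your collaborators before running anything like this; git's own documentation also points to the separate `git filter-repo` tool as an alternative.

```python
import shlex


def purge_file_commands(path, remote="origin"):
    """Return git commands that remove `path` from every commit.
    Illustrative sketch only; review carefully before running for real."""
    # Command run against the index of each rewritten commit;
    # --ignore-unmatch keeps it from failing on commits without the file.
    index_filter = "git rm --cached --ignore-unmatch " + shlex.quote(path)
    return [
        # Rewrite all branches and tags, dropping commits that become empty.
        ["git", "filter-branch", "--force",
         "--index-filter", index_filter,
         "--prune-empty", "--tag-name-filter", "cat", "--", "--all"],
        # Overwrite the branches on the remote with the rewritten history.
        ["git", "push", remote, "--force", "--all"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in purge_file_commands("data/copyrighted.csv"):
        print(shlex.join(command))
```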

Meaningful commit messages

Commit messages are the way for other developers to understand changes in the
codebase. When using the GitHub flow model, commit messages can be very short,
but pull request comments should explain all the changes. It is very important
to explain the why behind implementation choices. To learn more about writing
good commit messages, read tpope’s guide and this post.
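
The formatting conventions that such guides usually recommend (a short subject line, a blank line, then a wrapped body) are easy to check automatically. The sketch below assumes the commonly cited limits of 50 characters for the subject and 72 for body lines; the function name `check_commit_message` and both limits are assumptions for the example, so adjust them to your project. It cannot check whether the message explains the why, which remains the important part.

```python
def check_commit_message(message, subject_limit=50, body_limit=72):
    """Return a list of formatting problems found in a commit message.
    An empty list means no problems were found. The limits are
    assumed defaults, not fixed rules."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["subject line is empty"]

    problems = []
    if len(lines[0]) > subject_limit:
        problems.append(f"subject is longer than {subject_limit} characters")
    # The subject must be separated from the body by a blank line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    # Body lines start at line 3 of the message.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > body_limit:
            problems.append(f"line {number} is longer than {body_limit} characters")
    return problems


if __name__ == "__main__":
    example = "Fix crash on empty input\n\nThe parser assumed at least one row."
    print(check_commit_message(example))
```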

GitHub has some interesting features that allow you to close issues directly
from commit messages.
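
For example, a commit message containing `Fixes #12` closes issue 12 once the commit reaches the default branch. The sketch below lists the issue numbers that a message would refer to in this way. It uses the documented keywords (close, fix, resolve and their variants), but the regular expression is a simplified approximation written for this example, and the function name `closed_issues` is an assumption; GitHub's own matching may differ in details.

```python
import re

# Simplified approximation of the closing keywords, followed by "#<number>".
CLOSING_PATTERN = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?):?\s+#(\d+)",
    re.IGNORECASE,
)


def closed_issues(message):
    """Return the issue numbers referenced with a closing keyword,
    in the order they appear in the commit message."""
    return [int(number) for number in CLOSING_PATTERN.findall(message)]


if __name__ == "__main__":
    message = "Handle empty input\n\nFixes #12 and resolves #34."
    print(closed_issues(message))
```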

Code snippets library

Sometimes, we develop small snippets of code that can be directly reused in other projects, but that are too small to put in a library. We store these code snippets in git, in GitHub Gists.