How to add test queries to ClickHouse CI

ClickHouse has hundreds (or even thousands) of features. Every commit get checked by a complex set of tests containing many thousands of test cases.

The core functionality is very well tested, but some corner-cases and different combinations of features can be uncovered with ClickHouse CI.

Most of the bugs/regressions we see happen in that ‘grey area’ where test coverage is poor.

And we are very interested in covering most of the possible scenarios and feature combinations used in real life by tests.

Why adding tests

Why/when you should add a test case into ClickHouse code:
1) you use some complicated scenarios / feature combinations / you have some corner case which is probably not widely used
2) you see that certain behavior gets changed between version w/o notifications in the changelog
3) you just want to help to improve ClickHouse quality and ensure the features you use will not be broken in the future releases
4) once the test is added/accepted, you can be sure the corner case you check will never be accidentally broken.
5) you will be a part of great open-source community
6) your name will be visible in the system.contributors table!
7) you will make a world bit better :)

Steps to do

Prerequisite

I assume you run some Linux machine (you can use docker / virtual machines on other OS) and any modern browser / internet connection, and you have some basic Linux & SQL skills.

Any highly specialized knowledge is not needed (so you don’t need to know C++ or know something about how ClickHouse CI works).

Preparation

1) create GitHub account (if you haven’t one yet)
2) setup git

  1. # for Ubuntu
  2. sudo apt-get update
  3. sudo apt-get install git
  4. git config --global user.name "John Doe" # fill with your name
  5. git config --global user.email "[email protected]" # fill with your email

3) fork ClickHouse project - just open https://github.com/ClickHouse/ClickHouse and press fork button in the top right corner:
fork repo

4) clone your fork to some folder on your PC, for example, ~/workspace/ClickHouse

  1. mkdir ~/workspace && cd ~/workspace
  2. git clone https://github.com/< your GitHub username>/ClickHouse
  3. cd ClickHouse
  4. git remote add upstream https://github.com/ClickHouse/ClickHouse

New branch for the test

1) create a new branch from the latest clickhouse master

  1. cd ~/workspace/ClickHouse
  2. git fetch upstream
  3. git checkout -b name_for_a_branch_with_my_test upstream/master

Install & run clickhouse

1) install clickhouse-server (follow official docs)
2) install test configurations (it will use Zookeeper mock implementation and adjust some settings)

  1. cd ~/workspace/ClickHouse/tests/config
  2. sudo ./install.sh

3) run clickhouse-server

  1. sudo systemctl restart clickhouse-server

Creating the test file

1) find the number for your test - find the file with the biggest number in tests/queries/0_stateless/

  1. $ cd ~/workspace/ClickHouse
  2. $ ls tests/queries/0_stateless/[0-9]*.reference | tail -n 1
  3. tests/queries/0_stateless/01520_client_print_query_id.reference

Currently, the last number for the test is 01520, so my test will have the number 01521

2) create an SQL file with the next number and name of the feature you test

  1. touch tests/queries/0_stateless/01521_dummy_test.sql

3) edit SQL file with your favorite editor (see hint of creating tests below)

  1. vim tests/queries/0_stateless/01521_dummy_test.sql

4) run the test, and put the result of that into the reference file:

  1. clickhouse-client -nmT < tests/queries/0_stateless/01521_dummy_test.sql | tee tests/queries/0_stateless/01521_dummy_test.reference

5) ensure everything is correct, if the test output is incorrect (due to some bug for example), adjust the reference file using text editor.

How create good test

  • test should be
    • minimal - create only tables related to tested functionality, remove unrelated columns and parts of query
    • fast - should not take longer than few seconds (better subseconds)
    • correct - fails then feature is not working
      • deteministic
    • isolated / stateless
      • don’t rely on some environment things
      • don’t rely on timing when possible
  • try to cover corner cases (zeros / Nulls / empty sets / throwing exceptions)
  • to test that query return errors, you can put special comment after the query: -- { serverError 60 } or -- { clientError 20 }
  • don’t switch databases (unless necessary)
  • you can create several table replicas on the same node if needed
  • you can use one of the test cluster definitions when needed (see system.clusters)
  • use number / numbers_mt / zeros / zeros_mt and similar for queries / to initialize data when appliable
  • clean up the created objects after test and before the test (DROP IF EXISTS) - in case of some dirty state
  • prefer sync mode of operations (mutations, merges, etc.)
  • use other SQL files in the 0_stateless folder as an example
  • ensure the feature / feature combination you want to tests is not covered yet with existsing tests

Commit / push / create PR.

1) commit & push your changes

  1. cd ~/workspace/ClickHouse
  2. git add tests/queries/0_stateless/01521_dummy_test.sql
  3. git add tests/queries/0_stateless/01521_dummy_test.reference
  4. git commit # use some nice commit message when possible
  5. git push origin HEAD

2) use a link which was shown during the push, to create a PR into the main repo
3) adjust the PR title and contents, in Changelog category (leave one) keep
Build/Testing/Packaging Improvement, fill the rest of the fields if you want.