Nightly Builds

Bugs affecting more than one service commonly surface only when the whole system is running. Our continuous integration system builds and runs a suite of integration tests against Spinnaker nightly on real cloud provider infrastructure to detect these bugs.

Access the CI system here: https://builds.spinnaker.io

Viewers must be members of the build-cops GitHub Team.

Build Cop

The build cop responsibilities include:

  • Triage integration test failures on master and the 3 most recent release branches
  • Clean up orphaned resources across target cloud providers
  • Route new GitHub issues to the appropriate SIG (applying GitHub labels as appropriate). You can find the full list of SIGs in the governance repo
  • Observe any systemic problems raised in the #general and #dev Slack channels
  • Log observations and corrective actions taken in the rotation log

Process Structure

The CI system comprises both jobs, which do a specific task, and flows, which invoke a series of jobs.

Flow_BuildAndValidate is the primary entry point for the master branch flow.

Flow_BuildAndValidate_<version> is the entry point for the respective top-level release. It is a copy of the primary Flow_BuildAndValidate as it existed when that release was cut. Top-level release flows work off their respective release-1.ABC.x branches.

As its name implies, Flow_BuildAndValidate builds and tests the whole Spinnaker system. It follows this general process:

1. Build_PrimaryArtifacts

  1. Checks out all services with git.
  2. Constructs a BOM from the most recent commit on the target branch.
  3. Builds a Docker container and a Debian package of each Spinnaker microservice.
  4. Builds additional supporting artifacts:
    • halyard
    • spin-cli
    • Changelog
  5. Publishes the BOM under the following names:
    • With the floating tag: <branchName>-latest-unvalidated (e.g. master-latest-unvalidated)
    • With a fixed tag: <branchName>-<timestamp> (e.g. master-20191213154039)
  6. Publishes the changelog.
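The two BOM names can be illustrated with a minimal sketch. The function name `bom_names` is hypothetical and not part of the build scripts. The timestamp layout (YYYYMMDDHHMMSS) is an assumption inferred from the example master-20191213154039, and the time zone the real job uses is not stated here.

```python
from datetime import datetime
from typing import Tuple


def bom_names(branch: str, built_at: datetime) -> Tuple[str, str]:
    """Return (floating_tag, fixed_tag) for a BOM built from `branch`.

    Hypothetical helper for illustration only.
    """
    # Assumption: the fixed tag uses a YYYYMMDDHHMMSS timestamp, inferred
    # from the documented example master-20191213154039.
    timestamp = built_at.strftime("%Y%m%d%H%M%S")
    floating_tag = f"{branch}-latest-unvalidated"
    fixed_tag = f"{branch}-{timestamp}"
    return floating_tag, fixed_tag


floating, fixed = bom_names("master", datetime(2019, 12, 13, 15, 40, 39))
print(floating)  # master-latest-unvalidated
print(fixed)     # master-20191213154039
```

The floating name always points at the newest build of a branch, while the fixed name identifies one specific build.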

2. Validate_BomAndReportMultiPlatform

For uninteresting reasons, this job must wrap the following ValidateBomMultiPlatform in order to aggregate its results.

3. ValidateBomMultiPlatform

This “Multi-configuration project” specifies the same test(s) to run across different environments. This confirms Spinnaker works whether deployed as a single VM or in a Kubernetes cluster, for instance.

  1. Starts Halyard in a new VM
  2. Connects to this instance and executes a series of hal config steps, including account setup for the managed cloud provider(s).
  3. Deploys the configuration with hal deploy apply.
  4. Invokes citest integration tests against the new Spinnaker instance.
    • citest sends a command to Spinnaker, then uses the underlying cloud provider’s CLI to confirm that the expected changes were made. For example, it uses gcloud to confirm that a GCE server group was created or deleted.
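The "act through Spinnaker, then verify through the provider CLI" pattern can be sketched roughly as follows. This is not citest's actual API. The helper names are hypothetical, and the sketch assumes a GCE server group corresponds to a managed instance group that `gcloud compute instance-groups managed describe` can look up.

```python
import subprocess
from typing import Callable, List


def describe_command(name: str, zone: str, project: str) -> List[str]:
    """Build the gcloud command that looks up a managed instance group."""
    return [
        "gcloud", "compute", "instance-groups", "managed", "describe", name,
        "--zone", zone, "--project", project, "--format", "json",
    ]


def server_group_exists(
    name: str,
    zone: str,
    project: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if gcloud can describe the group.

    Caveat: any non-zero exit (including auth or network errors) is
    treated as "not found" in this simplified sketch.
    """
    result = run(describe_command(name, zone, project),
                 capture_output=True, text=True)
    return result.returncode == 0
```

A test in this style would first ask Spinnaker to create the server group, then poll `server_group_exists(...)` until it returns True or a timeout expires. The `run` parameter exists so the check can be exercised without a real cloud project.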

Cleaning Orphaned Resources

Occasionally, integration tests fail in a way that is either undesirable or difficult to automatically clean up. Build cops should periodically ensure these orphaned resources are deleted from the following locations:

Deleting Obsolete Artifacts

The following jobs assist in removing old artifacts created during the build process:

Troubleshooting Playbook

Check whether the failure happened during the build or the test phase:

  1. Click the failing Flow.

    ![](troubleshooting - base - 10 - flow.png)

  2. Click the most recent failing build.

    ![](troubleshooting - base - 20 - mostRecent.png)

  3. Click through to the failing phase.

    ![](troubleshooting - base - 30 - phase.png)

Build Failures

  1. The build phase uses many subshells to perform its work in parallel. Use the Console Output to narrow down which step of the build failed, and use the collected logs to see what specifically went wrong.

    ![](troubleshooting - build - 10 - consoleOutput.png)

  2. After each step completes, the Console Output prints how much work remains.

    ![](troubleshooting - build - 20 - buildSteps.png)

  3. Frequently, the build error will be printed out directly to the Console Output, but sometimes this output can be hard to read. View the raw file directly using the Build Artifacts link from Step 1.

    ![](troubleshooting - build - 30 - failedOutput.png)

Common Build Failures

Bintray Conflicts

If an artifact is uploaded to the Bintray repository but never published (either because of a transient Bintray error or an interrupted build), you’ll get an error like this:

Bintray API Request ‘create version 0.20.0-20200512192702’ failed with HTTP response 409 Conflict

Follow these steps to delete the artifact and resolve the issue:

  1. Navigate to the specific version in the Bintray repository

  2. Click on the Spinnaker repository that had the failure. (If you don’t see it, click to the next page; there are only 10 items per page for some reason.)

  3. Click on the specific version that had the issue.

  4. Click “Actions” in the upper right and select “Edit”.

  5. On the next page, click the “Delete” link in the upper right. It will look like nothing happened, but after 10 seconds or so, the page will refresh and the version will be gone.

Now that the conflict has been removed, you can restart the build.
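When scanning a saved copy of the Console Output, a small script can pull out the version that caused the conflict. This is a minimal sketch: the function name is hypothetical, and the pattern is based only on the single error line quoted above, so other Bintray messages may be worded differently.

```python
import re
from typing import Optional

# Matches the documented error line. The quotes around "create version ..."
# may be straight (') or curly (\u2018 and \u2019), so all three are accepted.
_CONFLICT = re.compile(
    r"Bintray API Request\s+['\u2018\u2019]create version "
    r"(?P<version>[^'\u2018\u2019]+)['\u2018\u2019]"
    r"\s+failed with HTTP response 409 Conflict"
)


def conflicting_version(console_output: str) -> Optional[str]:
    """Return the version named in a Bintray 409 error, or None."""
    match = _CONFLICT.search(console_output)
    return match.group("version") if match else None


log = (
    "Bintray API Request \u2018create version 0.20.0-20200512192702\u2019 "
    "failed with HTTP response 409 Conflict"
)
print(conflicting_version(log))  # 0.20.0-20200512192702
```

The extracted version is the one to look up and delete in the Bintray repository.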

Test Failures

  1. View the Test Results Overview.

    ![](troubleshooting - test - 10 - testResultsOverview.png)

  2. Identify the failing test.

    ![](troubleshooting - test - 20 - failingTest.png)

  3. Identify which step in the test is failing.

    ![](troubleshooting - test - 30 - failingStep.png)

  4. It can sometimes help to view the last call that was made before that step failed.

    ![](troubleshooting - test - 40 - failingDetails.png)

Connecting to the Jenkins VM

Members of the jenkins-debuggers@spinnaker.io group can SSH directly into the Jenkins VM. Connect to the instance with this command:

  $ gcloud compute ssh --project spinnaker-community jenkins-transfer --zone us-central1-f --ssh-flag "-L 4040:test-jenkins:8080"

The extra --ssh-flag establishes a tunnel to the test-jenkins instance, which is used to trigger some integration tests. You can view this instance at http://localhost:4040 after the connection is established.
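To confirm the tunnel is listening before opening a browser, a quick local check like the one below may help. It is a sketch under stated assumptions: the function name is hypothetical, and a successful TCP connection only shows that the local end of the tunnel accepts connections, not that the test-jenkins instance itself is healthy.

```python
import socket


def tunnel_is_up(host: str = "localhost", port: int = 4040,
                 timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. connection refused or
        # timeout) when nothing is listening on the forwarded port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `tunnel_is_up()` with its defaults checks the forwarded port 4040 from the command above. It should return False once the SSH session is closed.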

Change to jenkins user

All processes are run as the jenkins user and most of the useful links are in /home/jenkins. Switch to it with:

  $ sudo su - jenkins

Last modified June 24, 2021: Redesign Progress (#83) (8231bcf)