Tang server encryption key management

The cryptographic mechanism to recreate the encryption key is based on the blinded key stored on the node and the private key of the involved Tang servers. To protect against the possibility of an attacker who has obtained both the Tang server private key and the node’s encrypted disk, periodic rekeying is advisable.

You must perform the rekeying operation for every node before you can delete the old key from the Tang server. The following sections provide procedures for rekeying and deleting old keys.

Backing up keys for a Tang server

The Tang server stores its keys in the /var/db/tang directory by default. Back up the contents of this directory so that you can recover if the Tang server is lost. The keys are sensitive: they can decrypt the boot disks of every host that has used them, so protect the backup accordingly.

Procedure

  • Copy the backup key from the /var/db/tang directory to the temp directory from which you can restore the key.
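The backup step can be sketched as a small shell function. This is a minimal sketch, assuming the keys are in /var/db/tang as in the step above. The function name backup_tang_keys, the archive naming, and the /root/tang-backup default are hypothetical choices for illustration, not part of Tang.

```shell
#!/usr/bin/env bash
# Sketch: archive a Tang key directory into a backup directory.
# backup_tang_keys and both default paths are illustrative assumptions.
backup_tang_keys() {
    local key_dir="${1:-/var/db/tang}"        # directory holding the .jwk files
    local dest_dir="${2:-/root/tang-backup}"  # hypothetical backup location
    local archive="$dest_dir/tang-keys-$(date +%Y%m%d%H%M%S).tar.gz"

    # Create the archive with owner-only permissions, because the keys can
    # decrypt the boot disks of every host that has used them.
    ( umask 077
      mkdir -p "$dest_dir" &&
      tar -C "$key_dir" -czf "$archive" . ) || return 1

    # Print the archive path so the caller can record or copy it.
    printf '%s\n' "$archive"
}

# Example: backup_tang_keys /var/db/tang /root/tang-backup
```

Archiving the whole directory (the trailing "." argument) also captures hidden key files, which matters if a backup is taken in the middle of a key rotation.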

Recovering keys for a Tang server

You can recover the keys for a Tang server by accessing the keys from a backup.

Procedure

  • Restore the key from your backup folder to the /var/db/tang/ directory.

    When the Tang server starts up, it advertises and uses these restored keys.
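The restore step can be sketched in the same way. This is a minimal sketch, assuming the backup is a gzip-compressed tar archive of the key directory contents; the function name restore_tang_keys and the archive format are assumptions, because the procedure does not prescribe a backup format.

```shell
#!/usr/bin/env bash
# Sketch: unpack a key backup archive into the Tang key directory.
# restore_tang_keys and the tar.gz format are illustrative assumptions.
restore_tang_keys() {
    local archive="$1"
    local key_dir="${2:-/var/db/tang}"

    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi

    mkdir -p "$key_dir" || return 1
    tar -C "$key_dir" -xzf "$archive" || return 1

    # List the restored files, including hidden (unadvertised) keys.
    ls -A1 "$key_dir"
}

# Example: restore_tang_keys /root/tang-backup/tang-keys.tar.gz /var/db/tang
```

After restoring, check that the ownership and permissions of the key files match what your Tang installation expects.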

Rekeying Tang servers

This procedure uses a set of three Tang servers, each with unique keys, as an example.

Using redundant Tang servers reduces the chances of nodes failing to boot automatically.

Rekeying a Tang server, and all associated NBDE-encrypted nodes, is a three-step procedure.

Prerequisites

  • A working Network-Bound Disk Encryption (NBDE) installation on one or more nodes.

Procedure

  1. Generate a new Tang server key.

  2. Rekey all NBDE-encrypted nodes so they use the new key.

  3. Delete the old Tang server key.

    Deleting the old key before all NBDE-encrypted nodes have completed their rekeying causes those nodes to become overly dependent on any other configured Tang servers.

Figure 1. Example workflow for rekeying a Tang server

Generating a new Tang server key

Prerequisites

  • A root shell on the Linux machine running the Tang server.

  • To facilitate verification of the Tang server key rotation, encrypt a small test file with the old key:

    # echo plaintext | clevis encrypt tang '{"url":"http://localhost:7500"}' -y >/tmp/encrypted.oldkey

  • Verify that the encryption succeeded and the file can be decrypted to produce the same string plaintext:

    # clevis decrypt </tmp/encrypted.oldkey

Procedure

  1. Locate and access the directory that stores the Tang server key. This is usually the /var/db/tang directory. Check the currently advertised key thumbprint:

    # tang-show-keys 7500

    Example output

    36AHjNH3NZDSnlONLz1-V4ie6t8

  2. Enter the Tang server key directory:

    # cd /var/db/tang/

  3. List the current Tang server keys:

    # ls -A1

    Example output

    36AHjNH3NZDSnlONLz1-V4ie6t8.jwk
    gJZiNPMLRBnyo_ZKfK4_5SrnHYo.jwk

    During normal Tang server operations, there are two .jwk files in this directory: one for signing and verification, and another for key derivation.

  4. Disable advertisement of the old keys:

    # for key in *.jwk; do \
      mv -- "$key" ".$key"; \
      done

    New clients setting up Network-Bound Disk Encryption (NBDE) or requesting keys will no longer see the old keys. Existing clients can still access and use the old keys until they are deleted. The Tang server reads but does not advertise keys stored in UNIX hidden files, which start with the . character.

  5. Generate a new key:

    # /usr/libexec/tangd-keygen /var/db/tang

  6. List the current Tang server keys to verify the old keys are no longer advertised, as they are now hidden files, and new keys are present:

    # ls -A1

    Example output

    .36AHjNH3NZDSnlONLz1-V4ie6t8.jwk
    .gJZiNPMLRBnyo_ZKfK4_5SrnHYo.jwk
    Bp8XjITceWSN_7XFfW7WfJDTomE.jwk
    WOjQYkyK7DxY_T5pMncMO5w0f6E.jwk

    Tang automatically advertises the new keys.

    More recent Tang server installations include a helper script, /usr/libexec/tangd-rotate-keys, that disables advertisement of the old keys and generates the new keys in one step.

  7. If you are running multiple Tang servers behind a load balancer that share the same key material, ensure the changes made here are properly synchronized across the entire set of servers before proceeding.
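The hidden-file convention used in the steps above can be checked with a short shell function. This is a minimal sketch, assuming that a key file whose name starts with the . character is unadvertised and any other .jwk file is advertised, as the procedure describes. The function name classify_tang_keys is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which key files in a Tang key directory are advertised
# (regular names) and which are unadvertised (hidden, leading dot).
# classify_tang_keys is an illustrative name, not a Tang command.
classify_tang_keys() {
    local key_dir="${1:-/var/db/tang}" f name

    # "*.jwk" does not match hidden files, so both patterns are needed.
    for f in "$key_dir"/*.jwk "$key_dir"/.*.jwk; do
        [ -e "$f" ] || continue    # skip a pattern that matched nothing
        name=${f##*/}
        case "$name" in
            .*) printf 'unadvertised\t%s\n' "$name" ;;
            *)  printf 'advertised\t%s\n' "$name" ;;
        esac
    done
}

# Example: classify_tang_keys /var/db/tang
```

Running this on each server behind a load balancer and comparing the output is one way to confirm that the key directories are synchronized.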

Verification

  1. Verify that the Tang server is advertising the new key, and not advertising the old key:

    # tang-show-keys 7500

    Example output

    WOjQYkyK7DxY_T5pMncMO5w0f6E

  2. Verify that the old key, while not advertised, is still available to decryption requests:

    # clevis decrypt </tmp/encrypted.oldkey

Rekeying all NBDE nodes

You can rekey all of the nodes on a remote cluster by using a DaemonSet object without incurring any downtime to the remote cluster.

If a node loses power during rekeying, it might become unbootable. In that case, you must redeploy the node by using Red Hat Advanced Cluster Management (RHACM) or a GitOps pipeline.

Prerequisites

  • cluster-admin access to all clusters with Network-Bound Disk Encryption (NBDE) nodes.

  • All Tang servers, not just the server being rotated, must be accessible to every NBDE node undergoing rekeying.

  • Obtain the Tang server URL and key thumbprint for every Tang server.

Procedure

  1. Create a DaemonSet object based on the following template. This template sets up three redundant Tang servers, but can be easily adapted to other situations. Change the Tang server URLs and thumbprints in the NEW_TANG_PIN environment variable to suit your environment:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: tang-rekey
      namespace: openshift-machine-config-operator
    spec:
      selector:
        matchLabels:
          name: tang-rekey
      template:
        metadata:
          labels:
            name: tang-rekey
        spec:
          containers:
          - name: tang-rekey
            image: registry.access.redhat.com/ubi8/ubi-minimal:8.4
            imagePullPolicy: IfNotPresent
            command:
            - "/sbin/chroot"
            - "/host"
            - "/bin/bash"
            - "-ec"
            args:
            - |
              rm -f /tmp/rekey-complete || true
              echo "Current tang pin:"
              clevis-luks-list -d $ROOT_DEV -s 1
              echo "Applying new tang pin: $NEW_TANG_PIN"
              clevis-luks-edit -f -d $ROOT_DEV -s 1 -c "$NEW_TANG_PIN"
              echo "Pin applied successfully"
              touch /tmp/rekey-complete
              sleep infinity
            readinessProbe:
              exec:
                command:
                - cat
                - /host/tmp/rekey-complete
              initialDelaySeconds: 30
              periodSeconds: 10
            env:
            - name: ROOT_DEV
              value: /dev/disk/by-partlabel/root
            - name: NEW_TANG_PIN
              value: >-
                {"t":1,"pins":{"tang":[
                  {"url":"http://tangserver01:7500","thp":"WOjQYkyK7DxY_T5pMncMO5w0f6E"},
                  {"url":"http://tangserver02:7500","thp":"I5Ynh2JefoAO3tNH9TgI4obIaXI"},
                  {"url":"http://tangserver03:7500","thp":"38qWZVeDKzCPG9pHLqKzs6k1ons"}
                ]}}
            volumeMounts:
            - name: hostroot
              mountPath: /host
            securityContext:
              privileged: true
          volumes:
          - name: hostroot
            hostPath:
              path: /
          nodeSelector:
            kubernetes.io/os: linux
          priorityClassName: system-node-critical
          restartPolicy: Always
          serviceAccount: machine-config-daemon
          serviceAccountName: machine-config-daemon

    In this case, even though you are rekeying tangserver01, you must specify not only the new thumbprint for tangserver01, but also the current thumbprints for all other Tang servers. Failure to specify all thumbprints for a rekeying operation opens up the opportunity for a man-in-the-middle attack.

  2. To distribute the daemon set to every cluster that must be rekeyed, run the following command:

    $ oc apply -f tang-rekey.yaml

    However, to run at scale, wrap the daemon set in an ACM policy. This ACM configuration must contain one policy to deploy the daemon set, a second policy to check that all the daemon set pods are READY, and a placement rule to apply it to the appropriate set of clusters.

After validating that the daemon set has successfully rekeyed all nodes, delete the daemon set. If you leave the daemon set in place, you must delete it before the next rekeying operation.
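Because every Tang server entry in NEW_TANG_PIN needs both a URL and a thumbprint, it can help to generate the value rather than edit it by hand. The following is a minimal sketch, assuming the pin has the same JSON shape as in the template. The function name build_tang_pin and its input format of one "URL THUMBPRINT" pair per line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a NEW_TANG_PIN value from "URL THUMBPRINT" pairs on stdin.
# build_tang_pin and its input format are illustrative assumptions.
build_tang_pin() {
    local threshold="${1:-1}" url thp entries="" sep=""

    while read -r url thp; do
        [ -n "$url" ] || continue          # ignore blank lines
        if [ -z "$thp" ]; then
            # Refuse to emit an entry without a thumbprint.
            echo "missing thumbprint for $url" >&2
            return 1
        fi
        entries="${entries}${sep}{\"url\":\"${url}\",\"thp\":\"${thp}\"}"
        sep=","
    done

    if [ -z "$entries" ]; then
        echo "no Tang servers given" >&2
        return 1
    fi
    printf '{"t":%s,"pins":{"tang":[%s]}}\n' "$threshold" "$entries"
}

# Example:
#   build_tang_pin 1 <<'EOF'
#   http://tangserver01:7500 WOjQYkyK7DxY_T5pMncMO5w0f6E
#   http://tangserver02:7500 I5Ynh2JefoAO3tNH9TgI4obIaXI
#   http://tangserver03:7500 38qWZVeDKzCPG9pHLqKzs6k1ons
#   EOF
```

The function fails when any server lacks a thumbprint, which guards against accidentally distributing a pin that omits one.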

Verification

After you distribute the daemon set, monitor the daemon sets to ensure that the rekeying has completed successfully. The script in the example daemon set terminates with an error if the rekeying failed, and remains in the CURRENT state if successful. There is also a readiness probe that marks the pod as READY when the rekeying has completed successfully.

  • This is an example of the output listing for the daemon set before the rekeying has completed:

    $ oc get -n openshift-machine-config-operator ds tang-rekey

    Example output

    NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
    tang-rekey   1         1         0       1            0           kubernetes.io/os=linux   11s

  • This is an example of the output listing for the daemon set after the rekeying has completed successfully:

    $ oc get -n openshift-machine-config-operator ds tang-rekey

    Example output

    NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
    tang-rekey   1         1         1       1            1           kubernetes.io/os=linux   13h

Rekeying usually takes a few minutes to complete.

If you use ACM policies to distribute the daemon sets to multiple clusters, you must include a compliance policy that checks every daemon set’s READY count is equal to the DESIRED count. In this way, compliance to such a policy demonstrates that all daemon set pods are READY and the rekeying has completed successfully. You could also use an ACM search to query all of the daemon sets’ states.
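For a single cluster, the same READY equals DESIRED check can be sketched as a shell function that reads the oc get ds listing. This is a minimal sketch, assuming the column layout shown in the example output above. The function name ds_rekey_complete is hypothetical, and this is not an ACM policy.

```shell
#!/usr/bin/env bash
# Sketch: read `oc get ds` output on stdin and succeed only if READY equals
# DESIRED for every listed daemon set. ds_rekey_complete is an illustrative name.
ds_rekey_complete() {
    awk '
        NR == 1 {
            # Find the DESIRED and READY columns from the header row.
            for (i = 1; i <= NF; i++) {
                if ($i == "DESIRED") d = i
                if ($i == "READY")   r = i
            }
            if (!d || !r) { status = 2; exit }
            next
        }
        $d != $r { status = 1 }             # at least one pod is not READY yet
        END {
            if (NR < 2) exit 2              # no header or no daemon set rows
            exit status
        }
    '
}

# Example:
#   oc get -n openshift-machine-config-operator ds tang-rekey | ds_rekey_complete \
#     && echo "rekeying complete"
```

An exit status of 1 means at least one daemon set is not fully READY, and 2 means the input could not be interpreted.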

Troubleshooting temporary rekeying errors for Tang servers

To determine if the error condition from rekeying the Tang servers is temporary, perform the following procedure. Temporary error conditions might include:

  • Temporary network outages

  • Tang server maintenance

Generally, when these types of temporary error conditions occur, you can wait until the daemon set succeeds in resolving the error or you can delete the daemon set and not try again until the temporary error condition has been resolved.

Procedure

  1. Restart the pod that performs the rekeying operation using the normal Kubernetes pod restart policy.

  2. If any of the associated Tang servers are unavailable, try rekeying until all the servers are back online.
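Waiting for a temporary condition to clear can be sketched as a bounded retry loop. This is a minimal sketch; the function name retry_until is hypothetical, and the probe command is an assumption that you supply yourself, for example a check that each Tang server URL responds.

```shell
#!/usr/bin/env bash
# Sketch: run a probe command until it succeeds or the attempts run out.
# retry_until is an illustrative helper, not part of Tang, Clevis, or oc.
retry_until() {
    local attempts="$1" delay="$2" n=1
    shift 2

    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "gave up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"      # wait before probing again
    done
}

# Example (assumes curl is installed and the server listens on port 7500):
#   retry_until 30 60 curl -sf -o /dev/null http://tangserver01:7500/adv
```

Bounding the number of attempts keeps a permanent failure from being mistaken for a temporary one indefinitely.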

Troubleshooting permanent rekeying errors for Tang servers

If the READY count still does not equal the DESIRED count an extended period of time after you rekey the Tang servers, it might indicate a permanent failure condition. In this case, the following conditions might apply:

  • A typographical error in the Tang server URL or thumbprint in the NEW_TANG_PIN definition.

  • The Tang server is decommissioned or the keys are permanently lost.

Prerequisites

  • The commands shown in this procedure can be run on the Tang server or on any Linux system that has network access to the Tang server.

Procedure

  1. Validate the Tang server configuration by performing a simple encrypt and decrypt operation on each Tang server’s configuration as defined in the daemon set.

    This is an example of an encryption and decryption attempt with a bad thumbprint:

    $ echo "okay" | clevis encrypt tang \
      '{"url":"http://tangserver02:7500","thp":"badthumbprint"}' | \
      clevis decrypt

    Example output

    Unable to fetch advertisement: 'http://tangserver02:7500/adv/badthumbprint'!

    This is an example of an encryption and decryption attempt with a good thumbprint:

    $ echo "okay" | clevis encrypt tang \
      '{"url":"http://tangserver03:7500","thp":"goodthumbprint"}' | \
      clevis decrypt

    Example output

    okay

  2. After you identify the root cause, remedy the underlying situation:

    1. Delete the non-working daemon set.

    2. Edit the daemon set definition to fix the underlying issue. This might include any of the following actions:

      • Edit a Tang server entry to correct the URL and thumbprint.

      • Remove a Tang server that is no longer in service.

      • Add a new Tang server that is a replacement for a decommissioned server.

  3. Distribute the updated daemon set again.

When replacing, removing, or adding a Tang server from a configuration, the rekeying operation will succeed as long as at least one original server is still functional, including the server currently being rekeyed. If none of the original Tang servers are functional or can be recovered, recovery of the system is impossible and you must redeploy the affected nodes.

Verification

  • Check the logs from each pod in the daemon set to determine whether the rekeying completed successfully. If the rekeying is not successful, the logs might indicate the failure condition. The following log is from a completed successful rekeying operation:

    $ oc logs -n openshift-machine-config-operator tang-rekey-kp4q2

    Example output

    Current tang pin:
    1: sss '{"t":1,"pins":{"tang":[{"url":"http://10.46.55.192:7500"},{"url":"http://10.46.55.192:7501"},{"url":"http://10.46.55.192:7502"}]}}'
    Applying new tang pin: {"t":1,"pins":{"tang":[
      {"url":"http://tangserver01:7500","thp":"WOjQYkyK7DxY_T5pMncMO5w0f6E"},
      {"url":"http://tangserver02:7500","thp":"I5Ynh2JefoAO3tNH9TgI4obIaXI"},
      {"url":"http://tangserver03:7500","thp":"38qWZVeDKzCPG9pHLqKzs6k1ons"}
    ]}}
    Updating binding...
    Binding edited successfully
    Pin applied successfully

Deleting old Tang server keys

Prerequisites

  • A root shell on the Linux machine running the Tang server.

Procedure

  1. Locate and access the directory where the Tang server key is stored. This is usually the /var/db/tang directory:

    # cd /var/db/tang/

  2. List the current Tang server keys, showing the advertised and unadvertised keys:

    # ls -A1

    Example output

    .36AHjNH3NZDSnlONLz1-V4ie6t8.jwk
    .gJZiNPMLRBnyo_ZKfK4_5SrnHYo.jwk
    Bp8XjITceWSN_7XFfW7WfJDTomE.jwk
    WOjQYkyK7DxY_T5pMncMO5w0f6E.jwk

  3. Delete the old keys:

    # rm .*.jwk

  4. List the current Tang server keys to verify the unadvertised keys are no longer present:

    # ls -A1

    Example output

    Bp8XjITceWSN_7XFfW7WfJDTomE.jwk
    WOjQYkyK7DxY_T5pMncMO5w0f6E.jwk
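The deletion step can be wrapped in a guard so that hidden keys are removed only when at least one advertised key remains. This is a minimal sketch, assuming the hidden-file naming used in this procedure. The function name delete_old_tang_keys is hypothetical, and the guard does not confirm that every node has finished rekeying.

```shell
#!/usr/bin/env bash
# Sketch: delete hidden (old) Tang keys only if advertised keys are present.
# delete_old_tang_keys is an illustrative name, not a Tang command.
delete_old_tang_keys() {
    local key_dir="${1:-/var/db/tang}" f new_count=0

    # Count advertised keys; "*.jwk" does not match hidden files.
    for f in "$key_dir"/*.jwk; do
        if [ -e "$f" ]; then
            new_count=$((new_count + 1))
        fi
    done
    if [ "$new_count" -eq 0 ]; then
        echo "no advertised keys in $key_dir; refusing to delete" >&2
        return 1
    fi

    # Remove each hidden key file and report it.
    for f in "$key_dir"/.*.jwk; do
        [ -e "$f" ] || continue
        rm -- "$f" && printf 'deleted %s\n' "${f##*/}"
    done
}

# Example: delete_old_tang_keys /var/db/tang
```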

Verification

At this point, the server still advertises the new keys, but an attempt to decrypt based on the old key will fail.

  1. Query the Tang server for the current advertised key thumbprints:

    # tang-show-keys 7500

    Example output

    WOjQYkyK7DxY_T5pMncMO5w0f6E

  2. Decrypt the test file created earlier to verify decryption against the old keys fails:

    # clevis decrypt </tmp/encrypted.oldkey

    Example output

    Error communicating with the server!

If you are running multiple Tang servers behind a load balancer that share the same key material, ensure the changes made are properly synchronized across the entire set of servers before proceeding.