Understanding the File Integrity Operator

The File Integrity Operator is an OKD Operator that continually runs file integrity checks on the cluster nodes. It deploys a daemon set that initializes and runs privileged Advanced Intrusion Detection Environment (AIDE) containers on each node, and provides a status object with a log of the files that have been modified since the initial run of the daemon set pods.

Currently, only Fedora CoreOS (FCOS) nodes are supported.

Creating the FileIntegrity custom resource

An instance of a FileIntegrity custom resource (CR) represents a set of continuous file integrity scans for one or more nodes.

Each FileIntegrity CR is backed by a daemon set running AIDE on the nodes matching the FileIntegrity CR specification.

Procedure

  1. Create the following example FileIntegrity CR named worker-fileintegrity.yaml to enable scans on worker nodes:

    Example FileIntegrity CR

    apiVersion: fileintegrity.openshift.io/v1alpha1
    kind: FileIntegrity
    metadata:
      name: worker-fileintegrity
      namespace: openshift-file-integrity
    spec:
      nodeSelector: # (1)
        node-role.kubernetes.io/worker: ""
      tolerations: # (2)
      - key: "myNode"
        operator: "Exists"
        effect: "NoSchedule"
      config: # (3)
        name: "myconfig"
        namespace: "openshift-file-integrity"
        key: "config"
        gracePeriod: 20 # (4)
        maxBackups: 5 # (5)
        initialDelay: 60 # (6)
      debug: false
    status:
      phase: Active # (7)

    (1) Defines the selector for scheduling node scans.
    (2) Specifies tolerations to schedule on nodes with custom taints. When not specified, a default toleration that allows running on control plane and infra nodes is applied.
    (3) Defines a ConfigMap containing an AIDE configuration to use.
    (4) The number of seconds to pause between AIDE integrity checks. Frequent AIDE checks on a node can be resource intensive, so it can be useful to specify a longer interval. The default is 900 seconds (15 minutes).
    (5) The maximum number of AIDE database and log backups (left over from the re-init process) to keep on a node. Older backups beyond this number are automatically pruned by the daemon. The default is 5.
    (6) The number of seconds to wait before starting the first AIDE integrity check. The default is 0.
    (7) The running status of the FileIntegrity instance. Statuses are Initializing, Pending, or Active.

    Initializing

    The FileIntegrity object is currently initializing or re-initializing the AIDE database.

    Pending

    The FileIntegrity deployment is still being created.

    Active

    The scans are active and ongoing.

  2. Apply the YAML file to the openshift-file-integrity namespace:

    $ oc apply -f worker-fileintegrity.yaml -n openshift-file-integrity

Verification

  • Confirm the FileIntegrity object was created successfully by running the following command:

    $ oc get fileintegrities -n openshift-file-integrity

    Example output

    NAME                   AGE
    worker-fileintegrity   14s

Checking the FileIntegrity custom resource status

The FileIntegrity custom resource (CR) reports its status through the .status.phase field.

Procedure

  • To query the FileIntegrity CR status, run:

    $ oc get fileintegrities/worker-fileintegrity -o jsonpath="{ .status.phase }"

    Example output

    Active

FileIntegrity custom resource phases

  • Pending - The phase after the custom resource (CR) is created.

  • Active - The phase when the backing daemon set is up and running.

  • Initializing - The phase when the AIDE database is being reinitialized.
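
Because a new FileIntegrity object passes through these phases before scans begin, it can be convenient to poll until it reports Active. The following is a minimal sketch, not part of the Operator: the helper names get_phase and wait_for_active are hypothetical, and the sketch assumes the object lives in the openshift-file-integrity namespace used elsewhere in this document.

```shell
#!/bin/sh
# Hypothetical helper: print the current phase of a FileIntegrity object.
# It uses the same jsonpath query as the status procedure; the namespace
# is an assumption taken from the example CR.
get_phase() {
    oc get "fileintegrities/$1" -n openshift-file-integrity \
        -o jsonpath='{.status.phase}'
}

# Hypothetical helper: poll until the object reports Active.
# Arguments: object name, maximum attempts (default 30),
# seconds between attempts (default 10).
wait_for_active() {
    name=$1
    attempts=${2:-30}
    interval=${3:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        phase=$(get_phase "$name")
        if [ "$phase" = "Active" ]; then
            echo "$name is Active"
            return 0
        fi
        echo "$name is ${phase:-unknown}, waiting..." >&2
        i=$((i + 1))
        sleep "$interval"
    done
    echo "$name did not become Active" >&2
    return 1
}

# Usage (requires cluster access):
#   wait_for_active worker-fileintegrity
```

The phase lookup is kept in its own function so that it can be replaced, for example when trying out the loop without a cluster.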

Understanding the FileIntegrityNodeStatuses object

The scan results of the FileIntegrity CR are reported in another object called FileIntegrityNodeStatuses.

  $ oc get fileintegritynodestatuses

Example output

  NAME                                                AGE
  worker-fileintegrity-ip-10-0-130-192.ec2.internal   101s
  worker-fileintegrity-ip-10-0-147-133.ec2.internal   109s
  worker-fileintegrity-ip-10-0-165-160.ec2.internal   102s

It might take some time for the FileIntegrityNodeStatus object results to be available.

There is one result object per node. The nodeName attribute of each FileIntegrityNodeStatus object corresponds to the node being scanned. The status of the file integrity scan is represented in the results array, which holds scan conditions.

  $ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq

The FileIntegrityNodeStatus object reports the latest status of an AIDE run and exposes it as Failed, Succeeded, or Errored in a status field. To watch the status of each node as it changes, run:

  $ oc get fileintegritynodestatuses -w

Example output

  NAME                                                               NODE                                         STATUS
  example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal   ip-10-0-134-186.us-east-2.compute.internal   Succeeded
  example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal   ip-10-0-150-230.us-east-2.compute.internal   Succeeded
  example-fileintegrity-ip-10-0-169-137.us-east-2.compute.internal   ip-10-0-169-137.us-east-2.compute.internal   Succeeded
  example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal   ip-10-0-180-200.us-east-2.compute.internal   Succeeded
  example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal    ip-10-0-194-66.us-east-2.compute.internal    Failed
  example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal   ip-10-0-222-188.us-east-2.compute.internal   Succeeded
  example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal   ip-10-0-134-186.us-east-2.compute.internal   Succeeded
  example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal   ip-10-0-222-188.us-east-2.compute.internal   Succeeded
  example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal    ip-10-0-194-66.us-east-2.compute.internal    Failed
  example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal   ip-10-0-150-230.us-east-2.compute.internal   Succeeded
  example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal   ip-10-0-180-200.us-east-2.compute.internal   Succeeded

FileIntegrityNodeStatus CR status types

These conditions are reported in the results array of the corresponding FileIntegrityNodeStatus CR status:

  • Succeeded - The integrity check passed; the files and directories covered by the AIDE check have not been modified since the database was last initialized.

  • Failed - The integrity check failed; some files or directories covered by the AIDE check have been modified since the database was last initialized.

  • Errored - The AIDE scanner encountered an internal error.
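
To pick out the nodes that need attention, the tabular output of oc get fileintegritynodestatuses can be filtered for any status other than Succeeded. The following is a minimal sketch: the function name non_succeeded_nodes is hypothetical, and it assumes the three-column NAME, NODE, STATUS layout shown in the example output of this document.

```shell
#!/bin/sh
# Hypothetical helper: read the table printed by
# `oc get fileintegritynodestatuses` on stdin and print "<node> <status>"
# for every row whose STATUS column is not Succeeded.
# Assumes the columns are NAME, NODE, STATUS, in that order.
non_succeeded_nodes() {
    # NR > 1 skips the header row; sort -u collapses rows that repeat,
    # as happens when the input comes from a watch (-w).
    awk 'NR > 1 && $3 != "Succeeded" { print $2, $3 }' | sort -u
}

# Usage (requires cluster access):
#   oc get fileintegritynodestatuses | non_succeeded_nodes
```

An empty result means that every listed node last reported Succeeded.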

FileIntegrityNodeStatus CR success example

Example output of a condition with a success status

  [
    {
      "condition": "Succeeded",
      "lastProbeTime": "2020-09-15T12:45:57Z"
    }
  ]
  [
    {
      "condition": "Succeeded",
      "lastProbeTime": "2020-09-15T12:46:03Z"
    }
  ]
  [
    {
      "condition": "Succeeded",
      "lastProbeTime": "2020-09-15T12:45:48Z"
    }
  ]

In this case, all three scans succeeded and so far there are no other conditions.

FileIntegrityNodeStatus CR failure status example

To simulate a failure condition, modify one of the files AIDE tracks. For example, modify /etc/resolv.conf on one of the worker nodes:

  $ oc debug node/ip-10-0-130-192.ec2.internal

Example output

  Creating debug namespace/openshift-debug-node-ldfbj ...
  Starting pod/ip-10-0-130-192ec2internal-debug ...
  To use host binaries, run `chroot /host`
  Pod IP: 10.0.130.192
  If you don't see a command prompt, try pressing enter.
  sh-4.2# echo "# integrity test" >> /host/etc/resolv.conf
  sh-4.2# exit
  Removing debug pod ...
  Removing debug namespace/openshift-debug-node-ldfbj ...

After some time, the Failed condition is reported in the results array of the corresponding FileIntegrityNodeStatus object. The previous Succeeded condition is retained, which allows you to pinpoint the time the check failed.

  $ oc get fileintegritynodestatuses.fileintegrity.openshift.io/worker-fileintegrity-ip-10-0-130-192.ec2.internal -ojsonpath='{.results}' | jq -r

Alternatively, to query the results of all nodes without naming a specific object, run:

  $ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq

Example output

  [
    {
      "condition": "Succeeded",
      "lastProbeTime": "2020-09-15T12:54:14Z"
    },
    {
      "condition": "Failed",
      "filesChanged": 1,
      "lastProbeTime": "2020-09-15T12:57:20Z",
      "resultConfigMapName": "aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed",
      "resultConfigMapNamespace": "openshift-file-integrity"
    }
  ]

The Failed condition points to a config map that gives more details about what exactly failed and why:

  $ oc describe cm aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed

Example output

  Name:         aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
  Namespace:    openshift-file-integrity
  Labels:       file-integrity.openshift.io/node=ip-10-0-130-192.ec2.internal
                file-integrity.openshift.io/owner=worker-fileintegrity
                file-integrity.openshift.io/result-log=
  Annotations:  file-integrity.openshift.io/files-added: 0
                file-integrity.openshift.io/files-changed: 1
                file-integrity.openshift.io/files-removed: 0

  Data
  integritylog:
  ------
  AIDE 0.15.1 found differences between database and filesystem!!
  Start timestamp: 2020-09-15 12:58:15

  Summary:
    Total number of files:  31553
    Added files:            0
    Removed files:          0
    Changed files:          1

  ---------------------------------------------------
  Changed files:
  ---------------------------------------------------

  changed: /hostroot/etc/resolv.conf

  ---------------------------------------------------
  Detailed information about changes:
  ---------------------------------------------------

  File: /hostroot/etc/resolv.conf
   SHA512   : sTQYpB/AL7FeoGtu/1g7opv6C+KT1CBJ , qAeM+a8yTgHPnIHMaRlS+so61EN8VOpg

  Events:  <none>

Due to the config map data size limit, AIDE logs over 1 MB are added to the failure config map as a base64-encoded gzip archive. In this case, extract the log data from the config map and pipe it to base64 --decode | gunzip. Compressed logs are indicated by the presence of a file-integrity.openshift.io/compressed annotation key in the config map.
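
The decoding step for a compressed log can be wrapped in a small function. The following is a minimal sketch: the function name decode_integrity_log is hypothetical, and the usage comment assumes that the log is stored under the integritylog data key, as the example output above suggests.

```shell
#!/bin/sh
# Hypothetical helper: decode a base64-encoded gzip archive read from
# stdin and print the plain-text AIDE log.
decode_integrity_log() {
    base64 --decode | gunzip
}

# Usage (requires cluster access; the integritylog key name is an
# assumption based on the example output, and <config-map-name> is a
# placeholder):
#   oc get cm <config-map-name> -n openshift-file-integrity \
#       -o jsonpath='{.data.integritylog}' | decode_integrity_log
```

Reading only the data key, rather than the full description of the config map, keeps labels and annotations out of the input to the decoder.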

Understanding events

Transitions in the status of the FileIntegrity and FileIntegrityNodeStatus objects are logged by events. The creation time of the event reflects the latest transition, such as Initializing to Active, and not necessarily the latest scan result. However, the newest event always reflects the most recent status.

  $ oc get events --field-selector reason=FileIntegrityStatus

Example output

  LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
  97s         Normal    FileIntegrityStatus   fileintegrity/example-fileintegrity   Pending
  67s         Normal    FileIntegrityStatus   fileintegrity/example-fileintegrity   Initializing
  37s         Normal    FileIntegrityStatus   fileintegrity/example-fileintegrity   Active

When a node scan fails, an event is created that contains the number of added, changed, and removed files and the name of the config map that holds the log.

  $ oc get events --field-selector reason=NodeIntegrityStatus

Example output

  LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
  114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-134-173.ec2.internal
  114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-168-238.ec2.internal
  114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-169-175.ec2.internal
  114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-152-92.ec2.internal
  114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-158-144.ec2.internal
  114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-131-30.ec2.internal
  87m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed

A change in the number of added, changed, or removed files results in a new event, even if the status of the node has not transitioned.

  $ oc get events --field-selector reason=NodeIntegrityStatus

Example output

  LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
  114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-134-173.ec2.internal
  114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-168-238.ec2.internal
  114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-169-175.ec2.internal
  114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-152-92.ec2.internal
  114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-158-144.ec2.internal
  114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-131-30.ec2.internal
  87m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
  40m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:3,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
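
The a, c, and r values in a Warning event message appear to correspond to the added, changed, and removed file counts. The following is a minimal sketch that extracts them from a message: the function name parse_change_counts is hypothetical, and the a:N,c:N,r:N format is assumed from the example output rather than taken from a documented contract.

```shell
#!/bin/sh
# Hypothetical helper: read event messages on stdin and, for each line
# that contains counts of the form "a:3,c:1,r:0", print them as
# "added=3 changed=1 removed=0". Lines without counts print nothing.
parse_change_counts() {
    sed -n 's/.*a:\([0-9][0-9]*\),c:\([0-9][0-9]*\),r:\([0-9][0-9]*\).*/added=\1 changed=\2 removed=\3/p'
}

# Usage (requires cluster access):
#   oc get events --field-selector reason=NodeIntegrityStatus | parse_change_counts
```

Comparing the counts of consecutive events for the same node shows whether further files changed after the first failure.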