IO chaos allows you to simulate file system faults such as IO delays and read/write errors. It injects delays and error codes (errno) into IO system calls such as open, read, and write.

Note: IO chaos can only be used if the relevant labels and annotations are set before the application is created. For more information, refer to Create a chaos experiment.

Prerequisites

Commands and args for the application container

Chaos Mesh uses wait-fush.sh to make sure the fuse-daemon server is running before the application starts, so wait-fush.sh must be injected into the container's startup command. If the application process is not started by the container's command and args, IO chaos will not work properly. Once Kubernetes natively supports sidecar containers in a future version, we will remove the wait-fush.sh dependency.
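As an illustration of the command-and-args requirement, the following Pod excerpt starts the application process explicitly through command and args instead of relying on the image's default entrypoint. The Pod name, namespace, image, binary path, and flag are hypothetical placeholders, not part of Chaos Mesh.

```yaml
# Hypothetical Pod excerpt: the application process is launched through
# command/args, so Chaos Mesh can inject wait-fush.sh into the startup command.
apiVersion: v1
kind: Pod
metadata:
  name: my-app                          # hypothetical name
  namespace: app-ns                     # hypothetical application namespace
spec:
  containers:
  - name: my-app
    image: example.com/my-app:latest    # hypothetical image
    command: ["/usr/local/bin/my-app"]  # process started explicitly
    args: ["--config=/etc/my-app/config.toml"]  # hypothetical flag
```

If the container spec omitted command and the process were started by the image's ENTRYPOINT instead, the requirement above would not be met.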

Admission Controller

IO chaos needs to inject a sidecar container into user pods. Chaos Mesh adds this sidecar to applicable pods through a mutating admission webhook.

Admission controllers are enabled by default, but some Kubernetes distributions disable them. If yours does, follow the instructions to turn on admission controllers.
IO chaos requires the ValidatingAdmissionWebhook and MutatingAdmissionWebhook admission controllers.
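On clusters where the API server runs as a static Pod (for example, clusters bootstrapped with kubeadm, where the manifest usually lives at /etc/kubernetes/manifests/kube-apiserver.yaml), enabling the two plugins comes down to listing them in the --enable-admission-plugins flag. The excerpt below is a sketch under that assumption. Keep any plugins your cluster already enables, and consult your distribution's documentation if the API server is managed differently.

```yaml
# Excerpt of a kubeadm-style kube-apiserver static Pod manifest (assumption)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # Keep the plugins already listed (NodeRestriction is the kubeadm default)
    # and add both webhook admission plugins.
    - --enable-admission-plugins=NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook
    # ... all other existing flags stay unchanged
```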

Data directory

The data directory of the application in the target pod should be a subdirectory of a PersistentVolume mount path, not the mount path itself.

Example:

    # PersistentVolume mount for TiKV (container spec excerpt)
    volumeMounts:
    - name: datadir
      mountPath: /var/lib/tikv

    # Arguments used to start TiKV; --data-dir points to
    # a subdirectory of the mount path above
    ARGS="--pd=${CLUSTER_NAME}-pd:2379 \
    --advertise-addr=${HOSTNAME}.${HEADLESS_SERVICE_NAME}.${NAMESPACE}.svc:20160 \
    --addr=0.0.0.0:20160 \
    --data-dir=/var/lib/tikv/data \
    --capacity=${CAPACITY} \
    --config=/etc/tikv/tikv.toml"

Note: The default data directory of TiKV is not a subdirectory of the PersistentVolume mount path. If your application is a TiDB cluster, modify it in _start_tikv.sh.tpl. PD has the same issue, so also modify the PD data directory in _start_pd.sh.tpl.
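One common way to meet this requirement is a StatefulSet whose data volume comes from a volumeClaimTemplate, with the data directory placed one level below the mount path. The excerpt below is a hypothetical sketch: the names, image, flag, and storage size are illustrative only, and TiDB Operator generates its own manifests.

```yaml
# Hypothetical StatefulSet excerpt: PersistentVolume mounted at /var/lib/my-app,
# application data kept in the /var/lib/my-app/data subdirectory.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app                          # hypothetical name
spec:
  serviceName: my-app-peer              # hypothetical headless Service
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: example.com/my-app:latest            # hypothetical image
        command: ["/usr/local/bin/my-app"]
        args: ["--data-dir=/var/lib/my-app/data"]   # subdirectory of the mount
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/my-app                # PersistentVolume mount point
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi                             # illustrative size
```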

Steps to run IO Chaos

Configure a ConfigMap

Chaos Mesh uses a sidecar container to inject IO chaos, and you configure this sidecar with a ConfigMap. Before starting your chaos experiment, refer to this document to define a ConfigMap for your application.

Apply the ConfigMap defined for your application to your Kubernetes cluster with the following command:

    kubectl apply -f app-configmap.yaml # app-configmap.yaml is the ConfigMap file

Define the chaos config file

Below is a sample YAML file of IO chaos:

    apiVersion: pingcap.com/v1alpha1
    kind: IoChaos
    metadata:
      name: io-delay-example
      namespace: chaos-testing
    spec:
      action: mixed
      mode: one
      duration: "400s"
      configName: "chaosfs-tikv"
      path: ""
      selector:
        namespaces:
          - tidb-cluster-demo
        labelSelectors:
          "app.kubernetes.io/component": "tikv"
      layer: "fs"
      percent: "50"
      delay: "1ms"
      scheduler:
        cron: "@every 10m"

For more sample files, see examples. You can edit them as needed.

Description:

  • selector: selects the pods into which chaos actions are injected.

  • action: represents the IO chaos actions. Currently the delay, errno, and mixed actions are supported. You can go to IO chaos available actions for more details.

  • mode: defines the mode in which chaos actions run. Supported modes: one, all, fixed, fixed-percent, and random-max-percent.

  • duration: represents the duration of a chaos action. A duration string is a possibly signed sequence of decimal numbers, each with an optional fraction and a unit suffix, such as "300ms", "-1.5h", or "2h45m".

  • delay: defines the delay injected by the IO chaos action. It uses the same string format as duration, such as "300ms" or "2h45m". Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", and "h". If delay is empty, the operator generates a random value for it.

  • errno: defines the error code returned by the IO action. The codes are the same as the errno values defined by Linux. It is an int32 written as a string; for example, "2" means No such file or directory. This field applies to the errno and mixed actions. If errno is empty, the operator randomly generates an error code. See Common Linux system errors for more error codes.

  • percent: defines the percentage of matching IO operations into which faults are injected, as a number from 0 to 100. The default value is 100.

  • path: defines the files into which IO chaos actions are injected. It should be a regular expression matching the paths where you want to inject errno or delay. If path is "" or not defined, IO chaos actions are injected into all files.

  • methods: defines the IO methods into which IO chaos actions are injected. It is an array of strings naming IO syscalls, such as open and read. See Available methods for more details.

  • addr: defines the sidecar HTTP server address for a sidecar container, such as ":8080".

  • configName: defines the name of the configuration used to inject chaos actions into pods. Refer to examples/tikv-configmap.yaml to define your configuration.

  • layer: represents the layer of the IO action. Supported value: fs (the default).
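To show how the optional fields combine, here is a hypothetical IoChaos spec that returns EIO (errno 5) for about half of the read and write calls on files under the TiKV data directory. It uses only the fields described above and mirrors the sample file's structure; the metadata name, the path regex, and the chosen values are illustrative. It assumes the chaosfs-tikv configuration already exists, and it assumes path is matched against the file path as the application sees it.

```yaml
apiVersion: pingcap.com/v1alpha1
kind: IoChaos
metadata:
  name: io-errno-example           # hypothetical name
  namespace: chaos-testing
spec:
  action: errno
  mode: one
  duration: "60s"
  configName: "chaosfs-tikv"
  path: "/var/lib/tikv/data/.*"    # regex: files under the data directory (assumed path form)
  methods:
    - read
    - write
  errno: "5"                       # EIO: I/O error
  percent: "50"                    # inject into about half of matching calls
  layer: "fs"
  selector:
    namespaces:
      - tidb-cluster-demo
    labelSelectors:
      "app.kubernetes.io/component": "tikv"
  scheduler:
    cron: "@every 10m"
```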

Create a chaos experiment

Before the application is created, enable the admission webhook by labeling the application namespace, and add the following annotation to it:

    admission-webhook.pingcap.com/init-request: chaosfs-tikv

You can use the following commands to set labels and annotations of the application namespace:

    # Create the application namespace if it does not exist yet;
    # otherwise skip this command.
    kubectl create ns app-ns # "app-ns" is the application namespace

    # Enable the admission webhook
    kubectl label ns app-ns admission-webhook=enabled

    # Set the annotation
    kubectl annotate ns app-ns admission-webhook.pingcap.com/init-request=chaosfs-tikv

    # Create your application
    ...
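If you manage namespaces declaratively, the same label and annotation can be set in a Namespace manifest and applied with kubectl apply -f before the application is deployed. This sketch is equivalent to the label and annotate commands above; app-ns is the same placeholder namespace.

```yaml
# Declarative equivalent of the kubectl label/annotate commands above
apiVersion: v1
kind: Namespace
metadata:
  name: app-ns                     # the application namespace (placeholder)
  labels:
    admission-webhook: enabled     # enables the admission webhook
  annotations:
    admission-webhook.pingcap.com/init-request: chaosfs-tikv
```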

Then start your application and define a YAML file for your chaos experiment.

Start a chaos experiment

Assuming you are using examples/io-mixed-example.yaml, run the following command to create a chaos experiment:

    kubectl apply -f examples/io-mixed-example.yaml

IO chaos available actions

IO chaos currently supports the following actions:

  • delay: IO delay action. You can specify the latency before the IO operation returns a result.
  • errno: IO errno action. In this mode, read/write IO operations will return an error.
  • mixed: Both delay and errno actions.

delay

If you are using the delay action, edit spec as follows:

    spec:
      action: delay
      delay: "1ms"

If delay is not specified, it is generated randomly at runtime.

errno

If you are using the errno action, edit spec as follows:

    spec:
      action: errno
      errno: "32"

If errno is not specified, it is generated randomly at runtime.

mixed

If you are using the mixed action, edit spec as follows:

    spec:
      action: mixed
      delay: "1ms"
      errno: "32"

The mixed action defines both the delay and errno actions in one spec.

Common Linux system errors

  • 1: Operation not permitted
  • 2: No such file or directory
  • 5: I/O error
  • 6: No such device or address
  • 12: Out of memory
  • 16: Device or resource busy
  • 17: File exists
  • 20: Not a directory
  • 22: Invalid argument
  • 24: Too many open files
  • 28: No space left on device

Each number is the Linux errno value of the corresponding error. For more Linux system errors, refer to Errors: Linux System Errors.
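For example, to approximate a full disk you might combine errno 28 with the write method, so that writes fail while other calls are left alone. This is an illustrative spec fragment in the same style as the action examples above; the choice of errno and method is an assumption about what the experiment should test.

```yaml
spec:
  action: errno
  errno: "28"        # ENOSPC: No space left on device
  methods:
    - write          # only write calls fail; other IO methods are unaffected
```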

Available methods

Available methods are as below:

  • open
  • read
  • write
  • mkdir
  • rmdir
  • opendir
  • fsync
  • flush
  • release
  • truncate
  • getattr
  • chown
  • chmod
  • utimens
  • allocate
  • getlk
  • setlk
  • setlkw
  • statfs
  • readlink
  • symlink
  • create
  • access
  • link
  • mknod
  • rename
  • unlink
  • getxattr
  • listxattr
  • removexattr
  • setxattr
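As an illustration of restricting injection to a single method, the fragment below delays only fsync calls, which could approximate slow disk flushes for a storage engine such as TiKV. The 100ms value is arbitrary and chosen only for the example.

```yaml
spec:
  action: delay
  delay: "100ms"     # latency added before each matching call returns
  methods:
    - fsync          # only fsync is slowed down; other methods are unaffected
```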