IOChaos Experiment

This document walks you through the IOChaos experiment.

IOChaos allows you to simulate file system faults such as IO delays and read/write errors. It can inject delays and faults when your program makes IO system calls such as open, read, and write.

Configuration file

Below is a sample YAML file of IOChaos:

  apiVersion: chaos-mesh.org/v1alpha1
  kind: IoChaos
  metadata:
    name: io-delay-example
  spec:
    action: latency
    mode: one
    selector:
      labelSelectors:
        app: etcd
    volumePath: /var/run/etcd
    path: '/var/run/etcd/**/*'
    delay: '100ms'
    percent: 50
    duration: '400s'
    scheduler:
      cron: '@every 10m'

For more sample files, see examples. You can edit them as needed.

| Field | Description | Sample value |
| --- | --- | --- |
| mode | Defines the mode of the selector. | one / all / fixed / fixed-percent / random-max-percent |
| selector | Specifies the pods to be injected with IO chaos. | |
| action | Represents the IOChaos action. Refer to IOChaos available actions for more details. | latency / fault / attrOverride |
| volumePath | The mount path of the target volume. | "/var/run/etcd" |
| delay | Specifies the latency of the fault injection. The value is a string containing a possibly signed sequence of decimal numbers, each with an optional fraction and a unit suffix. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", and "h". | "300ms" / "2h45m" |
| errno | Defines the error code returned by an IO action. See Common Linux system errors for more Linux system error codes. | 2 |
| attr | Defines the attribute to be overridden and the corresponding value. | see examples |
| percent | Defines the probability of injection, as a percentage. | 100 (default) |
| path | Defines the path of the files for injecting IOChaos actions. It should be a glob pattern matching the files into which you want to inject faults or delays, and it should be inside the volumePath directory. | "/var/run/etcd/**/*" |
| methods | Defines the IO methods for injecting IOChaos actions, as an array of strings. See Available methods for more details. | open / read |
| duration | Represents the duration of a chaos action. The value uses the same format as delay: a possibly signed sequence of decimal numbers, each with an optional fraction and a unit suffix. | "300ms" / "2h45m" |
| scheduler | Defines the scheduler rules for the running time of the chaos experiment. | see robfig/cron |
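The delay and duration descriptions follow the duration grammar of Go's time.ParseDuration. As an illustration, here is a minimal Python sketch of that grammar; parse_duration is a hypothetical helper (not part of Chaos Mesh), and it assumes, as Go does, that a bare "0" is the only value accepted without a unit.

```python
import re

# Seconds per unit suffix. Both micro signs (U+00B5 and U+03BC) are accepted,
# as Go's time.ParseDuration does.
_UNITS = {
    "ns": 1e-9,
    "us": 1e-6,
    "\u00b5s": 1e-6,
    "\u03bcs": 1e-6,
    "ms": 1e-3,
    "s": 1.0,
    "m": 60.0,
    "h": 3600.0,
}

# Try longer suffixes first so that "300ms" is read as milliseconds,
# not as "300m" followed by a stray "s".
_UNIT_PATTERN = "|".join(sorted(map(re.escape, _UNITS), key=len, reverse=True))
# One "<number><unit>" term such as "2h", "1.5ms" or ".5s".
_TERM = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(" + _UNIT_PATTERN + ")")


def parse_duration(text: str) -> float:
    """Convert a duration string such as "300ms" or "2h45m" into seconds."""
    body, sign = text, 1.0
    if body and body[0] in "+-":
        sign = -1.0 if body[0] == "-" else 1.0
        body = body[1:]
    if body == "0":  # a bare zero is the only value allowed without a unit
        return 0.0
    if not body:
        raise ValueError(f"invalid duration: {text!r}")
    total, pos = 0.0, 0
    while pos < len(body):
        match = _TERM.match(body, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    return sign * total


print(parse_duration("300ms"))  # 0.3
print(parse_duration("2h45m"))  # 9900.0
```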

Usage

Assuming you are using examples/io-mixed-example.yaml, run the following command to create a chaos experiment:

  kubectl apply -f examples/io-mixed-example.yaml
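Before running kubectl apply, you might want to sanity-check a spec against the field table. The Python sketch below is hypothetical (check_spec and literal_prefix are not part of Chaos Mesh or kubectl) and works on the spec section loaded as a dict. The allowed mode and action values come from the table; the rest is assumed: percent is treated as a whole number from 0 to 100, each action is expected to carry the field its own section uses (delay, errno, or attr), and *, ? and [ are treated as glob wildcards when checking that path stays inside volumePath.

```python
from pathlib import PurePosixPath

# Allowed values taken from the field table.
MODES = {"one", "all", "fixed", "fixed-percent", "random-max-percent"}
ACTIONS = {"latency", "fault", "attrOverride"}
# Assumption: each action needs the field its own section uses.
REQUIRED_FIELD = {"latency": "delay", "fault": "errno", "attrOverride": "attr"}
# Assumption: these characters mark a path component as a glob.
GLOB_CHARS = set("*?[")


def literal_prefix(pattern: str) -> PurePosixPath:
    """Leading components of a glob pattern that contain no wildcards."""
    parts = []
    for part in PurePosixPath(pattern).parts:
        if GLOB_CHARS & set(part):
            break
        parts.append(part)
    return PurePosixPath(*parts)


def check_spec(spec: dict) -> list:
    """Return human-readable problems found in an IoChaos spec (empty if none)."""
    problems = []
    if spec.get("mode") not in MODES:
        problems.append(f"unknown mode: {spec.get('mode')!r}")
    action = spec.get("action")
    if action not in ACTIONS:
        problems.append(f"unknown action: {action!r}")
    elif REQUIRED_FIELD[action] not in spec:
        problems.append(f"action {action!r} needs {REQUIRED_FIELD[action]!r}")
    # Assumption: percent is a whole number from 0 to 100; 100 when omitted.
    percent = spec.get("percent", 100)
    if not isinstance(percent, int) or not 0 <= percent <= 100:
        problems.append(f"percent out of range: {percent!r}")
    volume, path = spec.get("volumePath"), spec.get("path")
    if volume and path and not literal_prefix(path).is_relative_to(volume):
        problems.append(f"path {path!r} is outside volumePath {volume!r}")
    return problems


if __name__ == "__main__":
    # The spec section of the sample manifest, as a Python dict.
    sample = {
        "action": "latency",
        "mode": "one",
        "selector": {"labelSelectors": {"app": "etcd"}},
        "volumePath": "/var/run/etcd",
        "path": "/var/run/etcd/**/*",
        "delay": "100ms",
        "percent": 50,
        "duration": "400s",
    }
    print(check_spec(sample))  # []
```

Comparing path components with PurePosixPath, rather than comparing raw strings, keeps a pattern such as /var/run/etcd2/* from passing as if it were inside /var/run/etcd.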

IOChaos available actions

IOChaos currently supports the following actions:

  • latency: IO latency action. The IO operation returns its result only after the specified delay.
  • fault: IO fault action. In this mode, IO operations return an error.
  • attrOverride: Overrides the attributes of a file.

latency

If you are using the latency action, you can edit the specification as below:

  spec:
    action: latency
    delay: '1ms'

It will inject a latency of 1ms into the selected methods.
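To the application, the latency action simply makes the selected calls slower. If you want to rehearse that effect in a unit test before running the real experiment, an in-process imitation can help. The sketch below is only an analogy: inject_latency is a hypothetical decorator, whereas Chaos Mesh injects the delay into the IO system calls themselves, and reading percent as "the share of calls that are delayed" is an assumption based on the field table.

```python
import functools
import random
import time


def inject_latency(delay_s: float, percent: int = 100, rng=random.random):
    """Decorator: sleep delay_s seconds before roughly percent% of calls."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # rng() is uniform on [0, 1), so this is true with probability percent/100.
            if rng() * 100 < percent:
                time.sleep(delay_s)
            return func(*args, **kwargs)
        return wrapper
    return decorate


# Roughly what "delay: '1ms'" with the default percent of 100 feels like to a caller.
@inject_latency(0.001)
def read_bytes(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()
```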

fault

If you are using the fault action, you can edit the specification as below:

  spec:
    action: fault
    errno: 32

The selected methods then return error 32 (EPIPE), which means broken pipe.
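In Python, an error code returned by a failed IO call surfaces as an OSError, and the OSError constructor picks the matching subclass, so error 32 arrives as BrokenPipeError. The sketch below fakes such a call (fake_failed_write is hypothetical) to show what your error handling would receive.

```python
import errno
import os


def fake_failed_write(data: bytes, err: int = errno.EPIPE):
    """Stand-in for a write that the fault action made fail with errno err."""
    # A failed system call reaches Python code as OSError; the constructor
    # returns the matching subclass, e.g. BrokenPipeError for EPIPE (32).
    raise OSError(err, os.strerror(err))


try:
    fake_failed_write(b"payload")
except BrokenPipeError as exc:
    print("write failed:", exc.errno, exc.strerror)
```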

attrOverride

If you are using the attrOverride action, you can edit the specification as below:

  spec:
    action: attrOverride
    attr:
      perm: 72

Then the permissions of the selected files are overridden with 72, which is 110 in octal. This means the files cannot be read or modified (without CAP_DAC_OVERRIDE). See Available attributes for a list of all the attributes that can be overridden.
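The example writes perm as a decimal integer, so 72 stands for the octal mode 0o110. The following Python sketch (perm_to_rwx is a hypothetical helper) decodes those bits and, assuming you keep writing perm in decimal, shows that a more familiar mode such as 0o644 would be written as 420.

```python
def perm_to_rwx(perm: int) -> str:
    """Render the lowest nine permission bits as an ls-style string."""
    chars = []
    for shift in (6, 3, 0):  # owner, group, others
        bits = (perm >> shift) & 0o7
        chars.append("r" if bits & 0o4 else "-")
        chars.append("w" if bits & 0o2 else "-")
        chars.append("x" if bits & 0o1 else "-")
    return "".join(chars)


print(oct(72), perm_to_rwx(72))   # 0o110 --x--x---
print(0o644, perm_to_rwx(0o644))  # 420 rw-r--r--
```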

Note:

Attributes may be cached by the Linux kernel, so the override might have no effect if your program has already accessed the file.

Common Linux system errors

Common Linux system errors are as below:

  • 1: Operation not permitted
  • 2: No such file or directory
  • 5: I/O error
  • 6: No such device or address
  • 12: Out of memory
  • 16: Device or resource busy
  • 17: File exists
  • 20: Not a directory
  • 22: Invalid argument
  • 24: Too many open files
  • 28: No space left on device

Refer to the related Linux header files (for example, errno-base.h) for more information.
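On Linux, Python's errno module maps these numbers to their symbolic names, which can help when picking an errno value or decoding one from application logs. A short sketch; the messages come from the local C library through os.strerror, so their wording can differ slightly from the list (for example, glibc describes 12 as "Cannot allocate memory").

```python
import errno
import os

# The codes listed above, plus 32 (EPIPE) from the fault example.
CODES = [1, 2, 5, 6, 12, 16, 17, 20, 22, 24, 28, 32]

for code in CODES:
    # errno.errorcode maps a number to its symbolic name, e.g. 28 -> "ENOSPC".
    print(f"{code:>3}  {errno.errorcode[code]:<8} {os.strerror(code)}")
```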

Available methods

Available methods are as below:

  • lookup
  • forget
  • getattr
  • setattr
  • readlink
  • mknod
  • mkdir
  • unlink
  • rmdir
  • symlink
  • rename
  • link
  • open
  • read
  • write
  • flush
  • release
  • fsync
  • opendir
  • readdir
  • releasedir
  • fsyncdir
  • statfs
  • setxattr
  • getxattr
  • listxattr
  • removexattr
  • access
  • create
  • getlk
  • setlk
  • bmap
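A misspelled entry in methods is easy to overlook in YAML, so a quick check against this list can be useful. The Python sketch below is hypothetical (unknown_methods is not part of Chaos Mesh) and assumes that names must match the list exactly, including case.

```python
# Method names from the list above.
AVAILABLE_METHODS = frozenset("""
lookup forget getattr setattr readlink mknod mkdir unlink rmdir symlink rename
link open read write flush release fsync opendir readdir releasedir fsyncdir
statfs setxattr getxattr listxattr removexattr access create getlk setlk bmap
""".split())


def unknown_methods(methods: list) -> list:
    """Return the entries of a methods array that are not in the list above."""
    return [m for m in methods if m not in AVAILABLE_METHODS]


print(unknown_methods(["open", "read", "seek"]))  # ['seek']
```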

Available attributes

Available attributes and their meanings are listed below:

  • ino, inode of a file
  • size, total size, in bytes
  • blocks, number of 512B blocks allocated
  • atime, time of last access
  • mtime, time of last modification
  • ctime, time of last status change
  • kind, file type. It can be namedPipe, charDevice, blockDevice, directory, regularFile, symlink or socket
  • perm, permission of a file
  • nlink, number of hard links
  • uid, user id of owner
  • gid, group id of owner
  • rdev, device ID (if special file)
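To check whether an override is visible to your program (keeping the kernel caching note in mind), you can read the same attributes from inside the container. The Python sketch below assumes Linux; current_attrs is a hypothetical helper, and the mapping from these attribute names to os.stat_result fields is our own reading of the list rather than something Chaos Mesh defines.

```python
import os
import stat
import tempfile

# Maps the kind values above to the stat module's file-type tests.
_KINDS = [
    (stat.S_ISFIFO, "namedPipe"),
    (stat.S_ISCHR, "charDevice"),
    (stat.S_ISBLK, "blockDevice"),
    (stat.S_ISDIR, "directory"),
    (stat.S_ISREG, "regularFile"),
    (stat.S_ISLNK, "symlink"),
    (stat.S_ISSOCK, "socket"),
]


def current_attrs(path: str) -> dict:
    """Read a file's attributes under the names used by attrOverride."""
    st = os.lstat(path)  # lstat, so a symlink reports itself rather than its target
    kind = next((name for test, name in _KINDS if test(st.st_mode)), "unknown")
    return {
        "ino": st.st_ino,
        "size": st.st_size,
        "blocks": st.st_blocks,  # number of 512-byte blocks allocated
        "atime": st.st_atime,
        "mtime": st.st_mtime,
        "ctime": st.st_ctime,
        "kind": kind,
        "perm": stat.S_IMODE(st.st_mode),  # permission bits, e.g. 0o644 == 420
        "nlink": st.st_nlink,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "rdev": st.st_rdev,
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"hello")
        f.flush()
        print(current_attrs(f.name))
```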