Injecting Faults

It is easy to inject failures into applications by using the Traffic Split API of the Service Mesh Interface. TrafficSplit allows you to redirect a percentage of traffic to a specific backend. This backend is completely flexible and can return whatever responses you want: 500s, timeouts, or even crazy payloads.

The books demo is a great way to show off this behavior. The overall topology looks like:

[Figure: Topology]

In this guide, you will split some of the requests from webapp to books. Most requests will end up at the correct books destination; however, some of them will be redirected to a faulty backend. This backend will return 500s for every request and inject faults into the webapp service. No code changes are required and, as this method is configuration driven, it is a process that can be added to integration tests and CI pipelines. If you are really living the chaos engineering lifestyle, fault injection could even be used in production.

Prerequisites

To use this guide, you’ll need to have Linkerd installed on your cluster. Follow the Installing Linkerd Guide if you haven’t already done this.

Set up the service

First, add the books sample application to your cluster:

    kubectl create ns booksapp && \
      linkerd inject https://run.linkerd.io/booksapp.yml | \
      kubectl -n booksapp apply -f -

As this manifest is used as a demo elsewhere, it has been configured with an error rate. To show how fault injection works, the error rate needs to be removed so that there is a reliable baseline. To increase the success rate for booksapp to 100%, run:

    kubectl -n booksapp patch deploy authors \
      --type='json' \
      -p='[{"op":"remove", "path":"/spec/template/spec/containers/0/env/2"}]'
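The patch removes the third entry (index 2) from the environment of the first container in the authors deployment. To see which variable that is before applying the patch, something like the following could be used. This is a minimal sketch: the helper name `show_patch_target` is hypothetical, and it assumes kubectl is pointed at the cluster running booksapp.

```shell
# Hypothetical helper: prints the env entry at index 2 of the first container
# in the authors deployment, which is the entry the JSON patch removes.
show_patch_target() {
  kubectl -n booksapp get deploy authors \
    -o jsonpath='{.spec.template.spec.containers[0].env[2]}'
}

# Usage against a live cluster (not executed here):
#   show_patch_target
```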

After a little while, the stats will show a 100% success rate. You can verify this by running:

    linkerd -n booksapp stat deploy

The output will end up looking a little like:

    NAME      MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN
    authors      1/1   100.00%   7.1rps           4ms          26ms          33ms          6
    books        1/1   100.00%   8.6rps           6ms          73ms          95ms          6
    traffic      1/1         -        -             -             -             -          -
    webapp       3/3   100.00%   7.9rps          20ms          76ms          95ms          9
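To check the baseline in a script rather than by eye, the stat output can be filtered. The sketch below assumes the column layout shown above, with the success rate in the third column; the helper name `below_baseline` is hypothetical.

```shell
# Hypothetical helper: reads `linkerd stat` output on stdin and prints the
# name of every resource whose SUCCESS column is below 100%.
# The header row and rows without traffic (SUCCESS shown as "-") are skipped.
below_baseline() {
  awk 'NR > 1 && $3 != "-" {
    rate = $3
    sub(/%/, "", rate)          # "100.00%" -> "100.00"
    if (rate + 0 < 100) print $1
  }'
}

# Usage against a live cluster (not executed here):
#   linkerd -n booksapp stat deploy | below_baseline
```

Empty output means every deployment that is receiving traffic has reached the 100% baseline.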

Create the faulty backend

Injecting faults into booksapp requires a service that is configured to return errors. To do this, you can start NGINX and configure it to return 500s by running:

    cat <<EOF | linkerd inject - | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: error-injector
      namespace: booksapp
    data:
      nginx.conf: |-
        events {}
        http {
            server {
                listen 8080;
                location / {
                    return 500;
                }
            }
        }
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: error-injector
      namespace: booksapp
      labels:
        app: error-injector
    spec:
      selector:
        matchLabels:
          app: error-injector
      replicas: 1
      template:
        metadata:
          labels:
            app: error-injector
        spec:
          containers:
            - name: nginx
              image: nginx:alpine
              volumeMounts:
                - name: nginx-config
                  mountPath: /etc/nginx/nginx.conf
                  subPath: nginx.conf
          volumes:
            - name: nginx-config
              configMap:
                name: error-injector
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: error-injector
      namespace: booksapp
    spec:
      ports:
      - name: service
        port: 8080
      selector:
        app: error-injector
    EOF
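Before splitting traffic, it can be worth confirming that the new backend really answers with 500s. This is a minimal sketch, assuming curl is available locally and the service has been made reachable with kubectl port-forward; the helper name `returns_500` is hypothetical.

```shell
# Hypothetical helper: succeeds only if the given URL answers with HTTP 500.
returns_500() {
  # -s silences progress output, -o discards the body, -w prints the status code.
  code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
  [ "$code" = "500" ]
}

# Usage against a live cluster (not executed here):
#   kubectl -n booksapp port-forward svc/error-injector 8080:8080 &
#   returns_500 http://localhost:8080/ && echo "error-injector returns 500"
```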

Inject faults

With booksapp and NGINX running, it is now time to partially split the traffic between an existing backend, books, and the newly created error-injector. This is done by adding a TrafficSplit configuration to your cluster:

    cat <<EOF | kubectl apply -f -
    apiVersion: split.smi-spec.io/v1alpha1
    kind: TrafficSplit
    metadata:
      name: error-split
      namespace: booksapp
    spec:
      service: books
      backends:
      - service: books
        weight: 900m
      - service: error-injector
        weight: 100m
    EOF

When Linkerd sees traffic going to the books service, it will send 9/10 requests to the original service and 1/10 to the error injector. You can see what this looks like by running routes and filtering explicitly to just the requests from webapp:

    linkerd -n booksapp routes deploy/webapp --to service/books

Unlike the previous stat command, which only looks at the requests received by servers, this routes command filters to all the requests being issued by webapp destined for the books service itself. The output should show a 90% success rate:

    ROUTE       SERVICE   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
    [DEFAULT]     books    90.08%   2.0rps           5ms          69ms          94ms
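The roughly 90% success rate follows from the weights in the TrafficSplit. The sketch below works through that arithmetic, assuming each backend's share of traffic is its weight divided by the sum of all weights; the helper name `backend_share` is hypothetical.

```shell
# Hypothetical helper: the first argument is the weight of one backend, the
# remaining arguments are the weights of all backends (milli-units, e.g. 900m).
# Prints the first backend's share of traffic as a percentage.
backend_share() {
  target=${1%m}                  # strip the trailing "m"
  shift
  total=0
  for w in "$@"; do
    total=$((total + ${w%m}))
  done
  awk -v t="$target" -v s="$total" 'BEGIN { printf "%.1f%%\n", 100 * t / s }'
}

backend_share 900m 900m 100m   # share sent to books: 90.0%
backend_share 100m 900m 100m   # share sent to error-injector: 10.0%
```

Because the error injector fails every request it receives, its 10% share of traffic shows up as a success rate near 90% for requests to books.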

Note: In this instance, you are looking at the service instead of the deployment. If you were to run this command and look at deploy/books, the success rate would still be 100%. The reason for this is that error-injector is a completely separate deployment and traffic is being shifted at the service level. The requests never reach the books pods and are instead rerouted to the error injector’s pods.
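Since the method is configuration driven, a check on the injected failure rate could be scripted for an integration test or CI pipeline. This is a minimal sketch, assuming the routes output has the column layout shown above; the helper name `success_rate_between` and the bounds of 85 and 95 are illustrative choices, not values from Linkerd.

```shell
# Hypothetical helper: reads `linkerd routes` output on stdin and succeeds only
# if the SUCCESS rate of the [DEFAULT] route lies between the two bounds given
# as arguments (inclusive). Fails if no [DEFAULT] row is present.
success_rate_between() {
  awk -v lo="$1" -v hi="$2" '
    $1 == "[DEFAULT]" {
      rate = $3
      sub(/%/, "", rate)        # "90.08%" -> "90.08"
      found = 1
      ok = (rate + 0 >= lo && rate + 0 <= hi)
    }
    END { if (found && ok) exit 0; exit 1 }
  '
}

# Usage against a live cluster (not executed here):
#   linkerd -n booksapp routes deploy/webapp --to service/books \
#     | success_rate_between 85 95 || echo "unexpected success rate"
```

A range is used instead of an exact value because the observed rate fluctuates around the configured split, as the 90.08% in the sample output shows.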

Cleanup

To remove everything in this guide from your cluster, run:

    kubectl delete ns booksapp
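If you would rather stop injecting faults while keeping booksapp running, it should be enough to delete only the resources created in this guide. This is a sketch under assumptions: the helper name `stop_fault_injection` is hypothetical, and it assumes the TrafficSplit resource type can be addressed as `trafficsplit` on your cluster.

```shell
# Hypothetical helper: removes only the resources this guide added on top of
# booksapp, so that requests to books go back to the original backend.
stop_fault_injection() {
  # Remove the split first so no traffic is routed to a backend being deleted.
  kubectl -n booksapp delete trafficsplit error-split
  # Remove the NGINX deployment, its service, and its config map.
  kubectl -n booksapp delete deploy,svc,configmap error-injector
}

# Usage against a live cluster (not executed here):
#   stop_fault_injection
```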