Experiments

Experiments are run on a Kubernetes cluster. Commands must be run from a machine that has access to the cluster and can run kubectl and kn.

Everything must be cleared away between runs so that state from one run doesn't bleed into the next.
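One way to check that a previous run has been cleared away is to poll the namespace until no pods remain. The sketch below assumes all experiment pods live in the faasm namespace, as in the kubectl commands used later. The function names and the timeout defaults are hypothetical. Knative can scale a service down to zero pods without deleting it, so an empty pod list is necessary but not sufficient: `kn -n faasm service list` should also come back empty.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check that the experiment namespace has no pods left.
# Assumes kubectl is on the PATH and configured for the experiment cluster.

# count_pods NAMESPACE -> prints the number of pods, fails if kubectl fails
count_pods() {
  local out
  # kubectl prints "No resources found" on stderr, so stdout is empty when clean
  out="$(kubectl -n "$1" get pods --no-headers)" || return 1
  if [ -z "$out" ]; then
    echo 0
  else
    printf '%s\n' "$out" | wc -l | tr -d ' '
  fi
}

# wait_for_clean NAMESPACE [TIMEOUT_S] [INTERVAL_S]
# Returns 0 once no pods remain, 1 on timeout, 2 if kubectl fails.
wait_for_clean() {
  local ns="$1" timeout="${2:-300}" interval="${3:-5}" waited=0 n
  while :; do
    n="$(count_pods "$ns")" || { echo "kubectl failed for namespace $ns" >&2; return 2; }
    [ "$n" -eq 0 ] && return 0
    if [ "$waited" -ge "$timeout" ]; then
      echo "$n pod(s) still present in $ns after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

For example, `wait_for_clean faasm 600 && echo "ready for next run"` after running the clean-up commands for an experiment.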

Client Machine Set-up

To run throughput/latency experiments, you'll need to set up the client machine by running the following on the machine itself:

  cd ansible
  ansible-playbook load_client.yml

Billing Estimates

To get resource measurements from the hosts running the experiments, we first need an inventory file at ansible/inventory/billing.yml, something like:

  [all]
  myhost1
  myhost2
  ...
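If the host list changes between experiments, the inventory can be generated instead of edited by hand. This is a minimal sketch. The function name is hypothetical, and it only writes the same `[all]` group format shown in the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an Ansible inventory that puts every given host
# in the [all] group, matching the example inventory format.
make_billing_inventory() {
  if [ "$#" -eq 0 ]; then
    echo "usage: make_billing_inventory HOST..." >&2
    return 1
  fi
  echo "[all]"
  printf '%s\n' "$@"
}
```

Usage: `make_billing_inventory myhost1 myhost2 > ansible/inventory/billing.yml`.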

Then we can run the set-up with:

  cd ansible
  ansible-playbook -i inventory/billing.yml billing_setup.yml

Data

Data should be generated and uploaded ahead of time.

SGD

For details of the SGD experiment data, see the notes in sgd.md.

Matrices

The matrix experiment data needs to be generated in bulk locally, uploaded to S3, then downloaded on the client machine (or copied directly with scp). You must have the native tooling and pyfaasm installed to generate it up front (but this doesn't need to be done if it's already in S3):

  # Generate it
  inv libs.native
  inv matrix-data.generate-all

  # Direct SCP from local machine
  export HOST=<your_host>
  export HOST_USER=<user_on_your_host>
  scp -r ~/faasm/data/matrix $HOST_USER@$HOST:/home/$HOST_USER/faasm/data

  # Upload (note: >4GB)
  inv data.matrix-upload-s3

  # Download (on the client machine)
  inv data.matrix-download-s3
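Because the data set is large, it can be worth confirming that a direct scp copy is complete. This sketch compares sorted file listings on both sides. It assumes bash (for process substitution) and the same local and remote paths as the scp command. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every file under a directory as a relative path,
# sorted byte-wise so listings from different machines compare cleanly.
list_files() {
  (cd "$1" && find . -type f | LC_ALL=C sort)
}
```

Usage, with `HOST` and `HOST_USER` exported as in the scp step (no diff output means the file lists match):

    diff <(list_files ~/faasm/data/matrix) \
         <(ssh "$HOST_USER@$HOST" "cd /home/$HOST_USER/faasm/data/matrix && find . -type f | LC_ALL=C sort")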

Tensorflow

Tensorflow data consists of the model and images. These need to be uploaded to your Faasm instance:

  inv data.tf-upload data.tf-state

SGD Experiment

  # -- Prepare --
  # Upload data (one-off)
  inv data.reuters-state

  # -- Build/upload --
  inv knative.build-native sgd reuters_svm
  inv upload sgd reuters_svm

  # -- Deploy --

  # Vary number of workers on each run
  export N_WORKERS=10

  # Native containers
  inv knative.deploy-native sgd reuters_svm $N_WORKERS

  # Wasm
  inv knative.deploy $N_WORKERS

  # -- Wait --

  watch kn -n faasm service list
  watch kubectl -n faasm get pods

  # -- Run experiment --

  # Native SGD
  inv experiments.sgd --native $N_WORKERS 60000

  # Wasm SGD
  inv experiments.sgd $N_WORKERS 60000

  # -- Clean up --

  # Native SGD
  inv knative.delete-native sgd reuters_svm

  # Wasm
  inv knative.delete-worker --hard
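Since the number of workers varies per run and everything must be cleared between runs, the SGD deploy/run/clean-up cycle can be scripted as a loop. This is a dry-run sketch. It reuses only the inv tasks listed for this experiment and assumes the one-off preparation and build/upload steps are already done. The worker counts are placeholders. Replacing the interactive `watch` step with `kubectl wait` on Knative services (`ksvc`) is an assumption: it treats "all services in faasm report Ready" as "workers are up". You may still want to confirm the namespace is empty after each clean-up.

```shell
#!/usr/bin/env bash
# Dry-run sketch of an SGD worker sweep built from the inv tasks above.
# Prints commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
# Placeholder worker counts, space-separated (word-split on purpose below)
WORKER_COUNTS="${WORKER_COUNTS:-2 4 8 10}"

# Print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Assumption: every Knative service in faasm being Ready means workers are up.
wait_ready() {
  run kubectl -n faasm wait --for=condition=Ready ksvc --all --timeout=600s
}

# sgd_run native|wasm N_WORKERS: deploy, wait, run, clean up
sgd_run() {
  local mode="$1" n="$2"
  if [ "$mode" = native ]; then
    run inv knative.deploy-native sgd reuters_svm "$n"
    wait_ready
    run inv experiments.sgd --native "$n" 60000
    run inv knative.delete-native sgd reuters_svm
  else
    run inv knative.deploy "$n"
    wait_ready
    run inv experiments.sgd "$n" 60000
    run inv knative.delete-worker --hard
  fi
}

sgd_sweep() {
  local mode n
  for mode in native wasm; do
    for n in $WORKER_COUNTS; do
      sgd_run "$mode" "$n"
    done
  done
}
```

For example, `source` the file, check the printed plan with `WORKER_COUNTS="10 20" sgd_sweep`, then run it for real with `DRY_RUN=0 WORKER_COUNTS="10 20" sgd_sweep`.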

Matrices Experiment

  # -- Build/upload --
  inv upload python mat_mul --py

  # Number of workers kept the same throughout
  export N_WORKERS=<number of workers>

  # -- Deploy --

  # Native
  inv knative.deploy-native-python $N_WORKERS

  # Wasm
  inv knative.deploy $N_WORKERS

  # -- Run experiment --

  # Native
  inv experiments.matrix-multi $N_WORKERS --native

  # Wasm
  inv experiments.matrix-multi $N_WORKERS
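Because the number of workers must stay the same across matrix runs, a small wrapper can refuse to run when `N_WORKERS` is unset and pair each deploy with the matching run. This is a dry-run sketch using only the tasks listed for this experiment. The function names are hypothetical. Clean-up commands for this experiment are not listed here, so the wrapper does not delete anything. You will likely also want to wait for the deployment to become ready between the two commands, as in the SGD steps.

```shell
#!/usr/bin/env bash
# Dry-run sketch for one matrix run in a given mode.
# Prints commands by default; set DRY_RUN=0 to execute them.

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# matrix_run native|wasm (requires N_WORKERS, fixed across all runs)
matrix_run() {
  if [ -z "${N_WORKERS:-}" ]; then
    echo "set N_WORKERS first (keep it the same for every run)" >&2
    return 1
  fi
  case "$1" in
    native)
      run inv knative.deploy-native-python "$N_WORKERS"
      run inv experiments.matrix-multi "$N_WORKERS" --native
      ;;
    wasm)
      run inv knative.deploy "$N_WORKERS"
      run inv experiments.matrix-multi "$N_WORKERS"
      ;;
    *)
      echo "usage: matrix_run native|wasm" >&2
      return 1
      ;;
  esac
}
```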

Tensorflow Experiment

You need to set the following environment variables for these experiments (through the Knative config):

  • COLD_START_DELAY_MS=800
  • TF_CODEGEN=on
  • SGD_CODEGEN=off
  • PYTHON_CODEGEN=off
  • PYTHON_PRELOAD=off
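If you need to apply these variables to a service that is already deployed, instead of editing the config, kn accepts repeated `--env KEY=VALUE` flags on `kn service update`. This sketch builds those flags from the list above. This document doesn't name the Knative services, so `<service-name>` in the usage line is a placeholder. Each update creates a new revision of the service.

```shell
#!/usr/bin/env bash
# Environment variables required for the Tensorflow experiments.
TF_ENV=(
  COLD_START_DELAY_MS=800
  TF_CODEGEN=on
  SGD_CODEGEN=off
  PYTHON_CODEGEN=off
  PYTHON_PRELOAD=off
)

# Fill the global array TF_ENV_ARGS with "--env KEY=VALUE" pairs for kn.
tf_env_args() {
  local kv
  TF_ENV_ARGS=()
  for kv in "${TF_ENV[@]}"; do
    TF_ENV_ARGS+=(--env "$kv")
  done
}
```

Usage: `tf_env_args; kn service update <service-name> -n faasm "${TF_ENV_ARGS[@]}"`.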

Preamble:

  # -- Build/upload --
  inv knative.build-native tf image
  inv upload tf image

  # -- Upload data (one-off) --
  inv data.tf-upload data.tf-state

Latency:

  # -- Deploy both (note small number of workers) --
  inv knative.deploy-native tf image 1
  inv knative.deploy 1

  # -- Run experiment --
  inv experiments.tf-lat

Throughput:

  # -- Deploy --
  # Native
  inv knative.deploy-native tf image 30

  # Wasm
  inv knative.deploy 18

  # -- Run experiment --

  # Native
  inv experiments.tf-tpt --native

  # Wasm
  inv experiments.tf-tpt
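The throughput experiment uses a different number of workers per mode (30 native, 18 Wasm). Keeping those counts in one place avoids deploying the wrong size. This is a dry-run sketch using only the tasks listed for the throughput experiment. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run sketch for the Tensorflow throughput experiment.
# Prints commands by default; set DRY_RUN=0 to execute them.

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Worker counts per mode, as given in the throughput steps.
tf_tpt_workers() {
  case "$1" in
    native) echo 30 ;;
    wasm) echo 18 ;;
    *) echo "usage: tf_tpt_workers native|wasm" >&2; return 1 ;;
  esac
}

# tf_tpt_run native|wasm: deploy with the right size, then run
tf_tpt_run() {
  local n
  n="$(tf_tpt_workers "$1")" || return 1
  if [ "$1" = native ]; then
    run inv knative.deploy-native tf image "$n"
    run inv experiments.tf-tpt --native
  else
    run inv knative.deploy "$n"
    run inv experiments.tf-tpt
  fi
}
```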

Results

Once you've done several runs, pull the results to your local machine and process them:

  # SGD
  inv experiments.sgd-pull-results <user> <host>

  # Matrices
  inv experiments.matrix-pull-results <user> <host>

  # Inference latency
  inv experiments.tf-lat-pull-results <user> <host>

  # Inference throughput
  inv experiments.tf-tpt-pull-results <user> <host>
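The pull commands share a pattern, so they can be run in one go. This is a dry-run sketch. The task names come from the list above. The helper name is hypothetical. It keeps going if one pull fails but reports the failure in its exit status.

```shell
#!/usr/bin/env bash
# Sketch: pull results for every experiment listed above.
# Prints commands by default; set DRY_RUN=0 to execute them.

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Prefixes of the experiments.<name>-pull-results tasks.
RESULT_TASKS=(sgd matrix tf-lat tf-tpt)

# pull_all_results USER HOST
pull_all_results() {
  if [ "$#" -ne 2 ]; then
    echo "usage: pull_all_results USER HOST" >&2
    return 1
  fi
  local t status=0
  for t in "${RESULT_TASKS[@]}"; do
    # Continue past a failed pull, but remember it for the exit status.
    run inv "experiments.${t}-pull-results" "$1" "$2" || status=1
  done
  return "$status"
}
```

Usage: `DRY_RUN=0 pull_all_results <user> <host>`.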