BLAS Support

The main blocker on BLAS support is the lack of a Fortran-to-wasm compiler. A secondary issue is that any machine-specific optimizations are lost when compiling to wasm.

Although no longer officially maintained, CLAPACK can still be used to provide the relevant routines where necessary.

A fork of the CLAPACK code can be found here. There are a couple of tweaks required, primarily:

  • Using the Faasm toolchain
  • Building the cblas library (rather than the default BLAS directory)
  • Interfacing properly between cblas and the f2c'd code

The cblas sources aren't part of the default CLAPACK distribution and need to be downloaded from https://www.netlib.org/clapack/cblas.tgz (this is already done and checked into the fork).

Building

The built archives are included in the bundled Faasm sysroot and Docker containers, so you shouldn't need to rebuild them at all.

To build from scratch:

    cd third-party/faasm-clapack
    make
    make install

This installs the built archives into /usr/local/faasm/llvm-sysroot.

Numpy

When no system LAPACK is present, Numpy falls back to its own f2c'd code, which comes from its lapack_lite module. This is configured here.

To detect a system LAPACK, Numpy looks for different BLAS/LAPACK libraries at build time via distutils/system_info.py.

Numpy expects a cblas interface to be present.

Configuring Numpy

As described in system_info.py, numpy looks for extra configuration in certain locations, such as ~/.numpy-site.cfg, and in environment variables.
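As a rough illustration of the config-file route, the sketch below renders a site config with [blas] and [lapack] sections pointing at the Faasm sysroot. The library_dirs, include_dirs and libraries keys follow numpy's site.cfg conventions, but the lib and include subdirectories and the library names (cblas, blas, lapack) are assumptions about how the archives are laid out, so check them against the actual sysroot before relying on this.

```python
import configparser
import io
import os

# Install prefix from the build section above. The lib/ and include/
# subdirectories are assumed, not confirmed.
SYSROOT = "/usr/local/faasm/llvm-sysroot"


def render_site_cfg(sysroot=SYSROOT):
    """Return the text of a numpy site config pointing at the sysroot."""
    cfg = configparser.ConfigParser()
    lib_dir = f"{sysroot}/lib"
    include_dir = f"{sysroot}/include"

    # Library names are hypothetical placeholders for the CLAPACK archives.
    cfg["blas"] = {
        "libraries": "cblas, blas",
        "library_dirs": lib_dir,
        "include_dirs": include_dir,
    }
    cfg["lapack"] = {
        "libraries": "lapack",
        "library_dirs": lib_dir,
    }

    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def write_site_cfg(path="~/.numpy-site.cfg", sysroot=SYSROOT):
    """Write the rendered config to path and return the expanded path."""
    full_path = os.path.expanduser(path)
    with open(full_path, "w") as fh:
        fh.write(render_site_cfg(sysroot))
    return full_path
```

Calling render_site_cfg() first and inspecting the text is a safe way to check the result before write_site_cfg() overwrites any existing ~/.numpy-site.cfg.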

The following builds and installs numpy without any BLAS support:

    ./bin/build_unoptimized_numpy.sh

You can check the results with:

    import numpy as np

    # Every section should be reported as NOT AVAILABLE
    np.__config__.show()

    # This should take several seconds and only use one thread
    a = np.random.rand(2048, 2048)
    b = np.random.rand(2048, 2048)
    c = np.dot(a, b)

You can then try running the same thing with a default installation. There should be some info under lapack_opt_info and blas_opt_info, and the matrix multiplication should take under a second.
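To put numbers on that comparison rather than judging by eye, a small timing helper can be run in both environments. This is only a sketch: best_time and time_matmul are hypothetical helper names, and the 2048 x 2048 size simply mirrors the check above.

```python
import time


def best_time(fn, repeats=3):
    """Run fn several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def time_matmul(n=2048, repeats=3):
    """Time an n x n matrix multiplication with whichever numpy is installed."""
    # Imported lazily so best_time stays usable without numpy.
    import numpy as np

    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    return best_time(lambda: np.dot(a, b), repeats)
```

Calling time_matmul() once under the unoptimized build and once under a default installation gives two figures that can be compared directly. Taking the minimum over a few repeats reduces noise from other processes.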