Multithreading

Multithreading allows you to run parts of your programming concurrently, performing tasks in parallel. Every program has a main thread - i.e. the one your main() started from, in addition to which are any that you create.

Examples of reasons to use threads:

  • Long running operations, e.g. zipping up a large file.
  • Activity that is blocking in nature, e.g. listening for connections on a socket
  • Processing data in parallel, e.g. physics, collision detection etc.
  • Asynchronous activities, e.g. timers, polling operations.

In addition, if you use a graphical toolkit, or 3rd party libraries they may spawn their own threads that you do not know about.

Thread safety

One word you will hear a lot in multithreading is thread safety.

By that we mean:

  • Threads should not be able to modify the data at the same time. When this happens it is called a data race and can corrupt the data, causing a crash. e.g. two threads trying to append to a string at the same time.
  • Threads must not lock resources in a way that could cause deadlock i.e. thread 1 obtains a lock on resource B and blocks on resource A, while thread 2 obtains a lock on resource A and blocks on resource B. Both threads are locked forever waiting for a resource to release that never will be.
  • Race conditions are bad, i.e. the order of thread execution produces unpredictable results on the output from the same input.
  • APIs that can be called by multiple threads must either protect their data structures or make it an explicit problem of the client to sort out.
  • Open files and other resources that are accessed by multiple threads must be managed safely.

Protecting shared data

Data should never be read at the same time it is written to in another thread. Nor should data be written to at the same time by two threads.

The common way to prevent this is either:

  • Use a mutex to guard access to the data. A mutex is a special class that only one thread can lock at a time. Other threads that try to lock the mutex will wait until the lock held by another thread is relinquished
  • Use a read-write lock. Similar to a mutex, it allows one thread to lock the thread for writing data, however it permits multiple threads to have read access, providing nothing is already writing to it. For data that is read more frequently than it is modified, this is a lot more efficient than just a mutex.

Avoiding deadlock

The best way to avoid deadlock is only ever obtain a lock to one thing ever and release it as soon as you are done. But if you have to lock more than one thing, ensure the locking order is consistent between all your threads. So if thread 1 locks A and B, then ensure that thread 2 also locks A and B in that order and not B then A. The latter is surely going to cause a deadlock.

C / C++

C and C++ predate threading to some extent so until C++11 the languages have had little built-in support for multi-threading and what there was tended to be compiler specific extensions.

A consequence of this is that C and C++ have ZERO ENFORCEMENT of thread safety. If you data race - too bad. If you forget to write a lock in one function even if you remembered all the others - too bad. You have to discipline yourself to think concurrently and apply the proper protections where it is required.

The consequence of not doing so may not even be felt until your software is in production and that one customer starts complaining that their server freezes about once a week. Good luck finding that bug!

Multithreading APIs

The most common APIs would be:

  • <thread>, <mutex> - from C++11 onwards
  • POSIX threads, or pthreads. Exposed by POSIX systems such as Linux and most other Unix derivatives, e.g. OS X. There is also pthread-win32 support built over the top of Win32 threads.
  • Win32 threads. Exposed by the Windows operating system.
  • OpenMP. Supported by many C++ compilers.
  • 3rd party libraries like Boost and Qt provide wrappers that abstract the differences between thread APIs.

All APIs will have in common:

  • Thread creation, destruction, joins (waiting on threads) and detaches (freeing the thread to do what it likes).
  • Synchronization between threads using locks and barriers.
  • Mutexes - mutual exclusion locks that protect shared data.
  • Conditional variables - a means to signal and notify of conditions becoming true.

std::thread

The std::thread represents a single thread of execution and provides an abstraction over platform dependent ways of threading.

  1. #include <iostream>
  2. #include <thread>
  3. using namespace std;
  4. void DoWork(int loop_count) {
  5. for (int i = 0; i < loop_count; ++i) {
  6. cout << "Hello world " << i << endl;
  7. }
  8. }
  9. int main() {
  10. thread worker(DoWork, 100);
  11. worker.join();
  12. }

The example spawns a thread which invokes the function and passes the parameter into it, printing a message 100 times.

std::mutex

C++ provides a family of various mutex types to protect access to shared data.

The mutex is obtained by a lock_guard and other attempts to obtain the mutex are blocked until the lock is relinquished.

  1. #include <iostream>
  2. #include <thread>
  3. #include <mutex>
  4. using namespace std;
  5. mutex data_guard;
  6. int result = 0;
  7. void DoWork(int loop_count) {
  8. for (auto i = 0; i < loop_count; ++i) {
  9. lock_guard<mutex> guard(data_guard);
  10. result += 1;
  11. }
  12. }
  13. int main() {
  14. thread worker1(DoWork, 100);
  15. thread worker2(DoWork, 150);
  16. worker1.join();
  17. worker2.join();
  18. cout << "result = " << result << endl;
  19. }

POSIX threads

The pthreads API is prefixed pthread_ and works like so:

  1. #include <iostream>
  2. #include <pthread.h>
  3. using namespace std;
  4. void *DoWork(void *data) {
  5. const int loop_count = (int) data;
  6. for (int i = 0; i < loop_count; ++i) {
  7. cout << "Hello world " << i << endl;
  8. }
  9. pthread_exit(NULL);
  10. }
  11. int main() {
  12. pthread_t worker_thread;
  13. int result = pthread_create(&worker_thread, NULL, DoWork, (void *) 100);
  14. // Wait for the thread to end
  15. result = pthread_join(worker_thread, NULL);
  16. }

This example spawns a thread which invokes DoWork with the payload of 100 which causes the function to print a message 100 times.

Win32 Threads

Win32 threading has functions analogous to those in POSIX. They have names such as CreateThread, ExitThread, SetThreadPriority etc.

OpenMP API

Open Multi-Processing (OpenMP) is an API for multi-threaded parallel processing. OpenMP relies on compiler support because you use special #pragma directives in your source to control thread creation and access to data.

GCC, Clang and Visual C++ have support for OpenMP so it is an option.

OpenMP is a complex standard but the use of directives can make for cleaner code than invoking threading APIs directly. The downside is it is also more opaque hiding what the software is doing, making it considerably more difficult to debug.

OpenMP is described in detail at the OpenMP website.

Thread local storage

Thread local storage, or TLS is static or global data which is private to every thread. Each thread holds its own copy of this data so it can modify it without fear of causing a data race.

Compilers also have proprietary ways to decorate types as thread local:

  1. __thread int private; // gcc / clang
  2. __declspec(thread) int private; // MSVC

C++11 has gained a thread_local directive to decorate variables which should use TLS.

  1. thread_local int private

Rust

We saw with C++ that you had to be disciplined to remember to protect data from race conditions.

Rust doesn’t give you that luxury -

  1. Any data that you share must be protected in a thread safe fashion
  2. Any data that you pass between threads must be marked thread safe

Spawning a thread

Spawning a thread is easy enough by calling spawn, supplying the closure you want to run in the context of your new thread.

  1. use std::thread;
  2. thread::spawn(move || {
  3. println!("Hello");
  4. });

Alternatively you can supply a function to spawn which is called in the same manner.

  1. fn my_thread() {
  2. println!("Hello");
  3. }
  4. //...
  5. thread::spawn(my_thread);

If you supply a closure then it must have a lifetime of 'static because threads can outlive the thing that created them. i.e. they are detached by default.

A closure can make use of move values that are marked Send so the compiler allows ownership to transfer between threads.

Likewise function / closure may also return a value which is marked Send so the compiler can transfer ownership between the terminating thread and the thread which calls join to obtain the value.

So the thread above is detached. If we wanted to wait for the thread to complete, the spawn returns a JoinHandle that we can call join to wait for termination.

  1. let h = thread::spawn(move || {
  2. println!("Hello");
  3. });
  4. h.join();

If the closure or function returns a value, we can use join to obtain it.

  1. let h = thread::spawn(move || 100 * 100);
  2. let result = h.join().unwrap();
  3. println!("Result = {}", result);

Data race protection in the compiler

Data races are bad news, but fortunately in Rust the compiler has your back. You MUST protect your shared data or it won’t compile.

The simplest way to protect your data is to wrap the data in a mutex and provide each thread instance with a reference counted copy of the mutex.

  1. let shared_data = Arc::new(Mutex::new(MySharedData::new()));
  2. // Each thread we spawn should have a clone of this Arc
  3. let shared_data = shared_data.clone();
  4. thread::spawn(move || {
  5. let mut shared_data = shared_data.lock().unwrap();
  6. shared_data.counter += 1;
  7. });

Here is a full example that spawns 10 threads that each increment the counter.

  1. struct MySharedData {
  2. pub counter: u32,
  3. }
  4. impl MySharedData {
  5. pub fn new() -> MySharedData {
  6. MySharedData {
  7. counter: 0
  8. }
  9. }
  10. }
  11. fn main() {
  12. spawn_threads();
  13. }
  14. fn spawn_threads() {
  15. let shared_data = Arc::new(Mutex::new(MySharedData::new()));
  16. // Spawn a number of threads and collect their join handles
  17. let handles: Vec<JoinHandle<_>> = (0..10).map(|_| {
  18. let shared_data = shared_data.clone();
  19. thread::spawn(move || {
  20. let mut shared_data = shared_data.lock().unwrap();
  21. shared_data.counter += 1;
  22. })
  23. }).collect();
  24. // Wait for each thread to complete
  25. for h in handles {
  26. h.join();
  27. }
  28. // Print the data
  29. let shared_data = shared_data.lock().unwrap();
  30. println!("Total = {}", shared_data.counter);
  31. }

So the basic strategy will be this:

  1. Every thread will get it’s own atomic reference to the mutex.
  2. Each thread that wishes to access the shared must obtain a lock on the mutex.
  3. Once the lock is released, the next waiting thread can obtain access.
  4. The compiler will enforce this and generate errors if ANYTHING is wrong.

Read Write Lock

A read write lock works much like a mutex - we wrap the shared data in a RwLock, and then in an Arc.

  1. let shared_data = Arc::new(RwLock::new(MySharedData::new()));

Each thread will then either need to obtain a read lock or a write lock on the shared data.

  1. let shared_data = shared_data.read().unwrap();
  2. // OR
  3. let mut shared_data = shared_data.write().unwrap();

The advantage of a RwLock is that many threads can concurrently read the data, providing nothing is writing to it. This may be more efficient in many cases.

Sending data between threads using channels

TODO mpsc channel

Thread local storage

As with C++ you may have reason to use thread local storage

  1. thread_local! {
  2. // TODO
  3. }

Useful crates

Rayon

The rayon crate implements parallel iterators that allow your collections to be iterated in parallel. The crate
utilises work stealing and divide and conquer algorithms couple to a thread pool to process collections more quickly
than they could be in a sequential fashion.

Generally speaking this is a drop-in replacement with the exception that you call par_iter instead of iter. The crate
implements a ParallelIterator trait on collection classes.

  1. use rayon::prelude::*;
  2. fn sum_of_squares(input: &[i32]) -> i32 {
  3. input.par_iter()
  4. .map(|&i| i * i)
  5. .sum()
  6. }

See the crate site for more information.