Speed

Speed

CPython, the most commonly used implementation of Python, is slow for CPU boundtasks. PyPy is fast.

Using a slightly modified version of David Beazley’s CPU bound test code(added loop for multiple tests), you can see the difference between CPythonand PyPy’s processing.

# PyPy
$ ./pypy -V
Python 2.7.1 (7773f8fc4223, Nov 18 2011, 18:47:10)
[PyPy 1.7.0 with GCC 4.4.3]
$ ./pypy measure2.py
0.0683999061584
0.0483210086823
0.0388588905334
0.0440690517426
0.0695300102234

# CPython
$ ./python -V
Python 2.7.1
$ ./python measure2.py
1.06774401665
1.45412397385
1.51485204697
1.54693889618
1.60109114647

Context

The GIL

The GIL (Global Interpreter Lock) is how Python allows multiple threads tooperate at the same time. Python’s memory management isn’t entirely thread-safe,so the GIL is required to prevent multiple threads from running the samePython code at once.

David Beazley has a great guide on how the GIL operates. He also covers thenew GIL in Python 3.2. His results show that maximizing performance in aPython application requires a strong understanding of the GIL, how it affectsyour specific application, how many cores you have, and where your applicationbottlenecks are.

C Extensions

The GIL

Special care must be taken when writing C extensions to make sure youregister your threads with the interpreter.

C Extensions

Cython

Cython implements a superset of the Python languagewith which you are able to write C and C++ modules for Python. Cython alsoallows you to call functions from compiled C libraries. Using Cython allowsyou to take advantage of Python’s strong typing of variables and operations.

Here’s an example of strong typing with Cython:

def primes(int kmax):
"""Calculation of prime numbers with additional
Cython keywords"""
 
    cdef int n, k, i
    cdef int p[1000]
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result

This implementation of an algorithm to find prime numbers has some additionalkeywords compared to the next one, which is implemented in pure Python:

def primes(kmax):
"""Calculation of prime numbers in standard Python syntax"""
 
    p = range(1000)
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result

Notice that in the Cython version you declare integers and integer arraysto be compiled into C types while also creating a Python list:

def primes(int kmax):
    """Calculation of prime numbers with additional
    Cython keywords"""
 
    cdef int n, k, i
    cdef int p[1000]
    result = []

def primes(kmax):
    """Calculation of prime numbers in standard Python syntax"""
 
    p = range(1000)
    result = []

What is the difference? In the upper Cython version you can see thedeclaration of the variable types and the integer array in a similar way asin standard C. For example cdef int n,k,i in line 3. This additional typedeclaration (i.e. integer) allows the Cython compiler to generate moreefficient C code from the second version. While standard Python code is savedin .py files, Cython code is saved in .pyx files.

What’s the difference in speed? Let’s try it!

import time
#activate pyx compiler
import pyximport
pyximport.install()
#primes implemented with Cython
import primesCy
#primes implemented with Python
import primes
 
print "Cython:"
t1= time.time()
print primesCy.primes(500)
t2= time.time()
print "Cython time: %s" %(t2-t1)
print ""
print "Python"
t1= time.time()
print primes.primes(500)
t2= time.time()
print "Python time: %s" %(t2-t1)

These lines both need a remark:

import pyximport
pyximport.install()

The pyximport module allows you to import .pyx files (e.g.,primesCy.pyx) with the Cython-compiled version of the primes_function. The _pyximport.install() command allows the Python interpreter tostart the Cython compiler directly to generate C code, which is automaticallycompiled to a .so C library. Cython is then able to import thislibrary for you in your Python code, easily and efficiently. With thetime.time() function you are able to compare the time between these 2different calls to find 500 prime numbers. On a standard notebook (dual coreAMD E-450 1.6 GHz), the measured values are:

Cython time: 0.0054 seconds
 
Python time: 0.0566 seconds

And here is the output of an embedded ARM beaglebone machine:

Cython time: 0.0196 seconds
 
Python time: 0.3302 seconds

Pyrex

Shedskin?

Concurrency

Concurrent.futures

The concurrent.futures module is a module in the standard library thatprovides a “high-level interface for asynchronously executing callables”. Itabstracts away a lot of the more complicated details about using multiplethreads or processes for concurrency, and allows the user to focus onaccomplishing the task at hand.

The concurrent.futures module exposes two main classes, theThreadPoolExecutor and the ProcessPoolExecutor. The ThreadPoolExecutorwill create a pool of worker threads that a user can submit jobs to. These jobswill then be executed in another thread when the next worker thread becomesavailable.

The ProcessPoolExecutor works in the same way, except instead of using multiplethreads for its workers, it will use multiple processes. This makes it possibleto side-step the GIL; however, because of the way things are passed to workerprocesses, only picklable objects can be executed and returned.

Because of the way the GIL works, a good rule of thumb is to use aThreadPoolExecutor when the task being executed involves a lot of blocking(i.e. making requests over the network) and to use a ProcessPoolExecutorexecutor when the task is computationally expensive.

There are two main ways of executing things in parallel using the twoExecutors. One way is with the map(func, iterables) method. This worksalmost exactly like the builtin map() function, except it will executeeverything in parallel.

from concurrent.futures import ThreadPoolExecutor
import requests
 
def get_webpage(url):
    page = requests.get(url)
    return page
 
pool = ThreadPoolExecutor(max_workers=5)
 
my_urls = ['http://google.com/']*10  # Create a list of urls
 
for page in pool.map(get_webpage, my_urls):
    # Do something with the result
    print(page.text)

For even more control, the submit(func, args, *kwargs) method will schedulea callable to be executed ( as func(args, *kwargs)) and returns a Futureobject that represents the execution of the callable.

The Future object provides various methods that can be used to check on theprogress of the scheduled callable. These include:

cancel(): Attempt to cancel the call.
cancelled(): Return True if the call was successfully cancelled.
running(): Return True if the call is currently being executed and cannot becancelled.
done(): Return True if the call was successfully cancelled or finished running.
result(): Return the value returned by the call. Note that this call will block untilthe scheduled callable returns by default.
exception(): Return the exception raised by the call. If no exception was raised thenthis returns None. Note that this will block just like result().
adddone_callback(fn): Attach a callback function that will be executed (as _fn(future)) when thescheduled callable returns.

from concurrent.futures import ProcessPoolExecutor, as_completed
 
def is_prime(n):
    if n % 2 == 0:
        return n, False
 
    sqrt_n = int(n**0.5)
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return n, False
    return n, True
 
PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]
 
futures = []
with ProcessPoolExecutor(max_workers=4) as pool:
    # Schedule the ProcessPoolExecutor to check if a number is prime
    # and add the returned Future to our list of futures
    for p in PRIMES:
        fut = pool.submit(is_prime, p)
        futures.append(fut)
 
# As the jobs are completed, print out the results
for number, result in as_completed(futures):
    if result:
        print("{} is prime".format(number))
    else:
        print("{} is not prime".format(number))

The concurrent.futures module contains two helper functions for working withFutures. The as_completed(futures) function returns an iterator over the listof futures, yielding the futures as they complete.

The wait(futures) function will simply block until all futures in the list offutures provided have completed.

For more information, on using the concurrent.futures module, consult theofficial documentation.

threading

The standard library comes with a threading module that allows a user towork with multiple threads manually.

Running a function in another thread is as simple as passing a callable andits arguments to Thread’s constructor and then calling start():

from threading import Thread
import requests
 
def get_webpage(url):
    page = requests.get(url)
    return page
 
some_thread = Thread(get_webpage, 'http://google.com/')
some_thread.start()

To wait until the thread has terminated, call join():

some_thread.join()

After calling join(), it is always a good idea to check whether the thread isstill alive (because the join call timed out):

if some_thread.is_alive():
    print("join() must have timed out.")
else:
    print("Our thread has terminated.")

Because multiple threads have access to the same section of memory, sometimesthere might be situations where two or more threads are trying to write to thesame resource at the same time or where the output is dependent on the sequenceor timing of certain events. This is called a data race or race condition.When this happens, the output will be garbled or you may encounter problemswhich are difficult to debug. A good example is this Stack Overflow post_.

The way this can be avoided is by using a Lock that each thread needs toacquire before writing to a shared resource. Locks can be acquired and releasedthrough either the contextmanager protocol (with statement), or by usingacquire() and release() directly. Here is a (rather contrived) example:

from threading import Lock, Thread
 
file_lock = Lock()
 
def log(msg):
    with file_lock:
        open('website_changes.log', 'w') as f:
            f.write(changes)
 
def monitor_website(some_website):
    """
    Monitor a website and then if there are any changes,
    log them to disk.
    """
    while True:
        changes = check_for_changes(some_website)
        if changes:
            log(changes)
 
websites = ['http://google.com/', ... ]
for website in websites:
    t = Thread(monitor_website, website)
    t.start()

Here, we have a bunch of threads checking for changes on a list of sites andwhenever there are any changes, they attempt to write those changes to a fileby calling log(changes). When log() is called, it will wait to acquirethe lock with with file_lock:. This ensures that at any one time, only onethread is writing to the file.

Spawning Processes

Multiprocessing

原文: https://docs.python-guide.org/scenarios/speed/