Request Content Checksums

Various pieces of code can consume the request data and preprocess it. For instance, JSON data ends up on the request object already read and parsed, and form data ends up there as well but goes through a different code path. This is inconvenient when you want to calculate a checksum of the incoming request data, which some APIs require.

Fortunately, this is easy to work around by wrapping the input stream.

The following example calculates the SHA1 checksum of the incoming data as it gets read. The wrapping stream replaces wsgi.input in the WSGI environment, and the hash object is returned to the caller:

    import hashlib


    class ChecksumCalcStream:
        """Wrap a WSGI input stream and hash everything read from it."""

        def __init__(self, stream):
            self._stream = stream
            self._hash = hashlib.sha1()

        def read(self, size=-1):
            rv = self._stream.read(size)
            self._hash.update(rv)
            return rv

        def readline(self, size_hint=-1):
            rv = self._stream.readline(size_hint)
            self._hash.update(rv)
            return rv


    def generate_checksum(request):
        # Swap the wrapper in so that later reads update the hash.
        env = request.environ
        stream = ChecksumCalcStream(env['wsgi.input'])
        env['wsgi.input'] = stream
        return stream._hash
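
As a quick sanity check, the idea can be tried without a web server. The sketch below is only an illustration: HashingStream is a hypothetical stand-in for the wrapper above, and io.BytesIO plays the role of wsgi.input. It shows that mixing read and readline calls still produces the same digest as hashing the whole body at once.

```python
import hashlib
import io


class HashingStream:
    """Hypothetical stand-in for the checksum wrapper described above."""

    def __init__(self, stream):
        self._stream = stream
        self.hash = hashlib.sha1()

    def read(self, size=-1):
        data = self._stream.read(size)
        self.hash.update(data)
        return data

    def readline(self, size=-1):
        line = self._stream.readline(size)
        self.hash.update(line)
        return line


body = b"first line\nsecond line\nrest of the payload"
stream = HashingStream(io.BytesIO(body))  # BytesIO stands in for wsgi.input

# Mix the two read styles, the way different parsers might.
first = stream.readline()  # reads up to and including the first newline
chunk = stream.read(6)     # reads the next six bytes
rest = stream.read()       # reads everything that is left

streamed_digest = stream.hash.hexdigest()
direct_digest = hashlib.sha1(body).hexdigest()
```

Because the hash is updated with exactly the bytes handed to the caller, the two digests agree once the stream has been read to the end.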

To use this wrapper, hook the calculating stream in before the request starts consuming data. For example, be careful about accessing request.form or anything of that nature before that point; before_request handlers in particular should not touch it.
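
The ordering requirement can be illustrated with a small sketch. It assumes nothing about Flask itself: io.BytesIO stands in for wsgi.input, and the early read imitates a hook that touches the body before hashing starts. Any bytes consumed before that point never reach the hash.

```python
import hashlib
import io

body = b"name=alice&token=12345"
raw = io.BytesIO(body)  # stand-in for environ['wsgi.input']

# Something (say, an early hook) consumes part of the body first.
consumed_early = raw.read(11)

# Only now do we start hashing what is read from the stream.
late_hash = hashlib.sha1()
late_hash.update(raw.read())

full_digest = hashlib.sha1(body).hexdigest()
late_digest = late_hash.hexdigest()

# The late hash only covers the tail of the body, so this is False.
covers_whole_body = late_digest == full_digest
```

The same applies in reverse: if the stream is never read to the end, the digest only covers the part that was actually consumed.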

Example usage:

    from flask import Flask, request

    app = Flask(__name__)


    @app.route('/special-api', methods=['POST'])
    def special_api():
        hasher = generate_checksum(request)
        # Accessing this parses the input stream
        files = request.files
        # At this point the hash is fully constructed.
        checksum = hasher.hexdigest()
        return f"Hash was: {checksum}"
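
An API that requires a checksum will usually want to compare it against a value supplied by the client. The following is a minimal sketch under that assumption; checksum_matches is a hypothetical helper, and where the expected value comes from (for example a request header of your own choosing) is left to the application.

```python
import hashlib
import hmac


def checksum_matches(hasher, expected_hex):
    """Return True if the finished hash equals the expected hex digest."""
    actual = hasher.hexdigest()
    # Normalize case and whitespace, then compare in constant time.
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Stand-in values; in a view the hasher would come from the wrapped stream.
hasher = hashlib.sha1(b"payload bytes")
expected = hashlib.sha1(b"payload bytes").hexdigest()

ok = checksum_matches(hasher, expected.upper())  # same digest, different case
bad = checksum_matches(hasher, "0" * 40)         # a digest that does not match
```

Run the comparison only after the body has been fully parsed, since the hash is incomplete until then.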