gridfs – Tools for working with GridFS

GridFS is a specification for storing large objects in Mongo.

The gridfs package is an implementation of GridFS on top of pymongo, exposing a file-like interface.

See also

The MongoDB documentation on gridfs.

  • class gridfs.GridFS(database, collection='fs', disable_md5=False)
  • Create a new instance of GridFS.

Raises TypeError if database is not an instance of Database.

Parameters:

  • database: database to use
  • collection (optional): root collection to use
  • disable_md5 (optional): When True, MD5 checksums will not be computed for uploaded files. Useful in environments where MD5 cannot be used for regulatory or other reasons. Defaults to False.

Changed in version 3.1: Indexes are only ensured on the first write to the DB.

Changed in version 3.0: database must use an acknowledged write_concern.

See also

The MongoDB documentation on gridfs.

  • delete(file_id, session=None)
  • Delete a file from GridFS by "_id".

Deletes all data belonging to the file with "_id": file_id.

Warning

Any processes/threads reading from the file while this method is executing will likely see an invalid/corrupt file. Care should be taken to avoid concurrent reads to a file while it is being deleted.

Note

Deletes of non-existent files are considered successful since the end result is the same: no file with that _id remains.

Parameters:

  • file_id: "_id" of the file to delete
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

Changed in version 3.1: delete no longer ensures indexes.

  • exists(document_or_id=None, session=None, **kwargs)
  • Check if a file exists in this instance of GridFS.

The file to check for can be specified by the value of its _id key, or by passing in a query document. A query document can be passed in as a dictionary, or by using keyword arguments. Thus, the following three calls are equivalent:

    >>> fs.exists(file_id)
    >>> fs.exists({"_id": file_id})
    >>> fs.exists(_id=file_id)

As are the following two calls:

    >>> fs.exists({"filename": "mike.txt"})
    >>> fs.exists(filename="mike.txt")

And the following two:

    >>> fs.exists({"foo": {"$gt": 12}})
    >>> fs.exists(foo={"$gt": 12})

Returns True if a matching file exists, False otherwise. Calls to exists() will not automatically create appropriate indexes; application developers should be sure to create indexes if needed and as appropriate.

Parameters:

  • document_or_id (optional): query document, or _id of the document to check for
  • session (optional): a ClientSession
  • **kwargs (optional): keyword arguments are used as a query document, if they are present.

Changed in version 3.6: Added session parameter.
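
The equivalent calling forms listed for exists() can be understood as different spellings of one query document. The sketch below is a simplified model of that idea, not the library's code: build_exists_query is a hypothetical helper, and it assumes that keyword arguments, when present, take the place of document_or_id.

```python
from collections.abc import Mapping


def build_exists_query(document_or_id=None, **kwargs):
    """Hypothetical helper: model the query document that exists() matches on."""
    if kwargs:
        # Keyword arguments are used as the query document when present.
        return dict(kwargs)
    if isinstance(document_or_id, Mapping):
        # A dictionary is already a query document.
        return dict(document_or_id)
    if document_or_id is None:
        # No arguments: in this model, an empty query document.
        return {}
    # Any other value is treated as the _id of the file.
    return {"_id": document_or_id}


file_id = 42  # stand-in for a real _id such as an ObjectId
queries = [
    build_exists_query(file_id),
    build_exists_query({"_id": file_id}),
    build_exists_query(_id=file_id),
]
print(queries)
```

All three entries in queries are the same dictionary, which is why the three calls return the same answer.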

  • find(*args, **kwargs)
  • Query GridFS for files.

Returns a cursor that iterates across files matching arbitrary queries on the files collection. Can be combined with other modifiers for additional control. For example:

    for grid_out in fs.find({"filename": "lisa.txt"},
                            no_cursor_timeout=True):
        data = grid_out.read()

would iterate through all versions of “lisa.txt” stored in GridFS. Note that setting no_cursor_timeout to True may be important to prevent the cursor from timing out during long multi-file processing work.

As another example, the call:

    most_recent_three = fs.find().sort("uploadDate", -1).limit(3)

would return a cursor to the three most recently uploaded files in GridFS.

Follows a similar interface to find() in Collection.

If a ClientSession is passed to find(), all returned GridOut instances are associated with that session.

Parameters:

  • filter (optional): a SON object specifying elements which must be present for a document to be included in the result set
  • skip (optional): the number of files to omit (from the start of the result set) when returning the results
  • limit (optional): the maximum number of results to return
  • no_cursor_timeout (optional): if False (the default), any returned cursor is closed by the server after 10 minutes of inactivity. If set to True, the returned cursor will never time out on the server. Care should be taken to ensure that cursors with no_cursor_timeout turned on are properly closed.
  • sort (optional): a list of (key, direction) pairs specifying the sort order for this query. See sort() for details.

Raises TypeError if any of the arguments are of improper type. Returns an instance of GridOutCursor corresponding to this query.

Changed in version 3.0: Removed the read_preference, tag_sets, and secondary_acceptable_latency_ms options.

New in version 2.7.
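
Since a cursor opened with no_cursor_timeout=True is not closed by the server on inactivity, one way to make sure it is released is to wrap the iteration in try/finally. This is a minimal sketch: read_all_versions is a hypothetical helper, fs is assumed to be a GridFS instance, and the cursor is assumed to expose close() as pymongo cursors do.

```python
def read_all_versions(fs, filename):
    """Hypothetical helper: return the contents of every stored version of filename."""
    cursor = fs.find({"filename": filename}, no_cursor_timeout=True)
    try:
        # Each item yielded by the cursor is a GridOut; read() returns its bytes.
        return [grid_out.read() for grid_out in cursor]
    finally:
        # Always release the server-side cursor, even if a read fails.
        cursor.close()
```

For many or large files, processing each GridOut inside the loop instead of collecting everything in a list would keep memory use lower.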

See also

The MongoDB documentation on find.

  • find_one(filter=None, session=None, *args, **kwargs)
  • Get a single file from gridfs.

All arguments to find() are also valid arguments for find_one(), although any limit argument will be ignored. Returns a single GridOut, or None if no matching file is found. For example:

    file = fs.find_one({"filename": "lisa.txt"})

Parameters:

  • filter (optional): a dictionary specifying the query to be performed OR any other type to be used as the value for a query for "_id" in the file collection.
  • *args (optional): any additional positional arguments are the same as the arguments to [<code>find()</code>](https://api.mongodb.com/python/current/api/gridfs/#gridfs.GridFS.find).
  • session (optional): a ClientSession
  • **kwargs (optional): any additional keyword arguments are the same as the arguments to [<code>find()</code>](https://api.mongodb.com/python/current/api/gridfs/#gridfs.GridFS.find).

Changed in version 3.6: Added session parameter.

  • get(file_id, session=None)
  • Get a file from GridFS by "_id".

Returns an instance of GridOut, which provides a file-like interface for reading.

Parameters:

  • file_id: "_id" of the file to get
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

  • get_last_version(filename=None, session=None, **kwargs)
  • Get the most recent version of a file in GridFS by "filename" or metadata fields.

Equivalent to calling get_version() with the default version (-1).

Parameters:

  • filename: "filename" of the file to get, or None
  • session (optional): a ClientSession
  • **kwargs (optional): find files by custom metadata.

Changed in version 3.6: Added session parameter.

  • get_version(filename=None, version=-1, session=None, **kwargs)
  • Get a file from GridFS by "filename" or metadata fields.

Returns a version of the file in GridFS whose filename matches filename and whose metadata fields match the supplied keyword arguments, as an instance of GridOut.

Version numbering is a convenience atop the GridFS API provided by MongoDB. If more than one file matches the query (either by filename alone, by metadata fields, or by a combination of both), then version -1 will be the most recently uploaded matching file, -2 the second most recently uploaded, etc. Version 0 will be the first version uploaded, 1 the second version, etc. So if three versions have been uploaded, then version 0 is the same as version -3, version 1 is the same as version -2, and version 2 is the same as version -1.

Raises NoFile if no such version of that file exists.

Parameters:

  • filename: "filename" of the file to get, or None
  • version (optional): version of the file to get (defaults to -1, the most recent version uploaded)
  • session (optional): a ClientSession
  • **kwargs (optional): find files by custom metadata.

Changed in version 3.6: Added session parameter.

Changed in version 3.1: get_version no longer ensures indexes.
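
The version numbering described for get_version() behaves like indexing into a list of matching files ordered from oldest to newest upload. The following sketch illustrates only that numbering: resolve_version is a hypothetical helper, and LookupError stands in for the NoFile error raised by the real method.

```python
def resolve_version(uploads, version=-1):
    """Hypothetical helper: pick one entry from uploads, ordered oldest first."""
    try:
        # Non-negative versions count from the first upload;
        # negative versions count back from the most recent one.
        return uploads[version]
    except IndexError:
        # Stand-in for the NoFile error raised by get_version().
        raise LookupError("no version %d of that file" % version)


uploads = ["first", "second", "third"]  # three versions, oldest first
print(resolve_version(uploads, 0), resolve_version(uploads, -3))
print(resolve_version(uploads, 2), resolve_version(uploads))
```

With three uploads, versions 0 and -3 select the same entry, as do 2 and the default -1.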

  • list(session=None)
  • List the names of all files stored in this instance of GridFS.

Parameters:

  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

Changed in version 3.1: list no longer ensures indexes.

  • new_file(**kwargs)
  • Create a new file in GridFS.

Returns a new GridIn instance to which data can be written. Any keyword arguments will be passed through to GridIn().

If the "_id" of the file is manually specified, it must not already exist in GridFS. Otherwise FileExists is raised.

Parameters:

  • **kwargs (optional): keyword arguments for file creation

  • put(data, **kwargs)
  • Put data in GridFS as a new file.

Equivalent to doing:

    f = new_file(**kwargs)
    try:
        f.write(data)
    finally:
        f.close()

data can be either an instance of str (bytes in Python 3) or a file-like object providing a read() method. If an encoding keyword argument is passed, data can also be a unicode (str in Python 3) instance, which will be encoded as encoding before being written. Any keyword arguments will be passed through to the created file; see GridIn() for possible arguments. Returns the "_id" of the created file.

If the "_id" of the file is manually specified, it must not already exist in GridFS. Otherwise FileExists is raised.

Parameters:

  • data: data to be written as a file.
  • **kwargs (optional): keyword arguments for file creation

Changed in version 3.0: w=0 writes to GridFS are now prohibited.
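
The rules for what put() accepts as data can be summarized in a few lines. The sketch below is a simplified Python 3 model of those rules and not the library's implementation; to_bytes is a hypothetical helper, and the error messages are illustrative.

```python
import io


def to_bytes(data, encoding=None):
    """Hypothetical helper: model the kinds of data put() accepts on Python 3."""
    if hasattr(data, "read"):
        # File-like objects are consumed through their read() method.
        data = data.read()
    if isinstance(data, str):
        if encoding is None:
            # Text cannot be stored without knowing how to encode it.
            raise TypeError("an encoding is required to write str data")
        return data.encode(encoding)
    if isinstance(data, (bytes, bytearray)):
        return bytes(data)
    raise TypeError("data must be bytes, str with an encoding, or file-like")


print(to_bytes(b"raw bytes"))
print(to_bytes("caf\u00e9", encoding="utf-8"))
print(to_bytes(io.BytesIO(b"from a file-like object")))
```

In practice this means passing bytes, or passing text together with an encoding keyword argument.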

  • class gridfs.GridFSBucket(db, bucket_name='fs', chunk_size_bytes=261120, write_concern=None, read_preference=None, disable_md5=False)
  • Create a new instance of GridFSBucket.

Raises TypeError if database is not an instance of Database.

Raises ConfigurationError if write_concern is not acknowledged.

Parameters:

  • database: database to use.
  • bucket_name (optional): The name of the bucket. Defaults to ‘fs’.
  • chunk_size_bytes (optional): The chunk size in bytes. Defaults to 255 KB (261120 bytes).
  • write_concern (optional): The WriteConcern to use. If None (the default) db.write_concern is used.
  • read_preference (optional): The read preference to use. If None (the default) db.read_preference is used.
  • disable_md5 (optional): When True, MD5 checksums will not be computed for uploaded files. Useful in environments where MD5 cannot be used for regulatory or other reasons. Defaults to False.

New in version 3.1.
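
To get a feel for chunk_size_bytes, the number of chunks a file occupies can be estimated with a ceiling division. This is a minimal sketch that assumes a file is split into fixed-size chunks with a possibly shorter final chunk; chunk_count and DEFAULT_CHUNK_SIZE_BYTES are hypothetical names introduced for illustration.

```python
DEFAULT_CHUNK_SIZE_BYTES = 255 * 1024  # 261120, the default in the signature


def chunk_count(length, chunk_size_bytes=DEFAULT_CHUNK_SIZE_BYTES):
    """Number of chunks needed to hold length bytes (ceiling division)."""
    if length < 0 or chunk_size_bytes <= 0:
        raise ValueError("length must be >= 0 and chunk_size_bytes > 0")
    return -(-length // chunk_size_bytes)


payload = b"data I want to store!"  # 21 bytes
print(chunk_count(len(payload), chunk_size_bytes=4))
print(chunk_count(10 * 1024 * 1024))  # a 10 MiB file at the default size
```

A very small chunk size, such as the 4 bytes used in some examples in this reference, is useful for demonstration but multiplies the number of chunk documents.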

See also

The MongoDB documentation on gridfs.

  • delete(file_id, session=None)
  • Given a file_id, delete this stored file’s files collection document and associated chunks from a GridFS bucket.

For example:

    from pymongo import MongoClient
    from gridfs import GridFSBucket

    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    # Get _id of file to delete
    file_id = fs.upload_from_stream("test_file", b"data I want to store!")
    fs.delete(file_id)

Raises NoFile if no file with file_id exists.

Parameters:

  • file_id: The _id of the file to be deleted.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

  • download_to_stream(file_id, destination, session=None)
  • Downloads the contents of the stored file specified by file_id and writes the contents to destination.

For example:

    from pymongo import MongoClient
    from gridfs import GridFSBucket

    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    # Get _id of file to read
    file_id = fs.upload_from_stream("test_file", b"data I want to store!")
    # Get file to write to
    file = open('myfile', 'wb+')
    fs.download_to_stream(file_id, file)
    file.seek(0)
    contents = file.read()

Raises NoFile if no file with file_id exists.

Parameters:

  • file_id: The _id of the file to be downloaded.
  • destination: a file-like object implementing write().
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.
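
Because destination only needs to implement write(), it does not have to be a file on disk. The sketch below shows a hypothetical HashingWriter that computes a SHA-256 digest of whatever is written to it; the class name and the choice of SHA-256 are assumptions for illustration, and the download itself is simulated with two write() calls so the block runs without a server.

```python
import hashlib


class HashingWriter:
    """Hypothetical destination: hashes downloaded bytes instead of storing them."""

    def __init__(self):
        self._digest = hashlib.sha256()
        self.bytes_written = 0

    def write(self, data):
        # Called with successive pieces of the file's contents.
        self._digest.update(data)
        self.bytes_written += len(data)
        return len(data)

    def hexdigest(self):
        return self._digest.hexdigest()


writer = HashingWriter()
# With a real bucket this would be: fs.download_to_stream(file_id, writer)
for piece in (b"data I want ", b"to store!"):
    writer.write(piece)
print(writer.bytes_written, writer.hexdigest())
```

A destination like this could serve as an application-level integrity check, for example when disable_md5 is set on the bucket.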

  • download_to_stream_by_name(filename, destination, revision=-1, session=None)
  • Write the contents of filename (with optional revision) to destination.

For example:

    from pymongo import MongoClient
    from gridfs import GridFSBucket

    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    # Get file to write to
    file = open('myfile', 'wb')
    fs.download_to_stream_by_name("test_file", file)

Raises NoFile if no such version of that file exists.

Raises ValueError if filename is not a string.

Parameters:

  • filename: The name of the file to read from.
  • destination: A file-like object that implements write().
  • revision (optional): Which revision (documents with the same filename and different uploadDate) of the file to retrieve. Defaults to -1 (the most recent revision).
  • session (optional): a ClientSession

Note

Revision numbers are defined as follows:

  • 0 = the original stored file
  • 1 = the first revision
  • 2 = the second revision
  • etc
  • -2 = the second most recent revision
  • -1 = the most recent revision

Changed in version 3.6: Added session parameter.

  • find(*args, **kwargs)
  • Find and return the files collection documents that match filter.

Returns a cursor that iterates across files matching arbitrary queries on the files collection. Can be combined with other modifiers for additional control.

For example:

    for grid_data in fs.find({"filename": "lisa.txt"},
                             no_cursor_timeout=True):
        data = grid_data.read()

would iterate through all versions of “lisa.txt” stored in GridFS. Note that setting no_cursor_timeout to True may be important to prevent the cursor from timing out during long multi-file processing work.

As another example, the call:

    most_recent_three = fs.find().sort("uploadDate", -1).limit(3)

would return a cursor to the three most recently uploaded files in GridFS.

Follows a similar interface to find() in Collection.

If a ClientSession is passed to find(), all returned GridOut instances are associated with that session.

Parameters:

  • filter: Search query.
  • batch_size (optional): The number of documents to return per batch.
  • limit (optional): The maximum number of documents to return.
  • no_cursor_timeout (optional): The server normally times out idle cursors after an inactivity period (10 minutes) to prevent excess memory use. Set this option to True to prevent that.
  • skip (optional): The number of documents to skip before returning.
  • sort (optional): The order by which to sort results. Defaults to None.

  • open_download_stream(file_id, session=None)
  • Opens a Stream from which the application can read the contents of the stored file specified by file_id.

For example:

    from pymongo import MongoClient
    from gridfs import GridFSBucket

    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    # Get _id of file to read.
    file_id = fs.upload_from_stream("test_file", b"data I want to store!")
    grid_out = fs.open_download_stream(file_id)
    contents = grid_out.read()

Returns an instance of GridOut.

Raises NoFile if no file with file_id exists.

Parameters:

  • file_id: The _id of the file to be downloaded.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

  • open_download_stream_by_name(filename, revision=-1, session=None)
  • Opens a Stream from which the application can read the contents of filename and optional revision.

For example:

    from pymongo import MongoClient
    from gridfs import GridFSBucket

    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    grid_out = fs.open_download_stream_by_name("test_file")
    contents = grid_out.read()

Returns an instance of GridOut.

Raises NoFile if no such version of that file exists.

Raises ValueError if filename is not a string.

Parameters:

  • filename: The name of the file to read from.
  • revision (optional): Which revision (documents with the same filename and different uploadDate) of the file to retrieve. Defaults to -1 (the most recent revision).
  • session (optional): a ClientSession

Note

Revision numbers are defined as follows:

  • 0 = the original stored file
  • 1 = the first revision
  • 2 = the second revision
  • etc
  • -2 = the second most recent revision
  • -1 = the most recent revision

Changed in version 3.6: Added session parameter.

  • open_upload_stream(filename, chunk_size_bytes=None, metadata=None, session=None)
  • Opens a Stream that the application can write the contents of the file to.

The user must specify the filename, and can choose to add any additional information in the metadata field of the file document or modify the chunk size.

For example:

    from pymongo import MongoClient
    from gridfs import GridFSBucket

    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    grid_in = fs.open_upload_stream(
        "test_file", chunk_size_bytes=4,
        metadata={"contentType": "text/plain"})
    grid_in.write(b"data I want to store!")
    grid_in.close()  # uploaded on close

Returns an instance of GridIn.

Raises NoFile if no such version of that file exists.

Raises ValueError if filename is not a string.

Parameters:

  • filename: The name of the file to upload.
  • chunk_size_bytes (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes in [<code>GridFSBucket</code>](https://api.mongodb.com/python/current/api/gridfs/#gridfs.GridFSBucket).
  • metadata (optional): User data for the metadata field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.
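
When the data arrives in pieces, the stream returned by open_upload_stream() can be written to repeatedly before it is closed. The following is a sketch under stated assumptions rather than a recommended implementation: upload_in_pieces is a hypothetical helper, bucket is assumed to be a GridFSBucket, and the returned GridIn is assumed to provide abort() for discarding a partial upload and an _id attribute.

```python
def upload_in_pieces(bucket, filename, pieces, metadata=None):
    """Hypothetical helper: stream an iterable of bytes into one GridFS file."""
    grid_in = bucket.open_upload_stream(filename, metadata=metadata)
    try:
        for piece in pieces:
            grid_in.write(piece)
    except Exception:
        # Assumption: abort() discards whatever was uploaded so far.
        grid_in.abort()
        raise
    else:
        # The upload is completed when the stream is closed.
        grid_in.close()
    return grid_in._id
```

Closing only on success avoids finalizing a file that contains part of the intended data.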

  • open_upload_stream_with_id(file_id, filename, chunk_size_bytes=None, metadata=None, session=None)
  • Opens a Stream that the application can write the contents of the file to.

The user must specify the file id and filename, and can choose to add any additional information in the metadata field of the file document or modify the chunk size.

For example:

    from bson import ObjectId
    from pymongo import MongoClient
    from gridfs import GridFSBucket

    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    grid_in = fs.open_upload_stream_with_id(
        ObjectId(),
        "test_file",
        chunk_size_bytes=4,
        metadata={"contentType": "text/plain"})
    grid_in.write(b"data I want to store!")
    grid_in.close()  # uploaded on close

Returns an instance of GridIn.

Raises NoFile if no such version of that file exists.

Raises ValueError if filename is not a string.

Parameters:

  • file_id: The id to use for this file. The id must not have already been used for another file.
  • filename: The name of the file to upload.
  • chunk_size_bytes (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes in [<code>GridFSBucket</code>](https://api.mongodb.com/python/current/api/gridfs/#gridfs.GridFSBucket).
  • metadata (optional): User data for the metadata field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

  • rename(file_id, new_filename, session=None)
  • Renames the stored file with the specified file_id.

For example:

    from pymongo import MongoClient
    from gridfs import GridFSBucket

    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    # Get _id of file to rename
    file_id = fs.upload_from_stream("test_file", b"data I want to store!")
    fs.rename(file_id, "new_test_name")

Raises NoFile if no file with file_id exists.

Parameters:

  • file_id: The _id of the file to be renamed.
  • new_filename: The new name of the file.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

  • upload_from_stream(filename, source, chunk_size_bytes=None, metadata=None, session=None)
  • Uploads a user file to a GridFS bucket.

Reads the contents of the user file from source and uploads it to the file filename. Source can be a string (bytes in Python 3) or file-like object.

For example:

    from pymongo import MongoClient
    from gridfs import GridFSBucket

    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    file_id = fs.upload_from_stream(
        "test_file",
        b"data I want to store!",
        chunk_size_bytes=4,
        metadata={"contentType": "text/plain"})

Returns the _id of the uploaded file.

Raises NoFile if no such version of that file exists.

Raises ValueError if filename is not a string.

Parameters:

  • filename: The name of the file to upload.
  • source: The source stream of the content to be uploaded. Must be a file-like object that implements read() or a string.
  • chunk_size_bytes (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes of [<code>GridFSBucket</code>](https://api.mongodb.com/python/current/api/gridfs/#gridfs.GridFSBucket).
  • metadata (optional): User data for the metadata field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

  • upload_from_stream_with_id(file_id, filename, source, chunk_size_bytes=None, metadata=None, session=None)
  • Uploads a user file to a GridFS bucket with a custom file id.

Reads the contents of the user file from source and uploads it to the file filename. Source can be a string (bytes in Python 3) or file-like object.

For example:

    from bson import ObjectId
    from pymongo import MongoClient
    from gridfs import GridFSBucket

    my_db = MongoClient().test
    fs = GridFSBucket(my_db)
    # Choose the _id up front, then upload under it
    file_id = ObjectId()
    fs.upload_from_stream_with_id(
        file_id,
        "test_file",
        b"data I want to store!",
        chunk_size_bytes=4,
        metadata={"contentType": "text/plain"})

Raises NoFile if no such version of that file exists.

Raises ValueError if filename is not a string.

Parameters:

  • file_id: The id to use for this file. The id must not have already been used for another file.
  • filename: The name of the file to upload.
  • source: The source stream of the content to be uploaded. Must be a file-like object that implements read() or a string.
  • chunk_size_bytes (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes of [<code>GridFSBucket</code>](https://api.mongodb.com/python/current/api/gridfs/#gridfs.GridFSBucket).
  • metadata (optional): User data for the metadata field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

Sub-modules: