GridFS

GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit of 16 MB.

Note

GridFS does not support multi-document transactions.

Instead of storing a file in a single document, GridFS divides the file into parts, or chunks [1], and stores each chunk as a separate document. By default, GridFS uses a chunk size of 255 kB; that is, GridFS divides a file into chunks of 255 kB with the exception of the last chunk. The last chunk is only as large as necessary. Similarly, files that are no larger than the chunk size have only a final chunk, using only as much space as needed plus some additional metadata.
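The chunking arithmetic described above can be sketched as follows. This is only an illustration, not driver code: the helper name chunkLayout is hypothetical, and the sketch assumes that the 255 kB default means 255 × 1024 = 261,120 bytes.

```javascript
// Assumption: the 255 kB default is 255 * 1024 = 261120 bytes.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Hypothetical helper: how many chunks a file of `length` bytes needs,
// and how large the final chunk is.
function chunkLayout(length, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunkCount = Math.ceil(length / chunkSize);
  // Every chunk except the last is full; the last holds the remainder.
  const lastChunkSize =
    chunkCount === 0 ? 0 : length - (chunkCount - 1) * chunkSize;
  return { chunkCount, lastChunkSize };
}

console.log(chunkLayout(1000000)); // { chunkCount: 4, lastChunkSize: 216640 }
console.log(chunkLayout(1000));    // { chunkCount: 1, lastChunkSize: 1000 }
```

A file smaller than the chunk size yields a single chunk that is exactly as large as the file.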

GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata. The section GridFS Collections describes each collection in detail.

When you query GridFS for a file, the driver reassembles the chunks as needed. You can perform range queries on files stored through GridFS. You can also access information from arbitrary sections of files, such as to “skip” to the middle of a video or audio file.
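To see why a range read does not need the whole file, the following sketch maps a byte range onto chunk sequence numbers. The helper chunksForRange is hypothetical and the chunk size is assumed to be 255 × 1024 bytes; real drivers expose their own seek or range options.

```javascript
// Hypothetical helper: which chunks (by sequence number n) cover the
// byte range [start, end) of a file, and where reading begins.
function chunksForRange(start, end, chunkSize) {
  if (start < 0 || end <= start) {
    throw new RangeError("empty or invalid range");
  }
  const firstN = Math.floor(start / chunkSize);
  const lastN = Math.floor((end - 1) / chunkSize);
  return {
    firstN,
    lastN,
    offsetInFirst: start - firstN * chunkSize, // bytes to skip in the first chunk
  };
}

const chunkSize = 255 * 1024; // assumed default of 261120 bytes
console.log(chunksForRange(500000, 600000, chunkSize));
// { firstN: 1, lastN: 2, offsetInFirst: 238880 }
```

Only chunks 1 and 2 are needed for this 100,000-byte range, regardless of how large the file is.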

GridFS is useful not only for storing files that exceed 16 MB but also for storing any files for which you want access without having to load the entire file into memory. See also When to Use GridFS.

When to Use GridFS

In MongoDB, use GridFS for storing files larger than 16 MB.

In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem.

  • If your filesystem limits the number of files in a directory, you can use GridFS to store as many files as needed.
  • When you want to access information from portions of large files without having to load whole files into memory, you can use GridFS to recall only the sections you need.
  • When you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities, you can use GridFS. When using geographically distributed replica sets, MongoDB can distribute files and their metadata automatically to a number of mongod instances and facilities.

Do not use GridFS if you need to update the content of the entire file atomically. As an alternative, you can store multiple versions of each file and specify the current version of the file in the metadata. You can update the metadata field that indicates “latest” status in an atomic update after uploading the new version of the file, and later remove previous versions if needed.

Furthermore, if your files are all smaller than the 16 MB BSON Document Size limit, consider storing each file in a single document instead of using GridFS. You may use the BinData data type to store the binary data. See your driver's documentation for details on using BinData.
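As a rough sketch of that decision, the check below compares a file's size against the document size limit. The helper name and the overheadBytes allowance are assumptions for illustration; the limit is taken to be 16 × 1024 × 1024 bytes, and the real overhead depends on the other fields you store in the document.

```javascript
// Assumption: the 16 MB BSON document limit is 16 * 1024 * 1024 bytes.
const MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024;

// Hypothetical helper. `overheadBytes` is an assumed allowance for the
// other fields stored alongside the binary data in the same document.
function fitsInSingleDocument(fileSize, overheadBytes = 1024) {
  return fileSize + overheadBytes <= MAX_BSON_DOCUMENT_SIZE;
}

console.log(fitsInSingleDocument(2 * 1024 * 1024));  // true
console.log(fitsInSingleDocument(16 * 1024 * 1024)); // false
```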

Use GridFS

To store and retrieve files using GridFS, use either of the following:

  • A MongoDB driver. See the driver's documentation for information on using GridFS with your driver.
  • The mongofiles command-line tool. See the mongofiles reference for documentation.

GridFS Collections

GridFS stores files in two collections:

  • chunks stores the file chunks.
  • files stores the file metadata.

GridFS places the collections in a common bucket by prefixing each with the bucket name. By default, GridFS uses two collections with a bucket named fs:

  • fs.files
  • fs.chunks

You can choose a different bucket name, as well as create multiple buckets in a single database. The full collection name, which includes the bucket name, is subject to the namespace length limit.
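The naming rule can be illustrated with a small helper. The function name bucketCollections and the bucket name photos are hypothetical; drivers normally take the bucket name as an option and derive the collection names themselves.

```javascript
// Hypothetical helper: the two collection names for a GridFS bucket.
function bucketCollections(bucketName = "fs") {
  return {
    files: `${bucketName}.files`,
    chunks: `${bucketName}.chunks`,
  };
}

console.log(bucketCollections());
// { files: 'fs.files', chunks: 'fs.chunks' }
console.log(bucketCollections("photos"));
// { files: 'photos.files', chunks: 'photos.chunks' }
```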

The chunks Collection

Each document in the chunks [1] collection represents a distinct chunk of a file as represented in GridFS. Documents in this collection have the following form:

    {
      "_id" : <ObjectId>,
      "files_id" : <ObjectId>,
      "n" : <num>,
      "data" : <binary>
    }

A document from the chunks collection contains the following fields:

  • chunks._id
    The unique ObjectId of the chunk.
  • chunks.files_id
    The _id of the “parent” document, as specified in the files collection.
  • chunks.n
    The sequence number of the chunk. GridFS numbers all chunks, starting with 0.
  • chunks.data
    The chunk’s payload as a BSON Binary type.
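To make the field layout concrete, the following sketch splits a Node.js Buffer into chunk-shaped objects. It is an in-memory illustration only: splitIntoChunks is a hypothetical helper, the string "file-1" stands in for an ObjectId, and the chunk's own _id is omitted because it would normally be generated on insert.

```javascript
// Hypothetical helper: split a Node.js Buffer into chunk-shaped documents.
// `filesId` stands in for the ObjectId of the parent files document.
function splitIntoChunks(filesId, buffer, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < buffer.length; offset += chunkSize, n += 1) {
    chunks.push({
      files_id: filesId,
      n, // sequence number, starting at 0
      data: buffer.subarray(offset, offset + chunkSize), // final slice may be shorter
    });
  }
  return chunks;
}

const demoChunks = splitIntoChunks("file-1", Buffer.from("0123456789"), 4);
console.log(demoChunks.map((c) => [c.n, c.data.toString()]));
// [ [ 0, '0123' ], [ 1, '4567' ], [ 2, '89' ] ]
```

A chunk size of 4 bytes is used here only to keep the output readable.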

The files Collection

Each document in the files collection represents a file in GridFS.

    {
      "_id" : <ObjectId>,
      "length" : <num>,
      "chunkSize" : <num>,
      "uploadDate" : <timestamp>,
      "md5" : <hash>,
      "filename" : <string>,
      "contentType" : <string>,
      "aliases" : <string array>,
      "metadata" : <any>
    }

Documents in the files collection contain some or all of the following fields:

  • files._id
    The unique identifier for this document. The _id is of the data type you chose for the original document. The default type for MongoDB documents is BSON ObjectId.
  • files.length
    The size of the document in bytes.
  • files.chunkSize
    The size of each chunk in bytes. GridFS divides the document into chunks of size chunkSize, except for the last, which is only as large as needed. The default size is 255 kilobytes (kB).
  • files.uploadDate
    The date the document was first stored by GridFS. This value has the Date type.
  • files.md5
    Deprecated. The MD5 algorithm is prohibited by FIPS 140-2. MongoDB drivers deprecate MD5 support and will remove MD5 generation in future releases. Applications that require a file digest should implement it outside of GridFS and store it in files.metadata.
    An MD5 hash of the complete file returned by the filemd5 command. This value has the String type.
  • files.filename
    Optional. A human-readable name for the GridFS file.
  • files.contentType
    Deprecated. Optional. A valid MIME type for the GridFS file. For application use only.
    Use files.metadata for storing information related to the MIME type of the GridFS file.
  • files.aliases
    Deprecated. Optional. An array of alias strings. For application use only.
    Use files.metadata for storing information related to the aliases of the GridFS file.
  • files.metadata
    Optional. The metadata field may be of any data type and can hold any additional information you want to store. If you wish to add additional arbitrary fields to documents in the files collection, add them to an object in the metadata field.
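As an illustration of how these fields fit together, the sketch below builds a files-shaped object that leaves out the deprecated fields and keeps application data under metadata. The helper makeFilesDocument, the id "file-1", and the mimeType key are all hypothetical; drivers create this document for you during an upload.

```javascript
// Hypothetical helper: build a files-collection document for an upload.
// `id` stands in for an ObjectId. Deprecated fields (md5, contentType,
// aliases) are left out, and application data goes under `metadata`.
function makeFilesDocument(id, filename, length, chunkSize, metadata = {}) {
  return {
    _id: id,
    length,                 // total file size in bytes
    chunkSize,              // size of every chunk except possibly the last
    uploadDate: new Date(), // when the file was first stored
    filename,
    metadata,
  };
}

const fileDoc = makeFilesDocument("file-1", "report.pdf", 1000000, 255 * 1024, {
  mimeType: "application/pdf", // application-defined key, not part of GridFS
});
console.log(fileDoc.length, fileDoc.chunkSize); // 1000000 261120
```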

GridFS Indexes

GridFS uses indexes on each of the chunks and files collections for efficiency. Drivers that conform to the GridFS specification automatically create these indexes for convenience. You can also create any additional indexes as desired to suit your application’s needs.

The chunks Index

GridFS uses a unique, compound index on the chunks collection using the files_id and n fields. This allows for efficient retrieval of chunks, as demonstrated in the following example:

    db.fs.chunks.find( { files_id: myFileID } ).sort( { n: 1 } )
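The same access pattern can be sketched in memory to show what the ordered result is used for. This is not how a driver is implemented: reassemble is a hypothetical helper that filters an array by files_id, sorts by n, and concatenates the data, with plain strings standing in for ObjectId values.

```javascript
// In-memory stand-in for the query above: select one file's chunks,
// order them by n, and concatenate their data.
function reassemble(allChunks, filesId) {
  const ordered = allChunks
    .filter((c) => c.files_id === filesId)
    .sort((a, b) => a.n - b.n);
  // With a unique (files_id, n) pair, position i must hold chunk n = i.
  ordered.forEach((c, i) => {
    if (c.n !== i) throw new Error(`missing or duplicate chunk at n=${i}`);
  });
  return Buffer.concat(ordered.map((c) => c.data));
}

const stored = [
  { files_id: "a", n: 1, data: Buffer.from("lo") },
  { files_id: "b", n: 0, data: Buffer.from("other") },
  { files_id: "a", n: 0, data: Buffer.from("hel") },
];
console.log(reassemble(stored, "a").toString()); // hello
```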

Drivers that conform to the GridFS specification will automatically ensure that this index exists before read and write operations. See the relevant driver documentation for the specific behavior of your GridFS application.

If this index does not exist, you can issue the following operation to create it using the mongo shell:

    db.fs.chunks.createIndex( { files_id: 1, n: 1 }, { unique: true } );

The files Index

GridFS uses an index on the files collection using the filename and uploadDate fields. This index allows for efficient retrieval of files, as shown in this example:

    db.fs.files.find( { filename: myFileName } ).sort( { uploadDate: 1 } )
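An in-memory analogue of this query shows what the sorted result gives you when several files share a filename. The helper findByFilename and the sample documents are hypothetical, with small integers standing in for ObjectId values.

```javascript
// In-memory stand-in for the query above: all files with a given
// filename, oldest upload first.
function findByFilename(files, filename) {
  return files
    .filter((f) => f.filename === filename)
    .sort((a, b) => a.uploadDate - b.uploadDate);
}

const files = [
  { _id: 2, filename: "a.txt", uploadDate: new Date("2024-03-01T00:00:00Z") },
  { _id: 3, filename: "b.txt", uploadDate: new Date("2024-02-01T00:00:00Z") },
  { _id: 1, filename: "a.txt", uploadDate: new Date("2024-01-01T00:00:00Z") },
];
const revisions = findByFilename(files, "a.txt");
console.log(revisions.map((f) => f._id));         // [ 1, 2 ]
console.log(revisions[revisions.length - 1]._id); // 2
```

The last element of the sorted result is the most recent upload with that filename.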

Drivers that conform to the GridFS specification will automatically ensure that this index exists before read and write operations. See the relevant driver documentation for the specific behavior of your GridFS application.

If this index does not exist, you can issue the following operation to create it using the mongo shell:

    db.fs.files.createIndex( { filename: 1, uploadDate: 1 } );

[1] The use of the term chunks in the context of GridFS is not related to the use of the term chunks in the context of sharding.

Sharding GridFS

There are two collections to consider with GridFS: files and chunks.

chunks Collection

To shard the chunks collection, use either { files_id : 1, n : 1 } or { files_id : 1 } as the shard key index. files_id is an ObjectId and changes monotonically.

For MongoDB drivers that do not run filemd5 to verify successful upload (for example, MongoDB drivers that support MongoDB 4.0 or greater), you can use Hashed Sharding for the chunks collection.

If the MongoDB driver runs filemd5, you cannot use Hashed Sharding. For details, see SERVER-9888.
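The practical difference between the two approaches for a monotonically changing key can be seen in a toy simulation. Everything here is an assumption for illustration: integer keys stand in for increasing files_id values, the split points are invented, and FNV-1a is used only as a stand-in for MongoDB's internal hash function.

```javascript
// Range-based placement: a key goes to the first range whose upper bound
// exceeds it; keys above every split point land on the last shard.
function rangeShard(key, splitPoints) {
  const i = splitPoints.findIndex((bound) => key < bound);
  return i === -1 ? splitPoints.length : i;
}

// FNV-1a, used only as a stand-in for MongoDB's internal hash function.
function fnv1a(text) {
  let hash = 2166136261;
  for (const byte of Buffer.from(text)) {
    hash = Math.imul(hash ^ byte, 16777619) >>> 0;
  }
  return hash;
}

function hashedShard(key, shardCount) {
  return fnv1a(String(key)) % shardCount;
}

// Assumed split points, chosen while all existing keys were below 1000.
const splitPoints = [250, 500, 750];
// New keys keep increasing, as monotonically changing ids do.
const newKeys = Array.from({ length: 1000 }, (_, i) => 1000 + i);

const rangeTargets = new Set(newKeys.map((k) => rangeShard(k, splitPoints)));
const hashedTargets = new Set(newKeys.map((k) => hashedShard(k, 4)));
console.log([...rangeTargets]);  // [ 3 ]
console.log(hashedTargets.size); // 4
```

In this simulation every new key lands on the last shard under range-based placement, while hashed placement spreads the same keys across all four shards.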

files Collection

The files collection is small and only contains metadata. None of the required keys for GridFS lend themselves to an even distribution in a sharded environment. Leaving files unsharded allows all the file metadata documents to live on the primary shard.

If you must shard the files collection, use the _id field,possibly in combination with an application field.