Data Modeling Introduction

The key challenge in data modeling is balancing the needs of the application, the performance characteristics of the database engine, and the data retrieval patterns. When designing data models, always consider the application usage of the data (i.e. queries, updates, and processing of the data) as well as the inherent structure of the data itself.

Flexible Schema

Unlike SQL databases, where you must determine and declare a table’s schema before inserting data, MongoDB’s collections, by default, do not require their documents to have the same schema. That is:

The documents in a single collection do not need to have the same set of fields, and the data type for a field can differ across documents within a collection.
To change the structure of the documents in a collection, such as adding new fields, removing existing fields, or changing field values to a new type, update the documents to the new structure.
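
As a minimal sketch of the two points above, the following plain JavaScript models a collection as an array of documents. The `people` array and its field names are hypothetical, and no database is involved; in mongosh, comparable documents could be inserted with `db.collection.insertMany()`.

```javascript
// Hypothetical documents for one collection, modeled as a plain array.
// Their field sets differ, and `phone` is a string in one document
// and a number in another.
const people = [
  { _id: 1, name: "Ana", phone: "555-0100" },
  { _id: 2, name: "Ben", phone: 5550101, nickname: "B" },
  { _id: 3, name: "Cy" },
];

// Changing the structure means rewriting the documents themselves:
// here every `phone` value becomes a string and a `tags` array is added.
const migrated = people.map((doc) => {
  const next = { ...doc, tags: [] };
  if (next.phone !== undefined) next.phone = String(next.phone);
  return next;
});

const phoneTypes = migrated.map((doc) => typeof doc.phone);
console.log(phoneTypes);
```

The third document still has no `phone` field after the change, which is permitted because the fields are not required to match across documents.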

This flexibility facilitates the mapping of documents to an entity or an object. Each document can match the data fields of the represented entity, even if the document has substantial variation from other documents in the collection.

In practice, however, the documents in a collection share a similar structure, and you can enforce document validation rules for a collection during update and insert operations. See Schema Validation for details.
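
One way to express such rules is a `$jsonSchema` validator supplied when the collection is created. The sketch below only builds the options object; the collection name `students` and its fields are hypothetical. The `missingRequired` helper is a toy stand-in so the example runs without a server, and it is not how MongoDB evaluates validators.

```javascript
// Hypothetical validator options. In mongosh these could be passed as
// db.createCollection("students", studentsOptions).
const studentsOptions = {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "year"],
      properties: {
        name: { bsonType: "string" },
        year: { bsonType: "int", minimum: 2000 },
      },
    },
  },
};

// Toy stand-in (an assumption, not MongoDB's validator): reports which
// required fields a candidate document lacks.
function missingRequired(doc, options) {
  const required = options.validator.$jsonSchema.required;
  return required.filter((field) => !(field in doc));
}

console.log(missingRequired({ name: "Ana" }, studentsOptions));
```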

Document Structure

The key decision in designing data models for MongoDB applications revolves around the structure of documents and how the application represents relationships between data. MongoDB allows related data to be embedded within a single document.

Embedded Data

Embedded documents capture relationships between data by storing related data in a single document structure. MongoDB documents make it possible to embed document structures in a field or array within a document. These denormalized data models allow applications to retrieve and manipulate related data in a single database operation.
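
A denormalized document might look like the following sketch, written as a plain JavaScript object. The `user` document and all of its field names are hypothetical.

```javascript
// Hypothetical denormalized document: contact details and addresses are
// embedded in the user document rather than stored in separate collections.
const user = {
  _id: "u1",
  name: "Ana",
  // Embedded document in a field.
  contact: { email: "ana@example.com" },
  // Embedded documents in an array.
  addresses: [
    { kind: "home", city: "Lisbon" },
    { kind: "work", city: "Porto" },
  ],
};

// All related data is available once this one document has been read.
const cities = user.addresses.map((a) => a.city);

// A change to related data is a change to the same document.
user.addresses.push({ kind: "billing", city: "Faro" });

console.log(user.contact.email, cities);
```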

For many use cases in MongoDB, the denormalized data model is optimal.

See Embedded Data Models for the strengths and weaknesses of embedding documents.

References

References store the relationships between data by including links or references from one document to another. Applications can resolve these references to access the related data. Broadly, these are normalized data models.
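
The same idea can be sketched with two in-memory arrays standing in for collections. The `publishers` and `books` data, the `publisher_id` field, and the `resolvePublisher` helper are all hypothetical names chosen for illustration.

```javascript
// Hypothetical normalized layout: two collections modeled as arrays.
const publishers = [{ _id: "p1", name: "Example Press" }];
const books = [
  { _id: "b1", title: "First", publisher_id: "p1" },
  { _id: "b2", title: "Second", publisher_id: "p1" },
];

// Resolving a reference is a second lookup made by the application.
function resolvePublisher(book) {
  return publishers.find((p) => p._id === book.publisher_id) || null;
}

const resolved = books.map((b) => `${b.title}: ${resolvePublisher(b).name}`);
console.log(resolved);
```

The publisher's details are stored once and shared by both books, at the cost of an extra lookup whenever they are needed.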

See Normalized Data Models for the strengths and weaknesses of using references.

Atomicity of Write Operations

Single Document Atomicity

In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document.

A denormalized data model with embedded data combines all related data in a single document instead of normalizing across multiple documents and collections. This data model facilitates atomic operations.

When a single write operation (e.g. db.collection.updateMany()) modifies multiple documents, the modification of each document is atomic, but the operation as a whole is not atomic.

When performing multi-document write operations, whether through a single write operation or multiple write operations, other operations may interleave.
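
The following toy model illustrates what interleaving can mean for an observer. It is an in-memory simulation under stated assumptions, not a description of MongoDB internals: `toyUpdateMany` and the `accounts` data are hypothetical, and the `betweenDocs` callback stands in for another operation that runs between two single-document changes.

```javascript
// Toy in-memory model (an assumption, not MongoDB internals).
const accounts = [
  { _id: 1, status: "old" },
  { _id: 2, status: "old" },
  { _id: 3, status: "old" },
];

// Applies `change` to one document at a time and calls `betweenDocs` after
// each document, standing in for another operation that interleaves.
function toyUpdateMany(docs, change, betweenDocs) {
  for (const doc of docs) {
    Object.assign(doc, change); // each single-document change is applied whole
    betweenDocs(docs);
  }
}

const observed = [];
toyUpdateMany(accounts, { status: "new" }, (docs) => {
  observed.push(docs.map((d) => d.status).join(","));
});
console.log(observed);
```

In this model the observer sees a mix of updated and not-yet-updated documents until the last one has been changed, although it never sees a half-changed document.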

Multi-Document Transactions

For situations that require atomicity of reads and writes to multiple documents (in a single or multiple collections), MongoDB supports multi-document transactions:

In version 4.0, MongoDB supports multi-document transactions on replica sets.
In version 4.2, MongoDB introduces distributed transactions, which add support for multi-document transactions on sharded clusters and incorporate the existing support for multi-document transactions on replica sets.

For details regarding transactions in MongoDB, see the Transactions page.
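
To make the all-or-nothing property concrete, here is a toy in-memory sketch. It is not the MongoDB transactions API: `toyTransaction`, `transfer`, and the `bank` data are hypothetical, and the copy-then-swap approach is only an illustration. Real applications would use the session-based transaction support described on the Transactions page.

```javascript
// Toy all-or-nothing helper (an assumption, not the MongoDB transactions API).
// It runs `work` against a deep copy and copies the result back only on success.
function toyTransaction(store, work) {
  const draft = JSON.parse(JSON.stringify(store));
  try {
    work(draft);
  } catch (err) {
    return { committed: false, reason: err.message }; // store left untouched
  }
  for (const key of Object.keys(store)) delete store[key];
  Object.assign(store, draft);
  return { committed: true };
}

// Hypothetical data spanning two "collections".
const bank = {
  checking: [{ _id: 1, balance: 50 }],
  savings: [{ _id: 1, balance: 0 }],
};

// Moves `amount` between the two collections, or changes nothing at all.
function transfer(amount) {
  return toyTransaction(bank, (draft) => {
    draft.checking[0].balance -= amount;
    if (draft.checking[0].balance < 0) throw new Error("insufficient funds");
    draft.savings[0].balance += amount;
  });
}

const ok = transfer(30);
const failed = transfer(100);
console.log(ok, failed, bank);
```

The second transfer fails partway through, and neither balance changes as a result.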

Important

In most cases, a multi-document transaction incurs a greater performance cost than single-document writes, and the availability of multi-document transactions should not be a replacement for effective schema design. For many scenarios, the denormalized data model (embedded documents and arrays) will continue to be optimal for your data and use cases. That is, for many scenarios, modeling your data appropriately will minimize the need for multi-document transactions.

For additional transactions usage considerations (such as runtime limit and oplog size limit), see also Production Considerations.

Data Use and Performance

When designing a data model, consider how applications will use your database. For instance, if your application only uses recently inserted documents, consider using Capped Collections. Or, if your application mainly performs read operations on a collection, adding indexes to support common queries can improve performance.
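
The intuition behind the capped collection suggestion can be sketched with a small in-memory class. `ToyCappedCollection` is a hypothetical illustration that bounds the collection by document count only; it does not describe MongoDB's storage behavior in detail. In mongosh, a capped collection is created by passing `capped: true` together with a `size` in bytes to `db.createCollection()`.

```javascript
// Toy capped collection (an assumption, not MongoDB's implementation):
// keeps at most `max` documents in insertion order, discarding the oldest.
class ToyCappedCollection {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.max) this.docs.shift();
  }
}

const recentEvents = new ToyCappedCollection(3);
for (let i = 1; i <= 5; i++) recentEvents.insert({ seq: i });

const kept = recentEvents.docs.map((d) => d.seq);
console.log(kept);
```

Only the three most recently inserted documents remain, which suits an application that never reads older ones.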

See Operational Factors and Data Models for more information on these and other operational considerations that affect data model designs.