Data Modeling Introduction

Data in MongoDB has a flexible schema. Unlike SQL databases, where you must determine and declare a table’s schema before inserting data, MongoDB’s collections do not enforce document structure. This flexibility facilitates the mapping of documents to an entity or an object. Each document can match the data fields of the represented entity, even if the data has substantial variation. In practice, however, the documents in a collection share a similar structure.
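As a rough illustration of the flexible schema described above, the sketch below uses plain Python dictionaries to stand in for documents in one collection. The `people` list and its field names are hypothetical and are not taken from the MongoDB documentation.

```python
# Hypothetical documents that could live in the same collection.
# Nothing forces them to declare the same set of fields.
people = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Grace", "phones": ["555-0100"], "title": "Rear Admiral"},
]

# In practice documents still tend to share a common core of fields.
field_sets = [set(doc) for doc in people]
shared_fields = set.intersection(*field_sets)
```

Here the two documents differ in their optional fields but share `_id` and `name`, which mirrors the point that documents in a collection usually have a similar structure.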

The key challenge in data modeling is balancing the needs of the application, the performance characteristics of the database engine, and the data retrieval patterns. When designing data models, always consider the application usage of the data (i.e. queries, updates, and processing of the data) as well as the inherent structure of the data itself.

Document Structure

The key decision in designing data models for MongoDB applications revolves around the structure of documents and how the application represents relationships between data. There are two tools that allow applications to represent these relationships: references and embedded documents.

References

References store the relationships between data by including links, or references, from one document to another. Applications can resolve these references to access the related data. Broadly, these are normalized data models.
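A minimal sketch of a normalized model, assuming three hypothetical collections (`users`, `contacts`, `access`) represented here as plain Python lists. The `user_id` field name is an assumption chosen for illustration; the application resolves each reference with a separate lookup.

```python
# Each list stands in for a MongoDB collection (hypothetical data).
users = [
    {"_id": 1, "username": "alice"},
]
contacts = [
    # user_id refers to the _id of a document in users
    {"_id": 101, "user_id": 1, "email": "alice@example.com"},
]
access = [
    {"_id": 201, "user_id": 1, "level": 5},
]


def resolve_user(user_id):
    """Follow the references by hand: one lookup per collection."""
    user = next(u for u in users if u["_id"] == user_id)
    return {
        "user": user,
        "contacts": [c for c in contacts if c["user_id"] == user_id],
        "access": [a for a in access if a["user_id"] == user_id],
    }


profile = resolve_user(1)
```

Against a real database, each of these lookups would typically be its own query, which is the cost that comes with the flexibility of references.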

Data Model - Figure 1

See Normalized Data Models for the strengths and weaknesses of using references.

Embedded Data

Embedded documents capture relationships between data by storing related data in a single document structure. MongoDB documents make it possible to embed document structures in a field or array within a document. These denormalized data models allow applications to retrieve and manipulate related data in a single database operation.
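A minimal sketch of an embedded (denormalized) model, again using a plain Python dictionary in place of a stored document. The `contact`, `access`, and `tags` fields are hypothetical names chosen for illustration.

```python
# One hypothetical document that embeds its related data:
# sub-documents in the contact and access fields, and an array in tags.
user = {
    "_id": 1,
    "username": "alice",
    "contact": {"email": "alice@example.com", "phone": "555-0100"},
    "access": {"level": 5, "group": "dev"},
    "tags": ["admin", "beta"],
}

# Related data is reachable from the single document, with no second lookup.
email = user["contact"]["email"]
access_level = user["access"]["level"]
```

Because everything about the entity sits in one document, a single read returns the user together with the contact and access details.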

Data Model - Figure 2

See Embedded Data Models for the strengths and weaknesses of embedding documents.

Atomicity of Write Operations

In MongoDB, write operations are atomic at the document level, and no single write operation can atomically affect more than one document or more than one collection. A denormalized data model with embedded data combines all related data for a represented entity in a single document. This facilitates atomic write operations since a single write operation can insert or update the data for an entity. Normalizing the data would split the data across multiple collections and would require multiple write operations that are not atomic collectively.

However, schemas that facilitate atomic writes may limit ways that applications can use the data or may limit ways to modify applications. The Atomicity Considerations documentation describes the challenge of designing a schema that balances flexibility and atomicity.
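To make the single-document atomicity point concrete, the sketch below builds the filter and update documents for one write against a hypothetical `orders` collection with embedded `status` and `history` fields. `$set` and `$push` are standard MongoDB update operators; the collection name, field names, and values are assumptions.

```python
# Filter selecting the one document to modify (hypothetical order id).
order_filter = {"_id": 1001}

# A single update document that changes a field and appends to an
# embedded array. Both changes target the same document, so they
# would be applied together as one atomic write.
order_update = {
    "$set": {"status": "shipped"},
    "$push": {"history": {"event": "shipped", "by": "warehouse-2"}},
}

# With a driver such as PyMongo this would be sent as one operation:
#     orders.update_one(order_filter, order_update)
```

If `history` lived in a separate collection instead, the status change and the history entry would need two writes, and those two writes would not be atomic as a pair.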

Document Growth

Some updates, such as pushing elements to an array or adding new fields, increase a document’s size.

For the MMAPv1 storage engine, if the document size exceeds the allocated space for that document, MongoDB relocates the document on disk. When using the MMAPv1 storage engine, growth consideration can affect the decision to normalize or denormalize data. See Document Growth Considerations for more about planning for and managing document growth for MMAPv1.

Data Use and Performance

When designing a data model, consider how applications will use your database. For instance, if your application only uses recently inserted documents, consider using Capped Collections. Or if your application needs are mainly read operations to a collection, adding indexes to support common queries can improve performance.
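The two suggestions above might look like the following sketch. It assumes `db` is a PyMongo `Database` object (or anything with the same `create_collection` and `create_index` methods); the collection names, the 1 MiB size limit, and the `category` field are hypothetical.

```python
def apply_usage_settings(db):
    """Sketch: tune hypothetical collections to how the application uses them."""
    # Capped collection: fixed size in bytes, oldest documents are
    # overwritten first. Suits data where only recent inserts matter.
    db.create_collection("recent_events", capped=True, size=1024 * 1024)

    # Ascending index (direction 1) on a field that common read
    # queries are assumed to filter on.
    db["products"].create_index([("category", 1)])
```

A caller would connect with a driver first and then pass the database handle in, for example `apply_usage_settings(client["shop"])`, where `client` and `"shop"` are likewise placeholders.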

See Operational Factors and Data Models for more information on these and other operational considerations that affect data model designs.

Additional Resources