"Embed or reference?" is the MongoDB question that trips up even seasoned developers.
EMBED
• Data is usually read together
• The relationship is one-to-few
• Nested data stays bounded
• No join needed: one read returns everything
REFERENCE
• Data is shared across many documents
• The relationship grows large
• Child data can grow without bound
• Typical cases: users, tags, large comment threads
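To make the two shapes concrete, here is a minimal Python sketch that uses plain dicts in place of MongoDB documents, so it runs without a server. The blog-post and comment fields are hypothetical. The embedded version answers "show a post with its comments" in one read. The referenced version needs a second lookup, which in MongoDB would typically be a separate query or a `$lookup` stage like the one shown. `read_referenced` is an in-memory stand-in for that join, not a driver call.

```python
# Embedded: comments live inside the post, so one read returns everything.
embedded_post = {
    "_id": 1,
    "title": "Embed or reference?",
    "comments": [
        {"author": "ana", "text": "Great rule of thumb"},
        {"author": "raj", "text": "Bookmarked"},
    ],
}

# Referenced: comments live in their own collection and point back via post_id.
posts = [{"_id": 1, "title": "Embed or reference?"}]
comments = [
    {"_id": 101, "post_id": 1, "author": "ana", "text": "Great rule of thumb"},
    {"_id": 102, "post_id": 1, "author": "raj", "text": "Bookmarked"},
]

# The aggregation stage MongoDB would need to reassemble the referenced shape.
lookup_stage = {
    "$lookup": {
        "from": "comments",
        "localField": "_id",
        "foreignField": "post_id",
        "as": "comments",
    }
}


def read_referenced(post_id: int) -> dict:
    """In-memory stand-in for $lookup: fetch the post, then join its comments."""
    post = next(p for p in posts if p["_id"] == post_id)
    joined = [c for c in comments if c["post_id"] == post_id]
    return {**post, "comments": joined}


assembled = read_referenced(1)
print([c["author"] for c in assembled["comments"]])  # ['ana', 'raj']
```

Both paths produce the same logical view; the difference is how much work each read costs once the data lives in a real database.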
Choosing between embedding and referencing comes down to a tradeoff between data locality and flexibility. Embedding gives you atomic single-document updates and fast reads when data is tightly coupled. Referencing gives you flexibility and scalability as relationships grow and data is shared widely.
Where developers go wrong with MongoDB design:
1. Treating MongoDB like a relational DB. Splitting everything into separate collections and then relying heavily on $lookup recreates join complexity without the benefits of the document model.
2. Embedding too much. If nested data grows without bound, you're in trouble. MongoDB caps each document at 16 MB; hit that limit and your writes fail.
3. Referencing too eagerly. If most reads involve following multiple references, you're adding unnecessary complexity and slowing down your queries.
4. Ignoring access patterns. If you don't model data around how your application reads it, you're likely to encounter performance bottlenecks.
5. Failing to anticipate schema evolution. Systems change, and whichever model you choose has to evolve with them. Plan for changes in both data structure and access patterns.
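One common remedy for mistake 2 is what MongoDB's documentation calls the subset pattern: keep the full history in its own collection and embed only a small, bounded slice of the most recent items. The sketch below simulates it with plain Python lists. `MAX_EMBEDDED`, `add_comment`, and the field names are hypothetical, and `approx_size` uses JSON length only as a rough proxy for BSON size.

```python
import json

MAX_BSON_BYTES = 16 * 1024 * 1024  # MongoDB's per-document limit: 16 MB
MAX_EMBEDDED = 3  # hypothetical: how many recent comments the post keeps inline


def approx_size(doc: dict) -> int:
    """Rough document size via JSON; actual BSON encoding differs somewhat."""
    return len(json.dumps(doc).encode("utf-8"))


def add_comment(post: dict, comments: list, comment: dict) -> None:
    """Subset pattern: full history is referenced, only the newest N are embedded."""
    # Every comment goes to the separate (referenced) collection.
    comments.append({"post_id": post["_id"], **comment})
    # The post keeps a bounded copy of the most recent comments.
    post["recent_comments"].append(comment)
    del post["recent_comments"][:-MAX_EMBEDDED]
    post["comment_count"] += 1


post = {"_id": 1, "title": "Embed or reference?", "recent_comments": [], "comment_count": 0}
comments: list = []
for i in range(10):
    add_comment(post, comments, {"author": f"user{i}", "text": "Nice post"})

print(len(post["recent_comments"]), len(comments))  # 3 10
print(approx_size(post) < MAX_BSON_BYTES)  # True
```

Against a real server, the embedded slice could be maintained with a single update such as `{"$push": {"recent_comments": {"$each": [comment], "$slice": -3}}, "$inc": {"comment_count": 1}}`, which keeps only the last three entries and applies atomically to the one document.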
Design MongoDB documents around how your application actually accesses data, not around relational habits. That is how you get the performance the document model is built for.
Bookmark this for when you find yourself designing MongoDB schemas at midnight!