Wednesday, May 18, 2016

Designing Scalable MongoDB Documents versus Relational DB Entities

One of the challenges in designing a MongoDB data model is that our background knowledge of relational databases can get in the way of designing an optimal, scalable document structure.

In this post we will walk through a use case taken from the book Instant MongoDB by Amol Nayak.

The use case is about students enrolled in courses that are taught by lecturers.
The relations can be summarized as follows:

In this student use case:
- Students enroll in courses (many to many)
- Each course can belong to many categories (one to many)
- Each course is delivered by many lecturers (one to many)
- Each course has content (one to one)
- Each course content is divided into parts (one to many)
- Each content part is related to assignments (one to many)
- Each student assignment submission is related to one assignment (one to one)

To model this ER diagram as MongoDB documents, we need to do the following:

1) Think of the main documents that we have
A main document is a key player: it is well defined and holds too much information to be simply included in another document.
Here we can think of Student, Course and Lecturer.

2) Embed related documents
The Student embeds his/her submissions, while the Course embeds all the other documents related to it, such as category, content and assignments, since they are all part of the course document.
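
The embedding step can be sketched with plain Python dictionaries standing in for MongoDB documents. This is only an illustration: every field name and value below (content, parts, assignments, submissions and so on) is an assumption made for the example, not the book's schema. Only the nesting reflects the design described above.

```python
# Hypothetical field names; only the nesting mirrors the design in the text.
course = {
    "_id": "course-101",
    "name": "Introduction to Databases",
    "content": {
        "parts": [
            {
                "title": "Data Modeling",
                "assignments": [
                    {"assignment_id": "a1", "title": "Design a schema"},
                ],
            },
            {
                "title": "Querying",
                "assignments": [
                    {"assignment_id": "a2", "title": "Write five queries"},
                ],
            },
        ],
    },
}

# The student embeds submissions; each one points at an assignment by id.
student = {
    "_id": "student-1",
    "name": "Sample Student",
    "submissions": [
        {"assignment_id": "a1", "answer": "My schema draft"},
    ],
}


def assignments_of(course_doc):
    """Collect every assignment from one course document, with no joins."""
    return [
        assignment
        for part in course_doc["content"]["parts"]
        for assignment in part["assignments"]
    ]


print([a["title"] for a in assignments_of(course)])
```

Because the content, parts and assignments live inside the course, reading one course document is enough to list all of its assignments.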

3) Add references to other documents (using id)
The Student references his/her courses.
The Course references its lecturers.
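
A minimal sketch of id-based references, again using in-memory dictionaries in place of real collections. The names course_ids and lecturer_ids and the sample ids are hypothetical. Note how reaching a lecturer's name from a student takes one lookup per hop, which is the cost the next step tries to reduce.

```python
# In-memory stand-ins for three collections, keyed by _id (all values made up).
lecturers = {
    "lect-1": {"_id": "lect-1", "name": "Lecturer One"},
    "lect-2": {"_id": "lect-2", "name": "Lecturer Two"},
}
courses = {
    "course-101": {
        "_id": "course-101",
        "name": "Introduction to Databases",
        "lecturer_ids": ["lect-1", "lect-2"],
    },
}
students = {
    "student-1": {
        "_id": "student-1",
        "name": "Sample Student",
        "course_ids": ["course-101"],
    },
}


def lecturers_for_student(student_id):
    """Follow student -> course -> lecturer references, one lookup per hop."""
    names = []
    for course_id in students[student_id]["course_ids"]:
        for lecturer_id in courses[course_id]["lecturer_ids"]:
            names.append(lecturers[lecturer_id]["name"])
    return names


print(lecturers_for_student("student-1"))
```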

4) Add minimal information to the referenced documents
Select information that does not change frequently and that the application will need most of the time.
e.g. add the course name to the course reference inside the student document: it will be needed most of the time, it saves a query against the course document, and the course name rarely changes.
Likewise, add the lecturer name to the course document: it is needed most of the time and rarely changes, and it saves us from querying the lecturer document to get the name for each course.
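
The trade-off in this step can be sketched as follows. The structure (a courses list holding course_id plus name) and the helper names are assumptions for illustration. Reads become a single-document operation, while a rename has to touch every copy, which is why this only pays off for values that rarely change.

```python
# Hypothetical data: each student keeps a copy of the course name next to the id.
courses = {
    "course-101": {"_id": "course-101", "name": "Introduction to Databases"},
}
students = [
    {"_id": "student-1",
     "courses": [{"course_id": "course-101", "name": "Introduction to Databases"}]},
    {"_id": "student-2",
     "courses": [{"course_id": "course-101", "name": "Introduction to Databases"}]},
]


def course_names(student):
    """Read path: no lookup in the courses collection is needed."""
    return [ref["name"] for ref in student["courses"]]


def rename_course(course_id, new_name):
    """Write path: update the course, then every copy held by students.

    Returns the number of student-side copies that were updated.
    """
    courses[course_id]["name"] = new_name
    updated = 0
    for student in students:
        for ref in student["courses"]:
            if ref["course_id"] == course_id:
                ref["name"] = new_name
                updated += 1
    return updated


print(course_names(students[0]))
```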

5) Revisit the documents
Check whether some documents can be dropped and folded into one of the existing documents.
For example, if you had decided on a separate document for Course Category, at this step you will see that a category has only a name value, so it is better to include it inside the course than to reference it with id + name, which would store more data, not less.
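
To make the category example concrete, here is a small comparison of the two shapes. The field names and the use of serialized JSON length as a rough size measure are assumptions for illustration only; MongoDB stores BSON, so real sizes will differ.

```python
import json

# Option A: separate category documents, referenced from the course as id + name.
course_with_refs = {
    "_id": "course-101",
    "name": "Introduction to Databases",
    "categories": [
        {"category_id": "cat-1", "name": "Databases"},
        {"category_id": "cat-2", "name": "Computer Science"},
    ],
}

# Option B: a category is just a name, so store the names directly.
course_inline = {
    "_id": "course-101",
    "name": "Introduction to Databases",
    "categories": ["Databases", "Computer Science"],
}


def inline_categories(course_doc):
    """Collapse id + name references into plain category names."""
    collapsed = dict(course_doc)
    collapsed["categories"] = [ref["name"] for ref in course_doc["categories"]]
    return collapsed


# Rough size comparison using the JSON text length of each shape.
size_refs = len(json.dumps(course_with_refs))
size_inline = len(json.dumps(course_inline))
print(size_refs, size_inline)
```

With the inline shape there is also no second collection to maintain, and a query filter such as {"categories": "Databases"} matches any course whose array contains that value.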

As we can see in the previous figure, we have identified 3 main documents with some embedded documents, selected the referenced documents, and finally included the required minimal data in each of the referenced entities.
For example, Course category has only static values, so we have included it entirely in our Course document and did not define a separate document for it.
The same goes for Student submissions, which reference the assignment but include all the required information, so there is no separate document for them.
The other information related to the course, including the content parts, assignments, etc., is also included in the course document.

The challenge here is the Lecturer object. It holds a lot of information about the lecturer, and it makes no sense to put all of that in the course document and repeat it for every course. Instead, we reference the lecturer document and copy only the minimal information that the application needs to show or use, in this case the lecturer name. The good thing about this information is that it does not change frequently either.
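
One way to keep that rule consistent is a small helper that derives the stub a course stores from the full lecturer document. The helper name lecturer_ref, the lecturer_id key and the lecturer fields (email, bio, office) are all hypothetical and used only for this sketch.

```python
# Full lecturer document: lives in its own collection (all fields are made up).
lecturer = {
    "_id": "lect-1",
    "name": "Lecturer One",
    "email": "lecturer.one@example.com",
    "bio": "Teaches database courses.",
    "office": "Room 12",
}


def lecturer_ref(lecturer_doc, fields=("name",)):
    """Return the small stub a course stores: the id plus a few stable fields."""
    stub = {"lecturer_id": lecturer_doc["_id"]}
    for field in fields:
        stub[field] = lecturer_doc[field]
    return stub


# The course carries only the stub, not the whole lecturer document.
course = {
    "_id": "course-101",
    "name": "Introduction to Databases",
    "lecturers": [lecturer_ref(lecturer)],
}

print(course["lecturers"])
```

Everything else about the lecturer stays in one place, so it is never repeated across courses.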

This is how to design documents in MongoDB for scalable applications.