fbpx

MySQL vs. MongoDB – which is better for your product?

LAST UPDATED: September 18, 2020

Barisere Jonathan

MySQL vs MongoDB

It’s the start of a new project. Everyone is present at the meeting, although each person is thinking about something different from everyone else. Someone familiar pops the question, “what stack are we using?” Although the choice of technology is important, most products do fine with the bare minimum. Familiarity is often a bigger factor than technical merits, at the start of a project. So everyone pitches their favourites, or the “company standard” is chosen by default.

This process is similar for database choice, but non-trivial information systems need to pay more attention to their choice of database management systems (DBMS). React or Vue? Meh. C# or Java? Dude, we’re a Dot NET shop. MySQL or MongoDB? Um… hold on, let’s think about that.

Data is often the most durable asset in information systems. The rate at which data is ingested, the space available to store data, the kinds of queries to be made on the data, the usage patterns, the people interested in the data, the locations from which the data can be accessed, etc. These are some concerns of information systems, and they influence the choice of DBMS.

Who chooses the DBMS?

Having established that choice of DBMS is important, I must add that responsibility for this choice is equally important. The implications of this choice spread across the entire system. Choice of DBMS influences system architecture, operation, maintenance, availability, scalability, etc. A simple decision matrix is insufficient for understanding these implications. This article will help you make a good enough decision in situations where you don’t have an expert database administrator to help, as is often the case on small teams where a “back-end developer” has to make this decision. At larger scales, experts should be responsible for these decisions.

Considerations for choosing a DBMS

As you may have guessed, choosing a DBMS can be complex. Let us address a bit of how this choice can be made between two popular DBMS software, MySQL and MongoDB. We’ll compare both based on the following.

  1. Data model
  2. Consistency requirements (foreign key constraint)
  3. Scalability
  4. Familiarity
  5. Development experience and tooling
  6. Text search support
  7. Driver support

There are more criteria than these; your database administrator should understand your requirements better.

Data Model

Expressing the relationships among data entities in an information system is an important part of database design. Different databases support different data models, and each model comes with its trade-offs. MongoDB supports the document model, and MySQL supports the relational model.

The document model and the relational model are very similar. In fact, there are trivial mappings between both models. Documents (as stored in MongoDB) are just a different spatial arrangement of tables. The term NoSQL is, in fact, a misnomer for MongoDB, because we can build SQL interfaces to MongoDB. Why are there two models if they are so similar?

While the relational model is built on relational algebra, allowing you to apply relational reasoning to your database, the document model is not formally specified. The document model seeks to represent data in a form similar to that used by an application. Therefore, relational databases such as SQL can adhere to a standard model (at least in reasoning), while document databases have ad hoc implementations.

If relational reasoning is important to your information system, then relational databases may be the best choice for you. Even though MongoDB can represent several concepts from the relational model, this similarity is only incidental: you won’t be able to easily reason in the same way if you used Google Cloud Firestore, for instance. If having a simple document interface without the restrictions of the relational model is all you need, then MongoDB fits.

Consistency requirements

An obsessive consistency is the hobgoblin of relational databases, adored by paranoid database administrators and programmers and CTOs.

In online transaction processing (OLTP) systems, data is not just written to a database once. It is often subsequently read and modified. Users update their purchase orders, delete their cat pictures before we get a chance to like them, or even delete their accounts because they are feeling bad on a Saturday morning (it happens). Usually, multiple users access the same data simultaneously, and some of them may try to modify the data. Data that are referenced elsewhere in the system may be deleted. In all these cases, we want to ensure that the minimum required consistency guarantees of our system holds.

By consistency, I mean that:

  1. If there are several copies of the same data in the system, they must be in sync with one another. The synchronization may be delayed, but all copies must eventually converge on the same value.
  2. Operations on the data stored by the system must leave them in a valid state.

One such constraint on consistency is the foreign key constraint. It basically means that if a piece of data refers to another piece of data elsewhere in the system, that reference must exist at all times. For example, if my Twitter profile says I follow Sokari Gillis-Harry, then Sokari’s Twitter account must exist (how else would I follow him?).

MySQL (and other relational databases) supports and enforces the foreign key constraint, but MongoDB does not. This isn’t a terrible thing, there are benefits to it and drawbacks. Foreign key constraints mean that database writes and deletions can take longer because the DBMS has to look around and verify that everything is in order; combining data from various tables also takes longer because JOINs are expensive. If your use case doesn’t require that (say you are building an alt-Twitter where we can follow ghosts), then MongoDB is fine. But if you are taking my ice cream orders, you really want this constraint, and MySQL is better for you.

Consistency and duplication

“To normalise, or not to normalise, that is the question:

Whether ’tis better for the system to suffer

The confusion and wastage of data duplication,

Or to group the data into a thousand little tables,

And by joining end them…”

The foreign key constraint often comes in when we put our database in third normal form (3NF). We do this to avoid data repetitions across rows in a table, because such duplicated data can get out of sync easily (even if we try keeping them in sync, we end up doing much unnecessary work). But there are cases where what might appear as duplication is actually not a duplication.

Consider a platform that sells tickets for online meet-ups. A typical meetup has not more than 50 attendees. Each ticket is unique in the sense that they cannot reuse tickets. Meetup organisers can announce their schedule and capacity, and attendees can register for upcoming meetups.

We can model that as a one-to-many relationship among users and tickets, and among meetup events and tickets, in our database. An ER diagram for such a model may look like the following.

Meetup Ticketing Relationships ER Diagram
Meetup Ticketing Relationships ER Diagram

However, modelling the tickets-meetup relationship in this way can introduce complexity that we don’t need. Because each meetup event is unique, as is each ticket, there cannot be duplication. We can consider each meetup and its tickets as a single, coherent entity. Therefore, we can store the tickets for each meetup as an array on the meetup document in MongoDB, avoiding the need for joining events and tickets using foreign keys. A sample document following this model may look like the following JSON document.

{
  "_id": "5f5b58bb0db1a0694a340966",
  "name": "Port Harcourt Elixir Users Group",
  "date": "2020-10-20T11:30:00+01",
  "organizer": "5f5b598b0db1a0694a340967",
  "max_attendees": 50,
  "tickets": [
    {
      "_id": 11,
      "user": "5f5b598b0db1a0694a340968",
      "n_seats": 2,
      "purchase_txn": {
        "txn_ref": "",
        "cents_amount": 400,
        "date": "2020-09-20T12:30:00+01",
        "channel": "USSD"
      }
    }
  ]
}

Before you normalise, consider whether it is possible to represent your data in a way that preserves consistency and avoids duplication.

Scalability

As more users use your information system, their activities generate more data and traffic to the database. As more users sign up, you store more user accounts; as more purchases are made, you store more purchase orders; as more cat photos are uploaded,… you get the idea.

Increased load on a DBMS increases its use of system resources: CPU works harder, we use more memory, more I/O activity occurs. Soon a single server can no longer handle the load, so you add more resources to cope with the load. This is known as scaling.

There are two ways to scale your database. You can buy or rent a bigger server computer with more memory, a faster CPU with more processor cores, a faster network connection, etc. This is known as vertical scaling. Alternatively, you can buy more servers like the one you already have, then connect them together so they share the load. This is known as horizontal scaling.

Which method of scaling should you go for? As with many things in information systems, “it depends…” Vertical scaling usually keeps your applications the same as they were. Nothing changes, just more capacity. But it can get expensive quickly. While horizontal scaling moves you into the realm of distributed systems. Because your server machines are now physically separate, it takes time for shared data to be copied onto each one (replication lag). Your applications now need a way of knowing which machine to talk to (service discovery). Your server cluster also needs a way of knowing which machine holds the most up-to-date data, as there may be occasional differences between what data each one holds. Despite these problems, horizontal scaling can make your database more available, because another machine can service requests if one is down. You can also use partitioning to distribute traffic evenly among the database servers.

Whichever method of scaling you choose, know how your DBMS supports it. Both MongoDB and MySQL can be configured for either method, but the ease of setting them up varies. MySQL usually rely on third-party tools for horizontal scaling, while MongoDB has excellent support for horizontal scaling built in.

Familiarity

I mentioned earlier that familiarity is often a big factor in choosing implementation technology. The people that will use and operate your DBMS have to be familiar with it. The best tool is the one you can use effectively.

MySQL (and relational databases) have been around for a long time. Even if your team is not familiar with MySQL, they may be familiar with other RDBMS such as SQLite and PostgreSQL. Their experience with these other technologies can apply to MySQL easily.

MongoDB is newer, and there may not be many people familiar with it. Of course this has changed over the years, and MongoDB is popular. But if your team cannot use either effectively, it may be a good idea to use what they are familiar with, or to train them on it. Your DBMS is a critical part of your infrastructure, so you want to be sure that your team understands it well.

Development experience and tooling

The welfare of your development team matters as much as the product they are building. Ask any person who has ever used a crude tool how terrible it feels. Neither MySQL nor MongoDB are crude, but from our experience MongoDB is a much more pleasant technology to work with. There is a cohesive set of products to support applications, including the recently released MongoDB Realm that provides secure data access from client applications like mobile apps. Besides the core database, MongoDB’s claim to be “the database for modern applications” has some teeth to it. Of course your programming language and libraries also matter: F#’s SQL data provider is a brilliant tool that any programmer would crave. Whenever possible, pick the tools that make life easier for your team, there’s enough of other troubles to worry about.

Text search support

You may need to enable users to find things on your information system by searching for a phrase. When they submit a search phrase, you want to give them the most relevant results, the ones that match the search phrase better. The best way to do thus is by querying a separate search engine (such as Elasticsearch or Solr) to find the best matches. But you may not have the appetite to maintain a search server, especially if you’re building an MVP. The volume of text search queries may be too low to even bother. In such cases, MongoDB can serve you well. In our experience, MongoDB does much better than MySQL at text search, although it can’t compete with the likes of Elasticsearch. We have used MongoDB to handle text search in successful products, and you can do the same. If you have this use case, MongoDB is a good option.

Database Driver support

A database driver is a piece of software that allows you to access your DBMS from your application. Most databases have some protocols for communicating with them; the driver software formats your communication with the database in terms of its protocol.

Drivers are usually written for programming languages. If a driver is available for your chosen programming language, then you can access your database from that programming language. But if there is no driver, you may have to write one yourself or use another programming language.

MySQL, being an older technology, supports more programming languages than MongoDB.

TL;DR

 

What I want MongoDB MySQL
Consistency with foreign keys ✔✔
Easy horizontal scalability ✔✔
Something familiar ✔✔
A better development experience ✔✔
Basic text search support ✔✔
Support for many programming languages ✔✔
A data model that closely resembles my programming language’s data types ✔✔

 

Your choice of a database management system is important. Your information system builds on top of the data it stores. Databases can quickly become serious bottlenecks in an information system, and managing them is tricky because they are inherently stateful. The choice of what DBMS to use should be made thoughtfully. Your data model, your consistency requirements, the expected amount of load, your team’s familiarity with the DBMS, support for specific operations such as text search, and driver availability for your database clients, are factors you should consider when choosing a DBMS.

Sometimes what you think is an absolute requirement goes away if you formulate the problem differently. Explore different ways to represent your data, then pick the one that requires the least amount of constraints to meet your needs. Although familiarity is important, consider the best solution given your constraints.

About the Author

Message Sent

Let's connect