Decision Making: Relational or NoSql
From some time, products are leaning towards NoSql databases because of number of advantages that they provide compared to Relational databases (RDBMS), especially in today’s distributed systems. There is always a pressure of delivering things faster to make it Live to end user. But does that means relational databases won’t be able to compete with NoSql databases, provided Relational Databases still best known for adhering to ACID property.
Here, I am going to explore the possibilities from Relational Databases perspective and how they come a long way to compete with NoSql Databases. I am going to compare two managed databases from AWS i.e. DynamoDB and Aurora, to see is it really worth to give credit to one type of database over other?
When it comes to distributed systems or databases, there is fundamental difference in architecture of NoSql and Relational Database distributed DB modeling based on the priorities that they opted for. That is a common design pattern used in most of the databases available.
Relational Databases mainly uses Master-Slave Architecture and on contrary in NoSql Databases there is no such master-slave or primary secondary concept.
Following are the diagrams depicting these two patterns:
According to CAP theorem (based on Wikipedia):
- Consistency: Every read receives the most recent write or an error
- Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write
- Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
Generally, one can divide some common databases on the basis of CAP theorem and can get idea what is compromised by what type of DB.
Relational DB adheres to ACID property, hence they mainly prefer ‘Consistency’ and ‘Availability’ over ‘Partition tolerance’ and that is why the architecture pattern observed for distributed Relational Databases is generally Master-Slave, as that suits their need best. While, NoSql DB mainly Prefer ‘Partition tolerance’ to provide the high scalability, hence they can additionally provide either ‘Consistency’ or ‘Availability’.
Advantages of using NoSql over Relation
NoSql can provide following advantages that are usually considered while opting for NoSql over Relational Database:
- It can allow user to store structured or semi-structured or unstructured data.
- It can scale fast on heavy load, due to its Scale-Out capability (i.e. Horizontal Scaling).
- Zero downtime, if any of the DB instance is fails anytime.
A Unique Comparison
We are aware of Relational Database capabilities and their major capability of supporting ACID property all the time. That make this type of database as most reliable, but due to some of the advantages that NoSql is providing, shouldn’t make Relational Database out of the game. But if somehow, Relational DB start to provide NoSql capabilities to some extent that can fulfill our requirements then people would definitely prefer Relational DB for sure.
Let’s now explore this, on the basis of two managed Databases feature provided by AWS i.e. Aurora (Relational Database) and DynamoDB (NoSql).
Consistency: Aurora ensures Strong Consistent Read, this is because of its Architectural configuration in distributed system. Aurora splits a DB instance in two parts, one is its light weight Compute Engine and second is the Storage volume. On Write Request, Aurora’s master instance write data to each storage volume in cluster in Sync, hence ensuring consistent data across cluster. Depending on frequency of updates, Aurora mostly takes less than 100ms lag to replicate data within cluster volume after any write over primary Instance. On contrary, DynamoDB gives eventual Read Consistency (for base table and GSI), with a lag of ~1–2 seconds across replicas.
e.g. For an update frequency of 13.8K updates/second, Aurora takes 7.27ms replica lag**.
Throughput: DynamoDB provides can provide very high throughput and can scale out and in i.e. Horizontally scale without any known limit. But if we know how much load our system can expect today or in future then Aurora is not very far too in this race. Aurora cluster can also horizontally scale (i.e. scale out) for Read requests via running the replica instances (with a max of 15). For Write Requests there is no option of Scale out, only Scale Up (i.e. Vertical Scaling) is possible. But do check this performance stats before making decision here too.
Depending on the workload, an Aurora cluster with a single r4.16xlarge DB instance can process more than 600,000 SELECT statements per second. Again depending on workload, such a cluster can process 200,000 INSERT, UPDATE and DELETE statements per second. This read/write throughput can be further increased by horizontal/vertical scaling.**
Horizontal Scaling is easier and faster in case of Aurora, reason being the only requirement of spinning up new Compute engine only as Data replica is already there across cluster.
Concurrency: Aurora can serve concurrent requests in a better manner compared with DynamoDB. Any concurrent Write Requests will always go to single DB instance i.e. Master node and that will ensure no Data inconsistency also via internal locking mechanism. But in case of DynamoDB, it’s true that multiple Instances can pick Concurrent Requests but that will corrupt data in some way for sure.
Connectivity: Connectivity to Aurora is via database connectivity protocols and we have ORM supports to make that easy. For DynamoDB, connectivity and data read/write requests are using HTTP(s) protocols. DynamoDB comes with AWS SDK to make that middle work easier for developer.
Global Replication: Both Databases provides support to replicate data across regions via Global tables or Database.
Storage Capacity: There is no storage capacity limit so far with DynamoDB. But Aurora has a limit, but that is also huge i.e. 128 TB (as of now).
Data Backup: DynamoDB provides Point-In-time recovery and On-Demand backup features. But Aurora provides Continuous backup feature to S3, which can’t be disabled. Hence, data is always current and safe, in case of failure and restoration would be needed.
Failover Recovery: Any failure in Primary DB Instance in Aurora will results in promotion of one of the Replica as primary (depending on Priority given, if any). This failover recovery takes <~1–2 minutes and requests to primary instance faces downtime. If there is no Replica created then it could take around 10min to restore primary instance. This downtime ensures no data inconsistency or corruption in the system. While no downtime in case of DynamoDB with some chances of data corruption though, if instance fails before data sync across cluster.
Partitioning Strategy: Partitioning of requests in Aurora is via load balancer (i.e. Horizontal scaling), but for DynamoDB request partitioning is using Sharding on the basis of Partition Key.
Transactional Support: All relational database provides transactional support and Aurora is not different here. But unlike common NoSql DB characteristic, DynamoDB provides transactional support with some limitations though.
Reactive Paradigm Support: DynamoDB works well with reactive paradigm. For now, all RDBMS doesn’t have support to work with Reactive stack. But good news here is, Aurora which is based on either MySql/PostgreSql engine and R2DBC provides reactive driver support for both of these.
JSON Support: DynamoDB (and other NoSql) supports unstructured or semi-structured data i.e. kind of schema less data model due to the support of JSON data storage and retrieval. But here too, Aurora (based on MySql/PostgreSql) provides support for JSON column and retrieval based on any of its attributes. It also supports index creation on JSON attributes using some tweaks.
Costing: In case of DynamoDB costing is based on RCU/WCU. DynamoDB charges one write request unit for each write (up to 1 KB) and two write request units for transactional writes. For reads, DynamoDB charges one read request unit for each strongly consistent read (up to 4 KB), two read request units for each transactional read, and one-half read request unit for each eventually consistent read. For Aurora, costing is based on chosen DB instance, I/O requests, backup storage, Data transfer.
Going through all these features provided by both of these databases, one can make decision on the selection of Relational or NoSql database. Yes, NoSql (in some cases) have edge over Relational Database, but that can be analyzed based on the requirement, whether that can be compromised or not. If one is looking for ACID database property and wanted to ensure that every transaction comply that and wanted to get the flexibilities provides by NoSql DB, then today it’s possible as Relational Databases also providing very competing features in addition to its core feature.