RDBMS vs NoSQL vs NewSQL

SQL and the traditional RDBMS remain an essential, long-lasting skill for every programmer. SQL is valuable because it is more or less a universal standard for interacting with any database. RDBMSs are valuable because they provide ACID guarantees, which drastically simplify a programmer's life, especially in a web environment.
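To show what the ACID guarantee buys in practice, here is a minimal sketch of atomicity using Python's built-in `sqlite3` module. The `accounts` table and the `transfer` function are hypothetical examples: a transfer that fails halfway is rolled back, so the database never shows money that was debited but not credited.

```python
import sqlite3

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    # The connection's context manager commits the pending transaction if the
    # block succeeds and rolls it back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Abort midway: the debit above must not survive.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

transfer(conn, "alice", "bob", 30)         # succeeds: alice 70, bob 30
try:
    transfer(conn, "alice", "bob", 500)    # fails after the debit and is rolled back
except ValueError:
    pass

print(dict(conn.execute("SELECT name, balance FROM accounts")))  # {'alice': 70, 'bob': 30}
```

Without the transaction, the failed second transfer would have left alice with a negative balance and bob unchanged; the programmer would have to write that compensation logic by hand.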

However, an RDBMS suffers from scalability and concurrency problems at web scale. The common techniques, one master plus N slaves (read replicas) or data sharding, only buy a few times more capacity before the same problems return. They also impose limitations: replication only helps when the read/write ratio is heavily skewed towards reads, and with sharding the partition key has to be chosen carefully, so designers must be aware of these limitations up front.
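To make the partition-key concern concrete, here is a minimal sketch of hash-based shard routing, assuming a plain `hash(key) mod N` scheme. `shard_for` and `moved_fraction` are hypothetical helpers, not part of any particular database; the sketch measures how many keys land on a different shard when the cluster grows from 4 to 5 shards.

```python
import hashlib

def shard_for(partition_key: str, num_shards: int) -> int:
    # Use a stable hash (SHA-256) so every process maps a key to the same shard;
    # Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def moved_fraction(keys: list[str], old_n: int, new_n: int) -> float:
    # Fraction of keys whose shard changes when the shard count changes.
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)

keys = [f"user:{i}" for i in range(10_000)]
print(moved_fraction(keys, 4, 5))
```

With plain modulo hashing, a key stays put only when its hash leaves the same remainder mod 4 and mod 5, so roughly 80% of the keys have to move when a fifth shard is added; consistent hashing is the usual way to reduce this. Likewise, any query that does not filter on the partition key has to be sent to every shard.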

Some people have started moving to NoSQL; whether the name stands for "No SQL" or "Not Only SQL" is debatable. From an engineering point of view, however, NoSQL is a solution to specific problems, not a silver bullet.

In general, NoSQL databases can be further divided into the following categories:
– Graph Database – Neo4j
– Document Database – MongoDB
– Key-Value Database – Redis / Memcached
– Column-Oriented Database – Cassandra
– Time Series Database (Extension) – Riak TS / OpenTSDB

Each of them targets a particular data model or use case. For example, graph databases handle parties and the relationships between them very efficiently, while document databases handle hierarchical data better than an RDBMS. In general, we gain scalability by giving up transaction capability.
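As an illustration of why hierarchical data suits a document store, the sketch below stores the same order twice: once as a single nested document (a plain Python dict serialized to JSON, standing in for a document database; this is not MongoDB's API) and once normalized into two relational tables in `sqlite3`. The order, tables and field names are hypothetical.

```python
import json
import sqlite3

# Document model: the whole hierarchy lives in one record and is read in one lookup.
order_doc = {
    "order_id": 1,
    "customer": "alice",
    "items": [{"sku": "book", "qty": 2}, {"sku": "pen", "qty": 5}],
}
documents = {order_doc["order_id"]: json.dumps(order_doc)}  # stand-in for a document store
loaded = json.loads(documents[1])

# Relational model: the same hierarchy is normalized into two tables
# and has to be reassembled with a JOIN at read time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (order_id INTEGER REFERENCES orders, sku TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'alice');
    INSERT INTO order_items VALUES (1, 'book', 2), (1, 'pen', 5);
""")
rows = conn.execute("""
    SELECT o.order_id, o.customer, i.sku, i.qty
    FROM orders o JOIN order_items i ON i.order_id = o.order_id
    WHERE o.order_id = ?
    ORDER BY i.rowid
""", (1,)).fetchall()

# Rebuild the nested structure from the flat join result.
rebuilt = {
    "order_id": rows[0][0],
    "customer": rows[0][1],
    "items": [{"sku": sku, "qty": qty} for _, _, sku, qty in rows],
}
print(rebuilt == loaded)  # True
```

Both paths produce the same structure, but the document path is a single key lookup, while the relational path needs a join plus reassembly code, and the cost of both grows with every extra level of nesting.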

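The famous CAP theorem (Brewer's theorem) states that a distributed data store cannot guarantee Consistency, Availability and Partition tolerance at the same time; we can pick at most two of the three. Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition lasts. Every distributed NoSQL database and RDBMS is bound by this rule, with NO EXCEPTION.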
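The trade-off during a partition can be shown with a toy model of two replicas. The `Replica` class, the `Unavailable` exception and the `"CP"`/`"AP"` mode strings are all hypothetical; real databases replicate very differently, but the choice they face when the network splits is the same.

```python
from __future__ import annotations

class Unavailable(Exception):
    """Raised when a CP replica refuses a write it cannot keep consistent."""

class Replica:
    def __init__(self, name: str):
        self.name = name
        self.data: dict[str, str] = {}
        self.peers: list[Replica] = []
        self.partitioned = False  # True means this replica cannot reach its peers

    def write(self, key: str, value: str, mode: str) -> None:
        if self.partitioned:
            if mode == "CP":
                # Consistency first: refuse rather than let the replicas diverge.
                raise Unavailable(f"{self.name}: cannot replicate during partition")
            # AP: availability first, accept locally and let the replicas diverge.
            self.data[key] = value
            return
        # No partition: apply locally and synchronously copy to every peer.
        self.data[key] = value
        for peer in self.peers:
            peer.data[key] = value

    def read(self, key: str) -> str | None:
        return self.data.get(key)

def demo(mode: str) -> tuple[str, str | None, str | None]:
    a, b = Replica("A"), Replica("B")
    a.peers, b.peers = [b], [a]
    a.write("x", "1", mode)                # replicated to both replicas
    a.partitioned = b.partitioned = True   # the network splits A from B
    try:
        a.write("x", "2", mode)
        outcome = "accepted"
    except Unavailable:
        outcome = "rejected"
    return outcome, a.read("x"), b.read("x")

print(demo("CP"))  # ('rejected', '1', '1'): consistent, but the write was not available
print(demo("AP"))  # ('accepted', '2', '1'): available, but the replicas disagree
```

Neither mode is wrong; the CP replica gives up availability and the AP replica gives up consistency, which is exactly the two-out-of-three choice once partition tolerance is required.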

SQL is still widely used today, since much software is not really web scale, or management does not know about, or does not want to pay, the cost of going web scale. So, what if we could have SQL and RDBMS capabilities while also getting NoSQL scalability? That became the goal of NewSQL. Google (The God, again) has published a paper describing Spanner, which is now in production on Google Cloud Platform.

https://research.google.com/archive/spanner.html

In short, Spanner tries to detach the TX (transaction) manager from the underlying storage managers, so that for each query or update the TX manager only acquires the relevant storage managers. Since storage is handled independently of the TX manager, the storage units are usually large (e.g. 64MB) and can be distributed, which fits well with AWS S3 or Google Cloud Storage. However, since every transaction involves network operations, the overall performance is expected to be slower than that of a local database with a smaller dataset.
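The separation between a TX manager and its storage managers can be sketched as a coordinator running two-phase commit over only the nodes a transaction touches. Everything here is hypothetical: `StorageManager`, `TxManager` and the first-character routing rule are illustrative, and method calls stand in for the network round trips that make such a system slower than a local database. Spanner itself layers two-phase commit over Paxos-replicated groups and uses TrueTime for timestamps, none of which is modeled.

```python
from __future__ import annotations

class StorageManager:
    """Hypothetical storage node owning a subset of the keys."""
    def __init__(self, name: str):
        self.name = name
        self.data: dict[str, int] = {}
        self.staged: dict[int, dict[str, int]] = {}  # tx_id -> pending writes
        self.fail_prepare = False  # set to True to simulate a node voting "no"

    def prepare(self, tx_id: int, writes: dict[str, int]) -> bool:
        if self.fail_prepare:
            return False
        self.staged[tx_id] = dict(writes)
        return True

    def commit(self, tx_id: int) -> None:
        self.data.update(self.staged.pop(tx_id, {}))

    def abort(self, tx_id: int) -> None:
        self.staged.pop(tx_id, None)

class TxManager:
    """Hypothetical coordinator: routes keys to storage managers and runs 2PC."""
    def __init__(self, storage: list[StorageManager]):
        self.storage = storage
        self.next_tx = 0

    def _owner(self, key: str) -> StorageManager:
        # Toy routing on the key's first character; real systems use range or hash partitioning.
        return self.storage[ord(key[0]) % len(self.storage)]

    def execute(self, writes: dict[str, int]) -> bool:
        self.next_tx += 1
        tx_id = self.next_tx
        # Acquire only the storage managers that own the keys being written.
        per_node: dict[StorageManager, dict[str, int]] = {}
        for key, value in writes.items():
            per_node.setdefault(self._owner(key), {})[key] = value
        # Phase 1 (prepare): every participant must vote yes.
        prepared: list[StorageManager] = []
        for node, node_writes in per_node.items():
            if not node.prepare(tx_id, node_writes):
                for p in prepared:
                    p.abort(tx_id)
                return False
            prepared.append(node)
        # Phase 2 (commit): apply the staged writes everywhere.
        for node in prepared:
            node.commit(tx_id)
        return True

sm0, sm1 = StorageManager("sm0"), StorageManager("sm1")
tm = TxManager([sm0, sm1])
print(tm.execute({"alice": 70, "bob": 30}))   # True: both storage managers commit
sm0.fail_prepare = True
print(tm.execute({"alice": 0, "bob": 100}))   # False: sm0 votes no, sm1's staged write is aborted
print(sm0.data, sm1.data)                     # {'bob': 30} {'alice': 70}
```

Each prepare and commit would be a network call in a real deployment, so a transaction spanning N storage managers pays at least two round trips per participant, which is where the performance cost mentioned above comes from.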

There are other implementations you can run on your own infrastructure, such as VoltDB and CockroachDB.