
DATA ON AWS
Specialty Databases in AWS — QLDB, Timestream, Neptune, and Keyspaces
A quick overview of the different specialty databases in AWS and their non-AWS alternatives
AWS probably leads the cloud platforms in the number and maturity of its DBaaS offerings. Beyond its most popular database services like RDS, Redshift, and DynamoDB, AWS now offers several specialty databases like QLDB, Timestream, and Neptune. In this article, I’ll summarize these database offerings in plain English.
One can categorize the database offerings broadly into two categories — relational and non-relational. Still, that categorization doesn’t really help a lot these days because there are so many different types of non-relational databases. There’s a noticeable skew there. Rather than categorizing them, we’ll look at each of the databases individually. Let’s start with the latest additions first.
Timestream
Timeseries databases can be both relational and non-relational. The timeseries space is occupied mainly by the big guns of the TSDB arena like InfluxDB and TimescaleDB. Recently, there have been a few disruptors in the space like QuestDB. Amazon’s Timestream, which Amazon has built from the ground up, hopes to occupy some of that space.
As it’s early days for this database, don’t expect it to be as feature-rich as the older ones, but given AWS’s release history, one can expect it to catch up quickly. If you want to compare Timestream with other timeseries databases, you can use the timeseries benchmark suite (maintained by TimescaleDB). I recently wrote a tutorial with instructions for running it.
Timeseries databases have seen the biggest spike in popularity after graph databases in the last decade. Use-cases for timeseries databases are likely to keep growing in the time to come.
QLDB
No, this is not a blockchain protocol implementation. It does, however, borrow some blockchain concepts, such as cryptographic verifiability and ledger immutability. QLDB stands for Quantum Ledger Database. It’s a database that, by design, records every change in such a fashion that you can cryptographically verify the current and any historical state of the data. This is especially useful for fighting financial crime, auditing, etc.
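To make "cryptographically verifiable" a little more concrete, here is a minimal sketch of a hash chain in plain Python. It is not how QLDB is implemented and it does not use any QLDB API; the `ToyLedger` class and the `entry_hash` helper are hypothetical names made up for this illustration. The idea it shows is simply that each entry's hash depends on the previous entry's hash, so changing any historical record breaks verification.

```python
import hashlib
import json


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash a record together with the hash of the record before it."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class ToyLedger:
    """Hypothetical append-only ledger, for illustration only."""

    GENESIS = "0" * 64  # placeholder hash used before the first entry

    def __init__(self):
        self.entries = []  # list of (payload, stored_hash) tuples

    def append(self, payload: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        digest = entry_hash(prev, payload)
        self.entries.append((payload, digest))
        return digest

    def verify(self) -> bool:
        """Recompute every hash in order and compare with what was stored."""
        prev = self.GENESIS
        for payload, stored in self.entries:
            if entry_hash(prev, payload) != stored:
                return False
            prev = stored
        return True


ledger = ToyLedger()
ledger.append({"account": "A", "balance": 100})
ledger.append({"account": "A", "balance": 70})
ok_before = ledger.verify()

# Tamper with the first record while keeping its old hash.
ledger.entries[0] = ({"account": "A", "balance": 1000}, ledger.entries[0][1])
ok_after = ledger.verify()
```

Here `ok_before` is `True` and `ok_after` is `False`: the edited record no longer matches the hash that was stored for it, which is the kind of check a verifiable ledger lets you run over the full history.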
Until now, when it comes to financial transactions, a relational database is the de facto standard. NoSQL databases have offered schema flexibility and scalability, but none of those has provided the level of ACID compliance that’s really required. It’s yet to be seen if databases like QLDB will beat the speed and accuracy of transactional databases like Oracle, MySQL, and PostgreSQL.
QLDB uses Amazon Ion, Amazon’s extension of JSON, to store the data. Feature-wise, Ion is a superset of JSON.
It’s worth repeating that QLDB is not like any other database out there. There’s no natural alternative to this right now. Looking up on Google, you might end up comparing Hyperledger with QLDB, but that won’t be a fair comparison.
Neptune
Graph databases have seen a meteoric rise in the last decade. Early players like Neo4j and ArangoDB have acquired a chunk of the market share, but many other players are jumping in now. Amazon’s Neptune is one of the late additions to the list of graph database offerings. Again, because of its comparatively recent launch, Neptune might lack many of the features of a database like Neo4j, but it integrates seamlessly with all the other AWS services, and this is a big plus!
Neptune is an ACID-compliant database that stores data in edges and vertices. It supports queries in two languages — SPARQL and TinkerPop Gremlin. More might be on the way. Neptune might be significantly more available and scalable than any other hosted database. Like Aurora, it offers multi-AZ high availability and asynchronous replication for up to 15 read replicas.
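If "edges and vertices" sounds abstract, the following plain-Python sketch shows the data model and a simple traversal. It does not talk to Neptune and is not Gremlin or SPARQL; the sample data and the `neighbors` and `reachable` functions are assumptions invented for this illustration.

```python
from collections import deque

# Vertices carry properties; edges are (source, label, destination) triples.
vertices = {
    "alice": {"label": "person"},
    "bob": {"label": "person"},
    "carol": {"label": "person"},
}
edges = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
]


def neighbors(vertex_id, edge_label):
    """Vertices reachable from vertex_id over one outgoing edge with this label."""
    return [dst for src, label, dst in edges
            if src == vertex_id and label == edge_label]


def reachable(start, edge_label, max_hops):
    """Breadth-first traversal limited to max_hops edges from the start vertex."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        vertex_id, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for nxt in neighbors(vertex_id, edge_label):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                frontier.append((nxt, hops + 1))
    return found


direct_friends = reachable("alice", "knows", 1)
friends_within_two_hops = reachable("alice", "knows", 2)
```

A graph query language expresses this kind of "follow the `knows` edges up to two hops" question directly, which is the main reason to reach for a graph database over joins in a relational one.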
Keyspaces
Keyspaces is AWS’s managed, Apache Cassandra-compatible database service. Other significant players like DataStax have also envisioned a future where Cassandra is cloud-native, and like DataStax’s AstraDB, Keyspaces is a serverless offering. Currently, AWS offers no other managed way to provision a Cassandra cluster. You can run one yourself on EC2, but then you can do that on any cloud platform.
Although Cassandra is very restrictive in the types of queries you can write to read data from the database, it is surprisingly good with many different use cases like streaming data ingestion, timeseries data, bulk data ingestion, etc. Cassandra is especially useful when you have really wide datasets. Many businesses use it as a dumping ground for data. Zomato moved away from Redis to Cassandra about five years ago to enable high scalability in their food feed feature. Now, Cassandra is a central part of their architecture.
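The query restriction mentioned above mostly comes down to this: reads are expected to name the partition they want. The sketch below imitates that in plain Python. It is a simplification, not Cassandra or Keyspaces code (real Cassandra also has secondary indexes and `ALLOW FILTERING`), and the `ToyWideTable` class and its methods are hypothetical names for this illustration.

```python
from bisect import insort


class ToyWideTable:
    """Hypothetical table: rows are grouped by partition key and kept
    sorted by a clustering key (here, an integer timestamp)."""

    def __init__(self):
        self._partitions = {}  # partition key -> sorted list of (clustering_key, value)

    def insert(self, partition_key, clustering_key, value):
        rows = self._partitions.setdefault(partition_key, [])
        insort(rows, (clustering_key, value))  # keep rows ordered within the partition

    def select(self, partition_key=None, since=None):
        if partition_key is None:
            # Without a partition key, the store would have to scan everything.
            raise ValueError("a partition key is required to read rows")
        rows = self._partitions.get(partition_key, [])
        return [(ck, v) for ck, v in rows if since is None or ck >= since]


table = ToyWideTable()
table.insert("sensor-1", 3, 21.5)
table.insert("sensor-1", 1, 20.0)
table.insert("sensor-2", 2, 18.2)

all_rows = table.select("sensor-1")
recent_rows = table.select("sensor-1", since=2)
```

Reads that name a partition are cheap and come back already ordered, which is why this layout suits timeseries-style and ingestion-heavy workloads even though ad hoc queries are awkward.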
Conclusion
The data stack, which probably has relational databases and data warehouses right now, will have specialty and boutique databases and data warehouses too in the near future. That’s why learning about these specialty database offerings is essential! Timeseries databases own only about a 1% share of the market right now, but that might increase with time if people realize that it makes more sense to use timeseries databases over traditional relational databases for specific use-cases.
What’s next?
I will follow this article with an overview of the remaining databases and a cheat sheet (practical user’s guide) for each of the databases we discussed. I will also discuss other specialty databases that might be coming up soon for handling other types of data, for instance, geospatial data.
