Specialty Databases in AWS — QLDB, Timestream, Neptune, and Keyspaces


A quick overview of the different specialty databases in AWS and their non-AWS alternatives

AWS probably leads the cloud platforms in the number and maturity of its DBaaS offerings. Beyond its most popular database services, such as RDS, Redshift, and DynamoDB, AWS has started offering many specialty databases, such as QLDB, Timestream, Neptune, and Keyspaces. In this article, I’ll summarize these specialty database offerings in plain English.

Broadly, these offerings fall into two categories: relational and non-relational. That split doesn’t help much these days, though, because the non-relational side now covers so many different kinds of databases (timeseries, ledger, graph, wide-column, and more) that the two buckets are heavily lopsided. Rather than categorizing them, we’ll look at each database individually, starting with the latest additions.

Timestream

Timeseries databases can be either relational or non-relational. The timeseries space is dominated by the big guns of the TSDB arena, such as InfluxDB and TimescaleDB, joined recently by a few disruptors like QuestDB. Amazon built Timestream from the ground up, hoping to claim some of that space.
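Timestream stores each data point as a record. A record is a set of dimensions (metadata identifying the series, such as a host or region), a named measure with its value, and a timestamp. The following minimal sketch builds one record in the shape of the `Records` parameter that Timestream’s `WriteRecords` API accepts. The helper `build_timestream_record`, the metric and dimension names, and the database and table names are all hypothetical. The boto3 call appears only as a comment, so the example runs without AWS credentials.

```python
import time


def build_timestream_record(measure_name, value, dimensions, time_ms=None):
    """Build one record shaped like an entry in Timestream's WriteRecords `Records` list.

    Hypothetical helper: `dimensions` is a dict of series metadata (e.g. host, region),
    and the measure is the value that changes over time.
    """
    if time_ms is None:
        time_ms = int(time.time() * 1000)  # current time in epoch milliseconds
    return {
        "Dimensions": [
            {"Name": name, "Value": str(v), "DimensionValueType": "VARCHAR"}
            for name, v in sorted(dimensions.items())
        ],
        "MeasureName": measure_name,
        "MeasureValue": str(value),  # Timestream takes measure values as strings
        "MeasureValueType": "DOUBLE",
        "Time": str(time_ms),
        "TimeUnit": "MILLISECONDS",
    }


records = [
    build_timestream_record(
        "cpu_utilization", 42.5, {"host": "web-1", "region": "us-east-1"},
        time_ms=1_700_000_000_000,
    )
]

# With boto3 installed and credentials configured, the payload would be sent as
# (database and table names are hypothetical):
#   boto3.client("timestream-write").write_records(
#       DatabaseName="my_db", TableName="my_table", Records=records)
print(records[0]["MeasureValue"])  # 42.5
```

Keeping dimensions separate from the measure is what lets a timeseries database group and filter points by series cheaply. A row-per-reading relational table can store the same data but is not organized around that access pattern.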

As it’s early days for Timestream, don’t expect it to be as feature-rich as the established options. Given AWS’s release history, though, it will likely gain features quickly. If you want to compare Timestream with other timeseries databases, you can use the Time Series Benchmark Suite (TSBS), maintained by the team behind TimescaleDB. I recently wrote a tutorial, “QuestDB vs. TimescaleDB”, that shows how to use the suite to compare the read and write performance of QuestDB and TimescaleDB, and you can follow its instructions.

QLDB

No, this is not a blockchain protocol implementation. It does, however, borrow some blockchain concepts, such as cryptographic verifiability and ledger immutability. QLDB stands for Quantum Ledger Database. By design, it records every change in an append-only journal so that you can cryptographically verify both the current state and any historical state of the data. This is especially useful for fighting financial crime, auditing, and similar use cases.
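The core idea behind this kind of verifiability is a hash chain. Each entry commits to the hash of the entry before it, so altering any historical entry invalidates every hash that follows. The following Python sketch is a simplified, hypothetical illustration built on `hashlib`. The `HashChainLedger` class is not a QLDB API. QLDB’s actual journal format and its Merkle-tree-based proofs are more involved and are not reproduced here.

```python
import hashlib
import json


class HashChainLedger:
    """Toy append-only ledger (hypothetical): each entry stores the previous entry's
    hash, so changing any historical entry breaks the chain from that point on."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []  # each entry: {"data": ..., "prev_hash": ..., "hash": ...}

    @staticmethod
    def _hash(data, prev_hash):
        # Canonical JSON (sorted keys) so identical documents always hash identically.
        payload = json.dumps({"data": data, "prev": prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, data):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"data": data, "prev_hash": prev_hash, "hash": self._hash(data, prev_hash)}
        self.entries.append(entry)
        return entry["hash"]

    def digest(self):
        # The latest hash commits to the entire history, similar in spirit to a ledger digest.
        return self.entries[-1]["hash"] if self.entries else self.GENESIS

    def verify(self):
        # Recompute every hash from the start and check each link in the chain.
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != self._hash(entry["data"], prev_hash):
                return False
            prev_hash = entry["hash"]
        return True


ledger = HashChainLedger()
ledger.append({"account": "A-1", "balance": 100})
ledger.append({"account": "A-1", "balance": 75})
print(ledger.verify())  # True

# Quietly rewriting history is detected on the next verification.
ledger.entries[0]["data"]["balance"] = 1_000_000
print(ledger.verify())  # False
```

In practice you would also keep the latest digest somewhere outside the ledger. Otherwise, someone able to rewrite the data could simply recompute every hash.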

Until now, relational databases have been the de facto standard for financial transactions. NoSQL databases have offered schema flexibility and scalability, but not the level of ACID compliance these workloads really require. It remains to be seen whether databases like QLDB will beat the speed and accuracy of transactional databases like Oracle, MySQL, and PostgreSQL.

QLDB stores data in Amazon Ion, Amazon’s improvement on JSON. Feature-wise, Ion is a superset of JSON: it adds richer types such as timestamps, decimals, symbols, and binary blobs.

QLDB isn’t like any other database out there, and there’s no natural alternative to it right now. A web search might lead you to compare it with Hyperledger, but that isn’t a fair comparison. Hyperledger is a family of decentralized blockchain frameworks, while QLDB is a centralized ledger owned by a single trusted authority.

Neptune

Graph databases have seen a meteoric rise in the last decade. Early players like Neo4j and ArangoDB have captured a chunk of the market, and many others are jumping in now. Amazon’s Neptune is one of the late additions to the list of graph database offerings. Again, because of its comparatively recent launch, Neptune might lack many of Neo4j’s features. However, it integrates seamlessly with the other AWS services, and that’s a big plus!

Neptune is an ACID-compliant database that models data as vertices and edges. It supports two query languages: SPARQL and Apache TinkerPop Gremlin. More might be on the way. Neptune is built for high availability and scalability. Like Aurora, it offers multi-AZ high availability and asynchronous replication to up to 15 read replicas.
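The following toy in-memory property graph makes the vertex-and-edge model concrete. Its three helpers are named after the Gremlin steps `has()`, `out()`, and `values()`. This is not how Neptune stores data or executes queries. The `PropertyGraph` class and its methods are hypothetical and only mimic what a traversal like `g.V().has('person', 'name', 'alice').out('knows').values('name')` walks through.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph (hypothetical): labeled vertices with properties,
    plus labeled, directed edges between them."""

    def __init__(self):
        self.vertices = {}                   # vertex id -> {"label": ..., "props": {...}}
        self.out_edges = defaultdict(list)   # vertex id -> [(edge label, target id), ...]

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = {"label": label, "props": props}

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))

    def has(self, label, key, value):
        """Ids of vertices with this label and property value (like Gremlin's has())."""
        return [vid for vid, v in self.vertices.items()
                if v["label"] == label and v["props"].get(key) == value]

    def out(self, vids, edge_label):
        """Follow outgoing edges with the given label (like Gremlin's out())."""
        return [dst for vid in vids
                for lbl, dst in self.out_edges.get(vid, []) if lbl == edge_label]

    def values(self, vids, key):
        """Read a property from each vertex that has it (like Gremlin's values())."""
        return [self.vertices[vid]["props"][key] for vid in vids
                if key in self.vertices[vid]["props"]]


g = PropertyGraph()
g.add_vertex(1, "person", name="alice")
g.add_vertex(2, "person", name="bob")
g.add_vertex(3, "person", name="carol")
g.add_edge(1, "knows", 2)
g.add_edge(1, "knows", 3)
g.add_edge(2, "knows", 3)

# Roughly g.V().has('person', 'name', 'alice').out('knows').values('name') in Gremlin
print(g.values(g.out(g.has("person", "name", "alice"), "knows"), "name"))  # ['bob', 'carol']
```

The point of a graph database is that hops like `out('knows')` follow stored relationships directly. In a relational schema, each hop would typically be another join.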

Keyspaces

Amazon Keyspaces is AWS’s managed, Apache Cassandra-compatible database. Other significant players like DataStax have also envisioned a future where Cassandra is cloud-native. Like DataStax’s AstraDB, Keyspaces is a serverless offering. Currently, AWS has no option to provision a managed Cassandra cluster. You can run Cassandra yourself on EC2, but then you could do that on any cloud platform.

Although Cassandra is very restrictive in the types of queries you can write to read data, it is surprisingly good for many use cases, such as streaming data ingestion, timeseries data, and bulk data ingestion. Cassandra is especially useful when you have very wide datasets, and many businesses use it as a dumping ground for data. About five years ago, Zomato moved from Redis to Cassandra to make its food feed feature highly scalable. Cassandra is now a central part of their architecture.
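Cassandra’s query restrictions follow from how it lays out data. Rows are grouped into partitions by a partition key and kept sorted within each partition by clustering columns. Efficient reads therefore name one partition and, optionally, a range of clustering values. The following toy Python model mimics a table declared in CQL with `PRIMARY KEY ((sensor_id), ts)`. The `WideColumnTable` class is hypothetical, not a Cassandra or Keyspaces API. The same layout shows why timeseries data fits Cassandra well: each sensor’s readings live in one partition, ordered by time.

```python
import bisect
from collections import defaultdict


class WideColumnTable:
    """Toy model (hypothetical) of a Cassandra-style table with
    PRIMARY KEY ((partition_key), clustering_key)."""

    def __init__(self):
        # partition key -> parallel lists: sorted clustering keys and their rows
        self._keys = defaultdict(list)
        self._rows = defaultdict(list)

    def insert(self, partition_key, clustering_key, row):
        keys = self._keys[partition_key]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(keys) and keys[i] == clustering_key:
            self._rows[partition_key][i] = row  # same primary key: overwrite (upsert)
        else:
            keys.insert(i, clustering_key)      # keep clustering keys sorted
            self._rows[partition_key].insert(i, row)

    def select(self, partition_key, start=None, end=None):
        """Read one partition, optionally limited to clustering keys in [start, end).

        There is deliberately no lookup by row contents: in Cassandra you would
        typically model a separate table keyed for that query instead.
        """
        keys = self._keys.get(partition_key, [])
        rows = self._rows.get(partition_key, [])
        lo = 0 if start is None else bisect.bisect_left(keys, start)
        hi = len(keys) if end is None else bisect.bisect_left(keys, end)
        return list(zip(keys[lo:hi], rows[lo:hi]))


t = WideColumnTable()
t.insert("sensor-1", 3, {"value": 21.7})
t.insert("sensor-1", 1, {"value": 20.9})
t.insert("sensor-1", 2, {"value": 21.2})
t.insert("sensor-2", 1, {"value": 18.4})

# Roughly: SELECT * FROM readings WHERE sensor_id = 'sensor-1' AND ts >= 2
print(t.select("sensor-1", start=2))  # [(2, {'value': 21.2}), (3, {'value': 21.7})]
```

The integer timestamps and sensor names are illustrative. The design choice to note is that every read is routed by partition key, which is what lets Cassandra spread ingestion-heavy workloads across many nodes.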

Conclusion

The data stack, which today probably consists of relational databases and data warehouses, will likely include specialty and boutique databases and data warehouses in the near future. That’s why it’s essential to learn about these specialty database offerings! Timeseries databases hold only about a 1% share of the market right now. That share might grow as more people realize that, for specific use cases, a timeseries database makes more sense than a traditional relational one.

What’s next?

I will follow this article with an overview of the remaining databases and a cheat sheet (a practical user’s guide) for each of the databases we discussed, such as my AWS RDS Cheatsheet, with notes on using the relational database service by AWS. I will also discuss other specialty databases that might be coming up soon for handling other types of data, for instance, geospatial data.

If you’re interested in reading more of my writing about data and infrastructure engineering, you can subscribe at linktr.ee/kovid. You can also connect with me on LinkedIn.

More content at plainenglish.io.