ElasticSearch as the primary database
The short answer: it is most likely not a good idea to use ElasticSearch as a primary store without some kind of backing database, for the following reasons:
- The most critical reason is the risk of data loss when dealing with large volumes of data. Apparently, much of the innovation around ElasticSearch is focused on improving resiliency. Read more: https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html
- ElasticSearch index sizes need to be determined up front: the number of primary shards is fixed when an index is created, and changes to the mapping of existing fields require re-indexing. If the data grows or evolves beyond what the original sharding or mapping strategy can handle, you have to migrate it into new indexes, and the application then has to serve incoming traffic and run migrations at the same time. By analogy, a typical database would not require you to estimate data sizes per table.
- Performance is going to be a problem if all data queries have to be served out of ElasticSearch, especially if the volume of data is huge and everything is indexed without specific attention to the query patterns actually in use.
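To make the migration point more concrete, here is a rough sketch of what moving to a re-sharded index usually involves: create a new index, copy the documents across with the `_reindex` API, then repoint an alias. The function only builds the request bodies and sends nothing. The index and alias names, the shard count and the mapping are hypothetical, and the typeless mapping shape assumes a 7.x or later cluster.

```python
import json


def plan_reindex_migration(alias, old_index, new_index, number_of_shards, mappings):
    """Return the ordered REST calls for moving an alias to a re-sharded index.

    Nothing is sent; each step is a (method, path, body) tuple.
    """
    return [
        # 1. Create the new index with the revised shard count and mapping.
        ("PUT", f"/{new_index}", {
            "settings": {"number_of_shards": number_of_shards},
            "mappings": mappings,
        }),
        # 2. Copy every document from the old index into the new one.
        ("POST", "/_reindex", {
            "source": {"index": old_index},
            "dest": {"index": new_index},
        }),
        # 3. Repoint the alias in a single request so readers switch over at once.
        ("POST", "/_aliases", {
            "actions": [
                {"remove": {"index": old_index, "alias": alias}},
                {"add": {"index": new_index, "alias": alias}},
            ]
        }),
    ]


if __name__ == "__main__":
    # All names and values below are made up for illustration.
    plan = plan_reindex_migration(
        alias="orders",
        old_index="orders_v1",
        new_index="orders_v2",
        number_of_shards=6,
        mappings={"properties": {"status": {"type": "keyword"}}},
    )
    for method, path, body in plan:
        print(method, path, json.dumps(body))
```

While step 2 is running, the application still has to handle live reads and writes, which is exactly the double duty described above.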
Now, is it still possible to use ElasticSearch as a database?
Yes, in the following cases:
- Event sourcing on the database end. That is, a message queue or event streaming system such as Kafka fronts the ElasticSearch indexing. This approach buffers requests while ElasticSearch is performing cluster updates or leader elections, which might otherwise result in data loss.
- The writes are controlled and infrequent. If you have relatively static content but would like the data to be searchable and amenable to analytics, that makes a good case.
- ElasticSearch is a sink in a data pipeline, with another system or database mastering the data. This is the typical and most widely used scenario. In case of data loss, the data can be replayed from upstream.
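The buffering and replay ideas in the cases above can be sketched with an in-memory toy. `EventLog` is a stand-in for a Kafka topic and `FlakySearchIndex` is a stand-in for an ElasticSearch index; both classes and the `drain` helper are hypothetical names made up for this illustration, not real client APIs. The point is only that the consumer advances its offset after a successful write, so nothing is lost while the sink is unavailable, and the sink can be rebuilt by replaying from offset 0.

```python
class EventLog:
    """Append-only log; an in-memory stand-in for a Kafka topic partition."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset):
        # Yield (offset, event) pairs starting at the given offset.
        return list(enumerate(self._events))[offset:]


class FlakySearchIndex:
    """Stand-in for a search index that can be temporarily unavailable."""

    def __init__(self):
        self.docs = {}
        self.available = True

    def index(self, doc_id, doc):
        if not self.available:
            raise ConnectionError("cluster unavailable")
        self.docs[doc_id] = doc  # same id overwrites, so replays are idempotent


def drain(log, sink, committed_offset):
    """Index events from committed_offset onward; return the new offset.

    The offset only advances after a successful write, so a failure
    leaves the remaining events in the log to be retried later.
    """
    for offset, event in log.read_from(committed_offset):
        try:
            sink.index(event["id"], event["doc"])
        except ConnectionError:
            break
        committed_offset = offset + 1
    return committed_offset


if __name__ == "__main__":
    log, sink = EventLog(), FlakySearchIndex()
    log.append({"id": "a", "doc": {"title": "first"}})
    log.append({"id": "b", "doc": {"title": "second"}})

    sink.available = False
    offset = drain(log, sink, 0)       # sink is down: offset does not advance
    sink.available = True
    offset = drain(log, sink, offset)  # retry picks up where it left off
    print(offset, sorted(sink.docs))
```

A real setup would use a Kafka consumer group with committed offsets and a bulk indexing client in place of these toy classes, but the ordering of "write, then commit" is the part that protects against data loss.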




