Cloud Firestore - Querying and Pricing

Cloud Firestore
Cloud Firestore is a horizontally scaling NoSQL document database in the cloud. Its data is stored as objects in a tree-like hierarchical structure. Cloud Firestore is a bit more organized than a single tree of objects, though: it is made up of documents and collections.
Documents
Documents are similar to JSON objects or dictionaries. They consist of key-value pairs, which Cloud Firestore calls fields, and field values can be strings, numbers, binary data, maps, arrays, and more.
Collections
Collections are simply collections of documents. They behave like a hash or dictionary whose values are always documents.
Rules when using documents and collections:
1. Collections can only contain documents, nothing else. There are no collections of strings, numbers, or binary data.
2. A document can be at most 1 MB in size. Anything larger must be broken up.
3. A document cannot contain another document. Documents can contain subcollections, but not other documents directly.
4. The very root of the Cloud Firestore tree can only contain collections.
As a general rule, we drill down into data by specifying a collection, then a document, then a collection, then a document, and so on until we reach the level of data we want. A data path therefore always alternates: collection, document, collection, document, and so on.
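To make the alternation concrete, here is a small Python sketch (not part of any Firebase SDK) that classifies a slash-separated path by counting its segments. The path names schools and students are hypothetical; the official client SDKs apply a similar parity check when building collection and document references.

```python
def classify_path(path: str) -> str:
    """Say whether a slash-separated path names a collection or a document.

    Paths alternate collection/document/collection/document..., starting at the
    root (which only holds collections). An odd number of segments therefore
    ends at a collection and an even number ends at a document.
    """
    segments = path.strip("/").split("/")
    if any(segment == "" for segment in segments):
        raise ValueError(f"invalid path: {path!r}")
    return "collection" if len(segments) % 2 == 1 else "document"


# Hypothetical paths, for illustration only.
for p in ["schools", "schools/abc", "schools/abc/students", "schools/abc/students/s1"]:
    print(p, "->", classify_path(p))
# schools -> collection
# schools/abc -> document
# schools/abc/students -> collection
# schools/abc/students/s1 -> document
```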
Queries in cloud firestore
In the Firebase Realtime Database, data is nested deeply, and when we retrieve an element in the tree we automatically retrieve everything below it. That can lead to downloading hundreds of records when we only wanted a specific one. In Cloud Firestore, by contrast, queries are shallow by default: when we grab the documents within a collection, we get only those documents, not the subcollections beneath them.
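A toy Python model can show the difference. The nested dictionary below stands in for a Firestore tree (hypothetical data, not the SDK): fetching the schools collection returns each school’s own fields, while its students subcollection stays behind until we explicitly ask for it.

```python
# Toy in-memory model of a Firestore-style tree (hypothetical data, not the SDK):
# every document holds its own fields plus any subcollections nested under it.
tree = {
    "schools": {
        "abc": {
            "fields": {"name": "Valley Public School", "type": "public"},
            "subcollections": {
                "students": {
                    "s1": {"fields": {"name": "Binod"}, "subcollections": {}},
                    "s2": {"fields": {"name": "Sita"}, "subcollections": {}},
                }
            },
        }
    }
}


def get_collection(root, name):
    """Shallow read: each document's own fields, never its subcollections."""
    return {doc_id: dict(doc["fields"]) for doc_id, doc in root[name].items()}


def get_subcollection(root, collection, doc_id, sub):
    """Reaching the nested documents requires a separate, explicit fetch."""
    docs = root[collection][doc_id]["subcollections"][sub]
    return {sid: dict(d["fields"]) for sid, d in docs.items()}


print(get_collection(tree, "schools"))
# {'abc': {'name': 'Valley Public School', 'type': 'public'}}
print(get_subcollection(tree, "schools", "abc", "students"))
# {'s1': {'name': 'Binod'}, 's2': {'name': 'Sita'}}
```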
A database is not only a place to store data; it’s also about finding things when we need them.
Rules:
- Queries we run against Cloud Firestore can only find documents within one specific collection or subcollection, although Cloud Firestore now also supports searching across collections with collection group queries. To enable this, we go to the Cloud Firestore console and tell it exactly which field we want to search across which collection ID: for that collection ID and field path, we enable an index with collection group scope. Notes on this type of query: 1. We are limited to only about 200 of these indexes. 2. These queries search every collection with the same name, regardless of where it appears in the database, so an unrelated collection with the same name anywhere in the database will also be included in the index. Be careful when naming such collections.
- Queries must be based on equality, or on greater-than or less-than comparisons, of one or more fields in a document. They cannot be based on a calculation.
- The results we get back from Cloud Firestore are shallow. If we want to get all the data back at once from a query like this, we either need to make multiple fetches or avoid breaking that information into subcollections.
- The time a query takes is proportional to the number of results we get back, not the number of documents we are searching through. For example, if I need the top 5 highest-rated restaurants in my zip code, the work is on the order of the number of results I am requesting. The query takes about the same time whether there are 60 restaurants to search through, 60 thousand, or 6 million. How does Firestore do this? Whenever we add a document to a collection, Cloud Firestore automatically indexes every field in that document. Every index entry records the value of the field and where the corresponding document lives in the database. With an index like this, it becomes incredibly fast to find any particular value using binary search. It also means that inserting or modifying documents takes a bit more time, because the indexes need to be updated.
- There is no native pattern or regex searching. Querying the database for a name like “%Binod%” is not possible. If we still need this kind of search, we can use a third-party search service or library.
- Cloud Firestore provides limited support for logical OR queries. The “in” and “array-contains-any” operators support a logical OR of up to 10 equality (==) or array-contains conditions on a single field. For other OR queries, we run each query separately, grab the required documents, and merge the two sets ourselves on the client. If we know in advance that we will need a specific OR query, we can store a field in the database whose value already represents the ORed condition.
- We also cannot perform not-equal (!=) queries, or look for documents where a field doesn’t exist, like finding documents where name is nil. This is because the index has no entries for fields that don’t exist. It’s fine to store explicit null values alongside other data, and doing so helps solve the “you can’t query for fields that don’t exist” problem.
- We should not mix different types of data, like strings and numbers, in the same field. It is entirely possible to do, but as soon as we store strings in a numeric field, the values are indexed in two separate groups and we would have to run two searches. Do not mix types in fields we need to query.
- We can query multiple fields at once, for instance to find all public schools in Kathmandu. In such cases Cloud Firestore cleverly joins the separate searches and still scales proportionally to the result set. It does this through a zig-zag merge join.
- As a result, AND queries in Cloud Firestore are nice and performant.
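The indexing ideas in the list above can be modeled in a few lines of Python. This is a toy, not Firestore’s implementation: each single-field index is a sorted list of (value, document ID) pairs, an equality query uses binary search to jump to the first match, the AND query intersects two ID lists with a simplified zig-zag (leapfrog) merge, and an OR query is run as two queries merged on the client. The school data is hypothetical; note that s5 has no type field, so it never appears in the type index.

```python
from bisect import bisect_left

# Toy model, not the Firestore implementation. A single-field index is a
# sorted list of (value, doc_id) pairs; documents missing the field get no entry.


def build_index(docs, field):
    return sorted((d[field], doc_id) for doc_id, d in docs.items() if field in d)


def equal_ids(index, value):
    """Doc ids with field == value. A binary search finds the first match, then
    only the matching run is read. Within the run, ids come out sorted, which
    the merge join below relies on."""
    i = bisect_left(index, (value,))
    ids = []
    while i < len(index) and index[i][0] == value:
        ids.append(index[i][1])
        i += 1
    return ids


def zigzag_and(a, b):
    """AND of two sorted id lists: whichever side is behind jumps ahead
    (via binary search) to the other side's current id."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = bisect_left(a, b[j], i)
        else:
            j = bisect_left(b, a[i], j)
    return out


def or_merge(a, b):
    """Client-side OR: run both queries, then merge and de-duplicate the ids."""
    return sorted(set(a) | set(b))


# Hypothetical data.
schools = {
    "s1": {"type": "public", "city": "Kathmandu"},
    "s2": {"type": "private", "city": "Kathmandu"},
    "s3": {"type": "public", "city": "Lalitpur"},
    "s4": {"type": "public", "city": "Kathmandu"},
    "s5": {"city": "Kathmandu"},  # no "type" field, so absent from the type index
}
by_type = build_index(schools, "type")
by_city = build_index(schools, "city")

public = equal_ids(by_type, "public")
in_kathmandu = equal_ids(by_city, "Kathmandu")
print(zigzag_and(public, in_kathmandu))  # ['s1', 's4']
print(or_merge(equal_ids(by_city, "Lalitpur"), equal_ids(by_type, "private")))  # ['s2', 's3']
```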
Composite Indexes
Greater-than and less-than queries are harder. Suppose we want all public schools whose level is above secondary. We have one index of school type and document IDs, and another of school level and document IDs. There is no easy way to intersect these two, and they are not sorted in a way that allows a zig-zag merge join. So what do we do? Cloud Firestore introduces combined fields. These aren’t actual extra fields in the database; they exist only at the index level, and Cloud Firestore does all the work of building and maintaining them for us. These are known as “composite indexes”.
Cloud Firestore does not, and cannot, automatically create a composite index for every combination of fields in a document, because the possible composite indexes for a document with just 10 fields could number in the millions. Instead, we tell Cloud Firestore which composite indexes our app needs. There are two ways to create one. The first is to add the composite index manually from the console. The recommended way, however, is to run the query in our app that requires the composite index. Firestore notices that no index supports this query yet and reports an error in our console logs. The error text includes a URL that takes us straight to the Firebase console to create exactly the composite index this query needs. Clicking the “CREATE INDEX” button in the dialog box is all we have to do.
General rule for creating composite indexes
Take the field we will run the greater-than or less-than query on and put it last. For example, suppose we want all restaurants in a specific zip code with a rating of 4.5 or more. For this we create a composite index with zip code first and rating last. Because the entries are sorted by zip code and then by rating, we can quickly find the restaurants in a given zip code with the desired rating.
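Why the range field goes last can be seen in a toy composite index: a single list sorted by (zip, rating, document ID). Because all entries for one zip code are contiguous and already ordered by rating, two binary searches mark the start and end of the matching slice. This is an illustrative Python sketch with made-up restaurants, not how Firestore stores its indexes internally; if rating came first, the restaurants from one zip code would be scattered across the list.

```python
from bisect import bisect_left, bisect_right

# Hypothetical restaurants, for illustration only.
restaurants = {
    "r1": {"zip": "44600", "rating": 4.7},
    "r2": {"zip": "44600", "rating": 3.9},
    "r3": {"zip": "44700", "rating": 4.8},
    "r4": {"zip": "44600", "rating": 4.5},
    "r5": {"zip": "44600", "rating": 4.9},
}

# Toy composite index: equality field (zip) first, range field (rating) last.
composite_index = sorted(
    (d["zip"], d["rating"], doc_id) for doc_id, d in restaurants.items()
)


def zip_with_min_rating(index, zip_code, min_rating):
    """Doc ids with zip == zip_code and rating >= min_rating, lowest rating first.

    The matching entries form one contiguous slice, so two binary searches
    find its ends without scanning other zip codes.
    """
    lo = bisect_left(index, (zip_code, min_rating))
    hi = bisect_right(index, (zip_code, float("inf")))
    return [doc_id for _, _, doc_id in index[lo:hi]]


print(zip_with_min_rating(composite_index, "44600", 4.5))  # ['r4', 'r1', 'r5']
```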
Cloud Firestore Pricing
We can structure our data in Cloud Firestore in a way that probably makes a lot more sense logically. But does this logical structure give us the pricing we want?
Cloud Firestore’s pricing model is based on the operations we perform on the database rather than on how much data we upload or download. Firestore primarily charges for the reads, writes, and deletes that we perform. But what exactly counts as a read, a write, or a delete, even though the terms seem obvious?
Writes are charged whenever we create or update a document. This holds no matter how much of the document we change: whether we flip a single field from true to false or swap out thirty different fields at once, it counts as just one write.
A read occurs any time a client gets data from a document. So if we query Cloud Firestore for the top thirty public schools in Kathmandu Valley and the result is thirty documents, that counts as thirty reads. Only the documents that are retrieved are counted. Firestore searches through indexes, not documents, so even if there are thousands of schools to search through, querying for the top thirty results in thirty reads.
Reads apply to real-time updates as well. Every update a client receives also counts as a read, but only for the documents that changed. For example, if we have a real-time listener set up for the top thirty public schools and one of the schools suddenly changes its name, the updated document is sent to the client and that counts as a read, but only as a single document read.
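The counting rules in the last three paragraphs can be captured in a tiny meter. ToyBillingMeter below is a hypothetical Python class, not a Firebase API, and it models only the rules stated here: one write per document create or update regardless of how many fields change, one read per document a query returns, and one read per changed document delivered to a listener.

```python
class ToyBillingMeter:
    """Counts billable operations under the rules described above (toy model)."""

    def __init__(self):
        self.reads = 0
        self.writes = 0

    def write(self, doc, changes):
        """Create or update one document: one write, however many fields change."""
        doc.update(changes)
        self.writes += 1

    def query(self, docs, predicate, limit):
        """Return up to `limit` matches; only the returned documents are billed.
        The linear scan here stands in for Firestore's index lookup."""
        results = [d for d in docs if predicate(d)][:limit]
        self.reads += len(results)
        return results

    def listener_update(self, changed_docs):
        """A live listener is billed one read per changed document it receives."""
        self.reads += len(changed_docs)
        return changed_docs


# Hypothetical data: 1,000 schools, half of them public.
meter = ToyBillingMeter()
schools = [
    {"name": f"School {i}", "type": "public" if i % 2 == 0 else "private"}
    for i in range(1000)
]
first_30 = meter.query(schools, lambda s: s["type"] == "public", limit=30)
meter.write(first_30[0], {"name": "Renamed School", "type": "public", "rating": 5})
meter.listener_update([first_30[0]])
print(meter.reads, meter.writes)  # 31 1
```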
Cloud Firestore pricing should also be considered when writing Cloud Functions and security rules. Poorly designed Cloud Functions can cause unwanted reads and writes, which increase the number of billed Firestore operations and therefore the price.
Firestore also charges for stored data, which includes not only the data itself but also the indexes and metadata that go along with it.
Billing
- The number of reads, writes, and deletes that you perform.
- The amount of storage that your database uses, including overhead for metadata and indexes.
- The amount of network bandwidth that you use. The network bandwidth cost of a Cloud Firestore request depends on the request’s response size, the location of your Cloud Firestore database, and the destination of the response.
- Document reads: $0.06 per 100,000
- Document writes: $0.18 per 100,000
- Document deletes: $0.02 per 100,000
- Stored data: $0.18 per GB per month
Measuring and estimating cost
Whether a service is cheap or expensive certainly matters, but sometimes what matters more is whether its cost is measurable and predictable. The nice thing about reads and writes being the main pricing factor is that they are a heck of a lot easier to estimate than download sizes, especially because we can limit the number of items a query returns. So we can easily set up an app that returns the top thirty public schools in Kathmandu when we perform a search and know exactly how much that search will cost.
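Because billing follows operation counts, that estimate is simple arithmetic. A minimal Python sketch, assuming the example rates listed under Billing (real rates vary by database location and change over time) and a hypothetical traffic level of 10,000 searches a day:

```python
# Example rates from the price list above, in USD per 100,000 operations.
# Treat these as placeholders, not current pricing.
RATE_PER_100K = {"reads": 0.06, "writes": 0.18, "deletes": 0.02}


def operation_cost(reads: int = 0, writes: int = 0, deletes: int = 0) -> float:
    """Estimated USD cost of document operations (storage and network excluded)."""
    counts = {"reads": reads, "writes": writes, "deletes": deletes}
    return sum(n / 100_000 * RATE_PER_100K[op] for op, n in counts.items())


# One "top thirty public schools" search is capped at 30 reads.
per_search = operation_cost(reads=30)
# Hypothetical traffic: 10,000 such searches per day for 30 days.
monthly = operation_cost(reads=30 * 10_000 * 30)
print(f"per search: ${per_search:.6f}, per month: ${monthly:.2f}")
# per search: $0.000018, per month: $5.40
```

Capping the query at thirty results is what makes the per-search figure a hard upper bound rather than a guess.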
To see our database usage, we can go to the Google Cloud console (console.cloud.google.com), open App Engine, and head over to the Quotas page. There we find a bunch of different usage stats. The ones we are interested in are Cloud Firestore Read Operations, Cloud Firestore Entity Writes, and Cloud Firestore Entity Deletes; for storage, we can look at Cloud Firestore Stored Data. This gives us a pretty good idea of what our usage history looks like.
Moreover, in the App Engine settings we can set up a daily spending limit. If Firestore costs ever exceed this threshold, the API will start refusing requests. We can set it to the point where we would say, “if it reaches this level, something is definitely wrong, so put the brakes on until it’s fixed.” It’s important to make a habit of visiting this page once in a while, looking at how our costs have trended historically, and adjusting the limit to something more accurate based on our previous levels. If we are on the Firebase free plan or a fixed-price monthly plan, Firebase already does this for us.
If we are on the Blaze (pay-as-you-go) plan, I strongly recommend setting up a monthly budget alert. In the Billing section of the Google Cloud console, there is an option to create a budget. This is kind of like telling Google, “this is the monthly budget I expect to spend across all of your cloud services.” The budget covers all Google Cloud services, not just Firestore.
