Impressions of Google Firestore and Cloud Functions
For almost a year my current project has been based on a serverless backend built with Google Firestore and Cloud Functions, with Flutter used to develop the Android and iOS mobile apps. That allowed me to evaluate these new serverless technologies in depth and to see the good and the bad in them at this moment (year 2021).
Until now my specialization was Kubernetes based backends for mobile and Angular/React projects. Serverless was a total paradigm shift for me.
These technologies give you so much; however, they are much more complex than traditional MySQL + PHP + Apache + Linux projects. They could be too complex for many people, and that is honest advice.
For some people, even a Kubernetes project will feel like an easier thing to do. (A serious project, not just quick modern prototyping.)
Serverless Cluster
A Serverless Cluster is something similar to a Kubernetes cluster, except that you do not need to create Docker images for your Node.js or Golang web services, you do not need to create the Kubernetes cluster, and you do not need to set up a stateful database, logs, and so on. All you need to do is develop database security rules and cloud functions. All of that runs in an automatically managed cluster. No DevOps, no Systems Administration. That allows you to save on a whole team (6-10 people) - you need just 1 full stack developer, or 2-3 if you need to get your apps and backend up and running faster (1 for mobile apps, 1 for the website, 1 for the backend). Add more if you can afford that; then you will have 100% automated test coverage, and more.
The serverless concept is super cool. I would suggest checking out Microsoft Azure Functions as well, as they offer stateful functions.
Some reading about the Cloud Functions and Firestore is at the bottom of this article.
Firestore
Firestore is a NoSQL database from Google. It is similar to Couchbase in some areas. Couchbase is the more serious product at the moment (2021).
Best features of Firestore are:
- It allows offline data sync on Android and iOS mobile devices - such functionality usually takes almost a man-year to develop and stabilize by yourself. A serious competitor here is Couchbase. Firestore has superior functionality here, however (point 2).
- It allows real-time monitoring of data changes: you create a query and listen for further data changes. That allows creating live mobile and web apps where data changes right before your eyes. Imagine an auction app, or a hotel booking app where the situation changes immediately - no need to pull to refresh.
- It has Security Rules which allow you to validate and check CRUD operations for every data document and every property. I suppose no web services programmer cares as much about data protection and access restrictions as Firestore makes possible. That requires an insane amount of automated tests. (1K-2K tests?)
- On top of that, Google Cloud Functions allow having triggers in the database that run code on even the smallest data change.
- It has a free usage plan which fits some part of the possible projects nicely and allows having PRO infrastructure right from the start. When the project grows and produces money, then you start paying for your infrastructure. Getting to the point where you would save the money with Kubernetes cluster would mean your business has become big and really serious.
- No SQL JOINs means using pre-processed data literally everywhere. That gives a data caching effect in the database itself as well. As a result one gets a really fast and optimized backend, which rarely happens with MySQL based web services.
- No need for Systems Administration with load balancing, scaling, Kubernetes cluster, availability zones and separate data centers, authorization and authentication infrastructure, users management, etc. Enterprise features come out of the box, and try doing it all better by yourself!
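The "pre-processed data" point above can be sketched in a few lines. This is only an illustration in plain TypeScript, not Firestore SDK code: the `Hotel`, `Booking`, and `BookingView` shapes and both helper functions are hypothetical names invented for this example. The idea is that the data a screen needs is assembled once at write time, instead of being JOINed on every read.

```typescript
// Hypothetical document shapes; none of these names come from a real SDK.
interface Hotel {
  id: string;
  name: string;
  city: string;
}

interface Booking {
  id: string;
  hotelId: string;
  nights: number;
}

// Denormalized "read model": everything a list screen needs, in one document.
interface BookingView {
  bookingId: string;
  hotelId: string;
  nights: number;
  hotelName: string;
  hotelCity: string;
}

// Done once at write time (for example inside a trigger), instead of a JOIN at read time.
function buildBookingView(booking: Booking, hotel: Hotel): BookingView {
  return {
    bookingId: booking.id,
    hotelId: hotel.id,
    nights: booking.nights,
    hotelName: hotel.name,
    hotelCity: hotel.city,
  };
}

// The price of denormalization: when the source changes, every copy must be refreshed.
function applyHotelRename(views: BookingView[], hotelId: string, newName: string): BookingView[] {
  return views.map((v) => (v.hotelId === hotelId ? { ...v, hotelName: newName } : v));
}

const sampleHotel: Hotel = { id: "h1", name: "Sea View", city: "Riga" };
const sampleBooking: Booking = { id: "b1", hotelId: "h1", nights: 3 };
const bookingView = buildBookingView(sampleBooking, sampleHotel);
const renamedViews = applyHotelRename([bookingView], "h1", "Ocean View");
```

Reads become a single document fetch, while writes get more expensive, because a hotel rename has to be copied into every view that embeds the name.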
Problems with Firestore are:
- Programmers have to use document schema versions so that later it is possible to run data migration scripts, and so that mobile apps can keep working with the previous data structure while saving documents in the current one. That is messy compared to an RDBMS. However, in the Petabytes world it gives incredible advantages. For example, the data migration scripts can transform data batch after batch all the time - no need to stop the database.
- Not suited for complex SQL-style selects such as this < 3 AND that > 9 (at the moment, 2021, range conditions on two different fields cannot be combined in one query). No JOINs. The query limitations compared to Couchbase and MySQL are too noticeable.
- Agile (SCRUM) and Firestore are not a good combination. Firestore differs from MySQL so much that without a solid and big Technical Design document it is not possible to predict whether this database can be used at all, or whether some "SQL queries" will turn out to be impossible to implement. Not everything can be done with cached pre-processed data. Before the SCRUM era people actually thought first about what they would be doing, and that pays off tremendously.
- As each returned document costs some money, Firestore is useful only in projects where the information is very valuable and generates real money from the clients. It is not suited for Google Ads websites where the information brings practically no income but the number of visitors is mind blowing anyway. Costs will be higher than the income.
- An official Firestore Security Rules plugin for VS Code would be cool, as writing these rules is so complicated and requires writing so many automated tests to be able to control the situation.
- Security Rules have no option for writing complex code the way you would in Node.js, which in some cases is a must. Mental agility is no excuse here, as fewer errors and higher productivity are important as well.
- Firestore Security Rules failures are not reported to the developers in meaningful and productive ways, which seriously damages productivity and damages the clients' business. The excuse is security considerations. That could be solved by collecting meaningful failure information and making it available to authenticated developers only. It's so simple. Why hasn't it been done yet?
- Listening to live data snapshots, as in a search screen, sometimes produces an empty snapshot first and then the data - at least with the FlutterFire libraries, the Stream emits not once, as you would expect, but twice. People who have not programmed socket data stream interpretation before might get lost here and just drop Firestore, and Flutter as well. It looks like someone passed this kind of problem on to the SDK users instead of solving it themselves. When composing a complex screen that has a Firestore Query, a Snapshots Listener, and a progress Spinner, one has to use a Timer to get nice UI effects, which could be an issue for inexperienced programmers. Programming here is reminiscent of low level chip calls in IoT projects :-D This problem was still present at the time this publication was written (2021). Asynchronous processes which cannot be put under one Future.wait() are even more complex than ordinary asynchronous processes.
- In my current Node.js based project I have already produced 776 automated tests for the Firestore Security Rules... (data validation and access restrictions)
- Many things were quite shocking to me too, like the various excuses in the public docs about half finished things. It is noticeable that the product evolves, however. And it is debatable whether this modern "Minimum Viable Product" business concept is applicable in all cases. A thought experiment to test it: what will happen if we ship half finished products too early?
- While Google Firestore makes it so easy to add a cloud database to mobile apps, it also requires having a very strong backend programmer in the team to ensure an easy and comfortable life for the mobile programmers.
- Firebase's reliance on JWT tokens and their custom claims to allow or deny access in Security Rules is a strange concept to me, especially for bank, army, or CIA projects.
- Firestore for Flutter has some architectural "features" which could become a problem for newbies. For example, when you save a document, you do not get a callback right away - it arrives only when the document has been written to the Cloud database. How would you know when to close the screen or show an error popup? The solution was to use the Firestore REST API, where synchronous data UPSERT operations ARE possible. Or transactions, which make sense for more complex data alterations.
- No indexes with default values. As a result, documents that lack the indexed field stay hidden from queries on that field.
- Because of various Security Rules limits, it is impossible to validate a whole document with lots of embedded data. Even a simple array of 10 objects could not be validated. That is not acceptable in banking, government, and military projects, for example.
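The schema version point from the list above can be illustrated with a small sketch. It is plain TypeScript with no Firestore calls; the `UserV1` and `UserV2` shapes, the `schemaVersion` field name, and both functions are assumptions made up for this example, not something taken from my project or from the SDK. It shows the two halves of the idea: upgrading a single document by its version, and walking a collection batch after batch.

```typescript
// Hypothetical document shapes for two schema versions of the same collection.
interface UserV1 {
  schemaVersion: 1;
  fullName: string;
}

interface UserV2 {
  schemaVersion: 2;
  firstName: string;
  lastName: string;
}

type AnyUser = UserV1 | UserV2;

// Upgrade a single document to the newest shape; already-current documents pass through.
function migrateUser(doc: AnyUser): UserV2 {
  if (doc.schemaVersion === 2) {
    return doc;
  }
  const parts = doc.fullName.trim().split(/\s+/);
  return {
    schemaVersion: 2,
    firstName: parts[0],
    lastName: parts.slice(1).join(" "),
  };
}

// Process a collection in small batches, so the database never has to be stopped.
function migrateInBatches(docs: AnyUser[], batchSize: number): UserV2[] {
  const result: UserV2[] = [];
  for (let start = 0; start < docs.length; start += batchSize) {
    const batch = docs.slice(start, start + batchSize);
    for (let i = 0; i < batch.length; i++) {
      result.push(migrateUser(batch[i]));
    }
  }
  return result;
}

const storedUsers: AnyUser[] = [
  { schemaVersion: 1, fullName: "Ada Lovelace" },
  { schemaVersion: 2, firstName: "Alan", lastName: "Turing" },
  { schemaVersion: 1, fullName: "Grace Brewster Hopper" },
];
const migratedUsers = migrateInBatches(storedUsers, 2);
```

A mobile app that still receives old documents can call the same `migrateUser` on read, which is how old and new data structures can coexist while the batch migration is still running.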
How to significantly improve the situation?
- Google could create open source projects which would stimulate the evolution and adoption of their cloud offerings, and would replace popular previous generation products like WordPress.
- Google could assign its engineers to more serious Firestore and Cloud Functions projects to get deep into the use case. That would generate even more output than my publication.
- If they mixed Cloud and Mobile together into a single Cloud Runtime Environment, it would be SOMETHING.
- An experienced Developer can develop Functions for mobile devices too. Other Developers can focus more on the UI design issues. In a way that already happens on Android devices with Work Manager: https://developer.android.com/topic/libraries/architecture/workmanager Imagine events, workflows, guaranteed persistence and execution on the mobile devices.
- Adding these Cloud Functions triggers to the Firestore would make it even stronger: beforeCreate, beforeUpdate, beforeDelete, beforeWrite. Cloud Functions on these triggers should be able to cancel pending operations.
- Red Hat / Fedora Linux have that SELinux wizard for forbidden data access situations. Could a similar wizard be created for Firestore Security Rules as well?
- Adding forEach functionality to Security Rules checking language would open Firestore usage to governments and banks.
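To make the forEach wish concrete, here is what such a per-element check expresses, written in TypeScript rather than in the Security Rules language (which, as described above, does not offer it). The `LineItem` shape, the field names, and the limit of 10 are assumptions chosen only for illustration.

```typescript
// Hypothetical shape of one embedded item in an order document.
interface LineItem {
  sku: string;
  quantity: number;
}

// Validates a single element: non-empty sku and a positive whole quantity.
function isValidLineItem(value: any): value is LineItem {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof value.sku === "string" &&
    value.sku.length > 0 &&
    typeof value.quantity === "number" &&
    isFinite(value.quantity) &&
    value.quantity > 0 &&
    Math.floor(value.quantity) === value.quantity
  );
}

// The "forEach" part: the array size is bounded and every element must pass.
function isValidItems(items: any, maxItems: number): boolean {
  return Array.isArray(items) && items.length <= maxItems && items.every(isValidLineItem);
}

const goodItems = [
  { sku: "A1", quantity: 2 },
  { sku: "B2", quantity: 1 },
];
const badItems = [
  { sku: "A1", quantity: 2 },
  { sku: "", quantity: 1 },
];
const goodResult = isValidItems(goodItems, 10);
const badResult = isValidItems(badItems, 10);
```

Today a check like this has to live in a Cloud Function or in the client, which means the database itself cannot reject a document with one bad element.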
Cloud Functions
Google Cloud Functions are small Node.js, Golang, Python, Java, .NET, Ruby, and PHP functions which are called when some trigger fires in Google Cloud:
- A document is added to the database, or is updated.
- A cron scheduler event happens.
- A Pub/Sub message arrives.
- And so on.
They are the equivalent of AWS Lambda and Azure Functions.
Best features of Google Cloud Functions are:
- A small TypeScript function is all that is needed to do the same work as 1 web service would do. In an Enterprise scenario the work will be bigger, as quick prototyping won't be acceptable there.
- Cloud Functions with RETRY enabled even make it possible to skip message queues in simpler situations. That does not compete with serious transaction management systems in any way.
- The programmer does not have to worry about Docker containers and Kubernetes configuration - it all happens miraculously by itself.
- It has a free usage plan which fits some part of the possible projects nicely and allows having PRO infrastructure right from the start.
- Programming Cloud Functions teaches you to write code which costs less CPU and memory to execute.
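One consequence of the RETRY point above is that a retried function can run more than once for the same event, so the handler has to be safe to repeat. Below is a minimal sketch of that idea in plain TypeScript. It is not the Cloud Functions API: `PaymentEvent`, `handlePayment`, and the two in-memory maps are hypothetical stand-ins, and the sketch assumes every event carries an ID that stays the same when the event is delivered again.

```typescript
// Hypothetical event shape; assumed to carry an ID that is identical on every redelivery.
interface PaymentEvent {
  eventId: string;
  accountId: string;
  amount: number;
}

// In-memory stand-ins for database collections (assumption: a real project would persist these).
const processedEvents: { [eventId: string]: boolean } = {};
const balances: { [accountId: string]: number } = {};

// Returns true when the event was applied, false when it was a duplicate delivery.
function handlePayment(incoming: PaymentEvent): boolean {
  if (processedEvents[incoming.eventId] === true) {
    return false; // already handled: a retry must not add the amount twice
  }
  balances[incoming.accountId] = (balances[incoming.accountId] || 0) + incoming.amount;
  processedEvents[incoming.eventId] = true;
  return true;
}

const paymentEvt: PaymentEvent = { eventId: "evt-1", accountId: "acc-1", amount: 50 };
const firstDelivery = handlePayment(paymentEvt);
const secondDelivery = handlePayment(paymentEvt); // simulated redelivery of the same event
```

In a real backend the "already processed" marker and the balance update would need to be written together, for example in one transaction, otherwise a crash between the two writes brings the duplicate problem back.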
Problems with Google Cloud Functions are:
- Just like with Firestore Security Rules, without automated tests it won't be possible to achieve any meaningful results.
- Testing and bug fixing of cascades / chains of events in Google Cloud Functions is one hell of an adventure. One fat Spring Boot web service would be much simpler to write and test.
- Testing Serverless Cloud Functions with automated tests means forgetting about the quick unit test style. I had to space the tests out in time so that the other triggered Cloud Functions could finish what they do. The environment stabilizes, and the next test runs. Still, it's a mess getting log output from several parallel processes and seeing their interdependencies. The question is how to visualize all that on a small screen.
How to significantly improve the situation?
- When Cloud Functions perform database operations, they should NOT launch database triggers by default. Programmers should have to set a parameter intentionally to say that triggers must run. That will reduce endless cycles in Cloud Functions for newbies. When database clients perform writes and updates, triggers should run automatically, as intended.
- Circuit breakers like no more than 2-3 recursive calls would be great as well.
- Golang 1.17 should be added to Cloud Functions.
- The same goes for the Julia programming language, as it is a serious Python replacement in Data Science projects.
- All of that could be done by taking the list of Azure Functions features and adding them to Google Cloud Functions.
- Next generation Golang Functions could even influence the concept of the Golang Runtime to make it more similar to Julia Tasks launcher on the cluster of the nodes. Imagine a mix of Kubernetes and Julia parallelism as a secure and controlled environment where functions live. Configurable environment. Secure.
- Microsoft Azure Functions have Durable Functions: https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=javascript and Durable Entities: https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-entities?tabs=csharp
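The endless cycle and circuit breaker points from the list above can be modeled with a toy example. This is a hypothetical in-memory model in plain TypeScript, not the Cloud Functions or Firestore API: `TinyStore`, its `depth` parameter, and `maxDepth` are invented names. It only shows how a hop counter stops a trigger that keeps writing to the document it listens to.

```typescript
// Hypothetical in-memory model of "a write fires a trigger"; not the Cloud Functions API.
type CounterDoc = { counter: number };
type Trigger = (docId: string, data: CounterDoc, depth: number) => void;

class TinyStore {
  private docs: { [id: string]: CounterDoc } = {};
  private trigger: Trigger | null = null;
  private maxDepth: number;
  public skippedWrites = 0;

  constructor(maxDepth: number) {
    this.maxDepth = maxDepth;
  }

  onWrite(trigger: Trigger): void {
    this.trigger = trigger;
  }

  // depth = how many trigger hops led to this write (0 for a client write).
  write(docId: string, data: CounterDoc, depth: number = 0): void {
    if (depth > this.maxDepth) {
      this.skippedWrites++; // circuit breaker: refuse to go deeper
      return;
    }
    this.docs[docId] = data;
    if (this.trigger !== null) {
      this.trigger(docId, data, depth);
    }
  }

  read(docId: string): CounterDoc | undefined {
    return this.docs[docId];
  }
}

const tinyStore = new TinyStore(3);
// A careless trigger that writes to the same document it listens to.
tinyStore.onWrite((docId, data, depth) => {
  tinyStore.write(docId, { counter: data.counter + 1 }, depth + 1);
});
tinyStore.write("doc-1", { counter: 0 });
```

Without the depth check this trigger would call itself forever; with it, the chain stops after the configured number of hops and the refused write is counted so that the problem stays visible.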
How does it all differ from Kubernetes?
You get a similar effect with absolutely no infrastructure planning and configuration. However, there will be a point where Kubernetes is the only way to go. At least today (2021).
CTO benefits from serverless projects
- No need to worry about the infrastructure, literally.
- This technology allows you to simplify many things, and to use 1 programmer in place of a team of 6 (a few people made these estimates for me based on what I am doing this year). However, that requires more capable programmers than usual in the team. That is an honest warning for overly optimistic people. This project does not come easy for me.
References
Firestore: https://firebase.google.com/docs/firestore
Couchbase: https://www.couchbase.com/
Google Cloud Functions: https://cloud.google.com/functions
Firebase Cloud Functions: https://firebase.google.com/docs/functions
Azure Functions: https://azure.microsoft.com/en-us/services/functions/#overview
Knative: https://www.redhat.com/en/topics/microservices/what-is-knative