OneArmy Database Migration #1441

chrismclarke · 2022-02-01T21:06:04Z

chrismclarke
Feb 1, 2022
Maintainer

What

The platform was originally built around Firebase’s Firestore DB as the primary backend database, but has since evolved to also include a local offline/cache database (Dexie) and an additional server cache (Firebase Realtime Database).

The existing setup has helped us move forward quickly whilst keeping server costs minimal. However, as we scale to additional platform instances and features, it’s struggling to keep pace.

It’s time for a change!

Why

The unstructured nature of the DB means that it will generally accept any data thrown at it. Whilst this can be mitigated through type-checking and scripting, data migrations have to be considered more carefully, as it is hard to keep track of all historic changes and any existing inconsistencies in the data. This has made us more averse to making changes as required.
Billing is directly proportional to the number of read ops, strongly discouraging us from introducing features that might negatively impact it (such as generating summaries from a collection of documents, e.g. all howtos). Whilst not the worst practice ever, it often leads to overly complex structures or anti-patterns in how we organize data and queries.
Local development typically requires access to server instances, which makes it harder to test operations such as updating and deleting data without having a knock-on effect for other developers. It also means exposing various API credentials for test projects, which could be exploited. Firebase does provide emulators, but they are incomplete and can be buggy (e.g. issues sharing seed data across Windows/Linux instances).
Vendor lock-in to the Google/Firebase ecosystem (less flexible and generally opposed to our open principles)

How

A planned migration away from firebase to other open database platforms, which can either be hosted on our own servers or via hosting providers. This will be a gradual process that will initially focus only on the database (not hosting, storage etc.), with the aim to run both in parallel before fully switching over.

I would propose using Supabase as the alternative DB, as it is open source and is currently trying to position itself as a direct alternative to firebase (with roadmap to replicate most of firebase’s core features).

This would address the issues above broadly by:

Supabase uses a structured postgres db underneath, so not only will it require defined table structures moving forwards but can help us identify inconsistencies when trying to migrate legacy data
If using hosted supabase, billing is for db size, not requests. The free tier is a good 500MB (our current db is more like 5MB I think). If self-hosting, performance may still be a factor to consider for larger operations, but the consequences of poor optimisations are much less drastic (possibly requiring a db reboot instead of a large bill). It also gives options for things like replication if we did want to run some more intensive tasks, either on a local or server clone.
Whilst I would still probably suggest keeping a dev server running to make it easier for people just interested in working on frontend code, those working on backend could quite quickly run their own instances via docker desktop which will accurately mimic how things are run on a server.
Supabase is built around docker containers, so can be deployed to any infrastructure that supports them, which these days is just about everything (even a raspberry pi). If using the supabase hosted service (which I think is built on top of aws), future migration is still easily possible (plenty of readily available methods to clone one db to another in postgres). There is also planned future support for kubernetes, which could help if we ever need to build for a larger scale.

Proposed Roadmap

Phase 0 - Prepare
Clear backlog of existing PRs to try and provide a bit more space for making larger changes. Existing issues/PRs that impact on the DB will be prioritized, and future issues/PRs that impact on the DB will be temporarily put on hold

Phase 1 - Deploy
Deploy 2 supabase instances for staging and production
Create firebase functions that will allow one-time migration of legacy data
Create firebase functions that will allow us to replicate data from firestore to supabase in an ongoing way
Get supabase up and running in full replication of firestore, with multiple test/ci projects on single staging instance and a single initial production site on production DB
Use replicated database as means to check existing data for inconsistencies and apply small bit of housekeeping
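
As a rough illustration of the one-time migration and inconsistency check in Phase 1, here is a minimal sketch that maps loosely-typed legacy documents onto a fixed row shape and collects anything that does not fit. The field names (`title`, `slug`, `_created`), the `HowtoRow` shape and the `migrateHowtos` helper are all assumptions for illustration, not the real schema or an existing function.

```typescript
// Hypothetical legacy document: any field may be missing or mistyped.
type LegacyDoc = { [field: string]: unknown };

// Hypothetical target row for a structured "howtos" table.
interface HowtoRow {
  id: string;
  title: string;
  slug: string;
  created_at: string; // ISO 8601 timestamp
}

interface MigrationResult {
  rows: HowtoRow[];
  // Documents that could not be mapped, with the reason why
  rejected: { id: string; reason: string }[];
}

// Map each document onto the row shape, rejecting (rather than silently
// writing) anything that does not satisfy the structure.
function migrateHowtos(docs: { id: string; data: LegacyDoc }[]): MigrationResult {
  const result: MigrationResult = { rows: [], rejected: [] };
  for (const { id, data } of docs) {
    const title = data["title"];
    const slug = data["slug"];
    const created = data["_created"];
    if (typeof title !== "string" || title.trim() === "") {
      result.rejected.push({ id, reason: "missing title" });
      continue;
    }
    if (typeof slug !== "string" || slug === "") {
      result.rejected.push({ id, reason: "missing slug" });
      continue;
    }
    const createdAt = typeof created === "string" ? new Date(created) : new Date(NaN);
    if (isNaN(createdAt.getTime())) {
      result.rejected.push({ id, reason: "invalid created date" });
      continue;
    }
    result.rows.push({ id, title, slug, created_at: createdAt.toISOString() });
  }
  return result;
}
```

The `rejected` list is the useful part for housekeeping: it gives a concrete report of legacy documents that need fixing before they can live in a structured table.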

Phase 2 - Read
Move db read ops to supabase
Replace direct DB write ops with cloud functions (still via firestore, groundwork for next phase)

Phase 3 - Write
Update cloud functions to support writes to supabase
Add support for data triggers to replace those used by firestore
Enable direct writes to supabase alongside firebase (instead of triggered), comparing outputs of both writes to ensure they are kept in sync
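
One possible way to compare the outputs of both writes during Phase 3 is a field-level diff that ignores key order. This is only a sketch: `stableStringify` and `diffWrites` are hypothetical helpers, and it assumes both records have already been loaded as plain JSON-like objects.

```typescript
type Row = { [key: string]: unknown };

// Serialise with sorted object keys so key order never causes a false diff.
function stableStringify(value: unknown): string {
  if (value === undefined) {
    return "undefined";
  }
  if (Array.isArray(value)) {
    return "[" + value.map((v) => stableStringify(v)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Row;
    const keys = Object.keys(obj).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

// Return the sorted names of fields whose values differ between the two stores.
function diffWrites(firestoreDoc: Row, supabaseRow: Row): string[] {
  const seen: { [key: string]: boolean } = {};
  Object.keys(firestoreDoc).forEach((k) => { seen[k] = true; });
  Object.keys(supabaseRow).forEach((k) => { seen[k] = true; });
  return Object.keys(seen)
    .sort()
    .filter((k) => stableStringify(firestoreDoc[k]) !== stableStringify(supabaseRow[k]));
}
```

An empty result means the two writes agree; anything else could be logged against the document id for follow-up.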

Phase 4 – Transition
Add support and documentation for local development with Supabase
Move all DB ops to supabase

Sticking Points

Firestore supports nested documents (subcollections), but postgres does not. Alternative syntax will need to be considered.
Supabase’s support for triggers is quite new and possibly subject to change

Assuming the migration doesn't happen all at once, likely there will be some extra overhead work to keep any supabase-backed branch/fork in line with main.
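
For the subcollection sticking point, one option is to turn each nested document path into a row in a flat table with a reference to its parent. The sketch below assumes a naming convention (collection names joined with `_`, a `parent_id` column) that is purely illustrative; `flattenPath` is a hypothetical helper.

```typescript
interface FlatRef {
  table: string; // e.g. "research_comments"
  id: string; // id of the document itself
  parent_id: string | null; // id of the parent document, null for top-level documents
}

// Firestore document paths alternate collection/document segments,
// e.g. "research/abc/comments/xyz".
function flattenPath(path: string): FlatRef {
  const parts = path.split("/").filter((p) => p.length > 0);
  if (parts.length === 0 || parts.length % 2 !== 0) {
    throw new Error("Not a document path: " + path);
  }
  const collections: string[] = [];
  for (let i = 0; i < parts.length; i += 2) {
    collections.push(parts[i]);
  }
  return {
    table: collections.join("_"),
    id: parts[parts.length - 1],
    parent_id: parts.length >= 4 ? parts[parts.length - 3] : null,
  };
}
```

With this convention, "research/abc/comments/xyz" would become a row in `research_comments` with id `xyz` and `parent_id` `abc`, which maps naturally onto a foreign key in postgres.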

Supporting Dev

As this would be quite a large/ongoing project I would recommend defining a set of multiple bounties to be released at key milestones (e.g. phases), with work ideally coordinated between at least 1 core maintainer and 1 feature bounty developer.

Alternatives

Of the initial issues, (1 – multi projects) and (2 – unstructured) could be solved reasonably well within the existing architecture. There is nothing stopping us hosting multiple sites on the same database (we already do this when testing in CI), and there are various tools that can help enforce more rigorous structure on the database (e.g. ORMs such as sequelize, typeorm, fireorm, or more simple data validators like joi or yup).

I would expect (4 – local DB emulators) to improve over time, but (3 – Billing ops) and (5 – Vendor lock-in) are quite fundamental to firebase.

Instead of Supabase there are plenty of alternatives (e.g. mongodb, pure mysql/postgres, fauna etc.). My main issue with mongodb is the way in which the company typically tries to put useful features behind paywalls (e.g. atlas, realm), which could prove problematic when trying to migrate more features such as automated triggers, local sync etc. As supabase is built on top of postgres anyway, I see the advantage working with their existing ecosystem of tools as many are highly useful to us (e.g. triggers, functions, storage, auth etc.).

Any thoughts?

thisislawatts · 2022-02-01T21:45:19Z

thisislawatts
Feb 1, 2022
Maintainer

Thanks for taking the time to write this up. Very useful starting point for discussion.

What is the end goal of this migration? I see the terms Firebase and Firestore being used interchangeably throughout this document. However it is unclear to me on the scope of work here. Can we clarify what is out of scope? For example we currently use Firebase (sub)products authentication and file storage. Is that something that would be tackled separately?

My assumption is that we are only talking about migrating the data persistence layer rather than the other features. Given that context, what additional complexity do we introduce moving data storage away whilst the others (Auth, filestorage, hosting and serverless functions) remain with Google Firebase.

1 reply

chrismclarke Feb 6, 2022
Maintainer Author

My bad, yeah focus on just firestore/realtimedb to start as the full scope of firebase is pretty vast and most of the other components haven't been holding us back in the same way firestore has (hosting, auth, storage would all ideally be future migrations but will depend a bit on what the alternatives are and probably will be a bigger discussion).

Right now I think the main interoperability issues would come from cloud functions - currently we can easily include db triggered functions, however I expect via supabase we will need to find some way to manage db triggers independently and call cloud functions as required. But having never tried such a migration before it's hard to predict where other sticking points might be.

I do however think it's really useful to keep this discussion in mind as we think about things like beefing up security (via firebase proprietary rules) and/or refactoring how we interact with the database (e.g. apis vs SDK).

thisislawatts · 2022-02-01T22:01:34Z

thisislawatts
Feb 1, 2022
Maintainer

It would be useful to sketch out a schema for our existing system that would be compatible with relational data stores.

Both Research and Howto documents contain updates and steps respectively, which are extensible lists of sub documents. This is quite straightforward within a Document store but it's unlikely we'd want to use Postgres' JSON data type for these documents.

8 replies

chrismclarke Feb 7, 2022
Maintainer Author

One other minor comment on couchdb also - in the past I've found it pretty tricky to work on things like user access and security. I know we don't have very robust systems in place at the moment so wouldn't be a total deal breaker, but I have found myself having to create individual databases per user to get basic row-level access equivalent (discussion in thread https://github.com/apache/couchdb/issues/1891). I'm not sure what the current state is (this was a few years back), but from my limited time working with couchdb I did find it a bit of a challenge (not to say supabase won't be)...

thisislawatts Feb 8, 2022
Maintainer

My concern was around howto.steps and research.updates columns being JSONB rather than using relational constructs to model them.

There is an open issue to introduce comments on individual Research Update items, so it seems reasonable we would want to think about each individual Research.Update as an atomic item rather than something within a JSON blob.

I haven't worked with CouchDB (yet), it simply ranked highly whilst quickly searching for OSS DocumentDB solutions.

I think a schema design would be a useful tool to move this discussion forward. I have used https://www.dbdesigner.net/ before for this which I found pretty helpful! (edit: Although looks like they have introduced a paywall :( )

Also maybe worth updating the original Discussion post so that reasons for Why? is extended to include Security.

chrismclarke Feb 8, 2022
Maintainer Author

Definitely agreed for comments and research updates, but probably not for howto steps. Basically I think that anything that is created as part of the original document probably makes most sense to sit in the same table (e.g. howto description, images, steps, slug etc.) and then for anything that is user-generated on top probably in a separate table (e.g. howto_comments, research_updates, etc.)
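
To make the split above concrete, here is a small sketch of turning one research document into a parent row plus separate `research_updates` rows. The field names, the `position` column and the id scheme are assumptions for illustration only, not the actual schema.

```typescript
// Hypothetical shape of a legacy research document with embedded updates.
interface ResearchDoc {
  id: string;
  title: string;
  updates: { title: string; description: string }[];
}

interface ResearchRow {
  id: string;
  title: string;
}

interface ResearchUpdateRow {
  id: string;
  research_id: string; // reference back to the parent research row
  position: number; // preserves the original ordering of updates
  title: string;
  description: string;
}

// Split the embedded updates array into rows of their own, so each update
// can later be referenced individually (e.g. by a comment).
function splitResearch(doc: ResearchDoc): { research: ResearchRow; updates: ResearchUpdateRow[] } {
  return {
    research: { id: doc.id, title: doc.title },
    updates: doc.updates.map((update, index) => ({
      id: doc.id + "_" + index, // illustrative id scheme only
      research_id: doc.id,
      position: index,
      title: update.title,
      description: update.description,
    })),
  };
}
```

Howto steps would not go through a split like this under the approach described here; they would stay in a single column on the howto row.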

DB diagram would be nice for sure. Usually I just use workbench for those kinda things, but have also used dbforge which I think is a bit more beginner-friendly. Haven't tried dbdesigner but might be nice to have an online tool that supports collaboration.

thisislawatts Feb 8, 2022
Maintainer

Oh cool, I've not heard of Workbench, I will check it out 👀

chrismclarke Feb 9, 2022
Maintainer Author

It's a bit overkill, very much the swiss army knife of all things mysql, but what it lacks in user-friendliness it makes up for in at least being officially maintained by oracle (mysql) and free. DBforge is a lot more user friendly but pretty much impossible to do anything within free tier so I'd struggle to recommend beyond testing purposes.

andreinwald · 2023-09-06T17:10:10Z

andreinwald
Sep 6, 2023

Hello everyone! I have some thoughts about this discussion.

a) Choosing between Supabase and your own relational DB + backend code, the second looks much better in your situation: you already have complex code in the frontend repository and functions, so supporting your own DB wouldn't be a problem, and as a benefit you would never hit Supabase's vendor limitations or lock-in.

b) My idea for how to carry out the migration to your own DB:

Create a proxy Database API server that just makes requests to Firebase. It can be done with something like express.js.
Route all frontend data requests through this API (should be easy, because data and keys are the same)
For some data keys (routes), switch reading and writing from Firebase to the new relational DB or Supabase
Add routes for the remaining data without haste, at whatever pace and resources allow, until you no longer make any requests to Firebase and can turn it off.
For more complex keys or new data keys - create regular routes with some extra processing logic
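
The proxy steps above could look roughly like the sketch below, where each collection is served by Firebase until it is explicitly marked as migrated. `DataStore`, `MemoryStore` and `ProxyApi` are hypothetical names, the stores are in-memory stand-ins, and the calls are synchronous only to keep the example short (real database clients would be async).

```typescript
type Backend = "firebase" | "relational";

interface DataStore {
  read(collection: string, id: string): unknown;
  write(collection: string, id: string, value: unknown): void;
}

// In-memory stand-in for a real database client.
class MemoryStore implements DataStore {
  private data: { [key: string]: unknown } = {};

  read(collection: string, id: string): unknown {
    return this.data[collection + "/" + id];
  }

  write(collection: string, id: string, value: unknown): void {
    this.data[collection + "/" + id] = value;
  }
}

// Routes each collection to Firebase until it has been marked as migrated.
class ProxyApi {
  private stores: { firebase: DataStore; relational: DataStore };
  private migrated: { [collection: string]: boolean } = {};

  constructor(firebase: DataStore, relational: DataStore) {
    this.stores = { firebase, relational };
  }

  markMigrated(collection: string): void {
    this.migrated[collection] = true;
  }

  backendFor(collection: string): Backend {
    return this.migrated[collection] === true ? "relational" : "firebase";
  }

  read(collection: string, id: string): unknown {
    return this.stores[this.backendFor(collection)].read(collection, id);
  }

  write(collection: string, id: string, value: unknown): void {
    this.stores[this.backendFor(collection)].write(collection, id, value);
  }
}
```

The frontend only ever talks to `ProxyApi`, so moving a collection becomes a single `markMigrated` call once its data has been copied across.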

Best, Andrew

1 reply

goratt12 Sep 6, 2023
Maintainer

Using a backend such as express.js has other advantages. For example:
currently when a user marks a research article as Useful/Un-useful, the UI gets the current votedUsefulBy array, appends/removes the logged-in user and pushes the update to the server.
This implementation has some issues:

if two users open the article at the same time and mark it as useful, they will overwrite each other's changes
if a user tampers with the UI code, they can potentially empty the array and push it

So I'm suggesting that if we move the DB connection from the UI to a backend, we should simply create backend endpoints such as: /research/<id>/vote-useful | /research/<id>/vote-unuseful
This will solve both issues I discussed.
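
A minimal sketch of the logic behind such endpoints, assuming the user id comes from the authenticated session rather than from the request body. `ResearchVotes` is a hypothetical in-memory stand-in; a real backend would still need a transaction or a unique constraint in the database to stay safe under truly concurrent requests.

```typescript
// Server-side vote state: the client never sends the whole array, only the
// intent to vote, so it cannot overwrite or empty other users' votes.
class ResearchVotes {
  private votedUsefulBy: { [researchId: string]: string[] } = {};

  // Handler logic for /research/<id>/vote-useful
  voteUseful(researchId: string, userId: string): string[] {
    const current = this.votedUsefulBy[researchId] || [];
    // Voting twice is a no-op, so retries are safe
    const next = current.indexOf(userId) === -1 ? current.concat(userId) : current;
    this.votedUsefulBy[researchId] = next;
    return next.slice();
  }

  // Handler logic for /research/<id>/vote-unuseful
  voteUnuseful(researchId: string, userId: string): string[] {
    const current = this.votedUsefulBy[researchId] || [];
    const next = current.filter((id) => id !== userId);
    this.votedUsefulBy[researchId] = next;
    return next.slice();
  }
}
```

Because each request is applied to the server's current list rather than a snapshot held by the browser, two users voting at the same time both end up recorded.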

mariojsnunes · 2023-11-28T01:01:41Z

mariojsnunes
Nov 28, 2023
Maintainer

There is also https://pocketbase.io/, which people seem to favor over supabase.


mariojsnunes · 2024-04-25T16:08:01Z

mariojsnunes
Apr 25, 2024
Maintainer

Before choosing a new database, we should consider a Multi-Tenant approach first.
For NodeJS, there is PayloadCMS which just announced a much improved 3.0 release.

Why would a multi-tenant solution be beneficial?

Reduced maintenance due to less DevOps work
Reduced infrastructure costs due to shared infrastructure
It's aligned to our platform roadmap 3. Then we make it easy to DIY as other communities could use our infrastructure and not require a Dev/IT team

Edit:
The cons:

Likely more configuration/learning curve
Ideally we would have 1 DB per tenant for data isolation, PayloadCMS doesn't support that as of now
We probably don't want to become SaaS providers, so the 3rd "benefit" wouldn't make much sense


mariojsnunes · 2024-05-18T16:41:42Z

mariojsnunes
May 18, 2024
Maintainer

Supabase has official firebase migration guides:
https://supabase.com/docs/guides/resources/migrating-to-supabase/firebase-auth
https://supabase.com/docs/guides/resources/migrating-to-supabase/firestore-data
https://supabase.com/docs/guides/resources/migrating-to-supabase/firebase-storage


pizzaisdavid · 2024-06-03T17:00:26Z

pizzaisdavid
Jun 3, 2024

For point 1, Firebase now supports SQL (I guess?) https://youtu.be/vYk6Uh2WGto?t=32

https://www.youtube.com/watch?v=7OdVatEI85o

https://firebase.google.com/products/data-connect

EDIT: seems to be in early access / private preview.

The smallest price seems to be like 12 Euros just for the database, which seems kinda lame. https://cloud.google.com/products/calculator-legacy/#id=7ae4087c-7184-403e-b71e-4502591a3e24

