improve quickstart documentation #849

Merged 2 commits on Mar 2, 2022
131 changes: 98 additions & 33 deletions README.md
@@ -13,8 +13,19 @@ Services are packaged as Docker images.
Microservices are set up to be run with or without Docker Compose for local
development - see the sections below for more details.

The recommended setup for new developers is to run all the backend services (api
and geoprocessing services, alongside their PostgreSQL and Redis databases) via
Docker Compose, and the frontend app natively.
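In practice, and as detailed in the sections below, a minimal quickstart with
this setup looks roughly as follows (a sketch; it assumes the environment files
described in the respective sections have already been populated):

``` bash
# Terminal 1: backend services and their databases, via Docker Compose
make start-api

# Terminal 2: frontend app, run natively
cd app
yarn install
yarn dev
```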

In CI, testing, staging and production environments, microservices are
orchestrated via Kubernetes (see [the relevant
documentation](./infrastructure/README.md)).

Most of the commands listed in this README and referenced elsewhere in the
repository are targeted at a GNU/Linux OS environment such as a recent Ubuntu,
Arch or Debian system, whether running natively or in a VM or under Windows
Subsystem for Linux 2 (WSL 2). They should also work identically on macOS, but
may need some adaptation to run on Windows systems.

## Platform architecture

@@ -25,11 +36,20 @@ details.

## Dependencies

- [NodeJS](https://nodejs.org/en/) v14.18
- [Yarn](https://classic.yarnpkg.com/) v1
- [PostgreSQL](https://www.postgresql.org/) v14
- [PostGIS](https://postgis.net/) v3
- [Redis](https://redis.io/) v6
- A [Sparkpost](https://www.sparkpost.com/) account

For development environments, a Sparkpost account separate from the one used for
staging/production should be used. Unless the transactional email components of
the platform are being actively worked on (email verification on signup, email
confirmation for password changes, email flow for resetting forgotten passwords,
etc.), there is no need to set up email templates within the Sparkpost account;
only a Sparkpost API key is needed (see the documentation on
[environment variables](./ENV_VARS.md) for details).
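
As a quick sanity check of the local toolchain (a sketch; it assumes the tools
are installed and on the PATH), installed versions can be compared against the
list above:

``` bash
node --version          # expected: v14.18.x
yarn --version          # expected: 1.x
psql --version          # expected: 14.x
redis-server --version  # expected: 6.x
```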

## Running API and Geoprocessing services using Docker

@@ -47,54 +67,60 @@ via a Docker volume.

### Running the Marxan Cloud platform

Run `make start-api` to start all the backend services and all the required
database services.

To run the frontend app, populate the `app/.env` file according to the [app
documentation](./app/README.md), then:

``` bash
npm install -g yarn
cd app
nvm use ## or fnm use, etc. according to the preferred node version manager
yarn install
yarn dev
```

The frontend app will then be available on http://localhost:3000 (or at the URL
shown when the app starts, if a different port has been configured).

### Seeding a new instance with initial data

Once all the backend services have started for the first time, initial seed data
must be imported into the new instance. Likewise, after resetting an instance
via `make clean-slate`, this step must be repeated after having started the
backend services.

Please make sure to wait for all of the backend services (api, geoprocessing and
webshot) to fully start, as database migrations are run while the services
start: attempting to import seed data before migrations have run fully will
result in errors.
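
One way to check this is to follow the logs of the backend services until they
report having finished starting up (a sketch; it assumes the Docker Compose
service names are `api`, `geoprocessing` and `webshot`, which may differ from
the names actually used in `docker-compose.yml`):

``` bash
# Follow the backend services' logs; wait until migrations have completed and
# each service reports it is listening before importing seed data.
# With Compose v1, use `docker-compose logs -f ...` instead.
docker compose logs -f api geoprocessing webshot
```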
Importing initial seed data can be done in two distinct ways:

1. Full data (recommended in most cases)

``` bash
make seed-geodb-data
```

This will populate the metadata DB and will trigger the geoprocessing ETL
pipelines to seed the geoDB with the full data that would normally be used in
staging and production instances.

Please note that this full db set up will require at least 16GB of RAM and 40GB
of disk space in order to carry out some of these tasks (GADM and WDPA data
import pipelines). Also, the number of CPU cores will impact the time needed to
seed a new instance with the complete GADM and WDPA datasets.
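
On a GNU/Linux system, the available memory, disk space and CPU cores can be
checked before starting the full import, for example:

``` bash
free -h    # total and available RAM
df -h .    # free disk space on the current filesystem
nproc      # number of available CPU cores
```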

2. Small subset of test data

To populate a new instance with a small subset of test data, instead, the
following command can be used:

``` bash
make seed-dbs
```

## Running API and Geoprocessing services natively
@@ -170,6 +196,45 @@

Running tests requires previously loading the [test seed
data](#setting-up-test-seed-data), and may modify data in the database - do not
run tests using a database whose data you don't want to lose.

## Maintenance

### Resetting data to a clean slate status

The main `Makefile` provides a way to reset db instances from scratch. This can
be useful to do regularly, to avoid keeping obsolete data in the local
development instance.

``` bash
make clean-slate
```
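
A full reset typically needs to be followed by restarting the backend services
and re-importing seed data; for example (a sketch, using the small test data
subset described above):

``` bash
make clean-slate
make start-api
# Wait for all the backend services to fully start (and migrations to run), then:
make seed-dbs
```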

### Update seed data (GADM, WDPA) from newer upstream releases

The main `Makefile` provides a set of commands to create new dbs dumps from
upstream data sources, upload these dumps to an Azure storage bucket, and
populate both dbs from these dumps. Populating clean dbs this way will
typically be faster than triggering the full geodb ETL pipelines.

To run the geoprocessing ETL pipelines (such as when using option 1 of *Seeding
a new instance with initial data* above) and upload the processed data to an
Azure bucket:

``` bash
make generate-content-dumps && make upload-dump-data
```

Other developers can then benefit from the pre-prepared data seeds when
populating new development instances after their initial setup:

``` bash
make restore-dumps
```

### Running the notebooks

This step is only needed when developing Python notebooks for Marxan.

Run `make notebooks` to start the jupyterlab service.

## Development workflow (TBD)

We use a lightweight git flow workflow. `develop`, `main`, feature/bug fix
5 changes: 0 additions & 5 deletions airflow/dags/exampleDag.py

This file was deleted.

4 changes: 0 additions & 4 deletions airflow/scripts/entrypoint.sh

This file was deleted.

44 changes: 0 additions & 44 deletions docker-compose.yml
@@ -150,55 +150,11 @@ services:
    depends_on:
      - redis

  postgresql-airflow:
    image: postgres
    container_name: marxan-postgresql-airflow
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow

  airflow-scheduler:
    container_name: marxan-airflow-scheduler
    image: apache/airflow:2.0.0
    command: scheduler
    depends_on:
      - postgresql-airflow
    restart: on-failure
    environment:
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgresql-airflow/airflow
      - AIRFLOW__CORE__EXECUTOR=LocalExecutor
    volumes:
      - ./airflow/dags:/opt/airflow/dags
      - marxan-airflow-logs:/opt/airflow/logs

  airflow-webserver:
    container_name: marxan-airflow-webserver
    image: apache/airflow:2.0.0
    restart: on-failure
    entrypoint: ['sh','./scripts/entrypoint.sh']
    env_file:
      - .env
    environment:
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgresql-airflow/airflow
      - AIRFLOW__CORE__EXECUTOR=LocalExecutor
    volumes:
      - ./airflow/dags:/opt/airflow/dags
      - ./airflow/scripts:/opt/airflow/scripts
      - marxan-airflow-logs:/opt/airflow/logs
    ports:
      - "${AIRFLOW_PORT}:8080"
    depends_on:
      - postgresql-airflow
      - airflow-scheduler

volumes:
  marxan-cloud-postgresql-api-data:

  marxan-cloud-postgresql-geo-data:

  marxan-cloud-redis-api-data:

  marxan-airflow-logs:

  marxan-cloud-backend-temp-storage: