diff --git a/.github/workflows/test_oonipipeline.yml b/.github/workflows/test_oonipipeline.yml index abd1cb91..cf22ebe0 100644 --- a/.github/workflows/test_oonipipeline.yml +++ b/.github/workflows/test_oonipipeline.yml @@ -2,7 +2,7 @@ name: test oonipipeline on: push jobs: run_tests: - runs-on: ubuntu-latest + runs-on: ubuntu-20.04 steps: - uses: actions/checkout@v3 diff --git a/.gitignore b/.gitignore index 52d8bbe3..999050a9 100644 --- a/.gitignore +++ b/.gitignore @@ -9,4 +9,3 @@ coverage.xml /output /attic /prof -/clickhouse-data diff --git a/Readme.md b/Readme.md index 8e81ddfe..98159cda 100644 --- a/Readme.md +++ b/Readme.md @@ -7,11 +7,13 @@ Most users will likely be interested in using this as a CLI tool for downloading measurements. If that is your goal, getting started is easy, run: + ``` pip install oonidata ``` You will then be able to download measurements via: + ``` oonidata sync --probe-cc IT --start-day 2022-10-01 --end-day 2022-10-02 --output-dir measurements/ ``` @@ -19,218 +21,6 @@ oonidata sync --probe-cc IT --start-day 2022-10-01 --end-day 2022-10-02 --output This will download all OONI measurements for Italy into the directory `./measurements` that were uploaded between 2022-10-01 and 2022-10-02. -If you are interested in learning more about the design of the analysis tooling, -please read on. - -## Developer setup - -This project makes use of [poetry](https://python-poetry.org/) for dependency -management. Follow [their -instructions](https://python-poetry.org/docs/#installation) on how to set it up. 
- -Once you have done that you should be able to run: -``` -poetry install -poetry run python -m oonidata --help -``` -## Architecture overview - -The analysis engine is made up of several components: -* Observation generation -* Response body archiving -* Ground truth generation -* Experiment result generation - -Below we explain each step of this process in detail - -At a high level the pipeline looks like this: - -```mermaid -graph - M{{Measurement}} --> OGEN[[make_observations]] - OGEN --> |many| O{{Observations}} - NDB[(NetInfoDB)] --> OGEN - OGEN --> RB{{ResponseBodies}} - RB --> BA[(BodyArchive)] - FDB[(FingerprintDB)] --> FPH - FPH --> BA - RB --> FPH[[fingerprint_hunter]] - O --> ODB[(ObservationTables)] - - ODB --> MKGT[[make_ground_truths]] - MKGT --> GTDB[(GroundTruthDB)] - GTDB --> MKER - BA --> MKER - ODB --> MKER[[make_experiment_results]] - MKER --> |one| ER{{ExperimentResult}} -``` - -### Observation generation - -The goal of the Observation generation stage is to take raw OONI measurements -as input data and produce as output observations. - -An observation is a timestamped statement about some network condition that was -observed by a particular vantage point. For example, an observation could be -"the TLS handshake to 8.8.4.4:443 with SNI equal to dns.google failed with -a connection reset by peer error". - -What these observations mean for the -target in question (e.g., is there blocking or is the target down?) is something -that is to be determined when looking at data in aggregate and is the -responsibility of the Verdict generation stage. - -During this stage we are also going to enrich observations with metadata about -IP addresses (using the IPInfoDB). - -Each each measurement ends up producing observations that are all of the same -type and are written to the same DB table. 
- -This has the benefit that we don't need to lookup the observations we care about -in several disparate tables, but can do it all in the same one, which is -incredibly fast. - -A side effect is that we end up with tables are can be a bit sparse (several -columns are NULL). - -The tricky part, in the case of complex tests like web_connectivity, is to -figure out which individual sub measurements fit into the same observation row. -For example we would like to have the TCP connect result to appear in the same -row as the DNS query that lead to it with the TLS handshake towards that IP, -port combination. - -You can run the observation generation with a clickhouse backend like so: -``` -poetry run python -m oonidata mkobs --clickhouse clickhouse://localhost/ --data-dir tests/data/datadir/ --start-day 2022-08-01 --end-day 2022-10-01 --create-tables --parallelism 20 -``` - -Here is the list of supported observations so far: -* [x] WebObservation, which has information about DNS, TCP, TLS and HTTP(s) -* [x] WebControlObservation, has the control measurements run by web connectivity (is used to generate ground truths) -* [ ] CircumventionToolObservation, still needs to be designed and implemented - (ideally we would use the same for OpenVPN, Psiphon, VanillaTor) - -### Response body archiving - -It is optionally possible to also create WAR archives of HTTP response bodies -when running the observation generation. +### OONI Pipeline -This is enabled by passing the extra command line argument `--archives-dir`. - -Whenever a response body is detected in a measurement it is sent to the -archiving queue which takes the response body, looks up in the database if it -has seen it already (so we don't store exact duplicate bodies). -If we haven't archived it yet, we write the body to a WAR file and record it's -sha1 hash together with the filename where we wrote it to into a database. 
- -These WAR archives can then be mined asynchronously for blockpages using the -fingerprint hunter command: -``` -oonidata fphunt --data-dir tests/data/datadir/ --archives-dir warchives/ --parallelism 20 -``` - -When a blockpage matching the fingerprint is detected, the relevant database row -for that fingerprint is updated with the ID of the fingerprint which was -detected. - -### Ground Truth generation - -In order to establish if something is being blocked or not, we need some ground truth for comparison. - -The goal of the ground truth generation task is to build a ground truth -database, which contains all the ground truths for every target that has been -tested in a particular day. - -Currently it's implemented using the WebControlObservations, but in the future -we could just use other WebObservation. - -Each ground truth database is actually just a sqlite3 database. For a given day -it's approximately 150MB in size and we load them in memory when we are running -the analysis workflow. - -### ExperimentResult generation - -An experiment result is the interpretation of one or more observations with a -determination of whether the target is `BLOCKED`, `DOWN` or `OK`. - -For each of these states a confidence indicator is given which is an estimate of the -likelyhood of that result to be accurate. - -For each of the 3 states, it's possible also specify a `blocking_detail`, which -gives more information as to why the block might be occurring. - -It's important to note that for a given measurement, multiple experiment results -can be generated, because a target might be blocked in multiple ways or be OK in -some regards, but not in orders. - -This is best explained through a concrete example. 
Let's say a censor is -blocking https://facebook.com/ with the following logic: -* any DNS query for facebook.com get's as answer "127.0.0.1" -* any TCP connect request to 157.240.231.35 gets a RST -* any TLS handshake with SNI facebook.com gets a RST - -In this scenario, assuming the probe has discovered other IPs for facebook.com -through other means (ex. through the test helper or DoH as web_connectivity 0.5 -does), we would like to emit the following experiment results: -* BLOCKED, `dns.bogon`, `facebook.com` -* BLOCKED, `tcp.rst`, `157.240.231.35:80` -* BLOCKED, `tcp.rst`, `157.240.231.35:443` -* OK, `tcp.ok`, `157.240.231.100:80` -* OK, `tcp.ok`, `157.240.231.100:443` -* BLOCKED, `tls.rst`, `157.240.231.35:443` -* BLOCKED, `tls.rst`, `157.240.231.100:443` - -This way we are fully characterising the block in all the methods through which -it is implemented. - -### Current pipeline - -This section documents the current [ooni/pipeline](https://github.com/ooni/pipeline) -design. - -```mermaid -graph LR - - Probes --> ProbeServices - ProbeServices --> Fastpath - Fastpath --> S3MiniCans - Fastpath --> S3JSONL - Fastpath --> FastpathClickhouse - S3JSONL --> API - FastpathClickhouse --> API - API --> Explorer -``` - -```mermaid -classDiagram - direction RL - class CommonMeta{ - measurement_uid - report_id - input - domain - probe_cc - probe_asn - test_name - test_start_time - measurement_start_time - platform - software_name - software_version - } - - class Measurement{ - +Dict test_keys - } - - class Fastpath{ - anomaly - confirmed - msm_failure - blocking_general - +Dict scores - } - Fastpath "1" --> "1" Measurement - Measurement *-- CommonMeta - Fastpath *-- CommonMeta -``` +For documentation on OONI Pipeline v5, see the subdirectory `oonipipeline`. 
diff --git a/oonipipeline/.env b/oonipipeline/.env new file mode 100644 index 00000000..cc96a50b --- /dev/null +++ b/oonipipeline/.env @@ -0,0 +1,12 @@
+COMPOSE_PROJECT_NAME=temporal
+CASSANDRA_VERSION=3.11.9
+ELASTICSEARCH_VERSION=7.16.2
+MYSQL_VERSION=8
+TEMPORAL_VERSION=1.23.0
+TEMPORAL_UI_VERSION=2.26.2
+POSTGRESQL_VERSION=13
+POSTGRES_PASSWORD=temporal
+POSTGRES_USER=temporal
+POSTGRES_DEFAULT_PORT=5432
+OPENSEARCH_VERSION=2.5.0
+JAEGER_VERSION=1.56
diff --git a/oonipipeline/.gitignore b/oonipipeline/.gitignore new file mode 100644 index 00000000..b537087e --- /dev/null +++ b/oonipipeline/.gitignore @@ -0,0 +1 @@
+/_clickhouse-data
diff --git a/oonipipeline/Design.md b/oonipipeline/Design.md new file mode 100644 index 00000000..5f46a05e --- /dev/null +++ b/oonipipeline/Design.md @@ -0,0 +1,207 @@
+## Architecture overview
+
+The analysis engine is made up of several components:
+
+- Observation generation
+- Response body archiving
+- Ground truth generation
+- Experiment result generation
+
+Below we explain each step of this process in detail.
+
+At a high level the pipeline looks like this:
+
+```mermaid
+graph
+  M{{Measurement}} --> OGEN[[make_observations]]
+  OGEN --> |many| O{{Observations}}
+  NDB[(NetInfoDB)] --> OGEN
+  OGEN --> RB{{ResponseBodies}}
+  RB --> BA[(BodyArchive)]
+  FDB[(FingerprintDB)] --> FPH
+  FPH --> BA
+  RB --> FPH[[fingerprint_hunter]]
+  O --> ODB[(ObservationTables)]
+
+  ODB --> MKGT[[make_ground_truths]]
+  MKGT --> GTDB[(GroundTruthDB)]
+  GTDB --> MKER
+  BA --> MKER
+  ODB --> MKER[[make_experiment_results]]
+  MKER --> |one| ER{{ExperimentResult}}
+```
+
+### Observation generation
+
+The goal of the Observation generation stage is to take raw OONI measurements
+as input data and produce as output observations.
+
+An observation is a timestamped statement about some network condition that was
+observed by a particular vantage point.
For example, an observation could be
+"the TLS handshake to 8.8.4.4:443 with SNI equal to dns.google failed with
+a connection reset by peer error".
+
+What these observations mean for the
+target in question (e.g., is there blocking or is the target down?) is something
+that is to be determined when looking at data in aggregate and is the
+responsibility of the Verdict generation stage.
+
+During this stage we are also going to enrich observations with metadata about
+IP addresses (using the IPInfoDB).
+
+Each measurement ends up producing observations that are all of the same
+type and are written to the same DB table.
+
+This has the benefit that we don't need to look up the observations we care
+about in several disparate tables, but can do it all in the same one, which is
+incredibly fast.
+
+A side effect is that we end up with tables that can be a bit sparse (several
+columns are NULL).
+
+The tricky part, in the case of complex tests like web_connectivity, is to
+figure out which individual sub-measurements fit into the same observation row.
+For example, we would like the TCP connect result to appear in the same row as
+the DNS query that led to it, together with the TLS handshake towards that IP
+and port combination.
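To make the single-table layout concrete, here is a minimal sketch of what a flattened observation row could look like (the field names are illustrative, not the actual observation table schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WebObservationRow:
    # One row groups the DNS, TCP and TLS results that belong together.
    # Most columns are optional, so any given row is sparse (lots of NULLs).
    # Field names here are hypothetical, for illustration only.
    measurement_uid: str
    hostname: Optional[str] = None
    ip: Optional[str] = None
    port: Optional[int] = None
    dns_failure: Optional[str] = None
    tcp_success: Optional[bool] = None
    tls_failure: Optional[str] = None

row = WebObservationRow(
    measurement_uid="20221001T000000Z_webconnectivity_IT",
    hostname="facebook.com",
    ip="157.240.231.35",
    port=443,
    tcp_success=True,
    tls_failure="connection_reset",
)
```

Querying one wide table like this avoids the joins that would otherwise be needed across per-protocol tables, at the cost of the sparseness described above.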
+
+You can run the observation generation with a clickhouse backend like so:
+
+```
+poetry run python -m oonidata mkobs --clickhouse clickhouse://localhost/ --data-dir tests/data/datadir/ --start-day 2022-08-01 --end-day 2022-10-01 --create-tables --parallelism 20
+```
+
+Here is the list of supported observations so far:
+
+- [x] WebObservation, which has information about DNS, TCP, TLS and HTTP(s)
+- [x] WebControlObservation, which has the control measurements run by web connectivity (used to generate ground truths)
+- [ ] CircumventionToolObservation, still needs to be designed and implemented
+      (ideally we would use the same for OpenVPN, Psiphon, VanillaTor)
+
+### Response body archiving
+
+It is optionally possible to also create WARC archives of HTTP response bodies
+when running the observation generation.
+
+This is enabled by passing the extra command line argument `--archives-dir`.
+
+Whenever a response body is detected in a measurement it is sent to the
+archiving queue, which checks in the database whether the body has already been
+seen (so we don't store exact duplicate bodies).
+If we haven't archived it yet, we write the body to a WARC file and record its
+sha1 hash, together with the filename we wrote it to, in a database.
+
+These WARC archives can then be mined asynchronously for blockpages using the
+fingerprint hunter command:
+
+```
+oonidata fphunt --data-dir tests/data/datadir/ --archives-dir warchives/ --parallelism 20
+```
+
+When a blockpage matching a fingerprint is detected, the relevant database row
+is updated with the ID of the fingerprint that was detected.
+
+### Ground Truth generation
+
+In order to establish if something is being blocked or not, we need some ground
+truth for comparison.
+
+The goal of the ground truth generation task is to build a ground truth
+database, which contains all the ground truths for every target that has been
+tested on a particular day.
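To illustrate, a per-day ground truth store can be sketched as an in-memory sqlite3 table (the schema below is hypothetical, not the actual GroundTruthDB layout):

```python
import sqlite3

# Build an in-memory ground truth store for one day.
# NOTE: this table layout is made up for illustration; the real
# GroundTruthDB schema lives in the oonipipeline codebase.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE ground_truth ("
    " hostname TEXT, ip TEXT, port INTEGER,"
    " vp_cc TEXT, vp_asn INTEGER,"
    " dns_success INTEGER, tls_success INTEGER)"
)
db.execute(
    "INSERT INTO ground_truth VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("facebook.com", "157.240.231.35", 443, "US", 3269, 1, 1),
)

# The analysis workflow can then ask: did any control vantage point
# successfully resolve this hostname on that day?
row = db.execute(
    "SELECT COUNT(*) FROM ground_truth WHERE hostname = ? AND dns_success = 1",
    ("facebook.com",),
).fetchone()
print(row[0])  # number of successful DNS ground truths
```

Keeping the whole day's table in memory is what makes the comparison against many observations cheap.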
+
+Currently it's implemented using the WebControlObservations, but in the future
+we could also use other WebObservations.
+
+Each ground truth database is actually just a sqlite3 database. For a given day
+it's approximately 150MB in size and we load them into memory when we are
+running the analysis workflow.
+
+### ExperimentResult generation
+
+An experiment result is the interpretation of one or more observations with a
+determination of whether the target is `BLOCKED`, `DOWN` or `OK`.
+
+For each of these states a confidence indicator is given, which is an estimate
+of the likelihood of that result being accurate.
+
+For each of the 3 states, it's also possible to specify a `blocking_detail`,
+which gives more information as to why the block might be occurring.
+
+It's important to note that for a given measurement, multiple experiment results
+can be generated, because a target might be blocked in multiple ways or be OK in
+some regards, but not in others.
+
+This is best explained through a concrete example. Let's say a censor is
+blocking https://facebook.com/ with the following logic:
+
+- any DNS query for facebook.com gets "127.0.0.1" as an answer
+- any TCP connect request to 157.240.231.35 gets an RST
+- any TLS handshake with SNI facebook.com gets an RST
+
+In this scenario, assuming the probe has discovered other IPs for facebook.com
+through other means (e.g. through the test helper or DoH, as web_connectivity
+0.5 does), we would like to emit the following experiment results:
+
+- BLOCKED, `dns.bogon`, `facebook.com`
+- BLOCKED, `tcp.rst`, `157.240.231.35:80`
+- BLOCKED, `tcp.rst`, `157.240.231.35:443`
+- OK, `tcp.ok`, `157.240.231.100:80`
+- OK, `tcp.ok`, `157.240.231.100:443`
+- BLOCKED, `tls.rst`, `157.240.231.35:443`
+- BLOCKED, `tls.rst`, `157.240.231.100:443`
+
+This way we are fully characterising the block in all the methods through which
+it is implemented.
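The facebook.com example above can be sketched as plain data, one result per (outcome, detail, subject) tuple (the class below is illustrative, not the actual ExperimentResult model):

```python
from dataclasses import dataclass

@dataclass
class ExperimentResult:
    # Illustrative sketch, not the real model: one result per
    # (outcome, blocking_detail, subject), with a confidence estimate.
    outcome: str          # "BLOCKED", "DOWN" or "OK"
    blocking_detail: str  # e.g. "dns.bogon", "tcp.rst", "tls.rst"
    subject: str          # hostname or ip:port being characterised
    confidence: float

# A single measurement yields several results, characterising the
# block across all the methods through which it is implemented.
results = [
    ExperimentResult("BLOCKED", "dns.bogon", "facebook.com", 0.9),
    ExperimentResult("BLOCKED", "tcp.rst", "157.240.231.35:443", 0.8),
    ExperimentResult("OK", "tcp.ok", "157.240.231.100:443", 0.8),
    ExperimentResult("BLOCKED", "tls.rst", "157.240.231.100:443", 0.8),
]
blocked = [r for r in results if r.outcome == "BLOCKED"]
```

Note how a target can be simultaneously `BLOCKED` on one address and `OK` on another, which is exactly why one measurement maps to many results.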
+ +### Current pipeline + +This section documents the current [ooni/pipeline](https://github.com/ooni/pipeline) +design. + +```mermaid +graph LR + + Probes --> ProbeServices + ProbeServices --> Fastpath + Fastpath --> S3MiniCans + Fastpath --> S3JSONL + Fastpath --> FastpathClickhouse + S3JSONL --> API + FastpathClickhouse --> API + API --> Explorer +``` + +```mermaid +classDiagram + direction RL + class CommonMeta{ + measurement_uid + report_id + input + domain + probe_cc + probe_asn + test_name + test_start_time + measurement_start_time + platform + software_name + software_version + } + + class Measurement{ + +Dict test_keys + } + + class Fastpath{ + anomaly + confirmed + msm_failure + blocking_general + +Dict scores + } + Fastpath "1" --> "1" Measurement + Measurement *-- CommonMeta + Fastpath *-- CommonMeta +``` diff --git a/oonipipeline/Readme.md b/oonipipeline/Readme.md index 8775c9b2..1c2f87e5 100644 --- a/oonipipeline/Readme.md +++ b/oonipipeline/Readme.md @@ -3,37 +3,121 @@ This it the fifth major iteration of the OONI Data Pipeline. For historical context, these are the major revisions: -* `v0` - The "pipeline" is basically just writing the RAW json files into a public `www` directory. Used until ~2013 -* `v1` - OONI Pipeline based on custom CLI scripts using mongodb as a backend. Used until ~2015. -* `v2` - OONI Pipeline based on [luigi](https://luigi.readthedocs.io/en/stable/). Used until ~2017. -* `v3` - OONI Pipeline based on [airflow](https://airflow.apache.org/). Used until ~2020. -* `v4` - OONI Pipeline basedon custom script and systemd units (aka fastpath). Currently in use in production. -* `v5` - Next generation OONI Pipeline. What this readme is relevant to. Expected to become in production by Q4 2024. + +- `v0` - The "pipeline" is basically just writing the RAW json files into a public `www` directory. Used until ~2013 +- `v1` - OONI Pipeline based on custom CLI scripts using mongodb as a backend. Used until ~2015. 
+- `v2` - OONI Pipeline based on [luigi](https://luigi.readthedocs.io/en/stable/). Used until ~2017.
+- `v3` - OONI Pipeline based on [airflow](https://airflow.apache.org/). Used until ~2020.
+- `v4` - OONI Pipeline based on custom scripts and systemd units (aka fastpath). Currently in use in production.
+- `v5` - Next generation OONI Pipeline, which this readme describes. Expected to be in production by Q4 2024.

## Setup

In order to run the pipeline you should setup the following dependencies:

-* [Temporal for python](https://learn.temporal.io/getting_started/python/dev_environment/)
-* [Clickhouse](https://clickhouse.com/docs/en/install)
-* [hatch](https://hatch.pypa.io/1.9/install/)
+- [Temporal for python](https://learn.temporal.io/getting_started/python/dev_environment/)
+- [Clickhouse](https://clickhouse.com/docs/en/install)
+- [hatch](https://hatch.pypa.io/1.9/install/)

### Quick start

Start temporal dev server:
+
```
temporal server start-dev
```

Start clickhouse server:
+
```
-mkdir -p clickhouse-data
+mkdir -p _clickhouse-data
+cd _clickhouse-data
clickhouse server
```

You can then start the desired workflow, for example to create signal observations for the US:
+
```
hatch run oonipipeline mkobs --probe-cc US --test-name signal --start-day 2024-01-01 --end-day 2024-01-02
```

Monitor the workflow executing by accessing: http://localhost:8233/
+
+If you would like to also collect OpenTelemetry traces, you can set it up like so:
+
+```
+docker run -d --name jaeger \
+  -e COLLECTOR_OTLP_ENABLED=true \
+  -p 16686:16686 \
+  -p 4317:4317 \
+  -p 4318:4318 \
+  jaegertracing/all-in-one:latest
+```
+
+Traces are then visible at the following address: http://localhost:16686/search
+
+### Production usage
+
+By default we use thread-based parallelism, but in production you really want
+multiple worker processes, each running multiple threads.
+
+You should also be using the production temporal server with an elasticsearch
+backend as opposed to the dev server.
+
+To start all the server-side components, we have a handy docker-compose.yml
+that sets everything up.
+
+It can be started from this directory by running:
+
+```
+docker compose up
+```
+
+The important services you can access are the following:
+
+- Temporal UI: http://localhost:8080
+- Superset UI: http://localhost:8083 (u: `admin`, p: `oonity`)
+- OpenTelemetry UI: http://localhost:8088
+
+We don't include a clickhouse instance inside of the docker-compose file by
+design: it's recommended you set that up separately, outside of docker.
+
+To start the worker processes:
+
+```
+hatch run oonipipeline startworkers
+```
+
+Then you can trigger the workflow by passing the `--no-start-workers` flag:
+
+```
+hatch run oonipipeline mkobs --probe-cc US --start-day 2024-01-01 --end-day 2024-01-20 --no-start-workers
+```
+
+#### Superset
+
+Superset is a neat data viz platform.
+
+In order to set it up to speak to your clickhouse instance, assuming it's
+listening on localhost of the host machine, you should:
+
+1. Click Settings -> Data - Database connections
+2. Click + Database
+3. In the Supported Databases drop down pick "Clickhouse Connect"
+4. Enter as Host `host.docker.internal` and port `8123`
+
+Note: `host.docker.internal` only works reliably on Windows, macOS and very
+recent linux+docker versions. On Linux the needed configuration is a bit more
+complex: it requires discovering the gateway IP of the docker network,
+adjusting the clickhouse setup to bind to that IP and setting up correct nft or
+similar firewall rules.
+
+5. Click connect
+6. Go to datasets and click + Dataset
+7. Add all the tables from the `clickhouse` database in the `default` schema.
+   Recommended tables to add are `obs_web` and `measurement_experiment_result`.
+8.
You are now able to start building dashboards + +For more information on superset usage and setup refer to [their +documentation](https://superset.apache.org/docs/). diff --git a/oonipipeline/docker-compose.yml b/oonipipeline/docker-compose.yml new file mode 100644 index 00000000..aa90f17f --- /dev/null +++ b/oonipipeline/docker-compose.yml @@ -0,0 +1,215 @@ +--- +version: "3.5" +services: + +#### Common services + elasticsearch: + container_name: elasticsearch + hostname: elasticsearch + environment: + - cluster.routing.allocation.disk.threshold_enabled=true + - cluster.routing.allocation.disk.watermark.low=512mb + - cluster.routing.allocation.disk.watermark.high=256mb + - cluster.routing.allocation.disk.watermark.flood_stage=128mb + - discovery.type=single-node + - ES_JAVA_OPTS=-Xms256m -Xmx256m + - xpack.security.enabled=false + image: elasticsearch:${ELASTICSEARCH_VERSION} + networks: + - main-network + expose: + - 9200 + volumes: + - ./docker/esdata/:/var/lib/elasticsearch/data + healthcheck: + test: curl -s http://elasticsearch:9200 >/dev/null || exit 1 + interval: 30s + timeout: 10s + retries: 50 + postgresql: + container_name: postgresql + hostname: postgresql + environment: + POSTGRES_PASSWORD: oonipipeline + POSTGRES_USER: oonipipeline + image: postgres:${POSTGRESQL_VERSION} + networks: + - main-network + expose: + - 5432 + volumes: + - ./docker/pgdata:/var/lib/postgresql/data + kibana: + image: docker.elastic.co/kibana/kibana:${ELASTICSEARCH_VERSION} + ports: + - "5601:5601" + environment: + ELASTICSEARCH_URL: http://elasticsearch:9200 + depends_on: + - elasticsearch + networks: + - main-network +#### Temporal + temporal: + container_name: temporal + hostname: temporal + depends_on: + - postgresql + - elasticsearch + environment: + - DB=postgres12 + - DB_PORT=5432 + - POSTGRES_USER=oonipipeline + - POSTGRES_PWD=oonipipeline + - POSTGRES_SEEDS=postgresql + - DYNAMIC_CONFIG_FILE_PATH=config/dynamicconfig/development-sql.yaml + - ENABLE_ES=true + - 
ES_SEEDS=elasticsearch + - ES_VERSION=v7 + image: temporalio/auto-setup:${TEMPORAL_VERSION} + networks: + - main-network + ports: + - 7233:7233 + labels: + kompose.volume.type: configMap + volumes: + - ./docker/temporal-config:/etc/temporal/config/dynamicconfig + temporal-admin-tools: + container_name: temporal-admin-tools + depends_on: + - temporal + environment: + - TEMPORAL_ADDRESS=temporal:7233 + - TEMPORAL_CLI_ADDRESS=temporal:7233 + image: temporalio/admin-tools:${TEMPORAL_VERSION} + networks: + - main-network + stdin_open: true + tty: true + temporal-ui: + container_name: temporal-ui + depends_on: + - temporal + environment: + - TEMPORAL_ADDRESS=temporal:7233 + - TEMPORAL_CORS_ORIGINS=http://localhost:3000 + image: temporalio/ui:${TEMPORAL_UI_VERSION} + networks: + - main-network + ports: + - 8080:8080 + +#### Jaeger for open telemetry + jaeger: + image: jaegertracing/all-in-one:${JAEGER_VERSION} + ports: + - "8088:16686" + - "6831:6831/udp" + - "6832:6832/udp" + - "5778:5778" + - "4317:4317" + - "4318:4318" + - "14250:14250" + - "14268:14268" + - "14269:14269" + - "9411:9411" + container_name: jaeger + hostname: jaeger + restart: unless-stopped + networks: + - main-network + environment: + COLLECTOR_ZIPKIN_HOST_PORT: ":9411" + + +### TODO(art): currently jaeger setup with elastic is not working, so we +## are temporarily just using the all-in-one container that's not meant for production use + # jaeger-collector: + # image: jaegertracing/jaeger-collector:${JAEGER_VERSION} + # ports: + # - "14267:14267" + # - "14268:14268" + # - "9411:9411" + # - "4317:4317" + # - "4318:4318" + # depends_on: + # - elasticsearch + # container_name: jaeger-collector + # hostname: jaeger-collector + # restart: unless-stopped + # networks: + # - main-network + # volumes: + # - ./scripts/:/scripts + # environment: + # SPAN_STORAGE_TYPE: "elasticsearch" + # ES_SERVER_URLS: "http://elasticsearch:9200" + # entrypoint: ["/bin/sh", "/scripts/wait-for.sh", "elasticsearch:9200"] + # 
command: + # - "/go/bin/collector-linux" + + # jaeger-agent: + # image: jaegertracing/jaeger-agent:${JAEGER_VERSION} + # ports: + # - "5775:5775/udp" + # - "5778:5778" + # - "6831:6831/udp" + # - "6832:6832/udp" + # depends_on: + # - elasticsearch + # - jaeger-collector + # restart: unless-stopped + # container_name: jaeger-agent + # hostname: jaeger-agent + # networks: + # - main-network + # command: + # - "--reporter.grpc.host-port=jaeger-collector:14250" + + # jaeger-query: + # image: jaegertracing/jaeger-query:${JAEGER_VERSION} + # ports: + # - 8081:16686 + # depends_on: + # - elasticsearch + # - jaeger-collector + # restart: unless-stopped + # container_name: jaeger-query + # hostname: jaeger-query + # networks: + # - main-network + # volumes: + # - ./scripts/:/scripts + # entrypoint: ["/bin/sh", "/scripts/wait-for.sh", "elasticsearch:9200"] + # environment: + # SPAN_STORAGE_TYPE: "elasticsearch" + # ES_SERVER_URLS: "http://elasticsearch:9200" + # command: + # - "/go/bin/query-linux" + +### Superset + superset: + image: ooni/oonipipeline-superset + build: + context: . + dockerfile: ./docker/superset.Dockerfile + + ports: + - "8083:8088" + container_name: superset + hostname: superset + restart: unless-stopped + networks: + - main-network + volumes: + - ./docker/superset-config:/etc/superset + depends_on: + - postgresql + environment: + SUPERSET_CONFIG_PATH: "/etc/superset/superset_config.py" + +networks: + main-network: + driver: bridge + name: main-network diff --git a/oonipipeline/docker/.gitignore b/oonipipeline/docker/.gitignore new file mode 100644 index 00000000..16377704 --- /dev/null +++ b/oonipipeline/docker/.gitignore @@ -0,0 +1 @@ +/pgdata diff --git a/oonipipeline/docker/run-server-with-setup.sh b/oonipipeline/docker/run-server-with-setup.sh new file mode 100644 index 00000000..f3606f9b --- /dev/null +++ b/oonipipeline/docker/run-server-with-setup.sh @@ -0,0 +1,18 @@ +#!/usr/bin/env bash +set -ex + +echo "starting superset" + +if [ ! 
-f /var/run/superset/superset_is_configured ]; then + echo "superset is not configured, setting it up" + superset fab create-admin \ + --username admin \ + --firstname OONI \ + --lastname Tarian \ + --email admin@ooni.org \ + --password oonity + superset db upgrade + superset init + touch /var/run/superset/superset_is_configured +fi +/usr/bin/run-server.sh \ No newline at end of file diff --git a/oonipipeline/src/oonipipeline/workflows/__init__.py b/oonipipeline/docker/superset-config/_isconfigured similarity index 100% rename from oonipipeline/src/oonipipeline/workflows/__init__.py rename to oonipipeline/docker/superset-config/_isconfigured diff --git a/oonipipeline/docker/superset-config/superset_config.py b/oonipipeline/docker/superset-config/superset_config.py new file mode 100644 index 00000000..e4c21b3e --- /dev/null +++ b/oonipipeline/docker/superset-config/superset_config.py @@ -0,0 +1,2 @@ +SQLALCHEMY_DATABASE_URI = 'postgresql://oonipipeline:oonipipeline@postgresql' +SECRET_KEY = 'oonity_superset_supersecret_CHANGEME' diff --git a/oonipipeline/docker/superset.Dockerfile b/oonipipeline/docker/superset.Dockerfile new file mode 100644 index 00000000..0d847942 --- /dev/null +++ b/oonipipeline/docker/superset.Dockerfile @@ -0,0 +1,10 @@ + +FROM apache/superset +USER root +RUN pip install clickhouse-connect + +RUN mkdir -p /var/run/superset/ && chown superset:superset /var/run/superset/ +COPY --chown=superset --chmod=755 ./docker/run-server-with-setup.sh /usr/bin/ + +USER superset +CMD ["/usr/bin/env", "bash", "/usr/bin/run-server-with-setup.sh"] \ No newline at end of file diff --git a/oonipipeline/docker/temporal-config/development-cass.yml b/oonipipeline/docker/temporal-config/development-cass.yml new file mode 100644 index 00000000..4b916163 --- /dev/null +++ b/oonipipeline/docker/temporal-config/development-cass.yml @@ -0,0 +1,3 @@ +system.forceSearchAttributesCacheRefreshOnRead: + - value: true # Dev setup only. Please don't turn this on in production. 
+ constraints: {} diff --git a/oonipipeline/docker/temporal-config/development-sql.yaml b/oonipipeline/docker/temporal-config/development-sql.yaml new file mode 100644 index 00000000..8862dfad --- /dev/null +++ b/oonipipeline/docker/temporal-config/development-sql.yaml @@ -0,0 +1,6 @@ +limit.maxIDLength: + - value: 255 + constraints: {} +system.forceSearchAttributesCacheRefreshOnRead: + - value: true # Dev setup only. Please don't turn this on in production. + constraints: {} diff --git a/oonipipeline/docker/temporal-config/docker.yaml b/oonipipeline/docker/temporal-config/docker.yaml new file mode 100644 index 00000000..e69de29b diff --git a/oonipipeline/pyproject.toml b/oonipipeline/pyproject.toml index f6ae82a5..902bce2f 100644 --- a/oonipipeline/pyproject.toml +++ b/oonipipeline/pyproject.toml @@ -35,6 +35,8 @@ dependencies = [ "flask ~= 2.2.0", "jupyterlab ~= 4.0.7", "temporalio ~= 1.5.1", + "temporalio[opentelemetry] ~= 1.5.1", + "opentelemetry-exporter-otlp-proto-grpc ~= 1.18.0" ] [tool.hatch.build.targets.sdist] @@ -55,6 +57,7 @@ dependencies = [ "memray", "viztracer", "pytest-docker", + "ipdb" ] python = "3.11" path = ".venv/" @@ -65,6 +68,7 @@ path = "src/oonipipeline/__about__.py" [tool.hatch.envs.default.scripts] oonipipeline = "python -m oonipipeline.main {args}" test = "pytest {args:tests}" -test-cov = "pytest -s --full-trace --log-level=INFO --log-cli-level=INFO -v --setup-show --cov=./ --cov-report=xml --cov-report=html --cov-report=term {args:tests}" +# --full-trace --log-level=INFO --log-cli-level=INFO -v --setup-show -s +test-cov = "pytest --cov=./ --cov-report=xml --cov-report=html --cov-report=term {args:tests}" cov-report = ["coverage report"] cov = ["test-cov", "cov-report"] diff --git a/oonipipeline/scripts/wait-for.sh b/oonipipeline/scripts/wait-for.sh new file mode 100755 index 00000000..0acf2a94 --- /dev/null +++ b/oonipipeline/scripts/wait-for.sh @@ -0,0 +1,15 @@ +#!/bin/sh + +set -e + +host="$1" +shift +cmd="$@" + +until wget --spider 
--quiet $host > /dev/null; do + >&2 echo "Waiting for $host to become available..." + sleep 1 +done + +>&2 echo "$host is up - executing command $cmd" +exec $cmd diff --git a/oonipipeline/src/oonipipeline/__about__.py b/oonipipeline/src/oonipipeline/__about__.py index 8f3f4b21..ed042c45 100644 --- a/oonipipeline/src/oonipipeline/__about__.py +++ b/oonipipeline/src/oonipipeline/__about__.py @@ -1 +1 @@ -VERSION = "4.0.0dev1" +VERSION = "5.0.0a0" diff --git a/oonipipeline/src/oonipipeline/analysis/control.py b/oonipipeline/src/oonipipeline/analysis/control.py index 1f910381..6af25208 100644 --- a/oonipipeline/src/oonipipeline/analysis/control.py +++ b/oonipipeline/src/oonipipeline/analysis/control.py @@ -197,6 +197,11 @@ def build_from_rows(self, rows: Iterable): self.db.execute("pragma optimize;") self.create_indexes() + def count_rows(self) -> int: + row = self.db.execute(f"SELECT COUNT() FROM {self._table_name};").fetchone() + assert len(row) == 1 + return row[0] + def build_from_existing(self, db_str: str): with sqlite3.connect(db_str) as src_db: self.db = sqlite3.connect(":memory:") @@ -283,8 +288,13 @@ def select_query( if hostnames: sub_q = "(" sub_q += "OR ".join( - # When hostname was supplied, we only care about it in relation to DNS resolutions - [" hostname = ? AND dns_success = 1 " for _ in range(len(hostnames))] + # When hostname was supplied, we only care about it in relation + # to DNS resolutions, so we only get DNS failure or DNS success + # rows + [ + " hostname = ? 
AND (dns_success = 1 OR dns_failure IS NOT NULL) " + for _ in range(len(hostnames)) + ] ) sub_q += ")" q_args += hostnames diff --git a/oonipipeline/src/oonipipeline/analysis/signal.py b/oonipipeline/src/oonipipeline/analysis/signal.py index 95d0eb99..11ffbc0c 100644 --- a/oonipipeline/src/oonipipeline/analysis/signal.py +++ b/oonipipeline/src/oonipipeline/analysis/signal.py @@ -12,6 +12,9 @@ from ..fingerprintdb import FingerprintDB +## TODO(art): port this over to the new MeasurementExperimentResult model + + def make_signal_experiment_result( web_observations: List[WebObservation], fingerprintdb: FingerprintDB, diff --git a/oonipipeline/src/oonipipeline/analysis/web_analysis.py b/oonipipeline/src/oonipipeline/analysis/web_analysis.py index 48cf7677..0f793a1a 100644 --- a/oonipipeline/src/oonipipeline/analysis/web_analysis.py +++ b/oonipipeline/src/oonipipeline/analysis/web_analysis.py @@ -199,7 +199,7 @@ def make_dns_ground_truth(ground_truths: Iterable[WebGroundTruth]): failure_count = 0 nxdomain_count = 0 for gt in ground_truths: - if gt.dns_success is None: + if gt.dns_success is None and gt.dns_failure is None: continue if gt.dns_failure == "dns_nxdomain_error": @@ -207,7 +207,7 @@ def make_dns_ground_truth(ground_truths: Iterable[WebGroundTruth]): nxdomain_cc_asn.add((gt.vp_cc, gt.vp_asn)) continue - if not gt.dns_success: + if gt.dns_failure is not None: failure_count += gt.count failure_cc_asn.add((gt.vp_cc, gt.vp_asn)) continue @@ -697,18 +697,7 @@ def make_web_analysis( ) if dns_analysis: - website_analysis.dns_ground_truth_nxdomain_count = ( - dns_analysis.ground_truth.nxdomain_count - ) - website_analysis.dns_ground_truth_ok_cc_asn_count = ( - dns_analysis.ground_truth.ok_cc_asn_count - ) - website_analysis.dns_ground_truth_failure_cc_asn_count = ( - dns_analysis.ground_truth.failure_cc_asn_count - ) - website_analysis.dns_ground_truth_nxdomain_cc_asn_count = ( - dns_analysis.ground_truth.nxdomain_cc_asn_count - ) + 
website_analysis.dns_consistency_system_answers = ( dns_analysis.consistency_system.answers ) @@ -775,6 +764,26 @@ def make_web_analysis( website_analysis.dns_consistency_system_answer_asn_ground_truth_asn_count = ( dns_analysis.consistency_system.answer_asn_ground_truth_asn_count ) + + website_analysis.dns_ground_truth_failure_count = ( + dns_analysis.ground_truth.failure_count + ) + website_analysis.dns_ground_truth_ok_count = ( + dns_analysis.ground_truth.ok_count + ) + website_analysis.dns_ground_truth_nxdomain_count = ( + dns_analysis.ground_truth.nxdomain_count + ) + website_analysis.dns_ground_truth_ok_cc_asn_count = ( + dns_analysis.ground_truth.ok_cc_asn_count + ) + website_analysis.dns_ground_truth_failure_cc_asn_count = ( + dns_analysis.ground_truth.failure_cc_asn_count + ) + website_analysis.dns_ground_truth_nxdomain_cc_asn_count = ( + dns_analysis.ground_truth.nxdomain_cc_asn_count + ) + """ website_analysis.dns_ground_truth_nxdomain_cc_asn = ( dns_analysis.ground_truth.nxdomain_cc_asn @@ -782,15 +791,9 @@ def make_web_analysis( website_analysis.dns_ground_truth_failure_cc_asn = ( dns_analysis.ground_truth.failure_cc_asn ) - website_analysis.dns_ground_truth_failure_count = ( - dns_analysis.ground_truth.failure_count - ) website_analysis.dns_ground_truth_ok_cc_asn = ( dns_analysis.ground_truth.ok_cc_asn ) - website_analysis.dns_ground_truth_ok_count = ( - dns_analysis.ground_truth.ok_count - ) website_analysis.dns_ground_truth_other_ips = ( dns_analysis.ground_truth.other_ips ) diff --git a/oonipipeline/src/oonipipeline/analysis/website_experiment_results.py b/oonipipeline/src/oonipipeline/analysis/website_experiment_results.py index de4c4e3d..9c442e60 100644 --- a/oonipipeline/src/oonipipeline/analysis/website_experiment_results.py +++ b/oonipipeline/src/oonipipeline/analysis/website_experiment_results.py @@ -48,10 +48,13 @@ def to_dict(self) -> Dict[str, float]: return d def sum(self) -> float: - s = 0 - for _, val in self.to_dict().items(): - s += 
val - return s + return sum([v for v in self.to_dict().values()]) + + def max(self) -> float: + return max([v for v in self.to_dict().values()]) + + def min(self) -> float: + return min([v for v in self.to_dict().values()]) @dataclass @@ -214,21 +217,23 @@ def calculate_web_loni( blocked_key = "dns.confirmed" blocking_scope = web_analysis.dns_consistency_system_answer_fp_scope blocked_value = 0.9 + down_value = 0.0 if ( web_analysis.dns_consistency_system_is_answer_fp_country_consistent == True ): blocked_key = "dns.confirmed.country_consistent" blocked_value = 1.0 + down_value = 0.0 elif ( web_analysis.dns_consistency_system_is_answer_fp_country_consistent == False ): - # We let the blocked value be slightly less for cases where the fingerprint is not country consistent + # If the fingerprint is not country consistent, we consider it down to avoid false positives blocked_key = "dns.confirmed.not_country_consistent" - blocked_value = 0.8 - ok_value = 0 - down_value = 0 + down_value = 0.8 + blocked_value = 0.2 + ok_value = 0.0 elif web_analysis.dns_consistency_system_is_answer_bogon == True: # Bogons are always fishy, yet we don't know if we see it because # the site is misconfigured. 
@@ -383,6 +388,7 @@ def calculate_web_loni( down_value, blocked_value = 0.0, 0.0 blocked.tcp = OutcomeStatus(key=blocked_key, value=blocked_value) down.tcp = OutcomeStatus(key=down_key, value=down_value) + ok.tcp = OutcomeStatus(key="tcp", value=1 - (blocked.sum() + down.sum())) elif web_analysis.tcp_success == False: analysis_transcript.append("web_analysis.tcp_success == False") @@ -649,7 +655,10 @@ def calculate_web_loni( blocked_value = 0.8 elif web_analysis.http_is_http_fp_false_positive == True: blocked_value = 0.0 - else: + elif ( + web_analysis.http_response_body_length is not None + and web_analysis.http_ground_truth_body_length is not None + ): # We need to apply some fuzzy logic to fingerprint it # TODO(arturo): in the future can use more features, such as the following """ @@ -815,6 +824,9 @@ def make_website_experiment_results( loni_ok_list: List[OutcomeSpace] = [] for wa in web_analysis: loni, analysis_transcript = calculate_web_loni(wa) + log.debug("wa: %s", wa) + log.debug("analysis_transcript: %s", analysis_transcript) + log.debug("loni: %s", loni) analysis_transcript_list.append(analysis_transcript) loni_list.append(loni) loni_blocked_list.append(loni.blocked) @@ -953,7 +965,7 @@ def get_agg_outcome(loni_list, category, agg_func) -> Optional[OutcomeStatus]: ) log.debug(f"final_loni: {final_loni}") - loni_ok_value = final_loni.ok_final + loni_ok_value = final_ok.min() loni_down = final_loni.down.to_dict() loni_down_keys, loni_down_values = list(loni_down.keys()), list(loni_down.values()) diff --git a/oonipipeline/src/oonipipeline/cli/commands.py b/oonipipeline/src/oonipipeline/cli/commands.py index 072948b0..659703d2 100644 --- a/oonipipeline/src/oonipipeline/cli/commands.py +++ b/oonipipeline/src/oonipipeline/cli/commands.py @@ -1,86 +1,204 @@ +from concurrent.futures import ProcessPoolExecutor, as_completed +from dataclasses import dataclass +import dataclasses import logging import multiprocessing from pathlib import Path +import asyncio +import 
signal import sys from typing import List, Optional -from datetime import date, timedelta, datetime +from datetime import date, timedelta, datetime, timezone from typing import List, Optional +import opentelemetry.context +from opentelemetry import trace +from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter +from opentelemetry.sdk.resources import SERVICE_NAME, Resource +from opentelemetry.sdk.trace import TracerProvider +from opentelemetry.sdk.trace.export import BatchSpanProcessor + import click from click_loglevel import LogLevel +from temporalio.runtime import ( + OpenTelemetryConfig, + Runtime as TemporalRuntime, + TelemetryConfig, +) +from temporalio.client import ( + Client as TemporalClient, +) +from temporalio.types import MethodAsyncSingleParam, SelfType, ParamType, ReturnType + +from temporalio.contrib.opentelemetry import TracingInterceptor + +from ..temporal.workers import make_threaded_worker + +from ..temporal.workflows import ( + AnalysisBackfillWorkflow, + BackfillWorkflowParams, + GroundTruthsWorkflow, + GroundTruthsWorkflowParams, + ObservationsBackfillWorkflow, + TASK_QUEUE_NAME, +) + from ..__about__ import VERSION from ..db.connections import ClickhouseConnection from ..db.create_tables import create_queries, list_all_table_diffs from ..netinfo import NetinfoDB -log = logging.getLogger("oonidata") - -import asyncio -import concurrent.futures +def init_runtime_with_telemetry(endpoint: str) -> TemporalRuntime: + provider = TracerProvider(resource=Resource.create({SERVICE_NAME: "oonipipeline"})) + exporter = OTLPSpanExporter( + endpoint=endpoint, insecure=endpoint.startswith("http://") + ) + provider.add_span_processor(BatchSpanProcessor(exporter)) + trace.set_tracer_provider(provider) -from temporalio.client import Client as TemporalClient -from temporalio.worker import Worker, SharedStateManager + return TemporalRuntime( + telemetry=TelemetryConfig(metrics=OpenTelemetryConfig(url=endpoint)) + ) -from 
temporalio.types import MethodAsyncSingleParam, SelfType, ParamType, ReturnType -from ..workflows.observations import ( - ObservationsWorkflow, - ObservationsWorkflowParams, - make_observation_in_day, -) - -from ..workflows.ground_truths import ( - GroundTruthsWorkflow, - GroundTruthsWorkflowParams, - make_ground_truths_in_day, -) - -from ..workflows.analysis import ( - AnalysisWorkflow, - AnalysisWorkflowParams, - make_analysis_in_a_day, -) +async def temporal_connect(telemetry_endpoint: str, temporal_address: str): + runtime = init_runtime_with_telemetry(telemetry_endpoint) + client = await TemporalClient.connect( + temporal_address, + interceptors=[TracingInterceptor()], + runtime=runtime, + ) + return client -TASK_QUEUE_NAME = "oonipipeline-task-queue" +@dataclass +class WorkerParams: + temporal_address: str + telemetry_endpoint: str + thread_count: int + process_idx: int = 0 -async def run_workflow( +async def start_threaded_worker(params: WorkerParams): + client = await temporal_connect( + telemetry_endpoint=params.telemetry_endpoint, + temporal_address=params.temporal_address, + ) + worker = make_threaded_worker(client, parallelism=params.thread_count) + await worker.run() + + +def run_worker(params: WorkerParams): + try: + asyncio.run(start_threaded_worker(params)) + except KeyboardInterrupt: + print("shutting down") + + +def start_workers(params: WorkerParams, process_count: int): + def signal_handler(signal, frame): + print("shutdown requested: Ctrl+C detected") + sys.exit(0) + + signal.signal(signal.SIGINT, signal_handler) + + process_params = [ + dataclasses.replace(params, process_idx=idx) for idx in range(process_count) + ] + executor = ProcessPoolExecutor(max_workers=process_count) + try: + futures = [executor.submit(run_worker, param) for param in process_params] + for future in as_completed(futures): + future.result() + except KeyboardInterrupt: + print("ctrl+C detected, cancelling tasks...") + for future in futures: + future.cancel() + 
executor.shutdown(wait=True) + print("all tasks have been cancelled and cleaned up") + except Exception as e: + print(f"an error occurred: {e}") + executor.shutdown(wait=False) + raise + + +async def execute_workflow_with_workers( workflow: MethodAsyncSingleParam[SelfType, ParamType, ReturnType], arg: ParamType, - parallelism: int = 5, - temporal_address: str = "localhost:7233", + parallelism, + workflow_id_prefix: str, + telemetry_endpoint: str, + temporal_address: str, ): - client = await TemporalClient.connect(temporal_address) - async with Worker( - client, - task_queue=TASK_QUEUE_NAME, - workflows=[ - ObservationsWorkflow, - GroundTruthsWorkflow, - AnalysisWorkflow, - ], - activities=[ - make_observation_in_day, - make_ground_truths_in_day, - make_analysis_in_a_day, - ], - activity_executor=concurrent.futures.ProcessPoolExecutor(parallelism + 2), - max_concurrent_activities=parallelism, - shared_state_manager=SharedStateManager.create_from_multiprocessing( - multiprocessing.Manager() - ), - ): + click.echo( + f"running workflow {workflow} temporal_address={temporal_address} telemetry_address={telemetry_endpoint} parallelism={parallelism}" + ) + ts = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S") + client = await temporal_connect( + telemetry_endpoint=telemetry_endpoint, temporal_address=temporal_address + ) + async with make_threaded_worker(client, parallelism=parallelism): await client.execute_workflow( workflow, arg, - id=TASK_QUEUE_NAME, + id=f"{workflow_id_prefix}-{ts}", task_queue=TASK_QUEUE_NAME, ) +async def execute_workflow( + workflow: MethodAsyncSingleParam[SelfType, ParamType, ReturnType], + arg: ParamType, + parallelism, + workflow_id_prefix: str, + telemetry_endpoint: str, + temporal_address: str, +): + click.echo( + f"running workflow {workflow} temporal_address={temporal_address} telemetry_address={telemetry_endpoint} parallelism={parallelism}" + ) + ts = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S") + client = await temporal_connect( 
+        telemetry_endpoint=telemetry_endpoint, temporal_address=temporal_address
+    )
+    await client.execute_workflow(
+        workflow,
+        arg,
+        id=f"{workflow_id_prefix}-{ts}",
+        task_queue=TASK_QUEUE_NAME,
+    )
+
+
+def run_workflow(
+    workflow: MethodAsyncSingleParam[SelfType, ParamType, ReturnType],
+    arg: ParamType,
+    parallelism,
+    start_workers: bool,
+    workflow_id_prefix: str,
+    telemetry_endpoint: str,
+    temporal_address: str,
+):
+    action = execute_workflow
+    if start_workers:
+        print("also starting workers")
+        action = execute_workflow_with_workers
+    try:
+        asyncio.run(
+            action(
+                workflow=workflow,
+                arg=arg,
+                parallelism=parallelism,
+                workflow_id_prefix=workflow_id_prefix,
+                telemetry_endpoint=telemetry_endpoint,
+                temporal_address=temporal_address,
+            )
+        )
+    except KeyboardInterrupt:
+        print("shutting down")
+
+
 def _parse_csv(ctx, param, s: Optional[str]) -> List[str]:
     if s:
         return s.split(",")
@@ -115,9 +233,35 @@ def _parse_csv(ctx, param, s: Optional[str]) -> List[str]:
     """,
 )
+start_at_option = click.option(
+    "--start-at",
+    type=click.DateTime(),
+    default=str(datetime.now(timezone.utc).date() - timedelta(days=14)),
+    help="""the timestamp of the day for which we should start processing data (inclusive).
+
+    Note: this is the upload date, which doesn't necessarily match the measurement date.
+    """,
+)
+end_at_option = click.option(
+    "--end-at",
+    type=click.DateTime(),
+    default=str(datetime.now(timezone.utc).date() + timedelta(days=1)),
+    help="""the timestamp of the day for which we should stop processing data (inclusive).
+
+    Note: this is the upload date, which doesn't necessarily match the measurement date.
+ """, +) + clickhouse_option = click.option( "--clickhouse", type=str, required=True, default="clickhouse://localhost" ) +telemetry_endpoint_option = click.option( + "--telemetry-endpoint", type=str, required=True, default="http://localhost:4317" +) +temporal_address_option = click.option( + "--temporal-address", type=str, required=True, default="localhost:7233" +) +start_workers_option = click.option("--start-workers/--no-start-workers", default=True) datadir_option = click.option( "--data-dir", @@ -126,10 +270,15 @@ def _parse_csv(ctx, param, s: Optional[str]) -> List[str]: default="tests/data/datadir", help="data directory to store fingerprint and geoip databases", ) +parallelism_option = click.option( + "--parallelism", + type=int, + default=multiprocessing.cpu_count() + 2, + help="number of processes to use. Only works when writing to a database", +) @click.group() -@click.option("--error-log-file", type=Path) @click.option( "-l", "--log-level", @@ -139,13 +288,8 @@ def _parse_csv(ctx, param, s: Optional[str]) -> List[str]: show_default=True, ) @click.version_option(VERSION) -def cli(error_log_file: Path, log_level: int): - log.addHandler(logging.StreamHandler(sys.stderr)) - log.setLevel(log_level) - if error_log_file: - logging.basicConfig( - filename=error_log_file, encoding="utf-8", level=logging.ERROR - ) +def cli(log_level: int): + logging.basicConfig(level=log_level) @cli.command() @@ -155,12 +299,10 @@ def cli(error_log_file: Path, log_level: int): @end_day_option @clickhouse_option @datadir_option -@click.option( - "--parallelism", - type=int, - default=multiprocessing.cpu_count() + 2, - help="number of processes to use. 
Only works when writing to a database", -) +@parallelism_option +@telemetry_endpoint_option +@temporal_address_option +@start_workers_option @click.option( "--fast-fail", is_flag=True, @@ -187,6 +329,9 @@ def mkobs( fast_fail: bool, create_tables: bool, drop_tables: bool, + telemetry_endpoint: str, + temporal_address: str, + start_workers: bool, ): """ Make observations for OONI measurements and write them into clickhouse or a CSV file @@ -207,22 +352,24 @@ def mkobs( NetinfoDB(datadir=Path(data_dir), download=True) click.echo("downloaded netinfodb") - arg = ObservationsWorkflowParams( + params = BackfillWorkflowParams( probe_cc=probe_cc, test_name=test_name, - start_day=start_day, - end_day=end_day, clickhouse=clickhouse, data_dir=str(data_dir), fast_fail=fast_fail, + start_day=start_day, + end_day=end_day, ) - click.echo(f"starting to make observations with arg={arg}") - asyncio.run( - run_workflow( - ObservationsWorkflow.run, - arg, - parallelism=parallelism, - ) + click.echo(f"starting to make observations with params={params}") + run_workflow( + ObservationsBackfillWorkflow.run, + params, + parallelism=parallelism, + workflow_id_prefix="oonipipeline-mkobs", + telemetry_endpoint=telemetry_endpoint, + temporal_address=temporal_address, + start_workers=start_workers, ) @@ -233,12 +380,10 @@ def mkobs( @end_day_option @clickhouse_option @datadir_option -@click.option( - "--parallelism", - type=int, - default=multiprocessing.cpu_count() + 2, - help="number of processes to use. 
Only works when writing to a database", -) +@parallelism_option +@telemetry_endpoint_option +@temporal_address_option +@start_workers_option @click.option( "--fast-fail", is_flag=True, @@ -249,11 +394,6 @@ def mkobs( is_flag=True, help="should we attempt to create the required clickhouse tables", ) -@click.option( - "--rebuild-ground-truths", - is_flag=True, - help="should we force the rebuilding of ground truths", -) def mkanalysis( probe_cc: List[str], test_name: List[str], @@ -264,7 +404,9 @@ def mkanalysis( parallelism: int, fast_fail: bool, create_tables: bool, - rebuild_ground_truths: bool, + telemetry_endpoint: str, + temporal_address: str, + start_workers: bool, ): if create_tables: with ClickhouseConnection(clickhouse) as db: @@ -276,24 +418,23 @@ def mkanalysis( NetinfoDB(datadir=Path(data_dir), download=True) click.echo("downloaded netinfodb") - arg = AnalysisWorkflowParams( + params = BackfillWorkflowParams( probe_cc=probe_cc, test_name=test_name, start_day=start_day, end_day=end_day, clickhouse=clickhouse, data_dir=str(data_dir), - parallelism=parallelism, fast_fail=fast_fail, - rebuild_ground_truths=rebuild_ground_truths, ) - click.echo(f"starting to make analysis with arg={arg}") - asyncio.run( - run_workflow( - AnalysisWorkflow.run, - arg, - parallelism=parallelism, - ) + run_workflow( + AnalysisBackfillWorkflow.run, + params, + parallelism=parallelism, + workflow_id_prefix="oonipipeline-mkanalysis", + telemetry_endpoint=telemetry_endpoint, + temporal_address=temporal_address, + start_workers=start_workers, ) @@ -302,28 +443,64 @@ def mkanalysis( @end_day_option @clickhouse_option @datadir_option +@parallelism_option +@telemetry_endpoint_option +@temporal_address_option +@start_workers_option def mkgt( start_day: str, end_day: str, clickhouse: str, data_dir: Path, + parallelism: int, + telemetry_endpoint: str, + temporal_address: str, + start_workers: bool, ): click.echo("Starting to build ground truths") NetinfoDB(datadir=Path(data_dir), 
download=True) click.echo("downloaded netinfodb") - arg = GroundTruthsWorkflowParams( + params = GroundTruthsWorkflowParams( start_day=start_day, end_day=end_day, clickhouse=clickhouse, data_dir=str(data_dir), ) - click.echo(f"starting to make ground truths with arg={arg}") - asyncio.run( - run_workflow( - GroundTruthsWorkflow.run, - arg, - ) + click.echo(f"starting to make ground truths with arg={params}") + run_workflow( + GroundTruthsWorkflow.run, + params, + parallelism=parallelism, + workflow_id_prefix="oonipipeline-mkgt", + telemetry_endpoint=telemetry_endpoint, + temporal_address=temporal_address, + start_workers=start_workers, + ) + + +@cli.command() +@datadir_option +@parallelism_option +@telemetry_endpoint_option +@temporal_address_option +def startworkers( + data_dir: Path, + parallelism: int, + telemetry_endpoint: str, + temporal_address: str, +): + click.echo(f"starting {parallelism} workers") + click.echo(f"downloading NetinfoDB to {data_dir}") + NetinfoDB(datadir=Path(data_dir), download=True) + click.echo("done downloading netinfodb") + start_workers( + params=WorkerParams( + temporal_address=temporal_address, + telemetry_endpoint=telemetry_endpoint, + thread_count=parallelism, + ), + process_count=parallelism, ) diff --git a/oonipipeline/src/oonipipeline/db/connections.py b/oonipipeline/src/oonipipeline/db/connections.py index 43f29eec..62be40f0 100644 --- a/oonipipeline/src/oonipipeline/db/connections.py +++ b/oonipipeline/src/oonipipeline/db/connections.py @@ -6,6 +6,7 @@ from datetime import datetime, timezone from pprint import pformat import logging +from typing import Optional log = logging.getLogger("oonidata.processing") @@ -26,7 +27,13 @@ def close(self): class ClickhouseConnection(DatabaseConnection): - def __init__(self, conn_url, row_buffer_size=0, max_block_size=1_000_000): + def __init__( + self, + conn_url, + row_buffer_size=0, + max_block_size=1_000_000, + dump_failing_rows: Optional[str] = None, + ): from clickhouse_driver import 
Client self.clickhouse_url = conn_url @@ -37,6 +44,7 @@ def __init__(self, conn_url, row_buffer_size=0, max_block_size=1_000_000): self._column_names = {} self._row_buffer = defaultdict(list) + self.dump_failing_rows = dump_failing_rows def __enter__(self): return self @@ -91,8 +99,10 @@ def flush_rows(self, table_name, rows): time.sleep(0.1) except Exception as exc: log.error(f"Failed to write {row} ({exc}) {query_str}") - with open(f"failing-rows.pickle", "ab") as out_file: - pickle.dump({"query_str": query_str, "row": row}, out_file) + + if self.dump_failing_rows: + with open(self.dump_failing_rows, "ab") as out_file: + pickle.dump({"query_str": query_str, "row": row}, out_file) def flush_all_rows(self): for table_name, rows in self._row_buffer.items(): diff --git a/oonipipeline/src/oonipipeline/temporal/__init__.py b/oonipipeline/src/oonipipeline/temporal/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/oonipipeline/src/oonipipeline/temporal/activities/__init__.py b/oonipipeline/src/oonipipeline/temporal/activities/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/oonipipeline/src/oonipipeline/temporal/activities/analysis.py b/oonipipeline/src/oonipipeline/temporal/activities/analysis.py new file mode 100644 index 00000000..37b3615a --- /dev/null +++ b/oonipipeline/src/oonipipeline/temporal/activities/analysis.py @@ -0,0 +1,241 @@ +import dataclasses +from dataclasses import dataclass +import pathlib + +from datetime import datetime +from typing import Dict, List + +import opentelemetry.trace +from temporalio import workflow, activity + +with workflow.unsafe.imports_passed_through(): + import clickhouse_driver + + import orjson + + from oonidata.models.analysis import WebAnalysis + from oonidata.models.experiment_result import MeasurementExperimentResult + + from ...analysis.control import BodyDB, WebGroundTruthDB + from ...analysis.datasources import iter_web_observations + from ...analysis.web_analysis import 
make_web_analysis + from ...analysis.website_experiment_results import make_website_experiment_results + from ...db.connections import ClickhouseConnection + from ...fingerprintdb import FingerprintDB + + from ..common import ( + get_prev_range, + make_db_rows, + maybe_delete_prev_range, + ) + +log = activity.logger + + +def make_cc_batches( + cnt_by_cc: Dict[str, int], + probe_cc: List[str], + parallelism: int, +) -> List[List[str]]: + """ + The goal of this function is to spread the load of each batch of + measurements by probe_cc. This allows us to parallelize analysis on a + per-country basis based on the number of measurements. + We assume that the measurements are uniformly distributed over the tested + interval and then break them up into a number of batches equivalent to the + parallelism count based on the number of measurements in each country. + + Here is a concrete example, suppose we have 3 countries IT, IR, US with 300, + 400, 1000 measurements respectively and a parallelism of 2, we will be + creating 2 batches where the first has in it IT, IR and the second has US. + """ + if len(probe_cc) > 0: + selected_ccs_with_cnt = set(probe_cc).intersection(set(cnt_by_cc.keys())) + if len(selected_ccs_with_cnt) == 0: + raise Exception( + f"No observations for {probe_cc} in the time range. Try adjusting the date range or choosing different countries" + ) + # We remove from the cnt_by_cc all the countries we are not interested in + cnt_by_cc = {k: cnt_by_cc[k] for k in selected_ccs_with_cnt} + + total_obs_cnt = sum(cnt_by_cc.values()) + + # We assume uniform distribution of observations per (country, day) + max_obs_per_batch = total_obs_cnt / parallelism + + # We break up the countries into batches where the count of observations in + # each batch is roughly equal. + # This is done so that we can spread the load based on the countries in + # addition to the time range. 
+ cc_batches = [] + current_cc_batch_size = 0 + current_cc_batch = [] + cnt_by_cc_sorted = sorted(cnt_by_cc.items(), key=lambda x: x[0]) + while cnt_by_cc_sorted: + while current_cc_batch_size <= max_obs_per_batch: + try: + cc, cnt = cnt_by_cc_sorted.pop() + except IndexError: + break + current_cc_batch.append(cc) + current_cc_batch_size += cnt + cc_batches.append(current_cc_batch) + current_cc_batch = [] + current_cc_batch_size = 0 + if len(current_cc_batch) > 0: + cc_batches.append(current_cc_batch) + return cc_batches + + +@dataclass +class MakeAnalysisParams: + probe_cc: List[str] + test_name: List[str] + clickhouse: str + data_dir: str + fast_fail: bool + day: str + + +@activity.defn +def make_analysis_in_a_day(params: MakeAnalysisParams) -> dict: + data_dir = pathlib.Path(params.data_dir) + clickhouse = params.clickhouse + day = datetime.strptime(params.day, "%Y-%m-%d").date() + probe_cc = params.probe_cc + test_name = params.test_name + + tracer = opentelemetry.trace.get_tracer(__name__) + + with opentelemetry.trace.get_current_span(): + fingerprintdb = FingerprintDB(datadir=data_dir, download=False) + body_db = BodyDB(db=ClickhouseConnection(clickhouse)) + db_writer = ClickhouseConnection(clickhouse, row_buffer_size=10_000) + db_lookup = ClickhouseConnection(clickhouse) + + column_names_wa = [f.name for f in dataclasses.fields(WebAnalysis)] + column_names_er = [ + f.name for f in dataclasses.fields(MeasurementExperimentResult) + ] + + # TODO(art): this previous range search and deletion makes the idempotence + # of the activity not 100% accurate. + # We should look into fixing it. 
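The batching strategy described in the `make_cc_batches` docstring above can be illustrated with a standalone sketch. Note this is a simplified reimplementation for illustration only (it drops the `probe_cc` filtering step), not the function the pipeline imports:

```python
# Illustrative sketch of the country-batching strategy documented in the
# make_cc_batches docstring: spread observation load across `parallelism`
# batches, capping each batch at an equal share of the total count.
from typing import Dict, List


def batch_by_cc(cnt_by_cc: Dict[str, int], parallelism: int) -> List[List[str]]:
    total_obs_cnt = sum(cnt_by_cc.values())
    # Assuming measurements are uniformly distributed over the interval,
    # cap each batch at an equal share of the total observation count.
    max_obs_per_batch = total_obs_cnt / parallelism

    cc_batches: List[List[str]] = []
    current_batch: List[str] = []
    current_size = 0
    # Sort by country code so batching is deterministic; pop() then
    # consumes the list from the end.
    remaining = sorted(cnt_by_cc.items(), key=lambda x: x[0])
    while remaining:
        while current_size <= max_obs_per_batch:
            try:
                cc, cnt = remaining.pop()
            except IndexError:
                break
            current_batch.append(cc)
            current_size += cnt
        cc_batches.append(current_batch)
        current_batch = []
        current_size = 0
    if current_batch:
        cc_batches.append(current_batch)
    return cc_batches


# The docstring's example: IT=300, IR=400, US=1000 with parallelism=2
# splits into one batch holding US alone and one holding IT and IR.
batches = batch_by_cc({"IT": 300, "IR": 400, "US": 1000}, parallelism=2)
```

With 1700 total observations and a parallelism of 2, the per-batch cap is 850, so US (1000) overflows into its own batch while IT and IR (700 combined) share the other.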
+ prev_range_list = [ + get_prev_range( + db=db_lookup, + table_name=WebAnalysis.__table_name__, + timestamp=datetime.combine(day, datetime.min.time()), + test_name=[], + probe_cc=probe_cc, + timestamp_column="measurement_start_time", + ), + get_prev_range( + db=db_lookup, + table_name=MeasurementExperimentResult.__table_name__, + timestamp=datetime.combine(day, datetime.min.time()), + test_name=[], + probe_cc=probe_cc, + timestamp_column="timeofday", + probe_cc_column="location_network_cc", + ), + ] + + log.info(f"loading ground truth DB for {day}") + with tracer.start_as_current_span( + "MakeObservations:load_ground_truths" + ) as span: + ground_truth_db_path = ( + data_dir / "ground_truths" / f"web-{day.strftime('%Y-%m-%d')}.sqlite3" + ) + web_ground_truth_db = WebGroundTruthDB() + web_ground_truth_db.build_from_existing( + str(ground_truth_db_path.absolute()) + ) + log.info(f"loaded ground truth DB for {day}") + span.add_event(f"loaded ground truth DB for {day}") + span.set_attribute("day", day.strftime("%Y-%m-%d")) + span.set_attribute( + "ground_truth_row_count", web_ground_truth_db.count_rows() + ) + + failures = 0 + no_exp_results = 0 + observation_count = 0 + with tracer.start_as_current_span( + "MakeObservations:iter_web_observations" + ) as span: + for web_obs in iter_web_observations( + db_lookup, + measurement_day=day, + probe_cc=probe_cc, + test_name="web_connectivity", + ): + try: + relevant_gts = web_ground_truth_db.lookup_by_web_obs( + web_obs=web_obs + ) + except: + log.error( + f"failed to lookup relevant_gts for {web_obs[0].measurement_uid}", + exc_info=True, + ) + failures += 1 + continue + + try: + website_analysis = list( + make_web_analysis( + web_observations=web_obs, + body_db=body_db, + web_ground_truths=relevant_gts, + fingerprintdb=fingerprintdb, + ) + ) + if len(website_analysis) == 0: + log.info(f"no website analysis for {probe_cc}, {test_name}") + no_exp_results += 1 + continue + + observation_count += 1 + table_name, rows = 
make_db_rows( + dc_list=website_analysis, column_names=column_names_wa + ) + + db_writer.write_rows( + table_name=table_name, + rows=rows, + column_names=column_names_wa, + ) + + website_er = list(make_website_experiment_results(website_analysis)) + table_name, rows = make_db_rows( + dc_list=website_er, + column_names=column_names_er, + custom_remap={"loni_list": orjson.dumps}, + ) + + db_writer.write_rows( + table_name=table_name, + rows=rows, + column_names=column_names_er, + ) + + except: + web_obs_ids = ",".join(map(lambda wo: wo.observation_id, web_obs)) + log.error( + f"failed to generate analysis for {web_obs_ids}", exc_info=True + ) + failures += 1 + + span.set_attribute("total_failure_count", failures) + span.set_attribute("total_observation_count", observation_count) + span.set_attribute("no_experiment_results_count", no_exp_results) + span.set_attribute("day", day.strftime("%Y-%m-%d")) + span.set_attribute("probe_cc", probe_cc) + + for prev_range in prev_range_list: + maybe_delete_prev_range(db=db_lookup, prev_range=prev_range) + db_writer.close() + + return {"count": observation_count} diff --git a/oonipipeline/src/oonipipeline/temporal/activities/common.py b/oonipipeline/src/oonipipeline/temporal/activities/common.py new file mode 100644 index 00000000..f623fcc6 --- /dev/null +++ b/oonipipeline/src/oonipipeline/temporal/activities/common.py @@ -0,0 +1,48 @@ +from dataclasses import dataclass +from datetime import date +from typing import Dict, List, Tuple +from oonipipeline.db.connections import ClickhouseConnection +from oonipipeline.db.create_tables import create_queries + +from temporalio import activity + + +@dataclass +class ClickhouseParams: + clickhouse_url: str + + +@activity.defn +def optimize_all_tables(params: ClickhouseParams): + with ClickhouseConnection(params.clickhouse_url) as db: + for _, table_name in create_queries: + db.execute(f"OPTIMIZE TABLE {table_name}") + + +@dataclass +class ObsCountParams: + clickhouse_url: str + # 
TODO(art): we should also be using test_name here + # test_name: List[str] + start_day: str + end_day: str + table_name: str = "obs_web" + + +@activity.defn +def get_obs_count_by_cc( + params: ObsCountParams, +) -> Dict[str, int]: + with ClickhouseConnection(params.clickhouse_url) as db: + q = f""" + SELECT + probe_cc, COUNT() + FROM {params.table_name} + WHERE measurement_start_time > %(start_day)s AND measurement_start_time < %(end_day)s + GROUP BY probe_cc + """ + cc_list: List[Tuple[str, int]] = db.execute( + q, {"start_day": params.start_day, "end_day": params.end_day} + ) # type: ignore + assert isinstance(cc_list, list) + return dict(cc_list) diff --git a/oonipipeline/src/oonipipeline/temporal/activities/ground_truths.py b/oonipipeline/src/oonipipeline/temporal/activities/ground_truths.py new file mode 100644 index 00000000..54df2fe4 --- /dev/null +++ b/oonipipeline/src/oonipipeline/temporal/activities/ground_truths.py @@ -0,0 +1,54 @@ +from dataclasses import dataclass +import pathlib +import logging + +from datetime import datetime + +from temporalio import workflow, activity + +with workflow.unsafe.imports_passed_through(): + import clickhouse_driver + + from oonidata.datautils import PerfTimer + from ...analysis.control import WebGroundTruthDB, iter_web_ground_truths + from ...netinfo import NetinfoDB + from ...db.connections import ( + ClickhouseConnection, + ) + +log = activity.logger + + +@dataclass +class MakeGroundTruthsParams: + clickhouse: str + data_dir: str + day: str + + +def get_ground_truth_db_path(data_dir: str, day: str): + ground_truth_dir = pathlib.Path(data_dir) / "ground_truths" + ground_truth_dir.mkdir(exist_ok=True) + return ground_truth_dir / f"web-{day}.sqlite3" + + +@activity.defn +def make_ground_truths_in_day(params: MakeGroundTruthsParams): + clickhouse = params.clickhouse + + db = ClickhouseConnection(clickhouse) + netinfodb = NetinfoDB(datadir=pathlib.Path(params.data_dir), download=False) + + dst_path = 
get_ground_truth_db_path(data_dir=params.data_dir, day=params.day)
+
+    if dst_path.exists():
+        dst_path.unlink()
+
+    t = PerfTimer()
+    day = datetime.strptime(params.day, "%Y-%m-%d").date()
+    log.info(f"building ground truth DB for {day}")
+    web_ground_truth_db = WebGroundTruthDB(connect_str=str(dst_path.absolute()))
+    web_ground_truth_db.build_from_rows(
+        rows=iter_web_ground_truths(db=db, measurement_day=day, netinfodb=netinfodb)
+    )
+    log.info(f"built ground truth DB {day} in {t.pretty}")
diff --git a/oonipipeline/src/oonipipeline/temporal/activities/observations.py b/oonipipeline/src/oonipipeline/temporal/activities/observations.py
new file mode 100644
index 00000000..51631fa9
--- /dev/null
+++ b/oonipipeline/src/oonipipeline/temporal/activities/observations.py
@@ -0,0 +1,201 @@
+from dataclasses import dataclass
+import dataclasses
+from typing import List, Sequence, Tuple
+from oonidata.dataclient import (
+    ccs_set,
+    list_file_entries_batches,
+    load_measurement,
+    stream_measurements,
+)
+from oonidata.datautils import PerfTimer
+from oonidata.models.nettests import SupportedDataformats
+from oonipipeline.db.connections import ClickhouseConnection
+from oonipipeline.netinfo import NetinfoDB
+from oonipipeline.temporal.common import (
+    get_prev_range,
+    make_db_rows,
+    maybe_delete_prev_range,
+)
+
+from opentelemetry import trace
+
+from temporalio import activity
+
+
+import pathlib
+from datetime import datetime, timedelta
+
+from oonipipeline.transforms.observations import measurement_to_observations
+
+log = activity.logger
+
+
+@dataclass
+class MakeObservationsParams:
+    probe_cc: List[str]
+    test_name: List[str]
+    clickhouse: str
+    data_dir: str
+    fast_fail: bool
+    bucket_date: str
+
+
+def write_observations_to_db(
+    msmt: SupportedDataformats,
+    netinfodb: NetinfoDB,
+    db: ClickhouseConnection,
+    bucket_date: str,
+):
+    for observations in measurement_to_observations(msmt=msmt, netinfodb=netinfodb):
+        if len(observations) == 0:
+            continue
+
+        
column_names = [f.name for f in dataclasses.fields(observations[0])]
+        table_name, rows = make_db_rows(
+            bucket_date=bucket_date,
+            dc_list=observations,
+            column_names=column_names,
+        )
+        db.write_rows(table_name=table_name, rows=rows, column_names=column_names)
+
+
+def make_observations_for_file_entry_batch(
+    file_entry_batch: Sequence[Tuple[str, str, str, int]],
+    clickhouse: str,
+    row_buffer_size: int,
+    data_dir: pathlib.Path,
+    bucket_date: str,
+    probe_cc: List[str],
+    fast_fail: bool,
+):
+    netinfodb = NetinfoDB(datadir=data_dir, download=False)
+    tbatch = PerfTimer()
+
+    tracer = trace.get_tracer(__name__)
+
+    total_failure_count = 0
+    current_span = trace.get_current_span()
+    with current_span, ClickhouseConnection(
+        clickhouse, row_buffer_size=row_buffer_size
+    ) as db:
+        ccs = ccs_set(probe_cc)
+        idx = 0
+        for bucket_name, s3path, ext, fe_size in file_entry_batch:
+            failure_count = 0
+            # Nest the traced span within the current span
+            with tracer.start_as_current_span(
+                "MakeObservations:stream_file_entry"
+            ) as span:
+                log.debug(f"processing file s3://{bucket_name}/{s3path}")
+                t = PerfTimer()
+                try:
+                    for msmt_dict in stream_measurements(
+                        bucket_name=bucket_name, s3path=s3path, ext=ext
+                    ):
+                        # Legacy cans don't allow us to pre-filter on the probe_cc, so
+                        # we need to check for probe_cc consistency in here.
+                        if ccs and msmt_dict["probe_cc"] not in ccs:
+                            continue
+                        msmt = None
+                        try:
+                            t_msmt = PerfTimer()  # per-measurement timer, kept separate from the per-file timer above
+                            msmt = load_measurement(msmt_dict)
+                            if not msmt.test_keys:
+                                log.error(
+                                    f"measurement with empty test_keys: ({msmt.measurement_uid})",
+                                    exc_info=True,
+                                )
+                                continue
+                            write_observations_to_db(msmt, netinfodb, db, bucket_date)
+                            idx += 1
+                        except Exception as exc:
+                            msmt_str = msmt_dict.get("report_id", None)
+                            if msmt:
+                                msmt_str = msmt.measurement_uid
+                            log.error(
+                                f"failed at idx: {idx} ({msmt_str})", exc_info=True
+                            )
+                            failure_count += 1
+
+                            if fast_fail:
+                                db.close()
+                                raise exc
+                    log.debug(f"done processing file s3://{bucket_name}/{s3path}")
+                except Exception as exc:
+                    log.error(
+                        f"failed to stream measurements from s3://{bucket_name}/{s3path}"
+                    )
+                    log.error(exc)
+                # TODO(art): figure out if the rate of these metrics is too
+                # much. For each processed file a telemetry event is generated.
+                span.set_attribute("kb_per_sec", fe_size / 1024 / t.s)
+                span.set_attribute("fe_size", fe_size)
+                span.set_attribute("failure_count", failure_count)
+                span.add_event(f"s3_path: s3://{bucket_name}/{s3path}")
+                total_failure_count += failure_count
+
+    current_span.set_attribute("total_runtime_ms", tbatch.ms)
+    current_span.set_attribute("total_failure_count", total_failure_count)
+    return idx
+
+
+@activity.defn
+def make_observation_in_day(params: MakeObservationsParams) -> dict:
+    day = datetime.strptime(params.bucket_date, "%Y-%m-%d").date()
+
+    # TODO(art): this previous range search and deletion makes the idempotence
+    # of the activity not 100% accurate.
+    # We should look into fixing it.
+    with ClickhouseConnection(params.clickhouse, row_buffer_size=10_000) as db:
+        prev_ranges = []
+        for table_name in ["obs_web"]:
+            prev_ranges.append(
+                (
+                    table_name,
+                    get_prev_range(
+                        db=db,
+                        table_name=table_name,
+                        bucket_date=params.bucket_date,
+                        test_name=params.test_name,
+                        probe_cc=params.probe_cc,
+                    ),
+                )
+            )
+    log.info(f"prev_ranges: {prev_ranges}")
+
+    t = PerfTimer()
+    total_t = PerfTimer()
+    file_entry_batches, total_size = list_file_entries_batches(
+        probe_cc=params.probe_cc,
+        test_name=params.test_name,
+        start_day=day,
+        end_day=day + timedelta(days=1),
+    )
+    log.info(f"listing {len(file_entry_batches)} batches took {t.pretty}")
+
+    total_msmt_count = 0
+    for batch in file_entry_batches:
+        msmt_cnt = make_observations_for_file_entry_batch(
+            batch,
+            params.clickhouse,
+            10_000,
+            pathlib.Path(params.data_dir),
+            params.bucket_date,
+            params.probe_cc,
+            params.fast_fail,
+        )
+        total_msmt_count += msmt_cnt
+
+    mb_per_sec = round(total_size / total_t.s / 10**6, 1)
+    msmt_per_sec = round(total_msmt_count / total_t.s)
+    log.info(
+        f"finished processing all batches in {total_t.pretty} speed: {mb_per_sec}MB/s ({msmt_per_sec}msmt/s)"
+    )
+
+    if len(prev_ranges) > 0:
+        with ClickhouseConnection(params.clickhouse, row_buffer_size=10_000) as db:
+            for table_name, pr in prev_ranges:
+                log.info(f"deleting previous range of {pr}")
+                maybe_delete_prev_range(db=db, prev_range=pr)
+
+    return {"size": total_size, "measurement_count": total_msmt_count}
diff --git a/oonipipeline/src/oonipipeline/workflows/common.py b/oonipipeline/src/oonipipeline/temporal/common.py
similarity index 79%
rename from oonipipeline/src/oonipipeline/workflows/common.py
rename to oonipipeline/src/oonipipeline/temporal/common.py
index 25764d3a..6c1bf32e 100644
--- a/oonipipeline/src/oonipipeline/workflows/common.py
+++ b/oonipipeline/src/oonipipeline/temporal/common.py
@@ -4,7 +4,7 @@
 import multiprocessing as mp
 from multiprocessing.synchronize import Event as EventClass
 
-from 
datetime import date, datetime, timedelta
+from datetime import datetime, timedelta
 
 from typing import (
     Any,
@@ -21,7 +21,6 @@
     MeasurementListProgress,
 )
 from ..db.connections import ClickhouseConnection
-from ..db.create_tables import create_queries
 
 log = logging.getLogger("oonidata.processing")
 
@@ -89,7 +88,7 @@ def maybe_delete_prev_range(db: ClickhouseConnection, prev_range: PrevRange):
        q_args["max_created_at"] = prev_range.max_created_at
        q_args["min_created_at"] = prev_range.min_created_at
        where = f"{where} AND created_at <= %(max_created_at)s AND created_at >= %(min_created_at)s"
-        log.info(f"running {where} with {q_args}")
+        log.debug(f"running {where} with {q_args}")
 
     q = f"ALTER TABLE {prev_range.table_name} DELETE "
     final_query = q + where
@@ -165,27 +164,6 @@
     return prev_range
 
 
-def optimize_all_tables(clickhouse):
-    with ClickhouseConnection(clickhouse) as db:
-        for _, table_name in create_queries:
-            db.execute(f"OPTIMIZE TABLE {table_name}")
-
-
-def get_obs_count_by_cc(
-    db: ClickhouseConnection,
-    test_name: List[str],
-    start_day: date,
-    end_day: date,
-    table_name: str = "obs_web",
-) -> Dict[str, int]:
-    q = f"SELECT probe_cc, COUNT() FROM {table_name} WHERE measurement_start_time > %(start_day)s AND measurement_start_time < %(end_day)s GROUP BY probe_cc"
-    cc_list: List[Tuple[str, int]] = db.execute(
-        q, {"start_day": start_day, "end_day": end_day}
-    )  # type: ignore
-    assert isinstance(cc_list, list)
-    return dict(cc_list)
-
-
 def make_db_rows(
     dc_list: List,
     column_names: List[str],
@@ -208,32 +186,3 @@ def maybe_remap(k, value):
     rows.append(tuple(maybe_remap(k, getattr(d, k)) for k in column_names))
 
     return table_name, rows
-
-
-class StatusMessage(NamedTuple):
-    src: str
-    exception: Optional[Exception] = None
-    traceback: Optional[str] = None
-    progress: Optional[MeasurementListProgress] = None
-    idx: Optional[int] = None
-    day_str: Optional[str] = None
-    archive_queue_size: Optional[int] = None
-
-
-def 
run_progress_thread(
-    status_queue: mp.Queue, shutdown_event: EventClass, desc: str = "analyzing data"
-):
-    pbar = tqdm(position=0)
-
-    log.info("starting error handling thread")
-    while not shutdown_event.is_set():
-        try:
-            count = status_queue.get(block=True, timeout=0.1)
-        except queue.Empty:
-            continue
-
-        try:
-            pbar.update(count)
-            pbar.set_description(desc)
-        finally:
-            status_queue.task_done()  # type: ignore
diff --git a/oonipipeline/src/oonipipeline/workflows/to_port/fingerprint_hunter.py b/oonipipeline/src/oonipipeline/temporal/to_port/fingerprint_hunter.py
similarity index 100%
rename from oonipipeline/src/oonipipeline/workflows/to_port/fingerprint_hunter.py
rename to oonipipeline/src/oonipipeline/temporal/to_port/fingerprint_hunter.py
diff --git a/oonipipeline/src/oonipipeline/workflows/to_port/response_archiver.py b/oonipipeline/src/oonipipeline/temporal/to_port/response_archiver.py
similarity index 100%
rename from oonipipeline/src/oonipipeline/workflows/to_port/response_archiver.py
rename to oonipipeline/src/oonipipeline/temporal/to_port/response_archiver.py
diff --git a/oonipipeline/src/oonipipeline/temporal/workers.py b/oonipipeline/src/oonipipeline/temporal/workers.py
new file mode 100644
index 00000000..93d5e3b8
--- /dev/null
+++ b/oonipipeline/src/oonipipeline/temporal/workers.py
@@ -0,0 +1,64 @@
+import multiprocessing
+from oonipipeline.temporal.activities.analysis import make_analysis_in_a_day
+from oonipipeline.temporal.activities.common import (
+    get_obs_count_by_cc,
+    optimize_all_tables,
+)
+from oonipipeline.temporal.activities.ground_truths import make_ground_truths_in_day
+from oonipipeline.temporal.activities.observations import make_observation_in_day
+from oonipipeline.temporal.workflows import (
+    TASK_QUEUE_NAME,
+    AnalysisBackfillWorkflow,
+    AnalysisWorkflow,
+    GroundTruthsWorkflow,
+    ObservationsBackfillWorkflow,
+    ObservationsWorkflow,
+)
+
+
+from temporalio.client import Client as TemporalClient
+from temporalio.worker 
import SharedStateManager, Worker
+
+
+from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
+
+WORKFLOWS = [
+    ObservationsWorkflow,
+    GroundTruthsWorkflow,
+    AnalysisWorkflow,
+    ObservationsBackfillWorkflow,
+    AnalysisBackfillWorkflow,
+]
+
+ACTIVITIES = [
+    make_observation_in_day,
+    make_ground_truths_in_day,
+    make_analysis_in_a_day,
+    optimize_all_tables,
+    get_obs_count_by_cc,
+]
+
+
+def make_threaded_worker(client: TemporalClient, parallelism: int) -> Worker:
+    return Worker(
+        client,
+        task_queue=TASK_QUEUE_NAME,
+        workflows=WORKFLOWS,
+        activities=ACTIVITIES,
+        activity_executor=ThreadPoolExecutor(parallelism + 2),
+        max_concurrent_activities=parallelism,
+    )
+
+
+def make_multiprocess_worker(client: TemporalClient, parallelism: int) -> Worker:
+    return Worker(
+        client,
+        task_queue=TASK_QUEUE_NAME,
+        workflows=WORKFLOWS,
+        activities=ACTIVITIES,
+        activity_executor=ProcessPoolExecutor(parallelism + 2),
+        max_concurrent_activities=parallelism,
+        shared_state_manager=SharedStateManager.create_from_multiprocessing(
+            multiprocessing.Manager()
+        ),
+    )
diff --git a/oonipipeline/src/oonipipeline/temporal/workflows.py b/oonipipeline/src/oonipipeline/temporal/workflows.py
new file mode 100644
index 00000000..738e47f1
--- /dev/null
+++ b/oonipipeline/src/oonipipeline/temporal/workflows.py
@@ -0,0 +1,410 @@
+from dataclasses import dataclass
+from typing import List, Optional
+
+import logging
+import asyncio
+from datetime import datetime, timedelta, timezone
+
+
+from temporalio import workflow
+from temporalio.common import SearchAttributeKey
+from temporalio.client import (
+    Client as TemporalClient,
+    Schedule,
+    ScheduleActionStartWorkflow,
+    ScheduleIntervalSpec,
+    ScheduleSpec,
+    ScheduleState,
+)
+
+from oonipipeline.temporal.activities.common import (
+    optimize_all_tables,
+    ClickhouseParams,
+)
+from oonipipeline.temporal.activities.ground_truths import get_ground_truth_db_path
+
+with workflow.unsafe.imports_passed_through():
+    
import clickhouse_driver
+
+    from oonidata.dataclient import date_interval
+    from oonidata.datautils import PerfTimer
+    from oonipipeline.db.connections import ClickhouseConnection
+    from oonipipeline.temporal.activities.analysis import (
+        MakeAnalysisParams,
+        log,
+        make_analysis_in_a_day,
+        make_cc_batches,
+    )
+    from oonipipeline.temporal.activities.common import (
+        get_obs_count_by_cc,
+        ObsCountParams,
+    )
+    from oonipipeline.temporal.activities.observations import (
+        MakeObservationsParams,
+        make_observation_in_day,
+    )
+
+    from oonipipeline.temporal.activities.ground_truths import (
+        MakeGroundTruthsParams,
+        make_ground_truths_in_day,
+    )
+
+# Handle temporal sandbox violations related to calls to self.processName =
+# mp.current_process().name in logger, see:
+# https://github.com/python/cpython/blob/1316692e8c7c1e1f3b6639e51804f9db5ed892ea/Lib/logging/__init__.py#L362
+logging.logMultiprocessing = False
+
+log = workflow.logger
+
+TASK_QUEUE_NAME = "oonipipeline-task-queue"
+OBSERVATION_WORKFLOW_ID = "oonipipeline-observations"
+
+MAKE_OBSERVATIONS_START_TO_CLOSE_TIMEOUT = timedelta(hours=24)
+MAKE_GROUND_TRUTHS_START_TO_CLOSE_TIMEOUT = timedelta(hours=1)
+MAKE_ANALYSIS_START_TO_CLOSE_TIMEOUT = timedelta(hours=10)
+
+
+def get_workflow_start_time() -> datetime:
+    workflow_start_time = workflow.info().typed_search_attributes.get(
+        SearchAttributeKey.for_datetime("TemporalScheduledStartTime")
+    )
+    assert workflow_start_time is not None, "TemporalScheduledStartTime not set"
+    return workflow_start_time
+
+
+@dataclass
+class ObservationsWorkflowParams:
+    probe_cc: List[str]
+    test_name: List[str]
+    clickhouse: str
+    data_dir: str
+    fast_fail: bool
+    log_level: int = logging.INFO
+    bucket_date: Optional[str] = None
+
+
+@workflow.defn
+class ObservationsWorkflow:
+    @workflow.run
+    async def run(self, params: ObservationsWorkflowParams) -> dict:
+        if params.bucket_date is None:
+            params.bucket_date = (
+                get_workflow_start_time() - timedelta(days=1)
+            
).strftime("%Y-%m-%d")
+
+        await workflow.execute_activity(
+            optimize_all_tables,
+            ClickhouseParams(clickhouse_url=params.clickhouse),
+            start_to_close_timeout=timedelta(minutes=5),
+        )
+
+        log.info(
+            f"Starting observation making with probe_cc={params.probe_cc}, test_name={params.test_name} bucket_date={params.bucket_date}"
+        )
+
+        res = await workflow.execute_activity(
+            make_observation_in_day,
+            MakeObservationsParams(
+                probe_cc=params.probe_cc,
+                test_name=params.test_name,
+                clickhouse=params.clickhouse,
+                data_dir=params.data_dir,
+                fast_fail=params.fast_fail,
+                bucket_date=params.bucket_date,
+            ),
+            start_to_close_timeout=MAKE_OBSERVATIONS_START_TO_CLOSE_TIMEOUT,
+        )
+        res["bucket_date"] = params.bucket_date
+        return res
+
+
+@dataclass
+class BackfillWorkflowParams:
+    probe_cc: List[str]
+    test_name: List[str]
+    start_day: str
+    end_day: str
+    clickhouse: str
+    data_dir: str
+    fast_fail: bool
+    log_level: int = logging.INFO
+
+
+@workflow.defn
+class ObservationsBackfillWorkflow:
+    @workflow.run
+    async def run(self, params: BackfillWorkflowParams) -> dict:
+        start_day = datetime.strptime(params.start_day, "%Y-%m-%d")
+        end_day = datetime.strptime(params.end_day, "%Y-%m-%d")
+
+        t = PerfTimer(unstoppable=True)
+        task_list = []
+        workflow_id = workflow.info().workflow_id
+        for day in date_interval(start_day, end_day):
+            bucket_date = day.strftime("%Y-%m-%d")
+            task_list.append(
+                workflow.execute_child_workflow(
+                    ObservationsWorkflow.run,
+                    ObservationsWorkflowParams(
+                        bucket_date=bucket_date,
+                        probe_cc=params.probe_cc,
+                        test_name=params.test_name,
+                        clickhouse=params.clickhouse,
+                        data_dir=params.data_dir,
+                        fast_fail=params.fast_fail,
+                        log_level=params.log_level,
+                    ),
+                    id=f"{workflow_id}/{bucket_date}",
+                )
+            )
+
+        total_size = 0
+        total_measurement_count = 0
+
+        for task in asyncio.as_completed(task_list):
+            res = await task
+            bucket_date = res["bucket_date"]
+            total_size += res["size"]
+            total_measurement_count += res["measurement_count"]
+
+            
mb_per_sec = round(total_size / t.s / 10**6, 1)
+            msmt_per_sec = round(total_measurement_count / t.s)
+            log.info(
+                f"finished processing {bucket_date} speed: {mb_per_sec}MB/s ({msmt_per_sec}msmt/s)"
+            )
+
+        mb_per_sec = round(total_size / t.s / 10**6, 1)
+        msmt_per_sec = round(total_measurement_count / t.s)
+        log.info(
+            f"finished processing {params.start_day} - {params.end_day} speed: {mb_per_sec}MB/s ({msmt_per_sec}msmt/s)"
+        )
+
+        return {
+            "size": total_size,
+            "measurement_count": total_measurement_count,
+            "runtime_ms": t.ms,
+            "mb_per_sec": mb_per_sec,
+            "msmt_per_sec": msmt_per_sec,
+            "start_day": params.start_day,
+            "end_day": params.end_day,
+        }
+
+
+OBSERVATIONS_SCHEDULE_ID = "oonipipeline-observations-schedule-id"
+
+
+def gen_observation_schedule_id(params: ObservationsWorkflowParams) -> str:
+    probe_cc_key = "ALLCCS"
+    if len(params.probe_cc) > 0:
+        probe_cc_key = ".".join(map(lambda x: x.lower(), sorted(params.probe_cc)))
+    test_name_key = "ALLTNS"
+    if len(params.test_name) > 0:
+        test_name_key = ".".join(map(lambda x: x.lower(), sorted(params.test_name)))
+
+    return f"oonipipeline-observations-{probe_cc_key}-{test_name_key}"
+
+
+async def schedule_observations(
+    client: TemporalClient, params: ObservationsWorkflowParams
+):
+    schedule_id = gen_observation_schedule_id(params)
+
+    await client.create_schedule(
+        schedule_id,
+        Schedule(
+            action=ScheduleActionStartWorkflow(
+                ObservationsWorkflow.run,
+                params,
+                id=OBSERVATION_WORKFLOW_ID,
+                task_queue=TASK_QUEUE_NAME,
+                execution_timeout=MAKE_OBSERVATIONS_START_TO_CLOSE_TIMEOUT,
+                task_timeout=MAKE_OBSERVATIONS_START_TO_CLOSE_TIMEOUT,
+                run_timeout=MAKE_OBSERVATIONS_START_TO_CLOSE_TIMEOUT,
+            ),
+            spec=ScheduleSpec(
+                intervals=[
+                    ScheduleIntervalSpec(
+                        every=timedelta(days=1), offset=timedelta(hours=2)
+                    )
+                ],
+            ),
+            state=ScheduleState(
+                note="Run the observations workflow every day with an offset of 2 hours to ensure the files have been written to s3"
+            ),
+        ),
+    )
+
+
+@dataclass
+class GroundTruthsWorkflowParams:
+    start_day: str
+    end_day: str
+    clickhouse: str
+    data_dir: str
+
+
+@workflow.defn
+class GroundTruthsWorkflow:
+    @workflow.run
+    async def run(
+        self,
+        params: GroundTruthsWorkflowParams,
+    ):
+        start_day = datetime.strptime(params.start_day, "%Y-%m-%d").date()
+        end_day = datetime.strptime(params.end_day, "%Y-%m-%d").date()
+
+        async with asyncio.TaskGroup() as tg:
+            for day in date_interval(start_day, end_day):
+                tg.create_task(
+                    workflow.execute_activity(
+                        make_ground_truths_in_day,
+                        MakeGroundTruthsParams(
+                            clickhouse=params.clickhouse,
+                            data_dir=params.data_dir,
+                            day=day.strftime("%Y-%m-%d"),
+                        ),
+                        start_to_close_timeout=MAKE_GROUND_TRUTHS_START_TO_CLOSE_TIMEOUT,
+                    )
+                )
+
+
+@dataclass
+class AnalysisWorkflowParams:
+    probe_cc: List[str]
+    test_name: List[str]
+    day: str
+    clickhouse: str
+    data_dir: str
+    parallelism: int
+    fast_fail: bool
+    force_rebuild_ground_truths: bool = False
+    log_level: int = logging.INFO
+
+
+@workflow.defn
+class AnalysisWorkflow:
+    @workflow.run
+    async def run(self, params: AnalysisWorkflowParams) -> dict:
+        await workflow.execute_activity(
+            optimize_all_tables,
+            ClickhouseParams(clickhouse_url=params.clickhouse),
+            start_to_close_timeout=timedelta(minutes=5),
+        )
+
+        log.info("building ground truth databases")
+        t = PerfTimer()
+        if (
+            params.force_rebuild_ground_truths
+            or not get_ground_truth_db_path(
+                day=params.day, data_dir=params.data_dir
+            ).exists()
+        ):
+            await workflow.execute_activity(
+                make_ground_truths_in_day,
+                MakeGroundTruthsParams(
+                    clickhouse=params.clickhouse,
+                    data_dir=params.data_dir,
+                    day=params.day,
+                ),
+                start_to_close_timeout=timedelta(minutes=30),
+            )
+            log.info(f"built ground truth db in {t.pretty}")
+
+        start_day = datetime.strptime(params.day, "%Y-%m-%d").date()
+        cnt_by_cc = await workflow.execute_activity(
+            get_obs_count_by_cc,
+            ObsCountParams(
+                clickhouse_url=params.clickhouse,
+                start_day=start_day.strftime("%Y-%m-%d"),
+                
end_day=(start_day + timedelta(days=1)).strftime("%Y-%m-%d"),
+            ),
+            start_to_close_timeout=timedelta(minutes=30),
+        )
+
+        cc_batches = make_cc_batches(
+            cnt_by_cc=cnt_by_cc,
+            probe_cc=params.probe_cc,
+            parallelism=params.parallelism,
+        )
+
+        log.info(
+            f"starting processing of {len(cc_batches)} batches for {params.day} (parallelism = {params.parallelism})"
+        )
+        log.info(f"({cc_batches})")
+
+        task_list = []
+        async with asyncio.TaskGroup() as tg:
+            for probe_cc in cc_batches:
+                task = tg.create_task(
+                    workflow.execute_activity(
+                        make_analysis_in_a_day,
+                        MakeAnalysisParams(
+                            probe_cc=probe_cc,
+                            test_name=params.test_name,
+                            clickhouse=params.clickhouse,
+                            data_dir=params.data_dir,
+                            fast_fail=params.fast_fail,
+                            day=params.day,
+                        ),
+                        start_to_close_timeout=MAKE_ANALYSIS_START_TO_CLOSE_TIMEOUT,
+                    )
+                )
+                task_list.append(task)
+
+        total_obs_count = sum(map(lambda x: x.result()["count"], task_list))
+        return {"obs_count": total_obs_count, "day": params.day}
+
+
+@workflow.defn
+class AnalysisBackfillWorkflow:
+    @workflow.run
+    async def run(self, params: BackfillWorkflowParams) -> dict:
+        start_day = datetime.strptime(params.start_day, "%Y-%m-%d")
+        end_day = datetime.strptime(params.end_day, "%Y-%m-%d")
+
+        t = PerfTimer(unstoppable=True)
+        task_list = []
+        workflow_id = workflow.info().workflow_id
+        for day in date_interval(start_day, end_day):
+            day_str = day.strftime("%Y-%m-%d")
+            task_list.append(
+                workflow.execute_child_workflow(
+                    AnalysisWorkflow.run,
+                    AnalysisWorkflowParams(
+                        day=day_str,
+                        probe_cc=params.probe_cc,
+                        test_name=params.test_name,
+                        clickhouse=params.clickhouse,
+                        data_dir=params.data_dir,
+                        fast_fail=params.fast_fail,
+                        log_level=params.log_level,
+                        parallelism=10,
+                    ),
+                    id=f"{workflow_id}/{day_str}",
+                )
+            )
+
+        total_obs_count = 0
+
+        for task in asyncio.as_completed(task_list):
+            res = await task
+            day = res["day"]
+            total_obs_count += res["obs_count"]
+
+            obs_per_sec = round(total_obs_count / t.s, 1)
+            log.info(
+                
f"finished processing {day} in {t.pretty} total_obs_count={total_obs_count} ({obs_per_sec}obs/s)"
+            )
+
+        obs_per_sec = round(total_obs_count / t.s, 1)
+        log.info(
+            f"finished processing {params.start_day} - {params.end_day} in {t.pretty} total_obs_count={total_obs_count} ({obs_per_sec}obs/s)"
+        )
+
+        return {
+            "observation_count": total_obs_count,
+            "runtime_ms": t.ms,
+            "obs_per_sec": obs_per_sec,
+            "start_day": params.start_day,
+            "end_day": params.end_day,
+        }
diff --git a/oonipipeline/src/oonipipeline/workflows/analysis.py b/oonipipeline/src/oonipipeline/workflows/analysis.py
deleted file mode 100644
index 76f854e2..00000000
--- a/oonipipeline/src/oonipipeline/workflows/analysis.py
+++ /dev/null
@@ -1,339 +0,0 @@
-import asyncio
-import dataclasses
-from dataclasses import dataclass
-import logging
-import pathlib
-
-from datetime import date, datetime, timedelta, timezone
-from typing import Dict, List
-
-from temporalio import workflow, activity
-
-with workflow.unsafe.imports_passed_through():
-    import clickhouse_driver
-
-    import orjson
-    import statsd
-
-    from oonidata.dataclient import date_interval
-    from oonidata.datautils import PerfTimer
-    from oonidata.models.analysis import WebAnalysis
-    from oonidata.models.experiment_result import MeasurementExperimentResult
-
-    from ..analysis.control import BodyDB, WebGroundTruthDB
-    from ..analysis.datasources import iter_web_observations
-    from ..analysis.web_analysis import make_web_analysis
-    from ..analysis.website_experiment_results import make_website_experiment_results
-    from ..db.connections import ClickhouseConnection
-    from ..fingerprintdb import FingerprintDB
-
-    from .ground_truths import make_ground_truths_in_day, MakeGroundTruthsParams
-
-    from .common import (
-        get_obs_count_by_cc,
-        get_prev_range,
-        make_db_rows,
-        maybe_delete_prev_range,
-        optimize_all_tables,
-    )
-
-log = logging.getLogger("oonidata.processing")
-
-
-@dataclass
-class AnalysisWorkflowParams:
-    probe_cc: List[str]
-    test_name: List[str]
-    start_day: 
str - end_day: str - clickhouse: str - data_dir: str - parallelism: int - fast_fail: bool - rebuild_ground_truths: bool - log_level: int = logging.INFO - - -@dataclass -class MakeAnalysisParams: - probe_cc: List[str] - test_name: List[str] - clickhouse: str - data_dir: str - fast_fail: bool - day: str - - -@activity.defn -def make_analysis_in_a_day(params: MakeAnalysisParams) -> dict: - t_total = PerfTimer() - log.info("Optimizing all tables") - optimize_all_tables(params.clickhouse) - data_dir = pathlib.Path(params.data_dir) - clickhouse = params.clickhouse - day = datetime.strptime(params.day, "%Y-%m-%d").date() - probe_cc = params.probe_cc - test_name = params.test_name - - statsd_client = statsd.StatsClient("localhost", 8125) - fingerprintdb = FingerprintDB(datadir=data_dir, download=False) - body_db = BodyDB(db=ClickhouseConnection(clickhouse)) - db_writer = ClickhouseConnection(clickhouse, row_buffer_size=10_000) - db_lookup = ClickhouseConnection(clickhouse) - - column_names_wa = [f.name for f in dataclasses.fields(WebAnalysis)] - column_names_er = [f.name for f in dataclasses.fields(MeasurementExperimentResult)] - - prev_range_list = [ - get_prev_range( - db=db_lookup, - table_name=WebAnalysis.__table_name__, - timestamp=datetime.combine(day, datetime.min.time()), - test_name=[], - probe_cc=probe_cc, - timestamp_column="measurement_start_time", - ), - get_prev_range( - db=db_lookup, - table_name=MeasurementExperimentResult.__table_name__, - timestamp=datetime.combine(day, datetime.min.time()), - test_name=[], - probe_cc=probe_cc, - timestamp_column="timeofday", - probe_cc_column="location_network_cc", - ), - ] - - log.info(f"loading ground truth DB for {day}") - t = PerfTimer() - ground_truth_db_path = ( - data_dir / "ground_truths" / f"web-{day.strftime('%Y-%m-%d')}.sqlite3" - ) - web_ground_truth_db = WebGroundTruthDB() - web_ground_truth_db.build_from_existing(str(ground_truth_db_path.absolute())) - 
statsd_client.timing("oonidata.web_analysis.ground_truth", t.ms) - log.info(f"loaded ground truth DB for {day} in {t.pretty}") - - idx = 0 - for web_obs in iter_web_observations( - db_lookup, measurement_day=day, probe_cc=probe_cc, test_name="web_connectivity" - ): - try: - t_er_gen = PerfTimer() - t = PerfTimer() - relevant_gts = web_ground_truth_db.lookup_by_web_obs(web_obs=web_obs) - except: - log.error( - f"failed to lookup relevant_gts for {web_obs[0].measurement_uid}", - exc_info=True, - ) - continue - - try: - statsd_client.timing("oonidata.web_analysis.gt_lookup", t.ms) - website_analysis = list( - make_web_analysis( - web_observations=web_obs, - body_db=body_db, - web_ground_truths=relevant_gts, - fingerprintdb=fingerprintdb, - ) - ) - log.info(f"generated {len(website_analysis)} website_analysis") - if len(website_analysis) == 0: - log.info(f"no website analysis for {probe_cc}, {test_name}") - continue - idx += 1 - table_name, rows = make_db_rows( - dc_list=website_analysis, column_names=column_names_wa - ) - statsd_client.incr("oonidata.web_analysis.analysis.obs", 1, rate=0.1) # type: ignore - statsd_client.gauge("oonidata.web_analysis.analysis.obs_idx", idx, rate=0.1) # type: ignore - statsd_client.timing("oonidata.web_analysis.analysis.obs", t_er_gen.ms, rate=0.1) # type: ignore - - with statsd_client.timer("db_write_rows.timing"): - db_writer.write_rows( - table_name=table_name, - rows=rows, - column_names=column_names_wa, - ) - - with statsd_client.timer("oonidata.web_analysis.experiment_results.timing"): - website_er = list(make_website_experiment_results(website_analysis)) - log.info(f"generated {len(website_er)} website_er") - table_name, rows = make_db_rows( - dc_list=website_er, - column_names=column_names_er, - custom_remap={"loni_list": orjson.dumps}, - ) - - db_writer.write_rows( - table_name=table_name, - rows=rows, - column_names=column_names_er, - ) - - except: - web_obs_ids = ",".join(map(lambda wo: wo.observation_id, web_obs)) - 
log.error(f"failed to generate analysis for {web_obs_ids}", exc_info=True) - - for prev_range in prev_range_list: - maybe_delete_prev_range(db=db_lookup, prev_range=prev_range) - db_writer.close() - - with ClickhouseConnection(clickhouse) as db: - db.execute( - "INSERT INTO oonidata_processing_logs (key, timestamp, runtime_ms, bytes, msmt_count, comment) VALUES", - [ - [ - "oonidata.analysis.made_day_analysis", - datetime.now(timezone.utc).replace(tzinfo=None), - int(t_total.ms), - 0, - idx, - day.strftime("%Y-%m-%d"), - ] - ], - ) - return {"count": idx} - - -def make_cc_batches( - cnt_by_cc: Dict[str, int], - probe_cc: List[str], - parallelism: int, -) -> List[List[str]]: - """ - The goal of this function is to spread the load of each batch of - measurements by probe_cc. This allows us to parallelize analysis on a - per-country basis based on the number of measurements. - We assume that the measurements are uniformly distributed over the tested - interval and then break them up into a number of batches equivalent to the - parallelism count based on the number of measurements in each country. - - Here is a concrete example, suppose we have 3 countries IT, IR, US with 300, - 400, 1000 measurements respectively and a parallelism of 2, we will be - creating 2 batches where the first has in it IT, IR and the second has US. - """ - if len(probe_cc) > 0: - selected_ccs_with_cnt = set(probe_cc).intersection(set(cnt_by_cc.keys())) - if len(selected_ccs_with_cnt) == 0: - raise Exception( - f"No observations for {probe_cc} in the time range. 
Try adjusting the date range or choosing different countries" - ) - # We remove from the cnt_by_cc all the countries we are not interested in - cnt_by_cc = {k: cnt_by_cc[k] for k in selected_ccs_with_cnt} - - total_obs_cnt = sum(cnt_by_cc.values()) - - # We assume uniform distribution of observations per (country, day) - max_obs_per_batch = total_obs_cnt / parallelism - - # We break up the countries into batches where the count of observations in - # each batch is roughly equal. - # This is done so that we can spread the load based on the countries in - # addition to the time range. - cc_batches = [] - current_cc_batch_size = 0 - current_cc_batch = [] - cnt_by_cc_sorted = sorted(cnt_by_cc.items(), key=lambda x: x[0]) - while cnt_by_cc_sorted: - while current_cc_batch_size <= max_obs_per_batch: - try: - cc, cnt = cnt_by_cc_sorted.pop() - except IndexError: - break - current_cc_batch.append(cc) - current_cc_batch_size += cnt - cc_batches.append(current_cc_batch) - current_cc_batch = [] - current_cc_batch_size = 0 - if len(current_cc_batch) > 0: - cc_batches.append(current_cc_batch) - return cc_batches - - -# TODO(art) -# We disable the sanbox for all this workflow, since otherwise pytz fails to -# work which is a requirement for clickhouse. -# This is most likely due to it doing an open() in order to read the timezone -# definitions. -# I spent some time debugging this, but eventually gave up. We should eventually -# look into making this run OK inside of the sandbox. 
-@workflow.defn(sandboxed=False)
-class AnalysisWorkflow:
-    @workflow.run
-    async def run(self, params: AnalysisWorkflowParams) -> dict:
-        t_total = PerfTimer()
-
-        t = PerfTimer()
-        start_day = datetime.strptime(params.start_day, "%Y-%m-%d").date()
-        end_day = datetime.strptime(params.end_day, "%Y-%m-%d").date()
-
-        log.info("building ground truth databases")
-
-        async with asyncio.TaskGroup() as tg:
-            for day in date_interval(start_day, end_day):
-                tg.create_task(
-                    workflow.execute_activity(
-                        make_ground_truths_in_day,
-                        MakeGroundTruthsParams(
-                            day=day.strftime("%Y-%m-%d"),
-                            clickhouse=params.clickhouse,
-                            data_dir=params.data_dir,
-                            rebuild_ground_truths=params.rebuild_ground_truths,
-                        ),
-                        start_to_close_timeout=timedelta(minutes=2),
-                    )
-                )
-        log.info(f"built ground truth db in {t.pretty}")
-
-        with ClickhouseConnection(params.clickhouse) as db:
-            cnt_by_cc = get_obs_count_by_cc(
-                db, start_day=start_day, end_day=end_day, test_name=params.test_name
-            )
-        cc_batches = make_cc_batches(
-            cnt_by_cc=cnt_by_cc,
-            probe_cc=params.probe_cc,
-            parallelism=params.parallelism,
-        )
-        log.info(
-            f"starting processing of {len(cc_batches)} batches over {(end_day - start_day).days} days (parallelism = {params.parallelism})"
-        )
-        log.info(f"({cc_batches} from {start_day} to {end_day})")
-
-        task_list = []
-        async with asyncio.TaskGroup() as tg:
-            for probe_cc in cc_batches:
-                for day in date_interval(start_day, end_day):
-                    task = tg.create_task(
-                        workflow.execute_activity(
-                            make_analysis_in_a_day,
-                            MakeAnalysisParams(
-                                probe_cc=probe_cc,
-                                test_name=params.test_name,
-                                clickhouse=params.clickhouse,
-                                data_dir=params.data_dir,
-                                fast_fail=params.fast_fail,
-                                day=day.strftime("%Y-%m-%d"),
-                            ),
-                            start_to_close_timeout=timedelta(minutes=30),
-                        )
-                    )
-                    task_list.append(task)
-
-        t = PerfTimer()
-        # size, msmt_count =
-        total_obs_count = 0
-        for task in task_list:
-            res = task.result()
-
-            total_obs_count += res["count"]
-
-        log.info(f"produced a total of {total_obs_count} analysis results")
-        obs_per_sec = round(total_obs_count / t_total.s)
-        log.info(
-            f"finished processing {start_day} - {end_day} speed: {obs_per_sec}obs/s"
-        )
-        log.info(f"{total_obs_count} msmts in {t_total.pretty}")
-        return {"total_obs_count": total_obs_count}
diff --git a/oonipipeline/src/oonipipeline/workflows/ground_truths.py b/oonipipeline/src/oonipipeline/workflows/ground_truths.py
deleted file mode 100644
index eda81727..00000000
--- a/oonipipeline/src/oonipipeline/workflows/ground_truths.py
+++ /dev/null
@@ -1,90 +0,0 @@
-import asyncio
-from dataclasses import dataclass
-import pathlib
-import logging
-
-from datetime import datetime, timedelta
-
-from temporalio import workflow, activity
-
-with workflow.unsafe.imports_passed_through():
-    import clickhouse_driver
-
-    from oonidata.dataclient import date_interval
-    from oonidata.datautils import PerfTimer
-    from ..analysis.control import WebGroundTruthDB, iter_web_ground_truths
-    from ..netinfo import NetinfoDB
-    from ..db.connections import (
-        ClickhouseConnection,
-    )
-
-log = logging.getLogger("oonidata.processing")
-
-
-@dataclass
-class GroundTruthsWorkflowParams:
-    start_day: str
-    end_day: str
-    clickhouse: str
-    data_dir: str
-
-
-@dataclass
-class MakeGroundTruthsParams:
-    clickhouse: str
-    data_dir: str
-    day: str
-    rebuild_ground_truths: bool
-
-
-@activity.defn
-def make_ground_truths_in_day(params: MakeGroundTruthsParams):
-    clickhouse = params.clickhouse
-    day = datetime.strptime(params.day, "%Y-%m-%d").date()
-    data_dir = pathlib.Path(params.data_dir)
-    rebuild_ground_truths = params.rebuild_ground_truths
-
-    db = ClickhouseConnection(clickhouse)
-    netinfodb = NetinfoDB(datadir=data_dir, download=False)
-    ground_truth_dir = data_dir / "ground_truths"
-    ground_truth_dir.mkdir(exist_ok=True)
-    dst_path = ground_truth_dir / f"web-{day.strftime('%Y-%m-%d')}.sqlite3"
-    if not dst_path.exists() or rebuild_ground_truths:
-        if dst_path.exists():
-            dst_path.unlink()
-
-        t = 
PerfTimer() - log.info(f"building ground truth DB for {day}") - web_ground_truth_db = WebGroundTruthDB(connect_str=str(dst_path.absolute())) - web_ground_truth_db.build_from_rows( - rows=iter_web_ground_truths(db=db, measurement_day=day, netinfodb=netinfodb) - ) - log.info(f"built ground truth DB {day} in {t.pretty}") - - -@workflow.defn -class GroundTruthsWorkflow: - @workflow.run - async def run( - self, - params: GroundTruthsWorkflowParams, - ): - task_list = [] - start_day = datetime.strptime(params.start_day, "%Y-%m-%d").date() - end_day = datetime.strptime(params.end_day, "%Y-%m-%d").date() - - async with asyncio.TaskGroup() as tg: - for day in date_interval(start_day, end_day): - task = tg.create_task( - workflow.execute_activity( - make_ground_truths_in_day, - MakeGroundTruthsParams( - clickhouse=params.clickhouse, - data_dir=params.data_dir, - day=day.strftime("%Y-%m-%d"), - rebuild_ground_truths=True, - ), - start_to_close_timeout=timedelta(minutes=30), - ) - ) - task_list.append(task) diff --git a/oonipipeline/src/oonipipeline/workflows/observations.py b/oonipipeline/src/oonipipeline/workflows/observations.py deleted file mode 100644 index 1232165a..00000000 --- a/oonipipeline/src/oonipipeline/workflows/observations.py +++ /dev/null @@ -1,278 +0,0 @@ -import asyncio -import pathlib -import logging -import dataclasses -from dataclasses import dataclass -from datetime import datetime, timedelta - -from typing import ( - List, - Sequence, - Tuple, -) - -from temporalio import workflow, activity - -with workflow.unsafe.imports_passed_through(): - import statsd - import clickhouse_driver - from oonidata.datautils import PerfTimer - from oonidata.dataclient import ( - date_interval, - list_file_entries_batches, - stream_measurements, - ccs_set, - load_measurement, - ) - from oonidata.models.nettests import SupportedDataformats - - from ..netinfo import NetinfoDB - from ..db.connections import ClickhouseConnection - from ..transforms.observations import 
measurement_to_observations - - from .common import ( - get_prev_range, - make_db_rows, - maybe_delete_prev_range, - optimize_all_tables, - ) - -log = logging.getLogger("oonidata.processing") - - -def write_observations_to_db( - msmt: SupportedDataformats, - netinfodb: NetinfoDB, - db: ClickhouseConnection, - bucket_date: str, -): - for observations in measurement_to_observations(msmt, netinfodb=netinfodb): - if len(observations) == 0: - continue - - column_names = [f.name for f in dataclasses.fields(observations[0])] - table_name, rows = make_db_rows( - bucket_date=bucket_date, - dc_list=observations, - column_names=column_names, - ) - db.write_rows(table_name=table_name, rows=rows, column_names=column_names) - - -def make_observations_for_file_entry_batch( - file_entry_batch: Sequence[Tuple[str, str, str, int]], - clickhouse: str, - row_buffer_size: int, - data_dir: pathlib.Path, - bucket_date: str, - probe_cc: List[str], - fast_fail: bool, -): - netinfodb = NetinfoDB(datadir=data_dir, download=False) - tbatch = PerfTimer() - with ClickhouseConnection(clickhouse, row_buffer_size=row_buffer_size) as db: - statsd_client = statsd.StatsClient("localhost", 8125) - ccs = ccs_set(probe_cc) - idx = 0 - for bucket_name, s3path, ext, fe_size in file_entry_batch: - log.info(f"processing file s3://{bucket_name}/{s3path}") - t = PerfTimer() - try: - for msmt_dict in stream_measurements( - bucket_name=bucket_name, s3path=s3path, ext=ext - ): - # Legacy cans don't allow us to pre-filter on the probe_cc, so - # we need to check for probe_cc consistency in here. 
- if ccs and msmt_dict["probe_cc"] not in ccs: - continue - msmt = None - try: - t = PerfTimer() - msmt = load_measurement(msmt_dict) - if not msmt.test_keys: - log.error( - f"measurement with empty test_keys: ({msmt.measurement_uid})", - exc_info=True, - ) - continue - write_observations_to_db(msmt, netinfodb, db, bucket_date) - # following types ignored due to https://github.com/jsocol/pystatsd/issues/146 - statsd_client.timing("oonidata.make_observations.timed", t.ms, rate=0.1) # type: ignore - statsd_client.incr("oonidata.make_observations.msmt_count", rate=0.1) # type: ignore - idx += 1 - except Exception as exc: - msmt_str = msmt_dict.get("report_id", None) - if msmt: - msmt_str = msmt.measurement_uid - log.error(f"failed at idx: {idx} ({msmt_str})", exc_info=True) - - if fast_fail: - db.close() - raise exc - log.info(f"done processing file s3://{bucket_name}/{s3path}") - except Exception as exc: - log.error( - f"failed to stream measurements from s3://{bucket_name}/{s3path}" - ) - log.error(exc) - statsd_client.timing("oonidata.dataclient.stream_file_entry.timed", t.ms, rate=0.1) # type: ignore - statsd_client.gauge("oonidata.dataclient.file_entry.kb_per_sec.gauge", fe_size / 1024 / t.s, rate=0.1) # type: ignore - statsd_client.timing("oonidata.dataclient.batch.timed", tbatch.ms) # type: ignore - return idx - - -@dataclass -class ObservationsWorkflowParams: - probe_cc: List[str] - test_name: List[str] - start_day: str - end_day: str - clickhouse: str - data_dir: str - fast_fail: bool - log_level: int = logging.INFO - - -@dataclass -class MakeObservationsParams: - probe_cc: List[str] - test_name: List[str] - clickhouse: str - data_dir: str - fast_fail: bool - bucket_date: str - - -@activity.defn -def make_observation_in_day(params: MakeObservationsParams) -> dict: - statsd_client = statsd.StatsClient("localhost", 8125) - - day = datetime.strptime(params.bucket_date, "%Y-%m-%d").date() - - with ClickhouseConnection(params.clickhouse, row_buffer_size=10_000) as 
db: - prev_ranges = [] - for table_name in ["obs_web"]: - prev_ranges.append( - ( - table_name, - get_prev_range( - db=db, - table_name=table_name, - bucket_date=params.bucket_date, - test_name=params.test_name, - probe_cc=params.probe_cc, - ), - ) - ) - - t = PerfTimer() - total_t = PerfTimer() - file_entry_batches, total_size = list_file_entries_batches( - probe_cc=params.probe_cc, - test_name=params.test_name, - start_day=day, - end_day=day + timedelta(days=1), - ) - log.info(f"running {len(file_entry_batches)} batches took {t.pretty}") - - total_msmt_count = 0 - for batch in file_entry_batches: - msmt_cnt = make_observations_for_file_entry_batch( - batch, - params.clickhouse, - 10_000, - pathlib.Path(params.data_dir), - params.bucket_date, - params.probe_cc, - params.fast_fail, - ) - total_msmt_count += msmt_cnt - - mb_per_sec = round(total_size / total_t.s / 10**6, 1) - msmt_per_sec = round(total_msmt_count / total_t.s) - log.info( - f"finished processing all batches in {total_t.pretty} speed: {mb_per_sec}MB/s ({msmt_per_sec}msmt/s)" - ) - statsd_client.timing("oonidata.dataclient.daily.timed", total_t.ms) - - if len(prev_ranges) > 0: - with ClickhouseConnection(params.clickhouse, row_buffer_size=10_000) as db: - for table_name, pr in prev_ranges: - maybe_delete_prev_range(db=db, prev_range=pr) - - return {"size": total_size, "measurement_count": total_msmt_count} - - -@workflow.defn -class ObservationsWorkflow: - @workflow.run - async def run(self, params: ObservationsWorkflowParams) -> dict: - log.info("Optimizing all tables") - optimize_all_tables(params.clickhouse) - - t_total = PerfTimer() - log.info( - f"Starting observation making on {params.probe_cc} ({params.start_day} - {params.end_day})" - ) - task_list = [] - start_day = datetime.strptime(params.start_day, "%Y-%m-%d").date() - end_day = datetime.strptime(params.end_day, "%Y-%m-%d").date() - - async with asyncio.TaskGroup() as tg: - for day in date_interval(start_day, end_day): - task = 
tg.create_task( - workflow.execute_activity( - make_observation_in_day, - MakeObservationsParams( - probe_cc=params.probe_cc, - test_name=params.test_name, - clickhouse=params.clickhouse, - data_dir=params.data_dir, - fast_fail=params.fast_fail, - bucket_date=day.strftime("%Y-%m-%d"), - ), - start_to_close_timeout=timedelta(minutes=30), - ) - ) - task_list.append(task) - - t = PerfTimer() - # size, msmt_count = - total_size, total_msmt_count = 0, 0 - for task in task_list: - res = task.result() - - total_size += res["size"] - total_msmt_count += res["measurement_count"] - - # This needs to be adjusted once we get the the per entry concurrency working - # mb_per_sec = round(total_size / t.s / 10**6, 1) - # msmt_per_sec = round(total_msmt_count / t.s) - # log.info( - # f"finished processing {day} speed: {mb_per_sec}MB/s ({msmt_per_sec}msmt/s)" - # ) - - # with ClickhouseConnection(params.clickhouse) as db: - # db.execute( - # "INSERT INTO oonidata_processing_logs (key, timestamp, runtime_ms, bytes, msmt_count, comment) VALUES", - # [ - # [ - # "oonidata.bucket_processed", - # datetime.now(timezone.utc).replace(tzinfo=None), - # int(t.ms), - # total_size, - # total_msmt_count, - # day.strftime("%Y-%m-%d"), - # ] - # ], - # ) - - mb_per_sec = round(total_size / t_total.s / 10**6, 1) - msmt_per_sec = round(total_msmt_count / t_total.s) - log.info( - f"finished processing {params.start_day} - {params.end_day} speed: {mb_per_sec}MB/s ({msmt_per_sec}msmt/s)" - ) - log.info( - f"{round(total_size/10**9, 2)}GB {total_msmt_count} msmts in {t_total.pretty}" - ) - return {"size": total_size, "measurement_count": total_msmt_count} diff --git a/oonipipeline/tests/_fixtures.py b/oonipipeline/tests/_fixtures.py index 47960b32..358dbe95 100644 --- a/oonipipeline/tests/_fixtures.py +++ b/oonipipeline/tests/_fixtures.py @@ -33,6 +33,11 @@ "20221101055235.141387_RU_webconnectivity_046ce024dd76b564", # ru_blocks_twitter "20230907000740.785053_BR_httpinvalidrequestline_bdfe6d70dcbda5e9", 
# middlebox detected "20221110235922.335062_IR_webconnectivity_e4114ee32b8dbf74", # Iran blocking reddit + "20240420235427.477327_US_webconnectivity_9b3cac038dc2ba22", # down site + "20240302000048.790188_RU_webconnectivity_e7ffd3bc0f525eb7", # connection reset RU + "20240302000050.000654_SN_webconnectivity_fe4221088fbdcb0a", # nxdomain down + "20240302000305.316064_EG_webconnectivity_397bca9091b07444", # nxdomain blocked, unknown_failure and from the future + "20240309112858.009725_SE_webconnectivity_dce757ef4ec9b6c8", # blockpage for Iran in Sweden ] SAMPLE_POSTCANS = ["2024030100_AM_webconnectivity.n1.0.tar.gz"] diff --git a/oonipipeline/tests/data/.gitignore b/oonipipeline/tests/data/.gitignore index 213f74a9..ec372b8e 100644 --- a/oonipipeline/tests/data/.gitignore +++ b/oonipipeline/tests/data/.gitignore @@ -1,2 +1,3 @@ /datadir /measurements +/raw_measurements diff --git a/oonipipeline/tests/docker-compose.yml b/oonipipeline/tests/docker-compose.yml index 7546ca5b..b0dcb40d 100644 --- a/oonipipeline/tests/docker-compose.yml +++ b/oonipipeline/tests/docker-compose.yml @@ -3,4 +3,4 @@ services: clickhouse: image: "clickhouse/clickhouse-server" ports: - - "9000:9000" + - "19000:9000" diff --git a/oonipipeline/tests/fixme_test_workers.py b/oonipipeline/tests/fixme_test_workers.py deleted file mode 100644 index 17556d91..00000000 --- a/oonipipeline/tests/fixme_test_workers.py +++ /dev/null @@ -1,341 +0,0 @@ -from datetime import date, datetime, timedelta, timezone -import gzip -from pathlib import Path -import sqlite3 -from typing import List, Tuple -from unittest.mock import MagicMock -import time - -from oonidata.dataclient import stream_jsonl, load_measurement -from oonidata.models.nettests.dnscheck import DNSCheck -from oonidata.models.nettests.web_connectivity import WebConnectivity -from oonidata.models.nettests.http_invalid_request_line import HTTPInvalidRequestLine -from oonidata.models.observations import HTTPMiddleboxObservation - -from 
oonipipeline.workflows.analysis import ( - make_analysis_in_a_day, - make_cc_batches, - make_ctrl, -) -from oonipipeline.workflows.common import ( - get_obs_count_by_cc, - get_prev_range, - maybe_delete_prev_range, -) -from oonipipeline.workflows.observations import ( - make_observations_for_file_entry_batch, - write_observations_to_db, -) -from oonipipeline.workflows.response_archiver import ResponseArchiver -from oonipipeline.workflows.fingerprint_hunter import fingerprint_hunter -from oonipipeline.transforms import measurement_to_observations -from oonipipeline.transforms.nettests.measurement_transformer import ( - MeasurementTransformer, -) - - -def wait_for_mutations(db, table_name): - while True: - res = db.execute( - f"SELECT * FROM system.mutations WHERE is_done=0 AND table='{table_name}';" - ) - if len(res) == 0: # type: ignore - break - time.sleep(1) - - -def test_get_prev_range(db): - db.execute("DROP TABLE IF EXISTS test_range") - db.execute( - """CREATE TABLE test_range ( - created_at DateTime64(3, 'UTC'), - bucket_date String, - test_name String, - probe_cc String - ) - ENGINE = MergeTree - ORDER BY (bucket_date, created_at) - """ - ) - bucket_date = "2000-01-01" - test_name = "web_connectivity" - probe_cc = "IT" - min_time = datetime(2000, 1, 1, 23, 42, 00) - rows = [(min_time, bucket_date, test_name, probe_cc)] - for i in range(200): - rows.append((min_time + timedelta(seconds=i), bucket_date, test_name, probe_cc)) - db.execute( - "INSERT INTO test_range (created_at, bucket_date, test_name, probe_cc) VALUES", - rows, - ) - prev_range = get_prev_range( - db, - "test_range", - test_name=[test_name], - bucket_date=bucket_date, - probe_cc=[probe_cc], - ) - assert prev_range.min_created_at and prev_range.max_created_at - assert prev_range.min_created_at == (min_time - timedelta(seconds=1)) - assert prev_range.max_created_at == (rows[-1][0] + timedelta(seconds=1)) - db.execute("TRUNCATE TABLE test_range") - - bucket_date = "2000-03-01" - test_name = 
"web_connectivity" - probe_cc = "IT" - min_time = datetime(2000, 1, 1, 23, 42, 00) - rows: List[Tuple[datetime, str, str, str]] = [] - for i in range(10): - rows.append( - (min_time + timedelta(seconds=i), "2000-02-01", test_name, probe_cc) - ) - min_time = rows[-1][0] - for i in range(10): - rows.append((min_time + timedelta(seconds=i), bucket_date, test_name, probe_cc)) - - db.execute( - "INSERT INTO test_range (created_at, bucket_date, test_name, probe_cc) VALUES", - rows, - ) - prev_range = get_prev_range( - db, - "test_range", - test_name=[test_name], - bucket_date=bucket_date, - probe_cc=[probe_cc], - ) - assert prev_range.min_created_at and prev_range.max_created_at - assert prev_range.min_created_at == (min_time - timedelta(seconds=1)) - assert prev_range.max_created_at == (rows[-1][0] + timedelta(seconds=1)) - - maybe_delete_prev_range( - db=db, - prev_range=prev_range, - ) - wait_for_mutations(db, "test_range") - res = db.execute("SELECT COUNT() FROM test_range") - assert res[0][0] == 10 - db.execute("DROP TABLE test_range") - - -def test_make_cc_batches(): - cc_batches = make_cc_batches( - cnt_by_cc={"IT": 100, "IR": 300, "US": 1000}, - probe_cc=["IT", "IR", "US"], - parallelism=2, - ) - assert len(cc_batches) == 2 - # We expect the batches to be broken up into (IT, IR), ("US") - assert any([set(x) == set(["US"]) for x in cc_batches]) == True - - -def test_make_file_entry_batch(datadir, db): - file_entry_batch = [ - ( - "ooni-data-eu-fra", - "raw/20231031/15/IR/webconnectivity/2023103115_IR_webconnectivity.n1.0.tar.gz", - "tar.gz", - 4074306, - ) - ] - obs_msmt_count = make_observations_for_file_entry_batch( - file_entry_batch, db.clickhouse_url, 100, datadir, "2023-10-31", "IR", False - ) - assert obs_msmt_count == 453 - - make_ctrl( - clickhouse=db.clickhouse_url, - data_dir=datadir, - rebuild_ground_truths=True, - day=date(2023, 10, 31), - ) - analysis_msmt_count = make_analysis_in_a_day( - probe_cc=["IR"], - test_name=["webconnectivity"], - 
clickhouse=db.clickhouse_url, - data_dir=datadir, - day=date(2023, 10, 31), - fast_fail=False, - ) - assert analysis_msmt_count == obs_msmt_count - - -def test_write_observations(measurements, netinfodb, db): - msmt_uids = [ - ("20210101190046.780850_US_webconnectivity_3296f126f79ca186", "2021-01-01"), - ("20210101181154.037019_CH_webconnectivity_68ce38aa9e3182c2", "2021-01-01"), - ("20231031032643.267235_GR_dnscheck_abcbfc460b9424b6", "2023-10-31"), - ( - "20231101164541.763506_NP_httpinvalidrequestline_0cf676868fa36cc4", - "2023-10-31", - ), - ( - "20231101164544.534107_BR_httpheaderfieldmanipulation_4caa0b0556f0b141", - "2023-10-31", - ), - ("20231101164649.235575_RU_tor_ccf7519bf683c022", "2023-10-31"), - ( - "20230907000740.785053_BR_httpinvalidrequestline_bdfe6d70dcbda5e9", - "2023-09-07", - ), - ] - for msmt_uid, bucket_date in msmt_uids: - msmt = load_measurement(msmt_path=measurements[msmt_uid]) - write_observations_to_db(msmt, netinfodb, db, bucket_date) - db.close() - cnt_by_cc = get_obs_count_by_cc( - db, - test_name=[], - start_day=date(2020, 1, 1), - end_day=date(2023, 12, 1), - ) - assert cnt_by_cc["CH"] == 2 - assert cnt_by_cc["GR"] == 4 - assert cnt_by_cc["US"] == 3 - assert cnt_by_cc["RU"] == 3 - - -def test_hirl_observations(measurements, netinfodb): - msmt = load_measurement( - msmt_path=measurements[ - "20230907000740.785053_BR_httpinvalidrequestline_bdfe6d70dcbda5e9" - ] - ) - assert isinstance(msmt, HTTPInvalidRequestLine) - middlebox_obs: List[HTTPMiddleboxObservation] = measurement_to_observations( - msmt, netinfodb=netinfodb - )[0] - assert isinstance(middlebox_obs[0], HTTPMiddleboxObservation) - assert middlebox_obs[0].hirl_success == True - assert middlebox_obs[0].hirl_sent_0 != middlebox_obs[0].hirl_received_0 - - -def test_insert_query_for_observation(measurements, netinfodb): - http_blocked = load_measurement( - msmt_path=measurements[ - "20220608121828.356206_RU_webconnectivity_80e3fa60eb2cd026" - ] - ) - assert 
isinstance(http_blocked, WebConnectivity) - mt = MeasurementTransformer(measurement=http_blocked, netinfodb=netinfodb) - all_web_obs = [ - obs - for obs in mt.make_http_observations( - http_blocked.test_keys.requests, - ) - ] - assert all_web_obs[-1].request_url == "http://proxy.org/" - - -def test_web_connectivity_processor(netinfodb, measurements): - msmt = load_measurement( - msmt_path=measurements[ - "20220627131742.081225_GB_webconnectivity_e1e2cf4db492b748" - ] - ) - assert isinstance(msmt, WebConnectivity) - - web_obs_list, web_ctrl_list = measurement_to_observations(msmt, netinfodb=netinfodb) - assert len(web_obs_list) == 3 - assert len(web_ctrl_list) == 3 - - -def test_dnscheck_processor(measurements, netinfodb): - db = MagicMock() - db.write_row = MagicMock() - - msmt = load_measurement( - msmt_path=measurements["20221013000000.517636_US_dnscheck_bfd6d991e70afa0e"] - ) - assert isinstance(msmt, DNSCheck) - obs_list = measurement_to_observations(msmt=msmt, netinfodb=netinfodb)[0] - assert len(obs_list) == 20 - - -def test_full_processing(raw_measurements, netinfodb): - for msmt_path in raw_measurements.glob("*/*/*.jsonl.gz"): - with msmt_path.open("rb") as in_file: - for msmt_dict in stream_jsonl(in_file): - msmt = load_measurement(msmt_dict) - measurement_to_observations( - msmt=msmt, - netinfodb=netinfodb, - ) - - -def test_archive_http_transaction(measurements, tmpdir): - db = MagicMock() - db.write_row = MagicMock() - - msmt = load_measurement( - msmt_path=measurements[ - "20220627131742.081225_GB_webconnectivity_e1e2cf4db492b748" - ] - ) - assert isinstance(msmt, WebConnectivity) - assert msmt.test_keys.requests - dst_dir = Path(tmpdir) - with ResponseArchiver(dst_dir=dst_dir) as archiver: - for http_transaction in msmt.test_keys.requests: - if not http_transaction.response or not http_transaction.request: - continue - request_url = http_transaction.request.url - status_code = http_transaction.response.code or 0 - response_headers = 
http_transaction.response.headers_list_bytes or [] - response_body = http_transaction.response.body_bytes - assert response_body - archiver.archive_http_transaction( - request_url=request_url, - status_code=status_code, - response_headers=response_headers, - response_body=response_body, - matched_fingerprints=[], - ) - - warc_files = list(dst_dir.glob("*.warc.gz")) - assert len(warc_files) == 1 - with gzip.open(warc_files[0], "rb") as in_file: - assert b"Run OONI Probe to detect internet censorship" in in_file.read() - - conn = sqlite3.connect(dst_dir / "graveyard.sqlite3") - res = conn.execute("SELECT COUNT() FROM oonibodies_archive") - assert res.fetchone()[0] == 1 - - -def test_fingerprint_hunter(fingerprintdb, measurements, tmpdir): - db = MagicMock() - db.write_rows = MagicMock() - - archives_dir = Path(tmpdir) - http_blocked = load_measurement( - msmt_path=measurements[ - "20220608121828.356206_RU_webconnectivity_80e3fa60eb2cd026" - ] - ) - assert isinstance(http_blocked, WebConnectivity) - with ResponseArchiver(dst_dir=archives_dir) as response_archiver: - assert http_blocked.test_keys.requests - for http_transaction in http_blocked.test_keys.requests: - if not http_transaction.response or not http_transaction.request: - continue - request_url = http_transaction.request.url - status_code = http_transaction.response.code or 0 - response_headers = http_transaction.response.headers_list_bytes or [] - response_body = http_transaction.response.body_bytes - assert response_body - response_archiver.archive_http_transaction( - request_url=request_url, - status_code=status_code, - response_headers=response_headers, - response_body=response_body, - matched_fingerprints=[], - ) - - archive_path = list(archives_dir.glob("*.warc.gz"))[0] - detected_fps = list( - fingerprint_hunter( - fingerprintdb=fingerprintdb, - archive_path=archive_path, - ) - ) - assert len(detected_fps) == 1 diff --git a/oonipipeline/tests/test_analysis.py b/oonipipeline/tests/test_analysis.py index 
242da280..7b9cea84 100644 --- a/oonipipeline/tests/test_analysis.py +++ b/oonipipeline/tests/test_analysis.py @@ -1,15 +1,23 @@ from base64 import b64decode from datetime import datetime +from pprint import pprint import random -from typing import List +from typing import List, Tuple from unittest.mock import MagicMock import pytest from oonidata.dataclient import load_measurement +from oonidata.models.analysis import WebAnalysis +from oonidata.models.experiment_result import MeasurementExperimentResult from oonidata.models.nettests.signal import Signal from oonidata.models.nettests.web_connectivity import WebConnectivity -from oonidata.models.observations import WebObservation, print_nice, print_nice_vertical +from oonidata.models.observations import ( + WebControlObservation, + WebObservation, + print_nice, + print_nice_vertical, +) from oonidata.datautils import validate_cert_chain from oonipipeline.analysis.web_analysis import make_web_analysis @@ -19,10 +27,14 @@ iter_ground_truths_from_web_control, WebGroundTruthDB, ) -from oonipipeline.analysis.signal import make_signal_experiment_result from oonipipeline.transforms.nettests.signal import SIGNAL_PEM_STORE from oonipipeline.transforms.observations import measurement_to_observations +from oonipipeline.analysis.signal import make_signal_experiment_result +from oonipipeline.analysis.website_experiment_results import ( + make_website_experiment_results, +) + def test_signal(fingerprintdb, netinfodb, measurements): signal_old_ca = load_measurement( @@ -120,67 +132,6 @@ def test_signal(fingerprintdb, netinfodb, measurements): assert blocking_event[0].confirmed == True -def test_website_dns_blocking_event(fingerprintdb, netinfodb, measurements): - pytest.skip("TODO(arturo): implement this with the new analysis") - msmt_path = measurements[ - "20220627030703.592775_IR_webconnectivity_80e199b3c572f8d3" - ] - er = list(make_experiment_result_from_wc_ctrl(msmt_path, fingerprintdb, netinfodb)) - be = list( - filter( - 
lambda be: be.outcome_scope == "n", - er, - ) - ) - assert len(be) == 1 - - msmt_path = measurements[ - "20220627134426.194308_DE_webconnectivity_15675b61ec62e268" - ] - er = list(make_experiment_result_from_wc_ctrl(msmt_path, fingerprintdb, netinfodb)) - be = list( - filter( - lambda be: be.blocked_score > 0.5, - er, - ) - ) - assert len(be) == 1 - assert be[0].outcome_detail == "inconsistent.bogon" - - msmt_path = measurements[ - "20220627125833.737451_FR_webconnectivity_bca9ad9d3371919a" - ] - er = make_experiment_result_from_wc_ctrl(msmt_path, fingerprintdb, netinfodb) - be = list( - filter( - lambda be: be.blocked_score > 0.6, - er, - ) - ) - # TODO: is it reasonable to double count NXDOMAIN for AAAA and A queries? - assert len(be) == 2 - assert be[0].outcome_detail == "inconsistent.nxdomain" - - msmt_path = measurements[ - "20220625234824.235023_HU_webconnectivity_3435a5df0e743d39" - ] - er = list(make_experiment_result_from_wc_ctrl(msmt_path, fingerprintdb, netinfodb)) - be = list( - filter( - lambda be: be.ok_score > 0.5, - er, - ) - ) - nok_be = list( - filter( - lambda be: be.ok_score < 0.5, - er, - ) - ) - assert len(be) == len(er) - assert len(nok_be) == 0 - - def make_experiment_result_from_wc_ctrl(msmt_path, fingerprintdb, netinfodb): msmt = load_measurement(msmt_path=msmt_path) assert isinstance(msmt, WebConnectivity) @@ -203,32 +154,41 @@ def make_experiment_result_from_wc_ctrl(msmt_path, fingerprintdb, netinfodb): return [] -def test_website_experiment_result_blocked(fingerprintdb, netinfodb, measurements): - pytest.skip("TODO(arturo): implement this with the new analysis") - experiment_results = list( - make_experiment_result_from_wc_ctrl( - measurements["20220627030703.592775_IR_webconnectivity_80e199b3c572f8d3"], - fingerprintdb, - netinfodb, - ) +def make_web_er_from_msmt(msmt, fingerprintdb, netinfodb) -> Tuple[ + List[MeasurementExperimentResult], + List[WebAnalysis], + List[WebObservation], + List[WebControlObservation], +]: + assert 
isinstance(msmt, WebConnectivity) + web_observations, web_control_observations = measurement_to_observations( + msmt, netinfodb=netinfodb + ) + assert isinstance(msmt.input, str) + web_ground_truth_db = WebGroundTruthDB() + web_ground_truth_db.build_from_rows( + rows=iter_ground_truths_from_web_control( + web_control_observations=web_control_observations, + netinfodb=netinfodb, + ), ) - assert len(experiment_results) == 1 - assert experiment_results[0].anomaly == True - -def test_website_experiment_result_ok(fingerprintdb, netinfodb, measurements): - pytest.skip("TODO(arturo): implement this with the new analysis") - experiment_results = list( - make_experiment_result_from_wc_ctrl( - measurements["20220608132401.787399_AM_webconnectivity_2285fc373f62729e"], - fingerprintdb, - netinfodb, + web_ground_truths = web_ground_truth_db.lookup_by_web_obs(web_obs=web_observations) + web_analysis = list( + make_web_analysis( + web_observations=web_observations, + web_ground_truths=web_ground_truths, + body_db=BodyDB(db=None), # type: ignore + fingerprintdb=fingerprintdb, ) ) - assert len(experiment_results) == 4 - assert experiment_results[0].anomaly == False - for er in experiment_results: - assert er.ok_score > 0.5 + + return ( + list(make_website_experiment_results(web_analysis)), + web_analysis, + web_observations, + web_control_observations, + ) def test_website_web_analysis_blocked(fingerprintdb, netinfodb, measurements, datadir): @@ -237,74 +197,266 @@ def test_website_web_analysis_blocked(fingerprintdb, netinfodb, measurements, da "20221110235922.335062_IR_webconnectivity_e4114ee32b8dbf74" ], ) - web_obs: List[WebObservation] = measurement_to_observations( - msmt, netinfodb=netinfodb - )[0] - FASTLY_IPS = [ - "151.101.1.140", - "151.101.129.140", - "151.101.193.140", - "151.101.65.140", - "199.232.253.140", - "2a04:4e42:400::396", - "2a04:4e42::396", - "2a04:4e42:fd3::396", + er, web_analysis, web_obs, web_ctrl_obs = make_web_er_from_msmt( + msmt, 
fingerprintdb=fingerprintdb, netinfodb=netinfodb
+    )
+    assert len(web_analysis) == len(web_obs)
+    assert len(web_ctrl_obs) == 5
+
+    assert len(er) == 1
+    assert er[0].loni_blocked_values == [1.0]
+    assert er[0].loni_ok_value == 0
+    assert er[0].loni_blocked_keys[0].startswith("dns.")
+
+
+def test_website_web_analysis_plaintext_ok(fingerprintdb, netinfodb, measurements):
+    msmt = load_measurement(
+        msmt_path=measurements[
+            "20220608132401.787399_AM_webconnectivity_2285fc373f62729e"
+        ],
+    )
+    er, web_analysis, web_obs, web_ctrl_obs = make_web_er_from_msmt(
+        msmt, fingerprintdb=fingerprintdb, netinfodb=netinfodb
+    )
+    assert len(web_analysis) == len(web_obs)
+    assert len(web_ctrl_obs) == 2
+
+    assert len(er) == 1
+    ok_dict = dict(zip(er[0].loni_ok_keys, er[0].loni_ok_values))
+    assert ok_dict["dns"] > 0.8
+    assert ok_dict["tcp"] > 0.8
+    assert ok_dict["tls"] > 0.8
+    assert ok_dict["http"] > 0.8
+
+    assert er[0].loni_ok_value > 0.8
+
+
+def test_website_web_analysis_blocked_2(fingerprintdb, netinfodb, measurements):
+    msmt = load_measurement(
+        msmt_path=measurements[
+            "20220627030703.592775_IR_webconnectivity_80e199b3c572f8d3"
+        ],
+    )
+    er, web_analysis, web_obs, web_ctrl_obs = make_web_er_from_msmt(
+        msmt, fingerprintdb=fingerprintdb, netinfodb=netinfodb
+    )
+    assert len(web_analysis) == len(web_obs)
+    assert len(web_ctrl_obs) == 6
+
+    assert len(er) == 1
+    assert er[0].loni_blocked_values == [1.0]
+    assert er[0].loni_ok_value == 0
+    assert er[0].loni_blocked_keys[0].startswith("dns.")
+
+
+def test_website_dns_blocking_event(fingerprintdb, netinfodb, measurements):
+    msmt_path = measurements[
+        "20220627134426.194308_DE_webconnectivity_15675b61ec62e268"
+    ]
+    msmt = load_measurement(
+        msmt_path=msmt_path,
+    )
+    er, web_analysis, web_obs, web_ctrl_obs = make_web_er_from_msmt(
+        msmt, fingerprintdb=fingerprintdb, netinfodb=netinfodb
+    )
+    assert len(web_analysis) == len(web_obs)
+    assert len(web_ctrl_obs) == 6
+
+    assert len(er) == 1
+    assert er[0].loni_ok_value == 0
+    assert er[0].loni_blocked_values[0] > 0.7
+    assert er[0].loni_blocked_keys[0].startswith("dns.")
+
+
+def test_website_dns_blocking_event_2(fingerprintdb, netinfodb, measurements):
+    msmt_path = measurements[
+        "20220627125833.737451_FR_webconnectivity_bca9ad9d3371919a"
     ]
-    # Equivalent to the following call, but done manually
-    # relevant_gts = web_ground_truth_db.lookup_by_web_obs(web_obs=web_obs)
-    relevant_gts = []
-    for is_trusted in [True, False]:
-        for ip in FASTLY_IPS:
-            relevant_gts.append(
-                WebGroundTruth(
-                    vp_asn=0,
-                    vp_cc="ZZ",
-                    # TODO FIXME in lookup
-                    is_trusted_vp=is_trusted,
-                    hostname="www.reddit.com",
-                    ip=ip,
-                    # TODO FIXME in webgroundtruth lookup
-                    port=443,
-                    dns_failure=None,
-                    # TODO fixme in lookup
-                    dns_success=True,
-                    tcp_failure=None,
-                    # TODO fixme in lookup
-                    tcp_success=True,
-                    tls_failure=None,
-                    tls_success=True,
-                    tls_is_certificate_valid=True,
-                    http_request_url=None,
-                    http_failure=None,
-                    http_success=None,
-                    # FIXME in lookup function "ZZ",
-                    http_response_body_length=131072 - random.randint(0, 100),
-                    # TODO FIXME in lookup function
-                    timestamp=datetime(
-                        2022,
-                        11,
-                        10,
-                        0,
-                        0,
-                    ),
-                    count=2,
-                    ip_asn=54113,
-                    # TODO FIXME in lookup function
-                    ip_as_org_name="Fastly, Inc.",
-                ),
-            )
-    # XXX currently not working
-    body_db = BodyDB(db=None)  # type: ignore
+    msmt = load_measurement(
+        msmt_path=msmt_path,
+    )
+    er, web_analysis, web_obs, web_ctrl_obs = make_web_er_from_msmt(
+        msmt, fingerprintdb=fingerprintdb, netinfodb=netinfodb
+    )
+    assert len(web_analysis) == len(web_obs)
+    assert len(web_ctrl_obs) == 5
-    web_analysis = list(
-        make_web_analysis(
-            web_observations=web_obs,
-            body_db=body_db,
-            web_ground_truths=relevant_gts,
-            fingerprintdb=fingerprintdb,
-        )
+    assert len(er) == 1
+    assert er[0].loni_ok_value == 0
+    assert er[0].loni_blocked_values[0] > 0.5
+    assert er[0].loni_blocked_keys[0].startswith("dns.")
+
+
+def test_website_dns_ok(fingerprintdb, netinfodb, measurements):
+    msmt_path = measurements[
+        "20220625234824.235023_HU_webconnectivity_3435a5df0e743d39"
+    ]
+    msmt = load_measurement(
+        msmt_path=msmt_path,
+    )
+    er, web_analysis, web_obs, web_ctrl_obs = make_web_er_from_msmt(
+        msmt, fingerprintdb=fingerprintdb, netinfodb=netinfodb
+    )
+    # assert len(web_analysis) == len(web_obs)
+    assert len(web_ctrl_obs) == 5
+
+    assert len(er) == 1
+    assert er[0].loni_ok_value == 1
+
+
+# Check this for wc 0.5 overwriting tls analsysis
+# 20231031000227.813597_MY_webconnectivity_2f0b80761373aa7e
+def test_website_experiment_results(measurements, netinfodb, fingerprintdb):
+    msmt = load_measurement(
+        msmt_path=measurements[
+            "20221101055235.141387_RU_webconnectivity_046ce024dd76b564"
+        ]
+    )
+    er, web_analysis, web_obs, web_ctrl_obs = make_web_er_from_msmt(
+        msmt, fingerprintdb=fingerprintdb, netinfodb=netinfodb
+    )
+    assert len(web_analysis) == len(web_obs)
+    assert len(web_ctrl_obs) == 3
+
+    assert len(er) == 1
+    assert er[0].loni_ok_value < 0.2
+    ok_dict = dict(zip(er[0].loni_ok_keys, er[0].loni_ok_values))
+    assert ok_dict["tcp"] == 0
+
+    blocked_dict = dict(zip(er[0].loni_blocked_keys, er[0].loni_blocked_values))
+    assert blocked_dict["tcp.timeout"] > 0.4
+
+
+def test_website_web_analysis_down(measurements, netinfodb, fingerprintdb):
+    msmt = load_measurement(
+        msmt_path=measurements[
+            "20240420235427.477327_US_webconnectivity_9b3cac038dc2ba22"
+        ]
+    )
+    er, web_analysis, web_obs, web_ctrl_obs = make_web_er_from_msmt(
+        msmt, fingerprintdb=fingerprintdb, netinfodb=netinfodb
     )
     assert len(web_analysis) == len(web_obs)
-    # for wa in web_analysis:
-    #     print(wa.measurement_uid)
-    #     print_nice_vertical(wa)
+    assert len(web_ctrl_obs) == 3
+
+    assert len(er) == 1
+    assert er[0].loni_ok_value < 0.2
+    ok_dict = dict(zip(er[0].loni_ok_keys, er[0].loni_ok_values))
+    assert ok_dict["tcp"] == 0
+
+    down_dict = dict(zip(er[0].loni_down_keys, er[0].loni_down_values))
+
+    blocked_dict = dict(zip(er[0].loni_blocked_keys, er[0].loni_blocked_values))
+
+    assert sum(down_dict.values()) > sum(blocked_dict.values())
+    assert down_dict["tcp.timeout"] > 0.5
+
+
+def test_website_web_analysis_blocked_connect_reset(
+    measurements, netinfodb, fingerprintdb
+):
+    msmt_path = measurements[
+        "20240302000048.790188_RU_webconnectivity_e7ffd3bc0f525eb7"
+    ]
+    msmt = load_measurement(msmt_path=msmt_path)
+    er, web_analysis, web_obs, web_ctrl_obs = make_web_er_from_msmt(
+        msmt, fingerprintdb=fingerprintdb, netinfodb=netinfodb
+    )
+    # assert len(web_analysis) == len(web_obs)
+    assert len(web_ctrl_obs) == 4
+
+    assert len(er) == 1
+    # TODO(art): this should be changed
+    # assert er[0].loni_ok_value == 0
+    assert er[0].loni_ok_value < 0.2
+
+    ok_dict = dict(zip(er[0].loni_ok_keys, er[0].loni_ok_values))
+    assert ok_dict["tls"] == 0
+
+    down_dict = dict(zip(er[0].loni_down_keys, er[0].loni_down_values))
+    blocked_dict = dict(zip(er[0].loni_blocked_keys, er[0].loni_blocked_values))
+
+    assert sum(down_dict.values()) < sum(blocked_dict.values())
+    assert blocked_dict["tls.connection_reset"] > 0.5
+
+
+def print_debug_er(er):
+    for idx, e in enumerate(er):
+        print(f"\n# ER#{idx}")
+        for idx, transcript in enumerate(e.analysis_transcript_list):
+            print(f"## Analysis #{idx}")
+            print("\n".join(transcript))
+    pprint(er)
+
+
+def test_website_web_analysis_nxdomain_down(measurements, netinfodb, fingerprintdb):
+    msmt_path = measurements[
+        "20240302000050.000654_SN_webconnectivity_fe4221088fbdcb0a"
+    ]
+    msmt = load_measurement(msmt_path=msmt_path)
+    er, web_analysis, web_obs, web_ctrl_obs = make_web_er_from_msmt(
+        msmt, fingerprintdb=fingerprintdb, netinfodb=netinfodb
+    )
+    assert len(web_analysis) == len(web_obs)
+    assert len(web_ctrl_obs) == 2
+
+    assert len(er) == 1
+    assert er[0].loni_ok_value < 0.2
+
+    ok_dict = dict(zip(er[0].loni_ok_keys, er[0].loni_ok_values))
+    assert ok_dict["dns"] == 0
+
+    down_dict = dict(zip(er[0].loni_down_keys, er[0].loni_down_values))
+    blocked_dict = dict(zip(er[0].loni_blocked_keys, er[0].loni_blocked_values))
+
+    assert sum(down_dict.values()) > sum(blocked_dict.values())
+    assert down_dict["dns.nxdomain"] > 0.7
+
+
+def test_website_web_analysis_nxdomain_blocked(measurements, netinfodb, fingerprintdb):
+    msmt_path = measurements[
+        "20240302000305.316064_EG_webconnectivity_397bca9091b07444"
+    ]
+    msmt = load_measurement(msmt_path=msmt_path)
+    er, web_analysis, web_obs, web_ctrl_obs = make_web_er_from_msmt(
+        msmt, fingerprintdb=fingerprintdb, netinfodb=netinfodb
+    )
+    assert len(web_analysis) == len(web_obs)
+    assert len(web_ctrl_obs) == 7
+
+    assert len(er) == 1
+    assert er[0].loni_ok_value < 0.2
+
+    ok_dict = dict(zip(er[0].loni_ok_keys, er[0].loni_ok_values))
+    assert ok_dict["dns"] == 0
+
+    down_dict = dict(zip(er[0].loni_down_keys, er[0].loni_down_values))
+    blocked_dict = dict(zip(er[0].loni_blocked_keys, er[0].loni_blocked_values))
+
+    assert sum(down_dict.values()) < sum(blocked_dict.values())
+    assert blocked_dict["dns.nxdomain"] > 0.7
+
+
+def test_website_web_analysis_blocked_inconsistent_country(
+    measurements, netinfodb, fingerprintdb
+):
+    msmt_path = measurements[
+        "20240309112858.009725_SE_webconnectivity_dce757ef4ec9b6c8"
+    ]
+    msmt = load_measurement(msmt_path=msmt_path)
+    er, web_analysis, web_obs, web_ctrl_obs = make_web_er_from_msmt(
+        msmt, fingerprintdb=fingerprintdb, netinfodb=netinfodb
+    )
+    assert len(web_analysis) == len(web_obs)
+    assert len(web_ctrl_obs) == 3
+
+    assert len(er) == 1
+    assert er[0].loni_ok_value < 0.2
+
+    ok_dict = dict(zip(er[0].loni_ok_keys, er[0].loni_ok_values))
+    assert ok_dict["dns"] == 0
+
+    down_dict = dict(zip(er[0].loni_down_keys, er[0].loni_down_values))
+    blocked_dict = dict(zip(er[0].loni_blocked_keys, er[0].loni_blocked_values))
+
+    assert sum(down_dict.values()) > sum(blocked_dict.values())
diff --git a/oonipipeline/tests/test_cli.py b/oonipipeline/tests/test_cli.py
new file mode 100644
index 00000000..9e2f9959
--- /dev/null
+++ b/oonipipeline/tests/test_cli.py
@@ -0,0 +1,140 @@
+import asyncio
+from multiprocessing import Process
+from pathlib import Path
+import time
+
+from oonipipeline.cli.commands import cli
+
+
+def wait_for_mutations(db, table_name):
+    while True:
+        res = db.execute(
+            f"SELECT * FROM system.mutations WHERE is_done=0 AND table='{table_name}';"
+        )
+        if len(res) == 0:  # type: ignore
+            break
+        time.sleep(1)
+
+
+def test_full_workflow(
+    db,
+    cli_runner,
+    fingerprintdb,
+    netinfodb,
+    datadir,
+    tmp_path: Path,
+    temporal_dev_server,
+):
+    result = cli_runner.invoke(
+        cli,
+        [
+            "mkobs",
+            "--probe-cc",
+            "BA",
+            "--start-day",
+            "2022-10-20",
+            "--end-day",
+            "2022-10-21",
+            "--test-name",
+            "web_connectivity",
+            "--create-tables",
+            "--data-dir",
+            datadir,
+            "--clickhouse",
+            db.clickhouse_url,
+            # "--archives-dir",
+            # tmp_path.absolute(),
+        ],
+    )
+    assert result.exit_code == 0
+    # assert len(list(tmp_path.glob("*.warc.gz"))) == 1
+    res = db.execute(
+        "SELECT bucket_date, COUNT(DISTINCT(measurement_uid)) FROM obs_web WHERE probe_cc = 'BA' GROUP BY bucket_date"
+    )
+    bucket_dict = dict(res)
+    assert "2022-10-20" in bucket_dict, bucket_dict
+    assert bucket_dict["2022-10-20"] == 200, bucket_dict
+    obs_count = bucket_dict["2022-10-20"]
+
+    result = cli_runner.invoke(
+        cli,
+        [
+            "mkobs",
+            "--probe-cc",
+            "BA",
+            "--start-day",
+            "2022-10-20",
+            "--end-day",
+            "2022-10-21",
+            "--test-name",
+            "web_connectivity",
+            "--create-tables",
+            "--data-dir",
+            datadir,
+            "--clickhouse",
+            db.clickhouse_url,
+        ],
+    )
+    assert result.exit_code == 0
+
+    # Wait for the mutation to finish running
+    wait_for_mutations(db, "obs_web")
+    res = db.execute(
+        "SELECT bucket_date, COUNT(DISTINCT(measurement_uid)) FROM obs_web WHERE probe_cc = 'BA' GROUP BY bucket_date"
+    )
+    bucket_dict = dict(res)
+    assert "2022-10-20" in bucket_dict, bucket_dict
+    # By re-running it against the same date, we should still get the same observation count
+    assert bucket_dict["2022-10-20"] == obs_count, bucket_dict
+
+    result = cli_runner.invoke(
+        cli,
+        [
+            "mkgt",
+            "--start-day",
+            "2022-10-20",
+            "--end-day",
+            "2022-10-21",
+            "--data-dir",
+            datadir,
+            "--clickhouse",
+            db.clickhouse_url,
+        ],
+    )
+    assert result.exit_code == 0
+
+    # result = cli_runner.invoke(
+    #     cli,
+    #     [
+    #         "fphunt",
+    #         "--data-dir",
+    #         datadir,
+    #         "--archives-dir",
+    #         tmp_path.absolute(),
+    #     ],
+    # )
+    # assert result.exit_code == 0
+
+    result = cli_runner.invoke(
+        cli,
+        [
+            "mkanalysis",
+            "--probe-cc",
+            "BA",
+            "--start-day",
+            "2022-10-20",
+            "--end-day",
+            "2022-10-21",
+            "--test-name",
+            "web_connectivity",
+            "--data-dir",
+            datadir,
+            "--clickhouse",
+            db.clickhouse_url,
+        ],
+    )
+    assert result.exit_code == 0
+    res = db.execute(
+        "SELECT COUNT(DISTINCT(measurement_uid)) FROM measurement_experiment_result WHERE measurement_uid LIKE '20221020%' AND location_network_cc = 'BA'"
+    )
+    assert res[0][0] == 200  # type: ignore
diff --git a/oonipipeline/tests/test_ctrl.py b/oonipipeline/tests/test_ctrl.py
index 1a7f384a..f03418bc 100644
--- a/oonipipeline/tests/test_ctrl.py
+++ b/oonipipeline/tests/test_ctrl.py
@@ -8,7 +8,9 @@
     WebGroundTruthDB,
     iter_web_ground_truths,
 )
-from oonipipeline.workflows.observations import make_observations_for_file_entry_batch
+from oonipipeline.temporal.activities.observations import (
+    make_observations_for_file_entry_batch,
+)
 
 
 def test_web_ground_truth_from_clickhouse(db, datadir, netinfodb, tmp_path):
diff --git a/oonipipeline/tests/test_db.py b/oonipipeline/tests/test_db.py
index a1b025a5..e07ac08b 100644
--- a/oonipipeline/tests/test_db.py
+++ b/oonipipeline/tests/test_db.py
@@ -1,8 +1,71 @@
+from dataclasses import dataclass
+from typing import Dict, List, Optional, Tuple
 from unittest.mock import MagicMock, call
 
 from clickhouse_driver import Client
 
 from oonipipeline.db.connections import ClickhouseConnection
+from oonipipeline.db.create_tables import (
+    get_table_column_diff,
+    get_column_map_from_create_query,
+    typing_to_clickhouse,
+)
+
+
+def test_create_tables():
+    col_map = get_column_map_from_create_query(
+        """
+    CREATE TABLE IF NOT EXISTS my_table
+    (
+        col_int Int32,
+        col_str String,
+        col_dict String,
+        col_opt_list_str Nullable(Array(String)),
+        col_opt_tup_str_str Nullable(Tuple(String, String)),
+        col_opt_list_tup_str_byt Nullable(Array(Array(String))),
+        col_dict_str_str Map(String, String)
+    )
+    ENGINE = MergeTree()
+    PRIMARY KEY (col_int)
+"""
+    )
+    assert col_map["col_int"] == typing_to_clickhouse(int)
+    assert col_map["col_str"] == typing_to_clickhouse(str)
+    assert col_map["col_dict"] == typing_to_clickhouse(dict)
+    assert col_map["col_opt_list_str"] == typing_to_clickhouse(Optional[List[str]])
+    assert col_map["col_opt_tup_str_str"] == typing_to_clickhouse(
+        Optional[Tuple[str, str]]
+    )
+    assert col_map["col_opt_list_tup_str_byt"] == typing_to_clickhouse(
+        Optional[List[Tuple[str, bytes]]]
+    )
+    assert col_map["col_dict_str_str"] == typing_to_clickhouse(Dict[str, str])
+
+    @dataclass
+    class SampleTable:
+        __table_name__ = "my_table"
+
+        my_col_int: int
+        my_new_col_str: str
+
+    db = MagicMock()
+    db.execute.return_value = [
+        [
+            """
+    CREATE TABLE IF NOT EXISTS my_table
+    (
+        my_col_int Int32,
+    )
+    ENGINE = MergeTree()
+    PRIMARY KEY (my_col_int)"""
+        ]
+    ]
+    diff = get_table_column_diff(db=db, base_class=SampleTable)
+    assert len(diff) == 1
+    assert diff[0].table_name == "my_table"
+    assert diff[0].column_name == "my_new_col_str"
+    assert diff[0].expected_type == "String"
+    assert diff[0].actual_type == None
 
 
 def test_flush_rows(db):
diff --git a/oonipipeline/tests/test_experiment_results.py b/oonipipeline/tests/test_experiment_results.py
deleted file mode 100644
index ffa1a551..00000000
--- a/oonipipeline/tests/test_experiment_results.py
+++ /dev/null
@@ -1,71 +0,0 @@
-from pprint import pprint
-
-from oonidata.models.observations import print_nice, print_nice_vertical
-from oonidata.dataclient import load_measurement
-
-from oonipipeline.analysis.control import (
-    BodyDB,
-    WebGroundTruthDB,
-    iter_ground_truths_from_web_control,
-)
-from oonipipeline.analysis.web_analysis import make_web_analysis
-from oonipipeline.analysis.website_experiment_results import (
-    make_website_experiment_results,
-)
-from oonipipeline.transforms.observations import measurement_to_observations
-
-
-# Check this for wc 0.5 overwriting tls analsysis
-# 20231031000227.813597_MY_webconnectivity_2f0b80761373aa7e
-def test_website_experiment_results(measurements, netinfodb, fingerprintdb):
-    msmt = load_measurement(
-        msmt_path=measurements[
-            "20221101055235.141387_RU_webconnectivity_046ce024dd76b564"
-        ]
-    )
-    web_observations, web_control_observations = measurement_to_observations(
-        msmt, netinfodb=netinfodb
-    )
-    assert isinstance(msmt.input, str)
-    web_ground_truth_db = WebGroundTruthDB()
-    web_ground_truth_db.build_from_rows(
-        rows=iter_ground_truths_from_web_control(
-            web_control_observations=web_control_observations,
-            netinfodb=netinfodb,
-        ),
-    )
-
-    web_ground_truths = web_ground_truth_db.lookup_by_web_obs(web_obs=web_observations)
-    web_analysis = list(
-        make_web_analysis(
-            web_observations=web_observations,
-            web_ground_truths=web_ground_truths,
-            body_db=BodyDB(db=None),  # type: ignore
-            fingerprintdb=fingerprintdb,
-        )
-    )
-
-    # TODO(arturo): there is currently an edge case here which is when we get an
-    # IPv6 answer, since we are ignoring them in the analysis, we will have N
-    # less analysis where N is the number of IPv6 addresses.
-    assert len(web_analysis) == len(web_observations)
-    # for wa in web_analysis:
-    #     print_nice_vertical(wa)
-
-    website_er = list(make_website_experiment_results(web_analysis))
-    assert len(website_er) == 1
-
-    wer = website_er[0]
-    analysis_transcript_list = wer.analysis_transcript_list
-
-    assert (
-        sum(wer.loni_blocked_values) + sum(wer.loni_down_values) + wer.loni_ok_value
-        == 1
-    )
-    assert wer.anomaly == True
-
-    # wer.analysis_transcript_list = None
-    # print_nice_vertical(wer)
-    # for loni in wer.loni_list:
-    #     pprint(loni.to_dict())
-    # print(analysis_transcript_list)
diff --git a/oonipipeline/tests/test_scoring.py b/oonipipeline/tests/test_scoring.py
deleted file mode 100644
index 790fd2c8..00000000
--- a/oonipipeline/tests/test_scoring.py
+++ /dev/null
@@ -1,55 +0,0 @@
-from unittest.mock import MagicMock
-
-import pytest
-
-from oonidata.models.experiment_result import print_nice_er
-from oonidata.dataclient import load_measurement
-
-from oonipipeline.analysis.control import (
-    WebGroundTruthDB,
-    iter_ground_truths_from_web_control,
-)
-from oonipipeline.transforms.observations import measurement_to_observations
-
-
-def test_tcp_scoring(measurements, netinfodb, fingerprintdb):
-    pytest.skip("TODO(arturo): implement this with the new analysis")
-    msmt = load_measurement(
-        msmt_path=measurements[
-            "20221101055235.141387_RU_webconnectivity_046ce024dd76b564"
-        ]
-    )
-    web_observations, web_control_observations = measurement_to_observations(
-        msmt, netinfodb=netinfodb
-    )
-    assert isinstance(msmt.input, str)
-    web_ground_truth_db = WebGroundTruthDB()
-    web_ground_truth_db.build_from_rows(
-        rows=iter_ground_truths_from_web_control(
-            web_control_observations=web_control_observations,
-            netinfodb=netinfodb,
-        ),
-    )
-    gt = web_ground_truth_db.lookup(
-        probe_cc="RU", probe_asn=8402, ip_ports=[("104.244.42.1", 443)]
-    )
-    assert len(gt) == 1
-    assert gt[0].tcp_success == 1
-
-    body_db = MagicMock()
-    body_db.lookup = MagicMock()
-    body_db.lookup.return_value = []
-
-    web_ground_truths = web_ground_truth_db.lookup_by_web_obs(web_obs=web_observations)
-    assert len(web_ground_truths) == 3
-    er = make_website_experiment_result(
-        web_observations=web_observations,
-        web_ground_truths=web_ground_truths,
-        body_db=body_db,
-        fingerprintdb=fingerprintdb,
-    )
-    all_er = list(er)
-
-    tcp_er = list(filter(lambda er: er.outcome_category == "tcp", all_er))
-    assert len(tcp_er) == 1
-    assert tcp_er[0].blocked_score > 0.6
diff --git a/oonipipeline/tests/test_workflows.py b/oonipipeline/tests/test_workflows.py
index 4305c993..fea940e7 100644
--- a/oonipipeline/tests/test_workflows.py
+++ b/oonipipeline/tests/test_workflows.py
@@ -1,9 +1,44 @@
-import asyncio
-from multiprocessing import Process
+from datetime import date, datetime, timedelta, timezone
+import gzip
 from pathlib import Path
+import sqlite3
+from typing import List, Tuple
+from unittest.mock import MagicMock
 import time
-from oonipipeline.cli.commands import cli
+import pytest
+
+from oonidata.dataclient import stream_jsonl, load_measurement
+from oonidata.models.nettests.dnscheck import DNSCheck
+from oonidata.models.nettests.web_connectivity import WebConnectivity
+from oonidata.models.nettests.http_invalid_request_line import HTTPInvalidRequestLine
+from oonidata.models.observations import HTTPMiddleboxObservation
+
+from oonipipeline.temporal.activities.common import get_obs_count_by_cc, ObsCountParams
+from oonipipeline.temporal.activities.observations import (
+    make_observations_for_file_entry_batch,
+)
+from oonipipeline.transforms.measurement_transformer import MeasurementTransformer
+from oonipipeline.transforms.observations import measurement_to_observations
+from oonipipeline.temporal.activities.analysis import (
+    MakeAnalysisParams,
+    make_analysis_in_a_day,
+    make_cc_batches,
+)
+from oonipipeline.temporal.common import (
+    get_prev_range,
+    maybe_delete_prev_range,
+)
+from oonipipeline.temporal.activities.ground_truths import (
+    MakeGroundTruthsParams,
+    make_ground_truths_in_day,
+)
+from oonipipeline.temporal.activities.observations import (
+    write_observations_to_db,
+)
+
+# from oonipipeline.workflows.response_archiver import ResponseArchiver
+# from oonipipeline.workflows.fingerprint_hunter import fingerprint_hunter
 
 
 def wait_for_mutations(db, table_name):
@@ -16,124 +51,303 @@
         time.sleep(1)
 
 
-def test_full_workflow(
-    db,
-    cli_runner,
-    fingerprintdb,
-    netinfodb,
-    datadir,
-    tmp_path: Path,
-    temporal_dev_server,
-):
-    result = cli_runner.invoke(
-        cli,
-        [
-            "mkobs",
-            "--probe-cc",
-            "BA",
-            "--start-day",
-            "2022-10-20",
-            "--end-day",
-            "2022-10-21",
-            "--test-name",
-            "web_connectivity",
-            "--create-tables",
-            "--data-dir",
-            datadir,
-            "--clickhouse",
-            db.clickhouse_url,
-            # "--archives-dir",
-            # tmp_path.absolute(),
-        ],
-    )
-    assert result.exit_code == 0
-    # assert len(list(tmp_path.glob("*.warc.gz"))) == 1
-    res = db.execute(
-        "SELECT COUNT(DISTINCT(measurement_uid)) FROM obs_web WHERE bucket_date = '2022-10-20' AND probe_cc = 'BA'"
-    )
-    assert res[0][0] == 200  # type: ignore
-    res = db.execute(
-        "SELECT COUNT() FROM obs_web WHERE bucket_date = '2022-10-20' AND probe_cc = 'BA'"
-    )
-    obs_count = res[0][0]  # type: ignore
-
-    result = cli_runner.invoke(
-        cli,
-        [
-            "mkobs",
-            "--probe-cc",
-            "BA",
-            "--start-day",
-            "2022-10-20",
-            "--end-day",
-            "2022-10-21",
-            "--test-name",
-            "web_connectivity",
-            "--create-tables",
-            "--data-dir",
-            datadir,
-            "--clickhouse",
-            db.clickhouse_url,
-        ],
-    )
-    assert result.exit_code == 0
-
-    # Wait for the mutation to finish running
-    wait_for_mutations(db, "obs_web")
-    res = db.execute(
-        "SELECT COUNT() FROM obs_web WHERE bucket_date = '2022-10-20' AND probe_cc = 'BA'"
-    )
-    # By re-running it against the same date, we should still get the same observation count
-    assert res[0][0] == obs_count  # type: ignore
-
-    result = cli_runner.invoke(
-        cli,
-        [
-            "mkgt",
-            "--start-day",
-            "2022-10-20",
-            "--end-day",
-            "2022-10-21",
-            "--data-dir",
-            datadir,
-            "--clickhouse",
-            "clickhouse://localhost/testing_oonidata",
-        ],
-    )
-    assert result.exit_code == 0
-
-    # result = cli_runner.invoke(
-    #     cli,
-    #     [
-    #         "fphunt",
-    #         "--data-dir",
-    #         datadir,
-    #         "--archives-dir",
-    #         tmp_path.absolute(),
-    #     ],
-    # )
-    # assert result.exit_code == 0
-
-    result = cli_runner.invoke(
-        cli,
-        [
-            "mkanalysis",
-            "--probe-cc",
-            "BA",
-            "--start-day",
-            "2022-10-20",
-            "--end-day",
-            "2022-10-21",
-            "--test-name",
-            "web_connectivity",
-            "--data-dir",
-            datadir,
-            "--clickhouse",
-            db.clickhouse_url,
-        ],
-    )
-    assert result.exit_code == 0
-    res = db.execute(
-        "SELECT COUNT(DISTINCT(measurement_uid)) FROM measurement_experiment_result WHERE measurement_uid LIKE '20221020%' AND location_network_cc = 'BA'"
-    )
-    assert res[0][0] == 200  # type: ignore
+def test_get_prev_range(db):
+    db.execute("DROP TABLE IF EXISTS test_range")
+    db.execute(
+        """CREATE TABLE test_range (
+            created_at DateTime64(3, 'UTC'),
+            bucket_date String,
+            test_name String,
+            probe_cc String
+        )
+        ENGINE = MergeTree
+        ORDER BY (bucket_date, created_at)
+        """
+    )
+    bucket_date = "2000-01-01"
+    test_name = "web_connectivity"
+    probe_cc = "IT"
+    min_time = datetime(2000, 1, 1, 23, 42, 00)
+    rows = [(min_time, bucket_date, test_name, probe_cc)]
+    for i in range(200):
+        rows.append((min_time + timedelta(seconds=i), bucket_date, test_name, probe_cc))
+    db.execute(
+        "INSERT INTO test_range (created_at, bucket_date, test_name, probe_cc) VALUES",
+        rows,
+    )
+    prev_range = get_prev_range(
+        db,
+        "test_range",
+        test_name=[test_name],
+        bucket_date=bucket_date,
+        probe_cc=[probe_cc],
+    )
+    assert prev_range.min_created_at and prev_range.max_created_at
+    assert prev_range.min_created_at == (min_time - timedelta(seconds=1))
+    assert prev_range.max_created_at == (rows[-1][0] + timedelta(seconds=1))
+    db.execute("TRUNCATE TABLE test_range")
+
+    bucket_date = "2000-03-01"
+    test_name = "web_connectivity"
+    probe_cc = "IT"
+    min_time = datetime(2000, 1, 1, 23, 42, 00)
+    rows: List[Tuple[datetime, str, str, str]] = []
+    for i in range(10):
+        rows.append(
+            (min_time + timedelta(seconds=i), "2000-02-01", test_name, probe_cc)
+        )
+    min_time = rows[-1][0]
+    for i in range(10):
+        rows.append((min_time + timedelta(seconds=i), bucket_date, test_name, probe_cc))
+
+    db.execute(
+        "INSERT INTO test_range (created_at, bucket_date, test_name, probe_cc) VALUES",
+        rows,
+    )
+    prev_range = get_prev_range(
+        db,
+        "test_range",
+        test_name=[test_name],
+        bucket_date=bucket_date,
+        probe_cc=[probe_cc],
+    )
+    assert prev_range.min_created_at and prev_range.max_created_at
+    assert prev_range.min_created_at == (min_time - timedelta(seconds=1))
+    assert prev_range.max_created_at == (rows[-1][0] + timedelta(seconds=1))
+
+    maybe_delete_prev_range(
+        db=db,
+        prev_range=prev_range,
+    )
+    wait_for_mutations(db, "test_range")
+    res = db.execute("SELECT COUNT() FROM test_range")
+    assert res[0][0] == 10
+    db.execute("DROP TABLE test_range")
+
+
+def test_make_cc_batches():
+    cc_batches = make_cc_batches(
+        cnt_by_cc={"IT": 100, "IR": 300, "US": 1000},
+        probe_cc=["IT", "IR", "US"],
+        parallelism=2,
+    )
+    assert len(cc_batches) == 2
+    # We expect the batches to be broken up into (IT, IR), ("US")
+    assert any([set(x) == set(["US"]) for x in cc_batches]) == True
+
+
+def test_make_file_entry_batch(datadir, db):
+    file_entry_batch = [
+        (
+            "ooni-data-eu-fra",
+            "raw/20231031/15/IR/webconnectivity/2023103115_IR_webconnectivity.n1.0.tar.gz",
+            "tar.gz",
+            4074306,
+        )
+    ]
+    obs_msmt_count = make_observations_for_file_entry_batch(
+        file_entry_batch, db.clickhouse_url, 100, datadir, "2023-10-31", ["IR"], False
+    )
+    assert obs_msmt_count == 453
+    make_ground_truths_in_day(
+        MakeGroundTruthsParams(
+            day=date(2023, 10, 31).strftime("%Y-%m-%d"),
+            clickhouse=db.clickhouse_url,
+            data_dir=datadir,
+        ),
+    )
+    analysis_res = make_analysis_in_a_day(
+        MakeAnalysisParams(
+            probe_cc=["IR"],
+            test_name=["webconnectivity"],
+            clickhouse=db.clickhouse_url,
+            data_dir=datadir,
+            fast_fail=False,
+            day=date(2023, 10, 31).strftime("%Y-%m-%d"),
+        ),
+    )
+    assert analysis_res["count"] == obs_msmt_count
+
+
+def test_write_observations(measurements, netinfodb, db):
+    msmt_uids = [
+        ("20210101190046.780850_US_webconnectivity_3296f126f79ca186", "2021-01-01"),
+        ("20210101181154.037019_CH_webconnectivity_68ce38aa9e3182c2", "2021-01-01"),
+        ("20231031032643.267235_GR_dnscheck_abcbfc460b9424b6", "2023-10-31"),
+        (
+            "20231101164541.763506_NP_httpinvalidrequestline_0cf676868fa36cc4",
+            "2023-10-31",
+        ),
+        (
+            "20231101164544.534107_BR_httpheaderfieldmanipulation_4caa0b0556f0b141",
+            "2023-10-31",
+        ),
+        ("20231101164649.235575_RU_tor_ccf7519bf683c022", "2023-10-31"),
+        (
+            "20230907000740.785053_BR_httpinvalidrequestline_bdfe6d70dcbda5e9",
+            "2023-09-07",
+        ),
+    ]
+    for msmt_uid, bucket_date in msmt_uids:
+        msmt = load_measurement(msmt_path=measurements[msmt_uid])
+        write_observations_to_db(msmt, netinfodb, db, bucket_date)
+    db.close()
+    cnt_by_cc = get_obs_count_by_cc(
+        ObsCountParams(
+            clickhouse_url=db.clickhouse_url,
+            start_day="2020-01-01",
+            end_day="2023-12-01",
+        )
+    )
+    assert cnt_by_cc["CH"] == 2
+    assert cnt_by_cc["GR"] == 4
+    assert cnt_by_cc["US"] == 3
+    assert cnt_by_cc["RU"] == 3
+
+
+def test_hirl_observations(measurements, netinfodb):
+    msmt = load_measurement(
+        msmt_path=measurements[
+            "20230907000740.785053_BR_httpinvalidrequestline_bdfe6d70dcbda5e9"
+        ]
+    )
+    assert isinstance(msmt, HTTPInvalidRequestLine)
+    middlebox_obs: List[HTTPMiddleboxObservation] = measurement_to_observations(
+        msmt, netinfodb=netinfodb
+    )[0]
+    assert isinstance(middlebox_obs[0], HTTPMiddleboxObservation)
+    assert middlebox_obs[0].hirl_success == True
+    assert middlebox_obs[0].hirl_sent_0 != middlebox_obs[0].hirl_received_0
+
+
+def test_insert_query_for_observation(measurements, netinfodb):
+    http_blocked = load_measurement(
+        msmt_path=measurements[
+            "20220608121828.356206_RU_webconnectivity_80e3fa60eb2cd026"
+        ]
+    )
+    assert isinstance(http_blocked, WebConnectivity)
+    mt = MeasurementTransformer(measurement=http_blocked, netinfodb=netinfodb)
+    all_web_obs = [
+        obs
+        for obs in mt.make_http_observations(
+            http_blocked.test_keys.requests,
+        )
+    ]
+    assert all_web_obs[-1].request_url == "http://proxy.org/"
+
+
+def test_web_connectivity_processor(netinfodb, measurements):
+    msmt = load_measurement(
+        msmt_path=measurements[
+            "20220627131742.081225_GB_webconnectivity_e1e2cf4db492b748"
+        ]
+    )
+    assert isinstance(msmt, WebConnectivity)
+
+    web_obs_list, web_ctrl_list = measurement_to_observations(msmt, netinfodb=netinfodb)
+    assert len(web_obs_list) == 3
+    assert len(web_ctrl_list) == 3
+
+
+def test_dnscheck_processor(measurements, netinfodb):
+    db = MagicMock()
+    db.write_row = MagicMock()
+
+    msmt = load_measurement(
+        msmt_path=measurements["20221013000000.517636_US_dnscheck_bfd6d991e70afa0e"]
+    )
+    assert isinstance(msmt, DNSCheck)
+    obs_list = measurement_to_observations(msmt=msmt, netinfodb=netinfodb)[0]
+    assert len(obs_list) == 20
+
+
+def test_full_processing(raw_measurements, netinfodb):
+    for msmt_path in raw_measurements.glob("*/*/*.jsonl.gz"):
+        with msmt_path.open("rb") as in_file:
+            for msmt_dict in stream_jsonl(in_file):
+                msmt = load_measurement(msmt_dict)
+                measurement_to_observations(
+                    msmt=msmt,
+                    netinfodb=netinfodb,
+                )
+
+
+def test_archive_http_transaction(measurements, tmpdir):
+    pytest.skip("TODO(art): fixme")
+    db = MagicMock()
+    db.write_row = MagicMock()
+
+    msmt = load_measurement(
+        msmt_path=measurements[
+            "20220627131742.081225_GB_webconnectivity_e1e2cf4db492b748"
+        ]
+    )
+    assert isinstance(msmt, WebConnectivity)
+    assert msmt.test_keys.requests
+    dst_dir = Path(tmpdir)
+    with ResponseArchiver(dst_dir=dst_dir) as archiver:
+        for http_transaction in msmt.test_keys.requests:
+            if not http_transaction.response or not http_transaction.request:
+                continue
+            request_url = http_transaction.request.url
+            status_code = http_transaction.response.code or 0
+            response_headers = http_transaction.response.headers_list_bytes or []
+            response_body = http_transaction.response.body_bytes
+            assert response_body
+            archiver.archive_http_transaction(
+                request_url=request_url,
+                status_code=status_code,
+                response_headers=response_headers,
+                response_body=response_body,
+                matched_fingerprints=[],
+            )
+
+    warc_files = list(dst_dir.glob("*.warc.gz"))
+    assert len(warc_files) == 1
+    with gzip.open(warc_files[0], "rb") as in_file:
+        assert b"Run OONI Probe to detect internet censorship" in in_file.read()
+
+    conn = sqlite3.connect(dst_dir / "graveyard.sqlite3")
+    res = conn.execute("SELECT COUNT() FROM oonibodies_archive")
+    assert res.fetchone()[0] == 1
+
+
+def test_fingerprint_hunter(fingerprintdb, measurements, tmpdir):
+    pytest.skip("TODO(art): fixme")
+    db = MagicMock()
+    db.write_rows = MagicMock()
+
+    archives_dir = Path(tmpdir)
+    http_blocked = load_measurement(
+        msmt_path=measurements[
+            "20220608121828.356206_RU_webconnectivity_80e3fa60eb2cd026"
+        ]
+    )
+    assert isinstance(http_blocked, WebConnectivity)
+    with ResponseArchiver(dst_dir=archives_dir) as response_archiver:
+        assert http_blocked.test_keys.requests
+        for http_transaction in http_blocked.test_keys.requests:
+            if not http_transaction.response or not http_transaction.request:
+                continue
+            request_url = http_transaction.request.url
+            status_code = http_transaction.response.code or 0
+            response_headers = http_transaction.response.headers_list_bytes or []
+            response_body = http_transaction.response.body_bytes
+            assert response_body
+            response_archiver.archive_http_transaction(
+                request_url=request_url,
+                status_code=status_code,
+                response_headers=response_headers,
+                response_body=response_body,
+                matched_fingerprints=[],
+            )
+
+    archive_path = list(archives_dir.glob("*.warc.gz"))[0]
+    detected_fps = list(
+        fingerprint_hunter(
+            fingerprintdb=fingerprintdb,
+            archive_path=archive_path,
+        )
+    )
+    assert len(detected_fps) == 1