readme.md (+5 −6)
@@ -98,7 +98,7 @@ A minimal modern data stack with working data pipelines in a single Docker container
 - [Superset][Superset-url] - data visualization and exploration platform
 - Sample data pipelines with [USGS Earthquake][USGSEarthquakeAPI-url] data and [European Gas Inventory][GIEAPI-url] levels.

-Explore the functionality of the tools by using the examples as-is; and to modify and expand on the exmples for further exploration.
+Explore the functionality of the tools by using the examples as-is; modify and expand on the examples for further exploration.

 This is a convenient starting point for exploration. The project is not a showcase of all or even the best functionality that each tool has to offer.

@@ -140,7 +140,6 @@ Have [Docker Desktop][DockerDesktop-url] installed.

 ### Installation

-In order to create the docker container you can do the following:
 <!--
 1. Clone this GIT repo:
 ```sh
@@ -206,15 +205,15 @@ Below we highlight the core configuration for these components. For (much) more

 ### Definition and Configuration

-The data pipelines are fully defined in a set of files. This includes the source definitions, schedules, dependencies, transformation logic, tests and documentation. (The reporting/dashboards in Superset are defined within Superset, but can be exported from there.)
+The data pipelines are fully defined in a set of files. This includes the source definitions, schedules, dependencies, transformation logic, tests and documentation. The reporting/dashboards in Superset are defined within Superset, but can be exported from there.

 These files are all found in the `/project/mimodast/` folder in the Docker container. It is best practice to capture this folder in a version control tool. Git is included in the Docker image.

 Some of the core files include:

-- `/project/mimodast/meltano.yml` - this contains items like the source specification, destination databaseand schedule.
+- `/project/mimodast/meltano.yml` - this includes configurations for data source specification, destination database, schedule and more.
 - `/project/mimodast/orhestration/dags/gie_dag.py` - Python code defining how to orchestrate a data pipeline in Airflow. Note that the GIE data uses this manually created file, whereas the USGS data orchestration relies purely on logic defined in `meltano.yml`.
-- `/project/mimodast/tranformation/` - this folder contains transformation logic (under `models/`) and also testsand documentation.
+- `/project/mimodast/tranformation/` - this folder contains transformation logic (under `models/`). It also includes configuration for dbt, tests, documentation and various other items.

 <p align="right">(<a href="#readme-top">back to top</a>)</p>

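The hunk above describes `gie_dag.py` only in prose. As a rough illustration of the pattern it refers to (a hand-written Airflow DAG that shells out to Meltano), here is a minimal sketch. It is not the project's actual `gie_dag.py`; the DAG id, schedule and the `tap-gie`/`target-duckdb` plugin names are assumptions used only for illustration.

```python
# Minimal sketch (not the repository's gie_dag.py): an Airflow DAG that
# orchestrates a Meltano pipeline by calling the Meltano CLI.
# DAG id, schedule and plugin names are assumed, not taken from the repo.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="gie_pipeline_example",      # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Extract/load step: shell out to Meltano from the project folder.
    run_pipeline = BashOperator(
        task_id="meltano_run",
        bash_command="cd /project/mimodast && meltano run tap-gie target-duckdb",
    )
```

Driving Meltano through a `BashOperator` keeps the DAG thin: the pipeline itself stays fully described in `meltano.yml`, and Airflow only handles scheduling.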
@@ -376,7 +375,7 @@ The following differences are noteworthy:
    x-key: $ENV_GIE_XKEY
    ```

-3. Schedule/orhestration is not configured using `meltano.yml` but instead with two manually coded Airflow DAGs. The Python file containing the code for these can be found at `/project/mimodast/orchestrate/dags/gie_dag.py`.
+3. Schedule/orchestration is not configured using `meltano.yml` but instead with two manually coded Airflow DAGs. The Python file containing the code for these can be found at `/project/mimodast/orchestrate/dags/gie_dag.py`.
   - The backfill dag captures historic data from source. To specify the date range, two Airflow variables are used. These values can be changed using the Airflow UI.
   - It takes some time (<1 minute) for the new date range to be reflected in the DAG.
   - Note that using Airflow variables in a DAG in this way is not a [best practice][AirflowBestPractices-url] design but is used for simplicity.
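To make the Variable-driven backfill described in these bullets concrete, the sketch below shows one way such a DAG could read its date range from two Airflow Variables. It is not the repository's `gie_dag.py`; the variable names, DAG id and Meltano plugin names are assumptions for illustration only.

```python
# Minimal sketch of a Variable-driven backfill DAG (not the repository's code).
# Variable names, DAG id and plugin names are assumed for illustration.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash import BashOperator

# Reading Variables at module level is the simple-but-not-best-practice pattern
# noted above: the lookup happens each time the scheduler parses this file, so
# a changed value only takes effect after the next parse (hence the short delay).
backfill_start = Variable.get("gie_backfill_start", default_var="2022-01-01")
backfill_end = Variable.get("gie_backfill_end", default_var="2022-12-31")

with DAG(
    dag_id="gie_backfill_example",      # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,             # run on demand for backfills
    catchup=False,
) as dag:
    BashOperator(
        task_id="backfill_meltano_run",
        # A real DAG would pass the range through to the tap (e.g. via env vars
        # or meltano config); here it is only echoed to keep the sketch minimal.
        bash_command=(
            f"echo 'backfilling {backfill_start} to {backfill_end}' && "
            "cd /project/mimodast && meltano run tap-gie target-duckdb"
        ),
    )
```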