
Commit f01a335

Merge pull request #21 from EJOOSTEROP/documentation_update
Documentation only. Minor edits
2 parents e62a4a9 + 5113183 commit f01a335

File tree

1 file changed: +5 -6

readme.md: +5 -6
@@ -98,7 +98,7 @@ A minimal modern data stack with working data pipelines in a single Docker conta
 - [Superset][Superset-url] - data visualization and exploration platform
 - Sample data pipelines with [USGS Earthquake][USGSEarthquakeAPI-url] data and [European Gas Inventory][GIEAPI-url] levels.

-Explore the functionality of the tools by using the examples as-is; and to modify and expand on the exmples for further exploration.
+Explore the functionality of the tools by using the examples as-is; modify and expand on the examples for further exploration.

 This is a convenient starting point for exploration. The project is not a showcase of all or even the best functionality that each tool has to offer.

@@ -140,7 +140,6 @@ Have [Docker Desktop][DockerDesktop-url] installed.

 ### Installation

-In order to create the docker container you can do the following:
 <!--
 1. Clone this GIT repo:
    ```sh
@@ -206,15 +205,15 @@ Below we highlight the core configuration for these components. For (much) more

 ### Definition and Configuration

-The data pipelines are fully defined in a set of files. This includes the source definitions, schedules, dependencies, transformation logic, tests and documentation. (The reporting/dashboards in Superset are defined within Superset, but can be exported from there.)
+The data pipelines are fully defined in a set of files. This includes the source definitions, schedules, dependencies, transformation logic, tests and documentation. The reporting/dashboards in Superset are defined within Superset, but can be exported from there.

 These files are all found in the `/project/mimodast/` folder in the Docker container. It is best practice to capture this folder in a version control tool. Git is included in the Docker image.

 Some of the core files include:

-- `/project/mimodast/meltano.yml` - this contains items like the source specification, destination database and schedule.
+- `/project/mimodast/meltano.yml` - this includes configurations for data source specification, destination database, schedule and more.
 - `/project/mimodast/orhestration/dags/gie_dag.py` - python code defining how to orchestrate a data pipeline in Airflow. Note that the GIE data uses this manually created file, whereas the USGS data orhestration relies purely on logic defined in `meltano.yml`.
-- `/project/mimodast/tranformation/` - this folder contains transformation logic (under `models/`) and also tests and documentation.
+- `/project/mimodast/tranformation/` - this folder contains transformation logic (under `models/`). It also includes configuration for dbt, tests, documentation and various other items.

 <p align="right">(<a href="#readme-top">back to top</a>)</p>
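The `gie_dag.py` file referenced above is not part of this diff, so for orientation only, here is a rough sketch of what a hand-written Airflow DAG that triggers a Meltano job from the project folder could look like. The DAG id, schedule and `meltano run` arguments are placeholders, not the project's actual values.

```python
# Illustrative sketch only - not the repository's gie_dag.py.
# Assumes Airflow 2.x and Meltano are installed in the container.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="gie_example_pipeline",      # placeholder DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",         # placeholder schedule
    catchup=False,
) as dag:
    # Invoke the Meltano extract/load job from the project folder.
    run_pipeline = BashOperator(
        task_id="meltano_run",
        bash_command="cd /project/mimodast && meltano run tap-example target-example",
    )
```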
@@ -376,7 +375,7 @@ The following differences are noteworthy:
    x-key: $ENV_GIE_XKEY
    ```

-3. Schedule/orhestration is not configured using `meltano.yml` but instead with two manually coded Airflow DAGs. The Python file containing the code for these can be found at `/project/mimodast/orchestrate/dags/gie_dag.py`.
+3. Schedule/orchestration is not configured using `meltano.yml` but instead with two manually coded Airflow DAGs. The Python file containing the code for these can be found at `/project/mimodast/orchestrate/dags/gie_dag.py`.
 - The backfill dag captures historic data from source. To specify the date range, two Airflow variables are used. These values can be changed using the Airflow UI.
 - It takes some time (<1 minute) for the new date range to be reflected in the DAG.
 - Note that using Airflow variables in a DAG in this way is not a [best practice][AirflowBestPractices-url] design but is used for simplicity.
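The sub-bullets above describe a backfill DAG that reads its date range from two Airflow variables, with the caveats that changes take a short while to appear and that the pattern is not best practice. A minimal sketch of that pattern, with hypothetical variable names, defaults and commands (the real ones are not shown in this diff), might look like this:

```python
# Illustrative sketch of the pattern described above; variable names,
# defaults and the meltano command are hypothetical. Variable.get() at
# module level is re-evaluated on every DAG file parse, which explains the
# short delay before UI changes appear and why this is not a best practice.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash import BashOperator

backfill_start = Variable.get("gie_backfill_start", default_var="2022-01-01")
backfill_end = Variable.get("gie_backfill_end", default_var="2022-12-31")

with DAG(
    dag_id="gie_backfill_example",      # placeholder DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,             # run on demand for backfills
    catchup=False,
) as dag:
    backfill = BashOperator(
        task_id="meltano_backfill",
        bash_command=(
            "cd /project/mimodast && "
            f"START_DATE={backfill_start} END_DATE={backfill_end} "
            "meltano run tap-example target-example"
        ),
    )
```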
