doc(ingestion/airflow-plugin): update for developers (datahub-project…
dushayntAW authored and sleeperdeep committed Jun 25, 2024
1 parent d120e4f commit 9d589b0
Showing 2 changed files with 24 additions and 1 deletion.
2 changes: 1 addition & 1 deletion docs/lineage/airflow.md
@@ -69,7 +69,7 @@ enabled = True # default
| -------------------------- | -------------------- | ---------------------------------------------------------------------------------------- |
| enabled | true | If the plugin should be enabled. |
| conn_id | datahub_rest_default | The name of the datahub rest connection. |
-| cluster | prod | name of the airflow cluster |
+| cluster | prod | name of the airflow cluster, this is equivalent to the `env` of the instance |
| capture_ownership_info | true | Extract DAG ownership. |
| capture_tags_info | true | Extract DAG tags. |
| capture_executions | true | Extract task runs and success/failure statuses. This will show up in DataHub "Runs" tab. |
23 changes: 23 additions & 0 deletions metadata-ingestion/developing.md
@@ -34,7 +34,30 @@ cd metadata-ingestion-modules/airflow-plugin
../../gradlew :metadata-ingestion-modules:airflow-plugin:installDev
source venv/bin/activate
datahub version # should print "DataHub CLI version: unavailable (installed in develop mode)"

# start the airflow web server
export AIRFLOW_HOME=~/airflow
airflow webserver --port 8090 -d

# start the airflow scheduler
airflow scheduler

# open the Airflow UI and run any DAG
# open http://localhost:8090/
# select any DAG and click the `play arrow` button to start it

# add debug lines to the codebase, e.g. in ./src/datahub_airflow_plugin/datahub_listener.py
# (the following is Python to add to that file, not a shell command):
#   logger.debug("this is the sample debug line")

# run the DAG again; the debug lines appear in the task run log:
# 1. click the `timestamp` in the `Last Run` column
# 2. select the task
# 3. click the `log` option
```


> **P.S. If you cannot see the log lines, restart the `airflow scheduler` and rerun the DAG.**
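
If the debug lines are still missing after a restart, one possible cause is the logger's effective level. The sketch below uses only Python's standard `logging` module to show that a `logger.debug(...)` call is dropped unless the level is `DEBUG`. The logger name is an assumption that mirrors the module path of `datahub_listener.py`, and the helper `emit_debug_line` is hypothetical. How Airflow itself sets the level depends on its logging configuration, which this sketch does not touch.

```python
import io
import logging

# Assumed logger name, mirroring the module path of datahub_listener.py.
LOGGER_NAME = "datahub_airflow_plugin.datahub_listener"


def emit_debug_line(level: int) -> str:
    """Emit one debug record with the logger set to `level`; return the captured text."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)  # writes "<message>\n" by default
    logger = logging.getLogger(LOGGER_NAME)
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        logger.debug("this is the sample debug line")
    finally:
        # Restore the logger so repeated calls do not interfere with each other.
        logger.removeHandler(handler)
        logger.setLevel(previous_level)
    return stream.getvalue()


captured_at_info = emit_debug_line(logging.INFO)    # debug record is filtered out
captured_at_debug = emit_debug_line(logging.DEBUG)  # debug record reaches the handler
```

With the level at `INFO` nothing is captured; with `DEBUG` the sample line is. In a local Airflow setup, the equivalent knob is likely the `logging_level` option in the `[logging]` section of `airflow.cfg`, but verify that against the Airflow version in use.
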
### (Optional) Set up your Python environment for developing on Dagster Plugin

From the repository root: