feat(docs): Document __DATAHUB_TO_FILE_ directive #10968

Merged: 5 commits, Jul 26, 2024
`metadata-ingestion/recipe_overview.md`: 29 additions, 0 deletions
@@ -90,6 +90,35 @@ similar to variable substitution in GNU bash or in docker-compose files.
For details, see [variable-substitution](https://docs.docker.com/compose/compose-file/compose-file-v2/#variable-substitution).
This environment variable substitution should be used to mask sensitive information in recipe files. As long as environment variables can be passed securely to the ingestion process, there is no need to store sensitive information in recipes.

### Loading Sensitive Data as Files in Recipes


Some sources (e.g. Kafka, BigQuery, MySQL) require paths to files on the local file system. This does not work for UI ingestion, where the recipe must be fully self-contained. To supply such files as part of the recipe configuration, DataHub offers the `__DATAHUB_TO_FILE_` directive, which lets a recipe specify the contents of a file instead of its path.

The syntax for this directive is `__DATAHUB_TO_FILE_<property>: <value>`, which is converted into `<property>: <path to file containing value>`. The value can be specified inline or through an environment variable or secret.

For example:

```yaml
source:
  type: mysql
  config:
    # Coordinates
    host_port: localhost:3306
    database: dbname

    # Credentials
    username: root
    password: example
    # If you need to use SSL with MySQL:
    options:
      connect_args:
        __DATAHUB_TO_FILE_ssl_key: '${secret}' # use this for secrets that you need to mount to a file
        # this will get converted into
        # ssl_key: /tmp/path/to/file # where file contains the contents of ${secret}
...
```
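To make the conversion concrete, here is a minimal Python sketch of what the directive does, assuming the recipe's `config` has already been parsed into nested dicts and lists and that environment variables and secrets have already been substituted. The helper `resolve_to_file_directives` and the use of `tempfile.mkstemp` are hypothetical choices for illustration, not DataHub's actual implementation; where DataHub writes the files and how it names them may differ.

```python
import os
import tempfile
from typing import Any

# Directive prefix documented above; the rest of this sketch is illustrative.
DIRECTIVE_PREFIX = "__DATAHUB_TO_FILE_"


def resolve_to_file_directives(config: Any) -> Any:
    """Return a copy of `config` where each `__DATAHUB_TO_FILE_<property>: <value>`
    entry becomes `<property>: <path to file containing value>`.

    Hypothetical helper for illustration only; not DataHub's actual code.
    """
    if isinstance(config, list):
        # Directives may appear inside lists of mappings, so recurse into them.
        return [resolve_to_file_directives(item) for item in config]
    if not isinstance(config, dict):
        # Scalars are returned unchanged.
        return config

    resolved = {}
    for key, value in config.items():
        if isinstance(key, str) and key.startswith(DIRECTIVE_PREFIX):
            prop = key[len(DIRECTIVE_PREFIX):]
            # Write the value to a new temporary file that persists after
            # this function returns, then point the property at that file.
            fd, path = tempfile.mkstemp(prefix="datahub_to_file_")
            with os.fdopen(fd, "w") as f:
                f.write(str(value))
            resolved[prop] = path
        else:
            resolved[key] = resolve_to_file_directives(value)
    return resolved


if __name__ == "__main__":
    # Mirrors the MySQL example above, with an inline placeholder value.
    config = {
        "host_port": "localhost:3306",
        "options": {
            "connect_args": {"__DATAHUB_TO_FILE_ssl_key": "example-key-material"}
        },
    }
    resolved = resolve_to_file_directives(config)
    # Prints the path of the temporary file now holding the key material.
    print(resolved["options"]["connect_args"]["ssl_key"])
```

Writing each value to its own file means the source only ever sees an ordinary file path, so the recipe stays self-contained. Because `mkstemp` does not delete the file, a real implementation would also need to clean it up (or write into a per-run directory) once ingestion finishes.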

### Transformations

If you'd like to modify data before it reaches the ingestion sinks, for instance by adding owners or tags, you can use a transformer; you can also write your own transformer module and integrate it with DataHub. To use transformers, extend the recipe with a `transformers` section describing the transformers you want to run.