feat(docs): refactor source and sink docs #3031
Merged
Commits (40, all by kevinhu):

- `0b2f343` Begin reorg
- `0916b75` Add links
- `2bb1d79` Fix link
- `487a2b6` Fix glue link
- `a24dc59` Add module installs to each page
- `5c6a19a` Consistency
- `2382c30` Standardize sqlalchemy pattern
- `34fbccf` Add missing sql options
- `9808735` More consistent recipes
- `9af3cab` Finish consistency checks for recipes
- `9dc365f` As above
- `9afa393` Typo fixes
- `c6388cb` More typo fixes
- `8588cb9` More consistency fixes
- `63691dd` Fix broken links
- `f186b49` Merge branch 'master' of github.com:kevinhu/datahub into reorganize-docs
- `410b9b8` Merge
- `59623e4` Merge
- `eef2a62` Note on allow/deny
- `bee872f` Add questions section
- `124c0a3` Merge branch 'master' of github.com:kevinhu/datahub into reorganize-docs
- `6ffd8a1` Fix inconsistencies
- `ba3cb36` Merge branch 'master' of github.com:kevinhu/datahub into reorganize-docs
- `8a4de6d` Begin separation of quickstart and config details
- `8bf27a5` Write generic sqlalchemy options
- `3dbb736` Up to looker
- `186235f` Add all config vars
- `35ecc45` Add source config docs
- `73a42fd` Clean up quickstart configs
- `b1bf7e7` Update usage docs
- `5933f1f` Formatting
- `bbbe612` Revise capabilities
- `30f9e6f` Merge branch 'master' of github.com:kevinhu/datahub into reorganize-docs
- `9cf1acb` Merge
- `aa608b6` PR fixes
- `f429324` Add link back to main readme
- `5fbac7b` Add link back to recipe section
- `387137f` Add sink config placeholder
- `34d6c57` Categories
- `625baa0` Remove sink compatibility
@@ -0,0 +1,33 @@
# Console

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

## Setup

Works with `acryl-datahub` out of the box.

## Capabilities

Simply prints each metadata event to stdout. Useful for experimentation and debugging purposes.

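If you're curious what "prints each metadata event" amounts to, here's a minimal Python sketch. It is not DataHub's implementation: events are modeled as plain dicts and written as one JSON object per line, while the real sink prints DataHub's own event objects, so its output will look different. `console_sink` is a hypothetical name.

```python
import json
import sys
from typing import Any, Dict, Iterable, Optional, TextIO


def console_sink(events: Iterable[Dict[str, Any]], out: Optional[TextIO] = None) -> int:
    """Hypothetical console sink: write each event to `out` and return the count.

    When `out` is omitted, the current sys.stdout is used.
    """
    stream = out if out is not None else sys.stdout
    written = 0
    for event in events:
        # One JSON object per line, keys sorted so the output is stable across runs.
        stream.write(json.dumps(event, sort_keys=True) + "\n")
        written += 1
    return written
```

Passing an explicit `out` stream (for example an `io.StringIO`) makes the sketch easy to test; leaving it unset writes to whatever `sys.stdout` is at call time.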
## Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  # source configs

sink:
  type: "console"
```

## Config details

None!

## Questions

If you've got any questions on configuring this sink, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
@@ -0,0 +1,87 @@
# DataHub

## DataHub Rest

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

### Setup

To install this plugin, run `pip install 'acryl-datahub[datahub-rest]'`.

### Capabilities

Pushes metadata to DataHub using the GMA REST API. The advantage of the REST-based interface
is that any errors are reported immediately.

### Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  # source configs
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```

### Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field    | Required | Default | Description                      |
| -------- | -------- | ------- | -------------------------------- |
| `server` | ✅       |         | URL of the DataHub GMS endpoint. |

## DataHub Kafka

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

### Setup

To install this plugin, run `pip install 'acryl-datahub[datahub-kafka]'`.

### Capabilities

Pushes metadata to DataHub by publishing messages to Kafka. The advantage of the Kafka-based
interface is that it's asynchronous and can handle higher throughput.

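The two sinks differ mainly in *when* a failure becomes visible. Here's a generic Python sketch of that difference rather than DataHub code: `push_sync` and `push_async` are hypothetical helpers, `send` stands in for whatever transmits one event, and a thread pool stands in for Kafka's asynchronous producer.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List

Event = Dict[str, str]
Sender = Callable[[Event], None]


def push_sync(send: Sender, events: List[Event]) -> None:
    """REST-style (hypothetical): every send blocks, so the first failure raises here."""
    for event in events:
        send(event)


def push_async(send: Sender, events: List[Event], pool: ThreadPoolExecutor) -> List[Future]:
    """Kafka-style (hypothetical): hand events off and return at once.

    Failures are not raised to the caller; they only surface later,
    when the returned futures are inspected.
    """
    return [pool.submit(send, event) for event in events]
```

With the REST-style helper, a rejected event raises before the call returns. With the Kafka-style helper, the call returns right away and the same error only shows up when you check `future.exception()`; confluent-kafka's producer reports delivery results through callbacks instead of futures. Passing the pool in keeps the caller in charge of its lifetime.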
### Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details-1) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  # source configs

sink:
  type: "datahub-kafka"
  config:
    connection:
      bootstrap: "localhost:9092"
      schema_registry_url: "http://localhost:8081"
```

### Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field | Required | Default | Description |
| -------------------------------------------- | -------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connection.bootstrap` | ✅ | | Kafka bootstrap URL. |
| `connection.producer_config.<option>` | | | Passed to https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.SerializingProducer |
| `connection.schema_registry_url` | ✅ | | URL of the schema registry being used. |
| `connection.schema_registry_config.<option>` | | | Passed to https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.schema_registry.SchemaRegistryClient |

The options in the producer config and schema registry config are passed to Kafka's `SerializingProducer` and `SchemaRegistryClient`, respectively.

For a full example with a number of security options, see this [example recipe](../examples/recipes/secured_kafka.yml).

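The dotted field names in the table are just paths into the nested `connection` block of the recipe. The Python sketch below, which assumes the recipe has already been parsed into a dict, shows both halves: `flatten` turns the nested block into the table's dotted names, and `kafka_client_configs` builds the two config dicts handed to the Kafka clients. How the sink actually merges them is an assumption here: `bootstrap` is mapped to librdkafka's `bootstrap.servers`, `schema_registry_url` to the `url` key of `SchemaRegistryClient`, and the pass-through options are layered on top. The real sink also has to add serializers, which are omitted. Both function names are hypothetical.

```python
from typing import Any, Dict, Tuple


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Map a nested recipe dict to the dotted names used in the config table."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def kafka_client_configs(connection: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build (producer_conf, schema_registry_conf) from a recipe's `connection` block.

    Assumption: required fields map to the client libraries' own keys, and the
    pass-through options are applied afterwards, so they can override them.
    """
    producer_conf = {
        "bootstrap.servers": connection["bootstrap"],
        **connection.get("producer_config", {}),
    }
    registry_conf = {
        "url": connection["schema_registry_url"],
        **connection.get("schema_registry_config", {}),
    }
    return producer_conf, registry_conf
```

Kafka option names such as `security.protocol` contain dots themselves, so under `producer_config` the flattened name simply gains more segments, e.g. `connection.producer_config.security.protocol`.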
## Questions

If you've got any questions on configuring this sink, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
@@ -0,0 +1,41 @@
# File

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

## Setup

Works with `acryl-datahub` out of the box.

## Capabilities

Outputs metadata to a file. This can be used to decouple metadata sourcing from the
process of pushing it into DataHub, and is particularly useful for debugging purposes.
Note that the [file source](../source_docs/file.md) can read files generated by this sink.

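Here's a rough Python sketch of the decoupling this enables: one process writes events to a file, and another reads them back later. The on-disk format used here, a single JSON array of event objects, is an assumption about what the file sink and file source exchange; events are plain dicts, and `write_events`/`read_events` are hypothetical helpers, not DataHub functions.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def write_events(path: Path, events: List[Dict[str, Any]]) -> None:
    """Sink side (hypothetical): serialize all events into one JSON array at `path`."""
    # Creating missing parent directories is a convenience of this sketch.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(events, f, indent=4)


def read_events(path: Path) -> List[Dict[str, Any]]:
    """Source side (hypothetical): load the JSON array back into a list of events."""
    with path.open(encoding="utf-8") as f:
        events = json.load(f)
    if not isinstance(events, list):
        raise ValueError(f"{path} does not contain a JSON array")
    return events
```

Because the file is the only contract between the two steps, the reading side can run later, on another machine, or repeatedly while you debug.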
## Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  # source configs

sink:
  type: file
  config:
    filename: ./path/to/mce/file.json
```

## Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field      | Required | Default | Description               |
| ---------- | -------- | ------- | ------------------------- |
| `filename` | ✅       |         | Path to file to write to. |

## Questions

If you've got any questions on configuring this sink, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
@@ -0,0 +1,70 @@
# Athena

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

## Setup

To install this plugin, run `pip install 'acryl-datahub[athena]'`.

## Capabilities

This plugin extracts the following:

- Metadata for databases, schemas, and tables
- Column types associated with each table

## Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  type: athena
  config:
    # Coordinates
    aws_region: my_aws_region_name
    work_group: my_work_group

    # Credentials
    username: my_aws_access_key_id
    password: my_aws_secret_access_key
    database: my_database

    # Options
    s3_staging_dir: "s3://<bucket-name>/<folder>/"

sink:
  # sink configs
```

## Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field | Required | Default | Description |
| ---------------------- | -------- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `username` | | Autodetected | Username credential. If not specified, detected with boto3 rules. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html |
| `password` | | Autodetected | Same detection scheme as `username` |
| `database` | | Autodetected | |
| `aws_region` | ✅ | | AWS region code. |
| `s3_staging_dir` | ✅ | | Of format `"s3://<bucket-name>/prefix/"`. The `s3_staging_dir` parameter is needed because Athena always writes query results to S3. <br />See https://docs.aws.amazon.com/athena/latest/ug/querying.html. |
| `work_group` | ✅ | | Name of Athena workgroup. <br />See https://docs.aws.amazon.com/athena/latest/ug/manage-queries-control-costs-with-workgroups.html. |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
| `options.<option>` | | | Any options specified here will be passed to SQLAlchemy's `create_engine` as kwargs.<br />See https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine for details. |
| `table_pattern.allow` | | | Regex pattern for tables to include in ingestion. |
| `table_pattern.deny` | | | Regex pattern for tables to exclude from ingestion. |
| `schema_pattern.allow` | | | Regex pattern for schemas to include in ingestion. |
| `schema_pattern.deny` | | | Regex pattern for schemas to exclude from ingestion. |
| `view_pattern.allow` | | | Regex pattern for views to include in ingestion. |
| `view_pattern.deny` | | | Regex pattern for views to exclude from ingestion. |
| `include_tables` | | `True` | Whether tables should be ingested. |

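The `*_pattern` fields and `env` together decide which tables end up in DataHub and under which URN. Below is a minimal Python sketch of both steps under assumptions the table doesn't spell out: `allow` and `deny` each hold a list of regexes matched with `re.match` against a `schema.table` name, a deny match always wins, the defaults allow everything and deny nothing, and dataset URNs follow DataHub's usual `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` shape. `Pattern`, `dataset_urn`, and `select_tables` are illustrative names, not DataHub's API.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pattern:
    """Hypothetical allow/deny filter: deny is checked first, then allow."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)

    def allowed(self, name: str) -> bool:
        if any(re.match(p, name) for p in self.deny):
            return False
        return any(re.match(p, name) for p in self.allow)


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN; `env` plays the role of the `env` config field."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def select_tables(tables: List[str], table_pattern: Pattern, env: str = "PROD") -> List[str]:
    """Keep the tables the pattern allows and return their URNs, in input order."""
    return [dataset_urn("athena", t, env) for t in tables if table_pattern.allowed(t)]
```

For example, with `allow=[r"sales\..*"]` and `deny=[r".*_tmp$"]`, only `sales.orders` survives out of `sales.orders`, `sales.orders_tmp`, and `hr.people`: the second is denied and the third never matches an allow rule.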
## Compatibility

Coming soon!

## Questions

If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!
This is amazing.