docs(ingest): clarify setuptools requirement (#2177)
hsheth2 authored Mar 5, 2021
1 parent b8b023e commit 0d374f3
Showing 1 changed file with 11 additions and 10 deletions.
metadata-ingestion/README.md (21 changes: 11 additions & 10 deletions)
@@ -16,13 +16,6 @@ Before running any metadata ingestion job, you should make sure that DataHub bac

<!-- You can run this ingestion framework by building from source or by running docker images. -->

-### Migrating from the old scripts
-If you were previously using the `mce_cli.py` tool to push metadata into DataHub: the new way for doing this is by creating a recipe with a file source pointing at your JSON file and a DataHub sink to push that metadata into DataHub.
-This [example recipe](./examples/recipes/example_to_datahub_rest.yml) demonstrates how to ingest the [sample data](./examples/mce_files/bootstrap_mce.json) (previously called `bootstrap_mce.dat`) into DataHub over the REST API.
-Note that we no longer use the `.dat` format, but instead use JSON. The main differences are that the JSON uses `null` instead of `None` and uses objects/dictionaries instead of tuples when representing unions.
-
-If you were previously using one of the `sql-etl` scripts: the new way for doing this is by using the associated source. See [below](#Sources) for configuration details. Note that the source needs to be paired with a sink - likely `datahub-kafka` or `datahub-rest`, depending on your needs.
-
### Building from source:

#### Pre-Requisites
@@ -39,7 +32,7 @@ If you were previously using one of the `sql-etl` scripts: the new way for doing
```sh
python3 -m venv venv
source venv/bin/activate
-pip install --upgrade pip wheel
+pip install --upgrade pip wheel setuptools
pip install -e .
./scripts/codegen.sh
```
@@ -51,7 +44,7 @@ Common issues:

This means Python's `wheel` is not installed. Try running the following commands and then retry.
```sh
-pip install --upgrade pip wheel
+pip install --upgrade pip wheel setuptools
pip cache purge
```
</details>
@@ -68,7 +61,7 @@ Common issues:
The underlying `avro-python3` package is buggy. In particular, it often only installs correctly when installed from a pre-built "wheel" but not when from source. Try running the following commands and then retry.
```sh
pip uninstall avro-python3 # sanity check, ok if this fails
-pip install --upgrade pip wheel
+pip install --upgrade pip wheel setuptools
pip cache purge
pip install avro-python3
```
@@ -179,6 +172,7 @@ source:
  config:
    username: user
    password: pass
+    host_port: localhost:1433
    database: DemoDatabase
    table_pattern:
      allow:
@@ -344,6 +338,13 @@ sink:
    filename: ./path/to/mce/file.json
```

+## Migrating from the old scripts
+If you were previously using the `mce_cli.py` tool to push metadata into DataHub: the new way for doing this is by creating a recipe with a file source pointing at your JSON file and a DataHub sink to push that metadata into DataHub.
+This [example recipe](./examples/recipes/example_to_datahub_rest.yml) demonstrates how to ingest the [sample data](./examples/mce_files/bootstrap_mce.json) (previously called `bootstrap_mce.dat`) into DataHub over the REST API.
+Note that we no longer use the `.dat` format, but instead use JSON. The main differences are that the JSON uses `null` instead of `None` and uses objects/dictionaries instead of tuples when representing unions.
+
+If you were previously using one of the `sql-etl` scripts: the new way for doing this is by using the associated source. See [below](#Sources) for configuration details. Note that the source needs to be paired with a sink - likely `datahub-kafka` or `datahub-rest`, depending on your needs.
+
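For reference, a recipe of the kind described in the moved section above is just a YAML file that pairs a source with a sink. Below is a minimal sketch, assuming the `file` source and the `datahub-rest` sink; the file path and server URL are illustrative, and the linked [example recipe](./examples/recipes/example_to_datahub_rest.yml) is the authoritative version.

```yml
# Sketch of a recipe replacing the old mce_cli.py flow (values are illustrative):
# read MCEs from a local JSON file and push them to DataHub over REST.
source:
  type: file
  config:
    filename: ./examples/mce_files/bootstrap_mce.json  # your MCE JSON file

sink:
  type: datahub-rest
  config:
    server: http://localhost:8080  # assumed DataHub REST endpoint
```

For the `sql-etl` replacement the shape is the same: swap the `source` block for the relevant SQL source (for example, the SQL source config shown in an earlier hunk) and keep a `datahub-rest` or `datahub-kafka` sink.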
## Contributing

Contributions welcome!