feat(ingest/dynamoDB): flatten struct fields #9852

Merged
merged 2 commits into from
Feb 15, 2024

TonyOuyangGit (Contributor)

This PR implements flattening of the `Map` attribute type when scanning table items for DynamoDB ingestion. Most of the logic for expanding nested fields is adapted from `metadata-ingestion/src/datahub/ingestion/source/schema_inference/object.py`, which recursively calls `append_schema` on `Map`-typed fields and builds each field path by joining field names with `FIELD_DELIMITER`.
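To illustrate the recursive path construction described above, here is a minimal sketch. It operates on plain Python dicts (already-deserialized items) rather than DynamoDB's typed attribute format, and the `construct_schema` wrapper, the `append_schema` signature, the entry layout, and the `"."` value for `FIELD_DELIMITER` are all assumptions for illustration, not the actual DataHub code.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple

# Hypothetical delimiter value; the real FIELD_DELIMITER constant may differ.
FIELD_DELIMITER = "."

SchemaEntry = Dict[str, Any]


def construct_schema(
    items: Iterable[Dict[str, Any]], delimiter: str = FIELD_DELIMITER
) -> Dict[Tuple[str, ...], SchemaEntry]:
    """Count how often each (possibly nested) field path appears across items."""
    schema: Dict[Tuple[str, ...], SchemaEntry] = {}

    def append_schema(doc: Dict[str, Any], parent_prefix: Tuple[str, ...]) -> None:
        for key, value in doc.items():
            path = parent_prefix + (key,)
            entry = schema.setdefault(
                path,
                {"types": Counter(), "count": 0, "delimited_name": delimiter.join(path)},
            )
            entry["types"][type(value).__name__] += 1
            entry["count"] += 1
            # Nested maps (plain dicts here) are walked recursively so each
            # child field gets its own delimited path.
            if isinstance(value, dict):
                append_schema(value, path)

    for item in items:
        append_schema(item, ())
    return schema


items = [
    {"id": "1", "address": {"city": "Oslo", "zip": "0150"}},
    {"id": "2", "address": {"city": "Bergen"}},
]
schema = construct_schema(items)
print(sorted(entry["delimited_name"] for entry in schema.values()))
# ['address', 'address.city', 'address.zip', 'id']
```

Keying the schema by a tuple of names, rather than by the joined string, keeps the path unambiguous even if a field name itself contains the delimiter.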

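According to the data types supported by DynamoDB, as described in the AWS [docs](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes), both the `List` and `Map` types can be nested recursively. Because expanding lists, or lists of maps, would add more complexity, this PR only expands the `Map` type for now; expanding `List` will be handled in the future.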
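The rule "expand `Map`, but not `List`" can be sketched against DynamoDB's low-level typed item format, in which every attribute value is a one-key dict such as `{"S": "abc"}`, `{"M": {...}}` or `{"L": [...]}`. The `flatten_item` helper and the `"."` delimiter below are hypothetical; this only shows the intended behavior, where a list (even a list of maps) stays a single leaf field.

```python
from typing import Any, Dict, Tuple

FIELD_DELIMITER = "."  # assumed value, for illustration only


def flatten_item(
    item: Dict[str, Dict[str, Any]], prefix: Tuple[str, ...] = ()
) -> Dict[str, str]:
    """Map each field path in a low-level DynamoDB item to its type descriptor.

    Only Map ("M") attributes are expanded; List ("L") attributes, including
    lists of maps, are kept as a single leaf field.
    """
    fields: Dict[str, str] = {}
    for name, attr_value in item.items():
        path = prefix + (name,)
        # A low-level attribute value is a one-key dict, e.g. {"S": "abc"}.
        type_code, inner = next(iter(attr_value.items()))
        fields[FIELD_DELIMITER.join(path)] = type_code
        if type_code == "M":
            # Recurse into the map's own attributes under the extended path.
            fields.update(flatten_item(inner, path))
    return fields


item = {
    "id": {"S": "1"},
    "profile": {"M": {"age": {"N": "42"}, "address": {"M": {"city": {"S": "Oslo"}}}}},
    "tags": {"L": [{"M": {"k": {"S": "v"}}}]},
}
print(flatten_item(item))
# {'id': 'S', 'profile': 'M', 'profile.age': 'N', 'profile.address': 'M', 'profile.address.city': 'S', 'tags': 'L'}
```

Note that the map nested inside `tags` produces no `tags.k` path; that is the deferred list expansion.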

This PR also adopts a [fix](datahub-project#9612) from the MongoDB source that sorts fields by `count` and `delimited_name` when downsampling the table schema.
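A minimal sketch of that downsampling step, assuming schema entries shaped like `{"count": ..., "delimited_name": ...}`. The sort key `(-count, delimited_name)` matches the snippet quoted in the review thread; the `downsample_fields` name and `max_fields` parameter are hypothetical stand-ins for whatever the source actually uses.

```python
from typing import Any, Dict, List, Tuple


def downsample_fields(
    schema: Dict[Tuple[str, ...], Dict[str, Any]], max_fields: int
) -> List[Dict[str, Any]]:
    """Keep the `max_fields` most frequently seen fields.

    Fields are ranked by descending `count`; ties are broken by
    `delimited_name`, so the result does not depend on the order in which
    items happened to be scanned.
    """
    ranked = sorted(
        schema.values(),
        key=lambda x: (-x["count"], x["delimited_name"]),
    )
    return ranked[:max_fields]


schema = {
    ("b",): {"count": 5, "delimited_name": "b"},
    ("a",): {"count": 5, "delimited_name": "a"},
    ("c",): {"count": 9, "delimited_name": "c"},
    ("d",): {"count": 1, "delimited_name": "d"},
}
kept = downsample_fields(schema, max_fields=3)
print([field["delimited_name"] for field in kept])
# ['c', 'a', 'b']
```

Without the name tiebreak, `a` and `b` (both seen 5 times) could swap places between runs, and with a tighter limit a different field could be dropped each time.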

Updated `test_dynamodb.py` to add `List` and `Map` type items to the test table and to verify that nested fields of `Map` attributes are ingested correctly.


Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

…datahub-project#286)

This PR implements flattening the `map` attribute type when scanning
table items for DynamoDB ingestion. The majority of expanding nested
field code logic is adopted from
`metadata-ingestion/src/datahub/ingestion/source/schema_inference/object.py`,
where it recursively calls `append_schema` for `map` data type field and
constructs the field path delimited by `FIELD_DELIMITER`.

According to data types supported in DynamoDB in aws
[docs](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes),
`List` and `Map` type both support recursive structure and since it
would add more complexity for expanding list or list of maps, for now
we'll only expand `Map` type and will handle expanding list in the
future.

This PR also adopts a
[fix](datahub-project#9612) in MongoDB
to sort by `count` and `delimiter_name` when downsampling the table
schema

Updated `test_dynamodb.py` to add `List` and `Map` type items into test
table and `Map` type nested fields are ingested correctly

---------

Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
github-actions bot added the labels ingestion (PR or Issue related to the ingestion of metadata) and community-contribution (PR or Issue raised by member(s) of DataHub Community) on Feb 14, 2024
hsheth2 (Collaborator) left a comment

mostly looks good, just one minor comment

    schema.values(),
    key=lambda x: (
        -x["count"],
        x["delimited_name"],
Collaborator

it looks like we're already sorting a few lines down - can we just do the more sophisticated sort below instead?

Contributor Author

Thank you for your suggestion! That sounds good to me. Since this code is adapted from mongodb.py, the same sophisticated sort should apply to MongoDB as well, so I can make the same change in my MongoDB PR.

hsheth2 changed the title from "feat(DynamoDB): Implement flattening of struct fields in DynamoDB ingestion" to "feat(ingest/dynamoDB): flatten struct fields" on Feb 15, 2024
hsheth2 merged commit ae1806f into datahub-project:master on Feb 15, 2024
54 checks passed
dushayntAW pushed a commit to dushayntAW/datahub that referenced this pull request Feb 26, 2024
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Labels: community-contribution, ingestion
2 participants