feat(ingest/dynamoDB): flatten struct fields #9852

TonyOuyangGit · 2024-02-14T19:25:15Z

This PR flattens attributes of the Map type when scanning table items for DynamoDB ingestion. Most of the logic for expanding nested fields is adapted from metadata-ingestion/src/datahub/ingestion/source/schema_inference/object.py, which calls append_schema recursively for each Map field and builds the field path by joining its segments with FIELD_DELIMITER.

According to the AWS documentation on the data types supported by DynamoDB, both List and Map can nest recursively. Expanding lists, or lists of maps, would add more complexity, so for now only the Map type is expanded; expanding lists will be handled in the future.
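
A minimal sketch of the Map-only flattening described above, assuming scanned items arrive in DynamoDB's low-level attribute-value format (for example `{"S": "abc"}` or `{"M": {...}}`). The function body, the `"."` value used for `FIELD_DELIMITER`, and the sample item are illustrative assumptions, not the code merged in this PR.

```python
from collections import Counter
from typing import Any, Dict, Tuple

# Assumed delimiter value, chosen for illustration only.
FIELD_DELIMITER = "."

FieldPath = Tuple[str, ...]


def append_schema(
    schema: Dict[FieldPath, Dict[str, Any]],
    item: Dict[str, Dict[str, Any]],
    parent_path: FieldPath = (),
) -> None:
    """Record every attribute of `item`, recursing into Map ("M") values only."""
    for name, typed_value in item.items():
        # Each attribute is a single-entry dict: {type_code: value}.
        type_code, value = next(iter(typed_value.items()))
        path = parent_path + (name,)
        if type_code == "M":
            # Nested map: expand its children under the current path.
            append_schema(schema, value, path)
        # Lists ("L") are recorded as a single field and are not expanded.
        entry = schema.setdefault(
            path,
            {
                "types": Counter(),
                "count": 0,
                "delimited_name": FIELD_DELIMITER.join(path),
            },
        )
        entry["types"][type_code] += 1
        entry["count"] += 1


# Hypothetical item with a nested Map and a List.
item = {
    "id": {"S": "u1"},
    "address": {
        "M": {
            "city": {"S": "Seattle"},
            "geo": {"M": {"lat": {"N": "47.6"}}},
        }
    },
    "tags": {"L": [{"S": "a"}]},
}

schema: Dict[FieldPath, Dict[str, Any]] = {}
append_schema(schema, item)
flattened = sorted(field["delimited_name"] for field in schema.values())
```

In this sketch the Map field itself (`address`) is kept alongside its nested children, while `tags` stays a single List field.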

This PR also adopts a fix from MongoDB that sorts by count and delimited_name when downsampling the table schema.
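
A small sketch of why the two-part sort key matters when downsampling, assuming each schema entry carries `count` and `delimited_name` as in the diff excerpt in the review thread. The `downsample` helper, the `max_fields` parameter, and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def downsample(
    schema: Dict[str, Dict[str, Any]], max_fields: int
) -> List[Dict[str, Any]]:
    """Keep the `max_fields` most frequent fields, breaking ties by name."""
    ranked = sorted(
        schema.values(),
        # Highest count first; the name makes the order deterministic on ties.
        key=lambda x: (-x["count"], x["delimited_name"]),
    )
    return ranked[:max_fields]


# Hypothetical field statistics collected from scanned items.
sample_schema = {
    "a.b": {"count": 3, "delimited_name": "a.b"},
    "id": {"count": 5, "delimited_name": "id"},
    "a": {"count": 3, "delimited_name": "a"},
    "z": {"count": 1, "delimited_name": "z"},
}

kept = [field["delimited_name"] for field in downsample(sample_schema, 3)]
```

Sorting on count alone would leave the relative order of `a` and `a.b` dependent on insertion order, so the set of fields that survive the cut could vary between runs.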

Updated test_dynamodb.py to add List and Map type items to the test table and to verify that nested Map fields are ingested correctly.

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

…datahub-project#286)

This PR implements flattening the `map` attribute type when scanning table items for DynamoDB ingestion. The majority of expanding nested field code logic is adopted from `metadata-ingestion/src/datahub/ingestion/source/schema_inference/object.py`, where it recursively calls `append_schema` for `map` data type field and constructs the field path delimited by `FIELD_DELIMITER`.

According to data types supported in DynamoDB in AWS [docs](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes), `List` and `Map` type both support recursive structure and since it would add more complexity for expanding list or list of maps, for now we'll only expand `Map` type and will handle expanding list in the future.

This PR also adopts a [fix](datahub-project#9612) in MongoDB to sort by `count` and `delimited_name` when downsampling the table schema.

Updated `test_dynamodb.py` to add `List` and `Map` type items into test table and `Map` type nested fields are ingested correctly.

---------

Co-authored-by: Tamas Nemeth <treff7es@gmail.com>

hsheth2

mostly looks good, just one minor comment

hsheth2 · 2024-02-15T03:57:29Z

metadata-ingestion/src/datahub/ingestion/source/dynamodb/dynamodb.py

+                schema.values(),
+                key=lambda x: (
+                    -x["count"],
+                    x["delimited_name"],


it looks like we're already sorting a few lines down - can we just do the more sophisticated sort below instead?

Thank you for your suggestion! That sounds good to me. The same sophisticated sort can apply to MongoDB as well, since this code was taken from mongodb.py, so I will make the same change in my MongoDB PR too.

Co-authored-by: Tamas Nemeth <treff7es@gmail.com>

github-actions bot added the ingestion (PR or Issue related to the ingestion of metadata) and community-contribution (PR or Issue raised by member(s) of DataHub Community) labels Feb 14, 2024

vercel bot deployed to Preview February 14, 2024 20:35

hsheth2 reviewed Feb 15, 2024

address comment

655b443

TonyOuyangGit mentioned this pull request Feb 15, 2024

feat(ingest/mongodb): improve sorting when downsampling collection schema #9856

Merged

5 tasks

vercel bot deployed to Preview February 15, 2024 08:37

hsheth2 approved these changes Feb 15, 2024

hsheth2 changed the title ~~feat(DynamoDB): Implement flattening of struct fields in DynamoDB ingestion~~ feat(ingest/dynamoDB): flatten struct fields Feb 15, 2024

hsheth2 merged commit ae1806f into datahub-project:master Feb 15, 2024
54 checks passed

dushayntAW pushed a commit to dushayntAW/datahub that referenced this pull request Feb 26, 2024

feat(ingest/dynamoDB): flatten struct fields (datahub-project#9852)

d408944

Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
