-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingest/dynamoDB): flatten struct fields #9852
feat(ingest/dynamoDB): flatten struct fields #9852
Conversation
…datahub-project#286) This PR implements flattening the `map` attribute type when scanning table items for DynamoDB ingestion. The majority of expanding nested field code logic is adopted from `metadata-ingestion/src/datahub/ingestion/source/schema_inference/object.py`, where it recursively calls `append_schema` for `map` data type field and constructs the field path delimited by `FIELD_DELIMITER`. According to data types supported in DynamoDB in aws [docs](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes), `List` and `Map` type both support recursive structure and since it would add more complexity for expanding list or list of maps, for now we'll only expand `Map` type and will handle expanding list in the future. This PR also adopts a [fix](datahub-project#9612) in MongoDB to sort by `count` and `delimiter_name` when downsampling the table schema Updated `test_dynamodb.py` to add `List` and `Map` type items into test table and `Map` type nested fields are ingested correctly --------- Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly looks good, just one minor comment
schema.values(), | ||
key=lambda x: ( | ||
-x["count"], | ||
x["delimited_name"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like we're already sorting a few lines down - can we just do the more sophisticated sort below instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your suggestion! That sounds good to me, it looks like same sophisticated sort can apply to MongoDB as well, as this is referred from mongodb.py
, I can update this same change in my MongoDB PR as well
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
This PR implements flattening the
map
attribute type when scanning table items for DynamoDB ingestion. The majority of expanding nested field code logic is adopted frommetadata-ingestion/src/datahub/ingestion/source/schema_inference/object.py
, where it recursively callsappend_schema
formap
data type field and constructs the field path delimited byFIELD_DELIMITER
.According to data types supported in DynamoDB in aws docs,
List
andMap
type both support recursive structure and since it would add more complexity for expanding list or list of maps, for now we'll only expandMap
type and will handle expanding list in the future.This PR also adopts a
fix in MongoDB to sort by
count
anddelimiter_name
when downsampling the table schemaUpdated
test_dynamodb.py
to addList
andMap
type items into test table andMap
type nested fields are ingested correctlyChecklist