[ETL-294] create glue table stack #10
🔥 Looks great @rxu17 . I'll let @philerooski take another pass at this.
- Name: Device
  Type: struct<Name:string,Model:string,Manufacturer:string,HardwareVersion:string,SoftwareVersion:string,FirmwareVersion:string,LocalIdentifier:string,FDAIdentifier:string>
- Name: Metadata
  Type: struct<HKMetadataKeySyncVersion:string,HKVO2MaxTestType:string,HKMetadataKeySyncIdentifier:string>
For these arbitrary JSON fields, reference the tables in test-database that I created using the crawlers. You'll find a few more fields there. For example, this is the struct for the Metadata field:
{
  "metadata": {
    "HKWasUserEntered": "string",
    "HKAlgorithmVersion": "string",
    "HKMetadataKeyHeartRateMotionContext": "string",
    "HKDateOfEarliestDataUsedForEstimate": "string",
    "HKMetadataKeyAppleDeviceCalibrated": "string",
    "HKTimeZone": "string",
    "HKMetadataKeySyncVersion": "string",
    "HKVO2MaxTestType": "string",
    "HKMetadataKeySyncIdentifier": "string",
    "HKMetadataKeyDevicePlacementSide": "string"
  }
}
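As a rough sketch of how the crawler's field list could be turned into the `Type` string used in the table config: the helper name `to_struct_type` is hypothetical and not part of this PR, and it assumes every field is a flat name-to-type mapping like the one above.

```python
def to_struct_type(fields: dict) -> str:
    """Render a mapping of field name -> Hive type as a struct<...> type string."""
    body = ",".join(f"{name}:{hive_type}" for name, hive_type in fields.items())
    return f"struct<{body}>"


# Field list copied from the crawler-generated Metadata struct above.
metadata_fields = {
    "HKWasUserEntered": "string",
    "HKAlgorithmVersion": "string",
    "HKMetadataKeyHeartRateMotionContext": "string",
    "HKDateOfEarliestDataUsedForEstimate": "string",
    "HKMetadataKeyAppleDeviceCalibrated": "string",
    "HKTimeZone": "string",
    "HKMetadataKeySyncVersion": "string",
    "HKVO2MaxTestType": "string",
    "HKMetadataKeySyncIdentifier": "string",
    "HKMetadataKeyDevicePlacementSide": "string",
}

# Value that could be pasted into the Metadata column's Type entry.
metadata_type = to_struct_type(metadata_fields)
print(metadata_type)
```

The output keeps the fields in the order they are listed, so the result can be compared directly against the crawler's table.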
@@ -0,0 +1,461 @@
tables:
  EnrolledParticipants:
Every table needs at least one of the export_start_date or export_end_date fields. If only a single date is included in the file name (as it is for the EnrolledParticipants data type), then only export_end_date is needed. Otherwise, both fields are needed.
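A minimal sketch of that rule as a check, assuming each table's columns are available as a list of names and that the number of dates in the file name is known. The function name `missing_export_date_fields` and the sample column name are hypothetical, not taken from this PR.

```python
def missing_export_date_fields(column_names, dates_in_file_name):
    """Return the export date fields a table still needs.

    One date in the file name: only export_end_date is required.
    Otherwise: both export_start_date and export_end_date are required.
    """
    if dates_in_file_name == 1:
        required = ["export_end_date"]
    else:
        required = ["export_start_date", "export_end_date"]
    return [field for field in required if field not in column_names]


# Hypothetical column lists, for illustration only.
single_date_table = ["some_column", "export_end_date"]
two_date_table = ["some_column", "export_end_date"]

print(missing_export_date_fields(single_date_table, dates_in_file_name=1))
print(missing_export_date_fields(two_date_table, dates_in_file_name=2))
```

An empty list means the table already satisfies the rule; anything else names the fields to add.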
Got it! I think I went down the route of checking whether the data types themselves have a start date or end date variable in their schemas instead.
templates/glue-tables.j2
Outdated
Properties:
  CatalogId: !Ref AWS::AccountId
  DatabaseInput:
    Description: !Sub 'Recover database'
Include a reference to the namespace in the Description.
templates/glue-tables.j2
Outdated
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
StoredAsSubDirectories: false
TableType: EXTERNAL_TABLE
{% endfor %}
I think this is indented one level too far.
I copied this from BridgeDownstream, which had the same indentation. I'll make a note to correct it there for syntax purposes. It doesn't appear to affect how the code works, at least here.
I just noticed that at least one of the table names doesn't match the data types. For example, what in
Re-request my review once you've made all your changes.
Purpose: Creates a Glue tables stack, along with the relevant configs and table schemas, that dynamically creates a Glue table for each data type in the recover datasets.