-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
merge_manifests fix for rows with no guid #196
Conversation
The style in this PR agrees with This formatting comment was generated automatically by a script in uc-cdis/wool. |
…nifest_fix' into fix/merge_bucket_manifest_fix
@@ -106,17 +106,23 @@ def merge_bucket_manifests( | |||
|
|||
headers = set() | |||
all_rows = {} | |||
records_with_no_guid = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move records_with_no_guid
initialization, as well as the for record in records_with_no_guid
loop to outside the for manifest in files:
loop. This would process the records with no guid after all manifests rows have been initially processed.
# and NO guid was provided (e.g. we want a single combined row of updated values) | ||
any_guid_provided = [ | ||
record.get(GUID_STANDARD_KEY) | ||
for record in records_with_no_guid: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above comment re: moving this loop outside of for manifest in files
loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please write a regression test.
Also the bug fix section in your PR template is very thorough but unfortunately our release note automation won't be able to parse it properly due to the multiple line breaks.
I think something more succinct like
"Fixes issue with merge_bucket_manifests where the order of input manifests would inconsistent (and incorrect) merge behavior."
You can leave the detailed description above the headings for posterity if someone looks at this PR again. But that's right, the bullet under the heading should be succinct, like one sentence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unit test provided that verifies the order of manifests does not matter when a guid is not provided in one manifest and that the end result is the same. Approved.
Jira Ticket: PXP-11205
The
merge_bucket_manifests()
function ingen3.tools.indexing.merge_manifests
should be able to merge the following files:manifest_with_guid.tsv
manifest_WITHOUT_guid.tsv
However, depending on the order of the list of files passed into
merge_bucket_manifests()
the result will either merge the two into a single row, or keep them separateThis is due to the variable
records_with_no_guid
in the aforementioned function being initialized within the for loop that is iterating through the list of files to merge. Given a row in filemanifest_WITHOUT_guid.tsv
withmd5: 2de45
, it will not be merged with the row in filemanifest_with_guid.tsv
with samemd5: 2de45
ifmanifest_WITHOUT_guid.tsv
is before in the list of files passed in. Initializing and processingrecords_with_no_guid
outside the loop fixes this behavior.New Features
Breaking Changes
Bug Fixes
Improvements
Dependency updates
Deployment changes