Ensure only imaging studies are processed from OMOP ES parquet files #212

Closed
stefpiatek opened this issue Jan 8, 2024 · 2 comments · Fixed by #271

@stefpiatek
Contributor

stefpiatek commented Jan 8, 2024

Definition of Done / Acceptance Criteria

When reading data from OMOP ES, only imaging procedures (those whose accession number is not None) are added to PIXL queues.

Testing

No changes needed.

Current status

  • Currently hardcoded to use only NG tube and chest X-rays (Filter procedures to ngtube and chest x-ray #207). I had considered a way to make this configurable, but we can instead filter for rows where the accession number is not None, since only imaging results will have that field filled in the link file.
  • We may have to edit the test parquet files so that only the two expected images have the accession number filled in. I suspect the other rows were another imaging modality, but the tests should only have two images (i.e. the current filtering output is correct).
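A minimal pandas sketch of the proposed filter, assuming the procedure rows and the link file can be joined on `procedure_occurrence_id` and that the link file's accession-number column is called `AccessionNumber`. Both that column name and the function name `imaging_procedures_only` are hypothetical, and the accession values are made up for illustration; this is not PIXL's actual code.

```python
import pandas as pd

# Hypothetical column name: the real link file may call this something else.
ACCESSION_COL = "AccessionNumber"


def imaging_procedures_only(procedures: pd.DataFrame, links: pd.DataFrame) -> pd.DataFrame:
    """Keep only procedure rows whose linked accession number is present.

    Assumes only imaging procedures get an accession number in the link file,
    so rows with a missing (None/NaN) accession number are dropped.
    """
    merged = procedures.merge(links, on="procedure_occurrence_id", how="left")
    return merged[merged[ACCESSION_COL].notna()].reset_index(drop=True)


# Illustrative inputs: procedure 2 has no accession number, so it is not imaging.
procedures = pd.DataFrame({
    "procedure_occurrence_id": [1, 2, 3],
    "person_id": [1, 1, 2],
})
links = pd.DataFrame({
    "procedure_occurrence_id": [1, 2, 3],
    ACCESSION_COL: ["AA12345", None, "BB67890"],
})

result = imaging_procedures_only(procedures, links)
print(result["procedure_occurrence_id"].tolist())  # [1, 3]
```

A left join keeps every procedure row so that the "not None" check alone decides what is queued, rather than silently dropping procedures that are missing from the link file.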
stefpiatek added this to the 100-days milestone Jan 8, 2024
stefpiatek changed the title "Configuration to filter omop data by procedure concept" to "Ensure only imaging studies are processed from OMOP ES parquet files" Jan 15, 2024
stefpiatek modified the milestones: 100-days, VOXL Jan 29, 2024
stefpiatek assigned peshence and unassigned stefpiatek Jan 30, 2024
@peshence
Contributor

@stefpiatek this is the content of public/PROCEDURE_OCCURRENCE.parquet:

{"procedure_occurrence_id":1,"person_id":1,"procedure_concept_id":4200610,"procedure_date":"2021-07-01T00:00:00.000Z","procedure_datetime":1625127300000,"procedure_type_concept_id":32817,"procedure_end_date":null,"procedure_end_datetime":null,"modifier_concept_id":0,"quantity":null,"visit_occurrence_id":null,"procedure_source_value":null,"procedure_source_concept_id":0,"modifier_source_value":null}
{"procedure_occurrence_id":2,"person_id":1,"procedure_concept_id":4058335,"procedure_date":"2021-07-01T00:00:00.000Z","procedure_datetime":1625140800000,"procedure_type_concept_id":32817,"procedure_end_date":null,"procedure_end_datetime":null,"modifier_concept_id":0,"quantity":null,"visit_occurrence_id":null,"procedure_source_value":null,"procedure_source_concept_id":0,"modifier_source_value":null}
{"procedure_occurrence_id":3,"person_id":2,"procedure_concept_id":4327032,"procedure_date":"2020-05-01T00:00:00.000Z","procedure_datetime":1588339813000,"procedure_type_concept_id":32817,"procedure_end_date":null,"procedure_end_datetime":null,"modifier_concept_id":0,"quantity":null,"visit_occurrence_id":null,"procedure_source_value":null,"procedure_source_concept_id":0,"modifier_source_value":null}
{"procedure_occurrence_id":4,"person_id":2,"procedure_concept_id":4163872,"procedure_date":"2020-05-23T00:00:00.000Z","procedure_datetime":1590240671000,"procedure_type_concept_id":32817,"procedure_end_date":null,"procedure_end_datetime":null,"modifier_concept_id":0,"quantity":null,"visit_occurrence_id":null,"procedure_source_value":null,"procedure_source_concept_id":0,"modifier_source_value":null}
{"procedure_occurrence_id":5,"person_id":2,"procedure_concept_id":4163872,"procedure_date":"2020-05-23T00:00:00.000Z","procedure_datetime":1590240940000,"procedure_type_concept_id":32817,"procedure_end_date":null,"procedure_end_datetime":null,"modifier_concept_id":0,"quantity":null,"visit_occurrence_id":null,"procedure_source_value":null,"procedure_source_concept_id":0,"modifier_source_value":null}
{"procedure_occurrence_id":6,"person_id":3,"procedure_concept_id":4327032,"procedure_date":"2015-05-01T00:00:00.000Z","procedure_datetime":1430487013000,"procedure_type_concept_id":32817,"procedure_end_date":null,"procedure_end_datetime":null,"modifier_concept_id":0,"quantity":null,"visit_occurrence_id":null,"procedure_source_value":null,"procedure_source_concept_id":0,"modifier_source_value":null}

Only one of the ids mentioned in the code is here, and one is duplicated (the whole row is). Is that intentional?

Would it make sense to keep the test data in some human-readable format (so we can track changes to it on GitHub too) and only convert it to parquet at test time? Or alternatively keep both (though this introduces a chance of them not matching)?

@stefpiatek
Contributor Author

stefpiatek commented Jan 30, 2024

> Only one of the ids mentioned in the code is here, and one is duplicated (the whole row is). Is that intentional?

Yeah, they're the same imaging type for the same person, but taken at a different time.
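To check this against the data, a small standard-library sketch can group the rows quoted above by (`person_id`, `procedure_concept_id`). Only the fields it needs are copied from those rows. Treating `procedure_datetime` as milliseconds since the Unix epoch is an assumption based on how the values line up with `procedure_date`, and the function name `repeated_procedures` is illustrative.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

# Subset of the PROCEDURE_OCCURRENCE rows quoted above (only the fields used here).
ROWS = """\
{"procedure_occurrence_id":3,"person_id":2,"procedure_concept_id":4327032,"procedure_datetime":1588339813000}
{"procedure_occurrence_id":4,"person_id":2,"procedure_concept_id":4163872,"procedure_datetime":1590240671000}
{"procedure_occurrence_id":5,"person_id":2,"procedure_concept_id":4163872,"procedure_datetime":1590240940000}
{"procedure_occurrence_id":6,"person_id":3,"procedure_concept_id":4327032,"procedure_datetime":1430487013000}
"""


def repeated_procedures(json_lines: str) -> dict:
    """Group JSON-lines rows by (person_id, procedure_concept_id); keep groups with >1 row."""
    groups = defaultdict(list)
    for line in json_lines.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        groups[(row["person_id"], row["procedure_concept_id"])].append(row)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


for (person, concept), rows in repeated_procedures(ROWS).items():
    for row in rows:
        # Assumed: procedure_datetime is milliseconds since the Unix epoch (UTC).
        when = datetime.fromtimestamp(row["procedure_datetime"] / 1000, tz=timezone.utc)
        print(person, concept, row["procedure_occurrence_id"], when.isoformat())
```

For these rows only person 2 with concept 4163872 is reported (procedure_occurrence_id 4 and 5), and the two timestamps differ by a few minutes, which matches the explanation of the same imaging type taken at a different time.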

> Would it make sense to keep the test data in some human readable format (so we can track it on github too) and only convert to parquet at test time? Or alternatively keep both (this introduces chance of them not matching)?

I'd be up for having a helper test function that takes a CSV/JSON/TOML file of input data and then splits it out into parquet files in the format that we expect. That had been an option earlier, under the testing heading in #159, but it wasn't used.
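One way such a helper could look, as a sketch only. The fixture format (a single JSON object mapping table names to lists of rows), the `public/<TABLE>.parquet` output layout (mirroring the path mentioned earlier in the thread) and the names `load_fixture` and `write_test_parquet_files` are all hypothetical. Parsing and validation use only the standard library; writing parquet needs pandas plus a parquet engine such as pyarrow.

```python
import json
from pathlib import Path

# Hypothetical human-readable fixture that could be tracked in git
# instead of (or alongside) the binary parquet files.
EXAMPLE_FIXTURE = """
{
  "PROCEDURE_OCCURRENCE": [
    {"procedure_occurrence_id": 1, "person_id": 1, "procedure_concept_id": 4200610},
    {"procedure_occurrence_id": 2, "person_id": 1, "procedure_concept_id": 4058335}
  ]
}
"""


def load_fixture(text: str) -> dict:
    """Parse the fixture and check that each table's rows share the same columns."""
    tables = json.loads(text)
    for name, rows in tables.items():
        if len({frozenset(row) for row in rows}) > 1:
            raise ValueError(f"rows in {name} have inconsistent columns")
    return tables


def write_test_parquet_files(text: str, out_dir) -> list:
    """Write each table to <out_dir>/public/<TABLE>.parquet (layout assumed)."""
    import pandas as pd  # imported lazily: to_parquet also needs pyarrow or fastparquet

    public = Path(out_dir) / "public"
    public.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in load_fixture(text).items():
        path = public / f"{name}.parquet"
        pd.DataFrame(rows).to_parquet(path, index=False)
        written.append(path)
    return written
```

In a pytest test, `write_test_parquet_files(EXAMPLE_FIXTURE, tmp_path)` could be called with pytest's built-in `tmp_path` fixture before running the code under test against the generated files. The column-consistency check catches a likely fixture mistake early, since `pd.DataFrame` would otherwise fill missing columns with NaN without complaint.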

peshence added a commit that referenced this issue Jan 31, 2024
…212  (#271)

* feat[cli]: load images on accessionnumber not null

* remove accession numbers from example parquet files in CLI and system tests for all non-imaging rows