Implementation of --activePeriod feature. #152

Merged
merged 5 commits into from
Apr 14, 2021

bashir2
Collaborator

@bashir2 bashir2 commented Apr 3, 2021

Description of what I changed

This implements issue #128 by adding an --activePeriod feature. For details, see this comment.

E2E test

TESTED:
Ran:

$ java -cp batch/target/fhir-batch-etl-bundled-0.1.0-SNAPSHOT.jar org.openmrs.analytics.FhirEtl \
  --openmrsServerUrl=http://localhost:8099/openmrs --resourceList=Patient,Encounter,Observation \
  --targetParallelism=10 --numFileShards=3 --secondsToFlushFiles=1200 \
  --outputParquetPath=[PATH] --activePeriod=2020-11-10T00:00:00

Then compared the list of patientIds for which historical resources were fetched against the uuids returned by the following MySQL query (this is not a perfect query, but it works for this case):

SELECT person.uuid, patient_id, COUNT(0) AS num_encounters
FROM person, encounter
WHERE person.person_id=encounter.patient_id AND encounter.voided=0
  AND encounter_datetime >= "2020-11-10"
GROUP BY patient_id ORDER BY patient_id

This list includes 171 patients (with the big test DB), and there are 17321 Observations associated with these patients, which was confirmed by this MySQL query:

SELECT COUNT(0)
FROM obs, encounter
WHERE obs.encounter_id=encounter.encounter_id AND obs.voided=0 AND patient_id IN (
  SELECT patient_id FROM encounter WHERE encounter_datetime >= "2020-11-10");
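
The comparison described above amounts to a set difference taken in both directions. A minimal sketch, assuming the pipeline's patient IDs and the query's `person.uuid` values have each been exported to a collection of strings; the class name, method name, and sample values are hypothetical and not part of this PR:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the PR: reports IDs present in one set but not the other.
class PatientIdDiff {

  // Returns the elements of `left` that are missing from `right`, in sorted order.
  static Set<String> missingFrom(Set<String> left, Set<String> right) {
    Set<String> diff = new TreeSet<>(left);
    diff.removeAll(right);
    return diff;
  }

  public static void main(String[] args) {
    // Placeholder values; real inputs would be the fetched patient IDs and the SQL uuids.
    Set<String> fromPipeline = Set.of("uuid-a", "uuid-b", "uuid-c");
    Set<String> fromSql = Set.of("uuid-a", "uuid-b", "uuid-d");
    System.out.println("Only in pipeline: " + missingFrom(fromPipeline, fromSql));
    System.out.println("Only in SQL: " + missingFrom(fromSql, fromPipeline));
  }
}
```

Two empty differences would indicate that both sources agree on the set of active patients.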

Also ran this to check the two-date version of --activePeriod:

$ java -cp batch/target/fhir-batch-etl-bundled-0.1.0-SNAPSHOT.jar org.openmrs.analytics.FhirEtl \
  --openmrsServerUrl=http://localhost:8099/openmrs --resourceList=Patient,Encounter,Observation \
  --targetParallelism=10 --numFileShards=3 --secondsToFlushFiles=1200 \
  --outputParquetPath=[PATH] --activePeriod=2020-11-10T00:00:00_2020-11-20
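
The two commands above pass either a single date or two dates joined by an underscore. A rough sketch of how such a value could be split and validated is below. It is not the code in this PR: the class and method names are hypothetical, and treating a bare date as the start of that day is an assumption made only for illustration.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the PR's parser: splits an --activePeriod value into its bounds.
class ActivePeriodSketch {

  // Parses one bound; a bare date is assumed (for illustration) to mean the start of that day.
  static LocalDateTime parseBound(String text) {
    if (text.contains("T")) {
      return LocalDateTime.parse(text);
    }
    return LocalDate.parse(text).atStartOfDay();
  }

  // Returns one bound for "START" or two bounds for "START_END".
  static List<LocalDateTime> parse(String activePeriod) {
    String[] parts = activePeriod.split("_", -1);
    if (parts.length > 2) {
      throw new IllegalArgumentException("Expected START or START_END, got: " + activePeriod);
    }
    List<LocalDateTime> bounds = new ArrayList<>();
    for (String part : parts) {
      bounds.add(parseBound(part));
    }
    if (bounds.size() == 2 && bounds.get(1).isBefore(bounds.get(0))) {
      throw new IllegalArgumentException("End precedes start in: " + activePeriod);
    }
    return bounds;
  }

  public static void main(String[] args) {
    System.out.println(parse("2020-11-10T00:00:00"));
    System.out.println(parse("2020-11-10T00:00:00_2020-11-20"));
  }
}
```

Malformed dates surface as `DateTimeParseException` from `java.time`, so the sketch only adds explicit checks for the number of parts and the ordering of the two bounds.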

Checklist: I completed these to help reviewers :)

  • My IDE is configured to follow the code style of this project.

    No? Unsure? -> configure your IDE, format the code and add the changes with git add . && git commit --amend

  • I am familiar with Google Style Guides for the language I have coded in.

    No? Please take some time and review Java and Python style guides. Note, when in conflict, OpenMRS style guide overrules.

  • I have added tests to cover my changes. (If you refactored existing code that was well tested you do not have to add tests) [More unit-tests as TODO.]

    No? -> write tests and add them to this commit git add . && git commit --amend

  • I ran mvn clean package right before creating this pull request and added all formatting changes to my commit.

  • All new and existing tests passed.

    No? -> figure out why and add the fix to your commit. It is your responsibility to make sure your code works.

  • My pull request is based on the latest changes of the master branch.

    No? Unsure? -> execute command git pull --rebase upstream master


codecov bot commented Apr 3, 2021

Codecov Report

Merging #152 (85a7e79) into master (a65f482) will decrease coverage by 7.26%.
The diff coverage is 7.09%.

@@             Coverage Diff              @@
##             master     #152      +/-   ##
============================================
- Coverage     42.76%   35.50%   -7.27%     
- Complexity       97      102       +5     
============================================
  Files            21       24       +3     
  Lines           774      969     +195     
  Branches         67       87      +20     
============================================
+ Hits            331      344      +13     
- Misses          421      597     +176     
- Partials         22       28       +6     
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...ava/org/openmrs/analytics/FetchPatientHistory.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...main/java/org/openmrs/analytics/FetchPatients.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| .../java/org/openmrs/analytics/FetchSearchPageFn.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...h/src/main/java/org/openmrs/analytics/FhirEtl.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (ø) |
| ...n/java/org/openmrs/analytics/DebeziumListener.java | 59.64% <ø> (ø) | 5.00 <0.00> (ø) |
| ...ain/java/org/openmrs/analytics/FhirSearchUtil.java | 15.45% <2.38%> (-43.17%) | 7.00 <4.00> (+1.00) ⬇️ |
| ...ain/java/org/openmrs/analytics/FetchResources.java | 22.22% <22.22%> (ø) | 2.00 <2.00> (?) |
| ...main/java/org/openmrs/analytics/JdbcFetchUtil.java | 59.03% <50.00%> (ø) | 11.00 <0.00> (ø) |
| ...c/main/java/org/openmrs/analytics/OpenmrsUtil.java | 70.00% <50.00%> (-11.82%) | 6.00 <2.00> (+2.00) ⬇️ |

... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@bashir2 bashir2 changed the title from "Date review" to "Implementation of --activePeriod feature." Apr 3, 2021
@bashir2 bashir2 requested a review from kimaina April 3, 2021 09:06
throw new IllegalArgumentException("--activePeriod is not supported in JDBC mode.");
}
Set<String> resourceSet = Sets.newHashSet(options.getResourceList().split(","));
if (resourceSet.contains("Patinet")) {
Collaborator

Patient?

Collaborator Author

Done.
Please note that I changed the way Patient resources are handled in the --activePeriod mode. Hence, this check is now different.


@VisibleForTesting
static String getSubjectPatientIdOrNull(Resource resource) {
String patinetId = null;
Collaborator

NIT: patientId

Collaborator Author

Done.

@@ -24,7 +24,7 @@ ENV OPENMRS_PASSWORD="Admin123"
 ENV SINK_PATH=""
 ENV SINK_USERNAME=""
 ENV SINK_PASSWORD=""
-ENV SEARCH_LIST="Patient,Encounter,Observation"
+ENV RESOURCE_LIST="Patient,Encounter,Observation"
Collaborator

Collaborator Author

Done and thanks for catching this. Also updated the JDBC code-paths for consistency.

Collaborator

@kimaina kimaina left a comment

This looks great, just a few NITs. I'll also comment on the results from yesterday's discussion!

Collaborator Author

@bashir2 bashir2 left a comment

Given the tests we did last week, I changed this new feature to fetch Patient resources separately. The reason is that fetching all Patients may still time out if there are millions of them. With the new implementation, only those Patient resources that have an Observation/Encounter in the active period are fetched.
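
The idea of fetching only the Patients referenced by resources in the active period can be illustrated with a small sketch. This is not the implementation in this PR: it works on plain reference strings rather than FHIR resource objects, the class and method names are hypothetical, and it assumes relative references of the form `Patient/<id>` (absolute URLs and versioned references are ignored).

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not the PR's code: derives the set of "active" patient IDs
// from the subject references of already-fetched Encounters and Observations.
class ActivePatientsSketch {

  static final String PREFIX = "Patient/";

  // Returns the ID from a reference like "Patient/123", or null for anything else.
  static String patientIdOrNull(String reference) {
    if (reference == null || !reference.startsWith(PREFIX)
        || reference.length() == PREFIX.length()) {
      return null;
    }
    return reference.substring(PREFIX.length());
  }

  // Collects the distinct patient IDs; only these Patients would then be fetched.
  static Set<String> activePatientIds(List<String> subjectReferences) {
    Set<String> ids = new TreeSet<>();
    for (String reference : subjectReferences) {
      String id = patientIdOrNull(reference);
      if (id != null) {
        ids.add(id);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    // Placeholder references standing in for Encounter/Observation subjects.
    List<String> refs = List.of("Patient/p1", "Patient/p2", "Patient/p1", "Group/g1");
    System.out.println(activePatientIds(refs));
  }
}
```

Because the result is a set, a patient with many Encounters or Observations in the period is fetched once, which keeps the number of Patient requests proportional to the number of active patients rather than to the size of the whole Patient table.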

Collaborator

@kimaina kimaina left a comment

Thanks for addressing the suggestion! This looks great!

@bashir2 bashir2 merged commit a540b5f into google:master Apr 14, 2021
@bashir2 bashir2 deleted the date_review branch April 22, 2021 19:48