Fix the number of parquet files generation for Direct Runner #1063

Closed
chandrashekar-s opened this issue May 14, 2024 · 0 comments · Fixed by #1070
Assignees: bashir2
Labels: bug (Something isn't working), P1:must (An issue that definitely needs to be implemented in the near future.)

Comments

@chandrashekar-s (Collaborator)

With the DirectRunner, the number of Parquet files created for each resource type used to equal the parallelism parameter passed in FhirEtlOptions. However, after the changes made in this PR, the number of Parquet files created equals the number of FHIR resources: for a total of 12K Observations, we now create 12K Parquet files. Reading back that many small files can cause performance issues, so this has to be fixed.
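The expected behaviour can be illustrated with a small sketch. This is not the pipeline's actual Java code or the fix merged in #1070; it is a hypothetical, language-neutral illustration that assumes resources are grouped into a fixed number of shards (taken from the parallelism setting) before writing, with one Parquet file per non-empty shard. The names `shard_resources` and `count_output_files` are invented for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def shard_resources(resource_ids: Iterable[str], num_shards: int) -> Dict[int, List[str]]:
    """Assign each resource to one of `num_shards` buckets (hypothetical helper).

    If one output file is written per non-empty bucket, the file count is
    bounded by `num_shards` instead of growing with the number of resources.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    shards: Dict[int, List[str]] = defaultdict(list)
    for index, resource_id in enumerate(resource_ids):
        # Round-robin assignment keeps shard sizes within one element of each other.
        shards[index % num_shards].append(resource_id)
    return dict(shards)


def count_output_files(resource_ids: List[str], num_shards: int) -> int:
    # Hypothetical: assumes exactly one Parquet file per non-empty shard.
    return len(shard_resources(resource_ids, num_shards))


if __name__ == "__main__":
    observation_ids = [f"Observation/{i}" for i in range(12_000)]
    print(count_output_files(observation_ids, num_shards=8))
```

With 12,000 Observations and a parallelism of 8, this yields 8 files of 1,500 resources each rather than 12,000 single-resource files. In an Apache Beam pipeline, a comparable knob is a fixed shard count on the file write transform (for example `FileIO.write().withNumShards(...)`), though whether that is the right fix here depends on the pipeline's design.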

@bashir2 self-assigned this on May 14, 2024
@bashir2 added the bug (Something isn't working) and P2:should (An issue to be addressed in a quarter or so.) labels on May 14, 2024
@bashir2 added the P1:must (An issue that definitely needs to be implemented in the near future.) label and removed the P2:should label on May 22, 2024