ArrayIndexOutOfBoundsException in writing Parquet files #168
Comments
a few updates on this.. using |
Tested the pipeline with |
I spent some more time on this but have not managed to reproduce the problem yet. Without a reproducible case it is hard to tell what is wrong, but from more reading of the code (and some searches) it feels that one record (i.e., one FHIR resource) should be the problem; so it is not obvious why setting @kimaina can you please try to reproduce this with your data and then reduce the |
Thanks @bashir2 for following up on this. Let me investigate further and then get back to you. Thanks, |
A few updates... Running using 1 thread led to this
Here is the specific obs
|
Thanks @kimaina, but this Observation resource should not fail the pipeline. It has the BigDecimal conversion bug (Issue #156), but that bug has a try/catch guard and does not fail the pipeline (as the first line of your log dump shows too, i.e., |
Thanks @kimaina, #188 is really helpful. So the core issue seems to be #156, and the try/catch we added before is creating other issues. So the TL;DR is that we have to properly fix #156 (adding another more generic The longer version is that all 4 of the resources you have in the new Observation bundle in #188 have the #156 bug. If you drop any of them, we cannot reproduce #168 anymore. So it seems that when the AvroTypeException in #156 happens, we skip some of the required final steps for writing the Avro record, e.g., this call and that causes the I think I know the root cause of #156, which is a discrepancy between the value scale and schema scale that Bunsen sets for |
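To make the "value scale vs. schema scale" discrepancy concrete, here is a minimal sketch using only java.math.BigDecimal. It is not the Bunsen or Avro code; the class name, the helper names, and the schema scale of 6 are assumptions chosen for illustration. It only shows how a value such as 25.5 (scale 1) differs from what a fixed-scale decimal schema expects, and how the value could be rescaled without losing digits before it is handed to a writer.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalScaleSketch {

    // Hypothetical schema scale, picked only for this illustration.
    static final int SCHEMA_SCALE = 6;

    // True when the value already has exactly the scale the schema declares.
    static boolean matchesSchemaScale(BigDecimal value, int schemaScale) {
        return value.scale() == schemaScale;
    }

    // Rescales the value to the schema scale. RoundingMode.UNNECESSARY makes
    // this throw ArithmeticException instead of silently dropping digits.
    static BigDecimal fitToScale(BigDecimal value, int schemaScale) {
        return value.setScale(schemaScale, RoundingMode.UNNECESSARY);
    }

    public static void main(String[] args) {
        BigDecimal raw = new BigDecimal("25.5"); // scale is 1, not 6
        System.out.println("matches schema scale: "
                + matchesSchemaScale(raw, SCHEMA_SCALE));

        // Padding with zeros never needs rounding, so this succeeds.
        BigDecimal fitted = fitToScale(raw, SCHEMA_SCALE);
        System.out.println(fitted.toPlainString() + " scale=" + fitted.scale());

        try {
            // Eight fractional digits cannot fit in scale 6 without rounding.
            fitToScale(new BigDecimal("1.23456789"), SCHEMA_SCALE);
        } catch (ArithmeticException e) {
            System.out.println("cannot fit without rounding");
        }
    }
}
```

Whether rescaling like this belongs in the pipeline or in Bunsen, and whether rounding should ever be allowed, is a design decision this sketch does not settle.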
Precisely!
Agreed, Bunsen fix should resolve this problem! |
Let's close this issue and track this under #156. |
This is reported by @kimaina when running the batch pipeline on AMPATH DB for some specific dates. The stack trace for the exception is copied below. The error is from this line. To find the root cause we need some FHIR resources that trigger this; but while investigating this, I realized we are using a very old version of parquet-column, which we should fix regardless of the root cause.