Really poor performance with streaming parser #79
Comments
Can you please add your query? |
The issue is actually separate from any query. The problem is in
|
I created a branch with your message and added a performance benchmark. I am running a 2.2 GHz i7 2016 MacBook Pro, and here are the benchmark results I got:
|
Right, this is the workaround. In your benchmark, you load the entire stream into memory first, and then use a If you change the benchmark to use the StreamTextReader directly, like this:
then I think you'll run into the bug I'm seeing. So while not using the StreamTextReader is a potential solution, there still exists a bug in the stream parsing logic for really long lines. |
Parsing an HL7 file with really long lines results in unusable performance. Most of the ORU_R01 messages I deal with contain OBX segments with embedded, base64-encoded PDF files, which results in lines that are millions of bytes long. The problem is that StreamTextCursor.ParseText does a full parse every time it loads a new chunk, because it reaches the end of the data before it reaches the end of the line.
Here is an HL7 attachment that can help illustrate the issue. It's parsed like this:
HR7Message.txt
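To make the cost of the described behavior concrete, here is a rough, language-neutral sketch in Python. It is not the library's code: the function names `find_line_rescanning` and `find_line_resuming` are hypothetical, and a plain search for the carriage-return segment terminator stands in for the real parse. It only counts how many characters get examined when the buffer is re-scanned from the start after every chunk, versus resuming where the previous scan stopped.

```python
import io


def find_line_rescanning(stream, chunk_size):
    """Hypothetical slow pattern: re-scan the whole buffer after every chunk."""
    buffer = ""
    scanned = 0  # total characters examined while looking for the terminator
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return buffer, scanned  # end of data without a terminator
        buffer += chunk
        # Start over from position 0 each time new data arrives.
        end = buffer.find("\r")
        scanned += len(buffer) if end == -1 else end + 1
        if end != -1:
            return buffer[:end], scanned


def find_line_resuming(stream, chunk_size):
    """Hypothetical alternative: only examine data that has not been scanned yet."""
    buffer = ""
    scanned = 0
    resume_at = 0  # everything before this offset is known to hold no terminator
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return buffer, scanned
        buffer += chunk
        end = buffer.find("\r", resume_at)
        scanned += (len(buffer) if end == -1 else end + 1) - resume_at
        if end != -1:
            return buffer[:end], scanned
        resume_at = len(buffer)


# A made-up OBX-like segment with a long payload, terminated by a carriage return.
segment = "OBX|1|ED|" + "A" * 100_000 + "\r"
line_a, cost_rescan = find_line_rescanning(io.StringIO(segment), 1000)
line_b, cost_resume = find_line_resuming(io.StringIO(segment), 1000)
```

Both functions return the same line, but the re-scanning version examines a number of characters that grows roughly with the square of the line length, while the resuming version examines each character once. With multi-megabyte base64 payloads, that difference would be consistent with the slowdown reported here, assuming the parser behaves as described above.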