Feature/stream decode #612
Conversation
Force-pushed from 69ab6f7 to 85ae6b8
Hi @raeldor, excellent work! Thank you. At the moment, two test cases fail, so we need to adjust the code to make it work for queries with only column selections and for queries that include attributes.
Force-pushed from 85ae6b8 to 80ceebe
@raeldor The idea is wonderful! I didn't know such things existed. Thank you for bringing it in.
Hi @rkvinoth, I would like to share my findings regarding the memory footprint and performance of the new approach. When querying a dataset of 1.5M cells, the memory of the Python process didn't go above 700MB, while with the old code it went as high as 5GB. In terms of performance, the new approach is slightly slower. Based on that data, should we perhaps make the iterative JSON parsing optional?
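As a hedged sketch, such a memory comparison can be reproduced with the standard library's `tracemalloc`. The payload below is a synthetic stand-in for a large cellset response, not real TM1 output, and only the old whole-dict approach is measured here (the streaming side would need the third-party ijson package):

```python
import json
import tracemalloc

def peak_memory(fn, *args):
    """Return the peak bytes allocated while running fn(*args)."""
    tracemalloc.start()
    fn(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

# Synthetic stand-in for a large cellset response (invented, NOT real TM1 output).
payload = json.dumps({"Cells": [{"Ordinal": i, "Value": i * 1.5} for i in range(50_000)]})

def parse_whole(text):
    # Old approach: materialise the full dict in memory, then aggregate.
    return sum(cell["Value"] for cell in json.loads(text)["Cells"])

print(f"peak bytes, whole-dict parse: {peak_memory(parse_whole, payload)}")
```

With ijson, the equivalent streaming consumer would iterate over `ijson.items(response, 'Cells.item.Value')` and never hold the full document in memory, which is where the reported 700MB-vs-5GB gap comes from.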
@MariusWirtz Thank you for the performance test. I agree that the new parsing method should be made optional. Not all systems worry about memory and most of us parallelize the query.
I have not had much time to review the code, but I did read about ijson a bit.
Looks like we have some opportunities to speed it up:
https://github.com/ICRAR/ijson#id21
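The linked README section covers ijson's interchangeable backends; the C-based `yajl2_c` backend is usually the fastest. A small, hedged helper (the function name is invented) that picks the fastest backend available and returns `None` when ijson is not installed at all:

```python
import importlib

def fastest_ijson_backend():
    """Try ijson backends from fastest to slowest; None if ijson is absent."""
    candidates = (
        "ijson.backends.yajl2_c",     # C extension, usually fastest
        "ijson.backends.yajl2_cffi",  # cffi-based fallback
        "ijson",                      # default import, falls back to pure Python
    )
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None

backend = fastest_ijson_backend()
print("using:", getattr(backend, "__name__", "no ijson available"))
```

All backends expose the same `parse`/`items` API, so swapping them in is a one-line change at the import site.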
Do we have a path to merge this branch eventually?
I need to fix the code so that it doesn't break the two tests. Then we merge. I plan to get to it next week at the earliest. If someone wants to go ahead and fix it, feel free to create a pull request based on the latest commit.
…N to CSV by streaming rather than using dict to reduce memory usage on large data sets.
Force-pushed from d445ef6 to e41e372
Force-pushed from e41e372 to 9276ed4
Force-pushed from 3f7c12c to a4a321b
Unfortunately, I haven't been able to make it work with attributes, so for now the code guards against that combination:

```python
if iterative_json_parsing and include_attributes:
    raise ValueError("Iterative JSON parsing must not be used together with include_attributes")
```
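A minimal, self-contained sketch of how such a guard behaves at the call site. The function name and parameters here are hypothetical placeholders, not TM1py's actual signature:

```python
def execute_mdx_csv(mdx, iterative_json_parsing=False, include_attributes=False):
    # Hypothetical signature: streaming and attribute support are mutually exclusive.
    if iterative_json_parsing and include_attributes:
        raise ValueError("Iterative JSON parsing must not be used together with include_attributes")
    mode = "streaming" if iterative_json_parsing else "dict"
    return f"parsed {mdx!r} via {mode} parser"

print(execute_mdx_csv("SELECT ...", iterative_json_parsing=True))

try:
    execute_mdx_csv("SELECT ...", iterative_json_parsing=True, include_attributes=True)
except ValueError as exc:
    print("rejected:", exc)
```

Failing loudly at the entry point keeps callers from silently getting a CSV with the attribute columns missing.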
@raeldor
Could you provide some more detail on what the trouble seems to be?
Thanks for asking. Perhaps we will figure this out together if we chat about it. For the standard fields, these are the prefixes of interest:

```python
prefixes_of_interest = ['Cells.item.Value', 'Axes.item.Tuples.item.Members.item.Name',
                        'Cells.item.Ordinal', 'Axes.item.Tuples.item.Ordinal', 'Cube.Dimensions.item.Name',
                        'Axes.item.Ordinal']
```

To catch the attributes, we would need to add prefixes for (potential) attributes, such as:

```python
prefixes_of_interest.append('Axes.item.Tuples.item.Members.item.Attributes.Color')
prefixes_of_interest.append('Axes.item.Tuples.item.Members.item.Attributes.Size')
prefixes_of_interest.append('Axes.item.Tuples.item.Members.item.Attributes.Manager')
```

and catch the events, kinda like this:

```python
elif (prefix, event) == ('Axes.item.Tuples.item.Members.item.Attributes.Color', 'string'):
    attribute_name = "Color"
    attribute_value = value  # e.g., 'red', 'green'
```

I managed to find the attribute-name and attribute-value pairs in the JSON, but I failed to consume them and integrate them into the CSV, not to mention getting the CSV header line right. I will push my current state into a WIP PR. Feedback and contributions are very welcome. I would like to get this to work; it just seemed a hard nut to crack, so I decided to move on for the moment.
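To make the prefix/event discussion concrete, here is a toy, stdlib-only stand-in for ijson's `(prefix, event, value)` stream. Real code would call `ijson.parse` on the response stream instead of walking a pre-loaded dict, and the sample JSON below is invented:

```python
import json

def events(obj, prefix=""):
    """Yield (prefix, event, value) triples, roughly mimicking ijson.parse."""
    if isinstance(obj, dict):
        yield prefix, "start_map", None
        for key, val in obj.items():
            yield prefix, "map_key", key
            child = f"{prefix}.{key}" if prefix else key
            yield from events(val, child)
        yield prefix, "end_map", None
    elif isinstance(obj, list):
        yield prefix, "start_array", None
        item_prefix = f"{prefix}.item" if prefix else "item"
        for val in obj:
            yield from events(val, item_prefix)
        yield prefix, "end_array", None
    else:
        # Toy simplification: ijson distinguishes more scalar events (null, boolean).
        kind = "number" if isinstance(obj, (int, float)) else "string"
        yield prefix, kind, obj

doc = json.loads(
    '{"Axes": [{"Tuples": [{"Members": '
    '[{"Name": "UK", "Attributes": {"Color": "red"}}]}]}]}'
)
colors = [value for prefix, event, value in events(doc)
          if (prefix, event) == ("Axes.item.Tuples.item.Members.item.Attributes.Color", "string")]
print(colors)  # ['red']
```

Note that the attribute value arrives on the value event whose prefix ends in the attribute name; the `map_key` event fires one level up, on the `Attributes` map itself, which is likely why catching `map_key` on the full attribute prefix never matches.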
Continuation of #606
Based on 5e89a15