I see from the code that record batches are parsed in parallel when the Julia runtime has multithreading enabled, which is great, especially since decompression can be a fairly intensive computation.
However, the implementation does not guarantee that batches come back in the order they were written, which I think is not ideal. I'm not sure what the Arrow spec says about this, but I'm dealing with time series data recorded batch by batch, where the order matters a lot.
I'd like to draft a PR that preserves batch order. Since I'm just starting to tinker with the codebase, I'm filing this issue to ask for your opinions first.
(By the way, I'm also working on a PR for #293, which is orthogonal in functionality but closely related in implementation details. I think two separate PRs would be clearer for review and release purposes, but if you can accept a single PR addressing both, it would be a lot easier for me, since I'm not fluent in git rebasing and related skills.)
We already have a utility defined (`OrderedChannel`) that we use when
writing record batches to ensure batches get _written_ in the same order
they are provided; it makes sense to use the same utility when reading
to ensure incoming record batches are _read_ in the appropriate order.
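
To illustrate the idea, here is a minimal sketch of an order-enforcing channel. It is not the actual `OrderedChannel` from Arrow.jl, whose fields and signatures may differ; `SimpleOrderedChannel`, `put_ordered!`, and `demo` are hypothetical names made up for this example. Each producer task passes the index of its batch and blocks until that index is the next one expected, so results land in the channel in index order no matter which task finishes first.

```julia
# Hypothetical illustration only; not Arrow.jl's real OrderedChannel.
struct SimpleOrderedChannel{T}
    chan::Channel{T}              # holds results in their final order
    cond::Threads.Condition       # guards `next` and wakes waiting producers
    next::Base.RefValue{Int}      # index of the next item allowed in
end

SimpleOrderedChannel{T}(sz::Int) where {T} =
    SimpleOrderedChannel{T}(Channel{T}(sz), Threads.Condition(), Ref(1))

# Block until it is item `i`'s turn, then put `x` into the channel.
function put_ordered!(ch::SimpleOrderedChannel{T}, x::T, i::Int) where {T}
    lock(ch.cond)
    try
        while ch.next[] != i
            wait(ch.cond)         # releases the lock while waiting
        end
        put!(ch.chan, x)
        ch.next[] += 1
        notify(ch.cond)           # wake the other waiting producers
    finally
        unlock(ch.cond)
    end
    return nothing
end

# Spawn tasks in reverse order with random delays (a stand-in for
# batch parsing/decompression) and collect the results.
function demo(n::Int)
    ch = SimpleOrderedChannel{Int}(n)
    @sync for i in n:-1:1
        Threads.@spawn begin
            sleep(rand() / 100)
            put_ordered!(ch, i * i, i)
        end
    end
    close(ch.chan)
    return collect(ch.chan)
end
```

Calling `demo(5)` should return the squares in index order, even though the tasks are started in reverse and finish at random times. A simpler alternative, when all batches are known up front, is to spawn one task per batch and `fetch` the tasks in their original order.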
arrow-julia/src/table.jl
Lines 276 to 293 in 614fce0