Proper stream support #15
I haven’t quite figured out how to accomplish this with transforms. It appears that whenever any transform (e.g. …
@andylolz that sounds about right. We basically need a sensible way to say EOF (or rather EOS, End of Stream). The complication used to be that we returned null to indicate “drop this row”, so we couldn’t use null as the indicator for end of file; but maybe that is now possible with the latest refactors :-)
Yep – so now we have to explicitly push rows downstream – not pushing the row indicates “drop this row”.
@andylolz can we close this issue now? My understanding from your comments is that we've fixed the issue this was addressing. @andylolz one thing it would be good to clarify though: do we still end up processing all the rows in a file even when we are finished? E.g. suppose we have head as an op: then we basically want to stop parsing any new rows after the first X. How does this work at the moment? How does a transform tell things upstream to stop processing? This is important for large files: if you give me a 1GB CSV and I do head on it, I don't want to keep streaming the whole CSV!
Yep – agreed. I think this can be closed.
Nope. Anyway, you can actually test this out and see. Here’s a 20MB file… Let’s delete rows 500-1000, and then take the first 20 rows. In practice, we don’t ever perform the … Pretty cool.
@andylolz somehow missed this comment first time round. This is super cool. So, to confirm: we're sure the example you gave only ends up processing a tiny fraction of the 20MB file as a whole (is there a way for us to check this?). If so, can we close this :-) And again: add a nice comment to the front page about this awesome feature with your example. (Btw, if we did go with your example, I think it would be even nicer to have you delete the first 500 rows; it makes more sense given the head of 20.)
Aha… adding some logging shows this is not quite working yet :\ Although the response completes, we currently continue processing the whole file. I’ll have another look at this one.
@andylolz cool - I'll be on #okfn much of the day if you want to chat!
At the moment, operations like head read the whole file even if we only want the first part.
Instead, we should allow cutting off in a sensible way. To see how to hack this up with the csv module, see https://github.com/okfn/data.okfn.org/blob/78dc75bad035c36744b46badb258fbf63ed8d016/routes/index.js#L84