Fix reading CSV from non seekable network stream #1472
Merged
When using the `FSharp.Data.Csv.Core` package to read massive CSV data from an AWS S3 bucket, we ran into issues where many rows were sometimes lost.

The problem seems to be related to the use of `StreamReader.Peek()`. With the current implementation, reading stops entirely as soon as `StreamReader.Peek()` returns `-1`. The documentation states that `-1` is returned not only at the end of the stream, but also if the stream is not seekable and the stream doesn't read all the data that was requested.

This can be fixed by using `StreamReader.Read()` instead, because this method is blocking. I have also tried to add a unit test that simulates the problem with a non-seekable stream that always reads just 1 byte when calling `Stream.Read(buffer, offset, count)`.
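For illustration, here is a minimal sketch of the kind of stream described above, together with a reading loop that relies only on the blocking `Read()`. The names `TrickleStream` and `readAll` are hypothetical and chosen for this example. They are not taken from the PR or from the library, and the actual unit test and parser change may look different.

```fsharp
open System
open System.IO
open System.Text

/// Hypothetical helper: a forward-only, non-seekable stream that hands out
/// at most one byte per Read call, similar to a slow network stream.
type TrickleStream(data: byte[]) =
    inherit Stream()
    let mutable position = 0
    override _.CanRead = true
    override _.CanSeek = false
    override _.CanWrite = false
    override _.Length : int64 = raise (NotSupportedException())
    override _.Position
        with get () : int64 = raise (NotSupportedException())
        and set (value: int64) = raise (NotSupportedException())
    override _.Flush() = ()
    override _.Seek(offset: int64, origin: SeekOrigin) : int64 =
        raise (NotSupportedException())
    override _.SetLength(value: int64) : unit = raise (NotSupportedException())
    override _.Write(buffer: byte[], offset: int, count: int) : unit =
        raise (NotSupportedException())
    override _.Read(buffer: byte[], offset: int, count: int) : int =
        if count = 0 || position >= data.Length then
            0 // nothing requested, or true end of stream
        else
            // Deliberately return a single byte, even if more was requested.
            buffer.[offset] <- data.[position]
            position <- position + 1
            1

/// Reads every character with the blocking Read(), treating -1 as the end
/// of the stream. Peek() is intentionally not used here.
let readAll (reader: TextReader) : string =
    let sb = StringBuilder()
    let mutable c = reader.Read()
    while c <> -1 do
        sb.Append(char c) |> ignore
        c <- reader.Read()
    sb.ToString()

let csv = "a,b\n1,2\n3,4\n"

let text =
    let stream = new TrickleStream(Encoding.UTF8.GetBytes csv) :> Stream
    use reader = new StreamReader(stream)
    readAll reader

printfn "%d of %d characters read" text.Length csv.Length
```

The loop ends only when `Read()` itself reports `-1`, so a short read from the underlying stream does not cut the input off early.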