If you have a line longer than the 1MB buffer length (don't ask), the `scannerloop`'s `scanner.Scan()` `for` condition will evaluate to false. When this happens, line counting for the current file stops where it is and reports incorrect results for that file.
gocloc/file.go, line 90 in 7b24285
I could see a few fixes for this.
Scanning the files ahead of time to find the longest gap between line endings and then automatically setting that as the buffer size. This does require reading the file twice though.
Changing the `scannerloop` to use something like mmap instead of scanner.
If you're interested in the third one, let me know and I'll work on a PR.
The first one probably touches a bit more of the overall design than I should take on for a first PR.
I think the second one is safe, but it does double the I/O required. Disk caching may make this cheaper than actually reading twice the raw data from disk, but it still feels like a last resort.