scannerloop stops after encountering a very long line #81

timmattison · 2024-01-31T13:52:14Z

If a file contains a line longer than the 1MB scanner buffer (don't ask), the for scanner.Scan() condition in the scannerloop evaluates to false at that line. bufio.Scanner reports this case as bufio.ErrTooLong through scanner.Err(). When this happens, line counting for the current file stops at that point and the totals reported for that file are incorrect.

gocloc/file.go, line 90 in 7b24285:

	for scanner.Scan() {

I can see a few possible fixes:

1. Add a new option that sets the maximum buffer size, keeping the current 1MB as the default when the option is unset:

	if opts.MaxLineLength > 0 {
		scanner.Buffer(buf.Bytes(), opts.MaxLineLength)
	} else {
		scanner.Buffer(buf.Bytes(), 1024*1024)
	}

2. Scan each file ahead of time to find the longest gap between line endings, then automatically use that as the buffer size. This does require reading the file twice, though.
3. Change the scannerloop to use something like mmap instead of scanner.

If you're interested in the third one let me know and I'll work on a PR.
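As a rough idea of what the third option could look like: once the whole file is available as a single []byte, which is what a memory mapping would provide, lines can be walked with bytes.IndexByte and there is no per-line limit at all. The sketch below leaves the mapping itself out and works on any byte slice; forEachLine and countLines are hypothetical names, and unlike bufio.ScanLines it does not strip a trailing '\r'.

```go
package main

import "bytes"

// forEachLine calls fn once per line in data, without the trailing '\n'.
// data stands in for a memory-mapped file; any []byte works.
func forEachLine(data []byte, fn func(line []byte)) {
	for len(data) > 0 {
		i := bytes.IndexByte(data, '\n')
		if i < 0 {
			fn(data) // final line without a trailing newline
			return
		}
		fn(data[:i])
		data = data[i+1:]
	}
}

// countLines counts the lines in data regardless of how long each one is.
func countLines(data []byte) int {
	n := 0
	forEachLine(data, func([]byte) { n++ })
	return n
}
```

Each line passed to fn is a sub-slice of data rather than a copy, so the per-line work that classifies code, comments, and blanks could run on it directly.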

The first one probably touches a bit more of the overall design than I should take on for a first PR.

I think the second one is safe, but it doubles the I/O required. Disk caching may make the second read cheaper than doubling the raw data read from disk would suggest, but it still feels like a last resort.
