Description
The csv.Reader has no way to limit the size of the fields read. In combination with LazyQuotes, this can lead to cases where a simple syntax error can make the Reader read everything until io.EOF into a single field.
Probably the simplest case (with LazyQuotes == true) is a quoted field where there is some other rune between the closing quote and the comma, e.g. `a,"b" ,c`. In this case the second field will contain all bytes until either a quote followed by a comma or EOF is found. (See my comment in #8059.)
This behaviour can lead to excessive memory usage or even OOM when reading broken CSV files.
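For illustration, here is a small program reproducing the behaviour described above (the input is just the example from this issue, extended with two well-formed lines):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

func main() {
	// The stray space between the closing quote of "b" and the comma makes
	// the second field malformed. With LazyQuotes the quote is treated as
	// literal data and the Reader keeps scanning for a closing quote, so
	// the rest of the input ends up inside that field.
	const input = "a,\"b\" ,c\nd,e,f\ng,h,i\n"

	r := csv.NewReader(strings.NewReader(input))
	r.LazyQuotes = true

	records, err := r.ReadAll()
	fmt.Println("err:", err)
	for i, rec := range records {
		fmt.Printf("record %d: %q\n", i, rec)
	}
}
```

Instead of three records of three fields each, this yields a single record whose second field contains everything after the malformed quote; on a large broken file that field keeps growing until EOF.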
To avoid this I propose adding a new, optional field to Reader that limits the size of each field. When set, the Reader would return an error as soon as it hits the limit.
Alternatively, the limit could be per record instead. This would help especially in situations where FieldsPerRecord == -1, because a record can then contain an effectively unlimited number of fields, so a limit of e.g. 100 bytes per field does not protect against, say, missing newlines turning many records into one record with very many fields.
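To make that concrete, here is a sketch of the kind of input a per-field limit alone would not reject (the field count is arbitrary, only chosen to be large):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

func main() {
	// A single "record" with a million one-byte fields, e.g. the result of
	// lost newlines. Every field stays far below a 100-byte per-field limit,
	// yet the record as a whole occupies megabytes.
	input := strings.Repeat("x,", 1000000) + "x\n"

	r := csv.NewReader(strings.NewReader(input))
	r.FieldsPerRecord = -1 // accept a variable number of fields per record

	rec, err := r.Read()
	fmt.Println("err:", err)
	fmt.Println("fields in record:", len(rec))
}
```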
The new field would be specified as follows (as proposed by @bradfitz):
```go
type Reader struct {
	...

	// BytesPerField optionally specifies the maximum allowed number of bytes
	// per field. Zero means no limit.
	BytesPerField int
}
```
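If the proposal were accepted, usage might look roughly like the sketch below. Note that BytesPerField does not exist in encoding/csv today, the exact error returned on hitting the limit is left open, and the input file name is only a placeholder:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	f, err := os.Open("broken.csv") // placeholder input file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r := csv.NewReader(f)
	r.LazyQuotes = true
	r.BytesPerField = 1 << 20 // proposed field: cap each field at 1 MiB

	for {
		rec, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			// With the proposal, an over-long field would surface here as an
			// error instead of silently growing until EOF.
			log.Fatal(err)
		}
		fmt.Println(len(rec))
	}
}
```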