-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit line length to 100 MB to not exhaust system memory on /dev/zero
#1972
Conversation
@@ -251,7 +251,7 @@ pub(crate) struct InputReader<'a> { | |||
impl<'a> InputReader<'a> { | |||
fn new<R: BufRead + 'a>(mut reader: R) -> InputReader<'a> { | |||
let mut first_line = vec![]; | |||
reader.read_until(b'\n', &mut first_line).ok(); | |||
Self::read_line_safely(&mut reader, &mut first_line).ok(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We ignore errors here as before, but I don't think that is a problem in practice. Not ignoring the error here is likely to cascade into a larger change with higher risk.
fn read_line_safely<R: BufRead + 'a>(reader: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> { | ||
let limit = 100_000_000_usize; | ||
|
||
match reader.take(limit as u64).read_until(b'\n', buf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: take
does not actually take anything, so the name is a bit odd. It just puts an upper limit. See https://doc.rust-lang.org/std/io/trait.Read.html#method.take
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for looking into this!
I believe I'm not in favor of adding this fix. I would like to maintain the drop-in-cat
-replacement property of bat
(and the POSIX compliance, as far as possible). If I understand correctly, this chance could very well cause problems in real-world use cases where bat
is used in a shell pipeline to loop-through large (binary) files.
I have also personally seen single-line JSON files on the order of tens, if not hundreds of MiB. Sure, it's unlikely that someone wants to see 100MiB of syntax highlighted JSON on the terminal. But a generic "100 MB lines are not supported" error seems like an unnecessary limitation.
Oh, right, I actually forgot about the cat compatibility use case when writing this code. I agree it is quite likely that people setup pipelines with huge files that does not have a newline char every 100 MB. Good catch. I can't think if an easy way to make this PR "work", so I'll just close this PR. |
Here is a simple fix for #636 that I believe also allows us to close our oldest open PR #712.
Performance is not affected by this change. Here is git master vs this PR: