Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit line length to 100 MB to not exhaust system memory on /dev/zero #1972

Closed
wants to merge 1 commit into from

Conversation

Enselic
Copy link
Collaborator

@Enselic Enselic commented Dec 6, 2021

Here is a simple fix for #636 that I believe also allows us to close our oldest open PR #712.

% bat /dev/zero
───────┬────────────────────────────────────────────────────
       │ File: /dev/zero   <BINARY>
───────┴────────────────────────────────────────────────────
[bat error]: Lines longer than 100 MB are not supported

Performance is not affected by this change. Here is git master vs this PR:

max-100mb-line-benchmark

@@ -251,7 +251,7 @@ pub(crate) struct InputReader<'a> {
impl<'a> InputReader<'a> {
fn new<R: BufRead + 'a>(mut reader: R) -> InputReader<'a> {
let mut first_line = vec![];
reader.read_until(b'\n', &mut first_line).ok();
Self::read_line_safely(&mut reader, &mut first_line).ok();
Copy link
Collaborator Author

@Enselic Enselic Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We ignore errors here as before, but I don't think that is a problem in practice. Not ignoring the error here is likely to cascade into a larger change with higher risk.

fn read_line_safely<R: BufRead + 'a>(reader: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> {
let limit = 100_000_000_usize;

match reader.take(limit as u64).read_until(b'\n', buf) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: take does not actually take anything, so the name is a bit odd. It just puts an upper limit. See https://doc.rust-lang.org/std/io/trait.Read.html#method.take

Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking into this!

I believe I'm not in favor of adding this fix. I would like to maintain the drop-in-cat-replacement property of bat (and the POSIX compliance, as far as possible). If I understand correctly, this chance could very well cause problems in real-world use cases where bat is used in a shell pipeline to loop-through large (binary) files.

I have also personally seen single-line JSON files on the order of tens, if not hundreds of MiB. Sure, it's unlikely that someone wants to see 100MiB of syntax highlighted JSON on the terminal. But a generic "100 MB lines are not supported" error seems like an unnecessary limitation.

@Enselic
Copy link
Collaborator Author

Enselic commented Dec 8, 2021

Oh, right, I actually forgot about the cat compatibility use case when writing this code. I agree it is quite likely that people setup pipelines with huge files that does not have a newline char every 100 MB.

Good catch. I can't think if an easy way to make this PR "work", so I'll just close this PR.

@Enselic Enselic closed this Dec 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants