
Remove repeated contains calls #1421

Merged 2 commits into sharkdp:master on Dec 21, 2020

Conversation

CosmicHorrorDev (Contributor)

(I can edit the changelog if you want, but this should result in no usage difference, nor is it a bugfix.)

Background

First off, thanks for the great program (not to mention the others you've made that I also use)! I've already recommended it to several of my friends.

To give some background: I was using bat to read a file and pipe the output through some other programs. I was confused about why changes to those programs weren't having any effect until I finally realized that bat was limiting the throughput. I decided to see if there were any obvious changes that could help, and from cargo flamegraph I saw a surprising amount of time spent in HashSet::contains, which I narrowed down to checking the styles for different values. Long story short, the check was being run repeatedly in the hot loop, accounting for a significant portion of the runtime. Moving it outside the loop yielded a decent performance increase and shouldn't change the program's behavior, since the style values shouldn't change while the program is displaying (at least AFAIK).
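
To illustrate, here's a minimal sketch of the pattern (hypothetical names, not bat's actual types): an invariant HashSet::contains lookup hoisted out of the hot loop.

```rust
use std::collections::HashSet;

// Hypothetical simplification of the printing loop.
fn print_lines(lines: &[&str], active_styles: &HashSet<&str>) {
    // Before: `active_styles.contains("numbers")` ran once per line
    // inside the loop. Since the active styles can't change while a
    // file is being displayed, the lookup can be hoisted out:
    let print_numbers = active_styles.contains("numbers");

    for (i, line) in lines.iter().enumerate() {
        if print_numbers {
            print!("{:4} ", i + 1);
        }
        println!("{}", line);
    }
}
```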

(Partially off topic, but does it really make sense to store the styles in a HashSet? I would think that a Vec would be faster unless there are going to be more styles than I would expect.)

Potential Remaining Improvements

The other change that looks like low-hanging fruit would be modifying the line range checks, since they now take up ~10% of the runtime on random.file below. I think the check could boil down to determining ahead of time the next line at which the in-range answer will change, but I haven't dug into it too much; a rough sketch of the idea follows.
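
Something along these lines (a sketch only, with a hypothetical LineRange type rather than bat's actual one):

```rust
// Inclusive line range, e.g. from a --line-range flag.
struct LineRange {
    start: usize,
    end: usize,
}

// Instead of testing every line against every range, return whether
// `line` is inside any range plus the first later line at which that
// answer can change; the hot loop then only re-checks at boundaries.
fn check_with_boundary(ranges: &[LineRange], line: usize) -> (bool, usize) {
    for r in ranges {
        if (r.start..=r.end).contains(&line) {
            return (true, r.end + 1);
        }
    }
    let next_start = ranges
        .iter()
        .map(|r| r.start)
        .filter(|&s| s > line)
        .min()
        .unwrap_or(usize::MAX);
    (false, next_start)
}
```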

After that, the vast majority of the time is spent either reading or writing. I've already read that you don't want to switch to a buffered writer, for valid reasons. A large portion of the reading seems to be taken up by locating the next newline with read_until. Nothing too notable really stuck out from writing; it looks like stdout isn't locked from OutputType, but my attempt at locking it didn't seem to improve performance from what I could tell. Another possible way to improve throughput would be to move reading and writing into separate threads, passing values over some sync_channel (sketched below). Switching reading and writing to async code might essentially do this too? I'm not very familiar with async code, if I'm being honest.
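
For reference, here's the kind of split I have in mind — a self-contained sketch using std's sync_channel, not a patch against bat's actual reader/printer. The bounded channel gives backpressure so the reader can't run arbitrarily far ahead of the writer.

```rust
use std::io::{self, BufRead, Write};
use std::sync::mpsc::sync_channel;
use std::thread;

fn main() -> io::Result<()> {
    // Bounded channel: the reader blocks once 64 lines are in flight.
    let (tx, rx) = sync_channel::<Vec<u8>>(64);

    let reader = thread::spawn(move || -> io::Result<()> {
        let stdin = io::stdin();
        let mut stdin = stdin.lock();
        loop {
            let mut buf = Vec::with_capacity(8 * 1024);
            // read_until keeps the delimiter; 0 bytes read means EOF.
            if stdin.read_until(b'\n', &mut buf)? == 0 {
                break;
            }
            if tx.send(buf).is_err() {
                break; // writer hung up
            }
        }
        Ok(())
    });

    let stdout = io::stdout();
    let mut stdout = stdout.lock();
    for line in rx {
        // Any highlighting/processing would happen here before writing.
        stdout.write_all(&line)?;
    }
    reader.join().unwrap()?;
    Ok(())
}
```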

Basic Benchmarks

All results were run on my rather lightweight laptop: Linux 5.9.11, Intel i5-5200U @ 2.7GHz. I used three different 1GB files, all in a tmpfs (I only have 8GB RAM total, so that's pretty much my limit). zero.file is entirely 0x00 bytes, random.file is from /dev/urandom, and controller_1gb.rs is controller.rs duplicated until it reached 1GB.

# Note: this one caused a spike in memory usage, because the buffer is
# extended to cover the whole file since there's no newline.
❯ ./bat_orig zero.file | pv > /dev/null
1.00GiB 0:00:02 [ 490MiB/s]

❯ ./bat_orig random.file | pv > /dev/null
1.00GiB 0:00:03 [ 272MiB/s]

❯ ./bat_orig controller_1gb.rs | pv > /dev/null
1.00GiB 0:00:25 [40.1MiB/s]

~10% of the time on random.file is spent in HashSet::contains. The remaining time all appears to come from reading and writing.

❯ ./bat_less_hashing zero.file | pv > /dev/null
1.00GiB 0:00:02 [ 493MiB/s]

❯ ./bat_less_hashing random.file | pv > /dev/null
1.00GiB 0:00:03 [ 294MiB/s]

❯ ./bat_less_hashing controller_1gb.rs | pv > /dev/null
1.00GiB 0:00:23 [44.5MiB/s]

Essentially no change for zero.file, as expected. The files that iterate through the loop more show a moderate improvement.

@eth-p (Collaborator) commented Dec 2, 2020

> (Partially off topic, but does it really make sense to store the styles in a HashSet? I would think that a Vec would be faster unless there are going to be more styles than I would expect.)

On my own fork, I toyed around with the idea of using a bitfield for this. I didn't end up doing anything with it, but it might be worth revisiting if the HashSet makes that much of a performance difference.
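
Something along these lines (illustrative component names, not bat's actual StyleComponent variants): membership becomes a single AND instead of a hash lookup.

```rust
// Hand-rolled bitfield over style components.
#[derive(Clone, Copy)]
struct StyleBits(u8);

const NUMBERS: u8 = 1 << 0;
const CHANGES: u8 = 1 << 1;
const GRID: u8 = 1 << 2;
const HEADER: u8 = 1 << 3;

impl StyleBits {
    fn contains(self, component: u8) -> bool {
        self.0 & component != 0
    }
}

fn main() {
    let styles = StyleBits(NUMBERS | GRID);
    assert!(styles.contains(NUMBERS));
    assert!(!styles.contains(CHANGES));
}
```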

@keith-hall (Collaborator) left a comment

Thanks for your contribution and the detailed description/benchmark findings. This PR looks good to me.

@sharkdp (Owner) commented Dec 21, 2020

Thank you very much for your contribution and feedback!

> After that, the vast majority of the time is spent either reading or writing. I've already read that you don't want to switch to a buffered writer, for valid reasons. A large portion of the reading seems to be taken up by locating the next newline with read_until. Nothing too notable really stuck out from writing; it looks like stdout isn't locked from OutputType, but my attempt at locking it didn't seem to improve performance from what I could tell. Another possible way to improve throughput would be to move reading and writing into separate threads, passing values over some sync_channel. Switching reading and writing to async code might essentially do this too? I'm not very familiar with async code, if I'm being honest.

I'm happy about any kind of exploratory work in this respect 👍. See also #712, which I sadly haven't had time to look at so far. Also, if we'd like a really fast throughput mode (i.e. only when syntax highlighting is disabled, of course): https://endler.dev/2018/fastcat/
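
For reference, the trick in that article is the Linux splice(2) syscall, which moves data from the input file to a pipe entirely inside the kernel, skipping userspace copies. A rough sketch using the libc crate (an assumption for illustration — bat doesn't currently do this), applicable only when stdout is a pipe and no processing happens:

```rust
use std::fs::File;
use std::os::unix::io::AsRawFd;

// Copy `file` to stdout via splice(2). Linux-only; stdout must be a
// pipe (as in the `bat file | pv` benchmarks above).
fn splice_to_stdout(file: &File) -> std::io::Result<()> {
    const CHUNK: usize = 16 * 1024;
    loop {
        let n = unsafe {
            libc::splice(
                file.as_raw_fd(),
                std::ptr::null_mut(), // no explicit input offset
                libc::STDOUT_FILENO,
                std::ptr::null_mut(), // no explicit output offset
                CHUNK,
                libc::SPLICE_F_MOVE,
            )
        };
        match n {
            0 => return Ok(()), // EOF
            -1 => return Err(std::io::Error::last_os_error()),
            _ => {} // moved n bytes, keep going
        }
    }
}
```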

@sharkdp merged commit 73cff42 into sharkdp:master on Dec 21, 2020