(I can edit the changelog if you want, but this should cause no usage difference, nor is it a bugfix.)
Background
First off thanks for the great program (not to mention the others you've made that I also use)! I've already recommended it to several of my friends.
So, just to give some background: I was using bat to read a file and pipe the output through some other programs. I was a bit confused why changes to the program weren't making a difference, until I finally realized that bat was limiting the throughput. I decided to see if there were any obvious changes that could help, and from cargo flamegraph I saw a surprising amount of time in HashSet::contains, which I narrowed down to checking the styles for different values. Long story short, the check was being repeated in the hot loop, which accounted for a significant amount of the runtime. Moving the check outside the loop yielded a decent performance increase and shouldn't affect the program's behavior, since the values shouldn't change while the program is displaying (at least AFAIK).
(Partially off topic, but does it really make sense to store the styles in a HashSet? I would think that a Vec would be faster unless there are going to be more styles than I would expect.)
Potential Remaining Improvements
The other change that looks like it could be low-hanging fruit would be modifying the line range checks, since they now take up about 10% of the runtime for random.file below. I think the behavior could boil down to determining ahead of time when the range value will next change, but I haven't dug into it too much.
After that, the vast majority of the time is spent either reading or writing. I've already read that you don't want to switch to a buffered writer for valid reasons. A large portion of the reading does seem to be taken up by searching for the newline using read_until. Nothing too notable really stuck out from writing. It looks like stdout isn't locked from OutputType; however, my attempt at locking it didn't seem to improve performance from what I could tell. Another possible way of improving throughput would be to move reading and writing into separate threads, passing values over some sync_channel? It's possible that switching reading and writing to be async would essentially do this too? I'm not very familiar with async code, if I'm being honest.
Basic Benchmarks
All results were run on my rather lightweight laptop: Linux 5.9.11, Intel i5-5200U @ 2.7GHz. I used three different 1GB files, all in a tmpfs (I only have 8GB RAM total, so that's pretty much my limit). zero.file is entirely 0x00 bytes, random.file is from /dev/urandom, and controller_1gb.rs is controller.rs duplicated until it was 1GB.
~10% of the time for random.file is spent in HashSet::contains. The remaining time all looks to be from reading and writing.
Essentially no change for zero.file, as expected. The files that iterate through the loop more have a moderate improvement.
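For anyone skimming, the kind of change described in the background can be sketched roughly as below. This is only an illustration under assumptions, not the actual diff: Component, render_per_line_lookup and render_hoisted are hypothetical names invented for the example, and the real printer code looks different. It just shows the HashSet lookup being done once before the loop instead of once per line.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the style components; not the real type from bat.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Component {
    Numbers,
    Grid,
}

// Before: the set is queried for every line inside the hot loop.
fn render_per_line_lookup(lines: &[&str], styles: &HashSet<Component>) -> String {
    let mut out = String::new();
    for (i, line) in lines.iter().enumerate() {
        if styles.contains(&Component::Numbers) {
            out.push_str(&format!("{:>4} ", i + 1));
        }
        if styles.contains(&Component::Grid) {
            out.push_str("| ");
        }
        out.push_str(line);
        out.push('\n');
    }
    out
}

// After: the lookups are hoisted out of the loop. This assumes the set of
// styles does not change while the file is being displayed.
fn render_hoisted(lines: &[&str], styles: &HashSet<Component>) -> String {
    let show_numbers = styles.contains(&Component::Numbers);
    let show_grid = styles.contains(&Component::Grid);

    let mut out = String::new();
    for (i, line) in lines.iter().enumerate() {
        if show_numbers {
            out.push_str(&format!("{:>4} ", i + 1));
        }
        if show_grid {
            out.push_str("| ");
        }
        out.push_str(line);
        out.push('\n');
    }
    out
}

fn main() {
    let styles: HashSet<Component> =
        vec![Component::Numbers, Component::Grid].into_iter().collect();
    let lines = ["fn main() {", "}"];

    // Both versions must produce identical output; only the lookup count differs.
    assert_eq!(
        render_per_line_lookup(&lines, &styles),
        render_hoisted(&lines, &styles)
    );
    print!("{}", render_hoisted(&lines, &styles));
}
```

The output is the same either way. The only difference is that the hoisted version hashes each component once per file rather than once per line, which is why files with many lines benefit and zero.file does not.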