ripgrep much slower than (rip)grep #1335
Unfortunately, there is really nothing I can do unless I can reproduce it. I'd suggest trying to reproduce the performance difference on a corpus that we both have access to. I generally find it difficult to read the data you've given to me. I'm not really sure what I'm looking at or the nature of the query or the corpus. For example, how many matches are printed? Do ripgrep and grep print the same output? Do your timings really suggest that multi-threaded ripgrep is slower than single threaded grep? That's pretty crazy to me if true, and suggests something wonky is going on. Is there anything special about your system's configuration?
I also find it interesting that you don't have AVX enabled in your CPU. To me, that suggests your CPU is quite old. What kind of CPU is it and how many cores does it have?
Sorry, I tried to make it readable. So that's a directory listing. To find out what causes the slowdown:
I hope it helps a bit.
Yes, sorry, both print two matches.
Yes, for example with
I wouldn't say so. I'm just working on tmpfs, but that should only make things faster.
Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz
Tried it on the Linux codebase now, but no luck.
Yeah, I know. Later in the day I can run profiling tools on rg.
I tried obfuscating the files, but the timings became distorted to a great extent. Instead, I quickly gathered some syscall statistics.
grep
ripgrep
The syscall frequency table is quite interesting. ripgrep not only appears to be opening more files than grep, but it's also getting errors for a lot of them. A few more thoughts:
The
Is it possible for you to narrow down the corpus? For example, if you run ripgrep (and grep) on each top-level directory in your tree, do they all have similar timings? Hopefully, one of those directories will take much longer to search with ripgrep than grep, which should then let you repeat the process recursively. I think it would be nice if ripgrep grew a flag that allowed it to print the total search time for each individual file. You can almost get it with the
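(For what it's worth, the bisection itself is easy to script. Here's a rough sketch that shells out to rg and times each top-level directory; the pattern and the root are placeholders, and none of this is part of ripgrep:)

// Time an `rg` search over each top-level directory so that the slowest
// subtree can be bisected recursively.
use std::fs;
use std::process::{Command, Stdio};
use std::time::Instant;

fn main() -> std::io::Result<()> {
    let pattern = "PATTERN"; // placeholder query
    for entry in fs::read_dir(".")? {
        let path = entry?.path();
        if !path.is_dir() {
            continue;
        }
        let start = Instant::now();
        Command::new("rg")
            .arg(pattern)
            .arg(&path)
            .stdout(Stdio::null())
            .status()?;
        println!("{:>8.3}s  {}", start.elapsed().as_secs_f64(), path.display());
    }
    Ok(())
}

Repeating this inside the slowest directory quickly narrows the corpus down.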
My naïve first thought (before I got the statistics) was that maybe ripgrep always chooses the next paths to explore in a very unfortunate way. I know nothing about the internal workings of ripgrep, but could an unbalanced directory structure cause something like that?
At first I ran it in a custom
These files contain mostly things that don't exist, so
No.
Ran with
This is what I did and what I'd like to illustrate with my table above. ripgrepping the top-level directory takes 964ms, but the total search time of the individual child files and directories is no more than ~300ms. Take the 4th entry for example. It takes 217ms, but if I ripgrep its children one by one, their total search time is even less. So there is no one big file. There are no binary files, just text files with very looong lines (7.5 million characters).
Hmm. I got very, very weird results.
I mean, compared to grep. Your second entry has
How long are the lines? And do you mean to say there are 7.5 million files? Quite frankly, I'm pretty surprised that grep is searching that entire directory in under a second (if I am to understand your timings correctly). For example, on my checkout of the entire Chromium repository,
So maybe the other idea worth pursuing is whether grep is actually searching all of your data or not. Could you compare
I meant 7.5-million-character-long lines.
Break the command apart at the ':'s and you will see how the values were computed.
I guess 0.919s is at
I know
My next step here would be to keep comparing directories. In your table, there are lots of directories where ripgrep is much slower than grep. I would just keep digging into those directories recursively until you whittle the corpus down. It might also be interesting to try your search in a way where both tools exclude files with very long lines, if that's possible. My guess is that the 7.5-million-character lines have something to do with this. That is extremely unusual for a plain text file, and it's plausible it's exposing pathological behavior in either tool.
Maybe, but on the other hand '\n' is just like any other character, and these lines don't have to be printed. And as an important note:
Why? Look at my
That 7.5M file is one long line. It takes the longest time to scan: 0.05s. Eight lines above it, a 0.0006M file took 0.02s to scan. (I checked the results three times and it's not an isolated case... but it is the most shocking one.)
Nope, that's very wrong. Both grep (AFAIK) and ripgrep require an entire line to be in memory in order to search it. Therefore, when reading from a file, they both (should) buffer until they see EOF or the next '\n'.
Remember, I am just guessing based on my mental model of how these programs work and my interpretation of the data you've presented so far. Both things could be wrong in subtle ways. ;-) I agree that your JSON results are a bit interesting. I'm not sure what to make of them. I think I'm at a loss at this point and would require a reproduction to move further.
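To make the buffering point above concrete, here's a rough sketch of the strategy I'm describing — plain Rust, not the actual grep or ripgrep code, with the chunk size and matching simplified (it assumes a non-empty needle):

// Line-oriented buffered search: read chunks, search only up to the last
// newline seen so far, then carry the trailing partial line forward.
use std::io::Read;

fn count_matching_lines<R: Read>(mut src: R, needle: &[u8]) -> std::io::Result<usize> {
    let mut buf = Vec::new();
    let mut chunk = [0u8; 8192];
    let mut count = 0;
    loop {
        let n = src.read(&mut chunk)?;
        if n == 0 {
            // EOF: whatever remains is the final (unterminated) line.
            if !buf.is_empty() && buf.windows(needle.len()).any(|w| w == needle) {
                count += 1;
            }
            return Ok(count);
        }
        buf.extend_from_slice(&chunk[..n]);
        if let Some(last_nl) = buf.iter().rposition(|&b| b == b'\n') {
            // Search only the complete lines. A line longer than the buffer
            // simply forces the buffer to keep growing until its '\n' arrives.
            for line in buf[..last_nl].split(|&b| b == b'\n') {
                if line.windows(needle.len()).any(|w| w == needle) {
                    count += 1;
                }
            }
            // Move the partial line after the last '\n' to the front.
            buf.copy_within(last_nl + 1.., 0);
            buf.truncate(buf.len() - (last_nl + 1));
        }
    }
}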
Oops, sorry. What I had in mind is that they don't have to be printed, so they don't have to be stored. Silly thing.
I can't believe it, but hurray, ripgrep is faster:
Notes:
So... what if it isn't an issue with ripgrep but an issue with
Seems similar, don't you think?
I poked at the long lines because a 7.5-million-character-long line is highly unusual, so it's just a shot in the dark with respect to it being an anomaly. I cannot come up with any hypothesis that would explain slowness in ripgrep's directory traversal. Multithreaded and single threaded directory traversal use completely different code, so the issue would need to be present in both implementations. I wrote both of them myself, so it's possible. While the syscall traces are interesting, your
I'm afraid the only path forward here is to either recursively whittle the corpus down so that a proper analysis can be done, or to give me a reproduction so that I can investigate myself. Sorry. :-/
Tokei uses the same directory traversal code as ripgrep, so that is an interesting point in favor of that being the slowdown. But I still can't think of a hypothesis to explain it.
Have you tried different queries? e.g.,
Yes, of course, but the results have to be taken with care, because with "complex" regex queries ripgrep can easily beat grep even if it's slowed down by this issue. But generally speaking, yes: if I search for regex-less queries or queries with a moderate amount of regex, I get similar times. In the meantime, I created some simple projects to test
UPDATE: And the futex errors come from
UPDATE2: Walking the directory: 0.150s; parallel: varies highly between 0.125-0.170s. I will try making some changes in the parallel walk code of
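For reference, this kind of walk timing is easy to reproduce against the ignore crate's public API. A minimal sketch (the root path and thread count are placeholders):

// Compare ignore's serial walk with its parallel walk.
use ignore::{WalkBuilder, WalkState};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::time::Instant;

fn main() {
    // Serial walk: a plain iterator over directory entries.
    let start = Instant::now();
    let serial = WalkBuilder::new(".").build().count();
    println!("serial:   {} entries in {:?}", serial, start.elapsed());

    // Parallel walk: each worker thread invokes the callback per entry.
    let count = Arc::new(AtomicUsize::new(0));
    let start = Instant::now();
    WalkBuilder::new(".").threads(4).build_parallel().run(|| {
        let count = Arc::clone(&count);
        Box::new(move |result| {
            if result.is_ok() {
                count.fetch_add(1, Ordering::Relaxed);
            }
            WalkState::Continue
        })
    });
    println!(
        "parallel: {} entries in {:?}",
        count.load(Ordering::Relaxed),
        start.elapsed()
    );
}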
For grins, can you try searching for
Right, yeah. Those shouldn't be happening, though, when running with
Thanks for really digging into this! I'll be quite curious to see what you find!
I searched for every such pattern; ripgrep had higher variance in times (0.92-1.23s) compared to grep (0.29-0.36s). Aside from the workings of the algorithm, this suggests to me that it's an effect of the bug.
I think I got it. Look at the times.
Used a little shitty hack that tells some threads to refuse work if it is a directory, so basically one thread does the directory traversal while the others take care of files. It can have a huge performance penalty, but... it works sometimes. (Contains a probable fix for removing unnecessary
diff --git a/ignore/src/walk.rs b/ignore/src/walk.rs
index df796d4..ecced49 100644
--- a/ignore/src/walk.rs
+++ b/ignore/src/walk.rs
@@ -1161,7 +1161,7 @@ impl WalkParallel {
let num_quitting = Arc::new(AtomicUsize::new(0));
let quit_now = Arc::new(AtomicBool::new(false));
let mut handles = vec![];
- for _ in 0..threads {
+ for i in 0..threads {
let worker = Worker {
f: mkf(),
tx: tx.clone(),
@@ -1177,7 +1177,7 @@ impl WalkParallel {
follow_links: self.follow_links,
skip: self.skip.clone(),
};
- handles.push(thread::spawn(|| worker.run()));
+ handles.push(thread::spawn(move || worker.run(i > 0)));
}
drop(tx);
drop(rx);
@@ -1314,7 +1314,7 @@ impl Worker {
///
/// The worker will call the caller's callback for all entries that aren't
/// skipped by the ignore matcher.
- fn run(mut self) {
+ fn run(mut self, files_only: bool) {
while let Some(mut work) = self.get_work() {
// If the work is not a directory, then we can just execute the
// caller's callback immediately and move on.
@@ -1331,6 +1331,12 @@ impl Worker {
return;
}
}
+
+ if files_only {
+ self.tx.send(Message::Work(work)).unwrap();
+ continue;
+ }
+
let readdir = match work.read_dir() {
Ok(readdir) => readdir,
Err(err) => {
@@ -1470,11 +1476,12 @@ impl Worker {
/// If all work has been exhausted, then this returns None. The worker
/// should then subsequently quit.
fn get_work(&mut self) -> Option<Work> {
+ let mut value = self.rx.try_recv().map_err(|_| ());
loop {
if self.is_quit_now() {
return None;
}
- match self.rx.try_recv() {
+ match value {
Ok(Message::Work(work)) => {
self.waiting(false);
self.quitting(false);
@@ -1514,22 +1521,15 @@ impl Worker {
Err(_) => {
self.waiting(true);
self.quitting(false);
+ // If all threads are waiting, then quit.
if self.num_waiting() == self.threads {
for _ in 0..self.threads {
self.tx.send(Message::Quit).unwrap();
}
- } else {
- // You're right to consider this suspicious, but it's
- // a useful heuristic to permit producers to catch up
- // to consumers without burning the CPU. It is also
- // useful as a means to prevent burning the CPU if only
- // one worker is left doing actual work. It's not
- // perfect and it doesn't leave the CPU completely
- // idle, but it's not clear what else we can do. :-/
- thread::sleep(Duration::from_millis(1));
}
}
}
+ value = self.rx.recv().map_err(|_| ());
}
}
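The get_work hunk is essentially the classic switch from polling-plus-sleep to a blocking receive. In plain std::sync::mpsc terms, a sketch of the pattern only (not the actual ignore code, which also has to decide when all workers are idle):

use std::sync::mpsc::{Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

enum Message {
    Work(u64),
    Quit,
}

// Before: poll, sleeping whenever the queue is momentarily empty. The sleep
// caps the busy-wait but still wakes every worker once per millisecond.
fn consume_polling(rx: &Receiver<Message>) {
    loop {
        match rx.try_recv() {
            Ok(Message::Work(w)) => println!("got work {}", w),
            Ok(Message::Quit) | Err(TryRecvError::Disconnected) => return,
            Err(TryRecvError::Empty) => thread::sleep(Duration::from_millis(1)),
        }
    }
}

// After: block until a message (or a disconnect) actually arrives, so idle
// workers consume no CPU at all while waiting.
fn consume_blocking(rx: &Receiver<Message>) {
    while let Ok(msg) = rx.recv() {
        match msg {
            Message::Work(w) => println!("got work {}", w),
            Message::Quit => return,
        }
    }
}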
Hmmm, but that doesn't explain why single-threaded ripgrep is so much slower than single-threaded grep?
Ehhh. |
Sorry, don't know what you mean. Probably when reading files.
To check the BOM.
From the perf report it seems like ripgrep does nothing but copy memory all the time.
Just because it shows up as a separate syscall. It means a bunch of syscalls, and 0.1s (estimated) in total in my case. Can't a bigger chunk be read?
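What I have in mind is the standard buffering trick: if the small BOM probe were served from a buffer that the main read loop then reuses, it wouldn't cost its own syscall per file. A generic illustration in plain std (the file name is a placeholder; this is not ripgrep's decoder code):

use std::fs::File;
use std::io::{BufReader, Read};

fn main() -> std::io::Result<()> {
    // Unbuffered: this tiny probe is its own read(2) syscall for every file.
    let mut f = File::open("some-file.txt")?;
    let mut bom = [0u8; 3];
    let _ = f.read(&mut bom)?;

    // Buffered: the same probe is served from an 8 KiB internal buffer that
    // the subsequent reads drain first, so no extra syscall is issued.
    let mut r = BufReader::new(File::open("some-file.txt")?);
    let mut bom = [0u8; 3];
    let _ = r.read(&mut bom)?;
    let mut rest = Vec::new();
    r.read_to_end(&mut rest)?;
    Ok(())
}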
I'd have to see it myself.
The answer to these types of questions is almost always "yes." But I don't know what the complexities are off the top of my head, nor do I know what the trade-offs are. It would require someone making the change and measuring it. I think looking at these micro-optimizations is probably a distraction, and I really just do not have the time or patience to go back and forth about them all the time. I realize you're being super helpful here by trying to diagnose a performance problem, but all I have are guesses and bad theories. The most likely explanation, given that you can't reproduce this on other corpora, is that there is something special about your corpus that is provoking some kind of bad behavior in ripgrep.
I also know that this is not what causes the slowdown for me; I just thought you would be interested in it. Anyway, I'm not sure I understand why 10,000 system calls (linear in the number of searched files) that take up 10% of the running time are labeled a micro-optimization. I flattened the directories to rule out my slow-directory-traversal theory.
The last thing I can think of is:
The difference between the two is a file with a 7.5M-character-long line.
perf stats
perf reports
shuffle $files then ripgrep
I hope you can reproduce the issue: clone this gist and run the shell script. (Sorry for the lots of files; it surely can be reproduced with far fewer, but I have no time to experiment with it.)
My new theory
A huge memory reallocation could explain all these symptoms.
Wonderful, thank you. I can finally repro it. This is a very interesting bug.
To establish a baseline and get my bearings, I just tried poking at that single
The first command is a bit deceptive, because ripgrep will open the file using
Another curiosity here is that the maximum memory used by grep is ~2x more than
However, things start to change once we try to search multiple files at once.
The first two commands confirm that using memory maps "fixes" this problem, so
So that seems fine. Digging into this a bit more, running
looks fairly uninteresting and about what I'd expect. Most time is being spent
However, changing the command to
reveals that 90% of the time is being spent in
My instinct tells me that something is going wrong with reusing the
Hmmm... yes... this is starting to make sense. In the
which effectively moves everything at
But copying everything after
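In simplified sketch form (the field names are illustrative, not the verbatim grep-searcher code):

// A line buffer "roll" moves the unread tail to the front so the buffer can
// be refilled. `buf` may have grown huge after one very long line, but only
// buf[pos..end] is still meaningful.
struct LineBuffer {
    buf: Vec<u8>,
    pos: usize, // start of unread bytes
    end: usize, // end of valid bytes; everything past this is stale
}

impl LineBuffer {
    fn roll(&mut self) {
        // The bug: an open-ended range copies from `pos` to the end of the
        // *allocation*, dragging megabytes of stale bytes along on every
        // roll once the buffer has grown for a single giant line:
        //
        //     self.buf.copy_within(self.pos.., 0);
        //
        // The fix (those 8 characters) bounds the copy to the valid bytes:
        self.buf.copy_within(self.pos..self.end, 0);
        self.end -= self.pos;
        self.pos = 0;
    }
}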
And re-running the command
Voilà! To pop back and make sure everything works across the entire directory:
So I think that fixes it. :-) I think this explains everything. Even your
This was an excellent bug. Thank you so much for sticking with it and providing
Nice. I lost all my hair... and seeing that the fix is only 8 chars... 😀 I just wondered a little about Rust. Wasn't it a bit too implicit in this case? If you had to explicitly type the end of the range, maybe you could have spotted this issue sooner.
I don't think so. I've been using
Line 517 in b1c52b5
I haven't read through everything here, but is he using ZFS on Linux by any chance? I have some evidence that (at least in some cases) ZFS locking is leading to multithreaded ripgrep getting slower with each added core (-j1 is faster than -j4, etc.). @BurntSushi, have you ever heard of anything related to that issue before? (update) Sorry, I see that this was a bug and was resolved with a code change, but I am still interested to hear whether you've seen any ZFS-related issues.
I haven't heard anything about that, no. I don't think I've ever tried ripgrep on zfs. |
I was looking at the behavior, trying to better understand what is happening. I noticed the crossbeam-channel threads were yielding the CPU a lot during get_work and run_one when a certain ZFS feature was enabled. However, they weren't done; there was more work to do (just perhaps not available at that moment, presumably because of directory locks). When running rg --files on a large directory structure with ZFS, the runs have low CPU utilization because the threads yield repeatedly much of the time. Can you explain under what conditions the crossbeam-channel consumer threads would yield during a run?
You might be using an older version of ripgrep. The latest version doesn't use crossbeam at all. It just uses a stack guarded by a mutex:
ripgrep/crates/ignore/src/walk.rs Lines 1390 to 1396 in 11c7b2a
See #1554 for more details on that change.
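In sketch form, the shape of it is just this (simplified; the real code also tracks idle workers so it knows when to quit):

use std::sync::{Arc, Mutex};

type Work = String; // stand-in for the real work item type

// A LIFO work stack shared by all worker threads.
#[derive(Clone)]
struct WorkStack(Arc<Mutex<Vec<Work>>>);

impl WorkStack {
    fn push(&self, w: Work) {
        self.0.lock().unwrap().push(w);
    }
    // `None` only means "nothing available right now"; the caller decides
    // whether to yield, retry, or conclude that all workers are done.
    fn pop(&self) -> Option<Work> {
        self.0.lock().unwrap().pop()
    }
}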
Otherwise, I would expect a yield to occur if there's nothing in the queue, maybe?
What version of ripgrep are you using?
ripgrep 11.0.1
-SIMD -AVX (compiled)
+SIMD -AVX (runtime)
How did you install ripgrep?
pacman -S ripgrep
What operating system are you using ripgrep on?
Arch Linux
Describe your question, feature request, or bug.
ripgrep is 3-5 times slower than GNU grep on a simple string pattern.
If this is a bug, what is the actual behavior?
As I tried to find, one by one, which top-level file (or directory) causes the slowdown, I came to the conclusion: none of them does. Ripgrep should be faster... So I made this table:
(Sorry for hiding the names; I'm not sure if I'm allowed to show them.)
Tomorrow I will gather line-length statistics if that helps, but I think it's an issue around directory listing. Anyway, I would like to help, so tell me what to do/share.
--debug didn't help too much.