Way to report total count of matches? #411

elirnm · 2017-03-19T09:29:25Z

Is there a way to print the total count of matches? I can pipe the output to a line counter, but then I don't get the actual matches printed.

BurntSushi · 2017-03-19T13:51:54Z

Have you tried the -c flag? (Which is also in grep.)

elirnm · 2017-03-20T01:42:36Z

That prints the number of matches per file (and grep -c seems to do the same). I'm interested in the total number of matches across all files.

BurntSushi · 2017-03-20T01:55:29Z

Okay, then I guess I don't understand why piping to wc -l doesn't work?

elirnm · 2017-03-20T02:00:34Z

It works fine, I just thought I'd check if there was a built-in option just in case. The only downside to piping is that you don't get the actual matches printed, only the count, but that can be worked around by storing the results in a variable first.

BurntSushi · 2017-03-20T02:20:38Z

I'm still confused. You want both the matches printed and the count? Could you please provide an example so that your request is more clear?

elirnm · 2017-03-20T02:27:45Z

What I'm interested in is something like

$ rg blah blah --total-count
match
match
match
Total matches: 3

Basically the current functionality but with the total count printed at the end.

I don't expect support for that because I don't think other tools support it either, and I can get that information by storing the results in a variable first (or by running rg twice) so it's not a big deal. I was just checking to make sure it indeed wasn't supported. Sorry for the confusion.

kale · 2017-03-29T15:25:59Z

@elirnm I believe you could just use the tee command?

$ rg blah blah | tee >(wc -l)
match
match
match
    3

If you want to remove the leading whitespace before the count, you can do this:

$ rg blah blah | tee >(wc -l | xargs echo)

elirnm · 2017-03-30T01:55:17Z

@kale Not on Windows.
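Without process substitution, one portable workaround is a tiny filter that echoes every line it receives and then prints how many it saw. This is a minimal Python sketch, not part of ripgrep; the script name `count_lines.py` and the `echo_and_count` helper are hypothetical.

```python
import sys
from typing import Iterable, TextIO


def echo_and_count(lines: Iterable[str], out: TextIO) -> int:
    """Write each input line to `out` unchanged and return how many lines were seen."""
    count = 0
    for line in lines:
        out.write(line)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical usage: rg blah blah | python count_lines.py
    total = echo_and_count(sys.stdin, sys.stdout)
    print(f"Total matches: {total}")
```

When piped (and with no context flags such as -A/-B), rg prints one line per matching line, so this counts matching lines rather than individual occurrences.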

DoumanAsh · 2017-04-02T12:20:12Z

PowerShell?

rg blah blah | Measure-Object -Line

BurntSushi · 2017-04-02T13:02:20Z

Does that show the matches and the count?

DoumanAsh · 2017-04-02T13:59:12Z

Measure-Object -Line counts the number of lines, the same as wc -l.

BurntSushi · 2017-04-02T14:03:52Z

@DoumanAsh Please re-read this thread. The OP is looking for a way to print both the matches and the count of the matches in a single command.

DoumanAsh · 2017-04-02T15:08:52Z

Oops, I'm sorry.
I can think of a way to do it, but it will break colors and most likely won't be a one-liner :(

BurntSushi · 2017-04-09T12:54:05Z

I'm going to close this. I don't see an option like this being added to ripgrep proper. I think it's too niche and working around it is very simple by piping the output through a line counter.

kaushalmodi · 2017-04-19T15:27:55Z

I ended up here looking for a way to do something in rg like the --stats option that ag has.

Running ag foo --stats prints the usual matches, plus this very useful summary at the end:

65 matches
26 files contained matches
1539 files searched
75669452 bytes searched
1.044798 seconds

wc -l is not very elegant, as I tend to view rg results with file headers enabled, as follows:


textmodes/page-ext.el
244:(defcustom pages-directory-buffer-narrowing-p t
249:(defcustom pages-directory-for-adding-page-narrowing-p t
254:(defcustom pages-directory-for-adding-new-page-before-current-page-p t
268:(defcustom pages-directory-for-addresses-goto-narrowing-p t
273:(defcustom pages-directory-for-addresses-buffer-keep-windows-p t
278:(defcustom pages-directory-for-adding-addresses-narrowing-p t

textmodes/ispell.el
141:(defcustom ispell-highlight-p 'block
285:(defcustom ispell-look-p (file-exists-p ispell-look-command)
301:(defcustom ispell-use-ptys-p nil
337:(defcustom ispell-use-framepop-p nil

Can you please re-open this issue, and consider adding a --stats-like switch?

BurntSushi · 2017-04-19T15:31:19Z

> wc -l is not very elegant, as I tend to view rg results with file headers enabled, as follows:

Did you actually try it, though? If you pipe the output of ripgrep into another command, it should revert to grep's standard output format (unless you pass the --heading flag explicitly).

kaushalmodi · 2017-04-19T15:37:46Z

The situation is like I want to eat the cake (show results with headings) and have it too (show the match statistics too).

With rg and wc -l, I get just:

So I would need to run it once to see the results and a second time to get the match count. But that is not as informative as the ag --stats output I showed above: I do not get how many files matched.

It's a different thing to investigate why the total matches are not the same when searched using ag vs rg.

BurntSushi · 2017-04-19T15:42:58Z

> The situation is like I want to eat the cake (show results with headings) and have it too (show the match statistics too).

At what point does ripgrep have to solve every problem associated with displaying stats? It should be very simple for anyone to write a wrapper script that does what you want here, although the easiest path would run rg multiple times, which seems fine for most use cases IMO.

I will re-open this for now, but at a certain point, I have to be allowed to say "No" to new feature requests and people have to respect that reasonable people can disagree where that line is drawn.

> It's a different thing to investigate why the total matches are not the same when searched using ag vs rg.

The silver searcher has a very, very large number of bugs associated with its gitignore support. It's more surprising when the total number of results is the same.
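As a rough illustration of the wrapper-script route (running rg twice), here is a Python sketch; the script and its helper names are hypothetical, not anything ripgrep ships. It lets rg print its normal headed output, then re-runs the same search with -c and reports matched-line and file totals. Note that -c counts matching lines, not individual occurrences.

```python
import subprocess
import sys


def summarize_counts(count_output: str) -> tuple[int, int]:
    """Return (total matched lines, files with matches) parsed from `rg -c` output.

    Each line is `path:count`, or just `count` when rg searches a single file.
    Splitting on the last colon copes with paths that contain colons (e.g. C:\\...).
    """
    total = 0
    files = 0
    for line in count_output.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. when there is no output at all
        total += int(line.rsplit(":", 1)[-1])
        files += 1
    return total, files


def main(argv: list[str]) -> int:
    # Run 1: rg prints its normal output (headings, colors) straight to the terminal.
    shown = subprocess.run(["rg", *argv])
    # Run 2: same arguments plus -c, captured so the counts can be summed.
    counted = subprocess.run(["rg", "-c", *argv], capture_output=True, text=True)
    total, files = summarize_counts(counted.stdout)
    print(f"\n{total} matched lines")
    print(f"{files} files contained matches")
    return shown.returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Invoked as, e.g., `python rg_summary.py PM_RESUME` (hypothetical name), it forwards all arguments to both rg runs and keeps rg's exit code from the first run.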

kaushalmodi · 2017-04-19T15:54:05Z

I don't want my suggestion to rub you the wrong way. It was, as I said, just a suggestion.

I fully respect your project.

So if it's your decision to never support this, I will respect that.

This request came up when I had to find the total number of matches during a discussion, and that's where I realized that I needed to switch to ag for that.

BurntSushi · 2017-05-08T22:27:22Z

I think I've come around to this feature. Starting with what the silver searcher does seems reasonable:

$ ag PM_RESUME
# results omitted
16 matches
9 files contained matches
55263 files searched
643515649 bytes searched
0.453583 seconds

I do have a question though: should stats be printed to stdout or to stderr? ag prints them to stdout. I think I'm fine either way, and probably lean slightly towards stdout.

BurntSushi · 2017-05-08T22:29:09Z

An argument in favor of stderr is that you can do this: rg foo --stats > /dev/null and see the statistics without worrying about a bunch of output being printed to your terminal.

kaushalmodi · 2017-05-08T22:32:25Z

Wouldn't printing --stats throw off a lot of wrapper scripts?

I would favor STDOUT, as the stats are technically not errors.

> without worrying about a bunch of output being printed to your terminal.

In that case, maybe a different switch could be added to not output the matched lines at all? Perhaps --stats=only, to print only the stats?

BurntSushi · 2017-05-08T22:55:30Z

> Wouldn't printing --stats throw off a lot of wrapper scripts?

I'm not sure I follow. If you don't want to throw off wrapper scripts, then don't use --stats? How else do you expect this to be implemented?

> In that case, maybe a different switch could be added to not output the matched lines at all? Perhaps --stats=only, to print only the stats?

I try to prefer composition of existing tools.

kaushalmodi · 2017-05-08T23:34:04Z

> If you don't want to throw off wrapper scripts, then don't use --stats?

I don't have such a wrapper script, but suppose one is doing:

set -euo pipefail # http://redsymbol.net/articles/unofficial-bash-strict-mode
foo_match_count=$(rg foo --stats | grep -Po '\d+(?=\s+matches)')

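The script will fail even when there are matches: if the stats went to stderr, grep would find no "matches" line and exit non-zero, and pipefail would turn that into a failure.

Printing the stats to STDERR can cause confusion here.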

BurntSushi · 2017-05-08T23:54:15Z

But in that case, the problem is immediately obvious because the stats will be dropped to stderr. And then it's easy to fix.

elirnm · 2017-05-09T02:37:22Z

I'd prefer stdout because if stats are part of stderr there's no easy way to separate the stuff related to results from the stuff related to actual errors while keeping the stats with the rest of the results-related stuff.

If stats are printed to stderr, then rg blah --stats > some_file.txt will still print the stats to the console rather than the file, but rg blah --stats > some_file.txt 2>&1 will also send actual errors to the file rather than the console. So there's no easy way to get stats and results in a file but errors on the console.

santagada · 2017-07-25T12:15:28Z

I don't know if this is the correct place to ask, but could we have a way to print how many matches there are per file? I'm searching binary files, so I care about matches, not lines. Showing both is perfectly fine; it's just that -c doesn't give me a meaningful number.

BurntSushi · 2017-07-25T12:19:54Z

@santagada That seems like an orthogonal issue to what this is. Could you please open a new issue? Please also explain why you think -c/--count doesn't give you any meaningful number. It seems meaningful to me. In fact, it seems like it does exactly what you want: it shows the number of matches in each file:

$ rg -c foo
README.md:12
CHANGELOG.md:10
globset/README.md:7
tests/tests.rs:186
src/args.rs:1
src/printer.rs:4
src/app.rs:4
doc/rg.1:6
doc/rg.1.md:6
globset/src/pathutil.rs:8
globset/src/glob.rs:59
globset/src/lib.rs:16
grep/src/literals.rs:1
ignore/src/gitignore.rs:33
ignore/src/dir.rs:26
ignore/src/types.rs:18
ignore/src/overrides.rs:28
ignore/src/walk.rs:27
grep/src/data/sherlock.txt:45

santagada · 2017-07-27T09:37:22Z

Your example shows it: README.md has 17 occurrences of foo (at least one line contains it 3 times). What -c shows is how many lines matched the regex (or maybe we searched very different versions of README.md). I will open a new issue.
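The distinction santagada describes can be shown in a few lines: counting lines that contain a match (what -c reports) versus counting every occurrence. This is an illustration using Python's `re` module, not ripgrep's implementation; ripgrep later gained a --count-matches flag for per-file occurrence counts.

```python
import re


def matched_lines(text: str, pattern: str) -> int:
    """Count lines with at least one match (the number -c reports)."""
    regex = re.compile(pattern)
    return sum(1 for line in text.splitlines() if regex.search(line))


def occurrences(text: str, pattern: str) -> int:
    """Count every non-overlapping match, so a line with three hits counts three times."""
    regex = re.compile(pattern)
    return sum(1 for line in text.splitlines() for _ in regex.finditer(line))


sample = "foo bar foo foo\nno hit here\nfoo\n"
print(matched_lines(sample, "foo"))  # 2
print(occurrences(sample, "foo"))    # 4
```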

santagada · 2017-07-27T09:44:42Z

Filed a ticket for it: #566.

balajisivaraman · 2018-02-01T16:13:14Z

I can pick this one up next if we still want this implemented. I don't see any other PRs open for this, and it looks pretty straightforward from a specification standpoint (duplicate what ag does) and could be very interesting for me to work on. 👍

kaushalmodi · 2018-02-01T16:15:52Z

@balajisivaraman I'd be very grateful, thank you! This is one of the two reasons I need to keep ag installed on my system :)

BurntSushi · 2018-02-01T16:22:55Z

@balajisivaraman Thanks! Let me know if you want any help coming up with how to organize this in the code. It is likely that familiarizing yourself with Rust's support for atomics will be helpful. As another caveat, I would like to prevent the two search modules (search_stream.rs and search_buffer.rs) from modifying any kind of shared mutable state because that will hurt later refactoring. Instead, the searchers likely need to return the stats for a particular file somehow, and then the worker should merge them with stats that it already has. And of course, this all needs to be conditional on whether a --stats flag is passed. We shouldn't be counting things unless told to. :-)

Here are some possible simplifications that you may choose to make:

If --stats is passed, then force ripgrep into single threaded mode. Then you could focus on just the single threaded worker, which might be easier. We can enable parallelism later.
If --stats is passed, do not permit memory map searching. This would let you implement stat tracking in only the src/search_stream.rs searcher. (It would be nicer to only do memory maps since it would be considerably simpler, but memory maps can't handle every type of search.)

balajisivaraman · 2018-02-01T16:47:39Z

@BurntSushi, Thanks for the pointers. I'll have an initial look this weekend and come up with a rough idea of how I want to go about it, and I'll post it here for vetting. 👍

BurntSushi · 2018-02-01T17:00:55Z

@balajisivaraman Aye. Another idea that I might like even better is seeing if this could be done in the printer instead of the search code. That way it would work for both searchers.

balajisivaraman · 2018-02-03T16:00:04Z

@BurntSushi, I get the feeling we should be able to easily do this in main.rs itself, bypassing both the searchers and the printer, with one caveat which I get into below. (I apologise in advance if this post is a bit long and too detailed.)

Here's my reasoning as to why:

Files Searched: We already track the total number of paths searched in main.rs in both run_one_thread and run_parallel for outputting debug messages at the end. So we get that for free already.
Match Count: We also already count the overall match_count in both the aforementioned functions since that is the value to be returned; so we also get that for free.
Files With Matches: It should be trivial to track the files with matches if necessary by having a new counter and updating that if count from the current path is greater than 0.
Time Taken: The time taken should be trivial to implement in main.rs as a difference between when search starts (entry into run_parallel or run_one_thread) and ends (exit from run_parallel or run_one_thread).

My thought is that we should be able to do all of the above in main.rs without any impact on performance or current code structure. And since these are overall stats to be tracked about the current run of rg, it does make sense to do this in main.rs instead of the searchers, which are tracking stats for individual files.

The trickiest part will be tracking bytes searched. I haven't been able to come up with an easy way to do this that doesn't involve making changes to search_stream.rs and search_buffer.rs. These are the two files where the actual searching happens. From what I can see, those are the places where the input Reader is actually loaded into buffers and searched. So that will be the place to change if we actually want to track the overall number of bytes that were searched across individual files.

My current thought is that we could do two things about it:

Not implement the bytes searched feature. The easier option. :-)
Make the search_stream.run and search_buffer.run return the match_count and an optional bytes_searched (will be None if --stats is not passed), preferably combined in a struct of some sort (SearchStats or FileStats or something). This way we can combine them in main.rs like we do with the other stats and output it.

Also, as you suggested, I had a look at seeing whether we could offload some of this to the printer, but I had difficulty doing it. This is because all the stats we want to output are only available in the searchers or the main.rs file. The best I could come up with was to have a Stats struct (containing atomic values for match count, files with matches and overall files) wrapped in an Arc as the state in printer.rs. This state is then updated by search_stream.rs by passing in the relevant values and doing atomic updates.

This is bad because there's shared mutable state going on. Another pain point is that we create a new printer for every file that is searched whereas the stats to be output are more global in nature. (This could work if we only did single threaded search.) Currently what I feel would be the ideal option is to return the stats we want as values from the searchers and combine them in main.rs, like we do already with the match count.
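The "return stats as values and merge them" approach can be sketched as follows. This is a Python illustration of the data flow only (ripgrep itself is Rust), and `SearchStats`, `RunStats`, and their fields are hypothetical names loosely following the ones proposed above. Each searcher returns a per-file value, and only the worker that owns the running totals folds it in, so no shared mutable state is involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchStats:
    """Per-file result a searcher could return (hypothetical)."""
    match_count: int = 0
    bytes_searched: Optional[int] = None  # None when byte counting is not enabled


@dataclass
class RunStats:
    """Aggregate stats the worker keeps for the whole run (hypothetical)."""
    matches: int = 0
    files_with_matches: int = 0
    files_searched: int = 0
    bytes_searched: int = 0

    def merge(self, file_stats: SearchStats) -> None:
        # Fold one file's result into the running totals.
        self.files_searched += 1
        self.matches += file_stats.match_count
        if file_stats.match_count > 0:
            self.files_with_matches += 1
        if file_stats.bytes_searched is not None:
            self.bytes_searched += file_stats.bytes_searched

    def throughput(self, seconds: float) -> float:
        """Bytes searched per second over the whole run (0.0 if no time elapsed)."""
        return self.bytes_searched / seconds if seconds > 0 else 0.0


if __name__ == "__main__":
    run = RunStats()
    for per_file in [SearchStats(3, 1000), SearchStats(0, 500), SearchStats(2, 250)]:
        run.merge(per_file)
    print(run)
```

In a parallel search, each thread could keep its own RunStats and the totals could be summed once at the end, which again avoids atomics.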

I also found the following quirks in ag that we should decide on:

If I cat a file and pipe it to ag, I get the following output. Now this is very confusing because it shows 17 files searched when I use Stdin. We could still display other stats for Stdin, but it seems to me that the wiser option would be to negate --stats when we're in Stdin. Thoughts?
```
balaji@hogsmeade $ cat .xmobarrc | ag --stats 'temp'
    , template          = " %StdinReader% }{ %net%   %date% "
1 matches
1 files contained matches
17 files searched
764 bytes searched
0.000041 seconds
```
As a result of this, should rg simply ignore stats for Stdin?
If I do ag --files-without-matches 'bufwtr' ., it behaves as if --invert-match was passed in. What that means is that the match count and matching file count displayed in the stats are inverted. But this feels weird to me because the argument asks to print only the files without matches, not to actually invert the match in the stats. Should rg just ignore stats if -l or --files-without-matches is passed in?

I'll have another look to see whether there is any other way of doing this, but that's what I came up with after going through the code today.

BurntSushi · 2018-02-03T16:41:22Z

@balajisivaraman Thanks for writing that up! The task of counting bytes is definitely an interesting one, and I grant that it does appear to be a little tricky to do with the current code. My feeling is that the "best" way to do this would be as a new type that implements io::Read. The implementation would wrap another io::Read type and basically pass through all calls unmodified, but would count the number of bytes read. I've done this in various ways before and it's pretty simple. Here's an example for a writer: https://github.com/BurntSushi/fst/blob/9a144a1c99605a210609147aaa8b09cf2776efd9/src/raw/counting_writer.rs

You would probably want to construct this type in the worker and pass a mutable reference to the buffered searcher. Once the searcher is done, you can ask the type for the count of bytes read. For memory maps, you'll need to devise a different strategy, but we could skip that for now.
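The counting-reader idea carries over to other languages. Here is a Python analogue (not ripgrep code): a hypothetical `CountingReader` wraps any binary reader, forwards every read unchanged, and keeps a running byte count that the caller can inspect once the search is finished.

```python
import io
from typing import BinaryIO


class CountingReader(io.RawIOBase):
    """Wrap a binary reader, pass reads through unchanged, and tally bytes read."""

    def __init__(self, inner: BinaryIO) -> None:
        self._inner = inner
        self.count = 0

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        # Read at most len(buffer) bytes from the wrapped reader and copy them in.
        data = self._inner.read(len(buffer))
        n = len(data)
        buffer[:n] = data
        self.count += n
        return n  # 0 signals end of file


if __name__ == "__main__":
    reader = CountingReader(io.BytesIO(b"hello\nworld\n"))
    # Wrapping in a BufferedReader mirrors handing the counter to a buffered searcher.
    for line in io.BufferedReader(reader):
        pass
    print(reader.count)
```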

But yeah, we can definitely punt on the byte counting for now and do that at a later time. With that said, it is definitely a useful part of the stats output because it's what will let you compute a throughput statistic (which I suppose we should also include once we do byte counting).

> If I cat a file and pipe it to ag, I get the following output. Now this is very confusing because it shows 17 files searched when I use Stdin. We could still display other stats for Stdin, but it seems to me that the wiser option would be to negate --stats when we're in Stdin. Thoughts?

ag has a lot of bugs. I think ripgrep can probably get stats right for stdin without claiming that it searched 17 other files. :-)

> If I do ag --files-without-matches 'bufwtr' ., it behaves as if --invert-match was passed in. What that means is that the match count and matching file count displayed in the stats are inverted. But this feels weird to me because the argument asks to print only the files without matches, not to actually invert the match in the stats. Should rg just ignore stats if -l or --files-without-matches is passed in?

You're on to something here. I think it would be fine to ignore --stats if --files-with-matches or --files-without-match were given.

balajisivaraman · 2018-02-03T16:53:42Z

@BurntSushi, Ah that's a nifty little trick. Thanks for pointing that out. I'll see whether I'd be able to cook up something similar for counting bytes here.

If you're OK with the rest of the suggestions in terms of tracking the existing stats and outputting them in main.rs, I can go ahead and begin working on the changes.

BurntSushi · 2018-02-03T16:59:05Z

@balajisivaraman Oh right, I forgot to respond to that part! Yes, doing those counts in main.rs is great.

balajisivaraman · 2018-02-14T18:46:47Z

@BurntSushi, I just realised that there are some similarities between this and #566.

Although I have a WIP PR (#799) open for this, I realised that the match_count that is displayed in the current implementation is a line count instead of the actual occurrence count. As reported in the aforementioned issue, we probably want a way to keep track of the occurrence count, which I completely overlooked in my original post above. Apologies for that. 😞

I'll leave the pending PR open and look at ways I can work on the occurrence count issue. We should then be able to reuse that for computing stats, if that is fine.

adikwok · 2018-09-20T07:28:14Z

How can I use rg -c to find the top 10 most common words on a disk?

JonDum · 2018-09-29T06:29:47Z

Sorry for necro'ing this thread, but just wanted to say I really appreciate the work and effort in this feature! rg is one of my most used and favorite tools and I was super happy to find this functionality today without having to resort to some cut -d: -f2 | awk trickery!

ElectricRCAircraftGuy · 2022-01-03T19:25:51Z

Count the total number of matches in ripgrep

For anyone still stumbling upon this thread: the --stats option is now present in ripgrep (it looks like it was added as a result of this thread), so just run rg --stats 'my regex search'. Done. :)

Example output: the line you're looking for is the one where I put <===:

...
[all of the match content up here]
...

156 matches          <===
156 matched lines
86 files contained matches
2954 files searched
18355 bytes printed
35266829 bytes searched
0.014166 seconds spent searching
0.012024 seconds
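Picking a stats line by its position depends on how many lines the summary has, which may change between ripgrep releases. A sketch that looks lines up by label instead, assuming the `<number> <label>` layout shown in the example above (the helper name and script name are hypothetical):

```python
import re

# A stats line looks like "<number> <label>", e.g. "156 matches" or "0.012024 seconds".
STAT_LINE = re.compile(r"^(\d+(?:\.\d+)?) (.+)$")


def parse_stats(output: str) -> dict[str, float]:
    """Collect every '<number> <label>' line into {label: number}.

    Later lines overwrite earlier ones, so the stats block at the end wins over
    any matched content that happens to look like a stats line.
    """
    stats: dict[str, float] = {}
    for line in output.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats[m.group(2)] = float(m.group(1))
    return stats


if __name__ == "__main__":
    import subprocess
    import sys

    # Hypothetical usage: python rg_matches.py "my search term"
    out = subprocess.run(["rg", "--stats", *sys.argv[1:]],
                         capture_output=True, text=True).stdout
    print(int(parse_stats(out).get("matches", 0)))
```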

If you want just that one line, do this:

rg --stats "my search term" | tail -n 8 | head -n 1

Example output:

156 matches

If you just want the 156, do this:

rg --stats "my search term" | tail -n 8 | head -n 1 | awk '{print $1}'

Output:

156

Alternative hack

As @clashman says below, you can also do this hack:

rg "my search term" | rg -c "my search term"

Sample output:

If you need this to run in Windows

...Use the Git Bash terminal which comes with Git for Windows. See my instructions here: Installing Git For Windows.

For building software in Windows, or running the GCC or LLVM Clang compilers from the command-line, and to get other Linux tools in Windows, use the MSYS2 terminals. See my MSYS2 setup answer here: Installing & setting up MSYS2 from scratch, including adding all 7 profiles to Windows Terminal.

clashman · 2022-01-28T14:24:51Z

A simpler, albeit dirty solution: rg FOO | rg FOO -c

Amiralgaby · 2023-03-06T19:44:57Z

For me, the best answer is actually:
rg --stats "<regex>" -q | rg -e "\d+ matches" | cut -d" " -f1

For what I needed, it doesn't give the same result as "rg FOO | rg FOO -c", and I think the command above is better. Since my command and yours give different results, I trust the --stats flag and its "matches" line ^^

taariksiers · 2023-09-27T12:24:47Z

What works for me:

rg [search term] -l | wc -l

BurntSushi · 2023-09-27T13:18:37Z

@taariksiers That will only report the total number of files that contain a match. The -l/--files-with-matches just prints one line per file path that contains at least one match of [search term].

TeaDrinkingProgrammer · 2024-01-30T10:31:14Z

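Because I use Windows, I have turned to a small Python script (written by ChatGPT):

import subprocess

# Run rg once and capture its per-file counts ("path:count", or just "count"
# when a single file is searched). Passing a list avoids needing a shell.
# Note: -c counts matching lines, not individual occurrences.
command = ["rg", "-c", "foo"]
output = subprocess.run(command, capture_output=True, text=True).stdout.strip()

# Take the number after the last colon on each non-empty line and sum them
# (rsplit copes with Windows drive letters such as C:\...).
lines = [line for line in output.splitlines() if line.strip()]
total_count = sum(int(line.rsplit(":", 1)[-1]) for line in lines)

# Print the per-file counts followed by the total
print(output)
print(f"Total Count: {total_count}")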

ElectricRCAircraftGuy · 2024-01-30T16:22:46Z

@TeaDrinkingProgrammer , FYI: all of the Bash/terminal solutions above work on Windows too. You just have to use the Git Bash Linux-like terminal that comes with Git for Windows, is all. You can also use the MSYS2 terminal.

Here are my installation instructions for those terminals in Windows:

ltrzesniewski · 2024-01-30T16:26:23Z

You can also use WSL:

rg foo | wsl wc -l

drygdryg · 2024-02-28T13:13:43Z

rg -c 'my search term' | awk -F: '{s+=$NF} END {print s+0}'

adikwok · 2024-02-28T14:11:07Z

Thank you, sir.

BurntSushi added the "question" label (An issue that is lacking clarity on one or more points.) Mar 19, 2017

BurntSushi closed this as completed Apr 9, 2017

BurntSushi reopened this Apr 19, 2017

BurntSushi added the "help wanted" label (Others are encouraged to work on this issue.) May 8, 2017

BurntSushi mentioned this issue Sep 16, 2017: search summary at the end #605 (Closed)

BurntSushi closed this as completed in 00520b3 Mar 10, 2018

ElectricRCAircraftGuy mentioned this issue Jan 9, 2024: rgr.sh: add --count-all option to count all matches ElectricRCAircraftGuy/ripgrep_replace#5 (Open)
