Try to fullfill all the goals of the generic decoder program feature #981
src/app.rs
@@ -1462,6 +1463,33 @@ This flag can be disabled with --no-search-zip.
    args.push(arg);
}

fn flag_preprocessor(args: &mut Vec<RGArg>) {
    const SHORT: &str = "search outputs of \"COMMAND FILE\" for each FILE";
Quoting COMMAND FILE doesn't seem right; placeholders aren't quoted anywhere else in the help text. Also, you switch between COMMAND FILE and COMMAND; COMMAND seems simpler (or maybe PROGRAM if you're worried that's too vague). Also, deactivates is misspelt.
Ok. Dequoted & fixed spelling. I use COMMAND FILE when there is also a FILE in the sentence context and COMMAND when it's just about the program.
Ah, I think I see what you mean.
src/app.rs
esac; | ||
esac | ||
"); | ||
let arg = RGArg::flag("preprocessor", "COMMAND").short("P") |
Usually when one option overrides another (as you said this does with -z), you put that in the help text and then specify it here. You can see examples of that elsewhere in the file.
I actually would have suggested making them conflict, because it's not intuitively obvious how they should behave together, but then I thought that a lot of people probably add -z to their aliases/configs, so that might be too inconvenient. Not sure.
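As a minimal, std-only sketch of the behavior under discussion (a preprocessor given alongside -z wins, as the PR description states), the hypothetical `InputMode` enum and `choose_input_mode` function below resolve the two flags into a single mode. These names are illustrative assumptions, not ripgrep's types; the reviewer's point is that the override itself should be declared on the RGArg definition and mentioned in the help text, and this only shows the resulting precedence.

```rust
/// How each file's bytes are obtained before searching.
/// Hypothetical type for illustration; not ripgrep's real API.
#[derive(Debug, PartialEq)]
pub enum InputMode {
    /// Read the file directly.
    Plain,
    /// Decompress known compressed formats (-z/--search-zip).
    Decompress,
    /// Search the output of `COMMAND FILE` (-P/--preprocessor).
    Preprocess(String),
}

/// Resolve the two flags into one mode. When a preprocessor is given,
/// it overrides --search-zip, so -z in an alias or config stays harmless.
pub fn choose_input_mode(search_zip: bool, preprocessor: Option<&str>) -> InputMode {
    match preprocessor {
        Some(cmd) => InputMode::Preprocess(cmd.to_string()),
        None if search_zip => InputMode::Decompress,
        None => InputMode::Plain,
    }
}
```

Making the flags conflict instead would mean rejecting `search_zip && preprocessor.is_some()` up front rather than quietly preferring one, which is exactly the inconvenience raised above for people who keep -z in their aliases or configs.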
complete/_rg
@@ -173,6 +173,7 @@ _rg() {
+ '(zip)' # Compressed-file options
{-z,--search-zip}'[search in compressed files]'
$no"--no-search-zip[don't search in compressed files]"
{-P,--preprocessor}'[search files needing preprocessing]'
If you're going to put these together, I would suggest renaming the group and changing the comment, since it's no longer accurate. Maybe (preprocessor-zip) # File-preprocessing options or something, to match the way pretty-vimgrep is done.
Also, the spec isn't right. It should be something like:
{-P+,--preprocessor=}'[specify preprocessor utility]:preprocessor utility:_command_names -e'
(_command_names -e will prefer PATH executables, but also do the right thing if you try to complete an actual file path.)
Also, it probably should be placed before the other two options, since they were meant to be... semi-alphabetically ordered :/
and not really looked at other group orderings. Putting the most general option first, followed by the special-case optimizations, makes about as much sense anyway.
Just as an example of why this would be awesome: together with this caching pdftotext wrapper as a preprocessor, this is able to improve on pdfgrep by orders of magnitude. On a semi-large directory of PDFs:
That's almost a 9000% performance improvement. (Even without caching it only takes 0.5s; not sure what the pdfgrep people are doing.)
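The caching wrapper itself isn't shown in this thread. As a hedged sketch of one way such a wrapper could decide whether a cached pdftotext output is still valid, the hypothetical `cache_key` function below hashes the file's path, size and modification time, so editing the file invalidates its entry. The key scheme is an assumption for illustration; `DefaultHasher` output is not guaranteed stable across Rust releases, so a real on-disk cache would want a fixed hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Hypothetical cache key for a preprocessed file: a hash of its path,
/// size and mtime. If any of these change, the cached output is stale.
pub fn cache_key(path: &Path) -> io::Result<String> {
    let meta = fs::metadata(path)?;
    // Modification time as nanoseconds since the Unix epoch (0 if earlier).
    let mtime = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let mut hasher = DefaultHasher::new();
    path.hash(&mut hasher);
    meta.len().hash(&mut hasher);
    mtime.hash(&mut hasher);
    // 16 hex digits, usable as a cache file name.
    Ok(format!("{:016x}", hasher.finish()))
}
```

A wrapper built on this would look up the key in its cache directory, print the stored text on a hit, and otherwise run the converter and store its output under that key, trading disk space for speed.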
Yeah... Besides lecture slides, I have a slew of papers I've collected over the years, as I expect many others have. I use this all the time with a custom GNU grep patch I did that does basically the same thing. Line numbers might be nice and all, but honestly I mostly use this with a specific enough pattern and

As mentioned in the feature request, applications are really bounded only by your imagination. The searching through only the parts of context diffs with actual changes, conceivably even (trustworthy) foreign language translations if there was a good library/CLI tool for that,

Caching transformers surely can speed things up at the expense of disk space to maintain the cache. Personally, my archives are small enough that I just re-decode on the fly. The parallel operation of

To get even as low as 70 us/file + 0.34 sec/GB on my box at home, I have to have a statically linked classifier that does its own pass-through for "uncoded data" and trim my environment to just $PATH. That may not sound like much overhead, but the microseconds and milliseconds pile up when you have tens of thousands of files on fast NVMe storage or buffered in RAM, and most but not all of that is uncoded data. The per-byte costs come from copies over the pipe buffer. My email history is sort of like that.

Anyway, the real user time of those overheads is usually smaller than, or around the same as, my time to enter a pattern and interpret the results. Less careful management of the overheads can easily blow that up by 10X or more (some fancy dynlinked bash script dispatcher type stuff to dynlinked cat with a big environment, etc.), which pushes it into annoying territory (at least for me).
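To make the per-file and per-byte figures above concrete, here is a back-of-the-envelope calculation using the quoted rates of 70 us/file and 0.34 sec/GB. The constant and function names are hypothetical, and the file count and data size used below are made-up inputs, not measurements from this thread.

```rust
/// Fixed cost per file quoted above: 70 microseconds.
pub const PER_FILE_SECS: f64 = 70e-6;
/// Per-byte cost quoted above: 0.34 seconds per GB piped through.
pub const PER_GB_SECS: f64 = 0.34;

/// Estimated preprocessing overhead, in seconds, for `files` files
/// totalling `gb` gigabytes: per-file cost plus per-byte cost.
pub fn overhead_secs(files: u64, gb: f64) -> f64 {
    files as f64 * PER_FILE_SECS + gb * PER_GB_SECS
}
```

For 20,000 files totalling 2 GB this gives 1.4 s of per-file cost plus 0.68 s of per-byte cost, about 2.1 s overall. A dispatcher that is 10X slower per file would push the per-file part alone to 14 s, which is the "annoying territory" described above.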
complete/_rg
@@ -170,7 +170,8 @@ _rg() {
{-w,--word-regexp}'[only show matches surrounded by word boundaries]'
{-x,--line-regexp}'[only show matches surrounded by line boundaries]'

+ '(zip)' # Compressed-file options
+ '(input-decoding)' # Compressed-file options
Missed the comment here
Oops. Did you want that -E,--encoding option down in that input-decoding group, too? Instead of "misc/other"?
No, these options were grouped together because they're completed exclusively of each other (that's what the (...) in the group name means). -E is independent and doesn't have any particular relationship with other options, so it can be dumped in with the miscellaneous stuff. I can see how the group name input-decoding might kind of imply that relationship, which is why I'd suggested preprocessor-zip or something more specific like that; doesn't matter too much tho.
Ah. Ok. I see. Well, if someone optimizes some other common case and adds another option like --search-pdf (just as an example) we have a general group name. Doesn't matter to me unless someone complains.
collision with ancient compress/uncompress zcat on MacOS/Darwin.
This looks great! For your first Rust code, this ain't bad at all! :-)
I am going to clean this up with a variety of minor nits, but overall, your approach looks solid!
@c-blake Do you want to think of a different short option for this other than
Some notes from working through the code and cleaning it up:
I've opened #989 with these changes.
request: #978
There are a bunch of choices I should perhaps mention in this PR.
I called it "-P,--preprocessor" to suggest its primary function.
I basically just imitated the way decompression was handled, with a new file, src/preprocessor.rs, modeled on src/decompressor.rs.
Currently, if both --search-zip and --preprocessor PROGRAM are given, the latter is used. This seemed reasonable since preprocessing is more general and probably involves more sophisticated users who can more easily include whatever compression programs they want (or don't want) using whatever dispatching algorithm they want.
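Since the preprocessor is free to pick its own dispatching, one option is to classify each file by its leading magic bytes and pass everything else through untouched, in the spirit of the "pass-through for uncoded data" classifier described earlier in this thread. The sketch below is hypothetical: the `Decoder` variants and the idea of mapping them to programs such as gzip or pdftotext are assumptions for illustration, not part of this PR.

```rust
/// What a preprocessor might do with a file, decided from its first bytes.
/// Hypothetical classification for illustration only.
#[derive(Debug, PartialEq)]
pub enum Decoder {
    Gzip,
    Bzip2,
    Xz,
    Pdf,
    /// Not a recognized encoding: copy the bytes through unchanged.
    PassThrough,
}

/// Classify a file by its leading magic bytes.
pub fn classify(header: &[u8]) -> Decoder {
    if header.starts_with(&[0x1f, 0x8b]) {
        Decoder::Gzip
    } else if header.starts_with(b"BZh") {
        Decoder::Bzip2
    } else if header.starts_with(&[0xfd, b'7', b'z', b'X', b'Z', 0x00]) {
        Decoder::Xz
    } else if header.starts_with(b"%PDF-") {
        Decoder::Pdf
    } else {
        Decoder::PassThrough
    }
}
```

A wrapper would read a few header bytes, call `classify`, and then either run the matching decoder or copy the file straight to stdout. Doing that check in a compiled binary rather than a shell `case` dispatcher is what keeps the per-file cost low, per the overhead discussion above.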
For me, it compiles and runs fine on rust-1.27.1 with no -P at all, with -P program-using-only-stdin, and with -P program-using-argv1. The only failing I see right now on Linux is that if you specify a bogus preprocessor like rg -P /junk foo, it does not error out at the very first file.
Oh, and while the shell script snippet formats fine in the --help output, the
auto-generated man page corrupts it into a 2- or 3-liner with some characters dropped. Not sure what to do about that. We could also just put that material in GUIDE.md and drop it from the help.
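Both kinds of preprocessor mentioned above (one that reads only stdin, one that reads argv[1]) can be served by passing the file path as the first argument and connecting the file itself to stdin. The sketch below does that with std::process and returns the captured output; `run_preprocessor` is a hypothetical name, and this is not the code in src/preprocessor.rs. It also suggests one way to address the /junk case: `Command::spawn` returns an error immediately when the program cannot be executed, so the caller can stop at the first file instead of carrying on.

```rust
use std::fs::File;
use std::io;
use std::path::Path;
use std::process::{Command, Stdio};

/// Hypothetical helper: run `COMMAND FILE` with FILE also attached to
/// stdin, and return whatever the command writes to stdout.
pub fn run_preprocessor(command: &str, path: &Path) -> io::Result<Vec<u8>> {
    let child = Command::new(command)
        // For programs that read their input from argv[1].
        .arg(path)
        // For programs that read only stdin.
        .stdin(Stdio::from(File::open(path)?))
        .stdout(Stdio::piped())
        .stderr(Stdio::inherit())
        // Fails right away if `command` does not exist or isn't executable.
        .spawn()?;
    let output = child.wait_with_output()?;
    // Design choice for this sketch: a non-zero exit status is an error.
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("preprocessor {} failed on {}: {}", command, path.display(), output.status),
        ));
    }
    Ok(output.stdout)
}
```

With this shape, rg -P /junk foo would get a NotFound error from `spawn` on the very first file and could report it and exit, rather than silently skipping every file.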
Also, this is my very first significant stab at Rust work, so I may well have done some things in an undesirable way.