sp (Search and Print) is a basic implementation of grep/ripgrep. It can be used to find patterns/words in files.
The main idea behind sp was to create a CLI whose feature set lies somewhere between plain substring matching and a regex search. In its current state, sp is best seen as a limited extension of a substring search.
USAGE:
sp [OPTIONS] <PATTERN> <PATH>
ARGS:
<PATTERN> A pattern used for matching
<PATH> A file to search
OPTIONS:
-c, --count Suppress normal output and show number of matching lines
-e, --ends-with Only show matches containing fields ending with PATTERN
-h, --help Prints help information
-i, --ignore-case Case insensitive search
-m, --max-count <NUM> Limit number of shown matches
-n, --no-line-number Do not show line number which is enabled by default
-s, --starts-with Only show matches containing fields starting with PATTERN
-V, --version Prints version information
-w, --words Whole words search (i.e. non-word characters are stripped)
Fields are strings separated by contiguous whitespace (as defined by Unicode)
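To make the notion of fields more concrete, here is a minimal sketch of how --starts-with and --ends-with could be interpreted. It is not taken from sp's source: the FieldMatch enum and the line_matches function are hypothetical names, and the sketch only assumes that a line matches when at least one whitespace-separated field starts or ends with PATTERN.

```rust
/// How a field is compared against the pattern (hypothetical type, not from sp).
#[derive(Clone, Copy)]
pub enum FieldMatch {
    StartsWith,
    EndsWith,
}

/// Returns true if at least one field of `line` matches `pattern`.
/// Fields are produced by `split_whitespace`, which splits on runs of
/// Unicode whitespace and never yields empty fields.
pub fn line_matches(line: &str, pattern: &str, mode: FieldMatch) -> bool {
    line.split_whitespace().any(|field| match mode {
        FieldMatch::StartsWith => field.starts_with(pattern),
        FieldMatch::EndsWith => field.ends_with(pattern),
    })
}
```

Under these assumptions, the line "foo bar  baz" matches the pattern "ba" in StartsWith mode (via "bar" and "baz"), while "foobar baz" does not match "foo" in EndsWith mode.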
This is a Rust project, so first make sure that Rust is installed on your machine.
To build this project:
$ git clone https://github.com/streof/sp
$ cd sp
$ cargo build --release
$ ./target/release/sp --version
sp 0.1.2
sp includes unit and integration tests. Run them as follows:
$ cargo test
The main goal of this project was to reimplement a small number of grep/ripgrep-like features. Performance was not a strict requirement, although, in my opinion, CLIs should at least be perceived as fast by their users. Performance is obviously a trade-off and for sp depends on things like:
- memory allocation
- cpu utilization
- heuristics
- algorithm implementation (e.g. searching, encoding/decoding)
- number of system calls
Here are some thoughts from the exploration phase:
- A simple way to reduce memory allocation is using an IO buffer. Rust's standard library provides, for example, the very convenient lines API but also the lower-level read_line and read_until methods. Initially, this project used read_line, but then I read this reddit thread where linereader was mentioned. I ended up using bstr, which offers a good balance between a rich, ergonomic API and performance (see this commit).
- Counts in sp rely on a very naive implementation that does not take any advantage of modern CPU capabilities (see for example bytecount).
- The current matching algorithm relies on high-level APIs exposed by bstr. However, I also performed some simple benchmarks which suggested that switching to twoway would give a significant performance boost (>2x speedup). Rust uses the two-way algorithm for things like pattern matching, although the implementation differs from the one provided by the twoway crate.
- In some cases the number of read syscalls used by sp is significantly higher than when using ripgrep.
- Ripgrep uses encoding_rs for fast encoding/decoding.
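The point about IO buffers and allocation can be illustrated with the standard library alone. The sketch below is not sp's implementation (which uses bstr). It only shows the general idea of reusing a single buffer with read_until instead of allocating a new String per line as the lines API does. The function names count_matching_lines and contains are hypothetical, and the byte search is deliberately naive.

```rust
use std::io::{self, BufRead};

/// Counts lines containing `needle`, reusing one buffer for every line.
/// (Hypothetical helper for illustration, not part of sp.)
pub fn count_matching_lines<R: BufRead>(mut reader: R, needle: &[u8]) -> io::Result<usize> {
    let mut buf = Vec::new();
    let mut count = 0;
    loop {
        // Clearing keeps the allocated capacity, so later lines reuse it.
        buf.clear();
        // read_until appends bytes up to and including the b'\n' delimiter
        // and returns 0 once the input is exhausted.
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break;
        }
        if contains(&buf, needle) {
            count += 1;
        }
    }
    Ok(count)
}

/// Naive byte-level substring search; an empty needle matches every line.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    needle.is_empty() || haystack.windows(needle.len()).any(|w| w == needle)
}
```

Any BufRead works as input, for example a BufReader wrapping a File or a plain byte slice. A dedicated search routine such as the ones mentioned above would replace the naive contains helper.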