
Commit

only split by newlines
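To reduce the overhead of the Extractor itself, we can chunk the work by
lines instead of by every whitespace-separated chunk. Each chunk is passed
to the Extractor separately, so fewer, larger chunks mean fewer Extractor
calls.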

This seems to improve the overall cost even more!

Co-authored-by: Jordan Pittman <jordan@cryptica.me>
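A minimal sketch of why this helps, assuming each chunk costs one Extractor call. The helper names `whitespace_chunks` and `line_chunks` are hypothetical, and std's sequential `<[u8]>::split` stands in for rayon's `par_split`, which takes the same kind of `&u8` predicate. The sketch only counts how many chunks each strategy produces.

```rust
// Hypothetical helper: the old strategy, one chunk per
// whitespace-separated run (std `split` stands in for rayon `par_split`).
fn whitespace_chunks(blob: &[u8]) -> Vec<&[u8]> {
    blob.split(|x| x.is_ascii_whitespace()).collect()
}

// Hypothetical helper: the new strategy, one chunk per line.
// `\r` is also a separator, so a `\r\n` ending yields an extra empty chunk.
fn line_chunks(blob: &[u8]) -> Vec<&[u8]> {
    blob.split(|x| matches!(x, b'\n' | b'\r')).collect()
}
```

For the blob `b"flex items-center p-4\nbg-red-500 text-sm\n"`, `whitespace_chunks` returns 6 slices (the last one empty) while `line_chunks` returns 3 (also ending with an empty slice). Under the one-call-per-chunk assumption, the extractor would run half as often on this input.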
RobinMalfait and thecrypticace committed Dec 2, 2024
1 parent e99f276 commit 8fe3977
Showing 1 changed file with 1 addition and 1 deletion.
crates/oxide/src/lib.rs: 2 changes (1 addition, 1 deletion)
@@ -456,7 +456,7 @@ fn read_all_files(changed_content: Vec<ChangedContent>) -> Vec<Vec<u8>> {
 fn parse_all_blobs(blobs: Vec<Vec<u8>>) -> Vec<String> {
     let mut result: Vec<_> = blobs
         .par_iter()
-        .flat_map(|blob| blob.par_split(|x| x.is_ascii_whitespace()))
+        .flat_map(|blob| blob.par_split(|x| matches!(x, b'\n' | b'\r')))
         .map(|blob| Extractor::unique(blob, Default::default()))
         .reduce(Default::default, |mut a, b| {
             a.extend(b);
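The changed pipeline can be sketched sequentially with only the standard library, under stated assumptions: `extract_unique` is a hypothetical stand-in for `Extractor::unique` that just collects whitespace-separated tokens, `Iterator::fold` replaces rayon's `reduce(identity, op)`, and the final conversion into a sorted `Vec<String>` is an illustrative guess, since the tail of `parse_all_blobs` is not shown in the hunk.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for `Extractor::unique`: treats every
// whitespace-separated token in a chunk as a candidate. The real
// extractor does much more; this only illustrates the data flow.
fn extract_unique(chunk: &[u8]) -> HashSet<&[u8]> {
    chunk
        .split(|x| x.is_ascii_whitespace())
        .filter(|token| !token.is_empty())
        .collect()
}

// Sequential sketch (hypothetical name) of the pipeline in the diff:
// split each blob into lines, extract candidates per line, merge the sets.
fn parse_all_blobs_sketch(blobs: &[Vec<u8>]) -> Vec<String> {
    let merged = blobs
        .iter()
        .flat_map(|blob| blob.split(|x| matches!(x, b'\n' | b'\r')))
        .map(extract_unique)
        // `fold` plays the role of rayon's `reduce(Default::default, ...)`.
        .fold(HashSet::<&[u8]>::new(), |mut a, b| {
            a.extend(b);
            a
        });

    // Assumed tail: convert to owned strings, sorted for deterministic output.
    let mut result: Vec<String> = merged
        .into_iter()
        .map(|bytes| String::from_utf8_lossy(bytes).into_owned())
        .collect();
    result.sort();
    result
}
```

Swapping `iter` for `par_iter`, `split` for `par_split`, and `fold` for `reduce(Default::default, ...)` gives the shape shown in the hunk; each line then becomes an independent unit of work whose set is merged with `extend`.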
