Skip to content

Conversation

@drinkcat
Copy link
Collaborator

@drinkcat drinkcat commented Apr 18, 2025

Follow-up of a discussion in #7645 by @karlmcdowall. We came up with similar string-arithmetic functions, so it's probably best to merge them. As part of this, I realized that I could easily extract fast_inc_one from fast_inc, as this is what cat requires.

This stacks on top of #7564, and makes me a bit more comfortable about the extra complexity, seeing that this code can be reused.

Performance-wise, this is about 4-5% better than the original implementation.

find -name '*.rs' | xargs -I{} cat {} > all
cargo build -r -p uu_cat && taskset -c 0 hyperfine -L cat cat,./cat-main,./target/release/cat "{cat} -n all"
    Finished `release` profile [optimized] target(s) in 0.07s
Benchmark 1: cat -n all
  Time (mean ± σ):      39.1 ms ±   2.0 ms    [User: 33.6 ms, System: 5.2 ms]
  Range (min … max):    38.4 ms …  53.0 ms    64 runs
 
Benchmark 2: ./cat-main -n all
  Time (mean ± σ):      21.8 ms ±   0.3 ms    [User: 15.0 ms, System: 6.7 ms]
  Range (min … max):    21.0 ms …  22.4 ms    132 runs
 
Benchmark 3: ./target/release/cat -n all
  Time (mean ± σ):      20.9 ms ±   1.2 ms    [User: 14.1 ms, System: 6.6 ms]
  Range (min … max):    20.1 ms …  33.8 ms    136 runs
 
Summary
  ./target/release/cat -n all ran
    1.04 ± 0.06 times faster than ./cat-main -n all
    1.87 ± 0.14 times faster than cat -n all

cat: add LineNumber.to_str to clean up tests, limit to 32 digits

uucore: fast_inc: Change start to a &mut

Instead of having the caller repeatedly reassign start, it's easier
to just pass it as a mutable reference.

cat: Switch to uucore's fast_inc_one

Instead of reimplementing a string increment function, use the
one in uucore. Also, performance is around 5% better.

uucore: Move fast_inc functions from seq

A new fast-inc feature, to be used by seq and cat.

seq: fast_inc: split carry operation to a separate function

This has no impact on performance, and will be useful for the
cat usecase when we move this to uucore.

assert_eq!(b" 1\t", incrementing_string.buf.as_slice());
assert_eq!(
b" 1\t",
&incrementing_string.buf[incrementing_string.print_start..]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I see those not-so-nice changes here, I wonder if I should instead create some struct to encapsulate the buffer, start/end position (basically, similar to what LineNumber does here, but move to uucore)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it difficult to merge code, as the logic in cat and seq is quite custom and different (different padding logic, etc...). We could try to merge if we have more callers, otherwise I think it's good as is.

One thing I changed though is to pass start as a &mut, makes the calling API a little bit nicer IMHO.

@github-actions
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Copy link
Contributor

@karlmcdowall karlmcdowall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's great to move cat over to the new uucore code, thanks for cleaning this all up 👍

// called, using uucore's fast_inc function that operates on strings.
impl LineNumber {
fn new() -> Self {
// 1024-digit long line number should be enough to run `cat` for the lifetime of the universe.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it straight forward to change buf to a [u8; 1024] rather than using the Vec<u8>? I think that would be slightly more efficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the buffer only needs to hold up to u64::MAX, plus the b'\t' right? Would save a few bytes...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I could shrink the buffer quite a bit.

Copy link
Collaborator Author

@drinkcat drinkcat Apr 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the line number could grow larger than u64::MAX, as fast_inc operates on arbitrary large integers (but that's mostly a theoretical question, not something that would happen in reality...).

In any case, 32 digits is more than enough, and I used a [u8; 32] as you advised.

Thanks!

let mut new_val = inc[inc_pos] + carry;
// Be careful here, only add existing digit of val.
if pos >= start {
new_val += val[pos] - b'0';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an API change, so I'm not sure if you want to take it on, but if inc already had the b'0' subtracted from each digit (i.e. rather than each u8 in inc containing the ascii character for the value, it just contains the value) you could avoid subtracting the 'b'0' for every character on every loop.
Does that make sense? I'd probably want to wrap the inc in my own struct type like...

struct FastIncDelta {
   buf: Vec<u8>,
}

impl FastIncDelta {
    fn new(mut val: usize) -> Self {
        let buf: Vec<u8>;
        while val > 0 {
            buf.insert(0, val%10);
            val = val/10;
        }
    }
}

I wouldn't mainline that code obviously but hopefully that gives you the idea of what I'm thinking.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, if I had a FastIncrementer struct, I could easily precompute the increment to subtract the '0's...

But, I don't think this is going to help, because I then need to add extra logic in the loop (since I used to rely on inc digits starting with 0...) and the execution becomes slower.

In caller: let inc_str: Vec<u8> = inc_str.as_bytes().iter().map(|x| *x - b'0').collect();

Then:

        let mut new_val = inc[inc_pos] + carry;
        // Be careful here, only add existing digit of val.
        if pos >= start {
            new_val += val[pos];
        } else { /// <<< this is a problem
            new_val += b'0';
        }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I understand now. Thanks!

@sylvestre sylvestre force-pushed the seq-perf-more-use-cat branch from 592a219 to 63c7edd Compare April 19, 2025 10:03
@karlmcdowall
Copy link
Contributor

Hey, I just wanted to draw your attention to format_byte_offset in od/src/inputoffset.rs. Seems like another good candidate for a client of this code, though we'd need to add support for hex and octal representations (not sure if that's something you want to bake into the API?).

@drinkcat
Copy link
Collaborator Author

Hey, I just wanted to draw your attention to format_byte_offset in od/src/inputoffset.rs. Seems like another good candidate for a client of this code, though we'd need to add support for hex and octal representations (not sure if that's something you want to bake into the API?).

Interesting. We could do some quick prototype to evaluate performance. One thing to note though is that seq and cat do very little other computation/formatting apart from incrementing that number (seq does nothing else, cat just prints the rest of the line), so the performance advantage of a custom incrementer is very large. It may not matter as much in od (or hexdump) that have to do a lot more formatting on the rest of the data.

@sylvestre sylvestre requested a review from Copilot April 21, 2025 08:25
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR extracts string‐arithmetic functions from seq into uucore, and then uses the optimized fast_inc_one function in cat to improve performance and reduce duplicated code. Key changes include:

  • Moving and exposing fast_inc functions from seq in uucore.
  • Updating seq to enable a fast-print integer path using fast_inc.
  • Refactoring cat’s LineNumber implementation to leverage fast_inc_one.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/by-util/test_seq.rs Adds extra test cases for empty separators in both integer and float paths.
src/uucore/src/lib/lib.rs Re-exports the fast_inc module under a new feature flag.
src/uucore/src/lib/features/fast_inc.rs Implements fast_inc and fast_inc_one for efficient string arithmetic.
src/uucore/src/lib/features/extendedbigdecimal.rs Updates spell-check ignore list with new identifiers.
src/uucore/src/lib/features.rs Includes the fast_inc module.
src/uucore/Cargo.toml Adds the fast-inc feature dependency.
src/uu/seq/src/seq.rs Introduces a fast_print_seq function and fast path activation logic.
src/uu/seq/BENCHMARKING.md Documents the performance gains from using the fast increment path.
src/uu/cat/src/cat.rs Refactors the LineNumber struct to use a fixed-size array with fast_inc_one.
src/uu/cat/Cargo.toml Enables the fast-inc feature in cat.
Comments suppressed due to low confidence (1)

src/uucore/src/lib/features/fast_inc.rs:33

  • [nitpick] Consider renaming the variable 'pos' to a more descriptive name such as 'index' to clarify its role as a buffer index.
let mut pos = end;

…ements

A lot of custom logic, we basically do arithmetic on character
arrays, but this comes at with huge performance gains.

Unlike coreutils `seq`, we do this for all positive increments
(because why not), and we do not fall back to slow path if
the last parameter is in scientific notation.

Also, add some tests for empty separator, as that may catch
some corner cases.
It is actually quite easy to implement, we just start with
a padded number and increment as usual.
This has no impact on performance, and will be useful for the
`cat` usecase when we move this to uucore.
A new fast-inc feature, to be used by seq and cat.
Instead of reimplementing a string increment function, use the
one in uucore. Also, performance is around 5% better.
Instead of having the caller repeatedly reassign start, it's easier
to just pass it as a mutable reference.
@drinkcat drinkcat force-pushed the seq-perf-more-use-cat branch from 63c7edd to e84de9b Compare April 21, 2025 09:32
@github-actions
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/misc/tee is no longer failing!

///
/// We also assume that there is enough space in val to expand if start needs
/// to be updated.
/// ```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use the // example syntax ?

Copy link
Collaborator Author

@drinkcat drinkcat Apr 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the markdown header?

///
/// # Examples
///
/// ```

Those examples are already compiled when running with cargo test --workspace --doc

@sylvestre sylvestre merged commit 1986c96 into uutils:main Apr 22, 2025
70 checks passed
@drinkcat drinkcat deleted the seq-perf-more-use-cat branch April 22, 2025 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants