Commit 1331ff1

docs: add benchmarking guidelines for cp (#8807)
1 parent be25344 commit 1331ff1

4 files changed: 273 additions, 0 deletions

Cargo.lock

Lines changed: 2 additions & 0 deletions
Generated file; diff not rendered by default.

src/uu/cp/BENCHMARKING.md

Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
<!-- spell-checker:ignore hyperfine tmpfs reflink fsxattr xattrs clonefile vmtouch APFS pathlib Btrfs fallocate journaling -->

# Benchmarking cp

`cp` copies file contents together with metadata such as permissions, ownership,
timestamps, extended attributes, and directory structures. Although copying
looks simple, `cp` exercises many filesystem features. Its performance depends
heavily on the workload shape (large sequential files, many tiny files, special
files, sparse images) and the storage stack underneath.

## Understanding cp

Most of the time spent inside `cp` falls into two broad categories:

- **Data transfer path**: When copying large contiguous files, throughput is
  dominated by read/write bandwidth. The overhead from `cp` itself comes from
  performing buffered reads and writes, copying memory between buffers, and the
  number of system calls issued per block.
- **Metadata handling**: When recursively copying trees with thousands of small
  files, performance is limited by metadata work such as `open`, `stat`,
  `lstat`, attribute preservation, directory creation, and link handling.

`cp` supports many switches that alter these paths, including attribute
preservation, hard-link and reflink creation, sparse detection, and
`--remove-destination` semantics. Benchmarks should call out which pathways are
being exercised so results can be interpreted correctly.

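To make the "system calls issued per block" cost concrete, the sketch below estimates how many `read` and `write` calls a plain buffered copy loop would issue for a given file and buffer size. The buffer sizes are illustrative assumptions, not the values `cp` actually uses, and the model ignores fast paths such as reflinks.

```python
def buffered_copy_syscalls(file_size: int, buffer_size: int) -> int:
    """Estimate read+write calls for a simple read/write copy loop.

    Assumes every read fills the buffer (except possibly the last one) and
    every write drains it in a single call. The final zero-length read that
    signals end-of-file is counted as well.
    """
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    blocks = -(-file_size // buffer_size)  # ceiling division on integers
    return 2 * blocks + 1  # reads + writes + final EOF read


two_gib = 2 * 1024**3
for buf in (64 * 1024, 1024 * 1024):  # hypothetical buffer sizes
    print(f"{buf:>8} byte buffer: {buffered_copy_syscalls(two_gib, buf)} calls")
```

Comparing such an estimate with the counts reported by a tracing tool can hint at whether a run used a buffered loop or a kernel-side copy path.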
## Benchmarking guidelines

- Build a release binary first: `cargo build --release -p uu_cp`.
- Use `hyperfine` for timing and rely on the `--prepare` hook to reset state
  between runs.
- Prefer running on a fast device (RAM disk, tmpfs, NVMe) to minimize raw
  storage latency when isolating the cost of the tool.
- On Linux, control the page cache where appropriate using tools like
  `vmtouch` or `sync && echo 3 > /proc/sys/vm/drop_caches` (root required;
  `sync` flushes dirty pages first so they can be dropped). Prioritize
  repeatability and stay within the policies of the host system.
- Keep the workload definition explicit. When comparing against GNU `cp` or
  other implementations, ensure identical datasets and mount options.

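One way to keep a comparison against GNU `cp` explicit is to generate the `hyperfine` invocation from a single workload definition, so every implementation receives the same arguments and the same `--prepare` hook. The following is a minimal sketch; the helper name and the binary paths are assumptions made for illustration.

```python
import shlex


def hyperfine_command(binaries, workload, prepare, warmup=2, export_json=None):
    """Build a hyperfine argv that times `workload` under each binary.

    `binaries` maps a label to the path of a cp implementation. Every entry
    is benchmarked with identical arguments and the same --prepare hook.
    """
    argv = ["hyperfine", "--warmup", str(warmup), "--prepare", prepare]
    if export_json:
        argv += ["--export-json", export_json]
    for label, path in binaries.items():
        argv += ["--command-name", label, f"{shlex.quote(path)} {workload}"]
    return argv


argv = hyperfine_command(
    {"gnu": "/usr/bin/cp", "uutils": "target/release/cp"},  # assumed paths
    "input.bin output.bin",
    prepare="rm -f output.bin",
)
print(shlex.join(argv))
```

Printing the command instead of running it keeps the sketch side-effect free; pass the list to `subprocess.run` once the paths match the local setup.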
## Large-file throughput

1. Create a clean working directory and reduce cache interference.
2. Generate an input file of known size, for example with `truncate` or `dd`.
   Note that `truncate` produces a fully sparse file, so use `dd` when the copy
   must read real data blocks.
3. Run repeated copies with `hyperfine`, deleting the destination beforehand.

```shell
# Assumes the commands start in the repository root
mkdir -p benchmark/cp && cd benchmark/cp
truncate -s 2G input.bin
hyperfine \
  --warmup 2 \
  --prepare 'rm -f output.bin' \
  '../../target/release/cp input.bin output.bin'
```

What to record:

- Achieved throughput (MB/s) for large sequential copies.
- Behavior with `--reflink=auto` or `--sparse=auto` on filesystems that
  support copy-on-write or sparse regions.
- CPU overhead when enabling attribute preservation such as
  `--preserve=mode,timestamps,xattr`.

If the underlying filesystem performs transparent copy-on-write (for example,
APFS via `clonefile`), consider running the same benchmark with `--reflink=never`
or on a filesystem without reflink support to measure raw data transfer.

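Throughput can be derived from the timings that `hyperfine` writes with `--export-json`. The sketch below reads the `results` array and its `command` and `mean` fields (mean wall time in seconds) and converts them to MB/s. The function names are illustrative, and the file size has to be supplied by the caller.

```python
import json


def throughput_mb_s(size_bytes: int, seconds: float) -> float:
    """Throughput in MB/s, using 1 MB = 10**6 bytes."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_bytes / seconds / 1_000_000


def summarize(hyperfine_json: str, size_bytes: int) -> dict:
    """Map each benchmarked command to its mean throughput in MB/s."""
    results = json.loads(hyperfine_json)["results"]
    return {r["command"]: throughput_mb_s(size_bytes, r["mean"]) for r in results}
```

For a run exported with `--export-json result.json`, calling `summarize(open("result.json").read(), 2 * 1024**3)` would report the throughput for the 2 GiB input used in this section.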
## Many small files

Large directory trees stress metadata throughput. Pre-create a synthetic tree
and copy it recursively.

```shell
# Assumes the current directory is benchmark/cp, two levels below the repository root
mkdir -p dataset/src
python3 - <<'PY'
from pathlib import Path
root = Path('dataset/src')
for i in range(2000):
    sub = root / f'dir_{i//200}'
    sub.mkdir(parents=True, exist_ok=True)
    for j in range(5):
        path = sub / f'file_{i}_{j}.txt'
        path.write_text('payload' * 16)
PY
hyperfine \
  --warmup 1 \
  --prepare 'rm -rf dataset/dst && mkdir -p dataset/dst' \
  '../../target/release/cp -r dataset/src dataset/dst'
```

What to record:

- Time spent in directory traversal and metadata replication.
- Impact of toggling options such as `--preserve`, `--no-preserve`, `--link`,
  `--symbolic-link`, and `--archive`.
- Behavior when symbolic links or hard links are present, especially with
  `--dereference` versus `--no-dereference`.

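To exercise link handling, the synthetic tree needs to contain links in the first place. A minimal sketch, assuming the `dataset/src` layout from this section, that adds one symbolic link and one hard link next to every tenth regular file; the helper name and the `.sym` / `.hard` suffixes are made up for illustration.

```python
import os
from pathlib import Path


def add_links(root: Path, every: int = 10):
    """Add a symlink and a hard link next to every `every`-th regular file.

    Returns a tuple (symlinks_created, hard_links_created).
    """
    files = sorted(p for p in root.rglob("*") if p.is_file() and not p.is_symlink())
    symlinks = hard_links = 0
    for target in files[::every]:
        sym = target.with_name(target.name + ".sym")
        hard = target.with_name(target.name + ".hard")
        sym.symlink_to(target.name)  # relative link within the same directory
        os.link(target, hard)        # second name for the same inode
        symlinks += 1
        hard_links += 1
    return symlinks, hard_links
```

Running `add_links(Path("dataset/src"))` once before the benchmark gives `--dereference` and `--no-dereference` something to differ on.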
## Copy-on-write and sparse files

`--reflink=always` can dramatically reduce work on Btrfs, XFS, APFS, and other
reflink-aware filesystems. Compare results with `--reflink=never` to understand
how much time is spent in copy-on-write system calls versus fallback copying.
Sparse workloads benefit from dedicated benchmarks as well.

```shell
# Assumes the current directory is benchmark/cp, two levels below the repository root
truncate -s 4G sparse.img
fallocate -d sparse.img  # On filesystems that support punching holes
hyperfine \
  --prepare 'rm -f sparse-copy.img' \
  '../../target/release/cp --sparse=always sparse.img sparse-copy.img'
```

Check both the elapsed time and the on-disk size of the destination (for
example using `du -h sparse-copy.img`) to confirm sparse regions are preserved.

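The on-disk check can also be scripted. The sketch below compares a file's apparent size with its allocated size, assuming `st_blocks` is reported in 512-byte units as it is on Linux; the function names are illustrative.

```python
import os


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for `path`."""
    st = os.stat(path)
    allocated = st.st_blocks * 512  # assumes 512-byte units, as on Linux
    return st.st_size, allocated


def allocated_ratio(path):
    """Fraction of the apparent size that is backed by allocated blocks."""
    apparent, allocated = sizes(path)
    return allocated / apparent if apparent else 0.0
```

A ratio well below 1.0 for `sparse-copy.img` suggests that holes were preserved; a ratio near 1.0 suggests the copy wrote the zeros out in full.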
## Evaluating attribute preservation and extras

Measure the incremental cost of individual options by enabling them one at a
time:

- Test `--preserve=context` or `--preserve=xattr` on files that actually carry
  SELinux contexts or extended attributes.
- Evaluate ACL and SELinux handling with `--archive` on systems where those
  features are active.
- Compare modes that remove or back up the destination (`--remove-destination`,
  `--backup=numbered`) to see the impact of extra file operations.

Supplementary analysis with `strace -c` or `perf record` can show which system
calls dominate and guide optimization work.

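The summary table printed by `strace -c` can be ranked programmatically. The following is a sketch that assumes the usual column layout (`% time`, `seconds`, `usecs/call`, `calls`, `errors`, `syscall`) and skips the header, separator, and `total` lines; the function name is illustrative.

```python
def parse_strace_summary(text):
    """Return (syscall, calls, seconds) tuples sorted by time, largest first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields, or 6 when the errors column is filled in.
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            seconds = float(fields[1])
            calls = int(fields[3])
        except ValueError:
            continue  # separator line made of dashes
        rows.append((fields[-1], calls, seconds))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Feeding it the output of `strace -c -o summary.txt` for two option sets makes it easier to see which calls an option such as `--preserve=xattr` adds.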
## Interpreting results

- If a benchmark completes in well under a second, increase the dataset size to
  reduce process start-up noise.
- Document filesystem features such as journaling, compression, or encryption
  that may skew results.
- When changes are made to `cp`, track how system call counts, I/O patterns,
  and CPU time shift between runs to catch regressions early.

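Tracking shifts between runs is easier with an explicit rule for what counts as a regression. A minimal sketch, assuming mean and standard deviation values taken from two timing runs; the 5% threshold is an arbitrary illustrative choice, not a project policy.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline (0.10 means 10% slower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline


def is_regression(baseline_mean, baseline_stddev, candidate_mean, threshold=0.05):
    """Flag a slowdown larger than both the noise level and the threshold."""
    slowdown = candidate_mean - baseline_mean
    return (
        slowdown > baseline_stddev
        and relative_change(baseline_mean, candidate_mean) > threshold
    )
```

Requiring the slowdown to exceed one baseline standard deviation keeps noisy short runs from being reported as regressions.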
Use these guidelines to isolate the workloads you care about (large sequential
transfers, directory-heavy copies, attribute preservation, reflink paths) and
collect reproducible measurements.

src/uu/cp/Cargo.toml

Lines changed: 9 additions & 0 deletions
@@ -47,6 +47,15 @@ exacl = { workspace = true, optional = true }
 name = "cp"
 path = "src/main.rs"
 
+[dev-dependencies]
+divan = { workspace = true }
+tempfile = { workspace = true }
+uucore = { workspace = true, features = ["benchmark"] }
+
+[[bench]]
+name = "cp_bench"
+harness = false
+
 [features]
 feat_selinux = ["selinux", "uucore/selinux"]
 feat_acl = ["exacl"]

src/uu/cp/benches/cp_bench.rs

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
// This file is part of the uutils coreutils package.
//
// For the full copyright and license information, please view the LICENSE
// file that was distributed with this source code.

use divan::{Bencher, black_box};
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;
use tempfile::TempDir;
use uu_cp::uumain;
use uucore::benchmark::{fs_tree, run_util_function};

fn remove_path(path: &Path) {
    if !path.exists() {
        return;
    }

    if path.is_dir() {
        fs::remove_dir_all(path).unwrap();
    } else {
        fs::remove_file(path).unwrap();
    }
}

fn bench_cp_directory<F>(bencher: Bencher, args: &[&str], setup_source: F)
where
    F: Fn(&Path),
{
    let temp_dir = TempDir::new().unwrap();
    let source = temp_dir.path().join("source");
    let dest = temp_dir.path().join("dest");

    fs::create_dir(&source).unwrap();
    setup_source(&source);

    let source_str = source.to_str().unwrap();
    let dest_str = dest.to_str().unwrap();

    bencher.bench(|| {
        remove_path(&dest);

        let mut full_args = Vec::with_capacity(args.len() + 2);
        full_args.extend_from_slice(args);
        full_args.push(source_str);
        full_args.push(dest_str);

        black_box(run_util_function(uumain, &full_args));
    });
}

#[divan::bench(args = [(5, 4, 10)])]
fn cp_recursive_balanced_tree(
    bencher: Bencher,
    (depth, dirs_per_level, files_per_dir): (usize, usize, usize),
) {
    bench_cp_directory(bencher, &["-R"], |source| {
        fs_tree::create_balanced_tree(source, depth, dirs_per_level, files_per_dir);
    });
}

#[divan::bench(args = [(5, 4, 10)])]
fn cp_archive_balanced_tree(
    bencher: Bencher,
    (depth, dirs_per_level, files_per_dir): (usize, usize, usize),
) {
    bench_cp_directory(bencher, &["-a"], |source| {
        fs_tree::create_balanced_tree(source, depth, dirs_per_level, files_per_dir);
    });
}

#[divan::bench(args = [(6000, 800)])]
fn cp_recursive_wide_tree(bencher: Bencher, (total_files, total_dirs): (usize, usize)) {
    bench_cp_directory(bencher, &["-R"], |source| {
        fs_tree::create_wide_tree(source, total_files, total_dirs);
    });
}

#[divan::bench(args = [(120, 4)])]
fn cp_recursive_deep_tree(bencher: Bencher, (depth, files_per_level): (usize, usize)) {
    bench_cp_directory(bencher, &["-R"], |source| {
        fs_tree::create_deep_tree(source, depth, files_per_level);
    });
}

#[divan::bench(args = [(5, 4, 10)])]
fn cp_preserve_metadata(
    bencher: Bencher,
    (depth, dirs_per_level, files_per_dir): (usize, usize, usize),
) {
    bench_cp_directory(bencher, &["-R", "--preserve=mode,timestamps"], |source| {
        fs_tree::create_balanced_tree(source, depth, dirs_per_level, files_per_dir);
    });
}

#[divan::bench(args = [16])]
fn cp_large_file(bencher: Bencher, size_mb: usize) {
    let temp_dir = TempDir::new().unwrap();
    let source = temp_dir.path().join("source.bin");
    let dest = temp_dir.path().join("dest.bin");

    let buffer = vec![b'x'; size_mb * 1024 * 1024];
    let mut file = File::create(&source).unwrap();
    file.write_all(&buffer).unwrap();
    file.sync_all().unwrap();

    let source_str = source.to_str().unwrap();
    let dest_str = dest.to_str().unwrap();

    bencher.bench(|| {
        remove_path(&dest);

        black_box(run_util_function(uumain, &[source_str, dest_str]));
    });
}

fn main() {
    divan::main();
}
