|
| 1 | +<!-- spell-checker:ignore hyperfine tmpfs reflink fsxattr xattrs clonefile vmtouch APFS pathlib Btrfs fallocate journaling --> |
| 2 | + |
| 3 | +# Benchmarking cp |
| 4 | + |
| 5 | +`cp` copies file contents and, depending on its options, metadata such as |
| 6 | +permissions, ownership, timestamps, and extended attributes; with `-r` it also |
| 7 | +recreates directory trees. Though it looks simple, `cp` exercises many filesystem features. Its performance depends |
| 8 | +heavily on the workload shape (large sequential files, many tiny files, special |
| 9 | +files, sparse images) and the storage stack underneath. |
| 10 | + |
| 11 | +## Understanding cp |
| 12 | + |
| 13 | +Most of the time spent inside `cp` falls into two broad categories: |
| 14 | + |
| 15 | +- **Data transfer path**: When copying large contiguous files, throughput is |
| 16 | +dominated by read/write bandwidth. The overhead from `cp` itself comes from |
| 17 | +performing buffered reads and writes, copying memory between buffers, and the |
| 18 | +number of system calls issued per block. |
| 19 | +- **Metadata handling**: When recursively copying trees with thousands of small |
| 20 | +files, performance is limited by metadata work such as `open`, `stat`, |
| 21 | +`lstat`, attribute preservation, directory creation, and link handling. |
| 22 | + |
| 23 | +`cp` supports many switches that alter these paths, including attribute |
| 24 | +preservation, hard-link and reflink creation, sparse detection, and |
| 25 | +`--remove-destination` semantics. Benchmarks should call out which pathways are |
| 26 | +being exercised so results can be interpreted correctly. |
| 27 | + |
| 28 | +## Benchmarking guidelines |
| 29 | + |
| 30 | +- Build a release binary first: `cargo build --release -p uu_cp`. |
| 31 | +- Use `hyperfine` for timing and rely on the `--prepare` hook to reset state |
| 32 | +between runs. |
| 33 | +- Prefer running on a fast device (RAM disk, tmpfs, NVMe) to minimize raw |
| 34 | +storage latency when isolating the cost of the tool. |
| 35 | +- On Linux, control the page cache where appropriate using tools like |
| 36 | +`vmtouch` or `echo 3 > /proc/sys/vm/drop_caches` (root required). Prioritize |
| 37 | +repeatability and stay within the policies of the host system. |
| 38 | +- Keep the workload definition explicit. When comparing against GNU `cp` or |
| 39 | +other implementations, ensure identical datasets and mount options. |
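
One way to keep the workload definition explicit is to record a short summary of the dataset next to the timings. The sketch below assumes a POSIX shell; `describe_dataset` is a hypothetical helper name, not part of `cp` or `hyperfine`.

```shell
# Hypothetical helper: summarize a dataset directory so the workload can be
# recorded alongside the timings. Uses only POSIX find, cat, wc, and tr.
describe_dataset() {
    dir=$1
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    dirs=$(find "$dir" -type d | wc -l | tr -d ' ')
    bytes=$(find "$dir" -type f -exec cat {} + | wc -c | tr -d ' ')
    printf 'files=%s dirs=%s bytes=%s\n' "$files" "$dirs" "$bytes"
}

# Demo on a throwaway tree: two 3-byte files in one subdirectory.
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'abc' > "$demo/sub/a.txt"
printf 'xyz' > "$demo/sub/b.txt"
describe_dataset "$demo"   # prints: files=2 dirs=2 bytes=6
rm -rf "$demo"
```

The directory count includes the root of the tree itself. Running the helper on both machines before a comparison is a cheap check that the datasets really are identical.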
| 40 | + |
| 41 | +## Large-file throughput |
| 42 | + |
| 43 | +1. Create a clean working directory and reduce cache interference. |
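| 44 | +2. Generate a non-sparse input file of known size, for example with `dd`. A file made by `truncate` alone is a single hole, so a copy may finish without transferring any data. |
| 45 | +3. Run repeated copies with `hyperfine`, deleting the destination beforehand. |
| 46 | + |
| 47 | +```shell |
| 48 | +mkdir -p benchmark/cp && cd benchmark/cp |
| 49 | +dd if=/dev/urandom of=input.bin bs=1M count=2048 # 2 GiB of real data (GNU dd syntax) |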
| 50 | +hyperfine \ |
| 51 | + --warmup 2 \ |
| 52 | + --prepare 'rm -f output.bin' \ |
| 53 | + '../../target/release/cp input.bin output.bin' |
| 54 | +``` |
| 55 | + |
| 56 | +What to record: |
| 57 | + |
| 58 | +- Achieved throughput (MB/s) for large sequential copies. |
| 59 | +- Behavior with `--reflink=auto` or `--sparse=auto` on filesystems that |
| 60 | +support copy-on-write or sparse regions. |
| 61 | +- CPU overhead when enabling attribute preservation such as |
| 62 | +`--preserve=mode,timestamps,xattr`. |
| 63 | + |
| 64 | +If the underlying filesystem performs transparent copy-on-write (for example, |
| 65 | +APFS via `clonefile`), consider running the same benchmark with `--reflink=never` |
| 66 | +or on a filesystem without reflink support to measure raw data transfer. |
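
To turn a mean wall time into the throughput figure mentioned above, a small conversion helper is enough. This is a minimal sketch: `throughput_mbs` is a hypothetical name, the byte count and mean time are entered by hand from the `hyperfine` summary, and 1 MB is taken as 10^6 bytes.

```shell
# Convert a file size in bytes and a mean wall time in seconds to MB/s
# (1 MB = 10^6 bytes). Both arguments are supplied by the caller.
throughput_mbs() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

# Example: a 2 GiB file copied in a mean time of 1.25 s.
throughput_mbs 2147483648 1.25   # prints: 1718.0
```

Numbers far above the device's rated bandwidth usually indicate that the page cache, a reflink, or a sparse input short-circuited the copy.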
| 67 | + |
| 68 | +## Many small files |
| 69 | + |
| 70 | +Large directory trees stress metadata throughput. Pre-create a synthetic tree |
| 71 | +and copy it recursively. |
| 72 | + |
| 73 | +```shell |
| 74 | +mkdir -p dataset/src |
| 75 | +python3 - <<'PY' |
| 76 | +from pathlib import Path |
| 77 | +root = Path('dataset/src') |
| 78 | +for i in range(2000): |
| 79 | +    sub = root / f'dir_{i//200}' |
| 80 | +    sub.mkdir(parents=True, exist_ok=True) |
| 81 | +    for j in range(5): |
| 82 | +        path = sub / f'file_{i}_{j}.txt' |
| 83 | +        path.write_text('payload' * 16) |
| 84 | +PY |
| 85 | +hyperfine \ |
| 86 | + --warmup 1 \ |
| 87 | + --prepare 'rm -rf dataset/dst && mkdir -p dataset/dst' \ |
| 88 | + '../../target/release/cp -r dataset/src dataset/dst' |
| 89 | +``` |
| 90 | + |
| 91 | +What to record: |
| 92 | + |
| 93 | +- Time spent in directory traversal and metadata replication. |
| 94 | +- Impact of toggling options such as `--preserve`, `--no-preserve`, `--link`, |
| 95 | +`--symbolic-link`, and `--archive`. |
| 96 | +- Behavior when symbolic links or hard links are present, especially with |
| 97 | +`--dereference` versus `--no-dereference`. |
| 98 | + |
| 99 | +## Copy-on-write and sparse files |
| 100 | + |
| 101 | +`--reflink=always` can dramatically reduce work on Btrfs, XFS, APFS, and other |
| 102 | +reflink-aware filesystems. Compare results with `--reflink=never` to understand |
| 103 | +how much time is spent in copy-on-write system calls versus fallback copying. |
| 104 | +Sparse workloads benefit from dedicated benchmarks as well. |
| 105 | + |
| 106 | +```shell |
| 107 | +truncate -s 4G sparse.img |
| 108 | +fallocate -d sparse.img # Linux only: deallocate zero-filled ranges where hole punching is supported |
| 109 | +hyperfine \ |
| 110 | + --prepare 'rm -f sparse-copy.img' \ |
| 111 | + '../../target/release/cp --sparse=always sparse.img sparse-copy.img' |
| 112 | +``` |
| 113 | + |
| 114 | +Check both the elapsed time and the on-disk size of the destination (for |
| 115 | +example using `du -h sparse-copy.img`) to confirm sparse regions are preserved. |
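
The on-disk check can also be scripted. The sketch below assumes GNU `stat` and `truncate` (the `-c` format flag is not available in BSD/macOS `stat`); `sizes` and `is_sparse` are hypothetical helper names.

```shell
# Print apparent size and allocated bytes for a file (GNU stat assumed).
# %s = apparent size in bytes, %b = allocated blocks, %B = bytes per block.
sizes() {
    stat -c '%s %b %B' "$1" |
        awk '{ printf "apparent=%d allocated=%d\n", $1, $2 * $3 }'
}

# Succeeds when fewer bytes are allocated than the apparent size.
is_sparse() {
    stat -c '%s %b %B' "$1" | awk '{ exit !($2 * $3 < $1) }'
}

# Demo on a throwaway file consisting of a single 16 MiB hole.
tmp=$(mktemp -d)
truncate -s 16M "$tmp/hole.img"
sizes "$tmp/hole.img"
is_sparse "$tmp/hole.img" && echo "sparse"
rm -rf "$tmp"
```

Running `is_sparse sparse-copy.img` after a benchmark gives a pass/fail signal, which is easier to automate than reading `du -h` output by eye.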
| 116 | + |
| 117 | +## Evaluating attribute preservation and extras |
| 118 | + |
| 119 | +Measure the incremental cost of individual options by enabling them one at a |
| 120 | +time: |
| 121 | + |
| 122 | +- Test `--preserve=xattr` on files that actually carry extended attributes, and |
| 123 | +`--preserve=context` on files with SELinux security contexts. |
| 124 | +- Evaluate ACL and SELinux handling with `--archive` on systems where those |
| 125 | +features are active. |
| 126 | +- Compare modes that remove or back up the destination (`--remove-destination`, |
| 127 | +`--backup=numbered`) to see the impact of extra file operations. |
| 128 | + |
| 129 | +Supplementary analysis with `strace -c` or `perf record` can show which system |
| 130 | +calls dominate and guide optimization work. |
| 131 | + |
| 132 | +## Interpreting results |
| 133 | + |
| 134 | +- If a benchmark completes in well under a second, increase the dataset size to |
| 135 | +reduce process start-up noise. |
| 136 | +- Document filesystem features such as journaling, compression, or encryption |
| 137 | +that may skew results. |
| 138 | +- When changes are made to `cp`, track how system call counts, I/O patterns, |
| 139 | +and CPU time shift between runs to catch regressions early. |
| 140 | + |
| 141 | +Use these guidelines to isolate the workloads you care about (large sequential |
| 142 | +transfers, directory-heavy copies, attribute preservation, reflink paths) and |
| 143 | +collect reproducible measurements. |