@dekuu5
Contributor

@dekuu5 dekuu5 commented Aug 31, 2025

Summary

This PR addresses a TODO in find_mount_point() by properly handling mount point paths that contain non-UTF8 bytes. Previously, to_string_lossy() was used, which could corrupt mount point names with invalid UTF-8 sequences. The fix ensures raw bytes are preserved on Unix-like systems and safely handled on Windows.
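
To illustrate why the lossy conversion is a problem, here is a minimal Unix-only sketch. The helper name `lossy_vs_raw` is made up for this note and is not part of the patch; it only contrasts what `to_string_lossy()` produces with the bytes actually stored in the `OsStr`.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: exposes the raw bytes of an OsStr

/// Hypothetical helper: returns (lossy bytes, raw bytes) for a mount point name.
fn lossy_vs_raw(name: &[u8]) -> (Vec<u8>, Vec<u8>) {
    let os = OsStr::from_bytes(name);
    // to_string_lossy() replaces each invalid sequence with U+FFFD (bytes EF BF BD).
    let lossy = os.to_string_lossy().into_owned().into_bytes();
    // as_bytes() returns exactly the bytes that were stored.
    let raw = os.as_bytes().to_vec();
    (lossy, raw)
}
```

For an input such as `b"/mnt/\xff"`, the raw bytes round-trip unchanged, while the lossy form ends in the three-byte replacement character instead of the original `0xff`.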

Changes Made

  1. find_mount_point() return type updated
    Changed from Option<String> to Option<OsString> to preserve the original bytes.

  2. Added OsStr variant to OutputType enum
    Allows proper handling of non-UTF8 data through the output pipeline.

  3. Implemented print_os_str()

    • Unix/macOS: prints raw bytes directly, preserving non-UTF8 sequences.
    • Windows: falls back to .to_string_lossy(), safely handling UTF-16 strings.

  4. Added pad_and_print_bytes() helper
    Handles alignment, width, and padding for raw byte output.

  5. Updated 'm' format specifier
    Returns OutputType::OsStr instead of a lossy string, preserving original data.

Platform Behavior

  • Unix/macOS: preserves non-UTF8 bytes when printing.
  • Windows: uses .to_string_lossy() safely, since Windows internally uses UTF-16.
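
The platform split described above could look roughly like the following sketch. The names `os_str_bytes` and `write_os_str` are assumptions chosen for illustration, and the generic writer parameter is also an assumption; the PR's own `print_os_str()` may be structured differently.

```rust
use std::borrow::Cow;
use std::ffi::OsStr;
use std::io::{self, Write};

/// Unix: borrow the raw bytes, so non-UTF8 sequences pass through untouched.
#[cfg(unix)]
fn os_str_bytes(s: &OsStr) -> Cow<'_, [u8]> {
    use std::os::unix::ffi::OsStrExt;
    Cow::Borrowed(s.as_bytes())
}

/// Other platforms: fall back to a lossy UTF-8 rendering.
#[cfg(not(unix))]
fn os_str_bytes(s: &OsStr) -> Cow<'_, [u8]> {
    Cow::Owned(s.to_string_lossy().into_owned().into_bytes())
}

/// Hypothetical helper: writes `s` to any writer using the platform rule above.
fn write_os_str<W: Write>(out: &mut W, s: &OsStr) -> io::Result<()> {
    out.write_all(&os_str_bytes(s))
}
```

Keeping the `cfg` split in one small function means the rest of the output code does not need to know which platform it runs on.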

Related Issues

Fixes the TODO in stat.rs:

it is mentioned in this issue

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@sylvestre
Contributor

Please ask your AI to clean up the patch a bit; there is too much duplicated code.

  • Tests are missing, thanks :)

@dekuu5
Contributor Author

dekuu5 commented Aug 31, 2025

Hello @sylvestre,
I made a change in print_os_str, if that's what you meant by duplicated code. I will upload it.
About the tests: can you explain how I can write tests for that code? I think it is an I/O operation. I tested it locally the same way I mentioned in the issue, and it fixes the bug.

Comment on lines 140 to 146
for _ in 0..padding_needed {
print!(" ");
}
} else {
for _ in 0..padding_needed {
print!(" ");
}
Contributor

this is duplicated code here

Comment on lines 147 to 150
io::stdout().write_all(display_bytes)?;
}
} else {
io::stdout().write_all(display_bytes)?;
Contributor

same here

Contributor Author

Okay, thanks, I will edit it.
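
One possible way to remove the duplicated loops flagged above is to decide once where the padding goes and then share the writes. This is only a sketch: `pad_and_write` and its generic writer parameter are hypothetical, and width is counted in bytes here, which is an assumption rather than something the thread specifies.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes `bytes` padded with spaces to at least `width` bytes.
/// `left` selects left alignment, so the padding goes after the data.
fn pad_and_write<W: Write>(
    out: &mut W,
    bytes: &[u8],
    left: bool,
    width: usize,
) -> io::Result<()> {
    let padding = width.saturating_sub(bytes.len());
    // Decide once where the padding goes; the three writes below are shared.
    let (before, after) = if left { (0, padding) } else { (padding, 0) };
    out.write_all(&vec![b' '; before])?;
    out.write_all(bytes)?;
    out.write_all(&vec![b' '; after])
}
```

With this shape there is a single `write_all` for the data and no repeated padding loop in either branch.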

@sylvestre
Contributor

can you explain how I can write tests

please look at the other tests for this :)

@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

@github-actions

github-actions bot commented Sep 2, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

io::stdout().write_all(display_bytes)?;

if left && padding_needed > 0 {
for _ in 0..padding_needed {
Contributor

still duplicated code

let bytes = s.as_bytes();

if pad_and_print_bytes(bytes, flags.left, width, precision).is_err() {
let lossy_string = s.to_string_lossy();
Contributor

rename the variable to explain what it is, not the style

@github-actions

github-actions bot commented Sep 3, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

@github-actions

github-actions bot commented Sep 3, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

  if path.starts_with(root) {
-     // TODO: This is probably wrong, we should pass the OsString
-     return Some(root.to_string_lossy().into_owned());
+     return Some(root.clone());
Contributor

can we avoid the clone here ?

Contributor Author

Yes, we can, by annotating the function and the OutputType enum with a lifetime, like this:
fn find_mount_point<'a, P: AsRef<Path>>(&'a self, p: P) -> Option<&'a OsString>
OutputType would then be generic over 'a, but on line 1064:
OutputType::OsStr(self.find_mount_point(file).unwrap())
I can't use unwrap or default there, and I don't know how to handle it.

Contributor Author

If it errors, we can fall back to the normal OutputType::Str with a default value of "". Is that acceptable?

Contributor

please investigate :)
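
One direction worth checking, offered as a sketch rather than as what was merged: if the function returns `Option<&OsStr>` instead of `Option<&OsString>`, the missing-value case can use `unwrap_or_default()`, because the standard library implements `Default` for `&OsStr` (the empty string). The `Stater` struct, its `mount_list` field, and the one-variant `OutputType` below are simplified stand-ins invented for this example.

```rust
use std::ffi::{OsStr, OsString};
use std::path::Path;

/// Hypothetical stand-in for the struct that owns the mount list.
struct Stater {
    mount_list: Vec<OsString>,
}

impl Stater {
    /// Borrows the matching mount point from `self` instead of cloning it.
    fn find_mount_point<P: AsRef<Path>>(&self, p: P) -> Option<&OsStr> {
        let path = p.as_ref();
        self.mount_list
            .iter()
            .find(|root| path.starts_with(root))
            .map(OsString::as_os_str)
    }
}

/// Hypothetical, trimmed-down output enum carrying a borrowed value.
enum OutputType<'a> {
    OsStr(&'a OsStr),
}

fn mount_point_output<'a>(stater: &'a Stater, file: &Path) -> OutputType<'a> {
    // `&OsStr` implements `Default`, so a missing mount point needs
    // neither `unwrap()` nor an allocation.
    OutputType::OsStr(stater.find_mount_point(file).unwrap_or_default())
}
```

The trade-off is that the lifetime parameter spreads to every place that names `OutputType`, which may or may not be worth it compared with a single clone.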

@github-actions

github-actions bot commented Sep 3, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@github-actions

github-actions bot commented Sep 3, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

///
/// On Unix systems, this preserves non-UTF8 data by printing raw bytes
/// On other platforms, falls back to lossy string conversion
fn pad_and_print_bytes(
Contributor

We have unit tests in this file; maybe also test this function? Thanks
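
A unit test becomes straightforward once the helper writes to a generic `Write` instead of stdout directly, since a `Vec<u8>` can stand in for the terminal. The sketch below assumes such a signature; `write_padded` and its treatment of `precision` as a byte-count limit are illustrative guesses, not the function from the patch.

```rust
use std::io::{self, Write};

/// Hypothetical writer-generic helper: truncates `bytes` to `precision`
/// (if given) and right-aligns the result within `width` bytes.
fn write_padded<W: Write>(
    out: &mut W,
    bytes: &[u8],
    width: usize,
    precision: Option<usize>,
) -> io::Result<()> {
    let shown = match precision {
        Some(p) if p < bytes.len() => &bytes[..p],
        _ => bytes,
    };
    out.write_all(&vec![b' '; width.saturating_sub(shown.len())])?;
    out.write_all(shown)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn keeps_non_utf8_bytes() {
        // A Vec<u8> stands in for stdout, so no real I/O happens.
        let mut out: Vec<u8> = Vec::new();
        write_padded(&mut out, b"/mnt/\xff", 8, None).unwrap();
        assert_eq!(out, b"  /mnt/\xff");
    }
}
```

Comparing against a byte string literal keeps the invalid `0xff` byte in the expected value, which a `String`-based assertion could not express.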

@github-actions

github-actions bot commented Sep 3, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

@github-actions

github-actions bot commented Sep 4, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@dekuu5 dekuu5 requested a review from sylvestre September 4, 2025 15:15
@dekuu5
Contributor Author

dekuu5 commented Sep 6, 2025

Hello @sylvestre, are there any updates on this PR?

@github-actions

github-actions bot commented Sep 7, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@sylvestre sylvestre force-pushed the fix/stat-string-to-osstr branch from a0fd51e to 300fbdb Compare September 9, 2025 08:59
@github-actions

github-actions bot commented Sep 9, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

(padding_needed, 0)
};

writer.write_all(&vec![b' '; left_pad])?;
Contributor

Can you try using a repeat() iterator or a pre-allocated buffer?

Contributor

something like
std::io::repeat(b' ').take(n)
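
Spelled out, that suggestion might look like the sketch below. The function name `write_spaces` is hypothetical; `io::repeat`, `Read::take`, and `io::copy` are standard library calls.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper: writes `n` spaces to `out` without allocating a Vec.
fn write_spaces<W: Write>(out: &mut W, n: usize) -> io::Result<()> {
    // `repeat` yields an endless stream of the byte; `take` caps it at `n`.
    io::copy(&mut io::repeat(b' ').take(n as u64), out)?;
    Ok(())
}
```

Compared with `vec![b' '; n]`, this avoids a heap allocation proportional to the padding width, at the cost of going through `io::copy`.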

Contributor Author

Okay, I will take a look at it.

@dekuu5 dekuu5 requested a review from sylvestre September 9, 2025 18:56
@github-actions

github-actions bot commented Sep 9, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@dekuu5
Contributor Author

dekuu5 commented Sep 16, 2025

Hello @sylvestre, are there any updates on this PR?

@sylvestre sylvestre merged commit 12e875c into uutils:main Sep 24, 2025
95 checks passed
@dekuu5 dekuu5 deleted the fix/stat-string-to-osstr branch September 29, 2025 09:43
Development

Successfully merging this pull request may close these issues.

stat: mount point output should preserve non-UTF8 bytes

2 participants