Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsoundness in 'parse','child_by_field_name' and 'field_id_for_name' #792

Open
lwz23 opened this issue Nov 25, 2024 · 1 comment
Open

Unsoundness in 'parse','child_by_field_name' and 'field_id_for_name' #792

lwz23 opened this issue Nov 25, 2024 · 1 comment

Comments

@lwz23
Copy link

lwz23 commented Nov 25, 2024

Describe the bug
The method 'parse', child_by_field_name and 'field_id_for_name'uses std::str::from_utf8_unchecked to convert a byte slice to a string without validating whether the input is valid UTF-8. This can lead to undefined behavior (UB) if the input byte slice contains invalid UTF-8 sequences. Since AsRef<[u8]> does not enforce any validation on the byte slice, the caller can supply invalid UTF-8 data, violating the assumptions of from_utf8_unchecked.

pub fn child_by_field_name(&self, field_name: impl AsRef<[u8]>) -> Option<Self> {
            let field_name = field_name.as_ref();
            let field_name = unsafe { std::str::from_utf8_unchecked(field_name) };
            self.inner.child_for_field_name(field_name).map(Into::into)
        }

pub fn field_id_for_name(&self, field_name: impl AsRef<[u8]>) -> Option<u16> {
            let field_name = field_name.as_ref();
            let field_name = unsafe { std::str::from_utf8_unchecked(field_name) };
            self.inner.field_id_for_name(field_name)
        }

pub fn parse(
            &mut self,
            text: impl AsRef<[u8]>,
            old_tree: Option<&Tree>,
        ) -> Result<Option<Tree>, ParserError> {
            let text = text.as_ref();
            let text = unsafe { std::str::from_utf8_unchecked(text) };
            let text = &text.into();
            let old_tree = old_tree.map(|tree| &tree.inner);
            let options = Some(&self.options);
            self.inner
                .parse_with_string(text, old_tree, options)
                .map(|ok| ok.map(Into::into))
                .map_err(Into::into)
        }

To Reproduce
Steps to reproduce the behavior:

  1. Create an instance of the structure that implements child_by_field_name.
  2. Call the child_by_field_name method with an invalid UTF-8 byte slice as input.
fn main() {
    // Invalid UTF-8 byte sequence
    let invalid_utf8: &[u8] = &[0xFF, 0xFF]; // Invalid UTF-8 bytes
    let obj = MyStruct { inner: InnerStruct };

    obj.child_by_field_name(invalid_utf8); // Call the unsafe method
}

Expected behavior
The method should validate the input byte slice for UTF-8 compliance before using std::str::from_utf8_unchecked. Invalid UTF-8 inputs should result in an error instead of causing undefined behavior.

The panic output when running the provided example:

thread 'main' panicked at core\src\panicking.rs:221:5:
unsafe precondition(s) violated: hint::unreachable_unchecked must never be reached
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread caused non-unwinding panic. aborting.

same for 'field_id_for_name' and 'parse'

@lwz23 lwz23 changed the title Unsoundness in 'field_id_for_name' and 'child_by_field_name' Unsoundness in 'child_by_field_name' Nov 25, 2024
@lwz23 lwz23 changed the title Unsoundness in 'child_by_field_name' Unsoundness in 'child_by_field_name' and 'field_id_for_name' Nov 25, 2024
@lwz23 lwz23 changed the title Unsoundness in 'child_by_field_name' and 'field_id_for_name' Unsoundness in 'parse','child_by_field_name' and 'field_id_for_name' Nov 25, 2024
@Xophmeister
Copy link
Member

The use of std::str::from_utf8_unchecked is limited to topiary-tree-sitter-facade. This is vendored code which is needed for the WASM-based playground. There are plans to move away from the WASM-based playground, so correcting these occurrences is deprioritised.

The problem does not affect the Topiary CLI:

$ echo -ne '\xff\xff' > invalid-utf8.txt
$ topiary format --language json < invalid-utf8.txt
[2024-11-25T09:57:58Z ERROR topiary] Failed to read input contents
[2024-11-25T09:57:58Z ERROR topiary] Cause: stream did not contain valid UTF-8
$ topiary format --language json --query invalid-utf8.txt <<< "{}"
[2024-11-25T09:58:43Z ERROR topiary] Could not read or write to file
[2024-11-25T09:58:43Z ERROR topiary] Cause: stream did not contain valid UTF-8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants