You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
The method 'parse', child_by_field_name and 'field_id_for_name'uses std::str::from_utf8_unchecked to convert a byte slice to a string without validating whether the input is valid UTF-8. This can lead to undefined behavior (UB) if the input byte slice contains invalid UTF-8 sequences. Since AsRef<[u8]> does not enforce any validation on the byte slice, the caller can supply invalid UTF-8 data, violating the assumptions of from_utf8_unchecked.
pub fn child_by_field_name(&self, field_name: impl AsRef<[u8]>) -> Option<Self> {
let field_name = field_name.as_ref();
let field_name = unsafe { std::str::from_utf8_unchecked(field_name) };
self.inner.child_for_field_name(field_name).map(Into::into)
}
pub fn field_id_for_name(&self, field_name: impl AsRef<[u8]>) -> Option<u16> {
let field_name = field_name.as_ref();
let field_name = unsafe { std::str::from_utf8_unchecked(field_name) };
self.inner.field_id_for_name(field_name)
}
pub fn parse(
&mut self,
text: impl AsRef<[u8]>,
old_tree: Option<&Tree>,
) -> Result<Option<Tree>, ParserError> {
let text = text.as_ref();
let text = unsafe { std::str::from_utf8_unchecked(text) };
let text = &text.into();
let old_tree = old_tree.map(|tree| &tree.inner);
let options = Some(&self.options);
self.inner
.parse_with_string(text, old_tree, options)
.map(|ok| ok.map(Into::into))
.map_err(Into::into)
}
To Reproduce
Steps to reproduce the behavior:
Create an instance of the structure that implements child_by_field_name.
Call the child_by_field_name method with an invalid UTF-8 byte slice as input.
Expected behavior
The method should validate the input byte slice for UTF-8 compliance before using std::str::from_utf8_unchecked. Invalid UTF-8 inputs should result in an error instead of causing undefined behavior.
The panic output when running the provided example:
thread 'main' panicked at core\src\panicking.rs:221:5:
unsafe precondition(s) violated: hint::unreachable_unchecked must never be reached
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread caused non-unwinding panic. aborting.
same for 'field_id_for_name' and 'parse'
The text was updated successfully, but these errors were encountered:
lwz23
changed the title
Unsoundness in 'field_id_for_name' and 'child_by_field_name'
Unsoundness in 'child_by_field_name'
Nov 25, 2024
lwz23
changed the title
Unsoundness in 'child_by_field_name'
Unsoundness in 'child_by_field_name' and 'field_id_for_name'
Nov 25, 2024
lwz23
changed the title
Unsoundness in 'child_by_field_name' and 'field_id_for_name'
Unsoundness in 'parse','child_by_field_name' and 'field_id_for_name'
Nov 25, 2024
The use of std::str::from_utf8_unchecked is limited to topiary-tree-sitter-facade. This is vendored code which is needed for the WASM-based playground. There are plans to move away from the WASM-based playground, so correcting these occurrences is deprioritised.
The problem does not affect the Topiary CLI:
$ echo -ne '\xff\xff'> invalid-utf8.txt
$ topiary format --language json < invalid-utf8.txt[2024-11-25T09:57:58Z ERROR topiary] Failed to read input contents[2024-11-25T09:57:58Z ERROR topiary] Cause: stream did not contain valid UTF-8
$ topiary format --language json --query invalid-utf8.txt <<<"{}"[2024-11-25T09:58:43Z ERROR topiary] Could not read or write to file[2024-11-25T09:58:43Z ERROR topiary] Cause: stream did not contain valid UTF-8
Describe the bug
The method 'parse',
child_by_field_name
and 'field_id_for_name'usesstd::str::from_utf8_unchecked
to convert a byte slice to a string without validating whether the input is valid UTF-8. This can lead to undefined behavior (UB) if the input byte slice contains invalid UTF-8 sequences. SinceAsRef<[u8]>
does not enforce any validation on the byte slice, the caller can supply invalid UTF-8 data, violating the assumptions offrom_utf8_unchecked
.To Reproduce
Steps to reproduce the behavior:
child_by_field_name
.child_by_field_name
method with an invalid UTF-8 byte slice as input.Expected behavior
The method should validate the input byte slice for UTF-8 compliance before using std::str::from_utf8_unchecked. Invalid UTF-8 inputs should result in an error instead of causing undefined behavior.
The panic output when running the provided example:
same for 'field_id_for_name' and 'parse'
The text was updated successfully, but these errors were encountered: