compiler panic: "byte index 10 is not a char boundary" #62524

Closed
dwrensha opened this issue Jul 9, 2019 · 15 comments · Fixed by #66054 or #66429
Labels
A-parser Area: The parsing of Rust source code to an AST A-Unicode Area: Unicode C-bug Category: This is a bug. glacier ICE tracked in rust-lang/glacier. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@dwrensha
Contributor

dwrensha commented Jul 9, 2019

$ rustc -Vv
rustc 1.36.0 (a53f9df32 2019-07-03)
binary: rustc
commit-hash: a53f9df32fbb0b5f4382caaad8f1a46f36ea887c
commit-date: 2019-07-03
host: x86_64-apple-darwin
release: 1.36.0
LLVM version: 8.0

$ echo Zm4gbWFpbigo2Lw= | base64 -D > main.rs

$ rustc main.rs
error: this file contains an un-closed delimiter
 --> main.rs:1:11
  |
1 | fn main((ؼ
  |        -- ^
  |        ||
  |        |un-closed delimiter
  |        un-closed delimiter

thread 'rustc' panicked at 'byte index 10 is not a char boundary; it is inside 'ؼ' (bytes 9..11) of `fn main((ؼ`', src/libcore/str/mod.rs:2034:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: aborting due to previous error


error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.36.0 (a53f9df32 2019-07-03) running on x86_64-apple-darwin

Found with the help of libfuzzer-sys.

@ExpHP
Contributor

ExpHP commented Jul 9, 2019

  12: core::str::traits::<impl core::slice::SliceIndex<str> for core::ops::range::RangeFrom<usize>>::index::{{closure}}
  13: syntax::source_map::SourceMap::find_width_of_character_at_span
  14: syntax::source_map::SourceMap::next_point
  15: syntax::parse::diagnostics::<impl syntax::parse::parser::Parser>::unexpected_try_recover
  16: syntax::parse::parser::Parser::parse_fn_args::{{closure}}
  17: syntax::parse::parser::Parser::parse_fn_args
  18: syntax::parse::parser::Parser::parse_fn_decl
  19: syntax::parse::parser::Parser::parse_item_fn
  20: syntax::parse::parser::Parser::parse_item_implementation
  21: syntax::parse::parser::Parser::parse_item_

There are two slicing operations in find_width_of_character_at_span. I'm trying to figure out how a non-boundary index could end up in there... Maybe unexpected_try_recover is calling next_point with a span that begins at index 10? (Where does that come from? Parser.prev_span?) Building a debug compiler to check those log statements...

Edit: Building the debug compiler was a bust. Embarrassingly, things seem to have changed and I don't know how to get those debug macros to fire in the current compiler.
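
The panic message itself is the standard library's behavior when a `str` is indexed at a byte offset that falls inside a multi-byte character. The following is a minimal sketch of that behavior on the reproducer string, independent of rustc's actual code; the helper names `tail_from` and `floor_boundary` are made up for illustration.

```rust
/// The reproducer from the report. `\u{063C}` is the two-byte character
/// that occupies bytes 9..11 of the 11-byte source.
const REPRO: &str = "fn main((\u{063C}";

/// Slices `s` from `start`, returning `None` instead of panicking when
/// `start` is out of range or falls inside a multi-byte character.
fn tail_from(s: &str, start: usize) -> Option<&str> {
    s.get(start..)
}

/// Moves `index` back to the nearest char boundary at or before it.
/// Offset 0 is always a boundary, so the loop terminates.
fn floor_boundary(s: &str, index: usize) -> usize {
    let mut i = index.min(s.len());
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}
```

With this string, `tail_from(REPRO, 10)` is `None` because offset 10 is the second byte of the character, whereas `&REPRO[10..]` would panic with the "not a char boundary" message shown in the report. `floor_boundary(REPRO, 10)` steps back to 9, the start of that character.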

@jonas-schievink jonas-schievink added A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 10, 2019
@pnkfelix
Member

pnkfelix commented Jul 11, 2019

@ExpHP do you mean you were using RUST_LOG and not seeing debug output?

If so, that is because the environment variable name under rustc was changed to RUSTC_LOG. (I make this mistake pretty much every day due to muscle memory.)


As an example, here is the tail of my log output:

% RUSTC_LOG=syntax::parse,syntax::source_map ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc  /tmp/off_index.rs
DEBUG 2019-07-11T12:58:22Z: syntax::parse::attr: parse_outer_attributes: self.token=Token { kind: OpenDelim(Paren), spa\
n: Span { lo: BytePos(8), hi: BytePos(9), ctxt: #0 } }
DEBUG 2019-07-11T12:58:22Z: syntax::parse::parser: parse_arg_general parse_pat (is_name_required:true)
DEBUG 2019-07-11T12:58:22Z: syntax::source_map: find_width_of_character_at_span: local_begin=`SourceFileAndBytePos { sf\
: SourceFile(/tmp/off_index.rs), pos: BytePos(10) }`, local_end=`SourceFileAndBytePos { sf: SourceFile(/tmp/off_index.r\
s), pos: BytePos(11) }`
DEBUG 2019-07-11T12:58:22Z: syntax::source_map: find_width_of_character_at_span: start_index=`10`, end_index=`11`
DEBUG 2019-07-11T12:58:22Z: syntax::source_map: find_width_of_character_at_span: source_len=`11`
thread 'rustc' panicked at 'byte index 10 is not a char boundary; it is inside 'ؼ' (bytes 9..11) of `fn main((ؼ`', src/\
libcore/str/mod.rs:2039:5

@pnkfelix
Member

triage: P-medium. Removing nomination.

@pnkfelix pnkfelix added P-medium Medium priority and removed I-nominated labels Jul 11, 2019
@pnkfelix
Member

This broke between 1.15 and 1.16, but those releases are so old, and the nature of the ICE itself changed between 1.28 and 1.29, so I do not think bisection would be worthwhile.

@pnkfelix pnkfelix added the regression-from-stable-to-stable Performance or correctness regression from one stable version to another. label Jul 11, 2019
@ExpHP
Contributor

ExpHP commented Jul 11, 2019

Offtopic

@pnkfelix : Thanks. Unfortunately, I did guess the name of RUSTC_LOG after RUST_LOG didn't work, and all it showed me was this:

$ RUSTC_LOG=trace rustc +stage1 src/main.rs
 INFO 2019-07-11T13:33:39Z: jobserver::imp: created a jobserver: Client { read: File { fd: 3, path: "pipe:[2538076]", read: true, write: false }, write: File { fd: 4, path: "pipe:[2538076]", read: false, write: true } }
 INFO 2019-07-11T13:33:39Z: rustc_interface::util: codegen backend candidate: /home/lampam/asd/clone/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends
 INFO 2019-07-11T13:33:39Z: rustc_interface::util: probing /home/lampam/asd/clone/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends for a codegen backend
error: this file contains an un-closed delimiter
 --> src/main.rs:1:11
  |
1 | fn main((ؼ
  |        -- ^
  |        ||
  |        |un-closed delimiter
  |        un-closed delimiter

thread 'rustc' panicked at 'byte index 10 is not a char boundary; it is inside 'ؼ' (bytes 9..11) of `fn main((ؼ`', src/libcore/str/mod.rs:2039:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: aborting due to previous error

I set debug = true in my config.toml before building, and didn't see anything else in there. (I recall that debug! and trace! used to be explicitly mentioned in the comment for debug-assertions but it is no longer, so I did not enable it.)

@ExpHP
Contributor

ExpHP commented Jul 11, 2019

Okay, I did the following evil, evil thing:

Pure evil, do not open
diff --git a/src/libsyntax_pos/span_encoding.rs b/src/libsyntax_pos/span_encoding.rs
index 525ec13623..d7da2206ab 100644
--- a/src/libsyntax_pos/span_encoding.rs
+++ b/src/libsyntax_pos/span_encoding.rs
@@ -74,6 +74,12 @@ pub const DUMMY_SP: Span = Span { base_or_index: 0, len_or_tag: 0, ctxt_or_zero:
 impl Span {
     #[inline]
     pub fn new(mut lo: BytePos, mut hi: BytePos, ctxt: SyntaxContext) -> Self {
+        if lo == BytePos(10) || hi == BytePos(10) {
+            eprintln!("=========================");
+            eprintln!("{:?}", (lo, hi));
+            eprintln!();
+            let _ = std::panic::catch_unwind(|| panic!()); // potentially print backtrace
+        }
         if lo > hi {
             std::mem::swap(&mut lo, &mut hi);
         }

And acquired the following backtrace of when the span was first constructed:

=========================
(BytePos(10), BytePos(11))

  15: syntax_pos::span_encoding::Span::new at /home/lampam/asd/clone/rust/src/libsyntax_pos/span_encoding.rs:81
  16: syntax_pos::SpanData::with_lo at /home/lampam/asd/clone/rust/src/libsyntax_pos/lib.rs:218
  17: syntax_pos::<impl syntax_pos::span_encoding::Span>::with_lo at /home/lampam/asd/clone/rust/src/libsyntax_pos/lib.rs:268
  18: syntax::tokenstream::TokenTree::close_tt at src/libsyntax/tokenstream.rs:141
  19: syntax::parse::parser::TokenCursor::next at src/libsyntax/parse/parser.rs:315
  20: syntax::parse::parser::Parser::next_tok at src/libsyntax/parse/parser.rs:528
  21: syntax::parse::parser::Parser::bump at src/libsyntax/parse/parser.rs:1031
  22: syntax::parse::parser::Parser::parse_ident_common at src/libsyntax/parse/parser.rs:632
  23: syntax::parse::parser::Parser::parse_ident at src/libsyntax/parse/parser.rs:617
  24: syntax::parse::parser::Parser::parse_pat_ident at src/libsyntax/parse/parser.rs:4132
  25: syntax::parse::parser::Parser::parse_pat_with_range_pat at src/libsyntax/parse/parser.rs:3987
  26: syntax::parse::parser::Parser::parse_pat at src/libsyntax/parse/parser.rs:3908
  27: syntax::parse::parser::Parser::parse_pat_list at src/libsyntax/parse/parser.rs:3582
  28: syntax::parse::parser::Parser::parse_parenthesized_pat_list at src/libsyntax/parse/parser.rs:3547
  29: syntax::parse::parser::Parser::parse_pat_with_range_pat at src/libsyntax/parse/parser.rs:3938
  30: syntax::parse::parser::Parser::parse_pat at src/libsyntax/parse/parser.rs:3908
  31: syntax::parse::parser::Parser::parse_arg_general at src/libsyntax/parse/parser.rs:1510
  32: syntax::parse::parser::Parser::parse_fn_args::{{closure}} at src/libsyntax/parse/parser.rs:5380
  33: syntax::parse::parser::Parser::parse_seq_to_before_tokens at src/libsyntax/parse/parser.rs:983
  34: syntax::parse::parser::Parser::parse_seq_to_before_end at src/libsyntax/parse/parser.rs:916
  35: syntax::parse::parser::Parser::parse_fn_args at src/libsyntax/parse/parser.rs:5368
  36: syntax::parse::parser::Parser::parse_fn_decl at src/libsyntax/parse/parser.rs:5429
  37: syntax::parse::parser::Parser::parse_item_fn at src/libsyntax/parse/parser.rs:5660

It also gets constructed a second time:

Second backtrace
=========================
(BytePos(10), BytePos(11))

  15: syntax_pos::span_encoding::Span::new at /home/lampam/asd/clone/rust/src/libsyntax_pos/span_encoding.rs:81
  16: syntax_pos::SpanData::with_lo at /home/lampam/asd/clone/rust/src/libsyntax_pos/lib.rs:218
  17: syntax_pos::<impl syntax_pos::span_encoding::Span>::with_lo at /home/lampam/asd/clone/rust/src/libsyntax_pos/lib.rs:268
  18: syntax::tokenstream::TokenTree::close_tt at src/libsyntax/tokenstream.rs:141
  19: syntax::parse::parser::TokenCursor::next at src/libsyntax/parse/parser.rs:315
  20: syntax::parse::parser::Parser::next_tok at src/libsyntax/parse/parser.rs:528
  21: syntax::parse::parser::Parser::bump at src/libsyntax/parse/parser.rs:1031
  22: syntax::parse::parser::Parser::expect_one_of at src/libsyntax/parse/parser.rs:590
  23: syntax::parse::parser::Parser::expect at src/libsyntax/parse/parser.rs:577
  24: syntax::parse::parser::Parser::parse_parenthesized_pat_list at src/libsyntax/parse/parser.rs:3555
  25: syntax::parse::parser::Parser::parse_pat_with_range_pat at src/libsyntax/parse/parser.rs:3938
  26: syntax::parse::parser::Parser::parse_pat at src/libsyntax/parse/parser.rs:3908
  27: syntax::parse::parser::Parser::parse_arg_general at src/libsyntax/parse/parser.rs:1510
  28: syntax::parse::parser::Parser::parse_fn_args::{{closure}} at src/libsyntax/parse/parser.rs:5380
  29: syntax::parse::parser::Parser::parse_seq_to_before_tokens at src/libsyntax/parse/parser.rs:983
  30: syntax::parse::parser::Parser::parse_seq_to_before_end at src/libsyntax/parse/parser.rs:916
  31: syntax::parse::parser::Parser::parse_fn_args at src/libsyntax/parse/parser.rs:5368
  32: syntax::parse::parser::Parser::parse_fn_decl at src/libsyntax/parse/parser.rs:5429
  33: syntax::parse::parser::Parser::parse_item_fn at src/libsyntax/parse/parser.rs:5660

(the region of the backtrace that differs between the two is items 22-29 in the first one, or 22-25 in the second)


Some interesting spots from here:

This is clearly where the value of 10 is created:

/// Returns the closing delimiter as a token tree.
pub fn close_tt(span: Span, delim: DelimToken) -> TokenTree {
    let close_span = if span.is_dummy() {
        span
    } else {
        span.with_lo(span.hi() - BytePos(delim.len() as u32))
    };
    TokenTree::token(token::CloseDelim(delim), close_span)
}

TokenCursor::next calls this if there are no remaining tokens and it still hasn't seen the closing delimiter (it seems to have as an unstated precondition that the closing delimiter must exist).
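
To make the arithmetic concrete, here is a small model of what `close_tt` computes when the closing delimiter is missing. `ByteSpan` and both helper functions are hypothetical stand-ins rather than rustc types, and the group span 8..11 used in the note below is an assumption chosen to be consistent with the `(BytePos(10), BytePos(11))` output above.

```rust
/// Simplified stand-in for a compiler span: a half-open byte range into
/// the source text. (Hypothetical; not rustc's `Span`.)
#[derive(Debug, Clone, Copy, PartialEq)]
struct ByteSpan {
    lo: usize,
    hi: usize,
}

/// Mirrors the arithmetic in `close_tt`: assume the closing delimiter
/// occupies the last `delim_len` bytes of the group span.
fn close_span_by_arithmetic(group: ByteSpan, delim_len: usize) -> ByteSpan {
    ByteSpan {
        lo: group.hi - delim_len,
        hi: group.hi,
    }
}

/// Reports whether both ends of `span` fall on char boundaries of `src`,
/// i.e. whether `&src[span.lo..span.hi]` can be taken without panicking.
fn is_sliceable(src: &str, span: ByteSpan) -> bool {
    src.is_char_boundary(span.lo) && src.is_char_boundary(span.hi)
}
```

For the unclosed source `fn main((ؼ`, a group span of 8..11 with a one-byte delimiter yields 10..11, which starts in the middle of the two-byte character and fails `is_sliceable`. If the source did end with a real `)`, the group would be 8..12 and the same arithmetic would give 11..12, which is fine. The subtraction is only valid when the delimiter is actually there.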

Here's the innermost bit that's specific to ident parsing. Doesn't look that odd...

token::Ident(name, _) => {
    if self.token.is_reserved_ident() {
        let mut err = self.expected_ident_found();
        if recover {
            err.emit();
        } else {
            return Err(err);
        }
    }
    let span = self.token.span;
    self.bump();
    Ok(Ident::new(name, span))
}

@pnkfelix
Member

pnkfelix commented Jul 11, 2019

@ExpHP I’m not familiar with RUSTC_LOG=trace

I tend to list (comma separated) module paths in my own use of RUSTC_LOG; did you try that? You can see an example in the transcript I gave in my comment above.

@ExpHP
Contributor

ExpHP commented Jul 11, 2019

Oh. (I'm so used to $ prompts I didn't notice you included the command!). Yeah, that was the issue.

@fmckeogh
Member

fmckeogh commented Jul 15, 2019

Same ICE occurs with:

y![
Ϥ, 

Which returns:

error: this file contains an un-closed delimiter
 --> crash-298117e3012a17b3e85cddad606b2697232cba40:2:3
  |
1 | y![
  |   - un-closed delimiter
2 | Ϥ,
  |   ^

error: macros that expand to items must be delimited with braces or followed by a semicolon
 --> crash-298117e3012a17b3e85cddad606b2697232cba40:1:3
  |
1 |   y![
  |  ___^
2 | | Ϥ,
  | |__^
thread 'rustc' panicked at 'byte index 1 is not a char boundary; it is inside 'Ϥ' (bytes 0..2) of `Ϥ,`', src/libcore/str/mod.rs:2039:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: aborting due to previous error


error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

@dwrensha
Contributor Author

@Chocol4te could you post a base64-encoded version of your crash-298117e3012a17b3e85cddad606b2697232cba40? When I copy/paste what you've posted above, I don't see a compiler crash, presumably because some non-ascii characters are getting lost.

@fmckeogh
Member

@dwrensha My bad

eSFbCs+kLA==
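
Both reproducers in this thread are given as base64 so that the non-ASCII bytes survive copy/paste. Where a `base64` command-line tool is not at hand, a std-only decoder along these lines is enough to recover the exact bytes. This is a sketch: `decode_base64` is a hypothetical helper that handles only the standard alphabet, stops at the first `=`, and does not validate padding.

```rust
/// Decodes standard-alphabet base64 using only the standard library.
/// Decoding stops at the first `=`; any other character outside the
/// alphabet makes the function return `None`.
fn decode_base64(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf: u32 = 0; // bits collected so far, not yet emitted
    let mut bits: u32 = 0; // number of valid bits in `buf`
    for c in input.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break,
            _ => return None,
        };
        buf = (buf << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            // Emit the top 8 bits and keep the remainder for the next round.
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1;
        }
    }
    Some(out)
}
```

Decoding `eSFbCs+kLA==` gives the seven bytes of `y![`, a newline, the two-byte character `Ϥ`, and a comma. Decoding `Zm4gbWFpbigo2Lw=` gives the eleven bytes of the original `fn main((ؼ` reproducer. Writing the resulting bytes to a file with `std::fs::write` reproduces the inputs without relying on the clipboard.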

@estebank estebank added the A-Unicode Area: Unicode label Jul 31, 2019
@rust-lang-glacier-bot rust-lang-glacier-bot added the glacier ICE tracked in rust-lang/glacier. label Oct 15, 2019
Centril added a commit to Centril/rust that referenced this issue Nov 5, 2019
syntax: Avoid span arithmetic for delimiter tokens

The +/-1 logic is from the time when the whole group had a single span and the delimiter spans had to be calculated from it.
Now the delimiters have their own spans, which are constructed by the lexer or the proc macro API and can be used directly.
If those spans are not perfect, then it should be fixed by tweaking the corresponding lexer logic rather than by trying to add or subtract `1` from the span boundaries.

Fixes rust-lang#62524
r? @estebank
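
The idea in this commit message can be sketched with a toy model in which each delimiter keeps the span it was lexed with. All types and functions here are hypothetical and much simpler than rustc's. In particular, representing a missing closing delimiter as an empty span at end of file is an assumption made for this example, not a statement about what the lexer does.

```rust
/// Half-open byte range into the source text. (Hypothetical stand-in.)
#[derive(Debug, Clone, Copy, PartialEq)]
struct ByteSpan {
    lo: usize,
    hi: usize,
}

/// Spans recorded separately for the opening and closing delimiters.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DelimSpans {
    open: ByteSpan,
    close: ByteSpan,
}

/// The span of the whole delimited group, from the opening delimiter's
/// start to the closing delimiter's end.
fn group_span(spans: DelimSpans) -> ByteSpan {
    ByteSpan {
        lo: spans.open.lo,
        hi: spans.close.hi,
    }
}

/// Returns the closing delimiter's span as recorded, with no +/-1
/// arithmetic on the group span.
fn recorded_close(spans: DelimSpans) -> ByteSpan {
    spans.close
}

/// Reports whether both ends of `span` fall on char boundaries of `src`.
fn on_char_boundaries(src: &str, span: ByteSpan) -> bool {
    src.is_char_boundary(span.lo) && src.is_char_boundary(span.hi)
}
```

Under that assumption, the unclosed group in `fn main((ؼ` would carry `open = 8..9` and `close = 11..11`. Using `recorded_close` keeps both ends on char boundaries, while deriving the close span as `group_span(..).hi - 1` would again land on byte 10.
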
@bors bors closed this as completed in e5da1a1 Nov 6, 2019
@JohnTitor
Member

Hm, the example from #62524 (comment) hasn't been fixed?

@Alexendoo
Member

Alexendoo commented Nov 13, 2019

Seems so, on the current nightly:

error: this file contains an un-closed delimiter
 --> 62524-2.rs:2:3
  |
1 | y![
  |   - un-closed delimiter
2 | Ϥ,
  |   ^

error: macros that expand to items must be delimited with braces or followed by a semicolon
 --> 62524-2.rs:1:3
  |
1 |   y![
  |  ___^
2 | | Ϥ,
  | |__^
  |
thread 'rustc' panicked at 'byte index 1 is not a char boundary; it is inside 'Ϥ' (bytes 0..2) of `Ϥ,`', src\libcore\str\mod.rs:2069:5
stack backtrace:
   0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
   1: core::fmt::write
   2: <std::io::IoSliceMut as core::fmt::Debug>::fmt
   3: std::panicking::take_hook
   4: std::panicking::take_hook
   5: rustc_driver::report_ice
   6: std::panicking::rust_panic_with_hook
   7: std::panicking::begin_panic_fmt
   8: rust_begin_unwind
   9: core::panicking::panic_fmt
  10: core::str::slice_error_fail
  11: <rustc_driver::args::Error as core::fmt::Debug>::fmt
  12: <rustc_errors::lock::acquire_global_lock::Handle as core::ops::drop::Drop>::drop
  13: rustc_errors::annotate_snippet_emitter_writer::AnnotateSnippetEmitterWriter::ui_testing
  14: <rustc_errors::emitter::EmitterWriter as rustc_errors::emitter::Emitter>::emit_diagnostic
  15: rustc_errors::HandlerInner::emit_diagnostic
  16: rustc_errors::diagnostic_builder::DiagnosticBuilder::emit
  17: rustc_parse::parser::item::<impl rustc_parse::parser::Parser>::parse_foreign_item
  18: rustc_parse::parser::item::<impl rustc_parse::parser::Parser>::parse_item
  19: rustc_parse::parser::item::<impl rustc_parse::parser::Parser>::parse_item
  20: rustc_parse::parser::item::<impl rustc_parse::parser::Parser>::parse_item
  21: rustc_parse::parser::module::<impl rustc_parse::parser::Parser>::parse_crate_mod
  22: rustc_parse::parser::module::<impl rustc_parse::parser::Parser>::parse_crate_mod
  23: rustc_parse::parse_crate_from_file
  24: <rustc_interface::proc_macro_decls::Finder as rustc::hir::itemlikevisit::ItemLikeVisitor>::visit_item
  25: <rustc_interface::proc_macro_decls::Finder as rustc::hir::itemlikevisit::ItemLikeVisitor>::visit_item
  26: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::compile
  27: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::parse
  28: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
  29: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
  30: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
  31: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
  32: _rust_maybe_catch_panic
  33: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
  34: ZN244_$LT$std..error..$LT$impl$u20$core..convert..From$LT$alloc..string..String$GT$$u20$for$u20$alloc..boxed..Box$LT$dyn$u20$std..error..Error$u2b$core..marker..Send$u2b$core..marker..Sync$GT$$GT$..from..StringError$u20$as$u20$core..fmt..Display$GT$3fmt17
  35: std::sys::windows::thread::Thread::new
  36: BaseThreadInitThunk
  37: RtlUserThreadStart
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.40.0-nightly (4f03f4a98 2019-11-12) running on x86_64-pc-windows-msvc

query stack during panic:
end of query stack
error: aborting due to previous error

@Alexendoo Alexendoo reopened this Nov 13, 2019
@guanqun
Contributor

guanqun commented Nov 13, 2019

@Alexendoo I guess my fix #66264 would help. I'll get some time to test it out.

@guanqun
Contributor

guanqun commented Nov 15, 2019

Confirmed that my fix resolves this issue. I also added a test for it in this PR: #66429

Centril pushed a commit to Centril/rust that referenced this issue Nov 15, 2019
Centril added a commit to Centril/rust that referenced this issue Nov 15, 2019
bors added a commit that referenced this issue Nov 15, 2019
Rollup of 4 pull requests

Successful merges:

 - #66197 (Push `ast::{ItemKind, ImplItemKind}::OpaqueTy` hack down into lowering)
 - #66429 (Add a regression test for #62524)
 - #66435 (Correct `const_in_array_repeat_expressions` feature name)
 - #66443 (Port erased cleanup)

Failed merges:

r? @ghost
@bors bors closed this as completed in c6cdbe9 Nov 15, 2019