In arrow_json, Decoder::decode can panic if it encounters two high surrogates in a row. #7712

@nicklan

Describe the bug

In arrow_json, Decoder::decode can panic if it encounters two high surrogates in a row. Since this method returns a Result, panics are not expected, even in error cases.

To Reproduce

The following program reproduces the bug:

use std::io::{BufRead, BufReader};
use std::sync::Arc;

use arrow::datatypes::{DataType, Field};
use arrow_json::ReaderBuilder;

fn main() {
    let mut decoder =
        ReaderBuilder::new_with_field(Arc::new(Field::new("test", DataType::Utf8, true)))
            .build_decoder()
            .unwrap();
    let s = r#"{"test": "\uD800\uD801"}"#;
    let mut reader = BufReader::new(s.as_bytes());
    let buf = reader.fill_buf().unwrap();
    let _ = decoder.decode(buf);
}

Running this gives:

thread 'main' panicked at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-json-55.1.0/src/reader/tape.rs:708:49:
attempt to subtract with overflow
stack backtrace:
   0: rust_begin_unwind
             at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/std/src/panicking.rs:695:5
   1: core::panicking::panic_fmt
             at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/panicking.rs:75:14
   2: core::panicking::panic_const::panic_const_sub_overflow
             at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/panicking.rs:178:21
   3: arrow_json::reader::tape::char_from_surrogate_pair
             at [..]/arrow-json-55.1.0/src/reader/tape.rs:708:49
   4: arrow_json::reader::tape::TapeDecoder::decode
             at [..]/arrow-json-55.1.0/src/reader/tape.rs:514:37
   5: arrow_json::reader::Decoder::decode
             at [..]/arrow-json-55.1.0/src/reader/mod.rs:439:9
   6: arrow_panic::main
             at ./src/main.rs:15:13
   7: core::ops::function::FnOnce::call_once
             at [..]/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5

Expected behavior

decode should return an error as the string is invalid, but it should not panic.
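As a rough illustration of the expected behavior, here is a minimal sketch of combining a surrogate pair with range checks in front of the arithmetic, so that a second high surrogate produces an `Err` instead of an overflowing subtraction. The function name `combine_surrogates` and the `String` error type are hypothetical, chosen for this example only; this is not the code in `tape.rs`, and an actual fix may look different.

```rust
/// Hypothetical helper for illustration only (not the arrow-json implementation).
/// Combines a UTF-16 high/low surrogate pair into a `char`, returning an error
/// rather than panicking when either half is out of range.
fn combine_surrogates(high: u16, low: u16) -> Result<char, String> {
    // High surrogates occupy 0xD800..=0xDBFF.
    if !(0xD800..=0xDBFF).contains(&high) {
        return Err(format!("expected high surrogate, got {high:#06X}"));
    }
    // Low surrogates occupy 0xDC00..=0xDFFF. An input such as
    // "\uD800\uD801" fails here, before any subtraction happens.
    if !(0xDC00..=0xDFFF).contains(&low) {
        return Err(format!("expected low surrogate, got {low:#06X}"));
    }
    // Both subtractions are now guaranteed not to underflow.
    let code = 0x10000 + ((u32::from(high) - 0xD800) << 10) + (u32::from(low) - 0xDC00);
    char::from_u32(code).ok_or_else(|| format!("invalid code point {code:#X}"))
}
```

With this shape, `combine_surrogates(0xD800, 0xD801)` yields an `Err`, while a valid pair such as `(0xD83D, 0xDE00)` yields U+1F600.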
