Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Control Characters wrongly allowed #3

Closed
indietyp opened this issue Jul 28, 2024 · 3 comments
Closed

Control Characters wrongly allowed #3

indietyp opened this issue Jul 28, 2024 · 3 comments

Comments

@indietyp
Copy link

During fuzzing, I found that str_fold is incorrectly implemented. The RFC 8259 states in Section 7:

All Unicode characters may be placed within the quotation marks, except for the characters that MUST be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)

but str_fold incorrectly implements this incorrectly, by only marking everything until 0x13 as a control character, not until 0x1F

matches!(c, b'\\' | b'"' | 0..=19)

01mf02 added a commit that referenced this issue Jul 29, 2024
@01mf02
Copy link
Owner

01mf02 commented Jul 29, 2024

Thanks for spotting this, @indietyp. I corrected that right now.

I am curious about your fuzzing procedure: How did you find this bug?

@01mf02 01mf02 closed this as completed Jul 29, 2024
@indietyp
Copy link
Author

Yea for sure, I am currently working on a parser that is also able to parse into a Value struct, something similar to:

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ValueKind<'source> {
    /// Represents a JSON null value
    Null,

    /// Represents a JSON boolean
    Bool(bool),

    /// Represents a JSON number, whether integer or floating point
    Number(Cow<'source, Number>),

    /// Represents a JSON string
    String(Cow<'source, str>),

    /// Represents a JSON array
    Array(Vec<Value<'source>>),

    /// Represents a JSON object
    Object(HashMap<Cow<'source, str>, Value<'source>>),
}

(Number here is from the json-number crate). Parsing and creating this struct - in itself - is very similar to the one that serde_json provided, so my train of thought was: the current battle tested json implementation in rust is serde_json, therefore, everything that is able to be parsed with serde_json must be parseable using my implementation.

Therefore, using test-fuzz, I created a simple little test function - very similar to the following:

#[test_fuzz::test_fuzz]
fn assert_eq_serde(input: &str) {
    let value_result = parse_complete(&input);
    let serde_result = serde_json::from_str::<serde_json::Value>(input);

    if let Err(error) = &serde_result {
        // serde_json number out of range is normal, because we don't have any precision checks
        // (we save it as a string for later)
        if error.to_string().contains("number out of range") {
            return;
        }
    }

    if let Err(error) = &value_result {
        let is_duplicate = /* was the error we received duplicate? check removed for brevities sake */

        if is_duplicate {
            // we error out on duplicate keys, serde_json does not
            return;
        }
    }

    match (value_result, serde_result) {
        (Ok(a), Ok(b)) => assert_eq!(a, b),
        (Err(_), Err(_)) => {}
        (value, serde) => {
            panic!("input: {input:?}\nvalue: {value:?}\nserde: {serde:?}");
        }
    };
}


const INPUT: &str = r#"
[
  {
    "Name": "Edward the Elder",
    "Country": "United Kingdom",
    "House": "House of Wessex",
    "Reign": "899-925",
    "ID": 1
  },
  {
    "Name": "Athelstan",
    "Country": "United Kingdom",
    "House": "House of Wessex",
    "Reign": "925-940",
    "ID": 2
  },
  {
    "Name": "Edmund",
    "Country": "United Kingdom",
    "House": "House of Wessex",
    "Reign": "940-946",
    "ID": 3
  },
  {
    "Name": "Edred",
    "Country": "United Kingdom",
    "House": "House of Wessex",
    "Reign": "946-955",
    "ID": 4
  },
  {
    "Name": "Edwy",
    "Country": "United Kingdom",
    "House": "House of Wessex",
    "Reign": "955-959",
    "ID": 5
  }
]"#;

#[test]
fn serde_integration() {
    // ensure that everything that serde can parse, we can parse as well
    assert_eq_serde(INPUT);
    assert_eq_serde("[[");
    assert_eq_serde("[[[]]");
    assert_eq_serde("{}");
    assert_eq_serde("{}}");
    assert_eq_serde(r#"{12: "12"}"#);
}

Then just run:

cargo test
cargo test-fuzz assert_eq_serde

This then found the control character error.

I don't know if something like this could also work in this repository, because of the reliance on a Value enum as well as the external json-number crate (in theory that part could be made of hifijson itself?), one would either need to provide that in the library itself (something that might be useful either way) or implement it in the fuzzing code separately.

@01mf02
Copy link
Owner

01mf02 commented Jul 29, 2024

That's a pretty nice idea! Thanks for explaining it in such detail.
I'm currently not motivated enough to integrate this fuzzing infrastructure into hifijson myself, but I'd accept a PR for this. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants