Introduce Spanned toml deserialization, add some validation/tests #230

Gankra · 2022-06-27T03:29:47Z

Spanning is done with the Spanned<T> hack from toml-rs. A bunch of serialization code had to be rejigged
to play nice with that hack. In particular, things like flatten and untagged enums make it freak out, so
now we deserialize all the fields and then pick which variant it is.

Store now records the source files it imported, and we now check that criteria names are valid everywhere.
New tests allow us to quickly check different inputs parse/fail-parse/fail-validate.
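
A minimal sketch of the "deserialize all the fields, then pick the variant" approach described above. It uses only the standard library, so the serde derive is left out, and every name in it (`RawEntry`, `Entry`, `resolve`, the field names) is hypothetical rather than taken from this PR:

```rust
// Hypothetical sketch: none of these names come from the PR itself.

/// Permissive "all fields" shape: every key that any variant may use is optional.
/// In a serde-based setup, this is the struct that would derive `Deserialize`.
#[derive(Debug, Default)]
struct RawEntry {
    version: Option<String>,
    delta_from: Option<String>,
    delta_to: Option<String>,
}

/// Strict shape that the rest of the program works with.
#[derive(Debug, PartialEq)]
enum Entry {
    Full { version: String },
    Delta { from: String, to: String },
}

impl RawEntry {
    /// Decide which variant the fields describe, after all of them were read.
    fn resolve(self) -> Result<Entry, String> {
        match (self.version, self.delta_from, self.delta_to) {
            (Some(version), None, None) => Ok(Entry::Full { version }),
            (None, Some(from), Some(to)) => Ok(Entry::Delta { from, to }),
            _ => Err("expected `version`, or both `delta_from` and `delta_to`".to_string()),
        }
    }
}
```

Because the variant is chosen by ordinary code after parsing, each field could keep its own span wrapper, and an invalid combination can be reported with a specific message instead of a generic "no variant matched" error.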

Gankra · 2022-06-27T03:31:13Z

I have been struggling in a mental labyrinth for like the last week, trying to get the Spanned stuff to work with all the weird parsing hacks we already had, and ended up obsessively working on it this Sunday just to get it out of my head, so I'll take off some other day this week to make up.

Gankra · 2022-06-27T03:36:37Z

beautiful ;-;

mystor · 2022-06-27T18:39:13Z

src/serialization.rs

+                start: 0,
+                end: 0,


I wonder if we could use better values to indicate that the span isn't specified, so we can handle that case better? We could e.g. use usize::MAX?

I've found miette can be a bit brittle in degenerate source/span situations (it panics when the NamedSource is the empty string), so (0,0) is safer. In practice we only really set spans when we're about to write back, in which case their values don't matter (but this may change in the future).

mystor · 2022-06-27T18:50:38Z

src/storage.rs

+        let (line, col) = error.line_col().unwrap_or((0, 0));
+        TomlParseError {
+            source_code: audit_source.clone(),
+            span: SourceOffset::from_location(&string, line + 1, col + 1),


IIRC generally column isn't 1-indexed, unlike line. Is this correct?

These offsets were determined by empirical checking; everyone's got weird opinions, it seems.
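
To make the indexing question concrete, here is a standard-library-only sketch of a (line, column) to byte-offset conversion. It assumes, as the `+ 1` adjustments in the diff suggest, that the parser reports 0-indexed positions while the consumer expects 1-indexed ones. Both function names are hypothetical, and counting columns in characters is an assumption of this sketch rather than a statement about either library:

```rust
/// Convert a 1-indexed (line, column) pair into a byte offset into `source`.
/// Columns are counted in characters here (an assumption of this sketch).
/// Positions past the end of the text clamp to `source.len()`.
fn offset_from_location(source: &str, line: usize, col: usize) -> usize {
    let mut cur_line = 1;
    let mut cur_col = 1;
    for (offset, ch) in source.char_indices() {
        if cur_line == line && cur_col == col {
            return offset;
        }
        if ch == '\n' {
            cur_line += 1;
            cur_col = 1;
        } else {
            cur_col += 1;
        }
    }
    source.len()
}

/// Adapt a 0-indexed (line, column) pair to the 1-indexed function above.
fn offset_from_zero_indexed(source: &str, line: usize, col: usize) -> usize {
    offset_from_location(source, line + 1, col + 1)
}
```

With the text `"a = 1\nb = ?\n"`, the 0-indexed position (1, 4) maps to the byte offset of the `?`. Dropping the `+ 1` on the column would point one character to the left, which is the kind of off-by-one the review comment is probing for.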

mystor · 2022-06-27T18:50:48Z

src/storage.rs

 where
    T: for<'a> Deserialize<'a>,
 {
    let mut reader = BufReader::new(reader);
    let mut string = String::new();
    reader.read_to_string(&mut string)?;
-    let toml = toml_edit::de::from_str(&string).map_err(|error| TomlParseError { error })?;
-    Ok(toml)
+    let audit_source = Arc::new(NamedSource::new(file_name, string.clone()));


Hmm, I don't love that we're cloning the string here, when we really only need one copy, as deserialization doesn't require us to actually own the payload.

Unfortunately it looks like miette throws out the type information as soon as you call NamedSource::new so it would be quite the hassle to get back out a &str for the full source file. Could we perhaps rework this like this:

    let mut string = String::new();
    reader.read_to_string(&mut string)?;
    let result = toml::de::from_str(&string);
    let source_code = Arc::new(NamedSource::new(file_name, string.clone()));
    match result {
        Ok(toml) => Ok((source_code, toml)),
        Err(error) => {
            let (line, col) = error.line_col().unwrap_or((0, 0));
            Err(TomlParseError {
                source_code,
                span: SourceOffset::from_location(&string, line + 1, col),
                error,
            })
        }
    }

I'm not sure I follow, you seem to still be cloning the string?

Oops, well I meant to remove the .clone(), but forgot to. I also totally missed that you need the string to do SourceOffset::from_location. Something like this should work IIRC (though it has some duplication):

    let mut string = String::new();
    reader.read_to_string(&mut string)?;
    let result = toml::de::from_str(&string);
    match result {
        Ok(toml) => Ok((Arc::new(NamedSource::new(file_name, string)), toml)),
        Err(error) => {
            let (line, col) = error.line_col().unwrap_or((0, 0));
            Err(TomlParseError {
                span: SourceOffset::from_location(&string, line + 1, col + 1),
                source_code: Arc::new(NamedSource::new(file_name, string)),
                error,
            })
        }
    }

I iterated on this and now we don't clone the string

ah nice, same impl :)
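
The ownership pattern this thread converges on can be sketched without toml or miette: finish every borrow of the string first, then move it into its final owner. `Named`, `parse`, and `load` below are hypothetical stand-ins, not the PR's types, and the "parser" just reports the byte offset of the first `?`:

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a named source wrapper that owns the file text.
#[derive(Debug)]
struct Named {
    name: String,
    text: String,
}

/// Hypothetical stand-in for a parser. It fails with the byte offset of the
/// first `?`, and otherwise returns the number of lines.
fn parse(input: &str) -> Result<usize, usize> {
    match input.find('?') {
        Some(offset) => Err(offset),
        None => Ok(input.lines().count()),
    }
}

fn load(name: &str, text: String) -> Result<(Arc<Named>, usize), (Arc<Named>, usize)> {
    // Borrow `text` only for the parse; the result holds no reference to it.
    let result = parse(&text);
    // Every borrow has ended, so `text` can be moved here without `.clone()`.
    let source = Arc::new(Named { name: name.to_string(), text });
    match result {
        Ok(value) => Ok((source, value)),
        Err(offset) => Err((source, offset)),
    }
}
```

This only works because neither the parsed value nor the error keeps a reference into the input, which matches the `for<'a> Deserialize<'a>` bound in the diff above.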

Gankra mentioned this pull request Jun 27, 2022

Rename the unaudited table to the "exemptions" table #228

Closed

mystor requested changes Jun 27, 2022

Gankra added 4 commits June 27, 2022 15:43

proxy Spanned instead of fully copying its private details

adf0e80

fixup error message

0390642

fixup test

3976b85

Gankra force-pushed the spanned-errors branch from 8887cba to 3976b85 June 27, 2022 19:45

defer NamedSource creation to avoid big string clone

989b65b

mystor approved these changes Jun 27, 2022

Gankra merged commit c91e468 into mozilla:main Jun 27, 2022

MichaReiser mentioned this pull request Jan 22, 2025

Create Unknown rule diagnostics with a source range astral-sh/ruff#15648

Merged
