Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Spanned toml deserialization, add some validation/tests #230

Merged
merged 5 commits into from
Jun 27, 2022

Conversation

Gankra
Copy link
Contributor

@Gankra Gankra commented Jun 27, 2022

Spanning is done with the Spanned hack from toml-rs. A bunch of serialization code had to be rejigged
to play nice with that hack. In particular things like flatten and untagged_enum make it freak out, so
now we deserialize all the fields and then pick which variant it is.

Store now records the source files it imported, and we now check that criteria names are valid everywhere.
New tests allow us to quickly check different inputs parse/fail-parse/fail-validate.

@Gankra
Copy link
Contributor Author

Gankra commented Jun 27, 2022

I have been struggling in a mental labyrinth trying to get the Spanned stuff to work with all the weird parsing hacks we already had for like the last week, and ended up obsessively working on it this sunday just to get it out of my head, so I'll take off some other day this week to make up.

@Gankra
Copy link
Contributor Author

Gankra commented Jun 27, 2022

image

beautiful ;-;

Comment on lines +536 to +530
start: 0,
end: 0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could use better values to indicate that the span isn't specified, so we can handle that case better? We could e.g. use usize::MAX?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've found miette can be a bit brittle in degenerate source/span situations (panics on the empty string being a NamedSource), so (0,0) is safer. In practice we only really set spans when we're about to writeback, in which case their values don't matter (but this may change in the future).

src/storage.rs Outdated
let (line, col) = error.line_col().unwrap_or((0, 0));
TomlParseError {
source_code: audit_source.clone(),
span: SourceOffset::from_location(&string, line + 1, col + 1),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC generally column isn't 1-indexed, unlike line. Is this correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These offsets were determined by empirical checking, everyone's got weird opinions it seems.

src/storage.rs Outdated
where
T: for<'a> Deserialize<'a>,
{
let mut reader = BufReader::new(reader);
let mut string = String::new();
reader.read_to_string(&mut string)?;
let toml = toml_edit::de::from_str(&string).map_err(|error| TomlParseError { error })?;
Ok(toml)
let audit_source = Arc::new(NamedSource::new(file_name, string.clone()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't love that we're cloning the string here, when we really only need one copy, as deserialization doesn't require us to actually own the payload.

Unfortunately it looks like miette throws out the type information as soon as you call NamedSource::new so it would be quite the hassle to get back out a &str for the full source file. Could we perhaps rework this like this:

let mut string = String::new();
reader.read_to_string(&mut string)?;
let result = toml::de::from_str(&string);
let source_code = Arc::new(NamedSource::new(file_name, string.clone()));
match result {
    Ok(toml) => Ok((source_code, toml)),
    Err(error) => {
        let (line, col) = error.line_col().unwrap_or((0, 0));
        Err(TomlParseError {
            source_code,
            span: SourceOffset::from_location(&string, line + 1, col),
            error,
        })
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow, you seem to still be cloning the string?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, well I meant to remove the .clone(), but forgot to. I also totally missed that you need the string to do SourceOffset::from_location. Something like this should work IIRC (though it has some duplication)

let mut string = String::new();
reader.read_to_string(&mut string)?;
let result = toml::de::from_str(&string);
match result {
    Ok(toml) => Ok((Arc::new(NamedSource::new(file_name, string)), toml)),
    Err(error) => {
        let (line, col) = error.line_col().unwrap_or((0, 0));
        Err(TomlParseError {
            span: SourceOffset::from_location(&string, line + 1, col + 1),
            source_code: Arc::new(NamedSource::new(file_name, string)),
            error,
        })
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I iterated on this and now we don't clone the string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah nice, same impl :)

Gankra added 4 commits June 27, 2022 15:43
Spanning is done with the Spanned<T> hack from toml-rs. A bunch of serialization code had to be rejigged
to play nice with that hack. In particular things like flatten and untagged_enum make it freak out, so
now we deserialize all the fields and then pick which variant it is.

Store now records the source files it imported, and we now check that criteria names are valid everywhere.
New tests allow us to quickly check different inputs parse/fail-parse/fail-validate.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants