Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Value::Struct variant and preserve struct names and formatting #328

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

cswinter
Copy link
Contributor

@cswinter cswinter commented Oct 26, 2021

This change adds a dreadful hackcreative workaround that allows us to distinguish between structs and maps during parsing and preserve struct names. It does so by abusing the size_hint method of the MapAccessor as a side channel that exposes information about the struct name and whether we are visiting a struct or a map. I am somewhat confident that, within the context of the current implementation, this is sound and won't unexpectedly blow up in our face. The CommaSeparated struct with the unusual size_hint method is passed directly to the ValueVisitor and drained before we return control flow to any serde code, so anything else that might call the size_hint method will just receive None values. I haven't confirmed this, but with some luck, size_hint will also get inlined after monomorphization and optimized into something efficient.

This makes progress towards #122, fixes #189, and fixes #264.

This change has been arrived at by sheer power of will rather than any kind of understanding of serde so there may be a better solution. There are still some smaller issues with the PR, before fixing those I first wanted to check whether this has any hope of getting merged.

@cswinter
Copy link
Contributor Author

Another hitch I haven't figured out yet is that the serde serialize_struct method expects a name: &'static str which makes it unusable for Value::Struct variants with dynamic names: serde-rs/serde#2043

@cswinter
Copy link
Contributor Author

cswinter commented Oct 27, 2021

I suppose it's easy enough to adapt the Serialize::serialize implementation to take a concrete ron::ser::Serializer which can actually have a more permissive lifetime and call that in to_string/to_pretty_string. It does mean that we couldn't actually correctly implement the serde::Serialize trait on Value, not sure much we care about that.

@cswinter
Copy link
Contributor Author

cswinter commented Oct 27, 2021

Hmm ok actually a little more involved because of the recursion on the Compounds. Seems pretty manageable but not sure yet what the right approach is.

@cswinter
Copy link
Contributor Author

Ok I think I got it. KeepingValue Serialize seems important if you wanted to e.g. convert ron to yaml, but we can still keep the existing implementation and just serialize Struct as map which retains the existing behavior and is simply the best we can do. And then to_string_pretty can just use an enhanced version of serialize with some moderate amount of code deduplication.

@cswinter
Copy link
Contributor Author

cswinter commented Oct 27, 2021

Well, I got it working but since to_string is fully generic we can't actually do something custom for Value, so the only option I see is to have a separate version of the methods just for Value. Which could still be useful but somewhat unsatisfying, not sure if there's a way to salvage this.

@cswinter cswinter force-pushed the struct-hack branch 2 times, most recently from 80e938a to 91fdaf1 Compare October 28, 2021 03:23
@kvark
Copy link
Collaborator

kvark commented Nov 2, 2021

Sorry, I've been punting on reviewing this.
@torkleyy is this something you'd be comfortable reviewing?

@cswinter
Copy link
Contributor Author

cswinter commented Nov 2, 2021

So I did get this to work and am using it in https://github.com/cswinter/pyron to (de)serialize Python dataclasses as ron, but there are still some issues with this that I'm not sure yet if they can be fixed. Both the ValueVisitor used in the Deserialize implementation of Value, and the Deserializer fail to conform to the expected serde interface and are public which might cause some issues if you use them in combination with a different Deserialize/Deserializer.
Particularly, if you try to deserialize something else (e.g. yaml) into a ron::Value, the ValueVisitor won't behave correctly and do weird things if it receives size hints. We could possibly fix this by further changing the ron Deserializer to emit a sentinel value of usize::MAX as the first size_hint (which would never be emitted by a valid Deserializer) and allows the ValueVisitor to work normally if it doesn't see this sentinel. However, this would now break the Deserializer, and actually, my current implementation might already be broken for deserializing into structs. There could still be some way around this but I'm not sure yet if that's the way to go. It might be better to just write a new parser from scratch that is not tied to serde, which I'm considering anyway since it enables some other features that are not currently possible.

The serialization implementation that allows preserving struct names/tuples is in better shape. It requires some amount of code duplication, and unfortunately needs an explicit call to a new Value::to_string/to_pretty_string method so we can call our custom serialize method that doesn't quite work like the serde one, but I'm not aware of any glaring soundness issues. It's a lot less useful with the corresponding deserialization code of course.

@cswinter
Copy link
Contributor Author

cswinter commented Nov 2, 2021

For now, my main use case for this is https://github.com/cswinter/pyron, which doesn't use ron in a way that would run into the remaining issues. I would like to fix this properly at some point but still need to decide on the right approach and exactly what features I want so I'm planning to just let this stew for a while. The main approaches I'm considering:

  • come up with one more clever idea to make this hack work
  • implement custom deserializer (and maybe also pretty printer) which isn't tied to serde
  • get serde to add some new methods for us that allow an implementation without hacks (I think this could be done in a way that doesn't break existing methods by adding alternative variants with dynamic lifetime etc. which have a default implementation that returns a specific error. Existing serializer would just never call these, and our serializer could call these to use the extra functionality when available, or revert to the current behavior for Deserializer/Deserialize implementations that don't support this functionality.)

If anyone has any other good ideas or opinions on how this should be approached I'd be glad to hear about them.

@torkleyy
Copy link
Contributor

torkleyy commented Nov 2, 2021

Sorry, I've been punting on reviewing this. @torkleyy is this something you'd be comfortable reviewing?

I might be able to next week, feel free to ping me if I forget.

@github-actions
Copy link

github-actions bot commented Jan 1, 2022

Issue has had no activity in the last 60 days and is going to be closed in 7 days if no further activity occurs

@github-actions github-actions bot added the stale label Jan 1, 2022
@kvark
Copy link
Collaborator

kvark commented Jan 6, 2022

@torkleyy ping

@github-actions github-actions bot removed the stale label Jan 6, 2022
@juntyr juntyr marked this pull request as draft January 28, 2022 17:21
@juntyr juntyr mentioned this pull request Aug 15, 2022
17 tasks
@juntyr
Copy link
Member

juntyr commented Aug 20, 2023

I've been thinking about a more AST-like Value type again and this is a sketch of what a NewValue might look like:

pub enum NewValue<S: std::borrow::Borrow<str>> {
    Unit,
    Bool(bool),
    Number(Number),
    Char(char),
    String(String),
    Bytes(Vec<u8>),
    List(Vec<NewValue<S>>),
    Map(NewMap<S>),
    NamedUnit {
        name: S,
    },
    Newtype(Box<NewValue<S>>),
    NamedNewtype {
        name: S,
        inner: Box<NewValue<S>>,
    },
    Tuple(Vec<NewValue<S>>),
    NamedTuple {
        name: S,
        elems: Vec<NewValue<S>>,
    },
    Struct {
        fields: FieldMap<S>,
    },
    NamedStruct {
        name: S,
        fields: FieldMap<S>,
    }
}

#[cfg(not(feature = "indexmap"))]
type NewMap<S> = std::collections::BTreeMap<NewValue<S>, NewValue<S>>;
#[cfg(feature = "indexmap")]
type NewMap<S> = indexmap::IndexMap<NewValue<S>, NewValue<S>>;

#[cfg(not(feature = "indexmap"))]
type FieldMap<S> = std::collections::BTreeMap<S, NewValue<S>>;
#[cfg(feature = "indexmap")]
type FieldMap<S> = indexmap::IndexMap<S, NewValue<S>>;

I think the biggest questions are:

  • how to represent Option, since the unwrap newtype variants feature includes Some and thus all of the following are allowed: None, Some(val), Some() = Some(()), Some(a, b) = Some((a, b)), and Some(a: a, b: b) = Some((a: a, b: b)). On a syntactic level, None is just a named ident and Some is just a named newtype / tuple / struct with special handling on serde's side
  • how to handle extensions - maybe we could support attributes more generally around every value subtree?
  • how to handle newtypes, since ron sometimes erases them

@rlidwka
Copy link

rlidwka commented Dec 25, 2023

I've been thinking about a more AST-like Value type again

Is there any progress made on this?

I'd like to have lossless Value, which I can have multiple attempts at trying to deserialize from for untagged enums. Trying to make something like this work:

#[derive(Debug, serde::Serialize, serde::Deserialize)]
enum A {
    A1(String),
    A2(String),
}

#[derive(Debug, serde::Serialize, serde::Deserialize)]
enum B {
    B1(String),
    B2(String),
}

#[derive(Debug, serde::Serialize, serde::Deserialize)]
#[serde(untagged)]
enum MyEnum {
    A(A),
    B(B),
}

let value = MyEnum::A(A::A1("Hello, world!".to_string()));

// valid RON serializations are:
// A1("")
// A2("")
// B1("")
// B2("")

// works fine with serde_json
let json = serde_json::to_string_pretty(&value).unwrap();
dbg!(serde_json::from_str::<MyEnum>(&json).unwrap());

// but doesn't work with ron
let ron = ron::ser::to_string_pretty(&value, ron::ser::PrettyConfig::default()).unwrap();
dbg!(ron::de::from_str::<MyEnum>(&ron).unwrap());

@juntyr
Copy link
Member

juntyr commented Dec 25, 2023

Is there any progress made on this?

Not yet, unfortunately. Once I am back from parental leave, I will first tackle finally publishing v0.9, which includes a list of all the weird serde attribute cases that we currently cannot support because serde treats every format like JSON. If serde were to ever change that (and e.g. impl Trait in type aliases in traits makes that technically possible), an improved version of Ron’s Value type (as outlined above) would be used to fix most of these cases.

While it will take me a few more months to get back to this PR, feel free to implement something similar to my idea sketched out above (essentially an AST-like Value type that allows round trips of RON [without serde attributes] but is more an AST than a Value enum since e.g. named structs and enum variants are ambiguous in RON when we are not given type information by the user).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

to_string(from_str::<Value>("ron")) should be lossless Extracting struct name from Value?
5 participants