Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserializing tagged enum derails the parser #586

Closed
Kriskras99 opened this issue Apr 3, 2023 · 3 comments
Closed

Deserializing tagged enum derails the parser #586

Kriskras99 opened this issue Apr 3, 2023 · 3 comments
Labels
bug duplicate serde Issues related to mapping from Rust types to XML

Comments

@Kriskras99
Copy link
Contributor

Example:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
#[serde(tag = "@tag")]
pub enum TaggedEnum {
    VariantTag(WrappedVariant),
}

#[derive(Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
pub struct WrappedVariant {
    pub variant: Variant,
}

#[derive(Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
pub struct Variant {
    #[serde(rename = "@float")]
    pub float: f64,
}


fn main() {
    let document = r#"
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <variant float="0.0">
    </variant>
    "#;
    let _: Variant = quick_xml::de::from_str(document).unwrap();

    let document = r#"
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <tagged_enum tag="VariantTag">
        <variant float="0.0">
        </variant>
    </tagged_enum>
    "#;
    let _: TaggedEnum = quick_xml::de::from_str(document).unwrap();
}

Expected behaviour

Both documents are deserialized correctly without any errors.

Actual behaviour

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom("invalid type: string \"0.0\", expected f64")', src/main.rs:38:59

The parser seems to derail completely, any type that is not a String gives an error.

@Mingun Mingun added bug duplicate serde Issues related to mapping from Rust types to XML labels Apr 3, 2023
@Mingun
Copy link
Collaborator

Mingun commented Apr 3, 2023

Yes, this is a fundamental flaw of serde design -- serde-rs/serde#1183. Internally tagged enums are implemented using internal buffering which, in case of XML, buffers data only as Strings.

@Mingun Mingun closed this as not planned Won't fix, can't repro, duplicate, stale Apr 3, 2023
@Kriskras99
Copy link
Contributor Author

Here's a workaround for anyone who finds this issue:

/// Example of how to deserialize tagged enums with quick-xml's Serde implementation
use serde::{
    de::{Error, Visitor},
    Deserialize,
};

pub enum TaggedEnum {
    NonMatchingTag(Variant),
    Spam(Spam),
}

#[derive(Deserialize)]
#[serde(deny_unknown_fields)]
pub struct Variant {
    #[serde(rename = "@float")]
    pub float: f64,
}

#[derive(Deserialize)]
#[serde(deny_unknown_fields)]
pub struct Spam {
    #[serde(rename = "@ham")]
    pub ham: u64,
}

impl<'de> Deserialize<'de> for TaggedEnum {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        // The Visitor struct is normally used for state, but none is needed
        struct TaggedEnumVisitor;

        // The main logic of the deserializing happens in the Visitor trait
        impl<'de> Visitor<'de> for TaggedEnumVisitor {
            // The type that is being deserialized
            type Value = TaggedEnum;

            // try to give a better error message when this is used wrong
            fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
                formatter.write_str("expecting map with @tag key and tagged value")
            }

            // the xml data is provided as an opaque map, that map is parsed into the type
            fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
            where
                A: serde::de::MapAccess<'de>,
            {
                // the first entry (entries) in the map is (are) the attribute(s)
                // here the type of the attribute is forced to String, but it could also be set to f64 or any other type
                let entry: Option<(String, String)> = map.next_entry()?;
                // here the assumption is made that only one attribute exists and it's the tag
                // if there are more attributes those would need to be parsed as well
                let tag = match entry {
                    // return an error if the no attributes are found, and indicate that the @tag attribute is missing
                    None => Err(A::Error::missing_field("@tag")),
                    // check if the attribute is the tag
                    Some((attribute, value)) => {
                        if attribute == "@tag" {
                            // return the value of the tag
                            Ok(value)
                        } else {
                            // the attribute is not @tag, return an error indicating that there is an unexpected attribute
                            Err(A::Error::unknown_field(&attribute, &["@tag"]))
                        }
                    }
                }?;

                // get the next key in the map, this should be the variant of the enum
                let key: Option<String> = map.next_key()?;
                // again the assumption is made that there are no other attributes or elements in the map
                match key {
                    // return an error indicating that no data was found to deserialize as variant
                    // using missing_field would be better here, but it expects a &'static str which cannot be provided
                    None => Err(A::Error::custom(format!("No data found for {tag}"))),
                    // check if the variant matches the tag
                    Some(element_name) => {
                        // match the tag to the variant type
                        match tag.as_str() {
                            "spam" => {
                                // if the tag matches the element it can be easily checked
                                if element_name == tag {
                                    // deserialize the map value as a variant
                                    let option_variant: Option<Spam> = map.next_value()?;
                                    // extract the variant from the option
                                    if let Some(variant) = option_variant {
                                        // return the variant as wrapped in it's enum
                                        Ok(TaggedEnum::Spam(variant))
                                    } else {
                                        // the variant was not found, return an errorindicating that data is missing
                                        Err(A::Error::custom(format!("Missing data for {tag}!")))
                                    }
                                } else {
                                    Err(A::Error::custom(format!(
                                        "Tag {tag} does not match Element name {element_name}!"
                                    )))
                                }
                            }
                            "NonMatchingTag" => {
                                // if the tag does not match the element, it needs to be manually checked
                                if element_name == "variant" {
                                    // deserialize the map value as a variant
                                    let option_variant: Option<Variant> = map.next_value()?;
                                    // extract the variant from the option
                                    if let Some(variant) = option_variant {
                                        // return the variant as wrapped in it's enum
                                        Ok(TaggedEnum::NonMatchingTag(variant))
                                    } else {
                                        // the variant was not found, return an errorindicating that data is missing
                                        Err(A::Error::custom(format!("Missing data for {tag}!")))
                                    }
                                } else {
                                    Err(A::Error::custom(format!(
                                        "Tag {tag} does not match Element name {element_name}!"
                                    )))
                                }
                            }
                            _ => Err(A::Error::unknown_field(&tag, &["variant"])),
                        }
                    }
                }
            }
        }

        // tell the deserializer to deserialize the data as a map, using the TaggedEnumVisitor as the decoder
        deserializer.deserialize_map(TaggedEnumVisitor {})
    }
}

fn main() {
    let document = r#"
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <tagged_enum tag="NonMatchingTag">
        <variant float="0.0">
        </variant>
    </tagged_enum>
    <tagged_enum tag="spam">
        <spam ham="15">
        </spam>
    </tagged_enum>
    "#;
    let _: Vec<TaggedEnum> = quick_xml::de::from_str(document).unwrap();
}

@Mingun do you want to include this in the documentation somewhere? If so, please let me know where, then I'll open a pull request :)

@Mingun
Copy link
Collaborator

Mingun commented Apr 4, 2023

Yes, updating documentation would be beautiful. The place is here (rendered):

quick-xml/src/de/mod.rs

Lines 1576 to 1577 in 2e9123a

//! Frequently Used Patterns
//! ========================

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug duplicate serde Issues related to mapping from Rust types to XML
Projects
None yet
Development

No branches or pull requests

2 participants