[Question]: Can't deserialize in HashMap<String, String> if there is a subelement #526

Closed
Pastequee opened this issue Dec 19, 2022 · 5 comments
Labels
serde Issues related to mapping from Rust types to XML

Comments

@Pastequee

Problem Description

Hello, I want to deserialize a list of elements with unknown names, each containing either a string literal or a sub-element. When an element contains a sub-element, I just want a default empty String. I want to store the result in a HashMap<String, String>.

What I've found so far

Let's say I have this XML:

<Data>
    <A>value1</A>
    <B>value2</B>
    <C>value3</C>
    <D>
        <Nested>Breaks !</Nested>
    </D>
</Data>

Here A, B and C stand for elements with unknown names whose values are strings. This struct deserializes fine when only A, B and C are present:

#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "PascalCase")]
struct Event {
    #[serde(default)]
    data: HashMap<String, String>,
}

If the sub-elements A, B and C contain only strings, this works fine. If one of them has a nested element, as D does, I get an UnexpectedStart([78, 101, 115, 116, 101, 100]) error (the integer array holds the ASCII codes for 'Nested').
So my question is: is there a way either to skip D when it contains a nested element instead of a string, or to fall back to a default value such as "" in that case, so that the rest of the data is not lost?
Thank you for your time 😄
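
For readers who hit a similar error, the byte array in the message can be turned back into text with the standard library alone. This is a minimal sketch; `decode_tag_name` is a hypothetical helper name and is not part of quick-xml.

```rust
/// Turns the byte payload of an error such as `UnexpectedStart([...])`
/// back into readable text. Hypothetical helper, not part of quick-xml.
fn decode_tag_name(bytes: &[u8]) -> String {
    // Lossy conversion never fails: invalid UTF-8 is replaced with U+FFFD.
    String::from_utf8_lossy(bytes).into_owned()
}
```

Calling `decode_tag_name(&[78, 101, 115, 116, 101, 100])` yields the tag name from the XML above, which confirms that the deserializer stopped at the opening tag inside D.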

@Mingun Mingun added the serde Issues related to mapping from Rust types to XML label Dec 19, 2022
@Mingun
Collaborator

Mingun commented Dec 19, 2022

You should use a custom type for the map value, one that can be deserialized through either Visitor::visit_str or Visitor::visit_map. String::deserialize only expects Visitor::visit_str. It seems that an untagged enum

#[derive(Deserialize)]
#[serde(untagged)]
enum MyString {
  Str(String),
  Nested,
}

should do the job, but a manually implemented Deserialize will probably be more ergonomic.

Because this issue is a duplicate of #383, I am closing it.

@Mingun Mingun closed this as not planned Dec 19, 2022
@treywelsh
Copy link

treywelsh commented Feb 17, 2023

Hi,

It seems I have a similar problem: I want to capture keys and values in a HashMap (the difference is that I only use the HashMap temporarily, during deserialization), and I'm interested in the solution to this issue.
I already asked my question on Stack Overflow: https://stackoverflow.com/questions/75337112/deserializing-dynamic-xml-in-rust

I couldn't find a way to make this work with an untagged enum (I wasn't able to capture keys and values), so I took the "Implement Deserialize for a custom map type" serde example and adapted it:

#![allow(dead_code)]

use serde::Deserialize;
use std::{collections::HashMap, fmt};

use serde::de::{self, Deserializer, MapAccess, Visitor};

const XML: &str = r#"
<vm>
    <id>15</id>
    <template>
        <tag1>content1</tag1>
        <tag2>content2</tag2>
        <vec1>
            <tag3>content3</tag3>
            <tag4>content4</tag4>
        </vec1>
        <vec1>
            <tag3>content5</tag3>
        </vec1>
        <tag2>content6</tag2>
        <vec2>
            <tag3>content7</tag3>
        </vec2>
    </template>
</vm>
"#;

#[derive(Debug, Clone)]
struct Pair(String, String);

#[derive(Debug, Clone)]
struct Vector(String, Vec<Pair>);

#[derive(Debug, Clone)]
struct Template {
    pairs: Vec<Pair>,
    vectors: Vec<Vector>,
}

impl Template {
    fn new() -> Self {
        Template {
            pairs: Vec::new(),
            vectors: Vec::new(),
        }
    }
}

#[derive(Debug, Deserialize)]
pub struct VM {
    id: i64,
    template: Template,
}

struct TemplateVisitor;

impl<'de> Visitor<'de> for TemplateVisitor {
    type Value = Template;

    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter.write_str("a very special map")
    }

    fn visit_map<M>(self, mut access: M) -> Result<Self::Value, M::Error>
    where
        M: MapAccess<'de>,
    {
        let mut map = Template::new();

        while let Some(key) = access.next_key::<String>()? {
            // Propagate the error instead of panicking inside the visitor.
            let map_value = access.next_value::<HashMap<String, String>>()?;

            if map_value.contains_key("$text") {
                map.pairs
                    .push(Pair(key, map_value.get("$text").unwrap().clone()));
            } else {

                let mut vector = Vec::new();
                for (k, v) in map_value {
                    vector.push(Pair(k, v))
                }
                map.vectors.push(Vector(key, vector));
            }
        }

        Ok(map)
    }
}

impl<'de> Deserialize<'de> for Template {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        deserializer.deserialize_map(TemplateVisitor {})
    }
}

fn main() {
    let obj: VM = quick_xml::de::from_str(XML).unwrap();
    println!("{:#?}", obj);
}

Did you find another interesting solution, @arthur-pi?
I'm not happy with the map_value.contains_key("$text") test; it feels like I'm doing something dirty. Any tips to share, @Mingun?

Sorry if this comment is out of place in this issue; I wasn't sure whether #383 and this issue are really duplicates.

@Mingun
Collaborator

Mingun commented Feb 17, 2023

I do not see a way to avoid the map_value.contains_key("$text") hack. Here is what happens:

// `key` deserialized from
// - <tag1>
// - <tag2>
// - <vec1>
// - <vec1>
// - <tag2>
// - <vec2>
while let Some(key) = access.next_key::<String>()? {
    // `map_value` is deserialized from the underlined span:
    // - <tag1> content1 </tag1>
    //          ^^^^^^^^              -- map: { $text: content1 }
    // - <tag2> content2 </tag2>
    //          ^^^^^^^^              -- map: { $text: content2 }
    // - <vec1> <tag3>content3</tag3><tag4>content4</tag4> </vec1>
    //          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -- map: { tag3: content3, tag4: content4 }
    // - <vec1> <tag3>content5</tag3> </vec1>
    //          ^^^^^^^^^^^^^^^^^^^^^ -- map: { tag3: content5 }
    // - <tag2> content6 </tag2>
    //          ^^^^^^^^              -- map: { $text: content6 }
    // - <vec2> <tag3>content7</tag3> </vec2>
    //          ^^^^^^^^^^^^^^^^^^^^^ -- map: { tag3: content7 }
    let map_value = access.next_value::<HashMap<String, String>>()?;

    if map_value.contains_key("$text") {
        map.pairs
            .push(Pair(key, map_value.get("$text").unwrap().clone()));
    } else {

        let mut vector = Vec::new();
        for (k, v) in map_value {
            vector.push(Pair(k, v))
        }
        map.vectors.push(Vector(key, vector));
    }
}

PS. It would be better to use HashMap::remove instead of contains_key + get.
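
A minimal sketch of that suggestion, using only the standard library. The `Pair`, `Vector` and `Template` types mirror the ones in the code above; `push_entry` is a hypothetical helper name introduced for illustration, and it assumes the value has already been deserialized into a HashMap<String, String>.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Pair(String, String);

#[derive(Debug, Clone, PartialEq)]
struct Vector(String, Vec<Pair>);

#[derive(Debug, Default)]
struct Template {
    pairs: Vec<Pair>,
    vectors: Vec<Vector>,
}

/// Hypothetical helper: files one decoded element under `pairs` or `vectors`.
fn push_entry(template: &mut Template, key: String, mut map_value: HashMap<String, String>) {
    // `remove` performs a single lookup and returns the owned value,
    // so neither `unwrap()` nor `clone()` is needed.
    match map_value.remove("$text") {
        Some(text) => template.pairs.push(Pair(key, text)),
        None => {
            // No text content: treat every remaining entry as a nested pair.
            let items = map_value.into_iter().map(|(k, v)| Pair(k, v)).collect();
            template.vectors.push(Vector(key, items));
        }
    }
}
```

The body of the `while let` loop would then reduce to a single call such as `push_entry(&mut map, key, map_value)`. The "$text" check itself remains; only the double lookup goes away.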

The https://serde.rs/string-or-struct.html example does not work here, because both deserialize_str and deserialize_map can handle content1 and <tag3>content5</tag3>, returning

  • String(content1) and String(content5)
  • HashMap{ $text: content1 } and HashMap{ tag3: content5 }

so an enum cannot distinguish between them (it relies on deserialization failure). Moreover, because the deserializer is consumed during deserialization (Deserialize::deserialize takes ownership), the data is first deserialized into the internal serde::__private::de::Content type, which uses deserialize_any.

If we change our deserialize_any implementation, we could probably solve your case with an untagged enum

#[derive(Deserialize)]
#[serde(untagged)]
enum StringOrMap { // the name is a placeholder; the original snippet omitted it
  String(String),
  Map(HashMap<String, String>),
}

quick-xml/src/de/mod.rs

Lines 2525 to 2536 in 64292c7

    fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, DeError>
    where
        V: Visitor<'de>,
    {
        match self.peek()? {
            DeEvent::Start(_) => self.deserialize_map(visitor),
            // Redirect to deserialize_unit in order to consume an event and return an appropriate error
            DeEvent::End(_) | DeEvent::Eof => self.deserialize_unit(visitor),
            _ => self.deserialize_string(visitor),
        }
    }
}

The self.peek() call here sees <tag1>, <tag2> and so on.

Feel free to investigate this! That could open a rabbit hole!

@treywelsh

Thanks for taking the time to answer!

@codingbaobao

codingbaobao commented Jan 18, 2024

Hi, I have a similar question about transforming XML into a HashMap:

<imgdir name="info">
    <int name="version" value="10" />
    <float name="speed" value="1.6" />
    <string name="path" value="a/b/c" />
</imgdir>

I have an enum to handle the different types:

enum Value {
    Int(i32),
    Float(f32),
    String(String),
}

And I want to transform this XML into a HashMap<@name, @value>:

#[derive(Debug, Clone)]
struct MyMap(HashMap<String, Value>);

#[derive(Debug, Deserialize)]
struct Imgdir {
    #[serde(rename = "@name")]
    name: String,
    #[serde(rename = "$value")]
    map: MyMap,
}

I implemented the Visitor trait as @treywelsh did, but I get an error on access.next_value::<HashMap<String, String>>().
Could you give me some tips to get it right, or is it simply not possible? :( @Mingun
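
To make the intended result concrete, here is a std-only sketch of the conversion step. It assumes each child element has already been read into a (tag, @name, @value) triple by some other means; it does not show how to obtain those triples through serde or quick-xml. `to_entry` and `build_map` are hypothetical helper names.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i32),
    Float(f32),
    String(String),
}

/// Hypothetical helper: converts one child element, given as
/// (tag, @name, @value), into a map entry. Unknown tags and
/// unparsable numbers are reported as `Err`.
fn to_entry(tag: &str, name: &str, raw: &str) -> Result<(String, Value), String> {
    let value = match tag {
        "int" => Value::Int(raw.parse::<i32>().map_err(|e| format!("{name}: {e}"))?),
        "float" => Value::Float(raw.parse::<f32>().map_err(|e| format!("{name}: {e}"))?),
        "string" => Value::String(raw.to_owned()),
        other => return Err(format!("unsupported element <{other}>")),
    };
    Ok((name.to_owned(), value))
}

/// Collects all children into the map; the first error aborts the conversion.
fn build_map(children: &[(&str, &str, &str)]) -> Result<HashMap<String, Value>, String> {
    children
        .iter()
        .map(|&(tag, name, raw)| to_entry(tag, name, raw))
        .collect()
}
```

The element name selects the `Value` variant and the name attribute becomes the key, which is the mapping the XML above describes.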
