Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserializing borrowed string when it contains \n escape #1413

Closed
terhechte opened this issue Oct 17, 2018 · 8 comments
Closed

Deserializing borrowed string when it contains \n escape #1413

terhechte opened this issue Oct 17, 2018 · 8 comments
Labels

Comments

@terhechte
Copy link

terhechte commented Oct 17, 2018

Hey!

I have this very simple example:

extern crate serde;
extern crate serde_json;
#[macro_use]
extern crate serde_derive;

use std::fs::File;
use std::collections::HashMap;
use std::io::prelude::*;

#[derive(Deserialize)]
struct User<'a> {
    username: &'a str
}

#[derive(Deserialize)]
struct Comment<'a> {
    author: User<'a>,
    text: &'a str,
    likes: Vec<User<'a>>
}

#[derive(Deserialize)]
struct Media<'a> {
    author: User<'a>,
    likes: Vec<User<'a>>,
    comments: Vec<Comment<'a>>,
    images: HashMap<String, String>,
    description: &'a str
}

fn conv<'a>(string: &'a str) -> Vec<&'a str> {
    let media: Vec<Media> = serde_json::from_str(&string).unwrap();
    media.into_iter().map(|x|x.author.username).collect::<Vec<&str>>()
}

fn main() {
    let mut f = File::open("./test.json").expect("file not found");
    let mut contents = String::new();
    f.read_to_string(&mut contents).expect("something went wrong reading the file");
    println!("{:?}", &conv(&contents));
}

This is the input json:

[{"images": {"small": "http://appventure.me/small.png", "medium": "http://appventure.me/medium.png"}, "description": {"text": "Duis quam lorem, lacinia nec tempus non, Sondra Swigart tristique", "likes": [{"username": "Loan"}, {"username": "Shaneka"}], "author": {"username": "Hattie"}}, "likes": [{"username": "Les"}, {"username": "Vance"}, {"username": "Marilee"}], "comments": [{"text": "dictum feugiat. Ut blandit volutpat Kandis Ammerman ante in commodo.", "likes": [{"username": "Les"}], "author": {"username": "Les"}}, {"text": "luctus, enim lacus sagittis arcu, Clement Cassity at mollis tellus", "likes": [{"username": "Marisha"}], "author": {"username": "Liliana"}}], "author": {"username": "Debbie"}}]

I'm getting the error:

invalid type: string [...] expected a borrowed string

What did I do wrong? I did read the docs but couldn't figure it out.

@dtolnay
Copy link
Member

dtolnay commented Oct 17, 2018

That is not the error I get when running the program you gave.

Your input file contains "description": { ... } where the description value is a map, but the Media struct contains description: &'a str where the value is a string.

@terhechte
Copy link
Author

Hey! Sorry for the confusion. I had edited the JSON because the actual file is 5mb big. I botched it while doing so. I've created a playground that exhibits the error:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2015&gist=ae491238f0e4da1c555e7b26180ad2f6

@dtolnay
Copy link
Member

dtolnay commented Oct 17, 2018

Here is a minimized example (same as your code but with extraneous fields removed):

https://play.rust-lang.org/?edition=2015&gist=378fc363d8ec9752d4e19f4500c07851

Here you need to consider what you expect 'a to be, i.e. what variable owns the data from which the &'a str is borrowed.

@terhechte
Copy link
Author

Doesn't (in your example) j own the data as 'static? And even if not, everything is dropped at the end of main. How could I change it, so that it works? I have to admit I don't get what's wrong here. Changing it to:

struct Media {
    description: String
}

would obviously work, but I wanted to play around the serde zero-copy deserialization feature.

@dtolnay
Copy link
Member

dtolnay commented Oct 17, 2018

Well the length of the string in description needs to be 3 and it needs to contain the byte 'A' followed by a newline (byte '\x0A') followed by the byte 'B', because this is the JSON string in the input. There is no 'static data (in j or anywhere else) containing 'A' followed by a newline followed by 'B'.

@michael-grunder
Copy link

For anyone landing on this issue (like I did), I think a bit more context may be useful.

The reason serde can't deserialize into a borrowed string is that it needs to modify (unescape) the encoded json, which requires ownership.

That said, it appears you can get around this by using a Cow<'a, str> allowing deserialization to borrowed or owned memory depending on the string itself.

Link to playground

If this is incorrect please let me know (I'm still somewhat new to Rust).

@Xaeroxe
Copy link
Contributor

Xaeroxe commented Sep 16, 2020

This seems like a case where the error messaging describing the problem could be improved. I can accept that the lib needs to own the data once the "escape" dilemma is explained to me (thanks @michael-grunder ) however when I first saw this error message in my own code I was baffled as to why a borrowed string posed a problem here. Perhaps serde's error message could better explain why the distinction matters.

@ytrezq
Copy link

ytrezq commented Dec 10, 2020

For anyone landing on this issue (like I did), I think a bit more context may be useful.

The reason serde can't deserialize into a borrowed string is that it needs to modify (unescape) the encoded json, which requires ownership.

That said, it appears you can get around this by using a Cow<'a, str> allowing deserialization to borrowed or owned memory depending on the string itself.

Link to playground

If this is incorrect please let me know (I'm still somewhat new to Rust).

@michael-grunder as an alternative, is there a way to specify a borrowed string from the json?

DmitrySamoylov added a commit to lumeohq/lumeo-types-rs that referenced this issue May 28, 2021
DmitrySamoylov added a commit to lumeohq/lumeo-types-rs that referenced this issue May 28, 2021
bnaecker added a commit to oxidecomputer/omicron that referenced this issue Sep 30, 2021
This commit fixes two bugs dealing with ClickHouse, and adds regressions
for them.

The first is about deserialization of structs which have a `&str` field.
`serde_json` is generally fine about zero-copy, except if the string
requires escaping. In that case, it can no longer deserialize the data
into a `&str`, since it must take ownership and manipulate the returned
buffer. This requires replacing `&'a str` with `Cow<'a, str>` in the DB
model types, specifically for the timeseries key fields. See
serde-rs/serde#1413 for more details. Note
that we also parse all string datum from the DB as `String` directly,
rather than as `&str`, again to avoid this (and because we'll be
converting it to an owned representation soon anyway).

The second bug is handling a ClickHouse oddity. By default, 64-bit
integer types are returned as _quoted_ strings. As in, `"1"` instead of
`1`. This is apparently to support JavaScript's JSON implementation, but
in any case it's not the behavior we expect or want. There is thankfully
a setting to turn this off, which the database client object now sets
when it makes requests returning JSON data. See
ClickHouse/ClickHouse#2375 for more details.
bnaecker added a commit to oxidecomputer/omicron that referenced this issue Sep 30, 2021
This commit fixes two bugs dealing with ClickHouse, and adds regressions
for them.

The first is about deserialization of structs which have a `&str` field.
`serde_json` is generally fine about zero-copy, except if the string
requires escaping. In that case, it can no longer deserialize the data
into a `&str`, since it must take ownership and manipulate the returned
buffer. This requires replacing `&'a str` with `Cow<'a, str>` in the DB
model types, specifically for the timeseries key fields. See
serde-rs/serde#1413 for more details. Note
that we also parse all string datum from the DB as `String` directly,
rather than as `&str`, again to avoid this (and because we'll be
converting it to an owned representation soon anyway).

The second bug is handling a ClickHouse oddity. By default, 64-bit
integer types are returned as _quoted_ strings. As in, `"1"` instead of
`1`. This is apparently to support JavaScript's JSON implementation, but
in any case it's not the behavior we expect or want. There is thankfully
a setting to turn this off, which the database client object now sets
when it makes requests returning JSON data. See
ClickHouse/ClickHouse#2375 for more details.
bnaecker added a commit to oxidecomputer/omicron that referenced this issue Sep 30, 2021
This commit fixes two bugs dealing with ClickHouse, and adds regressions
for them.

The first is about deserialization of structs which have a `&str` field.
`serde_json` is generally fine about zero-copy, except if the string
requires escaping. In that case, it can no longer deserialize the data
into a `&str`, since it must take ownership and manipulate the returned
buffer. There are a few ways around this, but this commit opts for the
simplest -- using a owned `String` instead of a `&str`. The string is
immediately moved into a `Measurement`, which takes ownership anyway, so
this is the most straightfoward choice. See
serde-rs/serde#1413 for more details. Note
that we also parse all string datum from the DB as `String` directly,
rather than as `&str`, again to avoid this.

The second bug is handling a ClickHouse oddity. By default, 64-bit
integer types are returned as _quoted_ strings. As in, `"1"` instead of
`1`. This is apparently to support JavaScript's JSON implementation, but
in any case it's not the behavior we expect or want. There is thankfully
a setting to turn this off, which the database client object now sets
when it makes requests returning JSON data. See
ClickHouse/ClickHouse#2375 for more details.
bnaecker added a commit to oxidecomputer/omicron that referenced this issue Sep 30, 2021
This commit fixes two bugs dealing with ClickHouse, and adds regressions
for them.

The first is about deserialization of structs which have a `&str` field.
`serde_json` is generally fine about zero-copy, except if the string
requires escaping. In that case, it can no longer deserialize the data
into a `&str`, since it must take ownership and manipulate the returned
buffer. There are a few ways around this, but this commit opts for the
simplest -- using a owned `String` instead of a `&str`. The string is
immediately moved into a `Measurement`, which takes ownership anyway, so
this is the most straightfoward choice. See
serde-rs/serde#1413 for more details. Note
that we also parse all string datum from the DB as `String` directly,
rather than as `&str`, again to avoid this.

The second bug is handling a ClickHouse oddity. By default, 64-bit
integer types are returned as _quoted_ strings. As in, `"1"` instead of
`1`. This is apparently to support JavaScript's JSON implementation, but
in any case it's not the behavior we expect or want. There is thankfully
a setting to turn this off, which the database client object now sets
when it makes requests returning JSON data. See
ClickHouse/ClickHouse#2375 for more details.
bnaecker added a commit to oxidecomputer/omicron that referenced this issue Sep 30, 2021
This commit fixes two bugs dealing with ClickHouse, and adds regressions
for them.

The first is about deserialization of structs which have a `&str` field.
`serde_json` is generally fine about zero-copy, except if the string
requires escaping. In that case, it can no longer deserialize the data
into a `&str`, since it must take ownership and manipulate the returned
buffer. There are a few ways around this, but this commit opts for the
simplest -- using a owned `String` instead of a `&str`. The string is
immediately moved into a `Measurement`, which takes ownership anyway, so
this is the most straightfoward choice. See
serde-rs/serde#1413 for more details. Note
that we also parse all string datum from the DB as `String` directly,
rather than as `&str`, again to avoid this.

The second bug is handling a ClickHouse oddity. By default, 64-bit
integer types are returned as _quoted_ strings. As in, `"1"` instead of
`1`. This is apparently to support JavaScript's JSON implementation, but
in any case it's not the behavior we expect or want. There is thankfully
a setting to turn this off, which the database client object now sets
when it makes requests returning JSON data. See
ClickHouse/ClickHouse#2375 for more details.
david-perez added a commit to smithy-lang/smithy-rs that referenced this issue Jan 11, 2022
We are currently deserializing the query string into `Vec<(&str,
&str)>`. `serde_urlencoded` panics if the input string slice contains
escaped data, since in that case it needs to allocate a new `String` to
unescape the input string slice's contents.

Instead of deserializing to `Vec<(String, String)>`, we can instead use
`Cow<'a, str>` so that deserialization only allocates when strictly
required.

Reference: serde-rs/serde#1413 (comment)
david-perez added a commit to smithy-lang/smithy-rs that referenced this issue Jan 19, 2022
We are currently deserializing the query string into `Vec<(&str,
&str)>`. `serde_urlencoded` panics if the input string slice contains
escaped data, since in that case it needs to allocate a new `String` to
unescape the input string slice's contents.

Instead of deserializing to `Vec<(String, String)>`, we can instead use
`Cow<'a, str>` so that deserialization only allocates when strictly
required.

Reference: serde-rs/serde#1413 (comment)
david-perez added a commit to smithy-lang/smithy-rs that referenced this issue Jan 19, 2022
…ta (#1058)

We are currently deserializing the query string into `Vec<(&str,
&str)>`. `serde_urlencoded` panics if the input string slice contains
escaped data, since in that case it needs to allocate a new `String` to
unescape the input string slice's contents.

Instead of deserializing to `Vec<(String, String)>`, we can instead use
`Cow<'a, str>` so that deserialization only allocates when strictly
required.

Reference: serde-rs/serde#1413 (comment)
aws-sdk-rust-ci pushed a commit to awslabs/aws-sdk-rust that referenced this issue Jan 19, 2022
…y string data (#1058)

We are currently deserializing the query string into `Vec<(&str,
&str)>`. `serde_urlencoded` panics if the input string slice contains
escaped data, since in that case it needs to allocate a new `String` to
unescape the input string slice's contents.

Instead of deserializing to `Vec<(String, String)>`, we can instead use
`Cow<'a, str>` so that deserialization only allocates when strictly
required.

Reference: serde-rs/serde#1413 (comment)
Velfi pushed a commit to awslabs/aws-sdk-rust that referenced this issue Jan 19, 2022
…y string data (#1058)

We are currently deserializing the query string into `Vec<(&str,
&str)>`. `serde_urlencoded` panics if the input string slice contains
escaped data, since in that case it needs to allocate a new `String` to
unescape the input string slice's contents.

Instead of deserializing to `Vec<(String, String)>`, we can instead use
`Cow<'a, str>` so that deserialization only allocates when strictly
required.

Reference: serde-rs/serde#1413 (comment)
@serde-rs serde-rs locked and limited conversation to collaborators Feb 20, 2022
@dtolnay dtolnay changed the title I'm getting invalid type: string. expected a borrowed string Deserializing borrowed string when it contains \n escape Feb 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Development

No branches or pull requests

5 participants