Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(misuse): Add validation to deserialization #504

Merged

Conversation

HenningHolmDE
Copy link
Contributor

@HenningHolmDE HenningHolmDE commented Jun 14, 2024

Previously, correctness through the existing validation functions was
not enforced during deserialization.
This is now achieved by going through TryFrom implementations or
specific field deserialization functions. Thus, it should no longer be
possible to create invalid instances by deserialization.

This fixes #502.

@coveralls
Copy link
Collaborator

coveralls commented Jun 14, 2024

Pull Request Test Coverage Report for Build 9523103389

Details

  • 201 of 208 (96.63%) changed or added relevant lines in 3 files are covered.
  • 456 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-3.2%) to 86.72%

Changes Missing Coverage Covered Lines Changed/Added Lines %
imap-types/src/datetime.rs 35 36 97.22%
imap-types/src/core.rs 133 139 95.68%
Files with Coverage Reduction New Missed Lines %
imap-types/src/mailbox.rs 40 69.19%
imap-types/src/datetime.rs 49 72.93%
imap-types/src/core.rs 367 70.6%
Totals Coverage Status
Change from base Build 9522522665: -3.2%
Covered Lines: 10167
Relevant Lines: 11724

💛 - Coveralls

@HenningHolmDE
Copy link
Contributor Author

@duesee I just read once again through your reply in the issue and noticed that I overlooked your hint to CommandContinuationRequestBasic, which has no validate method but a validation in its new method. I will take a look at that tomorrow.

(Also I will look into the strange non-diff the failing check is complaining about.)

@HenningHolmDE HenningHolmDE marked this pull request as draft June 15, 2024 08:27
@HenningHolmDE HenningHolmDE force-pushed the issue_502_fix_deserialization_misuse branch from a253f8e to 26e043e Compare June 15, 2024 11:10
@coveralls
Copy link
Collaborator

coveralls commented Jun 15, 2024

Pull Request Test Coverage Report for Build 9527640958

Details

  • 208 of 225 (92.44%) changed or added relevant lines in 4 files are covered.
  • 505 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-3.4%) to 86.481%

Changes Missing Coverage Covered Lines Changed/Added Lines %
imap-types/src/mailbox.rs 31 32 96.88%
imap-types/src/core.rs 123 128 96.09%
imap-types/src/response.rs 21 32 65.63%
Files with Coverage Reduction New Missed Lines %
imap-types/src/datetime.rs 29 80.29%
imap-types/src/mailbox.rs 39 69.19%
imap-types/src/response.rs 70 52.84%
imap-types/src/core.rs 367 70.6%
Totals Coverage Status
Change from base Build 9522522665: -3.4%
Covered Lines: 10190
Relevant Lines: 11783

💛 - Coveralls

@jakoschiko
Copy link
Collaborator

You are using #[serde(try_from = "String")]. As far as I can tell this will create an intermediate copy and do some unnecessary unicode checks. Did you try #[serde(try_from = "&'a [u8]")]? I was not able to compile your branch with my suggestion, but it works in playground.

@HenningHolmDE HenningHolmDE force-pushed the issue_502_fix_deserialization_misuse branch from 26e043e to 4877e0d Compare June 17, 2024 08:29
@coveralls
Copy link
Collaborator

coveralls commented Jun 17, 2024

Pull Request Test Coverage Report for Build 9544457310

Details

  • 208 of 225 (92.44%) changed or added relevant lines in 4 files are covered.
  • 505 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-3.4%) to 86.481%

Changes Missing Coverage Covered Lines Changed/Added Lines %
imap-types/src/mailbox.rs 31 32 96.88%
imap-types/src/core.rs 123 128 96.09%
imap-types/src/response.rs 21 32 65.63%
Files with Coverage Reduction New Missed Lines %
imap-types/src/datetime.rs 29 80.29%
imap-types/src/mailbox.rs 39 69.19%
imap-types/src/response.rs 70 52.84%
imap-types/src/core.rs 367 70.6%
Totals Coverage Status
Change from base Build 9539427042: -3.4%
Covered Lines: 10190
Relevant Lines: 11783

💛 - Coveralls

@HenningHolmDE
Copy link
Contributor Author

@jakoschiko Thanks for pointing this out. I will try to address the issues in your comment one by one:

You are using #[serde(try_from = "String")]. [...]

Hopefully, I only added deserialization through String in cases, where the underlying type is some type of string (e.g. Cow<'a, str>). If you found any occurence where I went through a String on the way to a byte type (e.g. Cow<'a, [u8]>), please let me know. That would definetely be incorrect.

[...] As far as I can tell this will create an intermediate copy [...]

I would agree that deserializing into String will copy the data from the deserializer. After that, moving through TryFrom into Cow should not result into creating an additional copy. Let's check this in the playground.

However, if you are suggesting that previously, data was not copied, this seems not to be the case as deserialization into a Cow always results in a Cow::Owned, thus copying the data. (see serde-rs/serde#1852)
There seem to be ways to mitigate this (e.g. using #[serde(borrow)] or introducing serde_with::BorrowCow). But when I tried this out briefly, I ended up sprinkling borrow all over the place and yet was not able to make things work without copying when used in combination with try_from.
There is probably some way of doing this, but would suggest adding an issue to tackle this optimization at a later point in time. What do you think?

[...] and do some unnecessary unicode checks. [...]

Where would you expect additional unicode checks? On the way to the first Rust string type in the conversion, there will always be a unicode check to ensure Rust's UTF-8 guarantees. However, when converting between string types, there should be no additional checks (e.g. when moving the String into a Cow<'a, str>).

[...] Did you try #[serde(try_from = "&'a [u8]")]? I was not able to compile your branch with my suggestion, but it works in playground.

I tried using &'a str in combination with #[serde(borrow)], but as stated above, unfortunately, I was not able to easily get rid of any copying this way in the actual code.

@HenningHolmDE
Copy link
Contributor Author

@duesee I'm a bit lost with the Coveralls failure.

In the report, there seem to be arbitrarily sprinkled coverage misses in empty lines and comments. I first thought that might have something to do with mixing in old coverage information into the report, but even with the changes to the coverage report generation we merged today, this is still the case.

Also, when I look at the coverage reports generated locally, the PR seems to only improve things in the files critizied by the tool and the report generated by grcov makes actual sense.

For the base report of main, everything seems to be fine and comparable to the local report.

Any ideas on this?

@HenningHolmDE HenningHolmDE marked this pull request as ready for review June 17, 2024 10:09
@jakoschiko
Copy link
Collaborator

jakoschiko commented Jun 17, 2024

Hopefully, I only added deserialization through String in cases, where the underlying type is some type of string (e.g. Cow<'a, str>). If you found any occurence where I went through a String on the way to a byte type (e.g. Cow<'a, [u8]>), please let me know. That would definetely be incorrect.

My bad. I thought that types like Atom were wrapping &[u8], not &str.

However, if you are suggesting that previously, data was not copied, this seems not to be the case as deserialization into a Cow always results in a Cow::Owned, thus copying the data.

The previous serialization was also non-optimal (probably by accident). It should be possible to have an implementation that does not copy the bytes. E.g. the TryFrom implementation for Atom is using Cow::Borrowed:

impl<'a> TryFrom<&'a str> for Atom<'a> {
    type Error = ValidationError;

    fn try_from(value: &'a str) -> Result<Self, Self::Error> {
        Self::validate(value)?;

        Ok(Self(Cow::Borrowed(value)))
    }
}

There is probably some way of doing this, but would suggest adding an issue to tackle this optimization at a later point in time. What do you think?

Sure, if it's non-trivial to fix it, let's do it later.

Where would you expect additional unicode checks? On the way to the first Rust string type in the conversion, there will always be a unicode check to ensure Rust's UTF-8 guarantees.

Again, my bad. I'm a little bit surprised that @duesee decided to use &str instead of &[u8]. I far as I know we don't need to uphold Rust's UTF-8 guarantees here.

I tried using &'a str in combination with #[serde(borrow)], but as stated above, unfortunately, I was not able to easily get rid of any copying this way in the actual code.

This confuses me. Because of the TryFrom implementation I mentioned above I think this should be possible without much effort.

@HenningHolmDE
Copy link
Contributor Author

I'm a little bit surprised that @duesee decided to use &str instead of &[u8]. I far as I know we don't need to uphold Rust's UTF-8 guarantees here.

When writing the previous comment, I gave this some more thought and I'm actually also not sure if assuming this is correct. I have created #506 for discussing this further.

I tried using &'a str in combination with #[serde(borrow)], but as stated above, unfortunately, I was not able to easily get rid of any copying this way in the actual code.

This confuses me. Because of the TryFrom implementation I mentioned above I think this should be possible without much effort.

Unfortunately, your playground example simplifies things a bit by not using Cow and therefor not requiring #[serde(borrow)].
However, I'm not sure what I observed a few days ago, but trying it out now in the playground, the combination indeed seems to work.
Anyways, there is still the point of having to add #[serde(borrow)] all the way up the type tree for which I'm not sure of the consequences. So I would rather not throw this improvement into this PR as well and have created #507 to keep track of the topic.

@duesee
Copy link
Owner

duesee commented Jun 17, 2024

Regarding the coverage: Maybe, maaaybe, we screwed up the GitHub Actions cache when porting everything to just a few months ago. I had this feeling already that something is flaky... If your local coverage is sane, I would suggest not blocking on this. I'll open an issue to take a closer look.

Copy link
Owner

@duesee duesee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks good to me. I'll file two issues:

  • Check coverage in CI, and
  • Fuzz-test serde (de)serialization (when serde feature is used)

@jakoschiko Do you think this can already be merged?

@HenningHolmDE
Copy link
Contributor Author

Regarding the coverage: Maybe, maaaybe, we screwed up the GitHub Actions cache when porting everything to just a few months ago. I had this feeling already that something is flaky... If your local coverage is sane, I would suggest not blocking on this. I'll open an issue to take a closer look.

#511 should fix that. I am happy to wait for that one to be merged first and see this PR not reducing the coverage after rebasing.

The non-doc-comment line inside the doc comment code block seemed to
confuse rustfmt. Because of this, the nightly version was adding a
trailing space while stable was removing it.
Previously, correctness through the existing validation functions was
not enforced during deserialization.
This is now achieved by going through `TryFrom` implementations or
specific field deserialization functions. Thus, it should no longer be
possible to create invalid instances by deserialization.

This fixes duesee#502.
@HenningHolmDE HenningHolmDE force-pushed the issue_502_fix_deserialization_misuse branch from 4877e0d to df477b7 Compare June 18, 2024 16:43
@coveralls
Copy link
Collaborator

coveralls commented Jun 18, 2024

Pull Request Test Coverage Report for Build 9568958583

Details

  • 208 of 208 (100.0%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.4%) to 92.678%

Totals Coverage Status
Change from base Build 9568868462: 0.4%
Covered Lines: 10190
Relevant Lines: 10995

💛 - Coveralls

@duesee
Copy link
Owner

duesee commented Jun 18, 2024

Awesome work! Thanks! :-)

@duesee duesee merged commit 5c463bc into duesee:main Jun 18, 2024
9 checks passed
@HenningHolmDE HenningHolmDE deleted the issue_502_fix_deserialization_misuse branch June 18, 2024 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

misuse: Deserialization into incorrect types
4 participants