-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trim next element after DocType #603
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment and update changelog. I would like also to have some more formal tests for that situation (can be added in de/mod.rs
). Preferably to test each possible situation that you can imagine. I have a feeling, that the same problem can be happened not only for DocType + Comment
events.
@@ -2880,7 +2880,7 @@ impl StartTrimmer { | |||
#[inline(always)] | |||
fn trim<'a>(&mut self, event: Event<'a>) -> Option<PayloadEvent<'a>> { | |||
let (event, trim_next_event) = match event { | |||
Event::DocType(e) => (PayloadEvent::DocType(e), false), | |||
Event::DocType(e) => (PayloadEvent::DocType(e), true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment describing, why we need trim here because this is not obvious (otherwise it would be done initially)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Immediately following a Doctype event, the Serde parser expects to read an initial structure from the XML file - it expects a 'Start' tag event. Any comments following the doctype in the stream is parsed to Text/CData events, which leads the parse to fail with an 'ExpectedStart' panic. By skipping (trimming) any Text/CData events the next event the serde deserializer sees will be a Start event, which is always required to start building up a new Struct from the xml.
The Doctype should trim events here just like the Start/End events. Only the CData/Text events should not trim (otherwise any CData/Text following them will be ignored)
See <https://www.w3.org/TR/xml11/#NT-prolog> failures (2): xml_prolog::comments xml_prolog::pi
Otherwise consequent `Text` events (which is possible if their delimited by Comment or PI events, which is skipped) will be merged but not trimmed. That will lead to returning a `Text` event when try to call `deserialize_struct` or `deserialize_map` which will trigger `DeError::ExpectedStart` error. The incorrect trim behavior was introduced in tafia#581, when DocType event began to be processed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work! After investigating situation I realized, that actually comment is not needed, because trimming Text
events after DocType
event just follow common rule. Returning false
was an oversight in #583, which introduced this regression.
After investigating I realized that we should add tests for deserialization that uses deserialize_struct
or deserialize_map
methods (HashMap
uses the last) and an XML contains comments or processing instructions in XML prolog
Codecov Report
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. @@ Coverage Diff @@
## master #603 +/- ##
==========================================
+ Coverage 64.38% 64.44% +0.05%
==========================================
Files 36 36
Lines 16914 16916 +2
==========================================
+ Hits 10890 10901 +11
+ Misses 6024 6015 -9
Flags with carried forward coverage won't be shown. Click here to find out more.
|
This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [quick-xml](https://github.com/tafia/quick-xml) | dependencies | minor | `0.28.2` -> `0.29.0` | --- ### Release Notes <details> <summary>tafia/quick-xml</summary> ### [`v0.29.0`](https://github.com/tafia/quick-xml/blob/HEAD/Changelog.md#​0290----2023-06-13) [Compare Source](tafia/quick-xml@v0.28.2...v0.29.0) ##### New Features - [#​601]: Add `serde_helper` module to the crate root with some useful utility functions and document using of enum's unit variants as a text content of element. - [#​606]: Implement indentation for `AsyncWrite` trait implementations. ##### Bug Fixes - [#​603]: Fix a regression from [#​581] that an XML comment or a processing instruction between a <!DOCTYPE> and the root element in the file brokes deserialization of structs by returning `DeError::ExpectedStart` - [#​608]: Return a new error `Error::EmptyDocType` on empty doctype instead of crashing because of a debug assertion. ##### Misc Changes - [#​594]: Add a helper macro to help deserialize internally tagged enums with Serde, which doesn't work out-of-box due to serde limitations. [#​581]: tafia/quick-xml#581 [#​594]: tafia/quick-xml#594 [#​601]: tafia/quick-xml#601 [#​603]: tafia/quick-xml#603 [#​606]: tafia/quick-xml#606 [#​608]: tafia/quick-xml#608 </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNS4xMTguMCIsInVwZGF0ZWRJblZlciI6IjM1LjExOC4wIiwidGFyZ2V0QnJhbmNoIjoiZGV2ZWxvcCJ9--> Co-authored-by: cabr2-bot <cabr2.help@gmail.com> Co-authored-by: crapStone <crapstone01@gmail.com> Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1940 Reviewed-by: crapStone <crapstone01@gmail.com> Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org> Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Allow comment to exist between DocType and root element
With the new EntityResolver/Doctype changes, putting an xml comment between a doctype and the root element causes the parse to fail. Trimming the next element fixes this.