quick-xml captures UTF BOM as Event::Text #191

@TakaakiFuruse

Hi, thank you for creating a cool crate, I'm using your crate in my project!

Problem

It looks like quick-xml captures the UTF BOM as an Event::Text.

Normally, if you parse the following XML with the Reader code,

<tag1 att1 = "test">HELLO</tag1>

you would capture

  1. "test" as Event::Start,
  2. "HELLO" as Event::Text

However, if a BOM is embedded in the file, the Reader code captures

  1. "test" as Event::Start,
  2. "\u{feff}" as Event::Text and
  3. "HELLO" as Event::Text

The second one appears to be the BOM character (U+FEFF).

I think it would be better not to capture the BOM as text, and I am happy to make a pull request if you think this should be fixed.

(Please also have a look at my branch, I have added a test to parse an XML with BOM. It might be able to reproduce the problem.)
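
Until this is settled, one possible workaround is to drop the BOM before the input reaches the Reader. The sketch below uses only the Rust standard library. `strip_bom` and `strip_utf8_bom` are hypothetical helper names, not part of quick-xml, and the byte-level variant assumes the file is UTF-8 encoded.

```rust
/// Hypothetical helper (not part of quick-xml): returns the input without a
/// leading U+FEFF, or the input unchanged if it does not start with one.
pub fn strip_bom(input: &str) -> &str {
    input.strip_prefix('\u{feff}').unwrap_or(input)
}

/// Same idea for raw bytes. Assumption: the data is UTF-8, where the BOM is
/// encoded as the three bytes EF BB BF.
pub fn strip_utf8_bom(bytes: &[u8]) -> &[u8] {
    const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];
    bytes.strip_prefix(&UTF8_BOM).unwrap_or(bytes)
}
```

The string or byte slice returned by either helper could then be passed to the Reader in place of the original input, so that the "\u{feff}" text never reaches the event loop. Only a leading BOM is removed; a U+FEFF elsewhere in the document is left untouched.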


Labels: encoding (Issues related to support of various encodings of the XML documents)
