Will this work on xml documents? #73

chovyprognos · 2021-11-24T17:05:03Z

I will be parsing RSS feeds and am wondering whether I can use querySelectorAll on an XML document.

b-fuze · 2021-11-24T21:01:51Z

I've never tried, but if anything, it will parse the XML as if it were HTML, which you may not want. I plan to add a proper XML parser at some point, but I haven't gotten around to it yet. You can try it, and it may be good enough for what you need.

b-fuze · 2021-11-24T21:05:17Z

I should add that parseFromString only accepts text/html for now.

amundo · 2022-06-02T18:29:13Z

Thanks for the explanation, interested in this too. I have routinely parsed XML with .parseFromString(xml, 'text/html') and it often seems to work; could you explain what it means to “parse XML as if it is HTML”?

Thanks for this awesome library.

0kku · 2022-06-02T19:04:21Z

There are some subtle differences. For example, HTML doesn't support self-closing syntax on anything other than void HTML elements, while XML allows it on any element. Also, if the XML looks like HTML, the parser might silently move nodes around to fit the rules of HTML, which an XML parser should never do.
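
To make the self-closing difference concrete, here is a minimal sketch of a pre-processing step that rewrites XML self-closing tags into explicit open/close pairs before the text is handed to an HTML parser. The function name expandSelfClosing is hypothetical and not part of this library, and the regex is deliberately naive: it assumes attribute values contain no < or > characters and it does not skip comments or CDATA sections.

```typescript
// Matches <name attrs/> where name is an XML-style tag name.
// Assumption: attribute values never contain "<" or ">".
const SELF_CLOSING = /<([A-Za-z_][\w.:-]*)(\s[^<>]*?)?\s*\/>/g;

// Hypothetical helper: turn <foo a="b"/> into <foo a="b"></foo> so that an
// HTML parser does not treat it as an unclosed start tag.
function expandSelfClosing(xml: string): string {
  return xml.replace(
    SELF_CLOSING,
    (_match: string, name: string, attrs: string | undefined) =>
      `<${name}${attrs ?? ""}></${name}>`,
  );
}

const sample = '<item><guid/><enclosure url="a.mp3" type="audio/mpeg"/></item>';
console.log(expandSelfClosing(sample));
```

Without a step like this, an HTML parser will typically ignore the trailing slash on a non-void element and nest the following siblings inside it.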

amundo · 2022-06-03T23:51:24Z

I see, thanks.

Siltaar · 2024-07-09T13:10:48Z

Hi, I'm also facing the need to parse RSS feeds for a CLI version of Meta-Press.es (a press meta-search engine).

Parsing RSS XML files with the text/html content type makes it possible to get the content of some elements (such as titles), but not of links or pubDates (which aren't HTML elements).

In my use case, as long as I can querySelector() elements and reach their textContent, it's OK; I don't need strict parsing or element order.

(Well, to be honest, across the nearly 1000 scraped newspaper websites, I sometimes need to reach attributes, and sometimes I use XPath.)
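
A likely explanation for the missing links, assuming the parser follows the HTML rules: link is a void element in HTML, so it cannot have children and the URL ends up as a sibling text node. A possible workaround is to rename such tags before parsing. This is only a sketch; renameVoidTags and the x- prefix are hypothetical names, not part of this library, and the regex also rewrites matching text inside comments and CDATA sections.

```typescript
// Void elements as listed in the HTML standard.
const VOID_NAMES = [
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
];

// Matches the start of an opening or closing tag whose name is a void element.
const VOID_TAG = new RegExp(`<(/?)(${VOID_NAMES.join("|")})(?=[\\s/>])`, "gi");

// Hypothetical helper: <link>…</link> becomes <x-link>…</x-link>, which an
// HTML parser treats as an ordinary element with children.
function renameVoidTags(xml: string, prefix = "x-"): string {
  return xml.replace(
    VOID_TAG,
    (_match: string, slash: string, name: string) => `<${slash}${prefix}${name}`,
  );
}

const rss = "<item><link>https://example.org/a</link><title>T</title></item>";
console.log(renameVoidTags(rss));
```

After passing the renamed text to parseFromString with text/html, the link text should be reachable with querySelector("x-link"). Note that an HTML parser also lowercases tag names, so pubDate is stored as pubdate; whether a mixed-case selector still matches depends on the implementation.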

b-fuze mentioned this issue Apr 26, 2023

Add HTMLUnknownElement #136

Open

b-fuze mentioned this issue Sep 30, 2023

Namespace support #143

Open

Siltaar mentioned this issue Jul 9, 2024

What about an XPathEvaluator ? #172

Open
