Will this work on xml documents? #73
Comments
I've never tried, but if anything, it will parse the XML as if it were HTML, which you may not want. I plan to add a proper XML parser at some point, but I haven't gotten around to it yet. You can try it, and it may be good enough for what you need.
I should add that |
Thanks for the explanation, interested in this too. I have routinely parsed XML with Thanks for this awesome library. |
There are some subtle differences. For example, HTML doesn't support self-closing elements other than the void HTML elements, while XML does. Also, if the XML looks like HTML, the parser might shuffle nodes around without warning to fit the rules of HTML, which an XML parser should never do.
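To make the self-closing difference concrete, here is a rough sketch that scans an XML string for self-closing tags an HTML parser would treat as ordinary open tags. The helper name `findRiskySelfClosingTags` is hypothetical and not part of this library, and the regular expression is only an approximation of a tokenizer. The void element list is the one from the HTML Standard; note that it includes `link`, so an RSS `<link>` element may lose its text when parsed as HTML.

```typescript
// Void elements from the HTML Standard: the only HTML elements with no closing tag.
const HTML_VOID_ELEMENTS: string[] = [
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
];

// Hypothetical helper (not part of this library): list the names of
// self-closing XML tags that are not HTML void elements. An HTML parser
// ignores the trailing "/" on these and treats them as open tags.
function findRiskySelfClosingTags(xml: string): string[] {
  const risky: string[] = [];
  // Rough match for <name ... />; this is not a real XML tokenizer.
  const selfClosing = /<([A-Za-z_][\w.:-]*)(?:\s[^<>]*?)?\/>/g;
  let match: RegExpExecArray | null;
  while ((match = selfClosing.exec(xml)) !== null) {
    const name = match[1];
    const isVoid = HTML_VOID_ELEMENTS.indexOf(name.toLowerCase()) !== -1;
    if (!isVoid && risky.indexOf(name) === -1) {
      risky.push(name);
    }
  }
  return risky;
}

const sampleFeed =
  '<item><title>Hi</title><enclosure url="a.mp3" /><br/><atom:link href="x"/></item>';
console.log(findRiskySelfClosingTags(sampleFeed)); // ["enclosure", "atom:link"]
```

If the list comes back empty, the self-closing issue probably won't bite, though the node-shuffling caveat still applies.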
I see, thanks.
Hi, I'm also facing the need to parse RSS feeds, for a CLI version of Meta-Press.es (a press meta-search engine). Parsing RSS XML files with the text/html content type lets me get the content of some elements (such as titles), but not links or pubDates (which aren't HTML). In my use case, as long as I can querySelector() elements and reach their textContent, it's OK; I don't need strict parsing or element order. (Well, to be honest, across the nearly 1000 scraped newspaper websites I sometimes need to reach attributes, and sometimes I use XPath.)
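For the narrow case described here (only the text of a few named RSS elements is needed), a stopgap that avoids the HTML parser entirely could look like the sketch below. It is an assumption-laden workaround, not this library's API: `textOfElements` is a hypothetical helper, it uses a regular expression rather than a real XML parser, it does not decode entities such as `&amp;`, and it does not handle nested elements of the same name or attributes on self-closing tags.

```typescript
// Hypothetical helper (not part of this library): return the text of every
// <tagName>...</tagName> element in an XML string without building a DOM.
function textOfElements(xml: string, tagName: string): string[] {
  // Escape regex metacharacters so names such as "dc:date" match literally.
  const name = tagName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    "<" + name + "(?:\\s[^<>]*)?>([\\s\\S]*?)</" + name + "\\s*>",
    "g",
  );
  const results: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    // Unwrap a single CDATA section if the whole body is one, then trim.
    const raw = match[1].replace(/^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/, "$1");
    results.push(raw.trim());
  }
  return results;
}

const rssItem =
  "<item><title><![CDATA[Hello]]></title>" +
  "<link>https://example.org/a</link>" +
  "<pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate></item>";
console.log(textOfElements(rssItem, "link"));    // ["https://example.org/a"]
console.log(textOfElements(rssItem, "pubDate")); // ["Mon, 01 Jan 2024 00:00:00 GMT"]
```

Anything needing attributes or XPath, as mentioned above, is beyond what this sketch can do and would still need a real XML parser.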
I will be parsing RSS feeds and am wondering whether I can use querySelectorAll on an XML document.