No matches when searching by tagName in XMLDOM's text/html DOM #27
Comments
I've tracked this down to the difference between Would you mind changing Xpath so it would? Prefixing every tag in the query with the XHTML namespace is rather cumbersome. Thanks! |
@moll What is it exactly that you are suggesting be changed? If the nodes have a namespace, then named node tests for them must use a namespace prefix: http://www.w3.org/TR/xpath/#node-tests
|
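The rule cited above can be illustrated with a toy model of an XPath 1.0 name test. This is only a sketch of the spec's expanded-name comparison, not the library's actual code: `matchesNameTest`, the `prefixes` map, and the plain objects standing in for DOM nodes are all assumptions made for illustration.

```javascript
// Toy model of an XPath 1.0 name test; NOT the xpath library's implementation.
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

// `prefixes` is a hypothetical prefix -> namespace URI map supplied by the caller.
function matchesNameTest(node, nameTest, prefixes = {}) {
  const colon = nameTest.indexOf(':');
  const prefix = colon === -1 ? null : nameTest.slice(0, colon);
  const localName = colon === -1 ? nameTest : nameTest.slice(colon + 1);
  // An unprefixed name always expands to the null namespace in XPath 1.0.
  const namespaceURI = prefix === null ? null : prefixes[prefix];
  if (prefix !== null && namespaceURI === undefined) {
    throw new Error(`Unbound namespace prefix: ${prefix}`);
  }
  return node.localName === localName && (node.namespaceURI || null) === namespaceURI;
}

// Stand-ins for a namespaced HTML element and a namespace-less XML element.
const htmlBody = { localName: 'body', namespaceURI: XHTML_NS };
const plainBody = { localName: 'body', namespaceURI: null };

console.log(matchesNameTest(htmlBody, 'body'));                     // false
console.log(matchesNameTest(htmlBody, 'h:body', { h: XHTML_NS })); // true
console.log(matchesNameTest(plainBody, 'body'));                    // true
```

Under this model, an unprefixed `body` can never match an element that sits in the XHTML namespace, which is the behavior reported in this issue.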
Perhaps a way to set a default namespace for unqualified names. The Try the example above. It's rather cumbersome to prefix every tag name with either the full namespace URI or an alias -> |
As someone who uses XPath a lot, I don't see it as all that cumbersome to use prefixes most of the time, but you definitely have a valid point in that the original API had a (Incidentally, the best case is I don't currently have collaborator access to this repo and the owner doesn't seem to be checking pull requests, but I have actually just sent an e-mail to the owner asking whether they could make me a collaborator. If that happens, I would be willing to expose the |
The |
I see, I confused myself. The XPath evaluator has a "case insensitive mode" (which was never exposed externally), and I had misremembered this as an "ignore namespaces mode". I would not be in favor of outright supporting non-null default namespaces, because as silly as it may seem, that would violate the XPath 1.0 spec. Most professional XPath implementations (.NET, most XSLT engines, etc.) do not allow using a default namespace because the spec is quite clear on the point that XPath 1.0 has no such capacity. |
What's the default namespace feature for then? Querying HTML DOMs with Xpath here is simply too cumbersome out of the box. Why not make it more convenient is what I'm proposing. That's what computers are for — to handle repetitive tasks for us. ;-) |
Which feature are you referring to? I see that the
I don't really see what's so terribly cumbersome about adding an extra two characters to each step of an XPath. People who work with XML do this all the time.
Because, as I already said, it would violate the XPath 1.0 spec. I think spec compliance is a worthwhile thing, especially with a library like this. It could also certainly have unintended consequences for someone using the library and expecting it to behave according to the spec.
Yes, but not at the expense of spec compliance. |
It's only practical to add namespaces to queries when working with tightly controlled corpora. An example scenario is when you have a set of XPath selectors that you want to apply to a large set of HTML documents. Those documents may or may not be explicitly namespaced, and namespaces may differ between documents (HTML4/xhtml etc.). In this case it's not possible to add namespaces to queries so that they apply to the whole corpus, but by using default namespaces you can resolve the problem with minimal hacking. |
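For a mixed corpus like the one described above, one workaround that stays within XPath 1.0 is to match on `local-name()`, a standard XPath 1.0 function that ignores the namespace entirely. The helper below is a hypothetical sketch (`toLocalNamePath` is not part of any library) and assumes simple paths made of plain element names; steps with predicates, axes, or prefixes are passed through unchanged.

```javascript
// Rewrites plain element-name steps to namespace-agnostic local-name() tests.
// Hypothetical helper for illustration; it is not part of the xpath library.
function toLocalNamePath(simplePath) {
  const plainName = /^[A-Za-z_][\w.-]*$/;
  return simplePath
    .split('/')
    .map((step) => (plainName.test(step) ? `*[local-name()='${step}']` : step))
    .join('/');
}

console.log(toLocalNamePath('//body'));
// //*[local-name()='body']
console.log(toLocalNamePath('/html/body//p'));
// /*[local-name()='html']/*[local-name()='body']//*[local-name()='p']
```

The trade-off is that such queries match elements of that local name in any namespace, and the comparison is case-sensitive, so this only fits when the selectors are simple and the looser matching is acceptable.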
Same issue here. I agree with @moss. The following should work, but it does not:
In my opinion xpath should return the body element also on documents with text/html... text/html is a valid mime type (https://developer.mozilla.org/en/docs/Web/API/DOMParser). Or is there any good reason for xpath to refuse to operate on text/html? |
This library operates on text/html just fine. It is simply strict with regards to namespaces, and html elements are in the The following works:
|
I'm going to keep this open a little longer for a few more potential changes. |
👋 hi, just to add some content to that because it also got me confused (and initially I thought it was a bug of
IIUC, 2. won't work because this library is strict with the XPath 1.0 spec (which makes sense), so we need to provide the namespace. The reason it works in browsers is that browsers follow the I'm not sure how feasible it is to add a boolean param / mode to also support that modified spec here, but at the very least, it'd be nice to document that behaviour, because it's very easy to fall into that trap, and it's easy to get lost because browsers work differently. I can work on a PR if you agree on the analysis. |
@sdeprez I can't find any place where this is documented or mentioned explicitly, but cae87df added an
const path = xpath.parse('//input');
const nodes = path.select({ node: myHtmlDom, isHtml: true });
You can see some examples of this in this repo's test.js file. |
Thanks! It looks like On the other hand, I needed to use
I can work on a PR if you agree on the principle. [EDIT]: actually about using |
@sdeprez Sorry for not getting back to you sooner. We actually have an API page that mentions the .evaluate() method, but it takes a bit of digging to get there: api docs -> link under xpath.parse -> "documentation page" link Maybe it would be better to have The XPathEvaluator page also doesn't include information about the I'd be happy to look at a PR if you can submit one. Thanks!
As indicated on the page linked above, it follows the same behavior as xpath.select. If you would like to select into a specific type, you can use one of the other |
Hey,
Giving Xpath a spin with XMLDOM. I can't seem to get it to match any elements by tag name when the DOM's MIME type is set to text/html. Is it supposed to work?
The above returns an empty array. Add a random attribute to an element, try to match by that instead (//*[@foo=123]) and it works.
Thanks!