-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
getAttribute only works with lower-case values in XML #651
Comments
Since we're using window.DOMParser to parse XML on the client side and that supports parsing XML perhaps we've fallen into a bit of a grey area. Swapping in jsdom makes it really easy to convert the lib to node—there's basically little to no code in our core that needs to change—but perhaps there's an alternative you would recommend? Here's the project fork I'm working on if you're curious. In #393 @tmpvar mentions that perhaps jsdom might consider supporting xml in the future. Is there anything I can do to help that along? |
I mean, we're committed to supporting XML to exactly the extent that browsers do. It looks like
That's our goal :). So yeah, this definitely is in jsdom's purview.
For sure! To fix this particular issue, I'd suggest an approach where you figure out where we normalize the tag names, and if it's too early, make us stop doing that. We need to normalize them in HTML node serialization and when doing matches on HTML DOMs, but presumably there are places where we normalize them that we shouldn't be. I can't imagine it'll be that straightforward, but we do have pretty nice test coverage, so you're unlikely to break anything. |
ok cool. If I have some cycles this weekend I'll dive in. thanks! |
I think I've tracked it down to the line 60
changing line 165
So changing |
@domenic I could put together a PR but I'm still not sure what's the best way to tell |
@robdodson I don't think there should be a need for a flag. You should just be able match what browsers do, so when you use a |
OK, I created the above PR to try to resolve this. |
Your test fails when there's an XML document passed as raw string and it contains a byte order mark., as it expects the <?xml to be at the beginning of the string. |
Oh hm.. Can you show me an example of what that looks like? |
sample first line of a utf-8 encoded xml file with byte order mark:
I'm using this from nodejs, reading the data with readFileSync (with encoding given), i.e.,:
without the substring call, the match wouldn't work as it's anchored to the beginning of the string, and there the byte order mark is sitting. potentially PEBKAC... |
Consider that the XML spec appendix F Autodetection of character encodings notes that this byte sequence (octal 0357 0273 0277 or hexadecimal 0xEF 0xBB 0xBF) denotes a UTF-8 encoding. It also denotes the valid byte order marks that may live at the beginning of the input before the xml declaration. IMHO you should consider supporting these byte order marks. |
Would you be interested in putting together a PR which adds that support? I'm no longer working with jsdom so not sure when I would have to the time to do it. |
When I use jsdom to parse an xml document it lower-cases all attributes. I believe this is the correct behavior for HTML according to MDN, but it is not the correct behavior for XML. In XML attributes are case sensitive so
codeSystem
andcodesystem
should be treated as two separate items.Because
DOMParser
treatscodeSystem
andcodesystem
as separate entities my library works in the browser but not in node with jsdom. I can add a flag so it lower cases its arguments when it's using jsdom but that makes me a little iffy because it's not the expected behavior for XML.The text was updated successfully, but these errors were encountered: