-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extra dom additions 1 #93
Conversation
var_dump($dom->getElementsByTagName('child')[0]->getInScopeNamespaces()); | ||
var_dump($dom->getElementsByTagName('b:sibling')[0]->getInScopeNamespaces()); | ||
var_dump($dom->getElementsByTagName('d:child')[0]->getInScopeNamespaces()); | ||
var_dump($dom->documentElement->getInScopeNamespaces()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think technically this is correct since the element only knows about xmlns="urn:a", but what I am looking for in
optimize_namespaces` is a way to list all namespaces declared inside the element and its children. (especially the xmlns attribute value (URI) since the prefix will be optimized out)
This probably comes with a complexity that a URI can have multiple localNames in different child elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right that makes sense, you also want the namespaces that were shadowed of course.
Then we'd need to return an array where each prefix corresponds to an array of possible uris.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it could make sense to have 2 functions or a flag to control the behaviour:
- List all namespaces from the upper-elements + this element
- List all namespaces declared in this element + lower elements
This could then be used to mimic both the behaviour of : ./namespace::*
and .//namespace::*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. So a getInScopeNamespaces and getDescendantNamespaces method. That should work.
If getDescendantNamespaces returns an array like getInScopeNamespaces does, then for an xml like this:
<parent xmlns:x="urn:foo">
<child xmlns:x="urn:bar"/>
</parent>
It will return an array with key x, with value ["urn:foo", "urn:bar"]
.
In the array you just get the namespace uris, that's it. You don't get on which element the prefix+uri combination is in scope, which is a downgrade from old DOM.
It makes me think whether it makes sense to somehow reintroduce the NamespaceNode class anyway, just for xpath. The problem with this is that it's not really a node, old DOM just misused it to be like a node (instead of a Attr instance). Although I could call it NamespaceInfo or something like that. One of the consequences is that the return types of XPath::query and XPath::evaluate have to include NamespaceInfo, as it doesn't inherit from Node. This is a problem for static analysis tools like psalm, who refused to update the return type for those methods in old DOM because it would cause many warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I see it : If we have the methods you introduce above, we don't need to run namespace::*
queries on xpath anymore. This would result in xpath only returning Node
s and not some custom object. I'm not sure, but how do other languages cope with this namespace
-axis?
Introducing a NamespaceInfo
value object as return value might be better than the multi-dimensional array. The functions would just return list<NamespaceInfo>
and the end-user can decide wether to group based on namespaceURI or local-name.
So my current feeling is to introduce the object, but not make it part of xpath.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be a bit off, see:
- https://github.com/veewee/xml/blob/5144ee213f47976063efce545ff7ae663acccd7f/tests/Xml/Dom/Locator/Xmlns/LinkedNamespacesTest.php
- https://github.com/veewee/xml/blob/5144ee213f47976063efce545ff7ae663acccd7f/tests/Xml/Dom/Locator/Xmlns/RecursiveLinkedNamespacesTest.php
The linked one (using getInScopeNamespaces
) returns the same result as getDescendantNamespaces
for the item
here - which is not in line with ./namespace::*
.
<root xmlns="http://hello.com" xmlns:world="http://world.com" >
<item>1</item>
</root>
Not sure if it's not correct or that my understanding of it is wrong through :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, let's first check the LinkedNamespacesTest
.
./namespace::*
and .//namespace::*
return the same thing for item
: https://3v4l.org/FcavJ
getDescendantNamespaces
should correspond to .//namespace::*
which means "inclusive descendant namespace axis".
In the output we see the same result for both XPath queries.
Comparing with the new methods, I get this output from the script below:
prefix: null
uri: http://hello.com
prefix: "world"
uri: http://world.com
---
prefix: null
uri: http://hello.com
prefix: "world"
uri: http://world.com
script:
<?php
$xml = <<<XML
<root xmlns="http://hello.com" xmlns:world="http://world.com" >
<item>1</item>
</root>
XML;
$dom = DOM\XMLDocument::createFromString($xml);
foreach ($dom->documentElement->firstElementChild->getInScopeNamespaces() as $x) {
echo "prefix: ", json_encode($x->prefix), "\n";
echo "uri: ", $x->namespaceURI, "\n";
}
echo "---\n";
foreach ($dom->documentElement->firstElementChild->getDescendantNamespaces() as $x) {
echo "prefix: ", json_encode($x->prefix), "\n";
echo "uri: ", $x->namespaceURI, "\n";
}
So this seems to be right (except the xml namespace that XPath always adds).
I think what's going on is that your existing test on branch v3.1.0 is wrong: https://github.com/veewee/xml/blob/47504aec3a1a8146ff7e870acce07b561e3c2947/tests/Xml/Dom/Locator/Xmlns/LinkedNamespacesTest.php#L41
In this line, you think you get <item>
by doing $element->childNodes->item(0)
but that's not true. The first child node is the whitespace DOMText in front of <item>
. If you replace that with firstElementChild
you get the same behaviour as with the test on the spec-compliance branch. Since a DOMText has no namespaces, it returns the empty array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RecursiveLinkedNamespacesTest
fails because of inconsistent namespace ordering. This happens because my code traverses the tree backwards: from bottom to top, but for attributes left to right.
I think the more logical approach is if the namespaces in the array appear in the ordering they appear as attributes in the document. I pushed a commit that fixes this and that fixes your test as well.
It's worth noting that libxml does list the namespaces from top to bottom but from right to left... I think the behaviour that I changed to is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allright cool! Thanks for pointing out the error in my previous test-case.
I got everything working in my lib now. Except there is one more issue left regarding namespace optimization. I'll try to figure it out later, it's most likely on my end.
Keep you posted!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks for the effort!
5101836
to
615c52f
Compare
Rebased on current master; and cleaned up using the new macros on master. |
@@ -0,0 +1,98 @@ | |||
--TEST-- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm bumping against this case:
$doc = \Dom\XMLDocument::createFromString(
'<hello><a:item a:foo="bar" xmlns:a="http://ok" xmlns:whatever="http://whatever"/></hello>'
);
$item = $doc->documentElement->firstElementChild;
$item->rename($item->namespaceURI, 'b:item');
echo $doc->saveXML();
// ACTUAL : <hello><a:item xmlns:a="http://ok" xmlns:whatever="http://whatever" a:foo="bar"/></hello>
// EXPECTED : <hello><b:thing xmlns:b="http://ok" xmlns:whatever="http://whatever" b:foo="bar" /></hello>
The outcome here currently seems unexpected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit I was confused too at first sight.
When dumping out the prefix and namespaceURI (var_dump($item->prefix, $item->namespaceURI);
) I do see that the prefix did change from a
to b
. So that means that the serialization is at fault.
First, keep in mind that if you rename a node, you're only renaming that particular node. So renaming the element will not rename the attributes nor its children.
Why does it still use the a
prefix? This is due to a particular rule in the XML serialization spec, that enforces that a namespaceURI on an element is only associated with exactly one prefix. See the note of bullet point 2 of https://www.w3.org/TR/DOM-Parsing/#dfn-concept-serialize-xml. That means that even if you add an xmlns:b="http://ok"
attribute manually at the end, you still can't mix prefix a
and b
for attributes.
A similar situation arises without the use of renaming.
For example in this code (this is JS so you can verify it in your browser):
xml='<hello><a:item xmlns:a="http://ok"/></hello>';
dom=(new DOMParser).parseFromString(xml, "text/xml");
dom.documentElement.firstElementChild.setAttributeNS("http://ok", "b:test", "value");
(new XMLSerializer).serializeToString(dom)
// Output: <hello><a:item xmlns:a="http://ok" a:test="value"/></hello>
Note how in this output the b:test
attribute got the a
prefix.
If we check the prefix on the attribute though we can see that it does have the b
prefix, it's just the serialization that has a
.
dom.documentElement.firstElementChild.getAttributeNodeNS("http://ok", "test").prefix
// Output: 'b'
Back to your code now. If we change the code such that the xmlns attribute gets removed:
$doc = \Dom\XMLDocument::createFromString(
'<hello><a:item a:foo="bar" xmlns:a="http://ok" xmlns:whatever="http://whatever"/></hello>'
);
$item = $doc->documentElement->firstElementChild;
$item->removeAttribute('xmlns:a');
$item->rename($item->namespaceURI, 'b:item');
echo $doc->saveXML(), "\n";
Then we get: <hello><b:item xmlns:b="http://ok" xmlns:whatever="http://whatever" b:foo="bar"/></hello>
.
This is because now there is no explicit xmlns attribute for the http://ok
namespace, so it automatically associates the prefix from the b:foo
attribute from the http://ok
namespace.
Looking at your code, you seem to want to rename all occurrences of a
with the http://ok
namespace.
We can do that: if we rename all attributes and elements such that there is no occurrence of a
anymore, the serialized prefix will change to b
(slightly undertested PoC):
$doc = \Dom\XMLDocument::createFromString(
'<hello><a:item a:foo="bar" xmlns:a="http://ok" xmlns:whatever="http://whatever"/></hello>'
);
$item = $doc->documentElement->firstElementChild;
function recursiveRename($root) {
foreach ($root->childNodes as $node) {
if ($node instanceof DOM\Element)
recursiveRename($node);
}
foreach ($root->attributes as $attr) {
if ($attr->namespaceURI === 'http://ok')
$attr->rename($attr->namespaceURI, 'b:' . $attr->localName);
else if ($attr->namespaceURI === 'http://www.w3.org/2000/xmlns/' && $attr->localName == 'a')
$attr->rename($attr->namespaceURI, 'xmlns:b');
}
if ($attr->namespaceURI === 'http://ok')
$root->rename($root->namespaceURI, 'b:' . $root->localName);
}
recursiveRename($item);
echo $doc->saveXML(), "\n";
This finally gives: <hello><b:item xmlns:b="http://ok" xmlns:whatever="http://whatever" b:foo="bar"/></hello>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Implemented both suggestions:
- The
Element\rename
function contains logic for removing xlmns declerations in this specific case - The
Xmlns\rename_element_namespace
is now able to recursively rename xmlns-es on elements. (had to tweak the code a bit, but the tests are passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Glad it worked :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have this one situation that is a bit annoying and I can't seem to easily fix.
it's based on your code above:
doc = \Dom\XMLDocument::createFromString(
<<<EOXML
<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://soapinterop.org/store1">
<xsd:include schemaLocation="./store1.xsd" />
</schema>
EOXML
);
$root = $doc->documentElement;
$namespaceURI = 'http://www.w3.org/2001/XMLSchema';
$newPrefix = 'ns1';
$recurse = function (\DOM\Element $element) use (&$recurse, $namespaceURI, $newPrefix): void {
foreach ($element->childNodes as $child) {
if (is_element($child)) {
$recurse($child);
}
}
foreach ($element->attributes as $attr) {
if ($attr->namespaceURI === $namespaceURI) {
$attr->rename($namespaceURI, $newPrefix . ':' . $attr->localName);
}
if ($attr->namespaceURI === 'http://www.w3.org/2000/xmlns/' && $attr->value === $namespaceURI) {
$attr->rename($attr->namespaceURI, 'xmlns:' . $newPrefix);
}
}
if ($element->namespaceURI === $namespaceURI) {
$element->rename($namespaceURI, $newPrefix . ':' . $element->localName);
}
};
$recurse($root);
$res = $doc->saveXML();
echo $res."\n";
$reload = \Dom\XMLDocument::createFromString($res);
expected
<ns1:schema xmlns:ns1="http://www.w3.org/2001/XMLSchema" targetNamespace="http://soapinterop.org/store1">
<ns1:include schemaLocation="./store1.xsd"/>
</ns1:schema>
actual
The actual output redeclares ns1 a second time:
<ns1:schema xmlns:ns1="http://www.w3.org/2001/XMLSchema" xmlns:ns1="http://www.w3.org/2001/XMLSchema" targetNamespace="http://soapinterop.org/store1">
<ns1:include schemaLocation="./store1.xsd"/>
</ns1:schema>
This gives an error whilst reloading the XML:
Warning: DOM\XMLDocument::createFromString(): Attribute xmlns:ns1 redefined in Entity, line: 2
Fatal error: Uncaught Exception: XML document is malformed in
The problem is related to this line:
if ($attr->namespaceURI === 'http://www.w3.org/2000/xmlns/' && $attr->value === $namespaceURI) {
$attr->rename($attr->namespaceURI, 'xmlns:' . $newPrefix);
}
I tried (conditionally removing duplicate xmlns: attrs, but this result in the XML being prefixed with 'xsd:' instead of 'ns1:'. Maybe you can think of a better solution?
There is one issue that is more or less related:
Given:
<foo>
<bar xmlns:whatever="http://whatever">
<whatever:baz/>
</bar>
</foo>
When I rename the namespace here, it will result in:
<foo>
<bar xmlns:ns1="http://whatever">
<ns1:baz/>
</bar>
</foo>
Which is what one would expect. But, I am trying to optimize the namespaces, meaning they should be made available on the root element. I've tried changing it by explicitely setting the xmlns attribute on the document-element before starting the recursive rename:
$root->setAttributeNS('http://www.w3.org/2000/xmlns/', $newPrefix, $namespaceURI);
This results in:
<foo xmlns:ns1="http://whatever" xmlns:ns1="http://whatever">
<bar>
<ns1:baz/>
</bar>
</foo>
Which brings me back to the first issue of having the duplicate namespace declaration I've posted in this comment.
So at this point, I'm not really sure if it is something I should fix from my side, or could be improved at XML node or serialization level.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know it this would work: if I remove the second one, the new prefix gets removed as well.
Not entirely sure I understand, but then again it's late on the evening... :P
If the rename is smart enough to neutralize duplicates, that could be fine as well. But it depends on how it would be able to achieve the end goal (as-in the examples above)
Right, but if the values of attribute with the same name differs, we can't neutralize and have to do something, e.g. throw. I'd rather not add special cases because they are often difficult for people to work with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely sure I understand, but then again it's late on the evening... :P
Now that I reread, I understand the confusion :) So let me try to reprhase:
I think it's best to throw a DOMException in that case, and not perform the rename. Does that work for you too? You'd have to remove the one you tried to rename manually then in the catch clause.
That's what I tried before and mentioned in the examples:
I tried (conditionally removing duplicate xmlns: attrs, but this result in the XML being prefixed with 'xsd:' instead of 'ns1:'. Maybe you can think of a better solution?
<ns1:schema xmlns:ns1="http://www.w3.org/2001/XMLSchema" xmlns:ns1="http://www.w3.org/2001/XMLSchema" targetNamespace="http://soapinterop.org/store1">
<ns1:include schemaLocation="./store1.xsd"/>
</ns1:schema>
If I remove the seconds xmlns, I get:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://soapinterop.org/store1">
<xsd:include schemaLocation="./store1.xsd"/>
</xsd:schema>
Which is not the expected ns1
.
So throwing an exception on renaming is fine - as long as the removal of the second one results in ns1
instead of xsd
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a change to throw when renaming would result in a duplicate attribute.
Here's your adapted code which does seem to work: https://gist.github.com/nielsdos/b86fff8f792e718cb1e7953c22c383f4
basically I added an exception handler that catches the new exception, and then removes the attribute instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allright, that seems to work!
I just had to add foreach ([...$element->attributes] as $attr) {
, otherwise the removal of the node stops iterating.
Is that intended or not (because I had similar issues in the old DOM extension)?
More info, see veewee/xml#50
<?php
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadXML(
<<<EOXML
<x>
<item1 />
<item2 />
<item3 />
<item4 />
</x>
EOXML
);
foreach ($doc->documentElement->childNodes as $i => $node) {
echo $node->localName . PHP_EOL;
if ($i === 2) {
$node->remove();
}
}
item1
item2
item3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad you got it to work!
As for the removal issue, that's a bug in PHP, for both attributes
and childNodes
.
The reason it stops is because it fetches the next item of the list after removal, but once an item is removed from the list, the next field becomes NULL, so the iteration stops.
These lists are live, so that means that changes must be reflected immediately.
So all nodes should've been removed actually.
I'll open an issue at php-src, I don't know yet if we should fix it in "old DOM" too (BC reasons).
615c52f
to
fade6c4
Compare
ad5e5f9
to
70f5af1
Compare
1cbaeeb
to
ed2b742
Compare
ed2b742
to
ec5cc51
Compare
ec5cc51
to
3da8a49
Compare
@nielsdos Just wanted to follow up on this PR : How did you see the timeline for getting this into PHP 8.4? |
@veewee I didn't have the energy to finish up the RFC text because it's such a tedious process. I'll try to finish it this weekend so it can enter discussion on Monday or so. |
f3f6015
to
edb64cf
Compare
@veewee I finished writing the draft of the RFC: https://wiki.php.net/rfc/dom_additions_84 I'm sharing this with some people internally too for early feedback before posting on the list. |
Thanks for all the work you did here! |
34695b0
to
c7b4b64
Compare
Moved to php#14754 |
Based on spec compliance PR