Do we change the DOMDocument instance that gets passed in, and is this an issue? #174
Comments
Confirmed in php-mf2: if you pass in a DOMDocument, it's modified during parsing.

Input HTML:

<div class="hentry">
<div class="entry-content">
<p class="entry-summary">This is a summary</p>
<p>This is <a href="/tags/mytag" rel="tag">mytag</a> inside content. </p>
</div>
</div>

Code:

$doc = new DOMDocument();
$doc->loadHTML($html);
echo $doc->saveHTML();
$parse = Mf2\parse($doc);
echo $doc->saveHTML();

Output (trimmed doctype and …), first echo, before parsing:

<div class="hentry">
<div class="entry-content">
<p class="entry-summary">This is a summary</p>
<p>This is <a href="/tags/mytag" rel="tag">mytag</a> inside content. </p>
</div>
</div>

Second echo, after parsing:

<div class="hentry h-entry">
<div class="entry-content e-content">
<p class="entry-summary p-summary">This is a summary</p>
<p>This is <a href="/tags/mytag" rel="tag">mytag</a> inside content. </p>
</div>
<data class="category p-category" value="mytag"></data></div>
Appears to be a simple fix:
Zegnat added a commit to Zegnat/php-mf2 that referenced this issue, May 27, 2018
A DOMDocument instance passed to the parser should be unchanged after parsing. Modifying it could trip up further use of the same DOMDocument instance. See microformats#174.
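One way to meet this requirement is to do the backcompat rewriting on a copy of the document rather than on the caller's instance. The following is only a minimal sketch and is not taken from the referenced commit: `addBackcompatClass()` is a hypothetical stand-in for the parser's class rewriting, and `parseWithoutSideEffects()` is an illustrative name. It assumes that PHP's `clone` on a `DOMDocument` copies the whole tree.

```php
<?php
// Hypothetical stand-in for the parser's backcompat step: it mutates the
// document by appending "h-entry" to every element carrying the "hentry" class.
function addBackcompatClass(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " hentry ")]';
    foreach ($xpath->query($query) as $el) {
        $el->setAttribute('class', $el->getAttribute('class') . ' h-entry');
    }
}

// Illustrative wrapper: work on a copy so the caller's instance is untouched.
function parseWithoutSideEffects(DOMDocument $input): DOMDocument
{
    $doc = clone $input; // assumption: clone copies the full document tree
    addBackcompatClass($doc);
    return $doc; // a real parser would return parsed data instead
}

$original = new DOMDocument();
$original->loadHTML('<div class="hentry"><p class="entry-summary">This is a summary</p></div>');
$before = $original->saveHTML();

$working = parseWithoutSideEffects($original);

// Is the caller's document byte-for-byte the same as before?
var_dump($original->saveHTML() === $before);
// Did the rewriting land on the working copy?
var_dump(strpos($working->saveHTML(), 'hentry h-entry') !== false);
```

The trade-off is one extra copy of the tree per parse, in exchange for callers being able to reuse their `DOMDocument` afterwards.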
Zegnat added a commit to Zegnat/php-mf2 that referenced this issue, May 27, 2018
See microformats/mf2py#104. For backwards compatibility parsing, the Python parser changes the DOM on the fly. I believe the PHP parser does a similar thing. It turns out that – in the case of the Python parser – the same DOM object can’t be parsed successfully a second time. The microformats in the base document have been “damaged”.
How can we best test if this is the case with our parser too? Maybe also add a test case where we check that a second parse gives the same result?
Needs investigating. Thanks @kartikprabhu for bringing this up!
(This is basically a todo for myself, therefore also assigning myself.)
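The check proposed above, that the input is left intact and that a second parse gives the same result, could be sketched roughly as follows. This is a sketch under stated assumptions, not a test from the php-mf2 suite: `checkRepeatable()` is a hypothetical helper that accepts any callable parser, and `$destructiveParser` is a toy stand-in that damages the document as it reads it, similar to the behaviour described for mf2py. With the real library, the callable would wrap `Mf2\parse`.

```php
<?php
// Hypothetical helper: parse the same DOMDocument twice and report whether
// the document survived unchanged and whether both parses agree.
function checkRepeatable(DOMDocument $doc, callable $parser): array
{
    $before = $doc->saveHTML();
    $first  = $parser($doc);
    $second = $parser($doc);
    return [
        'documentUnchanged' => $doc->saveHTML() === $before,
        'sameResult'        => $first === $second,
    ];
}

// Toy stand-in for a parser with side effects: it reads every
// class="entry-summary" element and rewrites its class in place.
$destructiveParser = function (DOMDocument $doc): array {
    $xpath = new DOMXPath($doc);
    $found = [];
    foreach ($xpath->query('//*[@class="entry-summary"]') as $el) {
        $found[] = $el->textContent;
        $el->setAttribute('class', 'p-summary'); // mutates the caller's markup
    }
    return $found;
};

// Same toy parser, but run against a clone so the input is never touched.
$safeParser = function (DOMDocument $doc) use ($destructiveParser): array {
    return $destructiveParser(clone $doc);
};

$html = '<div class="hentry"><p class="entry-summary">This is a summary</p></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$unsafe = checkRepeatable($doc, $destructiveParser);

$doc2 = new DOMDocument();
$doc2->loadHTML($html);
$safe = checkRepeatable($doc2, $safeParser);

var_dump($unsafe, $safe);
```

Comparing the serialized HTML is a blunt but cheap way to detect mutation. Comparing the two parse results catches the case where the first pass has "damaged" the microformats for the second.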