For the attached eisenachonline-de.html.gz, jsoup 1.15.3 returns a tree different from the one shown by Firefox and Chrome in their dev tools. The document contains nested body and, in particular, nested html tags, which seem to trip up the parser into closing the stack prematurely.
For the relevant HTML section, the reproducing code should print
<div class="entry-date"><time datetime="2022-10-20 14:24">20. Oktober 2022</time>
</div>
but prints null instead. The entry-date div is present in the parsed document, but it is a child of the body element.
There seems to be a deviation from the HTML spec in the jsoup implementation. In the AfterBody state, when processing an html end tag, jsoup pops the stack to close. In the "after body" insertion mode, the HTML spec only says to switch to the "after after body" mode, but it doesn't ask to close the stack. So it appears to me the stack is closed too early.
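To make the suspected difference concrete, here is a minimal toy model in plain Java. It is not jsoup's code and not a full implementation of the spec: it tracks only the stack of open elements and three insertion modes. The class `AfterBodySketch`, the placeholder tag `target` (standing in for the entry-date div), the wrapper `article`, and the choice to keep html and body on the stack in the popping variant are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Toy tree builder: only the open-element stack and three insertion modes. */
final class AfterBodySketch {
    enum Mode { IN_BODY, AFTER_BODY, AFTER_AFTER_BODY }

    /** Hypothetical input: a nested html/body pair inside a still-open wrapper. */
    static final List<String> TOKENS = List.of(
            "<html>", "<body>", "<article>",
            "<html>", "<body>", "</body>", "</html>",   // nested document, closed early
            "<target>", "</article>", "</body>", "</html>");

    /**
     * Feeds tag tokens ("<x>" or "</x>") through the toy builder and returns the
     * name of the element that would become the parent of the first "target".
     */
    static String parentOfTarget(List<String> tokens, boolean popOnHtmlEnd) {
        Deque<String> open = new ArrayDeque<>();   // head = current node
        Mode mode = Mode.IN_BODY;
        for (String token : tokens) {
            boolean end = token.startsWith("</");
            String name = token.substring(end ? 2 : 1, token.length() - 1);
            boolean done = false;
            while (!done) {                        // loop allows "reprocess the token"
                done = true;
                if (mode == Mode.IN_BODY) {
                    if (!end) {
                        if (name.equals("target")) return open.peek();
                        boolean duplicateRoot = (name.equals("html") || name.equals("body"))
                                && open.contains(name);
                        if (!duplicateRoot) open.push(name);   // nested html/body open nothing
                    } else if (name.equals("body")) {
                        mode = Mode.AFTER_BODY;                // stack is left untouched
                    } else if (name.equals("html")) {
                        mode = Mode.AFTER_BODY;                // act like </body>, then reprocess
                        done = false;
                    } else if (open.contains(name)) {
                        while (!open.pop().equals(name)) { /* close down to the match */ }
                    }
                } else if (mode == Mode.AFTER_BODY && end && name.equals("html")) {
                    if (popOnHtmlEnd) {
                        // Variant described in the report: close what is still open.
                        // Toy assumption: html and body stay as insertion points.
                        while (open.size() > 2) open.pop();
                    }
                    mode = Mode.AFTER_AFTER_BODY;              // all the spec asks for here
                } else {
                    mode = Mode.IN_BODY;                       // anything else: back to "in body"
                    done = false;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parentOfTarget(TOKENS, false));
        System.out.println(parentOfTarget(TOKENS, true));
    }
}
```

In this sketch, switching modes without touching the stack leaves `article` open, so `target` lands inside it. Popping on the html end tag closes `article` first, and `target` ends up directly under `body`, which resembles the symptom described above.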