Negative range positions in malformed HTML fragment #2175

Closed
KennyWongPFPT opened this issue Jul 22, 2024 · 1 comment
@KennyWongPFPT

Hello, please see below a test program that extracts the source range positions of the text nodes in the malformed fragment foo<p/>bar. Notice the malformed tag <p/>.

import org.jsoup.nodes.*;
import org.jsoup.parser.*;
import org.jsoup.select.*;

public class Test {
    public static void main(String[] args) {
        // Build an HTML parser with source position tracking enabled.
        HtmlTreeBuilder treeBuilder = new HtmlTreeBuilder();
        Parser parser = new Parser(treeBuilder);
        parser.setTrackPosition(true);
        Document document = parser.parseInput("foo<p/>bar", "");
        // Visit every node and print the source range of each text node.
        NodeTraversor.traverse((Node node, int depth) -> {
            if (node instanceof TextNode textNode) {
                Range sourceRange = textNode.sourceRange();
                System.out.printf("text=%s start=%d end=%d%n",
                    textNode.text(),
                    sourceRange.start().pos(),
                    sourceRange.end().pos());
            }
        }, document);
    }
}

With release 1.16.1, all positions are negative:

% java -cp ~/.m2/repository/org/jsoup/jsoup/1.16.1/jsoup-1.16.1.jar Test.java
text=foo start=-1 end=-1
text=bar start=-1 end=-1

With release 1.18.1, the output is better, but the bar text node immediately following the malformed tag still reports a start position of -1:

% java -cp ~/.m2/repository/org/jsoup/jsoup/1.18.1/jsoup-1.18.1.jar Test.java
text=foo start=0 end=3
text=bar start=-1 end=10
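
For comparison, the offsets one would expect can be derived from the input string itself. The following is a minimal sketch, assuming positions are zero-based character offsets with an exclusive end (consistent with the foo start=0 end=3 output above). The names ExpectedOffsets, start, and end are hypothetical helpers for illustration and are not part of jsoup.

```java
public class ExpectedOffsets {
    // Zero-based offset of the first occurrence of text in input (inclusive start),
    // or -1 if text does not occur. Hypothetical helper, not a jsoup API.
    static int start(String input, String text) {
        return input.indexOf(text);
    }

    // Exclusive end offset of the first occurrence of text in input,
    // or -1 if text does not occur. Hypothetical helper, not a jsoup API.
    static int end(String input, String text) {
        int start = input.indexOf(text);
        return start < 0 ? -1 : start + text.length();
    }

    public static void main(String[] args) {
        String input = "foo<p/>bar";
        // Print in the same format as the test program above.
        for (String text : new String[] {"foo", "bar"}) {
            System.out.printf("text=%s start=%d end=%d%n",
                text, start(input, text), end(input, text));
        }
    }
}
```

Under that assumption, the bar text node would be expected to start at offset 7, right after the four characters of <p/>, and to end at offset 10.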
jhy self-assigned this Jul 29, 2024
jhy added the bug and fixed labels Jul 29, 2024
jhy added this to the 1.18.2 milestone Jul 29, 2024
jhy closed this as completed in dc3b6c5 Jul 29, 2024
@jhy
Owner

jhy commented Jul 29, 2024

Thanks for the clear report, fixed!
