Element.hasClass ignores html strict mode #1446
Comments
Excuse me. I have reproduced the behaviour by:
Could you tell me how to enable HTML strict mode, so that I can test it and add some features to Jsoup.select()?
Hi,
Hi,
As a simple workaround, you could do a text replacement before calling select, rewriting the uppercase or lowercase names to distinct values so they no longer collide, or you could apply a second, case-sensitive check to the results after selection.
Your findings are correct. In the source code of jsoup 1.13.1 (the latest version so far), if we change line 1374 of Element.java from "return className.equalsIgnoreCase(classAttr);" to "return className.equals(classAttr);", the problem in the example you gave is solved: class "c1", with a lowercase c, is no longer selected by document.select(".C1");.
To solve the problem completely, we would need to add a case-sensitivity option to the selectors in jsoup. Because Java does not support default parameters, and to avoid disturbing existing functions, overloading the hasxxx methods seems like a good solution. For example:
But then a whole chain of methods would need to be modified in the same way, since the methods are nested and the boolean value has to be passed from the outermost call to the innermost one. This would make jsoup more complex, and I'm not sure whether it would have bad side effects. Also, as far as I know, HTML class names are case-sensitive, and browsers match class selectors case-insensitively only in quirks mode. My suggestion is that we should always write our code as if matching were case-sensitive.
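The snippet that accompanied the overload suggestion above is not preserved in this thread. As a rough illustration only, the idea might look like the following standalone sketch. The class `ClassMatchSketch` and the two-argument `hasClass(String, boolean)` are hypothetical names; this is not jsoup's actual `Element` code or the change made in the pull request.

```java
/** Standalone sketch; NOT jsoup's Element. All names here are hypothetical. */
class ClassMatchSketch {
    // Raw value of the class attribute, e.g. "c1 big".
    private final String classAttr;

    ClassMatchSketch(String classAttr) {
        this.classAttr = classAttr == null ? "" : classAttr.trim();
    }

    /** Existing signature: stays case-insensitive so current callers are unaffected. */
    boolean hasClass(String className) {
        return hasClass(className, false);
    }

    /** Hypothetical overload: the caller decides whether case matters. */
    boolean hasClass(String className, boolean caseSensitive) {
        if (classAttr.isEmpty() || className == null || className.isEmpty()) {
            return false;
        }
        // The class attribute is a whitespace-separated list of names.
        for (String token : classAttr.split("\\s+")) {
            boolean match = caseSensitive
                    ? token.equals(className)
                    : token.equalsIgnoreCase(className);
            if (match) {
                return true;
            }
        }
        return false;
    }
}
```

With the attribute value "c1", `hasClass("C1")` still matches, while `hasClass("C1", true)` does not. The cost mentioned in the comment remains: every method between the selector evaluator and this check would need a matching overload to carry the flag through.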
I have tried to fix this issue; here is my pull request: #1527
Here's the code. Hope this helps you.
Great.
You're welcome! Just a reminder: you can also write it like this to determine automatically whether to use strict mode.
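The code attached to this comment is not preserved either. One way such an automatic decision could be made is sketched below, assuming the document's parsing mode is available to the caller (jsoup reports it through `Document.quirksMode()`). The class `StrictModeSketch`, its methods, and the local enum, which only mirrors the constant names of jsoup's `Document.QuirksMode`, are hypothetical and are not the commenter's code.

```java
/** Standalone sketch; NOT the code from the pull request. Names are hypothetical. */
class StrictModeSketch {
    // Local stand-in that mirrors the constant names of jsoup's Document.QuirksMode.
    enum QuirksMode { noQuirks, quirks, limitedQuirks }

    /** Browsers compare class names case-insensitively only in full quirks mode. */
    static boolean isCaseSensitive(QuirksMode mode) {
        return mode != QuirksMode.quirks;
    }

    /** Checks a whitespace-separated class attribute, picking case rules from the mode. */
    static boolean classMatches(String classAttr, String wanted, QuirksMode mode) {
        if (classAttr == null || wanted == null || wanted.isEmpty()) {
            return false;
        }
        boolean caseSensitive = isCaseSensitive(mode);
        for (String token : classAttr.trim().split("\\s+")) {
            boolean match = caseSensitive
                    ? token.equals(wanted)
                    : token.equalsIgnoreCase(wanted);
            if (match) {
                return true;
            }
        }
        return false;
    }
}
```

Under this sketch, a document that starts with <!DOCTYPE html> (no quirks) would not match "c1" against "C1", while a document without a doctype (quirks mode) would.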
Hi,
Jsoup ignores case when matching class selectors.
This happens regardless of whether we use strict HTML mode (<!DOCTYPE html>) or not.
In strict mode, this makes jsoup behave differently from a browser.
For example:
<!DOCTYPE html>
<html><head><style type="text/css">
.c1{ font-size:44px; }
.C1{ color:red; }
</style></head><body>
<div class="c1">
Some text
</div></body></html>
The following will fetch the div, even though the c in the div's class is lowercase:
document.select(".C1");
My findings:
The Class evaluator's matches method calls Element.hasClass.
Element.hasClass checks for a match, ignoring case.