Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IsBlankString matcher is inconsistent with String.isBlank() #325

Open
Thorn1089 opened this issue Jan 27, 2021 · 5 comments · May be fixed by #326
Open

IsBlankString matcher is inconsistent with String.isBlank() #325

Thorn1089 opened this issue Jan 27, 2021 · 5 comments · May be fixed by #326

Comments

@Thorn1089
Copy link

The naming of the matcher implies it would give consistent results with String.isBlank(). However, it does not. The matcher uses a regex with pattern \\s* which will only match ASCII whitespace characters. String.isBlank() eventually delegates to Character.isWhitespace() which checks for Unicode whitespace as well. See https://stackoverflow.com/a/57998178 for a detailed breakdown, summarized below:

But further, there are different definitions of white-space

Character.isWhitespace:

Determines if the specified character (Unicode code point) is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:

It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
It is '\t', U+0009 HORIZONTAL TABULATION.
It is '\n', U+000A LINE FEED.
It is '\u000B', U+000B VERTICAL TABULATION.
It is '\f', U+000C FORM FEED.
It is '\r', U+000D CARRIAGE RETURN.
It is '\u001C', U+001C FILE SEPARATOR.
It is '\u001D', U+001D GROUP SEPARATOR.
It is '\u001E', U+001E RECORD SEPARATOR.
It is '\u001F', U+001F UNIT SEPARATOR.
the \s pattern (by default):

\s A whitespace character: [ \t\n\x0B\f\r]

I discovered this when using JUnit Quickcheck with an assumption to prevent handling whitespace strings of the form:
Assume.assumeThat(string, not(blankString()));
The assumption passed, and my code would check the string via String.isBlank() and fail the test.

@Thorn1089 Thorn1089 linked a pull request Feb 5, 2021 that will close this issue
@hakanai
Copy link

hakanai commented Feb 18, 2021

For those of us with more experience it doesn't imply anything at all, because "blank" is such an ambiguous concept that any method in a library claiming to detect "blankness" is immediately banned for use by anyone who cares about what they're letting into their project.

My own Hamcrest cheat sheet marks them with skull marks for this reason:

@Thorn1089
Copy link
Author

In fact, "blank" does have a precise meaning in the Java ecosystem as I pointed out in my original post -- it should be consistent with the isBlank and isWhitespace methods of String and Character respectively.

I did learn from the associated PR that Hamcrest still supports older versions of Java that predate the String#isBlank convenience method; however, Character#isWhitespace has been present since 1.5 for Unicode code points and 1.1 for char values. I think it entirely appropriate that the IsBlank matcher be consistent with those results.

If you don't want to use the IsBlank matcher in your code, that's your decision, but it is a non-deprecated part of the Hamcrest library and as such I'd consider this a deficiency that can be fixed.

@hakanai
Copy link

hakanai commented Feb 19, 2021

No, in the "Java ecosystem" you speak of, there are multiple definitions of whitespace. Just because you only know of one of them does not mean that the others do not exist.

Kevin Bourrillion compiled this spreadsheet:

https://docs.google.com/spreadsheets/d/1kq4ECwPjHX9B8QUCTPclgsDCXYaj7T-FlT4tB5q3ahk/edit#gid=0

You're right that it's a deficiency that can be fixed, but perhaps it should be fixed by removing the method.

@peterdemaeyer
Copy link
Contributor

Note that String.isBlank() was only introduced relatively recently in Java 11.
The Hamcrest sources predate that, and still build with Java 1.7 at the time of writing.

@Thorn1089
Copy link
Author

@peterdemaeyer understood, and I have updated the linked PR to remain compatible with that JDK.

I do think that since we're currently in rampdown for JDK16, that it's reasonable that devs will start using String#isBlank (I could have sworn this made it in in JDK8 for some reason -- guess JDK11 has been out even longer than I recalled at first) and make the same mistake.

As mentioned previously, Character#isWhitespace (which is what the convenience method ultimately boils down to using) dates back to 1.5, so I think it's reasonable that the implementation use that rather than the regex pattern \s (which I believe is ASCII-only to match PCRE and other older regex engines).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants