-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support case insensitive string contains #269
Conversation
057a3ad
to
f73fbc8
Compare
Thanks for the contribution! @isomarcte is this something we'll continue to be able to support in your 2.x rewrite? |
That should be quite doable. |
Great! Let me know if you'd like me to port this anywhere |
I'll ping this ticket when the core changes in 2.x.x land. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I've pushed a unit test where this breaks down. I would expect contains to be consistent with equals. I think we could make my tests pass with a Locale.ROOT
argument to toLowerCase
. (I wish Java would deprecate the versions without a Locale.)
I still fear that's not the entire solution, because equalsIgnoreCase folds in both directions. I just don't have time to publish another failing test. I think we'll need a port of commons-lang3's method to make it bulletproof.
@rossabaker I'm pretty sure this will hold if we define scala> FullCaseFoldedString("i").toString.contains(FullCaseFoldedString("I").toString)
val res3: Boolean = true
scala> TurkicFullCaseFoldedString("i").toString.contains(TurkicFullCaseFoldedString("I").toString)
val res4: Boolean = false |
Oh sorry @armanbilge and @henricook I didn't realize this was a PR, I thought it was an issue only when I first read it. Yeah, as @rossabaker points out, we can't really define this safely on 1.x.x because we don't actually have a full representation of the case folded value. |
I think it can be supported in 1.x. The definition of equality compares corresponding characters. |
Thanks for these thoughtful comments all, i'm processing them. Why's the Turkey Test called the Turkey Test? Edit: Ah! Found it! https://stackoverflow.com/questions/796986/what-is-the-turkey-test |
Do we know why I might prefer commons-lang3's |
7a1c2c3
to
5f2a620
Compare
5f2a620
to
c4a0a8f
Compare
What do we think of c4a0a8f ? Please note: I've force pushed this branch with a rewritten commit history as my |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks correct to me. I think all the tests would pass if the haystackIndex maxes out at 0. I wonder if we could have a property like:
forAll { (prefix: String, s: String, suffix: String) =>
val a = CIString(prefix + s + suffix)
val b = CIString(s.map(_.toUpperCase)) // something that makes mixed case deterministically would be better still
a.contains(b)
}
@rossabaker @henricook I'm good with this for 1.x.x, but note that we aren't doing a Unicode character contains, we are doing a UTF-16 char contains. So things like this won't work, but it should for most Unicode definitions of case folding. scala> CIString("\u0041\u030a")
val res0: org.typelevel.ci.CIString = Å
scala> CIString("\u00c5")
val res1: org.typelevel.ci.CIString = Å
scala> res0.contains(res1)
val res2: Boolean = false Though I suppose that is consistent with the equality behavior for 1.x.x. Just please be aware of this caveat until 2.x.x is out. |
Thanks both, and apologies this fell off my radar. Is this good to go @rossabaker ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, mine too. Yes, looks good to me.
I can convert some of my existing projects to use CIString instead of my home rolled CaseInsensitiveString class if we can support contains.
I don't contribute as often as I'd like so please feel free to comment on anything