Replace isnumber(), etc. with a single function #14347
Comments
Do you not like |
A function returning the char category (AFAIK, class isn't a defined Unicode term) is more general. If the symbol comparison is slow, we could use an enum instead. |
How is it necessarily more general?
abstract CharCategory
abstract Alphabetic <: CharCategory
abstract LowerCase <: Alphabetic
abstract UpperCase <: Alphabetic
abstract UpperCategory <: UpperCase
abstract TitleCase <: UpperCase
abstract MathLetter <: Alphabetic
...
const global cat_to_type = [UpperCategory, LowerCase, TitleCase, MathLetter, OtherLetter, ...]
Also, it turns out that using
|
What I meant by "general" is that it allows us to save the result (category), e.g. to print it for debugging purposes, or to list all categories encountered in a string. With
Well, you can check whether the characters are in category Lu or Lm or Lo. We could also make the function even more general, say |
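To make the two uses above concrete (keeping the category around to list what a string contains, and testing membership in Lu, Lm or Lo), here is a minimal sketch. It assumes a `charcategory` helper built on the unexported `Base.Unicode.category_abbrev`, which is an internal function and may change between Julia versions; `categories` and `isinletters` are hypothetical names used only for illustration.

```julia
# Hypothetical helper named after the proposal in this issue; not a Base function.
# Assumption: Base.Unicode.category_abbrev is unexported and may change.
charcategory(c::AbstractChar) = Symbol(Base.Unicode.category_abbrev(c))

# List every distinct category that occurs in a string, in order of first appearance.
categories(s::AbstractString) = unique(charcategory(c) for c in s)

# Test whether a character belongs to one of several categories.
isinletters(c::AbstractChar) = charcategory(c) in (:Lu, :Lm, :Lo)
```

Because the category is returned as a value, it can be printed, collected, or compared against any set of categories without adding one predicate per combination.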
There already is a
I also didn't say that you'd just have So, if I had a set of (I'm not sold on the name for |
Turns out, ischartype(cat::CharCategory, c::Char) = isa(charprop(CharCategory, c), cat)` |
Yes, that was my point. I don't think we need both, especially since you wouldn't write |
OK then, would you support a PR where I did all that I talked about above, less the addition of |
I would support it, but let's ask about the opinion of others... |
A common use case is a question such as "is this character an ASCII letter or digit?", as in |
Doesn't look like this use case would really gain from the present proposal. That said, I guess it would be written inefficiently as something like |
That's fast, but not as fast as a plain comparison on integer values, which is all you need when working with ASCII strings. In the scenario @eschnett showed, computing the Unicode category is a waste (as I said in my previous comment). |
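For reference, the "plain comparison" approach for the ASCII letter-or-digit question might look like the sketch below. `isasciialnum` is a hypothetical name chosen for illustration, not a Base function; the point is only that no Unicode category lookup is needed.

```julia
# Hypothetical helper for illustration; not part of Base.
# Only range comparisons on the character are needed, no category table lookup.
isasciialnum(c::AbstractChar) =
    ('0' <= c <= '9') || ('a' <= c <= 'z') || ('A' <= c <= 'Z')

# Check a whole string, e.g. when validating an identifier-like token.
allasciialnum(s::AbstractString) = all(isasciialnum, s)
```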
Right, that is for the non ASCIIString case |
I'm not sure what problem is being solved here. |
Let's not do this. |
The types version of this doesn't help anything, but doing it with symbols as originally proposed would work |
True, but doesn't really seem worth it. |
How about deprecating some of these functions, as mentioned in the description? Things like |
Agreed, these names are pretty obscure and not that widely used. Better to not be stuck with them for all of 1.x |
Note that we do have Even if you export a function to get category codes, however, many of the See also #5939, and #8233. Note also that, if I recall correctly from the discussion, there are several modern languages that choose to have these functions, e.g. Go. |
Can we move them under a namespace? They're not especially generic things that you would ever call on anything but a Char or AbstractString. Go's spellings of these are also notably more obvious than ours. They're also not all that widely-used, looks like on the order of 5-15 packages depending which one. |
Moving these to a What's even more interesting is that Swift has removed these functions in version 3 in favor of patterns like |
All character class stuff should be moved into a |
That package can also re-export stuff from |
See #25021. |
Functions have been moved to the Unicode stdlib module. Keeping this issue open since it would still make sense to provide an API to get Unicode character properties like general category. |
I think an API to get the category code should be a separate issue. |
We can iterate on the API of the |
Seems this issue's 1.0 items have been completed? |
Yup, kicked to the 1.x milestone for further iteration. |
This was mentioned by @JeffBezanson and @ScottPJones at #14340 (comment). It looks like we could replace some or all of `isalpha`, `isalnum`, `iscntrl`, `isgraph`, `islower`, `isnumber`, `isprint`, `ispunct` and `isupper` with a single function testing for a given Unicode character general category. (`isspace` and `isascii` cross several categories and must thus be kept; `isdigit` is also more restrictive than `isnumber`.)

The simplest API would be something like `charcategory(x::Char) -> Symbol`. This would force writing e.g. `all(c->charcategory(c) == :Lu, s)` to check whether all characters of a string are uppercase, but it would at least have the advantage of clarity (#14156 (comment)), and would be fast as soon as Jeff's work on anonymous functions is merged.

An intermediate solution would be to keep the most commonly used functions like `isnumber`, `islower`, `isprint` and `isupper`, but deprecate `isalpha`, `isalnum`, `isgraph`, `ispunct` and `iscntrl`. Maybe even `isupper` and `islower` could be deprecated: isn't it more common to use `lowercase` or `uppercase` if you care about the result, or to check for a specific character? Actually, even `isnumber` might be deprecated, as it is easily confused with `isdigit`, which I suspect is the most commonly needed test (for parsing and conversion).
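A minimal sketch of what the proposed `charcategory` could look like, for illustration only. It assumes the unexported `Base.Unicode.category_abbrev`, which returns the two-letter general category abbreviation as a string; that function is internal and may change. `alluppercase` is a hypothetical name. Uppercase letters are general category Lu.

```julia
# Hypothetical implementation of the function proposed above; not a Base function.
# Assumption: Base.Unicode.category_abbrev is unexported and may change between versions.
charcategory(c::AbstractChar) = Symbol(Base.Unicode.category_abbrev(c))

# Check that every character of a string is an uppercase letter (category Lu).
alluppercase(s::AbstractString) = all(c -> charcategory(c) === :Lu, s)
```

With this, `alluppercase("ABC")` holds while `alluppercase("AbC")` does not, and the same pattern covers any other category by changing the symbol.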