Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add isvalid(Type, value) methods, to replace is_valid_* #11241

Merged
merged 1 commit into from
May 22, 2015

Conversation

ScottPJones
Copy link
Contributor

This adds a new generic function, intended to eventually replace the following:
is_valid_ascii
is_valid_utf8
is_valid_utf16
is_valid_utf32
is_valid_char
This was suggested by @quinnj on the thread for #11140

@quinnj
Copy link
Member

quinnj commented May 12, 2015

+1 😃

@ScottPJones
Copy link
Contributor Author

I believe in plagiarizing good ideas wherever I find them! 😀

@tkelman
Copy link
Contributor

tkelman commented May 12, 2015

Very nice. Needs docs and tests, but I approve of the direction.

@shashi
Copy link
Contributor

shashi commented May 12, 2015

Noting that this will need to go into Compat.jl as well.


export isgraphemebreak

# also exported by Base:
export normalize_string, graphemes, is_valid_char, is_assigned_char, charwidth,
export normalize_string, graphemes, is_valid_char, is_assigned_char, charwidth, isvalid,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I like about this commit is that isvalid is something that is more general than just strings. For this reason, does it actually make sense to export it here (from UTF8proc) in addition to from Base? The other things exported on this line are string-specific.

Also, in addition to the interfaces you provide, what about being able to call e.g. isvalid(s::UTF16String) (without explicitly providing the type as an argument). This interface would match some current uses of isvalid in julia (for instance, in base/sparse/cholmod.jl).

@JeffBezanson
Copy link
Member

Worth considering whether this relates to #3631.

+1. Underscores are generally a sign of missing abstraction!

@JeffBezanson
Copy link
Member

Hmm I see isvalid(str,i) is currently used for checking string indexes. That strikes me as an overly generic name for that functionality. Maybe we can combine that with isassigned into an isvalidindex function?

@ScottPJones
Copy link
Contributor Author

Shall I add that as another PR? isvalidindex Do you also want the old functions deprecated?

@StefanKarpinski
Copy link
Member

I think the name isvalidindex is sufficiently ugly to suggest that we can factor this better rather than having it.

@ScottPJones
Copy link
Contributor Author

@StefanKarpinski maybe so, but I agree with @JeffBezanson , that isvalid(str, i) doesn't seem like the right name for what it is doing... maybe hasindex(str, i)?

@StefanKarpinski
Copy link
Member

Slightly awkward fit, but conceptually, this is haskey, isn't it?

@ScottPJones
Copy link
Contributor Author

Yes, haskey actually is more general... I'd vote to deprecate the current isvalid(str, i) in favor of haskey(str, i)

@ScottPJones
Copy link
Contributor Author

@shashi Do you mean I need to modify Compat.jl in v"0.3"? I've never used Compat.jl, I'm only working in 0.4.x.

@ScottPJones
Copy link
Contributor Author

@tkelman I've added all the docs and test cases (much more complete than was there for the is_valid_ functions...). One thing, because I'm still waiting for my other PRs to be merged, in particular #11203, some of the tests are marked "should be false", for cases where the current underlying code is broken...
@garrison I just noticed your comment [the line notes drive me nuts... I don't seem to get notified for them], I can do that (when it isn't ambiguous, i.e. for arguments of type Char, ASCIIString, UTF*String).

@ScottPJones
Copy link
Contributor Author

@garrison I just realized though, in the future, I really want those to simply return true, i.e. if those types only contain valid characters or strings.

@ScottPJones
Copy link
Contributor Author

OK, docs and tests added, isvalid(value) added, the only thing I don't know about is Compat.jl
Anything left to hold this up?

@kmsquire
Copy link
Member

@ScottPJones, see https://github.com/JuliaLang/Compat.jl. Generally, for new features, people are encouraged to add versions of those features to Compat.jl in order to make the feature usable on Julia v0.3.

@ScottPJones
Copy link
Contributor Author

OK, but how would I get changes from that, which would be on a separate fork, as part of this PR? Or should it be a separate PR? Also, I think a few of my other PRs then could use additions to Compat.jl, do I need to make a separate PR for each one (so that the Compat.jl changes can be merged as each PR [hopefully soon!] gets merged)? Thanks!

@mbauman
Copy link
Member

mbauman commented May 13, 2015

The additions to Compat.jl happen separately (as a PR to that repository) and after the Julia changes are merged. That way, you can calculate the exact version number that the feature was added, and then you only enable the compatibility definitions before that version number. You'll end up adding something like this to Compat:

if VERSION < v"0.4.0-dev+XXXX" # calculated by the linked script
     export isvalid
     isvalid() = 
     
end

@ScottPJones
Copy link
Contributor Author

OK, so I really can't make the PRs for Compat.jl changes until somebody has pity on me and starts merging in some of my outstanding PRs... 😢

@mbauman
Copy link
Member

mbauman commented May 13, 2015

I'm just following on the periphery here, but for this PR it'd probably be good to deprecate the methods this replaces. The @deprecate macro in deprecated.jl makes this pretty easy:

@deprecate is_valid_ascii(s::Union(Array{UInt8,1},ByteString)) isvalid(ASCIIString, s)

It also exports the old symbol, so you should also remove it from the list of explicit exports. And then make sure that the old usages are all removed from Base.

@ScottPJones
Copy link
Contributor Author

Do people really want it deprecated right away? I'm happy to do that, or should it be a following PR?

@jakebolewski
Copy link
Member

You should do everything at once. The Compat.jl stuff can come after this is merged.

@ScottPJones
Copy link
Contributor Author

OK, I think everything is all set...

is_valid_utf16(s::UTF16String) = is_valid_utf16(s.data)
isvalid(str::UTF16String) = isvalid(UTF16String, str.data)
is_valid_utf16(str::UTF16String) = isvalid(UTF16String, str.data)
is_valid_utf16(data::AbstractArray{UInt16}) = isvalid(UTF16String, data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecations are done per-method, and the methods are defined by the deprecate macro. That way when it comes time to remove them, we just need to delete the section in deprecated.jl and don't need to do any archaeology. Take a look at what macroexpand(:(Base.@deprecate f(x, y) g(2x, 2y))) does as a simple example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to deprecate these methods, all you need to do is delete the =, put a @deprecate in front of it, and move it to deprecated.jl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so all trace of the old method goes away (except in deprecated.jl), right?
I'd already added them to deprecated.jl, just didn't know I could get rid of the definitions in utf16.jl...
Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, exactly.

(We really need to have more folks who are qualified to review changes to the unicode/strings codebase. I'm not that person, but I can occasionally make quick glances through change sets like this.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... if they weren't my own changes, I think I'd be a very good person for reviewing Unicode/strings changes 😀 (I spent 20 years being the gatekeeper to the core code at ISC, so I know whats involved!)

@ScottPJones
Copy link
Contributor Author

All set, documented and tested... anything else, kind audience?

@ScottPJones
Copy link
Contributor Author

I’ve now squashed this, all tests, documentation, and deprecating old is_valid_xxx functions is done...

@@ -397,6 +411,10 @@

Create a string from the address of a NUL-terminated UTF-32 string. A copy is made; the pointer can be safely freed. If ``length`` is specified, the string does not have to be NUL-terminated.

.. function:: is_valid_utf32(s) -> Bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is deprecated now, right? I guess we can leave the deprecated functions' docs in, but I'm not sure if we have a doctest to remind us to clear out obsolete docs anywhere.

@tkelman
Copy link
Contributor

tkelman commented May 14, 2015

I'd vote to deprecate the current isvalid(str, i) in favor of haskey(str, i)

Was this done in here yet? I don't see it...

@ScottPJones
Copy link
Contributor Author

OK, all done, now shows 1 argument isvalid() cases for Char, UTF8String, UTF16String, UTF32String, and 2 argument cases for ones that would be ambiguous without 1st type argument...
Also squashed!

@ScottPJones
Copy link
Contributor Author

Anybody up to reviewing this? I can do the Compat.jl changes once it's in... Thanks!

@tkelman
Copy link
Contributor

tkelman commented May 19, 2015

Looks like this should be okay to merge, but #11241 (comment) wasn't done?

@ScottPJones
Copy link
Contributor Author

Yes, it was, days ago... See line 327 of utf32.jl.

@tkelman
Copy link
Contributor

tkelman commented May 19, 2015

You mean line 102? Then are the methods in base/string.jl no longer necessary?

@ScottPJones
Copy link
Contributor Author

Oops, sorry, I was looking at my integrated version (that also has #11004)...
You are correct... I missed removing those two isvalid lines in string.jl :-(
I'll fix right now and retest...

@ScottPJones
Copy link
Contributor Author

Done! Thanks!

@@ -95,7 +95,8 @@ sizeof(s::UTF16String) = sizeof(s.data) - sizeof(UInt16)
unsafe_convert{T<:Union(Int16,UInt16)}(::Type{Ptr{T}}, s::UTF16String) =
convert(Ptr{T}, pointer(s))

function is_valid_utf16(data::AbstractArray{UInt16})
isvalid(::Type{UTF16String}, str::UTF16String) = isvalid(UTF16String, str.data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is covered by the union version in utf32.jl, right?

@tkelman
Copy link
Contributor

tkelman commented May 19, 2015

Do people think this is NEWS-worthy?

@timholy
Copy link
Member

timholy commented May 19, 2015

I haven't followed this in any detail, but it seems so to me.

The single argument form `isvalid(value)` can now be used for values of type `Char`, `ASCIIString`,
`UTF8String`, `UTF16String` and `UTF32String`.
The two argument form `isvalid(type, value)` can be used with the above types, with values
of type `Vector{UInt8}`, `Vector{UInt16}`, `Vector{UInt32}`, and `Vector{Char}`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should put ([#11241]) at the end of the NEWS item (to reference the PR) and run julia doc/NEWS-update.jl (to insert the appropriate URL at the end).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks, didn't know about that... (learning as fast as I can!), all done now.

The single argument form `isvalid(value)` can now be used for values of type `Char`, `ASCIIString`,
`UTF8String`, `UTF16String` and `UTF32String`.
The two argument form `isvalid(type, value)` can be used with the above types, with values
of type `Vector{UInt8}`, `Vector{UInt16}`, `Vector{UInt32}`, and `Vector{Char}`. ([#11241])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally, you should put parenthetical citations inside punctuation (like this).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did... ([#11241]).
Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, before the . I'll fix...

@tkelman
Copy link
Contributor

tkelman commented May 20, 2015

I was about to hit merge on this one because I think it's ready, are there any remaining objections to that?

@stevengj
Copy link
Member

LGTM.

@tkelman
Copy link
Contributor

tkelman commented May 22, 2015

Great, this is an excellent cleanup. Will be good to get these new methods into Compat.jl when you get a chance.

tkelman added a commit that referenced this pull request May 22, 2015
Add isvalid(Type, value) methods, to replace is_valid_*
@tkelman tkelman merged commit ed61f1e into JuliaLang:master May 22, 2015
@ScottPJones
Copy link
Contributor Author

Thanks! I'll get this into Compat.jl now.

@ScottPJones ScottPJones deleted the spj/isvalid branch May 22, 2015 09:30
StefanKarpinski pushed a commit that referenced this pull request Feb 8, 2018
StefanKarpinski pushed a commit that referenced this pull request Feb 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.