match is type unstable #10550

sjkelly · 2015-03-18T05:34:47Z

I feel like this a duplicate issue, but I could not find it. A little ironic I guess.

julia> @code_warntype match(r"([M,G])([0-9.])+", "foo")
Variables:
  r::Regex
  s::ASCIIString

Body:
  begin  # regex.jl, line 134:
      GenSym(2) = box(Int32,checked_trunc_sint(Int32,0))
      GenSym(8) = (top(getfield))(s::ASCIIString,:data)::Array{UInt8,1}
      return match(r::Regex,$(Expr(:new, :((top(getfield))(Core,:UTF8String)::Type{UTF8String}), :((top(convert))(Array{UInt8,1},GenSym(8))))),1,GenSym(2))::Union(RegexMatch,Void)
  end::Union(RegexMatch,Void)

This behavior is documented, but I feel like it should not be the default for performance reasons. This seems like a use case for Nullables.

The text was updated successfully, but these errors were encountered:

sjkelly · 2015-03-18T05:35:52Z

I should add that I am willing to work on this is there is some general consesus to change it.

hayd · 2015-03-18T07:41:10Z

+1... but I'm not sure how you can do this without breaking lots of existing code.

Edit: thinking about it pandas made this change to their str.match method, for the same reason.

nalimilan · 2015-03-18T08:18:24Z

+1

nalimilan · 2015-03-18T09:08:57Z

BTW, I've just bumped on the same pattern with Pkg.installed(). Performance is probably not critical for this kind of function, but if you start working on it you may want to look for other functions like this.

StefanKarpinski · 2015-03-18T11:19:43Z

The big issue here is design. How do you design the equivalent functionality in a type-stable way? One possibility is using the relatively new Nullable type and always returning a Nullable. Still kind of awkward though.

I had at one point proposed an else clause for for loops that execute zero times and then writing this like so:

for m = match(r"([M,G])([0-9.])+", "foo")
    # do stuff with m
else
    # handle not matching
end

Then match would have to return an iterable object that matches zero or one times. We could also just do matches or something like that and if you only want one match, you break after the first match.

quinnj · 2015-03-18T14:24:55Z

I think this is a great case for Nullable, it's actually been on my todo
list for a while to rework this, but if someone else wants to take a crack
at it, feel free!

On Wed, Mar 18, 2015 at 5:19 AM, Stefan Karpinski notifications@github.com
wrote:

The big issue here is design. How do you design the equivalent
functionality in a type-stable way? One possibility is using the relatively
new Nullable type and always returning a Nullable. Still kind of awkward
though.

I had at one point proposed an else clause for for loops that execute
zero times and then writing this like so:

for m = match(r"([M,G])([0-9.])+", "foo")
# do stuff with melse
# handle not matchingend

Then match would have to return an iterable object that matches zero or
one times. We could also just do matches or something like that and if
you only want one match, you break after the first match.

—
Reply to this email directly or view it on GitHub
#10550 (comment).

tkelman · 2015-03-18T14:41:41Z

as mentioned here #1820 (comment) there's also the new pcre2 api and features to think about, if you're digging into regex functionality

MikeInnes · 2015-03-18T15:25:24Z

I like the idea of for/else. It's a nice construct that would remove some ugly flags in various bits of code. I don't see any real downsides to having it, since it's not like it uses an extra keyword.

nalimilan · 2015-03-18T15:25:49Z

Returning an iterable is also an appealing alternative, if it can be made as efficient as match when you only want the first match. That would allow merging match and matchall, and unify their return types.

EDIT: There's also search, which currently returns an empty range when no match is found, and thus would work with the for/else pattern.

kmsquire · 2015-03-18T16:28:22Z

A while ago, I had implemented this as a do function (which was equivalent to @StefanKarpinski's suggestion, but without handling else). I liked it, but it wasn't accepted.

This behavior was actually one of my main motivations for creating Match.jl. If I had the time to learn lisp better, I would love to try to get some of that matching into the parser directly. (Any takers?)

simonster · 2015-03-18T16:51:59Z

From a performance perspective, an iterable may be better than a Nullable. Until we have escape analysis, constructing a Nullable{RegexMatch} requires an additional allocation on the heap. (OTOH there are already so many heap allocated things inside a RegexMatch that maybe this doesn't matter.)

simonster · 2015-03-18T17:00:33Z

There may not be a big performance advantage to making match type-stable anyway. There is no type ambiguity when accessing the fields of the return value since those fields don't exist on nothing.

johnmyleswhite · 2015-03-18T18:08:37Z

My personal sense: the most important thing is that we standardize on a idiom for all functions that return results that may not exist. Nullable seems like the simplest approach, but I don't really care what we do as long as we always do the same thing. We can easily make Nullable iterable: this has been discussed before.

The heap allocation issue raised by @simonster is more serious, but I think we need to solve it since the use cases for something equivalent to Nullable are going to keep piling up.

MikeInnes · 2015-03-18T19:29:35Z

In theory you could certainly work out that (m::Union(RegexMatch, Nothing)).match is a string, [and in practice, too!]

~~but I don't think the compiler is smart enough to do that yet. My understanding is that unions are equivalent to Any as far as optimisations go.~~

johnmyleswhite · 2015-03-18T19:32:01Z

I think it's more complicated than that: Union(T, Union()) is the same as Union(T) as far as I understand.

simonster · 2015-03-18T19:56:41Z

@one-more-minute Type inference is smart enough to figure that out:

julia> function f(str)
       x = match(r"\w", str)
       x.match
       end;

julia> code_typed(f, (UTF8String,))
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[:str], Any[Any[:x],Any[Any[:str,UTF8String,0],Any[:x,Union(RegexMatch,Void),18]],Any[],Any[UTF8String,Regex,UTF8String,Regex]], :(begin  # none, line 1:
        GenSym(1) = r"\w"
        x = match(GenSym(1),str::UTF8String,1,(top(box))(Int32,(top(checked_trunc_sint))(Int32,0)))::Union(RegexMatch,Void) # line 2:
        return (top(getfield))(x::Union(RegexMatch,Void),:match)::SubString{UTF8String}
    end::SubString{UTF8String}))))

@johnmyleswhite typeof(nothing) is Void not Union(), but type inference knows that getfield(::Union(RegexMatch,Void),:match) can only return SubString{UTF8String} because there's no match field on nothing. So there's no dynamic dispatch in the IR and no additional allocation. The only overhead is that the field access occurs via jl_f_get_field instead of via a direct memory access, and this does not seem to be very expensive. Adding a type annotation to the result of match gives roughly a 3% improvement in performance for me.

MikeInnes · 2015-03-18T20:08:37Z

julia> type Foo x::Int end

julia> type Bar y::Float64 end

julia> foo() = rand()<0.5? Foo(1) : Bar(1)
foo (generic function with 1 method)

julia> bar() = foo().x
bar (generic function with 1 method)

julia> @code_typed bar()
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[], Any[Any[],Any[],Any[],Any[]], :(begin  # none, line 1:
        return (top(getfield))(foo()::Union(Foo,Bar),:x)::Int64
    end::Int64))))

That's pretty sweet, I had no idea we could do that kind of inference on field names. It even types it as Union() if the field doesn't exist, and correctly types it as Union(Float64, Int64) if both fields are named x.

JeffBezanson · 2015-04-25T04:06:05Z

Cool isn't it :D

oxinabox · 2016-08-26T00:29:54Z

Keen for this, it just catches me out everytime, because its not error I expect.

I wonder though if match should just throw an exception if it does not match.
and if try_match should return Nullable{RegexMatch} (as in #12055).

I think a Regex not matching is an exceptionable situation.
If you are only interested in if it matches or not, then you would have used ismatch.
So in general when the programmer calls match they believe it does match,
and they want more details about how it matches -- what are the capture groups, etc.

yuyichao · 2016-08-26T00:40:47Z

So in general when the programmer calls match they believe it does match,
and they want more details about how it matches -- what are the capture groups, etc.

I don't think this is true. In my experience it is very common to first check if the string matches and then do further processing using the information of the matching.

KristofferC · 2018-04-04T15:50:57Z

I think this can be closed now because in a turn of events this now matches what other functions do (e.g. findfirst).

simonster mentioned this issue Jul 6, 2015

Make match return Nullable{RegexMatch} #12033

Closed

nalimilan added the missing data Base.missing and related functionality label Sep 6, 2016

KristofferC closed this as completed Apr 4, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

match is type unstable #10550

match is type unstable #10550

sjkelly commented Mar 18, 2015

sjkelly commented Mar 18, 2015

hayd commented Mar 18, 2015

nalimilan commented Mar 18, 2015

nalimilan commented Mar 18, 2015

StefanKarpinski commented Mar 18, 2015

quinnj commented Mar 18, 2015

tkelman commented Mar 18, 2015

MikeInnes commented Mar 18, 2015

nalimilan commented Mar 18, 2015

kmsquire commented Mar 18, 2015

simonster commented Mar 18, 2015

simonster commented Mar 18, 2015

johnmyleswhite commented Mar 18, 2015

MikeInnes commented Mar 18, 2015

johnmyleswhite commented Mar 18, 2015

simonster commented Mar 18, 2015

MikeInnes commented Mar 18, 2015

JeffBezanson commented Apr 25, 2015

oxinabox commented Aug 26, 2016

yuyichao commented Aug 26, 2016

KristofferC commented Apr 4, 2018 •

edited

Loading

match is type unstable #10550

match is type unstable #10550

Comments

sjkelly commented Mar 18, 2015

sjkelly commented Mar 18, 2015

hayd commented Mar 18, 2015

nalimilan commented Mar 18, 2015

nalimilan commented Mar 18, 2015

StefanKarpinski commented Mar 18, 2015

quinnj commented Mar 18, 2015

tkelman commented Mar 18, 2015

MikeInnes commented Mar 18, 2015

nalimilan commented Mar 18, 2015

kmsquire commented Mar 18, 2015

simonster commented Mar 18, 2015

simonster commented Mar 18, 2015

johnmyleswhite commented Mar 18, 2015

MikeInnes commented Mar 18, 2015

johnmyleswhite commented Mar 18, 2015

simonster commented Mar 18, 2015

MikeInnes commented Mar 18, 2015

JeffBezanson commented Apr 25, 2015

oxinabox commented Aug 26, 2016

yuyichao commented Aug 26, 2016

KristofferC commented Apr 4, 2018 • edited Loading

KristofferC commented Apr 4, 2018 •

edited

Loading