Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we deprecate casting strings to regular expressions? #14694

Closed
e-kayrakli opened this issue Dec 20, 2019 · 13 comments
Closed

Should we deprecate casting strings to regular expressions? #14694

e-kayrakli opened this issue Dec 20, 2019 · 13 comments

Comments

@e-kayrakli
Copy link
Contributor

Currently, one can create a regular expression by casting a string:

use Regexp;

var s = "contains";
var rs = s:regexp;

writeln("contains regular expression".search(rs));    // matches
writeln("doesn't contain regular expression".search(rs));   // doesn't match

On #14686, @mppf suggested that may be we should deprecate these and make compile the only way to create a regular expression (these casts call compile with default arguments).

Arguably, regular expressions more than just strings and being able to cast a string to regular expression feels weird to me, too. I don't think it brings any convenience either.

The other way around (casting a regular expression to string) can be seen as a handy way of getting the actual expression from a regexp object, so it doesn't feel that weird. However, if symmetry is an issue, a user-facing way to get the expression from a regexp object can replace that cast, too.

@bradcray
Copy link
Member

The big caveat to my response is that I've hardly ever worked with regular expression libraries in languages, only with regular expressions on command-line utilities. For me compile() has always seemed like a weird term for building a regular expression parser, though I understand it. For that reason, stylistically, the cast seems more appropriate to me. But I wouldn't object to deprecating it if we feel uncertain about its future. Or another option would be to add a --warn-unstable warning for it. In either case, we could add some text like "if you want to advocate for continuing to support this, send $5 to Engin" so that people knew that it wasn't necessarily going away, just at risk of it? (unless others feel more strongly than I do).

@arezaii
Copy link
Collaborator

arezaii commented Aug 13, 2021

It seems like this issue has been decided and implemented, as compiling the provided example results in:

warning: 'Regexp' module is deprecated; please use 'Regex' module instead.
...
4: warning: Regex: 'regexp' is deprecated; please use 'regex' instead
4: error: illegal cast from string to regex

while using:

use Regex;

var s = "contains";
var rs = compile(s);

writeln("contains regular expression".search(rs));    // matches
writeln("doesn't contain regular expression".search(rs));   // doesn't match

results in:

(matched = true, offset = 0, size = 8)
(matched = false, offset = -1, size = 0)

Can we mark it resolved and close it?

@e-kayrakli
Copy link
Contributor Author

Can we mark it resolved and close it?

Nope.

The issue probably predates adding bytes-based regular expressions and making regex[p] a generic type. So, if you compile the OP today, you'll be trying to cast a string to a generic regex, which gives you that error. You can still do the equivalent of that code today which is:

use Regex;

var s = "test";
var r = s:regex(string);

writeln(r);

This cast just calls compile today: https://github.com/chapel-lang/chapel/blob/main/modules/standard/Regex.chpl#L1074

FWIW, I still support deprecating this.

@bradcray -- I don't remember whether we've discussed compile during the regex(p) review (I can dig up the notes). But maybe the solution is to rename compile to something else?

I did a quick look around. Python has the same compile in the regex interface. RE2, OTOH, uses the term "compile" in the documentation, but it seem to use just C++ constructor to compile a regular expression: https://github.com/google/re2/wiki/CplusplusAPI#pre-compiled-regular-expressions

My argument against using cast for this is still the same: regular expressions are not strings, they are just related. And casting should be a much more simple operation than creating a regex parser.

@bradcray
Copy link
Member

We did discuss it and continue to express reservations about the name, but I don't recall what we concluded either, if anything. Checking my notes isn't suggesting any conclusive decisions, though the last thing I typed was a new regex approach.

I'm not sure I necessarily buy the argument that casts should only be between related types or involve simple operations. But I'm not insistent that we use a cast either. I just find it a succinct and clear way to say "make me a this from a that." Initializers and copy initializers are obviously similar in that regard (though initializers are not always as 1:1 as casts and copy initializers).

@e-kayrakli
Copy link
Contributor Author

It is interesting that you bring up initializers. I wonder if there's something we can decide based on the "levels of conversions" idea: #17200

Which suggests that if you can initialize X from Y, you should be able to cast Y to X, if I am understanding correctly. So, that would support your point a little bit in the sense that if I can do new regex("myString") (which feels nice to me), why shouldn't I be able to myString:regex(string) (which still doesn't very nice to me).

@bradcray
Copy link
Member

I agree it feels related. I admittedly haven't internalized the "if you have a, you should also have b" hierarchy of conversions yet, but just wrote some tests that defined an init= and got the complaints about the lack of a cast. To me, this intuitively makes sense: You've already told me I can create an X from a Y using a copy initializer, so why require me to type:

var myX: X = myY;
...myX...

rather than simply:

...myY: X...

That said, I'd agree that new X(myY) is much less annoying than the copy initializer pattern. But I think #17200 is only talking about copy initializers, not normal initializers? (but am not confident and haven't tried).

@bradcray
Copy link
Member

Oh one other thought here. In conversations sometimes, I feel as though people (not saying you, necessarily) often interpret a cast as being "pretend that this Y is an X without actually creating a new X", which I don't think is accurate in general, particularly for value types. I think classes in a subclass/superclass relationship are one of the only places (in Chapel at least) where casts avoid creating new values (?). This is part of why I tend to view casts and copy initializers as being quite similar apart from one (the cast) being more explicit than the other.

@e-kayrakli
Copy link
Contributor Author

In conversations sometimes, I feel as though people (not saying you, necessarily) often interpret a cast as being "pretend that this Y is an X without actually creating a new X", which I don't think is accurate in general, particularly for value types.

I think I am one of those people, though, and it came up before. I am still trying to rewire my brain.

Not to argue against what you said, but more as an introspection: my mental reflex is still that casting is almost like the programmer telling the compiler "forget what you know about this value, pretend it is of type X", which is quite different than what you suggest and I believe it is the reason my response is different than yours.

@mppf
Copy link
Member

mppf commented Aug 23, 2021

Which suggests that if you can initialize X from Y, you should be able to cast Y to X, if I am understanding correctly. So, that would support your point a little bit in the sense that if I can do new regex("myString") (which feels nice to me), why shouldn't I be able to myString:regex(string) (which still doesn't very nice to me).

That said, I'd agree that new X(myY) is much less annoying than the copy initializer pattern. But I think #17200 is only talking about copy initializers, not normal initializers? (but am not confident and haven't tried).

Today, new X(myY) just runs a proc init(arg) but we have issue #17199 proposing that it could run init=.

At the same time, #17200 indeed only considers init= and not proc init.

A more relevant example for #17200 is that if we want

var x: regex = "myString";

to compile, then we need to also allow

var x = "myString" : regex;

I don't personally think we need to support either of those though (and I lean more towards just having people call compile).

@bradcray
Copy link
Member

bradcray commented Sep 1, 2021

I don't personally think we need to support either of those though (and I lean more towards just having people call compile).

I thought the general sense of things at the regex review was that people considered compile to be a confusing/vague term for "make a regex from this string." But that's a pretty vague recollection on my part and may reflect my own opinion rather than that of the group. And I'm not seeing any notes in the usual place other than the ones I took for myself.

@arezaii
Copy link
Collaborator

arezaii commented Aug 9, 2022

Discussing this question in a sub-group during module review, we noted some of the motivation for this question.

Motivations:

  • desire to move towards single way to do something
  • seems like a convenience method, but doesn't save much if any typing vs. an initializer
  • limits the ability to specify options relevant to a regex while maintaining backwards compatibility post 2.0

This also relates to the question of dropping compile() #17187, where the conversation is trending towards dropping compile() in favor of a throwing initializer, which is a TODO language feature.

Because throwing initializers remain a TODO, we considered if would be appropriate to instead use a throwing createRegex or similar named factory method, but decided that would be compile() by another name, and since there were possible other use cases for throwing initializers, we would rather get them implemented anyway.

Polls taken during the re-review indicate preference for dropping the cast from string to regex
in favor of an initializer pattern, new regex(...)

Questions/Polls:

If we had only one way to build a regex, would you prefer it to be:

  • new regex() : 6 yeas
  • compile() : 2 nays
  • (cast) thing:regex(string/bytes) : no votes

We want to know if it's worth preserving casting strings to regex as a convenience:

  • only have new regex(...) : 7 yeas
  • preserve cast `thing:regex(string) : no votes

The presumed result of the discussion is that we will deprecate the cast from string to regex unless there is more opposition voiced here.

@bradcray
Copy link
Member

bradcray commented Aug 9, 2022

in favor of a throwing initializer, which is a TODO language feature.

Recall that while we don't support throwing initializers today, we have written demonstrations that have shown that we're very close to supporting throwing initializers as long as the throw is done after the logical this.complete() step (i.e., after all fields are initialized explicitly or implicitly). And/or possibly in the postinit() routine? (I don't recall for sure anymore).

[edit: So in saying that, I'm suggesting maybe we should grit our teeth and get the remaining 10-20% done, under those constraints, if it's something we want for cases like this]

@e-kayrakli
Copy link
Contributor Author

I believe #20944 has resolved this.

I linked to @arezaii's comment above from #17187 which is related but different. If there's any initializer/compile related questions, we can continue the discussion there. Note also that #21636 will resolve that issue.

I am closing this one.

e-kayrakli added a commit that referenced this issue Mar 3, 2023
#21636)

This PR adds a new throwing initializer for `regex` and deprecates
`Regex.compile`.

Resolves #17187
Also see
#14694 (comment)

While working on this, I bumped into an issue while replacing a
`compile` call in ChapelBase. That part of the code already had a TODO.
So, in the end, I moved `_command_line_cast` inlining its helper from
ChapelBase to ChapelUtil and addressed the problem referred to in the
said TODO. This enables the helper to `use Regex` without causing
resolution order related problems. This also relates to
#5456.

[Reviewed by @vasslitvinov]

Test:
- [x] standard
- [x] gasnet
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants