-
Notifications
You must be signed in to change notification settings - Fork 425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default type for regular expressions #14693
Comments
Add support for bytes regular expressions This PR adds support for using `bytes` values as regular expressions to be used with other `bytes` values. Summary: - Makes `regexp` generic on `type exprType` which can be `string` or `bytes` Currently, this PR doesn't assign a default to `exprType`. But we might want to make `string` the default type. See #14693 - Removes `stringPart` from Regexp module. This was based on some comments in the code that says it is not tested, and in some places, support for it is not fully implemented. This is not absolutely necessary for this PR to move on and I can revert back. But it makes the interface and sometimes implementation much easier to remove that. - Remove the deprecated `compile` with `out err`, deprecates the `compile` with `utf8` argument and all lowercase arguments. The new `compile` uses `utf8` encoding with strings and `Latin1` with bytes internally. - Adds `bytes` methods with regexp arguments. `bytes.this(reMatch)`, `bytes.search(regexp(bytes))` etc - Adds support for doing formatted reads for `regexp(bytes)`. The internal class `channel_regexp_info` now stores only `bytes`. They are `decode`d into strings as necessary when they are extracted. Making this also generic cannot be easily done, I think. Because `readf` needs to create a `channel_regexp_info` instance before starting to read the arguments. And before starting to read the arguments we don't know whether they are `regexp(bytes)` or `regexp(string)`. Note that the type of the format string does not determine the type of the regular expression. Therefore, one can capture a bytes-based regular expression using `"%/*/"` and a string-based regular expression using `b"%/*/"` - Modifies some tests to test bytes-based regular expressions in the same way string-based ones are tested. - Adjusts the `RecordParser` module to use `regexp(string)` - Adjusts mason to use new `compile` - Adds tests for non-UTF8 regexp matching and casts. - Minor adjustment in the docs to emphasize that we have raw string/bytes literals. [Reviewed by @mppf] Test: - [x] standard - [x] gasnet
I would go with option 1: no default type. It's the least restrictive option between 1 and 2 if some person wants to make a custom string type for their use case. |
I think I said this elsewhere, but: I agree that it shouldn't have a default type. I could imagine it having a default type with deprecation warning for the 1.21 release to help people transition their existing codes that relied on it being string-only. I'm imagining this could be supported by having a 0-argument initializer that generated the warning and called a 1-argument initializer passing |
Allow string-by-def regexps with warning This PR makes `string` default for `regexp` with a deprecation warning for 1.21 Resolves #14693 Resolves #14895 `regexp`s were string-based until #14686 made them generic to support bytes and strings. Under #14693, we reached a consensus that generic `regexp` is what we want going forward, but we need to support `regexp` with string-by-default type for 1.21 with a deprecation warning. This PR also adds deprecation tests [Reviewed by @dlongnecke-cray] Test: - [x] standard local - [x] standard gasnet
#14686 will add support for
bytes
regular expressions. In order to do that it makesregexp
record generic ontype exprType
. The meaningful concrete types in this scenario areregexp(string)
andregexp(bytes)
.That PR does not assign a default value to
exprType
, but maybe it should. We have the following options:This helps with some of the confusions I was having during the implementation of Add support for bytes regular expressions #14686. For example:
if
regexp
doesn't have default type, then,parse
can be used with both string and bytes regular expressions. Otherwise, we'd be stuck withstring
only, or change the prototype toproc parse(r: regexp(?))
This is also a breaking change, but probably not very impactful. Note that you can use regular expressions without using this type explicitly:
gives you a
regexp(string)
. If the literal wasbytes
then it would giveregexp(bytes)
default value is
string
string
should be the more common use for this class, and it might make sense to just make it default.default value is
bytes
-- just for completeness. I don't think this is a real optionEven though #14686 chooses 1, I don't really have a preference between 1 and 2.
The text was updated successfully, but these errors were encountered: