-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regexp: support encoding.TextMarshaler #46159
Comments
cc @rsc |
Regardless of #45754, I'd be surprised as a user to find a type that implements only either |
I share @dsnet's sentiment, though I should also note that in the case of regexp, marshal/unmarshal wouldn't be really symmetric: marshal would just show the original expression string, whereas unmarshal would do a comparatively more expensive regexp compile. A compile also takes options, so I imagine those wouldn't be exposed via TextUnmarshaler. |
Losing the options feels like a deal-breaker. re := regexp.MustCompile(".*")
re.Longest()
data, _ := re.MarshalText()
// Longest() call is not preserved
_ = re.UnmarshalText(data) |
This seems to suggest that That said, I would expect |
This proposal has been added to the active column of the proposals project |
As a strawman proposal to spur discussion, Examples: rx1 := regexp.MustCompile(`[0-9a-fA-F]{32}`)
rx1.MarshalText() // `[0-9a-fA-F]{32}`
rx2 := regexp.MustCompilePOSIX(`[0-9a-fA-F]{32}`)
rx2.MarshalText() // `/[0-9a-fA-F]{32}/p` with 'p' flag emitted to indicate POSIX mode
rx3 := regexp.MustCompile(`[0-9a-fA-F]{32}`)
rx3.Longest()
rx1.MarshalText() // `/[0-9a-fA-F]{32}/l` with 'l' flag emitted to indicate longest-prefix mode.
rx4 := regexp.MustCompile(`/path/to/blah`)
rx4.MarshalText() // `//path/to/blah/` with surrounding '/' emitted to avoid possible parsing ambiguities I arbitrarily used Alternatively, options could be hacked into the existing |
Yeah at least Perl stringifies regexen such that they will produce the same thing via the grouping notation: $ perl -E'say qr/[a-z]/'
(?^u:[a-z]) |
We could invent our own custom flags for the (?...) syntax, but that would be quite a change for a pretty limited use. |
It sounds like we could invent a text format here, but that text format would not be the input to regexp.Compile, which negates most of the benefit. Is this still worth doing? It seems like maybe not... |
I have mixed feelings about this. I'm not fond of the need to invent a new format, but adding text marshalers would be useful for #45754. I've written tools before that took in a regexp as a command-line flag. |
Custom (?...) flags seem like the cleaner way of doing this. Re #45754, I don't think it's a very compelling use-case considering there's |
Can you elaborate on the reasoning here? If we were to (say) add new |
@flimzy those are two separate possible solutions. |
The reason not to add new (?...) flags is that there are too many regular expression syntaxes in the world. Today, RE2, Go, Rust, and RE2/J all agree on a single syntax. I would be reluctant for Go to deviate from that group and introduce yet another variant, however small. |
Based on the discussion above, this proposal seems like a likely decline. |
The current idea is to have MarshalText/UnmarshalText use the direct regexp syntax and just accept that MarshalText will be a lossy encoding for POSIX or Longest regexps, which are relatively rare. Do we have any other lossy marshaling? Does anyone think we shouldn't do this? |
Based on the discussion above, this proposal seems like a likely accept. |
There are no other non-test implementations of TextMarshaler in the standard library. The documentation for TextUnmarshaler says:
Seems that technically this will hold (unless there are any POSIX regexes that cannot be parsed as re2 regex). |
Marshaling of |
I'll do the implementation and see if I run into anything surprising. |
No change in consensus, so accepted. 🎉 |
I would be interested in tackling this one. |
@rsc, are you still planning on doing this? I found a use-case for this and would like to move it along. |
Change https://go.dev/cl/479401 mentions this issue: |
Change https://go.dev/cl/498756 mentions this issue: |
For #46159 Change-Id: Ia9cc0827a89d362532d1a662b791de8eebbfb2fe Reviewed-on: https://go-review.googlesource.com/c/go/+/498756 Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Eli Bendersky <eliben@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
What did you do?
Marshaling a compiled regular expression, for example as part of a JSON object, results in an empty object. Example:
Produces:
(Playground)
What did you expect to see?
It would seem more intuitive for a compiled regexp to marshal to its string representation, as if
.String()
were called. Example:I propose we implement the
encoding.TextMarshaler
interface to return the value produced by theString()
method.Strictly speaking, this would be a breaking change, so may need to be targeted to Go 2. On the other hand, in a practical sense, the current behavior is completely useless, so anyone depending on the current behavior probably already has a bug in their program. 🤷♂️
The text was updated successfully, but these errors were encountered: