Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated regex-automata to 0.3 #3655

Closed
wants to merge 1 commit into from
Closed

Updated regex-automata to 0.3 #3655

wants to merge 1 commit into from

Conversation

Razican
Copy link

@Razican Razican commented Jul 9, 2023

Regex has released the new 0.3 regex-automata crate, used in icu-list, and I wanted to upgrade it, in order to reduce duplicate dependencies and no-longer-updated dependencies.

The only logic change is that now start_state_forward() can return an error, which the current matches_earliest_fwd_lazy() function cannot propagate.

Since the input is a static empty haystack, should I just call expect() on the error?

Awaiting for that feedback before proposing the final solution.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear to me whether data produced by regex-automata@0.2 is compatible with regex-automata@0.3 code. I couldn't find any migration documentation, and data stability is a property we require in ICU4X.

@@ -31,7 +31,7 @@ all-features = true
[dependencies]
displaydoc = { version = "0.2.3", default-features = false }
icu_provider = { version = "1.2.0", path = "../../provider/core", features = ["macros"] }
regex-automata = { version = "0.2", default-features = false }
regex-automata = { version = "0.3", default-features = false, features = ["dfa-search", "dfa-build"] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dfa-build is only required in the cases where we currently require alloc: serde_human and datagen

@sffc
Copy link
Member

sffc commented Jul 10, 2023

@BurntSushi

@robertbastian
Copy link
Member

Since the input is a static empty haystack, should I just call expect() on the error?

We guarantee to be panic-free, and it is not clear to me from start_state_forward's documentation that no error will occur, including in future releases. In particular, in 0.3 there are changes around anchoring, and I'm not convinced that these are confined to the builders.

The safe solution here is to return false on error, however I do also want to confirm that this will not actually happen with old data.

There are more API incompatibilities in the builder code. You can run cargo make bakeddata components/list to regenerate data (which is currently not working). If there's no diff then we're probably fine, otherwise we need to confirm that this is not an incompatibility.

@BurntSushi
Copy link

It is unclear to me whether data produced by regex-automata@0.2 is compatible with regex-automata@0.3 code.

It is not. Normally you'll get an error, but regex-automata 0.2 was a frankenstein release as you know.

I note that we've already had a discussion about this topic in the past. If you need serialized DFAs to work across multiple semver incompatible releases of regex-automata, then regex-automata cannot and will not ever satisfy that requirement. And I answered the compatibility question there too:

And to put a pin in it: DFAs generated by 0.3 (not yet released) will absolutely not be compatible with 0.2.

As for other stuff:

We guarantee to be panic-free, and it is not clear to me from start_state_forward's documentation that no error will occur, including in future releases. In particular, in 0.3 there are changes around anchoring, and I'm not convinced that these are confined to the builders.

I think the first part of this is probably correct. I don't think I can guarantee that the set of errors will remain completely fixed within semver compatible releases. I would expect that any new error conditions that are added would be the result of new features that are added that would be disabled by default. For example, consider what would happen if the quit byte feature were added today after the regex-automata 0.3 release. I would do it in a semver compatible manner by adding it and requiring callers to opt into it. This would in turn add a new documented error condition to start_state_forward (among other routines), but the error condition could only occur when the quit byte feature was enabled.

Otherwise, turning a non-error case into an error case is something that I think would generally be considered a breaking change, unless it was fixing a bug.

As far as anchors go, you should be fine there as long as you're search anchor configuration is consistent with your DFA builder configuration. For example, if you try to run an anchored search on a DFA that has only been configured for unanchored searches, then that will result in an error. Otherwise this isn't something that changes based on the haystack.

Personally it feels like you'd be safe to call unwrap/expect here assuming you aren't using quit bytes. But I think it's also reasonable to collapse an error into a "do not match" result as well. The main downside of the latter is that it could potentially silently swallow a bug. But that's up to y'all for how you handle things like that.

@robertbastian
Copy link
Member

Thanks for clarifying, I'll close this PR then.

We may consider upgrading to 0.3 in ICU4X 2.0, where we can do breaking data changes.

@sffc
Copy link
Member

sffc commented Jul 10, 2023

To document, the next step should be to release a V2 version of the data structure. It's easier in 2.0 because then we can drop the old dependency; it's possible to do it sooner, but we'd keep the old dependency around for people who have old data.

@robertbastian
Copy link
Member

I think having both dependencies defeats the purpose, and making them optional gets really messy.

@sffc sffc mentioned this pull request Jul 11, 2023
37 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants