Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

equality of Tokens and strings is not transitive #1114

Open
dimbleby opened this issue Feb 14, 2022 · 3 comments
Open

equality of Tokens and strings is not transitive #1114

dimbleby opened this issue Feb 14, 2022 · 3 comments
Labels
discussion Discuss new features or other changes

Comments

@dimbleby
Copy link

Describe the bug

The __eq__ implementation for Token gives non-transitive results: ie one can construct an example where a == b, and b == c, but a != c.

To Reproduce

from lark import Token

a = Token("noun", "flies")
b = "flies"
c = Token("verb", "flies")

print(f"a == b ? {a == b}")
print(f"b == c ? {b == c}")
print(f"a == c ? {a == c}")

output

a == b ? True
b == c ? True
a == c ? False

This report is prompted by python-poetry/poetry-core#286, in which one lark grammar is reused by another. So depending on which was used you could get the "same" thing with type either MARKER_NAME or markers__MARKER_NAME - which caused some confusion.

In an ideal world I think I'd argue that Token should not be a subclass of str.

I suspect this is too deeply embedded / relied upon, and that you're not going to want to do anything about this. That's fine by me, please just close this out. But I thought I'd open the issue anyway so that you knew this of this oddity and could take that decision deliberately.

@erezsh
Copy link
Member

erezsh commented Feb 14, 2022

Thanks for floating this. I can see why this behavior would be an issue. Tokens behave so much like strings that it makes sense to assume they'll act like strings in every way.

In an ideal world I think I'd argue that Token should not be a subclass of str.

Maybe you're right. Or maybe I should have just made __eq__ compare only the value and ignore the type.

Changing the subclassing isn't really an option. But, maybe changing __eq__ is, if it makes sense to do.

Funny enough, the Lark-Cython implementation's only break in compatibility is that the Token class doesn't inherit from str.

Anyway, I appreciate that you took the time to let me know. Maybe nothing will come of it, but we can keep it open and see what other people think.

P.S. just realized that even if I fix __eq__ there's still room for "oddities" like token == str but repr(token) != repr(str). But maybe that's more acceptable.

@erezsh erezsh added the discussion Discuss new features or other changes label Feb 14, 2022
@dimbleby
Copy link
Author

I see two ways to restore transitivity of equality:

  • remove the overriding __eq__ implementation, so that all of a, b, and c are considered equal
  • update it so that no Token is ever equal to any string, it can only be equal to another Token with the same type and value. Then none of a, b and c are considered equal.

I don't have a good sense of how breaking either of these would be for lark and its users and I don't actually care about this very much myself. So I'm going to drop out and leave you to do with this what you will, thanks!

@erezsh
Copy link
Member

erezsh commented Feb 15, 2022

Thanks again for your input!

My intuition says that a subclass that divides the value space shouldn't be equal to its superclass. It's something I added out of convenience and didn't think much about. It might be a little too late to change it, at least for v1.x, but it's definitely something worth thinking more about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discuss new features or other changes
Projects
None yet
Development

No branches or pull requests

2 participants