Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C API] Add PyUnicode_Equal() function #124502

Closed
vstinner opened this issue Sep 25, 2024 · 12 comments
Closed

[C API] Add PyUnicode_Equal() function #124502

vstinner opened this issue Sep 25, 2024 · 12 comments
Labels
topic-C-API type-feature A feature request or enhancement

Comments

@vstinner
Copy link
Member

vstinner commented Sep 25, 2024

Python 3.13 moved the private _PyUnicode_EQ() function to internal C API. mypy and Pyodide are using it.

I propose to add a public PyUnicode_Equal(a, b) function to the limited C API 3.14 to replace the private _PyUnicode_EQ() function:

  • Return 1 if a is equal to b.
  • Return 0 if a is not equal to b.
  • Set a TypeError exception and return -1 if a or b is not a Python str object.
  • The function always succeed if a and b are strings.

Linked PRs

@Eclips4 Eclips4 added the type-feature A feature request or enhancement label Sep 25, 2024
vstinner added a commit to vstinner/cpython that referenced this issue Sep 25, 2024
@vstinner
Copy link
Member Author

cc @serhiy-storchaka @encukou

@rruuaanng
Copy link
Contributor

rruuaanng commented Sep 25, 2024

I think PyUnicode_Equal should like strcmp in glibc.
Return 0 if a is equal to b.
Return 1 if a isn't equal to b.

@vstinner
Copy link
Member Author

I think PyUnicode_Equal should like strcmp in glibc.
Return 0 if a is equal to b.
Return 1 if a isn't equal to b.

The C API Guidelines suggest to return 1 if "found" and 0 if "not found": https://devguide.python.org/developer-workflow/c-api/#guidelines-for-expanding-changing-the-public-api

IMO it's more natural to return 1 if equal and 0 if not equal. So you can write if (PyUnicode_Equal(a, b)) { /* a == b */ }.

Moreover, PyUnicode_Equal() API the same as existing private functions _PyUnicode_EQ() and _PyUnicode_Equal() (except that these functions don't check if arguments are strings).

@rruuaanng
Copy link
Contributor

rruuaanng commented Sep 25, 2024

I think PyUnicode_Equal should like strcmp in glibc.

Return 0 if a is equal to b.

Return 1 if a isn't equal to b.

The C API Guidelines suggest to return 1 if "found" and 0 if "not found": https://devguide.python.org/developer-workflow/c-api/#guidelines-for-expanding-changing-the-public-api

IMO it's more natural to return 1 if equal and 0 if not equal. So you can write if (PyUnicode_Equal(a, b)) { /* a == b */ }.

Moreover, PyUnicode_Equal() API the same as existing private functions _PyUnicode_EQ() and _PyUnicode_Equal() (except that these functions don't check if arguments are strings).

I think that's good idea, So when will it be implemented?
Maybe you could give me a shot. Haha

@vstinner
Copy link
Member Author

I already implemented the function. See attached PR.

@ZeroIntensity
Copy link
Member

Hmm, will this result in a cascade of PySomething_Equal functions? This case should be special, I don't think we need PyLong_Equal, PyDict_Equal, PyList_Equal, etc.

Out of curiosity, what's the benefit here over PyUnicode_Compare(...) == 0?

@picnixz
Copy link
Contributor

picnixz commented Sep 25, 2024

PyUnicode_Compare is slower since it needs to handle Unicode kinds separately (you need to take care of mixed kinds since comparing two Unicode strings of different kinds is a well-defined operation). PyUnicode_Equal is faster because 1) it directly uses memcmp and 2) only needs to do something when the Unicode kinds match.

@vstinner
Copy link
Member Author

vstinner commented Sep 25, 2024

Hmm, will this result in a cascade of PySomething_Equal functions? This case should be special, I don't think we need PyLong_Equal, PyDict_Equal, PyList_Equal, etc.

C extensions such as mypy and Pyodide are already using the private _PyUnicode_EQ() function which has been removed in Python 3.13. I'm proposing a public replacement for them.

I'm not aware of other PyTYPE_Equal() function. Only PyUnicode has a specialized equal function.

Out of curiosity, what's the benefit here over PyUnicode_Compare(...) == 0?

PyUnicode_Compare() returns -1 for "less than" but also for the error case. The caller must call PyErr_Occurred() which is inefficient. It causes an ambiguous return value: capi-workgroup/problems#1

PyUnicode_Equal() has no such ambiguous return value (-1 only means error). Moreover, it may be a little bit faster, but I'm not sure about that.

@ZeroIntensity
Copy link
Member

I'm not aware of other PyTYPE_Equal() function. Only PyUnicode has a specialize equal function.

Yeah, that's what I mean. I think it would be a bad idea to use this as precedent for future "faster" PyXXX_Equal functions. I'm fine with PyUnicode_Equal.

@vstinner
Copy link
Member Author

I created capi-workgroup/decisions#43 issue in the C API Working Group.

vstinner added a commit to vstinner/cpython that referenced this issue Oct 7, 2024
* Replace unicode_compare_eq() with unicode_eq().
* Replace _PyUnicode_EQ() calls with _PyUnicode_Equal().
* Remove _PyUnicode_EQ().
vstinner added a commit to vstinner/cpython that referenced this issue Oct 8, 2024
* Cleanup unicode_compare_eq() code.
* Copy unicode_compare_eq() code in unicode_eq(): the two functions
  are now identical.
vstinner added a commit to vstinner/cpython that referenced this issue Oct 8, 2024
* Replace _PyUnicode_EQ() with _PyUnicode_Equal().
* Replace unicode_compare_eq() with unicode_eq().
* Remove unicode_compare_eq() and _PyUnicode_EQ().
vstinner added a commit to vstinner/cpython that referenced this issue Oct 8, 2024
* Replace unicode_compare_eq() with unicode_eq().
* Replace _PyUnicode_EQ() with _PyUnicode_Equal().
* Remove unicode_compare_eq() and _PyUnicode_EQ().
vstinner added a commit that referenced this issue Oct 9, 2024
* Replace unicode_compare_eq() with unicode_eq().
* Use unicode_eq() in setobject.c.
* Replace _PyUnicode_EQ() with _PyUnicode_Equal().
* Remove unicode_compare_eq() and _PyUnicode_EQ().
@vstinner
Copy link
Member Author

vstinner commented Oct 9, 2024

Function added by change a7f0727.

@vstinner vstinner closed this as completed Oct 9, 2024
@vstinner
Copy link
Member Author

vstinner commented Oct 9, 2024

Function added to pythoncapi-compat with change: python/pythoncapi-compat@abc0f29.

efimov-mikhail pushed a commit to efimov-mikhail/cpython that referenced this issue Oct 9, 2024
efimov-mikhail pushed a commit to efimov-mikhail/cpython that referenced this issue Oct 9, 2024
efimov-mikhail pushed a commit to efimov-mikhail/cpython that referenced this issue Oct 9, 2024
* Replace unicode_compare_eq() with unicode_eq().
* Use unicode_eq() in setobject.c.
* Replace _PyUnicode_EQ() with _PyUnicode_Equal().
* Remove unicode_compare_eq() and _PyUnicode_EQ().
hauntsaninja pushed a commit to python/mypy that referenced this issue Oct 14, 2024
Use functions added for Python 3.14
* `PyBytes_Join` added in
python/cpython#121645
* `PyUnicode_Equal` added in
python/cpython#124502
cdce8p added a commit to cdce8p/mypy that referenced this issue Oct 14, 2024
Use functions added for Python 3.14
* `PyBytes_Join` added in
python/cpython#121645
* `PyUnicode_Equal` added in
python/cpython#124502
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-C-API type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants