Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PyString::intern to enable access to Python's built-in string interning. #2268

Merged
merged 2 commits into from
Apr 3, 2022

Conversation

adamreichold
Copy link
Member

@adamreichold adamreichold commented Apr 3, 2022

I think the API is somewhat unfortunate but I am not sure how to get around the requirement for a null-terminated string without the cost of the allocation that this API is supposed to avoid in the first place. At least not without a PyUnicode_InternFromStringAndSize.

@birkenfeld
Copy link
Member

I'd at least add a version that takes &str. This is so inconvenient that it's going to hurt uptake, and there is an allocation anyway in the Python function.

@adamreichold
Copy link
Member Author

adamreichold commented Apr 3, 2022

I'd at least add a version that takes &str. This is so inconvenient that it's going to hurt uptake

Do you mean a function taking &str and only checking that it is null-terminated?

and there is an allocation anyway in the Python function.

While that is true for the current implementation, I think this is actually a performance bug as that implementation should probably use PyDict_{Get,Set}ItemString instead of immediately allocating a PyUnicode object.

Or put differently, if this is guaranteed to allocate a temporary PyUnicode object for every call, then this will actually be useless in the context of #2266 and we would probably be better off exposing PyDict_{Get,Set}ItemString ourselves.

@birkenfeld
Copy link
Member

birkenfeld commented Apr 3, 2022

I'd at least add a version that takes &str. This is so inconvenient that it's going to hurt uptake
Do you mean a function taking &str and only checking that it is null-terminated?

Either that, or one creating the intermediary CString by itself. It could even do both, by trying first to create a CStr and making a new null-terminated string if necessary. Then you can easily pass "abc\0" for speed, or any user given str for convenience.

@birkenfeld
Copy link
Member

Sorry I wasn't thinking properly about the "benchmark dict access with strings" use case.

In general, having an API for interning is still useful for other cases, so I'd like this to go forward independently.

@adamreichold
Copy link
Member Author

adamreichold commented Apr 3, 2022

I'd at least add a version that takes &str. This is so inconvenient that it's going to hurt uptake
Do you mean a function taking &str and only checking that it is null-terminated?

Either that, or one creating the intermediary CString by itself. It could even do both, by trying first to create a CStr and making a new null-terminated string if necessary. Then you can easily pass "abc\0" for speed, or any user given str for convenience.

Added a separate commit providing a more convenient API using CStr::from_bytes_with_nul as a fast path.

@birkenfeld
Copy link
Member

Thanks! BTW, I can never figure out what people mean by the "eyes" reaction, what's your take on that?

src/types/string.rs Outdated Show resolved Hide resolved
src/types/string.rs Outdated Show resolved Hide resolved
@adamreichold
Copy link
Member Author

Thanks! BTW, I can never figure out what people mean by the "eyes" reaction, what's your take on that?

The meaning I wanted to convey was "I am looking into that", so when I am currently thinking about/working on something but do not have any positive or negative results to share yet.

@adamreichold
Copy link
Member Author

In general, having an API for interning is still useful for other cases, so I'd like this to go forward independently.

@mejrs @birkenfeld I rebooted this into a wrapper that is purely targeted at memory efficiency as PyDict_{Set,Get}ItemString actually have the same deficiency as PyUnicode_InternFromString.

@birkenfeld
Copy link
Member

LGTM.

Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@birkenfeld birkenfeld merged commit d3dcbd7 into main Apr 3, 2022
@birkenfeld birkenfeld deleted the pystring-intern branch April 3, 2022 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants