Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit if we can use PyO3's new interning of PyString #15007

Open
Eric-Arellano opened this issue Apr 5, 2022 · 3 comments
Open

Audit if we can use PyO3's new interning of PyString #15007

Eric-Arellano opened this issue Apr 5, 2022 · 3 comments

Comments

@Eric-Arellano
Copy link
Contributor

See PyO3/pyo3#2269 for how it works. We should check if there are places we can use this.

@seve-martinez
Copy link
Contributor

seve-martinez commented Feb 21, 2023

I did a little bit of digging here and found that PyString::intern was also added with this.

A quick search of the code yielded a few spots that this could be used:

fn files<'py>(&self, py: Python<'py>) -> &'py PyTuple {
let files = self.0.files();
PyTuple::new(
py,
files
.into_iter()
.map(|path| PyString::new(py, &path.to_string_lossy()))
.collect::<Vec<_>>(),
)
}

However with my limited understanding of PySnapshot i'm not sure if it's methods are good candidates. Are there some specific places you had in mind that could benefit?

I think it can also be used in many of the getattr calls like in the fs.rs module.

e.g.

impl<'source> FromPyObject<'source> for PyPathGlobs {
  fn extract(obj: &'source PyAny) -> PyResult<Self> {
    let globs: Vec<String> = obj.getattr(intern!(obj.py(), "globs"))?.extract()?;  // <-- like so

@Eric-Arellano
Copy link
Contributor Author

cc @stuhood , any thoughts?

@stuhood
Copy link
Member

stuhood commented Feb 23, 2023

However with my limited understanding of PySnapshot i'm not sure if it's methods are good candidates. Are there some specific places you had in mind that could benefit?

We already do some rust-side interning internally in PySnapshot (in the fs::Directory type), and that is for directory components (i.e. an/example/file/path would be interned as four separate components). That's probably a good example of where interning can help.

But interning entire file paths as in PySnapshot::files would rarely "hit" the intern cache, so that probably isn't worth it.


One thing to study would be the output of https://www.pantsbuild.org/v2.15/docs/reference-stats#memory_summary, to see if there are any large Python objects which could be made smaller. Here is some recent output: https://gist.github.com/stuhood/949078e32309b94bd14c905719483404

Nothing jumps out at me immediately though =/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants