Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 5180 tempfiles are leak in tmp #5210

Merged
merged 12 commits into from
Aug 3, 2022

Conversation

dqkqd
Copy link
Contributor

@dqkqd dqkqd commented Jul 30, 2022

Thanks for creating such beautiful project!

The issue

I'm trying to fix #5180

get_paths, get_lib_paths, get_include_path functions create new .json files to store output called from subprocess_run without closing it.

The fix

Because the .json file is used only inside the function call. I have two solutions for this.

  • Delete the .json file at the end of each function.
  • Don't create a file, write output directly into stdout.

As these functions contain many returns, there will be a lot of duplicated codes to delete the file. So I tend to the second solution.

As far as I know, subprocess_run execute py_command, and if it failed to execute, c.stdout will be printed out to console. But the py_command does not print anything out. Therefore c.stdout is always empty.
Instead of creating new temporary file, I think it is better to use c.stdout to write directly to paths variable.

The checklist

  • Associated issue
  • A news fragment in the news/ directory to describe this fix with the extension .bugfix, .feature, .behavior, .doc. .vendor. or .trivial (this will appear in the release changelog). Use semantic line breaks and name the file after the issue number or the PR #.

"u'platinclude': u'{{0}}'.format(distutils.sysconfig.get_python_inc("
"plat_specific=1)) }}; value = u'{{0}}'.format(json.dumps(paths));"
"fh = io.open('{0}', 'w'); fh.write(value); fh.close()"
"import distutils.sysconfig, io, json, sys; paths = {u'include': "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize you are just refactoring existing code, but now I notice this usage of distutils which is totally deprecated. I wonder what the correct transformation of this usage would be.

Copy link
Contributor Author

@dqkqd dqkqd Jul 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of writing to a file, I refactor so it print out to the stdout directly.
Because the filename should not be passed to string as variable anymore, I deleted some unnecessary curly braces.
The ouptut code should look like this

import distutils.sysconfig, io, json, sys 
paths = {
    u'include': u'{0}'.format(distutils.sysconfig.get_python_inc(plat_specific=0)), 
    u'platinclude': u'{0}'.format(distutils.sysconfig.get_python_inc(plat_specific=1)) 
}
value = u'{0}'.format(json.dumps(paths))
print(value)

Sorry, I was copied the ouptut from build_command instead of get_include_path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you said, I tried to remove distutils.sysconfig and use sysconfig instead.

Copy link
Contributor Author

@dqkqd dqkqd Jul 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since sysconfig.get_path does not raise exception like distutils.sysconfig, the c.returncode is always 0. Therefore, those lines below could be refactored

command = [self.python, "-c", py_command.format(tmpfile_path)]
c = subprocess_run(command)
if c.returncode == 0:
paths = {}
with open(tmpfile_path, "r", encoding="utf-8") as fh:
paths = json.load(fh)
if "purelib" in paths:
paths["libdir"] = paths["purelib"] = make_posix(paths["purelib"])
for key in (
"platlib",
"scripts",
"platstdlib",
"stdlib",
"include",
"platinclude",
):
if key in paths:
paths[key] = make_posix(paths[key])
return paths
else:
click.secho(f"Failed to load paths: {c.stderr}", fg="yellow")
click.secho(f"Output: {c.stdout}", fg="yellow")
return None

@dqkqd
Copy link
Contributor Author

dqkqd commented Jul 31, 2022

I noticed that self.get_paths() does not raise any exception. So I think exception handling is unnecessary, and we could safely remove self.get_include_path(), self.get_lib_paths(), self._replace_parent_version() functions.

try:
paths = self.get_paths()
except Exception:
paths = get_paths(
self.install_scheme,
vars={
"base": prefix,
"platbase": prefix,
},
)
current_version = get_python_version()
try:
for k in list(paths.keys()):
if not os.path.exists(paths[k]):
paths[k] = self._replace_parent_version(
paths[k], current_version
)
except OSError:
# Sometimes virtualenvs are made using virtualenv interpreters and there is no
# include directory, which will cause this approach to fail. This failsafe
# will make sure we fall back to the shell execution to find the real include path
paths = self.get_include_path()
paths.update(self.get_lib_paths())
paths["scripts"] = self.script_basedir

@matteius
Copy link
Member

matteius commented Aug 2, 2022

@dqkqd Looks good, nice work! -- do you mind adding a news fragment explaining your change?

@matteius
Copy link
Member

matteius commented Aug 2, 2022

@dqkqd Thanks for adding a news fragment -- the linter wants there to be a newline at the end of that news fragment.

@dqkqd
Copy link
Contributor Author

dqkqd commented Aug 2, 2022

@dqkqd Thanks for adding a news fragment -- the linter wants there to be a newline at the end of that news fragment.

Sorry, I forgot to run pre-commit

@dqkqd
Copy link
Contributor Author

dqkqd commented Aug 2, 2022

Sorry for the mess with the fragment file. I fixed it again.
I also remove exception handling when trying to run self.get_paths().
I'm not sure returncode == 0 inside self.get_paths()should be removed or not. Since py_command run inside suprocess_run, might be it would be better to check in case py_command has error.

Copy link
Member

@matteius matteius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dqkqd It would seem this latest commit is problematic for windows: 3ca7c96

Can it be reverted?

@dqkqd
Copy link
Contributor Author

dqkqd commented Aug 2, 2022

@dqkqd It would seem this latest commit is problematic for windows: 3ca7c96

Can it be reverted?

Yes, I reverted it. Not sure why it didn't work since I don't have windows.

@@ -384,18 +376,13 @@ def get_paths(self) -> Optional[Dict[str, str]]:
:return: The python paths for the environment
:rtype: Dict[str, str]
"""
tmpfile = vistir.path.create_tracked_tempfile(suffix=".json")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bonus points for removing usage of vistir 💯

@oz123 oz123 merged commit 23f4cf3 into pypa:main Aug 3, 2022
@dqkqd dqkqd deleted the issue-5180-tempfiles-are-leak-in-tmp branch August 5, 2022 10:03
@matteius
Copy link
Member

matteius commented Aug 7, 2022

@dqkqd I just want to say again what awesome work you did with this change -- to be able to eliminate some file usage altogether is amazing for performance, and I think we see that in the new measurements that @oz123 pointed me at this evening, which are provided by the head of Lincoln Loop: https://lincolnloop.github.io/python-package-manager-shootout/
Its hard to look back on the historical data presently, but its in the github actions and from the tweet I saw before, the variance of the mean time to lock was reduced as was the overall time. Nice work!

@dqkqd
Copy link
Contributor Author

dqkqd commented Aug 7, 2022

@matteius Thank you so much. This is my first time contributing to open sources. Sorry for messing up with git.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pipenv update on linux using python 3.10 leaves files in /tmp/
3 participants