Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of as much subprocess as possible, and get rid of passing args via CLI where possible #1056

Closed
KOLANICH opened this issue Sep 15, 2022 · 7 comments · Fixed by #1059
Closed

Comments

@KOLANICH
Copy link

subprocess is a potential security issue and usage of it should be minimized.

Describe the solution you'd like
0. bash install.sh is forbidden: it is directly code execution!

  1. git is used through API provided by numerous python packages for working with git
  2. wget is replaced with aria2c getting their args via a named pipe.
  3. calling tar through CLI is prohibited, use packages to access tar archives instead

Describe alternatives you've considered
No such.

@Wauplin
Copy link
Contributor

Wauplin commented Sep 16, 2022

Hi @KOLANICH, thanks for reporting :) I think what you mostly refer to here is the install_lfs_in_userspace function that execute code from the web. Since it is a legacy helper with no big importance (and undocumented), we can remove it.

Apart from that, do you see any other particularly insecure parts ? As @LysandreJik mentioned to me, the other subprocess calls are restrained to git calls with shell=True.

@KOLANICH
Copy link
Author

closed this as completed

Not quite. shell=True is not used, and it is very good, since eliminates a class of attacks. But still, if something can be dealt via API, it should be dealt via API.

@Wauplin
Copy link
Contributor

Wauplin commented Sep 20, 2022

But still, if something can be dealt via API, it should be dealt via API.

Hi @KOLANICH, I agree with this statement but we also try to avoid as much as possible external dependencies in huggingface_hub. Of course we cannot get rid of all dependencies so it's more a tradeoff we try to keep reasonable.

For git-related commands, are you referring to GitPython ? If yes, what would be the benefit in using it instead of subprocess calls ?

@julien-c
Copy link
Member

julien-c commented Sep 20, 2022

GitPython is using subprocess calls to call git command line too, no?

@KOLANICH
Copy link
Author

For git-related commands, are you referring to GitPython ? I

Not necessarily, there are various tools, each of them deals with git repo format itself to some extent. The one that has the full git protocol (except LFS, which is not part of git protocol and which is usally dealt with an external binary) available via bindings is pygit2. dulwich also aims to reimplement full git protocol in python, but by now it uses subprocesses. GitOxide currently has no pythkn bindings. Also there is a Java project jgit, which contains not only jgit lib, but also a jgit.jfs, which implements Git LFS in Java. Such libs can be called from Python through Jpype, but I guess JVM is an unnecessary dependency here, so for LFS we can keep subprocesses for now.

@Wauplin
Copy link
Contributor

Wauplin commented Sep 21, 2022

@KOLANICH thanks for the review of the existing packages. I didn't know all of them to be honest.
However for what I understand we don't have a satisfying solution here to replace subprocess call.

  • pygit2 -> that would require users to install libgit2 as well. Most of our users already have git installed which doesn't require additional work on there side.
  • jgit -> indeed we aim to have a package as easy to use as possible. A JVM dependency is not very suitable.
  • dulwich / GitOxide -> seems to be interesting projects but not mature enough/not usable from python

And in the end I am still not sure to understand the benefit of not using subprocess for our use case.

@KOLANICH
Copy link
Author

And in the end I am still not sure to understand the benefit of not using subprocess for our use case.

It's OK not to use it where it has significant enough drawbacks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants