-
Notifications
You must be signed in to change notification settings - Fork 7k
[Core] Vendor setproctitle #53471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Vendor setproctitle #53471
Conversation
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
python/setup.py
Outdated
| ) | ||
| # Vendor setproctitle which is a C extension by | ||
| # copying the so file to the ray/private/thirdparty/ folder. | ||
| subprocess.check_call( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually why not just declare setproctitle as a default dependency of Ray? What's the drawback of it? https://discuss.python.org/t/can-vendoring-dependencies-in-a-build-be-officially-supported
Case 1: I maintain a package which is a building-block for many of our internal applications and some of our customers’ applications, so maximal compatibility is a goal. This leads to us not wanting to use many libraries, as there are conflicts this would or could introduce with the downstream applications. e.g. I don’t want to use jsonschema because we have applications which use specific versions of it.
That makes sense, but unless you depend on packages with very constrained dependencies, or you yourself depend on a very constrained version of that package, it should not be an issue.
The example you gave, jsonschema, does not have any runtime dependencies, so unless you need a very constrained version of it, depending on it should not really cause any issues.
setproctitle doesn't have dependencies so dpending on it should be just fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aslonnie thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if I don’t need a constrained version of it today, any time it does a breaking change (major release if you use semver), I need to consider it. jsonschema recently released 4.0, so using it would put us in the position of having to support jsonschema 3.x and 4.x in the same codebase.
well, this can be a reason for vendoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason is that, if one of those dependencies has a security
vulnerability, a new or patched version of that dependency needs to
be re-vendored and a new version of the “immutable artifact”
published/installed even if your actual project has no new changes
at all. SBoM efforts go some way toward alleviating this concern,
but it’s still an added challenge and possible delay for the end
user who is otherwise left with a steaming pile of compromised
systems. Multiply it by all of the different things you’ve installed
each of which contains vendored dependencies and the situation can
quickly become unmanageable.
This is the reason for not vendoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less vendoring is better in my opinion. conda-forge has been quite successful in patching ray to declare a regular dependency without vendoring.
python/setup.py
Outdated
| "-m", | ||
| "pip", | ||
| "install", | ||
| "-q", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add --no-deps
python/setup.py
Outdated
| # Vendor setproctitle which is a C extension by | ||
| # copying the so file to the ray/private/thirdparty/ folder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider put it under not os.getenv("SKIP_THIRDPARTY_INSTALL_CONDA_FORGE"): ? conda forge does not really like these kind of vendoring afaik.
cc @mattip
we really just use setproctitle and getproctitle these two functions. we can put a stub file that redirects to
you can put a stub or wrapper file that redirects to either ray._private.thirdparty or external setproctitle
|
The north star for Ray Core ( The ideal would be to just set the process title in the C++ code / integrate it properly into the build system, is that an option people are considering for this? |
This test now fails because how |
|
hmm.. do we want ray users to rely on |
|
feels that ray core should provide an api of because or we cannot vendor |
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
| @@ -1,7 +1,7 @@ | |||
| """ | |||
| This script ensures python files conform to ray's import ordering rules. | |||
| In particular, we make sure psutil and setproctitle is imported _after_ | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to enforce the import order for setproctitle since we vendor it and doesn't rely on sys.path anymore.
| @ray.remote | ||
| def f(x): | ||
| assert setproctitle.getproctitle() == "ray::special_f" | ||
| assert psutil.Process().cmdline()[0] == "ray::special_f" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better checking since it actually makes sure the process title is actually changed underneath while getproctitle() just returns the cached title previously set by setproctitle(). In other words, the previous check only checks that setproctitle() was called without checking if the actual process title was changed or not.
| class Foo: | ||
| def method(self, name): | ||
| assert setproctitle.getproctitle() == f"ray::{name}" | ||
| assert psutil.Process().cmdline()[0] == f"ray::{name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same reason as above, checking psutil.Process().cmdline() is better than getproctitle()
src/ray/thirdparty/BUILD.bazel
Outdated
| ray_cc_library( | ||
| name = "setproctitle", | ||
| srcs = glob(["setproctitle/spt*.c"]) + select({ | ||
| "@platforms//os:macos": ["setproctitle/darwin_set_process_name.c"], | ||
| "//conditions:default": [], | ||
| }), | ||
| hdrs = glob(["setproctitle/spt*.h"]) + ["setproctitle/c.h"] + select({ | ||
| "@platforms//os:macos": ["setproctitle/darwin_set_process_name.h"], | ||
| "//conditions:default": [], | ||
| }), | ||
| deps = ["@local_config_python//:python_headers"], | ||
| local_defines = select({ | ||
| "@platforms//os:linux": ["HAVE_SYS_PRCTL_H"], | ||
| "@platforms//os:macos": ["__darwin__"], | ||
| "//conditions:default": [], | ||
| }), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically what setproctitle.setup.py does.
aslonnie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pretty cool!
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Why are these changes needed?
Vendor
setproctitleby copying its C source code and expose it to Python via Cython (the original C source code is exposed via Python C extension).The original
setproctitlehas a side effect when you import it due to dvarrazzo/py-setproctitle#114 (it will changepsutil.Process().cmdline()from ["python", "script.py", "--flag"] to ["python script.py --flag", "", ""]) and this PR avoids that by only initializing and changing process title whensetproctitle()is called.Related issue number
Checks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.