Conversation

@jjyao
Collaborator

@jjyao jjyao commented Jun 2, 2025

Why are these changes needed?

Vendor setproctitle by copying its C source code and exposing it to Python via Cython (upstream exposes the same C source code through a Python C extension).

The original setproctitle has a side effect on import, due to dvarrazzo/py-setproctitle#114: it changes psutil.Process().cmdline() from ["python", "script.py", "--flag"] to ["python script.py --flag", "", ""]. This PR avoids that by initializing and changing the process title only when setproctitle() is called.
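
To make the before/after values concrete: on Linux, a process's command line is exposed as NUL-separated bytes in /proc/<pid>/cmdline. The sketch below is a simplified parser for that format, not psutil's actual implementation, and the "after" buffer is a hypothetical layout chosen only so that parsing it yields the list quoted above.

```python
def parse_cmdline(raw: bytes) -> list[str]:
    """Split a /proc/<pid>/cmdline-style buffer into arguments.

    Simplified rule: drop one trailing NUL terminator, then split on NUL.
    """
    text = raw.decode()
    if text.endswith("\x00"):
        text = text[:-1]
    return text.split("\x00")


# Untouched argv: every argument is NUL-terminated.
before = b"python\x00script.py\x00--flag\x00"

# Hypothetical buffer after the argv area is rewritten as a single title
# string followed by leftover NUL bytes (an assumption for illustration).
after = b"python script.py --flag\x00\x00\x00"

print(parse_cmdline(before))  # ['python', 'script.py', '--flag']
print(parse_cmdline(after))   # ['python script.py --flag', '', '']
```

The empty trailing entries come from the extra NUL bytes, which is why the argument list keeps its length but loses its boundaries.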

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
@jjyao jjyao added the go add ONLY when ready to merge, run all tests label Jun 2, 2025
python/setup.py Outdated
)
# Vendor setproctitle which is a C extension by
# copying the so file to the ray/private/thirdparty/ folder.
subprocess.check_call(
Collaborator Author

@jjyao jjyao Jun 2, 2025

Actually why not just declare setproctitle as a default dependency of Ray? What's the drawback of it? https://discuss.python.org/t/can-vendoring-dependencies-in-a-build-be-officially-supported

Case 1: I maintain a package which is a building-block for many of our internal applications and some of our customers’ applications, so maximal compatibility is a goal. This leads to us not wanting to use many libraries, as there are conflicts this would or could introduce with the downstream applications. e.g. I don’t want to use jsonschema because we have applications which use specific versions of it.

That makes sense, but unless you depend on packages with very constrained dependencies, or you yourself depend on a very constrained version of that package, it should not be an issue.
The example you gave, jsonschema, does not have any runtime dependencies, so unless you need a very constrained version of it, depending on it should not really cause any issues.

setproctitle doesn't have dependencies, so depending on it should be just fine?

Collaborator Author

@aslonnie thoughts?

Collaborator Author

Even if I don’t need a constrained version of it today, any time it does a breaking change (major release if you use semver), I need to consider it. jsonschema recently released 4.0, so using it would put us in the position of having to support jsonschema 3.x and 4.x in the same codebase.

well, this can be a reason for vendoring.

Collaborator Author

The main reason is that, if one of those dependencies has a security
vulnerability, a new or patched version of that dependency needs to
be re-vendored and a new version of the “immutable artifact”
published/installed even if your actual project has no new changes
at all. SBoM efforts go some way toward alleviating this concern,
but it’s still an added challenge and possible delay for the end
user who is otherwise left with a steaming pile of compromised
systems. Multiply it by all of the different things you’ve installed
each of which contains vendored dependencies and the situation can
quickly become unmanageable.

This is the reason for not vendoring.

Contributor

Less vendoring is better in my opinion. conda-forge has been quite successful in patching ray to declare a regular dependency without vendoring.

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
python/setup.py Outdated
"-m",
"pip",
"install",
"-q",
Collaborator

maybe add --no-deps

python/setup.py Outdated
Comment on lines 605 to 606
# Vendor setproctitle which is a C extension by
# copying the so file to the ray/private/thirdparty/ folder.
Collaborator

consider putting it under not os.getenv("SKIP_THIRDPARTY_INSTALL_CONDA_FORGE"): ? conda-forge does not really like this kind of vendoring afaik.

cc @mattip

we really just use two functions, setproctitle and getproctitle. we can put a stub file that redirects to

you can put a stub or wrapper file that redirects to either ray._private.thirdparty or external setproctitle
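
A minimal sketch of what such a wrapper could look like, assuming a simple "first importable candidate wins" rule. The module names passed in below are hypothetical placeholders (they are not real Ray paths), and the in-memory fallback only caches the title; it does not touch the OS-visible process title.

```python
import importlib
import types


def _cache_only_fallback() -> types.SimpleNamespace:
    """Stand-in used when no candidate imports; it only remembers the title."""
    state = {"title": ""}

    def setproctitle(title: str) -> None:
        state["title"] = title

    def getproctitle() -> str:
        return state["title"]

    return types.SimpleNamespace(
        setproctitle=setproctitle, getproctitle=getproctitle
    )


def load_setproctitle(candidates: list[str]):
    """Return the first importable module that exposes both functions."""
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue
        if hasattr(module, "setproctitle") and hasattr(module, "getproctitle"):
            return module
    return _cache_only_fallback()


# Hypothetical module names, for illustration only: a vendored copy first,
# then an externally installed one.
spt = load_setproctitle(
    ["hypothetical_vendored_setproctitle", "hypothetical_external_setproctitle"]
)
spt.setproctitle("ray::demo")
print(spt.getproctitle())  # ray::demo (served by the fallback here)
```

Callers would import only the wrapper, so switching between a vendored and an external copy becomes a matter of reordering the candidate list.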

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
@pcmoritz
Contributor

pcmoritz commented Jun 2, 2025

The north star for Ray Core (pip install ray) is to have zero python dependencies, so declaring it as a dependency is not an option.

The ideal would be to just set the process title in the C++ code / integrate it properly into the build system, is that an option people are considering for this?

jjyao added 2 commits June 2, 2025 13:01
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
jjyao added 3 commits June 2, 2025 14:53
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
@jjyao
Collaborator Author

jjyao commented Jun 4, 2025

import ray
import setproctitle

@ray.remote
def f(x):
    assert setproctitle.getproctitle() == "ray::special_f"
    return x + 1

This test now fails because of how setproctitle works. setproctitle mainly provides two functions: setproctitle() and getproctitle(). setproctitle() makes system calls to update the process title (e.g. https://man.freebsd.org/cgi/man.cgi?query=setproctitle&sektion=3&format=html) AND stores the new title in a global variable. getproctitle() simply returns the previously set title from that global variable (because there is no system call to get the process title). The issue is that the setproctitle we now vendor is a different copy from the one inside site-packages/, so if you set the title via one copy of the library and try to read it via the other, it won't work because the two copies don't share the same global variable.
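
The mismatch can be modeled in plain Python. This is only a sketch: each "copy" below is a hypothetical stand-in for one loaded instance of the library, with a private dict playing the role of the C global variable, and the initial title "python" is an assumption.

```python
import types


def make_setproctitle_copy() -> types.SimpleNamespace:
    """Build one independent copy of a setproctitle-like module."""
    cache = {"title": "python"}  # assumed initial title, for illustration

    def setproctitle(title: str) -> None:
        # The real library would also update the OS-visible title here.
        cache["title"] = title

    def getproctitle() -> str:
        # Reads only this copy's cache, never the other copy's.
        return cache["title"]

    return types.SimpleNamespace(
        setproctitle=setproctitle, getproctitle=getproctitle
    )


vendored = make_setproctitle_copy()       # the copy shipped inside the package
site_packages = make_setproctitle_copy()  # the copy a user imports directly

vendored.setproctitle("ray::special_f")
print(vendored.getproctitle())       # ray::special_f
print(site_packages.getproctitle())  # python
```

The second copy never sees the update, which mirrors why the assertion in the test above fails once the title is set through the vendored copy.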

@aslonnie
Collaborator

aslonnie commented Jun 5, 2025

hmm.. do we want ray users to rely on setproctitle.getproctitle() to do things?

@aslonnie
Collaborator

aslonnie commented Jun 5, 2025

feels like ray core should provide a ray.getproctitle() api and ask users to use that instead.

because getproctitle() basically is not really a thing without the global setproctitle package.

or we cannot vendor setproctitle.. we can maybe have an optional/legacy behavior to call the global setproctitle.setproctitle() when it exists, just to grandfather the old behavior.

jjyao added 8 commits June 5, 2025 11:01
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
@jjyao jjyao changed the title Vendor setproctitle [Core] Vendor setproctitle Jun 9, 2025
jjyao added 2 commits June 9, 2025 14:31
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
@@ -1,7 +1,7 @@
"""
This script ensures python files conform to ray's import ordering rules.
In particular, we make sure psutil and setproctitle is imported _after_
Collaborator Author

No need to enforce the import order for setproctitle since we vendor it and no longer rely on sys.path.

@ray.remote
def f(x):
    assert setproctitle.getproctitle() == "ray::special_f"
    assert psutil.Process().cmdline()[0] == "ray::special_f"
Collaborator Author

This is a better check since it verifies that the process title was actually changed underneath, while getproctitle() just returns the cached title previously set by setproctitle(). In other words, the previous check only confirmed that setproctitle() was called, not that the actual process title changed.

class Foo:
    def method(self, name):
        assert setproctitle.getproctitle() == f"ray::{name}"
        assert psutil.Process().cmdline()[0] == f"ray::{name}"
Collaborator Author

Same reason as above: checking psutil.Process().cmdline() is better than checking getproctitle().

Comment on lines 3 to 19
ray_cc_library(
    name = "setproctitle",
    srcs = glob(["setproctitle/spt*.c"]) + select({
        "@platforms//os:macos": ["setproctitle/darwin_set_process_name.c"],
        "//conditions:default": [],
    }),
    hdrs = glob(["setproctitle/spt*.h"]) + ["setproctitle/c.h"] + select({
        "@platforms//os:macos": ["setproctitle/darwin_set_process_name.h"],
        "//conditions:default": [],
    }),
    deps = ["@local_config_python//:python_headers"],
    local_defines = select({
        "@platforms//os:linux": ["HAVE_SYS_PRCTL_H"],
        "@platforms//os:macos": ["__darwin__"],
        "//conditions:default": [],
    }),
)
Collaborator Author

This is basically what setproctitle's setup.py does.

@jjyao jjyao marked this pull request as ready for review June 9, 2025 22:34
@jjyao jjyao requested review from a team, edoakes and richardliaw as code owners June 9, 2025 22:34
@jjyao jjyao requested a review from aslonnie June 9, 2025 22:35
Collaborator

@aslonnie aslonnie left a comment

pretty cool!

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
@jjyao jjyao merged commit 385e000 into ray-project:master Jun 10, 2025
5 checks passed
@jjyao jjyao deleted the jjyao/setproctitle branch June 10, 2025 07:00
elliot-barn pushed a commit that referenced this pull request Jun 18, 2025
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Jul 2, 2025
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Labels

go add ONLY when ready to merge, run all tests

4 participants