[tune] Option to not override working_dir #29128
Comments
Labeling this as high prio due to internal needs. This does not need to land in 2.1 release. Assignee should take this issue and get consensus on the API on this thread. |
@justinvyu can you take a look at this? (cc @krfricke) Some original context: #9571. It would be helpful to collect use cases and possible considerations. Off the top of my head, there are a few directories the user may want to use (though I haven't verified which of these are practical):
|
Here's a summary of my understanding of the problem and my proposed solution:
Problem
Proposed Solution
Discussion
Example Usage
Option 1: Setting
|
Sounds good to me. The flag is a bit long though, how about chdir_to_trial_dir? |
Looks good. Any reason not to just always store both |
I thought the "original" working dir concept would be a bit confusing if the working directory is never changed (from setting the flag to false). |
Note: Updated proposal to set the flag in |
Agreed we should always set both flags. Sometimes present env vars are not something users will have the patience to understand. |
This looks good to me. IMO we should just point users to use This is a breaking API change, so should we default the flag to True for now and move to False after 2.2? |
Yeah, I think Currently defaulting to True. Do you think it makes sense to set it to False by default for 2.2 based on the discussion on #9571? I was proposing just keeping it True. |
cc @jiaodong |
The main question I have is what we should default to if we don't use a runtime env and the worker node does not have the current working dir locally available |
Regarding @krfricke's comment: the problem with the flag is that someone might run into the case where they have multiple nodes, and the working directory with the driver script may not exist on each node. This would not happen if
Updated Proposal:
Instead of setting a flag, we should control whether or not to change the working directory by checking if the user passed in a

    from pathlib import Path

    import ray
    from ray.air import session
    from ray.tune import Tuner

    def train_func(config):
        # Read from relative paths
        print(open("./hello.txt").read())
        # Still need to tell users to write to the Tune logdir
        # This would be a new API that gets the trial dir for Tune session
        # and trial_dir/rank_x for Train sessions within a Tune session
        tune_log_dir = Path(session.get_log_dir())
        with open(tune_log_dir / "write.txt", "w") as f:
            f.write("test write")

    ray.init(runtime_env={"working_dir": "."})
    tuner = Tuner(
        train_func,
        ...,  # No flag anymore
    )

New
|
I'd be happy with that. We should make sure to document this cleanly and encourage users to always write to |
This will be confusing to users since runtime_env is a config set outside of the Tune job (spooky action at a distance). Actually, I don't see a huge problem with the working dir not existing. Why not do the following:
In other words, no action is required here. We just don't chdir to anything, it's whatever it happens to be. |
Sounds good to me! |
See #29128 for more context on the problem. This PR does the following:
1. **Fix `TUNE_ORIG_WORKING_DIR`** to pull the correct current working directory within the worker process when the Tune trial logdir is being created. This PR deprecates this environment variable as it's confusing to get Tune metadata from sources other than `session`.
2. **Introduce a `chdir_to_trial_dir` flag** in the TuneConfig that defaults to `True`, which configures whether or not Tune should change the working directory of each worker to its corresponding trial directory.
   - If this flag is set to `False`, the user may still want to access the Tune trial directory. This can be done with a **newly added `session.get_trial_dir()` API.**
3. Make the `TUNE_ORIG_WORKING_DIR` deprecation, `chdir_to_trial_dir` flag, and `session.get_trial_dir()` more visible in the documentation with an **example in the Tune FAQ.**

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
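To make the flag's semantics concrete, here is a toy model using only the Python standard library. It is a sketch, not Ray's implementation: `run_trial`, `train_fn`, and `demo` are hypothetical names invented for illustration, and the trial directory is passed in explicitly as a stand-in for what the PR describes `session.get_trial_dir()` returning.

```python
import os
import tempfile
from pathlib import Path


def run_trial(train_fn, trial_dir, chdir_to_trial_dir=True):
    """Hypothetical runner: optionally chdir into trial_dir, then call train_fn."""
    previous_dir = os.getcwd()
    os.makedirs(trial_dir, exist_ok=True)
    try:
        if chdir_to_trial_dir:
            os.chdir(trial_dir)
        # The trial directory is handed over explicitly (illustrative stand-in
        # for a session-style lookup).
        return train_fn(Path(trial_dir))
    finally:
        os.chdir(previous_dir)


def train_fn(trial_dir):
    # Inputs are read relative to whatever the current working directory is.
    can_read_input = Path("./hello.txt").exists()
    # Outputs always go under the trial directory, never a relative path.
    (trial_dir / "write.txt").write_text("test write")
    return can_read_input


def demo():
    start_dir = os.getcwd()
    with tempfile.TemporaryDirectory() as project, tempfile.TemporaryDirectory() as results:
        try:
            # Pretend the user's script lives next to hello.txt.
            os.chdir(project)
            Path("hello.txt").write_text("hello")
            trial_a = Path(results) / "trial_a"
            trial_b = Path(results) / "trial_b"
            with_chdir = run_trial(train_fn, trial_a, chdir_to_trial_dir=True)
            without_chdir = run_trial(train_fn, trial_b, chdir_to_trial_dir=False)
            wrote_both = (trial_a / "write.txt").exists() and (trial_b / "write.txt").exists()
        finally:
            os.chdir(start_dir)  # leave the temp dirs before they are removed
    return with_chdir, without_chdir, wrote_both


print(demo())
```

In this model the relative read only succeeds when the flag is off, while writes land in the trial directory either way because they use the explicit path.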
Currently, Tune always sets the working dir to ~/ray_results/<trial_scoped_dir>. This can be very confusing for users that depend on relative paths.
We should (1) add a top-level run config option like `should_chdir` to make this very visible, (2) improve docs here, and (3) consider making this the default.
Another issue is that TUNE_ORIG_WORKING_DIR isn't pointing to the runtime_env dir, but the dir on the driver, which isn't the right one in this case. It should point to the runtime_env dir (dir on the worker before the chdir).
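The relative-path confusion described above can be reproduced without Ray. The following is a minimal sketch using only the standard library; the temporary directories and the `trial_0` name are assumptions standing in for a project directory and a trial-scoped directory, and `demo_relative_path_after_chdir` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def demo_relative_path_after_chdir():
    """Return (found_before, found_after, found_via_original) for ./hello.txt."""
    start_dir = os.getcwd()
    with tempfile.TemporaryDirectory() as project, tempfile.TemporaryDirectory() as results:
        try:
            # The "project" directory holds a file the script reads by relative path.
            os.chdir(project)
            Path("hello.txt").write_text("hello")
            found_before = Path("./hello.txt").exists()

            # Capture the working directory before it is switched.
            original_dir = os.getcwd()

            # Stand-in for the trial-scoped directory mentioned in the issue.
            trial_dir = Path(results) / "trial_0"
            trial_dir.mkdir()
            os.chdir(trial_dir)

            # The same relative path now resolves inside the trial directory.
            found_after = Path("./hello.txt").exists()
            # Anchoring on the captured directory still finds the file.
            found_via_original = (Path(original_dir) / "hello.txt").exists()
        finally:
            os.chdir(start_dir)  # leave the temp dirs before they are removed
    return found_before, found_after, found_via_original


print(demo_relative_path_after_chdir())
```

The captured directory is only useful if it is recorded in the same process, just before the change of directory, which is the point the issue makes about where the original working dir should be taken from.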