Scheduling error in ray multi-machine cluster mode #24

flymysql · 2025-03-04T09:48:23Z

When I deploy smallpond on two machines and submit a job from machine A, any task that gets scheduled onto machine B fails with an error saying the file path cannot be found.

I checked the file path. It is generated when machine A initializes the session, but machine B uses the same path when it executes tasks. Machine B's initial data path should be different from machine A's.

wangrunji0408 · 2025-03-05T15:44:58Z

You should set a data_root that is accessible to both A and B.

sp = smallpond.init(data_root="shared/path")

In your case it is not set, so it defaults to a path under your home directory.
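A minimal pre-flight sketch (not part of smallpond; `resolve_data_root` is a hypothetical helper) that makes a candidate data_root absolute and rejects anything under the home directory. It assumes home directories are local to each machine; if your home directory is itself on a shared file system, this check is too strict.

```python
import os
from typing import Optional


def resolve_data_root(data_root: str, home: Optional[str] = None) -> str:
    """Return an absolute data_root, refusing paths that look node-local.

    Assumption: anything under the home directory is local to each machine.
    """
    # A relative path such as "shared/path" is resolved against the current
    # working directory, so turn it into an absolute path first.
    path = os.path.abspath(os.path.expanduser(data_root))
    home = os.path.abspath(os.path.expanduser(home or "~"))
    # commonpath equals `home` exactly when `path` is `home` or lies inside it.
    if os.path.commonpath([path, home]) == home:
        raise ValueError(
            f"data_root {path!r} is under the home directory {home!r}, "
            "which other nodes may not be able to see"
        )
    return path
```

The returned path would then be passed as `smallpond.init(data_root=...)`. Note that a relative placeholder like `shared/path` becomes whatever directory the process happens to be running in, which can differ from node to node.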

flymysql · 2025-03-06T03:46:40Z


Well, I have solved this problem, but it seems that data_root needs to be set to a directory on a FUSE mount of 3FS or HDFS. That way, the contents of data_root created when a session is initialized are also visible on the other machine nodes.

In fact, the other Ray nodes do not create smallpond's data_root directory themselves, so they have to rely on a distributed file system (3FS or similar) to see it.
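To confirm that a candidate data_root really sits on a FUSE or network mount, here is a Linux-only sketch that finds the mount containing a path by reading `/proc/mounts`. `mount_of` and `looks_shared` are hypothetical helpers, and the set of "shared" filesystem types is an assumption: on Linux, FUSE mounts are listed with type `fuse` or `fuse.<subtype>`, but the exact subtype 3FS or an HDFS FUSE client reports is not assumed here.

```python
import os
from typing import Optional, Tuple


def mount_of(path: str, mounts_text: Optional[str] = None) -> Tuple[str, str]:
    """Return (mount_point, fs_type) of the mount that contains `path`."""
    if mounts_text is None:
        with open("/proc/mounts") as f:  # Linux only
            mounts_text = f.read()
    path = os.path.realpath(path)
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts escapes spaces in mount points as \040.
        mount_point = fields[1].replace("\\040", " ")
        fs_type = fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # The longest matching mount point is the one that actually holds `path`.
        if inside and len(mount_point) > len(best[0]):
            best = (mount_point, fs_type)
    return best


def looks_shared(fs_type: str) -> bool:
    # Assumption: FUSE mounts (3FS, HDFS FUSE, ...) and NFS count as shared.
    return fs_type.startswith("fuse") or fs_type in {"nfs", "nfs4"}
```

For example, `looks_shared(mount_of(data_root)[1])` run on every node gives a quick signal before calling `smallpond.init`; it does not prove that all nodes mount the same file system, only that the path is not on a local disk.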

miao404 · 2025-03-06T13:16:03Z


Hello, I have the same problem. How did you solve it?
