Running benchmarks/gray_sort_benchmark.py on a Ray cluster, but tasks only run on the current node and are not scheduled to other nodes #29

Open
nokia-t1zhou opened this issue Mar 6, 2025 · 3 comments

Comments

@nokia-t1zhou

My Ray cluster has 2 machines:

```
ray status
======== Autoscaler status: 2025-03-06 18:37:53.961968 ========
Node status
Active:
1 node_0b968bac8bda0fb5f5cca5dc059644dbf79928938340165899229a39
1 node_e5dcd12667c418417567f04f0754fd4d00498ba28af97e93e310aa77
Pending:
(no pending nodes)
Recent failures:
(no failures)

Resources
Usage:
0.0/352.0 CPU
0B/3.51TiB memory
0B/372.53GiB object_store_memory

Demands:
(no resource demands)
```

On one of the nodes, I started gray_sort_benchmark.py with the following commands:
export SP_RAY_ADDRESS=10.107.204.154:26379
export SP_DATA_ROOT=/beegfs/smallpond
python3 benchmarks/gray_sort_benchmark.py ray -T 51200000000 -n 1 -t 10 -V

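However, I only see tasks running on the current node; nothing is scheduled to the second node. Could you help figure out why?
The run log is attached.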

log.txt

@miao404

miao404 commented Mar 6, 2025

Hi, while we're on the topic, I'd like to ask how to deploy smallpond on a cluster.

@nokia-t1zhou
Author

I added the environment variable SP_NUM_EXECUTORS, and that solved the problem.
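
For anyone hitting the same symptom, here is a minimal sketch of the same launch with SP_NUM_EXECUTORS set, written as a small Python launcher so the environment is explicit. The value used in this thread is not stated; 2 (one executor per node) is only an assumption, and `build_env`, `build_command`, and `launch` are hypothetical helper names, not part of smallpond.

```python
import os
import subprocess
import sys


def build_env(ray_address, data_root, num_executors, base_env=None):
    """Return a copy of the environment with the SP_* variables from this thread set."""
    env = dict(os.environ if base_env is None else base_env)
    env["SP_RAY_ADDRESS"] = ray_address
    env["SP_DATA_ROOT"] = data_root
    # The variable that resolved the issue; the right value depends on the cluster.
    env["SP_NUM_EXECUTORS"] = str(num_executors)
    return env


def build_command(script="benchmarks/gray_sort_benchmark.py"):
    """Same benchmark arguments as in the original report."""
    return [sys.executable, script, "ray",
            "-T", "51200000000", "-n", "1", "-t", "10", "-V"]


def launch(num_executors=2):
    """Run the benchmark from the repository root (num_executors=2 is an assumption)."""
    env = build_env("10.107.204.154:26379", "/beegfs/smallpond", num_executors)
    return subprocess.run(build_command(), env=env, check=True)
```

Calling `launch()` from the smallpond repository root should be equivalent to adding `export SP_NUM_EXECUTORS=2` before the `python3` line in the original commands.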

@nokia-t1zhou
Author

> Hi, while we're on the topic, I'd like to ask how to deploy smallpond on a cluster.

  1. Install smallpond with pip
  2. Deploy distributed storage, preferably 3FS
  3. Deploy a Ray cluster
  4. Use the smallpond API to process data
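
As a rough illustration of step 4, the sketch below follows the quick-start pattern from the smallpond README. The file paths and the `ticker`/`price` columns are placeholders, `run_example` is a hypothetical wrapper name, and it is an assumption that `smallpond.init()` with no arguments picks up the SP_* environment variables used earlier in this thread.

```python
def run_example(input_path="prices.parquet", output_dir="output/"):
    """Read a Parquet file, aggregate per ticker, and write the result."""
    # Imported lazily so this sketch can be loaded on machines without smallpond.
    import smallpond

    # Assumption: cluster settings (SP_RAY_ADDRESS, SP_DATA_ROOT,
    # SP_NUM_EXECUTORS) come from the environment.
    sp = smallpond.init()

    # Load the input and hash-partition it by the (placeholder) ticker column.
    df = sp.read_parquet(input_path)
    df = df.repartition(3, hash_by="ticker")

    # Run the SQL on each partition; {0} refers to the first DataFrame argument.
    df = sp.partial_sql(
        "SELECT ticker, min(price), max(price) FROM {0} GROUP BY ticker", df
    )

    # Persist the result and also return it as a pandas DataFrame.
    df.write_parquet(output_dir)
    return df.to_pandas()
```

On a multi-node cluster, the input and output paths would presumably need to live on the shared storage from step 2 so that every node can reach them.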
