[Issue Tracker] PyTorch distributed RPC #96
@XuehaiPan But why does the issue occur? I checked all of my kernel TCP configuration and resource limits, as shown below, and everything looks right here:

net.ipv4.ip_local_port_range = 10000 65535
net.netfilter.nf_conntrack_tcp_timeout_close = 10
-t: cpu time (seconds)  unlimited

Actually, the maximum number of simultaneous connections depends on two things. One is the net.ipv4.tcp_max_syn_backlog setting; my value, net.ipv4.tcp_max_syn_backlog = 16384, is large enough for the simultaneous connections here. The other is the backlog argument of the listen(fd, backlog) call in the C++ code. I don't know whether my analysis is right, but after applying my patch above to the RPC code, my test code works every time.
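The backlog limit described in the comment can be observed with a plain socket, independent of PyTorch. Below is a minimal sketch (my own, not from the thread; the backlog value and connection count are illustrative): a listener with a small backlog that never calls accept(), so the accept queue fills and later connection attempts stall. Exact behavior depends on kernel settings such as net.core.somaxconn and net.ipv4.tcp_max_syn_backlog.

```python
import socket

BACKLOG = 4  # deliberately small; the kernel also caps this at net.core.somaxconn

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # bind to any free loopback port
server.listen(BACKLOG)
host, port = server.getsockname()

# Open more connections than the backlog without ever calling accept(),
# so the accept queue fills up and later attempts stall or time out.
clients = []
for i in range(BACKLOG * 3):
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.settimeout(1.0)
    try:
        c.connect((host, port))
        clients.append(c)
        print(f"connection {i}: established")
    except (socket.timeout, ConnectionRefusedError) as exc:
        print(f"connection {i}: failed ({exc!r})")
        c.close()

for c in clients:
    c.close()
server.close()
```

This mirrors the comment's point: once the listener's queue is exhausted, additional simultaneous connections fail regardless of how generous the other kernel limits are, which is why enlarging the backlog passed to listen() in the RPC code helps.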
This is an issue tracker for the upstream issues:

- Initialize RPC with large world size:
- Pass nn.Module and nn.Parameter as RPC argument: nn.Parameter as RPC argument automatically detaches from the computation graph (pytorch/pytorch#86525; see the sketch below)
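For the second item, here is a minimal two-process sketch (my own, not from the issue; the worker names, port, and tensor shape are illustrative) that sends an nn.Parameter over RPC and inspects what arrives on the remote side. Per the upstream report, the parameter does not survive the trip attached to the sender's computation graph.

```python
import os

import torch
import torch.distributed.rpc as rpc
import torch.multiprocessing as mp


def describe(tensor):
    # Runs on the remote worker: report what actually arrived.
    return type(tensor).__name__, tensor.requires_grad


def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)
    if rank == 0:
        param = torch.nn.Parameter(torch.ones(2))
        info = rpc.rpc_sync("worker1", describe, args=(param,))
        # Per pytorch/pytorch#86525, the received value is detached from
        # the sender's autograd graph rather than arriving as a live
        # Parameter, so gradients cannot flow back through the RPC call.
        print(info)
    rpc.shutdown()


if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```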