Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace 0.0.0.0 with 127.0.0.1 for client connections #7766

Merged
merged 23 commits into from
May 2, 2021
Merged

Replace 0.0.0.0 with 127.0.0.1 for client connections #7766

merged 23 commits into from
May 2, 2021

Conversation

rkimball
Copy link
Contributor

@rkimball rkimball commented Mar 29, 2021

Also change references to 127.0.0.1 to localhost so that all references are consistent.

On Windows default 0.0.0.0 is not mapped to 127.0.0.1 as it is on Linux and it should not be. 0.0.0.0 is used for an undefined address, not localhost.

https://en.wikipedia.org/wiki/0.0.0.0

While 0.0.0.0 can be the same as 127.0.0.1 on 'servers' it is not true on client computers. This PR should make things work better on Windows, but does not address all Windows networking issues.

Any socket used for listening should use 0.0.0.0 as the address
Any socket used for client connections to the local machine should use 127.0.0.1, not 0.0.0.0.
Use 127.0.0.1 instead of localhost since localhost could be IPV4 or IPV4. If a test or user specifically wants IPV6 they can use the address :1

@tqchen
Copy link
Member

tqchen commented Mar 29, 2021

on linux 0.0.0.0 allows external host to connect(as it binds on the external accessible IP), while localhost would prevent external machine to connect, which would be problematic for cases like cpp_rpc(to be expected to connect from remote).

Would be great to figure out the examples in each case.
There is also a difference in terms of IPv6 vs IPv4, and explicitly specifying "localhost" might cause some of the problem

@rkimball
Copy link
Contributor Author

0.0.0.0 is not a valid address on Windows. On linux if you ping 0.0.0.0 it actually ends up pinging 127.0.0.1. I will try an example of using 127.0.0.1 and connecting from an external host.

According to this https://en.wikipedia.org/wiki/Localhost 127.0.0.1 and localhost are the same. If you are more comfortable with 127.0.0.1 then I can change to that.

@tqchen
Copy link
Member

tqchen commented Mar 29, 2021

0.0.0.0 maps to all possible address on linux. While 127.0.0.1 is equivalent as localhost.

Say the IP of the node is 192.168.0.3, then we either need to set the host to be 192.168.0.3 or 0.0.0.0 on linux(which binds to both 127.0.0.1 and 192.168.0.3).

This is useful under a case of dynamic DNS where server can report to tracker but cannot hardcode a IP(the same command would work for multiple cases).

@tqchen
Copy link
Member

tqchen commented Mar 29, 2021

Also 127.0.0.1 can be different from localhost(because of IPv6) and some code assumes IPv6, so 127.0.0.1 can be useful in certain cases(e.g. MacOS), or we need to add IPv6 compact support in most cases

@rkimball
Copy link
Contributor Author

On Windows where I am having issues 0.0.0.0, or really IN_ADDRANY in the sockaddr structure, is needed for a server socket, one that is listening for connections. This address will bind to all local address. This works on windows and I am able to connect from both local and remote machines to a windows tracker bound to 0.0.0.0.
For client connections, any connection to a listening socket, on windows we can't use 0.0.0.0 as the target address. This works on linux but not on windows. So we are going to have a mix of addresses. For servers we must use 0.0.0.0 but clients we can't use that.
For client connections we can use either 127.0.0.1 or localhost for any self-hosted server.

@tqchen
Copy link
Member

tqchen commented Mar 29, 2021

Agree 0.0.0.0 is needed for server listen addr only but not for client connection

@tqchen
Copy link
Member

tqchen commented Mar 30, 2021

@jroesch @mdw-octoml would be great if you can also help provide some thoughts

Will let @jroesch manage the PR

@mdw-octoml
Copy link
Contributor

So, I'm a little confused here. It seems that this PR is changing the default both for the address the server binds to as well as the address a client would connect to. Am I getting that right?

My understanding is that on both Windows and Linux, binding a server to 0.0.0.0 is the correct thing to do, since it allows the server to listen on any address / any interface, which is necessary in order to accept external connections. Binding to localhost or 127.0.0.1 would only bind on the loopback interface (on Linux, anyway) only permitting local connection, but I don't think that is the correct default behavior.

The client address should, of course, be specified explicitly, either as 127.0.0.1 (or, equivalently, 'localhost') or explicitly when connecting to a remote address. So changing the client default to localhost seems fine, but I would not change the server default.

@rkimball
Copy link
Contributor Author

rkimball commented Mar 30, 2021

@mdw-octoml You are correct. This PR is wrong but there is still a problem with addresses which needs to be fixed. I will revert all of the current changes and submit new changes that correct the issues I am having on windows. I would close this PR and create a new one, but there is good discussion here that I would like to preserve.

Question to you and @tqchen: Currently when we create the tracker we pass in the "host" address of 0.0.0.0 which is fine and the tracker has a property "host" which returns the "host address" of the tracker which is not fine. The tracker likely has multiple addresses, one for each of the network interfaces and returning the "host address" of the tracker is confusing, which of the multiple host addresses would a client connect to? The client is normally told explicitly via a command-line parameter which address to connect to. I would like to remove the "host" property of the tracker. Currently the tracker host property returns 0.0.0.0 which is not the address a client would use to connect to the tracker.

We also need to disambiguate the address of the tracker in the code, which uses host and tracker_host interchangeably. The tracker host should be consistently called tracker_host

@rkimball rkimball changed the title Rename references to 0.0.0.0 to localhost. Replace 0.0.0.0 with 127.0.0.1 for client connections Mar 30, 2021
@mdw-octoml
Copy link
Contributor

I would like to remove the "host" property of the tracker. Currently the tracker host property returns 0.0.0.0 which is not the address a client would use to connect to the tracker.

I agree that should be done, but I don't know the implications of removing that property. It seems to me that clients can only discover the tracker through some out of band mechanism anyway.

We also need to disambiguate the address of the tracker in the code, which uses host and tracker_host interchangeably. The tracker host should be consistently called tracker_host

Agreed.

@tqchen
Copy link
Member

tqchen commented Mar 30, 2021

Thanks @rkimball @mdw-octoml I agree with the points being said.

Our main need so far is to keep the port number of tracker available.This is mainly for the ease of writing tests as a common pattern would be starting a tracker. Then a client can request resources from that port. In terms of host, we could always write tests that connects 127.0.0.1 so perhaps we do not strictly need that field(or client should not think that as a field to connect to)

@rkimball rkimball requested a review from yzhliu as a code owner April 7, 2021 20:12
Copy link
Contributor

@mdw-octoml mdw-octoml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A few minor nits to address.

@@ -512,7 +512,7 @@ def __init__(
self.port = my_port
break
except socket.error as sock_err:
if sock_err.errno in [98, 48]:
if sock_err.errno in [98, 48, 10098, 10048]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend using symbols from the errno library here instead of hardcoding values.
If that does not work, could you at least add a comment here indicating what these error codes are?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

98/10098 is errno.EADDRINUSE for linux/windows. This works for both.
48 is errno.ELNRNG on linux but ELNRNG is not defined in windows so we can't use the symbol. The error translates to Link number out of range which I don't understand. I removed 48/10048. If we want we can add the number 48 back but we can't use the symbol.

@@ -419,7 +419,7 @@ def __init__(
self.port = my_port
break
except socket.error as sock_err:
if sock_err.errno in [98, 48]:
if sock_err.errno in [98, 48, 10098, 10048]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above on errnos.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment for above.

@@ -399,7 +399,7 @@ def __init__(self, host, port=9190, port_end=9199, silent=False):
self.port = my_port
break
except socket.error as sock_err:
if sock_err.errno in [98, 48]:
if sock_err.errno in [98, 48, 10098, 10048]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above on errnos.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment for above

@@ -85,7 +85,7 @@ inline std::string GetHostName() {
* \return result of operation.
*/
inline bool ValidateIP(std::string ip) {
if (ip == "localhost") {
if (ip == "127.0.0.1") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be necessary since "127.0.0.1" should pass the is_ipv4 check below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been left as it was with "localhost" since it is a special valid address.

@@ -118,7 +118,7 @@ struct SockAddr {
std::string host = url.substr(2, sep - 3);
std::string port = url.substr(sep + 1, url.length() - 1);
ICHECK(ValidateIP(host)) << "Url address is not valid " << url;
if (host == "localhost") {
if (host == "127.0.0.1") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be necessary, since it's already that value :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been left as it was with "localhost" since it is a special valid address.

@@ -45,7 +45,7 @@
# run individual functions. Somewhere along the way, the imports are being
# lost, so the server ends up not registering the functions.
pytestmark = pytest.mark.skipif(
multiprocessing.get_start_method() != "fork",
sys.platform.startswith("win") == False and multiprocessing.get_start_method() != "fork",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment here why Windows is being excluded here?

Copy link
Contributor Author

@rkimball rkimball Apr 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this enables this test for Windows. Windows does not support fork so this test could never run on Windows, but now it can. I added a comment stating this

@tqchen tqchen merged commit f4a680d into apache:main May 2, 2021
umangyadav pushed a commit to umangyadav/tvm that referenced this pull request May 5, 2021
* Rename references to 0.0.0.0 to localhost. Also change references to 127.0.0.1 to localhost so that all references are consistent. 0.0.0.0 is not the same as localhost.
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021
* Rename references to 0.0.0.0 to localhost. Also change references to 127.0.0.1 to localhost so that all references are consistent. 0.0.0.0 is not the same as localhost.
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021
* Rename references to 0.0.0.0 to localhost. Also change references to 127.0.0.1 to localhost so that all references are consistent. 0.0.0.0 is not the same as localhost.
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021
* Rename references to 0.0.0.0 to localhost. Also change references to 127.0.0.1 to localhost so that all references are consistent. 0.0.0.0 is not the same as localhost.
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request May 11, 2021
* Rename references to 0.0.0.0 to localhost. Also change references to 127.0.0.1 to localhost so that all references are consistent. 0.0.0.0 is not the same as localhost.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants