Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix windows polling using sockets for the WakeupPipe #5599

Closed
wants to merge 2 commits into from

Conversation

blagoev
Copy link
Contributor

@blagoev blagoev commented Jun 16, 2022

fixes #5591

Copy link
Contributor

@RedBeard0531 RedBeard0531 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how much effort we want to put into making this good great. Hopefully it will be dead soon(ish), but OTOH, maybe we should assume it will last longer than we expect...

Comment on lines 395 to 417
do {
if (setsockopt(listener, SOL_SOCKET, SO_REUSEADDR,
(char*) &reuse, (socklen_t) sizeof(reuse)) == -1)
break;
if (bind(listener, &a.addr, sizeof(a.inaddr)) == SOCKET_ERROR)
break;
if (getsockname(listener, &a.addr, &addrlen) == SOCKET_ERROR)
break;
if (listen(listener, 1) == SOCKET_ERROR)
break;
socks[0] = WSASocket(AF_INET, SOCK_STREAM, 0, NULL, 0, flags);
if (socks[0] == INVALID_SOCKET)
break;
if (connect(socks[0], &a.addr, sizeof(a.inaddr)) == SOCKET_ERROR)
break;
socks[1] = accept(listener, NULL, NULL);
if (socks[1] == INVALID_SOCKET)
break;

closesocket(listener);
return 0;

} while (0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This do {} while(0)+break is basically just a disguised goto ERR;. Personally I think it would be better to just use a goto than hiding it in something that looks like a loop, but maybe others are more anti-goto than I am.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree its weird. I don't like using wholes for something other than a loop especially I don't like writing endless loops.
but I tried not to change anything on this code so we follow the already public version that I assume is tested and reviewed by others

src/realm/util/network.cpp Show resolved Hide resolved
src/realm/util/network.cpp Outdated Show resolved Hide resolved
break;
if (connect(socks[0], &a.addr, sizeof(a.inaddr)) == SOCKET_ERROR)
break;
socks[1] = accept(listener, NULL, NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a slight risk that some other process will connect to this port before we do. We could mitigate that by generating a secure random number and requiring the connecter to send that number and verify that it matches. But it seems like there is a ton of code (including the python stdlib) using some form of this pattern and none of it tries to handle this case, so it is probably overkill. Since this is a localhost listener, at least we don't have to worry about remote port scanners.

Edit: I guess if some other process did manage to beat us at connecting then we would probably have the connect call above either hang or return an error since we call listen with a backlog of 1. So maybe we would only get to this line if we were the one connecting.

socks[0] = WSASocket(AF_INET, SOCK_STREAM, 0, NULL, 0, flags);
if (socks[0] == INVALID_SOCKET)
break;
if (connect(socks[0], &a.addr, sizeof(a.inaddr)) == SOCKET_ERROR)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somewhat surprised that this doesn't hang since we haven't set this to non-blocking. I noticed that the python impl sets it to nonblocking prior to connecting then sets it back to blocking. We could skip setting it back to blocking since I think we want it in non-blocking mode anyway.

Edit: This is related to the next comment. It looks like connect() only blocks until we enter the listener's backlog, since the connection is "pre-accepted" by the kernel for us prior to calling accept() as long as there is room in the backlog. Since we have a backlog of 1, we would only hang if some other process beat us to connecting. Maybe it is worth setting nonblocking so that we can cleanly error here rather than hanging indefinitely in this very unlikely case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this got reviewed a bunch of times from multiple people so I assume it is fine. Like this one https://git.infradead.org/users/dwmw2/openconnect.git/commitdiff/bdeefa54

I am hesitant to try to improve it since we risk introducing bugs which were not there.

Comment on lines +438 to +439
m_write_fd.reset(socks[0]);
m_read_fd.reset(socks[1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to set these to non-blocking? Or is that handled somewhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should be blocking, cause signal and acknowledge_signal should complete after the signal has been sent/read

src/realm/util/network.cpp Outdated Show resolved Hide resolved
max_wait_millis);
/*if (ret == SOCKET_ERROR) {
int lastErr = WSAGetLastError();
}*/
REALM_ASSERT(ret != SOCKET_ERROR);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we try to use the common error handling below (removing the #ifndef on line 1244)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am hesitant to do this much changes. So I left the error handling as it is on this platform.

// Poll all network sockets
ret = WSAPoll(LPWSAPOLLFD(&m_pollfd_slots[1]), ULONG(m_pollfd_slots.size() - 1),
socket_poll_timeout);
ret = WSAPoll(LPWSAPOLLFD(fds), ULONG(m_pollfd_slots.size()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could get rid of most of the ifdefs in this code by having a poll_wrapper function above that dispatches to WSAPoll on windows and ::poll everywhere else. At the very least, could we narrow the #ifdeffing to just be around the function call or is some of the other setup also platform specific?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I targeted of making as minimal changes as possible. I guess we could refactor the code more

src/realm/util/network.cpp Outdated Show resolved Hide resolved
@blagoev blagoev closed this Jun 17, 2022
@blagoev blagoev deleted the blagoev/fix-win-polling branch June 17, 2022 17:52
@blagoev blagoev restored the blagoev/fix-win-polling branch June 17, 2022 17:52
@blagoev blagoev deleted the blagoev/fix-win-polling branch June 17, 2022 19:53
RedBeard0531 pushed a commit that referenced this pull request Apr 26, 2023
Co-authored-by: Kræn Hansen <kraen.hansen@mongodb.com>
Co-authored-by: Kenneth Geisshirt <kenneth.geisshirt@mongodb.com>
Co-authored-by: LJ <81748770+elle-j@users.noreply.github.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

High CPU usage on Windows
2 participants