concurrent access to TCP connection cache #5

Open
PatrickMaier opened this issue Dec 24, 2014 · 2 comments

Comments

@PatrickMaier
Owner

The TCP connection cache works in default mode: cache all connections, never evict. Bypassing the cache (HdpH RTS flag -conns=0) also works, but it opens a new connection for every message (and probably never explicitly closes those connections).

What does not work is limiting the size of the cache. Evicting a TCP connection from the cache may sometimes close the connection while a message is still being sent over it. As a result the message is dropped and HdpH will most likely deadlock. The HdpH RTS debug flag -d3 logs dropped messages.
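One possible way to avoid this race, sketched here under assumptions and not taken from the HdpH code base, is to guard each cached connection with its own lock, so that eviction has to wait for an in-flight send to finish. The `Conn` type, `sendMsg`, and `evict` below are hypothetical stand-ins: the "connection" is just a log of sent messages, and an `MVar Bool` acts both as the open/closed flag and as the per-connection lock.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, withMVar, modifyMVar_)
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

-- Hypothetical stand-in for a cached TCP connection (not HdpH's actual type).
-- The MVar holds True while the connection is open; taking it also serves
-- as the per-connection lock.
data Conn = Conn
  { connOpen :: MVar Bool       -- True = open, False = closed
  , connLog  :: IORef [String]  -- messages "sent" so far, newest first
  }

newConn :: IO Conn
newConn = Conn <$> newMVar True <*> newIORef []

-- Send a message while holding the connection lock.
-- Returns False if the connection has already been closed, so the caller
-- can reconnect and retry instead of silently dropping the message.
sendMsg :: Conn -> String -> IO Bool
sendMsg c msg = withMVar (connOpen c) $ \open ->
  if open
    then modifyIORef' (connLog c) (msg :) >> return True
    else return False

-- Evict the connection from the cache by marking it closed.
-- Because this takes the same MVar as sendMsg, it blocks until any
-- in-flight send has released the lock.
evict :: Conn -> IO ()
evict c = modifyMVar_ (connOpen c) (const (return False))
```

With this shape, a sender that loses the race against `evict` gets an explicit `False` back rather than a half-sent message. Whether a per-connection lock is acceptable for HdpH's send path is an open design question that this sketch does not settle.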

@alexanderkjeldaas

How can a dropped message lead to deadlock?

@PatrickMaier
Owner Author

Deadlock might arise because the HdpH work-stealing protocol isn't resilient to message loss. (This is HdpH, not Rob Stewart's fault-tolerant HdpH-RS.) If the lost message is a FISH message (i.e. a request for work) then its sender will just wait for a reply forever. It will eventually run out of work and just sit unproductively until shutdown. Not good but no deadlock. However, if the lost message is a put to a remote IVar (i.e. the result of a remotely executed task) then that result is lost, the corresponding IVar stays empty, and whoever attempts to read from it will block forever. This will lead to deadlock unless the IVar in question was never read in order to compute the overall result (i.e. the parallel task was speculative, something that HdpH also doesn't support properly).
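The lost-put case can be illustrated with a small model. This is only a sketch: it treats an IVar as a plain write-once `MVar`, which is an assumption and not HdpH's implementation, and the names `runRemote` and `readWithin` are hypothetical. The timeout exists purely so the demonstration terminates; the blocking read described above has no such bound, which is why the reader hangs.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import System.Timeout (timeout)

-- Hypothetical model of an IVar as a write-once MVar.
type IVar a = MVar a

-- Simulate a remotely executed task. If 'delivered' is True, the result
-- message arrives and fills the IVar; if False, the message is dropped
-- and the IVar stays empty.
runRemote :: Bool -> IVar Int -> IO ()
runRemote delivered iv = do
  _ <- forkIO $ if delivered then putMVar iv 42 else return ()
  return ()

-- Blocking read with a bound in microseconds. Nothing means the reader
-- was still blocked when the bound expired.
readWithin :: Int -> IVar a -> IO (Maybe a)
readWithin micros iv = timeout micros (readMVar iv)
```

Calling `runRemote True iv` followed by `readWithin` yields the result, while `runRemote False iv` leaves every reader of `iv` waiting until the bound expires. Without the bound, that reader never resumes.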
