Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make multilocale libraries determine their port information on the fly #13044

Merged
merged 11 commits into from
May 21, 2019

Conversation

lydia-duncan
Copy link
Member

@lydia-duncan lydia-duncan commented May 17, 2019

This change modifies the generation of multilocale libraries to use
ephemeral ports instead of hard-coded ones when communicating
between the client and spawned server. It does this via a secret
flag that is not visible to the user, --chpl-mli-socket-loc, which
communicates the connection information where the client will
listen. The server will then create sockets for communicating the
function calls, arguments, and return values of the exported
functions, and will use the client's connection socket to
communicate the connection information for those sockets.

Passed testing over std/ paratest with futures and a full gasnet paratest.

Doesn't send the client's setup socket information, but tries to perform the
handshake for the main, arg, and res ports.  Unfortunately, this currently runs
into a segfault, but I'm going to figure that out next.
… connect

Was able to get the server to call the function.  Segfault occurred during the
call to chpl_library_finalize.
I did this through a new flag, --mli-socketLoc, which gets appended to the end
of the original argv and then pulled before entering chpl_library_init on the
server side.  I am not married to the name, and there might be a better strategy
for it.
This should hopefully avoid any possibility of naming conflicts
@dlongnecke-cray
Copy link
Contributor

It's strange to me that we have to use a flag to sneak our information through the launcher. In normal execution of Chapel multi-locale libraries, the end user has to be able to pass through an unadorned argv/argc to the entrypoint, no?

What is preventing us from just leveraging that?

Just curious to know a bit more about how the launcher whines at us.

@lydia-duncan
Copy link
Member Author

Basically, what was going on was that parseArgs was traversing the arguments, looking for configs and other known flags. This was happening before chpl_library_init was called on the server side. I am appending the connection information to the end of the already present argv/argc information, but 1) parseArgs didn't know anything about whether we were getting called from a multilocale library or execution settting and 2) it seemed hooky to just "assume the last argument" was what we wanted given that situation. Having an additional flag that parseArgs recognized and ignored allowed us to indicate to parseArgs that the situation was appropriate

@dlongnecke-cray
Copy link
Contributor

Ok, sounds reasonable. Thanks for clarifying. My only worry is making sure a user can't (whether accidentally or maliciously) get ahold of the flag somehow?

@lydia-duncan
Copy link
Member Author

That's a good point. I should add some additional checking

Copy link
Contributor

@dlongnecke-cray dlongnecke-cray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Mainly just seeing refactoring opportunities that are fallout from decisions I made clashing against the new design.

I think we should try to stick to a coding style internally, but I'll leave it to you to decide what that is (or if we even need to bother).

Right now I'm just worried about if a user abuses that internal flag! You have something in the works for that, I'm sure 👍.

runtime/etc/src/mli/client_runtime.c Outdated Show resolved Hide resolved
runtime/etc/src/mli/client_runtime.c Outdated Show resolved Hide resolved
runtime/etc/src/mli/server_runtime.c Outdated Show resolved Hide resolved
runtime/etc/src/mli/client_runtime.c Outdated Show resolved Hide resolved
runtime/etc/src/mli/server_runtime.c Show resolved Hide resolved
// Drop down into the server listen loop, passing in the last argument, which
// will be the socket connection information for communicating where the
// server will be listening.
chpl_mli_smain(argv[argc-1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I'm thinking that we might want to refactor the listen loop into "setup" -> "listen" -> "teardown" functions. Not necessarily with this PR, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good that we continue to keep the actual body of main as small as possible, though.

- If we see it twice, that's because the user added it themselves, so we tell
them it is for internal use only
- If we only see the flag and it is the last argument (not followed by the
connection information), tell them it needs to be followed by connection
information
- If we see the flag and what follows it does not start with `tcp://`, then it
is likely being used improperly (either by not having connection information
follow it, or by having a bad string follow it), so tell the user not to.

In any case, only the last use of the flag would matter in a multilocale library
run (other uses get dropped on the floor) and we ensure that we provide the last
use of the flag when we look for it, so it shouldn't pose a problem.
For the purposes of the client or server's use of ZMQ in a multilocale library,
we intend to only bind ephemerally, and then send that information to the pair
that will be connecting.  So the bind call does not need anything other than the
socket that will be bound.
…consistent)

Use snake_case instead of an inconsistent mixture of snake_case and camelCase
- Capitalize first letter of every debug message
- Use a colon in the first chpl_mli_push_connection debug message
@lydia-duncan
Copy link
Member Author

I think I've addressed your feedback (except for the one I responded to directly just now). I need to:

  • check testing when I'm back in the office
  • muse a little more about how to error when the secret flag is thrown for single-locale/non-library programs
  • poke the weird connection issue for my home

@lydia-duncan
Copy link
Member Author

But I think it would be good for you to take a look at what's here now, and let me know if you think any of those three tasks shouldn't hold up the PR (currently leaning towards number 3 not holding it up)

@lydia-duncan
Copy link
Member Author

(current branch passed full paratest with futures for CHPL_COMM=none, so I didn't break that at least)

@dlongnecke-cray
Copy link
Contributor

dlongnecke-cray commented May 20, 2019

It looks good, I like the error messages in runtime/src/arg.c. I don't think the coverage for the secret flag needs to be 100% complete to be able to merge this (if we waited till everything was 100% complete, we'd have nothing at all...).

As for what you told me earlier about the VPN issue, I'd be OK with making a separate issue to explore what's going on there.

@lydia-duncan lydia-duncan merged commit ba51f76 into chapel-lang:master May 21, 2019
@lydia-duncan lydia-duncan deleted the handshake branch May 21, 2019 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants