ocrd network: Gracefully close connections of remote clients #1037

Open
kba opened this issue Mar 29, 2023 · 1 comment

kba commented Mar 29, 2023

[screenshot: the connection-closing code under discussion]

@bertsky:

Why close here? These could be used for stopping. (In the Docker case, does it even work to recreate a CustomDockerClient via create_docker_client in kill_hosts?)

@MehmedGIT:

Right. The way it is currently implemented is to open/close connections when creating the processing workers and then again when killing them. It's not efficient that way (more connections are opened and closed), but at least we don't have to keep track of the connection state and restore it in case the connection fails.

Note that in the processing server and processing workers we preserve the connection state to the RabbitMQ server, but we still don't have a reconnection mechanism (it exists only as a concept implementation in the ocrd-webapi-implementation repo).
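For context, a reconnection mechanism of the sort alluded to could look roughly like the following pika-based sketch. The connect_with_retry helper, URL, and retry parameters are illustrative assumptions, not the actual ocrd_network or ocrd-webapi-implementation code:

```python
import time

import pika
from pika.exceptions import AMQPConnectionError

def connect_with_retry(url, max_retries=5, backoff=2.0):
    """Hypothetical helper: open a blocking AMQP connection,
    retrying with exponential backoff on failure."""
    attempt = 0
    while True:
        try:
            return pika.BlockingConnection(pika.URLParameters(url))
        except AMQPConnectionError:
            attempt += 1
            if attempt > max_retries:
                raise
            time.sleep(backoff ** attempt)

connection = connect_with_retry("amqp://guest:guest@localhost:5672/")
channel = connection.channel()
```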

@joschrew:

I didn't (and still don't) see the benefit of keeping the connection open. Creating a connection seems fast (and therefore cheap) to me, so I thought keeping a connection around for an unknown amount of time does not make sense. Also, this way there is no need to worry about whether the connection dies or something else happens to it. What would be the benefit of keeping the connection?

It works, btw, to recreate a DockerClient; this is how the MongoDB and RabbitMQ containers are killed at a shutdown request.

@bertsky:

    It works, btw, to recreate a DockerClient; this is how the MongoDB and RabbitMQ containers are killed at a shutdown request.

Ok, I can see why it works now. You build upon the Docker daemon's internal ID mechanism for containers, which docker-py gives access to.

So for MongoDB, RabbitMQ and start_docker_processor, I agree it's better to re-establish a connection when you want to stop the running container. But for start_native_processor we don't have this kind of manager in the background. Querying the PID and then killing it later is fragile IMO. (What if the process is already gone, or forked and detached? What about the process group, will there be zombies afterwards?) I understand that keeping the channel open might also be fragile, though. Perhaps experience will teach us best, so let's keep it (maybe with some comment).
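In docker-py terms, "building on the daemon's internal ID mechanism" amounts to something like the following sketch; the base_url and container name are placeholder assumptions, not the actual deployer code:

```python
import docker

# A fresh client per operation works because the Docker daemon itself keeps
# track of the container; no client-side state must survive in between.
client = docker.DockerClient(base_url="ssh://user@remote-host")  # placeholder
container = client.containers.run("rabbitmq:3", detach=True,
                                  name="ocrd-rabbitmq")  # name is assumed
client.close()

# Later, e.g. in a shutdown handler, recreate the client and stop by name/ID:
client = docker.DockerClient(base_url="ssh://user@remote-host")
client.containers.get("ocrd-rabbitmq").stop()
client.close()
```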


bertsky commented Mar 29, 2023

The discussion then continued to details of alternative solutions:

  • @bertsky proposed an alternative mechanism for native processor sessions, which would keep the SSH connection open and kill the session directly (instead of PID bookkeeping) – still based on paramiko
  • @joschrew held that

    This was the only way for me to make it work. I think for your suggestion to work, the transport (or the channel, I don't know) has to be kept around so it can be used later to shut the worker down. I currently prefer to start the workers and then let them run independently, without an open connection or anything like that. I think/thought (maybe without a really good reason) it is better not to keep a connection; otherwise other things would have to be worried about: keeping it alive, handling failures, etc.
    But I don't have much experience regarding this. What are the benefits of using the channel object and keeping it connected?

  • @bertsky replied

    Note: paramiko.transport by default does not send keep-alive messages, but can be set up to do so (see the sketch after this list).
    IMO the benefit of keeping the session open would be that you don't have to worry about detaching and PID tracking in the first place. (The fact that paramiko does not provide access to the PID itself is a good argument against such tricks.)
    But as I said in the other comment, perhaps this has to be seen in practice first, so let's try with your current implementation.

  • @bertsky continued with other proposals

    (That being said, I do think there are better ways to attain the PID value. For example, you could do something like echo $! > $processor.pid, with $processor defined and passed by the client a priori, and then in kill_processing_worker do kill $(cat $processor.pid) – sketched after this list.
    Or, instead of paramiko, use asyncssh, which directly provides a function Connection.create_subprocess that lets you access its get_pid.
    Or have the Processing Worker write its PID into the database under a new worker model.)
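For reference, the keep-alive setting mentioned above is a single call on paramiko's transport. A minimal sketch, assuming placeholder host and credentials:

```python
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("worker-host", username="ocrd")  # placeholder host/user

# Ask paramiko to send an SSH keep-alive packet every 30 seconds
# (by default no keep-alives are sent).
client.get_transport().set_keepalive(30)
```

And the PID-file variant of starting/killing a native worker could be sketched like this, reusing the client from above (the pidfile path and processor name are hypothetical):

```python
processor = "ocrd-dummy"           # hypothetical processor name
pidfile = f"/tmp/{processor}.pid"  # hypothetical path, passed a priori

# Start the worker detached on the remote host and record its PID.
client.exec_command(
    f"nohup {processor} >/dev/null 2>&1 & echo $! > {pidfile}")

# Later, in kill_processing_worker: kill via the recorded PID.
client.exec_command(f"kill $(cat {pidfile})")
```

Note that this variant still inherits the fragility pointed out earlier: if the process has already exited, or its PID has been reused, the kill misses or hits the wrong process.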
