ocrd network: Gracefully close connections of remote clients #1037

Open
kba opened this issue Mar 29, 2023 · 1 comment

kba commented Mar 29, 2023

[screenshot: the connection-closing code under discussion]

@bertsky:

Why close here? These could be used for stopping. (In the Docker case, does it even work to recreate a CustomDockerClient via create_docker_client in kill_hosts?)

@MehmedGIT:

Right. The way it is currently implemented is to open/close connections when creating the processing workers and then again when killing them. It's not efficient that way (more connections are opened and closed), but at least we don't have to keep track of the connection state and restore it in case the connection fails.

Note that in the processing server and processing workers we preserve the connection state to the RabbitMQ server, but we still don't have a reconnection mechanism (it exists only as a concept implementation in the ocrd-webapi-implementation repo).
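For context, a reconnection mechanism of the sort alluded to could look roughly like the following pika-based sketch. The connect_with_retry helper, URL, and retry parameters are illustrative assumptions, not the actual ocrd_network or ocrd-webapi-implementation code:

```python
import time

import pika
from pika.exceptions import AMQPConnectionError

def connect_with_retry(url, max_retries=5, backoff=2.0):
    """Hypothetical helper: open a blocking AMQP connection,
    retrying with exponential backoff on failure."""
    attempt = 0
    while True:
        try:
            return pika.BlockingConnection(pika.URLParameters(url))
        except AMQPConnectionError:
            attempt += 1
            if attempt > max_retries:
                raise
            time.sleep(backoff ** attempt)

connection = connect_with_retry("amqp://guest:guest@localhost:5672/")
channel = connection.channel()
```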

@joschrew:

I didn't (and still don't) see the benefit of keeping the connection open. Creating a connection seems fast (and therefore cheap) to me, so I thought keeping a connection around for an unknown amount of time does not make sense. Also, this way there is no need to worry about whether the connection dies or something else happens to it. What would be the benefit of keeping the connection?

It works, btw, to recreate a DockerClient; this is how the MongoDB and RabbitMQ containers are killed at a shutdown request.

@bertsky:

    It works, btw, to recreate a DockerClient; this is how the MongoDB and RabbitMQ containers are killed at a shutdown request.

Ok, I can see why it works now. You build upon the Docker daemon's internal ID mechanism for containers, which docker-py gives access to.

So for MongoDB, RabbitMQ and start_docker_processor, I agree it's better to re-establish a connection when you want to stop the running container. But for start_native_processor we don't have this kind of manager in the background. Querying the PID and then killing it later is fragile IMO. (What if the process is already gone, or forked and detached? What about the process group, will there be zombies afterwards?) I understand that keeping the channel open might also be fragile, though. Perhaps experience will teach us best, so let's keep it (maybe with some comment).
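In docker-py terms, "building on the daemon's internal ID mechanism" amounts to something like the following sketch; the base_url and container name are placeholder assumptions, not the actual deployer code:

```python
import docker

# A fresh client per operation works because the Docker daemon itself keeps
# track of the container; no client-side state must survive in between.
client = docker.DockerClient(base_url="ssh://user@remote-host")  # placeholder
container = client.containers.run("rabbitmq:3", detach=True,
                                  name="ocrd-rabbitmq")  # name is assumed
client.close()

# Later, e.g. in a shutdown handler, recreate the client and stop by name/ID:
client = docker.DockerClient(base_url="ssh://user@remote-host")
client.containers.get("ocrd-rabbitmq").stop()
client.close()
```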


bertsky commented Mar 29, 2023

The discussion then continued to details of alternative solutions:

  • @bertsky proposed an alternative mechanism for native processor sessions, which would keep the SSH connection open and kill the session directly (instead of PID bookkeeping) – still based on paramiko
  • @joschrew held that

    This was the only way for me to make it work. I think for your suggestion to work, the transport (or the channel, I don't know) has to be kept around so it can be used later to shut the worker down. I currently prefer to start the workers and then let them run independently, without an open connection or anything like that. I think/thought (maybe without a really good reason) it is better not to keep a connection; otherwise other things would have to be worried about: keeping it alive, handling failures, etc.
    But I don't have much experience regarding this. What are the benefits of using the channel object and keeping it connected?

  • @bertsky replied

    Note: paramiko.transport by default does not send keep-alive messages, but can be set up to do so (see the sketch after this list).
    IMO the benefit of keeping the session open would be that you don't have to worry about detaching and PID tracking in the first place. (The fact that paramiko does not provide access to the PID itself is a good argument against such tricks.)
    But as I said in the other comment, perhaps this has to be seen in practice first, so let's try with your current implementation.

  • @bertsky continued with other proposals

    (That being said, I do think there are better ways to attain the PID value. For example, you could do something like echo $! > $processor.pid, with $processor defined and passed by the client a priori, and then in kill_processing_worker do kill $(cat $processor.pid) – sketched after this list.
    Or, instead of paramiko, use asyncssh, which directly provides a function Connection.create_subprocess that lets you access its get_pid.
    Or have the Processing Worker write its PID into the database under a new worker model.)
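For reference, the keep-alive setting mentioned above is a single call on paramiko's transport. A minimal sketch, assuming placeholder host and credentials:

```python
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("worker-host", username="ocrd")  # placeholder host/user

# Ask paramiko to send an SSH keep-alive packet every 30 seconds
# (by default no keep-alives are sent).
client.get_transport().set_keepalive(30)
```

And the PID-file variant of starting/killing a native worker could be sketched like this, reusing the client from above (the pidfile path and processor name are hypothetical):

```python
processor = "ocrd-dummy"           # hypothetical processor name
pidfile = f"/tmp/{processor}.pid"  # hypothetical path, passed a priori

# Start the worker detached on the remote host and record its PID.
client.exec_command(
    f"nohup {processor} >/dev/null 2>&1 & echo $! > {pidfile}")

# Later, in kill_processing_worker: kill via the recorded PID.
client.exec_command(f"kill $(cat {pidfile})")
```

Note that this variant still inherits the fragility pointed out earlier: if the process has already exited, or its PID has been reused, the kill misses or hits the wrong process.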
