Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSHHook get_conn() does not re-use client #10874

Closed
baolsen opened this issue Sep 11, 2020 · 11 comments · Fixed by #17378
Closed

SSHHook get_conn() does not re-use client #10874

baolsen opened this issue Sep 11, 2020 · 11 comments · Fixed by #17378
Assignees

Comments

@baolsen
Copy link
Contributor

baolsen commented Sep 11, 2020

Apache Airflow version: 1.10.8

Environment:

  • Cloud provider or hardware configuration: 4 VCPU 8GB RAM VM
  • OS (e.g. from /etc/os-release): RHEL 7.7
  • Kernel (e.g. uname -a): Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others:

What happened:

Sub-classing the SSHOperator and calling its execute repeatedly will create a new SSH connection each time, to run the command.

Not sure if this is a bug or an enhancement / feature. I can re-log as a feature request if needed.

What you expected to happen:

SSH client / connection should be re-used if it was already established.

How to reproduce it:

Sub-class the SSHOperator.
In your sub class execute method, call super().execute() a few times.
Observe in the logs how an SSH Connection is created each time.

Anything else we need to know:

The SSHHook.get_conn() method creates a new Paramiko SSH client each time. Despite storing the client on self.client before returning, the hook get_conn() method does not actually use the self.client next time. A new connection is therefore created.
I think this is because the SSH Operator uses a context manager to operate on the Paramiko client, so the Hook needs to create a new client if a previous context manager had closed the last one.

Fixing this would mean changing the SSH Operator execute() to not use the ssh_hook.get_conn() as a context manager since this will open and close the session each time. Perhaps the conn can be closed with the operator's post_execute method rather than in the execute.

Example logs

[2020-09-11 07:04:37,960] {ssh_operator.py:89} INFO - ssh_conn_id is ignored when ssh_hook is provided.
[2020-09-11 07:04:37,960] {logging_mixin.py:112} INFO - [2020-09-11 07:04:37,960] {ssh_hook.py:166} WARNING - Remote Identification Change is not verified. This wont protect against Man-In-The-Middle attacks
[2020-09-11 07:04:37,961] {logging_mixin.py:112} INFO - [2020-09-11 07:04:37,961] {ssh_hook.py:170} WARNING - No Host Key Verification. This wont protect against Man-In-The-Middle attacks
[2020-09-11 07:04:37,976] {logging_mixin.py:112} INFO - [2020-09-11 07:04:37,975] {transport.py:1819} INFO - Connected (version 2.0, client OpenSSH_7.4)
[2020-09-11 07:04:38,161] {logging_mixin.py:112} INFO - [2020-09-11 07:04:38,161] {transport.py:1819} INFO - Auth banner: b'Authorized uses only. All activity may be monitored and reported.\n'
[2020-09-11 07:04:38,161] {logging_mixin.py:112} INFO - [2020-09-11 07:04:38,161] {transport.py:1819} INFO - Authentication (publickey) successful!
[2020-09-11 07:04:38,161] {ssh_operator.py:109} INFO - Running command: [REDACTED COMMAND 1]
...
[2020-09-11 07:04:38,383] {ssh_operator.py:89} INFO - ssh_conn_id is ignored when ssh_hook is provided.
[2020-09-11 07:04:38,383] {logging_mixin.py:112} INFO - [2020-09-11 07:04:38,383] {ssh_hook.py:166} WARNING - Remote Identification Change is not verified. This wont protect against Man-In-The-Middle attacks
[2020-09-11 07:04:38,383] {logging_mixin.py:112} INFO - [2020-09-11 07:04:38,383] {ssh_hook.py:170} WARNING - No Host Key Verification. This wont protect against Man-In-The-Middle attacks
[2020-09-11 07:04:38,399] {logging_mixin.py:112} INFO - [2020-09-11 07:04:38,399] {transport.py:1819} INFO - Connected (version 2.0, client OpenSSH_7.4)
[2020-09-11 07:04:38,545] {logging_mixin.py:112} INFO - [2020-09-11 07:04:38,545] {transport.py:1819} INFO - Auth banner: b'Authorized uses only. All activity may be monitored and reported.\n'
[2020-09-11 07:04:38,546] {logging_mixin.py:112} INFO - [2020-09-11 07:04:38,546] {transport.py:1819} INFO - Authentication (publickey) successful!
[2020-09-11 07:04:38,546] {ssh_operator.py:109} INFO - Running command: [REDACTED COMMAND 2]
....
[2020-09-11 07:04:38,722] {ssh_operator.py:89} INFO - ssh_conn_id is ignored when ssh_hook is provided.
[2020-09-11 07:04:38,722] {logging_mixin.py:112} INFO - [2020-09-11 07:04:38,722] {ssh_hook.py:166} WARNING - Remote Identification Change is not verified. This wont protect against Man-In-The-Middle attacks
[2020-09-11 07:04:38,723] {logging_mixin.py:112} INFO - [2020-09-11 07:04:38,723] {ssh_hook.py:170} WARNING - No Host Key Verification. This wont protect against Man-In-The-Middle attacks
[2020-09-11 07:04:38,734] {logging_mixin.py:112} INFO - [2020-09-11 07:04:38,734] {transport.py:1819} INFO - Connected (version 2.0, client OpenSSH_7.4)
[2020-09-11 07:04:38,867] {logging_mixin.py:112} INFO - [2020-09-11 07:04:38,867] {transport.py:1819} INFO - Auth banner: b'Authorized uses only. All activity may be monitored and reported.\n'
[2020-09-11 07:04:38,868] {logging_mixin.py:112} INFO - [2020-09-11 07:04:38,867] {transport.py:1819} INFO - Authentication (publickey) successful!
[2020-09-11 07:04:38,868] {ssh_operator.py:109} INFO - Running command: [REDACTED COMMAND 3]
@baolsen baolsen added the kind:bug This is a clearly a bug label Sep 11, 2020
@rootcss
Copy link
Contributor

rootcss commented Sep 17, 2020

@kaxil Please assign this to me.

@eladkal
Copy link
Contributor

eladkal commented Jul 16, 2021

@rootcss do you wish to finish the PR you started ?

@potiuk
Copy link
Member

potiuk commented Jul 18, 2021

I read the issue and - unless I misunderstood it - I think we should close both the issue and PR.

@baolsen Why would you like to run execute() method multiple times? The whole idea of execute() method is that it is run ONCE per task execution, in a separate process and the process is killed. This is pretty much how it always works, so the fact that it opens a new connection every time it is called is pretty much expected. Caching is hard, and if we do not have to do it, we should not do it.

The right way of reusing a connection (even if you want to compose or subclass something) is to reuse HOOK not operator. You can easily create single Hook and pass it around and use it many times while keeping the connection open, but you should not try to do it with the operator - pretty much always when you want to write some behaviour where you want to reuse connection for multiple API calls etc, the hook reuse is right not the operator. You should always in this case write a custom operator. You can even subclass existing operator, if this operator has some reusable methods, but then you should never call super execute method multiple times. Execute is not designed to be called more than once.

I am tempted to close it straight away, but this a very old issue @kaxil and @eladkal reacted to it - without pointing that out - maybe I do understand it wrongly ? Can someone shed some light on it :)

@uranusjr
Copy link
Member

The right way of reusing a connection (even if you want to compose or subclass something) is to reuse HOOK not operator.

The problem is SSHHook also does not reuse the underlying Paramiko connection, a new paramiko.SSHClient is created every time SSHHook.get_conn() is called, and it’s very non-obvious how that connection should be reused.

hook = SSHHook(...)

client = hook.get_conn()

# Does not reuse! A brand new connection is established.
# Plus the previous connection is now dangling.
client = hook.get_conn()

But I don’t think there’s a good quick solution to the problem; SSHHook is fundamentally not designed to be reused. So this probably should be removed from good-first-issue and we need to properly re-design SSHHook’s interface to make it work.

@baolsen
Copy link
Contributor Author

baolsen commented Jul 19, 2021

Thank you all for the feedback.

My use case is to split and zip some files on a remote server if those files are above a specific size.

To do this, I do the following over SSH:

  • list the input files and check their size
  • zip and split the largest ones using a Python script I send over SSH
  • Check the output by iterating the output files

I put them in a single task using CustomSSHOperator which inherits SSHOperator. I want my task to be:

  • Able to resume if the process fails on file N
  • Simplify the support process so we only re-run 1 task if it fails
  • Avoid performance issues with 1.10.x Airflow scheduler if we had many tasks

In my environment connections are expensive (for local and remote), complex and error prone.
This is mainly because of the corporate authentication implementation at my site.

But SSHOperator does not allow multiple commands to be run over 1 connection even when subclassed.
I had to find a way to run multiple commands over the same connection.

I could have subclassed SSHOperator and copied all of the "boiler plate" client code.
That code is complex and doesn't directly add value to my use case. I don't want to maintain and update it myself.

So I ended up for each command, changing the self.command and calling super.execute() as it was simplest.
I agree it is not the right way to solve this just because it was convenient for me ;)

One option could be to refactor SSHOperator to isolate the "run an SSH command" part which can be reused by subclasses.

But we seem not decided yet whether the hook is a better place to do this. If so I think the changes could be more extensive and impactful.

@uranusjr
Copy link
Member

One option could be to refactor SSHOperator to isolate the "run an SSH command" part which can be reused by subclasses.

This makes total sense to me, but I believe we still need to rewrite the hook a bit to make this happen. The operator is backend by the hook, so it can’t do anything the hook does not do; in order to be able to run multiple commands with the same connection, the hook needs to be able to return that connection for reuse in the first place, and currently the hook does not do that. And that change is not trivial, is what I was trying to say.

@potiuk
Copy link
Member

potiuk commented Jul 19, 2021

@baolsen:

| One option could be to refactor SSHOperator to isolate the "run an SSH command" part which can be reused by subclasses.

Oh yeah Makes total sense. And (see below) it can be easily fixed to provide client caching in Hook

@uranusjr :

# Does not reuse! A brand new connection is established.
# Plus the previous connection is now dangling.
client = hook.get_conn()

Yeah. Because SSHHook pretty much does not have its own methods so it was never a problem, I guess. It delegates everything to SSHClient. It is a Hook that on it's own is not even a thin wrapper around the SSHClient but delegates all the work to it. I think quick fixing it and caching the client should be two lines code (it happens in other operators):

if self.client:
    return client

@uranusjr
Copy link
Member

That would also mean SSHHook cannot be re-used though, because afaict paramiko.SSHClient does not support reopening a client once it’s closed. So if we do

if self.client:
    return client

This would no longer work:

hook = SSHHook(...)

with hook.get_conn() as client:
    ...

with hook.get_conn() as client:
    # Will fail! The client was already closed, and calling __enter__ again does not reopen it.

So it’s breaking one usage to accomodate another, which feels wrong to me.

@potiuk
Copy link
Member

potiuk commented Jul 25, 2021

Proposals?

@baolsen
Copy link
Contributor Author

baolsen commented Jul 30, 2021

I can put together a PR, to refactor SSHOperator to isolate the "run an SSH command" part

@baolsen
Copy link
Contributor Author

baolsen commented Aug 6, 2021

PR is up :) Observations and suggestions welcome.
Happy to make other improvements based on the feedback.
#17378

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment