Refactor SSHOperator so a subclass can run many commands (#10874) #17378
Hey @potiuk, I saw you made a recent change to SSH Operator. I've rebased and included it.
        self.log.info('Creating ssh_client')
        return self.get_hook().get_conn()

    def exec_ssh_client_command(self, ssh_client: SSHClient, command: str) -> Tuple[int, bytes, bytes]:
This feels like a wrong abstraction to me, since self here is only used for logging, and it's entirely up to the caller to pass in the correct SSHClient instance, which the operator should be able to manage.
Would something like this make more sense?
    @property
    def client(self):
        if self._client is None:
            raise RuntimeError("Outside of a create_ssh_client() context")
        return self._client

    def execute(self, context=None) -> Union[bytes, str]:
        # create_ssh_client() sets self._client so it can be used by other methods.
        with self.create_ssh_client():
            self.run_remote_command(self.command)
        # On exit, close self._client and set self._client to None.
        # Error handling and serialization etc. afterward omitted for brevity.
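To make the suggestion above concrete, here is a minimal, self-contained sketch of the stateful pattern. It is only an illustration under assumptions: `FakeSSHClient` and `StatefulSSHOperator` are hypothetical stand-ins (not paramiko or the real `SSHOperator`), and `create_ssh_client()` is written as a `contextlib.contextmanager` purely to show how `self._client` could be set on entry and cleared on exit.

```python
from contextlib import contextmanager


class FakeSSHClient:
    """Hypothetical stand-in for an SSH client so the sketch runs offline."""

    def __init__(self):
        self.closed = False

    def run(self, command):
        # Pretend to run the command remotely and return its output.
        return f"ran: {command}"

    def close(self):
        self.closed = True


class StatefulSSHOperator:
    """Illustrative only; not the real SSHOperator."""

    def __init__(self, command):
        self.command = command
        self._client = None

    @property
    def client(self):
        # Guard against use outside the context manager.
        if self._client is None:
            raise RuntimeError("Outside of a create_ssh_client() context")
        return self._client

    @contextmanager
    def create_ssh_client(self):
        # Set self._client for the duration of the with-block.
        self._client = FakeSSHClient()
        try:
            yield self._client
        finally:
            # Always close and reset, even if the body raised.
            self._client.close()
            self._client = None

    def run_remote_command(self, command):
        return self.client.run(command)

    def execute(self, context=None):
        with self.create_ssh_client():
            return self.run_remote_command(self.command)
```

With this shape, helper methods never take a client argument; they read `self.client`, which fails loudly if called outside the `with` block.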
Thanks for the feedback @uranusjr
I've pushed another commit along these lines, please take a look.
(I know this build is failing, can ignore & I'll work on it)
The thing is, we don't want to call super().execute() from a subclass.
So I put the error handling etc. outside it so it can be re-used by a subclass when needed.
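As a rough illustration of that point, the sketch below shows a subclass running several commands over one client without calling `super().execute()`. Everything here is a stand-in: `BaseOperatorSketch` and `FakeClient` are hypothetical, `exec_ssh_client_command` mirrors the signature in the diff, and `get_ssh_client` and `raise_for_status` are assumed names for the client-creation and error-handling helpers.

```python
from typing import List, Tuple


class FakeClient:
    """Hypothetical stand-in for an SSH client."""

    def close(self) -> None:
        pass


class BaseOperatorSketch:
    """Illustrative stand-in for the refactored operator, not the real class."""

    def get_ssh_client(self) -> FakeClient:  # assumed helper name
        return FakeClient()

    def exec_ssh_client_command(
        self, ssh_client: FakeClient, command: str
    ) -> Tuple[int, bytes, bytes]:
        # Simulated execution: returns (exit status, stdout, stderr).
        # The command "false" simulates a failing remote command.
        if command == "false":
            return 1, b"", b"command failed"
        return 0, command.encode(), b""

    def raise_for_status(self, exit_status: int, stderr: bytes) -> None:
        # Assumed helper: error handling kept outside execute() for reuse.
        if exit_status != 0:
            raise RuntimeError(f"error running cmd: {stderr.decode()}")


class ManyCommandsOperator(BaseOperatorSketch):
    def __init__(self, commands: List[str]) -> None:
        self.commands = commands

    def execute(self, context=None) -> List[bytes]:
        # No super().execute(): the subclass composes the smaller helpers.
        client = self.get_ssh_client()
        try:
            outputs = []
            for command in self.commands:
                status, stdout, stderr = self.exec_ssh_client_command(client, command)
                self.raise_for_status(status, stderr)
                outputs.append(stdout)
            return outputs
        finally:
            client.close()
```

The design choice being illustrated is that each helper is usable on its own, so a subclass can loop over commands and still share the same error handling.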
The build has passed now. The earlier failure was flaky CI.
I am OK with keeping it as is (re: abstraction). Passing the client in is fine: we already have some hooks that take that stateless approach (and some that keep the state of the connection).
No strong opinion on which is better. The stateful approach is better from an OO perspective and gives the Hook more meaning as also being a 'session'. But this is not really necessary. A Hook (and it is a bad name) is more of a "nice API" for an operator to (re-)use, one that understands a "connection" and reads credentials from it.
I think we never agreed on whether a Hook should map 1<->1 to a session/client, and maybe it does not really matter. I think the most important capability of a Hook is mapping a connection into credentials and offering a simple Python API, so that you can easily use it from an Operator.
But adding _client as a field is also OK for me.
I will take a closer look tomorrow. Could you please rebase, though? There have been some changes in the SSH hook since the last round.
This approach looks good to me. @uranusjr ?
Personally I'm not sure how the refactored methods are useful, but they are refactored reasonably, and I have no problem accepting them if they help someone.
The PR is likely OK to be merged with just a subset of tests for the default Python and Database versions, without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full test matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.
Created to aid discussion on #10874 , details are there.
closes: #10874