Refactor SSHOperator so a subclass can run many commands (#10874) #17378

Merged
merged 2 commits into from
Oct 13, 2021

baolsen
Contributor

@baolsen baolsen commented Aug 2, 2021

Created to aid discussion on #10874; the details are there.

closes: #10874

@baolsen
Contributor Author

baolsen commented Aug 11, 2021

Hey @potiuk , I saw you made a recent change to SSH Operator. I've rebased and included it.
Was wondering if you'd like to review this change also, or know others who may be interested :)

self.log.info('Creating ssh_client')
return self.get_hook().get_conn()

def exec_ssh_client_command(self, ssh_client: SSHClient, command: str) -> Tuple[int, bytes, bytes]:
Member

@uranusjr uranusjr Aug 11, 2021

This feels like the wrong abstraction to me, since self is only used here for logging, and it is entirely up to the caller to pass in the correct SSHClient instance, which the operator should be able to manage itself.

Would something like this make more sense?

@property
def client(self):
    if self._client is None:
        raise RuntimeError("Outside of a create_ssh_client() context")
    return self._client

def execute(self, context=None) -> Union[bytes, str]:
    with self.create_ssh_client():  # This sets self._client so it can be used by other methods.
        self.run_remote_command(self.command)
        # On exit, close self._client and set self._client to None.
    # Error handling and serialization etc. afterward omitted for brevity.
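For readers who want to try the suggestion above, here is a minimal runnable sketch of the stateful pattern. It is only an illustration under stated assumptions: FakeSSHClient and StatefulOperator are hypothetical stand-ins (not Airflow or paramiko classes), and create_ssh_client() is written as a contextlib context manager, which is one plausible way to get the "set on entry, clear on exit" behaviour described in the comments.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional


class FakeSSHClient:
    """Hypothetical stand-in for an SSH client; records commands instead of running them."""

    def __init__(self) -> None:
        self.closed = False
        self.commands: List[str] = []

    def run(self, command: str) -> bytes:
        self.commands.append(command)
        return f"ran: {command}".encode()

    def close(self) -> None:
        self.closed = True


class StatefulOperator:
    """Hypothetical operator that owns its client instead of receiving it as an argument."""

    def __init__(self, command: str) -> None:
        self.command = command
        self._client: Optional[FakeSSHClient] = None

    @property
    def client(self) -> FakeSSHClient:
        if self._client is None:
            raise RuntimeError("Outside of a create_ssh_client() context")
        return self._client

    @contextmanager
    def create_ssh_client(self) -> Iterator[FakeSSHClient]:
        # Set self._client on entry so other methods can reach it via self.client.
        self._client = FakeSSHClient()
        try:
            yield self._client
        finally:
            # On exit, close the client and reset the field, even if the body raised.
            self._client.close()
            self._client = None

    def run_remote_command(self, command: str) -> bytes:
        return self.client.run(command)

    def execute(self, context=None) -> bytes:
        with self.create_ssh_client():
            return self.run_remote_command(self.command)
```

With this shape, calling run_remote_command() outside the with block fails fast with the RuntimeError rather than silently opening a second connection.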

Contributor Author

Thanks for the feedback @uranusjr

I've pushed another commit along these lines, please take a look.
(I know this build is failing, can ignore & I'll work on it)

The thing is, we don't want to call super().execute() from a subclass.
So I moved the error handling etc. out of execute(), so that a subclass can re-use it when needed.
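A rough sketch of what that split makes possible, assuming a stateless design in which the client is passed to each helper. Only exec_ssh_client_command appears in the diff excerpt above; SketchSSHOperator, ManyCommandsOperator, FakeSSHClient, get_ssh_client, raise_for_status and run_ssh_client_command are illustrative names for this example and are not taken from the PR.

```python
from typing import List, Tuple


class FakeSSHClient:
    """Hypothetical stand-in for an SSH client; no network access."""

    def __init__(self) -> None:
        self.closed = False

    def __enter__(self) -> "FakeSSHClient":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.closed = True  # closed once, after all commands have run

    def exec_command(self, command: str) -> Tuple[int, bytes, bytes]:
        # Commands starting with "fail" simulate a non-zero exit status.
        if command.startswith("fail"):
            return 1, b"", b"simulated failure"
        return 0, command.encode(), b""


class SketchSSHOperator:
    """Hypothetical base operator with error handling kept outside execute()."""

    def __init__(self, command: str) -> None:
        self.command = command

    def get_ssh_client(self) -> FakeSSHClient:
        return FakeSSHClient()

    def exec_ssh_client_command(
        self, ssh_client: FakeSSHClient, command: str
    ) -> Tuple[int, bytes, bytes]:
        return ssh_client.exec_command(command)

    def raise_for_status(self, exit_status: int, stderr: bytes) -> None:
        if exit_status != 0:
            raise RuntimeError(f"exit status {exit_status}: {stderr.decode()}")

    def run_ssh_client_command(self, ssh_client: FakeSSHClient, command: str) -> bytes:
        exit_status, stdout, stderr = self.exec_ssh_client_command(ssh_client, command)
        self.raise_for_status(exit_status, stderr)
        return stdout

    def execute(self, context=None) -> bytes:
        with self.get_ssh_client() as ssh_client:
            return self.run_ssh_client_command(ssh_client, self.command)


class ManyCommandsOperator(SketchSSHOperator):
    """Hypothetical subclass: one connection, several commands."""

    def __init__(self, commands: List[str]) -> None:
        super().__init__(command=commands[0] if commands else "")
        self.commands = commands

    def execute(self, context=None) -> List[bytes]:
        # Deliberately does not call super().execute(); it re-uses the helpers instead.
        with self.get_ssh_client() as ssh_client:
            return [self.run_ssh_client_command(ssh_client, c) for c in self.commands]
```

The subclass opens a single client and gets the same exit-status check for every command, which is the re-use the comment above is describing.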

Contributor Author

The build has passed. The earlier failure was flaky CI.

Member

I am OK with keeping it as is (re: abstraction). Passing the client in is fine: we already have some hooks that take this stateless approach (and some that keep the connection state inside the hook).

I have no strong opinion on which is better. The stateful approach is better from an OO perspective and gives the Hook more meaning as also being a 'session', but that is not really necessary. A Hook (and it is a bad name) is more of a "nice API" for an operator to (re-)use, one that understands a "connection" and reads credentials from it.

I think we never agreed on whether a Hook should map 1<->1 to a session/client, and maybe it does not really matter. I think the most important capability of a Hook is mapping a connection to credentials and exposing a simple Python API that you can easily use from an Operator.

But adding _client as a field is also OK for me.

Contributor Author

Hey @potiuk, thanks for the feedback.
Hey @uranusjr, please review again and let me know :)

Contributor Author

Hey @potiuk and @uranusjr , any feedback :)

@baolsen baolsen requested a review from uranusjr August 18, 2021 12:21
@baolsen baolsen requested a review from potiuk October 5, 2021 06:42
@potiuk
Member

potiuk commented Oct 5, 2021

I will take a closer look tomorrow. Could you please rebase, though? There have been some changes in the SSH hook since the last round.

Member

@potiuk potiuk left a comment

This approach looks good to me. @uranusjr ?

Member

@uranusjr uranusjr left a comment

Personally, I'm not sure how the refactored methods are useful, but they are refactored reasonably, and I have no problem accepting them if they help someone.

@github-actions github-actions bot added the "okay to merge" label (It's ok to merge this PR as it does not require more tests) Oct 12, 2021
@github-actions

The PR is likely OK to be merged with just a subset of tests for the default Python and Database versions, without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full test matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

Labels
area:providers, okay to merge (It's ok to merge this PR as it does not require more tests)
Development

Successfully merging this pull request may close these issues.

SSHHook get_conn() does not re-use client
3 participants