
[AIRFLOW-1762] Implement key_file support in ssh_hook create_tunnel #3473

Closed · wants to merge 1 commit into master from NielsZeilemaker:ssh_hook

Conversation

NielsZeilemaker
Contributor

@NielsZeilemaker NielsZeilemaker commented Jun 7, 2018

Switched to using sshtunnel package instead of popen approach

Make sure you have checked all steps below.

JIRA

  • My PR addresses the following Airflow JIRA issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    The ssh_hook.create_tunnel was opening a tunnel using a popen call, and it required the local port as an argument. This PR refactors it to use the sshtunnel package, which is built on top of paramiko (see the sketch below).
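    For context, a minimal sketch of the sshtunnel API the PR builds on (host, user, and paths are made up; this is not the hook's actual code). Note that local_bind_address is optional, so the local port no longer has to be passed in by the caller:

    ```python
    from sshtunnel import SSHTunnelForwarder

    # Illustrative only: open a tunnel to a service behind an SSH host.
    tunnel = SSHTunnelForwarder(
        ("remote.example.com", 22),               # SSH server (hypothetical)
        ssh_username="airflow",
        ssh_pkey="/home/airflow/.ssh/id_rsa",     # key_file support
        remote_bind_address=("127.0.0.1", 5432),  # service behind the SSH host
    )
    tunnel.start()
    print(tunnel.local_bind_port)  # sshtunnel picked a free local port
    tunnel.stop()
    ```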

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • [ ] In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Code Quality

  • Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

@NielsZeilemaker NielsZeilemaker force-pushed the ssh_hook branch 4 times, most recently from baf1eb1 to d556b4f Compare June 7, 2018 12:26
Contributor

@Fokko Fokko left a comment

I have to take a deeper look at this, but some quick observations:

  • One thing that will break the API right now is the removal of the __enter__ and __exit__ methods (see the sketch below).
  • I do like that the subprocess stuff is removed.
  • Maybe add some documentation on the changes as well.
  • I strongly feel we should have some actual tests here instead of only mocking, since this is such an important operator.

The SSH hook/operator is one of the most heavily used, so we have to be careful when making changes here.
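For reference, one way to keep the __enter__/__exit__ API while moving to sshtunnel could look roughly like this (a hedged sketch, not the PR's actual code):

```python
class TunnelGuard(object):
    """Sketch: wrap an sshtunnel forwarder so that
    `with hook.create_tunnel(...):` keeps working after the
    Popen-based implementation is removed."""

    def __init__(self, forwarder):
        self.forwarder = forwarder

    def __enter__(self):
        self.forwarder.start()
        return self.forwarder

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.forwarder.stop()
```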

setup.py Outdated
@@ -188,7 +188,7 @@ def write_version(filename=os.path.join(*['airflow',
slack = ['slackclient>=1.0.0']
snowflake = ['snowflake-connector-python>=1.5.2',
'snowflake-sqlalchemy>=1.1.0']
- ssh = ['paramiko>=2.1.1', 'pysftp>=0.2.9']
+ ssh = ['paramiko>=2.1.1', 'pysftp>=0.2.9', 'sshtunnel']
Contributor

Maybe lock on a major version?
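For example, a pin like the following (the 0.1.x range is an assumption; the exact bound is a maintainer's choice):

```python
ssh = ['paramiko>=2.1.1', 'pysftp>=0.2.9', 'sshtunnel>=0.1.3,<0.2']
```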



- class SSHHookTest(unittest.TestCase):
+ class SSHHookTest(unittest.TestCase):\
Contributor

\ typo?

@codecov-io

codecov-io commented Jun 13, 2018

Codecov Report

Merging #3473 into master will decrease coverage by 59.21%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #3473       +/-   ##
===========================================
- Coverage   77.17%   17.95%   -59.22%     
===========================================
  Files         206      203        -3     
  Lines       15769    15123      -646     
===========================================
- Hits        12169     2716     -9453     
- Misses       3600    12407     +8807
Impacted Files Coverage Δ
airflow/utils/log/es_task_handler.py 0% <0%> (-100%) ⬇️
airflow/operators/email_operator.py 0% <0%> (-100%) ⬇️
airflow/www/forms.py 0% <0%> (-100%) ⬇️
airflow/example_dags/example_latest_only.py 0% <0%> (-100%) ⬇️
airflow/example_dags/example_http_operator.py 0% <0%> (-100%) ⬇️
airflow/sensors/time_delta_sensor.py 0% <0%> (-100%) ⬇️
airflow/operators/sqlite_operator.py 0% <0%> (-100%) ⬇️
airflow/www_rbac/validators.py 0% <0%> (-100%) ⬇️
airflow/example_dags/example_subdag_operator.py 0% <0%> (-100%) ⬇️
...rflow/api/common/experimental/get_dag_run_state.py 0% <0%> (-100%) ⬇️
... and 166 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b889522...a93b9b9. Read the comment docs.

@NielsZeilemaker
Contributor Author

@Fokko I guess this one is ready to be merged. I put some effort into making it backwards compatible with the previous hook, and added some deprecation warnings, etc.

client.set_missing_host_key_policy(paramiko.AutoAddPolicy())

if self.password and self.password.strip():
    client.connect(hostname=self.remote_host,
Member

This would not work if the private key has a passphrase. I tried that and it failed. Can you please add key_filename=self.key_file to this if statement as well? That worked for me, and I don't want to create a separate PR if it can be sorted out here.
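A sketch of the suggested change (attribute names beyond those in the snippet above, such as username, are assumptions):

```python
# Hedged sketch of the reviewer's suggestion: also pass the key file when a
# password is set, so a passphrase-protected private key can be used.
if self.password and self.password.strip():
    client.connect(hostname=self.remote_host,
                   username=self.username,      # assumed attribute
                   password=self.password,
                   key_filename=self.key_file)  # the requested addition
```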

Contributor Author

Hi kaxil, I added the key_file for both SSH and the tunnel. The sshtunnel package has the same behaviour as paramiko and will also use the provided password for the key_file.

Member

Thanks @NielsZeilemaker 👍

@NielsZeilemaker NielsZeilemaker force-pushed the ssh_hook branch 2 times, most recently from a2885ef to 7502d4a Compare June 25, 2018 18:04
@kaxil
Member

kaxil commented Jul 21, 2018

Can you please resolve the conflicts?

@NielsZeilemaker
Contributor Author

I'll have a go at it on Monday.

@NielsZeilemaker
Contributor Author

@kaxil I've fixed the conflicts.

@kaxil
Member

kaxil commented Jul 23, 2018

Codewise this looks good to me. @Fokko Can you also please have a look and see if you have some comments?

Contributor

@Fokko Fokko left a comment

LGTM, but please remove the print statements from the tests.

- self.server_handle = subprocess.Popen(["python", "-c", HELLO_SERVER_CMD],
-                                       stdout=subprocess.PIPE)
+ server_handle = subprocess.Popen(["python", "-c", HELLO_SERVER_CMD],
+                                  stdout=subprocess.PIPE)
+ print("Setting up tunnel")
Contributor

Can you remove the print statements please?

Switched to using sshtunnel package instead of popen approach
@asfgit asfgit closed this in 53933c0 Jul 24, 2018
lxneng pushed a commit to lxneng/incubator-airflow that referenced this pull request Aug 10, 2018
Switched to using sshtunnel package instead of
popen approach

Closes apache#3473 from NielsZeilemaker/ssh_hook
wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Sep 4, 2018
add 8fit to list of companies

[AIRFLOW-XXX] Add THE ICONIC to the list of orgs using Airflow

Closes apache#3807 from ksaagariconic/patch-2

[AIRFLOW-2933] Enable Codecov on Docker-CI Build (apache#3780)

- Add missing variables and use codecov instead of coveralls.
  It wasn't working because of missing environment variables.
  The codecov library heavily depends on the environment variables in
  the CI to determine how to push the reports to codecov.

- Remove the explicit passing of the variables in the `tox.ini`
  since it is already done in the `docker-compose.yml`,
  having to maintain this at two places makes it brittle.

- Removed the empty Codecov yml since codecov was complaining that
  it was unable to parse it

[AIRFLOW-2960] Pin boto3 to <1.8 (apache#3810)

Boto3 1.8 was released a few days ago and it breaks our tests.

[AIRFLOW-2957] Remove obsolete sensor references

[AIRFLOW-2959] Refine HTTPSensor doc (apache#3809)

An HTTP error code other than 404,
or a connection refused error, now fails the sensor
directly (no more poking).

[AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test (apache#3811)

Simplify this test since it takes up 15% of the total test time. This is because
every example dag, with some exclusions, is backfilled. This puts pressure
on the scheduler and everything else. Covering just a couple of DAGs
should be sufficient.

254 seconds:
[success] 15.03% tests.BackfillJobTest.test_backfill_examples: 254.9323s

[AIRFLOW-XXX] Remove residual line in Changelog (apache#3814)

[AIRFLOW-2930] Fix celery executor scheduler crash (apache#3784)

Caused by an update in PR apache#3740.
execute_command.apply_async(args=command, ...)
- command is a list of short unicode strings, and the above code passes multiple
  arguments to a function defined as taking only one argument.
- command = ["airflow", "run", "dag323", ...]
- args = command = ["airflow", "run", "dag323", ...]
- execute_command("airflow", "run", "dag323", ...) raises an error and exits.
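A minimal illustration of the unpacking pitfall (plain Python standing in for Celery's args handling):

```python
def execute_command(cmd):
    """The task takes exactly one argument: the full command list."""
    print(cmd)

command = ["airflow", "run", "dag323"]

# Celery calls the task as task(*args), so:
args = command    # buggy: expands to execute_command("airflow", "run", "dag323")
try:
    execute_command(*args)
except TypeError as err:
    print(err)    # takes 1 positional argument but 3 were given

args = [command]  # fixed: expands to execute_command(["airflow", "run", "dag323"])
execute_command(*args)
```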

[AIRFLOW-2916] Arg `verify` for AwsHook() & S3 sensors/operators (apache#3764)

This is useful when
1. users want to use a different CA cert bundle than the
  one used by botocore.
2. users want to have '--no-verify-ssl'. This is especially useful
  when using on-premises S3 or other object storage
  implementations, like IBM's Cloud Object Storage.

The default value here is `None`, which is also the default
value in boto3, so that backward compatibility is ensured too.

Reference:
https://boto3.readthedocs.io/en/latest/reference/core/session.html
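A usage sketch of the underlying boto3 parameter (the bundle path is illustrative):

```python
import boto3

# verify=None           -> boto3's default certificate handling (backward compatible)
# verify=False          -> skip SSL verification, like '--no-verify-ssl'
# verify='/path/ca.pem' -> use a custom CA certificate bundle
s3 = boto3.client("s3", verify="/etc/ssl/certs/my-ca-bundle.pem")
```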

[AIRFLOW-2709] Improve error handling in Databricks hook (apache#3570)

* Use float for default value
* Use status code to determine whether an error is retryable
* Fix wrong type in assertion
* Fix style to prevent lines from exceeding 90 characters
* Fix wrong way of checking exception type

[AIRFLOW-2854] kubernetes_pod_operator add more configuration items (apache#3697)

* kubernetes_pod_operator add more configuration items
* fix test_kubernetes_pod_operator test_faulty_service_account failure case
* fix review comment issues
* pod_operator add hostnetwork config
* add doc example

[AIRFLOW-2994] Fix command status check in Qubole Check operator (apache#3790)

[AIRFLOW-2928] Use uuid4 instead of uuid1 (apache#3779)

for better randomness.

[AIRFLOW-2993] Added sftp_to_s3 and s3_to_sftp operators (apache#3828)

[AIRFLOW-2949] Add syntax highlight for single quote strings (apache#3795)

* AIRFLOW-2949: Add syntax highlight for single quote strings

* AIRFLOW-2949: Also updated new UI main.css

[AIRFLOW-2948] Arg check & better doc - SSHOperator & SFTPOperator (apache#3793)

There may be different combinations of arguments, and
some processing is done 'silently', while users
may not be fully aware of it.

For example
- User only needs to provide either `ssh_hook`
  or `ssh_conn_id`, while this is not clear in the doc
- if both are provided, `ssh_conn_id` will be ignored.
- if `remote_host` is provided, it will replace
  the `remote_host` defined in `ssh_hook`
  or predefined in the connection of `ssh_conn_id`

These should be documented clearly to ensure it's
transparent to the users. log.info() should also be
used to remind users and provide clear logs.

In addition, add an instance check for ssh_hook to ensure
it is of the correct type (SSHHook).

Tests are updated for this PR.
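A sketch of such an instance check (the helper name is hypothetical; module paths assume Airflow's contrib layout of the time):

```python
from airflow.contrib.hooks.ssh_hook import SSHHook
from airflow.exceptions import AirflowException

def _validate_ssh_hook(ssh_hook):
    # Reject anything that is not an SSHHook up front, instead of
    # failing later with a confusing attribute error.
    if not isinstance(ssh_hook, SSHHook):
        raise AirflowException("ssh_hook must be an instance of SSHHook")
    return ssh_hook
```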

[AIRFLOW-XXX] Fix Broken Link in CONTRIBUTING.md

[AIRFLOW-2980] ReadTheDocs - Fix Missing API Reference

[AIRFLOW-2984] Convert operator dates to UTC (apache#3822)

Tasks can have start_dates or end_dates separate
from the DAG. These need to be converted to UTC, otherwise
we cannot use them to calculate the next execution
date.

[AIRFLOW-2779] Make GHE auth third party licensed (apache#3803)

This reinstates the original license.

[AIRFLOW-XXX] Add Format to list of companies (apache#3824)

[AIRFLOW-2900] Show code for packaged DAGs (apache#3749)

[AIRFLOW-2983] Add prev_ds_nodash and next_ds_nodash macro (apache#3821)

[AIRFLOW-2989] Add param to set bootDiskType in Dataproc Op (apache#3825)

Add param to set bootDiskType for master and
worker nodes in `DataprocClusterCreateOperator`

[AIRFLOW-2974] Extended Databricks hook with clusters operation (apache#3817)

Add hooks for cluster start, restart and terminate.
Add unit tests for the added hooks.
Add cluster_id variable for performing cluster operation tests.

[AIRFLOW-2993] Fix Docstrings for Operators (apache#3828)

Addition of s3_to_sftp and sftp_to_s3 operators.

Add 'steps' into template_fields in EmrAddSteps

Rendering templates which are in steps is especially useful if you
want to pass execution time as one of the parameters of a step in
an EMR cluster. All fields in template_fields will get rendered.
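A hedged sketch of such a templated step (the step fields follow the EMR AddSteps API; the bucket and job are made up):

```python
# With 'steps' in template_fields, Jinja expressions inside the step
# definitions are rendered at runtime.
SPARK_STEPS = [{
    "Name": "process_{{ ds_nodash }}",  # rendered with the execution date
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "s3://my-bucket/job.py", "{{ ds }}"],
    },
}]
```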

[AIRFLOW-1762] Implement key_file support in ssh_hook create_tunnel

Switched to using sshtunnel package instead of
popen approach

Closes apache#3473 from NielsZeilemaker/ssh_hook

[AIRFLOW-XXX] Fix Docstrings for Operators (apache#3820)

[AIRFLOW-2993] Renamed operators to meet name length requirements (apache#3828)

[AIRFLOW-2993] Corrected flake8 line diff format (apache#3828)

[AIRFLOW-2994] Fix flatten_results for BigQueryOperator (apache#3829)

[AIRFLOW-2951] Update dag_run table end_date when state change (apache#3798)

Previously, Airflow only updated the dag_run table's end_date when
a user terminated a DAG in the web UI. The end_date was not updated
when Airflow itself detected that a DAG had finished and updated its state.

This commit adds the end_date update to DagRun's set_state function to
fix the problem described above.

[AIRFLOW-2145] fix deadlock on clearing running TI (apache#3657)

a `shutdown` task is not considered to be `unfinished`, so a dag run can
deadlock when all `unfinished` downstreams are waiting on a task
that's in the `shutdown` state. Fix this by considering `shutdown` to
be `unfinished`, since it's not truly a terminal state.

[AIRFLOW-2981] Fix TypeError in dataflow operators (apache#3831)

- Fix TypeError in dataflow operators when using GCS jar or py_file

[AIRFLOW-XXX] Fix typo in docstring of gcs_to_bq (apache#3833)

[AIRFLOW-2476] Allow tabulate up to 0.8.2 (apache#3835)

[AIRFLOW-XXX] Fix typos in faq.rst (apache#3837)

[AIRFLOW-2979] Make celery_result_backend conf Backwards compatible (apache#3832)

(apache#2806) renamed `celery_result_backend` to `result_backend`, which broke backwards compatibility.

[AIRFLOW-2866] Fix missing CSRF token head when using RBAC UI (apache#3804)

[AIRFLOW-491] Add feature to pass extra api configs to BQ Hook (apache#3733)

[AIRFLOW-208] Add badge to show supported Python versions (apache#3839)

[AIRFLOW-2993] Added sftp_to_s3 operator and s3_to_sftp operator. (apache#3828)
ashb pushed a commit that referenced this pull request Oct 22, 2018
Switched to using sshtunnel package instead of
popen approach

Closes #3473 from NielsZeilemaker/ssh_hook
ashb pushed a commit to ashb/airflow that referenced this pull request Oct 22, 2018
Switched to using sshtunnel package instead of
popen approach

Closes apache#3473 from NielsZeilemaker/ssh_hook
galak75 pushed a commit to VilledeMontreal/incubator-airflow that referenced this pull request Nov 23, 2018
Switched to using sshtunnel package instead of
popen approach

Closes apache#3473 from NielsZeilemaker/ssh_hook
aliceabe pushed a commit to aliceabe/incubator-airflow that referenced this pull request Jan 3, 2019
Switched to using sshtunnel package instead of
popen approach

Closes apache#3473 from NielsZeilemaker/ssh_hook
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Jan 23, 2019
Switched to using sshtunnel package instead of
popen approach

Closes apache#3473 from NielsZeilemaker/ssh_hook