Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-4268] Add MsSqlToGoogleCloudStorageOperator #5077

Merged
merged 1 commit into from
Apr 21, 2019

Conversation

Tomme
Copy link
Contributor

@Tomme Tomme commented Apr 10, 2019

Make sure you have checked all steps below.

Jira

Description

  • Added MsSqlToGoogleCloudStorageOperator using MsSqlHook and GoogleCloudStorageHook.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    • tests/contrib/operators/test_mssql_to_gcs_operator.py

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@OmerJog
Copy link
Contributor

OmerJog commented Apr 10, 2019

My 2 notes:

  1. I think it's better to support all the relevant argument of the hook. For example the operator introduced in this PR doesn't support the gzip flag of the GoogleCloudStorageHook (reference [AIRFLOW-2932] GoogleCloudStorageHook - allow compression of file #3893 )

  2. PR [AIRFLOW-4236] Add num_retries to MySqlToGoogleCloudStorageOperator #5043 raised a question about the num_retries of the hook/operator. I didn't see the committers agree on course of action hopefully it will be done soon and this PR will be updated accordingly. No longer relevant after [AIRFLOW-4255] Replace Discovery based api with client based for GCS #5054

@codecov-io
Copy link

codecov-io commented Apr 10, 2019

Codecov Report

Merging #5077 into master will decrease coverage by 0.44%.
The diff coverage is 98.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5077      +/-   ##
==========================================
- Coverage   77.51%   77.07%   -0.45%     
==========================================
  Files         466      466              
  Lines       29986    30042      +56     
==========================================
- Hits        23244    23155      -89     
- Misses       6742     6887     +145
Impacted Files Coverage Δ
airflow/contrib/operators/mssql_to_gcs.py 98.75% <98.75%> (ø)
airflow/operators/mysql_operator.py 0% <0%> (-100%) ⬇️
airflow/contrib/operators/vertica_operator.py 0% <0%> (-100%) ⬇️
airflow/operators/mysql_to_hive.py 0% <0%> (-100%) ⬇️
airflow/hooks/webhdfs_hook.py 36.36% <0%> (-51.4%) ⬇️
airflow/contrib/operators/winrm_operator.py 0% <0%> (-44.07%) ⬇️
airflow/utils/sqlalchemy.py 77.27% <0%> (-4.55%) ⬇️
airflow/hooks/hive_hooks.py 73.28% <0%> (-1.86%) ⬇️
airflow/models/connection.py 64.6% <0%> (-1.13%) ⬇️
airflow/hooks/dbapi_hook.py 87.93% <0%> (-0.87%) ⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cb4e39a...d0400f4. Read the comment docs.

Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor change suggest otherwise LGTM. Good work @Tomme

@Tomme
Copy link
Contributor Author

Tomme commented Apr 12, 2019

Cheers @kaxil, all squashed now 👍

airflow/contrib/operators/mssql_to_gcs.py Outdated Show resolved Hide resolved
airflow/contrib/operators/mssql_to_gcs.py Show resolved Hide resolved
airflow/contrib/operators/mssql_to_gcs.py Show resolved Hide resolved
airflow/contrib/operators/mssql_to_gcs.py Show resolved Hide resolved
airflow/contrib/operators/mssql_to_gcs.py Outdated Show resolved Hide resolved
airflow/contrib/operators/mssql_to_gcs.py Outdated Show resolved Hide resolved
airflow/contrib/operators/mssql_to_gcs.py Show resolved Hide resolved
@Tomme
Copy link
Contributor Author

Tomme commented Apr 15, 2019

@OmerJog Regarding your first point (and PR #3893), I agree it would be good to support gzip however what would be the best approach to take? Should we improve operators on an ad-hoc basis or in bulk?

@OmerJog
Copy link
Contributor

OmerJog commented Apr 15, 2019

@Tomme This is a new operator to the repo so I think it's better to have it with as much functionality as possible. However there is also the question of your time.

* Add operator MsSqlToGoogleCloudStorageOperator
* Add unit test for MsSqlToGoogleCloudStorageOperator
* Update intergration.rst documentation
@OmerJog
Copy link
Contributor

OmerJog commented Apr 21, 2019

@kaxil PTAL

@kaxil kaxil merged commit 98388ae into apache:master Apr 21, 2019
@kaxil
Copy link
Member

kaxil commented Apr 21, 2019

Thanks @OmerJog

andriisoldatenko pushed a commit to andriisoldatenko/airflow that referenced this pull request Jul 26, 2019
wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Jul 29, 2019
dharamsk pushed a commit to postmates/airflow that referenced this pull request Aug 8, 2019
@stpavinash
Copy link

I'm trying to connect to sql server instance using this operator and trying to a query a table which has got 20M records and the airflow task is in running state for atleast 8 hours and it never generated a json file in GCS. However, tables that have like 200k records were queried in like 1 min and json was created in GCS. Could you please help?

@mik-laj
Copy link
Member

mik-laj commented Oct 5, 2019

Can you create a bug report in Jira? Please add the "gcp" component.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants