Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/ssl support dbtspark #169

Merged
merged 22 commits into from
Jun 7, 2021
Merged

Feature/ssl support dbtspark #169

merged 22 commits into from
Jun 7, 2021

Conversation

rahulgoyal2987
Copy link
Contributor

@rahulgoyal2987 rahulgoyal2987 commented May 19, 2021

resolves #

Description

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rahulgoyal2987 Thanks for the PR!

I'm not opposed to the changes you're proposing here; my main concern is with the new dependency (https://github.com/devinstevenson/pure-transport). I see it has a small number of stars/users/contributors. Is this a package you've relied on for other projects?

FWIW it does seem tailor-made to the our use case, given the reliance on PyHive for thrift + http connections.

requirements.txt Outdated
@@ -3,3 +3,4 @@ PyHive[hive]>=0.6.0,<0.7.0
pyodbc>=4.0.30
sqlparams>=3.0.0
thrift>=0.11.0,<0.12.0
pure-transport>=0.2.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we put an upper bound on the version here, since it's version-0 software?

You'll need to add this to the extra setup requirements as well:
https://github.com/fishtown-analytics/dbt-spark/blob/dff1b613ddf87e4e72e8a47475bcfd1d55796a5c/setup.py#L41-L44

I'm inclined to bundle it in with the larger PyHive extra, unless there's a good reason to bundle it separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtcohen6 I have added the logic to build transport object rather than using pure-transport

dbt/adapters/spark/connections.py Outdated Show resolved Hide resolved
@rahulgoyal2987
Copy link
Contributor Author

rahulgoyal2987 commented May 20, 2021

@jtcohen6 Major use of pure-transport dependency (https://github.com/devinstevenson/pure-transport) It is providing support for TSSLSocket which pyhive is not providing. I have added the logic to build transport object and remove pure-transport dependency

@jtcohen6
Copy link
Contributor

@rahulgoyal2987 This is really neat! Have you managed to test this with your own projects, and confirm that it works as expected?

@rahulgoyal2987
Copy link
Contributor Author

rahulgoyal2987 commented May 25, 2021

@jtcohen6 I tested with my own project and setup and it works as expected.

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rahulgoyal2987 Thanks for all the work here! This isn't functionality I'm able to test directly, so I appreciate the unit test, and your confirmation that this works for your use case.

My only comments are around package imports: making sure the dependencies are appropriately installed alongside PyHive extras, or handled if missing.

Could you also:

  • Add use_ssl to the README as a supported connection parameter
  • Add a Changelog entry (under dbt next), and add yourself to the list of contributors

Comment on lines 29 to 31
from thrift.transport.TSSLSocket import TSSLSocket
import thrift
import ssl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's include these up in the try import (line 11) for requirements that may or may not be installed.

ssl is a net-new dependency, right? We'll need to add it in setup.py, within the Pyhive extra

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ssl is part of python library https://docs.python.org/3/library/ssl.html so i think it is not required to add

Copy link
Contributor

@jtcohen6 jtcohen6 Jun 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I said ssl here but I was thinking of thrift_sasl. That's a net-new dependency, right? Just kidding, I follow now, those are dependencies of PyHive[hive]. Thanks for the clarification around ssl.

Comment on lines 460 to 462
# Defer import so package dependency is optional
import sasl
import thrift_sasl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a different way of handling these optional imports. See the try logic at the top of the file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Could you move these up alongside the other imports (lines 29-32)? I'd prefer not to have imports nested so far down in the fil.

@rahulgoyal2987
Copy link
Contributor Author

@rahulgoyal2987 Thanks for all the work here! This isn't functionality I'm able to test directly, so I appreciate the unit test, and your confirmation that this works for your use case.

My only comments are around package imports: making sure the dependencies are appropriately installed alongside PyHive extras, or handled if missing.

Could you also:

  • Add use_ssl to the README as a supported connection parameter
  • Add a Changelog entry (under dbt next), and add yourself to the list of contributors

Done

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the contribution @rahulgoyal2987

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants