Initialise PySpark using hooks rather than custom context #1563
@Galileo-Galilei comment:
I've been thinking it may be nice to have some sort of …
@datajoely could those also be put into an …? e.g. all the Spark instantiation requires access to is …
I think you have to assume you would want local/dev/prod Spark configuration environments. I think all of these can be migrated to this pattern; my push here is to make sure we think about remote execution targets in the abstract when developing this. Personally, I'm very keen to build a Snowpark implementation, but will only do so once this stabilises.
Completed in kedro-org/kedro-starters#102
Spun out of #506 (comment).
Currently PySpark is initialised in Kedro using a custom context. We now have a much better place to do this: the `after_context_created` hook defined in `hooks.py`. This would look something like this:
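A minimal sketch of such a hook, assuming the Spark options live in a `spark.yml` file under `conf/` and that the `config_loader.get("spark*", "spark*/**")` pattern matches the Kedro version in use; the app name, Hive support, and `WARN` log level below are illustrative choices, not something this issue specifies:

```python
# src/<package_name>/hooks.py
from kedro.framework.context import KedroContext
from kedro.framework.hooks import hook_impl
from pyspark import SparkConf
from pyspark.sql import SparkSession


class SparkHooks:
    @hook_impl
    def after_context_created(self, context: KedroContext) -> None:
        """Initialise a SparkSession using config from the project's conf folder."""
        # Assumption: Spark options are kept in conf/<env>/spark.yml and
        # exposed through the context's config loader.
        parameters = context.config_loader.get("spark*", "spark*/**")
        spark_conf = SparkConf().setAll(parameters.items())

        # Build (or reuse) the session; nodes can later retrieve it with
        # SparkSession.builder.getOrCreate().
        spark_session = (
            SparkSession.builder.appName(context.project_path.name)
            .enableHiveSupport()
            .config(conf=spark_conf)
            .getOrCreate()
        )
        spark_session.sparkContext.setLogLevel("WARN")
```

Because the hook runs once the context is created, every node that calls `SparkSession.builder.getOrCreate()` afterwards picks up the same configured session.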
In addition to making the `hooks.py` file, you should remove the `context.py` file and edit `settings.py` to instantiate `SparkHooks` in `HOOKS` and no longer provide a custom context (see the registration sketch below).
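The corresponding registration could look something like this, where `my_project` is a placeholder package name:

```python
# src/my_project/settings.py  ("my_project" is a placeholder package name)
from my_project.hooks import SparkHooks

# Register the hook; there is deliberately no custom CONTEXT_CLASS here,
# since the point of this change is to drop the custom context entirely.
HOOKS = (SparkHooks(),)
```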
We need to change this in a:

Maybe in future `SparkHooks` will live somewhere within the kedro package, so that users can just do `from kedro.extras.hooks.pyspark import SparkHooks` and not need to define the hook themselves.

Related/possible alternative? #904