First blog post from external contributor about AWS EMR #64
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

## 2. Set up `CONF_ROOT`

By default, Kedro looks at the root `conf` folder for all its configurations (catalog, parameters, globals, credentials, logging) to run the pipelines. However, [this can be customized](https://docs.kedro.org/en/stable/kedro_project_setup/configuration.html#configuration-root) by changing `CONF_ROOT` in `settings.py`.
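
As a minimal sketch of that customization, the relevant line in `settings.py` could look like the following. The file path and the folder value are hypothetical and only illustrate the shape of the setting. The setting name follows the text above; depending on the Kedro release it may instead be called `CONF_SOURCE`, so check the linked documentation for the version in use.

```python
# src/proj_name/settings.py (hypothetical path; adjust to your project layout)

# Folder that Kedro reads configuration from; the default is "conf".
# "src/proj_name/conf" is an illustrative value, not taken from the post.
CONF_ROOT = "src/proj_name/conf"
```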
To make this more annoying, in 0.18.7 we package configuration as a separate `tar.gz` file, which can be used in conjunction with the `--conf-source` flag.
from proj_name.__main__ import main

if __name__ == "__main__":
    # params = [
I don't know if this comment style is intuitive
I'll turn it into a triple quote with appropriate indentation and add notes for better understanding.

To run the Spark job, upload the relevant files to an S3 bucket that EMR has access to. The following artifacts should be uploaded to S3:

- .egg [file created in step #3]
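
The upload step could be scripted roughly as follows. This is a sketch under stated assumptions, not the post's method: the artifact file name, key prefix, and bucket name are hypothetical, and it assumes `boto3` is installed with credentials that can write to the bucket. The planning helper is kept separate from the upload so it can be checked without touching AWS.

```python
from pathlib import Path


def build_upload_plan(artifacts, prefix):
    """Map each local artifact path to the S3 key it would be stored under."""
    clean_prefix = prefix.rstrip("/")
    return {str(path): f"{clean_prefix}/{Path(path).name}" for path in artifacts}


def upload_artifacts(plan, bucket, s3_client=None):
    """Upload every file in the plan to the given bucket."""
    if s3_client is None:
        import boto3  # imported lazily so planning needs no AWS setup

        s3_client = boto3.client("s3")
    for local_path, key in plan.items():
        s3_client.upload_file(local_path, bucket, key)


# Hypothetical artifact name and prefix, for illustration only.
plan = build_upload_plan(["dist/proj_name-0.1-py3.9.egg"], prefix="kedro-emr/artifacts/")

# Uncomment to perform the upload against a real (hypothetical) bucket:
# upload_artifacts(plan, bucket="my-emr-bucket")
```

Passing `s3_client` explicitly makes it possible to substitute a stub when testing the script locally.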
Question for engineers - egg is old, wheel is new. Should we use that?
Obliterate eggs from everywhere please! kedro-org/kedro#2273
Okay, I'll remove the `.egg` occurrences. I'll need to do a sanity check that `spark-submit --py-files` works the same with `.whl` files too.
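
One local check that may help before testing on a cluster: `--py-files` places the listed files on the Python path, and a wheel is a zip archive, so a pure-Python wheel can usually be imported straight from the path the same way an egg or zip can. The sketch below builds a stand-in archive with a `.whl` name (the package `demo_pkg` is hypothetical, and the archive omits the metadata a real wheel carries) and imports from it. It does not cover wheels with compiled extensions, and it is not a substitute for running the job on EMR.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a pure-Python wheel: a zip archive with a .whl file name.
    wheel_path = Path(tmp) / "demo_pkg-0.1-py3-none-any.whl"
    with zipfile.ZipFile(wheel_path, "w") as archive:
        archive.writestr("demo_pkg/__init__.py", "ANSWER = 42\n")

    # Putting the archive on sys.path lets Python's zip importer find the package.
    sys.path.insert(0, str(wheel_path))
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        imported_value = demo_pkg.ANSWER
    finally:
        sys.path.remove(str(wheel_path))
```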
Still needs a bit more introduction (why use EMR?) and conclusion.
Adding a new folder where I'll work on blog posts collaboratively when external authors want to contribute in markdown.
Adding a new file because there's an author working on a post about EMR and Kedro 💃