
First blog post from external contributor about AWS EMR #64

Merged: 4 commits into main on May 5, 2023

Conversation

stichbury
Contributor

Adding a new folder where I'll work on blog posts collaboratively when external authors want to contribute in Markdown.

Adding a new file because there's an author working on a post about EMR and Kedro 💃

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
@stichbury stichbury added the Blog post creation label Apr 19, 2023
@stichbury stichbury self-assigned this Apr 19, 2023

## 2. Set up `CONF_ROOT`

By default, Kedro looks at the root `conf` folder for all its configurations (catalog, parameters, globals, credentials, logging) to run the pipelines. However, [this can be customized](https://docs.kedro.org/en/stable/kedro_project_setup/configuration.html#configuration-root) by changing `CONF_ROOT`  in `settings.py`.
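As a sketch of that override (assuming a standard project layout; note that Kedro 0.18.x names this setting `CONF_SOURCE`, while `CONF_ROOT` is the pre-0.18 name, and `configuration` below is a hypothetical folder name):

```python
# settings.py -- sketch of pointing Kedro at a non-default configuration root.
# Kedro 0.18.x uses CONF_SOURCE; releases before 0.18 used CONF_ROOT.
CONF_SOURCE = "configuration"
```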


To make this more annoying, in 0.18.7 we package configuration as a separate tar.gz file, which can be used in conjunction with the --conf-source flag
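A sketch of that workflow (the archive name under `dist/` is a placeholder; the actual name depends on the project name and version):

```shell
# Package the project: produces a distributable package plus a separate
# configuration tar.gz in dist/.
kedro package

# Run against the packaged configuration; the path is a placeholder.
kedro run --conf-source dist/conf-proj_name.tar.gz
```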

from proj_name.__main__ import main

if __name__ == "__main__":
    # params = [


I don't know if this comment style is intuitive

Contributor


I'll turn it into a triple-quoted block with appropriate indentation and add notes for better understanding.


To run the Spark job, upload the relevant files to an S3 bucket that the EMR cluster can access. The following artifacts should be uploaded to S3:

- .egg [file created in step #3]
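As a sketch of the upload step, assuming the AWS CLI is configured and that `my-emr-bucket` and the file names are placeholders:

```shell
# Upload the packaged artifacts to S3 so the EMR cluster can fetch them.
# Bucket, prefix, and file names below are all placeholders.
aws s3 cp dist/proj_name-0.1-py3.8.egg s3://my-emr-bucket/artifacts/
aws s3 cp dist/conf-proj_name.tar.gz s3://my-emr-bucket/artifacts/
```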


Question for engineers: egg is old, wheel is new. Should we switch to wheels?

Member


Obliterate eggs from everywhere please! kedro-org/kedro#2273

Contributor


Okay, I'll remove the .egg occurrences; I'll need to sanity-check whether spark-submit --py-files works the same with .whl files.
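For that sanity check, a sketch of the invocation (all bucket and file names are placeholders; Spark's documentation lists .zip, .egg, and .py for `--py-files`, and since a .whl is a zip archive this is exactly the case to verify against the Spark version on the cluster):

```shell
# Hypothetical spark-submit call to check that --py-files accepts a wheel.
# Bucket, wheel, and entrypoint names are placeholders.
spark-submit \
  --py-files s3://my-emr-bucket/artifacts/proj_name-0.1-py3-none-any.whl \
  s3://my-emr-bucket/artifacts/entrypoint.py
```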

@stichbury stichbury changed the title Add folder for blog post collaboration and first blog post First blog post from external contributor about AWS EMR May 2, 2023
@stichbury stichbury merged commit c50735f into main May 5, 2023
Labels: Blog post creation

4 participants