First, fork the nira-interview repo into your own account. You may need to install Git LFS for the clone to work.
- Install Pyenv to switch between different Python versions: https://github.com/pyenv/pyenv (Windows: https://github.com/pyenv-win/pyenv-win).
- Install the Python version specified in `/nira-interview/.python-version` using Pyenv: `pyenv install 3.9.6`
- Navigate into the `/nira-interview` directory and double-check that you've set up pyenv correctly. When you run `python --version`, the version should match the one specified in `/nira-interview/.python-version`, which is 3.9.6. Pyenv works by reading the `.python-version` file and automatically switching to the right Python version.
- Install Poetry: https://python-poetry.org/docs/
- Configure Poetry to create `.venv` folders in the project: `poetry config virtualenvs.in-project true`
- Navigate into the pipeline folder: `cd /nira-interview/pipeline`
- Install dependencies: `poetry install`
- Activate the virtual environment: `poetry shell`
- Double-check that the right version of Python is being used in the virtual environment: `python --version`
- Make sure that the dependencies were installed: `poetry show`. If you're on an M1 Mac, you might need to do some additional steps. Honestly, I don't have this set up yet, so I don't know what those steps are; just know that you're not alone.
- Spin up dagit: `poetry run dagit`
- Navigate to `localhost:3000`. You should see Dagster running there.
- In the Jobs pane on the left, click the "nira_smoke_test_job" job. Click "Launchpad" and then "Launch run". You should see the job print "Successfully ran smoketest".
- Specify the Python interpreter in VSCode: open the "Python: Select Interpreter" setting in VSCode and input your own path, which should be `./pipeline/.venv/bin/python`. Type this in manually; don't use the file selector.
- You should be ready if you get here.
Dagster is an open source tool we use to orchestrate our pipelines. You can learn more about Dagster at dagster.io. They're an awesome company.
Dagster jobs are essentially a list of steps that make up a pipeline. Each step is called an op. If you open `smoke_test_job.py`, you'll see the `nira_smoke_test_job` Python definition, which is annotated with `@job`. The job is made from a series of calls to ops. The two ops are also defined there. The output of `smoke_test_op1` is passed into `smoke_test_op2`.
It's that easy: Dagster jobs are constructed from ops.
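For reference, here's a minimal sketch of what a two-op job like this looks like in Dagster (the op bodies below are illustrative, not the actual contents of `smoke_test_job.py`):

```python
from dagster import job, op


@op
def smoke_test_op1() -> str:
    # Produce a value for the downstream op (body is illustrative).
    return "Successfully ran smoketest"


@op
def smoke_test_op2(message: str) -> None:
    # Consume the upstream op's output.
    print(message)


@job
def nira_smoke_test_job():
    # Calling ops inside a @job wires them together:
    # smoke_test_op1's output flows into smoke_test_op2.
    smoke_test_op2(smoke_test_op1())
```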
Your task is to edit the `interview_job` defined in `interview_job.py`. First, let's see what's going on inside of `interview_job`:
- First, we read in a raw CSV of buses we need to run the pipeline on in `raw_buses_to_run`.
- Then we calculate the MW available for each bus in `get_mw_available_for_each_bus_very_slow`. You can see in the code that calculating this takes 5 minutes per bus! Super slow.
- Then we convert MW to GW in `add_gw_available_column` (sketched after this list).
- Lastly, we write the final DF to disk in `output_interview_job`.
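The MW-to-GW conversion is just a unit divide (1 GW = 1,000 MW). Here's a hypothetical sketch of what an op like `add_gw_available_column` could look like; the column names `mw_available` and `gw_available` are assumptions for illustration, not taken from the repo:

```python
import pandas as pd
from dagster import op


@op
def add_gw_available_column(df: pd.DataFrame) -> pd.DataFrame:
    # 1 GW = 1,000 MW, so the conversion is a single column divide.
    # Column names are assumed for illustration.
    df["gw_available"] = df["mw_available"] / 1000
    return df
```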
This pipeline has already been run, and its results are inside `pipeline/interview_job/output`. It was run the slow way with the initial set of buses.
Sometimes, we have a new bus we need to run as well. But we don't want to rerun all the buses because that's too slow.
Your task is to figure out how to construct this pipeline so that we don't have to rerun all the buses, only the new ones, while still outputting one single CSV to disk.
A few constraints:
- You can tweak `get_mw_available_for_each_bus_very_slow` for testing purposes, but you are not allowed to change the code inside this file in the final submission. Don't get clever and just decrease the sleep() call to one second.
- We are only ever adding new buses; you do not need to worry about buses being removed.
- For any given bus, the values calculated in `get_mw_available_for_each_bus_very_slow` will always be exactly the same (you can see this in the code). See the toy illustration after this list.
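To make that last constraint concrete, here's a purely hypothetical example of a deterministic per-bus computation (not the repo's code). Because the value is a pure function of the bus ID, a previously computed result can safely be reused instead of recomputed:

```python
import hashlib


def mw_for_bus(bus_id: str) -> float:
    # A pure function of the bus ID: same input, same output, every run.
    digest = hashlib.md5(bus_id.encode()).hexdigest()
    return int(digest[:6], 16) / 1000.0


# The same bus always yields the same value, which is what makes
# reusing prior results safe.
assert mw_for_bus("BUS_001") == mw_for_bus("BUS_001")
```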
Final deliverable:
- Send over a link to the forked repo you made the modifications in.
- Inside `raw_buses_to_run.py`, comment out line 4 and uncomment line 5. This switches the raw buses CSV to a new CSV. You can go look in the CSVs; the only difference is one additional bus in the new one. Remember, there will only ever be bus additions in the new CSV.
- There should be only one new file inside the /output folder, containing the results for all the buses defined in `new_raw_buses.csv`. You should delete the original CSV in the output folder that the repo started with. There should never be two CSVs in the output folder.
- Any new ops you need should be added to `interview_job.py` and also be implemented in their own file in the /ops folder.