Adding the API call script to fetch all rows of data for a given year #1257
Conversation
Added more comments
This does look quite similar to https://github.com/hackforla/311-data/blob/dev/server/dash/app.py#L25. We should be able to use this to download the data. In principle, we don't want to have two versions of code that do the same thing since that increases maintenance cost.
To do this, we should refactor that method above into a file like "server/utils/data_collection.py". Then, we can create a binary like "server/utils/get_request_data_csv.py" that takes absl flags for start_date, end_date, and output_csv_path. That binary will invoke the helper method in data_collection.py.
I know this might sound complicated, so feel free to ask for help. Also happy to have a video call to discuss it.
Also, please use an autoformatter in your code editor.
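Concretely, the refactored helper could look something like the sketch below. The function name `batch_get_data`, the injected `fetch_page` callable, and the skip/limit paging scheme are illustrative assumptions based on this discussion, not the actual code in `server/dash/app.py`:

```python
# Sketch of a server/utils/data_collection.py helper (names are assumptions).
# The fetch function is injected so the paging logic can be unit-tested
# without hitting the real 311 API.

REQUESTS_BATCH_SIZE = 10000  # rows requested per API call


def batch_get_data(start_date, end_date, fetch_page, limit=REQUESTS_BATCH_SIZE):
    """Fetches all rows between start_date and end_date, one page at a time.

    Args:
        start_date: Inclusive start date string, e.g. "2021-01-01".
        end_date: Inclusive end date string.
        fetch_page: Callable (start_date, end_date, skip, limit) -> list of rows.
        limit: Maximum rows per request.

    Returns:
        A list containing every row the API returned for the date range.
    """
    rows, skip = [], 0
    while True:
        page = fetch_page(start_date, end_date, skip, limit)
        rows.extend(page)
        if len(page) < limit:  # a short (or empty) page means we're done
            return rows
        skip += limit
```

A thin binary like `get_request_data_csv.py` can then call this helper and write the result to a CSV.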
APIcall/api311_2021.py (Outdated)

```python
# -*- coding: utf-8 -*-
"""
Created on Sat Jun 4 17:49:23 2022

@author: AdithiPriya
"""
# import the necessary packages...
```
We can get rid of these comments
APIcall/api311_2021.py (Outdated)

```python
data_2021_df = data_2021_df.reset_index(drop=True)  # reindexing
data_2021_df.to_csv('clean_311_data_2021.csv')  # saving the dataframe as a csv file
```
Please make this an argument to the binary using https://abseil.io/docs/python/guides/flags. This will require you to create a main() method and such, as in the example from the link.
You can create a helper function that encapsulates the code that you've already written. Then, in main(), you can call that helper function with the absl flags.
To expand on my earlier comment: when writing code, it's common to have libraries and binaries. Libraries are modules that contain functions or classes that other modules can import and use. Binaries are executables; they often import other libraries to accomplish a certain task. So for this PR, we want to create a library that contains the existing function. To be clear, your code is perfectly functional. But ideally, when contributing to production systems, we want to make sure that we are not duplicating code in two places, for maintainability. Furthermore, it allows us to write unit tests once and have a single source of truth for a particular feature.
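Put together, the binary half of that split might look roughly like this. The `data_collection` module path and the `batch_get_data` helper signature are assumptions based on this review thread, not the actual repo layout:

```python
# Sketch of server/utils/get_request_data_csv.py using absl flags.
# The data_collection import and helper signature are assumptions.
from absl import app
from absl import flags

FLAGS = flags.FLAGS
flags.DEFINE_string("start_date", None, "Inclusive start date, e.g. 2021-01-01.")
flags.DEFINE_string("end_date", None, "Inclusive end date, e.g. 2021-12-31.")
flags.DEFINE_string("output_csv_path", "requests.csv", "Path for the output CSV.")
flags.mark_flags_as_required(["start_date", "end_date"])


def main(argv):
    del argv  # unused
    import data_collection  # the library refactored out in this PR (hypothetical path)
    df = data_collection.batch_get_data(FLAGS.start_date, FLAGS.end_date)
    df.to_csv(FLAGS.output_csv_path, index=False)


# Uncomment when saving this as the actual binary:
# if __name__ == "__main__":
#     app.run(main)
```

The library (`data_collection.py`) holds the logic; this binary only parses flags and wires them through, which is the separation described above.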
Adding the existing function batch_get_data from https://github.com/hackforla/311-data/blob/dev/server/dash/app.py here and renaming the file to data_collection.py.
I am able to run the api_311 call function from the command line by passing start_date, end_date, skip, and limit as arguments.
Thanks Anupriya!
Can we move this to "server/utils/get_request_data_csv.py"? Also mentioned some other things to you offline--README, Pipfile.
Can we also add unit tests here? It doesn't need to be too complicated. You can look at the Prefect unit tests as an example.
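A minimal pytest-style test could look like the sketch below. The helper is inlined here as a stand-in so the example is self-contained; in the real test you would import it from data_collection.py, and its signature is an assumption from this thread:

```python
# test_data_collection.py -- minimal pytest-style sketch.
# batch_get_data is an inline stand-in for the helper under test;
# in the real test: `from data_collection import batch_get_data`.

def batch_get_data(start_date, end_date, fetch_page, limit):
    rows, skip = [], 0
    while True:
        page = fetch_page(start_date, end_date, skip, limit)
        rows.extend(page)
        if len(page) < limit:
            return rows
        skip += limit


def fake_fetch_page(start_date, end_date, skip, limit):
    """Stands in for the 311 API: 23 fake rows served in slices."""
    data = [{"srnumber": i} for i in range(23)]
    return data[skip:skip + limit]


def test_batch_get_data_fetches_all_pages():
    got = batch_get_data("2021-01-01", "2021-01-03", fake_fetch_page, limit=10)
    assert len(got) == 23
    assert got[-1] == {"srnumber": 22}
```

Injecting the fetch function keeps the test hermetic: no network call, no dependence on the live 311 server.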
APIcall/data_collection.py (Outdated)

```python
def api_311(start_date, end_date, skip, limit):
    skip = 0
    limit = 10000
```
Please create a module-level constant for this, e.g., REQUESTS_BATCH_SIZE = 10000. You can put it under the imports.
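For instance (a sketch; the function body is elided here):

```python
# Top of data_collection.py, just below the imports (sketch).
REQUESTS_BATCH_SIZE = 10000  # rows fetched per API call


def api_311(start_date, end_date, skip=0, limit=REQUESTS_BATCH_SIZE):
    ...  # request/paging logic as in the PR
```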
APIcall/data_collection.py (Outdated)

```python
import argparse
from datetime import date


def api_311(start_date, end_date, skip, limit):
```
Please add a docstring. See details here: https://google.github.io/styleguide/pyguide.html#383-functions-and-methods
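A Google-style docstring for this function might read as follows; the parameter descriptions are assumptions based on the discussion:

```python
def api_311(start_date, end_date, skip, limit):
    """Fetches 311 request data from the 311 data server.

    Args:
        start_date: Inclusive start date, e.g. "2021-01-01".
        end_date: Inclusive end date, e.g. "2021-12-31".
        skip: Number of rows to skip (paging offset).
        limit: Maximum number of rows to fetch per request.

    Returns:
        The rows returned by the API for the given date range.
    """
```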
Looks good! Just a few small comments.
server/utils/README.md (Outdated)

# **API call to fetch 311 request data from the 311 data server for a given start date and end date**

### The 311 request data from [lacity.org](https://data.lacity.org/browse?q=MyLA311%20Service%20Request%20Data%20&sortBy=relevance) has 34 columns. The get_request_data_csv.py script can be run from the command line, passing the arguments start_date and end_date, which lets you retrieve the 311 request data from the [311 data server](https://dev-api.311-data.org/docs). The 311 server processes the data from lacity.org. The data cleaning procedure is described [here](https://github.com/hackforla/311-data/blob/dev/docs/data_loading.md). The dataframe that is returned can be saved as a csv file. A preview of the data_final dataframe is printed in the command line.
nit: to be clear, we are not returning a dataframe. We don't even need to mention that a dataframe is being produced, since it's an intermediate result. We can just say "The result is written to a csv file". We should also explicitly say what the path to the output csv file is.
server/utils/README.md (Outdated)

### Example: python get_311_request_data_csv.py "2021-01-01" "2021-01-03" will return 261 rows and 15 columns.
Nit: wrap code and commands in backticks.
Let me know if this is what you wanted: `python get_311_request_data_csv.py "2021-01-01" "2021-01-03"`
Looks great! Thanks for your patience Anupriya!
Thank you Nich for reviewing this PR! I have learnt a lot.
Fixes #107