
Adding the API call script to fetch all rows of data for a given year #1257

Merged · 18 commits · Sep 21, 2022

Conversation

@priyakalyan (Member) commented Jun 23, 2022

Fixes #107

  • Up to date with dev branch
  • Branch name follows guidelines
  • All PR Status checks are successful
  • Peer reviewed and approved

Any questions? See the getting started guide

@nichhk self-requested a review, June 23, 2022 20:21
@nichhk (Member) left a comment:

This does look quite similar to https://github.com/hackforla/311-data/blob/dev/server/dash/app.py#L25. We should be able to use this to download the data. In principle, we don't want two versions of code that do the same thing, since that increases maintenance cost.

To do this, we should refactor that method above into a file like "server/utils/data_collection.py". Then, we can create a binary like "server/utils/get_request_data_csv.py" that takes absl flags for start_date, end_date, and output_csv_path. That binary will invoke the helper method in data_collection.py.
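A minimal sketch of what that shared helper in data_collection.py might look like; the endpoint URL, parameter names, and batching logic here are illustrative assumptions, not the project's actual code:

```python
# data_collection.py -- illustrative sketch of the shared download helper.
# The endpoint and query parameters are assumptions for illustration.
import pandas as pd
import requests

API_URL = "https://dev-api.311-data.org/requests"  # assumed endpoint
REQUESTS_BATCH_SIZE = 10000  # rows fetched per API call

def batch_get_data(start_date, end_date):
    """Page through the API and return every row as one DataFrame."""
    frames, skip = [], 0
    while True:
        resp = requests.get(API_URL, params={
            "start_date": start_date,
            "end_date": end_date,
            "skip": skip,
            "limit": REQUESTS_BATCH_SIZE,
        })
        resp.raise_for_status()
        rows = resp.json()
        if not rows:  # an empty batch means we've fetched everything
            break
        frames.append(pd.DataFrame(rows))
        skip += REQUESTS_BATCH_SIZE
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With a helper shaped like this, any binary (or the existing dash app) can import it rather than re-implementing the download loop.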

I know this might sound complicated, so feel free to ask for help. Also happy to have a video call to discuss it.

Also, please use an autoformatter in your code editor.

Comment on lines 1 to 9
# -*- coding: utf-8 -*-
"""
Created on Sat Jun 4 17:49:23 2022

@author: AdithiPriya

"""
# import the necessary packages...

Member:

We can get rid of these comments

Two outdated review threads on APIcall/api311_2021.py (resolved).

data_2021_df=data_2021_df.reset_index(drop=True) # reindexing...

data_2021_df.to_csv('clean_311_data_2021.csv') # saving the dataframe as a csv file...
Member:

Please make this an argument to the binary using https://abseil.io/docs/python/guides/flags. This will require you to create a main() method and so on, as in the example from the link.

You can create a helper function that encapsulates the code that you've already written. Then, in main(), you can call that helper function with the absl flags.
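A sketch of the absl-flags binary being described; `batch_get_data` is assumed to live in data_collection.py, and a tiny stub stands in for it here so the sketch is self-contained:

```python
# get_request_data_csv.py -- sketch of the binary with absl flags.
# In the real layout, batch_get_data would be imported from
# data_collection.py; the stub below stands in for it.
import pandas as pd
from absl import app, flags

FLAGS = flags.FLAGS
flags.DEFINE_string("start_date", None, "First day to fetch, YYYY-MM-DD.")
flags.DEFINE_string("end_date", None, "Last day to fetch, YYYY-MM-DD.")
flags.DEFINE_string("output_csv_path", "311_data.csv", "Output CSV path.")
flags.mark_flags_as_required(["start_date", "end_date"])

def batch_get_data(start_date, end_date):
    """Stub for data_collection.batch_get_data (the real one calls the API)."""
    return pd.DataFrame([{"start": start_date, "end": end_date}])

def main(argv):
    del argv  # unused; inputs come from FLAGS
    df = batch_get_data(FLAGS.start_date, FLAGS.end_date)
    df.to_csv(FLAGS.output_csv_path, index=False)

# Entry point in the real binary:
#   if __name__ == "__main__":
#       app.run(main)
```

It would then run as `python get_request_data_csv.py --start_date=2021-01-01 --end_date=2021-12-31 --output_csv_path=data.csv`.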

Outdated review thread on APIcall/api311_2021.py (resolved).
@priyakalyan changed the title from "Adding the API call script to fetch all rows of data for the year 2021" to "Adding the API call script to fetch all rows of data for a given year" on Jun 23, 2022.
@nichhk (Member) commented Jun 24, 2022

To expand on my earlier comment: when writing code, it's common to have libraries and binaries. Libraries are modules that contain functions or classes that other modules can import and use. Binaries are executables; they often import other libraries to accomplish a certain task.

So for this PR, we want to create a library that contains the existing function batch_get_data (i.e., we should remove batch_get_data from that file and put it into a new file, "server/utils/data_collection.py"). Then, we want to create our executable file, aka the binary, called "server/utils/get_request_data_csv.py". In this file, we can call batch_get_data from "data_collection.py", and then write the output of batch_get_data to a CSV.

To be clear, your code is perfectly functional. But ideally, when contributing to production systems, we want to make sure that we are not duplicating code in two places, for maintainability. Furthermore, keeping a single copy lets us write unit tests once and gives us a single source of truth for a particular feature.

@priyakalyan replied:

Adding the existing function batch_get_data from https://github.com/hackforla/311-data/blob/dev/server/dash/app.py here and renaming the file as data_collection.py.
I am able to run the api_311 call function from the command line by passing the start_date, end_date, skip and limit as arguments.
@nichhk (Member) left a comment:

Thanks Anupriya!

Can we move this to "server/utils/get_request_data_csv.py"? Also mentioned some other things to you offline--README, Pipfile.

Can we also add unit tests here? It doesn't need to be too complicated. You can look at the Prefect unit tests as an example.
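As a self-contained illustration of the kind of test being asked for, the sketch below defines a minimal stand-in for batch_get_data and injects a fake HTTP getter; the project's real test would import the helper from data_collection.py instead, and the endpoint URL is an assumption:

```python
# Sketch of a simple unit test in the spirit of the review comment.
# This batch_get_data is a minimal inline stand-in, not the project's code.
from unittest import mock

def batch_get_data(start_date, end_date, get=None):
    """Collect every row by paging until the API returns an empty batch.

    `get` is injectable so tests can avoid real network calls; the real
    code would default it to requests.get.
    """
    if get is None:
        import requests
        get = requests.get
    rows, skip = [], 0
    while True:
        batch = get("https://dev-api.311-data.org/requests", params={
            "start_date": start_date, "end_date": end_date,
            "skip": skip, "limit": 10000,
        }).json()
        if not batch:
            return rows
        rows.extend(batch)
        skip += 10000

def test_batch_get_data_pages_until_empty():
    fake_get = mock.Mock()
    # First call returns one row, second call returns an empty page.
    fake_get.return_value.json.side_effect = [[{"srnumber": "1-1"}], []]
    rows = batch_get_data("2021-01-01", "2021-01-02", get=fake_get)
    assert rows == [{"srnumber": "1-1"}]
    assert fake_get.call_count == 2
```

Injecting the getter (or patching requests.get with mock.patch) keeps the test fast and deterministic, with no network dependency.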

Outdated review thread on APIcall/data_collection.py (resolved).

def api_311(start_date, end_date, skip, limit):
skip = 0
limit = 10000
Member:

Please create a module-level constant for this, e.g., REQUESTS_BATCH_SIZE = 10000. You can put it under the imports.
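For illustration, the suggested change might look like this (the fetch logic is elided; only the constant and signature follow the comment above):

```python
REQUESTS_BATCH_SIZE = 10000  # module-level constant, placed under the imports

def api_311(start_date, end_date):
    # skip and limit are no longer parameters that get immediately
    # overwritten: pagination starts at 0 and pages by REQUESTS_BATCH_SIZE.
    skip, limit = 0, REQUESTS_BATCH_SIZE
    # ... fetch pages here, incrementing skip by REQUESTS_BATCH_SIZE ...
    return skip, limit  # body elided in this sketch
```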

Outdated review thread on APIcall/data_collection.py (resolved).
import argparse
from datetime import date

def api_311(start_date, end_date, skip, limit):
Six outdated review threads on APIcall/data_collection.py (resolved).
@nichhk (Member) left a comment:

Looks good! Just a few small comments.

Three outdated review threads on server/utils/README.md (resolved).
@@ -0,0 +1,7 @@
# **API call to fetch 311 request data from the 311 data server for a given start date and end date.**

### The 311 request data from [lacity.org](https://data.lacity.org/browse?q=MyLA311%20Service%20Request%20Data%20&sortBy=relevance) has 34 columns. The get_request_data_csv.py script can be run from the command line with the arguments start_date and end_date, which lets you retrieve the 311 request data from the [311 data server](https://dev-api.311-data.org/docs). The 311 server processes the data from lacity.org. The data cleaning procedure is described [here](https://github.com/hackforla/311-data/blob/dev/docs/data_loading.md). The dataframe that is returned can be saved as a csv file. A preview of the data_final dataframe is printed in the command line.
Member:

nit: to be clear, we are not returning a dataframe. We don't even need to mention that a dataframe is being produced, since it's an intermediate result. We can just say "The result is written to a csv file". We should also explicitly say what the path to the output csv file is.



### Example: python get_311_request_data_csv.py "2021-01-01" "2021-01-03" will return 261 rows and 15 columns.
Member:

Nit: wrap code and commands in "`".

@priyakalyan (Member Author):

Let me know if this is what you wanted: `python get_311_request_data_csv.py "2021-01-01" "2021-01-03"`

Additional review threads on server/utils/get_request_data_csv.py and server/utils/README.md (resolved).
@nichhk (Member) left a comment:

Looks great! Thanks for your patience Anupriya!

@priyakalyan (Member Author):
Thank you Nich for reviewing this PR! I have learnt a lot.

@priyakalyan merged commit 41124d6 into hackforla:dev on Sep 21, 2022.
Development

Successfully merging this pull request may close these issues.

CoP: Data Science: Analyze correlations between metro locations and 311-data requests