Adding the API call script to fetch all rows of data for a given year #1257

Merged 18 commits on Sep 21, 2022
7 changes: 7 additions & 0 deletions server/utils/README.md
@@ -0,0 +1,7 @@
# **API call to fetch 311 request data from the 311 data server for a given start date and end date**

### The 311 request data from [lacity.org](https://data.lacity.org/browse?q=MyLA311%20Service%20Request%20Data%20&sortBy=relevance) has 34 columns. The `get_request_data_csv.py` script can be run from the command line with two arguments, `start_date` and `end_date`, to retrieve 311 request data from the [311 data server](https://dev-api.311-data.org/docs). The 311 data server processes the raw data from lacity.org; the data cleaning procedure is described [here](https://github.com/hackforla/311-data/blob/dev/docs/data_loading.md). The result is written to a CSV file, `data_final.csv`, in the current working directory, and a preview of the data is printed to the command line.
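For orientation, here is a minimal sketch of a single call to the same endpoint the script pages through; the dates and batch size are illustrative values, not defaults:

```python
# Minimal sketch: fetch one batch from the dev 311 API endpoint used by
# get_request_data_csv.py. The dates and limit here are illustrative values.
import requests

params = {
    "start_date": "2021-01-01",
    "end_date": "2021-01-03",
    "skip": 0,     # offset into the result set
    "limit": 100,  # rows per batch
}
response = requests.get("https://dev-api.311-data.org/requests", params=params)
response.raise_for_status()
print(len(response.json()), "records in this batch")
```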
Member:
nit: to be clear, we are not returning a dataframe. We don't even need to mention that a dataframe is being produced, since it's an intermediate result. We can just say "The result is written to a csv file". We should also explicitly say what the path to the output csv file is.


### Example: `python get_request_data_csv.py "2021-01-01" "2021-01-03"` will return 261 rows and 15 columns.
Member:

Nit: wrap code and commands in "`".

Member Author:
Let me know if this is what you wanted: `python get_311_request_data_csv.py "2021-01-01" "2021-01-03"`


![image](https://user-images.githubusercontent.com/10836669/188473763-52bc9474-0878-432c-b4e8-6e4ff21dcda2.png)
46 changes: 46 additions & 0 deletions server/utils/get_request_data_csv.py
@@ -0,0 +1,46 @@
import argparse

import pandas as pd
import requests

REQUESTS_BATCH_SIZE = 10000

def get_311_request_data(start_date, end_date):
    """Fetches 311 requests from the 311 data server.

    Retrieves 311 requests from the 311 data server for a given start_date
    and end_date, paging through the results in batches of REQUESTS_BATCH_SIZE.

    Args:
        start_date: The date from which the 311 request data is collected,
            as a 'YYYY-MM-DD' string.
        end_date: The date up to which the 311 request data is fetched,
            as a 'YYYY-MM-DD' string.

    Returns:
        Dataframe data_final with 15 columns. The dataframe is also saved
        as a CSV file ('data_final.csv') in the current directory.
    """
    skip = 0
    all_requests = []
    while True:
        # Page through the results: the server returns at most
        # REQUESTS_BATCH_SIZE rows per response, so advance the skip offset.
        url = f'https://dev-api.311-data.org/requests?start_date={start_date}&end_date={end_date}&skip={skip}&limit={REQUESTS_BATCH_SIZE}'
        response = requests.get(url)
        data = response.json()
        all_requests.extend(data)
        skip += REQUESTS_BATCH_SIZE
        # A short batch means the server has no more rows to return.
        if len(data) < REQUESTS_BATCH_SIZE:
            break
    data_final = pd.DataFrame(all_requests)
    data_final.sort_values(by='createdDate', inplace=True, ignore_index=True)
    data_final.to_csv('data_final.csv')
    return data_final

def main():
    parser = argparse.ArgumentParser(description='Gets 311 request data from the server')
    parser.add_argument('start_date', type=str, help='The start date (YYYY-MM-DD)')
    parser.add_argument('end_date', type=str, help='The end date (YYYY-MM-DD)')
    args = parser.parse_args()
    start_date = args.start_date
    end_date = args.end_date
    data_final = get_311_request_data(start_date, end_date)
    print(data_final)

if __name__ == "__main__":
    main()
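A minimal usage sketch, assuming the script has been run and `data_final.csv` exists in the working directory; `index_col=0` recovers the integer index that `to_csv` writes by default:

```python
# Usage sketch: reload the CSV written by get_request_data_csv.py.
# Assumes the script has already been run in this directory.
import pandas as pd

df = pd.read_csv("data_final.csv", index_col=0)
print(df.shape)  # e.g. (261, 15) for 2021-01-01 to 2021-01-03
print(df["createdDate"].min(), df["createdDate"].max())
```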