
Adding the API call script to fetch all rows of data for a given year #1257

Merged · 18 commits · Sep 21, 2022

Conversation

@priyakalyan (Member) commented Jun 23, 2022

Fixes #107

  • Up to date with dev branch
  • Branch name follows guidelines
  • All PR Status checks are successful
  • Peer reviewed and approved

Any questions? See the getting started guide

@nichhk self-requested a review, June 23, 2022 20:21
@nichhk (Member) left a comment:

This does look quite similar to https://github.com/hackforla/311-data/blob/dev/server/dash/app.py#L25. We should be able to use this to download the data. In principle, we don't want two versions of code that do the same thing, since that increases maintenance cost.

To do this, we should refactor that method above into a file like "server/utils/data_collection.py". Then, we can create a binary like "server/utils/get_request_data_csv.py" that takes absl flags for start_date, end_date, and output_csv_path. That binary will invoke the helper method in data_collection.py.
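A minimal sketch of what that shared helper in data_collection.py might look like; the endpoint URL, parameter names, and batching logic here are illustrative assumptions, not the project's actual code:

```python
# data_collection.py -- illustrative sketch of the shared download helper.
# The endpoint and query parameters are assumptions for illustration.
import pandas as pd
import requests

API_URL = "https://dev-api.311-data.org/requests"  # assumed endpoint
REQUESTS_BATCH_SIZE = 10000  # rows fetched per API call

def batch_get_data(start_date, end_date):
    """Page through the API and return every row as one DataFrame."""
    frames, skip = [], 0
    while True:
        resp = requests.get(API_URL, params={
            "start_date": start_date,
            "end_date": end_date,
            "skip": skip,
            "limit": REQUESTS_BATCH_SIZE,
        })
        resp.raise_for_status()
        rows = resp.json()
        if not rows:  # an empty batch means we've fetched everything
            break
        frames.append(pd.DataFrame(rows))
        skip += REQUESTS_BATCH_SIZE
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With a helper shaped like this, any binary (or the existing dash app) can import it rather than re-implementing the download loop.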

I know this might sound complicated, so feel free to ask for help. Also happy to have a video call to discuss it.

Also, please use an autoformatter in your code editor.

Comment on lines 1 to 9
# -*- coding: utf-8 -*-
"""
Created on Sat Jun 4 17:49:23 2022

@author: AdithiPriya

"""
# import the necessary packages...

Member:

We can get rid of these comments

Two outdated review threads on APIcall/api311_2021.py (resolved).

data_2021_df=data_2021_df.reset_index(drop=True) # reindexing...

data_2021_df.to_csv('clean_311_data_2021.csv') # saving the dataframe as a csv file...
Member:

Please make this an argument to the binary using https://abseil.io/docs/python/guides/flags. This will require you to create a main() method and so on, as in the example from the link.

You can create a helper function that encapsulates the code that you've already written. Then, in main(), you can call that helper function with the absl flags.
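A sketch of the absl-flags binary being described; `batch_get_data` is assumed to live in data_collection.py, and a tiny stub stands in for it here so the sketch is self-contained:

```python
# get_request_data_csv.py -- sketch of the binary with absl flags.
# In the real layout, batch_get_data would be imported from
# data_collection.py; the stub below stands in for it.
import pandas as pd
from absl import app, flags

FLAGS = flags.FLAGS
flags.DEFINE_string("start_date", None, "First day to fetch, YYYY-MM-DD.")
flags.DEFINE_string("end_date", None, "Last day to fetch, YYYY-MM-DD.")
flags.DEFINE_string("output_csv_path", "311_data.csv", "Output CSV path.")
flags.mark_flags_as_required(["start_date", "end_date"])

def batch_get_data(start_date, end_date):
    """Stub for data_collection.batch_get_data (the real one calls the API)."""
    return pd.DataFrame([{"start": start_date, "end": end_date}])

def main(argv):
    del argv  # unused; inputs come from FLAGS
    df = batch_get_data(FLAGS.start_date, FLAGS.end_date)
    df.to_csv(FLAGS.output_csv_path, index=False)

# Entry point in the real binary:
#   if __name__ == "__main__":
#       app.run(main)
```

It would then run as `python get_request_data_csv.py --start_date=2021-01-01 --end_date=2021-12-31 --output_csv_path=data.csv`.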

Outdated review thread on APIcall/api311_2021.py (resolved).
@priyakalyan changed the title from "Adding the API call script to fetch all rows of data for the year 2021" to "Adding the API call script to fetch all rows of data for a given year" on Jun 23, 2022.
@nichhk (Member) commented Jun 24, 2022

To expand on my earlier comment: when writing code, it's common to have libraries and binaries. Libraries are modules that contain functions or classes that other modules can import and use. Binaries are executables; they often import other libraries to accomplish a certain task.

So for this PR, we want to create a library that contains the existing function batch_get_data (i.e., we should remove batch_get_data from that file and put it into a new file, "server/utils/data_collection.py"). Then, we want to create our executable file, aka the binary, called "server/utils/get_request_data_csv.py". In this file, we can call batch_get_data from "data_collection.py", and then write the output of batch_get_data to a CSV.

To be clear, your code is perfectly functional. But ideally, when contributing to production systems, we want to make sure that we are not duplicating code in two places, for maintainability. Furthermore, keeping a single copy lets us write unit tests once and gives us a single source of truth for a particular feature.

@priyakalyan replied:

Adding the existing function batch_get_data from https://github.com/hackforla/311-data/blob/dev/server/dash/app.py here and renaming the file as data_collection.py.
I am able to run the api_311 call function from the command line by passing the start_date, end_date, skip and limit as arguments.
@nichhk (Member) left a comment:

Thanks Anupriya!

Can we move this to "server/utils/get_request_data_csv.py"? Also mentioned some other things to you offline--README, Pipfile.

Can we also add unit tests here? It doesn't need to be too complicated. You can look at the Prefect unit tests as an example.
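As a self-contained illustration of the kind of test being asked for, the sketch below defines a minimal stand-in for batch_get_data and injects a fake HTTP getter; the project's real test would import the helper from data_collection.py instead, and the endpoint URL is an assumption:

```python
# Sketch of a simple unit test in the spirit of the review comment.
# This batch_get_data is a minimal inline stand-in, not the project's code.
from unittest import mock

def batch_get_data(start_date, end_date, get=None):
    """Collect every row by paging until the API returns an empty batch.

    `get` is injectable so tests can avoid real network calls; the real
    code would default it to requests.get.
    """
    if get is None:
        import requests
        get = requests.get
    rows, skip = [], 0
    while True:
        batch = get("https://dev-api.311-data.org/requests", params={
            "start_date": start_date, "end_date": end_date,
            "skip": skip, "limit": 10000,
        }).json()
        if not batch:
            return rows
        rows.extend(batch)
        skip += 10000

def test_batch_get_data_pages_until_empty():
    fake_get = mock.Mock()
    # First call returns one row, second call returns an empty page.
    fake_get.return_value.json.side_effect = [[{"srnumber": "1-1"}], []]
    rows = batch_get_data("2021-01-01", "2021-01-02", get=fake_get)
    assert rows == [{"srnumber": "1-1"}]
    assert fake_get.call_count == 2
```

Injecting the getter (or patching requests.get with mock.patch) keeps the test fast and deterministic, with no network dependency.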

Outdated review thread on APIcall/data_collection.py (resolved).

def api_311(start_date, end_date, skip, limit):
skip = 0
limit = 10000
Member:

Please create a module-level constant for this, e.g., REQUESTS_BATCH_SIZE = 10000. You can put it under the imports.
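For illustration, the suggested change might look like this (the fetch logic is elided; only the constant and signature follow the comment above):

```python
REQUESTS_BATCH_SIZE = 10000  # module-level constant, placed under the imports

def api_311(start_date, end_date):
    # skip and limit are no longer parameters that get immediately
    # overwritten: pagination starts at 0 and pages by REQUESTS_BATCH_SIZE.
    skip, limit = 0, REQUESTS_BATCH_SIZE
    # ... fetch pages here, incrementing skip by REQUESTS_BATCH_SIZE ...
    return skip, limit  # body elided in this sketch
```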

Outdated review thread on APIcall/data_collection.py (resolved).
import argparse
from datetime import date

def api_311(start_date, end_date, skip, limit):
Six outdated review threads on APIcall/data_collection.py (resolved).
@nichhk (Member) left a comment:

Looks good! Just a few small comments.

Three outdated review threads on server/utils/README.md (resolved).
@@ -0,0 +1,7 @@
# **API call to fetch 311 request data from the 311 data server for a given start date and end date.**

### The 311 request data from [lacity.org](https://data.lacity.org/browse?q=MyLA311%20Service%20Request%20Data%20&sortBy=relevance) has 34 columns. The get_request_data_csv.py script can be run from the command line with the arguments start_date and end_date, which lets you retrieve the 311 request data from the [311 data server](https://dev-api.311-data.org/docs). The 311 server processes the data from lacity.org. The data cleaning procedure is described [here](https://github.com/hackforla/311-data/blob/dev/docs/data_loading.md). The dataframe that is returned can be saved as a csv file. A preview of the data_final dataframe is printed in the command line.
Member:

nit: to be clear, we are not returning a dataframe. We don't even need to mention that a dataframe is being produced, since it's an intermediate result. We can just say "The result is written to a csv file". We should also explicitly say what the path to the output csv file is.



### Example: python get_311_request_data_csv.py "2021-01-01" "2021-01-03" will return 261 rows and 15 columns.
Member:

Nit: wrap code and commands in "`".

@priyakalyan (Member Author):

Let me know if this is what you wanted: `python get_311_request_data_csv.py "2021-01-01" "2021-01-03"`

Additional review threads on server/utils/get_request_data_csv.py and server/utils/README.md (resolved).
@nichhk (Member) left a comment:

Looks great! Thanks for your patience Anupriya!

@priyakalyan (Member Author):
Thank you Nich for reviewing this PR! I have learnt a lot.

@priyakalyan merged commit 41124d6 into hackforla:dev on Sep 21, 2022.
Development

Successfully merging this pull request may close these issues.

CoP: Data Science: Analyze correlations between metro locations and 311-data requests