CoP: Data Science: Analyze correlations between metro locations and 311-data requests #107

Open
22 tasks
ryanmswan opened this issue Apr 22, 2020 · 16 comments · Fixed by hackforla/311-data#1257

Comments

@ryanmswan
Contributor

ryanmswan commented Apr 22, 2020

Overview

Investigate whether there are meaningful trends associated with metro stops and metro lines with regards to requests tracked by 311-data in LA County.

Action Items

  • Define requirements for 311 data (adding notes to the Resources section and discussing at the Data Science CoP meeting)
    • Do you need a one-time or ongoing dump of the data?
    • Do you need a subset of the data (e.g. certain years) or the entire data set (approx. 4 million rows or 11 GB)?
      • If a subset is needed, please define subset characteristics (e.g. date range)
    • Do you need online access via an API or a download of data?
    • Add dependency label and put in the icebox until 311 data is provided
  • Find available data sources and add to Resources section below
  • Determine if this is a one-time or ongoing project (and assign the appropriate label)
  • Write one-sheet
    • Define stakeholder
    • Summarize project including value add
    • Define project 6 month roadmap
    • Detail history (if any)
  • Define tools to be used to visualize combined data
  • Create issues for the following
    • EDA (Exploratory Data Analysis) of metro data
    • Identify correlations between distance from metro stop and request type
    • Determine if correlations observed are solely due to metro stop or are more broadly associated with population density or other factors
    • Combine geolocation data for metro lines with district types
    • Compare correlations/trends between different districts within each type
    • Compare LA county data with other California counties, compare with district types within county. (Post MVP)
    • Compare with statewide trends and within district types. (Post MVP)

Resources

Information about 311 Data here
Access 311 data here
http://geohub.lacity.org/datasets/metro-rail-lines-stops
https://developer.metro.net/docs/gis-data/overview/
District types issue: #118

use 2019 data for 311
streetlights
crime
metrostops

tools
google colab, sklearn, pandas

Work in progress

@github-actions

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in X days.

@ExperimentsInHonesty ExperimentsInHonesty transferred this issue from hackforla/311-data Sep 18, 2021
@akhaleghi akhaleghi added the "feature: guide" label and removed the "feature: missing" label Nov 5, 2021
@akhaleghi akhaleghi assigned akhaleghi and priyakalyan and unassigned akhaleghi Mar 18, 2022
@akhaleghi
Contributor

@priyakalyan please document the following update to this issue in the comments here

Progress: "What is the current status of your project? What have you completed and what is left to do?"
Blockers: "Difficulties or errors encountered."
Availability: "How much time will you have this week to work on this issue?"
ETA: "When do you expect this issue to be completed?"
Pictures (if necessary): "Add any pictures that will help illustrate what you are working on."

@priyakalyan
Member

Progress: On 3-31-2022 I added the file "Progress summary 311 Data Project" and two data dictionaries: "Value Column 311 data" for the 311 data and "Value column Metro- Bus and Rail line" for the Metro rail and bus lines. So far I have downloaded the 311 data (not cleaned yet) and looked at the request type count and relative frequency over the years 2015-2022 (through March 27th). I also looked at the request type count for different APCs.
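A minimal sketch of the count and relative-frequency step, using only the standard library. The `(year, request_type)` row shape and the sample rows are assumptions made up for illustration; they are not taken from the actual 311 export.

```python
from collections import Counter, defaultdict


def relative_frequency_by_year(rows):
    """Return {year: {request_type: share}} from (year, request_type) pairs."""
    counts = defaultdict(Counter)
    for year, request_type in rows:
        counts[year][request_type] += 1

    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        # Share of each request type among all requests in that year.
        shares[year] = {rt: n / total for rt, n in counter.items()}
    return shares


# Made-up rows for illustration only, not real 311 data.
sample_rows = [
    (2020, "Graffiti Removal"),
    (2020, "Bulky Items"),
    (2020, "Bulky Items"),
    (2021, "Single Streetlight Issue"),
]
shares = relative_frequency_by_year(sample_rows)
```

With pandas, the same result would normally come from a group-by on year followed by normalized value counts; the plain-Python version just makes the arithmetic explicit.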

Since then have been out of town up until today, so literally have no further update for the last week.

Plan for the upcoming week:

  • Install Docker on my Windows PC and set up a local 311 data server;
  • Look at the metro bus/rail line data; learn about geospatial data analysis (wiki)

Availability: 6 hours this week.

ETA- Totally new to geospatial data analysis, so maybe 1 to 2 weeks.

@priyakalyan
Member

Progress: Was successful in installing Docker but could not set up a local 311 data server (tried many times; the last step in "Step 3: Build and seed your local database" failed). Any suggestions/pointers? For now, I have stopped working on it. Downloaded data from this website.

Loaded the metro rail line shapefile, the metro bus line shapefile and the neighborhood council shapefile.

Currently working on spatially joining the 311 data and the NC data (looking at one region at a time, 12 in all). The next step is to overlay the metro rail and bus lines, plot the different request types, and do a qualitative study of how request type counts vary geographically.
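The core of a point-in-polygon spatial join can be sketched without any GIS library. This is a plain ray-casting stand-in, not the geopandas join the work itself presumably used; the square "areas" and the points are hypothetical, and real NC boundaries would come from the shapefile.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_to_areas(points, areas):
    """Label each point with the first area that contains it, or None."""
    labels = []
    for x, y in points:
        match = next(
            (name for name, poly in areas.items() if point_in_polygon(x, y, poly)),
            None,
        )
        labels.append(match)
    return labels


# Hypothetical square areas and points, for illustration only.
areas = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = [(1, 1), (3, 1), (5, 5)]
labels = assign_to_areas(points, areas)
```

A library join adds a spatial index and handles multipolygons and holes, which this sketch does not.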

Availability: 6 hours this week.

ETA- 1 to 2 weeks.

@priyakalyan
Member

Progress: Finally figured out how to use paginated APIs with Python to fetch all rows of data from the 311 server for the year 2021. I have saved it as a CSV file, clean_311_data_2021. I will fetch the clean data for the rest of the years (2015-2020, 2022).
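The general shape of a pagination loop is sketched below. The 311 server's endpoint and parameter names are not given in this thread, so the offset/limit convention and the `fetch_page` callable are assumptions; in practice `fetch_page` would wrap an HTTP request, and here an in-memory fake stands in for it.

```python
def fetch_all_rows(fetch_page, page_size=1000):
    """Collect every row by requesting pages until a short page comes back.

    fetch_page(offset, limit) is a hypothetical callable that returns a list
    of rows; in practice it would wrap an HTTP request to the data server.
    """
    rows = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        rows.extend(page)
        # A page shorter than the limit (including an empty one) is the last.
        if len(page) < page_size:
            break
        offset += page_size
    return rows


# In-memory stand-in for the server, for illustration only.
FAKE_TABLE = [{"id": i} for i in range(2500)]


def fake_fetch_page(offset, limit):
    return FAKE_TABLE[offset:offset + limit]


all_rows = fetch_all_rows(fake_fetch_page, page_size=1000)
```

Passing the page fetcher in as a function keeps the loop testable without network access and independent of any particular HTTP client.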

Have spatially joined the 311 data+ NC data + metro bus + metro rail line displaying the specific request types over 12 regions of NC.

Adding sample pics here- this is for the region 4- South East Valley- NC's: 'SHERMAN OAKS NC', 'NORTH HOLLYWOOD NORTH EAST NC', 'VAN NUYS NC', 'GREATER VALLEY GLEN', 'NOHO NC', 'NOHO WEST NC', 'STUDIO CITY NC', 'NC VALLEY VILLAGE', 'GREATER TOLUCA LAKE NC'.

[Images: Part1, Part2, Part3, Part4, Reg4]

Availability: 6 hours this week.

@priyakalyan
Member

Progress:

  • Created a heat map using folium for the request types Single Streetlight Issue (SSI) and Multiple Streetlight Issue over reg 6 for the year 2021;
  • Could add all the years of data (2015-2021) for reg 6 and request type SSI as layers on the same map, and use the layer control to toggle from one layer to the next.
  • Successful in setting up geofencing (one block in radius) around each metro rail marker in reg 6. The extent of the geofence can be changed depending on the requirement.

Plan for the upcoming week:

  • Extract the bounds around each marker using geopandas manipulation- buffer and intersection...
  • Then analyze the number of requests of each type within each of these buffer zones and compare it with the number outside.
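A rough sketch of the inside-versus-outside buffer count, assuming the request and stop coordinates have already been projected to a planar CRS measured in metres so that straight-line distance is meaningful. The stop names, coordinates, and 200 m radius are hypothetical.

```python
import math


def count_within_buffers(requests, stops, radius_m):
    """Count requests inside each stop's circular buffer and outside all of them.

    requests: list of (x, y) in metres; stops: {name: (x, y)} in metres.
    A request that falls in overlapping buffers is counted once per stop.
    """
    per_stop = {name: 0 for name in stops}
    outside = 0
    for point in requests:
        hit = False
        for name, center in stops.items():
            if math.dist(point, center) <= radius_m:
                per_stop[name] += 1
                hit = True
        if not hit:
            outside += 1
    return per_stop, outside


# Hypothetical projected coordinates, for illustration only.
stops = {"Stop A": (0.0, 0.0), "Stop B": (1000.0, 0.0)}
requests = [(50.0, 20.0), (980.0, 10.0), (500.0, 0.0), (0.0, 150.0)]
per_stop, outside = count_within_buffers(requests, stops, radius_m=200.0)
```

For a fair comparison, the raw counts would still need to be divided by the area inside and outside the buffers, since the outside region is usually far larger.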

Availability: 6 hours this week.

ETA- 1 week

@nichhk
Member

nichhk commented May 23, 2022

The team discussed this last Thursday, so I'll leave some notes for the record:

I think it would be useful to have a histogram where the x axis is "distance from nearest bus stop/metro rail marker/etc." and the y axis is "number of requests". This will allow us to very clearly see whether there is some correlation between nearness to bus stops and 311 requests.
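The binning behind such a histogram could look like the sketch below. The distances, the 250 m bin width, and the 1 km cutoff are made-up values for illustration; the input is assumed to be each request's distance to its nearest stop, in metres.

```python
def histogram_counts(distances_m, bin_width_m, max_distance_m):
    """Count distances in equal-width bins covering [0, max_distance_m).

    Distances at or beyond the last bin edge are dropped.
    """
    n_bins = int(max_distance_m // bin_width_m)
    counts = [0] * n_bins
    for d in distances_m:
        if 0 <= d < n_bins * bin_width_m:
            counts[int(d // bin_width_m)] += 1
    return counts


# Made-up distances to the nearest stop, in metres.
sample_distances = [30, 120, 180, 260, 410, 990, 1500]
counts = histogram_counts(sample_distances, bin_width_m=250, max_distance_m=1000)

# Text rendering: one row per bin, one '#' per request.
for i, n in enumerate(counts):
    print(f"{i * 250:>5}-{(i + 1) * 250:<5} m | {'#' * n}")
```

Note that equal-width distance bins cover rings of growing area around a stop, so raw counts per bin are not directly comparable as densities.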

@priyakalyan
Member

Used the haversine formula (great-circle distance) to calculate the distance between each request's latitude/longitude and each metro rail stop, then found each request's distance from the nearest metro rail marker. All this was done for reg 6, year 2021, and request type Single Streetlight Issue.
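A minimal sketch of the haversine distance and the nearest-stop lookup, assuming a spherical Earth with a mean radius of 6,371 km. The coordinates are made up for illustration and are not real stop locations.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an assumed constant


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_stop_distance_m(request, stops):
    """Distance from one (lat, lon) request to the closest (lat, lon) stop."""
    lat, lon = request
    return min(haversine_m(lat, lon, s_lat, s_lon) for s_lat, s_lon in stops)


# Made-up coordinates, for illustration only.
stops = [(34.0, -118.0), (34.1, -118.0)]
distance = nearest_stop_distance_m((34.01, -118.0), stops)
```

This brute-force lookup compares every request with every stop, which is fine for one region and one request type; a spatial index would be the usual choice for the full data set.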

As discussed in the last 311 team meeting, here is the histogram plot:

Histogram_reg6_ssi_2021_1

@nichhk
Member

nichhk commented Jun 2, 2022

Thanks Anupriya! Sorry for the delay. What do you make of this graph? To me, it seems to suggest that there is not a strong association between distance to nearest metro stop and request frequency--I'd expect to see a (basically) monotonically decreasing histogram, implying that there are a lot of requests close to metro stops but just a few far from metro stops. But maybe a request type like graffiti would be more illuminating.

Another bit that might help us understand this better: what is the density of metro stops? If the density of metro stops is very low, e.g., they are 10km apart from each other, then the median distance from the nearest metro stop of ~500m would be quite close. But if metro stops are 1km apart from each other, then ~500m is pretty far.
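One way to put a number on stop density is the typical distance from each stop to its closest neighbouring stop. The sketch below assumes stop coordinates already projected to metres; the stop positions and the 500 m median request distance are hypothetical. Half the stop spacing is used as a rough yardstick, since a point lying between two neighbouring stops can be at most that far from the nearer one.

```python
import math
from statistics import median


def nearest_neighbor_spacings(stops):
    """For each stop, the distance to its closest other stop (same units)."""
    spacings = []
    for i, a in enumerate(stops):
        spacings.append(
            min(math.dist(a, b) for j, b in enumerate(stops) if j != i)
        )
    return spacings


# Hypothetical stops along a line, in projected metres.
stops_xy = [(0.0, 0.0), (1000.0, 0.0), (2000.0, 0.0), (3500.0, 0.0)]
median_spacing = median(nearest_neighbor_spacings(stops_xy))

# Hypothetical median distance from a request to its nearest stop.
median_request_distance = 500.0

# Near 1: requests sit about as far from stops as the spacing allows.
# Well below 1: requests cluster close to stops.
ratio = median_request_distance / (median_spacing / 2)
```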

With this foundation, I think we can start controlling for factors like population density, bus ridership density, and metro stop density. Does that sound feasible?

@priyakalyan
Member

priyakalyan commented Jun 17, 2022

Have been trying to figure out how to get the population of each neighborhood council so that we can figure out the population density and so on. As @piotrsan mentioned in another issue

I also found this: Demographics of Neighborhood Councils. In both these files there are only 97 records- 97 NCs.

The NC boundary was updated in 2018 with 2 new NCs added; here is the link. I found out the missing council names: NORTH WESTWOOD NC and ARTS DISTRICT LITTLE TOKYO NC.

Next step is to figure out how to go from census block/tract data and adjust it at NC level. This link gives the mapping process to start from block data and reconcile at NC boundary level.

After today's meeting- it looks like starting at census tract will be the easiest way to go. Take the NC shape file and merge it with the census tract and get the geocodes and move on to demographics from there.

@priyakalyan
Member

priyakalyan commented Jul 14, 2022

Have calculated the population of each neighborhood council using the 2020 census tracts (TIGER/Line shapefile 2020), the updated NC shapefile (99 councils), and the ACS 2020 demographics data at the tract level. No approximation was made in the geometry this time. For tracts intersecting multiple NCs, found the percentage of tract area (and hence population) falling in each NC and then calculated the actual population.
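The apportionment step can be sketched as an area-weighted sum, assuming population is spread evenly within each tract. The tract IDs, populations, and overlap fractions below are made up; in the real workflow the fractions would come from intersecting tract and NC geometries.

```python
from collections import defaultdict


def apportion_population(tract_population, overlaps):
    """Area-weighted population per neighborhood council.

    tract_population: {tract_id: population}
    overlaps: list of (tract_id, nc_name, fraction_of_tract_area_in_nc)
    """
    totals = defaultdict(float)
    for tract_id, nc_name, fraction in overlaps:
        # Each NC receives the tract's population in proportion to shared area.
        totals[nc_name] += tract_population[tract_id] * fraction
    return dict(totals)


# Made-up tracts and overlap fractions, for illustration only.
tract_population = {"T1": 4000, "T2": 3000}
overlaps = [
    ("T1", "NC A", 1.0),   # T1 lies entirely inside NC A
    ("T2", "NC A", 0.25),  # T2 straddles the boundary
    ("T2", "NC B", 0.75),
]
nc_population = apportion_population(tract_population, overlaps)
```

A quick sanity check is that the NC totals add up to the total population of the tracts, provided each tract's fractions sum to 1.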

@priyakalyan
Member

priyakalyan commented Jul 26, 2022

Worked on this notebook to find the updated population of the LA city neighborhood councils using geospatial analysis. Next: add a notebook comparing the updated NC population obtained by geospatial analysis with the one from ArcGIS analysis.

@priyakalyan
Member

priyakalyan commented Aug 12, 2022

Have updated the notebook. The total population of LA city NCs is very close to the 2021 Census Bureau value. Have also been working on this PR- API pagination using python- to fetch all rows of data from 311 data pipeline for a given year.

@akhaleghi
Contributor

Hi @priyakalyan, are there any recent updates to this issue?

@priyakalyan
Member

priyakalyan commented Aug 31, 2022

@ExperimentsInHonesty
Member

A summary of this should be added to the wiki

@ExperimentsInHonesty ExperimentsInHonesty changed the title from "Analyze correlations between metro locations and 311-data requests" to "CoP: Data Science: Analyze correlations between metro locations and 311-data requests" Jun 18, 2024
@venkata-sai-swathi venkata-sai-swathi self-assigned this Jun 25, 2024
@akhaleghi akhaleghi closed this as completed by moving to Filled in HfLA: Open Roles Sep 23, 2024
@akhaleghi akhaleghi reopened this Sep 24, 2024