
CoP: Data Science: Create district types reusable tool (API, single dataset, etc.) #118

Open
5 of 7 tasks
ExperimentsInHonesty opened this issue Sep 28, 2021 · 16 comments
Assignees
Labels
epic · feature: guide · role: data analysis · size: epic · size: 1pt (Can be done in 6 hours or less)

Comments


ExperimentsInHonesty commented Sep 28, 2021

Overview

We need to create a tool so that each project at H4LA that renders points on a map can use District Files to help people analyze or view the data.

Action Items

  • Identify large groups/districts
  • Identify links for groups/districts
  • Locate and obtain shape files for these districts (see Obtain Shape Files for Different District Types as of Nov/Dec 2021 #124)
  • Determine which file types we will make available (shp, npm package, and/or GeoJSON)
  • Put files in a GitHub repository so they are available to use across the organization
  • Research how we will create a dataset out of this info that will be self-updating (i.e., are there APIs for these groups?)
  • ...

Resources

Example Neighborhood Council Shape File

Initial Identification of Large Groups/Districts

@ExperimentsInHonesty ExperimentsInHonesty transferred this issue from hackforla/311-data Sep 28, 2021
@akhaleghi akhaleghi added role: data analysis, size: missing, and feature: guide labels and removed the feature: missing label Nov 2, 2021
@ExperimentsInHonesty ExperimentsInHonesty changed the title Define district types Create district types reusable tool (API, single dataset, etc.) Nov 9, 2021
@akhaleghi akhaleghi added size: 1pt Can be done in 6 hours or less and removed size: missing labels Nov 12, 2021

ExperimentsInHonesty commented Dec 7, 2021

Create an npm package for delivering the data. We need to get a back-end person involved, and we need to make one for each year the boundaries change, e.g., la-shape-files-2021, la-shape-files-2022.

@ExperimentsInHonesty

Next steps: talk to the 311, TDM, Food Oasis, and Lucky Parking teams.

@akhaleghi

Feedback from Mike Morgan on 12/9: Since the shape files for the various districts are small enough (less than 50MB, see here), they can be stored in a repository. We should also consider making these available as an npm package and as GeoJSON.


akhaleghi commented Mar 18, 2022

Notes from 3/11 meeting with Abe, Bonnie, John (Food Oasis) and Mike:

Food Oasis uses PostgreSQL's PostGIS geometry data type to run scripts, and then converts to GeoJSON to send to the client.

  • Can take a lat/lon and return the NC (neighborhood council)
  • Can render a neighborhood on a map

PostGIS can also consume GeoJSON and convert it to its native geometry type.

The recording of the meeting
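Outside the database, the lat/lon-to-neighborhood-council lookup described above comes down to a point-in-polygon test (PostGIS does this with `ST_Contains` against the real NC geometries). A minimal pure-Python sketch of the same idea, using a made-up square boundary in place of a real neighborhood council shape:

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    from the point crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the point's latitude?
        if (y1 > lat) != (y2 > lat):
            # Longitude where the edge crosses that latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Toy square standing in for a neighborhood council boundary
nc_boundary = [(-118.30, 34.00), (-118.20, 34.00),
               (-118.20, 34.10), (-118.30, 34.10)]

print(point_in_polygon(-118.25, 34.05, nc_boundary))  # True (inside)
print(point_in_polygon(-118.00, 34.05, nc_boundary))  # False (outside)
```

In practice this is exactly what the database does at scale, so the sketch is only meant to illustrate what "can take lat/lon and return NC" involves.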

@ExperimentsInHonesty

This issue will have to be rewritten to check whether the shape files are out of date. But the programming that uses the shape files should be built first, given that up-to-date shape files with no programming are useless.

@akhaleghi

Next steps: Create a script that can be run to automate downloading the various shape files for the district types listed above. We will want to note the date each file was last updated and the date it was downloaded.
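A download script along those lines could be sketched as below; the source URLs and district names are placeholders, not the real GeoHub endpoints:

```python
import datetime
import json
import urllib.request

# Hypothetical endpoints -- the real URLs for each district type would go here.
DISTRICT_SOURCES = {
    "neighborhood_councils": "https://example.com/nc.geojson",
    "city_council_districts": "https://example.com/ccd.geojson",
}

def versioned_name(district_type, downloaded_on):
    """Filename that records the date the file was downloaded."""
    return f"{district_type}_{downloaded_on.isoformat()}.geojson"

def download_all(dest_dir="."):
    """Fetch each district file and write a sidecar metadata record noting
    the download date (and, where the source reports it, the date the
    underlying data was last updated)."""
    today = datetime.date.today()
    for district, url in DISTRICT_SOURCES.items():
        out = f"{dest_dir}/{versioned_name(district, today)}"
        urllib.request.urlretrieve(url, out)
        meta = {"source": url, "downloaded": today.isoformat()}
        with open(out + ".meta.json", "w") as f:
            json.dump(meta, f)

print(versioned_name("neighborhood_councils", datetime.date(2022, 3, 18)))
# neighborhood_councils_2022-03-18.geojson
```

Dating the filename itself keeps successive runs from overwriting each other, which matters once the script runs on a schedule.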

@parcheesime parcheesime self-assigned this Mar 19, 2024
@parcheesime

Update on issue #118, district types reusable tool:

  • Familiarization: I reviewed each target site to understand the layout, available data, and challenges in data extraction.

  • APIs: Looked for available APIs to simplify the extraction process.

  • Created a spreadsheet to track each site.

  • Initiated a Jupyter Notebook to document coding and data collection/automation.

@parcheesime

Using the GeoHub L.A. website, I programmatically created shape files:
Data Acquisition: Utilizing the GeoHub LA website, I identified and accessed URL endpoints for the API calls corresponding to our project's requirements.
Data Extraction: Through programmatic queries, I fetched JSON data from the different district API endpoints, capturing geographical information such as boundaries, points of interest, and administrative divisions.
Shapefile Creation: Using the gathered JSON data, I created shapefiles, a geospatial data format compatible with various GIS software and tools.
Compression Exploration: To optimize storage and handling of the shapefiles, I'm exploring compressing the data using TruncatedSVD.
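GeoHub layers are served through the ArcGIS REST API, where a feature layer's `/query` endpoint can return GeoJSON directly. A minimal sketch of building such a request URL (the layer URL below is a made-up placeholder, not one of the endpoints actually used); the returned JSON can then be written out as a shapefile with a library such as geopandas:

```python
from urllib.parse import urlencode

def geojson_query_url(layer_url):
    """Build an ArcGIS REST API query returning every feature of a
    feature layer as GeoJSON."""
    params = {"where": "1=1", "outFields": "*", "f": "geojson"}
    return f"{layer_url}/query?{urlencode(params)}"

# Hypothetical FeatureServer layer for a district boundary dataset
layer = "https://services.arcgis.com/XYZ/arcgis/rest/services/Districts/FeatureServer/0"
print(geojson_query_url(layer))
```

`where=1=1` is the standard ArcGIS idiom for "all features", and `f=geojson` selects the output format.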

@parcheesime

Update:
Data acquisition, extraction, shapefile creation, and compression exploration can be accessed in my repo, HERE

This week I will look into how we can run the data collection script on a quarterly basis and have it collect in Google Drive and/or GitHub, or whatever is best for the team.

@parcheesime

Here's an update on data acquisition and extraction of district shape files:

Update on the Shape File Automation Project

  • Implemented Google Drive functions to add files directly to Google Drive.
  • Updated the main function to create shape files with new functionalities.
  • Explored automation options using Google Cloud Functions for continuous data collection of district shape files.

Consideration:

  • Google Cloud Function seems to be a viable solution for automating the data collection process. However, it requires setting up with a credit card. I will investigate if Hack for LA has an account or could provide a credit card for this purpose.

Next Steps:

  • Confirm the availability of a credit card or an existing Google Cloud account through Hack for LA.
  • If available, proceed with setting up the Google Cloud Function.
  • Test the entire automation workflow to ensure everything is functioning as expected.
  • Or investigate other automation avenues.

I've also pushed all recent updates to the repository, and you can check the latest commits for detailed changes.

@parcheesime

Project Update:

  • A GitHub workflow has successfully been integrated to automatically update files in my Google Drive.
  • Adjustments were made in the main script to ensure compatibility with the GitHub workflow.
  • Secrets have been configured for Google API JSON file and Google Drive Folder ID.
  • I will update the ID to our HFLA Google Drive Folder.
  • Automation is set for every other month on the first of the month.
  • Current updates to the repository

I can adjust the code to update a GitHub folder as well; we can do both Google Drive and GitHub if need be.
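A schedule of "every other month on the first" maps to a GitHub Actions trigger along these lines; the workflow structure, file names, and secret names here are illustrative, not the exact ones configured in the repo:

```yaml
name: Update district shape files

on:
  schedule:
    # First day of every other month, midnight UTC
    - cron: "0 0 1 */2 *"
  workflow_dispatch:

jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      # Secrets for the Google API JSON file and Drive folder ID are
      # injected as environment variables (names are assumptions)
      - run: python main.py
        env:
          GOOGLE_API_JSON: ${{ secrets.GOOGLE_API_JSON }}
          GDRIVE_FOLDER_ID: ${{ secrets.GDRIVE_FOLDER_ID }}
```

The `workflow_dispatch` trigger lets the collection also be run manually between scheduled runs.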

@parcheesime

This week I refined the setup of environment variables to enhance both local development and CI/CD workflows in GitHub Actions. By leveraging os.getenv() for securely accessing environment variables, I've streamlined the development process significantly. This ensures that the application runs smoothly with the necessary configurations without hardcoding sensitive information.
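The os.getenv() pattern can be sketched as follows; the variable name and default are illustrative, not the actual configuration:

```python
import os

def drive_folder_id():
    """Read the target Drive folder from the environment. In GitHub
    Actions the value comes from a repository secret injected as an
    env variable; locally, the fallback default applies."""
    return os.getenv("GDRIVE_FOLDER_ID", "local-test-folder")

# Simulate the CI environment
os.environ["GDRIVE_FOLDER_ID"] = "abc123"
print(drive_folder_id())  # abc123

# Without the variable set, the local default applies
del os.environ["GDRIVE_FOLDER_ID"]
print(drive_folder_id())  # local-test-folder
```

Because the secret never appears in the source, the same script runs unchanged both locally and in the workflow.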

Additionally, I've discussed with our project manager updating the top-level Google folder structure. This change aims to improve the automation process for storing shape files.

District Data Collection Repo

@akhaleghi akhaleghi moved this to In progress (actively working) in CoP: Data Science: Project Board Jun 10, 2024
@ExperimentsInHonesty ExperimentsInHonesty closed this as completed by moving to Filled in HfLA: Open Roles Jun 18, 2024
@github-project-automation github-project-automation bot moved this from In progress (actively working) to Done in CoP: Data Science: Project Board Jun 18, 2024
@github-project-automation github-project-automation bot moved this from Done to In progress (actively working) in CoP: Data Science: Project Board Jun 18, 2024
@ExperimentsInHonesty ExperimentsInHonesty changed the title Create district types reusable tool (API, single dataset, etc.) CoP: Data Science: Create district types reusable tool (API, single dataset, etc.) Jun 18, 2024
@parcheesime

I gathered all the information for transferring my current repo, which contains the District Shape File pipeline, into a new repo established in the Hack for LA account for housing the shape data. Below are the steps involved. The transfer will be completed within the week. In the meantime, the shape file data is in the Hack for LA Google Drive.

Steps for Repository Transfer

The following steps have been determined for transferring the repository associated with the district data collection:

  1. Prepare New Repository

    • A new empty repository has been established to house the district data collection.
  2. ETL Process Completion

    • The ETL process has been completed in my current repository.
  3. Code Transfer Process

    • Clone the new repository locally.
    • Add the new repository as a remote to the existing project.
    • Pull the latest code from the current (old) repository.
    • Push the code to the new repository.
  4. Transfer Automation Components

    • Transfer GitHub Actions and secrets necessary for pipeline automation.
  5. Update Documentation

    • The README file will be updated to reflect changes and provide guidance for the new repository setup.
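Under hypothetical repo URLs (placeholders below, not the real repository names), steps 1–4 translate to roughly:

```shell
# 1. Clone the new (empty) Hack for LA repository locally
git clone https://github.com/hackforla/new-district-repo.git
cd new-district-repo

# 2./3. Add the existing (old) repository as a remote and pull its history
git remote add old https://github.com/parcheesime/old-district-repo.git
git pull old main --allow-unrelated-histories

# 4. Push the code and history to the new repository
git push origin main
```

GitHub Actions workflow files travel with the code, but repository secrets do not and must be re-created in the new repo's settings, which is why step 5 calls them out separately.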

@akhaleghi

@parcheesime Is there still work to be done on this issue or is it complete?

@parcheesime

@akhaleghi I've successfully tested adding the Los Angeles district shape data in my own repository, complete with a README and automated scripts running on schedule. How can we integrate this into the Hack for L.A. repository? Should we create a dedicated directory like LA_District_ShapeFiles for the data?

@parcheesime


Follow-up: @akhaleghi I have the data updating on my personal repository. I will need assistance adding my project to our data science repo. @salice may have made one, but that was a while ago, before the repository updates.

Projects
Status: In progress (actively working)
Status: Filled
3 participants