99 Neighborhood Council Websites Technologies Used Analysis
Bonnie Wolfe edited this page May 7, 2022 · 11 revisions
Project to create a scraper that collects information from builtwith.com on the technologies used by the 99 neighborhood council (NC) websites.
Automate the scrape job to run periodically.
- Be able to run script on demand
- Gather the following information
- Name of Tech on each NC site
- URL of Tech
- Category of Tech
- Total NCs using Tech
- Total Categories
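The fields listed above could be derived from the scraped rows roughly as follows. This is a minimal sketch, not the project's actual code: the row layout `(nc_site, tech_name, tech_url, category)`, the `summarize` function, and the sample sites are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (nc_site, tech_name, tech_url, category).
# Real rows would come from the scraper output.
ROWS = [
    ("nc-one.example", "WordPress", "https://wordpress.org", "CMS"),
    ("nc-two.example", "WordPress", "https://wordpress.org", "CMS"),
    ("nc-two.example", "jQuery", "https://jquery.com", "JavaScript Libraries"),
]


def summarize(rows):
    """Return one summary dict per tech plus the number of distinct categories."""
    sites_by_tech = defaultdict(set)  # tech name -> set of NC sites using it
    info = {}                         # tech name -> (url, category)
    for site, name, url, category in rows:
        sites_by_tech[name].add(site)
        info[name] = (url, category)

    techs = [
        {
            "name": name,
            "url": info[name][0],
            "category": info[name][1],
            "total_ncs": len(sites_by_tech[name]),
        }
        for name in sorted(sites_by_tech, key=str.lower)
    ]
    total_categories = len({tech["category"] for tech in techs})
    return techs, total_categories


techs, total_categories = summarize(ROWS)
```

Using a set per tech means a site that lists the same tech twice is still counted once in the "Total NCs using Tech" figure.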
- 2021-07-01 New issue created https://github.com/hackforla/data-science/issues/44
- 2021-07-08 Accessing the API didn’t return the info required for the project, so we will use Selenium to scrape
- 2021-08-09 Sophia ran a video tutorial session on scraping with Selenium and shared some starter code. The next step is for the person assigned to the issue to parse the output into a usable format and save it as a file
- 2021-08-30 Abe joined as the CoP PM and said he would get up to speed and then move the issue forward.
- 2021-09-13 Rajinder assigned
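The "parse the output into a usable format and save it as a file" step from the 2021-08-09 note might look something like the sketch below. It assumes the Selenium output has already been parsed into dictionaries; the column names in `FIELDS`, the `write_csv` helper, and the sample record are hypothetical, and CSV is only one possible file format.

```python
import csv
import io

# Hypothetical column names; the real scraper output may differ.
FIELDS = ["nc_site", "tech_name", "tech_url", "category"]


def write_csv(records, handle):
    """Write parsed records (a list of dicts) to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


# Hypothetical record standing in for one parsed builtwith.com entry.
records = [
    {
        "nc_site": "nc-one.example",
        "tech_name": "WordPress",
        "tech_url": "https://wordpress.org",
        "category": "CMS",
    },
]

# An in-memory buffer keeps the example self-contained; to write a real file,
# pass open("some_output.csv", "w", newline="") instead.
buffer = io.StringIO()
write_csv(records, buffer)
csv_text = buffer.getvalue()
```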
OCS: Builtwith data on 99 NCs technologies
- Builtwith
- https://builtwith.com
- Builtwith API
- API limitations: Some sites are resistant to being crawled (WordPress sites, for instance: https://atwatervillage.org/calendar/), so we need a list of all the sites that can't be put through the sitemap maker. See notes about WordPress site crawling: https://community.funnelback.com/knowledge-base/implementation/Gather-And-Index/integration/crawl-wordpress-sites
- Selenium
- Docker
- Target Website List Here - this is one tab on a larger analysis workbook.
- Code on the data-science repo with Rajinder's code - this will need to be moved to another directory. It has nothing to do with 311. It's a project for Open Community Survey
- Rajinder's personal repo - this seems to have been updated more recently than the one on data-science.
@akibrhast, @ava li, @Sarah Williams, @wendywilhelm10 @rajindermavi @ShikaZzz @JessicaFB @Poorvi Rao
@kalyaniraman, @akhaleghi, @ryanswan @salice