@Adaakal Adaakal commented Sep 16, 2025

Status (WIP)

New script 311-data/webscraping/builtwith_api_scrape.py (BuiltWith/RapidAPI path).

Reads URLs from the wide NCsurvey.csv (“NC URL (if avail)” row) and iterates over all 97 domains.
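
For reference, a minimal sketch of pulling one labeled row out of a wide CSV. It assumes the row label sits in the first column and each remaining column is one council; the real NCsurvey.csv layout may differ, and `read_row_values` is a hypothetical helper, not a function from the script.

```python
import csv
import io


def read_row_values(csv_file, row_label):
    """Return the non-empty cells of the row whose first cell equals row_label."""
    for row in csv.reader(csv_file):
        if row and row[0].strip() == row_label:
            return [cell.strip() for cell in row[1:] if cell.strip()]
    return []  # label not found


# Tiny in-memory stand-in for the wide survey file (made-up values).
sample = io.StringIO(
    "Question,NC A,NC B,NC C\n"
    "NC URL (if avail),https://example.org,,https://example.net\n"
)
urls = read_row_values(sample, "NC URL (if avail)")
print(urls)
```

Blank cells are skipped, so councils without a URL simply drop out of the list.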

Current stop point: completes the lookups but crashes when building the CSVs → KeyError: ['technology'].
Why: Free BuiltWith/RapidAPI response doesn’t always include technologies in the fields our parser expected.
Next steps:

Make parser tolerant of multiple response shapes.

Guard output step so it writes empty CSVs when no rows are returned.

(If org has a full BuiltWith API key, run again to populate tech tables.)
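
A rough sketch of the first two next steps, under stated assumptions: the key names in `CANDIDATE_KEYS`, the `name` field on each item, and the output columns are all guesses for illustration and should be checked against real responses. `extract_technologies` and `write_rows` are hypothetical helpers.

```python
import csv

# Hypothetical candidate keys; confirm against actual API output.
CANDIDATE_KEYS = ("technologies", "technology", "Technologies")
# Assumed output columns for the tech table.
FIELDNAMES = ["domain", "technology"]


def extract_technologies(payload):
    """Return technology names from any of several possible response shapes."""
    if not isinstance(payload, dict):
        return []
    for key in CANDIDATE_KEYS:
        value = payload.get(key)
        if isinstance(value, list):
            names = []
            for item in value:
                if isinstance(item, str):
                    names.append(item)
                elif isinstance(item, dict) and item.get("name"):
                    names.append(str(item["name"]))
            return names
    return []  # none of the expected keys held a list


def write_rows(path, rows):
    """Always write the header, so zero rows still yields a valid CSV."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


def rows_for(domain, payload):
    """Build one output row per detected technology for a domain."""
    return [{"domain": domain, "technology": t} for t in extract_technologies(payload)]
```

Because the header is written from a fixed column list rather than inferred from the data, the output step no longer depends on any particular key being present in the rows.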

@Adaakal Adaakal mentioned this pull request Sep 16, 2025
16 tasks
Adaakal commented Oct 21, 2025

Update (Oct 20, 2025): HTML heuristic refresh

Re-ran widget_probe_min.py (no API) across all 97 NC sites.

Current widget counts:

has_calendar: 63
has_chatbot: 2
has_search: 19
has_translation: 13

Spot checks (first few domains per category):

  • Calendar: atwatervillage.org, babcnc.org, bhnc.net, canndunc.org, chnc.org, centralsanpedro.org, chatsworthcouncil.org, cspnc.org
  • Chatbot: ncwpdr.org, whcouncil.org
  • Search: chnc.org, dlanc.com, echoparknc.com, glassellparknc.org, cypressparknc.com, greaterwilshire.org, hcnnc.org, hhwnc.org
  • Translation: babcnc.org, myevrnc.com, cypressparknc.com, hcnnc.org, marvista.org, nohowest.org, prnc.org, soronc.org

Notes:

  • Heuristic looks for common calendar/chat/search/translation markers in HTML.

  • The script doesn’t need API keys and doesn’t depend on third-party services. Anyone can rerun it quickly on their own machine to reproduce these numbers.

  • Future work: compare what our HTML rules detect against BuiltWith’s widgets group/categories for the same domains. That would tell us how complete/accurate our heuristic is and where it misses.
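
To make the notes above concrete, here is a small sketch of marker-based detection plus the proposed comparison. The patterns in `MARKERS` are illustrative only; the actual rules in widget_probe_min.py are not shown in this thread and may differ. `detect_widgets` and `compare` are hypothetical names, and the comparison treats the reference set (for example, BuiltWith results) as ground truth, which is itself an assumption.

```python
import re

# Illustrative markers only; not the script's real rule set.
MARKERS = {
    "has_calendar": [r"calendar"],
    "has_chatbot": [r"tawk\.to", r"intercom", r"chatbot"],
    "has_search": [r'type=["\']search["\']', r'role=["\']search["\']'],
    "has_translation": [r"translate\.google", r"gtranslate"],
}


def detect_widgets(html):
    """Flag each widget type whose markers appear anywhere in the HTML."""
    lowered = html.lower()
    return {
        flag: any(re.search(pattern, lowered) for pattern in patterns)
        for flag, patterns in MARKERS.items()
    }


def compare(heuristic_domains, reference_domains):
    """Summarize agreement for one widget type, with the reference as truth."""
    h, r = set(heuristic_domains), set(reference_domains)
    both = h & r
    return {
        "only_heuristic": sorted(h - r),  # possible false positives
        "only_reference": sorted(r - h),  # possible misses
        "precision": len(both) / len(h) if h else None,
        "recall": len(both) / len(r) if r else None,
    }


flags = detect_widgets('<input type="search"><div class="calendar"></div>')
report = compare(["a.org", "b.org"], ["b.org", "c.org"])
```

Running `compare` once per widget type would show where the HTML rules over-detect (`only_heuristic`) and where they miss (`only_reference`).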
