
Data Ops


Issues used in the creation of this page

(Some of these issues may be closed; others may still be open or in progress.)

Introduction to DataOps

At Hack for LA, DataOps is not just about moving data efficiently — it’s about ensuring the sustainability, accuracy, and usefulness of civic tech data pipelines in a volunteer-driven environment. While the core principles align with industry practices like automation, version control, and cross-functional collaboration, our implementation reflects the unique constraints and opportunities of a civic tech ecosystem.

How HfLA’s DataOps is Similar to Industry

  • Automation of repetitive tasks — We use scripts, cloud workflows, and dashboards to reduce manual intervention in data ingestion, cleaning, and reporting (a sketch of one such cleaning script follows this list).
  • Version-controlled data schemas and ETL logic — GitHub repositories act as our source of truth for transformation logic, ensuring changes are auditable.
  • Cross-functional workflows — Data engineers, analysts, product managers, and designers work together, using agile-like sprints to align deliverables.
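
To make the combination of automation and version control concrete, here is a minimal sketch of the kind of small, repository-hosted cleaning script a project team might keep in GitHub. The file names, column names, and pandas-based approach are illustrative assumptions, not a specific HfLA pipeline.

```python
import pandas as pd

def clean_listings(path: str) -> pd.DataFrame:
    """Load a raw CSV export, normalize common text fields, and drop exact duplicates."""
    df = pd.read_csv(path)
    # Trim stray whitespace so downstream joins and filters behave consistently.
    for col in ("name", "address", "city"):
        if col in df.columns:
            df[col] = df[col].astype(str).str.strip()
    # Exact duplicate rows usually mean a batch was re-exported; keep one copy.
    return df.drop_duplicates()

if __name__ == "__main__":
    # Hypothetical file names; a real pipeline reads from project-specific sources.
    clean_listings("raw_listings.csv").to_csv("cleaned_listings.csv", index=False)
```

Because a script like this lives in a GitHub repository, changes to the transformation logic go through pull-request review and remain auditable, which is the "source of truth" behavior described above.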

How HfLA’s DataOps Differs from Industry

  • Volunteer-driven execution — Contributors often onboard mid-project, so workflows must be well-documented and easy to learn.
  • Multi-project shared data governance — Centralized assets like the PeopleDepot schema support multiple teams, reducing duplication but requiring stricter change control.
  • Public-sector and open data integration — Our data sources often come from government agencies, requiring additional cleaning, validation, and ethical review before use.
  • Sustainability over speed — Unlike startups chasing rapid iteration, we balance innovation with long-term maintainability, ensuring future volunteers can continue the work.

Examples of DataOps in Action at HfLA

  • PeopleDepot — Centralized, governed data schema that consolidates people, program, and project data for multiple HfLA projects.
  • Food Oasis — Automated processes to validate, clean, and update food resource listings, blending automated checks with human review.
  • TDM Calculator — Structured datasets integrated into a web tool for the City of Los Angeles, with automated testing to ensure data integrity.
  • AI Skills Assessor Pipeline — Processing GitHub issue data through an AI-assisted classification pipeline, applying standardized skill labels and maintaining auditable logs for human-in-the-loop review.
  • Volunteer Engagement Dashboards — Pulling contribution, meeting attendance, and project activity data into Looker dashboards to support product managers in decision-making and resource planning.
  • Shared Drive File Deletion Monitor — Daily automated reporting on file deletions across shared drives, comparing current to prior snapshots and alerting stakeholders when thresholds are exceeded (see the snapshot-comparison sketch after this list).
  • Product Board & Issue Health Dashboard — Aggregates GitHub issue activity across projects, surfacing backlog health, stale issues, and contributor load to help product managers prioritize work.
  • 311 Data Visualization — Ingests the City of Los Angeles 311 service request dataset from the public open data portal, cleans and standardizes the data, and powers a web app that visualizes requests on an interactive map. Users can filter by neighborhood council boundaries or by service category (e.g., streetlights, graffiti, illegal dumping, bulky item pickup). The DataOps workflow ensures the dataset stays up-to-date, supports performant filtering in the UI, and maintains geographic accuracy for civic engagement and advocacy.
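
As one illustration of how these pipelines work internally, the sketch below approximates the snapshot-comparison step of the Shared Drive File Deletion Monitor: it diffs yesterday's and today's file listings and flags the batch when deletions exceed a threshold. The CSV layout, file names, and threshold value are assumptions made for illustration; the production job also handles notification delivery and scheduling.

```python
import csv

# Hypothetical threshold; the real value is agreed with stakeholders.
DELETION_ALERT_THRESHOLD = 25

def load_snapshot(path: str) -> dict:
    """Read a snapshot CSV with file_id and file_name columns into a dict."""
    with open(path, newline="") as f:
        return {row["file_id"]: row["file_name"] for row in csv.DictReader(f)}

def report_deletions(previous_path: str, current_path: str) -> list:
    """Return the names of files present in the previous snapshot but missing from today's."""
    previous = load_snapshot(previous_path)
    current = load_snapshot(current_path)
    deleted_ids = set(previous) - set(current)
    deleted_names = sorted(previous[file_id] for file_id in deleted_ids)
    if len(deleted_names) > DELETION_ALERT_THRESHOLD:
        # The production job would notify stakeholders here (email, Slack, etc.);
        # this sketch just prints a warning.
        print(f"ALERT: {len(deleted_names)} files deleted since the last snapshot")
    return deleted_names

if __name__ == "__main__":
    # Hypothetical snapshot file names produced by the daily export.
    for name in report_deletions("snapshot_yesterday.csv", "snapshot_today.csv"):
        print(name)
```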

Key Tools & Practices We Use

  • GitHub for code, schema, and ETL version control
  • Google Sheets & Google Apps Script for lightweight data manipulation and reporting
  • Looker for dashboards and data visualization
  • Documented onboarding guides to enable new volunteers to pick up work with minimal disruption
  • Data validation scripts to catch anomalies early in the pipeline (sketched below)
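
As an example of the kind of validation script mentioned above, the sketch below checks a batch of geocoded records (for instance, 311-style service requests) for missing columns, missing coordinates, and points that fall outside a rough Los Angeles bounding box. The column names and coordinate bounds are illustrative assumptions, not the exact checks any one project runs.

```python
import pandas as pd

# Illustrative column names and a rough Los Angeles bounding box; adjust per dataset.
REQUIRED_COLUMNS = ["request_type", "created_date", "latitude", "longitude"]
LAT_BOUNDS = (33.0, 35.0)
LON_BOUNDS = (-119.5, -117.0)

def validate(df: pd.DataFrame) -> list:
    """Return human-readable anomaly messages; an empty list means the batch passes."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # later checks depend on these columns
    # Rows without coordinates cannot be mapped, so flag them for human review.
    no_coords = df["latitude"].isna() | df["longitude"].isna()
    if no_coords.any():
        problems.append(f"{int(no_coords.sum())} rows have no coordinates")
    # Coordinates far outside the LA area usually indicate a geocoding error.
    in_bounds = df["latitude"].between(*LAT_BOUNDS) & df["longitude"].between(*LON_BOUNDS)
    out_of_area = ~no_coords & ~in_bounds
    if out_of_area.any():
        problems.append(f"{int(out_of_area.sum())} rows fall outside the expected area")
    return problems
```

Running a check like this before data reaches a dashboard surfaces anomalies early, when they are cheapest to fix.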
