Goodly/CapitolQuery

CapitolQuery

CapitolQuery is converting a static archive of the Congressional Record into a research-ready database.

The project, which grew out of D-Lab's CTAWG and is supported by the Social Science Research Council, will run from June 15 to September 15. The work, estimated at about 200 hours, will occur across three phases.

Phase I: Acquiring and Cleaning Data

Phase II: Structuring, Chunking, and Tagging the Text

Phase III: Packaging the Data for Researchers and Archives

Each phase will produce deliverables that demonstrate progress toward, and culminate in, a research-ready pilot database. The deliverables also include scripts and educational notebooks (Jupyter or RMarkdown) that show researchers and memory organizations (i.e., archives and libraries) how to prepare textual data for computational text analysis projects. Significant elements of Phases I and II are already complete, and we expect some consulting help for Phase III.
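To give prospective contributors a feel for the kind of work Phase II involves, here is a minimal sketch of chunking raw text into speaker-tagged units. It is not the project's actual pipeline: the `SPEAKER` pattern, the `Chunk` class, the `chunk_by_speaker` function, and the sample text are all hypothetical and chosen only for illustration. It assumes speaker turns begin with a title and an upper-case surname at the start of a line.

```python
import re
from dataclasses import dataclass

# Hypothetical heading pattern (an assumption, not the project's rule):
# a title, an upper-case surname, and a period at the start of a line,
# e.g. "Mr. SMITH. "
SPEAKER = re.compile(r"^(Mr|Mrs|Ms)\. ([A-Z][A-Z'\-]+)\. ", re.MULTILINE)


@dataclass
class Chunk:
    speaker: str  # surname captured from the heading
    text: str     # the speaker's remarks, with whitespace normalized


def chunk_by_speaker(raw: str) -> list[Chunk]:
    """Split raw text into one chunk per speaker turn."""
    matches = list(SPEAKER.finditer(raw))
    chunks = []
    for i, m in enumerate(matches):
        # A turn runs until the next speaker heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        body = " ".join(raw[m.end():end].split())
        chunks.append(Chunk(speaker=m.group(2), text=body))
    return chunks


# Invented sample text, used only to exercise the function.
sample = (
    "Mr. SMITH. I yield myself such time\nas I may consume.\n"
    "Ms. JONES. I thank the gentleman.\n"
)
chunks = chunk_by_speaker(sample)
for c in chunks:
    print(c.speaker, "|", c.text)
```

Real source text will need a more careful pattern (other titles, procedural headings, page breaks), so treat this only as a starting point for discussion.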

Contribute :)

Ideal contributors will be adept at coding in Python and/or R and at writing technical curriculum for a beginner-to-intermediate audience. Expertise in XML (for the creation of the Phase III database) is a plus. The team will be managed (as lightly as possible) by the GoodlyLabs Conductor, Nick Adams, and team members can expect authorship credit on all their products.

To begin contributing:

  1. Read the Statement of Work document: 'SSRC_Goodly_SoW_Funded.pdf'
  2. Email Nick to let him know what you would like to work on: nickbadams [at] gmail.com
  3. Request access to our files at: https://drive.google.com/drive/folders/0B7dPnKIP7WrQdUtLeFlFenpsZDA
