Michael edited this page Feb 26, 2019 · 6 revisions

The committee_meetings scraper pulls upcoming House and Senate committee meetings from http://docs.house.gov/Committee and http://www.senate.gov/general/committee_schedules/hearings.xml, respectively. To run the scraper:

./run committee_meetings --force --debug

This outputs two JSON files:

data/committee_meetings_house.json
data/committee_meetings_senate.json

The House-side scraper is very slow: each meeting is fetched from a separate file, and those requests tend to respond slowly.

Each file contains an array of committee meeting objects which look like:

  {
    "bill_ids": [
      "hr1897-113", 
      "hr1951-113", 
      "hres131-113"
    ], 
    "chamber": "house", 
    "committee": "HSFA", 
    "congress": 113, 
    "guid": "21f0bf90-d28c-420e-9fa2-2e4c327fe5ff", 
    "house_event_id": "100871", 
    "house_meeting_type": "HMKP", 
    "occurs_at": "2013-05-15T23:30:00", 
    "room": "RHOB 2172", 
    "subcommittee": "16", 
    "topic": "Markup of H.R. 1897, H.R. 1951, and H. Res. 131."
  }
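As a minimal sketch of consuming this output, the snippet below loads one of the files and groups its meetings by committee code. The helper names `load_meetings` and `meetings_by_committee` are hypothetical and not part of the scraper; the only assumption about the data is the layout described above (a JSON array of objects, each with a `committee` key).

```python
import json
from pathlib import Path


def load_meetings(path):
    """Return the list of meeting objects stored in one output file."""
    with Path(path).open(encoding="utf-8") as f:
        meetings = json.load(f)
    # The scraper output is described as an array; fail early otherwise.
    if not isinstance(meetings, list):
        raise ValueError(f"expected a JSON array in {path}")
    return meetings


def meetings_by_committee(meetings):
    """Group meeting objects by their committee code, e.g. "HSFA"."""
    grouped = {}
    for meeting in meetings:
        grouped.setdefault(meeting["committee"], []).append(meeting)
    return grouped
```

For example, `meetings_by_committee(load_meetings("data/committee_meetings_house.json"))` would return a dict keyed by committee code once the scraper has been run.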

The fields are:

  {
    "guid": "21f0bf90-d28c-420e-9fa2-2e4c327fe5ff"
  }

Each meeting is assigned a GUID. If you re-run the scraper without deleting the output JSON files, the GUIDs are preserved from run to run, so you can tell when meetings are added or revised. The House provides stable IDs for meetings; these IDs start at 100,031 for meetings since the launch of docs.house.gov. For Senate committee meetings, we preserve the GUID by a heuristic.
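Because GUIDs persist across runs, two snapshots of the same output file can be compared by GUID. The following is a sketch only: `diff_meetings` is a hypothetical helper, not something the scraper provides, and it assumes you have kept a copy of the earlier output yourself and loaded both versions as lists of meeting objects.

```python
def diff_meetings(old, new):
    """Compare two lists of meeting objects keyed by their "guid".

    Returns (added, removed, revised), where "revised" holds the new
    version of any meeting whose GUID is in both lists but whose
    contents differ.
    """
    old_by_guid = {m["guid"]: m for m in old}
    new_by_guid = {m["guid"]: m for m in new}

    added = [m for g, m in new_by_guid.items() if g not in old_by_guid]
    removed = [m for g, m in old_by_guid.items() if g not in new_by_guid]
    revised = [
        m for g, m in new_by_guid.items()
        if g in old_by_guid and old_by_guid[g] != m
    ]
    return added, removed, revised
```

Note that a meeting can appear under `removed` simply because it has already taken place and is no longer listed as upcoming.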

  {
    "chamber": "house", 
    "congress": 113, 
    "committee": "HSFA", 
    "subcommittee": "16"
  }

The committee holding the meeting is indicated by the committee attribute. If it is a subcommittee meeting, subcommittee will also be set with the integer-like identifier for the subcommittee. See https://github.com/unitedstates/congress-legislators/blob/master/committees-current.yaml for the key to these identifiers.

congress will be set to the number of the current Congress (see Bills).

chamber is house or senate. It is often redundant with the chamber of the committee. But if a joint committee holds a meeting, chamber will be set according to which data source the meeting was specified in.

  {
    "occurs_at": "2013-05-15T23:30:00", 
    "room": "RHOB 2172", 
    "topic": "Markup of H.R. 1897, H.R. 1951, and H. Res. 131."
  }

These fields give the date and time of the meeting in local time (occurs_at), the meeting's location (room), and the free-text topic of the meeting as provided by the House or Senate. Most meetings are held in one of the congressional office buildings in Washington, DC, so the time is usually Eastern Time. We don't try to normalize the room, but you'll often find these abbreviations:

  • SR: Senate Russell Office Building
  • SD: Senate Dirksen Office Building
  • SH: Senate Hart Office Building
  • CHOB: Cannon House Office Building
  • LHOB: Longworth House Office Building
  • RHOB: Rayburn House Office Building
  • CAPITOL: The Capitol Building (often with H or S indicating which side)
  • HVC/SVC: House or Senate side of the Capitol Visitor Center
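Since the room is not normalized, a consumer may want to expand these abbreviations itself. The sketch below is one way to do that; `BUILDINGS`, `building_for_room`, and `parse_occurs_at` are hypothetical names, and the matching rule (the first run of letters in the room string that is a known abbreviation) is an assumption about how the abbreviations appear, not something the data guarantees.

```python
import re
from datetime import datetime

# Abbreviations taken from the list above.
BUILDINGS = {
    "SR": "Senate Russell Office Building",
    "SD": "Senate Dirksen Office Building",
    "SH": "Senate Hart Office Building",
    "CHOB": "Cannon House Office Building",
    "LHOB": "Longworth House Office Building",
    "RHOB": "Rayburn House Office Building",
    "CAPITOL": "The Capitol Building",
    "HVC": "Capitol Visitor Center (House side)",
    "SVC": "Capitol Visitor Center (Senate side)",
}


def building_for_room(room):
    """Return the building name for a room string, or None if unknown."""
    # Assumption: the abbreviation is a standalone run of letters,
    # as in "RHOB 2172".
    for token in re.findall(r"[A-Za-z]+", room):
        name = BUILDINGS.get(token.upper())
        if name:
            return name
    return None


def parse_occurs_at(meeting):
    """Parse occurs_at into a naive datetime (local time, no UTC offset)."""
    return datetime.fromisoformat(meeting["occurs_at"])
```

The parsed datetime is deliberately left naive: the data carries no offset, and treating it as Eastern Time is only appropriate for meetings held in Washington, DC.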
  {
    "bill_ids": [
      "hr1897-113", 
      "hr1951-113", 
      "hres131-113"
    ] 
  }

Related bills are identified using the same ID format as in Bills. The House provides related-bill data directly. For Senate meetings, we scan the topic for named bills using a regular expression.
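To work with these IDs, it can help to split them into their parts. This sketch assumes the layout visible in the sample (lowercase bill type, bill number, a hyphen, then the Congress number); `parse_bill_id` is a hypothetical helper, and the pattern is not the regular expression the scraper itself uses on Senate topics.

```python
import re

# Assumed layout: <lowercase type><number>-<congress>, e.g. "hres131-113".
BILL_ID = re.compile(r"^([a-z]+)(\d+)-(\d+)$")


def parse_bill_id(bill_id):
    """Split a bill ID into (bill_type, number, congress)."""
    match = BILL_ID.match(bill_id)
    if match is None:
        raise ValueError(f"unrecognized bill ID: {bill_id!r}")
    bill_type, number, congress = match.groups()
    return bill_type, int(number), int(congress)
```

With the sample meeting above, `parse_bill_id("hres131-113")` yields the type `"hres"`, the number 131, and the 113th Congress.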

  {
    "house_event_id": "100871", 
    "house_meeting_type": "HMKP"
  }

For House meetings, we also pass through the meeting ID and meeting type code from the House. See their naming conventions.
