Committee Meetings
The committee_meetings scraper pulls upcoming House and Senate committee meetings from http://docs.house.gov/Committee and http://www.senate.gov/general/committee_schedules/hearings.xml, respectively. To run the scraper:
./run committee_meetings --force --debug
This outputs two JSON files:
data/committee_meetings_house.json
data/committee_meetings_senate.json
The House-side scraper is very slow: each meeting is requested from a separate file, and each of those requests seems to take a long time to respond.
Each file contains an array of committee meeting objects which look like:
{
"bill_ids": [
"hr1897-113",
"hr1951-113",
"hres131-113"
],
"chamber": "house",
"committee": "HSFA",
"congress": 113,
"guid": "21f0bf90-d28c-420e-9fa2-2e4c327fe5ff",
"house_event_id": "100871",
"house_meeting_type": "HMKP",
"occurs_at": "2013-05-15T23:30:00",
"room": "RHOB 2172",
"subcommittee": "16",
"topic": "Markup of H.R. 1897, H.R. 1951, and H. Res. 131."
}
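As an illustration of working with these files, here is a minimal Python sketch that reads one output file and filters it by committee code. The helper names load_meetings and meetings_for_committee are made up for this example and are not part of the scraper.

```python
import json
from pathlib import Path


def load_meetings(path):
    """Read one scraper output file and return its list of meeting dicts."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def meetings_for_committee(meetings, committee):
    """Keep only the meetings whose "committee" field equals the given code."""
    return [m for m in meetings if m.get("committee") == committee]
```

For example, meetings_for_committee(load_meetings("data/committee_meetings_house.json"), "HSFA") would return the meetings of the committee coded HSFA, assuming the scraper has already been run.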
The fields are:
{
"guid": "21f0bf90-d28c-420e-9fa2-2e4c327fe5ff"
}
Each meeting is assigned a GUID. If you re-run the scraper without deleting the output JSON files, the GUIDs are preserved from run to run, so you can tell when meetings are added or revised. The House provides stable IDs for meetings; these IDs start at 100,031 for meetings since the launch of docs.house.gov. For Senate committee meetings, we preserve the GUID by a heuristic.
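Because GUIDs persist across runs, two snapshots of the same output file can be compared by GUID. The sketch below assumes you saved a copy of the file before re-running the scraper; diff_meetings is a hypothetical helper, not something the scraper provides.

```python
def diff_meetings(previous, current):
    """Compare two lists of meeting dicts by their "guid" field.

    Returns (added, removed, revised), where revised holds the current
    version of every meeting whose GUID is in both lists but whose
    content differs.
    """
    prev = {m["guid"]: m for m in previous}
    curr = {m["guid"]: m for m in current}

    added = [curr[g] for g in curr if g not in prev]
    removed = [prev[g] for g in prev if g not in curr]
    revised = [curr[g] for g in curr if g in prev and curr[g] != prev[g]]
    return added, removed, revised
```

Keep in mind that Senate GUIDs are matched heuristically, so a heavily edited Senate meeting might show up as one removal plus one addition rather than as a revision.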
{
"chamber": "house",
"congress": 113,
"committee": "HSFA",
"subcommittee": "16"
}
The committee holding the meeting is indicated by the committee attribute. If it is a subcommittee meeting, subcommittee will also be set with the integer-like identifier for the subcommittee. See https://github.com/unitedstates/congress-legislators/blob/master/committees-current.yaml for the key to these identifiers.
congress will be set to the number of the current Congress (see Bills).
chamber is house or senate. It is often redundant with the chamber of the committee. But if a joint committee holds a meeting, chamber will be set according to which data source the meeting was specified in.
{
"occurs_at": "2013-05-15T23:30:00",
"room": "RHOB 2172",
"topic": "Markup of H.R. 1897, H.R. 1951, and H. Res. 131."
}
These fields give the date and time of the meeting in local time (occurs_at), the meeting's location (room), and the free-text topic of the meeting as provided by the House or Senate (topic). Most meetings are in one of the congressional office buildings in Washington, DC, which puts the time in Eastern Time. We don't try to normalize the room, but you'll often find these abbreviations:
- SR: Senate Russell Office Building
- SD: Senate Dirksen Office Building
- SH: Senate Hart Office Building
- CHOB: Cannon House Office Building
- LHOB: Longworth House Office Building
- RHOB: Rayburn House Office Building
- CAPITOL: The Capitol Building (often with H or S indicating which side)
- HVC/SVC: House or Senate side of the Capitol Visitor Center
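The following Python sketch shows one way a consumer might interpret these two fields. It assumes the meeting takes place in Washington, DC (not guaranteed, since occurs_at is local time wherever the meeting is held) and that the room string begins with one of the abbreviations above, as in "RHOB 2172". Both function names are hypothetical.

```python
import re
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumption: the meeting is in Washington, DC.
EASTERN = ZoneInfo("America/New_York")

BUILDINGS = {
    "SR": "Senate Russell Office Building",
    "SD": "Senate Dirksen Office Building",
    "SH": "Senate Hart Office Building",
    "CHOB": "Cannon House Office Building",
    "LHOB": "Longworth House Office Building",
    "RHOB": "Rayburn House Office Building",
    "CAPITOL": "The Capitol Building",
    "HVC": "House side of the Capitol Visitor Center",
    "SVC": "Senate side of the Capitol Visitor Center",
}


def occurs_at_eastern(meeting):
    """Parse the naive occurs_at string and attach the Eastern time zone."""
    naive = datetime.fromisoformat(meeting["occurs_at"])
    return naive.replace(tzinfo=EASTERN)


def building_for_room(room):
    """Return the building name for a room string, or None if unrecognized."""
    match = re.match(r"[A-Z]+", room.strip().upper())
    if match is None:
        return None
    return BUILDINGS.get(match.group(0))
```

Since rooms are not normalized, building_for_room returns None for anything that does not start with a known abbreviation rather than guessing.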
{
"bill_ids": [
"hr1897-113",
"hr1951-113",
"hres131-113"
]
}
Related bills are identified by the same ID format as in Bills. For Senate meetings, we scan the topic for named bills using a regular expression. The House provides related bills data.
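To illustrate the ID format and the idea of scanning a topic for bills, here is a rough Python sketch. The CITATION pattern is a simplified stand-in written for this example: it is not the regular expression the scraper uses, it only covers four citation styles (H.R., H. Res., S., S. Res.), and it can produce false positives. The function names are hypothetical.

```python
import re

# Bill IDs as shown above: type, number, a hyphen, then the Congress.
BILL_ID = re.compile(r"^([a-z]+)(\d+)-(\d+)$")

# Hypothetical, simplified citation pattern (not the scraper's own).
# "S. Res." is listed before "S." so the longer form is tried first.
CITATION = re.compile(r"\b(H\.\s?R\.|H\.\s?Res\.|S\.\s?Res\.|S\.)\s?(\d+)")


def parse_bill_id(bill_id):
    """Split an ID such as "hr1897-113" into (type, number, congress)."""
    match = BILL_ID.match(bill_id)
    if match is None:
        raise ValueError(f"unrecognized bill ID: {bill_id!r}")
    bill_type, number, congress = match.groups()
    return bill_type, int(number), int(congress)


def find_bill_ids(topic, congress):
    """Scan free-text topic for bill citations and build bill IDs."""
    ids = []
    for prefix, number in CITATION.findall(topic):
        # "H. Res." becomes "hres", "H.R." becomes "hr", and so on.
        bill_type = re.sub(r"[.\s]", "", prefix).lower()
        ids.append(f"{bill_type}{number}-{congress}")
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(ids))
```

Applied to the example topic above with congress set to 113, find_bill_ids yields the same three IDs that appear in the bill_ids example.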
{
"house_event_id": "100871",
"house_meeting_type": "HMKP"
}
For House meetings, we also pass through the meeting ID and meeting type code from the House. See their naming conventions.