URL = http://www.ceo.kerala.gov.in/electoralrolls.html
Year = Draft Roll for 2018
The Script does 3 things:
-
Produces kerala.csv that contains metadata about the pdfs. The CSV has the following fields:
district, leg_assembly, booth_no, polling_station_name, language, file_name
-
Downloads all the pdfs to a directory called
kerala_pdfs/
-
Renames the pdfs as follows:
- lowercase, snake_case
- english language rolls have the prefix
eng
and Malayalam language rolls have the prefixmal
. - Second segment is 2 digit district code (01, 02, 03,...)
- Third segment is 3 digit legislative assembly code (001, 002, 003, ...)
- Fourth segment is a 3 digit polling_booth_code (001, 002, 003, ...)
So a sample name = eng_01_001_001.pdf
pip install -r requirements.txt
python kerala.py
Archives available from 2011. Script for scraping the archives here.