For this project's abstract/description, please see the Abstract segment of the README.
Luis G. (@curlyLasagna), Andreas P. (@Greek), Rayquan W. (@Factorial343)
To view the abstract, see Abstract.pdf.
To view our presentation, see IC25 Presentation.pdf.
The results of the processed FOIA data are available in the results.csv file.
All raw, unprocessed data belongs in the data/ directory. It is then processed by the src/to_csv.py, src/classification.py, and src/semantic_search.py scripts to produce results.
Here are the files required for processing:
- A CSV file of all FOIA requests: data/pii/data.csv (not included in repo)
- A list of all departments: data/departments.csv
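Before running any of the scripts, it can help to confirm that both input files are in place. The following is a minimal sketch, not part of the repository: the function name `missing_inputs` is hypothetical, and it assumes the paths listed above are relative to the repository root.

```python
from pathlib import Path

# Paths taken from the list of required files, relative to the repository root.
REQUIRED_FILES = ("data/pii/data.csv", "data/departments.csv")


def missing_inputs(root="."):
    """Return the required input files that do not exist under root."""
    base = Path(root)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All required input files are present.")
```

Run from the repository root, this prints which of the two files still need to be added.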
Some of the data contains personally identifiable information (PII). It is stored in the data/pii/ directory, which unfortunately also holds the actual unprocessed FOIA dataset.
Generative AI was heavily used throughout this project:
Platform | Google AI Studio (Gemini 2.0 Flash model) |
How it was used | Explaining concepts concisely. Generating a chart using Altair. Debugging errors. Determining where a keyword should go based on a department's name |
Learning points | Filtering dataframes. Applying functions to each row of a dataframe. What stop words are in the context of keyword extraction. Which libraries to use |
To use the Python programs from this project, you must:
- Have Python 3.13 installed
- Have the 'uv' package manager installed, available here: https://docs.astral.sh/uv/#installation
First, install all the required packages using uv:
$ uv sync # Installs the necessary packages
Then pull a copy of all FOIAs (or a subset) in CSV format, and save it as ./data/pii/data.csv.
Warning
Ensure the columns are in the following format: Request ID,Request Description
Warning
Ensure you're in the ROOT DIRECTORY of this repository when executing the script.
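The expected column format (Request ID,Request Description) can be checked before running the scripts. This is a minimal sketch rather than code from the repository: the helper name `has_expected_header` is hypothetical, and it assumes the CSV starts with a header row and is UTF-8 encoded (with or without a byte-order mark).

```python
import csv

# Column names as given in the warning above.
EXPECTED_COLUMNS = ["Request ID", "Request Description"]


def has_expected_header(csv_path):
    """Return True if the first row of the CSV matches EXPECTED_COLUMNS."""
    # "utf-8-sig" reads plain UTF-8 and also strips a leading BOM if present.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    return [column.strip() for column in header] == EXPECTED_COLUMNS


if __name__ == "__main__":
    try:
        ok = has_expected_header("data/pii/data.csv")
    except FileNotFoundError:
        # Also a hint that the script was not started from the repository root.
        print("data/pii/data.csv not found; run this from the repository root.")
    else:
        print("Header OK" if ok else "Unexpected header; expected: " + ",".join(EXPECTED_COLUMNS))
```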
To run the classification script, run the following:
$ python3 src/classification.py
The results will appear in the results.csv file.
To open semantic_search.py in the marimo editor, make sure a venv is created and activated, then run:
$ marimo edit semantic_search.py
To run a quick prototype of our search app, which returns the department that semantic search considers the best candidate, run:
$ streamlit run search_app.py
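The idea of returning the best-candidate department can be illustrated with a toy ranking. This sketch is not the code in search_app.py or semantic_search.py, whose internals are not described here. It swaps in plain word-count vectors and cosine similarity instead of a semantic embedding model, so it only matches on shared words. The department names and descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_department(query, departments):
    """Return the department whose description is most similar to the query."""
    query_vector = word_counts(query)
    return max(
        departments,
        key=lambda name: cosine_similarity(query_vector, word_counts(departments[name])),
    )


# Hypothetical departments and descriptions, for illustration only.
EXAMPLE_DEPARTMENTS = {
    "Police Department": "police incident report arrest body camera footage",
    "Public Works": "road repair pothole water sewer maintenance contract",
}

if __name__ == "__main__":
    print(best_department("body camera footage from an arrest", EXAMPLE_DEPARTMENTS))
```

A real semantic search would replace `word_counts` with an embedding model so that requests and departments can match even when they share no words.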