In this project, we generate summaries and category tags for Massachusetts bills for the MAPLE platform. The goal is to simplify the legal language and content so that it is comprehensible to a broader audience (9th-grade comprehension level), exploring different ML and LLM services along the way.
This repository contains the full pipeline: taking bills from the Massachusetts legislature, generating summaries and category tags using different Massachusetts General Law sections as context, creating a dashboard to display and save the generated text, and deploying and integrating the result into the MAPLE platform.
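One rough way to check a generated summary against the 9th-grade target is a readability formula. The sketch below uses the Flesch-Kincaid grade level with a simple vowel-group syllable heuristic. It is an illustration only: the function names are hypothetical, and this README does not state that the project measures readability this way.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" usually does not add a syllable ("promulgate").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def meets_target(text: str, target_grade: float = 9.0) -> bool:
    """True when the estimated grade level is at or below the target (assumed 9.0)."""
    return flesch_kincaid_grade(text) <= target_grade
```

The syllable heuristic is approximate, so the score is best read as a coarse signal for comparing drafts rather than an exact grade.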
- Documentation:
  - Research.md: our research on large language models and the evaluation methods we planned to use for this project.
  - Documentation MAPLE.pdf: a detailed description of how our model operates, for future use and improvement.
- EDA: the notebook eda.ipynb covers scraping bill data from the MAPLE Swagger API, building a dataframe to clean and process the data, and making visualizations to analyze the data and explore the characteristics of the dataset.
- demoapp:
  - app.py: contains the code for the LLM service we used and the web app we built with Streamlit. The web app lets users search across all bills.
  - app2.py: tests on the top 12 bills from the MAPLE website. We extract information from the Massachusetts General Law to add context to the summaries of these bills.
  - Other files: helper files imported by the two Python app files above.
- Prompts Engineering: prompts.md stores all the prompts we tested.
- Tagging: contains the list of categories and tags.
- Deployment: contains the link to our deployed Streamlit web app.
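The idea behind app2.py, adding Massachusetts General Law context to a bill before asking for a summary, can be sketched as a small prompt-assembly helper. Everything here is an assumption made for illustration: the `LawSection` structure, the function name, the character budget, and the prompt wording are hypothetical and are not the prompts stored in prompts.md.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LawSection:
    """One Massachusetts General Law section (hypothetical structure)."""
    chapter: str
    section: str
    text: str


def build_summary_prompt(
    bill_title: str,
    bill_text: str,
    sections: List[LawSection],
    max_context_chars: int = 2000,  # assumed budget, keeps the prompt short
) -> str:
    """Combine a bill with related law sections into one summarization prompt."""
    context_parts = []
    used = 0
    for s in sections:
        entry = f"Chapter {s.chapter}, Section {s.section}: {s.text.strip()}"
        # Stop adding context once the character budget would be exceeded.
        if used + len(entry) > max_context_chars:
            break
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts) if context_parts else "(none included)"
    return (
        "Summarize the following Massachusetts bill in plain language "
        "that a 9th-grade reader can understand.\n\n"
        f"Bill title: {bill_title.strip()}\n\n"
        f"Related Massachusetts General Law sections:\n{context}\n\n"
        f"Bill text:\n{bill_text.strip()}\n\n"
        "Summary:"
    )
```

The returned string can be passed to whichever LLM service is in use. Keeping the context under a fixed budget is one simple way to avoid crowding out the bill text itself.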
The dataset used for this project is fully open source and can be accessed through the Mass General Laws API.
Our team and MAPLE have agreed to attach a disclaimer stating that this text is AI-generated.
Although we use open-source transformers to evaluate hallucination with Vectara, expert and human evaluation remains important for maintaining a trustworthy LLM system.
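Automated scoring and human review can be combined by routing low-scoring summaries to a reviewer. The sketch below assumes a scorer that returns a consistency score between 0 and 1 for each (source, summary) pair, where higher means better supported, which is the convention we understand the Vectara model to follow. The function names and the 0.5 threshold are hypothetical, and the word-overlap scorer is only a stand-in so the example runs without downloading a model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (source bill text, generated summary)
Scorer = Callable[[Sequence[Pair]], Sequence[float]]


def word_overlap_scorer(pairs: Sequence[Pair]) -> List[float]:
    """Stand-in scorer: fraction of summary words that appear in the source.

    This is NOT the Vectara model; a real scorer would wrap that model instead.
    """
    scores = []
    for source, summary in pairs:
        source_words = set(source.lower().split())
        summary_words = summary.lower().split()
        if not summary_words:
            scores.append(0.0)
            continue
        supported = sum(1 for w in summary_words if w in source_words)
        scores.append(supported / len(summary_words))
    return scores


def flag_for_review(
    pairs: Sequence[Pair],
    score_pairs: Scorer,
    threshold: float = 0.5,  # assumed cut-off, tune against human judgments
) -> List[Dict[str, object]]:
    """Score each pair and mark summaries below the threshold for human review."""
    scores = score_pairs(pairs)
    if len(scores) != len(pairs):
        raise ValueError("scorer must return one score per pair")
    return [
        {"summary": summary, "score": float(score), "needs_review": score < threshold}
        for (_, summary), score in zip(pairs, scores)
    ]
```

Passing the scorer in as a parameter keeps the review logic independent of any one model, so the stand-in can be swapped for a model-backed scorer without changing the rest of the code.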
- https://huggingface.co/docs/transformers/tasks/summarization
- https://huggingface.co/vectara/hallucination_evaluation_model
- https://github.com/vectara/hallucination-leaderboard
- https://www.nocode.ai/llms-undesirable-outputs/
- https://learn.deeplearning.ai/
- https://blog.langchain.dev/espilla-x-langchain-retrieval-augmented-generation-rag-in-llm-powered-question-answering-pipelines/
Vy Nguyen - Email: nptv1207@bu.edu
Andy Yang - Email: ayang903@bu.edu
Gauri Bhandarwar - Email: gaurib3@bu.edu
Weining Mai - Email: weimai@bu.edu