Official repository of the paper "Answering User Questions about Machine Learning Models through Standardized Model Cards"
Install the required packages:
pip install -r requirements.txt
We used Python 3.11 for this project.
To collect the list of models and their discussions from the Hugging Face Hub, run the following command from the `data_collector` directory:
python main.py
- The list of models will be saved in the `data/all_models.csv` file.
- Discussions, along with pull requests, will be saved inside the `data/discussions` directory. The directory structure is as follows:
├── data: all the data generated by the scripts is saved in this directory
│ ├── discussions: directory that stores all discussions and pull requests
│ │ ├── <model_id>: per-model directory holding the repository's discussions and pull requests. The `/` in the `model_id` is replaced with `@`. An empty directory means the repository has no discussions or pull requests.
│ │ │ ├── discussion_<discussion_number>.yaml: a discussion file containing the discussion details
│ │ │ ├── pull_request_<pull_request_number>.yaml: a pull request file containing the pull request details
- A list of the downloaded discussions will be created in the `data/all_discussions.csv` file.
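The naming convention in the directory tree above can be illustrated with a small, standard-library-only sketch. The helper names below are hypothetical (they are not functions from this repository); they rely only on the `/` to `@` substitution and the `discussion_<n>.yaml` / `pull_request_<n>.yaml` file names described above.

```python
from pathlib import Path


def model_dir_name(model_id: str) -> str:
    """Hypothetical helper: map a Hub model id to its directory name ('/' becomes '@')."""
    return model_id.replace("/", "@")


def thread_number(path: Path) -> int:
    """Hypothetical helper: extract <n> from discussion_<n>.yaml or pull_request_<n>.yaml."""
    return int(path.stem.rsplit("_", 1)[1])


def list_threads(model_id: str, root: Path = Path("data/discussions")) -> tuple[list[Path], list[Path]]:
    """Return (discussion files, pull request files) for one model, sorted numerically.

    Both lists are empty when the model directory is empty (or missing),
    i.e. the repository has no discussions or pull requests.
    """
    model_dir = root / model_dir_name(model_id)
    discussions = sorted(model_dir.glob("discussion_*.yaml"), key=thread_number)
    pull_requests = sorted(model_dir.glob("pull_request_*.yaml"), key=thread_number)
    return discussions, pull_requests
```

Sorting by the parsed number rather than by file name keeps `discussion_2.yaml` ahead of `discussion_10.yaml`.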
To select a sample of discussions for manual analysis, run the following command from the `data_analyzer` directory:
python random_discussion_selector.py
- A list of 378 randomly selected discussions will be created in the `data/all_random_discussions.csv` file.
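This README does not describe how `random_discussion_selector.py` draws its sample. A seeded simple random sample without replacement, as sketched below, is one way to make such a selection reproducible. The function names and the seed value are hypothetical, and reading the rows from `data/all_discussions.csv` is an assumption about the input.

```python
import csv
import random


def load_rows(csv_path: str) -> list[dict[str, str]]:
    """Read every row of a CSV file (e.g. data/all_discussions.csv) as a dict."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def select_random_rows(rows: list[dict], sample_size: int = 378, seed: int = 42) -> list[dict]:
    """Draw a simple random sample without replacement.

    The seed is a hypothetical choice; fixing it makes the sample reproducible.
    """
    if sample_size > len(rows):
        raise ValueError("sample_size exceeds the number of available rows")
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return rng.sample(rows, sample_size)
```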
To filter the random discussions in the `data/all_random_discussions.csv` file, run the following command from the `data_cleaner` directory:
python random_discussion_cleaner.py
- The filtered list of random discussions will be saved in the `data/cleaned_random_discussions.csv` file.
To filter the models and all the discussions, run the following command from the `data_cleaner` directory:
python main.py
- The filtered list of models will be saved in the `data/quality_models.csv` file.
- The discussion list of the filtered models will be saved in the `data/quality_models_discussions.csv` file.
- The filtered list of discussions will be saved in the `data/cleaned_discussions.csv` file.
To classify the filtered random discussion posts using `gpt-3.5-turbo-0125`, run the following command from the `discussion_classifier` directory:
python random_discussion_classifier.py
Please note that you need an OpenAI API key to run the classification. The key should be saved in the `OPENAI_API_KEY` variable of the `util/constants.py` file.
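Only the variable name `OPENAI_API_KEY` and the file `util/constants.py` come from this README. One option, assuming the scripts simply import that variable, is to populate it from an environment variable so the key is never committed to version control:

```python
# util/constants.py (sketch)
import os

# Assumption: read the key from the shell environment instead of hard-coding it.
# Falls back to an empty string, so API calls will fail until the variable is set.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
```

With this variant, set `OPENAI_API_KEY` in your shell before running the classifier scripts.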
- The classification will run 3 times, saving the results in the `data/random_discussion_classification` directory. The result generated by GPT for each discussion will be saved in an `md` file named in the format `<index>_<model_id>_<discussion_number>_result_gpt-3-5.md`. The results of the 3 runs will be saved in the `run_1`, `run_2`, and `run_3` directories.
- Classification results will also be saved in the `contains_question_run_<run_number>` columns of the `data/cleaned_random_discussions.csv` file.
- The final decision about the class will be saved in the `contains_question_final_class` column of the `data/cleaned_random_discussions.csv` file.
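This README does not state how the final class is derived from the three runs. A majority vote is one plausible rule for a binary label; the sketch below is an assumption, not the repository's implementation, and the label strings are hypothetical.

```python
from collections import Counter


def majority_class(run_labels: list[str]) -> str:
    """Hypothetical sketch: return the label chosen by a strict majority of runs.

    With 3 runs and a binary label (question / no question),
    a strict majority always exists.
    """
    label, votes = Counter(run_labels).most_common(1)[0]
    if votes * 2 <= len(run_labels):
        raise ValueError("no strict majority among the runs")
    return label
```

For example, `majority_class(["yes", "no", "yes"])` returns `"yes"`.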
Two authors individually and manually identified whether the sample discussions contain questions. The ground truth is available in the `data/gpt_sample_discussion_classification.xlsx` file. `1st_author_classes` and `2nd_author_classes` contain the classes assigned by the two authors, and `agreed_classes` contains their agreed classes. Their agreement is calculated using Cohen's Kappa and saved in the `cohens_kappa` sheet. The disagreement resolution is saved in the `disagreement_resolution` sheet.
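For reference, Cohen's kappa compares the observed agreement p_o with the agreement expected by chance p_e: kappa = (p_o - p_e) / (1 - p_e). The standard-library function below is a generic sketch of that formula, not the code used to fill the `cohens_kappa` sheet; `sklearn.metrics.cohen_kappa_score` computes the same statistic.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items on which the raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability both raters pick the same label independently.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined,
        # so this sketch reports perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For example, with labels `["y", "y", "n", "n"]` and `["y", "n", "n", "n"]`, p_o = 0.75 and p_e = 0.5, giving kappa = 0.5.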
The performance evaluation of GPT in classifying the sample discussion posts as question-containing posts is available in the `gpt_classification_evaluation` sheet of the `data/gpt_sample_discussion_classification.xlsx` file.
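The metrics reported in the `gpt_classification_evaluation` sheet are not listed here. Assuming standard binary-classification metrics, with "contains a question" as the positive class and the agreed classes as ground truth, they can be computed as in this sketch (the function name is hypothetical):

```python
def binary_metrics(truth: list[bool], predicted: list[bool]) -> dict[str, float]:
    """Hypothetical sketch: compare GPT predictions with the agreed ground truth.

    True means 'the post contains a question' (the positive class).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predictions must have the same length")
    tp = sum(t and p for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))
    tn = len(truth) - tp - fp - fn
    # Guard each ratio against division by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(truth) if truth else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```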
To classify all the filtered discussion posts using `gpt-3.5-turbo-0125`, run the following command from the `discussion_classifier` directory:
python all_discussion_classifier.py
Please note that you need an OpenAI API key to run the classification. The key should be saved in the `OPENAI_API_KEY` variable of the `util/constants.py` file.
- The classification will run 3 times, saving the results in the `data/all_discussion_classification` directory. The result generated by GPT for each discussion will be saved in an `md` file named in the format `<index>_<model_id>_<discussion_number>_result_gpt-3-5.md`. The results of the 3 runs will be saved in the `run_1`, `run_2`, and `tie_breakers` directories.
- Classification results will also be saved in the `contains_question_run_1`, `contains_question_run_2`, and `contains_question_tie_breaker` columns of the `data/cleaned_discussions.csv` file, respectively.
- The final decision about the class will be saved in the `contains_question_final_class` column of the `data/cleaned_discussions.csv` file.
- The list of question-containing discussions will be saved in the `data/all_questions.csv` file.
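The directory and column names suggest that the third run acts as a tie breaker. Assuming it is consulted only when the first two runs disagree (an interpretation, not documented behavior), the decision rule could look like this sketch with hypothetical label strings:

```python
from typing import Optional


def final_class(run_1: str, run_2: str, tie_breaker: Optional[str] = None) -> str:
    """Hypothetical sketch: accept the label when two runs agree, else use the tie breaker."""
    if run_1 == run_2:
        return run_1
    if tie_breaker is None:
        raise ValueError("runs disagree and no tie-breaker result is available")
    return tie_breaker
```

For a binary label, this yields the same answer as a 3-way majority vote while needing the third API call only for disagreements.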
To generate all the plots, run the following command from the `plot_generator` directory:
python main.py
- Plots will be generated in the `data/plots` directory in `pdf` and `png` formats.
To train a BERTopic model on the discussion posts, first run the following command from the repository root:
python -m spacy download en_core_web_sm
Then run the following command from the `discussion_topic_modeller` directory:
python bertopic_topic_modeller.py
- The trained BERTopic model file `model_min_cluster_size_60` will be saved in `data/bertopic_model`.
- Our trained model is available here.
To save the representative documents and keywords for each topic, run the following command from the `discussion_topic_modeller` directory:
python topic_analyzer.py
- Representative documents and keywords of the topics will be saved in `data/bertopic_model/topics/<topic_id>.md` files.
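The layout of each `<topic_id>.md` file is not specified here; the sketch below shows one possible layout, with hypothetical headings and function names. With a trained BERTopic model, the inputs could come from `topic_model.get_topic(topic_id)` (a list of `(word, score)` pairs) and `topic_model.get_representative_docs(topic_id)`; that wiring is an assumption and is left out so the sketch runs with the standard library alone.

```python
from pathlib import Path


def render_topic_markdown(topic_id: int, keywords: list[str], documents: list[str]) -> str:
    """Hypothetical sketch: build the Markdown text for one topic file."""
    lines = [f"# Topic {topic_id}", "", "## Keywords", "", ", ".join(keywords), "",
             "## Representative documents", ""]
    lines += [f"- {doc}" for doc in documents]
    return "\n".join(lines) + "\n"


def write_topic_file(out_dir: Path, topic_id: int, keywords: list[str], documents: list[str]) -> Path:
    """Write the topic to <out_dir>/<topic_id>.md, creating the directory if needed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{topic_id}.md"
    path.write_text(render_topic_markdown(topic_id, keywords, documents), encoding="utf-8")
    return path
```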
To visualize the topics, run the `discussion_topic_modeller/bertopic_topic_visualizer.ipynb` notebook.
To visualize the clusters of the topics, run the following command from the `discussion_topic_modeller` directory:
python topic_cluster_visualizer.py
- Topic IDs belonging to the same cluster will be printed in the console.
- The cluster visualization will be saved in the `data/bertopic_model/model_min_cluster_size_60_hierarchy_plot.pdf` file. The GPT-generated labels for the topics are used in this visualization.
- A cluster visualization with our own labels will be generated in the `data/bertopic_model/custom_label_hierarchy_plot.pdf` file. The labels are available in the `data/bertopic_model/topic_custom_label.csv` file.
Two authors individually and manually mapped the questions to the model cards. The mapping results are available in the `data/manual_question_mapping.xlsx` file. The 1st and 2nd authors' mapping results are saved in the `author1_labels` and `author2_labels` sheets, respectively. The disagreement resolution is saved in the `resolution` column of the `author1_labels` sheet. To calculate the inter-rater agreement, run the following command from the `data_analyzer` directory:
python irr_calculator.py
- The Kappa scores of the 2 rounds of mapping will be printed in the console.