Fixing T5 Fine-Tuning Bugs for Medical Diagnosis

This notebook documents the debugging process of fine-tuning T5-small for medical diagnosis tasks. What started as a simple text-to-text classification project turned into a deep dive into tokenizer handling, generation bugs, and training instabilities.

Problem

After training, the model only output "True" or repeated inputs. Despite a great loss curve, the outputs were useless.

Method

I approached this by:

Isolating tokenizer and data collator issues
Testing task prefix impact (critical for T5!)
Comparing fresh vs continued training
Tuning generation parameters (e.g. beam search, repetition penalty)

Results

Final loss: 0.009; accuracy on our small, hand-crafted test set was 100%, likely due to the limited dataset and clear-cut labels (e.g., Pneumonia, Myocardial infarction). This should not be interpreted as clinical performance. www.kaggle.com/code/kjacoby/debugging-guide-t5-fine-tuning-true-bug

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
LICENSE		LICENSE
README.md		README.md
ai_debugging_methodology_guide.md		ai_debugging_methodology_guide.md
llm_debugging_guide_de.md		llm_debugging_guide_de.md
llm_debugging_guide_en.md		llm_debugging_guide_en.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fixing T5 Fine-Tuning Bugs for Medical Diagnosis

Problem

Method

Results

About

Uh oh!

Releases

Packages

License

KatharinaJacoby/LLM-Fine-tuning-Debugging-Guide

Folders and files

Latest commit

History

Repository files navigation

Fixing T5 Fine-Tuning Bugs for Medical Diagnosis

Problem

Method

Results

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages