Workshop on Large Language Models at the International Association of Cancer Registries 2025 conference (IACR2025), Izmir, Turkey

IKNL/iacr_2025_llm_workshop

IACR 2025 LLM Workshop

Welcome to the online materials for the pre-conference LLM workshop of the International Association of Cancer Registries (IACR) 2025 conference in Izmir, Turkey (https://www.iacr2025.com/pre-conference-workshop), prepared by Dimitris Katsimpokis and Irene Cara. The workshop features hands-on sessions on text classification and Retrieval-Augmented Generation (RAG).

Workshop Overview

The workshop consists of two main sessions designed to provide practical experience with modern LLM techniques in healthcare and research contexts.

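| Session | Notebook |
|---|---|
| Session 1: Text classification | notebooks/text_classification/IACR_text_classification.ipynb (Open In Colab) |
| Session 2: Retrieval-Augmented Generation (RAG) | notebooks/retrieval_augmented_generation/RAG.ipynb (Open In Colab) |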

Repository Structure

├── data/                           # Workshop datasets
├── model/                          # Model files
├── notebooks/                      # Jupyter notebooks for workshop sessions
│   ├── retrieval_augmented_generation/
│   │   ├── RAG.ipynb               # RAG workshop notebook
│   │   └── helper_notebooks/       # Helper notebooks (e.g., data parsing)
│   └── text_classification/
│       └── IACR_text_classification.ipynb # Text classification workshop notebook
├── requirements.txt                # Python dependencies
├── LICENSE.md                      # MIT License
└── readme.md                       # This file

Running the Notebooks

Option 1: Google Colab (Preferred)

Click the "Open in Colab" badges above to run the notebooks in Google Colab with pre-configured environments.

Important note: a Google account is needed to run the notebooks in Colab.

Option 2: Local Jupyter

[TO BE ADDED LATER]

Session Details

Session 1: Text Classification

Learn how to use and fine-tune language models for text classification on healthcare data. The session covers:

  • TF-IDF vectorization to turn text into a numerical representation
  • An off-the-shelf (not fine-tuned) sentence embedding model to obtain contextual embeddings, combined with a random forest classifier
  • Few-shot fine-tuning of sentence embedding models with SetFit
  • Zero-shot classification with the Hugging Face pipeline
  • Evaluation of all the above approaches on WHO performance status classification
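To make the first and last bullets concrete, here is a minimal, dependency-free sketch of TF-IDF vectorization followed by a simple classifier and an evaluation score. It is an illustration under stated assumptions, not the workshop notebook's code: the TF-IDF weighting mirrors scikit-learn's `TfidfVectorizer` defaults (smoothed IDF, L2 normalization), the nearest-centroid classifier is a swapped-in stand-in for the models used in the session, macro-averaged F1 is an assumed choice of metric, and the helper names (`fit`, `predict`, `macro_f1`) and the toy notes and labels are hypothetical.

```python
import math
import re
from collections import Counter

# scikit-learn's default token pattern: words of two or more characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def normalise(vec):
    """Scale a sparse {term: weight} vector to unit L2 length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def fit_idf(docs):
    """Smoothed IDF, as in scikit-learn: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    """Raw term counts times IDF, L2-normalised; unseen terms are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    return normalise({t: c * idf[t] for t, c in counts.items()})


def fit(texts, labels):
    """Hypothetical nearest-centroid classifier on TF-IDF vectors."""
    idf = fit_idf(texts)
    sums = {}
    for text, label in zip(texts, labels):
        acc = sums.setdefault(label, Counter())
        for t, v in tfidf(text, idf).items():
            acc[t] += v
    centroids = {label: normalise(dict(acc)) for label, acc in sums.items()}
    return idf, centroids


def predict(text, idf, centroids):
    """Pick the label whose (unit-length) centroid has the highest cosine similarity."""
    vec = tfidf(text, idf)
    return max(
        centroids,
        key=lambda lab: sum(v * centroids[lab].get(t, 0.0) for t, v in vec.items()),
    )


def macro_f1(y_true, y_pred):
    """Unweighted mean over classes of F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented toy notes labelled with WHO performance status (not workshop data).
TRAIN_TEXTS = [
    "fully active, no restrictions",
    "fully active at work",
    "ambulatory, light work only",
    "restricted in strenuous activity, light housework",
]
TRAIN_LABELS = ["WHO 0", "WHO 0", "WHO 1", "WHO 1"]

if __name__ == "__main__":
    idf, centroids = fit(TRAIN_TEXTS, TRAIN_LABELS)
    preds = [predict(t, idf, centroids) for t in ["fully active", "light work"]]
    print(preds, macro_f1(["WHO 0", "WHO 1"], preds))
```

Macro-averaging weights every performance-status class equally, which matters when some classes are rare; in practice a library implementation (for example scikit-learn's `TfidfVectorizer` and `f1_score`) would replace these hand-written helpers.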

Session 2: Retrieval-Augmented Generation (RAG)

Explore RAG techniques for enhancing LLM responses with external knowledge. Topics include:

  • How to reduce hallucinations in LLM responses
  • Database retrieval mechanisms and similarity search
  • Context integration and prompt engineering
  • Evaluation of RAG on adverse event (CTCAE) and Cancer Incidence in Five Continents (CI5) data
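The retrieval and context-integration steps can be sketched without any external service. This is a minimal illustration under stated assumptions, not the notebook's pipeline: bag-of-words count vectors stand in for an embedding model, a Python list stands in for the vector database, and the `embed`, `retrieve` and `build_prompt` helpers, the prompt wording and the `PASSAGES` list (abridged paraphrases of general CTCAE grade descriptions) are hypothetical. The prompt restricts the model to the retrieved context and gives it an explicit "I don't know" answer, one common way to curb hallucinations.

```python
import math
import re
from collections import Counter

# Abridged paraphrases of general CTCAE grade descriptions (illustrative only).
PASSAGES = [
    "Grade 1: mild; asymptomatic or mild symptoms; intervention not indicated.",
    "Grade 3: severe or medically significant but not immediately life-threatening.",
    "Grade 5: death related to adverse event.",
]


def embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(v * b[t] for t, v in a.items())  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query (ties keep input order)."""
    index = [(p, embed(p)) for p in passages]  # toy in-memory "vector store"
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [p for p, _ in ranked[:k]]


def build_prompt(question, context):
    """Ground the model in retrieved text and give it an explicit way out."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, reply: I don't know.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    question = "What does grade 5 mean?"
    print(build_prompt(question, retrieve(question, PASSAGES)))
```

The resulting prompt would then be sent to an LLM; swapping `embed` for a real sentence embedding model and the list for a vector database changes the retrieval quality but not the overall flow.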

Data Sources

The workshop utilizes:

For local installation only (Option 2 above)

[TO BE ADDED LATER]

License

This project is licensed under the MIT License; see the LICENSE.md file for details.

Contributors

Developed by Dimitris Katsimpokis (d.katsimpokis@iknl.nl) and Irene Cara (i.cara@iknl.nl), both at the Netherlands Comprehensive Cancer Organisation (Integraal Kankercentrum Nederland, IKNL).

Support

For questions or issues during the workshop, please raise an issue in this repository or contact the workshop organizers.
