RAFT Enhancements: Improved robustness, logging, checkpointing, threading, Llama support, Azure auth and eval #604

Merged
81 commits merged into ShishirPatil:main on Aug 27, 2024

Conversation

@cedricvidal (Contributor) commented Aug 25, 2024

This pull request introduces a comprehensive set of updates and improvements to the RAFT project, enhancing robustness, logging, progress monitoring, checkpointing, multi-threading, Llama support, Azure authentication, and evaluation processes.

Note: These updates were developed for the most part to prepare the MS Build 2024 talk Practicalities of Fine-Tuning Llama 2 with AI Studio with @ShishirPatil and Bala Venkataraman.

Key updates include:

RAFT Script Improvements:

This PR introduces significant updates to the raft.py script, expanding its functionality, improving its configurability, and removing deprecated options. Below is a summary of the key changes:

  • Logging Enhancements: Improved logging configuration, including more granular logging for various operations.
  • Checkpointing Overhaul: Significant refactoring of checkpointing logic in raft.py, including the introduction of multi-threading, better directory handling, and optimization of chunk processing. The --fast mode, which deactivated checkpointing, was removed in favor of a more efficient implementation that allows checkpointing to remain activated at all times. A rough sketch of how checkpointing and worker threads combine follows this list.
  • Multi-Worker Support: Added a --workers parameter to enable parallel processing, improving efficiency and reliability during various operations.
  • Llama Instruction Support: Added support for Llama instructions in addition to GPT instructions, enhancing the versatility of the script for different model types.
  • Dataset Processing: Added more robust handling and filtering of datasets, including support for customized field names, empty row filtering, and threshold-based early stopping.
  • Authentication Updates: Added support for Azure OpenAI Keyless and Managed Identity authentication, along with related environment variable handling.
  • Content Safety Handling: Updated the content generation process to skip chunks that fail content safety compliance checks, allowing the process to continue without interruption.
  • Progress Logging Enhancements: Improved progress logging with tqdm, including enhanced stats support in client_utils.py, providing better insights into the process flow.
  • Bug Fixes and Cleanup: Fixed various bugs across the project, cleaned up help messages, and removed outdated or redundant components.
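
As a rough sketch of how the checkpointing and worker-thread pieces can fit together (the helper names, checkpoint layout, and error handling below are illustrative assumptions, not the actual raft.py implementation):

```python
# Illustrative sketch only; raft.py's real checkpointing and threading differ in detail.
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

from tqdm import tqdm


def build_qa_samples(chunk: str) -> list:
    """Placeholder for the per-chunk QA generation call."""
    return [{"question": "...", "answer": "...", "context": chunk}]


def run(chunks: list, checkpoint_dir: Path, workers: int = 4) -> list:
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    samples = []
    # Skip chunks that already have a checkpoint file so a restart resumes where it stopped.
    pending = {i: c for i, c in enumerate(chunks)
               if not (checkpoint_dir / f"chunk_{i}.json").exists()}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(build_qa_samples, c): i for i, c in pending.items()}
        for future in tqdm(as_completed(futures), total=len(futures), desc="chunks"):
            i = futures[future]
            try:
                result = future.result()
            except Exception as exc:  # e.g. a chunk rejected by content safety checks
                tqdm.write(f"chunk {i} skipped: {exc}")
                continue
            (checkpoint_dir / f"chunk_{i}.json").write_text(json.dumps(result))
            samples.extend(result)
    return samples
```

Because every finished chunk is persisted immediately, a rerun with the same checkpoint directory resumes instead of starting over, which is why a separate --fast mode is no longer needed.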

New Features and Options

  1. Output Format Expansion:

    • Added a new output format option: eval. This format is intended for evaluation purposes, providing an additional way to format datasets.
  2. Enhanced Output Configuration:

    • Introduced --output-completion-prompt-column and --output-completion-completion-column options to allow users to specify custom column names for prompts and completions when using the completion format.
  3. System Prompt Customization:

    • Added the --system-prompt-key option to allow users to select between different system prompt keys (gpt or llama) based on the model they intend to use for dataset generation.
  4. Worker Thread Management:

    • Introduced the --workers option to allow parallel processing by specifying the number of worker threads, improving the script’s efficiency in handling large datasets.
  5. Checkpoint Management:

    • Added the --auto-clean-checkpoints option, giving users the ability to automatically clean up checkpoints after dataset generation, reducing the need for manual intervention.
  6. Question/Answer Sample Threshold:

    • Introduced the --qa-threshold option, which allows users to specify a threshold for the number of Question/Answer samples to generate before stopping. This provides more control over the dataset generation process, particularly in large-scale operations (see the sketch after this list).
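
As a rough sketch of how a --qa-threshold early stop could interact with worker threads (the class and method names are assumptions for illustration, not the actual raft.py code):

```python
# Illustrative sketch; the real threshold handling in raft.py may differ.
import threading
from typing import Optional


class QAThreshold:
    """Thread-safe sample counter used to stop once enough QA pairs exist."""

    def __init__(self, threshold: Optional[int]):
        self.threshold = threshold
        self._count = 0
        self._lock = threading.Lock()

    def add(self, n: int) -> None:
        """Record n newly generated QA samples."""
        with self._lock:
            self._count += n

    def reached(self) -> bool:
        """True once the configured threshold (if any) has been met."""
        with self._lock:
            return self.threshold is not None and self._count >= self.threshold
```

In a loop like the checkpointing sketch above, each completed chunk would call add(len(result)), and the dispatcher would stop submitting new chunks once reached() returns True.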

Removed Options

  1. --fast:
    • The --fast option has been removed. This option was previously used to run the script in a fast mode with no recovery implemented. The script has been optimized to improve performance without the need for a separate fast mode, rendering this option obsolete.

Default Value Updates

  • Several options now have default values set, including --output-type, --output-format, --doctype, --embedding_model, --completion_model, --workers, and more. These defaults aim to make the script more user-friendly by reducing the need for extensive configuration.
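
For readers unfamiliar with how argparse defaults reduce the number of required flags, here is a minimal sketch (the default values shown are placeholders, not the actual raft.py defaults):

```python
# Illustrative argparse sketch; the defaults below are placeholders, NOT the real raft.py defaults.
import argparse

parser = argparse.ArgumentParser(description="sketch of defaulted options")
parser.add_argument("--output-type", default="jsonl")     # placeholder default
parser.add_argument("--output-format", default="hf")      # placeholder default
parser.add_argument("--doctype", default="pdf")           # placeholder default
parser.add_argument("--workers", type=int, default=1)     # placeholder default
args = parser.parse_args([])  # with no CLI args, every option falls back to its default
```

Only options that differ from their defaults then need to be passed on the command line.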

Evaluation Script Improvements:

  • Stop Keyword: Added stop-keyword functionality to allow controlled early termination of evaluation processes when specific conditions are met.
  • Retry Mechanism: Introduced a retry mechanism for failed tasks, improving reliability during evaluations (a rough sketch follows this list).
  • Improved Robustness: Enhanced the script’s robustness, particularly in handling errors and edge cases, ensuring a smoother evaluation process.
  • Logging Retry Statistics: Implemented logging for retry attempts, providing detailed insights and transparency into the evaluation process.
  • Main Thread Exception Handling: Fixed an issue where exceptions in the main thread could cause silent failures, ensuring that all errors are properly reported and handled.
  • Support for Chat and Completion Models: Extended the script to support both chat and completion models, increasing its versatility across different use cases.
  • Environment Prefix Handling: Enabled the script to accept an environment prefix as a parameter, enhancing its adaptability to different deployment environments.
  • Progress Monitoring: Integrated progress monitoring with tqdm, allowing for real-time tracking of the evaluation process.
  • Configurable Workers: Made the number of workers configurable using the --workers option, allowing for fine-tuned parallel processing during evaluations.
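
As a rough sketch of the retry-with-logging behaviour described above (the backoff policy and function name are assumptions, not the actual eval.py code):

```python
# Illustrative sketch; eval.py's real retry logic and statistics may differ.
import logging
import time

logger = logging.getLogger("eval")


def with_retries(task, attempts: int = 3, backoff_s: float = 2.0):
    """Run task(), retrying on failure and logging how many retries were needed."""
    for attempt in range(1, attempts + 1):
        try:
            result = task()
            if attempt > 1:
                logger.info("task succeeded after %d retries", attempt - 1)
            return result
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # surface the error instead of failing silently
            time.sleep(backoff_s * attempt)
```

Re-raising on the final attempt, rather than swallowing the exception, is what keeps failures visible instead of silent.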


Enhanced CLI Options for eval.py

This PR introduces several new command-line options to the eval.py script, providing enhanced functionality and flexibility for model evaluation. The following changes have been made:

  • --model MODEL: Added support for specifying the model to be evaluated.
  • --mode MODE: Introduced a new option to select the API mode, either 'chat' or 'completion'. The default mode is set to 'chat'.
  • --input-prompt-key INPUT_PROMPT_KEY: Added the ability to define which column in the dataset should be used as the input prompt.
  • --output-answer-key OUTPUT_ANSWER_KEY: Added the ability to define which column in the dataset should be used as the output answer.
  • --workers WORKERS: Introduced multi-threading support, allowing users to specify the number of worker threads for evaluating the dataset, improving processing efficiency.
  • --env-prefix ENV_PREFIX: Added an option to customize the prefix for environment variables used for API keys and base URLs. The default prefix is set to EVAL (see the sketch below).

These enhancements provide greater control over the evaluation process, allowing for more customized and efficient use of the eval.py script.
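
As an illustration of how a prefix such as EVAL can be resolved into client settings (the exact environment variable names below are assumptions, not necessarily the ones eval.py reads):

```python
# Illustrative sketch; the real eval.py may read differently named variables.
import os


def build_settings(env_prefix: str = "EVAL", mode: str = "chat") -> dict:
    """Resolve API settings from prefixed environment variables, e.g. EVAL_API_KEY."""
    def getenv(name):
        return os.environ.get(f"{env_prefix}_{name}")

    return {
        "api_key": getenv("API_KEY"),
        "base_url": getenv("BASE_URL"),
        # --mode selects whether the chat or the completion API surface is called.
        "use_chat_api": mode == "chat",
    }
```

Pointing --env-prefix at a different prefix lets the same script target another deployment without changing any other configuration.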

Testing

pytest

@cedricvidal changed the title from "Comprehensive RAFT Enhancements: Improved robustness, logging, progress, checkpointing, multi-threading, Llama support, Azure authentication and evaluation refinements" to "RAFT Enhancements: Improved robustness, logging, progress, checkpointing, multi-threading, Llama support, Azure auth and eval" on Aug 25, 2024
@cedricvidal changed the title from "RAFT Enhancements: Improved robustness, logging, progress, checkpointing, multi-threading, Llama support, Azure auth and eval" to "RAFT Enhancements: Improved robustness, logging, checkpointing, threading, Llama support, Azure auth and eval" on Aug 25, 2024
With 30 chunks, if the process fails at chunk 29 (0-indexed), then the checkpoint only saves 29 // 15 * 15 = 15 chunks instead of 30.

Moreover, there is no need to save checkpoint state for every chunk; it is only needed when the dataset is actually saved, which also makes the code easier to understand.
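
For illustration, the floor-to-batch behaviour described above looks like this (the batch size of 15 comes from the comment; the variable names are made up):

```python
# Illustrative only: floor division drops the partially filled last batch.
BATCH = 15
last_completed_chunk = 29                      # 0-indexed, 30 chunks in total
saved = last_completed_chunk // BATCH * BATCH
print(saved)                                   # 15 -> chunks 15..29 would be redone on restart
```
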
because different systems have different expectations.
To clarify that this is the ground-truth gold answer, as opposed to the answer generated by the fine-tuned model.
- makes collecting checkpoints less expensive
- makes deleting checkpoints a single line call
@cedricvidal force-pushed the upstream-merge-prep branch 2 times, most recently from 5a09a0a to 44aa242 on August 25, 2024 at 19:37
@ShishirPatil merged commit fa3bf8c into ShishirPatil:main on Aug 27, 2024
ShishirPatil added a commit that referenced this pull request Aug 27, 2024
PR #605 and #604 had conflicting requirements.txt. This should fix it.
nkcheng255 pushed a commit to nkcheng255/gorilla that referenced this pull request Aug 30, 2024