RAFT Enhancements: Improved robustness, logging, checkpointing, threading, Llama support, Azure auth and eval #604

Merged
81 commits merged into ShishirPatil:main on Aug 27, 2024

Conversation

@cedricvidal (Contributor) commented Aug 25, 2024

This pull request introduces a comprehensive set of updates and improvements to the RAFT project, enhancing robustness, logging, progress monitoring, checkpointing, multi-threading, Llama support, Azure authentication, and evaluation processes.

Note: These updates were developed for the most part to prepare the MS Build 2024 talk Practicalities of Fine-Tuning Llama 2 with AI Studio with @ShishirPatil and Bala Venkataraman.

Key updates include:

RAFT Script Improvements:

This PR introduces significant updates to the raft.py script, expanding its functionality, improving its configurability, and removing deprecated options. Below is a summary of the key changes:

  • Logging Enhancements: Improved logging configuration, including more granular logging for various operations.
  • Checkpointing Overhaul: Significant refactoring of checkpointing logic in raft.py, including the introduction of multi-threading, better directory handling, and optimization of chunk processing. The --fast mode, which deactivated checkpointing, was removed in favor of a more efficient implementation that allows checkpointing to remain activated at all times. A rough sketch of how checkpointing and worker threads combine follows this list.
  • Multi-Worker Support: Added a --workers parameter to enable parallel processing, improving efficiency and reliability during various operations.
  • Llama Instruction Support: Added support for Llama instructions in addition to GPT instructions, enhancing the versatility of the script for different model types.
  • Dataset Processing: Added more robust handling and filtering of datasets, including support for customized field names, empty row filtering, and threshold-based early stopping.
  • Authentication Updates: Added support for Azure OpenAI Keyless and Managed Identity authentication, along with related environment variable handling.
  • Content Safety Handling: Updated the content generation process to skip chunks that fail content safety compliance checks, allowing the process to continue without interruption.
  • Progress Logging Enhancements: Improved progress logging with tqdm, including enhanced stats support in client_utils.py, providing better insights into the process flow.
  • Bug Fixes and Cleanup: Fixed various bugs across the project, cleaned up help messages, and removed outdated or redundant components.
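
As a rough sketch of how the checkpointing and worker-thread pieces can fit together (the helper names, checkpoint layout, and error handling below are illustrative assumptions, not the actual raft.py implementation):

```python
# Illustrative sketch only; raft.py's real checkpointing and threading differ in detail.
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

from tqdm import tqdm


def build_qa_samples(chunk: str) -> list:
    """Placeholder for the per-chunk QA generation call."""
    return [{"question": "...", "answer": "...", "context": chunk}]


def run(chunks: list, checkpoint_dir: Path, workers: int = 4) -> list:
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    samples = []
    # Skip chunks that already have a checkpoint file so a restart resumes where it stopped.
    pending = {i: c for i, c in enumerate(chunks)
               if not (checkpoint_dir / f"chunk_{i}.json").exists()}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(build_qa_samples, c): i for i, c in pending.items()}
        for future in tqdm(as_completed(futures), total=len(futures), desc="chunks"):
            i = futures[future]
            try:
                result = future.result()
            except Exception as exc:  # e.g. a chunk rejected by content safety checks
                tqdm.write(f"chunk {i} skipped: {exc}")
                continue
            (checkpoint_dir / f"chunk_{i}.json").write_text(json.dumps(result))
            samples.extend(result)
    return samples
```

Because every finished chunk is persisted immediately, a rerun with the same checkpoint directory resumes instead of starting over, which is why a separate --fast mode is no longer needed.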

New Features and Options

  1. Output Format Expansion:

    • Added a new output format option: eval. This format is intended for evaluation purposes, providing an additional way to format datasets.
  2. Enhanced Output Configuration:

    • Introduced --output-completion-prompt-column and --output-completion-completion-column options to allow users to specify custom column names for prompts and completions when using the completion format.
  3. System Prompt Customization:

    • Added the --system-prompt-key option to allow users to select between different system prompt keys (gpt or llama) based on the model they intend to use for dataset generation.
  4. Worker Thread Management:

    • Introduced the --workers option to allow parallel processing by specifying the number of worker threads, improving the script’s efficiency in handling large datasets.
  5. Checkpoint Management:

    • Added the --auto-clean-checkpoints option, giving users the ability to automatically clean up checkpoints after dataset generation, reducing the need for manual intervention.
  6. Question/Answer Sample Threshold:

    • Introduced the --qa-threshold option, which allows users to specify a threshold for the number of Question/Answer samples to generate before stopping. This provides more control over the dataset generation process, particularly in large-scale operations (see the sketch after this list).
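
As a rough sketch of how a --qa-threshold early stop could interact with worker threads (the class and method names are assumptions for illustration, not the actual raft.py code):

```python
# Illustrative sketch; the real threshold handling in raft.py may differ.
import threading
from typing import Optional


class QAThreshold:
    """Thread-safe sample counter used to stop once enough QA pairs exist."""

    def __init__(self, threshold: Optional[int]):
        self.threshold = threshold
        self._count = 0
        self._lock = threading.Lock()

    def add(self, n: int) -> None:
        """Record n newly generated QA samples."""
        with self._lock:
            self._count += n

    def reached(self) -> bool:
        """True once the configured threshold (if any) has been met."""
        with self._lock:
            return self.threshold is not None and self._count >= self.threshold
```

In a loop like the checkpointing sketch above, each completed chunk would call add(len(result)), and the dispatcher would stop submitting new chunks once reached() returns True.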

Removed Options

  1. --fast:
    • The --fast option has been removed. This option was previously used to run the script in a fast mode with no recovery implemented. The script has been optimized to improve performance without the need for a separate fast mode, rendering this option obsolete.

Default Value Updates

  • Several options now have default values set, including --output-type, --output-format, --doctype, --embedding_model, --completion_model, --workers, and more. These defaults aim to make the script more user-friendly by reducing the need for extensive configuration.
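
For readers unfamiliar with how argparse defaults reduce the number of required flags, here is a minimal sketch (the default values shown are placeholders, not the actual raft.py defaults):

```python
# Illustrative argparse sketch; the defaults below are placeholders, NOT the real raft.py defaults.
import argparse

parser = argparse.ArgumentParser(description="sketch of defaulted options")
parser.add_argument("--output-type", default="jsonl")     # placeholder default
parser.add_argument("--output-format", default="hf")      # placeholder default
parser.add_argument("--doctype", default="pdf")           # placeholder default
parser.add_argument("--workers", type=int, default=1)     # placeholder default
args = parser.parse_args([])  # with no CLI args, every option falls back to its default
```

Only options that differ from their defaults then need to be passed on the command line.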

Evaluation Script Improvements:

  • Stop Keyword: Added stop-keyword functionality to allow controlled early termination of evaluation processes when specific conditions are met.
  • Retry Mechanism: Introduced a retry mechanism for failed tasks, improving reliability during evaluations (a rough sketch follows this list).
  • Improved Robustness: Enhanced the script’s robustness, particularly in handling errors and edge cases, ensuring a smoother evaluation process.
  • Logging Retry Statistics: Implemented logging for retry attempts, providing detailed insights and transparency into the evaluation process.
  • Main Thread Exception Handling: Fixed an issue where exceptions in the main thread could cause silent failures, ensuring that all errors are properly reported and handled.
  • Support for Chat and Completion Models: Extended the script to support both chat and completion models, increasing its versatility across different use cases.
  • Environment Prefix Handling: Enabled the script to accept an environment prefix as a parameter, enhancing its adaptability to different deployment environments.
  • Progress Monitoring: Integrated progress monitoring with tqdm, allowing for real-time tracking of the evaluation process.
  • Configurable Workers: Made the number of workers configurable using the --workers option, allowing for fine-tuned parallel processing during evaluations.
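
As a rough sketch of the retry-with-logging behaviour described above (the backoff policy and function name are assumptions, not the actual eval.py code):

```python
# Illustrative sketch; eval.py's real retry logic and statistics may differ.
import logging
import time

logger = logging.getLogger("eval")


def with_retries(task, attempts: int = 3, backoff_s: float = 2.0):
    """Run task(), retrying on failure and logging how many retries were needed."""
    for attempt in range(1, attempts + 1):
        try:
            result = task()
            if attempt > 1:
                logger.info("task succeeded after %d retries", attempt - 1)
            return result
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # surface the error instead of failing silently
            time.sleep(backoff_s * attempt)
```

Re-raising on the final attempt, rather than swallowing the exception, is what keeps failures visible instead of silent.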


Enhanced CLI Options for eval.py

This PR introduces several new command-line options to the eval.py script, providing enhanced functionality and flexibility for model evaluation. The following changes have been made:

  • --model MODEL: Added support for specifying the model to be evaluated.
  • --mode MODE: Introduced a new option to select the API mode, either 'chat' or 'completion'. The default mode is set to 'chat'.
  • --input-prompt-key INPUT_PROMPT_KEY: Added the ability to define which column in the dataset should be used as the input prompt.
  • --output-answer-key OUTPUT_ANSWER_KEY: Added the ability to define which column in the dataset should be used as the output answer.
  • --workers WORKERS: Introduced multi-threading support, allowing users to specify the number of worker threads for evaluating the dataset, improving processing efficiency.
  • --env-prefix ENV_PREFIX: Added an option to customize the prefix for environment variables used for API keys and base URLs. The default prefix is set to EVAL (see the sketch below).

These enhancements provide greater control over the evaluation process, allowing for more customized and efficient use of the eval.py script.
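
As an illustration of how a prefix such as EVAL can be resolved into client settings (the exact environment variable names below are assumptions, not necessarily the ones eval.py reads):

```python
# Illustrative sketch; the real eval.py may read differently named variables.
import os


def build_settings(env_prefix: str = "EVAL", mode: str = "chat") -> dict:
    """Resolve API settings from prefixed environment variables, e.g. EVAL_API_KEY."""
    def getenv(name):
        return os.environ.get(f"{env_prefix}_{name}")

    return {
        "api_key": getenv("API_KEY"),
        "base_url": getenv("BASE_URL"),
        # --mode selects whether the chat or the completion API surface is called.
        "use_chat_api": mode == "chat",
    }
```

Pointing --env-prefix at a different prefix lets the same script target another deployment without changing any other configuration.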

Testing

pytest

@cedricvidal changed the title from "Comprehensive RAFT Enhancements: Improved robustness, logging, progress, checkpointing, multi-threading, Llama support, Azure authentication and evaluation refinements" to "RAFT Enhancements: Improved robustness, logging, progress, checkpointing, multi-threading, Llama support, Azure auth and eval" on Aug 25, 2024
@cedricvidal changed the title from "RAFT Enhancements: Improved robustness, logging, progress, checkpointing, multi-threading, Llama support, Azure auth and eval" to "RAFT Enhancements: Improved robustness, logging, checkpointing, threading, Llama support, Azure auth and eval" on Aug 25, 2024
With 30 chunks, if the process fails at chunk 29 (0-indexed), then the checkpoint only saves 29 // 15 * 15 = 15 chunks instead of 30.

Moreover, there is no need to save checkpoint state for every chunk; it is only needed when the dataset is actually saved, which also makes the code easier to understand.
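
For illustration, the floor-to-batch behaviour described above looks like this (the batch size of 15 comes from the comment; the variable names are made up):

```python
# Illustrative only: floor division drops the partially filled last batch.
BATCH = 15
last_completed_chunk = 29                      # 0-indexed, 30 chunks in total
saved = last_completed_chunk // BATCH * BATCH
print(saved)                                   # 15 -> chunks 15..29 would be redone on restart
```
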
because different systems have different expectations.
To clarify that this is the ground-truth gold answer, as opposed to the answer generated by the fine-tuned model.
- makes collecting checkpoints less expensive
- makes deleting checkpoints a single line call
@cedricvidal force-pushed the upstream-merge-prep branch 2 times, most recently from 5a09a0a to 44aa242 on August 25, 2024 at 19:37
@ShishirPatil merged commit fa3bf8c into ShishirPatil:main on Aug 27, 2024
ShishirPatil added a commit that referenced this pull request Aug 27, 2024
PR #605 and #604 had conflicting requirements.txt. This should fix it.
nkcheng255 pushed a commit to nkcheng255/gorilla that referenced this pull request Aug 30, 2024