llm-loss-validator

A validator that computes the validation loss for a Hugging Face-compatible LLM.

Environment Setup

We recommend using conda to manage the Python environment for this repo.

conda create -n llm-loss-validator python==3.10.12
conda activate llm-loss-validator
pip install -r requirements.txt
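After activating the environment, you can optionally confirm that the interpreter on your PATH is the 3.10 series pinned above. The helper below is a hypothetical convenience, not part of this repo. It only inspects a version string such as the one printed by python --version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): returns 0 when the given
# version string reports Python 3.10.x, the series pinned in `conda create` above.
is_python_310() {
  case "$1" in
    "Python 3.10."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example, run inside the activated llm-loss-validator environment:
# if ! is_python_310 "$(python --version 2>&1)"; then
#   echo "Expected Python 3.10.x; is the conda env active?" >&2
# fi
```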

How to run the validation script

Automation with GPU

To continuously receive task assignments, use the following command:

cd /src
CUDA_VISIBLE_DEVICES=0 \
bash start.sh \
--hf_token your_hf_token \
--flock_api_key your_flock_api_key \
--task_id your_task_id \
--validation_args_file validation_config.json.example \
--auto_clean_cache False \
--lora_only True
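Typing tokens directly on the command line leaves them in your shell history. One option, sketched below, is to export them beforehand as environment variables and fail early if any is missing. The variable names HF_TOKEN and FLOCK_API_KEY and the require_env helper are assumptions chosen for this sketch; start.sh still receives the values through the flags shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports on stderr every named variable that is unset
# or empty, and returns 1 if any were missing (0 otherwise).
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (assumes HF_TOKEN and FLOCK_API_KEY were exported beforehand):
# require_env HF_TOKEN FLOCK_API_KEY || exit 1
# cd /src
# CUDA_VISIBLE_DEVICES=0 bash start.sh \
#   --hf_token "$HF_TOKEN" \
#   --flock_api_key "$FLOCK_API_KEY" \
#   --task_id your_task_id \
#   --validation_args_file validation_config.json.example \
#   --auto_clean_cache False \
#   --lora_only True
```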

Explanation of Parameters

CUDA_VISIBLE_DEVICES=0: Specifies which GPU to use. 0 indicates the first GPU. Adjust this based on your available GPUs.
--hf_token: Your Hugging Face token, required for accessing certain models. This token should have write access.
--flock_api_key: Your FLock API key.
--task_id: The ID of the task you want to validate. To validate multiple tasks, pass a comma-separated list; for example, for tasks 8 and 9, pass --task_id 8,9.
--validation_args_file: The path to the validation arguments file.
--auto_clean_cache: Whether to automatically clean the model cache (True or False).
--lora_only: Whether to validate only repositories that contain LoRA (Low-Rank Adaptation) weights. True means only LoRA submissions will be validated. This is useful for validators with limited network bandwidth, because LoRA weights (10-500 MiB) are far smaller than full model files (>10 GiB).
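When validating several tasks, --task_id takes the IDs as one comma-separated argument (for example 8,9). If you keep the IDs in a Bash array, a small hypothetical helper like the one below produces that string. It assumes the IDs themselves contain no commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper: joins its arguments with commas, producing the
# format --task_id accepts for multiple tasks (e.g. "8,9").
join_task_ids() {
  local IFS=,   # "$*" joins positional parameters with the first IFS character
  echo "$*"
}

# Example:
# task_ids=(8 9)
# bash start.sh ... --task_id "$(join_task_ids "${task_ids[@]}")" ...
```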

Validate a single assignment

With CPU

cd /src
FLOCK_API_KEY="<your-api-key>" python validate.py validate \
--model_name_or_path Qwen/Qwen1.5-1.8B-Chat \
--base_model qwen1.5 \
--eval_file ./data/dummy_data.jsonl \
--context_length 128 \
--max_params 7000000000 \
--local_test \
--validation_args_file validation_config_cpu.json.example
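This README does not define --max_params. Judging by its name and the value 7000000000 (7 billion), it appears to cap the parameter count of the model being validated. Under that assumption, and further assuming the bound is inclusive, the check reduces to an integer comparison, sketched below with Bash 64-bit arithmetic. The function name is hypothetical, and the 1.8 billion figure is a round number taken from the model name Qwen1.5-1.8B-Chat.

```shell
#!/usr/bin/env bash
# Hypothetical check, assuming --max_params is an inclusive upper bound on
# model size. Returns 0 when param_count <= max_params. Bash arithmetic is
# 64-bit, so values in the billions are safe.
within_param_limit() {
  local param_count=$1 max_params=$2
  (( param_count <= max_params ))
}

# Example: ~1.8 billion parameters (from the model name) against the 7 billion cap.
# within_param_limit 1800000000 7000000000 && echo "within limit"
```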

With GPU

cd /src
CUDA_VISIBLE_DEVICES=0 FLOCK_API_KEY="<your-api-key>" python validate.py validate \
--model_name_or_path Qwen/Qwen1.5-1.8B-Chat \
--base_model qwen1.5 \
--eval_file ./data/dummy_data.jsonl \
--context_length 128 \
--max_params 7000000000 \
--local_test \
--validation_args_file validation_config.json.example
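The CPU and GPU commands differ only in CUDA_VISIBLE_DEVICES and in the --validation_args_file they pass (validation_config_cpu.json.example versus validation_config.json.example). If you script both cases, a hypothetical helper like the following can pick the file. When called without an argument it treats the presence of nvidia-smi on the PATH as a sign that an NVIDIA GPU is available, which is a heuristic rather than a guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the validation args file for GPU ("yes") or
# CPU ("no") runs. With no argument, it guesses by looking for nvidia-smi.
pick_validation_args_file() {
  local has_gpu="${1:-}"
  if [ -z "$has_gpu" ]; then
    if command -v nvidia-smi >/dev/null 2>&1; then
      has_gpu=yes
    else
      has_gpu=no
    fi
  fi
  if [ "$has_gpu" = yes ]; then
    echo validation_config.json.example
  else
    echo validation_config_cpu.json.example
  fi
}

# Example:
# python validate.py validate ... --validation_args_file "$(pick_validation_args_file)"
```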

The --local_test flag lets both validators and training nodes check whether they can successfully run validation for a given model submission and dataset. It does not interact with the Fed Ledger service.

To actually calculate and submit the score for a given task assignment, use the following command:

CUDA_VISIBLE_DEVICES=0 FLOCK_API_KEY="<your-api-key>" python validate.py validate \
--model_name_or_path Qwen/Qwen1.5-1.8B-Chat \
--base_model qwen1.5 \
--eval_file ./data/dummy_data.jsonl \
--context_length 128 \
--max_params 7000000000 \
--assignment_id <assignment-id> \
--validation_args_file validation_config.json.example
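The local test and the scored run differ in one flag: --local_test for a dry run that never contacts the Fed Ledger service, and --assignment_id <assignment-id> for a real submission. A hypothetical helper such as the one below emits one or the other, so a single script can switch modes. The helper and the ASSIGNMENT_ID variable name are assumptions made for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "--assignment_id" and the ID on separate lines
# when an ID is given, otherwise "--local_test" (dry run, no Fed Ledger).
validation_mode_args() {
  local assignment_id="${1:-}"
  if [ -n "$assignment_id" ]; then
    printf '%s\n' --assignment_id "$assignment_id"
  else
    printf '%s\n' --local_test
  fi
}

# Example: read the flags into an array (Bash 4+) so each stays a separate argument.
# mapfile -t mode_args < <(validation_mode_args "${ASSIGNMENT_ID:-}")
# CUDA_VISIBLE_DEVICES=0 FLOCK_API_KEY="<your-api-key>" python validate.py validate \
#   --model_name_or_path Qwen/Qwen1.5-1.8B-Chat \
#   --base_model qwen1.5 \
#   --eval_file ./data/dummy_data.jsonl \
#   --context_length 128 \
#   --max_params 7000000000 \
#   "${mode_args[@]}" \
#   --validation_args_file validation_config.json.example
```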

FLock-io/llm-loss-validator

Folders and files

Latest commit

History

Repository files navigation

llm-loss-validator

Environment Setup

How to run validation script

Automation with GPU

Explanation of Parameters

Validate only one assignment

With CPU

With GPU

About

Resources

License

Stars

Watchers

Forks

Releases 15

Packages 0

Contributors 8

Languages

Packages