This subdirectory provides a minimal, directly runnable pipeline to:
- Select a subset of samples from an embedding matrix
- Train a dual-head VAE with a shared encoder
- Generate noisy sample pairs using the trained VAE
- Train a Bradley–Terry reward model, with optional baseline comparison
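For readers unfamiliar with the last step, the following is a minimal sketch of the standard Bradley-Terry pairwise objective, written in plain Python. It is illustrative only: the function names are hypothetical, and the loss actually implemented in train_bt_vae.py may differ in detail.

```python
import math


def bt_pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that `chosen` beats `rejected` under Bradley-Terry.

    The model says P(chosen beats rejected) = sigmoid(r_chosen - r_rejected),
    so the loss is -log(sigmoid(margin)) = softplus(-margin).
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable softplus(-margin): max(-m, 0) + log(1 + exp(-|m|))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def bt_preference_prob(reward_a: float, reward_b: float) -> float:
    """Probability that item a is preferred over item b."""
    return math.exp(-bt_pairwise_loss(reward_a, reward_b))
```

The loss shrinks as the reward model scores the preferred sample further above the rejected one, and equals log 2 when the two scores are tied.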
- run_pipeline.sh: Simplified pipeline script (uses relative paths)
- select_samples.py, train_vae_dual.py, generate_noisy_pairs.py, train_bt_vae.py: Core Python scripts
- requirements.txt: Python dependencies
- run_100k_reference.sh: Original large script for reference only
- Python >= 3.9 (3.10/3.11 recommended)
- Install dependencies (prefer a virtualenv):
pip install -r requirements.txt
- For GPU usage, install a CUDA-enabled torch build.
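To confirm whether the installed torch build can see a GPU, a quick check along these lines may help. This is a sketch: pick_device is a hypothetical helper, not part of the pipeline scripts, and it reports "cpu" when torch is not installed at all.

```python
def pick_device() -> str:
    """Return "cuda" if a CUDA-enabled torch build sees a GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # torch is missing; install the dependencies first
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Device that would be used: {pick_device()}")
```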
Expected inputs (example names, paths are configurable):
- data/train_100k.npy: Training embeddings (shape: [N, D])
- data/multi_response_embeddings.npy: Multi-response features
- data/multi_response_rewards.npy: Corresponding rewards
You may place them anywhere and pass their paths via script arguments.
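Before launching a long run, it can be worth sanity-checking the input arrays. The sketch below assumes only what is stated above (training embeddings are 2-D, [N, D]) plus one labeled assumption: that the multi-response features and rewards describe the same items along their first axis. The helper names are hypothetical.

```python
import numpy as np


def check_inputs(train, features, rewards) -> None:
    """Raise ValueError if the arrays do not match the assumed layout."""
    if train.ndim != 2:
        raise ValueError(f"training embeddings should be [N, D], got {train.shape}")
    # Assumption (not confirmed by the scripts): features and rewards
    # are aligned on axis 0, one entry per prompt or response group.
    if features.shape[0] != rewards.shape[0]:
        raise ValueError(
            f"features {features.shape} and rewards {rewards.shape} "
            "disagree on the first axis"
        )


def load_and_check(train_path, features_path, rewards_path) -> None:
    """Memory-map the three .npy files and validate their shapes."""
    train = np.load(train_path, mmap_mode="r")
    features = np.load(features_path, mmap_mode="r")
    rewards = np.load(rewards_path, mmap_mode="r")
    check_inputs(train, features, rewards)
```

Memory-mapping with mmap_mode="r" avoids reading the full arrays into RAM just to inspect their shapes.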
bash run_pipeline.sh \
  --input_file data/train_100k.npy \
  --output_dir outputs/llama_instruct_10k \
  --multi_response_features_path data/multi_response_embeddings.npy \
  --multi_response_rewards_path data/multi_response_rewards.npy \
  --num_samples 50000 \
  --latent_dim 16 \
  --hidden_dims 64 \
  --batch_size 128 \
  --epochs 20 \
  --lr 1e-4 \
  --temperature 1.0 \
  --contrastive_weight 0.01 \
  --noise_std 0.01 \
  --num_variants 1 \
  --n_noise 10 \
  --train_size 1000.0 \
  --hidden_dim 512 \
  --dropout 0.0 \
  --seed 44 \
  --run_comparison true
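If you launch runs from Python (for example to sweep over seeds), the argument list can be assembled from a dictionary. This is only a sketch: build_command is a hypothetical helper, and only a subset of the flags from the command above is shown.

```python
import shlex


def build_command(script: str, args: dict) -> list:
    """Turn {"name": value} pairs into ["bash", script, "--name", "value", ...]."""
    cmd = ["bash", script]
    for name, value in args.items():
        cmd += [f"--{name}", str(value)]
    return cmd


args = {
    "input_file": "data/train_100k.npy",
    "output_dir": "outputs/llama_instruct_10k",
    "num_samples": 50000,
    "seed": 44,
}
cmd = build_command("run_pipeline.sh", args)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
# To actually run it: subprocess.run(cmd, check=True)
```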
- Outputs are saved under --output_dir, including:
  - seeds_samples/: Selected subset
  - vae_model/: Trained VAE (best_model.pt)
  - generated_pairs/: Noisy pairs (generated_noisy_pairs.npy)
  - reward_model_with_vae/ and reward_model_baseline/: Reward model results (gold_reward_results.json, etc.)
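After a run, a small check like the following can confirm that the artifacts listed above were written. It is a sketch based only on the names in this README: missing_outputs is a hypothetical helper, and the baseline directory is checked only when the comparison was requested.

```python
from pathlib import Path

# Paths relative to --output_dir, taken from the list above.
EXPECTED = [
    "seeds_samples",
    "vae_model/best_model.pt",
    "generated_pairs/generated_noisy_pairs.npy",
    "reward_model_with_vae/gold_reward_results.json",
]


def missing_outputs(output_dir: str, with_baseline: bool = True) -> list:
    """Return the expected paths that do not exist under output_dir."""
    expected = list(EXPECTED)
    if with_baseline:
        # Only produced when the baseline comparison is enabled.
        expected.append("reward_model_baseline")
    root = Path(output_dir)
    return [rel for rel in expected if not (root / rel).exists()]
```

For example, missing_outputs("outputs/llama_instruct_10k") returns an empty list when everything is in place.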
Note: This minimal version removes the dependencies on jq and bc and does not require generating the extra comparison report, while keeping core functionality intact.
- Without a GPU, torch will fall back to CPU and be slower.
- If defaults in train_vae_dual.py or train_bt_vae.py differ, override them via the same-named run_pipeline.sh args.
- Ensure input .npy files have the expected shapes.
Follow your repo's license; if unspecified, default to MIT.