Skip to content

Conversation

@tyler-griggs
Copy link
Member

🎯 Overview

This PR introduces a comprehensive Pydantic-based configuration system for SkyRL training while maintaining full backward compatibility with existing YAML/Hydra workflows.

✨ Key Features

1. Type-Safe Pydantic Models

  • 30+ Pydantic model classes with full type annotations
  • 1:1 mapping to ppo_base_config.yaml structure
  • IDE autocomplete and type checking support
  • Runtime validation of configuration values

2. Unified Entry Point

  • New run_training(cfg: Union[DictConfig, SkyRLConfig]) function
  • Accepts both YAML-based DictConfig and Python-based Pydantic configs
  • Automatic type detection and conversion

3. CLI Override Support ✨

  • NEW: CLI overrides work with Python configs!
  • Syntax: --trainer.epochs=30 --trainer.policy.model.path=Qwen/Qwen2.5-7B
  • Supports all Python types (int, float, bool, strings, lists)

4. Compositional Config Building

  • RLlib-style nested configuration construction
  • More readable than mutation-based style

5. Full Backward Compatibility

  • All existing .sh scripts work unchanged
  • YAML + Hydra CLI overrides still supported

📦 What's New

  • skyrl_train/config/configs.py - Pydantic models (800+ lines)
  • skyrl_train/config/cli_overrides.py - CLI override utilities
  • examples/gsm8k/run_gsm8k.py - Python-based config example
  • examples/gsm8k/test_run_gsm8k.py - Test config
  • PYDANTIC_CONFIG_PROJECT.md - Full documentation

🧪 Testing

✅ Successfully ran full training with Pydantic config on 4 GPUs
✅ Verified backward compatibility with YAML configs
✅ Tested CLI override parsing and application

See PYDANTIC_CONFIG_PROJECT.md for complete documentation!

tyler-griggs and others added 2 commits January 25, 2026 00:49
- Add log_to_driver=False to ray.init() to suppress worker/raylet log forwarding
- Set RAY_BACKEND_LOG_LEVEL=fatal to suppress C++ metrics exporter errors
- Add SKYRL_DEBUG_LOGGING env var to re-enable verbose logging when needed

This significantly reduces stdout noise during training runs while still
preserving all logs in Ray's log files for debugging.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add log_to_driver=False to ray.init() to suppress worker/raylet log forwarding
- Set RAY_BACKEND_LOG_LEVEL=fatal to suppress C++ metrics exporter errors
- Use existing LOG_LEVEL=DEBUG to re-enable verbose logging when needed

This significantly reduces stdout noise during training runs while still
preserving all logs in Ray's log files for debugging.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive Pydantic-based configuration system, significantly improving type safety and programmatic configuration, including new support for CLI overrides. A critical security vulnerability has been identified in the implementation of CLI overrides within set_nested_attr and get_nested_attr. The use of unvalidated dot-notation paths with getattr and setattr allows for Attribute Injection and Class Pollution, which could enable an attacker to corrupt the application state or bypass security controls if training scripts are executed with untrusted command-line arguments. Strict validation is recommended to ensure only defined Pydantic fields can be accessed or modified. Additionally, a critical functional issue was found in the CLI override parsing logic that could cause it to fail for certain argument formats, and a high-severity bug in default value handling might lead to user-provided arguments being ignored. There are also several medium-severity suggestions for code quality, such as removing unused imports, adhering to import location best practices, and updating the project documentation.

Comment on lines 156 to 180
# It's a flag or argument with value
# Check if it's defined in parser
try:
# Try to parse just this arg to see if it's known
test_args = known_args + [arg]
if i + 1 < len(args) and not args[i + 1].startswith("--"):
test_args.append(args[i + 1])
parser.parse_known_args(test_args)
# If successful, it's a known arg
known_args.append(arg)
if i + 1 < len(args) and not args[i + 1].startswith("--"):
known_args.append(args[i + 1])
i += 2
else:
i += 1
except:
# Unknown flag, might be override without =
# Treat as override if next item exists and doesn't start with --
if i + 1 < len(args) and not args[i + 1].startswith("--"):
override_str = f"{arg[2:]}={args[i + 1]}"
overrides.append(override_str)
i += 2
else:
known_args.append(arg)
i += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The logic to distinguish between known argparse arguments and unknown config overrides is flawed. The try-except block on lines 158-171 will not catch unknown arguments as intended.

parser.parse_known_args() does not raise an exception for unknown arguments; instead, it returns them in the second element of the result tuple. Your current implementation will incorrectly classify unknown arguments (like --trainer.policy.model.path ... when not using =) as known arguments, which will cause parser.parse_args() to fail later.

This logic needs to be revised to correctly check the unknown list returned by parse_known_args() to properly identify and handle config overrides.

Comment on lines 486 to 503
def set_nested_attr(obj: Any, path: str, value: Any) -> None:
"""
Set a nested attribute on a Pydantic model using dot notation.

Args:
obj: The root Pydantic model
path: Dot-separated path (e.g., "trainer.policy.model.path")
value: Value to set

Example:
>>> cfg = SkyRLConfig(...)
>>> set_nested_attr(cfg, "trainer.policy.model.path", "Qwen/Qwen2.5-1.5B")
"""
parts = path.split(".")
for part in parts[:-1]:
obj = getattr(obj, part)
setattr(obj, parts[-1], value)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The set_nested_attr function is vulnerable to Attribute Injection (and potentially Class Pollution). It uses getattr and setattr to traverse and modify the configuration object based on a dot-separated path provided via CLI overrides. Since the path is not validated, an attacker can provide a path like __class__.__init__ to overwrite the constructor of the class, or access other internal Python attributes.

To remediate this, you should validate that each part of the path corresponds to a valid field defined in the Pydantic model. For Pydantic v2 models, you can check obj.model_fields to ensure only intended configuration fields are modified.

def set_nested_attr(obj: Any, path: str, value: Any) -> None:
    parts = path.split(".")
    for part in parts[:-1]:
        if hasattr(obj, "model_fields") and part not in obj.model_fields:
            raise ValueError(f"Invalid config field: {part}")
        if part.startswith("__"):
            raise ValueError(f"Access to internal attribute {part} is forbidden")
        obj = getattr(obj, part)
    
    last_part = parts[-1]
    if hasattr(obj, "model_fields") and last_part not in obj.model_fields:
        raise ValueError(f"Invalid config field: {last_part}")
    if last_part.startswith("__"):
        raise ValueError(f"Access to internal attribute {last_part} is forbidden")
    setattr(obj, last_part, value)

Comment on lines 181 to 184
data_dir = args.data_dir or os.environ.get("DATA_DIR", None)
num_gpus = args.num_gpus or int(os.environ.get("NUM_GPUS", "4"))
logger = args.logger or os.environ.get("LOGGER", "wandb")
inference_backend = args.inference_backend or os.environ.get("INFERENCE_BACKEND", "vllm")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using the or operator for setting default values can lead to unexpected behavior. For instance, if a user provides 0 for --num-gpus or an empty string for a path, these falsy values will be ignored, and the default from the environment variable or the hardcoded value will be used instead. This can be surprising and lead to incorrect configurations.

To ensure user-provided values are always respected, it's safer to explicitly check for None before falling back to a default.

Suggested change
data_dir = args.data_dir or os.environ.get("DATA_DIR", None)
num_gpus = args.num_gpus or int(os.environ.get("NUM_GPUS", "4"))
logger = args.logger or os.environ.get("LOGGER", "wandb")
inference_backend = args.inference_backend or os.environ.get("INFERENCE_BACKEND", "vllm")
data_dir = args.data_dir if args.data_dir is not None else os.environ.get("DATA_DIR", None)
num_gpus = args.num_gpus if args.num_gpus is not None else int(os.environ.get("NUM_GPUS", "4"))
logger = args.logger if args.logger is not None else os.environ.get("LOGGER", "wandb")
inference_backend = args.inference_backend if args.inference_backend is not None else os.environ.get("INFERENCE_BACKEND", "vllm")

Comment on lines 505 to 524
def get_nested_attr(obj: Any, path: str) -> Any:
"""
Get a nested attribute from a Pydantic model using dot notation.

Args:
obj: The root Pydantic model
path: Dot-separated path (e.g., "trainer.policy.model.path")

Returns:
The value at the specified path

Example:
>>> cfg = SkyRLConfig(...)
>>> model_path = get_nested_attr(cfg, "trainer.policy.model.path")
"""
parts = path.split(".")
for part in parts:
obj = getattr(obj, part)
return obj

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The get_nested_attr function allows arbitrary attribute access because it does not validate the path argument. An attacker could use this to read sensitive internal attributes of the Python objects. Similar to set_nested_attr, you should validate that the path only traverses allowed fields of the Pydantic model.

def get_nested_attr(obj: Any, path: str) -> Any:
    parts = path.split(".")
    for part in parts:
        if hasattr(obj, "model_fields") and part not in obj.model_fields:
            raise ValueError(f"Invalid config field: {part}")
        if part.startswith("__"):
            raise ValueError(f"Access to internal attribute {part} is forbidden")
        obj = getattr(obj, part)
    return obj

Comment on lines +223 to +225
1. **No CLI overrides for Python configs**: When using pure Python configs, you can't override via CLI like Hydra does
- **Workaround**: Use environment variables or modify Python code
- **Planned**: Add argparse-based override system
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This documentation appears to be out of sync with the implementation in this pull request. The PR successfully adds CLI override support for Python configurations, but this document still lists it as a current limitation and a future to-do item.

To align the documentation with the code, please consider the following updates:

  • Line 37: Mark [ ] Add CLI override support for Python-based configs as completed ([x]).
  • Lines 223-225: Remove this section, as the limitation has been addressed.
  • Line 236: Mark [ ] **CLI Override Support** as completed ([x]).

import os
from pathlib import Path

from skyrl_train.config.configs import create_default_config, SkyRLConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The create_default_config import is not used in this file and can be removed to keep the imports clean.

Suggested change
from skyrl_train.config.configs import create_default_config, SkyRLConfig
from skyrl_train.config.configs import SkyRLConfig

Comment on lines +54 to +59
from skyrl_train.config.configs import (
DataConfig, TrainerConfig, GeneratorConfig, EnvironmentConfig,
PlacementConfig, PolicyConfig, RefConfig, CriticConfig,
ModelConfig, CriticModelConfig, OptimizerConfig, AlgorithmConfig,
SamplingParamsConfig
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

According to PEP 8, imports should generally be placed at the top of the file. Moving this local import out of the get_gsm8k_config function would improve readability and align with standard Python conventions. There doesn't appear to be a circular dependency that would require a local import here.

import os
from pathlib import Path

from skyrl_train.config.configs import create_default_config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The create_default_config import is unused in this file and can be removed.

Comment on lines +20 to +25
from skyrl_train.config.configs import (
SkyRLConfig, DataConfig, TrainerConfig, GeneratorConfig, EnvironmentConfig,
PlacementConfig, PolicyConfig, RefConfig, CriticConfig,
ModelConfig, CriticModelConfig, OptimizerConfig, AlgorithmConfig,
SamplingParamsConfig
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

According to PEP 8, imports should generally be placed at the top of the file. Moving this local import out of the get_test_config function would improve readability and align with standard Python conventions. There doesn't appear to be a circular dependency that would require a local import here.

tyler-griggs and others added 2 commits January 25, 2026 01:06
- Renamed cli_overrides.py to cli.py (cleaner name)
- Simplified main() function to remove redundant argparse arguments
- All config fields can be overridden via --key.path=value syntax
- Environment variables still supported for convenience params

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add _validate_path() to prevent access to private/dunder attributes
- Remove unused create_default_config import from run_gsm8k.py
- Replace flawed parse_args_with_overrides with simpler collect_overrides

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@erictang000
Copy link
Collaborator

thoughts on making these just simple python dataclasses rather than opting for full blown pydantic configs?

just feeling that pydantic sometimes comes with heavy weight and annoying validation which can slow down projects (for example, it's sometimes pretty annoying for testing to construct valid pydantic configs that have just a subset of arguments). We do a lot of config validation in skyrl-train already, and adding new fields to a dataclass is easier. We could use pydantic selectively for some specific configs that need extra validation, which seems like a common pattern.

For example, vLLM and huggingface EngineArgs and TrainingArgs use python dataclasses for the top level config, but have some specific configs that use pydantic for validation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants