Add generic trajectory logging for debugging RL training #174

Merged
tyler-griggs merged 15 commits into NovaSky-AI:dev/trajectory_logger from adigoel4:trajectory-logging-clean
Sep 22, 2025

@adigoel4

Added a generic TrajectoryLogger system to help debug what's happening during training. You can now export complete trajectories (prompts, responses, rewards, etc.) to see exactly what the model is doing.

Features:

  • Base TrajectoryLogger class with WandB and CSV implementations
  • Integrated into SkyRLGymGenerator using dependency injection
  • Trainer automatically logs trajectories at configurable intervals during training/eval
  • Configurable text truncation and trajectory limits to avoid spamming logs
  • Handles tokenization properly so everything exports as readable text

Config:

generator:
   trajectory_logging:
      enabled: true
      type: "wandb"  # or "csv" 
      max_trajectories: 10
      log_interval: 100
      max_text_length: 2000
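
As a rough illustration of what the options above control, here is a minimal sketch of a CSV backend that honors them. It is not the code from this PR: the class names follow the description, but the method signatures, the `should_log` and `truncate` helpers, the file naming, and the `"..."` truncation suffix are all assumptions made for the example.

```python
import csv
import os
import tempfile
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple


class TrajectoryLogger(ABC):
    """Hypothetical base class; the interface in the PR may differ."""

    def __init__(self, max_trajectories: int = 10, log_interval: int = 100,
                 max_text_length: int = 2000):
        self.max_trajectories = max_trajectories
        self.log_interval = log_interval
        self.max_text_length = max_text_length

    def should_log(self, step: int) -> bool:
        # Only log every `log_interval` steps to keep output manageable.
        return step % self.log_interval == 0

    def truncate(self, text: str) -> str:
        # Cap each text field so one long rollout cannot flood the log.
        if len(text) <= self.max_text_length:
            return text
        return text[: self.max_text_length] + "..."

    @abstractmethod
    def log(self, prompts: Sequence[str], responses: Sequence[str],
            rewards: Sequence[float], step: int) -> None:
        ...


class CSVTrajectoryLogger(TrajectoryLogger):
    def __init__(self, output_dir: str, **kwargs):
        super().__init__(**kwargs)
        self.output_dir = output_dir
        os.makedirs(output_dir, exist_ok=True)

    def log(self, prompts, responses, rewards, step):
        if not self.should_log(step):
            return
        # Assumed naming scheme: one file per logged step.
        path = os.path.join(self.output_dir, f"trajectories_step_{step}.csv")
        rows = list(zip(prompts, responses, rewards))[: self.max_trajectories]
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["step", "prompt", "response", "reward"])
            for prompt, response, reward in rows:
                writer.writerow([step, self.truncate(prompt),
                                 self.truncate(response), reward])


def demo() -> Tuple[List[str], List[List[str]]]:
    """Log three trajectories with tight limits and read the file back."""
    with tempfile.TemporaryDirectory() as tmp:
        logger = CSVTrajectoryLogger(tmp, max_trajectories=2,
                                     log_interval=100, max_text_length=5)
        prompts = ["p1", "p2", "p3"]
        responses = ["a long response", "ok", "x"]
        rewards = [1.0, 0.0, 0.5]
        logger.log(prompts, responses, rewards, step=100)  # written
        logger.log(prompts, responses, rewards, step=101)  # skipped by interval
        files = sorted(os.listdir(tmp))
        with open(os.path.join(tmp, files[0]), newline="", encoding="utf-8") as f:
            return files, list(csv.reader(f))
```

In this sketch, `max_trajectories` caps the number of rows per file, `max_text_length` caps each text field, and `log_interval` decides which steps produce a file at all.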

I also added CSV logging and integration tests so I could test this on CPU without needing GPU dependencies - not sure if we need to keep those but they work.

Fixes #122

- Implement flexible TrajectoryLogger with multiple backends (WandB, CSV, Composite)
- Add complete trajectory data capture including prompts, responses, rewards, and metadata
- Integrate trajectory logging into SkyRLGymGenerator with configurable options
- Support detokenization for readable text analysis
- Add comprehensive unit and integration tests (710+ test lines)
- Configure YAML-based trajectory logging with granular control options
- Enable trajectory export to pandas DataFrames for analysis workflows
- Add create_trajectory_logger_from_config() factory function
- Add optional trajectory_logger parameter to SkyRLGymGenerator
- Remove test-specific get_collected_trajectories() method
- Update config to support CSV output_dir parameter
- Refactor tests to use dependency injection
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable trajectory logging system for debugging RL training, with implementations for both WandB and CSV. The code is well-structured, includes new configurations, and is accompanied by a good set of unit and integration tests. My review includes suggestions for improving maintainability by using a NamedTuple for complex return types, refactoring duplicated logic, and ensuring consistency in metric calculations. I've also pointed out some minor style issues related to imports.

- Replace agent_loop return tuple with NamedTuple for better maintainability
- Remove redundant 'trajectory_id' in locals() check
- Add public flush_trajectories method to GeneratorInterface
- Fix inconsistent response_length calculation to use token count
- Move import statements to top of files per PEP 8
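
The first item in the list above, replacing a return tuple with a NamedTuple, can be sketched as follows. The field names and the `agent_loop_stub` function are hypothetical; the actual values returned by `agent_loop` are not shown in this thread.

```python
from typing import List, NamedTuple


class AgentLoopOutput(NamedTuple):
    """Hypothetical field set; the real agent_loop return values may differ."""
    response_ids: List[int]
    reward: float
    stop_reason: str


def agent_loop_stub() -> AgentLoopOutput:
    # Stand-in for the real agent loop, which would run a rollout.
    return AgentLoopOutput(response_ids=[11, 42, 7], reward=1.0, stop_reason="stop")


output = agent_loop_stub()
# Callers read fields by name instead of relying on tuple position.
response_length = len(output.response_ids)  # token count, not character count
```

Named fields also make the `response_length` fix easier to audit, since it is explicit that the length is taken over token IDs.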
Member

@tyler-griggs tyler-griggs left a comment

Thanks for sending this in! Left a handful of comments that are primarily based around simplifying / minimizing the implementation. I'm sure users will customize on top of these classes, so it'd be great to make them as canonical as possible.

- Remove tokenizer dependency - users provide text directly
- Remove text truncation features for full trajectories
- Move trajectory_logger.py from utils/ to generators/
- Update defaults: max_trajectories=-1, log_full_history=True
- Remove wandb try/catch since it's already a dependency
- Fix response_length calculation to use len(traj.response)
- Update all imports and tests for simplified interface
- Change hardcoded prefix from 'train' to 'generation'

Addresses all PR review comments for simpler implementation.
@adigoel4
Author

Just pushed a much simpler implementation. Looking back, I definitely over-engineered this without thinking about what is actually necessary right now.

@adigoel4 adigoel4 requested a review from tyler-griggs August 27, 2025 20:40
Member

@tyler-griggs tyler-griggs left a comment

Thanks for the edits! My review is primarily around making another large simplification pass.

- Remove Trajectory dataclass per review feedback
- Simplify log() method to accept prompts, responses, rewards directly
- Remove log_full_history configuration option
- Add TODO for custom trajectory logger registry
- Update factory method to use export_path for default output_dir
- Assert trajectory logging is enabled in factory method
- Remove log_full_history option (always log full trajectories)
- Remove log_interval option (always log every step)
- Update output_dir comment to indicate default uses export_path
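
The factory behavior described in the list above (assert that logging is enabled, fall back to `export_path` for the CSV output directory, treat `max_trajectories=-1` as "log everything") might look roughly like this. The function names, the dictionary-based config, and the choice to use `export_path` itself rather than a subdirectory are assumptions for illustration, not the merged implementation.

```python
from typing import Any, Dict, List, Sequence


def resolve_logger_settings(trajectory_cfg: Dict[str, Any],
                            export_path: str) -> Dict[str, Any]:
    """Hypothetical helper: validate config and fill in defaults."""
    # The factory should only be called when logging is switched on.
    assert trajectory_cfg.get("enabled", False), \
        "trajectory_logging.enabled must be true"

    logger_type = trajectory_cfg.get("type", "wandb")
    if logger_type not in ("wandb", "csv"):
        raise ValueError(f"Unknown trajectory logger type: {logger_type}")

    settings: Dict[str, Any] = {
        "type": logger_type,
        # -1 means no limit on the number of trajectories logged.
        "max_trajectories": trajectory_cfg.get("max_trajectories", -1),
    }
    if logger_type == "csv":
        # Assumption: default output_dir is export_path when none is given.
        settings["output_dir"] = trajectory_cfg.get("output_dir") or export_path
    return settings


def select_trajectories(items: Sequence[Any], max_trajectories: int) -> List[Any]:
    """Return all items when the limit is negative, else the first N."""
    if max_trajectories < 0:
        return list(items)
    return list(items[:max_trajectories])
```

With `log_interval` and `log_full_history` removed, the remaining surface is small: a logger type, an optional limit, and an output directory for CSV.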
@adigoel4
Author

Went through and tried to make it as simple as possible!

@tyler-griggs
Member

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable trajectory logging system for debugging, with implementations for WandB and CSV. The overall structure is well-designed, including the use of a base class and a factory function for creating loggers from configuration. The addition of comprehensive unit and integration tests is also a great practice.

My review includes one critical piece of feedback regarding a hardcoded step in the logging call, which would cause data to be overwritten or logged incorrectly. I've also included a couple of medium-severity suggestions to improve code reuse and performance in the logger implementations.

Overall, this is a solid contribution that will significantly improve debugging capabilities.

Comment on lines +14 to +22
class TrajectoryLogger(ABC):
    """
    Abstract base class for trajectory logging.

    TODO: Allow users to bring a custom trajectory logger. They should be able to
    define their own class (outside of the skyrl-train package) and add it to
    a registry (see AdvantageEstimatorRegistry for an example) so that it can
    be referenced by name in the config.
    """
Contributor

medium

To improve maintainability and reduce code duplication, the trajectory slicing logic, which is currently duplicated in both WandbTableTrajectoryLogger and CSVTrajectoryLogger, could be moved into this base class.

You could use the template method pattern by making log a concrete method that handles slicing and then calls a new abstract method (e.g., _log) for the specific logging implementation. This would centralize the slicing logic and make the logger implementations cleaner.
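
A minimal sketch of the template method pattern suggested here, assuming a `max_trajectories` limit where a negative value means "no limit". The `_log` hook and the `ListTrajectoryLogger` class are hypothetical and exist only to make the example runnable.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple


class TrajectoryLogger(ABC):
    def __init__(self, max_trajectories: int = -1):
        self.max_trajectories = max_trajectories

    def log(self, prompts: Sequence[str], responses: Sequence[str],
            rewards: Sequence[float], step: int) -> None:
        # Template method: shared slicing lives here, once.
        if self.max_trajectories >= 0:
            n = self.max_trajectories
            prompts, responses, rewards = prompts[:n], responses[:n], rewards[:n]
        self._log(prompts, responses, rewards, step)

    @abstractmethod
    def _log(self, prompts: Sequence[str], responses: Sequence[str],
             rewards: Sequence[float], step: int) -> None:
        """Backend-specific write; receives already-sliced data."""


class ListTrajectoryLogger(TrajectoryLogger):
    """Hypothetical in-memory backend used only for this example."""

    def __init__(self, max_trajectories: int = -1):
        super().__init__(max_trajectories)
        self.records: List[Tuple[int, str, str, float]] = []

    def _log(self, prompts, responses, rewards, step):
        for prompt, response, reward in zip(prompts, responses, rewards):
            self.records.append((step, prompt, response, reward))


logger = ListTrajectoryLogger(max_trajectories=2)
logger.log(["a", "b", "c"], ["x", "y", "z"], [1.0, 0.0, 0.5], step=3)
```

The cost of this design is that custom loggers implement `_log` rather than `log`, which adds one more convention for users to learn.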

Author

The lines of duplicated slicing logic are worth it for the much cleaner user experience when implementing custom loggers.

@tyler-griggs tyler-griggs changed the base branch from main to dev/trajectory_logger September 22, 2025 18:10
@tyler-griggs
Member

Merging this into a dev branch, dev/trajectory_logger, to be picked up after the SkyRLGymGenerator refactor and clean-up.

@tyler-griggs tyler-griggs merged commit 158f5de into NovaSky-AI:dev/trajectory_logger Sep 22, 2025

Development

Successfully merging this pull request may close these issues.

Add generic trajectory logger

2 participants
