Add generic trajectory logging for debugging RL training#174
Conversation
- Implement flexible TrajectoryLogger with multiple backends (WandB, CSV, Composite) - Add complete trajectory data capture including prompts, responses, rewards, and metadata - Integrate trajectory logging into SkyRLGymGenerator with configurable options - Support detokenization for readable text analysis - Add comprehensive unit and integration tests (710+ test lines) - Configure YAML-based trajectory logging with granular control options - Enable trajectory export to pandas DataFrames for analysis workflows
- Add create_trajectory_logger_from_config() factory function - Add optional trajectory_logger parameter to SkyRLGymGenerator - Remove test-specific get_collected_trajectories() method - Update config to support CSV output_dir parameter - Refactor tests to use dependency injection
Tests no longer check internal state
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable trajectory logging system for debugging RL training, with implementations for both WandB and CSV. The code is well-structured, includes new configurations, and is accompanied by a good set of unit and integration tests. My review includes suggestions for improving maintainability by using a NamedTuple for complex return types, refactoring duplicated logic, and ensuring consistency in metric calculations. I've also pointed out some minor style issues related to imports.
- Replace agent_loop return tuple with NamedTuple for better maintainability - Remove redundant 'trajectory_id' in locals() check - Add public flush_trajectories method to GeneratorInterface - Fix inconsistent response_length calculation to use token count - Move import statements to top of files per PEP 8
tyler-griggs
left a comment
There was a problem hiding this comment.
Thanks for sending this in! Left a handful of comments that are primarily based around simplifying / minimizing the implementation. I'm sure users will customize on top of these classes, so it'd be great to make them as canonical as possible.
- Remove tokenizer dependency - users provide text directly - Remove text truncation features for full trajectories - Move trajectory_logger.py from utils/ to generators/ - Update defaults: max_trajectories=-1, log_full_history=True - Remove wandb try/catch since it's already a dependency - Fix response_length calculation to use len(traj.response) - Update all imports and tests for simplified interface - Change hardcoded prefix from 'train' to 'generation' Addresses all PR review comments for simpler implementation.
|
Just pushed a much more simplified implementation. Looking back on this, I definitely over engineered this without thinking about what is actually necessary right now. |
tyler-griggs
left a comment
There was a problem hiding this comment.
Thanks for the edits! My review is primarily around making another large simplification pass.
- Remove Trajectory dataclass per review feedback - Simplify log() method to accept prompts, responses, rewards directly - Remove log_full_history configuration option - Add TODO for custom trajectory logger registry - Update factory method to use export_path for default output_dir - Assert trajectory logging is enabled in factory method
- Remove log_full_history option (always log full trajectories) - Remove log_interval option (always log every step) - Update output_dir comment to indicate default uses export_path
|
Went through and tried to make it as simple as possible! |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable trajectory logging system for debugging, with implementations for WandB and CSV. The overall structure is well-designed, including the use of a base class and a factory function for creating loggers from configuration. The addition of comprehensive unit and integration tests is also a great practice.
My review includes one critical feedback regarding a hardcoded step in the logging call, which would cause issues with data being overwritten or logged incorrectly. I've also included a couple of medium-severity suggestions to improve code reuse and performance in the logger implementations.
Overall, this is a solid contribution that will significantly improve debugging capabilities.
| class TrajectoryLogger(ABC): | ||
| """ | ||
| Abstract base class for trajectory logging. | ||
|
|
||
| TODO: Allow users to bring a custom trajectory logger. They should be able to | ||
| define their own class (outside of the skyrl-train package) and add it to | ||
| a registry (see AdvantageEstimatorRegistry for an example) so that it can | ||
| be referenced by name in the config. | ||
| """ |
There was a problem hiding this comment.
To improve maintainability and reduce code duplication, the trajectory slicing logic, which is currently duplicated in both WandbTableTrajectoryLogger and CSVTrajectoryLogger, could be moved into this base class.
You could use the template method pattern by making log a concrete method that handles slicing and then calls a new abstract method (e.g., _log) for the specific logging implementation. This would centralize the slicing logic and make the logger implementations cleaner.
There was a problem hiding this comment.
lines of duplicated slicing logic are worth it for the much cleaner user experience when implementing custom loggers
|
Merging this into a dev branch |
Added a generic TrajectoryLogger system to help debug what's happening during training. You can now export complete trajectories (prompts, responses, rewards, etc.) to see exactly what the model is doing.
Features:
Config:
I also added CSV logging and integration tests so I could test this on CPU without needing GPU dependencies - not sure if we need to keep those but they work.
Fixes #122