Add generic trajectory logging for debugging RL training by adigoel4 · Pull Request #174 · NovaSky-AI/SkyRL

adigoel4 · 2025-08-20T23:33:34Z

Added a generic TrajectoryLogger system to help debug what's happening during training. You can now export complete trajectories (prompts, responses, rewards, etc.) to see exactly what the model is doing.

Features:

Base TrajectoryLogger class with WandB and CSV implementations
Integrated into SkyRLGymGenerator using dependency injection
Trainer automatically logs trajectories at configurable intervals during training/eval
Configurable text truncation and trajectory limits to avoid spamming logs
Handles tokenization properly so everything exports as readable text

Config:

generator:
  trajectory_logging:
    enabled: true
    type: "wandb"  # or "csv"
    max_trajectories: 10
    log_interval: 100
    max_text_length: 2000
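
As a rough illustration of how the knobs in this config could interact, here is a minimal sketch. It is not the code from this PR: the `TrajectoryLoggingConfig` dataclass and the `should_log`, `truncate`, and `select_rows` helpers are hypothetical names, and the behavior shown (modulo-based interval, character-based truncation) is an assumption about what the options mean.

```python
from dataclasses import dataclass


@dataclass
class TrajectoryLoggingConfig:
    """Hypothetical mirror of the generator.trajectory_logging block."""
    enabled: bool = True
    type: str = "wandb"  # or "csv"
    max_trajectories: int = 10
    log_interval: int = 100
    max_text_length: int = 2000


def should_log(cfg: TrajectoryLoggingConfig, step: int) -> bool:
    """Assumed semantics: log only when enabled and on every log_interval-th step."""
    return cfg.enabled and step % cfg.log_interval == 0


def truncate(text: str, max_len: int) -> str:
    """Clip long text so a single trajectory cannot flood the log."""
    if len(text) <= max_len:
        return text
    return text[:max_len] + "..."


def select_rows(cfg, prompts, responses, rewards):
    """Keep at most max_trajectories rows and truncate the text fields."""
    rows = list(zip(prompts, responses, rewards))[: cfg.max_trajectories]
    return [
        (truncate(p, cfg.max_text_length), truncate(r, cfg.max_text_length), rew)
        for p, r, rew in rows
    ]
```

With `max_trajectories: 10` and `log_interval: 100`, a sketch like this would emit at most ten truncated rows once every hundred steps.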

I also added CSV logging and integration tests so I could test this on CPU without needing GPU dependencies - not sure if we need to keep those but they work.

Fixes #122

- Implement flexible TrajectoryLogger with multiple backends (WandB, CSV, Composite)
- Add complete trajectory data capture including prompts, responses, rewards, and metadata
- Integrate trajectory logging into SkyRLGymGenerator with configurable options
- Support detokenization for readable text analysis
- Add comprehensive unit and integration tests (710+ test lines)
- Configure YAML-based trajectory logging with granular control options
- Enable trajectory export to pandas DataFrames for analysis workflows

- Add create_trajectory_logger_from_config() factory function
- Add optional trajectory_logger parameter to SkyRLGymGenerator
- Remove test-specific get_collected_trajectories() method
- Update config to support CSV output_dir parameter
- Refactor tests to use dependency injection

Tests no longer check internal state

gemini-code-assist

Code Review

This pull request introduces a valuable trajectory logging system for debugging RL training, with implementations for both WandB and CSV. The code is well-structured, includes new configurations, and is accompanied by a good set of unit and integration tests. My review includes suggestions for improving maintainability by using a NamedTuple for complex return types, refactoring duplicated logic, and ensuring consistency in metric calculations. I've also pointed out some minor style issues related to imports.

skyrl-train/skyrl_train/generators/skyrl_gym_generator.py

skyrl-train/skyrl_train/trainer.py

skyrl-train/skyrl_train/utils/trajectory_logger.py

skyrl-train/tests/cpu/test_trajectory_logging_integration.py

- Replace agent_loop return tuple with NamedTuple for better maintainability
- Remove redundant 'trajectory_id' in locals() check
- Add public flush_trajectories method to GeneratorInterface
- Fix inconsistent response_length calculation to use token count
- Move import statements to top of files per PEP 8
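
To illustrate the first and fourth items above, here is a minimal sketch of returning a NamedTuple instead of a bare tuple. The actual agent_loop signature and its fields are not shown in this thread, so `AgentLoopOutput`, its field names, and `agent_loop_stub` are hypothetical.

```python
from typing import List, NamedTuple


class AgentLoopOutput(NamedTuple):
    """Hypothetical named return type; the field names are illustrative only."""
    response_ids: List[int]
    reward: float
    stop_reason: str


def agent_loop_stub() -> AgentLoopOutput:
    # Stand-in for the real agent loop, which is not shown in this thread.
    return AgentLoopOutput(response_ids=[11, 12, 13], reward=1.0, stop_reason="stop")


out = agent_loop_stub()
# Fields are read by name, so callers do not depend on positional order
# and adding a field later does not silently shift existing values.
response_length = len(out.response_ids)  # token count, not character count
```

A NamedTuple still unpacks and compares like a plain tuple, so existing call sites that unpack by position keep working while new ones read fields by name.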

tyler-griggs

Thanks for sending this in! Left a handful of comments that are primarily based around simplifying / minimizing the implementation. I'm sure users will customize on top of these classes, so it'd be great to make them as canonical as possible.

skyrl-train/skyrl_train/config/ppo_base_config.yaml

skyrl-train/skyrl_train/generators/base.py

skyrl-train/skyrl_train/trainer.py

skyrl-train/skyrl_train/utils/trajectory_logger.py

- Remove tokenizer dependency - users provide text directly
- Remove text truncation features for full trajectories
- Move trajectory_logger.py from utils/ to generators/
- Update defaults: max_trajectories=-1, log_full_history=True
- Remove wandb try/catch since it's already a dependency
- Fix response_length calculation to use len(traj.response)
- Update all imports and tests for simplified interface
- Change hardcoded prefix from 'train' to 'generation'

Addresses all PR review comments for simpler implementation.

adigoel4 · 2025-08-27T19:49:51Z

Just pushed a much simpler implementation. Looking back on this, I definitely over-engineered it without thinking about what is actually necessary right now.

tyler-griggs

Thanks for the edits! My review is primarily around making another large simplification pass.

skyrl-train/skyrl_train/generators/skyrl_gym_generator.py

skyrl-train/skyrl_train/config/ppo_base_config.yaml

skyrl-train/skyrl_train/generators/skyrl_gym_generator.py

skyrl-train/skyrl_train/generators/trajectory_logger.py

- Remove Trajectory dataclass per review feedback
- Simplify log() method to accept prompts, responses, rewards directly
- Remove log_full_history configuration option
- Add TODO for custom trajectory logger registry
- Update factory method to use export_path for default output_dir
- Assert trajectory logging is enabled in factory method

- Remove log_full_history option (always log full trajectories)
- Remove log_interval option (always log every step)
- Update output_dir comment to indicate default uses export_path
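
The simplified shape described in these commits might look roughly like the sketch below: a `log()` that takes prompts, responses, and rewards directly, plus a factory that asserts logging is enabled and falls back to `export_path` for the output directory. This is an assumption-laden illustration, not the merged code. The `make_trajectory_logger` name, the dict-based config, the `trajectories.csv` file name, and the column names are all hypothetical.

```python
import csv
import os
from abc import ABC, abstractmethod
from typing import List


class TrajectoryLogger(ABC):
    """Base class: one log() call per batch of already-decoded text."""

    def __init__(self, max_trajectories: int = -1):
        self.max_trajectories = max_trajectories  # -1 means no limit

    @abstractmethod
    def log(self, prompts: List[str], responses: List[str], rewards: List[float]) -> None:
        ...


class CSVTrajectoryLogger(TrajectoryLogger):
    def __init__(self, output_dir: str, max_trajectories: int = -1):
        super().__init__(max_trajectories)
        os.makedirs(output_dir, exist_ok=True)
        self.path = os.path.join(output_dir, "trajectories.csv")  # assumed file name

    def log(self, prompts, responses, rewards):
        rows = list(zip(prompts, responses, rewards))
        if self.max_trajectories >= 0:
            rows = rows[: self.max_trajectories]
        write_header = not os.path.exists(self.path)
        with open(self.path, "a", newline="") as f:
            writer = csv.writer(f)
            if write_header:
                writer.writerow(["prompt", "response", "reward"])
            writer.writerows(rows)


def make_trajectory_logger(cfg: dict, export_path: str) -> TrajectoryLogger:
    """Hypothetical factory: requires enabled=True, defaults output_dir to export_path."""
    assert cfg.get("enabled"), "trajectory logging must be enabled"
    output_dir = cfg.get("output_dir") or export_path
    if cfg.get("type") == "csv":
        return CSVTrajectoryLogger(output_dir, cfg.get("max_trajectories", -1))
    raise ValueError(f"unsupported logger type in this sketch: {cfg.get('type')!r}")
```

Only the CSV backend is sketched here so the example runs without WandB. A WandB-backed subclass would implement the same `log()` signature.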

adigoel4 · 2025-09-12T00:25:06Z

Went through and tried to make it as simple as possible!

tyler-griggs · 2025-09-12T01:45:23Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a valuable trajectory logging system for debugging, with implementations for WandB and CSV. The overall structure is well-designed, including the use of a base class and a factory function for creating loggers from configuration. The addition of comprehensive unit and integration tests is also a great practice.

My review includes one critical piece of feedback regarding a hardcoded step in the logging call, which would cause data to be overwritten or logged incorrectly. I've also included a couple of medium-severity suggestions to improve code reuse and performance in the logger implementations.

Overall, this is a solid contribution that will significantly improve debugging capabilities.

skyrl-train/skyrl_train/generators/skyrl_gym_generator.py

gemini-code-assist · 2025-09-12T01:46:46Z

skyrl-train/skyrl_train/generators/trajectory_logger.py

+class TrajectoryLogger(ABC):
+    """
+    Abstract base class for trajectory logging.
+
+    TODO: Allow users to bring a custom trajectory logger. They should be able to
+    define their own class (outside of the skyrl-train package) and add it to
+    a registry (see AdvantageEstimatorRegistry for an example) so that it can
+    be referenced by name in the config.
+    """


To improve maintainability and reduce code duplication, the trajectory slicing logic, which is currently duplicated in both WandbTableTrajectoryLogger and CSVTrajectoryLogger, could be moved into this base class.

You could use the template method pattern by making log a concrete method that handles slicing and then calls a new abstract method (e.g., _log) for the specific logging implementation. This would centralize the slicing logic and make the logger implementations cleaner.
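
For reference, the template method suggestion above could be sketched as follows. This only illustrates the pattern under stated assumptions: `max_trajectories` with -1 meaning "no limit" follows the defaults mentioned earlier in the thread, while `InMemoryTrajectoryLogger` is a hypothetical subclass used to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from typing import List


class TrajectoryLogger(ABC):
    def __init__(self, max_trajectories: int = -1):
        self.max_trajectories = max_trajectories  # -1 means no limit

    def log(self, prompts: List[str], responses: List[str], rewards: List[float]) -> None:
        """Concrete template method: slice once here, then delegate to _log."""
        if self.max_trajectories >= 0:
            n = self.max_trajectories
            prompts, responses, rewards = prompts[:n], responses[:n], rewards[:n]
        self._log(prompts, responses, rewards)

    @abstractmethod
    def _log(self, prompts: List[str], responses: List[str], rewards: List[float]) -> None:
        """Backend-specific write; receives already-sliced inputs."""


class InMemoryTrajectoryLogger(TrajectoryLogger):
    """Hypothetical backend that just collects rows in a list."""

    def __init__(self, max_trajectories: int = -1):
        super().__init__(max_trajectories)
        self.rows = []

    def _log(self, prompts, responses, rewards):
        self.rows.extend(zip(prompts, responses, rewards))
```

The trade-off is that custom loggers must override `_log` rather than `log`, which is one more convention for users to learn.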

… lines of duplicated slicing logic are worth it for the much cleaner user experience when implementing custom loggers.

skyrl-train/skyrl_train/generators/trajectory_logger.py

tyler-griggs · 2025-09-22T18:11:18Z

Merging this into a dev branch dev/trajectory_logger to be picked up after SkyRLGymGenerator refactor and clean-up.

adigoel4 added 5 commits August 20, 2025 15:33

Added helper methods for trajectory collection and flushing

3e2ba62

Tests no longer check internal state

Remove CompositeTrajectoryLogger to have focused PR, easy to add back…

2b32ba2

… later

Remove redundant tests to keep PR focused

76c3742

gemini-code-assist bot reviewed Aug 20, 2025

tyler-griggs reviewed Aug 22, 2025

adigoel4 requested a review from tyler-griggs August 27, 2025 20:40

Merge branch 'main' into trajectory-logging-clean

179699f

tyler-griggs reviewed Sep 4, 2025

adigoel4 added 6 commits September 10, 2025 04:42

Merge branch 'main' into trajectory-logging-clean

9884e4a

refactor: simplify trajectory logging config

0387f27

refactor: simplify trajectory logging tests and remove unused methods

acf7f93

Clean up and last simplification pass

02a3f03

Forgot Comma

57cdb6d

adigoel4 requested a review from tyler-griggs September 12, 2025 00:24

gemini-code-assist bot reviewed Sep 12, 2025

Fixing gemini comments, and removing step and prefix

0210e44

tyler-griggs changed the base branch from main to dev/trajectory_logger September 22, 2025 18:10

tyler-griggs merged commit 158f5de into NovaSky-AI:dev/trajectory_logger Sep 22, 2025
