-
Notifications
You must be signed in to change notification settings - Fork 255
Description
It is very useful to dig into the complete generated trajectories (including model outputs and environment observations / feedback) to debug model behavior during training. We should support exporting trajectories in an easily-readable format.
At minimum, we should support dumping the chat history as a table to wandb.
A better solution is likely creating a generic TrajectoryLogger class that is given complete trajectories and can dump / export them in a user-customizable way. Some users simply want to read the prompts and responses (e.g., as a table in wandb), others may want to create a dataframe and do some data analysis in a notebook -- a generic TrajectoryLogger class would support this.
To start, the class could exposes a single method, log, that takes as input a trajectory in ConversationType data type. But, it is possible (and likely) that users will want more flexibility in what they pass into the class, so this is worth considering.
TODOs
- Create generic TrajectoryLogger class
- Create a WandbTableTrajectoryLogger instantiation of this class, which simply uploads a table of prompts and responses
- Use the WandbTableTrajectoryLogger in SkyRLGymGenerator (and cover with flag) to add trajectory logging to wandb for all training runs that use the skyrl gym generator. One tricky detail I anticipate is handling de-tokenization of trajecotries so they are uploaded in plaintext.