In SkyRLGymGenerator, we post-process the trajectories to truncate them to the configured maximum trajectory length (code link). This is a bit messy.
Instead, we should update the sampling parameters' max_generate_length to be something like max_generate_length = min(max_model_len - len(chat_history), max_generate_length), where len(chat_history) is the length of the chat history in tokens. Basically, we cap max_generate_length whenever the chat history plus the generation would exceed the maximum context length. Then we can remove the messy post-processing.
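A minimal sketch of the proposed clamp, to make the arithmetic concrete. The function names, the `num_history_tokens` argument, and the dict-shaped sampling parameters are assumptions for illustration, not the actual SkyRLGymGenerator code. The floor at zero is also an addition here: it covers the case where the history alone already fills the context.

```python
def clamp_max_generate_length(
    max_model_len: int,
    num_history_tokens: int,
    max_generate_length: int,
) -> int:
    """Return the generation budget that still fits in the context window.

    Hypothetical helper: implements
    min(max_model_len - len(chat_history), max_generate_length),
    floored at 0 so the result is never negative.
    """
    remaining = max_model_len - num_history_tokens
    return max(0, min(remaining, max_generate_length))


def with_clamped_budget(
    sampling_params: dict,
    max_model_len: int,
    num_history_tokens: int,
) -> dict:
    """Return a copy of sampling_params with max_generate_length clamped.

    Assumes sampling_params is a plain dict with a "max_generate_length" key;
    the input dict is left unmodified.
    """
    updated = dict(sampling_params)
    updated["max_generate_length"] = clamp_max_generate_length(
        max_model_len,
        num_history_tokens,
        sampling_params["max_generate_length"],
    )
    return updated
```

With this in place, a result of 0 would signal that the trajectory has no room left and generation should stop, which is the case the current post-processing handles after the fact.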