[SkyRLGymGenerator] Cleaner generation length handling #279

In SkyRLGymGenerator, we post-process the generated trajectories to truncate them to the configured maximum trajectory length ([code link](https://github.com/NovaSky-AI/SkyRL/blob/cbd0680d7572f3045f68e0de68d78eb1f947364d/skyrl-train/skyrl_train/generators/skyrl_gym_generator.py#L285-L286)). This after-the-fact truncation step is messy.

Instead, we should update the `max_generate_length` sampling parameter before each generation call to something like `max_generate_length = min(max_model_len - len(chat_history), max_generate_length)`, where `len(chat_history)` is the length of the current chat history in tokens. In other words, we cap `max_generate_length` so that the prompt plus the generated tokens never exceed the model's maximum context length. We can then remove the messy post-processing.
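A minimal sketch of the proposed capping, assuming the chat history has already been tokenized and the sampling parameters are held in a plain dict. The helper names `capped_max_generate_length` and `with_capped_length` are hypothetical and do not come from the SkyRL codebase.

```python
from typing import Any, Dict, List


def capped_max_generate_length(
    prompt_token_ids: List[int],
    max_model_len: int,
    max_generate_length: int,
) -> int:
    """Hypothetical helper: cap the generation budget so that
    prompt + generation never exceeds the model's context window."""
    remaining = max_model_len - len(prompt_token_ids)
    # If the chat history already fills the context, there is no room left,
    # so clamp at 0 instead of returning a negative budget.
    return max(0, min(remaining, max_generate_length))


def with_capped_length(
    sampling_params: Dict[str, Any],
    prompt_token_ids: List[int],
    max_model_len: int,
) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of the sampling params with
    `max_generate_length` capped; the original dict is left unchanged."""
    updated = dict(sampling_params)
    updated["max_generate_length"] = capped_max_generate_length(
        prompt_token_ids, max_model_len, sampling_params["max_generate_length"]
    )
    return updated


# Usage: a 3000-token chat history in a 4096-token context leaves room
# for at most 1096 new tokens, even though 2048 were requested.
params = with_capped_length(
    {"max_generate_length": 2048, "temperature": 1.0},
    prompt_token_ids=[0] * 3000,
    max_model_len=4096,
)
print(params["max_generate_length"])  # 1096
```

Because the cap is recomputed from the current chat history before every call, the budget shrinks naturally as a multi-turn trajectory grows. A budget of 0 probably means the trajectory should end rather than trigger another generation call; how the generator handles that case is an open design choice.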
