@ehsk ehsk commented Dec 3, 2025

Goal: Reproduce previously produced reward curves on ChartQA

| Old | New (this PR) |
| --- | --- |
| image | image |
| Achieved more than 80 after around 30 steps | Same results, and the crash after 40 steps is fixed; before (orange) vs. after (green) |
  • streams=redis (switching to files substantially slows down the actor because the rollouts grow very large). Added max_stream_size to cap the size of the Redis queue.
  • Added a freeze_vision_encoder option (not used for the results above).
  • Minor improvements in launch.py
  • Batch size can be greater than 1:

| x-axis = optimizer steps | x-axis = time |
| --- | --- |
| image | image |
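
For readers unfamiliar with capped streams, here is a rough sketch of the idea behind max_stream_size. This PR description does not say how the limit is enforced. One common way to bound a Redis stream is the MAXLEN option of XADD, which evicts the oldest entries once the cap is reached. The snippet only mimics that trimming behaviour in memory with the standard library. `CappedStream`, `add`, and `read` are hypothetical names for illustration, not part of this repository or of Redis.

```python
from collections import deque


class CappedStream:
    """In-memory stand-in for a length-capped stream (hypothetical helper).

    Assumption: the cap evicts the oldest entries, similar to XADD ... MAXLEN.
    The actual max_stream_size behaviour in this PR may differ.
    """

    def __init__(self, max_stream_size: int):
        if max_stream_size <= 0:
            raise ValueError("max_stream_size must be positive")
        # deque(maxlen=...) silently drops the oldest item when full.
        self._entries = deque(maxlen=max_stream_size)
        self._next_id = 0

    def add(self, payload):
        """Append a payload and return its monotonically increasing id."""
        entry_id = self._next_id
        self._next_id += 1
        self._entries.append((entry_id, payload))
        return entry_id

    def read(self, after_id=-1):
        """Return all retained (id, payload) pairs with id > after_id."""
        return [(i, p) for i, p in self._entries if i > after_id]

    def __len__(self):
        return len(self._entries)
```

With a cap of 3, adding five rollouts keeps only the three most recent, so memory stays bounded even when individual rollouts are large. The trade-off is that a slow consumer can miss evicted entries.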

@ehsk changed the title from "Multimodal tweaks" to "Enhance multi-modal support" Dec 3, 2025
@ehsk mentioned this pull request Dec 23, 2025