Merged
52 commits
f167c4d
Add reinforcement learning example illustrating gpu-to-gpu RDT and GRPO.
Oct 21, 2025
60b43cc
Simpify blocking for generator update
Oct 21, 2025
b8055e5
polish
Oct 21, 2025
aff3be9
polish
Oct 21, 2025
18a7ace
Rename constant
Oct 22, 2025
3c474ae
Explain total variable.
Oct 22, 2025
35f4ed4
PR feedback.
Oct 22, 2025
00860d4
PR feedback.
Oct 22, 2025
96f8444
PR feedback.
Oct 22, 2025
f179cac
Simplify tqdm loop
Oct 22, 2025
b58284a
PR feedback.
Oct 22, 2025
f5f3094
Remap ACTION_DIM -> GROUP_SIZE
Oct 22, 2025
728d453
Turn all comments into full sentences.
Oct 22, 2025
93dee0e
clarify comment
Oct 22, 2025
f994dec
remove unnecessary nixl decorator
Oct 22, 2025
efc2bb7
typos
Oct 22, 2025
4e0c1c3
simplify
Oct 22, 2025
2d0ddee
Remove the SignalActor since the actors are no longer async.
Oct 22, 2025
553a9d2
Improve comments.
Oct 22, 2025
f392d2d
Organized constants.
Oct 22, 2025
1a85bc5
organize constants
Oct 22, 2025
208a1c3
PR feedback.
Oct 22, 2025
77f66aa
Prevent memory leaks.
Oct 22, 2025
41249be
Expand on replay buffer docstring
Oct 22, 2025
9fdbc73
Fix algo name
Oct 22, 2025
0f4b7c8
Handle race condition with replay buffer and learner
Oct 22, 2025
c627914
Drop EMA reference model and KL loss term.
Oct 22, 2025
5d913f6
became one with the network until it converged nicely
Oct 25, 2025
444ee87
drop ema teacher weights
Oct 25, 2025
34bc1cc
lint
Oct 25, 2025
8123573
coerce floats
Oct 25, 2025
65e3755
clarify comment; lint
Oct 25, 2025
bc4810c
Apply suggestion from @stephanie-wang
stephanie-wang Oct 25, 2025
2c3a993
Apply suggestion from @stephanie-wang
stephanie-wang Oct 25, 2025
68e088f
Apply suggestion from @stephanie-wang
stephanie-wang Oct 25, 2025
512665b
Update grpo_contextual_bandits.py
stephanie-wang Oct 25, 2025
f3d30a1
Update grpo_contextual_bandits.py
stephanie-wang Oct 26, 2025
b471d42
Update grpo_contextual_bandits.py
stephanie-wang Oct 26, 2025
10306d3
drop lr scheduler; didn't make a big difference in loss
Oct 27, 2025
e7ce235
convert Scorer to CPU-only actor
Oct 27, 2025
d7abae7
revert change to policy version so that the first batch has >0 weight
Oct 27, 2025
4311cf8
fix error caused by sample_from when ReplayBuffer is empty
Oct 27, 2025
105f712
add note about single threaded actors
Oct 27, 2025
badad92
lint
Oct 27, 2025
cf0887a
drop policy version from learner
Oct 27, 2025
d7dcb57
fix indentation in metrics reporting
Oct 27, 2025
cb06a0e
rewrite while loop to avoid an extra call at each train step
Oct 27, 2025
cd9c8b3
lint
Oct 27, 2025
097fecc
Improve comments
Oct 29, 2025
4c84547
Revert accidental revert.
Oct 29, 2025
3c5ae46
lint
Oct 29, 2025
d403fcc
Merge branch 'master' into doc/rl-rdt-contextual-bandits
Qiaolin-Yu Oct 30, 2025