-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Trainers as generators #559
Add Trainers as generators #559
Conversation
…o a generator that yields a 3-tuple (epoch, stats, info) of train results on every epoch.
Add tests for trainer generators.
Nice suggestion! But could you please remove the duplicated code in a single trainer file? I'm thinking about the following approach (or something similar): class Trainer:
def __init__(self, ...):
...
def __iter__(self, ...):
...
def run(self):
...
onpolicy_trainer = lambda *args, **kwargs: Trainer(*args, **kwargs).run()
onpolicy_trainer_gen = Trainer # or another name |
Hi @Trinkle23897 lets see how looks offline.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great work!
And is it possible to create BaseTrainer to further reduce code duplication? |
…aner less code keeping the same functionality
…he sketch in offline.py
In the same line I just wanted to ask, what is specifically the difference
between off policy and on policy?
El dom., 6 mar. 2022 19:29, Jiayi Weng ***@***.***> escribió:
… And is it possible to create BaseTrainer to further reduce code
duplication?
—
Reply to this email directly, view it on GitHub
<#559 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AA3NNMZ6IDQIPNGEK75S3FTU6T2RDANCNFSM5QADHE7A>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Now pushed the OffPolicyTrainer |
W.r.t common things of the three Trainers. The init it is very the same except from few things It is just the next that is different, however it has things in common. |
* All the procedures are so equal that separating them will make to much unnecessary duplicated complex code * Included tests in test_ppo.py, test_cql.py and test_rd3.py * It can be simplified even more, but would break backward Api compatibility
…ers_as_generators
Hi man @Trinkle23897 I think I have done what I can. https://github.com/jamartinh/tianshou/runs/5469774938?check_suite_focus=true#step:7:27 But I cannot go further, so help is needed if this refactoring is usefull. Thanks, |
* Simplify return logic
* Simplify return logic
* Simplify return logic
Codecov Report
@@ Coverage Diff @@
## master #559 +/- ##
==========================================
- Coverage 93.62% 93.50% -0.12%
==========================================
Files 64 65 +1
Lines 4392 4419 +27
==========================================
+ Hits 4112 4132 +20
- Misses 280 287 +7
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
@Trinkle23897 RHEL 7 test_sac_with_il.py::test_sac_with_il
============================== 1 passed in 35.95s ==============================
Process finished with exit code 0
PASSED [100%]Epoch #1: test_reward: -480.222050 ± 144.561932, best_reward: -480.222050 ± 144.561932 in #1
Epoch #2: test_reward: -223.095769 ± 115.835893, best_reward: -223.095769 ± 115.835893 in #2
Epoch #1: 75%|#######4 | 17991/24000 [00:24<00:08, 725.82it/s, alpha=0.634, env_step=17990, len=200, loss/actor=115.555, loss/alpha=-0.534, loss/critic1=14.512, loss/critic2=17.633, n/ep=0, n/st=10, rew=-511.42]
Epoch #1: 501it [00:00, 506.55it/s, env_step=500, len=0, loss=0.004, n/ep=0, n/st=10, rew=0.00]
Epoch #2: 501it [00:00, 576.68it/s, env_step=1000, len=0, loss=0.001, n/ep=0, n/st=10, rew=0.00]
|
* Simplify return logic
* Simplify return logic
@Trinkle23897 |
@Trinkle23897 Ok now what? |
Sorry about the delay because I have a deadline this evening and another deadline two days later. I'll have a look right after finishing those tasks. |
2fd889f
to
7a00daf
Compare
The new proposed feature is to have trainers as generators. The usage pattern is: ```python trainer = OnPolicyTrainer(...) for epoch, epoch_stat, info in trainer: print(f"Epoch: {epoch}") print(epoch_stat) print(info) do_something_with_policy() query_something_about_policy() make_a_plot_with(epoch_stat) display(info) ``` - epoch int: the epoch number - epoch_stat dict: a large collection of metrics of the current epoch, including stat - info dict: the usual dict out of the non-generator version of the trainer You can even iterate on several different trainers at the same time: ```python trainer1 = OnPolicyTrainer(...) trainer2 = OnPolicyTrainer(...) for result1, result2, ... in zip(trainer1, trainer2, ...): compare_results(result1, result2, ...) ``` Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
The new proposed feature is to have trainers as generators.
The usage pattern is like:
You can even iterate on several different trainers at the same time:
make format
(required)make commit-checks
(required)