Replies: 1 comment
First, I assume your environment's action format is a dict, e.g., {"a": [3, 4], "b": 5.0}. The next step is to inherit from an existing policy class and override its forward function so that the return value is a Batch containing the desired action format, e.g.,

    def forward(self, ...):
        ...
        return Batch(act=Batch(a=..., b=...), ...)

Feel free to change how a and b are computed; for example, you can return a and b directly from your network's forward. And that's it.

Note: in the forward function, the actions a and b are batch data, i.e., a has a shape of |
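To make the "batch data" note concrete, here is a minimal sketch in plain NumPy rather than Tianshou itself. It assumes the leading dimension of each action array is the batch size, one row per environment; the reply does not spell out the exact shapes. The helpers batched_actions and split_per_env are hypothetical names used only for illustration. In a real policy, the dict built here would be returned from forward as Batch(act=Batch(a=a, b=b)), as described in the reply.

```python
import numpy as np


def batched_actions(obs: np.ndarray) -> dict:
    """Hypothetical stand-in for the computation inside forward().

    Produces one action per observation in the batch, using the
    example values {"a": [3, 4], "b": 5.0} from the reply.
    """
    batch_size = obs.shape[0]
    # Assumption: "a" is stacked along a leading batch dimension.
    a = np.tile(np.array([3, 4]), (batch_size, 1))  # shape (batch_size, 2)
    b = np.full(batch_size, 5.0)                    # shape (batch_size,)
    return {"a": a, "b": b}


def split_per_env(act: dict) -> list:
    """Slice the batched dict into one action dict per environment."""
    n = len(next(iter(act.values())))
    return [{key: value[i] for key, value in act.items()} for i in range(n)]


obs = np.zeros((4, 8))       # 4 environments, 8-dimensional observations
act = batched_actions(obs)
per_env = split_per_env(act)
```

Each element of per_env has the same layout as the single-environment dict the environment's step function expects, which is a quick way to check that the batched arrays returned from forward are shaped consistently.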
Any tutorials on wrapping a (multi-action) custom gym env for Tianshou?
Is there an official discussion group? I think setting up a group chat would make it easier for everyone to use.