Background:
I am building a custom Gymnasium environment for the Battle City game and training an agent with Stable Baselines3's Proximal Policy Optimization (PPO) implementation. To bootstrap the learning process, I have written a rule-based bot that reliably wins the game.
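For reference, the bot exposes a single-method interface along these lines (the class and method names here are illustrative, not actual library API):

```python
import numpy as np

class RuleBasedBot:
    """Hypothetical interface for my scripted bot: maps an observation
    to a discrete action via hand-written rules."""

    def act(self, obs: np.ndarray) -> int:
        # Hand-coded heuristics (aiming, dodging, etc.) live here;
        # a fixed action stands in for them in this sketch.
        return 0
```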
Issue:
The PPO agent learns poorly. I tried exposing the bot's suggested action to the agent and adding a reward bonus whenever the agent mimics that suggestion, but the results have been suboptimal. My initial impression was that this library could learn by imitating my custom bot directly; however, after reviewing the examples, it appears that demonstrations are expected to come from another trained model rather than from arbitrary code. A sketch of my current reward-shaping setup follows.
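Here is a stripped-down version of that setup (names and the bonus value are illustrative; in my actual environment the bot's suggestion is folded into the observation vector, but for brevity this sketch surfaces it via `info`):

```python
import gymnasium as gym

class BotHintWrapper(gym.Wrapper):
    """Hypothetical wrapper: expose the bot's suggested action and add a
    small shaping bonus whenever the agent mimics it."""

    def __init__(self, env, bot, match_bonus=0.1):
        super().__init__(env)
        self.bot = bot  # rule-based bot with the .act(obs) interface above
        self.match_bonus = match_bonus
        self._bot_action = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._bot_action = self.bot.act(obs)
        info["bot_action"] = self._bot_action
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if action == self._bot_action:  # shaping bonus for mimicry
            reward += self.match_bonus
        self._bot_action = self.bot.act(obs)  # suggestion for the next step
        info["bot_action"] = self._bot_action
        return obs, reward, terminated, truncated, info
```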
Inquiry:
I seek clarification on the library's capabilities in this context:
Does the current implementation only support learning from another model, rather than a custom rule-based bot?
If so, I would like to propose support for it as a feature request: the ability to use actions from a custom rule-based bot for imitation learning would be a valuable addition to this library.
Alternative Request:
If my interpretation is incorrect and the library does support learning from a bot's actions, I would greatly appreciate a simple example of how to train the PPO agent from my bot's actions within this framework. Below is a sketch of the kind of workflow I am hoping for.
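For concreteness, this is roughly what I imagined, adapted from the documented behavioral-cloning quickstart. I am assuming here that `rollout.rollout` accepts an arbitrary callable in place of a trained policy (recent versions appear to type it as `(obs, state, episode_start) -> (acts, state)`, though this may differ across versions), and the environment id and bot class are my own, not part of the library:

```python
import gymnasium as gym
import numpy as np
from stable_baselines3.common.vec_env import DummyVecEnv
from imitation.algorithms import bc
from imitation.data import rollout

rng = np.random.default_rng(0)
venv = DummyVecEnv([lambda: gym.make("BattleCity-v0")])  # hypothetical env id
bot = RuleBasedBot()  # the rule-based bot sketched above

def bot_policy(obs, state, episode_starts):
    # Query the bot for each parallel environment in the batch.
    acts = np.array([bot.act(o) for o in obs])
    return acts, None

# Roll the bot out in the environment to collect demonstrations.
trajectories = rollout.rollout(
    bot_policy,
    venv,
    rollout.make_sample_until(min_episodes=50),
    rng=rng,
)
transitions = rollout.flatten_trajectories(trajectories)

# Behavioral cloning on the bot's demonstrations; the resulting
# policy could then be used to warm-start PPO.
bc_trainer = bc.BC(
    observation_space=venv.observation_space,
    action_space=venv.action_space,
    demonstrations=transitions,
    rng=rng,
)
bc_trainer.train(n_epochs=10)
```

If something along these lines is already supported, a pointer to the relevant documentation would be enough.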
Looking forward to your response and guidance on this matter.