* p2e dv1 and p2e dv2 split into exploration and finetuning
* fix: exploration amount
* fix: change actor from exploration to task when starting training
* fix: from __future__ import annotations
* fix: exploration amount
* docs: added p2e readme
* Feature/p2e dv3 (#113)
* feat: implemented p2e_dv3
* feat: added the possibility to have more critics for exploration
* tests: added p2e_dv3 test
* docs: update p2e_dv3 docs
* docs: update
* fix: p2e_dv3 refactoring
* fix: checkpoint
* Fix missing 0.5 value
* feat: add validate args to p2e_dv3
* feat: uniform p2e_dv3 with last improvements
* fix: ppo tests
* feat: split exploration and finetuning
* fix: resume from checkpoint controls
* fix: bugs
* tests: added p2e_dv3 and resume from checkpoint tests
* fix: p2e dv3 resume from checkpoint
* tests: update p2e dv3 test
* feat: added p2e_dv3 evaluation
* fix: evaluate and __init__
* fix: cli controls
* fix: added detach() when learning world model in exploration
* fix: checks in cli
* fix: exploration amount
* fix: removed minedojo test cfgs

---------

Co-authored-by: belerico_t <federico.belotti@orobix.com>

* fix: buffer load

---------

Co-authored-by: belerico_t <federico.belotti@orobix.com>
1 parent 933f4b9, commit 9b68f22. Showing 47 changed files with 4,286 additions and 387 deletions.
@@ -0,0 +1,49 @@
# Plan2Explore

## Algorithm Overview

The Plan2Explore algorithm is designed to efficiently learn and exploit the dynamics of the environment for accomplishing multiple tasks. The algorithm employs two actors: one for exploration and one for learning the task. During the exploratory phase, the exploration actor focuses on discovering new states by selecting actions that lead to unexplored regions. Simultaneously, the task actor learns from the experiences gathered by the exploration actor in a zero-shot manner. Following the exploration phase, the agent can be fine-tuned with experiences collected by the task actor in a few-shot fashion, enhancing its performance on specific tasks.
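
The README does not state how novelty is measured. In the original Plan2Explore formulation, the exploration reward is the disagreement among an ensemble of one-step predictors; the sketch below illustrates that idea in plain Python. The function name, the list-of-lists layout, and the use of mean per-feature variance are assumptions made for illustration, not code from this repository.

```python
from __future__ import annotations

from statistics import pvariance
from typing import Sequence


def disagreement_reward(predictions: Sequence[Sequence[float]]) -> float:
    """Mean per-feature variance across ensemble members.

    predictions[k][d] is feature d as predicted by ensemble member k
    (hypothetical layout, chosen only for this illustration).
    """
    if len(predictions) < 2:
        raise ValueError("need at least two ensemble members")
    dims = len(predictions[0])
    if dims == 0 or any(len(p) != dims for p in predictions):
        raise ValueError("predictions must be non-empty and equally sized")
    # Variance over members for each feature, then averaged over features.
    per_feature = [pvariance([p[d] for p in predictions]) for d in range(dims)]
    return sum(per_feature) / dims


# Members agree on a familiar state and disagree on a novel one.
familiar = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
novel = [[0.0, 2.0], [1.0, 4.0], [2.0, 0.0]]

print(disagreement_reward(familiar))  # no disagreement, so no intrinsic reward
print(disagreement_reward(novel))     # larger value: worth exploring
```

Under this assumption, an exploration actor trained to maximize the reward is drawn toward states where the learned dynamics are still uncertain.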

## Implementation Details

### Scripts

The algorithm implementation is organized into two scripts:

1. **Exploration Script (`p2e_dv1_exploration.py`):**
   - Used for the exploratory phase to learn the dynamics of the environment.
   - Trains the exploration actor to select actions leading to new states.

2. **Fine-tuning Script (`p2e_dv1_finetuning.py`):**
   - Utilized for fine-tuning the agent after the exploration phase.
   - Starts with a trained agent and refines its performance or learns new tasks.

### Configuration Constraints

To ensure the proper functioning of the algorithm, the following constraints must be observed:

- **Environment Configuration:** The fine-tuning must be executed with the same environment configurations used during exploration.

- **Hyper-parameter Consistency:** Hyper-parameters of the agent should remain consistent between the exploration and fine-tuning phases.
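
As a rough illustration of the two constraints above, the sketch below compares two nested configuration dictionaries and reports the keys that differ. The section names `env` and `algo`, the helper names, and the example values are hypothetical; the actual configuration layout is not shown in this README.

```python
from __future__ import annotations

from typing import Any, Mapping, Sequence

_MISSING = object()  # marks a key that exists in only one of the two configs


def flatten(cfg: Mapping[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a nested mapping into dotted keys, e.g. {"env": {"id": 1}} -> {"env.id": 1}."""
    flat: dict[str, Any] = {}
    for key, value in cfg.items():
        name = f"{prefix}{key}"
        if isinstance(value, Mapping):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def config_mismatches(
    exploration: Mapping[str, Any],
    finetuning: Mapping[str, Any],
    sections: Sequence[str] = ("env", "algo"),  # hypothetical section names
) -> list[str]:
    """Return the dotted keys under `sections` whose values differ between phases."""
    a = flatten({s: exploration.get(s, {}) for s in sections})
    b = flatten({s: finetuning.get(s, {}) for s in sections})
    return sorted(
        k for k in a.keys() | b.keys()
        if a.get(k, _MISSING) != b.get(k, _MISSING)
    )


# Made-up example values, for illustration only.
exploration_cfg = {"env": {"id": "walker_walk", "action_repeat": 2}, "algo": {"horizon": 15}}
finetuning_cfg = {"env": {"id": "walker_walk", "action_repeat": 4}, "algo": {"horizon": 15}}

print(config_mismatches(exploration_cfg, finetuning_cfg))
```

Restricting the comparison to selected sections leaves room for settings that are expected to change between phases, such as `buffer.load_from_exploration` or `player.actor_type`.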

### Experience Collection

The implementation supports flexibility in experience collection during fine-tuning:

- **Buffer Options:** Fine-tuning can start from the buffer collected during exploration or a new one (`buffer.load_from_exploration` parameter).

- **Initial Experiences:** If using a new buffer, users can decide whether to collect initial experiences (until `learning_start`) with the `actor_exploration` or the `actor_task`. This is controlled by the `player.actor_type` parameter, which can be either `exploration` or `task`. After `learning_start`, only the `actor_task` collects experiences.

> **Note**
>
> When exploring, the only valid choice of the `player.actor_type` parameter is `exploration`.
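
The rules above can be summarized in a small sketch. It models only the new-buffer case; the function names, the `phase` argument, and the strict `step < learning_start` boundary are assumptions, not the repository's implementation.

```python
from __future__ import annotations

VALID_PHASES = ("exploration", "finetuning")  # hypothetical phase labels
VALID_ACTOR_TYPES = ("exploration", "task")   # values of `player.actor_type`


def validate_actor_type(phase: str, actor_type: str) -> None:
    """Reject combinations that the README describes as invalid."""
    if phase not in VALID_PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    if actor_type not in VALID_ACTOR_TYPES:
        raise ValueError(f"player.actor_type must be one of {VALID_ACTOR_TYPES}")
    if phase == "exploration" and actor_type != "exploration":
        raise ValueError("when exploring, player.actor_type must be 'exploration'")


def acting_actor(phase: str, actor_type: str, step: int, learning_start: int) -> str:
    """Return which actor collects experience at `step`."""
    validate_actor_type(phase, actor_type)
    if phase == "exploration":
        return "exploration"
    # Fine-tuning with a new buffer: the configured actor collects the
    # initial experiences, then the task actor takes over.
    # Assumption: the switch happens exactly at step == learning_start.
    return actor_type if step < learning_start else "task"


print(acting_actor("finetuning", "exploration", step=10, learning_start=100))
print(acting_actor("finetuning", "exploration", step=100, learning_start=100))
```

Validating the combination up front mirrors the note above: a misconfigured exploration run fails immediately instead of silently collecting data with the wrong actor.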

## Usage

To use the Plan2Explore framework, follow these steps:

1. Run the exploration script to learn the dynamics of the environment.
2. Execute the fine-tuning script with the same environment configurations and consistent hyper-parameters.

> **Note**
>
> Choose whether to start fine-tuning from the exploration buffer or create a new buffer, and specify the actor for initial experience collection accordingly.