(v2.2.7) - WandB integration #306

Merged: 24 commits, Mar 20, 2023
62a9427
Updated version from 2.2.6 to 2.2.7
AlejandroCN7 Mar 15, 2023
7282d1e
Added wandb extra dependency
AlejandroCN7 Mar 15, 2023
cf97a7b
Deleted mlflow code and included wandb init and params log in DRL_bat…
AlejandroCN7 Mar 15, 2023
20907d5
Deleted tensorboard_log in SB3 models and mlflow artifact stores in D…
AlejandroCN7 Mar 15, 2023
9b5bde7
Deleted mlflow and tensorboard extra dependencies
AlejandroCN7 Mar 15, 2023
40ee707
Deleted tensorboard element and added wandb in JSON example for DRL_b…
AlejandroCN7 Mar 15, 2023
1934708
Added WandBOutputFormar for callback loggers in SB3
AlejandroCN7 Mar 15, 2023
dd3b5f4
Added wandb folder to gitignore
AlejandroCN7 Mar 16, 2023
899b172
Fixed log interval parameter in DRL_battery.py JSON example and fixed…
AlejandroCN7 Mar 16, 2023
626cd16
DRL_battery.py: Added Monitor wrapper to env and wandb artifacts
AlejandroCN7 Mar 16, 2023
d605df5
Updated docs source modules
AlejandroCN7 Mar 16, 2023
2d9ba4c
Deleted old images of mlflow and tensorboard and included wandb image…
AlejandroCN7 Mar 16, 2023
e2fec9e
Updated documentation about DRL and google cloud with sinergym
AlejandroCN7 Mar 16, 2023
53f6ded
Added Stable Baselines 3 (gymnasium branch) in extra requires
AlejandroCN7 Mar 17, 2023
3752ce3
Fixed spelling and sphinx compilation
AlejandroCN7 Mar 17, 2023
e8fcb50
Added artifact_name and artifact_tag to DRL_battery JSON and adapted …
AlejandroCN7 Mar 17, 2023
d27c664
Adapted load_agent.py to new JSON format and created JSON example (wi…
AlejandroCN7 Mar 17, 2023
61cac07
Deleted stable baselines 3 from gymnasium PR extra dependency
AlejandroCN7 Mar 20, 2023
617a676
Fixed DRL_battery wandb artifact definition
AlejandroCN7 Mar 20, 2023
eea60c8
Updated DRL_battery.py model save path and wandb artifact
AlejandroCN7 Mar 20, 2023
c6abb2e
Updated load_agent evaluation name, updated wandb init_params, fixed …
AlejandroCN7 Mar 20, 2023
f9e8c89
updated structure documentation for load_agent.py
AlejandroCN7 Mar 20, 2023
437f6ac
extra require SB3 latest stable version and SB3 tests deactivated tem…
AlejandroCN7 Mar 20, 2023
b601ca0
Added wandb dependency to test container workflow
AlejandroCN7 Mar 20, 2023
5 changes: 4 additions & 1 deletion .gitignore
@@ -49,4 +49,7 @@ dist/
#coverage
.coverage
codecov
coverage.xml
coverage.xml

#wandb
wandb/
Binary file removed docs/source/_static/mlflow_example.png
Binary file removed docs/source/_static/tensorboard_example.png
Binary file added docs/source/_static/wandb_example1.png
Binary file added docs/source/_static/wandb_example2.png
Binary file added docs/source/_static/wandb_example3.png
125 changes: 91 additions & 34 deletions docs/source/pages/deep-reinforcement-learning.rst
@@ -49,13 +49,30 @@ about how information is extracted which is why its implementation.
``sinergym_logger`` attribute in constructor.

``LoggerCallback`` inherits from Stable Baselines 3 ``BaseCallback`` and
uses `Tensorboard <https://www.tensorflow.org/tensorboard?hl=es-419>`__ on the
background at the same time. With *Tensorboard*, it's possible to visualize all DRL
training in real time and compare between different executions. This is an example:
uses `Weights & Biases <https://wandb.ai/site>`__ (*wandb*) in the background to host
all the extracted information. With *wandb*, it's possible to track and visualize all DRL
training in real time, register hyperparameters and details of each execution, save artifacts
such as models and *Sinergym* output, and compare different executions. This is an example:

.. image:: /_static/tensorboard_example.png
- Hyperparameter and summary registration:

.. image:: /_static/wandb_example1.png
:width: 800
:alt: WandB hyperparameters
:align: center

- Registered artifacts (if evaluation is enabled, the best model is registered too):

.. image:: /_static/wandb_example2.png
:width: 800
:alt: Tensorboard example
:alt: WandB artifacts
:align: center

- Metrics visualization in real time:

.. image:: /_static/wandb_example3.png
:width: 800
:alt: WandB charts
:align: center

Some tables are present in certain algorithms but not in others, and vice versa.
@@ -80,7 +97,7 @@ at the end of the training).

Its name is ``LoggerEvalCallback`` and it inherits from Stable Baselines 3 ``EvalCallback``.
The main feature added is that the model evaluation is logged in a particular section in
Tensorboard too for the concrete metrics of the building model.
*wandb* too, for the specific metrics of the building model.

When constructing the ``LoggerEvalCallback``, we have to define after how many training
episodes the evaluation process takes place. We also have to define how many episodes
@@ -91,14 +108,14 @@ therefore, the more faithful it will be to reality in terms of how good the current model is
turning out to be. However, it will take more time.

It calculates the timestep and episode averages for power consumption, comfort penalty and power penalty.
On the other hand, it calculates too comfort violation percentage in episodes too.
It also calculates the comfort violation percentage in each episode.
Currently, only the mean reward is taken into account to decide whether a model is better.
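As a minimal illustration of those aggregations (this helper and its names are assumptions for the sketch, not the actual ``LoggerEvalCallback`` code):

```python
def evaluation_summary(power_consumptions, comfort_penalties,
                       power_penalties, comfort_violations):
    """Aggregate per-timestep evaluation data into episode-level means.

    `comfort_violations` is a list of booleans, one per timestep, marking
    whether comfort was violated at that timestep.
    """
    n = len(power_consumptions)
    return {
        "mean_power_consumption": sum(power_consumptions) / n,
        "mean_comfort_penalty": sum(comfort_penalties) / n,
        "mean_power_penalty": sum(power_penalties) / n,
        # Percentage of timesteps in which comfort was violated
        "comfort_violation(%)": 100.0 * sum(comfort_violations) / len(comfort_violations),
    }

summary = evaluation_summary(
    power_consumptions=[100.0, 200.0, 300.0],
    comfort_penalties=[-1.0, -2.0, -3.0],
    power_penalties=[-0.5, -1.0, -1.5],
    comfort_violations=[True, False, True],
)
print(summary["mean_power_consumption"])  # 200.0
```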

***********************
Tensorboard structure
***********************
******************************
Weights and Biases structure
******************************

The main structure for *Sinergym* with *Tensorboard* is:
The main structure for *Sinergym* with *wandb* is:

* **action**: This section contains the action values during training. When the algorithm
is on-policy, **action_simulation** will appear too. This is because
Expand Down Expand Up @@ -153,13 +170,19 @@ The main structure for *Sinergym* with *Tensorboard* is:
.. note:: Evaluation of models can be recorded too, by adding the ``LoggerEvalCallback``
to the model's ``learn`` method.

**********
How use
**********
************
How to use
************

For more information about how to use it with cloud computing, visit :ref:`Sinergym with Google Cloud`.


Train a model
~~~~~~~~~~~~~~~~

You can try your own experiments and benefit from this functionality.
`sinergym/scripts/DRL_battery.py <https://github.com/ugr-sail/sinergym/blob/main/scripts/DRL_battery.py>`__
is a example code to use it. You can use ``DRL_battery.py`` directly from
is a script that helps you do it. You can use ``DRL_battery.py`` directly from
your local computer or using Google Cloud Platform.

The most **important information** you must keep in mind when you try
@@ -193,35 +216,69 @@ JSON structure example in `sinergym/scripts/DRL_battery_example.json <https://gi
default values).

* The **optional** parameters are: all environment parameters (if specified, they
will overwrite the default environment values), seed, model to load (before training),
experiment ID, wrappers to use (respecting the order), training evaluation,
tensorboard functionality and cloud options.
wandb functionality and cloud options.

* The field names must match those in the example mentioned; otherwise, the experiment
will return an error.
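Purely as an illustration of the kind of JSON that could be passed (the actual field names live in ``DRL_battery_example.json``; every key and value below is a hypothetical placeholder, not the script's real schema):

```json
{
    "environment": "Eplus-demo-v1",
    "episodes": 5,
    "algorithm": {
        "name": "PPO",
        "parameters": {
            "learning_rate": 0.0003
        }
    },
    "seed": 42,
    "evaluation": {
        "eval_freq": 2,
        "eval_length": 1
    },
    "wandb": {
        "project": "sinergym-experiments"
    }
}
```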

****************
Mlflow
****************
This script does the following:

Our scripts to run DRL with *Sinergym* environments are using
`Mlflow <https://mlflow.org/>`__, in order to **tracking experiments**
and recorded them methodically. It is recommended to use it.
You can start a local server with information stored during the
battery of experiments such as initial and ending date of execution,
hyperparameters, duration, etc.
1. Set an appropriate name for the experiment, following this
format: ``<algorithm>-<environment_name>-episodes<episodes_int>-seed<seed_value>(<experiment_date>)``

Here is an example:
2. Start a WandB tracking run with that name (if configured in the JSON); this also creates a local path (*./wandb*).

.. image:: /_static/mlflow_example.png
:width: 800
:alt: Tensorboard example
:align: center
3. Log all parameters specified in the JSON configuration (including *sinergym.__version__* and the Python version).

4. Set up the environment, overwriting its parameters if they have been specified.

5. Set up the wrappers specified in the JSON.

.. note:: For information about how use *Tensorboard* and *Mlflow* with a Cloud
Computing paradigm, see :ref:`Remote Tensorboard log` and
:ref:`Mlflow tracking server set up`.
6. Define the model algorithm using the specified hyperparameters.

7. Calculate the training timesteps from the number of episodes.

8. Set up the evaluation callback if it has been specified.

9. Set up the WandB logger callback if it has been specified.

10. Train the model in the environment.

11. If a remote store has been specified, save all outputs in a Google
Cloud bucket. If wandb has been specified, save all
outputs in a wandb run artifact.

12. Auto-delete the remote container in Google Cloud Platform when the
auto-delete parameter has been specified.
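Steps 1 and 7 above can be sketched as follows (a minimal illustration: the helper names, the example environment ID, and the timesteps-per-episode value are assumptions, not Sinergym's actual code):

```python
from datetime import datetime

def experiment_name(algorithm, environment, episodes, seed, date=None):
    # Step 1: <algorithm>-<environment_name>-episodes<episodes_int>-seed<seed_value>(<experiment_date>)
    date = date or datetime.today().strftime("%Y-%m-%d_%H:%M")
    return f"{algorithm}-{environment}-episodes{episodes}-seed{seed}({date})"

def training_timesteps(episodes, timesteps_per_episode):
    # Step 7: total training timesteps derived from the number of episodes.
    return episodes * timesteps_per_episode

name = experiment_name("PPO", "Eplus-demo-v1", 5, 42, date="2023-03-20")
print(name)  # PPO-Eplus-demo-v1-episodes5-seed42(2023-03-20)
# e.g. one simulated year at 15-minute steps would be 365 * 96 = 35040 timesteps
print(training_timesteps(5, 35040))  # 175200
```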


Load a trained model
~~~~~~~~~~~~~~~~~~~~~~

You can load a previously trained model and evaluate or execute it.
`sinergym/scripts/load_agent.py <https://github.com/ugr-sail/sinergym/blob/main/scripts/load_agent.py>`__
is a script that helps you do it. You can use ``load_agent.py`` directly from
your local computer or using Google Cloud Platform.

``load_agent.py`` has a single parameter required to execute it: ``-conf``.
This parameter is a string indicating the JSON file that contains all the
information about the evaluation you want to execute. You can see an example of the
JSON structure in `sinergym/scripts/load_agent_example.json <https://github.com/ugr-sail/sinergym/blob/main/scripts/load_agent_example.json>`__:

* The **obligatory** parameters are: environment, episodes,
algorithm (only the algorithm name is necessary) and model to load.

* The **optional** parameters are: all environment parameters (if specified, they
will overwrite the default environment values),
experiment ID, wrappers to use (respecting the order),
wandb functionality and cloud options.
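As an illustration only (the real schema lives in ``load_agent_example.json``; all keys and values below are hypothetical placeholders), an evaluation configuration could look like:

```json
{
    "environment": "Eplus-demo-v1",
    "episodes": 3,
    "algorithm": {
        "name": "PPO"
    },
    "model": "path/to/trained_model.zip",
    "wandb": {
        "project": "sinergym-evaluations"
    }
}
```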

This script loads the model. Once loaded, the model predicts the actions from the
states during the agreed episodes. The collected information is sent to remote
storage if so indicated (such as WandB); otherwise, it is stored in local memory.
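A minimal sketch of that predict-and-step loop, using stub objects in place of the real loaded SB3 model and Sinergym environment (illustration only; the stub classes and their behavior are invented for this example):

```python
class StubModel:
    """Stand-in for a loaded SB3 model (illustration only)."""
    def predict(self, obs, deterministic=True):
        # SB3 models return (action, hidden_state)
        return 0, None

class StubEnv:
    """Stand-in for a Sinergym environment (illustration only)."""
    def __init__(self, episode_length=3):
        self.episode_length = episode_length
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        terminated = self.t >= self.episode_length
        return 0.0, -1.0, terminated, {}

def evaluate(model, env, episodes):
    """Predict actions from states over the agreed episodes, collecting rewards."""
    episode_rewards = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, info = env.step(action)
            total += reward
        episode_rewards.append(total)
    return episode_rewards

print(evaluate(StubModel(), StubEnv(), 2))  # [-3.0, -3.0]
```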

.. note:: *This is a work-in-progress project. Direct support for other
algorithms is planned for the future!*