(v2.2.7) - WandB integration (#306)
* Updated version from 2.2.6 to 2.2.7

* Added wandb extra dependency

* Deleted mlflow code and included wandb init and params log in DRL_battery.py

* Deleted tensorboard_log in SB3 models and mlflow artifact stores in DRL_battery.py

* Deleted mlflow and tensorboard extra dependencies

* Deleted tensorboard element and added wandb in JSON example for DRL_battery.py

* Added WandBOutputFormat for callback loggers in SB3

* Added wandb folder to gitignore

* Fixed log interval parameter in DRL_battery.py JSON example and fixed wandb section

* DRL_battery.py: Added Monitor wrapper to env and wandb artifacts

* Updated docs source modules

* Deleted old images of mlflow and tensorboard and included wandb images in documentation

* Updated documentation about DRL and google cloud with sinergym

* Added Stable Baselines 3 (gymnasium branch) in extra requires

* Fixed spelling and sphinx compilation

* Added artifact_name and artifact_tag to DRL_battery JSON and adapted in DRL_battery.py

* Adapted load_agent.py to new JSON format and created JSON example (with wandb model load option included)

* Deleted stable baselines 3 from gymnasium PR extra dependency

* Fixed DRL_battery wandb artifact definition

* Updated DRL_battery.py model save path and wandb artifact

* Updated load_agent evaluation name, updated wandb init_params, fixed wandb_path and added artifact registration

* updated structure documentation for load_agent.py

* Extra requires SB3 latest stable version and SB3 tests temporarily deactivated

* Added wandb dependency to test container workflow
AlejandroCN7 authored Mar 20, 2023
1 parent b399cf0 commit ea4bb91
Showing 28 changed files with 658 additions and 771 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -49,4 +49,7 @@ dist/
#coverage
.coverage
codecov
coverage.xml
coverage.xml

#wandb
wandb/
Binary file removed docs/source/_static/mlflow_example.png
Binary file not shown.
Binary file removed docs/source/_static/tensorboard_example.png
Binary file not shown.
Binary file added docs/source/_static/wandb_example1.png
Binary file added docs/source/_static/wandb_example2.png
Binary file added docs/source/_static/wandb_example3.png
125 changes: 91 additions & 34 deletions docs/source/pages/deep-reinforcement-learning.rst
@@ -49,13 +49,30 @@ about how information is extracted which is why its implementation.
``sinergym_logger`` attribute in constructor.

``LoggerCallback`` inherits from Stable Baselines 3 ``BaseCallback`` and
uses `Tensorboard <https://www.tensorflow.org/tensorboard?hl=es-419>`__ on the
background at the same time. With *Tensorboard*, it's possible to visualize all DRL
training in real time and compare between different executions. This is an example:
uses `Weights & Biases <https://wandb.ai/site>`__ (*wandb*) in the background to host
all the extracted information. With *wandb*, it is possible to track and visualize the whole DRL
training in real time, register the hyperparameters and details of each execution, save artifacts
such as models and *Sinergym* output, and compare different executions. Some examples:

.. image:: /_static/tensorboard_example.png
- Hyperparameter and summary registration:

.. image:: /_static/wandb_example1.png
:width: 800
:alt: WandB hyperparameters
:align: center

- Registered artifacts (if evaluation is enabled, the best model is registered too):

.. image:: /_static/wandb_example2.png
:width: 800
:alt: Tensorboard example
:alt: WandB artifacts
:align: center

- Metrics visualization in real time:

.. image:: /_static/wandb_example3.png
:width: 800
:alt: WandB charts
:align: center
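
Conceptually, a logger output format for *wandb* only has to forward each batch of recorded key/value pairs to the tracking backend. The sketch below mirrors the shape of the Stable Baselines 3 ``KVWriter`` interface (a ``write(key_values, key_excluded, step)`` method plus ``close()``) without importing SB3. The class name ``WandBWriterSketch`` is hypothetical and is not *Sinergym*'s actual ``WandBOutputFormat``; the ``log_fn`` argument stands in for ``wandb.log`` so the example runs without a *wandb* account.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class WandBWriterSketch:
    """Hypothetical writer forwarding logger key/values to a wandb-like log function."""

    def __init__(self, log_fn: Callable[..., None]) -> None:
        # In a real run this could be wandb.log; it is injected here so it can be tested.
        self.log_fn = log_fn

    def write(
        self,
        key_values: Dict[str, Any],
        key_excluded: Dict[str, Optional[Tuple[str, ...]]],
        step: int = 0,
    ) -> None:
        payload = {}
        for key, value in key_values.items():
            excluded = key_excluded.get(key) or ()
            # Skip keys that were explicitly excluded from the "wandb" format.
            if "wandb" in excluded:
                continue
            payload[key] = value
        # Send a single log call per write, tagged with the current step.
        if payload:
            self.log_fn(payload, step=step)

    def close(self) -> None:
        # Nothing to release in this sketch.
        pass
```

In a real setup, ``log_fn`` would be ``wandb.log`` called after ``wandb.init(...)``; injecting it keeps the writer independent of the tracking service.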

Some tables are present for some algorithms but not for others.
@@ -80,7 +97,7 @@ at the end of the training).

Its name is ``LoggerEvalCallback`` and it inherits from Stable Baselines 3 ``EvalCallback``.
The main feature added is that the model evaluation is logged in a particular section in
Tensorboard too for the concrete metrics of the building model.
*wandb* too, for the specific metrics of the building model.

When constructing ``LoggerEvalCallback``, we have to define how often (in training episodes)
the evaluation process takes place. On the other hand, we have to define how many episodes
@@ -91,14 +108,14 @@ therefore, the more faithful it will be to reality in terms of how good the curr
turning out to be. However, it will take more time.

It calculates timestep and episode averages for power consumption, comfort penalty and power penalty.
On the other hand, it calculates too comfort violation percentage in episodes too.
It also calculates the comfort violation percentage in episodes.
Currently, only the mean reward is taken into account to decide whether a model is better.
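
The aggregation described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the callback's internals: each evaluation episode is a list of per-timestep dictionaries with the hypothetical keys ``reward``, ``power``, ``comfort_penalty``, ``power_penalty`` and ``comfort_violation`` (1.0 or 0.0), the "episode average" is taken as the mean of per-episode totals, and a model is considered better only when its mean episode reward beats the best one so far.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical per-timestep record, e.g. {"reward": -1.0, "power": 100.0, ...}
Step = Dict[str, float]


def summarize_evaluation(episodes: List[List[Step]]) -> Dict[str, float]:
    """Aggregate hypothetical per-timestep records from several evaluation episodes."""
    all_steps = [step for episode in episodes for step in episode]
    summary: Dict[str, float] = {}
    for name in ("power", "comfort_penalty", "power_penalty"):
        # Timestep average: mean over every step of every episode.
        summary[f"mean_timestep_{name}"] = mean(s[name] for s in all_steps)
        # Episode average: mean over episodes of each episode's total.
        summary[f"mean_episode_{name}"] = mean(sum(s[name] for s in ep) for ep in episodes)
    # Percentage of timesteps with a comfort violation, averaged over episodes.
    summary["comfort_violation_pct"] = mean(
        100.0 * sum(1 for s in ep if s["comfort_violation"] > 0) / len(ep) for ep in episodes
    )
    summary["mean_reward"] = mean(sum(s["reward"] for s in ep) for ep in episodes)
    return summary


def is_better(current: Dict[str, float], best_mean_reward: float) -> bool:
    # Only the mean episode reward decides whether the new model is better.
    return current["mean_reward"] > best_mean_reward
```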

***********************
Tensorboard structure
***********************
******************************
Weights and Biases structure
******************************

The main structure for *Sinergym* with *Tensorboard* is:
The main structure for *Sinergym* with *wandb* is:

* **action**: This section has action values during training. When the algorithm
is on-policy, **action_simulation** will appear too. This is because
@@ -153,13 +170,19 @@ The main structure for *Sinergym* with *Tensorboard* is:
.. note:: Evaluation of models can be recorded too, by adding ``LoggerEvalCallback``
to the model ``learn`` method.

**********
How use
**********
************
How to use
************

For more information about how to use it with cloud computing, visit :ref:`Sinergym with Google Cloud`.


Train a model
~~~~~~~~~~~~~~~~

You can try your own experiments and benefit from this functionality.
`sinergym/scripts/DRL_battery.py <https://github.com/ugr-sail/sinergym/blob/main/scripts/DRL_battery.py>`__
is a example code to use it. You can use ``DRL_battery.py`` directly from
is a script to help you do it. You can use ``DRL_battery.py`` directly from
your local computer or from Google Cloud Platform.

The most **important information** you must keep in mind when you try
@@ -193,35 +216,69 @@ JSON structure example in `sinergym/scripts/DRL_battery_example.json <https://gi
default values).

* The **optional** parameters are: all environment parameters (if specified,
will be overwrite the default environment value) seed, model to load (before training),
they overwrite the default environment values), seed, model to load (before training),
experiment ID, wrappers to use (respecting the order), training evaluation,
tensorboard functionality and cloud options.
wandb functionality and cloud options.

* The field names must match those in the example mentioned. Otherwise, the experiment
will return an error.
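
The two configuration rules above (optional environment parameters overwrite the defaults, and unexpected field names raise an error) can be sketched as follows. This is an illustrative sketch only: the names in ``ALLOWED_FIELDS`` and ``REQUIRED_FIELDS`` and the ``env_params`` key are hypothetical placeholders, so check ``DRL_battery_example.json`` for the real field names.

```python
from typing import Any, Dict

# Hypothetical field names; the real ones are defined by DRL_battery_example.json.
ALLOWED_FIELDS = {"environment", "episodes", "algorithm", "seed", "env_params",
                  "model", "experiment_id", "wrappers", "evaluation", "wandb", "cloud"}
REQUIRED_FIELDS = {"environment", "episodes", "algorithm"}


def check_fields(conf: Dict[str, Any]) -> None:
    # Any field not present in the example is rejected.
    unknown = set(conf) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"Unknown configuration fields: {sorted(unknown)}")
    missing = REQUIRED_FIELDS - set(conf)
    if missing:
        raise ValueError(f"Missing required fields: {sorted(missing)}")


def resolve_env_params(defaults: Dict[str, Any], conf: Dict[str, Any]) -> Dict[str, Any]:
    # Values given in the JSON overwrite the environment defaults; the rest are kept.
    return {**defaults, **conf.get("env_params", {})}
```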

****************
Mlflow
****************
This script does the following:

Our scripts to run DRL with *Sinergym* environments are using
`Mlflow <https://mlflow.org/>`__, in order to **tracking experiments**
and recorded them methodically. It is recommended to use it.
You can start a local server with information stored during the
battery of experiments such as initial and ending date of execution,
hyperparameters, duration, etc.
1. Setting an appropriate name for the experiment, following this
format: ``<algorithm>-<environment_name>-episodes<episodes_int>-seed<seed_value>(<experiment_date>)``

Here is an example:
2. Starting a WandB tracking run with that name (if configured in the JSON), which also creates a local path (*./wandb*).

.. image:: /_static/mlflow_example.png
:width: 800
:alt: Tensorboard example
:align: center
3. Logging all parameters set in the JSON configuration (including ``sinergym.__version__`` and the Python version).

4. Setting up the environment, overwriting its parameters if they have been specified.

5. Setting the wrappers specified in the JSON.

.. note:: For information about how use *Tensorboard* and *Mlflow* with a Cloud
Computing paradigm, see :ref:`Remote Tensorboard log` and
:ref:`Mlflow tracking server set up`.
6. Defining the model algorithm with the specified hyperparameters.

7. Calculating the training timesteps from the number of episodes.

8. Setting up the evaluation callback, if it has been specified.

9. Setting up the WandB logger callback, if it has been specified.

10. Training the model in the environment.

11. Saving all outputs in a Google Cloud Bucket, if remote storage has been specified,
and in a wandb run artifact, if wandb has been specified.

12. Auto-deleting the remote container in Google Cloud Platform when the
auto-delete parameter has been specified.
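
Steps 1, 3 and 7 above can be sketched in a few lines. This is a sketch under assumptions, not the script's code: the experiment date is formatted as ``%Y-%m-%d_%H-%M`` (the real script may use another format), the library version is passed in explicitly (the script itself reads ``sinergym.__version__``), one episode is approximated as ``days * 24 * timesteps_per_hour`` simulation steps, and the helper names are hypothetical.

```python
import platform
from datetime import datetime
from typing import Any, Dict


def experiment_name(algorithm: str, env_name: str, episodes: int, seed: int,
                    date: datetime) -> str:
    # <algorithm>-<environment_name>-episodes<episodes_int>-seed<seed_value>(<experiment_date>)
    stamp = date.strftime("%Y-%m-%d_%H-%M")
    return f"{algorithm}-{env_name}-episodes{episodes}-seed{seed}({stamp})"


def params_to_log(conf: Dict[str, Any], sinergym_version: str) -> Dict[str, Any]:
    # Everything in the JSON plus the library and Python versions (hypothetical key names).
    return {**conf, "sinergym-version": sinergym_version,
            "python-version": platform.python_version()}


def training_timesteps(episodes: int, days: int = 365, timesteps_per_hour: int = 4) -> int:
    # Assumption: one agent step per simulation timestep over the whole run period.
    return episodes * days * 24 * timesteps_per_hour
```

For example, with the defaults above, one simulated year at four timesteps per hour gives 35040 training timesteps per episode.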


Load a trained model
~~~~~~~~~~~~~~~~~~~~~~

You can load a previously trained model and evaluate or execute it.
`sinergym/scripts/load_agent.py <https://github.com/ugr-sail/sinergym/blob/main/scripts/load_agent.py>`__
is a script to help you do it. You can use ``load_agent.py`` directly from
your local computer or from Google Cloud Platform.

``load_agent.py`` has a single parameter, ``-conf``.
It is a string with the path of the JSON file that contains
all the information about the evaluation you want to run. You can see a
JSON structure example in `sinergym/scripts/load_agent_example.json <https://github.com/ugr-sail/sinergym/blob/main/scripts/load_agent_example.json>`__:

* The **obligatory** parameters are: environment, episodes,
algorithm (only the algorithm name is necessary) and model to load.

* The **optional** parameters are: all environment parameters (if specified,
they overwrite the default environment values),
experiment ID, wrappers to use (respecting the order),
wandb functionality and cloud options.
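
Reading the ``-conf`` argument and checking the obligatory parameters could look like the sketch below. It only uses the standard library; the ``OBLIGATORY`` key names and the ``read_configuration`` helper are hypothetical, so the real names should be taken from ``load_agent_example.json``.

```python
import argparse
import json
from typing import Any, Dict, List, Optional

# Hypothetical key names for the obligatory parameters listed above.
OBLIGATORY = ("environment", "episodes", "algorithm", "model")


def read_configuration(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    parser = argparse.ArgumentParser(description="Load and evaluate a trained agent.")
    # Single parameter: path to the JSON file describing the evaluation.
    parser.add_argument("-conf", dest="configuration", required=True, type=str)
    args = parser.parse_args(argv)
    with open(args.configuration, encoding="utf-8") as f:
        conf = json.load(f)
    # Fail early if any obligatory parameter is missing.
    missing = [key for key in OBLIGATORY if key not in conf]
    if missing:
        raise ValueError(f"Missing obligatory parameters: {missing}")
    return conf
```

Passing ``argv`` explicitly (``read_configuration(["-conf", "my_eval.json"])``) keeps the function testable; called without arguments it reads ``sys.argv`` as usual.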

This script loads the model and, once it is loaded, predicts the actions from the
observed states during the specified episodes. The collected information is sent to
remote storage (such as WandB) if indicated;
otherwise, it is stored locally.
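
The evaluation loop described above follows the usual Gymnasium pattern. The sketch below assumes a Gymnasium-style environment (``reset()`` returning ``(obs, info)`` and ``step()`` returning ``(obs, reward, terminated, truncated, info)``) and an SB3-style ``model.predict(obs, deterministic=True)`` returning ``(action, state)``. The ``StubEnv`` and ``StubModel`` classes are hypothetical toys that only exist to make it runnable.

```python
from typing import Any, Dict, List, Optional, Tuple


class StubEnv:
    """Hypothetical toy environment with a Gymnasium-like API."""

    def __init__(self, length: int = 3) -> None:
        self.length = length
        self.t = 0

    def reset(self, seed: Optional[int] = None) -> Tuple[int, Dict[str, Any]]:
        self.t = 0
        return self.t, {}

    def step(self, action: int):
        self.t += 1
        reward = -abs(action - 1)  # action 1 is optimal in this toy example
        terminated = self.t >= self.length
        return self.t, float(reward), terminated, False, {}


class StubModel:
    """Hypothetical model exposing an SB3-like predict()."""

    def predict(self, obs, deterministic: bool = True):
        return 1, None


def evaluate(model, env, episodes: int) -> List[Dict[str, float]]:
    results = []
    for _ in range(episodes):
        obs, info = env.reset()
        total_reward, steps, done = 0.0, 0, False
        while not done:
            # Predict an action from the current observation (state).
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            total_reward += reward
            steps += 1
            done = terminated or truncated
        # One summary record per episode; these could be sent to WandB or stored locally.
        results.append({"episode_reward": total_reward, "length": steps})
    return results
```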

.. note:: *This is a work-in-progress project. Direct support for other
algorithms is planned for the future!*