Update report #457
Conversation
This technical report is the first evaluation report of the project "Enriching Offline Reinforcement Learning Algorithms in ReinforcementLearning.jl" in OSPP. It includes three components: project information, project schedule, and future plan.
## Project Information
- Project name: Enriching Offline Reinforcement Learning Algorithms in ReinforcementLearning.jl
- Scheme Description: Recent advances in offline reinforcement learning make it possible to turn reinforcement learning into a data-driven discipline, such that many effective methods from the supervised learning field can be applied. Until now, the only offline method provided in ReinforcementLearning.jl is behavior cloning. We would like to add more algorithms, such as Batch-Constrained Q-Learning (BCQ)\dcite{DBLP:conf/icml/FujimotoMP19} and Conservative Q-Learning (CQL)\dcite{DBLP:conf/nips/KumarZTL20}. We expect to implement at least three to four modern offline RL algorithms.
Add a reference to behavior cloning.
    batch_size::Int
end
```
This implementation of `OfflinePolicy` refers to `QBasedPolicy` ([link](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/master/src/ReinforcementLearningCore/src/policies/q_based_policies/q_based_policy.jl)). It provides a parameter `continuous` to support different action space types, including continuous and discrete. `learner` is a specific algorithm for learning and providing the policy. `dataset` and `batch_size` are used to sample data for learning.
Replace the link with the one in the docs https://juliareinforcementlearning.org/docs/rlcore/#ReinforcementLearningCore.QBasedPolicy
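For readers following the thread, here is a minimal sketch of what the full struct could look like, reconstructed from the fields described in the quoted paragraph. The type parameters, the `Base.@kwdef` convenience, and the `AbstractPolicy` supertype are assumptions, not a copy of the actual source; see the linked code/docs for the real definition.

```julia
using ReinforcementLearning  # re-exports AbstractPolicy from ReinforcementLearningBase

# Hypothetical reconstruction based on the description above; not the actual source.
Base.@kwdef struct OfflinePolicy{L,D} <: AbstractPolicy
    learner::L          # concrete offline RL algorithm that learns and provides the policy
    dataset::D          # offline dataset the policy is trained from
    continuous::Bool    # whether the action space is continuous or discrete
    batch_size::Int     # number of transitions sampled per update
end
```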
Besides, we implement the corresponding functions `π`, `update!`, and `sample`. `π` is used to select the action, whose form is determined by the type of action space. `update!` can be used in two stages. In the `PreExperiment` stage, we can call this function to pre-train algorithms that have a `pretrain_step` parameter (such as PLAS). In the `PreAct` stage, we call this function to train the `learner`. In the `update!` function, we need to call the `sample` function to sample a batch of data from the dataset. With the development of RLDataset.jl, the `sample` function will be deprecated.
Explain PLAS here.
Add a link to RLDataset.jl. And better to use the full name of ReinforcementLearningDatasets.jl in this report.
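To make the two-stage flow concrete, below is a self-contained toy sketch of the dispatch pattern described in the quoted paragraph. The stage structs are illustrative stand-ins for the stage types in ReinforcementLearningCore, `sample_batch` stands in for the report's `sample` function, and the learner-level `update!` and `pretrain!` methods are assumed to be supplied by the concrete algorithm.

```julia
# Toy stand-ins for the stage types; the real implementation dispatches on
# ReinforcementLearningCore's stage types instead.
struct PreExperimentStage end
struct PreActStage end

struct ToyOfflinePolicy{L,D}
    learner::L
    dataset::D
    batch_size::Int
end

# Illustrative sampler: draw `n` random transitions from a vector-backed dataset.
sample_batch(dataset::Vector, n::Int) = dataset[rand(1:length(dataset), n)]

# PreExperiment: algorithms with a `pretrain_step` field (e.g. PLAS) are
# pre-trained once before the experiment starts.
function update!(p::ToyOfflinePolicy, ::PreExperimentStage)
    if hasproperty(p.learner, :pretrain_step)
        for _ in 1:p.learner.pretrain_step
            pretrain!(p.learner, sample_batch(p.dataset, p.batch_size))  # assumed helper
        end
    end
end

# PreAct: sample a batch from the offline dataset and train the learner.
function update!(p::ToyOfflinePolicy, ::PreActStage)
    update!(p.learner, sample_batch(p.dataset, p.batch_size))  # learner-level update! assumed
end
```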
    learner = DQNLearner(
        # Omit specific code
    ),
    dataset = dataset,
    continuous = false,
    batch_size = 64,
)
Fix the indent.
Therefore, we unified the parameter name in different algorithms so that different `learner` can be compatible with `OfflinePolicy`. |
Suggested change:
- Therefore, we unified the parameter name in different algorithms so that different `learner` can be compatible with `OfflinePolicy`.
+ Therefore, we unified the parameter name in different algorithms so that different `learner`s can be compatible with `OfflinePolicy`.
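As a toy illustration of why the unified names matter (all names below are hypothetical), two different learners can be driven by the same generic code as long as they expose the same parameter names:

```julia
# Hypothetical learners that agree on the field name `batch_size`;
# generic policy code can then treat them interchangeably.
struct ToyDQNLearner
    batch_size::Int
end

struct ToySACLearner
    batch_size::Int
end

# Generic code relying only on the agreed-upon field name.
minibatch_size(learner) = learner.batch_size

minibatch_size(ToyDQNLearner(64)) == minibatch_size(ToySACLearner(64))  # true
```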
#### Offline RL Algorithms
We used the existing algorithms and hooks to create datasets in several environments (such as CartPole and Pendulum) for training offline RL algorithms. This work can guide the subsequent development of the RLDataset.jl package, for example:
Same as above, add links to CartPole and Pendulum in the docs.
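The code block that followed "for example:" in the report is not visible in this diff view. As a rough, hypothetical sketch of the idea (not the report's actual code), a dataset can be collected by rolling out a trained policy and recording transitions. The environment and policy calls below follow general ReinforcementLearning.jl conventions (`reset!`, `state`, `reward`, `is_terminated`, calling the environment with an action), but `collect_dataset` itself is an invented helper.

```julia
using ReinforcementLearning

# Hypothetical helper: roll out a trained policy and record transitions as a
# vector of named tuples, which could later back an offline dataset.
function collect_dataset(policy, env; n_episodes = 100)
    dataset = NamedTuple[]
    for _ in 1:n_episodes
        reset!(env)
        while !is_terminated(env)
            s = state(env)
            a = policy(env)     # policies are callable on the environment
            env(a)              # environments are callable with an action
            push!(dataset, (state = s, action = a, reward = reward(env),
                            next_state = state(env), terminal = is_terminated(env)))
        end
    end
    return dataset
end
```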
##### Benchmark
We implemented and experimented with offline DQN (in discrete action spaces) and offline SAC (in continuous action spaces) as benchmarks. The performance of offline DQN in the CartPole environment:
The tense used in this report is a bit confusing to me: sometimes the present tense is used, and here the past tense. Better to unify them all.
With this framework, we can call the offline version of the existing algorithms with almost no additional code. Therefore, the implementation and performance testing of offline DQN and offline SAC can be completed quickly. For example:
I think this is the first place to mention SAC, better to add a reference here.
\dfig{body;PLAS2.png}
Please refer to this [link](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/master/src/ReinforcementLearningZoo/src/algorithms/offline_rl/PLAS.jl) for the specific code. The brief function parameters are as follows:
Better to use the link in the docs.
What do the links in the docs point to? Is it a brief introduction later?
Well.
PR Checklist