# Establish a General Pipeline for Offline Reinforcement Learning Evaluation
### Background
In recent years there have been several breakthroughs in the field of Reinforcement Learning, with numerous practical applications where RL agents have been able to achieve superhuman performance. This has also been reflected in industry, where several cutting-edge solutions have been developed based on RL (Tesla Motors, AutoML, and DeepMind's data center cooling solutions, to name a few).
One of the most notorious challenges in RL is the lack of reliable environments for training RL agents. Offline RL has played a pivotal role in addressing this problem by removing the need for the agent to interact with the environment to improve its policy over time. This brings forth the problem of having reliable tests to verify the performance of RL algorithms. Such tests are facilitated by standard datasets ([RL Unplugged](https://arxiv.org/abs/2006.13888) and [D4RL](https://arxiv.org/abs/2004.07219), to name a few) that are used to train Offline RL agents and benchmark them against other algorithms and implementations. [ReinforcementLearningDatasets.jl](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/tree/master/src/ReinforcementLearningDatasets) provides a simple way to access the various standard datasets that are available for Offline RL benchmarking across a variety of tasks.
Another problem in Offline RL is Offline Model Selection.
### Objectives
Create a package called **ReinforcementLearningDatasets.jl** that would aid in loading various standard datasets and policies that are available.
Make the following datasets available in RLDatasets.
Make standard policies available in [Benchmarks for Deep Off-Policy Evaluation](https://github.com/google-research/deep_ope).
Implement an OPE method and select between a number of standard policies for a particular task using RLDatasets.jl.
The following future work is possible in this project:
- Parallel loading and partial loading of datasets for supported datasets.
- Add support for envs that are not supported by GymEnvs: Flow and CARLA.
- Add support for datasets in Flow and CARLA envs.
Refer to the following [discussion](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/discussions/359).
| Dates | Plan |
| ----- | ---- |
| 07/21 - 07/30 | Implement loading of `d4rl` and `d4rl-pybullet` datasets |
| 07/31 - 08/06 | Implement loading of `Google Research DQN Replay Datasets`|
| 08/07 - 08/14 | Implement loading of `RL Unplugged atari datasets`, set up the docs, add README.md. Make the package more user friendly. Write the **mid-term report** |
| 08/15 - 08/30 | Add lazy multi-threaded loading support for `Google Research DQN Replay Datasets`. Add the rest of the RL Unplugged datasets, polish the interface, finalize the structure of the codebase. Add examples and register the package. |
| 09/01 - 09/15 | Add support for policy loading from [Benchmarks for Deep Off-Policy Evaluation](https://github.com/google-research/deep_ope) and implement an OPE method |
| 09/16 - 09/30 | Test OPE in various environments and publish benchmarks in RLDatasets.jl. Implement other features that make the package more user friendly. Complete the **final-term report** |
The type that is returned is a `Channel{RLTransition}`, which returns batches of `RLTransition`.
- [Expand to d4rl-pybullet #416](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/416)
- [Add Atari datasets released by Google Research #429](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/429)
- [RL unplugged implementation with tests #452](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/452)
- [Features for Offline Reinforcement Learning Pipeline #359](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/discussions/359)
The challenge faced during the first week was to chart out a direction for RLDatasets.jl. So, I had to research implementations of the pipeline in [d3rlpy](https://github.com/takuseno/d3rlpy), [TF.data.Dataset](https://www.tensorflow.org/datasets), etc. Then I narrowed down some of the inspiring ideas in the [discussion](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/discussions/359).
Later I made an [implementation](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/pull/384) as a wrapper around the d4rl Python library, which was discarded because it did not align with the purpose of the library: being lightweight and not requiring a `Mujoco license` to use open source datasets. A wrapper would also not give the fine-grained control that we get by loading the datasets natively.
We decided to use [DataDeps.jl](https://github.com/oxinabox/DataDeps.jl) for registering, tracking and locating datasets without any hassle. What I learnt here was how to make a package, manage its dependencies and choose the right package for the job. I also had to learn about the iterator interface in Julia to turn the type returned by the `dataset` function into an iterator. `d4rl-pybullet` was also implemented in a similar fashion.
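To make that flow concrete, here is a minimal sketch of the two pieces this paragraph describes: registering a dataset with DataDeps.jl and exposing an in-memory dataset through Julia's iteration interface. The dataset name, URL and field layout below are placeholders for illustration, not the actual entries or types used in RLDatasets.jl.

```julia
using DataDeps

# Register a dataset so DataDeps.jl can download, track and locate it.
# In a package this call normally lives in the module's `__init__`.
# The name and URL are placeholders.
register(DataDep(
    "example-offline-dataset",
    "A placeholder offline RL dataset used for illustration.",
    "https://example.com/datasets/hopper_medium.hdf5",
))

# A simple in-memory container for a loaded dataset.
struct OfflineDataSet
    observations::Matrix{Float32}  # obs_dim × N
    actions::Matrix{Float32}       # act_dim × N
    rewards::Vector{Float32}
    terminals::Vector{Bool}
    batch_size::Int
end

# Iteration interface: every call yields a fresh random batch, so the dataset
# behaves as an endless stream of batches that a training loop can consume.
function Base.iterate(d::OfflineDataSet, state = nothing)
    idx = rand(1:size(d.observations, 2) - 1, d.batch_size)
    batch = (
        state      = d.observations[:, idx],
        action     = d.actions[:, idx],
        reward     = d.rewards[idx],
        terminal   = d.terminals[idx],
        next_state = d.observations[:, idx .+ 1],  # naive: ignores episode boundaries
    )
    return batch, nothing
end
```

With something like this in place, `datadep"example-offline-dataset"` resolves to the local download path, and `Iterators.take(ds, 100)` yields 100 random batches from a constructed `OfflineDataSet`.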
Implementation of the `Google Research Atari DQN Replay Datasets` was harder because it is quite a large dataset and even one shard doesn't fit into memory. One of the major things I had to figure out was how the data was stored and how to retrieve it. Initially I planned to use `GZip.jl` to unpack the gzip files and `NPZ.jl` to read them, but NPZ wasn't able to read from a `GZipStream` by itself, so I had to adapt the functions in `NPZ` to read the stream. Later we decided to use `CodecZlib` to get a decompressed buffer that is natively supported by `NPZ`. We also had to test it internally and skip the CI test because CI wouldn't be able to handle the dataset. Exploring the possibility of lazily loading the available files, and enabling it, is also within the scope of the project.
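As a rough illustration of that final approach (decompress with `CodecZlib`, parse with `NPZ`), here is one simple way to read a gzip-compressed `.npy` shard. The file name and helper name are hypothetical, and for simplicity this sketch round-trips through a temporary `.npy` file using NPZ's public `npzread`, whereas the package feeds the decompressed buffer to NPZ directly.

```julia
using CodecZlib
using NPZ

# Read one gzip-compressed `.npy` shard by decompressing it into memory and
# handing a temporary `.npy` file to NPZ's public reader.
function read_gz_npy(path::AbstractString)
    raw = transcode(GzipDecompressor, read(path))  # decompressed bytes of the .npy payload
    tmp = tempname() * ".npy"
    try
        write(tmp, raw)
        return npzread(tmp)                        # the array stored in the shard
    finally
        rm(tmp; force = true)
    end
end

# observations = read_gz_npy("observation_ckpt.0.gz")  # hypothetical shard name
```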
For supporting the `RL Unplugged` datasets I had to learn about `.tfrecord` files, Protocol Buffers, buffered `Channel`s and multi threading on many occasions, which took a lot of time. The final implementation was, however, based on already existing work in `TFRecord.jl`.
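For context, the basic entry point the snippets below build on is `TFRecord.read`, which, as mentioned, is itself multi threaded. A minimal, hedged usage sketch (the shard name is hypothetical and keyword arguments are omitted):

```julia
using TFRecord

# Iterate the serialized `Example` protobuf messages stored in one shard.
for example in TFRecord.read("run_1-00000-of-00100.tfrecord")
    # decode `example` into a transition struct here
end
```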
Below are some of the more interesting pieces of code used in loading the RL Unplugged datasets.
```julia
ch_src = Channel{RLTransition}(n * tf_reader_sz) do ch
    # ... body elided in this excerpt: the `.tfrecord` files are read with
    # `TFRecord.read` and the decoded transitions are `put!` into `ch`
end
```
This is multi threaded iteration over a Channel that `put!`s into another Channel, while the implementation inside `TFRecord.read` is itself multi threaded. It took quite a while for me to understand these nuances.
```julia
res = Channel{RLTransition}(n_preallocations; taskref=taskref, spawn=true) do ch
    # ... body elided in this excerpt: worker tasks take transitions from
    # `ch_src` and `put!` assembled batches into `ch`
end
```
Multi threaded batching.
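Since the body of that snippet is elided in this excerpt, the following is a small self-contained sketch of the same multi-threaded batching pattern, not the package's actual code: several worker tasks concurrently take transitions from a source Channel and `put!` assembled batches into an output Channel. The `Transition` type and the function name are made up for the example.

```julia
# A toy transition type standing in for the package's RLTransition.
struct Transition
    state::Vector{Float32}
    action::Int
    reward::Float32
end

# Spawn one worker per thread; each worker drains `src` and emits full batches
# on the returned Channel. The output Channel closes once `src` is closed and
# all workers have finished.
function batch_transitions(src::Channel{Transition}, batch_size::Int; buffer = 4)
    Channel{Vector{Transition}}(buffer; spawn = true) do out
        @sync for _ in 1:Threads.nthreads()
            Threads.@spawn begin
                batch = Transition[]
                for t in src            # each transition is taken by exactly one worker
                    push!(batch, t)
                    if length(batch) == batch_size
                        put!(out, batch)
                        batch = Transition[]
                    end
                end
            end
        end
    end
end
```

A consumer then simply iterates the returned Channel, e.g. `for batch in batch_transitions(src, 32) ... end`.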
All of this work wouldn't have been possible without the patient mentoring and vast knowledge of my mentor [Jun Tian](https://github.com/findmyway), who has been pivotal in the design and implementation of the package. His extensive experience and beautifully written code provided a lot of inspiration for this package. His amicable nature and his commitment to the users of the package, providing timely and detailed explanations to any issue or query despite his time constraints, have set a long-standing example, both as a developer and as a person, for developers within and outside OSPP.
## Implications
Equipping RL.jl with RLDatasets.jl is a key step in making the package more industry relevant, because different offline algorithms can be compared against a variety of standard offline dataset benchmarks. It is also meant to improve the implementations of existing offline algorithms and bring them on par with SOTA implementations. The package provides a seamless way of downloading and accessing existing datasets, and it makes loading datasets into memory easy, something that would be difficult and not very usable if every implementation had to do it separately.
After the implementation of [Benchmarks for Deep Off-Policy Evaluation](https://github.com/google-research/deep_ope), testing and comparing algorithms will be much easier than before. This package will also help make SOTA offline RL more accessible and reliable than ever before in ReinforcementLearning.jl.
## Future Plan
### Within the time frame of the project
Within the scope of the project and in the given time frame we are planning to:
- Polish the package in terms of structure and make it more user friendly.
- Support the datasets that have not been added yet.
- Support [Benchmarks for Deep Off-Policy Evaluation](https://github.com/google-research/deep_ope) using ONNX.jl, and support as many of the provided policies as possible within the time frame.
- Run experiments for policy selection using RLDatasets.jl to finalize and establish the usability of the package.
### Ideas further down the line
Enabling more features that were mentioned in [Features for Offline Reinforcement Learning Pipeline #359](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/discussions/359) would be the next obvious step after an OPE method (like FQE) has been implemented and explored. Dataset generation, dataset storage and policy parameter storage would also be great to implement in RLDatasets.jl.