-
Notifications
You must be signed in to change notification settings - Fork 724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Proposal] RL Common / Baselines Common Package #503
Comments
Which parts of I agree this would be a useful thing, especially now with the upcoming transitions to a new backend. It would be a lot of work and debugging to get everything right (and even then probably some clashing with interfaces), but it would help keeping current stable-baselines structure clean. At the same time we can tidy up other things, e.g.
As for the name: I vote for "baselines-common". Term "Baselines" is already familiar in (D)RL scene. "rl-common" is nice too, but there is still the very active part of "non-deep reinforcement learning" which is quite different from the "deep reinforcement learning" where network-like models screw up everything :) Edit: Cpt. Obvious here but the first thing to do is to gather solid baseline results of current stable-baselines/baselines we then should be able to replicate in the new backends or when major things change. I think results in the RLZoo would do the trick? |
Everything that is not related to tensorflow, I see at least (not exhaustive list):
Totally agree When using a new backend, you will need to implement those specific parts:
That is the whole point of rl zoo ;) |
To check I understood this right: In summary, this would move backend independent tools used by one or more RL methods, used by environments or generally used for debugging to a new package. This sounds good. The lack of assumptions on the backend makes throughout optimization impossible (e.g. using TF tensors to store replay data or so), but that is not what this package is for and I agree this would be very useful for quick debugging and setting up algorithms. 👍 I think repo for such package should be quick to setup by cloning stable-baselines setup with all the CI things and how tests/docs are done, etc? PyTorch experiments: Yay! Now at least there is one of us who knows how to do them properly ^^. |
It will require at least one day I think and then some iteration to refactor things nicely. |
Ah, I was talking about setting it up so we can start working on it ^^. Naturally the full refactoring / better documentation / etc will take longer, but at least would have a place to start with. Would still like to hear comments from others, though (@erniejunior @AdamGleave) |
I'm weakly in favour of this. I think separating the common dependencies out would be slightly cleaner, although it might also slightly complicate setting up the development environment, which could deter some new contributors. I suspect you'll be one of the few users of this change. So if it'll save you time to restructure this then go for it. But unless others chime up I would not do it for the good of the community. Not many people write RL frameworks from scratch, and those that do will often not want to use Stable Baselines "common" implementation. Even if we clean things up there will be lots of design decisions embedded in this. For example, the VecEnv interface works well for parallelizing at small scale on a single machine, but the synchronous API prevents it scaling to distributed RL workers. |
That's true... Now, you make me ensure about what to do...
On that part, I would expect the "common" package to be relatively stable (in term of amount of change) compared to the main repo.
True, I was mostly thinking of people implementing only one algorithm. For instance, I've already seen the replay buffer implementation copied over. Do you think there could be an alternative, like a git-submodule instead of a package? |
The suggestions sound even more obscure than having a separate package. If goal is to also offer people a set of miscellaneous and independent utilities for RL algorithms, I do not see how git-submodules or the proposed I have personally looked for independent implementations of some algorithms like more sophisticated return/advantage computations (e.g. GAE, V-trace), since it is easy have bugs in them. In that sense a repository of independent and common tools would be nice, but I do not think this would be easy if even possible to do in independent fashion, as the definition of the function can be closely tied to the learning algorithm itself. |
I agree with @Miffyli that if we're going to split it then it's better to be in a separate package. Probably best to be in its own repo too, although you can have multiple Python packages in a single repository. |
With @hill-a , we came up to the same conclusion about things in the common folder: a lot is not specific to this library and could be separated in a stand-alone python package.
Doing so would:
As a personal example, I have now an internal (minimal) pytorch version of stable-baselines (I plan to release it at some point but it is used for my own research purposes for now) where I had to copy a lot of what was present in the common folder.
I would like to have your thoughts on that ;)
@erniejunior @AdamGleave @Miffyli
Suggestions for the name of the package are also welcomed, for now we have:
the repo would be under the stable-baselines team organization: https://github.com/Stable-Baselines-Team
The text was updated successfully, but these errors were encountered: