Chapter 14 Deterministic policy gradients results are quite noisy. #86
Comments
Random weight initialization adds randomness to the initial starting point.
Using several different parallel environments might also add stochasticity.
On Tue, 27 Oct 2020, 12:01, isu10503054a <notifications@github.com> wrote:

> In the results of Chapter 14, Deterministic policy gradients, in the book, why is the training not very stable and noisy? I read the content repeatedly, but I still don't understand why.
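For what it's worth, one way to make separate runs comparable (it does not remove the noise itself, but it removes the seed-to-seed variation Max describes) is to fix every random seed the training loop touches. A minimal stdlib-only sketch; the PyTorch/Gym lines are shown as comments because they depend on the book's setup, and `set_global_seeds` is a hypothetical helper name:

```python
import random

def set_global_seeds(seed: int) -> None:
    # Seed every RNG the training loop touches, so that run-to-run
    # differences come from the algorithm rather than the seeds.
    random.seed(seed)
    # In a PyTorch + Gym setup (as in the book) you would also do:
    #   np.random.seed(seed)
    #   torch.manual_seed(seed)
    # and give each parallel environment its own offset seed:
    #   for idx, env in enumerate(envs):
    #       env.seed(seed + idx)

set_global_seeds(42)
a = random.random()
set_global_seeds(42)
assert random.random() == a  # same seed -> same first draw
```

Note that even with fixed seeds, using several parallel environments can still interleave experience in a nondeterministic order, so some noise usually remains.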
Is there any hyperparameter in the source code that could be modified to improve this situation?
Tons of them :). In fact, any constant in the code could be seen as a hyperparameter:
* learning rate
* gamma
* number of environments
* optimization method
etc., etc., etc.
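The constants listed above can be gathered into one config object so that trying a few combinations is easy; a minimal sketch, where the names and default values are purely illustrative and not the book's actual settings:

```python
from dataclasses import dataclass, replace
from itertools import product

@dataclass(frozen=True)
class HParams:
    # Illustrative defaults -- not the book's actual values.
    learning_rate: float = 1e-4
    gamma: float = 0.99
    n_envs: int = 1
    optimizer: str = "adam"

base = HParams()

# A tiny grid over two of the knobs listed above; each entry is a
# full config to train with and compare reward curves against.
grid = [replace(base, learning_rate=lr, n_envs=n)
        for lr, n in product([1e-4, 1e-3], [1, 4])]
print(len(grid))  # 4 configurations to try
```

Keeping the configuration in one frozen dataclass also makes it easy to log exactly which settings produced which training curve.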
--
wbr, Max Lapan