Update OpenAI Lander example #252
Conversation
score += reward
env.render()
- if done:
+ if terminated:
I would use the truncated flag, since it seems to behave more like the old `done`. According to the docstring, truncated means:
truncated (bool): whether a truncation condition outside the scope of the MDP is satisfied.
Typically a timelimit, but could also be used to indicate agent physically going out of bounds.
Can be used to end the episode prematurely before a `terminal state` is reached.
It looks like the correct check here would have been `if terminated or truncated:`.
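For reference, a minimal sketch of the loop under the newer step API (Gymnasium / gym >= 0.26), where `env.step` returns separate `terminated` and `truncated` flags instead of a single `done`. The random policy and episode logic here are placeholders for illustration, not the example's actual code:

```python
# Minimal sketch, assuming the Gymnasium 5-tuple step API.
import gymnasium as gym

env = gym.make("LunarLander-v2")
observation, info = env.reset()

score = 0.0
while True:
    action = env.action_space.sample()  # placeholder policy, illustration only
    observation, reward, terminated, truncated, info = env.step(action)
    score += reward
    # The old `if done:` check maps to both new flags, as suggested above.
    if terminated or truncated:
        break

env.close()
```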
data.append(np.hstack((observation, action, reward)))

- if done:
+ if terminated:
See the comment on line 223; same topic.
step = 0
data = []
while 1:
    step += 1
    if step < 200 and random.random() < 0.2:
        action = env.action_space.sample()
    else:
-       output = net.activate(observation)
+       output = net.activate(observation_init_vals)
Isn't this wrong? Shouldn't you have named this `observation`? As written, it just feeds the initial observation through the loop every time, and the observation never changes. Same issue below!
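To make the point concrete, here is a hedged sketch of the loop the reviewer describes, with `observation` refreshed from `env.step` on every iteration rather than reusing the value from `env.reset()`. `DummyNet` is a stand-in for the NEAT network in the example and is an assumption, not part of the PR:

```python
# Sketch only: assumes the Gymnasium 5-tuple step API and a stand-in network.
import random

import gymnasium as gym
import numpy as np


class DummyNet:
    def activate(self, inputs):
        # Random scores over the 4 discrete LunarLander actions (stand-in for NEAT net).
        return np.random.random(4)


env = gym.make("LunarLander-v2")
net = DummyNet()

observation, info = env.reset()
step = 0
data = []
while True:
    step += 1
    if step < 200 and random.random() < 0.2:
        action = env.action_space.sample()
    else:
        # Use the current observation, not the value returned by reset().
        output = net.activate(observation)
        action = int(np.argmax(output))
    observation, reward, terminated, truncated, info = env.step(action)
    data.append(np.hstack((observation, action, reward)))
    if terminated or truncated:
        break
env.close()
```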
I think you are right. I created another PR; maybe have a look at it and feel free to comment if you find something.

Follow-up: #274
This PR updates the OpenAI Lander example. It addresses changes made in the upstream lander code to make this example work again.