Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[drake_gym] Add info_handler callback #21900

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

samzapo
Copy link
Collaborator

@samzapo samzapo commented Sep 12, 2024

Closes #21899.


This change is Reviewable

@jwnimmer-tri
Copy link
Collaborator

jwnimmer-tri commented Sep 16, 2024

I am not very familiar with Gym, so let me ask a few questions and then also invite others to comment.

(1) Is the info["timestep"] a convention in the Gym community? If yes, they we should add a comment in the code with cross-reference(s) to existing practice, in which case my other questions are moot.

(2) If not, then for users who want this kind of metadata, why would the following not be a better solution (in their own code, with no changes to Drake)?

observation, reward, terminated, truncated, info = env.step(action)
info["timestamp"] = env.simulator.get_context().get_time()

With this technique, users can add any metadata they like. Perhaps the problem is that the user is not directly calling env.step, because it's called from a framework? In that case ...

(3) If we do need to populate info during DrakeGymEnv.step and neither option 2 nor subclassing is a viable solution, possibly we should following the existing pattern of self.reward and add a self.info constructor argument, so that the user can pass into the constructor any kind of info-callback they like. Certainly the timestamp won't be the only kind of info ever requested / useful, so hard-coding it be exactly one thing seems unwise? Flexibility seems like the better approach, at first glance.

+@JoseBarreiros-TRI for discussion and feature review, please.

@JoseBarreiros-TRI
Copy link
Contributor

Here are my thoughts:

  1. afaik, info["timestep"] is not a convention in the Gymnasium environment. The only widely use info key is is_success which is often use by third-party libraries (e.g. wandb, sb3) to log success rate.
  2. The user can perfectly do what is suggested here. Indeed, a common practice (and what is recommended by Gymnaisum) when adding more operations/info to the environment is using a wrapper.
  3. If we want to add a sugar method to add data to info, then Add a "timestamp" item to the info return value from DrakeGymEnv.step(...) #21899 seems to be the way to go.
    In summary, implementing (3) or leaving the code as it is are both satisfactory.

@samzapo
Copy link
Collaborator Author

samzapo commented Sep 16, 2024

@jwnimmer-tri This is a great suggestion and works, but I'd prefer to support multiple gym environments, some not using Drake (e.g.: huggingface/gym-aloha).

Since it is not a convention to provide time data in info I'd lean toward requesting an optional "info handler" DrakeGymEnv construction parameter, mentioned here:
#21899 (comment)
rather than wrapping DrakeGymEnv and overriding step(), as I've done here: samzapo/gym-drake-lca as stop-gap.

If this PR is dismissed, as the desired behavior is possible with the current setup (thanks @JoseBarreiros-TRI ). That's fine too.

@jwnimmer-tri
Copy link
Collaborator

... I'd lean toward requesting an optional "info handler" ...

Ah, thanks for the pointer; I'd missed that detail.

It sounds like everyone is happy with the "info handler" proposal, so that's probably the way to proceed.

@jwnimmer-tri jwnimmer-tri changed the title Adds a "timestamp" item to the info return value from DrakeGymEnv.step(...) [drake_gym] Add info_handler callback Sep 17, 2024
@jwnimmer-tri jwnimmer-tri added the release notes: feature This pull request contains a new feature label Sep 17, 2024
Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samzapo Jenkins is showing linter errors. You can run the linters locally via bazel test --config=lint //bindings/pydrake/....

@JoseBarreiros-TRI do you have time/interest in serving as the reviewer here? If not, I can recruit someone else.

Reviewable status: 1 unresolved discussion, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)


bindings/pydrake/gym/_drake_gym_env.py line 256 at r2 (raw file):

            terminated = False
            reward = 0
            info = dict() if self.info_handler is None else self.info_handler(self.simulator)

nit Instead of copy-pasting the if-None-else logic multiple times, we can probably do something less brittle, e.g.

(1) Don't allow the info handler be None; instead, change the default value in the constructor args to bedict (i.e., the callable constructor that does what we want as a default).

(2) Add a private helper method def _info(self): on DrakeGymEnv with the None-handling logic, and change this to info = self._info() or similar.

Copy link
Contributor

@JoseBarreiros-TRI JoseBarreiros-TRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to review this

Reviewable status: 1 unresolved discussion, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)

Copy link
Collaborator Author

@samzapo samzapo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)


bindings/pydrake/gym/_drake_gym_env.py line 256 at r2 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

nit Instead of copy-pasting the if-None-else logic multiple times, we can probably do something less brittle, e.g.

(1) Don't allow the info handler be None; instead, change the default value in the constructor args to bedict (i.e., the callable constructor that does what we want as a default).

(2) Add a private helper method def _info(self): on DrakeGymEnv with the None-handling logic, and change this to info = self._info() or similar.

Done. I've chosen to do (1), see assignment to self.info_handler logic

@samzapo
Copy link
Collaborator Author

samzapo commented Sep 18, 2024

bindings/pydrake/gym/_drake_gym_env.py line 308 at r3 (raw file):

        # port.
        observations = self.observation_port.Eval(context)
        info = dict() if self.info_handler is None else self.info_handler(self.simulator)

Fixing this in a moment. Immediately validating why the complicated logic on each line is hard to maintain.

Code quote:

info = dict() if self.info_handler is None else self.info_handler(self.simulator)

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 unresolved discussion, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)


bindings/pydrake/gym/_drake_gym_env.py line 256 at r2 (raw file):

Previously, samzapo (Samuel Zapolsky) wrote…

Done. I've chosen to do (1), see assignment to self.info_handler logic

BTW I expect that foo = lambda _: dict() can be shortened to just foo = dict. The constructor is already a callable.

Copy link
Collaborator Author

@samzapo samzapo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)


bindings/pydrake/gym/_drake_gym_env.py line 256 at r2 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

BTW I expect that foo = lambda _: dict() can be shortened to just foo = dict. The constructor is already a callable.

Would this work if self.info_handler(simulator) is called (i.e., with a parameter)?

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)


bindings/pydrake/gym/_drake_gym_env.py line 256 at r2 (raw file):

Previously, samzapo (Samuel Zapolsky) wrote…

Would this work if self.info_handler(simulator) is called (i.e., with a parameter)?

Oops, nevermind. I forgot the callback took argument.

Copy link
Contributor

@JoseBarreiros-TRI JoseBarreiros-TRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 unresolved discussions, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)


bindings/pydrake/examples/gym/envs/cart_pole.py line 418 at r4 (raw file):

        observation_port_id="observations",
        reset_handler=reset_handler,
        info_handler=lambda _: dict(),

why not set this to None or better provide a nice example of an info handler


bindings/pydrake/gym/_drake_gym_env.py line 257 at r4 (raw file):

            terminated = False
            reward = 0
            info = self.info_handler(self.simulator)

I'm not sure about the expected behavior here. I think that if the run gets to this point it means that the simulator faulted which would cause an unexpected behavior when called by the info_handler(). Consider setting the info to an empty dict or info["sim_runtime_error"]=True.

Copy link
Contributor

@JoseBarreiros-TRI JoseBarreiros-TRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samzapo I'm happy to help land this PR, let me know if you have any questions.

Reviewable status: 2 unresolved discussions, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)

Copy link
Collaborator Author

@samzapo samzapo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @JoseBarreiros-TRI , I apologize for the delay, I was on vacation last week. I'll resume working on this today.

Reviewable status: 2 unresolved discussions, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits)


bindings/pydrake/gym/_drake_gym_env.py line 257 at r4 (raw file):

Previously, JoseBarreiros-TRI wrote…

I'm not sure about the expected behavior here. I think that if the run gets to this point it means that the simulator faulted which would cause an unexpected behavior when called by the info_handler(). Consider setting the info to an empty dict or info["sim_runtime_error"]=True.

I don't want to add anything to info that isn't standard.
This is what motivated adding an info handler over explicitly adding info["timestamp"]

I think it's likely that simulator will still behave normally after most runtime errors, but this is a good point
I'd go for just returning an empty dict in this case, and adding documentation that says, e.g.,
"if the simulator emits a runtime error while advancing, the provided info handler will not be called and will return an empty dict instead".

Would that work?

Copy link
Collaborator Author

@samzapo samzapo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 unresolved discussions, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)


bindings/pydrake/examples/gym/envs/cart_pole.py line 418 at r4 (raw file):

Previously, JoseBarreiros-TRI wrote…

why not set this to None or better provide a nice example of an info handler

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes: feature This pull request contains a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a "timestamp" item to the info return value from DrakeGymEnv.step(...)
3 participants