Add support for a simple preference UI #712
Conversation
src/imitation/util/video_wrapper.py
Outdated
```python
self.video_recorder.capture_frame()
return res
# is it crazy to save the video path at every step?
info["video_path"] = self.get_current_video_path()
```
Probably best to only store this on the first step
The fragmenter will splice trajectories into multiple fragments, so storing it once per episode may not be enough (unless you make the fragmenter cleverer and have it propagate the video path as needed). You can avoid extra memory by making sure `self.get_current_video_path()` returns the same object on successive calls if the underlying path hasn't changed (some kind of memoization should work).
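To illustrate the memoization idea (a minimal sketch, not the PR's actual implementation — the class and attribute names here are hypothetical):

```python
from pathlib import Path
from typing import Optional


class VideoPathCache:
    """Sketch: return the *same* Path object on successive calls while
    the underlying path string is unchanged, so repeated storage in
    step infos shares one object instead of many equal copies."""

    def __init__(self) -> None:
        self._cached: Optional[Path] = None

    def get_current_video_path(self, raw_path: str) -> Path:
        # Only allocate a new Path when the underlying string changes.
        if self._cached is None or str(self._cached) != raw_path:
            self._cached = Path(raw_path)
        return self._cached
```

With this, every `info["video_path"]` within one recording would hold a reference to a single shared object.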
I think the issue I was worried about was that the trajectories would be stored on disk, in which case the video path would be duplicated a lot. Although maybe pickle is smart enough to make it a reference if it's the same object? IIRC `str` is immutable, so it probably gets stored on disk as a value, but `Path` probably won't be.
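For what it's worth, a quick standalone check (not from the PR) of pickle's behavior: within a single dump, pickle memoizes objects by identity, so a repeated object is written once and restored as shared references, while equal-but-distinct objects are each serialized in full.

```python
import pickle
from pathlib import Path

# The same Path object stored twice: pickle writes it once and the
# second slot becomes a back-reference, so identity survives a round trip.
p = Path("/tmp/video.mp4")
restored = pickle.loads(pickle.dumps([p, p]))
assert restored[0] is restored[1]

# Two equal but *distinct* Path objects are each serialized in full,
# so they come back as two separate objects.
q = [Path("/tmp/video.mp4"), Path("/tmp/video.mp4")]
restored_q = pickle.loads(pickle.dumps(q))
assert restored_q[0] is not restored_q[1]
```

So returning the same object from `get_current_video_path()` would indeed keep the on-disk size down.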
Thanks for implementing this! Looks like the implementation is moving in the right direction. Agreed we should add a test case and a notebook demo. I've left some relatively minor comments on the implementation so far.
""" | ||
|
||
preferences = np.zeros(len(fragment_pairs), dtype=np.float32) | ||
for i, (frag1, frag2) in enumerate(fragment_pairs): |
Could also write as something like (not tested, may need to be modified):

```python
preferences = [self._display_videos(*frag) for frag in fragment_pairs]
preferences = np.asarray(preferences, dtype=np.float32)
```

Not sure that's actually any better though -- more succinct but also a bit more magical. WDYT Tom & @levmckinney?
I don't love using a list comprehension for UI, because it obfuscates the fact that it's going to be slow and synchronous. A list comprehension feels functional to me, which would imply that the order of operations doesn't matter.
```python
    raise ValueError(f"Unexpected input {pref}")
else:
    print("Which video is preferred? (1 or 2, or q to quit, or r to replay):\n")
    cap1 = cv2.VideoCapture(str(frag1_video_path))
```
The IPython and CV sections are largely independent except for the initial path extraction -- consider moving this whole `else:` branch into a helper function analogous to `self._display_videos_in_notebook`?
Yeah, I'll do that. Although the way I built it, `_display_videos_in_notebook` only displays the videos, while the display and interaction code isn't cleanly separated in this code. (I could refactor to make it more modular and give both the same shape, but that feels like overkill.)
```python
        ret1, frame1 = cap1.read()
        ret2, frame2 = cap2.read()
    elif key == "1" or key == "2":
        cap1.release()
```
Do we always want these lines to be executed? If so, might want to move them to a `finally:` block to both ensure they always execute and cut the duplication of lines 1026-1028.
Maybe I'm wrong, but based on Googling it looks like `finally` can only be used with try/except blocks? Were you thinking I'd add a dummy try/except here?
Or for the KeyboardInterrupt case? TBH I think I don't need to run these functions in the KeyboardInterrupt case, because it's going to kill the program anyway. EDIT: I guess it's possible the KeyboardInterrupt is caught elsewhere, so never mind.
You don't need an `except:`; it can just be:

```python
try:
    # your regular code
finally:
    # code to always run
```

Not that important in this case, but yes, in general a variety of exceptions could occur (including KeyboardInterrupt) and be caught elsewhere in the code.
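To make the pattern concrete (a generic, self-contained sketch — `FakeCapture` is a stand-in for `cv2.VideoCapture`, not the PR's actual code): cleanup in a `finally` block runs whether the function exits via `return` or via an exception such as `KeyboardInterrupt`.

```python
class FakeCapture:
    """Stand-in for cv2.VideoCapture so the sketch is self-contained."""

    def __init__(self) -> None:
        self.released = False

    def release(self) -> None:
        self.released = True


def gather_preference(cap1: FakeCapture, cap2: FakeCapture, key: str) -> str:
    try:
        if key == "q":
            raise KeyboardInterrupt  # simulate the user aborting
        return key  # "1" or "2"
    finally:
        # Runs on normal return *and* when an exception propagates,
        # so both captures are released on every exit path.
        cap1.release()
        cap2.release()
```

Even when the `KeyboardInterrupt` propagates out of `gather_preference`, both captures have been released by the time a caller catches it.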
FWIW I haven't cleaned this up at all, so it's not quite ready for review - I probably should have included "[WIP]" in the title.
```python
print("Which video is preferred? (1 or 2, or q to quit, or r to replay):\n")
cap1 = cv2.VideoCapture(str(frag1_video_path))
cap2 = cv2.VideoCapture(str(frag2_video_path))
cv2.namedWindow("Video 1", cv2.WINDOW_NORMAL)
```
Is "video" the right word here? "Trajectory"? "Demonstration"?
Force-pushed from bd31608 to f176bdc
```python
@@ -906,6 +908,201 @@ def _reward_sums(self, fragment_pairs) -> Tuple[np.ndarray, np.ndarray]:
    return np.array(rews1, dtype=np.float32), np.array(rews2, dtype=np.float32)


class SynchronousCLIGatherer(PreferenceGatherer):
```
It occurs to me that this isn't very testable right now
I'm going to leave this untested for now, since I have other tasks I need to attend to. 🤞 that I'll get to it later, but I'm not sure I will.
Mostly looks good; some minor changes needed, plus the addition of a test case.
@levmckinney reckon you could take this PR over the finish line?
" rng=rng,\n", | ||
")\n", | ||
"\n", | ||
"# This gatherer will show the user (you!) pairs of trajectories and ask it to choose\n", |
"# This gatherer will show the user (you!) pairs of trajectories and ask it to choose\n", | |
"# This gatherer will show the user (you!) pairs of trajectories and ask them to choose\n", |
Most people's pronouns aren't "it" ;)
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"From this point onward, this notebook is the same as [the synthetic gatherer notebook](5_train_preference_comparisons.ipynb).\n", |
Consider adding a comment in `5_train_preference_comparisons.ipynb` to this effect, so that we don't inadvertently make edits in that file that cause the two to drift out of sync?
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Now we can train an agent, that only sees those learned reward." |
"Now we can train an agent, that only sees those learned reward." | |
"Now we can train an agent, that only sees those learned rewards." |
```
Args:
    video_dir: directory where videos of the trajectories are saved.
    video_width: width of the video in pixels.
```
I think this and `video_height` only have an effect when not running in IPython. Consider either documenting this or changing it to control the width/height in IPython too? (Or perhaps we can follow the IPython default and have the CV windows' width/height default to the size of the recorded videos.)
```python
def _in_ipython(self) -> bool:
    try:
        return self.is_running_pytest_test() or get_ipython().__class__.__name__ == "ZMQInteractiveShell"  # type: ignore[name-defined]  # noqa
```
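For context, a standalone version of this detection pattern (a sketch; the PR's `is_running_pytest_test` helper is omitted here, so this only shows the IPython half):

```python
def in_ipython_notebook() -> bool:
    """Return True when running inside a Jupyter/IPython kernel.

    get_ipython is injected into the namespace by IPython; in a plain
    Python process it does not exist, so NameError means "not in IPython".
    """
    try:
        shell = get_ipython().__class__.__name__  # type: ignore[name-defined]
        return shell == "ZMQInteractiveShell"  # Jupyter kernel
    except NameError:
        return False
```

In a plain interpreter (or under pytest without IPython) this returns `False`, which is what makes the OpenCV branch reachable in normal runs.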
Does this mean we never test `_display_in_windows`?
```python
@@ -166,7 +169,18 @@ def train_preference_comparisons(

    custom_logger, log_dir = logging_ingredient.setup_logging()

    with environment.make_venv() as venv:
        post_wrappers = (
```
The long expression is a bit hard to follow; perhaps:

```python
post_wrappers = None
if video_log_dir is not None:
    post_wrappers = [...]
```
```python
        True,
        np.array([1.0]),
    ),
    types.TrajectoryWithRew(
```
Is this identical to the one above? Can we assign it to a temporary variable, e.g. `traj`, and then rewrite it as `trajectory_pairs = [(traj, traj)]`?
```python
    ),
]

# this is the actual test
```
Can we test the "r" or "q" functionality somehow?
```python
mock_input.return_value = "2"
assert gatherer(trajectory_pairs) == np.array([0.0])
```
TODO: test the functionality using `self._display_in_windows`, not just IPython? (Probably only worth doing if it's relatively easy, e.g. ~1 hour of work; otherwise we can just take the hit in code coverage.)
I did a general code review of the library changes and left a few comments. A few more notes:
- Right after running the tutorial, this is what I see:
[screenshot: rendering windows opened while running the tutorial]
It's probably because we're still running on gym==0.21.0, which uses pyglet for rendering, which is a ton of pain. Gymnasium uses pygame, so it might be easier to make headless, but I'm not 100% sure right now.
If we could remove the individual rendering for the learning envs, that would be great. The one way I see to do this is an env without rendering for learning, and then a separate one for rendering -- would this be easy to do? I think if it works, it'd be a good improvement to the user experience of the tutorial.
- Maybe an option to skip a given sample? (In the 1-2-r-q choice, add `s` for skip.) Sometimes it's so hard to tell that it feels better to not choose anything; not sure how well this plays with the algorithm (it should be alright to just get another sample, or remove that one from the dataset).
- If possible, maybe put the videos side by side.

This is in order of importance, and all of them are mainly just QoL improvements, depending on how much we want to spend on this tutorial. On the other hand, if we (or someone else) intend to use the UI for larger real workloads, they become somewhat more significant.
Also, for the sake of the tutorial, it'd be good to have some more commentary around each created object -- what's a reward net, what's a fragmenter, a preference model, etc.
```python
def step(self, action):
    res = self.env.step(action)
    obs, rew, done, info = self.env.step(action)
```
This will cause an issue during the switch to Gymnasium, which has a 5-element `step` return.
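For illustration, one way to insulate the wrapper from the API change (a sketch under the assumption that the wrapper only needs obs/reward/done/info; this is not the PR's code):

```python
def unpack_step(result):
    """Normalize env.step() output to (obs, reward, done, info).

    Old gym returns (obs, reward, done, info); Gymnasium returns
    (obs, reward, terminated, truncated, info).
    """
    if len(result) == 5:  # Gymnasium API
        obs, reward, terminated, truncated, info = result
        return obs, reward, terminated or truncated, info
    obs, reward, done, info = result  # legacy gym API
    return obs, reward, done, info
```

Collapsing `terminated`/`truncated` into a single `done` loses the distinction between the two end-of-episode reasons, so this is only appropriate where the wrapper genuinely doesn't care which one occurred.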
```python
def close(self) -> None:
    if self.video_recorder is not None:
        self.video_recorder.close()
        self.video_recorder = None
    if self.delete_on_close:
        for path in self.directory.glob("*.mp4"):
            path.unlink()
```
This might cause an unlikely failure case if one of the videos is deleted during the iteration; it might be good to include a try-except clause.
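A minimal sketch of the tolerant deletion being suggested (hypothetical free function standing in for the wrapper's `close`; on Python ≥ 3.8 `path.unlink(missing_ok=True)` would work too):

```python
import tempfile
from pathlib import Path


def delete_videos(directory: Path) -> int:
    """Delete all .mp4 files, ignoring ones that vanish mid-iteration."""
    deleted = 0
    for path in directory.glob("*.mp4"):
        try:
            path.unlink()
            deleted += 1
        except FileNotFoundError:
            # Another process (or an earlier close()) already removed it.
            pass
    return deleted


# Usage: deleting twice is safe; the second pass finds nothing to remove.
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "a.mp4").touch()
    (d / "b.mp4").touch()
    assert delete_videos(d) == 2
    assert delete_videos(d) == 0
```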
One more thing I noticed -- I don't think this will delete the directories created along the way? But we need to be careful in case someone points it at "." and that then gets deleted upon closing.
```python
@patch("builtins.input")
@patch("IPython.display.display")
def test_synchronous_human_gatherer(mock_display, mock_input):
    del mock_display  # unused
```
Does this actually do anything? Manually deleting variables is rarely needed in Python.
```python
ret1, frame1 = cap1.read()
ret2, frame2 = cap2.read()

key = chr(cv2.waitKey(1) & 0xFF)
```
Can we add a comment here for any future poor souls who have to wonder why there's a `0xFF` in here?
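For reference, a standalone illustration of why the mask matters (the integer values here are simulated, `cv2.waitKey` itself isn't called): on some platforms `waitKey` returns an integer whose bits above the low byte carry keyboard-state flags, so `& 0xFF` extracts just the key's ASCII code.

```python
# Simulated cv2.waitKey() return values: the low byte holds the key code,
# higher bits may carry platform/keyboard-state flags.
raw_clean = ord("q")               # 113: some platforms return just the code
raw_flagged = 0x100000 | ord("q")  # same key with a flag bit set

# Without the mask, chr() on the flagged value yields the wrong character;
# masking with 0xFF recovers the pressed key either way.
assert chr(raw_clean & 0xFF) == "q"
assert chr(raw_flagged & 0xFF) == "q"
assert chr(raw_flagged) != "q"
```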
Oh wow, that is ugly! I don't think I saw this when I ran it; it may be a version or OS difference. I'm confused why the learning envs are rendering -- with Pendulum I thought the policy and reward took raw observations as input, and rendering was just used for the human preference comparisons? Making it all headless is obviously good if we're capturing the video anyway and showing it in the notebook. I'd be OK merging this PR without fixing this and just making an issue to track it -- it still seems better to have the tutorial than not.
Yeah, I think the original DRLHP paper had an option for contractors to skip, which just ignored the sample. Not that principled, but it works OK in practice.
I think this is too toy to be used for real human experiments; #716 will be better for real workloads.
The windows can be avoided by using a virtual frame buffer.
Codecov Report

Attention: Patch coverage is below the project total.

```
@@            Coverage Diff             @@
##           master     #712      +/-   ##
==========================================
- Coverage   96.29%   95.82%    -0.48%
==========================================
  Files          91       91
  Lines        8728     8836     +108
==========================================
+ Hits         8405     8467      +62
- Misses        323      369      +46
```

☔ View full report in Codecov by Sentry.
Description
See #711
Testing
TODO: add notebook and experiment config that use this feature, and screenshots of behavior. (I've tested myself but not in a clean way.)