Review #1 #6
We are extremely grateful to the reviewer for their thoughtful comments. We have made a number of changes thanks to their suggestions.
The following peer review was solicited as part of the Distill review process.
The reviewer chose to keep anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.
General Comments
This paper combines several techniques (e.g., dimensionality reduction) to inspect the visual features learned by a reinforcement learning (RL) model. The main hypothesis made in the paper is the diversity hypothesis: if RL models are trained on more diverse environments, the models will become more interpretable. To support this hypothesis, the paper uses the procedurally generated video game environment CoinRun as its research platform and takes advantage of feature visualization techniques to identify visually interpretable focal points of the model. The interfaces provided in the paper offer a rich collection of visual examples from models trained with different experimental settings, which support the hypothesis well and are very helpful for readers trying to understand the claim.
This paper is well written and provides many visual examples to help explain the ideas. Below are some suggestions for improving the writing:
It would be more intuitive for readers who are not familiar with CoinRun to play the game themselves via keyboard controls. I suggest adding an interactive interface at the beginning of the paper. This would help readers understand how easy or difficult CoinRun is to play and how responsive the controls are.
For footnote 2, it’s worth pointing out that the darkness of the block represents the magnitude of the velocity.
Can you include some failed rollouts in the “Model Analysis” section, to show that the value function can tell that the agent will die before the episode ends?
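To sketch what such a figure might show (a hypothetical illustration with made-up numbers; `values` stands in for the value head’s estimates along a failed rollout):

```python
import matplotlib.pyplot as plt

# Made-up value estimates along a hypothetical failed rollout: if the value
# function anticipates the failure, the estimates should drop well before
# the final frame, not just at the moment the agent dies.
values = [0.9, 0.9, 0.85, 0.7, 0.4, 0.1, 0.0]

plt.plot(range(len(values)), values, marker="o")
plt.axvline(len(values) - 1, linestyle="--", color="grey", label="agent dies")
plt.xlabel("timestep")
plt.ylabel("value estimate")
plt.legend()
plt.show()
```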
It’s not straightforward what the “attribution channel totals” mean or what the y-axis represents. A better caption or explanation is needed.
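One plausible reading, offered here only as an assumption to motivate a clearer caption, is that each bar sums the attribution over all spatial positions for one channel, roughly:

```python
import numpy as np

# Assumed: an attribution map for one layer with shape (height, width,
# channels), e.g. gradient-times-activation. This is a guess at what the
# plot aggregates, not the paper's confirmed definition.
attribution = np.random.randn(16, 16, 32)

# "Channel totals": sum attribution over spatial positions, one per channel.
channel_totals = attribution.sum(axis=(0, 1))  # shape: (32,)
```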
It would be easier to see the action choice if the next action were shown directly on the policy probabilities. For example, using a different color for the next action’s bar in the probability distribution would be more straightforward.
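As a concrete illustration of this suggestion (a hypothetical plotting sketch, not the paper’s actual code; the `probs` values and `next_action` are assumed):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical policy distribution over a small discrete action set.
probs = np.array([0.05, 0.10, 0.55, 0.20, 0.10])
next_action = 2  # the action actually taken at this timestep (assumed)

# Draw all bars in grey, then recolor the bar for the action taken.
colors = ["lightgrey"] * len(probs)
colors[next_action] = "tab:red"

plt.bar(range(len(probs)), probs, color=colors)
plt.xlabel("action")
plt.ylabel("policy probability")
plt.show()
```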
As mentioned in the “Dissecting failure” section, the failure is due to the lack of memory and stochastic sampling. Have you tried running the same analysis while the agent always picks the action with the highest probability (a greedy policy) at test time? And have you tried an RNN policy? Do these help reduce failure cases?
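For reference, the two test-time action rules being contrasted differ only in the final selection step (a minimal sketch; `logits` is an assumed stand-in for the policy head’s output on one observation):

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

logits = np.array([1.2, 0.3, 2.5, 0.1])  # assumed policy-head output
probs = softmax(logits)

# Stochastic policy: sample an action from the distribution (what the
# paper appears to use at test time).
sampled_action = np.random.choice(len(probs), p=probs)

# Greedy policy: deterministically pick the highest-probability action
# (the alternative the question asks about).
greedy_action = int(np.argmax(probs))
```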
It would help readers (especially those who are not familiar with the CoinRun environment) see clearly what is in the scene if a full-resolution observation were added next to the compressed observation.
The “Landing platform moving off-screen” example is not very convincing, as even in the first few frames the platform on the right is fully visible in the view. The agent fails here because it jumps too early, and once the agent is in the air, its trajectory in CoinRun is barely affected by whatever action it takes. If the agent moved one more step to the right before jumping, it would succeed. So it doesn’t seem that the agent fails here because the platform moves out of view.
Footnote 8 mentions that the features are obtained by applying attention-based NMF to layer 2b of the model, but the paper doesn’t provide a detailed description of the network architecture, so it’s not clear which layer was used for the analysis.
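For readers unfamiliar with the technique, a plain NMF over a convolutional layer’s activations looks roughly like the sketch below. This is a hedged illustration only, since the paper’s attention-based variant additionally weights the factorization by attribution, and the layer shapes here are assumptions:

```python
import numpy as np
from sklearn.decomposition import NMF

# Assumed activations from the layer under study, with shape
# (batch, height, width, channels); non-negative, as after a ReLU.
acts = np.random.rand(64, 16, 16, 32)

# Flatten batch and spatial dimensions so each row is one spatial position.
flat = acts.reshape(-1, acts.shape[-1])  # shape: (64 * 16 * 16, 32)

# Factorize into a small number of non-negative directions in channel space.
nmf = NMF(n_components=8, init="nndsvda", max_iter=500)
loadings = nmf.fit_transform(flat)  # per-position strength of each feature
features = nmf.components_          # (8, 32): the feature directions
```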
Overall, I think this paper provides a valuable method and example of understanding visual features learned in an RL model and their interpretability. It’s a valuable contribution to the RL community.
Distill employs a reviewer worksheet to help reviewers.
The first three parts of this worksheet ask reviewers to rate a submission along certain dimensions on a scale from 1 to 5. While the scale meaning is consistently "higher is better", please read the explanations for our expectations for each score—we do not expect even exceptionally good papers to receive a perfect score in every category, and expect most papers to be around a 3 in most categories.
Any concerns or conflicts of interest that you are aware of?: No known conflicts of interest
What type of contributions does this article make?: Explanation of existing results