Question about the output of the decision transformer #67
Comments
I ran into the same question. I suspect the reason this causes no problem in practice is that the Decision Transformer training code never actually uses return_preds (the predicted return at the next timestep) or state_preds.
I think the comment in min-decision-transformer is easier to understand: https://github.com/nikhilbarhate99/min-decision-transformer/blob/d6694248b48c57c84fc7487e6e8017dcca861b02/decision_transformer/model.py#L152
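To illustrate why the unused heads are harmless, here is a minimal sketch (shapes and the `mask` variable are hypothetical, and numpy stands in for PyTorch) of how the gym training loop computes its loss from action_preds alone, leaving state_preds and return_preds untouched:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, state_dim, act_dim = 4, 10, 17, 3

# All three heads produce outputs, but only the action head enters the loss.
state_preds = rng.normal(size=(batch, seq_len, state_dim))   # ignored by the loss
return_preds = rng.normal(size=(batch, seq_len, 1))          # ignored by the loss
action_preds = rng.normal(size=(batch, seq_len, act_dim))
action_target = rng.normal(size=(batch, seq_len, act_dim))

# Hypothetical validity mask over non-padded timesteps.
mask = np.ones((batch, seq_len), dtype=bool)

# Mean-squared error over valid action predictions only (continuous actions).
loss = np.mean((action_preds[mask] - action_target[mask]) ** 2)
```

Since gradients only flow from `loss`, any mislabeling of the return/state heads would not affect what the policy learns.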
Hi - You can view it as next token prediction: the input sequence is interleaved as (R_1, s_1, a_1, R_2, s_2, a_2, ...), and after the transformer the output is reshaped so that x[:, 0], x[:, 1], and x[:, 2] are the hidden states at the return, state, and action tokens of each timestep, respectively.
Since attention is causal, the hidden state at the action token a_t has attended to (R_t, s_t, a_t). Therefore, with the above formulation of the predictions, we have: predict_return(x[:, 2]) predicts R_{t+1} given (R_t, s_t, a_t), and predict_state(x[:, 2]) predicts s_{t+1} given (R_t, s_t, a_t), while predict_action(x[:, 1]) predicts a_t given (R_t, s_t).
If we wanted to make a prediction using only the return and state, we would read it from x[:, 1] instead, which is exactly what the action head does. Sorry I have been away for a very long time. Best, Kevin
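The interleaving and indexing described in this thread can be verified with a small sketch (numpy used in place of PyTorch; the constant-valued "embeddings" are just tags to track each modality through the reshapes, mirroring the logic in decision_transformer.py):

```python
import numpy as np

batch, seq_len, dim = 2, 4, 8

# Tag each modality with a distinct constant so we can follow it.
returns = np.full((batch, seq_len, dim), 0.0)  # returns-to-go R_t
states = np.full((batch, seq_len, dim), 1.0)   # states s_t
actions = np.full((batch, seq_len, dim), 2.0)  # actions a_t

# Interleave as (R_1, s_1, a_1, R_2, s_2, a_2, ...) along the sequence axis.
stacked = np.stack((returns, states, actions), axis=1)              # (B, 3, T, D)
stacked = stacked.transpose(0, 2, 1, 3).reshape(batch, 3 * seq_len, dim)

# (In the real model, `stacked` is run through a causal transformer here;
# the reshape bookkeeping is the same either way.)
x = stacked.reshape(batch, seq_len, 3, dim).transpose(0, 2, 1, 3)   # (B, 3, T, D)

# x[:, 0] sits at the return tokens, x[:, 1] at the state tokens, and
# x[:, 2] at the action tokens. With causal attention, the output at the
# action token a_t has seen (R_t, s_t, a_t), so predictions from x[:, 2]
# condition on the full timestep, not on the action alone.
```

Running this confirms that x[:, 2] recovers only the action-tagged positions (all 2.0), which is why predict_return(x[:, 2]) and predict_state(x[:, 2]) condition on (R_t, s_t, a_t) despite the index appearing to select "just actions".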
From the code here:
https://github.com/kzl/decision-transformer/blob/master/gym/decision_transformer/models/decision_transformer.py#L92-L99
I'm not sure I understand why self.predict_return(x[:, 2]) or self.predict_state(x[:, 2]) is said to predict the return/next state given the state and action. From the comment at the top, x[:, 2] corresponds only to the action token, doesn't it? Am I missing something?
And if this code is correct, what is x[:, 0] used for?
I have also asked this question in the huggingface/transformers repo:
huggingface/transformers#27916