Question regarding imagination process #272

Closed
anthony0727 opened this issue Apr 25, 2024 · 5 comments

@anthony0727

I can't fully understand the imagination process in https://github.com/Eclectic-Sheep/sheeprl/blob/main/notebooks/dreamer_v3_imagination.ipynb.

Starting from the "context" saved at the beginning of imagination,

    if i == initial_steps - imagination_steps - 1:
        stochastic_state = player.stochastic_state.clone()
        recurrent_state = player.recurrent_state.clone()

the imagination step is performed as follows:

        # imagination step
        imagined_stochastic_state, recurrent_state = world_model.rssm.imagination(
            stochastic_state, recurrent_state, actions
        )

But doesn't stochastic_state_{t-1} have to be fed into world_model.rssm.imagination to output stochastic_state_{t}? That is, shouldn't the result be assigned back to stochastic_state:

        # imagination step
        stochastic_state, recurrent_state = world_model.rssm.imagination(
            stochastic_state, recurrent_state, actions
        )

I'd really appreciate it if you could help me understand this process!

@belerico
Member

Hi @anthony0727, the notebook has been designed like this. Suppose that we want our agent to play for 200 steps while imagining for 45 steps (as by default in the notebook). Our final objective is to compare how the imagination differs from the real behaviour: in our example we want to compare the last 45 steps. So:

  1. First, the agent plays for initial_steps while saving everything in the rb_initial buffer.
  2. At step initial_steps - imagination_steps - 1 we save the recurrent and stochastic states: these will be used as the starting point for the imagination.
  3. Then we imagine for imagination_steps. During this time one can choose either to really imagine the actions or to reuse the ones already played. Those actions are used to compute the next stochastic and recurrent states from the world model, which are then used to reconstruct the image with the decoder. At the same time we reconstruct the image from the stochastic and recurrent states that were really played, so that we can also compare against the reconstruction of the played frames.
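The three steps above can be sketched with a toy example. This is only an illustration under loud assumptions: `toy_world_step` is a hypothetical stand-in for both the environment and the world model, the state is a single integer, and the policy always picks the same action. None of these names come from sheeprl; only `initial_steps` and `imagination_steps` mirror the notebook's variables.

```python
# Toy sketch of the three phases; nothing here is a sheeprl API.
initial_steps = 200
imagination_steps = 45


def toy_world_step(state, action):
    """Hypothetical stand-in for one environment / world-model transition."""
    return state + action


# Phase 1: play for initial_steps, storing every visited state.
played_states = []
state = 0
snapshot = None
for i in range(initial_steps):
    action = 1  # placeholder policy: always the same action
    state = toy_world_step(state, action)
    played_states.append(state)
    # Phase 2: save the state that will seed the imagination.
    if i == initial_steps - imagination_steps - 1:
        snapshot = state

# Phase 3: imagine forward from the snapshot for imagination_steps.
imagined_states = []
imagined = snapshot
for _ in range(imagination_steps):
    imagined = toy_world_step(imagined, 1)
    imagined_states.append(imagined)

# The imagined rollout is compared against the last imagination_steps played steps.
real_tail = played_states[-imagination_steps:]
```

Because the toy transition is deterministic and shared by both phases, `imagined_states` and `real_tail` coincide here; with a learned world model the two would only be approximately equal, which is exactly what the notebook is meant to visualize.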

Is it clearer now?

@anthony0727
Author

anthony0727 commented Apr 26, 2024

Thanks for the reply!

But my question is:

why isn't the "next" stochastic state used to compute the next-next stochastic state, as is done in behavior learning?

    imagined_prior, recurrent_state = world_model.rssm.imagination(imagined_prior, recurrent_state, actions)

vs

        # imagination step
        imagined_stochastic_state, recurrent_state = world_model.rssm.imagination(
            stochastic_state, recurrent_state, actions
        )
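To make the difference between the two snippets concrete, here is a minimal sketch. It assumes a made-up, deterministic `toy_imagination` function that only mimics the call shape of `world_model.rssm.imagination` (two states and an action in, two states out); it is not the sheeprl implementation, and the arithmetic inside it is arbitrary.

```python
def toy_imagination(stochastic_state, recurrent_state, action):
    """Hypothetical stand-in for world_model.rssm.imagination (not sheeprl code)."""
    new_recurrent = recurrent_state + stochastic_state + action
    new_stochastic = new_recurrent * 2
    return new_stochastic, new_recurrent


def rollout(feed_back, steps=3):
    """Roll the toy model forward, optionally feeding the imagined state back in."""
    stochastic_state, recurrent_state = 1, 0
    trajectory = []
    for _ in range(steps):
        imagined, recurrent_state = toy_imagination(stochastic_state, recurrent_state, 1)
        if feed_back:
            # Behavior-learning style: the output becomes the next input.
            stochastic_state = imagined
        # Otherwise stochastic_state stays frozen at its initial value.
        trajectory.append(imagined)
    return trajectory


stale = rollout(feed_back=False)   # input stochastic state never updated
chained = rollout(feed_back=True)  # output reassigned at every step
```

In this toy setup the two rollouts agree on the first step and diverge from the second step onward, since only the chained version conditions each step on the previously imagined stochastic state.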

@belerico
Member

belerico commented Apr 27, 2024

You're super right! Thank you both @anthony0727 and @michele-milesi for spotting that! I thought I was going blind!
I'll fix it up right now.

@belerico
Member

I opened a new branch here: can you try it out?

@anthony0727
Author

Yup I think the fix is correct! We can close this issue!
