
dv3 Imagination notebook re-visited #285

Closed
anthony0727 opened this issue May 12, 2024 · 6 comments · Fixed by #290

Comments


anthony0727 commented May 12, 2024

I can't find a button to reopen the issue.
Regarding #272, I have two questions!

  1. Shouldn't the initial stochastic_state be flattened (see the sketch below), s.t.
stochastic_state = player.stochastic_state.view(1, 1, -1).clone()

in

    if i == initial_steps - imagination_steps - 1:
        stochastic_state = player.stochastic_state.clone()
        recurrent_state = player.recurrent_state.clone()
  2. Shouldn't 0.5 also be added to the imagined reconstructed obs, s.t.
        step_data["rgb"] = rec_obs["rgb"].unsqueeze(0).detach().cpu().numpy() + 0.5

in

 # imagination step
        stochastic_state, recurrent_state = world_model.rssm.imagination(stochastic_state, recurrent_state, actions)
        # update current state
        imagined_latent_states = torch.cat((stochastic_state.view(1, 1, -1), recurrent_state), -1)
        rec_obs = world_model.observation_model(imagined_latent_states)
        step_data["rgb"] = rec_obs["rgb"].unsqueeze(0).detach().cpu().numpy()
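To make points 1. and 2. concrete, here is a tiny standalone sketch (the shapes and the linear stand-in decoder are placeholder assumptions, not the sheeprl world model):

    import torch

    # Illustrative shapes only (not taken from sheeprl): a 32x32 one-hot stochastic
    # state and a 512-dim recurrent state for a single environment.
    stochastic_state = torch.zeros(1, 1, 32, 32)
    recurrent_state = torch.zeros(1, 1, 512)

    # 1. flatten the (discrete, classes) stochastic state before concatenating
    latent = torch.cat((stochastic_state.view(1, 1, -1), recurrent_state), -1)  # (1, 1, 1536)

    # stand-in decoder that outputs a normalized frame in [-0.5, 0.5]
    decoder = torch.nn.Linear(latent.shape[-1], 3 * 64 * 64)
    rec_rgb = torch.tanh(decoder(latent)).view(1, 1, 3, 64, 64) * 0.5

    # 2. shift the reconstruction back to [0, 1] before storing it as an image
    frame = (rec_rgb + 0.5).detach().cpu().numpy()
    assert 0.0 <= frame.min() and frame.max() <= 1.0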

anthony0727 commented May 12, 2024

  3. Also, the -0.5 is missing here.
    This is quite critical in the test phase: when testing with the notebook, I couldn't get a return similar to the one seen during training, and this was the main cause (a self-contained sketch of the corrected loop follows after the quoted snippet).
                preprocessed_obs[k] = preprocessed_obs[k] / 255.0 - 0.5

in

 preprocessed_obs[k] = torch.as_tensor(v[np.newaxis], dtype=torch.float32, device=fabric.device)
            if k in cfg.algo.cnn_keys.encoder:
                preprocessed_obs[k] = preprocessed_obs[k] / 255.0 
        mask = {k: v for k, v in preprocessed_obs.items() if k.startswith("mask")}
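As promised, a self-contained version of the corrected preprocessing (the helper name and the toy observation are mine, purely to illustrate the normalization; the real notebook uses cfg.algo.cnn_keys.encoder and fabric.device as in the snippet above):

    import numpy as np
    import torch

    def preprocess_obs(obs, cnn_keys, device="cpu"):
        # hypothetical helper mirroring the notebook loop, with the missing "- 0.5" added
        preprocessed_obs = {}
        for k, v in obs.items():
            preprocessed_obs[k] = torch.as_tensor(v[np.newaxis], dtype=torch.float32, device=device)
            if k in cnn_keys:
                # match the training-time normalization: [0, 255] -> [-0.5, 0.5]
                preprocessed_obs[k] = preprocessed_obs[k] / 255.0 - 0.5
        return preprocessed_obs

    # toy usage: a fake 64x64 RGB frame
    obs = {"rgb": np.random.randint(0, 256, (3, 64, 64), dtype=np.uint8)}
    out = preprocess_obs(obs, cnn_keys=("rgb",))
    print(out["rgb"].min().item(), out["rgb"].max().item())  # roughly -0.5 and 0.5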
  4. Shouldn't the fixes below be made to align the buffers' timelines? I'm a bit confused about this, though.
    I think the -1 for actions is correct,
            # actions actually played by the agent
            actions = torch.tensor(
                rb_initial["actions"][-imagination_steps + i - 1], # (anthony) subtract 1
                device=fabric.device,
                dtype=torch.float32,
            )[None]

I would have thought the original -imagination_steps + i, combined with the -1 for actions, was correct, but instead I got the desired trajectory with -imagination_steps + i + 1... confusing.

        # reconstruct the observations from the latent states used when interacting with the environment
        played_latent_states = torch.cat(
            (
                torch.tensor(rb_initial["stochastic_state"][-imagination_steps + i + 1], device=fabric.device), # (anthony) add 1
                torch.tensor(rb_initial["recurrent_state"][-imagination_steps + i + 1], device=fabric.device), # (anthony) add 1
            ),
            -1,
        )
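To show what the two offsets actually pick up, here is a tiny index-only sketch (toy values, not the real replay buffer):

    import numpy as np

    # Toy buffer of 8 steps, 3 imagination steps; the values stand in for a[t] / z[t].
    rb_initial = {
        "actions": np.arange(8),
        "stochastic_state": np.arange(8),
    }
    imagination_steps = 3

    for i in range(imagination_steps):
        a_idx = -imagination_steps + i - 1  # my "-1" variant for actions
        z_idx = -imagination_steps + i + 1  # my "+1" variant for latent states
        print(i, rb_initial["actions"][a_idx], rb_initial["stochastic_state"][z_idx])

    # Note that on the last step z_idx becomes 0, i.e. the *first* element of the
    # buffer rather than one of the last ones, which is part of what confuses me.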

@anthony0727 (Author)

With the changes above, now I got the desired performance & results!

michele-milesi commented May 13, 2024

Hi @anthony0727,

  1. The player's stochastic state is already flattened, as you can see here:
    self.stochastic_state = stochastic_state.reshape(1, self.num_envs, -1)
    or
    self.stochastic_state = self.stochastic_state.view(
  2. You are right, the reconstructed observation should be incremented by 0.5.
  3. Same as 2.: we modified Dreamer and these two changes escaped us.
  4. The +1 is correct because there is a mismatch between actions and latent states (due to the inclusion of the initial latent state in the buffer). The problem with the +1 is that the last action is the one in position 0.

I will fix these problems as soon as possible; thanks for reporting them.

EDIT:
4. As it stands, the latent state at index i is generated from the observation at index i-1 and is used to generate the action at index i-1. This is because the initial latent state is buffered. I will modify it, because it only creates confusion.
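To make the current layout concrete, a small index-only sketch (toy values and names, not the actual buffer code):

    # Toy layout: the initial latent state sits at index 0, so the latent stored at
    # index i comes from obs[i-1] and produced the action stored at index i-1.
    T = 4
    buffer = {"latent": ["z_init"], "action": []}
    for t in range(T):
        buffer["action"].append(f"a[{t}]")       # action taken after seeing obs[t]
        buffer["latent"].append(f"z(obs[{t}])")  # stored one index later

    for i in range(1, T + 1):
        # latent[i] was generated from obs[i-1] and generated action[i-1]
        print(i, buffer["latent"][i], "->", buffer["action"][i - 1])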

@belerico (Member)

@anthony0727 @michele-milesi thank you both!
Effectively, we save the stochastic and recurrent states that refer to the previous action instead of the ones that have just been generated.

michele-milesi linked a pull request on May 20, 2024 that will close this issue
@michele-milesi (Member)

Hi @anthony0727,
we created a branch to fix this issue; can you check whether it works? (https://github.com/Eclectic-Sheep/sheeprl/tree/fix/dv3-imagination-notebook)

Thanks

@anthony0727 (Author)

I'll look into it & the new dv3 soon, after my deadline thing!!
