
Redundant calling of update_masks in default sampler #160

Closed

listar2000 opened this issue Feb 17, 2024 · 2 comments

listar2000 commented Feb 17, 2024

Hi, I'm currently developing my own environment, following the training script given in the example (i.e., on-policy sampling with the forward policy), and I'm a bit confused about the following redundancy:

  1. During the sequential sampling process, at each step the current state calls the update_masks method in its initializer, which sets up forward_masks and backward_masks.
  2. Once the sampling is done, the trajectories are batched into a new States object in the default samplers.py:
trajectories_states = env.States(tensor=trajectories_states)

(P.S. I think in the nightly version this has changed to)

trajectories_states = env.states_from_tensor(trajectories_states)

which essentially does the same thing. This new States object is initialized and then calls update_masks on all of the states in the trajectories again, even though those masks were already computed once in step 1. So why repeat this work instead of reusing the already computed masks?
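To make the redundancy concrete, here is a minimal, self-contained sketch of the pattern (the DiscreteStates internals below are a simplified stand-in for illustration, not torchgfn's actual implementation):

import torch

class DiscreteStates:
    """Simplified stand-in for a States class; only the mask-computation
    pattern matters here, not the real torchgfn internals."""

    def __init__(self, tensor: torch.Tensor):
        self.tensor = tensor
        self.forward_masks = torch.ones(*tensor.shape[:-1], 3, dtype=torch.bool)
        self.backward_masks = torch.ones(*tensor.shape[:-1], 3, dtype=torch.bool)
        self.update_masks()  # call #1: runs once per step during sampling

    def update_masks(self) -> None:
        # Placeholder mask rule; a real environment's logic goes here.
        self.forward_masks[...] = self.tensor.sum(-1, keepdim=True) < 10

# During sampling, each step constructs a States object (masks computed):
per_step_states = [DiscreteStates(torch.zeros(4, 2)) for _ in range(5)]

# Afterwards the per-step tensors are stacked and wrapped in a *new*
# States object, whose __init__ calls update_masks again (call #2),
# discarding the masks already computed above.
stacked = torch.stack([s.tensor for s in per_step_states], dim=0)
trajectories_states = DiscreteStates(stacked)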

Many thanks for any explanation :).

@saleml saleml self-assigned this Feb 18, 2024
@josephdviviano josephdviviano self-assigned this Feb 19, 2024

josephdviviano commented Feb 20, 2024

Thanks for this great point. We need a States.stack() method or a stack_states() function that accepts a list of states, to avoid this recomputation. Addressed in #161.

Pseudocode

from gfn.containers.utils import stack_states

while not all(dones):
    actions = env.actions_from_batch_shape((n_trajectories,))  # Dummy actions.
    valid_actions, actions_log_probs, estimator_outputs = self.sample_actions(
        env,
        states[~dones],
        save_estimator_outputs=save_estimator_outputs,
        calculate_logprobs=not skip_logprob_calculation,
        **policy_kwargs,
    )
    ...
    actions[~dones] = valid_actions
    ...
    if self.estimator.is_backward:
        new_states = env._backward_step(states, actions)
    else:
        new_states = env._step(states, actions)
    ...
    new_dones = (
        new_states.is_initial_state if self.estimator.is_backward else sink_states_mask
    ) & ~dones
    trajectories_dones[new_dones] = step
    ...
    states = new_states
    dones = dones | new_dones

    trajectories_states += [states]

# Stack the per-step States once, instead of rebuilding (and re-masking)
# them from a raw tensor.
trajectories_states = stack_states(trajectories_states, dim=0)

This stack_states function would extend all relevant attributes of the submitted states along the trajectory dimension and return a Trajectories object.
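For illustration, a minimal sketch of what such a helper could look like (this is an assumption about the eventual implementation, not the actual code from #163; returning a plain batched States rather than a full Trajectories object is a simplification):

import torch
from typing import List, TypeVar

StatesT = TypeVar("StatesT")  # stands in for the library's States class

def stack_states(states_list: List[StatesT], dim: int = 0) -> StatesT:
    """Stack a list of States along a new trajectory dimension, reusing the
    masks each States object already computed instead of calling
    update_masks again. (Sketch; the real implementation may differ.)"""
    first = states_list[0]
    stacked_tensor = torch.stack([s.tensor for s in states_list], dim=dim)

    # Bypass __init__ (and thus update_masks) and fill attributes directly.
    out = first.__class__.__new__(first.__class__)
    out.tensor = stacked_tensor
    out.forward_masks = torch.stack([s.forward_masks for s in states_list], dim=dim)
    out.backward_masks = torch.stack([s.backward_masks for s in states_list], dim=dim)
    return out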


josephdviviano commented Feb 22, 2024

We're working on this here: #163

Edit - this issue is resolved!
