Fix off policy #174

Merged: 14 commits merged into master from fix_off_policy on Apr 2, 2024

Conversation

@saleml (Collaborator) commented Mar 21, 2024:

This fixes #168.
The idea is to remove the arguments we had before (off_policy and sample_off_policy) and be explicit about what we're evaluating and storing when sampling.
When sampling on-policy, we should store the log-probs. This is the default.
When sampling off-policy with a tempered/modified PF, we should only store estimator_outputs.
When we use a replay buffer, we don't need to store anything; we should recalculate the log-probs.

Additionally, this fixes FM + ReplayBuffer, which was broken before because states extension did not take the _log_probs attribute into account.
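
For illustration, the storage rules above can be summarized as a small decision helper. This is a sketch only: the flag names save_logprobs and save_estimator_outputs are assumptions made for this example, not necessarily the exact arguments introduced by the PR.

# Sketch of the storage rules described in the PR description; the flag names
# below are illustrative assumptions, not necessarily the merged API.
def decide_what_to_store(on_policy: bool, uses_replay_buffer: bool) -> dict:
    """Return which quantities the sampler should keep for the loss."""
    if uses_replay_buffer:
        # Replay buffer: nothing needs to be stored; log-probs are recalculated.
        return {"save_logprobs": False, "save_estimator_outputs": False}
    if on_policy:
        # Default on-policy case: reuse the log-probs computed while sampling.
        return {"save_logprobs": True, "save_estimator_outputs": False}
    # Off-policy (tempered/modified PF): stored log-probs would describe the
    # exploration distribution, so keep only the raw estimator outputs.
    return {"save_logprobs": False, "save_estimator_outputs": True}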

@josephdviviano (Collaborator) left a comment:
I really like these API changes -- I have a few small questions before we approve (but this might not require any further changes to the code -- I just want to understand).

src/gfn/env.py (outdated)
@@ -393,7 +393,7 @@ class DiscreteEnvStates(DiscreteStates):

     def make_actions_class(self) -> type[Actions]:
         env = self
-        n_actions = self.n_actions
+        self.n_actions

Collaborator:
What's going on here? I find this confusing.

Collaborator:
I'm adding it back in. I'm sure this works and is potentially correct, but I find it weird, and I suspect others will as well.

Collaborator (Author):
Not sure what happened. Actually, we don't need that line at all (thanks, Pylance)! I'm removing the whole line.

Collaborator:
ok that works for me ;)

@@ -66,19 +72,20 @@ def get_scores(self, env: Env, transitions: Transitions) -> Tuple[

         if states.batch_shape != tuple(actions.batch_shape):
             raise ValueError("Something wrong happening with log_pf evaluations")
-        if not self.off_policy:
+        if (
+            transitions.log_probs is not None

Collaborator:
I'm seeing this logic a few times in the code. Should we abstract it into a utility like

def has_log_probs(obj):
    return obj.log_probs is not None and obj.log_probs.nelement() > 0

?
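
A self-contained toy check of the proposed helper (FakeTransitions below is a stand-in used only for this example, not the real Transitions container):

import torch

def has_log_probs(obj) -> bool:
    # True only when log_probs exists and is non-empty.
    return obj.log_probs is not None and obj.log_probs.nelement() > 0

class FakeTransitions:
    # Minimal stand-in for illustration only.
    def __init__(self, log_probs):
        self.log_probs = log_probs

assert not has_log_probs(FakeTransitions(None))            # nothing stored
assert not has_log_probs(FakeTransitions(torch.empty(0)))  # empty placeholder tensor
assert has_log_probs(FakeTransitions(torch.zeros(4)))      # stored log-probs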

Collaborator:
I've added this utility function.

Collaborator (Author):
Great!

# Evaluate the log PF of the actions sampled off policy.
# I suppose the Transitions container should then have some
# estimator_outputs attribute as well, to avoid duplication here ?
# See (#156).
Collaborator:
Why did you remove this issue reference (#156)?

Collaborator (Author):
My bad! Added it back.

@@ -53,7 +53,9 @@ def loss(self, env: Env, trajectories: Trajectories) -> TT[0, float]:

             ValueError: if the loss is NaN.
         """
         del env  # unused
-        _, _, scores = self.get_trajectories_scores(trajectories)
+        _, _, scores = self.get_trajectories_scores(
+            trajectories, recalculate_all=recalculate_all

Collaborator:
I'm wondering if there's a more explicit name for recalculate_all -- like recalculate_all_logprobs?

Collaborator (Author):
Yes, good idea, done.
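
For illustration, the renamed flag would thread through the loss roughly as follows. This is a sketch: the signature, the default value, and the final reduction of the scores into a scalar are assumptions, not the merged code.

# Sketch only; signature, default value, and the final reduction are
# illustrative assumptions.
def loss(self, env, trajectories, recalculate_all_logprobs: bool = False):
    del env  # unused
    _, _, scores = self.get_trajectories_scores(
        trajectories, recalculate_all_logprobs=recalculate_all_logprobs
    )
    return (scores**2).mean()  # illustrative reduction into a scalar loss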

policy_kwargs: keyword arguments to be passed to the
`to_probability_distribution` method of the estimator. For example, for
DiscretePolicyEstimators, the kwargs can contain the `temperature`
parameter, `epsilon`, and `sf_bias`. In the continuous case these
kwargs will be user defined. This can be used to, for example, sample
off-policy.
debug_mode: if True, everything gets calculated.

Collaborator:
Why is debug_mode removed? If I recall, this was important for tests.

Collaborator (Author):
Isn't this the same as recalculate_all?

Collaborator:
right -- I'll change it back and add a note :)
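
As background for the policy_kwargs docstring quoted above, here is a self-contained toy example of what temperature and epsilon do to a discrete policy, and why log PF must be re-evaluated when sampling off-policy. The 3-action setup and the numeric values are illustrative only.

import torch
from torch.distributions import Categorical

logits = torch.tensor([2.0, 0.5, -1.0])      # raw estimator outputs for 3 actions

on_policy = Categorical(logits=logits)        # the distribution being trained
tempered = Categorical(logits=logits / 2.0)   # temperature = 2.0 flattens the policy
epsilon = 0.1
uniform = torch.full_like(logits, 1.0 / logits.numel())
behaviour = Categorical(probs=(1 - epsilon) * tempered.probs + epsilon * uniform)

action = behaviour.sample()           # action sampled off-policy (exploration)
log_pf = on_policy.log_prob(action)   # log PF re-evaluated under the current policy,
                                      # not under the sampling distribution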

Collaborator (Author):
Good idea! Can the function be a class method of Container?
self.has_log_prob() looks more natural than has_log_prob(self)

Collaborator:
Hmm, the only issue is that we actually use it in TrajectoryBasedGFlowNet.
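
For illustration, the two call styles under discussion, side by side. The Container class below is a toy stand-in, not the real gfn container base class.

import torch

class Container:
    # Toy stand-in for illustration only.
    def __init__(self, log_probs=None):
        self.log_probs = log_probs

    # Method style: call sites read `container.has_log_probs()`.
    def has_log_probs(self) -> bool:
        return self.log_probs is not None and self.log_probs.nelement() > 0

# Module-level style (as in the PR): callable from code that is not itself a
# container, e.g. from TrajectoryBasedGFlowNet, as `has_log_probs(trajectories)`.
def has_log_probs(obj) -> bool:
    return obj.log_probs is not None and obj.log_probs.nelement() > 0

transitions = Container(torch.zeros(4))
assert transitions.has_log_probs() == has_log_probs(transitions)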

@saleml mentioned this pull request on Apr 2, 2024.
@josephdviviano (Collaborator) left a comment:
lgtm!

@josephdviviano merged commit 74cd34e into master on Apr 2, 2024.
3 checks passed.
@josephdviviano deleted the fix_off_policy branch on April 2, 2024 at 14:57.
Development

Successfully merging this pull request may close these issues:
Replay buffer broken in train_hypergrid.py (#168)