Rethinking sampling #147

Merged: 70 commits into master on Feb 16, 2024
Conversation

@josephdviviano (Collaborator) commented Nov 24, 2023

This PR is a hodgepodge of a few tweaks, bugfixes, and investigations related to the sampling logic, including a new simple continuous example.

  • estimator_outputs are now re-used when sampling off policy.
    • This is currently accomplished using padding. In a future PR this will be handled using nested tensors. I will likely wait until their API solidifies (it will apparently change in the near future).
  • policy_kwargs are passed around properly to do off-policy sampling, hopefully in a way that remains generic.
  • TODOs added throughout, based on my observations.
  • log_reward_min_clamp is now off by default and only defined in the gflownets, NOT the environment.
  • Other minor tweaks I made while chasing a ghost relating to magically changing numbers (I now believe this to be due to faulty RAM on my laptop).
  • API change: the user must now often specify explicitly whether a gflownet or sampler is running off_policy or not (see the sketch after this list). This is for efficiency. If sampling happens off policy, we save the estimator outputs (because we assume we will need them later, to evaluate log probabilities of actions under the policy). If it's done on policy, we calculate the log-probs during the forward pass.
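
To illustrate the API change, a hypothetical usage sketch (the off_policy flag comes from the description above; the surrounding names and signatures are assumptions and may differ from the actual code):

    # Hypothetical sketch only -- names and signatures are assumptions.
    sampler = Sampler(estimator=pf_estimator)

    # Off policy: estimator outputs are saved during sampling so that
    # log-probs of the actions under the policy can be evaluated later.
    trajectories = sampler.sample_trajectories(
        env,
        n_trajectories=16,
        off_policy=True,
        policy_kwargs={"temperature": 2.0},  # assumed exploration parameter
    )

    # On policy: log-probs are computed during the forward pass instead.
    trajectories = sampler.sample_trajectories(env, n_trajectories=16, off_policy=False)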

@josephdviviano (Collaborator, Author):

Sorry in advance for the large PR - feel free to be critical ...

@marpaia (Collaborator) left a comment:

In general, I think all of the code in this PR makes a lot of sense. I don't understand all of your intent as deeply as you do (of course), but I think everything here is well put together.

In general, I have some broad bits of feedback:

  • This PR is really large. While this is often necessarily the case with changes that aim to make large improvements across the whole codebase, it's always worth trying to make changes more focused when possible. I know you already know that though, so no worries.
  • We should probably come up with some strategy for dealing with the TODOs. I've worked in large projects where each TODO had to be associated with a specific GitHub issue, for example. A lot of them relate to copy semantics, which seems like a good, focused thing that we could pursue in isolation.
  • I understand your intent in introducing a very generic policy_kwargs dictionary, as it's not possible to know what parameters might be needed by continuous off-policy exploration (see the sketch after this list). I think we should keep an eye on how that winds up getting used in practice, though. It may be possible to type those parameters more strongly in the future.
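
For illustration, the generic dictionary might carry parameters like these (all names here are assumptions, not a confirmed API):

    # A discrete policy might take, e.g., a softmax temperature and epsilon-greedy noise:
    policy_kwargs = {"temperature": 2.0, "epsilon": 0.1}

    # A continuous policy might instead need something entirely different, e.g.:
    policy_kwargs = {"scale_factor": 1.5}  # widen the sampling distribution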

In general, I think this is worth merging, not least because I'm excited about #149 😄

@josephdviviano (Collaborator, Author) commented Nov 30, 2023

Thanks for the feedback!

I really like your idea of associating each TODO with an issue. That would also make it easier to go fix the thing (you just search for the issue number in the code).

I can do this in a follow up PR!

Sorry, I knew I was being naughty when I submitted this monster PR. It was essentially a grab bag of things I tried while trying to get the library to play ball for the gflownet workshop, and it would have been really annoying to split it out into various PRs post hoc. I figured it wasn't too bad because I was the only one working on it, but I agree this is horrible practice and not conducive to collaboration.

I understand your desire for strong typing on the policy_kwargs, but I'm worried it will add a lot of developer overhead. We should keep this in mind. In general, I don't want researchers to have to think too hard about software-engineering concerns when using this library; we should figure out the minimal set of good engineering practices that remain researcher friendly. To be frank, the variance in engineering skill in research is enormous, because the pipeline does not select much for engineering ability, and I think this library will have the greatest impact if we can make it accessible to as many of those people as possible.

And just to clarify the intent of the PR: I was addressing multiple points of feedback:

  • It's inefficient to do two forward passes on a neural network when one would suffice (see the sketch after this list).
  • The intention to train off or on policy was too often implicit. It wasn't obvious to new users how to sample off policy. Now everything is very explicit (and often required to be passed by the user).
  • The copy stuff arose from discussions with a collaborator at Intel who did some profiling of the library. I think that piece is far from over, but what I did was mostly trying to track down that bug with the slightly changing values.
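
The first point can be made concrete with a sketch (hypothetical names; the distribution-building call follows the estimator pattern discussed in this PR and may not match the actual signatures):

    # One forward pass during sampling; its output is kept for the loss.
    est_out = pf_estimator(states)
    dist = pf_estimator.to_probability_distribution(states, est_out, **policy_kwargs)
    actions = dist.sample()
    log_probs = dist.log_prob(actions)  # no second pass through the network
    trajectories.estimator_outputs = est_out  # re-used later when off policy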

@josephdviviano (Collaborator, Author):

I'll wait for @saleml (who is defending next week so is likely distracted at the minute) to merge. No rush!

@saleml (Collaborator) commented Dec 18, 2023

On it!

@saleml (Collaborator) left a comment:

Sorry for my very late review.

This is a great PR that will make using the library much simpler. Thanks a lot @josephdviviano.

I left a few comments, questions, and suggestions. They are minor. Hopefully the tests will pass after the fixes.

@@ -65,7 +77,7 @@ def __init__(
         self.env = env
         self.is_backward = is_backward
         self.states = (
-            states
+            states.clone()  # TODO: Do we need this clone?
Collaborator:

I don't see why we would need that

@@ -155,6 +168,12 @@ def __getitem__(self, index: int | Sequence[int]) -> Trajectories:
            self._log_rewards[index] if self._log_rewards is not None else None
        )

        if is_tensor(self.estimator_outputs):
            estimator_outputs = self.estimator_outputs[:, index]
Collaborator:

This implicitly assumes that self.estimator_outputs is of shape max_length x n_trajectories (as is the case, for example, for self.log_probs). Would this always be the case?

I feel like things would easily break here unless we force some structure on estimator_outputs. Rather than torch.Tensor, it should be some TensorType with a specific shape, IMO.
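
Something like torchtyping could express that constraint (a sketch; the dimension names are assumptions):

    from torchtyping import TensorType

    # Make the assumed structure explicit instead of accepting any torch.Tensor:
    estimator_outputs: TensorType["max_length", "n_trajectories", "output_dim"]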

Collaborator (Author):

What do you think of simply:

        if is_tensor(self.estimator_outputs):
            estimator_outputs = self.estimator_outputs[..., index]
            estimator_outputs = estimator_outputs[:new_max_length]

?

Collaborator:

That should work!

        # Either set, or append, estimator outputs if they exist in the submitted
        # trajectory.
        if self.estimator_outputs is None and is_tensor(other.estimator_outputs):
            self.estimator_outputs = other.estimator_outputs
Collaborator:

But how would we match the indices of the trajectories to the indices of the estimator_outputs?

This feels dangerous. I suggest just throwing an error when one is None and the other is not (in either direction).
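
The suggested guard might look like this (a sketch):

    # Fail loudly if exactly one of the two Trajectories has estimator_outputs.
    if (self.estimator_outputs is None) != (other.estimator_outputs is None):
        raise ValueError(
            "Cannot extend: estimator_outputs is set on only one Trajectories."
        )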

Collaborator (Author):

I think the idea is to be able to extend an empty Trajectories instance, say with a stored buffer.

I agree it is dangerous but I think we should support this behaviour.

Admittedly, it has been some time since I looked at this, so I might be forgetting something.

Collaborator:

Fair enough!

        other_shape = np.array(other.estimator_outputs.shape)
        required_first_dim = max(self_shape[0], other_shape[0])

        # TODO: This should be a single reused function.
Collaborator:

Right! There is a function elsewhere that does something similar. Maybe for a future PR.
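
For reference, the single reused function the TODO asks for might look roughly like this (the name and padding value are assumptions):

    import torch

    def pad_dim0(t: torch.Tensor, length: int, value: float = 0.0) -> torch.Tensor:
        """Pads t with `value` along dim 0 until its first dimension is `length`."""
        if t.shape[0] >= length:
            return t
        pad = torch.full((length - t.shape[0], *t.shape[1:]), value, dtype=t.dtype)
        return torch.cat([t, pad], dim=0)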

src/gfn/env.py (outdated)
@@ -83,7 +79,7 @@ def reset(
         assert not (random and sink)

         if random and seed is not None:
-            torch.manual_seed(seed)
+            torch.manual_seed(seed)  # TODO: Improve seeding here?
Collaborator:

How?

Collaborator:

You made a set_seed function in common.py.
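
A typical set_seed helper looks roughly like this (a sketch; the actual one in common.py may differ):

    import random

    import numpy as np
    import torch

    def set_seed(seed: int) -> None:
        """Seeds all relevant RNGs for reproducibility."""
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)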

@@ -119,6 +116,7 @@ def make_random_states_tensor(
                device=env.device,
            )

        # TODO: Look into make masks - I don't think this is being called.
Collaborator:

Yes. This function can safely be deleted.

Collaborator (Author):

Removed!


    def __setitem__(
        self, index: int | Sequence[int] | Sequence[bool], states: States
    ) -> None:
        """Set particular states of the batch."""
        self.tensor[index] = states.tensor

    def clone(self) -> States:
Collaborator:

What about the batch_shape and log_reward attributes?

Collaborator (Author):

Right -- I think the easiest solution here is to use deepcopy (sketch below). What do you think?
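
A minimal sketch of the deepcopy-based clone, assuming a deep copy of all attributes (including batch_shape and any log-reward bookkeeping) is the desired behaviour:

    from copy import deepcopy

    def clone(self) -> "States":
        """Returns an independent copy of this instance, including all attributes."""
        return deepcopy(self)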

        arch.append(nn.Linear(hidden_dim, hidden_dim))
        arch.append(activation())
        self.torso = nn.Sequential(*arch)
        self.torso.hidden_dim = hidden_dim  # TODO: what is this?
Collaborator:

It's storing the hidden_dim attribute on self.torso.

Collaborator:

Awesome!

-        return states.tensor.float()
+        return (
+            states.tensor.float()
+        )  # TODO: should we typecast here? not a true identity...
Collaborator:

I don't understand the question.

Collaborator (Author):

It means that the identity preprocessor is typecasting data, which is possibly unexpected behaviour. I would expect this to return whatever tensor is already inside states, untouched (see the sketch below).
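
A strict identity would instead look like this (a sketch; the method name is an assumption):

    def preprocess(self, states: "States") -> torch.Tensor:
        # Return the underlying tensor untouched -- no dtype cast.
        return states.tensor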

@josephdviviano (Collaborator, Author):

Hey Salem -- the indexing changes I implemented to fix this (#147 (comment)) have broken the tests -- I'm working on that now. I opened a can of worms here! There's likely an elegant solution.

@josephdviviano (Collaborator, Author):

Finally, I reverted the change for that failing test. I'm not sure there's a better solution that isn't extremely complicated.

I'd like to get this PR merged but we can happily revisit this issue in a future much smaller PR.

@josephdviviano (Collaborator, Author):

@saleml I'd love you to check this before I merge :)

@saleml (Collaborator) commented Feb 16, 2024

This is some great work! Thank you Joseph for this very important PR.

I have read your replies to my comments, and seen that the tests pass. I think this can be merged as is.

Small nitpick: For this comment: #147 (comment), do you think we should add the argument to the abstract function?

@josephdviviano merged commit eedc7e8 into master on Feb 16, 2024 (3 checks passed).
@josephdviviano (Collaborator, Author):

> Small nitpick: For this comment: #147 (comment), do you think we should add the argument to the abstract function?

I added it!
