Coupling transform module #22

simonschnake · 2023-07-05T13:52:47Z

Hey,

I am currently sketching out what a dedicated CouplingTransformModule would look like.
I know that you can get something similar by using the MaskedAffineTransformModule with passes=2.
My use case is different: I want to use different conditioner networks, or hyper networks in zuko's terminology.

I wanted to ask whether there is interest in adding this to zuko.

Cheers
Simon

francois-rozet · 2023-07-05T20:00:40Z

Maintainer

Hello @simonschnake, thanks for the issue. There is interest in purely coupling transformations. Some hyper networks are hard or impossible to write as masked networks, notably convolutional networks. I implemented a ConvConditionalTransform for a PR (#7) that implements multi-scale flows, but I was not satisfied with the interface. If you have ideas on how to implement this cleanly, I would love to discuss them with you.

simonschnake · 2023-07-08T18:13:22Z

Author

Okay, very nice.

Here are my CouplingTransformModule and the CouplingTransform.

One difference from other implementations is that both parts of the features are transformed, one after the other. In most implementations, only one side is transformed while the other is passed through unchanged. There is no significant difference between the two approaches; my rationale was to ensure that all features are transformed an equal number of times. I can also provide a vanilla hyper network implementation. What do you think?
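For reference, a short derivation (my notation, not from the thread) of why the two sequential steps stay invertible and why their log-determinants simply add up. Here $a$ and $b$ denote the first and second halves (first_features and second_features in the code), and $f_1$, $f_2$ are invertible in their first argument. Each step copies one half and transforms the other, so each step's Jacobian is block triangular and its determinant is that of the transformed block; the determinant of the composition is the product of the two.

```math
\begin{aligned}
&\text{Forward:} && y_a = f_1(x_a; x_b), \qquad y_b = f_2(x_b; y_a) \\
&\text{Inverse:} && x_b = f_2^{-1}(y_b; y_a), \qquad x_a = f_1^{-1}(y_a; x_b) \\
&\text{Log-det:} && \log\left|\det \frac{\partial y}{\partial x}\right|
  = \log\left|\det \frac{\partial f_1}{\partial x_a}\right|
  + \log\left|\det \frac{\partial f_2}{\partial x_b}\right|
\end{aligned}
```

The inverse must undo $f_2$ first, because its conditioner reads $y_a$, which is available from the output.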

I haven't implemented any tests yet, but I can do that and provide a pull request.

```python
from typing import Callable, Sequence, Tuple

import torch
import torch.nn as nn
from torch import LongTensor, Size, Tensor
from torch.distributions import Transform, constraints

# Assumed zuko import locations at the time of this thread (may differ across versions)
from zuko.flows import TransformModule
from zuko.transforms import MonotonicAffineTransform


class CouplingTransformModule(TransformModule):
    r"""Creates a coupling transformation module.

    References:
        | NICE: Non-linear Independent Components Estimation (Dinh et al., 2014)
        | https://arxiv.org/abs/1410.8516v6

    Arguments:
        features: The number of features.
        context: The number of context features.
        order: The feature ordering. If :py:`None`, use :py:`range(features)` instead.
        univariate: The univariate transformation constructor.
        shapes: The shapes of the univariate transformation parameters.
        hyper_network: The conditioner (hyper network) constructor.
        kwargs: Keyword arguments passed to conditioner networks.
    """

    def __init__(
        self,
        features: int,
        context: int = 0,
        order: LongTensor = None,
        univariate: Callable[..., Transform] = MonotonicAffineTransform,
        shapes: Sequence[Size] = ((), ()),
        hyper_network: Callable[..., nn.Module] = None,  # TODO <- hyper_network needs to be sketched out
        **kwargs,
    ):
        super().__init__()

        # Univariate transformation
        self.univariate = univariate
        self.shapes = list(map(Size, shapes))
        self.sizes = [s.numel() for s in self.shapes]

        if order is None:
            order = torch.arange(features)
        else:
            order = torch.as_tensor(order)

        # Split the ordered features into two halves
        self.register_buffer('first_features', order[:len(order) // 2])
        self.register_buffer('second_features', order[len(order) // 2:])

        # One conditioner per half; the roles of the halves are swapped for the second one
        self.hyper_first = hyper_network(
            first_features=self.first_features,
            second_features=self.second_features,
            num_params=sum(self.sizes),
            context_features=context,
            **kwargs,
        )

        self.hyper_second = hyper_network(
            first_features=self.second_features,
            second_features=self.first_features,
            num_params=sum(self.sizes),
            context_features=context,
            **kwargs,
        )


class CouplingTransform(Transform):
    r"""Transform via a coupling scheme.

    .. math::
        y_a = f_1(x_a; x_b) \\
        y_b = f_2(x_b; y_a)

    where :math:`a` and :math:`b` index the first and second halves of the features.

    Arguments:
        meta_first: A meta function which returns the transformation :math:`f_1` of the first half.
        meta_second: A meta function which returns the transformation :math:`f_2` of the second half.
        first_features: The indices of the first half.
        second_features: The indices of the second half.
    """

    domain = constraints.real_vector
    codomain = constraints.real_vector
    bijective = True

    def __init__(
        self,
        meta_first: Callable[[Tensor], Transform],
        meta_second: Callable[[Tensor], Transform],
        first_features: Tensor,
        second_features: Tensor,
        **kwargs,
    ) -> None:
        super().__init__(**kwargs)

        self.meta_first = meta_first
        self.meta_second = meta_second
        self.first_features = first_features
        self.second_features = second_features

    def _call(self, x: Tensor) -> Tensor:
        x_first = x[..., self.first_features]
        x_second = x[..., self.second_features]

        # Transform the first half conditioned on x_b, then the second half conditioned on y_a
        y_first = self.meta_first(x_second)(x_first)
        y_second = self.meta_second(y_first)(x_second)

        y = torch.empty_like(x)
        y[..., self.first_features] = y_first
        y[..., self.second_features] = y_second

        return y

    def _inverse(self, y: Tensor) -> Tensor:
        y_first = y[..., self.first_features]
        y_second = y[..., self.second_features]

        # Undo the steps in reverse order, using the inverse of each univariate transform
        x_second = self.meta_second(y_first).inv(y_second)
        x_first = self.meta_first(x_second).inv(y_first)

        x = torch.empty_like(y)
        x[..., self.first_features] = x_first
        x[..., self.second_features] = x_second

        return x

    def log_abs_det_jacobian(self, x: Tensor, y: Tensor) -> Tensor:
        x_first = x[..., self.first_features]
        x_second = x[..., self.second_features]
        y_first = y[..., self.first_features]
        y_second = y[..., self.second_features]

        ladj_first = self.meta_first(x_second).log_abs_det_jacobian(x_first, y_first).sum(dim=-1)
        ladj_second = self.meta_second(y_first).log_abs_det_jacobian(x_second, y_second).sum(dim=-1)

        return ladj_first + ladj_second

    def call_and_ladj(self, x: Tensor) -> Tuple[Tensor, Tensor]:
        x_first = x[..., self.first_features]
        x_second = x[..., self.second_features]

        y_first, ladj_first = self.meta_first(x_second).call_and_ladj(x_first)
        y_second, ladj_second = self.meta_second(y_first).call_and_ladj(x_second)

        y = torch.empty_like(x)
        y[..., self.first_features] = y_first
        y[..., self.second_features] = y_second

        # Sum the element-wise log-determinants, as in log_abs_det_jacobian
        return y, ladj_first.sum(dim=-1) + ladj_second.sum(dim=-1)
```
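To make the two-sided scheme testable without depending on zuko's internals, here is a self-contained sketch in plain PyTorch. It assumes an element-wise affine transform $y = x e^{s} + t$ whose $(s, t)$ come from small MLP conditioners; the class name `TwoSidedAffineCoupling`, the hidden size, and the `tanh` bound on the log-scale are hypothetical choices, not part of the proposal above.

```python
import torch
import torch.nn as nn


class TwoSidedAffineCoupling(nn.Module):
    """Hypothetical sketch: y_a = f1(x_a; x_b), then y_b = f2(x_b; y_a), with affine f1, f2."""

    def __init__(self, features: int, hidden: int = 64):
        super().__init__()
        idx = torch.arange(features)
        self.register_buffer("idx_a", idx[: features // 2])
        self.register_buffer("idx_b", idx[features // 2 :])
        n_a, n_b = len(self.idx_a), len(self.idx_b)
        # Each conditioner outputs a log-scale and a shift per transformed feature
        self.net_1 = nn.Sequential(nn.Linear(n_b, hidden), nn.ReLU(), nn.Linear(hidden, 2 * n_a))
        self.net_2 = nn.Sequential(nn.Linear(n_a, hidden), nn.ReLU(), nn.Linear(hidden, 2 * n_b))

    @staticmethod
    def _params(net: nn.Module, cond: torch.Tensor):
        log_scale, shift = net(cond).chunk(2, dim=-1)
        return torch.tanh(log_scale), shift  # tanh keeps the scales bounded

    def forward(self, x: torch.Tensor):
        x_a, x_b = x[..., self.idx_a], x[..., self.idx_b]
        s1, t1 = self._params(self.net_1, x_b)
        y_a = x_a * s1.exp() + t1
        s2, t2 = self._params(self.net_2, y_a)  # second step conditions on the new y_a
        y_b = x_b * s2.exp() + t2
        y = torch.empty_like(x)
        y[..., self.idx_a] = y_a
        y[..., self.idx_b] = y_b
        # Element-wise affine steps: log|det J| is the sum of both steps' log-scales
        return y, s1.sum(-1) + s2.sum(-1)

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        y_a, y_b = y[..., self.idx_a], y[..., self.idx_b]
        s2, t2 = self._params(self.net_2, y_a)  # undo the second step first
        x_b = (y_b - t2) * (-s2).exp()
        s1, t1 = self._params(self.net_1, x_b)
        x_a = (y_a - t1) * (-s1).exp()
        x = torch.empty_like(y)
        x[..., self.idx_a] = x_a
        x[..., self.idx_b] = x_b
        return x


torch.manual_seed(0)
flow = TwoSidedAffineCoupling(5)
x = torch.randn(8, 5)
y, ladj = flow(x)
x_rec = flow.inverse(y)
```

Swapping in another element-wise transform only changes `_params`, the two update formulas, and the log-det term; the split/merge logic stays the same.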


francois-rozet Jul 9, 2023
Maintainer

Thank you very much! Here are a few thoughts:

Although it is a great idea to ensure that all features are transformed an "equal number of times", I think diverging from the usual coupling definition $y_a = x_a$ and $y_b = f(x_b; x_a)$ could lead to confusion and would make it hard to benchmark against other libraries (such as nflows).
Instead of using an order to split the features, I think it would be more appropriate to use a mask. This would allow using torch.masked_select and torch.masked_scatter to split and merge the two parts, which is faster than indexing (although not by much).
For the hyper network, I think a callable of type (input_size: int, output_size: int) -> nn.Module, such as the MLP class, would fit. In this case, input_size would be the number of coupling features ($x_a$) plus the number of context features ($c$), and output_size would be the number of parameters for $x_b$.
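A minimal sketch of the standard coupling with the interface suggested above, assuming an element-wise affine $f$ and a boolean mask that selects $x_a$. Because $y_a = x_a$, the Jacobian is block triangular and $\log|\det J|$ reduces to the log-det of $f$ in $x_b$, here the sum of the log-scales. The names `MaskedCoupling` and `mlp` are hypothetical; `mlp` only stands in for a factory such as the MLP class mentioned above.

```python
from typing import Callable, Optional

import torch
import torch.nn as nn
from torch import Tensor


def mlp(input_size: int, output_size: int) -> nn.Module:
    # Hypothetical stand-in for a hyper network factory (input_size, output_size) -> nn.Module
    return nn.Sequential(nn.Linear(input_size, 64), nn.ReLU(), nn.Linear(64, output_size))


class MaskedCoupling(nn.Module):
    """Sketch of y_a = x_a, y_b = f(x_b; x_a, c) with an element-wise affine f."""

    def __init__(self, mask: Tensor, context: int = 0,
                 hyper: Callable[[int, int], nn.Module] = mlp):
        super().__init__()
        mask = torch.as_tensor(mask, dtype=torch.bool)
        # Indices derived from a boolean mask are unique and sorted by construction
        self.register_buffer("idx_a", mask.nonzero().squeeze(-1))
        self.register_buffer("idx_b", (~mask).nonzero().squeeze(-1))
        n_a, n_b = len(self.idx_a), len(self.idx_b)
        # Input: coupling features x_a plus context c; output: 2 parameters per x_b feature
        self.hyper = hyper(n_a + context, 2 * n_b)

    def _params(self, x_a: Tensor, c: Optional[Tensor]):
        h = x_a if c is None else torch.cat((x_a, c), dim=-1)
        log_scale, shift = self.hyper(h).chunk(2, dim=-1)
        return torch.tanh(log_scale), shift

    def forward(self, x: Tensor, c: Optional[Tensor] = None):
        x_a, x_b = x[..., self.idx_a], x[..., self.idx_b]
        log_scale, shift = self._params(x_a, c)
        y = x.clone()  # y_a = x_a is copied unchanged
        y[..., self.idx_b] = x_b * log_scale.exp() + shift
        return y, log_scale.sum(-1)

    def inverse(self, y: Tensor, c: Optional[Tensor] = None) -> Tensor:
        y_a, y_b = y[..., self.idx_a], y[..., self.idx_b]
        log_scale, shift = self._params(y_a, c)  # y_a == x_a, so the same parameters
        x = y.clone()
        x[..., self.idx_b] = (y_b - shift) * (-log_scale).exp()
        return x


torch.manual_seed(0)
mask = torch.tensor([True, False, True, False, False])
layer = MaskedCoupling(mask, context=3)
x, c = torch.randn(4, 5), torch.randn(4, 3)
y, ladj = layer(x, c)
```

Keeping the hyper network behind a plain `(input_size, output_size)` factory means a convolutional or otherwise non-masked conditioner can be plugged in without touching the coupling logic.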

WDYT?

simonschnake Jul 10, 2023
Author

Hey @francois-rozet,

thanks for the comments.

For the first point, I think you are right. That would be better. I will change it that way.
Here too, I think you are right. If one wants to build a more complex coupling flow, one probably has to build a dedicated module.
Here I disagree. I haven't worked much with torch.masked_select and torch.scatter, but they are definitely more complex to work with. Also, my benchmarks show that indexing is faster than using masked_select or a boolean mask:

```
Python 3.10.5 (main, Jun 21 2022, 11:18:08) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch

In [2]: x = torch.randn(1000, 1000, 20, device='cuda')

In [3]: mask = torch.rand(20, device='cuda') > 0.5

In [4]: idx = torch.arange(20, device='cuda')[mask]

In [5]: %%timeit
   ...: x.masked_select(mask)
925 µs ± 472 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [6]: %%timeit
   ...: x[..., mask]
143 µs ± 93.8 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [7]: %%timeit
   ...: x[..., idx]
110 µs ± 94 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [8]: torch.__version__
Out[8]: '2.0.0+cu118'
```

Results may differ with other versions of Python/PyTorch/CUDA. I tested on an A100.

Have you done other measurements, or did you have another way of using masked_select in mind?
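Timings aside, the three approaches in the benchmark select the same values but return different shapes. A small CPU sketch (sizes chosen arbitrarily) showing that `masked_select` flattens its result, and that merging back with `masked_scatter` matches index assignment:

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 3, 6)
mask = torch.tensor([True, False, True, True, False, False])
idx = mask.nonzero().squeeze(-1)  # integer indices of the True entries

a = x.masked_select(mask)  # mask is broadcast over x, result is 1-D
b = x[..., mask]           # boolean indexing keeps the batch shape: (4, 3, 3)
c = x[..., idx]            # integer indexing, same result as b

# masked_select yields the same values as boolean indexing, flattened in row-major order
same_values = torch.equal(a, b.flatten()) and torch.equal(b, c)

# Merging back: masked_scatter consumes the source elements in row-major order
y1 = torch.zeros_like(x).masked_scatter(mask, b)
y2 = torch.zeros_like(x)
y2[..., idx] = c
```

With `masked_select`, the flat output has to be reshaped before it can be fed to a conditioner, which is one practical argument for indexing besides speed.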

Cheers
Simon

francois-rozet Jul 10, 2023
Maintainer

I did some tests and I think you are indeed right about indexing vs. masking. However, we should guarantee that no index is duplicated, which might be hard if the transformation takes a set of indices as input. So maybe we can take a mask as input and convert it to indices with mask.nonzero().squeeze() in __init__. Also, I am not sure that y[..., idx] = y_b can be differentiated.
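Both concerns can be checked quickly. The sketch below assumes a 1-D boolean mask: indices obtained with `nonzero()` are unique and, together with the complement, partition the features; and `torch.autograd.gradcheck` confirms that merging halves by index assignment into a fresh tensor is differentiable. The helper name `merge` is hypothetical. `squeeze(-1)` is used instead of a bare `squeeze()` so that a mask with a single True entry still yields a 1-D tensor.

```python
import torch

mask = torch.tensor([False, True, True, False])
idx_b = mask.nonzero().squeeze(-1)     # unique, sorted indices of the transformed half
idx_a = (~mask).nonzero().squeeze(-1)  # complement: the coupling half


def merge(y_a: torch.Tensor, y_b: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: write both halves into a fresh tensor via index assignment
    y = y_a.new_empty(y_a.shape[:-1] + (len(idx_a) + len(idx_b),))
    y[..., idx_a] = y_a
    y[..., idx_b] = y_b
    return y


y_a = torch.randn(3, 2, dtype=torch.double, requires_grad=True)
y_b = torch.randn(3, 2, dtype=torch.double, requires_grad=True)
# gradcheck compares autograd gradients with finite differences (double precision)
grad_ok = torch.autograd.gradcheck(merge, (y_a, y_b))
```

With duplicated indices, the two halves would no longer partition the features and the split/merge would not be a bijection, which is why deriving indices from a mask is the safer input format.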

simonschnake Jul 11, 2023
Author

That's how nflows implements the split:
https://github.com/bayesiains/nflows/blob/569c8ad50941824ccb07aa3a3eb59c85721e0c5f/nflows/transforms/coupling.py#L90C1-L100C34

I would do nearly the same.

francois-rozet Jul 11, 2023
Maintainer

OK, great! You can submit a PR and I'll review it! The only remaining question is the name of the CouplingTransformModule class, which does not match the other names in the zuko.flows module.

francois-rozet · 2023-07-13T20:28:26Z

Maintainer

PR #23 adds coupling transformations to Zuko 🔥
