Understanding MaskedMLP #30
-
Hi everyone, I am trying to understand the code of Zuko, but `zuko.nn.MaskedMLP` confuses me a lot. Is there any published work related to this class? Or is there a mathematical proof of why it produces a Jacobian with exactly the structure of the given adjacency matrix?
-
Hey @zqy767, the class is used for the construction of masked autoregressive flows. Hope that helps you :) Cheers
-
Hello @zqy767, this is a very good question! I'll start by explaining why `MaskedMLP` is necessary in autoregressive transformations and then how it is implemented in Zuko. I also invite you to take a look at discussion #16, whose subject is related.

**Why**

Let $x$ be a vector in $\mathbb{R}^n$. An autoregressive transformation is a mapping $y = f(x) \in \mathbb{R}^n$ such that the $i$-th element of $y$ is a bijective univariate transformation of the $i$-th element of $x$, conditioned on the preceding elements. That is, $y_i = f_i(x_i \mid h_i(x_{1:i-1}))$, where $x_{1:i} = (x_1, x_2, \dots, x_i)$ and $h_i$ returns the parameters of the univariate transformation $f_i$. $h_i$ is typically a neural network. By construction, the $i$-th output depends only on the first $i$ inputs.
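For instance, with $n = 3$, this dependency structure forces the Jacobian of $f$ to be lower triangular,

$$\frac{\partial y}{\partial x} = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & 0 & 0 \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & 0 \\ \frac{\partial y_3}{\partial x_1} & \frac{\partial y_3}{\partial x_2} & \frac{\partial y_3}{\partial x_3} \end{pmatrix},$$

so its determinant is simply the product of the diagonal terms $\prod_i \frac{\partial y_i}{\partial x_i}$, which is what makes autoregressive transformations attractive for normalizing flows.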
However, evaluating $n$ separate networks $h_i$ is expensive. It is therefore preferable to merge the conditioners into a single network $h(x)$ that returns the parameters of all univariate transformations at once, under the constraint that its $i$-th output only depends on $x_{1:i-1}$. Assuming that this constraint is enforced exactly, the transformation remains autoregressive. Such a dependency structure between inputs and outputs is precisely what the adjacency matrix given to `MaskedMLP` encodes: $A_{ij} = 1$ if and only if the $i$-th output is allowed to depend on the $j$-th input.

**How**

The typical way to impose adjacency in MLPs is with binary masks. Let $W^{(1)}, W^{(2)}, \dots, W^{(L)}$ be the weight matrices of the $L$ linear layers of an MLP. Each $W^{(l)}$ is multiplied element-wise with a binary mask $M^{(l)}$, which removes some of its connections. The goal is to find a factorization of masks $M^{(1)}, \dots, M^{(L)}$ such that the product $M^{(L)} \cdots M^{(2)} M^{(1)}$ is non-zero only where $A$ is non-zero. This guarantees that the $i$-th output cannot depend on the $j$-th input whenever $A_{ij} = 0$, and hence that the Jacobian of the network has at most the sparsity of $A$. Trivial factorizations respecting this property exist, but they heavily reduce the expressiveness of the network. Hence, what is a good factorization and how to find one?

**Implementation**

In short, the factorization algorithm implemented in `MaskedMLP` allocates each hidden neuron to a row $A_i$ of the adjacency matrix, meaning that the neuron is only allowed to depend, directly or indirectly, on the inputs $\{ j : A_{ij} = 1 \}$. Formally, let's say that a connection from a neuron allocated to $A_j$ towards a neuron allocated to $A_i$ is permitted if and only if $A_j$ is a subset of $A_i$, that is $A_{jk} \leq A_{ik}$ for all $k$. Let's assume that all the neurons of the input layer are allocated to the rows of the identity matrix (input $k$ depends only on itself) and that all the neurons of the output layer are allocated to the corresponding rows of $A$. Then, by induction over the layers, the actual dependency set of each neuron is a subset of the row it is allocated to, for all allocations of the hidden neurons. Hence, given the allocations of layers $l$ and $l + 1$, the mask $M^{(l+1)}$ is fully determined by the subset rule, and the resulting network is guaranteed to respect the adjacency $A$.
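To see the subset rule in action, here is a minimal, self-contained sketch. It is a simplification, not Zuko's actual implementation: the `MaskedLinear` helper and the cyclic allocation of hidden neurons to rows of $A$ are illustrative choices. It builds the masks for a strictly lower-triangular adjacency and then checks with `torch.autograd.functional.jacobian` that the Jacobian of the network vanishes wherever $A$ is zero, which is exactly the property asked about above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class MaskedLinear(nn.Linear):
    """Linear layer whose weight is multiplied element-wise by a fixed binary mask."""

    def __init__(self, mask: torch.Tensor):
        super().__init__(mask.shape[1], mask.shape[0])
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

# Strictly lower-triangular adjacency: output i may only depend on inputs j < i.
n = 5
A = torch.ones(n, n).tril(diagonal=-1).bool()

# Allocate each hidden neuron to a row of A (here, cyclically). A connection
# from a source allocated to A_j towards a target allocated to A_i is permitted
# iff A_j is a subset of A_i. Inputs act as neurons allocated to the rows of
# the identity matrix, i.e. input j depends only on itself.
sizes = [16, 16, n]
modules = []
prev = torch.eye(n, dtype=torch.bool)  # allocation of the input layer
for k, width in enumerate(sizes):
    last = k == len(sizes) - 1
    rows = A if last else A[torch.arange(width) % n]
    # mask[i, j] = 1 iff prev[j] is a subset of rows[i]
    mask = ~(prev[None, :, :] & ~rows[:, None, :]).any(dim=-1)
    modules.append(MaskedLinear(mask.float()))
    if not last:
        modules.append(nn.Tanh())
    prev = rows
net = nn.Sequential(*modules)

# The Jacobian must vanish wherever the adjacency forbids a dependency.
x = torch.randn(n)
J = torch.autograd.functional.jacobian(net, x)
assert (J[~A] == 0).all()
print(J.abs() > 0)  # non-zero entries appear only strictly below the diagonal
```

The same check can be run against `zuko.nn.MaskedMLP` itself by constructing it with the same adjacency matrix.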