About implementing a reversible MLP Network #95

Open

xuedue opened this issue Aug 20, 2021 · 5 comments

Comments

@xuedue

xuedue commented Aug 20, 2021

Thank you for sharing.
I have some questions about this framework.

Question 1
Can this INN framework implement an MLP network with different input and output dimensions? For example, the input dimension is (batch size, 10) and the output dimension is (batch size, 2).

Question 2
Using the reversible MLP design from your demo, I found that when the input dimension becomes very large (in the thousands), the program gets stuck while running. How can this be solved?

@fdraxler
Collaborator

Hi, thanks for your questions.

  1. INNs represent invertible functions, so the number of incoming and outgoing dimensions must be equal. If you are interested in prediction, you are probably looking for a conditional INN (cINN), where the input is passed as a condition to the network. For an example, see VLL-HD/conditional_INNs; a minimal sketch follows after this list.

  2. Can you provide more details on what you mean by "be stuck"? Most likely, the computations become expensive and take their time. For debugging, could you please check whether the CPU usage is high (i.e. something is computed) and what the typical stack trace is when you do a keyboard interrupt?
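For reference, a minimal conditional setup with SequenceINN could look like the sketch below (layer widths, block count, and variable names are purely illustrative, not a prescription):

```python
import torch
import torch.nn as nn
import FrEIA.framework as Ff
import FrEIA.modules as Fm

# constructor for the small subnet used inside each coupling block
def subnet_fc(dims_in, dims_out):
    return nn.Sequential(nn.Linear(dims_in, 128), nn.ReLU(),
                         nn.Linear(128, dims_out))

# the invertible transform acts on the 2-dim quantity;
# the 10-dim vector enters every block as a condition
cinn = Ff.SequenceINN(2)
for _ in range(4):
    cinn.append(Fm.AllInOneBlock, cond=0, cond_shape=(10,),
                subnet_constructor=subnet_fc)

x = torch.randn(16, 10)              # 10-dim "input", used as the condition
y = torch.randn(16, 2)               # 2-dim "output", transformed invertibly
z, log_jac_det = cinn(y, c=[x])      # forward pass
y_rec, _ = cinn(z, c=[x], rev=True)  # inverse pass with the same condition
```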

@xuedue
Author

xuedue commented Aug 26, 2021


Thanks for your reply. I have solved this problem as follows:

The network I want to implement is an MLP with 8 fully-connected layers; the code is as follows:
[screenshots: the subnet_fc definition (an 8-layer MLP) and the INN construction with permute_soft]
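Roughly, the setup was the following (a reconstruction; the exact layer widths in my code may differ):

```python
import torch.nn as nn
import FrEIA.framework as Ff
import FrEIA.modules as Fm

def subnet_fc(dims_in, dims_out):
    # a single deep 8-layer fully-connected network returned as the coupling subnet
    layers = [nn.Linear(dims_in, 1024), nn.ReLU()]
    for _ in range(6):
        layers += [nn.Linear(1024, 1024), nn.ReLU()]
    layers += [nn.Linear(1024, dims_out)]
    return nn.Sequential(*layers)

N_DIM = 4096  # an input dimension in the thousands, as described above
inn = Ff.SequenceINN(N_DIM)
# with permute_soft=True this hangs for large N_DIM; permute_soft=False avoids it
inn.append(Fm.AllInOneBlock, subnet_constructor=subnet_fc, permute_soft=True)
```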

I found that when I set the permute_soft parameter to False, there is no problem at all.

I have two questions here.

  1. Is it a problem that subnet_fc here directly returns an 8-layer MLP? In your demo, the subnet only has a few FC layers, and several blocks are added with the append function.
  2. Does permute_soft = False affect the generated result? What is the meaning of this parameter?

@fdraxler
Collaborator

Great!

  1. The architecture of the subnetwork is a hyperparameter of the INN, just like the overall structure of the INN itself.
  2. permute_soft is also a hyperparameter. In the RealNVP block, the input vector is rotated and then split; only half of the split dimensions is actually modified by the block, which is what ensures invertibility. permute_soft selects the kind of rotation: soft (an arbitrary rotation matrix) or hard (the dimensions are simply permuted). I am not aware of systematic ablations that directly compare the two, but both variants exist in the literature (see the small sketch below).
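To make the distinction concrete, the flag is simply passed per block when it is appended (a minimal sketch; dimensions and subnet size are arbitrary):

```python
import torch.nn as nn
import FrEIA.framework as Ff
import FrEIA.modules as Fm

def subnet_fc(dims_in, dims_out):
    return nn.Sequential(nn.Linear(dims_in, 128), nn.ReLU(),
                         nn.Linear(128, dims_out))

inn = Ff.SequenceINN(10)

# hard permutation: dimensions are shuffled by a fixed permutation
inn.append(Fm.AllInOneBlock, subnet_constructor=subnet_fc, permute_soft=False)

# soft permutation: a fixed random rotation matrix mixes all dimensions
inn.append(Fm.AllInOneBlock, subnet_constructor=subnet_fc, permute_soft=True)
```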

@xuedue
Author

xuedue commented Sep 3, 2021


Thank you for your reply.

I have another question, if I may.

When I was training this reversible MLP network, I found that the reversible structure is gradually destroyed as training progresses. That is to say, the gap between the input and the inverse of the output keeps getting bigger.

Why does this happen? Is there any solution?

Looking forward to your reply.

@psorrenson
Collaborator

Hi, sorry for the late reply to your question. It looks like you are using a single AllInOneBlock with a large MLP as the subnet. In general, it is much more effective to have multiple blocks, each one using a smaller subnet. I'm not sure why your network is becoming less invertible as training progresses, but it may be due to numerical issues such as the outputs becoming extremely large or extremely small.
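As a rough sketch of that suggestion, with a quick check of how well the inverse reconstructs the input (all widths and block counts are illustrative, not a tuned recommendation):

```python
import torch
import torch.nn as nn
import FrEIA.framework as Ff
import FrEIA.modules as Fm

N_DIM = 1000

def subnet_fc(dims_in, dims_out):
    # one small subnet per block instead of a single deep 8-layer MLP
    return nn.Sequential(nn.Linear(dims_in, 512), nn.ReLU(),
                         nn.Linear(512, dims_out))

inn = Ff.SequenceINN(N_DIM)
for _ in range(8):  # depth comes from stacking blocks, not from the subnet
    inn.append(Fm.AllInOneBlock, subnet_constructor=subnet_fc,
               permute_soft=False)

# monitor the reconstruction error during training to catch numerical issues
x = torch.randn(32, N_DIM)
z, _ = inn(x)
x_rec, _ = inn(z, rev=True)
print("max reconstruction error:", (x - x_rec).abs().max().item())
```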
