
About dropout and BN #11

Open
AmoseKang opened this issue Jul 9, 2019 · 3 comments

Comments

@AmoseKang

I have two questions about your code.
First, you use 0.6 dropout in the early stages, and the dropout function is applied directly to the indicator. Because of dropout's inverse scaling, the output of the dropout function is rescaled according to the dropout rate, so the indicator is either scaled up to 1/0.6 or dropped to 0. After the early stages of training, when the dropout rate is removed or reset, the indicator becomes 0 or 1 again. This behavior doesn't make sense to me, and I wonder if it is a bug. (PS: I also don't understand why a dropout rate of 100 raises no exception.)
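For reference, a minimal NumPy sketch of TF1-style inverted dropout reproduces the scaling described above. The keep_prob semantics, the direct application to a binary indicator, and all shapes/values here are assumptions for illustration, not the repo's actual code:

```python
import numpy as np

def inverted_dropout(x, keep_prob, training=True):
    # TF1-style inverted dropout: survivors are scaled by 1/keep_prob
    # so the expected value is preserved during training.
    if not training:
        return x
    mask = np.random.rand(*x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

np.random.seed(0)
indicator = np.ones(5)  # hypothetical binary architecture indicator
print(inverted_dropout(indicator, 0.6))
# -> entries are 0.0 or 1/0.6 = 1.6666..., exactly the scaling described above
print(inverted_dropout(indicator, 0.6, training=False))
# -> back to exactly 1.0 once dropout is removed

# Note: keep_prob=100 raises nothing in this sketch either; the mask is
# always True and the values are silently shrunk to x/100.
```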
Second, you apply the channel weight mask before BN, which in my opinion completely corrupts the BN statistics. I suggest applying the mask after BN, which seems more reasonable.

@amyburden

Got the same question here. Besides, BN also applies a bias (beta) to every channel, so the masked output is not zero, which gives the pointwise conv after BN a constant shift.
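A small NumPy sketch illustrates both points; the toy shapes and parameter values are assumptions, not the repo's code. A channel masked to zero before BN comes out of BN as the bias beta rather than zero, while masking after BN keeps it exactly zero:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Per-channel batch norm over a (N, C) batch.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

np.random.seed(0)
x = np.random.randn(8, 2)      # toy batch: 8 samples, 2 channels
mask = np.array([1.0, 0.0])    # "prune" channel 1
gamma, beta = 1.0, 0.1         # beta deliberately nonzero

# Mask before BN: the zeroed channel is re-shifted to beta by BN.
y_pre = batch_norm(x * mask, gamma, beta)
print(y_pre[:, 1])             # all 0.1, not 0 -> a constant shift feeds
                               # into the pointwise conv that follows

# Mask after BN: the pruned channel stays exactly zero.
y_post = batch_norm(x, gamma, beta) * mask
print(y_post[:, 1])            # all 0.0
```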

@QueeneTam

QueeneTam commented Nov 3, 2019

> [quoting @AmoseKang's two questions above]

I still can't figure out how to fix the indicator being scaled to 1.6666666, which causes an exception in parse_search_output.py: the function encode_single_path_net_arch() requires inds to be 0.0 or 1.0.
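One possible workaround, untested against this repo: re-binarize the dumped indicators before they reach encode_single_path_net_arch(). The threshold and the example values below are illustrative assumptions:

```python
import numpy as np

# Example indicators as dumped after training with dropout still applied;
# the values are hypothetical.
inds = np.array([1.6666666, 0.0, 1.6666666, 0.0])

# Re-binarize before passing them on: any positive entry becomes 1.0.
inds = (inds > 0.5).astype(np.float32)
print(inds)  # [1. 0. 1. 0.]
```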

@qianhuiliu

> [quoting @AmoseKang's questions and @QueeneTam's reply above]

Same question here. Have you figured it out yet?
