About dropout and BN #11
Comments
Got the same question here. Besides, BN also applies a bias (beta) to every channel, so the masked output is no longer zero, and the point-wise conv after BN picks up a shift.
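A minimal numeric sketch of the point above (plain Python, with assumed values gamma=1.0, beta=0.1): a channel that the mask has zeroed out still comes out of per-channel BN as the constant beta, so it is no longer zero when it reaches the following point-wise conv.

```python
import math

def batchnorm_channel(vals, gamma=1.0, beta=0.1, eps=1e-5):
    # Per-channel batch norm: normalize by batch mean/var, then scale and shift.
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in vals]

masked = [0.0, 0.0, 0.0, 0.0]        # channel zeroed out by the mask
out = batchnorm_channel(masked)
print(out)  # every element equals beta (0.1): the "pruned" channel leaks
```

The normalized zeros are all 0, so the output is exactly beta on every element; a point-wise conv after BN then sees beta * weight as a constant shift from a channel that was supposed to be off.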
I still can't figure out how to solve the problem that the indicator is scaled to 1.6666666, which raises an exception in parse_search_output.py because encode_single_path_net_arch() expects inds to be 0.0 or 1.0.
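One possible workaround (a sketch, not from the repo; `binarize_indicators` is a hypothetical helper) is to map the dropout-scaled values back to hard 0/1 before they reach encode_single_path_net_arch(), since dropout only ever produces zero or a single positive scale:

```python
def binarize_indicators(inds, tol=1e-6):
    # Hypothetical post-processing: dropout leaves an indicator either at 0
    # or at some positive rescaled value (e.g. 1.6666666), so snap anything
    # non-zero back to 1.0 before encoding the architecture.
    return [0.0 if abs(v) < tol else 1.0 for v in inds]

print(binarize_indicators([0.0, 1.6666666, 1.0]))
```

Whether this matches the authors' intent depends on why the scaled values survive into the parsed output in the first place.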
Same question here. Have you figured it out yet?
I have two questions about your code.
The first question: you use 0.6 dropout in the early stages, and the dropout function is applied directly to the indicator. Because of dropout's rescaling, the output of the dropout function is scaled according to the dropout rate, so the indicator becomes 1/0.6 or drops to 0; after the early stages of training, when the dropout rate is removed or reset, the indicator becomes 0 or 1 again. This behavior doesn't make sense to me, and I wonder if it is a bug. (PS: I also don't understand why a dropout rate of 100 raises no exception.)
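To illustrate the rescaling: under standard inverted dropout a kept unit is scaled by 1/(1 - p) where p is the drop probability, so a binary indicator only ever comes out as 0 or that scale. A sketch in plain Python (assuming a drop probability of 0.4, i.e. keep probability 0.6, which is what reproduces the 1.666… value reported above; whether the repo's 0.6 is a drop or a keep probability would change the factor):

```python
import random

def inverted_dropout(x, drop_prob):
    # Standard inverted dropout: zero with probability drop_prob,
    # otherwise scale by 1/(1 - drop_prob) so the expectation is preserved.
    if random.random() < drop_prob:
        return 0.0
    return x / (1.0 - drop_prob)

random.seed(0)
indicator = 1.0
outs = {round(inverted_dropout(indicator, 0.4), 7) for _ in range(1000)}
print(sorted(outs))  # only 0.0 and 1/0.6 = 1.6666667 ever appear
```

So any code downstream that expects the indicator to be exactly 0.0 or 1.0 will see the scaled value instead while dropout is active.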
The second question: you apply the channel weight mask before BN, which in my opinion completely ruins the BN statistics. I suggest applying the mask after BN instead, which seems more reasonable.
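A side-by-side sketch of the two orderings (plain Python, assumed gamma=1.0, beta=0.1): masking before BN lets the bias resurrect the pruned channel as a nonzero constant, while masking after BN keeps it exactly zero.

```python
import math

def bn(vals, gamma=1.0, beta=0.1, eps=1e-5):
    # Per-channel batch norm with assumed affine parameters.
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in vals]

channel = [1.0, 2.0, 3.0, 4.0]
mask = 0.0  # this channel is pruned by the search

# Mask before BN: zeros go into BN and come out as beta != 0.
before = bn([v * mask for v in channel])
# Mask after BN: the pruned channel stays exactly zero.
after = [v * mask for v in bn(channel)]

print(before, after)
```

Masking after BN also means the batch statistics are computed from the unmasked activations, so the kept channels are normalized the same way regardless of which channels are currently masked.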