
About dropout and BN #11

Open
AmoseKang opened this issue Jul 9, 2019 · 3 comments

Comments

@AmoseKang

I have two questions about your code.
First, you use 0.6 dropout in the early stages, and the dropout function is applied directly to the indicator. Because of dropout's inverse scaling, the output of the dropout function is rescaled according to the dropout rate, so the indicator is either scaled up to 1/0.6 or dropped to 0. After the early stages of training, when the dropout rate is removed or reset, the indicator becomes 0 or 1 again. This behavior doesn't make sense to me, and I wonder if it is a bug. (PS: I also don't understand why a dropout rate of 100 raises no exception.)
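For reference, a minimal NumPy sketch of TF1-style inverted dropout reproduces the scaling described above. The keep_prob semantics, the direct application to a binary indicator, and all shapes/values here are assumptions for illustration, not the repo's actual code:

```python
import numpy as np

def inverted_dropout(x, keep_prob, training=True):
    # TF1-style inverted dropout: survivors are scaled by 1/keep_prob
    # so the expected value is preserved during training.
    if not training:
        return x
    mask = np.random.rand(*x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

np.random.seed(0)
indicator = np.ones(5)  # hypothetical binary architecture indicator
print(inverted_dropout(indicator, 0.6))
# -> entries are 0.0 or 1/0.6 = 1.6666..., exactly the scaling described above
print(inverted_dropout(indicator, 0.6, training=False))
# -> back to exactly 1.0 once dropout is removed

# Note: keep_prob=100 raises nothing in this sketch either; the mask is
# always True and the values are silently shrunk to x/100.
```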
Second, you apply the channel weight mask before BN, which in my opinion completely corrupts the BN statistics. I suggest applying the mask after BN, which seems more reasonable.

@amyburden

Got the same question here. Besides, BN also applies a bias (beta) to every channel, so the masked output is not zero, which gives the pointwise conv after BN a constant shift.
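A small NumPy sketch illustrates both points; the toy shapes and parameter values are assumptions, not the repo's code. A channel masked to zero before BN comes out of BN as the bias beta rather than zero, while masking after BN keeps it exactly zero:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Per-channel batch norm over a (N, C) batch.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

np.random.seed(0)
x = np.random.randn(8, 2)      # toy batch: 8 samples, 2 channels
mask = np.array([1.0, 0.0])    # "prune" channel 1
gamma, beta = 1.0, 0.1         # beta deliberately nonzero

# Mask before BN: the zeroed channel is re-shifted to beta by BN.
y_pre = batch_norm(x * mask, gamma, beta)
print(y_pre[:, 1])             # all 0.1, not 0 -> a constant shift feeds
                               # into the pointwise conv that follows

# Mask after BN: the pruned channel stays exactly zero.
y_post = batch_norm(x, gamma, beta) * mask
print(y_post[:, 1])            # all 0.0
```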

@QueeneTam

QueeneTam commented Nov 3, 2019

> [quoting @AmoseKang's two questions above]

I still can't figure out how to fix the indicator being scaled to 1.6666666, which causes an exception in parse_search_output.py: the function encode_single_path_net_arch() requires inds to be 0.0 or 1.0.
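One possible workaround, untested against this repo: re-binarize the dumped indicators before they reach encode_single_path_net_arch(). The threshold and the example values below are illustrative assumptions:

```python
import numpy as np

# Example indicators as dumped after training with dropout still applied;
# the values are hypothetical.
inds = np.array([1.6666666, 0.0, 1.6666666, 0.0])

# Re-binarize before passing them on: any positive entry becomes 1.0.
inds = (inds > 0.5).astype(np.float32)
print(inds)  # [1. 0. 1. 0.]
```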

@qianhuiliu

> [quoting @AmoseKang's questions and @QueeneTam's reply above]

Same question here. Have you figured it out yet?
