Why not do log_softmax("arch_param") in the graph? #1
Comments
Hi, thanks for your interest in our repo. Originally, for convenience, we defined the variable "log_alphas" as the log-probability distribution over operations. After each architecture optimization step, this definition is violated, because the updated values generally no longer normalize to a valid distribution. Following ProxylessNAS (Sec. 3.2.1) and DenseNAS (A.3.), we apply log_softmax() outside the graph to rescale the updated values. I think it's fine to just use the param alpha and apply log_softmax() in each forward step. I will run an experiment on this. Thanks a lot.
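To make the two options concrete, here is a minimal sketch in plain Python (not the repo's actual code; it deliberately avoids any framework). It assumes three candidate operations, and the gradient values and learning rate are made up for illustration. It shows that a plain gradient step breaks the normalization of "log_alphas", that log_softmax() restores it, and that applying log_softmax() in the forward pass gives the same distribution whether or not the values were rescaled beforehand.

```python
import math

def log_softmax(values):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(values)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in values))
    return [v - log_norm for v in values]

def total_probability(log_probs):
    """Sum of exp(log_prob); equals 1 for a valid log-distribution."""
    return sum(math.exp(v) for v in log_probs)

# Start from a valid log-probability vector (uniform over 3 operations).
log_alphas = log_softmax([0.0, 0.0, 0.0])

# Hypothetical architecture update: gradients and step size are invented.
hypothetical_grads = [0.3, -0.1, 0.2]
learning_rate = 0.5
updated = [a - learning_rate * g for a, g in zip(log_alphas, hypothetical_grads)]

# The updated values are no longer a normalized log-distribution.
mass_after_step = total_probability(updated)

# Option A: rescale outside the graph, right after the update.
rescaled = log_softmax(updated)
mass_after_rescale = total_probability(rescaled)

# Option B: keep the raw values and normalize in every forward step.
forward_from_raw = log_softmax(updated)
# Normalizing already-rescaled values changes nothing (idempotence).
forward_from_rescaled = log_softmax(rescaled)

print(f"mass after step:    {mass_after_step:.4f}")
print(f"mass after rescale: {mass_after_rescale:.4f}")
```

This only demonstrates that both options produce the same forward distribution for a given set of values. It says nothing about training dynamics: with a stateful optimizer, overwriting the parameter after each step could still lead to different trajectories, which is what the experiment would need to check.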
Please keep me updated on your progress.
Can you explain these lines? Why is pw_key's index in the second dimension?
@touchdreamer The width search only occurs on |
In train_search.py, I noticed that you do log_softmax() outside the graph, but why? Why not just use the param alpha and do log_softmax() in each forward step?