
question about penalty #47

Open
hhhhnwl opened this issue Oct 31, 2022 · 4 comments

Comments

@hhhhnwl

hhhhnwl commented Oct 31, 2022

prior = torch.ones(args.num_class)/args.num_class
prior = prior.cuda()
pred_mean = torch.softmax(logits, dim=1).mean(0)
penalty = torch.sum(prior*torch.log(prior/pred_mean))

Entropy is p*log(p), so why not penalty = torch.sum(pred_mean*torch.log(prior/pred_mean))?
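
For concreteness, a small standalone sketch (not code from this repo) that contrasts the two terms with a uniform prior: the penalty as quoted is KL(prior || pred_mean), which grows as pred_mean moves away from the prior, while the proposed sum(pred_mean*torch.log(prior/pred_mean)) equals -KL(pred_mean || prior), which becomes more negative as pred_mean gets more peaked.

import torch

num_class = 3
prior = torch.ones(num_class) / num_class
pred_mean = torch.tensor([0.90, 0.05, 0.05])  # a peaked mean prediction, far from the uniform prior

# penalty as quoted above: KL(prior || pred_mean), about 0.93 here (0 when pred_mean equals the prior)
penalty_repo = torch.sum(prior * torch.log(prior / pred_mean))

# proposed term: -KL(pred_mean || prior), about -0.70 here (0 when pred_mean equals the prior)
penalty_alt = torch.sum(pred_mean * torch.log(prior / pred_mean))

print(penalty_repo.item(), penalty_alt.item())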

@hhhhnwl
Author

hhhhnwl commented Oct 31, 2022

The intent here is to add entropy minimization as a penalty term, right? But as currently written, the larger pred_mean is, the smaller the loss. Or is there some other consideration?

@hhhhnwl
Author

hhhhnwl commented Oct 31, 2022

Here is an extreme example. Suppose there are 3 balanced classes and each batch has three samples.
The softmax output of batch1 is [[0.33333,0.33333,0.33333],[0.33333,0.33333,0.33333],[0.33333,0.33333,0.33333]].
The softmax output of batch2 is [[1,0,0],[0,1,0],[0,0,1]].
The penalties of these two batches are almost equal. Isn't that inconsistent with what we would expect?
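
A quick numerical check of this example, using the penalty quoted above (a standalone sketch, not repo code): because the penalty only looks at the batch mean of the softmax outputs, both batches produce a uniform pred_mean, so both penalties come out essentially 0.

import torch

prior = torch.ones(3) / 3

def penalty(probs):
    # probs: already-softmaxed predictions, shape [batch, num_class]
    pred_mean = probs.mean(0)
    return torch.sum(prior * torch.log(prior / pred_mean))

batch1 = torch.full((3, 3), 1.0 / 3)  # every sample predicts the uniform distribution
batch2 = torch.eye(3)                 # each sample predicts a different one-hot class
print(penalty(batch1).item(), penalty(batch2).item())  # both are ~0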

@YuanShunJie1

Hi, bro,
I am learning this implementation of DivideMix. The authors apply a transform in the warm-up training stage, and I wonder why they did that. Do you know? Maybe for best performance? Thanks.

@onlyonewater

Hi @YuanShunJie1, I am a newcomer studying noisy-label learning. Warm-up is used in the early training stage because the network tends to fit the clean samples first (these samples have small loss values), so after warm-up the co-divide operation can distinguish clean labels from noisy labels. That is my understanding.
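
For what it is worth, a rough sketch of the small-loss / co-divide idea described above (assuming sklearn's GaussianMixture; the variable names are illustrative, not from the repo): fit a two-component GMM on the per-sample losses collected after warm-up and treat the component with the smaller mean as the likely-clean one.

import numpy as np
from sklearn.mixture import GaussianMixture

# per_sample_loss: hypothetical 1-D array of per-sample cross-entropy losses after warm-up
per_sample_loss = np.random.rand(1000)

# normalize losses to [0, 1] and fit a two-component GMM
losses = (per_sample_loss - per_sample_loss.min()) / (per_sample_loss.max() - per_sample_loss.min() + 1e-8)
gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
gmm.fit(losses.reshape(-1, 1))

# posterior probability of belonging to the small-mean (likely clean) component
prob_clean = gmm.predict_proba(losses.reshape(-1, 1))[:, gmm.means_.argmin()]
clean_mask = prob_clean > 0.5  # samples treated as clean for the co-divide step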
