
question about penalty #47

Open
hhhhnwl opened this issue Oct 31, 2022 · 4 comments

Comments

@hhhhnwl

hhhhnwl commented Oct 31, 2022

prior = torch.ones(args.num_class)/args.num_class
prior = prior.cuda()
pred_mean = torch.softmax(logits, dim=1).mean(0)
penalty = torch.sum(prior*torch.log(prior/pred_mean))

Entropy is p*log(p), so why not penalty = torch.sum(pred_mean*torch.log(prior/pred_mean))?
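
For concreteness, a small standalone sketch (not code from this repo) that contrasts the two terms with a uniform prior: the penalty as quoted is KL(prior || pred_mean), which grows as pred_mean moves away from the prior, while the proposed sum(pred_mean*torch.log(prior/pred_mean)) equals -KL(pred_mean || prior), which becomes more negative as pred_mean gets more peaked.

import torch

num_class = 3
prior = torch.ones(num_class) / num_class
pred_mean = torch.tensor([0.90, 0.05, 0.05])  # a peaked mean prediction, far from the uniform prior

# penalty as quoted above: KL(prior || pred_mean), about 0.93 here (0 when pred_mean equals the prior)
penalty_repo = torch.sum(prior * torch.log(prior / pred_mean))

# proposed term: -KL(pred_mean || prior), about -0.70 here (0 when pred_mean equals the prior)
penalty_alt = torch.sum(pred_mean * torch.log(prior / pred_mean))

print(penalty_repo.item(), penalty_alt.item())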

@hhhhnwl
Author

hhhhnwl commented Oct 31, 2022

The intent here is to add entropy minimization as a penalty term, right? But as currently written, the larger pred_mean is, the smaller the loss. Or is there some other consideration?

@hhhhnwl
Author

hhhhnwl commented Oct 31, 2022

Here is an extreme example. Suppose there are 3 balanced classes and each batch has three samples.
The softmax output of batch1 is [[0.33333,0.33333,0.33333],[0.33333,0.33333,0.33333],[0.33333,0.33333,0.33333]].
The softmax output of batch2 is [[1,0,0],[0,1,0],[0,0,1]].
The penalties of these two batches are almost equal. Isn't that inconsistent with what we would expect?
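
A quick numerical check of this example, using the penalty quoted above (a standalone sketch, not repo code): because the penalty only looks at the batch mean of the softmax outputs, both batches produce a uniform pred_mean, so both penalties come out essentially 0.

import torch

prior = torch.ones(3) / 3

def penalty(probs):
    # probs: already-softmaxed predictions, shape [batch, num_class]
    pred_mean = probs.mean(0)
    return torch.sum(prior * torch.log(prior / pred_mean))

batch1 = torch.full((3, 3), 1.0 / 3)  # every sample predicts the uniform distribution
batch2 = torch.eye(3)                 # each sample predicts a different one-hot class
print(penalty(batch1).item(), penalty(batch2).item())  # both are ~0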

@YuanShunJie1

Hi, bro,
I am learning this implementation of DivideMix. The authors apply a transform in the warm-up training stage, and I wonder why they did that. Do you know? Maybe for best performance? Thanks.

@onlyonewater

Hi @YuanShunJie1, I am a newcomer studying noisy-label learning. Warm-up is used in the early training stage because the network tends to fit the clean samples first (these samples have small loss values), so after warm-up the co-divide operation can distinguish clean labels from noisy labels. That is my understanding.
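
For what it is worth, a rough sketch of the small-loss / co-divide idea described above (assuming sklearn's GaussianMixture; the variable names are illustrative, not from the repo): fit a two-component GMM on the per-sample losses collected after warm-up and treat the component with the smaller mean as the likely-clean one.

import numpy as np
from sklearn.mixture import GaussianMixture

# per_sample_loss: hypothetical 1-D array of per-sample cross-entropy losses after warm-up
per_sample_loss = np.random.rand(1000)

# normalize losses to [0, 1] and fit a two-component GMM
losses = (per_sample_loss - per_sample_loss.min()) / (per_sample_loss.max() - per_sample_loss.min() + 1e-8)
gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
gmm.fit(losses.reshape(-1, 1))

# posterior probability of belonging to the small-mean (likely clean) component
prob_clean = gmm.predict_proba(losses.reshape(-1, 1))[:, gmm.means_.argmin()]
clean_mask = prob_clean > 0.5  # samples treated as clean for the co-divide step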
