In the paper, it is stated that the temperature T has to be a positive number. In the code, however, although the temperature is initialized with a positive number (i.e., self.temperature = nn.Parameter(torch.ones(1) * 1.5)), it seems to me that nothing in .set_temperature() ensures that we do not end up with a negative temperature.
Did I miss something? Or can it be mathematically proven that the gradient will never push the temperature to the negative side as long as it is initialized to be positive? If not, should we initialize with something like self.temperature = nn.Parameter(torch.ones(1) * 1.5) ** 2 to ensure that self.temperature is always positive?
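For instance, I imagine that optimizing the temperature in log-space would also guarantee positivity no matter where the gradient pushes it. A rough sketch, not from this repository (the dummy logits/labels and the name log_temperature are just for illustration):

import torch
import torch.nn as nn
from torch import optim

# Dummy validation logits/labels, only to make the sketch self-contained.
logits = torch.randn(100, 10)
labels = torch.randint(0, 10, (100,))

# Optimize log(T) so that T = exp(log_T) is positive by construction.
log_temperature = nn.Parameter(torch.log(torch.ones(1) * 1.5))
optimizer = optim.LBFGS([log_temperature], lr=0.01, max_iter=50)

def closure():
    optimizer.zero_grad()
    temperature = torch.exp(log_temperature)  # always > 0
    loss = nn.functional.cross_entropy(logits / temperature, labels)
    loss.backward()
    return loss

optimizer.step(closure)
print('calibrated temperature:', torch.exp(log_temperature).item())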
@eugene-yh Intuitively, I think it would be strange for the temperature to take negative values, since that would invert the network's predictions (i.e., the most likely class would become the least likely).
That said, I have actually ended up with a negative temperature in some cases. I found that wrapping self.temperature in torch.abs() inside the closure function worked well. For instance:
def closure():
    optimizer.zero_grad()
    scaled_logits = logits / torch.abs(self.temperature)  # Ensure the temperature stays positive during optimization.
    if metric == 'ece':
        loss = ece_criterion(scaled_logits, labels)
    elif metric == 'nll':
        loss = nll_criterion(scaled_logits, labels)
    else:
        raise NotImplementedError()
    loss.backward()
    return loss
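For reference, this is roughly how I pass that closure to the LBFGS optimizer in set_temperature() (the metric argument and the ece_criterion/nll_criterion choices come from my own wrapper, not the original code). The absolute value should also be applied wherever the fitted temperature is read out:

optimizer = optim.LBFGS([self.temperature], lr=0.01, max_iter=50)
optimizer.step(closure)
# Read the fitted value through abs() as well, since the raw parameter
# may have converged to a negative number.
calibrated_temperature = torch.abs(self.temperature)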