An unofficial TensorFlow 2 implementation of the paper Circle Loss: A Unified Perspective of Pair Similarity Optimization.
My interpretation of this paper is written up in this blog.
Also includes an implementation of the paper Proxy Anchor Loss for Deep Metric Learning.
My interpretation of this paper is written up in this blog.
- Support class-level labels
- Support sparse class-level labels
- Support pair-wise labels
- Verified by experiment
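For reference, the unified loss from the circle loss paper, which `circle_loss.py` implements with the symbols `gamma`, `O_p`, `O_n`, `Delta_p`, and `Delta_n`:

```math
\mathcal{L}_{circle}=\log\Big[1+\sum_{j=1}^{L}\exp\big(\gamma\,\alpha_n^{j}(s_n^{j}-\Delta_n)\big)\sum_{i=1}^{K}\exp\big(-\gamma\,\alpha_p^{i}(s_p^{i}-\Delta_p)\big)\Big]
```

where alpha_p = [O_p - s_p]+, alpha_n = [s_n - O_n]+, and O_p = 1 + m, O_n = -m, Delta_p = 1 - m, Delta_n = m for a single margin m.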
```python
import tensorflow as tf
import tensorflow.keras as k
from circle_loss import SparseCircleLoss, CircleLoss, PairCircleLoss

# General usage with class-level (one-hot) labels
model.compile(
    loss=CircleLoss(gamma=64, margin=0.25),
    optimizer=k.optimizers.Adam(),
    metrics=[k.metrics.CategoricalAccuracy('acc')])

# For sparse categorical labels, e.g. face recognition
# NOTE: `batch_size` must be set
model.compile(
    loss=SparseCircleLoss(gamma=64, margin=0.25, batch_size=batch_size),
    optimizer=k.optimizers.Adam(),
    metrics=[k.metrics.SparseCategoricalAccuracy('acc')])
```
NOTE: to speed up the loss function, SparseCircleLoss does not support a dynamic batch_size. If you need a dynamic batch_size, refer to this issue.
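If you want to experiment with a dynamic batch size anyway, here is a minimal sketch (my own illustration, not the repository's code) that derives the row indices from the runtime batch dimension instead of precomputing `self.batch_idxs` from a fixed `batch_size`. It assumes the precomputed indices are the only part tied to the fixed batch size, and it gives up some of the speed mentioned above:

```python
import tensorflow as tf

def dynamic_batch_idxs(y_pred: tf.Tensor) -> tf.Tensor:
    # Row indices of shape [batch, 1], taken from the runtime batch
    # dimension of `y_pred` instead of a precomputed, fixed `batch_size`.
    return tf.expand_dims(tf.range(tf.shape(y_pred)[0]), 1)
```

Inside `call`, `idxs = tf.concat([dynamic_batch_idxs(y_pred), tf.cast(y_true, tf.int32)], 1)` would then replace the fixed `self.batch_idxs`.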
Using the CIFAR-10 dataset for a classification experiment, circle loss performs better than AM-Softmax loss (Python 3.7.4, TensorFlow 2.1):
```
python ./main.py

Softmax evaluate:
10000/10000 [==============================] - 2s 177us/sample - loss: 1.5902 - acc: 0.7456
Am Softmax evaluate:
10000/10000 [==============================] - 2s 187us/sample - loss: 5.8502 - acc: 0.7727
Circle Loss evaluate:
10000/10000 [==============================] - 0s 40us/sample - loss: 12.5618 - acc: 0.8344
```
You can test the loss functions:
```
python ./circle_loss.py

circle loss:
[ 80.482956 284.16852   62.08397   87.57021  192.4977    87.34067  119.77495   41.9718  ]
sparse circle loss:
[ 80.48296  284.16852   62.08397   87.57022  192.4977    87.34067  119.77495   41.9718  ]
pair circle loss:
[ 80.48296  284.16852   62.08397   87.57022  192.4977    87.34067  119.77495   41.9718  ]
```
- When I set the last Dense layer to `unit=3` to visualize the embedding, I found that circle loss performs very poorly when the dimension of the hidden space is small, while the performance of AM-Softmax loss is also reduced, though less severely:

```
python ./main_emmbed.py

Am Softmax evaluate:
10000/10000 [==============================] - 0s 25us/sample - loss: 8.9257 - acc: 0.7404
Circle Loss evaluate:
10000/10000 [==============================] - 0s 29us/sample - loss: 25.2796 - acc: 0.4561
```
It can be seen that the classification vectors of circle loss are not effectively separated.
- The loss function used for sparse labels at circle_loss.py line 124 has a numerical overflow problem, so I used a slower method. If you know a better method, please contact me:
```python
def call(self, y_true: tf.Tensor, y_pred: tf.Tensor) -> tf.Tensor:
    idxs = tf.concat([self.batch_idxs, tf.cast(y_true, tf.int32)], 1)
    sp = tf.expand_dims(tf.gather_nd(y_pred, idxs), 1)
    alpha_p = tf.nn.relu(self.O_p - tf.stop_gradient(sp))
    alpha_n = tf.nn.relu(tf.stop_gradient(y_pred) - self.O_n)
    alpha_n_for_p = tf.expand_dims(tf.gather_nd(alpha_n, idxs), 1)
    r_sp_m = alpha_p * (sp - self.Delta_p)
    r_sn_m = alpha_n * (y_pred - self.Delta_n)
    _Z = tf.concat([r_sn_m, r_sp_m], 1)
    _Z = _Z * self.gamma
    # sum over all similarities
    logZ = tf.math.reduce_logsumexp(_Z, 1, keepdims=True)
    # remove sn_p from the summed similarities
    # TODO: this line can overflow numerically; a more numerically safe method is needed
    logZ = logZ + tf.math.log(1 - tf.math.exp(
        (alpha_n_for_p * (sp - self.Delta_n)) * self.gamma - logZ))
    return -r_sp_m * self.gamma + logZ
```
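One numerically safer direction (a sketch of an alternative, not the repository's method): instead of including the positive class in the negative-branch logits and then removing it with `log(1 - exp(...))`, overwrite that entry with a very negative value before `reduce_logsumexp`, so no subtraction inside a logarithm is needed:

```python
import tensorflow as tf

def masked_logZ(r_sn_m: tf.Tensor, r_sp_m: tf.Tensor,
                idxs: tf.Tensor, gamma: float) -> tf.Tensor:
    # Replace the positive class's entry in the negative-branch logits with a
    # very negative value so it vanishes inside reduce_logsumexp, instead of
    # subtracting its contribution afterwards with log(1 - exp(...)).
    neg_inf = tf.fill([tf.shape(idxs)[0]], r_sn_m.dtype.min)
    sn_logits = tf.tensor_scatter_nd_update(r_sn_m * gamma, idxs, neg_inf)
    _Z = tf.concat([sn_logits, r_sp_m * gamma], 1)
    return tf.math.reduce_logsumexp(_Z, 1, keepdims=True)
```

The loss would then stay `-r_sp_m * self.gamma + logZ` as in `call` above; whether this is actually faster than the current workaround would need measuring.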