Inception Score calculation #29
Comments
The first point is an important issue. For the third point, note that they do NOT intend to calculate the Inception Score (IS) over all 50,000 inputs at once. Rather, they split the 50,000 samples into 10 splits of 5,000 samples each, calculate the IS for each split, and return the average IS over the splits. So the code is correct.
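For readers following along, here is a minimal numpy sketch of that split-and-average procedure; the function and variable names (`inception_score_from_preds`, `preds`, `splits`) are illustrative, not the repository's actual code:

```python
import numpy as np

def inception_score_from_preds(preds, splits=10):
    """preds: (N, num_classes) array of softmax outputs; returns (mean, std) of IS over splits."""
    n = preds.shape[0]
    scores = []
    for i in range(splits):
        part = preds[i * n // splits:(i + 1) * n // splits]       # e.g. 5,000 of the 50,000 samples
        p_y = np.mean(part, axis=0, keepdims=True)                 # marginal p(y) estimated within the split
        kl = np.sum(part * (np.log(part) - np.log(p_y)), axis=1)   # KL(p(y|x) || p(y)) for each sample
        scores.append(np.exp(np.mean(kl)))                         # Inception Score of this split
    return np.mean(scores), np.std(scores)
```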
So, which version is correct?
Hi, I have rewritten the code for calculating the Inception Score, taking the first problem into consideration: https://github.com/tsc2017/inception-score As for the second problem, since the softmax function hardly ever outputs an exact 0 for a category, the conditional and marginal distributions of y are supported on all 1000 classes, so it is unlikely to hit a 0*log(0), log(∞), or divide-by-zero error, and I do not observe any numerical instability with either the old implementation or my new one. Lastly, since the Inception Score is approximated by a statistic of a sample, just make sure the sample size is big enough. The common use of 50,000 images in 10 splits seems acceptable. Taking the CIFAR-10 training set images as an example, the Inception Score is around 11.34 with 1 split and 11.31±0.08 with 10 splits.
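As a rough sanity check of the "no exact zeros" claim, here is a small synthetic experiment (fake logits, not actual Inception outputs); note that with extreme logit gaps a float32 softmax can still underflow to zero, so this only illustrates the typical case:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(scale=5.0, size=(1000, 1000)).astype(np.float32)  # synthetic stand-in for network logits
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)                              # numerically stable softmax

p_y = probs.mean(axis=0)
kl = np.sum(probs * (np.log(probs) - np.log(p_y)), axis=1)
print(probs.min() > 0)          # True: every class gets a strictly positive probability
print(np.isfinite(kl).all())    # True: no nan/inf from 0*log(0) or log(0)
```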
Can anyone tell me where I can find material about the Inception Score?
@lipanpeng https://arxiv.org/abs/1801.01973 @xunhuang1995 The third point is valid, because splits of 5,000 samples might be too small to adequately represent 1,000 classes. And as they show in the paper, the IS changes depending on the size of the split.
The Inception Score calculation has 3 mistakes.
1. It uses an outdated Inception network that in fact outputs a 1008-vector of class probabilities (see the following GitHub issue):
Fix: See the link for the new Inception model.
2. It calculates the KL divergence directly using logs, which leads to numerical instabilities (it can output nan instead of inf). Instead, scipy.stats.entropy should be used.
Fix: Replace the direct log computation with something along the lines of the following:
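A sketch of a scipy.stats.entropy-based version, assuming `part` holds the (num_samples, num_classes) softmax outputs of one split (variable names are illustrative):

```python
import numpy as np
from scipy.stats import entropy

def split_kl_scores(part):
    """Per-sample KL(p(y|x) || p(y)) for one split, computed via scipy.stats.entropy."""
    p_y = np.mean(part, axis=0)                      # marginal class distribution of the split
    # entropy(pk, qk) returns the KL divergence sum(pk * log(pk / qk))
    return [entropy(part[i], p_y) for i in range(part.shape[0])]
```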
3. It calculates the mean of the exponential over splits rather than the exponential of the mean. Here is the code in inception_score.py which does this:
This is clearly problematic, as can easily be seen in a very simple case: for a random variable x ~ Bernoulli(0.5), E[e^x] = 0.5(e^0 + e^1), which is not equal to e^(0.5·0 + 0.5·1) = e^E[x]. The same effect shows up with a uniform random variable, where the split-mean over-estimates the exponential (Jensen's inequality).
Fix: Do not take the mean of the per-split exponentials; instead calculate the exponential of the mean of the KL divergence over all 50,000 inputs.
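A sketch of the corrected no-split calculation, together with a quick numeric check of the Bernoulli example above (the function name is illustrative):

```python
import numpy as np

# Jensen's inequality check for x ~ Bernoulli(0.5):
print(0.5 * (np.exp(0) + np.exp(1)))   # E[e^x]  ~= 1.859
print(np.exp(0.5 * 0 + 0.5 * 1))       # e^E[x]  ~= 1.649, strictly smaller

def inception_score_all_inputs(preds):
    """Exponential of the mean KL over all inputs, with no splitting. preds: (N, num_classes)."""
    p_y = np.mean(preds, axis=0, keepdims=True)
    kl = np.sum(preds * (np.log(preds) - np.log(p_y)), axis=1)
    return np.exp(np.mean(kl))
```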