
Implicit assumption of random ordering of generated images in calculation of Inception Score leads to underestimated ISC. #153

Open · adilhasan927 opened this issue Aug 5, 2024 · 1 comment


adilhasan927 commented Aug 5, 2024

Hello, in the provided evaluator.py:

    def compute_inception_score(self, activations: np.ndarray, split_size: int = 5000) -> float:
        # Run the classifier softmax over the activations in batches.
        softmax_out = []
        for i in range(0, len(activations), self.softmax_batch_size):
            acts = activations[i : i + self.softmax_batch_size]
            softmax_out.append(self.sess.run(self.softmax, feed_dict={self.softmax_input: acts}))
        preds = np.concatenate(softmax_out, axis=0)
        # Score each split of `split_size` predictions separately, then average.
        # https://github.com/openai/improved-gan/blob/4f5d1ec5c16a7eceb206f42bfc652693601e1d5c/inception_score/model.py#L46
        scores = []
        for i in range(0, len(preds), split_size):
            part = preds[i : i + split_size]
            # KL(p(y|x) || p(y)), with p(y) estimated as the marginal of this split only.
            kl = part * (np.log(part) - np.log(np.expand_dims(np.mean(part, 0), 0)))
            kl = np.mean(np.sum(kl, 1))
            scores.append(np.exp(kl))
        return float(np.mean(scores))

Of interest is the computation of the KL divergence in splits of 5,000 predictions. This implicitly assumes that, having generated 50,000 images conditioned on, say, 1,000 ImageNet classes, the images arrive in random order in the provided array, so that the mean KL divergence within each split approaches the KL divergence over the whole set.

If the images are instead ordered by class in the provided array (images of class 0 as the first 50, class 1 as the next 50, and so on), each split of 5,000 contains only 100 of the 1,000 classes. The split's marginal p(y) then concentrates on those 100 classes and sits much closer to each sample's conditional p(y|x), so the per-split KL divergence is deflated; since exp(KL) cannot exceed the number of classes present in a split, each split's score is capped near 100, and the calculated ISC is artificially low. The sketch below makes this concrete.
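To see the effect end to end, here is a small self-contained simulation (mine, not from evaluator.py; the near-one-hot softmax outputs are a simplifying assumption) that applies the same split-based formula to 50,000 synthetic predictions in sorted and shuffled order:

    # Standalone sketch: simulated near-one-hot predictions, not from evaluator.py.
    import numpy as np

    def split_is(preds: np.ndarray, split_size: int = 5000) -> float:
        # Same per-split formula as compute_inception_score above.
        scores = []
        for i in range(0, len(preds), split_size):
            part = preds[i : i + split_size]
            kl = part * (np.log(part) - np.log(np.expand_dims(np.mean(part, 0), 0)))
            scores.append(np.exp(np.mean(np.sum(kl, 1))))
        return float(np.mean(scores))

    n_classes, per_class = 1000, 50
    labels = np.repeat(np.arange(n_classes), per_class)            # class-sorted order
    preds = np.full((len(labels), n_classes), 1e-3 / (n_classes - 1))
    preds[np.arange(len(labels)), labels] = 1.0 - 1e-3             # rows sum to 1

    rng = np.random.default_rng(0)
    print(split_is(preds))                   # sorted: ~99 (capped by the 100 classes per split)
    print(split_is(rng.permutation(preds)))  # shuffled: ~985 (each split sees nearly all 1,000 classes)

In this toy setting the sorted order reports an IS of roughly 100 while the shuffled order reports close to 1,000, mirroring the gap in the real measurement reported below.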

This issue can be fixed by shuffling the array before the splits are taken, e.g. by adding the following line at the start of the method:

    np.random.shuffle(activations)
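Note that np.random.shuffle permutes the caller's array in place. A slight variant (my suggestion, not part of the repository's code) permutes a copy with a seeded generator, which leaves the input untouched and keeps the score reproducible across runs:

    # Sketch: permute a copy instead of shuffling in place; the seed keeps runs reproducible.
    rng = np.random.default_rng(seed=0)
    activations = rng.permutation(activations, axis=0)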

As a demonstration: if I sort my own generated ImageNet sample set of 50K images by class, I get an ISC of ~50, whereas the unsorted version gets an ISC of ~366.

Fortunately, this bug does not affect the academic research that uses this script for evaluations, because authors save their images to disk as individual files and then read the files back in with Python, which, by happenstance, yields an order random enough that the per-split statistics are close to the non-split KL divergence.

@zhengkw18

Just ran into the issue of an extremely low IS (~50), and your insightful observation is exactly what I was looking for.
