
Low validation accuracy 71% for race estimation #7

Open · kylemcdonald opened this issue Sep 15, 2020 · 6 comments
@kylemcdonald commented Sep 15, 2020

When I use the pretrained model to predict race on the validation set, I get the following accuracy:

Accuracy Category
75.54% White
86.05% Black
59.33% Latino_Hispanic
78.00% East Asian
62.26% Southeast Asian
73.02% Indian
61.79% Middle Eastern
70.43% Non-white
71.40% All

This is very different from the accuracy reported in the paper. On the held-out datasets, you report an 81% average in Table 6.

This 10-point difference makes me think I'm doing something wrong, or that the held-out datasets are not comparable to the validation dataset.

Here is my code:

#!/usr/bin/env python
# coding: utf-8

from torchvision import transforms, models
import torch.nn as nn
import torch
from PIL import Image
import numpy as np

with open('data/fairface_label_val.csv') as f:
    data = f.read().splitlines() # split rows
data = [row.split(',') for row in data]
data = data[1:] # drop the header
fn_data, age_data, gender_data, race_data, _ = zip(*data) # unpack into columns

# convert from race names to race indices
races_names = ['White','Black','Latino_Hispanic','East Asian','Southeast Asian','Indian','Middle Eastern']
race_indices = [races_names.index(race_name) for race_name in race_data]
race_indices = np.asarray(race_indices)

model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 18) # 18 outputs: 7 race + 2 gender + 9 age
model.load_state_dict(torch.load('data/fair_face_models/res34_fair_align_multi_7_20190809.pt'))
model = model.to('cuda')
model.eval() # switch to inference mode so batch norm uses running statistics

trans = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

def chunks(x, n):
    for i in range(0, len(x), n):
        yield x[i:i+n]

batch_size = 256
all_pred = []

for fn_batch in chunks(fn_data, batch_size):
    print('.', end='')
    
    images = [Image.open('data/padding-0.25/' + fn) for fn in fn_batch]
    images = [trans(image) for image in images]
    images = torch.stack(images).to('cuda')
    
    with torch.no_grad():
        outputs = model(images)
        # the first 7 of the 18 outputs are the race logits
        pred = outputs[:, :7].argmax(-1).cpu().numpy()
        
    all_pred.extend(pred)
    
all_pred = np.asarray(all_pred)

matching = all_pred == race_indices

for i, race_name in enumerate(races_names):
    accuracy = matching[race_indices==i].mean()
    print(f'{100*accuracy:05.2f}%\t{race_name}')
    
accuracy = matching[race_indices>0].mean()
print(f'{100*accuracy:05.2f}%\tNon-white')

accuracy = matching.mean()
print(f'{100*accuracy:05.2f}%\tAll')
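(Aside: the slicing inside the loop generalizes to the other two heads. A minimal sketch, assuming the 18 outputs follow the 7 race + 2 gender + 9 age layout used by predict.py; the slice boundaries are an assumption, so check them against your copy of the script:

    with torch.no_grad():
        outputs = model(images)
        race_pred = outputs[:, :7].argmax(-1).cpu().numpy()      # race head
        gender_pred = outputs[:, 7:9].argmax(-1).cpu().numpy()   # gender head (assumed slice)
        age_pred = outputs[:, 9:18].argmax(-1).cpu().numpy()     # age head (assumed slice)
)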
@noamgat commented Nov 5, 2020

I can confirm that I get very similar results. To rule out bugs in my own code, I ran predict.py on the validation data set and used Excel to compare the results. I got 70.8%. Attaching the spreadsheet:
test_outputs.zip
The overall accuracy is in cell L2 of the "Sheet2" sheet.
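For anyone who wants to skip the spreadsheet step, here is a rough pandas sketch of the same comparison. The column names (face_name_align, race_preds_fair) and the crop-to-file mapping are assumptions about what predict.py writes, so adjust them to match your run:

import pandas as pd

labels = pd.read_csv('data/fairface_label_val.csv')  # columns include 'file' and 'race'
preds = pd.read_csv('test_outputs.csv')              # assumed predict.py output

# predict.py scores aligned crops, so recover the original file key from the
# crop name; this mapping is an assumption -- check your actual file names
preds['file'] = (preds['face_name_align']
                 .str.replace(r'^detected_faces/', '', regex=True)
                 .str.replace(r'_face\d+\.jpg$', '.jpg', regex=True))

merged = labels.merge(preds, on='file')
print(f"overall: {100 * (merged['race'] == merged['race_preds_fair']).mean():.2f}%")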

@usamatahir7

I also have similar results.

@KEVIN666666666

Same result here.

@joojs commented Feb 19, 2021

The result in Table 6 of the arXiv version (Table 8 in the WACV paper) was measured on the "external validation datasets". The paper explains in detail how they were collected and evaluated. We are not able to release these datasets because they are not under a CC license. The pre-trained model is the one used in our experiments in the paper. Thanks.

@joojs commented Feb 19, 2021

Also, some experiments (race classification) were based on 4 or 5 race categories (not 7), because the other datasets we compared against (e.g. UTK, LFWA) don't have 7.
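For anyone trying to line the numbers up themselves, the 7-way predictions can be collapsed into coarser bins before scoring. A minimal sketch, using an illustrative UTK-style grouping that is an assumption rather than the paper's exact mapping:

import numpy as np

races_names = ['White', 'Black', 'Latino_Hispanic', 'East Asian',
               'Southeast Asian', 'Indian', 'Middle Eastern']

# illustrative coarse grouping (assumption; not necessarily the paper's mapping)
coarse_of_fine = {
    'White': 'White',
    'Black': 'Black',
    'East Asian': 'Asian',
    'Southeast Asian': 'Asian',
    'Indian': 'Indian',
    'Latino_Hispanic': 'Other',
    'Middle Eastern': 'Other',
}

def to_coarse(indices):
    # map 7-class indices to coarse labels
    return np.array([coarse_of_fine[races_names[i]] for i in indices])

# reusing all_pred and race_indices from the script above:
# coarse_acc = (to_coarse(all_pred) == to_coarse(race_indices)).mean()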

@KeyKy commented Mar 31, 2022

What I get is age@1: 60.52, gender@1: 94.36, race@1: 72.04.
