
Low validation accuracy 71% for race estimation #7

Open · kylemcdonald opened this issue Sep 15, 2020 · 6 comments
@kylemcdonald commented Sep 15, 2020

When I use the pretrained model to predict race on the validation set, I get the following accuracy:

Accuracy Category
75.54% White
86.05% Black
59.33% Latino_Hispanic
78.00% East Asian
62.26% Southeast Asian
73.02% Indian
61.79% Middle Eastern
70.43% Non-white
71.40% All

This is very different from the accuracy reported in the paper. On the held-out datasets, you report an 81% average in Table 6.

This 10-point difference makes me think I'm doing something wrong, or that the held-out datasets are not comparable to the validation dataset.

Here is my code:

#!/usr/bin/env python
# coding: utf-8

from torchvision import transforms, models
import torch.nn as nn
import torch
from PIL import Image
import numpy as np

with open('data/fairface_label_val.csv') as f:
    data = f.read().splitlines() # split rows
data = [row.split(',') for row in data]
data = data[1:] # drop the header
fn_data, age_data, gender_data, race_data, _ = zip(*data) # unpack into columns

# convert from race names to race indices
races_names = ['White','Black','Latino_Hispanic','East Asian','Southeast Asian','Indian','Middle Eastern']
race_indices = [races_names.index(race_name) for race_name in race_data]
race_indices = np.asarray(race_indices)

model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 18) # 18 outputs: 7 race + 2 gender + 9 age
model.load_state_dict(torch.load('data/fair_face_models/res34_fair_align_multi_7_20190809.pt'))
model = model.to('cuda')
model.eval() # switch to inference mode so batch norm uses running statistics

trans = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

def chunks(x, n):
    for i in range(0, len(x), n):
        yield x[i:i+n]

batch_size = 256
all_pred = []

for fn_batch in chunks(fn_data, batch_size):
    print('.', end='')
    
    images = [Image.open('data/padding-0.25/' + fn) for fn in fn_batch]
    images = [trans(image) for image in images]
    images = torch.stack(images).to('cuda')
    
    with torch.no_grad():
        outputs = model(images)
        # the first 7 of the 18 outputs are the race logits
        pred = outputs[:, :7].argmax(-1).cpu().numpy()
        
    all_pred.extend(pred)
    
all_pred = np.asarray(all_pred)

matching = all_pred == race_indices

for i, race_name in enumerate(races_names):
    accuracy = matching[race_indices==i].mean()
    print(f'{100*accuracy:05.2f}%\t{race_name}')
    
accuracy = matching[race_indices>0].mean()
print(f'{100*accuracy:05.2f}%\tNon-white')

accuracy = matching.mean()
print(f'{100*accuracy:05.2f}%\tAll')
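(Aside: the slicing inside the loop generalizes to the other two heads. A minimal sketch, assuming the 18 outputs follow the 7 race + 2 gender + 9 age layout used by predict.py; the slice boundaries are an assumption, so check them against your copy of the script:

    with torch.no_grad():
        outputs = model(images)
        race_pred = outputs[:, :7].argmax(-1).cpu().numpy()      # race head
        gender_pred = outputs[:, 7:9].argmax(-1).cpu().numpy()   # gender head (assumed slice)
        age_pred = outputs[:, 9:18].argmax(-1).cpu().numpy()     # age head (assumed slice)
)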
@noamgat commented Nov 5, 2020

I can confirm that I get very similar results. To rule out bugs in my own code, I ran predict.py on the validation data set and used Excel to compare the results. I got 70.8%. Attaching the spreadsheet:
test_outputs.zip
The overall accuracy is in cell L2 of the "Sheet2" sheet.
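For anyone who wants to skip the spreadsheet step, here is a rough pandas sketch of the same comparison. The column names (face_name_align, race_preds_fair) and the crop-to-file mapping are assumptions about what predict.py writes, so adjust them to match your run:

import pandas as pd

labels = pd.read_csv('data/fairface_label_val.csv')  # columns include 'file' and 'race'
preds = pd.read_csv('test_outputs.csv')              # assumed predict.py output

# predict.py scores aligned crops, so recover the original file key from the
# crop name; this mapping is an assumption -- check your actual file names
preds['file'] = (preds['face_name_align']
                 .str.replace(r'^detected_faces/', '', regex=True)
                 .str.replace(r'_face\d+\.jpg$', '.jpg', regex=True))

merged = labels.merge(preds, on='file')
print(f"overall: {100 * (merged['race'] == merged['race_preds_fair']).mean():.2f}%")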

@usamatahir7

I also have similar results.

@KEVIN666666666

Same result here.

@joojs commented Feb 19, 2021

The result in Table 6 of the arXiv version (Table 8 in the WACV paper) was measured on the "external validation datasets". The paper explains in detail how they were collected and evaluated. We are not able to release these datasets because they are not under a CC license. The pre-trained model is the one used in our experiments in the paper. Thanks.

@joojs commented Feb 19, 2021

Also, some experiments (race classification) were based on 4 or 5 race categories (not 7), because the other datasets we compared against (e.g. UTK, LFWA) don't have 7.
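For anyone trying to line the numbers up themselves, the 7-way predictions can be collapsed into coarser bins before scoring. A minimal sketch, using an illustrative UTK-style grouping that is an assumption rather than the paper's exact mapping:

import numpy as np

races_names = ['White', 'Black', 'Latino_Hispanic', 'East Asian',
               'Southeast Asian', 'Indian', 'Middle Eastern']

# illustrative coarse grouping (assumption; not necessarily the paper's mapping)
coarse_of_fine = {
    'White': 'White',
    'Black': 'Black',
    'East Asian': 'Asian',
    'Southeast Asian': 'Asian',
    'Indian': 'Indian',
    'Latino_Hispanic': 'Other',
    'Middle Eastern': 'Other',
}

def to_coarse(indices):
    # map 7-class indices to coarse labels
    return np.array([coarse_of_fine[races_names[i]] for i in indices])

# reusing all_pred and race_indices from the script above:
# coarse_acc = (to_coarse(all_pred) == to_coarse(race_indices)).mean()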

@KeyKy commented Mar 31, 2022

What I get is age@1: 60.52, gender@1: 94.36, race@1: 72.04.
