Add sparse objects benchmark #536

Merged 97 commits on Jun 12, 2024
Commits
4690626
integrate –https://github.com/brain-score/model-tools/pull/75 to brai…
benlonnqvist Jan 10, 2024
26ed57a
Merge branch 'brain-score:master' into vision_microsaccades
benlonnqvist Jan 10, 2024
1fee011
retrigger
benlonnqvist Jan 15, 2024
26313aa
retrigger
benlonnqvist Jan 16, 2024
71ba207
add exception for when temp file write fails
benlonnqvist Jan 17, 2024
3ea9a9c
fix error with tf temp file management
benlonnqvist Jan 17, 2024
3203e7b
add bandaid to a DataAssembly index problem
benlonnqvist Jan 18, 2024
a2e0a15
retrigger
benlonnqvist Jan 24, 2024
c6974a2
initial commit
benlonnqvist Feb 9, 2024
a636742
Merge branch 'brain-score:master' into sparse_objects
benlonnqvist Feb 9, 2024
c4eb6b3
add data packaging
benlonnqvist Feb 9, 2024
d375873
revert inadvertent changes from another branch
benlonnqvist Feb 9, 2024
9ae0d9c
revert an inadvertent change from another branch
benlonnqvist Feb 9, 2024
850c073
revert inadvertent change from another branch
benlonnqvist Feb 9, 2024
9c425c0
Merge branch 'brain-score:master' into sparse_objects
benlonnqvist Feb 19, 2024
df1a386
repackage data in a more readable format
benlonnqvist Feb 26, 2024
d42aa68
Merge branch 'sparse_objects' of https://github.com/benlonnqvist/brai…
benlonnqvist Feb 26, 2024
b2d0d78
Merge branch 'brain-score:master' into sparse_objects
benlonnqvist Feb 26, 2024
40793d4
add check for perfect accuracy in observed_consistency, that existed …
benlonnqvist Feb 26, 2024
3fdf742
Merge branch 'sparse_objects' of https://github.com/benlonnqvist/brai…
benlonnqvist Feb 26, 2024
3b82aeb
add accuracy distance metric
benlonnqvist Feb 28, 2024
187f504
add accuracy_distance ceilings
benlonnqvist Feb 29, 2024
97a60e2
rename benchmark
benlonnqvist Feb 29, 2024
3139434
add composite error consistency benchmark and engineering benchmark
benlonnqvist Mar 4, 2024
d71f3ae
fix naming errors
benlonnqvist Mar 4, 2024
1089cf5
add some benchmark tests
benlonnqvist Mar 5, 2024
0241dbe
add benchmark tests and test values
benlonnqvist Mar 5, 2024
bb81ae0
add metric tests
benlonnqvist Mar 6, 2024
32b820f
correct accuracy distance computation, and the corresponding ceilings…
benlonnqvist Mar 6, 2024
e69bf05
add benchmarks to s3
benlonnqvist Mar 12, 2024
fc65bed
Merge branch 'brain-score:master' into sparse_objects
benlonnqvist Mar 18, 2024
19ec821
add some hashes
benlonnqvist Mar 18, 2024
fa0fd1f
add assembly and stimulus hashes, versions, etc.
benlonnqvist Mar 20, 2024
b1f374b
Merge branch 'sparse_objects' of https://github.com/benlonnqvist/brai…
benlonnqvist Mar 20, 2024
3476320
Merge branch 'brain-score:master' into sparse_objects
benlonnqvist Mar 20, 2024
806b861
Merge branch 'master' into sparse_objects
benlonnqvist Mar 20, 2024
b1bc727
Merge branch 'master' into sparse_objects
benlonnqvist Mar 20, 2024
59e5def
Merge branch 'master' into sparse_objects
benlonnqvist Mar 22, 2024
c47c75d
Merge branch 'master' into sparse_objects
benlonnqvist Mar 26, 2024
eb0d61a
Merge branch 'master' into sparse_objects
benlonnqvist Mar 26, 2024
5cc691c
add correct bucket path for .nc files
benlonnqvist Mar 26, 2024
8350c11
Merge branch 'master' into sparse_objects
benlonnqvist Mar 26, 2024
679c00a
fix metric tests
benlonnqvist Mar 27, 2024
92c96db
Merge branch 'sparse_objects' of https://github.com/benlonnqvist/brai…
benlonnqvist Mar 27, 2024
cc6913a
fix bug with error consistency naming
benlonnqvist Mar 27, 2024
6177803
Merge branch 'master' into sparse_objects
benlonnqvist Mar 30, 2024
6caefd1
Merge branch 'master' into sparse_objects
benlonnqvist Apr 2, 2024
12c8787
Merge branch 'master' into sparse_objects
benlonnqvist Apr 3, 2024
a9e0dae
correct benchmark identifiers
benlonnqvist Apr 3, 2024
1109093
Merge branch 'master' into sparse_objects
benlonnqvist Apr 5, 2024
0c60660
remove DATASETS that no longer exists
benlonnqvist Apr 8, 2024
f711cde
Merge branch 'sparse_objects' of https://github.com/benlonnqvist/brai…
benlonnqvist Apr 8, 2024
5509082
Merge branch 'master' into sparse_objects
benlonnqvist Apr 9, 2024
55cf8e2
Merge branch 'master' into sparse_objects
benlonnqvist May 1, 2024
6926b1c
Merge branch 'master' into sparse_objects
benlonnqvist May 8, 2024
02fa379
fix test bug
benlonnqvist May 8, 2024
a2e78cf
remove diffs to main
benlonnqvist May 8, 2024
e2f3237
Merge branch 'sparse_objects' of https://github.com/benlonnqvist/brai…
benlonnqvist May 8, 2024
186c8fe
resolve conflict
benlonnqvist May 8, 2024
4c8179c
resolve conflict
benlonnqvist May 8, 2024
446a1f0
Merge branch 'master' into sparse_objects
benlonnqvist May 8, 2024
1cf05dc
Merge branch 'master' into sparse_objects
benlonnqvist May 30, 2024
52f448f
schroedinger's benchmark registry: only by observing can you know whe…
benlonnqvist May 30, 2024
41ba2ec
remove benchmark initialization from registry
benlonnqvist May 30, 2024
cf6cf19
maybe fix issue with dataset loading
benlonnqvist May 30, 2024
8c075ce
Merge branch 'master' into sparse_objects
benlonnqvist May 31, 2024
5bf73de
Merge branch 'master' into sparse_objects
benlonnqvist Jun 4, 2024
1fdc923
add stimulus prefix to stimulus sets
benlonnqvist Jun 5, 2024
b85445a
change stimulus sets and data assemblies to have type(stimulus_id) ==…
benlonnqvist Jun 5, 2024
8bb0a41
change the .nc bucket
benlonnqvist Jun 5, 2024
2802674
image_id => stimulus_id
benlonnqvist Jun 5, 2024
727f2e5
change composite to all
benlonnqvist Jun 5, 2024
bede2ad
Merge branch 'master' into sparse_objects
benlonnqvist Jun 6, 2024
fbaf4db
data types
benlonnqvist Jun 6, 2024
31e5913
Merge branch 'sparse_objects' of https://github.com/benlonnqvist/brai…
benlonnqvist Jun 6, 2024
42693fc
change s3 .nc directory
benlonnqvist Jun 7, 2024
9b773b1
.nc
benlonnqvist Jun 7, 2024
ec63b78
another .nc
benlonnqvist Jun 7, 2024
c1962bf
.nc now on bucket brainscore-vision
benlonnqvist Jun 7, 2024
a89daa4
remove / from remote_filepath?
benlonnqvist Jun 7, 2024
b50c78f
assembly -> target
benlonnqvist Jun 7, 2024
5a47c71
subject_assembly -> target
benlonnqvist Jun 7, 2024
8a72940
assembly -> target
benlonnqvist Jun 7, 2024
84941a3
fix error with accuracydistance benchmark
benlonnqvist Jun 8, 2024
d7b9083
fix raw score error with accuracydistance
benlonnqvist Jun 8, 2024
ac0642d
ensure score <= 1.0
benlonnqvist Jun 8, 2024
2705fe5
fix lazyload bug with stimulusset
benlonnqvist Jun 8, 2024
2f4b29f
Merge branch 'master' into sparse_objects
benlonnqvist Jun 8, 2024
1cdaec2
add raw score to engineering benchmark
benlonnqvist Jun 8, 2024
b7f3c08
fix recursion in score calculation and fix expected scores
benlonnqvist Jun 8, 2024
0c339a7
fix .nc name error in test
benlonnqvist Jun 8, 2024
e9a2572
nc filename retrieval
benlonnqvist Jun 8, 2024
c44fb0b
@pytest.mark.private_access
benlonnqvist Jun 12, 2024
a5665f9
ugly code -> pretty code
benlonnqvist Jun 12, 2024
3d28874
remove unnecessary newline addition
benlonnqvist Jun 12, 2024
e93643f
add newline back
benlonnqvist Jun 12, 2024
0515591
Merge branch 'master' into sparse_objects
benlonnqvist Jun 12, 2024
50 changes: 50 additions & 0 deletions brainscore_vision/benchmarks/scialom2024/__init__.py
@@ -0,0 +1,50 @@
from brainscore_vision import benchmark_registry
from . import benchmark

# behavioral benchmarks
benchmark_registry['Scialom2024_rgbBehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('rgb')
benchmark_registry['Scialom2024_contoursBehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('contours')
benchmark_registry['Scialom2024_phosphenes-12BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('phosphenes-12')
benchmark_registry['Scialom2024_phosphenes-16BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('phosphenes-16')
benchmark_registry['Scialom2024_phosphenes-21BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('phosphenes-21')
benchmark_registry['Scialom2024_phosphenes-27BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('phosphenes-27')
benchmark_registry['Scialom2024_phosphenes-35BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('phosphenes-35')
benchmark_registry['Scialom2024_phosphenes-46BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('phosphenes-46')
benchmark_registry['Scialom2024_phosphenes-59BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('phosphenes-59')
benchmark_registry['Scialom2024_phosphenes-77BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('phosphenes-77')
benchmark_registry['Scialom2024_phosphenes-100BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('phosphenes-100')
benchmark_registry['Scialom2024_segments-12BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('segments-12')
benchmark_registry['Scialom2024_segments-16BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('segments-16')
benchmark_registry['Scialom2024_segments-21BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('segments-21')
benchmark_registry['Scialom2024_segments-27BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('segments-27')
benchmark_registry['Scialom2024_segments-35BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('segments-35')
benchmark_registry['Scialom2024_segments-46BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('segments-46')
benchmark_registry['Scialom2024_segments-59BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('segments-59')
benchmark_registry['Scialom2024_segments-77BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('segments-77')
benchmark_registry['Scialom2024_segments-100BehavioralAccuracyDistance'] = lambda: benchmark._Scialom2024BehavioralAccuracyDistance('segments-100')

# composites
benchmark_registry['Scialom2024_phosphenes-allBehavioralErrorConsistency'] = lambda: benchmark._Scialom2024BehavioralErrorConsistency('phosphenes-all')
benchmark_registry['Scialom2024_segments-allBehavioralErrorConsistency'] = lambda: benchmark._Scialom2024BehavioralErrorConsistency('segments-all')

# engineering benchmarks
benchmark_registry['Scialom2024_rgbEngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('rgb')
benchmark_registry['Scialom2024_contoursEngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('contours')
benchmark_registry['Scialom2024_phosphenes-12EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('phosphenes-12')
benchmark_registry['Scialom2024_phosphenes-16EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('phosphenes-16')
benchmark_registry['Scialom2024_phosphenes-21EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('phosphenes-21')
benchmark_registry['Scialom2024_phosphenes-27EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('phosphenes-27')
benchmark_registry['Scialom2024_phosphenes-35EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('phosphenes-35')
benchmark_registry['Scialom2024_phosphenes-46EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('phosphenes-46')
benchmark_registry['Scialom2024_phosphenes-59EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('phosphenes-59')
benchmark_registry['Scialom2024_phosphenes-77EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('phosphenes-77')
benchmark_registry['Scialom2024_phosphenes-100EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('phosphenes-100')
benchmark_registry['Scialom2024_segments-12EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('segments-12')
benchmark_registry['Scialom2024_segments-16EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('segments-16')
benchmark_registry['Scialom2024_segments-21EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('segments-21')
benchmark_registry['Scialom2024_segments-27EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('segments-27')
benchmark_registry['Scialom2024_segments-35EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('segments-35')
benchmark_registry['Scialom2024_segments-46EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('segments-46')
benchmark_registry['Scialom2024_segments-59EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('segments-59')
benchmark_registry['Scialom2024_segments-77EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('segments-77')
benchmark_registry['Scialom2024_segments-100EngineeringAccuracy'] = lambda: benchmark._Scialom2024EngineeringAccuracy('segments-100')
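The registrations above follow a regular pattern, so the same set of keys could in principle be generated in a loop. The following is a minimal sketch under assumptions, not the PR's code: `registry` is a plain dict standing in for `benchmark_registry`, and `make_benchmark` is a hypothetical stand-in for the `benchmark._Scialom2024...` constructors. The `dataset=dataset` default argument binds each lambda to its own dataset; without it, every lambda would see the loop's final value.

```python
from typing import Callable, Dict, Tuple

# Sparsity levels as they appear in the registry keys above.
LEVELS = [12, 16, 21, 27, 35, 46, 59, 77, 100]
DATASETS = ['rgb', 'contours'] + [
    f'{kind}-{level}' for kind in ('phosphenes', 'segments') for level in LEVELS
]


def make_benchmark(kind: str, dataset: str) -> Tuple[str, str]:
    # Hypothetical stand-in for constructing a real benchmark object.
    return (kind, dataset)


def build_registry() -> Dict[str, Callable[[], Tuple[str, str]]]:
    registry: Dict[str, Callable[[], Tuple[str, str]]] = {}
    for dataset in DATASETS:
        # Default arguments capture the current loop value (avoids late binding).
        registry[f'Scialom2024_{dataset}BehavioralAccuracyDistance'] = \
            lambda dataset=dataset: make_benchmark('accuracy_distance', dataset)
        registry[f'Scialom2024_{dataset}EngineeringAccuracy'] = \
            lambda dataset=dataset: make_benchmark('engineering_accuracy', dataset)
    for dataset in ('phosphenes-all', 'segments-all'):
        registry[f'Scialom2024_{dataset}BehavioralErrorConsistency'] = \
            lambda dataset=dataset: make_benchmark('error_consistency', dataset)
    return registry


registry = build_registry()
```

The explicit one-line-per-benchmark form used in the PR is easier to grep for a given identifier; the loop form trades that for less repetition.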
97 changes: 97 additions & 0 deletions brainscore_vision/benchmarks/scialom2024/benchmark.py
@@ -0,0 +1,97 @@
import numpy as np

import brainscore_vision
from brainio.assemblies import BehavioralAssembly
from brainscore_vision import load_dataset, load_metric
from brainscore_vision.benchmark_helpers.screen import place_on_screen
from brainscore_vision.benchmarks import BenchmarkBase
from brainscore_vision.metrics import Score
from brainscore_vision.model_interface import BrainModel
from brainscore_vision.utils import LazyLoad

BIBTEX = "" # to appear in a future article


class _Scialom2024BehavioralErrorConsistency(BenchmarkBase):
    def __init__(self, dataset):
        self._metric = load_metric('error_consistency')
        self._assembly = LazyLoad(lambda: load_assembly(dataset))
        self._visual_degrees = 8
        self._number_of_trials = 1

        super(_Scialom2024BehavioralErrorConsistency, self).__init__(
            identifier=f'Scialom2024_{dataset}-error_consistency', version=1,
            ceiling_func=lambda: self._metric.ceiling(self._assembly),
            parent='Scialom2024',
            bibtex=BIBTEX)

    def __call__(self, candidate: BrainModel):
        choice_labels = set(self._assembly['truth'].values)
        choice_labels = list(sorted(choice_labels))
        candidate.start_task(BrainModel.Task.label, choice_labels)
        stimulus_set = place_on_screen(self._assembly.stimulus_set, target_visual_degrees=candidate.visual_degrees(),
                                       source_visual_degrees=self._visual_degrees)
        labels = candidate.look_at(stimulus_set, number_of_trials=self._number_of_trials)
        raw_score = self._metric(labels, self._assembly)
        ceiling = self.ceiling
        score = raw_score / ceiling
        score.attrs['raw'] = raw_score
        score.attrs['ceiling'] = ceiling
        return score


class _Scialom2024BehavioralAccuracyDistance(BenchmarkBase):
    # behavioral benchmark
    def __init__(self, dataset):
        self._metric = load_metric('accuracy_distance')
        self._assembly = LazyLoad(lambda: load_assembly(dataset))
        super(_Scialom2024BehavioralAccuracyDistance, self).__init__(
            identifier=f'Scialom2024_{dataset}-behavioral_accuracy', version=1,
            ceiling_func=lambda: self._metric.ceiling(self._assembly),
            parent='Scialom2024',
            bibtex=BIBTEX)

    def __call__(self, candidate: BrainModel):
        choice_labels = set(self._assembly.stimulus_set['truth'].values)
        choice_labels = list(sorted(choice_labels))
        candidate.start_task(BrainModel.Task.label, choice_labels)
        labels = candidate.look_at(self._assembly.stimulus_set, number_of_trials=1)
        raw_score = self._metric(labels, target=self._assembly)
        ceiling = self.ceiling
        score = raw_score / ceiling
        # ensure score <= 1.0
        if score.values > 1:
            score = Score(np.array(1.))
        score.attrs['raw'] = raw_score
        score.attrs['ceiling'] = ceiling
        return score


class _Scialom2024EngineeringAccuracy(BenchmarkBase):
    # engineering/ML benchmark
    def __init__(self, dataset):
        self._metric = load_metric('accuracy')
        # no lazyload because needed in candidate.look_at()
        self._stimulus_set = load_assembly(dataset).stimulus_set
        super(_Scialom2024EngineeringAccuracy, self).__init__(
            identifier=f'Scialom2024_{dataset}-engineering_accuracy', version=1,
            ceiling_func=lambda: Score(1),
            parent='Scialom2024-top1',
            bibtex=BIBTEX)

    def __call__(self, candidate: BrainModel):
        choice_labels = set(self._stimulus_set['truth'].values)
        choice_labels = list(sorted(choice_labels))
        candidate.start_task(BrainModel.Task.label, choice_labels)
        labels = candidate.look_at(self._stimulus_set, number_of_trials=10)
        raw_score = self._metric(labels, target=self._stimulus_set['truth'].values)
        ceiling = Score(np.array(1.))
        score = raw_score / ceiling
        score.attrs['raw'] = raw_score
        score.attrs['ceiling'] = ceiling
        return score


def load_assembly(dataset: str) -> BehavioralAssembly:
    assembly = brainscore_vision.load_dataset(f'Scialom2024_{dataset}')
    return assembly
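All three benchmark classes above report a raw score divided by a ceiling, and the accuracy-distance benchmark additionally caps the result at 1.0. A minimal sketch of that arithmetic with plain floats follows. It assumes nothing about the `Score` class, and `ceiled_score` is a hypothetical helper name that does not appear in the PR.

```python
def ceiled_score(raw: float, ceiling: float, clip: bool = True) -> dict:
    """Normalize a raw score by its ceiling, optionally capping at 1.0."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    score = raw / ceiling
    if clip:
        # Mirrors the "ensure score <= 1.0" step in the accuracy-distance benchmark.
        score = min(score, 1.0)
    # Keep the inputs alongside the result, like the 'raw' and 'ceiling' attrs.
    return {'score': score, 'raw': raw, 'ceiling': ceiling}


result = ceiled_score(raw=0.5, ceiling=0.8)
capped = ceiled_score(raw=0.9, ceiling=0.8)
```

Because the ceilings here are at most 1, the normalized score is never lower than the raw score, which is the property the tests assert with `score >= raw_score`.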
158 changes: 158 additions & 0 deletions brainscore_vision/benchmarks/scialom2024/test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,158 @@
from pathlib import Path

import numpy as np
import pytest
from pytest import approx

from brainio.assemblies import BehavioralAssembly
from brainscore_vision import benchmark_registry, load_benchmark
from brainscore_vision.benchmark_helpers import PrecomputedFeatures
from brainscore_vision.data_helpers import s3


@pytest.mark.parametrize('benchmark', [
    'Scialom2024_rgbBehavioralAccuracyDistance',
    'Scialom2024_contoursBehavioralAccuracyDistance',
    'Scialom2024_phosphenes-12BehavioralAccuracyDistance',
    'Scialom2024_phosphenes-16BehavioralAccuracyDistance',
    'Scialom2024_phosphenes-21BehavioralAccuracyDistance',
    'Scialom2024_phosphenes-27BehavioralAccuracyDistance',
    'Scialom2024_phosphenes-35BehavioralAccuracyDistance',
    'Scialom2024_phosphenes-46BehavioralAccuracyDistance',
    'Scialom2024_phosphenes-59BehavioralAccuracyDistance',
    'Scialom2024_phosphenes-77BehavioralAccuracyDistance',
    'Scialom2024_phosphenes-100BehavioralAccuracyDistance',
    'Scialom2024_segments-12BehavioralAccuracyDistance',
    'Scialom2024_segments-16BehavioralAccuracyDistance',
    'Scialom2024_segments-21BehavioralAccuracyDistance',
    'Scialom2024_segments-27BehavioralAccuracyDistance',
    'Scialom2024_segments-35BehavioralAccuracyDistance',
    'Scialom2024_segments-46BehavioralAccuracyDistance',
    'Scialom2024_segments-59BehavioralAccuracyDistance',
    'Scialom2024_segments-77BehavioralAccuracyDistance',
    'Scialom2024_segments-100BehavioralAccuracyDistance',
    'Scialom2024_phosphenes-allBehavioralErrorConsistency',
    'Scialom2024_segments-allBehavioralErrorConsistency'
])
def test_benchmark_registry(benchmark):
    assert benchmark in benchmark_registry


class TestBehavioral:
    @pytest.mark.private_access
    @pytest.mark.parametrize('dataset, expected_ceiling', [
        ('rgb', approx(0.98513, abs=0.001)),
        ('contours', approx(0.97848, abs=0.001)),
        ('phosphenes-12', approx(0.95416, abs=0.001)),
        ('phosphenes-16', approx(0.92583, abs=0.001)),
        ('phosphenes-21', approx(0.92166, abs=0.001)),
        ('phosphenes-27', approx(0.86888, abs=0.001)),
        ('phosphenes-35', approx(0.87277, abs=0.001)),
        ('phosphenes-46', approx(0.87125, abs=0.001)),
        ('phosphenes-59', approx(0.87625, abs=0.001)),
        ('phosphenes-77', approx(0.89277, abs=0.001)),
        ('phosphenes-100', approx(0.89930, abs=0.001)),
        ('segments-12', approx(0.89847, abs=0.001)),
        ('segments-16', approx(0.89055, abs=0.001)),
        ('segments-21', approx(0.88083, abs=0.001)),
        ('segments-27', approx(0.87083, abs=0.001)),
        ('segments-35', approx(0.86333, abs=0.001)),
        ('segments-46', approx(0.90250, abs=0.001)),
        ('segments-59', approx(0.87847, abs=0.001)),
        ('segments-77', approx(0.89013, abs=0.001)),
        ('segments-100', approx(0.93236, abs=0.001)),  # all of the above are AccuracyDistance
        ('phosphenes-all', approx(0.45755, abs=0.01)),  # alls are ErrorConsistency
        ('segments-all', approx(0.42529, abs=0.01)),
    ])
    def test_dataset_ceiling(self, dataset, expected_ceiling):
        if 'all' in dataset:
            benchmark = f"Scialom2024_{dataset}BehavioralErrorConsistency"
        else:
            benchmark = f"Scialom2024_{dataset}BehavioralAccuracyDistance"
        benchmark = load_benchmark(benchmark)
        ceiling = benchmark.ceiling
        assert ceiling == expected_ceiling

    @pytest.mark.private_access
    @pytest.mark.parametrize('dataset, expected_raw_score', [
        ('rgb', approx(0.92666, abs=0.001)),
        ('contours', approx(0.26708, abs=0.001)),
        ('phosphenes-12', approx(0.87666, abs=0.001)),
        ('phosphenes-16', approx(0.83666, abs=0.001)),
        ('phosphenes-21', approx(0.83166, abs=0.001)),
        ('phosphenes-27', approx(0.73666, abs=0.001)),
        ('phosphenes-35', approx(0.72416, abs=0.001)),
        ('phosphenes-46', approx(0.59500, abs=0.001)),
        ('phosphenes-59', approx(0.50666, abs=0.001)),
        ('phosphenes-77', approx(0.42083, abs=0.001)),
        ('phosphenes-100', approx(0.33166, abs=0.001)),
        ('segments-12', approx(0.81500, abs=0.001)),
        ('segments-16', approx(0.73750, abs=0.001)),
        ('segments-21', approx(0.69666, abs=0.001)),
        ('segments-27', approx(0.59500, abs=0.001)),
        ('segments-35', approx(0.52666, abs=0.001)),
        ('segments-46', approx(0.42166, abs=0.001)),
        ('segments-59', approx(0.34583, abs=0.001)),
        ('segments-77', approx(0.28916, abs=0.001)),
        ('segments-100', approx(0.19750, abs=0.001)),  # all of the above are AccuracyDistance
        ('phosphenes-all', approx(0.18057, abs=0.01)),  # alls are ErrorConsistency
        ('segments-all', approx(0.15181, abs=0.01)),
    ])
    def test_model_8_degrees(self, dataset, expected_raw_score):
        if 'all' in dataset:
            benchmark = f"Scialom2024_{dataset}BehavioralErrorConsistency"
        else:
            benchmark = f"Scialom2024_{dataset}BehavioralAccuracyDistance"
        benchmark = load_benchmark(benchmark)
        nc_filename = dataset.upper() if dataset == 'rgb' else dataset
        filename = f"resnet50_julios_Scialom2024_{nc_filename}.nc"
        precomputed_features = Path(__file__).parent / filename
        s3.download_file_if_not_exists(precomputed_features,
                                       bucket='brainscore-vision', remote_filepath=f'benchmarks/Scialom2024/{filename}')
        precomputed_features = BehavioralAssembly.from_files(file_path=precomputed_features)
        precomputed_features = PrecomputedFeatures(precomputed_features, visual_degrees=8)
        score = benchmark(precomputed_features)
        raw_score = score.raw
        # division by ceiling <= 1 should result in higher score
        assert score >= raw_score
        assert raw_score == expected_raw_score


class TestEngineering:
    @pytest.mark.private_access
    @pytest.mark.parametrize('dataset, expected_accuracy', [
        ('rgb', approx(0.91666, abs=0.001)),
        ('contours', approx(0.25000, abs=0.001)),
        ('phosphenes-12', approx(0.08333, abs=0.001)),
        ('phosphenes-16', approx(0.08333, abs=0.001)),
        ('phosphenes-21', approx(0.08333, abs=0.001)),
        ('phosphenes-27', approx(0.08333, abs=0.001)),
        ('phosphenes-35', approx(0.12500, abs=0.001)),
        ('phosphenes-46', approx(0.08333, abs=0.001)),
        ('phosphenes-59', approx(0.08333, abs=0.001)),
        ('phosphenes-77', approx(0.08333, abs=0.001)),
        ('phosphenes-100', approx(0.08333, abs=0.001)),
        ('segments-12', approx(0.08333, abs=0.001)),
        ('segments-16', approx(0.08333, abs=0.001)),
        ('segments-21', approx(0.08333, abs=0.001)),
        ('segments-27', approx(0.08333, abs=0.001)),
        ('segments-35', approx(0.08333, abs=0.001)),
        ('segments-46', approx(0.08333, abs=0.001)),
        ('segments-59', approx(0.08333, abs=0.001)),
        ('segments-77', approx(0.10416, abs=0.001)),
        ('segments-100', approx(0.10416, abs=0.001))
    ])
    def test_accuracy(self, dataset, expected_accuracy):
        benchmark = load_benchmark(f"Scialom2024_{dataset}EngineeringAccuracy")
        nc_filename = dataset.upper() if dataset == 'rgb' else dataset
        filename = f"resnet50_julios_Scialom2024_{nc_filename}.nc"
        precomputed_features = Path(__file__).parent / filename
        s3.download_file_if_not_exists(precomputed_features,
                                       bucket='brainio-brainscore', remote_filepath=f'/benchmarks/Scialom2024/{filename}')
        precomputed_features = BehavioralAssembly.from_files(file_path=precomputed_features)
        precomputed_features = PrecomputedFeatures(precomputed_features, visual_degrees=8)
        score = benchmark(precomputed_features)
        raw_score = score.raw
        # division by ceiling <= 1 should result in higher score
        assert score >= raw_score
        assert raw_score == expected_accuracy
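The behavioral tests above derive the benchmark identifier and the precomputed-features filename from the dataset name, then compare scores within an absolute tolerance. The sketch below restates that logic using only the standard library. `benchmark_name`, `features_filename`, and `close_enough` are hypothetical helper names, and `math.isclose` with `rel_tol=0.0` is used as a stand-in for `approx(..., abs=...)`, not as the comparison the tests actually run.

```python
import math


def benchmark_name(dataset: str) -> str:
    # 'phosphenes-all' and 'segments-all' use ErrorConsistency; the rest use AccuracyDistance.
    suffix = 'BehavioralErrorConsistency' if 'all' in dataset else 'BehavioralAccuracyDistance'
    return f'Scialom2024_{dataset}{suffix}'


def features_filename(dataset: str) -> str:
    # Only the 'rgb' dataset is upper-cased in the stored filename.
    nc_filename = dataset.upper() if dataset == 'rgb' else dataset
    return f'resnet50_julios_Scialom2024_{nc_filename}.nc'


def close_enough(actual: float, expected: float, abs_tol: float = 0.001) -> bool:
    # Purely absolute tolerance: the relative tolerance is switched off.
    return math.isclose(actual, expected, rel_tol=0.0, abs_tol=abs_tol)


name = benchmark_name('segments-all')
filename = features_filename('rgb')
```

Factoring these two derivations into helpers would let both test classes share them instead of repeating the filename logic.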