When uploading StimulusSets, `stimulus_id` has to be coded as a string; otherwise zip packaging of the `StimulusSet` fails.

If the `StimulusSet['stimulus_id']` field is a string made up of, e.g., only digit characters, the string datatype is not respected when the `StimulusSet` is saved as a `.csv` and loaded back from S3: any values that consist only of digits are loaded with an inferred type (e.g. integers) rather than as the strings that were saved. `DataAssembly` objects, by contrast, do respect data types when they are loaded.

When brain-score merges the `StimulusSet` into the `DataAssembly` along the `stimulus_id` dim while loading the `DataAssembly`, this mismatch makes errors pop up. While `stimulus_id` needs to be a string in the `StimulusSet` for the `StimulusSet` to be uploaded, it also needs to be a CSV-inferrable type (rather than a string) in the `DataAssembly` for the merge of the two to succeed when the `DataAssembly` is loaded.
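The failure chain described above can be reproduced with plain pandas. This is a minimal sketch under assumptions: an in-memory CSV stands in for the actual S3 upload/download, and a pandas `merge` stands in for the `StimulusSet`/`DataAssembly` merge that brain-score performs (which goes through xarray, not necessarily this exact call). The `csv_roundtrip` helper and the `condition`/`response` columns are hypothetical.

```python
import io

import pandas as pd


def csv_roundtrip(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: write a frame to CSV and read it back (stands in for S3)."""
    buffer = io.StringIO()
    frame.to_csv(buffer, index=False)
    buffer.seek(0)
    return pd.read_csv(buffer)  # column dtypes are re-inferred from the text


# Toy StimulusSet whose stimulus_id values are digit-only strings
stimulus_set = pd.DataFrame({"stimulus_id": ["001", "002", "10"],
                             "condition": ["a", "b", "c"]})
reloaded = csv_roundtrip(stimulus_set)
print(reloaded["stimulus_id"].dtype)     # int64
print(reloaded["stimulus_id"].tolist())  # [1, 2, 10] (leading zeros are gone too)

# Toy assembly coordinates that kept stimulus_id as str
assembly_coords = pd.DataFrame({"stimulus_id": ["001", "002", "10"],
                                "response": [0.1, 0.2, 0.3]})
merge_error = None
try:
    assembly_coords.merge(reloaded, on="stimulus_id")
except ValueError as err:
    # pandas refuses to merge a string-valued key with an int64 key
    merge_error = err
    print("merge failed:", err)

# Workaround from the text: give the assembly side the type the CSV loader infers
assembly_coords["stimulus_id"] = assembly_coords["stimulus_id"].astype("int64")
merged = assembly_coords.merge(reloaded, on="stimulus_id")
print(len(merged))  # 3
```

Note that the workaround only works because every `stimulus_id` here parses as an integer; it says nothing about columns whose values mix digit-only and non-numeric strings.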
This issue is also present for fields other than `stimulus_id`: `string` types are saved as `.csv`, and the data types of values are then inferred on a value-by-value basis. If a column of the `StimulusSet` contains values where some could be interpreted as strings and others as integers (e.g., `'condition' = {'100', '35', 'contours', 'RGB'}`), these are inferred differently, resulting in a mix of strings and integers in the `StimulusSet` after loading from S3. This breaks any test that checks the integrity of the data.

Unlike the `stimulus_id` case above, this does not seem to be fixable by enforcing data types on the `DataAssembly`, since DataArrays don't seem to allow mixed types. The two most reasonable workarounds seem to be either to code such values explicitly as non-numeric strings (e.g., `'condition' = {'100a', '35a', 'contours', 'RGB'}` instead of `'condition' = {'100', '35', 'contours', 'RGB'}`), or to enforce the data types after loading.

I would suggest either saving the `StimulusSet` in a data format that respects data types (e.g. xarray netCDF4 instead of `.csv`), or adding more descriptive error messages when the errors above occur.
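Until the storage format changes, the "enforce the data types after loading" workaround can be applied at parse time rather than after the fact. The sketch below assumes the `StimulusSet` CSV is read with `pandas.read_csv`; the inline CSV text and the `load_stimulus_csv` helper are hypothetical. Passing `dtype=str` for the affected columns stops pandas from inferring numeric types for them.

```python
import io

import pandas as pd

# Hypothetical CSV content as it might come back from S3
CSV_TEXT = (
    "stimulus_id,condition\n"
    "001,100\n"
    "002,35\n"
    "003,contours\n"
    "004,RGB\n"
)


def load_stimulus_csv(text: str, string_columns=("stimulus_id", "condition")) -> pd.DataFrame:
    """Hypothetical loader: parse the listed columns as str instead of inferring their types."""
    return pd.read_csv(io.StringIO(text), dtype={column: str for column in string_columns})


stimuli = load_stimulus_csv(CSV_TEXT)
print(stimuli["stimulus_id"].tolist())  # ['001', '002', '003', '004']
print(stimuli["condition"].tolist())    # ['100', '35', 'contours', 'RGB']

# Casting *after* default inference is not equivalent: the leading zeros are already lost
late_cast = pd.read_csv(io.StringIO(CSV_TEXT))["stimulus_id"].astype(str)
print(late_cast.tolist())  # ['1', '2', '3', '4']
```

Declaring the types at parse time is the safer variant of this workaround: once default inference has run, `'001'` has already become `1`, and no later cast can restore the original string.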