Description
Describe the bug
When using the MetricsSaver handler, the list of filenames is first joined into a single string (see here) and then sent on to write the metric report.
In distributed mode, the filenames are joined by directly concatenating them with the configured delimiter (see here), and the resulting strings must first be gathered across ranks via ignite's all_gather function. However, this function has a limitation: if the string to be gathered is longer than 1024 characters, it is truncated and only the first 1024 characters are kept (the ignite source code for this function is here).
Therefore, if the filename string is truncated, splitting it back on the delimiter yields fewer (or corrupted) filenames, and the number of metrics no longer matches the number of filenames.
A joined string longer than 1024 characters is common. For instance, one filename in my current dataset is:
'/workspace/data/medical/Task04_Hippocampus/imagesTr/hippocampus_033.nii.gz'
It is 74 characters long, so 14 or more such filenames already exceed the limit (14 x 74 = 1036 > 1024, even before counting delimiters) and trigger the bug.
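The effect can be reproduced without a distributed setup. The following is only a minimal sketch: it does not call ignite, it simulates the truncation with a plain string slice at 1024 characters, and the tab delimiter, the helper name simulate_truncated_gather and the 20 generated filenames are assumptions made for illustration.

```python
MAX_LEN = 1024    # truncation limit described in this issue
DELIMITER = "\t"  # assumed delimiter, chosen only for this illustration


def simulate_truncated_gather(filenames, delimiter=DELIMITER, max_len=MAX_LEN):
    """Join filenames, cut the joined string at max_len, then split it again.

    The slice stands in for the truncation that happens when a long string
    is gathered; no real distributed call is made here.
    """
    joined = delimiter.join(filenames)
    truncated = joined[:max_len]
    return truncated.split(delimiter)


# Hypothetical filenames with the same length as the example path (74 chars).
filenames = [
    f"/workspace/data/medical/Task04_Hippocampus/imagesTr/hippocampus_{i:03d}.nii.gz"
    for i in range(20)
]

recovered = simulate_truncated_gather(filenames)

print(len(filenames[0]))                  # 74
print(len(DELIMITER.join(filenames)))     # 1499, well above 1024
print(len(filenames), len(recovered))     # 20 filenames in, 14 entries out
print(recovered[-1] == filenames[13])     # False: the last entry is cut off
```

With 20 filenames, only 13 come back intact, the 14th is cut off mid-path, and the remaining 6 are lost, so any per-file metric array of length 20 no longer lines up with the recovered names.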
Hi @Nic-Ma @wyli, we may need to change how the metrics are saved and avoid gathering strings (tensor and float types do not have this limitation, see here). What do you think?