Potential bug in mm_utils.py process_image function #54

hubenjm · 2024-05-09T22:34:43Z

When data_args.image_aspect_ratio = 'resize', it seems that mm_utils.process_image returns the image as a PIL.Image.Image data type, which has no shape attribute. See https://github.com/Efficient-Large-Model/VILA/blob/main/llava/mm_utils.py#L168

During stage 1 alignment training, we use the datasets.LazySupervisedDataset class, whose __getitem__ method tries to call image.shape here: https://github.com/Efficient-Large-Model/VILA/blob/main/llava/data/dataset.py#L834

This crashes training. Should we simply add the line
image = processor.preprocess(image, return_tensors="pt")["pixel_values"][0]
below line 168 of mm_utils.py (https://github.com/Efficient-Large-Model/VILA/blob/main/llava/mm_utils.py#L168)?
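To make the failure mode and the proposed fix concrete, here is a minimal sketch that runs without the VILA repo, PIL, or torch. Everything named Stub* is a hypothetical stand-in: StubImage mimics a PIL image (it has .size but no .shape), StubTensor mimics a tensor, and StubProcessor only reproduces the preprocess(...)["pixel_values"][0] access pattern from the line above. The function process_image_resize and its resize-to-crop_size step are assumptions about what the 'resize' branch does, not a copy of the actual code.

```python
from dataclasses import dataclass


@dataclass
class StubImage:
    """Hypothetical stand-in for PIL.Image.Image: has .size but no .shape."""
    size: tuple  # (width, height)

    def resize(self, size):
        return StubImage(size)


@dataclass
class StubTensor:
    """Hypothetical stand-in for a tensor: exposes .shape."""
    shape: tuple  # (channels, height, width)


class StubProcessor:
    """Hypothetical stand-in for the image processor. It only mimics the
    preprocess(...)["pixel_values"][0] access pattern quoted in the issue."""
    crop_size = {"height": 336, "width": 336}  # assumed value for illustration

    def preprocess(self, image, return_tensors="pt"):
        width, height = image.size
        return {"pixel_values": [StubTensor((3, height, width))]}


def process_image_resize(image, processor, apply_fix=True):
    """Sketch of the 'resize' branch, with and without the proposed line."""
    crop = processor.crop_size
    image = image.resize((crop["width"], crop["height"]))
    if apply_fix:
        # The line proposed in the issue: convert to a tensor before returning.
        image = processor.preprocess(image, return_tensors="pt")["pixel_values"][0]
    return image


raw = StubImage(size=(640, 480))
fixed = process_image_resize(raw, StubProcessor())
unfixed = process_image_resize(raw, StubProcessor(), apply_fix=False)

print(fixed.shape)                # a downstream image.shape access now works
print(hasattr(unfixed, "shape"))  # False: this is the case that crashes
```

With apply_fix=False the function returns the image-like object unchanged, so any later image.shape access raises AttributeError, which matches the crash described above.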


Efficient-Large-Language-Model · 2024-05-10T15:18:33Z

Seems valid. We will verify on our end and make the changes.


gheinrich pushed a commit to gheinrich/VILA that referenced this issue Dec 16, 2024

Merge pull request NVlabs#54 from Efficient-Large-Model/dev/radio

7d73a25

suppport bug free radio encoder
