When data_args.image_aspect_ratio = 'resize', mm_utils.process_image appears to return the image as a PIL.Image.Image, which has no shape attribute. See https://github.com/Efficient-Large-Model/VILA/blob/main/llava/mm_utils.py#L168
During stage 1 alignment training, we use the datasets.LazySupervisedDataset class, whose get_item function calls image.shape here: https://github.com/Efficient-Large-Model/VILA/blob/main/llava/data/dataset.py#L834
Because a PIL image has no shape attribute, that call raises an AttributeError and training crashes. Should we simply add the line image = processor.preprocess(image, return_tensors="pt")["pixel_values"][0] below line 168 of mm_utils.py (https://github.com/Efficient-Large-Model/VILA/blob/main/llava/mm_utils.py#L168)?
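To make the proposal concrete, here is a minimal sketch of the guard I have in mind. It is not the actual VILA code: ensure_pixel_values is a hypothetical helper name, and StubImage, StubTensor, and StubProcessor are stand-ins I made up so the snippet runs without PIL, torch, or transformers. The only call taken from the real code path is processor.preprocess(image, return_tensors="pt")["pixel_values"][0].

```python
from typing import Any


def ensure_pixel_values(image: Any, processor: Any) -> Any:
    """Hypothetical helper: return an object that has `.shape`.

    If `image` already exposes `.shape` (tensor or array), it is returned
    unchanged. Otherwise it is assumed to be PIL-like and is passed through
    the processor, as proposed in this issue.
    """
    if hasattr(image, "shape"):
        return image
    return processor.preprocess(image, return_tensors="pt")["pixel_values"][0]


# --- Stand-ins (assumptions, not real library classes) -------------------
class StubImage:
    """Mimics PIL.Image.Image: has `size` (width, height) but no `shape`."""

    size = (336, 336)


class StubTensor:
    """Mimics a tensor only to the extent of carrying a `shape` tuple."""

    def __init__(self, shape):
        self.shape = shape


class StubProcessor:
    """Mimics the image processor's preprocess() return structure."""

    def preprocess(self, image, return_tensors="pt"):
        width, height = image.size
        return {"pixel_values": [StubTensor((3, height, width))]}


processor = StubProcessor()
pixel_values = ensure_pixel_values(StubImage(), processor)

# Calling the helper again on an already-processed value is a no-op.
same = ensure_pixel_values(pixel_values, processor)
```

With a guard like this, the image.shape access in get_item would see a tensor regardless of which image_aspect_ratio branch ran, assuming the other branches already return tensors.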
Seems valid, we will verify on our end and make the changes.
Merge pull request NVlabs#54 from Efficient-Large-Model/dev/radio
7d73a25
suppport bug free radio encoder