[Llava-video] Wrong data augmentation for video data #318

Open
HYUNJS opened this issue Oct 21, 2024 · 0 comments

HYUNJS commented Oct 21, 2024

While training LLaVA-Video, I noticed an inconsistency in how video data augmentation is handled.

Standard video data augmentation typically first resizes the frames while preserving the original aspect ratio, then applies random cropping during training and center cropping (or multi-cropping) during testing.
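
A minimal sketch of that standard recipe, working on crop geometry only. The helper names and the 384-pixel target are assumptions for illustration; a real pipeline would perform the actual resizing and cropping with a library such as torchvision. Note that a single crop window is sampled per clip and reused for every frame, so the crop stays temporally consistent.

```python
import random
from typing import Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right)


def resize_shorter_side(height: int, width: int, target: int) -> Tuple[int, int]:
    """Size after scaling so the shorter side equals `target` (aspect ratio kept)."""
    scale = target / min(height, width)
    return round(height * scale), round(width * scale)


def random_crop_box(height: int, width: int, crop: int, rng: random.Random) -> Box:
    """Training: sample one crop window; reuse it for every frame of the clip."""
    top = rng.randint(0, height - crop)   # randint is inclusive on both ends
    left = rng.randint(0, width - crop)
    return top, left, top + crop, left + crop


def center_crop_box(height: int, width: int, crop: int) -> Box:
    """Testing: deterministic crop window centered in the frame."""
    top = (height - crop) // 2
    left = (width - crop) // 2
    return top, left, top + crop, left + crop


if __name__ == "__main__":
    h, w = resize_shorter_side(720, 1280, 384)
    print((h, w))                        # (384, 683)
    print(center_crop_box(h, w, 384))    # (0, 149, 384, 533)
    print(random_crop_box(h, w, 384, random.Random(0)))
```

Because the resize keeps the 16:9 ratio, the 384x384 crop discards some horizontal context instead of squashing it.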

However, the current LLaVA-Video code deviates from this approach. Instead of preserving the aspect ratio, it resizes each frame directly from its original resolution (720x1280) to a fixed square (384x384), reusing the image-processing logic of the SigLIP image processor. This distorts the aspect ratio of every frame.
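
The size of the distortion follows from simple arithmetic: a direct resize scales the two axes by different factors. The helper below is illustrative only, and it assumes 720x1280 means height x width.

```python
from typing import Tuple


def axis_scales(src_hw: Tuple[int, int], dst_hw: Tuple[int, int]) -> Tuple[float, float]:
    """Per-axis scale factors (vertical, horizontal) of a direct, non-uniform resize."""
    (src_h, src_w), (dst_h, dst_w) = src_hw, dst_hw
    return dst_h / src_h, dst_w / src_w


sy, sx = axis_scales((720, 1280), (384, 384))
# sy = 384/720 (about 0.533) and sx = 384/1280 = 0.3: width shrinks much more than height.
squeeze = sx / sy  # = 720/1280 = 0.5625, so objects look about 44% narrower relative to their height
print(f"vertical x{sy:.3f}, horizontal x{sx:.3f}, width/height distortion {squeeze:.4f}")
```

An aspect-preserving resize would use the same factor on both axes, giving a distortion ratio of exactly 1.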

SigLIP image processor:

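```python
from functools import partial, reduce
from transformers.image_transforms import convert_to_rgb, normalize, rescale, resize, to_channel_dimension_format
from transformers.image_utils import to_numpy_array
# Excerpt from the SigLIP image processor (`self` is the processor instance).
transforms = [
    convert_to_rgb,
    to_numpy_array,
    # self.size is a fixed square (384x384), so the original aspect ratio is not preserved.
    partial(resize, size=self.size, resample=self.resample, data_format=self.data_format),
    partial(rescale, scale=self.rescale_factor, data_format=self.data_format),
    partial(normalize, mean=self.image_mean, std=self.image_std, data_format=self.data_format),
    partial(to_channel_dimension_format, channel_dim=self.data_format, input_channel_dim=self.data_format),
]
# Apply each transform, in order, to every image in the list.
images = reduce(lambda x, f: [*map(f, x)], transforms, images)
```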
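
The `reduce(lambda x, f: [*map(f, x)], transforms, images)` line folds over the transform list, mapping each transform over every image before moving to the next. The toy sketch below shows the idiom, with integers standing in for frames and the hypothetical `add`/`mul` standing in for the real transforms. If a video's frames are passed in as this list, each frame is processed as an independent image: there is no step that preserves the aspect ratio, and no crop parameters are shared across frames.

```python
from functools import partial, reduce


def add(x: int, k: int) -> int:
    return x + k


def mul(x: int, k: int) -> int:
    return x * k


# Stand-ins for resize/rescale/normalize; each one is applied to every element.
transforms = [partial(add, k=1), partial(mul, k=10)]
images = [1, 2, 3]  # stand-ins for individual frames

# Same idiom as the processor: for each transform in order, map it over all images.
out = reduce(lambda xs, f: [*map(f, xs)], transforms, images)
print(out)  # [20, 30, 40]
```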

By contrast, image data is preprocessed correctly: the code applies whichever augmentation variant the config specifies.
Image preprocessing code:

Am I understanding your codebase correctly? Please let me know if I have misunderstood anything.
