There's probably a typo here:
```python
from torchtune.models.llama3_2_vision import llama3_2_vision_transform
from torchtune.datasets.multimodal import multimodal_chat_dataset

transform = Llama3VisionTransform(
    path="/tmp/Meta-Llama-3-8B-Instruct/original/tokenizer.model",
    prompt_template="torchtune.data.QuestionAnswerTemplate",
    max_seq_len=8192,
    image_size=560,
)
ds = multimodal_chat_dataset(
    model_transform=model_transform,
    source="json",
    data_files="data/my_data.json",
    column_map={
        "dialogue": "conversations",
        "image_path": "image",
    },
    image_dir="/home/user/dataset/",  # /home/user/dataset/images/clock.jpg
    image_tag="<image>",
    split="train",
)
tokenized_dict = ds[0]
print(transform.decode(tokenized_dict["tokens"], skip_special_tokens=False))
# '<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nQuestion:<|image|>What time is it on the clock?Answer:<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nIt is 10:00AM.<|eot_id|>'
print(tokenized_dict["encoder_input"]["images"][0].shape)
# (num_tiles, num_channels, tile_height, tile_width)
# torch.Size([4, 3, 224, 224])
```
Shouldn't it be just `transform`, not `model_transform`?
Ah yeah, good catch. We should use either `transform` or `model_transform` consistently across the example.
If you're able to fix it with a quick PR, happy to stamp it. Otherwise, I can address it.
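For anyone hitting this: as written, the example raises a `NameError` at the `multimodal_chat_dataset(...)` call, because `model_transform` is never defined — only `transform` is. A minimal self-contained sketch of the failure mode and the fix (plain Python stand-ins, no torchtune; `multimodal_chat_dataset` here is a dummy that just records its arguments):

```python
def multimodal_chat_dataset(model_transform, source):
    # Stand-in for the real torchtune factory: records its inputs.
    return {"model_transform": model_transform, "source": source}

# Stand-in for the Llama3VisionTransform instance in the example.
transform = "Llama3VisionTransform instance"

# Buggy version from the docs: the name `model_transform` was never
# defined, so referencing it raises NameError.
try:
    ds = multimodal_chat_dataset(model_transform=model_transform, source="json")
except NameError as e:
    print(f"NameError: {e}")

# Fixed version: pass the variable that was actually defined above.
ds = multimodal_chat_dataset(model_transform=transform, source="json")
print(ds["model_transform"])
```

Either renaming the variable to `model_transform` at definition time or passing `model_transform=transform` would make the example consistent.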