🎒 I am pursuing my PhD on the topic of visual perception and reasoning in the open world.
🔭 I’m currently focusing on scene graph generation 🕸, vision language models 🧠, and embodied AI 🤖️.
🐙 Octopus: an embodied vision-language model trained with RLEF that performs strongly at embodied visual planning and programming.
Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23
Relate Anything Model takes an image as input and uses SAM to identify the corresponding masks within the image.
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)