Can the model do spatial perception, e.g. given a video of a building's room layout, take a destination from a description or input images and do path planning and navigation? If only short videos are provided, how much VRAM is needed, and can it run in Google Colab? #4

libai-lab · 2024-10-28T10:23:08Z

Can the model do spatial perception? For example, given a video showing the room layout of a building, could it take a destination specified by a text description or input images and then plan a path and navigate to it? Also, if I only provide short videos, how much VRAM is needed, and can it run in Google Colab?

shuyansy · 2024-10-28T12:06:38Z

Thanks for your interest. We have not tested on spatial perception video data, so you may want to run our demo to evaluate its spatial perception ability. The currently released weights can process 1024 frames on an 80 GB GPU; we will release another model that can understand 2048 frames.
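For anyone trying the demo on their own clips, here is a minimal sketch of how a video could be reduced to fit the 1024-frame limit mentioned above. It is not taken from the repository: the function name `sample_frame_indices` and the evenly spaced sampling strategy are assumptions for illustration, and the project's own preprocessing may work differently.

```python
def sample_frame_indices(total_frames: int, max_frames: int = 1024) -> list[int]:
    """Pick at most max_frames evenly spaced frame indices from a video.

    Hypothetical helper: the 1024 default mirrors the frame count quoted
    in this thread, not a value read from the model's code.
    """
    if total_frames <= 0 or max_frames <= 0:
        return []
    # Short videos fit entirely, so keep every frame.
    if total_frames <= max_frames:
        return list(range(total_frames))
    # Otherwise split the video into max_frames equal-width bins
    # and take the frame at the midpoint of each bin.
    step = total_frames / max_frames
    return [int(step * i + step / 2) for i in range(max_frames)]


if __name__ == "__main__":
    # A 60-second clip at 30 fps has 1800 frames, which exceeds the budget.
    indices = sample_frame_indices(60 * 30)
    print(len(indices), indices[:3])
```

A short clip with fewer frames than the budget is passed through unchanged, so the sampling only matters for longer videos. How much VRAM a given frame count needs on smaller GPUs is not stated in this thread.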
