Can the model do spatial perception, e.g. take a video of a building's room layout and then plan a path to a destination given by a description or input images? For short videos only, how much VRAM is needed, and can it run in Google Colab?
#4
Open
libai-lab opened this issue
Oct 28, 2024
· 1 comment
Can spatial perception be achieved? For example, given a video showing the specific room layout of a building, could the model take a destination from a description or input images and perform path planning and navigation? If only short videos are provided, how much VRAM is needed, and can it be run in Google Colab?
Thanks for your interest. We have not tested the model on spatial perception video data, so you may want to run our demo to evaluate its spatial perception ability yourself. The currently released weights can process 1024 frames on an 80 GB GPU, and we will release another model that can understand 2048 frames.
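For anyone trying longer clips against the 1024-frame figure above, here is a minimal sketch of picking evenly spaced frame indices so the input stays within a frame budget. This is not taken from the repo: the function name `sample_frame_indices` and the uniform-sampling strategy are assumptions for illustration, and the demo may select frames differently.

```python
from typing import List


def sample_frame_indices(total_frames: int, max_frames: int = 1024) -> List[int]:
    """Return at most max_frames evenly spaced frame indices.

    Hypothetical helper: the default of 1024 mirrors the frame count
    mentioned in the reply above, not a value read from the model code.
    """
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        # Short video: keep every frame.
        return list(range(total_frames))
    # Integer arithmetic keeps indices exact and strictly below total_frames.
    return [(i * total_frames) // max_frames for i in range(max_frames)]


if __name__ == "__main__":
    indices = sample_frame_indices(3000)
    print(len(indices), indices[:5])
```

The returned indices can then be passed to whichever video reader you use to decode only those frames. Note that this only limits the frame count; it does not say how much VRAM a shorter input needs.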