You’re correct that the current 3D and 2D features carry no explicit coordinate information. Only the DINOv2 features might capture some local spatial relationships within the images. The strong performance on datasets like ScanRefer could be attributed to their detailed attribute descriptions of the target object. The model could be further improved by incorporating encoders that capture coordinate information.
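As a rough illustration of what incorporating coordinate information could look like, here is a minimal sketch that appends a sinusoidal encoding of each object's 3D center to its feature vector. This is not the repository's code: the function names, the `num_freqs` parameter, the choice of frequencies, and the use of box centers as the coordinate signal are all assumptions made for the example.

```python
import numpy as np

def sinusoidal_position_encoding(centers, num_freqs=4):
    """Encode (N, 3) object centers into (N, 3 * 2 * num_freqs) features.

    Hypothetical helper: the frequencies 1, 2, 4, ... are an illustrative choice.
    """
    centers = np.asarray(centers, dtype=np.float64)
    freqs = 2.0 ** np.arange(num_freqs)            # shape (num_freqs,)
    angles = centers[:, :, None] * freqs           # shape (N, 3, num_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(centers.shape[0], -1)       # shape (N, 3 * 2 * num_freqs)

def add_coordinates(object_feats, centers, num_freqs=4):
    """Concatenate per-object features with the encoded object centers."""
    pos = sinusoidal_position_encoding(centers, num_freqs)
    return np.concatenate([object_feats, pos], axis=-1)

# Placeholder features standing in for per-object encoder outputs.
feats = np.zeros((2, 8))
centers = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 0.5]])
fused = add_coordinates(feats, centers)
print(fused.shape)  # (2, 32)
```

Concatenation keeps the original features untouched, so the downstream projection layer can learn how much weight to give the position channels. Whether this helps on ScanRefer-style queries would need to be checked experimentally.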
In your implementation, the 3D features obtained from Uni3D and the 2D features obtained from DINOv2 drop the absolute coordinate information of objects.