Coarse-to-Fine Grained Text-based Video-moment Retrieval pipeline utilizing T-MASS and MESM models for efficient multi-stage text-video alignment.
machine-learning deep-learning video-processing tacos video-moment-retrieval text-video-retrieval t-mass mesm coarse-to-fine-alignment stochastic-embed stochastic-embedding tacos-cg
-
Updated
Aug 28, 2024 - Python