Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-11-29 | DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting | DROID-Splat:结合端到端SLAM与3D高斯Splatting | Christian Homeyer, Leon Begiristain, Christoph Schnörr | http://arxiv.org/pdf/2411.17660v2 | https://github.com/ChenHoy/DROID-Splat |
2024-11-29 | DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering | DeSplat:用于无干扰渲染的分解高斯喷溅 | Yihao Wang, Marcus Klasson, Matias Turkulainen, Shuzhe Wang, Juho Kannala, Arno Solin | http://arxiv.org/pdf/2411.19756v1 | None |
2024-11-29 | TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting | 基于八叉树的三维高斯分层渲染生成高质量PBR材质:TexGaussian | Bojun Xiong, Jialun Liu, Jiakui Hu, Chenming Wu, Jinbo Wu, Xing Liu, Chen Zhao, Errui Ding | http://arxiv.org/pdf/2411.19654v1 | None |
2024-11-29 | Gaussian Splashing: Direct Volumetric Rendering Underwater | 高斯溅射:水下直接体渲染 | Nir Mualem, Roy Amoyal, Oren Freifeld, Derya Akkaynak | http://arxiv.org/pdf/2411.19588v1 | None |
2024-11-29 | Tortho-Gaussian: Splatting True Digital Orthophoto Maps | Tortho-Gaussian:分层渲染真实数字正射影像图 | Xin Wang, Wendi Zhang, Hong Xie, Haibin Ai, Qiangqiang Yuan, Zongqian Zhan | http://arxiv.org/pdf/2411.19594v1 | None |
2024-11-29 | Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding | 自举高斯聚类以实现视一致性3D场景理解 | Wenbo Zhang, Lu Zhang, Ping Hu, Liqian Ma, Yunzhi Zhuge, Huchuan Lu | http://arxiv.org/pdf/2411.19551v1 | None |
Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-11-29 | AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos | AlphaTablets:一种用于从单目视频中重建3D平面表示的通用方法 | Yuze He, Wang Zhao, Shaohui Liu, Yubin Hu, Yushi Bai, Yu-Hui Wen, Yong-Jin Liu | http://arxiv.org/pdf/2411.19950v1 | None |
2024-11-29 | FaVoR: Features via Voxel Rendering for Camera Relocalization | 基于体素渲染的相机重定位特征 | Vincenzo Polizzi, Marco Cannici, Davide Scaramuzza, Jonathan Kelly | http://arxiv.org/pdf/2409.07571v2 | None |
2024-11-29 | Free-form Generation Enhances Challenging Clothed Human Modeling | 自由形式生成增强具有挑战性的着装人体建模 | Hang Ye, Xiaoxuan Ma, Hai Ci, Wentao Zhu, Yizhou Wang | http://arxiv.org/pdf/2411.19942v1 | None |
2024-11-29 | MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds | MoSca:通过4D运动支架从非结构化视频中动态高斯融合 | Jiahui Lei, Yijia Weng, Adam Harley, Leonidas Guibas, Kostas Daniilidis | http://arxiv.org/pdf/2405.17421v2 | None |
2024-11-29 | Quantifying the synthetic and real domain gap in aerial scene understanding | 量化空中场景理解中的合成与真实域差距 | Alina Marcu | http://arxiv.org/pdf/2411.19913v1 | None |
2024-11-29 | SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens | SAT-HMR:通过尺度自适应标记实现实时多人3D网格估计 | Chi Su, Xiaoxuan Ma, Jiajun Su, Yizhou Wang | http://arxiv.org/pdf/2411.19824v1 | None |
2024-11-29 | PerLA: Perceptive 3D Language Assistant | 感知3D语言助手:PerLA | Guofeng Mei, Wei Lin, Luigi Riz, Yujiao Wu, Fabio Poiesi, Yiming Wang | http://arxiv.org/pdf/2411.19774v1 | None |
2024-11-29 | Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models | 3D分割模型解释性分析的聚合归因 | Maciej Chrabaszcz, Hubert Baniecki, Piotr Komorowski, Szymon Płotka, Przemyslaw Biecek | http://arxiv.org/pdf/2407.16653v3 | https://github.com/mi2datalab/agg2exp |
2024-11-29 | MonoPP: Metric-Scaled Self-Supervised Monocular Depth Estimation by Planar-Parallax Geometry in Automotive Applications | MonoPP:基于平面视差几何的度量缩放自监督单目深度估计在汽车应用中的研究 | Gasser Elazab, Torben Gräber, Michael Unterreiner, Olaf Hellwich | http://arxiv.org/pdf/2411.19717v1 | None |
2024-11-29 | GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding | GREAT:开放词汇3D物体属性定位的几何-意图协同推理 | Yawen Shao, Wei Zhai, Yuhang Yang, Hongchen Luo, Yang Cao, Zheng-Jun Zha | http://arxiv.org/pdf/2411.19626v1 | None |
2024-11-29 | Self-Supervised Denoiser Framework | 自监督去噪框架 | Emilien Valat, Andreas Hauptmann, Ozan Öktem | http://arxiv.org/pdf/2411.19593v1 | None |
2024-11-29 | Anatomical Foundation Models for Brain MRIs | 脑MRI的解剖基础模型 | Carlo Alberto Barbano, Matteo Brunello, Benoit Dufumier, Marco Grangetto | http://arxiv.org/pdf/2408.07079v3 | https://github.com/EIDOSLAB/AnatCL |
2024-11-29 | ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration | ReconDreamer:通过在线修复构建驾驶场景重建的世界模型 | Chaojun Ni, Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Wenkang Qin, Guan Huang, Chen Liu, Yuyin Chen | http://arxiv.org/pdf/2411.19548v1 | None |
2024-11-29 | InstantGeoAvatar: Effective Geometry and Appearance Modeling of Animatable Avatars from Monocular Video | 即时GeoAvatar:从单目视频中有效建模可动虚拟形象的几何和外观 | Alvaro Budria, Adrian Lopez-Rodriguez, Oscar Lorente, Francesc Moreno-Noguer | http://arxiv.org/pdf/2411.01512v2 | None |
2024-11-29 | EvaSurf: Efficient View-Aware Implicit Textured Surface Reconstruction | EvaSurf:高效视角感知隐式纹理表面重建 | Jingnan Gao, Zhuo Chen, Yichao Yan, Bowen Pan, Zhe Wang, Jiangjing Lyu, Xiaokang Yang | http://arxiv.org/pdf/2311.09806v4 | None |
2024-11-29 | Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling | 全景图:释放零样本单视图3D场景建模 | Qirui Wu, Denys Iliash, Daniel Ritchie, Manolis Savva, Angel X. Chang | http://arxiv.org/pdf/2411.19492v1 | None |
2024-11-29 | Driving with Prior Maps: Unified Vector Prior Encoding for Autonomous Vehicle Mapping | 驾驶与先验地图:自动驾驶车辆制图的统一向量先验编码 | Shuang Zeng, Xinyuan Chang, Xinran Liu, Zheng Pan, Xing Wei | http://arxiv.org/pdf/2409.05352v3 | None |
2024-11-29 | Blurred LiDAR for Sharper 3D: Robust Handheld 3D Scanning with Diffuse LiDAR and RGB | 模糊激光雷达实现更清晰的3D:使用漫反射激光雷达和RGB进行鲁棒的便携式3D扫描 | Nikhil Behari, Aaron Young, Siddharth Somasundaram, Tzofi Klinghoffer, Akshat Dave, Ramesh Raskar | http://arxiv.org/pdf/2411.19474v1 | None |
2024-11-29 | Robust Bayesian Scene Reconstruction by Leveraging Retrieval-Augmented Priors | 利用检索增强先验的鲁棒贝叶斯场景重建 | Herbert Wright, Weiming Zhi, Matthew Johnson-Roberson, Tucker Hermans | http://arxiv.org/pdf/2411.19461v1 | None |
2024-11-29 | Transientangelo: Few-Viewpoint Surface Reconstruction Using Single-Photon Lidar | 瞬变天使:单光子激光雷达的少视点表面重建 | Weihan Luo, Anagh Malik, David B. Lindell | http://arxiv.org/pdf/2408.12191v4 | None |
2024-11-29 | Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning | 多视角等变性通过最小特征微调提升3D对应理解 | Yang You, Yixin Li, Congyue Deng, Yue Wang, Leonidas Guibas | http://arxiv.org/pdf/2411.19458v1 | https://github.com/qq456cvb/3DCorrEnhance |
Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-11-29 | Reanimating Images using Neural Representations of Dynamic Stimuli | 利用动态刺激的神经表征重动画像 | Jacob Yeung, Andrew F. Luo, Gabriel Sarch, Margaret M. Henderson, Deva Ramanan, Michael J. Tarr | http://arxiv.org/pdf/2406.02659v2 | None |
2024-11-29 | C³-NeRF: Modeling Multiple Scenes via Conditional-cum-Continual Neural Radiance Fields | C³-NeRF:通过条件累积连续神经辐射场建模多个场景 | Prajwal Singh, Ashish Tiwari, Gautam Vashishtha, Shanmuganathan Raman | http://arxiv.org/pdf/2411.19903v1 | None |
2024-11-29 | ThermoNeRF: Joint RGB and Thermal Novel View Synthesis for Building Facades using Multimodal Neural Radiance Fields | 热NeRF:基于多模态神经辐射场联合RGB和热成像的建筑物立面新视角合成 | Mariam Hassan, Florent Forest, Olga Fink, Malcolm Mielle | http://arxiv.org/pdf/2403.12154v2 | None |
2024-11-29 | Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models | 释放数据洪流的力量:关于语言模型指令微调数据评估与选择的全面调查 | Yulei Qin, Yuncheng Yang, Pengcheng Guo, Gang Li, Hang Shao, Yuchen Shi, Zihan Xu, Yun Gu | http://arxiv.org/pdf/2408.02085v4 | https://github.com/yuleiqin/fantastic-data-engineering |
2024-11-29 | LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis | 洛基谈:学习细粒度和可泛化的对应关系以增强基于NeRF的说话头合成 | Tianqi Li, Ruobing Zheng, Bonan Li, Zicheng Zhang, Meng Wang, Jingdong Chen, Ming Yang | http://arxiv.org/pdf/2411.19525v1 | None |
Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-11-29 | DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation | DELT:一种基于多样性的早期晚期训练的数据集蒸馏方法 | Zhiqiang Shen, Ammar Sherif, Zeyuan Yin, Shitong Shao | http://arxiv.org/pdf/2411.19946v1 | https://github.com/VILA-Lab/DELT |
2024-11-29 | MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks | MoTe:学习用于多生成任务的动态-文本扩散模型 | Yiming Wu, Wei Ji, Kecheng Zheng, Zicheng Wang, Dong Xu | http://arxiv.org/pdf/2411.19786v1 | None |
2024-11-29 | JetFormer: An Autoregressive Generative Model of Raw Images and Text | JetFormer:原始图像和文本的自回归生成模型 | Michael Tschannen, André Susano Pinto, Alexander Kolesnikov | http://arxiv.org/pdf/2411.19722v1 | None |
2024-11-29 | HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning | HUPE:基于语义协作学习的启发式水下感知增强 | Zengxi Zhang, Zhiying Jiang, Long Ma, Jinyuan Liu, Xin Fan, Risheng Liu | http://arxiv.org/pdf/2411.18296v2 | https://github.com/ZengxiZhang/HUPE |
2024-11-29 | Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing | 统一注意力图:提升重建和编辑中的图像保真度 | Wenyi Mo, Tianyu Zhang, Yalong Bai, Bing Su, Ji-Rong Wen | http://arxiv.org/pdf/2411.19652v1 | https://github.com/Mowenyii/Uniform-Attention-Maps |
2024-11-29 | Simultaneous Image-to-Zero and Zero-to-Noise: Diffusion Models with Analytical Image Attenuation | 同时进行图像到零和零到噪声:具有解析图像衰减的扩散模型 | Yuhang Huang, Zheng Qin, Xinwang Liu, Kai Xu | http://arxiv.org/pdf/2306.13720v9 | None |
2024-11-29 | LDA-AQU: Adaptive Query-guided Upsampling via Local Deformable Attention | LDA-AQU:基于局部可变形注意力的自适应查询引导上采样 | Zewen Du, Zhenjiang Hu, Guiyu Zhao, Ying Jin, Hongbin Ma | http://arxiv.org/pdf/2411.19585v1 | https://github.com/duzw9311/LDA-AQU |
2024-11-29 | SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates | 超级材质:交互式速率下的物理一致性PBR材质估计 | Yijia Hong, Yuan-Chen Guo, Ran Yi, Yulong Chen, Yan-Pei Cao, Lizhuang Ma | http://arxiv.org/pdf/2411.17515v3 | None |
2024-11-29 | Contextual Checkerboard Denoise -- A Novel Neural Network-Based Approach for Classification-Aware OCT Image Denoising | 上下文棋盘格降噪——一种基于神经网络的分类感知OCT图像降噪新方法 | Md. Touhidul Islam, Md. Abtahi M. Chowdhury, Sumaiya Salekin, Aye T. Maung, Akil A. Taki, Hafiz Imtiaz | http://arxiv.org/pdf/2411.19549v1 | None |
2024-11-29 | Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook | 生成式AI时代深度伪造媒体生成与检测综述与展望 | Florinel-Alin Croitoru, Andrei-Iulian Hiji, Vlad Hondru, Nicolae Catalin Ristea, Paul Irofti, Marius Popescu, Cristian Rusu, Radu Tudor Ionescu | http://arxiv.org/pdf/2411.19537v1 | https://github.com/CroitoruAlin/biodeep |
2024-11-29 | QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain | QUOTA:任何领域的文本到图像模型量化对象 | Wenfang Sun, Yingjun Du, Gaowen Liu, Cees G. M. Snoek | http://arxiv.org/pdf/2411.19534v1 | None |
2024-11-29 | RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation | RAGDiffusion:通过外部知识同化实现忠实布料生成 | Xianfeng Tan, Yuhan Li, Wenxiang Shang, Yubo Wu, Jian Wang, Xuanhong Chen, Yi Zhang, Ran Lin | http://arxiv.org/pdf/2411.19528v1 | None |
2024-11-29 | Retrieval-guided Cross-view Image Synthesis | 基于检索引导的跨视图图像合成 | Hongji Yang, Yiru Li, Yingying Zhu | http://arxiv.org/pdf/2411.19510v1 | None |
2024-11-29 | An Approach Towards Learning K-means-friendly Deep Latent Representation | 一种学习K-means友好深度潜在表示的方法 | Debapriya Roy | http://arxiv.org/pdf/2411.19496v1 | None |
2024-11-29 | VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model | 多模态大型语言模型赋能的通用图像外推:VIP | Jinze Yang, Haoran Wang, Zining Zhu, Chenglong Liu, Meng Wymond Wu, Mingming Sun | http://arxiv.org/pdf/2406.01059v3 | None |
2024-11-29 | Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis | Ditto:可控实时说话人头合成中的运动空间扩散 | Tianqi Li, Ruobing Zheng, Minghui Yang, Jingdong Chen, Ming Yang | http://arxiv.org/pdf/2411.19509v1 | None |
2024-11-29 | Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis | 分层渲染扩散模型用于可控零样本图像合成 | Zipeng Qi, Guoxi Huang, Chenyang Liu, Fei Ye | http://arxiv.org/pdf/2311.18435v2 | None |
2024-11-29 | ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection | 伪造侦探:赋能多模态大型语言模型进行图像篡改检测 | Zhihao Sun, Haoran Jiang, Haoran Chen, Yixin Cao, Xipeng Qiu, Zuxuan Wu, Yu-Gang Jiang | http://arxiv.org/pdf/2411.19466v1 | None |
2024-11-29 | Fleximo: Towards Flexible Text-to-Human Motion Video Generation | Fleximo:迈向灵活的文本到人类动作视频生成 | Yuhang Zhang, Yuan Zhou, Zeyu Liu, Yuxuan Cai, Qiuyue Wang, Aidong Men, Huan Yang | http://arxiv.org/pdf/2411.19459v1 | None |
2024-11-29 | AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea | AnyEdit:掌握适用于任何想法的统一高质量图像编辑 | Qifan Yu, Wei Chow, Zhongqi Yue, Kaihang Pan, Yang Wu, Xiaoyang Wan, Juncheng Li, Siliang Tang | http://arxiv.org/pdf/2411.15738v2 | None |
2024-11-29 | PromptHSI: Universal Hyperspectral Image Restoration Framework for Composite Degradation | PromptHSI:通用高光谱图像复合退化恢复框架 | Chia-Ming Lee, Ching-Heng Cheng, Yu-Fan Lin, Yi-Ching Cheng, Wo-Ting Liao, Chih-Chung Hsu, Fu-En Yang, Yu-Chiang Frank Wang | http://arxiv.org/pdf/2411.15922v2 | None |
Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-11-29 | SignLLM: Sign Language Production Large Language Models | SignLLM:手语生成大型语言模型 | Sen Fang, Lei Wang, Ce Zheng, Chunyu Sui, Mingyu Zhao, Yapeng Tian, Chen Chen | http://arxiv.org/pdf/2405.10718v2 | None |
2024-11-29 | VLSBench: Unveiling Visual Leakage in Multimodal Safety | VLSBench:揭示多模态安全中的视觉泄露 | Xuhao Hu, Dongrui Liu, Hao Li, Xuanjing Huang, Jing Shao | http://arxiv.org/pdf/2411.19939v1 | None |
2024-11-29 | On Domain-Specific Post-Training for Multimodal Large Language Models | 特定领域后训练的多模态大型语言模型 | Daixuan Cheng, Shaohan Huang, Ziyu Zhu, Xintong Zhang, Wayne Xin Zhao, Zhongzhi Luan, Bo Dai, Zhenliang Zhang | http://arxiv.org/pdf/2411.19930v1 | None |
2024-11-29 | A Survey on Multimodal Large Language Models | 多模态大型语言模型综述 | Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen | http://arxiv.org/pdf/2306.13549v4 | https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models |
2024-11-29 | CLIPArTT: Adaptation of CLIP to New Domains at Test Time | CLIPArTT:测试时将CLIP适应新领域的自适应方法 | Gustavo Adolfo Vargas Hakim, David Osowiechi, Mehrdad Noori, Milad Cheraghalikhani, Ali Bahri, Moslem Yazdanpanah, Ismail Ben Ayed, Christian Desrosiers | http://arxiv.org/pdf/2405.00754v2 | https://github.com/dosowiechi/CLIPArTT.git |
2024-11-29 | Multimodal Whole Slide Foundation Model for Pathology | 多模态全切片病理学基础模型 | Tong Ding, Sophia J. Wagner, Andrew H. Song, Richard J. Chen, Ming Y. Lu, Andrew Zhang, Anurag J. Vaidya, Guillaume Jaume | http://arxiv.org/pdf/2411.19666v1 | None |
2024-11-29 | Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings | 加速多模态大型语言模型:动态视觉-标记退出与实证发现 | Qiong Wu, Wenhao Lin, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji | http://arxiv.org/pdf/2411.19628v1 | https://github.com/DoubtedSteam/DyVTE |
2024-11-29 | GalLoP: Learning Global and Local Prompts for Vision-Language Models | GalLoP:学习全局和局部提示以用于视觉语言模型 | Marc Lafon, Elias Ramzi, Clément Rambour, Nicolas Audebert, Nicolas Thome | http://arxiv.org/pdf/2407.01400v2 | https://github.com/MarcLafon/gallop |
2024-11-29 | EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing | 地球标记器:一种用于遥感的多模态视觉提示大型语言模型 | Wei Zhang, Miaoxin Cai, Tong Zhang, Jun Li, Yin Zhuang, Xuerui Mao | http://arxiv.org/pdf/2407.13596v3 | https://github.com/wivizhang/EarthMarker |
2024-11-29 | Combining inherent knowledge of vision-language models with unsupervised domain adaptation through strong-weak guidance | 结合视觉-语言模型内在知识与通过强-弱指导的无监督领域自适应 | Thomas Westfechtel, Dexuan Zhang, Tatsuya Harada | http://arxiv.org/pdf/2312.04066v4 | None |
2024-11-29 | Interleaved-Modal Chain-of-Thought | 交错模态思维链 | Jun Gao, Yongqi Li, Ziqiang Cao, Wenjie Li | http://arxiv.org/pdf/2411.19488v1 | None |
2024-11-29 | A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models | 医学多模态大型语言模型的频谱评估基准 | Jie Liu, Wenxuan Wang, Yihang Su, Jingyuan Huan, Wenting Chen, Yudi Zhang, Cheng-Yi Li, Kao-Jung Chang | http://arxiv.org/pdf/2402.11217v2 | None |
Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-11-29 | SEDMamba: Enhancing Selective State Space Modelling with Bottleneck Mechanism and Fine-to-Coarse Temporal Fusion for Efficient Error Detection in Robot-Assisted Surgery | SEDMamba:通过瓶颈机制和细到粗时间融合增强选择性状态空间建模,以实现机器人辅助手术中高效错误检测 | Jialang Xu, Nazir Sirajudeen, Matthew Boal, Nader Francis, Danail Stoyanov, Evangelos Mazomenos | http://arxiv.org/pdf/2406.15920v4 | https://github.com/wzjialang/SEDMamba |
2024-11-29 | FlowCLAS: Enhancing Normalizing Flow Via Contrastive Learning For Anomaly Segmentation | FlowCLAS:通过对比学习增强归一化流用于异常分割 | Chang Won Lee, Selina Leveugle, Svetlana Stolpner, Chris Langley, Paul Grouchy, Jonathan Kelly, Steven L. Waslander | http://arxiv.org/pdf/2411.19888v1 | None |
2024-11-29 | A Visual-inertial Localization Algorithm using Opportunistic Visual Beacons and Dead-Reckoning for GNSS-Denied Large-scale Applications | 基于机会视觉信标和航位推算的GNSS拒绝场景下大规模应用的视觉惯性定位算法 | Liqiang Zhang, Ye Tian, Dongyan Wei | http://arxiv.org/pdf/2411.19845v1 | None |
2024-11-29 | Feedback-driven object detection and iterative model improvement | 反馈驱动的目标检测与迭代模型改进 | Sönke Tenckhoff, Mario Koddenbrock, Erik Rodner | http://arxiv.org/pdf/2411.19835v1 | None |
2024-11-29 | SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection | 稀疏雷达-相机融合用于3D目标检测 | Philipp Wolters, Johannes Gilg, Torben Teepe, Fabian Herzog, Felix Fent, Gerhard Rigoll | http://arxiv.org/pdf/2411.19860v1 | https://github.com/phi-wol/sparc |
2024-11-29 | Towards Class-wise Robustness Analysis | 朝向类别鲁棒性分析 | Tejaswini Medi, Julia Grabinski, Margret Keuper | http://arxiv.org/pdf/2411.19853v1 | None |
2024-11-29 | Efficient Text-driven Motion Generation via Latent Consistency Training | 基于潜在一致性训练的高效文本驱动运动生成 | Mengxian Hu, Minghao Zhu, Xun Zhou, Qingqing Yan, Shu Li, Chengju Liu, Qijun Chen | http://arxiv.org/pdf/2405.02791v3 | None |
2024-11-29 | Gaussian multi-target filtering with target dynamics driven by a stochastic differential equation | 高斯多目标滤波:由随机微分方程驱动的目标动力学 | Ángel F. García-Fernández, Simo Särkkä | http://arxiv.org/pdf/2411.19814v1 | None |
2024-11-29 | Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation | 超球面空间中参数高效的开放词汇语义分割微调 | Zelin Peng, Zhengqin Xu, Zhilin Zeng, Yaoming Wang, Wei Shen | http://arxiv.org/pdf/2405.18840v2 | None |
2024-11-29 | Image segmentation of treated and untreated tumor spheroids by Fully Convolutional Networks | 基于全卷积网络的已处理与未处理肿瘤球体图像分割 | Matthias Streller, Soňa Michlíková, Willy Ciecior, Katharina Lönnecke, Leoni A. Kunz-Schughart, Steffen Lange, Anja Voss-Böhme | http://arxiv.org/pdf/2405.01105v2 | None |
2024-11-29 | P2PFormer: A Primitive-to-polygon Method for Regular Building Contour Extraction from Remote Sensing Images | P2PFormer:一种从遥感图像中提取规则建筑轮廓的原始到多边形方法 | Tao Zhang, Shiqing Wei, Yikang Zhou, Muying Luo, Wenling You, Shunping Ji | http://arxiv.org/pdf/2406.02930v2 | None |
2024-11-29 | Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models | 双风险最小化:迈向零样本模型微调的下一级鲁棒性 | Kaican Li, Weiyan Xie, Yongxiang Huang, Didan Deng, Lanqing Hong, Zhenguo Li, Ricardo Silva, Nevin L. Zhang | http://arxiv.org/pdf/2411.19757v1 | https://github.com/vaynexie/DRM |
2024-11-29 | LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References | LaVIDE:一种用于检测带有地图参考的卫星图像变化的语言-视觉判别器 | Shuguo Jiang, Fang Xu, Sen Jia, Gui-Song Xia | http://arxiv.org/pdf/2411.19758v1 | None |
2024-11-29 | Domain-Adaptive Pre-training of Self-Supervised Foundation Models for Medical Image Classification in Gastrointestinal Endoscopy | 胃肠内镜医学图像分类的自监督基础模型领域自适应预训练 | Marcel Roth, Micha V. Nowak, Adrian Krenzer, Frank Puppe | http://arxiv.org/pdf/2410.21302v3 | None |
2024-11-29 | A Multi-Loss Strategy for Vehicle Trajectory Prediction: Combining Off-Road, Diversity, and Directional Consistency Losses | 多损失策略用于车辆轨迹预测:结合越野、多样性和方向一致性损失 | Ahmad Rahimi, Alexandre Alahi | http://arxiv.org/pdf/2411.19747v1 | https://github.com/vita-epfl/stay-on-track |
2024-11-29 | Real-Time Anomaly Detection in Video Streams | 实时视频流中的异常检测 | Fabien Poirier | http://arxiv.org/pdf/2411.19731v1 | None |
2024-11-29 | Forensics Adapter: Adapting CLIP for Generalizable Face Forgery Detection | 法医适配器:将CLIP适配于通用人脸伪造检测 | Xinjie Cui, Yuezun Li, Ao Luo, Jiaran Zhou, Junyu Dong | http://arxiv.org/pdf/2411.19715v1 | None |
2024-11-29 | SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks | 医学视觉问答任务中鲁棒性评估的系统理解:SURE-VQA | Kim-Celine Kahl, Selen Erkan, Jeremias Traub, Carsten T. Lüth, Klaus Maier-Hein, Lena Maier-Hein, Paul F. Jaeger | http://arxiv.org/pdf/2411.19688v1 | https://github.com/IML-DKFZ/sure-vqa |
2024-11-29 | CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation | CogACT:机器人操作中协同认知与动作的基础视觉-语言-动作模型 | Qixiu Li, Yaobo Liang, Zeyu Wang, Lin Luo, Xi Chen, Mozheng Liao, Fangyun Wei, Yu Deng | http://arxiv.org/pdf/2411.19650v1 | None |
2024-11-29 | FairDD: Fair Dataset Distillation via Synchronized Matching | 公平数据蒸馏:通过同步匹配实现 | Qihang Zhou, Shenhao Fang, Shibo He, Wenchao Meng, Jiming Chen | http://arxiv.org/pdf/2411.19623v1 | None |
2024-11-29 | Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting | 注意提示:基于提示的类无关计数的新基准 | Luca Ciampi, Nicola Messina, Matteo Pierucci, Giuseppe Amato, Marco Avvenuti, Fabrizio Falchi | http://arxiv.org/pdf/2409.15953v2 | https://github.com/ciampluca/PrACo |
2024-11-29 | A Comprehensive Framework for Automated Segmentation of Perivascular Spaces in Brain MRI with the nnU-Net | 基于nnU-Net的脑部MRI血管周围间隙自动分割的全面框架 | William Pham, Alexander Jarema, Donggyu Rim, Zhibin Chen, Mohamed S. H. Khlif, Vaughan G. Macefield, Luke A. Henderson, Amy Brodtmann | http://arxiv.org/pdf/2411.19564v1 | None |
2024-11-29 | RadioActive: 3D Radiological Interactive Segmentation Benchmark | 放射性:3D放射学交互式分割基准 | Constantin Ulrich, Tassilo Wald, Emily Tempus, Maximilian Rokuss, Paul F. Jaeger, Klaus Maier-Hein | http://arxiv.org/pdf/2411.07885v2 | None |
2024-11-29 | Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph | 基于骨架的空间-时间全景图组活动识别 | Zhengcen Li, Xinle Chang, Yueran Li, Jingyong Su | http://arxiv.org/pdf/2407.19497v2 | https://github.com/mgiant/MP-GCN |
2024-11-29 | SkelMamba: A State Space Model for Efficient Skeleton Action Recognition of Neurological Disorders | SkelMamba:一种用于神经疾病高效骨骼动作识别的状态空间模型 | Niki Martinel, Mariano Serrao, Christian Micheloni | http://arxiv.org/pdf/2411.19544v1 | None |
2024-11-29 | Enhancing AI microscopy for foodborne bacterial classification via adversarial domain adaptation across optical and biological variability | 通过跨光学和生物变异的对抗性领域自适应增强食品传播细菌分类的AI显微镜 | Siddhartha Bhattacharya, Aarham Wasit, Mason Earles, Nitin Nitin, Luyao Ma, Jiyoon Yi | http://arxiv.org/pdf/2411.19514v1 | None |
2024-11-29 | Effective Fine-Tuning of Vision-Language Models for Accurate Galaxy Morphology Analysis | 有效的视觉-语言模型微调以提高精确的星系形态分析 | Ruoqi Wang, Haitao Wang, Qiong Luo | http://arxiv.org/pdf/2411.19475v1 | None |
2024-11-29 | ARN-LSTM: A Multi-Stream Fusion Model for Skeleton-based Action Recognition | ARN-LSTM:基于骨架的动作识别的多流融合模型 | Chuanchuan Wang, Ahmad Sufril Azlan Mohmamed, Mohd Halim Bin Mohd Noor, Xiao Yang, Feifan Yi, Xiang Li | http://arxiv.org/pdf/2411.01769v2 | None |
2024-11-29 | FLARE: Towards Universal Dataset Purification against Backdoor Attacks | FLARE:面向对抗后门攻击的通用数据集净化 | Linshan Hou, Wei Luo, Zhongyun Hua, Songhua Chen, Leo Yu Zhang, Yiming Li | http://arxiv.org/pdf/2411.19479v1 | None |
2024-11-29 | Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning | 通过自感知调优实现基于SAM的可提示异常分割 | Hui-Yue Yang, Hui Chen, Ao Wang, Kai Chen, Zijia Lin, Yongliang Tang, Pengcheng Gao, Yuming Quan | http://arxiv.org/pdf/2411.17217v3 | None |
2024-11-29 | Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification | 高效的大视觉-语言模型微调用于细粒度船舶分类 | Long Lan, Fengxiang Wang, Xiangtao Zheng, Zengmao Wang, Xinwang Liu | http://arxiv.org/pdf/2403.08271v2 | None |
2024-11-29 | Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks | 避开危害:防御视觉语言模型免受越狱攻击的自适应方法 | Han Wang, Gang Wang, Huan Zhang | http://arxiv.org/pdf/2411.16721v2 | None |
2024-11-29 | Adaptive Interactive Segmentation for Multimodal Medical Imaging via Selection Engine | 自适应交互式多模态医学图像分割通过选择引擎 | Zhi Li, Kai Zhao, Yaqi Wang, Shuai Wang | http://arxiv.org/pdf/2411.19447v1 | None |
2024-11-29 | Pytorch-Wildlife: A Collaborative Deep Learning Framework for Conservation | Pytorch-Wildlife:一种用于保护的协同深度学习框架 | Andres Hernandez, Zhongqi Miao, Luisa Vargas, Sara Beery, Rahul Dodhia, Pablo Arbelaez, Juan M. Lavista Ferres | http://arxiv.org/pdf/2405.12930v4 | https://github.com/microsoft/CameraTraps |
Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-11-29 | Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification | 神经符号形式验证在文本到视频模型评估中的应用 | S. P. Sharan, Minkyu Choi, Sahil Shah, Harsh Goel, Mohammad Omama, Sandeep Chinchali | http://arxiv.org/pdf/2411.16718v2 | None |
2024-11-29 | Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | 感知测试2024:挑战总结与一种新型一小时视频问答基准 | Joseph Heyward, João Carreira, Dima Damen, Andrew Zisserman, Viorica Pătrăucean | http://arxiv.org/pdf/2411.19941v1 | None |
2024-11-29 | SIMS: Simulating Human-Scene Interactions with Real World Script Planning | SIMS:利用现实世界脚本规划模拟人-场景交互 | Wenjia Wang, Liang Pan, Zhiyang Dou, Zhouyingcheng Liao, Yuke Lou, Lei Yang, Jingbo Wang, Taku Komura | http://arxiv.org/pdf/2411.19921v1 | None |
2024-11-29 | SceneMotion: From Agent-Centric Embeddings to Scene-Wide Forecasts | 场景运动:从以代理为中心的嵌入到场景级预测 | Royden Wagner, Ömer Sahin Tas, Marlon Steiner, Fabian Konstantinidis, Hendrik Königshof, Marvin Klemp, Carlos Fernandez, Christoph Stiller | http://arxiv.org/pdf/2408.01537v3 | https://github.com/kit-mrt/future-motion |
2024-11-29 | Hybrid Architecture for Real-Time Video Anomaly Detection: Integrating Spatial and Temporal Analysis | 混合架构实时视频异常检测:集成空间和时间分析 | Fabien Poirier | http://arxiv.org/pdf/2410.15909v3 | None |
2024-11-29 | Aggregating Nearest Sharp Features via Hybrid Transformers for Video Deblurring | 通过混合Transformer聚合最近锐利特征的视频去模糊 | Wei Shang, Dongwei Ren, Yi Yang, Wangmeng Zuo | http://arxiv.org/pdf/2309.07054v2 | https://github.com/shangwei5/STGTN |
2024-11-29 | VideoDirector: Precise Video Editing via Text-to-Video Models | 视频导演:通过文本到视频模型进行精确视频编辑 | Yukun Wang, Longguang Wang, Zhiyuan Ma, Qibin Hu, Kai Xu, Yulan Guo | http://arxiv.org/pdf/2411.17592v2 | None |
2024-11-29 | LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos | LongVALE:面向时间感知长视频的视觉-音频-语言-事件基准 | Tiantian Geng, Jinrui Zhang, Qingni Wang, Teng Wang, Jinming Duan, Feng Zheng | http://arxiv.org/pdf/2411.19772v1 | None |
2024-11-29 | The Streetscape Application Services Stack (SASS): Towards a Distributed Sensing Architecture for Urban Applications | 城市应用分布式感知架构的街道景观应用服务栈(SASS) | Navid Salami Pargoo, Mahshid Ghasemi, Shuren Xia, Mehmet Kerem Turkcan, Taqiya Ehsan, Chengbo Zang, Yuan Sun, Javad Ghaderi | http://arxiv.org/pdf/2411.19714v1 | None |
2024-11-29 | Combining Pre- and Post-Demosaicking Noise Removal for RAW Video | 结合预去马赛克和后去马赛克噪声去除的RAW视频 | Marco Sánchez-Beeckman, Antoni Buades, Nicola Brandonisio, Bilel Kanoun | http://arxiv.org/pdf/2410.02572v2 | None |
2024-11-29 | Subjective and Objective Quality Assessment Methods of Stereoscopic Videos with Visibility Affecting Distortions | 立体视频受可见性影响失真下的主观和客观质量评估方法 | Sria Biswas, Balasubramanyam Appina, Priyanka Kokil, Sumohana S Channappayya | http://arxiv.org/pdf/2411.19522v1 | None |
2024-11-29 | V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow | V2SFlow:基于语音分解和修正流的视频到语音生成 | Jeongsoo Choi, Ji-Hoon Kim, Jinyu Li, Joon Son Chung, Shujie Liu | http://arxiv.org/pdf/2411.19486v1 | None |
2024-11-29 | Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing | 一次查看每一帧:基于多轴梯度检查点的视频-Ma$^2$mba高效长视频理解 | Hosu Lee, Junho Kim, Hyunjun Kim, Yong Man Ro | http://arxiv.org/pdf/2411.19460v1 | None |
2024-11-29 | MCUCoder: Adaptive Bitrate Learned Video Compression for IoT Devices | MCUCoder:适用于物联网设备的自适应码率学习视频压缩 | Ali Hojjat, Janek Haberer, Olaf Landsiedel | http://arxiv.org/pdf/2411.19442v1 | https://github.com/ds-kiel/MCUCoder |
2024-11-29 | Lifelong Learning of Video Diffusion Models From a Single Video Stream | 从单一视频流中终身学习视频扩散模型 | Jason Yoo, Yingchen He, Saeid Naderiparizi, Dylan Green, Gido M. van de Ven, Geoff Pleiss, Frank Wood | http://arxiv.org/pdf/2406.04814v2 | None |
2024-11-29 | Actions and Objects Pathways for Domain Adaptation in Video Question Answering | 视频问答中的领域自适应动作与物体路径 | Safaa Abdullahi Moallim Mohamud, Ho-Young Jung | http://arxiv.org/pdf/2411.19434v1 | None |
Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-11-29 | Exact Aggregation for Federated and Efficient Fine-Tuning of Foundation Models | 精确聚合:联邦和高效微调基础模型 | Raghav Singhal, Kaustubh Ponkshe, Praneeth Vepakomma | http://arxiv.org/pdf/2410.09432v2 | https://github.com/RaghavSinghal10/fedex-lora |
2024-11-29 | A Comprehensive Content Verification System for ensuring Digital Integrity in the Age of Deep Fakes | 全面内容验证系统:确保深度伪造时代数字完整性的综合解决方案 | RaviKanth Kaja | http://arxiv.org/pdf/2411.19750v1 | None |
2024-11-29 | Explaining the Impact of Training on Vision Models via Activation Clustering | 通过激活聚类解释训练对视觉模型的影响 | Ahcène Boubekki, Samuel G. Fadel, Sebastian Mair | http://arxiv.org/pdf/2411.19700v1 | None |
2024-11-29 | Gated-Attention Feature-Fusion Based Framework for Poverty Prediction | 基于门控注意力特征融合的贫困预测框架 | Muhammad Umer Ramzan, Wahab Khaddim, Muhammad Ehsan Rana, Usman Ali, Manohar Ali, Fiaz ul Hassan, Fatima Mehmood | http://arxiv.org/pdf/2411.19690v1 | None |
2024-11-29 | You Don't Need Domain-Specific Data Augmentations When Scaling Self-Supervised Learning | 当扩展自监督学习时,您不需要特定领域的数据增强 | Théo Moutakanni, Maxime Oquab, Marc Szafraniec, Maria Vakalopoulou, Piotr Bojanowski | http://arxiv.org/pdf/2406.09294v2 | None |
2024-11-29 | Towards Evaluating Generalist Agents: An Automated Benchmark in Open World | 迈向通用智能体评估:开放世界中的自动化基准 | Xinyue Zheng, Haowei Lin, Kaichen He, Zihao Wang, Zilong Zheng, Yitao Liang | http://arxiv.org/pdf/2310.08367v2 | None |
2024-11-29 | Enabling DBSCAN for Very Large-Scale High-Dimensional Spaces | 启用DBSCAN处理大规模高维空间 | Yongyu Wang | http://arxiv.org/pdf/2411.11421v2 | None |
2024-11-29 | Dynamic Universal Approximation Theory: The Basic Theory for Deep Learning-Based Computer Vision Models | 动态通用逼近理论:基于深度学习计算机视觉模型的基本理论 | Wei Wang, Qing Li | http://arxiv.org/pdf/2407.17480v4 | None |
2024-11-29 | Learning Visual Abstract Reasoning through Dual-Stream Networks | 通过双流网络学习视觉抽象推理 | Kai Zhao, Chang Xu, Bailu Si | http://arxiv.org/pdf/2411.19451v1 | None |