[UPDATED!] 2024-11-29 (Update Time)

3DGS

Publication date, English title, Chinese title, Authors, PDF link, Code link
2024-11-29 DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting DROID-Splat:结合端到端SLAM与3D高斯Splatting Christian Homeyer, Leon Begiristain, Christoph Schnörr http://arxiv.org/pdf/2411.17660v2 https://github.com/ChenHoy/DROID-Splat
2024-11-29 DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering DeSplat:用于无干扰渲染的分解高斯喷溅 Yihao Wang, Marcus Klasson, Matias Turkulainen, Shuzhe Wang, Juho Kannala, Arno Solin http://arxiv.org/pdf/2411.19756v1 None
2024-11-29 TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting 基于八叉树的三维高斯分层渲染生成高质量PBR材质:TexGaussian Bojun Xiong, Jialun Liu, Jiakui Hu, Chenming Wu, Jinbo Wu, Xing Liu, Chen Zhao, Errui Ding http://arxiv.org/pdf/2411.19654v1 None
2024-11-29 Gaussian Splashing: Direct Volumetric Rendering Underwater 高斯溅射:水下直接体渲染 Nir Mualem, Roy Amoyal, Oren Freifeld, Derya Akkaynak http://arxiv.org/pdf/2411.19588v1 None
2024-11-29 Tortho-Gaussian: Splatting True Digital Orthophoto Maps Tortho-Gaussian:分层渲染真实数字正射影像图 Xin Wang, Wendi Zhang, Hong Xie, Haibin Ai, Qiangqiang Yuan, Zongqian Zhan http://arxiv.org/pdf/2411.19594v1 None
2024-11-29 Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding 自举高斯聚类以实现视一致性3D场景理解 Wenbo Zhang, Lu Zhang, Ping Hu, Liqian Ma, Yunzhi Zhuge, Huchuan Lu http://arxiv.org/pdf/2411.19551v1 None

3D Vision and Reconstruction

Publication date, English title, Chinese title, Authors, PDF link, Code link
2024-11-29 AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos AlphaTablets:一种用于从单目视频中重建3D平面表示的通用方法 Yuze He, Wang Zhao, Shaohui Liu, Yubin Hu, Yushi Bai, Yu-Hui Wen, Yong-Jin Liu http://arxiv.org/pdf/2411.19950v1 None
2024-11-29 FaVoR: Features via Voxel Rendering for Camera Relocalization 基于体素渲染的相机重定位特征 Vincenzo Polizzi, Marco Cannici, Davide Scaramuzza, Jonathan Kelly http://arxiv.org/pdf/2409.07571v2 None
2024-11-29 Free-form Generation Enhances Challenging Clothed Human Modeling 自由形式生成增强具有挑战性的着装人体建模 Hang Ye, Xiaoxuan Ma, Hai Ci, Wentao Zhu, Yizhou Wang http://arxiv.org/pdf/2411.19942v1 None
2024-11-29 MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds MoSca:通过4D运动支架从非结构化视频中动态高斯融合 Jiahui Lei, Yijia Weng, Adam Harley, Leonidas Guibas, Kostas Daniilidis http://arxiv.org/pdf/2405.17421v2 None
2024-11-29 Quantifying the synthetic and real domain gap in aerial scene understanding 量化空中场景理解中的合成与真实域差距 Alina Marcu http://arxiv.org/pdf/2411.19913v1 None
2024-11-29 SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens SAT-HMR:通过尺度自适应标记实现实时多人3D网格估计 Chi Su, Xiaoxuan Ma, Jiajun Su, Yizhou Wang http://arxiv.org/pdf/2411.19824v1 None
2024-11-29 PerLA: Perceptive 3D Language Assistant 感知3D语言助手:PerLA Guofeng Mei, Wei Lin, Luigi Riz, Yujiao Wu, Fabio Poiesi, Yiming Wang http://arxiv.org/pdf/2411.19774v1 None
2024-11-29 Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models 3D分割模型解释性分析的聚合归因 Maciej Chrabaszcz, Hubert Baniecki, Piotr Komorowski, Szymon Płotka, Przemyslaw Biecek http://arxiv.org/pdf/2407.16653v3 https://github.com/mi2datalab/agg2exp
2024-11-29 MonoPP: Metric-Scaled Self-Supervised Monocular Depth Estimation by Planar-Parallax Geometry in Automotive Applications MonoPP:基于平面视差几何的度量缩放自监督单目深度估计在汽车应用中的研究 Gasser Elazab, Torben Gräber, Michael Unterreiner, Olaf Hellwich http://arxiv.org/pdf/2411.19717v1 None
2024-11-29 GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding GREAT:开放词汇3D物体属性定位的几何-意图协同推理 Yawen Shao, Wei Zhai, Yuhang Yang, Hongchen Luo, Yang Cao, Zheng-Jun Zha http://arxiv.org/pdf/2411.19626v1 None
2024-11-29 Self-Supervised Denoiser Framework 自监督去噪框架 Emilien Valat, Andreas Hauptmann, Ozan Öktem http://arxiv.org/pdf/2411.19593v1 None
2024-11-29 Anatomical Foundation Models for Brain MRIs 脑MRI的解剖基础模型 Carlo Alberto Barbano, Matteo Brunello, Benoit Dufumier, Marco Grangetto http://arxiv.org/pdf/2408.07079v3 https://github.com/EIDOSLAB/AnatCL
2024-11-29 ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration ReconDreamer:通过在线修复构建驾驶场景重建的世界模型 Chaojun Ni, Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Wenkang Qin, Guan Huang, Chen Liu, Yuyin Chen http://arxiv.org/pdf/2411.19548v1 None
2024-11-29 InstantGeoAvatar: Effective Geometry and Appearance Modeling of Animatable Avatars from Monocular Video 即时GeoAvatar:从单目视频中有效建模可动虚拟形象的几何和外观 Alvaro Budria, Adrian Lopez-Rodriguez, Oscar Lorente, Francesc Moreno-Noguer http://arxiv.org/pdf/2411.01512v2 None
2024-11-29 EvaSurf: Efficient View-Aware Implicit Textured Surface Reconstruction EvaSurf:高效视角感知隐式纹理表面重建 Jingnan Gao, Zhuo Chen, Yichao Yan, Bowen Pan, Zhe Wang, Jiangjing Lyu, Xiaokang Yang http://arxiv.org/pdf/2311.09806v4 None
2024-11-29 Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling 全景图:释放零样本单视图3D场景建模 Qirui Wu, Denys Iliash, Daniel Ritchie, Manolis Savva, Angel X. Chang http://arxiv.org/pdf/2411.19492v1 None
2024-11-29 Driving with Prior Maps: Unified Vector Prior Encoding for Autonomous Vehicle Mapping 驾驶与先验地图:自动驾驶车辆制图的统一向量先验编码 Shuang Zeng, Xinyuan Chang, Xinran Liu, Zheng Pan, Xing Wei http://arxiv.org/pdf/2409.05352v3 None
2024-11-29 Blurred LiDAR for Sharper 3D: Robust Handheld 3D Scanning with Diffuse LiDAR and RGB 模糊激光雷达实现更清晰的3D:使用漫反射激光雷达和RGB进行鲁棒的便携式3D扫描 Nikhil Behari, Aaron Young, Siddharth Somasundaram, Tzofi Klinghoffer, Akshat Dave, Ramesh Raskar http://arxiv.org/pdf/2411.19474v1 None
2024-11-29 Robust Bayesian Scene Reconstruction by Leveraging Retrieval-Augmented Priors 利用检索增强先验的鲁棒贝叶斯场景重建 Herbert Wright, Weiming Zhi, Matthew Johnson-Roberson, Tucker Hermans http://arxiv.org/pdf/2411.19461v1 None
2024-11-29 Transientangelo: Few-Viewpoint Surface Reconstruction Using Single-Photon Lidar 瞬变天使:单光子激光雷达的少视点表面重建 Weihan Luo, Anagh Malik, David B. Lindell http://arxiv.org/pdf/2408.12191v4 None
2024-11-29 Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning 多视角等变性通过最小特征微调提升3D对应理解 Yang You, Yixin Li, Congyue Deng, Yue Wang, Leonidas Guibas http://arxiv.org/pdf/2411.19458v1 https://github.com/qq456cvb/3DCorrEnhance

NeRF

Publication date, English title, Chinese title, Authors, PDF link, Code link
2024-11-29 Reanimating Images using Neural Representations of Dynamic Stimuli 利用动态刺激的神经表征重动画像 Jacob Yeung, Andrew F. Luo, Gabriel Sarch, Margaret M. Henderson, Deva Ramanan, Michael J. Tarr http://arxiv.org/pdf/2406.02659v2 None
2024-11-29 $C^{3}$-NeRF: Modeling Multiple Scenes via Conditional-cum-Continual Neural Radiance Fields C³-NeRF:通过条件累积连续神经辐射场建模多个场景 Prajwal Singh, Ashish Tiwari, Gautam Vashishtha, Shanmuganathan Raman http://arxiv.org/pdf/2411.19903v1 None
2024-11-29 ThermoNeRF: Joint RGB and Thermal Novel View Synthesis for Building Facades using Multimodal Neural Radiance Fields 热NeRF:基于多模态神经辐射场联合RGB和热成像的建筑物立面新视角合成 Mariam Hassan, Florent Forest, Olga Fink, Malcolm Mielle http://arxiv.org/pdf/2403.12154v2 None
2024-11-29 Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models 释放数据洪流的力量:关于语言模型指令微调数据评估与选择的全面调查 Yulei Qin, Yuncheng Yang, Pengcheng Guo, Gang Li, Hang Shao, Yuchen Shi, Zihan Xu, Yun Gu http://arxiv.org/pdf/2408.02085v4 https://github.com/yuleiqin/fantastic-data-engineering
2024-11-29 LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis 洛基谈:学习细粒度和可泛化的对应关系以增强基于NeRF的说话头合成 Tianqi Li, Ruobing Zheng, Bonan Li, Zicheng Zhang, Meng Wang, Jingdong Chen, Ming Yang http://arxiv.org/pdf/2411.19525v1 None

Image Generation and Editing

Publication date, English title, Chinese title, Authors, PDF link, Code link
2024-11-29 DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation DELT:一种基于多样性的早期晚期训练的数据集蒸馏方法 Zhiqiang Shen, Ammar Sherif, Zeyuan Yin, Shitong Shao http://arxiv.org/pdf/2411.19946v1 https://github.com/VILA-Lab/DELT
2024-11-29 MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks MoTe:学习用于多生成任务的动态-文本扩散模型 Yiming Wu, Wei Ji, Kecheng Zheng, Zicheng Wang, Dong Xu http://arxiv.org/pdf/2411.19786v1 None
2024-11-29 JetFormer: An Autoregressive Generative Model of Raw Images and Text JetFormer:原始图像和文本的自回归生成模型 Michael Tschannen, André Susano Pinto, Alexander Kolesnikov http://arxiv.org/pdf/2411.19722v1 None
2024-11-29 HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning HUPE:基于语义协作学习的启发式水下感知增强 Zengxi Zhang, Zhiying Jiang, Long Ma, Jinyuan Liu, Xin Fan, Risheng Liu http://arxiv.org/pdf/2411.18296v2 https://github.com/ZengxiZhang/HUPE
2024-11-29 Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing 统一注意力图:提升重建和编辑中的图像保真度 Wenyi Mo, Tianyu Zhang, Yalong Bai, Bing Su, Ji-Rong Wen http://arxiv.org/pdf/2411.19652v1 https://github.com/Mowenyii/Uniform-Attention-Maps
2024-11-29 Simultaneous Image-to-Zero and Zero-to-Noise: Diffusion Models with Analytical Image Attenuation 同时进行图像到零和零到噪声:具有解析图像衰减的扩散模型 Yuhang Huang, Zheng Qin, Xinwang Liu, Kai Xu http://arxiv.org/pdf/2306.13720v9 None
2024-11-29 LDA-AQU: Adaptive Query-guided Upsampling via Local Deformable Attention LDA-AQU:基于局部可变形注意力的自适应查询引导上采样 Zewen Du, Zhenjiang Hu, Guiyu Zhao, Ying Jin, Hongbin Ma http://arxiv.org/pdf/2411.19585v1 https://github.com/duzw9311/LDA-AQU
2024-11-29 SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates 超级材质:交互式速率下的物理一致性PBR材质估计 Yijia Hong, Yuan-Chen Guo, Ran Yi, Yulong Chen, Yan-Pei Cao, Lizhuang Ma http://arxiv.org/pdf/2411.17515v3 None
2024-11-29 Contextual Checkerboard Denoise -- A Novel Neural Network-Based Approach for Classification-Aware OCT Image Denoising 上下文棋盘格降噪——一种基于神经网络的分类感知OCT图像降噪新方法 Md. Touhidul Islam, Md. Abtahi M. Chowdhury, Sumaiya Salekin, Aye T. Maung, Akil A. Taki, Hafiz Imtiaz http://arxiv.org/pdf/2411.19549v1 None
2024-11-29 Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook 生成式AI时代深度伪造媒体生成与检测综述与展望 Florinel-Alin Croitoru, Andrei-Iulian Hiji, Vlad Hondru, Nicolae Catalin Ristea, Paul Irofti, Marius Popescu, Cristian Rusu, Radu Tudor Ionescu http://arxiv.org/pdf/2411.19537v1 https://github.com/CroitoruAlin/biodeep
2024-11-29 QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain QUOTA:任何领域的文本到图像模型量化对象 Wenfang Sun, Yingjun Du, Gaowen Liu, Cees G. M. Snoek http://arxiv.org/pdf/2411.19534v1 None
2024-11-29 RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation RAGDiffusion:通过外部知识同化实现忠实布料生成 Xianfeng Tan, Yuhan Li, Wenxiang Shang, Yubo Wu, Jian Wang, Xuanhong Chen, Yi Zhang, Ran Lin http://arxiv.org/pdf/2411.19528v1 None
2024-11-29 Retrieval-guided Cross-view Image Synthesis 基于检索引导的跨视图图像合成 Hongji Yang, Yiru Li, Yingying Zhu http://arxiv.org/pdf/2411.19510v1 None
2024-11-29 An Approach Towards Learning K-means-friendly Deep Latent Representation 一种学习K-means友好深度潜在表示的方法 Debapriya Roy http://arxiv.org/pdf/2411.19496v1 None
2024-11-29 VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model 多模态大型语言模型赋能的通用图像外推:VIP Jinze Yang, Haoran Wang, Zining Zhu, Chenglong Liu, Meng Wymond Wu, Mingming Sun http://arxiv.org/pdf/2406.01059v3 None
2024-11-29 Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis Ditto:可控实时说话人头合成中的运动空间扩散 Tianqi Li, Ruobing Zheng, Minghui Yang, Jingdong Chen, Ming Yang http://arxiv.org/pdf/2411.19509v1 None
2024-11-29 Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis 分层渲染扩散模型用于可控零样本图像合成 Zipeng Qi, Guoxi Huang, Chenyang Liu, Fei Ye http://arxiv.org/pdf/2311.18435v2 None
2024-11-29 ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection 伪造侦探:赋能多模态大型语言模型进行图像篡改检测 Zhihao Sun, Haoran Jiang, Haoran Chen, Yixin Cao, Xipeng Qiu, Zuxuan Wu, Yu-Gang Jiang http://arxiv.org/pdf/2411.19466v1 None
2024-11-29 Fleximo: Towards Flexible Text-to-Human Motion Video Generation Fleximo:迈向灵活的文本到人类动作视频生成 Yuhang Zhang, Yuan Zhou, Zeyu Liu, Yuxuan Cai, Qiuyue Wang, Aidong Men, Huan Yang http://arxiv.org/pdf/2411.19459v1 None
2024-11-29 AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea AnyEdit:掌握适用于任何想法的统一高质量图像编辑 Qifan Yu, Wei Chow, Zhongqi Yue, Kaihang Pan, Yang Wu, Xiaoyang Wan, Juncheng Li, Siliang Tang http://arxiv.org/pdf/2411.15738v2 None
2024-11-29 PromptHSI: Universal Hyperspectral Image Restoration Framework for Composite Degradation PromptHSI:通用高光谱图像复合退化恢复框架 Chia-Ming Lee, Ching-Heng Cheng, Yu-Fan Lin, Yi-Ching Cheng, Wo-Ting Liao, Chih-Chung Hsu, Fu-En Yang, Yu-Chiang Frank Wang http://arxiv.org/pdf/2411.15922v2 None

Multimodal Learning

Publication date, English title, Chinese title, Authors, PDF link, Code link
2024-11-29 SignLLM: Sign Language Production Large Language Models SignLLM:手语生成大型语言模型 Sen Fang, Lei Wang, Ce Zheng, Chunyu Sui, Mingyu Zhao, Yapeng Tian, Chen Chen http://arxiv.org/pdf/2405.10718v2 None
2024-11-29 VLSBench: Unveiling Visual Leakage in Multimodal Safety VLSBench:揭示多模态安全中的视觉泄露 Xuhao Hu, Dongrui Liu, Hao Li, Xuanjing Huang, Jing Shao http://arxiv.org/pdf/2411.19939v1 None
2024-11-29 On Domain-Specific Post-Training for Multimodal Large Language Models 特定领域后训练的多模态大型语言模型 Daixuan Cheng, Shaohan Huang, Ziyu Zhu, Xintong Zhang, Wayne Xin Zhao, Zhongzhi Luan, Bo Dai, Zhenliang Zhang http://arxiv.org/pdf/2411.19930v1 None
2024-11-29 A Survey on Multimodal Large Language Models 多模态大型语言模型综述 Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen http://arxiv.org/pdf/2306.13549v4 https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models
2024-11-29 CLIPArTT: Adaptation of CLIP to New Domains at Test Time CLIPArTT:测试时将CLIP适应新领域的自适应方法 Gustavo Adolfo Vargas Hakim, David Osowiechi, Mehrdad Noori, Milad Cheraghalikhani, Ali Bahri, Moslem Yazdanpanah, Ismail Ben Ayed, Christian Desrosiers http://arxiv.org/pdf/2405.00754v2 https://github.com/dosowiechi/CLIPArTT.git
2024-11-29 Multimodal Whole Slide Foundation Model for Pathology 多模态全切片病理学基础模型 Tong Ding, Sophia J. Wagner, Andrew H. Song, Richard J. Chen, Ming Y. Lu, Andrew Zhang, Anurag J. Vaidya, Guillaume Jaume http://arxiv.org/pdf/2411.19666v1 None
2024-11-29 Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings 加速多模态大型语言模型:动态视觉-标记退出与实证发现 Qiong Wu, Wenhao Lin, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji http://arxiv.org/pdf/2411.19628v1 https://github.com/DoubtedSteam/DyVTE
2024-11-29 GalLoP: Learning Global and Local Prompts for Vision-Language Models GalLoP:学习全局和局部提示以用于视觉语言模型 Marc Lafon, Elias Ramzi, Clément Rambour, Nicolas Audebert, Nicolas Thome http://arxiv.org/pdf/2407.01400v2 https://github.com/MarcLafon/gallop
2024-11-29 EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing 地球标记器:一种用于遥感的多模态视觉提示大型语言模型 Wei Zhang, Miaoxin Cai, Tong Zhang, Jun Li, Yin Zhuang, Xuerui Mao http://arxiv.org/pdf/2407.13596v3 https://github.com/wivizhang/EarthMarker
2024-11-29 Combining inherent knowledge of vision-language models with unsupervised domain adaptation through strong-weak guidance 结合视觉-语言模型内在知识与通过强-弱指导的无监督领域自适应 Thomas Westfechtel, Dexuan Zhang, Tatsuya Harada http://arxiv.org/pdf/2312.04066v4 None
2024-11-29 Interleaved-Modal Chain-of-Thought 交错模态思维链 Jun Gao, Yongqi Li, Ziqiang Cao, Wenjie Li http://arxiv.org/pdf/2411.19488v1 None
2024-11-29 A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models 医学多模态大型语言模型的频谱评估基准 Jie Liu, Wenxuan Wang, Yihang Su, Jingyuan Huan, Wenting Chen, Yudi Zhang, Cheng-Yi Li, Kao-Jung Chang http://arxiv.org/pdf/2402.11217v2 None

Object Detection and Segmentation

Publication date, English title, Chinese title, Authors, PDF link, Code link
2024-11-29 SEDMamba: Enhancing Selective State Space Modelling with Bottleneck Mechanism and Fine-to-Coarse Temporal Fusion for Efficient Error Detection in Robot-Assisted Surgery SEDMamba:通过瓶颈机制和细到粗时间融合增强选择性状态空间建模,以实现机器人辅助手术中高效错误检测 Jialang Xu, Nazir Sirajudeen, Matthew Boal, Nader Francis, Danail Stoyanov, Evangelos Mazomenos http://arxiv.org/pdf/2406.15920v4 https://github.com/wzjialang/SEDMamba
2024-11-29 FlowCLAS: Enhancing Normalizing Flow Via Contrastive Learning For Anomaly Segmentation FlowCLAS:通过对比学习增强归一化流用于异常分割 Chang Won Lee, Selina Leveugle, Svetlana Stolpner, Chris Langley, Paul Grouchy, Jonathan Kelly, Steven L. Waslander http://arxiv.org/pdf/2411.19888v1 None
2024-11-29 A Visual-inertial Localization Algorithm using Opportunistic Visual Beacons and Dead-Reckoning for GNSS-Denied Large-scale Applications 基于机会视觉信标和航位推算的GNSS拒绝场景下大规模应用的视觉惯性定位算法 Liqiang Zhang, Ye Tian, Dongyan Wei http://arxiv.org/pdf/2411.19845v1 None
2024-11-29 Feedback-driven object detection and iterative model improvement 反馈驱动的目标检测与迭代模型改进 Sönke Tenckhoff, Mario Koddenbrock, Erik Rodner http://arxiv.org/pdf/2411.19835v1 None
2024-11-29 SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection 稀疏雷达-相机融合用于3D目标检测 Philipp Wolters, Johannes Gilg, Torben Teepe, Fabian Herzog, Felix Fent, Gerhard Rigoll http://arxiv.org/pdf/2411.19860v1 https://github.com/phi-wol/sparc
2024-11-29 Towards Class-wise Robustness Analysis 朝向类别鲁棒性分析 Tejaswini Medi, Julia Grabinski, Margret Keuper http://arxiv.org/pdf/2411.19853v1 None
2024-11-29 Efficient Text-driven Motion Generation via Latent Consistency Training 基于潜在一致性训练的高效文本驱动运动生成 Mengxian Hu, Minghao Zhu, Xun Zhou, Qingqing Yan, Shu Li, Chengju Liu, Qijun Chen http://arxiv.org/pdf/2405.02791v3 None
2024-11-29 Gaussian multi-target filtering with target dynamics driven by a stochastic differential equation 高斯多目标滤波:由随机微分方程驱动的目标动力学 Ángel F. García-Fernández, Simo Särkkä http://arxiv.org/pdf/2411.19814v1 None
2024-11-29 Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation 超球面空间中参数高效的开放词汇语义分割微调 Zelin Peng, Zhengqin Xu, Zhilin Zeng, Yaoming Wang, Wei Shen http://arxiv.org/pdf/2405.18840v2 None
2024-11-29 Image segmentation of treated and untreated tumor spheroids by Fully Convolutional Networks 基于全卷积网络的已处理与未处理肿瘤球体图像分割 Matthias Streller, Soňa Michlíková, Willy Ciecior, Katharina Lönnecke, Leoni A. Kunz-Schughart, Steffen Lange, Anja Voss-Böhme http://arxiv.org/pdf/2405.01105v2 None
2024-11-29 P2PFormer: A Primitive-to-polygon Method for Regular Building Contour Extraction from Remote Sensing Images P2PFormer:一种从遥感图像中提取规则建筑轮廓的原始到多边形方法 Tao Zhang, Shiqing Wei, Yikang Zhou, Muying Luo, Wenling You, Shunping Ji http://arxiv.org/pdf/2406.02930v2 None
2024-11-29 Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models 双风险最小化:迈向零样本模型微调的下一级鲁棒性 Kaican Li, Weiyan Xie, Yongxiang Huang, Didan Deng, Lanqing Hong, Zhenguo Li, Ricardo Silva, Nevin L. Zhang http://arxiv.org/pdf/2411.19757v1 https://github.com/vaynexie/DRM
2024-11-29 LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References LaVIDE:一种用于检测带有地图参考的卫星图像变化的语言-视觉判别器 Shuguo Jiang, Fang Xu, Sen Jia, Gui-Song Xia http://arxiv.org/pdf/2411.19758v1 None
2024-11-29 Domain-Adaptive Pre-training of Self-Supervised Foundation Models for Medical Image Classification in Gastrointestinal Endoscopy 胃肠内镜医学图像分类的自监督基础模型领域自适应预训练 Marcel Roth, Micha V. Nowak, Adrian Krenzer, Frank Puppe http://arxiv.org/pdf/2410.21302v3 None
2024-11-29 A Multi-Loss Strategy for Vehicle Trajectory Prediction: Combining Off-Road, Diversity, and Directional Consistency Losses 多损失策略用于车辆轨迹预测:结合越野、多样性和方向一致性损失 Ahmad Rahimi, Alexandre Alahi http://arxiv.org/pdf/2411.19747v1 https://github.com/vita-epfl/stay-on-track
2024-11-29 Real-Time Anomaly Detection in Video Streams 实时视频流中的异常检测 Fabien Poirier http://arxiv.org/pdf/2411.19731v1 None
2024-11-29 Forensics Adapter: Adapting CLIP for Generalizable Face Forgery Detection 法医适配器:将CLIP适配于通用人脸伪造检测 Xinjie Cui, Yuezun Li, Ao Luo, Jiaran Zhou, Junyu Dong http://arxiv.org/pdf/2411.19715v1 None
2024-11-29 SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks 医学视觉问答任务中鲁棒性评估的系统理解:SURE-VQA Kim-Celine Kahl, Selen Erkan, Jeremias Traub, Carsten T. Lüth, Klaus Maier-Hein, Lena Maier-Hein, Paul F. Jaeger http://arxiv.org/pdf/2411.19688v1 https://github.com/IML-DKFZ/sure-vqa
2024-11-29 CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation CogACT:机器人操作中协同认知与动作的基础视觉-语言-动作模型 Qixiu Li, Yaobo Liang, Zeyu Wang, Lin Luo, Xi Chen, Mozheng Liao, Fangyun Wei, Yu Deng http://arxiv.org/pdf/2411.19650v1 None
2024-11-29 FairDD: Fair Dataset Distillation via Synchronized Matching 公平数据蒸馏:通过同步匹配实现 Qihang Zhou, Shenhao Fang, Shibo He, Wenchao Meng, Jiming Chen http://arxiv.org/pdf/2411.19623v1 None
2024-11-29 Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting 注意提示:基于提示的类无关计数的新基准 Luca Ciampi, Nicola Messina, Matteo Pierucci, Giuseppe Amato, Marco Avvenuti, Fabrizio Falchi http://arxiv.org/pdf/2409.15953v2 https://github.com/ciampluca/PrACo
2024-11-29 A Comprehensive Framework for Automated Segmentation of Perivascular Spaces in Brain MRI with the nnU-Net 基于nnU-Net的脑部MRI血管周围间隙自动分割的全面框架 William Pham, Alexander Jarema, Donggyu Rim, Zhibin Chen, Mohamed S. H. Khlif, Vaughan G. Macefield, Luke A. Henderson, Amy Brodtmann http://arxiv.org/pdf/2411.19564v1 None
2024-11-29 RadioActive: 3D Radiological Interactive Segmentation Benchmark 放射性:3D放射学交互式分割基准 Constantin Ulrich, Tassilo Wald, Emily Tempus, Maximilian Rokuss, Paul F. Jaeger, Klaus Maier-Hein http://arxiv.org/pdf/2411.07885v2 None
2024-11-29 Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph 基于骨架的空间-时间全景图组活动识别 Zhengcen Li, Xinle Chang, Yueran Li, Jingyong Su http://arxiv.org/pdf/2407.19497v2 https://github.com/mgiant/MP-GCN
2024-11-29 SkelMamba: A State Space Model for Efficient Skeleton Action Recognition of Neurological Disorders SkelMamba:一种用于神经疾病高效骨骼动作识别的状态空间模型 Niki Martinel, Mariano Serrao, Christian Micheloni http://arxiv.org/pdf/2411.19544v1 None
2024-11-29 Enhancing AI microscopy for foodborne bacterial classification via adversarial domain adaptation across optical and biological variability 通过跨光学和生物变异的对抗性领域自适应增强食品传播细菌分类的AI显微镜 Siddhartha Bhattacharya, Aarham Wasit, Mason Earles, Nitin Nitin, Luyao Ma, Jiyoon Yi http://arxiv.org/pdf/2411.19514v1 None
2024-11-29 Effective Fine-Tuning of Vision-Language Models for Accurate Galaxy Morphology Analysis 有效的视觉-语言模型微调以提高精确的星系形态分析 Ruoqi Wang, Haitao Wang, Qiong Luo http://arxiv.org/pdf/2411.19475v1 None
2024-11-29 ARN-LSTM: A Multi-Stream Fusion Model for Skeleton-based Action Recognition ARN-LSTM:基于骨架的动作识别的多流融合模型 Chuanchuan Wang, Ahmad Sufril Azlan Mohmamed, Mohd Halim Bin Mohd Noor, Xiao Yang, Feifan Yi, Xiang Li http://arxiv.org/pdf/2411.01769v2 None
2024-11-29 FLARE: Towards Universal Dataset Purification against Backdoor Attacks FLARE:面向对抗后门攻击的通用数据集净化 Linshan Hou, Wei Luo, Zhongyun Hua, Songhua Chen, Leo Yu Zhang, Yiming Li http://arxiv.org/pdf/2411.19479v1 None
2024-11-29 Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning 可调自感知的SAM通过自感知调优的提示式异常分割 Hui-Yue Yang, Hui Chen, Ao Wang, Kai Chen, Zijia Lin, Yongliang Tang, Pengcheng Gao, Yuming Quan http://arxiv.org/pdf/2411.17217v3 None
2024-11-29 Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification 高效的大视觉-语言模型微调用于细粒度船舶分类 Long Lan, Fengxiang Wang, Xiangtao Zheng, Zengmao Wang, Xinwang Liu http://arxiv.org/pdf/2403.08271v2 None
2024-11-29 Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks 避开危害:防御视觉语言模型免受越狱攻击的自适应方法 Han Wang, Gang Wang, Huan Zhang http://arxiv.org/pdf/2411.16721v2 None
2024-11-29 Adaptive Interactive Segmentation for Multimodal Medical Imaging via Selection Engine 自适应交互式多模态医学图像分割通过选择引擎 Zhi Li, Kai Zhao, Yaqi Wang, Shuai Wang http://arxiv.org/pdf/2411.19447v1 None
2024-11-29 Pytorch-Wildlife: A Collaborative Deep Learning Framework for Conservation Pytorch-Wildlife:一种用于保护的协同深度学习框架 Andres Hernandez, Zhongqi Miao, Luisa Vargas, Sara Beery, Rahul Dodhia, Pablo Arbelaez, Juan M. Lavista Ferres http://arxiv.org/pdf/2405.12930v4 https://github.com/microsoft/CameraTraps

Video Understanding and Processing

Publication date, English title, Chinese title, Authors, PDF link, Code link
2024-11-29 Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification 神经符号形式验证在文本到视频模型评估中的应用 S. P. Sharan, Minkyu Choi, Sahil Shah, Harsh Goel, Mohammad Omama, Sandeep Chinchali http://arxiv.org/pdf/2411.16718v2 None
2024-11-29 Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark 感知测试2024:挑战总结与一种新型一小时视频问答基准 Joseph Heyward, João Carreira, Dima Damen, Andrew Zisserman, Viorica Pătrăucean http://arxiv.org/pdf/2411.19941v1 None
2024-11-29 SIMS: Simulating Human-Scene Interactions with Real World Script Planning SIMS:利用现实世界脚本规划模拟人-场景交互 Wenjia Wang, Liang Pan, Zhiyang Dou, Zhouyingcheng Liao, Yuke Lou, Lei Yang, Jingbo Wang, Taku Komura http://arxiv.org/pdf/2411.19921v1 None
2024-11-29 SceneMotion: From Agent-Centric Embeddings to Scene-Wide Forecasts 场景运动:从以代理为中心的嵌入到场景级预测 Royden Wagner, Ömer Sahin Tas, Marlon Steiner, Fabian Konstantinidis, Hendrik Königshof, Marvin Klemp, Carlos Fernandez, Christoph Stiller http://arxiv.org/pdf/2408.01537v3 https://github.com/kit-mrt/future-motion
2024-11-29 Hybrid Architecture for Real-Time Video Anomaly Detection: Integrating Spatial and Temporal Analysis 混合架构实时视频异常检测:集成空间和时间分析 Fabien Poirier http://arxiv.org/pdf/2410.15909v3 None
2024-11-29 Aggregating Nearest Sharp Features via Hybrid Transformers for Video Deblurring 通过混合Transformer聚合最近锐利特征的视频去模糊 Wei Shang, Dongwei Ren, Yi Yang, Wangmeng Zuo http://arxiv.org/pdf/2309.07054v2 https://github.com/shangwei5/STGTN
2024-11-29 VideoDirector: Precise Video Editing via Text-to-Video Models 视频导演:通过文本到视频模型进行精确视频编辑 Yukun Wang, Longguang Wang, Zhiyuan Ma, Qibin Hu, Kai Xu, Yulan Guo http://arxiv.org/pdf/2411.17592v2 None
2024-11-29 LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos LongVALE:面向时间感知长视频的视觉-音频-语言-事件基准 Tiantian Geng, Jinrui Zhang, Qingni Wang, Teng Wang, Jinming Duan, Feng Zheng http://arxiv.org/pdf/2411.19772v1 None
2024-11-29 The Streetscape Application Services Stack (SASS): Towards a Distributed Sensing Architecture for Urban Applications 城市应用分布式感知架构的街道景观应用服务栈(SASS) Navid Salami Pargoo, Mahshid Ghasemi, Shuren Xia, Mehmet Kerem Turkcan, Taqiya Ehsan, Chengbo Zang, Yuan Sun, Javad Ghaderi http://arxiv.org/pdf/2411.19714v1 None
2024-11-29 Combining Pre- and Post-Demosaicking Noise Removal for RAW Video 结合预去马赛克和后去马赛克噪声去除的RAW视频 Marco Sánchez-Beeckman, Antoni Buades, Nicola Brandonisio, Bilel Kanoun http://arxiv.org/pdf/2410.02572v2 None
2024-11-29 Subjective and Objective Quality Assessment Methods of Stereoscopic Videos with Visibility Affecting Distortions 立体视频受可见性影响失真下的主观和客观质量评估方法 Sria Biswas, Balasubramanyam Appina, Priyanka Kokil, Sumohana S Channappayya http://arxiv.org/pdf/2411.19522v1 None
2024-11-29 V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow V2SFlow:基于语音分解和修正流的视频到语音生成 Jeongsoo Choi, Ji-Hoon Kim, Jinyu Li, Joon Son Chung, Shujie Liu http://arxiv.org/pdf/2411.19486v1 None
2024-11-29 Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing 一次查看每一帧:基于多轴梯度检查点的视频-Ma$^2$mba高效长视频理解 Hosu Lee, Junho Kim, Hyunjun Kim, Yong Man Ro http://arxiv.org/pdf/2411.19460v1 None
2024-11-29 MCUCoder: Adaptive Bitrate Learned Video Compression for IoT Devices MCUCoder:适用于物联网设备的自适应码率学习视频压缩 Ali Hojjat, Janek Haberer, Olaf Landsiedel http://arxiv.org/pdf/2411.19442v1 https://github.com/ds-kiel/MCUCoder
2024-11-29 Lifelong Learning of Video Diffusion Models From a Single Video Stream 从单一视频流中终身学习视频扩散模型 Jason Yoo, Yingchen He, Saeid Naderiparizi, Dylan Green, Gido M. van de Ven, Geoff Pleiss, Frank Wood http://arxiv.org/pdf/2406.04814v2 None
2024-11-29 Actions and Objects Pathways for Domain Adaptation in Video Question Answering 视频问答中的领域自适应动作与物体路径 Safaa Abdullahi Moallim Mohamud, Ho-Young Jung http://arxiv.org/pdf/2411.19434v1 None

Other

Publication date, English title, Chinese title, Authors, PDF link, Code link
2024-11-29 Exact Aggregation for Federated and Efficient Fine-Tuning of Foundation Models 精确聚合:联邦和高效微调基础模型 Raghav Singhal, Kaustubh Ponkshe, Praneeth Vepakomma http://arxiv.org/pdf/2410.09432v2 https://github.com/RaghavSinghal10/fedex-lora
2024-11-29 A Comprehensive Content Verification System for ensuring Digital Integrity in the Age of Deep Fakes 全面内容验证系统:确保深度伪造时代数字完整性的综合解决方案 RaviKanth Kaja http://arxiv.org/pdf/2411.19750v1 None
2024-11-29 Explaining the Impact of Training on Vision Models via Activation Clustering 通过激活聚类解释训练对视觉模型的影响 Ahcène Boubekki, Samuel G. Fadel, Sebastian Mair http://arxiv.org/pdf/2411.19700v1 None
2024-11-29 Gated-Attention Feature-Fusion Based Framework for Poverty Prediction 基于门控注意力特征融合的贫困预测框架 Muhammad Umer Ramzan, Wahab Khaddim, Muhammad Ehsan Rana, Usman Ali, Manohar Ali, Fiaz ul Hassan, Fatima Mehmood http://arxiv.org/pdf/2411.19690v1 None
2024-11-29 You Don't Need Domain-Specific Data Augmentations When Scaling Self-Supervised Learning 当扩展自监督学习时,您不需要特定领域的数据增强 Théo Moutakanni, Maxime Oquab, Marc Szafraniec, Maria Vakalopoulou, Piotr Bojanowski http://arxiv.org/pdf/2406.09294v2 None
2024-11-29 Towards Evaluating Generalist Agents: An Automated Benchmark in Open World 迈向通用智能体评估:开放世界中的自动化基准 Xinyue Zheng, Haowei Lin, Kaichen He, Zihao Wang, Zilong Zheng, Yitao Liang http://arxiv.org/pdf/2310.08367v2 None
2024-11-29 Enabling DBSCAN for Very Large-Scale High-Dimensional Spaces 启用DBSCAN处理大规模高维空间 Yongyu Wang http://arxiv.org/pdf/2411.11421v2 None
2024-11-29 Dynamic Universal Approximation Theory: The Basic Theory for Deep Learning-Based Computer Vision Models 动态通用逼近理论:基于深度学习计算机视觉模型的基本理论 Wei Wang, Qing Li http://arxiv.org/pdf/2407.17480v4 None
2024-11-29 Learning Visual Abstract Reasoning through Dual-Stream Networks 通过双流网络学习视觉抽象推理 Kai Zhao, Chang Xu, Bailu Si http://arxiv.org/pdf/2411.19451v1 None