[UPDATED!] 2024-12-15 (Update Time)

3D Perception

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | Modality-Driven Design for Multi-Step Dexterous Manipulation: Insights from Neuroscience | 多模态驱动设计用于多步灵巧操作:神经科学启示 | Naoki Wake, Atsushi Kanehira, Daichi Saito, Jun Takamatsu, Kazuhiro Sasabuchi, Hideki Koike, Katsushi Ikeuchi | http://arxiv.org/pdf/2412.11337v1 | None |
| 2024-12-15 | Learning Normal Flow Directly From Event Neighborhoods | 直接从事件邻域学习正常流 | Dehao Yuan, Levi Burner, Jiayi Wu, Minghui Liu, Jingxi Chen, Yiannis Aloimonos, Cornelia Fermüller | http://arxiv.org/pdf/2412.11284v1 | None |
| 2024-12-15 | GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs | 高斯属性:利用LMMs将物理属性整合到3D高斯中 | Xinli Xu, Wenhang Ge, Dicong Qiu, ZhiFei Chen, Dongyu Yan, Zhuoyun Liu, Haoyu Zhao, Hanfeng Zhao | http://arxiv.org/pdf/2412.11258v1 | None |
| 2024-12-15 | Volumetric Mapping with Panoptic Refinement via Kernel Density Estimation for Mobile Robots | 基于核密度估计的移动机器人全景细化体积映射 | Khang Nguyen, Tuan Dang, Manfred Huber | http://arxiv.org/pdf/2412.11241v1 | https://github.com/mkhangg/refined |
| 2024-12-15 | ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction | ViPOcc:利用视觉先验知识从视觉基础模型中进行单视图3D占用预测 | Yi Feng, Yu Han, Xijing Zhang, Tanghui Li, Yanting Zhang, Rui Fan | http://arxiv.org/pdf/2412.11210v1 | None |
| 2024-12-15 | GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control | GEM:一种可泛化的自我视觉多模态世界模型,用于精细粒度的自我运动、物体动态和场景构图控制 | Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Pedro M B Rezende, Yasaman Haghighi, David Brüggemann, Isinsu Katircioglu, Lin Zhang | http://arxiv.org/pdf/2412.11198v1 | None |
| 2024-12-15 | AURORA: Automated Unleash of 3D Room Outlines for VR Applications | AURORA:面向VR应用的自动释放3D房间轮廓 | Huijun Han, Yongqing Liang, Yuanlong Zhou, Wenping Wang, Edgar J. Rojas-Munoz, Xin Li | http://arxiv.org/pdf/2412.11033v1 | None |

3D Reconstruction

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | Sonicmesh: Enhancing 3D Human Mesh Reconstruction in Vision-Impaired Environments With Acoustic Signals | 声网:利用声学信号增强视觉障碍环境中的3D人体网格重建 | Xiaoxuan Liang, Wuyang Zhang, Hong Zhou, Zhaolong Wei, Sicheng Zhu, Yansong Li, Rui Yin, Jiantao Yuan | http://arxiv.org/pdf/2412.11325v1 | None |
| 2024-12-15 | OTLRM: Orthogonal Learning-based Low-Rank Metric for Multi-Dimensional Inverse Problems | 正交学习基低秩度量多维逆问题 | Xiangming Wang, Haijin Zeng, Jiaoyang Chen, Sheng Liu, Yongyong Chen, Guoqing Chao | http://arxiv.org/pdf/2412.11165v1 | None |
| 2024-12-15 | A Digitalized Atlas for Pulmonary Airway | 数字化肺气道图谱 | Minghui Zhang, Chenyu Li, Hanxiao Zhang, Yaoyu Liu, Yun Gu | http://arxiv.org/pdf/2412.11039v1 | None |

NeRF

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | Macro2Micro: Cross-modal Magnetic Resonance Imaging Synthesis Leveraging Multi-scale Brain Structures | 宏观到微观:利用多尺度脑结构的跨模态磁共振成像合成 | Sooyoung Kim, Joonwoo Kwon, Junbeom Kwon, Sangyoon Bae, Yuewei Lin, Shinjae Yoo, Jiook Cha | http://arxiv.org/pdf/2412.11277v1 | None |

Face Recognition/Processing

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | Unimodal and Multimodal Static Facial Expression Recognition for Virtual Reality Users with EmoHeVRDB | 单模态和多模态静态面部表情识别:针对虚拟现实用户的EmoHeVRDB | Thorben Ortmann, Qi Wang, Larissa Putzar | http://arxiv.org/pdf/2412.11306v1 | None |
| 2024-12-15 | Facial Surgery Preview Based on the Orthognathic Treatment Prediction | 基于正颌治疗预测的面部整形预览 | Huijun Han, Congyi Zhang, Lifeng Zhu, Pradeep Singh, Richard Tai Chiu Hsung, Yiu Yan Leung, Taku Komura, Wenping Wang | http://arxiv.org/pdf/2412.11045v1 | None |

Action Recognition

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | A Comprehensive Survey of Action Quality Assessment: Method and Benchmark | 动作质量评估综述:方法与基准 | Kanglei Zhou, Ruizhi Cai, Liyuan Wang, Hubert P. H. Shum, Xiaohui Liang | http://arxiv.org/pdf/2412.11149v1 | None |

Image Classification

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | On the Generalizability of Iterative Patch Selection for Memory-Efficient High-Resolution Image Classification | 关于迭代补丁选择在内存高效高分辨率图像分类中的泛化性研究 | Max Riffi-Aslett, Christina Fell | http://arxiv.org/pdf/2412.11237v1 | None |

Image Restoration

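| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | Towards Context-aware Convolutional Network for Image Restoration | 面向上下文感知卷积网络进行图像恢复 | Fangwei Hao, Ji Du, Weiyun Liang, Jing Xu, Xiaoxuan Xu | http://arxiv.org/pdf/2412.11008v1 | None |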

Image Generation/Synthesis

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | One-Shot Multilingual Font Generation Via ViT | 一次多语言字体生成通过ViT | Zhiheng Wang, Jiarui Liu | http://arxiv.org/pdf/2412.11342v1 | None |
| 2024-12-15 | Provably Secure Robust Image Steganography via Cross-Modal Error Correction | 基于跨模态误差校正的可证明安全鲁棒图像隐写术 | Yuang Qi, Kejiang Chen, Na Zhao, Zijin Yang, Weiming Zhang | http://arxiv.org/pdf/2412.12206v1 | None |
| 2024-12-15 | GenLit: Reformulating Single-Image Relighting as Video Generation | GenLit:将单图像重光照重新表述为视频生成 | Shrisha Bharadwaj, Haiwen Feng, Victoria Abrevaya, Michael J. Black | http://arxiv.org/pdf/2412.11224v1 | None |
| 2024-12-15 | Distribution-Consistency-Guided Multi-modal Hashing | 分布一致性引导的多模态哈希 | Jin-Yu Liu, Xian-Ling Mao, Tian-Yi Che, Rong-Cheng Tu | http://arxiv.org/pdf/2412.11216v1 | https://github.com/LiuJinyu1229/DCGMH |
| 2024-12-15 | Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation | 轻量级快速文本到动作生成模型:Light-T2M | Ling-An Zeng, Guohong Huang, Gaojie Wu, Wei-Shi Zheng | http://arxiv.org/pdf/2412.11193v1 | https://github.com/qinghuannn/light-t2m |
| 2024-12-15 | OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation | OccScene:基于语义占用跨任务互学习的3D场景生成 | Bohan Li, Xin Jin, Jianan Wang, Yukai Shi, Yasheng Sun, Xiaofeng Wang, Zhuang Ma, Baao Xie | http://arxiv.org/pdf/2412.11183v1 | None |
| 2024-12-15 | Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation | 基准测试与学习用于文本到3D生成的多维度质量评估器 | Yujie Zhang, Bingyang Cui, Qi Yang, Zhu Li, Yiling Xu | http://arxiv.org/pdf/2412.11170v1 | None |
| 2024-12-15 | Combating Multimodal LLM Hallucination via Bottom-up Holistic Reasoning | 对抗多模态LLM幻觉的底层整体推理 | Shengqiong Wu, Hao Fei, Liangming Pan, William Yang Wang, Shuicheng Yan, Tat-Seng Chua | http://arxiv.org/pdf/2412.11124v1 | None |
| 2024-12-15 | Plug-and-Play Priors as a Score-Based Method | 插件式先验作为基于分数的方法 | Chicago Y. Park, Yuyang Hu, Michael T. McCann, Cristina Garcia-Cardona, Brendt Wohlberg, Ulugbek S. Kamilov | http://arxiv.org/pdf/2412.11108v1 | https://github.com/wustl-cig/score_pnp |
| 2024-12-15 | HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation | HC-LLM:基于历史约束的放射学报告生成大型语言模型 | Tengfei Liu, Jiapu Wang, Yongli Hu, Mingjie Li, Junfei Yi, Xiaojun Chang, Junbin Gao, Baocai Yin | http://arxiv.org/pdf/2412.11070v1 | None |
| 2024-12-15 | SHMT: Self-supervised Hierarchical Makeup Transfer via Latent Diffusion Models | SHMT:基于潜在扩散模型的自我监督分层化妆迁移 | Zhaoyang Sun, Shengwu Xiong, Yaxiong Chen, Fei Du, Weihua Chen, Fan Wang, Yi Rong | http://arxiv.org/pdf/2412.11058v1 | https://github.com/Snowfallingplum/SHMT |
| 2024-12-15 | RAC3: Retrieval-Augmented Corner Case Comprehension for Autonomous Driving with Vision-Language Models | RAC3:基于视觉-语言模型的自动驾驶边缘案例理解检索增强 | Yujin Wang, Quanfeng Liu, Jiaqi Fan, Jinlong Hong, Hongqing Chu, Mengjian Tian, Bingzhao Gao, Hong Chen | http://arxiv.org/pdf/2412.11050v1 | None |

Image Editing/Processing

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | Image Forgery Localization with State Space Models | 基于状态空间模型的图像伪造定位 | Zijie Lou, Gang Cao | http://arxiv.org/pdf/2412.11214v1 | None |
| 2024-12-15 | Efficient Quantization-Aware Training on Segment Anything Model in Medical Images and Its Deployment | 高效在医学图像中针对Segment Anything模型进行量化感知训练及其部署 | Haisheng Lu, Yujie Fu, Fan Zhang, Le Zhang | http://arxiv.org/pdf/2412.11186v1 | https://github.com/AVC2-UESTC/QMedSAM |
| 2024-12-15 | Why and How: Knowledge-Guided Learning for Cross-Spectral Image Patch Matching | 为什么以及如何:跨光谱图像块匹配的知识引导学习 | Chuang Yu, Yunpeng Liu, Jinmiao Zhao, Xiangyu Yue | http://arxiv.org/pdf/2412.11161v1 | https://github.com/YuChuang1205/KGL-Net |
| 2024-12-15 | Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing | 双时序逆变换:用于真实图像编辑的训练和调优免逆变换 | Jiancheng Huang, Yi Huang, Jianzhuang Liu, Donghao Zhou, Yifan Liu, Shifeng Chen | http://arxiv.org/pdf/2412.11152v1 | None |
| 2024-12-15 | Unpaired Multi-Domain Histopathology Virtual Staining using Dual Path Prompted Inversion | 无配对多域病理学虚拟染色:基于双路径提示的逆变换 | Bing Xiong, Yue Peng, RanRan Zhang, Fuqiang Chen, JiaYe He, Wenjian Qin | http://arxiv.org/pdf/2412.11106v1 | None |
| 2024-12-15 | Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | 基于原因的检索:训练自由零样本组合图像检索的单阶段反思思维链 | Yuanmin Tang, Xiaoting Qin, Jue Zhang, Jing Yu, Gaopeng Gou, Gang Xiong, Qingwei Ling, Saravan Rajmohan | http://arxiv.org/pdf/2412.11077v1 | https://github.com/Pter61/osrcir2024 |
| 2024-12-15 | From Simple to Professional: A Combinatorial Controllable Image Captioning Agent | 从简单到专业:一种组合可控图像描述生成代理 | Xinran Wang, Muxi Diao, Baoteng Li, Haiwen Zhang, Kongming Liang, Zhanyu Ma | http://arxiv.org/pdf/2412.11025v1 | https://github.com/xin-ran-w/CapAgent |

Scene Understanding

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation | 场景LLM:用于动态场景图生成的LLM中的隐式语言推理 | Hang Zhang, Zhuoling Li, Jun Liu | http://arxiv.org/pdf/2412.11026v1 | None |

Instance Segmentation

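| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | SAM-IF: Leveraging SAM for Incremental Few-Shot Instance Segmentation | SAM-IF:利用SAM进行增量小样本实例分割 | Xudong Zhou, Wenhao He | http://arxiv.org/pdf/2412.11034v1 | None |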

Rendering

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | Empowering LLMs to Understand and Generate Complex Vector Graphics | 赋能大型语言模型理解和生成复杂矢量图形 | Ximing Xing, Juncheng Hu, Guotao Liang, Jing Zhang, Dong Xu, Qian Yu | http://arxiv.org/pdf/2412.11102v1 | None |

Object Detection

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | Detecting Daily Living Gait Amid Huntington's Disease Chorea using a Foundation Deep Learning Model | 利用基础深度学习模型检测亨廷顿病舞蹈症期间的日常生活步态 | Dafna Schwartz, Lori Quinn, Nora E. Fritz, Lisa M. Muratori, Jeffery M. Hausdorff, Ran Gilad Bachrach | http://arxiv.org/pdf/2412.11286v1 | None |
| 2024-12-15 | From Easy to Hard: Progressive Active Learning Framework for Infrared Small Target Detection with Single Point Supervision | 从易到难:基于单点监督的渐进式主动学习红外小目标检测框架 | Chuang Yu, Jinmiao Zhao, Yunpeng Liu, Sicheng Zhao, Xiangyu Yue | http://arxiv.org/pdf/2412.11154v1 | https://github.com/YuChuang1205/PAL |
| 2024-12-15 | Redefining Normal: A Novel Object-Level Approach for Multi-Object Novelty Detection | 重新定义正常:多对象新颖性检测的一种新颖的物体级方法 | Mohammadreza Salehi, Nikolaos Apostolikas, Efstratios Gavves, Cees G. M. Snoek, Yuki M. Asano | http://arxiv.org/pdf/2412.11148v1 | https://github.com/SMSD75/Redefining_Normal_ACCV24 |
| 2024-12-15 | Deep Spectral Clustering via Joint Spectral Embedding and Kmeans | 深度光谱聚类:通过联合光谱嵌入和K-means | Wengang Guo, Wei Ye | http://arxiv.org/pdf/2412.11080v1 | None |

Self-Supervised Learning

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | Impact of Adversarial Attacks on Deep Learning Model Explainability | 深度学习模型可解释性受对抗攻击的影响 | Gazi Nazia Nur, Mohammad Ahnaf Sadat | http://arxiv.org/pdf/2412.11119v1 | None |
| 2024-12-15 | Adapter-Enhanced Semantic Prompting for Continual Learning | 增强适配器语义提示的持续学习 | Baocai Yin, Ji Zhao, Huajie Jiang, Ningning Hou, Yongli Hu, Amin Beheshti, Ming-Hsuan Yang, Yuankai Qi | http://arxiv.org/pdf/2412.11074v1 | None |

Vision-Language Understanding

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models | 看见森林与树木:利用大型多模态模型解决视觉图和树状数据结构问题 | Sebastian Gutierrez, Irene Hou, Jihye Lee, Kenneth Angelikas, Owen Man, Sophia Mettille, James Prather, Paul Denny | http://arxiv.org/pdf/2412.11088v1 | None |

Video Analysis

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition | Uni-AdaFocus:视频识别的空间-时间动态计算 | Yulin Wang, Haoji Zhang, Yang Yue, Shiji Song, Chao Deng, Junlan Feng, Gao Huang | http://arxiv.org/pdf/2412.11228v1 | None |

Video Generation

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping | 生动面孔:一种基于扩散的高保真视频人脸交换混合框架 | Hao Shao, Shulun Wang, Yang Zhou, Guanglu Song, Dailan He, Shuo Qin, Zhuofan Zong, Bingqi Ma | http://arxiv.org/pdf/2412.11279v1 | None |
| 2024-12-15 | AI-Driven Innovations in Volumetric Video Streaming: A Review | 基于AI的体积视频流创新综述 | Erfan Entezami, Hui Guan | http://arxiv.org/pdf/2412.12208v1 | None |
| 2024-12-15 | DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes | 动态缩放器:全景场景的无缝和可扩展视频生成 | Jinxiu Liu, Shaoheng Lin, Yinxiao Li, Ming-Hsuan Yang | http://arxiv.org/pdf/2412.11100v1 | None |
| 2024-12-15 | Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track | TREC 2024 医学视频问答(MedVidQA)赛道概述 | Deepak Gupta, Dina Demner-Fushman | http://arxiv.org/pdf/2412.11056v1 | None |

Video Tracking

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | Exploring Enhanced Contextual Information for Video-Level Object Tracking | 探索视频级目标跟踪的增强上下文信息 | Ben Kang, Xin Chen, Simiao Lai, Yang Liu, Yi Liu, Dong Wang | http://arxiv.org/pdf/2412.11023v1 | https://github.com/kangben258/MCITrack |

Semantic Segmentation

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | MoRe: Class Patch Attention Needs Regularization for Weakly Supervised Semantic Segmentation | MoRe:弱监督语义分割中类块注意力需要正则化 | Zhiwei Yang, Yucong Meng, Kexue Fu, Shuo Wang, Zhijian Song | http://arxiv.org/pdf/2412.11076v1 | https://github.com/zwyang6/MoRe |
| 2024-12-15 | Classification Drives Geographic Bias in Street Scene Segmentation | 街景分割中的地理偏差驱动分类 | Rahul Nair, Gabriel Tseng, Esther Rolf, Bhanu Tokas, Hannah Kerner | http://arxiv.org/pdf/2412.11061v1 | None |

Other

| Publication date | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-15 | Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal | 绘制界限:通过拒绝的力量增强多模态语言模型的可靠性 | Yuhao Wang, Zhiyuan Zhu, Heyang Liu, Yusheng Liao, Hongcheng Liu, Yanfeng Wang, Yu Wang | http://arxiv.org/pdf/2412.11196v1 | None |
| 2024-12-15 | Making Bias Amplification in Balanced Datasets Directional and Interpretable | 在平衡数据集中使偏差放大方向化和可解释 | Bhanu Tokas, Rahul Nair, Hannah Kerner | http://arxiv.org/pdf/2412.11060v1 | None |