[UPDATED!] 2024-12-15 (Update Time)

3D Perception

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	Modality-Driven Design for Multi-Step Dexterous Manipulation: Insights from Neuroscience	多模态驱动设计用于多步灵巧操作：神经科学启示	Naoki Wake, Atsushi Kanehira, Daichi Saito, Jun Takamatsu, Kazuhiro Sasabuchi, Hideki Koike, Katsushi Ikeuchi	http://arxiv.org/pdf/2412.11337v1	None
2024-12-15	Learning Normal Flow Directly From Event Neighborhoods	直接从事件邻域学习正常流	Dehao Yuan, Levi Burner, Jiayi Wu, Minghui Liu, Jingxi Chen, Yiannis Aloimonos, Cornelia Fermüller	http://arxiv.org/pdf/2412.11284v1	None
2024-12-15	GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs	高斯属性：利用LMMs将物理属性整合到3D高斯中	Xinli Xu, Wenhang Ge, Dicong Qiu, ZhiFei Chen, Dongyu Yan, Zhuoyun Liu, Haoyu Zhao, Hanfeng Zhao	http://arxiv.org/pdf/2412.11258v1	None
2024-12-15	Volumetric Mapping with Panoptic Refinement via Kernel Density Estimation for Mobile Robots	基于核密度估计的移动机器人全景细化体积映射	Khang Nguyen, Tuan Dang, Manfred Huber	http://arxiv.org/pdf/2412.11241v1	https://github.com/mkhangg/refined
2024-12-15	ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction	ViPOcc：利用视觉先验知识从视觉基础模型中进行单视图3D占用预测	Yi Feng, Yu Han, Xijing Zhang, Tanghui Li, Yanting Zhang, Rui Fan	http://arxiv.org/pdf/2412.11210v1	None
2024-12-15	GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control	GEM：一种可泛化的自我视觉多模态世界模型，用于精细粒度的自我运动、物体动态和场景构图控制	Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Pedro M B Rezende, Yasaman Haghighi, David Brüggemann, Isinsu Katircioglu, Lin Zhang	http://arxiv.org/pdf/2412.11198v1	None
2024-12-15	AURORA: Automated Unleash of 3D Room Outlines for VR Applications	AURORA：面向VR应用的自动释放3D房间轮廓	Huijun Han, Yongqing Liang, Yuanlong Zhou, Wenping Wang, Edgar J. Rojas-Munoz, Xin Li	http://arxiv.org/pdf/2412.11033v1	None

3D Reconstruction

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	Sonicmesh: Enhancing 3D Human Mesh Reconstruction in Vision-Impaired Environments With Acoustic Signals	声网：利用声学信号增强视觉障碍环境中的3D人体网格重建	Xiaoxuan Liang, Wuyang Zhang, Hong Zhou, Zhaolong Wei, Sicheng Zhu, Yansong Li, Rui Yin, Jiantao Yuan	http://arxiv.org/pdf/2412.11325v1	None
2024-12-15	OTLRM: Orthogonal Learning-based Low-Rank Metric for Multi-Dimensional Inverse Problems	正交学习基低秩度量多维逆问题	Xiangming Wang, Haijin Zeng, Jiaoyang Chen, Sheng Liu, Yongyong Chen, Guoqing Chao	http://arxiv.org/pdf/2412.11165v1	None
2024-12-15	A Digitalized Atlas for Pulmonary Airway	数字化肺气道图谱	Minghui Zhang, Chenyu Li, Hanxiao Zhang, Yaoyu Liu, Yun Gu	http://arxiv.org/pdf/2412.11039v1	None

NeRF

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	Macro2Micro: Cross-modal Magnetic Resonance Imaging Synthesis Leveraging Multi-scale Brain Structures	宏观到微观：利用多尺度脑结构的跨模态磁共振成像合成	Sooyoung Kim, Joonwoo Kwon, Junbeom Kwon, Sangyoon Bae, Yuewei Lin, Shinjae Yoo, Jiook Cha	http://arxiv.org/pdf/2412.11277v1	None

Face Recognition/Processing

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	Unimodal and Multimodal Static Facial Expression Recognition for Virtual Reality Users with EmoHeVRDB	单模态和多模态静态面部表情识别：针对虚拟现实用户的EmoHeVRDB	Thorben Ortmann, Qi Wang, Larissa Putzar	http://arxiv.org/pdf/2412.11306v1	None
2024-12-15	Facial Surgery Preview Based on the Orthognathic Treatment Prediction	基于正颌治疗预测的面部整形预览	Huijun Han, Congyi Zhang, Lifeng Zhu, Pradeep Singh, Richard Tai Chiu Hsung, Yiu Yan Leung, Taku Komura, Wenping Wang	http://arxiv.org/pdf/2412.11045v1	None

Action Recognition

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	A Comprehensive Survey of Action Quality Assessment: Method and Benchmark	动作质量评估综述：方法与基准	Kanglei Zhou, Ruizhi Cai, Liyuan Wang, Hubert P. H. Shum, Xiaohui Liang	http://arxiv.org/pdf/2412.11149v1	None

Image Classification

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	On the Generalizability of Iterative Patch Selection for Memory-Efficient High-Resolution Image Classification	关于迭代补丁选择在内存高效高分辨率图像分类中的泛化性研究	Max Riffi-Aslett, Christina Fell	http://arxiv.org/pdf/2412.11237v1	None

Image Restoration

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	Towards Context-aware Convolutional Network for Image Restoration	面向上下文感知卷积网络进行图像恢复	Fangwei Hao, Ji Du, Weiyun Liang, Jing Xu, Xiaoxuan Xu	http://arxiv.org/pdf/2412.11008v1	None

Image Generation/Synthesis

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	One-Shot Multilingual Font Generation Via ViT	一次多语言字体生成通过ViT	Zhiheng Wang, Jiarui Liu	http://arxiv.org/pdf/2412.11342v1	None
2024-12-15	Provably Secure Robust Image Steganography via Cross-Modal Error Correction	基于跨模态误差校正的可证明安全鲁棒图像隐写术	Yuang Qi, Kejiang Chen, Na Zhao, Zijin Yang, Weiming Zhang	http://arxiv.org/pdf/2412.12206v1	None
2024-12-15	GenLit: Reformulating Single-Image Relighting as Video Generation	GenLit：将单图像重光照重新表述为视频生成	Shrisha Bharadwaj, Haiwen Feng, Victoria Abrevaya, Michael J. Black	http://arxiv.org/pdf/2412.11224v1	None
2024-12-15	Distribution-Consistency-Guided Multi-modal Hashing	分布一致性引导的多模态哈希	Jin-Yu Liu, Xian-Ling Mao, Tian-Yi Che, Rong-Cheng Tu	http://arxiv.org/pdf/2412.11216v1	https://github.com/LiuJinyu1229/DCGMH
2024-12-15	Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation	轻量级快速文本到动作生成模型：Light-T2M	Ling-An Zeng, Guohong Huang, Gaojie Wu, Wei-Shi Zheng	http://arxiv.org/pdf/2412.11193v1	https://github.com/qinghuannn/light-t2m
2024-12-15	OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation	OccScene：基于语义占用跨任务互学习的3D场景生成	Bohan Li, Xin Jin, Jianan Wang, Yukai Shi, Yasheng Sun, Xiaofeng Wang, Zhuang Ma, Baao Xie	http://arxiv.org/pdf/2412.11183v1	None
2024-12-15	Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation	基准测试与学习用于文本到3D生成的多维度质量评估器	Yujie Zhang, Bingyang Cui, Qi Yang, Zhu Li, Yiling Xu	http://arxiv.org/pdf/2412.11170v1	None
2024-12-15	Combating Multimodal LLM Hallucination via Bottom-up Holistic Reasoning	对抗多模态LLM幻觉的底层整体推理	Shengqiong Wu, Hao Fei, Liangming Pan, William Yang Wang, Shuicheng Yan, Tat-Seng Chua	http://arxiv.org/pdf/2412.11124v1	None
2024-12-15	Plug-and-Play Priors as a Score-Based Method	插件式先验作为基于分数的方法	Chicago Y. Park, Yuyang Hu, Michael T. McCann, Cristina Garcia-Cardona, Brendt Wohlberg, Ulugbek S. Kamilov	http://arxiv.org/pdf/2412.11108v1	https://github.com/wustl-cig/score_pnp
2024-12-15	HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation	HC-LLM：基于历史约束的放射学报告生成大型语言模型	Tengfei Liu, Jiapu Wang, Yongli Hu, Mingjie Li, Junfei Yi, Xiaojun Chang, Junbin Gao, Baocai Yin	http://arxiv.org/pdf/2412.11070v1	None
2024-12-15	SHMT: Self-supervised Hierarchical Makeup Transfer via Latent Diffusion Models	SHMT：基于潜在扩散模型的自我监督分层化妆迁移	Zhaoyang Sun, Shengwu Xiong, Yaxiong Chen, Fei Du, Weihua Chen, Fan Wang, Yi Rong	http://arxiv.org/pdf/2412.11058v1	https://github.com/Snowfallingplum/SHMT
2024-12-15	RAC3: Retrieval-Augmented Corner Case Comprehension for Autonomous Driving with Vision-Language Models	RAC3：基于视觉-语言模型的自动驾驶边缘案例理解检索增强	Yujin Wang, Quanfeng Liu, Jiaqi Fan, Jinlong Hong, Hongqing Chu, Mengjian Tian, Bingzhao Gao, Hong Chen	http://arxiv.org/pdf/2412.11050v1	None

Image Editing/Processing

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	Image Forgery Localization with State Space Models	基于状态空间模型的图像伪造定位	Zijie Lou, Gang Cao	http://arxiv.org/pdf/2412.11214v1	None
2024-12-15	Efficient Quantization-Aware Training on Segment Anything Model in Medical Images and Its Deployment	高效在医学图像中针对Segment Anything模型进行量化感知训练及其部署	Haisheng Lu, Yujie Fu, Fan Zhang, Le Zhang	http://arxiv.org/pdf/2412.11186v1	https://github.com/AVC2-UESTC/QMedSAM
2024-12-15	Why and How: Knowledge-Guided Learning for Cross-Spectral Image Patch Matching	为什么以及如何：跨光谱图像块匹配的知识引导学习	Chuang Yu, Yunpeng Liu, Jinmiao Zhao, Xiangyu Yue	http://arxiv.org/pdf/2412.11161v1	https://github.com/YuChuang1205/KGL-Net
2024-12-15	Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing	双时序逆变换：用于真实图像编辑的训练和调优免逆变换	Jiancheng Huang, Yi Huang, Jianzhuang Liu, Donghao Zhou, Yifan Liu, Shifeng Chen	http://arxiv.org/pdf/2412.11152v1	None
2024-12-15	Unpaired Multi-Domain Histopathology Virtual Staining using Dual Path Prompted Inversion	无配对多域病理学虚拟染色：基于双路径提示的逆变换	Bing Xiong, Yue Peng, RanRan Zhang, Fuqiang Chen, JiaYe He, Wenjian Qin	http://arxiv.org/pdf/2412.11106v1	None
2024-12-15	Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval	基于原因的检索：训练自由零样本组合图像检索的单阶段反思思维链	Yuanmin Tang, Xiaoting Qin, Jue Zhang, Jing Yu, Gaopeng Gou, Gang Xiong, Qingwei Ling, Saravan Rajmohan	http://arxiv.org/pdf/2412.11077v1	https://github.com/Pter61/osrcir2024
2024-12-15	From Simple to Professional: A Combinatorial Controllable Image Captioning Agent	从简单到专业：一种组合可控图像描述生成代理	Xinran Wang, Muxi Diao, Baoteng Li, Haiwen Zhang, Kongming Liang, Zhanyu Ma	http://arxiv.org/pdf/2412.11025v1	https://github.com/xin-ran-w/CapAgent

Scene Understanding

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation	场景LLM：用于动态场景图生成的LLM中的隐式语言推理	Hang Zhang, Zhuoling Li, Jun Liu	http://arxiv.org/pdf/2412.11026v1	None

Instance Segmentation

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	SAM-IF: Leveraging SAM for Incremental Few-Shot Instance Segmentation	SAM-IF：利用SAM进行增量小样本实例分割	Xudong Zhou, Wenhao He	http://arxiv.org/pdf/2412.11034v1	None

Rendering

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	Empowering LLMs to Understand and Generate Complex Vector Graphics	赋能大型语言模型理解和生成复杂矢量图形	Ximing Xing, Juncheng Hu, Guotao Liang, Jing Zhang, Dong Xu, Qian Yu	http://arxiv.org/pdf/2412.11102v1	None

Object Detection

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	Detecting Daily Living Gait Amid Huntington's Disease Chorea using a Foundation Deep Learning Model	利用基础深度学习模型检测亨廷顿病舞蹈症期间的日常生活步态	Dafna Schwartz, Lori Quinn, Nora E. Fritz, Lisa M. Muratori, Jeffery M. Hausdorff, Ran Gilad Bachrach	http://arxiv.org/pdf/2412.11286v1	None
2024-12-15	From Easy to Hard: Progressive Active Learning Framework for Infrared Small Target Detection with Single Point Supervision	从易到难：基于单点监督的渐进式主动学习红外小目标检测框架	Chuang Yu, Jinmiao Zhao, Yunpeng Liu, Sicheng Zhao, Xiangyu Yue	http://arxiv.org/pdf/2412.11154v1	https://github.com/YuChuang1205/PAL
2024-12-15	Redefining Normal: A Novel Object-Level Approach for Multi-Object Novelty Detection	重新定义正常：多对象新颖性检测的一种新颖的物体级方法	Mohammadreza Salehi, Nikolaos Apostolikas, Efstratios Gavves, Cees G. M. Snoek, Yuki M. Asano	http://arxiv.org/pdf/2412.11148v1	https://github.com/SMSD75/Redefining_Normal_ACCV24
2024-12-15	Deep Spectral Clustering via Joint Spectral Embedding and Kmeans	深度光谱聚类：通过联合光谱嵌入和K-means	Wengang Guo, Wei Ye	http://arxiv.org/pdf/2412.11080v1	None

Self-Supervised Learning

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	Impact of Adversarial Attacks on Deep Learning Model Explainability	深度学习模型可解释性受对抗攻击的影响	Gazi Nazia Nur, Mohammad Ahnaf Sadat	http://arxiv.org/pdf/2412.11119v1	None
2024-12-15	Adapter-Enhanced Semantic Prompting for Continual Learning	增强适配器语义提示的持续学习	Baocai Yin, Ji Zhao, Huajie Jiang, Ningning Hou, Yongli Hu, Amin Beheshti, Ming-Hsuan Yang, Yuankai Qi	http://arxiv.org/pdf/2412.11074v1	None

Vision-Language Understanding

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models	看见森林与树木：利用大型多模态模型解决视觉图和树状数据结构问题	Sebastian Gutierrez, Irene Hou, Jihye Lee, Kenneth Angelikas, Owen Man, Sophia Mettille, James Prather, Paul Denny	http://arxiv.org/pdf/2412.11088v1	None

Video Analysis

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition	Uni-AdaFocus：视频识别的空间-时间动态计算	Yulin Wang, Haoji Zhang, Yang Yue, Shiji Song, Chao Deng, Junlan Feng, Gao Huang	http://arxiv.org/pdf/2412.11228v1	None

Video Generation

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping	生动面孔：一种基于扩散的高保真视频人脸交换混合框架	Hao Shao, Shulun Wang, Yang Zhou, Guanglu Song, Dailan He, Shuo Qin, Zhuofan Zong, Bingqi Ma	http://arxiv.org/pdf/2412.11279v1	None
2024-12-15	AI-Driven Innovations in Volumetric Video Streaming: A Review	基于AI的体积视频流创新综述	Erfan Entezami, Hui Guan	http://arxiv.org/pdf/2412.12208v1	None
2024-12-15	DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes	动态缩放器：全景场景的无缝和可扩展视频生成	Jinxiu Liu, Shaoheng Lin, Yinxiao Li, Ming-Hsuan Yang	http://arxiv.org/pdf/2412.11100v1	None
2024-12-15	Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track	TREC 2024 医学视频问答（MedVidQA）赛道概述	Deepak Gupta, Dina Demner-Fushman	http://arxiv.org/pdf/2412.11056v1	None

Video Tracking

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	Exploring Enhanced Contextual Information for Video-Level Object Tracking	探索视频级目标跟踪的增强上下文信息	Ben Kang, Xin Chen, Simiao Lai, Yang Liu, Yi Liu, Dong Wang	http://arxiv.org/pdf/2412.11023v1	https://github.com/kangben258/MCITrack

Semantic Segmentation

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	MoRe: Class Patch Attention Needs Regularization for Weakly Supervised Semantic Segmentation	MoRe：弱监督语义分割中类块注意力需要正则化	Zhiwei Yang, Yucong Meng, Kexue Fu, Shuo Wang, Zhijian Song	http://arxiv.org/pdf/2412.11076v1	https://github.com/zwyang6/MoRe
2024-12-15	Classification Drives Geographic Bias in Street Scene Segmentation	街景分割中的地理偏差驱动分类	Rahul Nair, Gabriel Tseng, Esther Rolf, Bhanu Tokas, Hannah Kerner	http://arxiv.org/pdf/2412.11061v1	None

Other

Publication Date	English Title	Chinese Title	Authors	PDF Link	Code Link
2024-12-15	Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal	绘制界限：通过拒绝的力量增强多模态语言模型的可靠性	Yuhao Wang, Zhiyuan Zhu, Heyang Liu, Yusheng Liao, Hongcheng Liu, Yanfeng Wang, Yu Wang	http://arxiv.org/pdf/2412.11196v1	None
2024-12-15	Making Bias Amplification in Balanced Datasets Directional and Interpretable	在平衡数据集中使偏差放大方向化和可解释	Bhanu Tokas, Rahul Nair, Hannah Kerner	http://arxiv.org/pdf/2412.11060v1	None
