Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | Modality-Driven Design for Multi-Step Dexterous Manipulation: Insights from Neuroscience | 多模态驱动设计用于多步灵巧操作:神经科学启示 | Naoki Wake, Atsushi Kanehira, Daichi Saito, Jun Takamatsu, Kazuhiro Sasabuchi, Hideki Koike, Katsushi Ikeuchi | http://arxiv.org/pdf/2412.11337v1 | None |
2024-12-15 | Learning Normal Flow Directly From Event Neighborhoods | 直接从事件邻域学习法向流 | Dehao Yuan, Levi Burner, Jiayi Wu, Minghui Liu, Jingxi Chen, Yiannis Aloimonos, Cornelia Fermüller | http://arxiv.org/pdf/2412.11284v1 | None |
2024-12-15 | GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs | 高斯属性:利用LMMs将物理属性整合到3D高斯中 | Xinli Xu, Wenhang Ge, Dicong Qiu, ZhiFei Chen, Dongyu Yan, Zhuoyun Liu, Haoyu Zhao, Hanfeng Zhao | http://arxiv.org/pdf/2412.11258v1 | None |
2024-12-15 | Volumetric Mapping with Panoptic Refinement via Kernel Density Estimation for Mobile Robots | 基于核密度估计的移动机器人全景细化体积映射 | Khang Nguyen, Tuan Dang, Manfred Huber | http://arxiv.org/pdf/2412.11241v1 | https://github.com/mkhangg/refined |
2024-12-15 | ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction | ViPOcc:利用视觉基础模型的视觉先验进行单视图3D占用预测 | Yi Feng, Yu Han, Xijing Zhang, Tanghui Li, Yanting Zhang, Rui Fan | http://arxiv.org/pdf/2412.11210v1 | None |
2024-12-15 | GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control | GEM:一种可泛化的自我视觉多模态世界模型,用于精细粒度的自我运动、物体动态和场景构图控制 | Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Pedro M B Rezende, Yasaman Haghighi, David Brüggemann, Isinsu Katircioglu, Lin Zhang | http://arxiv.org/pdf/2412.11198v1 | None |
2024-12-15 | AURORA: Automated Unleash of 3D Room Outlines for VR Applications | AURORA:面向VR应用的自动释放3D房间轮廓 | Huijun Han, Yongqing Liang, Yuanlong Zhou, Wenping Wang, Edgar J. Rojas-Munoz, Xin Li | http://arxiv.org/pdf/2412.11033v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | Sonicmesh: Enhancing 3D Human Mesh Reconstruction in Vision-Impaired Environments With Acoustic Signals | 声网:利用声学信号增强视觉障碍环境中的3D人体网格重建 | Xiaoxuan Liang, Wuyang Zhang, Hong Zhou, Zhaolong Wei, Sicheng Zhu, Yansong Li, Rui Yin, Jiantao Yuan | http://arxiv.org/pdf/2412.11325v1 | None |
2024-12-15 | OTLRM: Orthogonal Learning-based Low-Rank Metric for Multi-Dimensional Inverse Problems | 正交学习基低秩度量多维逆问题 | Xiangming Wang, Haijin Zeng, Jiaoyang Chen, Sheng Liu, Yongyong Chen, Guoqing Chao | http://arxiv.org/pdf/2412.11165v1 | None |
2024-12-15 | A Digitalized Atlas for Pulmonary Airway | 数字化肺气道图谱 | Minghui Zhang, Chenyu Li, Hanxiao Zhang, Yaoyu Liu, Yun Gu | http://arxiv.org/pdf/2412.11039v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | Macro2Micro: Cross-modal Magnetic Resonance Imaging Synthesis Leveraging Multi-scale Brain Structures | 宏观到微观:利用多尺度脑结构的跨模态磁共振成像合成 | Sooyoung Kim, Joonwoo Kwon, Junbeom Kwon, Sangyoon Bae, Yuewei Lin, Shinjae Yoo, Jiook Cha | http://arxiv.org/pdf/2412.11277v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | Unimodal and Multimodal Static Facial Expression Recognition for Virtual Reality Users with EmoHeVRDB | 单模态和多模态静态面部表情识别:针对虚拟现实用户的EmoHeVRDB | Thorben Ortmann, Qi Wang, Larissa Putzar | http://arxiv.org/pdf/2412.11306v1 | None |
2024-12-15 | Facial Surgery Preview Based on the Orthognathic Treatment Prediction | 基于正颌治疗预测的面部整形预览 | Huijun Han, Congyi Zhang, Lifeng Zhu, Pradeep Singh, Richard Tai Chiu Hsung, Yiu Yan Leung, Taku Komura, Wenping Wang | http://arxiv.org/pdf/2412.11045v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | A Comprehensive Survey of Action Quality Assessment: Method and Benchmark | 动作质量评估综述:方法与基准 | Kanglei Zhou, Ruizhi Cai, Liyuan Wang, Hubert P. H. Shum, Xiaohui Liang | http://arxiv.org/pdf/2412.11149v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | On the Generalizability of Iterative Patch Selection for Memory-Efficient High-Resolution Image Classification | 关于迭代补丁选择在内存高效高分辨率图像分类中的泛化性研究 | Max Riffi-Aslett, Christina Fell | http://arxiv.org/pdf/2412.11237v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | Towards Context-aware Convolutional Network for Image Restoration | 面向上下文感知卷积网络进行图像恢复 | Fangwei Hao, Ji Du, Weiyun Liang, Jing Xu, Xiaoxuan Xu | http://arxiv.org/pdf/2412.11008v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | One-Shot Multilingual Font Generation Via ViT | 基于ViT的单样本多语言字体生成 | Zhiheng Wang, Jiarui Liu | http://arxiv.org/pdf/2412.11342v1 | None |
2024-12-15 | Provably Secure Robust Image Steganography via Cross-Modal Error Correction | 基于跨模态误差校正的可证明安全鲁棒图像隐写术 | Yuang Qi, Kejiang Chen, Na Zhao, Zijin Yang, Weiming Zhang | http://arxiv.org/pdf/2412.12206v1 | None |
2024-12-15 | GenLit: Reformulating Single-Image Relighting as Video Generation | GenLit:将单图像重光照重新表述为视频生成 | Shrisha Bharadwaj, Haiwen Feng, Victoria Abrevaya, Michael J. Black | http://arxiv.org/pdf/2412.11224v1 | None |
2024-12-15 | Distribution-Consistency-Guided Multi-modal Hashing | 分布一致性引导的多模态哈希 | Jin-Yu Liu, Xian-Ling Mao, Tian-Yi Che, Rong-Cheng Tu | http://arxiv.org/pdf/2412.11216v1 | https://github.com/LiuJinyu1229/DCGMH |
2024-12-15 | Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation | 轻量级快速文本到动作生成模型:Light-T2M | Ling-An Zeng, Guohong Huang, Gaojie Wu, Wei-Shi Zheng | http://arxiv.org/pdf/2412.11193v1 | https://github.com/qinghuannn/light-t2m |
2024-12-15 | OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation | OccScene:基于语义占用跨任务互学习的3D场景生成 | Bohan Li, Xin Jin, Jianan Wang, Yukai Shi, Yasheng Sun, Xiaofeng Wang, Zhuang Ma, Baao Xie | http://arxiv.org/pdf/2412.11183v1 | None |
2024-12-15 | Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation | 基准测试与学习用于文本到3D生成的多维度质量评估器 | Yujie Zhang, Bingyang Cui, Qi Yang, Zhu Li, Yiling Xu | http://arxiv.org/pdf/2412.11170v1 | None |
2024-12-15 | Combating Multimodal LLM Hallucination via Bottom-up Holistic Reasoning | 通过自底向上的整体推理对抗多模态LLM幻觉 | Shengqiong Wu, Hao Fei, Liangming Pan, William Yang Wang, Shuicheng Yan, Tat-Seng Chua | http://arxiv.org/pdf/2412.11124v1 | None |
2024-12-15 | Plug-and-Play Priors as a Score-Based Method | 插件式先验作为基于分数的方法 | Chicago Y. Park, Yuyang Hu, Michael T. McCann, Cristina Garcia-Cardona, Brendt Wohlberg, Ulugbek S. Kamilov | http://arxiv.org/pdf/2412.11108v1 | https://github.com/wustl-cig/score_pnp |
2024-12-15 | HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation | HC-LLM:基于历史约束的放射学报告生成大型语言模型 | Tengfei Liu, Jiapu Wang, Yongli Hu, Mingjie Li, Junfei Yi, Xiaojun Chang, Junbin Gao, Baocai Yin | http://arxiv.org/pdf/2412.11070v1 | None |
2024-12-15 | SHMT: Self-supervised Hierarchical Makeup Transfer via Latent Diffusion Models | SHMT:基于潜在扩散模型的自我监督分层化妆迁移 | Zhaoyang Sun, Shengwu Xiong, Yaxiong Chen, Fei Du, Weihua Chen, Fan Wang, Yi Rong | http://arxiv.org/pdf/2412.11058v1 | https://github.com/Snowfallingplum/SHMT |
2024-12-15 | RAC3: Retrieval-Augmented Corner Case Comprehension for Autonomous Driving with Vision-Language Models | RAC3:基于视觉-语言模型的自动驾驶边缘案例理解检索增强 | Yujin Wang, Quanfeng Liu, Jiaqi Fan, Jinlong Hong, Hongqing Chu, Mengjian Tian, Bingzhao Gao, Hong Chen | http://arxiv.org/pdf/2412.11050v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | Image Forgery Localization with State Space Models | 基于状态空间模型的图像伪造定位 | Zijie Lou, Gang Cao | http://arxiv.org/pdf/2412.11214v1 | None |
2024-12-15 | Efficient Quantization-Aware Training on Segment Anything Model in Medical Images and Its Deployment | 高效在医学图像中针对Segment Anything模型进行量化感知训练及其部署 | Haisheng Lu, Yujie Fu, Fan Zhang, Le Zhang | http://arxiv.org/pdf/2412.11186v1 | https://github.com/AVC2-UESTC/QMedSAM |
2024-12-15 | Why and How: Knowledge-Guided Learning for Cross-Spectral Image Patch Matching | 为什么以及如何:跨光谱图像块匹配的知识引导学习 | Chuang Yu, Yunpeng Liu, Jinmiao Zhao, Xiangyu Yue | http://arxiv.org/pdf/2412.11161v1 | https://github.com/YuChuang1205/KGL-Net |
2024-12-15 | Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing | 双调度反演:用于真实图像编辑的免训练免调优反演 | Jiancheng Huang, Yi Huang, Jianzhuang Liu, Donghao Zhou, Yifan Liu, Shifeng Chen | http://arxiv.org/pdf/2412.11152v1 | None |
2024-12-15 | Unpaired Multi-Domain Histopathology Virtual Staining using Dual Path Prompted Inversion | 无配对多域病理学虚拟染色:基于双路径提示的逆变换 | Bing Xiong, Yue Peng, RanRan Zhang, Fuqiang Chen, JiaYe He, Wenjian Qin | http://arxiv.org/pdf/2412.11106v1 | None |
2024-12-15 | Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | 先推理后检索:面向免训练零样本组合图像检索的单阶段反思思维链 | Yuanmin Tang, Xiaoting Qin, Jue Zhang, Jing Yu, Gaopeng Gou, Gang Xiong, Qingwei Ling, Saravan Rajmohan | http://arxiv.org/pdf/2412.11077v1 | https://github.com/Pter61/osrcir2024 |
2024-12-15 | From Simple to Professional: A Combinatorial Controllable Image Captioning Agent | 从简单到专业:一种组合可控图像描述生成代理 | Xinran Wang, Muxi Diao, Baoteng Li, Haiwen Zhang, Kongming Liang, Zhanyu Ma | http://arxiv.org/pdf/2412.11025v1 | https://github.com/xin-ran-w/CapAgent |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | SceneLLM: Implicit Language Reasoning in LLM for Dynamic Scene Graph Generation | 场景LLM:用于动态场景图生成的LLM中的隐式语言推理 | Hang Zhang, Zhuoling Li, Jun Liu | http://arxiv.org/pdf/2412.11026v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | SAM-IF: Leveraging SAM for Incremental Few-Shot Instance Segmentation | SAM-IF:利用SAM进行增量小样本实例分割 | Xudong Zhou, Wenhao He | http://arxiv.org/pdf/2412.11034v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | Empowering LLMs to Understand and Generate Complex Vector Graphics | 赋能大型语言模型理解和生成复杂矢量图形 | Ximing Xing, Juncheng Hu, Guotao Liang, Jing Zhang, Dong Xu, Qian Yu | http://arxiv.org/pdf/2412.11102v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | Detecting Daily Living Gait Amid Huntington's Disease Chorea using a Foundation Deep Learning Model | 利用基础深度学习模型检测亨廷顿病舞蹈症期间的日常生活步态 | Dafna Schwartz, Lori Quinn, Nora E. Fritz, Lisa M. Muratori, Jeffery M. Hausdorff, Ran Gilad Bachrach | http://arxiv.org/pdf/2412.11286v1 | None |
2024-12-15 | From Easy to Hard: Progressive Active Learning Framework for Infrared Small Target Detection with Single Point Supervision | 从易到难:基于单点监督的渐进式主动学习红外小目标检测框架 | Chuang Yu, Jinmiao Zhao, Yunpeng Liu, Sicheng Zhao, Xiangyu Yue | http://arxiv.org/pdf/2412.11154v1 | https://github.com/YuChuang1205/PAL |
2024-12-15 | Redefining Normal: A Novel Object-Level Approach for Multi-Object Novelty Detection | 重新定义正常:多对象新颖性检测的一种新颖的物体级方法 | Mohammadreza Salehi, Nikolaos Apostolikas, Efstratios Gavves, Cees G. M. Snoek, Yuki M. Asano | http://arxiv.org/pdf/2412.11148v1 | https://github.com/SMSD75/Redefining_Normal_ACCV24 |
2024-12-15 | Deep Spectral Clustering via Joint Spectral Embedding and Kmeans | 深度谱聚类:通过联合谱嵌入和K-means | Wengang Guo, Wei Ye | http://arxiv.org/pdf/2412.11080v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | Impact of Adversarial Attacks on Deep Learning Model Explainability | 深度学习模型可解释性受对抗攻击的影响 | Gazi Nazia Nur, Mohammad Ahnaf Sadat | http://arxiv.org/pdf/2412.11119v1 | None |
2024-12-15 | Adapter-Enhanced Semantic Prompting for Continual Learning | 适配器增强的语义提示用于持续学习 | Baocai Yin, Ji Zhao, Huajie Jiang, Ningning Hou, Yongli Hu, Amin Beheshti, Ming-Hsuan Yang, Yuankai Qi | http://arxiv.org/pdf/2412.11074v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models | 看见森林与树木:利用大型多模态模型解决视觉图和树状数据结构问题 | Sebastian Gutierrez, Irene Hou, Jihye Lee, Kenneth Angelikas, Owen Man, Sophia Mettille, James Prather, Paul Denny | http://arxiv.org/pdf/2412.11088v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition | Uni-AdaFocus:视频识别的空间-时间动态计算 | Yulin Wang, Haoji Zhang, Yang Yue, Shiji Song, Chao Deng, Junlan Feng, Gao Huang | http://arxiv.org/pdf/2412.11228v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping | 生动面孔:一种基于扩散的高保真视频人脸交换混合框架 | Hao Shao, Shulun Wang, Yang Zhou, Guanglu Song, Dailan He, Shuo Qin, Zhuofan Zong, Bingqi Ma | http://arxiv.org/pdf/2412.11279v1 | None |
2024-12-15 | AI-Driven Innovations in Volumetric Video Streaming: A Review | 基于AI的体积视频流创新综述 | Erfan Entezami, Hui Guan | http://arxiv.org/pdf/2412.12208v1 | None |
2024-12-15 | DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes | 动态缩放器:全景场景的无缝和可扩展视频生成 | Jinxiu Liu, Shaoheng Lin, Yinxiao Li, Ming-Hsuan Yang | http://arxiv.org/pdf/2412.11100v1 | None |
2024-12-15 | Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track | TREC 2024 医学视频问答(MedVidQA)赛道概述 | Deepak Gupta, Dina Demner-Fushman | http://arxiv.org/pdf/2412.11056v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | Exploring Enhanced Contextual Information for Video-Level Object Tracking | 探索视频级目标跟踪的增强上下文信息 | Ben Kang, Xin Chen, Simiao Lai, Yang Liu, Yi Liu, Dong Wang | http://arxiv.org/pdf/2412.11023v1 | https://github.com/kangben258/MCITrack |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | MoRe: Class Patch Attention Needs Regularization for Weakly Supervised Semantic Segmentation | MoRe:弱监督语义分割中类块注意力需要正则化 | Zhiwei Yang, Yucong Meng, Kexue Fu, Shuo Wang, Zhijian Song | http://arxiv.org/pdf/2412.11076v1 | https://github.com/zwyang6/MoRe |
2024-12-15 | Classification Drives Geographic Bias in Street Scene Segmentation | 分类驱动街景分割中的地理偏差 | Rahul Nair, Gabriel Tseng, Esther Rolf, Bhanu Tokas, Hannah Kerner | http://arxiv.org/pdf/2412.11061v1 | None |
Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-15 | Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal | 绘制界限:通过拒绝的力量增强多模态语言模型的可靠性 | Yuhao Wang, Zhiyuan Zhu, Heyang Liu, Yusheng Liao, Hongcheng Liu, Yanfeng Wang, Yu Wang | http://arxiv.org/pdf/2412.11196v1 | None |
2024-12-15 | Making Bias Amplification in Balanced Datasets Directional and Interpretable | 在平衡数据集中使偏差放大方向化和可解释 | Bhanu Tokas, Rahul Nair, Hannah Kerner | http://arxiv.org/pdf/2412.11060v1 | None |