
[UPDATED!] 2025-01-27 (Update Time)

Image Understanding

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | 物理世界理解中的视觉-语言模型基准与提升:PhysBench | Wei Chow, Jiageng Mao, Boyi Li, Daniel Seita, Vitor Guizilini, Yue Wang | http://arxiv.org/pdf/2501.16411v2 | None |
| 🆕 New | Multi-view Structural Convolution Network for Domain-Invariant Point Cloud Recognition of Autonomous Vehicles | 多视角结构卷积网络用于自动驾驶车辆领域不变点云识别 | Younggun Kim, Beomsik Cho, Seonghoon Ryoo, Soomok Lee | http://arxiv.org/pdf/2501.16289v1 | https://github.com/MLMLab/MSCN |
| 🆕 New | PDC-ViT: Source Camera Identification using Pixel Difference Convolution and Vision Transformer | PDC-ViT:基于像素差异卷积和视觉Transformer的源相机识别 | Omar Elharrouss, Younes Akbari, Noor Almaadeed, Somaya Al-Maadeed, Fouad Khelifi, Ahmed Bouridane | http://arxiv.org/pdf/2501.16227v1 | None |
| 📝 Updated | Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification | 解读您的决策:视觉分类中的逻辑推理正则化以实现泛化 | Zhaorui Tan, Xi Yang, Qiufeng Wang, Anh Nguyen, Kaizhu Huang | http://arxiv.org/pdf/2410.04492v5 | None |
| 📝 Updated | Dimensions underlying the representational alignment of deep neural networks with humans | 深神经网络与人类表征对齐的潜在维度 | Florian P. Mahner, Lukas Muttenthaler, Umut Güçlü, Martin N. Hebart | http://arxiv.org/pdf/2406.19087v2 | None |
| 📝 Updated | Task Me Anything | 任意任务处理 | Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi et al. | http://arxiv.org/pdf/2406.11775v2 | None |

Detection and Segmentation

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | Efficient Object Detection of Marine Debris using Pruned YOLO Model | 基于剪枝YOLO模型的海洋垃圾高效目标检测 | Abi Aryaza, Novanto Yudistira, Tibyani | http://arxiv.org/pdf/2501.16571v1 | None |
| 🆕 New | Cross-Domain Semantic Segmentation with Large Language Model-Assisted Descriptor Generation | 跨域语义分割:基于大型语言模型辅助描述符生成的技术 | Philip Hughes, Larry Burns, Luke Adams | http://arxiv.org/pdf/2501.16467v1 | None |
| 🆕 New | DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation | DynAlign:无监督跨域分割的动态分类对齐 | Han Sun, Rui Gong, Ismail Nejjar, Olga Fink | http://arxiv.org/pdf/2501.16410v1 | None |
| 🆕 New | Large Models in Dialogue for Active Perception and Anomaly Detection | 大型模型在主动感知和异常检测中的对话 | Tzoulio Chamiti, Nikolaos Passalis, Anastasios Tefas | http://arxiv.org/pdf/2501.16300v1 | None |
| 🆕 New | Lightweight Weighted Average Ensemble Model for Pneumonia Detection in Chest X-Ray Images | 轻量级加权平均集成模型在胸部X光片中的肺炎检测 | Suresh Babu Nettur, Shanthi Karpurapu, Unnati Nettur, Likhit Sagar Gajja, Sravanthy Myneni, Akhil Dusi, Lalithya Posham | http://arxiv.org/pdf/2501.16249v2 | None |
| 🆕 New | The Linear Attention Resurrection in Vision Transformer | 视觉Transformer中的线性注意力复兴 | Chuanyang Zheng | http://arxiv.org/pdf/2501.16182v1 | None |
| 🆕 New | Addressing Out-of-Label Hazard Detection in Dashcam Videos: Insights from the COOOL Challenge | 应对行车记录仪视频中的标签外危险检测:COOOL挑战赛的见解 | Anh-Kiet Duong, Petra Gomez-Krämer | http://arxiv.org/pdf/2501.16037v1 | https://github.com/ffyyytt/COOOL_2025 |
| 🆕 New | Controllable Forgetting Mechanism for Few-Shot Class-Incremental Learning | 可控遗忘机制在少样本类增量学习中的应用 | Kirill Paramonov, Mete Ozay, Eunju Yang, Jijoong Moon, Umberto Michieli | http://arxiv.org/pdf/2501.15998v1 | None |
| 🆕 New | D-PLS: Decoupled Semantic Segmentation for 4D-Panoptic-LiDAR-Segmentation | D-PLS:4D全景激光雷达分割的解耦语义分割 | Maik Steinhauser, Laurenz Reichardt, Nikolas Ebert, Oliver Wasenmüller | http://arxiv.org/pdf/2501.15870v1 | None |
| 🆕 New | Can Location Embeddings Enhance Super-Resolution of Satellite Imagery? | 卫星图像超分辨率中的位置嵌入能否增强? | Daniel Panangian, Ksenia Bittner | http://arxiv.org/pdf/2501.15847v2 | None |
| 📝 Updated | Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving | 自动驾驶中输入监测用视觉基础模型的基准测试 | Mert Keser, Halil Ibrahim Orhan, Niki Amini-Naieni, Gesina Schwalbe, Alois Knoll, Matthias Rottmann | http://arxiv.org/pdf/2501.08083v2 | None |
| 📝 Updated | Label-Efficient Data Augmentation with Video Diffusion Models for Guidewire Segmentation in Cardiac Fluoroscopy | 基于视频扩散模型的标签高效数据增强在心脏荧光透视导丝分割中的应用 | Shaoyan Pan, Yikang Liu, Lin Zhao, Eric Z. Chen, Xiao Chen, Terrence Chen, Shanhui Sun | http://arxiv.org/pdf/2412.16050v4 | None |
| 📝 Updated | Segmentation Dataset for Reinforced Concrete Construction | 钢筋混凝土结构分割数据集 | Patrick Schmidt, Lazaros Nalpantidis | http://arxiv.org/pdf/2407.09372v2 | None |
| 📝 Updated | Comprehensive Performance Evaluation of YOLO11, YOLOv10, YOLOv9 and YOLOv8 on Detecting and Counting Fruitlet in Complex Orchard Environments | 全面评估YOLO11、YOLOv10、YOLOv9和YOLOv8在复杂果园环境中检测和计数幼果的性能 | Ranjan Sapkota, Zhichao Meng, Martin Churuvija, Xiaoqiang Du, Zenghong Ma, Manoj Karkee | http://arxiv.org/pdf/2407.12040v6 | None |

Video Understanding

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion | 文档链:一个高效的AI驱动文档转换开源工具包 | Nikolaos Livathinos, Christoph Auer, Maksym Lysak, Ahmed Nassar, Michele Dolfi, Panos Vagenas, Cesar Berrospi Ramis, Matteo Omenetti et al. | http://arxiv.org/pdf/2501.17887v1 | None |
| 🆕 New | Understanding Long Videos via LLM-Powered Entity Relation Graphs | 通过LLM驱动的实体关系图理解长视频 | Meng Chu, Yicong Li, Tat-Seng Chua | http://arxiv.org/pdf/2501.15953v1 | None |

Generative Models

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation | LoRA-X:连接基础模型与无需训练的跨模型自适应 | Farzad Farhadzadeh, Debasmit Das, Shubhankar Borse, Fatih Porikli | http://arxiv.org/pdf/2501.16559v1 | None |
| 🆕 New | PackDiT: Joint Human Motion and Text Generation via Mutual Prompting | PackDiT:通过相互提示联合人类动作和文本生成 | Zhongyu Jiang, Wenhao Chai, Zhuoran Zhou, Cheng-Yen Yang, Hsiang-Wei Huang, Jenq-Neng Hwang | http://arxiv.org/pdf/2501.16551v1 | None |
| 🆕 New | RelightVid: Temporal-Consistent Diffusion Model for Video Relighting | 视频重光照:时序一致扩散模型 | Ye Fang, Zeyi Sun, Shangzhan Zhang, Tong Wu, Yinghao Xu, Pan Zhang, Jiaqi Wang, Gordon Wetzstein et al. | http://arxiv.org/pdf/2501.16330v1 | None |
| 🆕 New | Efficient Portrait Matte Creation With Layer Diffusion and Connectivity Priors | 高效的人像磨皮:基于层扩散和连接先验 | Zhiyuan Lu, Hao Lu, Hua Huang | http://arxiv.org/pdf/2501.16147v1 | None |
| 🆕 New | Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation | 基于槽位引导的预训练扩散模型在对象中心学习和组合生成中的应用 | Adil Kaan Akan, Yucel Yemez | http://arxiv.org/pdf/2501.15878v2 | https://kaanakan.github.io/SlotAdapt |
| 📝 Updated | Textualize Visual Prompt for Image Editing via Diffusion Bridge | 通过扩散桥文本化视觉提示进行图像编辑 | Pengcheng Xu, Qingnan Fan, Fei Kou, Shuai Qin, Hong Gu, Ruoyu Zhao, Charles Ling, Boyu Wang | http://arxiv.org/pdf/2501.03495v2 | None |
| 📝 Updated | Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds | 制作纹理:3秒内快速生成形状感知纹理 | Xiaoyu Xiang, Liat Sless Gorelik, Yuchen Fan, Omri Armstrong, Forrest Iandola, Yilei Li, Ita Lifshitz, Rakesh Ranjan | http://arxiv.org/pdf/2412.07766v2 | None |
| 📝 Updated | Deciphering Oracle Bone Language with Diffusion Models | 《利用扩散模型解读甲骨文语言》 | Haisu Guan, Huanxin Yang, Xinyu Wang, Shengwei Han, Yongge Liu, Lianwen Jin, Xiang Bai, Yuliang Liu | http://arxiv.org/pdf/2406.00684v2 | https://github.com/guanhaisu/OBSD |

Diffusion Bridges

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | MatCLIP: Light- and Shape-Insensitive Assignment of PBR Material Models | MatCLIP:对PBR材质模型的光照和形状无关的分配 | Michael Birsak, John Femiani, Biao Zhang, Peter Wonka | http://arxiv.org/pdf/2501.15981v1 | None |

Flow Models

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | ARFlow: Autogressive Flow with Hybrid Linear Attention | ARFlow:具有混合线性注意力的自回归流 | Mude Hui, Rui-Jie Zhu, Songlin Yang, Yu Zhang, Zirui Wang, Yuyin Zhou, Jason Eshraghian, Cihang Xie | http://arxiv.org/pdf/2501.16085v1 | None |

Image Processing

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration | 引导Mamba处理复杂纹理:一种高效的纹理感知状态空间模型用于图像恢复 | Long Peng, Xin Di, Zhanfeng Feng, Wenbo Li, Renjing Pei, Yang Wang, Xueyang Fu, Yang Cao et al. | http://arxiv.org/pdf/2501.16583v1 | None |
| 🆕 New | Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity | 混合曼巴:通过模态感知稀疏性增强多模态状态空间模型 | Weixin Liang, Junhong Shen, Genghan Zhang, Ning Dong, Luke Zettlemoyer, Lili Yu | http://arxiv.org/pdf/2501.16295v1 | https://github.com/Weixin-Liang/Mixture-of-Mamba |
| 🆕 New | SPECIAL: Zero-shot Hyperspectral Image Classification With CLIP | 特别篇:基于CLIP的零样本高光谱图像分类 | Li Pang, Jing Yao, Kaiyu Li, Xiangyong Cao | http://arxiv.org/pdf/2501.16222v2 | https://github.com/LiPang/SPECIAL |
| 🆕 New | UDBE: Unsupervised Diffusion-based Brightness Enhancement in Underwater Images | 无监督水下图像扩散亮度增强:UDBE | Tatiana Taís Schein, Gustavo Pereira de Almeira, Stephanie Loi Brião, Rodrigo Andrade de Bem, Felipe Gomes de Oliveira, Paulo L. J. Drews-Jr | http://arxiv.org/pdf/2501.16211v1 | https://github.com/gusanagy/UDBE |
| 🆕 New | CILP-FGDI: Exploiting Vision-Language Model for Generalizable Person Re-Identification | CILP-FGDI:利用视觉-语言模型进行泛化的人脸重识别 | Huazhong Zhao, Lei Qi, Xin Geng | http://arxiv.org/pdf/2501.16065v2 | None |
| 🆕 New | Freestyle Sketch-in-the-Loop Image Segmentation | 自由式循环草图图像分割 | Subhadeep Koley, Viswanatha Reddy Gajjala, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Ayan Kumar Bhunia, Yi-Zhe Song | http://arxiv.org/pdf/2501.16022v1 | None |
| 🆕 New | CausalSR: Structural Causal Model-Driven Super-Resolution with Counterfactual Inference | 因果超分辨率:基于结构因果模型和反事实推理的超级分辨率 | Zhengyang Lu, Bingjie Lu, Feng Wang | http://arxiv.org/pdf/2501.15852v1 | None |
| 🆕 New | MM-Retinal V2: Transfer an Elite Knowledge Spark into Fundus Vision-Language Pretraining | MM-Retinal V2:将精英知识火花迁移至眼底视觉-语言预训练 | Ruiqi Wu, Na Su, Chenran Zhang, Tengfei Ma, Tao Zhou, Zhiting Cui, Nianfeng Tang, Tianyu Mao et al. | http://arxiv.org/pdf/2501.15798v1 | https://github.com/lxirich/MM-Retinal |
| 🆕 New | Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution | 高效注意力共享信息蒸馏Transformer用于轻量级单图像超分辨率 | Karam Park, Jae Woong Soh, Nam Ik Cho | http://arxiv.org/pdf/2501.15774v1 | None |
| 🆕 New | VLMaterial: Procedural Material Generation with Large Vision-Language Models | VLMaterial:基于大型视觉-语言模型的程序化材质生成 | Beichen Li, Rundi Wu, Armando Solar-Lezama, Changxi Zheng, Liang Shi, Bernd Bickel, Wojciech Matusik | http://arxiv.org/pdf/2501.18623v1 | None |
| 📝 Updated | Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis | 基于文本驱动的基座模型在少样本手术流程分析中的应用 | Tingxuan Chen, Kun Yuan, Vinkle Srivastav, Nassir Navab, Nicolas Padoy | http://arxiv.org/pdf/2501.09555v2 | https://github.com/CAMMA-public/Surg-FTDA |
| 📝 Updated | MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning | MoColl:基于代理的特定和通用模型协作进行图像描述 | Pu Yang, Bin Dong | http://arxiv.org/pdf/2501.01834v3 | None |
| 📝 Updated | Accelerating lensed quasar discovery and modeling with physics-informed variational autoencoders | 加速使用物理信息变分自编码器进行透镜引力透镜类星体发现和建模 | Irham T. Andika, Stefan Schuldt, Sherry H. Suyu, Satadru Bag, Raoul Cañameras, Alejandra Melo, Claudio Grillo, James H. H. Chan | http://arxiv.org/pdf/2412.12709v3 | None |
| 📝 Updated | BioTrove: A Large Curated Image Dataset Enabling AI for Biodiversity | 生物宝库:一个大型精选图像数据集,助力人工智能在生物多样性领域的应用 | Chih-Hsuan Yang, Benjamin Feuer, Zaki Jubery, Zi K. Deng, Andre Nakkab, Md Zahid Hasan, Shivani Chiranjeevi, Kelly Marshall et al. | http://arxiv.org/pdf/2406.17720v2 | None |
| 📝 Updated | Learning Point Spread Function Invertibility Assessment for Image Deconvolution | 学习图像去卷积中点扩散函数可逆性评估 | Romario Gualdrón-Hurtado, Roman Jacome, Sergio Urrea, Henry Arguello, Luis Gonzalez | http://arxiv.org/pdf/2405.16343v3 | None |
| 📝 Updated | A New Cross-Space Total Variation Regularization Model for Color Image Restoration with Quaternion Blur Operator | 一种基于四元数模糊算子的彩色图像恢复的新跨空间全变分正则化模型 | Zhigang Jia, Yuelian Xiang, Meixiang Zhao, Tingting Wu, Michael K. Ng | http://arxiv.org/pdf/2405.12114v3 | None |
| 📝 Updated | QOC: Quantum On-Chip Training with Parameter Shift and Gradient Pruning | QOC:基于参数移位和梯度剪枝的片上量子训练 | Hanrui Wang, Zirui Li, Jiaqi Gu, Yongshan Ding, David Z. Pan, Song Han | http://arxiv.org/pdf/2202.13239v3 | None |

3D Scenes

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | Automatic Calibration of a Multi-Camera System with Limited Overlapping Fields of View for 3D Surgical Scene Reconstruction | 多摄像头系统有限重叠视场自动校准用于三维手术场景重建 | Tim Flückiger, Jonas Hein, Valery Fischer, Philipp Fürnstahl, Lilian Calvet | http://arxiv.org/pdf/2501.16221v2 | None |
| 🆕 New | 3D Reconstruction of non-visible surfaces of objects from a Single Depth View -- Comparative Study | 从单张深度图中重建物体不可见表面的3D重建——比较研究 | Rafał Staszak, Piotr Michałek, Jakub Chudziński, Marek Kopicki, Dominik Belter | http://arxiv.org/pdf/2501.16101v1 | None |

Neural Rendering

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | LinPrim: Linear Primitives for Differentiable Volumetric Rendering | 线性基元:可微分体渲染的线性原语 | Nicolas von Lützow, Matthias Nießner | http://arxiv.org/pdf/2501.16312v2 | None |
| 🆕 New | A Radiance Field Loss for Fast and Simple Emissive Surface Reconstruction | 辐射场损失用于快速简单发射表面重建 | Ziyi Zhang, Nicolas Roussel, Thomas Müller, Tizian Zeltner, Merlin Nimier-David, Fabrice Rousselle, Wenzel Jakob | http://arxiv.org/pdf/2501.18627v1 | None |
| 🆕 New | Efficiency Bottlenecks of Convolutional Kolmogorov-Arnold Networks: A Comprehensive Scrutiny with ImageNet, AlexNet, LeNet and Tabular Classification | 卷积柯尔莫哥洛夫-阿诺德网络效率瓶颈:基于ImageNet、AlexNet、LeNet和表格分类的全面审视 | Ashim Dahal, Saydul Akbar Murad, Nick Rahimi | http://arxiv.org/pdf/2501.15757v2 | https://github.com/ashimdahal/Study-of-Convolutional-Kolmogorov-Arnold-networks |

3DGS

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | Deformable Beta Splatting | 可变形贝塔分层 | Rong Liu, Dylan Sun, Meida Chen, Yue Wang, Andrew Feng | http://arxiv.org/pdf/2501.18630v1 | None |
| 📝 Updated | 3DGS$^2$: Near Second-order Converging 3D Gaussian Splatting | 3DGS$^2$:近二阶收敛的3D高斯分层渲染 | Lei Lan, Tianjia Shao, Zixuan Lu, Yu Zhang, Chenfanfu Jiang, Yin Yang | http://arxiv.org/pdf/2501.13975v2 | None |
| 📝 Updated | EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy | EasySplat:视图自适应学习让3D高斯分层变得简单 | Ao Gao, Luosong Guo, Tao Chen, Zhao Wang, Ying Tai, Jian Yang, Zhenyu Zhang | http://arxiv.org/pdf/2501.01003v2 | None |
| 📝 Updated | PEP-GS: Perceptually-Enhanced Precise Structured 3D Gaussians for View-Adaptive Rendering | 感知增强精确结构化3D高斯用于视适应渲染 | Junxi Jin, Xiulai Li, Haiping Huang, Lianjun Liu, Yujie Sun, Boyi Liu | http://arxiv.org/pdf/2411.05731v2 | None |

Multimodal

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers | FALCON:通过视觉注册解决高分辨率多模态大型语言模型中的视觉冗余和碎片化 | Renshan Zhang, Rui Shao, Gongwei Chen, Kaiwen Zhou, Weili Guan, Liqiang Nie | http://arxiv.org/pdf/2501.16297v1 | None |
| 🆕 New | Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection? | 多模态大型语言模型能否被引导以提升工业异常检测? | Zhiling Chen, Hanning Chen, Mohsen Imani, Farhad Imani | http://arxiv.org/pdf/2501.15795v1 | None |
| 📝 Updated | 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining | 2.5年课堂经验:视觉-语言预训练的多模态教科书 | Wenqi Zhang, Hang Zhang, Xin Li, Jiashuo Sun, Yongliang Shen, Weiming Lu, Deli Zhao, Yueting Zhuang et al. | http://arxiv.org/pdf/2501.00958v3 | https://github.com/DAMO-NLP-SG/multimodal_textbook |
| 📝 Updated | TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data | TEOChat:一种用于时序地球观测数据的大规模视觉语言助手 | Jeremy Andrew Irvin, Emily Ruoyu Liu, Joyce Chuyi Chen, Ines Dormoy, Jinyoung Kim, Samar Khanna, Zhuo Zheng, Stefano Ermon | http://arxiv.org/pdf/2410.06234v2 | https://github.com/ermongroup/TEOChat |
| 📝 Updated | E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection | E2E-MFD:迈向端到端同步多模态融合检测 | Jiaqing Zhang, Mingxiang Cao, Weiying Xie, Jie Lei, Daixun Li, Wenbo Huang, Yunsong Li, Xue Yang | http://arxiv.org/pdf/2403.09323v4 | https://github.com/icey-zhang/E2E-MFD |

Embodied AI

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | PhysAnimator: Physics-Guided Generative Cartoon Animation | 物理引导的生成卡通动画:PhysAnimator | Tianyi Xie, Yiwei Zhao, Ying Jiang, Chenfanfu Jiang | http://arxiv.org/pdf/2501.16550v1 | None |
| 🆕 New | Objects matter: object-centric world models improve reinforcement learning in visually complex environments | 物体至上:以物体为中心的世界模型提升视觉复杂环境中的强化学习 | Weipu Zhang, Adam Jelley, Trevor McInroe, Amos Storkey | http://arxiv.org/pdf/2501.16443v1 | None |
| 🆕 New | Improving Tropical Cyclone Forecasting With Video Diffusion Models | 利用视频扩散模型提升热带气旋预报 | Zhibo Ren, Pritthijit Nath, Pancham Shukla | http://arxiv.org/pdf/2501.16003v1 | https://github.com/Ren-creater/forecast-video-diffmodels |
| 🆕 New | Evaluating Data Influence in Meta Learning | 评估元学习中的数据影响 | Chenyang Ren, Huanyi Xie, Shu Yang, Meng Ding, Lijie Hu, Di Wang | http://arxiv.org/pdf/2501.15963v1 | None |
| 🆕 New | The Components of Collaborative Joint Perception and Prediction -- A Conceptual Framework | 协同联合感知与预测的组成部分——一个概念框架 | Lei Wan, Hannan Ejaz Keen, Alexey Vinel | http://arxiv.org/pdf/2501.15860v1 | None |
| 📝 Updated | GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration | GUI-Bee:通过自主探索将GUI动作定位与新型环境对齐 | Yue Fan, Handong Zhao, Ruiyi Zhang, Yu Shen, Xin Eric Wang, Gang Wu | http://arxiv.org/pdf/2501.13896v2 | None |
| 📝 Updated | Towards Kriging-informed Conditional Diffusion for Regional Sea-Level Data Downscaling | 向区域海平面数据降尺度迈进:基于克里金信息的条件扩散 | Subhankar Ghosh, Arun Sharma, Jayant Gupta, Aneesh Subramanian, Shashi Shekhar | http://arxiv.org/pdf/2410.15628v3 | None |

Human Analysis

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | Toward Efficient Generalization in 3D Human Pose Estimation via a Canonical Domain Approach | 迈向通过规范域方法实现高效泛化的3D人体姿态估计 | Hoosang Lee, Jeha Ryu | http://arxiv.org/pdf/2501.16146v1 | None |
| 🆕 New | Automated Detection of Sport Highlights from Audio and Video Sources | 从音频和视频源自动检测体育精彩瞬间 | Francesco Della Santa, Morgana Lalli | http://arxiv.org/pdf/2501.16100v2 | None |
| 🆕 New | NanoHTNet: Nano Human Topology Network for Efficient 3D Human Pose Estimation | 纳米人拓扑网络:用于高效3D人体姿态估计的纳米人类拓扑网络 | Jialun Cai, Mengyuan Liu, Hong Liu, Wenhao Li, Shuheng Zhou | http://arxiv.org/pdf/2501.15763v1 | https://github.com/vefalun/NanoHTNet |
| 📝 Updated | VCRScore: Image captioning metric based on V&L Transformers, CLIP, and precision-recall | VCRScore:基于V&L Transformers、CLIP和精确率-召回率的图像标题度量标准 | Guillermo Ruiz, Tania Ramírez, Daniela Moctezuma | http://arxiv.org/pdf/2501.09155v2 | None |
| 📝 Updated | From Dashcam Videos to Driving Simulations: Stress Testing Automated Vehicles against Rare Events | 从行车记录仪视频到驾驶模拟:对自动驾驶汽车进行罕见事件的压力测试 | Yan Miao, Georgios Fainekos, Bardh Hoxha, Hideki Okamoto, Danil Prokhorov, Sayan Mitra | http://arxiv.org/pdf/2411.16027v2 | None |

Face Technology

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models | LLM-attacker:利用大型语言模型增强自动驾驶闭环对抗场景生成 | Yuewen Mei, Tong Nie, Jian Sun, Ye Tian | http://arxiv.org/pdf/2501.15850v1 | None |
| 📝 Updated | MADation: Face Morphing Attack Detection with Foundation Models | MADation:基于基础模型的表情合成攻击检测 | Eduarda Caldeira, Guray Ozgur, Tahar Chettaoui, Marija Ivanovska, Peter Peer, Fadi Boutros, Vitomir Struc, Naser Damer | http://arxiv.org/pdf/2501.03800v3 | https://github.com/gurayozgur/MADation |

Digital Humans

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | BAG: Body-Aligned 3D Wearable Asset Generation | BAG:基于身体对齐的3D可穿戴资产生成 | Zhongjin Luo, Yang Li, Mingrui Zhang, Senbo Wang, Han Yan, Xibin Song, Taizhang Shang, Wei Mao et al. | http://arxiv.org/pdf/2501.16177v1 | https://bag-3d.github.io/ |
| 🆕 New | A Data-Centric Approach: Dimensions of Visual Complexity and How to find Them | 数据驱动的解决方案:视觉复杂性的维度及其发现方法 | Karahan Sarıtaş, Tingke Shen, Surabhi S Nath, Peter Dayan | http://arxiv.org/pdf/2501.15890v1 | None |
| 🆕 New | ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring | 清晰视界:基于人类视觉的动态模糊去噪解决方案 | Xiaopeng Lin, Yulong Huang, Hongwei Ren, Zunchang Liu, Yue Zhou, Haotian Fu, Bojun Cheng | http://arxiv.org/pdf/2501.15808v1 | None |
| 🆕 New | Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models? | 现有测试工具真的能揭示文本到图像模型中的性别偏见吗? | Yunbo Lyu, Zhou Yang, Yuqing Niu, Jing Jiang, David Lo | http://arxiv.org/pdf/2501.15775v1 | None |

Model Optimization

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | BiFold: Bimanual Cloth Folding with Language Guidance | 双面折叠:语言引导下的双手布料折叠 | Oriol Barbany, Adrià Colomé, Carme Torras | http://arxiv.org/pdf/2501.16458v1 | None |
| 🆕 New | Return of the Encoder: Maximizing Parameter Efficiency for SLMs | 编码器归来:最大化SLMs的参数效率 | Mohamed Elfeki, Rui Liu, Chad Voegele | http://arxiv.org/pdf/2501.16273v2 | None |
| 🆕 New | Distilling foundation models for robust and efficient models in digital pathology | 从基础模型中提炼出数字病理学中的鲁棒和高效模型 | Alexandre Filiot, Nicolas Dop, Oussama Tchita, Auriane Riou, Rémy Dubois, Thomas Peeters, Daria Valter, Marin Scalbert et al. | http://arxiv.org/pdf/2501.16239v2 | None |
| 🆕 New | Rethinking the Bias of Foundation Model under Long-tailed Distribution | 重新思考长尾分布下基础模型的偏差 | Jiahao Chen, Bin Qin, Jiangmeng Li, Hao Chen, Bing Su | http://arxiv.org/pdf/2501.15955v1 | None |
| 🆕 New | Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks | 任意到任意Tryon:利用自适应位置嵌入实现多功能的虚拟服装任务 | Hailong Guo, Bohan Zeng, Yiren Song, Wentao Zhang, Chuang Zhang, Jiaming Liu | http://arxiv.org/pdf/2501.15891v1 | https://logn-2024.github.io/Any2anyTryonProjectPage |
| 🆕 New | Controllable Hand Grasp Generation for HOI and Efficient Evaluation Methods | 可控手部抓取生成用于人机交互和高效评估方法 | Ishant, Rongliang Wu, Joo Hwee Lim | http://arxiv.org/pdf/2501.15839v1 | None |
| 📝 Updated | Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | 通过互补掩码实现隐式位置-标题对齐的弱监督密集视频字幕生成 | Shiping Ge, Qiang Chen, Zhiwei Jiang, Yafeng Yin, Liu Qin, Ziyao Chen, Qing Gu | http://arxiv.org/pdf/2412.12791v2 | None |
| 📝 Updated | CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes | CAFuser:基于条件感知的多模态融合,用于驾驶场景的鲁棒语义感知 | Tim Broedermann, Christos Sakaridis, Yuqian Fu, Luc Van Gool | http://arxiv.org/pdf/2410.10791v2 | https://github.com/timbroed/CAFuser |
| 📝 Updated | JAM: A Comprehensive Model for Age Estimation, Verification, and Comparability | JAM:一种用于年龄估计、验证和可比性的综合模型 | François David, Alexey A. Novikov, Ruslan Parkhomenko, Artem Voronin, Alix Melchy | http://arxiv.org/pdf/2410.04012v2 | None |

Medical Applications

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | Multi-Objective Deep-Learning-based Biomechanical Deformable Image Registration with MOREA | 基于多目标深度学习的生物力学可变形图像配准:使用MOREA | Georgios Andreadis, Eduard Ruiz Munné, Thomas H. W. Bäck, Peter A. N. Bosman, Tanja Alderliesten | http://arxiv.org/pdf/2501.16525v1 | None |
| 🆕 New | Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM | 基于大型语言模型生成零样本罕见事件医学图像分类的定制提示 | Payal Kamboj, Ayan Banerjee, Bin Xu, Sandeep Gupta | http://arxiv.org/pdf/2501.16481v1 | None |
| 🆕 New | Object Detection for Medical Image Analysis: Insights from the RT-DETR Model | 医学图像分析中的目标检测:RT-DETR模型见解 | Weijie He, Yuwei Zhang, Ting Xu, Tai An, Yingbin Liang, Bo Zhang | http://arxiv.org/pdf/2501.16469v1 | None |
| 🆕 New | Adaptive Iterative Compression for High-Resolution Files: an Approach Focused on Preserving Visual Quality in Cinematic Workflows | 自适应迭代压缩:一种专注于电影制作流程中保持视觉质量的解决方案 | Leonardo Melo, Filipe Litaiff | http://arxiv.org/pdf/2501.16319v1 | None |
| 🆕 New | Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models | 脑适配器:通过适配器调优的多模态大型语言模型增强神经系统疾病分析 | Jing Zhang, Xiaowei Yu, Yanjun Lyu, Lu Zhang, Tong Chen, Chao Cao, Yan Zhuang, Minheng Chen et al. | http://arxiv.org/pdf/2501.16282v1 | None |
| 🆕 New | CLISC: Bridging clip and sam by enhanced cam for unsupervised brain tumor segmentation | CLISC:通过增强CAM连接CLIP和SAM以实现无监督脑肿瘤分割 | Xiaochuan Ma, Jia Fu, Wenjun Liao, Shichuan Zhang, Guotai Wang | http://arxiv.org/pdf/2501.16246v1 | None |
| 🆕 New | Real-Time Brain Tumor Detection in Intraoperative Ultrasound Using YOLO11: From Model Training to Deployment in the Operating Room | 实时术中超声脑肿瘤检测:基于YOLO11从模型训练到手术室部署 | Santiago Cepeda, Olga Esteban-Sinovas, Roberto Romero, Vikas Singh, Prakash Shetty, Aliasgar Moiyadi, Ilyess Zemmoura, Giuseppe Roberto Giammalva et al. | http://arxiv.org/pdf/2501.15994v1 | None |
| 🆕 New | Pfungst and Clever Hans: Identifying the unintended cues in a widely used Alzheimer's disease MRI dataset using explainable deep learning | 《普芬施和聪明的汉斯:利用可解释深度学习识别广泛使用的阿尔茨海默病MRI数据集中未预期的提示》 | Christian Tinauer, Maximilian Sackl, Rudolf Stollberger, Stefan Ropele, Christian Langkammer | http://arxiv.org/pdf/2501.15831v1 | None |
| 🆕 New | Z-Stack Scanning can Improve AI Detection of Mitosis: A Case Study of Meningiomas | Z-Stack扫描可提升AI对有丝分裂的检测:脑膜瘤案例分析 | Hongyan Gu, Ellie Onstott, Wenzhong Yan, Tengyou Xu, Ruolin Wang, Zida Wu, Xiang 'Anthony' Chen, Mohammad Haeri | http://arxiv.org/pdf/2501.15743v1 | None |
| 🆕 New | Leveraging Video Vision Transformer for Alzheimer's Disease Diagnosis from 3D Brain MRI | 利用视频视觉Transformer从3D脑MRI诊断阿尔茨海默病 | Taymaz Akan, Sait Alp, Md. Shenuarin Bhuiyan, Elizabeth A. Disbrow, Steven A. Conrad, John A. Vanchiere, Christopher G. Kevil, Mohammad A. N. Bhuiyan | http://arxiv.org/pdf/2501.15733v1 | None |
| 🆕 New | A Survey on Computational Pathology Foundation Models: Datasets, Adaptation Strategies, and Evaluation Tasks | 计算病理学基础模型综述:数据集、自适应策略和评估任务 | Dong Li, Guihong Wan, Xintao Wu, Xinyu Wu, Ajit J. Nirmal, Christine G. Lian, Peter K. Sorger, Yevgeniy R. Semenov et al. | http://arxiv.org/pdf/2501.15724v1 | None |
| 🆕 New | SeqSeg: Learning Local Segments for Automatic Vascular Model Construction | SeqSeg:学习局部段以自动构建血管模型 | Numi Sveinsson Cepero, Shawn C. Shadden | http://arxiv.org/pdf/2501.15712v1 | None |
| 📝 Updated | FedDAG: Federated Domain Adversarial Generation Towards Generalizable Medical Image Analysis | 联邦域对抗生成以实现可泛化医学图像分析:FedDAG | Haoxuan Che, Yifei Wu, Haibo Jin, Yong Xia, Hao Chen | http://arxiv.org/pdf/2501.13967v2 | None |
| 📝 Updated | Slot-BERT: Self-supervised Object Discovery in Surgical Video | 槽位BERT:手术视频中的自监督物体发现 | Guiqiu Liao, Matjaz Jogan, Marcel Hussing, Kenta Nakahashi, Kazuhiro Yasufuku, Amin Madani, Eric Eaton, Daniel A. Hashimoto | http://arxiv.org/pdf/2501.12477v2 | None |
| 📝 Updated | Multi-Tiered Self-Contrastive Learning for Medical Microwave Radiometry (MWR) Breast Cancer Detection | 多层自对比学习在医学微波辐射计(MWR)乳腺癌检测中的应用 | Christoforos Galazis, Huiyi Wu, Igor Goryanin | http://arxiv.org/pdf/2410.04636v2 | https://github.com/cgalaz01/self_contrastive_mwr |
| 📝 Updated | MSDet: Receptive Field Enhanced Multiscale Detection for Tiny Pulmonary Nodule | MSDet:针对微小肺结节的多尺度检测感受野增强方法 | Guohui Cai, Ruicheng Zhang, Hongyang He, Zeyu Zhang, Daji Ergu, Yuanzhouhan Cao, Jinman Zhao, Binbin Hu et al. | http://arxiv.org/pdf/2409.14028v2 | https://github.com/CaiGuoHui123/MSDet |
| 📝 Updated | Generative Adversarial Networks in Ultrasound Imaging: Extending Field of View Beyond Conventional Limits | 超声成像中生成对抗网络:扩展视场超越传统限制 | Matej Gazda, Samuel Kadoury, Jakub Gazda, Peter Drotar | http://arxiv.org/pdf/2405.20981v2 | None |
| 📝 Updated | MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis | MedPromptX:基于地面多模态提示的胸部X光诊断 | Mai A. Shaaban, Adnan Khan, Mohammad Yaqub | http://arxiv.org/pdf/2403.15585v4 | https://github.com/BioMedIA-MBZUAI/MedPromptX |

Other

| Status | English title | Chinese title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- | --- |
| 📝 Updated | Evaluation of GPT-4o and GPT-4o-mini's Vision Capabilities for Compositional Analysis from Dried Solution Drops | GPT-4o和GPT-4o-mini在干溶液滴组合分析中的视觉能力评估 | Deven B. Dangi, Beni B. Dangi, Oliver Steinbock | http://arxiv.org/pdf/2412.10587v2 | None |