[UPDATED!] 2025-01-27 (Update Time)

Image Understanding

Status (🆕 发布 = new, 📝 更新 = updated)	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding	物理世界理解中的视觉-语言模型基准与提升：PhysBench	Wei Chow, Jiageng Mao, Boyi Li, Daniel Seita, Vitor Guizilini, Yue Wang	http://arxiv.org/pdf/2501.16411v2	None
🆕 发布	Multi-view Structural Convolution Network for Domain-Invariant Point Cloud Recognition of Autonomous Vehicles	多视角结构卷积网络用于自动驾驶车辆领域不变点云识别	Younggun Kim, Beomsik Cho, Seonghoon Ryoo, Soomok Lee	http://arxiv.org/pdf/2501.16289v1	https://github.com/MLMLab/MSCN
🆕 发布	PDC-ViT : Source Camera Identification using Pixel Difference Convolution and Vision Transformer	PDC-ViT：基于像素差异卷积和视觉Transformer的源相机识别	Omar Elharrouss, Younes Akbari, Noor Almaadeed, Somaya Al-Maadeed, Fouad Khelifi, Ahmed Bouridane	http://arxiv.org/pdf/2501.16227v1	None
📝 更新	Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification	解读您的决策：视觉分类中的逻辑推理正则化以实现泛化	Zhaorui Tan, Xi Yang, Qiufeng Wang, Anh Nguyen, Kaizhu Huang	http://arxiv.org/pdf/2410.04492v5	None
📝 更新	Dimensions underlying the representational alignment of deep neural networks with humans	深神经网络与人类表征对齐的潜在维度	Florian P. Mahner, Lukas Muttenthaler, Umut Güçlü, Martin N. Hebart	http://arxiv.org/pdf/2406.19087v2	None
📝 更新	Task Me Anything	任意任务处理	Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi et al.	http://arxiv.org/pdf/2406.11775v2	None

Detection and Segmentation

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Efficient Object Detection of Marine Debris using Pruned YOLO Model	基于剪枝YOLO模型的海洋垃圾高效目标检测	Abi Aryaza, Novanto Yudistira, Tibyani	http://arxiv.org/pdf/2501.16571v1	None
🆕 发布	Cross-Domain Semantic Segmentation with Large Language Model-Assisted Descriptor Generation	跨域语义分割：基于大型语言模型辅助描述符生成的技术	Philip Hughes, Larry Burns, Luke Adams	http://arxiv.org/pdf/2501.16467v1	None
🆕 发布	DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation	DynAlign：无监督跨域分割的动态分类对齐	Han Sun, Rui Gong, Ismail Nejjar, Olga Fink	http://arxiv.org/pdf/2501.16410v1	None
🆕 发布	Large Models in Dialogue for Active Perception and Anomaly Detection	大型模型在主动感知和异常检测中的对话	Tzoulio Chamiti, Nikolaos Passalis, Anastasios Tefas	http://arxiv.org/pdf/2501.16300v1	None
🆕 发布	Lightweight Weighted Average Ensemble Model for Pneumonia Detection in Chest X-Ray Images	轻量级加权平均集成模型在胸部X光片中的肺炎检测	Suresh Babu Nettur, Shanthi Karpurapu, Unnati Nettur, Likhit Sagar Gajja, Sravanthy Myneni, Akhil Dusi, Lalithya Posham	http://arxiv.org/pdf/2501.16249v2	None
🆕 发布	The Linear Attention Resurrection in Vision Transformer	视觉Transformer中的线性注意力复兴	Chuanyang Zheng	http://arxiv.org/pdf/2501.16182v1	None
🆕 发布	Addressing Out-of-Label Hazard Detection in Dashcam Videos: Insights from the COOOL Challenge	应对行车记录仪视频中的标签外危险检测：COOOL挑战赛的见解	Anh-Kiet Duong, Petra Gomez-Krämer	http://arxiv.org/pdf/2501.16037v1	https://github.com/ffyyytt/COOOL_2025
🆕 发布	Controllable Forgetting Mechanism for Few-Shot Class-Incremental Learning	可控遗忘机制在少样本类增量学习中的应用	Kirill Paramonov, Mete Ozay, Eunju Yang, Jijoong Moon, Umberto Michieli	http://arxiv.org/pdf/2501.15998v1	None
🆕 发布	D-PLS: Decoupled Semantic Segmentation for 4D-Panoptic-LiDAR-Segmentation	D-PLS：4D全景激光雷达分割的解耦语义分割	Maik Steinhauser, Laurenz Reichardt, Nikolas Ebert, Oliver Wasenmüller	http://arxiv.org/pdf/2501.15870v1	None
🆕 发布	Can Location Embeddings Enhance Super-Resolution of Satellite Imagery?	卫星图像超分辨率中的位置嵌入能否增强？	Daniel Panangian, Ksenia Bittner	http://arxiv.org/pdf/2501.15847v2	None
📝 更新	Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving	自动驾驶中输入监测用视觉基础模型的基准测试	Mert Keser, Halil Ibrahim Orhan, Niki Amini-Naieni, Gesina Schwalbe, Alois Knoll, Matthias Rottmann	http://arxiv.org/pdf/2501.08083v2	None
📝 更新	Label-Efficient Data Augmentation with Video Diffusion Models for Guidewire Segmentation in Cardiac Fluoroscopy	基于视频扩散模型的标签高效数据增强在心脏荧光透视导丝分割中的应用	Shaoyan Pan, Yikang Liu, Lin Zhao, Eric Z. Chen, Xiao Chen, Terrence Chen, Shanhui Sun	http://arxiv.org/pdf/2412.16050v4	None
📝 更新	Segmentation Dataset for Reinforced Concrete Construction	钢筋混凝土结构分割数据集	Patrick Schmidt, Lazaros Nalpantidis	http://arxiv.org/pdf/2407.09372v2	None
📝 更新	Comprehensive Performance Evaluation of YOLO11, YOLOv10, YOLOv9 and YOLOv8 on Detecting and Counting Fruitlet in Complex Orchard Environments	全面评估YOLO11、YOLOv10、YOLOv9和YOLOv8在复杂果园环境中检测和计数幼果的性能	Ranjan Sapkota, Zhichao Meng, Martin Churuvija, Xiaoqiang Du, Zenghong Ma, Manoj Karkee	http://arxiv.org/pdf/2407.12040v6	None

Video Understanding

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion	Docling：一个高效的AI驱动文档转换开源工具包	Nikolaos Livathinos, Christoph Auer, Maksym Lysak, Ahmed Nassar, Michele Dolfi, Panos Vagenas, Cesar Berrospi Ramis, Matteo Omenetti et al.	http://arxiv.org/pdf/2501.17887v1	None
🆕 发布	Understanding Long Videos via LLM-Powered Entity Relation Graphs	通过LLM驱动的实体关系图理解长视频	Meng Chu, Yicong Li, Tat-Seng Chua	http://arxiv.org/pdf/2501.15953v1	None

Generative Models

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation	LoRA-X：连接基础模型与无需训练的跨模型自适应	Farzad Farhadzadeh, Debasmit Das, Shubhankar Borse, Fatih Porikli	http://arxiv.org/pdf/2501.16559v1	None
🆕 发布	PackDiT: Joint Human Motion and Text Generation via Mutual Prompting	PackDiT：通过相互提示联合人类动作和文本生成	Zhongyu Jiang, Wenhao Chai, Zhuoran Zhou, Cheng-Yen Yang, Hsiang-Wei Huang, Jenq-Neng Hwang	http://arxiv.org/pdf/2501.16551v1	None
🆕 发布	RelightVid: Temporal-Consistent Diffusion Model for Video Relighting	RelightVid：用于视频重光照的时序一致扩散模型	Ye Fang, Zeyi Sun, Shangzhan Zhang, Tong Wu, Yinghao Xu, Pan Zhang, Jiaqi Wang, Gordon Wetzstein et al.	http://arxiv.org/pdf/2501.16330v1	None
🆕 发布	Efficient Portrait Matte Creation With Layer Diffusion and Connectivity Priors	基于层扩散和连通性先验的高效人像抠图蒙版生成	Zhiyuan Lu, Hao Lu, Hua Huang	http://arxiv.org/pdf/2501.16147v1	None
🆕 发布	Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation	基于槽位引导的预训练扩散模型在对象中心学习和组合生成中的应用	Adil Kaan Akan, Yucel Yemez	http://arxiv.org/pdf/2501.15878v2	https://kaanakan.github.io/SlotAdapt
📝 更新	Textualize Visual Prompt for Image Editing via Diffusion Bridge	通过扩散桥文本化视觉提示进行图像编辑	Pengcheng Xu, Qingnan Fan, Fei Kou, Shuai Qin, Hong Gu, Ruoyu Zhao, Charles Ling, Boyu Wang	http://arxiv.org/pdf/2501.03495v2	None
📝 更新	Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds	制作纹理：3秒内快速生成形状感知纹理	Xiaoyu Xiang, Liat Sless Gorelik, Yuchen Fan, Omri Armstrong, Forrest Iandola, Yilei Li, Ita Lifshitz, Rakesh Ranjan	http://arxiv.org/pdf/2412.07766v2	None
📝 更新	Deciphering Oracle Bone Language with Diffusion Models	利用扩散模型解读甲骨文语言	Haisu Guan, Huanxin Yang, Xinyu Wang, Shengwei Han, Yongge Liu, Lianwen Jin, Xiang Bai, Yuliang Liu	http://arxiv.org/pdf/2406.00684v2	https://github.com/guanhaisu/OBSD

Diffusion Bridges

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	MatCLIP: Light- and Shape-Insensitive Assignment of PBR Material Models	MatCLIP：对PBR材质模型的光照和形状无关的分配	Michael Birsak, John Femiani, Biao Zhang, Peter Wonka	http://arxiv.org/pdf/2501.15981v1	None

Flow Models

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	ARFlow: Autogressive Flow with Hybrid Linear Attention	ARFlow：具有混合线性注意力的自回归流	Mude Hui, Rui-Jie Zhu, Songlin Yang, Yu Zhang, Zirui Wang, Yuyin Zhou, Jason Eshraghian, Cihang Xie	http://arxiv.org/pdf/2501.16085v1	None

Image Processing

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration	引导Mamba处理复杂纹理：一种高效的纹理感知状态空间模型用于图像恢复	Long Peng, Xin Di, Zhanfeng Feng, Wenbo Li, Renjing Pei, Yang Wang, Xueyang Fu, Yang Cao et al.	http://arxiv.org/pdf/2501.16583v1	None
🆕 发布	Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity	混合曼巴：通过模态感知稀疏性增强多模态状态空间模型	Weixin Liang, Junhong Shen, Genghan Zhang, Ning Dong, Luke Zettlemoyer, Lili Yu	http://arxiv.org/pdf/2501.16295v1	https://github.com/Weixin-Liang/Mixture-of-Mamba
🆕 发布	SPECIAL: Zero-shot Hyperspectral Image Classification With CLIP	SPECIAL：基于CLIP的零样本高光谱图像分类	Li Pang, Jing Yao, Kaiyu Li, Xiangyong Cao	http://arxiv.org/pdf/2501.16222v2	https://github.com/LiPang/SPECIAL
🆕 发布	UDBE: Unsupervised Diffusion-based Brightness Enhancement in Underwater Images	无监督水下图像扩散亮度增强：UDBE	Tatiana Taís Schein, Gustavo Pereira de Almeira, Stephanie Loi Brião, Rodrigo Andrade de Bem, Felipe Gomes de Oliveira, Paulo L. J. Drews-Jr	http://arxiv.org/pdf/2501.16211v1	https://github.com/gusanagy/UDBE
🆕 发布	CILP-FGDI: Exploiting Vision-Language Model for Generalizable Person Re-Identification	CILP-FGDI：利用视觉-语言模型进行可泛化的行人重识别	Huazhong Zhao, Lei Qi, Xin Geng	http://arxiv.org/pdf/2501.16065v2	None
🆕 发布	Freestyle Sketch-in-the-Loop Image Segmentation	自由式循环草图图像分割	Subhadeep Koley, Viswanatha Reddy Gajjala, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Ayan Kumar Bhunia, Yi-Zhe Song	http://arxiv.org/pdf/2501.16022v1	None
🆕 发布	CausalSR: Structural Causal Model-Driven Super-Resolution with Counterfactual Inference	因果超分辨率：基于结构因果模型和反事实推理的超级分辨率	Zhengyang Lu, Bingjie Lu, Feng Wang	http://arxiv.org/pdf/2501.15852v1	None
🆕 发布	MM-Retinal V2: Transfer an Elite Knowledge Spark into Fundus Vision-Language Pretraining	MM-Retinal V2：将精英知识火花迁移至眼底视觉-语言预训练	Ruiqi Wu, Na Su, Chenran Zhang, Tengfei Ma, Tao Zhou, Zhiting Cui, Nianfeng Tang, Tianyu Mao et al.	http://arxiv.org/pdf/2501.15798v1	https://github.com/lxirich/MM-Retinal
🆕 发布	Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution	高效注意力共享信息蒸馏Transformer用于轻量级单图像超分辨率	Karam Park, Jae Woong Soh, Nam Ik Cho	http://arxiv.org/pdf/2501.15774v1	None
🆕 发布	VLMaterial: Procedural Material Generation with Large Vision-Language Models	VLMaterial：基于大型视觉-语言模型的程序化材质生成	Beichen Li, Rundi Wu, Armando Solar-Lezama, Changxi Zheng, Liang Shi, Bernd Bickel, Wojciech Matusik	http://arxiv.org/pdf/2501.18623v1	None
📝 更新	Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis	基于文本驱动的基座模型在少样本手术流程分析中的应用	Tingxuan Chen, Kun Yuan, Vinkle Srivastav, Nassir Navab, Nicolas Padoy	http://arxiv.org/pdf/2501.09555v2	https://github.com/CAMMA-public/Surg-FTDA
📝 更新	MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning	MoColl：基于代理的特定和通用模型协作进行图像描述	Pu Yang, Bin Dong	http://arxiv.org/pdf/2501.01834v3	None
📝 更新	Accelerating lensed quasar discovery and modeling with physics-informed variational autoencoders	利用物理信息变分自编码器加速引力透镜类星体的发现和建模	Irham T. Andika, Stefan Schuldt, Sherry H. Suyu, Satadru Bag, Raoul Cañameras, Alejandra Melo, Claudio Grillo, James H. H. Chan	http://arxiv.org/pdf/2412.12709v3	None
📝 更新	BioTrove: A Large Curated Image Dataset Enabling AI for Biodiversity	生物宝库：一个大型精选图像数据集，助力人工智能在生物多样性领域的应用	Chih-Hsuan Yang, Benjamin Feuer, Zaki Jubery, Zi K. Deng, Andre Nakkab, Md Zahid Hasan, Shivani Chiranjeevi, Kelly Marshall et al.	http://arxiv.org/pdf/2406.17720v2	None
📝 更新	Learning Point Spread Function Invertibility Assessment for Image Deconvolution	学习图像去卷积中点扩散函数可逆性评估	Romario Gualdrón-Hurtado, Roman Jacome, Sergio Urrea, Henry Arguello, Luis Gonzalez	http://arxiv.org/pdf/2405.16343v3	None
📝 更新	A New Cross-Space Total Variation Regularization Model for Color Image Restoration with Quaternion Blur Operator	一种基于四元数模糊算子的彩色图像恢复的新跨空间全变分正则化模型	Zhigang Jia, Yuelian Xiang, Meixiang Zhao, Tingting Wu, Michael K. Ng	http://arxiv.org/pdf/2405.12114v3	None
📝 更新	QOC: Quantum On-Chip Training with Parameter Shift and Gradient Pruning	QOC：基于参数移位和梯度剪枝的片上量子训练	Hanrui Wang, Zirui Li, Jiaqi Gu, Yongshan Ding, David Z. Pan, Song Han	http://arxiv.org/pdf/2202.13239v3	None

3D Scenes

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Automatic Calibration of a Multi-Camera System with Limited Overlapping Fields of View for 3D Surgical Scene Reconstruction	多摄像头系统有限重叠视场自动校准用于三维手术场景重建	Tim Flückiger, Jonas Hein, Valery Fischer, Philipp Fürnstahl, Lilian Calvet	http://arxiv.org/pdf/2501.16221v2	None
🆕 发布	3D Reconstruction of non-visible surfaces of objects from a Single Depth View -- Comparative Study	从单张深度图中重建物体不可见表面的3D重建——比较研究	Rafał Staszak, Piotr Michałek, Jakub Chudziński, Marek Kopicki, Dominik Belter	http://arxiv.org/pdf/2501.16101v1	None

Neural Rendering

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	LinPrim: Linear Primitives for Differentiable Volumetric Rendering	线性基元：可微分体渲染的线性原语	Nicolas von Lützow, Matthias Nießner	http://arxiv.org/pdf/2501.16312v2	None
🆕 发布	A Radiance Field Loss for Fast and Simple Emissive Surface Reconstruction	辐射场损失用于快速简单发射表面重建	Ziyi Zhang, Nicolas Roussel, Thomas Müller, Tizian Zeltner, Merlin Nimier-David, Fabrice Rousselle, Wenzel Jakob	http://arxiv.org/pdf/2501.18627v1	None
🆕 发布	Efficiency Bottlenecks of Convolutional Kolmogorov-Arnold Networks: A Comprehensive Scrutiny with ImageNet, AlexNet, LeNet and Tabular Classification	卷积柯尔莫哥洛夫-阿诺德网络效率瓶颈：基于ImageNet、AlexNet、LeNet和表格分类的全面审视	Ashim Dahal, Saydul Akbar Murad, Nick Rahimi	http://arxiv.org/pdf/2501.15757v2	https://github.com/ashimdahal/Study-of-Convolutional-Kolmogorov-Arnold-networks

3DGS

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Deformable Beta Splatting	可变形贝塔分层	Rong Liu, Dylan Sun, Meida Chen, Yue Wang, Andrew Feng	http://arxiv.org/pdf/2501.18630v1	None
📝 更新	3DGS$^2$: Near Second-order Converging 3D Gaussian Splatting	3DGS$^2$：近二阶收敛的3D高斯分层渲染	Lei Lan, Tianjia Shao, Zixuan Lu, Yu Zhang, Chenfanfu Jiang, Yin Yang	http://arxiv.org/pdf/2501.13975v2	None
📝 更新	EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy	EasySplat：视图自适应学习让3D高斯分层变得简单	Ao Gao, Luosong Guo, Tao Chen, Zhao Wang, Ying Tai, Jian Yang, Zhenyu Zhang	http://arxiv.org/pdf/2501.01003v2	None
📝 更新	PEP-GS: Perceptually-Enhanced Precise Structured 3D Gaussians for View-Adaptive Rendering	感知增强精确结构化3D高斯用于视适应渲染	Junxi Jin, Xiulai Li, Haiping Huang, Lianjun Liu, Yujie Sun, Boyi Liu	http://arxiv.org/pdf/2411.05731v2	None

Multimodal

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers	FALCON：通过视觉注册解决高分辨率多模态大型语言模型中的视觉冗余和碎片化	Renshan Zhang, Rui Shao, Gongwei Chen, Kaiwen Zhou, Weili Guan, Liqiang Nie	http://arxiv.org/pdf/2501.16297v1	None
🆕 发布	Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection?	多模态大型语言模型能否被引导以提升工业异常检测？	Zhiling Chen, Hanning Chen, Mohsen Imani, Farhad Imani	http://arxiv.org/pdf/2501.15795v1	None
📝 更新	2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining	2.5年课堂经验：视觉-语言预训练的多模态教科书	Wenqi Zhang, Hang Zhang, Xin Li, Jiashuo Sun, Yongliang Shen, Weiming Lu, Deli Zhao, Yueting Zhuang et al.	http://arxiv.org/pdf/2501.00958v3	https://github.com/DAMO-NLP-SG/multimodal_textbook
📝 更新	TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data	TEOChat：一种用于时序地球观测数据的大规模视觉语言助手	Jeremy Andrew Irvin, Emily Ruoyu Liu, Joyce Chuyi Chen, Ines Dormoy, Jinyoung Kim, Samar Khanna, Zhuo Zheng, Stefano Ermon	http://arxiv.org/pdf/2410.06234v2	https://github.com/ermongroup/TEOChat
📝 更新	E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection	E2E-MFD：迈向端到端同步多模态融合检测	Jiaqing Zhang, Mingxiang Cao, Weiying Xie, Jie Lei, Daixun Li, Wenbo Huang, Yunsong Li, Xue Yang	http://arxiv.org/pdf/2403.09323v4	https://github.com/icey-zhang/E2E-MFD

Embodied AI

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	PhysAnimator: Physics-Guided Generative Cartoon Animation	物理引导的生成卡通动画：PhysAnimator	Tianyi Xie, Yiwei Zhao, Ying Jiang, Chenfanfu Jiang	http://arxiv.org/pdf/2501.16550v1	None
🆕 发布	Objects matter: object-centric world models improve reinforcement learning in visually complex environments	物体至上：以物体为中心的世界模型提升视觉复杂环境中的强化学习	Weipu Zhang, Adam Jelley, Trevor McInroe, Amos Storkey	http://arxiv.org/pdf/2501.16443v1	None
🆕 发布	Improving Tropical Cyclone Forecasting With Video Diffusion Models	利用视频扩散模型提升热带气旋预报	Zhibo Ren, Pritthijit Nath, Pancham Shukla	http://arxiv.org/pdf/2501.16003v1	https://github.com/Ren-creater/forecast-video-diffmodels
🆕 发布	Evaluating Data Influence in Meta Learning	评估元学习中的数据影响	Chenyang Ren, Huanyi Xie, Shu Yang, Meng Ding, Lijie Hu, Di Wang	http://arxiv.org/pdf/2501.15963v1	None
🆕 发布	The Components of Collaborative Joint Perception and Prediction -- A Conceptual Framework	协同联合感知与预测的组成部分——一个概念框架	Lei Wan, Hannan Ejaz Keen, Alexey Vinel	http://arxiv.org/pdf/2501.15860v1	None
📝 更新	GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration	GUI-Bee：通过自主探索将GUI动作定位与新型环境对齐	Yue Fan, Handong Zhao, Ruiyi Zhang, Yu Shen, Xin Eric Wang, Gang Wu	http://arxiv.org/pdf/2501.13896v2	None
📝 更新	Towards Kriging-informed Conditional Diffusion for Regional Sea-Level Data Downscaling	向区域海平面数据降尺度迈进：基于克里金信息的条件扩散	Subhankar Ghosh, Arun Sharma, Jayant Gupta, Aneesh Subramanian, Shashi Shekhar	http://arxiv.org/pdf/2410.15628v3	None

Human Analysis

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Toward Efficient Generalization in 3D Human Pose Estimation via a Canonical Domain Approach	迈向通过规范域方法实现高效泛化的3D人体姿态估计	Hoosang Lee, Jeha Ryu	http://arxiv.org/pdf/2501.16146v1	None
🆕 发布	Automated Detection of Sport Highlights from Audio and Video Sources	从音频和视频源自动检测体育精彩瞬间	Francesco Della Santa, Morgana Lalli	http://arxiv.org/pdf/2501.16100v2	None
🆕 发布	NanoHTNet: Nano Human Topology Network for Efficient 3D Human Pose Estimation	NanoHTNet：用于高效3D人体姿态估计的纳米人体拓扑网络	Jialun Cai, Mengyuan Liu, Hong Liu, Wenhao Li, Shuheng Zhou	http://arxiv.org/pdf/2501.15763v1	https://github.com/vefalun/NanoHTNet
📝 更新	VCRScore: Image captioning metric based on V&L Transformers, CLIP, and precision-recall	VCRScore：基于V&L Transformers、CLIP和精确率-召回率的图像标题度量标准	Guillermo Ruiz, Tania Ramírez, Daniela Moctezuma	http://arxiv.org/pdf/2501.09155v2	None
📝 更新	From Dashcam Videos to Driving Simulations: Stress Testing Automated Vehicles against Rare Events	从行车记录仪视频到驾驶模拟：对自动驾驶汽车进行罕见事件的压力测试	Yan Miao, Georgios Fainekos, Bardh Hoxha, Hideki Okamoto, Danil Prokhorov, Sayan Mitra	http://arxiv.org/pdf/2411.16027v2	None

Face Technology

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models	LLM-attacker：利用大型语言模型增强自动驾驶闭环对抗场景生成	Yuewen Mei, Tong Nie, Jian Sun, Ye Tian	http://arxiv.org/pdf/2501.15850v1	None
📝 更新	MADation: Face Morphing Attack Detection with Foundation Models	MADation：基于基础模型的人脸融合攻击检测	Eduarda Caldeira, Guray Ozgur, Tahar Chettaoui, Marija Ivanovska, Peter Peer, Fadi Boutros, Vitomir Struc, Naser Damer	http://arxiv.org/pdf/2501.03800v3	https://github.com/gurayozgur/MADation

Digital Humans

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	BAG: Body-Aligned 3D Wearable Asset Generation	BAG：基于身体对齐的3D可穿戴资产生成	Zhongjin Luo, Yang Li, Mingrui Zhang, Senbo Wang, Han Yan, Xibin Song, Taizhang Shang, Wei Mao et al.	http://arxiv.org/pdf/2501.16177v1	https://bag-3d.github.io/
🆕 发布	A Data-Centric Approach: Dimensions of Visual Complexity and How to find Them	数据驱动的解决方案：视觉复杂性的维度及其发现方法	Karahan Sarıtaş, Tingke Shen, Surabhi S Nath, Peter Dayan	http://arxiv.org/pdf/2501.15890v1	None
🆕 发布	ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring	ClearSight：受人类视觉启发的基于事件的运动去模糊方案	Xiaopeng Lin, Yulong Huang, Hongwei Ren, Zunchang Liu, Yue Zhou, Haotian Fu, Bojun Cheng	http://arxiv.org/pdf/2501.15808v1	None
🆕 发布	Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models?	现有测试工具真的能揭示文本到图像模型中的性别偏见吗？	Yunbo Lyu, Zhou Yang, Yuqing Niu, Jing Jiang, David Lo	http://arxiv.org/pdf/2501.15775v1	None

Model Optimization

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	BiFold: Bimanual Cloth Folding with Language Guidance	BiFold：语言引导下的双手布料折叠	Oriol Barbany, Adrià Colomé, Carme Torras	http://arxiv.org/pdf/2501.16458v1	None
🆕 发布	Return of the Encoder: Maximizing Parameter Efficiency for SLMs	编码器归来：最大化SLMs的参数效率	Mohamed Elfeki, Rui Liu, Chad Voegele	http://arxiv.org/pdf/2501.16273v2	None
🆕 发布	Distilling foundation models for robust and efficient models in digital pathology	从基础模型中提炼出数字病理学中的鲁棒和高效模型	Alexandre Filiot, Nicolas Dop, Oussama Tchita, Auriane Riou, Rémy Dubois, Thomas Peeters, Daria Valter, Marin Scalbert et al.	http://arxiv.org/pdf/2501.16239v2	None
🆕 发布	Rethinking the Bias of Foundation Model under Long-tailed Distribution	重新思考长尾分布下基础模型的偏差	Jiahao Chen, Bin Qin, Jiangmeng Li, Hao Chen, Bing Su	http://arxiv.org/pdf/2501.15955v1	None
🆕 发布	Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks	任意到任意Tryon：利用自适应位置嵌入实现多功能的虚拟服装任务	Hailong Guo, Bohan Zeng, Yiren Song, Wentao Zhang, Chuang Zhang, Jiaming Liu	http://arxiv.org/pdf/2501.15891v1	https://logn-2024.github.io/Any2anyTryonProjectPage
🆕 发布	Controllable Hand Grasp Generation for HOI and Efficient Evaluation Methods	面向手物交互（HOI）的可控手部抓取生成与高效评估方法	Ishant, Rongliang Wu, Joo Hwee Lim	http://arxiv.org/pdf/2501.15839v1	None
📝 更新	Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning	通过互补掩码实现隐式位置-标题对齐的弱监督密集视频字幕生成	Shiping Ge, Qiang Chen, Zhiwei Jiang, Yafeng Yin, Liu Qin, Ziyao Chen, Qing Gu	http://arxiv.org/pdf/2412.12791v2	None
📝 更新	CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes	CAFuser：基于条件感知的多模态融合，用于驾驶场景的鲁棒语义感知	Tim Broedermann, Christos Sakaridis, Yuqian Fu, Luc Van Gool	http://arxiv.org/pdf/2410.10791v2	https://github.com/timbroed/CAFuser
📝 更新	JAM: A Comprehensive Model for Age Estimation, Verification, and Comparability	JAM：一种用于年龄估计、验证和可比性的综合模型	François David, Alexey A. Novikov, Ruslan Parkhomenko, Artem Voronin, Alix Melchy	http://arxiv.org/pdf/2410.04012v2	None

Medical Applications

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Multi-Objective Deep-Learning-based Biomechanical Deformable Image Registration with MOREA	基于多目标深度学习的生物力学可变形图像配准：使用MOREA	Georgios Andreadis, Eduard Ruiz Munné, Thomas H. W. Bäck, Peter A. N. Bosman, Tanja Alderliesten	http://arxiv.org/pdf/2501.16525v1	None
🆕 发布	Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM	基于大型语言模型生成零样本罕见事件医学图像分类的定制提示	Payal Kamboj, Ayan Banerjee, Bin Xu, Sandeep Gupta	http://arxiv.org/pdf/2501.16481v1	None
🆕 发布	Object Detection for Medical Image Analysis: Insights from the RT-DETR Model	医学图像分析中的目标检测：RT-DETR模型见解	Weijie He, Yuwei Zhang, Ting Xu, Tai An, Yingbin Liang, Bo Zhang	http://arxiv.org/pdf/2501.16469v1	None
🆕 发布	Adaptive Iterative Compression for High-Resolution Files: an Approach Focused on Preserving Visual Quality in Cinematic Workflows	自适应迭代压缩：一种专注于电影制作流程中保持视觉质量的解决方案	Leonardo Melo, Filipe Litaiff	http://arxiv.org/pdf/2501.16319v1	None
🆕 发布	Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models	脑适配器：通过适配器调优的多模态大型语言模型增强神经系统疾病分析	Jing Zhang, Xiaowei Yu, Yanjun Lyu, Lu Zhang, Tong Chen, Chao Cao, Yan Zhuang, Minheng Chen et al.	http://arxiv.org/pdf/2501.16282v1	None
🆕 发布	CLISC: Bridging clip and sam by enhanced cam for unsupervised brain tumor segmentation	CLISC：通过增强CAM连接CLIP和SAM以实现无监督脑肿瘤分割	Xiaochuan Ma, Jia Fu, Wenjun Liao, Shichuan Zhang, Guotai Wang	http://arxiv.org/pdf/2501.16246v1	None
🆕 发布	Real-Time Brain Tumor Detection in Intraoperative Ultrasound Using YOLO11: From Model Training to Deployment in the Operating Room	实时术中超声脑肿瘤检测：基于YOLO11从模型训练到手术室部署	Santiago Cepeda, Olga Esteban-Sinovas, Roberto Romero, Vikas Singh, Prakash Shetty, Aliasgar Moiyadi, Ilyess Zemmoura, Giuseppe Roberto Giammalva et al.	http://arxiv.org/pdf/2501.15994v1	None
🆕 发布	Pfungst and Clever Hans: Identifying the unintended cues in a widely used Alzheimer's disease MRI dataset using explainable deep learning	普芬施和聪明的汉斯：利用可解释深度学习识别广泛使用的阿尔茨海默病MRI数据集中的非预期线索	Christian Tinauer, Maximilian Sackl, Rudolf Stollberger, Stefan Ropele, Christian Langkammer	http://arxiv.org/pdf/2501.15831v1	None
🆕 发布	Z-Stack Scanning can Improve AI Detection of Mitosis: A Case Study of Meningiomas	Z-Stack扫描可提升AI对有丝分裂的检测：脑膜瘤案例分析	Hongyan Gu, Ellie Onstott, Wenzhong Yan, Tengyou Xu, Ruolin Wang, Zida Wu, Xiang 'Anthony' Chen, Mohammad Haeri	http://arxiv.org/pdf/2501.15743v1	None
🆕 发布	Leveraging Video Vision Transformer for Alzheimer's Disease Diagnosis from 3D Brain MRI	利用视频视觉Transformer从3D脑MRI诊断阿尔茨海默病	Taymaz Akan, Sait Alp, Md. Shenuarin Bhuiyan, Elizabeth A. Disbrow, Steven A. Conrad, John A. Vanchiere, Christopher G. Kevil, Mohammad A. N. Bhuiyan	http://arxiv.org/pdf/2501.15733v1	None
🆕 发布	A Survey on Computational Pathology Foundation Models: Datasets, Adaptation Strategies, and Evaluation Tasks	计算病理学基础模型综述：数据集、自适应策略和评估任务	Dong Li, Guihong Wan, Xintao Wu, Xinyu Wu, Ajit J. Nirmal, Christine G. Lian, Peter K. Sorger, Yevgeniy R. Semenov et al.	http://arxiv.org/pdf/2501.15724v1	None
🆕 发布	SeqSeg: Learning Local Segments for Automatic Vascular Model Construction	SeqSeg：学习局部段以自动构建血管模型	Numi Sveinsson Cepero, Shawn C. Shadden	http://arxiv.org/pdf/2501.15712v1	None
📝 更新	FedDAG: Federated Domain Adversarial Generation Towards Generalizable Medical Image Analysis	联邦域对抗生成以实现可泛化医学图像分析：FedDAG	Haoxuan Che, Yifei Wu, Haibo Jin, Yong Xia, Hao Chen	http://arxiv.org/pdf/2501.13967v2	None
📝 更新	Slot-BERT: Self-supervised Object Discovery in Surgical Video	槽位BERT：手术视频中的自监督物体发现	Guiqiu Liao, Matjaz Jogan, Marcel Hussing, Kenta Nakahashi, Kazuhiro Yasufuku, Amin Madani, Eric Eaton, Daniel A. Hashimoto	http://arxiv.org/pdf/2501.12477v2	None
📝 更新	Multi-Tiered Self-Contrastive Learning for Medical Microwave Radiometry (MWR) Breast Cancer Detection	多层自对比学习在医学微波辐射计（MWR）乳腺癌检测中的应用	Christoforos Galazis, Huiyi Wu, Igor Goryanin	http://arxiv.org/pdf/2410.04636v2	https://github.com/cgalaz01/self_contrastive_mwr
📝 更新	MSDet: Receptive Field Enhanced Multiscale Detection for Tiny Pulmonary Nodule	MSDet：针对微小肺结节的多尺度检测感受野增强方法	Guohui Cai, Ruicheng Zhang, Hongyang He, Zeyu Zhang, Daji Ergu, Yuanzhouhan Cao, Jinman Zhao, Binbin Hu et al.	http://arxiv.org/pdf/2409.14028v2	https://github.com/CaiGuoHui123/MSDet
📝 更新	Generative Adversarial Networks in Ultrasound Imaging: Extending Field of View Beyond Conventional Limits	超声成像中生成对抗网络：扩展视场超越传统限制	Matej Gazda, Samuel Kadoury, Jakub Gazda, Peter Drotar	http://arxiv.org/pdf/2405.20981v2	None
📝 更新	MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis	MedPromptX：用于胸部X光诊断的基于事实依据的多模态提示	Mai A. Shaaban, Adnan Khan, Mohammad Yaqub	http://arxiv.org/pdf/2403.15585v4	https://github.com/BioMedIA-MBZUAI/MedPromptX

Other

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
📝 更新	Evaluation of GPT-4o and GPT-4o-mini's Vision Capabilities for Compositional Analysis from Dried Solution Drops	GPT-4o和GPT-4o-mini在干溶液滴组合分析中的视觉能力评估	Deven B. Dangi, Beni B. Dangi, Oliver Steinbock	http://arxiv.org/pdf/2412.10587v2	None
