[UPDATED!] 2025-01-28 (Update Time)

Image Understanding

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Influence of field of view in visual prostheses design: Analysis with a VR system	视觉假肢设计中对视场角的影响：VR系统分析	Melani Sanchez-Garcia, Ruben Martinez-Cantin, Jesus Bermudez-Cameo, Jose J. Guerrero	http://arxiv.org/pdf/2501.17322v1	None
🆕 发布	SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training	SFT记忆，RL泛化：基础模型后训练的比较研究	Tianzhe Chu, Yuexiang Zhai, Jihan Yang, Shengbang Tong, Saining Xie, Dale Schuurmans, Quoc V. Le, Sergey Levine et al.	http://arxiv.org/pdf/2501.17161v1	None
🆕 发布	DFCon: Attention-Driven Supervised Contrastive Learning for Robust Deepfake Detection	DFCon：基于注意力的监督对比学习用于鲁棒深度伪造检测	MD Sadik Hossain Shanto, Mahir Labib Dihan, Souvik Ghosh, Riad Ahmed Anonto, Hafijul Hoque Chowdhury, Abir Muhtasim, Rakib Ahsan, MD Tanvir Hassan et al.	http://arxiv.org/pdf/2501.16704v1	None
🆕 发布	Determining Mosaic Resilience in Sugarcane Plants using Hyperspectral Images	利用高光谱图像确定甘蔗植株的马赛克抗性	Ali Zia, Jun Zhou, Muyiwa Olayemi	http://arxiv.org/pdf/2501.16700v1	None
🆕 发布	Improving Interpretability and Accuracy in Neuro-Symbolic Rule Extraction Using Class-Specific Sparse Filters	基于类特定稀疏滤波器提升神经符号规则提取的可解释性和准确性	Parth Padalkar, Jaeseong Lee, Shiyi Wei, Gopal Gupta	http://arxiv.org/pdf/2501.16677v1	None
🆕 发布	Unsupervised Domain Adaptation with Dynamic Clustering and Contrastive Refinement for Gait Recognition	无监督领域自适应：基于动态聚类和对比精炼的人体步态识别	Xiaolei Liu, Yan Sun, Mark Nixon	http://arxiv.org/pdf/2501.16608v1	https://github.com/YanSun-github/GaitDCCR
📝 更新	Audio-Visual Deepfake Detection With Local Temporal Inconsistencies	基于局部时间不一致性的音视频深度伪造检测	Marcella Astrid, Enjie Ghorbel, Djamila Aouada	http://arxiv.org/pdf/2501.08137v2	None

Detection and Segmentation

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	WASUP: Interpretable Classification with Weight-Input Alignment and Class-Discriminative SUPports Vectors	WASUP：基于权重输入对齐和类判别支持向量的可解释分类	Tom Nuno Wolf, Christian Wachinger	http://arxiv.org/pdf/2501.17328v1	None
🆕 发布	A Contrastive Teacher-Student Framework for Novelty Detection under Style Shifts	对比式风格变化下的新颖性检测教师-学生框架	Hossein Mirzaei, Mojtaba Nafez, Moein Madadi, Arad Maleki, Mahdi Hajialilue, Zeinab Sadat Taghavi, Sepehr Rezaee, Ali Ansari et al.	http://arxiv.org/pdf/2501.17289v1	None
🆕 发布	DINOSTAR: Deep Iterative Neural Object Detector Self-Supervised Training for Roadside LiDAR Applications	深迭代神经目标检测器自监督训练，适用于道路侧激光雷达应用	Muhammad Shahbaz, Shaurya Agarwal	http://arxiv.org/pdf/2501.17076v1	None
🆕 发布	Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding	弱监督时空视频定位的上下文自定步长学习	Akash Kumar, Zsolt Kira, Yogesh Singh Rawat	http://arxiv.org/pdf/2501.17053v1	None
🆕 发布	MAUCell: An Adaptive Multi-Attention Framework for Video Frame Prediction	MAUCell：一种自适应多注意力框架的视频帧预测	Shreyam Gupta, P. Agrawal, Priyam Gupta	http://arxiv.org/pdf/2501.16997v1	None
🆕 发布	Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection	利用预训练的ViT表示调节CNN特征进行开放词汇物体检测	Xiangyu Gao, Yu Dai, Benliu Qiu, Hongliang Li	http://arxiv.org/pdf/2501.16981v1	None
🆕 发布	Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models	超越标签：利用视觉-语言模型推进开放词汇分割	Muhammad Atta ur Rahman	http://arxiv.org/pdf/2501.16769v2	None
🆕 发布	AdaSemSeg: An Adaptive Few-shot Semantic Segmentation of Seismic Facies	AdaSemSeg：一种自适应的地震岩性少样本语义分割	Surojit Saha, Ross Whitaker	http://arxiv.org/pdf/2501.16760v1	None
🆕 发布	DebugAgent: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging	DebugAgent：高效且可解释的错误切片发现，用于全面模型调试	Muxi Chen, Chenchen Zhao, Qiang Xu	http://arxiv.org/pdf/2501.16751v1	None
🆕 发布	CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors	CSPCL：基于可变形DETR的违禁物品检测器类别语义先验对比学习	Mingyuan Li, Tong Jia, Hui Lu, Bowen Ma, Hao Wang, Dongyue Chen	http://arxiv.org/pdf/2501.16665v1	None
🆕 发布	Vision-based autonomous structural damage detection using data-driven methods	基于视觉的驱动数据方法自主结构损伤检测	Seyyed Taghi Ataei, Parviz Mohammad Zadeh, Saeid Ataei	http://arxiv.org/pdf/2501.16662v2	None
📝 更新	SpikSSD: Better Extraction and Fusion for Object Detection with Spiking Neuron Networks	SpikSSD：基于脉冲神经网络的对象检测中的更好提取与融合	Yimeng Fan, Changsong Liu, Mingyang Li, Wei Zhang	http://arxiv.org/pdf/2501.15151v2	https://github.com/yimeng-fan/SpikSSD
📝 更新	Proto-OOD: Enhancing OOD Object Detection with Prototype Feature Similarity	Proto-OOD：利用原型特征相似性增强OOD目标检测	Junkun Chen, Jilin Mei, Liang Chen, Fangzhou Zhao, Yan Xing, Yu Hu	http://arxiv.org/pdf/2409.05466v2	None
📝 更新	Weakly-Supervised Learning via Multi-Lateral Decoder Branching for Tool Segmentation in Robot-Assisted Cardiovascular Catheterization	基于多侧解码分支的弱监督学习在机器人辅助心血管导管消融工具分割中的应用	Olatunji Mumini Omisore, Toluwanimi Akinyemi, Anh Nguyen, Lei Wang	http://arxiv.org/pdf/2404.07594v2	None
📝 更新	A Deep Learning-Based Unified Framework for Red Lesions Detection on Retinal Fundus Images	基于深度学习的视网膜眼底图像红病变检测统一框架	Norah Asiri, Muhammad Hussain, Fadwa Al Adel	http://arxiv.org/pdf/2109.05021v5	None
📝 更新	Conterfactual Generative Zero-Shot Semantic Segmentation	反事实生成零样本语义分割	Feihong Shen, Jun Liu, Ping Hu	http://arxiv.org/pdf/2106.06360v2	None
📝 更新	Semantic and structural image segmentation for prosthetic vision	语义和结构图像分割用于假肢视觉	Melani Sanchez-Garcia, Ruben Martinez-Cantin, Jose J. Guerrero	http://arxiv.org/pdf/1809.09607v3	None

Video Understanding

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Extending Information Bottleneck Attribution to Video Sequences	扩展信息瓶颈归因到视频序列	Veronika Solopova, Lucas Schmidt, Dorothea Kolossa	http://arxiv.org/pdf/2501.16889v1	None
🆕 发布	Overcoming Semantic Dilution in Transformer-Based Next Frame Prediction	克服基于Transformer的下一帧预测中的语义稀释问题	Hy Nguyen, Srikanth Thudumu, Hung Du, Rajesh Vasa, Kon Mouzakis	http://arxiv.org/pdf/2501.16753v1	None
📝 更新	Uni-Sign: Toward Unified Sign Language Understanding at Scale	统一手语理解：迈向大规模统一	Zecheng Li, Wengang Zhou, Weichao Zhao, Kepeng Wu, Hezhen Hu, Houqiang Li	http://arxiv.org/pdf/2501.15187v2	https://github.com/ZechengLi19/Uni-Sign

Generative Models

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	DebiasPI: Inference-time Debiasing by Prompt Iteration of a Text-to-Image Generative Model	DebiasPI：通过文本到图像生成模型的提示迭代进行推理时去偏	Sarah Bonna, Yu-Cheng Huang, Ekaterina Novozhilova, Sejin Paik, Zhengyang Shan, Michelle Yilin Feng, Ge Gao, Yonish Tayal et al.	http://arxiv.org/pdf/2501.18642v1	None
🆕 发布	CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation	立方差异：将基于扩散的图像模型重新用于全景生成	Nikolai Kalischek, Michael Oechsle, Fabian Manhardt, Philipp Henzler, Konrad Schindler, Federico Tombari	http://arxiv.org/pdf/2501.17162v1	None
🆕 发布	Text-to-Image Generation for Vocabulary Learning Using the Keyword Method	基于关键词方法的文本到图像生成用于词汇学习	Nuwan T. Attygalle, Matjaž Kljun, Aaron Quigley, Klen Čopič Pucihar, Jens Grubert, Verena Biener, Luis A. Leiva, Juri Yoneyama et al.	http://arxiv.org/pdf/2501.17099v1	None
🆕 发布	DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation	DiffSplat：重用图像扩散模型以实现可扩展高斯喷溅生成	Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, Yadong Mu	http://arxiv.org/pdf/2501.16764v1	None
🆕 发布	ITVTON:Virtual Try-On Diffusion Transformer Model Based on Integrated Image and Text	ITVTON：基于集成图像和文本的虚拟试穿扩散Transformer模型	Haifeng Ni	http://arxiv.org/pdf/2501.16757v1	None
🆕 发布	Separate Motion from Appearance: Customizing Motion via Customizing Text-to-Video Diffusion Models	从外观中分离运动：通过定制文本到视频扩散模型定制运动	Huijie Liu, Jingyun Wang, Shuai Ma, Jie Hu, Xiaoming Wei, Guoliang Kang	http://arxiv.org/pdf/2501.16714v1	None
📝 更新	Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation	基于槽位引导的预训练扩散模型在对象中心学习和组合生成中的应用	Adil Kaan Akan, Yucel Yemez	http://arxiv.org/pdf/2501.15878v2	https://kaanakan.github.io/SlotAdapt
📝 更新	StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning	稳定材料：通过半监督学习增强材料生成多样性	Giuseppe Vecchio	http://arxiv.org/pdf/2406.09293v3	None

Diffusion Bridges

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Adversarial Masked Autoencoder Purifier with Defense Transferability	对抗性掩码自编码器净化器与防御迁移性	Yuan-Chih Chen, Chun-Shien Lu	http://arxiv.org/pdf/2501.16904v1	None
📝 更新	Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion	统一渲染与逆渲染：通过双流扩散实现	Zhifei Chen, Tianshuo Xu, Wenhang Ge, Leyi Wu, Dongyu Yan, Jing He, Luozhou Wang, Lu Zeng et al.	http://arxiv.org/pdf/2412.15050v3	None

Image Processing

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Scenario Understanding of Traffic Scenes Through Large Visual Language Models	通过大型视觉语言模型理解交通场景的场景感知	Rivera Esteban, Lübberstedt Jannik, Nico Uhlemann, Markus Lienkamp	http://arxiv.org/pdf/2501.17131v1	None
🆕 发布	RODEO: Robust Outlier Detection via Exposing Adaptive Out-of-Distribution Samples	RODEO：通过暴露自适应异常值样本实现鲁棒异常值检测	Hossein Mirzaei, Mohammad Jafari, Hamid Reza Dehbashi, Ali Ansari, Sepehr Ghobadi, Masoud Hadi, Arshia Soltani Moakhar, Mohammad Azizmalayeri et al.	http://arxiv.org/pdf/2501.16971v1	None
🆕 发布	Image-based Geo-localization for Robotics: Are Black-box Vision-Language Models there yet?	基于图像的机器人地理定位：黑盒视觉-语言模型是否已经到来？	Sania Waheed, Bruno Ferrarini, Michael Milford, Sarvapali D. Ramchurn, Shoaib Ehsan	http://arxiv.org/pdf/2501.16947v1	None
📝 更新	SPECIAL: Zero-shot Hyperspectral Image Classification With CLIP	特别篇：基于CLIP的零样本高光谱图像分类	Li Pang, Jing Yao, Kaiyu Li, Xiangyong Cao	http://arxiv.org/pdf/2501.16222v2	https://github.com/LiPang/SPECIAL
📝 更新	The Hatching-Box: A Novel System for Automated Monitoring and Quantification of Drosophila melanogaster Developmental Behavior	孵化箱：一种用于自动监测和量化黑腹果蝇发育行为的创新系统	Julian Bigge, Maite Ogueta, Luis Garcia, Benjamin Risse	http://arxiv.org/pdf/2411.15390v3	None
📝 更新	Cauchy activation function and XNet	柯西激活函数与XNet	Xin Li, Zhihong Xia, Hongkun Zhang	http://arxiv.org/pdf/2409.19221v2	None
📝 更新	FlexCap: Describe Anything in Images in Controllable Detail	FlexCap：以可控细节描述图像中的任何内容	Debidatta Dwibedi, Vidhi Jain, Jonathan Tompson, Andrew Zisserman, Yusuf Aytar	http://arxiv.org/pdf/2403.12026v2	None

3D Scenes

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Synthesizing 3D Abstractions by Inverting Procedural Buildings with Transformers	通过逆变换程序化建筑生成3D抽象	Maximilian Dax, Jordi Berbel, Jan Stria, Leonidas Guibas, Urs Bergmann	http://arxiv.org/pdf/2501.17044v2	None
🆕 发布	Consistency Diffusion Models for Single-Image 3D Reconstruction with Priors	一致性扩散模型在具有先验知识的单图像3D重建中的应用	Chenru Jiang, Chengrui Zhang, Xi Yang, Jie Sun, Yifei Zhang, Bin Dong, Kaizhu Huang	http://arxiv.org/pdf/2501.16737v2	None
📝 更新	Automatic Calibration of a Multi-Camera System with Limited Overlapping Fields of View for 3D Surgical Scene Reconstruction	多摄像头系统有限重叠视场自动校准用于三维手术场景重建	Tim Flückiger, Jonas Hein, Valery Fischer, Philipp Fürnstahl, Lilian Calvet	http://arxiv.org/pdf/2501.16221v2	None
📝 更新	Acquiring Submillimeter-Accurate Multi-Task Vision Datasets for Computer-Assisted Orthopedic Surgery	获取用于计算机辅助骨科手术的亚毫米级多任务视觉数据集	Emma Most, Jonas Hein, Frédéric Giraud, Nicola A. Cavalcanti, Lukas Zingg, Baptiste Brument, Nino Louman, Fabio Carrillo et al.	http://arxiv.org/pdf/2501.15371v2	None
📝 更新	PokeFlex: A Real-World Dataset of Volumetric Deformable Objects for Robotics	PokeFlex：一个用于机器人的真实世界体积可变形物体数据集	Jan Obrist, Miguel Zamora, Hehui Zheng, Ronan Hinchet, Firat Ozdemir, Juan Zarate, Robert K. Katzschmann, Stelian Coros	http://arxiv.org/pdf/2410.07688v2	None
📝 更新	Manydepth2: Motion-Aware Self-Supervised Multi-Frame Monocular Depth Estimation in Dynamic Scenes	Manydepth2：动态场景中的运动感知自监督多帧单目深度估计	Kaichen Zhou, Jia-Wang Bian, Jian-Qing Zheng, Jiaxing Zhong, Qian Xie, Niki Trigoni, Andrew Markham	http://arxiv.org/pdf/2312.15268v8	https://github.com/kaichen-z/Manydepth2
📝 更新	iMatching: Imperative Correspondence Learning	iMatching：命令式对应学习	Zitong Zhan, Dasong Gao, Yun-Jou Lin, Youjie Xia, Chen Wang	http://arxiv.org/pdf/2312.02141v3	None

Neural Rendering

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Image Velocimetry using Direct Displacement Field estimation with Neural Networks for Fluids	基于神经网络直接位移场估计的流体图像速度场测量	Efraín Magaña, Francisco Sahli Costabal, Wernher Brevis	http://arxiv.org/pdf/2501.18641v1	None
🆕 发布	What Really Matters for Learning-based LiDAR-Camera Calibration	基于学习的激光雷达-相机标定真正重要的事情	Shujuan Huang, Chunyu Lin, Yao Zhao	http://arxiv.org/pdf/2501.16969v1	None
📝 更新	LinPrim: Linear Primitives for Differentiable Volumetric Rendering	线性基元：可微分体渲染的线性原语	Nicolas von Lützow, Matthias Nießner	http://arxiv.org/pdf/2501.16312v2	None
📝 更新	Efficiency Bottlenecks of Convolutional Kolmogorov-Arnold Networks: A Comprehensive Scrutiny with ImageNet, AlexNet, LeNet and Tabular Classification	卷积柯尔莫哥洛夫-阿诺德网络效率瓶颈：基于ImageNet、AlexNet、LeNet和表格分类的全面审视	Ashim Dahal, Saydul Akbar Murad, Nick Rahimi	http://arxiv.org/pdf/2501.15757v2	https://github.com/ashimdahal/Study-of-Convolutional-Kolmogorov-Arnold-networks
📝 更新	NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields	NeRAF：3D场景融合神经辐射场和声场	Amandine Brunetto, Sascha Hornauer, Fabien Moutarde	http://arxiv.org/pdf/2405.18213v3	None

3DGS

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Evaluating CrowdSplat: Perceived Level of Detail for Gaussian Crowds	评估CrowdSplat：高斯人群的感知细节级别	Xiaohan Sun, Yinghan Xu, John Dingliana, Carol O'Sullivan	http://arxiv.org/pdf/2501.17085v1	None
📝 更新	LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes	LUDVIG：无需学习的二维视觉特征提升至高斯分层场景	Juliette Marrie, Romain Menegaux, Michael Arbel, Diane Larlus, Julien Mairal	http://arxiv.org/pdf/2410.14462v4	None

Multimodal

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait	IC-Portrait：基于上下文的匹配以实现视角一致的个人肖像	Han Yang, Enis Simsar, Sotiris Anagnostidis, Yanlong Zang, Thomas Hofmann, Ziwei Liu	http://arxiv.org/pdf/2501.17159v2	None
🆕 发布	Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding	探索显式时间建模在多模态大型语言模型视频理解中的作用	Yun Li, Zhe Liu, Yajing Kong, Guangrui Li, Jiyuan Zhang, Chao Bian, Feng Liu, Lina Yao et al.	http://arxiv.org/pdf/2501.16786v1	None
🆕 发布	3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow	3D-MoE：一种通过校正流进行3D视觉和姿态扩散的多模态专家混合模型	Yueen Ma, Yuzheng Zhuang, Jianye Hao, Irwin King	http://arxiv.org/pdf/2501.16698v1	None
🆕 发布	CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs	CHiP：多模态LLMs的跨模态层次直接偏好优化	Jinlan Fu, Shenzhen Huangfu, Hao Fei, Xiaoyu Shen, Bryan Hooi, Xipeng Qiu, See-Kiong Ng	http://arxiv.org/pdf/2501.16629v1	https://github.com/LVUGAI/CHiP
📝 更新	VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding	视频LLaMA 3：图像和视频理解的领先多模态基础模型	Boqiang Zhang, Kehan Li, Zesen Cheng, Zhiqiang Hu, Yuqian Yuan, Guanzheng Chen, Sicong Leng, Yuming Jiang et al.	http://arxiv.org/pdf/2501.13106v3	None

Embodied Intelligence

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Machine learning of microstructure--property relationships in materials with robust features from foundational vision transformers	材料中基于基础视觉Transformer的稳健特征微结构-性能关系机器学习	Sheila E. Whitman, Marat I. Latypov	http://arxiv.org/pdf/2501.18637v1	None
🆕 发布	EdgeMLOps: Operationalizing ML models with Cumulocity IoT and thin-edge.io for Visual quality Inspection	边缘MLOps：利用Cumulocity IoT和thin-edge.io实现机器学习模型在视觉质量检测中的运营	Kanishk Chaturvedi, Johannes Gasthuber, Mohamed Abdelaal	http://arxiv.org/pdf/2501.17062v1	None
🆕 发布	RG-Attn: Radian Glue Attention for Multi-modality Multi-agent Cooperative Perception	RG-Attn：多模态多智能体协同感知的径向粘合注意力	Lantao Li, Kang Yang, Wenqi Zhang, Xiaoxue Wang, Chen Sun	http://arxiv.org/pdf/2501.16803v1	None
🆕 发布	SSF-PAN: Semantic Scene Flow-Based Perception for Autonomous Navigation in Traffic Scenarios	SSF-PAN：基于语义场景流的交通场景自主导航感知	Yinqi Chen, Meiying Zhang, Qi Hao, Guang Zhou	http://arxiv.org/pdf/2501.16754v1	None
🆕 发布	Dream to Drive with Predictive Individual World Model	梦境驾驶：基于预测性个体世界模型的驾驶	Yinfeng Gao, Qichao Zhang, Da-wei Ding, Dongbin Zhao	http://arxiv.org/pdf/2501.16733v1	None
🆕 发布	One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning	一头八臂：基于块矩阵的低秩自适应方法在CLIP基础上的小样本学习	Chunpeng Zhou, Qianqian Shen, Zhi Yu, Jiajun Bu, Haishuai Wang	http://arxiv.org/pdf/2501.16720v1	None
🆕 发布	SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation	SliceOcc：基于垂直切片表示的室内3D语义占用预测	Jianing Li, Ming Lu, Hao Wang, Chenyang Gu, Wenzhao Zheng, Li Du, Shanghang Zhang	http://arxiv.org/pdf/2501.16684v1	https://github.com/NorthSummer/SliceOcc
🆕 发布	Improving Vision-Language-Action Model with Online Reinforcement Learning	基于在线强化学习的视觉-语言-动作模型改进	Yanjiang Guo, Jianke Zhang, Xiaoyu Chen, Xiang Ji, Yen-Jen Wang, Yucheng Hu, Jianyu Chen	http://arxiv.org/pdf/2501.16664v1	None
🆕 发布	Predicting 3D representations for Dynamic Scenes	预测动态场景的3D表示	Di Qi, Tong Yang, Beining Wang, Xiangyu Zhang, Wenqiang Zhang	http://arxiv.org/pdf/2501.16617v1	None
📝 更新	Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks	移动智能体-E：用于复杂任务的自我进化移动助手	Zhenhailong Wang, Haiyang Xu, Junyang Wang, Xi Zhang, Ming Yan, Ji Zhang, Fei Huang, Heng Ji	http://arxiv.org/pdf/2501.11733v2	https://x-plug.github.io/MobileAgent
📝 更新	Competency-Aware Planning for Probabilistically Safe Navigation Under Perception Uncertainty	感知不确定性下的概率安全导航的胜任力感知规划	Sara Pohland, Claire Tomlin	http://arxiv.org/pdf/2409.06111v4	None

Face Technology

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	B-FPGM: Lightweight Face Detection via Bayesian-Optimized Soft FPGM Pruning	B-FPGM：基于贝叶斯优化的软FPGM剪枝的轻量级人脸检测	Nikolaos Kaparinos, Vasileios Mezaris	http://arxiv.org/pdf/2501.16917v1	https://github.com/IDTITI/B-FPGM
🆕 发布	Frequency Matters: Explaining Biases of Face Recognition in the Frequency Domain	频率决定一切：解释频域中人脸识别的偏差	Marco Huber, Fadi Boutros, Naser Damer	http://arxiv.org/pdf/2501.16896v1	None
🆕 发布	Experimenting with Affective Computing Models in Video Interviews with Spanish-speaking Older Adults	在西班牙语老年人视频面试中实验情感计算模型	Josep Lopez Camunas, Cristina Bustos, Yanjun Zhu, Raquel Ros, Agata Lapedriza	http://arxiv.org/pdf/2501.16870v1	None
🆕 发布	B-RIGHT: Benchmark Re-evaluation for Integrity in Generalized Human-Object Interaction Testing	B-RIGHT：广义人-物交互测试中完整性的基准重新评估	Yoojin Jang, Junsu Kim, Hayeon Kim, Eun-ki Lee, Eun-sol Kim, Seungryul Baek, Jaejun Yoo	http://arxiv.org/pdf/2501.16724v1	None
📝 更新	EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face Animation	情感面孔：情感-内容解耦的语音驱动3D说话人脸动画	Yihong Lin, Liang Peng, Xianjia Wu, Jianqiao Hu, Xiandong Li, Wenxiong Kang, Songju Lei, Huang Xu	http://arxiv.org/pdf/2408.11518v2	None

Digital Humans

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Towards Understanding Depth Perception in Foveated Rendering	朝向理解注视点渲染中的深度感知	Sophie Kergaßner, Taimoor Tariq, Piotr Didyk	http://arxiv.org/pdf/2501.18635v1	None
🆕 发布	Not Every Patch is Needed: Towards a More Efficient and Effective Backbone for Video-based Person Re-identification	并非每个补丁都必不可少：迈向更高效、更有效的基于视频的人体重识别骨干网络	Lanyun Zhu, Tianrun Chen, Deyi Ji, Jieping Ye, Jun Liu	http://arxiv.org/pdf/2501.16811v1	None
🆕 发布	FlexMotion: Lightweight, Physics-Aware, and Controllable Human Motion Generation	轻量级、物理感知且可控的人体运动生成：FlexMotion	Arvin Tashakori, Arash Tashakori, Gongbo Yang, Z. Jane Wang, Peyman Servati	http://arxiv.org/pdf/2501.16778v1	None
📝 更新	GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer	GLDiTalker：基于图潜在扩散变换器的语音驱动3D面部动画	Yihong Lin, Zhaoxin Fan, Xianjia Wu, Lingyu Xiong, Liang Peng, Xiandong Li, Wenxiong Kang, Songju Lei et al.	http://arxiv.org/pdf/2408.01826v3	None

Model Optimization

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Target-driven Self-Distillation for Partial Observed Trajectories Forecasting	基于目标驱动的部分观测轨迹预测的自蒸馏	Pengfei Zhu, Peng Shu, Mengshi Qi, Liang Liu, Huadong Ma	http://arxiv.org/pdf/2501.16767v1	None
🆕 发布	CascadeV: An Implementation of Wurstchen Architecture for Video Generation	级联V：视频生成中Wurstchen架构的实现	Wenfeng Lin, Jiangchuan Wei, Boyuan Liu, Yichen Zhang, Shiyue Yan, Mingyu Guo	http://arxiv.org/pdf/2501.16612v1	https://github.com/bytedance/CascadeV
📝 更新	Distilling foundation models for robust and efficient models in digital pathology	从基础模型中提炼出数字病理学中的鲁棒和高效模型	Alexandre Filiot, Nicolas Dop, Oussama Tchita, Auriane Riou, Rémy Dubois, Thomas Peeters, Daria Valter, Marin Scalbert et al.	http://arxiv.org/pdf/2501.16239v2	None
📝 更新	SelfPrompt: Confidence-Aware Semi-Supervised Tuning for Robust Vision-Language Model Adaptation	自提示：基于置信度的鲁棒视觉-语言模型自适应半监督调优	Shuvendu Roy, Ali Etemad	http://arxiv.org/pdf/2501.14148v2	None
📝 更新	Multi-aspect Knowledge Distillation with Large Language Model	多方面知识蒸馏与大型语言模型	Taegyeong Lee, Jinsik Bang, Soyeong Kwon, Taehwan Kim	http://arxiv.org/pdf/2501.13341v3	None

Medical Applications

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 发布	Post-Training Quantization for 3D Medical Image Segmentation: A Practical Study on Real Inference Engines	3D医学图像分割的培训后量化：针对真实推理引擎的实际研究	Chongyu Qu, Ritchie Zhao, Ye Yu, Bin Liu, Tianyuan Yao, Junchao Zhu, Bennett A. Landman, Yucheng Tang et al.	http://arxiv.org/pdf/2501.17343v1	https://github.com/hrlblab/PTQ
🆕 发布	ViT-2SPN: Vision Transformer-based Dual-Stream Self-Supervised Pretraining Networks for Retinal OCT Classification	ViT-2SPN：基于视觉Transformer的双流自监督预训练网络用于视网膜OCT分类	Mohammadreza Saraei, Igor Kozak, Eung-Joo Lee	http://arxiv.org/pdf/2501.17260v1	None
🆕 发布	A Hybrid Deep Learning CNN Model for Enhanced COVID-19 Detection from Computed Tomography (CT) Scan Images	混合深度学习CNN模型用于增强CT扫描图像的COVID-19检测	Suresh Babu Nettur, Shanthi Karpurapu, Unnati Nettur, Likhit Sagar Gajja, Sravanthy Myneni, Akhil Dusi, Lalithya Posham	http://arxiv.org/pdf/2501.17160v1	None
🆕 发布	VidSole: A Multimodal Dataset for Joint Kinetics Quantification and Disease Detection with Deep Learning	VidSole：一种用于深度学习联合运动学量化与疾病检测的多模态数据集	Archit Kambhamettu, Samantha Snyder, Maliheh Fakhar, Samuel Audia, Ross Miller, Jae Kun Shim, Aniket Bera	http://arxiv.org/pdf/2501.17890v1	None
🆕 发布	FedEFM: Federated Endovascular Foundation Model with Unseen Data	联邦血管基础模型与未见数据	Tuong Do, Nghia Vu, Tudor Jianu, Baoru Huang, Minh Vu, Jionglong Su, Erman Tjiputra, Quang D. Tran et al.	http://arxiv.org/pdf/2501.16992v1	None
🆕 发布	Ultra-high resolution multimodal MRI dense labelled holistic brain atlas	超高清多模态MRI密集标注整体脑图谱	José V. Manjón, Sergio Morell-Ortega, Marina Ruiz-Perez, Boris Mansencal, Edern Le Bot, Marien Gadea, Enrique Lanuza, Gwenaelle Catheline et al.	http://arxiv.org/pdf/2501.16879v1	None
🆕 发布	Dynamic Hypergraph Representation for Bone Metastasis Cancer Analysis	动态超图表示在骨转移癌分析中的应用	Yuxuan Chen, Jiawen Li, Huijuan Shi, Yang Xu, Tian Guan, Lianghui Zhu, Yonghong He, Anjia Han	http://arxiv.org/pdf/2501.16787v1	None
🆕 发布	Efficient Knowledge Distillation of SAM for Medical Image Segmentation	高效的知识蒸馏：SAM在医学图像分割中的应用	Kunal Dasharath Patil, Gowthamaan Palani, Ganapathy Krishnamurthi	http://arxiv.org/pdf/2501.16740v1	None
🆕 发布	Point Cloud Upsampling as Statistical Shape Model for Pelvic	点云上采样作为骨盆统计形状模型	Tongxu Zhang, Bei Wang	http://arxiv.org/pdf/2501.16716v1	None
🆕 发布	Polyp-Gen: Realistic and Diverse Polyp Image Generation for Endoscopic Dataset Expansion	Polyp-Gen：用于内镜数据集扩展的逼真且多样化的息肉图像生成	Shengyuan Liu, Zhen Chen, Qiushi Yang, Weihao Yu, Di Dong, Jiancong Hu, Yixuan Yuan	http://arxiv.org/pdf/2501.16679v2	https://github.com/CUHK-AIM-Group/Polyp-Gen
🆕 发布	Molecular-driven Foundation Model for Oncologic Pathology	分子驱动肿瘤病理学基础模型	Anurag Vaidya, Andrew Zhang, Guillaume Jaume, Andrew H. Song, Tong Ding, Sophia J. Wagner, Ming Y. Lu, Paul Doucet et al.	http://arxiv.org/pdf/2501.16652v1	None
📝 更新	Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Medical Image Reconstruction	可调节条件扩散在医学图像重建中的分布外适应	Riccardo Barbano, Alexander Denker, Hyungjin Chung, Tae Hoon Roh, Simon Arridge, Peter Maass, Bangti Jin, Jong Chul Ye	http://arxiv.org/pdf/2308.14409v3	None
