[UPDATED!] 2025-01-28 (Update Time)

Image Understanding

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | Influence of field of view in visual prostheses design: Analysis with a VR system | Melani Sanchez-Garcia, Ruben Martinez-Cantin, Jesus Bermudez-Cameo, Jose J. Guerrero | http://arxiv.org/pdf/2501.17322v1 | None |
| 🆕 New | SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | Tianzhe Chu, Yuexiang Zhai, Jihan Yang, Shengbang Tong, Saining Xie, Dale Schuurmans, Quoc V. Le, Sergey Levine et al. | http://arxiv.org/pdf/2501.17161v1 | None |
| 🆕 New | DFCon: Attention-Driven Supervised Contrastive Learning for Robust Deepfake Detection | MD Sadik Hossain Shanto, Mahir Labib Dihan, Souvik Ghosh, Riad Ahmed Anonto, Hafijul Hoque Chowdhury, Abir Muhtasim, Rakib Ahsan, MD Tanvir Hassan et al. | http://arxiv.org/pdf/2501.16704v1 | None |
| 🆕 New | Determining Mosaic Resilience in Sugarcane Plants using Hyperspectral Images | Ali Zia, Jun Zhou, Muyiwa Olayemi | http://arxiv.org/pdf/2501.16700v1 | None |
| 🆕 New | Improving Interpretability and Accuracy in Neuro-Symbolic Rule Extraction Using Class-Specific Sparse Filters | Parth Padalkar, Jaeseong Lee, Shiyi Wei, Gopal Gupta | http://arxiv.org/pdf/2501.16677v1 | None |
| 🆕 New | Unsupervised Domain Adaptation with Dynamic Clustering and Contrastive Refinement for Gait Recognition | Xiaolei Liu, Yan Sun, Mark Nixon | http://arxiv.org/pdf/2501.16608v1 | https://github.com/YanSun-github/GaitDCCR |
| 📝 Updated | Audio-Visual Deepfake Detection With Local Temporal Inconsistencies | Marcella Astrid, Enjie Ghorbel, Djamila Aouada | http://arxiv.org/pdf/2501.08137v2 | None |

Detection and Segmentation

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | WASUP: Interpretable Classification with Weight-Input Alignment and Class-Discriminative SUPports Vectors | Tom Nuno Wolf, Christian Wachinger | http://arxiv.org/pdf/2501.17328v1 | None |
| 🆕 New | A Contrastive Teacher-Student Framework for Novelty Detection under Style Shifts | Hossein Mirzaei, Mojtaba Nafez, Moein Madadi, Arad Maleki, Mahdi Hajialilue, Zeinab Sadat Taghavi, Sepehr Rezaee, Ali Ansari et al. | http://arxiv.org/pdf/2501.17289v1 | None |
| 🆕 New | DINOSTAR: Deep Iterative Neural Object Detector Self-Supervised Training for Roadside LiDAR Applications | Muhammad Shahbaz, Shaurya Agarwal | http://arxiv.org/pdf/2501.17076v1 | None |
| 🆕 New | Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Akash Kumar, Zsolt Kira, Yogesh Singh Rawat | http://arxiv.org/pdf/2501.17053v1 | None |
| 🆕 New | MAUCell: An Adaptive Multi-Attention Framework for Video Frame Prediction | Shreyam Gupta, P. Agrawal, Priyam Gupta | http://arxiv.org/pdf/2501.16997v1 | None |
| 🆕 New | Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection | Xiangyu Gao, Yu Dai, Benliu Qiu, Hongliang Li | http://arxiv.org/pdf/2501.16981v1 | None |
| 🆕 New | Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models | Muhammad Atta ur Rahman | http://arxiv.org/pdf/2501.16769v2 | None |
| 🆕 New | AdaSemSeg: An Adaptive Few-shot Semantic Segmentation of Seismic Facies | Surojit Saha, Ross Whitaker | http://arxiv.org/pdf/2501.16760v1 | None |
| 🆕 New | DebugAgent: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging | Muxi Chen, Chenchen Zhao, Qiang Xu | http://arxiv.org/pdf/2501.16751v1 | None |
| 🆕 New | CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors | Mingyuan Li, Tong Jia, Hui Lu, Bowen Ma, Hao Wang, Dongyue Chen | http://arxiv.org/pdf/2501.16665v1 | None |
| 🆕 New | Vision-based autonomous structural damage detection using data-driven methods | Seyyed Taghi Ataei, Parviz Mohammad Zadeh, Saeid Ataei | http://arxiv.org/pdf/2501.16662v2 | None |
| 📝 Updated | SpikSSD: Better Extraction and Fusion for Object Detection with Spiking Neuron Networks | Yimeng Fan, Changsong Liu, Mingyang Li, Wei Zhang | http://arxiv.org/pdf/2501.15151v2 | https://github.com/yimeng-fan/SpikSSD |
| 📝 Updated | Proto-OOD: Enhancing OOD Object Detection with Prototype Feature Similarity | Junkun Chen, Jilin Mei, Liang Chen, Fangzhou Zhao, Yan Xing, Yu Hu | http://arxiv.org/pdf/2409.05466v2 | None |
| 📝 Updated | Weakly-Supervised Learning via Multi-Lateral Decoder Branching for Tool Segmentation in Robot-Assisted Cardiovascular Catheterization | Olatunji Mumini Omisore, Toluwanimi Akinyemi, Anh Nguyen, Lei Wang | http://arxiv.org/pdf/2404.07594v2 | None |
| 📝 Updated | A Deep Learning-Based Unified Framework for Red Lesions Detection on Retinal Fundus Images | Norah Asiri, Muhammad Hussain, Fadwa Al Adel | http://arxiv.org/pdf/2109.05021v5 | None |
| 📝 Updated | Conterfactual Generative Zero-Shot Semantic Segmentation | Feihong Shen, Jun Liu, Ping Hu | http://arxiv.org/pdf/2106.06360v2 | None |
| 📝 Updated | Semantic and structural image segmentation for prosthetic vision | Melani Sanchez-Garcia, Ruben Martinez-Cantin, Jose J. Guerrero | http://arxiv.org/pdf/1809.09607v3 | None |

Video Understanding

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | Extending Information Bottleneck Attribution to Video Sequences | Veronika Solopova, Lucas Schmidt, Dorothea Kolossa | http://arxiv.org/pdf/2501.16889v1 | None |
| 🆕 New | Overcoming Semantic Dilution in Transformer-Based Next Frame Prediction | Hy Nguyen, Srikanth Thudumu, Hung Du, Rajesh Vasa, Kon Mouzakis | http://arxiv.org/pdf/2501.16753v1 | None |
| 📝 Updated | Uni-Sign: Toward Unified Sign Language Understanding at Scale | Zecheng Li, Wengang Zhou, Weichao Zhao, Kepeng Wu, Hezhen Hu, Houqiang Li | http://arxiv.org/pdf/2501.15187v2 | https://github.com/ZechengLi19/Uni-Sign |

Generative Models

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | DebiasPI: Inference-time Debiasing by Prompt Iteration of a Text-to-Image Generative Model | Sarah Bonna, Yu-Cheng Huang, Ekaterina Novozhilova, Sejin Paik, Zhengyang Shan, Michelle Yilin Feng, Ge Gao, Yonish Tayal et al. | http://arxiv.org/pdf/2501.18642v1 | None |
| 🆕 New | CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation | Nikolai Kalischek, Michael Oechsle, Fabian Manhardt, Philipp Henzler, Konrad Schindler, Federico Tombari | http://arxiv.org/pdf/2501.17162v1 | None |
| 🆕 New | Text-to-Image Generation for Vocabulary Learning Using the Keyword Method | Nuwan T. Attygalle, Matjaž Kljun, Aaron Quigley, Klen Čopič Pucihar, Jens Grubert, Verena Biener, Luis A. Leiva, Juri Yoneyama et al. | http://arxiv.org/pdf/2501.17099v1 | None |
| 🆕 New | DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation | Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, Yadong Mu | http://arxiv.org/pdf/2501.16764v1 | None |
| 🆕 New | ITVTON: Virtual Try-On Diffusion Transformer Model Based on Integrated Image and Text | Haifeng Ni | http://arxiv.org/pdf/2501.16757v1 | None |
| 🆕 New | Separate Motion from Appearance: Customizing Motion via Customizing Text-to-Video Diffusion Models | Huijie Liu, Jingyun Wang, Shuai Ma, Jie Hu, Xiaoming Wei, Guoliang Kang | http://arxiv.org/pdf/2501.16714v1 | None |
| 📝 Updated | Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation | Adil Kaan Akan, Yucel Yemez | http://arxiv.org/pdf/2501.15878v2 | https://kaanakan.github.io/SlotAdapt |
| 📝 Updated | StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning | Giuseppe Vecchio | http://arxiv.org/pdf/2406.09293v3 | None |

Diffusion Bridges

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | Adversarial Masked Autoencoder Purifier with Defense Transferability | Yuan-Chih Chen, Chun-Shien Lu | http://arxiv.org/pdf/2501.16904v1 | None |
| 📝 Updated | Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion | Zhifei Chen, Tianshuo Xu, Wenhang Ge, Leyi Wu, Dongyu Yan, Jing He, Luozhou Wang, Lu Zeng et al. | http://arxiv.org/pdf/2412.15050v3 | None |

Image Processing

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | Scenario Understanding of Traffic Scenes Through Large Visual Language Models | Rivera Esteban, Lübberstedt Jannik, Nico Uhlemann, Markus Lienkamp | http://arxiv.org/pdf/2501.17131v1 | None |
| 🆕 New | RODEO: Robust Outlier Detection via Exposing Adaptive Out-of-Distribution Samples | Hossein Mirzaei, Mohammad Jafari, Hamid Reza Dehbashi, Ali Ansari, Sepehr Ghobadi, Masoud Hadi, Arshia Soltani Moakhar, Mohammad Azizmalayeri et al. | http://arxiv.org/pdf/2501.16971v1 | None |
| 🆕 New | Image-based Geo-localization for Robotics: Are Black-box Vision-Language Models there yet? | Sania Waheed, Bruno Ferrarini, Michael Milford, Sarvapali D. Ramchurn, Shoaib Ehsan | http://arxiv.org/pdf/2501.16947v1 | None |
| 📝 Updated | SPECIAL: Zero-shot Hyperspectral Image Classification With CLIP | Li Pang, Jing Yao, Kaiyu Li, Xiangyong Cao | http://arxiv.org/pdf/2501.16222v2 | https://github.com/LiPang/SPECIAL |
| 📝 Updated | The Hatching-Box: A Novel System for Automated Monitoring and Quantification of Drosophila melanogaster Developmental Behavior | Julian Bigge, Maite Ogueta, Luis Garcia, Benjamin Risse | http://arxiv.org/pdf/2411.15390v3 | None |
| 📝 Updated | Cauchy activation function and XNet | Xin Li, Zhihong Xia, Hongkun Zhang | http://arxiv.org/pdf/2409.19221v2 | None |
| 📝 Updated | FlexCap: Describe Anything in Images in Controllable Detail | Debidatta Dwibedi, Vidhi Jain, Jonathan Tompson, Andrew Zisserman, Yusuf Aytar | http://arxiv.org/pdf/2403.12026v2 | None |

3D Scenes

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | Synthesizing 3D Abstractions by Inverting Procedural Buildings with Transformers | Maximilian Dax, Jordi Berbel, Jan Stria, Leonidas Guibas, Urs Bergmann | http://arxiv.org/pdf/2501.17044v2 | None |
| 🆕 New | Consistency Diffusion Models for Single-Image 3D Reconstruction with Priors | Chenru Jiang, Chengrui Zhang, Xi Yang, Jie Sun, Yifei Zhang, Bin Dong, Kaizhu Huang | http://arxiv.org/pdf/2501.16737v2 | None |
| 📝 Updated | Automatic Calibration of a Multi-Camera System with Limited Overlapping Fields of View for 3D Surgical Scene Reconstruction | Tim Flückiger, Jonas Hein, Valery Fischer, Philipp Fürnstahl, Lilian Calvet | http://arxiv.org/pdf/2501.16221v2 | None |
| 📝 Updated | Acquiring Submillimeter-Accurate Multi-Task Vision Datasets for Computer-Assisted Orthopedic Surgery | Emma Most, Jonas Hein, Frédéric Giraud, Nicola A. Cavalcanti, Lukas Zingg, Baptiste Brument, Nino Louman, Fabio Carrillo et al. | http://arxiv.org/pdf/2501.15371v2 | None |
| 📝 Updated | PokeFlex: A Real-World Dataset of Volumetric Deformable Objects for Robotics | Jan Obrist, Miguel Zamora, Hehui Zheng, Ronan Hinchet, Firat Ozdemir, Juan Zarate, Robert K. Katzschmann, Stelian Coros | http://arxiv.org/pdf/2410.07688v2 | None |
| 📝 Updated | Manydepth2: Motion-Aware Self-Supervised Multi-Frame Monocular Depth Estimation in Dynamic Scenes | Kaichen Zhou, Jia-Wang Bian, Jian-Qing Zheng, Jiaxing Zhong, Qian Xie, Niki Trigoni, Andrew Markham | http://arxiv.org/pdf/2312.15268v8 | https://github.com/kaichen-z/Manydepth2 |
| 📝 Updated | iMatching: Imperative Correspondence Learning | Zitong Zhan, Dasong Gao, Yun-Jou Lin, Youjie Xia, Chen Wang | http://arxiv.org/pdf/2312.02141v3 | None |

Neural Rendering

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | Image Velocimetry using Direct Displacement Field estimation with Neural Networks for Fluids | Efraín Magaña, Francisco Sahli Costabal, Wernher Brevis | http://arxiv.org/pdf/2501.18641v1 | None |
| 🆕 New | What Really Matters for Learning-based LiDAR-Camera Calibration | Shujuan Huang, Chunyu Lin, Yao Zhao | http://arxiv.org/pdf/2501.16969v1 | None |
| 📝 Updated | LinPrim: Linear Primitives for Differentiable Volumetric Rendering | Nicolas von Lützow, Matthias Nießner | http://arxiv.org/pdf/2501.16312v2 | None |
| 📝 Updated | Efficiency Bottlenecks of Convolutional Kolmogorov-Arnold Networks: A Comprehensive Scrutiny with ImageNet, AlexNet, LeNet and Tabular Classification | Ashim Dahal, Saydul Akbar Murad, Nick Rahimi | http://arxiv.org/pdf/2501.15757v2 | https://github.com/ashimdahal/Study-of-Convolutional-Kolmogorov-Arnold-networks |
| 📝 Updated | NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields | Amandine Brunetto, Sascha Hornauer, Fabien Moutarde | http://arxiv.org/pdf/2405.18213v3 | None |

3DGS

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | Evaluating CrowdSplat: Perceived Level of Detail for Gaussian Crowds | Xiaohan Sun, Yinghan Xu, John Dingliana, Carol O'Sullivan | http://arxiv.org/pdf/2501.17085v1 | None |
| 📝 Updated | LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes | Juliette Marrie, Romain Menegaux, Michael Arbel, Diane Larlus, Julien Mairal | http://arxiv.org/pdf/2410.14462v4 | None |

Multimodal

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait | Han Yang, Enis Simsar, Sotiris Anagnostidis, Yanlong Zang, Thomas Hofmann, Ziwei Liu | http://arxiv.org/pdf/2501.17159v2 | None |
| 🆕 New | Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding | Yun Li, Zhe Liu, Yajing Kong, Guangrui Li, Jiyuan Zhang, Chao Bian, Feng Liu, Lina Yao et al. | http://arxiv.org/pdf/2501.16786v1 | None |
| 🆕 New | 3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow | Yueen Ma, Yuzheng Zhuang, Jianye Hao, Irwin King | http://arxiv.org/pdf/2501.16698v1 | None |
| 🆕 New | CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs | Jinlan Fu, Shenzhen Huangfu, Hao Fei, Xiaoyu Shen, Bryan Hooi, Xipeng Qiu, See-Kiong Ng | http://arxiv.org/pdf/2501.16629v1 | https://github.com/LVUGAI/CHiP |
| 📝 Updated | VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding | Boqiang Zhang, Kehan Li, Zesen Cheng, Zhiqiang Hu, Yuqian Yuan, Guanzheng Chen, Sicong Leng, Yuming Jiang et al. | http://arxiv.org/pdf/2501.13106v3 | None |

Embodied AI

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | Machine learning of microstructure--property relationships in materials with robust features from foundational vision transformers | Sheila E. Whitman, Marat I. Latypov | http://arxiv.org/pdf/2501.18637v1 | None |
| 🆕 New | EdgeMLOps: Operationalizing ML models with Cumulocity IoT and thin-edge.io for Visual quality Inspection | Kanishk Chaturvedi, Johannes Gasthuber, Mohamed Abdelaal | http://arxiv.org/pdf/2501.17062v1 | None |
| 🆕 New | RG-Attn: Radian Glue Attention for Multi-modality Multi-agent Cooperative Perception | Lantao Li, Kang Yang, Wenqi Zhang, Xiaoxue Wang, Chen Sun | http://arxiv.org/pdf/2501.16803v1 | None |
| 🆕 New | SSF-PAN: Semantic Scene Flow-Based Perception for Autonomous Navigation in Traffic Scenarios | Yinqi Chen, Meiying Zhang, Qi Hao, Guang Zhou | http://arxiv.org/pdf/2501.16754v1 | None |
| 🆕 New | Dream to Drive with Predictive Individual World Model | Yinfeng Gao, Qichao Zhang, Da-wei Ding, Dongbin Zhao | http://arxiv.org/pdf/2501.16733v1 | None |
| 🆕 New | One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning | Chunpeng Zhou, Qianqian Shen, Zhi Yu, Jiajun Bu, Haishuai Wang | http://arxiv.org/pdf/2501.16720v1 | None |
| 🆕 New | SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation | Jianing Li, Ming Lu, Hao Wang, Chenyang Gu, Wenzhao Zheng, Li Du, Shanghang Zhang | http://arxiv.org/pdf/2501.16684v1 | https://github.com/NorthSummer/SliceOcc |
| 🆕 New | Improving Vision-Language-Action Model with Online Reinforcement Learning | Yanjiang Guo, Jianke Zhang, Xiaoyu Chen, Xiang Ji, Yen-Jen Wang, Yucheng Hu, Jianyu Chen | http://arxiv.org/pdf/2501.16664v1 | None |
| 🆕 New | Predicting 3D representations for Dynamic Scenes | Di Qi, Tong Yang, Beining Wang, Xiangyu Zhang, Wenqiang Zhang | http://arxiv.org/pdf/2501.16617v1 | None |
| 📝 Updated | Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks | Zhenhailong Wang, Haiyang Xu, Junyang Wang, Xi Zhang, Ming Yan, Ji Zhang, Fei Huang, Heng Ji | http://arxiv.org/pdf/2501.11733v2 | https://x-plug.github.io/MobileAgent |
| 📝 Updated | Competency-Aware Planning for Probabilistically Safe Navigation Under Perception Uncertainty | Sara Pohland, Claire Tomlin | http://arxiv.org/pdf/2409.06111v4 | None |

Face Technology

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | B-FPGM: Lightweight Face Detection via Bayesian-Optimized Soft FPGM Pruning | Nikolaos Kaparinos, Vasileios Mezaris | http://arxiv.org/pdf/2501.16917v1 | https://github.com/IDTITI/B-FPGM |
| 🆕 New | Frequency Matters: Explaining Biases of Face Recognition in the Frequency Domain | Marco Huber, Fadi Boutros, Naser Damer | http://arxiv.org/pdf/2501.16896v1 | None |
| 🆕 New | Experimenting with Affective Computing Models in Video Interviews with Spanish-speaking Older Adults | Josep Lopez Camunas, Cristina Bustos, Yanjun Zhu, Raquel Ros, Agata Lapedriza | http://arxiv.org/pdf/2501.16870v1 | None |
| 🆕 New | B-RIGHT: Benchmark Re-evaluation for Integrity in Generalized Human-Object Interaction Testing | Yoojin Jang, Junsu Kim, Hayeon Kim, Eun-ki Lee, Eun-sol Kim, Seungryul Baek, Jaejun Yoo | http://arxiv.org/pdf/2501.16724v1 | None |
| 📝 Updated | EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face Animation | Yihong Lin, Liang Peng, Xianjia Wu, Jianqiao Hu, Xiandong Li, Wenxiong Kang, Songju Lei, Huang Xu | http://arxiv.org/pdf/2408.11518v2 | None |

Digital Humans

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | Towards Understanding Depth Perception in Foveated Rendering | Sophie Kergaßner, Taimoor Tariq, Piotr Didyk | http://arxiv.org/pdf/2501.18635v1 | None |
| 🆕 New | Not Every Patch is Needed: Towards a More Efficient and Effective Backbone for Video-based Person Re-identification | Lanyun Zhu, Tianrun Chen, Deyi Ji, Jieping Ye, Jun Liu | http://arxiv.org/pdf/2501.16811v1 | None |
| 🆕 New | FlexMotion: Lightweight, Physics-Aware, and Controllable Human Motion Generation | Arvin Tashakori, Arash Tashakori, Gongbo Yang, Z. Jane Wang, Peyman Servati | http://arxiv.org/pdf/2501.16778v1 | None |
| 📝 Updated | GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer | Yihong Lin, Zhaoxin Fan, Xianjia Wu, Lingyu Xiong, Liang Peng, Xiandong Li, Wenxiong Kang, Songju Lei et al. | http://arxiv.org/pdf/2408.01826v3 | None |

Model Optimization

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | Target-driven Self-Distillation for Partial Observed Trajectories Forecasting | Pengfei Zhu, Peng Shu, Mengshi Qi, Liang Liu, Huadong Ma | http://arxiv.org/pdf/2501.16767v1 | None |
| 🆕 New | CascadeV: An Implementation of Wurstchen Architecture for Video Generation | Wenfeng Lin, Jiangchuan Wei, Boyuan Liu, Yichen Zhang, Shiyue Yan, Mingyu Guo | http://arxiv.org/pdf/2501.16612v1 | https://github.com/bytedance/CascadeV |
| 📝 Updated | Distilling foundation models for robust and efficient models in digital pathology | Alexandre Filiot, Nicolas Dop, Oussama Tchita, Auriane Riou, Rémy Dubois, Thomas Peeters, Daria Valter, Marin Scalbert et al. | http://arxiv.org/pdf/2501.16239v2 | None |
| 📝 Updated | SelfPrompt: Confidence-Aware Semi-Supervised Tuning for Robust Vision-Language Model Adaptation | Shuvendu Roy, Ali Etemad | http://arxiv.org/pdf/2501.14148v2 | None |
| 📝 Updated | Multi-aspect Knowledge Distillation with Large Language Model | Taegyeong Lee, Jinsik Bang, Soyeong Kwon, Taehwan Kim | http://arxiv.org/pdf/2501.13341v3 | None |

Medical Applications

| Status | Title | Authors | PDF link | Code link |
| --- | --- | --- | --- | --- |
| 🆕 New | Post-Training Quantization for 3D Medical Image Segmentation: A Practical Study on Real Inference Engines | Chongyu Qu, Ritchie Zhao, Ye Yu, Bin Liu, Tianyuan Yao, Junchao Zhu, Bennett A. Landman, Yucheng Tang et al. | http://arxiv.org/pdf/2501.17343v1 | https://github.com/hrlblab/PTQ |
| 🆕 New | ViT-2SPN: Vision Transformer-based Dual-Stream Self-Supervised Pretraining Networks for Retinal OCT Classification | Mohammadreza Saraei, Igor Kozak, Eung-Joo Lee | http://arxiv.org/pdf/2501.17260v1 | None |
| 🆕 New | A Hybrid Deep Learning CNN Model for Enhanced COVID-19 Detection from Computed Tomography (CT) Scan Images | Suresh Babu Nettur, Shanthi Karpurapu, Unnati Nettur, Likhit Sagar Gajja, Sravanthy Myneni, Akhil Dusi, Lalithya Posham | http://arxiv.org/pdf/2501.17160v1 | None |
| 🆕 New | VidSole: A Multimodal Dataset for Joint Kinetics Quantification and Disease Detection with Deep Learning | Archit Kambhamettu, Samantha Snyder, Maliheh Fakhar, Samuel Audia, Ross Miller, Jae Kun Shim, Aniket Bera | http://arxiv.org/pdf/2501.17890v1 | None |
| 🆕 New | FedEFM: Federated Endovascular Foundation Model with Unseen Data | Tuong Do, Nghia Vu, Tudor Jianu, Baoru Huang, Minh Vu, Jionglong Su, Erman Tjiputra, Quang D. Tran et al. | http://arxiv.org/pdf/2501.16992v1 | None |
| 🆕 New | Ultra-high resolution multimodal MRI dense labelled holistic brain atlas | José V. Manjón, Sergio Morell-Ortega, Marina Ruiz-Perez, Boris Mansencal, Edern Le Bot, Marien Gadea, Enrique Lanuza, Gwenaelle Catheline et al. | http://arxiv.org/pdf/2501.16879v1 | None |
| 🆕 New | Dynamic Hypergraph Representation for Bone Metastasis Cancer Analysis | Yuxuan Chen, Jiawen Li, Huijuan Shi, Yang Xu, Tian Guan, Lianghui Zhu, Yonghong He, Anjia Han | http://arxiv.org/pdf/2501.16787v1 | None |
| 🆕 New | Efficient Knowledge Distillation of SAM for Medical Image Segmentation | Kunal Dasharath Patil, Gowthamaan Palani, Ganapathy Krishnamurthi | http://arxiv.org/pdf/2501.16740v1 | None |
| 🆕 New | Point Cloud Upsampling as Statistical Shape Model for Pelvic | Tongxu Zhang, Bei Wang | http://arxiv.org/pdf/2501.16716v1 | None |
| 🆕 New | Polyp-Gen: Realistic and Diverse Polyp Image Generation for Endoscopic Dataset Expansion | Shengyuan Liu, Zhen Chen, Qiushi Yang, Weihao Yu, Di Dong, Jiancong Hu, Yixuan Yuan | http://arxiv.org/pdf/2501.16679v2 | https://github.com/CUHK-AIM-Group/Polyp-Gen |
| 🆕 New | Molecular-driven Foundation Model for Oncologic Pathology | Anurag Vaidya, Andrew Zhang, Guillaume Jaume, Andrew H. Song, Tong Ding, Sophia J. Wagner, Ming Y. Lu, Paul Doucet et al. | http://arxiv.org/pdf/2501.16652v1 | None |
| 📝 Updated | Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Medical Image Reconstruction | Riccardo Barbano, Alexander Denker, Hyungjin Chung, Tae Hoon Roh, Simon Arridge, Peter Maass, Bangti Jin, Jong Chul Ye | http://arxiv.org/pdf/2308.14409v3 | None |