状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | Influence of field of view in visual prostheses design: Analysis with a VR system | 视觉假肢设计中视场角的影响:基于VR系统的分析 | Melani Sanchez-Garcia, Ruben Martinez-Cantin, Jesus Bermudez-Cameo, Jose J. Guerrero | http://arxiv.org/pdf/2501.17322v1 | None |
🆕 发布 | SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | SFT记忆,RL泛化:基础模型后训练的比较研究 | Tianzhe Chu, Yuexiang Zhai, Jihan Yang, Shengbang Tong, Saining Xie, Dale Schuurmans, Quoc V. Le, Sergey Levine et al. | http://arxiv.org/pdf/2501.17161v1 | None |
🆕 发布 | DFCon: Attention-Driven Supervised Contrastive Learning for Robust Deepfake Detection | DFCon:基于注意力的监督对比学习用于鲁棒深度伪造检测 | MD Sadik Hossain Shanto, Mahir Labib Dihan, Souvik Ghosh, Riad Ahmed Anonto, Hafijul Hoque Chowdhury, Abir Muhtasim, Rakib Ahsan, MD Tanvir Hassan et al. | http://arxiv.org/pdf/2501.16704v1 | None |
🆕 发布 | Determining Mosaic Resilience in Sugarcane Plants using Hyperspectral Images | 利用高光谱图像确定甘蔗植株的马赛克抗性 | Ali Zia, Jun Zhou, Muyiwa Olayemi | http://arxiv.org/pdf/2501.16700v1 | None |
🆕 发布 | Improving Interpretability and Accuracy in Neuro-Symbolic Rule Extraction Using Class-Specific Sparse Filters | 基于类特定稀疏滤波器提升神经符号规则提取的可解释性和准确性 | Parth Padalkar, Jaeseong Lee, Shiyi Wei, Gopal Gupta | http://arxiv.org/pdf/2501.16677v1 | None |
🆕 发布 | Unsupervised Domain Adaptation with Dynamic Clustering and Contrastive Refinement for Gait Recognition | 无监督领域自适应:基于动态聚类和对比精炼的人体步态识别 | Xiaolei Liu, Yan Sun, Mark Nixon | http://arxiv.org/pdf/2501.16608v1 | https://github.com/YanSun-github/GaitDCCR |
📝 更新 | Audio-Visual Deepfake Detection With Local Temporal Inconsistencies | 基于局部时间不一致性的音视频深度伪造检测 | Marcella Astrid, Enjie Ghorbel, Djamila Aouada | http://arxiv.org/pdf/2501.08137v2 | None |
状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | WASUP: Interpretable Classification with Weight-Input Alignment and Class-Discriminative SUPports Vectors | WASUP:基于权重输入对齐和类判别支持向量的可解释分类 | Tom Nuno Wolf, Christian Wachinger | http://arxiv.org/pdf/2501.17328v1 | None |
🆕 发布 | A Contrastive Teacher-Student Framework for Novelty Detection under Style Shifts | 风格偏移下新颖性检测的对比式教师-学生框架 | Hossein Mirzaei, Mojtaba Nafez, Moein Madadi, Arad Maleki, Mahdi Hajialilue, Zeinab Sadat Taghavi, Sepehr Rezaee, Ali Ansari et al. | http://arxiv.org/pdf/2501.17289v1 | None |
🆕 发布 | DINOSTAR: Deep Iterative Neural Object Detector Self-Supervised Training for Roadside LiDAR Applications | DINOSTAR:面向路侧激光雷达应用的深度迭代神经目标检测器自监督训练 | Muhammad Shahbaz, Shaurya Agarwal | http://arxiv.org/pdf/2501.17076v1 | None |
🆕 发布 | Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | 弱监督时空视频定位的上下文自定步长学习 | Akash Kumar, Zsolt Kira, Yogesh Singh Rawat | http://arxiv.org/pdf/2501.17053v1 | None |
🆕 发布 | MAUCell: An Adaptive Multi-Attention Framework for Video Frame Prediction | MAUCell:一种自适应多注意力框架的视频帧预测 | Shreyam Gupta, P. Agrawal, Priyam Gupta | http://arxiv.org/pdf/2501.16997v1 | None |
🆕 发布 | Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection | 利用预训练的ViT表示调节CNN特征进行开放词汇物体检测 | Xiangyu Gao, Yu Dai, Benliu Qiu, Hongliang Li | http://arxiv.org/pdf/2501.16981v1 | None |
🆕 发布 | Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models | 超越标签:利用视觉-语言模型推进开放词汇分割 | Muhammad Atta ur Rahman | http://arxiv.org/pdf/2501.16769v2 | None |
🆕 发布 | AdaSemSeg: An Adaptive Few-shot Semantic Segmentation of Seismic Facies | AdaSemSeg:一种自适应的地震岩性少样本语义分割 | Surojit Saha, Ross Whitaker | http://arxiv.org/pdf/2501.16760v1 | None |
🆕 发布 | DebugAgent: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging | DebugAgent:高效且可解释的错误切片发现,用于全面模型调试 | Muxi Chen, Chenchen Zhao, Qiang Xu | http://arxiv.org/pdf/2501.16751v1 | None |
🆕 发布 | CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors | CSPCL:基于可变形DETR的违禁物品检测器类别语义先验对比学习 | Mingyuan Li, Tong Jia, Hui Lu, Bowen Ma, Hao Wang, Dongyue Chen | http://arxiv.org/pdf/2501.16665v1 | None |
🆕 发布 | Vision-based autonomous structural damage detection using data-driven methods | 基于视觉、采用数据驱动方法的自主结构损伤检测 | Seyyed Taghi Ataei, Parviz Mohammad Zadeh, Saeid Ataei | http://arxiv.org/pdf/2501.16662v2 | None |
📝 更新 | SpikSSD: Better Extraction and Fusion for Object Detection with Spiking Neuron Networks | SpikSSD:基于脉冲神经网络的目标检测中的更好提取与融合 | Yimeng Fan, Changsong Liu, Mingyang Li, Wei Zhang | http://arxiv.org/pdf/2501.15151v2 | https://github.com/yimeng-fan/SpikSSD |
📝 更新 | Proto-OOD: Enhancing OOD Object Detection with Prototype Feature Similarity | Proto-OOD:利用原型特征相似性增强OOD目标检测 | Junkun Chen, Jilin Mei, Liang Chen, Fangzhou Zhao, Yan Xing, Yu Hu | http://arxiv.org/pdf/2409.05466v2 | None |
📝 更新 | Weakly-Supervised Learning via Multi-Lateral Decoder Branching for Tool Segmentation in Robot-Assisted Cardiovascular Catheterization | 基于多侧解码分支的弱监督学习在机器人辅助心血管导管插入术工具分割中的应用 | Olatunji Mumini Omisore, Toluwanimi Akinyemi, Anh Nguyen, Lei Wang | http://arxiv.org/pdf/2404.07594v2 | None |
📝 更新 | A Deep Learning-Based Unified Framework for Red Lesions Detection on Retinal Fundus Images | 基于深度学习的视网膜眼底图像红病变检测统一框架 | Norah Asiri, Muhammad Hussain, Fadwa Al Adel | http://arxiv.org/pdf/2109.05021v5 | None |
📝 更新 | Conterfactual Generative Zero-Shot Semantic Segmentation | 反事实生成零样本语义分割 | Feihong Shen, Jun Liu, Ping Hu | http://arxiv.org/pdf/2106.06360v2 | None |
📝 更新 | Semantic and structural image segmentation for prosthetic vision | 语义和结构图像分割用于假肢视觉 | Melani Sanchez-Garcia, Ruben Martinez-Cantin, Jose J. Guerrero | http://arxiv.org/pdf/1809.09607v3 | None |
状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | Extending Information Bottleneck Attribution to Video Sequences | 扩展信息瓶颈归因到视频序列 | Veronika Solopova, Lucas Schmidt, Dorothea Kolossa | http://arxiv.org/pdf/2501.16889v1 | None |
🆕 发布 | Overcoming Semantic Dilution in Transformer-Based Next Frame Prediction | 克服基于Transformer的下一帧预测中的语义稀释问题 | Hy Nguyen, Srikanth Thudumu, Hung Du, Rajesh Vasa, Kon Mouzakis | http://arxiv.org/pdf/2501.16753v1 | None |
📝 更新 | Uni-Sign: Toward Unified Sign Language Understanding at Scale | Uni-Sign:迈向大规模统一手语理解 | Zecheng Li, Wengang Zhou, Weichao Zhao, Kepeng Wu, Hezhen Hu, Houqiang Li | http://arxiv.org/pdf/2501.15187v2 | https://github.com/ZechengLi19/Uni-Sign |
状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | DebiasPI: Inference-time Debiasing by Prompt Iteration of a Text-to-Image Generative Model | DebiasPI:通过文本到图像生成模型的提示迭代进行推理时去偏 | Sarah Bonna, Yu-Cheng Huang, Ekaterina Novozhilova, Sejin Paik, Zhengyang Shan, Michelle Yilin Feng, Ge Gao, Yonish Tayal et al. | http://arxiv.org/pdf/2501.18642v1 | None |
🆕 发布 | CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation | CubeDiff:将基于扩散的图像模型重新用于全景生成 | Nikolai Kalischek, Michael Oechsle, Fabian Manhardt, Philipp Henzler, Konrad Schindler, Federico Tombari | http://arxiv.org/pdf/2501.17162v1 | None |
🆕 发布 | Text-to-Image Generation for Vocabulary Learning Using the Keyword Method | 基于关键词方法的文本到图像生成用于词汇学习 | Nuwan T. Attygalle, Matjaž Kljun, Aaron Quigley, Klen Čopič Pucihar, Jens Grubert, Verena Biener, Luis A. Leiva, Juri Yoneyama et al. | http://arxiv.org/pdf/2501.17099v1 | None |
🆕 发布 | DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation | DiffSplat:重用图像扩散模型以实现可扩展高斯泼溅生成 | Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, Yadong Mu | http://arxiv.org/pdf/2501.16764v1 | None |
🆕 发布 | ITVTON:Virtual Try-On Diffusion Transformer Model Based on Integrated Image and Text | ITVTON:基于集成图像和文本的虚拟试穿扩散Transformer模型 | Haifeng Ni | http://arxiv.org/pdf/2501.16757v1 | None |
🆕 发布 | Separate Motion from Appearance: Customizing Motion via Customizing Text-to-Video Diffusion Models | 从外观中分离运动:通过定制文本到视频扩散模型定制运动 | Huijie Liu, Jingyun Wang, Shuai Ma, Jie Hu, Xiaoming Wei, Guoliang Kang | http://arxiv.org/pdf/2501.16714v1 | None |
📝 更新 | Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation | 基于槽位引导的预训练扩散模型在对象中心学习和组合生成中的应用 | Adil Kaan Akan, Yucel Yemez | http://arxiv.org/pdf/2501.15878v2 | https://kaanakan.github.io/SlotAdapt |
📝 更新 | StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning | StableMaterials:通过半监督学习增强材料生成多样性 | Giuseppe Vecchio | http://arxiv.org/pdf/2406.09293v3 | None |
状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | Adversarial Masked Autoencoder Purifier with Defense Transferability | 对抗性掩码自编码器净化器与防御迁移性 | Yuan-Chih Chen, Chun-Shien Lu | http://arxiv.org/pdf/2501.16904v1 | None |
📝 更新 | Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion | Uni-Renderer:通过双流扩散统一渲染与逆渲染 | Zhifei Chen, Tianshuo Xu, Wenhang Ge, Leyi Wu, Dongyu Yan, Jing He, Luozhou Wang, Lu Zeng et al. | http://arxiv.org/pdf/2412.15050v3 | None |
状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | Scenario Understanding of Traffic Scenes Through Large Visual Language Models | 通过大型视觉语言模型实现交通场景的情景理解 | Rivera Esteban, Lübberstedt Jannik, Nico Uhlemann, Markus Lienkamp | http://arxiv.org/pdf/2501.17131v1 | None |
🆕 发布 | RODEO: Robust Outlier Detection via Exposing Adaptive Out-of-Distribution Samples | RODEO:通过暴露自适应分布外样本实现鲁棒异常值检测 | Hossein Mirzaei, Mohammad Jafari, Hamid Reza Dehbashi, Ali Ansari, Sepehr Ghobadi, Masoud Hadi, Arshia Soltani Moakhar, Mohammad Azizmalayeri et al. | http://arxiv.org/pdf/2501.16971v1 | None |
🆕 发布 | Image-based Geo-localization for Robotics: Are Black-box Vision-Language Models there yet? | 基于图像的机器人地理定位:黑盒视觉-语言模型是否已经到来? | Sania Waheed, Bruno Ferrarini, Michael Milford, Sarvapali D. Ramchurn, Shoaib Ehsan | http://arxiv.org/pdf/2501.16947v1 | None |
📝 更新 | SPECIAL: Zero-shot Hyperspectral Image Classification With CLIP | SPECIAL:基于CLIP的零样本高光谱图像分类 | Li Pang, Jing Yao, Kaiyu Li, Xiangyong Cao | http://arxiv.org/pdf/2501.16222v2 | https://github.com/LiPang/SPECIAL |
📝 更新 | The Hatching-Box: A Novel System for Automated Monitoring and Quantification of Drosophila melanogaster Developmental Behavior | 孵化箱:一种用于自动监测和量化黑腹果蝇发育行为的创新系统 | Julian Bigge, Maite Ogueta, Luis Garcia, Benjamin Risse | http://arxiv.org/pdf/2411.15390v3 | None |
📝 更新 | Cauchy activation function and XNet | 柯西激活函数与XNet | Xin Li, Zhihong Xia, Hongkun Zhang | http://arxiv.org/pdf/2409.19221v2 | None |
📝 更新 | FlexCap: Describe Anything in Images in Controllable Detail | FlexCap:以可控细节描述图像中的任何内容 | Debidatta Dwibedi, Vidhi Jain, Jonathan Tompson, Andrew Zisserman, Yusuf Aytar | http://arxiv.org/pdf/2403.12026v2 | None |
状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | Synthesizing 3D Abstractions by Inverting Procedural Buildings with Transformers | 利用Transformer反演程序化建筑以合成3D抽象 | Maximilian Dax, Jordi Berbel, Jan Stria, Leonidas Guibas, Urs Bergmann | http://arxiv.org/pdf/2501.17044v2 | None |
🆕 发布 | Consistency Diffusion Models for Single-Image 3D Reconstruction with Priors | 一致性扩散模型在具有先验知识的单图像3D重建中的应用 | Chenru Jiang, Chengrui Zhang, Xi Yang, Jie Sun, Yifei Zhang, Bin Dong, Kaizhu Huang | http://arxiv.org/pdf/2501.16737v2 | None |
📝 更新 | Automatic Calibration of a Multi-Camera System with Limited Overlapping Fields of View for 3D Surgical Scene Reconstruction | 多摄像头系统有限重叠视场自动校准用于三维手术场景重建 | Tim Flückiger, Jonas Hein, Valery Fischer, Philipp Fürnstahl, Lilian Calvet | http://arxiv.org/pdf/2501.16221v2 | None |
📝 更新 | Acquiring Submillimeter-Accurate Multi-Task Vision Datasets for Computer-Assisted Orthopedic Surgery | 获取用于计算机辅助骨科手术的亚毫米级精度多任务视觉数据集 | Emma Most, Jonas Hein, Frédéric Giraud, Nicola A. Cavalcanti, Lukas Zingg, Baptiste Brument, Nino Louman, Fabio Carrillo et al. | http://arxiv.org/pdf/2501.15371v2 | None |
📝 更新 | PokeFlex: A Real-World Dataset of Volumetric Deformable Objects for Robotics | PokeFlex:一个用于机器人的真实世界体积可变形物体数据集 | Jan Obrist, Miguel Zamora, Hehui Zheng, Ronan Hinchet, Firat Ozdemir, Juan Zarate, Robert K. Katzschmann, Stelian Coros | http://arxiv.org/pdf/2410.07688v2 | None |
📝 更新 | Manydepth2: Motion-Aware Self-Supervised Multi-Frame Monocular Depth Estimation in Dynamic Scenes | Manydepth2:动态场景中的运动感知自监督多帧单目深度估计 | Kaichen Zhou, Jia-Wang Bian, Jian-Qing Zheng, Jiaxing Zhong, Qian Xie, Niki Trigoni, Andrew Markham | http://arxiv.org/pdf/2312.15268v8 | https://github.com/kaichen-z/Manydepth2 |
📝 更新 | iMatching: Imperative Correspondence Learning | iMatching:命令式对应学习 | Zitong Zhan, Dasong Gao, Yun-Jou Lin, Youjie Xia, Chen Wang | http://arxiv.org/pdf/2312.02141v3 | None |
状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | Image Velocimetry using Direct Displacement Field estimation with Neural Networks for Fluids | 基于神经网络直接位移场估计的流体图像速度场测量 | Efraín Magaña, Francisco Sahli Costabal, Wernher Brevis | http://arxiv.org/pdf/2501.18641v1 | None |
🆕 发布 | What Really Matters for Learning-based LiDAR-Camera Calibration | 基于学习的激光雷达-相机标定真正重要的事情 | Shujuan Huang, Chunyu Lin, Yao Zhao | http://arxiv.org/pdf/2501.16969v1 | None |
📝 更新 | LinPrim: Linear Primitives for Differentiable Volumetric Rendering | LinPrim:用于可微分体渲染的线性基元 | Nicolas von Lützow, Matthias Nießner | http://arxiv.org/pdf/2501.16312v2 | None |
📝 更新 | Efficiency Bottlenecks of Convolutional Kolmogorov-Arnold Networks: A Comprehensive Scrutiny with ImageNet, AlexNet, LeNet and Tabular Classification | 卷积柯尔莫哥洛夫-阿诺德网络效率瓶颈:基于ImageNet、AlexNet、LeNet和表格分类的全面审视 | Ashim Dahal, Saydul Akbar Murad, Nick Rahimi | http://arxiv.org/pdf/2501.15757v2 | https://github.com/ashimdahal/Study-of-Convolutional-Kolmogorov-Arnold-networks |
📝 更新 | NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields | NeRAF:3D场景融合神经辐射场和声场 | Amandine Brunetto, Sascha Hornauer, Fabien Moutarde | http://arxiv.org/pdf/2405.18213v3 | None |
状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | Evaluating CrowdSplat: Perceived Level of Detail for Gaussian Crowds | 评估CrowdSplat:高斯人群的感知细节级别 | Xiaohan Sun, Yinghan Xu, John Dingliana, Carol O'Sullivan | http://arxiv.org/pdf/2501.17085v1 | None |
📝 更新 | LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes | LUDVIG:无需学习的二维视觉特征提升至高斯泼溅场景 | Juliette Marrie, Romain Menegaux, Michael Arbel, Diane Larlus, Julien Mairal | http://arxiv.org/pdf/2410.14462v4 | None |
状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait | IC-Portrait:基于上下文的匹配以实现视角一致的个人肖像 | Han Yang, Enis Simsar, Sotiris Anagnostidis, Yanlong Zang, Thomas Hofmann, Ziwei Liu | http://arxiv.org/pdf/2501.17159v2 | None |
🆕 发布 | Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding | 探索显式时间建模在多模态大型语言模型视频理解中的作用 | Yun Li, Zhe Liu, Yajing Kong, Guangrui Li, Jiyuan Zhang, Chao Bian, Feng Liu, Lina Yao et al. | http://arxiv.org/pdf/2501.16786v1 | None |
🆕 发布 | 3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow | 3D-MoE:一种通过校正流进行3D视觉和姿态扩散的多模态专家混合模型 | Yueen Ma, Yuzheng Zhuang, Jianye Hao, Irwin King | http://arxiv.org/pdf/2501.16698v1 | None |
🆕 发布 | CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs | CHiP:多模态LLMs的跨模态层次直接偏好优化 | Jinlan Fu, Shenzhen Huangfu, Hao Fei, Xiaoyu Shen, Bryan Hooi, Xipeng Qiu, See-Kiong Ng | http://arxiv.org/pdf/2501.16629v1 | https://github.com/LVUGAI/CHiP |
📝 更新 | VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding | VideoLLaMA 3:图像和视频理解的前沿多模态基础模型 | Boqiang Zhang, Kehan Li, Zesen Cheng, Zhiqiang Hu, Yuqian Yuan, Guanzheng Chen, Sicong Leng, Yuming Jiang et al. | http://arxiv.org/pdf/2501.13106v3 | None |
状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | Machine learning of microstructure-property relationships in materials with robust features from foundational vision transformers | 利用基础视觉Transformer的稳健特征对材料微结构-性能关系进行机器学习 | Sheila E. Whitman, Marat I. Latypov | http://arxiv.org/pdf/2501.18637v1 | None |
🆕 发布 | EdgeMLOps: Operationalizing ML models with Cumulocity IoT and thin-edge.io for Visual quality Inspection | 边缘MLOps:利用Cumulocity IoT和thin-edge.io实现机器学习模型在视觉质量检测中的运营 | Kanishk Chaturvedi, Johannes Gasthuber, Mohamed Abdelaal | http://arxiv.org/pdf/2501.17062v1 | None |
🆕 发布 | RG-Attn: Radian Glue Attention for Multi-modality Multi-agent Cooperative Perception | RG-Attn:多模态多智能体协同感知的弧度粘合注意力 | Lantao Li, Kang Yang, Wenqi Zhang, Xiaoxue Wang, Chen Sun | http://arxiv.org/pdf/2501.16803v1 | None |
🆕 发布 | SSF-PAN: Semantic Scene Flow-Based Perception for Autonomous Navigation in Traffic Scenarios | SSF-PAN:基于语义场景流的交通场景自主导航感知 | Yinqi Chen, Meiying Zhang, Qi Hao, Guang Zhou | http://arxiv.org/pdf/2501.16754v1 | None |
🆕 发布 | Dream to Drive with Predictive Individual World Model | 梦境驾驶:基于预测性个体世界模型的驾驶 | Yinfeng Gao, Qichao Zhang, Da-wei Ding, Dongbin Zhao | http://arxiv.org/pdf/2501.16733v1 | None |
🆕 发布 | One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning | 一头八臂:基于块矩阵的低秩自适应方法在CLIP基础上的小样本学习 | Chunpeng Zhou, Qianqian Shen, Zhi Yu, Jiajun Bu, Haishuai Wang | http://arxiv.org/pdf/2501.16720v1 | None |
🆕 发布 | SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation | SliceOcc:基于垂直切片表示的室内3D语义占用预测 | Jianing Li, Ming Lu, Hao Wang, Chenyang Gu, Wenzhao Zheng, Li Du, Shanghang Zhang | http://arxiv.org/pdf/2501.16684v1 | https://github.com/NorthSummer/SliceOcc |
🆕 发布 | Improving Vision-Language-Action Model with Online Reinforcement Learning | 基于在线强化学习的视觉-语言-动作模型改进 | Yanjiang Guo, Jianke Zhang, Xiaoyu Chen, Xiang Ji, Yen-Jen Wang, Yucheng Hu, Jianyu Chen | http://arxiv.org/pdf/2501.16664v1 | None |
🆕 发布 | Predicting 3D representations for Dynamic Scenes | 预测动态场景的3D表示 | Di Qi, Tong Yang, Beining Wang, Xiangyu Zhang, Wenqiang Zhang | http://arxiv.org/pdf/2501.16617v1 | None |
📝 更新 | Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks | Mobile-Agent-E:用于复杂任务的自我进化移动助手 | Zhenhailong Wang, Haiyang Xu, Junyang Wang, Xi Zhang, Ming Yan, Ji Zhang, Fei Huang, Heng Ji | http://arxiv.org/pdf/2501.11733v2 | https://x-plug.github.io/MobileAgent |
📝 更新 | Competency-Aware Planning for Probabilistically Safe Navigation Under Perception Uncertainty | 感知不确定性下的概率安全导航的胜任力感知规划 | Sara Pohland, Claire Tomlin | http://arxiv.org/pdf/2409.06111v4 | None |
状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | B-FPGM: Lightweight Face Detection via Bayesian-Optimized Soft FPGM Pruning | B-FPGM:基于贝叶斯优化的软FPGM剪枝的轻量级人脸检测 | Nikolaos Kaparinos, Vasileios Mezaris | http://arxiv.org/pdf/2501.16917v1 | https://github.com/IDTITI/B-FPGM |
🆕 发布 | Frequency Matters: Explaining Biases of Face Recognition in the Frequency Domain | 频率决定一切:解释频域中人脸识别的偏差 | Marco Huber, Fadi Boutros, Naser Damer | http://arxiv.org/pdf/2501.16896v1 | None |
🆕 发布 | Experimenting with Affective Computing Models in Video Interviews with Spanish-speaking Older Adults | 在西班牙语老年人视频访谈中实验情感计算模型 | Josep Lopez Camunas, Cristina Bustos, Yanjun Zhu, Raquel Ros, Agata Lapedriza | http://arxiv.org/pdf/2501.16870v1 | None |
🆕 发布 | B-RIGHT: Benchmark Re-evaluation for Integrity in Generalized Human-Object Interaction Testing | B-RIGHT:广义人-物交互测试中完整性的基准重新评估 | Yoojin Jang, Junsu Kim, Hayeon Kim, Eun-ki Lee, Eun-sol Kim, Seungryul Baek, Jaejun Yoo | http://arxiv.org/pdf/2501.16724v1 | None |
📝 更新 | EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face Animation | EmoFace:情感-内容解耦的语音驱动3D说话人脸动画 | Yihong Lin, Liang Peng, Xianjia Wu, Jianqiao Hu, Xiandong Li, Wenxiong Kang, Songju Lei, Huang Xu | http://arxiv.org/pdf/2408.11518v2 | None |
状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | Towards Understanding Depth Perception in Foveated Rendering | 迈向理解注视点渲染中的深度感知 | Sophie Kergaßner, Taimoor Tariq, Piotr Didyk | http://arxiv.org/pdf/2501.18635v1 | None |
🆕 发布 | Not Every Patch is Needed: Towards a More Efficient and Effective Backbone for Video-based Person Re-identification | 并非每个图像块都必不可少:迈向更高效、更有效的基于视频的行人重识别骨干网络 | Lanyun Zhu, Tianrun Chen, Deyi Ji, Jieping Ye, Jun Liu | http://arxiv.org/pdf/2501.16811v1 | None |
🆕 发布 | FlexMotion: Lightweight, Physics-Aware, and Controllable Human Motion Generation | 轻量级、物理感知且可控的人体运动生成:FlexMotion | Arvin Tashakori, Arash Tashakori, Gongbo Yang, Z. Jane Wang, Peyman Servati | http://arxiv.org/pdf/2501.16778v1 | None |
📝 更新 | GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer | GLDiTalker:基于图潜在扩散Transformer的语音驱动3D面部动画 | Yihong Lin, Zhaoxin Fan, Xianjia Wu, Lingyu Xiong, Liang Peng, Xiandong Li, Wenxiong Kang, Songju Lei et al. | http://arxiv.org/pdf/2408.01826v3 | None |
状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | Target-driven Self-Distillation for Partial Observed Trajectories Forecasting | 基于目标驱动的部分观测轨迹预测的自蒸馏 | Pengfei Zhu, Peng Shu, Mengshi Qi, Liang Liu, Huadong Ma | http://arxiv.org/pdf/2501.16767v1 | None |
🆕 发布 | CascadeV: An Implementation of Wurstchen Architecture for Video Generation | CascadeV:Wurstchen架构在视频生成中的实现 | Wenfeng Lin, Jiangchuan Wei, Boyuan Liu, Yichen Zhang, Shiyue Yan, Mingyu Guo | http://arxiv.org/pdf/2501.16612v1 | https://github.com/bytedance/CascadeV |
📝 更新 | Distilling foundation models for robust and efficient models in digital pathology | 蒸馏基础模型以获得数字病理学中鲁棒且高效的模型 | Alexandre Filiot, Nicolas Dop, Oussama Tchita, Auriane Riou, Rémy Dubois, Thomas Peeters, Daria Valter, Marin Scalbert et al. | http://arxiv.org/pdf/2501.16239v2 | None |
📝 更新 | SelfPrompt: Confidence-Aware Semi-Supervised Tuning for Robust Vision-Language Model Adaptation | SelfPrompt:面向鲁棒视觉-语言模型自适应的置信度感知半监督调优 | Shuvendu Roy, Ali Etemad | http://arxiv.org/pdf/2501.14148v2 | None |
📝 更新 | Multi-aspect Knowledge Distillation with Large Language Model | 基于大型语言模型的多方面知识蒸馏 | Taegyeong Lee, Jinsik Bang, Soyeong Kwon, Taehwan Kim | http://arxiv.org/pdf/2501.13341v3 | None |
状态 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
🆕 发布 | Post-Training Quantization for 3D Medical Image Segmentation: A Practical Study on Real Inference Engines | 3D医学图像分割的训练后量化:针对真实推理引擎的实践研究 | Chongyu Qu, Ritchie Zhao, Ye Yu, Bin Liu, Tianyuan Yao, Junchao Zhu, Bennett A. Landman, Yucheng Tang et al. | http://arxiv.org/pdf/2501.17343v1 | https://github.com/hrlblab/PTQ |
🆕 发布 | ViT-2SPN: Vision Transformer-based Dual-Stream Self-Supervised Pretraining Networks for Retinal OCT Classification | ViT-2SPN:基于视觉Transformer的双流自监督预训练网络用于视网膜OCT分类 | Mohammadreza Saraei, Igor Kozak, Eung-Joo Lee | http://arxiv.org/pdf/2501.17260v1 | None |
🆕 发布 | A Hybrid Deep Learning CNN Model for Enhanced COVID-19 Detection from Computed Tomography (CT) Scan Images | 混合深度学习CNN模型用于增强CT扫描图像的COVID-19检测 | Suresh Babu Nettur, Shanthi Karpurapu, Unnati Nettur, Likhit Sagar Gajja, Sravanthy Myneni, Akhil Dusi, Lalithya Posham | http://arxiv.org/pdf/2501.17160v1 | None |
🆕 发布 | VidSole: A Multimodal Dataset for Joint Kinetics Quantification and Disease Detection with Deep Learning | VidSole:一种用于基于深度学习的关节动力学量化与疾病检测的多模态数据集 | Archit Kambhamettu, Samantha Snyder, Maliheh Fakhar, Samuel Audia, Ross Miller, Jae Kun Shim, Aniket Bera | http://arxiv.org/pdf/2501.17890v1 | None |
🆕 发布 | FedEFM: Federated Endovascular Foundation Model with Unseen Data | FedEFM:面向未见数据的联邦血管内基础模型 | Tuong Do, Nghia Vu, Tudor Jianu, Baoru Huang, Minh Vu, Jionglong Su, Erman Tjiputra, Quang D. Tran et al. | http://arxiv.org/pdf/2501.16992v1 | None |
🆕 发布 | Ultra-high resolution multimodal MRI dense labelled holistic brain atlas | 超高分辨率多模态MRI密集标注整体脑图谱 | José V. Manjón, Sergio Morell-Ortega, Marina Ruiz-Perez, Boris Mansencal, Edern Le Bot, Marien Gadea, Enrique Lanuza, Gwenaelle Catheline et al. | http://arxiv.org/pdf/2501.16879v1 | None |
🆕 发布 | Dynamic Hypergraph Representation for Bone Metastasis Cancer Analysis | 动态超图表示在骨转移癌分析中的应用 | Yuxuan Chen, Jiawen Li, Huijuan Shi, Yang Xu, Tian Guan, Lianghui Zhu, Yonghong He, Anjia Han | http://arxiv.org/pdf/2501.16787v1 | None |
🆕 发布 | Efficient Knowledge Distillation of SAM for Medical Image Segmentation | 高效的知识蒸馏:SAM在医学图像分割中的应用 | Kunal Dasharath Patil, Gowthamaan Palani, Ganapathy Krishnamurthi | http://arxiv.org/pdf/2501.16740v1 | None |
🆕 发布 | Point Cloud Upsampling as Statistical Shape Model for Pelvic | 点云上采样作为骨盆统计形状模型 | Tongxu Zhang, Bei Wang | http://arxiv.org/pdf/2501.16716v1 | None |
🆕 发布 | Polyp-Gen: Realistic and Diverse Polyp Image Generation for Endoscopic Dataset Expansion | Polyp-Gen:用于内镜数据集扩展的逼真且多样化的息肉图像生成 | Shengyuan Liu, Zhen Chen, Qiushi Yang, Weihao Yu, Di Dong, Jiancong Hu, Yixuan Yuan | http://arxiv.org/pdf/2501.16679v2 | https://github.com/CUHK-AIM-Group/Polyp-Gen |
🆕 发布 | Molecular-driven Foundation Model for Oncologic Pathology | 分子驱动肿瘤病理学基础模型 | Anurag Vaidya, Andrew Zhang, Guillaume Jaume, Andrew H. Song, Tong Ding, Sophia J. Wagner, Ming Y. Lu, Paul Doucet et al. | http://arxiv.org/pdf/2501.16652v1 | None |
📝 更新 | Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Medical Image Reconstruction | 可调节条件扩散在医学图像重建中的分布外适应 | Riccardo Barbano, Alexander Denker, Hyungjin Chung, Tae Hoon Roh, Simon Arridge, Peter Maass, Bangti Jin, Jong Chul Ye | http://arxiv.org/pdf/2308.14409v3 | None |