Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | 物理世界理解中的视觉-语言模型基准与提升:PhysBench | Wei Chow, Jiageng Mao, Boyi Li, Daniel Seita, Vitor Guizilini, Yue Wang | http://arxiv.org/pdf/2501.16411v2 | None |
🆕 New | Multi-view Structural Convolution Network for Domain-Invariant Point Cloud Recognition of Autonomous Vehicles | 多视角结构卷积网络用于自动驾驶车辆领域不变点云识别 | Younggun Kim, Beomsik Cho, Seonghoon Ryoo, Soomok Lee | http://arxiv.org/pdf/2501.16289v1 | https://github.com/MLMLab/MSCN |
🆕 New | PDC-ViT: Source Camera Identification using Pixel Difference Convolution and Vision Transformer | PDC-ViT:基于像素差异卷积和视觉Transformer的源相机识别 | Omar Elharrouss, Younes Akbari, Noor Almaadeed, Somaya Al-Maadeed, Fouad Khelifi, Ahmed Bouridane | http://arxiv.org/pdf/2501.16227v1 | None |
📝 Updated | Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification | 解读您的决策:视觉分类中的逻辑推理正则化以实现泛化 | Zhaorui Tan, Xi Yang, Qiufeng Wang, Anh Nguyen, Kaizhu Huang | http://arxiv.org/pdf/2410.04492v5 | None |
📝 Updated | Dimensions underlying the representational alignment of deep neural networks with humans | 深度神经网络与人类表征对齐的潜在维度 | Florian P. Mahner, Lukas Muttenthaler, Umut Güçlü, Martin N. Hebart | http://arxiv.org/pdf/2406.19087v2 | None |
📝 Updated | Task Me Anything | 任意任务处理 | Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi et al. | http://arxiv.org/pdf/2406.11775v2 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | Efficient Object Detection of Marine Debris using Pruned YOLO Model | 基于剪枝YOLO模型的海洋垃圾高效目标检测 | Abi Aryaza, Novanto Yudistira, Tibyani | http://arxiv.org/pdf/2501.16571v1 | None |
🆕 New | Cross-Domain Semantic Segmentation with Large Language Model-Assisted Descriptor Generation | 跨域语义分割:基于大型语言模型辅助描述符生成的技术 | Philip Hughes, Larry Burns, Luke Adams | http://arxiv.org/pdf/2501.16467v1 | None |
🆕 New | DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation | DynAlign:无监督跨域分割的动态分类对齐 | Han Sun, Rui Gong, Ismail Nejjar, Olga Fink | http://arxiv.org/pdf/2501.16410v1 | None |
🆕 New | Large Models in Dialogue for Active Perception and Anomaly Detection | 面向主动感知和异常检测的大模型对话 | Tzoulio Chamiti, Nikolaos Passalis, Anastasios Tefas | http://arxiv.org/pdf/2501.16300v1 | None |
🆕 New | Lightweight Weighted Average Ensemble Model for Pneumonia Detection in Chest X-Ray Images | 轻量级加权平均集成模型在胸部X光片中的肺炎检测 | Suresh Babu Nettur, Shanthi Karpurapu, Unnati Nettur, Likhit Sagar Gajja, Sravanthy Myneni, Akhil Dusi, Lalithya Posham | http://arxiv.org/pdf/2501.16249v2 | None |
🆕 New | The Linear Attention Resurrection in Vision Transformer | 视觉Transformer中的线性注意力复兴 | Chuanyang Zheng | http://arxiv.org/pdf/2501.16182v1 | None |
🆕 New | Addressing Out-of-Label Hazard Detection in Dashcam Videos: Insights from the COOOL Challenge | 应对行车记录仪视频中的标签外危险检测:COOOL挑战赛的见解 | Anh-Kiet Duong, Petra Gomez-Krämer | http://arxiv.org/pdf/2501.16037v1 | https://github.com/ffyyytt/COOOL_2025 |
🆕 New | Controllable Forgetting Mechanism for Few-Shot Class-Incremental Learning | 可控遗忘机制在少样本类增量学习中的应用 | Kirill Paramonov, Mete Ozay, Eunju Yang, Jijoong Moon, Umberto Michieli | http://arxiv.org/pdf/2501.15998v1 | None |
🆕 New | D-PLS: Decoupled Semantic Segmentation for 4D-Panoptic-LiDAR-Segmentation | D-PLS:4D全景激光雷达分割的解耦语义分割 | Maik Steinhauser, Laurenz Reichardt, Nikolas Ebert, Oliver Wasenmüller | http://arxiv.org/pdf/2501.15870v1 | None |
🆕 New | Can Location Embeddings Enhance Super-Resolution of Satellite Imagery? | 位置嵌入能否增强卫星图像的超分辨率? | Daniel Panangian, Ksenia Bittner | http://arxiv.org/pdf/2501.15847v2 | None |
📝 Updated | Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving | 自动驾驶中输入监测用视觉基础模型的基准测试 | Mert Keser, Halil Ibrahim Orhan, Niki Amini-Naieni, Gesina Schwalbe, Alois Knoll, Matthias Rottmann | http://arxiv.org/pdf/2501.08083v2 | None |
📝 Updated | Label-Efficient Data Augmentation with Video Diffusion Models for Guidewire Segmentation in Cardiac Fluoroscopy | 基于视频扩散模型的标签高效数据增强在心脏荧光透视导丝分割中的应用 | Shaoyan Pan, Yikang Liu, Lin Zhao, Eric Z. Chen, Xiao Chen, Terrence Chen, Shanhui Sun | http://arxiv.org/pdf/2412.16050v4 | None |
📝 Updated | Segmentation Dataset for Reinforced Concrete Construction | 钢筋混凝土结构分割数据集 | Patrick Schmidt, Lazaros Nalpantidis | http://arxiv.org/pdf/2407.09372v2 | None |
📝 Updated | Comprehensive Performance Evaluation of YOLO11, YOLOv10, YOLOv9 and YOLOv8 on Detecting and Counting Fruitlet in Complex Orchard Environments | 全面评估YOLO11、YOLOv10、YOLOv9和YOLOv8在复杂果园环境中检测和计数幼果的性能 | Ranjan Sapkota, Zhichao Meng, Martin Churuvija, Xiaoqiang Du, Zenghong Ma, Manoj Karkee | http://arxiv.org/pdf/2407.12040v6 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion | Docling:一个高效的AI驱动文档转换开源工具包 | Nikolaos Livathinos, Christoph Auer, Maksym Lysak, Ahmed Nassar, Michele Dolfi, Panos Vagenas, Cesar Berrospi Ramis, Matteo Omenetti et al. | http://arxiv.org/pdf/2501.17887v1 | None |
🆕 New | Understanding Long Videos via LLM-Powered Entity Relation Graphs | 通过LLM驱动的实体关系图理解长视频 | Meng Chu, Yicong Li, Tat-Seng Chua | http://arxiv.org/pdf/2501.15953v1 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation | LoRA-X:连接基础模型与无需训练的跨模型自适应 | Farzad Farhadzadeh, Debasmit Das, Shubhankar Borse, Fatih Porikli | http://arxiv.org/pdf/2501.16559v1 | None |
🆕 New | PackDiT: Joint Human Motion and Text Generation via Mutual Prompting | PackDiT:通过相互提示联合人类动作和文本生成 | Zhongyu Jiang, Wenhao Chai, Zhuoran Zhou, Cheng-Yen Yang, Hsiang-Wei Huang, Jenq-Neng Hwang | http://arxiv.org/pdf/2501.16551v1 | None |
🆕 New | RelightVid: Temporal-Consistent Diffusion Model for Video Relighting | RelightVid:用于视频重光照的时序一致扩散模型 | Ye Fang, Zeyi Sun, Shangzhan Zhang, Tong Wu, Yinghao Xu, Pan Zhang, Jiaqi Wang, Gordon Wetzstein et al. | http://arxiv.org/pdf/2501.16330v1 | None |
🆕 New | Efficient Portrait Matte Creation With Layer Diffusion and Connectivity Priors | 高效人像抠图蒙版生成:基于层扩散和连通性先验 | Zhiyuan Lu, Hao Lu, Hua Huang | http://arxiv.org/pdf/2501.16147v1 | None |
🆕 New | Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation | 基于槽位引导的预训练扩散模型在对象中心学习和组合生成中的应用 | Adil Kaan Akan, Yucel Yemez | http://arxiv.org/pdf/2501.15878v2 | https://kaanakan.github.io/SlotAdapt |
📝 Updated | Textualize Visual Prompt for Image Editing via Diffusion Bridge | 通过扩散桥文本化视觉提示进行图像编辑 | Pengcheng Xu, Qingnan Fan, Fei Kou, Shuai Qin, Hong Gu, Ruoyu Zhao, Charles Ling, Boyu Wang | http://arxiv.org/pdf/2501.03495v2 | None |
📝 Updated | Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds | Make-A-Texture:3秒内快速生成形状感知纹理 | Xiaoyu Xiang, Liat Sless Gorelik, Yuchen Fan, Omri Armstrong, Forrest Iandola, Yilei Li, Ita Lifshitz, Rakesh Ranjan | http://arxiv.org/pdf/2412.07766v2 | None |
📝 Updated | Deciphering Oracle Bone Language with Diffusion Models | 利用扩散模型解读甲骨文语言 | Haisu Guan, Huanxin Yang, Xinyu Wang, Shengwei Han, Yongge Liu, Lianwen Jin, Xiang Bai, Yuliang Liu | http://arxiv.org/pdf/2406.00684v2 | https://github.com/guanhaisu/OBSD |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | MatCLIP: Light- and Shape-Insensitive Assignment of PBR Material Models | MatCLIP:对PBR材质模型的光照和形状无关的分配 | Michael Birsak, John Femiani, Biao Zhang, Peter Wonka | http://arxiv.org/pdf/2501.15981v1 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | ARFlow: Autogressive Flow with Hybrid Linear Attention | ARFlow:具有混合线性注意力的自回归流 | Mude Hui, Rui-Jie Zhu, Songlin Yang, Yu Zhang, Zirui Wang, Yuyin Zhou, Jason Eshraghian, Cihang Xie | http://arxiv.org/pdf/2501.16085v1 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration | 引导Mamba处理复杂纹理:一种高效的纹理感知状态空间模型用于图像恢复 | Long Peng, Xin Di, Zhanfeng Feng, Wenbo Li, Renjing Pei, Yang Wang, Xueyang Fu, Yang Cao et al. | http://arxiv.org/pdf/2501.16583v1 | None |
🆕 New | Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity | Mixture-of-Mamba:通过模态感知稀疏性增强多模态状态空间模型 | Weixin Liang, Junhong Shen, Genghan Zhang, Ning Dong, Luke Zettlemoyer, Lili Yu | http://arxiv.org/pdf/2501.16295v1 | https://github.com/Weixin-Liang/Mixture-of-Mamba |
🆕 New | SPECIAL: Zero-shot Hyperspectral Image Classification With CLIP | SPECIAL:基于CLIP的零样本高光谱图像分类 | Li Pang, Jing Yao, Kaiyu Li, Xiangyong Cao | http://arxiv.org/pdf/2501.16222v2 | https://github.com/LiPang/SPECIAL |
🆕 New | UDBE: Unsupervised Diffusion-based Brightness Enhancement in Underwater Images | 无监督水下图像扩散亮度增强:UDBE | Tatiana Taís Schein, Gustavo Pereira de Almeira, Stephanie Loi Brião, Rodrigo Andrade de Bem, Felipe Gomes de Oliveira, Paulo L. J. Drews-Jr | http://arxiv.org/pdf/2501.16211v1 | https://github.com/gusanagy/UDBE |
🆕 New | CILP-FGDI: Exploiting Vision-Language Model for Generalizable Person Re-Identification | CILP-FGDI:利用视觉-语言模型实现可泛化的行人重识别 | Huazhong Zhao, Lei Qi, Xin Geng | http://arxiv.org/pdf/2501.16065v2 | None |
🆕 New | Freestyle Sketch-in-the-Loop Image Segmentation | 自由式循环草图图像分割 | Subhadeep Koley, Viswanatha Reddy Gajjala, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Ayan Kumar Bhunia, Yi-Zhe Song | http://arxiv.org/pdf/2501.16022v1 | None |
🆕 New | CausalSR: Structural Causal Model-Driven Super-Resolution with Counterfactual Inference | CausalSR:基于结构因果模型和反事实推理的超分辨率 | Zhengyang Lu, Bingjie Lu, Feng Wang | http://arxiv.org/pdf/2501.15852v1 | None |
🆕 New | MM-Retinal V2: Transfer an Elite Knowledge Spark into Fundus Vision-Language Pretraining | MM-Retinal V2:将精英知识火花迁移至眼底视觉-语言预训练 | Ruiqi Wu, Na Su, Chenran Zhang, Tengfei Ma, Tao Zhou, Zhiting Cui, Nianfeng Tang, Tianyu Mao et al. | http://arxiv.org/pdf/2501.15798v1 | https://github.com/lxirich/MM-Retinal |
🆕 New | Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution | 高效注意力共享信息蒸馏Transformer用于轻量级单图像超分辨率 | Karam Park, Jae Woong Soh, Nam Ik Cho | http://arxiv.org/pdf/2501.15774v1 | None |
🆕 New | VLMaterial: Procedural Material Generation with Large Vision-Language Models | VLMaterial:基于大型视觉-语言模型的程序化材质生成 | Beichen Li, Rundi Wu, Armando Solar-Lezama, Changxi Zheng, Liang Shi, Bernd Bickel, Wojciech Matusik | http://arxiv.org/pdf/2501.18623v1 | None |
📝 Updated | Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis | 基于文本驱动的基座模型在少样本手术流程分析中的应用 | Tingxuan Chen, Kun Yuan, Vinkle Srivastav, Nassir Navab, Nicolas Padoy | http://arxiv.org/pdf/2501.09555v2 | https://github.com/CAMMA-public/Surg-FTDA |
📝 Updated | MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning | MoColl:基于代理的特定和通用模型协作进行图像描述 | Pu Yang, Bin Dong | http://arxiv.org/pdf/2501.01834v3 | None |
📝 Updated | Accelerating lensed quasar discovery and modeling with physics-informed variational autoencoders | 利用物理信息变分自编码器加速引力透镜类星体的发现与建模 | Irham T. Andika, Stefan Schuldt, Sherry H. Suyu, Satadru Bag, Raoul Cañameras, Alejandra Melo, Claudio Grillo, James H. H. Chan | http://arxiv.org/pdf/2412.12709v3 | None |
📝 Updated | BioTrove: A Large Curated Image Dataset Enabling AI for Biodiversity | BioTrove:一个大型精选图像数据集,助力人工智能在生物多样性领域的应用 | Chih-Hsuan Yang, Benjamin Feuer, Zaki Jubery, Zi K. Deng, Andre Nakkab, Md Zahid Hasan, Shivani Chiranjeevi, Kelly Marshall et al. | http://arxiv.org/pdf/2406.17720v2 | None |
📝 Updated | Learning Point Spread Function Invertibility Assessment for Image Deconvolution | 学习图像去卷积中点扩散函数可逆性评估 | Romario Gualdrón-Hurtado, Roman Jacome, Sergio Urrea, Henry Arguello, Luis Gonzalez | http://arxiv.org/pdf/2405.16343v3 | None |
📝 Updated | A New Cross-Space Total Variation Regularization Model for Color Image Restoration with Quaternion Blur Operator | 一种基于四元数模糊算子的彩色图像恢复的新跨空间全变分正则化模型 | Zhigang Jia, Yuelian Xiang, Meixiang Zhao, Tingting Wu, Michael K. Ng | http://arxiv.org/pdf/2405.12114v3 | None |
📝 Updated | QOC: Quantum On-Chip Training with Parameter Shift and Gradient Pruning | QOC:基于参数移位和梯度剪枝的片上量子训练 | Hanrui Wang, Zirui Li, Jiaqi Gu, Yongshan Ding, David Z. Pan, Song Han | http://arxiv.org/pdf/2202.13239v3 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | Automatic Calibration of a Multi-Camera System with Limited Overlapping Fields of View for 3D Surgical Scene Reconstruction | 多摄像头系统有限重叠视场自动校准用于三维手术场景重建 | Tim Flückiger, Jonas Hein, Valery Fischer, Philipp Fürnstahl, Lilian Calvet | http://arxiv.org/pdf/2501.16221v2 | None |
🆕 New | 3D Reconstruction of non-visible surfaces of objects from a Single Depth View -- Comparative Study | 从单一深度视图对物体不可见表面进行3D重建:比较研究 | Rafał Staszak, Piotr Michałek, Jakub Chudziński, Marek Kopicki, Dominik Belter | http://arxiv.org/pdf/2501.16101v1 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | LinPrim: Linear Primitives for Differentiable Volumetric Rendering | LinPrim:用于可微分体渲染的线性基元 | Nicolas von Lützow, Matthias Nießner | http://arxiv.org/pdf/2501.16312v2 | None |
🆕 New | A Radiance Field Loss for Fast and Simple Emissive Surface Reconstruction | 用于快速、简单的发光表面重建的辐射场损失 | Ziyi Zhang, Nicolas Roussel, Thomas Müller, Tizian Zeltner, Merlin Nimier-David, Fabrice Rousselle, Wenzel Jakob | http://arxiv.org/pdf/2501.18627v1 | None |
🆕 New | Efficiency Bottlenecks of Convolutional Kolmogorov-Arnold Networks: A Comprehensive Scrutiny with ImageNet, AlexNet, LeNet and Tabular Classification | 卷积柯尔莫哥洛夫-阿诺德网络效率瓶颈:基于ImageNet、AlexNet、LeNet和表格分类的全面审视 | Ashim Dahal, Saydul Akbar Murad, Nick Rahimi | http://arxiv.org/pdf/2501.15757v2 | https://github.com/ashimdahal/Study-of-Convolutional-Kolmogorov-Arnold-networks |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | Deformable Beta Splatting | 可变形贝塔泼溅 | Rong Liu, Dylan Sun, Meida Chen, Yue Wang, Andrew Feng | http://arxiv.org/pdf/2501.18630v1 | None |
📝 Updated | 3DGS$^2$: Near Second-order Converging 3D Gaussian Splatting | 3DGS$^2$:近二阶收敛的3D高斯泼溅 | Lei Lan, Tianjia Shao, Zixuan Lu, Yu Zhang, Chenfanfu Jiang, Yin Yang | http://arxiv.org/pdf/2501.13975v2 | None |
📝 Updated | EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy | EasySplat:视图自适应学习让3D高斯泼溅变得简单 | Ao Gao, Luosong Guo, Tao Chen, Zhao Wang, Ying Tai, Jian Yang, Zhenyu Zhang | http://arxiv.org/pdf/2501.01003v2 | None |
📝 Updated | PEP-GS: Perceptually-Enhanced Precise Structured 3D Gaussians for View-Adaptive Rendering | PEP-GS:用于视图自适应渲染的感知增强精确结构化3D高斯 | Junxi Jin, Xiulai Li, Haiping Huang, Lianjun Liu, Yujie Sun, Boyi Liu | http://arxiv.org/pdf/2411.05731v2 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers | FALCON:通过视觉寄存器解决高分辨率多模态大型语言模型中的视觉冗余和碎片化 | Renshan Zhang, Rui Shao, Gongwei Chen, Kaiwen Zhou, Weili Guan, Liqiang Nie | http://arxiv.org/pdf/2501.16297v1 | None |
🆕 New | Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection? | 多模态大型语言模型能否被引导以提升工业异常检测? | Zhiling Chen, Hanning Chen, Mohsen Imani, Farhad Imani | http://arxiv.org/pdf/2501.15795v1 | None |
📝 Updated | 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining | 2.5年课堂经验:视觉-语言预训练的多模态教科书 | Wenqi Zhang, Hang Zhang, Xin Li, Jiashuo Sun, Yongliang Shen, Weiming Lu, Deli Zhao, Yueting Zhuang et al. | http://arxiv.org/pdf/2501.00958v3 | https://github.com/DAMO-NLP-SG/multimodal_textbook |
📝 Updated | TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data | TEOChat:一种用于时序地球观测数据的大规模视觉语言助手 | Jeremy Andrew Irvin, Emily Ruoyu Liu, Joyce Chuyi Chen, Ines Dormoy, Jinyoung Kim, Samar Khanna, Zhuo Zheng, Stefano Ermon | http://arxiv.org/pdf/2410.06234v2 | https://github.com/ermongroup/TEOChat |
📝 Updated | E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection | E2E-MFD:迈向端到端同步多模态融合检测 | Jiaqing Zhang, Mingxiang Cao, Weiying Xie, Jie Lei, Daixun Li, Wenbo Huang, Yunsong Li, Xue Yang | http://arxiv.org/pdf/2403.09323v4 | https://github.com/icey-zhang/E2E-MFD |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | PhysAnimator: Physics-Guided Generative Cartoon Animation | 物理引导的生成卡通动画:PhysAnimator | Tianyi Xie, Yiwei Zhao, Ying Jiang, Chenfanfu Jiang | http://arxiv.org/pdf/2501.16550v1 | None |
🆕 New | Objects matter: object-centric world models improve reinforcement learning in visually complex environments | 物体至上:以物体为中心的世界模型提升视觉复杂环境中的强化学习 | Weipu Zhang, Adam Jelley, Trevor McInroe, Amos Storkey | http://arxiv.org/pdf/2501.16443v1 | None |
🆕 New | Improving Tropical Cyclone Forecasting With Video Diffusion Models | 利用视频扩散模型提升热带气旋预报 | Zhibo Ren, Pritthijit Nath, Pancham Shukla | http://arxiv.org/pdf/2501.16003v1 | https://github.com/Ren-creater/forecast-video-diffmodels |
🆕 New | Evaluating Data Influence in Meta Learning | 评估元学习中的数据影响 | Chenyang Ren, Huanyi Xie, Shu Yang, Meng Ding, Lijie Hu, Di Wang | http://arxiv.org/pdf/2501.15963v1 | None |
🆕 New | The Components of Collaborative Joint Perception and Prediction -- A Conceptual Framework | 协同联合感知与预测的组成部分:一个概念框架 | Lei Wan, Hannan Ejaz Keen, Alexey Vinel | http://arxiv.org/pdf/2501.15860v1 | None |
📝 Updated | GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration | GUI-Bee:通过自主探索将GUI动作定位与新型环境对齐 | Yue Fan, Handong Zhao, Ruiyi Zhang, Yu Shen, Xin Eric Wang, Gang Wu | http://arxiv.org/pdf/2501.13896v2 | None |
📝 Updated | Towards Kriging-informed Conditional Diffusion for Regional Sea-Level Data Downscaling | 向区域海平面数据降尺度迈进:基于克里金信息的条件扩散 | Subhankar Ghosh, Arun Sharma, Jayant Gupta, Aneesh Subramanian, Shashi Shekhar | http://arxiv.org/pdf/2410.15628v3 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | Toward Efficient Generalization in 3D Human Pose Estimation via a Canonical Domain Approach | 迈向通过规范域方法实现高效泛化的3D人体姿态估计 | Hoosang Lee, Jeha Ryu | http://arxiv.org/pdf/2501.16146v1 | None |
🆕 New | Automated Detection of Sport Highlights from Audio and Video Sources | 从音频和视频源自动检测体育精彩瞬间 | Francesco Della Santa, Morgana Lalli | http://arxiv.org/pdf/2501.16100v2 | None |
🆕 New | NanoHTNet: Nano Human Topology Network for Efficient 3D Human Pose Estimation | NanoHTNet:用于高效3D人体姿态估计的纳米人体拓扑网络 | Jialun Cai, Mengyuan Liu, Hong Liu, Wenhao Li, Shuheng Zhou | http://arxiv.org/pdf/2501.15763v1 | https://github.com/vefalun/NanoHTNet |
📝 Updated | VCRScore: Image captioning metric based on V&L Transformers, CLIP, and precision-recall | VCRScore:基于V&L Transformers、CLIP和精确率-召回率的图像描述评价指标 | Guillermo Ruiz, Tania Ramírez, Daniela Moctezuma | http://arxiv.org/pdf/2501.09155v2 | None |
📝 Updated | From Dashcam Videos to Driving Simulations: Stress Testing Automated Vehicles against Rare Events | 从行车记录仪视频到驾驶模拟:对自动驾驶汽车进行罕见事件的压力测试 | Yan Miao, Georgios Fainekos, Bardh Hoxha, Hideki Okamoto, Danil Prokhorov, Sayan Mitra | http://arxiv.org/pdf/2411.16027v2 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models | LLM-attacker:利用大型语言模型增强自动驾驶闭环对抗场景生成 | Yuewen Mei, Tong Nie, Jian Sun, Ye Tian | http://arxiv.org/pdf/2501.15850v1 | None |
📝 Updated | MADation: Face Morphing Attack Detection with Foundation Models | MADation:基于基础模型的人脸融合攻击检测 | Eduarda Caldeira, Guray Ozgur, Tahar Chettaoui, Marija Ivanovska, Peter Peer, Fadi Boutros, Vitomir Struc, Naser Damer | http://arxiv.org/pdf/2501.03800v3 | https://github.com/gurayozgur/MADation |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | BAG: Body-Aligned 3D Wearable Asset Generation | BAG:基于身体对齐的3D可穿戴资产生成 | Zhongjin Luo, Yang Li, Mingrui Zhang, Senbo Wang, Han Yan, Xibin Song, Taizhang Shang, Wei Mao et al. | http://arxiv.org/pdf/2501.16177v1 | https://bag-3d.github.io/ |
🆕 New | A Data-Centric Approach: Dimensions of Visual Complexity and How to find Them | 以数据为中心的方法:视觉复杂性的维度及其发现方法 | Karahan Sarıtaş, Tingke Shen, Surabhi S Nath, Peter Dayan | http://arxiv.org/pdf/2501.15890v1 | None |
🆕 New | ClearSight: Human Vision-Inspired Solutions for Event-Based Motion Deblurring | ClearSight:受人类视觉启发的基于事件的运动去模糊方案 | Xiaopeng Lin, Yulong Huang, Hongwei Ren, Zunchang Liu, Yue Zhou, Haotian Fu, Bojun Cheng | http://arxiv.org/pdf/2501.15808v1 | None |
🆕 New | Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models? | 现有测试工具真的能揭示文本到图像模型中的性别偏见吗? | Yunbo Lyu, Zhou Yang, Yuqing Niu, Jing Jiang, David Lo | http://arxiv.org/pdf/2501.15775v1 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | BiFold: Bimanual Cloth Folding with Language Guidance | BiFold:语言引导下的双手布料折叠 | Oriol Barbany, Adrià Colomé, Carme Torras | http://arxiv.org/pdf/2501.16458v1 | None |
🆕 New | Return of the Encoder: Maximizing Parameter Efficiency for SLMs | 编码器归来:最大化SLMs的参数效率 | Mohamed Elfeki, Rui Liu, Chad Voegele | http://arxiv.org/pdf/2501.16273v2 | None |
🆕 New | Distilling foundation models for robust and efficient models in digital pathology | 从基础模型中提炼出数字病理学中的鲁棒和高效模型 | Alexandre Filiot, Nicolas Dop, Oussama Tchita, Auriane Riou, Rémy Dubois, Thomas Peeters, Daria Valter, Marin Scalbert et al. | http://arxiv.org/pdf/2501.16239v2 | None |
🆕 New | Rethinking the Bias of Foundation Model under Long-tailed Distribution | 重新思考长尾分布下基础模型的偏差 | Jiahao Chen, Bin Qin, Jiangmeng Li, Hao Chen, Bing Su | http://arxiv.org/pdf/2501.15955v1 | None |
🆕 New | Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks | Any2AnyTryon:利用自适应位置嵌入实现多功能的虚拟服装任务 | Hailong Guo, Bohan Zeng, Yiren Song, Wentao Zhang, Chuang Zhang, Jiaming Liu | http://arxiv.org/pdf/2501.15891v1 | https://logn-2024.github.io/Any2anyTryonProjectPage |
🆕 New | Controllable Hand Grasp Generation for HOI and Efficient Evaluation Methods | 面向手物交互(HOI)的可控手部抓取生成与高效评估方法 | Ishant, Rongliang Wu, Joo Hwee Lim | http://arxiv.org/pdf/2501.15839v1 | None |
📝 Updated | Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | 通过互补掩码实现隐式位置-标题对齐的弱监督密集视频字幕生成 | Shiping Ge, Qiang Chen, Zhiwei Jiang, Yafeng Yin, Liu Qin, Ziyao Chen, Qing Gu | http://arxiv.org/pdf/2412.12791v2 | None |
📝 Updated | CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes | CAFuser:基于条件感知的多模态融合,用于驾驶场景的鲁棒语义感知 | Tim Broedermann, Christos Sakaridis, Yuqian Fu, Luc Van Gool | http://arxiv.org/pdf/2410.10791v2 | https://github.com/timbroed/CAFuser |
📝 Updated | JAM: A Comprehensive Model for Age Estimation, Verification, and Comparability | JAM:一种用于年龄估计、验证和可比性的综合模型 | François David, Alexey A. Novikov, Ruslan Parkhomenko, Artem Voronin, Alix Melchy | http://arxiv.org/pdf/2410.04012v2 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | Multi-Objective Deep-Learning-based Biomechanical Deformable Image Registration with MOREA | 基于多目标深度学习的生物力学可变形图像配准:使用MOREA | Georgios Andreadis, Eduard Ruiz Munné, Thomas H. W. Bäck, Peter A. N. Bosman, Tanja Alderliesten | http://arxiv.org/pdf/2501.16525v1 | None |
🆕 New | Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM | 基于大型语言模型生成零样本罕见事件医学图像分类的定制提示 | Payal Kamboj, Ayan Banerjee, Bin Xu, Sandeep Gupta | http://arxiv.org/pdf/2501.16481v1 | None |
🆕 New | Object Detection for Medical Image Analysis: Insights from the RT-DETR Model | 医学图像分析中的目标检测:RT-DETR模型见解 | Weijie He, Yuwei Zhang, Ting Xu, Tai An, Yingbin Liang, Bo Zhang | http://arxiv.org/pdf/2501.16469v1 | None |
🆕 New | Adaptive Iterative Compression for High-Resolution Files: an Approach Focused on Preserving Visual Quality in Cinematic Workflows | 面向高分辨率文件的自适应迭代压缩:一种专注于在电影制作流程中保持视觉质量的方法 | Leonardo Melo, Filipe Litaiff | http://arxiv.org/pdf/2501.16319v1 | None |
🆕 New | Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models | Brain-Adapter:通过适配器调优的多模态大型语言模型增强神经系统疾病分析 | Jing Zhang, Xiaowei Yu, Yanjun Lyu, Lu Zhang, Tong Chen, Chao Cao, Yan Zhuang, Minheng Chen et al. | http://arxiv.org/pdf/2501.16282v1 | None |
🆕 New | CLISC: Bridging clip and sam by enhanced cam for unsupervised brain tumor segmentation | CLISC:通过增强CAM连接CLIP和SAM以实现无监督脑肿瘤分割 | Xiaochuan Ma, Jia Fu, Wenjun Liao, Shichuan Zhang, Guotai Wang | http://arxiv.org/pdf/2501.16246v1 | None |
🆕 New | Real-Time Brain Tumor Detection in Intraoperative Ultrasound Using YOLO11: From Model Training to Deployment in the Operating Room | 实时术中超声脑肿瘤检测:基于YOLO11从模型训练到手术室部署 | Santiago Cepeda, Olga Esteban-Sinovas, Roberto Romero, Vikas Singh, Prakash Shetty, Aliasgar Moiyadi, Ilyess Zemmoura, Giuseppe Roberto Giammalva et al. | http://arxiv.org/pdf/2501.15994v1 | None |
🆕 New | Pfungst and Clever Hans: Identifying the unintended cues in a widely used Alzheimer's disease MRI dataset using explainable deep learning | Pfungst与聪明的汉斯:利用可解释深度学习识别广泛使用的阿尔茨海默病MRI数据集中的非预期线索 | Christian Tinauer, Maximilian Sackl, Rudolf Stollberger, Stefan Ropele, Christian Langkammer | http://arxiv.org/pdf/2501.15831v1 | None |
🆕 New | Z-Stack Scanning can Improve AI Detection of Mitosis: A Case Study of Meningiomas | Z-Stack扫描可提升AI对有丝分裂的检测:脑膜瘤案例分析 | Hongyan Gu, Ellie Onstott, Wenzhong Yan, Tengyou Xu, Ruolin Wang, Zida Wu, Xiang 'Anthony' Chen, Mohammad Haeri | http://arxiv.org/pdf/2501.15743v1 | None |
🆕 New | Leveraging Video Vision Transformer for Alzheimer's Disease Diagnosis from 3D Brain MRI | 利用视频视觉Transformer从3D脑MRI诊断阿尔茨海默病 | Taymaz Akan, Sait Alp, Md. Shenuarin Bhuiyan, Elizabeth A. Disbrow, Steven A. Conrad, John A. Vanchiere, Christopher G. Kevil, Mohammad A. N. Bhuiyan | http://arxiv.org/pdf/2501.15733v1 | None |
🆕 New | A Survey on Computational Pathology Foundation Models: Datasets, Adaptation Strategies, and Evaluation Tasks | 计算病理学基础模型综述:数据集、自适应策略和评估任务 | Dong Li, Guihong Wan, Xintao Wu, Xinyu Wu, Ajit J. Nirmal, Christine G. Lian, Peter K. Sorger, Yevgeniy R. Semenov et al. | http://arxiv.org/pdf/2501.15724v1 | None |
🆕 New | SeqSeg: Learning Local Segments for Automatic Vascular Model Construction | SeqSeg:学习局部段以自动构建血管模型 | Numi Sveinsson Cepero, Shawn C. Shadden | http://arxiv.org/pdf/2501.15712v1 | None |
📝 Updated | FedDAG: Federated Domain Adversarial Generation Towards Generalizable Medical Image Analysis | 联邦域对抗生成以实现可泛化医学图像分析:FedDAG | Haoxuan Che, Yifei Wu, Haibo Jin, Yong Xia, Hao Chen | http://arxiv.org/pdf/2501.13967v2 | None |
📝 Updated | Slot-BERT: Self-supervised Object Discovery in Surgical Video | Slot-BERT:手术视频中的自监督物体发现 | Guiqiu Liao, Matjaz Jogan, Marcel Hussing, Kenta Nakahashi, Kazuhiro Yasufuku, Amin Madani, Eric Eaton, Daniel A. Hashimoto | http://arxiv.org/pdf/2501.12477v2 | None |
📝 Updated | Multi-Tiered Self-Contrastive Learning for Medical Microwave Radiometry (MWR) Breast Cancer Detection | 多层自对比学习在医学微波辐射计(MWR)乳腺癌检测中的应用 | Christoforos Galazis, Huiyi Wu, Igor Goryanin | http://arxiv.org/pdf/2410.04636v2 | https://github.com/cgalaz01/self_contrastive_mwr |
📝 Updated | MSDet: Receptive Field Enhanced Multiscale Detection for Tiny Pulmonary Nodule | MSDet:针对微小肺结节的多尺度检测感受野增强方法 | Guohui Cai, Ruicheng Zhang, Hongyang He, Zeyu Zhang, Daji Ergu, Yuanzhouhan Cao, Jinman Zhao, Binbin Hu et al. | http://arxiv.org/pdf/2409.14028v2 | https://github.com/CaiGuoHui123/MSDet |
📝 Updated | Generative Adversarial Networks in Ultrasound Imaging: Extending Field of View Beyond Conventional Limits | 超声成像中生成对抗网络:扩展视场超越传统限制 | Matej Gazda, Samuel Kadoury, Jakub Gazda, Peter Drotar | http://arxiv.org/pdf/2405.20981v2 | None |
📝 Updated | MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis | MedPromptX:用于胸部X光诊断的接地式多模态提示 | Mai A. Shaaban, Adnan Khan, Mohammad Yaqub | http://arxiv.org/pdf/2403.15585v4 | https://github.com/BioMedIA-MBZUAI/MedPromptX |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
📝 Updated | Evaluation of GPT-4o and GPT-4o-mini's Vision Capabilities for Compositional Analysis from Dried Solution Drops | GPT-4o和GPT-4o-mini在干燥溶液液滴成分分析中的视觉能力评估 | Deven B. Dangi, Beni B. Dangi, Oliver Steinbock | http://arxiv.org/pdf/2412.10587v2 | None |