[UPDATED!] 2025-01-18 (Update Time)

Image Understanding

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | Visual RAG: Expanding MLLM visual knowledge without fine-tuning | 视觉RAG:无需微调扩展MLLM视觉知识 | Mirco Bonomo, Simone Bianco | http://arxiv.org/pdf/2501.10834v1 | None |
| 🆕 New | A Resource-Efficient Training Framework for Remote Sensing Text--Image Retrieval | 资源高效遥感文本-图像检索训练框架 | Weihang Zhang, Jihao Li, Shuoke Li, Ziqing Niu, Jialiang Chen, Wenkai Zhang | http://arxiv.org/pdf/2501.10638v1 | https://github.com/ZhangWeihang99/CMER |
| 📝 Updated | JigsawHSI: a network for Hyperspectral Image classification | 拼图高光谱图像分类网络 | Jaime Moraga | http://arxiv.org/pdf/2206.02327v3 | None |

Detection and Segmentation

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | OpenEarthMap-SAR: A Benchmark Synthetic Aperture Radar Dataset for Global High-Resolution Land Cover Mapping | 开放地球地图-SAR:全球高分辨率土地覆盖制图基准合成孔径雷达数据集 | Junshi Xia, Hongruixuan Chen, Clifford Broni-Bediako, Yimin Wei, Jian Song, Naoto Yokoya | http://arxiv.org/pdf/2501.10891v1 | None |
| 🆕 New | GAUDA: Generative Adaptive Uncertainty-guided Diffusion-based Augmentation for Surgical Segmentation | GAUDA:基于生成自适应不确定性引导的扩散增强用于手术分割 | Yannik Frisch, Christina Bornberg, Moritz Fuchs, Anirban Mukhopadhyay | http://arxiv.org/pdf/2501.10819v1 | None |
| 🆕 New | Efficient Auto-Labeling of Large-Scale Poultry Datasets (ALPD) Using Semi-Supervised Models, Active Learning, and Prompt-then-Detect Approach | 高效利用半监督模型、主动学习和提示-检测方法对大规模家禽数据集进行自动标注(ALPD) | Ramesh Bahadur Bist, Lilong Chai, Shawna Weimer, Hannah Atungulua, Chantel Pennicott, Xiao Yang, Sachin Subedi, Chaitanya Pallerla et al. | http://arxiv.org/pdf/2501.10809v1 | None |
| 🆕 New | Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention | 基于多尺度不确定性一致性和跨教师-学生注意力的遥感图像半监督语义分割 | Shanwen Wang, Changrui Chen, Xin Sun, Danfeng Hong, Jungong Han | http://arxiv.org/pdf/2501.10736v1 | None |
| 🆕 New | Multi-modal Fusion and Query Refinement Network for Video Moment Retrieval and Highlight Detection | 多模态融合与查询细化网络在视频时刻检索与高光检测中的应用 | Yifang Xu, Yunzhuo Sun, Benxiang Zhai, Zien Xie, Youyao Jia, Sidan Du | http://arxiv.org/pdf/2501.10692v1 | None |
| 🆕 New | ClusterViG: Efficient Globally Aware Vision GNNs via Image Partitioning | ClusterViG:通过图像分区实现高效的全球感知视觉图神经网络 | Dhruv Parikh, Jacob Fein-Ashley, Tian Ye, Rajgopal Kannan, Viktor Prasanna | http://arxiv.org/pdf/2501.10640v1 | None |
| 📝 Updated | Impact of color and mixing proportion of synthetic point clouds on semantic segmentation | 合成点云中颜色和混合比例对语义分割的影响 | Shaojie Zhou, Jia-Rui Lin, Peng Pan, Yuandong Pan, Ioannis Brilakis | http://arxiv.org/pdf/2412.19145v2 | None |
| 📝 Updated | Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection | 基于不确定性引导的外观-运动关联网络进行分布外动作检测 | Xiang Fang, Arvind Easwaran, Blaise Genest | http://arxiv.org/pdf/2409.09953v2 | None |
| 📝 Updated | Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras | 深度加权痴呆症患者行为风险检测使用摄像头 | Pratik K. Mishra, Irene Ballester, Andrea Iaboni, Bing Ye, Kristine Newman, Alex Mihailidis, Shehroz S. Khan | http://arxiv.org/pdf/2408.15519v2 | None |
| 📝 Updated | Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection | 弱监督视频异常检测中的知识蒸馏 | Jash Dalvi, Ali Dabouei, Gunjan Dhanuka, Min Xu | http://arxiv.org/pdf/2406.02831v2 | None |

Video Understanding

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 📝 Updated | Neptune: The Long Orbit to Benchmarking Long Video Understanding | 涅普顿:迈向长视频理解基准的长途之旅 | Arsha Nagrani, Mingda Zhang, Ramin Mehran, Rachel Hornung, Nitesh Bharadwaj Gundavarapu, Nilpa Jha, Austin Myers, Xingyi Zhou et al. | http://arxiv.org/pdf/2412.09582v2 | https://github.com/google-deepmind/neptune |

Generative Models

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | EMO2: End-Effector Guided Audio-Driven Avatar Video Generation | EMO2:末端执行器引导的音频驱动虚拟形象视频生成 | Linrui Tian, Siqi Hu, Qi Wang, Bang Zhang, Liefeng Bo | http://arxiv.org/pdf/2501.10687v1 | None |
| 📝 Updated | DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder | 梦合:基于服装的轻量级任意物体着装编码器生成人类 | Ente Lin, Xujie Zhang, Fuwei Zhao, Yuxuan Luo, Xin Dong, Long Zeng, Xiaodan Liang | http://arxiv.org/pdf/2412.17644v3 | None |
| 📝 Updated | Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation | 即时调度:用于更快更好图像生成的扩散时间预测 | Zilyu Ye, Zhiyang Chen, Tiancheng Li, Zemin Huang, Weijian Luo, Guo-Jun Qi | http://arxiv.org/pdf/2412.01243v2 | None |

Image Processing

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption | 红外与可见光图像融合:从数据兼容性到任务适应性 | Jinyuan Liu, Guanyao Wu, Zhu Liu, Di Wang, Zhiying Jiang, Long Ma, Wei Zhong, Xin Fan et al. | http://arxiv.org/pdf/2501.10761v1 | https://github.com/RollingPlain/IVIF_ZOO |
| 🆕 New | Quadcopter Position Hold Function using Optical Flow in a Smartphone-based Flight Computer | 基于智能手机飞行计算机的光流四旋翼定位保持功能 | Noel P Caliston, Chris Jordan C. Aliac, James Arnold E. Nogra | http://arxiv.org/pdf/2501.10752v1 | None |
| 📝 Updated | Active Prompt Tuning Enables Gpt-40 To Do Efficient Classification Of Microscopy Images | 主动提示调整使Gpt-40能够高效分类显微镜图像 | Abhiram Kandiyana, Peter R. Mouton, Yaroslav Kolinko, Lawrence O. Hall, Dmitry Goldgof | http://arxiv.org/pdf/2411.02639v2 | None |

3D Scenes

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | CS-Net:Contribution-based Sampling Network for Point Cloud Simplification | CS-Net:基于贡献的点云简化采样网络 | Tian Guo, Chen Chen, Hui Yuan, Xiaolong Mao, Raouf Hamzaoui, Junhui Hou | http://arxiv.org/pdf/2501.10789v1 | None |
| 📝 Updated | Leveraging Consistent Spatio-Temporal Correspondence for Robust Visual Odometry | 利用一致时空对应关系进行鲁棒视觉里程计 | Zhaoxing Zhang, Junda Cheng, Gangwei Xu, Xiaoxiang Wang, Can Zhang, Xin Yang | http://arxiv.org/pdf/2412.16923v3 | None |
| 📝 Updated | Self-Supervised Scene Flow Estimation with Point-Voxel Fusion and Surface Representation | 自监督场景光流估计:基于点-体素融合与表面表示 | Xuezhi Xiang, Xi Wang, Lei Zhang, Denis Ombati, Himaloy Himu, Xiantong Zhen | http://arxiv.org/pdf/2410.13355v2 | None |
| 📝 Updated | Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | 莲花:基于扩散的高质量密集预测视觉基础模型 | Jing He, Haodong Li, Wei Yin, Yixun Liang, Leheng Li, Kaiqiang Zhou, Hongbo Zhang, Bingbing Liu et al. | http://arxiv.org/pdf/2409.18124v5 | https://lotus3d.github.io/ |
| 📝 Updated | Manydepth2: Motion-Aware Self-Supervised Multi-Frame Monocular Depth Estimation in Dynamic Scenes | Manydepth2:动态场景中的运动感知自监督多帧单目深度估计 | Kaichen Zhou, Jia-Wang Bian, Qian Xie, Jian-Qing Zheng, Niki Trigoni, Andrew Markham | http://arxiv.org/pdf/2312.15268v7 | https://github.com/kaichen-z/Manydepth2 |
| 📝 Updated | Human as Points: Explicit Point-based 3D Human Reconstruction from Single-view RGB Images | 人点化:从单视图RGB图像中显式点云的3D人体重建 | Yingzhi Tang, Qijian Zhang, Junhui Hou, Yebin Liu | http://arxiv.org/pdf/2311.02892v2 | https://github.com/yztang4/HaP |

Neural Rendering

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | Exploring Siamese Networks in Self-Supervised Fast MRI Reconstruction | 探索Siamese网络在自监督快速MRI重建中的应用 | Liyan Sun, Shaocong Yu, Chi Zhang, Xinghao Ding | http://arxiv.org/pdf/2501.10851v1 | None |
| 📝 Updated | DynPoint: Dynamic Neural Point For View Synthesis | 动态神经视点合成点 | Kaichen Zhou, Jia-Xing Zhong, Sangyun Shin, Kai Lu, Yiyuan Yang, Andrew Markham, Niki Trigoni | http://arxiv.org/pdf/2310.18999v4 | None |

3DGS

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | Decoupling Appearance Variations with 3D Consistent Features in Gaussian Splatting | 基于高斯分层中的3D一致性特征解耦外观变化 | Jiaqi Lin, Zhihao Li, Binxiao Huang, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Xiaofei Wu, Fenglong Song et al. | http://arxiv.org/pdf/2501.10788v1 | None |
| 📝 Updated | 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement | 基于3D高斯散布的物理物体排列变化检测:3DGS-CD | Ziqi Lu, Jianbo Ye, John Leonard | http://arxiv.org/pdf/2411.03706v2 | https://github.com/520xyxyzq/3DGS-CD |

Multimodal

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | Exploring Transferable Homogeneous Groups for Compositional Zero-Shot Learning | 探索可迁移的同质组用于组合零样本学习 | Zhijie Rao, Jingcai Guo, Miaoge Li, Yang Chen | http://arxiv.org/pdf/2501.10695v1 | None |
| 🆕 New | Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No! | 多模态大型语言模型能否进行视觉时空理解和推理?答案是:不能! | Mohamed Fazli Imam, Chenyang Lyu, Alham Fikri Aji | http://arxiv.org/pdf/2501.10674v1 | None |
| 📝 Updated | Automatic Fused Multimodal Deep Learning for Plant Identification | 自动融合多模态深度学习植物识别 | Alfreds Lapkovskis, Natalia Nefedova, Ali Beikmohammadi | http://arxiv.org/pdf/2406.01455v3 | None |

Embodied AI

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | RoMu4o: A Robotic Manipulation Unit For Orchard Operations Automating Proximal Hyperspectral Leaf Sensing | RoMu4o:一种用于果园作业的机器人操作单元,实现近程高光谱叶片传感自动化 | Mehrad Mortazavi, David J. Cappelleri, Reza Ehsani | http://arxiv.org/pdf/2501.10621v1 | https://github.com/mehradmrt/UCM-AgBot-ROS2 |
| 📝 Updated | BTMTrack: Robust RGB-T Tracking via Dual-template Bridging and Temporal-Modal Candidate Elimination | BTMTrack:通过双模板桥接和时序模态候选消除实现的鲁棒RGB-T跟踪 | Zhongxuan Zhang, Bi Zeng, Xinyu Ni, Yimin Du | http://arxiv.org/pdf/2501.03616v3 | None |

Human Analysis

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 📝 Updated | VIPeR: Visual Incremental Place Recognition with Adaptive Mining and Lifelong Learning | VIPeR:基于自适应挖掘和终身学习的视觉增量场所识别 | Yuhang Ming, Minyang Xu, Xingrui Yang, Weicai Ye, Weihan Wang, Yong Peng, Weichen Dai, Wanzeng Kong | http://arxiv.org/pdf/2407.21416v2 | None |

Face Technology

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection | LD-DETR:循环解码器检测Transformer用于视频瞬间检索和精彩片段检测 | Pengcheng Zhao, Zhixian He, Fuwei Zhang, Shujin Lin, Fan Zhou | http://arxiv.org/pdf/2501.10787v1 | https://github.com/qingchen239/ld-detr |
| 📝 Updated | PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration | PSReg:基于先验的稀疏专家混合点云配准 | Xiaoshui Huang, Zhou Huang, Yifan Zuo, Yongshun Gong, Chengdong Zhang, Deyang Liu, Yuming Fang | http://arxiv.org/pdf/2501.07762v2 | None |

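Digital Humans

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 📝 Updated | Golden Noise for Diffusion Models: A Learning Framework | 金噪扩散模型:一个学习框架 | Zikai Zhou, Shitong Shao, Lichen Bai, Zhiqiang Xu, Bo Han, Zeke Xie | http://arxiv.org/pdf/2411.09502v4 | None |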

Model Optimization

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 📝 Updated | Enhanced Urban Region Profiling with Adversarial Self-Supervised Learning for Robust Forecasting and Security | 增强城市区域特征提取:基于对抗自监督学习的鲁棒预测与安全 | Weiliang Chen, Qianqian Ren, Yong Liu, Jianguo Sun | http://arxiv.org/pdf/2402.01163v3 | None |

Medical Applications

| Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 🆕 New | No More Sliding Window: Efficient 3D Medical Image Segmentation with Differentiable Top-k Patch Sampling | 不再使用滑动窗口:基于可微分的Top-k补丁采样的高效3D医学图像分割 | Young Seok Jeon, Hongfei Yang, Huazhu Fu, Mengling Feng | http://arxiv.org/pdf/2501.10814v1 | None |
| 🆕 New | MedFILIP: Medical Fine-grained Language-Image Pre-training | 医细粒度语言-图像预训练:MedFILIP | Xinjie Liang, Xiangyu Li, Fanding Li, Jie Jiang, Qing Dong, Wei Wang, Kuanquan Wang, Suyu Dong et al. | http://arxiv.org/pdf/2501.10775v1 | https://github.com/PerceptionComputingLab/MedFILIP |
| 🆕 New | Enhancing Diagnostic in 3D COVID-19 Pneumonia CT-scans through Explainable Uncertainty Bayesian Quantification | 通过可解释的不确定性贝叶斯量化增强3D COVID-19肺炎CT扫描的诊断 | Juan Manuel Liscano Fierro, Hector J. Hortua | http://arxiv.org/pdf/2501.10770v1 | None |
| 🆕 New | Deformable Image Registration of Dark-Field Chest Radiographs for Local Lung Signal Change Assessment | 可变形图像配准用于暗场胸部X光片局部肺信号变化评估 | Fabian Drexel, Vasiliki Sideri-Lampretsa, Henriette Bast, Alexander W. Marka, Thomas Koehler, Florian T. Gassert, Daniela Pfeiffer, Daniel Rueckert et al. | http://arxiv.org/pdf/2501.10757v1 | None |
| 🆕 New | A CNN-Transformer for Classification of Longitudinal 3D MRI Images -- A Case Study on Hepatocellular Carcinoma Prediction | 基于CNN-Transformer的纵向3D MRI图像分类——肝癌预测案例研究 | Jakob Nolte, Maureen M. J. Guichelaar, Donald E. Bouman, Stephanie M. van den Berg, Maryam Amir Haeri | http://arxiv.org/pdf/2501.10733v1 | None |
| 🆕 New | In the Picture: Medical Imaging Datasets, Artifacts, and their Living Review | 图像中的医学影像数据集、伪影及其生活综述 | Amelia Jiménez-Sánchez, Natalia-Rozalia Avlona, Sarah de Boer, Víctor M. Campello, Aasa Feragen, Enzo Ferrante, Melanie Ganz, Judy Wawira Gichoya et al. | http://arxiv.org/pdf/2501.10727v1 | None |
| 🆕 New | Hierarchical LoG Bayesian Neural Network for Enhanced Aorta Segmentation | 分层LoG贝叶斯神经网络增强主动脉分割 | Delin An, Pan Du, Pengfei Gu, Jian-Xun Wang, Chaoli Wang | http://arxiv.org/pdf/2501.10615v1 | https://github.com/adlsn/LoGBNet |
| 📝 Updated | CBAM-EfficientNetV2 for Histopathology Image Classification using Transfer Learning and Dual Attention Mechanisms | 基于迁移学习和双注意力机制的CBAM-EfficientNetV2在病理图像分类中的应用 | Naren Sengodan | http://arxiv.org/pdf/2410.22392v5 | None |
| 📝 Updated | Latent Diffusion for Medical Image Segmentation: End to end learning for fast sampling and accuracy | 潜扩散在医学图像分割中的应用:端到端学习以实现快速采样和精度 | Fahim Ahmed Zaman, Mathews Jacob, Amanda Chang, Kan Liu, Milan Sonka, Xiaodong Wu | http://arxiv.org/pdf/2407.12952v2 | https://github.com/FahimZaman/LDSeg.git |
| 📝 Updated | Stitching, Fine-tuning, Re-training: A SAM-enabled Framework for Semi-supervised 3D Medical Image Segmentation | 基于SAM的半监督3D医学图像分割框架:拼接、微调和再训练 | Shumeng Li, Lei Qi, Qian Yu, Jing Huo, Yinghuan Shi, Yang Gao | http://arxiv.org/pdf/2403.11229v2 | None |