[UPDATED!] 2025-01-18 (Update Time)

Image Understanding

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 New	Visual RAG: Expanding MLLM visual knowledge without fine-tuning	视觉RAG：无需微调扩展MLLM视觉知识	Mirco Bonomo, Simone Bianco	http://arxiv.org/pdf/2501.10834v1	None
🆕 New	A Resource-Efficient Training Framework for Remote Sensing Text--Image Retrieval	资源高效遥感文本-图像检索训练框架	Weihang Zhang, Jihao Li, Shuoke Li, Ziqing Niu, Jialiang Chen, Wenkai Zhang	http://arxiv.org/pdf/2501.10638v1	https://github.com/ZhangWeihang99/CMER
📝 Updated	JigsawHSI: a network for Hyperspectral Image classification	拼图高光谱图像分类网络	Jaime Moraga	http://arxiv.org/pdf/2206.02327v3	None

Detection and Segmentation

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 New	OpenEarthMap-SAR: A Benchmark Synthetic Aperture Radar Dataset for Global High-Resolution Land Cover Mapping	开放地球地图-SAR：全球高分辨率土地覆盖制图基准合成孔径雷达数据集	Junshi Xia, Hongruixuan Chen, Clifford Broni-Bediako, Yimin Wei, Jian Song, Naoto Yokoya	http://arxiv.org/pdf/2501.10891v1	None
🆕 New	GAUDA: Generative Adaptive Uncertainty-guided Diffusion-based Augmentation for Surgical Segmentation	GAUDA：基于生成自适应不确定性引导的扩散增强用于手术分割	Yannik Frisch, Christina Bornberg, Moritz Fuchs, Anirban Mukhopadhyay	http://arxiv.org/pdf/2501.10819v1	None
🆕 New	Efficient Auto-Labeling of Large-Scale Poultry Datasets (ALPD) Using Semi-Supervised Models, Active Learning, and Prompt-then-Detect Approach	高效利用半监督模型、主动学习和提示-检测方法对大规模家禽数据集进行自动标注（ALPD）	Ramesh Bahadur Bist, Lilong Chai, Shawna Weimer, Hannah Atungulua, Chantel Pennicott, Xiao Yang, Sachin Subedi, Chaitanya Pallerla, et al.	http://arxiv.org/pdf/2501.10809v1	None
🆕 New	Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention	基于多尺度不确定性一致性和跨教师-学生注意力的遥感图像半监督语义分割	Shanwen Wang, Changrui Chen, Xin Sun, Danfeng Hong, Jungong Han	http://arxiv.org/pdf/2501.10736v1	None
🆕 New	Multi-modal Fusion and Query Refinement Network for Video Moment Retrieval and Highlight Detection	多模态融合与查询细化网络在视频时刻检索与高光检测中的应用	Yifang Xu, Yunzhuo Sun, Benxiang Zhai, Zien Xie, Youyao Jia, Sidan Du	http://arxiv.org/pdf/2501.10692v1	None
🆕 New	ClusterViG: Efficient Globally Aware Vision GNNs via Image Partitioning	ClusterViG：通过图像分区实现高效的全球感知视觉图神经网络	Dhruv Parikh, Jacob Fein-Ashley, Tian Ye, Rajgopal Kannan, Viktor Prasanna	http://arxiv.org/pdf/2501.10640v1	None
📝 Updated	Impact of color and mixing proportion of synthetic point clouds on semantic segmentation	合成点云中颜色和混合比例对语义分割的影响	Shaojie Zhou, Jia-Rui Lin, Peng Pan, Yuandong Pan, Ioannis Brilakis	http://arxiv.org/pdf/2412.19145v2	None
📝 Updated	Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection	基于不确定性引导的外观-运动关联网络进行分布外动作检测	Xiang Fang, Arvind Easwaran, Blaise Genest	http://arxiv.org/pdf/2409.09953v2	None
📝 Updated	Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras	深度加权痴呆症患者行为风险检测使用摄像头	Pratik K. Mishra, Irene Ballester, Andrea Iaboni, Bing Ye, Kristine Newman, Alex Mihailidis, Shehroz S. Khan	http://arxiv.org/pdf/2408.15519v2	None
📝 Updated	Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection	弱监督视频异常检测中的知识蒸馏	Jash Dalvi, Ali Dabouei, Gunjan Dhanuka, Min Xu	http://arxiv.org/pdf/2406.02831v2	None

Video Understanding

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
📝 Updated	Neptune: The Long Orbit to Benchmarking Long Video Understanding	涅普顿：迈向长视频理解基准的长途之旅	Arsha Nagrani, Mingda Zhang, Ramin Mehran, Rachel Hornung, Nitesh Bharadwaj Gundavarapu, Nilpa Jha, Austin Myers, Xingyi Zhou, et al.	http://arxiv.org/pdf/2412.09582v2	https://github.com/google-deepmind/neptune

Generative Models

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 New	EMO2: End-Effector Guided Audio-Driven Avatar Video Generation	EMO2：末端执行器引导的音频驱动虚拟形象视频生成	Linrui Tian, Siqi Hu, Qi Wang, Bang Zhang, Liefeng Bo	http://arxiv.org/pdf/2501.10687v1	None
📝 Updated	DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder	梦合：基于服装的轻量级任意物体着装编码器生成人类	Ente Lin, Xujie Zhang, Fuwei Zhao, Yuxuan Luo, Xin Dong, Long Zeng, Xiaodan Liang	http://arxiv.org/pdf/2412.17644v3	None
📝 Updated	Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation	即时调度：用于更快更好图像生成的扩散时间预测	Zilyu Ye, Zhiyang Chen, Tiancheng Li, Zemin Huang, Weijian Luo, Guo-Jun Qi	http://arxiv.org/pdf/2412.01243v2	None

Image Processing

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 New	Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption	红外与可见光图像融合：从数据兼容性到任务适应性	Jinyuan Liu, Guanyao Wu, Zhu Liu, Di Wang, Zhiying Jiang, Long Ma, Wei Zhong, Xin Fan, et al.	http://arxiv.org/pdf/2501.10761v1	https://github.com/RollingPlain/IVIF_ZOO
🆕 New	Quadcopter Position Hold Function using Optical Flow in a Smartphone-based Flight Computer	基于智能手机飞行计算机的光流四旋翼定位保持功能	Noel P Caliston, Chris Jordan C. Aliac, James Arnold E. Nogra	http://arxiv.org/pdf/2501.10752v1	None
📝 Updated	Active Prompt Tuning Enables Gpt-40 To Do Efficient Classification Of Microscopy Images	主动提示调整使Gpt-40能够高效分类显微镜图像	Abhiram Kandiyana, Peter R. Mouton, Yaroslav Kolinko, Lawrence O. Hall, Dmitry Goldgof	http://arxiv.org/pdf/2411.02639v2	None

3D Scenes

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 New	CS-Net:Contribution-based Sampling Network for Point Cloud Simplification	CS-Net：基于贡献的点云简化采样网络	Tian Guo, Chen Chen, Hui Yuan, Xiaolong Mao, Raouf Hamzaoui, Junhui Hou	http://arxiv.org/pdf/2501.10789v1	None
📝 Updated	Leveraging Consistent Spatio-Temporal Correspondence for Robust Visual Odometry	利用一致时空对应关系进行鲁棒视觉里程计	Zhaoxing Zhang, Junda Cheng, Gangwei Xu, Xiaoxiang Wang, Can Zhang, Xin Yang	http://arxiv.org/pdf/2412.16923v3	None
📝 Updated	Self-Supervised Scene Flow Estimation with Point-Voxel Fusion and Surface Representation	自监督场景光流估计：基于点-体素融合与表面表示	Xuezhi Xiang, Xi Wang, Lei Zhang, Denis Ombati, Himaloy Himu, Xiantong Zhen	http://arxiv.org/pdf/2410.13355v2	None
📝 Updated	Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction	莲花：基于扩散的高质量密集预测视觉基础模型	Jing He, Haodong Li, Wei Yin, Yixun Liang, Leheng Li, Kaiqiang Zhou, Hongbo Zhang, Bingbing Liu, et al.	http://arxiv.org/pdf/2409.18124v5	https://lotus3d.github.io/
📝 Updated	Manydepth2: Motion-Aware Self-Supervised Multi-Frame Monocular Depth Estimation in Dynamic Scenes	Manydepth2：动态场景中的运动感知自监督多帧单目深度估计	Kaichen Zhou, Jia-Wang Bian, Qian Xie, Jian-Qing Zheng, Niki Trigoni, Andrew Markham	http://arxiv.org/pdf/2312.15268v7	https://github.com/kaichen-z/Manydepth2
📝 Updated	Human as Points: Explicit Point-based 3D Human Reconstruction from Single-view RGB Images	人点化：从单视图RGB图像中显式点云的3D人体重建	Yingzhi Tang, Qijian Zhang, Junhui Hou, Yebin Liu	http://arxiv.org/pdf/2311.02892v2	https://github.com/yztang4/HaP

Neural Rendering

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 New	Exploring Siamese Networks in Self-Supervised Fast MRI Reconstruction	探索Siamese网络在自监督快速MRI重建中的应用	Liyan Sun, Shaocong Yu, Chi Zhang, Xinghao Ding	http://arxiv.org/pdf/2501.10851v1	None
📝 Updated	DynPoint: Dynamic Neural Point For View Synthesis	动态神经视点合成点	Kaichen Zhou, Jia-Xing Zhong, Sangyun Shin, Kai Lu, Yiyuan Yang, Andrew Markham, Niki Trigoni	http://arxiv.org/pdf/2310.18999v4	None

3DGS

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 New	Decoupling Appearance Variations with 3D Consistent Features in Gaussian Splatting	基于高斯分层中的3D一致性特征解耦外观变化	Jiaqi Lin, Zhihao Li, Binxiao Huang, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Xiaofei Wu, Fenglong Song, et al.	http://arxiv.org/pdf/2501.10788v1	None
📝 Updated	3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement	基于3D高斯散布的物理物体排列变化检测：3DGS-CD	Ziqi Lu, Jianbo Ye, John Leonard	http://arxiv.org/pdf/2411.03706v2	https://github.com/520xyxyzq/3DGS-CD

Multimodal

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 New	Exploring Transferable Homogeneous Groups for Compositional Zero-Shot Learning	探索可迁移的同质组用于组合零样本学习	Zhijie Rao, Jingcai Guo, Miaoge Li, Yang Chen	http://arxiv.org/pdf/2501.10695v1	None
🆕 New	Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No!	多模态大型语言模型能否进行视觉时空理解和推理？答案是：不能！	Mohamed Fazli Imam, Chenyang Lyu, Alham Fikri Aji	http://arxiv.org/pdf/2501.10674v1	None
📝 Updated	Automatic Fused Multimodal Deep Learning for Plant Identification	自动融合多模态深度学习植物识别	Alfreds Lapkovskis, Natalia Nefedova, Ali Beikmohammadi	http://arxiv.org/pdf/2406.01455v3	None

Embodied AI

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 New	RoMu4o: A Robotic Manipulation Unit For Orchard Operations Automating Proximal Hyperspectral Leaf Sensing	RoMu4o：一种用于果园作业的机器人操作单元，实现近程高光谱叶片传感自动化	Mehrad Mortazavi, David J. Cappelleri, Reza Ehsani	http://arxiv.org/pdf/2501.10621v1	https://github.com/mehradmrt/UCM-AgBot-ROS2
📝 Updated	BTMTrack: Robust RGB-T Tracking via Dual-template Bridging and Temporal-Modal Candidate Elimination	BTMTrack：通过双模板桥接和时序模态候选消除实现的鲁棒RGB-T跟踪	Zhongxuan Zhang, Bi Zeng, Xinyu Ni, Yimin Du	http://arxiv.org/pdf/2501.03616v3	None

Human Analysis

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
📝 Updated	VIPeR: Visual Incremental Place Recognition with Adaptive Mining and Lifelong Learning	VIPeR：基于自适应挖掘和终身学习的视觉增量场所识别	Yuhang Ming, Minyang Xu, Xingrui Yang, Weicai Ye, Weihan Wang, Yong Peng, Weichen Dai, Wanzeng Kong	http://arxiv.org/pdf/2407.21416v2	None

Face Technology

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 New	LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection	LD-DETR：循环解码器检测Transformer用于视频瞬间检索和精彩片段检测	Pengcheng Zhao, Zhixian He, Fuwei Zhang, Shujin Lin, Fan Zhou	http://arxiv.org/pdf/2501.10787v1	https://github.com/qingchen239/ld-detr
📝 Updated	PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration	PSReg：基于先验的稀疏专家混合点云配准	Xiaoshui Huang, Zhou Huang, Yifan Zuo, Yongshun Gong, Chengdong Zhang, Deyang Liu, Yuming Fang	http://arxiv.org/pdf/2501.07762v2	None

Digital Humans

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
📝 Updated	Golden Noise for Diffusion Models: A Learning Framework	金噪扩散模型：一个学习框架	Zikai Zhou, Shitong Shao, Lichen Bai, Zhiqiang Xu, Bo Han, Zeke Xie	http://arxiv.org/pdf/2411.09502v4	None

Model Optimization

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
📝 Updated	Enhanced Urban Region Profiling with Adversarial Self-Supervised Learning for Robust Forecasting and Security	增强城市区域特征提取：基于对抗自监督学习的鲁棒预测与安全	Weiliang Chen, Qianqian Ren, Yong Liu, Jianguo Sun	http://arxiv.org/pdf/2402.01163v3	None

Medical Applications

Status	English Title	Chinese Title	Authors	PDF Link	Code Link
🆕 New	No More Sliding Window: Efficient 3D Medical Image Segmentation with Differentiable Top-k Patch Sampling	不再使用滑动窗口：基于可微分的Top-k补丁采样的高效3D医学图像分割	Young Seok Jeon, Hongfei Yang, Huazhu Fu, Mengling Feng	http://arxiv.org/pdf/2501.10814v1	None
🆕 New	MedFILIP: Medical Fine-grained Language-Image Pre-training	医细粒度语言-图像预训练：MedFILIP	Xinjie Liang, Xiangyu Li, Fanding Li, Jie Jiang, Qing Dong, Wei Wang, Kuanquan Wang, Suyu Dong, et al.	http://arxiv.org/pdf/2501.10775v1	https://github.com/PerceptionComputingLab/MedFILIP
🆕 New	Enhancing Diagnostic in 3D COVID-19 Pneumonia CT-scans through Explainable Uncertainty Bayesian Quantification	通过可解释的不确定性贝叶斯量化增强3D COVID-19肺炎CT扫描的诊断	Juan Manuel Liscano Fierro, Hector J. Hortua	http://arxiv.org/pdf/2501.10770v1	None
🆕 New	Deformable Image Registration of Dark-Field Chest Radiographs for Local Lung Signal Change Assessment	可变形图像配准用于暗场胸部X光片局部肺信号变化评估	Fabian Drexel, Vasiliki Sideri-Lampretsa, Henriette Bast, Alexander W. Marka, Thomas Koehler, Florian T. Gassert, Daniela Pfeiffer, Daniel Rueckert, et al.	http://arxiv.org/pdf/2501.10757v1	None
🆕 New	A CNN-Transformer for Classification of Longitudinal 3D MRI Images -- A Case Study on Hepatocellular Carcinoma Prediction	基于CNN-Transformer的纵向3D MRI图像分类——肝癌预测案例研究	Jakob Nolte, Maureen M. J. Guichelaar, Donald E. Bouman, Stephanie M. van den Berg, Maryam Amir Haeri	http://arxiv.org/pdf/2501.10733v1	None
🆕 New	In the Picture: Medical Imaging Datasets, Artifacts, and their Living Review	图像中的医学影像数据集、伪影及其生活综述	Amelia Jiménez-Sánchez, Natalia-Rozalia Avlona, Sarah de Boer, Víctor M. Campello, Aasa Feragen, Enzo Ferrante, Melanie Ganz, Judy Wawira Gichoya, et al.	http://arxiv.org/pdf/2501.10727v1	None
🆕 New	Hierarchical LoG Bayesian Neural Network for Enhanced Aorta Segmentation	分层LoG贝叶斯神经网络增强主动脉分割	Delin An, Pan Du, Pengfei Gu, Jian-Xun Wang, Chaoli Wang	http://arxiv.org/pdf/2501.10615v1	https://github.com/adlsn/LoGBNet
📝 Updated	CBAM-EfficientNetV2 for Histopathology Image Classification using Transfer Learning and Dual Attention Mechanisms	基于迁移学习和双注意力机制的CBAM-EfficientNetV2在病理图像分类中的应用	Naren Sengodan	http://arxiv.org/pdf/2410.22392v5	None
📝 Updated	Latent Diffusion for Medical Image Segmentation: End to end learning for fast sampling and accuracy	潜扩散在医学图像分割中的应用：端到端学习以实现快速采样和精度	Fahim Ahmed Zaman, Mathews Jacob, Amanda Chang, Kan Liu, Milan Sonka, Xiaodong Wu	http://arxiv.org/pdf/2407.12952v2	https://github.com/FahimZaman/LDSeg.git
📝 Updated	Stitching, Fine-tuning, Re-training: A SAM-enabled Framework for Semi-supervised 3D Medical Image Segmentation	基于SAM的半监督3D医学图像分割框架：拼接、微调和再训练	Shumeng Li, Lei Qi, Qian Yu, Jing Huo, Yinghuan Shi, Yang Gao	http://arxiv.org/pdf/2403.11229v2	None
