Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | Visual RAG: Expanding MLLM visual knowledge without fine-tuning | 视觉RAG:无需微调扩展MLLM视觉知识 | Mirco Bonomo, Simone Bianco | http://arxiv.org/pdf/2501.10834v1 | None |
🆕 New | A Resource-Efficient Training Framework for Remote Sensing Text--Image Retrieval | 资源高效遥感文本-图像检索训练框架 | Weihang Zhang, Jihao Li, Shuoke Li, Ziqing Niu, Jialiang Chen, Wenkai Zhang | http://arxiv.org/pdf/2501.10638v1 | https://github.com/ZhangWeihang99/CMER |
📝 Updated | JigsawHSI: a network for Hyperspectral Image classification | 拼图高光谱图像分类网络 | Jaime Moraga | http://arxiv.org/pdf/2206.02327v3 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | OpenEarthMap-SAR: A Benchmark Synthetic Aperture Radar Dataset for Global High-Resolution Land Cover Mapping | 开放地球地图-SAR:全球高分辨率土地覆盖制图基准合成孔径雷达数据集 | Junshi Xia, Hongruixuan Chen, Clifford Broni-Bediako, Yimin Wei, Jian Song, Naoto Yokoya | http://arxiv.org/pdf/2501.10891v1 | None |
🆕 New | GAUDA: Generative Adaptive Uncertainty-guided Diffusion-based Augmentation for Surgical Segmentation | GAUDA:基于生成自适应不确定性引导的扩散增强用于手术分割 | Yannik Frisch, Christina Bornberg, Moritz Fuchs, Anirban Mukhopadhyay | http://arxiv.org/pdf/2501.10819v1 | None |
🆕 New | Efficient Auto-Labeling of Large-Scale Poultry Datasets (ALPD) Using Semi-Supervised Models, Active Learning, and Prompt-then-Detect Approach | 高效利用半监督模型、主动学习和提示-检测方法对大规模家禽数据集进行自动标注(ALPD) | Ramesh Bahadur Bist, Lilong Chai, Shawna Weimer, Hannah Atungulua, Chantel Pennicott, Xiao Yang, Sachin Subedi, Chaitanya Pallerla et al. | http://arxiv.org/pdf/2501.10809v1 | None |
🆕 New | Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention | 基于多尺度不确定性一致性和跨教师-学生注意力的遥感图像半监督语义分割 | Shanwen Wang, Changrui Chen, Xin Sun, Danfeng Hong, Jungong Han | http://arxiv.org/pdf/2501.10736v1 | None |
🆕 New | Multi-modal Fusion and Query Refinement Network for Video Moment Retrieval and Highlight Detection | 多模态融合与查询细化网络在视频时刻检索与高光检测中的应用 | Yifang Xu, Yunzhuo Sun, Benxiang Zhai, Zien Xie, Youyao Jia, Sidan Du | http://arxiv.org/pdf/2501.10692v1 | None |
🆕 New | ClusterViG: Efficient Globally Aware Vision GNNs via Image Partitioning | ClusterViG:通过图像分区实现高效的全局感知视觉图神经网络 | Dhruv Parikh, Jacob Fein-Ashley, Tian Ye, Rajgopal Kannan, Viktor Prasanna | http://arxiv.org/pdf/2501.10640v1 | None |
📝 Updated | Impact of color and mixing proportion of synthetic point clouds on semantic segmentation | 合成点云中颜色和混合比例对语义分割的影响 | Shaojie Zhou, Jia-Rui Lin, Peng Pan, Yuandong Pan, Ioannis Brilakis | http://arxiv.org/pdf/2412.19145v2 | None |
📝 Updated | Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection | 基于不确定性引导的外观-运动关联网络进行分布外动作检测 | Xiang Fang, Arvind Easwaran, Blaise Genest | http://arxiv.org/pdf/2409.09953v2 | None |
📝 Updated | Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras | 基于摄像头的痴呆症患者风险行为深度加权检测 | Pratik K. Mishra, Irene Ballester, Andrea Iaboni, Bing Ye, Kristine Newman, Alex Mihailidis, Shehroz S. Khan | http://arxiv.org/pdf/2408.15519v2 | None |
📝 Updated | Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection | 面向弱监督视频异常检测的聚合知识蒸馏 | Jash Dalvi, Ali Dabouei, Gunjan Dhanuka, Min Xu | http://arxiv.org/pdf/2406.02831v2 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
📝 Updated | Neptune: The Long Orbit to Benchmarking Long Video Understanding | 涅普顿:迈向长视频理解基准的长途之旅 | Arsha Nagrani, Mingda Zhang, Ramin Mehran, Rachel Hornung, Nitesh Bharadwaj Gundavarapu, Nilpa Jha, Austin Myers, Xingyi Zhou et al. | http://arxiv.org/pdf/2412.09582v2 | https://github.com/google-deepmind/neptune |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | EMO2: End-Effector Guided Audio-Driven Avatar Video Generation | EMO2:末端执行器引导的音频驱动虚拟形象视频生成 | Linrui Tian, Siqi Hu, Qi Wang, Bang Zhang, Liefeng Bo | http://arxiv.org/pdf/2501.10687v1 | None |
📝 Updated | DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder | DreamFit:基于轻量级任意着装编码器的以服装为中心的人体生成 | Ente Lin, Xujie Zhang, Fuwei Zhao, Yuxuan Luo, Xin Dong, Long Zeng, Xiaodan Liang | http://arxiv.org/pdf/2412.17644v3 | None |
📝 Updated | Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation | 即时调度:用于更快更好图像生成的扩散时间预测 | Zilyu Ye, Zhiyang Chen, Tiancheng Li, Zemin Huang, Weijian Luo, Guo-Jun Qi | http://arxiv.org/pdf/2412.01243v2 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption | 红外与可见光图像融合:从数据兼容性到任务适应性 | Jinyuan Liu, Guanyao Wu, Zhu Liu, Di Wang, Zhiying Jiang, Long Ma, Wei Zhong, Xin Fan et al. | http://arxiv.org/pdf/2501.10761v1 | https://github.com/RollingPlain/IVIF_ZOO |
🆕 New | Quadcopter Position Hold Function using Optical Flow in a Smartphone-based Flight Computer | 基于智能手机飞行计算机的光流四旋翼定位保持功能 | Noel P Caliston, Chris Jordan C. Aliac, James Arnold E. Nogra | http://arxiv.org/pdf/2501.10752v1 | None |
📝 Updated | Active Prompt Tuning Enables Gpt-40 To Do Efficient Classification Of Microscopy Images | 主动提示调整使Gpt-40能够高效分类显微镜图像 | Abhiram Kandiyana, Peter R. Mouton, Yaroslav Kolinko, Lawrence O. Hall, Dmitry Goldgof | http://arxiv.org/pdf/2411.02639v2 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | CS-Net: Contribution-based Sampling Network for Point Cloud Simplification | CS-Net:基于贡献的点云简化采样网络 | Tian Guo, Chen Chen, Hui Yuan, Xiaolong Mao, Raouf Hamzaoui, Junhui Hou | http://arxiv.org/pdf/2501.10789v1 | None |
📝 Updated | Leveraging Consistent Spatio-Temporal Correspondence for Robust Visual Odometry | 利用一致时空对应关系进行鲁棒视觉里程计 | Zhaoxing Zhang, Junda Cheng, Gangwei Xu, Xiaoxiang Wang, Can Zhang, Xin Yang | http://arxiv.org/pdf/2412.16923v3 | None |
📝 Updated | Self-Supervised Scene Flow Estimation with Point-Voxel Fusion and Surface Representation | 自监督场景流估计:基于点-体素融合与表面表示 | Xuezhi Xiang, Xi Wang, Lei Zhang, Denis Ombati, Himaloy Himu, Xiantong Zhen | http://arxiv.org/pdf/2410.13355v2 | None |
📝 Updated | Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | 莲花:基于扩散的高质量密集预测视觉基础模型 | Jing He, Haodong Li, Wei Yin, Yixun Liang, Leheng Li, Kaiqiang Zhou, Hongbo Zhang, Bingbing Liu et al. | http://arxiv.org/pdf/2409.18124v5 | https://lotus3d.github.io/ |
📝 Updated | Manydepth2: Motion-Aware Self-Supervised Multi-Frame Monocular Depth Estimation in Dynamic Scenes | Manydepth2:动态场景中的运动感知自监督多帧单目深度估计 | Kaichen Zhou, Jia-Wang Bian, Qian Xie, Jian-Qing Zheng, Niki Trigoni, Andrew Markham | http://arxiv.org/pdf/2312.15268v7 | https://github.com/kaichen-z/Manydepth2 |
📝 Updated | Human as Points: Explicit Point-based 3D Human Reconstruction from Single-view RGB Images | 人点化:从单视图RGB图像中显式点云的3D人体重建 | Yingzhi Tang, Qijian Zhang, Junhui Hou, Yebin Liu | http://arxiv.org/pdf/2311.02892v2 | https://github.com/yztang4/HaP |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | Exploring Siamese Networks in Self-Supervised Fast MRI Reconstruction | 探索Siamese网络在自监督快速MRI重建中的应用 | Liyan Sun, Shaocong Yu, Chi Zhang, Xinghao Ding | http://arxiv.org/pdf/2501.10851v1 | None |
📝 Updated | DynPoint: Dynamic Neural Point For View Synthesis | DynPoint:用于视图合成的动态神经点 | Kaichen Zhou, Jia-Xing Zhong, Sangyun Shin, Kai Lu, Yiyuan Yang, Andrew Markham, Niki Trigoni | http://arxiv.org/pdf/2310.18999v4 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | Decoupling Appearance Variations with 3D Consistent Features in Gaussian Splatting | 利用3D一致性特征解耦高斯泼溅中的外观变化 | Jiaqi Lin, Zhihao Li, Binxiao Huang, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Xiaofei Wu, Fenglong Song et al. | http://arxiv.org/pdf/2501.10788v1 | None |
📝 Updated | 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement | 3DGS-CD:基于3D高斯泼溅的物理物体重排变化检测 | Ziqi Lu, Jianbo Ye, John Leonard | http://arxiv.org/pdf/2411.03706v2 | https://github.com/520xyxyzq/3DGS-CD |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | Exploring Transferable Homogeneous Groups for Compositional Zero-Shot Learning | 探索可迁移的同质组用于组合零样本学习 | Zhijie Rao, Jingcai Guo, Miaoge Li, Yang Chen | http://arxiv.org/pdf/2501.10695v1 | None |
🆕 New | Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No! | 多模态大型语言模型能否进行视觉时序理解和推理?答案是:不能! | Mohamed Fazli Imam, Chenyang Lyu, Alham Fikri Aji | http://arxiv.org/pdf/2501.10674v1 | None |
📝 Updated | Automatic Fused Multimodal Deep Learning for Plant Identification | 自动融合多模态深度学习植物识别 | Alfreds Lapkovskis, Natalia Nefedova, Ali Beikmohammadi | http://arxiv.org/pdf/2406.01455v3 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | RoMu4o: A Robotic Manipulation Unit For Orchard Operations Automating Proximal Hyperspectral Leaf Sensing | RoMu4o:一种用于果园作业的机器人操作单元,实现近程高光谱叶片传感自动化 | Mehrad Mortazavi, David J. Cappelleri, Reza Ehsani | http://arxiv.org/pdf/2501.10621v1 | https://github.com/mehradmrt/UCM-AgBot-ROS2 |
📝 Updated | BTMTrack: Robust RGB-T Tracking via Dual-template Bridging and Temporal-Modal Candidate Elimination | BTMTrack:通过双模板桥接和时序模态候选消除实现的鲁棒RGB-T跟踪 | Zhongxuan Zhang, Bi Zeng, Xinyu Ni, Yimin Du | http://arxiv.org/pdf/2501.03616v3 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
📝 Updated | VIPeR: Visual Incremental Place Recognition with Adaptive Mining and Lifelong Learning | VIPeR:基于自适应挖掘和终身学习的视觉增量场所识别 | Yuhang Ming, Minyang Xu, Xingrui Yang, Weicai Ye, Weihan Wang, Yong Peng, Weichen Dai, Wanzeng Kong | http://arxiv.org/pdf/2407.21416v2 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection | LD-DETR:循环解码器检测Transformer用于视频瞬间检索和精彩片段检测 | Pengcheng Zhao, Zhixian He, Fuwei Zhang, Shujin Lin, Fan Zhou | http://arxiv.org/pdf/2501.10787v1 | https://github.com/qingchen239/ld-detr |
📝 Updated | PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration | PSReg:基于先验的稀疏专家混合点云配准 | Xiaoshui Huang, Zhou Huang, Yifan Zuo, Yongshun Gong, Chengdong Zhang, Deyang Liu, Yuming Fang | http://arxiv.org/pdf/2501.07762v2 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
📝 Updated | Golden Noise for Diffusion Models: A Learning Framework | 扩散模型的黄金噪声:一个学习框架 | Zikai Zhou, Shitong Shao, Lichen Bai, Zhiqiang Xu, Bo Han, Zeke Xie | http://arxiv.org/pdf/2411.09502v4 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
📝 Updated | Enhanced Urban Region Profiling with Adversarial Self-Supervised Learning for Robust Forecasting and Security | 增强城市区域特征提取:基于对抗自监督学习的鲁棒预测与安全 | Weiliang Chen, Qianqian Ren, Yong Liu, Jianguo Sun | http://arxiv.org/pdf/2402.01163v3 | None |
Status | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
🆕 New | No More Sliding Window: Efficient 3D Medical Image Segmentation with Differentiable Top-k Patch Sampling | 不再使用滑动窗口:基于可微分的Top-k补丁采样的高效3D医学图像分割 | Young Seok Jeon, Hongfei Yang, Huazhu Fu, Mengling Feng | http://arxiv.org/pdf/2501.10814v1 | None |
🆕 New | MedFILIP: Medical Fine-grained Language-Image Pre-training | MedFILIP:医学细粒度语言-图像预训练 | Xinjie Liang, Xiangyu Li, Fanding Li, Jie Jiang, Qing Dong, Wei Wang, Kuanquan Wang, Suyu Dong et al. | http://arxiv.org/pdf/2501.10775v1 | https://github.com/PerceptionComputingLab/MedFILIP |
🆕 New | Enhancing Diagnostic in 3D COVID-19 Pneumonia CT-scans through Explainable Uncertainty Bayesian Quantification | 通过可解释的不确定性贝叶斯量化增强3D COVID-19肺炎CT扫描的诊断 | Juan Manuel Liscano Fierro, Hector J. Hortua | http://arxiv.org/pdf/2501.10770v1 | None |
🆕 New | Deformable Image Registration of Dark-Field Chest Radiographs for Local Lung Signal Change Assessment | 可变形图像配准用于暗场胸部X光片局部肺信号变化评估 | Fabian Drexel, Vasiliki Sideri-Lampretsa, Henriette Bast, Alexander W. Marka, Thomas Koehler, Florian T. Gassert, Daniela Pfeiffer, Daniel Rueckert et al. | http://arxiv.org/pdf/2501.10757v1 | None |
🆕 New | A CNN-Transformer for Classification of Longitudinal 3D MRI Images -- A Case Study on Hepatocellular Carcinoma Prediction | 基于CNN-Transformer的纵向3D MRI图像分类——肝癌预测案例研究 | Jakob Nolte, Maureen M. J. Guichelaar, Donald E. Bouman, Stephanie M. van den Berg, Maryam Amir Haeri | http://arxiv.org/pdf/2501.10733v1 | None |
🆕 New | In the Picture: Medical Imaging Datasets, Artifacts, and their Living Review | 图像中的医学影像数据集、伪影及其动态综述 | Amelia Jiménez-Sánchez, Natalia-Rozalia Avlona, Sarah de Boer, Víctor M. Campello, Aasa Feragen, Enzo Ferrante, Melanie Ganz, Judy Wawira Gichoya et al. | http://arxiv.org/pdf/2501.10727v1 | None |
🆕 New | Hierarchical LoG Bayesian Neural Network for Enhanced Aorta Segmentation | 分层LoG贝叶斯神经网络增强主动脉分割 | Delin An, Pan Du, Pengfei Gu, Jian-Xun Wang, Chaoli Wang | http://arxiv.org/pdf/2501.10615v1 | https://github.com/adlsn/LoGBNet |
📝 Updated | CBAM-EfficientNetV2 for Histopathology Image Classification using Transfer Learning and Dual Attention Mechanisms | 基于迁移学习和双注意力机制的CBAM-EfficientNetV2在病理图像分类中的应用 | Naren Sengodan | http://arxiv.org/pdf/2410.22392v5 | None |
📝 Updated | Latent Diffusion for Medical Image Segmentation: End to end learning for fast sampling and accuracy | 潜扩散在医学图像分割中的应用:端到端学习以实现快速采样和精度 | Fahim Ahmed Zaman, Mathews Jacob, Amanda Chang, Kan Liu, Milan Sonka, Xiaodong Wu | http://arxiv.org/pdf/2407.12952v2 | https://github.com/FahimZaman/LDSeg.git |
📝 Updated | Stitching, Fine-tuning, Re-training: A SAM-enabled Framework for Semi-supervised 3D Medical Image Segmentation | 基于SAM的半监督3D医学图像分割框架:拼接、微调和再训练 | Shumeng Li, Lei Qi, Qian Yu, Jing Huo, Yinghuan Shi, Yang Gao | http://arxiv.org/pdf/2403.11229v2 | None |