2024-02-24.md

[UPDATED!] 2024-02-24 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-24 | Sandwich GAN: Image Reconstruction from Phase Mask based Anti-dazzle Imaging | Sandwich GAN:基于相位掩模的防眩光图像重建 | Xiaopeng Peng, Erin F. Fleet, Abbie T. Watnik, Grover A. Swartzlander | http://arxiv.org/pdf/2402.15919v1 | null |
| 2024-02-24 | Enhanced Droplet Analysis Using Generative Adversarial Networks | 使用生成对抗网络增强液滴分析 | Tan-Hanh Pham, Kim-Doang Nguyen | http://arxiv.org/pdf/2402.15909v1 | null |
| 2024-02-24 | HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models | HIR-Diff:通过改进的扩散模型进行无监督高光谱图像恢复 | Li Pang, Xiangyu Rui, Long Cui, Hongzhong Wang, Deyu Meng, Xiangyong Cao | http://arxiv.org/pdf/2402.15865v1 | null |
| 2024-02-24 | A Generative Machine Learning Model for Material Microstructure 3D Reconstruction and Performance Evaluation | 用于材料微观结构 3D 重建和性能评估的生成机器学习模型 | Yilin Zheng, Zhigong Song | http://arxiv.org/pdf/2402.15815v1 | null |
| 2024-02-24 | Intelligent Director: An Automatic Framework for Dynamic Visual Composition using ChatGPT | 智能导演:使用 ChatGPT 的动态视觉合成自动框架 | Sixiao Zheng, Jingyang Huo, Yu Wang, Yanwei Fu | http://arxiv.org/pdf/2402.15746v1 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-24 | Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA | 弥合 2D 和 3D 视觉问答之间的差距:3D VQA 的融合方法 | Wentao Mo, Yang Liu | http://arxiv.org/pdf/2402.15933v1 | null |
| 2024-02-24 | Multimodal Instruction Tuning with Conditional Mixture of LoRA | 使用 LoRA 的条件混合进行多模式指令调整 | Ying Shen, Zhiyang Xu, Qifan Wang, Yu Cheng, Wenpeng Yin, Lifu Huang | http://arxiv.org/pdf/2402.15896v1 | null |
| 2024-02-24 | FedMM: Federated Multi-Modal Learning with Modality Heterogeneity in Computational Pathology | FedMM:计算病理学中具有模态异质性的联合多模态学习 | Yuanzhe Peng, Jieming Bian, Jie Xu | http://arxiv.org/pdf/2402.15858v1 | null |
| 2024-02-24 | Parameter-efficient Prompt Learning for 3D Point Cloud Understanding | 用于 3D 点云理解的参数高效快速学习 | Hongyu Sun, Yongcai Wang, Wang Chen, Haoran Deng, Deying Li | http://arxiv.org/pdf/2402.15823v1 | null |
| 2024-02-24 | Increasing SAM Zero-Shot Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation | 使用 GPT-4 生成的描述性提示(无需人工注释)提高多模态医学图像的 SAM 零样本性能 | Zekun Jiang, Dongjie Cheng, Ziyuan Qin, Jun Gao, Qicheng Lao, Kang Li, Le Zhang | http://arxiv.org/pdf/2402.15759v1 | null |
| 2024-02-24 | GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation | GAOKAO-MM:中国人类水平的多模态模型评估基准 | Yi Zong, Xipeng Qiu | http://arxiv.org/pdf/2402.15745v1 | null |
| 2024-02-24 | CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge | CLIPose:利用预先训练的视觉语言知识进行类别级物体姿态估计 | Xiao Lin, Minghao Zhu, Ronghao Dang, Guangliang Zhou, Shaolong Shu, Feng Lin, Chengju Liu, Qijun Chen | http://arxiv.org/pdf/2402.15726v1 | null |
| 2024-02-24 | DeepLight: Reconstructing High-Resolution Observations of Nighttime Light With Multi-Modal Remote Sensing Data | DeepLight:利用多模态遥感数据重建夜间光的高分辨率观测 | Lixian Zhang, Runmin Dong, Shuai Yuan, Jinxiao Zhang, Mengxuan Chen, Juepeng Zheng, Haohuan Fu | http://arxiv.org/pdf/2402.15659v1 | null |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-24 | Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting | Spec-Gaussian:3D 高斯泼溅的各向异性视图相关外观 | Ziyi Yang, Xinyu Gao, Yangtian Sun, Yihua Huang, Xiaoyang Lyu, Wen Zhou, Shaohui Jiao, Xiaojuan Qi, Xiaogang Jin | http://arxiv.org/pdf/2402.15870v1 | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-24 | DART: Depth-Enhanced Accurate and Real-Time Background Matting | DART:深度增强的准确实时背景抠图 | Hanxi Li, Guofeng Li, Bo Li, Lin Wu, Yan Cheng | http://arxiv.org/pdf/2402.15820v1 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-24 | Explainable Contrastive and Cost-Sensitive Learning for Cervical Cancer Classification | 宫颈癌分类的可解释对比和成本敏感学习 | Ashfiqun Mustari, Rushmia Ahmed, Afsara Tasnim, Jakia Sultana Juthi, G M Shahariar | http://arxiv.org/pdf/2402.15905v1 | null |
| 2024-02-24 | Multi-Object Tracking by Hierarchical Visual Representations | 通过分层视觉表示进行多目标跟踪 | Jinkun Cao, Jiangmiao Pang, Kris Kitani | http://arxiv.org/pdf/2402.15895v1 | null |
| 2024-02-24 | Multi-graph Graph Matching for Coronary Artery Semantic Labeling | 冠状动脉语义标记的多图图形匹配 | Chen Zhao, Zhihui Xu, Pukar Baral, Michel Esposito, Weihua Zhou | http://arxiv.org/pdf/2402.15894v1 | null |
| 2024-02-24 | Multiple Instance Learning for Glioma Diagnosis using Hematoxylin and Eosin Whole Slide Images: An Indian cohort Study | 使用苏木精和曙红全幻灯片图像进行神经胶质瘤诊断的多实例学习:一项印度队列研究 | Ekansh Chauhan, Amit Sharma, Megha S Uppin, C. V. Jawahar, Vinod P. K | http://arxiv.org/pdf/2402.15832v1 | null |
| 2024-02-24 | Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition | 半监督文本识别的顺序视觉和语义一致性 | Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai | http://arxiv.org/pdf/2402.15806v1 | null |
| 2024-02-24 | IRConStyle: Image Restoration Framework Using Contrastive Learning and Style Transfer | IRConStyle:使用对比学习和风格迁移的图像恢复框架 | Dongqi Fan, Xin Zhao, Liang Chang | http://arxiv.org/pdf/2402.15784v1 | null |
| 2024-02-24 | Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning | Res-VMamba:使用选择性状态空间模型和深度残差学习进行细粒度食品类别视觉分类 | Chi-Sheng Chen, Guan-Ying Chen, Dong Zhou, Di Jiang, Dai-Shi Chen | http://arxiv.org/pdf/2402.15761v1 | null |
| 2024-02-24 | Detection Is Tracking: Point Cloud Multi-Sweep Deep Learning Models Revisited | 检测即跟踪:重新审视点云多重扫描深度学习模型 | Lingji Chen | http://arxiv.org/pdf/2402.15756v1 | null |
| 2024-02-24 | GiMeFive: Towards Interpretable Facial Emotion Classification | GiMeFive:迈向可解释的面部情绪分类 | Jiawen Wang, Leah Kawka | http://arxiv.org/pdf/2402.15662v1 | null |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-24 | RAUCA: A Novel Physical Adversarial Attack on Vehicle Detectors via Robust and Accurate Camouflage Generation | RAUCA:通过强大而准确的伪装生成对车辆探测器进行新型物理对抗攻击 | Jiawei Zhou, Linye Lyu, Daojing He, Yu Li | http://arxiv.org/pdf/2402.15853v1 | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-02-24 | NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation | NaVid:基于视频的 VLM 计划视觉和语言导航的下一步 | Jiazhao Zhang, Kunyu Wang, Rongtao Xu, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, Wang He | http://arxiv.org/pdf/2402.15852v1 | null |
| 2024-02-24 | Design, Implementation and Analysis of a Compressed Sensing Photoacoustic Projection Imaging System | 压缩感知光声投影成像系统的设计、实现与分析 | Markus Haltmeier, Matthias Ye, Karoline Felbermayer, Florian Hinterleitner, Peter Burgholzer | http://arxiv.org/pdf/2402.15750v1 | null |
| 2024-02-24 | Traditional Transformation Theory Guided Model for Learned Image Compression | 传统变换理论指导的学习图像压缩模型 | Zhiyuan Li, Chenyang Ge, Shun Li | http://arxiv.org/pdf/2402.15744v1 | null |
| 2024-02-24 | A Heterogeneous Dynamic Convolutional Neural Network for Image Super-resolution | 一种用于图像超分辨率的异构动态卷积神经网络 | Chunwei Tian, Xuanyu Zhang, Jia Ren, Wangmeng Zuo, Yanning Zhang, Chia-Wen Lin | http://arxiv.org/pdf/2402.15704v1 | null |
| 2024-02-24 | General Purpose Image Encoder DINOv2 for Medical Image Registration | 用于医学图像配准的通用图像编码器 DINOv2 | Xinrui Song, Xuanang Xu, Pingkun Yan | http://arxiv.org/pdf/2402.15687v1 | null |
| 2024-02-24 | Scalable Density-based Clustering with Random Projections | 具有随机投影的可扩展的基于密度的聚类 | Haochuan Xu, Ninh Pham | http://arxiv.org/pdf/2402.15679v1 | null |