2024-07-27.md

[UPDATED!] 2024-07-27 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-27 | Mamba? Catch The Hype Or Rethink What Really Helps for Image Registration | Mamba?赶上炒作还是重新思考图像配准的真正帮助 | Bailiang Jian, Jiazhen Pan, Morteza Ghahremani, Daniel Rueckert, Christian Wachinger, Benedikt Wiestler | http://arxiv.org/pdf/2407.19274v1 | null |
| 2024-07-27 | Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction | 通过样本级偏差预测生成细粒度场景图 | Yansheng Li, Tingzhu Wang, Kang Wu, Linlin Wang, Xin Guo, Wenbin Wang | http://arxiv.org/pdf/2407.19259v1 | null |
| 2024-07-27 | Radio Frequency Signal based Human Silhouette Segmentation: A Sequential Diffusion Approach | 基于射频信号的人体轮廓分割:一种序贯扩散方法 | Penghui Wen, Kun Hu, Dong Yuan, Zhiyuan Ning, Changyang Li, Zhiyong Wang | http://arxiv.org/pdf/2407.19244v1 | null |
| 2024-07-27 | Channel Boosted CNN-Transformer-based Multi-Level and Multi-Scale Nuclei Segmentation | 基于通道增强 CNN-Transformer 的多级多尺度核分割 | Zunaira Rauf, Abdul Rehman Khan, Asifullah Khan | http://arxiv.org/pdf/2407.19186v1 | null |
| 2024-07-27 | Data Processing Techniques for Modern Multimodal Models | 现代多模态模型的数据处理技术 | Yinheng Li, Han Ding, Hang Chen | http://arxiv.org/pdf/2407.19180v1 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-27 | Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification | 将大型语言模型集成到三模态架构中,实现抑郁症的自动分类 | Santosh V. Patapati | http://arxiv.org/pdf/2407.19340v1 | null |
| 2024-07-27 | Harmfully Manipulated Images Matter in Multimodal Misinformation Detection | 有害操纵的图像在多模态错误信息检测中很重要 | Bing Wang, Shengsheng Wang, Changchun Li, Renchu Guan, Ximing Li | http://arxiv.org/pdf/2407.19192v1 | null |
| 2024-07-27 | LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models | LLaVA-Read:增强多模态语言模型的阅读能力 | Ruiyi Zhang, Yufan Zhou, Jian Chen, Jiuxiang Gu, Changyou Chen, Tong Sun | http://arxiv.org/pdf/2407.19185v1 | null |
| 2024-07-27 | Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble | 通过模态无关解码和基于邻近度的模态集成实现稳健的多模态 3D 物体检测 | Juhan Cha, Minseok Joo, Jihwan Park, Sanghyeok Lee, Injae Kim, Hyunwoo J. Kim | http://arxiv.org/pdf/2407.19156v1 | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-27 | Revisit Self-supervised Depth Estimation with Local Structure-from-Motion | 重新审视基于局部运动结构的自监督深度估计 | Shengjie Zhu, Xiaoming Liu | http://arxiv.org/pdf/2407.19166v1 | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-27 | Sewer Image Super-Resolution with Depth Priors and Its Lightweight Network | 基于深度先验的下水道图像超分辨率及其轻量级网络 | Gang Pan, Chen Wang, Zhijie Sui, Shuai Guo, Yaozhi Lv, Honglie Li, Di Sun | http://arxiv.org/pdf/2407.19271v1 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-27 | Polyp segmentation in colonoscopy images using DeepLabV3++ | 使用 DeepLabV3++ 对结肠镜检查图像中的息肉进行分割 | Al Mohimanul Islam, Sadia Shakiba Bhuiyan, Mysun Mashira, Md. Rayhan Ahmed, Salekul Islam, Swakkhar Shatabda | http://arxiv.org/pdf/2407.19327v1 | null |
| 2024-07-27 | MSP-MVS: Multi-granularity Segmentation Prior Guided Multi-View Stereo | MSP-MVS:多粒度分割优先引导多视图立体 | Zhenlong Yuan, Cong Liu, Fei Shen, Zhaoxin Li, Tianlu Mao, Zhaoqi Wang | http://arxiv.org/pdf/2407.19323v1 | null |
| 2024-07-27 | AResNet-ViT: A Hybrid CNN-Transformer Network for Benign and Malignant Breast Nodule Classification in Ultrasound Images | AResNet-ViT:一种用于超声图像中乳腺结节良恶性分类的混合 CNN-Transformer 网络 | Xin Zhao, Qianqian Zhu, Jialing Wu | http://arxiv.org/pdf/2407.19316v1 | null |
| 2024-07-27 | Ensembling convolutional neural networks for human skin segmentation | 集成卷积神经网络进行人体皮肤分割 | Patryk Kuban, Michal Kawulok | http://arxiv.org/pdf/2407.19310v1 | null |
| 2024-07-27 | Symmetrical Joint Learning Support-query Prototypes for Few-shot Segmentation | 用于小样本分割的对称联合学习支持查询原型 | Qun Li, Baoquan Sun, Fu Xiao, Yonggang Qi, Bir Bhanu | http://arxiv.org/pdf/2407.19306v1 | null |
| 2024-07-27 | GP-VLS: A general-purpose vision language model for surgery | GP-VLS:用于手术的通用视觉语言模型 | Samuel Schmidgall, Joseph Cho, Cyril Zakka, William Hiesinger | http://arxiv.org/pdf/2407.19305v1 | null |
| 2024-07-27 | Rethinking Attention Module Design for Point Cloud Analysis | 重新思考点云分析的注意力模块设计 | Chengzhi Wu, Kaige Wang, Zeyun Zhong, Hao Fu, Junwei Zheng, Jiaming Zhang, Julius Pfrommer, Jürgen Beyerer | http://arxiv.org/pdf/2407.19294v1 | null |
| 2024-07-27 | Optimizing Synthetic Data for Enhanced Pancreatic Tumor Segmentation | 优化合成数据以增强胰腺肿瘤分割 | Linkai Peng, Zheyuan Zhang, Gorkem Durak, Frank H. Miller, Alpay Medetalibeyoglu, Michael B. Wallace, Ulas Bagci | http://arxiv.org/pdf/2407.19284v1 | null |
| 2024-07-27 | Enhancing Tree Type Detection in Forest Fire Risk Assessment: Multi-Stage Approach and Color Encoding with Forest Fire Risk Evaluation Framework for UAV Imagery | 增强森林火灾风险评估中的树木类型检测:无人机图像森林火灾风险评估框架的多阶段方法和颜色编码 | Jinda Zhang, Michal Aibin | http://arxiv.org/pdf/2407.19184v1 | null |
| 2024-07-27 | Reducing Spurious Correlation for Federated Domain Generalization | 减少联邦域泛化的虚假相关性 | Shuran Ma, Weiying Xie, Daixun Li, Haowei Li, Yunsong Li | http://arxiv.org/pdf/2407.19174v1 | null |
| 2024-07-27 | Few-Shot Medical Image Segmentation with Large Kernel Attention | 具有大核注意力机制的少样本医学图像分割 | Xiaoxiao Wu, Xiaowei Chen, Zhenguo Gao, Shulei Qu, Yuanyuan Qiu | http://arxiv.org/pdf/2407.19148v1 | null |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-27 | Multi-Expert Adaptive Selection: Task-Balancing for All-in-One Image Restoration | 多专家自适应选择:一体化图像修复的任务平衡 | Xiaoyan Yu, Shen Zhou, Huafeng Li, Liehuang Zhu | http://arxiv.org/pdf/2407.19139v1 | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-27 | Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions | 更快的图像到视频生成:仔细研究 CLIP 图像嵌入对时空交叉注意力的影响 | Ashkan Taghipour, Morteza Ghahremani, Mohammed Bennamoun, Aref Miri Rekavandi, Zinuo Li, Hamid Laga, Farid Boussaid | http://arxiv.org/pdf/2407.19205v1 | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-27 | A Bayesian Approach Toward Robust Multidimensional Ellipsoid-Specific Fitting | 一种稳健的多维椭球体拟合的贝叶斯方法 | Zhao Mingyang, Jia Xiaohong, Ma Lei, Shi Yuke, Jiang Jingen, Li Qizhai, Yan Dong-Ming, Huang Tiejun | http://arxiv.org/pdf/2407.19269v1 | null |
| 2024-07-27 | Magic3DSketch: Create Colorful 3D Models From Sketch-Based 3D Modeling Guided by Text and Language-Image Pre-Training | Magic3DSketch:通过文本和语言图像预训练指导基于草图的 3D 建模创建彩色 3D 模型 | Ying Zang, Yidong Han, Chaotao Ding, Jianqi Zhang, Tianrun Chen | http://arxiv.org/pdf/2407.19225v1 | null |
| 2024-07-27 | RePLAy: Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry | RePLAy:利用对极几何去除投影 LiDAR 深度图伪影 | Shengjie Zhu, Girish Chandar Ganesan, Abhinav Kumar, Xiaoming Liu | http://arxiv.org/pdf/2407.19154v1 | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-27 | Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector | 综合归因:具有特征检测器的固有可解释视觉模型 | Xianren Zhang, Dongwon Lee, Suhang Wang | http://arxiv.org/pdf/2407.19308v1 | null |
| 2024-07-27 | A self-supervised and adversarial approach to hyperspectral demosaicking and RGB reconstruction in surgical imaging | 手术成像中高光谱去马赛克和 RGB 重建的自监督和对抗方法 | Peichao Li, Oscar MacCormac, Jonathan Shapey, Tom Vercauteren | http://arxiv.org/pdf/2407.19282v1 | null |
| 2024-07-27 | Towards the Dynamics of a DNN Learning Symbolic Interactions | 面向学习符号交互的 DNN 动力学 | Qihan Ren, Yang Xu, Junpeng Zhang, Yue Xin, Dongrui Liu, Quanshi Zhang | http://arxiv.org/pdf/2407.19198v1 | null |
| 2024-07-27 | Power-LLaVA: Large Language and Vision Assistant for Power Transmission Line Inspection | Power-LLaVA:电力输电线路巡检大型语言和视觉助手 | Jiahao Wang, Mingxuan Li, Haichen Luo, Jinguo Zhu, Aijun Yang, Mingzhe Rong, Xiaohua Wang | http://arxiv.org/pdf/2407.19178v1 | null |