[UPDATED!] 2024-11-18 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-18 | The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning | 众智之力:多智能体多模态模型用于文化图像标题生成 | Longju Bai, Angana Borah, Oana Ignat, Rada Mihalcea | http://arxiv.org/pdf/2411.11758v1 | null |
| 2024-11-18 | Aligning Few-Step Diffusion Models with Dense Reward Difference Learning | 与密集奖励差异学习对齐的几步扩散模型 | Ziyi Zhang, Li Shen, Sen Zhang, Deheng Ye, Yong Luo, Miaojing Shi, Bo Du, Dacheng Tao | http://arxiv.org/pdf/2411.11727v1 | null |
| 2024-11-18 | SP$^3$: Superpixel-propagated pseudo-label learning for weakly semi-supervised medical image segmentation | SP$^3$:基于超像素传播的弱监督半监督医学图像分割伪标签学习 | Shiman Li, Jiayue Zhao, Shaolei Liu, Xiaokun Dai, Chenxi Zhang, Zhijian Song | http://arxiv.org/pdf/2411.11636v1 | null |
| 2024-11-18 | Cascaded Diffusion Models for 2D and 3D Microscopy Image Synthesis to Enhance Cell Segmentation | 级联扩散模型用于二维和三维显微镜图像合成以增强细胞分割 | Rüveyda Yilmaz, Kaan Keven, Yuli Wu, Johannes Stegmaier | http://arxiv.org/pdf/2411.11515v1 | null |
| 2024-11-18 | LaVin-DiT: Large Vision Diffusion Transformer | LaVin-DiT:大视觉扩散Transformer | Zhaoqing Wang, Xiaobo Xia, Runnan Chen, Dongdong Yu, Changhu Wang, Mingming Gong, Tongliang Liu | http://arxiv.org/pdf/2411.11505v1 | null |
| 2024-11-18 | Look a Group at Once: Multi-Slide Modeling for Survival Prediction | 一次看多个群体:生存预测的多幻灯片建模 | Xinyang Li, Yi Zhang, Yi Xie, Jianfei Yang, Xi Wang, Hao Chen, Haixian Zhang | http://arxiv.org/pdf/2411.11487v1 | null |
| 2024-11-18 | MVLight: Relightable Text-to-3D Generation via Light-conditioned Multi-View Diffusion | MVLight:基于光照条件的多视图扩散重照明文本到3D生成 | Dongseok Shim, Yichun Shi, Kejie Li, H. Jin Kim, Peng Wang | http://arxiv.org/pdf/2411.11475v1 | null |
| 2024-11-18 | HistoEncoder: a digital pathology foundation model for prostate cancer | HistoEncoder:前列腺癌的数字病理基础模型 | Joona Pohjonen, Abderrahim-Oussama Batouche, Antti Rannikko, Kevin Sandeman, Andrew Erickson, Esa Pitkanen, Tuomas Mirtti | http://arxiv.org/pdf/2411.11458v1 | null |
| 2024-11-18 | Teaching Video Diffusion Model with Latent Physical Phenomenon Knowledge | 利用潜在物理现象知识训练视频扩散模型 | Qinglong Cao, Ding Wang, Xirui Li, Yuntian Chen, Chao Ma, Xiaokang Yang | http://arxiv.org/pdf/2411.11343v1 | null |
| 2024-11-18 | TP-UNet: Temporal Prompt Guided UNet for Medical Image Segmentation | 时序提示引导的UNet用于医学图像分割 | Ranmin Wang, Limin Zhuang, Hongkun Chen, Boyan Xu, Ruichu Cai | http://arxiv.org/pdf/2411.11305v1 | null |
| 2024-11-18 | Zero-Shot Automatic Annotation and Instance Segmentation using LLM-Generated Datasets: Eliminating Field Imaging and Manual Annotation for Deep Learning Model Development | 基于LLM生成数据集的零样本自动标注与实例分割:消除场成像和手动标注以开发深度学习模型 | Ranjan Sapkota, Achyut Paudel, Manoj Karkee | http://arxiv.org/pdf/2411.11285v1 | null |
| 2024-11-18 | V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion | V2X-R:基于去噪扩散的协同激光雷达-4D雷达融合三维目标检测 | Xun Huang, Jinlong Wang, Qiming Xia, Siheng Chen, Bisheng Yang, Xin Li, Cheng Wang, Chenglu Wen | http://arxiv.org/pdf/2411.08402v2 | link |
| 2024-11-18 | A Hybrid Approach for COVID-19 Detection: Combining Wasserstein GAN with Transfer Learning | 基于Wasserstein GAN和迁移学习的COVID-19检测混合方法 | Sumera Rounaq, Shahid Munir Shah, Mahmoud Aljawarneh | http://arxiv.org/pdf/2411.06397v2 | null |
| 2024-11-18 | Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure | 理解扩散模型的泛化性需要重新思考隐藏高斯结构 | Xiang Li, Yixiang Dai, Qing Qu | http://arxiv.org/pdf/2410.24060v3 | link |
| 2024-11-18 | CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense | 因果差异:基于扩散模型的因果解耦对抗防御 | Mingkun Zhang, Keping Bi, Wei Chen, Quanrun Chen, Jiafeng Guo, Xueqi Cheng | http://arxiv.org/pdf/2410.23091v4 | link |
| 2024-11-18 | Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing | 基于视觉引导和掩码增强的提示式图像编辑自适应去噪 | Kejie Wang, Xuemeng Song, Meng Liu, Jin Yuan, Weili Guan | http://arxiv.org/pdf/2410.10496v2 | null |
| 2024-11-18 | MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning | MegaFusion:无需进一步调优,扩展扩散模型以实现更高分辨率图像生成 | Haoning Wu, Shaocheng Shen, Qiang Hu, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang | http://arxiv.org/pdf/2408.11001v3 | link |
| 2024-11-18 | An Open-Source Tool for Mapping War Destruction at Scale in Ukraine using Sentinel-1 Time Series | 乌克兰大规模战争破坏映射的开源工具:利用Sentinel-1时间序列 | Olivier Dietrich, Torben Peters, Vivien Sainte Fare Garnot, Valerie Sticher, Thao Ton-That Whelan, Konrad Schindler, Jan Dirk Wegner | http://arxiv.org/pdf/2406.02506v2 | null |
| 2024-11-18 | ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model | 艺术编织者:通过扩散模型实现高级动态风格集成 | Chengming Xu, Kai Hu, Qilin Wang, Donghao Luo, Jiangning Zhang, Xiaobin Hu, Yanwei Fu, Chengjie Wang | http://arxiv.org/pdf/2405.15287v2 | null |
| 2024-11-18 | Frame Interpolation with Consecutive Brownian Bridge Diffusion | 基于连续布朗桥扩散的帧插值 | Zonglin Lyu, Ming Li, Jianbo Jiao, Chen Chen | http://arxiv.org/pdf/2405.05953v6 | link |
| 2024-11-18 | ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback | 控制网++:通过高效一致性反馈提升条件控制 | Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen | http://arxiv.org/pdf/2404.07987v3 | link |
| 2024-11-18 | SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model | SynArtifact:通过视觉-语言模型分类和缓解合成图像中的伪影 | Bin Cao, Jianhao Yuan, Yexin Liu, Jian Li, Shuyang Sun, Jing Liu, Bo Zhao | http://arxiv.org/pdf/2402.18068v3 | null |
| 2024-11-18 | MagicStick: Controllable Video Editing via Control Handle Transformations | MagicStick:通过控制手柄变换实现可控的视频编辑 | Yue Ma, Xiaodong Cun, Sen Liang, Jinbo Xing, Yingqing He, Chenyang Qi, Siran Chen, Qifeng Chen | http://arxiv.org/pdf/2312.03047v2 | link |
| 2024-11-18 | A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning | 深度学习领域超越持续学习的遗忘综合调查 | Zhenyi Wang, Enneng Yang, Li Shen, Heng Huang | http://arxiv.org/pdf/2307.09218v3 | link |
| 2024-11-18 | 3D microstructural generation from 2D images of cement paste using generative adversarial networks | 基于生成对抗网络的2D水泥浆图像到3D微观结构生成 | Xin Zhao, Lin Wang, Qinfei Li, Heng Chen, Shuangrong Liu, Pengkun Hou, Jiayuan Ye, Yan Pei, Xu Wu, Jianfeng Yuan, et.al. | http://arxiv.org/pdf/2204.01645v3 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-18 | Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion | 边缘增强扩张残差注意力网络在多模态医学图像融合中的应用 | Meng Zhou, Yuxuan Zhang, Xiaolan Xu, Jiayi Wang, Farzad Khalvati | http://arxiv.org/pdf/2411.11799v1 | null |
| 2024-11-18 | Dissecting Misalignment of Multimodal Large Language Models via Influence Function | 通过影响函数剖析多模态大型语言模型的偏差 | Lijie Hu, Chenyang Ren, Huanyi Xie, Khouloud Saadi, Shu Yang, Jingfeng Zhang, Di Wang | http://arxiv.org/pdf/2411.11667v1 | null |
| 2024-11-18 | The ADUULM-360 Dataset -- A Multi-Modal Dataset for Depth Estimation in Adverse Weather | ADUULM-360数据集 —— 用于恶劣天气下深度估计的多模态数据集 | Markus Schön, Jona Ruof, Thomas Wodtko, Michael Buchholz, Klaus Dietmayer | http://arxiv.org/pdf/2411.11455v1 | null |
| 2024-11-18 | GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts | GLDesigner:利用多模态大型语言模型作为设计师以增强美学文本符号布局 | Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Chenyang Li, Hanyuan Chen, Jin-Peng Lan, Bin Luo, Yifeng Geng | http://arxiv.org/pdf/2411.11435v1 | null |
| 2024-11-18 | TL-CLIP: A Power-specific Multimodal Pre-trained Visual Foundation Model for Transmission Line Defect Recognition | TL-CLIP:一种针对电力系统的多模态预训练视觉基础模型用于输电线路缺陷识别 | Ke Zhang, Zhaoye Zheng, Yurong Guo, Jiacun Wang, Jiyuan Yang, Yangjie Xiao | http://arxiv.org/pdf/2411.11370v1 | null |
| 2024-11-18 | MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models | MAIRA-Seg:利用分割感知多模态大型语言模型提升放射学报告生成 | Harshita Sharma, Valentina Salvatelli, Shaury Srivastav, Kenza Bouzid, Shruthi Bannur, Daniel C. Castro, Maximilian Ilse, Sam Bond-Taylor, Mercy Prasanna Ranjit, Fabian Falck, et.al. | http://arxiv.org/pdf/2411.11362v1 | null |
| 2024-11-18 | CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset | CCExpert:基于差异感知整合与基础数据集,提升遥感变化字幕中的多模态语言模型能力 | Zhiming Wang, Mingze Wang, Sheng Xu, Yanjing Li, Baochang Zhang | http://arxiv.org/pdf/2411.11360v1 | null |
| 2024-11-18 | Towards Open-Vocabulary Audio-Visual Event Localization | 面向开放词汇的视听事件定位 | Jinxing Zhou, Dan Guo, Ruohao Guo, Yuxin Mao, Jingjing Hu, Yiran Zhong, Xiaojun Chang, Meng Wang | http://arxiv.org/pdf/2411.11278v1 | null |
| 2024-11-18 | Efficient Transfer Learning for Video-language Foundation Models | 高效的视频-语言基础模型迁移学习 | Haoxing Chen, Zizheng Huang, Yan Hong, Yanshuo Wang, Zhongcai Lyu, Zhuoer Xu, Jun Lan, Zhangxuan Gu | http://arxiv.org/pdf/2411.11223v1 | null |
| 2024-11-18 | MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation | MedCLIP-SAMv2:迈向通用文本驱动医学图像分割 | Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao | http://arxiv.org/pdf/2409.19483v3 | link |
| 2024-11-18 | Multi-modal Situated Reasoning in 3D Scenes | 三维场景中的多模态情境推理 | Xiongkun Linghu, Jiangyong Huang, Xuesong Niu, Xiaojian Ma, Baoxiong Jia, Siyuan Huang | http://arxiv.org/pdf/2409.02389v2 | null |
| 2024-11-18 | MatchTime: Towards Automatic Soccer Game Commentary Generation | 迈向自动足球比赛解说生成 | Jiayuan Rao, Haoning Wu, Chang Liu, Yanfeng Wang, Weidi Xie | http://arxiv.org/pdf/2406.18530v2 | link |
| 2024-11-18 | Grounded 3D-LLM with Referent Tokens | 基于参照标记的 grounded 3D-LLM | Yilun Chen, Shuai Yang, Haifeng Huang, Tai Wang, Runsen Xu, Ruiyuan Lyu, Dahua Lin, Jiangmiao Pang | http://arxiv.org/pdf/2405.10370v2 | link |
| 2024-11-18 | PhD: A ChatGPT-Prompted Visual hallucination Evaluation Dataset | 博士:基于ChatGPT提示的视觉幻觉评估数据集 | Jiazhen Liu, Yuhan Fu, Ruobing Xie, Runquan Xie, Xingwu Sun, Fengzong Lian, Zhanhui Kang, Xirong Li | http://arxiv.org/pdf/2403.11116v3 | link |

Nerf

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-18 | Towards Degradation-Robust Reconstruction in Generalizable NeRF | 通用于可泛化NeRF的鲁棒退化重建 | Chan Ho Park, Ka Leong Cheng, Zhicheng Wang, Qifeng Chen | http://arxiv.org/pdf/2411.11691v1 | null |
| 2024-11-18 | LeC$^2$O-NeRF: Learning Continuous and Compact Large-Scale Occupancy for Urban Scenes | LeC$^2$O-NeRF:学习连续且紧凑的大规模城市场景占用 | Zhenxing Mi, Dan Xu | http://arxiv.org/pdf/2411.11374v1 | null |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-18 | RoboGSim: A Real2Sim2Real Robotic Gaussian Splatting Simulator | RoboGSim:真实到仿真到真实机器人高斯散斑模拟器 | Xinhai Li, Jialin Li, Ziheng Zhang, Rui Zhang, Fan Jia, Tiancai Wang, Haoqiang Fan, Kuo-Kun Tseng, Ruiping Wang | http://arxiv.org/pdf/2411.11839v1 | null |
| 2024-11-18 | GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views | 基于GPS-Gaussian+的通用像素级3D高斯散斑渲染:从稀疏视图中实时渲染人景 | Boyao Zhou, Shunyuan Zheng, Hanzhang Tu, Ruizhi Shao, Boning Liu, Shengping Zhang, Liqiang Nie, Yebin Liu | http://arxiv.org/pdf/2411.11363v1 | null |
| 2024-11-18 | BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis | BrightDreamer:快速文本到3D合成通用的3D高斯生成框架 | Lutao Jiang, Xu Zheng, Yuanhuiyi Lyu, Jiazhou Zhou, Lin Wang | http://arxiv.org/pdf/2403.11273v2 | link |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-18 | FERT: Real-Time Facial Expression Recognition with Short-Range FMCW Radar | FERT:基于短距离FMCW雷达的实时面部表情识别 | Sabri Mustafa Kahya, Muhammet Sami Yavuz, Eckehard Steinbach | http://arxiv.org/pdf/2411.11619v1 | null |
| 2024-11-18 | Color-Oriented Redundancy Reduction in Dataset Distillation | 面向颜色特征的数据集蒸馏冗余降低 | Bowen Yuan, Zijian Wang, Yadan Luo, Mahsa Baktashmotlagh, Yadan Luo, Zi Huang | http://arxiv.org/pdf/2411.11329v1 | null |
| 2024-11-18 | GazeGen: Gaze-Driven User Interaction for Visual Content Generation | GazeGen:视觉内容生成中的注视驱动用户交互 | He-Yen Hsieh, Ziyun Li, Sai Qian Zhang, Wei-Te Mark Ting, Kao-Den Chang, Barbara De Salvo, Chiao Liu, H. T. Kung | http://arxiv.org/pdf/2411.04335v2 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-18 | LightFFDNets: Lightweight Convolutional Neural Networks for Rapid Facial Forgery Detection | 轻量级卷积神经网络在快速人脸伪造检测中的应用 | Günel Jabbarlı, Murat Kurt | http://arxiv.org/pdf/2411.11826v1 | null |
| 2024-11-18 | WoodYOLO: A Novel Object Detector for Wood Species Detection in Microscopic Images | 基于木种识别的微观图像新型目标检测器:WoodYOLO | Lars Nieradzik, Henrike Stephani, Jördis Sieburg-Rockel, Stephanie Helmling, Andrea Olbrich, Stephanie Wrage, Janis Keuper | http://arxiv.org/pdf/2411.11738v1 | null |
| 2024-11-18 | From Spectra to Geography: Intelligent Mapping of RRUFF Mineral Data | 从光谱到地理:RRUFF矿物数据的智能映射 | Francesco Pappone, Federico Califano, Marco Tafani | http://arxiv.org/pdf/2411.11693v1 | null |
| 2024-11-18 | Real-Time Fitness Exercise Classification and Counting from Video Frames | 实时从视频帧中分类和计数健身运动 | Riccardo Riccio | http://arxiv.org/pdf/2411.11548v1 | null |
| 2024-11-18 | Reliable Poisoned Sample Detection against Backdoor Attacks Enhanced by Sharpness Aware Minimization | 基于Sharpness-Aware Minimization的针对后门攻击的可靠中毒样本检测 | Mingda Zhang, Mingli Zhu, Zihao Zhu, Baoyuan Wu | http://arxiv.org/pdf/2411.11525v1 | null |
| 2024-11-18 | Learning a Neural Association Network for Self-supervised Multi-Object Tracking | 学习用于自监督多目标跟踪的神经关联网络 | Shuai Li, Michael Burke, Subramanian Ramamoorthy, Juergen Gall | http://arxiv.org/pdf/2411.11514v1 | null |
| 2024-11-18 | Exploring Emerging Trends and Research Opportunities in Visual Place Recognition | 探索视觉场所识别中的新兴趋势和研究机遇 | Antonios Gasteratos, Konstantinos A. Tsintotas, Tobias Fischer, Yiannis Aloimonos, Michael Milford | http://arxiv.org/pdf/2411.11481v1 | null |
| 2024-11-18 | SL-YOLO: A Stronger and Lighter Drone Target Detection Model | SL-YOLO:更强大更轻量级的无人机目标检测模型 | Defan Chen, Luchan Zhang | http://arxiv.org/pdf/2411.11477v1 | null |
| 2024-11-18 | MGNiceNet: Unified Monocular Geometric Scene Understanding | MGNiceNet:统一单目几何场景理解 | Markus Schön, Michael Buchholz, Klaus Dietmayer | http://arxiv.org/pdf/2411.11466v1 | null |
| 2024-11-18 | IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos | 宜家手册在工作中的应用:基于互联网视频的4D组装说明接地 | Yunong Liu, Cristobal Eyzaguirre, Manling Li, Shubh Khanna, Juan Carlos Niebles, Vineeth Ravi, Saumitra Mishra, Weiyu Liu, Jiajun Wu | http://arxiv.org/pdf/2411.11409v1 | null |
| 2024-11-18 | Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection | 堆砌砖块式:增量人脸伪造检测的对齐特征隔离 | Jikang Cheng, Zhiyuan Yan, Ying Zhang, Li Hao, Jiaxin Ai, Qin Zou, Chen Li, Zhongyuan Wang | http://arxiv.org/pdf/2411.11396v1 | null |
| 2024-11-18 | Lung Disease Detection with Vision Transformers: A Comparative Study of Machine Learning Methods | 基于视觉 Transformer 的肺部疾病检测:机器学习方法的比较研究 | Baljinnyam Dayan | http://arxiv.org/pdf/2411.11376v1 | null |
| 2024-11-18 | A comprehensive survey of oracle character recognition: challenges, benchmarks, and beyond | 甲骨文字符识别全面综述:挑战、基准及未来 | Jing Li, Xueke Chi, Qiufeng Wang, Dahan Wang, Kaizhu Huang, Yongge Liu, Cheng-lin Liu | http://arxiv.org/pdf/2411.11354v1 | null |
| 2024-11-18 | Video-to-Task Learning via Motion-Guided Attention for Few-Shot Action Recognition | 基于运动引导注意力机制的少样本动作识别视频到任务学习 | Hanyu Guo, Wanchuan Yu, Suzhou Que, Kaiwen Du, Yan Yan, Hanzi Wang | http://arxiv.org/pdf/2411.11335v1 | null |
| 2024-11-18 | Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition | 神经:学习上下文感知演化表示以实现零样本骨骼动作识别 | Yang Chen, Jingcai Guo, Song Guo, Dacheng Tao | http://arxiv.org/pdf/2411.11288v1 | null |
| 2024-11-18 | Reducing Label Dependency for Underwater Scene Understanding: A Survey of Datasets, Techniques and Applications | 水下场景理解中降低标签依赖性:数据集、技术和应用综述 | Scarlett Raine, Frederic Maire, Niko Suenderhauf, Tobias Fischer | http://arxiv.org/pdf/2411.11287v1 | null |
| 2024-11-18 | Cross-Patient Pseudo Bags Generation and Curriculum Contrastive Learning for Imbalanced Multiclassification of Whole Slide Image | 跨患者伪袋生成与课程对比学习,用于全切片图像不平衡多分类 | Yonghuang Wu, Xuan Xie, Xinyuan Niu, Chengqian Zhao, Jinhua Yu | http://arxiv.org/pdf/2411.11262v1 | null |
| 2024-11-18 | Semantic or Covariate? A Study on the Intractable Case of Out-of-Distribution Detection | 语义或协变量?分布外检测难题研究 | Xingming Long, Jie Zhang, Shiguang Shan, Xilin Chen | http://arxiv.org/pdf/2411.11254v1 | null |
| 2024-11-18 | Noise Filtering Benchmark for Neuromorphic Satellites Observations | 神经形态卫星观测噪声滤波基准 | Sami Arja, Alexandre Marcireau, Nicholas Owen Ralph, Saeed Afshar, Gregory Cohen | http://arxiv.org/pdf/2411.11233v1 | null |
| 2024-11-18 | The Sound of Water: Inferring Physical Properties from Pouring Liquids | 水之声音:从倒液体中推断物理性质 | Piyush Bagad, Makarand Tapaswi, Cees G. M. Snoek, Andrew Zisserman | http://arxiv.org/pdf/2411.11222v1 | null |
| 2024-11-18 | Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition | 关系对比学习与遮挡图像建模用于场景文本识别 | Tiancheng Lin, Jinglei Zhang, Yi Xu, Kai Chen, Rui Zhang, Chang-Wen Chen | http://arxiv.org/pdf/2411.11219v1 | null |
| 2024-11-18 | Masked Autoencoders are Parameter-Efficient Federated Continual Learners | 掩码自编码器:参数高效联邦持续学习者 | Yuchen He, Xiangfeng Wang | http://arxiv.org/pdf/2411.01916v2 | link |
| 2024-11-18 | Task Adaptive Feature Distribution Based Network for Few-shot Fine-grained Target Classification | 基于任务自适应特征分布网络的少量样本细粒度目标分类 | Ping Li, Hongbo Wang, Lei Lu | http://arxiv.org/pdf/2410.09797v2 | null |
| 2024-11-18 | CerviXpert: A Multi-Structural Convolutional Neural Network for Predicting Cervix Type and Cervical Cell Abnormalities | CerviXpert:一种用于预测宫颈类型和宫颈细胞异常的多结构卷积神经网络 | Rashik Shahriar Akash, Radiful Islam, S. M. Saiful Islam Badhon, K. S. M. Tozammel Hossain | http://arxiv.org/pdf/2409.06220v2 | null |
| 2024-11-18 | MagicFace: Training-free Universal-Style Human Image Customized Synthesis | 无训练通用风格人像定制合成:MagicFace | Yibin Wang, Weizhong Zhang, Cheng Jin | http://arxiv.org/pdf/2408.07433v5 | null |
| 2024-11-18 | MIST: A Simple and Scalable End-To-End 3D Medical Imaging Segmentation Framework | MIST:一种简单且可扩展的端到端3D医学影像分割框架 | Adrian Celaya, Evan Lim, Rachel Glenn, Brayden Mi, Alex Balsells, Dawid Schellingerhout, Tucker Netherton, Caroline Chung, Beatrice Riviere, David Fuentes | http://arxiv.org/pdf/2407.21343v2 | link |
| 2024-11-18 | Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion | 无约束开放词汇图像分类:通过CLIP反演从文本到图像的零样本迁移学习 | Philipp Allgeuer, Kyra Ahrens, Stefan Wermter | http://arxiv.org/pdf/2407.11211v3 | null |
| 2024-11-18 | Formal Verification of Deep Neural Networks for Object Detection | 深度神经网络对象检测的正式验证 | Yizhak Y. Elboher, Avraham Raviv, Yael Leibovich Weiss, Omer Cohen, Roy Assa, Guy Katz, Hillel Kugler | http://arxiv.org/pdf/2407.01295v5 | null |
| 2024-11-18 | MV2Cyl: Reconstructing 3D Extrusion Cylinders from Multi-View Images | MV2Cyl:从多视角图像重建3D挤出圆柱 | Eunji Hong, Minh Hieu Nguyen, Mikaela Angelina Uy, Minhyuk Sung | http://arxiv.org/pdf/2406.10853v3 | null |
| 2024-11-18 | Searching for internal symbols underlying deep learning | 寻找深层次学习背后的内部符号 | Jung H. Lee, Sujith Vijayan | http://arxiv.org/pdf/2405.20605v2 | null |
| 2024-11-18 | Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds | Eidos:高效、难以察觉的对抗性3D点云 | Hanwei Zhang, Luo Cheng, Qisong He, Wei Huang, Renjue Li, Ronan Sicre, Xiaowei Huang, Holger Hermanns, Lijun Zhang | http://arxiv.org/pdf/2405.14210v2 | null |
| 2024-11-18 | Machine Vision-Based Assessment of Fall Color Changes and its Relationship with Leaf Nitrogen Concentration | 基于机器视觉的秋季叶色变化评估及其与叶片氮浓度的关系 | Achyut Paudel, Jostan Brown, Priyanka Upadhyaya, Atif Bilal Asad, Safal Kshetri, Joseph R. Davidson, Cindy Grimm, Ashley Thompson, Bernardita Sallato, Matthew D. Whiting, et.al. | http://arxiv.org/pdf/2404.14653v3 | null |
| 2024-11-18 | Watermark-based Detection and Attribution of AI-Generated Content | 基于水印的AI生成内容检测与归因 | Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Neil Zhenqiang Gong | http://arxiv.org/pdf/2404.04254v2 | null |
| 2024-11-18 | Structural-Based Uncertainty in Deep Learning Across Anatomical Scales: Analysis in White Matter Lesion Segmentation | 基于结构的不确定性在深度学习中跨解剖尺度:白质病变分割分析 | Nataliia Molchanova, Vatsal Raina, Andrey Malinin, Francesco La Rosa, Adrien Depeursinge, Mark Gales, Cristina Granziera, Henning Muller, Mara Graziani, Meritxell Bach Cuadra | http://arxiv.org/pdf/2311.08931v3 | link |
| 2024-11-18 | Learning to mask: Towards generalized face forgery detection | 学习进行遮挡:迈向通用的面部伪造检测 | Jianwei Fei, Yunshu Dai, Huaming Wang, Zhihua Xia | http://arxiv.org/pdf/2212.14309v2 | null |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-18 | Scalable Autoregressive Monocular Depth Estimation | 可扩展的自回归单目深度估计 | Jinhong Wang, Jian Liu, Dongqi Tang, Weiqiang Wang, Wentong Li, Danny Chen, Jintai Chen, Jian Wu | http://arxiv.org/pdf/2411.11361v1 | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-18 | Generative World Explorer | 生成式世界探索者 | Taiming Lu, Tianmin Shu, Alan Yuille, Daniel Khashabi, Jieneng Chen | http://arxiv.org/pdf/2411.11844v1 | null |
| 2024-11-18 | Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods | 探索JPEG AI的抗对抗性:方法、比较与新方法 | Egor Kovalev, Georgii Bychkov, Khaled Abud, Aleksandr Gushchin, Anna Chistyakova, Sergey Lavrushkin, Dmitriy Vatolin, Anastasia Antsiferova | http://arxiv.org/pdf/2411.11795v1 | null |
| 2024-11-18 | Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment | 通过渐进式概念-瓶颈驱动对齐提升视觉-语言模型安全性 | Zhendong Liu, Yuanbi Nie, Yingshui Tan, Xiangyu Yue, Qiushi Cui, Chongjun Wang, Xiaoyong Zhu, Bo Zheng | http://arxiv.org/pdf/2411.11543v1 | null |
| 2024-11-18 | Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment | 精细粒度验证器:视觉-语言对齐中的偏好建模作为下一个标记预测 | Chenhang Cui, An Zhang, Yiyang Zhou, Zhaorun Chen, Gelei Deng, Huaxiu Yao, Tat-Seng Chua | http://arxiv.org/pdf/2410.14148v2 | null |
| 2024-11-18 | Utilizing Large Language Models in an iterative paradigm with domain feedback for molecule optimization | 利用具有领域反馈的迭代范式中的大型语言模型进行分子优化 | Khiem Le, Nitesh V. Chawla | http://arxiv.org/pdf/2410.13147v6 | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-18 | Relevance-guided Audio Visual Fusion for Video Saliency Prediction | 基于相关性的视频显著性预测中的音视频融合 | Li Yu, Xuanzhe Sun, Pan Gao, Moncef Gabbouj | http://arxiv.org/pdf/2411.11454v1 | null |
| 2024-11-18 | Superpixel-informed Implicit Neural Representation for Multi-Dimensional Data | 基于超像素信息的多维数据隐式神经网络表示 | Jiayi Li, Xile Zhao, Jianli Wang, Chao Wang, Min Wang | http://arxiv.org/pdf/2411.11356v1 | null |
| 2024-11-18 | BeautyBank: Encoding Facial Makeup in Latent Space | 美丽银行:面部妆容在潜在空间的编码 | Qianwen Lu, Xingchao Yang, Takafumi Taketomi | http://arxiv.org/pdf/2411.11231v1 | null |
| 2024-11-18 | DeforHMR: Vision Transformer with Deformable Cross-Attention for 3D Human Mesh Recovery | DeforHMR:用于3D人体网格恢复的可变形交叉注意力视觉Transformer | Jaewoo Heo, George Hu, Zeyu Wang, Serena Yeung-Levy | http://arxiv.org/pdf/2411.11214v1 | null |
| 2024-11-18 | Rendering-Oriented 3D Point Cloud Attribute Compression using Sparse Tensor-based Transformer | 基于稀疏张量变换的面向渲染的3D点云属性压缩 | Xiao Huo, Junhui Hou, Shuai Wan, Fuzheng Yang | http://arxiv.org/pdf/2411.07899v2 | null |
| 2024-11-18 | Activating Self-Attention for Multi-Scene Absolute Pose Regression | 激活自注意力实现多场景绝对姿态回归 | Miso Lee, Jihwan Kim, Jae-Pil Heo | http://arxiv.org/pdf/2411.01443v2 | link |
| 2024-11-18 | DemMamba: Alignment-free Raw Video Demoireing with Frequency-assisted Spatio-Temporal Mamba | DemMamba:基于频率辅助的时空Mamba无对齐原始视频去摩尔纹 | Shuning Xu, Xina Liu, Binbin Song, Xiangyu Chen, Qiubo Chen, Jiantao Zhou | http://arxiv.org/pdf/2408.10679v2 | null |
| 2024-11-18 | DreamText: High Fidelity Scene Text Synthesis | 梦文:高保真场景文本合成 | Yibin Wang, Weizhong Zhang, Cheng Jin | http://arxiv.org/pdf/2405.14701v3 | link |
| 2024-11-18 | Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild | 变色龙:野外观测密集视觉预测中的高效通用数据算法 | Donggyun Kim, Seongwoong Cho, Semin Kim, Chong Luo, Seunghoon Hong | http://arxiv.org/pdf/2404.18459v2 | link |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-18 | UniHands: Unifying Various Wild-Collected Keypoints for Personalized Hand Reconstruction | UniHands:统一各种野采集关键点以实现个性化手部重建 | Menghe Zhang, Joonyeoup Kim, Yangwen Liang, Shuangquan Wang, Kee-Bong Song | http://arxiv.org/pdf/2411.11845v1 | null |
| 2024-11-18 | RAWMamba: Unified sRGB-to-RAW De-rendering With State Space Model | RAWMamba:基于状态空间模型的统一sRGB到RAW去渲染 | Hongjun Chen, Wencheng Han, Huan Zheng, Jianbing Shen | http://arxiv.org/pdf/2411.11717v1 | null |
| 2024-11-18 | Leveraging Computational Pathology AI for Noninvasive Optical Imaging Analysis Without Retraining | 利用计算病理学AI进行无需重新训练的非侵入性光学成像分析 | Danny Barash, Emilie Manning, Aidan Van Vleck, Omri Hirsch, Kyi Lei Aye, Jingxi Li, Philip O. Scumpia, Aydogan Ozcan, Sumaira Aasi, Kerri E. Rieger, et.al. | http://arxiv.org/pdf/2411.11613v1 | null |
| 2024-11-18 | A Review of Digital Pixel Sensors | 数字像素传感器综述 | Md Rahatul Islam Udoy, Shamiul Alam, Md Mazharul Islam, Akhilesh Jaiswal, Ahmedullah Aziz | http://arxiv.org/pdf/2402.04507v2 | null |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-18 | Text-guided Zero-Shot Object Localization | 基于文本的零样本目标定位 | Jingjing Wang, Xinglin Piao, Zongzhi Gao, Bo Li, Yong Zhang, Baocai Yin | http://arxiv.org/pdf/2411.11357v1 | null |
| 2024-11-18 | Visual-Semantic Graph Matching Net for Zero-Shot Learning | 基于视觉语义图匹配网络的零样本学习 | Bowen Duan, Shiming Chen, Yufei Guo, Guo-Sen Xie, Weiping Ding, Yisong Wang | http://arxiv.org/pdf/2411.11351v1 | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-18 | Equivariant spatio-hemispherical networks for diffusion MRI deconvolution | 等变空间半球网络在扩散磁共振成像去卷积中的应用 | Axel Elaldi, Guido Gerig, Neel Dey | http://arxiv.org/pdf/2411.11819v1 | null |
| 2024-11-18 | Revitalizing Electoral Trust: Enhancing Transparency and Efficiency through Automated Voter Counting with Machine Learning | 通过机器学习自动选民计票提升选举信任度:增强透明性和效率 | Mir Faris, Syeda Aynul Karim, Md. Juniadul Islam | http://arxiv.org/pdf/2411.11740v1 | null |
| 2024-11-18 | MC-LLaVA: Multi-Concept Personalized Vision-Language Model | MC-LLaVA:多概念个性化视觉-语言模型 | Ruichuan An, Sihan Yang, Ming Lu, Kai Zeng, Yulin Luo, Ying Chen, Jiajun Cao, Hao Liang, Qi She, Shanghang Zhang, et.al. | http://arxiv.org/pdf/2411.11706v1 | null |
| 2024-11-18 | MSSIDD: A Benchmark for Multi-Sensor Denoising | 多传感器去噪基准:MSSIDD | Shibin Mei, Hang Wang, Bingbing Ni | http://arxiv.org/pdf/2411.11562v1 | null |
| 2024-11-18 | SignEye: Traffic Sign Interpretation from Vehicle First-Person View | SignEye:车辆第一视角的交通标志识别 | Chuang Yang, Xu Han, Tao Han, Yuejiao SU, Junyu Gao, Hongyuan Zhang, Yi Wang, Lap-Pui Chau | http://arxiv.org/pdf/2411.11507v1 | null |
| 2024-11-18 | Generalizable Person Re-identification via Balancing Alignment and Uniformity | 通用的人体重识别:平衡对齐与一致性 | Yoonki Cho, Jaeyoon Kim, Woo Jae Kim, Junsik Jung, Sung-eui Yoon | http://arxiv.org/pdf/2411.11471v1 | null |
| 2024-11-18 | Towards fast DBSCAN via Spectrum-Preserving Data Compression | 基于频谱保留数据压缩的快速DBSCAN算法 | Yongyu Wang | http://arxiv.org/pdf/2411.11421v1 | null |
| 2024-11-18 | Performance Evaluation of Geospatial Images based on Zarr and Tiff | 基于Zarr和Tiff的地理空间图像性能评估 | Jaheer Khan, Swarup E, Rakshit Ramesh | http://arxiv.org/pdf/2411.11291v1 | null |
| 2024-11-18 | Continuous K-space Recovery Network with Image Guidance for Fast MRI Reconstruction | 基于图像引导的连续K空间恢复网络实现快速MRI重建 | Yucong Meng, Zhiwei Yang, Minghong Duan, Yonghong Shi, Zhijian Song | http://arxiv.org/pdf/2411.11282v1 | null |
| 2024-11-18 | DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation | 驱动球体:构建高保真4D世界用于闭环仿真 | Tianyi Yan, Dongming Wu, Wencheng Han, Junpeng Jiang, Xia Zhou, Kun Zhan, Cheng-zhong Xu, Jianbing Shen | http://arxiv.org/pdf/2411.11252v1 | null |
| 2024-11-18 | Partial Scene Text Retrieval | 部分场景文本检索 | Hao Wang, Minghui Liao, Zhouyi Xie, Wenyu Liu, Xiang Bai | http://arxiv.org/pdf/2411.10261v2 | link |
| 2024-11-18 | ObjectNLQ @ Ego4D Episodic Memory Challenge 2024 | ObjectNLQ在Ego4D情景记忆挑战2024 | Yisen Feng, Haoyu Zhang, Yuquan Xie, Zaijing Li, Meng Liu, Liqiang Nie | http://arxiv.org/pdf/2406.15778v2 | link |
| 2024-11-18 | A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting | CAC配方:基于mosaic的通用损失函数以提升类无关计数 | Tsung-Han Chou, Brian Wang, Wei-Chen Chiu, Jun-Cheng Chen | http://arxiv.org/pdf/2404.09826v2 | null |
| 2024-11-18 | Image Demoireing in RAW and sRGB Domains | RAW与sRGB域中的图像去摩尔纹 | Shuning Xu, Binbin Song, Xiangyu Chen, Xina Liu, Jiantao Zhou | http://arxiv.org/pdf/2312.09063v3 | null |
| 2024-11-18 | A Scalable Training Strategy for Blind Multi-Distribution Noise Removal | Source-Channel Decoupling for Efficient Deep Learning of Multi-Modal Data | Kevin Zhang, Sakshum Kulshrestha, Christopher Metzler | http://arxiv.org/pdf/2310.20064v2 | null |
| 2024-11-18 | Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog | 揭示隐藏关联:基于视频的对话的迭代搜索与推理 | Haoyu Zhang, Meng Liu, Yaowei Wang, Da Cao, Weili Guan, Liqiang Nie | http://arxiv.org/pdf/2310.07259v3 | link |
| 2024-11-18 | Unmasking Parkinson's Disease with Smile: An AI-enabled Screening Framework | 利用微笑揭示帕金森病:一款AI赋能的筛查框架 | Tariq Adnan, Md Saiful Islam, Wasifur Rahman, Sangwu Lee, Sutapa Dey Tithi, Kazi Noshin, Imran Sarker, M Saifur Rahman, Ehsan Hoque | http://arxiv.org/pdf/2308.02588v2 | null |