Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-18 | The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning | 众智之力:多智能体多模态模型用于文化图像标题生成 | Longju Bai, Angana Borah, Oana Ignat, Rada Mihalcea | http://arxiv.org/pdf/2411.11758v1 | null |
2024-11-18 | Aligning Few-Step Diffusion Models with Dense Reward Difference Learning | 利用密集奖励差异学习对齐少步扩散模型 | Ziyi Zhang, Li Shen, Sen Zhang, Deheng Ye, Yong Luo, Miaojing Shi, Bo Du, Dacheng Tao | http://arxiv.org/pdf/2411.11727v1 | null |
2024-11-18 | SP$^3$: Superpixel-propagated pseudo-label learning for weakly semi-supervised medical image segmentation | SP$^3$:基于超像素传播的弱半监督医学图像分割伪标签学习 | Shiman Li, Jiayue Zhao, Shaolei Liu, Xiaokun Dai, Chenxi Zhang, Zhijian Song | http://arxiv.org/pdf/2411.11636v1 | null |
2024-11-18 | Cascaded Diffusion Models for 2D and 3D Microscopy Image Synthesis to Enhance Cell Segmentation | 级联扩散模型用于二维和三维显微镜图像合成以增强细胞分割 | Rüveyda Yilmaz, Kaan Keven, Yuli Wu, Johannes Stegmaier | http://arxiv.org/pdf/2411.11515v1 | null |
2024-11-18 | LaVin-DiT: Large Vision Diffusion Transformer | LaVin-DiT:大视觉扩散Transformer | Zhaoqing Wang, Xiaobo Xia, Runnan Chen, Dongdong Yu, Changhu Wang, Mingming Gong, Tongliang Liu | http://arxiv.org/pdf/2411.11505v1 | null |
2024-11-18 | Look a Group at Once: Multi-Slide Modeling for Survival Prediction | 一次看一组:用于生存预测的多切片建模 | Xinyang Li, Yi Zhang, Yi Xie, Jianfei Yang, Xi Wang, Hao Chen, Haixian Zhang | http://arxiv.org/pdf/2411.11487v1 | null |
2024-11-18 | MVLight: Relightable Text-to-3D Generation via Light-conditioned Multi-View Diffusion | MVLight:基于光照条件的多视图扩散重照明文本到3D生成 | Dongseok Shim, Yichun Shi, Kejie Li, H. Jin Kim, Peng Wang | http://arxiv.org/pdf/2411.11475v1 | null |
2024-11-18 | HistoEncoder: a digital pathology foundation model for prostate cancer | HistoEncoder:前列腺癌的数字病理基础模型 | Joona Pohjonen, Abderrahim-Oussama Batouche, Antti Rannikko, Kevin Sandeman, Andrew Erickson, Esa Pitkanen, Tuomas Mirtti | http://arxiv.org/pdf/2411.11458v1 | null |
2024-11-18 | Teaching Video Diffusion Model with Latent Physical Phenomenon Knowledge | 利用潜在物理现象知识训练视频扩散模型 | Qinglong Cao, Ding Wang, Xirui Li, Yuntian Chen, Chao Ma, Xiaokang Yang | http://arxiv.org/pdf/2411.11343v1 | null |
2024-11-18 | TP-UNet: Temporal Prompt Guided UNet for Medical Image Segmentation | 时序提示引导的UNet用于医学图像分割 | Ranmin Wang, Limin Zhuang, Hongkun Chen, Boyan Xu, Ruichu Cai | http://arxiv.org/pdf/2411.11305v1 | null |
2024-11-18 | Zero-Shot Automatic Annotation and Instance Segmentation using LLM-Generated Datasets: Eliminating Field Imaging and Manual Annotation for Deep Learning Model Development | 基于LLM生成数据集的零样本自动标注与实例分割:免除田间成像和手动标注以开发深度学习模型 | Ranjan Sapkota, Achyut Paudel, Manoj Karkee | http://arxiv.org/pdf/2411.11285v1 | null |
2024-11-18 | V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion | V2X-R:基于去噪扩散的协同激光雷达-4D雷达融合三维目标检测 | Xun Huang, Jinlong Wang, Qiming Xia, Siheng Chen, Bisheng Yang, Xin Li, Cheng Wang, Chenglu Wen | http://arxiv.org/pdf/2411.08402v2 | link |
2024-11-18 | A Hybrid Approach for COVID-19 Detection: Combining Wasserstein GAN with Transfer Learning | 基于Wasserstein GAN和迁移学习的COVID-19检测混合方法 | Sumera Rounaq, Shahid Munir Shah, Mahmoud Aljawarneh | http://arxiv.org/pdf/2411.06397v2 | null |
2024-11-18 | Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure | 理解扩散模型的泛化性需要重新思考隐藏高斯结构 | Xiang Li, Yixiang Dai, Qing Qu | http://arxiv.org/pdf/2410.24060v3 | link |
2024-11-18 | CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense | CausalDiff:基于扩散模型的因果启发解耦对抗防御 | Mingkun Zhang, Keping Bi, Wei Chen, Quanrun Chen, Jiafeng Guo, Xueqi Cheng | http://arxiv.org/pdf/2410.23091v4 | link |
2024-11-18 | Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing | 基于视觉引导和掩码增强的提示式图像编辑自适应去噪 | Kejie Wang, Xuemeng Song, Meng Liu, Jin Yuan, Weili Guan | http://arxiv.org/pdf/2410.10496v2 | null |
2024-11-18 | MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning | MegaFusion:无需进一步调优,扩展扩散模型以实现更高分辨率图像生成 | Haoning Wu, Shaocheng Shen, Qiang Hu, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang | http://arxiv.org/pdf/2408.11001v3 | link |
2024-11-18 | An Open-Source Tool for Mapping War Destruction at Scale in Ukraine using Sentinel-1 Time Series | 乌克兰大规模战争破坏映射的开源工具:利用Sentinel-1时间序列 | Olivier Dietrich, Torben Peters, Vivien Sainte Fare Garnot, Valerie Sticher, Thao Ton-That Whelan, Konrad Schindler, Jan Dirk Wegner | http://arxiv.org/pdf/2406.02506v2 | null |
2024-11-18 | ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model | 艺术编织者:通过扩散模型实现高级动态风格集成 | Chengming Xu, Kai Hu, Qilin Wang, Donghao Luo, Jiangning Zhang, Xiaobin Hu, Yanwei Fu, Chengjie Wang | http://arxiv.org/pdf/2405.15287v2 | null |
2024-11-18 | Frame Interpolation with Consecutive Brownian Bridge Diffusion | 基于连续布朗桥扩散的帧插值 | Zonglin Lyu, Ming Li, Jianbo Jiao, Chen Chen | http://arxiv.org/pdf/2405.05953v6 | link |
2024-11-18 | ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback | ControlNet++:通过高效一致性反馈提升条件控制 | Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen | http://arxiv.org/pdf/2404.07987v3 | link |
2024-11-18 | SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model | SynArtifact:通过视觉-语言模型分类和缓解合成图像中的伪影 | Bin Cao, Jianhao Yuan, Yexin Liu, Jian Li, Shuyang Sun, Jing Liu, Bo Zhao | http://arxiv.org/pdf/2402.18068v3 | null |
2024-11-18 | MagicStick: Controllable Video Editing via Control Handle Transformations | MagicStick:通过控制手柄变换实现可控的视频编辑 | Yue Ma, Xiaodong Cun, Sen Liang, Jinbo Xing, Yingqing He, Chenyang Qi, Siran Chen, Qifeng Chen | http://arxiv.org/pdf/2312.03047v2 | link |
2024-11-18 | A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning | 深度学习领域超越持续学习的遗忘综合调查 | Zhenyi Wang, Enneng Yang, Li Shen, Heng Huang | http://arxiv.org/pdf/2307.09218v3 | link |
2024-11-18 | 3D microstructural generation from 2D images of cement paste using generative adversarial networks | 基于生成对抗网络的2D水泥浆图像到3D微观结构生成 | Xin Zhao, Lin Wang, Qinfei Li, Heng Chen, Shuangrong Liu, Pengkun Hou, Jiayuan Ye, Yan Pei, Xu Wu, Jianfeng Yuan, et al. | http://arxiv.org/pdf/2204.01645v3 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-18 | Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion | 边缘增强扩张残差注意力网络在多模态医学图像融合中的应用 | Meng Zhou, Yuxuan Zhang, Xiaolan Xu, Jiayi Wang, Farzad Khalvati | http://arxiv.org/pdf/2411.11799v1 | null |
2024-11-18 | Dissecting Misalignment of Multimodal Large Language Models via Influence Function | 通过影响函数剖析多模态大型语言模型的偏差 | Lijie Hu, Chenyang Ren, Huanyi Xie, Khouloud Saadi, Shu Yang, Jingfeng Zhang, Di Wang | http://arxiv.org/pdf/2411.11667v1 | null |
2024-11-18 | The ADUULM-360 Dataset -- A Multi-Modal Dataset for Depth Estimation in Adverse Weather | ADUULM-360数据集 —— 用于恶劣天气下深度估计的多模态数据集 | Markus Schön, Jona Ruof, Thomas Wodtko, Michael Buchholz, Klaus Dietmayer | http://arxiv.org/pdf/2411.11455v1 | null |
2024-11-18 | GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts | GLDesigner:利用多模态大型语言模型作为设计师以增强美学文本符号布局 | Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Chenyang Li, Hanyuan Chen, Jin-Peng Lan, Bin Luo, Yifeng Geng | http://arxiv.org/pdf/2411.11435v1 | null |
2024-11-18 | TL-CLIP: A Power-specific Multimodal Pre-trained Visual Foundation Model for Transmission Line Defect Recognition | TL-CLIP:一种针对电力系统的多模态预训练视觉基础模型用于输电线路缺陷识别 | Ke Zhang, Zhaoye Zheng, Yurong Guo, Jiacun Wang, Jiyuan Yang, Yangjie Xiao | http://arxiv.org/pdf/2411.11370v1 | null |
2024-11-18 | MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models | MAIRA-Seg:利用分割感知多模态大型语言模型提升放射学报告生成 | Harshita Sharma, Valentina Salvatelli, Shaury Srivastav, Kenza Bouzid, Shruthi Bannur, Daniel C. Castro, Maximilian Ilse, Sam Bond-Taylor, Mercy Prasanna Ranjit, Fabian Falck, et al. | http://arxiv.org/pdf/2411.11362v1 | null |
2024-11-18 | CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset | CCExpert:基于差异感知整合与基础数据集,提升遥感变化字幕中的多模态语言模型能力 | Zhiming Wang, Mingze Wang, Sheng Xu, Yanjing Li, Baochang Zhang | http://arxiv.org/pdf/2411.11360v1 | null |
2024-11-18 | Towards Open-Vocabulary Audio-Visual Event Localization | 面向开放词汇的视听事件定位 | Jinxing Zhou, Dan Guo, Ruohao Guo, Yuxin Mao, Jingjing Hu, Yiran Zhong, Xiaojun Chang, Meng Wang | http://arxiv.org/pdf/2411.11278v1 | null |
2024-11-18 | Efficient Transfer Learning for Video-language Foundation Models | 高效的视频-语言基础模型迁移学习 | Haoxing Chen, Zizheng Huang, Yan Hong, Yanshuo Wang, Zhongcai Lyu, Zhuoer Xu, Jun Lan, Zhangxuan Gu | http://arxiv.org/pdf/2411.11223v1 | null |
2024-11-18 | MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation | MedCLIP-SAMv2:迈向通用文本驱动医学图像分割 | Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao | http://arxiv.org/pdf/2409.19483v3 | link |
2024-11-18 | Multi-modal Situated Reasoning in 3D Scenes | 三维场景中的多模态情境推理 | Xiongkun Linghu, Jiangyong Huang, Xuesong Niu, Xiaojian Ma, Baoxiong Jia, Siyuan Huang | http://arxiv.org/pdf/2409.02389v2 | null |
2024-11-18 | MatchTime: Towards Automatic Soccer Game Commentary Generation | 迈向自动足球比赛解说生成 | Jiayuan Rao, Haoning Wu, Chang Liu, Yanfeng Wang, Weidi Xie | http://arxiv.org/pdf/2406.18530v2 | link |
2024-11-18 | Grounded 3D-LLM with Referent Tokens | 基于参照标记的 grounded 3D-LLM | Yilun Chen, Shuai Yang, Haifeng Huang, Tai Wang, Runsen Xu, Ruiyuan Lyu, Dahua Lin, Jiangmiao Pang | http://arxiv.org/pdf/2405.10370v2 | link |
2024-11-18 | PhD: A ChatGPT-Prompted Visual hallucination Evaluation Dataset | PhD:基于ChatGPT提示的视觉幻觉评估数据集 | Jiazhen Liu, Yuhan Fu, Ruobing Xie, Runquan Xie, Xingwu Sun, Fengzong Lian, Zhanhui Kang, Xirong Li | http://arxiv.org/pdf/2403.11116v3 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-18 | Towards Degradation-Robust Reconstruction in Generalizable NeRF | 面向可泛化NeRF的退化鲁棒重建 | Chan Ho Park, Ka Leong Cheng, Zhicheng Wang, Qifeng Chen | http://arxiv.org/pdf/2411.11691v1 | null |
2024-11-18 | LeC$^2$O-NeRF: Learning Continuous and Compact Large-Scale Occupancy for Urban Scenes | LeC$^2$O-NeRF:学习连续且紧凑的大规模城市场景占用 | Zhenxing Mi, Dan Xu | http://arxiv.org/pdf/2411.11374v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-18 | RoboGSim: A Real2Sim2Real Robotic Gaussian Splatting Simulator | RoboGSim:真实到仿真到真实的机器人高斯泼溅模拟器 | Xinhai Li, Jialin Li, Ziheng Zhang, Rui Zhang, Fan Jia, Tiancai Wang, Haoqiang Fan, Kuo-Kun Tseng, Ruiping Wang | http://arxiv.org/pdf/2411.11839v1 | null |
2024-11-18 | GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views | GPS-Gaussian+:面向稀疏视图实时人物-场景渲染的可泛化像素级3D高斯泼溅 | Boyao Zhou, Shunyuan Zheng, Hanzhang Tu, Ruizhi Shao, Boning Liu, Shengping Zhang, Liqiang Nie, Yebin Liu | http://arxiv.org/pdf/2411.11363v1 | null |
2024-11-18 | BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis | BrightDreamer:快速文本到3D合成通用的3D高斯生成框架 | Lutao Jiang, Xu Zheng, Yuanhuiyi Lyu, Jiazhou Zhou, Lin Wang | http://arxiv.org/pdf/2403.11273v2 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-18 | FERT: Real-Time Facial Expression Recognition with Short-Range FMCW Radar | FERT:基于短距离FMCW雷达的实时面部表情识别 | Sabri Mustafa Kahya, Muhammet Sami Yavuz, Eckehard Steinbach | http://arxiv.org/pdf/2411.11619v1 | null |
2024-11-18 | Color-Oriented Redundancy Reduction in Dataset Distillation | 面向颜色特征的数据集蒸馏冗余降低 | Bowen Yuan, Zijian Wang, Mahsa Baktashmotlagh, Yadan Luo, Zi Huang | http://arxiv.org/pdf/2411.11329v1 | null |
2024-11-18 | GazeGen: Gaze-Driven User Interaction for Visual Content Generation | GazeGen:视觉内容生成中的注视驱动用户交互 | He-Yen Hsieh, Ziyun Li, Sai Qian Zhang, Wei-Te Mark Ting, Kao-Den Chang, Barbara De Salvo, Chiao Liu, H. T. Kung | http://arxiv.org/pdf/2411.04335v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-18 | LightFFDNets: Lightweight Convolutional Neural Networks for Rapid Facial Forgery Detection | 轻量级卷积神经网络在快速人脸伪造检测中的应用 | Günel Jabbarlı, Murat Kurt | http://arxiv.org/pdf/2411.11826v1 | null |
2024-11-18 | WoodYOLO: A Novel Object Detector for Wood Species Detection in Microscopic Images | 基于木种识别的微观图像新型目标检测器:WoodYOLO | Lars Nieradzik, Henrike Stephani, Jördis Sieburg-Rockel, Stephanie Helmling, Andrea Olbrich, Stephanie Wrage, Janis Keuper | http://arxiv.org/pdf/2411.11738v1 | null |
2024-11-18 | From Spectra to Geography: Intelligent Mapping of RRUFF Mineral Data | 从光谱到地理:RRUFF矿物数据的智能映射 | Francesco Pappone, Federico Califano, Marco Tafani | http://arxiv.org/pdf/2411.11693v1 | null |
2024-11-18 | Real-Time Fitness Exercise Classification and Counting from Video Frames | 实时从视频帧中分类和计数健身运动 | Riccardo Riccio | http://arxiv.org/pdf/2411.11548v1 | null |
2024-11-18 | Reliable Poisoned Sample Detection against Backdoor Attacks Enhanced by Sharpness Aware Minimization | 基于Sharpness-Aware Minimization的针对后门攻击的可靠中毒样本检测 | Mingda Zhang, Mingli Zhu, Zihao Zhu, Baoyuan Wu | http://arxiv.org/pdf/2411.11525v1 | null |
2024-11-18 | Learning a Neural Association Network for Self-supervised Multi-Object Tracking | 学习用于自监督多目标跟踪的神经关联网络 | Shuai Li, Michael Burke, Subramanian Ramamoorthy, Juergen Gall | http://arxiv.org/pdf/2411.11514v1 | null |
2024-11-18 | Exploring Emerging Trends and Research Opportunities in Visual Place Recognition | 探索视觉场所识别中的新兴趋势和研究机遇 | Antonios Gasteratos, Konstantinos A. Tsintotas, Tobias Fischer, Yiannis Aloimonos, Michael Milford | http://arxiv.org/pdf/2411.11481v1 | null |
2024-11-18 | SL-YOLO: A Stronger and Lighter Drone Target Detection Model | SL-YOLO:更强大更轻量级的无人机目标检测模型 | Defan Chen, Luchan Zhang | http://arxiv.org/pdf/2411.11477v1 | null |
2024-11-18 | MGNiceNet: Unified Monocular Geometric Scene Understanding | MGNiceNet:统一单目几何场景理解 | Markus Schön, Michael Buchholz, Klaus Dietmayer | http://arxiv.org/pdf/2411.11466v1 | null |
2024-11-18 | IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos | 宜家手册实战:在互联网视频上对组装说明进行4D定位 | Yunong Liu, Cristobal Eyzaguirre, Manling Li, Shubh Khanna, Juan Carlos Niebles, Vineeth Ravi, Saumitra Mishra, Weiyu Liu, Jiajun Wu | http://arxiv.org/pdf/2411.11409v1 | null |
2024-11-18 | Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection | 堆砌砖块式:增量人脸伪造检测的对齐特征隔离 | Jikang Cheng, Zhiyuan Yan, Ying Zhang, Li Hao, Jiaxin Ai, Qin Zou, Chen Li, Zhongyuan Wang | http://arxiv.org/pdf/2411.11396v1 | null |
2024-11-18 | Lung Disease Detection with Vision Transformers: A Comparative Study of Machine Learning Methods | 基于视觉 Transformer 的肺部疾病检测:机器学习方法的比较研究 | Baljinnyam Dayan | http://arxiv.org/pdf/2411.11376v1 | null |
2024-11-18 | A comprehensive survey of oracle character recognition: challenges, benchmarks, and beyond | 甲骨文字符识别全面综述:挑战、基准及未来 | Jing Li, Xueke Chi, Qiufeng Wang, Dahan Wang, Kaizhu Huang, Yongge Liu, Cheng-lin Liu | http://arxiv.org/pdf/2411.11354v1 | null |
2024-11-18 | Video-to-Task Learning via Motion-Guided Attention for Few-Shot Action Recognition | 基于运动引导注意力机制的少样本动作识别视频到任务学习 | Hanyu Guo, Wanchuan Yu, Suzhou Que, Kaiwen Du, Yan Yan, Hanzi Wang | http://arxiv.org/pdf/2411.11335v1 | null |
2024-11-18 | Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition | Neuron:学习上下文感知演化表示以实现零样本骨骼动作识别 | Yang Chen, Jingcai Guo, Song Guo, Dacheng Tao | http://arxiv.org/pdf/2411.11288v1 | null |
2024-11-18 | Reducing Label Dependency for Underwater Scene Understanding: A Survey of Datasets, Techniques and Applications | 水下场景理解中降低标签依赖性:数据集、技术和应用综述 | Scarlett Raine, Frederic Maire, Niko Suenderhauf, Tobias Fischer | http://arxiv.org/pdf/2411.11287v1 | null |
2024-11-18 | Cross-Patient Pseudo Bags Generation and Curriculum Contrastive Learning for Imbalanced Multiclassification of Whole Slide Image | 跨患者伪袋生成与课程对比学习,用于全切片图像不平衡多分类 | Yonghuang Wu, Xuan Xie, Xinyuan Niu, Chengqian Zhao, Jinhua Yu | http://arxiv.org/pdf/2411.11262v1 | null |
2024-11-18 | Semantic or Covariate? A Study on the Intractable Case of Out-of-Distribution Detection | 语义还是协变量?分布外检测难解情形研究 | Xingming Long, Jie Zhang, Shiguang Shan, Xilin Chen | http://arxiv.org/pdf/2411.11254v1 | null |
2024-11-18 | Noise Filtering Benchmark for Neuromorphic Satellites Observations | 神经形态卫星观测噪声滤波基准 | Sami Arja, Alexandre Marcireau, Nicholas Owen Ralph, Saeed Afshar, Gregory Cohen | http://arxiv.org/pdf/2411.11233v1 | null |
2024-11-18 | The Sound of Water: Inferring Physical Properties from Pouring Liquids | 水之声音:从倒液体中推断物理性质 | Piyush Bagad, Makarand Tapaswi, Cees G. M. Snoek, Andrew Zisserman | http://arxiv.org/pdf/2411.11222v1 | null |
2024-11-18 | Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition | 关系对比学习与遮挡图像建模用于场景文本识别 | Tiancheng Lin, Jinglei Zhang, Yi Xu, Kai Chen, Rui Zhang, Chang-Wen Chen | http://arxiv.org/pdf/2411.11219v1 | null |
2024-11-18 | Masked Autoencoders are Parameter-Efficient Federated Continual Learners | 掩码自编码器:参数高效联邦持续学习者 | Yuchen He, Xiangfeng Wang | http://arxiv.org/pdf/2411.01916v2 | link |
2024-11-18 | Task Adaptive Feature Distribution Based Network for Few-shot Fine-grained Target Classification | 基于任务自适应特征分布网络的少量样本细粒度目标分类 | Ping Li, Hongbo Wang, Lei Lu | http://arxiv.org/pdf/2410.09797v2 | null |
2024-11-18 | CerviXpert: A Multi-Structural Convolutional Neural Network for Predicting Cervix Type and Cervical Cell Abnormalities | CerviXpert:一种用于预测宫颈类型和宫颈细胞异常的多结构卷积神经网络 | Rashik Shahriar Akash, Radiful Islam, S. M. Saiful Islam Badhon, K. S. M. Tozammel Hossain | http://arxiv.org/pdf/2409.06220v2 | null |
2024-11-18 | MagicFace: Training-free Universal-Style Human Image Customized Synthesis | 无训练通用风格人像定制合成:MagicFace | Yibin Wang, Weizhong Zhang, Cheng Jin | http://arxiv.org/pdf/2408.07433v5 | null |
2024-11-18 | MIST: A Simple and Scalable End-To-End 3D Medical Imaging Segmentation Framework | MIST:一种简单且可扩展的端到端3D医学影像分割框架 | Adrian Celaya, Evan Lim, Rachel Glenn, Brayden Mi, Alex Balsells, Dawid Schellingerhout, Tucker Netherton, Caroline Chung, Beatrice Riviere, David Fuentes | http://arxiv.org/pdf/2407.21343v2 | link |
2024-11-18 | Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion | 无约束开放词汇图像分类:通过CLIP反演从文本到图像的零样本迁移学习 | Philipp Allgeuer, Kyra Ahrens, Stefan Wermter | http://arxiv.org/pdf/2407.11211v3 | null |
2024-11-18 | Formal Verification of Deep Neural Networks for Object Detection | 用于目标检测的深度神经网络的形式化验证 | Yizhak Y. Elboher, Avraham Raviv, Yael Leibovich Weiss, Omer Cohen, Roy Assa, Guy Katz, Hillel Kugler | http://arxiv.org/pdf/2407.01295v5 | null |
2024-11-18 | MV2Cyl: Reconstructing 3D Extrusion Cylinders from Multi-View Images | MV2Cyl:从多视角图像重建3D挤出圆柱 | Eunji Hong, Minh Hieu Nguyen, Mikaela Angelina Uy, Minhyuk Sung | http://arxiv.org/pdf/2406.10853v3 | null |
2024-11-18 | Searching for internal symbols underlying deep learning | 寻找深度学习背后的内部符号 | Jung H. Lee, Sujith Vijayan | http://arxiv.org/pdf/2405.20605v2 | null |
2024-11-18 | Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds | Eidos:高效、难以察觉的对抗性3D点云 | Hanwei Zhang, Luo Cheng, Qisong He, Wei Huang, Renjue Li, Ronan Sicre, Xiaowei Huang, Holger Hermanns, Lijun Zhang | http://arxiv.org/pdf/2405.14210v2 | null |
2024-11-18 | Machine Vision-Based Assessment of Fall Color Changes and its Relationship with Leaf Nitrogen Concentration | 基于机器视觉的秋季叶色变化评估及其与叶片氮浓度的关系 | Achyut Paudel, Jostan Brown, Priyanka Upadhyaya, Atif Bilal Asad, Safal Kshetri, Joseph R. Davidson, Cindy Grimm, Ashley Thompson, Bernardita Sallato, Matthew D. Whiting, et al. | http://arxiv.org/pdf/2404.14653v3 | null |
2024-11-18 | Watermark-based Detection and Attribution of AI-Generated Content | 基于水印的AI生成内容检测与归因 | Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Neil Zhenqiang Gong | http://arxiv.org/pdf/2404.04254v2 | null |
2024-11-18 | Structural-Based Uncertainty in Deep Learning Across Anatomical Scales: Analysis in White Matter Lesion Segmentation | 基于结构的不确定性在深度学习中跨解剖尺度:白质病变分割分析 | Nataliia Molchanova, Vatsal Raina, Andrey Malinin, Francesco La Rosa, Adrien Depeursinge, Mark Gales, Cristina Granziera, Henning Muller, Mara Graziani, Meritxell Bach Cuadra | http://arxiv.org/pdf/2311.08931v3 | link |
2024-11-18 | Learning to mask: Towards generalized face forgery detection | 学习进行遮挡:迈向通用的面部伪造检测 | Jianwei Fei, Yunshu Dai, Huaming Wang, Zhihua Xia | http://arxiv.org/pdf/2212.14309v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-18 | Scalable Autoregressive Monocular Depth Estimation | 可扩展的自回归单目深度估计 | Jinhong Wang, Jian Liu, Dongqi Tang, Weiqiang Wang, Wentong Li, Danny Chen, Jintai Chen, Jian Wu | http://arxiv.org/pdf/2411.11361v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-18 | Generative World Explorer | 生成式世界探索者 | Taiming Lu, Tianmin Shu, Alan Yuille, Daniel Khashabi, Jieneng Chen | http://arxiv.org/pdf/2411.11844v1 | null |
2024-11-18 | Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods | 探索JPEG AI的对抗鲁棒性:方法、比较与新方法 | Egor Kovalev, Georgii Bychkov, Khaled Abud, Aleksandr Gushchin, Anna Chistyakova, Sergey Lavrushkin, Dmitriy Vatolin, Anastasia Antsiferova | http://arxiv.org/pdf/2411.11795v1 | null |
2024-11-18 | Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment | 通过渐进式概念-瓶颈驱动对齐提升视觉-语言模型安全性 | Zhendong Liu, Yuanbi Nie, Yingshui Tan, Xiangyu Yue, Qiushi Cui, Chongjun Wang, Xiaoyong Zhu, Bo Zheng | http://arxiv.org/pdf/2411.11543v1 | null |
2024-11-18 | Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment | 精细粒度验证器:视觉-语言对齐中的偏好建模作为下一个标记预测 | Chenhang Cui, An Zhang, Yiyang Zhou, Zhaorun Chen, Gelei Deng, Huaxiu Yao, Tat-Seng Chua | http://arxiv.org/pdf/2410.14148v2 | null |
2024-11-18 | Utilizing Large Language Models in an iterative paradigm with domain feedback for molecule optimization | 利用具有领域反馈的迭代范式中的大型语言模型进行分子优化 | Khiem Le, Nitesh V. Chawla | http://arxiv.org/pdf/2410.13147v6 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-18 | Relevance-guided Audio Visual Fusion for Video Saliency Prediction | 基于相关性的视频显著性预测中的音视频融合 | Li Yu, Xuanzhe Sun, Pan Gao, Moncef Gabbouj | http://arxiv.org/pdf/2411.11454v1 | null |
2024-11-18 | Superpixel-informed Implicit Neural Representation for Multi-Dimensional Data | 基于超像素信息的多维数据隐式神经网络表示 | Jiayi Li, Xile Zhao, Jianli Wang, Chao Wang, Min Wang | http://arxiv.org/pdf/2411.11356v1 | null |
2024-11-18 | BeautyBank: Encoding Facial Makeup in Latent Space | 美丽银行:面部妆容在潜在空间的编码 | Qianwen Lu, Xingchao Yang, Takafumi Taketomi | http://arxiv.org/pdf/2411.11231v1 | null |
2024-11-18 | DeforHMR: Vision Transformer with Deformable Cross-Attention for 3D Human Mesh Recovery | DeforHMR:用于3D人体网格恢复的可变形交叉注意力视觉Transformer | Jaewoo Heo, George Hu, Zeyu Wang, Serena Yeung-Levy | http://arxiv.org/pdf/2411.11214v1 | null |
2024-11-18 | Rendering-Oriented 3D Point Cloud Attribute Compression using Sparse Tensor-based Transformer | 基于稀疏张量变换的面向渲染的3D点云属性压缩 | Xiao Huo, Junhui Hou, Shuai Wan, Fuzheng Yang | http://arxiv.org/pdf/2411.07899v2 | null |
2024-11-18 | Activating Self-Attention for Multi-Scene Absolute Pose Regression | 激活自注意力实现多场景绝对姿态回归 | Miso Lee, Jihwan Kim, Jae-Pil Heo | http://arxiv.org/pdf/2411.01443v2 | link |
2024-11-18 | DemMamba: Alignment-free Raw Video Demoireing with Frequency-assisted Spatio-Temporal Mamba | DemMamba:基于频率辅助时空Mamba的免对齐RAW视频去摩尔纹 | Shuning Xu, Xina Liu, Binbin Song, Xiangyu Chen, Qiubo Chen, Jiantao Zhou | http://arxiv.org/pdf/2408.10679v2 | null |
2024-11-18 | DreamText: High Fidelity Scene Text Synthesis | 梦文:高保真场景文本合成 | Yibin Wang, Weizhong Zhang, Cheng Jin | http://arxiv.org/pdf/2405.14701v3 | link |
2024-11-18 | Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild | Chameleon:面向真实场景密集视觉预测的数据高效通用模型 | Donggyun Kim, Seongwoong Cho, Semin Kim, Chong Luo, Seunghoon Hong | http://arxiv.org/pdf/2404.18459v2 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-18 | UniHands: Unifying Various Wild-Collected Keypoints for Personalized Hand Reconstruction | UniHands:统一各种野采集关键点以实现个性化手部重建 | Menghe Zhang, Joonyeoup Kim, Yangwen Liang, Shuangquan Wang, Kee-Bong Song | http://arxiv.org/pdf/2411.11845v1 | null |
2024-11-18 | RAWMamba: Unified sRGB-to-RAW De-rendering With State Space Model | RAWMamba:基于状态空间模型的统一sRGB到RAW去渲染 | Hongjun Chen, Wencheng Han, Huan Zheng, Jianbing Shen | http://arxiv.org/pdf/2411.11717v1 | null |
2024-11-18 | Leveraging Computational Pathology AI for Noninvasive Optical Imaging Analysis Without Retraining | 利用计算病理学AI进行无需重新训练的非侵入性光学成像分析 | Danny Barash, Emilie Manning, Aidan Van Vleck, Omri Hirsch, Kyi Lei Aye, Jingxi Li, Philip O. Scumpia, Aydogan Ozcan, Sumaira Aasi, Kerri E. Rieger, et al. | http://arxiv.org/pdf/2411.11613v1 | null |
2024-11-18 | A Review of Digital Pixel Sensors | 数字像素传感器综述 | Md Rahatul Islam Udoy, Shamiul Alam, Md Mazharul Islam, Akhilesh Jaiswal, Ahmedullah Aziz | http://arxiv.org/pdf/2402.04507v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-18 | Text-guided Zero-Shot Object Localization | 基于文本的零样本目标定位 | Jingjing Wang, Xinglin Piao, Zongzhi Gao, Bo Li, Yong Zhang, Baocai Yin | http://arxiv.org/pdf/2411.11357v1 | null |
2024-11-18 | Visual-Semantic Graph Matching Net for Zero-Shot Learning | 基于视觉语义图匹配网络的零样本学习 | Bowen Duan, Shiming Chen, Yufei Guo, Guo-Sen Xie, Weiping Ding, Yisong Wang | http://arxiv.org/pdf/2411.11351v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-18 | Equivariant spatio-hemispherical networks for diffusion MRI deconvolution | 等变空间半球网络在扩散磁共振成像去卷积中的应用 | Axel Elaldi, Guido Gerig, Neel Dey | http://arxiv.org/pdf/2411.11819v1 | null |
2024-11-18 | Revitalizing Electoral Trust: Enhancing Transparency and Efficiency through Automated Voter Counting with Machine Learning | 通过机器学习自动选民计票提升选举信任度:增强透明性和效率 | Mir Faris, Syeda Aynul Karim, Md. Juniadul Islam | http://arxiv.org/pdf/2411.11740v1 | null |
2024-11-18 | MC-LLaVA: Multi-Concept Personalized Vision-Language Model | MC-LLaVA:多概念个性化视觉-语言模型 | Ruichuan An, Sihan Yang, Ming Lu, Kai Zeng, Yulin Luo, Ying Chen, Jiajun Cao, Hao Liang, Qi She, Shanghang Zhang, et al. | http://arxiv.org/pdf/2411.11706v1 | null |
2024-11-18 | MSSIDD: A Benchmark for Multi-Sensor Denoising | 多传感器去噪基准:MSSIDD | Shibin Mei, Hang Wang, Bingbing Ni | http://arxiv.org/pdf/2411.11562v1 | null |
2024-11-18 | SignEye: Traffic Sign Interpretation from Vehicle First-Person View | SignEye:车辆第一视角的交通标志识别 | Chuang Yang, Xu Han, Tao Han, Yuejiao SU, Junyu Gao, Hongyuan Zhang, Yi Wang, Lap-Pui Chau | http://arxiv.org/pdf/2411.11507v1 | null |
2024-11-18 | Generalizable Person Re-identification via Balancing Alignment and Uniformity | 可泛化行人重识别:平衡对齐性与均匀性 | Yoonki Cho, Jaeyoon Kim, Woo Jae Kim, Junsik Jung, Sung-eui Yoon | http://arxiv.org/pdf/2411.11471v1 | null |
2024-11-18 | Towards fast DBSCAN via Spectrum-Preserving Data Compression | 基于频谱保留数据压缩的快速DBSCAN算法 | Yongyu Wang | http://arxiv.org/pdf/2411.11421v1 | null |
2024-11-18 | Performance Evaluation of Geospatial Images based on Zarr and Tiff | 基于Zarr和Tiff的地理空间图像性能评估 | Jaheer Khan, Swarup E, Rakshit Ramesh | http://arxiv.org/pdf/2411.11291v1 | null |
2024-11-18 | Continuous K-space Recovery Network with Image Guidance for Fast MRI Reconstruction | 基于图像引导的连续K空间恢复网络实现快速MRI重建 | Yucong Meng, Zhiwei Yang, Minghong Duan, Yonghong Shi, Zhijian Song | http://arxiv.org/pdf/2411.11282v1 | null |
2024-11-18 | DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation | DrivingSphere:构建高保真4D世界用于闭环仿真 | Tianyi Yan, Dongming Wu, Wencheng Han, Junpeng Jiang, Xia Zhou, Kun Zhan, Cheng-zhong Xu, Jianbing Shen | http://arxiv.org/pdf/2411.11252v1 | null |
2024-11-18 | Partial Scene Text Retrieval | 部分场景文本检索 | Hao Wang, Minghui Liao, Zhouyi Xie, Wenyu Liu, Xiang Bai | http://arxiv.org/pdf/2411.10261v2 | link |
2024-11-18 | ObjectNLQ @ Ego4D Episodic Memory Challenge 2024 | ObjectNLQ在Ego4D情景记忆挑战2024 | Yisen Feng, Haoyu Zhang, Yuquan Xie, Zaijing Li, Meng Liu, Liqiang Nie | http://arxiv.org/pdf/2406.15778v2 | link |
2024-11-18 | A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting | CAC配方:基于mosaic的通用损失函数以提升类无关计数 | Tsung-Han Chou, Brian Wang, Wei-Chen Chiu, Jun-Cheng Chen | http://arxiv.org/pdf/2404.09826v2 | null |
2024-11-18 | Image Demoireing in RAW and sRGB Domains | RAW与sRGB域中的图像去摩尔纹 | Shuning Xu, Binbin Song, Xiangyu Chen, Xina Liu, Jiantao Zhou | http://arxiv.org/pdf/2312.09063v3 | null |
2024-11-18 | A Scalable Training Strategy for Blind Multi-Distribution Noise Removal | 面向盲多分布噪声去除的可扩展训练策略 | Kevin Zhang, Sakshum Kulshrestha, Christopher Metzler | http://arxiv.org/pdf/2310.20064v2 | null |
2024-11-18 | Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog | 揭示隐藏关联:基于视频的对话的迭代搜索与推理 | Haoyu Zhang, Meng Liu, Yaowei Wang, Da Cao, Weili Guan, Liqiang Nie | http://arxiv.org/pdf/2310.07259v3 | link |
2024-11-18 | Unmasking Parkinson's Disease with Smile: An AI-enabled Screening Framework | 利用微笑揭示帕金森病:一款AI赋能的筛查框架 | Tariq Adnan, Md Saiful Islam, Wasifur Rahman, Sangwu Lee, Sutapa Dey Tithi, Kazi Noshin, Imran Sarker, M Saifur Rahman, Ehsan Hoque | http://arxiv.org/pdf/2308.02588v2 | null |