2024-10-16.md

[UPDATED!] 2024-10-16 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|:---|:---|:---|:---|:---|:---|
| 2024-10-16 | Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts | 扩散模型上的元遗忘学习:防止重新学习已遗忘的概念 | Hongcheng Gao, Tianyu Pang, Chao Du, Taihang Hu, Zhijie Deng, Min Lin | http://arxiv.org/pdf/2410.12777v1 | null |
| 2024-10-16 | SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation | SAFREE:无训练自适应保护机制,保障安全文本到图像与视频生成的安全性 | Jaehong Yoon, Shoubin Yu, Vaidehi Patil, Huaxiu Yao, Mohit Bansal | http://arxiv.org/pdf/2410.12761v1 | null |
| 2024-10-16 | Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization | 嵌入道德思维:通过轻量级价值优化对文本到图像合成进行校准 | Xingqi Wang, Xiaoyuan Yi, Xing Xie, Jia Jia | http://arxiv.org/pdf/2410.12700v1 | null |
| 2024-10-16 | AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing | AdaptiveDrag:基于扩散的图像编辑中的语义驱动拖拽 | DuoSheng Chen, Binghui Chen, Yifeng Geng, Liefeng Bo | http://arxiv.org/pdf/2410.12696v1 | null |
| 2024-10-16 | One Step Diffusion via Shortcut Models | 一步扩散:通过捷径模型的快速实现 | Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel | http://arxiv.org/pdf/2410.12557v1 | null |
| 2024-10-16 | Shaping a Stabilized Video by Mitigating Unintended Changes for Concept-Augmented Video Editing | 概念增强型视频编辑中通过减轻非预期变化塑造稳定视频的方法研究 | Mingce Guo, Jingxuan He, Shengeng Tang, Zhangye Wang, Lechao Cheng | http://arxiv.org/pdf/2410.12526v1 | null |
| 2024-10-16 | DH-VTON: Deep Text-Driven Virtual Try-On via Hybrid Attention Learning | DH-VTON:基于混合注意力学习的深度文本驱动虚拟试衣技术 | Jiabao Wei, Zhiyuan Ma | http://arxiv.org/pdf/2410.12501v1 | null |
| 2024-10-16 | Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective | 稳定图像自回归建模的潜在空间:一种统一视角 | Yongxin Zhu, Bocheng Li, Hang Zhang, Xin Li, Linli Xu, Lidong Bing | http://arxiv.org/pdf/2410.12490v1 | null |
| 2024-10-16 | Synthetic Augmentation for Anatomical Landmark Localization using DDPMs | 合成增强用于基于DDPMs的解剖学地标定位 | Arnela Hadzic, Lea Bogensperger, Simon Johannes Joham, Martin Urschler | http://arxiv.org/pdf/2410.12489v1 | null |
| 2024-10-16 | GAN Based Top-Down View Synthesis in Reinforcement Learning Environments | 基于GAN的强化学习环境中的顶部视图合成 | Usama Younus, Vinoj Jayasundara, Shivam Mishra, Suleyman Aslan | http://arxiv.org/pdf/2410.12372v1 | null |
| 2024-10-16 | Improved Anomaly Detection through Conditional Latent Space VAE Ensembles | 条件潜在空间VAE集成在异常检测中的改进方法 | Oskar Åström, Alexandros Sopasakis | http://arxiv.org/pdf/2410.12328v1 | null |
| 2024-10-16 | Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond | 分解融合:一种自监督图像融合及扩展方法 | Pengwei Liang, Junjun Jiang, Qing Ma, Xianming Liu, Jiayi Ma | http://arxiv.org/pdf/2410.12274v1 | null |
| 2024-10-16 | DaDiff: Domain-aware Diffusion Model for Nighttime UAV Tracking | DaDiff:针对夜间无人机追踪的领域感知扩散模型 | Haobo Zuo, Changhong Fu, Guangze Zheng, Liangliang Yao, Kunhan Lu, Jia Pan | http://arxiv.org/pdf/2410.12270v1 | null |
| 2024-10-16 | Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices | 高效扩散模型:从原理到实践的全面综述 | Zhiyuan Ma, Yuzhu Zhang, Guoli Jia, Liangliang Zhao, Yichao Ma, Mingjie Ma, Gaofeng Liu, Kaiyan Zhang, Jianjun Li, Bowen Zhou | http://arxiv.org/pdf/2410.11795v2 | null |
| 2024-10-16 | Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | Lotus:基于扩散的高质量密集预测视觉基础模型 | Jing He, Haodong Li, Wei Yin, Yixun Liang, Leheng Li, Kaiqiang Zhou, Hongbo Zhang, Bingbing Liu, Ying-Cong Chen | http://arxiv.org/pdf/2409.18124v3 | null |
| 2024-10-16 | Sample what you cant compress | 对无法压缩的内容进行采样 | Vighnesh Birodkar, Gabriel Barcik, James Lyon, Sergey Ioffe, David Minnen, Joshua V. Dillon | http://arxiv.org/pdf/2409.02529v3 | null |
| 2024-10-16 | A3D: Does Diffusion Dream about 3D Alignment? | A3D:扩散模型是否梦寐以求三维对齐? | Savva Ignatyev, Nina Konovalova, Daniil Selikhanovych, Oleg Voynov, Nikolay Patakin, Ilya Olkov, Dmitry Senushkin, Alexey Artemov, Anton Konushin, Alexander Filippov, et al. | http://arxiv.org/pdf/2406.15020v3 | null |
| 2024-10-16 | Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians | Mini-Splatting:使用受限数量的高斯函数表示场景 | Guangchi Fang, Bing Wang | http://arxiv.org/pdf/2403.14166v3 | link |
| 2024-10-16 | AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data | AnimateLCM:无需个性化视频数据的计算高效个性化风格视频生成 | Fu-Yun Wang, Zhaoyang Huang, Weikang Bian, Xiaoyu Shi, Keqiang Sun, Guanglu Song, Yu Liu, Hongsheng Li | http://arxiv.org/pdf/2402.00769v3 | link |
| 2024-10-16 | Generative Models: What Do They Know? Do They Know Things? Let's Find Out! | 生成模型:它们知道什么?它们是否了解事物?让我们一探究竟! | Xiaodan Du, Nicholas Kolkin, Greg Shakhnarovich, Anand Bhattad | http://arxiv.org/pdf/2311.17137v3 | null |
| 2024-10-16 | Reverse Stable Diffusion: What prompt was used to generate this image? | 逆稳定扩散:生成这张图片使用了什么提示? | Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah | http://arxiv.org/pdf/2308.01472v2 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|:---|:---|:---|:---|:---|:---|
| 2024-10-16 | Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models | 视觉语言模型的测试时泛化:双重原型演化方法 | Ce Zhang, Simon Stepputtis, Katia Sycara, Yaqi Xie | http://arxiv.org/pdf/2410.12790v1 | null |
| 2024-10-16 | The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio | 多模态之咒:评估大型多模态模型在语言、视觉和音频模态下的幻觉生成 | Sicong Leng, Yun Xing, Zesen Cheng, Yang Zhou, Hang Zhang, Xin Li, Deli Zhao, Shijian Lu, Chunyan Miao, Lidong Bing | http://arxiv.org/pdf/2410.12787v1 | null |
| 2024-10-16 | WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines | 全球美食:大规模多语言多文化全球美食视觉问答基准 | Genta Indra Winata, Frederikus Hudi, Patrick Amadeus Irawan, David Anugraha, Rifki Afina Putri, Yutong Wang, Adam Nohejl, Ubaidillah Ariq Prathama, Nedjma Ousidhoum, Afifa Amriani, et al. | http://arxiv.org/pdf/2410.12705v1 | null |
| 2024-10-16 | VividMed: Vision Language Model with Versatile Visual Grounding for Medicine | VividMed:面向医学的具有多能视觉接地能力的视觉语言模型 | Lingxiao Luo, Bingda Tang, Xuanzhong Chen, Rong Han, Ting Chen | http://arxiv.org/pdf/2410.12694v1 | null |
| 2024-10-16 | DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception | DocLayout-YOLO:通过多样化合成数据与全局到局部自适应感知提升文档布局分析性能 | Zhiyuan Zhao, Hengrui Kang, Bin Wang, Conghui He | http://arxiv.org/pdf/2410.12628v1 | null |
| 2024-10-16 | Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion | Cocoon:具有不确定性感知的稳健多模态感知传感器融合技术 | Minkyoung Cho, Yulong Cao, Jiachen Sun, Qingzhao Zhang, Marco Pavone, Jeong Joon Park, Heng Yang, Z. Morley Mao | http://arxiv.org/pdf/2410.12592v1 | null |
| 2024-10-16 | FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with Image Insertion | FTII-Bench:面向图文插入的全面多模态基准测试平台 | Jiacheng Ruan, Yebin Yang, Zehao Lin, Feiyu Xiong, Zeyun Tang, Zhiyu Li | http://arxiv.org/pdf/2410.12564v1 | null |
| 2024-10-16 | HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks | HumanEval-V:通过编程任务评估大型多模态模型的视觉理解与推理能力 | Fengji Zhang, Linquan Wu, Huiyu Bai, Guancheng Lin, Xiao Li, Xiao Yu, Yue Wang, Bei Chen, Jacky Keung | http://arxiv.org/pdf/2410.12381v1 | null |
| 2024-10-16 | ARIC: An Activity Recognition Dataset in Classroom Surveillance Images | ARIC:一种教室监控图像中的活动识别数据集 | Linfeng Xu, Fanman Meng, Qingbo Wu, Lili Pan, Heqian Qiu, Lanxiao Wang, Kailong Chen, Kanglei Geng, Yilei Qian, Haojie Wang, et al. | http://arxiv.org/pdf/2410.12337v1 | null |
| 2024-10-16 | MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs | MC-Bench:面向MLLMs时代的多上下文视觉接地基准测试 | Yunqiu Xu, Linchao Zhu, Yi Yang | http://arxiv.org/pdf/2410.12332v1 | null |
| 2024-10-16 | Sparse Prototype Network for Explainable Pedestrian Behavior Prediction | 稀疏原型网络在可解释行人行为预测中的应用 | Yan Feng, Alexander Carballo, Kazuya Takeda | http://arxiv.org/pdf/2410.12195v1 | null |
| 2024-10-16 | Unveiling the Limits of Alignment: Multi-modal Dynamic Local Fusion Network and A Benchmark for Unaligned RGBT Video Object Detection | 揭示对齐极限:多模态动态局部融合网络及非对齐RGBT视频目标检测基准 | Qishun Wang, Zhengzheng Tu, Kunpeng Wang, Le Gu, Chuanwang Guo | http://arxiv.org/pdf/2410.12143v1 | null |
| 2024-10-16 | Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models | 高效且有效的针对视觉-语言预训练模型的通用对抗攻击策略 | Fan Yang, Yihao Huang, Kailong Wang, Ling Shi, Geguang Pu, Yang Liu, Haoyu Wang | http://arxiv.org/pdf/2410.11639v2 | null |
| 2024-10-16 | Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities | Mini-Omni2:迈向具有视觉、语音和双向通信能力的开源GPT-4o | Zhifei Xie, Changqiao Wu | http://arxiv.org/pdf/2410.11190v2 | null |
| 2024-10-16 | Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs | Free Video-LLM:提示引导的视觉感知实现高效无训练视频LLM | Kai Han, Jianyuan Guo, Yehui Tang, Wei He, Enhua Wu, Yunhe Wang | http://arxiv.org/pdf/2410.10441v2 | link |
| 2024-10-16 | Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate | 解密大型视觉-语言模型中的跨模态对齐与模态融合速率 | Qidong Huang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu | http://arxiv.org/pdf/2410.07167v2 | link |
| 2024-10-16 | On Large Uni- and Multi-modal Models for Unsupervised Classification of Social Media Images: Nature's Contribution to People as a case study | 大规模单模态与多模态模型在社交媒体图片无监督分类中的应用:以自然对人类的贡献为例的研究 | Rohaifa Khaldi, Domingo Alcaraz-Segura, Ignacio Sánchez-Herrera, Javier Martinez-Lopez, Carlos Javier Navarro, Siham Tabik | http://arxiv.org/pdf/2410.00275v2 | null |
| 2024-10-16 | InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation | InterACT:基于层次注意力Transformer的双臂操作中互依赖感知动作分块方法 | Andrew Lee, Ian Chuang, Ling-Yuan Chen, Iman Soltani | http://arxiv.org/pdf/2409.07914v3 | null |
| 2024-10-16 | MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank Pipeline | MERLIN:基于LLM的迭代导航多模态嵌入优化方法,用于文本-视频检索重排管道 | Donghoon Han, Eunhwan Park, Gisang Lee, Adam Lee, Nojun Kwak | http://arxiv.org/pdf/2407.12508v2 | null |
| 2024-10-16 | AIC MLLM: Autonomous Interactive Correction MLLM for Robust Robotic Manipulation | AIC MLLM:用于稳健机器人操作的自主交互校正多模态大语言模型 | Chuyan Xiong, Chengyu Shen, Xiaoqi Li, Kaichen Zhou, Jiaming Liu, Ruiping Wang, Hao Dong | http://arxiv.org/pdf/2406.11548v5 | null |
| 2024-10-16 | MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models | MFC-Bench:大规模视觉-语言模型的多模态事实核查基准测试 | Shengkang Wang, Hongzhan Lin, Ziyang Luo, Zhen Ye, Guang Chen, Jing Ma | http://arxiv.org/pdf/2406.11288v2 | link |
| 2024-10-16 | Instruction-Guided Visual Masking | 指令引导的视觉掩码 | Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu, Jingjing Liu, Xianyuan Zhan | http://arxiv.org/pdf/2405.19783v2 | link |
| 2024-10-16 | Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography | 开发基于多模态数据集的三维计算机断层扫描通用基础模型 | Ibrahim Ethem Hamamci, Sezgin Er, Furkan Almas, Ayse Gulnihan Simsek, Sevval Nil Esirgun, Irem Dogan, Muhammed Furkan Dasdelen, Omer Faruk Durugol, Bastian Wittmann, Tamaz Amiranashvili, et al. | http://arxiv.org/pdf/2403.17834v2 | link |
| 2024-10-16 | AdaMSS: Adaptive Multi-Modality Segmentation-to-Survival Learning for Survival Outcome Prediction from PET/CT Images | AdaMSS:基于PET/CT图像的自适应多模态分割至生存学习用于生存结果预测 | Mingyuan Meng, Bingxin Gu, Michael Fulham, Shaoli Song, Dagan Feng, Lei Bi, Jinman Kim | http://arxiv.org/pdf/2305.09946v3 | link |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|:---|:---|:---|:---|:---|:---|
| 2024-10-16 | EG-HumanNeRF: Efficient Generalizable Human NeRF Utilizing Human Prior for Sparse View | EG-HumanNeRF:利用人类先验知识的高效泛化人类NeRF在稀疏视图中的应用 | Zhaorong Wang, Yoshihiro Kanamori, Yuki Endo | http://arxiv.org/pdf/2410.12242v1 | null |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|:---|:---|:---|:---|:---|:---|
| 2024-10-16 | Gaussian Primitives for Deformable Image Registration | 高斯基元在可变形图像配准中的应用 | Jihe Li, Xiang Liu, Fabian Zhang, Xia Li, Xixin Cao, Ye Zhang, Joachim Buhmann | http://arxiv.org/pdf/2406.03394v2 | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|:---|:---|:---|:---|:---|:---|
| 2024-10-16 | Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats | Long-LRM:面向大范围高斯泼溅的长序列大型重建模型 | Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, Zexiang Xu | http://arxiv.org/pdf/2410.12781v1 | null |
| 2024-10-16 | Towards Flexible and Efficient Diffusion Low Light Enhancer | 柔性高效扩散型低光照增强器研究进展 | Guanzhou Lan, Qianli Ma, Yuqi Yang, Zhigang Wang, Dong Wang, Yuan Yuan, Bin Zhao | http://arxiv.org/pdf/2410.12346v1 | null |
| 2024-10-16 | TAS: Distilling Arbitrary Teacher and Student via a Hybrid Assistant | TAS:通过混合助手实现任意教师与学生模型的蒸馏 | Guopeng Li, Qiang Wang, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia | http://arxiv.org/pdf/2410.12342v1 | null |
| 2024-10-16 | Optimizing YOLOv5s Object Detection through Knowledge Distillation algorithm | 通过知识蒸馏算法优化YOLOv5s目标检测 | Guanming Huang, Aoran Shen, Yuxiang Hu, Junliang Du, Jiacheng Hu, Yingbin Liang | http://arxiv.org/pdf/2410.12259v1 | null |
| 2024-10-16 | TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration | TransAgent:基于异构智能体协作的视觉-语言基础模型迁移学习 | Yiwei Guo, Shaobin Zhuang, Kunchang Li, Yu Qiao, Yali Wang | http://arxiv.org/pdf/2410.12183v1 | null |
| 2024-10-16 | Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution | 双模型蒸馏用于混合边缘-云解决方案的高效动作分类 | Timothy Wei, Hsien Xin Peng, Elaine Xu, Bryan Zhao, Lei Ding, Diji Yang | http://arxiv.org/pdf/2410.12165v1 | null |
| 2024-10-16 | SAM-Guided Masked Token Prediction for 3D Scene Understanding | SAM引导的掩码token预测在3D场景理解中的应用 | Zhimin Chen, Liang Yang, Yingwei Li, Longlong Jing, Bing Li | http://arxiv.org/pdf/2410.12158v1 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|:---|:---|:---|:---|:---|:---|
| 2024-10-16 | Towards Zero-Shot Camera Trap Image Categorization | 零样本相机陷阱图像分类研究进展 | Jiří Vyskočil, Lukas Picek | http://arxiv.org/pdf/2410.12769v1 | null |
| 2024-10-16 | PND-Net: Plant Nutrition Deficiency and Disease Classification using Graph Convolutional Network | PND-Net:基于图卷积网络实现植物营养缺乏与病害分类 | Asish Bera, Debotosh Bhattacharjee, Ondrej Krejcar | http://arxiv.org/pdf/2410.12742v1 | null |
| 2024-10-16 | RAFA-Net: Region Attention Network For Food Items And Agricultural Stress Recognition | RAFA-Net:面向食品项与农业压力识别的区域注意力网络 | Asish Bera, Ondrej Krejcar, Debotosh Bhattacharjee | http://arxiv.org/pdf/2410.12718v1 | null |
| 2024-10-16 | MultiCamCows2024 -- A Multi-view Image Dataset for AI-driven Holstein-Friesian Cattle Re-Identification on a Working Farm | 多视角牛只识别数据集MultiCamCows2024:面向工作农场AI驱动的荷斯坦-弗里生牛再识别 | Phoenix Yu, Tilo Burghardt, Andrew W Dowsey, Neill W Campbell | http://arxiv.org/pdf/2410.12695v1 | null |
| 2024-10-16 | Machine Learning Approach to Brain Tumor Detection and Classification | 机器学习在脑肿瘤检测与分类中的应用方法 | Alice Oh, Inyoung Noh, Jian Choo, Jihoo Lee, Justin Park, Kate Hwang, Sanghyeon Kim, Soo Min Oh | http://arxiv.org/pdf/2410.12692v1 | null |
| 2024-10-16 | Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2 | 自动从自由文本中映射解剖学标志:基于Llama-2的大型语言模型的见解 | Mohamad Abdi, Gerardo Hemosillo Valadez, Halid Ziya Yerebakan | http://arxiv.org/pdf/2410.12686v1 | null |
| 2024-10-16 | MambaBEV: An efficient 3D detection model with Mamba2 | MambaBEV:一种基于Mamba2的高效3D检测模型 | Zihan You, Hao Wang, Qichao Zhao, Jinxiang Wang | http://arxiv.org/pdf/2410.12673v1 | null |
| 2024-10-16 | Cascade learning in multi-task encoder-decoder networks for concurrent bone segmentation and glenohumeral joint assessment in shoulder CT scans | 多任务编码器-解码器网络中的级联学习在肩部CT扫描中的并发骨分割与盂肱关节评估 | Luca Marsilio, Davide Marzorati, Matteo Rossi, Andrea Moglia, Luca Mainardi, Alfonso Manzotti, Pietro Cerveri | http://arxiv.org/pdf/2410.12641v1 | null |
| 2024-10-16 | CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training | CMAL:一种新颖的跨模态关联学习框架用于视觉-语言预训练 | Zhiyuan Ma, Jianjun Li, Guohui Li, Kaiyan Huang | http://arxiv.org/pdf/2410.12595v1 | null |
| 2024-10-16 | From Lab to Pocket: A Novel Continual Learning-based Mobile Application for Screening COVID-19 | 实验室到口袋:一种基于持续学习的筛查COVID-19新型移动应用程序 | Danny Falero, Muhammad Ashad Kabir, Nusrat Homaira | http://arxiv.org/pdf/2410.12589v1 | null |
| 2024-10-16 | Self-DenseMobileNet: A Robust Framework for Lung Nodule Classification using Self-ONN and Stacking-based Meta-Classifier | Self-DenseMobileNet:一种基于Self-ONN和堆叠元分类器的鲁棒肺结节分类框架 | Md. Sohanur Rahman, Muhammad E. H. Chowdhury, Hasib Ryan Rahman, Mosabber Uddin Ahmed, Muhammad Ashad Kabir, Sanjiban Sekhar Roy, Rusab Sarmun | http://arxiv.org/pdf/2410.12584v1 | null |
| 2024-10-16 | Adaptive Prompt Learning with SAM for Few-shot Scanning Probe Microscope Image Segmentation | 自适应提示学习与SAM结合的少样本扫描探针显微镜图像分割方法 | Yao Shen, Ziwei Wei, Chunmeng Liu, Shuming Wei, Qi Zhao, Kaiyang Zeng, Guangyao Li | http://arxiv.org/pdf/2410.12562v1 | null |
| 2024-10-16 | Development of Image Collection Method Using YOLO and Siamese Network | 基于YOLO与Siamese网络的图像采集方法开发 | Chan Young Shin, Ah Hyun Lee, Jun Young Lee, Ji Min Lee, Soo Jin Park | http://arxiv.org/pdf/2410.12561v1 | null |
| 2024-10-16 | Evaluating Utility of Memory Efficient Medical Image Generation: A Study on Lung Nodule Segmentation | 评估内存高效医学图像生成的实用性:肺结节分割研究 | Kathrin Khadra, Utku Türkbey | http://arxiv.org/pdf/2410.12542v1 | null |
| 2024-10-16 | Mind the Gap Between Prototypes and Images in Cross-domain Finetuning | 跨域微调中原型与图像之间的差异值得关注 | Hongduan Tian, Feng Liu, Zhanke Zhou, Tongliang Liu, Chengqi Zhang, Bo Han | http://arxiv.org/pdf/2410.12474v1 | null |
| 2024-10-16 | Attention-Guided Perturbation for Consistency Regularization in Semi-Supervised Medical Image Segmentation | 注意力引导扰动在半监督医学图像分割中的一致性正则化研究 | Yuxuan Cheng, Chenxi Shao, Jie Ma, Guoliang Li | http://arxiv.org/pdf/2410.12419v1 | null |
| 2024-10-16 | Feature Augmentation for Self-supervised Contrastive Learning: A Closer Look | 特征增强的自监督对比学习:深入探究 | Yong Zhang, Rui Zhu, Shifeng Zhang, Xu Zhou, Shifeng Chen, Xiaofan Chen | http://arxiv.org/pdf/2410.12396v1 | null |
| 2024-10-16 | Real-time Stereo-based 3D Object Detection for Streaming Perception | 基于实时立体视觉的流式感知三维目标检测 | Changcai Li, Zonghua Gu, Gang Chen, Libo Huang, Wei Zhang, Huihui Zhou | http://arxiv.org/pdf/2410.12394v1 | null |
| 2024-10-16 | Context-Infused Visual Grounding for Art | 语境增强的视觉定位技术在艺术领域的应用 | Selina Khan, Nanne van Noord | http://arxiv.org/pdf/2410.12369v1 | null |
| 2024-10-16 | PAPL-SLAM: Principal Axis-Anchored Monocular Point-Line SLAM | PAPL-SLAM:基于主轴锚定的单目点线SLAM系统 | Guanghao Li, Yu Cao, Qi Chen, Yifan Yang, Jian Pu | http://arxiv.org/pdf/2410.12324v1 | null |
| 2024-10-16 | Controlled Automatic Task-Specific Synthetic Data Generation for Hallucination Detection | 控制自动任务特定合成数据生成用于幻觉检测 | Yong Xie, Karan Aggarwal, Aitzaz Ahmad, Stephen Lau | http://arxiv.org/pdf/2410.12278v1 | null |
| 2024-10-16 | Leveraging Spatial Attention and Edge Context for Optimized Feature Selection in Visual Localization | 利用空间注意力和边缘上下文优化视觉定位中的特征选择 | Nanda Febri Istighfarin, HyungGi Jo | http://arxiv.org/pdf/2410.12240v1 | null |
| 2024-10-16 | Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety | 评估视觉-语言模型级联方法在零样本安全帽检测与关联中的应用,以提升施工安全 | Lucas Choi, Ross Greer | http://arxiv.org/pdf/2410.12225v1 | null |
| 2024-10-16 | Order-Aware Interactive Segmentation | 顺序感知交互式分割算法 | Bin Wang, Anwesa Choudhuri, Meng Zheng, Zhongpai Gao, Benjamin Planche, Andong Deng, Qin Liu, Terrence Chen, Ulas Bagci, Ziyan Wu | http://arxiv.org/pdf/2410.12214v1 | null |
| 2024-10-16 | CVCP-Fusion: On Implicit Depth Estimation for 3D Bounding Box Prediction | CVCP-Fusion:面向3D边界框预测的隐式深度估计研究 | Pranav Gupta, Rishabh Rengarajan, Viren Bankapur, Vedansh Mannem, Lakshit Ahuja, Surya Vijay, Kevin Wang | http://arxiv.org/pdf/2410.11211v2 | link |
| 2024-10-16 | Preserving Cardiac Integrity: A Topology-Infused Approach to Whole Heart Segmentation | 保护心脏完整性:一种融合拓扑信息的全心脏分割方法 | Chenyu Zhang, Wenxue Guan, Xiaodan Xing, Guang Yang | http://arxiv.org/pdf/2410.10551v2 | null |
| 2024-10-16 | Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP | 语义标记重加权:用于CLIP中的可解释可控文本嵌入 | Eunji Kim, Kyuhong Shim, Simyung Chang, Sungroh Yoon | http://arxiv.org/pdf/2410.08469v2 | null |
| 2024-10-16 | Delta-ICM: Entropy Modeling with Delta Function for Learned Image Compression | Delta-ICM:基于Delta函数的熵建模在学习图像压缩中的应用 | Takahiro Shindo, Taiju Watanabe, Yui Tatsumi, Hiroshi Watanabe | http://arxiv.org/pdf/2410.07669v2 | null |
| 2024-10-16 | Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification | 逻辑推理正则化:视觉分类中提高泛化能力的决策解释 | Zhaorui Tan, Xi Yang, Qiufeng Wang, Anh Nguyen, Kaizhu Huang | http://arxiv.org/pdf/2410.04492v3 | link |
| 2024-10-16 | Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features | Key-Grid:使用网格热图特征的无监督3D关键点检测方法 | Chengkai Hou, Zhengrong Xue, Bingyang Zhou, Jinghan Ke, Lin Shao, Huazhe Xu | http://arxiv.org/pdf/2410.02237v2 | null |
| 2024-10-16 | See Where You Read with Eye Gaze Tracking and Large Language Model | 基于视线追踪与大型语言模型的可视化阅读位置分析 | Sikai Yang, Gang Yan | http://arxiv.org/pdf/2409.19454v2 | null |
| 2024-10-16 | Active Fake: DeepFake Camouflage | 深度伪造伪装:主动伪造 | Pu Sun, Honggang Qi, Yuezun Li | http://arxiv.org/pdf/2409.03200v2 | null |
| 2024-10-16 | ViLReF: An Expert Knowledge Enabled Vision-Language Retinal Foundation Model | ViLReF:一种基于专家知识的视语言视网膜基础模型 | Shengzhu Yang, Jiawei Du, Jia Guo, Weihang Zhang, Hanruo Liu, Huiqi Li, Ningli Wang | http://arxiv.org/pdf/2408.10894v3 | link |
| 2024-10-16 | VrdONE: One-stage Video Visual Relation Detection | VrdONE:单阶段视频视觉关系检测 | Xinjie Jiang, Chenxi Zheng, Xuemiao Xu, Bangzhen Liu, Weiying Zheng, Huaidong Zhang, Shengfeng He | http://arxiv.org/pdf/2408.09408v2 | link |
| 2024-10-16 | Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation | Hyper-YOLO:当视觉目标检测遇见超图计算 | Yifan Feng, Jiangang Huang, Shaoyi Du, Shihui Ying, Jun-Hai Yong, Yipeng Li, Guiguang Ding, Rongrong Ji, Yue Gao | http://arxiv.org/pdf/2408.04804v2 | link |
| 2024-10-16 | AssemAI: Interpretable Image-Based Anomaly Detection for Manufacturing Pipelines | AssemAI:面向制造管道的可解释基于图像异常检测 | Renjith Prasad, Chathurangi Shyalika, Ramtin Zand, Fadi El Kalach, Revathy Venkataramanan, Ramy Harik, Amit Sheth | http://arxiv.org/pdf/2408.02181v2 | null |
| 2024-10-16 | DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training | DNTextSpotter:通过改进去噪训练实现任意形状场景文本检测 | Yu Xie, Qian Qiao, Jun Gao, Tianxiang Wu, Jiaqing Fan, Yue Zhang, Jielei Zhang, Huyang Sun | http://arxiv.org/pdf/2408.00355v2 | null |
| 2024-10-16 | Vision-Based Adaptive Robotics for Autonomous Surface Crack Repair | 基于视觉的自适应机器人技术用于自主表面裂缝修复 | Joshua Genova, Eric Cabrera, Vedhus Hoskere | http://arxiv.org/pdf/2407.16874v2 | null |
| 2024-10-16 | Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | 动态调整ViT适配中的参数与推理效率 | Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You | http://arxiv.org/pdf/2403.11808v2 | link |
| 2024-10-16 | Zero-shot Generalizable Incremental Learning for Vision-Language Object Detection | 零样本泛化增量学习在视觉-语言目标检测中的应用 | Jieren Deng, Haojian Zhang, Kun Ding, Jianhua Hu, Xingxuan Zhang, Yunkuan Wang | http://arxiv.org/pdf/2403.01680v3 | null |
| 2024-10-16 | MixedNUTS: Training-Free Accuracy-Robustness Balance via Nonlinearly Mixed Classifiers | MixedNUTS:通过非线性混合分类器实现训练无关的准确性与鲁棒性平衡 | Yatong Bai, Mo Zhou, Vishal M. Patel, Somayeh Sojoudi | http://arxiv.org/pdf/2402.02263v5 | link |
| 2024-10-16 | Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration | 自监督学习下的激光雷达三维点云与二维-三维神经校准 | Yifan Zhang, Siyu Ren, Junhui Hou, Jinjian Wu, Yixuan Yuan, Guangming Shi | http://arxiv.org/pdf/2401.12452v3 | link |

OCR

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|:---|:---|:---|:---|:---|:---|
| 2024-10-16 | ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training | ReLayout:基于布局增强的预训练迈向现实世界文档理解 | Zhouqiang Jiang, Bowen Wang, Junhao Chen, Yuta Nakashima | http://arxiv.org/pdf/2410.10471v2 | null |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|:---|:---|:---|:---|:---|:---|
| 2024-10-16 | QueensCAMP: an RGB-D dataset for robust Visual SLAM | QueensCAMP:用于鲁棒视觉SLAM的RGB-D数据集 | Hudson M. S. Bruno, Esther L. Colombini, Sidney N. Givigi Jr | http://arxiv.org/pdf/2410.12520v1 | null |
| 2024-10-16 | Depth Estimation From Monocular Images With Enhanced Encoder-Decoder Architecture | 基于增强型编码器-解码器架构的单目图像深度估计 | Dabbrata Das, Argho Deb Das, Farhan Sadaf | http://arxiv.org/pdf/2410.11610v2 | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|:---|:---|:---|:---|:---|:---|
| 2024-10-16 | Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models | 跨模态安全机制在大规模视觉-语言模型中的迁移 | Shicheng Xu, Liang Pang, Yunchang Zhu, Huawei Shen, Xueqi Cheng | http://arxiv.org/pdf/2410.12662v1 | null |
| 2024-10-16 | Exploring Model Kinship for Merging Large Language Models | 探索模型亲缘关系以合并大型语言模型 | Yedi Hu, Yunzhi Yao, Ningyu Zhang, Shumin Deng, Huajun Chen | http://arxiv.org/pdf/2410.12613v1 | null |
| 2024-10-16 | Consistency Calibration: Improving Uncertainty Calibration via Consistency among Perturbed Neighbors | 一致性校准:通过扰动邻居间一致性改善不确定性校准 | Linwei Tao, Haolan Guo, Minjing Dong, Chang Xu | http://arxiv.org/pdf/2410.12295v1 | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|:---|:---|:---|:---|:---|:---|
| 2024-10-16 | FaceChain-FACT: Face Adapter with Decoupled Training for Identity-preserved Personalization | FaceChain-FACT:面向身份保持个性化的解耦训练面部适配器 | Cheng Yu, Haoyu Xie, Lei Shang, Yang Liu, Jun Dan, Baigui Sun, Liefeng Bo | http://arxiv.org/pdf/2410.12312v1 | null |
| 2024-10-16 | Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models | 专家混合模型的个性化改造:面向视觉-语言模型的联邦提示学习 | Jun Luo, Chen Chen, Shandong Wu | http://arxiv.org/pdf/2410.10114v2 | null |
| 2024-10-16 | BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way | BroadWay:以无训练方式提升您的文本到视频生成模型性能 | Jiazi Bu, Pengyang Ling, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang | http://arxiv.org/pdf/2410.06241v2 | null |
| 2024-10-16 | Progressive Retinal Image Registration via Global and Local Deformable Transformations | 渐进式视网膜图像配准:全局与局部可变形变换方法 | Yepeng Liu, Baosheng Yu, Tian Chen, Yuliang Gu, Bo Du, Yongchao Xu, Jun Cheng | http://arxiv.org/pdf/2409.01068v2 | link |
| 2024-10-16 | Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization | 增强视觉-语言模型的鲁棒性:通过正交性学习与自正则化方法 | Jinlong Li, Dong Zhao, Zequn Jie, Elisa Ricci, Lin Ma, Nicu Sebe | http://arxiv.org/pdf/2407.08374v3 | null |
| 2024-10-16 | Ultra-High-Definition Image Restoration: New Benchmarks and A Dual Interaction Prior-Driven Solution | 超高清图像恢复:新基准与双交互先验驱动解决方案 | Liyan Wang, Cong Wang, Jinshan Pan, Xiaofeng Liu, Weixiang Zhou, Xiaoran Sun, Wei Wang, Zhixun Su | http://arxiv.org/pdf/2406.13607v4 | link |
| 2024-10-16 | Knowledge Circuits in Pretrained Transformers | 预训练Transformer中的知识回路 | Yunzhi Yao, Ningyu Zhang, Zekun Xi, Mengru Wang, Ziwen Xu, Shumin Deng, Huajun Chen | http://arxiv.org/pdf/2405.17969v2 | link |
| 2024-10-16 | In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation | Transformer之眼:用于自我中心注视估计的全局-局部相关性分析 | Bolin Lai, Miao Liu, Fiona Ryan, James M. Rehg | http://arxiv.org/pdf/2208.04464v3 | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|:---|:---|:---|:---|:---|:---|
| 2024-10-16 | Gravity-aligned Rotation Averaging with Circular Regression | 基于圆周回归的重力对齐旋转平均方法 | Linfei Pan, Marc Pollefeys, Dániel Baráth | http://arxiv.org/pdf/2410.12763v1 | null |
| 2024-10-16 | Optimizing 3D Geometry Reconstruction from Implicit Neural Representations | 优化基于隐式神经表示的三维几何重建 | Shen Fan, Przemyslaw Musialski | http://arxiv.org/pdf/2410.12725v1 | null |
| 2024-10-16 | 3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation | 3DIS:基于深度驱动的解耦实例合成用于文本到图像生成 | Dewei Zhou, Ji Xie, Zongxin Yang, Yi Yang | http://arxiv.org/pdf/2410.12669v1 | null |
| 2024-10-16 | MambaPainter: Neural Stroke-Based Rendering in a Single Step | MambaPainter:单步神经笔触渲染技术 | Tomoya Sawada, Marie Katsurai | http://arxiv.org/pdf/2410.12524v1 | null |
| 2024-10-16 | Triplet: Triangle Patchlet for Mesh-Based Inverse Rendering and Scene Parameters Approximation | Triplet:用于基于网格的逆渲染与场景参数逼近的三角形面片方法 | Jiajie Yang | http://arxiv.org/pdf/2410.12414v1 | null |
| 2024-10-16 | LoD-Loc: Aerial Visual Localization using LoD 3D Map with Neural Wireframe Alignment | LoD-Loc:利用带神经线框对齐的LoD三维地图进行航空视觉定位 | Juelin Zhu, Shen Yan, Long Wang, Shengyue Zhang, Yu Liu, Maojun Zhang | http://arxiv.org/pdf/2410.12269v1 | null |
| 2024-10-16 | ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video | ScaleFlow++:视频中的稳健与精确三维运动估计 | Han Ling, Quansen Sun | http://arxiv.org/pdf/2407.09797v2 | null |
| 2024-10-16 | Topological reconstruction of sampled surfaces via Morse theory | 通过莫尔斯理论进行采样曲面的拓扑重建 | Franco Coltraro, Jaume Amorós, Maria Alberich-Carramiñana, Carme Torras | http://arxiv.org/pdf/2405.17257v2 | null |
| 2024-10-16 | No Bells, Just Whistles: Sports Field Registration by Leveraging Geometric Properties | 无铃声,唯有哨声:利用几何属性进行运动场配准 | Marc Gutiérrez-Pérez, Antonio Agudo | http://arxiv.org/pdf/2404.08401v2 | link |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|:---|:---|:---|:---|:---|:---|
| 2024-10-16 | Rethinking Visual Counterfactual Explanations Through Region Constraint | 重新审视通过区域约束的可视化反事实解释 | Bartlomiej Sobieski, Jakub Grzywaczewski, Bartlomiej Sadlej, Matthew Tivnan, Przemyslaw Biecek | http://arxiv.org/pdf/2410.12591v1 | null |
| 2024-10-16 | A Primal-dual algorithm for image reconstruction with ICNNs | 基于ICNNs的图像重建原始-对偶算法 | Hok Shing Wong, Matthias J. Ehrhardt, Subhadip Mukherjee | http://arxiv.org/pdf/2410.12441v1 | null |
| 2024-10-16 | AdaCropFollow: Self-Supervised Online Adaptation for Visual Under-Canopy Navigation | AdaCropFollow:视觉冠下导航的自监督在线适应方法 | Arun N. Sivakumar, Federico Magistri, Mateus V. Gasparino, Jens Behley, Cyrill Stachniss, Girish Chowdhary | http://arxiv.org/pdf/2410.12411v1 | null |
| 2024-10-16 | Beyond Coarse-Grained Matching in Video-Text Retrieval | 视频-文本检索中超越粗粒度匹配策略 | Aozhu Chen, Hazel Doughty, Xirong Li, Cees G. M. Snoek | http://arxiv.org/pdf/2410.12407v1 | null |
| 2024-10-16 | De-Identification of Medical Imaging Data: A Comprehensive Tool for Ensuring Patient Privacy | 医疗影像数据去标识化:确保患者隐私的全面工具 | Moritz Rempe, Lukas Heine, Constantin Seibold, Fabian Hörst, Jens Kleesiek | http://arxiv.org/pdf/2410.12402v1 | null |
| 2024-10-16 | Stylistic Multi-Task Analysis of Ukiyo-e Woodblock Prints | 浮世绘木版画的风格多任务分析 | Selina Khan, Nanne van Noord | http://arxiv.org/pdf/2410.12379v1 | null |
| 2024-10-16 | DAT: Improving Adversarial Robustness via Generative Amplitude Mix-up in Frequency Domain | DAT:通过频率域生成振幅混合提升对抗鲁棒性 | Fengpeng Li, Kemou Li, Haiwei Wu, Jinyu Tian, Jiantao Zhou | http://arxiv.org/pdf/2410.12307v1 | null |
| 2024-10-16 | Fool Me Once? Contrasting Textual and Visual Explanations in a Clinical Decision-Support Setting | 一次被骗?在临床决策支持环境中对比文本与视觉解释 | Maxime Kayser, Bayar Menzat, Cornelius Emde, Bogdan Bercean, Alex Novak, Abdala Espinosa, Bartlomiej W. Papiez, Susanne Gaube, Thomas Lukasiewicz, Oana-Maria Camburu | http://arxiv.org/pdf/2410.12284v1 | null |
| 2024-10-16 | Advancing Healthcare: Innovative ML Approaches for Improved Medical Imaging in Data-Constrained Environments | 推进医疗健康:数据受限环境下提升医学影像的创新ML方法 | Al Amin, Kamrul Hasan, Saleh Zein-Sabatto, Liang Hong, Sachin Shetty, Imtiaz Ahmed, Tariqul Islam | http://arxiv.org/pdf/2410.12245v1 | null |
| 2024-10-16 | Test-time adaptation for image compression with distribution regularization | 图像压缩的测试时自适应方法与分布正则化 | Kecheng Chen, Pingping Zhang, Tiexin Qin, Shiqi Wang, Hong Yan, Haoliang Li | http://arxiv.org/pdf/2410.12191v1 | null |
| 2024-10-16 | PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation | PIVOT-R:面向机器人操作的原语驱动、路径点感知世界模型 | Kaidong Zhang, Pengzhen Ren, Bingqian Lin, Junfan Lin, Shikui Ma, Hang Xu, Xiaodan Liang | http://arxiv.org/pdf/2410.10394v2 | null |
| 2024-10-16 | MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting | MuseTalk:基于潜在空间修复的实时高质量唇同步技术 | Yue Zhang, Minhao Liu, Zhaokang Chen, Bin Wu, Yubin Zeng, Chao Zhan, Yingjie He, Junxin Huang, Wenjiang Zhou | http://arxiv.org/pdf/2410.10122v2 | link |
| 2024-10-16 | Tri-Cam: Practical Eye Gaze Tracking via Camera Network | Tri-Cam:基于摄像头网络的实用眼动追踪技术 | Sikai Yang | http://arxiv.org/pdf/2409.19554v2 | null |
| 2024-10-16 | Video-to-Audio Generation with Hidden Alignment | 隐藏对齐的视频到音频生成技术 | Manjie Xu, Chenxing Li, Xinyi Tu, Yong Ren, Rilin Chen, Yu Gu, Wei Liang, Dong Yu | http://arxiv.org/pdf/2407.07464v2 | null |
| 2024-10-16 | Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization | 扩大个性化图像美学评估规模:基于任务向量定制化 | Jooyeol Yun, Jaegul Choo | http://arxiv.org/pdf/2407.07176v2 | link |
| 2024-10-16 | Understanding Figurative Meaning through Explainable Visual Entailment | 通过可解释视觉蕴含理解隐喻意义 | Arkadiy Saakyan, Shreyas Kulkarni, Tuhin Chakrabarty, Smaranda Muresan | http://arxiv.org/pdf/2405.01474v2 | link |
| 2024-10-16 | Adaptive Convolutional Neural Network for Image Super-resolution | 自适应卷积神经网络在图像超分辨率中的应用 | Chunwei Tian, Xuanyu Zhang, Tao Wang, Yongjun Zhang, Qi Zhu, Chia-Wen Lin | http://arxiv.org/pdf/2402.15704v4 | link |
| 2024-10-16 | Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing | 隐变量反转与时间步感知采样在无训练非刚性编辑中的应用 | Yunji Jung, Seokju Lee, Tair Djanibekov, Hyunjung Shim, Jong Chul Ye | http://arxiv.org/pdf/2402.08601v3 | null |