Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-16 | Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts | 扩散模型上的元遗忘学习:防止重新学习已遗忘的概念 | Hongcheng Gao, Tianyu Pang, Chao Du, Taihang Hu, Zhijie Deng, Min Lin | http://arxiv.org/pdf/2410.12777v1 | null |
2024-10-16 | SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation | SAFREE:无训练自适应保护机制,保障安全文本到图像与视频生成的安全性 | Jaehong Yoon, Shoubin Yu, Vaidehi Patil, Huaxiu Yao, Mohit Bansal | http://arxiv.org/pdf/2410.12761v1 | null |
2024-10-16 | Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization | 嵌入道德思维:通过轻量级价值优化对文本到图像合成进行校准 | Xingqi Wang, Xiaoyuan Yi, Xing Xie, Jia Jia | http://arxiv.org/pdf/2410.12700v1 | null |
2024-10-16 | AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing | AdaptiveDrag:基于扩散的图像编辑中的语义驱动拖拽 | DuoSheng Chen, Binghui Chen, Yifeng Geng, Liefeng Bo | http://arxiv.org/pdf/2410.12696v1 | null |
2024-10-16 | One Step Diffusion via Shortcut Models | 一步扩散:通过捷径模型的快速实现 | Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel | http://arxiv.org/pdf/2410.12557v1 | null |
2024-10-16 | Shaping a Stabilized Video by Mitigating Unintended Changes for Concept-Augmented Video Editing | 概念增强型视频编辑中通过减轻非预期变化塑造稳定视频的方法研究 | Mingce Guo, Jingxuan He, Shengeng Tang, Zhangye Wang, Lechao Cheng | http://arxiv.org/pdf/2410.12526v1 | null |
2024-10-16 | DH-VTON: Deep Text-Driven Virtual Try-On via Hybrid Attention Learning | DH-VTON: 基于混合注意力学习的深度文本驱动虚拟试衣技术 | Jiabao Wei, Zhiyuan Ma | http://arxiv.org/pdf/2410.12501v1 | null |
2024-10-16 | Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective | 稳定图像自回归建模的潜在空间:一种统一视角 | Yongxin Zhu, Bocheng Li, Hang Zhang, Xin Li, Linli Xu, Lidong Bing | http://arxiv.org/pdf/2410.12490v1 | null |
2024-10-16 | Synthetic Augmentation for Anatomical Landmark Localization using DDPMs | 合成增强用于基于DDPMs的解剖学地标定位 | Arnela Hadzic, Lea Bogensperger, Simon Johannes Joham, Martin Urschler | http://arxiv.org/pdf/2410.12489v1 | null |
2024-10-16 | GAN Based Top-Down View Synthesis in Reinforcement Learning Environments | 基于GAN的强化学习环境中的顶部视图合成 | Usama Younus, Vinoj Jayasundara, Shivam Mishra, Suleyman Aslan | http://arxiv.org/pdf/2410.12372v1 | null |
2024-10-16 | Improved Anomaly Detection through Conditional Latent Space VAE Ensembles | 条件潜在空间VAE集成在异常检测中的改进方法 | Oskar Åström, Alexandros Sopasakis | http://arxiv.org/pdf/2410.12328v1 | null |
2024-10-16 | Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond | 分解融合:一种自监督图像融合及扩展方法 | Pengwei Liang, Junjun Jiang, Qing Ma, Xianming Liu, Jiayi Ma | http://arxiv.org/pdf/2410.12274v1 | null |
2024-10-16 | DaDiff: Domain-aware Diffusion Model for Nighttime UAV Tracking | DaDiff: 针对夜间无人机追踪的领域感知扩散模型 | Haobo Zuo, Changhong Fu, Guangze Zheng, Liangliang Yao, Kunhan Lu, Jia Pan | http://arxiv.org/pdf/2410.12270v1 | null |
2024-10-16 | Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices | 高效扩散模型:从原理到实践的全面综述 | Zhiyuan Ma, Yuzhu Zhang, Guoli Jia, Liangliang Zhao, Yichao Ma, Mingjie Ma, Gaofeng Liu, Kaiyan Zhang, Jianjun Li, Bowen Zhou | http://arxiv.org/pdf/2410.11795v2 | null |
2024-10-16 | Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | Lotus:基于扩散的高质量密集预测视觉基础模型 | Jing He, Haodong Li, Wei Yin, Yixun Liang, Leheng Li, Kaiqiang Zhou, Hongbo Zhang, Bingbing Liu, Ying-Cong Chen | http://arxiv.org/pdf/2409.18124v3 | null |
2024-10-16 | Sample what you cant compress | 对无法压缩的部分进行采样 | Vighnesh Birodkar, Gabriel Barcik, James Lyon, Sergey Ioffe, David Minnen, Joshua V. Dillon | http://arxiv.org/pdf/2409.02529v3 | null |
2024-10-16 | A3D: Does Diffusion Dream about 3D Alignment? | A3D:扩散模型是否梦寐以求三维对齐? | Savva Ignatyev, Nina Konovalova, Daniil Selikhanovych, Oleg Voynov, Nikolay Patakin, Ilya Olkov, Dmitry Senushkin, Alexey Artemov, Anton Konushin, Alexander Filippov, et al. | http://arxiv.org/pdf/2406.15020v3 | null |
2024-10-16 | Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians | Mini-Splatting:使用受限数量的高斯函数表示场景 | Guangchi Fang, Bing Wang | http://arxiv.org/pdf/2403.14166v3 | link |
2024-10-16 | AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data | AnimateLCM:无需个性化视频数据的计算高效个性化风格视频生成 | Fu-Yun Wang, Zhaoyang Huang, Weikang Bian, Xiaoyu Shi, Keqiang Sun, Guanglu Song, Yu Liu, Hongsheng Li | http://arxiv.org/pdf/2402.00769v3 | link |
2024-10-16 | Generative Models: What Do They Know? Do They Know Things? Let's Find Out! | 生成模型:它们知道什么?它们是否了解事物?让我们一探究竟! | Xiaodan Du, Nicholas Kolkin, Greg Shakhnarovich, Anand Bhattad | http://arxiv.org/pdf/2311.17137v3 | null |
2024-10-16 | Reverse Stable Diffusion: What prompt was used to generate this image? | 逆稳定扩散:生成这张图片使用了什么提示? | Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah | http://arxiv.org/pdf/2308.01472v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-16 | Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models | 视觉语言模型的测试时泛化:双重原型演化方法 | Ce Zhang, Simon Stepputtis, Katia Sycara, Yaqi Xie | http://arxiv.org/pdf/2410.12790v1 | null |
2024-10-16 | The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio | 多模态之咒:评估大型多模态模型在语言、视觉和音频模态下的幻觉生成 | Sicong Leng, Yun Xing, Zesen Cheng, Yang Zhou, Hang Zhang, Xin Li, Deli Zhao, Shijian Lu, Chunyan Miao, Lidong Bing | http://arxiv.org/pdf/2410.12787v1 | null |
2024-10-16 | WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines | WorldCuisines:面向全球美食的大规模多语言多文化视觉问答基准 | Genta Indra Winata, Frederikus Hudi, Patrick Amadeus Irawan, David Anugraha, Rifki Afina Putri, Yutong Wang, Adam Nohejl, Ubaidillah Ariq Prathama, Nedjma Ousidhoum, Afifa Amriani, et al. | http://arxiv.org/pdf/2410.12705v1 | null |
2024-10-16 | VividMed: Vision Language Model with Versatile Visual Grounding for Medicine | VividMed:面向医学的具有多能视觉接地能力的视觉语言模型 | Lingxiao Luo, Bingda Tang, Xuanzhong Chen, Rong Han, Ting Chen | http://arxiv.org/pdf/2410.12694v1 | null |
2024-10-16 | DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception | DocLayout-YOLO:通过多样化合成数据与全局到局部自适应感知提升文档布局分析性能 | Zhiyuan Zhao, Hengrui Kang, Bin Wang, Conghui He | http://arxiv.org/pdf/2410.12628v1 | null |
2024-10-16 | Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion | Cocoon:具有不确定性感知的稳健多模态感知传感器融合技术 | Minkyoung Cho, Yulong Cao, Jiachen Sun, Qingzhao Zhang, Marco Pavone, Jeong Joon Park, Heng Yang, Z. Morley Mao | http://arxiv.org/pdf/2410.12592v1 | null |
2024-10-16 | FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with Image Insertion | FTII-Bench:面向图文插入的全面多模态基准测试平台 | Jiacheng Ruan, Yebin Yang, Zehao Lin, Feiyu Xiong, Zeyun Tang, Zhiyu Li | http://arxiv.org/pdf/2410.12564v1 | null |
2024-10-16 | HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks | HumanEval-V:通过编程任务评估大型多模态模型的视觉理解与推理能力 | Fengji Zhang, Linquan Wu, Huiyu Bai, Guancheng Lin, Xiao Li, Xiao Yu, Yue Wang, Bei Chen, Jacky Keung | http://arxiv.org/pdf/2410.12381v1 | null |
2024-10-16 | ARIC: An Activity Recognition Dataset in Classroom Surveillance Images | ARIC:一种教室监控图像中的活动识别数据集 | Linfeng Xu, Fanman Meng, Qingbo Wu, Lili Pan, Heqian Qiu, Lanxiao Wang, Kailong Chen, Kanglei Geng, Yilei Qian, Haojie Wang, et al. | http://arxiv.org/pdf/2410.12337v1 | null |
2024-10-16 | MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs | MC-Bench:面向MLLMs时代的多上下文视觉接地基准测试 | Yunqiu Xu, Linchao Zhu, Yi Yang | http://arxiv.org/pdf/2410.12332v1 | null |
2024-10-16 | Sparse Prototype Network for Explainable Pedestrian Behavior Prediction | 稀疏原型网络在可解释行人行为预测中的应用 | Yan Feng, Alexander Carballo, Kazuya Takeda | http://arxiv.org/pdf/2410.12195v1 | null |
2024-10-16 | Unveiling the Limits of Alignment: Multi-modal Dynamic Local Fusion Network and A Benchmark for Unaligned RGBT Video Object Detection | 揭示对齐极限:多模态动态局部融合网络及非对齐RGBT视频目标检测基准 | Qishun Wang, Zhengzheng Tu, Kunpeng Wang, Le Gu, Chuanwang Guo | http://arxiv.org/pdf/2410.12143v1 | null |
2024-10-16 | Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models | 高效且有效的针对视觉-语言预训练模型的通用对抗攻击策略 | Fan Yang, Yihao Huang, Kailong Wang, Ling Shi, Geguang Pu, Yang Liu, Haoyu Wang | http://arxiv.org/pdf/2410.11639v2 | null |
2024-10-16 | Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities | Mini-Omni2:迈向具有视觉、语音和双向通信能力的开源GPT-4o | Zhifei Xie, Changqiao Wu | http://arxiv.org/pdf/2410.11190v2 | null |
2024-10-16 | Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs | Free Video-LLM:面向高效免训练视频LLM的提示引导视觉感知 | Kai Han, Jianyuan Guo, Yehui Tang, Wei He, Enhua Wu, Yunhe Wang | http://arxiv.org/pdf/2410.10441v2 | link |
2024-10-16 | Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate | 解密大型视觉-语言模型中的跨模态对齐与模态融合速率 | Qidong Huang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu | http://arxiv.org/pdf/2410.07167v2 | link |
2024-10-16 | On Large Uni- and Multi-modal Models for Unsupervised Classification of Social Media Images: Nature's Contribution to People as a case study | 大规模单模态与多模态模型在社交媒体图片无监督分类中的应用:以自然对人类的贡献为例 | Rohaifa Khaldi, Domingo Alcaraz-Segura, Ignacio Sánchez-Herrera, Javier Martinez-Lopez, Carlos Javier Navarro, Siham Tabik | http://arxiv.org/pdf/2410.00275v2 | null |
2024-10-16 | InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation | InterACT:基于层次注意力Transformer的双臂操作中互依赖感知动作分块方法 | Andrew Lee, Ian Chuang, Ling-Yuan Chen, Iman Soltani | http://arxiv.org/pdf/2409.07914v3 | null |
2024-10-16 | MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank Pipeline | MERLIN:基于LLM的迭代导航多模态嵌入优化方法,用于文本-视频检索重排管道 | Donghoon Han, Eunhwan Park, Gisang Lee, Adam Lee, Nojun Kwak | http://arxiv.org/pdf/2407.12508v2 | null |
2024-10-16 | AIC MLLM: Autonomous Interactive Correction MLLM for Robust Robotic Manipulation | AIC MLLM:用于稳健机器人操作的自主交互校正多模态大语言模型 | Chuyan Xiong, Chengyu Shen, Xiaoqi Li, Kaichen Zhou, Jiaming Liu, Ruiping Wang, Hao Dong | http://arxiv.org/pdf/2406.11548v5 | null |
2024-10-16 | MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models | MFC-Bench:大规模视觉-语言模型的多模态事实核查基准测试 | Shengkang Wang, Hongzhan Lin, Ziyang Luo, Zhen Ye, Guang Chen, Jing Ma | http://arxiv.org/pdf/2406.11288v2 | link |
2024-10-16 | Instruction-Guided Visual Masking | 指令引导的视觉掩码 | Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu, Jingjing Liu, Xianyuan Zhan | http://arxiv.org/pdf/2405.19783v2 | link |
2024-10-16 | Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography | 开发基于多模态数据集的三维计算机断层扫描通用基础模型 | Ibrahim Ethem Hamamci, Sezgin Er, Furkan Almas, Ayse Gulnihan Simsek, Sevval Nil Esirgun, Irem Dogan, Muhammed Furkan Dasdelen, Omer Faruk Durugol, Bastian Wittmann, Tamaz Amiranashvili, et al. | http://arxiv.org/pdf/2403.17834v2 | link |
2024-10-16 | AdaMSS: Adaptive Multi-Modality Segmentation-to-Survival Learning for Survival Outcome Prediction from PET/CT Images | AdaMSS:基于PET/CT图像的自适应多模态分割至生存学习用于生存结果预测 | Mingyuan Meng, Bingxin Gu, Michael Fulham, Shaoli Song, Dagan Feng, Lei Bi, Jinman Kim | http://arxiv.org/pdf/2305.09946v3 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-16 | EG-HumanNeRF: Efficient Generalizable Human NeRF Utilizing Human Prior for Sparse View | EG-HumanNeRF:利用人类先验知识的高效泛化人类NeRF在稀疏视图中的应用 | Zhaorong Wang, Yoshihiro Kanamori, Yuki Endo | http://arxiv.org/pdf/2410.12242v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-16 | Gaussian Primitives for Deformable Image Registration | 高斯基元在可变形图像配准中的应用 | Jihe Li, Xiang Liu, Fabian Zhang, Xia Li, Xixin Cao, Ye Zhang, Joachim Buhmann | http://arxiv.org/pdf/2406.03394v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-16 | Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats | Long-LRM:面向大范围高斯泼溅的长序列大型重建模型 | Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, Zexiang Xu | http://arxiv.org/pdf/2410.12781v1 | null |
2024-10-16 | Towards Flexible and Efficient Diffusion Low Light Enhancer | 柔性高效扩散型低光照增强器研究进展 | Guanzhou Lan, Qianli Ma, Yuqi Yang, Zhigang Wang, Dong Wang, Yuan Yuan, Bin Zhao | http://arxiv.org/pdf/2410.12346v1 | null |
2024-10-16 | TAS: Distilling Arbitrary Teacher and Student via a Hybrid Assistant | TAS:通过混合助手实现任意教师与学生模型的蒸馏 | Guopeng Li, Qiang Wang, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia | http://arxiv.org/pdf/2410.12342v1 | null |
2024-10-16 | Optimizing YOLOv5s Object Detection through Knowledge Distillation algorithm | 优化YOLOv5s目标检测通过知识蒸馏算法 | Guanming Huang, Aoran Shen, Yuxiang Hu, Junliang Du, Jiacheng Hu, Yingbin Liang | http://arxiv.org/pdf/2410.12259v1 | null |
2024-10-16 | TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration | TransAgent:基于异构智能体协作的视觉-语言基础模型迁移学习 | Yiwei Guo, Shaobin Zhuang, Kunchang Li, Yu Qiao, Yali Wang | http://arxiv.org/pdf/2410.12183v1 | null |
2024-10-16 | Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution | 双模型蒸馏用于混合边缘-云解决方案的高效动作分类 | Timothy Wei, Hsien Xin Peng, Elaine Xu, Bryan Zhao, Lei Ding, Diji Yang | http://arxiv.org/pdf/2410.12165v1 | null |
2024-10-16 | SAM-Guided Masked Token Prediction for 3D Scene Understanding | SAM引导的掩码token预测在3D场景理解中的应用 | Zhimin Chen, Liang Yang, Yingwei Li, Longlong Jing, Bing Li | http://arxiv.org/pdf/2410.12158v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-16 | Towards Zero-Shot Camera Trap Image Categorization | 零样本相机陷阱图像分类研究进展 | Jiří Vyskočil, Lukas Picek | http://arxiv.org/pdf/2410.12769v1 | null |
2024-10-16 | PND-Net: Plant Nutrition Deficiency and Disease Classification using Graph Convolutional Network | PND-Net:基于图卷积网络实现植物营养缺乏与病害分类 | Asish Bera, Debotosh Bhattacharjee, Ondrej Krejcar | http://arxiv.org/pdf/2410.12742v1 | null |
2024-10-16 | RAFA-Net: Region Attention Network For Food Items And Agricultural Stress Recognition | RAFA-Net:面向食品项与农业压力识别的区域注意力网络 | Asish Bera, Ondrej Krejcar, Debotosh Bhattacharjee | http://arxiv.org/pdf/2410.12718v1 | null |
2024-10-16 | MultiCamCows2024 -- A Multi-view Image Dataset for AI-driven Holstein-Friesian Cattle Re-Identification on a Working Farm | 多视角牛只识别数据集MultiCamCows2024:面向工作农场AI驱动的荷斯坦-弗里生牛再识别 | Phoenix Yu, Tilo Burghardt, Andrew W Dowsey, Neill W Campbell | http://arxiv.org/pdf/2410.12695v1 | null |
2024-10-16 | Machine Learning Approach to Brain Tumor Detection and Classification | 机器学习在脑肿瘤检测与分类中的应用方法 | Alice Oh, Inyoung Noh, Jian Choo, Jihoo Lee, Justin Park, Kate Hwang, Sanghyeon Kim, Soo Min Oh | http://arxiv.org/pdf/2410.12692v1 | null |
2024-10-16 | Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2 | 自动从自由文本中映射解剖学标志:基于Llama-2的大型语言模型的见解 | Mohamad Abdi, Gerardo Hemosillo Valadez, Halid Ziya Yerebakan | http://arxiv.org/pdf/2410.12686v1 | null |
2024-10-16 | MambaBEV: An efficient 3D detection model with Mamba2 | MambaBEV:一种基于Mamba2的高效3D检测模型 | Zihan You, Hao Wang, Qichao Zhao, Jinxiang Wang | http://arxiv.org/pdf/2410.12673v1 | null |
2024-10-16 | Cascade learning in multi-task encoder-decoder networks for concurrent bone segmentation and glenohumeral joint assessment in shoulder CT scans | 多任务编码器-解码器网络中的级联学习在肩部CT扫描中的并发骨分割与盂肱关节评估 | Luca Marsilio, Davide Marzorati, Matteo Rossi, Andrea Moglia, Luca Mainardi, Alfonso Manzotti, Pietro Cerveri | http://arxiv.org/pdf/2410.12641v1 | null |
2024-10-16 | CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training | CMAL:一种新颖的跨模态关联学习框架用于视觉-语言预训练 | Zhiyuan Ma, Jianjun Li, Guohui Li, Kaiyan Huang | http://arxiv.org/pdf/2410.12595v1 | null |
2024-10-16 | From Lab to Pocket: A Novel Continual Learning-based Mobile Application for Screening COVID-19 | 实验室到口袋:一种基于持续学习的筛查COVID-19新型移动应用程序 | Danny Falero, Muhammad Ashad Kabir, Nusrat Homaira | http://arxiv.org/pdf/2410.12589v1 | null |
2024-10-16 | Self-DenseMobileNet: A Robust Framework for Lung Nodule Classification using Self-ONN and Stacking-based Meta-Classifier | Self-DenseMobileNet:一种基于Self-ONN和堆叠元分类器的鲁棒肺结节分类框架 | Md. Sohanur Rahman, Muhammad E. H. Chowdhury, Hasib Ryan Rahman, Mosabber Uddin Ahmed, Muhammad Ashad Kabir, Sanjiban Sekhar Roy, Rusab Sarmun | http://arxiv.org/pdf/2410.12584v1 | null |
2024-10-16 | Adaptive Prompt Learning with SAM for Few-shot Scanning Probe Microscope Image Segmentation | 自适应提示学习与SAM结合的少样本扫描探针显微镜图像分割方法 | Yao Shen, Ziwei Wei, Chunmeng Liu, Shuming Wei, Qi Zhao, Kaiyang Zeng, Guangyao Li | http://arxiv.org/pdf/2410.12562v1 | null |
2024-10-16 | Development of Image Collection Method Using YOLO and Siamese Network | 基于YOLO与Siamese网络的图像采集方法开发 | Chan Young Shin, Ah Hyun Lee, Jun Young Lee, Ji Min Lee, Soo Jin Park | http://arxiv.org/pdf/2410.12561v1 | null |
2024-10-16 | Evaluating Utility of Memory Efficient Medical Image Generation: A Study on Lung Nodule Segmentation | 评估内存高效医学图像生成的实用性:肺结节分割研究 | Kathrin Khadra, Utku Türkbey | http://arxiv.org/pdf/2410.12542v1 | null |
2024-10-16 | Mind the Gap Between Prototypes and Images in Cross-domain Finetuning | 跨域微调中原型与图像之间的差异值得关注 | Hongduan Tian, Feng Liu, Zhanke Zhou, Tongliang Liu, Chengqi Zhang, Bo Han | http://arxiv.org/pdf/2410.12474v1 | null |
2024-10-16 | Attention-Guided Perturbation for Consistency Regularization in Semi-Supervised Medical Image Segmentation | 注意力引导扰动在半监督医学图像分割中的一致性正则化研究 | Yuxuan Cheng, Chenxi Shao, Jie Ma, Guoliang Li | http://arxiv.org/pdf/2410.12419v1 | null |
2024-10-16 | Feature Augmentation for Self-supervised Contrastive Learning: A Closer Look | 特征增强的自监督对比学习:深入探究 | Yong Zhang, Rui Zhu, Shifeng Zhang, Xu Zhou, Shifeng Chen, Xiaofan Chen | http://arxiv.org/pdf/2410.12396v1 | null |
2024-10-16 | Real-time Stereo-based 3D Object Detection for Streaming Perception | 基于实时立体视觉的流式感知三维目标检测 | Changcai Li, Zonghua Gu, Gang Chen, Libo Huang, Wei Zhang, Huihui Zhou | http://arxiv.org/pdf/2410.12394v1 | null |
2024-10-16 | Context-Infused Visual Grounding for Art | 语境增强的视觉定位技术在艺术领域的应用 | Selina Khan, Nanne van Noord | http://arxiv.org/pdf/2410.12369v1 | null |
2024-10-16 | PAPL-SLAM: Principal Axis-Anchored Monocular Point-Line SLAM | PAPL-SLAM:基于主轴锚定的单目点线SLAM系统 | Guanghao Li, Yu Cao, Qi Chen, Yifan Yang, Jian Pu | http://arxiv.org/pdf/2410.12324v1 | null |
2024-10-16 | Controlled Automatic Task-Specific Synthetic Data Generation for Hallucination Detection | 控制自动任务特定合成数据生成用于幻觉检测 | Yong Xie, Karan Aggarwal, Aitzaz Ahmad, Stephen Lau | http://arxiv.org/pdf/2410.12278v1 | null |
2024-10-16 | Leveraging Spatial Attention and Edge Context for Optimized Feature Selection in Visual Localization | 利用空间注意力和边缘上下文优化视觉定位中的特征选择 | Nanda Febri Istighfarin, HyungGi Jo | http://arxiv.org/pdf/2410.12240v1 | null |
2024-10-16 | Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety | 评估视觉-语言模型级联方法在零样本检测和施工安全中安全帽的关联应用增强 | Lucas Choi, Ross Greer | http://arxiv.org/pdf/2410.12225v1 | null |
2024-10-16 | Order-Aware Interactive Segmentation | 顺序感知交互式分割算法 | Bin Wang, Anwesa Choudhuri, Meng Zheng, Zhongpai Gao, Benjamin Planche, Andong Deng, Qin Liu, Terrence Chen, Ulas Bagci, Ziyan Wu | http://arxiv.org/pdf/2410.12214v1 | null |
2024-10-16 | CVCP-Fusion: On Implicit Depth Estimation for 3D Bounding Box Prediction | CVCP-Fusion:面向3D边界框预测的隐式深度估计研究 | Pranav Gupta, Rishabh Rengarajan, Viren Bankapur, Vedansh Mannem, Lakshit Ahuja, Surya Vijay, Kevin Wang | http://arxiv.org/pdf/2410.11211v2 | link |
2024-10-16 | Preserving Cardiac Integrity: A Topology-Infused Approach to Whole Heart Segmentation | 保护心脏完整性:一种融合拓扑信息的全心脏分割方法 | Chenyu Zhang, Wenxue Guan, Xiaodan Xing, Guang Yang | http://arxiv.org/pdf/2410.10551v2 | null |
2024-10-16 | Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP | 语义标记重加权:用于CLIP中的可解释可控文本嵌入 | Eunji Kim, Kyuhong Shim, Simyung Chang, Sungroh Yoon | http://arxiv.org/pdf/2410.08469v2 | null |
2024-10-16 | Delta-ICM: Entropy Modeling with Delta Function for Learned Image Compression | Delta-ICM:基于Delta函数的熵建模在学习图像压缩中的应用 | Takahiro Shindo, Taiju Watanabe, Yui Tatsumi, Hiroshi Watanabe | http://arxiv.org/pdf/2410.07669v2 | null |
2024-10-16 | Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification | 逻辑推理正则化:视觉分类中提高泛化能力的决策解释 | Zhaorui Tan, Xi Yang, Qiufeng Wang, Anh Nguyen, Kaizhu Huang | http://arxiv.org/pdf/2410.04492v3 | link |
2024-10-16 | Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features | Key-Grid:使用网格热图特征的无监督3D关键点检测方法 | Chengkai Hou, Zhengrong Xue, Bingyang Zhou, Jinghan Ke, Lin Shao, Huazhe Xu | http://arxiv.org/pdf/2410.02237v2 | null |
2024-10-16 | See Where You Read with Eye Gaze Tracking and Large Language Model | 基于视线追踪与大型语言模型的可视化阅读位置分析 | Sikai Yang, Gang Yan | http://arxiv.org/pdf/2409.19454v2 | null |
2024-10-16 | Active Fake: DeepFake Camouflage | 深度伪造伪装:主动伪造 | Pu Sun, Honggang Qi, Yuezun Li | http://arxiv.org/pdf/2409.03200v2 | null |
2024-10-16 | ViLReF: An Expert Knowledge Enabled Vision-Language Retinal Foundation Model | ViLReF:一种基于专家知识的视语言视网膜基础模型 | Shengzhu Yang, Jiawei Du, Jia Guo, Weihang Zhang, Hanruo Liu, Huiqi Li, Ningli Wang | http://arxiv.org/pdf/2408.10894v3 | link |
2024-10-16 | VrdONE: One-stage Video Visual Relation Detection | VrdONE:单阶段视频视觉关系检测 | Xinjie Jiang, Chenxi Zheng, Xuemiao Xu, Bangzhen Liu, Weiying Zheng, Huaidong Zhang, Shengfeng He | http://arxiv.org/pdf/2408.09408v2 | link |
2024-10-16 | Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation | Hyper-YOLO:当视觉目标检测遇见超图计算 | Yifan Feng, Jiangang Huang, Shaoyi Du, Shihui Ying, Jun-Hai Yong, Yipeng Li, Guiguang Ding, Rongrong Ji, Yue Gao | http://arxiv.org/pdf/2408.04804v2 | link |
2024-10-16 | AssemAI: Interpretable Image-Based Anomaly Detection for Manufacturing Pipelines | AssemAI: 面向制造管道的可解释基于图像异常检测 | Renjith Prasad, Chathurangi Shyalika, Ramtin Zand, Fadi El Kalach, Revathy Venkataramanan, Ramy Harik, Amit Sheth | http://arxiv.org/pdf/2408.02181v2 | null |
2024-10-16 | DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training | DNTextSpotter: 通过改进去噪训练实现任意形状场景文本检测 | Yu Xie, Qian Qiao, Jun Gao, Tianxiang Wu, Jiaqing Fan, Yue Zhang, Jielei Zhang, Huyang Sun | http://arxiv.org/pdf/2408.00355v2 | null |
2024-10-16 | Vision-Based Adaptive Robotics for Autonomous Surface Crack Repair | 基于视觉的自适应机器人技术用于自主表面裂缝修复 | Joshua Genova, Eric Cabrera, Vedhus Hoskere | http://arxiv.org/pdf/2407.16874v2 | null |
2024-10-16 | Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | 动态调整ViT适配中的参数与推理效率 | Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You | http://arxiv.org/pdf/2403.11808v2 | link |
2024-10-16 | Zero-shot Generalizable Incremental Learning for Vision-Language Object Detection | 零样本泛化增量学习在视觉-语言目标检测中的应用 | Jieren Deng, Haojian Zhang, Kun Ding, Jianhua Hu, Xingxuan Zhang, Yunkuan Wang | http://arxiv.org/pdf/2403.01680v3 | null |
2024-10-16 | MixedNUTS: Training-Free Accuracy-Robustness Balance via Nonlinearly Mixed Classifiers | MixedNUTS:通过非线性混合分类器实现训练无关的准确性与鲁棒性平衡 | Yatong Bai, Mo Zhou, Vishal M. Patel, Somayeh Sojoudi | http://arxiv.org/pdf/2402.02263v5 | link |
2024-10-16 | Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration | 自监督学习下的激光雷达三维点云与二维-三维神经校准 | Yifan Zhang, Siyu Ren, Junhui Hou, Jinjian Wu, Yixuan Yuan, Guangming Shi | http://arxiv.org/pdf/2401.12452v3 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-16 | ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training | ReLayout:基于布局增强的预训练迈向现实世界文档理解 | Zhouqiang Jiang, Bowen Wang, Junhao Chen, Yuta Nakashima | http://arxiv.org/pdf/2410.10471v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-16 | QueensCAMP: an RGB-D dataset for robust Visual SLAM | QueensCAMP:用于鲁棒视觉SLAM的RGB-D数据集 | Hudson M. S. Bruno, Esther L. Colombini, Sidney N. Givigi Jr | http://arxiv.org/pdf/2410.12520v1 | null |
2024-10-16 | Depth Estimation From Monocular Images With Enhanced Encoder-Decoder Architecture | 基于增强型编码器-解码器架构的单目图像深度估计 | Dabbrata Das, Argho Deb Das, Farhan Sadaf | http://arxiv.org/pdf/2410.11610v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-16 | Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models | 跨模态安全机制在大规模视觉-语言模型中的迁移 | Shicheng Xu, Liang Pang, Yunchang Zhu, Huawei Shen, Xueqi Cheng | http://arxiv.org/pdf/2410.12662v1 | null |
2024-10-16 | Exploring Model Kinship for Merging Large Language Models | 探索模型亲缘关系以合并大型语言模型 | Yedi Hu, Yunzhi Yao, Ningyu Zhang, Shumin Deng, Huajun Chen | http://arxiv.org/pdf/2410.12613v1 | null |
2024-10-16 | Consistency Calibration: Improving Uncertainty Calibration via Consistency among Perturbed Neighbors | 一致性校准:通过扰动邻居间一致性改善不确定性校准 | Linwei Tao, Haolan Guo, Minjing Dong, Chang Xu | http://arxiv.org/pdf/2410.12295v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-16 | FaceChain-FACT: Face Adapter with Decoupled Training for Identity-preserved Personalization | FaceChain-FACT:面向身份保持个性化的解耦训练人脸适配器 | Cheng Yu, Haoyu Xie, Lei Shang, Yang Liu, Jun Dan, Baigui Sun, Liefeng Bo | http://arxiv.org/pdf/2410.12312v1 | null |
2024-10-16 | Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models | 专家混合模型的个性化改造:面向视觉-语言模型的联邦提示学习 | Jun Luo, Chen Chen, Shandong Wu | http://arxiv.org/pdf/2410.10114v2 | null |
2024-10-16 | BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way | BroadWay:以无训练方式提升您的文本到视频生成模型性能 | Jiazi Bu, Pengyang Ling, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang | http://arxiv.org/pdf/2410.06241v2 | null |
2024-10-16 | Progressive Retinal Image Registration via Global and Local Deformable Transformations | 渐进式视网膜图像配准:全局与局部可变形变换方法 | Yepeng Liu, Baosheng Yu, Tian Chen, Yuliang Gu, Bo Du, Yongchao Xu, Jun Cheng | http://arxiv.org/pdf/2409.01068v2 | link |
2024-10-16 | Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization | 增强视觉-语言模型的鲁棒性:通过正交性学习与自正则化方法 | Jinlong Li, Dong Zhao, Zequn Jie, Elisa Ricci, Lin Ma, Nicu Sebe | http://arxiv.org/pdf/2407.08374v3 | null |
2024-10-16 | Ultra-High-Definition Image Restoration: New Benchmarks and A Dual Interaction Prior-Driven Solution | 超高清图像恢复:新基准与双交互先验驱动解决方案 | Liyan Wang, Cong Wang, Jinshan Pan, Xiaofeng Liu, Weixiang Zhou, Xiaoran Sun, Wei Wang, Zhixun Su | http://arxiv.org/pdf/2406.13607v4 | link |
2024-10-16 | Knowledge Circuits in Pretrained Transformers | 预训练Transformer中的知识回路 | Yunzhi Yao, Ningyu Zhang, Zekun Xi, Mengru Wang, Ziwen Xu, Shumin Deng, Huajun Chen | http://arxiv.org/pdf/2405.17969v2 | link |
2024-10-16 | In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation | Transformer之眼:用于自我中心注视估计的全局-局部相关性分析 | Bolin Lai, Miao Liu, Fiona Ryan, James M. Rehg | http://arxiv.org/pdf/2208.04464v3 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-16 | Gravity-aligned Rotation Averaging with Circular Regression | 基于圆周回归的重力对齐旋转平均方法 | Linfei Pan, Marc Pollefeys, Dániel Baráth | http://arxiv.org/pdf/2410.12763v1 | null |
2024-10-16 | Optimizing 3D Geometry Reconstruction from Implicit Neural Representations | 优化基于隐式神经表示的三维几何重建 | Shen Fan, Przemyslaw Musialski | http://arxiv.org/pdf/2410.12725v1 | null |
2024-10-16 | 3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation | 3DIS:基于深度驱动的解耦实例合成用于文本到图像生成 | Dewei Zhou, Ji Xie, Zongxin Yang, Yi Yang | http://arxiv.org/pdf/2410.12669v1 | null |
2024-10-16 | MambaPainter: Neural Stroke-Based Rendering in a Single Step | MambaPainter:单步神经笔触渲染技术 | Tomoya Sawada, Marie Katsurai | http://arxiv.org/pdf/2410.12524v1 | null |
2024-10-16 | Triplet: Triangle Patchlet for Mesh-Based Inverse Rendering and Scene Parameters Approximation | Triplet:用于基于网格的逆渲染与场景参数逼近的三角形面片方法 | Jiajie Yang | http://arxiv.org/pdf/2410.12414v1 | null |
2024-10-16 | LoD-Loc: Aerial Visual Localization using LoD 3D Map with Neural Wireframe Alignment | LoD-Loc: 利用带神经线框对齐的LoD三维地图进行航空视觉定位 | Juelin Zhu, Shen Yan, Long Wang, Shengyue Zhang, Yu Liu, Maojun Zhang | http://arxiv.org/pdf/2410.12269v1 | null |
2024-10-16 | ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video | ScaleFlow++:视频中的稳健与精确三维运动估计 | Han Ling, Quansen Sun | http://arxiv.org/pdf/2407.09797v2 | null |
2024-10-16 | Topological reconstruction of sampled surfaces via Morse theory | 通过莫尔斯理论进行采样曲面的拓扑重建 | Franco Coltraro, Jaume Amorós, Maria Alberich-Carramiñana, Carme Torras | http://arxiv.org/pdf/2405.17257v2 | null |
2024-10-16 | No Bells, Just Whistles: Sports Field Registration by Leveraging Geometric Properties | 无铃声,唯有哨声:利用几何属性进行运动场配准 | Marc Gutiérrez-Pérez, Antonio Agudo | http://arxiv.org/pdf/2404.08401v2 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-16 | Rethinking Visual Counterfactual Explanations Through Region Constraint | 重新审视通过区域约束的可视化反事实解释 | Bartlomiej Sobieski, Jakub Grzywaczewski, Bartlomiej Sadlej, Matthew Tivnan, Przemyslaw Biecek | http://arxiv.org/pdf/2410.12591v1 | null |
2024-10-16 | A Primal-dual algorithm for image reconstruction with ICNNs | 图像重建的原始-对偶算法与ICNNs | Hok Shing Wong, Matthias J. Ehrhardt, Subhadip Mukherjee | http://arxiv.org/pdf/2410.12441v1 | null |
2024-10-16 | AdaCropFollow: Self-Supervised Online Adaptation for Visual Under-Canopy Navigation | 自适应作物跟随:视觉冠下导航的自监督在线适应方法 | Arun N. Sivakumar, Federico Magistri, Mateus V. Gasparino, Jens Behley, Cyrill Stachniss, Girish Chowdhary | http://arxiv.org/pdf/2410.12411v1 | null |
2024-10-16 | Beyond Coarse-Grained Matching in Video-Text Retrieval | 视频-文本检索中超越粗粒度匹配策略 | Aozhu Chen, Hazel Doughty, Xirong Li, Cees G. M. Snoek | http://arxiv.org/pdf/2410.12407v1 | null |
2024-10-16 | De-Identification of Medical Imaging Data: A Comprehensive Tool for Ensuring Patient Privacy | 医疗影像数据去标识化:确保患者隐私的全面工具 | Moritz Rempe, Lukas Heine, Constantin Seibold, Fabian Hörst, Jens Kleesiek | http://arxiv.org/pdf/2410.12402v1 | null |
2024-10-16 | Stylistic Multi-Task Analysis of Ukiyo-e Woodblock Prints | 风格多样化的任务分析:浮世绘木版画研究 | Selina Khan, Nanne van Noord | http://arxiv.org/pdf/2410.12379v1 | null |
2024-10-16 | DAT: Improving Adversarial Robustness via Generative Amplitude Mix-up in Frequency Domain | DAT:通过频率域生成振幅混合提升对抗鲁棒性 | Fengpeng Li, Kemou Li, Haiwei Wu, Jinyu Tian, Jiantao Zhou | http://arxiv.org/pdf/2410.12307v1 | null |
2024-10-16 | Fool Me Once? Contrasting Textual and Visual Explanations in a Clinical Decision-Support Setting | 一次被骗?在临床决策支持环境中对比文本与视觉解释 | Maxime Kayser, Bayar Menzat, Cornelius Emde, Bogdan Bercean, Alex Novak, Abdala Espinosa, Bartlomiej W. Papiez, Susanne Gaube, Thomas Lukasiewicz, Oana-Maria Camburu | http://arxiv.org/pdf/2410.12284v1 | null |
2024-10-16 | Advancing Healthcare: Innovative ML Approaches for Improved Medical Imaging in Data-Constrained Environments | 推进医疗健康:数据受限环境下提升医学影像的创新的ML方法 | Al Amin, Kamrul Hasan, Saleh Zein-Sabatto, Liang Hong, Sachin Shetty, Imtiaz Ahmed, Tariqul Islam | http://arxiv.org/pdf/2410.12245v1 | null |
2024-10-16 | Test-time adaptation for image compression with distribution regularization | 图像压缩的测试时自适应方法与分布正则化 | Kecheng Chen, Pingping Zhang, Tiexin Qin, Shiqi Wang, Hong Yan, Haoliang Li | http://arxiv.org/pdf/2410.12191v1 | null |
2024-10-16 | PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation | PIVOT-R:面向机器人操作的原语驱动关注路径的世界模型 | Kaidong Zhang, Pengzhen Ren, Bingqian Lin, Junfan Lin, Shikui Ma, Hang Xu, Xiaodan Liang | http://arxiv.org/pdf/2410.10394v2 | null |
2024-10-16 | MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting | MuseTalk:基于潜在空间修复的实时高质量唇同步技术 | Yue Zhang, Minhao Liu, Zhaokang Chen, Bin Wu, Yubin Zeng, Chao Zhan, Yingjie He, Junxin Huang, Wenjiang Zhou | http://arxiv.org/pdf/2410.10122v2 | link |
2024-10-16 | Tri-Cam: Practical Eye Gaze Tracking via Camera Network | Tri-Cam:基于摄像头网络的实用眼动追踪技术 | Sikai Yang | http://arxiv.org/pdf/2409.19554v2 | null |
2024-10-16 | Video-to-Audio Generation with Hidden Alignment | 隐藏对齐的视频到音频生成技术 | Manjie Xu, Chenxing Li, Xinyi Tu, Yong Ren, Rilin Chen, Yu Gu, Wei Liang, Dong Yu | http://arxiv.org/pdf/2407.07464v2 | null |
2024-10-16 | Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization | 扩大个性化图像美学评估规模:基于任务向量定制化 | Jooyeol Yun, Jaegul Choo | http://arxiv.org/pdf/2407.07176v2 | link |
2024-10-16 | Understanding Figurative Meaning through Explainable Visual Entailment | 通过可解释视觉蕴含理解隐喻意义 | Arkadiy Saakyan, Shreyas Kulkarni, Tuhin Chakrabarty, Smaranda Muresan | http://arxiv.org/pdf/2405.01474v2 | link |
2024-10-16 | Adaptive Convolutional Neural Network for Image Super-resolution | 自适应卷积神经网络在图像超分辨率中的应用 | Chunwei Tian, Xuanyu Zhang, Tao Wang, Yongjun Zhang, Qi Zhu, Chia-Wen Lin | http://arxiv.org/pdf/2402.15704v4 | link |
2024-10-16 | Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing | 隐变量反转与时间步感知采样在无训练非刚性编辑中的应用 | Yunji Jung, Seokju Lee, Tair Djanibekov, Hyunjung Shim, Jong Chul Ye | http://arxiv.org/pdf/2402.08601v3 | null |