[UPDATED!] 2024-10-09 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-10-09 | IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation | IterComp:基于模型库的迭代组合感知反馈学习在文本到图像生成中的应用 | Xinchen Zhang, Ling Yang, Guohao Li, Yaqi Cai, Jiake Xie, Yong Tang, Yujiu Yang, Mengdi Wang, Bin Cui | http://arxiv.org/pdf/2410.07171v1 | null |
| 2024-10-09 | AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation | AvatarGO:零样本4D人-物交互生成与动画技术 | Yukang Cao, Liang Pan, Kai Han, Kwan-Yee K. Wong, Ziwei Liu | http://arxiv.org/pdf/2410.07164v1 | null |
| 2024-10-09 | InstructG2I: Synthesizing Images from Multimodal Attributed Graphs | InstructG2I: 从多模态属性图合成图像 | Bowen Jin, Ziqi Pang, Bingjun Guo, Yu-Xiong Wang, Jiaxuan You, Jiawei Han | http://arxiv.org/pdf/2410.07157v1 | null |
| 2024-10-09 | Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis | Trans4D:面向组合文本到四维合成的真实几何感知过渡技术 | Bohan Zeng, Ling Yang, Siyu Li, Jiaming Liu, Zixiang Zhang, Juanxi Tian, Kaixin Zhu, Yongzhen Guo, Fu-Yun Wang, Minkai Xu, et al. | http://arxiv.org/pdf/2410.07155v1 | null |
| 2024-10-09 | Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control | 多视角一致性PBR纹理的协同控制联合生成技术研究 | Shimon Vainer, Konstantin Kutsy, Dante De Nigris, Ciara Rowles, Slava Elizarov, Simon Donné | http://arxiv.org/pdf/2410.06985v1 | null |
| 2024-10-09 | Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | 生成过程中的表示对齐:训练扩散变换器比你想的更容易 | Sihyun Yu, Sangkyung Kwak, Huiwon Jang, Jongheon Jeong, Jonathan Huang, Jinwoo Shin, Saining Xie | http://arxiv.org/pdf/2410.06940v1 | null |
| 2024-10-09 | Compositional Entailment Learning for Hyperbolic Vision-Language Models | 组合蕴含学习在双曲视觉-语言模型中的应用 | Avik Pal, Max van Spengler, Guido Maria D'Amely di Melendugno, Alessandro Flaborea, Fabio Galasso, Pascal Mettes | http://arxiv.org/pdf/2410.06912v1 | null |
| 2024-10-09 | Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis | 利用大型语言模型与布局到图像合成增强少样本检测 | Ahmed Abdullah, Nikolas Ebert, Oliver Wasenmüller | http://arxiv.org/pdf/2410.06841v1 | null |
| 2024-10-09 | Transesophageal Echocardiography Generation using Anatomical Models | 使用解剖模型的经食管超声心动图生成技术研究 | Emmanuel Oladokun, Musa Abdulkareem, Jurica Šprem, Vicente Grau | http://arxiv.org/pdf/2410.06781v1 | null |
| 2024-10-09 | Diff-FMT: Diffusion Models for Fluorescence Molecular Tomography | Diff-FMT: 用于荧光分子层析的扩散模型研究 | Qianqian Xue, Peng Zhang, Xingyu Liu, Wenjian Wang, Guanglei Zhang | http://arxiv.org/pdf/2410.06757v1 | null |
| 2024-10-09 | Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques | 抑制内容偏移:通过现成生成技术实现更优的扩散特征 | Benyuan Meng, Qianqian Xu, Zitai Wang, Zhiyong Yang, Xiaochun Cao, Qingming Huang | http://arxiv.org/pdf/2410.06719v1 | null |
| 2024-10-09 | Decouple-Then-Merge: Towards Better Training for Diffusion Models | 解耦后合并:迈向更好的扩散模型训练方法 | Qianli Ma, Xuefei Ning, Dongrui Liu, Li Niu, Linfeng Zhang | http://arxiv.org/pdf/2410.06664v1 | null |
| 2024-10-09 | DDRN:a Data Distribution Reconstruction Network for Occluded Person Re-Identification | DDRN:一种用于遮挡人物重识别的数据分布重建网络 | Zhaoyong Wang, Yujie Liu, Mingyue Li, Wenxin Zhang, Zongmin Li | http://arxiv.org/pdf/2410.06600v1 | null |
| 2024-10-09 | InstantIR: Blind Image Restoration with Instant Generative Reference | InstantIR:基于即时生成参考的盲图像恢复技术 | Jen-Yuan Huang, Haofan Wang, Qixun Wang, Xu Bai, Hao Ai, Peng Xing, Jen-Tse Huang | http://arxiv.org/pdf/2410.06551v1 | null |
| 2024-10-09 | HFH-Font: Few-shot Chinese Font Synthesis with Higher Quality, Faster Speed, and Higher Resolution | HFH-Font:高质量、快速、高分辨率下的少样本中文字体合成技术 | Hua Li, Zhouhui Lian | http://arxiv.org/pdf/2410.06488v1 | null |
| 2024-10-09 | Does Spatial Cognition Emerge in Frontier Models? | 空间认知在前沿模型中是否涌现? | Santhosh Kumar Ramakrishnan, Erik Wijmans, Philipp Kraehenbuehl, Vladlen Koltun | http://arxiv.org/pdf/2410.06468v1 | null |
| 2024-10-09 | Dynamic Diffusion Transformer | 动态扩散Transformer | Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Yibing Song, Gao Huang, Fan Wang, Yang You | http://arxiv.org/pdf/2410.03456v2 | link |
| 2024-10-09 | Can Your Generative Model Detect Out-of-Distribution Covariate Shift? | 能否用您的生成模型检测分布外协变量漂移? | Christiaan Viviers, Amaan Valiuddin, Francisco Caetano, Lemar Abdi, Lena Filatova, Peter de With, Fons van der Sommen | http://arxiv.org/pdf/2409.03043v2 | link |
| 2024-10-09 | Atlas Gaussians Diffusion for 3D Generation | Atlas 高斯扩散在三维生成中的应用 | Haitao Yang, Yuan Dong, Hanwen Jiang, Dejia Xu, Georgios Pavlakos, Qixing Huang | http://arxiv.org/pdf/2408.13055v2 | null |
| 2024-10-09 | FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow | FlowDreamer:基于矫正流的高保真文本至3D生成探索 | Hangyu Li, Xiangxiang Chu, Dingyuan Shi, Wang Lin | http://arxiv.org/pdf/2408.05008v3 | null |
| 2024-10-09 | MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence | MovieDreamer:面向连贯长视觉序列的层次化生成方法 | Canyu Zhao, Mingyu Liu, Wen Wang, Weihua Chen, Fan Wang, Hao Chen, Bo Zhang, Chunhua Shen | http://arxiv.org/pdf/2407.16655v2 | null |
| 2024-10-09 | MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers | MeshAnything:基于自回归变换器的艺术家创建网格生成技术 | Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, et al. | http://arxiv.org/pdf/2406.10163v2 | link |
| 2024-10-09 | Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models | 防御性遗忘与对抗训练在扩散模型中实现稳健概念消除的研究 | Yimeng Zhang, Xin Chen, Jinghan Jia, Yihua Zhang, Chongyu Fan, Jiancheng Liu, Mingyi Hong, Ke Ding, Sijia Liu | http://arxiv.org/pdf/2405.15234v3 | link |
| 2024-10-09 | Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training | 通过大规模无动作视频预训练学习可操作离散扩散策略 | Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li | http://arxiv.org/pdf/2402.14407v4 | null |
| 2024-10-09 | Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training | 从错误中学习:文本到图像扩散模型训练的迭代提示重标记方法 | Xinyan Chen, Jiaxin Ge, Tianjun Zhang, Jiaming Liu, Shanghang Zhang | http://arxiv.org/pdf/2312.16204v3 | link |
| 2024-10-09 | A Unified Generative Framework for Realistic Lidar Simulation in Autonomous Driving Systems | 统一生成框架在自动驾驶系统中的真实激光雷达模拟 | Hamed Haghighi, Mehrdad Dianati, Valentina Donzella, Kurt Debattista | http://arxiv.org/pdf/2312.15817v2 | link |
| 2024-10-09 | The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization | 去噪中的彩票假设:迈向语义驱动的初始化研究 | Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa | http://arxiv.org/pdf/2312.08872v4 | null |
| 2024-10-09 | CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling | CMMD:用于视频-音频条件建模的对比多模态扩散算法 | Ruihan Yang, Hannes Gamper, Sebastian Braun | http://arxiv.org/pdf/2312.05412v2 | null |
| 2024-10-09 | Enforcing 3D Topological Constraints in Composite Objects via Implicit Functions | 通过隐函数在复合对象中强制实施三维拓扑约束 | Hieu Le, Jingyi Xu, Nicolas Talabot, Jiancheng Yang, Pascal Fua | http://arxiv.org/pdf/2307.08716v2 | null |
| 2024-10-09 | MedLSAM: Localize and Segment Anything Model for 3D CT Images | MedLSAM:面向三维CT图像的定位与分割任意模型 | Wenhui Lei, Xu Wei, Xiaofan Zhang, Kang Li, Shaoting Zhang | http://arxiv.org/pdf/2306.14752v4 | link |
| 2024-10-09 | Guided Image Synthesis via Initial Image Editing in Diffusion Model | 指导性图像合成:通过扩散模型中的初始图像编辑实现 | Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa | http://arxiv.org/pdf/2305.03382v3 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-10-09 | MM-Ego: Towards Building Egocentric Multimodal LLMs | MM-Ego:构建以自我为中心的多模态大语言模型之路 | Hanrong Ye, Haotian Zhang, Erik Daxberger, Lin Chen, Zongyu Lin, Yanghao Li, Bowen Zhang, Haoxuan You, Dan Xu, Zhe Gan, et al. | http://arxiv.org/pdf/2410.07177v1 | null |
| 2024-10-09 | Do better language models have crisper vision? | 更好的语言模型是否拥有更清晰的视觉? | Jona Ruthardt, Gertjan J. Burghouts, Serge Belongie, Yuki M. Asano | http://arxiv.org/pdf/2410.07173v1 | null |
| 2024-10-09 | Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate | 利用模态融合率解读大型视觉-语言模型中的跨模态对齐 | Qidong Huang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu | http://arxiv.org/pdf/2410.07167v1 | null |
| 2024-10-09 | Towards Interpreting Visual Information Processing in Vision-Language Models | 视觉-语言模型中视觉信息处理的解释性研究进展 | Clement Neo, Luke Ong, Philip Torr, Mor Geva, David Krueger, Fazl Barez | http://arxiv.org/pdf/2410.07149v1 | null |
| 2024-10-09 | Personalized Visual Instruction Tuning | 个性化视觉指令调优 | Renjie Pi, Jianshu Zhang, Tianyang Han, Jipeng Zhang, Rui Pan, Tong Zhang | http://arxiv.org/pdf/2410.07113v1 | null |
| 2024-10-09 | Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology | 面向真实无人机视觉-语言导航:平台、基准与方法论 | Xiangyu Wang, Donglin Yang, Ziqin Wang, Hohin Kwan, Jinyu Chen, Wenjun Wu, Hongsheng Li, Yue Liao, Si Liu | http://arxiv.org/pdf/2410.07087v1 | null |
| 2024-10-09 | Pixtral 12B | Pixtral 12B | Pravesh Agrawal, Szymon Antoniak, Emma Bou Hanna, Devendra Chaplot, Jessica Chudnovsky, Saurabh Garg, Theophile Gervet, Soham Ghosh, Amélie Héliou, Paul Jacob, et al. | http://arxiv.org/pdf/2410.07073v1 | null |
| 2024-10-09 | TinyEmo: Scaling down Emotional Reasoning via Metric Projection | TinyEmo: 通过度量投影缩小情感推理规模 | Cristian Gutierrez | http://arxiv.org/pdf/2410.07062v1 | null |
| 2024-10-09 | Secure Video Quality Assessment Resisting Adversarial Attacks | 安全视频质量评估:抵御对抗性攻击 | Ao-Xiang Zhang, Yu Ran, Weixuan Tang, Yuan-Gen Wang, Qingxiao Guan, Chunsheng Yang | http://arxiv.org/pdf/2410.06866v1 | null |
| 2024-10-09 | From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models | 从像素到标记:重新审视大型视觉-语言模型中的对象幻觉问题 | Yuying Shang, Xinyi Zeng, Yutao Zhu, Xiao Yang, Zhengwei Fang, Jingyuan Zhang, Jiawei Chen, Zinan Liu, Yu Tian | http://arxiv.org/pdf/2410.06795v1 | null |
| 2024-10-09 | HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding | HERM: 面向以人为中心理解的多模态LLM基准测试与增强 | Keliang Li, Zaifei Yang, Jiahe Zhao, Hongze Shen, Ruibing Hou, Hong Chang, Shiguang Shan, Xilin Chen | http://arxiv.org/pdf/2410.06777v1 | null |
| 2024-10-09 | To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models | 深度探讨多模态大型语言模型中的连接器选择:保留还是压缩 | Junyan Lin, Haoran Chen, Dawei Zhu, Xiaoyu Shen | http://arxiv.org/pdf/2410.06765v1 | null |
| 2024-10-09 | Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models | 突破视觉感知:针对大型视觉-语言模型的编码视觉令牌的对抗性攻击 | Yubo Wang, Chaohu Liu, Yanqiu Qu, Haoyu Cao, Deqiang Jiang, Linli Xu | http://arxiv.org/pdf/2410.06699v1 | null |
| 2024-10-09 | Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | 增强多模态大语言模型以实现详细准确的视频字幕生成:基于多轮偏好优化的方法 | Changli Tang, Yixuan Li, Yudong Yang, Jimin Zhuang, Guangzhi Sun, Wei Li, Zujun Ma, Chao Zhang | http://arxiv.org/pdf/2410.06682v1 | null |
| 2024-10-09 | ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time | ETA:推理时视觉语言模型的安全评估与对齐研究 | Yi Ding, Bolian Li, Ruqi Zhang | http://arxiv.org/pdf/2410.06625v1 | null |
| 2024-10-09 | Decomposing Relationship from 1-to-N into N 1-to-1 for Text-Video Retrieval | 将1对N关系分解为N个1对1关系以用于文本-视频检索 | Jian Xiao, Zhenzhen Hu, Jia Li, Richang Hong | http://arxiv.org/pdf/2410.06618v1 | null |
| 2024-10-09 | Deep Correlated Prompting for Visual Recognition with Missing Modalities | 深度相关提示用于缺失模态的视觉识别 | Lianyu Hu, Tongkai Shi, Wei Feng, Fanhua Shang, Liang Wan | http://arxiv.org/pdf/2410.06558v1 | null |
| 2024-10-09 | The Sampling-Gaussian for stereo matching | 采样高斯在立体匹配中的应用 | Baiyu Pan, jichao jiao, Bowen Yao, Jianxin Pang, Jun Cheng | http://arxiv.org/pdf/2410.06527v1 | null |
| 2024-10-09 | IC3M: In-Car Multimodal Multi-object Monitoring for Abnormal Status of Both Driver and Passengers | IC3M:车载多模态多目标监控,用于识别驾驶员与乘客异常状态 | Zihan Fang, Zheng Lin, Senkang Hu, Hangcheng Cao, Yiqin Deng, Xianhao Chen, Yuguang Fang | http://arxiv.org/pdf/2410.02592v2 | null |
| 2024-10-09 | DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM | DTVLT:基于LLM的多模态多样化文本基准视觉语言跟踪 | Xuchen Li, Shiyu Hu, Xiaokun Feng, Dailing Zhang, Meiqi Wu, Jing Zhang, Kaiqi Huang | http://arxiv.org/pdf/2410.02492v2 | null |
| 2024-10-09 | LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | LLaVA-MoD:通过MoE知识蒸馏实现LLaVA小型化 | Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Lei Zhang, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, et al. | http://arxiv.org/pdf/2408.15881v2 | link |
| 2024-10-09 | Towards Semantic Equivalence of Tokenization in Multimodal LLM | 迈向多模态大语言模型中标记化语义等价的实现 | Shengqiong Wu, Hao Fei, Xiangtai Li, Jiayi Ji, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan | http://arxiv.org/pdf/2406.05127v3 | null |
| 2024-10-09 | LG-VQ: Language-Guided Codebook Learning | LG-VQ:语言引导的码本学习 | Guotao Liang, Baoquan Zhang, Yaowei Wang, Xutao Li, Yunming Ye, Huaibin Wang, Chuyao Luo, Kola Ye, linfeng Luo | http://arxiv.org/pdf/2405.14206v2 | null |
| 2024-10-09 | DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM | DTLLM-VLT: 基于LLM的视觉语言跟踪多样化文本生成方法 | Xuchen Li, Xiaokun Feng, Shiyu Hu, Meiqi Wu, Dailing Zhang, Jing Zhang, Kaiqi Huang | http://arxiv.org/pdf/2405.12139v2 | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-10-09 | DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation | DreamMesh4D:基于稀疏控制高斯网格混合表示的视频到四维生成技术 | Zhiqi Li, Yiming Chen, Peidong Liu | http://arxiv.org/pdf/2410.06756v1 | null |
| 2024-10-09 | MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes | MimicTalk:几分钟内模仿个性化的表情丰富的3D说话脸庞 | Zhenhui Ye, Tianyun Zhong, Yi Ren, Ziyue Jiang, Jiawei Huang, Rongjie Huang, Jinglin Liu, Jinzheng He, Chen Zhang, Zehan Wang, et al. | http://arxiv.org/pdf/2410.06734v1 | null |
| 2024-10-09 | 3D Representation Methods: A Survey | 3D表示方法:综述 | Zhengren Wang | http://arxiv.org/pdf/2410.06475v1 | null |
| 2024-10-09 | EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis | EVER:面向实时视图合成的精确体积椭球渲染方法 | Alexander Mai, Peter Hedman, George Kopanas, Dor Verbin, David Futschik, Qiangeng Xu, Falko Kuester, Jonathan T. Barron, Yinda Zhang | http://arxiv.org/pdf/2410.01804v3 | null |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-10-09 | ES-Gaussian: Gaussian Splatting Mapping via Error Space-Based Gaussian Completion | ES-Gaussian:基于误差空间高斯补全的高斯溅射建图 | Lu Chen, Yingfu Zeng, Haoang Li, Zhitao Deng, Jiafu Yan, Zhenjun Zhao | http://arxiv.org/pdf/2410.06613v1 | null |
| 2024-10-09 | Free-DyGS: Camera-Pose-Free Scene Reconstruction based on Gaussian Splatting for Dynamic Surgical Videos | Free-DyGS:基于高斯溅射的动态手术视频无相机位姿场景重建 | Qian Li, Shuojue Yang, Daiyun Shen, Yueming Jin | http://arxiv.org/pdf/2409.01003v2 | null |
| 2024-10-09 | HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior | HAHA:具有纹理网格先验的高度关节化高斯人体化身 | David Svitov, Pietro Morerio, Lourdes Agapito, Alessio Del Bue | http://arxiv.org/pdf/2404.01053v2 | link |
| 2024-10-09 | StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering | StopThePop:用于视图一致实时渲染的排序高斯溅射技术 | Lukas Radl, Michael Steiner, Mathias Parger, Alexander Weinrauch, Bernhard Kerbl, Markus Steinberger | http://arxiv.org/pdf/2402.00525v3 | link |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-10-09 | JPEG Inspired Deep Learning | JPEG启发的深度学习 | Ahmed H. Salamah, Kaixiang Zheng, Yiwen Liu, En-Hui Yang | http://arxiv.org/pdf/2410.07081v1 | null |
| 2024-10-09 | S2HPruner: Soft-to-Hard Distillation Bridges the Discretization Gap in Pruning | S2HPruner:软硬蒸馏法桥接剪枝中的离散化差距 | Weihao Lin, Shengji Tang, Chong Yu, Peng Ye, Tao Chen | http://arxiv.org/pdf/2410.07046v1 | null |
| 2024-10-09 | Structure-Centric Robust Monocular Depth Estimation via Knowledge Distillation | 结构中心型鲁棒单目深度估计通过知识蒸馏方法 | Runze Chen, Haiyong Luo, Fang Zhao, Jingze Yu, Yupeng Jia, Juan Wang, Xuepeng Ma | http://arxiv.org/pdf/2410.06982v1 | null |
| 2024-10-09 | Perceptual Quality Assessment of Trisoup-Lifting Encoded 3D Point Clouds | 感知质量评估:Trisoup-Lifting编码的三维点云 | Juncheng Long, Honglei Su, Qi Liu, Hui Yuan, Wei Gao, Jiarun Song, Zhou Wang | http://arxiv.org/pdf/2410.06689v1 | null |
| 2024-10-09 | DRUPI: Dataset Reduction Using Privileged Information | DRUPI:利用特权信息进行数据集缩减技术研究 | Shaobo Wang, Yantai Yang, Shuaiyu Zhang, Chenghao Sun, Weiya Li, Xuming Hu, Linfeng Zhang | http://arxiv.org/pdf/2410.01611v2 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-10-09 | CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition | CHASE:基于骨架的多实体动作识别的凸包自适应平移学习算法 | Yuhang Wen, Mengyuan Liu, Songtao Wu, Beichen Ding | http://arxiv.org/pdf/2410.07153v1 | link |
| 2024-10-09 | Diagnosis of Malignant Lymphoma Cancer Using Hybrid Optimized Techniques Based on Dense Neural Networks | 基于密集神经网络混合优化技术的恶性淋巴瘤癌症诊断研究 | Salah A. Aly, Ali Bakhiet, Mazen Balat | http://arxiv.org/pdf/2410.06974v1 | null |
| 2024-10-09 | Bridge the Points: Graph-based Few-shot Segment Anything Semantically | 桥接点:基于图的少样本语义分割任意事物 | Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei | http://arxiv.org/pdf/2410.06964v1 | null |
| 2024-10-09 | Learning from Spatio-temporal Correlation for Semi-Supervised LiDAR Semantic Segmentation | 基于时空相关性学习的半监督LiDAR语义分割 | Seungho Lee, Hwijeong Lee, Hyunjung Shim | http://arxiv.org/pdf/2410.06893v1 | null |
| 2024-10-09 | Selecting the Best Sequential Transfer Path for Medical Image Segmentation with Limited Labeled Data | 最佳顺序迁移路径选择:有限标注数据下的医学图像分割 | Jingyun Yang, Jingge Wang, Guoqing Zhang, Yang Li | http://arxiv.org/pdf/2410.06892v1 | null |
| 2024-10-09 | Evaluating Model Performance with Hard-Swish Activation Function Adjustments | 评估Hard-Swish激活函数调整下的模型性能 | Sai Abhinav Pydimarry, Shekhar Madhav Khairnar, Sofia Garces Palacios, Ganesh Sankaranarayanan, Darian Hoagland, Dmitry Nepomnayshy, Huu Phong Nguyen | http://arxiv.org/pdf/2410.06879v1 | null |
| 2024-10-09 | SurANet: Surrounding-Aware Network for Concealed Object Detection via Highly-Efficient Interactive Contrastive Learning Strategy | SurANet:基于高效交互对比学习策略的周界感知网络用于隐藏物体检测 | Yuhan Kang, Qingpeng Li, Leyuan Fang, Jian Zhao, Xuelong Li | http://arxiv.org/pdf/2410.06842v1 | null |
| 2024-10-09 | An Improved Approach for Cardiac MRI Segmentation based on 3D UNet Combined with Papillary Muscle Exclusion | 基于3D UNet并结合乳头肌排除的心脏MRI分割改进方法 | Narjes Benameur, Ramzi Mahmoudi, Mohamed Deriche, Amira fayouka, Imene Masmoudi, Nessrine Zoghlami | http://arxiv.org/pdf/2410.06818v1 | null |
| 2024-10-09 | Rethinking the Evaluation of Visible and Infrared Image Fusion | 重新审视可见光与红外图像融合的评价方法 | Dayan Guan, Yixuan Wu, Tianzhu Liu, Alex C. Kot, Yanfeng Gu | http://arxiv.org/pdf/2410.06811v1 | null |
| 2024-10-09 | QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model | QuadMamba:面向视觉状态空间模型的基于四叉树的选择性扫描学习 | Fei Xie, Weijia Zhang, Zhongdao Wang, Chao Ma | http://arxiv.org/pdf/2410.06806v1 | null |
| 2024-10-09 | Utilizing Transfer Learning and pre-trained Models for Effective Forest Fire Detection: A Case Study of Uttarakhand | 利用迁移学习和预训练模型实现有效的森林火灾检测:北阿坎德邦案例研究 | Hari Prabhat Gupta, Rahul Mishra | http://arxiv.org/pdf/2410.06743v1 | null |
| 2024-10-09 | Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy | 评估点云着色对语义分割准确性的影响 | Qinfeng Zhu, Jiaze Cao, Yuanzhi Cai, Lei Fan | http://arxiv.org/pdf/2410.06725v1 | null |
| 2024-10-09 | Fourier-based Action Recognition for Wildlife Behavior Quantification with Event Cameras | 基于傅里叶的行为识别方法:事件相机在野生动物行为量化中的应用 | Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez, Tom Hart, Alex Kacelnik, Guillermo Gallego | http://arxiv.org/pdf/2410.06698v1 | null |
| 2024-10-09 | Continual Learning in the Frequency Domain | 频率域中的持续学习 | Ruiqi Liu, Boyu Diao, Libo Huang, Zijia An, Zhulin An, Yongjun Xu | http://arxiv.org/pdf/2410.06645v1 | null |
| 2024-10-09 | Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments | Open-RGBT:开放世界环境中的开放词汇RGB-T零样本语义分割 | Meng Yu, Luojie Yang, Xunjie He, Yi Yang, Yufeng Yue | http://arxiv.org/pdf/2410.06626v1 | null |
| 2024-10-09 | Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers | Pair-VPR:基于视觉变换器的视觉地点识别之地域感知预训练与对比对分类方法 | Stephen Hausler, Peyman Moghadam | http://arxiv.org/pdf/2410.06614v1 | null |
| 2024-10-09 | Towards Natural Image Matting in the Wild via Real-Scenario Prior | 基于真实场景先验的野外自然图像抠图 | Ruihao Xia, Yu Liang, Peng-Tao Jiang, Hao Zhang, Qianru Sun, Yang Tang, Bo Li, Pan Zhou | http://arxiv.org/pdf/2410.06593v1 | null |
| 2024-10-09 | On The Relationship between Visual Anomaly-free and Anomalous Representations | 视觉无异常与异常表示之间的关系研究 | Riya Sadrani, Hrishikesh Sharma, Ayush Bachan | http://arxiv.org/pdf/2410.06576v1 | null |
| 2024-10-09 | MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging | MedImageInsight:面向通用领域医学影像的开源嵌入模型 | Noel C. F. Codella, Ying Jin, Shrey Jain, Yu Gu, Ho Hin Lee, Asma Ben Abacha, Alberto Santamaria-Pang, Will Guyman, Naiteek Sangani, Sheng Zhang, et al. | http://arxiv.org/pdf/2410.06542v1 | null |
| 2024-10-09 | Deep Learning Ensemble for Predicting Diabetic Macular Edema Onset Using Ultra-Wide Field Color Fundus Image | 深度学习集成模型预测糖尿病黄斑水肿发病期:基于超广角彩色眼底图像分析 | Pengyao Qin, Arun J. Thirunavukarasu, Le Zhang | http://arxiv.org/pdf/2410.06483v1 | null |
| 2024-10-09 | LoTLIP: Improving Language-Image Pre-training for Long Text Understanding | LoTLIP:提升长文本理解的语言-图像预训练方法 | Wei Wu, Kecheng Zheng, Shuailei Ma, Fan Lu, Yuxin Guo, Yifei Zhang, Wei Chen, Qingpei Guo, Yujun Shen, Zheng-Jun Zha | http://arxiv.org/pdf/2410.05249v2 | null |
| 2024-10-09 | RobustEMD: Domain Robust Matching for Cross-domain Few-shot Medical Image Segmentation | RobustEMD:面向跨域少样本医学图像分割的域鲁棒匹配 | Yazhou Zhu, Minxian Li, Qiaolin Ye, Shidong Wang, Tong Xin, Haofeng Zhang | http://arxiv.org/pdf/2410.01110v2 | null |
| 2024-10-09 | The BRAVO Semantic Segmentation Challenge Results in UNCV2024 | BRAVO语义分割挑战赛UNCV2024年结果分析 | Tuan-Hung Vu, Eduardo Valle, Andrei Bursuc, Tommie Kerssies, Daan de Geus, Gijs Dubbelman, Long Qian, Bingke Zhu, Yingying Chen, Ming Tang, et al. | http://arxiv.org/pdf/2409.15107v2 | link |
| 2024-10-09 | Federated Impression for Learning with Distributed Heterogeneous Data | 联邦印象:面向分布式异构数据的学习 | Atrin Arya, Sana Ayromlou, Armin Saadat, Purang Abolmaesumi, Xiaoxiao Li | http://arxiv.org/pdf/2409.07351v2 | link |
| 2024-10-09 | TASAR: Transfer-based Attack on Skeletal Action Recognition | TASAR:基于迁移的骨骼动作识别攻击方法 | Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Ajian Liu, Xingxing Wei, Meng Wang, He Wang | http://arxiv.org/pdf/2409.02483v2 | null |
| 2024-10-09 | Staircase Cascaded Fusion of Lightweight Local Pattern Recognition and Long-Range Dependencies for Structural Crack Segmentation | 轻量级局部模式识别与长距离依赖的阶梯式级联融合结构裂缝分割方法 | Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Mianzhao Wang, Shengyong Chen | http://arxiv.org/pdf/2408.12815v2 | link |
| 2024-10-09 | Comprehensive Performance Evaluation of YOLO11, YOLOv10, YOLOv9 and YOLOv8 on Detecting and Counting Fruitlet in Complex Orchard Environments | 全面评估YOLO11、YOLOv10、YOLOv9和YOLOv8在复杂果园环境中检测与计数果实性能表现 | Ranjan Sapkota, Zhichao Meng, Martin Churuvija, Xiaoqiang Du, Zenghong Ma, Manoj Karkee | http://arxiv.org/pdf/2407.12040v4 | null |
| 2024-10-09 | Adaptive Parametric Activation | 自适应参数激活函数 | Konstantinos Panagiotis Alexandridis, Jiankang Deng, Anh Nguyen, Shan Luo | http://arxiv.org/pdf/2407.08567v2 | link |
| 2024-10-09 | Cell Tracking according to Biological Needs -- Strong Mitosis-aware Multi-Hypothesis Tracker with Aleatoric Uncertainty | 根据生物需求进行细胞追踪:具有强有丝分裂感知与偶然不确定性的多假设追踪器 | Timo Kaiser, Maximilian Schier, Bodo Rosenhahn | http://arxiv.org/pdf/2403.15011v3 | null |
| 2024-10-09 | Topologically Faithful Multi-class Segmentation in Medical Images | 医学图像中的拓扑保真多类分割方法 | Alexander H. Berger, Nico Stucki, Laurin Lux, Vincent Buergin, Suprosanna Shit, Anna Banaszak, Daniel Rueckert, Ulrich Bauer, Johannes C. Paetzold | http://arxiv.org/pdf/2403.11001v2 | null |
| 2024-10-09 | Biophysics Informed Pathological Regularisation for Brain Tumour Segmentation | 基于生物物理信息病理正则化的脑肿瘤分割方法 | Lipei Zhang, Yanqi Cheng, Lihao Liu, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero | http://arxiv.org/pdf/2403.09136v3 | null |
| 2024-10-09 | AUPIMO: Redefining Visual Anomaly Detection Benchmarks with High Speed and Low Tolerance | AUPIMO:以高速与低容忍度重新定义视觉异常检测基准 | Joao P. C. Bertoldo, Dick Ameln, Ashwin Vaidya, Samet Akçay | http://arxiv.org/pdf/2401.01984v4 | link |
| 2024-10-09 | LISBET: a machine learning model for the automatic segmentation of social behavior motifs | LISBET:一种用于社交行为模式自动分割的机器学习模型 | Giuseppe Chindemi, Benoit Girard, Camilla Bellone | http://arxiv.org/pdf/2311.04069v2 | null |
| 2024-10-09 | Beyond the Visible: A Survey on Cross-spectral Face Recognition | 超越可见光:跨谱段人脸识别技术综述 | David Anghelone, Cunjian Chen, Arun Ross, Antitza Dantcheva | http://arxiv.org/pdf/2201.04435v4 | null |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-10-09 | Analysis of different disparity estimation techniques on aerial stereo image datasets | 不同视差估计技术在航空立体图像数据集上的分析研究 | Ishan Narayan, Shashi Poddar | http://arxiv.org/pdf/2410.06711v1 | null |
| 2024-10-09 | M${}^{3}$Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes | M${}^{3}$Bench:三维场景中移动操作的全身运动生成基准测试 | Zeyu Zhang, Sixu Yan, Muzhi Han, Zaijin Wang, Xinggang Wang, Song-Chun Zhu, Hangxin Liu | http://arxiv.org/pdf/2410.06678v1 | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-10-09 | Clean Evaluations on Contaminated Visual Language Models | 受污染视觉语言模型的清洁评估 | Hongyuan Lu, Shujie Miao, Wai Lam | http://arxiv.org/pdf/2410.07030v1 | null |
| 2024-10-09 | Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback | 无需人类反馈的胸部X光解读模型事实性偏好微调 | Dennis Hein, Zhihong Chen, Sophie Ostmeier, Justin Xu, Maya Varma, Eduardo Pontes Reis, Arne Edward Michalson, Christian Bluethgen, Hyun Joo Shin, Curtis Langlotz, et al. | http://arxiv.org/pdf/2410.07025v1 | null |
| 2024-10-09 | Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles | 弱评估-强激发:使用情境谜题评估和激发LLMs的横向思维能力 | Qi Chen, Bowen Zhang, Gang Wang, Qi Wu | http://arxiv.org/pdf/2410.06733v1 | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-10-09 | LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning | LaMP:面向运动生成、检索与描述的语言-运动预训练模型 | Zhe Li, Weihao Yuan, Yisheng He, Lingteng Qiu, Shenhao Zhu, Xiaodong Gu, Weichao Shen, Yuan Dong, Zilong Dong, Laurence T. Yang | http://arxiv.org/pdf/2410.07093v1 | null |
| 2024-10-09 | Adaptive High-Frequency Transformer for Diverse Wildlife Re-Identification | 自适应高频变换器在多样化野生动物重识别中的应用 | Chenyue Li, Shuoyi Chen, Mang Ye | http://arxiv.org/pdf/2410.06977v1 | null |
| 2024-10-09 | ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling | ELMO:通过上采样增强实时LiDAR运动捕捉 | Deok-Kyeong Jang, Dongseok Yang, Deok-Yun Jang, Byeoli Choi, Donghoon Shin, Sung-hee Lee | http://arxiv.org/pdf/2410.06963v1 | null |
| 2024-10-09 | Reliable Probabilistic Human Trajectory Prediction for Autonomous Applications | 可靠的概率性人类轨迹预测在自主应用中的研究 | Manuel Hetzel, Hannes Reichert, Konrad Doll, Bernhard Sick | http://arxiv.org/pdf/2410.06905v1 | null |
| 2024-10-09 | MatMamba: A Matryoshka State Space Model | MatMamba:一种套娃状态空间模型 | Abhinav Shukla, Sai Vemprala, Aditya Kusupati, Ashish Kapoor | http://arxiv.org/pdf/2410.06718v1 | null |
| 2024-10-09 | SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference | SparseVLM:用于高效视觉-语言模型推理的视觉标记稀疏化 | Yuan Zhang, Chun-Kai Fan, Junpeng Ma, Wenzhao Zheng, Tao Huang, Kuan Cheng, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, et al. | http://arxiv.org/pdf/2410.04417v2 | link |
| 2024-10-09 | Window-based Channel Attention for Wavelet-enhanced Learned Image Compression | 基于窗口的通道注意力机制在小波增强学习图像压缩中的应用 | Heng Xu, Bowen Hai, Yushun Tang, Zhihai He | http://arxiv.org/pdf/2409.14090v2 | null |
| 2024-10-09 | GMSR:Gradient-Guided Mamba for Spectral Reconstruction from RGB Images | GMSR:基于梯度引导Mamba的RGB图像光谱重建 | Xinying Wang, Zhixiong Huang, Sifan Zhang, Jiawen Zhu, Paolo Gamba, Lin Feng | http://arxiv.org/pdf/2405.07777v2 | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-10-09 | Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication | Thing2Reality:将2D内容转换为条件多视图与3D高斯对象以实现XR通信 | Erzhen Hu, Mingyi Li, Jungtaek Hong, Xun Qian, Alex Olwal, David Kim, Seongkook Heo, Ruofei Du | http://arxiv.org/pdf/2410.07119v1 | null |
| 2024-10-09 | Z-upscaling: Optical Flow Guided Frame Interpolation for Isotropic Reconstruction of 3D EM Volumes | Z轴放大:光流引导的帧插值用于三维电子显微镜体积各向同性重建 | Fisseha A. Ferede, Ali Khalighifar, Jaison John, Krishnan Venkataraman, Khaled Khairy | http://arxiv.org/pdf/2410.07043v1 | null |
| 2024-10-09 | OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB | OmniPose6D:基于单目RGB的动态场景短期物体位姿追踪 | Yunzhi Lin, Yipu Zhao, Fu-Jen Chu, Xingyu Chen, Weiyao Wang, Hao Tang, Patricio A. Vela, Matt Feiszli, Kevin Liang | http://arxiv.org/pdf/2410.06694v1 | null |
| 2024-10-09 | LocoVR: Multiuser Indoor Locomotion Dataset in Virtual Reality | LocoVR:虚拟现实中的多用户室内移动数据集 | Kojiro Takeyama, Yimeng Liu, Misha Sra | http://arxiv.org/pdf/2410.06437v1 | null |
| 2024-10-09 | SCILLA: SurfaCe Implicit Learning for Large Urban Area, a volumetric hybrid solution | SCILLA:大规模城市区域表面隐式学习,一种体积混合解决方案 | Hala Djeghim, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou, Désiré Sidibé | http://arxiv.org/pdf/2403.10344v3 | null |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-10-09 | Continual Learning: Less Forgetting, More OOD Generalization via Adaptive Contrastive Replay | 持续学习:通过自适应对比重放实现更少遗忘与更强OOD泛化能力 | Hossein Rezaei, Mohammad Sabokrou | http://arxiv.org/pdf/2410.07110v1 | null |

Others

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-10-09 | EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models | EvolveDirector:利用大型视觉-语言模型实现高级文本到图像生成 | Rui Zhao, Hangjie Yuan, Yujie Wei, Shiwei Zhang, Yuchao Gu, Lingmin Ran, Xiang Wang, Zhangjie Wu, Junhao Zhang, Yingya Zhang, et al. | http://arxiv.org/pdf/2410.07133v1 | null |
| 2024-10-09 | VHELM: A Holistic Evaluation of Vision Language Models | VHELM:视觉语言模型的全面评估 | Tony Lee, Haoqin Tu, Chi Heem Wong, Wenhao Zheng, Yiyang Zhou, Yifan Mai, Josselin Somerville Roberts, Michihiro Yasunaga, Huaxiu Yao, Cihang Xie, et al. | http://arxiv.org/pdf/2410.07112v1 | null |
| 2024-10-09 | A Diffusion-based Xray2MRI Model: Generating Pseudo-MRI Volumes From one Single X-ray | 基于扩散的Xray2MRI模型:从单张X射线生成伪MRI体积数据 | Zhe Wang, Rachid Jennane, Aladine Chetouani, Mohamed Jarraya | http://arxiv.org/pdf/2410.06997v1 | null |
| 2024-10-09 | Evaluating Computational Pathology Foundation Models for Prostate Cancer Grading under Distribution Shifts | 评估计算病理学基础模型在前列腺癌分级中的分布迁移性能 | Fredrik K. Gustafsson, Mattias Rantalainen | http://arxiv.org/pdf/2410.06723v1 | null |
| 2024-10-09 | Happy: A Debiased Learning Framework for Continual Generalized Category Discovery | Happy:一种面向持续广义类别发现的去偏学习框架 | Shijie Ma, Fei Zhu, Zhun Zhong, Wenzhuo Liu, Xu-Yao Zhang, Cheng-Lin Liu | http://arxiv.org/pdf/2410.06535v1 | null |
| 2024-10-09 | MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning | MotionRL: 利用多奖励强化学习将文本到运动生成对齐至人类偏好 | Xiaoyang Liu, Yunyao Mao, Wengang Zhou, Houqiang Li | http://arxiv.org/pdf/2410.06513v1 | null |
| 2024-10-09 | MaskBlur: Spatial and Angular Data Augmentation for Light Field Image Super-Resolution | MaskBlur:光场图像超分辨率的空间与角度数据增强方法 | Wentao Chao, Fuqing Duan, Yulan Guo, Guanghui Wang | http://arxiv.org/pdf/2410.06478v1 | null |
| 2024-10-09 | From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning | 从通用到专业:通过任务特定视觉指令调优适配视觉语言模型 | Yang Bai, Yang Zhou, Jun Zhou, Rick Siow Mong Goh, Daniel Shu Wei Ting, Yong Liu | http://arxiv.org/pdf/2410.06456v1 | null |
| 2024-10-09 | Machine Unlearning in Forgettability Sequence | 基于可遗忘性序列的机器遗忘 | Junjie Chen, Qian Chen, Jian Lou, Xiaoyu Zhang, Kai Wu, Zilong Wang | http://arxiv.org/pdf/2410.06446v1 | null |
| 2024-10-09 | Motion and Structure from Event-based Normal Flow | 基于事件法向流的运动与结构估计 | Zhongyang Ren, Bangyan Liao, Delei Kong, Jinghang Li, Peidong Liu, Laurent Kneip, Guillermo Gallego, Yi Zhou | http://arxiv.org/pdf/2407.12239v3 | null |
| 2024-10-09 | Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison | 解构与对比一致性:通过任务解构一致性比较衡量VLMs答案可靠性 | Qian Yang, Weixiang Yan, Aishwarya Agrawal | http://arxiv.org/pdf/2407.07840v3 | null |
| 2024-10-09 | Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models | 评估大规模视觉-语言模型幻觉基准的质量 | Bei Yan, Jie Zhang, Zheng Yuan, Shiguang Shan, Xilin Chen | http://arxiv.org/pdf/2406.17115v2 | link |
| 2024-10-09 | AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | AutoHallusion: 针对视觉-语言模型的自动幻觉基准生成方法 | Xiyang Wu, Tianrui Guan, Dianqi Li, Shuaiyi Huang, Xiaoyu Liu, Xijun Wang, Ruiqi Xian, Abhinav Shrivastava, Furong Huang, Jordan Lee Boyd-Graber, et al. | http://arxiv.org/pdf/2406.10900v2 | null |
| 2024-10-09 | AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales | AGL-NET:空地跨模态全局定位与多尺度变化研究 | Tianrui Guan, Ruiqi Xian, Xijun Wang, Xiyang Wu, Mohamed Elnoor, Daeun Song, Dinesh Manocha | http://arxiv.org/pdf/2404.03187v2 | null |
| 2024-10-09 | Less is More: High-value Data Selection for Visual Instruction Tuning | 少即是多:视觉指令调优中的高价值数据选择 | Zikang Liu, Kun Zhou, Wayne Xin Zhao, Dawei Gao, Yaliang Li, Ji-Rong Wen | http://arxiv.org/pdf/2403.09559v3 | null |
| 2024-10-09 | LM-HT SNN: Enhancing the Performance of SNN to ANN Counterpart through Learnable Multi-hierarchical Threshold Model | LM-HT SNN:通过可学习多层级阈值模型提升SNN性能至ANN对应水平 | Zecheng Hao, Xinyu Shi, Yujia Liu, Zhaofei Yu, Tiejun Huang | http://arxiv.org/pdf/2402.00411v2 | null |