2024-10-14.md

[UPDATED!] 2024-10-14 (Publish Time)

Generative Models

Publish Date Title Title_CN Authors PDF Code
2024-10-14 Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models Tex4D: 零样本4D场景纹理生成与视频扩散模型 Jingzhi Bao, Xueting Li, Ming-Hsuan Yang http://arxiv.org/pdf/2410.10821v1 null
2024-10-14 Depth Any Video with Scalable Synthetic Data 深度增强任意视频:可扩展合成数据方法 Honghui Yang, Di Huang, Wei Yin, Chunhua Shen, Haifeng Liu, Xiaofei He, Binbin Lin, Wanli Ouyang, Tong He http://arxiv.org/pdf/2410.10815v1 null
2024-10-14 HART: Efficient Visual Generation with Hybrid Autoregressive Transformer HART:基于混合自回归变换器的高效视觉生成 Haotian Tang, Yecheng Wu, Shang Yang, Enze Xie, Junsong Chen, Junyu Chen, Zhuoyang Zhang, Han Cai, Yao Lu, Song Han http://arxiv.org/pdf/2410.10812v1 null
2024-10-14 TrajDiffuse: A Conditional Diffusion Model for Environment-Aware Trajectory Prediction TrajDiffuse:一种环境感知的条件扩散模型用于轨迹预测 Qingze, Liu, Danrui Li, Samuel S. Sohn, Sejong Yoon, Mubbasir Kapadia, Vladimir Pavlovic http://arxiv.org/pdf/2410.10804v1 null
2024-10-14 Boosting Camera Motion Control for Video Diffusion Transformers 增强视频扩散变换器的摄像头运动控制 Soon Yau Cheong, Duygu Ceylan, Armin Mustafa, Andrew Gilbert, Chun-Hao Paul Huang http://arxiv.org/pdf/2410.10802v1 null
2024-10-14 Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations 使用矫正随机微分方程的语义图像反转与编辑 Litu Rout, Yujia Chen, Nataniel Ruiz, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu http://arxiv.org/pdf/2410.10792v1 null
2024-10-14 ControlMM: Controllable Masked Motion Generation ControlMM:可控遮罩动作生成方法 Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul, Pu Wang, Hongfei Xue, Chen Chen, Chuan Guo, Junli Cao, Jian Ren, Sergey Tulyakov http://arxiv.org/pdf/2410.10780v1 null
2024-10-14 Adaptive Diffusion Terrain Generator for Autonomous Uneven Terrain Navigation 自适应扩散地形生成器在自主不平坦地形导航中的应用 Youwei Yu, Junhong Xu, Lantao Liu http://arxiv.org/pdf/2410.10766v1 null
2024-10-14 DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships DragEntity:基于实体与位置关系的轨迹引导视频生成技术 Zhang Wan, Sheng Tang, Jiawei Wei, Ruize Zhang, Juan Cao http://arxiv.org/pdf/2410.10751v1 null
2024-10-14 FlexGen: Flexible Multi-View Generation from Text and Image Inputs FlexGen:基于文本和图像输入的灵活多视角生成方法 Xinli Xu, Wenhang Ge, Jiantao Lin, Jiawei Feng, Lie Xu, HanFeng Zhao, Shunsi Zhang, Ying-Cong Chen http://arxiv.org/pdf/2410.10745v1 null
2024-10-14 Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models 深度压缩自编码器用于高效高分辨率扩散模型 Junyu Chen, Han Cai, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, Song Han http://arxiv.org/pdf/2410.10733v1 null
2024-10-14 TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model TALK-Act: 基于扩散模型的二维说话虚拟人复现中的纹理感知增强方法 Jiazhi Guan, Quanwei Yang, Kaisiyuan Wang, Hang Zhou, Shengyi He, Zhiliang Xu, Haocheng Feng, Errui Ding, Jingdong Wang, Hongtao Xie, et.al. http://arxiv.org/pdf/2410.10696v1 null
2024-10-14 Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation 双耳全开:迈向语言驱动的空间音频生成技术 Peiwen Sun, Sitong Cheng, Xiangtai Li, Zhen Ye, Huadai Liu, Honggang Zhang, Wei Xue, Yike Guo http://arxiv.org/pdf/2410.10676v1 null
2024-10-14 SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers SANA:基于线性扩散变换器的高效高分辨率图像合成 Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Yujun Lin, Zhekai Zhang, Muyang Li, Yao Lu, Song Han http://arxiv.org/pdf/2410.10629v1 null
2024-10-14 Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing 视觉引导与遮罩增强的自适应去噪在基于提示的图像编辑中的应用 Kejie Wang, Xuemeng Song, Meng Liu, Weili Guan, Liqiang Nie http://arxiv.org/pdf/2410.10496v1 null
2024-10-14 Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models 针对个性化文本到图像扩散模型中未经授权数据使用的可靠验证研究 Boheng Li, Yanhao Wei, Yankai Fu, Zhenting Wang, Yiming Li, Jie Zhang, Run Wang, Tianwei Zhang http://arxiv.org/pdf/2410.10437v1 null
2024-10-14 DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model DOME:将扩散模型驯化为高保真可控占用世界模型 Songen Gu, Wei Yin, Bu Jin, Xiaoyang Guo, Junming Wang, Haodong Li, Qian Zhang, Xiaoxiao Long http://arxiv.org/pdf/2410.10429v1 null
2024-10-14 Anatomical feature-prioritized loss for enhanced MR to CT translation 基于解剖特征优先级的增强型MR到CT转换损失函数研究 Arthur Longuefosse, Baudouin Denis de Senneville, Gael Dournes, Ilyes Benlala, Pascal Desbarats, Fabien Baldacci http://arxiv.org/pdf/2410.10328v1 null
2024-10-14 LG-CAV: Train Any Concept Activation Vector with Language Guidance LG-CAV:使用语言指导训练任意概念激活向量 Qihan Huang, Jie Song, Mengqi Xue, Haofei Zhang, Bingde Hu, Huiqiong Wang, Hao Jiang, Xingen Wang, Mingli Song http://arxiv.org/pdf/2410.10308v1 null
2024-10-14 Saliency Guided Optimization of Diffusion Latents 显著性引导的扩散潜在优化研究 Xiwen Wang, Jizhe Zhou, Xuekang Zhu, Cheng Li, Mao Li http://arxiv.org/pdf/2410.10257v1 null
2024-10-14 Detecting Unforeseen Data Properties with Diffusion Autoencoder Embeddings using Spine MRI data 使用脊柱MRI数据的扩散自编码器嵌入检测未预见的数据属性 Robert Graf, Florian Hunecke, Soeren Pohl, Matan Atad, Hendrik Moeller, Sophie Starck, Thomas Kroencke, Stefanie Bette, Fabian Bamberg, Tobias Pischon, et.al. http://arxiv.org/pdf/2410.10220v1 null
2024-10-14 MagicEraser: Erasing Any Objects via Semantics-Aware Control MagicEraser:通过语义感知控制擦除任意对象 Fan Li, Zixiao Zhang, Yi Huang, Jianzhuang Liu, Renjing Pei, Bin Shao, Songcen Xu http://arxiv.org/pdf/2410.10207v1 null
2024-10-14 Identity-Focused Inference and Extraction Attacks on Diffusion Models 针对扩散模型的以身份聚焦的推理与提取攻击 Jayneel Vora, Aditya Krishnan, Nader Bouacida, Prabhu RV Shankar, Prasant Mohapatra http://arxiv.org/pdf/2410.10177v1 null
2024-10-14 Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization 生成式人类视频压缩:多粒度时间轨迹分解 Shanzhi Yin, Bolin Chen, Shiqi Wang, Yan Ye http://arxiv.org/pdf/2410.10171v1 null
2024-10-14 First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending 首先构建背景再渲染文本:视觉文本融合的一种新范式 Zhenhang Li, Yan Shu, Weichao Zeng, Dongbao Yang, Yu Zhou http://arxiv.org/pdf/2410.10168v1 link
2024-10-14 Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models? 是否包含生成数据会放大未来图像分类模型中各代之间的偏见? Zeliang Zhang, Xin Liang, Mingqian Feng, Susan Liang, Chenliang Xu http://arxiv.org/pdf/2410.10160v1 null
2024-10-14 TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control 基于先验引导控制的扩散场景文本编辑方法(TextCtrl) Weichao Zeng, Yan Shu, Zhenhang Li, Dongbao Yang, Yu Zhou http://arxiv.org/pdf/2410.10133v1 null
2024-10-14 StegaINR4MIH: steganography by implicit neural representation for multi-image hiding 隐式神经表示的多图像隐藏隐写术:StegaINR4MIH Weina Dong, Jia Liu, Lifeng Chen, Wenquan Sun, Xiaozhong Pan, Yan Ke http://arxiv.org/pdf/2410.10117v1 null
2024-10-14 High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity 高精度二分图像分割:基于探测扩散能力的算法 Qian Yu, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Bo Li, Lihe Zhang, Huchuan Lu http://arxiv.org/pdf/2410.10105v1 null
2024-10-14 The Ingredients for Robotic Diffusion Transformers 机器人扩散变换器的组成要素 Sudeep Dasari, Oier Mees, Sebastian Zhao, Mohan Kumar Srirama, Sergey Levine http://arxiv.org/pdf/2410.10088v1 null
2024-10-14 DINTR: Tracking via Diffusion-based Interpolation DINTR:基于扩散插值的跟踪方法 Pha Nguyen, Ngan Le, Jackson Cothren, Alper Yilmaz, Khoa Luu http://arxiv.org/pdf/2410.10053v1 null
2024-10-14 Gait Sequence Upsampling using Diffusion Models for Single LiDAR Sensors 使用扩散模型对单LiDAR传感器进行步态序列上采样 Jeongho Ahn, Kazuto Nakashima, Koki Yoshino, Yumi Iwashita, Ryo Kurazume http://arxiv.org/pdf/2410.08680v2 null
2024-10-14 Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation Hallo2:长时域高分辨率音频驱动的肖像图像动画技术 Jiahao Cui, Hui Li, Yao Yao, Hao Zhu, Hanlin Shang, Kaihui Cheng, Hang Zhou, Siyu Zhu, Jingdong Wang http://arxiv.org/pdf/2410.07718v2 null
2024-10-14 SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting SceneDreamer360:基于全景高斯涂抹的文本驱动3D一致性场景生成 Wenrui Li, Fucheng Cai, Yapeng Mi, Zhe Yang, Wangmeng Zuo, Xingtao Wang, Xiaopeng Fan http://arxiv.org/pdf/2408.13711v2 link
2024-10-14 Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Show-o:单一Transformer统一多模态理解与生成 Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou http://arxiv.org/pdf/2408.12528v5 null
2024-10-14 JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation JointDreamer:通过联合得分蒸馏确保文本到3D生成中的几何一致性及文本一致性 Chenhan Jiang, Yihan Zeng, Tianyang Hu, Songcun Xu, Wei Zhang, Hang Xu, Dit-Yan Yeung http://arxiv.org/pdf/2407.12291v2 null
2024-10-14 VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation VideoScore:构建自动度量标准以模拟视频生成中的细粒度人类反馈 Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, et.al. http://arxiv.org/pdf/2406.15252v3 null
2024-10-14 Extracting Training Data from Unconditional Diffusion Models 从无条件扩散模型中提取训练数据 Yunhao Chen, Xingjun Ma, Difan Zou, Yu-Gang Jiang http://arxiv.org/pdf/2406.12752v2 null
2024-10-14 VideoTetris: Towards Compositional Text-to-Video Generation VideoTetris:迈向组合式文本到视频生成的探索 Ye Tian, Ling Yang, Haotian Yang, Yuan Gao, Yufan Deng, Jingmin Chen, Xintao Wang, Zhaochen Yu, Xin Tao, Pengfei Wan, et.al. http://arxiv.org/pdf/2406.04277v2 link
2024-10-14 TotalVibeSegmentator: Full Torso Segmentation for the NAKO and UK Biobank in Volumetric Interpolated Breath-hold Examination Body Images TotalVibeSegmentator: 针对NAKO和英国生物样本库在全容积插值屏气检查体图像中的全身躯干分割 Robert Graf, Paul-Sören Platzek, Evamaria Olga Riedel, Constanze Ramschütz, Sophie Starck, Hendrik Kristian Möller, Matan Atad, Henry Völzke, Robin Bülow, Carsten Oliver Schmidt, et.al. http://arxiv.org/pdf/2406.00125v2 link
2024-10-14 Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective 解构扩散模型的平滑性特性:高斯混合视角研究 Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou http://arxiv.org/pdf/2405.16418v2 null
2024-10-14 Sign Stitching: A Novel Approach to Sign Language Production 手语拼接:一种手语生成的新方法 Harry Walsh, Ben Saunders, Richard Bowden http://arxiv.org/pdf/2405.07663v2 link
2024-10-14 Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing 编辑您的动作:时空扩散解耦学习用于视频动作编辑 Yi Zuo, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Shuyuan Yang, Yuwei Guo http://arxiv.org/pdf/2405.04496v2 null
2024-10-14 Generative inpainting of incomplete Euclidean distance matrices of trajectories generated by a fractional Brownian motion 生成式修复:基于分数布朗运动的轨迹生成的不完全欧几里得距离矩阵 Alexander Lobashev, Dmitry Guskov, Kirill Polovnikov http://arxiv.org/pdf/2404.07029v2 link
2024-10-14 Geometry-Informed Neural Networks 几何信息增强的神经网络 Arturs Berzins, Andreas Radler, Eric Volkmann, Sebastian Sanokowski, Sepp Hochreiter, Johannes Brandstetter http://arxiv.org/pdf/2402.14009v3 null
2024-10-14 RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models RealCompo:平衡真实性与构成性提升文本到图像扩散模型性能 Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Kai-Ni Wang, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, et.al. http://arxiv.org/pdf/2402.12908v3 link

Multimodal

Publish Date Title Title_CN Authors PDF Code
2024-10-14 TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models TemporalBench:多模态视频模型细粒度时间理解基准测试 Mu Cai, Reuben Tan, Jianrui Zhang, Bocheng Zou, Kai Zhang, Feng Yao, Fangrui Zhu, Jing Gu, Yiwu Zhong, Yuzhang Shang, et.al. http://arxiv.org/pdf/2410.10818v1 null
2024-10-14 Towards Foundation Models for 3D Vision: How Close Are We? 迈向三维视觉的基础模型:我们还有多远? Yiming Zuo, Karhan Kayan, Maggie Wang, Kevin Jeon, Jia Deng, Thomas L. Griffiths http://arxiv.org/pdf/2410.10799v1 null
2024-10-14 MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling MMAR:迈向无损多模态自回归概率建模 Jian Yang, Dacheng Yin, Yizhou Zhou, Fengyun Rao, Wei Zhai, Yang Cao, Zheng-Jun Zha http://arxiv.org/pdf/2410.10798v1 null
2024-10-14 Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes 条件感知的多模态融合方法用于驾驶场景的鲁棒语义感知 Tim Broedermann, Christos Sakaridis, Yuqian Fu, Luc Van Gool http://arxiv.org/pdf/2410.10791v1 null
2024-10-14 LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content LiveXiv——基于Arxiv论文内容的多模态实时评测基准 Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh, Wei Lin, M. Jehanzeb Mirza, Leshem Chosen, Mikhail Yurochkin, Yuekai Sun, Assaf Arbelle, Leonid Karlinsky, et.al. http://arxiv.org/pdf/2410.10783v1 null
2024-10-14 Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework 跨模态少样本学习:一种生成式迁移学习框架 Zhengwei Yang, Yuke Li, Qiang Sun, Basura Fernando, Heng Huang, Zheng Wang http://arxiv.org/pdf/2410.10663v1 null
2024-10-14 BrainMVP: Multi-modal Vision Pre-training for Brain Image Analysis using Multi-parametric MRI BrainMVP:使用多参数MRI进行脑图像分析的多模态视觉预训练 Shaohao Rui, Lingzhi Chen, Zhenyu Tang, Lilong Wang, Mianxin Liu, Shaoting Zhang, Xiaosong Wang http://arxiv.org/pdf/2410.10604v1 null
2024-10-14 VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents VisRAG: 基于视觉的多模态文档检索增强生成方法 Shi Yu, Chaoyue Tang, Bokai Xu, Junbo Cui, Junhao Ran, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, et.al. http://arxiv.org/pdf/2410.10594v1 null
2024-10-14 MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks MEGA-Bench:将多模态评估扩展至超过500个真实世界任务 Jiacheng Chen, Tianhao Liang, Sherman Siu, Zhengqing Wang, Kai Wang, Yubo Wang, Yuansheng Ni, Wang Zhu, Ziyan Jiang, Bohan Lyu, et.al. http://arxiv.org/pdf/2410.10563v1 null
2024-10-14 Hybrid Transformer for Early Alzheimer's Detection: Integration of Handwriting-Based 2D Images and 1D Signal Features 混合Transformer用于早期阿尔茨海默病检测:基于手写书法的二维图像与一维信号特征融合 Changqing Gong, Huafeng Qin, Mounîm A. El-Yacoubi http://arxiv.org/pdf/2410.10547v1 null
2024-10-14 Learning to Ground VLMs without Forgetting 学会在不忘却中扎根VLMs Aritra Bhowmik, Mohammad Mahdi Derakhshani, Dennis Koelma, Martin R. Oswald, Yuki M. Asano, Cees G. M. Snoek http://arxiv.org/pdf/2410.10491v1 null
2024-10-14 Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs Free Video-LLM:提示引导的视觉感知实现高效无训练视频LLM Kai Han, Jianyuan Guo, Yehui Tang, Wei He, Enhua Wu, Yunhe Wang http://arxiv.org/pdf/2410.10441v1 null
2024-10-14 Class Balancing Diversity Multimodal Ensemble for Alzheimer's Disease Diagnosis and Early Detection 类平衡多样性多模态集成算法在阿尔茨海默病诊断与早期检测中的应用 Arianna Francesconi, Lazzaro di Biase, Donato Cappetta, Fabio Rebecchi, Paolo Soda, Rosa Sicilia, Valerio Guarrasi http://arxiv.org/pdf/2410.10374v1 null
2024-10-14 Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation 空间感知高效投影器:通过多层特征聚合实现多模态大型语言模型优化 Shun Qian, Bingquan Liu, Chengjie Sun, Zhen Xu, Baoxun Wang http://arxiv.org/pdf/2410.10319v1 null
2024-10-14 ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization ForgeryGPT:用于可解释图像伪造检测与定位的多模态大型语言模型 Jiawei Li, Fanrui Zhang, Jiaying Zhu, Esther Sun, Qiang Zhang, Zheng-Jun Zha http://arxiv.org/pdf/2410.10238v1 null
2024-10-14 Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention 消除视觉问答中的语言偏差:利用细粒度因果干预方法 Ying Liu, Ge Bai, Chenji Lu, Shilong Li, Zhang Zhang, Ruifang Liu, Wenbin Guo http://arxiv.org/pdf/2410.10184v1 null
2024-10-14 X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing X-Fi: 多模态人体感知的模态不变基础模型 Xinyan Chen, Jianfei Yang http://arxiv.org/pdf/2410.10167v1 null
2024-10-14 Performance Evaluation of Deep Learning and Transformer Models Using Multimodal Data for Breast Cancer Classification 深度学习与变换器模型在乳腺癌分类中利用多模态数据的性能评估 Sadam Hussain, Mansoor Ali, Usman Naseem, Beatriz Alejandra Bosques Palomo, Mario Alexis Monsivais Molina, Jorge Alberto Garza Abdala, Daly Betzabeth Avendano Avalos, Servando Cardona-Huerta, T. Aaron Gulliver, Jose Gerardo Tamez Pena http://arxiv.org/pdf/2410.10146v1 null
2024-10-14 MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models 大规模多模态交错理解基准MMIE:面向大型视觉-语言模型 Peng Xia, Siwei Han, Shi Qiu, Yiyang Zhou, Zhaoyang Wang, Wenhao Zheng, Zhaorun Chen, Chenhang Cui, Mingyu Ding, Linjie Li, et.al. http://arxiv.org/pdf/2410.10139v1 null
2024-10-14 Bridging the Gap between Text, Audio, Image, and Any Sequence: A Novel Approach using Gloss-based Annotation 桥接文本、音频、图像与任意序列之间的差距:一种基于词汇注释的新型方法 Sen Fang, Sizhou Chen, Yalin Feng, Xiaofeng Zhang, Teik Toe Teoh http://arxiv.org/pdf/2410.03146v2 null
2024-10-14 Benchmarking Vision Language Models for Cultural Understanding 视觉语言模型在文化理解方面的基准测试研究 Shravan Nayak, Kanishk Jain, Rabiul Awal, Siva Reddy, Sjoerd van Steenkiste, Lisa Anne Hendricks, Karolina Stańczak, Aishwarya Agrawal http://arxiv.org/pdf/2407.10920v3 null
2024-10-14 Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Large Multi-modal Models 提升低收入数据:大型多模态模型中的社会经济视角转换策略 Joan Nwatu, Oana Ignat, Rada Mihalcea http://arxiv.org/pdf/2407.02623v3 link
2024-10-14 The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval 多模态大型语言模型在视频时刻检索中的惊人有效性 Boris Meinardus, Anil Batra, Anna Rohrbach, Marcus Rohrbach http://arxiv.org/pdf/2406.18113v3 link
2024-10-14 White-box Multimodal Jailbreaks Against Large Vision-Language Models 白盒对抗多模态攻击大型视觉-语言模型研究 Ruofan Wang, Xingjun Ma, Hanxu Zhou, Chuanjun Ji, Guangnan Ye, Yu-Gang Jiang http://arxiv.org/pdf/2405.17894v2 null
2024-10-14 Improving Multimodal Learning with Multi-Loss Gradient Modulation 多损失梯度调制改善多模态学习性能 Konstantinos Kontras, Christos Chatzichristos, Matthew Blaschko, Maarten De Vos http://arxiv.org/pdf/2405.07930v2 null
2024-10-14 HAMMR: HierArchical MultiModal React agents for generic VQA HAMMR:用于通用视觉问答的层次多模态反应智能体 Lluis Castrejon, Thomas Mensink, Howard Zhou, Vittorio Ferrari, Andre Araujo, Jasper Uijlings http://arxiv.org/pdf/2404.05465v2 null
2024-10-14 TV-SAM: Increasing Zero-Shot Segmentation Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation TV-SAM:利用GPT-4生成的描述性提示提升多模态医学图像零样本分割性能,无需人工标注 Zekun Jiang, Dongjie Cheng, Ziyuan Qin, Jun Gao, Qicheng Lao, Abdullaev Bakhrom Ismoilovich, Urazboev Gayrat, Yuldashov Elyorbek, Bekchanov Habibullo, Defu Tang, et.al. http://arxiv.org/pdf/2402.15759v2 link
2024-10-14 Revisiting Few-Shot Object Detection with Vision-Language Models 重新审视基于视觉-语言模型的少样本目标检测 Anish Madan, Neehar Peri, Shu Kong, Deva Ramanan http://arxiv.org/pdf/2312.14494v4 link

NeRF

Publish Date Title Title_CN Authors PDF Code
2024-10-14 3DArticCyclists: Generating Simulated Dynamic 3D Cyclists for Human-Object Interaction (HOI) and Autonomous Driving Applications 3DArticCyclists:为人体-物体交互(HOI)和自动驾驶应用生成模拟动态3D骑行者 Eduardo R. Corral-Soto, Yang Liu, Tongtong Cao, Yuan Ren, Liu Bingbing http://arxiv.org/pdf/2410.10782v1 null
2024-10-14 SpikeGS: Learning 3D Gaussian Fields from Continuous Spike Stream SpikeGS:从连续尖峰流中学习三维高斯场 Jinze Yu, Xin Peng, Zhengda Lu, Laurent Kneip, Yiqun Wang http://arxiv.org/pdf/2409.15176v5 link

3DGS

Publish Date Title Title_CN Authors PDF Code
2024-10-14 4-LEGS: 4D Language Embedded Gaussian Splatting 4-LEGS:四维语言嵌入高斯溅射 Gal Fiebelman, Tamir Cohen, Ayellet Morgenstern, Peter Hedman, Hadar Averbuch-Elor http://arxiv.org/pdf/2410.10719v1 null
2024-10-14 4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting 4DStyleGaussian:基于高斯溅射的零样本4D风格迁移 Wanlin Liang, Hongbin Xu, Weitao Chen, Feng Xiao, Wenxiong Kang http://arxiv.org/pdf/2410.10412v1 null
2024-10-14 Event3DGS: Event-Based 3D Gaussian Splatting for High-Speed Robot Egomotion 基于事件的三维高斯溅射方法在高速机器人自运动中的应用(Event3DGS) Tianyi Xiong, Jiayi Wu, Botao He, Cornelia Fermuller, Yiannis Aloimonos, Heng Huang, Christopher A. Metzler http://arxiv.org/pdf/2406.02972v4 null

Model Compression/Optimization

Publish Date Title Title_CN Authors PDF Code
2024-10-14 ROSAR: An Adversarial Re-Training Framework for Robust Side-Scan Sonar Object Detection ROSAR:一种用于鲁棒侧扫声纳目标检测的对抗性重训练框架 Martin Aubard, László Antal, Ana Madureira, Luis F. Teixeira, Erika Ábrahám http://arxiv.org/pdf/2410.10554v1 null
2024-10-14 QIANets: Quantum-Integrated Adaptive Networks for Reduced Latency and Improved Inference Times in CNN Models 量子集成自适应网络(QIANets):降低CNN模型延迟和提高推理时间的量子集成网络研究 Zhumazhan Balapanov, Edward Magongo, Vanessa Matvei, Olivia Holmberg, Jonathan Pei, Kevin Zhu http://arxiv.org/pdf/2410.10318v1 null
2024-10-14 LADMIM: Logical Anomaly Detection with Masked Image Modeling in Discrete Latent Space LADMIM:离散潜在空间中利用遮罩图像建模的逻辑异常检测方法 Shunsuke Sakai, Tatushito Hasegawa, Makoto Koshino http://arxiv.org/pdf/2410.10234v1 null
2024-10-14 REHRSeg: Unleashing the Power of Self-Supervised Super-Resolution for Resource-Efficient 3D MRI Segmentation REHRSeg:释放自监督超分辨率在资源高效3D MRI分割中的力量 Zhiyun Song, Yinjie Zhao, Xiaomin Li, Manman Fei, Xiangyu Zhao, Mengjun Liu, Cunjian Chen, Chung-Hsing Yeh, Qian Wang, Guoyan Zheng, et.al. http://arxiv.org/pdf/2410.10097v1 null
2024-10-14 Self-Distilled Depth Refinement with Noisy Poisson Fusion 自蒸馏深度细化与含噪泊松融合 Jiaqi Li, Yiran Wang, Jinghong Zheng, Zihao Huang, Ke Xian, Zhiguo Cao, Jianming Zhang http://arxiv.org/pdf/2409.17880v2 null
2024-10-14 ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers ADFQ-ViT:针对视觉变换器的激活分布友好型后训练量化方法 Yanfeng Jiang, Ning Sun, Xueshuo Xie, Fei Yang, Tao Li http://arxiv.org/pdf/2407.02763v2 null
2024-10-14 Improving Consistency Models with Generator-Induced Flows 提升一致性模型:引入生成器诱导流方法 Thibaut Issenhuth, Sangchul Lee, Ludovic Dos Santos, Jean-Yves Franceschi, Chansoo Kim, Alain Rakotomamonjy http://arxiv.org/pdf/2406.09570v2 link
2024-10-14 DD-RobustBench: An Adversarial Robustness Benchmark for Dataset Distillation DD-RobustBench:数据蒸馏对抗鲁棒性基准测试 Yifan Wu, Jiawei Du, Ping Liu, Yuewei Lin, Wei Xu, Wenqing Cheng http://arxiv.org/pdf/2403.13322v3 null

Classification/Detection/Recognition/Segmentation/...

Publish Date Title Title_CN Authors PDF Code
2024-10-14 When Does Perceptual Alignment Benefit Vision Representations? 感知对齐何时有利于视觉表征? Shobhita Sundaram, Stephanie Fu, Lukas Muttenthaler, Netanel Y. Tamir, Lucy Chai, Simon Kornblith, Trevor Darrell, Phillip Isola http://arxiv.org/pdf/2410.10817v1 null
2024-10-14 Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies 通用型人形机器人操作与改进的3D扩散策略 Yanjie Ze, Zixuan Chen, Wenhao Wang, Tianyi Chen, Xialin He, Ying Yuan, Xue Bin Peng, Jiajun Wu http://arxiv.org/pdf/2410.10803v1 link
2024-10-14 UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation UniMatch V2:半监督语义分割的极限探索 Lihe Yang, Zhen Zhao, Hengshuang Zhao http://arxiv.org/pdf/2410.10777v1 link
2024-10-14 Enhancing JEPAs with Spatial Conditioning: Robust and Efficient Representation Learning 增强JEPAs的空间条件化:稳健且高效的表示学习 Etai Littwin, Vimal Thilak, Anand Gopalakrishnan http://arxiv.org/pdf/2410.10773v1 null
2024-10-14 Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings 抗攻击稳健的分布外检测:基于李雅普诺夫稳定嵌入方法 Hossein Mirzaei, Mackenzie W. Mathis http://arxiv.org/pdf/2410.10744v1 null
2024-10-14 Benefiting from Quantum? A Comparative Study of Q-Seg, Quantum-Inspired Techniques, and U-Net for Crack Segmentation 量子优势何在?Q-Seg、量子启发技术与U-Net在裂缝分割中的比较研究 Akshaya Srinivasan, Alexander Geng, Antonio Macaluso, Maximilian Kiefer-Emmanouilidis, Ali Moghiseh http://arxiv.org/pdf/2410.10713v1 null
2024-10-14 Ensemble of ConvNeXt V2 and MaxViT for Long-Tailed CXR Classification with View-Based Aggregation ConvNeXt V2与MaxViT集成用于基于视角聚合的长尾X光分类 Yosuke Yamagishi, Shouhei Hanaoka http://arxiv.org/pdf/2410.10710v1 null
2024-10-14 Early Diagnoses of Acute Lymphoblastic Leukemia Using YOLOv8 and YOLOv11 Deep Learning Models 急性淋巴细胞白血病的早期诊断:使用YOLOv8和YOLOv11深度学习模型 Alaa Awad, Mohamed Hegazy, Salah A. Aly http://arxiv.org/pdf/2410.10701v1 null
2024-10-14 PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion PCF-Lift:基于概率对比融合的全景提升算法 Runsong Zhu, Shi Qiu, Qianyi Wu, Ka-Hei Hui, Pheng-Ann Heng, Chi-Wing Fu http://arxiv.org/pdf/2410.10659v1 null
2024-10-14 MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer MoTE:视觉-语言到视频知识迁移中平衡泛化与特化的方法 Minghao Zhu, Zhengpu Wang, Mengxian Hu, Ronghao Dang, Xiao Lin, Xun Zhou, Chengju Liu, Qijun Chen http://arxiv.org/pdf/2410.10589v1 null
2024-10-14 TopoFR: A Closer Look at Topology Alignment on Face Recognition TopoFR:对面部识别中的拓扑对齐的深入研究 Jun Dan, Yang Liu, Jiankang Deng, Haoyu Xie, Siyuan Li, Baigui Sun, Shan Luo http://arxiv.org/pdf/2410.10587v1 null
2024-10-14 Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification 基于视觉-语言模型的查询原型多实例学习在增量全切片图像分类中的应用 Jiaxiang Gou, Luping Ji, Pei Liu, Mao Ye http://arxiv.org/pdf/2410.10573v1 null
2024-10-14 Preserving Cardiac Integrity: A Topology-Infused Approach to Whole Heart Segmentation 保护心脏完整性:一种融入拓扑结构的全心脏分割方法 Chenyu Zhang, Wenxue Guan, Xiaodan Xing, Guan Yang http://arxiv.org/pdf/2410.10551v1 null
2024-10-14 RICASSO: Reinforced Imbalance Learning with Class-Aware Self-Supervised Outliers Exposure RICASSO: 基于类感知的自监督异常值暴露增强的失衡学习 Xuan Zhang, Sin Chee Chin, Tingxuan Gao, Wenming Yang http://arxiv.org/pdf/2410.10548v1 null
2024-10-14 Motion-guided small MAV detection in complex and non-planar scenes 运动引导的复杂非平面场景中小型MAV检测 Hanqing Guo, Canlun Zheng, Shiyu Zhao http://arxiv.org/pdf/2410.10527v1 null
2024-10-14 Exploiting Local Features and Range Images for Small Data Real-Time Point Cloud Semantic Segmentation 利用局部特征和范围图像进行小数据实时点云语义分割 Daniel Fusaro, Simone Mosco, Emanuele Menegatti, Alberto Pretto http://arxiv.org/pdf/2410.10510v1 null
2024-10-14 Continual Learning Improves Zero-Shot Action Recognition 连续学习提升零样本动作识别性能 Shreyank N Gowda, Davide Moltisanti, Laura Sevilla-Lara http://arxiv.org/pdf/2410.10497v1 null
2024-10-14 Advancing Newborn Care: Precise Birth Time Detection Using AI-Driven Thermal Imaging with Adaptive Normalization 推进新生儿护理:基于自适应归一化的AI驱动热成像精确出生时间检测技术 Jorge García-Torres, Øyvind Meinich-Bache, Anders Johannessen, Siren Rettedal, Vilde Kolstad, Kjersti Engan http://arxiv.org/pdf/2410.10483v1 null
2024-10-14 Improve Meta-learning for Few-Shot Text Classification with All You Can Acquire from the Tasks 利用任务中可获得的一切提升元学习在少样本文本分类中的应用 Xinyue Liu, Yunlong Gao, Linlin Zong, Bo Xu http://arxiv.org/pdf/2410.10454v1 null
2024-10-14 LKASeg:Remote-Sensing Image Semantic Segmentation with Large Kernel Attention and Full-Scale Skip Connections LKASeg:基于大核注意力与全尺度跳跃连接的遥感图像语义分割 Xuezhi Xiang, Yibo Ning, Lei Zhang, Denis Ombati, Himaloy Himu, Xiantong Zhen http://arxiv.org/pdf/2410.10433v1 null
2024-10-14 Reverse Refinement Network for Narrow Rural Road Detection in High-Resolution Satellite Imagery 高分辨率卫星影像中窄乡村道路检测的逆向精炼网络 Ningjing Wang, Xinyu Wang, Yang Pan, Wanqiang Yao, Yanfei Zhong http://arxiv.org/pdf/2410.10389v1 null
2024-10-14 V2M: Visual 2-Dimensional Mamba for Image Representation Learning V2M:面向图像表征学习的二维曼巴视觉模型 Chengkun Wang, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu http://arxiv.org/pdf/2410.10382v1 null
2024-10-14 Affinity-Graph-Guided Contractive Learning for Pretext-Free Medical Image Segmentation with Minimal Annotation 基于亲和图引导的紧缩学习无预处理的极小标注医学图像分割 Zehua Cheng, Di Yuan, Thomas Lukasiewicz http://arxiv.org/pdf/2410.10366v1 null
2024-10-14 Pubic Symphysis-Fetal Head Segmentation Network Using BiFormer Attention Mechanism and Multipath Dilated Convolution 基于BiFormer注意力机制与多路径扩张卷积的耻骨联合-胎儿头部分割网络 Pengzhou Cai, Lu Jiang, Yanxin Li, Xiaojuan Liu, Libin Lan http://arxiv.org/pdf/2410.10352v1 null
2024-10-14 GlobalMamba: Global Image Serialization for Vision Mamba GlobalMamba:视觉Mamba的全局图像序列化方法 Chengkun Wang, Wenzhao Zheng, Jie Zhou, Jiwen Lu http://arxiv.org/pdf/2410.10316v1 null
2024-10-14 ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object Detection ROA-BEV:基于BEV的3D目标检测的2D区域导向注意力机制 Jiwei Chen, Laiyan Ding, Chi Zhang, Feifei Li, Rui Huang http://arxiv.org/pdf/2410.10298v1 null
2024-10-14 Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection 细粒度异常提示学习用于零样本异常检测 Jiawen Zhu, Yew-Soon Ong, Chunhua Shen, Guansong Pang http://arxiv.org/pdf/2410.10289v1 null
2024-10-14 Manifold-Aware Local Feature Modeling for Semi-Supervised Medical Image Segmentation 流形感知的局部特征建模在半监督医学图像分割中的应用 Sicheng Shen, Jinming Cao, Yifang Yin, Roger Zimmermann http://arxiv.org/pdf/2410.10287v1 null
2024-10-14 Two-Stage Approach for Brain MR Image Synthesis: 2D Image Synthesis and 3D Refinement 两阶段脑部MR图像合成方法:2D图像合成与3D细化 Jihoon Cho, Seunghyuck Park, Jinah Park http://arxiv.org/pdf/2410.10269v1 null
2024-10-14 big.LITTLE Vision Transformer for Efficient Visual Recognition 面向高效视觉识别的big.LITTLE视觉Transformer He Guo, Yulong Wang, Zixuan Ye, Jifeng Dai, Yuwen Xiong http://arxiv.org/pdf/2410.10267v1 null
2024-10-14 Capture Artifacts via Progressive Disentangling and Purifying Blended Identities for Deepfake Detection 通过渐进式解耦与净化混合身份捕捉伪造痕迹用于深度伪造检测 Weijie Zhou, Xiaoqing Luo, Zhancheng Zhang, Jiachen He, Xiaojun Wu http://arxiv.org/pdf/2410.10244v1 link
2024-10-14 Innovative Deep Learning Techniques for Obstacle Recognition: A Comparative Study of Modern Detection Algorithms 创新深度学习技术在障碍物识别中的应用:现代检测算法比较研究 Santiago Pérez, Camila Gómez, Matías Rodríguez http://arxiv.org/pdf/2410.10096v1 null
2024-10-14 Out-of-Bounding-Box Triggers: A Stealthy Approach to Cheat Object Detectors for Adversarial Attacks 越界盒触发器:一种针对对抗性攻击的隐秘欺骗目标检测器方法 Tao Lin, Lijia Yu, Gaojie Jin, Renjue Li, Peng Wu, Lijun Zhang http://arxiv.org/pdf/2410.10091v1 null
2024-10-14 PointNet with KAN versus PointNet with MLP for 3D Classification and Segmentation of Point Sets 点集三维分类与分割中KAN-PointNet与MLP-PointNet的比较研究 Ali Kashefi http://arxiv.org/pdf/2410.10084v1 null
2024-10-14 LIME-Eval: Rethinking Low-light Image Enhancement Evaluation via Object Detection LIME-Eval:基于目标检测的低光照图像增强评估方法再思考 Mingjia Li, Hao Zhao, Xiaojie Guo http://arxiv.org/pdf/2410.08810v2 link
2024-10-14 Finetuning YOLOv9 for Vehicle Detection: Deep Learning for Intelligent Transportation Systems in Dhaka, Bangladesh 微调YOLOv9进行车辆检测:深度学习在孟加拉国达卡智能交通系统中的应用 Shahriar Ahmad Fahim http://arxiv.org/pdf/2410.08230v2 null
2024-10-14 MCTrack: A Unified 3D Multi-Object Tracking Framework for Autonomous Driving MCTrack:自动驾驶统一三维多目标跟踪框架 Xiyang Wang, Shouzheng Qi, Jieyou Zhao, Hangning Zhou, Siyu Zhang, Guoan Wang, Kai Tu, Songlin Guo, Jianbo Zhao, Jian Li, et.al. http://arxiv.org/pdf/2409.16149v2 link
2024-10-14 Automatic Classification of White Blood Cell Images using Convolutional Neural Network 利用卷积神经网络实现白细胞图像的自动分类 Rabia Asghar, Arslan Shaukat, Usman Akram, Rimsha Tariq http://arxiv.org/pdf/2409.13442v4 null
2024-10-14 MedSegMamba: 3D CNN-Mamba Hybrid Architecture for Brain Segmentation MedSegMamba:用于脑部分割的3D CNN-Mamba混合架构 Aaron Cao, Zongyu Li, Jordan Jomsky, Andrew F. Laine, Jia Guo http://arxiv.org/pdf/2409.08307v3 link
2024-10-14 AnyDesign: Versatile Area Fashion Editing via Mask-Free Diffusion AnyDesign:基于无掩码扩散的通用区域时尚编辑方法 Yunfang Niu, Lingxiang Wu, Dong Yi, Jie Peng, Ning Jiang, Haiying Wu, Jinqiao Wang http://arxiv.org/pdf/2408.11553v3 link
2024-10-14 Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies 检测音视频深度伪造中的细粒度不一致性 Marcella Astrid, Enjie Ghorbel, Djamila Aouada http://arxiv.org/pdf/2408.06753v3 null
2024-10-14 Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks 设计极致内存高效的卷积神经网络以支持设备端视觉任务 Jaewook Lee, Yoel Park, Seulki Lee http://arxiv.org/pdf/2408.03663v2 null
2024-10-14 ParCon: Noise-Robust Collaborative Perception via Multi-module Parallel Connection ParCon:基于多模块并行连接的噪声鲁棒协同感知 Hyunchul Bae, Minhee Kang, Heejin Ahn http://arxiv.org/pdf/2407.11546v2 null
2024-10-14 Exploring the Potential of Polynomial Basis Functions in Kolmogorov-Arnold Networks: A Comparative Study of Different Groups of Polynomials 探索多项式基函数在Kolmogorov-Arnold网络中的潜力:不同组多项式的比较研究 Seyd Teymoor Seydi http://arxiv.org/pdf/2406.02583v2 link
2024-10-14 Advancing Supervised Local Learning Beyond Classification with Long-term Feature Bank 推进监督式局部学习:基于长期特征银行的超越分类方法 Feiyu Zhu, Yuming Zhang, Changpeng Cai, Chenghao He, Xiuyuan Guo, Jiao Li, Peizhe Wang, Junhao Su, Jialin Gao http://arxiv.org/pdf/2406.00446v2 null
2024-10-14 Neural Collapse Meets Differential Privacy: Curious Behaviors of NoisyGD with Near-perfect Representation Learning 神经崩溃与差分隐私相遇:噪声GD在近乎完美表征学习中的好奇行为 Chendi Wang, Yuqing Zhu, Weijie J. Su, Yu-Xiang Wang http://arxiv.org/pdf/2405.08920v3 null
2024-10-14 Multi-scale direction-aware SAR object detection network via global information fusion 多尺度方向感知SAR目标检测网络:全局信息融合方法 Mingxiang Cao, Weiying Xie, Jie Lei, Jiaqing Zhang, Daixun Li, Yunsong Li http://arxiv.org/pdf/2312.16943v5 null

OCR

Publish Date Title Title_CN Authors PDF Code
2024-10-14 ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training ReLayout:基于布局增强预训练的面向真实世界文档理解方法 Zhouqiang Jiang, Bowen Wang, Junhao Chen, Yuta Nakashima http://arxiv.org/pdf/2410.10471v1 null

Image Understanding

Publish Date Title Title_CN Authors PDF Code
2024-10-14 Fusion-Driven Tree Reconstruction and Fruit Localization: Advancing Precision in Agriculture 融合驱动树重建与果实定位:提升农业精准度 Kaiming Fu, Peng Wei, Juan Villacres, Zhaodan Kong, Stavros G. Vougioukas, Brian N. Bailey http://arxiv.org/pdf/2310.15138v2 null

Transformer

Publish Date Title Title_CN Authors PDF Code
2024-10-14 Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention Virtual Worlds Cavia:基于视角集成注意力的摄像头可控多视角视频扩散虚拟世界方法 Dejia Xu, Yifan Jiang, Chen Huang, Liangchen Song, Thorsten Gernoth, Liangliang Cao, Zhangyang Wang, Hao Tang http://arxiv.org/pdf/2410.10774v1 null
2024-10-14 DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model 驱动道场数据集:推进互动性和知识增强的驾驶世界模型发展 Yuqi Wang, Ke Cheng, Jiawei He, Qitai Wang, Hengchen Dai, Yuntao Chen, Fei Xia, Zhaoxiang Zhang http://arxiv.org/pdf/2410.10738v1 null
2024-10-14 Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning 游戏玩法转换:强化学习中DCQN与DTQN架构的对比研究 William A. Stigall http://arxiv.org/pdf/2410.10660v1 null
2024-10-14 Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling 定制您的视觉自回归食谱:集自回归建模方法 Wenze Liu, Le Zhuo, Yi Xin, Sheng Xia, Peng Gao, Xiangyu Yue http://arxiv.org/pdf/2410.10511v1 null
2024-10-14 Domain-Conditioned Transformer for Fully Test-time Adaptation 域条件Transformer用于全测试时间自适应 Yushun Tang, Shuoshuo Chen, Jiyuan Jia, Yi Zhang, Zhihai He http://arxiv.org/pdf/2410.10442v1 null
2024-10-14 Parameterize Structure with Differentiable Template for 3D Shape Generation 参数化结构:用于三维形状生成的可微模板 Changfeng Ma, Pengxiao Guo, Shuangyu Yang, Yinuo Chen, Jie Guo, Chongjun Wang, Yanwen Guo, Wenping Wang http://arxiv.org/pdf/2410.10399v1 null
2024-10-14 FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification FasterDiT:无需架构修改的更快扩散Transformer训练方法 Jingfeng Yao, Wang Cheng, Wenyu Liu, Xinggang Wang http://arxiv.org/pdf/2410.10356v1 null
2024-10-14 A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration 一致性感知的斑点引导Transformer用于灵活与层次化点云配准 Renlang Huang, Yufan Tang, Jiming Chen, Liang Li http://arxiv.org/pdf/2410.10295v1 null
2024-10-14 KNN Transformer with Pyramid Prompts for Few-Shot Learning KNN金字塔提示Transformer用于少样本学习 Wenhao Li, Qiangchang Wang, Peng Zhao, Yilong Yin http://arxiv.org/pdf/2410.10227v1 null
2024-10-14 Interaction-Guided Two-Branch Image Dehazing Network 交互引导的双分支图像去雾网络 Huichun Liu, Xiaosong Li, Tianshu Tan http://arxiv.org/pdf/2410.10121v1 null
2024-10-14 Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models 专家混合模型的个性化:面向视觉-语言模型的联邦提示学习 Jun Luo, Chen Chen, Shandong Wu http://arxiv.org/pdf/2410.10114v1 null
2024-10-14 Accelerating Diffusion Transformers with Token-wise Feature Caching 加速基于Token-wise特征缓存的扩散Transformer模型 Chang Zou, Xuyang Liu, Ting Liu, Siteng Huang, Linfeng Zhang http://arxiv.org/pdf/2410.05317v2 link
2024-10-14 Learning to Balance: Diverse Normalization for Cloth-Changing Person Re-Identification 学习平衡:面向换衣人物再识别的多样化归一化技术 Hongjun Wang, Jiyuan Chen, Zhengwei Yin, Xuan Song, Yinqiang Zheng http://arxiv.org/pdf/2410.03977v2 null

3D/CG

Publish Date Title Title_CN Authors PDF Code
2024-10-14 Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes Sitcom-Crafter:三维场景中的剧情驱动人体运动生成系统 Jianqi Chen, Panwen Hu, Xiaojun Chang, Zhenwei Shi, Michael Christian Kampffmeyer, Xiaodan Liang http://arxiv.org/pdf/2410.10790v1 null
2024-10-14 Self-Assessed Generation: Trustworthy Label Generation for Optical Flow and Stereo Matching in Real-world 自评估生成:真实世界光学流与立体匹配的可信标签生成方法 Han Ling, Yinghui Sun, Quansen Sun, Ivor Tsang, Yuhui Zheng http://arxiv.org/pdf/2410.10453v1 null
2024-10-14 On Representation of 3D Rotation in the Context of Deep Learning 在深度学习背景下三维旋转的表示研究 Viktória Pravdová, Lukáš Gajdošech, Hassan Ali, Viktor Kocur http://arxiv.org/pdf/2410.10350v1 null
2024-10-14 Animate-X: Universal Character Image Animation with Enhanced Motion Representation Animate-X: 基于增强运动表征的通用角色图像动画技术 Shuai Tan, Biao Gong, Xiang Wang, Shiwei Zhang, Dandan Zheng, Ruobing Zheng, Kecheng Zheng, Jingdong Chen, Ming Yang http://arxiv.org/pdf/2410.10306v1 null
2024-10-14 Slide-based Graph Collaborative Training for Histopathology Whole Slide Image Analysis 基于幻灯片图谱的协同训练在组织病理学全切片图像分析中的应用 Jun Shi, Tong Shu, Zhiguo Jiang, Wei Wang, Haibo Wu, Yushan Zheng http://arxiv.org/pdf/2410.10260v1 null
2024-10-14 Fast and Accurate Neural Rendering Using Semi-Gradients 快速准确的半梯度神经渲染方法 In-Young Cho, Jaewoong Cho http://arxiv.org/pdf/2410.10149v1 null
2024-10-14 Hi-Mamba: Hierarchical Mamba for Efficient Image Super-Resolution Hi-Mamba:用于高效图像超分辨的层次化Mamba算法 Junbo Qiao, Jincheng Liao, Wei Li, Yulun Zhang, Yong Guo, Yi Wen, Zhangxizi Qiu, Jiao Xie, Jie Hu, Shaohui Lin http://arxiv.org/pdf/2410.10140v1 null
2024-10-14 ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video ScaleFlow++:视频中的稳健与精确三维运动估计 Han Ling, Yinghui Sun, Quansen Sun, Yuhui Zheng http://arxiv.org/pdf/2409.12202v2 link
2024-10-14 Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation 基于3D特征场的视觉-语言导航的模拟到真实迁移 Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Shuqiang Jiang http://arxiv.org/pdf/2406.09798v3 link
2024-10-14 GAIA: Rethinking Action Quality Assessment for AI-Generated Videos GAIA:重新审视AI生成视频中的动作质量评估 Zijian Chen, Wei Sun, Yuan Tian, Jun Jia, Zicheng Zhang, Jiarui Wang, Ru Huang, Xiongkuo Min, Guangtao Zhai, Wenjun Zhang http://arxiv.org/pdf/2406.06087v2 link
2024-10-14 Twisting Lids Off with Two Hands 两手持握解锁旋转盖 Toru Lin, Zhao-Heng Yin, Haozhi Qi, Pieter Abbeel, Jitendra Malik http://arxiv.org/pdf/2403.02338v2 null

Learning Paradigms

Publish Date Title Title_CN Authors PDF Code
2024-10-14 Exploring Semi-Supervised Learning for Online Mapping 探索在线建图中的半监督学习技术 Adam Lilja, Erik Wallin, Junsheng Fu, Lars Hammarstrand http://arxiv.org/pdf/2410.10279v1 null
2024-10-14 Unsupervised Point Cloud Completion through Unbalanced Optimal Transport 无监督点云补全通过非平衡最优传输实现 Taekyung Lee, Jaemoo Choi, Myungjoo Kang, Jaewoong Choi http://arxiv.org/pdf/2410.02671v2 null
2024-10-14 Learning Temporally Equivariance for Degenerative Disease Progression in OCT by Predicting Future Representations 学习时序等变特性以预测OCT中退行性疾病进展的未来表征 Taha Emre, Arunava Chakravarty, Dmitrii Lachinov, Antoine Rivail, Ursula Schmidt-Erfurth, Hrvoje Bogunović http://arxiv.org/pdf/2405.09404v2 link
2024-10-14 Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation 探索无标注图像字幕生成:基于检索增强的伪句子生成方法 Zhiyuan Li, Dongnan Liu, Heng Wang, Chaoyi Zhang, Weidong Cai http://arxiv.org/pdf/2307.14750v3 link

Others

Publish Date Title Title_CN Authors PDF Code
2024-10-14 LVD-2M: A Long-take Video Dataset with Temporally Dense Captions LVD-2M:具有时间密集型字幕的长镜头视频数据集 Tianwei Xiong, Yuqing Wang, Daquan Zhou, Zhijie Lin, Jiashi Feng, Xihui Liu http://arxiv.org/pdf/2410.10816v1 null
2024-10-14 Deep Linear Probe Generators for Weight Space Learning 深度线性探针生成器在权重空间学习中的应用 Jonathan Kahana, Eliahu Horwitz, Imri Shuval, Yedid Hoshen http://arxiv.org/pdf/2410.10811v1 null
2024-10-14 A Counterexample in Image Registration 图像配准中的一个反例 Serap A. Savari http://arxiv.org/pdf/2410.10725v1 null
2024-10-14 Artificial Intelligence-Based Triaging of Cutaneous Melanocytic Lesions 基于人工智能的皮肤黑素细胞病变分拣方法 Ruben T. Lucassen, Nikolas Stathonikos, Gerben E. Breimer, Mitko Veta, Willeke A. M. Blokx http://arxiv.org/pdf/2410.10509v1 null
2024-10-14 A Novel No-Reference Image Quality Metric For Assessing Sharpness In Satellite Imagery 一种新型无参考图像质量评价指标:评估卫星图像清晰度 Lucas Gonzalo Antonel http://arxiv.org/pdf/2410.10488v1 null
2024-10-14 PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation PIVOT-R:面向机器人操作的原语驱动路径点感知世界模型 Kaidong Zhang, Pengzhen Ren, Bingqian Lin, Junfan Lin, Shikui Ma, Hang Xu, Xiaodan Liang http://arxiv.org/pdf/2410.10394v1 null
2024-10-14 Automated extraction of 4D aircraft trajectories from video recordings 4D飞机轨迹从视频记录中的自动提取 Jean-François Villeforceix http://arxiv.org/pdf/2410.10249v1 null
2024-10-14 LOBG: Less Overfitting for Better Generalization in Vision-Language Model LOBG:降低视觉-语言模型过拟合以提升泛化能力 Chenhao Ding, Xinyuan Gao, Songlin Dong, Yuhang He, Qiang Wang, Alex Kot, Yihong Gong http://arxiv.org/pdf/2410.10247v1 null
2024-10-14 MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting MuseTalk:基于潜在空间修复的实时高质量唇同步技术 Yue Zhang, Minhao Liu, Zhaokang Chen, Bin Wu, Yubin Zeng, Chao Zhan, Yingjie He, Junxin Huang, Wenjiang Zhou http://arxiv.org/pdf/2410.10122v1 null
2024-10-14 Can We Predict Performance of Large Models across Vision-Language Tasks? 能否预测大型模型在视觉-语言任务中的性能表现? Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould http://arxiv.org/pdf/2410.10112v1 null
2024-10-14 Learning to Customize Text-to-Image Diffusion In Diverse Context 学习在多样化情境中定制文本到图像扩散 Taewook Kim, Wei Chen, Qiang Qiu http://arxiv.org/pdf/2410.10058v1 null
2024-10-14 Enhancing Performance of Point Cloud Completion Networks with Consistency Loss 增强点云补全网络性能的一致性损失方法 Christofel Rio Goenawan, Kevin Tirta Wijaya, Seung-Hyun Kong http://arxiv.org/pdf/2410.07298v2 null
2024-10-14 Autoencoded Image Compression for Secure and Fast Transmission 自编码图像压缩实现安全快速传输 Aryan Kashyap Naveen, Sunil Thunga, Anuhya Murki, Mahati A Kalale, Shriya Anil http://arxiv.org/pdf/2407.03990v2 link
2024-10-14 A Review of Electromagnetic Elimination Methods for low-field portable MRI scanner 低场便携式MRI扫描仪电磁消除方法综述 Wanyu Bian, Panfeng Li, Mengyao Zheng, Chihang Wang, Anying Li, Ying Li, Haowei Ni, Zixuan Zeng http://arxiv.org/pdf/2406.17804v2 null
2024-10-14 AdaPose: Towards Cross-Site Device-Free Human Pose Estimation with Commodity WiFi AdaPose:基于商用WiFi的跨站点无设备人体姿态估计研究 Yunjiao Zhou, Jianfei Yang, He Huang, Lihua Xie http://arxiv.org/pdf/2309.16964v2 null
2024-10-14 AR-TTA: A Simple Method for Real-World Continual Test-Time Adaptation AR-TTA:一种面向真实世界持续测试时适应的简单方法 Damian Sójka, Sebastian Cygert, Bartłomiej Twardowski, Tomasz Trzciński http://arxiv.org/pdf/2309.10109v2 null
2024-10-14 MT-SNN: Enhance Spiking Neural Network with Multiple Thresholds MT-SNN:多阈值增强型脉冲神经网络 Xiaoting Wang, Yanxiang Zhang http://arxiv.org/pdf/2303.11127v2 null