Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-14 | Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models | Tex4D: 零样本4D场景纹理生成与视频扩散模型 | Jingzhi Bao, Xueting Li, Ming-Hsuan Yang | http://arxiv.org/pdf/2410.10821v1 | null |
2024-10-14 | Depth Any Video with Scalable Synthetic Data | Depth Any Video:基于可扩展合成数据的任意视频深度估计 | Honghui Yang, Di Huang, Wei Yin, Chunhua Shen, Haifeng Liu, Xiaofei He, Binbin Lin, Wanli Ouyang, Tong He | http://arxiv.org/pdf/2410.10815v1 | null |
2024-10-14 | HART: Efficient Visual Generation with Hybrid Autoregressive Transformer | HART:基于混合自回归变换器的高效视觉生成 | Haotian Tang, Yecheng Wu, Shang Yang, Enze Xie, Junsong Chen, Junyu Chen, Zhuoyang Zhang, Han Cai, Yao Lu, Song Han | http://arxiv.org/pdf/2410.10812v1 | null |
2024-10-14 | TrajDiffuse: A Conditional Diffusion Model for Environment-Aware Trajectory Prediction | TrajDiffuse:一种环境感知的条件扩散模型用于轨迹预测 | Qingze Liu, Danrui Li, Samuel S. Sohn, Sejong Yoon, Mubbasir Kapadia, Vladimir Pavlovic | http://arxiv.org/pdf/2410.10804v1 | null |
2024-10-14 | Boosting Camera Motion Control for Video Diffusion Transformers | 增强视频扩散变换器的摄像头运动控制 | Soon Yau Cheong, Duygu Ceylan, Armin Mustafa, Andrew Gilbert, Chun-Hao Paul Huang | http://arxiv.org/pdf/2410.10802v1 | null |
2024-10-14 | Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations | 使用矫正随机微分方程的语义图像反转与编辑 | Litu Rout, Yujia Chen, Nataniel Ruiz, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu | http://arxiv.org/pdf/2410.10792v1 | null |
2024-10-14 | ControlMM: Controllable Masked Motion Generation | ControlMM:可控遮罩动作生成方法 | Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul, Pu Wang, Hongfei Xue, Chen Chen, Chuan Guo, Junli Cao, Jian Ren, Sergey Tulyakov | http://arxiv.org/pdf/2410.10780v1 | null |
2024-10-14 | Adaptive Diffusion Terrain Generator for Autonomous Uneven Terrain Navigation | 自适应扩散地形生成器在自主不平坦地形导航中的应用 | Youwei Yu, Junhong Xu, Lantao Liu | http://arxiv.org/pdf/2410.10766v1 | null |
2024-10-14 | DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships | DragEntity:基于实体与位置关系的轨迹引导视频生成技术 | Zhang Wan, Sheng Tang, Jiawei Wei, Ruize Zhang, Juan Cao | http://arxiv.org/pdf/2410.10751v1 | null |
2024-10-14 | FlexGen: Flexible Multi-View Generation from Text and Image Inputs | FlexGen:基于文本和图像输入的灵活多视角生成方法 | Xinli Xu, Wenhang Ge, Jiantao Lin, Jiawei Feng, Lie Xu, HanFeng Zhao, Shunsi Zhang, Ying-Cong Chen | http://arxiv.org/pdf/2410.10745v1 | null |
2024-10-14 | Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models | 深度压缩自编码器用于高效高分辨率扩散模型 | Junyu Chen, Han Cai, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, Song Han | http://arxiv.org/pdf/2410.10733v1 | null |
2024-10-14 | TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model | TALK-Act: 基于扩散模型的二维说话虚拟人复现中的纹理感知增强方法 | Jiazhi Guan, Quanwei Yang, Kaisiyuan Wang, Hang Zhou, Shengyi He, Zhiliang Xu, Haocheng Feng, Errui Ding, Jingdong Wang, Hongtao Xie, et al. | http://arxiv.org/pdf/2410.10696v1 | null |
2024-10-14 | Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation | 双耳全开:迈向语言驱动的空间音频生成技术 | Peiwen Sun, Sitong Cheng, Xiangtai Li, Zhen Ye, Huadai Liu, Honggang Zhang, Wei Xue, Yike Guo | http://arxiv.org/pdf/2410.10676v1 | null |
2024-10-14 | SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers | SANA:基于线性扩散变换器的高效高分辨率图像合成 | Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Yujun Lin, Zhekai Zhang, Muyang Li, Yao Lu, Song Han | http://arxiv.org/pdf/2410.10629v1 | null |
2024-10-14 | Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing | 视觉引导与遮罩增强的自适应去噪在基于提示的图像编辑中的应用 | Kejie Wang, Xuemeng Song, Meng Liu, Weili Guan, Liqiang Nie | http://arxiv.org/pdf/2410.10496v1 | null |
2024-10-14 | Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models | 针对个性化文本到图像扩散模型中未经授权数据使用的可靠验证研究 | Boheng Li, Yanhao Wei, Yankai Fu, Zhenting Wang, Yiming Li, Jie Zhang, Run Wang, Tianwei Zhang | http://arxiv.org/pdf/2410.10437v1 | null |
2024-10-14 | DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model | DOME:将扩散模型驯化为高保真可控占用世界模型 | Songen Gu, Wei Yin, Bu Jin, Xiaoyang Guo, Junming Wang, Haodong Li, Qian Zhang, Xiaoxiao Long | http://arxiv.org/pdf/2410.10429v1 | null |
2024-10-14 | Anatomical feature-prioritized loss for enhanced MR to CT translation | 基于解剖特征优先级的增强型MR到CT转换损失函数研究 | Arthur Longuefosse, Baudouin Denis de Senneville, Gael Dournes, Ilyes Benlala, Pascal Desbarats, Fabien Baldacci | http://arxiv.org/pdf/2410.10328v1 | null |
2024-10-14 | LG-CAV: Train Any Concept Activation Vector with Language Guidance | LG-CAV:使用语言指导训练任意概念激活向量 | Qihan Huang, Jie Song, Mengqi Xue, Haofei Zhang, Bingde Hu, Huiqiong Wang, Hao Jiang, Xingen Wang, Mingli Song | http://arxiv.org/pdf/2410.10308v1 | null |
2024-10-14 | Saliency Guided Optimization of Diffusion Latents | 显著性引导的扩散潜在优化研究 | Xiwen Wang, Jizhe Zhou, Xuekang Zhu, Cheng Li, Mao Li | http://arxiv.org/pdf/2410.10257v1 | null |
2024-10-14 | Detecting Unforeseen Data Properties with Diffusion Autoencoder Embeddings using Spine MRI data | 使用脊柱MRI数据的扩散自编码器嵌入检测未预见的数据属性 | Robert Graf, Florian Hunecke, Soeren Pohl, Matan Atad, Hendrik Moeller, Sophie Starck, Thomas Kroencke, Stefanie Bette, Fabian Bamberg, Tobias Pischon, et al. | http://arxiv.org/pdf/2410.10220v1 | null |
2024-10-14 | MagicEraser: Erasing Any Objects via Semantics-Aware Control | MagicEraser:通过语义感知控制擦除任意对象 | Fan Li, Zixiao Zhang, Yi Huang, Jianzhuang Liu, Renjing Pei, Bin Shao, Songcen Xu | http://arxiv.org/pdf/2410.10207v1 | null |
2024-10-14 | Identity-Focused Inference and Extraction Attacks on Diffusion Models | 针对扩散模型的以身份聚焦的推理与提取攻击 | Jayneel Vora, Aditya Krishnan, Nader Bouacida, Prabhu RV Shankar, Prasant Mohapatra | http://arxiv.org/pdf/2410.10177v1 | null |
2024-10-14 | Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization | 生成式人类视频压缩:多粒度时间轨迹分解 | Shanzhi Yin, Bolin Chen, Shiqi Wang, Yan Ye | http://arxiv.org/pdf/2410.10171v1 | null |
2024-10-14 | First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending | 首先构建背景再渲染文本:视觉文本融合的一种新范式 | Zhenhang Li, Yan Shu, Weichao Zeng, Dongbao Yang, Yu Zhou | http://arxiv.org/pdf/2410.10168v1 | link |
2024-10-14 | Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models? | 纳入生成数据是否会放大未来图像分类模型的跨代偏见? | Zeliang Zhang, Xin Liang, Mingqian Feng, Susan Liang, Chenliang Xu | http://arxiv.org/pdf/2410.10160v1 | null |
2024-10-14 | TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control | 基于先验引导控制的扩散场景文本编辑方法(TextCtrl) | Weichao Zeng, Yan Shu, Zhenhang Li, Dongbao Yang, Yu Zhou | http://arxiv.org/pdf/2410.10133v1 | null |
2024-10-14 | StegaINR4MIH: steganography by implicit neural representation for multi-image hiding | 隐式神经表示的多图像隐藏隐写术:StegaINR4MIH | Weina Dong, Jia Liu, Lifeng Chen, Wenquan Sun, Xiaozhong Pan, Yan Ke | http://arxiv.org/pdf/2410.10117v1 | null |
2024-10-14 | High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity | 高精度二分图像分割:基于探测扩散能力的算法 | Qian Yu, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Bo Li, Lihe Zhang, Huchuan Lu | http://arxiv.org/pdf/2410.10105v1 | null |
2024-10-14 | The Ingredients for Robotic Diffusion Transformers | 机器人扩散变换器的组成要素 | Sudeep Dasari, Oier Mees, Sebastian Zhao, Mohan Kumar Srirama, Sergey Levine | http://arxiv.org/pdf/2410.10088v1 | null |
2024-10-14 | DINTR: Tracking via Diffusion-based Interpolation | DINTR:基于扩散插值的跟踪方法 | Pha Nguyen, Ngan Le, Jackson Cothren, Alper Yilmaz, Khoa Luu | http://arxiv.org/pdf/2410.10053v1 | null |
2024-10-14 | Gait Sequence Upsampling using Diffusion Models for Single LiDAR Sensors | 使用扩散模型对单LiDAR传感器进行步态序列上采样 | Jeongho Ahn, Kazuto Nakashima, Koki Yoshino, Yumi Iwashita, Ryo Kurazume | http://arxiv.org/pdf/2410.08680v2 | null |
2024-10-14 | Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation | Hallo2:长时域高分辨率音频驱动的肖像图像动画技术 | Jiahao Cui, Hui Li, Yao Yao, Hao Zhu, Hanlin Shang, Kaihui Cheng, Hang Zhou, Siyu Zhu, Jingdong Wang | http://arxiv.org/pdf/2410.07718v2 | null |
2024-10-14 | SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting | SceneDreamer360:基于全景高斯溅射的文本驱动3D一致性场景生成 | Wenrui Li, Fucheng Cai, Yapeng Mi, Zhe Yang, Wangmeng Zuo, Xingtao Wang, Xiaopeng Fan | http://arxiv.org/pdf/2408.13711v2 | link |
2024-10-14 | Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | Show-o:单一Transformer统一多模态理解与生成 | Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou | http://arxiv.org/pdf/2408.12528v5 | null |
2024-10-14 | JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation | JointDreamer:通过联合得分蒸馏确保文本到3D生成中的几何一致性及文本一致性 | Chenhan Jiang, Yihan Zeng, Tianyang Hu, Songcun Xu, Wei Zhang, Hang Xu, Dit-Yan Yeung | http://arxiv.org/pdf/2407.12291v2 | null |
2024-10-14 | VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation | VideoScore:构建自动度量标准以模拟视频生成中的细粒度人类反馈 | Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, et al. | http://arxiv.org/pdf/2406.15252v3 | null |
2024-10-14 | Extracting Training Data from Unconditional Diffusion Models | 从无条件扩散模型中提取训练数据 | Yunhao Chen, Xingjun Ma, Difan Zou, Yu-Gang Jiang | http://arxiv.org/pdf/2406.12752v2 | null |
2024-10-14 | VideoTetris: Towards Compositional Text-to-Video Generation | VideoTetris:迈向组合式文本到视频生成的探索 | Ye Tian, Ling Yang, Haotian Yang, Yuan Gao, Yufan Deng, Jingmin Chen, Xintao Wang, Zhaochen Yu, Xin Tao, Pengfei Wan, et al. | http://arxiv.org/pdf/2406.04277v2 | link |
2024-10-14 | TotalVibeSegmentator: Full Torso Segmentation for the NAKO and UK Biobank in Volumetric Interpolated Breath-hold Examination Body Images | TotalVibeSegmentator: 针对NAKO和英国生物样本库在全容积插值屏气检查体图像中的全身躯干分割 | Robert Graf, Paul-Sören Platzek, Evamaria Olga Riedel, Constanze Ramschütz, Sophie Starck, Hendrik Kristian Möller, Matan Atad, Henry Völzke, Robin Bülow, Carsten Oliver Schmidt, et al. | http://arxiv.org/pdf/2406.00125v2 | link |
2024-10-14 | Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective | 解构扩散模型的平滑性特性:高斯混合视角研究 | Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou | http://arxiv.org/pdf/2405.16418v2 | null |
2024-10-14 | Sign Stitching: A Novel Approach to Sign Language Production | 手语拼接:一种手语生成的新方法 | Harry Walsh, Ben Saunders, Richard Bowden | http://arxiv.org/pdf/2405.07663v2 | link |
2024-10-14 | Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing | 编辑您的动作:时空扩散解耦学习用于视频动作编辑 | Yi Zuo, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Shuyuan Yang, Yuwei Guo | http://arxiv.org/pdf/2405.04496v2 | null |
2024-10-14 | Generative inpainting of incomplete Euclidean distance matrices of trajectories generated by a fractional Brownian motion | 生成式修复:基于分数布朗运动的轨迹生成的不完全欧几里得距离矩阵 | Alexander Lobashev, Dmitry Guskov, Kirill Polovnikov | http://arxiv.org/pdf/2404.07029v2 | link |
2024-10-14 | Geometry-Informed Neural Networks | 几何信息增强的神经网络 | Arturs Berzins, Andreas Radler, Eric Volkmann, Sebastian Sanokowski, Sepp Hochreiter, Johannes Brandstetter | http://arxiv.org/pdf/2402.14009v3 | null |
2024-10-14 | RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models | RealCompo:平衡真实性与构成性提升文本到图像扩散模型性能 | Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Kai-Ni Wang, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, et al. | http://arxiv.org/pdf/2402.12908v3 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-14 | TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models | TemporalBench:多模态视频模型细粒度时间理解基准测试 | Mu Cai, Reuben Tan, Jianrui Zhang, Bocheng Zou, Kai Zhang, Feng Yao, Fangrui Zhu, Jing Gu, Yiwu Zhong, Yuzhang Shang, et al. | http://arxiv.org/pdf/2410.10818v1 | null |
2024-10-14 | Towards Foundation Models for 3D Vision: How Close Are We? | 迈向三维视觉的基础模型:我们还有多远? | Yiming Zuo, Karhan Kayan, Maggie Wang, Kevin Jeon, Jia Deng, Thomas L. Griffiths | http://arxiv.org/pdf/2410.10799v1 | null |
2024-10-14 | MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling | MMAR:迈向无损多模态自回归概率建模 | Jian Yang, Dacheng Yin, Yizhou Zhou, Fengyun Rao, Wei Zhai, Yang Cao, Zheng-Jun Zha | http://arxiv.org/pdf/2410.10798v1 | null |
2024-10-14 | Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes | 条件感知的多模态融合方法用于驾驶场景的鲁棒语义感知 | Tim Broedermann, Christos Sakaridis, Yuqian Fu, Luc Van Gool | http://arxiv.org/pdf/2410.10791v1 | null |
2024-10-14 | LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content | LiveXiv——基于Arxiv论文内容的多模态实时评测基准 | Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh, Wei Lin, M. Jehanzeb Mirza, Leshem Chosen, Mikhail Yurochkin, Yuekai Sun, Assaf Arbelle, Leonid Karlinsky, et al. | http://arxiv.org/pdf/2410.10783v1 | null |
2024-10-14 | Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework | 跨模态少样本学习:一种生成式迁移学习框架 | Zhengwei Yang, Yuke Li, Qiang Sun, Basura Fernando, Heng Huang, Zheng Wang | http://arxiv.org/pdf/2410.10663v1 | null |
2024-10-14 | BrainMVP: Multi-modal Vision Pre-training for Brain Image Analysis using Multi-parametric MRI | BrainMVP:使用多参数MRI进行脑图像分析的多模态视觉预训练 | Shaohao Rui, Lingzhi Chen, Zhenyu Tang, Lilong Wang, Mianxin Liu, Shaoting Zhang, Xiaosong Wang | http://arxiv.org/pdf/2410.10604v1 | null |
2024-10-14 | VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents | VisRAG: 基于视觉的多模态文档检索增强生成方法 | Shi Yu, Chaoyue Tang, Bokai Xu, Junbo Cui, Junhao Ran, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, et al. | http://arxiv.org/pdf/2410.10594v1 | null |
2024-10-14 | MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks | MEGA-Bench:将多模态评估扩展至超过500个真实世界任务 | Jiacheng Chen, Tianhao Liang, Sherman Siu, Zhengqing Wang, Kai Wang, Yubo Wang, Yuansheng Ni, Wang Zhu, Ziyan Jiang, Bohan Lyu, et al. | http://arxiv.org/pdf/2410.10563v1 | null |
2024-10-14 | Hybrid Transformer for Early Alzheimer's Detection: Integration of Handwriting-Based 2D Images and 1D Signal Features | 混合Transformer用于早期阿尔茨海默病检测:基于手写书法的二维图像与一维信号特征融合 | Changqing Gong, Huafeng Qin, Mounîm A. El-Yacoubi | http://arxiv.org/pdf/2410.10547v1 | null |
2024-10-14 | Learning to Ground VLMs without Forgetting | 在不遗忘的前提下学习VLM定位能力 | Aritra Bhowmik, Mohammad Mahdi Derakhshani, Dennis Koelma, Martin R. Oswald, Yuki M. Asano, Cees G. M. Snoek | http://arxiv.org/pdf/2410.10491v1 | null |
2024-10-14 | Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs | Free Video-LLM:提示引导的视觉感知实现高效无训练视频LLM | Kai Han, Jianyuan Guo, Yehui Tang, Wei He, Enhua Wu, Yunhe Wang | http://arxiv.org/pdf/2410.10441v1 | null |
2024-10-14 | Class Balancing Diversity Multimodal Ensemble for Alzheimer's Disease Diagnosis and Early Detection | 类平衡多样性多模态集成算法在阿尔茨海默病诊断与早期检测中的应用 | Arianna Francesconi, Lazzaro di Biase, Donato Cappetta, Fabio Rebecchi, Paolo Soda, Rosa Sicilia, Valerio Guarrasi | http://arxiv.org/pdf/2410.10374v1 | null |
2024-10-14 | Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation | 空间感知高效投影器:通过多层特征聚合实现多模态大型语言模型优化 | Shun Qian, Bingquan Liu, Chengjie Sun, Zhen Xu, Baoxun Wang | http://arxiv.org/pdf/2410.10319v1 | null |
2024-10-14 | ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization | ForgeryGPT:用于可解释图像伪造检测与定位的多模态大型语言模型 | Jiawei Li, Fanrui Zhang, Jiaying Zhu, Esther Sun, Qiang Zhang, Zheng-Jun Zha | http://arxiv.org/pdf/2410.10238v1 | null |
2024-10-14 | Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention | 消除视觉问答中的语言偏差:利用细粒度因果干预方法 | Ying Liu, Ge Bai, Chenji Lu, Shilong Li, Zhang Zhang, Ruifang Liu, Wenbin Guo | http://arxiv.org/pdf/2410.10184v1 | null |
2024-10-14 | X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing | X-Fi: 多模态人体感知的模态不变基础模型 | Xinyan Chen, Jianfei Yang | http://arxiv.org/pdf/2410.10167v1 | null |
2024-10-14 | Performance Evaluation of Deep Learning and Transformer Models Using Multimodal Data for Breast Cancer Classification | 深度学习与变换器模型在乳腺癌分类中利用多模态数据的性能评估 | Sadam Hussain, Mansoor Ali, Usman Naseem, Beatriz Alejandra Bosques Palomo, Mario Alexis Monsivais Molina, Jorge Alberto Garza Abdala, Daly Betzabeth Avendano Avalos, Servando Cardona-Huerta, T. Aaron Gulliver, Jose Gerardo Tamez Pena | http://arxiv.org/pdf/2410.10146v1 | null |
2024-10-14 | MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models | 大规模多模态交错理解基准MMIE:面向大型视觉-语言模型 | Peng Xia, Siwei Han, Shi Qiu, Yiyang Zhou, Zhaoyang Wang, Wenhao Zheng, Zhaorun Chen, Chenhang Cui, Mingyu Ding, Linjie Li, et al. | http://arxiv.org/pdf/2410.10139v1 | null |
2024-10-14 | Bridging the Gap between Text, Audio, Image, and Any Sequence: A Novel Approach using Gloss-based Annotation | 桥接文本、音频、图像与任意序列之间的差距:一种基于词汇注释的新型方法 | Sen Fang, Sizhou Chen, Yalin Feng, Xiaofeng Zhang, Teik Toe Teoh | http://arxiv.org/pdf/2410.03146v2 | null |
2024-10-14 | Benchmarking Vision Language Models for Cultural Understanding | 视觉语言模型在文化理解方面的基准测试研究 | Shravan Nayak, Kanishk Jain, Rabiul Awal, Siva Reddy, Sjoerd van Steenkiste, Lisa Anne Hendricks, Karolina Stańczak, Aishwarya Agrawal | http://arxiv.org/pdf/2407.10920v3 | null |
2024-10-14 | Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Large Multi-modal Models | 提升低收入数据:大型多模态模型中的社会经济视角转换策略 | Joan Nwatu, Oana Ignat, Rada Mihalcea | http://arxiv.org/pdf/2407.02623v3 | link |
2024-10-14 | The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | 多模态大型语言模型在视频时刻检索中的惊人有效性 | Boris Meinardus, Anil Batra, Anna Rohrbach, Marcus Rohrbach | http://arxiv.org/pdf/2406.18113v3 | link |
2024-10-14 | White-box Multimodal Jailbreaks Against Large Vision-Language Models | 针对大型视觉-语言模型的白盒多模态越狱攻击 | Ruofan Wang, Xingjun Ma, Hanxu Zhou, Chuanjun Ji, Guangnan Ye, Yu-Gang Jiang | http://arxiv.org/pdf/2405.17894v2 | null |
2024-10-14 | Improving Multimodal Learning with Multi-Loss Gradient Modulation | 多损失梯度调制改善多模态学习性能 | Konstantinos Kontras, Christos Chatzichristos, Matthew Blaschko, Maarten De Vos | http://arxiv.org/pdf/2405.07930v2 | null |
2024-10-14 | HAMMR: HierArchical MultiModal React agents for generic VQA | HAMMR:用于通用视觉问答的层次多模态反应智能体 | Lluis Castrejon, Thomas Mensink, Howard Zhou, Vittorio Ferrari, Andre Araujo, Jasper Uijlings | http://arxiv.org/pdf/2404.05465v2 | null |
2024-10-14 | TV-SAM: Increasing Zero-Shot Segmentation Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation | TV-SAM:利用GPT-4生成的描述性提示提升多模态医学图像零样本分割性能,无需人工标注 | Zekun Jiang, Dongjie Cheng, Ziyuan Qin, Jun Gao, Qicheng Lao, Abdullaev Bakhrom Ismoilovich, Urazboev Gayrat, Yuldashov Elyorbek, Bekchanov Habibullo, Defu Tang, et al. | http://arxiv.org/pdf/2402.15759v2 | link |
2024-10-14 | Revisiting Few-Shot Object Detection with Vision-Language Models | 重新审视基于视觉-语言模型的少样本目标检测 | Anish Madan, Neehar Peri, Shu Kong, Deva Ramanan | http://arxiv.org/pdf/2312.14494v4 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-14 | 3DArticCyclists: Generating Simulated Dynamic 3D Cyclists for Human-Object Interaction (HOI) and Autonomous Driving Applications | 3DArticCyclists:为人体-物体交互(HOI)和自动驾驶应用生成模拟动态3D骑行者 | Eduardo R. Corral-Soto, Yang Liu, Tongtong Cao, Yuan Ren, Liu Bingbing | http://arxiv.org/pdf/2410.10782v1 | null |
2024-10-14 | SpikeGS: Learning 3D Gaussian Fields from Continuous Spike Stream | SpikeGS:从连续尖峰流中学习三维高斯场 | Jinze Yu, Xin Peng, Zhengda Lu, Laurent Kneip, Yiqun Wang | http://arxiv.org/pdf/2409.15176v5 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-14 | 4-LEGS: 4D Language Embedded Gaussian Splatting | 4-LEGS:四维语言嵌入高斯溅射 | Gal Fiebelman, Tamir Cohen, Ayellet Morgenstern, Peter Hedman, Hadar Averbuch-Elor | http://arxiv.org/pdf/2410.10719v1 | null |
2024-10-14 | 4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting | 4DStyleGaussian:基于高斯溅射的零样本4D风格迁移 | Wanlin Liang, Hongbin Xu, Weitao Chen, Feng Xiao, Wenxiong Kang | http://arxiv.org/pdf/2410.10412v1 | null |
2024-10-14 | Event3DGS: Event-Based 3D Gaussian Splatting for High-Speed Robot Egomotion | Event3DGS:面向高速机器人自运动的基于事件的三维高斯溅射 | Tianyi Xiong, Jiayi Wu, Botao He, Cornelia Fermuller, Yiannis Aloimonos, Heng Huang, Christopher A. Metzler | http://arxiv.org/pdf/2406.02972v4 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-14 | ROSAR: An Adversarial Re-Training Framework for Robust Side-Scan Sonar Object Detection | ROSAR:一种用于鲁棒侧扫声纳目标检测的对抗性重训练框架 | Martin Aubard, László Antal, Ana Madureira, Luis F. Teixeira, Erika Ábrahám | http://arxiv.org/pdf/2410.10554v1 | null |
2024-10-14 | QIANets: Quantum-Integrated Adaptive Networks for Reduced Latency and Improved Inference Times in CNN Models | 量子集成自适应网络(QIANets):降低CNN模型延迟和提高推理时间的量子集成网络研究 | Zhumazhan Balapanov, Edward Magongo, Vanessa Matvei, Olivia Holmberg, Jonathan Pei, Kevin Zhu | http://arxiv.org/pdf/2410.10318v1 | null |
2024-10-14 | LADMIM: Logical Anomaly Detection with Masked Image Modeling in Discrete Latent Space | LADMIM:离散潜在空间中利用遮罩图像建模的逻辑异常检测方法 | Shunsuke Sakai, Tatushito Hasegawa, Makoto Koshino | http://arxiv.org/pdf/2410.10234v1 | null |
2024-10-14 | REHRSeg: Unleashing the Power of Self-Supervised Super-Resolution for Resource-Efficient 3D MRI Segmentation | REHRSeg:释放自监督超分辨率在资源高效3D MRI分割中的力量 | Zhiyun Song, Yinjie Zhao, Xiaomin Li, Manman Fei, Xiangyu Zhao, Mengjun Liu, Cunjian Chen, Chung-Hsing Yeh, Qian Wang, Guoyan Zheng, et al. | http://arxiv.org/pdf/2410.10097v1 | null |
2024-10-14 | Self-Distilled Depth Refinement with Noisy Poisson Fusion | 自蒸馏深度细化与含噪泊松融合 | Jiaqi Li, Yiran Wang, Jinghong Zheng, Zihao Huang, Ke Xian, Zhiguo Cao, Jianming Zhang | http://arxiv.org/pdf/2409.17880v2 | null |
2024-10-14 | ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers | ADFQ-ViT:针对视觉变换器的激活分布友好型后训练量化方法 | Yanfeng Jiang, Ning Sun, Xueshuo Xie, Fei Yang, Tao Li | http://arxiv.org/pdf/2407.02763v2 | null |
2024-10-14 | Improving Consistency Models with Generator-Induced Flows | 提升一致性模型:引入生成器诱导流方法 | Thibaut Issenhuth, Sangchul Lee, Ludovic Dos Santos, Jean-Yves Franceschi, Chansoo Kim, Alain Rakotomamonjy | http://arxiv.org/pdf/2406.09570v2 | link |
2024-10-14 | DD-RobustBench: An Adversarial Robustness Benchmark for Dataset Distillation | DD-RobustBench:数据蒸馏对抗鲁棒性基准测试 | Yifan Wu, Jiawei Du, Ping Liu, Yuewei Lin, Wei Xu, Wenqing Cheng | http://arxiv.org/pdf/2403.13322v3 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-14 | When Does Perceptual Alignment Benefit Vision Representations? | 感知对齐何时有利于视觉表征? | Shobhita Sundaram, Stephanie Fu, Lukas Muttenthaler, Netanel Y. Tamir, Lucy Chai, Simon Kornblith, Trevor Darrell, Phillip Isola | http://arxiv.org/pdf/2410.10817v1 | null |
2024-10-14 | Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies | 通用型人形机器人操作与改进的3D扩散策略 | Yanjie Ze, Zixuan Chen, Wenhao Wang, Tianyi Chen, Xialin He, Ying Yuan, Xue Bin Peng, Jiajun Wu | http://arxiv.org/pdf/2410.10803v1 | link |
2024-10-14 | UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation | UniMatch V2:半监督语义分割的极限探索 | Lihe Yang, Zhen Zhao, Hengshuang Zhao | http://arxiv.org/pdf/2410.10777v1 | link |
2024-10-14 | Enhancing JEPAs with Spatial Conditioning: Robust and Efficient Representation Learning | 增强JEPAs的空间条件化:稳健且高效的表示学习 | Etai Littwin, Vimal Thilak, Anand Gopalakrishnan | http://arxiv.org/pdf/2410.10773v1 | null |
2024-10-14 | Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings | 抗攻击稳健的分布外检测:基于李雅普诺夫稳定嵌入方法 | Hossein Mirzaei, Mackenzie W. Mathis | http://arxiv.org/pdf/2410.10744v1 | null |
2024-10-14 | Benefiting from Quantum? A Comparative Study of Q-Seg, Quantum-Inspired Techniques, and U-Net for Crack Segmentation | 量子优势何在?Q-Seg、量子启发技术与U-Net在裂缝分割中的比较研究 | Akshaya Srinivasan, Alexander Geng, Antonio Macaluso, Maximilian Kiefer-Emmanouilidis, Ali Moghiseh | http://arxiv.org/pdf/2410.10713v1 | null |
2024-10-14 | Ensemble of ConvNeXt V2 and MaxViT for Long-Tailed CXR Classification with View-Based Aggregation | ConvNeXt V2与MaxViT集成用于基于视角聚合的长尾X光分类 | Yosuke Yamagishi, Shouhei Hanaoka | http://arxiv.org/pdf/2410.10710v1 | null |
2024-10-14 | Early Diagnoses of Acute Lymphoblastic Leukemia Using YOLOv8 and YOLOv11 Deep Learning Models | 急性淋巴细胞白血病的早期诊断:使用YOLOv8和YOLOv11深度学习模型 | Alaa Awad, Mohamed Hegazy, Salah A. Aly | http://arxiv.org/pdf/2410.10701v1 | null |
2024-10-14 | PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion | PCF-Lift:基于概率对比融合的全景提升算法 | Runsong Zhu, Shi Qiu, Qianyi Wu, Ka-Hei Hui, Pheng-Ann Heng, Chi-Wing Fu | http://arxiv.org/pdf/2410.10659v1 | null |
2024-10-14 | MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer | MoTE:视觉-语言到视频知识迁移中平衡泛化与特化的方法 | Minghao Zhu, Zhengpu Wang, Mengxian Hu, Ronghao Dang, Xiao Lin, Xun Zhou, Chengju Liu, Qijun Chen | http://arxiv.org/pdf/2410.10589v1 | null |
2024-10-14 | TopoFR: A Closer Look at Topology Alignment on Face Recognition | TopoFR:对面部识别中的拓扑对齐的深入研究 | Jun Dan, Yang Liu, Jiankang Deng, Haoyu Xie, Siyuan Li, Baigui Sun, Shan Luo | http://arxiv.org/pdf/2410.10587v1 | null |
2024-10-14 | Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification | 基于视觉-语言模型的查询原型多实例学习在增量全切片图像分类中的应用 | Jiaxiang Gou, Luping Ji, Pei Liu, Mao Ye | http://arxiv.org/pdf/2410.10573v1 | null |
2024-10-14 | Preserving Cardiac Integrity: A Topology-Infused Approach to Whole Heart Segmentation | 保护心脏完整性:一种融入拓扑结构的全心脏分割方法 | Chenyu Zhang, Wenxue Guan, Xiaodan Xing, Guan Yang | http://arxiv.org/pdf/2410.10551v1 | null |
2024-10-14 | RICASSO: Reinforced Imbalance Learning with Class-Aware Self-Supervised Outliers Exposure | RICASSO: 基于类感知的自监督异常值暴露增强的失衡学习 | Xuan Zhang, Sin Chee Chin, Tingxuan Gao, Wenming Yang | http://arxiv.org/pdf/2410.10548v1 | null |
2024-10-14 | Motion-guided small MAV detection in complex and non-planar scenes | 运动引导的复杂非平面场景中小型MAV检测 | Hanqing Guo, Canlun Zheng, Shiyu Zhao | http://arxiv.org/pdf/2410.10527v1 | null |
2024-10-14 | Exploiting Local Features and Range Images for Small Data Real-Time Point Cloud Semantic Segmentation | 利用局部特征和范围图像进行小数据实时点云语义分割 | Daniel Fusaro, Simone Mosco, Emanuele Menegatti, Alberto Pretto | http://arxiv.org/pdf/2410.10510v1 | null |
2024-10-14 | Continual Learning Improves Zero-Shot Action Recognition | 连续学习提升零样本动作识别性能 | Shreyank N Gowda, Davide Moltisanti, Laura Sevilla-Lara | http://arxiv.org/pdf/2410.10497v1 | null |
2024-10-14 | Advancing Newborn Care: Precise Birth Time Detection Using AI-Driven Thermal Imaging with Adaptive Normalization | 推进新生儿护理:基于自适应归一化的AI驱动热成像精确出生时间检测技术 | Jorge García-Torres, Øyvind Meinich-Bache, Anders Johannessen, Siren Rettedal, Vilde Kolstad, Kjersti Engan | http://arxiv.org/pdf/2410.10483v1 | null |
2024-10-14 | Improve Meta-learning for Few-Shot Text Classification with All You Can Acquire from the Tasks | 利用任务中可获得的一切提升元学习在少样本文本分类中的应用 | Xinyue Liu, Yunlong Gao, Linlin Zong, Bo Xu | http://arxiv.org/pdf/2410.10454v1 | null |
2024-10-14 | LKASeg: Remote-Sensing Image Semantic Segmentation with Large Kernel Attention and Full-Scale Skip Connections | LKASeg:基于大核注意力与全尺度跳跃连接的遥感图像语义分割 | Xuezhi Xiang, Yibo Ning, Lei Zhang, Denis Ombati, Himaloy Himu, Xiantong Zhen | http://arxiv.org/pdf/2410.10433v1 | null |
2024-10-14 | Reverse Refinement Network for Narrow Rural Road Detection in High-Resolution Satellite Imagery | 高分辨率卫星影像中窄乡村道路检测的逆向精炼网络 | Ningjing Wang, Xinyu Wang, Yang Pan, Wanqiang Yao, Yanfei Zhong | http://arxiv.org/pdf/2410.10389v1 | null |
2024-10-14 | V2M: Visual 2-Dimensional Mamba for Image Representation Learning | V2M:面向图像表征学习的二维曼巴视觉模型 | Chengkun Wang, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu | http://arxiv.org/pdf/2410.10382v1 | null |
2024-10-14 | Affinity-Graph-Guided Contractive Learning for Pretext-Free Medical Image Segmentation with Minimal Annotation | 基于亲和图引导的紧缩学习无预处理的极小标注医学图像分割 | Zehua Cheng, Di Yuan, Thomas Lukasiewicz | http://arxiv.org/pdf/2410.10366v1 | null |
2024-10-14 | Pubic Symphysis-Fetal Head Segmentation Network Using BiFormer Attention Mechanism and Multipath Dilated Convolution | 基于BiFormer注意力机制与多路径扩张卷积的耻骨联合-胎儿头部分割网络 | Pengzhou Cai, Lu Jiang, Yanxin Li, Xiaojuan Liu, Libin Lan | http://arxiv.org/pdf/2410.10352v1 | null |
2024-10-14 | GlobalMamba: Global Image Serialization for Vision Mamba | GlobalMamba:视觉Mamba的全局图像序列化方法 | Chengkun Wang, Wenzhao Zheng, Jie Zhou, Jiwen Lu | http://arxiv.org/pdf/2410.10316v1 | null |
2024-10-14 | ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object Detection | ROA-BEV:基于BEV的3D目标检测的2D区域导向注意力机制 | Jiwei Chen, Laiyan Ding, Chi Zhang, Feifei Li, Rui Huang | http://arxiv.org/pdf/2410.10298v1 | null |
2024-10-14 | Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection | 细粒度异常提示学习用于零样本异常检测 | Jiawen Zhu, Yew-Soon Ong, Chunhua Shen, Guansong Pang | http://arxiv.org/pdf/2410.10289v1 | null |
2024-10-14 | Manifold-Aware Local Feature Modeling for Semi-Supervised Medical Image Segmentation | 流形感知的局部特征建模在半监督医学图像分割中的应用 | Sicheng Shen, Jinming Cao, Yifang Yin, Roger Zimmermann | http://arxiv.org/pdf/2410.10287v1 | null |
2024-10-14 | Two-Stage Approach for Brain MR Image Synthesis: 2D Image Synthesis and 3D Refinement | 两阶段脑部MR图像合成方法:2D图像合成与3D细化 | Jihoon Cho, Seunghyuck Park, Jinah Park | http://arxiv.org/pdf/2410.10269v1 | null |
2024-10-14 | big.LITTLE Vision Transformer for Efficient Visual Recognition | 面向高效视觉识别的big.LITTLE视觉Transformer | He Guo, Yulong Wang, Zixuan Ye, Jifeng Dai, Yuwen Xiong | http://arxiv.org/pdf/2410.10267v1 | null |
2024-10-14 | Capture Artifacts via Progressive Disentangling and Purifying Blended Identities for Deepfake Detection | 通过渐进式解耦与净化混合身份捕捉伪造痕迹用于深度伪造检测 | Weijie Zhou, Xiaoqing Luo, Zhancheng Zhang, Jiachen He, Xiaojun Wu | http://arxiv.org/pdf/2410.10244v1 | link |
2024-10-14 | Innovative Deep Learning Techniques for Obstacle Recognition: A Comparative Study of Modern Detection Algorithms | 创新深度学习技术在障碍物识别中的应用:现代检测算法比较研究 | Santiago Pérez, Camila Gómez, Matías Rodríguez | http://arxiv.org/pdf/2410.10096v1 | null |
2024-10-14 | Out-of-Bounding-Box Triggers: A Stealthy Approach to Cheat Object Detectors | 越界框触发器:一种隐秘欺骗目标检测器的方法 | Tao Lin, Lijia Yu, Gaojie Jin, Renjue Li, Peng Wu, Lijun Zhang | http://arxiv.org/pdf/2410.10091v1 | null |
2024-10-14 | PointNet with KAN versus PointNet with MLP for 3D Classification and Segmentation of Point Sets | 点集三维分类与分割中KAN-PointNet与MLP-PointNet的比较研究 | Ali Kashefi | http://arxiv.org/pdf/2410.10084v1 | null |
2024-10-14 | LIME-Eval: Rethinking Low-light Image Enhancement Evaluation via Object Detection | LIME-Eval:基于目标检测的低光照图像增强评估方法再思考 | Mingjia Li, Hao Zhao, Xiaojie Guo | http://arxiv.org/pdf/2410.08810v2 | link |
2024-10-14 | Finetuning YOLOv9 for Vehicle Detection: Deep Learning for Intelligent Transportation Systems in Dhaka, Bangladesh | 微调YOLOv9进行车辆检测:深度学习在孟加拉国达卡智能交通系统中的应用 | Shahriar Ahmad Fahim | http://arxiv.org/pdf/2410.08230v2 | null |
2024-10-14 | MCTrack: A Unified 3D Multi-Object Tracking Framework for Autonomous Driving | MCTrack:自动驾驶统一三维多目标跟踪框架 | Xiyang Wang, Shouzheng Qi, Jieyou Zhao, Hangning Zhou, Siyu Zhang, Guoan Wang, Kai Tu, Songlin Guo, Jianbo Zhao, Jian Li, et al. | http://arxiv.org/pdf/2409.16149v2 | link |
2024-10-14 | Automatic Classification of White Blood Cell Images using Convolutional Neural Network | 利用卷积神经网络实现白细胞图像的自动分类 | Rabia Asghar, Arslan Shaukat, Usman Akram, Rimsha Tariq | http://arxiv.org/pdf/2409.13442v4 | null |
2024-10-14 | MedSegMamba: 3D CNN-Mamba Hybrid Architecture for Brain Segmentation | MedSegMamba:用于脑部分割的3D CNN-Mamba混合架构 | Aaron Cao, Zongyu Li, Jordan Jomsky, Andrew F. Laine, Jia Guo | http://arxiv.org/pdf/2409.08307v3 | link |
2024-10-14 | AnyDesign: Versatile Area Fashion Editing via Mask-Free Diffusion | AnyDesign:基于无掩码扩散的通用区域时尚编辑方法 | Yunfang Niu, Lingxiang Wu, Dong Yi, Jie Peng, Ning Jiang, Haiying Wu, Jinqiao Wang | http://arxiv.org/pdf/2408.11553v3 | link |
2024-10-14 | Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies | 检测音视频深度伪造中的细粒度不一致性 | Marcella Astrid, Enjie Ghorbel, Djamila Aouada | http://arxiv.org/pdf/2408.06753v3 | null |
2024-10-14 | Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks | 设计极致内存高效的卷积神经网络以支持设备端视觉任务 | Jaewook Lee, Yoel Park, Seulki Lee | http://arxiv.org/pdf/2408.03663v2 | null |
2024-10-14 | ParCon: Noise-Robust Collaborative Perception via Multi-module Parallel Connection | ParCon:基于多模块并行连接的噪声鲁棒协同感知 | Hyunchul Bae, Minhee Kang, Heejin Ahn | http://arxiv.org/pdf/2407.11546v2 | null |
2024-10-14 | Exploring the Potential of Polynomial Basis Functions in Kolmogorov-Arnold Networks: A Comparative Study of Different Groups of Polynomials | 探索多项式基函数在Kolmogorov-Arnold网络中的潜力:不同组多项式的比较研究 | Seyd Teymoor Seydi | http://arxiv.org/pdf/2406.02583v2 | link |
2024-10-14 | Advancing Supervised Local Learning Beyond Classification with Long-term Feature Bank | 推进监督式局部学习:基于长期特征库的超越分类方法 | Feiyu Zhu, Yuming Zhang, Changpeng Cai, Chenghao He, Xiuyuan Guo, Jiao Li, Peizhe Wang, Junhao Su, Jialin Gao | http://arxiv.org/pdf/2406.00446v2 | null |
2024-10-14 | Neural Collapse Meets Differential Privacy: Curious Behaviors of NoisyGD with Near-perfect Representation Learning | 神经崩溃与差分隐私相遇:NoisyGD在近乎完美表征学习下的奇特行为 | Chendi Wang, Yuqing Zhu, Weijie J. Su, Yu-Xiang Wang | http://arxiv.org/pdf/2405.08920v3 | null |
2024-10-14 | Multi-scale direction-aware SAR object detection network via global information fusion | 多尺度方向感知SAR目标检测网络:全局信息融合方法 | Mingxiang Cao, Weiying Xie, Jie Lei, Jiaqing Zhang, Daixun Li, Yunsong Li | http://arxiv.org/pdf/2312.16943v5 | null |
Publish Date | Title | Title_CN | Authors | Code | |
---|---|---|---|---|---|
2024-10-14 | ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training | ReLayout:基于布局增强预训练的面向真实世界文档理解方法 | Zhouqiang Jiang, Bowen Wang, Junhao Chen, Yuta Nakashima | http://arxiv.org/pdf/2410.10471v1 | null |
Publish Date | Title | Title_CN | Authors | Code | |
---|---|---|---|---|---|
2024-10-14 | Fusion-Driven Tree Reconstruction and Fruit Localization: Advancing Precision in Agriculture | 融合驱动树重建与果实定位:提升农业精准度 | Kaiming Fu, Peng Wei, Juan Villacres, Zhaodan Kong, Stavros G. Vougioukas, Brian N. Bailey | http://arxiv.org/pdf/2310.15138v2 | null |
Publish Date | Title | Title_CN | Authors | Code | |
---|---|---|---|---|---|
2024-10-14 | Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention | Cavia:基于视角集成注意力的摄像头可控多视角视频扩散方法 | Dejia Xu, Yifan Jiang, Chen Huang, Liangchen Song, Thorsten Gernoth, Liangliang Cao, Zhangyang Wang, Hao Tang | http://arxiv.org/pdf/2410.10774v1 | null |
2024-10-14 | DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model | 驱动道场数据集:推进互动性和知识增强的驾驶世界模型发展 | Yuqi Wang, Ke Cheng, Jiawei He, Qitai Wang, Hengchen Dai, Yuntao Chen, Fei Xia, Zhaoxiang Zhang | http://arxiv.org/pdf/2410.10738v1 | null |
2024-10-14 | Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning | 游戏玩法转换:强化学习中DCQN与DTQN架构的对比研究 | William A. Stigall | http://arxiv.org/pdf/2410.10660v1 | null |
2024-10-14 | Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling | 定制您的视觉自回归食谱:集自回归建模方法 | Wenze Liu, Le Zhuo, Yi Xin, Sheng Xia, Peng Gao, Xiangyu Yue | http://arxiv.org/pdf/2410.10511v1 | null |
2024-10-14 | Domain-Conditioned Transformer for Fully Test-time Adaptation | 域条件Transformer用于全测试时间自适应 | Yushun Tang, Shuoshuo Chen, Jiyuan Jia, Yi Zhang, Zhihai He | http://arxiv.org/pdf/2410.10442v1 | null |
2024-10-14 | Parameterize Structure with Differentiable Template for 3D Shape Generation | 参数化结构:用于三维形状生成的可微模板 | Changfeng Ma, Pengxiao Guo, Shuangyu Yang, Yinuo Chen, Jie Guo, Chongjun Wang, Yanwen Guo, Wenping Wang | http://arxiv.org/pdf/2410.10399v1 | null |
2024-10-14 | FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification | FasterDiT:无需架构修改的更快扩散Transformer训练方法 | Jingfeng Yao, Wang Cheng, Wenyu Liu, Xinggang Wang | http://arxiv.org/pdf/2410.10356v1 | null |
2024-10-14 | A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration | 一致性感知的斑点引导Transformer用于灵活与层次化点云配准 | Renlang Huang, Yufan Tang, Jiming Chen, Liang Li | http://arxiv.org/pdf/2410.10295v1 | null |
2024-10-14 | KNN Transformer with Pyramid Prompts for Few-Shot Learning | KNN金字塔提示Transformer用于少样本学习 | Wenhao Li, Qiangchang Wang, Peng Zhao, Yilong Yin | http://arxiv.org/pdf/2410.10227v1 | null |
2024-10-14 | Interaction-Guided Two-Branch Image Dehazing Network | 交互引导的双分支图像去雾网络 | Huichun Liu, Xiaosong Li, Tianshu Tan | http://arxiv.org/pdf/2410.10121v1 | null |
2024-10-14 | Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models | 专家混合模型的个性化:面向视觉-语言模型的联邦提示学习 | Jun Luo, Chen Chen, Shandong Wu | http://arxiv.org/pdf/2410.10114v1 | null |
2024-10-14 | Accelerating Diffusion Transformers with Token-wise Feature Caching | 加速基于Token-wise特征缓存的扩散Transformer模型 | Chang Zou, Xuyang Liu, Ting Liu, Siteng Huang, Linfeng Zhang | http://arxiv.org/pdf/2410.05317v2 | link |
2024-10-14 | Learning to Balance: Diverse Normalization for Cloth-Changing Person Re-Identification | 学习平衡:面向换衣人物再识别的多样化归一化技术 | Hongjun Wang, Jiyuan Chen, Zhengwei Yin, Xuan Song, Yinqiang Zheng | http://arxiv.org/pdf/2410.03977v2 | null |
Publish Date | Title | Title_CN | Authors | Code | |
---|---|---|---|---|---|
2024-10-14 | Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes | Sitcom-Crafter:三维场景中的剧情驱动人体运动生成系统 | Jianqi Chen, Panwen Hu, Xiaojun Chang, Zhenwei Shi, Michael Christian Kampffmeyer, Xiaodan Liang | http://arxiv.org/pdf/2410.10790v1 | null |
2024-10-14 | Self-Assessed Generation: Trustworthy Label Generation for Optical Flow and Stereo Matching in Real-world | 自评估生成:真实世界光流与立体匹配的可信标签生成方法 | Han Ling, Yinghui Sun, Quansen Sun, Ivor Tsang, Yuhui Zheng | http://arxiv.org/pdf/2410.10453v1 | null |
2024-10-14 | On Representation of 3D Rotation in the Context of Deep Learning | 在深度学习背景下三维旋转的表示研究 | Viktória Pravdová, Lukáš Gajdošech, Hassan Ali, Viktor Kocur | http://arxiv.org/pdf/2410.10350v1 | null |
2024-10-14 | Animate-X: Universal Character Image Animation with Enhanced Motion Representation | Animate-X: 基于增强运动表征的通用角色图像动画技术 | Shuai Tan, Biao Gong, Xiang Wang, Shiwei Zhang, Dandan Zheng, Ruobing Zheng, Kecheng Zheng, Jingdong Chen, Ming Yang | http://arxiv.org/pdf/2410.10306v1 | null |
2024-10-14 | Slide-based Graph Collaborative Training for Histopathology Whole Slide Image Analysis | 基于切片的图协同训练在组织病理学全切片图像分析中的应用 | Jun Shi, Tong Shu, Zhiguo Jiang, Wei Wang, Haibo Wu, Yushan Zheng | http://arxiv.org/pdf/2410.10260v1 | null |
2024-10-14 | Fast and Accurate Neural Rendering Using Semi-Gradients | 快速准确的半梯度神经渲染方法 | In-Young Cho, Jaewoong Cho | http://arxiv.org/pdf/2410.10149v1 | null |
2024-10-14 | Hi-Mamba: Hierarchical Mamba for Efficient Image Super-Resolution | Hi-Mamba:用于高效图像超分辨的层次化Mamba算法 | Junbo Qiao, Jincheng Liao, Wei Li, Yulun Zhang, Yong Guo, Yi Wen, Zhangxizi Qiu, Jiao Xie, Jie Hu, Shaohui Lin | http://arxiv.org/pdf/2410.10140v1 | null |
2024-10-14 | ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video | ScaleFlow++:视频中的稳健与精确三维运动估计 | Han Ling, Yinghui Sun, Quansen Sun, Yuhui Zheng | http://arxiv.org/pdf/2409.12202v2 | link |
2024-10-14 | Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation | 基于3D特征场的视觉-语言导航的模拟到真实迁移 | Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Shuqiang Jiang | http://arxiv.org/pdf/2406.09798v3 | link |
2024-10-14 | GAIA: Rethinking Action Quality Assessment for AI-Generated Videos | GAIA:重新审视AI生成视频中的动作质量评估 | Zijian Chen, Wei Sun, Yuan Tian, Jun Jia, Zicheng Zhang, Jiarui Wang, Ru Huang, Xiongkuo Min, Guangtao Zhai, Wenjun Zhang | http://arxiv.org/pdf/2406.06087v2 | link |
2024-10-14 | Twisting Lids Off with Two Hands | 用双手拧开盖子 | Toru Lin, Zhao-Heng Yin, Haozhi Qi, Pieter Abbeel, Jitendra Malik | http://arxiv.org/pdf/2403.02338v2 | null |
Publish Date | Title | Title_CN | Authors | Code | |
---|---|---|---|---|---|
2024-10-14 | Exploring Semi-Supervised Learning for Online Mapping | 探索在线建图中的半监督学习技术 | Adam Lilja, Erik Wallin, Junsheng Fu, Lars Hammarstrand | http://arxiv.org/pdf/2410.10279v1 | null |
2024-10-14 | Unsupervised Point Cloud Completion through Unbalanced Optimal Transport | 无监督点云补全通过非平衡最优传输实现 | Taekyung Lee, Jaemoo Choi, Myungjoo Kang, Jaewoong Choi | http://arxiv.org/pdf/2410.02671v2 | null |
2024-10-14 | Learning Temporally Equivariance for Degenerative Disease Progression in OCT by Predicting Future Representations | 学习时序等变特性以预测OCT中退行性疾病进展的未来表征 | Taha Emre, Arunava Chakravarty, Dmitrii Lachinov, Antoine Rivail, Ursula Schmidt-Erfurth, Hrvoje Bogunović | http://arxiv.org/pdf/2405.09404v2 | link |
2024-10-14 | Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation | 探索无标注图像字幕生成:基于检索增强的伪句子生成方法 | Zhiyuan Li, Dongnan Liu, Heng Wang, Chaoyi Zhang, Weidong Cai | http://arxiv.org/pdf/2307.14750v3 | link |
Publish Date | Title | Title_CN | Authors | Code | |
---|---|---|---|---|---|
2024-10-14 | LVD-2M: A Long-take Video Dataset with Temporally Dense Captions | LVD-2M:具有时间密集型字幕的长镜头视频数据集 | Tianwei Xiong, Yuqing Wang, Daquan Zhou, Zhijie Lin, Jiashi Feng, Xihui Liu | http://arxiv.org/pdf/2410.10816v1 | null |
2024-10-14 | Deep Linear Probe Generators for Weight Space Learning | 深度线性探针生成器在权重空间学习中的应用 | Jonathan Kahana, Eliahu Horwitz, Imri Shuval, Yedid Hoshen | http://arxiv.org/pdf/2410.10811v1 | null |
2024-10-14 | A Counterexample in Image Registration | 图像配准中的一个反例 | Serap A. Savari | http://arxiv.org/pdf/2410.10725v1 | null |
2024-10-14 | Artificial Intelligence-Based Triaging of Cutaneous Melanocytic Lesions | 基于人工智能的皮肤黑素细胞病变分拣方法 | Ruben T. Lucassen, Nikolas Stathonikos, Gerben E. Breimer, Mitko Veta, Willeke A. M. Blokx | http://arxiv.org/pdf/2410.10509v1 | null |
2024-10-14 | A Novel No-Reference Image Quality Metric For Assessing Sharpness In Satellite Imagery | 一种新型无参考图像质量评价指标:评估卫星图像清晰度 | Lucas Gonzalo Antonel | http://arxiv.org/pdf/2410.10488v1 | null |
2024-10-14 | PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation | PIVOT-R:面向机器人操作的原语驱动路径点感知世界模型 | Kaidong Zhang, Pengzhen Ren, Bingqian Lin, Junfan Lin, Shikui Ma, Hang Xu, Xiaodan Liang | http://arxiv.org/pdf/2410.10394v1 | null |
2024-10-14 | Automated extraction of 4D aircraft trajectories from video recordings | 4D飞机轨迹从视频记录中的自动提取 | Jean-François Villeforceix | http://arxiv.org/pdf/2410.10249v1 | null |
2024-10-14 | LOBG: Less Overfitting for Better Generalization in Vision-Language Model | LOBG:降低视觉-语言模型过拟合以提升泛化能力 | Chenhao Ding, Xinyuan Gao, Songlin Dong, Yuhang He, Qiang Wang, Alex Kot, Yihong Gong | http://arxiv.org/pdf/2410.10247v1 | null |
2024-10-14 | MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting | MuseTalk:基于潜在空间修复的实时高质量唇同步技术 | Yue Zhang, Minhao Liu, Zhaokang Chen, Bin Wu, Yubin Zeng, Chao Zhan, Yingjie He, Junxin Huang, Wenjiang Zhou | http://arxiv.org/pdf/2410.10122v1 | null |
2024-10-14 | Can We Predict Performance of Large Models across Vision-Language Tasks? | 能否预测大型模型在视觉-语言任务中的性能表现? | Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould | http://arxiv.org/pdf/2410.10112v1 | null |
2024-10-14 | Learning to Customize Text-to-Image Diffusion In Diverse Context | 学习在多样化情境中定制文本到图像扩散 | Taewook Kim, Wei Chen, Qiang Qiu | http://arxiv.org/pdf/2410.10058v1 | null |
2024-10-14 | Enhancing Performance of Point Cloud Completion Networks with Consistency Loss | 增强点云补全网络性能的一致性损失方法 | Christofel Rio Goenawan, Kevin Tirta Wijaya, Seung-Hyun Kong | http://arxiv.org/pdf/2410.07298v2 | null |
2024-10-14 | Autoencoded Image Compression for Secure and Fast Transmission | 自编码图像压缩实现安全快速传输 | Aryan Kashyap Naveen, Sunil Thunga, Anuhya Murki, Mahati A Kalale, Shriya Anil | http://arxiv.org/pdf/2407.03990v2 | link |
2024-10-14 | A Review of Electromagnetic Elimination Methods for low-field portable MRI scanner | 低场便携式MRI扫描仪电磁消除方法综述 | Wanyu Bian, Panfeng Li, Mengyao Zheng, Chihang Wang, Anying Li, Ying Li, Haowei Ni, Zixuan Zeng | http://arxiv.org/pdf/2406.17804v2 | null |
2024-10-14 | AdaPose: Towards Cross-Site Device-Free Human Pose Estimation with Commodity WiFi | AdaPose:基于商用WiFi的跨站点无设备人体姿态估计研究 | Yunjiao Zhou, Jianfei Yang, He Huang, Lihua Xie | http://arxiv.org/pdf/2309.16964v2 | null |
2024-10-14 | AR-TTA: A Simple Method for Real-World Continual Test-Time Adaptation | AR-TTA:一种面向真实世界持续测试时适应的简单方法 | Damian Sójka, Sebastian Cygert, Bartłomiej Twardowski, Tomasz Trzciński | http://arxiv.org/pdf/2309.10109v2 | null |
2024-10-14 | MT-SNN: Enhance Spiking Neural Network with Multiple Thresholds | MT-SNN:多阈值增强型脉冲神经网络 | Xiaoting Wang, Yanxiang Zhang | http://arxiv.org/pdf/2303.11127v2 | null |