[UPDATED!] 2024-12-03 (Update Time)

3DGS

Publication Date English Title Chinese Title Authors PDF Link Code Link
2024-12-03 AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction AniGS:从单张图像生成可动画高斯化身的不一致高斯重建 Lingteng Qiu, Shenhao Zhu, Qi Zuo, Xiaodong Gu, Yuan Dong, Junfei Zhang, Chao Xu, Zhe Li http://arxiv.org/pdf/2412.02684v1 None
2024-12-03 D-MiSo: Editing Dynamic 3D Scenes using Multi-Gaussians Soup D-MiSo:使用多高斯汤编辑动态3D场景 Joanna Waczyńska, Piotr Borycki, Joanna Kaleta, Sławomir Tadeja, Przemysław Spurek http://arxiv.org/pdf/2405.14276v3 None
2024-12-03 RelayGS: Reconstructing Dynamic Scenes with Large-Scale and Complex Motions via Relay Gaussians RelayGS:通过中继高斯重建具有大规模和复杂运动的动态场景 Qiankun Gao, Yanmin Wu, Chengxiang Wen, Jiarui Meng, Luyang Tang, Jie Chen, Ronggang Wang, Jian Zhang http://arxiv.org/pdf/2412.02493v1 https://github.com/gqk/RelayGS
2024-12-03 Spiking GS: Towards High-Accuracy and Low-Cost Surface Reconstruction via Spiking Neuron-based Gaussian Splatting 脉冲GS:通过脉冲神经元基于高斯散布实现高精度和低成本表面重建 Weixing Zhang, Zongrui Li, De Ma, Huajin Tang, Xudong Jiang, Qian Zheng, Gang Pan http://arxiv.org/pdf/2410.07266v5 https://github.com/zju-bmi-lab/SpikingGS
2024-12-03 Multi-robot autonomous 3D reconstruction using Gaussian splatting with Semantic guidance 多机器人自主3D重建:基于语义引导的高斯散点法 Jing Zeng, Qi Ye, Tianle Liu, Yang Xu, Jin Li, Jinming Xu, Liang Li, Jiming Chen http://arxiv.org/pdf/2412.02249v1 None
2024-12-03 SparseLGS: Sparse View Language Embedded Gaussian Splatting 稀疏视图语言嵌入高斯喷溅 Jun Hu, Zhang Chen, Zhong Li, Yi Xu, Juyong Zhang http://arxiv.org/pdf/2412.02245v1 None
2024-12-03 SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images 稀疏抓取:通过稀疏多视图RGB图像的3D语义高斯分层进行机器人抓取 Junqiu Yu, Xinlin Ren, Yongchong Gu, Haitao Lin, Tianyu Wang, Yi Zhu, Hang Xu, Yu-Gang Jiang http://arxiv.org/pdf/2412.02140v1 None
2024-12-03 Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion 高斯物体裁剪:具有表面补全的对象组成高斯喷溅 Liu Liu, Xinjie Wang, Jiaxiong Qiu, Tianwei Lin, Xiaolin Zhou, Zhizhong Su http://arxiv.org/pdf/2412.02075v1 None

3D Vision and Reconstruction

Publication Date English Title Chinese Title Authors PDF Link Code Link
2024-12-03 Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation Sharp-It:一种用于3D合成和操纵的多视角到多视角扩散模型 Yiftach Edelstein, Or Patashnik, Dana Cohen-Bar, Lihi Zelnik-Manor http://arxiv.org/pdf/2412.02631v1 None
2024-12-03 MedTet: An Online Motion Model for 4D Heart Reconstruction MedTet:一种用于4D心脏重建的在线运动模型 Yihong Chen, Jiancheng Yang, Deniz Sayin Mercadier, Hieu Le, Pascal Fua http://arxiv.org/pdf/2412.02589v1 https://github.com/Scalsol/MedTet
2024-12-03 Tomographic SAR Reconstruction for Forest Height Estimation 森林高度估计的断层扫描合成孔径雷达重建 Grace Colverd, Jumpei Takami, Laura Schade, Karol Bot, Joseph A. Gallego-Mejia http://arxiv.org/pdf/2412.00903v2 None
2024-12-03 LiDAR-based Registration against Georeferenced Models for Globally Consistent Allocentric Maps 基于激光雷达的注册与地理参照模型对比,以实现全局一致的定位中心地图 Jan Quenzel, Linus T. Mallwitz, Benedikt T. Arnold, Sven Behnke http://arxiv.org/pdf/2412.02533v1 None
2024-12-03 ROVER: A Multi-Season Dataset for Visual SLAM ROVER:一个适用于视觉SLAM的多季节数据集 Fabian Schmidt, Constantin Blessing, Markus Enzweiler, Abhinav Valada http://arxiv.org/pdf/2412.02506v1 None
2024-12-03 BYE: Build Your Encoder with One Sequence of Exploration Data for Long-Term Dynamic Scene Understanding 拜拜:使用一次探索数据序列构建您的编码器以实现长期动态场景理解 Chenguang Huang, Shengchao Yan, Wolfram Burgard http://arxiv.org/pdf/2412.02449v1 None
2024-12-03 TimeWalker: Personalized Neural Space for Lifelong Head Avatars 时间漫步者:终身头像的个性化神经网络空间 Dongwei Pan, Yang Li, Hongsheng Li, Kwan-Yee Lin http://arxiv.org/pdf/2412.02421v1 None
2024-12-03 SegNet4D: Efficient Instance-Aware 4D Semantic Segmentation for LiDAR Point Cloud SegNet4D:高效实例感知4D激光雷达点云语义分割 Neng Wang, Ruibin Guo, Chenghao Shi, Ziyue Wang, Hui Zhang, Huimin Lu, Zhiqiang Zheng, Xieyuanli Chen http://arxiv.org/pdf/2406.16279v3 https://github.com/nubot-nudt/SegNet4D
2024-12-03 3D Face Reconstruction From Radar Images 从雷达图像中重建3D人脸 Valentin Braeutigam, Vanessa Wirth, Ingrid Ullmann, Christian Schüßler, Martin Vossiek, Matthias Berking, Bernhard Egger http://arxiv.org/pdf/2412.02403v1 None
2024-12-03 RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation 规则引导的空间感知网络:用于端到端3D指称表达式分割 Changli Wu, Qi Chen, Jiayi Ji, Haowei Wang, Yiwei Ma, You Huang, Gen Luo, Hao Fei http://arxiv.org/pdf/2412.02402v1 https://github.com/sosppxo/RG-SAN
2024-12-03 Single-Shot Metric Depth from Focused Plenoptic Cameras 单次测量的聚焦全视场相机度量深度 Blanca Lasheras-Hernandez, Klaus H. Strobl, Sergio Izquierdo, Tim Bodenmüller, Rudolph Triebel, Javier Civera http://arxiv.org/pdf/2412.02386v1 None
2024-12-03 Realistic Surgical Simulation from Monocular Videos 基于单目视频的逼真手术模拟 Kailing Wang, Chen Yang, Keyang Zhao, Xiaokang Yang, Wei Shen http://arxiv.org/pdf/2412.02359v1 None
2024-12-03 SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation 场景因子:可控3D场景生成的因子化潜在3D扩散 Alexey Bokhovkin, Quan Meng, Shubham Tulsiani, Angela Dai http://arxiv.org/pdf/2412.01801v2 None
2024-12-03 Amodal Depth Anything: Amodal Depth Estimation in the Wild 非模态深度任何物:野外非模态深度估计 Zhenyu Li, Mykola Lavreniuk, Jian Shi, Shariq Farooq Bhat, Peter Wonka http://arxiv.org/pdf/2412.02336v1 None
2024-12-03 Dual Exposure Stereo for Extended Dynamic Range 3D Imaging 双曝光立体视觉实现扩展动态范围三维成像 Juhyung Choi, Jinnyeong Kim, Seokjun Choi, Jinwoo Lee, Samuel Brucker, Mario Bijelic, Felix Heide, Seung-Hwan Baek http://arxiv.org/pdf/2412.02351v1 None
2024-12-03 HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset HumanRig:在大规模数据集中学习人形角色自动绑定 Zedong Chu, Feng Xiong, Meiduo Liu, Jinzhi Zhang, Mingqi Shao, Zhaoxu Sun, Di Wang, Mu Xu http://arxiv.org/pdf/2412.02317v1 None
2024-12-03 Partial Non-rigid Deformations and interpolations of Human Body Surfaces 人体表面部分非刚性变形与插值 Thomas Besnier, Emery Pierson, Sylvain Arguillere, Mohamed Daoudi http://arxiv.org/pdf/2412.02306v1 None
2024-12-03 Viewpoint Consistency in 3D Generation via Attention and CLIP Guidance 通过注意力和CLIP引导实现3D生成中的视点一致性 Qing Zhang, Zehao Chen, Jinguang Tong, Jing Zhang, Jie Hong, Xuesong Li http://arxiv.org/pdf/2412.02287v1 None
2024-12-03 KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation KP-RED:利用语义关键点进行联合3D形状检索和变形 Ruida Zhang, Chenyangguang Zhang, Yan Di, Fabian Manhardt, Xingyu Liu, Federico Tombari, Xiangyang Ji http://arxiv.org/pdf/2403.10099v3 https://github.com/lolrudy/KP-RED
2024-12-03 Take Your Steps: Hierarchically Efficient Pulmonary Disease Screening via CT Volume Compression 《迈出步伐:通过CT体积压缩实现层次化高效的肺病筛查》 Qian Shao, Kai Zhang, Bang Du, Zepeng Li, Yixuan Wu, Qiyuan Chen, Jian Wu, Jintai Chen http://arxiv.org/pdf/2412.01525v2 None
2024-12-03 How to Use Diffusion Priors under Sparse Views? 如何利用稀疏视图下的扩散先验? Qisen Wang, Yifan Zhao, Jiawei Ma, Jia Li http://arxiv.org/pdf/2412.02225v1 https://github.com/iCVTEAM/IPSM
2024-12-03 LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models 布局VLM:通过视觉-语言模型的可微3D布局优化 Fan-Yun Sun, Weiyu Liu, Siyi Gu, Dylan Lim, Goutam Bhat, Federico Tombari, Manling Li, Nick Haber http://arxiv.org/pdf/2412.02193v1 None
2024-12-03 CFPNet: Improving Lightweight ToF Depth Completion via Cross-zone Feature Propagation CFPNet:通过跨区域特征传播提升轻量级ToF深度补全 Laiyan Ding, Hualie Jiang, Rui Xu, Rui Huang http://arxiv.org/pdf/2411.04480v4 https://github.com/denyingmxd/CFPNet
2024-12-03 Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation 朝向跨视角一致的自监督周围深度估计 Laiyan Ding, Hualie Jiang, Jie Li, Yongquan Chen, Rui Huang http://arxiv.org/pdf/2407.04041v3 https://github.com/denyingmxd/CVCDepth
2024-12-03 HSLiNets: Hyperspectral Image and LiDAR Data Fusion Using Efficient Dual Non-Linear Feature Learning Networks HSLiNets:基于高效双非线性特征学习网络的超光谱图像与激光雷达数据融合 Judy X Yang, Jing Wang, Chen Hong Sui, Zekun Long, Jun Zhou http://arxiv.org/pdf/2412.00302v2 None
2024-12-03 FoveaSPAD: Exploiting Depth Priors for Adaptive and Efficient Single-Photon 3D Imaging FoveaSPAD:利用深度先验进行自适应和高效的单光子3D成像 Justin Folden, Atul Ingle, Sanjeev J. Koppal http://arxiv.org/pdf/2412.02052v1 None

NeRF

Publication Date English Title Chinese Title Authors PDF Link Code Link
2024-12-03 TFS-NeRF: Template-Free NeRF for Semantic 3D Reconstruction of Dynamic Scene TFS-NeRF:无模板NeRF用于动态场景的语义3D重建 Sandika Biswas, Qianyi Wu, Biplab Banerjee, Hamid Rezatofighi http://arxiv.org/pdf/2409.17459v3 None
2024-12-03 Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs 通过回收预调LoRAs解锁视觉基础模型的无调优小样本适应性 Zixuan Hu, Yongxian Wei, Li Shen, Chun Yuan, Dacheng Tao http://arxiv.org/pdf/2412.02220v1 None
2024-12-03 3D representation in 512-Byte:Variational tokenizer is the key for autoregressive 3D generation 512字节内的3D表示:变分标记器是自回归3D生成的关键 Jinzhi Zhang, Feng Xiong, Mu Xu http://arxiv.org/pdf/2412.02202v1 None

Human Pose and Motion

Publication Date English Title Chinese Title Authors PDF Link Code Link
2024-12-03 STRIDE: Single-video based Temporally Continuous Occlusion Robust 3D Pose Estimation STRIDE:基于单视频的时序连续遮挡鲁棒3D姿态估计 Rohit Lal, Saketh Bachu, Yash Garg, Arindam Dutta, Calvin-Khang Ta, Dripta S. Raychaudhuri, Hannah Dela Cruz, M. Salman Asif http://arxiv.org/pdf/2312.16221v3 https://github.com/take2rohit/stride

Image Generation and Editing

Publication Date English Title Chinese Title Authors PDF Link Code Link
2024-12-03 Diffusion-based Visual Anagram as Multi-task Learning 基于扩散的视觉字谜作为多任务学习 Zhiyuan Xu, Yinhe Chen, Huan-ang Gao, Weiyan Zhao, Guiyu Zhang, Hao Zhao http://arxiv.org/pdf/2412.02693v1 None
2024-12-03 FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation FoundHand:大规模领域特定学习以实现可控手部图像生成 Kefan Chen, Chaerin Min, Linguang Zhang, Shreyas Hampali, Cem Keskin, Srinath Sridhar http://arxiv.org/pdf/2412.02690v1 None
2024-12-03 Taming Scalable Visual Tokenizer for Autoregressive Image Generation 驯服可扩展视觉标记器以实现自回归图像生成 Fengyuan Shi, Zhuoyan Luo, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang http://arxiv.org/pdf/2412.02692v1 https://github.com/TencentARC/SEED-Voken
2024-12-03 SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance SNOOPI:带适当引导的超强一步扩散蒸馏 Viet Nguyen, Anh Aengus Nguyen, Trung Dao, Khoi Nguyen, Cuong Pham, Toan Tran, Anh Tran http://arxiv.org/pdf/2412.02687v1 None
2024-12-03 Diffusion Models with Anisotropic Gaussian Splatting for Image Inpainting 各向异性高斯喷溅扩散模型在图像修复中的应用 Jacob Fein-Ashley, Benjamin Fein-Ashley http://arxiv.org/pdf/2412.01682v2 None
2024-12-03 Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis Switti:设计用于文本到图像合成的尺度感知Transformer Anton Voronov, Denis Kuznedelev, Mikhail Khoroshikh, Valentin Khrulkov, Dmitry Baranchuk http://arxiv.org/pdf/2412.01819v2 None
2024-12-03 Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment 通过块级对数似然蒸馏解耦暗知识以实现特征级对齐 Chengting Yu, Fengzhao Zhang, Ruizhe Chen, Aili Wang, Zuozhu Liu, Shurun Tan, Er-Ping Li http://arxiv.org/pdf/2411.01547v2 None
2024-12-03 MetaShadow: Object-Centered Shadow Detection, Removal, and Synthesis 元阴影:基于对象的阴影检测、去除和合成 Tianyu Wang, Jianming Zhang, Haitian Zheng, Zhihong Ding, Scott Cohen, Zhe Lin, Wei Xiong, Chi-Wing Fu http://arxiv.org/pdf/2412.02635v1 None
2024-12-03 Scaling Image Tokenizers with Grouped Spherical Quantization 图像分词器通过分组球面量化进行扩展 Jiangtao Wang, Zhen Qin, Yifan Zhang, Vincent Tao Hu, Björn Ommer, Rania Briq, Stefan Kesselheim http://arxiv.org/pdf/2412.02632v1 None
2024-12-03 Continual Learning of Personalized Generative Face Models with Experience Replay 持续学习个性化生成人脸模型的经验回放 Annie N. Wang, Luchao Qi, Roni Sengupta http://arxiv.org/pdf/2412.02627v1 None
2024-12-03 Denoising: A Powerful Building-Block for Imaging, Inverse Problems, and Machine Learning 去噪:图像、逆问题和机器学习中的强大构建块 Peyman Milanfar, Mauricio Delbracio http://arxiv.org/pdf/2409.06219v4 None
2024-12-03 dc-GAN: Dual-Conditioned GAN for Face Demorphing From a Single Morph 双条件生成对抗网络:从单个形态进行人脸去形变 Nitish Shukla, Arun Ross http://arxiv.org/pdf/2411.14494v2 None
2024-12-03 LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting 室内场景重光照中的潜在内参与扩散模型的结合:LumiNet Xiaoyan Xing, Konrad Groh, Sezer Karaoglu, Theo Gevers, Anand Bhattad http://arxiv.org/pdf/2412.00177v2 None
2024-12-03 Unveiling Concept Attribution in Diffusion Models 揭示扩散模型中的概念归因 Quang H. Nguyen, Hoang Phan, Khoa D. Doan http://arxiv.org/pdf/2412.02542v1 https://github.com/mail-research/CAD-attribution4diffusion
2024-12-03 ShadowHack: Hacking Shadows via Luminance-Color Divide and Conquer 阴影黑客:通过亮度-颜色分割与征服进行阴影攻击 Jin Hu, Mingjia Li, Xiaojie Guo http://arxiv.org/pdf/2412.02545v1 https://github.com/lime-j/ShadowHack
2024-12-03 WEM-GAN: Wavelet transform based facial expression manipulation WEM-GAN:基于小波变换的面部表情操纵 Dongya Sun, Yunfei Hu, Xianzhe Zhang, Yingsong Hu http://arxiv.org/pdf/2412.02530v1 None
2024-12-03 Towards Rich Emotions in 3D Avatars: A Text-to-3D Avatar Generation Benchmark 迈向丰富情感的3D虚拟形象:文本到3D虚拟形象生成基准 Haidong Xu, Meishan Zhang, Hao Ju, Zhedong Zheng, Hongyuan Zhu, Erik Cambria, Min Zhang, Hao Fei http://arxiv.org/pdf/2412.02508v1 None
2024-12-03 HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving 全息驾驶:面向自动驾驶的全面2D-3D多模态街景生成 Zehuan Wu, Jingcheng Ni, Xiaodong Wang, Yuxin Guo, Rui Chen, Lewei Lu, Jifeng Dai, Yuwen Xiong http://arxiv.org/pdf/2412.01407v2 None
2024-12-03 VISTA: A Panoramic View of Neural Representations VISTA:神经网络表示的全景视图 Tom White http://arxiv.org/pdf/2412.02412v1 None
2024-12-03 Efficient Concertormer for Image Deblurring and Beyond 高效Concertormer在图像去模糊及其他领域的应用 Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang http://arxiv.org/pdf/2404.06135v2 None
2024-12-03 GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing GenMix:基于生成扩散模型的图像编辑有效数据增强 Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood, Karthik Nandakumar, Naveed Akhtar http://arxiv.org/pdf/2412.02366v1 None
2024-12-03 UniForm: A Reuse Attention Mechanism Optimized for Efficient Vision Transformers on Edge Devices UniForm:针对边缘设备高效视觉Transformer优化的重用注意力机制 Seul-Ki Yeom, Tae-Ho Kim http://arxiv.org/pdf/2412.02344v1 None
2024-12-03 SimuScope: Realistic Endoscopic Synthetic Dataset Generation through Surgical Simulation and Diffusion Models SimuScope:通过手术模拟和扩散模型生成逼真的内窥镜合成数据集 Sabina Martyniak, Joanna Kaleta, Diego Dall'Alba, Michał Naskręt, Szymon Płotka, Przemysław Korzeniowski http://arxiv.org/pdf/2412.02332v1 https://github.com/SanoScience/SimuScope
2024-12-03 Controlling the Latent Diffusion Model for Generative Image Shadow Removal via Residual Generation 控制潜在扩散模型通过残差生成进行生成图像阴影去除 Xinjie Li, Yang Zhao, Dong Wang, Yuan Chen, Li Cao, Xiaoping Liu http://arxiv.org/pdf/2412.02322v1 None
2024-12-03 Active Learning via Classifier Impact and Greedy Selection for Interactive Image Retrieval 基于分类器影响和贪婪选择的交互式图像检索主动学习 Leah Bar, Boaz Lerner, Nir Darshan, Rami Ben-Ari http://arxiv.org/pdf/2412.02310v1 https://github.com/barleah/GreedyAL
2024-12-03 PCIM: Learning Pixel Attributions via Pixel-wise Channel Isolation Mixing in High Content Imaging PCIM:通过高内容成像中的像素通道隔离混合学习像素归属 Daniel Siegismund, Mario Wieser, Stephan Heyse, Stephan Steigele http://arxiv.org/pdf/2412.02275v1 None
2024-12-03 Diffusion Implicit Policy for Unpaired Scene-aware Motion Synthesis 无配对场景感知运动合成的扩散隐式策略 Jingyu Gong, Chong Zhang, Fengqi Liu, Ke Fan, Qianyu Zhou, Xin Tan, Zhizhong Zhang, Yuan Xie http://arxiv.org/pdf/2412.02261v1 None
2024-12-03 Fast LiDAR Data Generation with Rectified Flows 快速校正流下的激光雷达数据生成 Kazuto Nakashima, Xiaowen Liu, Tomoya Miyawaki, Yumi Iwashita, Ryo Kurazume http://arxiv.org/pdf/2412.02241v1 None
2024-12-03 Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models 跨注意力头位置模式可以与文本到图像生成模型中的人类视觉概念相一致 Jungwon Park, Jungmin Ko, Dongnam Byun, Jangwon Suh, Wonjong Rhee http://arxiv.org/pdf/2412.02237v1 None
2024-12-03 CubeFormer: A Simple yet Effective Baseline for Lightweight Image Super-Resolution 立方体former:一种简单而有效的轻量级图像超分辨率基线 Jikai Wang, Huan Zheng, Jianbing Shen http://arxiv.org/pdf/2412.02234v1 None
2024-12-03 PriorPath: Coarse-To-Fine Approach for Controlled De-Novo Pathology Semantic Masks Generation PriorPath:受控从头病理语义掩码生成的粗到细方法 Nati Daniel, May Nathan, Eden Azeroual, Yael Fisher, Yonatan Savir http://arxiv.org/pdf/2411.16515v2 None
2024-12-03 GIST: Towards Photorealistic Style Transfer via Multiscale Geometric Representations GIST:通过多尺度几何表示实现逼真风格迁移 Renan A. Rojas-Gomez, Minh N. Do http://arxiv.org/pdf/2412.02214v1 None
2024-12-03 Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images 嵌入式提示微调:迈向增强医学图像预训练模型校准 Wenqiang Zu, Shenghao Xie, Qing Zhao, Guoqi Li, Lei Ma http://arxiv.org/pdf/2407.01003v4 https://github.com/zuwenqiang/EPT
2024-12-03 Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis 生成摄影:实现逼真文本到图像合成的场景一致相机控制 Yu Yuan, Xijun Wang, Yichen Sheng, Prateek Chennuri, Xingguang Zhang, Stanley Chan http://arxiv.org/pdf/2412.02168v1 None
2024-12-03 Concept Replacer: Replacing Sensitive Concepts in Diffusion Models via Precision Localization 概念替换器:通过精确定位在扩散模型中替换敏感概念 Lingyun Zhang, Yu Xie, Yanwei Fu, Ping Chen http://arxiv.org/pdf/2412.01244v2 None
2024-12-03 PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution PassionSR:基于一步扩散的图像超分辨率中的自适应尺度后训练量化 Libo Zhu, Jianze Li, Haotong Qin, Wenbo Li, Yulun Zhang, Yong Guo, Xiaokang Yang http://arxiv.org/pdf/2411.17106v3 https://github.com/libozhu03/PassionSR
2024-12-03 DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling DyMO:基于动态多目标调度的免训练扩散模型对齐 Xin Xie, Dong Gong http://arxiv.org/pdf/2412.00759v2 None
2024-12-03 Exploring Frequency-Inspired Optimization in Transformer for Efficient Single Image Super-Resolution 探索Transformer中的频率灵感优化以提高单图像超分辨率效率 Ao Li, Le Zhang, Yun Liu, Ce Zhu http://arxiv.org/pdf/2308.05022v4 https://github.com/AVC2-UESTC/Frequency-Inspired-Optimization-for-EfficientSR.git
2024-12-03 Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation 释放自回归模型在上下文中学习的潜力以实现少样本图像处理 Bolin Lai, Felix Juefei-Xu, Miao Liu, Xiaoliang Dai, Nikhil Mehta, Chenguang Zhu, Zeyi Huang, James M. Rehg http://arxiv.org/pdf/2412.01027v2 None
2024-12-03 InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences 即时交换:跨越尖锐形状差异的快速定制概念交换 Chenyang Zhu, Kai Li, Yue Ma, Longxiang Tang, Chengyu Fang, Chubin Chen, Qifeng Chen, Xiu Li http://arxiv.org/pdf/2412.01197v2 None
2024-12-03 Direct Coloring for Self-Supervised Enhanced Feature Decoupling 直接着色用于自监督增强特征解耦 Salman Mohamadi, Gianfranco Doretto, Donald A. Adjeroh http://arxiv.org/pdf/2412.02109v1 None
2024-12-03 PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models 基于透视布局扩散模型的可控街景合成:PerLDiff Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye http://arxiv.org/pdf/2407.06109v3 None
2024-12-03 OmniCreator: Self-Supervised Unified Generation with Universal Editing 全创者:基于通用编辑的自监督统一生成 Haodong Chen, Lan Wang, Harry Yang, Ser-Nam Lim http://arxiv.org/pdf/2412.02114v1 None
2024-12-03 TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation 文本中心背景自适应的注意力引导文本到图像生成 Tianyi Liang, Jiangqi Liu, Sicheng Song, Shiqi Jiang, Yifei Huang, Xinzhuo Zhang, Changbo Wang, Chenhui Li http://arxiv.org/pdf/2404.11824v2 None
2024-12-03 Hyperspectral Images Efficient Spatial and Spectral non-Linear Model with Bidirectional Feature Learning 高光谱图像高效空间和光谱非线性模型与双向特征学习 Judy X Yang, Jing Wang, Zekun Long, Chenhong Sui, Jun Zhou http://arxiv.org/pdf/2412.00283v2 None
2024-12-03 AccDiffusion v2: Towards More Accurate Higher-Resolution Diffusion Extrapolation AccDiffusion v2:迈向更高精度的高分辨率扩散外推 Zhihang Lin, Mingbao Lin, Wengyi Zhan, Rongrong Ji http://arxiv.org/pdf/2412.02099v1 https://github.com/lzhxmu/AccDiffusion_v2
2024-12-03 VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition VIGFace:隐私保护的人脸虚拟身份生成 Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam, Junghyun Cho, Ig-Jae Kim http://arxiv.org/pdf/2403.08277v2 None
2024-12-03 Conti-Fuse: A Novel Continuous Decomposition-based Fusion Framework for Infrared and Visible Images 红外与可见光图像的基于连续分解的新型融合框架:Conti-Fuse Hui Li, Haolong Ma, Chunyang Cheng, Zhongwei Shen, Xiaoning Song, Xiao-Jun Wu http://arxiv.org/pdf/2406.04689v3 None
2024-12-03 Multi-student Diffusion Distillation for Better One-step Generators 多学生扩散蒸馏以实现更好的单步生成器 Yanke Song, Jonathan Lorraine, Weili Nie, Karsten Kreis, James Lucas http://arxiv.org/pdf/2410.23274v2 None

Multimodal Learning

Publication Date English Title Chinese Title Authors PDF Link Code Link
2024-12-03 AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? AV-Odyssey Bench:您的多模态LLM真的能理解视听信息吗? Kaixiong Gong, Kaituo Feng, Bohao Li, Yibing Wang, Mofan Cheng, Shijia Yang, Jiaming Han, Benyou Wang http://arxiv.org/pdf/2412.02611v1 None
2024-12-03 Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey 遥感时序视觉-语言模型:全面综述 Chenyang Liu, Jiafan Zhang, Keyan Chen, Man Wang, Zhengxia Zou, Zhenwei Shi http://arxiv.org/pdf/2412.02573v1 https://github.com/Chen-Yang-Liu/Awesome-RS-Temporal-VLM
2024-12-03 SJTU:Spatial judgments in multimodal models towards unified segmentation through coordinate detection 上海交通大学:多模态模型中的空间判断,通过坐标检测实现统一分割 Joongwon Chae, Zhenyu Wang, Peiwu Qin http://arxiv.org/pdf/2412.02565v1 None
2024-12-03 Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks 多模态遥感场景分类:基于VLMs和双交叉注意力网络的实现 Jinjin Cai, Kexin Meng, Baijian Yang, Gang Shao http://arxiv.org/pdf/2412.02531v1 https://github.com/CJR7/MultiAtt-RSSC
2024-12-03 Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents 网格增强视觉:一种简单而有效的多模态智能体增强空间理解方法 Joongwon Chae, Zhenyu Wang, Lian Zhang, Dongmei Yu, Peiwu Qin http://arxiv.org/pdf/2411.18270v2 None
2024-12-03 Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification 动态-LLaVA:通过动态视觉-语言上下文稀疏化的高效多模态大型语言模型 Wenxuan Huang, Zijie Zhai, Yunhang Shen, Shaoshen Cao, Fei Zhao, Xiangfeng Xu, Zheyu Ye, Shaohui Lin http://arxiv.org/pdf/2412.00876v2 https://github.com/Osilly/dynamic_llava
2024-12-03 ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation? ScImage:多模态大型语言模型在科学文本到图像生成方面的表现如何? Leixin Zhang, Steffen Eger, Yinjie Cheng, Weihe Zhai, Jonas Belouadi, Christoph Leiter, Simone Paolo Ponzetto, Fahimeh Moafian http://arxiv.org/pdf/2412.02368v1 None
2024-12-03 Agri-LLaVA: Knowledge-Infused Large Multimodal Assistant on Agricultural Pests and Diseases 农业病虫害知识增强大型多模态助手 Liqiong Wang, Teng Jin, Jinyu Yang, Ales Leonardis, Fangyi Wang, Feng Zheng http://arxiv.org/pdf/2412.02158v1 https://github.com/Kki2Eve/Agri-LLaVA
2024-12-03 Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA 自适应排名,减少遗忘:动态排名选择LoRA在持续学习视觉-语言模型中的知识保留 Haodong Lu, Chongyang Zhao, Jason Xue, Lina Yao, Kristen Moore, Dong Gong http://arxiv.org/pdf/2412.01004v2 None
2024-12-03 Personalized Multimodal Large Language Models: A Survey 个性化多模态大型语言模型:综述 Junda Wu, Hanjia Lyu, Yu Xia, Zhehao Zhang, Joe Barrow, Ishita Kumar, Mehrnoosh Mirtaheri, Hongjie Chen http://arxiv.org/pdf/2412.02142v1 None
2024-12-03 WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image WSI-LLaVA:一种用于全切片图像的多模态大型语言模型 Yuci Liang, Xinheng Lyu, Meidan Ding, Wenting Chen, Jipeng Zhang, Yuexiang Ren, Xiangjian He, Song Wu http://arxiv.org/pdf/2412.02141v1 None
2024-12-03 Rethinking Self-Supervised Learning Within the Framework of Partial Information Decomposition 重新思考在部分信息分解框架下的自监督学习 Salman Mohamadi, Gianfranco Doretto, Donald A. Adjeroh http://arxiv.org/pdf/2412.02121v1 None
2024-12-03 ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification 非对称语义对齐网络:用于RGB和SAR图像土地覆盖分类 Pan Zhang, Baochai Peng, Chaoran Lu, Quanjin Huang http://arxiv.org/pdf/2412.02044v1 https://github.com/whu-pzhang/ASANet

Object Detection and Segmentation

Publication Date English Title Chinese Title Authors PDF Link Code Link
2024-12-03 Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation 规划引导的扩散策略学习以实现通用的富含接触的双臂操作 Xuanlin Li, Tong Zhao, Xinghao Zhu, Jiuguang Wang, Tao Pang, Kuan Fang http://arxiv.org/pdf/2412.02676v1 None
2024-12-03 Go beyond End-to-End Training: Boosting Greedy Local Learning with Context Supply 超越端到端训练:通过上下文供应增强贪婪局部学习 Chengting Yu, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Erping Li http://arxiv.org/pdf/2312.07636v2 https://github.com/Tab-ct/ContSup
2024-12-03 Robust soybean seed yield estimation using high-throughput ground robot videos 基于高通量地面机器人视频的鲁棒大豆产量估计 Jiale Feng, Samuel W. Blair, Timilehin Ayanlade, Aditya Balu, Baskar Ganapathysubramanian, Arti Singh, Soumik Sarkar, Asheesh K Singh http://arxiv.org/pdf/2412.02642v1 None
2024-12-03 A Bidirectional Long Short Term Memory Approach for Infrastructure Health Monitoring Using On-board Vibration Response 双向长短期记忆方法在利用车载振动响应进行基础设施健康监测中的应用 R. R. Samani, A. Nunez, B. De Schutter http://arxiv.org/pdf/2412.02643v1 None
2024-12-03 Class-wise Autoencoders Measure Classification Difficulty And Detect Label Mistakes 分类自编码器衡量分类难度并检测标签错误 Jacob Marks, Brent A. Griffin, Jason J. Corso http://arxiv.org/pdf/2412.02596v1 https://github.com/voxel51/reconstruction-error-ratios
2024-12-03 OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation OCR阻碍RAG:评估OCR对检索增强生成的影响级联效应 Junyuan Zhang, Qintong Zhang, Bin Wang, Linke Ouyang, Zichen Wen, Ying Li, Ka-Ho Chow, Conghui He http://arxiv.org/pdf/2412.02592v1 https://github.com/opendatalab/OHR-Bench
2024-12-03 Segmentation of Coronary Artery Stenosis in X-ray Angiography using Mamba Models 基于Mamba模型的X射线血管造影冠状动脉狭窄分割 Ali Rostami, Fatemeh Fouladi, Hedieh Sajedi http://arxiv.org/pdf/2412.02568v1 None
2024-12-03 Copy-Move Forgery Detection and Question Answering for Remote Sensing Image 遥感图像的复制-移动伪造检测与问答 Ze Zhang, Enyuan Zhao, Ziyi Wan, Jie Nie, Xinyue Liang, Lei Huang http://arxiv.org/pdf/2412.02575v1 https://github.com/shenyedepisa/RSCMQA
2024-12-03 Comparative Analysis of Resource-Efficient CNN Architectures for Brain Tumor Classification 脑肿瘤分类中资源高效CNN架构的比较分析 Md Ashik Khan, Rafath Bin Zafar Auvee http://arxiv.org/pdf/2411.15596v2 None
2024-12-03 Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection 开放集半监督目标检测的协同特征-对数对比学习 Xinhao Zhong, Siyu Jiao, Yao Zhao, Yunchao Wei http://arxiv.org/pdf/2411.13001v2 None
2024-12-03 Multi-Class Abnormality Classification Task in Video Capsule Endoscopy 多类别视频胶囊内窥镜异常分类任务 Dev Rishi Verma, Vibhor Saxena, Dhruv Sharma, Arpan Gupta http://arxiv.org/pdf/2410.19973v3 None
2024-12-03 OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations OODFace:在常见 corruption 和外观变化下的面部识别鲁棒性基准测试 Caixin Kang, Yubo Chen, Shouwei Ruan, Shiji Zhao, Ruochen Zhang, Jiayi Wang, Shan Fu, Xingxing Wei http://arxiv.org/pdf/2412.02479v1 None
2024-12-03 Resonance: Learning to Predict Social-Aware Pedestrian Trajectories as Co-Vibrations 共鸣:学习预测具有社会意识的行人轨迹作为共振动 Conghao Wong, Ziqian Zou, Beihao Xia, Xinge You http://arxiv.org/pdf/2412.02447v1 None
2024-12-03 DPE-Net: Dual-Parallel Encoder Based Network for Semantic Segmentation of Polyps DPE-Net:基于双并行编码器的息肉语义分割网络 Malik Abdul Manan, Feng Jinchao, Shahzad Ahmed, Abdul Raheem http://arxiv.org/pdf/2412.00888v2 None
2024-12-03 Multi-scale and Multi-path Cascaded Convolutional Network for Semantic Segmentation of Colorectal Polyps 多尺度多路径级联卷积网络用于结直肠癌息肉的语义分割 Malik Abdul Manan, Feng Jinchao, Muhammad Yaqub, Shahzad Ahmed, Syed Muhammad Ali Imran, Imran Shabir Chuhan, Haroon Ahmed Khan http://arxiv.org/pdf/2412.02443v1 None
2024-12-03 PolarBEVDet: Exploring Polar Representation for Multi-View 3D Object Detection in Bird's-Eye-View 极点BEVDet:探索极点表示在鸟瞰图多视图3D目标检测中的应用 Zichen Yu, Quanli Liu, Wei Wang, Liyong Zhang, Xiaoguang Zhao http://arxiv.org/pdf/2408.16200v2 https://github.com/Yzichen/PolarBEVDet.git
2024-12-03 Facial Expression Recognition with Controlled Privacy Preservation and Feature Compensation 面部表情识别:可控隐私保护与特征补偿 Feng Xu, David Ahmedt-Aristizabal, Lars Petersson, Dadong Wang, Xun Li http://arxiv.org/pdf/2412.00277v2 None
2024-12-03 Who Walks With You Matters: Perceiving Social Interactions with Groups for Pedestrian Trajectory Prediction 与你同行者至关重要:用于行人轨迹预测的群体社交交互感知 Ziqian Zou, Conghao Wong, Beihao Xia, Qinmu Peng, Xinge You http://arxiv.org/pdf/2412.02395v1 None
2024-12-03 Bio-inspired visual relative localization for large swarms of UAVs 生物启发的大规模无人机群视觉相对定位 Martin Křížek, Matouš Vrba, Antonella Barišić Kulaš, Stjepan Bogdan, Martin Saska http://arxiv.org/pdf/2412.02393v1 None
2024-12-03 Trajectory-based Road Autolabeling with Lidar-Camera Fusion in Winter Conditions 基于轨迹的冬季条件下激光雷达-摄像头融合道路自动标注 Eerik Alamikkotervo, Henrik Toikka, Kari Tammi, Risto Ojala http://arxiv.org/pdf/2412.02370v1 https://github.com/eerik98/lidar-camera-road-autolabeling.git
2024-12-03 Active Negative Loss: A Robust Framework for Learning with Noisy Labels 主动负损失:一种用于带噪声标签学习的鲁棒框架 Xichen Ye, Yifan Wu, Yiwen Xu, Xiaoqiang Li, Weizhong Zhang, Yifan Chen http://arxiv.org/pdf/2412.02373v1 https://github.com/Virusdoll/Active-Negative-Loss
2024-12-03 Enhancing joint automatic chest X-ray diagnosis and clinical visual attention prediction with multi-stage cooperative learning 提升多阶段协同学习在联合自动胸部X光片诊断与临床视觉注意力预测中的应用 Zirui Qiu, Hassan Rivaz, Yiming Xiao http://arxiv.org/pdf/2403.16970v3 None
2024-12-03 A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation 良好的基础胜过众多标签:高效标签的全景分割 Niclas Vödisch, Kürsat Petek, Markus Käppeler, Abhinav Valada, Wolfram Burgard http://arxiv.org/pdf/2405.19035v2 None
2024-12-03 LoCo: Low-Contrast-Enhanced Contrastive Learning for Semi-Supervised Endoscopic Image Segmentation 低对比度增强对比学习用于半监督内窥镜图像分割 Lingcong Cai, Yun Li, Xiaomao Fan, Kaixuan Song, Yongcheng Li, Yixuan Yuan, Ruxin Wang, Wenbin Lei http://arxiv.org/pdf/2412.02314v1 https://github.com/AnoK3111/LoCo
2024-12-03 Noisy Ostracods: A Fine-Grained, Imbalanced Real-World Dataset for Benchmarking Robust Machine Learning and Label Correction Methods 噪声盲蝽:用于基准测试鲁棒机器学习和标签校正方法的细粒度、不平衡真实世界数据集 Jiamian Hu, Yuanyuan Hong, Yihua Chen, He Wang, Moriaki Yasuhara http://arxiv.org/pdf/2412.02313v1 https://github.com/H-Jamieu/Noisy_ostracods
2024-12-03 Initial Study On Improving Segmentation By Combining Preoperative CT And Intraoperative CBCT Using Synthetic Data 初步研究:通过结合术前CT和术中CBCT以及合成数据改进分割 Maximilian E. Tschuchnig, Philipp Steininger, Michael Gadermayr http://arxiv.org/pdf/2412.02294v1 None
2024-12-03 Monocular Lane Detection Based on Deep Learning: A Survey 单目车道检测基于深度学习:综述 Xin He, Haiyun Guo, Kuan Zhu, Bingke Zhu, Xu Zhao, Jianwu Fang, Jinqiao Wang http://arxiv.org/pdf/2411.16316v4 https://github.com/Core9724/Awesome-Lane-Detection
2024-12-03 ASTM: Autonomous Smart Traffic Management System Using Artificial Intelligence CNN and LSTM ASTM:基于人工智能CNN和LSTM的自主智能交通管理系统 Christofel Rio Goenawan http://arxiv.org/pdf/2410.10929v5 None
2024-12-03 AH-OCDA: Amplitude-based Curriculum Learning and Hopfield Segmentation Model for Open Compound Domain Adaptation 基于振幅的课程学习和霍普菲尔德分割模型用于开放复合域自适应 Jaehyun Choi, Junwon Ko, Dong-Jae Lee, Junmo Kim http://arxiv.org/pdf/2412.02280v1 None
2024-12-03 Sustainable Self-evolution Adversarial Training 可持续的自适应对抗训练 Wenxuan Wang, Chenglei Wang, Huihui Qi, Menghao Ye, Xuelin Qian, Peng Wang, Yanning Zhang http://arxiv.org/pdf/2412.02270v1 None
2024-12-03 GSGTrack: Gaussian Splatting-Guided Object Pose Tracking from RGB Videos GSGTrack:基于高斯喷溅引导的RGB视频目标姿态跟踪 Zhiyuan Chen, Fan Lu, Guo Yu, Bin Li, Sanqing Qu, Yuan Huang, Changhong Fu, Guang Chen http://arxiv.org/pdf/2412.02267v1 None
2024-12-03 Diabetic Retinopathy Classification from Retinal Images using Machine Learning Approaches 糖尿病视网膜病变从视网膜图像中利用机器学习方法进行分类 Indronil Bhattacharjee, Al-Mahmud, Tareq Mahmud http://arxiv.org/pdf/2412.02265v1 None
2024-12-03 Composing Open-domain Vision with RAG for Ocean Monitoring and Conservation 基于RAG的开放域视觉合成用于海洋监测与保护 Sepand Dyanatkar, Angran Li, Alexander Dungate http://arxiv.org/pdf/2412.02262v1 None
2024-12-03 ProbPose: A Probabilistic Approach to 2D Human Pose Estimation 概率姿态估计:二维人体姿态估计的概率方法 Miroslav Purkrabek, Jiri Matas http://arxiv.org/pdf/2412.02254v1 None
2024-12-03 Vision Transformers for Weakly-Supervised Microorganism Enumeration 视觉Transformer在弱监督微生物计数中的应用 Javier Ureña Santiago, Thomas Ströhle, Antonio Rodríguez-Sánchez, Ruth Breu http://arxiv.org/pdf/2412.02250v1 None
2024-12-03 Learning from Reduced Labels for Long-Tailed Data 从减少标签中学习长尾数据 Meng Wei, Zhongnian Li, Yong Zhou, Xinzheng Xu http://arxiv.org/pdf/2403.16469v2 None
2024-12-03 U-Net in Medical Image Segmentation: A Review of Its Applications Across Modalities U-Net在医学图像分割中的应用综述:跨模态应用回顾 Fnu Neha, Deepshikha Bhati, Deepak Kumar Shukla, Sonavi Makarand Dalvi, Nikolaos Mantzou, Safa Shubbar http://arxiv.org/pdf/2412.02242v1 None
2024-12-03 SpaGBOL: Spatial-Graph-Based Orientated Localisation 空间图基导向定位:SpaGBOL Tavis Shore, Oscar Mendez, Simon Hadfield http://arxiv.org/pdf/2409.15514v2 None
2024-12-03 Phase-Informed Tool Segmentation for Manual Small-Incision Cataract Surgery 基于相位信息的工具分割用于手工小切口白内障手术 Bhuvan Sachdeva, Naren Akash, Tajamul Ashraf, Simon Mueller, Thomas Schultz, Maximilian W. M. Wintergerst, Niharika Singri Prasad, Kaushik Murali http://arxiv.org/pdf/2411.16794v2 None
2024-12-03 Jailbreak Large Vision-Language Models Through Multi-Modal Linkage 通过多模态链接破解大型视觉-语言模型 Yu Wang, Xiaofei Zhou, Yichen Wang, Geyuan Zhang, Tianxing He http://arxiv.org/pdf/2412.00473v2 https://github.com/wangyu-ovo/MML
2024-12-03 CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy CC-OCR:用于评估大型多模态模型在识字能力方面的全面且具有挑战性的OCR基准 Zhibo Yang, Jun Tang, Zhaohai Li, Pengfei Wang, Jianqiang Wan, Humen Zhong, Xuejing Liu, Mingkun Yang http://arxiv.org/pdf/2412.02210v1 None
2024-12-03 Transformer-Metric Loss for CNN-Based Face Recognition 基于CNN的人脸识别的Transformer度量损失 Pritesh Prakash, Ashish Jacob Sam http://arxiv.org/pdf/2412.02198v1 None
2024-12-03 Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images 级联多尺度注意力:增强多尺度特征提取与低分辨率图像的交互 Xiangyong Lu, Masanori Suganuma, Takayuki Okatani http://arxiv.org/pdf/2412.02197v1 https://github.com/xyongLu/CMSA
2024-12-03 Multi-Granularity Video Object Segmentation 多粒度视频目标分割 Sangbeom Lim, Seongchan Kim, Seungjun An, Seokju Cho, Paul Hongsuck Seo, Seungryong Kim http://arxiv.org/pdf/2412.01471v2 None
2024-12-03 CamoFA: A Learnable Fourier-based Augmentation for Camouflage Segmentation 伪装分割的可学习傅里叶增强:CamoFA Minh-Quan Le, Minh-Triet Tran, Trung-Nghia Le, Tam V. Nguyen, Thanh-Toan Do http://arxiv.org/pdf/2308.15660v2 None
2024-12-03 Anatomically-Grounded Fact Checking of Automated Chest X-ray Reports 基于解剖学的自动化胸部X光报告事实核查 R. Mahmood, K. C. L. Wong, D. M. Reyes, N. D'Souza, L. Shi, J. Wu, P. Kaviani, M. Kalra http://arxiv.org/pdf/2412.02177v1 None
2024-12-03 Underload: Defending against Latency Attacks for Object Detectors on Edge Devices 边缘设备上目标检测器的低负载:防御延迟攻击 Tianyi Wang, Zichen Wang, Cong Wang, Yuanchao Shu, Ruilong Deng, Peng Cheng, Jiming Chen http://arxiv.org/pdf/2412.02171v1 None
2024-12-03 VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning 视觉推理中的细粒度批评与校正基准:自我改进的评估 Xueqing Wu, Yuheng Ding, Bingxuan Li, Pan Lu, Da Yin, Kai-Wei Chang, Nanyun Peng http://arxiv.org/pdf/2412.02172v1 None
2024-12-03 GFreeDet: Exploiting Gaussian Splatting and Foundation Models for Model-free Unseen Object Detection in the BOP Challenge 2024 GFreeDet:利用高斯喷溅和基础模型在BOP挑战2024中进行无模型未见物体检测 Xingyu Liu, Yingyue Li, Chengxi Li, Gu Wang, Chenyangguang Zhang, Ziqin Huang, Xiangyang Ji http://arxiv.org/pdf/2412.01552v2 None
2024-12-03 An Empirical Study of Mamba-based Pedestrian Attribute Recognition 基于Mamba的行人属性识别实证研究 Xiao Wang, Weizhe Kong, Jiandong Jin, Shiao Wang, Ruichong Gao, Qingchuan Ma, Chenglong Li, Jin Tang http://arxiv.org/pdf/2407.10374v2 https://github.com/Event-AHU/OpenPAR
2024-12-03 TransFair: Transferring Fairness from Ocular Disease Classification to Progression Prediction TransFair:从眼部疾病分类到进展预测的公平性迁移 Leila Gheisi, Henry Chu, Raju Gottumukkala, Yan Luo, Xingquan Zhu, Mengyu Wang, Min Shi http://arxiv.org/pdf/2412.00051v2 None
2024-12-03 GSOT3D: Towards Generic 3D Single Object Tracking in the Wild GSOT3D:迈向通用野外3D单目标跟踪 Yifan Jiao, Yunhao Li, Junhua Ding, Qing Yang, Song Fu, Heng Fan, Libo Zhang http://arxiv.org/pdf/2412.02129v1 https://github.com/ailovejinx/GSOT3D
2024-12-03 Topology-Preserving Image Segmentation with Spatial-Aware Persistent Feature Matching 拓扑保持的空间感知持久特征匹配图像分割 Bo Wen, Haochen Zhang, Dirk-Uwe G. Bartsch, William R. Freeman, Truong Q. Nguyen, Cheolhong An http://arxiv.org/pdf/2412.02076v1 None
2024-12-03 Performance Comparison of Deep Learning Techniques in Naira Classification 深度学习技术在奈拉分类性能比较 Ismail Ismail Tijjani, Ahmad Abubakar Mustapha, Isma'il Tijjani Idris http://arxiv.org/pdf/2412.02072v1 None
2024-12-03 Another Vertical View: A Hierarchical Network for Heterogeneous Trajectory Prediction via Spectrums 另一种垂直视角:通过频谱进行异构轨迹预测的分层网络 Beihao Xia, Conghao Wong, Duanquan Xu, Qinmu Peng, Xinge You http://arxiv.org/pdf/2304.05106v2 None
2024-12-03 Dynamic Adversarial Attacks on Autonomous Driving Systems 动态对抗攻击自动驾驶系统 Amirhosein Chahe, Chenan Wang, Abhishek Jeyapratap, Kaidi Xu, Lifeng Zhou http://arxiv.org/pdf/2312.06701v3 None
2024-12-03 CLERF: Contrastive LEaRning for Full Range Head Pose Estimation CLERF:全范围头部姿态估计的对比学习方法 Ting-Ruen Wei, Haowei Liu, Huei-Chung Hu, Xuyang Wu, Yi Fang, Hsin-Tai Wu http://arxiv.org/pdf/2412.02066v1 None
2024-12-03 Redundant Queries in DETR-Based 3D Detection Methods: Unnecessary and Prunable 基于DETR的3D检测方法中的冗余查询:不必要的且可剪枝 Lizhen Xu, Shanmin Pang, Wenzhao Qiu, Zehao Wu, Xiuxiu Bai, Kuizhi Mei, Jianru Xue http://arxiv.org/pdf/2412.02054v1 https://github.com/iseri27/Gpq

Video Understanding and Processing

Publication Date English Title Chinese Title Authors PDF Link Code Link
2024-12-03 Towards Neuro-Symbolic Video Understanding 迈向神经符号视频理解 Minkyu Choi, Harsh Goel, Mohammad Omama, Yunhao Yang, Sahil Shah, Sandeep Chinchali http://arxiv.org/pdf/2403.11021v3 None
2024-12-03 Motion Prompting: Controlling Video Generation with Motion Trajectories 运动提示:通过运动轨迹控制视频生成 Daniel Geng, Charles Herrmann, Junhwa Hur, Forrester Cole, Serena Zhang, Tobias Pfaff, Tatiana Lopez-Guevara, Carl Doersch http://arxiv.org/pdf/2412.02700v1 None
2024-12-03 Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification 神经符号形式验证在文本到视频模型评估中的应用 S. P. Sharan, Minkyu Choi, Sahil Shah, Harsh Goel, Mohammad Omama, Sandeep Chinchali http://arxiv.org/pdf/2411.16718v3 None
2024-12-03 Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback 利用人工智能反馈提升文本到视频生成中的动态物体交互 Hiroki Furuta, Heiga Zen, Dale Schuurmans, Aleksandra Faust, Yutaka Matsuo, Percy Liang, Sherry Yang http://arxiv.org/pdf/2412.02617v1 None
2024-12-03 It Takes Two: Real-time Co-Speech Two-person's Interaction Generation via Reactive Auto-regressive Diffusion Model 协同互动:通过反应式自回归扩散模型实现实时双人对语交互生成 Mingyi Shi, Dafei Qin, Leo Ho, Zhouyingcheng Liao, Yinghao Huang, Junichi Yamagishi, Taku Komura http://arxiv.org/pdf/2412.02419v1 None
2024-12-03 VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation 视频思维生成:多帧视频生成的协作框架 Mingzhe Zheng, Yongqi Xu, Haojian Huang, Xuran Ma, Yexin Liu, Wenjie Shu, Yatian Pang, Feilong Tang http://arxiv.org/pdf/2412.02259v1 None
2024-12-03 OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation OpenHumanVid:用于提升以人为中心的视频生成的超大规模高质量数据集 Hui Li, Mingwang Xu, Yun Zhan, Shan Mu, Jiaye Li, Kaihui Cheng, Yuxuan Chen, Tan Chen http://arxiv.org/pdf/2412.00115v2 None
2024-12-03 VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models 视觉XL:基于潜在图像扩散模型的高清视频逆问题求解器 Taesung Kwon, Jong Chul Ye http://arxiv.org/pdf/2412.00156v2 None
2024-12-03 VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding 视频ICL:基于置信度的迭代情境学习以实现分布外视频理解 Kangsan Kim, Geon Park, Youngwan Lee, Woongyeong Yeo, Sung Ju Hwang http://arxiv.org/pdf/2412.02186v1 https://github.com/KangsanKim07/VideoICL
2024-12-03 From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding 从秒到小时:全面长视频理解的多模态大型语言模型综述 Heqing Zou, Tianze Luo, Guiyang Xie, Victor Zhang, Fengmao Lv, Guangcong Wang, Junyang Chen http://arxiv.org/pdf/2409.18938v2 None
2024-12-03 Understanding Particles From Video: Property Estimation of Granular Materials via Visuo-Haptic Learning 从视频中理解粒子:通过视觉-触觉学习估计颗粒材料的属性 Zeqing Zhang, Guangze Zheng, Xuebo Ji, Guanqi Chen, Ruixing Jia, Wentao Chen, Guanhua Chen, Liangjun Zhang http://arxiv.org/pdf/2412.02119v1 None
2024-12-03 DuoCast: Duo-Probabilistic Meteorology-Aware Model for Extended Precipitation Nowcasting 双播:扩展降水预报的双概率气象感知模型 Penghui Wen, Lei Bai, Mengwei He, Patrick Filippi, Feng Zhang, Thomas Francis Bishop, Zhiyong Wang, Kun Hu http://arxiv.org/pdf/2412.01091v2 https://github.com/ph-w2000/DuoCast
2024-12-03 Progress-Aware Video Frame Captioning 感知进度的视频帧字幕生成 Zihui Xue, Joungbin An, Xitong Yang, Kristen Grauman http://arxiv.org/pdf/2412.02071v1 None

Other

Publication Date English Title Chinese Title Authors PDF Link Code Link
2024-12-03 MERGE: Multi-faceted Hierarchical Graph-based GNN for Gene Expression Prediction from Whole Slide Histopathology Images MERGE:基于多角度分层图的GNN在整张切片病理图像中进行基因表达预测 Aniruddha Ganguly, Debolina Chatterjee, Wentao Huang, Jie Zhang, Alisa Yurovsky, Travis Steele Johnson, Chao Chen http://arxiv.org/pdf/2412.02601v1 None
2024-12-03 OMENN: One Matrix to Explain Neural Networks OMENN:一个矩阵解释神经网络 Adam Wróbel, Mikołaj Janusz, Bartosz Zieliński, Dawid Rymarczyk http://arxiv.org/pdf/2412.02399v1 None
2024-12-03 Enabling DBSCAN for Very Large-Scale High-Dimensional Spaces 启用DBSCAN处理大规模高维空间 Yongyu Wang http://arxiv.org/pdf/2411.11421v3 None
2024-12-03 ILASH: A Predictive Neural Architecture Search Framework for Multi-Task Applications ILASH:多任务应用预测性神经架构搜索框架 Md Hafizur Rahman, Md Mashfiq Rizvee, Sumaiya Shomaji, Prabuddha Chakraborty http://arxiv.org/pdf/2412.02116v1 None
2024-12-03 A Classic-Quantum Hybrid Network Framework: CQH-Net 经典-量子混合网络框架:CQH-Net Ao Liu, Cuihong Wen, Jieci Wang http://arxiv.org/pdf/2412.02059v1 None