Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-03 | AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction | AniGS:基于不一致高斯重建从单张图像生成可动画高斯化身 | Lingteng Qiu, Shenhao Zhu, Qi Zuo, Xiaodong Gu, Yuan Dong, Junfei Zhang, Chao Xu, Zhe Li | http://arxiv.org/pdf/2412.02684v1 | None |
2024-12-03 | D-MiSo: Editing Dynamic 3D Scenes using Multi-Gaussians Soup | D-MiSo:使用多高斯汤编辑动态3D场景 | Joanna Waczyńska, Piotr Borycki, Joanna Kaleta, Sławomir Tadeja, Przemysław Spurek | http://arxiv.org/pdf/2405.14276v3 | None |
2024-12-03 | RelayGS: Reconstructing Dynamic Scenes with Large-Scale and Complex Motions via Relay Gaussians | RelayGS:通过中继高斯重建具有大规模和复杂运动的动态场景 | Qiankun Gao, Yanmin Wu, Chengxiang Wen, Jiarui Meng, Luyang Tang, Jie Chen, Ronggang Wang, Jian Zhang | http://arxiv.org/pdf/2412.02493v1 | https://github.com/gqk/RelayGS |
2024-12-03 | Spiking GS: Towards High-Accuracy and Low-Cost Surface Reconstruction via Spiking Neuron-based Gaussian Splatting | Spiking GS:通过基于脉冲神经元的高斯喷溅实现高精度和低成本表面重建 | Weixing Zhang, Zongrui Li, De Ma, Huajin Tang, Xudong Jiang, Qian Zheng, Gang Pan | http://arxiv.org/pdf/2410.07266v5 | https://github.com/zju-bmi-lab/SpikingGS |
2024-12-03 | Multi-robot autonomous 3D reconstruction using Gaussian splatting with Semantic guidance | 多机器人自主3D重建:基于语义引导的高斯散点法 | Jing Zeng, Qi Ye, Tianle Liu, Yang Xu, Jin Li, Jinming Xu, Liang Li, Jiming Chen | http://arxiv.org/pdf/2412.02249v1 | None |
2024-12-03 | SparseLGS: Sparse View Language Embedded Gaussian Splatting | 稀疏视图语言嵌入高斯喷溅 | Jun Hu, Zhang Chen, Zhong Li, Yi Xu, Juyong Zhang | http://arxiv.org/pdf/2412.02245v1 | None |
2024-12-03 | SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images | SparseGrasp:通过稀疏多视图RGB图像的3D语义高斯喷溅进行机器人抓取 | Junqiu Yu, Xinlin Ren, Yongchong Gu, Haitao Lin, Tianyu Wang, Yi Zhu, Hang Xu, Yu-Gang Jiang | http://arxiv.org/pdf/2412.02140v1 | None |
2024-12-03 | Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion | 高斯物体裁剪:具有表面补全的对象组成高斯喷溅 | Liu Liu, Xinjie Wang, Jiaxiong Qiu, Tianwei Lin, Xiaolin Zhou, Zhizhong Su | http://arxiv.org/pdf/2412.02075v1 | None |
Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-03 | Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation | Sharp-It:一种用于3D合成和操纵的多视角到多视角扩散模型 | Yiftach Edelstein, Or Patashnik, Dana Cohen-Bar, Lihi Zelnik-Manor | http://arxiv.org/pdf/2412.02631v1 | None |
2024-12-03 | MedTet: An Online Motion Model for 4D Heart Reconstruction | MedTet:一种用于4D心脏重建的在线运动模型 | Yihong Chen, Jiancheng Yang, Deniz Sayin Mercadier, Hieu Le, Pascal Fua | http://arxiv.org/pdf/2412.02589v1 | https://github.com/Scalsol/MedTet |
2024-12-03 | Tomographic SAR Reconstruction for Forest Height Estimation | 森林高度估计的断层扫描合成孔径雷达重建 | Grace Colverd, Jumpei Takami, Laura Schade, Karol Bot, Joseph A. Gallego-Mejia | http://arxiv.org/pdf/2412.00903v2 | None |
2024-12-03 | LiDAR-based Registration against Georeferenced Models for Globally Consistent Allocentric Maps | 基于激光雷达与地理参照模型的配准,实现全局一致的非自我中心地图 | Jan Quenzel, Linus T. Mallwitz, Benedikt T. Arnold, Sven Behnke | http://arxiv.org/pdf/2412.02533v1 | None |
2024-12-03 | ROVER: A Multi-Season Dataset for Visual SLAM | ROVER:一个适用于视觉SLAM的多季节数据集 | Fabian Schmidt, Constantin Blessing, Markus Enzweiler, Abhinav Valada | http://arxiv.org/pdf/2412.02506v1 | None |
2024-12-03 | BYE: Build Your Encoder with One Sequence of Exploration Data for Long-Term Dynamic Scene Understanding | BYE:使用一段探索数据序列构建您的编码器以实现长期动态场景理解 | Chenguang Huang, Shengchao Yan, Wolfram Burgard | http://arxiv.org/pdf/2412.02449v1 | None |
2024-12-03 | TimeWalker: Personalized Neural Space for Lifelong Head Avatars | 时间漫步者:终身头像的个性化神经网络空间 | Dongwei Pan, Yang Li, Hongsheng Li, Kwan-Yee Lin | http://arxiv.org/pdf/2412.02421v1 | None |
2024-12-03 | SegNet4D: Efficient Instance-Aware 4D Semantic Segmentation for LiDAR Point Cloud | SegNet4D:高效实例感知4D激光雷达点云语义分割 | Neng Wang, Ruibin Guo, Chenghao Shi, Ziyue Wang, Hui Zhang, Huimin Lu, Zhiqiang Zheng, Xieyuanli Chen | http://arxiv.org/pdf/2406.16279v3 | https://github.com/nubot-nudt/SegNet4D |
2024-12-03 | 3D Face Reconstruction From Radar Images | 从雷达图像中重建3D人脸 | Valentin Braeutigam, Vanessa Wirth, Ingrid Ullmann, Christian Schüßler, Martin Vossiek, Matthias Berking, Bernhard Egger | http://arxiv.org/pdf/2412.02403v1 | None |
2024-12-03 | RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation | 规则引导的空间感知网络:用于端到端3D指称表达式分割 | Changli Wu, Qi Chen, Jiayi Ji, Haowei Wang, Yiwei Ma, You Huang, Gen Luo, Hao Fei | http://arxiv.org/pdf/2412.02402v1 | https://github.com/sosppxo/RG-SAN |
2024-12-03 | Single-Shot Metric Depth from Focused Plenoptic Cameras | 基于聚焦型全光相机的单次拍摄度量深度估计 | Blanca Lasheras-Hernandez, Klaus H. Strobl, Sergio Izquierdo, Tim Bodenmüller, Rudolph Triebel, Javier Civera | http://arxiv.org/pdf/2412.02386v1 | None |
2024-12-03 | Realistic Surgical Simulation from Monocular Videos | 基于单目视频的逼真手术模拟 | Kailing Wang, Chen Yang, Keyang Zhao, Xiaokang Yang, Wei Shen | http://arxiv.org/pdf/2412.02359v1 | None |
2024-12-03 | SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation | 场景因子:可控3D场景生成的因子化潜在3D扩散 | Alexey Bokhovkin, Quan Meng, Shubham Tulsiani, Angela Dai | http://arxiv.org/pdf/2412.01801v2 | None |
2024-12-03 | Amodal Depth Anything: Amodal Depth Estimation in the Wild | Amodal Depth Anything:野外非模态深度估计 | Zhenyu Li, Mykola Lavreniuk, Jian Shi, Shariq Farooq Bhat, Peter Wonka | http://arxiv.org/pdf/2412.02336v1 | None |
2024-12-03 | Dual Exposure Stereo for Extended Dynamic Range 3D Imaging | 双曝光立体视觉实现扩展动态范围三维成像 | Juhyung Choi, Jinnyeong Kim, Seokjun Choi, Jinwoo Lee, Samuel Brucker, Mario Bijelic, Felix Heide, Seung-Hwan Baek | http://arxiv.org/pdf/2412.02351v1 | None |
2024-12-03 | HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset | HumanRig:在大规模数据集中学习人形角色自动绑定 | Zedong Chu, Feng Xiong, Meiduo Liu, Jinzhi Zhang, Mingqi Shao, Zhaoxu Sun, Di Wang, Mu Xu | http://arxiv.org/pdf/2412.02317v1 | None |
2024-12-03 | Partial Non-rigid Deformations and interpolations of Human Body Surfaces | 人体表面部分非刚性变形与插值 | Thomas Besnier, Emery Pierson, Sylvain Arguillere, Mohamed Daoudi | http://arxiv.org/pdf/2412.02306v1 | None |
2024-12-03 | Viewpoint Consistency in 3D Generation via Attention and CLIP Guidance | 通过注意力和CLIP引导实现3D生成中的视点一致性 | Qing Zhang, Zehao Chen, Jinguang Tong, Jing Zhang, Jie Hong, Xuesong Li | http://arxiv.org/pdf/2412.02287v1 | None |
2024-12-03 | KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation | KP-RED:利用语义关键点进行联合3D形状检索和变形 | Ruida Zhang, Chenyangguang Zhang, Yan Di, Fabian Manhardt, Xingyu Liu, Federico Tombari, Xiangyang Ji | http://arxiv.org/pdf/2403.10099v3 | https://github.com/lolrudy/KP-RED |
2024-12-03 | Take Your Steps: Hierarchically Efficient Pulmonary Disease Screening via CT Volume Compression | 迈出步伐:通过CT体积压缩实现层次化高效的肺病筛查 | Qian Shao, Kai Zhang, Bang Du, Zepeng Li, Yixuan Wu, Qiyuan Chen, Jian Wu, Jintai Chen | http://arxiv.org/pdf/2412.01525v2 | None |
2024-12-03 | How to Use Diffusion Priors under Sparse Views? | 如何利用稀疏视图下的扩散先验? | Qisen Wang, Yifan Zhao, Jiawei Ma, Jia Li | http://arxiv.org/pdf/2412.02225v1 | https://github.com/iCVTEAM/IPSM |
2024-12-03 | LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models | 布局VLM:通过视觉-语言模型的可微3D布局优化 | Fan-Yun Sun, Weiyu Liu, Siyi Gu, Dylan Lim, Goutam Bhat, Federico Tombari, Manling Li, Nick Haber | http://arxiv.org/pdf/2412.02193v1 | None |
2024-12-03 | CFPNet: Improving Lightweight ToF Depth Completion via Cross-zone Feature Propagation | CFPNet:通过跨区域特征传播提升轻量级ToF深度补全 | Laiyan Ding, Hualie Jiang, Rui Xu, Rui Huang | http://arxiv.org/pdf/2411.04480v4 | https://github.com/denyingmxd/CFPNet |
2024-12-03 | Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation | 迈向跨视角一致的自监督环视深度估计 | Laiyan Ding, Hualie Jiang, Jie Li, Yongquan Chen, Rui Huang | http://arxiv.org/pdf/2407.04041v3 | https://github.com/denyingmxd/CVCDepth |
2024-12-03 | HSLiNets: Hyperspectral Image and LiDAR Data Fusion Using Efficient Dual Non-Linear Feature Learning Networks | HSLiNets:基于高效双非线性特征学习网络的超光谱图像与激光雷达数据融合 | Judy X Yang, Jing Wang, Chen Hong Sui, Zekun Long, Jun Zhou | http://arxiv.org/pdf/2412.00302v2 | None |
2024-12-03 | FoveaSPAD: Exploiting Depth Priors for Adaptive and Efficient Single-Photon 3D Imaging | FoveaSPAD:利用深度先验进行自适应和高效的单光子3D成像 | Justin Folden, Atul Ingle, Sanjeev J. Koppal | http://arxiv.org/pdf/2412.02052v1 | None |
Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-03 | TFS-NeRF: Template-Free NeRF for Semantic 3D Reconstruction of Dynamic Scene | TFS-NeRF:无模板NeRF用于动态场景的语义3D重建 | Sandika Biswas, Qianyi Wu, Biplab Banerjee, Hamid Rezatofighi | http://arxiv.org/pdf/2409.17459v3 | None |
2024-12-03 | Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs | 通过回收预调LoRAs解锁视觉基础模型的无调优小样本适应性 | Zixuan Hu, Yongxian Wei, Li Shen, Chun Yuan, Dacheng Tao | http://arxiv.org/pdf/2412.02220v1 | None |
2024-12-03 | 3D representation in 512-Byte:Variational tokenizer is the key for autoregressive 3D generation | 512字节内的3D表示:变分标记器是自回归3D生成的关键 | Jinzhi Zhang, Feng Xiong, Mu Xu | http://arxiv.org/pdf/2412.02202v1 | None |
Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-03 | STRIDE: Single-video based Temporally Continuous Occlusion Robust 3D Pose Estimation | STRIDE:基于单视频的时序连续遮挡鲁棒3D姿态估计 | Rohit Lal, Saketh Bachu, Yash Garg, Arindam Dutta, Calvin-Khang Ta, Dripta S. Raychaudhuri, Hannah Dela Cruz, M. Salman Asif | http://arxiv.org/pdf/2312.16221v3 | https://github.com/take2rohit/stride |
Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-03 | Diffusion-based Visual Anagram as Multi-task Learning | 基于扩散的视觉字谜作为多任务学习 | Zhiyuan Xu, Yinhe Chen, Huan-ang Gao, Weiyan Zhao, Guiyu Zhang, Hao Zhao | http://arxiv.org/pdf/2412.02693v1 | None |
2024-12-03 | FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation | FoundHand:大规模领域特定学习以实现可控手部图像生成 | Kefan Chen, Chaerin Min, Linguang Zhang, Shreyas Hampali, Cem Keskin, Srinath Sridhar | http://arxiv.org/pdf/2412.02690v1 | None |
2024-12-03 | Taming Scalable Visual Tokenizer for Autoregressive Image Generation | 驯服可扩展视觉标记器以实现自回归图像生成 | Fengyuan Shi, Zhuoyan Luo, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang | http://arxiv.org/pdf/2412.02692v1 | https://github.com/TencentARC/SEED-Voken |
2024-12-03 | SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance | SNOOPI:带适当引导的超强一步扩散蒸馏 | Viet Nguyen, Anh Aengus Nguyen, Trung Dao, Khoi Nguyen, Cuong Pham, Toan Tran, Anh Tran | http://arxiv.org/pdf/2412.02687v1 | None |
2024-12-03 | Diffusion Models with Anisotropic Gaussian Splatting for Image Inpainting | 各向异性高斯喷溅扩散模型在图像修复中的应用 | Jacob Fein-Ashley, Benjamin Fein-Ashley | http://arxiv.org/pdf/2412.01682v2 | None |
2024-12-03 | Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis | Switti:设计用于文本到图像合成的尺度感知Transformer | Anton Voronov, Denis Kuznedelev, Mikhail Khoroshikh, Valentin Khrulkov, Dmitry Baranchuk | http://arxiv.org/pdf/2412.01819v2 | None |
2024-12-03 | Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment | 通过块级logit蒸馏解耦暗知识以实现特征级对齐 | Chengting Yu, Fengzhao Zhang, Ruizhe Chen, Aili Wang, Zuozhu Liu, Shurun Tan, Er-Ping Li | http://arxiv.org/pdf/2411.01547v2 | None |
2024-12-03 | MetaShadow: Object-Centered Shadow Detection, Removal, and Synthesis | 元阴影:基于对象的阴影检测、去除和合成 | Tianyu Wang, Jianming Zhang, Haitian Zheng, Zhihong Ding, Scott Cohen, Zhe Lin, Wei Xiong, Chi-Wing Fu | http://arxiv.org/pdf/2412.02635v1 | None |
2024-12-03 | Scaling Image Tokenizers with Grouped Spherical Quantization | 通过分组球面量化扩展图像分词器 | Jiangtao Wang, Zhen Qin, Yifan Zhang, Vincent Tao Hu, Björn Ommer, Rania Briq, Stefan Kesselheim | http://arxiv.org/pdf/2412.02632v1 | None |
2024-12-03 | Continual Learning of Personalized Generative Face Models with Experience Replay | 基于经验回放的个性化生成人脸模型持续学习 | Annie N. Wang, Luchao Qi, Roni Sengupta | http://arxiv.org/pdf/2412.02627v1 | None |
2024-12-03 | Denoising: A Powerful Building-Block for Imaging, Inverse Problems, and Machine Learning | 去噪:图像、逆问题和机器学习中的强大构建块 | Peyman Milanfar, Mauricio Delbracio | http://arxiv.org/pdf/2409.06219v4 | None |
2024-12-03 | dc-GAN: Dual-Conditioned GAN for Face Demorphing From a Single Morph | dc-GAN:用于从单张融合图像进行人脸去融合的双条件GAN | Nitish Shukla, Arun Ross | http://arxiv.org/pdf/2411.14494v2 | None |
2024-12-03 | LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting | LumiNet:潜在本征表示与扩散模型结合用于室内场景重光照 | Xiaoyan Xing, Konrad Groh, Sezer Karaoglu, Theo Gevers, Anand Bhattad | http://arxiv.org/pdf/2412.00177v2 | None |
2024-12-03 | Unveiling Concept Attribution in Diffusion Models | 揭示扩散模型中的概念归因 | Quang H. Nguyen, Hoang Phan, Khoa D. Doan | http://arxiv.org/pdf/2412.02542v1 | https://github.com/mail-research/CAD-attribution4diffusion |
2024-12-03 | ShadowHack: Hacking Shadows via Luminance-Color Divide and Conquer | ShadowHack:通过亮度-颜色分而治之破解阴影 | Jin Hu, Mingjia Li, Xiaojie Guo | http://arxiv.org/pdf/2412.02545v1 | https://github.com/lime-j/ShadowHack |
2024-12-03 | WEM-GAN: Wavelet transform based facial expression manipulation | WEM-GAN:基于小波变换的面部表情操纵 | Dongya Sun, Yunfei Hu, Xianzhe Zhang, Yingsong Hu | http://arxiv.org/pdf/2412.02530v1 | None |
2024-12-03 | Towards Rich Emotions in 3D Avatars: A Text-to-3D Avatar Generation Benchmark | 迈向丰富情感的3D虚拟形象:文本到3D虚拟形象生成基准 | Haidong Xu, Meishan Zhang, Hao Ju, Zhedong Zheng, Hongyuan Zhu, Erik Cambria, Min Zhang, Hao Fei | http://arxiv.org/pdf/2412.02508v1 | None |
2024-12-03 | HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving | 全息驾驶:面向自动驾驶的全面2D-3D多模态街景生成 | Zehuan Wu, Jingcheng Ni, Xiaodong Wang, Yuxin Guo, Rui Chen, Lewei Lu, Jifeng Dai, Yuwen Xiong | http://arxiv.org/pdf/2412.01407v2 | None |
2024-12-03 | VISTA: A Panoramic View of Neural Representations | VISTA:神经网络表示的全景视图 | Tom White | http://arxiv.org/pdf/2412.02412v1 | None |
2024-12-03 | Efficient Concertormer for Image Deblurring and Beyond | 高效Concertormer在图像去模糊及其他领域的应用 | Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang | http://arxiv.org/pdf/2404.06135v2 | None |
2024-12-03 | GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing | GenMix:基于生成扩散模型的图像编辑有效数据增强 | Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood, Karthik Nandakumar, Naveed Akhtar | http://arxiv.org/pdf/2412.02366v1 | None |
2024-12-03 | UniForm: A Reuse Attention Mechanism Optimized for Efficient Vision Transformers on Edge Devices | UniForm:针对边缘设备高效视觉Transformer优化的重用注意力机制 | Seul-Ki Yeom, Tae-Ho Kim | http://arxiv.org/pdf/2412.02344v1 | None |
2024-12-03 | SimuScope: Realistic Endoscopic Synthetic Dataset Generation through Surgical Simulation and Diffusion Models | SimuScope:通过手术模拟和扩散模型生成逼真的内窥镜合成数据集 | Sabina Martyniak, Joanna Kaleta, Diego Dall'Alba, Michał Naskręt, Szymon Płotka, Przemysław Korzeniowski | http://arxiv.org/pdf/2412.02332v1 | https://github.com/SanoScience/SimuScope |
2024-12-03 | Controlling the Latent Diffusion Model for Generative Image Shadow Removal via Residual Generation | 通过残差生成控制潜在扩散模型实现生成式图像阴影去除 | Xinjie Li, Yang Zhao, Dong Wang, Yuan Chen, Li Cao, Xiaoping Liu | http://arxiv.org/pdf/2412.02322v1 | None |
2024-12-03 | Active Learning via Classifier Impact and Greedy Selection for Interactive Image Retrieval | 基于分类器影响和贪婪选择的交互式图像检索主动学习 | Leah Bar, Boaz Lerner, Nir Darshan, Rami Ben-Ari | http://arxiv.org/pdf/2412.02310v1 | https://github.com/barleah/GreedyAL |
2024-12-03 | PCIM: Learning Pixel Attributions via Pixel-wise Channel Isolation Mixing in High Content Imaging | PCIM:通过高内容成像中的像素通道隔离混合学习像素归属 | Daniel Siegismund, Mario Wieser, Stephan Heyse, Stephan Steigele | http://arxiv.org/pdf/2412.02275v1 | None |
2024-12-03 | Diffusion Implicit Policy for Unpaired Scene-aware Motion Synthesis | 无配对场景感知运动合成的扩散隐式策略 | Jingyu Gong, Chong Zhang, Fengqi Liu, Ke Fan, Qianyu Zhou, Xin Tan, Zhizhong Zhang, Yuan Xie | http://arxiv.org/pdf/2412.02261v1 | None |
2024-12-03 | Fast LiDAR Data Generation with Rectified Flows | 基于整流流的快速激光雷达数据生成 | Kazuto Nakashima, Xiaowen Liu, Tomoya Miyawaki, Yumi Iwashita, Ryo Kurazume | http://arxiv.org/pdf/2412.02241v1 | None |
2024-12-03 | Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models | 交叉注意力头位置模式可以与文本到图像生成模型中的人类视觉概念相一致 | Jungwon Park, Jungmin Ko, Dongnam Byun, Jangwon Suh, Wonjong Rhee | http://arxiv.org/pdf/2412.02237v1 | None |
2024-12-03 | CubeFormer: A Simple yet Effective Baseline for Lightweight Image Super-Resolution | CubeFormer:一种简单而有效的轻量级图像超分辨率基线 | Jikai Wang, Huan Zheng, Jianbing Shen | http://arxiv.org/pdf/2412.02234v1 | None |
2024-12-03 | PriorPath: Coarse-To-Fine Approach for Controlled De-Novo Pathology Semantic Masks Generation | PriorPath:受控从头病理语义掩码生成的粗到细方法 | Nati Daniel, May Nathan, Eden Azeroual, Yael Fisher, Yonatan Savir | http://arxiv.org/pdf/2411.16515v2 | None |
2024-12-03 | GIST: Towards Photorealistic Style Transfer via Multiscale Geometric Representations | GIST:通过多尺度几何表示实现逼真风格迁移 | Renan A. Rojas-Gomez, Minh N. Do | http://arxiv.org/pdf/2412.02214v1 | None |
2024-12-03 | Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images | 嵌入式提示微调:迈向增强医学图像预训练模型校准 | Wenqiang Zu, Shenghao Xie, Qing Zhao, Guoqi Li, Lei Ma | http://arxiv.org/pdf/2407.01003v4 | https://github.com/zuwenqiang/EPT |
2024-12-03 | Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis | 生成摄影:实现逼真文本到图像合成的场景一致相机控制 | Yu Yuan, Xijun Wang, Yichen Sheng, Prateek Chennuri, Xingguang Zhang, Stanley Chan | http://arxiv.org/pdf/2412.02168v1 | None |
2024-12-03 | Concept Replacer: Replacing Sensitive Concepts in Diffusion Models via Precision Localization | 概念替换器:通过精确定位在扩散模型中替换敏感概念 | Lingyun Zhang, Yu Xie, Yanwei Fu, Ping Chen | http://arxiv.org/pdf/2412.01244v2 | None |
2024-12-03 | PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution | PassionSR:基于一步扩散的图像超分辨率中的自适应尺度后训练量化 | Libo Zhu, Jianze Li, Haotong Qin, Wenbo Li, Yulun Zhang, Yong Guo, Xiaokang Yang | http://arxiv.org/pdf/2411.17106v3 | https://github.com/libozhu03/PassionSR |
2024-12-03 | DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling | DyMO:基于动态多目标调度的免训练扩散模型对齐 | Xin Xie, Dong Gong | http://arxiv.org/pdf/2412.00759v2 | None |
2024-12-03 | Exploring Frequency-Inspired Optimization in Transformer for Efficient Single Image Super-Resolution | 探索Transformer中的频率灵感优化以提高单图像超分辨率效率 | Ao Li, Le Zhang, Yun Liu, Ce Zhu | http://arxiv.org/pdf/2308.05022v4 | https://github.com/AVC2-UESTC/Frequency-Inspired-Optimization-for-EfficientSR.git |
2024-12-03 | Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation | 释放自回归模型在上下文中学习的潜力以实现少样本图像处理 | Bolin Lai, Felix Juefei-Xu, Miao Liu, Xiaoliang Dai, Nikhil Mehta, Chenguang Zhu, Zeyi Huang, James M. Rehg | http://arxiv.org/pdf/2412.01027v2 | None |
2024-12-03 | InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences | 即时交换:跨越尖锐形状差异的快速定制概念交换 | Chenyang Zhu, Kai Li, Yue Ma, Longxiang Tang, Chengyu Fang, Chubin Chen, Qifeng Chen, Xiu Li | http://arxiv.org/pdf/2412.01197v2 | None |
2024-12-03 | Direct Coloring for Self-Supervised Enhanced Feature Decoupling | 直接着色用于自监督增强特征解耦 | Salman Mohamadi, Gianfranco Doretto, Donald A. Adjeroh | http://arxiv.org/pdf/2412.02109v1 | None |
2024-12-03 | PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models | 基于透视布局扩散模型的可控街景合成:PerLDiff | Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye | http://arxiv.org/pdf/2407.06109v3 | None |
2024-12-03 | OmniCreator: Self-Supervised Unified Generation with Universal Editing | 全创者:基于通用编辑的自监督统一生成 | Haodong Chen, Lan Wang, Harry Yang, Ser-Nam Lim | http://arxiv.org/pdf/2412.02114v1 | None |
2024-12-03 | TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation | 文本中心背景自适应的注意力引导文本到图像生成 | Tianyi Liang, Jiangqi Liu, Sicheng Song, Shiqi Jiang, Yifei Huang, Xinzhuo Zhang, Changbo Wang, Chenhui Li | http://arxiv.org/pdf/2404.11824v2 | None |
2024-12-03 | Hyperspectral Images Efficient Spatial and Spectral non-Linear Model with Bidirectional Feature Learning | 高光谱图像高效空间和光谱非线性模型与双向特征学习 | Judy X Yang, Jing Wang, Zekun Long, Chenhong Sui, Jun Zhou | http://arxiv.org/pdf/2412.00283v2 | None |
2024-12-03 | AccDiffusion v2: Towards More Accurate Higher-Resolution Diffusion Extrapolation | AccDiffusion v2:迈向更高精度的高分辨率扩散外推 | Zhihang Lin, Mingbao Lin, Wengyi Zhan, Rongrong Ji | http://arxiv.org/pdf/2412.02099v1 | https://github.com/lzhxmu/AccDiffusion_v2 |
2024-12-03 | VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition | VIGFace:隐私保护的人脸虚拟身份生成 | Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam, Junghyun Cho, Ig-Jae Kim | http://arxiv.org/pdf/2403.08277v2 | None |
2024-12-03 | Conti-Fuse: A Novel Continuous Decomposition-based Fusion Framework for Infrared and Visible Images | 红外与可见光图像的基于连续分解的新型融合框架:Conti-Fuse | Hui Li, Haolong Ma, Chunyang Cheng, Zhongwei Shen, Xiaoning Song, Xiao-Jun Wu | http://arxiv.org/pdf/2406.04689v3 | None |
2024-12-03 | Multi-student Diffusion Distillation for Better One-step Generators | 多学生扩散蒸馏以实现更好的单步生成器 | Yanke Song, Jonathan Lorraine, Weili Nie, Karsten Kreis, James Lucas | http://arxiv.org/pdf/2410.23274v2 | None |
Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-03 | AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | AV-Odyssey Bench:您的多模态LLM真的能理解视听信息吗? | Kaixiong Gong, Kaituo Feng, Bohao Li, Yibing Wang, Mofan Cheng, Shijia Yang, Jiaming Han, Benyou Wang | http://arxiv.org/pdf/2412.02611v1 | None |
2024-12-03 | Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey | 遥感时序视觉-语言模型:全面综述 | Chenyang Liu, Jiafan Zhang, Keyan Chen, Man Wang, Zhengxia Zou, Zhenwei Shi | http://arxiv.org/pdf/2412.02573v1 | https://github.com/Chen-Yang-Liu/Awesome-RS-Temporal-VLM |
2024-12-03 | SJTU:Spatial judgments in multimodal models towards unified segmentation through coordinate detection | SJTU:多模态模型中的空间判断,通过坐标检测实现统一分割 | Joongwon Chae, Zhenyu Wang, Peiwu Qin | http://arxiv.org/pdf/2412.02565v1 | None |
2024-12-03 | Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks | 多模态遥感场景分类:基于VLMs和双交叉注意力网络的实现 | Jinjin Cai, Kexin Meng, Baijian Yang, Gang Shao | http://arxiv.org/pdf/2412.02531v1 | https://github.com/CJR7/MultiAtt-RSSC |
2024-12-03 | Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents | 网格增强视觉:一种简单而有效的多模态智能体增强空间理解方法 | Joongwon Chae, Zhenyu Wang, Lian Zhang, Dongmei Yu, Peiwu Qin | http://arxiv.org/pdf/2411.18270v2 | None |
2024-12-03 | Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification | 动态-LLaVA:通过动态视觉-语言上下文稀疏化的高效多模态大型语言模型 | Wenxuan Huang, Zijie Zhai, Yunhang Shen, Shaoshen Cao, Fei Zhao, Xiangfeng Xu, Zheyu Ye, Shaohui Lin | http://arxiv.org/pdf/2412.00876v2 | https://github.com/Osilly/dynamic_llava |
2024-12-03 | ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation? | ScImage:多模态大型语言模型在科学文本到图像生成方面的表现如何? | Leixin Zhang, Steffen Eger, Yinjie Cheng, Weihe Zhai, Jonas Belouadi, Christoph Leiter, Simone Paolo Ponzetto, Fahimeh Moafian | http://arxiv.org/pdf/2412.02368v1 | None |
2024-12-03 | Agri-LLaVA: Knowledge-Infused Large Multimodal Assistant on Agricultural Pests and Diseases | Agri-LLaVA:面向农业病虫害的知识增强大型多模态助手 | Liqiong Wang, Teng Jin, Jinyu Yang, Ales Leonardis, Fangyi Wang, Feng Zheng | http://arxiv.org/pdf/2412.02158v1 | https://github.com/Kki2Eve/Agri-LLaVA |
2024-12-03 | Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA | 自适应秩,减少遗忘:基于动态秩选择LoRA的持续学习视觉-语言模型知识保留 | Haodong Lu, Chongyang Zhao, Jason Xue, Lina Yao, Kristen Moore, Dong Gong | http://arxiv.org/pdf/2412.01004v2 | None |
2024-12-03 | Personalized Multimodal Large Language Models: A Survey | 个性化多模态大型语言模型:综述 | Junda Wu, Hanjia Lyu, Yu Xia, Zhehao Zhang, Joe Barrow, Ishita Kumar, Mehrnoosh Mirtaheri, Hongjie Chen | http://arxiv.org/pdf/2412.02142v1 | None |
2024-12-03 | WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image | WSI-LLaVA:一种用于全切片图像的多模态大型语言模型 | Yuci Liang, Xinheng Lyu, Meidan Ding, Wenting Chen, Jipeng Zhang, Yuexiang Ren, Xiangjian He, Song Wu | http://arxiv.org/pdf/2412.02141v1 | None |
2024-12-03 | Rethinking Self-Supervised Learning Within the Framework of Partial Information Decomposition | 重新思考在部分信息分解框架下的自监督学习 | Salman Mohamadi, Gianfranco Doretto, Donald A. Adjeroh | http://arxiv.org/pdf/2412.02121v1 | None |
2024-12-03 | ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification | 非对称语义对齐网络:用于RGB和SAR图像土地覆盖分类 | Pan Zhang, Baochai Peng, Chaoran Lu, Quanjin Huang | http://arxiv.org/pdf/2412.02044v1 | https://github.com/whu-pzhang/ASANet |
Published | English Title | Chinese Title | Authors | PDF Link | Code Link |
---|---|---|---|---|---|
2024-12-03 | Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation | 规划引导的扩散策略学习以实现通用的富含接触的双臂操作 | Xuanlin Li, Tong Zhao, Xinghao Zhu, Jiuguang Wang, Tao Pang, Kuan Fang | http://arxiv.org/pdf/2412.02676v1 | None |
2024-12-03 | Go beyond End-to-End Training: Boosting Greedy Local Learning with Context Supply | 超越端到端训练:通过上下文供应增强贪婪局部学习 | Chengting Yu, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Erping Li | http://arxiv.org/pdf/2312.07636v2 | https://github.com/Tab-ct/ContSup |
2024-12-03 | Robust soybean seed yield estimation using high-throughput ground robot videos | 基于高通量地面机器人视频的鲁棒大豆产量估计 | Jiale Feng, Samuel W. Blair, Timilehin Ayanlade, Aditya Balu, Baskar Ganapathysubramanian, Arti Singh, Soumik Sarkar, Asheesh K Singh | http://arxiv.org/pdf/2412.02642v1 | None |
2024-12-03 | A Bidirectional Long Short Term Memory Approach for Infrastructure Health Monitoring Using On-board Vibration Response | 双向长短期记忆方法在利用车载振动响应进行基础设施健康监测中的应用 | R. R. Samani, A. Nunez, B. De Schutter | http://arxiv.org/pdf/2412.02643v1 | None |
2024-12-03 | Class-wise Autoencoders Measure Classification Difficulty And Detect Label Mistakes | 按类别自编码器衡量分类难度并检测标签错误 | Jacob Marks, Brent A. Griffin, Jason J. Corso | http://arxiv.org/pdf/2412.02596v1 | https://github.com/voxel51/reconstruction-error-ratios |
2024-12-03 | OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | OCR阻碍RAG:评估OCR对检索增强生成的级联影响 | Junyuan Zhang, Qintong Zhang, Bin Wang, Linke Ouyang, Zichen Wen, Ying Li, Ka-Ho Chow, Conghui He | http://arxiv.org/pdf/2412.02592v1 | https://github.com/opendatalab/OHR-Bench |
2024-12-03 | Segmentation of Coronary Artery Stenosis in X-ray Angiography using Mamba Models | 基于Mamba模型的X射线血管造影冠状动脉狭窄分割 | Ali Rostami, Fatemeh Fouladi, Hedieh Sajedi | http://arxiv.org/pdf/2412.02568v1 | None |
2024-12-03 | Copy-Move Forgery Detection and Question Answering for Remote Sensing Image | 遥感图像的复制-移动伪造检测与问答 | Ze Zhang, Enyuan Zhao, Ziyi Wan, Jie Nie, Xinyue Liang, Lei Huang | http://arxiv.org/pdf/2412.02575v1 | https://github.com/shenyedepisa/RSCMQA |
2024-12-03 | Comparative Analysis of Resource-Efficient CNN Architectures for Brain Tumor Classification | 脑肿瘤分类中资源高效CNN架构的比较分析 | Md Ashik Khan, Rafath Bin Zafar Auvee | http://arxiv.org/pdf/2411.15596v2 | None |
2024-12-03 | Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection | 开放集半监督目标检测的协同特征-对数对比学习 | Xinhao Zhong, Siyu Jiao, Yao Zhao, Yunchao Wei | http://arxiv.org/pdf/2411.13001v2 | None |
2024-12-03 | Multi-Class Abnormality Classification Task in Video Capsule Endoscopy | 多类别视频胶囊内窥镜异常分类任务 | Dev Rishi Verma, Vibhor Saxena, Dhruv Sharma, Arpan Gupta | http://arxiv.org/pdf/2410.19973v3 | None |
2024-12-03 | OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations | OODFace:常见图像损坏和外观变化下的人脸识别鲁棒性基准测试 | Caixin Kang, Yubo Chen, Shouwei Ruan, Shiji Zhao, Ruochen Zhang, Jiayi Wang, Shan Fu, Xingxing Wei | http://arxiv.org/pdf/2412.02479v1 | None |
2024-12-03 | Resonance: Learning to Predict Social-Aware Pedestrian Trajectories as Co-Vibrations | 共鸣:学习将具有社会意识的行人轨迹作为协同振动进行预测 | Conghao Wong, Ziqian Zou, Beihao Xia, Xinge You | http://arxiv.org/pdf/2412.02447v1 | None |
2024-12-03 | DPE-Net: Dual-Parallel Encoder Based Network for Semantic Segmentation of Polyps | DPE-Net:基于双并行编码器的息肉语义分割网络 | Malik Abdul Manan, Feng Jinchao, Shahzad Ahmed, Abdul Raheem | http://arxiv.org/pdf/2412.00888v2 | None |
2024-12-03 | Multi-scale and Multi-path Cascaded Convolutional Network for Semantic Segmentation of Colorectal Polyps | 多尺度多路径级联卷积网络用于结直肠息肉的语义分割 | Malik Abdul Manan, Feng Jinchao, Muhammad Yaqub, Shahzad Ahmed, Syed Muhammad Ali Imran, Imran Shabir Chuhan, Haroon Ahmed Khan | http://arxiv.org/pdf/2412.02443v1 | None |
2024-12-03 | PolarBEVDet: Exploring Polar Representation for Multi-View 3D Object Detection in Bird's-Eye-View | PolarBEVDet:探索极坐标表示在鸟瞰图多视图3D目标检测中的应用 | Zichen Yu, Quanli Liu, Wei Wang, Liyong Zhang, Xiaoguang Zhao | http://arxiv.org/pdf/2408.16200v2 | https://github.com/Yzichen/PolarBEVDet.git |
2024-12-03 | Facial Expression Recognition with Controlled Privacy Preservation and Feature Compensation | 面部表情识别:可控隐私保护与特征补偿 | Feng Xu, David Ahmedt-Aristizabal, Lars Petersson, Dadong Wang, Xun Li | http://arxiv.org/pdf/2412.00277v2 | None |
2024-12-03 | Who Walks With You Matters: Perceiving Social Interactions with Groups for Pedestrian Trajectory Prediction | 与你同行者至关重要:用于行人轨迹预测的群体社交交互感知 | Ziqian Zou, Conghao Wong, Beihao Xia, Qinmu Peng, Xinge You | http://arxiv.org/pdf/2412.02395v1 | None |
2024-12-03 | Bio-inspired visual relative localization for large swarms of UAVs | 生物启发的大规模无人机群视觉相对定位 | Martin Křížek, Matouš Vrba, Antonella Barišić Kulaš, Stjepan Bogdan, Martin Saska | http://arxiv.org/pdf/2412.02393v1 | None |
2024-12-03 | Trajectory-based Road Autolabeling with Lidar-Camera Fusion in Winter Conditions | 基于轨迹的冬季条件下激光雷达-摄像头融合道路自动标注 | Eerik Alamikkotervo, Henrik Toikka, Kari Tammi, Risto Ojala | http://arxiv.org/pdf/2412.02370v1 | https://github.com/eerik98/lidar-camera-road-autolabeling.git |
2024-12-03 | Active Negative Loss: A Robust Framework for Learning with Noisy Labels | 主动负损失:一种用于带噪声标签学习的鲁棒框架 | Xichen Ye, Yifan Wu, Yiwen Xu, Xiaoqiang Li, Weizhong Zhang, Yifan Chen | http://arxiv.org/pdf/2412.02373v1 | https://github.com/Virusdoll/Active-Negative-Loss |
2024-12-03 | Enhancing joint automatic chest X-ray diagnosis and clinical visual attention prediction with multi-stage cooperative learning | 利用多阶段协同学习增强联合自动胸部X光诊断与临床视觉注意力预测 | Zirui Qiu, Hassan Rivaz, Yiming Xiao | http://arxiv.org/pdf/2403.16970v3 | None |
2024-12-03 | A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation | 良好的基础胜过众多标签:标签高效的全景分割 | Niclas Vödisch, Kürsat Petek, Markus Käppeler, Abhinav Valada, Wolfram Burgard | http://arxiv.org/pdf/2405.19035v2 | None |
2024-12-03 | LoCo: Low-Contrast-Enhanced Contrastive Learning for Semi-Supervised Endoscopic Image Segmentation | LoCo:低对比度增强对比学习用于半监督内窥镜图像分割 | Lingcong Cai, Yun Li, Xiaomao Fan, Kaixuan Song, Yongcheng Li, Yixuan Yuan, Ruxin Wang, Wenbin Lei | http://arxiv.org/pdf/2412.02314v1 | https://github.com/AnoK3111/LoCo |
2024-12-03 | Noisy Ostracods: A Fine-Grained, Imbalanced Real-World Dataset for Benchmarking Robust Machine Learning and Label Correction Methods | 噪声介形虫:用于基准测试鲁棒机器学习和标签校正方法的细粒度、不平衡真实世界数据集 | Jiamian Hu, Yuanyuan Hong, Yihua Chen, He Wang, Moriaki Yasuhara | http://arxiv.org/pdf/2412.02313v1 | https://github.com/H-Jamieu/Noisy_ostracods |
2024-12-03 | Initial Study On Improving Segmentation By Combining Preoperative CT And Intraoperative CBCT Using Synthetic Data | 初步研究:通过结合术前CT和术中CBCT以及合成数据改进分割 | Maximilian E. Tschuchnig, Philipp Steininger, Michael Gadermayr | http://arxiv.org/pdf/2412.02294v1 | None |
2024-12-03 | Monocular Lane Detection Based on Deep Learning: A Survey | 基于深度学习的单目车道检测:综述 | Xin He, Haiyun Guo, Kuan Zhu, Bingke Zhu, Xu Zhao, Jianwu Fang, Jinqiao Wang | http://arxiv.org/pdf/2411.16316v4 | https://github.com/Core9724/Awesome-Lane-Detection |
2024-12-03 | ASTM: Autonomous Smart Traffic Management System Using Artificial Intelligence CNN and LSTM | ASTM:基于人工智能CNN和LSTM的自主智能交通管理系统 | Christofel Rio Goenawan | http://arxiv.org/pdf/2410.10929v5 | None |
2024-12-03 | AH-OCDA: Amplitude-based Curriculum Learning and Hopfield Segmentation Model for Open Compound Domain Adaptation | 基于振幅的课程学习和霍普菲尔德分割模型用于开放复合域自适应 | Jaehyun Choi, Junwon Ko, Dong-Jae Lee, Junmo Kim | http://arxiv.org/pdf/2412.02280v1 | None |
2024-12-03 | Sustainable Self-evolution Adversarial Training | 可持续的自进化对抗训练 | Wenxuan Wang, Chenglei Wang, Huihui Qi, Menghao Ye, Xuelin Qian, Peng Wang, Yanning Zhang | http://arxiv.org/pdf/2412.02270v1 | None |
2024-12-03 | GSGTrack: Gaussian Splatting-Guided Object Pose Tracking from RGB Videos | GSGTrack:基于高斯喷溅引导的RGB视频目标姿态跟踪 | Zhiyuan Chen, Fan Lu, Guo Yu, Bin Li, Sanqing Qu, Yuan Huang, Changhong Fu, Guang Chen | http://arxiv.org/pdf/2412.02267v1 | None |
2024-12-03 | Diabetic Retinopathy Classification from Retinal Images using Machine Learning Approaches | 利用机器学习方法从视网膜图像中进行糖尿病视网膜病变分类 | Indronil Bhattacharjee, Al-Mahmud, Tareq Mahmud | http://arxiv.org/pdf/2412.02265v1 | None |
2024-12-03 | Composing Open-domain Vision with RAG for Ocean Monitoring and Conservation | 基于RAG的开放域视觉合成用于海洋监测与保护 | Sepand Dyanatkar, Angran Li, Alexander Dungate | http://arxiv.org/pdf/2412.02262v1 | None |
2024-12-03 | ProbPose: A Probabilistic Approach to 2D Human Pose Estimation | 概率姿态估计:二维人体姿态估计的概率方法 | Miroslav Purkrabek, Jiri Matas | http://arxiv.org/pdf/2412.02254v1 | None |
2024-12-03 | Vision Transformers for Weakly-Supervised Microorganism Enumeration | 视觉Transformer在弱监督微生物计数中的应用 | Javier Ureña Santiago, Thomas Ströhle, Antonio Rodríguez-Sánchez, Ruth Breu | http://arxiv.org/pdf/2412.02250v1 | None |
2024-12-03 | Learning from Reduced Labels for Long-Tailed Data | 从减少标签中学习长尾数据 | Meng Wei, Zhongnian Li, Yong Zhou, Xinzheng Xu | http://arxiv.org/pdf/2403.16469v2 | None |
2024-12-03 | U-Net in Medical Image Segmentation: A Review of Its Applications Across Modalities | U-Net在医学图像分割中的应用综述:跨模态应用回顾 | Fnu Neha, Deepshikha Bhati, Deepak Kumar Shukla, Sonavi Makarand Dalvi, Nikolaos Mantzou, Safa Shubbar | http://arxiv.org/pdf/2412.02242v1 | None |
2024-12-03 | SpaGBOL: Spatial-Graph-Based Orientated Localisation | 空间图基导向定位:SpaGBOL | Tavis Shore, Oscar Mendez, Simon Hadfield | http://arxiv.org/pdf/2409.15514v2 | None |
2024-12-03 | Phase-Informed Tool Segmentation for Manual Small-Incision Cataract Surgery | 面向手工小切口白内障手术的阶段感知器械分割 | Bhuvan Sachdeva, Naren Akash, Tajamul Ashraf, Simon Mueller, Thomas Schultz, Maximilian W. M. Wintergerst, Niharika Singri Prasad, Kaushik Murali | http://arxiv.org/pdf/2411.16794v2 | None |
2024-12-03 | Jailbreak Large Vision-Language Models Through Multi-Modal Linkage | 通过多模态链接破解大型视觉-语言模型 | Yu Wang, Xiaofei Zhou, Yichen Wang, Geyuan Zhang, Tianxing He | http://arxiv.org/pdf/2412.00473v2 | https://github.com/wangyu-ovo/MML |
2024-12-03 | CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy | CC-OCR:用于评估大型多模态模型在识字能力方面的全面且具有挑战性的OCR基准 | Zhibo Yang, Jun Tang, Zhaohai Li, Pengfei Wang, Jianqiang Wan, Humen Zhong, Xuejing Liu, Mingkun Yang | http://arxiv.org/pdf/2412.02210v1 | None |
2024-12-03 | Transformer-Metric Loss for CNN-Based Face Recognition | 基于CNN的人脸识别的Transformer度量损失 | Pritesh Prakash, Ashish Jacob Sam | http://arxiv.org/pdf/2412.02198v1 | None |
2024-12-03 | Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images | 级联多尺度注意力:增强多尺度特征提取与低分辨率图像的交互 | Xiangyong Lu, Masanori Suganuma, Takayuki Okatani | http://arxiv.org/pdf/2412.02197v1 | https://github.com/xyongLu/CMSA |
2024-12-03 | Multi-Granularity Video Object Segmentation | 多粒度视频目标分割 | Sangbeom Lim, Seongchan Kim, Seungjun An, Seokju Cho, Paul Hongsuck Seo, Seungryong Kim | http://arxiv.org/pdf/2412.01471v2 | None |
2024-12-03 | CamoFA: A Learnable Fourier-based Augmentation for Camouflage Segmentation | 伪装分割的可学习傅里叶增强:CamoFA | Minh-Quan Le, Minh-Triet Tran, Trung-Nghia Le, Tam V. Nguyen, Thanh-Toan Do | http://arxiv.org/pdf/2308.15660v2 | None |
2024-12-03 | Anatomically-Grounded Fact Checking of Automated Chest X-ray Reports | 基于解剖学的自动化胸部X光报告事实核查 | R. Mahmood, K. C. L. Wong, D. M. Reyes, N. D'Souza, L. Shi, J. Wu, P. Kaviani, M. Kalra | http://arxiv.org/pdf/2412.02177v1 | None |
2024-12-03 | Underload: Defending against Latency Attacks for Object Detectors on Edge Devices | 边缘设备上目标检测器的低负载:防御延迟攻击 | Tianyi Wang, Zichen Wang, Cong Wang, Yuanchao Shu, Ruilong Deng, Peng Cheng, Jiming Chen | http://arxiv.org/pdf/2412.02171v1 | None |
2024-12-03 | VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning | 视觉推理中的细粒度批评与校正基准:自我改进的评估 | Xueqing Wu, Yuheng Ding, Bingxuan Li, Pan Lu, Da Yin, Kai-Wei Chang, Nanyun Peng | http://arxiv.org/pdf/2412.02172v1 | None |
2024-12-03 | GFreeDet: Exploiting Gaussian Splatting and Foundation Models for Model-free Unseen Object Detection in the BOP Challenge 2024 | GFreeDet:利用高斯喷溅和基础模型在BOP挑战2024中进行无模型未见物体检测 | Xingyu Liu, Yingyue Li, Chengxi Li, Gu Wang, Chenyangguang Zhang, Ziqin Huang, Xiangyang Ji | http://arxiv.org/pdf/2412.01552v2 | None |
2024-12-03 | An Empirical Study of Mamba-based Pedestrian Attribute Recognition | 基于Mamba的行人属性识别实证研究 | Xiao Wang, Weizhe Kong, Jiandong Jin, Shiao Wang, Ruichong Gao, Qingchuan Ma, Chenglong Li, Jin Tang | http://arxiv.org/pdf/2407.10374v2 | https://github.com/Event-AHU/OpenPAR |
2024-12-03 | TransFair: Transferring Fairness from Ocular Disease Classification to Progression Prediction | TransFair:从眼部疾病分类到进展预测的公平性迁移 | Leila Gheisi, Henry Chu, Raju Gottumukkala, Yan Luo, Xingquan Zhu, Mengyu Wang, Min Shi | http://arxiv.org/pdf/2412.00051v2 | None |
2024-12-03 | GSOT3D: Towards Generic 3D Single Object Tracking in the Wild | GSOT3D:迈向通用野外3D单目标跟踪 | Yifan Jiao, Yunhao Li, Junhua Ding, Qing Yang, Song Fu, Heng Fan, Libo Zhang | http://arxiv.org/pdf/2412.02129v1 | https://github.com/ailovejinx/GSOT3D |
2024-12-03 | Topology-Preserving Image Segmentation with Spatial-Aware Persistent Feature Matching | 拓扑保持的空间感知持久特征匹配图像分割 | Bo Wen, Haochen Zhang, Dirk-Uwe G. Bartsch, William R. Freeman, Truong Q. Nguyen, Cheolhong An | http://arxiv.org/pdf/2412.02076v1 | None |
2024-12-03 | Performance Comparison of Deep Learning Techniques in Naira Classification | 深度学习技术在奈拉分类中的性能比较 | Ismail Ismail Tijjani, Ahmad Abubakar Mustapha, Isma'il Tijjani Idris | http://arxiv.org/pdf/2412.02072v1 | None |
2024-12-03 | Another Vertical View: A Hierarchical Network for Heterogeneous Trajectory Prediction via Spectrums | 另一种垂直视角:通过频谱进行异构轨迹预测的分层网络 | Beihao Xia, Conghao Wong, Duanquan Xu, Qinmu Peng, Xinge You | http://arxiv.org/pdf/2304.05106v2 | None |
2024-12-03 | Dynamic Adversarial Attacks on Autonomous Driving Systems | 动态对抗攻击自动驾驶系统 | Amirhosein Chahe, Chenan Wang, Abhishek Jeyapratap, Kaidi Xu, Lifeng Zhou | http://arxiv.org/pdf/2312.06701v3 | None |
2024-12-03 | CLERF: Contrastive LEaRning for Full Range Head Pose Estimation | CLERF:全范围头部姿态估计的对比学习方法 | Ting-Ruen Wei, Haowei Liu, Huei-Chung Hu, Xuyang Wu, Yi Fang, Hsin-Tai Wu | http://arxiv.org/pdf/2412.02066v1 | None |
2024-12-03 | Redundant Queries in DETR-Based 3D Detection Methods: Unnecessary and Prunable | 基于DETR的3D检测方法中的冗余查询:不必要的且可剪枝 | Lizhen Xu, Shanmin Pang, Wenzhao Qiu, Zehao Wu, Xiuxiu Bai, Kuizhi Mei, Jianru Xue | http://arxiv.org/pdf/2412.02054v1 | https://github.com/iseri27/Gpq |
发布日期 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
2024-12-03 | Towards Neuro-Symbolic Video Understanding | 迈向神经符号视频理解 | Minkyu Choi, Harsh Goel, Mohammad Omama, Yunhao Yang, Sahil Shah, Sandeep Chinchali | http://arxiv.org/pdf/2403.11021v3 | None |
2024-12-03 | Motion Prompting: Controlling Video Generation with Motion Trajectories | 运动提示:通过运动轨迹控制视频生成 | Daniel Geng, Charles Herrmann, Junhwa Hur, Forrester Cole, Serena Zhang, Tobias Pfaff, Tatiana Lopez-Guevara, Carl Doersch | http://arxiv.org/pdf/2412.02700v1 | None |
2024-12-03 | Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification | 神经符号形式验证在文本到视频模型评估中的应用 | S. P. Sharan, Minkyu Choi, Sahil Shah, Harsh Goel, Mohammad Omama, Sandeep Chinchali | http://arxiv.org/pdf/2411.16718v3 | None |
2024-12-03 | Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | 利用人工智能反馈提升文本到视频生成中的动态物体交互 | Hiroki Furuta, Heiga Zen, Dale Schuurmans, Aleksandra Faust, Yutaka Matsuo, Percy Liang, Sherry Yang | http://arxiv.org/pdf/2412.02617v1 | None |
2024-12-03 | It Takes Two: Real-time Co-Speech Two-person's Interaction Generation via Reactive Auto-regressive Diffusion Model | 二人成行:通过反应式自回归扩散模型实现实时伴随语音的双人交互生成 | Mingyi Shi, Dafei Qin, Leo Ho, Zhouyingcheng Liao, Yinghao Huang, Junichi Yamagishi, Taku Komura | http://arxiv.org/pdf/2412.02419v1 | None |
2024-12-03 | VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation | 视频思维生成:多镜头视频生成的协作框架 | Mingzhe Zheng, Yongqi Xu, Haojian Huang, Xuran Ma, Yexin Liu, Wenjie Shu, Yatian Pang, Feilong Tang | http://arxiv.org/pdf/2412.02259v1 | None |
2024-12-03 | OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation | OpenHumanVid:用于提升以人为中心的视频生成的超大规模高质量数据集 | Hui Li, Mingwang Xu, Yun Zhan, Shan Mu, Jiaye Li, Kaihui Cheng, Yuxuan Chen, Tan Chen | http://arxiv.org/pdf/2412.00115v2 | None |
2024-12-03 | VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models | 视觉XL:基于潜在图像扩散模型的高清视频逆问题求解器 | Taesung Kwon, Jong Chul Ye | http://arxiv.org/pdf/2412.00156v2 | None |
2024-12-03 | VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding | 视频ICL:基于置信度的迭代情境学习以实现分布外视频理解 | Kangsan Kim, Geon Park, Youngwan Lee, Woongyeong Yeo, Sung Ju Hwang | http://arxiv.org/pdf/2412.02186v1 | https://github.com/KangsanKim07/VideoICL |
2024-12-03 | From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | 从秒到小时:全面长视频理解的多模态大型语言模型综述 | Heqing Zou, Tianze Luo, Guiyang Xie, Victor Zhang, Fengmao Lv, Guangcong Wang, Junyang Chen | http://arxiv.org/pdf/2409.18938v2 | None |
2024-12-03 | Understanding Particles From Video: Property Estimation of Granular Materials via Visuo-Haptic Learning | 从视频中理解粒子:通过视觉-触觉学习估计颗粒材料的属性 | Zeqing Zhang, Guangze Zheng, Xuebo Ji, Guanqi Chen, Ruixing Jia, Wentao Chen, Guanhua Chen, Liangjun Zhang | http://arxiv.org/pdf/2412.02119v1 | None |
2024-12-03 | DuoCast: Duo-Probabilistic Meteorology-Aware Model for Extended Precipitation Nowcasting | DuoCast:面向延伸期降水临近预报的双概率气象感知模型 | Penghui Wen, Lei Bai, Mengwei He, Patrick Filippi, Feng Zhang, Thomas Francis Bishop, Zhiyong Wang, Kun Hu | http://arxiv.org/pdf/2412.01091v2 | https://github.com/ph-w2000/DuoCast |
2024-12-03 | Progress-Aware Video Frame Captioning | 感知进度的视频帧字幕生成 | Zihui Xue, Joungbin An, Xitong Yang, Kristen Grauman | http://arxiv.org/pdf/2412.02071v1 | None |
发布日期 | 英文标题 | 中文标题 | 作者 | PDF链接 | 代码链接 |
---|---|---|---|---|---|
2024-12-03 | MERGE: Multi-faceted Hierarchical Graph-based GNN for Gene Expression Prediction from Whole Slide Histopathology Images | MERGE:基于多角度分层图的GNN在整张切片病理图像中进行基因表达预测 | Aniruddha Ganguly, Debolina Chatterjee, Wentao Huang, Jie Zhang, Alisa Yurovsky, Travis Steele Johnson, Chao Chen | http://arxiv.org/pdf/2412.02601v1 | None |
2024-12-03 | OMENN: One Matrix to Explain Neural Networks | OMENN:一个矩阵解释神经网络 | Adam Wróbel, Mikołaj Janusz, Bartosz Zieliński, Dawid Rymarczyk | http://arxiv.org/pdf/2412.02399v1 | None |
2024-12-03 | Enabling DBSCAN for Very Large-Scale High-Dimensional Spaces | 使DBSCAN适用于超大规模高维空间 | Yongyu Wang | http://arxiv.org/pdf/2411.11421v3 | None |
2024-12-03 | ILASH: A Predictive Neural Architecture Search Framework for Multi-Task Applications | ILASH:多任务应用预测性神经架构搜索框架 | Md Hafizur Rahman, Md Mashfiq Rizvee, Sumaiya Shomaji, Prabuddha Chakraborty | http://arxiv.org/pdf/2412.02116v1 | None |
2024-12-03 | A Classic-Quantum Hybrid Network Framework: CQH-Net | 经典-量子混合网络框架:CQH-Net | Ao Liu, Cuihong Wen, Jieci Wang | http://arxiv.org/pdf/2412.02059v1 | None |