[UPDATED!] 2024-11-20 (Publish Time)

Generative Models

Publish Date Title Title_CN Authors PDF Code
2024-11-20 REDUCIO! Generating 1024$\times$1024 Video within 16 Seconds using Extremely Compressed Motion Latents REDUCIO!使用极压缩运动潜变量在16秒内生成1024×1024视频 Rui Tian, Qi Dai, Jianmin Bao, Kai Qiu, Yifan Yang, Chong Luo, Zuxuan Wu, Yu-Gang Jiang http://arxiv.org/pdf/2411.13552v1 null
2024-11-20 Identity Preserving 3D Head Stylization with Multiview Score Distillation 基于多视角评分蒸馏的保留身份的3D头部风格化 Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Furkan Guzelant, Aysegul Dundar http://arxiv.org/pdf/2411.13536v1 null
2024-11-20 VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models VBench++:用于视频生成模型的全面和多功能基准套件 Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, et.al. http://arxiv.org/pdf/2411.13503v1 null
2024-11-20 Adversarial Diffusion Compression for Real-World Image Super-Resolution 对抗扩散压缩实现真实图像超分辨率 Bin Chen, Gehui Li, Rongyuan Wu, Xindong Zhang, Jie Chen, Jian Zhang, Lei Zhang http://arxiv.org/pdf/2411.13383v1 null
2024-11-20 DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild DATAP-SfM:动态感知任意点鲁棒从运动中获取野外结构 Weicai Ye, Xinyu Chen, Ruohao Zhan, Di Huang, Xiaoshui Huang, Haoyi Zhu, Hujun Bao, Wanli Ouyang, Tong He, Guofeng Zhang http://arxiv.org/pdf/2411.13291v1 null
2024-11-20 XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation XMask3D:开放词汇3D语义分割的跨模态掩码推理 Ziyi Wang, Yanbo Wang, Xumin Yu, Jie Zhou, Jiwen Lu http://arxiv.org/pdf/2411.13243v1 null
2024-11-20 RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation RAW-Diffusion:基于RGB引导的扩散模型实现高保真度RAW图像生成 Christoph Reinders, Radu Berdan, Beril Besbinar, Junji Otsuka, Daisuke Iso http://arxiv.org/pdf/2411.13150v1 null
2024-11-20 CopyrightMeter: Revisiting Copyright Protection in Text-to-image Models 版权度量器:重新审视文本到图像模型的版权保护 Naen Xu, Changjiang Li, Tianyu Du, Minxi Li, Wenjie Luo, Jiacheng Liang, Yuyuan Li, Xuhong Zhang, Meng Han, Jianwei Yin, et.al. http://arxiv.org/pdf/2411.13144v1 null
2024-11-20 Virtual Staining of Label-Free Tissue in Imaging Mass Spectrometry 无标记组织在成像质谱中的虚拟染色 Yijie Zhang, Luzhe Huang, Nir Pillar, Yuzhu Li, Lukasz G. Migas, Raf Van de Plas, Jeffrey M. Spraggins, Aydogan Ozcan http://arxiv.org/pdf/2411.13120v1 null
2024-11-20 LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Learned Image Compression 基于LMM驱动的语义图像-文本编码的超低比特率学习图像压缩 Shimon Murai, Heming Sun, Jiro Katto http://arxiv.org/pdf/2411.13033v1 null
2024-11-20 ORID: Organ-Regional Information Driven Framework for Radiology Report Generation ORID:基于器官-区域信息的放射报告生成框架 Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai http://arxiv.org/pdf/2411.13025v1 null
2024-11-20 Automating Sonologists USG Commands with AI and Voice Interface 利用AI和语音界面自动化超声学家超声指令 Emad Mohamed, Shruti Tiwari, Sheena Christabel Pravin http://arxiv.org/pdf/2411.13006v1 null
2024-11-20 HouseLLM: LLM-Assisted Two-Phase Text-to-Floorplan Generation HouseLLM:基于LLM的两阶段文本生成平面图 Ziyang Zong, Zhaohuan Zhan, Guang Tan http://arxiv.org/pdf/2411.12279v2 null
2024-11-20 TP-UNet: Temporal Prompt Guided UNet for Medical Image Segmentation 时间提示引导的UNet医疗图像分割 Ranmin Wang, Limin Zhuang, Hongkun Chen, Boyan Xu, Ruichu Cai http://arxiv.org/pdf/2411.11305v2 null
2024-11-20 Time Step Generating: A Universal Synthesized Deepfake Image Detector 时间步生成:一种通用的深度伪造图像检测器 Ziyue Zeng, Haoyuan Liu, Dingjie Peng, Luoxu Jing, Hiroshi Watanabe http://arxiv.org/pdf/2411.11016v2 link
2024-11-20 Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step 对抗评分身份蒸馏:一步超越教师 Mingyuan Zhou, Huangjie Zheng, Yi Gu, Zhendong Wang, Hai Huang http://arxiv.org/pdf/2410.14919v3 link
2024-11-20 MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes MagicDrive3D:可控3D生成以实现街景任意视角渲染 Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong, Zhenguo Li, Qiang Xu http://arxiv.org/pdf/2405.14475v3 null
2024-11-20 Erasing Undesirable Influence in Diffusion Models 消除扩散模型中的不良影响 Jing Wu, Trung Le, Munawar Hayat, Mehrtash Harandi http://arxiv.org/pdf/2401.05779v4 null

Multimodal

Publish Date Title Title_CN Authors PDF Code
2024-11-20 Teaching VLMs to Localize Specific Objects from In-context Examples 从上下文示例中教授VLMs定位特定物体 Sivan Doveh, Nimrod Shabtay, Wei Lin, Eli Schwartz, Hilde Kuehne, Raja Giryes, Rogerio Feris, Leonid Karlinsky, James Glass, Assaf Arbelle, et.al. http://arxiv.org/pdf/2411.13317v1 null
2024-11-20 VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation 视频自动竞技场:通过用户模拟评估大型多模态模型在视频分析中的自动竞技场 Ziyang Luo, Haoning Wu, Dongxu Li, Jing Ma, Mohan Kankanhalli, Junnan Li http://arxiv.org/pdf/2411.13281v1 null
2024-11-20 DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving 多模态大型语言模型在自动驾驶中的空间理解基准:DriveMLLM Xianda Guo, Ruijun Zhang, Yiqun Duan, Yuhang He, Chenming Zhang, Shuai Liu, Long Chen http://arxiv.org/pdf/2411.13112v1 null
2024-11-20 Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving 自动驾驶中多模态LLM视觉表示增强的提示线索 Hao Zhou, Zhanning Gao, Maosheng Ye, Zhili Chen, Qifeng Chen, Tongyi Cao, Honggang Qi http://arxiv.org/pdf/2411.13076v1 null
2024-11-20 Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark 高效掩码自编码器在视频目标计数及大规模基准上的应用 Bing Cao, Quanhao Lu, Jiekang Feng, Pengfei Zhu, Qinghua Hu, Qilong Wang http://arxiv.org/pdf/2411.13056v1 null
2024-11-20 MEGL: Multimodal Explanation-Guided Learning MEGL:多模态解释引导学习 Yifei Zhang, Tianxu Jiang, Bo Pan, Jingyu Wang, Guangji Bai, Liang Zhao http://arxiv.org/pdf/2411.13053v1 null
2024-11-20 Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization 基于交替优化的多模态图像对无监督单应性估计 Sanghyeob Song, Jaihyun Lew, Hyemi Jang, Sungroh Yoon http://arxiv.org/pdf/2411.13036v1 null
2024-11-20 LaVida Drive: Vision-Text Interaction VLM for Autonomous Driving with Token Selection, Recovery and Enhancement LaVida Drive:基于token选择、恢复和增强的视觉-文本交互VLM自动驾驶系统 Siwen Jiao, Yangyi Fang http://arxiv.org/pdf/2411.12980v1 null
2024-11-20 Constraint Learning for Parametric Point Cloud 参数点云的约束学习 Xi Cheng, Ruiqi Lei, Di Huang, Zhichao Liao, Fengyuan Piao, Yan Chen, Pingfa Feng, Long Zeng http://arxiv.org/pdf/2411.07747v3 null
2024-11-20 Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training Mono-InternVL:通过内源视觉预训练推动单体多模态大型语言模型的边界 Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jiawen Liu, Jifeng Dai, Yu Qiao, Xizhou Zhu http://arxiv.org/pdf/2410.08202v2 null
2024-11-20 DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction DAOcc:基于3D目标检测的多传感器融合三维占用预测 Zhen Yang, Yanpeng Dong, Heng Wang, Lichao Ma, Zijian Cui, Qi Liu, Haoran Pei http://arxiv.org/pdf/2409.19972v2 link
2024-11-20 MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation MMTryon:高质量时尚生成中的多模态多参考控制 Xujie Zhang, Ente Lin, Xiu Li, Yuxuan Luo, Michael Kampffmeyer, Xin Dong, Xiaodan Liang http://arxiv.org/pdf/2405.00448v4 null
2024-11-20 A Multi-scale Information Integration Framework for Infrared and Visible Image Fusion 多尺度红外与可见光图像融合信息集成框架 Guang Yang, Jie Li, Hanxiao Lei, Xinbo Gao http://arxiv.org/pdf/2312.04328v2 link

NeRF

Publish Date Title Title_CN Authors PDF Code
2024-11-20 GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting GazeGaussian:基于3D高斯散布的高保真注视重定向 Xiaobao Wei, Peng Chen, Guangyu Li, Ming Lu, Hui Chen, Feng Tian http://arxiv.org/pdf/2411.12981v1 null
2024-11-20 Voxel-Mesh Hybrid Representation for Real-Time View Synthesis 基于体素-网格混合表示的实时视图合成 Chenhao Zhang, Yongyang Zhou, Lei Zhang http://arxiv.org/pdf/2403.06505v2 null

3DGS

Publish Date Title Title_CN Authors PDF Code
2024-11-20 Generating 3D-Consistent Videos from Unposed Internet Photos 从未摆姿势的互联网照片生成三维一致的视频 Gene Chou, Kai Zhang, Sai Bi, Hao Tan, Zexiang Xu, Fujun Luan, Bharath Hariharan, Noah Snavely http://arxiv.org/pdf/2411.13549v1 null
2024-11-20 Beyond Gaussians: Fast and High-Fidelity 3D Splatting with Linear Kernels 超越高斯:线性核的快速高保真3D分层渲染 Haodong Chen, Runnan Chen, Qiang Qu, Zhaoqing Wang, Tongliang Liu, Xiaoming Chen, Yuk Ying Chung http://arxiv.org/pdf/2411.12440v2 null

Model Compression/Optimization

Publish Date Title Title_CN Authors PDF Code
2024-11-20 Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning 极简化的极限:极端剪枝的技巧集 Andy Li, Aiden Durrant, Milan Markovic, Lu Yin, Georgios Leontidis http://arxiv.org/pdf/2411.13545v1 null
2024-11-20 RTSR: A Real-Time Super-Resolution Model for AV1 Compressed Content 实时AV1压缩内容超分辨率模型:RTSR Yuxuan Jiang, Jakub Nawała, Chen Feng, Fan Zhang, Xiaoqing Zhu, Joel Sole, David Bull http://arxiv.org/pdf/2411.13362v1 null
2024-11-20 DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes 分布式高斯涡轮重建稀疏视图大规模场景 Hao Li, Yuanyuan Gao, Haosong Peng, Chenming Wu, Weicai Ye, Yufeng Zhan, Chen Zhao, Dingwen Zhang, Jingdong Wang, Junwei Han http://arxiv.org/pdf/2411.12309v2 null
2024-11-20 DINO-LG: A Task-Specific DINO Model for Coronary Calcium Scoring DINO-LG:针对冠状动脉钙评分的任务特定DINO模型 Mahmut S. Gokmen, Caner Ozcan, Cody Bumgardner http://arxiv.org/pdf/2411.07976v4 null
2024-11-20 QIANets: Quantum-Integrated Adaptive Networks for Reduced Latency and Improved Inference Times in CNN Models QIANets:用于降低延迟和提高CNN模型推理时间的量子集成自适应网络 Zhumazhan Balapanov, Vanessa Matvei, Olivia Holmberg, Edward Magongo, Jonathan Pei, Kevin Zhu http://arxiv.org/pdf/2410.10318v2 null

Classification/Detection/Recognition/Segmentation/...

Publish Date Title Title_CN Authors PDF Code
2024-11-20 AI-generated Image Detection: Passive or Watermark? 人工智能生成图像检测:被动式还是水印式? Moyang Guo, Yuepeng Hu, Zhengyuan Jiang, Zeyu Li, Amir Sadovnik, Arka Daw, Neil Gong http://arxiv.org/pdf/2411.13553v1 null
2024-11-20 Find Any Part in 3D 三维中查找任意部分 Ziqi Ma, Yisong Yue, Georgia Gkioxari http://arxiv.org/pdf/2411.13550v1 null
2024-11-20 DIS-Mine: Instance Segmentation for Disaster-Awareness in Poor-Light Condition in Underground Mines DIS-Mine:地下矿井弱光条件下灾害感知的实例分割 Mizanur Rahman Jewel, Mohamed Elmahallawy, Sanjay Madria, Samuel Frimpong http://arxiv.org/pdf/2411.13544v1 null
2024-11-20 Comparative Analysis of Machine Learning and Deep Learning Models for Classifying Squamous Epithelial Cells of the Cervix 宫颈鳞状上皮细胞分类中机器学习与深度学习模型的比较分析 Subhasish Das, Satish K Panda, Madhusmita Sethy, Prajna Paramita Giri, Ashwini K Nanda http://arxiv.org/pdf/2411.13535v1 null
2024-11-20 Entropy Bootstrapping for Weakly Supervised Nuclei Detection 基于熵引导的弱监督细胞核检测 James Willoughby, Irina Voiculescu http://arxiv.org/pdf/2411.13528v1 null
2024-11-20 Geometric Algebra Planes: Convex Implicit Neural Volumes 几何代数平面:凸隐式神经体积 Irmak Sivgin, Sara Fridovich-Keil, Gordon Wetzstein, Mert Pilanci http://arxiv.org/pdf/2411.13525v1 null
2024-11-20 Efficient Brain Imaging Analysis for Alzheimer's and Dementia Detection Using Convolution-Derivative Operations 基于卷积-导数运算的高效脑部影像分析用于阿尔茨海默病和痴呆检测 Yasmine Mustafa, Mohamed Elmahallawy, Tie Luo http://arxiv.org/pdf/2411.13490v1 null
2024-11-20 Learning based Ge'ez character handwritten recognition 基于学习的古吉兹文字手写识别 Hailemicael Lulseged Yimer, Hailegabriel Dereje Degefa, Marco Cristani, Federico Cunico http://arxiv.org/pdf/2411.13350v1 null
2024-11-20 A Resource Efficient Fusion Network for Object Detection in Bird's-Eye View using Camera and Raw Radar Data 基于相机和原始雷达数据在鸟瞰视图中进行目标检测的资源高效融合网络 Kavin Chandrasekaran, Sorin Grigorescu, Gijs Dubbelman, Pavol Jancura http://arxiv.org/pdf/2411.13311v1 null
2024-11-20 DATTA: Domain-Adversarial Test-Time Adaptation for Cross-Domain WiFi-Based Human Activity Recognition DATTA:基于跨域WiFi的人体活动识别的域对抗测试时自适应 Julian Strohmayer, Rafael Sterzinger, Matthias Wödlinger, Martin Kampel http://arxiv.org/pdf/2411.13284v1 null
2024-11-20 Paying more attention to local contrast: improving infrared small target detection performance via prior knowledge 关注局部对比度提升:利用先验知识改善红外小目标检测性能 Peichao Wang, Jiabao Wang, Yao Chen, Rui Zhang, Yang Li, Zhuang Miao http://arxiv.org/pdf/2411.13260v1 null
2024-11-20 BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation BelHouse3D:用于评估3D点云语义分割遮挡鲁棒性的基准数据集 Umamaheswaran Raman Kumar, Abdur Razzaq Fayjie, Jurgen Hannaert, Patrick Vandewalle http://arxiv.org/pdf/2411.13251v1 null
2024-11-20 ViSTa Dataset: Do vision-language models understand sequential tasks? ViSTa数据集:视觉-语言模型是否理解顺序任务? Evžen Wybitul, Evan Ryan Gunter, Mikhail Seleznyov http://arxiv.org/pdf/2411.13211v1 null
2024-11-20 Intensity-Spatial Dual Masked Autoencoder for Multi-Scale Feature Learning in Chest CT Segmentation 强度-空间双重掩码自编码器在胸部CT分割中的多尺度特征学习 Yuexing Ding, Jun Wang, Hongbing Lyu http://arxiv.org/pdf/2411.13198v1 null
2024-11-20 VADet: Multi-frame LiDAR 3D Object Detection using Variable Aggregation VADet:基于可变聚合的多帧激光雷达3D目标检测 Chengjie Huang, Vahdat Abdelzad, Sean Sedwards, Krzysztof Czarnecki http://arxiv.org/pdf/2411.13186v1 null
2024-11-20 Click; Single Object Tracking; Video Object Segmentation; Real-time Interaction 点击;单目标跟踪;视频目标分割;实时交互 Kuiran Wang, Xuehui Yu, Wenwen Yu, Guorong Li, Xiangyuan Lan, Qixiang Ye, Jianbin Jiao, Zhenjun Han http://arxiv.org/pdf/2411.13183v1 null
2024-11-20 Cross-Camera Distracted Driver Classification through Feature Disentanglement and Contrastive Learning 基于特征解耦和对比学习的跨摄像头分心驾驶员分类 Simone Bianco, Luigi Celona, Paolo Napoletano http://arxiv.org/pdf/2411.13181v1 null
2024-11-20 YCB-LUMA: YCB Object Dataset with Luminance Keying for Object Localization YCB-LUMA:基于亮度关键帧的YCB目标数据集用于目标定位 Thomas Pöllabauer http://arxiv.org/pdf/2411.13149v1 null
2024-11-20 GraphCL: Graph-based Clustering for Semi-Supervised Medical Image Segmentation 基于图的半监督医学图像分割聚类 Mengzhu Wang, Jiao Li, Houcheng Su, Nan Yin, Shen Li http://arxiv.org/pdf/2411.13147v1 null
2024-11-20 Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images 远程传感图像中鲁棒云分割的视觉基础模型自适应 Xuechao Zou, Shun Zhang, Kai Li, Shiying Wang, Junliang Xing, Lei Jin, Congyan Lang, Pin Tao http://arxiv.org/pdf/2411.13127v1 null
2024-11-20 Demonstrating the Suitability of Neuromorphic, Event-Based, Dynamic Vision Sensors for In Process Monitoring of Metallic Additive Manufacturing and Welding 展示神经形态、事件驱动、动态视觉传感器在金属增材制造和焊接过程监控中的适用性 David Mascareñas, Andre Green, Ashlee Liao, Michael Torrez, Alessandro Cattaneo, Amber Black, John Bernardin, Garrett Kenyon http://arxiv.org/pdf/2411.13108v1 null
2024-11-20 Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension 视频-RAG:视觉对齐的检索增强长视频理解 Yongdong Luo, Xiawu Zheng, Xiao Yang, Guilin Li, Haojia Lin, Jinfa Huang, Jiayi Ji, Fei Chao, Jiebo Luo, Rongrong Ji http://arxiv.org/pdf/2411.13093v1 null
2024-11-20 Bounding-box Watermarking: Defense against Model Extraction Attacks on Object Detectors 边界框水印:针对目标检测器模型提取攻击的防御 Satoru Koda, Ikuya Morikawa http://arxiv.org/pdf/2411.13047v1 null
2024-11-20 Prior-based Objective Inference Mining Potential Uncertainty for Facial Expression Recognition 基于先验的目标推断挖掘面部表情识别中的潜在不确定性 Hanwei Liu, Huiling Cai, Qingcheng Lin, Xuefeng Li, Hui Xiao http://arxiv.org/pdf/2411.13024v1 null
2024-11-20 Open-World Amodal Appearance Completion 开放式世界非模态外观完成 Jiayang Ao, Yanbei Jiang, Qiuhong Ke, Krista A. Ehinger http://arxiv.org/pdf/2411.13019v1 null
2024-11-20 DT-LSD: Deformable Transformer-based Line Segment Detection 基于可变形变换器的线段检测:DT-LSD Sebastian Janampa, Marios Pattichis http://arxiv.org/pdf/2411.13005v1 null
2024-11-20 Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection 开放集半监督目标检测中的协同特征-对数对比学习 Xinhao Zhong, Siyu Jiao, Yao Zhao, Yunchao Wei http://arxiv.org/pdf/2411.13001v1 null
2024-11-20 Enhancing Thermal MOT: A Novel Box Association Method Leveraging Thermal Identity and Motion Similarity 增强热MOT:利用热身份和运动相似性的新型框关联方法 Wassim El Ahmar, Dhanvin Kolhatkar, Farzan Nowruzi, Robert Laganiere http://arxiv.org/pdf/2411.12943v1 null
2024-11-20 Topological Symmetry Enhanced Graph Convolution for Skeleton-Based Action Recognition 拓扑对称性增强的基于骨架的动作识别图卷积 Zeyu Liang, Hailun Xia, Naichuan Zheng, Huan Xu http://arxiv.org/pdf/2411.12560v2 link
2024-11-20 CLIP Unreasonable Potential in Single-Shot Face Recognition CLIP在单次人脸识别中的巨大潜力 Nhan T. Luu http://arxiv.org/pdf/2411.12319v2 null
2024-11-20 CRT-Fusion: Camera, Radar, Temporal Fusion Using Motion Information for 3D Object Detection CRT-Fusion:利用运动信息实现相机、雷达、时序融合的3D目标检测 Jisong Kim, Minjae Seong, Jun Won Choi http://arxiv.org/pdf/2411.03013v2 null
2024-11-20 Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models 利用视觉数据上下文不确定性进行深度模型高效训练 Sharat Agarwal http://arxiv.org/pdf/2411.01925v2 null
2024-11-20 TALoS: Enhancing Semantic Scene Completion via Test-time Adaptation on the Line of Sight TALoS:通过视距测试时自适应增强语义场景补全 Hyun-Kurl Jang, Jihun Kim, Hyeokjun Kweon, Kuk-Jin Yoon http://arxiv.org/pdf/2410.15674v2 link
2024-11-20 Multiview Scene Graph 多视角场景图 Juexiao Zhang, Gao Zhu, Sihang Li, Xinhao Liu, Haorui Song, Xinran Tang, Chen Feng http://arxiv.org/pdf/2410.11187v3 link
2024-11-20 SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data SynFER:利用合成数据提升面部表情识别 Xilin He, Cheng Luo, Xiaole Xian, Bing Li, Siyang Song, Muhammad Haris Khan, Weicheng Xie, Linlin Shen, Zongyuan Ge http://arxiv.org/pdf/2410.09865v2 null
2024-11-20 Classification of Buried Objects from Ground Penetrating Radar Images by using Second Order Deep Learning Models 基于二阶深度学习模型的地下物体分类方法 Douba Jafuno, Ammar Mian, Guillaume Ginolhac, Nickolas Stelzenmuller http://arxiv.org/pdf/2410.07117v2 null
2024-11-20 3D-Aware Instance Segmentation and Tracking in Egocentric Videos 基于自回归的3D感知实例分割与跟踪在自摄视频中 Yash Bhalgat, Vadim Tschernezki, Iro Laina, João F. Henriques, Andrea Vedaldi, Andrew Zisserman http://arxiv.org/pdf/2408.09860v2 null
2024-11-20 Occlusion-Aware Seamless Segmentation 遮挡感知无缝分割 Yihong Cao, Jiaming Zhang, Hao Shi, Kunyu Peng, Yuhongxuan Zhang, Hui Zhang, Rainer Stiefelhagen, Kailun Yang http://arxiv.org/pdf/2407.02182v3 link
2024-11-20 High-Level Parallelism and Nested Features for Dynamic Inference Cost and Top-Down Attention 高级并行性和嵌套特征用于动态推理成本和自上而下注意力 André Peter Kelm, Niels Hannemann, Bruno Heberle, Lucas Schmidt, Tim Rolff, Christian Wilms, Ehsan Yaghoubi, Simone Frintrop http://arxiv.org/pdf/2308.05128v3 null
2024-11-20 Smart Pressure e-Mat for Human Sleeping Posture and Dynamic Activity Recognition 智能压力电子垫:用于人类睡眠姿势和动态活动识别 Liangqi Yuan, Yuan Wei, Jia Li http://arxiv.org/pdf/2305.11367v2 null
2024-11-20 Self-Supervised Place Recognition by Refining Temporal and Featural Pseudo Labels from Panoramic Data 基于全景数据的时序和特征伪标签精炼的自监督地点识别 Chao Chen, Zegang Cheng, Xinhao Liu, Yiming Li, Li Ding, Ruoyu Wang, Chen Feng http://arxiv.org/pdf/2208.09315v3 null
2024-11-20 Word-level Sign Language Recognition with Multi-stream Neural Networks Focusing on Local Regions and Skeletal Information 基于多流神经网络的词汇级手语识别:聚焦于局部区域和骨骼信息 Mizuki Maruyama, Shrey Singh, Katsufumi Inoue, Partha Pratim Roy, Masakazu Iwamura, Michifumi Yoshioka http://arxiv.org/pdf/2106.15989v2 null

Image Understanding

Publish Date Title Title_CN Authors PDF Code
2024-11-20 Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration 图像恢复中的深度展开方法旋转等变近端算子 Jiahong Fu, Qi Xie, Deyu Meng, Zongben Xu http://arxiv.org/pdf/2312.15701v2 link

LLM

Publish Date Title Title_CN Authors PDF Code
2024-11-20 Unification of Balti and trans-border sister dialects in the essence of LLMs and AI Technology 基于LLMs和AI技术的巴尔提语及其跨境姐妹方言统一本质 Muhammad Sharif, Jiangyan Yi, Muhammad Shoaib http://arxiv.org/pdf/2411.13409v1 null
2024-11-20 On the Consistency of Video Large Language Models in Temporal Comprehension 视频大语言模型在时间理解上的一致性 Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang, Angela Yao http://arxiv.org/pdf/2411.12951v1 null
2024-11-20 MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving MiniDrive:基于多层2D特征作为文本标记的更高效视觉-语言模型,以应用于自动驾驶 Enming Zhang, Xingyuan Dai, Yisheng Lv, Qinghai Miao http://arxiv.org/pdf/2409.07267v4 link

Transformer

Publish Date Title Title_CN Authors PDF Code
2024-11-20 Practical Compact Deep Compressed Sensing 实用紧凑深度压缩感知 Bin Chen, Jian Zhang http://arxiv.org/pdf/2411.13081v1 null
2024-11-20 Attentive Contextual Attention for Cloud Removal 云消除的注意力上下文关注 Wenli Huang, Ye Deng, Yang Wu, Jinjun Wang http://arxiv.org/pdf/2411.13042v1 null
2024-11-20 RobustFormer: Noise-Robust Pre-training for images and videos 鲁棒Former:图像和视频的噪声鲁棒预训练 Ashish Bastola, Nishant Luitel, Hao Wang, Danda Pani Paudel, Roshani Poudel, Abolfazl Razi http://arxiv.org/pdf/2411.13040v1 null
2024-11-20 JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation 基于扩散音驱动的面部动态和头部运动生成的肖像和动物图像动画:JoyVASA Xuyang Cao, Guoxin Wang, Sheng Shi, Jun Zhao, Yang Yao, Jintao Fei, Minyu Gao http://arxiv.org/pdf/2411.09209v3 link
2024-11-20 Random Representations Outperform Online Continually Learned Representations 随机表示优于在线持续学习的表示 Ameya Prabhu, Shiven Sinha, Ponnurangam Kumaraguru, Philip H. S. Torr, Ozan Sener, Puneet K. Dokania http://arxiv.org/pdf/2402.08823v3 link
2024-11-20 Informative Scene Graph Generation via Debiasing 基于去偏见的场景图生成 Lianli Gao, Xinyu Lyu, Yuyu Guo, Yuxuan Hu, Yuan-Fang Li, Lu Xu, Heng Tao Shen, Jingkuan Song http://arxiv.org/pdf/2308.05286v2 null

3D/CG

Publish Date Title Title_CN Authors PDF Code
2024-11-20 Unbiased Scene Graph Generation by Type-Aware Message Passing on Heterogeneous and Dual Graphs 基于异构和双图上的类型感知消息传递的无偏场景图生成 Guanglu Sun, Jin Qiu, Lili Liang http://arxiv.org/pdf/2411.13287v1 null
2024-11-20 ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations ESARM:基于自动排名演示的奖励模型实现3D情感语音到动画 Xulong Zhang, Xiaoyang Qu, Haoxiang Shi, Chunguang Xiao, Jianzong Wang http://arxiv.org/pdf/2411.13089v1 null
2024-11-20 X as Supervision: Contending with Depth Ambiguity in Unsupervised Monocular 3D Pose Estimation X作为监督:解决无监督单目3D姿态估计中的深度模糊性 Yuchen Yang, Xuanyi Liu, Xing Gao, Zhihang Zhong, Xiao Sun http://arxiv.org/pdf/2411.13026v1 null
2024-11-20 M3D: Dual-Stream Selective State Spaces and Depth-Driven Framework for High-Fidelity Single-View 3D Reconstruction M3D:高保真单视图3D重建的流式双流选择性状态空间和深度驱动框架 Luoxi Zhang, Pragyan Shrestha, Yu Zhou, Chun Xie, Itaru Kitahara http://arxiv.org/pdf/2411.12635v2 link
2024-11-20 Capsule Network Projectors are Equivariant and Invariant Learners 胶囊网络投影器是等变和不变学习器 Miles Everett, Aiden Durrant, Mingjun Zhong, Georgios Leontidis http://arxiv.org/pdf/2405.14386v3 link
2024-11-20 A community palm model 社区棕榈模型 Nicholas Clinton, Andreas Vollrath, Remi D'annunzio, Desheng Liu, Henry B. Glick, Adrià Descals, Alicia Sullivan, Oliver Guinan, Jacob Abramowitz, Fred Stolle, et.al. http://arxiv.org/pdf/2405.09530v2 null
2024-11-20 HHAvatar: Gaussian Head Avatar with Dynamic Hairs HHAvatar:动态发丝高斯头部形象 Zhanfeng Liao, Yuelang Xu, Zhe Li, Qijing Li, Boyao Zhou, Ruifeng Bai, Di Xu, Hongwen Zhang, Yebin Liu http://arxiv.org/pdf/2312.03029v3 link
2024-11-20 Accurate Eye Tracking from Dense 3D Surface Reconstructions using Single-Shot Deflectometry 基于单次衍射测量的密集3D表面重建中的精确眼动追踪 Jiazhang Wang, Tianfu Wang, Bingjie Xu, Oliver Cossairt, Florian Willomitzer http://arxiv.org/pdf/2308.07298v3 null

Learning Paradigms

Publish Date Title Title_CN Authors PDF Code
2024-11-20 AGLP: A Graph Learning Perspective for Semi-supervised Domain Adaptation AGLP:半监督域适应的图学习视角 Houcheng Su, Mengzhu Wang, Jiao Li, Nan Yin, Li Shen http://arxiv.org/pdf/2411.13152v1 null
2024-11-20 TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models TAPT:视觉-语言模型中鲁棒推理的测试时对抗提示调整 Xin Wang, Kai Chen, Jiaming Zhang, Jingjing Chen, Xingjun Ma http://arxiv.org/pdf/2411.13136v1 null
2024-11-20 Improving OOD Generalization of Pre-trained Encoders via Aligned Embedding-Space Ensembles 通过对齐嵌入空间集成提升预训练编码器的OOD泛化能力 Shuman Peng, Arash Khoeini, Sharan Vaswani, Martin Ester http://arxiv.org/pdf/2411.13073v1 null
2024-11-20 Training Physics-Driven Deep Learning Reconstruction without Raw Data Access for Equitable Fast MRI 无原始数据访问权限下基于物理驱动的深度学习重建技术,实现公平快速磁共振成像 Yaşar Utku Alçalar, Merve Gülle, Mehmet Akçakaya http://arxiv.org/pdf/2411.13022v1 null
2024-11-20 VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer VAST:通过零样本表情风格迁移使您的说话虚拟形象生动起来 Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao http://arxiv.org/pdf/2308.04830v3 null

Others

Publish Date Title Title_CN Authors PDF Code
2024-11-20 HF-Diff: High-Frequency Perceptual Loss and Distribution Matching for One-Step Diffusion-Based Image Super-Resolution HF-Diff:高频感知损失与分布匹配的基于一步扩散的图像超分辨率 Shoaib Meraj Sami, Md Mahedi Hasan, Jeremy Dawson, Nasser Nasrabadi http://arxiv.org/pdf/2411.13548v1 null
2024-11-20 Quantum-Brain: Quantum-Inspired Neural Network Approach to Vision-Brain Understanding 量子脑:基于量子启发的神经网络方法实现视觉-大脑理解 Hoang-Quan Nguyen, Xuan-Bac Nguyen, Hugh Churchill, Arabinda Kumar Choudhary, Pawan Sinha, Samee U. Khan, Khoa Luu http://arxiv.org/pdf/2411.13378v1 null
2024-11-20 WHALES: A Multi-agent Scheduling Dataset for Enhanced Cooperation in Autonomous Driving 鲸鱼:增强自动驾驶中多智能体合作的调度数据集 Siwei Chen, Yinsong Wang, Ziyi Song, Sheng Zhou http://arxiv.org/pdf/2411.13340v1 null
2024-11-20 Can Reasons Help Improve Pedestrian Intent Estimation? A Cross-Modal Approach 原因能助于提升行人意图估计吗?一种跨模态方法 Vaishnavi Khindkar, Vineeth Balasubramanian, Chetan Arora, Anbumani Subramanian, C. V. Jawahar http://arxiv.org/pdf/2411.13302v1 null
2024-11-20 Analysis and Synthesis Denoisers for Forward-Backward Plug-and-Play Algorithms 正向-反向插件式算法的降噪分析与综合 Matthieu Kowalski, Benoît Malézieux, Thomas Moreau, Audrey Repetti http://arxiv.org/pdf/2411.13276v1 null
2024-11-20 An Integrated Approach to Robotic Object Grasping and Manipulation 机器人抓取与操作的综合方法 Owais Ahmed, M Huzaifa, M Areeb, Hamza Ali Khan http://arxiv.org/pdf/2411.13205v1 null
2024-11-20 SONNET: Enhancing Time Delay Estimation by Leveraging Simulated Audio SONNET:通过利用模拟音频增强时间延迟估计 Erik Tegler, Magnus Oskarsson, Kalle Åström http://arxiv.org/pdf/2411.13179v1 null
2024-11-20 Globally Correlation-Aware Hard Negative Generation 全局相关性感知的硬负样本生成 Wenjie Peng, Hongxiang Huang, Tianshui Chen, Quhui Ke, Gang Dai, Shuangping Huang http://arxiv.org/pdf/2411.13145v1 null
2024-11-20 Superpixel Cost Volume Excitation for Stereo Matching 立体匹配中的超像素代价体积激发 Shanglong Liu, Lin Qi, Junyu Dong, Wenxiang Gu, Liyi Xu http://arxiv.org/pdf/2411.13105v1 null
2024-11-20 Automatic marker-free registration based on similar tetrahedras for single-tree point clouds 基于相似四面体的单树点云自动无标记配准 Jing Ren, Pei Wang, Hanlong Li, Yuhan Wu, Yuhang Gao, Wenxin Chen, Mingtai Zhang, Lingyun Zhang http://arxiv.org/pdf/2411.13069v1 null
2024-11-20 Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation 朝向无偏和鲁棒的时空场景图生成与预测 Rohith Peddi, Saurabh, Ayush Abhay Shrivastava, Parag Singla, Vibhav Gogate http://arxiv.org/pdf/2411.13059v1 null
2024-11-20 Chanel-Orderer: A Channel-Ordering Predictor for Tri-Channel Natural Images Chanel-Orderer:三通道自然图像通道排序预测器 Shen Li, Lei Jiang, Wei Wang, Hongwei Hu, Liang Li http://arxiv.org/pdf/2411.13021v1 null
2024-11-20 Generation of synthetic gait data: application to multiple sclerosis patients' gait patterns 合成步态数据生成:应用于多发性硬化症患者步态模式 Klervi Le Gall, Lise Bellanger, David Laplaud, Aymeric Stamm http://arxiv.org/pdf/2411.10377v2 null
2024-11-20 Exploring the Low-Pass Filtering Behavior in Image Super-Resolution 探索图像超分辨率中的低通滤波行为 Haoyu Deng, Zijing Xu, Yule Duan, Xiao Wu, Wenjie Shu, Liang-Jian Deng http://arxiv.org/pdf/2405.07919v4 link
2024-11-20 PDE-CNNs: Axiomatic Derivations and Applications 偏微分方程卷积神经网络:公理化推导与应用 Gijs Bellaard, Sei Sakata, Bart M. N. Smets, Remco Duits http://arxiv.org/pdf/2403.15182v3 null
2024-11-20 CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement 编码先验引导的压缩视频质量提升聚合网络 Qiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, Shuyuan Zhu http://arxiv.org/pdf/2403.10362v2 null