Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-20 | REDUCIO! Generating 1024$\times$1024 Video within 16 Seconds using Extremely Compressed Motion Latents | REDUCIO!使用极压缩运动潜变量在16秒内生成1024×1024视频 | Rui Tian, Qi Dai, Jianmin Bao, Kai Qiu, Yifan Yang, Chong Luo, Zuxuan Wu, Yu-Gang Jiang | http://arxiv.org/pdf/2411.13552v1 | null |
2024-11-20 | Identity Preserving 3D Head Stylization with Multiview Score Distillation | 基于多视角评分蒸馏的保留身份的3D头部风格化 | Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Furkan Guzelant, Aysegul Dundar | http://arxiv.org/pdf/2411.13536v1 | null |
2024-11-20 | VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | VBench++:用于视频生成模型的全面和多功能基准套件 | Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, et al. | http://arxiv.org/pdf/2411.13503v1 | null |
2024-11-20 | Adversarial Diffusion Compression for Real-World Image Super-Resolution | 对抗扩散压缩实现真实图像超分辨率 | Bin Chen, Gehui Li, Rongyuan Wu, Xindong Zhang, Jie Chen, Jian Zhang, Lei Zhang | http://arxiv.org/pdf/2411.13383v1 | null |
2024-11-20 | DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild | DATAP-SfM:动态感知任意点鲁棒从运动中获取野外结构 | Weicai Ye, Xinyu Chen, Ruohao Zhan, Di Huang, Xiaoshui Huang, Haoyi Zhu, Hujun Bao, Wanli Ouyang, Tong He, Guofeng Zhang | http://arxiv.org/pdf/2411.13291v1 | null |
2024-11-20 | XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation | XMask3D:开放词汇3D语义分割的跨模态掩码推理 | Ziyi Wang, Yanbo Wang, Xumin Yu, Jie Zhou, Jiwen Lu | http://arxiv.org/pdf/2411.13243v1 | null |
2024-11-20 | RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation | RAW-Diffusion:基于RGB引导的扩散模型实现高保真度RAW图像生成 | Christoph Reinders, Radu Berdan, Beril Besbinar, Junji Otsuka, Daisuke Iso | http://arxiv.org/pdf/2411.13150v1 | null |
2024-11-20 | CopyrightMeter: Revisiting Copyright Protection in Text-to-image Models | 版权度量器:重新审视文本到图像模型的版权保护 | Naen Xu, Changjiang Li, Tianyu Du, Minxi Li, Wenjie Luo, Jiacheng Liang, Yuyuan Li, Xuhong Zhang, Meng Han, Jianwei Yin, et al. | http://arxiv.org/pdf/2411.13144v1 | null |
2024-11-20 | Virtual Staining of Label-Free Tissue in Imaging Mass Spectrometry | 无标记组织在成像质谱中的虚拟染色 | Yijie Zhang, Luzhe Huang, Nir Pillar, Yuzhu Li, Lukasz G. Migas, Raf Van de Plas, Jeffrey M. Spraggins, Aydogan Ozcan | http://arxiv.org/pdf/2411.13120v1 | null |
2024-11-20 | LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Learned Image Compression | 基于LMM驱动的语义图像-文本编码的超低比特率学习图像压缩 | Shimon Murai, Heming Sun, Jiro Katto | http://arxiv.org/pdf/2411.13033v1 | null |
2024-11-20 | ORID: Organ-Regional Information Driven Framework for Radiology Report Generation | ORID:基于器官-区域信息的放射报告生成框架 | Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai | http://arxiv.org/pdf/2411.13025v1 | null |
2024-11-20 | Automating Sonologists USG Commands with AI and Voice Interface | 利用AI和语音界面自动化超声学家超声指令 | Emad Mohamed, Shruti Tiwari, Sheena Christabel Pravin | http://arxiv.org/pdf/2411.13006v1 | null |
2024-11-20 | HouseLLM: LLM-Assisted Two-Phase Text-to-Floorplan Generation | HouseLLM:基于LLM的两阶段文本生成平面图 | Ziyang Zong, Zhaohuan Zhan, Guang Tan | http://arxiv.org/pdf/2411.12279v2 | null |
2024-11-20 | TP-UNet: Temporal Prompt Guided UNet for Medical Image Segmentation | 时间提示引导的UNet医疗图像分割 | Ranmin Wang, Limin Zhuang, Hongkun Chen, Boyan Xu, Ruichu Cai | http://arxiv.org/pdf/2411.11305v2 | null |
2024-11-20 | Time Step Generating: A Universal Synthesized Deepfake Image Detector | 时间步生成:一种通用的深度伪造图像检测器 | Ziyue Zeng, Haoyuan Liu, Dingjie Peng, Luoxu Jing, Hiroshi Watanabe | http://arxiv.org/pdf/2411.11016v2 | link |
2024-11-20 | Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step | 对抗评分身份蒸馏:一步超越教师 | Mingyuan Zhou, Huangjie Zheng, Yi Gu, Zhendong Wang, Hai Huang | http://arxiv.org/pdf/2410.14919v3 | link |
2024-11-20 | MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes | MagicDrive3D:可控3D生成以实现街景任意视角渲染 | Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong, Zhenguo Li, Qiang Xu | http://arxiv.org/pdf/2405.14475v3 | null |
2024-11-20 | Erasing Undesirable Influence in Diffusion Models | 消除扩散模型中的不良影响 | Jing Wu, Trung Le, Munawar Hayat, Mehrtash Harandi | http://arxiv.org/pdf/2401.05779v4 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-20 | Teaching VLMs to Localize Specific Objects from In-context Examples | 从上下文示例中教授VLMs定位特定物体 | Sivan Doveh, Nimrod Shabtay, Wei Lin, Eli Schwartz, Hilde Kuehne, Raja Giryes, Rogerio Feris, Leonid Karlinsky, James Glass, Assaf Arbelle, et al. | http://arxiv.org/pdf/2411.13317v1 | null |
2024-11-20 | VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation | 视频自动竞技场:通过用户模拟评估大型多模态模型在视频分析中的自动竞技场 | Ziyang Luo, Haoning Wu, Dongxu Li, Jing Ma, Mohan Kankanhalli, Junnan Li | http://arxiv.org/pdf/2411.13281v1 | null |
2024-11-20 | DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving | 多模态大型语言模型在自动驾驶中的空间理解基准:DriveMLLM | Xianda Guo, Ruijun Zhang, Yiqun Duan, Yuhang He, Chenming Zhang, Shuai Liu, Long Chen | http://arxiv.org/pdf/2411.13112v1 | null |
2024-11-20 | Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving | 自动驾驶中多模态LLM视觉表示增强的提示线索 | Hao Zhou, Zhanning Gao, Maosheng Ye, Zhili Chen, Qifeng Chen, Tongyi Cao, Honggang Qi | http://arxiv.org/pdf/2411.13076v1 | null |
2024-11-20 | Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark | 高效掩码自编码器在视频目标计数及大规模基准上的应用 | Bing Cao, Quanhao Lu, Jiekang Feng, Pengfei Zhu, Qinghua Hu, Qilong Wang | http://arxiv.org/pdf/2411.13056v1 | null |
2024-11-20 | MEGL: Multimodal Explanation-Guided Learning | MEGL:多模态解释引导学习 | Yifei Zhang, Tianxu Jiang, Bo Pan, Jingyu Wang, Guangji Bai, Liang Zhao | http://arxiv.org/pdf/2411.13053v1 | null |
2024-11-20 | Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization | 基于交替优化的多模态图像对无监督单应性估计 | Sanghyeob Song, Jaihyun Lew, Hyemi Jang, Sungroh Yoon | http://arxiv.org/pdf/2411.13036v1 | null |
2024-11-20 | LaVida Drive: Vision-Text Interaction VLM for Autonomous Driving with Token Selection, Recovery and Enhancement | LaVida Drive:基于token选择、恢复和增强的视觉-文本交互VLM自动驾驶系统 | Siwen Jiao, Yangyi Fang | http://arxiv.org/pdf/2411.12980v1 | null |
2024-11-20 | Constraint Learning for Parametric Point Cloud | 参数点云的约束学习 | Xi Cheng, Ruiqi Lei, Di Huang, Zhichao Liao, Fengyuan Piao, Yan Chen, Pingfa Feng, Long Zeng | http://arxiv.org/pdf/2411.07747v3 | null |
2024-11-20 | Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training | Mono-InternVL:通过内源视觉预训练推动单体多模态大型语言模型的边界 | Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jiawen Liu, Jifeng Dai, Yu Qiao, Xizhou Zhu | http://arxiv.org/pdf/2410.08202v2 | null |
2024-11-20 | DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction | DAOcc:基于3D目标检测的多传感器融合三维占用预测 | Zhen Yang, Yanpeng Dong, Heng Wang, Lichao Ma, Zijian Cui, Qi Liu, Haoran Pei | http://arxiv.org/pdf/2409.19972v2 | link |
2024-11-20 | MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation | MMTryon:高质量时尚生成中的多模态多参考控制 | Xujie Zhang, Ente Lin, Xiu Li, Yuxuan Luo, Michael Kampffmeyer, Xin Dong, Xiaodan Liang | http://arxiv.org/pdf/2405.00448v4 | null |
2024-11-20 | A Multi-scale Information Integration Framework for Infrared and Visible Image Fusion | 多尺度红外与可见光图像融合信息集成框架 | Guang Yang, Jie Li, Hanxiao Lei, Xinbo Gao | http://arxiv.org/pdf/2312.04328v2 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-20 | GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting | GazeGaussian:基于3D高斯散布的高保真注视重定向 | Xiaobao Wei, Peng Chen, Guangyu Li, Ming Lu, Hui Chen, Feng Tian | http://arxiv.org/pdf/2411.12981v1 | null |
2024-11-20 | Voxel-Mesh Hybrid Representation for Real-Time View Synthesis | 基于体素-网格混合表示的实时视图合成 | Chenhao Zhang, Yongyang Zhou, Lei Zhang | http://arxiv.org/pdf/2403.06505v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-20 | Generating 3D-Consistent Videos from Unposed Internet Photos | 从无位姿的互联网照片生成三维一致的视频 | Gene Chou, Kai Zhang, Sai Bi, Hao Tan, Zexiang Xu, Fujun Luan, Bharath Hariharan, Noah Snavely | http://arxiv.org/pdf/2411.13549v1 | null |
2024-11-20 | Beyond Gaussians: Fast and High-Fidelity 3D Splatting with Linear Kernels | 超越高斯:线性核的快速高保真3D分层渲染 | Haodong Chen, Runnan Chen, Qiang Qu, Zhaoqing Wang, Tongliang Liu, Xiaoming Chen, Yuk Ying Chung | http://arxiv.org/pdf/2411.12440v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-20 | Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning | 极简化的极限:极端剪枝的技巧集 | Andy Li, Aiden Durrant, Milan Markovic, Lu Yin, Georgios Leontidis | http://arxiv.org/pdf/2411.13545v1 | null |
2024-11-20 | RTSR: A Real-Time Super-Resolution Model for AV1 Compressed Content | 实时AV1压缩内容超分辨率模型:RTSR | Yuxuan Jiang, Jakub Nawała, Chen Feng, Fan Zhang, Xiaoqing Zhu, Joel Sole, David Bull | http://arxiv.org/pdf/2411.13362v1 | null |
2024-11-20 | DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes | 分布式高斯涡轮重建稀疏视图大规模场景 | Hao Li, Yuanyuan Gao, Haosong Peng, Chenming Wu, Weicai Ye, Yufeng Zhan, Chen Zhao, Dingwen Zhang, Jingdong Wang, Junwei Han | http://arxiv.org/pdf/2411.12309v2 | null |
2024-11-20 | DINO-LG: A Task-Specific DINO Model for Coronary Calcium Scoring | DINO-LG:针对冠状动脉钙评分的任务特定DINO模型 | Mahmut S. Gokmen, Caner Ozcan, Cody Bumgardner | http://arxiv.org/pdf/2411.07976v4 | null |
2024-11-20 | QIANets: Quantum-Integrated Adaptive Networks for Reduced Latency and Improved Inference Times in CNN Models | QIANets:用于降低延迟和提高CNN模型推理时间的量子集成自适应网络 | Zhumazhan Balapanov, Vanessa Matvei, Olivia Holmberg, Edward Magongo, Jonathan Pei, Kevin Zhu | http://arxiv.org/pdf/2410.10318v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-20 | AI-generated Image Detection: Passive or Watermark? | 人工智能生成图像检测:被动式还是水印式? | Moyang Guo, Yuepeng Hu, Zhengyuan Jiang, Zeyu Li, Amir Sadovnik, Arka Daw, Neil Gong | http://arxiv.org/pdf/2411.13553v1 | null |
2024-11-20 | Find Any Part in 3D | 三维中查找任意部分 | Ziqi Ma, Yisong Yue, Georgia Gkioxari | http://arxiv.org/pdf/2411.13550v1 | null |
2024-11-20 | DIS-Mine: Instance Segmentation for Disaster-Awareness in Poor-Light Condition in Underground Mines | DIS-Mine:地下矿井弱光条件下灾害感知的实例分割 | Mizanur Rahman Jewel, Mohamed Elmahallawy, Sanjay Madria, Samuel Frimpong | http://arxiv.org/pdf/2411.13544v1 | null |
2024-11-20 | Comparative Analysis of Machine Learning and Deep Learning Models for Classifying Squamous Epithelial Cells of the Cervix | 宫颈鳞状上皮细胞分类中机器学习与深度学习模型的比较分析 | Subhasish Das, Satish K Panda, Madhusmita Sethy, Prajna Paramita Giri, Ashwini K Nanda | http://arxiv.org/pdf/2411.13535v1 | null |
2024-11-20 | Entropy Bootstrapping for Weakly Supervised Nuclei Detection | 基于熵引导的弱监督细胞核检测 | James Willoughby, Irina Voiculescu | http://arxiv.org/pdf/2411.13528v1 | null |
2024-11-20 | Geometric Algebra Planes: Convex Implicit Neural Volumes | 几何代数平面:凸隐式神经体积 | Irmak Sivgin, Sara Fridovich-Keil, Gordon Wetzstein, Mert Pilanci | http://arxiv.org/pdf/2411.13525v1 | null |
2024-11-20 | Efficient Brain Imaging Analysis for Alzheimer's and Dementia Detection Using Convolution-Derivative Operations | 基于卷积-导数运算的高效脑部影像分析用于阿尔茨海默病和痴呆检测 | Yasmine Mustafa, Mohamed Elmahallawy, Tie Luo | http://arxiv.org/pdf/2411.13490v1 | null |
2024-11-20 | Learning based Ge'ez character handwritten recognition | 基于学习的古吉兹文字手写识别 | Hailemicael Lulseged Yimer, Hailegabriel Dereje Degefa, Marco Cristani, Federico Cunico | http://arxiv.org/pdf/2411.13350v1 | null |
2024-11-20 | A Resource Efficient Fusion Network for Object Detection in Bird's-Eye View using Camera and Raw Radar Data | 基于相机和原始雷达数据在鸟瞰视图中进行目标检测的资源高效融合网络 | Kavin Chandrasekaran, Sorin Grigorescu, Gijs Dubbelman, Pavol Jancura | http://arxiv.org/pdf/2411.13311v1 | null |
2024-11-20 | DATTA: Domain-Adversarial Test-Time Adaptation for Cross-Domain WiFi-Based Human Activity Recognition | DATTA:基于跨域WiFi的人体活动识别的域对抗测试时自适应 | Julian Strohmayer, Rafael Sterzinger, Matthias Wödlinger, Martin Kampel | http://arxiv.org/pdf/2411.13284v1 | null |
2024-11-20 | Paying more attention to local contrast: improving infrared small target detection performance via prior knowledge | 关注局部对比度提升:利用先验知识改善红外小目标检测性能 | Peichao Wang, Jiabao Wang, Yao Chen, Rui Zhang, Yang Li, Zhuang Miao | http://arxiv.org/pdf/2411.13260v1 | null |
2024-11-20 | BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation | BelHouse3D:用于评估3D点云语义分割遮挡鲁棒性的基准数据集 | Umamaheswaran Raman Kumar, Abdur Razzaq Fayjie, Jurgen Hannaert, Patrick Vandewalle | http://arxiv.org/pdf/2411.13251v1 | null |
2024-11-20 | ViSTa Dataset: Do vision-language models understand sequential tasks? | ViSTa数据集:视觉-语言模型是否理解顺序任务? | Evžen Wybitul, Evan Ryan Gunter, Mikhail Seleznyov | http://arxiv.org/pdf/2411.13211v1 | null |
2024-11-20 | Intensity-Spatial Dual Masked Autoencoder for Multi-Scale Feature Learning in Chest CT Segmentation | 强度-空间双重掩码自编码器在胸部CT分割中的多尺度特征学习 | Yuexing Ding, Jun Wang, Hongbing Lyu | http://arxiv.org/pdf/2411.13198v1 | null |
2024-11-20 | VADet: Multi-frame LiDAR 3D Object Detection using Variable Aggregation | VADet:基于可变聚合的多帧激光雷达3D目标检测 | Chengjie Huang, Vahdat Abdelzad, Sean Sedwards, Krzysztof Czarnecki | http://arxiv.org/pdf/2411.13186v1 | null |
2024-11-20 | Click; Single Object Tracking; Video Object Segmentation; Real-time Interaction | 点击;单目标跟踪;视频目标分割;实时交互 | Kuiran Wang, Xuehui Yu, Wenwen Yu, Guorong Li, Xiangyuan Lan, Qixiang Ye, Jianbin Jiao, Zhenjun Han | http://arxiv.org/pdf/2411.13183v1 | null |
2024-11-20 | Cross-Camera Distracted Driver Classification through Feature Disentanglement and Contrastive Learning | 基于特征解耦和对比学习的跨摄像头分心驾驶员分类 | Simone Bianco, Luigi Celona, Paolo Napoletano | http://arxiv.org/pdf/2411.13181v1 | null |
2024-11-20 | YCB-LUMA: YCB Object Dataset with Luminance Keying for Object Localization | YCB-LUMA:基于亮度关键帧的YCB目标数据集用于目标定位 | Thomas Pöllabauer | http://arxiv.org/pdf/2411.13149v1 | null |
2024-11-20 | GraphCL: Graph-based Clustering for Semi-Supervised Medical Image Segmentation | 基于图的半监督医学图像分割聚类 | Mengzhu Wang, Jiao Li, Houcheng Su, Nan Yin, Shen Li | http://arxiv.org/pdf/2411.13147v1 | null |
2024-11-20 | Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images | 远程传感图像中鲁棒云分割的视觉基础模型自适应 | Xuechao Zou, Shun Zhang, Kai Li, Shiying Wang, Junliang Xing, Lei Jin, Congyan Lang, Pin Tao | http://arxiv.org/pdf/2411.13127v1 | null |
2024-11-20 | Demonstrating the Suitability of Neuromorphic, Event-Based, Dynamic Vision Sensors for In Process Monitoring of Metallic Additive Manufacturing and Welding | 展示神经形态、事件驱动、动态视觉传感器在金属增材制造和焊接过程监控中的适用性 | David Mascareñas, Andre Green, Ashlee Liao, Michael Torrez, Alessandro Cattaneo, Amber Black, John Bernardin, Garrett Kenyon | http://arxiv.org/pdf/2411.13108v1 | null |
2024-11-20 | Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | 视频-RAG:视觉对齐的检索增强长视频理解 | Yongdong Luo, Xiawu Zheng, Xiao Yang, Guilin Li, Haojia Lin, Jinfa Huang, Jiayi Ji, Fei Chao, Jiebo Luo, Rongrong Ji | http://arxiv.org/pdf/2411.13093v1 | null |
2024-11-20 | Bounding-box Watermarking: Defense against Model Extraction Attacks on Object Detectors | 边界框水印:针对目标检测器模型提取攻击的防御 | Satoru Koda, Ikuya Morikawa | http://arxiv.org/pdf/2411.13047v1 | null |
2024-11-20 | Prior-based Objective Inference Mining Potential Uncertainty for Facial Expression Recognition | 基于先验的目标推断挖掘面部表情识别中的潜在不确定性 | Hanwei Liu, Huiling Cai, Qingcheng Lin, Xuefeng Li, Hui Xiao | http://arxiv.org/pdf/2411.13024v1 | null |
2024-11-20 | Open-World Amodal Appearance Completion | 开放式世界非模态外观完成 | Jiayang Ao, Yanbei Jiang, Qiuhong Ke, Krista A. Ehinger | http://arxiv.org/pdf/2411.13019v1 | null |
2024-11-20 | DT-LSD: Deformable Transformer-based Line Segment Detection | 基于可变形变换器的线段检测:DT-LSD | Sebastian Janampa, Marios Pattichis | http://arxiv.org/pdf/2411.13005v1 | null |
2024-11-20 | Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection | 开放集半监督目标检测中的协同特征-对数对比学习 | Xinhao Zhong, Siyu Jiao, Yao Zhao, Yunchao Wei | http://arxiv.org/pdf/2411.13001v1 | null |
2024-11-20 | Enhancing Thermal MOT: A Novel Box Association Method Leveraging Thermal Identity and Motion Similarity | 增强热MOT:利用热身份和运动相似性的新型框关联方法 | Wassim El Ahmar, Dhanvin Kolhatkar, Farzan Nowruzi, Robert Laganiere | http://arxiv.org/pdf/2411.12943v1 | null |
2024-11-20 | Topological Symmetry Enhanced Graph Convolution for Skeleton-Based Action Recognition | 拓扑对称性增强的基于骨架的动作识别图卷积 | Zeyu Liang, Hailun Xia, Naichuan Zheng, Huan Xu | http://arxiv.org/pdf/2411.12560v2 | link |
2024-11-20 | CLIP Unreasonable Potential in Single-Shot Face Recognition | CLIP在单次人脸识别中的巨大潜力 | Nhan T. Luu | http://arxiv.org/pdf/2411.12319v2 | null |
2024-11-20 | CRT-Fusion: Camera, Radar, Temporal Fusion Using Motion Information for 3D Object Detection | CRT-Fusion:利用运动信息实现相机、雷达、时序融合的3D目标检测 | Jisong Kim, Minjae Seong, Jun Won Choi | http://arxiv.org/pdf/2411.03013v2 | null |
2024-11-20 | Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models | 利用视觉数据上下文不确定性进行深度模型高效训练 | Sharat Agarwal | http://arxiv.org/pdf/2411.01925v2 | null |
2024-11-20 | TALoS: Enhancing Semantic Scene Completion via Test-time Adaptation on the Line of Sight | TALoS:通过视距测试时自适应增强语义场景补全 | Hyun-Kurl Jang, Jihun Kim, Hyeokjun Kweon, Kuk-Jin Yoon | http://arxiv.org/pdf/2410.15674v2 | link |
2024-11-20 | Multiview Scene Graph | 多视角场景图 | Juexiao Zhang, Gao Zhu, Sihang Li, Xinhao Liu, Haorui Song, Xinran Tang, Chen Feng | http://arxiv.org/pdf/2410.11187v3 | link |
2024-11-20 | SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data | SynFER:利用合成数据提升面部表情识别 | Xilin He, Cheng Luo, Xiaole Xian, Bing Li, Siyang Song, Muhammad Haris Khan, Weicheng Xie, Linlin Shen, Zongyuan Ge | http://arxiv.org/pdf/2410.09865v2 | null |
2024-11-20 | Classification of Buried Objects from Ground Penetrating Radar Images by using Second Order Deep Learning Models | 基于二阶深度学习模型的地下物体分类方法 | Douba Jafuno, Ammar Mian, Guillaume Ginolhac, Nickolas Stelzenmuller | http://arxiv.org/pdf/2410.07117v2 | null |
2024-11-20 | 3D-Aware Instance Segmentation and Tracking in Egocentric Videos | 第一人称视角视频中的3D感知实例分割与跟踪 | Yash Bhalgat, Vadim Tschernezki, Iro Laina, João F. Henriques, Andrea Vedaldi, Andrew Zisserman | http://arxiv.org/pdf/2408.09860v2 | null |
2024-11-20 | Occlusion-Aware Seamless Segmentation | 遮挡感知无缝分割 | Yihong Cao, Jiaming Zhang, Hao Shi, Kunyu Peng, Yuhongxuan Zhang, Hui Zhang, Rainer Stiefelhagen, Kailun Yang | http://arxiv.org/pdf/2407.02182v3 | link |
2024-11-20 | High-Level Parallelism and Nested Features for Dynamic Inference Cost and Top-Down Attention | 高级并行性和嵌套特征用于动态推理成本和自上而下注意力 | André Peter Kelm, Niels Hannemann, Bruno Heberle, Lucas Schmidt, Tim Rolff, Christian Wilms, Ehsan Yaghoubi, Simone Frintrop | http://arxiv.org/pdf/2308.05128v3 | null |
2024-11-20 | Smart Pressure e-Mat for Human Sleeping Posture and Dynamic Activity Recognition | 智能压力电子垫:用于人类睡眠姿势和动态活动识别 | Liangqi Yuan, Yuan Wei, Jia Li | http://arxiv.org/pdf/2305.11367v2 | null |
2024-11-20 | Self-Supervised Place Recognition by Refining Temporal and Featural Pseudo Labels from Panoramic Data | 基于全景数据的时序和特征伪标签精炼的自监督地点识别 | Chao Chen, Zegang Cheng, Xinhao Liu, Yiming Li, Li Ding, Ruoyu Wang, Chen Feng | http://arxiv.org/pdf/2208.09315v3 | null |
2024-11-20 | Word-level Sign Language Recognition with Multi-stream Neural Networks Focusing on Local Regions and Skeletal Information | 基于多流神经网络的词汇级手语识别:聚焦于局部区域和骨骼信息 | Mizuki Maruyama, Shrey Singh, Katsufumi Inoue, Partha Pratim Roy, Masakazu Iwamura, Michifumi Yoshioka | http://arxiv.org/pdf/2106.15989v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-20 | Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration | 图像恢复中的深度展开方法旋转等变近端算子 | Jiahong Fu, Qi Xie, Deyu Meng, Zongben Xu | http://arxiv.org/pdf/2312.15701v2 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-20 | Unification of Balti and trans-border sister dialects in the essence of LLMs and AI Technology | 基于LLMs和AI技术的巴尔提语及其跨境姐妹方言统一本质 | Muhammad Sharif, Jiangyan Yi, Muhammad Shoaib | http://arxiv.org/pdf/2411.13409v1 | null |
2024-11-20 | On the Consistency of Video Large Language Models in Temporal Comprehension | 视频大语言模型在时间理解上的一致性 | Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang, Angela Yao | http://arxiv.org/pdf/2411.12951v1 | null |
2024-11-20 | MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | MiniDrive:基于多层2D特征作为文本标记的更高效视觉-语言模型,以应用于自动驾驶 | Enming Zhang, Xingyuan Dai, Yisheng Lv, Qinghai Miao | http://arxiv.org/pdf/2409.07267v4 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-20 | Practical Compact Deep Compressed Sensing | 实用紧凑深度压缩感知 | Bin Chen, Jian Zhang | http://arxiv.org/pdf/2411.13081v1 | null |
2024-11-20 | Attentive Contextual Attention for Cloud Removal | 云消除的注意力上下文关注 | Wenli Huang, Ye Deng, Yang Wu, Jinjun Wang | http://arxiv.org/pdf/2411.13042v1 | null |
2024-11-20 | RobustFormer: Noise-Robust Pre-training for images and videos | 鲁棒Former:图像和视频的噪声鲁棒预训练 | Ashish Bastola, Nishant Luitel, Hao Wang, Danda Pani Paudel, Roshani Poudel, Abolfazl Razi | http://arxiv.org/pdf/2411.13040v1 | null |
2024-11-20 | JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation | 基于扩散音驱动的面部动态和头部运动生成的肖像和动物图像动画:JoyVASA | Xuyang Cao, Guoxin Wang, Sheng Shi, Jun Zhao, Yang Yao, Jintao Fei, Minyu Gao | http://arxiv.org/pdf/2411.09209v3 | link |
2024-11-20 | Random Representations Outperform Online Continually Learned Representations | 随机表示优于在线持续学习的表示 | Ameya Prabhu, Shiven Sinha, Ponnurangam Kumaraguru, Philip H. S. Torr, Ozan Sener, Puneet K. Dokania | http://arxiv.org/pdf/2402.08823v3 | link |
2024-11-20 | Informative Scene Graph Generation via Debiasing | 基于去偏见的场景图生成 | Lianli Gao, Xinyu Lyu, Yuyu Guo, Yuxuan Hu, Yuan-Fang Li, Lu Xu, Heng Tao Shen, Jingkuan Song | http://arxiv.org/pdf/2308.05286v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-20 | Unbiased Scene Graph Generation by Type-Aware Message Passing on Heterogeneous and Dual Graphs | 基于异构和双图上的类型感知消息传递的无偏场景图生成 | Guanglu Sun, Jin Qiu, Lili Liang | http://arxiv.org/pdf/2411.13287v1 | null |
2024-11-20 | ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations | ESARM:基于自动排名演示的奖励模型实现3D情感语音到动画 | Xulong Zhang, Xiaoyang Qu, Haoxiang Shi, Chunguang Xiao, Jianzong Wang | http://arxiv.org/pdf/2411.13089v1 | null |
2024-11-20 | X as Supervision: Contending with Depth Ambiguity in Unsupervised Monocular 3D Pose Estimation | X作为监督:解决无监督单目3D姿态估计中的深度模糊性 | Yuchen Yang, Xuanyi Liu, Xing Gao, Zhihang Zhong, Xiao Sun | http://arxiv.org/pdf/2411.13026v1 | null |
2024-11-20 | M3D: Dual-Stream Selective State Spaces and Depth-Driven Framework for High-Fidelity Single-View 3D Reconstruction | M3D:高保真单视图3D重建的流式双流选择性状态空间和深度驱动框架 | Luoxi Zhang, Pragyan Shrestha, Yu Zhou, Chun Xie, Itaru Kitahara | http://arxiv.org/pdf/2411.12635v2 | link |
2024-11-20 | Capsule Network Projectors are Equivariant and Invariant Learners | 胶囊网络投影器是等变和不变学习器 | Miles Everett, Aiden Durrant, Mingjun Zhong, Georgios Leontidis | http://arxiv.org/pdf/2405.14386v3 | link |
2024-11-20 | A community palm model | 社区棕榈模型 | Nicholas Clinton, Andreas Vollrath, Remi D'annunzio, Desheng Liu, Henry B. Glick, Adrià Descals, Alicia Sullivan, Oliver Guinan, Jacob Abramowitz, Fred Stolle, et al. | http://arxiv.org/pdf/2405.09530v2 | null |
2024-11-20 | HHAvatar: Gaussian Head Avatar with Dynamic Hairs | HHAvatar:动态发丝高斯头部形象 | Zhanfeng Liao, Yuelang Xu, Zhe Li, Qijing Li, Boyao Zhou, Ruifeng Bai, Di Xu, Hongwen Zhang, Yebin Liu | http://arxiv.org/pdf/2312.03029v3 | link |
2024-11-20 | Accurate Eye Tracking from Dense 3D Surface Reconstructions using Single-Shot Deflectometry | 基于单次偏折测量的密集3D表面重建中的精确眼动追踪 | Jiazhang Wang, Tianfu Wang, Bingjie Xu, Oliver Cossairt, Florian Willomitzer | http://arxiv.org/pdf/2308.07298v3 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-20 | AGLP: A Graph Learning Perspective for Semi-supervised Domain Adaptation | AGLP:半监督域适应的图学习视角 | Houcheng Su, Mengzhu Wang, Jiao Li, Nan Yin, Li Shen | http://arxiv.org/pdf/2411.13152v1 | null |
2024-11-20 | TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models | TAPT:视觉-语言模型中鲁棒推理的测试时对抗提示调整 | Xin Wang, Kai Chen, Jiaming Zhang, Jingjing Chen, Xingjun Ma | http://arxiv.org/pdf/2411.13136v1 | null |
2024-11-20 | Improving OOD Generalization of Pre-trained Encoders via Aligned Embedding-Space Ensembles | 通过对齐嵌入空间集成提升预训练编码器的OOD泛化能力 | Shuman Peng, Arash Khoeini, Sharan Vaswani, Martin Ester | http://arxiv.org/pdf/2411.13073v1 | null |
2024-11-20 | Training Physics-Driven Deep Learning Reconstruction without Raw Data Access for Equitable Fast MRI | 无原始数据访问权限下基于物理驱动的深度学习重建技术,实现公平快速磁共振成像 | Yaşar Utku Alçalar, Merve Gülle, Mehmet Akçakaya | http://arxiv.org/pdf/2411.13022v1 | null |
2024-11-20 | VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer | VAST:通过零样本表情风格迁移使您的说话虚拟形象生动起来 | Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao | http://arxiv.org/pdf/2308.04830v3 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-20 | HF-Diff: High-Frequency Perceptual Loss and Distribution Matching for One-Step Diffusion-Based Image Super-Resolution | HF-Diff:高频感知损失与分布匹配的基于一步扩散的图像超分辨率 | Shoaib Meraj Sami, Md Mahedi Hasan, Jeremy Dawson, Nasser Nasrabadi | http://arxiv.org/pdf/2411.13548v1 | null |
2024-11-20 | Quantum-Brain: Quantum-Inspired Neural Network Approach to Vision-Brain Understanding | 量子脑:基于量子启发的神经网络方法实现视觉-大脑理解 | Hoang-Quan Nguyen, Xuan-Bac Nguyen, Hugh Churchill, Arabinda Kumar Choudhary, Pawan Sinha, Samee U. Khan, Khoa Luu | http://arxiv.org/pdf/2411.13378v1 | null |
2024-11-20 | WHALES: A Multi-agent Scheduling Dataset for Enhanced Cooperation in Autonomous Driving | 鲸鱼:增强自动驾驶中多智能体合作的调度数据集 | Siwei Chen, Yinsong Wang, Ziyi Song, Sheng Zhou | http://arxiv.org/pdf/2411.13340v1 | null |
2024-11-20 | Can Reasons Help Improve Pedestrian Intent Estimation? A Cross-Modal Approach | 原因能助于提升行人意图估计吗?一种跨模态方法 | Vaishnavi Khindkar, Vineeth Balasubramanian, Chetan Arora, Anbumani Subramanian, C. V. Jawahar | http://arxiv.org/pdf/2411.13302v1 | null |
2024-11-20 | Analysis and Synthesis Denoisers for Forward-Backward Plug-and-Play Algorithms | 正向-反向插件式算法的降噪分析与综合 | Matthieu Kowalski, Benoît Malézieux, Thomas Moreau, Audrey Repetti | http://arxiv.org/pdf/2411.13276v1 | null |
2024-11-20 | An Integrated Approach to Robotic Object Grasping and Manipulation | 机器人抓取与操作的综合方法 | Owais Ahmed, M Huzaifa, M Areeb, Hamza Ali Khan | http://arxiv.org/pdf/2411.13205v1 | null |
2024-11-20 | SONNET: Enhancing Time Delay Estimation by Leveraging Simulated Audio | SONNET:通过利用模拟音频增强时间延迟估计 | Erik Tegler, Magnus Oskarsson, Kalle Åström | http://arxiv.org/pdf/2411.13179v1 | null |
2024-11-20 | Globally Correlation-Aware Hard Negative Generation | 全局相关性感知的硬负样本生成 | Wenjie Peng, Hongxiang Huang, Tianshui Chen, Quhui Ke, Gang Dai, Shuangping Huang | http://arxiv.org/pdf/2411.13145v1 | null |
2024-11-20 | Superpixel Cost Volume Excitation for Stereo Matching | 立体匹配中的超像素代价体积激发 | Shanglong Liu, Lin Qi, Junyu Dong, Wenxiang Gu, Liyi Xu | http://arxiv.org/pdf/2411.13105v1 | null |
2024-11-20 | Automatic marker-free registration based on similar tetrahedras for single-tree point clouds | 基于相似四面体的单树点云自动无标记配准 | Jing Ren, Pei Wang, Hanlong Li, Yuhan Wu, Yuhang Gao, Wenxin Chen, Mingtai Zhang, Lingyun Zhang | http://arxiv.org/pdf/2411.13069v1 | null |
2024-11-20 | Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation | 朝向无偏和鲁棒的时空场景图生成与预测 | Rohith Peddi, Saurabh, Ayush Abhay Shrivastava, Parag Singla, Vibhav Gogate | http://arxiv.org/pdf/2411.13059v1 | null |
2024-11-20 | Chanel-Orderer: A Channel-Ordering Predictor for Tri-Channel Natural Images | Chanel-Orderer:三通道自然图像通道排序预测器 | Shen Li, Lei Jiang, Wei Wang, Hongwei Hu, Liang Li | http://arxiv.org/pdf/2411.13021v1 | null |
2024-11-20 | Generation of synthetic gait data: application to multiple sclerosis patients' gait patterns | 合成步态数据生成:应用于多发性硬化症患者步态模式 | Klervi Le Gall, Lise Bellanger, David Laplaud, Aymeric Stamm | http://arxiv.org/pdf/2411.10377v2 | null |
2024-11-20 | Exploring the Low-Pass Filtering Behavior in Image Super-Resolution | 探索图像超分辨率中的低通滤波行为 | Haoyu Deng, Zijing Xu, Yule Duan, Xiao Wu, Wenjie Shu, Liang-Jian Deng | http://arxiv.org/pdf/2405.07919v4 | link |
2024-11-20 | PDE-CNNs: Axiomatic Derivations and Applications | 偏微分方程卷积神经网络:公理化推导与应用 | Gijs Bellaard, Sei Sakata, Bart M. N. Smets, Remco Duits | http://arxiv.org/pdf/2403.15182v3 | null |
2024-11-20 | CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement | 编码先验引导的压缩视频质量提升聚合网络 | Qiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, Shuyuan Zhu | http://arxiv.org/pdf/2403.10362v2 | null |