[UPDATED!] 2024-04-02 (Publish Time)

Generative Models

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-04-02 | GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image | Chong Bao, Yinda Zhang, Yuan Li, Xiyu Zhang, Bangbang Yang, Hujun Bao, Marc Pollefeys, Guofeng Zhang, Zhaopeng Cui | http://arxiv.org/pdf/2404.02152v1 | null |
| 2024-04-02 | Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models | Zeyu Yang, Zijie Pan, Chun Gu, Li Zhang | http://arxiv.org/pdf/2404.02148v1 | null |
| 2024-04-02 | 3D Congealing: 3D-Aware Image Alignment in the Wild | Yunzhi Zhang, Zizhang Li, Amit Raj, Andreas Engelhardt, Yuanzhen Li, Tingbo Hou, Jiajun Wu, Varun Jampani | http://arxiv.org/pdf/2404.02125v1 | null |
| 2024-04-02 | Neural Ordinary Differential Equation based Sequential Image Registration for Dynamic Characterization | Yifan Wu, Mengjin Dong, Rohit Jena, Chen Qin, James C. Gee | http://arxiv.org/pdf/2404.02106v1 | null |
| 2024-04-02 | WcDT: World-centric Diffusion Transformer for Traffic Scene Generation | Chen Yang, Aaron Xuxiang Tian, Dong Chen, Tianyu Shi, Arsalan Heydarian | http://arxiv.org/pdf/2404.02082v1 | null |
| 2024-04-02 | Bi-LORA: A Vision-Language Approach for Synthetic Image Detection | Mamadou Keita, Wassim Hamidouche, Hessen Bougueffa Eutamene, Abdenour Hadid, Abdelmalik Taleb-Ahmed | http://arxiv.org/pdf/2404.01959v1 | null |
| 2024-04-02 | 3D Scene Generation from Scene Graphs and Self-Attention | Pietro Bonazzi, Mengqi Wang, Diego Martin Arroyo, Fabian Manhardt, Federico Tombari | http://arxiv.org/pdf/2404.01887v1 | null |
| 2024-04-02 | Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model | Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu | http://arxiv.org/pdf/2404.01862v1 | null |
| 2024-04-02 | Contextual Embedding Learning to Enhance 2D Networks for Volumetric Image Segmentation | Zhuoyuan Wang, Dong Sun, Xiangyun Zeng, Ruodai Wu, Yi Wang | http://arxiv.org/pdf/2404.01723v1 | null |
| 2024-04-02 | Upsample Guidance: Scale Up Diffusion Models without Training | Juno Hwang, Yong-Hyun Park, Junghyo Jo | http://arxiv.org/pdf/2404.01709v1 | null |
| 2024-04-02 | MotionChain: Conversational Motion Controllers via Multimodal Prompts | Biao Jiang, Xin Chen, Chi Zhang, Fukun Yin, Zhuoyuan Li, Gang YU, Jiayuan Fan | http://arxiv.org/pdf/2404.01700v1 | null |
| 2024-04-02 | FashionEngine: Interactive Generation and Editing of 3D Clothed Humans | Tao Hu, Fangzhou Hong, Zhaoxi Chen, Ziwei Liu | http://arxiv.org/pdf/2404.01655v1 | null |
| 2024-04-02 | Diffusion Deepfake | Chaitali Bhattacharyya, Hanxiao Wang, Feng Zhang, Sungho Kim, Xiatian Zhu | http://arxiv.org/pdf/2404.01579v1 | null |

Multimodal

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-04-02 | Segment Any 3D Object with Language | Seungjun Lee, Yuyang Zhao, Gim Hee Lee | http://arxiv.org/pdf/2404.02157v1 | null |
| 2024-04-02 | ViTamin: Designing Scalable Vision Models in the Vision-Language Era | Jienneg Chen, Qihang Yu, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen | http://arxiv.org/pdf/2404.02132v1 | null |
| 2024-04-02 | IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT | Junchen Fu, Xuri Ge, Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Jie Wang, Joemon M Jose | http://arxiv.org/pdf/2404.02059v1 | null |
| 2024-04-02 | Unleash the Potential of CLIP for Video Highlight Detection | Donghoon Han, Seunghyeon Seo, Eunhwan Park, Seong-Uk Nam, Nojun Kwak | http://arxiv.org/pdf/2404.01745v1 | null |
| 2024-04-02 | PRISM-TopoMap: Online Topological Mapping with Place Recognition and Scan Matching | Kirill Muravyev, Alexander Melekhin, Dmitriy Yudin, Konstantin Yakovlev | http://arxiv.org/pdf/2404.01674v1 | null |
| 2024-04-02 | Leveraging YOLO-World and GPT-4V LMMs for Zero-Shot Person Detection and Action Recognition in Drone Imagery | Christian Limberg, Artur Gonçalves, Bastien Rigault, Helmut Prendinger | http://arxiv.org/pdf/2404.01571v1 | null |
| 2024-04-02 | mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning | Jingxuan Wei, Nan Xu, Guiyong Chang, Yin Luo, BiHui Yu, Ruifeng Guo | http://arxiv.org/pdf/2404.01548v1 | null |

NeRF

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-04-02 | Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields | Joshua Ahn, Haochen Wang, Raymond A. Yeh, Greg Shakhnarovich | http://arxiv.org/pdf/2404.02155v1 | null |

3DGS

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-04-02 | Surface Reconstruction from Gaussian Splatting via Novel Stereo Views | Yaniv Wolf, Amit Bracha, Ron Kimmel | http://arxiv.org/pdf/2404.01810v1 | null |

Model Compression/Optimization

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-04-02 | Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners | Keon-Hee Park, Kyungwoo Song, Gyeong-Moon Park | http://arxiv.org/pdf/2404.02117v1 | null |
| 2024-04-02 | Minimize Quantization Output Error with Bias Compensation | Cheng Gong, Haoshuai Zheng, Mengting Hu, Zheng Lin, Deng-Ping Fan, Yuzhi Zhang, Tao Li | http://arxiv.org/pdf/2404.01892v1 | null |
| 2024-04-02 | AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation | Rui Xie, Ying Tai, Kai Zhang, Zhenyu Zhang, Jun Zhou, Jian Yang | http://arxiv.org/pdf/2404.01717v1 | null |
| 2024-04-02 | Task Integration Distillation for Object Detectors | Hai Su, ZhenWen Jian, Songsen Yu | http://arxiv.org/pdf/2404.01699v1 | null |
| 2024-04-02 | RefQSR: Reference-based Quantization for Image Super-Resolution Networks | Hongjae Lee, Jun-Sang Yoo, Seung-Won Jung | http://arxiv.org/pdf/2404.01690v1 | null |
| 2024-04-02 | TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation | Yehui Shen, Mingmin Liu, Huimin Lu, Xieyuanli Chen | http://arxiv.org/pdf/2404.01587v1 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-04-02 | ResNet with Integrated Convolutional Block Attention Module for Ship Classification Using Transfer Learning on Optical Satellite Imagery | Ryan Donghan Kwon, Gangjoo Robin Nam, Jisoo Tak, Yeom Hyeok, Junseob Shin, Hyerin Cha, Kim Soo Bin | http://arxiv.org/pdf/2404.02135v1 | null |
| 2024-04-02 | ImageNot: A contrast with ImageNet preserves model rankings | Olawale Salaudeen, Moritz Hardt | http://arxiv.org/pdf/2404.02112v1 | null |
| 2024-04-02 | BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition | Alexandros Haliassos, Andreas Zinonos, Rodrigo Mira, Stavros Petridis, Maja Pantic | http://arxiv.org/pdf/2404.02098v1 | null |
| 2024-04-02 | Adaptive Feature Fusion Neural Network for Glaucoma Segmentation on Unseen Fundus Images | Jiyuan Zhong, Hu Ke, Ming Yan | http://arxiv.org/pdf/2404.02084v1 | null |
| 2024-04-02 | EGTR: Extracting Graph from Transformer for Scene Graph Generation | Jinbae Im, JeongYeon Nam, Nokyung Park, Hyungmin Lee, Seunghyun Park | http://arxiv.org/pdf/2404.02072v1 | null |
| 2024-04-02 | Red-Teaming Segment Anything Model | Krzysztof Jankowski, Bartlomiej Sobieski, Mateusz Kwiatkowski, Jakub Szulc, Michal Janik, Hubert Baniecki, Przemyslaw Biecek | http://arxiv.org/pdf/2404.02067v1 | null |
| 2024-04-02 | Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation | Hui Xiao, Yuting Hong, Li Dong, Diqun Yan, Jiayan Zhuang, Junjie Xiong, Dongtai Liang, Chengbin Peng | http://arxiv.org/pdf/2404.02065v1 | null |
| 2024-04-02 | Cooperative Students: Navigating Unsupervised Domain Adaptation in Nighttime Object Detection | Jicheng Yuan, Anh Le-Tuan, Manfred Hauswirth, Danh Le-Phuoc | http://arxiv.org/pdf/2404.01988v1 | null |
| 2024-04-02 | CAM-Based Methods Can See through Walls | Magamed Taimeskhanov, Ronan Sicre, Damien Garreau | http://arxiv.org/pdf/2404.01964v1 | null |
| 2024-04-02 | Automatic Wood Pith Detector: Local Orientation Estimation and Robust Accumulation | Henry Marichal, Diego Passarella, Gregory Randall | http://arxiv.org/pdf/2404.01952v1 | null |
| 2024-04-02 | Synthetic Data for Robust Stroke Segmentation | Liam Chalcroft, Ioannis Pappas, Cathy J. Price, John Ashburner | http://arxiv.org/pdf/2404.01946v1 | null |
| 2024-04-02 | Event-assisted Low-Light Video Object Segmentation | Hebei Li, Jin Wang, Jiahui Yuan, Yue Li, Wenming Weng, Yansong Peng, Yueyi Zhang, Zhiwei Xiong, Xiaoyan Sun | http://arxiv.org/pdf/2404.01945v1 | null |
| 2024-04-02 | PREGO: online mistake detection in PRocedural EGOcentric videos | Alessandro Flaborea, Guido Maria D'Amely di Melendugno, Leonardo Plini, Luca Scofano, Edoardo De Matteis, Antonino Furnari, Giovanni Maria Farinella, Fabio Galasso | http://arxiv.org/pdf/2404.01933v1 | null |
| 2024-04-02 | Towards Enhanced Analysis of Lung Cancer Lesions in EBUS-TBNA -- A Semi-Supervised Video Object Detection Method | Jyun-An Lin, Yun-Chien Cheng, Ching-Kai Lin | http://arxiv.org/pdf/2404.01929v1 | null |
| 2024-04-02 | Improving Bird's Eye View Semantic Segmentation by Task Decomposition | Tianhao Zhao, Yongcan Chen, Yu Wu, Tianyang Liu, Bo Du, Peilun Xiao, Shi Qiu, Hongda Yang, Guozhen Li, Yi Yang, et al. | http://arxiv.org/pdf/2404.01925v1 | null |
| 2024-04-02 | ASTRA: An Action Spotting TRAnsformer for Soccer Videos | Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés | http://arxiv.org/pdf/2404.01891v1 | null |
| 2024-04-02 | Scene Adaptive Sparse Transformer for Event-based Object Detection | Yansong Peng, Hebei Li, Yueyi Zhang, Xiaoyan Sun, Feng Wu | http://arxiv.org/pdf/2404.01882v1 | null |
| 2024-04-02 | Semi-Supervised Domain Adaptation for Wildfire Detection | JooYoung Jang, Youngseo Cha, Jisu Kim, SooHyung Lee, Geonu Lee, Minkook Cho, Young Hwang, Nojun Kwak | http://arxiv.org/pdf/2404.01842v1 | null |
| 2024-04-02 | Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection | Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal | http://arxiv.org/pdf/2404.01819v1 | null |
| 2024-04-02 | Rethinking Annotator Simulation: Realistic Evaluation of Whole-Body PET Lesion Interactive Segmentation Methods | Zdravko Marinov, Moon Kim, Jens Kleesiek, Rainer Stiefelhagen | http://arxiv.org/pdf/2404.01816v1 | null |
| 2024-04-02 | EventSleep: Sleep Activity Recognition with Event Cameras | Carlos Plou, Nerea Gallego, Alberto Sabater, Eduardo Montijano, Pablo Urcola, Luis Montesano, Ruben Martinez-Cantin, Ana C. Murillo | http://arxiv.org/pdf/2404.01801v1 | null |
| 2024-04-02 | Super-Resolution Analysis for Landfill Waste Classification | Matias Molina, Rita P. Ribeiro, Bruno Veloso, João Gama | http://arxiv.org/pdf/2404.01790v1 | null |
| 2024-04-02 | A noisy elephant in the room: Is your out-of-distribution detector robust to label noise? | Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund | http://arxiv.org/pdf/2404.01775v1 | null |
| 2024-04-02 | Guidelines for Cerebrovascular Segmentation: Managing Imperfect Annotations in the context of Semi-Supervised Learning | Pierre Rougé, Pierre-Henri Conze, Nicolas Passat, Odyssée Merveille | http://arxiv.org/pdf/2404.01765v1 | null |
| 2024-04-02 | Atom-Level Optical Chemical Structure Recognition with Limited Supervision | Martijn Oldenhof, Edward De Brouwer, Adam Arany, Yves Moreau | http://arxiv.org/pdf/2404.01743v1 | null |
| 2024-04-02 | Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge | Haoxiang Ma, Modi Shi, Boyang Gao, Di Huang | http://arxiv.org/pdf/2404.01727v1 | null |
| 2024-04-02 | Disentangled Pre-training for Human-Object Interaction Detection | Zhuolong Li, Xingao Li, Changxing Ding, Xiangmin Xu | http://arxiv.org/pdf/2404.01725v1 | null |
| 2024-04-02 | Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model | Qinfeng Zhu, Yuanzhi Cai, Yuan Fang, Yihan Yang, Cheng Chen, Lei Fan, Anh Nguyen | http://arxiv.org/pdf/2404.01705v1 | null |
| 2024-04-02 | Boosting Visual Recognition for Autonomous Driving in Real-world Degradations with Deep Channel Prior | Zhanwen Liu, Yuhang Li, Yang Wang, Bolin Gao, Yisheng An, Xiangmo Zhao | http://arxiv.org/pdf/2404.01703v1 | null |
| 2024-04-02 | Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss | Jaeha Kim, Junghun Oh, Kyoung Mu Lee | http://arxiv.org/pdf/2404.01692v1 | null |
| 2024-04-02 | JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments | Duy-Tho Le, Chenhui Gou, Stavya Datta, Hengcan Shi, Ian Reid, Jianfei Cai, Hamid Rezatofighi | http://arxiv.org/pdf/2404.01686v1 | null |
| 2024-04-02 | A Universal Knowledge Embedded Contrastive Learning Framework for Hyperspectral Image Classification | Quanwei Liu, Yanni Dong, Tao Huang, Lefei Zhang, Bo Do | http://arxiv.org/pdf/2404.01673v1 | null |
| 2024-04-02 | Supporting Mitosis Detection AI Training with Inter-Observer Eye-Gaze Consistencies | Hongyan Gu, Zihan Yan, Ayesha Alvi, Brandon Day, Chunxu Yang, Zida Wu, Shino Magaki, Mohammad Haeri, Xiang 'Anthony' Chen | http://arxiv.org/pdf/2404.01656v1 | null |
| 2024-04-02 | A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection | Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai | http://arxiv.org/pdf/2404.01643v1 | null |
| 2024-04-02 | Learning to Control Camera Exposure via Reinforcement Learning | Kyunghyun Lee, Ukcheol Shin, Byeong-Uk Lee | http://arxiv.org/pdf/2404.01636v1 | null |
| 2024-04-02 | LR-FPN: Enhancing Remote Sensing Object Detection with Location Refined Feature Pyramid Network | Hanqian Li, Ruinan Zhang, Ye Pan, Junchi Ren, Fei Shen | http://arxiv.org/pdf/2404.01614v1 | null |
| 2024-04-02 | Language Model Guided Interpretable Video Action Reasoning | Ning Wang, Guangming Zhu, HS Li, Liang Zhang, Syed Afaq Ali Shah, Mohammed Bennamoun | http://arxiv.org/pdf/2404.01591v1 | null |
| 2024-04-02 | Learning Temporal Cues by Predicting Objects Move for Multi-camera 3D Object Detection | Seokha Moon, Hongbeen Park, Jungphil Kwon, Jaekoo Lee, Jinkyu Kim | http://arxiv.org/pdf/2404.01580v1 | null |
| 2024-04-02 | A Linear Time and Space Local Point Cloud Geometry Encoder via Vectorized Kernel Mixture (VecKM) | Dehao Yuan, Cornelia Fermüller, Tahseen Rabbani, Furong Huang, Yiannis Aloimonos | http://arxiv.org/pdf/2404.01568v1 | null |

OCR

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-04-02 | Release of Pre-Trained Models for the Japanese Language | Kei Sawada, Tianyu Zhao, Makoto Shing, Kentaro Mitsui, Akio Kaga, Yukiya Hono, Toshiaki Wakatsuki, Koh Mitsuda | http://arxiv.org/pdf/2404.01657v1 | null |

Image Understanding

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-04-02 | Specularity Factorization for Low-Light Enhancement | Saurabh Saini, P J Narayanan | http://arxiv.org/pdf/2404.01998v1 | null |
| 2024-04-02 | CSST Strong Lensing Preparation: a Framework for Detecting Strong Lenses in the Multi-color Imaging Survey by the China Survey Space Telescope (CSST) | Xu Li, Ruiqi Sun, Jiameng Lv, Peng Jia, Nan Li, Chengliang Wei, Zou Hu, Xinzhong Er, Yun Chen, Zhang Ban, et al. | http://arxiv.org/pdf/2404.01780v1 | null |

Transformer

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-04-02 | SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation | Vinkle Srivastav, Keqi Chen, Nicolas Padoy | http://arxiv.org/pdf/2404.02041v1 | null |
| 2024-04-02 | DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning | Mengfei Du, Binhao Wu, Jiwen Zhang, Zhihao Fan, Zejun Li, Ruipu Luo, Xuanjing Huang, Zhongyu Wei | http://arxiv.org/pdf/2404.01994v1 | link |
| 2024-04-02 | GEARS: Local Geometry-aware Hand-object Interaction Synthesis | Keyang Zhou, Bharat Lal Bhatnagar, Jan Eric Lenssen, Gerard Pons-moll | http://arxiv.org/pdf/2404.01758v1 | null |
| 2024-04-02 | ContrastCAD: Contrastive Learning-based Representation Learning for Computer-Aided Design Models | Minseop Jung, Minseong Kim, Jibum Kim | http://arxiv.org/pdf/2404.01645v1 | link |
| 2024-04-02 | WaveDH: Wavelet Sub-bands Guided ConvNet for Efficient Image Dehazing | Seongmin Hwang, Daeyoung Han, Cheolkon Jung, Moongu Jeon | http://arxiv.org/pdf/2404.01604v1 | null |
| 2024-04-02 | Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining | Xiang Chen, Jinshan Pan, Jiangxin Dong | http://arxiv.org/pdf/2404.01547v1 | null |

3D/CG

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-04-02 | A discussion about violin reduction: geometric analysis of contour lines and channel of minima | Philémon Beghin, Anne-Emmanuelle Ceulemans, François Glineur | http://arxiv.org/pdf/2404.01995v1 | null |
| 2024-04-02 | Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation | Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Junjie Hu, Ming Jiang, Shuqiang Jiang | http://arxiv.org/pdf/2404.01943v1 | link |
| 2024-04-02 | LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging | Haoyang Ge, Qiao Feng, Hailong Jia, Xiongzheng Li, Xiangjun Yin, You Zhou, Jingyu Yang, Kun Li | http://arxiv.org/pdf/2404.01941v1 | null |
| 2024-04-02 | Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation | Wangguandong Zheng, Haifeng Xia, Rui Chen, Ming Shao, Siyu Xia, Zhengming Ding | http://arxiv.org/pdf/2404.01843v1 | null |
| 2024-04-02 | Spin-UP: Spin Light for Natural Light Uncalibrated Photometric Stereo | Zongrui Li, Zhan Lu, Haojie Yan, Boxin Shi, Gang Pan, Qian Zheng, Xudong Jiang | http://arxiv.org/pdf/2404.01612v1 | null |
| 2024-04-02 | Leveraging Digital Perceptual Technologies for Remote Perception and Analysis of Human Biomechanical Processes: A Contactless Approach for Workload and Joint Force Assessment | Jesudara Omidokun, Darlington Egeonu, Bochen Jia, Liang Yang | http://arxiv.org/pdf/2404.01576v1 | null |
| 2024-04-02 | Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes | Ziqian Bai, Feitong Tan, Sean Fanello, Rohit Pandey, Mingsong Dou, Shichen Liu, Ping Tan, Yinda Zhang | http://arxiv.org/pdf/2404.01543v1 | null |

Learning Paradigms

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-04-02 | Iterated Learning Improves Compositionality in Large Vision-Language Models | Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi, Ranjay Krishna | http://arxiv.org/pdf/2404.02145v1 | null |
| 2024-04-02 | VLRM: Vision-Language Models act as Reward Models for Image Captioning | Maksim Dzabraev, Alexander Kunitsyn, Andrei Ivaniuta | http://arxiv.org/pdf/2404.01911v1 | null |
| 2024-04-02 | RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement | Tatiana Gaintseva, Marting Benning, Gregory Slabaugh | http://arxiv.org/pdf/2404.01889v1 | null |
| 2024-04-02 | Pairwise Similarity Distribution Clustering for Noisy Label Learning | Sihan Bai | http://arxiv.org/pdf/2404.01853v1 | null |
| 2024-04-02 | T-VSL: Text-Guided Visual Sound Source Localization in Mixtures | Tanvir Mahmud, Yapeng Tian, Diana Marculescu | http://arxiv.org/pdf/2404.01751v1 | null |
| 2024-04-02 | Learning Equi-angular Representations for Online Continual Learning | Minhyuk Seo, Hyunseo Koh, Wonje Jeung, Minjae Lee, San Kim, Hankook Lee, Sungjun Cho, Sungik Choi, Hyunwoo Kim, Jonghyun Choi | http://arxiv.org/pdf/2404.01628v1 | null |

Others

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2024-04-02 | Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration | Akshay Dudhane, Omkar Thawakar, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang | http://arxiv.org/pdf/2404.02154v1 | null |
| 2024-04-02 | CameraCtrl: Enabling Camera Control for Text-to-Video Generation | Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang | http://arxiv.org/pdf/2404.02101v1 | link |
| 2024-04-02 | Causality-based Transfer of Driving Scenarios to Unseen Intersections | Christoph Glasmacher, Michael Schuldes, Sleiman El Masri, Lutz Eckstein | http://arxiv.org/pdf/2404.02046v1 | null |
| 2024-04-02 | Fashion Style Editing with Generative Human Prior | Chaerin Kong, Seungyong Lee, Soohyeok Im, Wonsuk Yang | http://arxiv.org/pdf/2404.01984v1 | null |
| 2024-04-02 | Joint-Task Regularization for Partially Labeled Multi-Task Learning | Kento Nishi, Junsik Kim, Wanhua Li, Hanspeter Pfister | http://arxiv.org/pdf/2404.01976v1 | link |
| 2024-04-02 | Quantifying Noise of Dynamic Vision Sensor | Evgeny V. Votyakov, Alessandro Artusi | http://arxiv.org/pdf/2404.01948v1 | null |
| 2024-04-02 | Toward Efficient Visual Gyroscopes: Spherical Moments, Harmonics Filtering, and Masking Techniques for Spherical Camera Applications | Yao Du, Carlos M. Mateo, Mirjana Maras, Tsun-Hsuan Wang, Marc Blanchon, Alexander Amini, Daniela Rus, Omar Tahri | http://arxiv.org/pdf/2404.01924v1 | null |
| 2024-04-02 | Real, fake and synthetic faces - does the coin have three sides? | Shahzeb Naeem, Ramzi Al-Sharawi, Muhammad Riyyan Khan, Usman Tariq, Abhinav Dhall, Hasan Al-Nashash | http://arxiv.org/pdf/2404.01878v1 | null |
| 2024-04-02 | Exploring Latent Pathways: Enhancing the Interpretability of Autonomous Driving with a Variational Autoencoder | Anass Bairouk, Mirjana Maras, Simon Herlin, Alexander Amini, Marc Blanchon, Ramin Hasani, Patrick Chareyre, Daniela Rus | http://arxiv.org/pdf/2404.01750v1 | null |
| 2024-04-02 | Global Mapping of Exposure and Physical Vulnerability Dynamics in Least Developed Countries using Remote Sensing and Machine Learning | Joshua Dimasaka, Christian Geiß, Emily So | http://arxiv.org/pdf/2404.01748v1 | null |
| 2024-04-02 | Conjugate-Gradient-like Based Adaptive Moment Estimation Optimization Algorithm for Deep Learning | Jiawu Tian, Liwei Xu, Xiaowei Zhang, Yongqi Li | http://arxiv.org/pdf/2404.01714v1 | null |
| 2024-04-02 | AI WALKUP: A Computer-Vision Approach to Quantifying MDS-UPDRS in Parkinson's Disease | Xiang Xiang, Zihan Zhang, Jing Ma, Yao Deng | http://arxiv.org/pdf/2404.01654v1 | null |
| 2024-04-02 | EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis | Shuai Tan, Bin Ji, Mengxiao Bi, Ye Pan | http://arxiv.org/pdf/2404.01647v1 | null |
| 2024-04-02 | Two-Phase Multi-Dose-Level PET Image Reconstruction with Dose Level Awareness | Yuchen Fei, Yanmei Luo, Yan Wang, Jiaqi Cui, Yuanyuan Xu, Jiliu Zhou, Dinggang Shen | http://arxiv.org/pdf/2404.01563v1 | null |