Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-02 | GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image | GeneAvatar:从单个图像编辑通用表达感知体积头部头像 | Chong Bao, Yinda Zhang, Yuan Li, Xiyu Zhang, Bangbang Yang, Hujun Bao, Marc Pollefeys, Guofeng Zhang, Zhaopeng Cui | http://arxiv.org/pdf/2404.02152v1 | null |
2024-04-02 | Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models | Diffusion$^2$:通过正交扩散模型的分数组合生成动态 3D 内容 | Zeyu Yang, Zijie Pan, Chun Gu, Li Zhang | http://arxiv.org/pdf/2404.02148v1 | null |
2024-04-02 | 3D Congealing: 3D-Aware Image Alignment in the Wild | 3D 凝结:野外 3D 感知图像对齐 | Yunzhi Zhang, Zizhang Li, Amit Raj, Andreas Engelhardt, Yuanzhen Li, Tingbo Hou, Jiajun Wu, Varun Jampani | http://arxiv.org/pdf/2404.02125v1 | null |
2024-04-02 | Neural Ordinary Differential Equation based Sequential Image Registration for Dynamic Characterization | 基于神经常微分方程的动态表征序列图像配准 | Yifan Wu, Mengjin Dong, Rohit Jena, Chen Qin, James C. Gee | http://arxiv.org/pdf/2404.02106v1 | null |
2024-04-02 | WcDT: World-centric Diffusion Transformer for Traffic Scene Generation | WcDT:用于生成交通场景的以世界为中心的扩散 Transformer | Chen Yang, Aaron Xuxiang Tian, Dong Chen, Tianyu Shi, Arsalan Heydarian | http://arxiv.org/pdf/2404.02082v1 | null |
2024-04-02 | Bi-LORA: A Vision-Language Approach for Synthetic Image Detection | Bi-LORA:一种用于合成图像检测的视觉语言方法 | Mamadou Keita, Wassim Hamidouche, Hessen Bougueffa Eutamene, Abdenour Hadid, Abdelmalik Taleb-Ahmed | http://arxiv.org/pdf/2404.01959v1 | null |
2024-04-02 | 3D Scene Generation from Scene Graphs and Self-Attention | 基于场景图和自注意力的 3D 场景生成 | Pietro Bonazzi, Mengqi Wang, Diego Martin Arroyo, Fabian Manhardt, Federico Tombari | http://arxiv.org/pdf/2404.01887v1 | null |
2024-04-02 | Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model | 通过运动解耦扩散模型生成协同语音手势视频 | Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu | http://arxiv.org/pdf/2404.01862v1 | null |
2024-04-02 | Contextual Embedding Learning to Enhance 2D Networks for Volumetric Image Segmentation | 上下文嵌入学习增强 2D 网络的体积图像分割 | Zhuoyuan Wang, Dong Sun, Xiangyun Zeng, Ruodai Wu, Yi Wang | http://arxiv.org/pdf/2404.01723v1 | null |
2024-04-02 | Upsample Guidance: Scale Up Diffusion Models without Training | 上采样指导:无需训练即可扩大扩散模型 | Juno Hwang, Yong-Hyun Park, Junghyo Jo | http://arxiv.org/pdf/2404.01709v1 | null |
2024-04-02 | MotionChain: Conversational Motion Controllers via Multimodal Prompts | MotionChain:通过多模式提示的对话式运动控制器 | Biao Jiang, Xin Chen, Chi Zhang, Fukun Yin, Zhuoyuan Li, Gang YU, Jiayuan Fan | http://arxiv.org/pdf/2404.01700v1 | null |
2024-04-02 | FashionEngine: Interactive Generation and Editing of 3D Clothed Humans | FashionEngine:3D 服装人体的交互式生成和编辑 | Tao Hu, Fangzhou Hong, Zhaoxi Chen, Ziwei Liu | http://arxiv.org/pdf/2404.01655v1 | null |
2024-04-02 | Diffusion Deepfake | 扩散 Deepfake | Chaitali Bhattacharyya, Hanxiao Wang, Feng Zhang, Sungho Kim, Xiatian Zhu | http://arxiv.org/pdf/2404.01579v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-02 | Segment Any 3D Object with Language | 使用语言分割任何 3D 对象 | Seungjun Lee, Yuyang Zhao, Gim Hee Lee | http://arxiv.org/pdf/2404.02157v1 | null |
2024-04-02 | ViTamin: Designing Scalable Vision Models in the Vision-Language Era | ViTamin:在视觉语言时代设计可扩展的视觉模型 | Jieneng Chen, Qihang Yu, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen | http://arxiv.org/pdf/2404.02132v1 | null |
2024-04-02 | IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT | IISAN:通过解耦 PEFT 有效调整多模态表示以实现顺序推荐 | Junchen Fu, Xuri Ge, Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Jie Wang, Joemon M Jose | http://arxiv.org/pdf/2404.02059v1 | null |
2024-04-02 | Unleash the Potential of CLIP for Video Highlight Detection | 释放 CLIP 在视频精彩片段检测方面的潜力 | Donghoon Han, Seunghyeon Seo, Eunhwan Park, Seong-Uk Nam, Nojun Kwak | http://arxiv.org/pdf/2404.01745v1 | null |
2024-04-02 | PRISM-TopoMap: Online Topological Mapping with Place Recognition and Scan Matching | PRISM-TopoMap:具有地点识别和扫描匹配功能的在线拓扑测绘 | Kirill Muravyev, Alexander Melekhin, Dmitriy Yudin, Konstantin Yakovlev | http://arxiv.org/pdf/2404.01674v1 | null |
2024-04-02 | Leveraging YOLO-World and GPT-4V LMMs for Zero-Shot Person Detection and Action Recognition in Drone Imagery | 利用 YOLO-World 和 GPT-4V LMM 进行无人机图像中的零样本人物检测和动作识别 | Christian Limberg, Artur Gonçalves, Bastien Rigault, Helmut Prendinger | http://arxiv.org/pdf/2404.01571v1 | null |
2024-04-02 | mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning | mChartQA:基于视觉语言对齐和推理的多模式图表问答的通用基准 | Jingxuan Wei, Nan Xu, Guiyong Chang, Yin Luo, BiHui Yu, Ruifeng Guo | http://arxiv.org/pdf/2404.01548v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-02 | Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields | Alpha 不变性:关于神经辐射场中距离和体积密度之间的逆缩放 | Joshua Ahn, Haochen Wang, Raymond A. Yeh, Greg Shakhnarovich | http://arxiv.org/pdf/2404.02155v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-02 | Surface Reconstruction from Gaussian Splatting via Novel Stereo Views | 通过新颖的立体视图从高斯泼溅重建表面 | Yaniv Wolf, Amit Bracha, Ron Kimmel | http://arxiv.org/pdf/2404.01810v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-02 | Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners | 预训练的视觉和语言 Transformer 是少样本增量学习器 | Keon-Hee Park, Kyungwoo Song, Gyeong-Moon Park | http://arxiv.org/pdf/2404.02117v1 | null |
2024-04-02 | Minimize Quantization Output Error with Bias Compensation | 通过偏置补偿最小化量化输出误差 | Cheng Gong, Haoshuai Zheng, Mengting Hu, Zheng Lin, Deng-Ping Fan, Yuzhi Zhang, Tao Li | http://arxiv.org/pdf/2404.01892v1 | null |
2024-04-02 | AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation | AddSR:通过对抗性扩散蒸馏加速基于扩散的盲超分辨率 | Rui Xie, Ying Tai, Kai Zhang, Zhenyu Zhang, Jun Zhou, Jian Yang | http://arxiv.org/pdf/2404.01717v1 | null |
2024-04-02 | Task Integration Distillation for Object Detectors | 目标检测器的任务集成蒸馏 | Hai Su, ZhenWen Jian, Songsen Yu | http://arxiv.org/pdf/2404.01699v1 | null |
2024-04-02 | RefQSR: Reference-based Quantization for Image Super-Resolution Networks | RefQSR:图像超分辨率网络的基于参考的量化 | Hongjae Lee, Jun-Sang Yoo, Seung-Won Jung | http://arxiv.org/pdf/2404.01690v1 | null |
2024-04-02 | TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation | TSCM:使用跨度量知识蒸馏的视觉位置识别师生模型 | Yehui Shen, Mingmin Liu, Huimin Lu, Xieyuanli Chen | http://arxiv.org/pdf/2404.01587v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-02 | ResNet with Integrated Convolutional Block Attention Module for Ship Classification Using Transfer Learning on Optical Satellite Imagery | 具有集成卷积块注意模块的 ResNet,用于在光学卫星图像上使用迁移学习进行船舶分类 | Ryan Donghan Kwon, Gangjoo Robin Nam, Jisoo Tak, Yeom Hyeok, Junseob Shin, Hyerin Cha, Kim Soo Bin | http://arxiv.org/pdf/2404.02135v1 | null |
2024-04-02 | ImageNot: A contrast with ImageNet preserves model rankings | ImageNot:与 ImageNet 的对比保留了模型排名 | Olawale Salaudeen, Moritz Hardt | http://arxiv.org/pdf/2404.02112v1 | null |
2024-04-02 | BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition | BRAVEn:改进视觉和听觉语音识别的自我监督预训练 | Alexandros Haliassos, Andreas Zinonos, Rodrigo Mira, Stavros Petridis, Maja Pantic | http://arxiv.org/pdf/2404.02098v1 | null |
2024-04-02 | Adaptive Feature Fusion Neural Network for Glaucoma Segmentation on Unseen Fundus Images | 用于看不见的眼底图像上的青光眼分割的自适应特征融合神经网络 | Jiyuan Zhong, Hu Ke, Ming Yan | http://arxiv.org/pdf/2404.02084v1 | null |
2024-04-02 | EGTR: Extracting Graph from Transformer for Scene Graph Generation | EGTR:从 Transformer 中提取图以生成场景图 | Jinbae Im, JeongYeon Nam, Nokyung Park, Hyungmin Lee, Seunghyun Park | http://arxiv.org/pdf/2404.02072v1 | null |
2024-04-02 | Red-Teaming Segment Anything Model | 对 Segment Anything Model 进行红队测试 | Krzysztof Jankowski, Bartlomiej Sobieski, Mateusz Kwiatkowski, Jakub Szulc, Michal Janik, Hubert Baniecki, Przemyslaw Biecek | http://arxiv.org/pdf/2404.02067v1 | null |
2024-04-02 | Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation | 通过提取近似模式进行半监督语义分割的多级标签校正 | Hui Xiao, Yuting Hong, Li Dong, Diqun Yan, Jiayan Zhuang, Junjie Xiong, Dongtai Liang, Chengbin Peng | http://arxiv.org/pdf/2404.02065v1 | null |
2024-04-02 | Cooperative Students: Navigating Unsupervised Domain Adaptation in Nighttime Object Detection | 合作学生:在夜间物体检测中探索无监督域适应 | Jicheng Yuan, Anh Le-Tuan, Manfred Hauswirth, Danh Le-Phuoc | http://arxiv.org/pdf/2404.01988v1 | null |
2024-04-02 | CAM-Based Methods Can See through Walls | 基于 CAM 的方法可以看穿墙壁 | Magamed Taimeskhanov, Ronan Sicre, Damien Garreau | http://arxiv.org/pdf/2404.01964v1 | null |
2024-04-02 | Automatic Wood Pith Detector: Local Orientation Estimation and Robust Accumulation | 自动木髓检测器:局部方向估计和鲁棒累积 | Henry Marichal, Diego Passarella, Gregory Randall | http://arxiv.org/pdf/2404.01952v1 | null |
2024-04-02 | Synthetic Data for Robust Stroke Segmentation | 用于稳健脑卒中分割的合成数据 | Liam Chalcroft, Ioannis Pappas, Cathy J. Price, John Ashburner | http://arxiv.org/pdf/2404.01946v1 | null |
2024-04-02 | Event-assisted Low-Light Video Object Segmentation | 事件辅助低光视频对象分割 | Hebei Li, Jin Wang, Jiahui Yuan, Yue Li, Wenming Weng, Yansong Peng, Yueyi Zhang, Zhiwei Xiong, Xiaoyan Sun | http://arxiv.org/pdf/2404.01945v1 | null |
2024-04-02 | PREGO: online mistake detection in PRocedural EGOcentric videos | PREGO:程序性以自我为中心的视频中的在线错误检测 | Alessandro Flaborea, Guido Maria D'Amely di Melendugno, Leonardo Plini, Luca Scofano, Edoardo De Matteis, Antonino Furnari, Giovanni Maria Farinella, Fabio Galasso | http://arxiv.org/pdf/2404.01933v1 | null |
2024-04-02 | Towards Enhanced Analysis of Lung Cancer Lesions in EBUS-TBNA -- A Semi-Supervised Video Object Detection Method | EBUS-TBNA 中肺癌病灶的增强分析——一种半监督视频目标检测方法 | Jyun-An Lin, Yun-Chien Cheng, Ching-Kai Lin | http://arxiv.org/pdf/2404.01929v1 | null |
2024-04-02 | Improving Bird's Eye View Semantic Segmentation by Task Decomposition | 通过任务分解改进鸟瞰语义分割 | Tianhao Zhao, Yongcan Chen, Yu Wu, Tianyang Liu, Bo Du, Peilun Xiao, Shi Qiu, Hongda Yang, Guozhen Li, Yi Yang, et al. | http://arxiv.org/pdf/2404.01925v1 | null |
2024-04-02 | ASTRA: An Action Spotting TRAnsformer for Soccer Videos | ASTRA:用于足球视频的动作识别 TRansformer | Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés | http://arxiv.org/pdf/2404.01891v1 | null |
2024-04-02 | Scene Adaptive Sparse Transformer for Event-based Object Detection | 用于基于事件的对象检测的场景自适应稀疏 Transformer | Yansong Peng, Hebei Li, Yueyi Zhang, Xiaoyan Sun, Feng Wu | http://arxiv.org/pdf/2404.01882v1 | null |
2024-04-02 | Semi-Supervised Domain Adaptation for Wildfire Detection | 用于野火检测的半监督域适应 | JooYoung Jang, Youngseo Cha, Jisu Kim, SooHyung Lee, Geonu Lee, Minkook Cho, Young Hwang, Nojun Kwak | http://arxiv.org/pdf/2404.01842v1 | null |
2024-04-02 | Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection | Sparse Semi-DETR:用于半监督目标检测的稀疏可学习查询 | Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal | http://arxiv.org/pdf/2404.01819v1 | null |
2024-04-02 | Rethinking Annotator Simulation: Realistic Evaluation of Whole-Body PET Lesion Interactive Segmentation Methods | 重新思考注释器模拟:全身 PET 病变交互式分割方法的真实评估 | Zdravko Marinov, Moon Kim, Jens Kleesiek, Rainer Stiefelhagen | http://arxiv.org/pdf/2404.01816v1 | null |
2024-04-02 | EventSleep: Sleep Activity Recognition with Event Cameras | EventSleep:使用事件摄像头进行睡眠活动识别 | Carlos Plou, Nerea Gallego, Alberto Sabater, Eduardo Montijano, Pablo Urcola, Luis Montesano, Ruben Martinez-Cantin, Ana C. Murillo | http://arxiv.org/pdf/2404.01801v1 | null |
2024-04-02 | Super-Resolution Analysis for Landfill Waste Classification | 垃圾填埋场垃圾分类的超分辨率分析 | Matias Molina, Rita P. Ribeiro, Bruno Veloso, João Gama | http://arxiv.org/pdf/2404.01790v1 | null |
2024-04-02 | A noisy elephant in the room: Is your out-of-distribution detector robust to label noise? | 房间里的一头吵闹的大象:您的分布外检测器对标签噪声是否稳健? | Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund | http://arxiv.org/pdf/2404.01775v1 | null |
2024-04-02 | Guidelines for Cerebrovascular Segmentation: Managing Imperfect Annotations in the context of Semi-Supervised Learning | 脑血管分割指南:在半监督学习的背景下管理不完美注释 | Pierre Rougé, Pierre-Henri Conze, Nicolas Passat, Odyssée Merveille | http://arxiv.org/pdf/2404.01765v1 | null |
2024-04-02 | Atom-Level Optical Chemical Structure Recognition with Limited Supervision | 有限监督下的原子级光学化学结构识别 | Martijn Oldenhof, Edward De Brouwer, Adam Arany, Yves Moreau | http://arxiv.org/pdf/2404.01743v1 | null |
2024-04-02 | Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge | 通过领域先验知识推广 6-DoF 抓取检测 | Haoxiang Ma, Modi Shi, Boyang Gao, Di Huang | http://arxiv.org/pdf/2404.01727v1 | null |
2024-04-02 | Disentangled Pre-training for Human-Object Interaction Detection | 人与物体交互检测的解缠预训练 | Zhuolong Li, Xingao Li, Changxing Ding, Xiangmin Xu | http://arxiv.org/pdf/2404.01725v1 | null |
2024-04-02 | Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model | Samba:利用状态空间模型对遥感图像进行语义分割 | Qinfeng Zhu, Yuanzhi Cai, Yuan Fang, Yihan Yang, Cheng Chen, Lei Fan, Anh Nguyen | http://arxiv.org/pdf/2404.01705v1 | null |
2024-04-02 | Boosting Visual Recognition for Autonomous Driving in Real-world Degradations with Deep Channel Prior | 利用深通道先验增强现实世界退化中自动驾驶的视觉识别 | Zhanwen Liu, Yuhang Li, Yang Wang, Bolin Gao, Yisheng An, Xiangmo Zhao | http://arxiv.org/pdf/2404.01703v1 | null |
2024-04-02 | Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss | 超越图像超分辨率,实现任务驱动感知损失的图像识别 | Jaeha Kim, Junghun Oh, Kyoung Mu Lee | http://arxiv.org/pdf/2404.01692v1 | null |
2024-04-02 | JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments | JRDB-PanoTrack:拥挤人类环境中的开放世界全景分割和跟踪机器人数据集 | Duy-Tho Le, Chenhui Gou, Stavya Datta, Hengcan Shi, Ian Reid, Jianfei Cai, Hamid Rezatofighi | http://arxiv.org/pdf/2404.01686v1 | null |
2024-04-02 | A Universal Knowledge Embedded Contrastive Learning Framework for Hyperspectral Image Classification | 用于高光谱图像分类的通用知识嵌入式对比学习框架 | Quanwei Liu, Yanni Dong, Tao Huang, Lefei Zhang, Bo Du | http://arxiv.org/pdf/2404.01673v1 | null |
2024-04-02 | Supporting Mitosis Detection AI Training with Inter-Observer Eye-Gaze Consistencies | 通过观察者间的眼睛注视一致性支持有丝分裂检测 AI 训练 | Hongyan Gu, Zihan Yan, Ayesha Alvi, Brandon Day, Chunxu Yang, Zida Wu, Shino Magaki, Mohammad Haeri, Xiang 'Anthony' Chen | http://arxiv.org/pdf/2404.01656v1 | null |
2024-04-02 | A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection | 仔细研究用于 COVID-19 检测的空间切片特征学习 | Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai | http://arxiv.org/pdf/2404.01643v1 | null |
2024-04-02 | Learning to Control Camera Exposure via Reinforcement Learning | 通过强化学习学习控制相机曝光 | Kyunghyun Lee, Ukcheol Shin, Byeong-Uk Lee | http://arxiv.org/pdf/2404.01636v1 | null |
2024-04-02 | LR-FPN: Enhancing Remote Sensing Object Detection with Location Refined Feature Pyramid Network | LR-FPN:利用位置细化特征金字塔网络增强遥感目标检测 | Hanqian Li, Ruinan Zhang, Ye Pan, Junchi Ren, Fei Shen | http://arxiv.org/pdf/2404.01614v1 | null |
2024-04-02 | Language Model Guided Interpretable Video Action Reasoning | 语言模型引导的可解释视频动作推理 | Ning Wang, Guangming Zhu, HS Li, Liang Zhang, Syed Afaq Ali Shah, Mohammed Bennamoun | http://arxiv.org/pdf/2404.01591v1 | null |
2024-04-02 | Learning Temporal Cues by Predicting Objects Move for Multi-camera 3D Object Detection | 通过预测多摄像头 3D 对象检测的对象移动来学习时间线索 | Seokha Moon, Hongbeen Park, Jungphil Kwon, Jaekoo Lee, Jinkyu Kim | http://arxiv.org/pdf/2404.01580v1 | null |
2024-04-02 | A Linear Time and Space Local Point Cloud Geometry Encoder via Vectorized Kernel Mixture (VecKM) | 通过矢量化核混合 (VecKM) 的线性时间和空间局部点云几何编码器 | Dehao Yuan, Cornelia Fermüller, Tahseen Rabbani, Furong Huang, Yiannis Aloimonos | http://arxiv.org/pdf/2404.01568v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-02 | Release of Pre-Trained Models for the Japanese Language | 发布日语预训练模型 | Kei Sawada, Tianyu Zhao, Makoto Shing, Kentaro Mitsui, Akio Kaga, Yukiya Hono, Toshiaki Wakatsuki, Koh Mitsuda | http://arxiv.org/pdf/2404.01657v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-02 | Specularity Factorization for Low-Light Enhancement | 用于低光增强的镜面分解 | Saurabh Saini, P J Narayanan | http://arxiv.org/pdf/2404.01998v1 | null |
2024-04-02 | CSST Strong Lensing Preparation: a Framework for Detecting Strong Lenses in the Multi-color Imaging Survey by the China Survey Space Telescope (CSST) | CSST强透镜准备:中国巡天太空望远镜(CSST)多色成像巡天强透镜探测框架 | Xu Li, Ruiqi Sun, Jiameng Lv, Peng Jia, Nan Li, Chengliang Wei, Zou Hu, Xinzhong Er, Yun Chen, Zhang Ban, et al. | http://arxiv.org/pdf/2404.01780v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-02 | SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation | SelfPose3d:自监督多人多视图 3d 姿势估计 | Vinkle Srivastav, Keqi Chen, Nicolas Padoy | http://arxiv.org/pdf/2404.02041v1 | null |
2024-04-02 | DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning | DELAN:通过跨模态对比学习实现视觉和语言导航的双层对齐 | Mengfei Du, Binhao Wu, Jiwen Zhang, Zhihao Fan, Zejun Li, Ruipu Luo, Xuanjing Huang, Zhongyu Wei | http://arxiv.org/pdf/2404.01994v1 | link |
2024-04-02 | GEARS: Local Geometry-aware Hand-object Interaction Synthesis | GEARS:局部几何感知的手部物体交互合成 | Keyang Zhou, Bharat Lal Bhatnagar, Jan Eric Lenssen, Gerard Pons-Moll | http://arxiv.org/pdf/2404.01758v1 | null |
2024-04-02 | ContrastCAD: Contrastive Learning-based Representation Learning for Computer-Aided Design Models | ContrastCAD:计算机辅助设计模型的基于对比学习的表示学习 | Minseop Jung, Minseong Kim, Jibum Kim | http://arxiv.org/pdf/2404.01645v1 | link |
2024-04-02 | WaveDH: Wavelet Sub-bands Guided ConvNet for Efficient Image Dehazing | WaveDH:小波子带引导的 ConvNet 用于高效图像去雾 | Seongmin Hwang, Daeyoung Han, Cheolkon Jung, Moongu Jeon | http://arxiv.org/pdf/2404.01604v1 | null |
2024-04-02 | Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining | 用于图像去雨的双向多尺度隐式神经表示 | Xiang Chen, Jinshan Pan, Jiangxin Dong | http://arxiv.org/pdf/2404.01547v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-02 | A discussion about violin reduction: geometric analysis of contour lines and channel of minima | 关于小提琴还原的讨论:等高线和极小值通道的几何分析 | Philémon Beghin, Anne-Emmanuelle Ceulemans, François Glineur | http://arxiv.org/pdf/2404.01995v1 | null |
2024-04-02 | Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation | 使用神经辐射表示进行连续视觉语言导航的前瞻探索 | Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Junjie Hu, Ming Jiang, Shuqiang Jiang | http://arxiv.org/pdf/2404.01943v1 | link |
2024-04-02 | LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging | LPSNet:利用无透镜成像进行端到端人体姿势和形状估计 | Haoyang Ge, Qiao Feng, Hailong Jia, Xiongzheng Li, Xiangjun Yin, You Zhou, Jingyu Yang, Kun Li | http://arxiv.org/pdf/2404.01941v1 | null |
2024-04-02 | Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation | Sketch3D:草图到 3D 生成的风格一致指南 | Wangguandong Zheng, Haifeng Xia, Rui Chen, Ming Shao, Siyu Xia, Zhengming Ding | http://arxiv.org/pdf/2404.01843v1 | null |
2024-04-02 | Spin-UP: Spin Light for Natural Light Uncalibrated Photometric Stereo | Spin-UP:用于自然光未校准光度立体的旋转光 | Zongrui Li, Zhan Lu, Haojie Yan, Boxin Shi, Gang Pan, Qian Zheng, Xudong Jiang | http://arxiv.org/pdf/2404.01612v1 | null |
2024-04-02 | Leveraging Digital Perceptual Technologies for Remote Perception and Analysis of Human Biomechanical Processes: A Contactless Approach for Workload and Joint Force Assessment | 利用数字感知技术对人体生物力学过程进行远程感知和分析:工作负荷和关节力评估的非接触式方法 | Jesudara Omidokun, Darlington Egeonu, Bochen Jia, Liang Yang | http://arxiv.org/pdf/2404.01576v1 | null |
2024-04-02 | Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes | 具有网格锚定哈希表混合形状的高效 3D 隐式头部头像 | Ziqian Bai, Feitong Tan, Sean Fanello, Rohit Pandey, Mingsong Dou, Shichen Liu, Ping Tan, Yinda Zhang | http://arxiv.org/pdf/2404.01543v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-02 | Iterated Learning Improves Compositionality in Large Vision-Language Models | 迭代学习提高了大型视觉语言模型的组合性 | Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi, Ranjay Krishna | http://arxiv.org/pdf/2404.02145v1 | null |
2024-04-02 | VLRM: Vision-Language Models act as Reward Models for Image Captioning | VLRM:视觉语言模型充当图像字幕的奖励模型 | Maksim Dzabraev, Alexander Kunitsyn, Andrei Ivaniuta | http://arxiv.org/pdf/2404.01911v1 | null |
2024-04-02 | RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement | RAVE:用于 CLIP 引导背光图像增强的残差向量嵌入 | Tatiana Gaintseva, Martin Benning, Gregory Slabaugh | http://arxiv.org/pdf/2404.01889v1 | null |
2024-04-02 | Pairwise Similarity Distribution Clustering for Noisy Label Learning | 用于噪声标签学习的成对相似度分布聚类 | Sihan Bai | http://arxiv.org/pdf/2404.01853v1 | null |
2024-04-02 | T-VSL: Text-Guided Visual Sound Source Localization in Mixtures | T-VSL:混合声音中文本引导的视觉声源定位 | Tanvir Mahmud, Yapeng Tian, Diana Marculescu | http://arxiv.org/pdf/2404.01751v1 | null |
2024-04-02 | Learning Equi-angular Representations for Online Continual Learning | 学习在线持续学习的等角表示 | Minhyuk Seo, Hyunseo Koh, Wonje Jeung, Minjae Lee, San Kim, Hankook Lee, Sungjun Cho, Sungik Choi, Hyunwoo Kim, Jonghyun Choi | http://arxiv.org/pdf/2404.01628v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-02 | Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration | 动态预训练:实现高效且可扩展的一体化图像恢复 | Akshay Dudhane, Omkar Thawakar, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang | http://arxiv.org/pdf/2404.02154v1 | null |
2024-04-02 | CameraCtrl: Enabling Camera Control for Text-to-Video Generation | CameraCtrl:启用相机控制以生成文本到视频 | Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang | http://arxiv.org/pdf/2404.02101v1 | link |
2024-04-02 | Causality-based Transfer of Driving Scenarios to Unseen Intersections | 基于因果关系的驾驶场景到看不见的十字路口的转移 | Christoph Glasmacher, Michael Schuldes, Sleiman El Masri, Lutz Eckstein | http://arxiv.org/pdf/2404.02046v1 | null |
2024-04-02 | Fashion Style Editing with Generative Human Prior | 使用生成人类先验进行时尚风格编辑 | Chaerin Kong, Seungyong Lee, Soohyeok Im, Wonsuk Yang | http://arxiv.org/pdf/2404.01984v1 | null |
2024-04-02 | Joint-Task Regularization for Partially Labeled Multi-Task Learning | 部分标记多任务学习的联合任务正则化 | Kento Nishi, Junsik Kim, Wanhua Li, Hanspeter Pfister | http://arxiv.org/pdf/2404.01976v1 | link |
2024-04-02 | Quantifying Noise of Dynamic Vision Sensor | 量化动态视觉传感器的噪声 | Evgeny V. Votyakov, Alessandro Artusi | http://arxiv.org/pdf/2404.01948v1 | null |
2024-04-02 | Toward Efficient Visual Gyroscopes: Spherical Moments, Harmonics Filtering, and Masking Techniques for Spherical Camera Applications | 实现高效的视觉陀螺仪:球面矩、谐波滤波和球形相机应用的掩蔽技术 | Yao Du, Carlos M. Mateo, Mirjana Maras, Tsun-Hsuan Wang, Marc Blanchon, Alexander Amini, Daniela Rus, Omar Tahri | http://arxiv.org/pdf/2404.01924v1 | null |
2024-04-02 | Real, fake and synthetic faces - does the coin have three sides? | 真实、伪造和合成人脸——硬币有三个面吗? | Shahzeb Naeem, Ramzi Al-Sharawi, Muhammad Riyyan Khan, Usman Tariq, Abhinav Dhall, Hasan Al-Nashash | http://arxiv.org/pdf/2404.01878v1 | null |
2024-04-02 | Exploring Latent Pathways: Enhancing the Interpretability of Autonomous Driving with a Variational Autoencoder | 探索潜在路径:使用变分自动编码器增强自动驾驶的可解释性 | Anass Bairouk, Mirjana Maras, Simon Herlin, Alexander Amini, Marc Blanchon, Ramin Hasani, Patrick Chareyre, Daniela Rus | http://arxiv.org/pdf/2404.01750v1 | null |
2024-04-02 | Global Mapping of Exposure and Physical Vulnerability Dynamics in Least Developed Countries using Remote Sensing and Machine Learning | 利用遥感和机器学习绘制最不发达国家的全球暴露和物理脆弱性动态图 | Joshua Dimasaka, Christian Geiß, Emily So | http://arxiv.org/pdf/2404.01748v1 | null |
2024-04-02 | Conjugate-Gradient-like Based Adaptive Moment Estimation Optimization Algorithm for Deep Learning | 基于类共轭梯度的深度学习自适应矩估计优化算法 | Jiawu Tian, Liwei Xu, Xiaowei Zhang, Yongqi Li | http://arxiv.org/pdf/2404.01714v1 | null |
2024-04-02 | AI WALKUP: A Computer-Vision Approach to Quantifying MDS-UPDRS in Parkinson's Disease | AI WALKUP:量化帕金森病 MDS-UPDRS 的计算机视觉方法 | Xiang Xiang, Zihan Zhang, Jing Ma, Yao Deng | http://arxiv.org/pdf/2404.01654v1 | null |
2024-04-02 | EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis | EDTalk:情感头部合成的高效解缠 | Shuai Tan, Bin Ji, Mengxiao Bi, Ye Pan | http://arxiv.org/pdf/2404.01647v1 | null |
2024-04-02 | Two-Phase Multi-Dose-Level PET Image Reconstruction with Dose Level Awareness | 具有剂量水平感知功能的两阶段多剂量水平 PET 图像重建 | Yuchen Fei, Yanmei Luo, Yan Wang, Jiaqi Cui, Yuanyuan Xu, Jiliu Zhou, Dinggang Shen | http://arxiv.org/pdf/2404.01563v1 | null |