[UPDATED!] 2024-05-22 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-05-22 | MagicPose4D: Crafting Articulated Models with Appearance and Motion Control | MagicPose4D:制作具有外观和运动控制的铰接模型 | Hao Zhang, Di Chang, Fang Li, Mohammad Soleymani, Narendra Ahuja | http://arxiv.org/pdf/2405.14017v1 | null |
| 2024-05-22 | Learning Latent Space Hierarchical EBM Diffusion Models | 学习潜在空间分层 EBM 扩散模型 | Jiali Cui, Tian Han | http://arxiv.org/pdf/2405.13910v1 | null |
| 2024-05-22 | FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition | FreeCustom:用于多概念合成的免调整定制图像生成 | Ganggui Ding, Canyu Zhao, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen | http://arxiv.org/pdf/2405.13870v1 | null |
| 2024-05-22 | ReVideo: Remake a Video with Motion and Content Control | ReVideo:通过动作和内容控制重新制作视频 | Chong Mou, Mingdeng Cao, Xintao Wang, Zhaoyang Zhang, Ying Shan, Jian Zhang | http://arxiv.org/pdf/2405.13865v1 | null |
| 2024-05-22 | Robust Disaster Assessment from Aerial Imagery Using Text-to-Image Synthetic Data | 使用文本到图像合成数据对航空影像进行稳健的灾害评估 | Tarun Kalluri, Jihyeon Lee, Kihyuk Sohn, Sahil Singla, Manmohan Chandraker, Joseph Xu, Jeremiah Liu | http://arxiv.org/pdf/2405.13779v1 | null |
| 2024-05-22 | A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation | 用于视听生成的混合噪声级多功能扩散 Transformer | Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, et al. | http://arxiv.org/pdf/2405.13762v1 | null |
| 2024-05-22 | ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models | ComboStoc:扩散生成模型的组合随机性 | Rui Xu, Jiepeng Wang, Hao Pan, Yang Liu, Xin Tong, Shiqing Xin, Changhe Tu, Taku Komura, Wenping Wang | http://arxiv.org/pdf/2405.13729v1 | null |
| 2024-05-22 | InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos | InstaDrag:从视频中实现快速、准确的基于拖动的图像编辑 | Yujun Shi, Jun Hao Liew, Hanshu Yan, Vincent Y. F. Tan, Jiashi Feng | http://arxiv.org/pdf/2405.13722v1 | null |
| 2024-05-22 | Prompt Mixing in Diffusion Models using the Black Scholes Algorithm | 使用 Black Scholes 算法在扩散模型中进行提示混合 | Divya Kothandaraman, Ming Lin, Dinesh Manocha | http://arxiv.org/pdf/2405.13685v1 | null |
| 2024-05-22 | Curriculum Direct Preference Optimization for Diffusion and Consistency Models | 扩散和一致性模型的课程直接偏好优化 | Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Nicu Sebe, Mubarak Shah | http://arxiv.org/pdf/2405.13637v1 | null |
| 2024-05-22 | MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation | MetaEarth:全球范围遥感图像生成的生成基础模型 | Zhiping Yu, Chenyang Liu, Liqin Liu, Zhenwei Shi, Zhengxia Zou | http://arxiv.org/pdf/2405.13570v1 | null |
| 2024-05-22 | MotionCraft: Physics-based Zero-Shot Video Generation | MotionCraft:基于物理的零样本视频生成 | Luca Savant Aira, Antonio Montanaro, Emanuele Aiello, Diego Valsesia, Enrico Magli | http://arxiv.org/pdf/2405.13557v1 | null |
| 2024-05-22 | Directly Denoising Diffusion Model | 直接去噪扩散模型 | Dan Zhang, Jingjing Wang, Feng Luo | http://arxiv.org/pdf/2405.13540v1 | null |
| 2024-05-22 | Class-Conditional self-reward mechanism for improved Text-to-Image models | 用于改进文本到图像模型的类条件自我奖励机制 | Safouane El Ghazouali, Arnaud Gucciardi, Umberto Michelucci | http://arxiv.org/pdf/2405.13473v1 | null |
| 2024-05-22 | Markerless retro-identification complements re-identification of individual insect subjects in archived image data of biological experiments | 无标记逆向识别补充了生物实验存档图像数据中个体昆虫受试者的重新识别 | Asaduz Zaman, Vanessa Kellermann, Alan Dorin | http://arxiv.org/pdf/2405.13376v1 | null |
| 2024-05-22 | How to Trace Latent Generative Model Generated Images without Artificial Watermark? | 如何在没有人工水印的情况下追踪潜在生成模型生成的图像? | Zhenting Wang, Vikash Sehwag, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas, Shiqing Ma | http://arxiv.org/pdf/2405.13360v1 | null |
| 2024-05-22 | Single color virtual H&E staining with In-and-Out Net | 使用 In-and-Out Net 进行单色虚拟 H&E 染色 | Mengkun Chen, Yen-Tung Liu, Fadeel Sher Khan, Matthew C. Fox, Jason S. Reichenberg, Fabiana C. P. S. Lopes, Katherine R. Sebastian, Mia K. Markey, James W. Tunnell | http://arxiv.org/pdf/2405.13278v1 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-05-22 | I2I-Mamba: Multi-modal medical image synthesis via selective state space modeling | I2I-Mamba:通过选择性状态空间建模实现多模态医学图像合成 | Omer F. Atli, Bilal Kabas, Fuat Arslan, Mahmut Yurt, Onat Dalmaz, Tolga Çukur | http://arxiv.org/pdf/2405.14022v1 | null |
| 2024-05-22 | BrainMorph: A Foundational Keypoint Model for Robust and Flexible Brain MRI Registration | BrainMorph:稳健且灵活的脑 MRI 配准的基础关键点模型 | Alan Q. Wang, Rachit Saluja, Heejong Kim, Xinzi He, Adrian Dalca, Mert R. Sabuncu | http://arxiv.org/pdf/2405.14019v1 | null |
| 2024-05-22 | PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery | PitVQA:基于图像的文本嵌入 LLM,用于垂体手术中的视觉问答 | Runlong He, Mengya Xu, Adrito Das, Danyal Z. Khan, Sophia Bano, Hani J. Marcus, Danail Stoyanov, Matthew J. Clarkson, Mobarakol Islam | http://arxiv.org/pdf/2405.13949v1 | null |
| 2024-05-22 | Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models | 多模态大语言模型中视觉推理细化的思维图像提示 | Qiji Zhou, Ruochen Zhou, Zike Hu, Panzhong Lu, Siyang Gao, Yue Zhang | http://arxiv.org/pdf/2405.13872v1 | null |
| 2024-05-22 | Dense Connector for MLLMs | 用于 MLLM 的密集连接器 | Huanjin Yao, Wenhao Wu, Taojiannan Yang, YuXin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang | http://arxiv.org/pdf/2405.13800v1 | null |
| 2024-05-22 | No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models | 无过滤:对比视觉语言模型中的文化和社会经济多样性 | Angéline Pouget, Lucas Beyer, Emanuele Bugliarello, Xiao Wang, Andreas Peter Steiner, Xiaohua Zhai, Ibrahim Alabdulmohsin | http://arxiv.org/pdf/2405.13777v1 | null |
| 2024-05-22 | Safety Alignment for Vision Language Models | 视觉语言模型的安全对齐 | Zhendong Liu, Yuanbi Nie, Yingshui Tan, Xiangyu Yue, Qiushi Cui, Chongjun Wang, Xiaoyong Zhu, Bo Zheng | http://arxiv.org/pdf/2405.13581v1 | null |
| 2024-05-22 | Cross-Modal Distillation in Industrial Anomaly Detection: Exploring Efficient Multi-Modal IAD | 工业异常检测中的跨模态蒸馏:探索高效的多模态 IAD | Wenbo Sui, Daniel Lichau, Josselin Lefèvre, Harold Phelippeau | http://arxiv.org/pdf/2405.13571v1 | null |
| 2024-05-22 | Adapting Multi-modal Large Language Model to Concept Drift in the Long-tailed Open World | 多模态大语言模型适应长尾开放世界中的概念漂移 | Xiaoyu Yang, Jie Lu, En Yu | http://arxiv.org/pdf/2405.13459v1 | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-05-22 | DoGaussian: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus | DoGaussian:通过高斯一致性进行大规模 3D 重建的分布式高斯泼溅 | Yu Chen, Gim Hee Lee | http://arxiv.org/pdf/2405.13943v1 | null |
| 2024-05-22 | Gaussian Time Machine: A Real-Time Rendering Methodology for Time-Variant Appearances | 高斯时间机器:时变外观的实时渲染方法 | Licheng Shen, Ho Ngai Chow, Lingyun Wang, Tong Zhang, Mengqiu Wang, Yuxing Han | http://arxiv.org/pdf/2405.13694v1 | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-05-22 | Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation | 两个头比一个头更好:具有基于 2D 希尔伯特曲线的输出表示的神经网络量化 | Mykhailo Uss, Ruslan Yermolenko, Olena Kolodiazhna, Oleksii Shashko, Ivan Safonov, Volodymyr Savin, Yoonjae Yeo, Seowon Ji, Jaeyun Jeong | http://arxiv.org/pdf/2405.14024v1 | null |
| 2024-05-22 | DCT-Based Decorrelated Attention for Vision Transformers | 基于 DCT 的视觉 Transformer 去相关注意力 | Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Koushik Biswas, Ahmet Cetin, Ulas Bagci | http://arxiv.org/pdf/2405.13901v1 | null |
| 2024-05-22 | QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input | QGait:通过二值化输入实现步态识别的精确量化 | Senmao Tian, Haoyu Gao, Gangyi Hong, Shuyun Wang, JingJie Wang, Xin Yu, Shunli Zhang | http://arxiv.org/pdf/2405.13859v1 | null |
| 2024-05-22 | Low-Resolution Chest X-ray Classification via Knowledge Distillation and Multi-task Learning | 通过知识蒸馏和多任务学习进行低分辨率胸部 X 射线分类 | Yasmeena Akhter, Rishabh Ranjan, Richa Singh, Mayank Vatsa | http://arxiv.org/pdf/2405.13370v1 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-05-22 | Learning rigid-body simulators over implicit shapes for large-scale scenes and vision | 针对大规模场景和视觉,通过隐式形状学习刚体模拟器 | Yulia Rubanova, Tatiana Lopez-Guevara, Kelsey R. Allen, William F. Whitney, Kimberly Stachenfeld, Tobias Pfaff | http://arxiv.org/pdf/2405.14045v1 | null |
| 2024-05-22 | One-shot Training for Video Object Segmentation | 视频对象分割的单样本训练 | Baiyu Chen, Sixian Chan, Xiaoqin Zhang | http://arxiv.org/pdf/2405.14010v1 | null |
| 2024-05-22 | AutoLCZ: Towards Automatized Local Climate Zone Mapping from Rule-Based Remote Sensing | AutoLCZ:通过基于规则的遥感实现自动化当地气候带测绘 | Chenying Liu, Hunsoo Song, Anamika Shreevastava, Conrad M Albrecht | http://arxiv.org/pdf/2405.13993v1 | null |
| 2024-05-22 | TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System | TS40K:农村地形和电力传输系统的 3D 点云数据集 | Diogo Lavado, Cláudia Soares, Alessandra Micheletti, Ricardo Santos, André Coelho, João Santos | http://arxiv.org/pdf/2405.13989v1 | null |
| 2024-05-22 | LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate | LookHere:具有定向注意力的视觉 Transformer 进行概括和推断 | Anthony Fuller, Daniel G. Kyrollos, Yousef Yassin, James R. Green | http://arxiv.org/pdf/2405.13985v1 | null |
| 2024-05-22 | Optimizing Curvature Learning for Robust Hyperbolic Deep Learning in Computer Vision | 优化曲率学习以实现计算机视觉中的鲁棒双曲深度学习 | Ahmad Bdeir, Niels Landwehr | http://arxiv.org/pdf/2405.13979v1 | null |
| 2024-05-22 | ST-Gait++: Leveraging spatio-temporal convolutions for gait-based emotion recognition on videos | ST-Gait++:利用时空卷积实现基于步态的视频情绪识别 | Maria Luísa Lima, Willams de Lima Costa, Estefania Talavera Martinez, Veronica Teichrieb | http://arxiv.org/pdf/2405.13903v1 | null |
| 2024-05-22 | A General Framework for Jersey Number Recognition in Sports Video | 体育视频中球衣号码识别的通用框架 | Maria Koshkina, James H. Elder | http://arxiv.org/pdf/2405.13896v1 | null |
| 2024-05-22 | Just rotate it! Uncertainty estimation in closed-source models via multiple queries | 只需旋转它即可!通过多个查询估计闭源模型的不确定性 | Konstantinos Pitas, Julyan Arbel | http://arxiv.org/pdf/2405.13864v1 | null |
| 2024-05-22 | Hyperspectral Image Reconstruction for Predicting Chick Embryo Mortality Towards Advancing Egg and Hatchery Industry | 用于预测鸡胚死亡率的高光谱图像重建,促进鸡蛋和孵化行业的发展 | Md. Toukir Ahmed, Md Wadud Ahmed, Ocean Monjur, Jason Lee Emmert, Girish Chowdhary, Mohammed Kamruzzaman | http://arxiv.org/pdf/2405.13843v1 | null |
| 2024-05-22 | Multi-Dataset Multi-Task Learning for COVID-19 Prognosis | 用于 COVID-19 预后的多数据集多任务学习 | Filippo Ruffini, Lorenzo Tronchin, Zhuoru Wu, Wenting Chen, Paolo Soda, Linlin Shen, Valerio Guarrasi | http://arxiv.org/pdf/2405.13771v1 | null |
| 2024-05-22 | Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks | 神经网络中基于反事实梯度的预测信任量化 | Mohit Prabhushankar, Ghassan AlRegib | http://arxiv.org/pdf/2405.13758v1 | null |
| 2024-05-22 | A label-free and data-free training strategy for vasculature segmentation in serial sectioning OCT data | 连续切片 OCT 数据中脉管系统分割的无标签和无数据训练策略 | Etienne Chollet, Yael Balbastre, Caroline Magnain, Bruce Fischl, Hui Wang | http://arxiv.org/pdf/2405.13757v1 | null |
| 2024-05-22 | Optimizing Lymphocyte Detection in Breast Cancer Whole Slide Imaging through Data-Centric Strategies | 通过以数据为中心的策略优化乳腺癌全玻片成像中的淋巴细胞检测 | Amine Marzouki, Zhuxian Guo, Qinghe Zeng, Camille Kurtz, Nicolas Loménie | http://arxiv.org/pdf/2405.13710v1 | null |
| 2024-05-22 | Embedding Generalized Semantic Knowledge into Few-Shot Remote Sensing Segmentation | 将广义语义知识嵌入到少样本遥感分割中 | Yuyu Jia, Wei Huang, Junyu Gao, Qi Wang, Qiang Li | http://arxiv.org/pdf/2405.13686v1 | null |
| 2024-05-22 | Ultra-Fast Adaptive Track Detection Network | 超快速自适应轨迹检测网络 | Hai Ni, Rui Wang, Scarlett Liu | http://arxiv.org/pdf/2405.13538v1 | null |
| 2024-05-22 | PerSense: Personalized Instance Segmentation in Dense Images | PerSense:密集图像中的个性化实例分割 | Muhammad Ibraheem Siddiqui, Muhammad Umer Sheikh, Hassan Abid, Muhammad Haris Khan | http://arxiv.org/pdf/2405.13518v1 | null |
| 2024-05-22 | Continual Learning in Medical Imaging from Theory to Practice: A Survey and Practical Analysis | 医学影像从理论到实践的持续学习:调查与实践分析 | Mohammad Areeb Qazi, Anees Ur Rehman Hashmi, Santosh Sanjeev, Ibrahim Almakky, Numan Saeed, Mohammad Yaqub | http://arxiv.org/pdf/2405.13482v1 | null |
| 2024-05-22 | AdaFedFR: Federated Face Recognition with Adaptive Inter-Class Representation Learning | AdaFedFR:具有自适应类间表示学习的联邦人脸识别 | Di Qiu, Xinyang Lin, Kaiye Wang, Xiangxiang Chu, Pengfei Yan | http://arxiv.org/pdf/2405.13467v1 | null |
| 2024-05-22 | A Label Propagation Strategy for CutMix in Multi-Label Remote Sensing Image Classification | 多标签遥感图像分类中 CutMix 的标签传播策略 | Tom Burgert, Tim Siebert, Kai Norman Clasen, Begüm Demir | http://arxiv.org/pdf/2405.13451v1 | null |
| 2024-05-22 | Dynamically enhanced static handwriting representation for Parkinson's disease detection | 用于帕金森病检测的动态增强静态手写表示 | Moises Diaz, Miguel Angel Ferrer, Donato Impedovo, Giuseppe Pirlo, Gennaro Vessio | http://arxiv.org/pdf/2405.13438v1 | null |
| 2024-05-22 | Multi Player Tracking in Ice Hockey with Homographic Projections | 使用单应投影进行冰球多人跟踪 | Harish Prakash, Jia Cheng Shang, Ken M. Nsiempba, Yuhao Chen, David A. Clausi, John S. Zelek | http://arxiv.org/pdf/2405.13397v1 | null |
| 2024-05-22 | Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation | 使用语言视觉提示进行低数据实例分割的无监督预训练 | Dingwen Zhang, Hao Li, Diqi He, Nian Liu, Lechao Cheng, Jingdong Wang, Junwei Han | http://arxiv.org/pdf/2405.13388v1 | null |
| 2024-05-22 | VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | VTG-LLM:将时间戳知识集成到视频 LLM 中以增强视频时序定位 | Yongxin Guo, Jingyu Liu, Mingda Li, Xiaoying Tang, Xi Chen, Bo Zhao | http://arxiv.org/pdf/2405.13382v1 | null |
| 2024-05-22 | Collaboration of Teachers for Semi-supervised Object Detection | 教师协作进行半监督目标检测 | Liyu Chen, Huaao Tang, Yi Wen, Hanting Chen, Wei Li, Junchao Liu, Jie Hu | http://arxiv.org/pdf/2405.13374v1 | null |
| 2024-05-22 | Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer | 语义公平聚类:Vision Transformer 的简单、快速且有效的策略 | Qihang Fan, Huaibo Huang, Mingrui Chen, Ran He | http://arxiv.org/pdf/2405.13337v1 | null |
| 2024-05-22 | Vision Transformer with Sparse Scan Prior | 具有稀疏扫描先验的视觉 Transformer | Qihang Fan, Huaibo Huang, Mingrui Chen, Ran He | http://arxiv.org/pdf/2405.13335v1 | null |
| 2024-05-22 | Hybrid Multihead Attentive Unet-3D for Brain Tumor Segmentation | 用于脑肿瘤分割的混合多头 Attentive Unet-3D | Muhammad Ansab Butt, Absaar Ul Jabbar | http://arxiv.org/pdf/2405.13304v1 | null |
| 2024-05-22 | Enhancing Active Learning for Sentinel 2 Imagery through Contrastive Learning and Uncertainty Estimation | 通过对比学习和不确定性估计增强 Sentinel 2 图像的主动学习 | David Pogorzelski, Peter Arlinghaus | http://arxiv.org/pdf/2405.13285v1 | null |
| 2024-05-22 | FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging | 闪耀您的数据:天文成像中基于扩散的增强方法 | Mohammed Talha Alam, Raza Imam, Mohsen Guizani, Fakhri Karray | http://arxiv.org/pdf/2405.13267v1 | null |
| 2024-05-22 | Traffic control using intelligent timing of traffic lights with reinforcement learning technique and real-time processing of surveillance camera images | 利用强化学习技术和实时处理监控摄像头图像的交通信号灯智能定时进行交通控制 | Mahdi Jamebozorg, Mohsen Hami, Sajjad Deh Deh Jani | http://arxiv.org/pdf/2405.13256v1 | null |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-05-22 | Computer-Vision-Enabled Worker Video Analysis for Motion Amount Quantification | 用于运动量量化的计算机视觉工人视频分析 | Hari Iyer, Neel Macwan, Shenghan Guo, Heejin Jeong | http://arxiv.org/pdf/2405.13999v1 | null |
| 2024-05-22 | Addressing the Elephant in the Room: Robust Animal Re-Identification with Unsupervised Part-Based Feature Alignment | 解决房间里的大象:鲁棒的动物重新识别与无监督的基于部分的特征对齐 | Yingxue Yu, Vidit Vidit, Andrey Davydov, Martin Engilberge, Pascal Fua | http://arxiv.org/pdf/2405.13781v1 | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-05-22 | TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment | TOPA:通过纯文本预对齐扩展用于视频理解的大型语言模型 | Wei Li, Hehe Fan, Yongkang Wong, Mohan Kankanhalli, Yi Yang | http://arxiv.org/pdf/2405.13911v1 | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-05-22 | Mitigating Interference in the Knowledge Continuum through Attention-Guided Incremental Learning | 通过注意力引导的增量学习减轻知识连续体中的干扰 | Prashant Bhat, Bharath Renjith, Elahe Arani, Bahram Zonooz | http://arxiv.org/pdf/2405.13978v1 | null |
| 2024-05-22 | Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching | 基于仿射的可变形注意力和半密集匹配的选择性融合 | Hongkai Chen, Zixin Luo, Yurun Tian, Xuyang Bai, Ziyu Wang, Lei Zhou, Mingmin Zhen, Tian Fang, David McKinnon, Yanghai Tsin, et al. | http://arxiv.org/pdf/2405.13874v1 | null |
| 2024-05-22 | MAGIC: Map-Guided Few-Shot Audio-Visual Acoustics Modeling | MAGIC:地图引导的少样本视听声学建模 | Diwei Huang, Kunyang Lin, Peihao Chen, Qing Du, Mingkui Tan | http://arxiv.org/pdf/2405.13860v1 | null |
| 2024-05-22 | GMMFormer v2: An Uncertainty-aware Framework for Partially Relevant Video Retrieval | GMMFormer v2:用于部分相关视频检索的不确定性感知框架 | Yuting Wang, Jinpeng Wang, Bin Chen, Tao Dai, Ruisheng Luo, Shu-Tao Xia | http://arxiv.org/pdf/2405.13824v1 | null |
| 2024-05-22 | Context and Geometry Aware Voxel Transformer for Semantic Scene Completion | 用于语义场景完成的上下文和几何感知体素 Transformer | Zhu Yu, Runming Zhang, Jiacheng Ying, Junchen Yu, Xiaohai Hu, Lun Luo, Siyuan Cao, Huiliang Shen | http://arxiv.org/pdf/2405.13675v1 | null |
| 2024-05-22 | Advancing Spiking Neural Networks towards Multiscale Spatiotemporal Interaction Learning | 推进脉冲神经网络走向多尺度时空交互学习 | Yimeng Shan, Malu Zhang, Rui-jie Zhu, Xuerui Qiu, Jason K. Eshraghian, Haicheng Qu | http://arxiv.org/pdf/2405.13672v1 | null |
| 2024-05-22 | Comparative Analysis of Hyperspectral Image Reconstruction Using Deep Learning for Agricultural and Biological Applications | 使用深度学习进行农业和生物应用的高光谱图像重建的比较分析 | Md. Toukir Ahmed, Mohammed Kamruzzaman | http://arxiv.org/pdf/2405.13331v1 | null |
| 2024-05-22 | AUGlasses: Continuous Action Unit based Facial Reconstruction with Low-power IMUs on Smart Glasses | AUGlasses:基于连续动作单元的智能眼镜面部重建,采用低功耗 IMU | Yanrong Li, Tengxiang Zhang, Xin Zeng, Yuntao Wang, Haotian Zhang, Yiqiang Chen | http://arxiv.org/pdf/2405.13289v1 | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-05-22 | Monocular Gaussian SLAM with Language Extended Loop Closure | 具有语言扩展回环闭合的单目高斯 SLAM | Tian Lan, Qinwei Lin, Haoqian Wang | http://arxiv.org/pdf/2405.13748v1 | null |
| 2024-05-22 | EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views | EgoChoir:从自我中心视角捕捉 3D 人与物体交互区域 | Yuhang Yang, Wei Zhai, Chengfeng Wang, Chengjun Yu, Yang Cao, Zheng-Jun Zha | http://arxiv.org/pdf/2405.13659v1 | null |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-05-22 | Rehearsal-free Federated Domain-incremental Learning | 免回放联邦域增量学习 | Rui Sun, Haoran Duan, Jiahua Dong, Varun Ojha, Tejal Shah, Rajiv Ranjan | http://arxiv.org/pdf/2405.13900v1 | null |
| 2024-05-22 | What Makes Good Few-shot Examples for Vision-Language Models? | 是什么造就了视觉语言模型的良好小样本示例? | Zhaojun Guo, Jinghui Lu, Xuejing Liu, Rui Zhao, ZhenXing Qian, Fei Tan | http://arxiv.org/pdf/2405.13532v1 | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-05-22 | Refining Skewed Perceptions in Vision-Language Models through Visual Representations | 通过视觉表示改善视觉语言模型中的偏差感知 | Haocheng Dai, Sarang Joshi | http://arxiv.org/pdf/2405.14030v1 | null |
| 2024-05-22 | Text Prompting for Multi-Concept Video Customization by Autoregressive Generation | 通过自回归生成进行多概念视频定制的文本提示 | Divya Kothandaraman, Kihyuk Sohn, Ruben Villegas, Paul Voigtlaender, Dinesh Manocha, Mohammad Babaeizadeh | http://arxiv.org/pdf/2405.13951v1 | null |
| 2024-05-22 | Koopcon: A new approach towards smarter and less complex learning | Koopcon:一种实现更智能、更简单学习的新方法 | Vahid Jebraeeli, Bo Jiang, Derya Cansever, Hamid Krim | http://arxiv.org/pdf/2405.13866v1 | null |
| 2024-05-22 | Perceptual Fairness in Image Restoration | 图像恢复中的感知公平 | Guy Ohayon, Michael Elad, Tomer Michaeli | http://arxiv.org/pdf/2405.13805v1 | null |
| 2024-05-22 | NeurCross: A Self-Supervised Neural Approach for Representing Cross Fields in Quad Mesh Generation | NeurCross:一种用于表示四边形网格生成中交叉场的自监督神经方法 | Qiujie Dong, Huibiao Wen, Rui Xu, Xiaokang Yu, Jiaran Zhou, Shuangmin Chen, Shiqing Xin, Changhe Tu, Wenping Wang | http://arxiv.org/pdf/2405.13745v1 | null |
| 2024-05-22 | AltChart: Enhancing VLM-based Chart Summarization Through Multi-Pretext Tasks | AltChart:通过多前置任务增强基于 VLM 的图表摘要 | Omar Moured, Jiaming Zhang, M. Saquib Sarfraz, Rainer Stiefelhagen | http://arxiv.org/pdf/2405.13580v1 | null |
| 2024-05-22 | A Perspective Analysis of Handwritten Signature Technology | 手写签名技术透视分析 | Moises Diaz, Miguel A. Ferrer, Donato Impedovo, Muhammad Imran Malik, Giuseppe Pirlo, Rejean Plamondon | http://arxiv.org/pdf/2405.13555v1 | null |
| 2024-05-22 | HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera | HR-INR:通过事件相机实现连续时空视频超分辨率 | Yunfan Lu, Zipeng Wang, Yusheng Wang, Hui Xiong | http://arxiv.org/pdf/2405.13389v1 | null |
| 2024-05-22 | Part-based Quantitative Analysis for Heatmaps | 基于部件的热图定量分析 | Osman Tursun, Sinan Kalkan, Simon Denman, Sridha Sridharan, Clinton Fookes | http://arxiv.org/pdf/2405.13264v1 | null |