Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-05-22 | MagicPose4D: Crafting Articulated Models with Appearance and Motion Control | MagicPose4D:制作具有外观和运动控制的铰接模型 | Hao Zhang, Di Chang, Fang Li, Mohammad Soleymani, Narendra Ahuja | http://arxiv.org/pdf/2405.14017v1 | null |
2024-05-22 | Learning Latent Space Hierarchical EBM Diffusion Models | 学习潜在空间分层 EBM 扩散模型 | Jiali Cui, Tian Han | http://arxiv.org/pdf/2405.13910v1 | null |
2024-05-22 | FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition | FreeCustom:用于多概念合成的免调整定制图像生成 | Ganggui Ding, Canyu Zhao, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen | http://arxiv.org/pdf/2405.13870v1 | null |
2024-05-22 | ReVideo: Remake a Video with Motion and Content Control | ReVideo:通过动作和内容控制重新制作视频 | Chong Mou, Mingdeng Cao, Xintao Wang, Zhaoyang Zhang, Ying Shan, Jian Zhang | http://arxiv.org/pdf/2405.13865v1 | null |
2024-05-22 | Robust Disaster Assessment from Aerial Imagery Using Text-to-Image Synthetic Data | 使用文本到图像合成数据对航空影像进行稳健的灾害评估 | Tarun Kalluri, Jihyeon Lee, Kihyuk Sohn, Sahil Singla, Manmohan Chandraker, Joseph Xu, Jeremiah Liu | http://arxiv.org/pdf/2405.13779v1 | null |
2024-05-22 | A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation | 用于视听生成的混合噪声级多功能扩散 Transformer | Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, et al. | http://arxiv.org/pdf/2405.13762v1 | null |
2024-05-22 | ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models | ComboStoc:扩散生成模型的组合随机性 | Rui Xu, Jiepeng Wang, Hao Pan, Yang Liu, Xin Tong, Shiqing Xin, Changhe Tu, Taku Komura, Wenping Wang | http://arxiv.org/pdf/2405.13729v1 | null |
2024-05-22 | InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos | InstaDrag:从视频中实现快速、准确的基于拖动的图像编辑 | Yujun Shi, Jun Hao Liew, Hanshu Yan, Vincent Y. F. Tan, Jiashi Feng | http://arxiv.org/pdf/2405.13722v1 | null |
2024-05-22 | Prompt Mixing in Diffusion Models using the Black Scholes Algorithm | 使用 Black Scholes 算法在扩散模型中进行提示混合 | Divya Kothandaraman, Ming Lin, Dinesh Manocha | http://arxiv.org/pdf/2405.13685v1 | null |
2024-05-22 | Curriculum Direct Preference Optimization for Diffusion and Consistency Models | 扩散和一致性模型的课程直接偏好优化 | Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Nicu Sebe, Mubarak Shah | http://arxiv.org/pdf/2405.13637v1 | null |
2024-05-22 | MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation | MetaEarth:全球范围遥感图像生成的生成基础模型 | Zhiping Yu, Chenyang Liu, Liqin Liu, Zhenwei Shi, Zhengxia Zou | http://arxiv.org/pdf/2405.13570v1 | null |
2024-05-22 | MotionCraft: Physics-based Zero-Shot Video Generation | MotionCraft:基于物理的零样本视频生成 | Luca Savant Aira, Antonio Montanaro, Emanuele Aiello, Diego Valsesia, Enrico Magli | http://arxiv.org/pdf/2405.13557v1 | null |
2024-05-22 | Directly Denoising Diffusion Model | 直接去噪扩散模型 | Dan Zhang, Jingjing Wang, Feng Luo | http://arxiv.org/pdf/2405.13540v1 | null |
2024-05-22 | Class-Conditional self-reward mechanism for improved Text-to-Image models | 用于改进文本到图像模型的类条件自我奖励机制 | Safouane El Ghazouali, Arnaud Gucciardi, Umberto Michelucci | http://arxiv.org/pdf/2405.13473v1 | null |
2024-05-22 | Markerless retro-identification complements re-identification of individual insect subjects in archived image data of biological experiments | 无标记逆向识别补充了生物实验存档图像数据中个体昆虫受试者的重新识别 | Asaduz Zaman, Vanessa Kellermann, Alan Dorin | http://arxiv.org/pdf/2405.13376v1 | null |
2024-05-22 | How to Trace Latent Generative Model Generated Images without Artificial Watermark? | 如何在没有人工水印的情况下追踪潜在生成模型生成的图像? | Zhenting Wang, Vikash Sehwag, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas, Shiqing Ma | http://arxiv.org/pdf/2405.13360v1 | null |
2024-05-22 | Single color virtual H&E staining with In-and-Out Net | 使用 In-and-Out Net 进行单色虚拟 H&E 染色 | Mengkun Chen, Yen-Tung Liu, Fadeel Sher Khan, Matthew C. Fox, Jason S. Reichenberg, Fabiana C. P. S. Lopes, Katherine R. Sebastian, Mia K. Markey, James W. Tunnell | http://arxiv.org/pdf/2405.13278v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-05-22 | I2I-Mamba: Multi-modal medical image synthesis via selective state space modeling | I2I-Mamba:通过选择性状态空间建模实现多模态医学图像合成 | Omer F. Atli, Bilal Kabas, Fuat Arslan, Mahmut Yurt, Onat Dalmaz, Tolga Çukur | http://arxiv.org/pdf/2405.14022v1 | null |
2024-05-22 | BrainMorph: A Foundational Keypoint Model for Robust and Flexible Brain MRI Registration | BrainMorph:稳健且灵活的脑 MRI 配准的基础关键点模型 | Alan Q. Wang, Rachit Saluja, Heejong Kim, Xinzi He, Adrian Dalca, Mert R. Sabuncu | http://arxiv.org/pdf/2405.14019v1 | null |
2024-05-22 | PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery | PitVQA:基于图像的文本嵌入 LLM,用于垂体手术中的视觉问答 | Runlong He, Mengya Xu, Adrito Das, Danyal Z. Khan, Sophia Bano, Hani J. Marcus, Danail Stoyanov, Matthew J. Clarkson, Mobarakol Islam | http://arxiv.org/pdf/2405.13949v1 | null |
2024-05-22 | Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models | 多模态大语言模型中视觉推理细化的思维图像提示 | Qiji Zhou, Ruochen Zhou, Zike Hu, Panzhong Lu, Siyang Gao, Yue Zhang | http://arxiv.org/pdf/2405.13872v1 | null |
2024-05-22 | Dense Connector for MLLMs | 用于 MLLM 的密集连接器 | Huanjin Yao, Wenhao Wu, Taojiannan Yang, YuXin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang | http://arxiv.org/pdf/2405.13800v1 | null |
2024-05-22 | No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models | 无过滤:对比视觉语言模型中的文化和社会经济多样性 | Angéline Pouget, Lucas Beyer, Emanuele Bugliarello, Xiao Wang, Andreas Peter Steiner, Xiaohua Zhai, Ibrahim Alabdulmohsin | http://arxiv.org/pdf/2405.13777v1 | null |
2024-05-22 | Safety Alignment for Vision Language Models | 视觉语言模型的安全对齐 | Zhendong Liu, Yuanbi Nie, Yingshui Tan, Xiangyu Yue, Qiushi Cui, Chongjun Wang, Xiaoyong Zhu, Bo Zheng | http://arxiv.org/pdf/2405.13581v1 | null |
2024-05-22 | Cross-Modal Distillation in Industrial Anomaly Detection: Exploring Efficient Multi-Modal IAD | 工业异常检测中的跨模态蒸馏:探索高效的多模态 IAD | Wenbo Sui, Daniel Lichau, Josselin Lefèvre, Harold Phelippeau | http://arxiv.org/pdf/2405.13571v1 | null |
2024-05-22 | Adapting Multi-modal Large Language Model to Concept Drift in the Long-tailed Open World | 多模态大语言模型适应长尾开放世界中的概念漂移 | Xiaoyu Yang, Jie Lu, En Yu | http://arxiv.org/pdf/2405.13459v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-05-22 | DoGaussian: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus | DoGaussian:通过高斯一致性进行大规模 3D 重建的分布式高斯泼溅 | Yu Chen, Gim Hee Lee | http://arxiv.org/pdf/2405.13943v1 | null |
2024-05-22 | Gaussian Time Machine: A Real-Time Rendering Methodology for Time-Variant Appearances | 高斯时间机器:时变外观的实时渲染方法 | Licheng Shen, Ho Ngai Chow, Lingyun Wang, Tong Zhang, Mengqiu Wang, Yuxing Han | http://arxiv.org/pdf/2405.13694v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-05-22 | Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation | 两个头比一个头更好:具有基于 2D 希尔伯特曲线的输出表示的神经网络量化 | Mykhailo Uss, Ruslan Yermolenko, Olena Kolodiazhna, Oleksii Shashko, Ivan Safonov, Volodymyr Savin, Yoonjae Yeo, Seowon Ji, Jaeyun Jeong | http://arxiv.org/pdf/2405.14024v1 | null |
2024-05-22 | DCT-Based Decorrelated Attention for Vision Transformers | 基于 DCT 的视觉变换器去相关注意力 | Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Koushik Biswas, Ahmet Cetin, Ulas Bagci | http://arxiv.org/pdf/2405.13901v1 | null |
2024-05-22 | QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input | QGait:通过二值化输入实现步态识别的精确量化 | Senmao Tian, Haoyu Gao, Gangyi Hong, Shuyun Wang, JingJie Wang, Xin Yu, Shunli Zhang | http://arxiv.org/pdf/2405.13859v1 | null |
2024-05-22 | Low-Resolution Chest X-ray Classification via Knowledge Distillation and Multi-task Learning | 通过知识蒸馏和多任务学习进行低分辨率胸部 X 射线分类 | Yasmeena Akhter, Rishabh Ranjan, Richa Singh, Mayank Vatsa | http://arxiv.org/pdf/2405.13370v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-05-22 | Learning rigid-body simulators over implicit shapes for large-scale scenes and vision | 针对大规模场景和视觉,通过隐式形状学习刚体模拟器 | Yulia Rubanova, Tatiana Lopez-Guevara, Kelsey R. Allen, William F. Whitney, Kimberly Stachenfeld, Tobias Pfaff | http://arxiv.org/pdf/2405.14045v1 | null |
2024-05-22 | One-shot Training for Video Object Segmentation | 视频对象分割的一次性训练 | Baiyu Chen, Sixian Chan, Xiaoqin Zhang | http://arxiv.org/pdf/2405.14010v1 | null |
2024-05-22 | AutoLCZ: Towards Automatized Local Climate Zone Mapping from Rule-Based Remote Sensing | AutoLCZ:通过基于规则的遥感实现自动化当地气候带测绘 | Chenying Liu, Hunsoo Song, Anamika Shreevastava, Conrad M Albrecht | http://arxiv.org/pdf/2405.13993v1 | null |
2024-05-22 | TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System | TS40K:农村地形和电力传输系统的 3D 点云数据集 | Diogo Lavado, Cláudia Soares, Alessandra Micheletti, Ricardo Santos, André Coelho, João Santos | http://arxiv.org/pdf/2405.13989v1 | null |
2024-05-22 | LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate | LookHere:具有定向注意力的视觉变换器进行概括和推断 | Anthony Fuller, Daniel G. Kyrollos, Yousef Yassin, James R. Green | http://arxiv.org/pdf/2405.13985v1 | null |
2024-05-22 | Optimizing Curvature Learning for Robust Hyperbolic Deep Learning in Computer Vision | 优化曲率学习以实现计算机视觉中的鲁棒双曲深度学习 | Ahmad Bdeir, Niels Landwehr | http://arxiv.org/pdf/2405.13979v1 | null |
2024-05-22 | ST-Gait++: Leveraging spatio-temporal convolutions for gait-based emotion recognition on videos | ST-Gait++:利用时空卷积实现基于步态的视频情绪识别 | Maria Luísa Lima, Willams de Lima Costa, Estefania Talavera Martinez, Veronica Teichrieb | http://arxiv.org/pdf/2405.13903v1 | null |
2024-05-22 | A General Framework for Jersey Number Recognition in Sports Video | 体育视频中球衣号码识别的通用框架 | Maria Koshkina, James H. Elder | http://arxiv.org/pdf/2405.13896v1 | null |
2024-05-22 | Just rotate it! Uncertainty estimation in closed-source models via multiple queries | 只需旋转它即可!通过多个查询估计闭源模型的不确定性 | Konstantinos Pitas, Julyan Arbel | http://arxiv.org/pdf/2405.13864v1 | null |
2024-05-22 | Hyperspectral Image Reconstruction for Predicting Chick Embryo Mortality Towards Advancing Egg and Hatchery Industry | 用于预测鸡胚死亡率的高光谱图像重建,促进鸡蛋和孵化行业的发展 | Md. Toukir Ahmed, Md Wadud Ahmed, Ocean Monjur, Jason Lee Emmert, Girish Chowdhary, Mohammed Kamruzzaman | http://arxiv.org/pdf/2405.13843v1 | null |
2024-05-22 | Multi-Dataset Multi-Task Learning for COVID-19 Prognosis | 用于 COVID-19 预后的多数据集多任务学习 | Filippo Ruffini, Lorenzo Tronchin, Zhuoru Wu, Wenting Chen, Paolo Soda, Linlin Shen, Valerio Guarrasi | http://arxiv.org/pdf/2405.13771v1 | null |
2024-05-22 | Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks | 神经网络中基于反事实梯度的预测信任量化 | Mohit Prabhushankar, Ghassan AlRegib | http://arxiv.org/pdf/2405.13758v1 | null |
2024-05-22 | A label-free and data-free training strategy for vasculature segmentation in serial sectioning OCT data | 连续切片 OCT 数据中脉管系统分割的无标签和无数据训练策略 | Etienne Chollet, Yael Balbastre, Caroline Magnain, Bruce Fischl, Hui Wang | http://arxiv.org/pdf/2405.13757v1 | null |
2024-05-22 | Optimizing Lymphocyte Detection in Breast Cancer Whole Slide Imaging through Data-Centric Strategies | 通过以数据为中心的策略优化乳腺癌全玻片成像中的淋巴细胞检测 | Amine Marzouki, Zhuxian Guo, Qinghe Zeng, Camille Kurtz, Nicolas Loménie | http://arxiv.org/pdf/2405.13710v1 | null |
2024-05-22 | Embedding Generalized Semantic Knowledge into Few-Shot Remote Sensing Segmentation | 将广义语义知识嵌入到少样本遥感分割中 | Yuyu Jia, Wei Huang, Junyu Gao, Qi Wang, Qiang Li | http://arxiv.org/pdf/2405.13686v1 | null |
2024-05-22 | Ultra-Fast Adaptive Track Detection Network | 超快速自适应轨迹检测网络 | Hai Ni, Rui Wang, Scarlett Liu | http://arxiv.org/pdf/2405.13538v1 | null |
2024-05-22 | PerSense: Personalized Instance Segmentation in Dense Images | PerSense:密集图像中的个性化实例分割 | Muhammad Ibraheem Siddiqui, Muhammad Umer Sheikh, Hassan Abid, Muhammad Haris Khan | http://arxiv.org/pdf/2405.13518v1 | null |
2024-05-22 | Continual Learning in Medical Imaging from Theory to Practice: A Survey and Practical Analysis | 医学影像从理论到实践的持续学习:调查与实践分析 | Mohammad Areeb Qazi, Anees Ur Rehman Hashmi, Santosh Sanjeev, Ibrahim Almakky, Numan Saeed, Mohammad Yaqub | http://arxiv.org/pdf/2405.13482v1 | null |
2024-05-22 | AdaFedFR: Federated Face Recognition with Adaptive Inter-Class Representation Learning | AdaFedFR:具有自适应类间表示学习的联合人脸识别 | Di Qiu, Xinyang Lin, Kaiye Wang, Xiangxiang Chu, Pengfei Yan | http://arxiv.org/pdf/2405.13467v1 | null |
2024-05-22 | A Label Propagation Strategy for CutMix in Multi-Label Remote Sensing Image Classification | 多标签遥感图像分类中 CutMix 的标签传播策略 | Tom Burgert, Tim Siebert, Kai Norman Clasen, Begüm Demir | http://arxiv.org/pdf/2405.13451v1 | null |
2024-05-22 | Dynamically enhanced static handwriting representation for Parkinson's disease detection | 用于帕金森病检测的动态增强静态手写表示 | Moises Diaz, Miguel Angel Ferrer, Donato Impedovo, Giuseppe Pirlo, Gennaro Vessio | http://arxiv.org/pdf/2405.13438v1 | null |
2024-05-22 | Multi Player Tracking in Ice Hockey with Homographic Projections | 使用单应投影进行冰球多人跟踪 | Harish Prakash, Jia Cheng Shang, Ken M. Nsiempba, Yuhao Chen, David A. Clausi, John S. Zelek | http://arxiv.org/pdf/2405.13397v1 | null |
2024-05-22 | Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation | 使用语言视觉提示进行低数据实例分割的无监督预训练 | Dingwen Zhang, Hao Li, Diqi He, Nian Liu, Lechao Cheng, Jingdong Wang, Junwei Han | http://arxiv.org/pdf/2405.13388v1 | null |
2024-05-22 | VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | VTG-LLM:将时间戳知识集成到视频 LLM 中以增强视频时间基础 | Yongxin Guo, Jingyu Liu, Mingda Li, Xiaoying Tang, Xi Chen, Bo Zhao | http://arxiv.org/pdf/2405.13382v1 | null |
2024-05-22 | Collaboration of Teachers for Semi-supervised Object Detection | 教师协作进行半监督目标检测 | Liyu Chen, Huaao Tang, Yi Wen, Hanting Chen, Wei Li, Junchao Liu, Jie Hu | http://arxiv.org/pdf/2405.13374v1 | null |
2024-05-22 | Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer | 语义公平聚类:Vision Transformer 的简单、快速且有效的策略 | Qihang Fan, Huaibo Huang, Mingrui Chen, Ran He | http://arxiv.org/pdf/2405.13337v1 | null |
2024-05-22 | Vision Transformer with Sparse Scan Prior | 具有稀疏扫描先验的视觉变换器 | Qihang Fan, Huaibo Huang, Mingrui Chen, Ran He | http://arxiv.org/pdf/2405.13335v1 | null |
2024-05-22 | Hybrid Multihead Attentive Unet-3D for Brain Tumor Segmentation | 用于脑肿瘤分割的混合多头 Attentive Unet-3D | Muhammad Ansab Butt, Absaar Ul Jabbar | http://arxiv.org/pdf/2405.13304v1 | null |
2024-05-22 | Enhancing Active Learning for Sentinel 2 Imagery through Contrastive Learning and Uncertainty Estimation | 通过对比学习和不确定性估计增强 Sentinel 2 图像的主动学习 | David Pogorzelski, Peter Arlinghaus | http://arxiv.org/pdf/2405.13285v1 | null |
2024-05-22 | FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging | 闪耀您的数据:天文成像中基于扩散的增强方法 | Mohammed Talha Alam, Raza Imam, Mohsen Guizani, Fakhri Karray | http://arxiv.org/pdf/2405.13267v1 | null |
2024-05-22 | Traffic control using intelligent timing of traffic lights with reinforcement learning technique and real-time processing of surveillance camera images | 利用强化学习技术和实时处理监控摄像头图像的交通信号灯智能定时进行交通控制 | Mahdi Jamebozorg, Mohsen Hami, Sajjad Deh Deh Jani | http://arxiv.org/pdf/2405.13256v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-05-22 | Computer-Vision-Enabled Worker Video Analysis for Motion Amount Quantification | 用于运动量量化的计算机视觉工人视频分析 | Hari Iyer, Neel Macwan, Shenghan Guo, Heejin Jeong | http://arxiv.org/pdf/2405.13999v1 | null |
2024-05-22 | Addressing the Elephant in the Room: Robust Animal Re-Identification with Unsupervised Part-Based Feature Alignment | 解决房间里的大象:鲁棒的动物重新识别与无监督的基于部分的特征对齐 | Yingxue Yu, Vidit Vidit, Andrey Davydov, Martin Engilberge, Pascal Fua | http://arxiv.org/pdf/2405.13781v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-05-22 | TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment | TOPA:通过纯文本预对齐扩展用于视频理解的大型语言模型 | Wei Li, Hehe Fan, Yongkang Wong, Mohan Kankanhalli, Yi Yang | http://arxiv.org/pdf/2405.13911v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-05-22 | Mitigating Interference in the Knowledge Continuum through Attention-Guided Incremental Learning | 通过注意力引导的增量学习减轻知识连续体中的干扰 | Prashant Bhat, Bharath Renjith, Elahe Arani, Bahram Zonooz | http://arxiv.org/pdf/2405.13978v1 | null |
2024-05-22 | Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching | 基于仿射的可变形注意力和半密集匹配的选择性融合 | Hongkai Chen, Zixin Luo, Yurun Tian, Xuyang Bai, Ziyu Wang, Lei Zhou, Mingmin Zhen, Tian Fang, David McKinnon, Yanghai Tsin, et al. | http://arxiv.org/pdf/2405.13874v1 | null |
2024-05-22 | MAGIC: Map-Guided Few-Shot Audio-Visual Acoustics Modeling | MAGIC:地图引导的少样本视听声学建模 | Diwei Huang, Kunyang Lin, Peihao Chen, Qing Du, Mingkui Tan | http://arxiv.org/pdf/2405.13860v1 | null |
2024-05-22 | GMMFormer v2: An Uncertainty-aware Framework for Partially Relevant Video Retrieval | GMMFormer v2:用于部分相关视频检索的不确定性感知框架 | Yuting Wang, Jinpeng Wang, Bin Chen, Tao Dai, Ruisheng Luo, Shu-Tao Xia | http://arxiv.org/pdf/2405.13824v1 | null |
2024-05-22 | Context and Geometry Aware Voxel Transformer for Semantic Scene Completion | 用于语义场景完成的上下文和几何感知体素转换器 | Zhu Yu, Runming Zhang, Jiacheng Ying, Junchen Yu, Xiaohai Hu, Lun Luo, Siyuan Cao, Huiliang Shen | http://arxiv.org/pdf/2405.13675v1 | null |
2024-05-22 | Advancing Spiking Neural Networks towards Multiscale Spatiotemporal Interaction Learning | 推进脉冲神经网络走向多尺度时空交互学习 | Yimeng Shan, Malu Zhang, Rui-jie Zhu, Xuerui Qiu, Jason K. Eshraghian, Haicheng Qu | http://arxiv.org/pdf/2405.13672v1 | null |
2024-05-22 | Comparative Analysis of Hyperspectral Image Reconstruction Using Deep Learning for Agricultural and Biological Applications | 使用深度学习进行农业和生物应用的高光谱图像重建的比较分析 | Md. Toukir Ahmed, Mohammed Kamruzzaman | http://arxiv.org/pdf/2405.13331v1 | null |
2024-05-22 | AUGlasses: Continuous Action Unit based Facial Reconstruction with Low-power IMUs on Smart Glasses | AUGlasses:基于连续动作单元的智能眼镜面部重建,采用低功耗 IMU | Yanrong Li, Tengxiang Zhang, Xin Zeng, Yuntao Wang, Haotian Zhang, Yiqiang Chen | http://arxiv.org/pdf/2405.13289v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-05-22 | Monocular Gaussian SLAM with Language Extended Loop Closure | 具有语言扩展回环检测的单目高斯 SLAM | Tian Lan, Qinwei Lin, Haoqian Wang | http://arxiv.org/pdf/2405.13748v1 | null |
2024-05-22 | EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views | EgoChoir:从自我中心视角捕捉 3D 人与物体交互区域 | Yuhang Yang, Wei Zhai, Chengfeng Wang, Chengjun Yu, Yang Cao, Zheng-Jun Zha | http://arxiv.org/pdf/2405.13659v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-05-22 | Rehearsal-free Federated Domain-incremental Learning | 免演练联邦域增量学习 | Rui Sun, Haoran Duan, Jiahua Dong, Varun Ojha, Tejal Shah, Rajiv Ranjan | http://arxiv.org/pdf/2405.13900v1 | null |
2024-05-22 | What Makes Good Few-shot Examples for Vision-Language Models? | 是什么造就了视觉语言模型的良好小样本示例? | Zhaojun Guo, Jinghui Lu, Xuejing Liu, Rui Zhao, ZhenXing Qian, Fei Tan | http://arxiv.org/pdf/2405.13532v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-05-22 | Refining Skewed Perceptions in Vision-Language Models through Visual Representations | 通过视觉表示改善视觉语言模型中的偏差感知 | Haocheng Dai, Sarang Joshi | http://arxiv.org/pdf/2405.14030v1 | null |
2024-05-22 | Text Prompting for Multi-Concept Video Customization by Autoregressive Generation | 通过自回归生成进行多概念视频定制的文本提示 | Divya Kothandaraman, Kihyuk Sohn, Ruben Villegas, Paul Voigtlaender, Dinesh Manocha, Mohammad Babaeizadeh | http://arxiv.org/pdf/2405.13951v1 | null |
2024-05-22 | Koopcon: A new approach towards smarter and less complex learning | Koopcon:一种实现更智能、更简单学习的新方法 | Vahid Jebraeeli, Bo Jiang, Derya Cansever, Hamid Krim | http://arxiv.org/pdf/2405.13866v1 | null |
2024-05-22 | Perceptual Fairness in Image Restoration | 图像恢复中的感知公平 | Guy Ohayon, Michael Elad, Tomer Michaeli | http://arxiv.org/pdf/2405.13805v1 | null |
2024-05-22 | NeurCross: A Self-Supervised Neural Approach for Representing Cross Fields in Quad Mesh Generation | NeurCross:一种用于表示四边形网格生成中交叉场的自监督神经方法 | Qiujie Dong, Huibiao Wen, Rui Xu, Xiaokang Yu, Jiaran Zhou, Shuangmin Chen, Shiqing Xin, Changhe Tu, Wenping Wang | http://arxiv.org/pdf/2405.13745v1 | null |
2024-05-22 | AltChart: Enhancing VLM-based Chart Summarization Through Multi-Pretext Tasks | AltChart:通过多前置任务增强基于 VLM 的图表摘要 | Omar Moured, Jiaming Zhang, M. Saquib Sarfraz, Rainer Stiefelhagen | http://arxiv.org/pdf/2405.13580v1 | null |
2024-05-22 | A Perspective Analysis of Handwritten Signature Technology | 手写签名技术透视分析 | Moises Diaz, Miguel A. Ferrer, Donato Impedovo, Muhammad Imran Malik, Giuseppe Pirlo, Rejean Plamondon | http://arxiv.org/pdf/2405.13555v1 | null |
2024-05-22 | HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera | HR-INR:通过事件相机实现连续时空视频超分辨率 | Yunfan Lu, Zipeng Wang, Yusheng Wang, Hui Xiong | http://arxiv.org/pdf/2405.13389v1 | null |
2024-05-22 | Part-based Quantitative Analysis for Heatmaps | 基于部件的热图定量分析 | Osman Tursun, Sinan Kalkan, Simon Denman, Sridha Sridharan, Clinton Fookes | http://arxiv.org/pdf/2405.13264v1 | null |