2024-03-21.md
[UPDATED!] 2024-03-21 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-21 | Simplified Diffusion Schrödinger Bridge | 简化扩散薛定谔桥 | Zhicong Tang, Tiankai Hang, Shuyang Gu, Dong Chen, Baining Guo | http://arxiv.org/pdf/2403.14623v1 | null |
| 2024-03-21 | GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation | GRM:用于高效 3D 重建和生成的大型高斯重建模型 | Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein | http://arxiv.org/pdf/2403.14621v1 | null |
| 2024-03-21 | ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition | ClusteringSDF:用于 3D 分解的自组织神经隐式曲面 | Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham, Qianyi Wu | http://arxiv.org/pdf/2403.14619v1 | null |
| 2024-03-21 | DreamReward: Text-to-3D Generation with Human Preference | DreamReward:根据人类偏好生成文本到 3D | Junliang Ye, Fangfu Liu, Qixiu Li, Zhengyi Wang, Yikai Wang, Xinzhou Wang, Yueqi Duan, Jun Zhu | http://arxiv.org/pdf/2403.14613v1 | null |
| 2024-03-21 | ReNoise: Real Image Inversion Through Iterative Noising | ReNoise:通过迭代噪声进行真实图像反转 | Daniel Garibi, Or Patashnik, Andrey Voynov, Hadar Averbuch-Elor, Daniel Cohen-Or | http://arxiv.org/pdf/2403.14602v1 | null |
| 2024-03-21 | Object-Centric Domain Randomization for 3D Shape Reconstruction in the Wild | 用于野外 3D 形状重建的以对象为中心的域随机化 | Junhyeong Cho, Kim Youwang, Hunmin Yang, Tae-Hyun Oh | http://arxiv.org/pdf/2403.14539v1 | null |
| 2024-03-21 | HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression | HAC:用于 3D 高斯泼溅压缩的哈希网格辅助上下文 | Yihang Chen, Qianyi Wu, Jianfei Cai, Mehrtash Harandi, Weiyao Lin | http://arxiv.org/pdf/2403.14530v1 | null |
| 2024-03-21 | Click to Grasp: Zero-Shot Precise Manipulation via Visual Diffusion Descriptors | 点击掌握:通过视觉扩散描述符进行零射击精确操作 | Nikolaos Tsagkas, Jack Rome, Subramanian Ramamoorthy, Oisin Mac Aodha, Chris Xiaoxuan Lu | http://arxiv.org/pdf/2403.14526v1 | null |
| 2024-03-21 | Denoising Diffusion Models for 3D Healthy Brain Tissue Inpainting | 用于 3D 健康脑组织修复的去噪扩散模型 | Alicia Durrer, Julia Wolleb, Florentin Bieder, Paul Friedrich, Lester Melie-Garcia, Mario Ocampo-Pineda, Cosmin I. Bercea, Ibrahim E. Hamamci, Benedikt Wiestler, Marie Piraud, et al. | http://arxiv.org/pdf/2403.14499v1 | null |
| 2024-03-21 | Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation | 用于半监督组织病理学分割的风格提取扩散模型 | Mathias Öttl, Frauke Wilm, Jana Steenpass, Jingna Qiu, Matthias Rübner, Arndt Hartmann, Matthias Beckmann, Peter Fasching, Andreas Maier, Ramona Erber, et al. | http://arxiv.org/pdf/2403.14429v1 | null |
| 2024-03-21 | DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning | DP-RDM:无需微调即可使扩散模型适应私有域 | Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo | http://arxiv.org/pdf/2403.14421v1 | null |
| 2024-03-21 | OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation | OA-CNN:用于 3D 语义分割的全自适应稀疏 CNN | Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, Jiaya Jia | http://arxiv.org/pdf/2403.14418v1 | null |
| 2024-03-21 | A Bag of Tricks for Few-Shot Class-Incremental Learning | 少样本类增量学习的一大堆技巧 | Shuvendu Roy, Chunjong Park, Aldi Fahrezi, Ali Etemad | http://arxiv.org/pdf/2403.14392v1 | null |
| 2024-03-21 | InfNeRF: Towards Infinite Scale NeRF Rendering with O(log n) Space Complexity | InfNeRF:以 O(log n) 空间复杂度实现无限规模 NeRF 渲染 | Jiabin Liang, Lanqing Zhang, Zhuoran Zhao, Xiangyu Xu | http://arxiv.org/pdf/2403.14376v1 | null |
| 2024-03-21 | Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models | 具有令牌优化的开放词汇注意力图用于扩散模型中的语义分割 | Pablo Marcos-Manchón, Roberto Alcover-Couso, Juan C. SanMiguel, Jose M. Martínez | http://arxiv.org/pdf/2403.14291v1 | null |
| 2024-03-21 | Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation | Zero123-6D:用于 RGB 类别级 6D 姿势估计的零样本新颖视图合成 | Francesco Di Felice, Alberto Remus, Stefano Gasperini, Benjamin Busam, Lionel Ott, Federico Tombari, Roland Siegwart, Carlo Alberto Avizzano | http://arxiv.org/pdf/2403.14279v1 | null |
| 2024-03-21 | Diffusion Models with Ensembled Structure-Based Anomaly Scoring for Unsupervised Anomaly Detection | 用于无监督异常检测的具有基于集成结构的异常评分的扩散模型 | Finn Behrendt, Debayan Bhattacharya, Lennart Maack, Julia Krüger, Roland Opfer, Robin Mieling, Alexander Schlaefer | http://arxiv.org/pdf/2403.14262v1 | null |
| 2024-03-21 | StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN | StyleCineGAN:使用预先训练的 StyleGAN 生成景观电影图片 | Jongwoo Choi, Kwanggyoon Seo, Amirsaman Ashtari, Junyong Noh | http://arxiv.org/pdf/2403.14186v1 | null |
| 2024-03-21 | Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians | Mini-Splatting:用有限数量的高斯表示场景 | Guangchi Fang, Bing Wang | http://arxiv.org/pdf/2403.14166v1 | null |
| 2024-03-21 | Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition | 通过内容帧运动潜在分解的高效视频扩散模型 | Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar | http://arxiv.org/pdf/2403.14148v1 | null |
| 2024-03-21 | Powerful Lossy Compression for Noisy Images | 针对噪声图像的强大有损压缩 | Shilv Cai, Xiaoguo Liang, Shuning Cao, Luxin Yan, Sheng Zhong, Liqun Chen, Xu Zou | http://arxiv.org/pdf/2403.14135v1 | null |
| 2024-03-21 | QSMDiff: Unsupervised 3D Diffusion Models for Quantitative Susceptibility Mapping | QSMDiff:用于定量磁化率绘图的无监督 3D 扩散模型 | Zhuang Xiong, Wei Jiang, Yang Gao, Feng Liu, Hongfu Sun | http://arxiv.org/pdf/2403.14070v1 | null |
| 2024-03-21 | LeFusion: Synthesizing Myocardial Pathology on Cardiac MRI via Lesion-Focus Diffusion Models | LeFusion:通过病变焦点扩散模型在心脏 MRI 上综合心肌病理学 | Hantao Zhang, Jiancheng Yang, Shouhong Wan, Pascal Fua | http://arxiv.org/pdf/2403.14066v1 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-21 | MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | MathVerse:您的多模态大语言模型能否真正看到视觉数学问题中的图表? | Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, et al. | http://arxiv.org/pdf/2403.14624v1 | null |
| 2024-03-21 | Language Repository for Long Video Understanding | 长视频理解语言库 | Kumara Kahatapitiya, Kanchana Ranasinghe, Jongwoo Park, Michael S. Ryoo | http://arxiv.org/pdf/2403.14622v1 | null |
| 2024-03-21 | PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model | PSALM:具有大型多模态模型的像素分割 | Zheng Zhang, Yeyao Ma, Enming Zhang, Xiang Bai | http://arxiv.org/pdf/2403.14598v1 | null |
| 2024-03-21 | Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference | Cobra:将 Mamba 扩展到多模态大型语言模型以实现高效推理 | Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, Donglin Wang | http://arxiv.org/pdf/2403.14520v1 | null |
| 2024-03-21 | Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination | Pensieve:回顾然后比较可以减轻幻视 | Dingchen Yang, Bowen Cao, Guang Chen, Changjun Jiang | http://arxiv.org/pdf/2403.14401v1 | null |
| 2024-03-21 | LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding | LayoutLLM:大型语言模型指令调整,以实现视觉丰富的文档理解 | Masato Fujitake | http://arxiv.org/pdf/2403.14252v1 | null |
| 2024-03-21 | Dermacen Analytica: A Novel Methodology Integrating Multi-Modal Large Language Models with Machine Learning in tele-dermatology | Dermacen Analytica:一种将多模态大型语言模型与远程皮肤病学机器学习相结合的新方法 | Dimitrios P. Panagoulias, Evridiki Tsoureli-Nikita, Maria Virvou, George A. Tsihrintzis | http://arxiv.org/pdf/2403.14243v1 | null |
| 2024-03-21 | Unsupervised Audio-Visual Segmentation with Modality Alignment | 具有模态对齐的无监督视听分割 | Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiangkang Deng, Xiatian Zhu | http://arxiv.org/pdf/2403.14203v1 | null |
| 2024-03-21 | OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation | OTSeg:零样本语义分割的多提示 Sinkhorn 注意力 | Kwanyoung Kim, Yujin Oh, Jong Chul Ye | http://arxiv.org/pdf/2403.14183v1 | null |
| 2024-03-21 | Leveraging Large Language Model-based Room-Object Relationships Knowledge for Enhancing Multimodal-Input Object Goal Navigation | 利用基于大语言模型的房间-对象关系知识来增强多模式输入对象目标导航 | Leyuan Sun, Asako Kanezaki, Guillaume Caron, Yusuke Yoshiyasu | http://arxiv.org/pdf/2403.14163v1 | null |
| 2024-03-21 | Empowering Segmentation Ability to Multi-modal Large Language Models | 增强多模态大型语言模型的细分能力 | Yuqi Yang, Peng-Tao Jiang, Jing Wang, Hao Zhang, Kai Zhao, Jinwei Chen, Bo Li | http://arxiv.org/pdf/2403.14141v1 | null |
| 2024-03-21 | Leveraging Thermal Modality to Enhance Reconstruction in Low-Light Conditions | 利用热模态增强弱光条件下的重建 | Jiacong Xu, Mingqian Liao, K Ram Prabhakar, Vishal M. Patel | http://arxiv.org/pdf/2403.14053v1 | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-21 | CombiNeRF: A Combination of Regularization Techniques for Few-Shot Neural Radiance Field View Synthesis | CombiNeRF:用于少样本神经辐射场视图合成的正则化技术组合 | Matteo Bonotto, Luigi Sarrocco, Daniele Evangelista, Marco Imperoli, Alberto Pretto | http://arxiv.org/pdf/2403.14412v1 | null |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-21 | MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images | MVSplat:稀疏多视图图像的高效 3D 高斯分布 | Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai | http://arxiv.org/pdf/2403.14627v1 | null |
| 2024-03-21 | Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering | 高斯磨砂:具有实时渲染的可编辑复杂辐射场 | Antoine Guédon, Vincent Lepetit | http://arxiv.org/pdf/2403.14554v1 | null |
| 2024-03-21 | Isotropic Gaussian Splatting for Real-Time Radiance Field Rendering | 用于实时辐射场渲染的各向同性高斯喷射 | Yuanhao Gong, Lantao Yu, Guanghui Yue | http://arxiv.org/pdf/2403.14244v1 | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-21 | Learning to Project for Cross-Task Knowledge Distillation | 学习项目以进行跨任务知识蒸馏 | Dylan Auty, Roy Miles, Benedikt Kolbeinsson, Krystian Mikolajczyk | http://arxiv.org/pdf/2403.14494v1 | null |
| 2024-03-21 | Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels | 标签不足的开放式视频问答的排名蒸馏 | Tianming Liang, Chaolei Tan, Beihao Xia, Wei-Shi Zheng, Jian-Fang Hu | http://arxiv.org/pdf/2403.14430v1 | null |
| 2024-03-21 | Accelerating ViT Inference on FPGA through Static and Dynamic Pruning | 通过静态和动态修剪加速 FPGA 上的 ViT 推理 | Dhruv Parikh, Shouyi Li, Bingyi Zhang, Rajgopal Kannan, Carl Busart, Viktor Prasanna | http://arxiv.org/pdf/2403.14047v1 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-21 | ODTFormer: Efficient Obstacle Detection and Tracking with Stereo Cameras Based on Transformer | ODTFormer:基于 Transformer 的立体相机高效障碍物检测与跟踪 | Tianye Ding, Hongyu Li, Huaizu Jiang | http://arxiv.org/pdf/2403.14626v1 | null |
| 2024-03-21 | LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors | LiFT:密集 ViT 描述符的极其简单的轻量级特征转换 | Saksham Suri, Matthew Walmer, Kamal Gupta, Abhinav Shrivastava | http://arxiv.org/pdf/2403.14625v1 | null |
| 2024-03-21 | T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | T-Rex2:通过文本-视觉提示协同实现通用对象检测 | Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, Lei Zhang | http://arxiv.org/pdf/2403.14610v1 | null |
| 2024-03-21 | VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition | VXP:体素-跨像素大尺寸图像-LiDAR地点识别 | Yun-Jin Li, Mariia Gladkova, Yan Xia, Rui Wang, Daniel Cremers | http://arxiv.org/pdf/2403.14594v1 | null |
| 2024-03-21 | Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer | 代币转换很重要:对 Vision Transformer 进行忠实的事后解释 | Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan | http://arxiv.org/pdf/2403.14552v1 | null |
| 2024-03-21 | Estimating Physical Information Consistency of Channel Data Augmentation for Remote Sensing Images | 估计遥感图像通道数据增强的物理信息一致性 | Tom Burgert, Begüm Demir | http://arxiv.org/pdf/2403.14547v1 | null |
| 2024-03-21 | Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets | 资源贫乏数据集中跨数据集隔离手语识别的迁移学习 | Ahmet Alp Kindiroglu, Ozgur Kara, Ogulcan Ozdemir, Lale Akarun | http://arxiv.org/pdf/2403.14534v1 | null |
| 2024-03-21 | Invisible Needle Detection in Ultrasound: Leveraging Mechanism-Induced Vibration | 超声波中的隐形针检测:利用机制引起的振动 | Chenyang Li, Dianye Huang, Angelos Karlas, Nassir Navab, Zhongliang Jiang | http://arxiv.org/pdf/2403.14523v1 | null |
| 2024-03-21 | MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection | MULDE:通过去噪分数匹配进行多尺度对数密度估计,用于视频异常检测 | Jakub Micorek, Horst Possegger, Dominik Narnhofer, Horst Bischof, Mateusz Kozinski | http://arxiv.org/pdf/2403.14497v1 | null |
| 2024-03-21 | Adversary-Robust Graph-Based Learning of WSIs | 基于对抗鲁棒图的 WSI 学习 | Saba Heidari Gheshlaghi, Milan Aryal, Nasim Yahyasoltani, Masoud Ganji | http://arxiv.org/pdf/2403.14489v1 | null |
| 2024-03-21 | DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing | DesignEdit:多层潜在分解和融合,实现统一准确的图像编辑 | Yueru Jia, Yuhui Yuan, Aosong Cheng, Chuke Wang, Ji Li, Huizhu Jia, Shanghang Zhang | http://arxiv.org/pdf/2403.14487v1 | null |
| 2024-03-21 | HyperGALE: ASD Classification via Hypergraph Gated Attention with Learnable Hyperedges | HyperGALE:通过具有可学习超边的超图门控注意力进行 ASD 分类 | Mehul Arora, Chirag Shantilal Jain, Lalith Bharadwaj Baru, Kamalaker Dadi, Bapi Raju Surampudi | http://arxiv.org/pdf/2403.14484v1 | null |
| 2024-03-21 | CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers | CathFlow:使用光流和变压器对介入超声中的导管进行自监督分割 | Alex Ranne, Liming Kuang, Yordanka Velikova, Nassir Navab, Ferdinando Rodriguez y Baena | http://arxiv.org/pdf/2403.14465v1 | null |
| 2024-03-21 | Analysing Diffusion Segmentation for Medical Images | 分析医学图像的扩散分割 | Mathias Öttl, Siyuan Mei, Frauke Wilm, Jana Steenpass, Matthias Rübner, Arndt Hartmann, Matthias Beckmann, Peter Fasching, Andreas Maier, Ramona Erber, et al. | http://arxiv.org/pdf/2403.14440v1 | null |
| 2024-03-21 | Raw Instinct: Trust Your Classifiers and Skip the Conversion | 原始本能:相信您的分类器并跳过转换 | Christos Kantas, Bjørk Antoniussen, Mathias V. Andersen, Rasmus Munksø, Shobhit Kotnala, Simon B. Jensen, Andreas Møgelmose, Lau Nørgaard, Thomas B. Moeslund | http://arxiv.org/pdf/2403.14439v1 | null |
| 2024-03-21 | Biased Binary Attribute Classifiers Ignore the Majority Classes | 有偏差的二元属性分类器忽略大多数类 | Xinyi Zhang, Johanna Sophie Bieri, Manuel Günther | http://arxiv.org/pdf/2403.14435v1 | null |
| 2024-03-21 | Tensor network compressibility of convolutional models | 卷积模型的张量网络可压缩性 | Sukhbinder Singh, Saeed S. Jahromi, Roman Orus | http://arxiv.org/pdf/2403.14379v1 | null |
| 2024-03-21 | Varroa destructor detection on honey bees using hyperspectral imagery | 使用高光谱图像检测蜜蜂瓦螨破坏者 | Zina-Sabrina Duma, Tomas Zemcik, Simon Bilik, Tuomas Sihvonen, Peter Honec, Satu-Pia Reinikainen, Karel Horak | http://arxiv.org/pdf/2403.14359v1 | null |
| 2024-03-21 | LDTR: Transformer-based Lane Detection with Anchor-chain Representation | LDTR:具有锚链表示的基于变压器的车道检测 | Zhongyu Yang, Chen Shen, Wei Shao, Tengfei Xing, Runbo Hu, Pengfei Xu, Hua Chai, Ruini Xue | http://arxiv.org/pdf/2403.14354v1 | null |
| 2024-03-21 | Annotation-Efficient Polyp Segmentation via Active Learning | 通过主动学习进行注释高效的息肉分割 | Duojun Huang, Xinyu Xiong, De-Jun Fan, Feng Gao, Xiao-Jian Wu, Guanbin Li | http://arxiv.org/pdf/2403.14350v1 | null |
| 2024-03-21 | Towards Efficient Information Fusion: Concentric Dual Fusion Attention Based Multiple Instance Learning for Whole Slide Images | 迈向高效信息融合:基于同心双融合注意力的整个幻灯片图像的多实例学习 | Yujian Liu, Ruoxuan Wu, Xinjie Shen, Zihuang Lu, Lingyu Liang, Haiyu Zhou, Shipu Xu, Shaoai Cai, Shidang Xu | http://arxiv.org/pdf/2403.14346v1 | null |
| 2024-03-21 | FFT-based Selection and Optimization of Statistics for Robust Recognition of Severely Corrupted Images | 基于 FFT 的统计选择和优化,用于严重损坏图像的鲁棒识别 | Elena Camuffo, Umberto Michieli, Jijoong Moon, Daehyun Kim, Mete Ozay | http://arxiv.org/pdf/2403.14335v1 | null |
| 2024-03-21 | Exosense: A Vision-Centric Scene Understanding System For Safe Exoskeleton Navigation | Exosense:用于安全外骨骼导航的以视觉为中心的场景理解系统 | Jianeng Wang, Matias Mattamala, Christina Kassab, Lintong Zhang, Maurice Fallon | http://arxiv.org/pdf/2403.14320v1 | null |
| 2024-03-21 | A Lightweight Attention-based Deep Network via Multi-Scale Feature Fusion for Multi-View Facial Expression Recognition | 通过多尺度特征融合的轻量级基于注意力的深度网络用于多视图面部表情识别 | Ali Ezati, Mohammadreza Dezyani, Rajib Rana, Roozbeh Rajabi, Ahmad Ayatollahi | http://arxiv.org/pdf/2403.14318v1 | null |
| 2024-03-21 | Impact Assessment of Missing Data in Model Predictions for Earth Observation Applications | 地球观测应用模型预测中缺失数据的影响评估 | Francisco Mena, Diego Arenas, Marcela Charfuelan, Marlon Nuske, Andreas Dengel | http://arxiv.org/pdf/2403.14297v1 | null |
| 2024-03-21 | Exploring Green AI for Audio Deepfake Detection | 探索用于音频 Deepfake 检测的绿色 AI | Subhajit Saha, Md Sahidullah, Swagatam Das | http://arxiv.org/pdf/2403.14290v1 | null |
| 2024-03-21 | Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection | 场景图 ViT:端到端开放词汇视觉关系检测 | Tim Salzmann, Markus Ryll, Alex Bewley, Matthias Minderer | http://arxiv.org/pdf/2403.14270v1 | null |
| 2024-03-21 | Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations | 通过轮廓和纹理感知扰动保护医学图像分割数据集免受未经授权的训练 | Xun Lin, Yi Yu, Song Xia, Jue Jiang, Haoran Wang, Zitong Yu, Yizhong Liu, Ying Fu, Shuai Wang, Wenzhong Tang, et al. | http://arxiv.org/pdf/2403.14250v1 | null |
| 2024-03-21 | ResNet101 and DAE for Enhance Quality and Classification Accuracy in Skin Cancer Imaging | ResNet101 和 DAE 用于提高皮肤癌成像的质量和分类准确性 | Sibasish Dhibar | http://arxiv.org/pdf/2403.14248v1 | null |
| 2024-03-21 | RG-CAT: Detection Pipeline and Catalogue of Radio Galaxies in the EMU Pilot Survey | RG-CAT:EMU 试点巡天中射电星系的探测管道和目录 | Nikhel Gupta, Ray P. Norris, Zeeshan Hayder, Minh Huynh, Lars Petersson, X. Rosalind Wang, Andrew M. Hopkins, Heinz Andernach, Yjan Gordon, Simone Riggi, et al. | http://arxiv.org/pdf/2403.14235v1 | null |
| 2024-03-21 | SoftPatch: Unsupervised Anomaly Detection with Noisy Data | SoftPatch:使用噪声数据进行无监督异常检测 | Xi Jiang, Ying Chen, Qiang Nie, Yong Liu, Jianlin Liu, Bin-Bin Gao, Jun Liu, Chengjie Wang, Feng Zheng | http://arxiv.org/pdf/2403.14233v1 | null |
| 2024-03-21 | Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference | 面向多类异常检测:探索针对类间干扰的类感知统一模型 | Xi Jiang, Ying Chen, Qiang Nie, Jianlin Liu, Yong Liu, Chengjie Wang, Feng Zheng | http://arxiv.org/pdf/2403.14213v1 | null |
| 2024-03-21 | PECI-Net: Bolus segmentation from video fluoroscopic swallowing study images using preprocessing ensemble and cascaded inference | PECI-Net:使用预处理集成和级联推理对视频透视吞咽研究图像进行团注分割 | Dougho Park, Younghun Kim, Harim Kang, Junmyeoung Lee, Jinyoung Choi, Taeyeon Kim, Sangeok Lee, Seokil Son, Minsol Kim, Injung Kim | http://arxiv.org/pdf/2403.14191v1 | null |
| 2024-03-21 | Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding | 静动态统一网络:视频接地的高效时域过滤 | Jingjing Hu, Dan Guo, Kun Li, Zhan Si, Xun Yang, Xiaojun Chang, Meng Wang | http://arxiv.org/pdf/2403.14174v1 | null |
| 2024-03-21 | Learning Decomposable and Debiased Representations via Attribute-Centric Information Bottlenecks | 通过以属性为中心的信息瓶颈学习可分解和去偏的表示 | Jinyung Hong, Eun Som Jeon, Changhoon Kim, Keun Hee Park, Utkarsh Nath, Yezhou Yang, Pavan Turaga, Theodore P. Pavlic | http://arxiv.org/pdf/2403.14140v1 | null |
| 2024-03-21 | Evidential Semantic Mapping in Off-road Environments with Uncertainty-aware Bayesian Kernel Inference | 使用不确定性感知贝叶斯核推理在越野环境中进行证据语义映射 | Junyoung Kim, Junwon Seo, Jihong Min | http://arxiv.org/pdf/2403.14138v1 | null |
| 2024-03-21 | Improving Image Classification Accuracy through Complementary Intra-Class and Inter-Class Mixup | 通过互补的类内和类间混合提高图像分类精度 | Ye Xu, Ya Gao, Xiaorong Qiu, Yang Chen, Ying Ji | http://arxiv.org/pdf/2403.14137v1 | null |
| 2024-03-21 | 3D Object Detection from Point Cloud via Voting Step Diffusion | 通过投票步骤扩散从点云检测 3D 对象 | Haoran Hou, Mingtao Feng, Zijie Wu, Weisheng Dong, Qing Zhu, Yaonan Wang, Ajmal Mian | http://arxiv.org/pdf/2403.14133v1 | null |
| 2024-03-21 | Soft Masked Transformer for Point Cloud Processing with Skip Attention-Based Upsampling | 用于点云处理的软掩模变压器,具有基于跳过注意力的上采样 | Yong He, Hongshan Yu, Muhammad Ibrahim, Xiaoyan Liu, Tongjia Chen, Anwaar Ulhaq, Ajmal Mian | http://arxiv.org/pdf/2403.14124v1 | null |
| 2024-03-21 | Training point-based deep learning networks for forest segmentation with synthetic data | 使用合成数据训练基于点的深度学习网络进行森林分割 | Francisco Raverta Capua, Juan Schandin, Pablo De Cristóforis | http://arxiv.org/pdf/2403.14115v1 | null |
| 2024-03-21 | Test-time Similarity Modification for Person Re-identification toward Temporal Distribution Shift | 针对时间分布转移的人员重新识别的测试时相似性修改 | Kazuki Adachi, Shohei Enomoto, Taku Sasaki, Shin'ya Yamaguchi | http://arxiv.org/pdf/2403.14114v1 | null |
| 2024-03-21 | Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition | 用于全景活动识别的时空接近感知双路径模型 | Sumin Lee, Yooseung Wang, Sangmin Woo, Changick Kim | http://arxiv.org/pdf/2403.14113v1 | null |
| 2024-03-21 | MaskSAM: Towards Auto-prompt SAM with Mask Classification for Medical Image Segmentation | MaskSAM:针对医学图像分割具有掩模分类的自动提示 SAM | Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan | http://arxiv.org/pdf/2403.14103v1 | null |
| 2024-03-21 | Unsupervised Intrinsic Image Decomposition with LiDAR Intensity Enhanced Training | 利用 LiDAR 强度增强训练进行无监督本征图像分解 | Shogo Sato, Takuhiro Kaneko, Kazuhiko Murasaki, Taiga Yoshida, Ryuichi Tanida, Akisato Kimura | http://arxiv.org/pdf/2403.14089v1 | null |
| 2024-03-21 | Surface Reconstruction from Point Clouds via Grid-based Intersection Prediction | 通过基于网格的交叉点预测从点云重建表面 | Hui Tian, Kai Xu | http://arxiv.org/pdf/2403.14085v1 | null |
| 2024-03-21 | EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition | EventDance:用于基于事件的对象识别的无监督无源跨模式适应 | Xu Zheng, Lin Wang | http://arxiv.org/pdf/2403.14082v1 | null |
| 2024-03-21 | Semantics from Space: Satellite-Guided Thermal Semantic Segmentation Annotation for Aerial Field Robots | 来自太空的语义:航空领域机器人的卫星引导热语义分割注释 | Connor Lee, Saraswati Soedarmadji, Matthew Anderson, Anthony J. Clark, Soon-Jo Chung | http://arxiv.org/pdf/2403.14056v1 | null |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-21 | Enhancing Historical Image Retrieval with Compositional Cues | 通过构图线索增强历史图像检索 | Tingyu Lin, Robert Sablatnig | http://arxiv.org/pdf/2403.14287v1 | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-21 | Detoxifying Large Language Models via Knowledge Editing | 通过知识编辑消除大型语言模型的毒害 | Mengru Wang, Ningyu Zhang, Ziwen Xu, Zekun Xi, Shumin Deng, Yunzhi Yao, Qishen Zhang, Linyi Yang, Jindong Wang, Huajun Chen | http://arxiv.org/pdf/2403.14472v1 | link |
| 2024-03-21 | Less but Better: Enabling Generalized Zero-shot Learning Towards Unseen Domains by Intrinsic Learning from Redundant LLM Semantics | 更少但更好:通过冗余 LLM 语义的内在学习实现对未见领域的广义零样本学习 | Jiaqi Yue, Jiancheng Zhao, Chunhui Zhao | http://arxiv.org/pdf/2403.14362v1 | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-21 | AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation | AdaIR:通过频率挖掘和调制进行自适应一体化图像恢复 | Yuning Cui, Syed Waqas Zamir, Salman Khan, Alois Knoll, Mubarak Shah, Fahad Shahbaz Khan | http://arxiv.org/pdf/2403.14614v1 | null |
| 2024-03-21 | View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network | 用于空地摄像机网络下人员重识别的视图解耦变压器 | Quan Zhang, Lei Wang, Vishal M. Patel, Xiaohua Xie, Jianhuang Lai | http://arxiv.org/pdf/2403.14513v1 | null |
| 2024-03-21 | RoDLA: Benchmarking the Robustness of Document Layout Analysis Models | RoDLA:文档布局分析模型的稳健性基准测试 | Yufan Chen, Jiaming Zhang, Kunyu Peng, Junwei Zheng, Ruiping Liu, Philip Torr, Rainer Stiefelhagen | http://arxiv.org/pdf/2403.14442v1 | null |
| 2024-03-21 | SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field | SurroundSDF:基于有符号距离场的隐式 3D 场景理解 | Lizhe Liu, Bohua Wang, Hongwei Xie, Daqi Liu, Li Liu, Zhiqiang Tian, Kuiyuan Yang, Bing Wang | http://arxiv.org/pdf/2403.14366v1 | null |
| 2024-03-21 | On the Concept Trustworthiness in Concept Bottleneck Models | 概念瓶颈模型中的概念可信度研究 | Qihan Huang, Jie Song, Jingwen Hu, Haofei Zhang, Yong Wang, Mingli Song | http://arxiv.org/pdf/2403.14349v1 | null |
| 2024-03-21 | $\nabla τ$: Gradient-based and Task-Agnostic machine Unlearning | $\nabla τ$:基于梯度和任务无关的机器取消学习 | Daniel Trippa, Cesare Campagnano, Maria Sofia Bucarelli, Gabriele Tolomei, Fabrizio Silvestri | http://arxiv.org/pdf/2403.14339v1 | null |
| 2024-03-21 | CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing | CFPL-FAS:通用人脸反欺骗的免费即时学习 | Ajian Liu, Shuai Xue, Jianwen Gan, Jun Wan, Yanyan Liang, Jiankang Deng, Sergio Escalera, Zhen Lei | http://arxiv.org/pdf/2403.14333v1 | null |
| 2024-03-21 | SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks | SpikingResformer:在尖峰神经网络中桥接 ResNet 和 Vision Transformer | Xinyu Shi, Zecheng Hao, Zhaofei Yu | http://arxiv.org/pdf/2403.14302v1 | null |
| 2024-03-21 | Weak Supervision with Arbitrary Single Frame for Micro- and Macro-expression Spotting | 任意单帧微表情和宏观表情识别的弱监督 | Wang-Wang Yu, Xian-Shi Zhang, Fu-Ya Luo, Yijun Cao, Kai-Fu Yang, Hong-Mei Yan, Yong-Jie Li | http://arxiv.org/pdf/2403.14240v1 | null |
| 2024-03-21 | Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization | 协调视觉和文本嵌入以实现零样本文本到图像的定制 | Yeji Song, Jimyeong Kim, Wonhark Park, Wonsik Shin, Wonjong Rhee, Nojun Kwak | http://arxiv.org/pdf/2403.14155v1 | null |
| 2024-03-21 | External Knowledge Enhanced 3D Scene Generation from Sketch | 外部知识增强了从草图生成 3D 场景的能力 | Zijie Wu, Mingtao Feng, Yaonan Wang, He Xie, Weisheng Dong, Bo Miao, Ajmal Mian | http://arxiv.org/pdf/2403.14121v1 | null |
| 2024-03-21 | C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion | C-TPT:通过文本特征分散对视觉语言模型进行校准测试时提示调整 | Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Mark Hasegawa-Johnson, Yingzhen Li, Chang D. Yoo | http://arxiv.org/pdf/2403.14119v1 | null |
| 2024-03-21 | Existence Is Chaos: Enhancing 3D Human Motion Prediction with Uncertainty Consideration | 存在就是混沌:考虑不确定性增强 3D 人体运动预测 | Zhihao Wang, Yulin Zhou, Ningyu Zhang, Xiaosong Yang, Jun Xiao, Zhao Wang | http://arxiv.org/pdf/2403.14104v1 | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-21 | Zero-Shot Multi-Object Shape Completion | 零样本多对象形状完成 | Shun Iwase, Katherine Liu, Vitor Guizilini, Adrien Gaidon, Kris Kitani, Rares Ambrus, Sergey Zakharov | http://arxiv.org/pdf/2403.14628v1 | null |
| 2024-03-21 | Explorative Inbetweening of Time and Space | 时间与空间的探索 | Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Abrevaya, Michael J. Black, Xuaner Zhang | http://arxiv.org/pdf/2403.14611v1 | null |
| 2024-03-21 | Visibility-Aware Keypoint Localization for 6DoF Object Pose Estimation | 用于 6DoF 物体姿态估计的可见性感知关键点定位 | Ruyi Lian, Haibin Ling | http://arxiv.org/pdf/2403.14559v1 | null |
| 2024-03-21 | Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset | 从机器人的角度探索 3D 人体姿势估计和预测:HARPER 数据集 | Andrea Avogaro, Andrea Toaiari, Federico Cunico, Xiangmin Xu, Haralambos Dafas, Alessandro Vinciarelli, Emma Li, Marco Cristani | http://arxiv.org/pdf/2403.14447v1 | null |
| 2024-03-21 | Enabling Visual Composition and Animation in Unsupervised Video Generation | 在无监督视频生成中启用视觉合成和动画 | Aram Davtyan, Sepehr Sameni, Björn Ommer, Paolo Favaro | http://arxiv.org/pdf/2403.14368v1 | null |
| 2024-03-21 | Volumetric Environment Representation for Vision-Language Navigation | 视觉语言导航的体积环境表示 | Rui Liu, Wenguan Wang, Yi Yang | http://arxiv.org/pdf/2403.14158v1 | null |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-21 | GLC++: Source-Free Universal Domain Adaptation through Global-Local Clustering and Contrastive Affinity Learning | GLC++:通过全局局部聚类和对比亲和学习进行无源通用域适应 | Sanqing Qu, Tianpei Zou, Florian Röhrbein, Cewu Lu, Guang Chen, Dacheng Tao, Changjun Jiang | http://arxiv.org/pdf/2403.14410v1 | null |
| 2024-03-21 | Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization | 释放未标记的数据:跨视图地理定位的范例 | Guopeng Li, Ming Qian, Gui-Song Xia | http://arxiv.org/pdf/2403.14198v1 | null |
| 2024-03-21 | Text-Enhanced Data-free Approach for Federated Class-Incremental Learning | 用于联邦类增量学习的文本增强无数据方法 | Minh-Tuan Tran, Trung Le, Xuan-May Le, Mehrtash Harandi, Dinh Phung | http://arxiv.org/pdf/2403.14101v1 | null |

Others

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-21 | Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion | Videoshop:使用噪声外推扩散反转进行本地化语义视频编辑 | Xiang Fan, Anand Bhattad, Ranjay Krishna | http://arxiv.org/pdf/2403.14617v1 | null |
| 2024-03-21 | Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning | 分层文本到视觉自我监督对齐以改进组织病理学表示学习 | Hasindri Watawana, Kanchana Ranasinghe, Tariq Mahmood, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan | http://arxiv.org/pdf/2403.14616v1 | null |
| 2024-03-21 | MyVLM: Personalizing VLMs for User-Specific Queries | MyVLM:针对特定于用户的查询个性化 VLM | Yuval Alaluf, Elad Richardson, Sergey Tulyakov, Kfir Aberman, Daniel Cohen-Or | http://arxiv.org/pdf/2403.14599v1 | null |
| 2024-03-21 | Implicit Style-Content Separation using B-LoRA | 使用 B-LoRA 隐式风格内容分离 | Yarden Frenkel, Yael Vinker, Ariel Shamir, Daniel Cohen-Or | http://arxiv.org/pdf/2403.14572v1 | null |
| 2024-03-21 | DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video | DINO-Tracker:驯服 DINO,在单个视频中进行自我监督点跟踪 | Narek Tumanyan, Assaf Singer, Shai Bagon, Tali Dekel | http://arxiv.org/pdf/2403.14548v1 | null |
| 2024-03-21 | AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks | AnyV2V:适用于任何视频到视频编辑任务的即插即用框架 | Max Ku, Cong Wei, Weiming Ren, Huan Yang, Wenhu Chen | http://arxiv.org/pdf/2403.14468v1 | null |
| 2024-03-21 | SyncTweedies: A General Generative Framework Based on Synchronized Diffusions | SyncTweedies:基于同步扩散的通用生成框架 | Jaihoon Kim, Juil Koo, Kyeongmin Yeo, Minhyuk Sung | http://arxiv.org/pdf/2403.14370v1 | null |
| 2024-03-21 | Neural Network-Based Processing and Reconstruction of Compromised Biophotonic Image Data | 基于神经网络的受损生物光子图像数据处理和重建 | Michael John Fanous, Paloma Casteleiro Costa, Cagatay Isil, Luzhe Huang, Aydogan Ozcan | http://arxiv.org/pdf/2403.14324v1 | null |
| 2024-03-21 | HySim: An Efficient Hybrid Similarity Measure for Patch Matching in Image Inpainting | HySim:图像修复中补丁匹配的高效混合相似度测量 | Saad Noufel, Nadir Maaroufi, Mehdi Najib, Mohamed Bakhouya | http://arxiv.org/pdf/2403.14292v1 | null |
| 2024-03-21 | Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization | 评估深度说话人二值化的谱聚类的鲁棒性 | Nikhil Raghav, Md Sahidullah | http://arxiv.org/pdf/2403.14286v1 | null |
| 2024-03-21 | A Framework for Portrait Stylization with Skin-Tone Awareness and Nudity Identification | 具有肤色感知和裸体识别的肖像风格化框架 | Seungkwon Kim, Sangyeon Kim, Seung-Hun Nam | http://arxiv.org/pdf/2403.14264v1 | null |
| 2024-03-21 | Debiasing surgeon: fantastic weights and how to find them | 去偏外科医生:奇妙的权重以及如何找到它们 | Rémi Nahon, Ivan Luiz De Moura Matos, Van-Tam Nguyen, Enzo Tartaglione | http://arxiv.org/pdf/2403.14200v1 | null |
| 2024-03-21 | Science based AI model certification for untrained operational environments with application in traffic state estimation | 基于科学的人工智能模型认证,适用于未经训练的操作环境,并应用于交通状态估计 | Daryl Mupupuni, Anupama Guntu, Liang Hong, Kamrul Hasan, Leehyun Keel | http://arxiv.org/pdf/2403.14093v1 | null |