Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-03-21 | Simplified Diffusion Schrödinger Bridge | 简化扩散薛定谔桥 | Zhicong Tang, Tiankai Hang, Shuyang Gu, Dong Chen, Baining Guo | http://arxiv.org/pdf/2403.14623v1 | null |
2024-03-21 | GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation | GRM:用于高效 3D 重建和生成的大型高斯重建模型 | Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein | http://arxiv.org/pdf/2403.14621v1 | null |
2024-03-21 | ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition | ClusteringSDF:用于 3D 分解的自组织神经隐式曲面 | Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham, Qianyi Wu | http://arxiv.org/pdf/2403.14619v1 | null |
2024-03-21 | DreamReward: Text-to-3D Generation with Human Preference | DreamReward:根据人类偏好生成文本到 3D | Junliang Ye, Fangfu Liu, Qixiu Li, Zhengyi Wang, Yikai Wang, Xinzhou Wang, Yueqi Duan, Jun Zhu | http://arxiv.org/pdf/2403.14613v1 | null |
2024-03-21 | ReNoise: Real Image Inversion Through Iterative Noising | ReNoise:通过迭代噪声进行真实图像反转 | Daniel Garibi, Or Patashnik, Andrey Voynov, Hadar Averbuch-Elor, Daniel Cohen-Or | http://arxiv.org/pdf/2403.14602v1 | null |
2024-03-21 | Object-Centric Domain Randomization for 3D Shape Reconstruction in the Wild | 用于野外 3D 形状重建的以对象为中心的域随机化 | Junhyeong Cho, Kim Youwang, Hunmin Yang, Tae-Hyun Oh | http://arxiv.org/pdf/2403.14539v1 | null |
2024-03-21 | HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression | HAC:用于 3D 高斯泼溅压缩的哈希网格辅助上下文 | Yihang Chen, Qianyi Wu, Jianfei Cai, Mehrtash Harandi, Weiyao Lin | http://arxiv.org/pdf/2403.14530v1 | null |
2024-03-21 | Click to Grasp: Zero-Shot Precise Manipulation via Visual Diffusion Descriptors | 点击掌握:通过视觉扩散描述符进行零射击精确操作 | Nikolaos Tsagkas, Jack Rome, Subramanian Ramamoorthy, Oisin Mac Aodha, Chris Xiaoxuan Lu | http://arxiv.org/pdf/2403.14526v1 | null |
2024-03-21 | Denoising Diffusion Models for 3D Healthy Brain Tissue Inpainting | 用于 3D 健康脑组织修复的去噪扩散模型 | Alicia Durrer, Julia Wolleb, Florentin Bieder, Paul Friedrich, Lester Melie-Garcia, Mario Ocampo-Pineda, Cosmin I. Bercea, Ibrahim E. Hamamci, Benedikt Wiestler, Marie Piraud, et al. | http://arxiv.org/pdf/2403.14499v1 | null |
2024-03-21 | Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation | 用于半监督组织病理学分割的风格提取扩散模型 | Mathias Öttl, Frauke Wilm, Jana Steenpass, Jingna Qiu, Matthias Rübner, Arndt Hartmann, Matthias Beckmann, Peter Fasching, Andreas Maier, Ramona Erber, et al. | http://arxiv.org/pdf/2403.14429v1 | null |
2024-03-21 | DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning | DP-RDM:无需微调即可使扩散模型适应私有域 | Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo | http://arxiv.org/pdf/2403.14421v1 | null |
2024-03-21 | OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation | OA-CNN:用于 3D 语义分割的全自适应稀疏 CNN | Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, Jiaya Jia | http://arxiv.org/pdf/2403.14418v1 | null |
2024-03-21 | A Bag of Tricks for Few-Shot Class-Incremental Learning | 少样本类增量学习的一大堆技巧 | Shuvendu Roy, Chunjong Park, Aldi Fahrezi, Ali Etemad | http://arxiv.org/pdf/2403.14392v1 | null |
2024-03-21 | InfNeRF: Towards Infinite Scale NeRF Rendering with O(log n) Space Complexity | InfNeRF:以 O(log n) 空间复杂度实现无限规模 NeRF 渲染 | Jiabin Liang, Lanqing Zhang, Zhuoran Zhao, Xiangyu Xu | http://arxiv.org/pdf/2403.14376v1 | null |
2024-03-21 | Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models | 具有令牌优化的开放词汇注意力图用于扩散模型中的语义分割 | Pablo Marcos-Manchón, Roberto Alcover-Couso, Juan C. SanMiguel, Jose M. Martínez | http://arxiv.org/pdf/2403.14291v1 | null |
2024-03-21 | Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation | Zero123-6D:用于 RGB 类别级 6D 姿势估计的零样本新颖视图合成 | Francesco Di Felice, Alberto Remus, Stefano Gasperini, Benjamin Busam, Lionel Ott, Federico Tombari, Roland Siegwart, Carlo Alberto Avizzano | http://arxiv.org/pdf/2403.14279v1 | null |
2024-03-21 | Diffusion Models with Ensembled Structure-Based Anomaly Scoring for Unsupervised Anomaly Detection | 用于无监督异常检测的具有基于集成结构的异常评分的扩散模型 | Finn Behrendt, Debayan Bhattacharya, Lennart Maack, Julia Krüger, Roland Opfer, Robin Mieling, Alexander Schlaefer | http://arxiv.org/pdf/2403.14262v1 | null |
2024-03-21 | StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN | StyleCineGAN:使用预先训练的 StyleGAN 生成景观电影图片 | Jongwoo Choi, Kwanggyoon Seo, Amirsaman Ashtari, Junyong Noh | http://arxiv.org/pdf/2403.14186v1 | null |
2024-03-21 | Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians | Mini-Splatting:用有限数量的高斯表示场景 | Guangchi Fang, Bing Wang | http://arxiv.org/pdf/2403.14166v1 | null |
2024-03-21 | Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition | 通过内容帧运动潜在分解的高效视频扩散模型 | Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar | http://arxiv.org/pdf/2403.14148v1 | null |
2024-03-21 | Powerful Lossy Compression for Noisy Images | 针对噪声图像的强大有损压缩 | Shilv Cai, Xiaoguo Liang, Shuning Cao, Luxin Yan, Sheng Zhong, Liqun Chen, Xu Zou | http://arxiv.org/pdf/2403.14135v1 | null |
2024-03-21 | QSMDiff: Unsupervised 3D Diffusion Models for Quantitative Susceptibility Mapping | QSMDiff:用于定量磁化率绘图的无监督 3D 扩散模型 | Zhuang Xiong, Wei Jiang, Yang Gao, Feng Liu, Hongfu Sun | http://arxiv.org/pdf/2403.14070v1 | null |
2024-03-21 | LeFusion: Synthesizing Myocardial Pathology on Cardiac MRI via Lesion-Focus Diffusion Models | LeFusion:通过病变焦点扩散模型在心脏 MRI 上综合心肌病理学 | Hantao Zhang, Jiancheng Yang, Shouhong Wan, Pascal Fua | http://arxiv.org/pdf/2403.14066v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-03-21 | MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | MathVerse:您的多模式法学硕士能否真正看到视觉数学问题中的图表? | Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, et al. | http://arxiv.org/pdf/2403.14624v1 | null |
2024-03-21 | Language Repository for Long Video Understanding | 长视频理解语言库 | Kumara Kahatapitiya, Kanchana Ranasinghe, Jongwoo Park, Michael S. Ryoo | http://arxiv.org/pdf/2403.14622v1 | null |
2024-03-21 | PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model | PSALM:具有大型多模态模型的像素分割 | Zheng Zhang, Yeyao Ma, Enming Zhang, Xiang Bai | http://arxiv.org/pdf/2403.14598v1 | null |
2024-03-21 | Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference | Cobra:将 Mamba 扩展到多模态大型语言模型以实现高效推理 | Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, Donglin Wang | http://arxiv.org/pdf/2403.14520v1 | null |
2024-03-21 | Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination | Pensieve:回顾然后比较可以减轻幻视 | Dingchen Yang, Bowen Cao, Guang Chen, Changjun Jiang | http://arxiv.org/pdf/2403.14401v1 | null |
2024-03-21 | LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding | LayoutLLM:大型语言模型指令调整,以实现视觉丰富的文档理解 | Masato Fujitake | http://arxiv.org/pdf/2403.14252v1 | null |
2024-03-21 | Dermacen Analytica: A Novel Methodology Integrating Multi-Modal Large Language Models with Machine Learning in tele-dermatology | Dermacen Analytica:一种将多模态大型语言模型与远程皮肤病学机器学习相结合的新方法 | Dimitrios P. Panagoulias, Evridiki Tsoureli-Nikita, Maria Virvou, George A. Tsihrintzis | http://arxiv.org/pdf/2403.14243v1 | null |
2024-03-21 | Unsupervised Audio-Visual Segmentation with Modality Alignment | 具有模态对齐的无监督视听分割 | Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiangkang Deng, Xiatian Zhu | http://arxiv.org/pdf/2403.14203v1 | null |
2024-03-21 | OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation | OTSeg:零样本语义分割的多提示 Sinkhorn 注意力 | Kwanyoung Kim, Yujin Oh, Jong Chul Ye | http://arxiv.org/pdf/2403.14183v1 | null |
2024-03-21 | Leveraging Large Language Model-based Room-Object Relationships Knowledge for Enhancing Multimodal-Input Object Goal Navigation | 利用基于大语言模型的房间-对象关系知识来增强多模式输入对象目标导航 | Leyuan Sun, Asako Kanezaki, Guillaume Caron, Yusuke Yoshiyasu | http://arxiv.org/pdf/2403.14163v1 | null |
2024-03-21 | Empowering Segmentation Ability to Multi-modal Large Language Models | 增强多模态大型语言模型的细分能力 | Yuqi Yang, Peng-Tao Jiang, Jing Wang, Hao Zhang, Kai Zhao, Jinwei Chen, Bo Li | http://arxiv.org/pdf/2403.14141v1 | null |
2024-03-21 | Leveraging Thermal Modality to Enhance Reconstruction in Low-Light Conditions | 利用热模态增强弱光条件下的重建 | Jiacong Xu, Mingqian Liao, K Ram Prabhakar, Vishal M. Patel | http://arxiv.org/pdf/2403.14053v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-03-21 | CombiNeRF: A Combination of Regularization Techniques for Few-Shot Neural Radiance Field View Synthesis | CombiNeRF:用于少样本神经辐射场视图合成的正则化技术组合 | Matteo Bonotto, Luigi Sarrocco, Daniele Evangelista, Marco Imperoli, Alberto Pretto | http://arxiv.org/pdf/2403.14412v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-03-21 | MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images | MVSplat:稀疏多视图图像的高效 3D 高斯分布 | Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai | http://arxiv.org/pdf/2403.14627v1 | null |
2024-03-21 | Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering | 高斯磨砂:具有实时渲染的可编辑复杂辐射场 | Antoine Guédon, Vincent Lepetit | http://arxiv.org/pdf/2403.14554v1 | null |
2024-03-21 | Isotropic Gaussian Splatting for Real-Time Radiance Field Rendering | 用于实时辐射场渲染的各向同性高斯喷射 | Yuanhao Gong, Lantao Yu, Guanghui Yue | http://arxiv.org/pdf/2403.14244v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-03-21 | Learning to Project for Cross-Task Knowledge Distillation | 学习项目以进行跨任务知识蒸馏 | Dylan Auty, Roy Miles, Benedikt Kolbeinsson, Krystian Mikolajczyk | http://arxiv.org/pdf/2403.14494v1 | null |
2024-03-21 | Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels | 标签不足的开放式视频问答的排名蒸馏 | Tianming Liang, Chaolei Tan, Beihao Xia, Wei-Shi Zheng, Jian-Fang Hu | http://arxiv.org/pdf/2403.14430v1 | null |
2024-03-21 | Accelerating ViT Inference on FPGA through Static and Dynamic Pruning | 通过静态和动态修剪加速 FPGA 上的 ViT 推理 | Dhruv Parikh, Shouyi Li, Bingyi Zhang, Rajgopal Kannan, Carl Busart, Viktor Prasanna | http://arxiv.org/pdf/2403.14047v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-03-21 | ODTFormer: Efficient Obstacle Detection and Tracking with Stereo Cameras Based on Transformer | ODTFormer:基于 Transformer 的立体相机高效障碍物检测与跟踪 | Tianye Ding, Hongyu Li, Huaizu Jiang | http://arxiv.org/pdf/2403.14626v1 | null |
2024-03-21 | LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors | LiFT:密集 ViT 描述符的极其简单的轻量级特征转换 | Saksham Suri, Matthew Walmer, Kamal Gupta, Abhinav Shrivastava | http://arxiv.org/pdf/2403.14625v1 | null |
2024-03-21 | T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | T-Rex2:通过文本-视觉提示协同实现通用对象检测 | Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, Lei Zhang | http://arxiv.org/pdf/2403.14610v1 | null |
2024-03-21 | VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition | VXP:体素-跨像素大尺寸图像-LiDAR地点识别 | Yun-Jin Li, Mariia Gladkova, Yan Xia, Rui Wang, Daniel Cremers | http://arxiv.org/pdf/2403.14594v1 | null |
2024-03-21 | Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer | 代币转换很重要:对 Vision Transformer 进行忠实的事后解释 | Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan | http://arxiv.org/pdf/2403.14552v1 | null |
2024-03-21 | Estimating Physical Information Consistency of Channel Data Augmentation for Remote Sensing Images | 估计遥感图像通道数据增强的物理信息一致性 | Tom Burgert, Begüm Demir | http://arxiv.org/pdf/2403.14547v1 | null |
2024-03-21 | Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets | 资源贫乏数据集中跨数据集隔离手语识别的迁移学习 | Ahmet Alp Kindiroglu, Ozgur Kara, Ogulcan Ozdemir, Lale Akarun | http://arxiv.org/pdf/2403.14534v1 | null |
2024-03-21 | Invisible Needle Detection in Ultrasound: Leveraging Mechanism-Induced Vibration | 超声波中的隐形针检测:利用机制引起的振动 | Chenyang Li, Dianye Huang, Angelos Karlas, Nassir Navab, Zhongliang Jiang | http://arxiv.org/pdf/2403.14523v1 | null |
2024-03-21 | MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection | MULDE:通过去噪分数匹配进行多尺度对数密度估计,用于视频异常检测 | Jakub Micorek, Horst Possegger, Dominik Narnhofer, Horst Bischof, Mateusz Kozinski | http://arxiv.org/pdf/2403.14497v1 | null |
2024-03-21 | Adversary-Robust Graph-Based Learning of WSIs | 基于对抗鲁棒图的 WSI 学习 | Saba Heidari Gheshlaghi, Milan Aryal, Nasim Yahyasoltani, Masoud Ganji | http://arxiv.org/pdf/2403.14489v1 | null |
2024-03-21 | DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing | DesignEdit:多层潜在分解和融合,实现统一准确的图像编辑 | Yueru Jia, Yuhui Yuan, Aosong Cheng, Chuke Wang, Ji Li, Huizhu Jia, Shanghang Zhang | http://arxiv.org/pdf/2403.14487v1 | null |
2024-03-21 | HyperGALE: ASD Classification via Hypergraph Gated Attention with Learnable Hyperedges | HyperGALE:通过具有可学习超边的超图门控注意力进行 ASD 分类 | Mehul Arora, Chirag Shantilal Jain, Lalith Bharadwaj Baru, Kamalaker Dadi, Bapi Raju Surampudi | http://arxiv.org/pdf/2403.14484v1 | null |
2024-03-21 | CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers | CathFlow:使用光流和变压器对介入超声中的导管进行自监督分割 | Alex Ranne, Liming Kuang, Yordanka Velikova, Nassir Navab, Ferdinando Rodriguez y Baena | http://arxiv.org/pdf/2403.14465v1 | null |
2024-03-21 | Analysing Diffusion Segmentation for Medical Images | 分析医学图像的扩散分割 | Mathias Öttl, Siyuan Mei, Frauke Wilm, Jana Steenpass, Matthias Rübner, Arndt Hartmann, Matthias Beckmann, Peter Fasching, Andreas Maier, Ramona Erber, et al. | http://arxiv.org/pdf/2403.14440v1 | null |
2024-03-21 | Raw Instinct: Trust Your Classifiers and Skip the Conversion | 原始本能:相信您的分类器并跳过转换 | Christos Kantas, Bjørk Antoniussen, Mathias V. Andersen, Rasmus Munksø, Shobhit Kotnala, Simon B. Jensen, Andreas Møgelmose, Lau Nørgaard, Thomas B. Moeslund | http://arxiv.org/pdf/2403.14439v1 | null |
2024-03-21 | Biased Binary Attribute Classifiers Ignore the Majority Classes | 有偏差的二元属性分类器忽略大多数类 | Xinyi Zhang, Johanna Sophie Bieri, Manuel Günther | http://arxiv.org/pdf/2403.14435v1 | null |
2024-03-21 | Tensor network compressibility of convolutional models | 卷积模型的张量网络可压缩性 | Sukhbinder Singh, Saeed S. Jahromi, Roman Orus | http://arxiv.org/pdf/2403.14379v1 | null |
2024-03-21 | Varroa destructor detection on honey bees using hyperspectral imagery | 使用高光谱图像检测蜜蜂瓦螨破坏者 | Zina-Sabrina Duma, Tomas Zemcik, Simon Bilik, Tuomas Sihvonen, Peter Honec, Satu-Pia Reinikainen, Karel Horak | http://arxiv.org/pdf/2403.14359v1 | null |
2024-03-21 | LDTR: Transformer-based Lane Detection with Anchor-chain Representation | LDTR:具有锚链表示的基于变压器的车道检测 | Zhongyu Yang, Chen Shen, Wei Shao, Tengfei Xing, Runbo Hu, Pengfei Xu, Hua Chai, Ruini Xue | http://arxiv.org/pdf/2403.14354v1 | null |
2024-03-21 | Annotation-Efficient Polyp Segmentation via Active Learning | 通过主动学习进行注释高效的息肉分割 | Duojun Huang, Xinyu Xiong, De-Jun Fan, Feng Gao, Xiao-Jian Wu, Guanbin Li | http://arxiv.org/pdf/2403.14350v1 | null |
2024-03-21 | Towards Efficient Information Fusion: Concentric Dual Fusion Attention Based Multiple Instance Learning for Whole Slide Images | 迈向高效信息融合:基于同心双融合注意力的整个幻灯片图像的多实例学习 | Yujian Liu, Ruoxuan Wu, Xinjie Shen, Zihuang Lu, Lingyu Liang, Haiyu Zhou, Shipu Xu, Shaoai Cai, Shidang Xu | http://arxiv.org/pdf/2403.14346v1 | null |
2024-03-21 | FFT-based Selection and Optimization of Statistics for Robust Recognition of Severely Corrupted Images | 基于 FFT 的统计选择和优化,用于严重损坏图像的鲁棒识别 | Elena Camuffo, Umberto Michieli, Jijoong Moon, Daehyun Kim, Mete Ozay | http://arxiv.org/pdf/2403.14335v1 | null |
2024-03-21 | Exosense: A Vision-Centric Scene Understanding System For Safe Exoskeleton Navigation | Exosense:用于安全外骨骼导航的以视觉为中心的场景理解系统 | Jianeng Wang, Matias Mattamala, Christina Kassab, Lintong Zhang, Maurice Fallon | http://arxiv.org/pdf/2403.14320v1 | null |
2024-03-21 | A Lightweight Attention-based Deep Network via Multi-Scale Feature Fusion for Multi-View Facial Expression Recognition | 通过多尺度特征融合的轻量级基于注意力的深度网络用于多视图面部表情识别 | Ali Ezati, Mohammadreza Dezyani, Rajib Rana, Roozbeh Rajabi, Ahmad Ayatollahi | http://arxiv.org/pdf/2403.14318v1 | null |
2024-03-21 | Impact Assessment of Missing Data in Model Predictions for Earth Observation Applications | 地球观测应用模型预测中缺失数据的影响评估 | Francisco Mena, Diego Arenas, Marcela Charfuelan, Marlon Nuske, Andreas Dengel | http://arxiv.org/pdf/2403.14297v1 | null |
2024-03-21 | Exploring Green AI for Audio Deepfake Detection | 探索用于音频 Deepfake 检测的绿色 AI | Subhajit Saha, Md Sahidullah, Swagatam Das | http://arxiv.org/pdf/2403.14290v1 | null |
2024-03-21 | Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection | 场景图 ViT:端到端开放词汇视觉关系检测 | Tim Salzmann, Markus Ryll, Alex Bewley, Matthias Minderer | http://arxiv.org/pdf/2403.14270v1 | null |
2024-03-21 | Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations | 通过轮廓和纹理感知扰动保护医学图像分割数据集免受未经授权的训练 | Xun Lin, Yi Yu, Song Xia, Jue Jiang, Haoran Wang, Zitong Yu, Yizhong Liu, Ying Fu, Shuai Wang, Wenzhong Tang, et al. | http://arxiv.org/pdf/2403.14250v1 | null |
2024-03-21 | ResNet101 and DAE for Enhance Quality and Classification Accuracy in Skin Cancer Imaging | ResNet101 和 DAE 用于提高皮肤癌成像的质量和分类准确性 | Sibasish Dhibar | http://arxiv.org/pdf/2403.14248v1 | null |
2024-03-21 | RG-CAT: Detection Pipeline and Catalogue of Radio Galaxies in the EMU Pilot Survey | RG-CAT:EMU 试点巡天中射电星系的探测管道和目录 | Nikhel Gupta, Ray P. Norris, Zeeshan Hayder, Minh Huynh, Lars Petersson, X. Rosalind Wang, Andrew M. Hopkins, Heinz Andernach, Yjan Gordon, Simone Riggi, et al. | http://arxiv.org/pdf/2403.14235v1 | null |
2024-03-21 | SoftPatch: Unsupervised Anomaly Detection with Noisy Data | SoftPatch:使用噪声数据进行无监督异常检测 | Xi Jiang, Ying Chen, Qiang Nie, Yong Liu, Jianlin Liu, Bin-Bin Gao, Jun Liu, Chengjie Wang, Feng Zheng | http://arxiv.org/pdf/2403.14233v1 | null |
2024-03-21 | Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference | 面向多类异常检测:探索针对类间干扰的类感知统一模型 | Xi Jiang, Ying Chen, Qiang Nie, Jianlin Liu, Yong Liu, Chengjie Wang, Feng Zheng | http://arxiv.org/pdf/2403.14213v1 | null |
2024-03-21 | PECI-Net: Bolus segmentation from video fluoroscopic swallowing study images using preprocessing ensemble and cascaded inference | PECI-Net:使用预处理集成和级联推理对视频透视吞咽研究图像进行团注分割 | Dougho Park, Younghun Kim, Harim Kang, Junmyeoung Lee, Jinyoung Choi, Taeyeon Kim, Sangeok Lee, Seokil Son, Minsol Kim, Injung Kim | http://arxiv.org/pdf/2403.14191v1 | null |
2024-03-21 | Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding | 静动态统一网络:视频接地的高效时域过滤 | Jingjing Hu, Dan Guo, Kun Li, Zhan Si, Xun Yang, Xiaojun Chang, Meng Wang | http://arxiv.org/pdf/2403.14174v1 | null |
2024-03-21 | Learning Decomposable and Debiased Representations via Attribute-Centric Information Bottlenecks | 通过以属性为中心的信息瓶颈学习可分解和有偏差的表示 | Jinyung Hong, Eun Som Jeon, Changhoon Kim, Keun Hee Park, Utkarsh Nath, Yezhou Yang, Pavan Turaga, Theodore P. Pavlic | http://arxiv.org/pdf/2403.14140v1 | null |
2024-03-21 | Evidential Semantic Mapping in Off-road Environments with Uncertainty-aware Bayesian Kernel Inference | 使用不确定性感知贝叶斯核推理在越野环境中进行证据语义映射 | Junyoung Kim, Junwon Seo, Jihong Min | http://arxiv.org/pdf/2403.14138v1 | null |
2024-03-21 | Improving Image Classification Accuracy through Complementary Intra-Class and Inter-Class Mixup | 通过互补的类内和类间混合提高图像分类精度 | Ye Xu, Ya Gao, Xiaorong Qiu, Yang Chen, Ying Ji | http://arxiv.org/pdf/2403.14137v1 | null |
2024-03-21 | 3D Object Detection from Point Cloud via Voting Step Diffusion | 通过投票步骤扩散从点云检测 3D 对象 | Haoran Hou, Mingtao Feng, Zijie Wu, Weisheng Dong, Qing Zhu, Yaonan Wang, Ajmal Mian | http://arxiv.org/pdf/2403.14133v1 | null |
2024-03-21 | Soft Masked Transformer for Point Cloud Processing with Skip Attention-Based Upsampling | 用于点云处理的软掩模变压器,具有基于跳过注意力的上采样 | Yong He, Hongshan Yu, Muhammad Ibrahim, Xiaoyan Liu, Tongjia Chen, Anwaar Ulhaq, Ajmal Mian | http://arxiv.org/pdf/2403.14124v1 | null |
2024-03-21 | Training point-based deep learning networks for forest segmentation with synthetic data | 使用合成数据训练基于点的深度学习网络进行森林分割 | Francisco Raverta Capua, Juan Schandin, Pablo De Cristóforis | http://arxiv.org/pdf/2403.14115v1 | null |
2024-03-21 | Test-time Similarity Modification for Person Re-identification toward Temporal Distribution Shift | 针对时间分布转移的人员重新识别的测试时相似性修改 | Kazuki Adachi, Shohei Enomoto, Taku Sasaki, Shin'ya Yamaguchi | http://arxiv.org/pdf/2403.14114v1 | null |
2024-03-21 | Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition | 用于全景活动识别的时空接近感知双路径模型 | Sumin Lee, Yooseung Wang, Sangmin Woo, Changick Kim | http://arxiv.org/pdf/2403.14113v1 | null |
2024-03-21 | MaskSAM: Towards Auto-prompt SAM with Mask Classification for Medical Image Segmentation | MaskSAM:针对医学图像分割具有掩模分类的自动提示 SAM | Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan | http://arxiv.org/pdf/2403.14103v1 | null |
2024-03-21 | Unsupervised Intrinsic Image Decomposition with LiDAR Intensity Enhanced Training | 利用 LiDAR 强度增强训练进行无监督本征图像分解 | Shogo Sato, Takuhiro Kaneko, Kazuhiko Murasaki, Taiga Yoshida, Ryuichi Tanida, Akisato Kimura | http://arxiv.org/pdf/2403.14089v1 | null |
2024-03-21 | Surface Reconstruction from Point Clouds via Grid-based Intersection Prediction | 通过基于网格的交叉点预测从点云重建表面 | Hui Tian, Kai Xu | http://arxiv.org/pdf/2403.14085v1 | null |
2024-03-21 | EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition | EventDance:用于基于事件的对象识别的无监督无源跨模式适应 | Xu Zheng, Lin Wang | http://arxiv.org/pdf/2403.14082v1 | null |
2024-03-21 | Semantics from Space: Satellite-Guided Thermal Semantic Segmentation Annotation for Aerial Field Robots | 来自太空的语义:航空领域机器人的卫星引导热语义分割注释 | Connor Lee, Saraswati Soedarmadji, Matthew Anderson, Anthony J. Clark, Soon-Jo Chung | http://arxiv.org/pdf/2403.14056v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-03-21 | Enhancing Historical Image Retrieval with Compositional Cues | 通过构图线索增强历史图像检索 | Tingyu Lin, Robert Sablatnig | http://arxiv.org/pdf/2403.14287v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-03-21 | Detoxifying Large Language Models via Knowledge Editing | 通过知识编辑消除大型语言模型的毒害 | Mengru Wang, Ningyu Zhang, Ziwen Xu, Zekun Xi, Shumin Deng, Yunzhi Yao, Qishen Zhang, Linyi Yang, Jindong Wang, Huajun Chen | http://arxiv.org/pdf/2403.14472v1 | link |
2024-03-21 | Less but Better: Enabling Generalized Zero-shot Learning Towards Unseen Domains by Intrinsic Learning from Redundant LLM Semantics | 更少但更好:通过冗余 LLM 语义的内在学习实现对未见领域的广义零样本学习 | Jiaqi Yue, Jiancheng Zhao, Chunhui Zhao | http://arxiv.org/pdf/2403.14362v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-03-21 | AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation | AdaIR:通过频率挖掘和调制进行自适应一体化图像恢复 | Yuning Cui, Syed Waqas Zamir, Salman Khan, Alois Knoll, Mubarak Shah, Fahad Shahbaz Khan | http://arxiv.org/pdf/2403.14614v1 | null |
2024-03-21 | View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network | 用于空地摄像机网络下人员重识别的视图解耦变压器 | Quan Zhang, Lei Wang, Vishal M. Patel, Xiaohua Xie, Jianhuang Lai | http://arxiv.org/pdf/2403.14513v1 | null |
2024-03-21 | RoDLA: Benchmarking the Robustness of Document Layout Analysis Models | RoDLA:文档布局分析模型的稳健性基准测试 | Yufan Chen, Jiaming Zhang, Kunyu Peng, Junwei Zheng, Ruiping Liu, Philip Torr, Rainer Stiefelhagen | http://arxiv.org/pdf/2403.14442v1 | null |
2024-03-21 | SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field | SurroundSDF:基于有符号距离场的隐式 3D 场景理解 | Lizhe Liu, Bohua Wang, Hongwei Xie, Daqi Liu, Li Liu, Zhiqiang Tian, Kuiyuan Yang, Bing Wang | http://arxiv.org/pdf/2403.14366v1 | null |
2024-03-21 | On the Concept Trustworthiness in Concept Bottleneck Models | 概念瓶颈模型中的概念可信度研究 | Qihan Huang, Jie Song, Jingwen Hu, Haofei Zhang, Yong Wang, Mingli Song | http://arxiv.org/pdf/2403.14349v1 | null |
2024-03-21 | | | Daniel Trippa, Cesare Campagnano, Maria Sofia Bucarelli, Gabriele Tolomei, Fabrizio Silvestri | http://arxiv.org/pdf/2403.14339v1 | null |
2024-03-21 | CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing | CFPL-FAS:通用人脸反欺骗的免费即时学习 | Ajian Liu, Shuai Xue, Jianwen Gan, Jun Wan, Yanyan Liang, Jiankang Deng, Sergio Escalera, Zhen Lei | http://arxiv.org/pdf/2403.14333v1 | null |
2024-03-21 | SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks | SpikingResformer:在尖峰神经网络中桥接 ResNet 和 Vision Transformer | Xinyu Shi, Zecheng Hao, Zhaofei Yu | http://arxiv.org/pdf/2403.14302v1 | null |
2024-03-21 | Weak Supervision with Arbitrary Single Frame for Micro- and Macro-expression Spotting | 任意单帧微表情和宏观表情识别的弱监督 | Wang-Wang Yu, Xian-Shi Zhang, Fu-Ya Luo, Yijun Cao, Kai-Fu Yang, Hong-Mei Yan, Yong-Jie Li | http://arxiv.org/pdf/2403.14240v1 | null |
2024-03-21 | Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization | 协调视觉和文本嵌入以实现零样本文本到图像的定制 | Yeji Song, Jimyeong Kim, Wonhark Park, Wonsik Shin, Wonjong Rhee, Nojun Kwak | http://arxiv.org/pdf/2403.14155v1 | null |
2024-03-21 | External Knowledge Enhanced 3D Scene Generation from Sketch | 外部知识增强了从草图生成 3D 场景的能力 | Zijie Wu, Mingtao Feng, Yaonan Wang, He Xie, Weisheng Dong, Bo Miao, Ajmal Mian | http://arxiv.org/pdf/2403.14121v1 | null |
2024-03-21 | C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion | C-TPT:通过文本特征分散对视觉语言模型进行校准测试时提示调整 | Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Mark Hasegawa-Johnson, Yingzhen Li, Chang D. Yoo | http://arxiv.org/pdf/2403.14119v1 | null |
2024-03-21 | Existence Is Chaos: Enhancing 3D Human Motion Prediction with Uncertainty Consideration | 存在就是混沌:考虑不确定性增强 3D 人体运动预测 | Zhihao Wang, Yulin Zhou, Ningyu Zhang, Xiaosong Yang, Jun Xiao, Zhao Wang | http://arxiv.org/pdf/2403.14104v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-03-21 | Zero-Shot Multi-Object Shape Completion | 零样本多对象形状完成 | Shun Iwase, Katherine Liu, Vitor Guizilini, Adrien Gaidon, Kris Kitani, Rares Ambrus, Sergey Zakharov | http://arxiv.org/pdf/2403.14628v1 | null |
2024-03-21 | Explorative Inbetweening of Time and Space | 时间与空间的探索 | Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Abrevaya, Michael J. Black, Xuaner Zhang | http://arxiv.org/pdf/2403.14611v1 | null |
2024-03-21 | Visibility-Aware Keypoint Localization for 6DoF Object Pose Estimation | 用于 6DoF 物体姿态估计的可见性感知关键点定位 | Ruyi Lian, Haibin Ling | http://arxiv.org/pdf/2403.14559v1 | null |
2024-03-21 | Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset | 从机器人的角度探索 3D 人体姿势估计和预测:HARPER 数据集 | Andrea Avogaro, Andrea Toaiari, Federico Cunico, Xiangmin Xu, Haralambos Dafas, Alessandro Vinciarelli, Emma Li, Marco Cristani | http://arxiv.org/pdf/2403.14447v1 | null |
2024-03-21 | Enabling Visual Composition and Animation in Unsupervised Video Generation | 在无监督视频生成中启用视觉合成和动画 | Aram Davtyan, Sepehr Sameni, Björn Ommer, Paolo Favaro | http://arxiv.org/pdf/2403.14368v1 | null |
2024-03-21 | Volumetric Environment Representation for Vision-Language Navigation | 视觉语言导航的体积环境表示 | Rui Liu, Wenguan Wang, Yi Yang | http://arxiv.org/pdf/2403.14158v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-03-21 | GLC++: Source-Free Universal Domain Adaptation through Global-Local Clustering and Contrastive Affinity Learning | GLC++:通过全局局部聚类和对比亲和学习进行无源通用域适应 | Sanqing Qu, Tianpei Zou, Florian Röhrbein, Cewu Lu, Guang Chen, Dacheng Tao, Changjun Jiang | http://arxiv.org/pdf/2403.14410v1 | null |
2024-03-21 | Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization | 释放未标记的数据:跨视图地理定位的范例 | Guopeng Li, Ming Qian, Gui-Song Xia | http://arxiv.org/pdf/2403.14198v1 | null |
2024-03-21 | Text-Enhanced Data-free Approach for Federated Class-Incremental Learning | 用于联邦类增量学习的文本增强无数据方法 | Minh-Tuan Tran, Trung Le, Xuan-May Le, Mehrtash Harandi, Dinh Phung | http://arxiv.org/pdf/2403.14101v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-03-21 | Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion | Videoshop:使用噪声外推扩散反转进行本地化语义视频编辑 | Xiang Fan, Anand Bhattad, Ranjay Krishna | http://arxiv.org/pdf/2403.14617v1 | null |
2024-03-21 | Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning | 分层文本到视觉自我监督对齐以改进组织病理学表示学习 | Hasindri Watawana, Kanchana Ranasinghe, Tariq Mahmood, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan | http://arxiv.org/pdf/2403.14616v1 | null |
2024-03-21 | MyVLM: Personalizing VLMs for User-Specific Queries | MyVLM:针对特定于用户的查询个性化 VLM | Yuval Alaluf, Elad Richardson, Sergey Tulyakov, Kfir Aberman, Daniel Cohen-Or | http://arxiv.org/pdf/2403.14599v1 | null |
2024-03-21 | Implicit Style-Content Separation using B-LoRA | 使用 B-LoRA 隐式风格内容分离 | Yarden Frenkel, Yael Vinker, Ariel Shamir, Daniel Cohen-Or | http://arxiv.org/pdf/2403.14572v1 | null |
2024-03-21 | DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video | DINO-Tracker:驯服 DINO,在单个视频中进行自我监督点跟踪 | Narek Tumanyan, Assaf Singer, Shai Bagon, Tali Dekel | http://arxiv.org/pdf/2403.14548v1 | null |
2024-03-21 | AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks | AnyV2V:适用于任何视频到视频编辑任务的即插即用框架 | Max Ku, Cong Wei, Weiming Ren, Huan Yang, Wenhu Chen | http://arxiv.org/pdf/2403.14468v1 | null |
2024-03-21 | SyncTweedies: A General Generative Framework Based on Synchronized Diffusions | SyncTweedies:基于同步扩散的通用生成框架 | Jaihoon Kim, Juil Koo, Kyeongmin Yeo, Minhyuk Sung | http://arxiv.org/pdf/2403.14370v1 | null |
2024-03-21 | Neural Network-Based Processing and Reconstruction of Compromised Biophotonic Image Data | 基于神经网络的受损生物光子图像数据处理和重建 | Michael John Fanous, Paloma Casteleiro Costa, Cagatay Isil, Luzhe Huang, Aydogan Ozcan | http://arxiv.org/pdf/2403.14324v1 | null |
2024-03-21 | HySim: An Efficient Hybrid Similarity Measure for Patch Matching in Image Inpainting | HySim:图像修复中补丁匹配的高效混合相似度测量 | Saad Noufel, Nadir Maaroufi, Mehdi Najib, Mohamed Bakhouya | http://arxiv.org/pdf/2403.14292v1 | null |
2024-03-21 | Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization | 评估深度说话人二值化的谱聚类的鲁棒性 | Nikhil Raghav, Md Sahidullah | http://arxiv.org/pdf/2403.14286v1 | null |
2024-03-21 | A Framework for Portrait Stylization with Skin-Tone Awareness and Nudity Identification | 具有肤色感知和裸体识别的肖像风格化框架 | Seungkwon Kim, Sangyeon Kim, Seung-Hun Nam | http://arxiv.org/pdf/2403.14264v1 | null |
2024-03-21 | Debiasing surgeon: fantastic weights and how to find them | 去偏外科医生:奇妙的权重以及如何找到它们 | Rémi Nahon, Ivan Luiz De Moura Matos, Van-Tam Nguyen, Enzo Tartaglione | http://arxiv.org/pdf/2403.14200v1 | null |
2024-03-21 | Science based AI model certification for untrained operational environments with application in traffic state estimation | 基于科学的人工智能模型认证,适用于未经训练的操作环境,并应用于交通状态估计 | Daryl Mupupuni, Anupama Guntu, Liang Hong, Kamrul Hasan, Leehyun Keel | http://arxiv.org/pdf/2403.14093v1 | null |