[UPDATED!] 2024-10-06 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-06 | AdaptDiff: Cross-Modality Domain Adaptation via Weak Conditional Semantic Diffusion for Retinal Vessel Segmentation | AdaptDiff:基于弱条件语义扩散的跨模态域自适应视网膜血管分割方法 | Dewei Hu, Hao Li, Han Liu, Jiacheng Wang, Xing Yao, Daiwei Lu, Ipek Oguz | http://arxiv.org/pdf/2410.04648v1 | null |
| 2024-10-06 | Towards Unsupervised Blind Face Restoration using Diffusion Prior | 无监督盲人脸修复的扩散先验方法研究 | Tianshu Kuai, Sina Honari, Igor Gilitschenski, Alex Levinshtein | http://arxiv.org/pdf/2410.04618v1 | null |
| 2024-10-06 | MECFormer: Multi-task Whole Slide Image Classification with Expert Consultation Network | MECFormer:基于专家咨询网络的多任务全切片图像分类 | Doanh C. Bui, Jin Tae Kwak | http://arxiv.org/pdf/2410.04507v1 | null |
| 2024-10-06 | SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems | SITCOM:用于逆问题的逐步三重一致性扩散采样方法 | Ismail Alkhouri, Shijun Liang, Cheng-Han Huang, Jimmy Dai, Qing Qu, Saiprasad Ravishankar, Rongrong Wang | http://arxiv.org/pdf/2410.04479v1 | null |
| 2024-10-06 | Video Summarization Techniques: A Comprehensive Review | 视频摘要技术:全面综述 | Toqa Alaa, Ahmad Mongy, Assem Bakr, Mariam Diab, Walid Gomaa | http://arxiv.org/pdf/2410.04449v1 | null |
| 2024-10-06 | Attention Shift: Steering AI Away from Unsafe Content | 注意力的转移:引导AI远离不安全内容 | Shivank Garg, Manyana Tiwari | http://arxiv.org/pdf/2410.04447v1 | null |
| 2024-10-06 | DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion | 扩散伪造:通过引导稳定扩散增强深度伪造检测的泛化能力 | Ke Sun, Shen Chen, Taiping Yao, Hong Liu, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji | http://arxiv.org/pdf/2410.04372v1 | null |
| 2024-10-06 | VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide | 视频指南:无需训练通过教师指南提升视频扩散模型性能 | Dohun Lee, Bryan S Kim, Geon Yeong Park, Jong Chul Ye | http://arxiv.org/pdf/2410.04364v1 | null |
| 2024-10-06 | Measuring and Improving Persuasiveness of Large Language Models | 测量与提升大型语言模型的劝说力 | Somesh Singh, Yaman K Singla, Harini SI, Balaji Krishnamurthy | http://arxiv.org/pdf/2410.02653v2 | null |
| 2024-10-06 | A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization | 揭示文本至图像编码器中的信息混淆:通过因果分析与嵌入优化区分猫与狗的差异 | Chieh-Yun Chen, Li-Wu Tsao, Chiang Tseng, Hong-Han Shuai | http://arxiv.org/pdf/2410.00321v2 | link |
| 2024-10-06 | ArMeme: Propagandistic Content in Arabic Memes | 阿Meme:阿拉伯梗图中的宣传内容 | Firoj Alam, Abul Hasnat, Fatema Ahmed, Md Arid Hasan, Maram Hasanain | http://arxiv.org/pdf/2406.03916v2 | null |
| 2024-10-06 | Text-to-Image Rectified Flow as Plug-and-Play Priors | 文本到图像校正流作为即插即用先验 | Xiaofeng Yang, Cheng Chen, Xulei Yang, Fayao Liu, Guosheng Lin | http://arxiv.org/pdf/2406.03293v3 | link |
| 2024-10-06 | MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding | MindFormer:多主体fMRI的语义对齐用于大脑解码 | Inhwa Han, Jaayeon Lee, Jong Chul Ye | http://arxiv.org/pdf/2405.17720v2 | null |
| 2024-10-06 | Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs | 动代理:一种基于LLMs的人体动作生成会话框架 | Qi Wu, Yubo Zhao, Yifan Wang, Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang | http://arxiv.org/pdf/2405.17013v3 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-06 | Multimodal 3D Fusion and In-Situ Learning for Spatially Aware AI | 多模态3D融合与空间感知AI的现场学习方法 | Chengyuan Xu, Radha Kumaran, Noah Stier, Kangyou Yu, Tobias Höllerer | http://arxiv.org/pdf/2410.04652v1 | null |
| 2024-10-06 | VISTA: A Visual and Textual Attention Dataset for Interpreting Multimodal Models | VISTA:用于解释多模态模型的视觉与文本注意力数据集 | Harshit, Tolga Tasdizen | http://arxiv.org/pdf/2410.04609v1 | null |
| 2024-10-06 | UniMuMo: Unified Text, Music and Motion Generation | 统一文本、音乐与动作生成:UniMuMo模型 | Han Yang, Kun Su, Yutong Zhang, Jiaben Chen, Kaizhi Qian, Gaowen Liu, Chuang Gan | http://arxiv.org/pdf/2410.04534v1 | null |
| 2024-10-06 | MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration | MC-CoT:面向零样本医疗VQA的模块化协同CoT框架,集成LLM和MLLM | Lai Wei, Wenkai Wang, Xiaoyu Shen, Yu Xie, Zhihao Fan, Xiaojin Zhang, Zhongyu Wei, Wei Chen | http://arxiv.org/pdf/2410.04521v1 | null |
| 2024-10-06 | CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection | 视觉语言模型的共识利用CoVLM:用于半监督多模态假新闻检测的共识杠杆 | Devank, Jayateja Kalla, Soma Biswas | http://arxiv.org/pdf/2410.04426v1 | null |
| 2024-10-06 | MVP-Bench: Can Large Vision--Language Models Conduct Multi-level Visual Perception Like Humans? | MVP-Bench:大型视觉-语言模型能否像人类一样进行多级视觉感知? | Guanzhen Li, Yuxi Xie, Min-Yen Kan | http://arxiv.org/pdf/2410.04345v1 | null |
| 2024-10-06 | MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | 多模态先验引导的参数高效调优方法MaPPER:用于指代表达理解 | Ting Liu, Zunnan Xu, Yue Hu, Liangtao Shi, Zhiqiang Wang, Quanjun Yin | http://arxiv.org/pdf/2409.13609v2 | null |
| 2024-10-06 | AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation | AWT:基于增强、加权与迁移的视觉-语言模型转换方法 | Yuhan Zhu, Yuyang Ji, Zhiyu Zhao, Gangshan Wu, Limin Wang | http://arxiv.org/pdf/2407.04603v2 | link |
| 2024-10-06 | Unveiling the Tapestry of Consistency in Large Vision-Language Models | 揭示大型视觉-语言模型一致性结构的奥秘 | Yuan Zhang, Fei Xiao, Tao Huang, Chun-Kai Fan, Hongyuan Dong, Jiawen Li, Jiacong Wang, Kuan Cheng, Shanghang Zhang, Haoyuan Guo | http://arxiv.org/pdf/2405.14156v4 | link |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-06 | In-Place Panoptic Radiance Field Segmentation with Perceptual Prior for 3D Scene Understanding | 原地感知先验的泛光场分割在3D场景理解中的应用 | Shenghao Li | http://arxiv.org/pdf/2410.04529v1 | null |
| 2024-10-06 | Deformable NeRF using Recursively Subdivided Tetrahedra | 可变形NeRF:使用递归细分四面体 | Zherui Qiu, Chenqu Ren, Kaiwen Song, Xiaoyi Zeng, Leyuan Yang, Juyong Zhang | http://arxiv.org/pdf/2410.04402v1 | null |
| 2024-10-06 | GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians | 高斯块:基于基元和高斯分布构建具有部分感知能力的组合与可编辑三维场景 | Shuyi Jiang, Qihao Zhao, Hossein Rahmani, De Wen Soh, Jun Liu, Na Zhao | http://arxiv.org/pdf/2410.01535v2 | null |
| 2024-10-06 | HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting | 高动态范围新视角合成的高效算法:HDR-GS通过高斯扩散实现1000倍速度提升 | Yuanhao Cai, Zihao Xiao, Yixun Liang, Minghan Qin, Yulun Zhang, Xiaokang Yang, Yaoyao Liu, Alan Yuille | http://arxiv.org/pdf/2405.15125v3 | link |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-06 | Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering | Mode-GS:单目深度引导的锚定三维高斯泊松渲染在鲁棒地面视角场景渲染中的应用 | Yonghan Lee, Jaehoon Choi, Dongki Jung, Jaeseong Yun, Soohyun Ryu, Dinesh Manocha, Suyong Yeon | http://arxiv.org/pdf/2410.04646v1 | null |
| 2024-10-06 | StreetSurfGS: Scalable Urban Street Surface Reconstruction with Planar-based Gaussian Splatting | StreetSurfGS:基于平面高斯展开的可扩展城市街道表面重建方法 | Xiao Cui, Weicai Ye, Yifan Wang, Guofeng Zhang, Wengang Zhou, Tong He, Houqiang Li | http://arxiv.org/pdf/2410.04354v1 | null |
| 2024-10-06 | S4D: Streaming 4D Real-World Reconstruction with Gaussians and 3D Control Points | S4D:基于高斯分布与三维控制点的流式四维现实世界重建 | Bing He, Yunuo Chen, Guo Lu, Qi Wang, Qunshan Gu, Rong Xie, Li Song, Wenjun Zhang | http://arxiv.org/pdf/2408.13036v2 | link |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-06 | CAPEEN: Image Captioning with Early Exits and Knowledge Distillation | CAPEEN: 基于早期退出与知识蒸馏的图像描述生成 | Divya Jyoti Bajpai, Manjesh Kumar Hanawal | http://arxiv.org/pdf/2410.04433v1 | null |
| 2024-10-06 | Drone Stereo Vision for Radiata Pine Branch Detection and Distance Measurement: Utilizing Deep Learning and YOLO Integration | 利用深度学习和YOLO集成进行放射松树枝检测与距离测量的无人机立体视觉研究 | Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green | http://arxiv.org/pdf/2410.00503v2 | null |
| 2024-10-06 | KISS-Matcher: Fast and Robust Point Cloud Registration Revisited | KISS-Matcher:快速鲁棒的点云配准方法再探 | Hyungtae Lim, Daebeom Kim, Gunhee Shin, Jingnan Shi, Ignacio Vizzo, Hyun Myung, Jaesik Park, Luca Carlone | http://arxiv.org/pdf/2409.15615v2 | link |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-06 | Multi-Tiered Self-Contrastive Learning for Medical Microwave Radiometry (MWR) Breast Cancer Detection | 多层级自对比学习在医学微波辐射计(MWR)乳腺癌检测中的应用 | Christoforos Galazis, Huiyi Wu, Igor Goryanin | http://arxiv.org/pdf/2410.04636v1 | null |
| 2024-10-06 | Learning De-Biased Representations for Remote-Sensing Imagery | 学习去偏置表示用于遥感影像处理 | Zichen Tian, Zhaozheng Chen, Qianru Sun | http://arxiv.org/pdf/2410.04546v1 | null |
| 2024-10-06 | Look Around and Find Out: OOD Detection with Relative Angles | 环绕探查:基于相对角度的OOD检测研究 | Berker Demirel, Marco Fumero, Francesco Locatello | http://arxiv.org/pdf/2410.04525v1 | null |
| 2024-10-06 | DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination | DAMRO:深入LVLM注意力机制以降低目标幻觉 | Xuan Gong, Tianshi Ming, Xinpeng Wang, Zhihua Wei | http://arxiv.org/pdf/2410.04514v1 | null |
| 2024-10-06 | Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification | 逻辑推理正则化在视觉分类泛化中的决策解释 | Zhaorui Tan, Xi Yang, Qiufeng Wang, Anh Nguyen, Kaizhu Huang | http://arxiv.org/pdf/2410.04492v1 | null |
| 2024-10-06 | Tensor-Train Point Cloud Compression and Efficient Approximate Nearest-Neighbor Search | 张量列车点云压缩与高效近似最近邻搜索 | Georgii Novikov, Alexander Gneushev, Alexey Kadeishvili, Ivan Oseledets | http://arxiv.org/pdf/2410.04462v1 | null |
| 2024-10-06 | Optimising for the Unknown: Domain Alignment for Cephalometric Landmark Detection | 优化未知领域:用于 cephalometric 标志点检测的领域对齐技术 | Julian Wyatt, Irina Voiculescu | http://arxiv.org/pdf/2410.04445v1 | null |
| 2024-10-06 | Automated Detection of Defects on Metal Surfaces using Vision Transformers | 基于视觉变换器的金属表面缺陷自动检测技术 | Toqa Alaa, Mostafa Kotb, Arwa Zakaria, Mariam Diab, Walid Gomaa | http://arxiv.org/pdf/2410.04440v1 | null |
| 2024-10-06 | A Mathematical Explanation of UNet | UNet的数学解释 | Xue-Cheng Tai, Hao Liu, Raymond H. Chan, Lingfeng Li | http://arxiv.org/pdf/2410.04434v1 | null |
| 2024-10-06 | SynCo: Synthetic Hard Negatives in Contrastive Learning for Better Unsupervised Visual Representations | SynCo:对比学习中合成硬负样本用于提升无监督视觉表征学习性能 | Nikolaos Giakoumoglou, Tania Stathaki | http://arxiv.org/pdf/2410.02401v2 | link |
| 2024-10-06 | Towards a vision foundation model for comprehensive assessment of Cardiac MRI | 构建用于全面评估心脏MRI的视觉基础模型 | Athira J Jacob, Indraneel Borgohain, Teodora Chitiboi, Puneet Sharma, Dorin Comaniciu, Daniel Rueckert | http://arxiv.org/pdf/2410.01665v2 | null |
| 2024-10-06 | SONICS: Synthetic Or Not -- Identifying Counterfeit Songs | SONICS: 鉴别真伪——识别伪造歌曲技术 | Md Awsafur Rahman, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, Shaikh Anowarul Fattah | http://arxiv.org/pdf/2408.14080v3 | null |
| 2024-10-06 | Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition | 融合CNN与ViT即用型特征:识别任务的又一惊人基线 | Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Prayag Tiwari, Josef Bigun | http://arxiv.org/pdf/2407.19472v2 | null |
| 2024-10-06 | Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN | 实时手势识别:融合骨架数据与多流卷积神经网络 | Oluwaleke Yusuf, Maki Habib, Mohamed Moustafa | http://arxiv.org/pdf/2406.15003v2 | link |
| 2024-10-06 | Deep Learning Innovations for Underwater Waste Detection: An In-Depth Analysis | 深度学习在水下垃圾检测中的创新应用:深入分析 | Jaskaran Singh Walia, Pavithra L K, Kesar Mehta, Shivram Harshavardhana, Nandini Tyagi | http://arxiv.org/pdf/2405.18299v3 | link |
| 2024-10-06 | GUing: A Mobile GUI Search Engine using a Vision-Language Model | GUing:一种基于视觉-语言模型的移动GUI搜索引擎 | Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais, Binbin Xu, Pierre Louis Bernard, Gérard Dray, Walid Maalej | http://arxiv.org/pdf/2405.00145v3 | link |
| 2024-10-06 | Boosting Unsupervised Semantic Segmentation with Principal Mask Proposals | 增强型无监督语义分割:基于主成分掩膜提案方法 | Oliver Hahn, Nikita Araslanov, Simone Schaub-Meyer, Stefan Roth | http://arxiv.org/pdf/2404.16818v2 | link |
| 2024-10-06 | Switch EMA: A Free Lunch for Better Flatness and Sharpness | 切换指数移动平均:一种提升平坦度和锐度的免费午餐策略 | Siyuan Li, Zicheng Liu, Juanxi Tian, Ge Wang, Zedong Wang, Weiyang Jin, Di Wu, Cheng Tan, Tao Lin, Yang Liu, et al. | http://arxiv.org/pdf/2402.09240v2 | link |
| 2024-10-06 | OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning | 开放Mixup:视觉表征学习的开放Mixup工具箱与基准测试 | Siyuan Li, Zedong Wang, Zicheng Liu, Juanxi Tian, Di Wu, Cheng Tan, Weiyang Jin, Stan Z. Li | http://arxiv.org/pdf/2209.04851v3 | link |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-06 | Neural Product Importance Sampling via Warp Composition | 神经产品重要性采样:通过扭曲组合方法 | Joey Litalien, Miloš Hašan, Fujun Luan, Krishna Mullia, Iliyan Georgiev | http://arxiv.org/pdf/2409.18974v2 | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-06 | Realizing Video Summarization from the Path of Language-based Semantic Understanding | 基于语言语义理解路径实现视频摘要生成 | Kuan-Chen Mu, Zhi-Yi Chin, Wei-Chen Chiu | http://arxiv.org/pdf/2410.04511v1 | null |
| 2024-10-06 | Knowledge Mechanisms in Large Language Models: A Survey and Perspective | 大规模语言模型中的知识机制:综述与展望 | Mengru Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, Peng Wang, Xiang Chen, Jia-Chen Gu, Yong Jiang, Pengjun Xie, et al. | http://arxiv.org/pdf/2407.15017v3 | null |
| 2024-10-06 | To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models | 针对大型语言模型实用知识遗忘的探讨:遗忘与否? | Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang | http://arxiv.org/pdf/2407.01920v2 | link |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-06 | Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion | 双Transformer融合增强严重遮挡下的3D人体姿态估计 | Mehwish Ghafoor, Arif Mahmood, Muhammad Bilal | http://arxiv.org/pdf/2410.04574v1 | null |
| 2024-10-06 | Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training | 增强骨干模型以实现视觉文本生成:输入粒度控制与字形感知训练 | Wenbo Li, Guohao Li, Zhibin Lan, Xue Xu, Wanru Zhuang, Jiachen Liu, Xinyan Xiao, Jinsong Su | http://arxiv.org/pdf/2410.04439v1 | null |
| 2024-10-06 | SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference | SparseVLM:用于高效视觉-语言模型推理的视觉令牌稀疏化 | Yuan Zhang, Chun-Kai Fan, Junpeng Ma, Wenzhao Zheng, Tao Huang, Kuan Cheng, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, et al. | http://arxiv.org/pdf/2410.04417v1 | null |
| 2024-10-06 | Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion | Famba-V: 基于跨层Token融合的快速视觉Mamba模型 | Hui Shen, Zhongwei Wan, Xin Wang, Mi Zhang | http://arxiv.org/pdf/2409.09808v3 | link |
| 2024-10-06 | Transferable Tactile Transformers for Representation Learning Across Diverse Sensors and Tasks | 跨多种传感器和任务的可迁移触觉变压器表示学习 | Jialiang Zhao, Yuxiang Ma, Lirui Wang, Edward H. Adelson | http://arxiv.org/pdf/2406.13640v3 | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-06 | LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation | LiteVLoc: 针对图像目标导航的Map-Lite视觉定位研究 | Jianhao Jiao, Jinhao He, Changkun Liu, Sebastian Aegidius, Xiangcheng Hu, Tristan Braud, Dimitrios Kanoulas | http://arxiv.org/pdf/2410.04419v1 | null |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-06 | DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor | 深度退化响应作为灵活图像描述子的开发:DDR方法 | Juncheng Wu, Zhangkai Ni, Hanli Wang, Wenhan Yang, Yuyin Zhou, Shiqi Wang | http://arxiv.org/pdf/2406.08377v2 | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-06 | Is What You Ask For What You Get? Investigating Concept Associations in Text-to-Image Models | 探究文本到图像模型中的概念关联:所求是否即所得? | Salma Abdel Magid, Weiwei Pan, Simon Warchol, Grace Guo, Junsik Kim, Mahia Rahman, Hanspeter Pfister | http://arxiv.org/pdf/2410.04634v1 | null |
| 2024-10-06 | Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli | 深度学习预测人脑对增强和语义新颖视觉刺激响应的可泛化性分析 | Valentyn Piskovskyi, Riccardo Chimisso, Sabrina Patania, Tom Foulsham, Giuseppe Vizzari, Dimitri Ognibene | http://arxiv.org/pdf/2410.04497v1 | null |
| 2024-10-06 | U-net based prediction of cerebrospinal fluid distribution and ventricular reflux grading | 基于U-net的脑脊液分布预测与室管膜反流分级研究 | Melanie Rieff, Fabian Holzberger, Oksana Lapina, Geir Ringstad, Lars Magnus Valnes, Bogna Warsza, Kent-Andre Mardal, Per Kristian Eide, Barbara Wohlmuth | http://arxiv.org/pdf/2410.04460v1 | null |
| 2024-10-06 | Disentangling Regional Primitives for Image Generation | 解耦区域基元以实现图像生成 | Zhengting Chen, Lei Cheng, Lianghui Ding, Quanshi Zhang | http://arxiv.org/pdf/2410.04421v1 | null |
| 2024-10-06 | Accelerating Inference of Networks in the Frequency Domain | 频域网络推理加速研究 | Chenqiu Zhao, Guanfang Dong, Anup Basu | http://arxiv.org/pdf/2410.04342v1 | null |
| 2024-10-06 | VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation | VoxAct-B:基于体素的双臂操作动作与稳定策略研究 | I-Chun Arthur Liu, Sicheng He, Daniel Seita, Gaurav Sukhatme | http://arxiv.org/pdf/2407.04152v2 | link |
| 2024-10-06 | On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning | 面向视觉情境自然语言理解的高效语言与视觉辅助系统:阅读与推理中的关键要素研究 | Geewook Kim, Minjoon Seo | http://arxiv.org/pdf/2406.11823v2 | link |
| 2024-10-06 | A study on the adequacy of common IQA measures for medical images | 医学图像中常见图像质量评估指标充分性的研究 | Anna Breger, Clemens Karner, Ian Selby, Janek Gröhl, Sören Dittmer, Edward Lilley, Judith Babar, Jake Beckford, Thomas R Else, Timothy J Sadler, et al. | http://arxiv.org/pdf/2405.19224v3 | link |
| 2024-10-06 | Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation | 层次化空间邻近推理在视觉-语言导航中的应用 | Ming Xu, Zilong Xie | http://arxiv.org/pdf/2403.11541v3 | link |
| 2024-10-06 | MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment | MAGR:面向连续动作质量评估的流形对齐图正则化方法 | Kanglei Zhou, Liyuan Wang, Xingxing Zhang, Hubert P. H. Shum, Frederick W. B. Li, Jianguo Li, Xiaohui Liang | http://arxiv.org/pdf/2403.04398v2 | null |