Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-06 | AdaptDiff: Cross-Modality Domain Adaptation via Weak Conditional Semantic Diffusion for Retinal Vessel Segmentation | AdaptDiff:基于弱条件语义扩散的跨模态域自适应视网膜血管分割方法 | Dewei Hu, Hao Li, Han Liu, Jiacheng Wang, Xing Yao, Daiwei Lu, Ipek Oguz | http://arxiv.org/pdf/2410.04648v1 | null |
2024-10-06 | Towards Unsupervised Blind Face Restoration using Diffusion Prior | 无监督盲人脸修复的扩散先验方法研究 | Tianshu Kuai, Sina Honari, Igor Gilitschenski, Alex Levinshtein | http://arxiv.org/pdf/2410.04618v1 | null |
2024-10-06 | MECFormer: Multi-task Whole Slide Image Classification with Expert Consultation Network | MECFormer:基于专家咨询网络的多任务全切片图像分类 | Doanh C. Bui, Jin Tae Kwak | http://arxiv.org/pdf/2410.04507v1 | null |
2024-10-06 | SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems | SITCOM:用于逆问题的逐步三重一致性扩散采样方法 | Ismail Alkhouri, Shijun Liang, Cheng-Han Huang, Jimmy Dai, Qing Qu, Saiprasad Ravishankar, Rongrong Wang | http://arxiv.org/pdf/2410.04479v1 | null |
2024-10-06 | Video Summarization Techniques: A Comprehensive Review | 视频摘要技术:全面综述 | Toqa Alaa, Ahmad Mongy, Assem Bakr, Mariam Diab, Walid Gomaa | http://arxiv.org/pdf/2410.04449v1 | null |
2024-10-06 | Attention Shift: Steering AI Away from Unsafe Content | 注意力的转移:引导AI远离不安全内容 | Shivank Garg, Manyana Tiwari | http://arxiv.org/pdf/2410.04447v1 | null |
2024-10-06 | DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion | DiffusionFake:通过引导式Stable Diffusion增强深度伪造检测的泛化能力 | Ke Sun, Shen Chen, Taiping Yao, Hong Liu, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji | http://arxiv.org/pdf/2410.04372v1 | null |
2024-10-06 | VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide | VideoGuide:借助教师引导,无需训练即可提升视频扩散模型 | Dohun Lee, Bryan S Kim, Geon Yeong Park, Jong Chul Ye | http://arxiv.org/pdf/2410.04364v1 | null |
2024-10-06 | Measuring and Improving Persuasiveness of Large Language Models | 测量与提升大型语言模型的劝说力 | Somesh Singh, Yaman K Singla, Harini SI, Balaji Krishnamurthy | http://arxiv.org/pdf/2410.02653v2 | null |
2024-10-06 | A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization | 猫就是猫(不是狗!):通过因果分析与嵌入优化揭示文本到图像编码器中的信息混淆 | Chieh-Yun Chen, Li-Wu Tsao, Chiang Tseng, Hong-Han Shuai | http://arxiv.org/pdf/2410.00321v2 | link |
2024-10-06 | ArMeme: Propagandistic Content in Arabic Memes | ArMeme:阿拉伯语梗图中的宣传性内容 | Firoj Alam, Abul Hasnat, Fatema Ahmed, Md Arid Hasan, Maram Hasanain | http://arxiv.org/pdf/2406.03916v2 | null |
2024-10-06 | Text-to-Image Rectified Flow as Plug-and-Play Priors | 文本到图像校正流作为即插即用先验 | Xiaofeng Yang, Cheng Chen, Xulei Yang, Fayao Liu, Guosheng Lin | http://arxiv.org/pdf/2406.03293v3 | link |
2024-10-06 | MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding | MindFormer:多主体fMRI的语义对齐用于大脑解码 | Inhwa Han, Jaayeon Lee, Jong Chul Ye | http://arxiv.org/pdf/2405.17720v2 | null |
2024-10-06 | Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs | Motion-Agent:一种基于LLM的对话式人体动作生成框架 | Qi Wu, Yubo Zhao, Yifan Wang, Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang | http://arxiv.org/pdf/2405.17013v3 | null |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-06 | Multimodal 3D Fusion and In-Situ Learning for Spatially Aware AI | 多模态3D融合与空间感知AI的现场学习方法 | Chengyuan Xu, Radha Kumaran, Noah Stier, Kangyou Yu, Tobias Höllerer | http://arxiv.org/pdf/2410.04652v1 | null |
2024-10-06 | VISTA: A Visual and Textual Attention Dataset for Interpreting Multimodal Models | VISTA:用于解释多模态模型的视觉与文本注意力数据集 | Harshit, Tolga Tasdizen | http://arxiv.org/pdf/2410.04609v1 | null |
2024-10-06 | UniMuMo: Unified Text, Music and Motion Generation | 统一文本、音乐与动作生成:UniMuMo模型 | Han Yang, Kun Su, Yutong Zhang, Jiaben Chen, Kaizhi Qian, Gaowen Liu, Chuang Gan | http://arxiv.org/pdf/2410.04534v1 | null |
2024-10-06 | MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration | MC-CoT:面向零样本医疗VQA的模块化协同CoT框架,集成LLM和MLLM | Lai Wei, Wenkai Wang, Xiaoyu Shen, Yu Xie, Zhihao Fan, Xiaojin Zhang, Zhongyu Wei, Wei Chen | http://arxiv.org/pdf/2410.04521v1 | null |
2024-10-06 | CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection | CoVLM:利用视觉-语言模型共识进行半监督多模态假新闻检测 | Devank, Jayateja Kalla, Soma Biswas | http://arxiv.org/pdf/2410.04426v1 | null |
2024-10-06 | MVP-Bench: Can Large Vision--Language Models Conduct Multi-level Visual Perception Like Humans? | MVP-Bench:大型视觉-语言模型能否像人类一样进行多级视觉感知? | Guanzhen Li, Yuxi Xie, Min-Yen Kan | http://arxiv.org/pdf/2410.04345v1 | null |
2024-10-06 | MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | 多模态先验引导的参数高效调优方法MaPPER:用于指代表达理解 | Ting Liu, Zunnan Xu, Yue Hu, Liangtao Shi, Zhiqiang Wang, Quanjun Yin | http://arxiv.org/pdf/2409.13609v2 | null |
2024-10-06 | AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation | AWT:通过增强、加权与传输迁移视觉-语言模型 | Yuhan Zhu, Yuyang Ji, Zhiyu Zhao, Gangshan Wu, Limin Wang | http://arxiv.org/pdf/2407.04603v2 | link |
2024-10-06 | Unveiling the Tapestry of Consistency in Large Vision-Language Models | 揭示大型视觉-语言模型一致性结构的奥秘 | Yuan Zhang, Fei Xiao, Tao Huang, Chun-Kai Fan, Hongyuan Dong, Jiawen Li, Jiacong Wang, Kuan Cheng, Shanghang Zhang, Haoyuan Guo | http://arxiv.org/pdf/2405.14156v4 | link |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-06 | In-Place Panoptic Radiance Field Segmentation with Perceptual Prior for 3D Scene Understanding | 面向3D场景理解的基于感知先验的就地全景辐射场分割 | Shenghao Li | http://arxiv.org/pdf/2410.04529v1 | null |
2024-10-06 | Deformable NeRF using Recursively Subdivided Tetrahedra | 可变形NeRF:使用递归细分四面体 | Zherui Qiu, Chenqu Ren, Kaiwen Song, Xiaoyi Zeng, Leyuan Yang, Juyong Zhang | http://arxiv.org/pdf/2410.04402v1 | null |
2024-10-06 | GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians | GaussianBlock:基于基元与高斯构建部件感知、可组合且可编辑的3D场景 | Shuyi Jiang, Qihao Zhao, Hossein Rahmani, De Wen Soh, Jun Liu, Na Zhao | http://arxiv.org/pdf/2410.01535v2 | null |
2024-10-06 | HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting | HDR-GS:基于高斯泼溅的高效高动态范围新视角合成,速度提升1000倍 | Yuanhao Cai, Zihao Xiao, Yixun Liang, Minghan Qin, Yulun Zhang, Xiaokang Yang, Yaoyao Liu, Alan Yuille | http://arxiv.org/pdf/2405.15125v3 | link |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-06 | Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering | Mode-GS:单目深度引导的锚定3D高斯泼溅,用于鲁棒的地面视角场景渲染 | Yonghan Lee, Jaehoon Choi, Dongki Jung, Jaeseong Yun, Soohyun Ryu, Dinesh Manocha, Suyong Yeon | http://arxiv.org/pdf/2410.04646v1 | null |
2024-10-06 | StreetSurfGS: Scalable Urban Street Surface Reconstruction with Planar-based Gaussian Splatting | StreetSurfGS:基于平面高斯泼溅的可扩展城市街道表面重建 | Xiao Cui, Weicai Ye, Yifan Wang, Guofeng Zhang, Wengang Zhou, Tong He, Houqiang Li | http://arxiv.org/pdf/2410.04354v1 | null |
2024-10-06 | S4D: Streaming 4D Real-World Reconstruction with Gaussians and 3D Control Points | S4D:基于高斯分布与三维控制点的流式四维现实世界重建 | Bing He, Yunuo Chen, Guo Lu, Qi Wang, Qunshan Gu, Rong Xie, Li Song, Wenjun Zhang | http://arxiv.org/pdf/2408.13036v2 | link |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-06 | CAPEEN: Image Captioning with Early Exits and Knowledge Distillation | CAPEEN: 基于早期退出与知识蒸馏的图像描述生成 | Divya Jyoti Bajpai, Manjesh Kumar Hanawal | http://arxiv.org/pdf/2410.04433v1 | null |
2024-10-06 | Drone Stereo Vision for Radiata Pine Branch Detection and Distance Measurement: Utilizing Deep Learning and YOLO Integration | 用于辐射松树枝检测与距离测量的无人机立体视觉:融合深度学习与YOLO | Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green | http://arxiv.org/pdf/2410.00503v2 | null |
2024-10-06 | KISS-Matcher: Fast and Robust Point Cloud Registration Revisited | KISS-Matcher:快速鲁棒的点云配准方法再探 | Hyungtae Lim, Daebeom Kim, Gunhee Shin, Jingnan Shi, Ignacio Vizzo, Hyun Myung, Jaesik Park, Luca Carlone | http://arxiv.org/pdf/2409.15615v2 | link |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-06 | Multi-Tiered Self-Contrastive Learning for Medical Microwave Radiometry (MWR) Breast Cancer Detection | 多层级自对比学习在医学微波辐射计(MWR)乳腺癌检测中的应用 | Christoforos Galazis, Huiyi Wu, Igor Goryanin | http://arxiv.org/pdf/2410.04636v1 | null |
2024-10-06 | Learning De-Biased Representations for Remote-Sensing Imagery | 学习去偏置表示用于遥感影像处理 | Zichen Tian, Zhaozheng Chen, Qianru Sun | http://arxiv.org/pdf/2410.04546v1 | null |
2024-10-06 | Look Around and Find Out: OOD Detection with Relative Angles | 环绕探查:基于相对角度的OOD检测研究 | Berker Demirel, Marco Fumero, Francesco Locatello | http://arxiv.org/pdf/2410.04525v1 | null |
2024-10-06 | DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination | DAMRO:深入LVLM注意力机制以降低目标幻觉 | Xuan Gong, Tianshi Ming, Xinpeng Wang, Zhihua Wei | http://arxiv.org/pdf/2410.04514v1 | null |
2024-10-06 | Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification | 逻辑推理正则化在视觉分类泛化中的决策解释 | Zhaorui Tan, Xi Yang, Qiufeng Wang, Anh Nguyen, Kaizhu Huang | http://arxiv.org/pdf/2410.04492v1 | null |
2024-10-06 | Tensor-Train Point Cloud Compression and Efficient Approximate Nearest-Neighbor Search | 张量链点云压缩与高效近似最近邻搜索 | Georgii Novikov, Alexander Gneushev, Alexey Kadeishvili, Ivan Oseledets | http://arxiv.org/pdf/2410.04462v1 | null |
2024-10-06 | Optimising for the Unknown: Domain Alignment for Cephalometric Landmark Detection | 为未知而优化:面向头影测量标志点检测的域对齐 | Julian Wyatt, Irina Voiculescu | http://arxiv.org/pdf/2410.04445v1 | null |
2024-10-06 | Automated Detection of Defects on Metal Surfaces using Vision Transformers | 基于视觉变换器的金属表面缺陷自动检测技术 | Toqa Alaa, Mostafa Kotb, Arwa Zakaria, Mariam Diab, Walid Gomaa | http://arxiv.org/pdf/2410.04440v1 | null |
2024-10-06 | A Mathematical Explanation of UNet | UNet的数学解释 | Xue-Cheng Tai, Hao Liu, Raymond H. Chan, Lingfeng Li | http://arxiv.org/pdf/2410.04434v1 | null |
2024-10-06 | SynCo: Synthetic Hard Negatives in Contrastive Learning for Better Unsupervised Visual Representations | SynCo:对比学习中合成硬负样本用于提升无监督视觉表征学习性能 | Nikolaos Giakoumoglou, Tania Stathaki | http://arxiv.org/pdf/2410.02401v2 | link |
2024-10-06 | Towards a vision foundation model for comprehensive assessment of Cardiac MRI | 构建用于全面评估心脏MRI的视觉基础模型 | Athira J Jacob, Indraneel Borgohain, Teodora Chitiboi, Puneet Sharma, Dorin Comaniciu, Daniel Rueckert | http://arxiv.org/pdf/2410.01665v2 | null |
2024-10-06 | SONICS: Synthetic Or Not -- Identifying Counterfeit Songs | SONICS: 鉴别真伪——识别伪造歌曲技术 | Md Awsafur Rahman, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, Shaikh Anowarul Fattah | http://arxiv.org/pdf/2408.14080v3 | null |
2024-10-06 | Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition | 融合CNN与ViT即用型特征:识别任务的又一惊人基线 | Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Prayag Tiwari, Josef Bigun | http://arxiv.org/pdf/2407.19472v2 | null |
2024-10-06 | Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN | 实时手势识别:融合骨架数据与多流卷积神经网络 | Oluwaleke Yusuf, Maki Habib, Mohamed Moustafa | http://arxiv.org/pdf/2406.15003v2 | link |
2024-10-06 | Deep Learning Innovations for Underwater Waste Detection: An In-Depth Analysis | 深度学习在水下垃圾检测中的创新应用:深入分析 | Jaskaran Singh Walia, Pavithra L K, Kesar Mehta, Shivram Harshavardhana, Nandini Tyagi | http://arxiv.org/pdf/2405.18299v3 | link |
2024-10-06 | GUing: A Mobile GUI Search Engine using a Vision-Language Model | GUing:一种基于视觉-语言模型的移动GUI搜索引擎 | Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais, Binbin Xu, Pierre Louis Bernard, Gérard Dray, Walid Maalej | http://arxiv.org/pdf/2405.00145v3 | link |
2024-10-06 | Boosting Unsupervised Semantic Segmentation with Principal Mask Proposals | 增强型无监督语义分割:基于主成分掩膜提案方法 | Oliver Hahn, Nikita Araslanov, Simone Schaub-Meyer, Stefan Roth | http://arxiv.org/pdf/2404.16818v2 | link |
2024-10-06 | Switch EMA: A Free Lunch for Better Flatness and Sharpness | Switch EMA:一种提升平坦度和锐度的免费午餐策略 | Siyuan Li, Zicheng Liu, Juanxi Tian, Ge Wang, Zedong Wang, Weiyang Jin, Di Wu, Cheng Tan, Tao Lin, Yang Liu, et al. | http://arxiv.org/pdf/2402.09240v2 | link |
2024-10-06 | OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning | OpenMixup:面向视觉表征学习的开源Mixup工具箱与基准 | Siyuan Li, Zedong Wang, Zicheng Liu, Juanxi Tian, Di Wu, Cheng Tan, Weiyang Jin, Stan Z. Li | http://arxiv.org/pdf/2209.04851v3 | link |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-06 | Neural Product Importance Sampling via Warp Composition | 基于扭曲复合的神经乘积重要性采样 | Joey Litalien, Miloš Hašan, Fujun Luan, Krishna Mullia, Iliyan Georgiev | http://arxiv.org/pdf/2409.18974v2 | null |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-06 | Realizing Video Summarization from the Path of Language-based Semantic Understanding | 基于语言语义理解路径实现视频摘要生成 | Kuan-Chen Mu, Zhi-Yi Chin, Wei-Chen Chiu | http://arxiv.org/pdf/2410.04511v1 | null |
2024-10-06 | Knowledge Mechanisms in Large Language Models: A Survey and Perspective | 大规模语言模型中的知识机制:综述与展望 | Mengru Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, Peng Wang, Xiang Chen, Jia-Chen Gu, Yong Jiang, Pengjun Xie, et al. | http://arxiv.org/pdf/2407.15017v3 | null |
2024-10-06 | To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models | 针对大型语言模型实用知识遗忘的探讨:遗忘与否? | Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang | http://arxiv.org/pdf/2407.01920v2 | link |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-06 | Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion | 双Transformer融合增强严重遮挡下的3D人体姿态估计 | Mehwish Ghafoor, Arif Mahmood, Muhammad Bilal | http://arxiv.org/pdf/2410.04574v1 | null |
2024-10-06 | Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training | 增强骨干模型以实现视觉文本生成:输入粒度控制与字形感知训练 | Wenbo Li, Guohao Li, Zhibin Lan, Xue Xu, Wanru Zhuang, Jiachen Liu, Xinyan Xiao, Jinsong Su | http://arxiv.org/pdf/2410.04439v1 | null |
2024-10-06 | SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference | SparseVLM:用于高效视觉-语言模型推理的视觉令牌稀疏化 | Yuan Zhang, Chun-Kai Fan, Junpeng Ma, Wenzhao Zheng, Tao Huang, Kuan Cheng, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, et al. | http://arxiv.org/pdf/2410.04417v1 | null |
2024-10-06 | Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion | Famba-V: 基于跨层Token融合的快速视觉Mamba模型 | Hui Shen, Zhongwei Wan, Xin Wang, Mi Zhang | http://arxiv.org/pdf/2409.09808v3 | link |
2024-10-06 | Transferable Tactile Transformers for Representation Learning Across Diverse Sensors and Tasks | 可迁移触觉Transformer:面向多种传感器与任务的表示学习 | Jialiang Zhao, Yuxiang Ma, Lirui Wang, Edward H. Adelson | http://arxiv.org/pdf/2406.13640v3 | null |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-06 | LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation | LiteVLoc: 针对图像目标导航的Map-Lite视觉定位研究 | Jianhao Jiao, Jinhao He, Changkun Liu, Sebastian Aegidius, Xiangcheng Hu, Tristan Braud, Dimitrios Kanoulas | http://arxiv.org/pdf/2410.04419v1 | null |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-06 | DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor | DDR:利用深度退化响应作为灵活的图像描述子 | Juncheng Wu, Zhangkai Ni, Hanli Wang, Wenhan Yang, Yuyin Zhou, Shiqi Wang | http://arxiv.org/pdf/2406.08377v2 | null |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-06 | Is What You Ask For What You Get? Investigating Concept Associations in Text-to-Image Models | 探究文本到图像模型中的概念关联:所求是否即所得? | Salma Abdel Magid, Weiwei Pan, Simon Warchol, Grace Guo, Junsik Kim, Mahia Rahman, Hanspeter Pfister | http://arxiv.org/pdf/2410.04634v1 | null |
2024-10-06 | Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli | 深度学习预测人脑对增强和语义新颖视觉刺激响应的可泛化性分析 | Valentyn Piskovskyi, Riccardo Chimisso, Sabrina Patania, Tom Foulsham, Giuseppe Vizzari, Dimitri Ognibene | http://arxiv.org/pdf/2410.04497v1 | null |
2024-10-06 | U-net based prediction of cerebrospinal fluid distribution and ventricular reflux grading | 基于U-net的脑脊液分布预测与脑室反流分级 | Melanie Rieff, Fabian Holzberger, Oksana Lapina, Geir Ringstad, Lars Magnus Valnes, Bogna Warsza, Kent-Andre Mardal, Per Kristian Eide, Barbara Wohlmuth | http://arxiv.org/pdf/2410.04460v1 | null |
2024-10-06 | Disentangling Regional Primitives for Image Generation | 解耦区域基元以实现图像生成 | Zhengting Chen, Lei Cheng, Lianghui Ding, Quanshi Zhang | http://arxiv.org/pdf/2410.04421v1 | null |
2024-10-06 | Accelerating Inference of Networks in the Frequency Domain | 频域网络推理加速研究 | Chenqiu Zhao, Guanfang Dong, Anup Basu | http://arxiv.org/pdf/2410.04342v1 | null |
2024-10-06 | VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation | VoxAct-B:基于体素的双臂操作动作与稳定策略研究 | I-Chun Arthur Liu, Sicheng He, Daniel Seita, Gaurav Sukhatme | http://arxiv.org/pdf/2407.04152v2 | link |
2024-10-06 | On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning | 面向视觉情境自然语言理解的高效语言与视觉辅助系统:阅读与推理中的关键要素研究 | Geewook Kim, Minjoon Seo | http://arxiv.org/pdf/2406.11823v2 | link |
2024-10-06 | A study on the adequacy of common IQA measures for medical images | 医学图像中常见图像质量评估指标充分性的研究 | Anna Breger, Clemens Karner, Ian Selby, Janek Gröhl, Sören Dittmer, Edward Lilley, Judith Babar, Jake Beckford, Thomas R Else, Timothy J Sadler, et al. | http://arxiv.org/pdf/2405.19224v3 | link |
2024-10-06 | Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation | 层次化空间邻近推理在视觉-语言导航中的应用 | Ming Xu, Zilong Xie | http://arxiv.org/pdf/2403.11541v3 | link |
2024-10-06 | MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment | MAGR:面向连续动作质量评估的流形对齐图正则化方法 | Kanglei Zhou, Liyuan Wang, Xingxing Zhang, Hubert P. H. Shum, Frederick W. B. Li, Jianguo Li, Xiaohui Liang | http://arxiv.org/pdf/2403.04398v2 | null |