Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-09 | IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation | IterComp:基于模型库的迭代组合感知反馈学习在文本到图像生成中的应用 | Xinchen Zhang, Ling Yang, Guohao Li, Yaqi Cai, Jiake Xie, Yong Tang, Yujiu Yang, Mengdi Wang, Bin Cui | http://arxiv.org/pdf/2410.07171v1 | null |
2024-10-09 | AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation | AvatarGO:零样本4D人-物交互生成与动画技术 | Yukang Cao, Liang Pan, Kai Han, Kwan-Yee K. Wong, Ziwei Liu | http://arxiv.org/pdf/2410.07164v1 | null |
2024-10-09 | InstructG2I: Synthesizing Images from Multimodal Attributed Graphs | InstructG2I: 从多模态属性图合成图像 | Bowen Jin, Ziqi Pang, Bingjun Guo, Yu-Xiong Wang, Jiaxuan You, Jiawei Han | http://arxiv.org/pdf/2410.07157v1 | null |
2024-10-09 | Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis | Trans4D:面向组合文本到四维合成的真实几何感知过渡技术 | Bohan Zeng, Ling Yang, Siyu Li, Jiaming Liu, Zixiang Zhang, Juanxi Tian, Kaixin Zhu, Yongzhen Guo, Fu-Yun Wang, Minkai Xu, et al. | http://arxiv.org/pdf/2410.07155v1 | null |
2024-10-09 | Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control | 多视角一致性PBR纹理的协同控制联合生成技术研究 | Shimon Vainer, Konstantin Kutsy, Dante De Nigris, Ciara Rowles, Slava Elizarov, Simon Donné | http://arxiv.org/pdf/2410.06985v1 | null |
2024-10-09 | Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | 生成过程中的表示对齐:训练扩散变换器比你想的更容易 | Sihyun Yu, Sangkyung Kwak, Huiwon Jang, Jongheon Jeong, Jonathan Huang, Jinwoo Shin, Saining Xie | http://arxiv.org/pdf/2410.06940v1 | null |
2024-10-09 | Compositional Entailment Learning for Hyperbolic Vision-Language Models | 组合蕴含学习在双曲视觉-语言模型中的应用 | Avik Pal, Max van Spengler, Guido Maria D'Amely di Melendugno, Alessandro Flaborea, Fabio Galasso, Pascal Mettes | http://arxiv.org/pdf/2410.06912v1 | null |
2024-10-09 | Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis | 利用大型语言模型与布局到图像合成提升少样本检测 | Ahmed Abdullah, Nikolas Ebert, Oliver Wasenmüller | http://arxiv.org/pdf/2410.06841v1 | null |
2024-10-09 | Transesophageal Echocardiography Generation using Anatomical Models | 使用解剖模型的经食管超声心动图生成技术研究 | Emmanuel Oladokun, Musa Abdulkareem, Jurica Šprem, Vicente Grau | http://arxiv.org/pdf/2410.06781v1 | null |
2024-10-09 | Diff-FMT: Diffusion Models for Fluorescence Molecular Tomography | Diff-FMT: 用于荧光分子层析的扩散模型研究 | Qianqian Xue, Peng Zhang, Xingyu Liu, Wenjian Wang, Guanglei Zhang | http://arxiv.org/pdf/2410.06757v1 | null |
2024-10-09 | Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques | 抑制内容偏移:通过现成生成技术实现更优的扩散特征 | Benyuan Meng, Qianqian Xu, Zitai Wang, Zhiyong Yang, Xiaochun Cao, Qingming Huang | http://arxiv.org/pdf/2410.06719v1 | null |
2024-10-09 | Decouple-Then-Merge: Towards Better Training for Diffusion Models | 解耦后合并:迈向更好的扩散模型训练方法 | Qianli Ma, Xuefei Ning, Dongrui Liu, Li Niu, Linfeng Zhang | http://arxiv.org/pdf/2410.06664v1 | null |
2024-10-09 | DDRN:a Data Distribution Reconstruction Network for Occluded Person Re-Identification | DDRN:一种用于遮挡人物重识别的数据分布重建网络 | Zhaoyong Wang, Yujie Liu, Mingyue Li, Wenxin Zhang, Zongmin Li | http://arxiv.org/pdf/2410.06600v1 | null |
2024-10-09 | InstantIR: Blind Image Restoration with Instant Generative Reference | 即时IR:基于即时生成参考的盲图像恢复技术 | Jen-Yuan Huang, Haofan Wang, Qixun Wang, Xu Bai, Hao Ai, Peng Xing, Jen-Tse Huang | http://arxiv.org/pdf/2410.06551v1 | null |
2024-10-09 | HFH-Font: Few-shot Chinese Font Synthesis with Higher Quality, Faster Speed, and Higher Resolution | HFH-Font:高质量、快速、高分辨率下的少样本中文字体合成技术 | Hua Li, Zhouhui Lian | http://arxiv.org/pdf/2410.06488v1 | null |
2024-10-09 | Does Spatial Cognition Emerge in Frontier Models? | 空间认知在前沿模型中是否涌现? | Santhosh Kumar Ramakrishnan, Erik Wijmans, Philipp Kraehenbuehl, Vladlen Koltun | http://arxiv.org/pdf/2410.06468v1 | null |
2024-10-09 | Dynamic Diffusion Transformer | 动态扩散Transformer | Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Yibing Song, Gao Huang, Fan Wang, Yang You | http://arxiv.org/pdf/2410.03456v2 | link |
2024-10-09 | Can Your Generative Model Detect Out-of-Distribution Covariate Shift? | 能否用您的生成模型检测分布外协变量漂移? | Christiaan Viviers, Amaan Valiuddin, Francisco Caetano, Lemar Abdi, Lena Filatova, Peter de With, Fons van der Sommen | http://arxiv.org/pdf/2409.03043v2 | link |
2024-10-09 | Atlas Gaussians Diffusion for 3D Generation | Atlas 高斯扩散在三维生成中的应用 | Haitao Yang, Yuan Dong, Hanwen Jiang, Dejia Xu, Georgios Pavlakos, Qixing Huang | http://arxiv.org/pdf/2408.13055v2 | null |
2024-10-09 | FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow | FlowDreamer:基于矫正流的高保真文本至3D生成探索 | Hangyu Li, Xiangxiang Chu, Dingyuan Shi, Wang Lin | http://arxiv.org/pdf/2408.05008v3 | null |
2024-10-09 | MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence | MovieDreamer:连贯长视觉序列的层次化生成方法 | Canyu Zhao, Mingyu Liu, Wen Wang, Weihua Chen, Fan Wang, Hao Chen, Bo Zhang, Chunhua Shen | http://arxiv.org/pdf/2407.16655v2 | null |
2024-10-09 | MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers | MeshAnything:基于自回归变换器的艺术家创建网格生成技术 | Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, et al. | http://arxiv.org/pdf/2406.10163v2 | link |
2024-10-09 | Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models | 防御性遗忘与对抗训练在扩散模型中实现稳健概念消除的研究 | Yimeng Zhang, Xin Chen, Jinghan Jia, Yihua Zhang, Chongyu Fan, Jiancheng Liu, Mingyi Hong, Ke Ding, Sijia Liu | http://arxiv.org/pdf/2405.15234v3 | link |
2024-10-09 | Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training | 通过大规模无动作视频预训练学习可操作离散扩散策略 | Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li | http://arxiv.org/pdf/2402.14407v4 | null |
2024-10-09 | Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training | 从错误中学习:文本到图像扩散模型训练的迭代提示重标记方法 | Xinyan Chen, Jiaxin Ge, Tianjun Zhang, Jiaming Liu, Shanghang Zhang | http://arxiv.org/pdf/2312.16204v3 | link |
2024-10-09 | A Unified Generative Framework for Realistic Lidar Simulation in Autonomous Driving Systems | 统一生成框架在自动驾驶系统中的真实激光雷达模拟 | Hamed Haghighi, Mehrdad Dianati, Valentina Donzella, Kurt Debattista | http://arxiv.org/pdf/2312.15817v2 | link |
2024-10-09 | The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization | 去噪中的彩票假设:迈向语义驱动的初始化 | Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa | http://arxiv.org/pdf/2312.08872v4 | null |
2024-10-09 | CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling | CMMD:用于视频-音频条件建模的对比多模态扩散算法 | Ruihan Yang, Hannes Gamper, Sebastian Braun | http://arxiv.org/pdf/2312.05412v2 | null |
2024-10-09 | Enforcing 3D Topological Constraints in Composite Objects via Implicit Functions | 通过隐函数在复合对象中强制实施三维拓扑约束 | Hieu Le, Jingyi Xu, Nicolas Talabot, Jiancheng Yang, Pascal Fua | http://arxiv.org/pdf/2307.08716v2 | null |
2024-10-09 | MedLSAM: Localize and Segment Anything Model for 3D CT Images | MedLSAM:面向三维CT图像的定位与分割任意模型 | Wenhui Lei, Xu Wei, Xiaofan Zhang, Kang Li, Shaoting Zhang | http://arxiv.org/pdf/2306.14752v4 | link |
2024-10-09 | Guided Image Synthesis via Initial Image Editing in Diffusion Model | 指导性图像合成:通过扩散模型中的初始图像编辑实现 | Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa | http://arxiv.org/pdf/2305.03382v3 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-09 | MM-Ego: Towards Building Egocentric Multimodal LLMs | MM-Ego:构建以自我为中心的多模态大语言模型之路 | Hanrong Ye, Haotian Zhang, Erik Daxberger, Lin Chen, Zongyu Lin, Yanghao Li, Bowen Zhang, Haoxuan You, Dan Xu, Zhe Gan, et al. | http://arxiv.org/pdf/2410.07177v1 | null |
2024-10-09 | Do better language models have crisper vision? | 更好的语言模型是否拥有更清晰的视觉? | Jona Ruthardt, Gertjan J. Burghouts, Serge Belongie, Yuki M. Asano | http://arxiv.org/pdf/2410.07173v1 | null |
2024-10-09 | Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate | 解密大型视觉-语言模型中的跨模态对齐与模态融合速率 | Qidong Huang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu | http://arxiv.org/pdf/2410.07167v1 | null |
2024-10-09 | Towards Interpreting Visual Information Processing in Vision-Language Models | 视觉-语言模型中视觉信息处理的解释性研究进展 | Clement Neo, Luke Ong, Philip Torr, Mor Geva, David Krueger, Fazl Barez | http://arxiv.org/pdf/2410.07149v1 | null |
2024-10-09 | Personalized Visual Instruction Tuning | 个性化视觉指令调优 | Renjie Pi, Jianshu Zhang, Tianyang Han, Jipeng Zhang, Rui Pan, Tong Zhang | http://arxiv.org/pdf/2410.07113v1 | null |
2024-10-09 | Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology | 面向真实无人机视觉-语言导航:平台、基准与方法论 | Xiangyu Wang, Donglin Yang, Ziqin Wang, Hohin Kwan, Jinyu Chen, Wenjun Wu, Hongsheng Li, Yue Liao, Si Liu | http://arxiv.org/pdf/2410.07087v1 | null |
2024-10-09 | Pixtral 12B | Pixtral 12B | Pravesh Agrawal, Szymon Antoniak, Emma Bou Hanna, Devendra Chaplot, Jessica Chudnovsky, Saurabh Garg, Theophile Gervet, Soham Ghosh, Amélie Héliou, Paul Jacob, et al. | http://arxiv.org/pdf/2410.07073v1 | null |
2024-10-09 | TinyEmo: Scaling down Emotional Reasoning via Metric Projection | TinyEmo: 通过度量投影缩小情感推理规模 | Cristian Gutierrez | http://arxiv.org/pdf/2410.07062v1 | null |
2024-10-09 | Secure Video Quality Assessment Resisting Adversarial Attacks | 安全视频质量评估:抵御对抗性攻击 | Ao-Xiang Zhang, Yu Ran, Weixuan Tang, Yuan-Gen Wang, Qingxiao Guan, Chunsheng Yang | http://arxiv.org/pdf/2410.06866v1 | null |
2024-10-09 | From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models | 从像素到标记:重新审视大型视觉-语言模型中的对象幻觉问题 | Yuying Shang, Xinyi Zeng, Yutao Zhu, Xiao Yang, Zhengwei Fang, Jingyuan Zhang, Jiawei Chen, Zinan Liu, Yu Tian | http://arxiv.org/pdf/2410.06795v1 | null |
2024-10-09 | HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding | HERM: 面向以人为中心理解的多模态LLM基准测试与增强 | Keliang Li, Zaifei Yang, Jiahe Zhao, Hongze Shen, Ruibing Hou, Hong Chang, Shiguang Shan, Xilin Chen | http://arxiv.org/pdf/2410.06777v1 | null |
2024-10-09 | To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models | 深度探讨多模态大型语言模型中的连接器选择:保留还是压缩 | Junyan Lin, Haoran Chen, Dawei Zhu, Xiaoyu Shen | http://arxiv.org/pdf/2410.06765v1 | null |
2024-10-09 | Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models | 突破视觉感知:针对大型视觉-语言模型的编码视觉令牌的对抗性攻击 | Yubo Wang, Chaohu Liu, Yanqiu Qu, Haoyu Cao, Deqiang Jiang, Linli Xu | http://arxiv.org/pdf/2410.06699v1 | null |
2024-10-09 | Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | 增强多模态大语言模型以实现详细准确的视频字幕生成:基于多轮偏好优化的方法 | Changli Tang, Yixuan Li, Yudong Yang, Jimin Zhuang, Guangzhi Sun, Wei Li, Zujun Ma, Chao Zhang | http://arxiv.org/pdf/2410.06682v1 | null |
2024-10-09 | ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time | ETA:推理时视觉语言模型的安全评估与对齐研究 | Yi Ding, Bolian Li, Ruqi Zhang | http://arxiv.org/pdf/2410.06625v1 | null |
2024-10-09 | Decomposing Relationship from 1-to-N into N 1-to-1 for Text-Video Retrieval | 将1对N关系分解为N个1对1关系以用于文本-视频检索 | Jian Xiao, Zhenzhen Hu, Jia Li, Richang Hong | http://arxiv.org/pdf/2410.06618v1 | null |
2024-10-09 | Deep Correlated Prompting for Visual Recognition with Missing Modalities | 深度相关提示用于缺失模态的视觉识别 | Lianyu Hu, Tongkai Shi, Wei Feng, Fanhua Shang, Liang Wan | http://arxiv.org/pdf/2410.06558v1 | null |
2024-10-09 | The Sampling-Gaussian for stereo matching | 样本高斯在立体匹配中的应用 | Baiyu Pan, Jichao Jiao, Bowen Yao, Jianxin Pang, Jun Cheng | http://arxiv.org/pdf/2410.06527v1 | null |
2024-10-09 | IC3M: In-Car Multimodal Multi-object Monitoring for Abnormal Status of Both Driver and Passengers | IC3M:车载多模态多目标监控,用于识别驾驶员与乘客异常状态 | Zihan Fang, Zheng Lin, Senkang Hu, Hangcheng Cao, Yiqin Deng, Xianhao Chen, Yuguang Fang | http://arxiv.org/pdf/2410.02592v2 | null |
2024-10-09 | DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM | DTVLT:基于LLM的多模态多样化文本基准视觉语言跟踪 | Xuchen Li, Shiyu Hu, Xiaokun Feng, Dailing Zhang, Meiqi Wu, Jing Zhang, Kaiqi Huang | http://arxiv.org/pdf/2410.02492v2 | null |
2024-10-09 | LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | LLaVA-MoD:通过MoE知识蒸馏实现LLaVA小型化 | Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Lei Zhang, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, et al. | http://arxiv.org/pdf/2408.15881v2 | link |
2024-10-09 | Towards Semantic Equivalence of Tokenization in Multimodal LLM | 迈向多模态大语言模型中标记化语义等价的实现 | Shengqiong Wu, Hao Fei, Xiangtai Li, Jiayi Ji, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan | http://arxiv.org/pdf/2406.05127v3 | null |
2024-10-09 | LG-VQ: Language-Guided Codebook Learning | LG-VQ:语言引导的码书学习算法 | Guotao Liang, Baoquan Zhang, Yaowei Wang, Xutao Li, Yunming Ye, Huaibin Wang, Chuyao Luo, Kola Ye, Linfeng Luo | http://arxiv.org/pdf/2405.14206v2 | null |
2024-10-09 | DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM | DTLLM-VLT: 基于LLM的视觉语言跟踪多样化文本生成方法 | Xuchen Li, Xiaokun Feng, Shiyu Hu, Meiqi Wu, Dailing Zhang, Jing Zhang, Kaiqi Huang | http://arxiv.org/pdf/2405.12139v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-09 | DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation | DreamMesh4D:基于稀疏控制高斯网格混合表示的视频到四维生成技术 | Zhiqi Li, Yiming Chen, Peidong Liu | http://arxiv.org/pdf/2410.06756v1 | null |
2024-10-09 | MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes | MimicTalk:几分钟内模仿个性化的表情丰富的3D说话脸庞 | Zhenhui Ye, Tianyun Zhong, Yi Ren, Ziyue Jiang, Jiawei Huang, Rongjie Huang, Jinglin Liu, Jinzheng He, Chen Zhang, Zehan Wang, et al. | http://arxiv.org/pdf/2410.06734v1 | null |
2024-10-09 | 3D Representation Methods: A Survey | 3D表示方法:综述 | Zhengren Wang | http://arxiv.org/pdf/2410.06475v1 | null |
2024-10-09 | EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis | 实时视图合成的精确体积椭球渲染方法:EVER | Alexander Mai, Peter Hedman, George Kopanas, Dor Verbin, David Futschik, Qiangeng Xu, Falko Kuester, Jonathan T. Barron, Yinda Zhang | http://arxiv.org/pdf/2410.01804v3 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-09 | ES-Gaussian: Gaussian Splatting Mapping via Error Space-Based Gaussian Completion | ES-Gaussian:基于误差空间高斯补全的高斯溅射建图 | Lu Chen, Yingfu Zeng, Haoang Li, Zhitao Deng, Jiafu Yan, Zhenjun Zhao | http://arxiv.org/pdf/2410.06613v1 | null |
2024-10-09 | Free-DyGS: Camera-Pose-Free Scene Reconstruction based on Gaussian Splatting for Dynamic Surgical Videos | Free-DyGS:基于高斯溅射的动态手术视频无相机位姿场景重建 | Qian Li, Shuojue Yang, Daiyun Shen, Yueming Jin | http://arxiv.org/pdf/2409.01003v2 | null |
2024-10-09 | HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior | HAHA:具有纹理网格先验的高度精细化高斯人像模型 | David Svitov, Pietro Morerio, Lourdes Agapito, Alessio Del Bue | http://arxiv.org/pdf/2404.01053v2 | link |
2024-10-09 | StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering | StopThePop:用于视图一致实时渲染的排序高斯溅射技术 | Lukas Radl, Michael Steiner, Mathias Parger, Alexander Weinrauch, Bernhard Kerbl, Markus Steinberger | http://arxiv.org/pdf/2402.00525v3 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-09 | JPEG Inspired Deep Learning | JPEG启发的深度学习 | Ahmed H. Salamah, Kaixiang Zheng, Yiwen Liu, En-Hui Yang | http://arxiv.org/pdf/2410.07081v1 | null |
2024-10-09 | S2HPruner: Soft-to-Hard Distillation Bridges the Discretization Gap in Pruning | S2HPruner:软硬蒸馏法桥接剪枝中的离散化差距 | Weihao Lin, Shengji Tang, Chong Yu, Peng Ye, Tao Chen | http://arxiv.org/pdf/2410.07046v1 | null |
2024-10-09 | Structure-Centric Robust Monocular Depth Estimation via Knowledge Distillation | 结构中心型鲁棒单目深度估计通过知识蒸馏方法 | Runze Chen, Haiyong Luo, Fang Zhao, Jingze Yu, Yupeng Jia, Juan Wang, Xuepeng Ma | http://arxiv.org/pdf/2410.06982v1 | null |
2024-10-09 | Perceptual Quality Assessment of Trisoup-Lifting Encoded 3D Point Clouds | 感知质量评估:Trisoup-Lifting编码的三维点云 | Juncheng Long, Honglei Su, Qi Liu, Hui Yuan, Wei Gao, Jiarun Song, Zhou Wang | http://arxiv.org/pdf/2410.06689v1 | null |
2024-10-09 | DRUPI: Dataset Reduction Using Privileged Information | DRUPI:利用特权信息进行数据集缩减技术研究 | Shaobo Wang, Yantai Yang, Shuaiyu Zhang, Chenghao Sun, Weiya Li, Xuming Hu, Linfeng Zhang | http://arxiv.org/pdf/2410.01611v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-09 | CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition | CHASE:基于骨架的多实体动作识别的凸包自适应平移学习算法 | Yuhang Wen, Mengyuan Liu, Songtao Wu, Beichen Ding | http://arxiv.org/pdf/2410.07153v1 | link |
2024-10-09 | Diagnosis of Malignant Lymphoma Cancer Using Hybrid Optimized Techniques Based on Dense Neural Networks | 基于密集神经网络混合优化技术的恶性淋巴瘤癌症诊断研究 | Salah A. Aly, Ali Bakhiet, Mazen Balat | http://arxiv.org/pdf/2410.06974v1 | null |
2024-10-09 | Bridge the Points: Graph-based Few-shot Segment Anything Semantically | 桥接点:基于图的少样本语义分割任意事物 | Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei | http://arxiv.org/pdf/2410.06964v1 | null |
2024-10-09 | Learning from Spatio-temporal Correlation for Semi-Supervised LiDAR Semantic Segmentation | 基于时空相关性学习的半监督LiDAR语义分割 | Seungho Lee, Hwijeong Lee, Hyunjung Shim | http://arxiv.org/pdf/2410.06893v1 | null |
2024-10-09 | Selecting the Best Sequential Transfer Path for Medical Image Segmentation with Limited Labeled Data | 最佳顺序迁移路径选择:有限标注数据下的医学图像分割 | Jingyun Yang, Jingge Wang, Guoqing Zhang, Yang Li | http://arxiv.org/pdf/2410.06892v1 | null |
2024-10-09 | Evaluating Model Performance with Hard-Swish Activation Function Adjustments | 评估Hard-Swish激活函数调整下的模型性能 | Sai Abhinav Pydimarry, Shekhar Madhav Khairnar, Sofia Garces Palacios, Ganesh Sankaranarayanan, Darian Hoagland, Dmitry Nepomnayshy, Huu Phong Nguyen | http://arxiv.org/pdf/2410.06879v1 | null |
2024-10-09 | SurANet: Surrounding-Aware Network for Concealed Object Detection via Highly-Efficient Interactive Contrastive Learning Strategy | SurANet:基于高效交互对比学习策略的周界感知网络用于隐藏物体检测 | Yuhan Kang, Qingpeng Li, Leyuan Fang, Jian Zhao, Xuelong Li | http://arxiv.org/pdf/2410.06842v1 | null |
2024-10-09 | An Improved Approach for Cardiac MRI Segmentation based on 3D UNet Combined with Papillary Muscle Exclusion | 基于3D UNet并结合乳头肌排除的心脏MRI分割改进方法 | Narjes Benameur, Ramzi Mahmoudi, Mohamed Deriche, Amira Fayouka, Imene Masmoudi, Nessrine Zoghlami | http://arxiv.org/pdf/2410.06818v1 | null |
2024-10-09 | Rethinking the Evaluation of Visible and Infrared Image Fusion | 重新审视可见光与红外图像融合的评价方法 | Dayan Guan, Yixuan Wu, Tianzhu Liu, Alex C. Kot, Yanfeng Gu | http://arxiv.org/pdf/2410.06811v1 | null |
2024-10-09 | QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model | 四叉曼巴:学习基于四叉树的视觉状态空间模型的选择性扫描 | Fei Xie, Weijia Zhang, Zhongdao Wang, Chao Ma | http://arxiv.org/pdf/2410.06806v1 | null |
2024-10-09 | Utilizing Transfer Learning and pre-trained Models for Effective Forest Fire Detection: A Case Study of Uttarakhand | 利用迁移学习和预训练模型实现有效的森林火灾检测:乌塔尔阿坎德案例研究 | Hari Prabhat Gupta, Rahul Mishra | http://arxiv.org/pdf/2410.06743v1 | null |
2024-10-09 | Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy | 评估点云着色对语义分割准确性的影响 | Qinfeng Zhu, Jiaze Cao, Yuanzhi Cai, Lei Fan | http://arxiv.org/pdf/2410.06725v1 | null |
2024-10-09 | Fourier-based Action Recognition for Wildlife Behavior Quantification with Event Cameras | 基于傅里叶的行为识别方法:事件相机在野生动物行为量化中的应用 | Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez, Tom Hart, Alex Kacelnik, Guillermo Gallego | http://arxiv.org/pdf/2410.06698v1 | null |
2024-10-09 | Continual Learning in the Frequency Domain | 频率域中的持续学习 | Ruiqi Liu, Boyu Diao, Libo Huang, Zijia An, Zhulin An, Yongjun Xu | http://arxiv.org/pdf/2410.06645v1 | null |
2024-10-09 | Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments | 开放RGBT: 开放词汇RGB-T零样本语义分割在开放世界环境中的应用 | Meng Yu, Luojie Yang, Xunjie He, Yi Yang, Yufeng Yue | http://arxiv.org/pdf/2410.06626v1 | null |
2024-10-09 | Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers | Pair-VPR:基于视觉变换器的视觉地点识别之地域感知预训练与对比对分类方法 | Stephen Hausler, Peyman Moghadam | http://arxiv.org/pdf/2410.06614v1 | null |
2024-10-09 | Towards Natural Image Matting in the Wild via Real-Scenario Prior | 基于真实场景先验的野外自然图像抠图 | Ruihao Xia, Yu Liang, Peng-Tao Jiang, Hao Zhang, Qianru Sun, Yang Tang, Bo Li, Pan Zhou | http://arxiv.org/pdf/2410.06593v1 | null |
2024-10-09 | On The Relationship between Visual Anomaly-free and Anomalous Representations | 视觉无异常与异常表示之间的关系研究 | Riya Sadrani, Hrishikesh Sharma, Ayush Bachan | http://arxiv.org/pdf/2410.06576v1 | null |
2024-10-09 | MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging | MedImageInsight:面向通用领域医学影像的开源嵌入模型 | Noel C. F. Codella, Ying Jin, Shrey Jain, Yu Gu, Ho Hin Lee, Asma Ben Abacha, Alberto Santamaria-Pang, Will Guyman, Naiteek Sangani, Sheng Zhang, et al. | http://arxiv.org/pdf/2410.06542v1 | null |
2024-10-09 | Deep Learning Ensemble for Predicting Diabetic Macular Edema Onset Using Ultra-Wide Field Color Fundus Image | 深度学习集成模型预测糖尿病黄斑水肿发病期:基于超广角彩色眼底图像分析 | Pengyao Qin, Arun J. Thirunavukarasu, Le Zhang | http://arxiv.org/pdf/2410.06483v1 | null |
2024-10-09 | LoTLIP: Improving Language-Image Pre-training for Long Text Understanding | LoTLIP:提升长文本理解的语图预训练方法 | Wei Wu, Kecheng Zheng, Shuailei Ma, Fan Lu, Yuxin Guo, Yifei Zhang, Wei Chen, Qingpei Guo, Yujun Shen, Zheng-Jun Zha | http://arxiv.org/pdf/2410.05249v2 | null |
2024-10-09 | RobustEMD: Domain Robust Matching for Cross-domain Few-shot Medical Image Segmentation | 域鲁棒匹配的跨域少样本医疗图像分割:RobustEMD方法 | Yazhou Zhu, Minxian Li, Qiaolin Ye, Shidong Wang, Tong Xin, Haofeng Zhang | http://arxiv.org/pdf/2410.01110v2 | null |
2024-10-09 | The BRAVO Semantic Segmentation Challenge Results in UNCV2024 | BRAVO语义分割挑战赛UNCV2024年结果分析 | Tuan-Hung Vu, Eduardo Valle, Andrei Bursuc, Tommie Kerssies, Daan de Geus, Gijs Dubbelman, Long Qian, Bingke Zhu, Yingying Chen, Ming Tang, et al. | http://arxiv.org/pdf/2409.15107v2 | link |
2024-10-09 | Federated Impression for Learning with Distributed Heterogeneous Data | 联邦影响学习在分布式异构数据中的应用 | Atrin Arya, Sana Ayromlou, Armin Saadat, Purang Abolmaesumi, Xiaoxiao Li | http://arxiv.org/pdf/2409.07351v2 | link |
2024-10-09 | TASAR: Transfer-based Attack on Skeletal Action Recognition | TASAR:基于迁移的骨骼动作识别攻击方法 | Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Ajian Liu, Xingxing Wei, Meng Wang, He Wang | http://arxiv.org/pdf/2409.02483v2 | null |
2024-10-09 | Staircase Cascaded Fusion of Lightweight Local Pattern Recognition and Long-Range Dependencies for Structural Crack Segmentation | 轻量级局部模式识别与长距离依赖的阶梯式级联融合结构裂缝分割方法 | Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Mianzhao Wang, Shengyong Chen | http://arxiv.org/pdf/2408.12815v2 | link |
2024-10-09 | Comprehensive Performance Evaluation of YOLO11, YOLOv10, YOLOv9 and YOLOv8 on Detecting and Counting Fruitlet in Complex Orchard Environments | 全面评估YOLO11、YOLOv10、YOLOv9和YOLOv8在复杂果园环境中检测与计数果实性能表现 | Ranjan Sapkota, Zhichao Meng, Martin Churuvija, Xiaoqiang Du, Zenghong Ma, Manoj Karkee | http://arxiv.org/pdf/2407.12040v4 | null |
2024-10-09 | Adaptive Parametric Activation | 自适应参数激活函数 | Konstantinos Panagiotis Alexandridis, Jiankang Deng, Anh Nguyen, Shan Luo | http://arxiv.org/pdf/2407.08567v2 | link |
2024-10-09 | Cell Tracking according to Biological Needs -- Strong Mitosis-aware Multi-Hypothesis Tracker with Aleatoric Uncertainty | 根据生物需求进行细胞追踪——具有强有力有丝分裂感知的多假设追踪器与随机不确定性 | Timo Kaiser, Maximilian Schier, Bodo Rosenhahn | http://arxiv.org/pdf/2403.15011v3 | null |
2024-10-09 | Topologically Faithful Multi-class Segmentation in Medical Images | 医学图像中的拓扑保真多类分割方法 | Alexander H. Berger, Nico Stucki, Laurin Lux, Vincent Buergin, Suprosanna Shit, Anna Banaszak, Daniel Rueckert, Ulrich Bauer, Johannes C. Paetzold | http://arxiv.org/pdf/2403.11001v2 | null |
2024-10-09 | Biophysics Informed Pathological Regularisation for Brain Tumour Segmentation | 基于生物物理信息病理正则化的脑肿瘤分割方法 | Lipei Zhang, Yanqi Cheng, Lihao Liu, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero | http://arxiv.org/pdf/2403.09136v3 | null |
2024-10-09 | AUPIMO: Redefining Visual Anomaly Detection Benchmarks with High Speed and Low Tolerance | AUPIMO:以高速与低容忍度重新定义视觉异常检测基准 | Joao P. C. Bertoldo, Dick Ameln, Ashwin Vaidya, Samet Akçay | http://arxiv.org/pdf/2401.01984v4 | link |
2024-10-09 | LISBET: a machine learning model for the automatic segmentation of social behavior motifs | LISBET:一种用于社交行为模式自动分割的机器学习模型 | Giuseppe Chindemi, Benoit Girard, Camilla Bellone | http://arxiv.org/pdf/2311.04069v2 | null |
2024-10-09 | Beyond the Visible: A Survey on Cross-spectral Face Recognition | 超越可见光:跨谱段人脸识别技术综述 | David Anghelone, Cunjian Chen, Arun Ross, Antitza Dantcheva | http://arxiv.org/pdf/2201.04435v4 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-09 | Analysis of different disparity estimation techniques on aerial stereo image datasets | 不同视差估计技术在航空立体图像数据集上的分析研究 | Ishan Narayan, Shashi Poddar | http://arxiv.org/pdf/2410.06711v1 | null |
2024-10-09 | M${}^{3}$Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes | M${}^{3}$Bench:三维场景中移动操作的全身体运动生成基准测试 | Zeyu Zhang, Sixu Yan, Muzhi Han, Zaijin Wang, Xinggang Wang, Song-Chun Zhu, Hangxin Liu | http://arxiv.org/pdf/2410.06678v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-09 | Clean Evaluations on Contaminated Visual Language Models | 清洁评估在污染视觉语言模型上的研究 | Hongyuan Lu, Shujie Miao, Wai Lam | http://arxiv.org/pdf/2410.07030v1 | null |
2024-10-09 | Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback | 无需人类反馈的胸部X光解读模型事实性偏好微调 | Dennis Hein, Zhihong Chen, Sophie Ostmeier, Justin Xu, Maya Varma, Eduardo Pontes Reis, Arne Edward Michalson, Christian Bluethgen, Hyun Joo Shin, Curtis Langlotz, et al. | http://arxiv.org/pdf/2410.07025v1 | null |
2024-10-09 | Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles | 弱评估-强激发:使用情境谜题评估和激发LLMs的横向思维能力 | Qi Chen, Bowen Zhang, Gang Wang, Qi Wu | http://arxiv.org/pdf/2410.06733v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-09 | LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning | LaMP:面向运动生成、检索与描述的语言-运动预训练模型 | Zhe Li, Weihao Yuan, Yisheng He, Lingteng Qiu, Shenhao Zhu, Xiaodong Gu, Weichao Shen, Yuan Dong, Zilong Dong, Laurence T. Yang | http://arxiv.org/pdf/2410.07093v1 | null |
2024-10-09 | Adaptive High-Frequency Transformer for Diverse Wildlife Re-Identification | 自适应高频变换器在多样化野生动物重识别中的应用 | Chenyue Li, Shuoyi Chen, Mang Ye | http://arxiv.org/pdf/2410.06977v1 | null |
2024-10-09 | ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling | ELMO:通过上采样增强实时LiDAR运动捕捉 | Deok-Kyeong Jang, Dongseok Yang, Deok-Yun Jang, Byeoli Choi, Donghoon Shin, Sung-hee Lee | http://arxiv.org/pdf/2410.06963v1 | null |
2024-10-09 | Reliable Probabilistic Human Trajectory Prediction for Autonomous Applications | 可靠的概率性人类轨迹预测在自主应用中的研究 | Manuel Hetzel, Hannes Reichert, Konrad Doll, Bernhard Sick | http://arxiv.org/pdf/2410.06905v1 | null |
2024-10-09 | MatMamba: A Matryoshka State Space Model | MatMamba:一种套娃状态空间模型 | Abhinav Shukla, Sai Vemprala, Aditya Kusupati, Ashish Kapoor | http://arxiv.org/pdf/2410.06718v1 | null |
2024-10-09 | SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference | SparseVLM:用于高效视觉-语言模型推理的视觉标记稀疏化 | Yuan Zhang, Chun-Kai Fan, Junpeng Ma, Wenzhao Zheng, Tao Huang, Kuan Cheng, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, et al. | http://arxiv.org/pdf/2410.04417v2 | link |
2024-10-09 | Window-based Channel Attention for Wavelet-enhanced Learned Image Compression | 基于窗口的通道注意力机制在Wavelet增强学习图像压缩中的应用 | Heng Xu, Bowen Hai, Yushun Tang, Zhihai He | http://arxiv.org/pdf/2409.14090v2 | null |
2024-10-09 | GMSR:Gradient-Guided Mamba for Spectral Reconstruction from RGB Images | GMSR:基于梯度的Mamba算法实现RGB图像光谱重建 | Xinying Wang, Zhixiong Huang, Sifan Zhang, Jiawen Zhu, Paolo Gamba, Lin Feng | http://arxiv.org/pdf/2405.07777v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-09 | Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication | Thing2Reality:将2D内容转换为条件多视图与3D高斯对象以实现XR通信 | Erzhen Hu, Mingyi Li, Jungtaek Hong, Xun Qian, Alex Olwal, David Kim, Seongkook Heo, Ruofei Du | http://arxiv.org/pdf/2410.07119v1 | null |
2024-10-09 | Z-upscaling: Optical Flow Guided Frame Interpolation for Isotropic Reconstruction of 3D EM Volumes | Z轴放大:光流引导的帧插值用于三维电子显微镜体积各向同性重建 | Fisseha A. Ferede, Ali Khalighifar, Jaison John, Krishnan Venkataraman, Khaled Khairy | http://arxiv.org/pdf/2410.07043v1 | null |
2024-10-09 | OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB | 全位姿追踪OmniPose6D:面向动态场景中短期物体位姿追踪的单目RGB方法 | Yunzhi Lin, Yipu Zhao, Fu-Jen Chu, Xingyu Chen, Weiyao Wang, Hao Tang, Patricio A. Vela, Matt Feiszli, Kevin Liang | http://arxiv.org/pdf/2410.06694v1 | null |
2024-10-09 | LocoVR: Multiuser Indoor Locomotion Dataset in Virtual Reality | LocoVR:虚拟现实中的多用户室内移动数据集 | Kojiro Takeyama, Yimeng Liu, Misha Sra | http://arxiv.org/pdf/2410.06437v1 | null |
2024-10-09 | SCILLA: SurfaCe Implicit Learning for Large Urban Area, a volumetric hybrid solution | SCILLA:大规模城市区域表面隐式学习,一种体积混合解决方案 | Hala Djeghim, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou, Désiré Sidibé | http://arxiv.org/pdf/2403.10344v3 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-09 | Continual Learning: Less Forgetting, More OOD Generalization via Adaptive Contrastive Replay | 持续学习:通过自适应对比重放实现更少遗忘与更强OOD泛化能力 | Hossein Rezaei, Mohammad Sabokrou | http://arxiv.org/pdf/2410.07110v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-09 | EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models | EvolveDirector:利用大型视觉-语言模型实现高级文本到图像生成 | Rui Zhao, Hangjie Yuan, Yujie Wei, Shiwei Zhang, Yuchao Gu, Lingmin Ran, Xiang Wang, Zhangjie Wu, Junhao Zhang, Yingya Zhang, et al. | http://arxiv.org/pdf/2410.07133v1 | null |
2024-10-09 | VHELM: A Holistic Evaluation of Vision Language Models | VHELM:视觉语言模型的全面评估 | Tony Lee, Haoqin Tu, Chi Heem Wong, Wenhao Zheng, Yiyang Zhou, Yifan Mai, Josselin Somerville Roberts, Michihiro Yasunaga, Huaxiu Yao, Cihang Xie, et al. | http://arxiv.org/pdf/2410.07112v1 | null |
2024-10-09 | A Diffusion-based Xray2MRI Model: Generating Pseudo-MRI Volumes From one Single X-ray | 基于扩散的Xray2MRI模型:从单张X射线生成伪MRI体积数据 | Zhe Wang, Rachid Jennane, Aladine Chetouani, Mohamed Jarraya | http://arxiv.org/pdf/2410.06997v1 | null |
2024-10-09 | Evaluating Computational Pathology Foundation Models for Prostate Cancer Grading under Distribution Shifts | 评估计算病理学基础模型在前列腺癌分级中的分布迁移性能 | Fredrik K. Gustafsson, Mattias Rantalainen | http://arxiv.org/pdf/2410.06723v1 | null |
2024-10-09 | Happy: A Debiased Learning Framework for Continual Generalized Category Discovery | Happy:一种去偏见的持续广义类别发现学习框架 | Shijie Ma, Fei Zhu, Zhun Zhong, Wenzhuo Liu, Xu-Yao Zhang, Cheng-Lin Liu | http://arxiv.org/pdf/2410.06535v1 | null |
2024-10-09 | MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning | MotionRL: 利用多奖励强化学习将文本到运动生成对齐至人类偏好 | Xiaoyang Liu, Yunyao Mao, Wengang Zhou, Houqiang Li | http://arxiv.org/pdf/2410.06513v1 | null |
2024-10-09 | MaskBlur: Spatial and Angular Data Augmentation for Light Field Image Super-Resolution | MaskBlur:光场图像超分辨率的空间与角度数据增强方法 | Wentao Chao, Fuqing Duan, Yulan Guo, Guanghui Wang | http://arxiv.org/pdf/2410.06478v1 | null |
2024-10-09 | From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning | 从通用到专业:通过任务特定视觉指令调整适应视觉语言模型 | Yang Bai, Yang Zhou, Jun Zhou, Rick Siow Mong Goh, Daniel Shu Wei Ting, Yong Liu | http://arxiv.org/pdf/2410.06456v1 | null |
2024-10-09 | Machine Unlearning in Forgettability Sequence | 按可遗忘性顺序进行的机器遗忘 | Junjie Chen, Qian Chen, Jian Lou, Xiaoyu Zhang, Kai Wu, Zilong Wang | http://arxiv.org/pdf/2410.06446v1 | null |
2024-10-09 | Motion and Structure from Event-based Normal Flow | 基于事件法向流的运动与结构估计 | Zhongyang Ren, Bangyan Liao, Delei Kong, Jinghang Li, Peidong Liu, Laurent Kneip, Guillermo Gallego, Yi Zhou | http://arxiv.org/pdf/2407.12239v3 | null |
2024-10-09 | Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison | 解构与对比一致性:通过任务解构一致性比较衡量VLMs答案可靠性 | Qian Yang, Weixiang Yan, Aishwarya Agrawal | http://arxiv.org/pdf/2407.07840v3 | null |
2024-10-09 | Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models | 评估大规模视觉-语言模型幻觉基准的质量 | Bei Yan, Jie Zhang, Zheng Yuan, Shiguang Shan, Xilin Chen | http://arxiv.org/pdf/2406.17115v2 | link |
2024-10-09 | AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | AutoHallusion: 针对视觉-语言模型的自动幻觉基准生成方法 | Xiyang Wu, Tianrui Guan, Dianqi Li, Shuaiyi Huang, Xiaoyu Liu, Xijun Wang, Ruiqi Xian, Abhinav Shrivastava, Furong Huang, Jordan Lee Boyd-Graber, et al. | http://arxiv.org/pdf/2406.10900v2 | null |
2024-10-09 | AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales | AGL-NET:空地跨模态全局定位与多尺度变化研究 | Tianrui Guan, Ruiqi Xian, Xijun Wang, Xiyang Wu, Mohamed Elnoor, Daeun Song, Dinesh Manocha | http://arxiv.org/pdf/2404.03187v2 | null |
2024-10-09 | Less is More: High-value Data Selection for Visual Instruction Tuning | 少即是多:视觉指令调优中的高价值数据选择 | Zikang Liu, Kun Zhou, Wayne Xin Zhao, Dawei Gao, Yaliang Li, Ji-Rong Wen | http://arxiv.org/pdf/2403.09559v3 | null |
2024-10-09 | LM-HT SNN: Enhancing the Performance of SNN to ANN Counterpart through Learnable Multi-hierarchical Threshold Model | LM-HT SNN:通过可学习多层级阈值模型提升SNN性能至ANN对应水平 | Zecheng Hao, Xinyu Shi, Yujia Liu, Zhaofei Yu, Tiejun Huang | http://arxiv.org/pdf/2402.00411v2 | null |