[UPDATED!] 2024-08-14 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-08-14 | Detecting Near-Duplicate Face Images | 检测近似重复的面部图像 | Sudipta Banerjee, Arun Ross | http://arxiv.org/pdf/2408.07689v1 | link |
| 2024-08-14 | DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model | DifuzCam:用遮罩和扩散模型代替相机镜头 | Erez Yosef, Raja Giryes | http://arxiv.org/pdf/2408.07541v1 | null |
| 2024-08-14 | DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency | DeCo:具有运动一致性的解耦以人为中心的扩散视频编辑 | Xiaojing Zhong, Xinyi Huang, Xiaofeng Yang, Guosheng Lin, Qingyao Wu | http://arxiv.org/pdf/2408.07481v1 | null |
| 2024-08-14 | One Step Diffusion-based Super-Resolution with Time-Aware Distillation | 基于时间感知蒸馏的一步扩散超分辨率 | Xiao He, Huaao Tang, Zhijun Tu, Junchao Zhang, Kun Cheng, Hanting Chen, Yong Guo, Mingrui Zhu, Nannan Wang, Xinbo Gao, et al. | http://arxiv.org/pdf/2408.07476v1 | link |
| 2024-08-14 | Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration | 通过跨模态协作实现稳健的半监督多模态医学图像分割 | Xiaogen Zhon, Yiyou Sun, Min Deng, Winnie Chiu Wing Chu, Qi Dou | http://arxiv.org/pdf/2408.07341v1 | link |
| 2024-08-14 | KIND: Knowledge Integration and Diversion in Diffusion Models | KIND:扩散模型中的知识整合与转移 | Yucheng Xie, Fu Feng, Jing Wang, Xin Geng, Yong Rui | http://arxiv.org/pdf/2408.07337v1 | null |
| 2024-08-14 | GRIF-DM: Generation of Rich Impression Fonts using Diffusion Models | GRIF-DM:使用扩散模型生成丰富的印象字体 | Lei Kang, Fei Yang, Kai Wang, Mohamed Ali Souibgui, Lluis Gomez, Alicia Fornés, Ernest Valveny, Dimosthenis Karatzas | http://arxiv.org/pdf/2408.07259v1 | link |
| 2024-08-14 | All-around Neural Collapse for Imbalanced Classification | 针对不平衡分类的全方位神经折叠 | Enhao Zhang, Chaohua Li, Chuanxing Geng, Songcan Chen | http://arxiv.org/pdf/2408.07253v1 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-08-14 | End-to-end Semantic-centric Video-based Multimodal Affective Computing | 端到端以语义为中心的基于视频的多模态情感计算 | Ronghao Lin, Ying Zeng, Sijie Mai, Haifeng Hu | http://arxiv.org/pdf/2408.07694v1 | null |
| 2024-08-14 | Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | LLM、MLLM 及其他领域的模型合并:方法、理论、应用和机会 | Enneng Yang, Li Shen, Guibing Guo, Xingwei Wang, Xiaochun Cao, Jie Zhang, Dacheng Tao | http://arxiv.org/pdf/2408.07666v1 | link |
| 2024-08-14 | MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | MathScape:通过分层基准评估多模态数学场景中的 MLLM | Minxuan Zhou, Hao Liang, Tianpeng Li, Zhiyu Wu, Mingan Lin, Linzhuang Sun, Yaqi Zhou, Yan Zhang, Xiaoqin Huang, Yicong Chen, et al. | http://arxiv.org/pdf/2408.07543v1 | null |
| 2024-08-14 | Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach | 模态不变多模态学习处理缺失模态:单分支方法 | Muhammad Saad Saeed, Shah Nawaz, Muhammad Zaigham Zaheer, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf, Hassan Sajjad, Tom De Schepper, Markus Schedl | http://arxiv.org/pdf/2408.07445v1 | null |
| 2024-08-14 | LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image | LLMI3D:通过单个 2D 图像为 LLM 提供 3D 感知能力 | Fan Yang, Sicheng Zhao, Yanhao Zhang, Haoxiang Chen, Hui Chen, Wenbo Tang, Haonan Lu, Pengfei Xu, Zhenyu Yang, Jungong Han, et al. | http://arxiv.org/pdf/2408.07422v1 | null |
| 2024-08-14 | Automated Retinal Image Analysis and Medical Report Generation through Deep Learning | 通过深度学习自动分析视网膜图像并生成医疗报告 | Jia-Hong Huang | http://arxiv.org/pdf/2408.07349v1 | null |
| 2024-08-14 | Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion | 通过基于排名的混合训练和多模态融合增强视觉问答 | Peiyuan Chen, Zecheng Zhang, Yiping Dong, Li Zhou, Han Wang | http://arxiv.org/pdf/2408.07303v1 | null |
| 2024-08-14 | Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM | 观察与理解:通过 ChemVLM 将视觉与化学知识联系起来 | Junxian Li, Di Zhang, Xunzhi Wang, Zeying Hao, Jingdi Lei, Qian Tan, Cai Zhou, Wei Liu, Weiyun Wang, Zhe Chen, et al. | http://arxiv.org/pdf/2408.07246v1 | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-08-14 | Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space | 重新思考三维空间中辐射场的开放词汇分割 | Hyunjee Lee, Youngsik Yun, Jeongmin Bae, Seoha Kim, Youngjung Uh | http://arxiv.org/pdf/2408.07416v1 | null |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-08-14 | Progressive Radiance Distillation for Inverse Rendering with Gaussian Splatting | 利用高斯溅射进行逆向渲染的渐进式辐射度蒸馏 | Keyang Ye, Qiming Hou, Kun Zhou | http://arxiv.org/pdf/2408.07595v1 | null |
| 2024-08-14 | 3D Gaussian Editing with A Single Image | 使用单幅图像进行 3D 高斯编辑 | Guan Luo, Tian-Xing Xu, Ying-Tian Liu, Xiao-Xiong Fan, Fang-Lue Zhang, Song-Hai Zhang | http://arxiv.org/pdf/2408.07540v1 | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-08-14 | Knowledge Distillation with Refined Logits | 利用精炼 Logits 进行知识蒸馏 | Wujie Sun, Defang Chen, Siwei Lyu, Genlang Chen, Chun Chen, Can Wang | http://arxiv.org/pdf/2408.07703v1 | link |
| 2024-08-14 | Towards Real-time Video Compressive Sensing on Mobile Devices | 面向移动设备的实时视频压缩感知 | Miao Cao, Lishun Wang, Huan Wang, Guoqing Wang, Xin Yuan | http://arxiv.org/pdf/2408.07530v1 | link |
| 2024-08-14 | Infra-YOLO: Efficient Neural Network Structure with Model Compression for Real-Time Infrared Small Object Detection | Infra-YOLO:具有模型压缩的高效神经网络结构,用于实时红外小物体检测 | Zhonglin Chen, Anyu Geng, Jianan Jiang, Jiwu Lu, Di Wu | http://arxiv.org/pdf/2408.07455v1 | null |
| 2024-08-14 | Leveraging Perceptual Scores for Dataset Pruning in Computer Vision Tasks | 利用感知分数进行计算机视觉任务中的数据集修剪 | Raghavendra Singh | http://arxiv.org/pdf/2408.07243v1 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-08-14 | A Spitting Image: Modular Superpixel Tokenization in Vision Transformers | 一模一样:Vision Transformers 中的模块化超像素标记化 | Marius Aasan, Odd Kolbjørnsen, Anne Schistad Solberg, Adín Ramirez Rivera | http://arxiv.org/pdf/2408.07680v1 | link |
| 2024-08-14 | See It All: Contextualized Late Aggregation for 3D Dense Captioning | 了解全部内容:3D 密集字幕的上下文化后期聚合 | Minjung Kim, Hyung Suk Lim, Seung Hwan Kim, Soonyoung Lee, Bumsoo Kim, Gunhee Kim | http://arxiv.org/pdf/2408.07648v1 | null |
| 2024-08-14 | Boosting Unconstrained Face Recognition with Targeted Style Adversary | 利用有针对性的风格对手提高无约束人脸识别率 | Mohammad Saeed Ebrahimi Saadabadi, Sahar Rahimi Malakshan, Seyed Rasoul Hosseini, Nasser M. Nasrabadi | http://arxiv.org/pdf/2408.07642v1 | null |
| 2024-08-14 | Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving | Panacea+:用于自动驾驶的全景可控视频生成 | Yuqing Wen, Yucheng Zhao, Yingfei Liu, Binyuan Huang, Fan Jia, Yanhui Wang, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang | http://arxiv.org/pdf/2408.07605v1 | null |
| 2024-08-14 | Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey | 用于高效入侵检测系统的 Transformer 和大型语言模型:全面调查 | Hamza Kheddar | http://arxiv.org/pdf/2408.07583v1 | null |
| 2024-08-14 | MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation | MetaSeg:基于 MetaFormer 的全局上下文感知网络,用于高效语义分割 | Beoungwoo Kang, Seunghun Moon, Yubin Cho, Hyunwoo Yu, Suk-Ju Kang | http://arxiv.org/pdf/2408.07576v1 | link |
| 2024-08-14 | Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation | 用于参考图像分割的具有阶段划分视觉和语言变换编码器的交叉感知早期融合 | Yubin Cho, Hyunwoo Yu, Suk-ju Kang | http://arxiv.org/pdf/2408.07539v1 | null |
| 2024-08-14 | Improved 3D Whole Heart Geometry from Sparse CMR Slices | 通过稀疏 CMR 切片改进 3D 全心脏几何结构 | Yiyang Xu, Hao Xu, Matthew Sinclair, Esther Puyol-Antón, Steven A Niederer, Amedeo Chiribiri, Steven E Williams, Michelle C Williams, Alistair A Young | http://arxiv.org/pdf/2408.07532v1 | link |
| 2024-08-14 | Attention-Guided Perturbation for Unsupervised Image Anomaly Detection | 用于无监督图像异常检测的注意力引导扰动 | Tingfeng Huang, Yuxuan Cheng, Jingbo Xia, Rui Yu, Yuxuan Cai, Jinhai Xiang, Xinwei He, Xiang Bai | http://arxiv.org/pdf/2408.07490v1 | null |
| 2024-08-14 | OMR: Occlusion-Aware Memory-Based Refinement for Video Lane Detection | OMR:基于遮挡感知记忆的视频车道检测细化方法 | Dongkwon Jin, Chang-Su Kim | http://arxiv.org/pdf/2408.07486v1 | null |
| 2024-08-14 | Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification | 通过任意分割模型进行领域不变表征学习以实现血细胞分类 | Yongcheng Li, Lingcong Cai, Ying Lu, Cheng Lin, Yupeng Zhang, Jingyan Jiang, Genan Dai, Bowen Zhang, Jingzhou Cao, Xiangzhong Zhang, et al. | http://arxiv.org/pdf/2408.07467v1 | link |
| 2024-08-14 | Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark | 使用拓扑引导可变形曼巴模型进行肋软骨分割:方法与基准 | Senmao Wang, Haifan Gong, Runmeng Cui, Boyao Wan, Yicheng Liu, Zhonglin Hu, Haiqing Yang, Jingyang Zhou, Bo Pan, Lin Lin, et al. | http://arxiv.org/pdf/2408.07444v1 | null |
| 2024-08-14 | MagicFace: Training-free Universal-Style Human Image Customized Synthesis | MagicFace:无需训练的通用风格人像定制合成 | Yibin Wang, Weizhong Zhang, Cheng Jin | http://arxiv.org/pdf/2408.07433v1 | null |
| 2024-08-14 | UAHOI: Uncertainty-aware Robust Interaction Learning for HOI Detection | UAHOI:用于 HOI 检测的不确定性感知稳健交互学习 | Mu Chen, Minghan Chen, Yi Yang | http://arxiv.org/pdf/2408.07430v1 | null |
| 2024-08-14 | Segment Using Just One Example | 仅使用一个示例进行细分 | Pratik Vora, Sudipan Saha | http://arxiv.org/pdf/2408.07393v1 | null |
| 2024-08-14 | RTAT: A Robust Two-stage Association Tracker for Multi-Object Tracking | RTAT:用于多目标跟踪的鲁棒两阶段关联跟踪器 | Song Guo, Rujie Liu, Narishige Abe | http://arxiv.org/pdf/2408.07344v1 | null |
| 2024-08-14 | Gradient Alignment Improves Test-Time Adaptation for Medical Image Segmentation | 梯度对齐可改善医学图像分割的测试时间适应性 | Ziyang Chen, Yiwen Ye, Yongsheng Pan, Yong Xia | http://arxiv.org/pdf/2408.07343v1 | null |
| 2024-08-14 | Image-Based Leopard Seal Recognition: Approaches and Challenges in Current Automated Systems | 基于图像的豹海豹识别:当前自动化系统中的方法和挑战 | Jorge Yero Salazar, Pablo Rivas, Renato Borras-Chavez, Sarah Kienle | http://arxiv.org/pdf/2408.07269v1 | null |
| 2024-08-14 | Lesion-aware network for diabetic retinopathy diagnosis | 用于糖尿病视网膜病变诊断的病变感知网络 | Xue Xia, Kun Zhan, Yuming Fang, Wenhui Jiang, Fei Shen | http://arxiv.org/pdf/2408.07264v1 | link |
| 2024-08-14 | Ensemble architecture in polyp segmentation | 息肉分割中的集成架构 | Hao-Yun Hsu, Yi-Ching Cheng, Guan-Hua Huang | http://arxiv.org/pdf/2408.07262v1 | link |
| 2024-08-14 | GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval | GQE:增强文本视频检索的通用查询扩展 | Zechen Bai, Tianjun Xiao, Tong He, Pichao Wang, Zheng Zhang, Thomas Brox, Mike Zheng Shou | http://arxiv.org/pdf/2408.07249v1 | null |
| 2024-08-14 | Sign language recognition based on deep learning and low-cost handcrafted descriptors | 基于深度学习和低成本手工描述符的手语识别 | Alvaro Leandro Cavalcante Carneiro, Denis Henrique Pinheiro Salvadeo, Lucas de Brito Silva | http://arxiv.org/pdf/2408.07244v1 | null |
| 2024-08-14 | Enhancing Autonomous Vehicle Perception in Adverse Weather through Image Augmentation during Semantic Segmentation Training | 通过语义分割训练期间的图像增强功能增强自动驾驶汽车在恶劣天气下的感知能力 | Ethan Kou, Noah Curran | http://arxiv.org/pdf/2408.07239v1 | null |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-08-14 | Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling | 利用几何建模增强单目内窥镜场景的尺度感知深度估计 | Ruofeng Wei, Bin Li, Kai Chen, Yiyao Ma, Yunhui Liu, Qi Dou | http://arxiv.org/pdf/2408.07266v1 | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-08-14 | G$^2$V$^2$former: Graph Guided Video Vision Transformer for Face Anti-Spoofing | G$^2$V$^2$former:用于人脸反欺骗的图形引导视频视觉转换器 | Jingyi Yang, Zitong Yu, Xiuming Ni, Jia He, Hui Li | http://arxiv.org/pdf/2408.07675v1 | null |
| 2024-08-14 | Sonic: Fast and Transferable Data Poisoning on Clustering Algorithms | Sonic:聚类算法中快速且可转移的数据中毒 | Francesco Villani, Dario Lazzaro, Antonio Emanuele Cinà, Matteo Dell'Amico, Battista Biggio, Fabio Roli | http://arxiv.org/pdf/2408.07558v1 | null |
| 2024-08-14 | DIffSteISR: Harnessing Diffusion Prior for Superior Real-world Stereo Image Super-Resolution | DIffSteISR:利用扩散先验实现卓越的真实世界立体图像超分辨率 | Yuanbo Zhou, Xinlin Zhang, Wei Deng, Tao Wang, Tao Tan, Qinquan Gao, Tong Tong | http://arxiv.org/pdf/2408.07516v1 | null |
| 2024-08-14 | CNN-JEPA: Self-Supervised Pretraining Convolutional Neural Networks Using Joint Embedding Predictive Architecture | CNN-JEPA:使用联合嵌入预测架构的自监督预训练卷积神经网络 | András Kalapos, Bálint Gyires-Tóth | http://arxiv.org/pdf/2408.07514v1 | null |
| 2024-08-14 | GRFormer: Grouped Residual Self-Attention for Lightweight Single Image Super-Resolution | GRFormer:用于轻量级单图像超分辨率的分组残差自注意力 | Yuzhen Li, Zehang Deng, Yuxin Cao, Lihua Liu | http://arxiv.org/pdf/2408.07484v1 | link |
| 2024-08-14 | Unsupervised Stereo Matching Network For VHR Remote Sensing Images Based On Error Prediction | 基于误差预测的 VHR 遥感图像无监督立体匹配网络 | Liting Jiang, Yuming Xiang, Feng Wang, Hongjian You | http://arxiv.org/pdf/2408.07419v1 | link |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-08-14 | RSD-DOG : A New Image Descriptor based on Second Order Derivatives | RSD-DOG:一种基于二阶导数的新型图像描述符 | Darshan Venkatrayappa, Philippe Montesinos, Daniel Diep, Baptiste Magnier | http://arxiv.org/pdf/2408.07687v1 | null |
| 2024-08-14 | Rethinking the Key Factors for the Generalization of Remote Sensing Stereo Matching Networks | 遥感立体匹配网络推广关键因素的再思考 | Liting Jiang, Feng Wang, Wenyi Zhang, Peifeng Li, Hongjian You, Yuming Xiang | http://arxiv.org/pdf/2408.07613v1 | null |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-08-14 | Evidential Graph Contrastive Alignment for Source-Free Blending-Target Domain Adaptation | 用于无源混合目标域自适应的证据图对比对齐 | Juepeng Zheng, Yibin Wen, Jinxiao Zhang, Runmin Dong, Haohuan Fu | http://arxiv.org/pdf/2408.07527v1 | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-08-14 | Disentangle and denoise: Tackling context misalignment for video moment retrieval | 解开并去噪:解决视频时刻检索的上下文错位问题 | Kaijing Ma, Han Fang, Xianghao Zang, Chao Ban, Lanxiang Zhou, Zhongjiang He, Yongxiang Li, Hao Sun, Zerun Feng, Xingsong Hou | http://arxiv.org/pdf/2408.07600v1 | null |
| 2024-08-14 | Whitening Consistently Improves Self-Supervised Learning | 白化持续改善自监督学习 | András Kalapos, Bálint Gyires-Tóth | http://arxiv.org/pdf/2408.07519v1 | null |
| 2024-08-14 | Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach | 跨平台视频行人重识别:新的基准数据集和自适应方法 | Shizhou Zhang, Wenlong Luo, De Cheng, Qingchun Yang, Lingyan Ran, Yinghui Xing, Yanning Zhang | http://arxiv.org/pdf/2408.07500v1 | link |
| 2024-08-14 | BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning | BAPLe:利用即时学习对医学基础模型进行后门攻击 | Asif Hanif, Fahad Shamshad, Muhammad Awais, Muzammal Naseer, Fahad Shahbaz Khan, Karthik Nandakumar, Salman Khan, Rao Muhammad Anwer | http://arxiv.org/pdf/2408.07440v1 | null |
| 2024-08-14 | Achieving Data Efficient Neural Networks with Hybrid Concept-based Models | 利用混合概念模型实现数据高效的神经网络 | Tobias A. Opsahl, Vegard Antun | http://arxiv.org/pdf/2408.07438v1 | link |