Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-14 | Detecting Near-Duplicate Face Images | 检测近似重复的面部图像 | Sudipta Banerjee, Arun Ross | http://arxiv.org/pdf/2408.07689v1 | link |
2024-08-14 | DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model | DifuzCam:用遮罩和扩散模型代替相机镜头 | Erez Yosef, Raja Giryes | http://arxiv.org/pdf/2408.07541v1 | null |
2024-08-14 | DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency | DeCo:具有运动一致性的解耦以人为中心的扩散视频编辑 | Xiaojing Zhong, Xinyi Huang, Xiaofeng Yang, Guosheng Lin, Qingyao Wu | http://arxiv.org/pdf/2408.07481v1 | null |
2024-08-14 | One Step Diffusion-based Super-Resolution with Time-Aware Distillation | 基于时间感知蒸馏的一步扩散超分辨率 | Xiao He, Huaao Tang, Zhijun Tu, Junchao Zhang, Kun Cheng, Hanting Chen, Yong Guo, Mingrui Zhu, Nannan Wang, Xinbo Gao, et al. | http://arxiv.org/pdf/2408.07476v1 | link |
2024-08-14 | Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration | 通过跨模态协作实现稳健的半监督多模态医学图像分割 | Xiaogen Zhou, Yiyou Sun, Min Deng, Winnie Chiu Wing Chu, Qi Dou | http://arxiv.org/pdf/2408.07341v1 | link |
2024-08-14 | KIND: Knowledge Integration and Diversion in Diffusion Models | KIND:扩散模型中的知识整合与转移 | Yucheng Xie, Fu Feng, Jing Wang, Xin Geng, Yong Rui | http://arxiv.org/pdf/2408.07337v1 | null |
2024-08-14 | GRIF-DM: Generation of Rich Impression Fonts using Diffusion Models | GRIF-DM:使用扩散模型生成丰富的印象字体 | Lei Kang, Fei Yang, Kai Wang, Mohamed Ali Souibgui, Lluis Gomez, Alicia Fornés, Ernest Valveny, Dimosthenis Karatzas | http://arxiv.org/pdf/2408.07259v1 | link |
2024-08-14 | All-around Neural Collapse for Imbalanced Classification | 针对不平衡分类的全方位神经折叠 | Enhao Zhang, Chaohua Li, Chuanxing Geng, Songcan Chen | http://arxiv.org/pdf/2408.07253v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-14 | End-to-end Semantic-centric Video-based Multimodal Affective Computing | 端到端以语义为中心的基于视频的多模态情感计算 | Ronghao Lin, Ying Zeng, Sijie Mai, Haifeng Hu | http://arxiv.org/pdf/2408.07694v1 | null |
2024-08-14 | Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | LLM、MLLM 及其他领域的模型合并:方法、理论、应用和机会 | Enneng Yang, Li Shen, Guibing Guo, Xingwei Wang, Xiaochun Cao, Jie Zhang, Dacheng Tao | http://arxiv.org/pdf/2408.07666v1 | link |
2024-08-14 | MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | MathScape:通过分层基准评估多模态数学场景中的 MLLM | Minxuan Zhou, Hao Liang, Tianpeng Li, Zhiyu Wu, Mingan Lin, Linzhuang Sun, Yaqi Zhou, Yan Zhang, Xiaoqin Huang, Yicong Chen, et al. | http://arxiv.org/pdf/2408.07543v1 | null |
2024-08-14 | Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach | 模态不变多模态学习处理缺失模态:单分支方法 | Muhammad Saad Saeed, Shah Nawaz, Muhammad Zaigham Zaheer, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf, Hassan Sajjad, Tom De Schepper, Markus Schedl | http://arxiv.org/pdf/2408.07445v1 | null |
2024-08-14 | LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image | LLMI3D:通过单个 2D 图像为 LLM 提供 3D 感知能力 | Fan Yang, Sicheng Zhao, Yanhao Zhang, Haoxiang Chen, Hui Chen, Wenbo Tang, Haonan Lu, Pengfei Xu, Zhenyu Yang, Jungong Han, et al. | http://arxiv.org/pdf/2408.07422v1 | null |
2024-08-14 | Automated Retinal Image Analysis and Medical Report Generation through Deep Learning | 通过深度学习自动分析视网膜图像并生成医疗报告 | Jia-Hong Huang | http://arxiv.org/pdf/2408.07349v1 | null |
2024-08-14 | Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion | 通过基于排名的混合训练和多模态融合增强视觉问答 | Peiyuan Chen, Zecheng Zhang, Yiping Dong, Li Zhou, Han Wang | http://arxiv.org/pdf/2408.07303v1 | null |
2024-08-14 | Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM | 观察与理解:通过 ChemVLM 将视觉与化学知识联系起来 | Junxian Li, Di Zhang, Xunzhi Wang, Zeying Hao, Jingdi Lei, Qian Tan, Cai Zhou, Wei Liu, Weiyun Wang, Zhe Chen, et al. | http://arxiv.org/pdf/2408.07246v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-14 | Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space | 重新思考三维空间中辐射场的开放词汇分割 | Hyunjee Lee, Youngsik Yun, Jeongmin Bae, Seoha Kim, Youngjung Uh | http://arxiv.org/pdf/2408.07416v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-14 | Progressive Radiance Distillation for Inverse Rendering with Gaussian Splatting | 利用高斯溅射进行逆向渲染的渐进式辐射度蒸馏 | Keyang Ye, Qiming Hou, Kun Zhou | http://arxiv.org/pdf/2408.07595v1 | null |
2024-08-14 | 3D Gaussian Editing with A Single Image | 使用单幅图像进行 3D 高斯编辑 | Guan Luo, Tian-Xing Xu, Ying-Tian Liu, Xiao-Xiong Fan, Fang-Lue Zhang, Song-Hai Zhang | http://arxiv.org/pdf/2408.07540v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-14 | Knowledge Distillation with Refined Logits | 利用精炼 Logits 进行知识蒸馏 | Wujie Sun, Defang Chen, Siwei Lyu, Genlang Chen, Chun Chen, Can Wang | http://arxiv.org/pdf/2408.07703v1 | link |
2024-08-14 | Towards Real-time Video Compressive Sensing on Mobile Devices | 面向移动设备的实时视频压缩感知 | Miao Cao, Lishun Wang, Huan Wang, Guoqing Wang, Xin Yuan | http://arxiv.org/pdf/2408.07530v1 | link |
2024-08-14 | Infra-YOLO: Efficient Neural Network Structure with Model Compression for Real-Time Infrared Small Object Detection | Infra-YOLO:具有模型压缩的高效神经网络结构,用于实时红外小物体检测 | Zhonglin Chen, Anyu Geng, Jianan Jiang, Jiwu Lu, Di Wu | http://arxiv.org/pdf/2408.07455v1 | null |
2024-08-14 | Leveraging Perceptual Scores for Dataset Pruning in Computer Vision Tasks | 利用感知分数进行计算机视觉任务中的数据集修剪 | Raghavendra Singh | http://arxiv.org/pdf/2408.07243v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-14 | A Spitting Image: Modular Superpixel Tokenization in Vision Transformers | 一模一样:Vision Transformers 中的模块化超像素标记化 | Marius Aasan, Odd Kolbjørnsen, Anne Schistad Solberg, Adín Ramirez Rivera | http://arxiv.org/pdf/2408.07680v1 | link |
2024-08-14 | See It All: Contextualized Late Aggregation for 3D Dense Captioning | 了解全部内容:3D 密集字幕的上下文化后期聚合 | Minjung Kim, Hyung Suk Lim, Seung Hwan Kim, Soonyoung Lee, Bumsoo Kim, Gunhee Kim | http://arxiv.org/pdf/2408.07648v1 | null |
2024-08-14 | Boosting Unconstrained Face Recognition with Targeted Style Adversary | 利用有针对性的风格对手提高无约束人脸识别率 | Mohammad Saeed Ebrahimi Saadabadi, Sahar Rahimi Malakshan, Seyed Rasoul Hosseini, Nasser M. Nasrabadi | http://arxiv.org/pdf/2408.07642v1 | null |
2024-08-14 | Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving | Panacea+:用于自动驾驶的全景可控视频生成 | Yuqing Wen, Yucheng Zhao, Yingfei Liu, Binyuan Huang, Fan Jia, Yanhui Wang, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang | http://arxiv.org/pdf/2408.07605v1 | null |
2024-08-14 | Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey | 用于高效入侵检测系统的 Transformer 和大型语言模型:全面调查 | Hamza Kheddar | http://arxiv.org/pdf/2408.07583v1 | null |
2024-08-14 | MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation | MetaSeg:基于 MetaFormer 的全局上下文感知网络,用于高效语义分割 | Beoungwoo Kang, Seunghun Moon, Yubin Cho, Hyunwoo Yu, Suk-Ju Kang | http://arxiv.org/pdf/2408.07576v1 | link |
2024-08-14 | Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation | 用于参考图像分割的具有阶段划分视觉和语言变换编码器的交叉感知早期融合 | Yubin Cho, Hyunwoo Yu, Suk-ju Kang | http://arxiv.org/pdf/2408.07539v1 | null |
2024-08-14 | Improved 3D Whole Heart Geometry from Sparse CMR Slices | 通过稀疏 CMR 切片改进 3D 全心脏几何结构 | Yiyang Xu, Hao Xu, Matthew Sinclair, Esther Puyol-Antón, Steven A Niederer, Amedeo Chiribiri, Steven E Williams, Michelle C Williams, Alistair A Young | http://arxiv.org/pdf/2408.07532v1 | link |
2024-08-14 | Attention-Guided Perturbation for Unsupervised Image Anomaly Detection | 用于无监督图像异常检测的注意力引导扰动 | Tingfeng Huang, Yuxuan Cheng, Jingbo Xia, Rui Yu, Yuxuan Cai, Jinhai Xiang, Xinwei He, Xiang Bai | http://arxiv.org/pdf/2408.07490v1 | null |
2024-08-14 | OMR: Occlusion-Aware Memory-Based Refinement for Video Lane Detection | OMR:基于遮挡感知记忆的视频车道检测细化方法 | Dongkwon Jin, Chang-Su Kim | http://arxiv.org/pdf/2408.07486v1 | null |
2024-08-14 | Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification | 通过任意分割模型进行领域不变表征学习以实现血细胞分类 | Yongcheng Li, Lingcong Cai, Ying Lu, Cheng Lin, Yupeng Zhang, Jingyan Jiang, Genan Dai, Bowen Zhang, Jingzhou Cao, Xiangzhong Zhang, et al. | http://arxiv.org/pdf/2408.07467v1 | link |
2024-08-14 | Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark | 使用拓扑引导可变形曼巴模型进行肋软骨分割:方法与基准 | Senmao Wang, Haifan Gong, Runmeng Cui, Boyao Wan, Yicheng Liu, Zhonglin Hu, Haiqing Yang, Jingyang Zhou, Bo Pan, Lin Lin, et al. | http://arxiv.org/pdf/2408.07444v1 | null |
2024-08-14 | MagicFace: Training-free Universal-Style Human Image Customized Synthesis | MagicFace:无需训练的通用风格人像定制合成 | Yibin Wang, Weizhong Zhang, Cheng Jin | http://arxiv.org/pdf/2408.07433v1 | null |
2024-08-14 | UAHOI: Uncertainty-aware Robust Interaction Learning for HOI Detection | UAHOI:用于 HOI 检测的不确定性感知稳健交互学习 | Mu Chen, Minghan Chen, Yi Yang | http://arxiv.org/pdf/2408.07430v1 | null |
2024-08-14 | Segment Using Just One Example | 仅使用一个示例进行分割 | Pratik Vora, Sudipan Saha | http://arxiv.org/pdf/2408.07393v1 | null |
2024-08-14 | RTAT: A Robust Two-stage Association Tracker for Multi-Object Tracking | RTAT:用于多目标跟踪的鲁棒两阶段关联跟踪器 | Song Guo, Rujie Liu, Narishige Abe | http://arxiv.org/pdf/2408.07344v1 | null |
2024-08-14 | Gradient Alignment Improves Test-Time Adaptation for Medical Image Segmentation | 梯度对齐可改善医学图像分割的测试时间适应性 | Ziyang Chen, Yiwen Ye, Yongsheng Pan, Yong Xia | http://arxiv.org/pdf/2408.07343v1 | null |
2024-08-14 | Image-Based Leopard Seal Recognition: Approaches and Challenges in Current Automated Systems | 基于图像的豹海豹识别:当前自动化系统中的方法和挑战 | Jorge Yero Salazar, Pablo Rivas, Renato Borras-Chavez, Sarah Kienle | http://arxiv.org/pdf/2408.07269v1 | null |
2024-08-14 | Lesion-aware network for diabetic retinopathy diagnosis | 用于糖尿病视网膜病变诊断的病变感知网络 | Xue Xia, Kun Zhan, Yuming Fang, Wenhui Jiang, Fei Shen | http://arxiv.org/pdf/2408.07264v1 | link |
2024-08-14 | Ensemble architecture in polyp segmentation | 息肉分割中的集成架构 | Hao-Yun Hsu, Yi-Ching Cheng, Guan-Hua Huang | http://arxiv.org/pdf/2408.07262v1 | link |
2024-08-14 | GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval | GQE:增强文本视频检索的通用查询扩展 | Zechen Bai, Tianjun Xiao, Tong He, Pichao Wang, Zheng Zhang, Thomas Brox, Mike Zheng Shou | http://arxiv.org/pdf/2408.07249v1 | null |
2024-08-14 | Sign language recognition based on deep learning and low-cost handcrafted descriptors | 基于深度学习和低成本手工描述符的手语识别 | Alvaro Leandro Cavalcante Carneiro, Denis Henrique Pinheiro Salvadeo, Lucas de Brito Silva | http://arxiv.org/pdf/2408.07244v1 | null |
2024-08-14 | Enhancing Autonomous Vehicle Perception in Adverse Weather through Image Augmentation during Semantic Segmentation Training | 通过语义分割训练期间的图像增强功能增强自动驾驶汽车在恶劣天气下的感知能力 | Ethan Kou, Noah Curran | http://arxiv.org/pdf/2408.07239v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-14 | Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling | 利用几何建模增强单目内窥镜场景的尺度感知深度估计 | Ruofeng Wei, Bin Li, Kai Chen, Yiyao Ma, Yunhui Liu, Qi Dou | http://arxiv.org/pdf/2408.07266v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-14 | G$^2$V$^2$former: Graph Guided Video Vision Transformer for Face Anti-Spoofing | G$^2$V$^2$former:用于人脸反欺骗的图形引导视频视觉转换器 | Jingyi Yang, Zitong Yu, Xiuming Ni, Jia He, Hui Li | http://arxiv.org/pdf/2408.07675v1 | null |
2024-08-14 | Sonic: Fast and Transferable Data Poisoning on Clustering Algorithms | Sonic:聚类算法中快速且可转移的数据中毒 | Francesco Villani, Dario Lazzaro, Antonio Emanuele Cinà, Matteo Dell'Amico, Battista Biggio, Fabio Roli | http://arxiv.org/pdf/2408.07558v1 | null |
2024-08-14 | DIffSteISR: Harnessing Diffusion Prior for Superior Real-world Stereo Image Super-Resolution | DIffSteISR:利用扩散先验实现卓越的真实世界立体图像超分辨率 | Yuanbo Zhou, Xinlin Zhang, Wei Deng, Tao Wang, Tao Tan, Qinquan Gao, Tong Tong | http://arxiv.org/pdf/2408.07516v1 | null |
2024-08-14 | CNN-JEPA: Self-Supervised Pretraining Convolutional Neural Networks Using Joint Embedding Predictive Architecture | CNN-JEPA:使用联合嵌入预测架构的自监督预训练卷积神经网络 | András Kalapos, Bálint Gyires-Tóth | http://arxiv.org/pdf/2408.07514v1 | null |
2024-08-14 | GRFormer: Grouped Residual Self-Attention for Lightweight Single Image Super-Resolution | GRFormer:用于轻量级单图像超分辨率的分组残差自注意力 | Yuzhen Li, Zehang Deng, Yuxin Cao, Lihua Liu | http://arxiv.org/pdf/2408.07484v1 | link |
2024-08-14 | Unsupervised Stereo Matching Network For VHR Remote Sensing Images Based On Error Prediction | 基于误差预测的 VHR 遥感图像无监督立体匹配网络 | Liting Jiang, Yuming Xiang, Feng Wang, Hongjian You | http://arxiv.org/pdf/2408.07419v1 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-14 | RSD-DOG : A New Image Descriptor based on Second Order Derivatives | RSD-DOG:一种基于二阶导数的新型图像描述符 | Darshan Venkatrayappa, Philippe Montesinos, Daniel Diep, Baptiste Magnier | http://arxiv.org/pdf/2408.07687v1 | null |
2024-08-14 | Rethinking the Key Factors for the Generalization of Remote Sensing Stereo Matching Networks | 遥感立体匹配网络推广关键因素的再思考 | Liting Jiang, Feng Wang, Wenyi Zhang, Peifeng Li, Hongjian You, Yuming Xiang | http://arxiv.org/pdf/2408.07613v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-14 | Evidential Graph Contrastive Alignment for Source-Free Blending-Target Domain Adaptation | 用于无源混合目标域自适应的证据图对比对齐 | Juepeng Zheng, Yibin Wen, Jinxiao Zhang, Runmin Dong, Haohuan Fu | http://arxiv.org/pdf/2408.07527v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-14 | Disentangle and denoise: Tackling context misalignment for video moment retrieval | 解开并去噪:解决视频时刻检索的上下文错位问题 | Kaijing Ma, Han Fang, Xianghao Zang, Chao Ban, Lanxiang Zhou, Zhongjiang He, Yongxiang Li, Hao Sun, Zerun Feng, Xingsong Hou | http://arxiv.org/pdf/2408.07600v1 | null |
2024-08-14 | Whitening Consistently Improves Self-Supervised Learning | 白化持续改善自监督学习 | András Kalapos, Bálint Gyires-Tóth | http://arxiv.org/pdf/2408.07519v1 | null |
2024-08-14 | Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach | 跨平台视频行人重识别:新的基准数据集和自适应方法 | Shizhou Zhang, Wenlong Luo, De Cheng, Qingchun Yang, Lingyan Ran, Yinghui Xing, Yanning Zhang | http://arxiv.org/pdf/2408.07500v1 | link |
2024-08-14 | BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning | BAPLe:利用提示学习对医学基础模型进行后门攻击 | Asif Hanif, Fahad Shamshad, Muhammad Awais, Muzammal Naseer, Fahad Shahbaz Khan, Karthik Nandakumar, Salman Khan, Rao Muhammad Anwer | http://arxiv.org/pdf/2408.07440v1 | null |
2024-08-14 | Achieving Data Efficient Neural Networks with Hybrid Concept-based Models | 利用混合概念模型实现数据高效的神经网络 | Tobias A. Opsahl, Vegard Antun | http://arxiv.org/pdf/2408.07438v1 | link |