[UPDATED!] 2024-07-30 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-30 | Matting by Generation | 按代数排列 | Zhixiang Wang, Baiang Li, Jian Wang, Yu-Lun Liu, Jinwei Gu, Yung-Yu Chuang, Shin'ichi Satoh | http://arxiv.org/pdf/2407.21017v1 | null |
| 2024-07-30 | Add-SD: Rational Generation without Manual Reference | Add-SD:无手册参考的 Rational 生成 | Lingfeng Yang, Xinyu Zhang, Xiang Li, Jinwen Chen, Kun Yao, Gang Zhang, Errui Ding, Lingqiao Liu, Jingdong Wang, Jian Yang | http://arxiv.org/pdf/2407.21016v1 | link |
| 2024-07-30 | dopanim: A Dataset of Doppelganger Animals with Noisy Annotations from Multiple Humans | dopanim:来自多个人类的带噪声注释的替身动物数据集 | Marek Herde, Denis Huseljic, Lukas Rauch, Bernhard Sick | http://arxiv.org/pdf/2407.20950v1 | null |
| 2024-07-30 | Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | 通过以对象为中心的体素化和神经渲染实现动态场景理解 | Yanpeng Zhao, Yiwei Hao, Siyu Gao, Yunbo Wang, Xiaokang Yang | http://arxiv.org/pdf/2407.20908v1 | link |
| 2024-07-30 | Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks | 人工智能生成的图像检测中的漏洞:对抗性攻击的挑战 | Yunfeng Diao, Naixin Zhai, Changtao Miao, Xun Yang, Meng Wang | http://arxiv.org/pdf/2407.20836v1 | null |
| 2024-07-30 | SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models | SynthVLM:视觉语言模型的高效高质量合成数据 | Zheng Liu, Hao Liang, Wentao Xiong, Qinhan Yu, Conghui He, Bin Cui, Wentao Zhang | http://arxiv.org/pdf/2407.20756v1 | link |
| 2024-07-30 | Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks | 可转移对抗攻击的提示驱动对比学习 | Hunmin Yang, Jongoh Jeong, Kuk-Jin Yoon | http://arxiv.org/pdf/2407.20657v1 | null |
| 2024-07-30 | FACL-Attack: Frequency-Aware Contrastive Learning for Transferable Adversarial Attacks | FACL-Attack:可转移对抗攻击的频率感知对比学习 | Hunmin Yang, Jongoh Jeong, Kuk-Jin Yoon | http://arxiv.org/pdf/2407.20653v1 | null |
| 2024-07-30 | EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos | EgoSonics:为无声的自我中心视频生成同步音频 | Aashish Rai, Srinath Sridhar | http://arxiv.org/pdf/2407.20592v1 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-30 | Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection | Evolver:进化链推动大型多模态模型实现仇恨模因检测 | Jinfa Huang, Jinsheng Pan, Zhongwei Wan, Hanjia Lyu, Jiebo Luo | http://arxiv.org/pdf/2407.21004v1 | null |
| 2024-07-30 | MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions | MMTrail:具有语言和音乐描述的多模式预告片视频数据集 | Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, et al. | http://arxiv.org/pdf/2407.20962v1 | link |
| 2024-07-30 | UniProcessor: A Text-induced Unified Low-level Image Processor | UniProcessor:文本驱动的统一低级图像处理器 | Huiyu Duan, Xiongkuo Min, Sijing Wu, Wei Shen, Guangtao Zhai | http://arxiv.org/pdf/2407.20928v1 | link |
| 2024-07-30 | Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks | 贝叶斯低秩学习(Bella):贝叶斯神经网络的实用方法 | Bao Gia Doan, Afshar Shamsi, Xiao-Yu Guo, Arash Mohammadi, Hamid Alinejad-Rokny, Dino Sejdinovic, Damith C. Ranasinghe, Ehsan Abbasnejad | http://arxiv.org/pdf/2407.20891v1 | null |
| 2024-07-30 | Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy | 采用全像素覆盖采样和训练策略实现高效无参考 4K 视频质量评估 | Xiaoheng Tan, Jiabin Zhang, Yuhui Quan, Jing Li, Yajing Wu, Zilin Bian | http://arxiv.org/pdf/2407.20766v1 | null |
| 2024-07-30 | Boosting Audio Visual Question Answering via Key Semantic-Aware Cues | 通过关键语义感知线索提升音频视觉问答能力 | Guangyao Li, Henghui Du, Di Hu | http://arxiv.org/pdf/2407.20693v1 | link |
| 2024-07-30 | Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos | 有效利用 CLIP 生成图像和视频的情景摘要 | Dhruv Verma, Debaditya Roy, Basura Fernando | http://arxiv.org/pdf/2407.20642v1 | null |
| 2024-07-30 | Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering | 金字塔编码器:用于组合视觉问答的分层代码生成器 | Ruoyue Shen, Nakamasa Inoue, Koichi Shinoda | http://arxiv.org/pdf/2407.20563v1 | null |
| 2024-07-30 | Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate | 通过多主体辩论解释和缓解 MLLM 中的幻觉 | Zheng Lin, Zhenxing Niu, Zhibin Wang, Yinghui Xu | http://arxiv.org/pdf/2407.20505v1 | link |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-30 | A Comparative Study of Neural Surface Reconstruction for Scientific Visualization | 神经表面重建在科学可视化中的比较研究 | Siyuan Yao, Weixi Song, Chaoli Wang | http://arxiv.org/pdf/2407.20868v1 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-30 | PIXELMOD: Improving Soft Moderation of Visual Misleading Information on Twitter | PIXELMOD:改进 Twitter 上视觉误导信息的软审核 | Pujan Paudel, Chen Ling, Jeremy Blackburn, Gianluca Stringhini | http://arxiv.org/pdf/2407.20987v1 | link |
| 2024-07-30 | Learning Ordinality in Semantic Segmentation | 语义分割中的序数学习 | Rafael Cristino, Ricardo P. M. Cruz, Jaime S. Cardoso | http://arxiv.org/pdf/2407.20959v1 | null |
| 2024-07-30 | SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition | SSPA:使用门控对齐进行分割与合成提示,实现多标签图像识别 | Hao Tan, Zichang Tan, Jun Li, Jun Wan, Zhen Lei, Stan Z. Li | http://arxiv.org/pdf/2407.20920v1 | null |
| 2024-07-30 | What is YOLOv5: A deep look into the internal features of the popular object detector | 什么是 YOLOv5:深入了解流行物体检测器的内部特性 | Rahima Khanam, Muhammad Hussain | http://arxiv.org/pdf/2407.20892v1 | null |
| 2024-07-30 | NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding | NIS-SLAM:用于 3D 一致场景理解的神经隐式语义 RGB-D SLAM | Hongjia Zhai, Gan Huang, Qirui Hu, Guanglin Li, Hujun Bao, Guofeng Zhang | http://arxiv.org/pdf/2407.20853v1 | null |
| 2024-07-30 | DFE-IANet: A Method for Polyp Image Classification Based on Dual-domain Feature Extraction and Interaction Attention | DFE-IANet:一种基于双域特征提取和交互注意的息肉图像分类方法 | Wei Wang, Jixing He, Xin Wang | http://arxiv.org/pdf/2407.20843v1 | null |
| 2024-07-30 | WARM-3D: A Weakly-Supervised Sim2Real Domain Adaptation Framework for Roadside Monocular 3D Object Detection | WARM-3D:用于路边单目 3D 物体检测的弱监督 Sim2Real 域自适应框架 | Xingcheng Zhou, Deyu Fu, Walter Zimmer, Mingyu Liu, Venkatnarayanan Lakshminarasimhan, Leah Strand, Alois C. Knoll | http://arxiv.org/pdf/2407.20818v1 | null |
| 2024-07-30 | Neural Fields for Continuous Periodic Motion Estimation in 4D Cardiovascular Imaging | 4D 心血管成像中连续周期运动估计的神经场 | Simone Garzia, Patryk Rygiel, Sven Dummer, Filippo Cademartiri, Simona Celi, Jelmer M. Wolterink | http://arxiv.org/pdf/2407.20728v1 | null |
| 2024-07-30 | Time Series Anomaly Detection with CNN for Environmental Sensors in Healthcare-IoT | 使用 CNN 对医疗物联网中的环境传感器进行时间序列异常检测 | Mirza Akhi Khatun, Mangolika Bhattacharya, Ciarán Eising, Lubna Luxmi Dhirani | http://arxiv.org/pdf/2407.20695v1 | null |
| 2024-07-30 | 3D-GRES: Generalized 3D Referring Expression Segmentation | 3D-GRES:广义 3D 指称表达分割 | Changli Wu, Yihang Liu, Jiayi Ji, Yiwei Ma, Haowei Wang, Gen Luo, Henghui Ding, Xiaoshuai Sun, Rongrong Ji | http://arxiv.org/pdf/2407.20664v1 | null |
| 2024-07-30 | DocXPand-25k: a large and diverse benchmark dataset for identity documents analysis | DocXPand-25k:用于身份证件分析的大型多样化基准数据集 | Julien Lerouge, Guillaume Betmont, Thomas Bres, Evgeny Stepankevich, Alexis Bergès | http://arxiv.org/pdf/2407.20662v1 | link |
| 2024-07-30 | Spiking-DD: Neuromorphic Event Camera based Driver Distraction Detection with Spiking Neural Network | Spiking-DD:基于神经形态事件摄像头的驾驶员分心检测与脉冲神经网络 | Waseem Shariff, Paul Kielty, Joseph Lemley, Peter Corcoran | http://arxiv.org/pdf/2407.20633v1 | null |
| 2024-07-30 | SharkTrack: an accurate, generalisable software for streamlining shark and ray underwater video analysis | SharkTrack:一款精确、通用的软件,可简化鲨鱼和鳐鱼水下视频分析 | Filippo Varini, Francesco Ferretti, Jeremy Jenrette, Joel H. Gayford, Mark E. Bond, Matthew J. Witt, Michael R. Heithaus, Sophie Wilday, Ben Glocker | http://arxiv.org/pdf/2407.20623v1 | null |
| 2024-07-30 | Knowledge Fused Recognition: Fusing Hierarchical Knowledge for Image Recognition through Quantitative Relativity Modeling and Deep Metric Learning | 知识融合识别:通过定量相对论建模和深度度量学习融合分层知识进行图像识别 | Yunfeng Zhao, Huiyu Zhou, Fei Wu, Xifeng Wu | http://arxiv.org/pdf/2407.20600v1 | null |
| 2024-07-30 | Image-based Detection of Segment Misalignment in Multi-mirror Satellites using Transfer Learning | 使用迁移学习进行基于图像的多镜卫星段错位检测 | C. Tanner Fredieu, Jonathan Tesch, Andrew Kee, David Redding | http://arxiv.org/pdf/2407.20582v1 | null |
| 2024-07-30 | Markers Identification for Relative Pose Estimation of an Uncooperative Target | 不合作目标相对姿态估计的标记识别 | Batu Candan, Simone Servadio | http://arxiv.org/pdf/2407.20515v1 | null |
| 2024-07-30 | Enhancing Quantitative Image Synthesis through Pretraining and Resolution Scaling for Bone Mineral Density Estimation from a Plain X-ray Image | 通过预训练和分辨率缩放增强定量图像合成,以便从普通 X 射线图像中估计骨矿物质密度 | Yi Gu, Yoshito Otake, Keisuke Uemura, Masaki Takao, Mazen Soufi, Seiji Okada, Nobuhiko Sugano, Hugues Talbot, Yoshinobu Sato | http://arxiv.org/pdf/2407.20495v1 | null |

OCR

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-30 | SceneTeller: Language-to-3D Scene Generation | SceneTeller:语言到 3D 场景生成 | Başak Melis Öcal, Maxim Tatarchenko, Sezer Karaoglu, Theo Gevers | http://arxiv.org/pdf/2407.20727v1 | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-30 | From Feature Importance to Natural Language Explanations Using LLMs with RAG | 使用 RAG 的 LLM 从特征重要性到自然语言解释 | Sule Tekkesinoglu, Lars Kunze | http://arxiv.org/pdf/2407.20990v1 | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-30 | XHand: Real-time Expressive Hand Avatar | XHand:实时表情手势 | Qijun Gan, Zijie Zhou, Jianke Zhu | http://arxiv.org/pdf/2407.21002v1 | link |
| 2024-07-30 | EAR: Edge-Aware Reconstruction of 3-D vertebrae structures from bi-planar X-ray images | EAR:根据双平面 X 射线图像对 3-D 椎骨结构进行边缘感知重建 | Lixing Tan, Shuang Song, Yaofeng He, Kangneng Zhou, Tong Lu, Ruoxiu Xiao | http://arxiv.org/pdf/2407.20937v1 | null |
| 2024-07-30 | DeTurb: Atmospheric Turbulence Mitigation with Deformable 3D Convolutions and 3D Swin Transformers | DeTurb:利用可变形 3D 卷积和 3D Swin Transformers 缓解大气湍流 | Zhicheng Zou, Nantheera Anantrasirichai | http://arxiv.org/pdf/2407.20855v1 | null |
| 2024-07-30 | SpotFormer: Multi-Scale Spatio-Temporal Transformer for Facial Expression Spotting | SpotFormer:用于面部表情识别的多尺度时空变换器 | Yicheng Deng, Hideaki Hayashi, Hajime Nagahara | http://arxiv.org/pdf/2407.20799v1 | null |
| 2024-07-30 | Benchmarking Histopathology Foundation Models for Ovarian Cancer Bevacizumab Treatment Response Prediction from Whole Slide Images | 根据全切片图像对卵巢癌贝伐单抗治疗反应进行预测的组织病理学基础模型进行基准测试 | Mayur Mallya, Ali Khajegili Mirabadi, Hossein Farahani, Ali Bashashati | http://arxiv.org/pdf/2407.20596v1 | null |
| 2024-07-30 | HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation | HandDAGT:用于 3D 手势估计的去噪自适应图形变换器 | Wencan Cheng, Eunji Kim, Jong Hwan Ko | http://arxiv.org/pdf/2407.20542v1 | link |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-30 | Mean of Means: A 10-dollar Solution for Human Localization with Calibration-free and Unconstrained Camera Settings | 均值法:无需校准、不受相机设置约束的人体定位 10 美元解决方案 | Tianyi Zhang, Wengyu Zhang, Xulu Zhang, Jiaxin Wu, Xiao-Yong Wei, Jiannong Cao, Qing Li | http://arxiv.org/pdf/2407.20870v1 | null |
| 2024-07-30 | Autogenic Language Embedding for Coherent Point Tracking | 用于相干点跟踪的自生语言嵌入 | Zikai Song, Ying Tang, Run Luo, Lintao Ma, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang | http://arxiv.org/pdf/2407.20730v1 | link |
| 2024-07-30 | Monocular Human-Object Reconstruction in the Wild | 户外单目人体-物体重建 | Chaofan Huo, Ye Shi, Jingya Wang | http://arxiv.org/pdf/2407.20566v1 | link |
| 2024-07-30 | StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset | StackFLOW:通过带偏移的堆叠正则化流进行单目人-物重建 | Chaofan Huo, Ye Shi, Yuexin Ma, Lan Xu, Jingyi Yu, Jingya Wang | http://arxiv.org/pdf/2407.20545v1 | link |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-30 | CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning | CLEFT:具有高效大型语言模型和快速微调的语言-图像对比学习 | Yuexi Du, Brian Chang, Nicha C. Dvornek | http://arxiv.org/pdf/2407.21011v1 | link |
| 2024-07-30 | S3PET: Semi-supervised Standard-dose PET Image Reconstruction via Dose-aware Token Swap | S3PET:通过剂量感知令牌交换进行半监督标准剂量 PET 图像重建 | Jiaqi Cui, Pinxian Zeng, Yuanyuan Xu, Xi Wu, Jiliu Zhou, Yan Wang | http://arxiv.org/pdf/2407.20878v1 | null |
| 2024-07-30 | PIP: Prototypes-Injected Prompt for Federated Class Incremental Learning | PIP:联邦类增量学习的原型注入提示 | Muhammad Anwar Ma'sum, Mahardhika Pratama, Savitha Ramasamy, Lin Liu, Habibullah Habibullah, Ryszard Kowalczyk | http://arxiv.org/pdf/2407.20705v1 | link |

Others

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-07-30 | GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | GABInsight:探索视觉语言模型中的性别活动结合偏见 | Ali Abdollahi, Mahdi Ghaznavi, Mohammad Reza Karimi Nejad, Arash Mari Oriyad, Reza Abbasi, Ali Salesi, Melika Behjati, Mohammad Hossein Rohban, Mahdieh Soleymani Baghshah | http://arxiv.org/pdf/2407.21001v1 | null |
| 2024-07-30 | How to Choose a Reinforcement-Learning Algorithm | 如何选择强化学习算法 | Fabian Bongratz, Vladimir Golkov, Lukas Mautner, Luca Della Libera, Frederik Heetmeyer, Felix Czaja, Julian Rodemann, Daniel Cremers | http://arxiv.org/pdf/2407.20917v1 | null |
| 2024-07-30 | Automatic Die Studies for Ancient Numismatics | 古代钱币的自动模具研究 | Clément Cornet, Héloïse Aumaître, Romaric Besançon, Julien Olivier, Thomas Faucher, Hervé Le Borgne | http://arxiv.org/pdf/2407.20876v1 | null |
| 2024-07-30 | A Comparative Analysis of YOLOv5, YOLOv8, and YOLOv10 in Kitchen Safety | YOLOv5、YOLOv8、YOLOv10 在厨房安全方面的对比分析 | Athulya Sundaresan Geetha, Muhammad Hussain | http://arxiv.org/pdf/2407.20872v1 | null |
| 2024-07-30 | Assessing Graphical Perception of Image Embedding Models using Channel Effectiveness | 使用通道有效性评估图像嵌入模型的图形感知 | Soohyun Lee, Minsuk Chang, Seokhyeon Park, Jinwook Seo | http://arxiv.org/pdf/2407.20845v1 | null |
| 2024-07-30 | Federated Knowledge Recycling: Privacy-Preserving Synthetic Data Sharing | 联合知识回收:保护隐私的合成数据共享 | Eugenio Lomurno, Matteo Matteucci | http://arxiv.org/pdf/2407.20830v1 | null |
| 2024-07-30 | Re-localization acceleration with Medoid Silhouette Clustering | 使用 Medoid Silhouette Clustering 加速重新定位 | Hongyi Zhang, Walterio Mayol-Cuevas | http://arxiv.org/pdf/2407.20749v1 | null |
| 2024-07-30 | Scene-Specific Trajectory Sets: Maximizing Representation in Motion Forecasting | 场景特定的轨迹集:最大化运动预测中的表征 | Abhishek Vivekanandan, J. Marius Zöllner | http://arxiv.org/pdf/2407.20732v1 | null |
| 2024-07-30 | What makes for good morphology representations for spatial omics? | 什么构成了空间组学的良好形态表征? | Eduard Chelebian, Christophe Avenel, Carolina Wählby | http://arxiv.org/pdf/2407.20660v1 | null |
| 2024-07-30 | Image Re-Identification: Where Self-supervision Meets Vision-Language Learning | 图像重新识别:自我监督与视觉语言学习的结合 | Bin Wang, Yuying Liang, Lei Cai, Huakun Huang, Huanqiang Zeng | http://arxiv.org/pdf/2407.20647v1 | link |
| 2024-07-30 | Generalizing AI-driven Assessment of Immunohistochemistry across Immunostains and Cancer Types: A Universal Immunohistochemistry Analyzer | 推广人工智能驱动的免疫组织化学评估,涵盖免疫染色和癌症类型:通用免疫组织化学分析仪 | Biagio Brattoli, Mohammad Mostafavi, Taebum Lee, Wonkyung Jung, Jeongun Ryu, Seonwook Park, Jongchan Park, Sergio Pereira, Seunghwan Shin, Sangjoon Choi, et al. | http://arxiv.org/pdf/2407.20643v1 | null |
| 2024-07-30 | High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE | 使用 HisToSGE 从组织学图像中获取高分辨率空间转录组学 | Zhiceng Shi, Shuailin Xue, Fangfang Zhu, Wenwen Min | http://arxiv.org/pdf/2407.20518v1 | link |
| 2024-07-30 | Restoring Real-World Degraded Events Improves Deblurring Quality | 恢复现实世界中退化的事件可提高去模糊质量 | Yeqing Shen, Shang Li, Kun Song | http://arxiv.org/pdf/2407.20502v1 | link |