Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-30 | Matting by Generation | 通过生成实现抠图 | Zhixiang Wang, Baiang Li, Jian Wang, Yu-Lun Liu, Jinwei Gu, Yung-Yu Chuang, Shin'ichi Satoh | http://arxiv.org/pdf/2407.21017v1 | null |
2024-07-30 | Add-SD: Rational Generation without Manual Reference | Add-SD:无需人工参考的合理生成 | Lingfeng Yang, Xinyu Zhang, Xiang Li, Jinwen Chen, Kun Yao, Gang Zhang, Errui Ding, Lingqiao Liu, Jingdong Wang, Jian Yang | http://arxiv.org/pdf/2407.21016v1 | link |
2024-07-30 | dopanim: A Dataset of Doppelganger Animals with Noisy Annotations from Multiple Humans | dopanim:来自多个人类的带噪声注释的替身动物数据集 | Marek Herde, Denis Huseljic, Lukas Rauch, Bernhard Sick | http://arxiv.org/pdf/2407.20950v1 | null |
2024-07-30 | Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | 通过以对象为中心的体素化和神经渲染实现动态场景理解 | Yanpeng Zhao, Yiwei Hao, Siyu Gao, Yunbo Wang, Xiaokang Yang | http://arxiv.org/pdf/2407.20908v1 | link |
2024-07-30 | Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks | 人工智能生成的图像检测中的漏洞:对抗性攻击的挑战 | Yunfeng Diao, Naixin Zhai, Changtao Miao, Xun Yang, Meng Wang | http://arxiv.org/pdf/2407.20836v1 | null |
2024-07-30 | SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models | SynthVLM:视觉语言模型的高效高质量合成数据 | Zheng Liu, Hao Liang, Wentao Xiong, Qinhan Yu, Conghui He, Bin Cui, Wentao Zhang | http://arxiv.org/pdf/2407.20756v1 | link |
2024-07-30 | Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks | 可转移对抗攻击的提示驱动对比学习 | Hunmin Yang, Jongoh Jeong, Kuk-Jin Yoon | http://arxiv.org/pdf/2407.20657v1 | null |
2024-07-30 | FACL-Attack: Frequency-Aware Contrastive Learning for Transferable Adversarial Attacks | FACL-Attack:可转移对抗攻击的频率感知对比学习 | Hunmin Yang, Jongoh Jeong, Kuk-Jin Yoon | http://arxiv.org/pdf/2407.20653v1 | null |
2024-07-30 | EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos | EgoSonics:为无声的自我中心视频生成同步音频 | Aashish Rai, Srinath Sridhar | http://arxiv.org/pdf/2407.20592v1 | null |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-30 | Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection | Evolver:进化链推动大型多模态模型实现仇恨模因检测 | Jinfa Huang, Jinsheng Pan, Zhongwei Wan, Hanjia Lyu, Jiebo Luo | http://arxiv.org/pdf/2407.21004v1 | null |
2024-07-30 | MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions | MMTrail:具有语言和音乐描述的多模态预告片视频数据集 | Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, et al. | http://arxiv.org/pdf/2407.20962v1 | link |
2024-07-30 | UniProcessor: A Text-induced Unified Low-level Image Processor | UniProcessor:文本驱动的统一低级图像处理器 | Huiyu Duan, Xiongkuo Min, Sijing Wu, Wei Shen, Guangtao Zhai | http://arxiv.org/pdf/2407.20928v1 | link |
2024-07-30 | Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks | 贝叶斯低秩学习(Bella):贝叶斯神经网络的实用方法 | Bao Gia Doan, Afshar Shamsi, Xiao-Yu Guo, Arash Mohammadi, Hamid Alinejad-Rokny, Dino Sejdinovic, Damith C. Ranasinghe, Ehsan Abbasnejad | http://arxiv.org/pdf/2407.20891v1 | null |
2024-07-30 | Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy | 采用全像素覆盖采样和训练策略实现高效无参考 4K 视频质量评估 | Xiaoheng Tan, Jiabin Zhang, Yuhui Quan, Jing Li, Yajing Wu, Zilin Bian | http://arxiv.org/pdf/2407.20766v1 | null |
2024-07-30 | Boosting Audio Visual Question Answering via Key Semantic-Aware Cues | 通过关键语义感知线索提升音频视觉问答能力 | Guangyao Li, Henghui Du, Di Hu | http://arxiv.org/pdf/2407.20693v1 | link |
2024-07-30 | Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos | 有效利用 CLIP 生成图像和视频的情景摘要 | Dhruv Verma, Debaditya Roy, Basura Fernando | http://arxiv.org/pdf/2407.20642v1 | null |
2024-07-30 | Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering | 金字塔编码器:用于组合视觉问答的分层代码生成器 | Ruoyue Shen, Nakamasa Inoue, Koichi Shinoda | http://arxiv.org/pdf/2407.20563v1 | null |
2024-07-30 | Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate | 通过多智能体辩论解释和缓解 MLLM 中的幻觉 | Zheng Lin, Zhenxing Niu, Zhibin Wang, Yinghui Xu | http://arxiv.org/pdf/2407.20505v1 | link |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-30 | A Comparative Study of Neural Surface Reconstruction for Scientific Visualization | 神经表面重建在科学可视化中的比较研究 | Siyuan Yao, Weixi Song, Chaoli Wang | http://arxiv.org/pdf/2407.20868v1 | null |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-30 | PIXELMOD: Improving Soft Moderation of Visual Misleading Information on Twitter | PIXELMOD:改进 Twitter 上视觉误导信息的软审核 | Pujan Paudel, Chen Ling, Jeremy Blackburn, Gianluca Stringhini | http://arxiv.org/pdf/2407.20987v1 | link |
2024-07-30 | Learning Ordinality in Semantic Segmentation | 语义分割中的序数学习 | Rafael Cristino, Ricardo P. M. Cruz, Jaime S. Cardoso | http://arxiv.org/pdf/2407.20959v1 | null |
2024-07-30 | SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition | SSPA:使用门控对齐进行分割与合成提示,实现多标签图像识别 | Hao Tan, Zichang Tan, Jun Li, Jun Wan, Zhen Lei, Stan Z. Li | http://arxiv.org/pdf/2407.20920v1 | null |
2024-07-30 | What is YOLOv5: A deep look into the internal features of the popular object detector | 什么是 YOLOv5:深入了解流行物体检测器的内部特性 | Rahima Khanam, Muhammad Hussain | http://arxiv.org/pdf/2407.20892v1 | null |
2024-07-30 | NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding | NIS-SLAM:用于 3D 一致场景理解的神经隐式语义 RGB-D SLAM | Hongjia Zhai, Gan Huang, Qirui Hu, Guanglin Li, Hujun Bao, Guofeng Zhang | http://arxiv.org/pdf/2407.20853v1 | null |
2024-07-30 | DFE-IANet: A Method for Polyp Image Classification Based on Dual-domain Feature Extraction and Interaction Attention | DFE-IANet:一种基于双域特征提取和交互注意的息肉图像分类方法 | Wei Wang, Jixing He, Xin Wang | http://arxiv.org/pdf/2407.20843v1 | null |
2024-07-30 | WARM-3D: A Weakly-Supervised Sim2Real Domain Adaptation Framework for Roadside Monocular 3D Object Detection | WARM-3D:用于路边单目 3D 物体检测的弱监督 Sim2Real 域自适应框架 | Xingcheng Zhou, Deyu Fu, Walter Zimmer, Mingyu Liu, Venkatnarayanan Lakshminarasimhan, Leah Strand, Alois C. Knoll | http://arxiv.org/pdf/2407.20818v1 | null |
2024-07-30 | Neural Fields for Continuous Periodic Motion Estimation in 4D Cardiovascular Imaging | 4D 心血管成像中连续周期运动估计的神经场 | Simone Garzia, Patryk Rygiel, Sven Dummer, Filippo Cademartiri, Simona Celi, Jelmer M. Wolterink | http://arxiv.org/pdf/2407.20728v1 | null |
2024-07-30 | Time Series Anomaly Detection with CNN for Environmental Sensors in Healthcare-IoT | 使用 CNN 对医疗物联网中的环境传感器进行时间序列异常检测 | Mirza Akhi Khatun, Mangolika Bhattacharya, Ciarán Eising, Lubna Luxmi Dhirani | http://arxiv.org/pdf/2407.20695v1 | null |
2024-07-30 | 3D-GRES: Generalized 3D Referring Expression Segmentation | 3D-GRES:广义 3D 指称表达分割 | Changli Wu, Yihang Liu, Jiayi Ji, Yiwei Ma, Haowei Wang, Gen Luo, Henghui Ding, Xiaoshuai Sun, Rongrong Ji | http://arxiv.org/pdf/2407.20664v1 | null |
2024-07-30 | DocXPand-25k: a large and diverse benchmark dataset for identity documents analysis | DocXPand-25k:用于身份证件分析的大型多样化基准数据集 | Julien Lerouge, Guillaume Betmont, Thomas Bres, Evgeny Stepankevich, Alexis Bergès | http://arxiv.org/pdf/2407.20662v1 | link |
2024-07-30 | Spiking-DD: Neuromorphic Event Camera based Driver Distraction Detection with Spiking Neural Network | Spiking-DD:基于神经形态事件相机和脉冲神经网络的驾驶员分心检测 | Waseem Shariff, Paul Kielty, Joseph Lemley, Peter Corcoran | http://arxiv.org/pdf/2407.20633v1 | null |
2024-07-30 | SharkTrack: an accurate, generalisable software for streamlining shark and ray underwater video analysis | SharkTrack:一款精确、通用的软件,可简化鲨鱼和鳐鱼水下视频分析 | Filippo Varini, Francesco Ferretti, Jeremy Jenrette, Joel H. Gayford, Mark E. Bond, Matthew J. Witt, Michael R. Heithaus, Sophie Wilday, Ben Glocker | http://arxiv.org/pdf/2407.20623v1 | null |
2024-07-30 | Knowledge Fused Recognition: Fusing Hierarchical Knowledge for Image Recognition through Quantitative Relativity Modeling and Deep Metric Learning | 知识融合识别:通过定量相对性建模和深度度量学习融合分层知识进行图像识别 | Yunfeng Zhao, Huiyu Zhou, Fei Wu, Xifeng Wu | http://arxiv.org/pdf/2407.20600v1 | null |
2024-07-30 | Image-based Detection of Segment Misalignment in Multi-mirror Satellites using Transfer Learning | 使用迁移学习进行基于图像的多镜卫星段错位检测 | C. Tanner Fredieu, Jonathan Tesch, Andrew Kee, David Redding | http://arxiv.org/pdf/2407.20582v1 | null |
2024-07-30 | Markers Identification for Relative Pose Estimation of an Uncooperative Target | 不合作目标相对姿态估计的标记识别 | Batu Candan, Simone Servadio | http://arxiv.org/pdf/2407.20515v1 | null |
2024-07-30 | Enhancing Quantitative Image Synthesis through Pretraining and Resolution Scaling for Bone Mineral Density Estimation from a Plain X-ray Image | 通过预训练和分辨率缩放增强定量图像合成,以便从普通 X 射线图像中估计骨矿物质密度 | Yi Gu, Yoshito Otake, Keisuke Uemura, Masaki Takao, Mazen Soufi, Seiji Okada, Nobuhiko Sugano, Hugues Talbot, Yoshinobu Sato | http://arxiv.org/pdf/2407.20495v1 | null |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-30 | SceneTeller: Language-to-3D Scene Generation | SceneTeller:语言到 3D 场景生成 | Başak Melis Öcal, Maxim Tatarchenko, Sezer Karaoglu, Theo Gevers | http://arxiv.org/pdf/2407.20727v1 | null |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-30 | From Feature Importance to Natural Language Explanations Using LLMs with RAG | 使用 RAG 的 LLM 从特征重要性到自然语言解释 | Sule Tekkesinoglu, Lars Kunze | http://arxiv.org/pdf/2407.20990v1 | null |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-30 | XHand: Real-time Expressive Hand Avatar | XHand:实时高表现力手部化身 | Qijun Gan, Zijie Zhou, Jianke Zhu | http://arxiv.org/pdf/2407.21002v1 | link |
2024-07-30 | EAR: Edge-Aware Reconstruction of 3-D vertebrae structures from bi-planar X-ray images | EAR:根据双平面 X 射线图像对 3-D 椎骨结构进行边缘感知重建 | Lixing Tan, Shuang Song, Yaofeng He, Kangneng Zhou, Tong Lu, Ruoxiu Xiao | http://arxiv.org/pdf/2407.20937v1 | null |
2024-07-30 | DeTurb: Atmospheric Turbulence Mitigation with Deformable 3D Convolutions and 3D Swin Transformers | DeTurb:利用可变形 3D 卷积和 3D Swin Transformers 缓解大气湍流 | Zhicheng Zou, Nantheera Anantrasirichai | http://arxiv.org/pdf/2407.20855v1 | null |
2024-07-30 | SpotFormer: Multi-Scale Spatio-Temporal Transformer for Facial Expression Spotting | SpotFormer:用于面部表情定位的多尺度时空 Transformer | Yicheng Deng, Hideaki Hayashi, Hajime Nagahara | http://arxiv.org/pdf/2407.20799v1 | null |
2024-07-30 | Benchmarking Histopathology Foundation Models for Ovarian Cancer Bevacizumab Treatment Response Prediction from Whole Slide Images | 根据全切片图像对卵巢癌贝伐单抗治疗反应进行预测的组织病理学基础模型进行基准测试 | Mayur Mallya, Ali Khajegili Mirabadi, Hossein Farahani, Ali Bashashati | http://arxiv.org/pdf/2407.20596v1 | null |
2024-07-30 | HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation | HandDAGT:用于 3D 手部姿态估计的去噪自适应图 Transformer | Wencan Cheng, Eunji Kim, Jong Hwan Ko | http://arxiv.org/pdf/2407.20542v1 | link |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-30 | Mean of Means: A 10-dollar Solution for Human Localization with Calibration-free and Unconstrained Camera Settings | 均值的均值:无需校准、不受相机设置约束的人体定位 10 美元解决方案 | Tianyi Zhang, Wengyu Zhang, Xulu Zhang, Jiaxin Wu, Xiao-Yong Wei, Jiannong Cao, Qing Li | http://arxiv.org/pdf/2407.20870v1 | null |
2024-07-30 | Autogenic Language Embedding for Coherent Point Tracking | 用于相干点跟踪的自生语言嵌入 | Zikai Song, Ying Tang, Run Luo, Lintao Ma, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang | http://arxiv.org/pdf/2407.20730v1 | link |
2024-07-30 | Monocular Human-Object Reconstruction in the Wild | 自然场景下的单目人体-物体重建 | Chaofan Huo, Ye Shi, Jingya Wang | http://arxiv.org/pdf/2407.20566v1 | link |
2024-07-30 | StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset | StackFLOW:通过带偏移的堆叠标准化流进行单目人-物重建 | Chaofan Huo, Ye Shi, Yuexin Ma, Lan Xu, Jingyi Yu, Jingya Wang | http://arxiv.org/pdf/2407.20545v1 | link |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-30 | CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning | CLEFT:具有高效大型语言模型和提示微调的语言-图像对比学习 | Yuexi Du, Brian Chang, Nicha C. Dvornek | http://arxiv.org/pdf/2407.21011v1 | link |
2024-07-30 | S3PET: Semi-supervised Standard-dose PET Image Reconstruction via Dose-aware Token Swap | S3PET:通过剂量感知令牌交换进行半监督标准剂量 PET 图像重建 | Jiaqi Cui, Pinxian Zeng, Yuanyuan Xu, Xi Wu, Jiliu Zhou, Yan Wang | http://arxiv.org/pdf/2407.20878v1 | null |
2024-07-30 | PIP: Prototypes-Injected Prompt for Federated Class Incremental Learning | PIP:联邦类增量学习的原型注入提示 | Muhammad Anwar Ma'sum, Mahardhika Pratama, Savitha Ramasamy, Lin Liu, Habibullah Habibullah, Ryszard Kowalczyk | http://arxiv.org/pdf/2407.20705v1 | link |

Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-30 | GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | GABInsight:探索视觉语言模型中的性别-活动绑定偏见 | Ali Abdollahi, Mahdi Ghaznavi, Mohammad Reza Karimi Nejad, Arash Mari Oriyad, Reza Abbasi, Ali Salesi, Melika Behjati, Mohammad Hossein Rohban, Mahdieh Soleymani Baghshah | http://arxiv.org/pdf/2407.21001v1 | null |
2024-07-30 | How to Choose a Reinforcement-Learning Algorithm | 如何选择强化学习算法 | Fabian Bongratz, Vladimir Golkov, Lukas Mautner, Luca Della Libera, Frederik Heetmeyer, Felix Czaja, Julian Rodemann, Daniel Cremers | http://arxiv.org/pdf/2407.20917v1 | null |
2024-07-30 | Automatic Die Studies for Ancient Numismatics | 古代钱币的自动模具研究 | Clément Cornet, Héloïse Aumaître, Romaric Besançon, Julien Olivier, Thomas Faucher, Hervé Le Borgne | http://arxiv.org/pdf/2407.20876v1 | null |
2024-07-30 | A Comparative Analysis of YOLOv5, YOLOv8, and YOLOv10 in Kitchen Safety | YOLOv5、YOLOv8、YOLOv10 在厨房安全方面的对比分析 | Athulya Sundaresan Geetha, Muhammad Hussain | http://arxiv.org/pdf/2407.20872v1 | null |
2024-07-30 | Assessing Graphical Perception of Image Embedding Models using Channel Effectiveness | 使用通道有效性评估图像嵌入模型的图形感知 | Soohyun Lee, Minsuk Chang, Seokhyeon Park, Jinwook Seo | http://arxiv.org/pdf/2407.20845v1 | null |
2024-07-30 | Federated Knowledge Recycling: Privacy-Preserving Synthetic Data Sharing | 联邦知识回收:保护隐私的合成数据共享 | Eugenio Lomurno, Matteo Matteucci | http://arxiv.org/pdf/2407.20830v1 | null |
2024-07-30 | Re-localization acceleration with Medoid Silhouette Clustering | 使用 Medoid Silhouette Clustering 加速重新定位 | Hongyi Zhang, Walterio Mayol-Cuevas | http://arxiv.org/pdf/2407.20749v1 | null |
2024-07-30 | Scene-Specific Trajectory Sets: Maximizing Representation in Motion Forecasting | 场景特定的轨迹集:最大化运动预测中的表征 | Abhishek Vivekanandan, J. Marius Zöllner | http://arxiv.org/pdf/2407.20732v1 | null |
2024-07-30 | What makes for good morphology representations for spatial omics? | 什么构成了空间组学的良好形态表征? | Eduard Chelebian, Christophe Avenel, Carolina Wählby | http://arxiv.org/pdf/2407.20660v1 | null |
2024-07-30 | Image Re-Identification: Where Self-supervision Meets Vision-Language Learning | 图像重新识别:自我监督与视觉语言学习的结合 | Bin Wang, Yuying Liang, Lei Cai, Huakun Huang, Huanqiang Zeng | http://arxiv.org/pdf/2407.20647v1 | link |
2024-07-30 | Generalizing AI-driven Assessment of Immunohistochemistry across Immunostains and Cancer Types: A Universal Immunohistochemistry Analyzer | 推广人工智能驱动的免疫组织化学评估,涵盖免疫染色和癌症类型:通用免疫组织化学分析仪 | Biagio Brattoli, Mohammad Mostafavi, Taebum Lee, Wonkyung Jung, Jeongun Ryu, Seonwook Park, Jongchan Park, Sergio Pereira, Seunghwan Shin, Sangjoon Choi, et al. | http://arxiv.org/pdf/2407.20643v1 | null |
2024-07-30 | High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE | 使用 HisToSGE 从组织学图像中获取高分辨率空间转录组学 | Zhiceng Shi, Shuailin Xue, Fangfang Zhu, Wenwen Min | http://arxiv.org/pdf/2407.20518v1 | link |
2024-07-30 | Restoring Real-World Degraded Events Improves Deblurring Quality | 恢复现实世界中退化的事件可提高去模糊质量 | Yeqing Shen, Shang Li, Kun Song | http://arxiv.org/pdf/2407.20502v1 | link |