Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-26 | Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | 将视觉和语义特征空间与扩散模型统一起来,以增强跨模态对齐 | Yuze Zheng, Zixuan Li, Xiangxian Li, Jinxing Liu, Yuqing Wang, Xiangxu Meng, Lei Meng | http://arxiv.org/pdf/2407.18854v1 | null |
2024-07-26 | Scalable Group Choreography via Variational Phase Manifold Learning | 通过变分相流形学习实现可扩展的群组编舞 | Nhat Le, Khoa Do, Xuan Bui, Tuong Do, Erman Tjiputra, Quang D. Tran, Anh Nguyen | http://arxiv.org/pdf/2407.18839v1 | null |
2024-07-26 | Adversarial Robustification via Text-to-Image Diffusion Models | 通过文本到图像扩散模型实现对抗鲁棒性 | Daewon Choi, Jongheon Jeong, Huiwon Jang, Jinwoo Shin | http://arxiv.org/pdf/2407.18658v1 | null |
2024-07-26 | Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner | Auto DragGAN:以自回归方式编辑生成图像流形 | Pengxiang Cai, Zhiwei Liu, Guibo Zhu, Yunfang Niu, Jinqiao Wang | http://arxiv.org/pdf/2407.18656v1 | null |
2024-07-26 | From 2D to 3D: AISG-SLA Visual Localization Challenge | 从 2D 到 3D:AISG-SLA 视觉定位挑战赛 | Jialin Gao, Bill Ong, Darld Lwi, Zhen Hao Ng, Xun Wei Yee, Mun-Thye Mak, Wee Siong Ng, See-Kiong Ng, Hui Ying Teo, Victor Khoo, et al. | http://arxiv.org/pdf/2407.18590v1 | null |
2024-07-26 | How To Segment in 3D Using 2D Models: Automated 3D Segmentation of Prostate Cancer Metastatic Lesions on PET Volumes Using Multi-Angle Maximum Intensity Projections and Diffusion Models | 如何使用 2D 模型进行 3D 分割:使用多角度最大强度投影和扩散模型自动对 PET 体积上的前列腺癌转移性病变进行 3D 分割 | Amirhosein Toosi, Sara Harsini, François Bénard, Carlos Uribe, Arman Rahmim | http://arxiv.org/pdf/2407.18555v1 | null |
2024-07-26 | Answerability Fields: Answerable Location Estimation via Diffusion Models | 可回答性领域:通过扩散模型估计可回答的位置 | Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Motoaki Kawanabe | http://arxiv.org/pdf/2407.18497v1 | null |
2024-07-26 | Lensless fiber endomicroscopic phase imaging with speckle-conditioned diffusion model | 采用散斑条件扩散模型的无透镜光纤内窥镜相位成像 | Zhaoqing Chen, Jiawei Sun, Xinyi Ye, Bin Zhao, Xuelong Li | http://arxiv.org/pdf/2407.18456v1 | null |
2024-07-26 | Textile Anomaly Detection: Evaluation of the State-of-the-Art for Automated Quality Inspection of Carpet | 纺织品异常检测:地毯自动质量检测最新技术的评估 | Briony Forsberg, Dr Henry Williams, Prof Bruce MacDonald, Tracy Chen, Dr Kirstine Hulse | http://arxiv.org/pdf/2407.18450v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-26 | BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation | BCTR:用于场景图生成的双向调节变压器 | Peng Hao, Xiaobing Wang, Yingying Jiang, Hanchao Jia, Xiaoshuai Hao | http://arxiv.org/pdf/2407.18715v1 | null |
2024-07-26 | Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models | 每一个部分都很重要:基于多模态大型语言模型的科学图形完整性验证 | Xiang Shi, Jiawei Liu, Yinpeng Liu, Qikai Cheng, Wei Lu | http://arxiv.org/pdf/2407.18626v1 | null |
2024-07-26 | HICEScore: A Hierarchical Metric for Image Captioning Evaluation | HICEScore:图像字幕评估的分层指标 | Zequn Zeng, Jianqiao Sun, Hao Zhang, Tiansheng Wen, Yudi Su, Yan Xie, Zhengjue Wang, Bo Chen | http://arxiv.org/pdf/2407.18589v1 | null |
2024-07-26 | Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention | 使用音频视频转换器融合和交叉注意实现多模态情感识别 | Joe Dhanith P R, Shravan Venkatraman, Vigya Sharma, Santhosh Malarvannan | http://arxiv.org/pdf/2407.18552v1 | null |
2024-07-26 | Text-Region Matching for Multi-Label Image Recognition with Missing Labels | 用于缺失标签的多标签图像识别的文本区域匹配 | Leilei Ma, Hongxing Xie, Lei Wang, Yanping Fu, Dengdi Sun, Haifeng Zhao | http://arxiv.org/pdf/2407.18520v1 | null |
2024-07-26 | A Progressive Single-Modality to Multi-Modality Classification Framework for Alzheimer's Disease Sub-type Diagnosis | 用于阿尔茨海默病亚型诊断的渐进式单模态到多模态分类框架 | Yuxiao Liu, Mianxin Liu, Yuanwang Zhang, Kaicong Sun, Dinggang Shen | http://arxiv.org/pdf/2407.18466v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-26 | IOVS4NeRF: Incremental Optimal View Selection for Large-Scale NeRFs | IOVS4NeRF:大规模 NeRF 的增量式最优视图选择 | Jingpeng Xie, Shiyu Tan, Yuanlei Wang, Yizhen Lao | http://arxiv.org/pdf/2407.18611v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-26 | LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement | LinguaLinker:具有隐式面部控制增强功能的音频驱动肖像动画 | Rui Zhang, Yixiao Fang, Zhengnan Lu, Pei Cheng, Zebiao Huang, Bin Fu | http://arxiv.org/pdf/2407.18595v1 | null |
2024-07-26 | Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers | 通过从 2D Transformer 中提取关系先验来提升跨域点分类能力 | Longkun Zou, Wanru Zhu, Ke Chen, Lihua Guo, Kailing Guo, Kui Jia, Yaowei Wang | http://arxiv.org/pdf/2407.18534v1 | null |
2024-07-26 | Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation | 通过统一知识提炼实现可推广的病理学基础模型 | Jiabo Ma, Zhengrui Guo, Fengtao Zhou, Yihui Wang, Yingxue Xu, Yu Cai, Zhengjie Zhu, Cheng Jin, Yi Lin, Xinrui Jiang, Anjia Han, et al. | http://arxiv.org/pdf/2407.18449v1 | null |
2024-07-26 | Mixed Non-linear Quantization for Vision Transformers | 视觉变换器的混合非线性量化 | Gihwan Kim, Jemin Lee, Sihyeong Park, Yongin Kwon, Hyungshin Kim | http://arxiv.org/pdf/2407.18437v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-26 | SHIC: Shape-Image Correspondences with no Keypoint Supervision | SHIC:无关键点监督的形状图像对应关系 | Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi | http://arxiv.org/pdf/2407.18907v1 | null |
2024-07-26 | A Scalable Quantum Non-local Neural Network for Image Classification | 用于图像分类的可扩展量子非局部神经网络 | Sparsh Gupta, Debanjan Konar, Vaneet Aggarwal | http://arxiv.org/pdf/2407.18906v1 | null |
2024-07-26 | A Survey on Cell Nuclei Instance Segmentation and Classification: Leveraging Context and Attention | 细胞核实例分割与分类综述:利用上下文和注意力 | João D. Nunes, Diana Montezuma, Domingos Oliveira, Tania Pereira, Jaime S. Cardoso | http://arxiv.org/pdf/2407.18673v1 | null |
2024-07-26 | Local Binary Pattern (LBP) Optimization for Feature Extraction | 用于特征提取的局部二值模式 (LBP) 优化 | Zeinab Sedaghatjoo, Hossein Hosseinzadeh, Bahram Sadeghi Bigham | http://arxiv.org/pdf/2407.18665v1 | null |
2024-07-26 | MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition | MOoSE:面向开放场景文本识别的多方向共享专家 | Chang Liu, Simon Corbillé, Elisa H Barney Smith | http://arxiv.org/pdf/2407.18616v1 | null |
2024-07-26 | LookupForensics: A Large-Scale Multi-Task Dataset for Multi-Phase Image-Based Fact Verification | LookupForensics:用于多阶段基于图像的事实验证的大规模多任务数据集 | Shuhan Cui, Huy H. Nguyen, Trung-Nghia Le, Chun-Shien Lu, Isao Echizen | http://arxiv.org/pdf/2407.18614v1 | null |
2024-07-26 | Content-driven Magnitude-Derivative Spectrum Complementary Learning for Hyperspectral Image Classification | 内容驱动的幅度-导数光谱互补学习用于高光谱图像分类 | Huiyan Bai, Tingfa Xu, Huan Chen, Peifu Liu, Jianan Li | http://arxiv.org/pdf/2407.18593v1 | null |
2024-07-26 | Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation | 学习领域广义语义分割的谱分解标记 | Jingjun Yi, Qi Bi, Hao Zheng, Haolan Zhan, Wei Ji, Yawen Huang, Yuexiang Li, Yefeng Zheng | http://arxiv.org/pdf/2407.18568v1 | null |
2024-07-26 | VSSD: Vision Mamba with Non-Causal State Space Duality | VSSD:具有非因果状态空间对偶性的 Vision Mamba | Yuheng Shi, Minjing Dong, Mingjia Li, Chang Xu | http://arxiv.org/pdf/2407.18559v1 | null |
2024-07-26 | Skin Cancer Detection utilizing Deep Learning: Classification of Skin Lesion Images using a Vision Transformer | 利用深度学习检测皮肤癌:使用 Vision Transformer 对皮肤病变图像进行分类 | Carolin Flosdorf, Justin Engelker, Igor Keller, Nicolas Mohr | http://arxiv.org/pdf/2407.18554v1 | null |
2024-07-26 | Neural Modulation Alteration to Positive and Negative Emotions in Depressed Patients: Insights from fMRI Using Positive/Negative Emotion Atlas | 抑郁症患者积极和消极情绪的神经调节改变:使用积极/消极情绪图谱从 fMRI 获得的见解 | Yu Feng, Weiming Zeng, Yifan Xie, Hongyu Chen, Lei Wang, Yingying Wang, Hongjie Yan, Kaile Zhang, Ran Tao, Wai Ting Siok, et al. | http://arxiv.org/pdf/2407.18492v1 | null |
2024-07-26 | SMPISD-MTPNet: Scene Semantic Prior-Assisted Infrared Ship Detection Using Multi-Task Perception Networks | SMPISD-MTPNet:使用多任务感知网络的场景语义先验辅助红外船舶检测 | Chen Hu, Xiaogang Dong, Yian Huang, Lele Wang, Liang Xu, Tian Pu, Zhenming Peng | http://arxiv.org/pdf/2407.18487v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-26 | She Works, He Works: A Curious Exploration of Gender Bias in AI-Generated Imagery | 她工作,他工作:对人工智能生成图像中性别偏见的好奇探索 | Amalia Foka | http://arxiv.org/pdf/2407.18524v1 | null |
2024-07-26 | HybridDepth: Robust Depth Fusion for Mobile AR by Leveraging Depth from Focus and Single-Image Priors | HybridDepth:利用焦点深度和单图像先验实现移动 AR 的稳健深度融合 | Ashkan Ganj, Hang Su, Tian Guo | http://arxiv.org/pdf/2407.18443v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-26 | Wolf: Captioning Everything with a World Summarization Framework | Wolf:用世界概括框架为一切添加字幕 | Boyi Li, Ligeng Zhu, Ran Tian, Shuhan Tan, Yuxiao Chen, Yao Lu, Yin Cui, Sushant Veer, Max Ehrlich, Jonah Philion, et al. | http://arxiv.org/pdf/2407.18908v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-26 | Deep Companion Learning: Enhancing Generalization Through Historical Consistency | 深度同伴学习:通过历史一致性增强泛化能力 | Ruizhao Zhu, Venkatesh Saligrama | http://arxiv.org/pdf/2407.18821v1 | null |
2024-07-26 | Dilated Strip Attention Network for Image Restoration | 用于图像恢复的扩张条带注意网络 | Fangwei Hao, Jiesheng Wu, Ji Du, Yinjie Wang, Jing Xu | http://arxiv.org/pdf/2407.18613v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-26 | Floating No More: Object-Ground Reconstruction from a Single Image | 不再漂浮:从单幅图像重建物体-地面 | Yunze Man, Yichen Sheng, Jianming Zhang, Liang-Yan Gui, Yu-Xiong Wang | http://arxiv.org/pdf/2407.18914v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-26 | DynamicTrack: Advancing Gigapixel Tracking in Crowded Scenes | DynamicTrack:在拥挤场景中推进千兆像素跟踪 | Yunqi Zhao, Yuchen Guo, Zheng Cao, Kai Ni, Ruqi Huang, Lu Fang | http://arxiv.org/pdf/2407.18637v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-07-26 | HRP: Human Affordances for Robotic Pre-Training | HRP:机器人预训练的人类可承受性 | Mohan Kumar Srirama, Sudeep Dasari, Shikhar Bahl, Abhinav Gupta | http://arxiv.org/pdf/2407.18911v1 | null |
2024-07-26 | Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence | 从所学中学习:通过对比采样和视觉暂留实现无源主动域自适应 | Mengyao Lyu, Tianxiang Hao, Xinhao Xu, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding | http://arxiv.org/pdf/2407.18899v1 | null |
2024-07-26 | Benchmarking Dependence Measures to Prevent Shortcut Learning in Medical Imaging | 基准依赖性测量以防止医学成像中的捷径学习 | Sarah Müller, Louisa Fay, Lisa M. Koch, Sergios Gatidis, Thomas Küstner, Philipp Berens | http://arxiv.org/pdf/2407.18792v1 | null |
2024-07-26 | PIV3CAMS: a multi-camera dataset for multiple computer vision problems and its application to novel view-point synthesis | PIV3CAMS:用于多种计算机视觉问题的多摄像机数据集及其在新型视点合成中的应用 | Sohyeong Kim, Martin Danelljan, Radu Timofte, Luc Van Gool, Jean-Philippe Thiran | http://arxiv.org/pdf/2407.18695v1 | null |
2024-07-26 | Rapid Object Annotation | 快速对象注释 | Misha Denil | http://arxiv.org/pdf/2407.18682v1 | null |
2024-07-26 | A Labeled Ophthalmic Ultrasound Dataset with Medical Report Generation Based on Cross-modal Deep Learning | 基于跨模态深度学习的带标记眼科超声数据集及医学报告生成 | Jing Wang, Junyan Fan, Meng Zhou, Yanzhu Zhang, Mingyu Shi | http://arxiv.org/pdf/2407.18667v1 | null |
2024-07-26 | Learning to Enhance Aperture Phasor Field for Non-Line-of-Sight Imaging | 学习增强非视距成像的孔径相量场 | In Cho, Hyunbo Shim, Seon Joo Kim | http://arxiv.org/pdf/2407.18574v1 | null |
2024-07-26 | Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations | 重新审视事件生成模型:使用隐式神经表征进行事件到视频重建的自监督学习 | Zipeng Wang, Yunfan Lu, Lin Wang | http://arxiv.org/pdf/2407.18500v1 | null |