
2024-07-26.md

[UPDATED!] 2024-07-26 (Publish Time)

Generative Models

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-07-26 | Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Yuze Zheng, Zixuan Li, Xiangxian Li, Jinxing Liu, Yuqing Wang, Xiangxu Meng, Lei Meng | http://arxiv.org/pdf/2407.18854v1 | null |
| 2024-07-26 | Scalable Group Choreography via Variational Phase Manifold Learning | Nhat Le, Khoa Do, Xuan Bui, Tuong Do, Erman Tjiputra, Quang D. Tran, Anh Nguyen | http://arxiv.org/pdf/2407.18839v1 | null |
| 2024-07-26 | Adversarial Robustification via Text-to-Image Diffusion Models | Daewon Choi, Jongheon Jeong, Huiwon Jang, Jinwoo Shin | http://arxiv.org/pdf/2407.18658v1 | null |
| 2024-07-26 | Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner | Pengxiang Cai, Zhiwei Liu, Guibo Zhu, Yunfang Niu, Jinqiao Wang | http://arxiv.org/pdf/2407.18656v1 | null |
| 2024-07-26 | From 2D to 3D: AISG-SLA Visual Localization Challenge | Jialin Gao, Bill Ong, Darld Lwi, Zhen Hao Ng, Xun Wei Yee, Mun-Thye Mak, Wee Siong Ng, See-Kiong Ng, Hui Ying Teo, Victor Khoo, et al. | http://arxiv.org/pdf/2407.18590v1 | null |
| 2024-07-26 | How To Segment in 3D Using 2D Models: Automated 3D Segmentation of Prostate Cancer Metastatic Lesions on PET Volumes Using Multi-Angle Maximum Intensity Projections and Diffusion Models | Amirhosein Toosi, Sara Harsini, François Bénard, Carlos Uribe, Arman Rahmim | http://arxiv.org/pdf/2407.18555v1 | null |
| 2024-07-26 | Answerability Fields: Answerable Location Estimation via Diffusion Models | Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Motoaki Kawanabe | http://arxiv.org/pdf/2407.18497v1 | null |
| 2024-07-26 | Lensless fiber endomicroscopic phase imaging with speckle-conditioned diffusion model | Zhaoqing Chen, Jiawei Sun, Xinyi Ye, Bin Zhao, Xuelong Li | http://arxiv.org/pdf/2407.18456v1 | null |
| 2024-07-26 | Textile Anomaly Detection: Evaluation of the State-of-the-Art for Automated Quality Inspection of Carpet | Briony Forsberg, Dr Henry Williams, Prof Bruce MacDonald, Tracy Chen, Dr Kirstine Hulse | http://arxiv.org/pdf/2407.18450v1 | null |

Multimodal

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-07-26 | BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation | Peng Hao, Xiaobing Wang, Yingying Jiang, Hanchao Jia, Xiaoshuai Hao | http://arxiv.org/pdf/2407.18715v1 | null |
| 2024-07-26 | Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models | Xiang Shi, Jiawei Liu, Yinpeng Liu, Qikai Cheng, Wei Lu | http://arxiv.org/pdf/2407.18626v1 | null |
| 2024-07-26 | HICEScore: A Hierarchical Metric for Image Captioning Evaluation | Zequn Zeng, Jianqiao Sun, Hao Zhang, Tiansheng Wen, Yudi Su, Yan Xie, Zhengjue Wang, Bo Chen | http://arxiv.org/pdf/2407.18589v1 | null |
| 2024-07-26 | Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention | Joe Dhanith P R, Shravan Venkatraman, Vigya Sharma, Santhosh Malarvannan | http://arxiv.org/pdf/2407.18552v1 | null |
| 2024-07-26 | Text-Region Matching for Multi-Label Image Recognition with Missing Labels | Leilei Ma, Hongxing Xie, Lei Wang, Yanping Fu, Dengdi Sun, Haifeng Zhao | http://arxiv.org/pdf/2407.18520v1 | null |
| 2024-07-26 | A Progressive Single-Modality to Multi-Modality Classification Framework for Alzheimer's Disease Sub-type Diagnosis | Yuxiao Liu, Mianxin Liu, Yuanwang Zhang, Kaicong Sun, Dinggang Shen | http://arxiv.org/pdf/2407.18466v1 | null |

NeRF

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-07-26 | IOVS4NeRF:Incremental Optimal View Selection for Large-Scale NeRFs | Jingpeng Xie, Shiyu Tan, Yuanlei Wang, Yizhen Lao | http://arxiv.org/pdf/2407.18611v1 | null |

Model Compression/Optimization

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-07-26 | LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement | Rui Zhang, Yixiao Fang, Zhengnan Lu, Pei Cheng, Zebiao Huang, Bin Fu | http://arxiv.org/pdf/2407.18595v1 | null |
| 2024-07-26 | Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers | Longkun Zou, Wanru Zhu, Ke Chen, Lihua Guo, Kailing Guo, Kui Jia, Yaowei Wang | http://arxiv.org/pdf/2407.18534v1 | null |
| 2024-07-26 | Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation | Jiabo Ma, Zhengrui Guo, Fengtao Zhou, Yihui Wang, Yingxue Xu, Yu Cai, Zhengjie Zhu, Cheng Jin, Yi Lin, Xinrui Jiang, Anjia Han, et al. | http://arxiv.org/pdf/2407.18449v1 | null |
| 2024-07-26 | Mixed Non-linear Quantization for Vision Transformers | Gihwan Kim, Jemin Lee, Sihyeong Park, Yongin Kwon, Hyungshin Kim | http://arxiv.org/pdf/2407.18437v1 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-07-26 | SHIC: Shape-Image Correspondences with no Keypoint Supervision | Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi | http://arxiv.org/pdf/2407.18907v1 | null |
| 2024-07-26 | A Scalable Quantum Non-local Neural Network for Image Classification | Sparsh Gupta, Debanjan Konar, Vaneet Aggarwal | http://arxiv.org/pdf/2407.18906v1 | null |
| 2024-07-26 | A Survey on Cell Nuclei Instance Segmentation and Classification: Leveraging Context and Attention | João D. Nunes, Diana Montezuma, Domingos Oliveira, Tania Pereira, Jaime S. Cardoso | http://arxiv.org/pdf/2407.18673v1 | null |
| 2024-07-26 | Local Binary Pattern(LBP) Optimization for Feature Extraction | Zeinab Sedaghatjoo, Hossein Hosseinzadeh, Bahram Sadeghi Bigham | http://arxiv.org/pdf/2407.18665v1 | null |
| 2024-07-26 | MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition | Chang Liu, Simon Corbillé, Elisa H Barney Smith | http://arxiv.org/pdf/2407.18616v1 | null |
| 2024-07-26 | LookupForensics: A Large-Scale Multi-Task Dataset for Multi-Phase Image-Based Fact Verification | Shuhan Cui, Huy H. Nguyen, Trung-Nghia Le, Chun-Shien Lu, Isao Echizen | http://arxiv.org/pdf/2407.18614v1 | null |
| 2024-07-26 | Content-driven Magnitude-Derivative Spectrum Complementary Learning for Hyperspectral Image Classification | Huiyan Bai, Tingfa Xu, Huan Chen, Peifu Liu, Jianan Li | http://arxiv.org/pdf/2407.18593v1 | null |
| 2024-07-26 | Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation | Jingjun Yi, Qi Bi, Hao Zheng, Haolan Zhan, Wei Ji, Yawen Huang, Yuexiang Li, Yefeng Zheng | http://arxiv.org/pdf/2407.18568v1 | null |
| 2024-07-26 | VSSD: Vision Mamba with Non-Causal State Space Duality | Yuheng Shi, Minjing Dong, Mingjia Li, Chang Xu | http://arxiv.org/pdf/2407.18559v1 | null |
| 2024-07-26 | Skin Cancer Detection utilizing Deep Learning: Classification of Skin Lesion Images using a Vision Transformer | Carolin Flosdorf, Justin Engelker, Igor Keller, Nicolas Mohr | http://arxiv.org/pdf/2407.18554v1 | null |
| 2024-07-26 | Neural Modulation Alteration to Positive and Negative Emotions in Depressed Patients: Insights from fMRI Using Positive/Negative Emotion Atlas | Yu Feng, Weiming Zeng, Yifan Xie, Hongyu Chen, Lei Wang, Yingying Wang, Hongjie Yan, Kaile Zhang, Ran Tao, Wai Ting Siok, et al. | http://arxiv.org/pdf/2407.18492v1 | null |
| 2024-07-26 | SMPISD-MTPNet: Scene Semantic Prior-Assisted Infrared Ship Detection Using Multi-Task Perception Networks | Chen Hu, Xiaogang Dong, Yian Huang, Lele Wang, Liang Xu, Tian Pu, Zhenming Peng | http://arxiv.org/pdf/2407.18487v1 | null |

Image Understanding

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-07-26 | She Works, He Works: A Curious Exploration of Gender Bias in AI-Generated Imagery | Amalia Foka | http://arxiv.org/pdf/2407.18524v1 | null |
| 2024-07-26 | HybridDepth: Robust Depth Fusion for Mobile AR by Leveraging Depth from Focus and Single-Image Priors | Ashkan Ganj, Hang Su, Tian Guo | http://arxiv.org/pdf/2407.18443v1 | null |

LLM

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-07-26 | Wolf: Captioning Everything with a World Summarization Framework | Boyi Li, Ligeng Zhu, Ran Tian, Shuhan Tan, Yuxiao Chen, Yao Lu, Yin Cui, Sushant Veer, Max Ehrlich, Jonah Philion, et al. | http://arxiv.org/pdf/2407.18908v1 | null |

Transformer

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-07-26 | Deep Companion Learning: Enhancing Generalization Through Historical Consistency | Ruizhao Zhu, Venkatesh Saligrama | http://arxiv.org/pdf/2407.18821v1 | null |
| 2024-07-26 | Dilated Strip Attention Network for Image Restoration | Fangwei Hao, Jiesheng Wu, Ji Du, Yinjie Wang, Jing Xu | http://arxiv.org/pdf/2407.18613v1 | null |

3D/CG

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-07-26 | Floating No More: Object-Ground Reconstruction from a Single Image | Yunze Man, Yichen Sheng, Jianming Zhang, Liang-Yan Gui, Yu-Xiong Wang | http://arxiv.org/pdf/2407.18914v1 | null |

Learning Paradigms

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-07-26 | DynamicTrack: Advancing Gigapixel Tracking in Crowded Scenes | Yunqi Zhao, Yuchen Guo, Zheng Cao, Kai Ni, Ruqi Huang, Lu Fang | http://arxiv.org/pdf/2407.18637v1 | null |

Others

| Publish Date | Title | Authors | PDF | Code |
| --- | --- | --- | --- | --- |
| 2024-07-26 | HRP: Human Affordances for Robotic Pre-Training | Mohan Kumar Srirama, Sudeep Dasari, Shikhar Bahl, Abhinav Gupta | http://arxiv.org/pdf/2407.18911v1 | null |
| 2024-07-26 | Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence | Mengyao Lyu, Tianxiang Hao, Xinhao Xu, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding | http://arxiv.org/pdf/2407.18899v1 | null |
| 2024-07-26 | Benchmarking Dependence Measures to Prevent Shortcut Learning in Medical Imaging | Sarah Müller, Louisa Fay, Lisa M. Koch, Sergios Gatidis, Thomas Küstner, Philipp Berens | http://arxiv.org/pdf/2407.18792v1 | null |
| 2024-07-26 | PIV3CAMS: a multi-camera dataset for multiple computer vision problems and its application to novel view-point synthesis | Sohyeong Kim, Martin Danelljan, Radu Timofte, Luc Van Gool, Jean-Philippe Thiran | http://arxiv.org/pdf/2407.18695v1 | null |
| 2024-07-26 | Rapid Object Annotation | Misha Denil | http://arxiv.org/pdf/2407.18682v1 | null |
| 2024-07-26 | A Labeled Ophthalmic Ultrasound Dataset with Medical Report Generation Based on Cross-modal Deep Learning | Jing Wang, Junyan Fan, Meng Zhou, Yanzhu Zhang, Mingyu Shi | http://arxiv.org/pdf/2407.18667v1 | null |
| 2024-07-26 | Learning to Enhance Aperture Phasor Field for Non-Line-of-Sight Imaging | In Cho, Hyunbo Shim, Seon Joo Kim | http://arxiv.org/pdf/2407.18574v1 | null |
| 2024-07-26 | Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations | Zipeng Wang, Yunfan Lu, Lin Wang | http://arxiv.org/pdf/2407.18500v1 | null |