Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-16 | HistoGym: A Reinforcement Learning Environment for Histopathological Image Analysis | HistoGym:用于组织病理学图像分析的强化学习环境 | Zhi-Bo Liu, Xiaobo Pang, Jizhao Wang, Shuai Liu, Chen Li | http://arxiv.org/pdf/2408.08847v1 | link |
2024-08-16 | PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future | PFDiff:通过过去和未来的梯度引导实现无需训练的扩散模型加速 | Guangyi Wang, Yuren Cai, Lijiang Li, Wei Peng, Songzhi Su | http://arxiv.org/pdf/2408.08822v1 | null |
2024-08-16 | Comparative Analysis of Generative Models: Enhancing Image Synthesis with VAEs, GANs, and Stable Diffusion | 生成模型的比较分析:使用 VAE、GAN 和稳定扩散增强图像合成 | Sanchayan Vivekananthan | http://arxiv.org/pdf/2408.08751v1 | null |
2024-08-16 | Beyond the Hype: A dispassionate look at vision-language models in medical scenario | 超越炒作:冷静看待医疗场景中的视觉语言模型 | Yang Nan, Huichi Zhou, Xiaodan Xing, Guang Yang | http://arxiv.org/pdf/2408.08704v1 | null |
2024-08-16 | Modeling the Neonatal Brain Development Using Implicit Neural Representations | 使用隐性神经表征对新生儿大脑发育进行建模 | Florentin Bieder, Paul Friedrich, Hélène Corbaz, Alicia Durrer, Julia Wolleb, Philippe C. Cattin | http://arxiv.org/pdf/2408.08647v1 | null |
2024-08-16 | Generative Dataset Distillation Based on Diffusion Model | 基于扩散模型的生成数据集蒸馏 | Duo Su, Junjie Hou, Guang Li, Ren Togo, Rui Song, Takahiro Ogawa, Miki Haseyama | http://arxiv.org/pdf/2408.08610v1 | link |
2024-08-16 | A New Chinese Landscape Paintings Generation Model based on Stable Diffusion using DreamBooth | 基于 DreamBooth 的稳定扩散中国山水画生成新模型 | Yujia Gu, Xinyu Fang, Xueyuan Deng | http://arxiv.org/pdf/2408.08561v1 | null |
2024-08-16 | Visual-Friendly Concept Protection via Selective Adversarial Perturbations | 通过选择性对抗扰动实现视觉友好概念保护 | Xiaoyue Mi, Fan Tang, Juan Cao, Peng Li, Yang Liu | http://arxiv.org/pdf/2408.08518v1 | link |
2024-08-16 | Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness | 具有对抗鲁棒性的高效图像到图像扩散分类器 | Hefei Mei, Minjing Dong, Chang Xu | http://arxiv.org/pdf/2408.08502v1 | link |
2024-08-16 | Achieving Complex Image Edits via Function Aggregation with Diffusion Models | 通过扩散模型的功能聚合实现复杂的图像编辑 | Mohammadreza Samadi, Fred X. Han, Mohammad Salameh, Hao Wu, Fengyu Sun, Chunhua Zhou, Di Niu | http://arxiv.org/pdf/2408.08495v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-16 | xGen-MM (BLIP-3): A Family of Open Large Multimodal Models | xGen-MM (BLIP-3):开放式大型多模式模型系列 | Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, Yutong Dai, Michael S Ryoo, et al. | http://arxiv.org/pdf/2408.08872v1 | null |
2024-08-16 | RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba | 通过与 Progressive Fusion Mamba 的全层多模态交互实现 RGBT 跟踪 | Andong Lu, Wanyu Wang, Chenglong Li, Jin Tang, Bin Luo | http://arxiv.org/pdf/2408.08827v1 | null |
2024-08-16 | Decoupling Feature Representations of Ego and Other Modalities for Incomplete Multi-modal Brain Tumor Segmentation | 分离自我特征表示和其他模态特征表示以实现不完全多模态脑肿瘤分割 | Kaixiang Yang, Wenqi Shan, Xudong Li, Xuan Wang, Xikai Yang, Xi Wang, Pheng-Ann Heng, Qiang Li, Zhiwei Wang | http://arxiv.org/pdf/2408.08708v1 | link |
2024-08-16 | TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning | TsCA:通过条件传输实现组合零样本学习的语义一致性对齐 | Miaoge Li, Jingcai Guo, Richard Yi Da Xu, Dongsheng Wang, Xiaofeng Cao, Song Guo | http://arxiv.org/pdf/2408.08703v1 | null |
2024-08-16 | A Survey on Benchmarks of Multimodal Large Language Models | 多模态大型语言模型基准调查 | Jian Li, Weiheng Lu | http://arxiv.org/pdf/2408.08632v1 | link |
2024-08-16 | Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs | 告诉编解码器什么值得压缩:使用 LMM 进行机器的语义解缠图像编码 | Jinming Liu, Yuntao Wei, Junyan Lin, Shengyang Zhao, Heming Sun, Zhibo Chen, Wenjun Zeng, Xin Jin | http://arxiv.org/pdf/2408.08575v1 | null |
2024-08-16 | Scaling up Multimodal Pre-training for Sign Language Understanding | 扩大手语理解的多模式预训练 | Wengang Zhou, Weichao Zhao, Hezhen Hu, Zecheng Li, Houqiang Li | http://arxiv.org/pdf/2408.08544v1 | null |
2024-08-16 | Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading | 聚焦焦点:面向焦点的表征学习和多视图跨模态对齐用于胶质瘤分级 | Li Pan, Yupei Zhang, Qiushi Yang, Tan Li, Xiaohan Xing, Maximus C. F. Yeung, Zhen Chen | http://arxiv.org/pdf/2408.08527v1 | link |
2024-08-16 | CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving | CoSEC:用于自动驾驶的同轴立体事件摄像机数据集 | Shihan Peng, Hanyu Zhou, Hao Dong, Zhiwei Shi, Haoyue Liu, Yuxing Duan, Yi Chang, Luxin Yan | http://arxiv.org/pdf/2408.08500v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-16 | VF-NeRF: Learning Neural Vector Fields for Indoor Scene Reconstruction | VF-NeRF:学习神经矢量场用于室内场景重建 | Albert Gassol Puigjaner, Edoardo Mello Rella, Erik Sandström, Ajad Chhatkuli, Luc Van Gool | http://arxiv.org/pdf/2408.08766v1 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-16 | Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS | 用于 NVS 的对应引导无 SfM 3D 高斯溅射 | Wei Sun, Xiaosong Zhang, Fang Wan, Yanzhao Zhou, Yuan Li, Qixiang Ye, Jianbin Jiao | http://arxiv.org/pdf/2408.08723v1 | null |
2024-08-16 | GS-ID: Illumination Decomposition on Gaussian Splatting via Diffusion Prior and Parametric Light Source Optimization | GS-ID:通过扩散先验和参数光源优化实现高斯散射光照分解 | Kang Du, Zhihao Liang, Zeyu Wang | http://arxiv.org/pdf/2408.08524v1 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-16 | SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation | SAM2-UNet:Segment Anything 2 为自然和医学图像分割打造强大的编码器 | Xinyu Xiong, Zihuang Wu, Shuangyi Tan, Wenxue Li, Feilong Tang, Ying Chen, Siying Li, Jie Ma, Guanbin Li | http://arxiv.org/pdf/2408.08870v1 | link |
2024-08-16 | DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models | DPA:视觉语言模型无监督适应的双原型对齐 | Eman Ali, Sathira Silva, Muhammad Haris Khan | http://arxiv.org/pdf/2408.08855v1 | null |
2024-08-16 | Retrieval-augmented Few-shot Medical Image Segmentation with Foundation Models | 使用基础模型进行检索增强的少样本医学图像分割 | Lin Zhao, Xiao Chen, Eric Z. Chen, Yikang Liu, Terrence Chen, Shanhui Sun | http://arxiv.org/pdf/2408.08813v1 | null |
2024-08-16 | A Disease-Specific Foundation Model Using Over 100K Fundus Images: Release and Validation for Abnormality and Multi-Disease Classification on Downstream Tasks | 使用超过 10 万张眼底图像的疾病特定基础模型:下游任务中的异常和多疾病分类的发布和验证 | Boa Jang, Youngbin Ahn, Eun Kyung Choe, Chang Ki Yoon, Hyuk Jin Choi, Young-Gon Kim | http://arxiv.org/pdf/2408.08790v1 | null |
2024-08-16 | Towards Physical World Backdoor Attacks against Skeleton Action Recognition | 针对骨骼动作识别的物理世界后门攻击 | Qichen Zheng, Yi Yu, Siyuan Yang, Jun Liu, Kwok-Yan Lam, Alex Kot | http://arxiv.org/pdf/2408.08671v1 | null |
2024-08-16 | Extracting polygonal footprints in off-nadir images with Segment Anything Model | 使用 Segment Anything 模型提取非地面图像中的多边形足迹 | Kai Li, Jingbo Chen, Yupeng Deng, Yu Meng, Diyou Liu, Junxian Ma, Chenhao Wang | http://arxiv.org/pdf/2408.08645v1 | null |
2024-08-16 | SketchRef: A Benchmark Dataset and Evaluation Metrics for Automated Sketch Synthesis | SketchRef:自动草图合成的基准数据集和评估指标 | Xingyue Lin, Xingjian Hu, Shuai Peng, Jianhua Zhu, Liangcai Gao | http://arxiv.org/pdf/2408.08623v1 | null |
2024-08-16 | MM-UNet: A Mixed MLP Architecture for Improved Ophthalmic Image Segmentation | MM-UNet:一种用于改进眼科图像分割的混合 MLP 架构 | Zunjie Xiao, Xiaoqing Zhang, Risa Higashita, Jiang Liu | http://arxiv.org/pdf/2408.08600v1 | null |
2024-08-16 | Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation | 用于开放词汇 3D 实例分割的零样本双路径集成框架 | Tri Ton, Ji Woo Hong, SooHwan Eom, Jun Yeop Shim, Junyeong Kim, Chang D. Yoo | http://arxiv.org/pdf/2408.08591v1 | null |
2024-08-16 | TAMER: Tree-Aware Transformer for Handwritten Mathematical Expression Recognition | TAMER:用于手写数学表达式识别的树感知变换器 | Jianhua Zhu, Wenqi Zhao, Yu Li, Xingjian Hu, Liangcai Gao | http://arxiv.org/pdf/2408.08578v1 | link |
2024-08-16 | Tuning a SAM-Based Model with Multi-Cognitive Visual Adapter to Remote Sensing Instance Segmentation | 使用多认知视觉适配器调整基于 SAM 的模型以进行遥感实例分割 | Linghao Zheng, Xinyang Pu, Feng Xu | http://arxiv.org/pdf/2408.08576v1 | null |
2024-08-16 | A training regime to learn unified representations from complementary breast imaging modalities | 一种从互补乳腺成像模式中学习统一表征的训练方案 | Umang Sharma, Jungkyu Park, Laura Heacock, Sumit Chopra, Krzysztof Geras | http://arxiv.org/pdf/2408.08560v1 | null |
2024-08-16 | Detection and tracking of MAVs using a LiDAR with rosette scanning pattern | 使用带有玫瑰花扫描模式的 LiDAR 检测和跟踪 MAV | Sándor Gazdag, Tom Möller, Tamás Filep, Anita Keszler, András L. Majdik | http://arxiv.org/pdf/2408.08555v1 | null |
2024-08-16 | Language-Driven Interactive Shadow Detection | 语言驱动的交互式阴影检测 | Hongqiu Wang, Wei Wang, Haipeng Zhou, Huihui Xu, Shaozhi Wu, Lei Zhu | http://arxiv.org/pdf/2408.08543v1 | link |
2024-08-16 | DFT-Based Adversarial Attack Detection in MRI Brain Imaging: Enhancing Diagnostic Accuracy in Alzheimer's Case Studies | 基于 DFT 的 MRI 脑成像对抗性攻击检测:提高阿尔茨海默病病例研究中的诊断准确性 | Mohammad Hossein Najafi, Mohammad Morsali, Mohammadmahdi Vahediahmar, Saeed Bagheri Shouraki | http://arxiv.org/pdf/2408.08489v1 | null |
2024-08-16 | TEXTOC: Text-driven Object-Centric Style Transfer | TEXTOC:文本驱动的以对象为中心的风格转换 | Jihun Park, Jongmin Gim, Kyoungmin Lee, Seunghun Lee, Sunghoon Im | http://arxiv.org/pdf/2408.08461v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-16 | LLM-PCGC: Large Language Model-based Point Cloud Geometry Compression | LLM-PCGC:基于大型语言模型的点云几何压缩 | Yuqi Ye, Wei Gao | http://arxiv.org/pdf/2408.08682v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-16 | PriorMapNet: Enhancing Online Vectorized HD Map Construction with Priors | PriorMapNet:利用 Priors 增强在线矢量化高清地图构建 | Rongxuan Wang, Xin Lu, Xiaoyang Liu, Xiaoyi Zou, Tongyi Cao, Ying Li | http://arxiv.org/pdf/2408.08802v1 | null |
2024-08-16 | PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders | PCP-MAE:学习预测点掩模自动编码器的中心 | Xiangdong Zhang, Shaofeng Zhang, Junchi Yan | http://arxiv.org/pdf/2408.08753v1 | null |
2024-08-16 | Task-Aware Dynamic Transformer for Efficient Arbitrary-Scale Image Super-Resolution | 任务感知动态变换器,用于高效任意尺度图像超分辨率 | Tianyi Xu, Yiji Zhou, Xiaotao Hu, Kai Zhang, Anran Zhang, Xingye Qiu, Jun Xu | http://arxiv.org/pdf/2408.08736v1 | null |
2024-08-16 | HyCoT: Hyperspectral Compression Transformer with an Efficient Training Strategy | HyCoT:具有高效训练策略的高光谱压缩变换器 | Martin Hermann Paul Fuchs, Behnood Rasti, Begüm Demir | http://arxiv.org/pdf/2408.08700v1 | null |
2024-08-16 | Adaptive Layer Selection for Efficient Vision Transformer Fine-Tuning | 自适应层选择,实现高效的视觉变换器微调 | Alessio Devoto, Federico Alvetreti, Jary Pomponi, Paolo Di Lorenzo, Pasquale Minervini, Simone Scardapane | http://arxiv.org/pdf/2408.08670v1 | null |
2024-08-16 | Learning A Low-Level Vision Generalist via Visual Task Prompt | 通过视觉任务提示学习低级视觉通才 | Xiangyu Chen, Yihao Liu, Yuandong Pu, Wenlong Zhang, Jiantao Zhou, Yu Qiao, Chao Dong | http://arxiv.org/pdf/2408.08601v1 | link |
2024-08-16 | EraW-Net: Enhance-Refine-Align W-Net for Scene-Associated Driver Attention Estimation | EraW-Net:增强-细化-对齐 W-Net,用于场景相关驾驶员注意力估计 | Jun Zhou, Chunsheng Liu, Faliang Chang, Wenqian Wang, Penghui Hao, Yiming Huang, Zhiqiang Yang | http://arxiv.org/pdf/2408.08570v1 | null |
2024-08-16 | Unsupervised Non-Rigid Point Cloud Matching through Large Vision Models | 通过大型视觉模型进行无监督非刚性点云匹配 | Zhangquan Chen, Puhua Jiang, Ruqi Huang | http://arxiv.org/pdf/2408.08568v1 | null |
2024-08-16 | S$^3$Attention: Improving Long Sequence Attention with Smoothed Skeleton Sketching | S$^3$Attention:通过平滑骨架草图提高长序列注意力 | Xue Wang, Tian Zhou, Jianqing Zhu, Jialin Liu, Kun Yuan, Tao Yao, Wotao Yin, Rong Jin, HanQin Cai | http://arxiv.org/pdf/2408.08567v1 | null |
2024-08-16 | Privacy-Preserving Vision Transformer Using Images Encrypted with Restricted Random Permutation Matrices | 使用受限随机置换矩阵加密图像的隐私保护视觉转换器 | Kouki Horio, Kiyoshi Nishikawa, Hitoshi Kiya | http://arxiv.org/pdf/2408.08529v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-16 | Multi-task Learning Approach for Intracranial Hemorrhage Prognosis | 颅内出血预后的多任务学习方法 | Miriam Cobo, Amaia Pérez del Barrio, Pablo Menéndez Fernández-Miranda, Pablo Sanz Bellón, Lara Lloret Iglesias, Wilson Silva | http://arxiv.org/pdf/2408.08784v1 | link |
2024-08-16 | QMambaBSR: Burst Image Super-Resolution with Query State Space Model | QMambaBSR:使用查询状态空间模型实现突发图像超分辨率 | Xin Di, Long Peng, Peizhe Xia, Wenbo Li, Renjing Pei, Yang Cao, Yang Wang, Zheng-Jun Zha | http://arxiv.org/pdf/2408.08665v1 | null |
2024-08-16 | Reference-free Axial Super-resolution of 3D Microscopy Images using Implicit Neural Representation with a 2D Diffusion Prior | 使用隐式神经表征和二维扩散先验实现 3D 显微镜图像的无参考轴向超分辨率 | Kyungryun Lee, Won-Ki Jeong | http://arxiv.org/pdf/2408.08616v1 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-16 | Assessing Generalization Capabilities of Malaria Diagnostic Models from Thin Blood Smears | 通过薄血涂片评估疟疾诊断模型的泛化能力 | Louise Guillon, Soheib Biga, Axel Puyo, Grégoire Pasquier, Valentin Foucher, Yendoubé E. Kantchire, Stéphane E. Sossou, Ameyo M. Dorkenoo, Laurent Bonnardot, Marc Thellier, et al. | http://arxiv.org/pdf/2408.08792v1 | null |
2024-08-16 | MicroSSIM: Improved Structural Similarity for Comparing Microscopy Data | MicroSSIM:改进的显微镜数据结构相似性比较方法 | Ashesh Ashesh, Joran Deschamps, Florian Jug | http://arxiv.org/pdf/2408.08747v1 | link |
2024-08-16 | Historical Printed Ornaments: Dataset and Tasks | 历史印刷装饰品:数据集和任务 | Sayan Kumar Chaki, Zeynep Sonat Baltaci, Elliot Vincent, Remi Emonet, Fabienne Vial-Bonacci, Christelle Bahier-Porte, Mathieu Aubry, Thierry Fournel | http://arxiv.org/pdf/2408.08633v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-08-16 | Backward-Compatible Aligned Representations via an Orthogonal Transformation Layer | 通过正交变换层实现向后兼容的对齐表示 | Simone Ricci, Niccolò Biondi, Federico Pernici, Alberto Del Bimbo | http://arxiv.org/pdf/2408.08793v1 | null |
2024-08-16 | A lifted Bregman strategy for training unfolded proximal neural network Gaussian denoisers | 一种用于训练展开近端神经网络高斯降噪器的提升 Bregman 策略 | Xiaoyu Wang, Martin Benning, Audrey Repetti | http://arxiv.org/pdf/2408.08742v1 | null |
2024-08-16 | Bi-Directional Deep Contextual Video Compression | 双向深度上下文视频压缩 | Xihua Sheng, Li Li, Dong Liu, Shiqi Wang | http://arxiv.org/pdf/2408.08604v1 | null |
2024-08-16 | S-RAF: A Simulation-Based Robustness Assessment Framework for Responsible Autonomous Driving | S-RAF:基于模拟的负责任自动驾驶稳健性评估框架 | Daniel Omeiza, Pratik Somaiya, Jo-Ann Pattinson, Carolyn Ten-Holter, Jack Stilgoe, Marina Jirotka, Lars Kunze | http://arxiv.org/pdf/2408.08584v1 | link |