Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-09-28 | Introducing SDICE: An Index for Assessing Diversity of Synthetic Medical Datasets | SDICE:评估合成医疗数据集多样性的指标研究 | Mohammed Talha Alam, Raza Imam, Mohammad Areeb Qazi, Asim Ukaye, Karthik Nandakumar | http://arxiv.org/pdf/2409.19436v1 | null |
2024-09-28 | Efficient Semantic Diffusion Architectures for Model Training on Synthetic Echocardiograms | 高效语义扩散架构在合成超声心动图模型训练中的应用 | David Stojanovski, Mariana da Silva, Pablo Lamata, Arian Beqiri, Alberto Gomez | http://arxiv.org/pdf/2409.19371v1 | null |
2024-09-28 | Conditional Image Synthesis with Diffusion Models: A Survey | 条件图像合成中的扩散模型:综述 | Zheyuan Zhan, Defang Chen, Jian-Ping Mei, Zhenghe Zhao, Jiawei Chen, Chun Chen, Siwei Lyu, Can Wang | http://arxiv.org/pdf/2409.19365v1 | null |
2024-09-28 | CausalVE: Face Video Privacy Encryption via Causal Video Prediction | CausalVE:基于因果视频预测的人脸视频隐私加密方法 | Yubo Huang, Wenhao Feng, Xin Lai, Zixi Wang, Jingzehua Xu, Shuai Zhang, Hongjie He, Fan Chen | http://arxiv.org/pdf/2409.19306v1 | null |
2024-09-28 | FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models | FINE:分解知识以初始化可变尺寸扩散模型 | Yucheng Xie, Fu Feng, Ruixiao Shi, Jing Wang, Xin Geng | http://arxiv.org/pdf/2409.19289v1 | null |
2024-09-28 | WcDT: World-centric Diffusion Transformer for Traffic Scene Generation | WcDT:面向交通场景生成的以世界为中心的扩散Transformer模型 | Chen Yang, Yangfan He, Aaron Xuxiang Tian, Dong Chen, Tianyu Shi, Arsalan Heydarian | http://arxiv.org/pdf/2404.02082v2 | link |
2024-09-28 | UKnow: A Unified Knowledge Protocol with Multimodal Knowledge Graph Datasets for Reasoning and Vision-Language Pre-Training | UKnow:面向推理与视觉-语言预训练的统一知识协议多模态知识图谱数据集 | Biao Gong, Shuai Tan, Yutong Feng, Xiaoying Xie, Yuyuan Li, Chaochao Chen, Kecheng Zheng, Yujun Shen, Deli Zhao | http://arxiv.org/pdf/2302.06891v4 | null |
2024-09-28 | Three-stage binarization of color document images based on discrete wavelet transform and generative adversarial networks | 基于离散小波变换和生成对抗网络的颜色文档图像三阶段二值化方法 | Rui-Yang Ju, Yu-Shian Lin, Yanlin Jin, Chih-Chia Chen, Chun-Tse Chien, Jen-Shiun Chiang | http://arxiv.org/pdf/2211.16098v8 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-09-28 | MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation | MedCLIP-SAMv2:迈向通用文本驱动的医学图像分割 | Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao | http://arxiv.org/pdf/2409.19483v1 | null |
2024-09-28 | FairPIVARA: Reducing and Assessing Biases in CLIP-Based Multimodal Models | FairPIVARA:降低和评估基于CLIP的多模态模型中的偏见 | Diego A. B. Moreira, Alef Iury Ferreira, Gabriel Oliveira dos Santos, Luiz Pereira, João Medrado Gondim, Gustavo Bonil, Helena Maia, Nádia da Silva, Simone Tiemi Hashiguti, Jefersson A. dos Santos, et al. | http://arxiv.org/pdf/2409.19474v1 | null |
2024-09-28 | Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery | 对比地物图像与遥感预训练提升自然世界图像表征学习 | Andy V. Huynh, Lauren E. Gillespie, Jael Lopez-Saucedo, Claire Tang, Rohan Sikand, Moisés Expósito-Alonso | http://arxiv.org/pdf/2409.19439v1 | null |
2024-09-28 | From Unimodal to Multimodal: Scaling up Projectors to Align Modalities | 从单模态到多模态:扩展投影器以对齐模态 | Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Ankit Singh, Noel E. O'Connor | http://arxiv.org/pdf/2409.19425v1 | null |
2024-09-28 | Multi-sensor Learning Enables Information Transfer across Different Sensory Data and Augments Multi-modality Imaging | 多传感器学习实现不同感官数据间的信息传递并增强多模态成像能力 | Lingting Zhu, Yizheng Chen, Lianli Liu, Lei Xing, Lequan Yu | http://arxiv.org/pdf/2409.19420v1 | null |
2024-09-28 | X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation | X-Prompt:面向视频目标分割的多模态视觉提示方法 | Pinxue Guo, Wanyun Li, Hao Huang, Lingyi Hong, Xinyu Zhou, Zhaoyu Chen, Jinglun Li, Kaixun Jiang, Wei Zhang, Wenqiang Zhang | http://arxiv.org/pdf/2409.19342v1 | null |
2024-09-28 | Visual Question Decomposition on Multimodal Large Language Models | 视觉问题分解在多模态大型语言模型上的研究 | Haowei Zhang, Jianzhe Liu, Zhen Han, Shuo Chen, Bailan He, Volker Tresp, Zhiqiang Xu, Jindong Gu | http://arxiv.org/pdf/2409.19339v1 | null |
2024-09-28 | 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models | 3D-CT-GPT:通过集成大型视觉-语言模型生成三维放射学报告 | Hao Chen, Wei Zhao, Yingli Li, Tianyang Zhong, Yisong Wang, Youlan Shang, Lei Guo, Junwei Han, Tianming Liu, Jun Liu, et al. | http://arxiv.org/pdf/2409.19330v1 | null |
2024-09-28 | CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | CLIP-MoE:构建具有多样化多重回收的CLIP混合专家模型 | Jihai Zhang, Xiaoye Qu, Tong Zhu, Yu Cheng | http://arxiv.org/pdf/2409.19291v1 | null |
2024-09-28 | TrojVLM: Backdoor Attack Against Vision Language Models | TrojVLM:针对视觉语言模型的后门攻击 | Weimin Lyu, Lu Pang, Tengfei Ma, Haibin Ling, Chao Chen | http://arxiv.org/pdf/2409.19232v1 | null |
2024-09-28 | Multimodal-Enhanced Objectness Learner for Corner Case Detection in Autonomous Driving | 多模态增强的目标性学习器在自动驾驶角案例检测中的应用 | Lixing Xiao, Ruixiao Shi, Xiaoyang Tang, Yi Zhou | http://arxiv.org/pdf/2402.02026v2 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-09-28 | G3R: Gradient Guided Generalizable Reconstruction | G3R:梯度引导的可泛化重建方法 | Yun Chen, Jingkang Wang, Ze Yang, Sivabalan Manivasagam, Raquel Urtasun | http://arxiv.org/pdf/2409.19405v1 | null |
2024-09-28 | GeoTransfer : Generalizable Few-Shot Multi-View Reconstruction via Transfer Learning | GeoTransfer:基于迁移学习的可泛化少样本多视角重建方法 | Shubhendu Jena, Franck Multon, Adnane Boukhayma | http://arxiv.org/pdf/2408.14724v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-09-28 | GS-EVT: Cross-Modal Event Camera Tracking based on Gaussian Splatting | GS-EVT:基于高斯泼溅的跨模态事件相机追踪算法 | Tao Liu, Runze Yuan, Yi'ang Ju, Xun Xu, Jiaqi Yang, Xiangting Meng, Xavier Lagorce, Laurent Kneip | http://arxiv.org/pdf/2409.19228v1 | null |
2024-09-28 | 1st Place Solution to the 8th HANDS Workshop Challenge -- ARCTIC Track: 3DGS-based Bimanual Category-agnostic Interaction Reconstruction | 第八届HANDS研讨会挑战赛ARCTIC赛道一等奖解决方案:基于3DGS的双手类别无关交互重建 | Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang, Hyein Hwang, Soohyun Hwang, Junuk Cha, Jaewook Han, Seungryul Baek | http://arxiv.org/pdf/2409.19215v1 | null |
2024-09-28 | SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting | SplatSim:基于高斯泼溅的RGB操作策略零样本Sim2Real迁移 | Mohammad Nomaan Qureshi, Sparsh Garg, Francisco Yandun, David Held, George Kantor, Abhisesh Silwal | http://arxiv.org/pdf/2409.10161v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-09-28 | Mind the Gap: Promoting Missing Modality Brain Tumor Segmentation with Alignment | 填补空白:利用对齐促进缺失模态脑肿瘤分割 | Tianyi Liu, Zhaorui Tan, Haochuan Jiang, Xi Yang, Kaizhu Huang | http://arxiv.org/pdf/2409.19366v1 | null |
2024-09-28 | MOC-RVQ: Multilevel Codebook-Assisted Digital Generative Semantic Communication | MOC-RVQ:多级码本辅助的数字生成语义通信 | Yingbin Zhou, Yaping Sun, Guanying Chen, Xiaodong Xu, Hao Chen, Binhong Huang, Shuguang Cui, Ping Zhang | http://arxiv.org/pdf/2401.01272v2 | link |
2024-09-28 | Adaptive Depth Networks with Skippable Sub-Paths | 自适应深度网络与可跳过子路径 | Woochul Kang, Hyungseop Lee | http://arxiv.org/pdf/2312.16392v3 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-09-28 | Accelerating Malware Classification: A Vision Transformer Solution | 加速恶意软件分类:一种视觉变换器解决方案 | Shrey Bavishi, Shrey Modi | http://arxiv.org/pdf/2409.19461v1 | null |
2024-09-28 | On the universality of neural encodings in CNNs | 神经编码在卷积神经网络中的普遍性研究 | Florentin Guth, Brice Ménard | http://arxiv.org/pdf/2409.19460v1 | null |
2024-09-28 | See Where You Read with Eye Gaze Tracking and Large Language Model | 基于眼动追踪与大型语言模型的可视化阅读位置识别研究 | Sikai Yang, Gang Yan | http://arxiv.org/pdf/2409.19454v1 | null |
2024-09-28 | Canonical Correlation Guided Deep Neural Network | 规范相关引导的深度神经网络 | Zhiwen Chen, Siwen Mo, Haobin Ke, Steven X. Ding, Zhaohui Jiang, Chunhua Yang, Weihua Gui | http://arxiv.org/pdf/2409.19396v1 | null |
2024-09-28 | DOTA: Distributional Test-Time Adaptation of Vision-Language Models | DOTA:视觉语言模型的基于分布的测试时自适应 | Zongbo Han, Jialong Yang, Junfan Li, Qinghua Hu, Qianli Xu, Mike Zheng Shou, Changqing Zhang | http://arxiv.org/pdf/2409.19375v1 | null |
2024-09-28 | MambaEviScrib: Mamba and Evidence-Guided Consistency Make CNN Work Robustly for Scribble-Based Weakly Supervised Ultrasound Image Segmentation | MambaEviScrib:基于Mamba与证据引导一致性的CNN在基于涂鸦弱监督超声图像分割中的稳健工作 | Xiaoxiang Han, Xinyu Li, Jiang Shang, Yiman Liu, Keyan Chen, Qiaohong Liu, Qi Zhang | http://arxiv.org/pdf/2409.19370v1 | null |
2024-09-28 | Sparse Modelling for Feature Learning in High Dimensional Data | 高维数据特征学习的稀疏建模方法 | Harish Neelam, Koushik Sai Veerella, Souradip Biswas | http://arxiv.org/pdf/2409.19361v1 | null |
2024-09-28 | Toward Deep Learning-based Segmentation and Quantitative Analysis of Cervical Spinal Cord Magnetic Resonance Images | 基于深度学习的颈椎脊髓磁共振图像分割与定量分析研究 | Maryam Tavakol Elahi | http://arxiv.org/pdf/2409.19354v1 | null |
2024-09-28 | VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition | VLAD-BuFF:面向视觉地点识别的突发感知快速特征聚合方法 | Ahmad Khaliq, Ming Xu, Stephen Hausler, Michael Milford, Sourav Garg | http://arxiv.org/pdf/2409.19293v1 | link |
2024-09-28 | Beyond Euclidean: Dual-Space Representation Learning for Weakly Supervised Video Violence Detection | 超越欧几里得:双空间表示学习在弱监督视频暴力检测中的应用 | Jiaxu Leng, Zhanjie Wu, Mingpi Tan, Yiran Liu, Ji Gan, Haosheng Chen, Xinbo Gao | http://arxiv.org/pdf/2409.19252v1 | null |
2024-09-28 | Cauchy activation function and XNet | 柯西激活函数与XNet研究 | Xin Li, Zhihong Xia, Hongkun Zhang | http://arxiv.org/pdf/2409.19221v1 | null |
2024-09-28 | Learning to Obstruct Few-Shot Image Classification over Restricted Classes | 学习在受限类别上阻碍少样本图像分类 | Amber Yijia Zheng, Chiao-An Yang, Raymond A. Yeh | http://arxiv.org/pdf/2409.19210v1 | null |
2024-09-28 | TextGaze: Gaze-Controllable Face Generation with Natural Language | TextGaze:基于自然语言的注视可控人脸生成技术 | Hengfei Wang, Zhongqun Zhang, Yihua Cheng, Hyung Jin Chang | http://arxiv.org/pdf/2404.17486v3 | null |
2024-09-28 | Machine Vision-Based Assessment of Fall Color Changes and its Relationship with Leaf Nitrogen Concentration | 基于机器视觉的秋季叶色变化评估及其与叶片氮浓度关系研究 | Achyut Paudel, Jostan Brown, Priyanka Upadhyaya, Atif Bilal Asad, Safal Kshetri, Joseph R. Davidson, Cindy Grimm, Ashley Thompson, Bernardita Sallato, Matthew D. Whiting, et al. | http://arxiv.org/pdf/2404.14653v2 | null |
2024-09-28 | AnyPattern: Towards In-context Image Copy Detection | AnyPattern:迈向上下文内图像复制检测 | Wenhao Wang, Yifan Sun, Zhentao Tan, Yi Yang | http://arxiv.org/pdf/2404.13788v3 | link |
2024-09-28 | RPMArt: Towards Robust Perception and Manipulation for Articulated Objects | RPMArt:面向关节物体的稳健感知与操作研究 | Junbo Wang, Wenhai Liu, Qiaojun Yu, Yang You, Liu Liu, Weiming Wang, Cewu Lu | http://arxiv.org/pdf/2403.16023v2 | link |
2024-09-28 | ProMISe: Promptable Medical Image Segmentation using SAM | ProMISe:基于SAM的可提示医疗图像分割方法 | Jinfeng Wang, Sifan Song, Xinkun Wang, Yiyi Wang, Yiyi Miao, Jionglong Su, S. Kevin Zhou | http://arxiv.org/pdf/2403.04164v3 | link |
2024-09-28 | YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection | 基于有效注意力机制的YOLOv8-AM在小儿腕部骨折检测中的应用 | Chun-Tse Chien, Rui-Yang Ju, Kuang-Yi Chou, Enkaer Xieerke, Jen-Shiun Chiang | http://arxiv.org/pdf/2402.09329v5 | link |
2024-09-28 | Improving Image Coding for Machines through Optimizing Encoder via Auxiliary Loss | 优化辅助损失函数以提升机器图像编码器的性能 | Kei Iino, Shunsuke Akamatsu, Hiroshi Watanabe, Shohei Enomoto, Akira Sakamoto, Takeharu Eda | http://arxiv.org/pdf/2402.08267v2 | null |
2024-09-28 | G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training | G2D:基于视觉-语言预训练的从全局到密集X射线表征学习 | Che Liu, Cheng Ouyang, Sibo Cheng, Anand Shah, Wenjia Bai, Rossella Arcucci | http://arxiv.org/pdf/2312.01522v2 | null |
2024-09-28 | Exploring the Coordination of Frequency and Attention in Masked Image Modeling | 探索遮罩图像建模中频率与注意力的协调机制 | Jie Gui, Tuo Chen, Minjing Dong, Zhengqi Liu, Hao Luo, James Tin-Yau Kwok, Yuan Yan Tang | http://arxiv.org/pdf/2211.15362v3 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-09-28 | Towards Croppable Implicit Neural Representations | 面向可裁剪的隐式神经表示方法 | Maor Ashkenazi, Eran Treister | http://arxiv.org/pdf/2409.19472v1 | link |
2024-09-28 | Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration | 带掩膜的任意物体恢复:利用掩膜图像建模实现盲全合一图像修复 | Chu-Jie Qin, Rui-Qi Wu, Zikun Liu, Xin Lin, Chun-Le Guo, Hyun Hee Park, Chongyi Li | http://arxiv.org/pdf/2409.19403v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-09-28 | DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning | DENEB:一种对图像字幕生成具有幻觉鲁棒性的自动评价指标 | Kazuki Matsuda, Yuiga Wada, Komei Sugiura | http://arxiv.org/pdf/2409.19255v1 | null |
2024-09-28 | Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers | Chat-Scene:利用对象标识符桥接三维场景与大型语言模型 | Haifeng Huang, Yilun Chen, Zehan Wang, Rongjie Huang, Runsen Xu, Tai Wang, Luping Liu, Xize Cheng, Yang Zhao, Jiangmiao Pang, et al. | http://arxiv.org/pdf/2312.08168v4 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-09-28 | Fast Encoding and Decoding for Implicit Video Representation | 快速编码与解码隐式视频表示 | Hao Chen, Saining Xie, Ser-Nam Lim, Abhinav Shrivastava | http://arxiv.org/pdf/2409.19429v1 | null |
2024-09-28 | Steering Prediction via a Multi-Sensor System for Autonomous Racing | 多传感器系统在自动驾驶赛车中的转向预测研究 | Zhuyun Zhou, Zongwei Wu, Florian Bolli, Rémi Boutteau, Fan Yang, Radu Timofte, Dominique Ginhac, Tobi Delbruck | http://arxiv.org/pdf/2409.19356v1 | null |
2024-09-28 | Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization | 揭示视觉Transformer中的良性过拟合:训练动态、收敛性与泛化能力 | Jiarui Jiang, Wei Huang, Miao Zhang, Taiji Suzuki, Liqiang Nie | http://arxiv.org/pdf/2409.19345v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-09-28 | Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024 | 多视角自我中心手部跟踪挑战的解决方案 ECCV2024 | Minqiang Zou, Zhi Lv, Riqiang Jin, Tian Zhan, Mochen Yu, Yao Tang, Jiajun Liang | http://arxiv.org/pdf/2409.19362v1 | null |
2024-09-28 | Scalable Cloud-Native Pipeline for Efficient 3D Model Reconstruction from Monocular Smartphone Images | 可扩展云原生管道:高效从单目智能手机图像重建三维模型 | Potito Aghilar, Vito Walter Anelli, Michelantonio Trizio, Tommaso Di Noia | http://arxiv.org/pdf/2409.19322v1 | null |
2024-09-28 | PDCFNet: Enhancing Underwater Images through Pixel Difference Convolution | PDCFNet:通过像素差分卷积增强水下图像 | Song Zhang, Daoliang Li, Ran Zhao | http://arxiv.org/pdf/2409.19269v1 | link |
2024-09-28 | ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild | ReLoo:从野外单目视频中重建穿着宽松衣物的人类形态 | Chen Guo, Tianjian Jiang, Manuel Kaufmann, Chengwei Zheng, Julien Valentin, Jie Song, Otmar Hilliges | http://arxiv.org/pdf/2409.15269v2 | null |
2024-09-28 | CT-AGRG: Automated Abnormality-Guided Report Generation from 3D Chest CT Volumes | CT-AGRG:基于三维胸部CT体积的自动化异常引导报告生成 | Theo Di Piazza | http://arxiv.org/pdf/2408.11965v3 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-09-28 | MicroSSIM: Improved Structural Similarity for Comparing Microscopy Data | MicroSSIM:用于比较显微数据的改进结构相似性度量 | Ashesh Ashesh, Joran Deschamps, Florian Jug | http://arxiv.org/pdf/2408.08747v2 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-09-28 | Language-guided Robust Navigation for Mobile Robots in Dynamically-changing Environments | 动态环境下语言引导的移动机器人鲁棒导航方法 | Cody Simons, Zhichao Liu, Brandon Marcus, Amit K. Roy-Chowdhury, Konstantinos Karydis | http://arxiv.org/pdf/2409.19459v1 | null |
2024-09-28 | Brain-JEPA: Brain Dynamics Foundation Model with Gradient Positioning and Spatiotemporal Masking | Brain-JEPA:基于梯度定位与时空掩膜的脑动力学基础模型 | Zijian Dong, Ruilin Li, Yilei Wu, Thuan Tinh Nguyen, Joanna Su Xian Chong, Fang Ji, Nathanael Ren Jie Tong, Christopher Li Hsian Chen, Juan Helen Zhou | http://arxiv.org/pdf/2409.19407v1 | null |
2024-09-28 | Projected Tensor-Tensor Products for Efficient Computation of Optimal Multiway Data Representations | 投影张量-张量积:高效计算最优多向数据表示的算法 | Katherine Keegan, Elizabeth Newman | http://arxiv.org/pdf/2409.19402v1 | null |
2024-09-28 | EEPNet: Efficient Edge Pixel-based Matching Network for Cross-Modal Dynamic Registration between LiDAR and Camera | 高效边缘像素匹配网络EEPNet:用于LiDAR与摄像头跨模态动态配准 | Yuanchao Yue, Hui Yuan, Suai Li, Qi Jiang | http://arxiv.org/pdf/2409.19305v1 | null |
2024-09-28 | Summit Vitals: Multi-Camera and Multi-Signal Biosensing at High Altitudes | 高峰生命体征:高海拔环境下多摄像头与多信号生物传感技术研究 | Ke Liu, Jiankai Tang, Zhang Jiang, Yuntao Wang, Xiaojing Liu, Dong Li, Yuanchun Shi | http://arxiv.org/pdf/2409.19223v1 | null |
2024-09-28 | Extending Depth of Field for Varifocal Multiview Images | 扩展变焦多视角图像的景深范围 | Zhilong Li, Kejun Wu, Qiong Liu, You Yang | http://arxiv.org/pdf/2409.19220v1 | null |
2024-09-28 | What Makes for Good Image Captions? | 良好的图像标题应具备哪些特点? | Delong Chen, Samuel Cahyawijaya, Etsuko Ishii, Ho Shu Chan, Yejin Bang, Pascale Fung | http://arxiv.org/pdf/2405.00485v2 | null |
2024-09-28 | Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations | 基于能量的概念瓶颈模型:统一预测、概念干预与概率解释 | Xinyue Xu, Yi Qin, Lu Mi, Hao Wang, Xiaomeng Li | http://arxiv.org/pdf/2401.14142v3 | link |
2024-09-28 | IMMA: Immunizing text-to-image Models against Malicious Adaptation | IMMA:对文本到图像模型的恶意适应性免疫保护研究 | Amber Yijia Zheng, Raymond A. Yeh | http://arxiv.org/pdf/2311.18815v3 | link |