2024-11-08.md

[UPDATED!] 2024-11-08 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-08 | StdGEN: Semantic-Decomposed 3D Character Generation from Single Images | StdGEN:基于语义分解的单张图像3D角色生成 | Yuze He, Yanning Zhou, Wang Zhao, Zhongkai Wu, Kaiwen Xiao, Wei Yang, Yong-Jin Liu, Xiao Han | http://arxiv.org/pdf/2411.05738v1 | null |
| 2024-11-08 | Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models | 图像到文本到图像:用于无标签评估图像到文本生成与文本到图像扩散模型的创新框架 | Jia-Hong Huang, Hongyi Zhu, Yixian Shen, Stevan Rudinac, Evangelos Kanoulas | http://arxiv.org/pdf/2411.05706v1 | null |
| 2024-11-08 | Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion | 迈向终身少量样本的文本到图像扩散定制 | Nan Song, Xiaofeng Yang, Ze Yang, Guosheng Lin | http://arxiv.org/pdf/2411.05544v1 | null |
| 2024-11-08 | Improving image synthesis with diffusion-negative sampling | 提升图像生成中的扩散负采样技术 | Alakh Desai, Nuno Vasconcelos | http://arxiv.org/pdf/2411.05473v1 | null |
| 2024-11-08 | POC-SLT: Partial Object Completion with SDF Latent Transformers | 基于SDF潜在变换器的部分物体补全 | Faezeh Zakeri, Raphael Braun, Lukas Ruppert, Henrik P. A. Lensch | http://arxiv.org/pdf/2411.05419v1 | null |
| 2024-11-08 | A Real-time Face Mask Detection and Social Distancing System for COVID-19 using Attention-InceptionV3 Model | 基于Attention-InceptionV3模型的COVID-19实时口罩检测与社交距离监控系统 | Abdullah Al Asif, Farhana Chowdhury Tisha | http://arxiv.org/pdf/2411.05312v1 | null |
| 2024-11-08 | Adaptive Whole-Body PET Image Denoising Using 3D Diffusion Models with ControlNet | 基于控制网络的3D扩散模型自适应全身PET图像降噪 | Boxiao Yu, Kuang Gong | http://arxiv.org/pdf/2411.05302v1 | null |
| 2024-11-08 | SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models | SVDQuant:通过低秩成分吸收异常值以用于4位扩散模型 | Muyang Li, Yujun Lin, Zhekai Zhang, Tianle Cai, Xiuyu Li, Junxian Guo, Enze Xie, Chenlin Meng, Jun-Yan Zhu, Song Han | http://arxiv.org/pdf/2411.05007v2 | link |
| 2024-11-08 | A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization | 猫就是猫(不是狗!):通过因果分析和嵌入优化揭示文本到图像编码器中的信息混淆 | Chieh-Yun Chen, Chiang Tseng, Li-Wu Tsao, Hong-Han Shuai | http://arxiv.org/pdf/2410.00321v5 | link |
| 2024-11-08 | TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation | 时间对齐字幕:多场景文本到视频生成的时间对齐字幕 | Hritik Bansal, Yonatan Bitton, Michal Yarom, Idan Szpektor, Aditya Grover, Kai-Wei Chang | http://arxiv.org/pdf/2405.04682v4 | null |
| 2024-11-08 | A Survey on Future Frame Synthesis: Bridging Deterministic and Generative Approaches | 关于未来帧合成的调查:连接确定性和生成性方法 | Ruibo Ming, Zhewei Huang, Zhuoxuan Ju, Jianming Hu, Lihui Peng, Shuchang Zhou | http://arxiv.org/pdf/2401.14718v5 | null |
| 2024-11-08 | Text-to-image Diffusion Models in Generative AI: A Survey | 生成AI中的文本到图像扩散模型:综述 | Chenshuang Zhang, Chaoning Zhang, Mengchun Zhang, In So Kweon, Junmo Kim | http://arxiv.org/pdf/2303.07909v3 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-08 | Tell What You Hear From What You See -- Video to Audio Generation Through Text | 从所见之音中讲述你所听——基于文本的视频音频生成 | Xiulong Liu, Kun Su, Eli Shlizerman | http://arxiv.org/pdf/2411.05679v1 | null |
| 2024-11-08 | Predicting Stroke through Retinal Graphs and Multimodal Self-supervised Learning | 通过视网膜图和多模态自监督学习预测中风 | Yuqing Huang, Bastian Wittmann, Olga Demler, Bjoern Menze, Neda Davoudi | http://arxiv.org/pdf/2411.05597v1 | null |
| 2024-11-08 | AuthFormer: Adaptive Multimodal biometric authentication transformer for middle-aged and elderly people | 自适应多模态生物特征认证Transformer,适用于中老年人群 | Yang rui, Meng ling-tao, Zhang qiu-yu | http://arxiv.org/pdf/2411.05395v1 | null |
| 2024-11-08 | ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving | ZOPP:自动驾驶零样本离线全景感知框架 | Tao Ma, Hongbin Zhou, Qiusheng Huang, Xuemeng Yang, Jianfei Guo, Bo Zhang, Min Dou, Yu Qiao, Botian Shi, Hongsheng Li | http://arxiv.org/pdf/2411.05311v1 | null |
| 2024-11-08 | Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding | 基于层次视觉特征聚合的无OCR文档理解 | Jaeyoo Park, Jin Young Choi, Jeonghyung Park, Bohyung Han | http://arxiv.org/pdf/2411.05254v1 | null |
| 2024-11-08 | SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering | SaSR-Net:增强音视频问答的源感知语义表示网络 | Tianyu Yang, Yiyang Nan, Lisen Dai, Zhenwen Liang, Yapeng Tian, Xiangliang Zhang | http://arxiv.org/pdf/2411.04933v2 | null |
| 2024-11-08 | Enhancing Osteoporosis Detection: An Explainable Multi-Modal Learning Framework with Feature Fusion and Variable Clustering | 提升骨质疏松症检测:一种具有特征融合和变量聚类的可解释多模态学习框架 | Mehdi Hosseini Chagahi, Saeed Mohammadi Dashtaki, Niloufar Delfan, Nadia Mohammadi, Alireza Samari, Behzad Moshiri, Md. Jalil Piran, Oliver Faust | http://arxiv.org/pdf/2411.00916v2 | null |
| 2024-11-08 | Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis | 基于多任务视频分析的易于在家检测帕金森病 | Md Saiful Islam, Tariq Adnan, Jan Freyberg, Sangwu Lee, Abdelrahman Abdelkader, Meghan Pawlik, Cathe Schwartz, Karen Jaffe, Ruth B. Schneider, E Ray Dorsey, et.al. | http://arxiv.org/pdf/2406.14856v3 | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-08 | A Nerf-Based Color Consistency Method for Remote Sensing Images | 基于Nerf的遥感影像色彩一致性方法 | Zongcheng Zuo, Yuanxiang Li, Tongtong Zhang | http://arxiv.org/pdf/2411.05557v1 | null |
| 2024-11-08 | From Transparent to Opaque: Rethinking Neural Implicit Surfaces with $α$-NeuS | 从透明到不透明:α-NeuS 重构神经网络隐式曲面 | Haoran Zhang, Junkai Deng, Xuhui Chen, Fei Hou, Wencheng Wang, Hong Qin, Chen Qian, Ying He | http://arxiv.org/pdf/2411.05362v1 | null |
| 2024-11-08 | Rate-aware Compression for NeRF-based Volumetric Video | 基于NeRF的体积视频的速率感知压缩 | Zhiyu Zhang, Guo Lu, Huanxiong Liang, Zhengxue Cheng, Anni Tang, Li Song | http://arxiv.org/pdf/2411.05322v1 | null |
| 2024-11-08 | Efficient Dynamic-NeRF Based Volumetric Video Coding with Rate Distortion Optimization | 基于高效动态NeRF的体素视频编码及速率失真优化 | Zhiyu Zhang, Guo Lu, Huanxiong Liang, Anni Tang, Qiang Hu, Li Song | http://arxiv.org/pdf/2402.01380v2 | null |
| 2024-11-08 | $α$Surf: Implicit Surface Reconstruction for Semi-Transparent and Thin Objects with Decoupled Geometry and Opacity | $α$Surf:基于解耦几何和透明度的半透明和薄物体隐式表面重建 | Tianhao Wu, Hanxue Liang, Fangcheng Zhong, Gernot Riegler, Shimon Vainer, Jiankang Deng, Cengiz Oztireli | http://arxiv.org/pdf/2303.10083v2 | null |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-08 | GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting | GS2Pose:基于高斯散布的二级6D目标姿态估计 | Jilan Mei, Junbo Li, Cai Meng | http://arxiv.org/pdf/2411.03807v3 | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-08 | FGGP: Fixed-Rate Gradient-First Gradual Pruning | FGGP:固定率梯度优先渐变剪枝 | Lingkai Zhu, Can Deniz Bezek, Orcun Goksel | http://arxiv.org/pdf/2411.05500v1 | null |
| 2024-11-08 | SASWISE-UE: Segmentation and Synthesis with Interpretable Scalable Ensembles for Uncertainty Estimation | SASWISE-UE:用于不确定性估计的可解释可扩展集成分割与合成 | Weijie Chen, Alan McMillan | http://arxiv.org/pdf/2411.05324v1 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-08 | Curriculum Learning for Few-Shot Domain Adaptation in CT-based Airway Tree Segmentation | 基于CT的气道树分割中少样本领域自适应的课程学习 | Maxime Jacovella, Ali Keshavarzi, Elsa Angelini | http://arxiv.org/pdf/2411.05779v1 | null |
| 2024-11-08 | FisherMask: Enhancing Neural Network Labeling Efficiency in Image Classification Using Fisher Information | FisherMask:利用Fisher信息提升图像分类中神经网络标记效率 | Shreen Gul, Mohamed Elmahallawy, Sanjay Madria, Ardhendu Tripathy | http://arxiv.org/pdf/2411.05752v1 | null |
| 2024-11-08 | WavShadow: Wavelet Based Shadow Segmentation and Removal | 基于小波的阴影分割与去除 | Shreyans Jain, Aadya Arora, Viraj Vekaria, Karan Gandhi | http://arxiv.org/pdf/2411.05747v1 | null |
| 2024-11-08 | Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream | 灵长类视觉腹侧流任务优化模型的可扩展性法则 | Abdulkadir Gokce, Martin Schrimpf | http://arxiv.org/pdf/2411.05712v1 | null |
| 2024-11-08 | Visual-TCAV: Concept-based Attribution and Saliency Maps for Post-hoc Explainability in Image Classification | 视觉-TCAV:图像分类后验可解释性中的基于概念的可解释性和显著性图 | Antonio De Santis, Riccardo Campi, Matteo Bianchi, Marco Brambilla | http://arxiv.org/pdf/2411.05698v1 | null |
| 2024-11-08 | Autoregressive Adaptive Hypergraph Transformer for Skeleton-based Activity Recognition | 自回归自适应超图Transformer在骨骼动作识别中的应用 | Abhisek Ray, Ayush Raj, Maheshkumar H. Kolekar | http://arxiv.org/pdf/2411.05692v1 | null |
| 2024-11-08 | Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation | 在线LoRA:基于低秩调整的无任务在线持续学习 | Xiwen Wei, Guihong Li, Radu Marculescu | http://arxiv.org/pdf/2411.05663v1 | null |
| 2024-11-08 | Video RWKV:Video Action Recognition Based RWKV | 视频RWKV:基于RWKV的视频动作识别 | Zhuowen Yin, Chengru Li, Xingbo Dong | http://arxiv.org/pdf/2411.05636v1 | null |
| 2024-11-08 | SynDroneVision: A Synthetic Dataset for Image-Based Drone Detection | SynDroneVision:基于图像的无人机检测合成数据集 | Tamara R. Lenhard, Andreas Weinmann, Kai Franke, Tobias Koch | http://arxiv.org/pdf/2411.05633v1 | null |
| 2024-11-08 | Efficient Audio-Visual Fusion for Video Classification | 高效的视频分类中的视听融合 | Mahrukh Awan, Asmar Nadeem, Armin Mustafa | http://arxiv.org/pdf/2411.05603v1 | null |
| 2024-11-08 | Open-set object detection: towards unified problem formulation and benchmarking | 开放集目标检测:迈向统一问题表述和基准测试 | Hejer Ammar, Nikita Kiselov, Guillaume Lapouge, Romaric Audigier | http://arxiv.org/pdf/2411.05564v1 | null |
| 2024-11-08 | Training objective drives the consistency of representational similarity across datasets | 训练目标驱动数据集间表征相似性的一致性 | Laure Ciernik, Lorenz Linhardt, Marco Morik, Jonas Dippel, Simon Kornblith, Lukas Muttenthaler | http://arxiv.org/pdf/2411.05561v1 | null |
| 2024-11-08 | DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions | DeepArUco++:在复杂光照条件下改进的方形标识符标记检测 | Rafael Berral-Soler, Rafael Muñoz-Salinas, Rafael Medina-Carnicer, Manuel J. Marín-Jiménez | http://arxiv.org/pdf/2411.05552v1 | null |
| 2024-11-08 | Do Histopathological Foundation Models Eliminate Batch Effects? A Comparative Study | 病理学基础模型能否消除批效应?一项比较研究 | Jonah Kömen, Hannah Marienwald, Jonas Dippel, Julius Hense | http://arxiv.org/pdf/2411.05489v1 | null |
| 2024-11-08 | Comparative Study of Probabilistic Atlas and Deep Learning Approaches for Automatic Brain Tissue Segmentation from MRI Using N4 Bias Field Correction and Anisotropic Diffusion Pre-processing Techniques | 基于N4偏场校正和各向异性扩散预处理技术的MRI自动脑组织分割中概率图谱与深度学习方法的比较研究 | Mohammad Imran Hossain, Muhammad Zain Amin, Daniel Tweneboah Anyimadu, Taofik Ahmed Suleiman | http://arxiv.org/pdf/2411.05456v1 | null |
| 2024-11-08 | Agricultural Landscape Understanding At Country-Scale | 国家尺度下的农业景观理解 | Radhika Dua, Nikita Saxena, Aditi Agarwal, Alex Wilson, Gaurav Singh, Hoang Tran, Ishan Deshpande, Amandeep Kaur, Gaurav Aggarwal, Chandan Nath, et.al. | http://arxiv.org/pdf/2411.05359v1 | null |
| 2024-11-08 | Enhancing Visual Classification using Comparative Descriptors | 利用比较描述符增强视觉分类 | Hankyeol Lee, Gawon Seo, Wonseok Choi, Geunyoung Jung, Kyungwoo Song, Jiyoung Jung | http://arxiv.org/pdf/2411.05357v1 | null |
| 2024-11-08 | A Quality-Centric Framework for Generic Deepfake Detection | 基于质量的通用深度伪造检测框架 | Wentang Song, Zhiyuan Yan, Yuzhen Lin, Taiping Yao, Changsheng Chen, Shen Chen, Yandan Zhao, Shouhong Ding, Bin Li | http://arxiv.org/pdf/2411.05335v1 | null |
| 2024-11-08 | Revisiting Network Perturbation for Semi-Supervised Semantic Segmentation | 重新审视网络扰动在半监督语义分割中的应用 | Sien Li, Tao Wang, Ruizhe Hu, Wenxi Liu | http://arxiv.org/pdf/2411.05307v1 | null |
| 2024-11-08 | SimpleBEV: Improved LiDAR-Camera Fusion Architecture for 3D Object Detection | SimpleBEV:改进的激光雷达-摄像头融合架构用于3D目标检测 | Yun Zhao, Zhan Gong, Peiru Zheng, Hong Zhu, Shaohua Wu | http://arxiv.org/pdf/2411.05292v1 | null |
| 2024-11-08 | Cancer-Net SCa-Synth: An Open Access Synthetically Generated 2D Skin Lesion Dataset for Skin Cancer Classification | 癌症网络SCa-Synth:一个开放获取的合成生成2D皮肤病变数据集,用于皮肤癌分类 | Chi-en Amy Tai, Oustan Ding, Alexander Wong | http://arxiv.org/pdf/2411.05269v1 | null |
| 2024-11-08 | ZAHA: Introducing the Level of Facade Generalization and the Large-Scale Point Cloud Facade Semantic Segmentation Benchmark Dataset | ZAHA:引入立面泛化级别和大规模点云立面语义分割基准数据集 | Olaf Wysocki, Yue Tan, Thomas Froech, Yan Xia, Magdalena Wysocki, Ludwig Hoegner, Daniel Cremers, Christoph Holst | http://arxiv.org/pdf/2411.04865v2 | link |
| 2024-11-08 | From CNN to ConvRNN: Adapting Visualization Techniques for Time-Series Anomaly Detection | 从CNN到ConvRNN:适应时间序列异常检测的可视化技术 | Fabien Poirier | http://arxiv.org/pdf/2411.04707v2 | null |
| 2024-11-08 | Region-Guided Attack on the Segment Anything Model (SAM) | 基于区域引导的Segment Anything Model(SAM)攻击 | Xiaoliang Liu, Furao Shen, Jian Zhao | http://arxiv.org/pdf/2411.02974v2 | null |
| 2024-11-08 | ROAD-Waymo: Action Awareness at Scale for Autonomous Driving | 道路-Waymo:大规模自动驾驶的动作意识 | Salman Khan, Izzeddin Teeti, Reza Javanmard Alitappeh, Mihaela C. Stoian, Eleonora Giunchiglia, Gurkirt Singh, Andrew Bradley, Fabio Cuzzolin | http://arxiv.org/pdf/2411.01683v2 | link |
| 2024-11-08 | Lung tumor segmentation in MRI mice scans using 3D nnU-Net with minimum annotations | 基于3D nnU-Net的MRI小鼠扫描肺癌分割及最小标注需求 | Piotr Kaniewski, Fariba Yousefi, Yeman Brhane Hagos, Talha Qaiser, Nikolay Burlutskiy | http://arxiv.org/pdf/2411.00922v2 | null |
| 2024-11-08 | Spiking Neural Network as Adaptive Event Stream Slicer | 脉冲神经网络作为自适应事件流切片器 | Jiahang Cao, Mingyuan Sun, Ziqing Wang, Hao Cheng, Qiang Zhang, Shibo Zhou, Renjing Xu | http://arxiv.org/pdf/2410.02249v2 | null |
| 2024-11-08 | Rethinking Pre-trained Feature Extractor Selection in Multiple Instance Learning for Whole Slide Image Classification | 重新思考多实例学习中全切片图像分类的预训练特征提取器选择 | Bryan Wong, Mun Yong Yi | http://arxiv.org/pdf/2408.01167v2 | null |
| 2024-11-08 | Leveraging Bi-Focal Perspectives and Granular Feature Integration for Accurate Reliable Early Alzheimer's Detection | 利用双焦点视角和粒度特征集成进行准确可靠的早期阿尔茨海默病检测 | Pandiyaraju V, Shravan Venkatraman, Abeshek A, Aravintakshan S A, Pavan Kumar S, Kannan A | http://arxiv.org/pdf/2407.10921v2 | null |
| 2024-11-08 | Bounding Boxes and Probabilistic Graphical Models: Video Anomaly Detection Simplified | 边界框与概率图模型:简化视频异常检测 | Mia Siemon, Thomas B. Moeslund, Barry Norton, Kamal Nasrollahi | http://arxiv.org/pdf/2407.06000v2 | null |
| 2024-11-08 | Boosting 3D Object Detection with Semantic-Aware Multi-Branch Framework | 基于语义感知多分支框架增强3D物体检测 | Hao Jing, Anhong Wang, Lijun Zhao, Yakun Yang, Donghan Bu, Jing Zhang, Yifan Zhang, Junhui Hou | http://arxiv.org/pdf/2407.05769v2 | null |
| 2024-11-08 | IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization | IMDL-BenCo:图像操纵检测与定位的全面基准和代码库 | Xiaochen Ma, Xuekang Zhu, Lei Su, Bo Du, Zhuohang Jiang, Bingkui Tong, Zeyu Lei, Xinyu Yang, Chi-Man Pun, Jiancheng Lv, et.al. | http://arxiv.org/pdf/2406.10580v2 | link |
| 2024-11-08 | HPE-CogVLM: Advancing Vision Language Models with a Head Pose Grounding Task | HPE-CogVLM:通过头部姿态定位任务推动视觉语言模型的发展 | Yu Tian, Tianqi Shao, Tsukasa Demizu, Xuyang Wu, Hsin-Tai Wu | http://arxiv.org/pdf/2406.01914v2 | null |
| 2024-11-08 | *: Improving the 3D detector by introducing Voxel2Pillar feature encoding and extracting multi-scale features | 通过引入Voxel2Pillar特征编码和提取多尺度特征改进3D检测器 | Xusheng Li, Chengliang Wang, Shumao Wang, Zhuo Zeng, Ji Liu | http://arxiv.org/pdf/2405.09828v3 | null |
| 2024-11-08 | Enhancing Vision-Language Few-Shot Adaptation with Negative Learning | 增强视觉-语言小样本适应性的负学习 | Ce Zhang, Simon Stepputtis, Katia Sycara, Yaqi Xie | http://arxiv.org/pdf/2403.12964v2 | link |
| 2024-11-08 | VM-UNet: Vision Mamba UNet for Medical Image Segmentation | VM-UNet:用于医学图像分割的视觉Mamba UNet | Jiacheng Ruan, Jincheng Li, Suncheng Xiang | http://arxiv.org/pdf/2402.02491v2 | link |
| 2024-11-08 | Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation | Hi-SAM:融合Segment Anything Model进行分层文本分割 | Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, Baocai Yin, Cong Liu, Bo Du, Dacheng Tao | http://arxiv.org/pdf/2401.17904v2 | link |
| 2024-11-08 | Fight Fire with Fire: Combating Adversarial Patch Attacks using Pattern-randomized Defensive Patches | 以火攻火:利用模式随机化防御补丁对抗对抗性补丁攻击 | Jianan Feng, Jiachun Li, Changqing Miao, Jianjun Huang, Wei You, Wenchang Shi, Bin Liang | http://arxiv.org/pdf/2311.06122v2 | link |
| 2024-11-08 | CompaCT: Fractal-Based Heuristic Pixel Segmentation for Lossless Compression of High-Color DICOM Medical Images | CompaCT:基于分形启发式像素分割的高彩色DICOM医学图像无损压缩 | Taaha Khan | http://arxiv.org/pdf/2308.13097v2 | link |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-08 | A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis | 基于概念的两步法提升皮肤病变诊断的可解释性和可信度 | Cristiano Patrício, Luís F. Teixeira, João C. Neves | http://arxiv.org/pdf/2411.05609v1 | null |
| 2024-11-08 | VISTA: Visual Integrated System for Tailored Automation in Math Problem Generation Using LLM | VISTA:基于大型语言模型在数学问题生成中定制自动化视觉集成系统 | Jeongwoo Lee, Kwangsuk Park, Jihyeon Park | http://arxiv.org/pdf/2411.05423v1 | null |
| 2024-11-08 | Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks | 针对语言模型防御越狱攻击的鲁棒提示优化 | Andy Zhou, Bo Li, Haohan Wang | http://arxiv.org/pdf/2401.17263v5 | link |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-08 | PEP-GS: Perceptually-Enhanced Precise Structured 3D Gaussians for View-Adaptive Rendering | PEP-GS:感知增强精确结构化3D高斯函数用于视域自适应渲染 | Junxi Jin, Xiulai Li, Haiping Huang, Lianjun Liu, Yujie Sun | http://arxiv.org/pdf/2411.05731v1 | null |
| 2024-11-08 | STARS: Sensor-agnostic Transformer Architecture for Remote Sensing | STARS:适用于遥感的光传感器无关的Transformer架构 | Ethan King, Jaime Rodriguez, Diego Llanes, Timothy Doster, Tegan Emerson, James Koch | http://arxiv.org/pdf/2411.05714v1 | null |
| 2024-11-08 | Image inpainting enhancement by replacing the original mask with a self-attended region from the input image | 基于输入图像自注意力区域的原始掩码替换的图像修复增强 | Kourosh Kiani, Razieh Rastgoo, Alireza Chaji, Sergio Escalera | http://arxiv.org/pdf/2411.05705v1 | null |
| 2024-11-08 | CFPNet: Improving Lightweight ToF Depth Completion via Cross-zone Feature Propagation | CFPNet:通过跨区域特征传播提升轻量级ToF深度补全 | Laiyan Ding, Hualie Jiang, Rui Xu, Rui Huang | http://arxiv.org/pdf/2411.04480v2 | link |
| 2024-11-08 | Bootstrapping Top-down Information for Self-modulating Slot Attention | 自调制槽注意力机制的顶层信息自举 | Dongwon Kim, Seoyeon Kim, Suha Kwak | http://arxiv.org/pdf/2411.01801v2 | null |
| 2024-11-08 | Benchmarking Ultra-High-Definition Image Reflection Removal | 超高清图像反射去除性能评估 | Zhenyuan Zhang, Zhenbo Song, Kaihao Zhang, Zhaoxin Fan, Jianfeng Lu | http://arxiv.org/pdf/2308.00265v2 | link |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-08 | Alignment of 3D woodblock geometrical models and 2D orthographic projection image | 三维木版几何模型与二维正射投影图像的对齐 | Minh DUc Nguyen, Cong Thuong Le, Trong Lam Nguyen | http://arxiv.org/pdf/2411.05524v1 | null |
| 2024-11-08 | Untrained neural networks can demonstrate memorization-independent abstract reasoning | 未训练神经网络的记忆无关抽象推理 | Tomer Barak, Yonatan Loewenstein | http://arxiv.org/pdf/2407.17791v2 | link |
| 2024-11-08 | "Where am I?" Scene Retrieval with Language | “我在哪里?”基于语言的场景检索 | Jiaqi Chen, Daniel Barath, Iro Armeni, Marc Pollefeys, Hermann Blum | http://arxiv.org/pdf/2404.14565v2 | null |
| 2024-11-08 | Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors via 3D Modeling | 通过3D建模实现逼真衣物纹理以规避人体检测器 | Zhanhao Hu, Wenda Chu, Xiaopei Zhu, Hui Zhang, Bo Zhang, Xiaolin Hu | http://arxiv.org/pdf/2307.01778v2 | link |

Learning Methods

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-08 | Sketched Equivariant Imaging Regularization and Deep Internal Learning for Inverse Problems | 草图等变成像正则化与深度内部学习在逆问题中的应用 | Guixian Xu, Jinglai Li, Junqi Tang | http://arxiv.org/pdf/2411.05771v1 | null |
| 2024-11-08 | End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | 端到端视觉语言模型导航:将空间推理转化为问答 | Dylan Goetting, Himanshu Gaurav Singh, Antonio Loquercio | http://arxiv.org/pdf/2411.05755v1 | null |
| 2024-11-08 | Advancing Meteorological Forecasting: AI-based Approach to Synoptic Weather Map Analysis | 推进气象预报:基于AI的天气图分析技术 | Yo-Hwan Choi, Seon-Yu Kang, Minjong Cheon | http://arxiv.org/pdf/2411.05384v1 | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-11-08 | ASL STEM Wiki: Dataset and Benchmark for Interpreting STEM Articles | ASL STEM Wiki:STEM文章翻译数据集和基准 | Kayo Yin, Chinmay Singh, Fyodor O. Minakov, Vanessa Milan, Hal Daumé III, Cyril Zhang, Alex X. Lu, Danielle Bragg | http://arxiv.org/pdf/2411.05783v1 | null |
| 2024-11-08 | GazeSearch: Radiology Findings Search Benchmark | 视觉搜索:放射学发现搜索基准 | Trong Thang Pham, Tien-Phat Nguyen, Yuki Ikebe, Akash Awasthi, Zhigang Deng, Carol C. Wu, Hien Nguyen, Ngan Le | http://arxiv.org/pdf/2411.05780v1 | null |
| 2024-11-08 | Poze: Sports Technique Feedback under Data Constraints | 基于数据约束的体育技巧反馈 | Agamdeep Singh, Sujit PB, Mayank Vatsa | http://arxiv.org/pdf/2411.05734v1 | null |
| 2024-11-08 | Towards Scalable Foundation Models for Digital Dermatology | 面向数字皮肤病学可扩展的基础模型 | Fabian Gröger, Philippe Gottfrois, Ludovic Amruthalingam, Alvaro Gonzalez-Jimenez, Simone Lionetti, Luis R. Soenksen-Martinez, Alexander A. Navarini, Marc Pouly | http://arxiv.org/pdf/2411.05514v1 | null |
| 2024-11-08 | Tightly-Coupled, Speed-aided Monocular Visual-Inertial Localization in Topological Map | 紧耦合、速度辅助的单目视觉惯性定位在拓扑地图中 | Chanuk Yang, Hayeon O, Kunsoo Huh | http://arxiv.org/pdf/2411.05497v1 | null |
| 2024-11-08 | WeatherGFM: Learning A Weather Generalist Foundation Model via In-context Learning | 天气通用基础模型:通过上下文学习学习天气通用模型 | Xiangyu Zhao, Zhiwang Zhou, Wenlong Zhang, Yihao Liu, Xiangyu Chen, Junchao Gong, Hao Chen, Ben Fei, Shiqi Chen, Wanli Ouyang, et.al. | http://arxiv.org/pdf/2411.05420v1 | null |
| 2024-11-08 | Image Decomposition: Theory, Numerical Schemes, and Performance Evaluation | 图像分解:理论、数值方法和性能评估 | Jerome Gilles | http://arxiv.org/pdf/2411.05265v1 | null |
| 2024-11-08 | Decoding Report Generators: A Cyclic Vision-Language Adapter for Counterfactual Explanations | 解码报告生成器:循环视觉语言适配器用于反事实解释 | Yingying Fang, Zihao Jin, Shaojie Guo, Jinda Liu, Yijian Gao, Junzhi Ning, Zhiling Yue, Zhi Li, Simon LF Walsh, Guang Yang | http://arxiv.org/pdf/2411.05261v1 | null |
| 2024-11-08 | Super-resolution in disordered media using neural networks | 基于神经网络的失序介质超分辨率 | Alexander Christie, Matan Leibovich, Miguel Moscoso, Alexei Novikov, George Papanicolaou, Chrysoula Tsogka | http://arxiv.org/pdf/2410.21556v3 | null |
| 2024-11-08 | Improvement of Spiking Neural Network with Bit Planes and Color Models | 基于位平面和颜色模型的脉冲神经网络改进 | Nhan T. Luu, Duong T. Luu, Nam N. Pham, Thang C. Truong | http://arxiv.org/pdf/2410.08229v2 | null |
| 2024-11-08 | TropNNC: Structured Neural Network Compression Using Tropical Geometry | TropNNC:使用热带几何的神经网络结构压缩 | Konstantinos Fotopoulos, Petros Maragos, Panagiotis Misiakos | http://arxiv.org/pdf/2409.03945v2 | null |
| 2024-11-08 | MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network | MLAAN:基于多层跳跃增强辅助网络的监督局部学习扩展 | Yuming Zhang, Shouxin Zhang, Peizhe Wang, Feiyu Zhu, Dongzhi Guan, Junhao Su, Jiabin Liu, Changpeng Cai | http://arxiv.org/pdf/2406.16633v5 | null |