[UPDATED!] 2024-11-12 (Publish Time)

Generative Models

Publish Date Title Title_CN Authors PDF Code
2024-11-12 Scaling Properties of Diffusion Models for Perceptual Tasks 扩散模型在感知任务中的扩展性质 Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik http://arxiv.org/pdf/2411.08034v1 null
2024-11-12 GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation 高斯万物:交互式点云潜在扩散用于3D生成 Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, Chen Change Loy http://arxiv.org/pdf/2411.08033v1 null
2024-11-12 Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings 小波潜在扩散(Wala):具有紧凑小波编码的十亿参数3D生成模型 Aditya Sanghi, Aliasghar Khani, Pradyumna Reddy, Arianna Rampini, Derek Cheung, Kamal Rahimi Malekshan, Kanika Madan, Hooman Shayani http://arxiv.org/pdf/2411.08017v1 null
2024-11-12 JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation JanusFlow:统一多模态理解和生成中的自回归与修正流和谐化 Yiyang Ma, Xingchao Liu, Xiaokang Chen, Wen Liu, Chengyue Wu, Zhiyu Wu, Zizheng Pan, Zhenda Xie, Haowei Zhang, Xingkai Yu, et.al. http://arxiv.org/pdf/2411.07975v1 null
2024-11-12 DuoLift-GAN: Reconstructing CT from Single-view and Biplanar X-Rays with Generative Adversarial Networks DuoLift-GAN:利用生成对抗网络从单视图和双平面X射线重建CT Zhaoxi Zhang, Yueliang Ying http://arxiv.org/pdf/2411.07941v1 null
2024-11-12 Diverse capability and scaling of diffusion and auto-regressive models when learning abstract rules 扩散和自回归模型在学习抽象规则时的多样性和扩展性 Binxu Wang, Jiaqi Shang, Haim Sompolinsky http://arxiv.org/pdf/2411.07873v1 null
2024-11-12 Interaction Asymmetry: A General Principle for Learning Composable Abstractions 交互不对称性:学习可组合抽象的通用原则 Jack Brady, Julius von Kügelgen, Sébastien Lachapelle, Simon Buchholz, Thomas Kipf, Wieland Brendel http://arxiv.org/pdf/2411.07784v1 null
2024-11-12 Novel View Synthesis with Pixel-Space Diffusion Models 基于像素空间扩散模型的创新视图合成 Noam Elata, Bahjat Kawar, Yaron Ostrovsky-Berman, Miriam Farber, Ron Sokolovsky http://arxiv.org/pdf/2411.07765v1 null
2024-11-12 LapGSR: Laplacian Reconstructive Network for Guided Thermal Super-Resolution 拉普拉斯重建网络:引导热超分辨率用拉普拉斯网络 Aditya Kasliwal, Ishaan Gakhar, Aryan Kamani, Pratinav Seth, Ujjwal Verma http://arxiv.org/pdf/2411.07750v1 null
2024-11-12 Evaluating the Generation of Spatial Relations in Text and Image Generative Models 评估文本和图像生成模型中的空间关系生成 Shang Hong Sim, Clarence Lee, Alvin Tan, Cheston Tan http://arxiv.org/pdf/2411.07664v1 null
2024-11-12 Leveraging Previous Steps: A Training-free Fast Solver for Flow Diffusion 利用先前步骤:一种无训练的快速流扩散求解器 Kaiyu Song, Hanjiang Lai http://arxiv.org/pdf/2411.07627v1 null
2024-11-12 Unraveling the Connections between Flow Matching and Diffusion Probabilistic Models in Training-free Conditional Generation 揭示免训练条件生成中流匹配与扩散概率模型之间的联系 Kaiyu Song, Hanjiang Lai http://arxiv.org/pdf/2411.07625v1 null
2024-11-12 Artificial Intelligence for Biomedical Video Generation 人工智能在生物医学视频生成中的应用 Linyuan Li, Jianing Qiu, Anujit Saha, Lin Li, Poyuan Li, Mengxian He, Ziyu Guo, Wu Yuan http://arxiv.org/pdf/2411.07619v1 null
2024-11-12 Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors 半真图像:用于评估AI生成图像检测器鲁棒性的大规模数据集 Anisha Pal, Julia Kruk, Mansi Phute, Manognya Bhattaram, Diyi Yang, Duen Horng Chau, Judy Hoffman http://arxiv.org/pdf/2411.07472v1 null
2024-11-12 Tracing the Roots: Leveraging Temporal Dynamics in Diffusion Trajectories for Origin Attribution 追踪根源:利用扩散轨迹的时间动态进行起源归因 Andreas Floros, Seyed-Mohsen Moosavi-Dezfooli, Pier Luigi Dragotti http://arxiv.org/pdf/2411.07449v1 null
2024-11-12 All-in-one Weather-degraded Image Restoration via Adaptive Degradation-aware Self-prompting Model 一体化恶劣天气退化图像修复:自适应退化感知自提示模型 Yuanbo Wen, Tao Gao, Ziqi Li, Jing Zhang, Kaihao Zhang, Ting Chen http://arxiv.org/pdf/2411.07445v1 null
2024-11-12 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models 基于预训练扩散模型的免训练图像对象插入 Yoad Tewel, Rinon Gal, Dvir Samuel, Yuval Atzmon, Lior Wolf, Gal Chechik http://arxiv.org/pdf/2411.07232v2 null
2024-11-12 CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense 因果扩散模型驱动的对抗防御去耦 Mingkun Zhang, Keping Bi, Wei Chen, Quanrun Chen, Jiafeng Guo, Xueqi Cheng http://arxiv.org/pdf/2410.23091v3 null
2024-11-12 Transformer-Based Tooth Alignment Prediction With Occlusion And Collision Constraints 基于Transformer的考虑遮挡和碰撞约束的牙齿对齐预测 ZhenXing Dong, JiaZhou Chen, YangHui Xu http://arxiv.org/pdf/2410.20806v3 null
2024-11-12 HYPNOS: Highly Precise Foreground-focused Diffusion Finetuning for Inanimate Objects HYPNOS:针对非生物物体的高精度前景聚焦扩散微调 Oliverio Theophilus Nathanael, Jonathan Samuel Lumentut, Nicholas Hans Muliawan, Edbert Valencio Angky, Felix Indra Kurniadi, Alfi Yusrotis Zakiyyah, Jeklin Harefa http://arxiv.org/pdf/2410.14265v2 null
2024-11-12 Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization 调整配对样本优化的时间步长蒸馏扩散模型 Zichen Miao, Zhengyuan Yang, Kevin Lin, Ze Wang, Zicheng Liu, Lijuan Wang, Qiang Qiu http://arxiv.org/pdf/2410.03190v2 null
2024-11-12 GenRec: Unifying Video Generation and Recognition with Diffusion Models GenRec:通过扩散模型统一视频生成与识别 Zejia Weng, Xitong Yang, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang http://arxiv.org/pdf/2408.15241v2 null
2024-11-12 Leveraging Pre-trained Models for FF-to-FFPE Histopathological Image Translation 利用预训练模型进行FF至FFPE病理图像转换 Qilai Zhang, Jiawen Li, Peiran Liao, Jiali Hu, Tian Guan, Anjia Han, Yonghong He http://arxiv.org/pdf/2406.18054v2 link
2024-11-12 Neural Gaffer: Relighting Any Object via Diffusion Neural Gaffer:通过扩散重光照任何物体 Haian Jin, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, Jin Sun, Noah Snavely http://arxiv.org/pdf/2406.07520v3 null
2024-11-12 Video Diffusion Models are Training-free Motion Interpreter and Controller 视频扩散模型:无需训练的运动解释器和控制器 Zeqi Xiao, Yifan Zhou, Shuai Yang, Xingang Pan http://arxiv.org/pdf/2405.14864v3 null
2024-11-12 Functional Imaging Constrained Diffusion for Brain PET Synthesis from Structural MRI 基于功能成像约束的扩散脑PET合成从结构MRI Minhui Yu, Mengqi Wu, Ling Yue, Andrea Bozoki, Mingxia Liu http://arxiv.org/pdf/2405.02504v3 null
2024-11-12 Improving Training-free Conditional Diffusion Model via Fisher Information 通过Fisher信息改进无训练条件扩散模型 Kaiyu Song, Hanjiang Lai http://arxiv.org/pdf/2404.18252v2 null
2024-11-12 Exploring Diverse Methods in Visual Question Answering 探索视觉问答中的多种方法 Panfeng Li, Qikai Yang, Xieming Geng, Wenjing Zhou, Zhicheng Ding, Yi Nian http://arxiv.org/pdf/2404.13565v3 null
2024-11-12 DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling 基于高斯点撒的联合相关建模的3D场景生成:DreamScape Xuening Yuan, Hongyu Yang, Yueming Zhao, Di Huang http://arxiv.org/pdf/2404.09227v2 null
2024-11-12 Diffusion Models Meet Remote Sensing: Principles, Methods, and Perspectives 扩散模型与遥感:原理、方法与展望 Yidan Liu, Jun Yue, Shaobo Xia, Pedram Ghamisi, Weiying Xie, Leyuan Fang http://arxiv.org/pdf/2404.08926v3 null
2024-11-12 LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection 基于潜在重建误差的扩散生成图像检测方法:LaRE^2 Yunpeng Luo, Junlong Du, Ke Yan, Shouhong Ding http://arxiv.org/pdf/2403.17465v3 null
2024-11-12 LEO: Generative Latent Image Animator for Human Video Synthesis LEO:用于人类视频合成的生成式潜在图像动画器 Yaohui Wang, Xin Ma, Xinyuan Chen, Cunjian Chen, Antitza Dantcheva, Bo Dai, Yu Qiao http://arxiv.org/pdf/2305.03989v3 link

Multimodal

Publish Date Title Title_CN Authors PDF Code
2024-11-12 Commissioning An All-Sky Infrared Camera Array for Detection Of Airborne Objects 全天空红外相机阵列的启动用于空中目标检测 Laura Dominé, Ankit Biswas, Richard Cloete, Alex Delacroix, Andriy Fedorenko, Lucas Jacaruso, Ezra Kelderman, Eric Keto, Sarah Little, Abraham Loeb, et.al. http://arxiv.org/pdf/2411.07956v1 null
2024-11-12 SimBase: A Simple Baseline for Temporal Video Grounding SimBase:一种简单的时间视频定位基线 Peijun Bao, Alex C. Kot http://arxiv.org/pdf/2411.07945v1 null
2024-11-12 Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge 面向边缘野生动物监测的视觉混合专家算法 Emmanuel Azuh Mensah, Anderson Lee, Haoran Zhang, Yitong Shan, Kurtis Heimerl http://arxiv.org/pdf/2411.07834v1 null
2024-11-12 Constraint Learning for Parametric Point Cloud 参数点云的约束学习 Xi Cheng, Ruiqi Lei, Di Huang, Zhichao Liao, Fengyuan Piao, Yan Chen, Pingfa Feng, Long Zeng http://arxiv.org/pdf/2411.07747v1 null
2024-11-12 Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG 基于ImageRAG增强超高清遥感图像分析 Zilun Zhang, Haozhan Shen, Tiancheng Zhao, Yuhao Wang, Bin Chen, Yuxiang Cai, Yongheng Shang, Jianwei Yin http://arxiv.org/pdf/2411.07688v1 null
2024-11-12 Understanding Audiovisual Deepfake Detection: Techniques, Challenges, Human Factors and Perceptual Insights 理解视听深度伪造检测:技术、挑战、人因因素与感知洞察 Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang http://arxiv.org/pdf/2411.07650v1 null
2024-11-12 Contrastive Language Prompting to Ease False Positives in Medical Anomaly Detection 对比语言提示以缓解医学异常检测中的误报 YeongHyeon Park, Myung Jin Kim, Hyeong Seok Kim http://arxiv.org/pdf/2411.07546v1 null
2024-11-12 SparrowVQE: Visual Question Explanation for Course Content Understanding SparrowVQE:课程内容理解的视觉问题解释 Jialu Li, Manish Kumar Thota, Ruslan Gokhman, Radek Holik, Youshan Zhang http://arxiv.org/pdf/2411.07516v1 null
2024-11-12 MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data MSEG-VCUQ:基于增强视觉基础模型、卷积神经网络和不确定性量化进行多模态分割的高速视频相位检测数据处理 Chika Maduabuchi, Ericmoore Jossou, Matteo Bucci http://arxiv.org/pdf/2411.07463v1 null
2024-11-12 BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions BLIP3-KALE:知识增强的大规模密集式标题 Anas Awadalla, Le Xue, Manli Shu, An Yan, Jun Wang, Senthil Purushwalkam, Sheng Shen, Hannah Lee, Oscar Lo, Jae Sung Park, et.al. http://arxiv.org/pdf/2411.07461v1 null
2024-11-12 LLMs Can Evolve Continually on Modality for X-Modal Reasoning 大型语言模型能够在模态上持续进化以实现跨模态推理 Jiazuo Yu, Haomiao Xiong, Lu Zhang, Haiwen Diao, Yunzhi Zhuge, Lanqing Hong, Dong Wang, Huchuan Lu, You He, Long Chen http://arxiv.org/pdf/2410.20178v2 link
2024-11-12 MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks MEGA-Bench:将多模态评估扩展至超过500个真实世界任务 Jiacheng Chen, Tianhao Liang, Sherman Siu, Zhengqing Wang, Kai Wang, Yubo Wang, Yuansheng Ni, Wang Zhu, Ziyan Jiang, Bohan Lyu, et.al. http://arxiv.org/pdf/2410.10563v2 null
2024-11-12 MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions MIRAGE:印度通用处方中多模态注释识别与识别 Tavish Mankash, V. S. Chaithanya Kota, Anish De, Praveen Prakash, Kshitij Jadhav http://arxiv.org/pdf/2410.09729v2 null
2024-11-12 Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance 适配语义特征融合引导的多模态显著目标检测的Segment Anything模型 Kunpeng Wang, Danying Lin, Chenglong Li, Zhengzheng Tu, Bin Luo http://arxiv.org/pdf/2408.15063v4 link
2024-11-12 L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection L4DR:用于天气鲁棒的3D目标检测的激光雷达-雷达融合 Xun Huang, Ziyu Xu, Hai Wu, Jinlong Wang, Qiming Xia, Yan Xia, Jonathan Li, Kyle Gao, Chenglu Wen, Cheng Wang http://arxiv.org/pdf/2408.03677v4 null
2024-11-12 Pseudo-triplet Guided Few-shot Composed Image Retrieval 伪三元组引导的少样本合成图像检索 Bohan Hou, Haoqiang Lin, Haokun Wen, Meng Liu, Mingzhu Xu, Xuemeng Song http://arxiv.org/pdf/2407.06001v2 null
2024-11-12 MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations MMLongBench-Doc:基于可视化的大语境文档理解基准测试 Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, et.al. http://arxiv.org/pdf/2407.01523v3 null
2024-11-12 OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer OmAgent:复杂视频理解的多模态代理框架——任务划分与征服 Lu Zhang, Tiancheng Zhao, Heting Ying, Yibo Ma, Kyusong Lee http://arxiv.org/pdf/2406.16620v3 link
2024-11-12 Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags 通过检索标签提醒多模态大型语言模型具备物体感知知识 Daiqing Qi, Handong Zhao, Zijun Wei, Sheng Li http://arxiv.org/pdf/2406.10839v3 null
2024-11-12 Enhance Image-to-Image Generation with LLaVA-generated Prompts 利用LLaVA生成的提示增强图像到图像生成 Zhicheng Ding, Panfeng Li, Qikai Yang, Siyang Li http://arxiv.org/pdf/2406.01956v3 null
2024-11-12 Meta-Learned Modality-Weighted Knowledge Distillation for Robust Multi-Modal Learning with Missing Data 元学习模态加权知识蒸馏,用于具有缺失数据的鲁棒多模态学习 Hu Wang, Salma Hassan, Yuyuan Liu, Congbo Ma, Yuanhong Chen, Yutong Xie, Mostafa Salem, Yu Tian, Jodie Avery, Louise Hull, et.al. http://arxiv.org/pdf/2405.07155v2 link
2024-11-12 Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective 重新审视视觉语言模型的对抗鲁棒性:多模态视角 Wanqi Zhou, Shuanghao Bai, Danilo P. Mandic, Qibin Zhao, Badong Chen http://arxiv.org/pdf/2404.19287v3 link
2024-11-12 How Does the Textual Information Affect the Retrieval of Multimodal In-Context Learning? 文本信息如何影响多模态情境学习的检索? Yang Luo, Zangwei Zheng, Zirui Zhu, Yang You http://arxiv.org/pdf/2404.12866v2 null

NeRF

Publish Date Title Title_CN Authors PDF Code
2024-11-12 Material Transforms from Disentangled NeRF Representations 基于解耦NeRF表示的物质变换 Ivan Lopes, Jean-François Lalonde, Raoul de Charette http://arxiv.org/pdf/2411.08037v1 null
2024-11-12 LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS 光高斯:无界3D高斯压缩,15倍缩减与200+ FPS Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang http://arxiv.org/pdf/2311.17245v6 link

3DGS

Publish Date Title Title_CN Authors PDF Code
2024-11-12 Projecting Gaussian Ellipsoids While Avoiding Affine Projection Approximation 在避免仿射投影逼近的同时投影高斯椭球体 Han Qi, Tao Cai, Xiyue Han http://arxiv.org/pdf/2411.07579v1 null
2024-11-12 GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting GaussianCut:基于图割的3D高斯溅射交互式分割 Umangi Jain, Ashkan Mirzaei, Igor Gilitschenski http://arxiv.org/pdf/2411.07555v1 null
2024-11-12 HiCoM: Hierarchical Coherent Motion for Streamable Dynamic Scene with 3D Gaussian Splatting HiCoM:基于3D高斯拼贴的可流式动态场景分层连贯运动 Qiankun Gao, Jiarui Meng, Chengxiang Wen, Jie Chen, Jian Zhang http://arxiv.org/pdf/2411.07541v1 null
2024-11-12 GUS-IR: Gaussian Splatting with Unified Shading for Inverse Rendering GUS-IR:基于统一着色的高斯溅射逆渲染 Zhihao Liang, Hongdong Li, Kui Jia, Kailing Guo, Qi Zhang http://arxiv.org/pdf/2411.07478v1 null
2024-11-12 SplatFormer: Point Transformer for Robust 3D Gaussian Splatting SplatFormer:用于鲁棒3D高斯Splatting的点变换器 Yutong Chen, Marko Mihajlovic, Xiyi Chen, Yiming Wang, Sergey Prokudin, Siyu Tang http://arxiv.org/pdf/2411.06390v2 link

Model Compression/Optimization

Publish Date Title Title_CN Authors PDF Code
2024-11-12 DINO-LG: A Task-Specific DINO Model for Coronary Calcium Scoring DINO-LG:针对冠状动脉钙化评分的特定任务DINO模型 Mahmut S. Gokmen, Cody Bumgardner, Caner Ozcan http://arxiv.org/pdf/2411.07976v1 null
2024-11-12 Efficient 3D Perception on Multi-Sweep Point Cloud with Gumbel Spatial Pruning 基于Gumbel空间剪枝的多扫点云高效3D感知 Jianhao Li, Tianyu Sun, Xueqian Zhang, Zhongdao Wang, Bailan Feng, Hengshuang Zhao http://arxiv.org/pdf/2411.07742v1 null
2024-11-12 Quantifying Knowledge Distillation Using Partial Information Decomposition 基于部分信息分解的知识蒸馏量化 Pasan Dissanayake, Faisal Hamman, Barproda Halder, Ilia Sucholutsky, Qiuyi Zhang, Sanghamitra Dutta http://arxiv.org/pdf/2411.07483v1 null
2024-11-12 Zero-Shot NAS via the Suppression of Local Entropy Decrease 通过抑制局部熵减的零样本NAS Ning Wu, Han Huang, Yueting Xu, Zhifeng Hao http://arxiv.org/pdf/2411.06236v2 null
2024-11-12 TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration 异构智能体协作迁移视觉-语言基础模型 Yiwei Guo, Shaobin Zhuang, Kunchang Li, Yu Qiao, Yali Wang http://arxiv.org/pdf/2410.12183v2 link

Classification/Detection/Recognition/Segmentation/...

Publish Date Title Title_CN Authors PDF Code
2024-11-12 Automatic dataset shift identification to support root cause analysis of AI performance drift 自动数据集偏移识别以支持AI性能漂移的根本原因分析 Mélanie Roschewitz, Raghav Mehta, Charles Jones, Ben Glocker http://arxiv.org/pdf/2411.07940v1 null
2024-11-12 Isometric Transformations for Image Augmentation in Mueller Matrix Polarimetry 等距变换在穆勒矩阵极化光度测量图像增强中的应用 Christopher Hahne, Omar Rodriguez-Nunez, Éléa Gros, Théotim Lucas, Ekkehard Hewer, Tatiana Novikova, Theoni Maragkou, Philippe Schucht, Richard McKinley http://arxiv.org/pdf/2411.07918v1 null
2024-11-12 TLDR: Traffic Light Detection using Fourier Domain Adaptation in Hostile WeatheR 基于傅里叶域自适应的恶劣天气下交通灯检测 Ishaan Gakhar, Aryesh Guha, Aryaman Gupta, Amit Agarwal, Durga Toshniwal, Ujjwal Verma http://arxiv.org/pdf/2411.07901v1 null
2024-11-12 INTRABENCH: Interactive Radiological Benchmark INTRABENCH:交互式放射学基准 Constantin Ulrich, Tassilo Wald, Emily Tempus, Maximilian Rokuss, Paul F. Jaeger, Klaus Maier-Hein http://arxiv.org/pdf/2411.07885v1 null
2024-11-12 CDXFormer: Boosting Remote Sensing Change Detection with Extended Long Short-Term Memory CDXFormer:借助扩展长短期记忆增强遥感变化检测 Zhenkai Wu, Xiaowen Ma, Rongrong Lian, Zhentao Lin, Wei Zhang http://arxiv.org/pdf/2411.07863v1 null
2024-11-12 Large-scale Remote Sensing Image Target Recognition and Automatic Annotation 大规模遥感图像目标识别与自动标注 Wuzheng Dong http://arxiv.org/pdf/2411.07802v1 null
2024-11-12 Horticultural Temporal Fruit Monitoring via 3D Instance Segmentation and Re-Identification using Point Clouds 基于点云的3D实例分割与再识别的园艺水果时空监测 Daniel Fusaro, Federico Magistri, Jens Behley, Alberto Pretto, Cyrill Stachniss http://arxiv.org/pdf/2411.07799v1 null
2024-11-12 AdaSemiCD: An Adaptive Semi-Supervised Change Detection Method Based on Pseudo-Label Evaluation AdaSemiCD:基于伪标签评估的自适应半监督变化检测方法 Ran Lingyan, Wen Dongcheng, Zhuo Tao, Zhang Shizhou, Zhang Xiuwei, Zhang Yanning http://arxiv.org/pdf/2411.07758v1 null
2024-11-12 ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction 自适应提升型3D语义占用与成本体积基础流量预测 Dubing Chen, Jin Fang, Wencheng Han, Xinjing Cheng, Junbo Yin, Chenzhong Xu, Fahad Shahbaz Khan, Jianbing Shen http://arxiv.org/pdf/2411.07725v1 null
2024-11-12 EMPERROR: A Flexible Generative Perception Error Model for Probing Self-Driving Planners EMPERROR:用于探测自动驾驶规划器的灵活生成感知错误模型 Niklas Hanselmann, Simon Doll, Marius Cordts, Hendrik P. A. Lensch, Andreas Geiger http://arxiv.org/pdf/2411.07719v1 null
2024-11-12 Emotion Classification of Children Expressions 儿童表情情绪分类 Sanchayan Vivekananthan http://arxiv.org/pdf/2411.07708v1 null
2024-11-12 AI enhanced diagnosis of Peyronies disease a novel approach using Computer Vision 基于计算机视觉的AI增强佩罗尼病诊断:一种新方法 Yudara Kularathne, Janitha Prathapa, Prarththanan Sothyrajah, Salomi Arasaratnam, Sithira Ambepitiya, Thanveer Ahamed, Dinuka Wijesundara http://arxiv.org/pdf/2411.07684v1 null
2024-11-12 HMIL: Hierarchical Multi-Instance Learning for Fine-Grained Whole Slide Image Classification 分层多实例学习用于细粒度全切片图像分类 Cheng Jin, Luyang Luo, Huangjing Lin, Jun Hou, Hao Chen http://arxiv.org/pdf/2411.07660v1 null
2024-11-12 Mix from Failure: Confusion-Pairing Mixup for Long-Tailed Recognition 基于失败的混合:长尾识别的混淆对混合 Youngseok Yoon, Sangwoo Hong, Hyungjoon Joo, Yao Qin, Haewon Jeong, Jungwoo Lee http://arxiv.org/pdf/2411.07621v1 null
2024-11-12 Quantum Information-Empowered Graph Neural Network for Hyperspectral Change Detection 量子信息赋能的高光谱变化检测图神经网络 Chia-Hsiang Lin, Tzu-Hsuan Lin, Jocelyn Chanussot http://arxiv.org/pdf/2411.07608v1 null
2024-11-12 SegQC: a segmentation network-based framework for multi-metric segmentation quality control and segmentation error detection in volumetric medical images 基于分割网络的体积医学图像多指标分割质量控制与分割错误检测框架:SegQC Bella Specktor-Fadida, Liat Ben-Sira, Dafna Ben-Bashat, Leo Joskowicz http://arxiv.org/pdf/2411.07601v1 null
2024-11-12 Semantic segmentation on multi-resolution optical and microwave data using deep learning 基于深度学习在多分辨率光波和微波数据上的语义分割 Jai G Singla, Bakul Vaghela http://arxiv.org/pdf/2411.07581v1 null
2024-11-12 Depthwise Separable Convolutions with Deep Residual Convolutions 深度残差可分离卷积 Md Arid Hasan, Krishno Dey http://arxiv.org/pdf/2411.07544v1 null
2024-11-12 A Novel Automatic Real-time Motion Tracking Method for Magnetic Resonance Imaging-guided Radiotherapy: Leveraging the Enhanced Tracking-Learning-Detection Framework with Automatic Segmentation 一种基于磁共振成像引导的放疗的新型自动实时运动跟踪方法:利用增强型跟踪-学习-检测框架与自动分割 Shengqi Chen, Zilin Wang, Jianrong Dai, Shirui Qin, Ying Cao, Ruiao Zhao, Jiayun Chen, Guohua Wu, Yuan Tang http://arxiv.org/pdf/2411.07503v1 null
2024-11-12 Gaussian Process Emulators for Few-Shot Segmentation in Cardiac MRI 高斯过程仿制在心脏MRI少量样本分割中的应用 Bruno Viti, Franz Thaler, Kathrin Lisa Kapper, Martin Urschler, Martin Holler, Elias Karabelas http://arxiv.org/pdf/2411.06911v2 link
2024-11-12 WavShadow: Wavelet Based Shadow Segmentation and Removal WavShadow:基于小波变换的阴影分割与去除 Shreyans Jain, Viraj Vekaria, Karan Gandhi, Aadya Arora http://arxiv.org/pdf/2411.05747v3 null
2024-11-12 CALoR: Towards Comprehensive Model Inversion Defense CALoR:迈向全面模型反演防御 Hongyao Yu, Yixiang Qiu, Hao Fang, Bin Chen, Sijin Yu, Bin Wang, Shu-Tao Xia, Ke Xu http://arxiv.org/pdf/2410.05814v2 null
2024-11-12 Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification 视觉分类中的泛化逻辑推理正则化:解读你的决策 Zhaorui Tan, Xi Yang, Qiufeng Wang, Anh Nguyen, Kaizhu Huang http://arxiv.org/pdf/2410.04492v4 link
2024-11-12 Style Transfer: From Stitching to Neural Networks 风格迁移:从拼贴到神经网络 Xinhe Xu, Zhuoer Wang, Yihan Zhang, Yizhou Liu, Zhaoyue Wang, Zhihao Xu, Muhan Zhao, Huaiying Luo http://arxiv.org/pdf/2409.00606v3 null
2024-11-12 Transfer Learning for Wildlife Classification: Evaluating YOLOv8 against DenseNet, ResNet, and VGGNet on a Custom Dataset 基于迁移学习野生动物分类:在自定义数据集上评估YOLOv8与DenseNet、ResNet和VGGNet Subek Sharma, Sisir Dhakal, Mansi Bhavsar http://arxiv.org/pdf/2408.00002v2 null
2024-11-12 Memory-Efficient Pseudo-Labeling for Online Source-Free Universal Domain Adaptation using a Gaussian Mixture Model 基于高斯混合模型的内存高效伪标签在线无源通用域自适应方法 Pascal Schlachter, Simon Wagner, Bin Yang http://arxiv.org/pdf/2407.14208v2 link
2024-11-12 Scalar Function Topology Divergence: Comparing Topology of 3D Objects 标量函数拓扑散度:比较三维对象的拓扑结构 Ilya Trofimov, Daria Voronkova, Eduard Tulchinskii, Evgeny Burnaev, Serguei Barannikov http://arxiv.org/pdf/2407.08364v3 link
2024-11-12 SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention 空间与信道注意力之间的协同效应探索:SCSA Yunzhong Si, Huiying Xu, Xinzhong Zhu, Wenhao Zhang, Yao Dong, Yuxing Chen, Hongbo Li http://arxiv.org/pdf/2407.05128v2 link
2024-11-12 Odd-One-Out: Anomaly Detection by Comparing with Neighbors 异常检测:与邻居比较的“异类”识别 Ankan Bhunia, Changjian Li, Hakan Bilen http://arxiv.org/pdf/2406.20099v2 link
2024-11-12 Utilizing Graph Generation for Enhanced Domain Adaptive Object Detection 利用图生成增强领域自适应目标检测 Mu Wang http://arxiv.org/pdf/2406.06535v3 null
2024-11-12 Human-in-the-Loop Segmentation of Multi-species Coral Imagery 人机交互的多物种珊瑚图像分割 Scarlett Raine, Ross Marchant, Brano Kusy, Frederic Maire, Niko Suenderhauf, Tobias Fischer http://arxiv.org/pdf/2404.09406v3 link
2024-11-12 CIMIL-CRC: a clinically-informed multiple instance learning framework for patient-level colorectal cancer molecular subtypes classification from H&E stained images CIMIL-CRC:基于临床信息的多实例学习框架,用于从HE染色图像中对患者级别的结直肠癌分子亚型进行分类 Hadar Hezi, Matan Gelber, Alexander Balabanov, Yosef E. Maruvka, Moti Freiman http://arxiv.org/pdf/2401.16131v2 null
2024-11-12 WildScenes: A Benchmark for 2D and 3D Semantic Segmentation in Large-scale Natural Environments WildScenes:大规模自然环境中二维和三维语义分割基准 Kavisha Vidanapathirana, Joshua Knights, Stephen Hausler, Mark Cox, Milad Ramezani, Jason Jooste, Ethan Griffiths, Shaheer Mohamed, Sridha Sridharan, Clinton Fookes, et.al. http://arxiv.org/pdf/2312.15364v2 link
2024-11-12 TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance 基于神经元溯源的联邦学习中可解释性驱动的调试:TraceFL Waris Gill, Ali Anwar, Muhammad Ali Gulzar http://arxiv.org/pdf/2312.13632v3 link
2024-11-12 TUNeS: A Temporal U-Net with Self-Attention for Video-based Surgical Phase Recognition TUNeS:一种用于视频手术阶段识别的时序U-Net和自注意力机制 Isabel Funke, Dominik Rivoir, Stefanie Krell, Stefanie Speidel http://arxiv.org/pdf/2307.09997v5 link

GNN

Publish Date Title Title_CN Authors PDF Code
2024-11-12 xCG: Explainable Cell Graphs for Survival Prediction in Non-Small Cell Lung Cancer xCG:用于非小细胞肺癌生存预测的可解释细胞图 Marvin Sextro, Gabriel Dernbach, Kai Standvoss, Simon Schallenberg, Frederick Klauschen, Klaus-Robert Müller, Maximilian Alber, Lukas Ruff http://arxiv.org/pdf/2411.07643v1 null

Image Understanding

Publish Date Title Title_CN Authors PDF Code
2024-11-12 MureObjectStitch: Multi-reference Image Composition 多参考图像拼接:MureObjectStitch Jiaxuan Chen, Bo Zhang, Li Niu http://arxiv.org/pdf/2411.07462v1 null

LLM

Publish Date Title Title_CN Authors PDF Code
2024-11-12 LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models LLMPhy:基于大型语言模型和世界模型的复杂物理推理 Anoop Cherian, Radu Corcodel, Siddarth Jain, Diego Romeres http://arxiv.org/pdf/2411.08027v1 null
2024-11-12 Grounded Video Caption Generation 基于情境的视频字幕生成 Evangelos Kazakos, Cordelia Schmid, Josef Sivic http://arxiv.org/pdf/2411.07584v1 null
2024-11-12 LAUREL: Learned Augmented Residual Layer 学习增强残差层 Gaurav Menghani, Ravi Kumar, Sanjiv Kumar http://arxiv.org/pdf/2411.07501v1 null
2024-11-12 Exploring Advanced Large Language Models with LLMsuite 探索高级大型语言模型:LLMsuite方法 Giorgio Roffo http://arxiv.org/pdf/2407.12036v2 null

Transformer

Publish Date Title Title_CN Authors PDF Code
2024-11-12 Rendering-Oriented 3D Point Cloud Attribute Compression using Sparse Tensor-based Transformer 基于稀疏张量Transformer的面向渲染的3D点云属性压缩 Xiao Huo, Junhui Ho, Shuai Wan, Fuzheng Yang http://arxiv.org/pdf/2411.07899v1 null
2024-11-12 Joint multi-dimensional dynamic attention and transformer for general image restoration 联合多维度动态注意力与Transformer的通用图像修复 Huan Zhang, Xu Zhang, Nian Cai, Jianglei Di, Yun Zhang http://arxiv.org/pdf/2411.07893v1 null
2024-11-12 3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration 三维聚焦与匹配网络在多实例点云配准中的应用 Liyuan Zhang, Le Hui, Qi Liu, Bo Li, Yuchao Dai http://arxiv.org/pdf/2411.07740v1 null
2024-11-12 Fast Disentangled Slim Tensor Learning for Multi-view Clustering 快速解耦精简张量学习在多视角聚类中的应用 Deng Xu, Chao Zhang, Zechao Li, Chunlin Chen, Huaxiong Li http://arxiv.org/pdf/2411.07685v1 null
2024-11-12 Breaking the Low-Rank Dilemma of Linear Attention 打破线性注意力的低秩困境 Qihang Fan, Huaibo Huang, Ran He http://arxiv.org/pdf/2411.07635v1 null
2024-11-12 Multi-task Feature Enhancement Network for No-Reference Image Quality Assessment 多任务特征增强网络用于无参考图像质量评估 Li Yu http://arxiv.org/pdf/2411.07556v1 null
2024-11-12 Extreme Rotation Estimation in the Wild 野外的极端旋转估计 Hana Bezalel, Dotan Ankri, Ruojin Cai, Hadar Averbuch-Elor http://arxiv.org/pdf/2411.07096v2 null
2024-11-12 Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution 解耦精细细节与全局几何的压缩深度图超分辨率 Huan Zheng, Wencheng Han, Jianbing Shen http://arxiv.org/pdf/2411.03239v2 null
2024-11-12 PhyTracker: An Online Tracker for Phytoplankton PhyTracker:浮游植物在线追踪器 Yang Yu, Qingxuan Lv, Yuezun Li, Zhiqiang Wei, Junyu Dong http://arxiv.org/pdf/2407.00352v2 null

3D/CG

Publish Date Title Title_CN Authors PDF Code
2024-11-12 No-Reference Point Cloud Quality Assessment via Graph Convolutional Network 基于图卷积网络的无需参考点云质量评估 Wu Chen, Qiuping Jiang, Wei Zhou, Feng Shao, Guangtao Zhai, Weisi Lin http://arxiv.org/pdf/2411.07728v1 null
2024-11-12 IR image databases generation under target intrinsic thermal variability constraints 基于目标内在热变异性约束的IR图像数据库生成 Jerome Gilles, Stephane Landeau, Tristan Dagobert, Philippe Chevalier, Christian Bolut http://arxiv.org/pdf/2411.07577v1 null
2024-11-12 ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation ReKep:机器人操作中关系关键点约束的时空推理 Wenlong Huang, Chen Wang, Yunzhu Li, Ruohan Zhang, Li Fei-Fei http://arxiv.org/pdf/2409.01652v2 null
2024-11-12 High-throughput 3D shape completion of potato tubers on a harvester 高吞吐量收获机上马铃薯块茎的三维形状补全 Pieter M. Blok, Federico Magistri, Cyrill Stachniss, Haozhou Wang, James Burridge, Wei Guo http://arxiv.org/pdf/2407.21341v3 link

Learning Paradigms

Publish Date Title Title_CN Authors PDF Code
2024-11-12 Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype 利用域内类别感知原型增强开放域持续学习 Yadong Lu, Shitian Zhao, Boxiang Yun, Dongsheng Jiang, Yin Li, Qingli Li, Yan Wang http://arxiv.org/pdf/2408.09984v2 null

Other

Publish Date Title Title_CN Authors PDF Code
2024-11-12 Artistic Neural Style Transfer Algorithms with Activation Smoothing 艺术风格迁移神经算法的激活平滑技术 Xiangtian Li, Han Cao, Zhaoyang Zhang, Jiacheng Hu, Yuhui Jin, Zihao Zhao http://arxiv.org/pdf/2411.08014v1 null
2024-11-12 Learning Disentangled Representations for Perceptual Point Cloud Quality Assessment via Mutual Information Minimization 通过互信息最小化学习感知点云质量评估的解耦表示 Ziyu Shan, Yujie Zhang, Yipeng Liu, Yiling Xu http://arxiv.org/pdf/2411.07936v1 null
2024-11-12 NL-SLAM for OC-VLN: Natural Language Grounded SLAM for Object-Centric VLN 基于自然语言的地基SLAM对象中心视觉语言导航 Sonia Raychaudhuri, Duy Ta, Katrina Ashton, Angel X. Chang, Jiuguang Wang, Bernadette Bucher http://arxiv.org/pdf/2411.07848v1 null
2024-11-12 SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model SAV-SE:基于选择性状态空间模型的场景感知视听语音增强 Xinyuan Qian, Jiaran Gao, Yaodan Zhang, Qiquan Zhang, Hexin Liu, Leibny Paola Garcia, Haizhou Li http://arxiv.org/pdf/2411.07751v1 null
2024-11-12 Maritime Search and Rescue Missions with Aerial Images: A Survey 基于航空图像的海洋搜救任务:综述 Juan P. Martinez-Esteso, Francisco J. Castellanos, Jorge Calvo-Zaragoza, Antonio Javier Gallego http://arxiv.org/pdf/2411.07649v1 null
2024-11-12 Atmospheric turbulence restoration by diffeomorphic image registration and blind deconvolution 基于微分同胚图像配准和盲反卷积的大气湍流恢复 Jerome Gilles, Tristan Dagobert, Carlo De Franchis http://arxiv.org/pdf/2411.07578v1 null
2024-11-12 Génération de bases de données images IR sous contraintes avec variabilité thermique intrinsèque des cibles 基于内禀热变性的图像红外数据库约束生成 Jerome Gilles, Stephane Landeau, Tristan Dagobert, Philippe Chevalier, Christian Bolut http://arxiv.org/pdf/2411.07575v1 null
2024-11-12 Uncertainty-Aware Test-Time Adaptation for Inverse Consistent Diffeomorphic Lung Image Registration 基于不确定性的逆一致性形变肺图像配准的测试时自适应方法 Muhammad F. A. Chaudhary, Stephanie M. Aguilera, Arie Nakhmani, Joseph M. Reinhardt, Surya P. Bhatt, Sandeep Bodduluri http://arxiv.org/pdf/2411.07567v1 null
2024-11-12 Act in Collusion: A Persistent Distributed Multi-Target Backdoor in Federated Learning 在联盟学习中实施勾结:一种持续的分布式多目标后门 Tao Liu, Wu Yang, Chen Xu, Jiguang Lv, Huanran Wang, Yuhang Zhang, Shuchun Xu, Dapeng Man http://arxiv.org/pdf/2411.03926v2 null
2024-11-12 Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight 基于模仿的自举视觉敏捷飞行强化学习 Jiaxu Xing, Angel Romero, Leonard Bauersfeld, Davide Scaramuzza http://arxiv.org/pdf/2403.12203v3 null
2024-11-12 Temporal-Mapping Photography for Event Cameras 基于时间映射的相机事件摄影 Yuhan Bao, Lei Sun, Yuqin Ma, Kaiwei Wang http://arxiv.org/pdf/2403.06443v2 link
2024-11-12 REVEX: A Unified Framework for Removal-Based Explainable Artificial Intelligence in Video REVEX:基于移除的统一框架,用于视频中的可解释人工智能 F. Xavier Gaya-Morey, Jose M. Buades-Rubio, I. Scott MacKenzie, Cristina Manresa-Yee http://arxiv.org/pdf/2401.11796v2 null