Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-12 | Scaling Properties of Diffusion Models for Perceptual Tasks | 扩散模型在感知任务中的扩展性质 | Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik | http://arxiv.org/pdf/2411.08034v1 | null |
2024-11-12 | GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation | 高斯万物:交互式点云潜在扩散用于3D生成 | Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, Chen Change Loy | http://arxiv.org/pdf/2411.08033v1 | null |
2024-11-12 | Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings | 小波潜在扩散(Wala):具有紧凑小波编码的十亿参数3D生成模型 | Aditya Sanghi, Aliasghar Khani, Pradyumna Reddy, Arianna Rampini, Derek Cheung, Kamal Rahimi Malekshan, Kanika Madan, Hooman Shayani | http://arxiv.org/pdf/2411.08017v1 | null |
2024-11-12 | JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation | JanusFlow:统一多模态理解和生成中的自回归与修正流和谐化 | Yiyang Ma, Xingchao Liu, Xiaokang Chen, Wen Liu, Chengyue Wu, Zhiyu Wu, Zizheng Pan, Zhenda Xie, Haowei Zhang, Xingkai yu, et.al. | http://arxiv.org/pdf/2411.07975v1 | null |
2024-11-12 | DuoLift-GAN: Reconstructing CT from Single-view and Biplanar X-Rays with Generative Adversarial Networks | DuoLift-GAN:利用生成对抗网络从单视图和双平面X射线重建CT | Zhaoxi Zhang, Yueliang Ying | http://arxiv.org/pdf/2411.07941v1 | null |
2024-11-12 | Diverse capability and scaling of diffusion and auto-regressive models when learning abstract rules | 扩散和自回归模型在学习抽象规则时的多样性和扩展性 | Binxu Wang, Jiaqi Shang, Haim Sompolinsky | http://arxiv.org/pdf/2411.07873v1 | null |
2024-11-12 | Interaction Asymmetry: A General Principle for Learning Composable Abstractions | 交互不对称性:学习可组合抽象的通用原则 | Jack Brady, Julius von Kügelgen, Sébastien Lachapelle, Simon Buchholz, Thomas Kipf, Wieland Brendel | http://arxiv.org/pdf/2411.07784v1 | null |
2024-11-12 | Novel View Synthesis with Pixel-Space Diffusion Models | 基于像素空间扩散模型的新视角合成 | Noam Elata, Bahjat Kawar, Yaron Ostrovsky-Berman, Miriam Farber, Ron Sokolovsky | http://arxiv.org/pdf/2411.07765v1 | null |
2024-11-12 | LapGSR: Laplacian Reconstructive Network for Guided Thermal Super-Resolution | LapGSR:用于引导式热成像超分辨率的拉普拉斯重建网络 | Aditya Kasliwal, Ishaan Gakhar, Aryan Kamani, Pratinav Seth, Ujjwal Verma | http://arxiv.org/pdf/2411.07750v1 | null |
2024-11-12 | Evaluating the Generation of Spatial Relations in Text and Image Generative Models | 评估文本和图像生成模型中的空间关系生成 | Shang Hong Sim, Clarence Lee, Alvin Tan, Cheston Tan | http://arxiv.org/pdf/2411.07664v1 | null |
2024-11-12 | Leveraging Previous Steps: A Training-free Fast Solver for Flow Diffusion | 利用先前步骤:一种无训练的快速流扩散求解器 | Kaiyu Song, Hanjiang Lai | http://arxiv.org/pdf/2411.07627v1 | null |
2024-11-12 | Unraveling the Connections between Flow Matching and Diffusion Probabilistic Models in Training-free Conditional Generation | 揭示免训练条件生成中流匹配与扩散概率模型之间的联系 | Kaiyu Song, Hanjiang Lai | http://arxiv.org/pdf/2411.07625v1 | null |
2024-11-12 | Artificial Intelligence for Biomedical Video Generation | 人工智能在生物医学视频生成中的应用 | Linyuan Li, Jianing Qiu, Anujit Saha, Lin Li, Poyuan Li, Mengxian He, Ziyu Guo, Wu Yuan | http://arxiv.org/pdf/2411.07619v1 | null |
2024-11-12 | Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors | 半真图像:用于评估AI生成图像检测器鲁棒性的大规模数据集 | Anisha Pal, Julia Kruk, Mansi Phute, Manognya Bhattaram, Diyi Yang, Duen Horng Chau, Judy Hoffman | http://arxiv.org/pdf/2411.07472v1 | null |
2024-11-12 | Tracing the Roots: Leveraging Temporal Dynamics in Diffusion Trajectories for Origin Attribution | 追踪根源:利用扩散轨迹的时间动态进行起源归因 | Andreas Floros, Seyed-Mohsen Moosavi-Dezfooli, Pier Luigi Dragotti | http://arxiv.org/pdf/2411.07449v1 | null |
2024-11-12 | All-in-one Weather-degraded Image Restoration via Adaptive Degradation-aware Self-prompting Model | 一体化恶劣天气退化图像修复:自适应退化感知自提示模型 | Yuanbo Wen, Tao Gao, Ziqi Li, Jing Zhang, Kaihao Zhang, Ting Chen | http://arxiv.org/pdf/2411.07445v1 | null |
2024-11-12 | Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models | 基于预训练扩散模型的免训练图像对象插入 | Yoad Tewel, Rinon Gal, Dvir Samuel, Yuval Atzmon, Lior Wolf, Gal Chechik | http://arxiv.org/pdf/2411.07232v2 | null |
2024-11-12 | CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense | CausalDiff:基于扩散模型的因果启发式解耦用于对抗防御 | Mingkun Zhang, Keping Bi, Wei Chen, Quanrun Chen, Jiafeng Guo, Xueqi Cheng | http://arxiv.org/pdf/2410.23091v3 | null |
2024-11-12 | Transformer-Based Tooth Alignment Prediction With Occlusion And Collision Constraints | 基于Transformer的带咬合与碰撞约束的牙齿排列预测 | ZhenXing Dong, JiaZhou Chen, YangHui Xu | http://arxiv.org/pdf/2410.20806v3 | null |
2024-11-12 | HYPNOS: Highly Precise Foreground-focused Diffusion Finetuning for Inanimate Objects | HYPNOS:针对非生物物体的高精度前景聚焦扩散微调 | Oliverio Theophilus Nathanael, Jonathan Samuel Lumentut, Nicholas Hans Muliawan, Edbert Valencio Angky, Felix Indra Kurniadi, Alfi Yusrotis Zakiyyah, Jeklin Harefa | http://arxiv.org/pdf/2410.14265v2 | null |
2024-11-12 | Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization | 使用成对样本优化调优时间步蒸馏扩散模型 | Zichen Miao, Zhengyuan Yang, Kevin Lin, Ze Wang, Zicheng Liu, Lijuan Wang, Qiang Qiu | http://arxiv.org/pdf/2410.03190v2 | null |
2024-11-12 | GenRec: Unifying Video Generation and Recognition with Diffusion Models | GenRec:通过扩散模型统一视频生成与识别 | Zejia Weng, Xitong Yang, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang | http://arxiv.org/pdf/2408.15241v2 | null |
2024-11-12 | Leveraging Pre-trained Models for FF-to-FFPE Histopathological Image Translation | 利用预训练模型进行FF至FFPE病理图像转换 | Qilai Zhang, Jiawen Li, Peiran Liao, Jiali Hu, Tian Guan, Anjia Han, Yonghong He | http://arxiv.org/pdf/2406.18054v2 | link |
2024-11-12 | Neural Gaffer: Relighting Any Object via Diffusion | Neural Gaffer:通过扩散对任意物体重新打光 | Haian Jin, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, Jin Sun, Noah Snavely | http://arxiv.org/pdf/2406.07520v3 | null |
2024-11-12 | Video Diffusion Models are Training-free Motion Interpreter and Controller | 视频扩散模型:无需训练的运动解释器和控制器 | Zeqi Xiao, Yifan Zhou, Shuai Yang, Xingang Pan | http://arxiv.org/pdf/2405.14864v3 | null |
2024-11-12 | Functional Imaging Constrained Diffusion for Brain PET Synthesis from Structural MRI | 基于功能成像约束扩散模型的结构MRI到脑PET合成 | Minhui Yu, Mengqi Wu, Ling Yue, Andrea Bozoki, Mingxia Liu | http://arxiv.org/pdf/2405.02504v3 | null |
2024-11-12 | Improving Training-free Conditional Diffusion Model via Fisher Information | 通过Fisher信息改进无训练条件扩散模型 | Kaiyu Song, Hanjiang Lai | http://arxiv.org/pdf/2404.18252v2 | null |
2024-11-12 | Exploring Diverse Methods in Visual Question Answering | 探索视觉问答中的多种方法 | Panfeng Li, Qikai Yang, Xieming Geng, Wenjing Zhou, Zhicheng Ding, Yi Nian | http://arxiv.org/pdf/2404.13565v3 | null |
2024-11-12 | DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling | DreamScape:基于高斯泼溅联合相关性建模的3D场景生成 | Xuening Yuan, Hongyu Yang, Yueming Zhao, Di Huang | http://arxiv.org/pdf/2404.09227v2 | null |
2024-11-12 | Diffusion Models Meet Remote Sensing: Principles, Methods, and Perspectives | 扩散模型与遥感:原理、方法与展望 | Yidan Liu, Jun Yue, Shaobo Xia, Pedram Ghamisi, Weiying Xie, Leyuan Fang | http://arxiv.org/pdf/2404.08926v3 | null |
2024-11-12 | LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection | 基于潜在重建误差的扩散生成图像检测方法:LaRE^2 | Yunpeng Luo, Junlong Du, Ke Yan, Shouhong Ding | http://arxiv.org/pdf/2403.17465v3 | null |
2024-11-12 | LEO: Generative Latent Image Animator for Human Video Synthesis | LEO:用于人类视频合成的生成式潜在图像动画器 | Yaohui Wang, Xin Ma, Xinyuan Chen, Cunjian Chen, Antitza Dantcheva, Bo Dai, Yu Qiao | http://arxiv.org/pdf/2305.03989v3 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-12 | Commissioning An All-Sky Infrared Camera Array for Detection Of Airborne Objects | 用于空中目标探测的全天空红外相机阵列调试 | Laura Dominé, Ankit Biswas, Richard Cloete, Alex Delacroix, Andriy Fedorenko, Lucas Jacaruso, Ezra Kelderman, Eric Keto, Sarah Little, Abraham Loeb, et.al. | http://arxiv.org/pdf/2411.07956v1 | null |
2024-11-12 | SimBase: A Simple Baseline for Temporal Video Grounding | SimBase:一种简单的时间视频定位基线 | Peijun Bao, Alex C. Kot | http://arxiv.org/pdf/2411.07945v1 | null |
2024-11-12 | Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge | 面向边缘野生动物监测的视觉混合专家算法 | Emmanuel Azuh Mensah, Anderson Lee, Haoran Zhang, Yitong Shan, Kurtis Heimerl | http://arxiv.org/pdf/2411.07834v1 | null |
2024-11-12 | Constraint Learning for Parametric Point Cloud | 参数点云的约束学习 | Xi Cheng, Ruiqi Lei, Di Huang, Zhichao Liao, Fengyuan Piao, Yan Chen, Pingfa Feng, Long Zeng | http://arxiv.org/pdf/2411.07747v1 | null |
2024-11-12 | Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG | 基于ImageRAG增强超高清遥感图像分析 | Zilun Zhang, Haozhan Shen, Tiancheng Zhao, Yuhao Wang, Bin Chen, Yuxiang Cai, Yongheng Shang, Jianwei Yin | http://arxiv.org/pdf/2411.07688v1 | null |
2024-11-12 | Understanding Audiovisual Deepfake Detection: Techniques, Challenges, Human Factors and Perceptual Insights | 理解视听深度伪造检测:技术、挑战、人因因素与感知洞察 | Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang | http://arxiv.org/pdf/2411.07650v1 | null |
2024-11-12 | Contrastive Language Prompting to Ease False Positives in Medical Anomaly Detection | 对比语言提示以缓解医学异常检测中的误报 | YeongHyeon Park, Myung Jin Kim, Hyeong Seok Kim | http://arxiv.org/pdf/2411.07546v1 | null |
2024-11-12 | SparrowVQE: Visual Question Explanation for Course Content Understanding | SparrowVQE:课程内容理解的视觉问题解释 | Jialu Li, Manish Kumar Thota, Ruslan Gokhman, Radek Holik, Youshan Zhang | http://arxiv.org/pdf/2411.07516v1 | null |
2024-11-12 | MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data | MSEG-VCUQ:基于增强视觉基础模型、卷积神经网络和不确定性量化进行多模态分割的高速视频相位检测数据处理 | Chika Maduabuchi, Ericmoore Jossou, Matteo Bucci | http://arxiv.org/pdf/2411.07463v1 | null |
2024-11-12 | BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions | BLIP3-KALE:知识增强的大规模密集式标题 | Anas Awadalla, Le Xue, Manli Shu, An Yan, Jun Wang, Senthil Purushwalkam, Sheng Shen, Hannah Lee, Oscar Lo, Jae Sung Park, et.al. | http://arxiv.org/pdf/2411.07461v1 | null |
2024-11-12 | LLMs Can Evolve Continually on Modality for X-Modal Reasoning | 大型语言模型能够在模态上持续进化以实现跨模态推理 | Jiazuo Yu, Haomiao Xiong, Lu Zhang, Haiwen Diao, Yunzhi Zhuge, Lanqing Hong, Dong Wang, Huchuan Lu, You He, Long Chen | http://arxiv.org/pdf/2410.20178v2 | link |
2024-11-12 | MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks | MEGA-Bench:将多模态评估扩展至超过500个真实世界任务 | Jiacheng Chen, Tianhao Liang, Sherman Siu, Zhengqing Wang, Kai Wang, Yubo Wang, Yuansheng Ni, Wang Zhu, Ziyan Jiang, Bohan Lyu, et.al. | http://arxiv.org/pdf/2410.10563v2 | null |
2024-11-12 | MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions | MIRAGE:印度普通处方中标注的多模态鉴别与识别 | Tavish Mankash, V. S. Chaithanya Kota, Anish De, Praveen Prakash, Kshitij Jadhav | http://arxiv.org/pdf/2410.09729v2 | null |
2024-11-12 | Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance | 在语义特征融合引导下将Segment Anything模型适配于多模态显著目标检测 | Kunpeng Wang, Danying Lin, Chenglong Li, Zhengzheng Tu, Bin Luo | http://arxiv.org/pdf/2408.15063v4 | link |
2024-11-12 | L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection | L4DR:用于天气鲁棒的3D目标检测的激光雷达-4D雷达融合 | Xun Huang, Ziyu Xu, Hai Wu, Jinlong Wang, Qiming Xia, Yan Xia, Jonathan Li, Kyle Gao, Chenglu Wen, Cheng Wang | http://arxiv.org/pdf/2408.03677v4 | null |
2024-11-12 | Pseudo-triplet Guided Few-shot Composed Image Retrieval | 伪三元组引导的少样本合成图像检索 | Bohan Hou, Haoqiang Lin, Haokun Wen, Meng Liu, Mingzhu Xu, Xuemeng Song | http://arxiv.org/pdf/2407.06001v2 | null |
2024-11-12 | MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations | MMLongBench-Doc:含可视化内容的长上下文文档理解基准测试 | Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, et.al. | http://arxiv.org/pdf/2407.01523v3 | null |
2024-11-12 | OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | OmAgent:基于任务分治的复杂视频理解多模态智能体框架 | Lu Zhang, Tiancheng Zhao, Heting Ying, Yibo Ma, Kyusong Lee | http://arxiv.org/pdf/2406.16620v3 | link |
2024-11-12 | Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags | 通过检索标签提醒多模态大型语言模型具备物体感知知识 | Daiqing Qi, Handong Zhao, Zijun Wei, Sheng Li | http://arxiv.org/pdf/2406.10839v3 | null |
2024-11-12 | Enhance Image-to-Image Generation with LLaVA-generated Prompts | 利用LLaVA生成的提示增强图像到图像生成 | Zhicheng Ding, Panfeng Li, Qikai Yang, Siyang Li | http://arxiv.org/pdf/2406.01956v3 | null |
2024-11-12 | Meta-Learned Modality-Weighted Knowledge Distillation for Robust Multi-Modal Learning with Missing Data | 元学习模态加权知识蒸馏,用于具有缺失数据的鲁棒多模态学习 | Hu Wang, Salma Hassan, Yuyuan Liu, Congbo Ma, Yuanhong Chen, Yutong Xie, Mostafa Salem, Yu Tian, Jodie Avery, Louise Hull, et.al. | http://arxiv.org/pdf/2405.07155v2 | link |
2024-11-12 | Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective | 重新审视视觉语言模型的对抗鲁棒性:多模态视角 | Wanqi Zhou, Shuanghao Bai, Danilo P. Mandic, Qibin Zhao, Badong Chen | http://arxiv.org/pdf/2404.19287v3 | link |
2024-11-12 | How Does the Textual Information Affect the Retrieval of Multimodal In-Context Learning? | 文本信息如何影响多模态情境学习的检索? | Yang Luo, Zangwei Zheng, Zirui Zhu, Yang You | http://arxiv.org/pdf/2404.12866v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-12 | Material Transforms from Disentangled NeRF Representations | 基于解耦NeRF表示的材质变换 | Ivan Lopes, Jean-François Lalonde, Raoul de Charette | http://arxiv.org/pdf/2411.08037v1 | null |
2024-11-12 | LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS | LightGaussian:无界3D高斯压缩,15倍缩减与200+ FPS | Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang | http://arxiv.org/pdf/2311.17245v6 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-12 | Projecting Gaussian Ellipsoids While Avoiding Affine Projection Approximation | 在避免仿射投影逼近的同时投影高斯椭球体 | Han Qi, Tao Cai, Xiyue Han | http://arxiv.org/pdf/2411.07579v1 | null |
2024-11-12 | GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting | GaussianCut:基于图割的3D高斯泼溅交互式分割 | Umangi Jain, Ashkan Mirzaei, Igor Gilitschenski | http://arxiv.org/pdf/2411.07555v1 | null |
2024-11-12 | HiCoM: Hierarchical Coherent Motion for Streamable Dynamic Scene with 3D Gaussian Splatting | HiCoM:基于3D高斯泼溅的可流式动态场景分层连贯运动 | Qiankun Gao, Jiarui Meng, Chengxiang Wen, Jie Chen, Jian Zhang | http://arxiv.org/pdf/2411.07541v1 | null |
2024-11-12 | GUS-IR: Gaussian Splatting with Unified Shading for Inverse Rendering | GUS-IR:基于统一着色的高斯溅射逆渲染 | Zhihao Liang, Hongdong Li, Kui Jia, Kailing Guo, Qi Zhang | http://arxiv.org/pdf/2411.07478v1 | null |
2024-11-12 | SplatFormer: Point Transformer for Robust 3D Gaussian Splatting | SplatFormer:用于鲁棒3D高斯Splatting的点变换器 | Yutong Chen, Marko Mihajlovic, Xiyi Chen, Yiming Wang, Sergey Prokudin, Siyu Tang | http://arxiv.org/pdf/2411.06390v2 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-12 | DINO-LG: A Task-Specific DINO Model for Coronary Calcium Scoring | DINO-LG:针对冠状动脉钙化评分的特定任务DINO模型 | Mahmut S. Gokmen, Cody Bumgardner, Caner Ozcan | http://arxiv.org/pdf/2411.07976v1 | null |
2024-11-12 | Efficient 3D Perception on Multi-Sweep Point Cloud with Gumbel Spatial Pruning | 基于Gumbel空间剪枝的多扫点云高效3D感知 | Jianhao Li, Tianyu Sun, Xueqian Zhang, Zhongdao Wang, Bailan Feng, Hengshuang Zhao | http://arxiv.org/pdf/2411.07742v1 | null |
2024-11-12 | Quantifying Knowledge Distillation Using Partial Information Decomposition | 基于部分信息分解的知识蒸馏量化 | Pasan Dissanayake, Faisal Hamman, Barproda Halder, Ilia Sucholutsky, Qiuyi Zhang, Sanghamitra Dutta | http://arxiv.org/pdf/2411.07483v1 | null |
2024-11-12 | Zero-Shot NAS via the Suppression of Local Entropy Decrease | 通过抑制局部熵减的零样本NAS | Ning Wu, Han Huang, Yueting Xu, Zhifeng Hao | http://arxiv.org/pdf/2411.06236v2 | null |
2024-11-12 | TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration | 异构智能体协作迁移视觉-语言基础模型 | Yiwei Guo, Shaobin Zhuang, Kunchang Li, Yu Qiao, Yali Wang | http://arxiv.org/pdf/2410.12183v2 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-12 | Automatic dataset shift identification to support root cause analysis of AI performance drift | 自动数据集偏移识别以支持AI性能漂移的根本原因分析 | Mélanie Roschewitz, Raghav Mehta, Charles Jones, Ben Glocker | http://arxiv.org/pdf/2411.07940v1 | null |
2024-11-12 | Isometric Transformations for Image Augmentation in Mueller Matrix Polarimetry | 等距变换在穆勒矩阵极化光度测量图像增强中的应用 | Christopher Hahne, Omar Rodriguez-Nunez, Éléa Gros, Théotim Lucas, Ekkehard Hewer, Tatiana Novikova, Theoni Maragkou, Philippe Schucht, Richard McKinley | http://arxiv.org/pdf/2411.07918v1 | null |
2024-11-12 | TLDR: Traffic Light Detection using Fourier Domain Adaptation in Hostile WeatheR | 基于傅里叶域自适应的恶劣天气下交通灯检测 | Ishaan Gakhar, Aryesh Guha, Aryaman Gupta, Amit Agarwal, Durga Toshniwal, Ujjwal Verma | http://arxiv.org/pdf/2411.07901v1 | null |
2024-11-12 | INTRABENCH: Interactive Radiological Benchmark | INTRABENCH:交互式放射学基准 | Constantin Ulrich, Tassilo Wald, Emily Tempus, Maximilian Rokuss, Paul F. Jaeger, Klaus Maier-Hein | http://arxiv.org/pdf/2411.07885v1 | null |
2024-11-12 | CDXFormer: Boosting Remote Sensing Change Detection with Extended Long Short-Term Memory | CDXFormer:借助扩展长短期记忆增强遥感变化检测 | Zhenkai Wu, Xiaowen Ma, Rongrong Lian, Zhentao Lin, Wei Zhang | http://arxiv.org/pdf/2411.07863v1 | null |
2024-11-12 | Large-scale Remote Sensing Image Target Recognition and Automatic Annotation | 大规模遥感图像目标识别与自动标注 | Wuzheng Dong | http://arxiv.org/pdf/2411.07802v1 | null |
2024-11-12 | Horticultural Temporal Fruit Monitoring via 3D Instance Segmentation and Re-Identification using Point Clouds | 基于点云的3D实例分割与再识别的园艺水果时空监测 | Daniel Fusaro, Federico Magistri, Jens Behley, Alberto Pretto, Cyrill Stachniss | http://arxiv.org/pdf/2411.07799v1 | null |
2024-11-12 | AdaSemiCD: An Adaptive Semi-Supervised Change Detection Method Based on Pseudo-Label Evaluation | AdaSemiCD:基于伪标签评估的自适应半监督变化检测方法 | Ran Lingyan, Wen Dongcheng, Zhuo Tao, Zhang Shizhou, Zhang Xiuwei, Zhang Yanning | http://arxiv.org/pdf/2411.07758v1 | null |
2024-11-12 | ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction | ALOcc:基于自适应提升的3D语义占用与基于代价体的流预测 | Dubing Chen, Jin Fang, Wencheng Han, Xinjing Cheng, Junbo Yin, Chenzhong Xu, Fahad Shahbaz Khan, Jianbing Shen | http://arxiv.org/pdf/2411.07725v1 | null |
2024-11-12 | EMPERROR: A Flexible Generative Perception Error Model for Probing Self-Driving Planners | EMPERROR:用于探测自动驾驶规划器的灵活生成感知错误模型 | Niklas Hanselmann, Simon Doll, Marius Cordts, Hendrik P. A. Lensch, Andreas Geiger | http://arxiv.org/pdf/2411.07719v1 | null |
2024-11-12 | Emotion Classification of Children Expressions | 儿童表情情绪分类 | Sanchayan Vivekananthan | http://arxiv.org/pdf/2411.07708v1 | null |
2024-11-12 | AI enhanced diagnosis of Peyronies disease a novel approach using Computer Vision | 基于计算机视觉的AI增强佩罗尼病诊断:一种新方法 | Yudara Kularathne, Janitha Prathapa, Prarththanan Sothyrajah, Salomi Arasaratnam, Sithira Ambepitiya, Thanveer Ahamed, Dinuka Wijesundara | http://arxiv.org/pdf/2411.07684v1 | null |
2024-11-12 | HMIL: Hierarchical Multi-Instance Learning for Fine-Grained Whole Slide Image Classification | 分层多实例学习用于细粒度全切片图像分类 | Cheng Jin, Luyang Luo, Huangjing Lin, Jun Hou, Hao Chen | http://arxiv.org/pdf/2411.07660v1 | null |
2024-11-12 | Mix from Failure: Confusion-Pairing Mixup for Long-Tailed Recognition | 基于失败的混合:长尾识别的混淆对混合 | Youngseok Yoon, Sangwoo Hong, Hyungjoon Joo, Yao Qin, Haewon Jeong, Jungwoo Lee | http://arxiv.org/pdf/2411.07621v1 | null |
2024-11-12 | Quantum Information-Empowered Graph Neural Network for Hyperspectral Change Detection | 量子信息赋能的高光谱变化检测图神经网络 | Chia-Hsiang Lin, Tzu-Hsuan Lin, Jocelyn Chanussot | http://arxiv.org/pdf/2411.07608v1 | null |
2024-11-12 | SegQC: a segmentation network-based framework for multi-metric segmentation quality control and segmentation error detection in volumetric medical images | 基于分割网络的体积医学图像多指标分割质量控制与分割错误检测框架:SegQC | Bella Specktor-Fadida, Liat Ben-Sira, Dafna Ben-Bashat, Leo Joskowicz | http://arxiv.org/pdf/2411.07601v1 | null |
2024-11-12 | Semantic segmentation on multi-resolution optical and microwave data using deep learning | 基于深度学习在多分辨率光波和微波数据上的语义分割 | Jai G Singla, Bakul Vaghela | http://arxiv.org/pdf/2411.07581v1 | null |
2024-11-12 | Depthwise Separable Convolutions with Deep Residual Convolutions | 结合深度残差卷积的深度可分离卷积 | Md Arid Hasan, Krishno Dey | http://arxiv.org/pdf/2411.07544v1 | null |
2024-11-12 | A Novel Automatic Real-time Motion Tracking Method for Magnetic Resonance Imaging-guided Radiotherapy: Leveraging the Enhanced Tracking-Learning-Detection Framework with Automatic Segmentation | 一种基于磁共振成像引导的放疗的新型自动实时运动跟踪方法:利用增强型跟踪-学习-检测框架与自动分割 | Shengqi Chen, Zilin Wang, Jianrong Dai, Shirui Qin, Ying Cao, Ruiao Zhao, Jiayun Chen, Guohua Wu, Yuan Tang | http://arxiv.org/pdf/2411.07503v1 | null |
2024-11-12 | Gaussian Process Emulators for Few-Shot Segmentation in Cardiac MRI | 高斯过程仿真器在心脏MRI少样本分割中的应用 | Bruno Viti, Franz Thaler, Kathrin Lisa Kapper, Martin Urschler, Martin Holler, Elias Karabelas | http://arxiv.org/pdf/2411.06911v2 | link |
2024-11-12 | WavShadow: Wavelet Based Shadow Segmentation and Removal | WavShadow:基于小波变换的阴影分割与去除 | Shreyans Jain, Viraj Vekaria, Karan Gandhi, Aadya Arora | http://arxiv.org/pdf/2411.05747v3 | null |
2024-11-12 | CALoR: Towards Comprehensive Model Inversion Defense | CALoR:迈向全面模型反演防御 | Hongyao Yu, Yixiang Qiu, Hao Fang, Bin Chen, Sijin Yu, Bin Wang, Shu-Tao Xia, Ke Xu | http://arxiv.org/pdf/2410.05814v2 | null |
2024-11-12 | Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification | 视觉分类中的泛化逻辑推理正则化:解读你的决策 | Zhaorui Tan, Xi Yang, Qiufeng Wang, Anh Nguyen, Kaizhu Huang | http://arxiv.org/pdf/2410.04492v4 | link |
2024-11-12 | Style Transfer: From Stitching to Neural Networks | 风格迁移:从拼接到神经网络 | Xinhe Xu, Zhuoer Wang, Yihan Zhang, Yizhou Liu, Zhaoyue Wang, Zhihao Xu, Muhan Zhao, Huaiying Luo | http://arxiv.org/pdf/2409.00606v3 | null |
2024-11-12 | Transfer Learning for Wildlife Classification: Evaluating YOLOv8 against DenseNet, ResNet, and VGGNet on a Custom Dataset | 基于迁移学习野生动物分类:在自定义数据集上评估YOLOv8与DenseNet、ResNet和VGGNet | Subek Sharma, Sisir Dhakal, Mansi Bhavsar | http://arxiv.org/pdf/2408.00002v2 | null |
2024-11-12 | Memory-Efficient Pseudo-Labeling for Online Source-Free Universal Domain Adaptation using a Gaussian Mixture Model | 基于高斯混合模型的内存高效伪标签在线无源通用域自适应方法 | Pascal Schlachter, Simon Wagner, Bin Yang | http://arxiv.org/pdf/2407.14208v2 | link |
2024-11-12 | Scalar Function Topology Divergence: Comparing Topology of 3D Objects | 标量函数拓扑散度:比较三维对象的拓扑结构 | Ilya Trofimov, Daria Voronkova, Eduard Tulchinskii, Evgeny Burnaev, Serguei Barannikov | http://arxiv.org/pdf/2407.08364v3 | link |
2024-11-12 | SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention | SCSA:探索空间注意力与通道注意力之间的协同效应 | Yunzhong Si, Huiying Xu, Xinzhong Zhu, Wenhao Zhang, Yao Dong, Yuxing Chen, Hongbo Li | http://arxiv.org/pdf/2407.05128v2 | link |
2024-11-12 | Odd-One-Out: Anomaly Detection by Comparing with Neighbors | 异常检测:与邻居比较的“异类”识别 | Ankan Bhunia, Changjian Li, Hakan Bilen | http://arxiv.org/pdf/2406.20099v2 | link |
2024-11-12 | Utilizing Graph Generation for Enhanced Domain Adaptive Object Detection | 利用图生成增强领域自适应目标检测 | Mu Wang | http://arxiv.org/pdf/2406.06535v3 | null |
2024-11-12 | Human-in-the-Loop Segmentation of Multi-species Coral Imagery | 人机交互的多物种珊瑚图像分割 | Scarlett Raine, Ross Marchant, Brano Kusy, Frederic Maire, Niko Suenderhauf, Tobias Fischer | http://arxiv.org/pdf/2404.09406v3 | link |
2024-11-12 | CIMIL-CRC: a clinically-informed multiple instance learning framework for patient-level colorectal cancer molecular subtypes classification from H&E stained images | CIMIL-CRC:基于临床信息的多实例学习框架,用于从HE染色图像中对患者级别的结直肠癌分子亚型进行分类 | Hadar Hezi, Matan Gelber, Alexander Balabanov, Yosef E. Maruvka, Moti Freiman | http://arxiv.org/pdf/2401.16131v2 | null |
2024-11-12 | WildScenes: A Benchmark for 2D and 3D Semantic Segmentation in Large-scale Natural Environments | WildScenes:大规模自然环境中二维和三维语义分割基准 | Kavisha Vidanapathirana, Joshua Knights, Stephen Hausler, Mark Cox, Milad Ramezani, Jason Jooste, Ethan Griffiths, Shaheer Mohamed, Sridha Sridharan, Clinton Fookes, et.al. | http://arxiv.org/pdf/2312.15364v2 | link |
2024-11-12 | TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance | 基于神经元溯源的联邦学习中可解释性驱动的调试:TraceFL | Waris Gill, Ali Anwar, Muhammad Ali Gulzar | http://arxiv.org/pdf/2312.13632v3 | link |
2024-11-12 | TUNeS: A Temporal U-Net with Self-Attention for Video-based Surgical Phase Recognition | TUNeS:一种用于视频手术阶段识别的时序U-Net和自注意力机制 | Isabel Funke, Dominik Rivoir, Stefanie Krell, Stefanie Speidel | http://arxiv.org/pdf/2307.09997v5 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-12 | xCG: Explainable Cell Graphs for Survival Prediction in Non-Small Cell Lung Cancer | xCG:用于非小细胞肺癌生存预测的可解释细胞图 | Marvin Sextro, Gabriel Dernbach, Kai Standvoss, Simon Schallenberg, Frederick Klauschen, Klaus-Robert Müller, Maximilian Alber, Lukas Ruff | http://arxiv.org/pdf/2411.07643v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-12 | MureObjectStitch: Multi-reference Image Composition | 多参考图像拼接:MureObjectStitch | Jiaxuan Chen, Bo Zhang, Li Niu | http://arxiv.org/pdf/2411.07462v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-12 | LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models | LLMPhy:基于大型语言模型和世界模型的复杂物理推理 | Anoop Cherian, Radu Corcodel, Siddarth Jain, Diego Romeres | http://arxiv.org/pdf/2411.08027v1 | null |
2024-11-12 | Grounded Video Caption Generation | 基于情境的视频字幕生成 | Evangelos Kazakos, Cordelia Schmid, Josef Sivic | http://arxiv.org/pdf/2411.07584v1 | null |
2024-11-12 | LAUREL: Learned Augmented Residual Layer | 学习增强残差层 | Gaurav Menghani, Ravi Kumar, Sanjiv Kumar | http://arxiv.org/pdf/2411.07501v1 | null |
2024-11-12 | Exploring Advanced Large Language Models with LLMsuite | 探索高级大型语言模型:LLMsuite方法 | Giorgio Roffo | http://arxiv.org/pdf/2407.12036v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-12 | Rendering-Oriented 3D Point Cloud Attribute Compression using Sparse Tensor-based Transformer | 基于稀疏张量Transformer的面向渲染的3D点云属性压缩 | Xiao Huo, Junhui Ho, Shuai Wan, Fuzheng Yang | http://arxiv.org/pdf/2411.07899v1 | null |
2024-11-12 | Joint multi-dimensional dynamic attention and transformer for general image restoration | 联合多维度动态注意力与Transformer的通用图像修复 | Huan Zhang, Xu Zhang, Nian Cai, Jianglei Di, Yun Zhang | http://arxiv.org/pdf/2411.07893v1 | null |
2024-11-12 | 3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration | 三维聚焦与匹配网络在多实例点云配准中的应用 | Liyuan Zhang, Le Hui, Qi Liu, Bo Li, Yuchao Dai | http://arxiv.org/pdf/2411.07740v1 | null |
2024-11-12 | Fast Disentangled Slim Tensor Learning for Multi-view Clustering | 快速解耦精简张量学习在多视角聚类中的应用 | Deng Xu, Chao Zhang, Zechao Li, Chunlin Chen, Huaxiong Li | http://arxiv.org/pdf/2411.07685v1 | null |
2024-11-12 | Breaking the Low-Rank Dilemma of Linear Attention | 打破线性注意力的低秩困境 | Qihang Fan, Huaibo Huang, Ran He | http://arxiv.org/pdf/2411.07635v1 | null |
2024-11-12 | Multi-task Feature Enhancement Network for No-Reference Image Quality Assessment | 多任务特征增强网络用于无参考图像质量评估 | Li Yu | http://arxiv.org/pdf/2411.07556v1 | null |
2024-11-12 | Extreme Rotation Estimation in the Wild | 野外的极端旋转估计 | Hana Bezalel, Dotan Ankri, Ruojin Cai, Hadar Averbuch-Elor | http://arxiv.org/pdf/2411.07096v2 | null |
2024-11-12 | Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution | 解耦精细细节与全局几何的压缩深度图超分辨率 | Huan Zheng, Wencheng Han, Jianbing Shen | http://arxiv.org/pdf/2411.03239v2 | null |
2024-11-12 | PhyTracker: An Online Tracker for Phytoplankton | PhyTracker:浮游植物在线追踪器 | Yang Yu, Qingxuan Lv, Yuezun Li, Zhiqiang Wei, Junyu Dong | http://arxiv.org/pdf/2407.00352v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-12 | No-Reference Point Cloud Quality Assessment via Graph Convolutional Network | 基于图卷积网络的无需参考点云质量评估 | Wu Chen, Qiuping Jiang, Wei Zhou, Feng Shao, Guangtao Zhai, Weisi Lin | http://arxiv.org/pdf/2411.07728v1 | null |
2024-11-12 | IR image databases generation under target intrinsic thermal variability constraints | 基于目标内在热变异性约束的IR图像数据库生成 | Jerome Gilles, Stephane Landeau, Tristan Dagobert, Philippe Chevalier, Christian Bolut | http://arxiv.org/pdf/2411.07577v1 | null |
2024-11-12 | ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation | ReKep:机器人操作中关系关键点约束的时空推理 | Wenlong Huang, Chen Wang, Yunzhu Li, Ruohan Zhang, Li Fei-Fei | http://arxiv.org/pdf/2409.01652v2 | null |
2024-11-12 | High-throughput 3D shape completion of potato tubers on a harvester | 收获机上马铃薯块茎的高通量三维形状补全 | Pieter M. Blok, Federico Magistri, Cyrill Stachniss, Haozhou Wang, James Burridge, Wei Guo | http://arxiv.org/pdf/2407.21341v3 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-12 | Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype | 利用域内类别感知原型增强开放域持续学习 | Yadong Lu, Shitian Zhao, Boxiang Yun, Dongsheng Jiang, Yin Li, Qingli Li, Yan Wang | http://arxiv.org/pdf/2408.09984v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-11-12 | Artistic Neural Style Transfer Algorithms with Activation Smoothing | 艺术风格迁移神经算法的激活平滑技术 | Xiangtian Li, Han Cao, Zhaoyang Zhang, Jiacheng Hu, Yuhui Jin, Zihao Zhao | http://arxiv.org/pdf/2411.08014v1 | null |
2024-11-12 | Learning Disentangled Representations for Perceptual Point Cloud Quality Assessment via Mutual Information Minimization | 通过互信息最小化学习感知点云质量评估的解耦表示 | Ziyu Shan, Yujie Zhang, Yipeng Liu, Yiling Xu | http://arxiv.org/pdf/2411.07936v1 | null |
2024-11-12 | NL-SLAM for OC-VLN: Natural Language Grounded SLAM for Object-Centric VLN | NL-SLAM for OC-VLN:面向以物体为中心的视觉语言导航的自然语言落地SLAM | Sonia Raychaudhuri, Duy Ta, Katrina Ashton, Angel X. Chang, Jiuguang Wang, Bernadette Bucher | http://arxiv.org/pdf/2411.07848v1 | null |
2024-11-12 | SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model | SAV-SE:基于选择性状态空间模型的场景感知视听语音增强 | Xinyuan Qian, Jiaran Gao, Yaodan Zhang, Qiquan Zhang, Hexin Liu, Leibny Paola Garcia, Haizhou Li | http://arxiv.org/pdf/2411.07751v1 | null |
2024-11-12 | Maritime Search and Rescue Missions with Aerial Images: A Survey | 基于航空图像的海洋搜救任务:综述 | Juan P. Martinez-Esteso, Francisco J. Castellanos, Jorge Calvo-Zaragoza, Antonio Javier Gallego | http://arxiv.org/pdf/2411.07649v1 | null |
2024-11-12 | Atmospheric turbulence restoration by diffeomorphic image registration and blind deconvolution | 基于微分同胚图像配准和盲去卷积的大气湍流复原 | Jerome Gilles, Tristan Dagobert, Carlo De Franchis | http://arxiv.org/pdf/2411.07578v1 | null |
2024-11-12 | Génération de bases de données images IR sous contraintes avec variabilité thermique intrinsèque des cibles | 基于内禀热变性的图像红外数据库约束生成 | Jerome Gilles, Stephane Landeau, Tristan Dagobert, Philippe Chevalier, Christian Bolut | http://arxiv.org/pdf/2411.07575v1 | null |
2024-11-12 | Uncertainty-Aware Test-Time Adaptation for Inverse Consistent Diffeomorphic Lung Image Registration | 面向逆一致微分同胚肺部图像配准的不确定性感知测试时自适应 | Muhammad F. A. Chaudhary, Stephanie M. Aguilera, Arie Nakhmani, Joseph M. Reinhardt, Surya P. Bhatt, Sandeep Bodduluri | http://arxiv.org/pdf/2411.07567v1 | null |
2024-11-12 | Act in Collusion: A Persistent Distributed Multi-Target Backdoor in Federated Learning | 合谋行动:联邦学习中的持久性分布式多目标后门 | Tao Liu, Wu Yang, Chen Xu, Jiguang Lv, Huanran Wang, Yuhang Zhang, Shuchun Xu, Dapeng Man | http://arxiv.org/pdf/2411.03926v2 | null |
2024-11-12 | Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight | 基于模仿的自举视觉敏捷飞行强化学习 | Jiaxu Xing, Angel Romero, Leonard Bauersfeld, Davide Scaramuzza | http://arxiv.org/pdf/2403.12203v3 | null |
2024-11-12 | Temporal-Mapping Photography for Event Cameras | 面向事件相机的时间映射摄影 | Yuhan Bao, Lei Sun, Yuqin Ma, Kaiwei Wang | http://arxiv.org/pdf/2403.06443v2 | link |
2024-11-12 | REVEX: A Unified Framework for Removal-Based Explainable Artificial Intelligence in Video | REVEX:视频中基于移除的可解释人工智能统一框架 | F. Xavier Gaya-Morey, Jose M. Buades-Rubio, I. Scott MacKenzie, Cristina Manresa-Yee | http://arxiv.org/pdf/2401.11796v2 | null |