2024-12-07.md

[UPDATED!] 2024-12-07 (Update Time)

3DGS

| Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-07 | SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes | 光谱运动:镜面场景的动态3D重建 | Cheng-De Fan, Chen-Wei Chang, Yi-Ruei Liu, Jie-Ying Lee, Jiun-Long Huang, Yu-Chee Tseng, Yu-Lun Liu | http://arxiv.org/pdf/2410.17249v2 | None |
| 2024-12-07 | Temporally Compressed 3D Gaussian Splatting for Dynamic Scenes | 时间压缩3D高斯分层渲染动态场景 | Saqib Javed, Ahmad Jarrar Khan, Corentin Dumery, Chen Zhao, Mathieu Salzmann | http://arxiv.org/pdf/2412.05700v1 | None |
| 2024-12-07 | CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians | CoherentGS:基于一致三维高斯稀疏新视角合成 | Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, Nima Khademi Kalantari | http://arxiv.org/pdf/2403.19495v2 | None |
| 2024-12-07 | LumiGauss: Relightable Gaussian Splatting in the Wild | LumiGauss:野外可重光照的高斯散斑技术 | Joanna Kaleta, Kacper Kania, Tomasz Trzcinski, Marek Kowalski | http://arxiv.org/pdf/2408.04474v2 | https://github.com/joaxkal/lumigauss |
| 2024-12-07 | Rate-Distortion Optimized Skip Coding of Region Adaptive Hierarchical Transform Coefficients for MPEG G-PCC | 基于率失真优化的MPEG G-PCC区域自适应分层变换系数跳过编码 | Zehan Wang, Yuxuan Wei, Hui Yuan, Wei Zhang, Peng Li | http://arxiv.org/pdf/2412.05574v1 | None |
| 2024-12-07 | Template-free Articulated Gaussian Splatting for Real-time Reposable Dynamic View Synthesis | 无模板关节高斯喷溅实时可重用动态视图合成 | Diwen Wan, Yuxiang Wang, Ruijie Lu, Gang Zeng | http://arxiv.org/pdf/2412.05570v1 | None |
| 2024-12-07 | A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision | 三维高斯Splats生成中的二维监督引导扩散:一次教学经验 | Chensheng Peng, Ido Sobol, Masayoshi Tomizuka, Kurt Keutzer, Chenfeng Xu, Or Litany | http://arxiv.org/pdf/2412.00623v2 | None |
| 2024-12-07 | Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation | 基于物理的动态生成与文本到3D高斯分层渲染 | Wenqing Wang, Yun Fu | http://arxiv.org/pdf/2412.05560v1 | None |
| 2024-12-07 | Radiant: Large-scale 3D Gaussian Rendering based on Hierarchical Framework | Radiant:基于分层框架的大规模3D高斯渲染 | Haosong Peng, Tianyu Qi, Yufeng Zhan, Hao Li, Yalun Dai, Yuanqing Xia | http://arxiv.org/pdf/2412.05546v1 | None |

3D Vision and Reconstruction

| Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-07 | MetaFood3D: 3D Food Dataset with Nutrition Values | MetaFood3D:含营养价值的3D食品数据集 | Yuhao Chen, Jiangpeng He, Gautham Vinod, Siddeshwar Raghavan, Chris Czarnecki, Jinge Ma, Talha Ibn Mahmud, Bruce Coburn | http://arxiv.org/pdf/2409.01966v2 | None |
| 2024-12-07 | RefSAM3D: Adapting SAM with Cross-modal Reference for 3D Medical Image Segmentation | RefSAM3D:基于跨模态参考的SAM自适应3D医学图像分割 | Xiang Gao, Kai Lu | http://arxiv.org/pdf/2412.05605v1 | None |
| 2024-12-07 | Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space | 地球的全球密集嵌入:主要TOM在潜在空间中漂浮 | Mikolaj Czerkawski, Marcin Kluczek, Jędrzej S. Bojanowski | http://arxiv.org/pdf/2412.05600v1 | None |
| 2024-12-07 | Revisiting the Role of Texture in 3D Person Re-identification | 重新审视纹理在3D人体重识别中的作用 | Huy Nguyen, Kien Nguyen, Akila Pemasiri, Sridha Sridharan, Clinton Fookes | http://arxiv.org/pdf/2410.00348v2 | None |
| 2024-12-07 | TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances | TB-HSU:基于情境适应性的分层3D场景理解 | Wenting Xu, Viorela Ila, Luping Zhou, Craig T. Jin | http://arxiv.org/pdf/2412.05596v1 | None |
| 2024-12-07 | Self-Supervised Masked Mesh Learning for Unsupervised Anomaly Detection on 3D Cortical Surfaces | 自监督掩码网格学习用于3D皮质表面的无监督异常检测 | Hao-Chun Yang, Sicheng Dai, Saige Rutherford, Christian Gaser, Andre F Marquand, Christian F Beckmann, Thomas Wolfers | http://arxiv.org/pdf/2412.05580v1 | None |
| 2024-12-07 | UMSPU: Universal Multi-Size Phase Unwrapping via Mutual Self-Distillation and Adaptive Boosting Ensemble Segmenters | UMSPU:基于互信息自蒸馏和自适应增强集成分割器的通用多尺度相位展开 | Lintong Du, Huazhen Liu, Yijia Zhang, ShuXin Liu, Yuan Qu, Zenghui Zhang, Jiamiao Yang | http://arxiv.org/pdf/2412.05584v1 | None |
| 2024-12-07 | CoE: Deep Coupled Embedding for Non-Rigid Point Cloud Correspondences | 深度耦合嵌入用于非刚性点云对应 | Huajian Zeng, Maolin Gao, Daniel Cremers | http://arxiv.org/pdf/2412.05557v1 | None |
| 2024-12-07 | Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters | 制作动画化:一个高效的三维角色动画制作框架 | Zhiyang Guo, Jinxu Xiang, Kai Ma, Wengang Zhou, Houqiang Li, Ran Zhang | http://arxiv.org/pdf/2411.18197v2 | None |
| 2024-12-07 | Street Gaussians without 3D Object Tracker | 《无需3D目标跟踪的街景高斯分布》 | Ruida Zhang, Chengxi Li, Chenyangguang Zhang, Xingyu Liu, Haili Yuan, Yanyan Li, Xiangyang Ji, Gim Hee Lee | http://arxiv.org/pdf/2412.05548v1 | None |
| 2024-12-07 | Point-GN: A Non-Parametric Network Using Gaussian Positional Encoding for Point Cloud Classification | 点云分类中的高斯位置编码非参数网络:Point-GN | Marzieh Mohammadi, Amir Salarpour | http://arxiv.org/pdf/2412.03056v2 | None |
| 2024-12-07 | Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features | 关键网格:基于网格热图特征的无监督3D关键点检测 | Chengkai Hou, Zhengrong Xue, Bingyang Zhou, Jinghan Ke, Lin Shao, Huazhe Xu | http://arxiv.org/pdf/2410.02237v3 | None |
| 2024-12-07 | SignAvatar: Sign Language 3D Motion Reconstruction and Generation | SignAvatar:手语3D运动重建与生成 | Lu Dong, Lipisha Chaudhary, Fei Xu, Xiao Wang, Mason Lary, Ifeoma Nwogu | http://arxiv.org/pdf/2405.07974v2 | None |
| 2024-12-07 | AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration | 自动URDF:基于聚类配准的无监督点云帧机器人建模 | Jiong Lin, Lechen Zhang, Kwansoo Lee, Jialong Ning, Judah Goldfeder, Hod Lipson | http://arxiv.org/pdf/2412.05507v1 | None |
| 2024-12-07 | LaMoD: Latent Motion Diffusion Model For Myocardial Strain Generation | LaMoD:心肌应变生成的潜在运动扩散模型 | Jiarui Xing, Nivetha Jayakumar, Nian Wu, Yu Wang, Frederick H. Epstein, Miaomiao Zhang | http://arxiv.org/pdf/2407.02229v2 | https://github.com/jr-xing/LaMoD |

Image Generation and Editing

| Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-07 | Compositional Image Retrieval via Instruction-Aware Contrastive Learning | 基于指令感知对比学习的组合图像检索 | Wenliang Zhong, Weizhi An, Feng Jiang, Hehuan Ma, Yuzhi Guo, Junzhou Huang | http://arxiv.org/pdf/2412.05756v1 | None |
| 2024-12-07 | Emulating Clinical Quality Muscle B-mode Ultrasound Images from Plane Wave Images Using a Two-Stage Machine Learning Model | 模拟临床质量肌肉B模式超声图像的二维平面波图像使用两阶段机器学习模型 | Reed Chen, Courtney Trutna Paley, Wren Wightman, Lisa Hobson-Webb, Yohei Harada, Felix Jin, Ouwen Huang, Mark Palmeri | http://arxiv.org/pdf/2412.05758v1 | None |
| 2024-12-07 | FIPER: Generalizable Factorized Fields for Joint Image Compression and Super-Resolution | FIPER:用于联合图像压缩和超分辨率的一般化分解域 | Yang-Che Sun, Cheng Yu Yeo, Ernie Chu, Jun-Cheng Chen, Yu-Lun Liu | http://arxiv.org/pdf/2410.18083v2 | None |
| 2024-12-07 | A Tiered GAN Approach for Monet-Style Image Generation | 分层生成对抗网络在莫奈风格图像生成中的应用 | FNU Neha, Deepshikha Bhati, Deepak Kumar Shukla, Md Amiruzzaman | http://arxiv.org/pdf/2412.05724v1 | None |
| 2024-12-07 | Evaluating Hallucination in Text-to-Image Diffusion Models with Scene-Graph based Question-Answering Agent | 评估基于场景图问答代理的文本到图像扩散模型中的幻觉 | Ziyuan Qin, Dongjie Cheng, Haoyu Wang, Huahui Yi, Yuting Shao, Zhiyuan Fan, Kang Li, Qicheng Lao | http://arxiv.org/pdf/2412.05722v1 | None |
| 2024-12-07 | T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts | T2I-FactualBench:基于知识密集型概念的文本到图像模型事实性基准测试 | Ziwei Huang, Wanggui He, Quanyu Long, Yandi Wang, Haoyuan Li, Zhelun Yu, Fangxun Shu, Long Chan | http://arxiv.org/pdf/2412.04300v2 | None |
| 2024-12-07 | Jointly RS Image Deblurring and Super-Resolution with Adjustable-Kernel and Multi-Domain Attention | 联合可调核和多域注意力下的RS图像去模糊与超分辨率 | Yan Zhang, Pengcheng Zheng, Chengxiao Zeng, Bin Xiao, Zhenghao Li, Xinbo Gao | http://arxiv.org/pdf/2412.05696v1 | https://github.com/zpc456/AKMD-Net |
| 2024-12-07 | One-for-All: Towards Universal Domain Translation with a Single StyleGAN | 一网打尽:基于单个StyleGAN的通用领域翻译 | Yong Du, Jiahui Zhan, Xinzhe Li, Junyu Dong, Sheng Chen, Ming-Hsuan Yang, Shengfeng He | http://arxiv.org/pdf/2310.14222v2 | None |
| 2024-12-07 | Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising | Remix-DiT:混合扩散变换器以实现多专家去噪 | Gongfan Fang, Xinyin Ma, Xinchao Wang | http://arxiv.org/pdf/2412.05628v1 | https://github.com/VainF/Remix-DiT |
| 2024-12-07 | Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC | 我们需要为不同任务设计特定的扩散模型吗?尝试ONE-PIC | Ming Tao, Bing-Kun Bao, Yaowei Wang, Changsheng Xu | http://arxiv.org/pdf/2412.05619v1 | https://github.com/tobran/ONE-PIC |
| 2024-12-07 | SimGen: Simulator-conditioned Driving Scene Generation | SimGen:模拟器条件下的驾驶场景生成 | Yunsong Zhou, Michael Simon, Zhenghao Peng, Sicheng Mo, Hongzi Zhu, Minyi Guo, Bolei Zhou | http://arxiv.org/pdf/2406.09386v3 | None |
| 2024-12-07 | Learning Hierarchical Color Guidance for Depth Map Super-Resolution | 学习用于深度图超分辨率的层次化颜色引导 | Runmin Cong, Ronghui Sheng, Hao Wu, Yulan Guo, Yunchao Wei, Wangmeng Zuo, Yao Zhao, Sam Kwong | http://arxiv.org/pdf/2403.07290v2 | None |
| 2024-12-07 | SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion | SwiftEdit:通过一步扩散实现闪电般的文本引导图像编辑 | Trong-Tung Nguyen, Quang Nguyen, Khoi Nguyen, Anh Tran, Cuong Pham | http://arxiv.org/pdf/2412.04301v2 | None |
| 2024-12-07 | Name Your Style: An Arbitrary Artist-aware Image Style Transfer | 命名你的风格:任意艺术家感知的图像风格迁移 | Zhi-Song Liu, Li-Wen Wang, Wan-Chi Siu, Vicky Kalogeiton | http://arxiv.org/pdf/2202.13562v3 | None |
| 2024-12-07 | Uncovering Vision Modality Threats in Image-to-Image Tasks | 揭示图像到图像任务中的视觉模态威胁 | Hao Cheng, Erjia Xiao, Jiayan Yang, Jiahang Cao, Qiang Zhang, Jize Zhang, Kaidi Xu, Jindong Gu | http://arxiv.org/pdf/2412.05538v1 | None |
| 2024-12-07 | Test-time Cost-and-Quality Controllable Arbitrary-Scale Super-Resolution with Variable Fourier Components | 基于可变傅里叶分量的测试时成本与质量可控任意尺度超分辨率 | Kazutoshi Akita, Norimichi Ukita | http://arxiv.org/pdf/2412.05517v1 | None |
| 2024-12-07 | A Comparative Study of Image Denoising Algorithms | 图像去噪算法的比较研究 | Muhammad Umair Danish | http://arxiv.org/pdf/2412.05490v1 | None |
| 2024-12-07 | Enhancing Sample Generation of Diffusion Models using Noise Level Correction | 基于噪声水平校正增强扩散模型样本生成 | Abulikemu Abuduweili, Chenyang Yuan, Changliu Liu, Frank Permenter | http://arxiv.org/pdf/2412.05488v1 | None |

Multimodal Learning

| Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-07 | Multimodal Fusion Balancing Through Game-Theoretic Regularization | 通过博弈论正则化的多模态融合平衡 | Konstantinos Kontras, Thomas Strypsteen, Christos Chatzichristos, Paul Pu Liang, Matthew Blaschko, Maarten De Vos | http://arxiv.org/pdf/2411.07335v2 | None |
| 2024-12-07 | HMGIE: Hierarchical and Multi-Grained Inconsistency Evaluation for Vision-Language Data Cleansing | 层次化和多粒度不一致性评估用于视觉-语言数据清洗 | Zihao Zhu, Hongbao Zhang, Guanzong Wu, Siwei Lyu, Baoyuan Wu | http://arxiv.org/pdf/2412.05685v1 | None |
| 2024-12-07 | RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts | RSUniVLM:基于粒度导向的专家混合的遥感统一视觉语言模型 | Xu Liu, Zhouhui Lian | http://arxiv.org/pdf/2412.05679v1 | https://github.com/xuliu-cyber/RSUniVLM |
| 2024-12-07 | Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning? | 《多模态大型语言模型真的能实现多模态情境学习吗?》 | Shuo Chen, Zhen Han, Bailan He, Jianzhe Liu, Mark Buckley, Yao Qin, Philip Torr, Volker Tresp | http://arxiv.org/pdf/2311.18021v2 | None |
| 2024-12-07 | Multimodal Biometric Authentication Using Camera-Based PPG and Fingerprint Fusion | 基于摄像头PPG和指纹融合的多模态生物识别认证 | Xue Xian Zheng, M. M. Ur Rahma, Bilal Taha, Mudassir Masood, Dimitrios Hatzinakos, Tareq Al-Naffouri | http://arxiv.org/pdf/2412.05660v1 | None |
| 2024-12-07 | Enhancing CLIP Conceptual Embedding through Knowledge Distillation | 通过知识蒸馏增强CLIP概念嵌入 | Kuei-Chun Kao | http://arxiv.org/pdf/2412.03513v2 | None |
| 2024-12-07 | Biological Brain Age Estimation using Sex-Aware Adversarial Variational Autoencoder with Multimodal Neuroimages | 基于多模态神经图像的性别感知对抗变分自编码器在生物大脑年龄估计中的应用 | Abd Ur Rehman, Azka Rehman, Muhammad Usman, Abdullah Shahid, Sung-Min Gho, Aleum Lee, Tariq M. Khan, Imran Razzak | http://arxiv.org/pdf/2412.05632v1 | None |
| 2024-12-07 | Dif4FF: Leveraging Multimodal Diffusion Models and Graph Neural Networks for Accurate New Fashion Product Performance Forecasting | Dif4FF:利用多模态扩散模型和图神经网络进行准确的新时尚产品性能预测 | Andrea Avogaro, Luigi Capogrosso, Franco Fummi, Marco Cristani | http://arxiv.org/pdf/2412.05566v1 | None |
| 2024-12-07 | Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters | 基于混合适配器的预训练大模型在领域泛化中的应用 | Gyuseong Lee, Wooseok Jang, Jinhyeon Kim, Jaewoo Jung, Seungryong Kim | http://arxiv.org/pdf/2310.11031v2 | None |
| 2024-12-07 | WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition | WavFusion:迈向wav2vec 2.0 多模态语音情感识别 | Feng Li, Jiusong Luo, Wanjun Xia | http://arxiv.org/pdf/2412.05558v1 | None |
| 2024-12-07 | Comprehensive Evaluation of Multimodal AI Models in Medical Imaging Diagnosis: From Data Augmentation to Preference-Based Comparison | 多模态人工智能模型在医学影像诊断中的全面评估:从数据增强到基于偏好的比较 | Cailian Ruan, Chengyue Huang, Yahe Yang | http://arxiv.org/pdf/2412.05536v1 | None |
| 2024-12-07 | TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action | TACO:通过合成思维链与行动学习多模态动作模型 | Zixian Ma, Jianguo Zhang, Zhiwei Liu, Jieyu Zhang, Juntao Tan, Manli Shu, Juan Carlos Niebles, Shelby Heinecke | http://arxiv.org/pdf/2412.05479v1 | None |

Object Detection and Segmentation

| Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-07 | BEV-SUSHI: Multi-Target Multi-Camera 3D Detection and Tracking in Bird's-Eye View | BEV-SUSHI:鸟瞰视图中多目标多摄像头3D检测与跟踪 | Yizhou Wang, Tim Meinhardt, Orcun Cetintas, Cheng-Yen Yang, Sameer Satish Pusegaonkar, Benjamin Missaoui, Sujit Biswas, Zheng Tang | http://arxiv.org/pdf/2412.00692v2 | None |
| 2024-12-07 | Integrating YOLO11 and Convolution Block Attention Module for Multi-Season Segmentation of Tree Trunks and Branches in Commercial Apple Orchards | 将YOLO11与卷积块注意力模块集成,用于商业苹果园中树干和枝条的多季节分割 | Ranjan Sapkota, Manoj Karkee | http://arxiv.org/pdf/2412.05728v1 | None |
| 2024-12-07 | Impact of Sunglasses on One-to-Many Facial Identification Accuracy | 《太阳镜对一对一多人面部识别准确率的影响》 | Sicong Tian, Haiyu Wu, Michael C. King, Kevin W. Bowyer | http://arxiv.org/pdf/2412.05721v1 | None |
| 2024-12-07 | DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation | DeNVeR:用于无监督视频血管分割的可变形神经网络血管表示 | Chun-Hung Wu, Shih-Hong Chen, Chih-Yao Hu, Hsin-Yu Wu, Kai-Hsin Chen, Yu-You Chen, Chih-Hai Su, Chih-Kuo Lee | http://arxiv.org/pdf/2406.01591v3 | None |
| 2024-12-07 | Segment-Level Road Obstacle Detection Using Visual Foundation Model Priors and Likelihood Ratios | 基于视觉基础模型先验和似然比的道路障碍物段级检测 | Youssef Shoeb, Nazir Nayal, Azarm Nowzard, Fatma Güney, Hanno Gottschalk | http://arxiv.org/pdf/2412.05707v1 | None |
| 2024-12-07 | Early Diagnosis of Alzheimer's Diseases and Dementia from MRI Images Using an Ensemble Deep Learning | 基于集成深度学习的MRI图像早期诊断阿尔茨海默病和痴呆 | Mozhgan Naderi, Maryam Rastgarpour, Amir Reza Takhsha | http://arxiv.org/pdf/2412.05666v1 | None |
| 2024-12-07 | Nearly Solved? Robust Deepfake Detection Requires More than Visual Forensics | 几乎解决了吗?鲁棒的深度伪造检测需要超越视觉取证 | Guy Levy, Nathan Liebmann | http://arxiv.org/pdf/2412.05676v1 | None |
| 2024-12-07 | A Comprehensive Assessment Benchmark for Rigorously Evaluating Deep Learning Image Classifiers | 深度学习图像分类器严格评估的综合评估基准 | Michael W. Spratling | http://arxiv.org/pdf/2308.04137v2 | None |
| 2024-12-07 | Rethinking Annotation for Object Detection: Is Annotating Small-size Instances Worth Its Cost? | 重新思考目标检测的标注:标注小型实例是否物有所值? | Yusuke Hosoya, Masanori Suganuma, Takayuki Okatani | http://arxiv.org/pdf/2412.05611v1 | None |
| 2024-12-07 | Multispecies Animal Re-ID Using a Large Community-Curated Dataset | 多物种动物重识别:利用大型社区编纂数据集 | Lasha Otarashvili, Tamilselvan Subramanian, Jason Holmberg, J. J. Levenson, Charles V. Stewart | http://arxiv.org/pdf/2412.05602v1 | None |
| 2024-12-07 | Frequency Perception Network for Camouflaged Object Detection | 频率感知网络用于伪装物体检测 | Runmin Cong, Mengyao Sun, Sanyi Zhang, Xiaofei Zhou, Wei Zhang, Yao Zhao | http://arxiv.org/pdf/2308.08924v2 | None |
| 2024-12-07 | Real-Time 3D Object Detection Using InnovizOne LiDAR and Low-Power Hailo-8 AI Accelerator | 实时3D物体检测:基于InnovizOne激光雷达和低功耗Hailo-8 AI加速器 | Itay Krispin-Avraham, Roy Orfaig, Ben-Zion Bobrovsky | http://arxiv.org/pdf/2412.05594v1 | None |
| 2024-12-07 | Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection | 点感知交互与CNN诱导的细化网络用于RGB-D显著目标检测 | Runmin Cong, Hongyu Liu, Chen Zhang, Wei Zhang, Feng Zheng, Ran Song, Sam Kwong | http://arxiv.org/pdf/2308.08930v2 | None |
| 2024-12-07 | SDDNet: Style-guided Dual-layer Disentanglement Network for Shadow Detection | SDDNet:用于阴影检测的基于风格的分层解耦网络 | Runmin Cong, Yuchen Guan, Jinpeng Chen, Wei Zhang, Yao Zhao, Sam Kwong | http://arxiv.org/pdf/2308.08935v2 | None |
| 2024-12-07 | Robust Sequential DeepFake Detection | 鲁棒性序列深度伪造检测 | Rui Shao, Tianxing Wu, Ziwei Liu | http://arxiv.org/pdf/2309.14991v2 | None |
| 2024-12-07 | Unsupervised Gait Recognition with Selective Fusion | 无监督步态识别中的选择性融合 | Xuqian Ren, Shaopeng Yang, Saihui Hou, Chunshui Cao, Xu Liu, Yongzhen Huang | http://arxiv.org/pdf/2303.10772v3 | None |
| 2024-12-07 | Query-guided Prototype Evolution Network for Few-Shot Segmentation | 基于查询引导的原型进化网络用于少样本分割 | Runmin Cong, Hang Xiong, Jinpeng Chen, Wei Zhang, Qingming Huang, Yao Zhao | http://arxiv.org/pdf/2403.06488v2 | None |
| 2024-12-07 | Jailbreak Large Vision-Language Models Through Multi-Modal Linkage | 通过多模态链接破解大型视觉-语言模型 | Yu Wang, Xiaofei Zhou, Yichen Wang, Geyuan Zhang, Tianxing He | http://arxiv.org/pdf/2412.00473v3 | https://github.com/wangyu-ovo/MML |
| 2024-12-07 | UNet++ and LSTM combined approach for Breast Ultrasound Image Segmentation | 基于UNet++和LSTM的乳腺超声图像分割联合方法 | Saba Hesaraki, Morteza Akbari, Ramin Mousa | http://arxiv.org/pdf/2412.05585v1 | None |
| 2024-12-07 | From Deterministic to Probabilistic: A Novel Perspective on Domain Generalization for Medical Image Segmentation | 从确定性到概率性:医学图像分割领域泛化的新视角 | Yuheng Xu, Taiping Zhang | http://arxiv.org/pdf/2412.05572v1 | None |
| 2024-12-07 | Exploiting Precision Mapping and Component-Specific Feature Enhancement for Breast Cancer Segmentation and Identification | 利用精确映射和组件特定特征增强进行乳腺癌分割与识别 | Pandiyaraju V, Shravan Venkatraman, Pavan Kumar S, Santhosh Malarvannan, Kannan A | http://arxiv.org/pdf/2407.02844v5 | None |
| 2024-12-07 | Neighborhood Commonality-aware Evolution Network for Continuous Generalized Category Discovery | 邻域共性感知的连续广义类别发现进化网络 | Ye Wang, Yaxiong Wang, Guoshuai Zhao, Xueming Qian | http://arxiv.org/pdf/2412.05573v1 | https://github.com/xjtuYW/NCENet.git |
| 2024-12-07 | Psych-Occlusion: Using Visual Psychophysics for Aerial Detection of Occluded Persons during Search and Rescue | 心理遮挡:利用视觉心理物理学进行搜救期间空中遮挡人员检测 | Arturo Miguel Russell Bernal, Jane Cleland-Huang, Walter Scheirer | http://arxiv.org/pdf/2412.05553v1 | https://github.com/ArtRuss/NOMAD |
| 2024-12-07 | SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts | SAME:基于状态自适应专家混合的通用语言引导视觉导航学习 | Gengze Zhou, Yicong Hong, Zun Wang, Chongyang Zhao, Mohit Bansal, Qi Wu | http://arxiv.org/pdf/2412.05552v1 | None |
| 2024-12-07 | TLDR: Text Based Last-layer Retraining for Debiasing Image Classifiers | 基于文本的最后一层重新训练以去偏图像分类器 | Juhyeon Park, Seokhyeon Jeong, Taesup Moon | http://arxiv.org/pdf/2311.18291v2 | https://github.com/beotborry/TLDR |
| 2024-12-07 | GAQAT: gradient-adaptive quantization-aware training for domain generalization | GAQAT:用于领域泛化的梯度自适应量化感知训练 | Jiacheng Jiang, Yuan Meng, Chen Tang, Han Yu, Qun Li, Zhi Wang, Wenwu Zhu | http://arxiv.org/pdf/2412.05551v1 | None |
| 2024-12-07 | NOMAD: A Natural, Occluded, Multi-scale Aerial Dataset, for Emergency Response Scenarios | NOMAD:一种用于应急响应场景的自然、遮挡、多尺度航空数据集 | Arturo Miguel Russell Bernal, Walter Scheirer, Jane Cleland-Huang | http://arxiv.org/pdf/2309.09518v2 | https://github.com/ArtRuss/NOMAD |
| 2024-12-07 | Action Recognition based Industrial Safety Violation Detection | 基于动作识别的工业安全违规检测 | Surya N Reddy, Vaibhav Kurrey, Mayank Nagar, Gagan Raj Gupta | http://arxiv.org/pdf/2412.05531v1 | None |
| 2024-12-07 | CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images | CLIP-TNseg:超声图像甲状腺结节分割的多模态混合框架 | Xinjie Sun, Boxiong Wei, Yalong Jiang, Liquan Mao, Qi Zhao | http://arxiv.org/pdf/2412.05530v1 | https://github.com/jayxjsun/CLIP-TNseg |
| 2024-12-07 | RealDex: Towards Human-like Grasping for Robotic Dexterous Hand | RealDex:迈向类人抓取的机器人灵巧手 | Yumeng Liu, Yaxun Yang, Youzhuo Wang, Xiaofei Wu, Jiamin Wang, Yichen Yao, Sören Schwertfeger, Sibei Yang | http://arxiv.org/pdf/2402.13853v2 | None |
| 2024-12-07 | DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection | 深度伪造适配器:用于深度伪造检测的双层适配器 | Rui Shao, Tianxing Wu, Liqiang Nie, Ziwei Liu | http://arxiv.org/pdf/2306.00863v2 | https://github.com/rshaojimmy/DeepFake-Adapter |
| 2024-12-07 | Securing Social Media Against Deepfakes using Identity, Behavioral, and Geometric Signatures | 利用身份、行为和几何签名保障社交媒体免受深度伪造攻击 | Muhammad Umar Farooq, Awais Khan, Ijaz Ul Haq, Khalid Mahmood Malik | http://arxiv.org/pdf/2412.05487v1 | None |
| 2024-12-07 | DiffuBox: Refining 3D Object Detection with Point Diffusion | DiffuBox:基于点扩散的3D目标检测精炼 | Xiangyu Chen, Zhenzhen Liu, Katie Z Luo, Siddhartha Datta, Adhitya Polavaram, Yan Wang, Yurong You, Boyi Li | http://arxiv.org/pdf/2405.16034v2 | https://github.com/cxy1997/DiffuBox |

Video Understanding and Processing

| Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-07 | Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events | 黑天鹅:不可预测事件中的推论和可反驳的视频推理 | Aditya Chinchure, Sahithya Ravi, Raymond Ng, Vered Shwartz, Boyang Li, Leonid Sigal | http://arxiv.org/pdf/2412.05725v1 | None |
| 2024-12-07 | Efficient Continuous Video Flow Model for Video Prediction | 高效连续视频流模型用于视频预测 | Gaurav Shrivastava, Abhinav Shrivastava | http://arxiv.org/pdf/2412.05633v1 | None |
| 2024-12-07 | Video2Reward: Generating Reward Function from Videos for Legged Robot Behavior Learning | 视频2奖励:为腿部机器人行为学习生成奖励函数 | Runhao Zeng, Dingjie Zhou, Qiwei Liang, Junlin Liu, Hui Li, Changxin Huang, Jianqiang Li, Xiping Hu | http://arxiv.org/pdf/2412.05515v1 | None |

Other

| Date | English Title | Chinese Title | Authors | PDF Link | Code Link |
| --- | --- | --- | --- | --- | --- |
| 2024-12-07 | Evaluating Automated Radiology Report Quality through Fine-Grained Phrasal Grounding of Clinical Findings | 通过临床发现的细粒度短语定位评估自动放射学报告质量 | Razi Mahmood, Pingkun Yan, Diego Machado Reyes, Ge Wang, Mannudeep K. Kalra, Parisa Kaviani, Joy T. Wu, Tanveer Syeda-Mahmood | http://arxiv.org/pdf/2412.01031v2 | None |
| 2024-12-07 | Neural network interpretability with layer-wise relevance propagation: novel techniques for neuron selection and visualization | 基于层级相关传播的神经网络可解释性:神经元选择和可视化的新方法 | Deepshikha Bhati, Fnu Neha, Md Amiruzzaman, Angela Guercio, Deepak Kumar Shukla, Ben Ward | http://arxiv.org/pdf/2412.05686v1 | None |
| 2024-12-07 | PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation | PrefixKV:视觉指令跟随模型高效生成所需的自适应前缀键值缓存 | Ao Wang, Hui Chen, Jianchao Tan, Kefeng Zhang, Xunliang Cai, Zijia Lin, Jungong Han, Guiguang Ding | http://arxiv.org/pdf/2412.03409v2 | https://github.com/THU-MIG/PrefixKV |
| 2024-12-07 | Automated LaTeX Code Generation from Handwritten Math Expressions Using Vision Transformer | 基于视觉Transformer的手写数学表达式自动LaTeX代码生成 | Jayaprakash Sundararaj, Akhil Vyas, Benjamin Gonzalez-Maldonado | http://arxiv.org/pdf/2412.03853v2 | None |