Date | English title | Chinese title | Authors | PDF link | Code link |
---|---|---|---|---|---|
2024-12-07 | SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes | 光谱运动:镜面场景的动态3D重建 | Cheng-De Fan, Chen-Wei Chang, Yi-Ruei Liu, Jie-Ying Lee, Jiun-Long Huang, Yu-Chee Tseng, Yu-Lun Liu | http://arxiv.org/pdf/2410.17249v2 | None |
2024-12-07 | Temporally Compressed 3D Gaussian Splatting for Dynamic Scenes | 时间压缩3D高斯分层渲染动态场景 | Saqib Javed, Ahmad Jarrar Khan, Corentin Dumery, Chen Zhao, Mathieu Salzmann | http://arxiv.org/pdf/2412.05700v1 | None |
2024-12-07 | CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians | CoherentGS:基于一致三维高斯稀疏新视角合成 | Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, Nima Khademi Kalantari | http://arxiv.org/pdf/2403.19495v2 | None |
2024-12-07 | LumiGauss: Relightable Gaussian Splatting in the Wild | LumiGauss:野外可重光照的高斯散斑技术 | Joanna Kaleta, Kacper Kania, Tomasz Trzcinski, Marek Kowalski | http://arxiv.org/pdf/2408.04474v2 | https://github.com/joaxkal/lumigauss |
2024-12-07 | Rate-Distortion Optimized Skip Coding of Region Adaptive Hierarchical Transform Coefficients for MPEG G-PCC | 基于率失真优化的MPEG G-PCC区域自适应分层变换系数跳过编码 | Zehan Wang, Yuxuan Wei, Hui Yuan, Wei Zhang, Peng Li | http://arxiv.org/pdf/2412.05574v1 | None |
2024-12-07 | Template-free Articulated Gaussian Splatting for Real-time Reposable Dynamic View Synthesis | 无模板关节高斯喷溅实时可重用动态视图合成 | Diwen Wan, Yuxiang Wang, Ruijie Lu, Gang Zeng | http://arxiv.org/pdf/2412.05570v1 | None |
2024-12-07 | A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision | 三维高斯Splats生成中的二维监督引导扩散:一次教学经验 | Chensheng Peng, Ido Sobol, Masayoshi Tomizuka, Kurt Keutzer, Chenfeng Xu, Or Litany | http://arxiv.org/pdf/2412.00623v2 | None |
2024-12-07 | Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation | 基于物理的动态生成与文本到3D高斯分层渲染 | Wenqing Wang, Yun Fu | http://arxiv.org/pdf/2412.05560v1 | None |
2024-12-07 | Radiant: Large-scale 3D Gaussian Rendering based on Hierarchical Framework | Radiant:基于分层框架的大规模3D高斯渲染 | Haosong Peng, Tianyu Qi, Yufeng Zhan, Hao Li, Yalun Dai, Yuanqing Xia | http://arxiv.org/pdf/2412.05546v1 | None |
Date | English title | Chinese title | Authors | PDF link | Code link |
---|---|---|---|---|---|
2024-12-07 | MetaFood3D: 3D Food Dataset with Nutrition Values | MetaFood3D:含营养价值的3D食品数据集 | Yuhao Chen, Jiangpeng He, Gautham Vinod, Siddeshwar Raghavan, Chris Czarnecki, Jinge Ma, Talha Ibn Mahmud, Bruce Coburn | http://arxiv.org/pdf/2409.01966v2 | None |
2024-12-07 | RefSAM3D: Adapting SAM with Cross-modal Reference for 3D Medical Image Segmentation | RefSAM3D:基于跨模态参考的SAM自适应3D医学图像分割 | Xiang Gao, Kai Lu | http://arxiv.org/pdf/2412.05605v1 | None |
2024-12-07 | Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space | 地球的全球密集嵌入:Major TOM漂浮在潜在空间中 | Mikolaj Czerkawski, Marcin Kluczek, Jędrzej S. Bojanowski | http://arxiv.org/pdf/2412.05600v1 | None |
2024-12-07 | Revisiting the Role of Texture in 3D Person Re-identification | 重新审视纹理在3D人体重识别中的作用 | Huy Nguyen, Kien Nguyen, Akila Pemasiri, Sridha Sridharan, Clinton Fookes | http://arxiv.org/pdf/2410.00348v2 | None |
2024-12-07 | TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances | TB-HSU:基于情境适应性的分层3D场景理解 | Wenting Xu, Viorela Ila, Luping Zhou, Craig T. Jin | http://arxiv.org/pdf/2412.05596v1 | None |
2024-12-07 | Self-Supervised Masked Mesh Learning for Unsupervised Anomaly Detection on 3D Cortical Surfaces | 自监督掩码网格学习用于3D皮质表面的无监督异常检测 | Hao-Chun Yang, Sicheng Dai, Saige Rutherford, Christian Gaser, Andre F Marquand, Christian F Beckmann, Thomas Wolfers | http://arxiv.org/pdf/2412.05580v1 | None |
2024-12-07 | UMSPU: Universal Multi-Size Phase Unwrapping via Mutual Self-Distillation and Adaptive Boosting Ensemble Segmenters | UMSPU:基于互信息自蒸馏和自适应增强集成分割器的通用多尺度相位展开 | Lintong Du, Huazhen Liu, Yijia Zhang, ShuXin Liu, Yuan Qu, Zenghui Zhang, Jiamiao Yang | http://arxiv.org/pdf/2412.05584v1 | None |
2024-12-07 | CoE: Deep Coupled Embedding for Non-Rigid Point Cloud Correspondences | 深度耦合嵌入用于非刚性点云对应 | Huajian Zeng, Maolin Gao, Daniel Cremers | http://arxiv.org/pdf/2412.05557v1 | None |
2024-12-07 | Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters | 制作动画化:一个高效的三维角色动画制作框架 | Zhiyang Guo, Jinxu Xiang, Kai Ma, Wengang Zhou, Houqiang Li, Ran Zhang | http://arxiv.org/pdf/2411.18197v2 | None |
2024-12-07 | Street Gaussians without 3D Object Tracker | 无需3D目标跟踪器的街景高斯 | Ruida Zhang, Chengxi Li, Chenyangguang Zhang, Xingyu Liu, Haili Yuan, Yanyan Li, Xiangyang Ji, Gim Hee Lee | http://arxiv.org/pdf/2412.05548v1 | None |
2024-12-07 | Point-GN: A Non-Parametric Network Using Gaussian Positional Encoding for Point Cloud Classification | 点云分类中的高斯位置编码非参数网络:Point-GN | Marzieh Mohammadi, Amir Salarpour | http://arxiv.org/pdf/2412.03056v2 | None |
2024-12-07 | Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features | 关键网格:基于网格热图特征的无监督3D关键点检测 | Chengkai Hou, Zhengrong Xue, Bingyang Zhou, Jinghan Ke, Lin Shao, Huazhe Xu | http://arxiv.org/pdf/2410.02237v3 | None |
2024-12-07 | SignAvatar: Sign Language 3D Motion Reconstruction and Generation | SignAvatar:手语3D运动重建与生成 | Lu Dong, Lipisha Chaudhary, Fei Xu, Xiao Wang, Mason Lary, Ifeoma Nwogu | http://arxiv.org/pdf/2405.07974v2 | None |
2024-12-07 | AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration | 自动URDF:基于聚类配准的无监督点云帧机器人建模 | Jiong Lin, Lechen Zhang, Kwansoo Lee, Jialong Ning, Judah Goldfeder, Hod Lipson | http://arxiv.org/pdf/2412.05507v1 | None |
2024-12-07 | LaMoD: Latent Motion Diffusion Model For Myocardial Strain Generation | LaMoD:心肌应变生成的潜在运动扩散模型 | Jiarui Xing, Nivetha Jayakumar, Nian Wu, Yu Wang, Frederick H. Epstein, Miaomiao Zhang | http://arxiv.org/pdf/2407.02229v2 | https://github.com/jr-xing/LaMoD |
Date | English title | Chinese title | Authors | PDF link | Code link |
---|---|---|---|---|---|
2024-12-07 | Compositional Image Retrieval via Instruction-Aware Contrastive Learning | 基于指令感知对比学习的组合图像检索 | Wenliang Zhong, Weizhi An, Feng Jiang, Hehuan Ma, Yuzhi Guo, Junzhou Huang | http://arxiv.org/pdf/2412.05756v1 | None |
2024-12-07 | Emulating Clinical Quality Muscle B-mode Ultrasound Images from Plane Wave Images Using a Two-Stage Machine Learning Model | 使用两阶段机器学习模型从平面波图像模拟临床质量的肌肉B模式超声图像 | Reed Chen, Courtney Trutna Paley, Wren Wightman, Lisa Hobson-Webb, Yohei Harada, Felix Jin, Ouwen Huang, Mark Palmeri | http://arxiv.org/pdf/2412.05758v1 | None |
2024-12-07 | FIPER: Generalizable Factorized Fields for Joint Image Compression and Super-Resolution | FIPER:用于联合图像压缩和超分辨率的一般化分解域 | Yang-Che Sun, Cheng Yu Yeo, Ernie Chu, Jun-Cheng Chen, Yu-Lun Liu | http://arxiv.org/pdf/2410.18083v2 | None |
2024-12-07 | A Tiered GAN Approach for Monet-Style Image Generation | 分层生成对抗网络在莫奈风格图像生成中的应用 | FNU Neha, Deepshikha Bhati, Deepak Kumar Shukla, Md Amiruzzaman | http://arxiv.org/pdf/2412.05724v1 | None |
2024-12-07 | Evaluating Hallucination in Text-to-Image Diffusion Models with Scene-Graph based Question-Answering Agent | 评估基于场景图问答代理的文本到图像扩散模型中的幻觉 | Ziyuan Qin, Dongjie Cheng, Haoyu Wang, Huahui Yi, Yuting Shao, Zhiyuan Fan, Kang Li, Qicheng Lao | http://arxiv.org/pdf/2412.05722v1 | None |
2024-12-07 | T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts | T2I-FactualBench:基于知识密集型概念的文本到图像模型事实性基准测试 | Ziwei Huang, Wanggui He, Quanyu Long, Yandi Wang, Haoyuan Li, Zhelun Yu, Fangxun Shu, Long Chan | http://arxiv.org/pdf/2412.04300v2 | None |
2024-12-07 | Jointly RS Image Deblurring and Super-Resolution with Adjustable-Kernel and Multi-Domain Attention | 联合可调核和多域注意力下的RS图像去模糊与超分辨率 | Yan Zhang, Pengcheng Zheng, Chengxiao Zeng, Bin Xiao, Zhenghao Li, Xinbo Gao | http://arxiv.org/pdf/2412.05696v1 | https://github.com/zpc456/AKMD-Net |
2024-12-07 | One-for-All: Towards Universal Domain Translation with a Single StyleGAN | 一网打尽:基于单个StyleGAN的通用领域翻译 | Yong Du, Jiahui Zhan, Xinzhe Li, Junyu Dong, Sheng Chen, Ming-Hsuan Yang, Shengfeng He | http://arxiv.org/pdf/2310.14222v2 | None |
2024-12-07 | Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising | Remix-DiT:混合扩散变换器以实现多专家去噪 | Gongfan Fang, Xinyin Ma, Xinchao Wang | http://arxiv.org/pdf/2412.05628v1 | https://github.com/VainF/Remix-DiT |
2024-12-07 | Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC | 我们需要为不同任务设计特定的扩散模型吗?尝试ONE-PIC | Ming Tao, Bing-Kun Bao, Yaowei Wang, Changsheng Xu | http://arxiv.org/pdf/2412.05619v1 | https://github.com/tobran/ONE-PIC |
2024-12-07 | SimGen: Simulator-conditioned Driving Scene Generation | SimGen:模拟器条件下的驾驶场景生成 | Yunsong Zhou, Michael Simon, Zhenghao Peng, Sicheng Mo, Hongzi Zhu, Minyi Guo, Bolei Zhou | http://arxiv.org/pdf/2406.09386v3 | None |
2024-12-07 | Learning Hierarchical Color Guidance for Depth Map Super-Resolution | 学习用于深度图超分辨率的层次化颜色引导 | Runmin Cong, Ronghui Sheng, Hao Wu, Yulan Guo, Yunchao Wei, Wangmeng Zuo, Yao Zhao, Sam Kwong | http://arxiv.org/pdf/2403.07290v2 | None |
2024-12-07 | SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion | SwiftEdit:通过一步扩散实现闪电般的文本引导图像编辑 | Trong-Tung Nguyen, Quang Nguyen, Khoi Nguyen, Anh Tran, Cuong Pham | http://arxiv.org/pdf/2412.04301v2 | None |
2024-12-07 | Name Your Style: An Arbitrary Artist-aware Image Style Transfer | 命名你的风格:任意艺术家感知的图像风格迁移 | Zhi-Song Liu, Li-Wen Wang, Wan-Chi Siu, Vicky Kalogeiton | http://arxiv.org/pdf/2202.13562v3 | None |
2024-12-07 | Uncovering Vision Modality Threats in Image-to-Image Tasks | 揭示图像到图像任务中的视觉模态威胁 | Hao Cheng, Erjia Xiao, Jiayan Yang, Jiahang Cao, Qiang Zhang, Jize Zhang, Kaidi Xu, Jindong Gu | http://arxiv.org/pdf/2412.05538v1 | None |
2024-12-07 | Test-time Cost-and-Quality Controllable Arbitrary-Scale Super-Resolution with Variable Fourier Components | 基于可变傅里叶分量的测试时成本与质量可控任意尺度超分辨率 | Kazutoshi Akita, Norimichi Ukita | http://arxiv.org/pdf/2412.05517v1 | None |
2024-12-07 | A Comparative Study of Image Denoising Algorithms | 图像去噪算法的比较研究 | Muhammad Umair Danish | http://arxiv.org/pdf/2412.05490v1 | None |
2024-12-07 | Enhancing Sample Generation of Diffusion Models using Noise Level Correction | 基于噪声水平校正增强扩散模型样本生成 | Abulikemu Abuduweili, Chenyang Yuan, Changliu Liu, Frank Permenter | http://arxiv.org/pdf/2412.05488v1 | None |
Date | English title | Chinese title | Authors | PDF link | Code link |
---|---|---|---|---|---|
2024-12-07 | Multimodal Fusion Balancing Through Game-Theoretic Regularization | 通过博弈论正则化的多模态融合平衡 | Konstantinos Kontras, Thomas Strypsteen, Christos Chatzichristos, Paul Pu Liang, Matthew Blaschko, Maarten De Vos | http://arxiv.org/pdf/2411.07335v2 | None |
2024-12-07 | HMGIE: Hierarchical and Multi-Grained Inconsistency Evaluation for Vision-Language Data Cleansing | 层次化和多粒度不一致性评估用于视觉-语言数据清洗 | Zihao Zhu, Hongbao Zhang, Guanzong Wu, Siwei Lyu, Baoyuan Wu | http://arxiv.org/pdf/2412.05685v1 | None |
2024-12-07 | RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts | RSUniVLM:基于粒度导向的专家混合的遥感统一视觉语言模型 | Xu Liu, Zhouhui Lian | http://arxiv.org/pdf/2412.05679v1 | https://github.com/xuliu-cyber/RSUniVLM |
2024-12-07 | Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning? | 多模态大型语言模型真的能实现多模态情境学习吗? | Shuo Chen, Zhen Han, Bailan He, Jianzhe Liu, Mark Buckley, Yao Qin, Philip Torr, Volker Tresp | http://arxiv.org/pdf/2311.18021v2 | None |
2024-12-07 | Multimodal Biometric Authentication Using Camera-Based PPG and Fingerprint Fusion | 基于摄像头PPG和指纹融合的多模态生物识别认证 | Xue Xian Zheng, M. M. Ur Rahma, Bilal Taha, Mudassir Masood, Dimitrios Hatzinakos, Tareq Al-Naffouri | http://arxiv.org/pdf/2412.05660v1 | None |
2024-12-07 | Enhancing CLIP Conceptual Embedding through Knowledge Distillation | 通过知识蒸馏增强CLIP概念嵌入 | Kuei-Chun Kao | http://arxiv.org/pdf/2412.03513v2 | None |
2024-12-07 | Biological Brain Age Estimation using Sex-Aware Adversarial Variational Autoencoder with Multimodal Neuroimages | 基于多模态神经图像的性别感知对抗变分自编码器在生物大脑年龄估计中的应用 | Abd Ur Rehman, Azka Rehman, Muhammad Usman, Abdullah Shahid, Sung-Min Gho, Aleum Lee, Tariq M. Khan, Imran Razzak | http://arxiv.org/pdf/2412.05632v1 | None |
2024-12-07 | Dif4FF: Leveraging Multimodal Diffusion Models and Graph Neural Networks for Accurate New Fashion Product Performance Forecasting | Dif4FF:利用多模态扩散模型和图神经网络进行准确的新时尚产品性能预测 | Andrea Avogaro, Luigi Capogrosso, Franco Fummi, Marco Cristani | http://arxiv.org/pdf/2412.05566v1 | None |
2024-12-07 | Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters | 基于混合适配器的预训练大模型在领域泛化中的应用 | Gyuseong Lee, Wooseok Jang, Jinhyeon Kim, Jaewoo Jung, Seungryong Kim | http://arxiv.org/pdf/2310.11031v2 | None |
2024-12-07 | WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition | WavFusion:迈向wav2vec 2.0 多模态语音情感识别 | Feng Li, Jiusong Luo, Wanjun Xia | http://arxiv.org/pdf/2412.05558v1 | None |
2024-12-07 | Comprehensive Evaluation of Multimodal AI Models in Medical Imaging Diagnosis: From Data Augmentation to Preference-Based Comparison | 多模态人工智能模型在医学影像诊断中的全面评估:从数据增强到基于偏好的比较 | Cailian Ruan, Chengyue Huang, Yahe Yang | http://arxiv.org/pdf/2412.05536v1 | None |
2024-12-07 | TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action | TACO:通过合成思维链与行动学习多模态动作模型 | Zixian Ma, Jianguo Zhang, Zhiwei Liu, Jieyu Zhang, Juntao Tan, Manli Shu, Juan Carlos Niebles, Shelby Heinecke | http://arxiv.org/pdf/2412.05479v1 | None |
Date | English title | Chinese title | Authors | PDF link | Code link |
---|---|---|---|---|---|
2024-12-07 | BEV-SUSHI: Multi-Target Multi-Camera 3D Detection and Tracking in Bird's-Eye View | BEV-SUSHI:鸟瞰视图中多目标多摄像头3D检测与跟踪 | Yizhou Wang, Tim Meinhardt, Orcun Cetintas, Cheng-Yen Yang, Sameer Satish Pusegaonkar, Benjamin Missaoui, Sujit Biswas, Zheng Tang | http://arxiv.org/pdf/2412.00692v2 | None |
2024-12-07 | Integrating YOLO11 and Convolution Block Attention Module for Multi-Season Segmentation of Tree Trunks and Branches in Commercial Apple Orchards | 将YOLO11与卷积块注意力模块集成,用于商业苹果园中树干和枝条的多季节分割 | Ranjan Sapkota, Manoj Karkee | http://arxiv.org/pdf/2412.05728v1 | None |
2024-12-07 | Impact of Sunglasses on One-to-Many Facial Identification Accuracy | 太阳镜对一对多人脸识别准确率的影响 | Sicong Tian, Haiyu Wu, Michael C. King, Kevin W. Bowyer | http://arxiv.org/pdf/2412.05721v1 | None |
2024-12-07 | DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation | DeNVeR:用于无监督视频血管分割的可变形神经网络血管表示 | Chun-Hung Wu, Shih-Hong Chen, Chih-Yao Hu, Hsin-Yu Wu, Kai-Hsin Chen, Yu-You Chen, Chih-Hai Su, Chih-Kuo Lee | http://arxiv.org/pdf/2406.01591v3 | None |
2024-12-07 | Segment-Level Road Obstacle Detection Using Visual Foundation Model Priors and Likelihood Ratios | 基于视觉基础模型先验和似然比的道路障碍物段级检测 | Youssef Shoeb, Nazir Nayal, Azarm Nowzard, Fatma Güney, Hanno Gottschalk | http://arxiv.org/pdf/2412.05707v1 | None |
2024-12-07 | Early Diagnosis of Alzheimer's Diseases and Dementia from MRI Images Using an Ensemble Deep Learning | 基于集成深度学习的MRI图像早期诊断阿尔茨海默病和痴呆 | Mozhgan Naderi, Maryam Rastgarpour, Amir Reza Takhsha | http://arxiv.org/pdf/2412.05666v1 | None |
2024-12-07 | Nearly Solved? Robust Deepfake Detection Requires More than Visual Forensics | 几乎解决了吗?鲁棒的深度伪造检测需要超越视觉取证 | Guy Levy, Nathan Liebmann | http://arxiv.org/pdf/2412.05676v1 | None |
2024-12-07 | A Comprehensive Assessment Benchmark for Rigorously Evaluating Deep Learning Image Classifiers | 深度学习图像分类器严格评估的综合评估基准 | Michael W. Spratling | http://arxiv.org/pdf/2308.04137v2 | None |
2024-12-07 | Rethinking Annotation for Object Detection: Is Annotating Small-size Instances Worth Its Cost? | 重新思考目标检测的标注:标注小型实例是否物有所值? | Yusuke Hosoya, Masanori Suganuma, Takayuki Okatani | http://arxiv.org/pdf/2412.05611v1 | None |
2024-12-07 | Multispecies Animal Re-ID Using a Large Community-Curated Dataset | 多物种动物重识别:利用大型社区编纂数据集 | Lasha Otarashvili, Tamilselvan Subramanian, Jason Holmberg, J. J. Levenson, Charles V. Stewart | http://arxiv.org/pdf/2412.05602v1 | None |
2024-12-07 | Frequency Perception Network for Camouflaged Object Detection | 频率感知网络用于伪装物体检测 | Runmin Cong, Mengyao Sun, Sanyi Zhang, Xiaofei Zhou, Wei Zhang, Yao Zhao | http://arxiv.org/pdf/2308.08924v2 | None |
2024-12-07 | Real-Time 3D Object Detection Using InnovizOne LiDAR and Low-Power Hailo-8 AI Accelerator | 实时3D物体检测:基于InnovizOne激光雷达和低功耗Hailo-8 AI加速器 | Itay Krispin-Avraham, Roy Orfaig, Ben-Zion Bobrovsky | http://arxiv.org/pdf/2412.05594v1 | None |
2024-12-07 | Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection | 点感知交互与CNN诱导的细化网络用于RGB-D显著目标检测 | Runmin Cong, Hongyu Liu, Chen Zhang, Wei Zhang, Feng Zheng, Ran Song, Sam Kwong | http://arxiv.org/pdf/2308.08930v2 | None |
2024-12-07 | SDDNet: Style-guided Dual-layer Disentanglement Network for Shadow Detection | SDDNet:用于阴影检测的基于风格的分层解耦网络 | Runmin Cong, Yuchen Guan, Jinpeng Chen, Wei Zhang, Yao Zhao, Sam Kwong | http://arxiv.org/pdf/2308.08935v2 | None |
2024-12-07 | Robust Sequential DeepFake Detection | 鲁棒性序列深度伪造检测 | Rui Shao, Tianxing Wu, Ziwei Liu | http://arxiv.org/pdf/2309.14991v2 | None |
2024-12-07 | Unsupervised Gait Recognition with Selective Fusion | 无监督步态识别中的选择性融合 | Xuqian Ren, Shaopeng Yang, Saihui Hou, Chunshui Cao, Xu Liu, Yongzhen Huang | http://arxiv.org/pdf/2303.10772v3 | None |
2024-12-07 | Query-guided Prototype Evolution Network for Few-Shot Segmentation | 基于查询引导的原型进化网络用于少样本分割 | Runmin Cong, Hang Xiong, Jinpeng Chen, Wei Zhang, Qingming Huang, Yao Zhao | http://arxiv.org/pdf/2403.06488v2 | None |
2024-12-07 | Jailbreak Large Vision-Language Models Through Multi-Modal Linkage | 通过多模态链接破解大型视觉-语言模型 | Yu Wang, Xiaofei Zhou, Yichen Wang, Geyuan Zhang, Tianxing He | http://arxiv.org/pdf/2412.00473v3 | https://github.com/wangyu-ovo/MML |
2024-12-07 | UNet++ and LSTM combined approach for Breast Ultrasound Image Segmentation | 基于UNet++和LSTM的乳腺超声图像分割联合方法 | Saba Hesaraki, Morteza Akbari, Ramin Mousa | http://arxiv.org/pdf/2412.05585v1 | None |
2024-12-07 | From Deterministic to Probabilistic: A Novel Perspective on Domain Generalization for Medical Image Segmentation | 从确定性到概率性:医学图像分割领域泛化的新视角 | Yuheng Xu, Taiping Zhang | http://arxiv.org/pdf/2412.05572v1 | None |
2024-12-07 | Exploiting Precision Mapping and Component-Specific Feature Enhancement for Breast Cancer Segmentation and Identification | 利用精确映射和组件特定特征增强进行乳腺癌分割与识别 | Pandiyaraju V, Shravan Venkatraman, Pavan Kumar S, Santhosh Malarvannan, Kannan A | http://arxiv.org/pdf/2407.02844v5 | None |
2024-12-07 | Neighborhood Commonality-aware Evolution Network for Continuous Generalized Category Discovery | 邻域共性感知的连续广义类别发现进化网络 | Ye Wang, Yaxiong Wang, Guoshuai Zhao, Xueming Qian | http://arxiv.org/pdf/2412.05573v1 | https://github.com/xjtuYW/NCENet.git |
2024-12-07 | Psych-Occlusion: Using Visual Psychophysics for Aerial Detection of Occluded Persons during Search and Rescue | 心理遮挡:利用视觉心理物理学进行搜救期间空中遮挡人员检测 | Arturo Miguel Russell Bernal, Jane Cleland-Huang, Walter Scheirer | http://arxiv.org/pdf/2412.05553v1 | https://github.com/ArtRuss/NOMAD |
2024-12-07 | SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts | SAME:基于状态自适应专家混合的通用语言引导视觉导航学习 | Gengze Zhou, Yicong Hong, Zun Wang, Chongyang Zhao, Mohit Bansal, Qi Wu | http://arxiv.org/pdf/2412.05552v1 | None |
2024-12-07 | TLDR: Text Based Last-layer Retraining for Debiasing Image Classifiers | 基于文本的最后一层重新训练以去偏图像分类器 | Juhyeon Park, Seokhyeon Jeong, Taesup Moon | http://arxiv.org/pdf/2311.18291v2 | https://github.com/beotborry/TLDR |
2024-12-07 | GAQAT: gradient-adaptive quantization-aware training for domain generalization | GAQAT:用于领域泛化的梯度自适应量化感知训练 | Jiacheng Jiang, Yuan Meng, Chen Tang, Han Yu, Qun Li, Zhi Wang, Wenwu Zhu | http://arxiv.org/pdf/2412.05551v1 | None |
2024-12-07 | NOMAD: A Natural, Occluded, Multi-scale Aerial Dataset, for Emergency Response Scenarios | NOMAD:一种用于应急响应场景的自然、遮挡、多尺度航空数据集 | Arturo Miguel Russell Bernal, Walter Scheirer, Jane Cleland-Huang | http://arxiv.org/pdf/2309.09518v2 | https://github.com/ArtRuss/NOMAD |
2024-12-07 | Action Recognition based Industrial Safety Violation Detection | 基于动作识别的工业安全违规检测 | Surya N Reddy, Vaibhav Kurrey, Mayank Nagar, Gagan Raj Gupta | http://arxiv.org/pdf/2412.05531v1 | None |
2024-12-07 | CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images | CLIP-TNseg:超声图像甲状腺结节分割的多模态混合框架 | Xinjie Sun, Boxiong Wei, Yalong Jiang, Liquan Mao, Qi Zhao | http://arxiv.org/pdf/2412.05530v1 | https://github.com/jayxjsun/CLIP-TNseg |
2024-12-07 | RealDex: Towards Human-like Grasping for Robotic Dexterous Hand | RealDex:迈向类人抓取的机器人灵巧手 | Yumeng Liu, Yaxun Yang, Youzhuo Wang, Xiaofei Wu, Jiamin Wang, Yichen Yao, Sören Schwertfeger, Sibei Yang | http://arxiv.org/pdf/2402.13853v2 | None |
2024-12-07 | DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection | 深度伪造适配器:用于深度伪造检测的双层适配器 | Rui Shao, Tianxing Wu, Liqiang Nie, Ziwei Liu | http://arxiv.org/pdf/2306.00863v2 | https://github.com/rshaojimmy/DeepFake-Adapter |
2024-12-07 | Securing Social Media Against Deepfakes using Identity, Behavioral, and Geometric Signatures | 利用身份、行为和几何签名保障社交媒体免受深度伪造攻击 | Muhammad Umar Farooq, Awais Khan, Ijaz Ul Haq, Khalid Mahmood Malik | http://arxiv.org/pdf/2412.05487v1 | None |
2024-12-07 | DiffuBox: Refining 3D Object Detection with Point Diffusion | DiffuBox:基于点扩散的3D目标检测精炼 | Xiangyu Chen, Zhenzhen Liu, Katie Z Luo, Siddhartha Datta, Adhitya Polavaram, Yan Wang, Yurong You, Boyi Li | http://arxiv.org/pdf/2405.16034v2 | https://github.com/cxy1997/DiffuBox |
Date | English title | Chinese title | Authors | PDF link | Code link |
---|---|---|---|---|---|
2024-12-07 | Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events | 黑天鹅:不可预测事件中的推论和可反驳的视频推理 | Aditya Chinchure, Sahithya Ravi, Raymond Ng, Vered Shwartz, Boyang Li, Leonid Sigal | http://arxiv.org/pdf/2412.05725v1 | None |
2024-12-07 | Efficient Continuous Video Flow Model for Video Prediction | 高效连续视频流模型用于视频预测 | Gaurav Shrivastava, Abhinav Shrivastava | http://arxiv.org/pdf/2412.05633v1 | None |
2024-12-07 | Video2Reward: Generating Reward Function from Videos for Legged Robot Behavior Learning | 视频2奖励:为腿部机器人行为学习生成奖励函数 | Runhao Zeng, Dingjie Zhou, Qiwei Liang, Junlin Liu, Hui Li, Changxin Huang, Jianqiang Li, Xiping Hu | http://arxiv.org/pdf/2412.05515v1 | None |
Date | English title | Chinese title | Authors | PDF link | Code link |
---|---|---|---|---|---|
2024-12-07 | Evaluating Automated Radiology Report Quality through Fine-Grained Phrasal Grounding of Clinical Findings | 通过临床发现的细粒度短语定位评估自动放射学报告质量 | Razi Mahmood, Pingkun Yan, Diego Machado Reyes, Ge Wang, Mannudeep K. Kalra, Parisa Kaviani, Joy T. Wu, Tanveer Syeda-Mahmood | http://arxiv.org/pdf/2412.01031v2 | None |
2024-12-07 | Neural network interpretability with layer-wise relevance propagation: novel techniques for neuron selection and visualization | 基于层级相关传播的神经网络可解释性:神经元选择和可视化的新方法 | Deepshikha Bhati, Fnu Neha, Md Amiruzzaman, Angela Guercio, Deepak Kumar Shukla, Ben Ward | http://arxiv.org/pdf/2412.05686v1 | None |
2024-12-07 | PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation | PrefixKV:视觉指令跟随模型高效生成所需的自适应前缀键值缓存 | Ao Wang, Hui Chen, Jianchao Tan, Kefeng Zhang, Xunliang Cai, Zijia Lin, Jungong Han, Guiguang Ding | http://arxiv.org/pdf/2412.03409v2 | https://github.com/THU-MIG/PrefixKV |
2024-12-07 | Automated LaTeX Code Generation from Handwritten Math Expressions Using Vision Transformer | 基于视觉Transformer的手写数学表达式自动LaTeX代码生成 | Jayaprakash Sundararaj, Akhil Vyas, Benjamin Gonzalez-Maldonado | http://arxiv.org/pdf/2412.03853v2 | None |