[UPDATED!] 2024-10-29 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-29 | Capacity Control is an Effective Memorization Mitigation Mechanism in Text-Conditional Diffusion Models | 文本条件扩散模型中的容量控制是有效的记忆缓解机制 | Raman Dutt, Pedro Sanchez, Ondrej Bohdal, Sotirios A. Tsaftaris, Timothy Hospedales | http://arxiv.org/pdf/2410.22149v1 | link |
| 2024-10-29 | PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement | PACA:基于视角感知的跨注意力表示,用于零样本场景重排 | Shutong Jin, Ruiyu Wang, Kuangyi Chen, Florian T. Pokorny | http://arxiv.org/pdf/2410.22059v1 | null |
| 2024-10-29 | FANCL: Feature-Guided Attention Network with Curriculum Learning for Brain Metastases Segmentation | 基于课程学习的特征引导注意力网络在脑转移瘤分割中的应用 | Zijiang Liu, Xiaoyu Liu, Linhao Qu, Yonghong Shi | http://arxiv.org/pdf/2410.22057v1 | null |
| 2024-10-29 | PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference | PrefPaint:将图像修复扩散模型与人类偏好对齐 | Kendong Liu, Zhiyu Zhu, Chuanhao Li, Hui Liu, Huanqiang Zeng, Junhui Hou | http://arxiv.org/pdf/2410.21966v1 | null |
| 2024-10-29 | CT to PET Translation: A Large-scale Dataset and Domain-Knowledge-Guided Diffusion Approach | CT到PET转换:大规模数据集和领域知识引导的扩散方法 | Dac Thai Nguyen, Trung Thanh Nguyen, Huu Tien Nguyen, Thanh Trung Nguyen, Huy Hieu Pham, Thanh Hung Nguyen, Thao Nguyen Truong, Phi Le Nguyen | http://arxiv.org/pdf/2410.21932v1 | link |
| 2024-10-29 | Enhancing Learned Image Compression via Cross Window-based Attention | 通过跨窗口注意力增强学习图像压缩 | Priyanka Mudgal, Feng Liu | http://arxiv.org/pdf/2410.21144v2 | null |
| 2024-10-29 | EEG-Driven 3D Object Reconstruction with Color Consistency and Diffusion Prior | 基于EEG驱动的具有颜色一致性和扩散先验的3D物体重建 | Xin Xiang, Wenhui Zhou, Guojun Dai | http://arxiv.org/pdf/2410.20981v2 | null |
| 2024-10-29 | Fractal and Turbulent Feature Extraction and NFT Label Generation for Pollock Style Migration Paintings Based on VGG19 | 基于VGG19的波洛克风格迁移画作分形和湍流特征提取及NFT标签生成 | Yiquan Wang, Xu Wang, Jiazhuo Pan | http://arxiv.org/pdf/2410.20519v2 | link |
| 2024-10-29 | Towards Visual Text Design Transfer Across Languages | 跨语言视觉文本设计迁移 | Yejin Choi, Jiwan Chung, Sumin Shim, Giyeong Oh, Youngjae Yu | http://arxiv.org/pdf/2410.18823v2 | null |
| 2024-10-29 | Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization | 泛化一致性策略至视觉强化学习:优先近端经验正则化 | Haoran Li, Zhennan Jiang, Yuhui Chen, Dongbin Zhao | http://arxiv.org/pdf/2410.00051v2 | null |
| 2024-10-29 | Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images | 多假设条件点云扩散:遮挡图像的3D人体重建 | Donghwan Kim, Tae-Kyun Kim | http://arxiv.org/pdf/2409.18364v3 | link |
| 2024-10-29 | Aligning Machine and Human Visual Representations across Abstraction Levels | 在抽象层次上对机器和人类视觉表示进行对齐 | Lukas Muttenthaler, Klaus Greff, Frieda Born, Bernhard Spitzer, Simon Kornblith, Michael C. Mozer, Klaus-Robert Müller, Thomas Unterthiner, Andrew K. Lampinen | http://arxiv.org/pdf/2409.06509v3 | null |
| 2024-10-29 | Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images | 基于扩散模型的少样本X射线图像地标检测自监督预训练 | Roberto Di Via, Francesca Odone, Vito Paolo Pastore | http://arxiv.org/pdf/2407.18125v2 | null |
| 2024-10-29 | GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction | 基于视图引导的高斯溅射扩散三维重建 | Yuxuan Mu, Xinxin Zuo, Chuan Guo, Yilin Wang, Juwei Lu, Xiaofeng Wu, Songcen Xu, Peng Dai, Youliang Yan, Li Cheng | http://arxiv.org/pdf/2407.04237v4 | null |
| 2024-10-29 | FastDrag: Manipulate Anything in One Step | 快速拖拽:一步操作操纵任何对象 | Xuanjia Zhao, Jian Guan, Congyi Fan, Dongli Xu, Youtian Lin, Haiwei Pan, Pengming Feng | http://arxiv.org/pdf/2405.15769v3 | null |
| 2024-10-29 | SMART: Scalable Multi-agent Real-time Generation via Next-token Prediction | SMART:基于下一标记预测的可扩展多智能体实时生成 | Wei Wu, Xiaoxin Feng, Ziyan Gao, Yuheng Kan | http://arxiv.org/pdf/2405.15677v2 | link |
| 2024-10-29 | PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher | PaGoDA:从低分辨率扩散教师逐步成长的一步生成器 | Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon | http://arxiv.org/pdf/2405.14822v2 | null |
| 2024-10-29 | Benchmarking Counterfactual Image Generation | 反事实图像生成基准测试 | Thomas Melistas, Nikos Spyrou, Nefeli Gkouti, Pedro Sanchez, Athanasios Vlontzos, Yannis Panagakis, Giorgos Papanastasiou, Sotirios A. Tsaftaris | http://arxiv.org/pdf/2403.20287v3 | link |
| 2024-10-29 | A Probabilistic Hadamard U-Net for MRI Bias Field Correction | 概率Hadamard U-Net用于MRI偏场校正 | Xin Zhu, Hongyi Pan, Yury Velichko, Adam B. Murphy, Ashley Ross, Baris Turkbey, Ahmet Enis Cetin, Ulas Bagci | http://arxiv.org/pdf/2403.05024v2 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-29 | ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising | 多模态专家型视频检索系统:用于情境广告的ContextIQ | Ashutosh Chaubey, Anoubhav Agarwaal, Sartaki Sinha Roy, Aayush Agarwal, Susmita Ghose | http://arxiv.org/pdf/2410.22233v1 | null |
| 2024-10-29 | ADAM: An Embodied Causal Agent in Open-World Environments | ADAM:开放世界环境中的具身因果代理 | Shu Yu, Chaochao Lu | http://arxiv.org/pdf/2410.22194v1 | null |
| 2024-10-29 | Are VLMs Really Blind | VLMs真的盲目吗? | Ayush Singh, Mansi Gupta, Shivank Garg | http://arxiv.org/pdf/2410.22029v1 | null |
| 2024-10-29 | Feature distribution Adaptation Network for Speech Emotion Recognition | 语音情感识别的特征分布自适应网络 | Shaokai Li, Yixuan Ji, Peng Song, Haoqin Sun, Wenming Zheng | http://arxiv.org/pdf/2410.22023v1 | null |
| 2024-10-29 | A Survey on RGB, 3D, and Multimodal Approaches for Unsupervised Industrial Anomaly Detection | RGB、3D和多模态方法在无监督工业异常检测中的应用综述 | Yuxuan Lin, Yang Chang, Xuan Tong, Jiawen Yu, Antonio Liotta, Guofan Huang, Wei Song, Deyu Zeng, Zongze Wu, Yan Wang, et al. | http://arxiv.org/pdf/2410.21982v1 | null |
| 2024-10-29 | Spatio-temporal Transformers for Action Unit Classification with Event Cameras | 基于事件相机的动作单元分类时空变换器 | Luca Cultrera, Federico Becattini, Lorenzo Berlincioni, Claudio Ferrari, Alberto Del Bimbo | http://arxiv.org/pdf/2410.21958v1 | null |
| 2024-10-29 | AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | AutoBench-V:大型视觉语言模型能否自我基准测试? | Han Bao, Yue Huang, Yanbo Wang, Jiayi Ye, Xiangqi Wang, Xiuying Chen, Mohamed Elhoseiny, Xiangliang Zhang | http://arxiv.org/pdf/2410.21259v2 | link |
| 2024-10-29 | Non-rigid Relative Placement through 3D Dense Diffusion | 三维密集扩散实现的非刚性相对定位 | Eric Cai, Octavian Donca, Ben Eisner, David Held | http://arxiv.org/pdf/2410.19247v2 | null |
| 2024-10-29 | Visual Robustness Benchmark for Visual Question Answering (VQA) | 视觉问答(VQA)视觉鲁棒性基准 | Md Farhan Ishmam, Ishmam Tashdeed, Talukder Asir Saadat, Md Hamjajul Ashmafee, Abu Raihan Mostofa Kamal, Md. Azam Hossain | http://arxiv.org/pdf/2407.03386v5 | link |
| 2024-10-29 | M$^2$IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension | M$^2$IST:多模态交互式边调优高效指代表达理解 | Xuyang Liu, Ting Liu, Siteng Huang, Yi Xin, Yue Hu, Quanjun Yin, Donglin Wang, Honggang Chen | http://arxiv.org/pdf/2407.01131v2 | null |
| 2024-10-29 | MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs | MMDU:多轮多图像对话理解基准与用于LVLMs的指令微调数据集 | Ziyu Liu, Tao Chu, Yuhang Zang, Xilin Wei, Xiaoyi Dong, Pan Zhang, Zijian Liang, Yuanjun Xiong, Yu Qiao, Dahua Lin, et al. | http://arxiv.org/pdf/2406.11833v2 | link |
| 2024-10-29 | No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | 没有指数级数据就无法实现“零样本”:预训练概念频率决定多模态模型性能 | Vishaal Udandarao, Ameya Prabhu, Adhiraj Ghosh, Yash Sharma, Philip H. S. Torr, Adel Bibi, Samuel Albanie, Matthias Bethge | http://arxiv.org/pdf/2404.04125v3 | link |
| 2024-10-29 | VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark | VLKEB:一个大规模视觉-语言模型知识编辑基准 | Han Huang, Haitian Zhong, Tao Yu, Qiang Liu, Shu Wu, Liang Wang, Tieniu Tan | http://arxiv.org/pdf/2403.07350v3 | link |
| 2024-10-29 | GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification | 全球文档:面向真实世界文档图像检索与分类的多模态视觉-语言框架 | Souhail Bakkali, Sanket Biswas, Zuheng Ming, Mickaël Coustaty, Marçal Rusiñol, Oriol Ramos Terrades, Josep Lladós | http://arxiv.org/pdf/2309.05756v2 | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-29 | MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos | MoDGS:从随意捕获的单目视频中动态高斯散布 | Qingming Liu, Yuan Liu, Jiepeng Wang, Xianqiang Lyv, Peng Wang, Wenping Wang, Junhui Hou | http://arxiv.org/pdf/2406.00434v2 | null |
| 2024-10-29 | DOGS: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus | 基于高斯一致性的大规模3D重建的分布式高斯散点法 | Yu Chen, Gim Hee Lee | http://arxiv.org/pdf/2405.13943v2 | link |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-29 | PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | PF3plat:免姿态前馈3D高斯喷溅 | Sunghwan Hong, Jaewoo Jung, Heeseong Shin, Jisang Han, Jiaolong Yang, Chong Luo, Seungryong Kim | http://arxiv.org/pdf/2410.22128v1 | link |
| 2024-10-29 | ActiveSplat: High-Fidelity Scene Reconstruction through Active Gaussian Splatting | ActiveSplat:通过主动高斯Splatting实现的高保真场景重建 | Yuetao Li, Zijia Kuang, Ting Li, Guyue Zhou, Shaohui Zhang, Zike Yan | http://arxiv.org/pdf/2410.21955v1 | null |
| 2024-10-29 | OmniGS: Fast Radiance Field Reconstruction using Omnidirectional Gaussian Splatting | 全向高斯溅射:快速辐射场重建 | Longwei Li, Huajian Huang, Sai-Kit Yeung, Hui Cheng | http://arxiv.org/pdf/2404.03202v4 | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-29 | Effective Guidance for Model Attention with Simple Yes-no Annotations | 有效利用简单是/否标注进行模型注意力的指导 | Seongmin Lee, Ali Payani, Duen Horng Chau | http://arxiv.org/pdf/2410.22312v1 | null |
| 2024-10-29 | Multi-Level Feature Distillation of Joint Teachers Trained on Distinct Image Datasets | 多级特征蒸馏:基于不同图像数据集联合训练的教师模型 | Adrian Iordache, Bogdan Alexe, Radu Tudor Ionescu | http://arxiv.org/pdf/2410.22184v1 | link |
| 2024-10-29 | HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation | HRPVT:适用于中尺度和小尺度人体姿态估计的高分辨率金字塔视觉Transformer | Zhoujie Xu | http://arxiv.org/pdf/2410.22079v1 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-29 | Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier | 多类文本逆转换秘密生成语义无关分类器 | Kai Wang, Fei Yang, Bogdan Raducanu, Joost van de Weijer | http://arxiv.org/pdf/2410.22317v1 | link |
| 2024-10-29 | Guide3D: A Bi-planar X-ray Dataset for 3D Shape Reconstruction | 双平面X射线数据集:用于三维形状重建的Guide3D | Tudor Jianu, Baoru Huang, Hoan Nguyen, Binod Bhattarai, Tuong Do, Erman Tjiputra, Quang Tran, Pierre Berthet-Rayne, Ngan Le, Sebastiano Fichera, et al. | http://arxiv.org/pdf/2410.22224v1 | null |
| 2024-10-29 | MAPUNetR: A Hybrid Vision Transformer and U-Net Architecture for Efficient and Interpretable Medical Image Segmentation | MAPUNetR:一种混合视觉Transformer和U-Net架构的高效且可解释医学图像分割 | Ovais Iqbal Shah, Danish Raza Rizvi, Aqib Nazir Mir | http://arxiv.org/pdf/2410.22223v1 | null |
| 2024-10-29 | Active Learning for Vision-Language Models | 视觉-语言模型中的主动学习 | Bardia Safaei, Vishal M. Patel | http://arxiv.org/pdf/2410.22187v1 | null |
| 2024-10-29 | Lighten CARAFE: Dynamic Lightweight Upsampling with Guided Reassemble Kernels | 轻量级CARAFE:带引导重组核的动态轻量上采样 | Ruigang Fu, Qingyong Hu, Xiaohu Dong, Yinghui Gao, Biao Li, Ping Zhong | http://arxiv.org/pdf/2410.22139v1 | link |
| 2024-10-29 | Lightweight Frequency Masker for Cross-Domain Few-Shot Semantic Segmentation | 轻量级跨域小样本语义分割频率掩码器 | Jintao Tong, Yixiong Zou, Yuhua Li, Ruixuan Li | http://arxiv.org/pdf/2410.22135v1 | null |
| 2024-10-29 | RankUp: Boosting Semi-Supervised Regression with an Auxiliary Ranking Classifier | RankUp:通过辅助排序分类器提升半监督回归 | Pin-Yen Huang, Szu-Wei Fu, Yu Tsao | http://arxiv.org/pdf/2410.22124v1 | link |
| 2024-10-29 | Hyperspectral Imaging-Based Perception in Autonomous Driving Scenarios: Benchmarking Baseline Semantic Segmentation Models | 基于高光谱成像的自动驾驶场景感知:基准语义分割模型基准测试 | Imad Ali Shah, Jiarong Li, Martin Glavin, Edward Jones, Enda Ward, Brian Deegan | http://arxiv.org/pdf/2410.22101v1 | null |
| 2024-10-29 | DINeuro: Distilling Knowledge from 2D Natural Images via Deformable Tubular Transferring Strategy for 3D Neuron Reconstruction | DINeuro:通过可变形管状迁移策略从2D自然图像中提炼知识以实现3D神经元重建 | Yik San Cheng, Runkai Zhao, Heng Wang, Hanchuan Peng, Yui Lo, Yuqian Chen, Lauren J. O'Donnell, Weidong Cai | http://arxiv.org/pdf/2410.22078v1 | null |
| 2024-10-29 | Benchmarking Human and Automated Prompting in the Segment Anything Model | 段任意模型中的人机提示基准测试 | Jorge Quesada, Zoe Fowler, Mohammad Alotaibi, Mohit Prabhushankar, Ghassan AlRegib | http://arxiv.org/pdf/2410.22048v1 | null |
| 2024-10-29 | A Machine Learning-Based Secure Face Verification Scheme and Its Applications to Digital Surveillance | 基于机器学习的安全人脸验证方案及其在数字监控中的应用 | Huan-Chih Wang, Ja-Ling Wu | http://arxiv.org/pdf/2410.21993v1 | null |
| 2024-10-29 | From Explicit Rules to Implicit Reasoning in an Interpretable Violence Monitoring System | 从显式规则到可解释暴力监控系统的隐式推理 | Wen-Dong Jiang, Chih-Yung Chang, Hsiang-Chuan Chang, Diptendu Sinha Roy | http://arxiv.org/pdf/2410.21991v1 | null |
| 2024-10-29 | BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays | BenchX:胸部X光片医学视觉-语言预训练的统一基准框架 | Yang Zhou, Tan Li Hui Faith, Yanyu Xu, Sicong Leng, Xinxing Xu, Yong Liu, Rick Siow Mong Goh | http://arxiv.org/pdf/2410.21969v1 | link |
| 2024-10-29 | FakeFormer: Efficient Vulnerability-Driven Transformers for Generalisable Deepfake Detection | 伪前驱:高效、基于漏洞的Transformer,用于泛化型深度伪造检测 | Dat Nguyen, Marcella Astrid, Enjie Ghorbel, Djamila Aouada | http://arxiv.org/pdf/2410.21964v1 | null |
| 2024-10-29 | Multi-step feature fusion for natural disaster damage assessment on satellite images | 多步特征融合用于卫星图像上的自然灾害损害评估 | Mateusz Żarski, Jarosław Adam Miszczak | http://arxiv.org/pdf/2410.21901v1 | link |
| 2024-10-29 | Advancing Efficient Brain Tumor Multi-Class Classification -- New Insights from the Vision Mamba Model in Transfer Learning | 推进高效脑肿瘤多类别分类——视觉Mamba模型在迁移学习中的新见解 | Yinyi Lai, Anbo Cao, Yuan Gao, Jiaqi Shang, Zongyu Li, Jia Guo | http://arxiv.org/pdf/2410.21872v1 | null |
| 2024-10-29 | HRGR: Enhancing Image Manipulation Detection via Hierarchical Region-aware Graph Reasoning | HRGR:通过分层区域感知图推理增强图像操纵检测 | Xudong Wang, Yuezun Li, Huiyu Zhou, Jiaran Zhou, Junyu Dong | http://arxiv.org/pdf/2410.21861v1 | null |
| 2024-10-29 | Paved or unpaved? A Deep Learning derived Road Surface Global Dataset from Mapillary Street-View Imagery | Classification of Road Surface Materials Using Deep Learning Techniques from Mapillary Street-View Imagery | Sukanya Randhawa, Eren Aygun, Guntaj Randhawa, Benjamin Herfort, Sven Lautenbach, Alexander Zipf | http://arxiv.org/pdf/2410.19874v2 | null |
| 2024-10-29 | Efficient Neural Network Training via Subset Pretraining | 通过子集预训练的高效神经网络训练 | Jan Spörer, Bernhard Bermeitinger, Tomas Hrycej, Niklas Limacher, Siegfried Handschuh | http://arxiv.org/pdf/2410.16523v2 | null |
| 2024-10-29 | An Integrated Deep Learning Model for Skin Cancer Detection Using Hybrid Feature Fusion Technique | 基于混合特征融合技术的集成深度学习皮肤癌检测模型 | Maksuda Akter, Rabea Khatun, Md. Alamin Talukder, Md. Manowarul Islam, Md. Ashraf Uddin | http://arxiv.org/pdf/2410.14489v2 | null |
| 2024-10-29 | Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene Text Recognition | 分层域自适应:场景文字识别的渐进式自训练方法 | Kha Nhat Le, Hoang-Tuan Nguyen, Hung Tien Tran, Thanh Duc Ngo | http://arxiv.org/pdf/2410.09913v3 | link |
| 2024-10-29 | Spatial-Aware Conformal Prediction for Trustworthy Hyperspectral Image Classification | 空间感知的符合预测在可靠的高光谱图像分类中的应用 | Kangdao Liu, Tianhao Sun, Hao Zeng, Yongshan Zhang, Chi-Man Pun, Chi-Man Vong | http://arxiv.org/pdf/2409.01236v2 | link |
| 2024-10-29 | SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation | SeTAR:基于选择性低秩近似的分布外检测 | Yixia Li, Boya Xiong, Guanhua Chen, Yun Chen | http://arxiv.org/pdf/2406.12629v3 | link |
| 2024-10-29 | CAMS: Convolution and Attention-Free Mamba-based Cardiac Image Segmentation | CAMS:卷积和无注意力Mamba基于的心脏图像分割 | Abbas Khan, Muhammad Asad, Martin Benning, Caroline Roney, Gregory Slabaugh | http://arxiv.org/pdf/2406.05786v3 | link |
| 2024-10-29 | Dissecting Query-Key Interaction in Vision Transformers | 视觉Transformer中的查询-键交互剖析 | Xu Pan, Aaron Philip, Ziqian Xie, Odelia Schwartz | http://arxiv.org/pdf/2405.14880v3 | null |
| 2024-10-29 | PointCompress3D: A Point Cloud Compression Framework for Roadside LiDARs in Intelligent Transportation Systems | PointCompress3D:智能交通系统中路边激光雷达点云压缩框架 | Walter Zimmer, Ramandika Pranamulia, Xingcheng Zhou, Mingyu Liu, Alois C. Knoll | http://arxiv.org/pdf/2405.01750v2 | null |
| 2024-10-29 | Texture, Shape and Order Matter: A New Transformer Design for Sequential DeepFake Detection | 纹理、形状和顺序至关重要:用于序列深度伪造检测的新型Transformer设计 | Yunfei Li, Yuezun Li, Xin Wang, Baoyuan Wu, Jiaran Zhou, Junyu Dong | http://arxiv.org/pdf/2404.13873v3 | null |
| 2024-10-29 | Location-Free Scene Graph Generation | 无位置场景图生成 | Ege Özsoy, Felix Holm, Mahdi Saleh, Tobias Czempiel, Chantal Pellegrini, Nassir Navab, Benjamin Busam | http://arxiv.org/pdf/2303.10944v2 | null |

OCR

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-29 | Structured Analysis and Comparison of Alphabets in Historical Handwritten Ciphers | 历史手写密文字母结构分析与比较 | Martín Méndez, Pau Torras, Adrià Molina, Jialuo Chen, Oriol Ramos-Terrades, Alicia Fornés | http://arxiv.org/pdf/2410.21913v1 | null |

GNN

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-29 | MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models | MamMIL:基于状态空间模型的整张切片图像的多实例学习 | Zijie Fang, Yifeng Wang, Ye Zhang, Zhi Wang, Jian Zhang, Xiangyang Ji, Yongbing Zhang | http://arxiv.org/pdf/2403.05160v2 | link |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-29 | Active Event Alignment for Monocular Distance Estimation | 单目距离估计中的主动事件对齐 | Nan Cai, Pia Bideau | http://arxiv.org/pdf/2410.22280v1 | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-29 | Local Policies Enable Zero-shot Long-horizon Manipulation | 本地策略实现零样本长时程操纵 | Murtaza Dalal, Min Liu, Walter Talbott, Chen Chen, Deepak Pathak, Jian Zhang, Ruslan Salakhutdinov | http://arxiv.org/pdf/2410.22332v1 | null |
| 2024-10-29 | Natural Language Inference Improves Compositionality in Vision-Language Models | 自然语言推理提升视觉-语言模型中的组合性 | Paola Cascante-Bonilla, Yu Hou, Yang Trista Cao, Hal Daumé III, Rachel Rudinger | http://arxiv.org/pdf/2410.22315v1 | null |
| 2024-10-29 | Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective | 面向视觉时代统一理解和生成:从自回归视角的综述 | Shenghao Xie, Wenqiang Zu, Mingyang Zhao, Duo Su, Shilong Liu, Ruohua Shi, Guoqi Li, Shanghang Zhang, Lei Ma | http://arxiv.org/pdf/2410.22217v1 | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-29 | Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention | 多目标3D地面定位:动态模块与语言信息驱动的空间注意力 | Haomeng Zhang, Chiao-An Yang, Raymond A. Yeh | http://arxiv.org/pdf/2410.22306v1 | null |
| 2024-10-29 | Emotion-Guided Image to Music Generation | 情感引导的图像到音乐生成 | Souraja Kundu, Saket Singh, Yuji Iwahori | http://arxiv.org/pdf/2410.22299v1 | null |
| 2024-10-29 | NCA-Morph: Medical Image Registration with Neural Cellular Automata | NCA-Morph:基于神经网络细胞自动机的医学图像配准 | Amin Ranem, John Kalkhof, Anirban Mukhopadhyay | http://arxiv.org/pdf/2410.22265v1 | null |
| 2024-10-29 | MotionBooth: Motion-Aware Customized Text-to-Video Generation | 动态感知定制化文本到视频生成 | Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen | http://arxiv.org/pdf/2406.17758v3 | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-29 | TractShapeNet: Efficient Multi-Shape Learning with 3D Tractography Point Clouds | TractShapeNet:基于3D束形轨迹点云的高效多形状学习 | Yui Lo, Yuqian Chen, Dongnan Liu, Jon Haitz Legarreta, Leo Zekelman, Fan Zhang, Jarrett Rushmore, Yogesh Rathi, Nikos Makris, Alexandra J. Golby, et al. | http://arxiv.org/pdf/2410.22099v1 | null |
| 2024-10-29 | FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives | FreeGaussian:无需指导的可控3D高斯块与流导数 | Qizhi Chen, Delin Qu, Yiwen Tang, Haoming Song, Yiting Zhang, Dong Wang, Bin Zhao, Xuelong Li | http://arxiv.org/pdf/2410.22070v1 | null |
| 2024-10-29 | Micro-Structures Graph-Based Point Cloud Registration for Balancing Efficiency and Accuracy | 基于微观结构图点云配准的效率与精度平衡 | Rongling Zhang, Li Yan, Pengcheng Wei, Hong Xie, Pinzhuo Wang, Binbing Wang | http://arxiv.org/pdf/2410.21857v1 | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-10-29 | Task Vectors are Cross-Modal | 跨模态的任务向量 | Grace Luo, Trevor Darrell, Amir Bar | http://arxiv.org/pdf/2410.22330v1 | null |
| 2024-10-29 | Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset | 机器人预训练机器人:以操作为中心的机器人表示从大规模机器人数据集 | Guangqi Jiang, Yifei Sun, Tao Huang, Huanyu Li, Yongyuan Liang, Huazhe Xu | http://arxiv.org/pdf/2410.22325v1 | null |
| 2024-10-29 | Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving | Senna:连接大型视觉-语言模型与端到端自动驾驶 | Bo Jiang, Shaoyu Chen, Bencheng Liao, Xingyu Zhang, Wei Yin, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang | http://arxiv.org/pdf/2410.22313v1 | null |
| 2024-10-29 | Motion Graph Unleashed: A Novel Approach to Video Prediction | 视频预测的全新方法:运动图技术解禁 | Yiqi Zhong, Luming Liang, Bohan Tang, Ilya Zharkov, Ulrich Neumann | http://arxiv.org/pdf/2410.22288v1 | null |
| 2024-10-29 | LiVisSfM: Accurate and Robust Structure-from-Motion with LiDAR and Visual Cues | LiVisSfM:基于激光雷达和视觉线索的精确且鲁棒的位姿估计 | Hanqing Jiang, Liyang Zhou, Zhuang Zhang, Yihao Yu, Guofeng Zhang | http://arxiv.org/pdf/2410.22213v1 | null |
| 2024-10-29 | Shining a Light on Hurricane Damage Estimation via Nighttime Light Data: Pre-processing Matters | 借助夜间灯光数据揭示飓风损害评估:预处理至关重要 | Nancy Thomas, Saba Rahimi, Annita Vapsi, Cathy Ansell, Elizabeth Christie, Daniel Borrajo, Tucker Balch, Manuela Veloso | http://arxiv.org/pdf/2410.22150v1 | null |
| 2024-10-29 | 4D-based Robot Navigation Using Relativistic Image Processing | 基于4D的机器人导航采用相对图像处理 | Simone Müller, Dieter Kranzlmüller | http://arxiv.org/pdf/2410.22087v1 | null |
| 2024-10-29 | Analyzing Noise Models and Advanced Filtering Algorithms for Image Enhancement | 分析噪声模型和高级滤波算法以增强图像 | Sahil Ali Akbar, Ananya Verma | http://arxiv.org/pdf/2410.21946v1 | link |
| 2024-10-29 | ReMix: Training Generalized Person Re-identification on a Mixture of Data | ReMix:基于数据混合训练通用行人重识别 | Timur Mamedov, Anton Konushin, Vadim Konushin | http://arxiv.org/pdf/2410.21938v1 | null |
| 2024-10-29 | A Longitudinal Analysis of Racial and Gender Bias in New York Times and Fox News Images and Articles | 纽约时报与福克斯新闻图像和文章中的种族和性别偏见纵向分析 | Hazem Ibrahim, Nouar AlDahoul, Syed Mustafa Ali Abbasi, Fareed Zaffar, Talal Rahwan, Yasir Zaki | http://arxiv.org/pdf/2410.21898v1 | null |
| 2024-10-29 | Self-Relaxed Joint Training: Sample Selection for Severity Estimation with Ordinal Noisy Labels | 自松弛联合训练:具有序数噪声标签的严重程度估计样本选择 | Shumpei Takezaki, Kiyohito Tanaka, Seiichi Uchida | http://arxiv.org/pdf/2410.21885v1 | link |
| 2024-10-29 | Search Wide, Focus Deep: Automated Fetal Brain Extraction with Sparse Training Data | 广泛搜索,深度聚焦:稀疏训练数据下的胎儿大脑自动提取 | Javid Dadashkarimi, Valeria Pena Trujillo, Camilo Jaimes, Lilla Zöllei, Malte Hoffmann | http://arxiv.org/pdf/2410.20532v2 | null |
| 2024-10-29 | Integration of Communication and Computational Imaging | 通信与计算成像的融合 | Zhenming Yu, Liming Cheng, Hongyu Huang, Wei Zhang, Liang Lin, Kun Xu | http://arxiv.org/pdf/2410.19415v2 | null |
| 2024-10-29 | Contrastive Sequential-Diffusion Learning: Non-linear and Multi-Scene Instructional Video Synthesis | 对比序列扩散学习:非线性和多场景指令视频生成 | Vasco Ramos, Yonatan Bitton, Michal Yarom, Idan Szpektor, Joao Magalhaes | http://arxiv.org/pdf/2407.11814v2 | null |
| 2024-10-29 | Chemical Shift Encoding based Double Bonds Quantification in Triglycerides using Deep Image Prior | 基于化学位移编码的甘油三酯中双键定量分析:深度图像先验 | Chaoxing Huang, Ziqiang Yu, Zijian Gao, Qiuyi Shen, Queenie Chan, Vincent Wai-Sun Wong, Winnie Chiu-Wing Chu, Weitian Chen | http://arxiv.org/pdf/2407.01926v4 | null |
| 2024-10-29 | NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing | 纳瑞康:融合扩散先验的自然精炼规范图像用于视频编辑 | Ting-Hsuan Chen, Jiewen Chan, Hau-Shiang Shiu, Shih-Han Yen, Chang-Han Yeh, Yu-Lun Liu | http://arxiv.org/pdf/2406.06523v2 | link |
| 2024-10-29 | GO4Align: Group Optimization for Multi-Task Alignment | GO4Align:多任务对齐的分组优化 | Jiayi Shen, Cheems Wang, Zehao Xiao, Nanne Van Noord, Marcel Worring | http://arxiv.org/pdf/2404.06486v2 | link |
| 2024-10-29 | What Makes ImageNet Look Unlike LAION | 图像网与LAION有何不同之处 | Ali Shirali, Moritz Hardt | http://arxiv.org/pdf/2306.15769v2 | link |