Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-12-19 | FedPIA -- Permuting and Integrating Adapters leveraging Wasserstein Barycenters for Finetuning Foundation Models in Multi-Modal Federated Learning | Pramit Saha et.al. | 2412.14424 | null |
2024-12-18 | Parameter-efficient Fine-tuning for improved Convolutional Baseline for Brain Tumor Segmentation in Sub-Saharan Africa Adult Glioma Dataset | Bijay Adhikari et.al. | 2412.14100 | null |
2024-12-18 | A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Method-Level Code Smell Detection | Beiqi Zhang et.al. | 2412.13801 | null |
2024-12-18 | Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models | Xinxin Liu et.al. | 2412.13488 | null |
2024-12-17 | Train More Parameters But Mind Their Placement: Insights into Language Adaptation with PEFT | Jenny Kunz et.al. | 2412.12674 | link |
2024-12-16 | Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering | Jinhe Bi et.al. | 2412.12359 | link |
2024-12-16 | A LoRA is Worth a Thousand Pictures | Chenxi Liu et.al. | 2412.12048 | null |
2024-12-11 | Adaptive Principal Components Allocation with the | Jingjing Zheng et.al. | 2412.08592 | link |
2024-12-10 | PETALface: Parameter Efficient Transfer Learning for Low-resolution Face Recognition | Kartik Narayan et.al. | 2412.07771 | null |
2024-12-10 | MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning | Yufei Ma et.al. | 2412.07405 | null |
2024-12-13 | Crack-EdgeSAM Self-Prompting Crack Segmentation System for Edge Devices | Yingchu Wang et.al. | 2412.07205 | null |
2024-12-08 | Taming Sensitive Weights : Noise Perturbation Fine-tuning for Robust LLM Quantization | Dongwei Wang et.al. | 2412.06858 | null |
2024-12-09 | BoRA: Bi-dimensional Weight-Decomposed Low-Rank Adaptation | Qiushi Wang et.al. | 2412.06441 | null |
2024-12-19 | S | Xinyu Yang et.al. | 2412.06289 | null |
2024-12-08 | KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models | Fan Wang et.al. | 2412.06071 | link |
2024-12-07 | Training-Free Bayesianization for Low-Rank Adapters of Large Language Models | Haizhou Shi et.al. | 2412.05723 | link |
2024-12-06 | PETapter: Leveraging PET-style classification heads for modular few-shot parameter-efficient fine-tuning | Jonas Rieger et.al. | 2412.04975 | null |
2024-12-04 | Prompting Large Language Models for Clinical Temporal Relation Extraction | Jianping He et.al. | 2412.04512 | null |
2024-12-05 | SoRA: Singular Value Decomposed Low-Rank Adaptation for Domain Generalizable Representation Learning | Seokju Yun et.al. | 2412.04077 | link |
2024-12-04 | Improving Linguistic Diversity of Large Language Models with Possibility Exploration Fine-Tuning | Long Mai et.al. | 2412.03343 | link |
2024-12-03 | Mixture of Physical Priors Adapter for Parameter-Efficient Fine-Tuning | Zhaozhi Wang et.al. | 2412.02759 | null |
2024-12-03 | CPP-UT-Bench: Can LLMs Write Complex Unit Tests in C++? | Vaishnavi Bhargava et.al. | 2412.02735 | null |
2024-12-03 | LoRA Diffusion: Zero-Shot LoRA Synthesis for Diffusion Model Personalization | Ethan Smith et.al. | 2412.02352 | null |
2024-12-03 | A Comprehensive Evaluation of Large Language Models on Aspect-Based Sentiment Analysis | Changzhi Zhou et.al. | 2412.02279 | null |
2024-11-30 | Unified Parameter-Efficient Unlearning for LLMs | Chenlu Ding et.al. | 2412.00383 | null |
2024-11-29 | SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks | Kim-Celine Kahl et.al. | 2411.19688 | link |
2024-11-28 | Parameter-Efficient Transfer Learning for Music Foundation Models | Yiwei Ding et.al. | 2411.19371 | link |
2024-11-28 | PEFT-as-an-Attack! Jailbreaking Language Models during Federated Parameter-Efficient Fine-Tuning | Shenghui Li et.al. | 2411.19335 | null |
2024-11-28 | Enhancing Parameter-Efficient Fine-Tuning of Vision Transformers through Frequency-Based Adaptation | Son Thai Ly et.al. | 2411.19297 | link |
2024-11-27 | Challenges in Adapting Multilingual LLMs to Low-Resource Languages using LoRA PEFT Tuning | Omkar Khade et.al. | 2411.18571 | null |
2024-11-26 | PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning | Zhen Sun et.al. | 2411.17453 | null |
2024-11-29 | Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning | Hui-Yue Yang et.al. | 2411.17217 | null |
2024-11-25 | Towards Efficient Model-Heterogeneity Federated Learning for Large Models | Ruofan Jia et.al. | 2411.16796 | null |
2024-11-25 | Parameter Efficient Instruction Tuning: An Empirical Study | Pengfei He et.al. | 2411.16775 | null |
2024-11-25 | Graph Adapter of EEG Foundation Models for Parameter Efficient Fine Tuning | Toyotaro Suzumura et.al. | 2411.16155 | null |
2024-11-24 | Efficient and Private: Memorisation under differentially private parameter-efficient fine-tuning in language models | Olivia Ma et.al. | 2411.15831 | null |
2024-11-21 | Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation | Seokil Ham et.al. | 2411.15224 | null |
2024-11-22 | LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement | Jieming Bian et.al. | 2411.14961 | null |
2024-11-21 | Multi LoRA Meets Vision: Merging multiple adapters to create a multi task model | Ege Kesim et.al. | 2411.14064 | null |
2024-11-17 | F | Pramit Saha et.al. | 2411.11912 | null |
2024-11-16 | HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization | Huaqin Zhao et.al. | 2411.10696 | null |
2024-11-12 | PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model | Yilun Liu et.al. | 2411.08212 | null |
2024-11-10 | Prompt-Efficient Fine-Tuning for GPT-like Deep Models to Reduce Hallucination and to Improve Reproducibility in Scientific Text Generation Using Stochastic Optimisation Techniques | Daniil Sulimov et.al. | 2411.06445 | null |
2024-11-06 | MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba | Masakazu Yoshimura et.al. | 2411.03855 | null |
2024-11-04 | PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption | Yifan Tan et.al. | 2411.03357 | null |
2024-11-05 | Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation | Junchen Fu et.al. | 2411.02992 | null |
2024-11-04 | Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study | André Storhaug et.al. | 2411.02462 | null |
2024-11-04 | Expanding Sparse Tuning for Low Memory Usage | Shufan Shen et.al. | 2411.01800 | link |
2024-11-15 | Visual Fourier Prompt Tuning | Runjia Zeng et.al. | 2411.01327 | link |
2024-10-31 | CleaR: Towards Robust and Generalized Parameter-Efficient Fine-Tuning for Noisy Label Learning | Yeachan Kim et.al. | 2411.00873 | null |
2024-10-30 | FPE-LLM: Highly Intelligent Time-Series Forecasting and Language Interaction LLM in Energy Systems | Zihang Qiu et.al. | 2411.00852 | null |
2024-11-01 | Dual Low-Rank Adaptation for Continual Learning with Pre-Trained Models | Huancheng Chen et.al. | 2411.00623 | null |
2024-11-01 | Is Multiple Object Tracking a Matter of Specialization? | Gianluca Mancusi et.al. | 2411.00553 | null |
2024-11-01 | C2A: Client-Customized Adaptation for Parameter-Efficient Federated Learning | Yeachan Kim et.al. | 2411.00311 | link |
2024-10-29 | Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models | Donghoon Kim et.al. | 2411.00029 | null |
2024-10-30 | Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation | Wei Dong et.al. | 2410.22952 | null |
2024-10-30 | MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning | Xujia Wang et.al. | 2410.22782 | null |
2024-10-29 | Meta-Learning Adaptable Foundation Models | Jacob L. Block et.al. | 2410.22264 | null |
2024-10-29 | Capacity Control is an Effective Memorization Mitigation Mechanism in Text-Conditional Diffusion Models | Raman Dutt et.al. | 2410.22149 | link |
2024-10-30 | IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models | Hang Guo et.al. | 2410.21759 | link |
2024-10-28 | KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation | Rambod Azimi et.al. | 2410.20777 | link |
2024-10-27 | Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation | Maohao Shen et.al. | 2410.20336 | null |
2024-11-01 | Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies | Luping Wang et.al. | 2410.19878 | null |
2024-10-23 | MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | Jingfan Zhang et.al. | 2410.18035 | null |
2024-10-22 | Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations | Cheng Lei et.al. | 2410.16953 | null |
2024-10-22 | MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report | Samrajya Thapa et.al. | 2410.16239 | link |
2024-10-21 | Natural GaLore: Accelerating GaLore for memory-efficient LLM Training and Fine-tuning | Arijit Das et.al. | 2410.16029 | link |
2024-10-18 | Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation | Shuai Zhao et.al. | 2410.14425 | link |
2024-10-17 | LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning | Yiming Shi et.al. | 2410.13618 | link |
2024-10-16 | Communication-Efficient and Tensorized Federated Fine-Tuning of Large Language Models | Sajjad Ghiasvand et.al. | 2410.13097 | null |
2024-10-17 | Prompt Compression for Large Language Models: A Survey | Zongqian Li et.al. | 2410.12388 | link |
2024-10-15 | Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models | Kai Yao et.al. | 2410.11772 | link |
2024-10-15 | LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models | Hossein Abdi et.al. | 2410.11551 | null |
2024-10-15 | RoCoFT: Efficient Finetuning of Large Language Models with Row-Column Updates | Md Kowsher et.al. | 2410.10075 | link |
2024-10-13 | BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation | Peijia Qin et.al. | 2410.09758 | null |
2024-10-12 | Towards Efficient Visual-Language Alignment of the Q-Former for Visual Reasoning Tasks | Sungkyung Kim et.al. | 2410.09489 | link |
2024-10-15 | MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning | Yaming Yang et.al. | 2410.09437 | null |
2024-10-09 | Parameter-Efficient Fine-Tuning via Selective Discrete Cosine Transform | Yixian Shen et.al. | 2410.09103 | null |
2024-10-04 | BIPEFT: Budget-Guided Iterative Search for Parameter Efficient Fine-Tuning of Large Pretrained Language Models | Aofei Chang et.al. | 2410.09079 | null |
2024-10-11 | Parameter-Efficient Fine-Tuning of State Space Models | Kevin Galim et.al. | 2410.09016 | link |
2024-10-10 | Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning | Dingkang Liang et.al. | 2410.08114 | link |
2024-10-10 | SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture | Jiayi Han et.al. | 2410.07739 | null |
2024-10-10 | Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures | Yiming Chen et.al. | 2410.07698 | link |
2024-10-09 | SparseGrad: A Selective Method for Efficient Fine-tuning of MLP Layers | Viktoriia Chekalina et.al. | 2410.07383 | link |
2024-10-09 | Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs | Ruijia Niu et.al. | 2410.06431 | null |
2024-10-08 | Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content? | Shenbin Qian et.al. | 2410.06338 | link |
2024-10-15 | LoRTA: Low Rank Tensor Adaptation of Large Language Models | Ignacio Hounie et.al. | 2410.04060 | null |
2024-10-03 | Llama SLayer 8B: Shallow Layers Hold the Key to Knowledge Injection | Tianxiang Chen et.al. | 2410.02330 | link |
2024-10-02 | TPP-LLM: Modeling Temporal Point Processes by Efficiently Fine-Tuning Large Language Models | Zefang Liu et.al. | 2410.02062 | link |
2024-10-02 | NEAT: Nonlinear Parameter-efficient Adaptation of Pre-trained Models | Yibo Zhong et.al. | 2410.01870 | null |
2024-09-27 | A GEN AI Framework for Medical Note Generation | Hui Yi Leong et.al. | 2410.01841 | null |
2024-10-02 | DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models | Yuxuan Zhang et.al. | 2410.01497 | link |
2024-10-01 | PrivTuner with Homomorphic Encryption and LoRA: A P3EFT Scheme for Privacy-Preserving Parameter-Efficient Fine-Tuning of AI Foundation Models | Yang Li et.al. | 2410.00433 | null |
2024-09-30 | Adapting LLMs for the Medical Domain in Portuguese: A Study on Fine-Tuning and Model Evaluation | Pedro Henrique Paiola et.al. | 2410.00163 | null |
2024-09-30 | Resource Allocation for Stable LLM Training in Mobile Edge Computing | Chang Liu et.al. | 2409.20247 | null |
2024-09-30 | Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models | Luohe Shi et.al. | 2409.20181 | null |
2024-09-28 | FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models | Yucheng Xie et.al. | 2409.19289 | null |
2024-10-01 | Backdoor Attacks for LLMs with Weak-To-Strong Knowledge Distillation | Shuai Zhao et.al. | 2409.17946 | null |
2024-09-26 | PEDRO: Parameter-Efficient Fine-tuning with Prompt DEpenDent Representation MOdification | Tianfang Xie et.al. | 2409.17834 | null |
2024-09-30 | Efficient In-Domain Question Answering for Resource-Constrained Environments | Isaac Chung et.al. | 2409.17648 | null |
2024-10-07 | PACE: marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization | Yao Ni et.al. | 2409.17137 | link |
2024-09-25 | Parameter-efficient Bayesian Neural Networks for Uncertainty-aware Depth Estimation | Richard D. Paul et.al. | 2409.17085 | null |
2024-10-02 | Bone: Block Affine Transformation as Parameter Efficient Fine-tuning Methods for Large Language Models | Jiale Kang et.al. | 2409.15371 | link |
2024-09-22 | Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape | Tao Li et.al. | 2409.14396 | null |
2024-10-01 | Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm | Jaehan Kim et.al. | 2409.14119 | link |
2024-09-20 | HUT: A More Computation Efficient Fine-Tuning Method With Hadamard Updated Transformation | Geyuan Zhang et.al. | 2409.13501 | null |
2024-09-17 | THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Mengfei Liang et.al. | 2409.11353 | link |
2024-09-17 | LPT++: Efficient Training on Mixture of Long-tailed Experts | Bowen Dong et.al. | 2409.11323 | null |
2024-09-17 | Beyond LoRA: Exploring Efficient Fine-Tuning Techniques for Time Series Foundational Models | Divij Gupta et.al. | 2409.11302 | null |
2024-09-18 | Propulsion: Steering LLM with Tiny Fine-Tuning | Md Kowsher et.al. | 2409.10927 | link |
2024-09-16 | From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs | Navya Jain et.al. | 2409.10245 | null |
2024-09-14 | COMFORT: A Continual Fine-Tuning Framework for Foundation Models Targeted at Consumer Healthcare | Chia-Hao Li et.al. | 2409.09549 | null |
2024-09-14 | Comparing Retrieval-Augmentation and Parameter-Efficient Fine-Tuning for Privacy-Preserving Personalization of Large Language Models | Alireza Salemi et.al. | 2409.09510 | link |
2024-09-13 | Risks When Sharing LoRA Fine-Tuned Diffusion Model Weights | Dixi Yao et.al. | 2409.08482 | null |
2024-09-12 | Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation? | Kerem Cekmeceli et.al. | 2409.07960 | link |
2024-09-11 | Efficient Localized Adaptation of Neural Weather Forecasting: A Case Study in the MENA Region | Muhammad Akhtar Munir et.al. | 2409.07585 | link |
2024-09-10 | Sam2Rad: A Segmentation Model for Medical Images with Learnable Prompts | Assefa Seyoum Wahd et.al. | 2409.06821 | link |
2024-09-11 | Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models | Yao Shu et.al. | 2409.06277 | link |
2024-09-09 | SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values | Chengwei Sun et.al. | 2409.05926 | null |
2024-09-10 | Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment | Zhixian Zhao et.al. | 2409.05015 | null |
2024-09-06 | Customizing Large Language Model Generation Style using Parameter-Efficient Finetuning | Xinyue Liu et.al. | 2409.04574 | null |
2024-09-04 | iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation | Hayeon Jo et.al. | 2409.02838 | null |
2024-09-04 | Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs | Ruoyu Wang et.al. | 2409.02686 | null |
2024-09-04 | Robust Federated Finetuning of Foundation Models via Alternating Minimization of LoRA | Shuangyi Chen et.al. | 2409.02346 | null |
2024-09-02 | Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning | Chongjie Si et.al. | 2409.01035 | link |
2024-08-28 | 3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability | Baohao Liao et.al. | 2409.00119 | link |
2024-08-21 | SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models | Yang Cao et.al. | 2409.00055 | link |
2024-08-30 | MoRe Fine-Tuning with 10x Fewer Parameters | Wenxuan Tan et.al. | 2408.17383 | link |
2024-09-02 | Instant Adversarial Purification with Adversarial Consistency Distillation | Chun Tong Lei et.al. | 2408.17064 | null |
2024-08-28 | Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization | Léo Hemamou et.al. | 2408.15801 | null |
2024-08-27 | GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs | Maxim Zhelnin et.al. | 2408.15300 | link |
2024-08-27 | Pre-training Everywhere: Parameter-Efficient Fine-Tuning for Medical Image Analysis via Target Parameter Pre-training | Xingliang Lei et.al. | 2408.15011 | null |
2024-08-27 | CVPT: Cross-Attention help Visual Prompt Tuning adapt visual task | Lingyun Huang et.al. | 2408.14961 | link |
2024-08-27 | Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models | Aradhye Agarwal et.al. | 2408.14470 | link |
2024-08-24 | Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings | Sagar Srinivas Sakhinana et.al. | 2408.13622 | null |
2024-08-21 | Positional Prompt Tuning for Efficient 3D Representation Learning | Shaochen Zhang et.al. | 2408.11567 | link |
2024-08-20 | Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-Tuning | Bei Ouyang et.al. | 2408.10746 | null |
2024-08-20 | TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning | Bin Wang et.al. | 2408.10688 | link |
2024-08-19 | TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition | Tianwei Lin et.al. | 2408.09856 | link |
2024-08-16 | Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models | Vladimir Araujo et.al. | 2408.09053 | null |
2024-08-14 | KIND: Knowledge Integration and Diversion in Diffusion Models | Yucheng Xie et.al. | 2408.07337 | null |
2024-08-30 | TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning | Yujie Feng et.al. | 2408.05200 | link |
2024-08-08 | Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models | Yupeng Chang et.al. | 2408.04556 | link |
2024-08-06 | SARA: Singular-Value Based Adaptive Low-Rank Adaption | Jihao Gu et.al. | 2408.03290 | null |
2024-08-06 | Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi | Pranita Deshmukh et.al. | 2408.03172 | null |
2024-08-03 | TS-SAM: Fine-Tuning Segment-Anything Model for Downstream Tasks | Yang Yu et.al. | 2408.01835 | link |
2024-08-02 | MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts | Lin Ning et.al. | 2408.01505 | null |
2024-08-02 | Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs | Afia Anjum et.al. | 2408.01008 | null |
2024-07-31 | A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation | Mothilal Asokan et.al. | 2407.21739 | null |
2024-07-28 | Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models | Jifeng Wang et.al. | 2407.19564 | link |
2024-07-24 | Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective | Jingren Liu et.al. | 2407.17120 | null |
2024-07-22 | Zero-Shot Embeddings Inform Learning and Forgetting with Vision-Language Encoders | Laura Niss et.al. | 2407.15731 | null |
2024-07-21 | Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization | Jiajun Hu et.al. | 2407.15085 | null |
2024-07-16 | InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification | Yujia Hu et.al. | 2407.12882 | link |
2024-07-18 | Turning Generative Models Degenerate: The Power of Data Poisoning Attacks | Shuli Jiang et.al. | 2407.12281 | null |
2024-07-16 | Probing the Efficacy of Federated Parameter-Efficient Fine-Tuning of Vision Transformers for Medical Image Classification | Naif Alkhunaizi et.al. | 2407.11573 | null |
2024-07-16 | An efficient framework based on large foundation model for cervical cytopathology whole slide image screening | Jialong Huang et.al. | 2407.11486 | link |
2024-07-10 | RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization | Xijie Huang et.al. | 2407.08044 | link |
2024-07-10 | ROSA: Random Subspace Adaptation for Efficient Fine-Tuning | Marawan Gamal Abdel Hameed et.al. | 2407.07802 | link |
2024-07-10 | Parameter Efficient Fine Tuning for Multi-scanner PET to PET Reconstruction | Yumin Kim et.al. | 2407.07517 | null |
2024-07-09 | Reprogramming Distillation for Medical Foundation Models | Yuhang Zhou et.al. | 2407.06504 | null |
2024-07-07 | See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition | Chongjie Si et.al. | 2407.05417 | link |
2024-07-16 | LoRA-GA: Low-Rank Adaptation with Gradient Approximation | Shaowen Wang et.al. | 2407.05000 | link |
2024-07-05 | GPT vs RETRO: Exploring the Intersection of Retrieval and Parameter-Efficient Fine-Tuning | Aleksander Ficek et.al. | 2407.04528 | null |
2024-07-04 | Deep Content Understanding Toward Entity and Aspect Target Sentiment Analysis on Foundation Models | Vorakit Vorakitphan et.al. | 2407.04050 | link |
2024-07-04 | ASteISR: Adapting Single Image Super-resolution Pre-trained Model for Efficient Stereo Image Super-resolution | Yuanbo Zhou et.al. | 2407.03598 | null |
2024-07-03 | Knowledge Composition using Task Vectors with Learned Anisotropic Scaling | Frederic Z. Zhang et.al. | 2407.02880 | link |
2024-07-03 | Exploring the Capabilities of LLMs for Code Change Related Tasks | Lishui Fan et.al. | 2407.02824 | link |
2024-07-02 | FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs | Haodong Chen et.al. | 2407.02157 | null |
2024-07-02 | CatMemo at the FinLLM Challenge Task: Fine-Tuning Large Language Models using Data Fusion in Financial Applications | Yupeng Cao et.al. | 2407.01953 | null |
2024-07-05 | Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | Zihan Wang et.al. | 2407.01906 | link |
2024-07-01 | A Fingerprint for Large Language Models | Zhiguang Yang et.al. | 2407.01235 | null |
2024-07-02 | Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images | Wenqiang Zu et.al. | 2407.01003 | link |
2024-06-25 | Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning | Arijit Sehanobish et.al. | 2406.17740 | null |
2024-06-19 | Parameter Training Efficiency Aware Resource Allocation for AIGC in Space-Air-Ground Integrated Networks | Liangxin Qian et.al. | 2406.13602 | null |
2024-06-19 | Sparse High Rank Adapters | Kartikeya Bhardwaj et.al. | 2406.13175 | null |
2024-06-18 | Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates | Cristian Meo et.al. | 2406.13046 | null |
2024-06-18 | Fighting Randomness with Randomness: Mitigating Optimisation Instability of Fine-Tuning using Delayed Ensemble and Noisy Interpolation | Branislav Pecher et.al. | 2406.12471 | link |
2024-06-17 | A Semantic-based Layer Freezing Approach to Efficient Fine-Tuning of Language Models | Jian Gu et.al. | 2406.11753 | null |
2024-06-16 | ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts | Samar Khanna et.al. | 2406.10973 | null |
2024-06-16 | ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Yurun Song et.al. | 2406.10785 | null |
2024-06-16 | RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning | Haoyu Wang et.al. | 2406.10777 | null |
2024-06-15 | Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models | Ruchao Fan et.al. | 2406.10507 | link |
2024-06-15 | Personalized Pieces: Efficient Personalized Large Language Models through Collaborative Efforts | Zhaoxuan Tan et.al. | 2406.10471 | link |
2024-06-13 | Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models | Lukas Thede et.al. | 2406.09384 | null |
2024-06-12 | Exploring Fact Memorization and Style Imitation in LLMs Using QLoRA: An Experimental Study and Quality Assessment Methods | Eugene Vyborov et.al. | 2406.08582 | null |
2024-06-12 | The Impact of Initialization on LoRA Finetuning Dynamics | Soufiane Hayou et.al. | 2406.08447 | null |
2024-06-20 | Low-Rank Quantization-Aware Training for LLMs | Yelysei Bondarenko et.al. | 2406.06385 | link |
2024-06-10 | A Parameter-efficient Language Extension Framework for Multilingual ASR | Wei Liu et.al. | 2406.06329 | null |
2024-06-09 | A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Automated Program Repair | Guochang Li et.al. | 2406.05639 | link |
2024-06-07 | Efficient Differentially Private Fine-Tuning of Diffusion Models | Jing Liu et.al. | 2406.05257 | null |
2024-06-07 | CorDA: Context-Oriented Decomposition Adaptation of Large Language Models | Yibo Yang et.al. | 2406.05223 | link |
2024-06-07 | An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models | Xiongtao Zhou et.al. | 2406.05130 | link |
2024-06-07 | MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | Jitai Hao et.al. | 2406.04984 | link |
2024-06-06 | Time Sensitive Knowledge Editing through Efficient Finetuning | Xiou Ge et.al. | 2406.04496 | link |
2024-06-06 | VHDL-Eval: A Framework for Evaluating Large Language Models in VHDL Code Generation | Prashanth Vijayaraghavan et.al. | 2406.04379 | null |
2024-06-10 | Hypernetworks for Personalizing ASR to Atypical Speech | Max Müller-Eberstein et.al. | 2406.04240 | null |
2024-06-06 | Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning | Naibin Gu et.al. | 2406.03792 | link |
2024-06-05 | Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need | Martin Wistuba et.al. | 2406.03216 | null |
2024-06-06 | Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision | Minglei Li et.al. | 2406.03051 | null |
2024-05-31 | Mamba State-Space Models Can Be Strong Downstream Learners | John T. Halloran et.al. | 2406.00209 | null |
2024-05-30 | ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections | Massimo Bini et.al. | 2405.20271 | link |
2024-05-30 | SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors | Vijay Lingam et.al. | 2405.19597 | link |
2024-05-29 | MemControl: Mitigating Memorization in Medical Diffusion Models via Automated Parameter Selection | Raman Dutt et.al. | 2405.19458 | link |
2024-05-29 | MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning | Junjie Wang et.al. | 2405.18897 | link |
2024-05-29 | Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation | Zelin Peng et.al. | 2405.18840 | null |
2024-06-01 | Low-Rank Few-Shot Adaptation of Vision-Language Models | Maxime Zanella et.al. | 2405.18541 | null |
2024-05-28 | Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning | Renzhi Wang et.al. | 2405.18292 | null |
2024-05-28 | VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections | Roy Miles et.al. | 2405.17991 | link |
2024-05-28 | Sparsity- and Hybridity-Inspired Visual Parameter-Efficient Fine-Tuning for Medical Diagnosis | Mingyuan Liu et.al. | 2405.17877 | null |
2024-05-27 | LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | Klaudia Bałazy et.al. | 2405.17604 | link |
2024-05-23 | EMR-Merging: Tuning-Free High-Performance Model Merging | Chenyu Huang et.al. | 2405.17461 | link |
2024-05-28 | DoRA: Enhancing Parameter-Efficient Fine-Tuning with Dynamic Rank Distribution | Yulong Mao et.al. | 2405.17357 | link |
2024-05-27 | | Runqian Wang et.al. | 2405.17258 | null |
2024-05-30 | Sparse Matrix in Large Language Model Fine-tuning | Haoze He et.al. | 2405.15525 | null |
2024-05-24 | Prompt Tuning Strikes Back: Customizing Foundation Models with Low-Rank Prompt Adaptation | Abhinav Jain et.al. | 2405.15282 | link |
2024-05-27 | VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks | Yang Li et.al. | 2405.15179 | link |
2024-05-23 | Bitune: Bidirectional Instruction-Tuning | Dawid J. Kopiczko et.al. | 2405.14862 | null |
2024-05-23 | Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference | Ting Liu et.al. | 2405.14700 | link |
2024-05-22 | Spectral Adapter: Fine-Tuning in Spectral Space | Fangzhao Zhang et.al. | 2405.13952 | link |
2024-05-24 | MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models | Jingwei Xu et.al. | 2405.13053 | link |
2024-05-20 | FeTT: Continual Class Incremental Learning via Feature Transformation Tuning | Sunyuan Qiang et.al. | 2405.11822 | null |
2024-05-21 | HARIS: Human-Like Attention for Reference Image Segmentation | Mengxi Zhang et.al. | 2405.10707 | null |
2024-05-28 | DP-DyLoRA: Fine-Tuning Transformer-Based Models On-Device under Differentially Private Federated Learning using Dynamic Low-Rank Adaptation | Jie Xu et.al. | 2405.06368 | null |
2024-05-09 | Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection | Bhawesh Kumar et.al. | 2405.06093 | null |
2024-05-09 | Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning | Shibo Jie et.al. | 2405.05615 | link |
2024-05-07 | Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning | Karim Galliamov et.al. | 2405.04126 | link |
2024-05-04 | Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning | Jing Xu et.al. | 2405.02596 | link |
2024-03-16 | Empirical Studies of Parameter Efficient Methods for Large Language Models of Code and Knowledge Transfer to R | Amirreza Esmaeili et.al. | 2405.01553 | null |
2024-05-02 | NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | Gerald Shen et.al. | 2405.01481 | link |
2024-04-29 | LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report | Justin Zhao et.al. | 2405.00732 | link |
2024-05-01 | Investigating Automatic Scoring and Feedback using Large Language Models | Gloria Ashiya Katuka et.al. | 2405.00602 | null |
2024-05-01 | MoPEFT: A Mixture-of-PEFTs for the Segment Anything Model | Rajat Sahay et.al. | 2405.00293 | null |
2024-04-30 | SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models | Samir Arora et.al. | 2405.00201 | null |
2024-05-23 | HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning | Chunlin Tian et.al. | 2404.19245 | link |
2024-05-25 | FeDeRA: Efficient Fine-tuning of Language Models in Federated Learning Leveraging Weight Decomposition | Yuxuan Yan et.al. | 2404.18848 | null |
2024-04-25 | Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Pre-trained Models | Jiawei Chen et.al. | 2404.16385 | null |
2024-05-23 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | Dengchun Li et.al. | 2404.15159 | link |
2024-04-22 | ColA: Collaborative Adaptation with Gradient Learning | Enmao Diao et.al. | 2404.13844 | link |
2024-04-23 | Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications | Charith Chandra Sai Balne et.al. | 2404.13506 | null |
2024-04-18 | SKIP: Skill-Localized Prompt Tuning for Inference Speed Boost-Up | Nakyeong Yang et.al. | 2404.11916 | null |
2024-04-16 | Shears: Unstructured Sparsity with Neural Low-rank Adapter Search | J. Pablo Muñoz et.al. | 2404.10934 | link |
2024-04-16 | Exact and Efficient Unlearning for Large Language Model-based Recommendation | Zhiyu Hu et.al. | 2404.10327 | null |
2024-04-15 | LoRA Dropout as a Sparsity Regularizer for Overfitting Control | Yang Lin et.al. | 2404.09610 | null |
2024-04-21 | Analyzing the Impact of Data Selection and Fine-Tuning on Economic and Political Biases in LLMs | Ahmed Agiza et.al. | 2404.08699 | link |
2024-04-08 | Certified PEFTSmoothing: Parameter-Efficient Fine-Tuning with Randomized Smoothing | Chengyan Fu et.al. | 2404.05350 | null |
2024-04-08 | DLoRA: Distributed Parameter-Efficient Fine-Tuning Solution for Large Language Model | Chao Gao et.al. | 2404.05182 | null |
2024-04-12 | Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models | Zhiyuan Peng et.al. | 2404.04522 | null |
2024-04-05 | Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation | Tong Su et.al. | 2404.04212 | null |
2024-05-22 | ReFT: Representation Finetuning for Language Models | Zhengxuan Wu et.al. | 2404.03592 | link |
2024-06-11 | Personalized LLM Response Generation with Parameterized Memory Injection | Kai Zhang et.al. | 2404.03565 | null |
2024-06-20 | Eigenpruning: an Interpretability-Inspired PEFT Method | Tomás Vergara-Browne et.al. | 2404.03147 | link |
2024-05-28 | PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models | Fanxu Meng et.al. | 2404.02948 | link |
2024-04-03 | Enhancing Low-Resource LLMs Classification with PEFT and Synthetic Data | Parth Patwa et.al. | 2404.02422 | null |
2024-04-11 | IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT | Junchen Fu et.al. | 2404.02059 | link |
2024-03-31 | Query-driven Relevant Paragraph Extraction from Legal Judgments | T. Y. S. S Santosh et.al. | 2404.00595 | null |
2024-03-30 | Edinburgh Clinical NLP at SemEval-2024 Task 2: Fine-tune your model unless you have access to GPT-4 | Aryo Pradipta Gema et.al. | 2404.00484 | link |
2024-04-03 | InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning | Yan-Shuo Liang et.al. | 2404.00228 | link |
2024-03-27 | Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation | Mateusz Klimaszewski et.al. | 2403.18804 | link |
2024-03-26 | The Unreasonable Ineffectiveness of the Deeper Layers | Andrey Gromov et.al. | 2403.17887 | null |
2024-04-15 | ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models | Zequan Liu et.al. | 2403.16187 | null |
2024-03-22 | KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable Adaptation | Xindi Luo et.al. | 2403.14950 | link |
2024-03-22 | A Single Linear Layer Yields Task-Adapted Low-Rank Matrices | Hwichan Kim et.al. | 2403.14946 | null |
2024-03-21 | AutoRE: Document-Level Relation Extraction with Large Language Models | Xue Lilong et.al. | 2403.14888 | link |
2024-04-29 | Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey | Zeyu Han et.al. | 2403.14608 | null |
2024-03-20 | Harnessing Large Language Models for Text-Rich Sequential Recommendation | Zhi Zheng et.al. | 2403.13325 | link |
2024-04-16 | AFLoRA: Adaptive Freezing of Low Rank Adaptation in Parameter Efficient Fine-Tuning of Large Models | Zeyu Liu et.al. | 2403.13269 | null |
2024-03-18 | Improving LoRA in Privacy-preserving Federated Learning | Youbang Sun et.al. | 2403.12313 | null |
2024-03-18 | Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Wangbo Zhao et.al. | 2403.11808 | link |
2024-03-18 | Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model | Haoyun Xu et.al. | 2403.11621 | null |
2024-03-19 | JORA: JAX Tensor-Parallel LoRA Library for Retrieval Augmented Fine-Tuning | Anique Tahir et.al. | 2403.11366 | link |
2024-03-14 | Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks | Tingyu Qu et.al. | 2403.09377 | link |
2024-03-14 | PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation | Yizhe Xiong et.al. | 2403.09192 | link |
2024-03-13 | Data-oriented Dynamic Fine-tuning Parameter Selection Strategy for FISH Mask based Efficient Fine-tuning | Ming Dong et.al. | 2403.08484 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-12-19 | LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis | Hanlin Wang et.al. | 2412.15214 | null |
2024-12-19 | Flowing from Words to Pixels: A Framework for Cross-Modality Evolution | Qihao Liu et.al. | 2412.15213 | null |
2024-12-19 | Generative Multiview Relighting for 3D Reconstruction under Extreme Illumination Variation | Hadi Alzayer et.al. | 2412.15211 | null |
2024-12-19 | AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation | Moayed Haji-Ali et.al. | 2412.15191 | null |
2024-12-19 | LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation | Weijia Shi et.al. | 2412.15188 | null |
2024-12-19 | Tiled Diffusion | Or Madar et.al. | 2412.15185 | null |
2024-12-19 | SqueezeMe: Efficient Gaussian Avatars for VR | Shunsuke Saito et.al. | 2412.15171 | null |
2024-12-19 | OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization | Jiacheng Zhang et.al. | 2412.15159 | null |
2024-12-19 | Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM | Yatai Ji et.al. | 2412.15156 | link |
2024-12-19 | Jet: A Modern Transformer-Based Normalizing Flow | Alexander Kolesnikov et.al. | 2412.15129 | null |
2024-12-19 | Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation | Yang Tian et.al. | 2412.15109 | null |
2024-12-19 | Learning Disentangled Equivariant Representation for Explicitly Controllable 3D Molecule Generation | Haoran Liu et.al. | 2412.15086 | null |
2024-12-19 | Eigenstate Preparation on Quantum Computers | Joey Bonitati et.al. | 2412.15081 | null |
2024-12-19 | Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion | Zhifei Chen et.al. | 2412.15050 | null |
2024-12-19 | DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space | Mang Ning et.al. | 2412.15032 | link |
2024-12-18 | AniDoc: Animation Creation Made Easier | Yihao Meng et.al. | 2412.14173 | null |
2024-12-19 | E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling | Zhihang Yuan et.al. | 2412.14170 | null |
2024-12-18 | Autoregressive Video Generation without Vector Quantization | Haoge Deng et.al. | 2412.14169 | link |
2024-12-18 | VideoDPO: Omni-Preference Alignment for Video Diffusion Generation | Runtao Liu et.al. | 2412.14167 | null |
2024-12-18 | MetaMorph: Multimodal Understanding and Generation via Instruction Tuning | Shengbang Tong et.al. | 2412.14164 | null |
2024-12-18 | MCMat: Multiview-Consistent and Physically Accurate PBR Material Generation | Shenhao Zhu et.al. | 2412.14148 | null |
2024-12-18 | Event-based Photometric Bundle Adjustment | Shuang Guo et.al. | 2412.14111 | null |
2024-12-18 | Future Research Avenues for Artificial Intelligence in Digital Gaming: An Exploratory Report | Markus Dablander et.al. | 2412.14085 | null |
2024-12-18 | SurgSora: Decoupled RGBD-Flow Diffusion Model for Controllable Surgical Video Generation | Tong Chen et.al. | 2412.14018 | null |
2024-12-18 | Comparative Analysis of Machine Learning-Based Imputation Techniques for Air Quality Datasets with High Missing Data Rates | Sen Yan et.al. | 2412.13966 | null |
2024-12-18 | A Rose by Any Other Name: LLM-Generated Explanations Are Good Proxies for Human Explanations to Collect Label Distributions on NLI | Beiduo Chen et.al. | 2412.13942 | null |
2024-12-18 | Development of a High-Resolution, High-Dynamic-Range Charge Detector for Ion Beam Monitoring | O. Adriani et.al. | 2412.13934 | null |
2024-12-18 | Investigating the Effects of Diffusion-based Conditional Generative Speech Models Used for Speech Enhancement on Dysarthric Speech | Joanna Reszka et.al. | 2412.13933 | null |
2024-12-18 | Graph-Driven Models for Gas Mixture Identification and Concentration Estimation on Heterogeneous Sensor Array Signals | Ding Wang et.al. | 2412.13891 | null |
2024-12-18 | Navigating limitations with precision: A fine-grained ensemble approach to wrist pathology recognition on a limited x-ray dataset | Ammar Ahmed et.al. | 2412.13884 | null |
2024-12-17 | CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models | Gaoyang Zhang et.al. | 2412.13195 | link |
2024-12-17 | StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models | Yunzhi Yan et.al. | 2412.13188 | null |
2024-12-17 | Move-in-2D: 2D-Conditioned Human Motion Generation | Hsin-Ping Huang et.al. | 2412.13185 | null |
2024-12-17 | F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration | Lu Liu et.al. | 2412.13155 | null |
2024-12-17 | Prompt Augmentation for Self-supervised Text-guided Image Manipulation | Rumeysa Bodur et.al. | 2412.13081 | null |
2024-12-17 | 3D MedDiffusion: A 3D Medical Diffusion Model for Controllable and High-quality Medical Image Generation | Haoshen Wang et.al. | 2412.13059 | null |
2024-12-17 | Guiding Generative Protein Language Models with Reinforcement Learning | Filippo Stocco et.al. | 2412.12979 | null |
2024-12-18 | Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance | Wenhao Sun et.al. | 2412.12974 | link |
2024-12-17 | ArchesWeather & ArchesWeatherGen: a deterministic and generative model for efficient ML weather forecasting | Guillaume Couairon et.al. | 2412.12971 | link |
2024-12-17 | Modified UNIFAC 2.0 -- A Group-Contribution Method Completed with Machine Learning | Nicolas Hayer et.al. | 2412.12962 | null |
2024-12-17 | MOPO: Multi-Objective Prompt Optimization for Affective Text Generation | Yarik Menchaca Resendiz et.al. | 2412.12948 | null |
2024-12-17 | Generation of cosmic ray trajectories by a Diffusion Model trained on test particles in 3D magnetohydrodynamic turbulence | Johannes Martin et.al. | 2412.12923 | null |
2024-12-17 | Unsupervised Region-Based Image Editing of Denoising Diffusion Models | Zixiang Li et.al. | 2412.12912 | null |
2024-12-18 | ArtAug: Enhancing Text-to-Image Generation through Synthesis-Understanding Interaction | Zhongjie Duan et.al. | 2412.12888 | link |
2024-12-17 | Memory-minimal quantum generation of stochastic processes: spectral invariants of quantum hidden Markov models | Magdalini Zonnios et.al. | 2412.12812 | null |
2024-12-16 | Causal Diffusion Transformers for Generative Modeling | Chaorui Deng et.al. | 2412.12095 | link |
2024-12-16 | CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models | Felix Taubner et.al. | 2412.12093 | null |
2024-12-16 | Wonderland: Navigating 3D Scenes from a Single Image | Hanwen Liang et.al. | 2412.12091 | null |
2024-12-16 | A LoRA is Worth a Thousand Pictures | Chenxi Liu et.al. | 2412.12048 | null |
2024-12-16 | LLMs for Cold-Start Cutting Plane Separator Configuration | Connor Lawless et.al. | 2412.12038 | null |
2024-12-16 | Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps | Linfeng Zhao et.al. | 2412.12024 | null |
2024-12-16 | The entropic optimal (self-)transport problem: Limit distributions for decreasing regularization with application to score function estimation | Gilles Mordant et.al. | 2412.12007 | null |
2024-12-16 | Controllable Shadow Generation with Single-Step Diffusion Models from Synthetic Data | Onur Tasar et.al. | 2412.11972 | null |
2024-12-16 | The Erdős unit distance problem for small point sets | Boris Alexeev et.al. | 2412.11914 | null |
2024-12-16 | CharacterBench: Benchmarking Character Customization of Large Language Models | Jinfeng Zhou et.al. | 2412.11912 | link |
2024-12-16 | Towards Understanding Systems Trade-offs in Retrieval-Augmented Generation Model Inference | Michael Shen et.al. | 2412.11854 | null |
2024-12-16 | ColorFlow: Retrieval-Augmented Image Sequence Colorization | Junhao Zhuang et.al. | 2412.11815 | null |
2024-12-16 | InterDyn: Controllable Interactive Dynamics with Video Diffusion Models | Rick Akkerman et.al. | 2412.11785 | null |
2024-12-16 | Joint Reconstruction of the Activity and the Attenuation in PET by Diffusion Posterior Sampling: a Feasibility Study | Clémentine Phung-Ngoc et.al. | 2412.11776 | null |
2024-12-17 | No More Adam: Learning Rate Scaling at Initialization is All You Need | Minghao Xu et.al. | 2412.11768 | link |
2024-12-13 | Towards a foundation model for heavy-ion collision experiments through point cloud diffusion | Manjunath Omana Kuttan et.al. | 2412.10352 | null |
2024-12-13 | BrushEdit: All-In-One Image Inpainting and Editing | Yaowei Li et.al. | 2412.10316 | null |
2024-12-13 | Iterating the Transient Light Transport Matrix for Non-Line-of-Sight Imaging | Talha Sultan et.al. | 2412.10300 | null |
2024-12-13 | Coherent 3D Scene Diffusion From a Single RGB Image | Manuel Dahnert et.al. | 2412.10294 | null |
2024-12-13 | Adversarial Robustness of Bottleneck Injected Deep Neural Networks for Task-Oriented Communication | Alireza Furutanpey et.al. | 2412.10265 | null |
2024-12-13 | Targeted Angular Reversal of Weights (TARS) for Knowledge Removal in Large Language Models | Harry J. Davies et.al. | 2412.10257 | null |
2024-12-13 | Exploring the Frontiers of Animation Video Generation in the Sora Era: Method, Dataset and Benchmark | Yudong Jiang et.al. | 2412.10255 | null |
2024-12-13 | Radiator Tailoring for Enhanced Performance in InAs-Based Near-Field Thermophotovoltaics | Mathieu Giroux et.al. | 2412.10217 | null |
2024-12-13 | GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion | Jiapeng Tang et.al. | 2412.10209 | null |
2024-12-13 | Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | Jaehyeon Kim et.al. | 2412.10208 | null |
2024-12-13 | Simple Guidance Mechanisms for Discrete Diffusion Models | Yair Schiff et.al. | 2412.10193 | link |
2024-12-13 | SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models | Hung Nguyen et.al. | 2412.10178 | null |
2024-12-13 | Learning payoffs while routing in skill-based queues | Sanne van Kempen et.al. | 2412.10168 | null |
2024-12-13 | The Art of Deception: Color Visual Illusions and Diffusion Models | Alex Gomez-Villa et.al. | 2412.10122 | null |
2024-12-13 | Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data | Jonas Golde et.al. | 2412.10121 | null |
2024-12-12 | FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion | Haonan Qiu et.al. | 2412.09626 | null |
2024-12-12 | Illusion3D: 3D Multiview Illusion with 2D Diffusion Priors | Yue Feng et.al. | 2412.09625 | null |
2024-12-12 | GenEx: Generating an Explorable World | Taiming Lu et.al. | 2412.09624 | null |
2024-12-12 | OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation | Weiqi Li et.al. | 2412.09623 | null |
2024-12-12 | LoRACLR: Contrastive Adaptation for Customization of Diffusion Models | Enis Simsar et.al. | 2412.09622 | null |
2024-12-12 | SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training | Dongting Hu et.al. | 2412.09619 | null |
2024-12-12 | EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Zhuofan Zong et.al. | 2412.09618 | null |
2024-12-12 | Context Canvas: Enhancing Text-to-Image Diffusion Models with Knowledge Graph-Based RAG | Kavana Venkatesh et.al. | 2412.09614 | null |
2024-12-13 | Olympus: A Universal Task Router for Computer Vision Tasks | Yuanze Lin et.al. | 2412.09612 | link |
2024-12-12 | Owl-1: Omni World Model for Consistent Long Video Generation | Yuanhui Huang et.al. | 2412.09600 | link |
2024-12-12 | LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors | Yabo Chen et.al. | 2412.09597 | null |
2024-12-12 | Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion | Zexin He et.al. | 2412.09593 | null |
2024-12-12 | Improving the Reliability of Cable Broadband Networks via Proactive Network Maintenance | Jiyao Hu et.al. | 2412.09564 | null |
2024-12-12 | Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale | Zekun Hao et.al. | 2412.09548 | null |
2024-12-12 | SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing | Xueting Li et.al. | 2412.09545 | null |
2024-12-11 | Generative Semantic Communication: Architectures, Technologies, and Applications | Jinke Ren et.al. | 2412.08642 | null |
2024-12-11 | DMin: Scalable Training Data Influence Estimation for Diffusion Models | Huawei Lin et.al. | 2412.08637 | link |
2024-12-11 | Multimodal Latent Language Modeling with Next-Token Diffusion | Yutao Sun et.al. | 2412.08635 | link |
2024-12-11 | An SDR-Based Monostatic Wi-Fi System with Analog Self-Interference Cancellation for Sensing | Andreas Toftegaard Kristensen et.al. | 2412.08612 | null |
2024-12-12 | Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis | Feng Zhou et.al. | 2412.08603 | null |
2024-12-11 | TryOffAnyone: Tiled Cloth Generation from a Dressed Person | Ioannis Xarchakos et.al. | 2412.08573 | link |
2024-12-12 | Watermarking Training Data of Music Generation Models | Pascal Epple et.al. | 2412.08549 | null |
2024-12-11 | Orderly Management of Packets in RDMA by Eunomia | Sana Mahmood et.al. | 2412.08540 | null |
2024-12-11 | Ensemble-Based Quantum-Token Protocol Benchmarked on IBM Quantum Processors | Lucas Tsunaki et.al. | 2412.08530 | null |
2024-12-11 | Comparative Opinion Mining in Product Reviews: Multi-perspective Prompt-based Learning | Hai-Yen Thi Nguyen et.al. | 2412.08508 | null |
2024-12-11 | Open-Loop and Model Predictive Control for Electric Vehicle Charging to Manage Excess Renewable Energy Supply in Texas | Kelsey M. Nelson et.al. | 2412.08505 | null |
2024-12-11 | Learning Flow Fields in Attention for Controllable Person Image Generation | Zijian Zhou et.al. | 2412.08486 | link |
2024-12-11 | InvDiff: Invariant Guidance for Bias Mitigation in Diffusion Models | Min Hou et.al. | 2412.08480 | link |
2024-12-11 | CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis | Mu Zhang et.al. | 2412.08464 | null |
2024-12-11 | Federated Learning for Traffic Flow Prediction with Synthetic Data Augmentation | Fermin Orozco et.al. | 2412.08460 | null |
2024-12-10 | Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets | Zhen Liu et.al. | 2412.07775 | null |
2024-12-10 | UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics | Xi Chen et.al. | 2412.07774 | null |
2024-12-10 | From Slow Bidirectional to Fast Causal Video Generators | Tianwei Yin et.al. | 2412.07772 | null |
2024-12-10 | Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds | Xiaoyu Xiang et.al. | 2412.07766 | null |
2024-12-10 | Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences | Alan Nawzad Amin et.al. | 2412.07763 | link |
2024-12-10 | Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation | Jingxi Chen et.al. | 2412.07761 | null |
2024-12-10 | SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints | Jianhong Bai et.al. | 2412.07760 | link |
2024-12-10 | PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation | Fatemeh Nazarieh et.al. | 2412.07754 | null |
2024-12-10 | Multi-Shot Character Consistency for Text-to-Video Generation | Yuval Atzmon et.al. | 2412.07750 | null |
2024-12-10 | StyleMaster: Stylize Your Video with Artistic Generation and Translation | Zixuan Ye et.al. | 2412.07744 | null |
2024-12-10 | STIV: Scalable Text and Image Conditioned Video Generation | Zongyu Lin et.al. | 2412.07730 | null |
2024-12-10 | ObjCtrl-2.5D: Training-free Object Control with Camera Poses | Zhouxia Wang et.al. | 2412.07721 | null |
2024-12-10 | ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer | Jinyi Hu et.al. | 2412.07720 | link |
2024-12-10 | Privacy-Preserving Customer Support: A Framework for Secure and Scalable Interactions | Anant Prakash Awasthi et.al. | 2412.07687 | null |
2024-12-10 | Optimizing Sensor Redundancy in Sequential Decision-Making Problems | Jonas Nüßlein et.al. | 2412.07686 | null |
2024-12-10 | [MASK] is All You Need | Vincent Tao Hu et.al. | 2412.06787 | link |
2024-12-09 | Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation | Ruihan Gao et.al. | 2412.06785 | link |
2024-12-09 | Diverse Score Distillation | Yanbo Xu et.al. | 2412.06780 | null |
2024-12-09 | Visual Lexicon: Rich Image Features in Language Space | XuDong Wang et.al. | 2412.06774 | null |
2024-12-09 | InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention | Howard Zhang et.al. | 2412.06753 | null |
2024-12-09 | ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities | Adhiraj Ghosh et.al. | 2412.06745 | null |
2024-12-10 | ContRail: A Framework for Realistic Railway Image Synthesis using ControlNet | Andrei-Robert Alexandrescu et.al. | 2412.06742 | null |
2024-12-09 | Take Fake as Real: Realistic-like Robust Black-box Adversarial Attack to Evade AIGC Detection | Caiyun Xie et.al. | 2412.06727 | link |
2024-12-09 | You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale | Baorui Ma et.al. | 2412.06699 | link |
2024-12-09 | Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy | Yuxuan Xue et.al. | 2412.06698 | null |
2024-12-09 | Diff5T: Benchmarking Human Brain Diffusion MRI with an Extensive 5.0 Tesla K-Space and Spatial Dataset | Shanshan Wang et.al. | 2412.06666 | null |
2024-12-09 | Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion | Shuaiting Li et.al. | 2412.06661 | null |
2024-12-09 | MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences | Weitao Wang et.al. | 2412.06614 | null |
2024-12-09 | Augmented reality for upper limb rehabilitation: real-time kinematic feedback with HoloLens 2 | Beatrice Luciani et.al. | 2412.06596 | null |
2024-12-09 | EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations | Weizhen Bian et.al. | 2412.06581 | null |
2024-12-06 | Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model | Lening Wang et.al. | 2412.05280 | link |
2024-12-06 | Perturb-and-Revise: Flexible 3D Editing with Generative Trajectories | Susung Hong et.al. | 2412.05279 | null |
2024-12-06 | Birth and Death of a Rose | Chen Geng et.al. | 2412.05278 | null |
2024-12-06 | MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models | Tuna Han Salih Meral et.al. | 2412.05275 | null |
2024-12-06 | Go-or-Grow Models in Biology: a Monster on a Leash | R. Thiessen et.al. | 2412.05191 | null |
2024-12-06 | Privacy Drift: Evolving Privacy Concerns in Incremental Learning | Sayyed Farid Ahamed et.al. | 2412.05183 | null |
2024-12-06 | DNF: Unconditional 4D Generation with Dictionary-based Neural Fields | Xinyi Zhang et.al. | 2412.05161 | null |
2024-12-06 | A text-to-tabular approach to generate synthetic patient data using LLMs | Margaux Tornqvist et.al. | 2412.05153 | link |
2024-12-06 | LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation | Donald Shenaj et.al. | 2412.05148 | null |
2024-12-06 | How to Squeeze An Explanation Out of Your Model | Tiago Roxo et.al. | 2412.05134 | null |
2024-12-06 | Probabilistic Galaxy Field Generation with Diffusion Models | Tanner Sether et.al. | 2412.05131 | null |
2024-12-06 | The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation | Ruoyu Wang et.al. | 2412.05101 | null |
2024-12-06 | Reconstructing Quantitative Cerebral Perfusion Images Directly From Measured Sinogram Data Acquired Using C-arm Cone-Beam CT | Haotian Zhao et.al. | 2412.05084 | null |
2024-12-06 | ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration | Chi-Wei Hsiao et.al. | 2412.05043 | null |
2024-12-06 | Get It Right: Improving Comprehensibility with Adaptable Speech Expression of a Humanoid Service Robot | Thomas Sievers et.al. | 2412.05022 | null |
2024-12-05 | PaintScene4D: Consistent 4D Scene Generation from Text Prompts | Vinayak Gupta et.al. | 2412.04471 | null |
2024-12-05 | LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors | Yusuf Dalva et.al. | 2412.04460 | null |
2024-12-05 | Four-Plane Factorized Video Autoencoders | Mohammed Suhail et.al. | 2412.04452 | null |
2024-12-05 | MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation | Longtao Zheng et.al. | 2412.04448 | null |
2024-12-05 | DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models | Yizhuo Li et.al. | 2412.04446 | null |
2024-12-05 | Learning Artistic Signatures: Symmetry Discovery and Style Transfer | Emma Finn et.al. | 2412.04441 | null |
2024-12-05 | GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration | Kaiyi Huang et.al. | 2412.04440 | null |
2024-12-05 | Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation | Yuying Ge et.al. | 2412.04432 | link |
2024-12-05 | Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis | Jian Han et.al. | 2412.04431 | link |
2024-12-05 | Reversible molecular simulation for training classical and machine learning force fields | Joe G Greener et.al. | 2412.04374 | link |
2024-12-05 | Machine Theory of Mind for Autonomous Cyber-Defence | Luke Swaby et.al. | 2412.04367 | null |
2024-12-05 | ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation | Dayoung Gong et.al. | 2412.04353 | null |
2024-12-05 | RMD: A Simple Baseline for More General Human Motion Generation via Training-free Retrieval-Augmented Motion Diffuse | Zhouyingcheng Liao et.al. | 2412.04343 | null |
2024-12-05 | Likelihood-Scheduled Score-Based Generative Modeling for Fully 3D PET Image Reconstruction | George Webber et.al. | 2412.04339 | null |
2024-12-05 | Multi-Subject Image Synthesis as a Generative Prior for Single-Subject PET Image Reconstruction | George Webber et.al. | 2412.04324 | null |
2024-12-04 | Navigation World Models | Amir Bar et.al. | 2412.03572 | null |
2024-12-04 | MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation | Zehuan Huang et.al. | 2412.03558 | null |
2024-12-04 | NODE-AdvGAN: Improving the transferability and perceptual similarity of adversarial examples by dynamic-system-driven adversarial generative model | Xinheng Xie et.al. | 2412.03539 | null |
2024-12-04 | NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images | Lingen Li et.al. | 2412.03517 | null |
2024-12-04 | Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion | Shengyuan Zhang et.al. | 2412.03515 | link |
2024-12-04 | Data Fusion of Semantic and Depth Information in the Context of Object Detection | Md Abu Yusuf et.al. | 2412.03490 | null |
2024-12-04 | Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective | Neta Shaul et.al. | 2412.03487 | null |
2024-12-04 | Pre-trained Multiple Latent Variable Generative Models are good defenders against Adversarial Attacks | Dario Serez et.al. | 2412.03453 | link |
2024-12-04 | CleanDIFT: Diffusion Features without Noise | Nick Stracke et.al. | 2412.03439 | link |
2024-12-04 | SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model | Yan Li et.al. | 2412.03430 | null |
2024-12-04 | Skel3D: Skeleton Guided Novel View Synthesis | Aron Fóthi et.al. | 2412.03407 | null |
2024-12-04 | Identifiability implies consistency of MLE in partially observed diffusions on a torus | Ibrahim Ekren et.al. | 2412.03380 | null |
2024-12-04 | TASR: Timestep-Aware Diffusion Model for Image Super-Resolution | Qinwei Lin et.al. | 2412.03355 | link |
2024-12-04 | DIVE: Taming DINO for Subject-Driven Video Editing | Yi Huang et.al. | 2412.03347 | null |
2024-12-04 | Geometry-guided Cross-view Diffusion for One-to-many Cross-view Image Synthesis | Tao Jun Lin et.al. | 2412.03315 | null |
2024-12-03 | Motion Prompting: Controlling Video Generation with Motion Trajectories | Daniel Geng et.al. | 2412.02700 | null |
2024-12-03 | Diffusion-based Visual Anagram as Multi-task Learning | Zhiyuan Xu et.al. | 2412.02693 | link |
2024-12-03 | FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation | Kefan Chen et.al. | 2412.02690 | null |
2024-12-04 | SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance | Viet Nguyen et.al. | 2412.02687 | null |
2024-12-03 | AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction | Lingteng Qiu et.al. | 2412.02684 | null |
2024-12-03 | Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation | Yiftach Edelstein et.al. | 2412.02631 | null |
2024-12-03 | The effect of priors on Learning with Restricted Boltzmann Machines | Gianluca Manzan et.al. | 2412.02623 | null |
2024-12-03 | ComPair-2: A Next Generation Medium Energy Gamma-ray Telescope Prototype | Regina Caputo et.al. | 2412.02562 | null |
2024-12-03 | The Two-Center Problem of Uncertain Points on Cactus Graphs | Haitao Xu et.al. | 2412.02559 | null |
2024-12-03 | ShadowHack: Hacking Shadows via Luminance-Color Divide and Conquer | Jin Hu et.al. | 2412.02545 | link |
2024-12-03 | Unveiling Concept Attribution in Diffusion Models | Quang H. Nguyen et.al. | 2412.02542 | null |
2024-12-03 | LLMForecaster: Improving Seasonal Event Forecasts with Unstructured Textual Data | Hanyu Zhang et.al. | 2412.02525 | null |
2024-12-03 | GerPS-Compare: Comparing NER methods for legal norm analysis | Sarah T. Bachinger et.al. | 2412.02427 | null |
2024-12-03 | It Takes Two: Real-time Co-Speech Two-person's Interaction Generation via Reactive Auto-regressive Diffusion Model | Mingyi Shi et.al. | 2412.02419 | null |
2024-12-03 | A Multi-Agent Framework for Extensible Structured Text Generation in PLCs | Donghao Yang et.al. | 2412.02410 | null |
2024-11-29 | Nanostructured micrometric-pore membranes for nanofiltration: Micrometric geometry may optimize performance, energy efficiency and operational lifetime | J. C. Verde et.al. | 2411.19900 | null |
2024-11-29 | Input-Output Optics as a Causal Time Series Mapping: A Generative Machine Learning Solution | Abhijit Sen et.al. | 2411.19897 | null |
2024-11-29 | MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks | Yiming Wu et.al. | 2411.19786 | null |
2024-11-29 | Riemannian Denoising Score Matching for Molecular Structure Optimization with Accurate Energy | Jeheon Woo et.al. | 2411.19769 | null |
2024-11-29 | JetFormer: An Autoregressive Generative Model of Raw Images and Text | Michael Tschannen et.al. | 2411.19722 | null |
2024-11-29 | Inverse Design of Mechanical Metamaterials Using a Point-Cloud-Based Deep Generative Model | Seungwook Hong et.al. | 2411.19681 | null |
2024-11-29 | TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting | Bojun Xiong et.al. | 2411.19654 | null |
2024-11-29 | Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing | Wenyi Mo et.al. | 2411.19652 | link |
2024-11-29 | Enhancing Security in Third-Party Library Reuse -- Comprehensive Detection of 1-day Vulnerability through Code Patch Analysis | Shangzhi Xu et.al. | 2411.19648 | null |
2024-11-29 | Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings | Qiong Wu et.al. | 2411.19628 | link |
2024-11-29 | Unimib Assistant: designing a student-friendly RAG-based chatbot for all their needs | Chiara Antico et.al. | 2411.19554 | null |
2024-11-29 | Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook | Florinel-Alin Croitoru et.al. | 2411.19537 | link |
2024-11-29 | Quantized Delta Weight Is Safety Keeper | Yule Liu et.al. | 2411.19530 | null |
2024-12-02 | DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding | Jungbin Cho et.al. | 2411.19527 | null |
2024-11-29 | Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis | Tianqi Li et.al. | 2411.19509 | null |
2024-11-27 | Textured Gaussians for Enhanced 3D Scene Appearance Modeling | Brian Chao et.al. | 2411.18625 | null |
2024-11-27 | GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data | Wentao Wang et.al. | 2411.18624 | null |
2024-11-27 | Diffusion Self-Distillation for Zero-Shot Customized Image Generation | Shengqu Cai et.al. | 2411.18616 | null |
2024-11-27 | CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models | Rundi Wu et.al. | 2411.18613 | null |
2024-11-27 | Evaluating and Improving the Effectiveness of Synthetic Chest X-Rays for Medical Image Analysis | Eva Prakash et.al. | 2411.18602 | null |
2024-11-27 | Bit symmetry entails the symmetry of the quantum transition probability | Gerd Niestegge et.al. | 2411.18589 | null |
2024-11-27 | Building Confidence in Deep Generative Protein Design | Tianyuan Zheng et.al. | 2411.18568 | link |
2024-11-27 | High-throughput antibody screening with high-quality factor nanophotonics and bioprinting | Sajjad Abdollahramezani et.al. | 2411.18557 | null |
2024-11-27 | FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion | Haosen Yang et.al. | 2411.18552 | null |
2024-11-28 | Enhancing weed detection performance by means of GenAI-based image augmentation | Sourav Modak et.al. | 2411.18513 | null |
2024-11-27 | GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation | Pengfei Zhou et.al. | 2411.18499 | null |
2024-11-27 | Synthetic ECG Generation for Data Augmentation and Transfer Learning in Arrhythmia Classification | José Fernando Núñez et.al. | 2411.18456 | null |
2024-11-27 | Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator | Frederic Kirstein et.al. | 2411.18444 | null |
2024-11-27 | Learning the Evolution of Physical Structure of Galaxies via Diffusion Models | Andrew Lizarraga et.al. | 2411.18440 | link |
2024-11-27 | Search for heavy scalar or pseudoscalar states in | Laurids Jeppe et.al. | 2411.18414 | null |
2024-11-27 | StableAnimator: High-Quality Identity-Preserving Human Image Animation | Shuyuan Tu et.al. | 2411.17697 | link |
2024-11-26 | ScribbleLight: Single Image Indoor Relighting with Scribbles | Jun Myeong Choi et.al. | 2411.17696 | null |
2024-11-26 | Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | Akshita Gupta et.al. | 2411.17690 | null |
2024-11-26 | GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration | Sudarshan Rajagopalan et.al. | 2411.17687 | null |
2024-11-26 | Semi-analytical model for the calculation of solar radiation pressure and its effects on a LEO satellite with predicting the change in position vectors using machine learning techniques | Pranava Seth et.al. | 2411.17626 | null |
2024-11-26 | Accelerating Vision Diffusion Transformers with Skip Branches | Guanjie Chen et.al. | 2411.17616 | link |
2024-11-26 | Mixed-State Quantum Denoising Diffusion Probabilistic Model | Gino Kwun et.al. | 2411.17608 | null |
2024-11-26 | Making History Readable | Bipasha Banerjee et.al. | 2411.17600 | null |
2024-11-26 | VideoDirector: Precise Video Editing via Text-to-Video Models | Yukun Wang et.al. | 2411.17592 | null |
2024-11-26 | Rapid Deployment of Domain-specific Hyperspectral Image Processors with Application to Autonomous Driving | Jon Gutiérrez-Zaballa et.al. | 2411.17543 | null |
2024-11-26 | Metaverse Innovation Canvas: A Tool for Extended Reality Product/Service Development | Amir Reza Asadi et.al. | 2411.17541 | null |
2024-11-26 | IMPROVE: Improving Medical Plausibility without Reliance on Human Validation -- An Enhanced Prototype-Guided Diffusion Framework | Anurag Shandilya et.al. | 2411.17535 | null |
2024-11-26 | FTMoMamba: Motion Generation with Frequency and Text State Space Models | Chengjian Li et.al. | 2411.17532 | null |
2024-11-26 | Exact and Heuristic Approaches for the Covering Tour Location Routing Problem | Andreas Hagn et.al. | 2411.17510 | link |
2024-11-26 | WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model | Zongjian Li et.al. | 2411.17459 | link |
2024-11-25 | Generative Omnimatte: Learning to Decompose Video into Layers | Yao-Chih Lee et.al. | 2411.16683 | null |
2024-11-25 | Diffusion Features for Zero-Shot 6DoF Object Pose Estimation | Bernd Von Gimborn et.al. | 2411.16668 | null |
2024-11-25 | DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation | Zun Wang et.al. | 2411.16657 | null |
2024-11-25 | Exploring Discrete Flow Matching for 3D De Novo Molecule Generation | Ian Dunn et.al. | 2411.16644 | link |
2024-11-25 | LegoPET: Hierarchical Feature Guided Conditional Diffusion for PET Image Reconstruction | Yiran Sun et.al. | 2411.16629 | null |
2024-11-25 | Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models | Ronghuan Wu et.al. | 2411.16602 | null |
2024-11-25 | Unlocking The Potential of Adaptive Attacks on Diffusion-Based Purification | Andre Kassis et.al. | 2411.16598 | link |
2024-11-25 | Rethinking Diffusion for Text-Driven Human Motion Generation | Zichong Meng et.al. | 2411.16575 | null |
2024-11-25 | Representation Collapsing Problems in Vector Quantization | Wenhao Zhao et.al. | 2411.16550 | null |
2024-11-25 | ADOBI: Adaptive Diffusion Bridge For Blind Inverse Problems with Application to MRI Reconstruction | Yuyang Hu et.al. | 2411.16535 | null |
2024-11-25 | PriorPath: Coarse-To-Fine Approach for Controlled De-Novo Pathology Semantic Masks Generation | Nati Daniel et.al. | 2411.16515 | null |
2024-11-25 | Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis | Boming Miao et.al. | 2411.16503 | null |
2024-11-25 | Multi-Resolution Generative Modeling of Human Motion from Limited Data | David Eduardo Moreno-Villamarín et.al. | 2411.16498 | null |
2024-11-25 | Learning by Analogy: Enhancing Few-Shot Prompting for Math Word Problem Solving with Computational Graph-Based Retrieval | Xiaocong Yang et.al. | 2411.16454 | null |
2024-11-25 | Model-based reinforcement corrosion prediction: Continuous calibration with Bayesian optimization and corrosion wire sensor data | A. Potnis et.al. | 2411.16447 | null |
2024-11-22 | DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving | Bencheng Liao et.al. | 2411.15139 | link |
2024-11-22 | Material Anything: Generating Materials for Any 3D Object via Diffusion | Xin Huang et.al. | 2411.15138 | null |
2024-11-22 | VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement | Daeun Lee et.al. | 2411.15115 | null |
2024-11-22 | RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts | Hjalmar Wijk et.al. | 2411.15114 | link |
2024-11-22 | Efficient Pruning of Text-to-Image Models: Insights from Pruning Stable Diffusion | Samarth N Ramesh et.al. | 2411.15113 | null |
2024-11-22 | Leapfrog Latent Consistency Model (LLCM) for Medical Images Generation | Lakshmikar R. Polamreddy et.al. | 2411.15084 | link |
2024-11-22 | Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network | Irfan Nafiz Shahan et.al. | 2411.15082 | link |
2024-11-22 | Empowering Clients: Transformation of Design Processes Due to Generative AI | Johannes Schneider et.al. | 2411.15061 | null |
2024-11-22 | The 1D nonlocal Fisher-KPP equation with a top hat kernel. Part 3. The effect of perturbations in the kernel | David John Needham et.al. | 2411.15054 | null |
2024-11-22 | FloAt: Flow Warping of Self-Attention for Clothing Animation Generation | Swasti Shreya Mishra et.al. | 2411.15028 | null |
2024-11-22 | Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation | Huy Le et.al. | 2411.14913 | null |
2024-11-22 | Dynamically Encircled Higher-order Exceptional Points in an Optical Fiber | Arpan Roy et.al. | 2411.14874 | null |
2024-11-22 | Prioritize Denoising Steps on Diffusion Model Preference Alignment via Explicit Denoised Distribution Estimation | Dingyuan Shi et.al. | 2411.14871 | null |
2024-11-22 | Latent Schrodinger Bridge: Prompting Latent Diffusion for Fast Unpaired Image-to-Image Translation | Jeongsol Kim et.al. | 2411.14863 | null |
2024-11-22 | Style-Friendly SNR Sampler for Style-Driven Generation | Jooyoung Choi et.al. | 2411.14793 | null |
2024-11-21 | Stable Flow: Vital Layers for Training-Free Image Editing | Omri Avrahami et.al. | 2411.14430 | null |
2024-11-21 | Transformer-based Heuristic for Advanced Air Mobility Planning | Jun Xiang et.al. | 2411.14427 | null |
2024-11-21 | A Python-Based Approach to Sputter Deposition Simulations in Combinatorial Materials Science | Felix Thelen et.al. | 2411.14413 | null |
2024-11-21 | Multi-Agent Environments for Vehicle Routing Problems | Ricardo Gama et.al. | 2411.14411 | link |
2024-11-21 | Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation | Yuanhao Cai et.al. | 2411.14384 | null |
2024-11-21 | CoNFiLD-inlet: Synthetic Turbulence Inflow Using Generative Latent Diffusion Models with Neural Fields | Xin-Yang Liu et.al. | 2411.14378 | null |
2024-11-21 | Enhancing Medical Image Segmentation with Deep Learning and Diffusion Models | Houze Liu et.al. | 2411.14353 | null |
2024-11-21 | DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Tianhe Ren et.al. | 2411.14347 | link |
2024-11-21 | Lower Dimensional Spherical Representation of Medium Voltage Load Profiles for Visualization, Outlier Detection, and Generative Modelling | Edgar Mauricio Salazar Duque et.al. | 2411.14346 | null |
2024-11-21 | StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart | Jian Shi et.al. | 2411.14295 | null |
2024-11-21 | Efficient Aspect-Based Summarization of Climate Change Reports with Small Language Models | Iacopo Ghinassi et.al. | 2411.14272 | link |
2024-11-21 | Guided MRI Reconstruction via Schrödinger Bridge | Yue Wang et.al. | 2411.14269 | null |
2024-11-21 | Regional Attention for Shadow Removal | Hengxing Liu et.al. | 2411.14201 | link |
2024-11-21 | TaQ-DiT: Time-aware Quantization for Diffusion Transformers | Xinyan Liu et.al. | 2411.14172 | null |
2024-11-21 | Creating a Formally Verified Neural Network for Autonomous Navigation: An Experience Report | Syed Ali Asadullah Bukhari et.al. | 2411.14163 | link |
2024-11-20 | REDUCIO! Generating 1024 | Rui Tian et.al. | 2411.13552 | link |
2024-11-20 | Identity Preserving 3D Head Stylization with Multiview Score Distillation | Bahri Batuhan Bilecen et.al. | 2411.13536 | null |
2024-11-20 | VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Ziqi Huang et.al. | 2411.13503 | link |
2024-11-20 | LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models | Salvatore Mario Carta et.al. | 2411.13453 | null |
2024-11-20 | Heuristically Adaptive Diffusion-Model Evolutionary Strategy | Benedikt Hartl et.al. | 2411.13420 | null |
2024-11-20 | Energy-based generative models for monoclonal antibodies | Paul Pereira et.al. | 2411.13390 | link |
2024-11-20 | Small and Close-In Planets are Uncommon around A-type Stars | Steven Giacalone et.al. | 2411.13363 | null |
2024-11-20 | Vertical Validation: Evaluating Implicit Generative Models for Graphs on Thin Support Regions | Mai Elkady et.al. | 2411.13358 | null |
2024-11-20 | A CSI Feedback Framework based on Transmitting the Important Values and Generating the Others | Zhilin Du et.al. | 2411.13298 | null |
2024-11-21 | Structure-Based Molecule Optimization via Gradient-Guided Bayesian Update | Keyue Qiu et.al. | 2411.13280 | null |
2024-11-20 | XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation | Ziyi Wang et.al. | 2411.13243 | link |
2024-11-20 | BIPro: Zero-shot Chinese Poem Generation via Block Inverse Prompting Constrained Generation Framework | Xu Zou et.al. | 2411.13237 | null |
2024-11-20 | Building music with Lego bricks and Raspberry Pi | Ana M. Barbancho et.al. | 2411.13224 | null |
2024-11-20 | A computational framework for integrating Predictive processes with evidence Accumulation Models (PAM) | Antonino Visalli et.al. | 2411.13203 | link |
2024-11-20 | OpenMS WebApps: Building User-Friendly Solutions for MS Analysis | Tom David Müller et.al. | 2411.13189 | null |
2024-11-19 | Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs | Ahmed Akib Jawad Karim et.al. | 2411.12712 | null |
2024-11-19 | OrigamiPlot: An R Package and Shiny Web App Enhanced Visualizations for Multivariate Data | Yiwen Lu et.al. | 2411.12674 | null |
2024-11-19 | Auto-Evaluation with Few Labels through Post-hoc Regression | Benjamin Eyre et.al. | 2411.12665 | null |
2024-11-19 | PoM: Efficient Image and Video Generation with the Polynomial Mixer | David Picard et.al. | 2411.12663 | link |
2024-11-19 | Optimizing Airline Reservation Systems with Edge-Enabled Microservices: A Framework for Real-Time Data Processing and Enhanced User Responsiveness | Biman Barua et.al. | 2411.12650 | null |
2024-11-19 | DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models | Vinay Kumar Sankarapu et.al. | 2411.12643 | link |
2024-11-19 | Improving Controllability and Editability for Pretrained Text-to-Music Generation Models | Yixiao Zhang et.al. | 2411.12641 | null |
2024-11-19 | Universal programmable waveguide arrays | Akram Youssry et.al. | 2411.12610 | null |
2024-11-19 | Whisper Finetuning on Nepali Language | Sanjay Rijal et.al. | 2411.12587 | null |
2024-11-19 | Predicting Customer Satisfaction by Replicating the Survey Response Distribution | Etienne Manderscheid et.al. | 2411.12539 | null |
2024-11-19 | Data Pruning in Generative Diffusion Models | Rania Briq et.al. | 2411.12523 | null |
2024-11-19 | Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing | Ruyi Ding et.al. | 2411.12508 | null |
2024-11-19 | Empirical Privacy Evaluations of Generative and Predictive Machine Learning Models -- A review and challenges for practice | Flavio Hafner et.al. | 2411.12451 | null |
2024-11-19 | Frequency-Aware Guidance for Blind Image Restoration via Diffusion Models | Jun Xiao et.al. | 2411.12450 | null |
2024-11-19 | A general modeling and simulation framework for dynamic vehicle routing | Markó Horváth et.al. | 2411.12406 | link |
2024-11-18 | QARM: Quantitative Alignment Multi-Modal Recommendation at Kuaishou | Xinchen Luo et.al. | 2411.11739 | null |
2024-11-18 | Aligning Few-Step Diffusion Models with Dense Reward Difference Learning | Ziyi Zhang et.al. | 2411.11727 | link |
2024-11-18 | Multiscale nonlinear integration drives accurate encoding of input information | Giorgio Nicoletti et.al. | 2411.11710 | null |
2024-11-18 | Robust Reinforcement Learning under Diffusion Models for Data with Jumps | Chenyang Jiang et.al. | 2411.11697 | null |
2024-11-18 | Active droplets controlled by enzymatic reactions | Jacques Fries et.al. | 2411.11696 | null |
2024-11-18 | Do Captioning Metrics Reflect Music Semantic Alignment? | Jinwoo Lee et.al. | 2411.11692 | null |
2024-11-18 | Conceptwm: A Diffusion Model Watermark for Concept Protection | Liangqi Lei et.al. | 2411.11688 | null |
2024-11-19 | GNN-Based Code Annotation Logic for Establishing Security Boundaries in C Code | Varun Gadey et.al. | 2411.11567 | null |
2024-11-19 | Cascaded Diffusion Models for 2D and 3D Microscopy Image Synthesis to Enhance Cell Segmentation | Rüveyda Yilmaz et.al. | 2411.11515 | null |
2024-11-18 | Collaborative Contrastive Network for Click-Through Rate Prediction | Chen Gao et.al. | 2411.11508 | null |
2024-11-18 | LaVin-DiT: Large Vision Diffusion Transformer | Zhaoqing Wang et.al. | 2411.11505 | null |
2024-11-18 | Alien Recombination: Exploring Concept Blends Beyond Human Cognitive Availability in Visual Art | Alejandro Hernandez et.al. | 2411.11494 | null |
2024-11-18 | MVLight: Relightable Text-to-3D Generation via Light-conditioned Multi-View Diffusion | Dongseok Shim et.al. | 2411.11475 | null |
2024-11-18 | GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts | Junwen He et.al. | 2411.11435 | null |
2024-11-18 | CLUE-MARK: Watermarking Diffusion Models using CLWE | Kareem Shehata et.al. | 2411.11434 | null |
2024-11-15 | M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation | Sucheng Ren et.al. | 2411.10433 | link |
2024-11-15 | Mitigating Parameter Degeneracy using Joint Conditional Diffusion Model for WECC Composite Load Model in Power Systems | Feiqin Zhu et.al. | 2411.10431 | null |
2024-11-15 | Multiscale Dubuc: A New Similarity Measure for Time Series | Mahsa Khazaei et.al. | 2411.10418 | link |
2024-11-15 | Experimental generation of extreme electron beams for advanced accelerator applications | Claudio Emma et.al. | 2411.10413 | null |
2024-11-15 | How to Build a Quantum Supercomputer: Scaling Challenges and Opportunities | Masoud Mohseni et.al. | 2411.10406 | null |
2024-11-15 | Nonlinearity-Driven Morphing and Control of Topological Modes in Non-Hermitian Systems | Zhao-Fan Cai et.al. | 2411.10398 | null |
2024-11-15 | Towards High-Fidelity 3D Portrait Generation with Rich Details by Cross-View Prior-Aware Diffusion | Haoran Wei et.al. | 2411.10369 | null |
2024-11-15 | Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding | Huming Qiu et.al. | 2411.10329 | null |
2024-11-15 | Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence | Guodong Sun et.al. | 2411.10321 | null |
2024-11-15 | Assortment Optimization under the Multinomial Logit Model with Covering Constraints | Omar El Housni et.al. | 2411.10310 | null |
2024-11-15 | Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting | Ziqi Xie et.al. | 2411.10309 | link |
2024-11-15 | MDHP-Net: Detecting Injection Attacks on In-vehicle Network using Multi-Dimensional Hawkes Process and Temporal Model | Qi Liu et.al. | 2411.10258 | null |
2024-11-15 | The Unreasonable Effectiveness of Guidance for Diffusion Models | Tim Kaiser et.al. | 2411.10257 | null |
2024-11-15 | Smooth transport map via diffusion process | Arthur Stéphanovitch et.al. | 2411.10235 | null |
2024-11-15 | ColorEdit: Training-free Image-Guided Color editing with diffusion model | Xingxi Yin et.al. | 2411.10232 | null |
2024-11-14 | A Bayesian Optimization Approach to Machine Translation Reranking | Julius Cheng et.al. | 2411.09694 | null |
2024-11-14 | SimTube: Generating Simulated Video Comments through Multimodal AI and User Personas | Yu-Kai Hung et.al. | 2411.09577 | null |
2024-11-14 | Golden Noise for Diffusion Models: A Learning Framework | Zikai Zhou et.al. | 2411.09502 | null |
2024-11-14 | Sparse Bayesian Generative Modeling for Compressive Sensing | Benedikt Böck et.al. | 2411.09483 | link |
2024-11-14 | DiffRoad: Realistic and Diverse Road Scenario Generation for Autonomous Vehicle Testing | Junjie Zhou et.al. | 2411.09451 | null |
2024-11-14 | Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models | Chutian Meng et.al. | 2411.09449 | null |
2024-11-14 | A survey of probabilistic generative frameworks for molecular simulations | Richard John et.al. | 2411.09388 | link |
2024-11-14 | Multi-scale Generative Modeling for Fast Sampling | Xiongye Xiao et.al. | 2411.09356 | null |
2024-11-14 | ParaLBench: A Large-Scale Benchmark for Computational Paralinguistics over Acoustic Foundation Models | Zixing Zhang et.al. | 2411.09349 | null |
2024-11-15 | Approximate Probabilistic Inference for Time-Series Data A Robust Latent Gaussian Model With Temporal Awareness | Anton Johansson et.al. | 2411.09312 | null |
2024-11-14 | EEG-Based Speech Decoding: A Novel Approach Using Multi-Kernel Ensemble Diffusion Models | Soowon Kim et.al. | 2411.09302 | null |
2024-11-14 | LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space | Guanwen Feng et.al. | 2411.09268 | null |
2024-11-14 | Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey | Xuannan Liu et.al. | 2411.09259 | link |
2024-11-14 | RibCageImp: A Deep Learning Framework for 3D Ribcage Implant Generation | Gyanendra Chaubey et.al. | 2411.09204 | null |
2024-11-14 | Improvement and Implementation of a Speech Emotion Recognition Model Based on Dual-Layer LSTM | Xiaoran Yang et.al. | 2411.09189 | null |
2024-11-13 | 4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization | Mijeong Kim et.al. | 2411.08879 | null |
2024-11-13 | A generalized software framework for consolidation of radiotherapy planning and delivery data from diverse data sources | Yasin Abdulkadir et.al. | 2411.08876 | null |
2024-11-13 | Offline Adaptation of Quadruped Locomotion using Diffusion Models | Reece O'Mahoney et.al. | 2411.08832 | null |
2024-11-13 | SANDWICH: Towards an Offline, Differentiable, Fully-Trainable Wireless Neural Ray-Tracing Surrogate | Yifei Jin et.al. | 2411.08767 | null |
2024-11-13 | Analyst Reports and Stock Performance: Evidence from the Chinese Market | Rui Liu et.al. | 2411.08726 | null |
2024-11-14 | Reducing ADC Front-end Costs During Training of On-sensor Printed Multilayer Perceptrons | Florentia Afentaki et.al. | 2411.08674 | null |
2024-11-13 | Joint Model Caching and Resource Allocation in Generative AI-Enabled Wireless Edge Networks | Zhang Liu et.al. | 2411.08672 | null |
2024-11-13 | Toward Human Understanding with Controllable Synthesis | Hanz Cuevas-Velasquez et.al. | 2411.08663 | null |
2024-11-13 | The Galactica database: an open, generic and versatile tool for the dissemination of simulation data in astrophysics | Damien Chapon et.al. | 2411.08647 | null |
2024-11-13 | Towards More Accurate Fake Detection on Images Generated from Advanced Generative and Neural Rendering Models | Chengdong Dong et.al. | 2411.08642 | null |
2024-11-13 | Deep Generative Demand Learning for Newsvendor and Pricing | Shijin Gong et.al. | 2411.08631 | null |
2024-11-13 | LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation | Pengwei Yin et.al. | 2411.08606 | null |
2024-11-13 | CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs | Suhas S Kowshik et.al. | 2411.08553 | null |
2024-11-13 | Explainers' Mental Representations of Explainees' Needs in Everyday Explanations | Michael Erol Schaffer et.al. | 2411.08514 | null |
2024-11-13 | HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere | Hatef Otroshi Shahreza et.al. | 2411.08470 | null |
2024-11-12 | Scaling Properties of Diffusion Models for Perceptual Tasks | Rahul Ravishankar et.al. | 2411.08034 | null |
2024-11-12 | GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation | Yushi Lan et.al. | 2411.08033 | null |
2024-11-12 | Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings | Aditya Sanghi et.al. | 2411.08017 | link |
2024-11-12 | JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation | Yiyang Ma et.al. | 2411.07975 | link |
2024-11-12 | Diverse capability and scaling of diffusion and auto-regressive models when learning abstract rules | Binxu Wang et.al. | 2411.07873 | null |
2024-11-12 | Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders | Xiaofeng Zhu et.al. | 2411.07870 | null |
2024-11-12 | CDXFormer: Boosting Remote Sensing Change Detection with Extended Long Short-Term Memory | Zhenkai Wu et.al. | 2411.07863 | link |
2024-11-12 | Sparsity-Aware Optimization of In-Memory Bayesian Binary Neural Network Accelerators | Prabodh Katti et.al. | 2411.07842 | null |
2024-11-12 | Novel View Synthesis with Pixel-Space Diffusion Models | Noam Elata et.al. | 2411.07765 | null |
2024-11-12 | Nanosecond nanothermometry in an electron microscope | Florian Castioni et.al. | 2411.07764 | null |
2024-11-12 | LapGSR: Laplacian Reconstructive Network for Guided Thermal Super-Resolution | Aditya Kasliwal et.al. | 2411.07750 | null |
2024-11-12 | The relationship between general equilibrium models with infinite-lived agents and overlapping generations models, and some applications | Ngoc-Sang Pham et.al. | 2411.07674 | null |
2024-11-12 | Evaluating the Generation of Spatial Relations in Text and Image Generative Models | Shang Hong Sim et.al. | 2411.07664 | null |
2024-11-12 | Leveraging Previous Steps: A Training-free Fast Solver for Flow Diffusion | Kaiyu Song et.al. | 2411.07627 | null |
2024-11-12 | Unraveling the Connections between Flow Matching and Diffusion Probabilistic Models in Training-free Conditional Generation | Kaiyu Song et.al. | 2411.07625 | null |
2024-11-11 | Score-based generative diffusion with "active" correlated noise sources | Alexandra Lamtyugina et.al. | 2411.07233 | null |
2024-11-12 | Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models | Yoad Tewel et.al. | 2411.07232 | null |
2024-11-11 | Learning from Limited and Imperfect Data | Harsh Rangwani et.al. | 2411.07229 | null |
2024-11-11 | TempCharBERT: Keystroke Dynamics for Continuous Access Control Based on Pre-trained Language Models | Matheus Simão et.al. | 2411.07224 | null |
2024-11-11 | DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID | Nyle Siddiqui et.al. | 2411.07205 | link |
2024-11-11 | Crossover from inhomogeneous to homogeneous response of a resonantly driven hBN quantum emitter | Domitille Gérard et.al. | 2411.07202 | null |
2024-11-11 | OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision | Cong Wei et.al. | 2411.07199 | null |
2024-11-11 | More Expressive Attention with Negative Weights | Ang Lv et.al. | 2411.07176 | link |
2024-11-11 | Edify 3D: Scalable High-Quality 3D Asset Generation | NVIDIA et.al. | 2411.07135 | null |
2024-11-11 | Benchmarking LLMs' Judgments with No Gold Standard | Shengwei Xu et.al. | 2411.07127 | link |
2024-11-11 | Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models | NVIDIA et.al. | 2411.07126 | null |
2024-11-11 | Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models | Yanchen Wang et.al. | 2411.07121 | link |
2024-11-11 | Scaling Mesh Generation via Compressive Tokenization | Haohan Weng et.al. | 2411.07025 | link |
2024-11-11 | An Electrocardiogram Monitoring Device Based on STM32 | Wenqi Guan et.al. | 2411.06962 | null |
2024-11-11 | Generative Feature Training of Thin 2-Layer Networks | Johannes Hertrich et.al. | 2411.06848 | link |
2024-11-08 | StdGEN: Semantic-Decomposed 3D Character Generation from Single Images | Yuze He et.al. | 2411.05738 | null |
2024-11-08 | Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models | Jia-Hong Huang et.al. | 2411.05706 | null |
2024-11-08 | Improving Molecular Graph Generation with Flow Matching and Optimal Transport | Xiaoyang Hou et.al. | 2411.05676 | null |
2024-11-08 | Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion | Nan Song et.al. | 2411.05544 | null |
2024-11-08 | Improving image synthesis with diffusion-negative sampling | Alakh Desai et.al. | 2411.05473 | null |
2024-11-08 | Bridging the Gap between Learning and Inference for Diffusion-Based Molecule Generation | Peidong Liu et.al. | 2411.05472 | link |
2024-11-08 | IntellBot: Retrieval Augmented LLM Chatbot for Cyber Threat Knowledge Delivery | Dincy R. Arikkat et.al. | 2411.05442 | null |
2024-11-08 | RED: Residual Estimation Diffusion for Low-Dose PET Sinogram Reconstruction | Xingyu Ai et.al. | 2411.05354 | null |
2024-11-08 | Electro-diffusive modeling and the role of spine geometry on action potential propagation in neurons | Rahul Gulati et.al. | 2411.05329 | null |
2024-11-08 | Social balance in directed networks | Bingjie Hao et.al. | 2411.05327 | null |
2024-11-08 | SeqRFM: Fast RFM Analysis in Sequence Data | Yanxin Zheng et.al. | 2411.05317 | link |
2024-11-08 | Differentiable Calibration of Inexact Stochastic Simulation Models via Kernel Score Minimization | Ziwei Su et.al. | 2411.05315 | null |
2024-11-08 | A Real-time Face Mask Detection and Social Distancing System for COVID-19 using Attention-InceptionV3 Model | Abdullah Al Asif et.al. | 2411.05312 | null |
2024-11-08 | Adaptive Whole-Body PET Image Denoising Using 3D Diffusion Models with ControlNet | Boxiao Yu et.al. | 2411.05302 | null |
2024-11-08 | GPT Semantic Cache: Reducing LLM Costs and Latency via Semantic Embedding Caching | Sajal Regmi et.al. | 2411.05276 | null |
2024-11-07 | SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models | Muyang Li et.al. | 2411.05007 | link |
2024-11-07 | ProEdit: Simple Progression is All You Need for High-Quality 3D Scene Editing | Jun-Kun Chen et.al. | 2411.05006 | null |
2024-11-07 | Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models | Shuhong Zheng et.al. | 2411.05005 | null |
2024-11-07 | ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning | David Junhao Zhang et.al. | 2411.05003 | null |
2024-11-07 | SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation | Koichi Namekata et.al. | 2411.04989 | null |
2024-11-07 | Few-Shot Task Learning through Inverse Generative Modeling | Aviv Netanyahu et.al. | 2411.04987 | null |
2024-11-07 | How fast does the WallGo? A package for computing wall velocities in first-order phase transitions | Andreas Ekstedt et.al. | 2411.04970 | link |
2024-11-07 | VAIR: Visuo-Acoustic Implicit Representations for Low-Cost, Multi-Modal Transparent Surface Reconstruction in Indoor Scenes | Advaith V. Sethuraman et.al. | 2411.04963 | null |
2024-11-07 | Uncovering Hidden Subspaces in Video Diffusion Models Using Re-Identification | Mischa Dombrowski et.al. | 2411.04956 | null |
2024-11-07 | Fed-LDR: Federated Local Data-infused Graph Creation with Node-centric Model Refinement | Jiechao Gao et.al. | 2411.04936 | null |
2024-11-07 | DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion | Wenqiang Sun et.al. | 2411.04928 | null |
2024-11-07 | StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration | Panwen Hu et.al. | 2411.04925 | null |
2024-11-07 | Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion | Kaizhe Hu et.al. | 2411.04919 | link |
2024-11-07 | GASE: Generatively Augmented Sentence Encoding | Manuel Frank et.al. | 2411.04914 | null |
2024-11-07 | Controlling Human Shape and Pose in Text-to-Image Diffusion Models via Domain Adaptation | Benito Buchheim et.al. | 2411.04724 | null |
2024-11-06 | Community Forensics: Using Thousands of Generators to Train Fake Image Detectors | Jeongsoo Park et.al. | 2411.04125 | null |
2024-11-06 | Stepping Forward on the Last Mile | Chen Feng et.al. | 2411.04036 | null |
2024-11-06 | Prototyping O-RAN Enabled UAV Experimentation for the AERPAW Testbed | Joshua Moore et.al. | 2411.04027 | null |
2024-11-06 | Object-Centric Dexterous Manipulation from Human Motion Data | Yuanpei Chen et.al. | 2411.04005 | null |
2024-11-06 | Synomaly Noise and Multi-Stage Diffusion: A Novel Approach for Unsupervised Anomaly Detection in Ultrasound Imaging | Yuan Bi et.al. | 2411.04004 | null |
2024-11-06 | ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy | Chenrui Tie et.al. | 2411.03990 | null |
2024-11-06 | ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models | Ashutosh Srivastava et.al. | 2411.03982 | null |
2024-11-06 | Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning | Jiawei Yao et.al. | 2411.03978 | link |
2024-11-06 | Bayesian algorithmic perfumery: A Hierarchical Relevance Vector Machine for the Estimation of Personalized Fragrance Preferences based on Three Sensory Layers and Jungian Personality Archetypes | Rolando Gonzales Martinez et.al. | 2411.03965 | null |
2024-11-06 | Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks | Felipe Marra et.al. | 2411.03948 | link |
2024-11-06 | Can Custom Models Learn In-Context? An Exploration of Hybrid Architecture Performance on In-Context Learning Tasks | Ryan Campbell et.al. | 2411.03945 | link |
2024-11-06 | GUIDE-VAE: Advancing Data Generation with User Information and Pattern Dictionaries | Kutay Bölat et.al. | 2411.03936 | null |
2024-11-06 | Large Generative Model-assisted Talking-face Semantic Communication System | Feibo Jiang et.al. | 2411.03876 | null |
2024-11-06 | ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization | Huayang Huang et.al. | 2411.03862 | link |
2024-11-06 | Sub-DM: Subspace Diffusion Model with Orthogonal Decomposition for MRI Reconstruction | Yu Guan et.al. | 2411.03758 | null |
2024-11-05 | MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning | Ziliang Gan et.al. | 2411.03314 | null |
2024-11-05 | LLMs for Domain Generation Algorithm Detection | Reynier Leyva La O et.al. | 2411.03307 | null |
2024-11-05 | DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models | Ying Zhou et.al. | 2411.03250 | null |
2024-11-05 | On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models | Tariq Berrada Ifriqi et.al. | 2411.03177 | null |
2024-11-05 | Unleashing the power of novel conditional generative approaches for new materials discovery | Lev Novitskiy et.al. | 2411.03156 | link |
2024-11-05 | Local Lesion Generation is Effective for Capsule Endoscopy Image Data Augmentation in a Limited Data Setting | Adrian B. Chłopowiec et.al. | 2411.03098 | null |
2024-11-05 | Gradient-Guided Conditional Diffusion Models for Private Image Reconstruction: Analyzing Adversarial Impacts of Differential Privacy and Denoising | Tao Huang et.al. | 2411.03053 | null |
2024-11-05 | GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details | Zhongjin Luo et.al. | 2411.03047 | null |
2024-11-05 | Speaker Emotion Recognition: Leveraging Self-Supervised Models for Feature Extraction Using Wav2Vec2 and HuBERT | Pourya Jafarzadeh et.al. | 2411.02964 | null |
2024-11-05 | IMUDiffusion: A Diffusion Model for Multivariate Time Series Synthetisation for Inertial Motion Capturing Systems | Heiko Oppel et.al. | 2411.02954 | null |
2024-11-05 | LDPM: Towards undersampled MRI reconstruction with MR-VAE and Latent Diffusion Prior | Xingjian Tang et.al. | 2411.02951 | null |
2024-11-05 | A scalable generative model for dynamical system reconstruction from neuroimaging data | Eric Volkmann et.al. | 2411.02949 | link |
2024-11-05 | Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey | Ao Fu et.al. | 2411.02914 | null |
2024-11-05 | The Unreasonable Effectiveness of LLMs for Query Optimization | Peter Akioyamen et.al. | 2411.02862 | link |
2024-11-05 | ADOPT: Modified Adam Can Converge with Any | Shohei Taniguchi et.al. | 2411.02853 | link |
2024-11-04 | Training-free Regional Prompting for Diffusion Transformers | Anthony Chen et.al. | 2411.02395 | link |
2024-11-04 | How Far is Video Generation from World Model: A Physical Law Perspective | Bingyi Kang et.al. | 2411.02385 | null |
2024-11-04 | Virgo Filaments IV: Using WISE to Measure the Modification of Star-Forming Disks in the Extended Regions Around the Virgo Cluster | Kim Conger et.al. | 2411.02352 | null |
2024-11-04 | Diffusion-based Generative Multicasting with Intent-aware Semantic Decomposition | Xinkai Liu et.al. | 2411.02334 | null |
2024-11-05 | PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Ruyang Liu et.al. | 2411.02327 | link |
2024-11-04 | LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation | Mufei Li et.al. | 2411.02322 | link |
2024-11-04 | CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments | Kung-Hsiang Huang et.al. | 2411.02305 | link |
2024-11-04 | Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation | Xianghui Yang et.al. | 2411.02293 | null |
2024-11-04 | Counterfactual Explanations via Riemannian Latent Space Traversal | Paraskevas Pegios et.al. | 2411.02259 | null |
2024-11-04 | FewViewGS: Gaussian Splatting with Few View Matching and Multi-stage Training | Ruihong Yin et.al. | 2411.02229 | null |
2024-11-04 | Recursive Learning of Asymptotic Variational Objectives | Alessandro Mastrototaro et.al. | 2411.02217 | null |
2024-11-04 | Digi2Real: Bridging the Realism Gap in Synthetic Data Face Recognition via Foundation Models | Anjith George et.al. | 2411.02188 | null |
2024-11-04 | Touch-to-Touch Translation -- Learning the Mapping Between Heterogeneous Tactile Sensing Technologies | Francesco Grella et.al. | 2411.02187 | null |
2024-11-04 | CleAR: Robust Context-Guided Generative Lighting Estimation for Mobile Augmented Reality | Yiqin Zhao et.al. | 2411.02179 | null |
2024-11-04 | CryptoEL: A Novel Experiential Learning Tool for Enhancing K-12 Cryptography Education | Pranathi Rayavaram et.al. | 2411.02143 | null |
2024-10-31 | Bridging Geometric States via Geometric Diffusion Bridge | Shengjie Luo et.al. | 2410.24220 | null |
2024-10-31 | Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning | Penghui Ruan et.al. | 2410.24219 | link |
2024-10-31 | DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion | Weicai Ye et.al. | 2410.24203 | link |
2024-10-31 | Multi-Attribute Linguistic Tuning for Controlled Paraphrase Generation | Mohamed Elgaar et.al. | 2410.24199 | null |
2024-10-31 | Generative modelling for mass-mapping with fast uncertainty quantification | Jessica J. Whitney et.al. | 2410.24197 | link |
2024-10-31 | AR-Pro: Counterfactual Explanations for Anomaly Repair with Formal Properties | Xiayan Ji et.al. | 2410.24178 | null |
2024-10-31 | Redefining in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation | Fu Feng et.al. | 2410.24160 | null |
2024-10-31 | Scaling Concept With Text-Guided Diffusion Models | Chao Huang et.al. | 2410.24151 | null |
2024-10-31 | Repository-Level Compositional Code Translation and Validation | Ali Reza Ibrahimzada et.al. | 2410.24117 | link |
2024-10-31 | Extended electrochemical monitoring of biomolecular binding using commercially available, reusable electrodes in microliter volumes | Jeremy Mendez et.al. | 2410.24110 | null |
2024-10-31 | Sparsh: Self-supervised touch representations for vision-based tactile sensing | Carolina Higuera et.al. | 2410.24090 | null |
2024-10-31 | Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure | Xiang Li et.al. | 2410.24060 | link |
2024-10-31 | TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation | Sunjae Yoon et.al. | 2410.24037 | null |
2024-10-31 | Unveiling Synthetic Faces: How Synthetic Datasets Can Expose Real Identities | Hatef Otroshi Shahreza et.al. | 2410.24015 | null |
2024-10-31 | DiffPAD: Denoising Diffusion-based Adversarial Patch Decontamination | Jia Fu et.al. | 2410.24006 | link |
2024-10-30 | ReferEverything: Towards Segmenting Everything We Can Speak of in Videos | Anurag Bagchi et.al. | 2410.23287 | null |
2024-10-30 | Provable acceleration for diffusion models under minimal assumptions | Gen Li et.al. | 2410.23285 | null |
2024-10-30 | RelationBooth: Towards Relation-Aware Customized Object Generation | Qingyu Shi et.al. | 2410.23280 | null |
2024-10-30 | SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation | Yining Hong et.al. | 2410.23277 | null |
2024-10-30 | Multi-student Diffusion Distillation for Better One-step Generators | Yanke Song et.al. | 2410.23274 | null |
2024-10-30 | ReaWristic: Remote Touch Sensation to Fingers from a Wristband via Visually Augmented Electro-Tactile Feedback | Yudai Tanaka et.al. | 2410.23193 | null |
2024-10-30 | Real-Time Personalization for LLM-based Recommendation with Customized In-Context Learning | Keqin Bao et.al. | 2410.23136 | link |
2024-10-30 | Educating for Hardware Specialization in the Chiplet Era: A Path for the HPC Community | Kazutomo Yoshii et.al. | 2410.23127 | null |
2024-10-30 | CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense | Mingkun Zhang et.al. | 2410.23091 | link |
2024-10-30 | General Bayesian quantile regression for counts via generative modeling | Yuta Yamauchi et.al. | 2410.23081 | null |
2024-10-30 | Controlling Language and Diffusion Models by Transporting Activations | Pau Rodriguez et.al. | 2410.23054 | link |
2024-10-30 | Dispersion kinks from electronic correlations in an unconventional iron-based superconductor | Ming-Hua Chang et.al. | 2410.23044 | null |
2024-10-30 | Improving Musical Accompaniment Co-creation via Diffusion Transformers | Javier Nistal et.al. | 2410.23005 | null |
2024-10-30 | DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes | Jialiang Zhang et.al. | 2410.23004 | null |
2024-10-30 | LumiSculpt: A Consistency Lighting Control Network for Video Generation | Yuxin Zhang et.al. | 2410.22979 | null |
2024-10-29 | CaStL: Constraints as Specifications through LLM Translation for Long-Horizon Task and Motion Planning | Weihang Guo et.al. | 2410.22225 | null |
2024-10-29 | A Gaussian Process Generative Model for QCD Equation of State | Jiaxuan Gong et.al. | 2410.22160 | null |
2024-10-29 | Capacity Control is an Effective Memorization Mitigation Mechanism in Text-Conditional Diffusion Models | Raman Dutt et.al. | 2410.22149 | link |
2024-10-29 | AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts | Vishal Kumar et.al. | 2410.22143 | null |
2024-10-29 | Infrared photometry with InGaAs detectors: First light with SPECULOOS | Peter P. Pedersen et.al. | 2410.22140 | link |
2024-10-29 | SimRec: Mitigating the Cold-Start Problem in Sequential Recommendation by Integrating Item Similarity | Shaked Brody et.al. | 2410.22136 | link |
2024-10-29 | Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench | Zheyuan Liu et.al. | 2410.22108 | link |
2024-10-29 | Variational inference for pile-up removal at hadron colliders with diffusion models | Malte Algren et.al. | 2410.22074 | null |
2024-10-29 | PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement | Shutong Jin et.al. | 2410.22059 | null |
2024-10-29 | Dual Conditional Diffusion Models for Sequential Recommendation | Hongtao Huang et.al. | 2410.21967 | null |
2024-10-29 | PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference | Kendong Liu et.al. | 2410.21966 | null |
2024-10-29 | CT to PET Translation: A Large-scale Dataset and Domain-Knowledge-Guided Diffusion Approach | Dac Thai Nguyen et.al. | 2410.21932 | link |
2024-10-29 | Guided Diffusion-based Counterfactual Augmentation for Robust Session-based Recommendation | Muskan Gupta et.al. | 2410.21892 | null |
2024-10-29 | On the study of the limit cycles for a class of population models with time-varying factors | Renhao Tian et.al. | 2410.21848 | null |
2024-10-29 | Diffusion as Reasoning: Enhancing Object Goal Navigation with LLM-Biased Diffusion Model | Yiming Ji et.al. | 2410.21842 | null |
2024-10-28 | On Inductive Biases That Enable Generalization of Diffusion Transformers | Jie An et.al. | 2410.21273 | link |
2024-10-28 | EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation | Shih-Yang Liu et.al. | 2410.21271 | null |
2024-10-28 | LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior | Hanyu Wang et.al. | 2410.21264 | null |
2024-10-28 | One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation | Zhendong Wang et.al. | 2410.21257 | null |
2024-10-28 | On learning higher-order cumulants in diffusion models | Gert Aarts et.al. | 2410.21212 | null |
2024-10-28 | The VSPEC Collection: A suite of utilities to model spectroscopic phase curves of 3D exoplanet atmospheres in the presence of stellar variability | Ted M Johnson et.al. | 2410.21190 | null |
2024-10-28 | Trajectory Flow Matching with Applications to Clinical Time Series Modeling | Xi Zhang et.al. | 2410.21154 | link |
2024-10-28 | Synthetica: Large Scale Synthetic Data for Robot Perception | Ritvik Singh et.al. | 2410.21153 | null |
2024-10-28 | Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal Sequences | Zhihao Zhao et.al. | 2410.21130 | null |
2024-10-28 | Shallow Diffuse: Robust and Invisible Watermarking through Low-Dimensional Subspaces in Diffusion Models | Wenda Li et.al. | 2410.21088 | link |
2024-10-28 | Federated Time Series Generation on Feature and Temporally Misaligned Data | Chenrui Fan et.al. | 2410.21072 | null |
2024-10-28 | Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework | Vladimir Arkhipkin et.al. | 2410.21061 | link |
2024-10-28 | Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | Justin Deschenaux et.al. | 2410.21035 | link |
2024-10-29 | EEG-Driven 3D Object Reconstruction with Color Consistency and Diffusion Prior | Xin Xiang et.al. | 2410.20981 | null |
2024-10-28 | MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis | Di Qiu et.al. | 2410.20974 | null |
2024-10-25 | Model merging with SVD to tie the Knots | George Stoica et.al. | 2410.19735 | link |
2024-10-25 | Adversarial Environment Design via Regret-Guided Diffusion Models | Hojun Chung et.al. | 2410.19715 | null |
2024-10-25 | Perception, Control and Hardware for In-Hand Slip-Aware Object Manipulation with Parallel Grippers | Gabriel Arslan Waltersson et.al. | 2410.19660 | null |
2024-10-25 | DiffGS: Functional Gaussian Splatting Diffusion | Junsheng Zhou et.al. | 2410.19657 | null |
2024-10-25 | VARS: Vision-based Assessment of Risk in Security Systems | Pranav Gupta et.al. | 2410.19642 | null |
2024-10-25 | Diffusion models for lattice gauge field simulations | Qianteng Zhu et.al. | 2410.19602 | null |
2024-10-25 | Energy Efficient Dual Designs of FeFET-Based Analog In-Memory Computing with Inherent Shift-Add Capability | Zeyu Yang et.al. | 2410.19593 | null |
2024-10-25 | Hybrid Memetic Search for Electric Vehicle Routing with Time Windows, Simultaneous Pickup-Delivery, and Partial Recharges | Zubin Zheng et.al. | 2410.19580 | null |
2024-10-25 | Utilizing Image Transforms and Diffusion Models for Generative Modeling of Short and Long Time Series | Ilan Naiman et.al. | 2410.19538 | null |
2024-10-25 | Ensemble Data Assimilation for Particle-based Methods | Marius Duvillard et.al. | 2410.19525 | null |
2024-10-25 | Marked Temporal Bayesian Flow Point Processes | Hui Chen et.al. | 2410.19512 | null |
2024-10-25 | EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data | Xuetian Chen et.al. | 2410.19461 | null |
2024-10-28 | NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction | Zixuan Gong et.al. | 2410.19452 | link |
2024-10-25 | Learned Reference-based Diffusion Sampling for multi-modal distributions | Maxence Noble et.al. | 2410.19449 | null |
2024-10-25 | Generative Diffusion Models for Sequential Recommendations | Sharare Zolghadr et.al. | 2410.19429 | null |
2024-10-24 | Framer: Interactive Frame Interpolation | Wen Wang et.al. | 2410.18978 | null |
2024-10-24 | MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms | Ling-Hao Chen et.al. | 2410.18977 | null |
2024-10-24 | Unbounded: A Generative Infinite Game of Character Life Simulation | Jialu Li et.al. | 2410.18975 | null |
2024-10-24 | 3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation | Hansheng Chen et.al. | 2410.18974 | link |
2024-10-24 | On the Crucial Role of Initialization for Matrix Factorization | Bingcong Li et.al. | 2410.18965 | null |
2024-10-24 | Stable Consistency Tuning: Understanding and Improving Consistency Models | Fu-Yun Wang et.al. | 2410.18958 | link |
2024-10-24 | Generation of synthetic financial time series by diffusion models | Tomonori Takahashi et.al. | 2410.18897 | null |
2024-10-24 | Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences | Weijian Luo et.al. | 2410.18881 | null |
2024-10-24 | The Cat and Mouse Game: The Ongoing Arms Race Between Diffusion Models and Detection Methods | Linda Laurier et.al. | 2410.18866 | null |
2024-10-24 | From Efficiency to Equity: Measuring Fairness in Preference Learning | Shreeyash Gowaikar et.al. | 2410.18841 | null |
2024-10-24 | From English-Centric to Effective Bilingual: LLMs with Custom Tokenizers for Underrepresented Languages | Artur Kiulian et.al. | 2410.18836 | null |
2024-10-24 | Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation | Xiaoyu Zhang et.al. | 2410.18830 | null |
2024-10-24 | Towards Visual Text Design Transfer Across Languages | Yejin Choi et.al. | 2410.18823 | null |
2024-10-24 | Fast constrained sampling in pre-trained diffusion models | Alexandros Graikos et.al. | 2410.18804 | null |
2024-10-24 | Large Generative AI Models meet Open Networks for 6G: Integration, Platform, and Monetization | Peizheng Li et.al. | 2410.18790 | null |
2024-10-23 | DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes | Hengwei Bian et.al. | 2410.18084 | null |
2024-10-23 | Prioritized Generative Replay | Renhao Wang et.al. | 2410.18082 | null |
2024-10-23 | WorldSimBench: Towards Video Generation Models as World Simulators | Yiran Qin et.al. | 2410.18072 | null |
2024-10-23 | TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts | Yuxuan Xie et.al. | 2410.18071 | null |
2024-10-23 | Training Free Guided Flow Matching with Optimal Control | Luran Wang et.al. | 2410.18070 | null |
2024-10-23 | Spectrally shaped THz pulses from tapered dielectric waveguides | Karel Peetermans et.al. | 2410.17975 | null |
2024-10-23 | Optical Generative Models | Shiqi Chen et.al. | 2410.17970 | null |
2024-10-23 | A Wavelet Diffusion GAN for Image Super-Resolution | Lorenzo Aloisi et.al. | 2410.17966 | null |
2024-10-23 | Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation | Wenfang Yao et.al. | 2410.17918 | link |
2024-10-23 | regAL: Python Package for Active Learning of Regression Problems | Elizaveta Surzhikova et.al. | 2410.17917 | null |
2024-10-23 | Scaling Diffusion Language Models via Adaptation from Autoregressive Models | Shansan Gong et.al. | 2410.17891 | link |
2024-10-23 | Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech | Danilo de Oliveira et.al. | 2410.17834 | null |
2024-10-23 | PGDiffSeg: Prior-Guided Denoising Diffusion Model with Parameter-Shared Attention for Breast Cancer Segmentation | Feiyan Feng et.al. | 2410.17812 | null |
2024-10-23 | GenUDC: High Quality 3D Mesh Generation with Unsigned Dual Contouring Representation | Ruowei Wang et.al. | 2410.17802 | link |
2024-10-23 | Regularized autoregressive modeling and its application to audio signal declipping | Ondřej Mokrý et.al. | 2410.17790 | link |
2024-10-22 | Large Language Models Empowered Personalized Web Agents | Hongru Cai et.al. | 2410.17236 | null |
2024-10-22 | Creativity in AI: Progresses and Challenges | Mete Ismayilzada et.al. | 2410.17218 | null |
2024-10-22 | Audio-to-Score Conversion Model Based on Whisper methodology | Hongyao Zhang et.al. | 2410.17209 | null |
2024-10-22 | Reinforcement learning on structure-conditioned categorical diffusion for protein inverse folding | Yasha Ektefaie et.al. | 2410.17173 | link |
2024-10-22 | Performance of the CMS high-level trigger during LHC Run 2 | CMS Collaboration et.al. | 2410.17038 | null |
2024-10-22 | Hybrid Generative AI for De Novo Design of Co-Crystals with Enhanced Tabletability | Nina Gubina et.al. | 2410.17005 | link |
2024-10-22 | DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization | Haowei Zhu et.al. | 2410.16942 | null |
2024-10-22 | Hierarchical Clustering for Conditional Diffusion in Image Generation | Jorge da Silva Goncalves et.al. | 2410.16910 | link |
2024-10-22 | Bayes without Underfitting: Fully Correlated Deep Learning Posteriors via Alternating Projections | Marco Miani et.al. | 2410.16901 | null |
2024-10-22 | VistaDream: Sampling multiview consistent images for single-view scene reconstruction | Haiping Wang et.al. | 2410.16892 | null |
2024-10-22 | CK4Gen: A Knowledge Distillation Framework for Generating High-Utility Synthetic Survival Datasets in Healthcare | Nicholas I-Hsien Kuo et.al. | 2410.16872 | null |
2024-10-22 | MPDS: A Movie Posters Dataset for Image Generation with Diffusion Model | Meng Xu et.al. | 2410.16840 | null |
2024-10-22 | Bridging Search and Recommendation in Generative Retrieval: Does One Task Help the Other? | Gustavo Penha et.al. | 2410.16823 | null |
2024-10-22 | Evaluating the Effectiveness of Attack-Agnostic Features for Morphing Attack Detection | Laurent Colbois et.al. | 2410.16802 | link |
2024-10-22 | One-Step Diffusion Distillation through Score Implicit Matching | Weijian Luo et.al. | 2410.16794 | link |
2024-10-21 | MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors | Honghua Chen et.al. | 2410.16272 | null |
2024-10-21 | Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos | Gengshan Yang et.al. | 2410.16259 | null |
2024-10-21 | Distribution Learning with Valid Outputs Beyond the Worst-Case | Nick Rittler et.al. | 2410.16253 | null |
2024-10-21 | Building A Coding Assistant via the Retrieval-Augmented Language Model | Xinze Li et.al. | 2410.16229 | link |
2024-10-21 | CiteClick: A Browser Extension for Real-Time Scholar Citation Tracking | Nishat Raihan et.al. | 2410.16211 | null |
2024-10-21 | A Framework for Evaluating Predictive Models Using Synthetic Image Covariates and Longitudinal Data | Simon Deltadahl et.al. | 2410.16177 | null |
2024-10-22 | Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models | Giannis Daras et.al. | 2410.16152 | null |
2024-10-21 | Modelling Structured Data Learning with Restricted Boltzmann Machines in the Teacher-Student Setting | Robin Thériault et.al. | 2410.16150 | null |
2024-10-21 | SeaDAG: Semi-autoregressive Diffusion for Conditional Directed Acyclic Graph Generation | Xinyi Zhou et.al. | 2410.16119 | null |
2024-10-21 | Critical Example Mining for Vehicle Trajectory Prediction using Flow-based Generative Models | Zhezhang Ding et.al. | 2410.16083 | null |
2024-10-21 | Continuous Speech Synthesis using per-token Latent Diffusion | Arnon Turetzky et.al. | 2410.16048 | null |
2024-10-21 | Some generalizations of the convective model of jet generation | S. N. Artekha et.al. | 2410.16035 | null |
2024-10-21 | ComPO: Community Preferences for Language Model Personalization | Sachin Kumar et.al. | 2410.16027 | null |
2024-10-21 | Massimo: Public Queue Monitoring and Management using Mass-Spring Model | Abhijeet Kumar et.al. | 2410.16012 | null |
2024-10-21 | AI-Driven Innovations in Modern Cloud Computing | Animesh Kumar et.al. | 2410.15960 | null |
2024-10-18 | BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities | Shaozhe Hao et.al. | 2410.14672 | link |
2024-10-18 | How Does Data Diversity Shape the Weight Landscape of Neural Networks? | Yang Ba et.al. | 2410.14602 | null |
2024-10-18 | Bayesian Multi-wavelength Imaging of the LMC SN1987A with SRG/eROSITA | Vincent Eberle et.al. | 2410.14599 | null |
2024-10-18 | Neuro-Symbolic Traders: Assessing the Wisdom of AI Crowds in Markets | Namid R. Stillman et.al. | 2410.14587 | null |
2024-10-18 | Reimagining partial thickness keratoplasty: An eye mountable robot for autonomous big bubble needle insertion | Y. Wang et.al. | 2410.14577 | null |
2024-10-18 | Multi-modal Pose Diffuser: A Multimodal Generative Conditional Pose Prior | Calvin-Khang Ta et.al. | 2410.14540 | null |
2024-10-18 | Blockchain-Based Trust and Transparency in Airline Reservation Systems using Microservices Architecture | Biman Barua et.al. | 2410.14518 | null |
2024-10-18 | LEAD: Latent Realignment for Human Motion Diffusion | Nefeli Andreou et.al. | 2410.14508 | null |
2024-10-18 | Reinforcement Learning in Non-Markov Market-Making | Luca Lalor et.al. | 2410.14504 | null |
2024-10-18 | Data-driven topology design with persistent homology for enhancing population diversity | Taisei Kii et.al. | 2410.14496 | null |
2024-10-18 | ANT: Adaptive Noise Schedule for Time Series Diffusion Models | Seunghan Lee et.al. | 2410.14488 | link |
2024-10-21 | CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers and Fully-Connected Neural Networks for Causally Constrained Predictions | Matthew J. Vowels et.al. | 2410.14485 | link |
2024-10-18 | DRL Optimization Trajectory Generation via Wireless Network Intent-Guided Diffusion Models for Optimizing Resource Allocation | Junjie Wu et.al. | 2410.14481 | null |
2024-10-18 | Flow-based Sampling for Entanglement Entropy and the Machine Learning of Defects | Andrea Bulgarelli et.al. | 2410.14466 | null |
2024-10-18 | FashionR2R: Texture-preserving Rendered-to-Real Image Translation with Diffusion Models | Rui Hu et.al. | 2410.14429 | null |
2024-10-17 | Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens | Lijie Fan et.al. | 2410.13863 | null |
2024-10-17 | Diffusing States and Matching Scores: A New Framework for Imitation Learning | Runzhe Wu et.al. | 2410.13855 | link |
2024-10-17 | Influence Functions for Scalable Data Attribution in Diffusion Models | Bruno Mlodozeniec et.al. | 2410.13850 | null |
2024-10-17 | VidPanos: Generative Panoramic Videos from Casual Panning Videos | Jingwei Ma et.al. | 2410.13832 | null |
2024-10-17 | DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control | Yujie Wei et.al. | 2410.13830 | null |
2024-10-17 | Deep Generative Models Unveil Patterns in Medical Images Through Vision-Language Conditioning | Xiaodan Xing et.al. | 2410.13823 | link |
2024-10-17 | ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution | Junhao Gu et.al. | 2410.13807 | null |
2024-10-17 | Probing the Latent Hierarchical Structure of Data via Diffusion Models | Antonio Sclocchi et.al. | 2410.13770 | null |
2024-10-17 | Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers | Yuchen Liang et.al. | 2410.13746 | null |
2024-10-17 | Improved Convergence Rate for Diffusion Probabilistic Models | Gen Li et.al. | 2410.13738 | null |
2024-10-17 | Optimizing Probabilistic Conformal Prediction with Vectorized Non-Conformity Scores | Minxing Zheng et.al. | 2410.13735 | null |
2024-10-18 | DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation | Hanbo Cheng et.al. | 2410.13726 | link |
2024-10-17 | Movie Gen: A Cast of Media Foundation Models | Adam Polyak et.al. | 2410.13720 | link |
2024-10-18 | Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion | Yijun Liang et.al. | 2410.13674 | link |
2024-10-17 | Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design | Chenyu Wang et.al. | 2410.13643 | link |
2024-10-16 | Geometry-Aware Generative Autoencoders for Warped Riemannian Metric Learning and Generative Modeling on Data Manifolds | Xingzhi Sun et.al. | 2410.12779 | null |
2024-10-16 | Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts | Hongcheng Gao et.al. | 2410.12777 | link |
2024-10-16 | SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation | Jaehong Yoon et.al. | 2410.12761 | null |
2024-10-16 | Signature of Vertical Mixing in Hydrogen-dominated Exoplanet Atmospheres | Vikas Soni et.al. | 2410.12737 | null |
2024-10-16 | Counterfactual Generative Modeling with Variational Causal Inference | Yulun Wu et.al. | 2410.12730 | null |
2024-10-16 | FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression | Zhenheng Tang et.al. | 2410.12707 | null |
2024-10-16 | Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization | Xingqi Wang et.al. | 2410.12700 | link |
2024-10-16 | AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing | DuoSheng Chen et.al. | 2410.12696 | null |
2024-10-16 | 3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation | Dewei Zhou et.al. | 2410.12669 | null |
2024-10-16 | Towards Designing Scalable Quantum-Enhanced Generative Networks for Neutrino Physics Experiments with Liquid Argon Time Projection Chambers | Andrea Delgado et.al. | 2410.12650 | null |
2024-10-16 | A Robo-Advisor System: expected utility modeling via pairwise comparisons | Bo Chen et.al. | 2410.12570 | null |
2024-10-16 | One Step Diffusion via Shortcut Models | Kevin Frans et.al. | 2410.12557 | link |
2024-10-16 | Disentangling data distribution for Federated Learning | Xinyuan Zhao et.al. | 2410.12530 | null |
2024-10-16 | Shaping a Stabilized Video by Mitigating Unintended Changes for Concept-Augmented Video Editing | Mingce Guo et.al. | 2410.12526 | null |
2024-10-16 | MING: A Functional Approach to Learning Molecular Generative Models | Van Khoa Nguyen et.al. | 2410.12522 | null |
2024-10-15 | High-Resolution Frame Interpolation with Patch-based Cascaded Diffusion | Junhwa Hur et.al. | 2410.11838 | null |
2024-10-15 | On the Effectiveness of Dataset Alignment for Fake Image Detection | Anirudh Sundara Rajan et.al. | 2410.11835 | null |
2024-10-15 | Bayesian Experimental Design via Contrastive Diffusions | Jacopo Iollo et.al. | 2410.11826 | link |
2024-10-15 | KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities | Hsin-Ping Huang et.al. | 2410.11824 | null |
2024-10-15 | Improving Long-Text Alignment for Text-to-Image Diffusion Models | Luping Liu et.al. | 2410.11817 | link |
2024-10-15 | SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing | Zhiyuan Zhang et.al. | 2410.11815 | null |
2024-10-16 | Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices | Zhiyuan Ma et.al. | 2410.11795 | null |
2024-10-15 | G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | Guibin Zhang et.al. | 2410.11782 | null |
2024-10-15 | Technical Report of 1:10 Scale Autonomous Vehicle Robot | Amirhossein Kheiri Holighi et.al. | 2410.11746 | null |
2024-10-15 | Probabilistic Principles for Biophysics and Neuroscience: Entropy Production, Bayesian Mechanics & the Free-Energy Principle | Lancelot Da Costa et.al. | 2410.11735 | null |
2024-10-15 | Patch-Based Diffusion Models Beat Whole-Image Models for Mismatched Distribution Inverse Problems | Jason Hu et.al. | 2410.11730 | null |
2024-10-15 | Parameter estimation of structural dynamics with neural operators enabled surrogate modeling | Mingyuan Zhou et.al. | 2410.11712 | null |
2024-10-15 | Findings of the WMT 2024 Shared Task on Chat Translation | Wafaa Mohammed et.al. | 2410.11624 | null |
2024-10-15 | DeformPAM: Data-Efficient Learning for Long-horizon Deformable Object Manipulation via Preference-based Action Alignment | Wendi Chen et.al. | 2410.11584 | link |
2024-10-15 | A Data-Driven Aggressive Autonomous Racing Framework Utilizing Local Trajectory Planning with Velocity Prediction | Zhouheng Li et.al. | 2410.11570 | link |
2024-10-14 | Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models | Jingzhi Bao et.al. | 2410.10821 | link |
2024-10-15 | TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models | Mu Cai et.al. | 2410.10818 | link |
2024-10-14 | LVD-2M: A Long-take Video Dataset with Temporally Dense Captions | Tianwei Xiong et.al. | 2410.10816 | link |
2024-10-14 | Depth Any Video with Scalable Synthetic Data | Honghui Yang et.al. | 2410.10815 | link |
2024-10-14 | HART: Efficient Visual Generation with Hybrid Autoregressive Transformer | Haotian Tang et.al. | 2410.10812 | link |
2024-10-14 | TrajDiffuse: A Conditional Diffusion Model for Environment-Aware Trajectory Prediction | Qingze et.al. | 2410.10804 | link |
2024-10-14 | Boosting Camera Motion Control for Video Diffusion Transformers | Soon Yau Cheong et.al. | 2410.10802 | null |
2024-10-14 | Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations | Litu Rout et.al. | 2410.10792 | null |
2024-10-14 | ControlMM: Controllable Masked Motion Generation | Ekkasit Pinyoanuntapong et.al. | 2410.10780 | null |
2024-10-14 | Adaptive Diffusion Terrain Generator for Autonomous Uneven Terrain Navigation | Youwei Yu et.al. | 2410.10766 | null |
2024-10-14 | DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships | Zhang Wan et.al. | 2410.10751 | null |
2024-10-14 | CosForce: A Force-Based General Model for Simulating Pedestrian Anticipation and Reaction Mechanisms | Jinghui Wang et.al. | 2410.10746 | null |
2024-10-14 | FlexGen: Flexible Multi-View Generation from Text and Image Inputs | Xinli Xu et.al. | 2410.10745 | null |
2024-10-14 | Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models | Junyu Chen et.al. | 2410.10733 | link |
2024-10-14 | Large Language Models Are Active Critics in NLG Evaluation | Shuying Xu et.al. | 2410.10724 | null |
2024-10-11 | SceneCraft: Layout-Guided 3D Scene Generation | Xiuyu Yang et.al. | 2410.09049 | link |
2024-10-11 | Linear Convergence of Diffusion Models Under the Manifold Hypothesis | Peter Potaptchik et.al. | 2410.09046 | null |
2024-10-11 | PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents | Xiangyu Yin et.al. | 2410.09034 | link |
2024-10-11 | Semantic Score Distillation Sampling for Compositional Text-to-3D Generation | Ling Yang et.al. | 2410.09009 | link |
2024-10-11 | WaveDiffusion: Exploring Full Waveform Inversion via Joint Diffusion in the Latent Space | Hanchen Wang et.al. | 2410.09002 | null |
2024-10-11 | Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory | Aymane El Firdoussi et.al. | 2410.08942 | null |
2024-10-11 | DiffPO: A causal diffusion model for learning distributions of potential outcomes | Yuchen Ma et.al. | 2410.08924 | null |
2024-10-11 | An End-to-End Deep Learning Method for Solving Nonlocal Allen-Cahn and Cahn-Hilliard Phase-Field Models | Yuwei Geng et.al. | 2410.08914 | null |
2024-10-11 | Conditional Generative Models for Contrast-Enhanced Synthesis of T1w and T1 Maps in Brain MRI | Moritz Piening et.al. | 2410.08894 | link |
2024-10-11 | MATCH: Model-Aware TVM-based Compilation for Heterogeneous Edge Devices | Mohamed Amine Hamdi et.al. | 2410.08855 | link |
2024-10-14 | LIME-Eval: Rethinking Low-light Image Enhancement Evaluation via Object Detection | Mingjia Li et.al. | 2410.08810 | link |
2024-10-11 | Bad Neighbors: On Understanding VPN Provider Networks | Teemu Rytilahti et.al. | 2410.08737 | link |
2024-10-11 | 5G as Enabler for Industrie 4.0 Use Cases: Challenges and Concepts | M. Gundall et.al. | 2410.08726 | null |
2024-10-11 | Investigating Human-Computer Interaction and Visual Comprehension in Text Generation Process of Natural Language Generation Models | Yunchao Wang et.al. | 2410.08723 | null |
2024-10-11 | Impact of Surface Reflections in Maritime Obstacle Detection | Samed Yalçın et.al. | 2410.08713 | link |
2024-10-10 | LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts | Anh-Quan Cao et.al. | 2410.08211 | null |
2024-10-10 | DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models | Xiaoxiao He et.al. | 2410.08207 | null |
2024-10-10 | HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation | Shanyan Guan et.al. | 2410.08192 | null |
2024-10-10 | DifFRelight: Diffusion-Based Facial Performance Relighting | Mingming He et.al. | 2410.08188 | null |
2024-10-10 | RGM: Reconstructing High-fidelity 3D Car Assets with Relightable 3D-GS Generative Model from a Single Image | Xiaoxue Chen et.al. | 2410.08181 | null |
2024-10-10 | ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion | Zitian Zhang et.al. | 2410.08168 | null |
2024-10-10 | DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation | Jiatao Gu et.al. | 2410.08159 | null |
2024-10-10 | Progressive Autoregressive Video Diffusion Models | Desai Xie et.al. | 2410.08151 | link |
2024-10-10 | Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction | Jarrid Rector-Brooks et.al. | 2410.08134 | null |
2024-10-10 | Robust AI-Generated Text Detection by Restricted Embeddings | Kristian Kuznetsov et.al. | 2410.08113 | link |
2024-10-10 | LiPO: LiDAR Inertial Odometry for ICP Comparison | Darwin Mick et.al. | 2410.08097 | null |
2024-10-10 | Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models | Vinith M. Suriyakumar et.al. | 2410.08074 | null |
2024-10-10 | Reversible Decoupling Network for Single Image Reflection Removal | Hao Zhao et.al. | 2410.08063 | link |
2024-10-10 | A Target-Aware Analysis of Data Augmentation for Hate Speech Detection | Camilla Casula et.al. | 2410.08053 | null |
2024-10-10 | LADIMO: Face Morph Generation through Biometric Template Inversion with Latent Diffusion | Marcel Grimmer et.al. | 2410.07988 | link |
2024-10-09 | IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation | Xinchen Zhang et.al. | 2410.07171 | link |
2024-10-09 | Sylber: Syllabic Embedding Representation of Speech from Raw Audio | Cheol Jun Cho et.al. | 2410.07168 | link |
2024-10-09 | AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation | Yukang Cao et.al. | 2410.07164 | null |
2024-10-09 | InstructG2I: Synthesizing Images from Multimodal Attributed Graphs | Bowen Jin et.al. | 2410.07157 | link |
2024-10-09 | Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis | Bohan Zeng et.al. | 2410.07155 | link |
2024-10-10 | EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models | Rui Zhao et.al. | 2410.07133 | link |
2024-10-09 | Personalized Visual Instruction Tuning | Renjie Pi et.al. | 2410.07113 | link |
2024-10-09 | A Gentle Introduction and Tutorial on Deep Generative Models in Transportation Research | Seongjin Choi et.al. | 2410.07066 | link |
2024-10-09 | Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax | Ivan Butakov et.al. | 2410.06993 | null |
2024-10-09 | Diffusion Density Estimators | Akhil Premkumar et.al. | 2410.06986 | null |
2024-10-09 | Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control | Shimon Vainer et.al. | 2410.06985 | null |
2024-10-09 | Structure-Centric Robust Monocular Depth Estimation via Knowledge Distillation | Runze Chen et.al. | 2410.06982 | null |
2024-10-09 | Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | Sihyun Yu et.al. | 2410.06940 | link |
2024-10-09 | VEC-Sim: A Simulation Platform for Evaluating Service Caching and Computation Offloading Policies in Vehicular Edge Networks | Fan Wu et.al. | 2410.06934 | null |
2024-10-09 | Generative Model for Less-Resourced Language with 1 billion parameters | Domen Vreš et.al. | 2410.06898 | null |
2024-10-07 | DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control | Kaifeng Zhao et.al. | 2410.05260 | null |
2024-10-07 | GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting | Yukang Cao et.al. | 2410.05259 | null |
2024-10-07 | SePPO: Semi-Policy Preference Optimization for Diffusion Alignment | Daoan Zhang et.al. | 2410.05255 | link |
2024-10-07 | DiffuseReg: Denoising Diffusion Model for Obtaining Deformation Fields in Unsupervised Deformable Image Registration | Yongtai Zhuo et.al. | 2410.05234 | link |
2024-10-07 | Density estimation with LLMs: a geometric investigation of in-context learning trajectories | Toni J. B. Liu et.al. | 2410.05218 | null |
2024-10-07 | Avoiding Deadlocks via Weak Deadlock Sets | Gianpaolo Oriolo et.al. | 2410.05175 | null |
2024-10-07 | Presto! Distilling Steps and Layers for Accelerating Music Generation | Zachary Novack et.al. | 2410.05167 | null |
2024-10-08 | A Simulation-Free Deep Learning Approach to Stochastic Optimal Control | Mengjian Hua et.al. | 2410.05163 | null |
2024-10-07 | Smart Jamming Attack and Mitigation on Deep Transfer Reinforcement Learning Enabled Resource Allocation for Network Slicing | Shavbo Salehi et.al. | 2410.05153 | null |
2024-10-07 | Leveraging Multimodal Diffusion Models to Accelerate Imaging with Side Information | Timofey Efimov et.al. | 2410.05143 | null |
2024-10-07 | Agnostic Smoothed Online Learning | Moïse Blanchard et.al. | 2410.05124 | null |
2024-10-07 | Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning | Ayano Hiranaka et.al. | 2410.05116 | null |
2024-10-07 | Synthetic Generation of Dermatoscopic Images with GAN and Closed-Form Factorization | Rohan Reddy Mekala et.al. | 2410.05114 | null |
2024-10-07 | Hyper-Representations: Learning from Populations of Neural Networks | Konstantin Schürholt et.al. | 2410.05107 | link |
2024-10-07 | DreamSat: Towards a General 3D Model for Novel View Synthesis of Space Objects | Nidhi Mathihalli et.al. | 2410.05097 | link |
2024-10-04 | Estimating Body and Hand Motion in an Ego-sensed World | Brent Yi et.al. | 2410.03665 | null |
2024-10-04 | Enhance Reasoning by Learning from Mistakes: Peer-Review Knowledge Distillation from Multiple Large Language Models | Zhuochun Li et.al. | 2410.03663 | null |
2024-10-04 | Geometric Representation Condition Improves Equivariant Molecule Generation | Zian Li et.al. | 2410.03655 | null |
2024-10-04 | Aligning LLMs with Individual Preferences via Interaction | Shujin Wu et.al. | 2410.03642 | link |
2024-10-04 | Real-World Benchmarks Make Membership Inference Attacks Fail on Diffusion Models | Chumeng Liang et.al. | 2410.03640 | link |
2024-10-04 | Conditional Enzyme Generation Using Protein Language Models with Adapters | Jason Yang et.al. | 2410.03634 | null |
2024-10-04 | How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework | Yinuo Ren et.al. | 2410.03601 | null |
2024-10-04 | Teaching Transformers Modular Arithmetic at Scale | Eshika Saxena et.al. | 2410.03569 | null |
2024-10-04 | Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features | Benyuan Meng et.al. | 2410.03558 | link |
2024-10-04 | Loading Ceramics: Visualising Possibilities of Robotics in Ceramics | Varvara Guljajeva et.al. | 2410.03550 | null |
2024-10-04 | NRGBoost: Energy-Based Generative Boosted Trees | João Bravo et.al. | 2410.03535 | null |
2024-10-04 | Generative Artificial Intelligence for Navigating Synthesizable Chemical Space | Wenhao Gao et.al. | 2410.03494 | link |
2024-10-04 | SeBS-Flow: Benchmarking Serverless Cloud Function Workflows | Larissa Schmid et.al. | 2410.03480 | null |
2024-10-04 | Formalizing MLTL Formula Progression in Isabelle/HOL | Katherine Kosaian et.al. | 2410.03465 | null |
2024-10-04 | Diffusion State-Guided Projected Gradient for Inverse Problems | Rayhan Zirvi et.al. | 2410.03463 | null |
2024-10-03 | SIEVE: General Purpose Data Filtering System Matching GPT-4o Accuracy at 1% the Cost | Jifan Zhang et.al. | 2410.02755 | null |
2024-10-03 | CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation | Han He et.al. | 2410.02748 | null |
2024-10-03 | Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization | Lei Xu et.al. | 2410.02741 | link |
2024-10-03 | Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models | Zhengfeng Lai et.al. | 2410.02740 | null |
2024-10-03 | Custom Non-Linear Model Predictive Control for Obstacle Avoidance in Indoor and Outdoor Environments | Lara Laban et.al. | 2410.02732 | link |
2024-10-03 | A Photonic Parameter-shift Rule: Enabling Gradient Computation for Photonic Quantum Computers | Axel Pappalardo et.al. | 2410.02726 | null |
2024-10-03 | AlzhiNet: Traversing from 2DCNN to 3DCNN, Towards Early Detection and Diagnosis of Alzheimer's Disease | Romoke Grace Akindele et.al. | 2410.02714 | null |
2024-10-03 | SteerDiff: Steering towards Safe Text-to-Image Diffusion Models | Hongxiang Zhang et.al. | 2410.02710 | null |
2024-10-03 | ControlAR: Controllable Image Generation with Autoregressive Models | Zongming Li et.al. | 2410.02705 | link |
2024-10-03 | User-centric Immersive Communications in 6G: A Data-oriented Approach via Digital Twin | Conghao Zhou et.al. | 2410.02688 | null |
2024-10-03 | GUD: Generation with Unified Diffusion | Mathis Gerdes et.al. | 2410.02667 | null |
2024-10-03 | Grounded Answers for Multi-agent Decision-making Problem through Generative World Model | Zeyang Liu et.al. | 2410.02664 | null |
2024-10-03 | Scalable Simulation-free Entropic Unbalanced Optimal Transport | Jaemoo Choi et.al. | 2410.02656 | null |
2024-10-03 | Measuring and Improving Persuasiveness of Generative Models | Somesh Singh et.al. | 2410.02653 | null |
2024-10-03 | Efficient calibration of the shifted square-root diffusion model to credit default swap spreads using asymptotic approximations | Ankush Agarwal et.al. | 2410.02645 | null |
2024-10-02 | FabricDiffusion: High-Fidelity Texture Transfer for 3D Garments Generation from In-The-Wild Clothing Images | Cheng Zhang et.al. | 2410.01801 | null |
2024-10-02 | Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space | Yangming Li et.al. | 2410.01796 | null |
2024-10-02 | Dynamical-generative downscaling of climate model ensembles | Ignacio Lopez-Gomez et.al. | 2410.01776 | null |
2024-10-02 | Towards deep learning sequence-structure co-generation for protein design | Chentong Wang et.al. | 2410.01773 | null |
2024-10-02 | ImageFolder: Autoregressive Image Generation with Folded Tokens | Xiang Li et.al. | 2410.01756 | link |
2024-10-02 | AssessITS: Integrating procedural guidelines and practical evaluation metrics for organizational IT and Cybersecurity risk assessment | Mir Mehedi Rahman et.al. | 2410.01750 | null |
2024-10-02 | VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models | Kailai Feng et.al. | 2410.01738 | link |
2024-10-02 | HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration | Yushi Huang et.al. | 2410.01723 | null |
2024-10-02 | Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective | Zeyu Gan et.al. | 2410.01720 | link |
2024-10-02 | COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation | Mingzhen Sun et.al. | 2410.01718 | null |
2024-10-02 | A Mathematics-Inspired Learning-to-Optimize Framework for Decentralized Optimization | Yutong He et.al. | 2410.01700 | null |
2024-10-02 | Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding | Yao Teng et.al. | 2410.01699 | link |
2024-10-02 | Lossy Semantic Communication for the Logical Deduction of the State of the World | Ahmet Faruk Saz et.al. | 2410.01676 | link |
2024-10-02 | Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering | Klaus-Rudolf Kladny et.al. | 2410.01660 | null |
2024-10-02 | On The Adaptation of Unlimiformer for Decoder-Only Transformers | Kian Ahrabian et.al. | 2410.01637 | null |
2024-09-30 | SpaceMesh: A Continuous Representation for Learning Manifold Surface Meshes | Tianchang Shen et.al. | 2409.20562 | null |
2024-09-30 | Annealing Flow Generative Model Towards Sampling High-Dimensional and Multi-Modal Distributions | Dongze Wu et.al. | 2409.20547 | link |
2024-09-30 | A Compact Quantum Random Number Generator Based on Balanced Detection of Shot Noise | Jaideep Singh et.al. | 2409.20515 | null |
2024-09-30 | NUTRIVISION: A System for Automatic Diet Management in Smart Healthcare | Madhumita Veeramreddy et.al. | 2409.20508 | null |
2024-09-30 | COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models | Divyanshu Daiya et.al. | 2409.20502 | null |
2024-09-30 | FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing | Lingling Cai et.al. | 2409.20500 | null |
2024-09-30 | All-optical autoencoder machine learning framework using diffractive processors | Peijie Feng et.al. | 2409.20346 | null |
2024-09-30 | Devil is in Details: Locality-Aware 3D Abdominal CT Volume Generation for Self-Supervised Organ Segmentation | Yuran Wang et.al. | 2409.20332 | null |
2024-09-30 | UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation | Cheng Zhang et.al. | 2409.20197 | link |
2024-09-30 | Ensemble Kalman Diffusion Guidance: A Derivative-free Method for Inverse Problems | Hongkai Zheng et.al. | 2409.20175 | null |
2024-09-30 | Erase, then Redraw: A Novel Data Augmentation Approach for Free Space Detection Using Diffusion Model | Fulong Ma et.al. | 2409.20164 | null |
2024-09-30 | Conditional Diffusion Models are Minimax-Optimal and Manifold-Adaptive for Conditional Distribution Estimation | Rong Tang et.al. | 2409.20124 | null |
2024-09-30 | Training a Computer Vision Model for Commercial Bakeries with Primarily Synthetic Images | Thomas H. Schmitt et.al. | 2409.20122 | null |
2024-09-30 | Reaction-diffusion model for a population structured in phenotype and space I -- Criterion for persistence | Nathanaël Boutillon et.al. | 2409.20118 | null |
2024-09-30 | Near-Field Coupling Coil System: A Novel Radiofrequency Coil Solution for MRI | Zhiguang Mo et.al. | 2409.20095 | null |
2024-09-27 | | Gen Li et.al. | 2409.18959 | null |
2024-09-27 | ReviveDiff: A Universal Diffusion Model for Restoring Images in Adverse Weather Conditions | Wenfeng Huang et.al. | 2409.18932 | null |
2024-09-27 | Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors | Yunlong Lin et.al. | 2409.18899 | null |
2024-09-27 | Detecting Dataset Abuse in Fine-Tuning Stable Diffusion Models for Text-to-Image Synthesis | Songrui Wang et.al. | 2409.18897 | null |
2024-09-27 | HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models | Yu Zhou et.al. | 2409.18893 | null |
2024-09-27 | Explainable Artifacts for Synthetic Western Blot Source Attribution | João Phillipe Cardenuto et.al. | 2409.18881 | link |
2024-09-27 | Emu3: Next-Token Prediction is All You Need | Xinlong Wang et.al. | 2409.18869 | null |
2024-09-27 | Challenges of Generating Structurally Diverse Graphs | Fedor Velikonivtsev et.al. | 2409.18859 | link |
2024-09-27 | Moldable Development Patterns | Oscar Nierstrasz et.al. | 2409.18811 | null |
2024-09-27 | Convergence of Diffusion Models Under the Manifold Hypothesis in High-Dimensions | Iskander Azangulov et.al. | 2409.18804 | null |
2024-09-27 | Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation | Chaomin Shen et.al. | 2409.18785 | null |
2024-09-27 | Geometric deep learning for galaxy-halo connection: a case study for galaxy intrinsic alignments | Yesukhei Jagvaral et.al. | 2409.18761 | null |
2024-09-27 | Cottention: Linear Transformers With Cosine Attention | Gabriel Mongaras et.al. | 2409.18747 | link |
2024-09-27 | Read Over the Lines: Attacking LLMs and Toxicity Detection Systems with ASCII Art to Mask Profanity | Sergey Berezin et.al. | 2409.18708 | link |
2024-09-27 | MG-Net: Learn to Customize QAOA with Circuit Depth Awareness | Yang Qian et.al. | 2409.18692 | link |
2024-09-26 | FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner | Wenliang Zhao et.al. | 2409.18128 | link |
2024-09-26 | Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | Jing He et.al. | 2409.18124 | null |
2024-09-26 | EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation | Jiaxiang Tang et.al. | 2409.18114 | null |
2024-09-26 | MALPOLON: A Framework for Deep Species Distribution Modeling | Theo Larcher et.al. | 2409.18102 | link |
2024-09-26 | StackGen: Generating Stable Structures from Silhouettes via Diffusion | Luzhe Sun et.al. | 2409.18098 | null |
2024-09-26 | DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models | Helin Cao et.al. | 2409.18092 | null |
2024-09-26 | Stable Video Portraits | Mirela Ostrek et.al. | 2409.18083 | null |
2024-09-26 | LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field | Huan Wang et.al. | 2409.18057 | link |
2024-09-26 | Automated Detection and Analysis of Power Words in Persuasive Text Using Natural Language Processing | Sahil Garje et.al. | 2409.18033 | null |
2024-09-26 | PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging | Xin Cai et.al. | 2409.17996 | null |
2024-09-26 | Joint Localization and Planning using Diffusion | L. Lao Beyer et.al. | 2409.17995 | null |
2024-09-26 | Manufacturing, processing, applications, and advancements of Fe-based shape memory alloys | Anwar Algamal et.al. | 2409.17973 | null |
2024-09-26 | CNCA: Toward Customizable and Natural Generation of Adversarial Camouflage for Vehicle Detectors | Linye Lyu et.al. | 2409.17963 | null |
2024-09-26 | Relativistic diffusion model for hadron production in p-Pb collisions at the LHC | Philipp Schulz et.al. | 2409.17960 | null |
2024-09-26 | Perturb, Attend, Detect and Localize (PADL): Robust Proactive Image Defense | Filippo Bartolucci et.al. | 2409.17941 | null |
2024-09-25 | DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion | Yukun Huang et.al. | 2409.17145 | link |
2024-09-25 | Language-oriented Semantic Communication for Image Transmission with Fine-Tuned Diffusion Model | Xinfeng Wei et.al. | 2409.17104 | null |
2024-09-25 | Accumulator-Aware Post-Training Quantization | Ian Colbert et.al. | 2409.17092 | null |
2024-09-25 | Ctrl-GenAug: Controllable Generative Augmentation for Medical Sequence Classification | Xinrui Zhou et.al. | 2409.17091 | null |
2024-09-25 | Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors | Aiping Zhang et.al. | 2409.17058 | link |
2024-09-25 | ControlCity: A Multimodal Diffusion Model Based Approach for Accurate Geospatial Data Generation and Urban Morphology Analysis | Fangshuo Zhou et.al. | 2409.17049 | link |
2024-09-25 | GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design | Phillip Mueller et.al. | 2409.17045 | null |
2024-09-25 | CNN Mixture-of-Depths | Rinor Cakaj et.al. | 2409.17016 | null |
2024-09-25 | Single Image, Any Face: Generalisable 3D Face Generation | Wenqing Wang et.al. | 2409.16990 | null |
2024-09-25 | Dynamic Obstacle Avoidance through Uncertainty-Based Adaptive Planning with Diffusion | Vineet Punyamoorty et.al. | 2409.16950 | null |
2024-09-25 | DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling | Kyuheon Jung et.al. | 2409.16949 | link |
2024-09-25 | Divergence asymmetry and connected components in a general duplication-divergence graph model | Dario Borrelli et.al. | 2409.16943 | null |
2024-09-25 | Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model | Hongliang Zhong et.al. | 2409.16938 | link |
2024-09-25 | Linking in Style: Understanding learned features in deep learning models | Maren H. Wehrheim et.al. | 2409.16865 | link |
2024-09-25 | A Versatile and Differentiable Hand-Object Interaction Representation | Théo Morales et.al. | 2409.16855 | null |
2024-09-18 | Massively Multi-Person 3D Human Motion Forecasting with Scene Context | Felix B Mueller et.al. | 2409.12189 | link |
2024-09-18 | MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion | Kalakonda Sai Shashank et.al. | 2409.12140 | null |
2024-09-24 | Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models | Sijing Chen et.al. | 2409.12139 | null |
2024-09-18 | Brain-Streams: fMRI-to-Image Reconstruction with Multi-modal Guidance | Jaehoon Joo et.al. | 2409.12099 | null |
2024-09-19 | Skill matching at scale: freelancer-project alignment for efficient multilingual candidate retrieval | Warren Jouanneau et.al. | 2409.12097 | null |
2024-09-18 | Design of Ligand-Binding Proteins with Atomic Flow Matching | Junqi Liu et.al. | 2409.12080 | null |
2024-09-18 | Denoising diffusion models for high-resolution microscopy image restoration | Pamela Osuna-Vargas et.al. | 2409.12078 | null |
2024-09-19 | Using Large Language Models to Generate Clinical Trial Tables and Figures | Yumeng Yang et.al. | 2409.12046 | null |
2024-09-18 | LEMON: Localized Editing with Mesh Optimization and Neural Shaders | Furkan Mert Algan et.al. | 2409.12024 | null |
2024-09-18 | Promise and Peril of Collaborative Code Generation Models: Balancing Effectiveness and Memorization | Zhi Chen et.al. | 2409.12020 | null |
2024-09-18 | Towards Global Localization using Multi-Modal Object-Instance Re-Identification | Aneesh Chavan et.al. | 2409.12002 | link |
2024-09-18 | Tracking Any Point with Frame-Event Fusion Network at High Frame Rate | Jiaxiong Liu et.al. | 2409.11953 | null |
2024-09-18 | Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models | Lorenzo Mandelli et.al. | 2409.11920 | null |
2024-09-18 | AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots | Zhaxizhuoma et.al. | 2409.11905 | null |
2024-09-18 | Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation | Dimitrios Christodoulou et.al. | 2409.11904 | null |
2024-09-17 | Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion | Zhenwei Wang et.al. | 2409.11406 | null |
2024-09-17 | Teaching dark matter simulations to speak the halo language | Shivam Pandey et.al. | 2409.11401 | link |
2024-09-17 | Ultrasound Image Enhancement with the Variance of Diffusion Models | Yuxin Zhang et.al. | 2409.11380 | link |
2024-09-17 | OSV: One Step is Enough for High-Quality Image to Video Generation | Xiaofeng Mao et.al. | 2409.11367 | null |
2024-09-17 | Ping! Your Food is Ready: Comparing Different Notification Techniques in 3D AR Cooking Environment | Aditya Raikwar et.al. | 2409.11357 | null |
2024-09-17 | Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | Gonzalo Martin Garcia et.al. | 2409.11355 | link |
2024-09-17 | OmniGen: Unified Image Generation | Shitao Xiao et.al. | 2409.11340 | link |
2024-09-17 | fMRI-3D: A Comprehensive Dataset for Enhancing fMRI-based 3D Reconstruction | Jianxiong Gao et.al. | 2409.11315 | null |
2024-09-17 | SpMis: An Investigation of Synthetic Spoken Misinformation Detection | Peizhuo Liu et.al. | 2409.11308 | null |
2024-09-17 | Measurement of top-quark pair production in association with charm quarks in proton-proton collisions at | ATLAS Collaboration et.al. | 2409.11305 | null |
2024-09-17 | NirvaWave: An Accurate and Efficient Near Field Wave Propagation Simulator for 6G and Beyond | Vahid Yazdnian et.al. | 2409.11293 | link |
2024-09-17 | DroneDiffusion: Robust Quadrotor Dynamics Learning with Diffusion Models | Avirup Das et.al. | 2409.11292 | null |
2024-09-17 | Neural Networks for Vehicle Routing Problem | László Kovács et.al. | 2409.11290 | null |
2024-09-17 | Attacking Slicing Network via Side-channel Reinforcement Learning Attack | Wei Shao et.al. | 2409.11258 | null |
2024-09-17 | Learning Source Disentanglement in Neural Audio Codec | Xiaoyu Bie et.al. | 2409.11228 | null |
2024-09-16 | Pennsieve - A Collaborative Platform for Translational Neuroscience and Beyond | Zack Goldblum et.al. | 2409.10509 | null |
2024-09-16 | Torres funerarias chullpa en el valle del río Lauca: un primer análisis arqueoastronómico | Alejandro Gangui et.al. | 2409.10497 | null |
2024-09-16 | Incorporating Classifier-Free Guidance in Diffusion Model-Based Recommendation | Noah Buchanan et.al. | 2409.10494 | null |
2024-09-16 | SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing | Qi Qian et.al. | 2409.10476 | null |
2024-09-16 | MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion | Lehong Wu et.al. | 2409.10473 | null |
2024-09-16 | Signed Graph Autoencoder for Explainable and Polarization-Aware Network Embeddings | Nikolaos Nakis et.al. | 2409.10452 | null |
2024-09-16 | Mamba-ST: State Space Model for Efficient Style Transfer | Filippo Botti et.al. | 2409.10385 | link |
2024-09-16 | 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation? | Téo Guichoux et.al. | 2409.10357 | null |
2024-09-16 | Taming Diffusion Models for Image Restoration: A Review | Ziwei Luo et.al. | 2409.10353 | null |
2024-09-16 | MEGS: Morphological Evaluation of Galactic Structure | Ufuk Çakır et.al. | 2409.10346 | link |
2024-09-16 | VAE-QWGAN: Improving Quantum GANs for High Resolution Image Generation | Aaron Mark Thomas et.al. | 2409.10339 | null |
2024-09-16 | Research and Design of a Financial Intelligent Risk Control Platform Based on Big Data Analysis and Deep Machine Learning | Shuochen Bi et.al. | 2409.10331 | null |
2024-09-16 | Fairness, not Emotion, Drives Socioeconomic Decision Making | Rudra Mukhopadhyay et.al. | 2409.10322 | null |
2024-09-16 | On Synthetic Texture Datasets: Challenges, Creation, and Curation | Blaine Hoak et.al. | 2409.10297 | null |
2024-09-16 | DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis | Fa-Ting Hong et.al. | 2409.10281 | null |
2024-09-13 | Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation | Qingwen Bu et.al. | 2409.09016 | link |
2024-09-13 | A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis | Yohan Poirier-Ginter et.al. | 2409.08947 | null |
2024-09-13 | Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions | Zahra Ashktorab et.al. | 2409.08937 | null |
2024-09-13 | Latent Space Score-based Diffusion Model for Probabilistic Multivariate Time Series Imputation | Guojun Liang et.al. | 2409.08917 | link |
2024-09-13 | Gaussian is All You Need: A Unified Framework for Solving Inverse Problems via Diffusion Posterior Sampling | Nebiyou Yismaw et.al. | 2409.08906 | null |
2024-09-13 | Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control | Carles Domingo-Enrich et.al. | 2409.08861 | null |
2024-09-13 | The Line-Based Dial-a-Ride Problem | Kendra Reiter et.al. | 2409.08860 | link |
2024-09-13 | InstantDrag: Improving Interactivity in Drag-based Image Editing | Joonghyuk Shin et.al. | 2409.08857 | null |
2024-09-13 | DX2CT: Diffusion Model for 3D CT Reconstruction from Bi or Mono-planar 2D X-ray(s) | Yun Su Jeong et.al. | 2409.08850 | null |
2024-09-13 | Development of a Compton Imager Setup | Anuraag Arya et.al. | 2409.08822 | null |
2024-09-13 | LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment | Huan Zhang et.al. | 2409.08795 | link |
2024-09-13 | What You Say = What You Want? Teaching Humans to Articulate Requirements for LLMs | Qianou Ma et.al. | 2409.08775 | link |
2024-09-13 | A Hybrid Meta-Learning and Multi-Armed Bandit Approach for Context-Specific Multi-Objective Recommendation Optimization | Tiago Cunha et.al. | 2409.08752 | null |
2024-09-13 | Adaptive Sampling for Continuous Group Equivariant Neural Networks | Berfin Inal et.al. | 2409.08741 | null |
2024-09-13 | DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset | Jiawei Du et.al. | 2409.08731 | link |
2024-09-12 | DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors | Thomas Hanwen Zhu et.al. | 2409.08278 | null |
2024-09-12 | Hand-Object Interaction Pretraining from Videos | Himanshu Gaurav Singh et.al. | 2409.08273 | null |
2024-09-12 | Click2Mask: Local Editing with Dynamic Mask Generation | Omer Regev et.al. | 2409.08272 | null |
2024-09-12 | DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer | Runjia Li et.al. | 2409.08271 | null |
2024-09-12 | Touch2Touch: Cross-Modal Tactile Generation for Object Manipulation | Samanta Rodriguez et.al. | 2409.08269 | null |
2024-09-12 | Improving Text-guided Object Inpainting with Semantic Pre-inpainting | Yifu Chen et.al. | 2409.08260 | link |
2024-09-12 | Improving Virtual Try-On with Garment-focused Diffusion Models | Siqi Wan et.al. | 2409.08258 | null |
2024-09-12 | LoRID: Low-Rank Iterative Diffusion for Adversarial Purification | Geigh Zollicoffer et.al. | 2409.08255 | null |
2024-09-12 | Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding | Hongyu Li et.al. | 2409.08251 | null |
2024-09-12 | IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation | Yinwei Wu et.al. | 2409.08240 | null |
2024-09-12 | Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources | Alisia Lupidi et.al. | 2409.08239 | null |
2024-09-12 | LT3SD: Latent Trees for 3D Scene Diffusion | Quan Meng et.al. | 2409.08215 | null |
2024-09-12 | VI3DRM:Towards meticulous 3D Reconstruction from Sparse Views via Photo-Realistic Novel View Synthesis | Hao Chen et.al. | 2409.08207 | null |
2024-09-12 | High-Frequency Anti-DreamBooth: Robust Defense Against Image Synthesis | Takuto Onikubo et.al. | 2409.08167 | link |
2024-09-12 | MagicStyle: Portrait Stylization Based on Reference Image | Zhaoli Deng et.al. | 2409.08156 | null |
2024-09-11 | DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation | Haibo Yang et.al. | 2409.07454 | null |
2024-09-11 | Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Haibo Yang et.al. | 2409.07452 | link |
2024-09-11 | FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process | Yang Luo et.al. | 2409.07451 | null |
2024-09-11 | Efficient One-Step Diffusion Refinement for Snapshot Compressive Imaging | Yunzhen Wang et.al. | 2409.07417 | null |
2024-09-11 | Extracting TCPIP Headers at High Speed for the Anonymized Network Traffic Graph Challenge | Zhaoyang Han et.al. | 2409.07374 | null |
2024-09-11 | Awaking the Slides: A Tuning-free and Knowledge-regulated AI Tutoring System via Language Model Coordination | Daniel Zhang-Li et.al. | 2409.07372 | null |
2024-09-11 | Event-based Mosaicing Bundle Adjustment | Shuang Guo et.al. | 2409.07365 | link |
2024-09-11 | Training-Free Guidance for Discrete Diffusion Models for Molecular Generation | Thomas J. Kerby et.al. | 2409.07359 | null |
2024-09-11 | Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching | Eugenio Chisari et.al. | 2409.07343 | null |
2024-09-11 | Efficient and Unbiased Sampling of Boltzmann Distributions via Consistency Models | Fengzhe Zhang et.al. | 2409.07323 | null |
2024-09-11 | Optimizing Neural Network Performance and Interpretability with Diophantine Equation Encoding | Ronald Katende et.al. | 2409.07310 | null |
2024-09-11 | Exploring User-level Gradient Inversion with a Diffusion Prior | Zhuohang Li et.al. | 2409.07291 | null |
2024-09-11 | CCFExp: Facial Image Synthesis with Cycle Cross-Fusion Diffusion Model for Facial Paralysis Individuals | Weixiang Gao et.al. | 2409.07271 | link |
2024-09-11 | Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models | Sanoojan Baliah et.al. | 2409.07269 | link |
2024-09-11 | EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion | Jian Zhang et.al. | 2409.07255 | null |
2024-09-10 | Technical Report of Mobile Manipulator Robot for Industrial Environments | Erfan Amoozad Khalili et.al. | 2409.06693 | null |
2024-09-10 | SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation | Teng Hu et.al. | 2409.06633 | null |
2024-09-10 | MVGaussian: High-Fidelity text-to-3D Content Generation with Multi-View Guidance and Surface Densification | Phu Pham et.al. | 2409.06620 | null |
2024-09-10 | A Primer on Variational Inference for Physics-Informed Deep Generative Modelling | Alex Glyn-Davies et.al. | 2409.06560 | null |
2024-09-10 | From LIMA to DeepLIMA: following a new path of interoperability | Victor Bocharov et.al. | 2409.06550 | null |
2024-09-10 | Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models | Xin Jing et.al. | 2409.06451 | null |
2024-09-10 | Prompt2Fashion: An automatically generated fashion dataset | Georgia Argyro et.al. | 2409.06442 | link |
2024-09-10 | Fast nonparametric inference of network backbones for graph sparsification | Alec Kirkley et.al. | 2409.06417 | link |
2024-09-10 | Distilling Generative-Discriminative Representations for Very Low-Resolution Face Recognition | Junzheng Zhang et.al. | 2409.06371 | null |
2024-09-10 | What happens to diffusion model likelihood when your model is conditional? | Mattias Cross et.al. | 2409.06364 | null |
2024-09-10 | DiffQRCoder: Diffusion-based Aesthetic QR Code Generation with Scanning Robustness Guided Iterative Refinement | Jia-Wei Liao et.al. | 2409.06355 | null |
2024-09-10 | Improving Conditional Level Generation using Automated Validation in Match-3 Games | Monica Villanueva Aylagas et.al. | 2409.06349 | null |
2024-09-10 | Foragax: An Agent Based Modelling framework based on JAX | Siddharth Chaturvedi et.al. | 2409.06345 | link |
2024-09-10 | G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer | Jinzhi Zhang et.al. | 2409.06322 | null |
2024-09-10 | Learning Augmentation Policies from A Model Zoo for Time Series Forecasting | Haochen Yuan et.al. | 2409.06282 | null |
2024-09-09 | Fast Generation of Custom Floating-Point Spatial Filters on FPGAs | Nelson Campos et.al. | 2409.05837 | null |
2024-09-09 | Enhancing Preference-based Linear Bandits via Human Response Time | Shen Li et.al. | 2409.05798 | null |
2024-09-09 | Predicting Critical Heat Flux with Uncertainty Quantification and Domain Generalization Using Conditional Variational Autoencoders and Deep Neural Networks | Farah Alsafadi et.al. | 2409.05790 | null |
2024-09-09 | Vector Quantized Diffusion Model Based Speech Bandwidth Extension | Yuan Fang et.al. | 2409.05784 | null |
2024-09-09 | AS-Speech: Adaptive Style For Speech Synthesis | Zhipeng Li et.al. | 2409.05730 | null |
2024-09-09 | pFedGPA: Diffusion-based Generative Parameter Aggregation for Personalized Federated Learning | Jiahao Lai et.al. | 2409.05701 | null |
2024-09-09 | Citizen-Led Personalization of User Interfaces: Investigating How People Customize Interfaces for Themselves and Others | Sérgio Alves et.al. | 2409.05696 | null |
2024-09-09 | Unlearning or Concealment? A Critical Analysis and Evaluation Metrics for Unlearning in Diffusion Models | Aakash Sen Sharma et.al. | 2409.05668 | null |
2024-09-09 | Forward KL Regularized Preference Optimization for Aligning Diffusion Policies | Zhao Shan et.al. | 2409.05622 | null |
2024-09-09 | CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization | Nan Chen et.al. | 2409.05606 | null |
2024-09-09 | Latent 3D Brain MRI Counterfactual | Wei Peng et.al. | 2409.05585 | null |
2024-09-09 | Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation | Muraleekrishna Gopinathan et.al. | 2409.05583 | link |
2024-09-09 | Design and Implementation of TAO DAQ System | Shuihan Zhang et.al. | 2409.05522 | null |
2024-09-09 | A Taxonomy of Miscompressions: Preparing Image Forensics for Neural Compression | Nora Hofer et.al. | 2409.05490 | null |
2024-09-09 | DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation | Wei Wu et.al. | 2409.05463 | null |
2024-09-06 | VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation | Yecheng Wu et.al. | 2409.04429 | link |
2024-09-06 | Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques | Davide Clode da Silva et.al. | 2409.04424 | null |
2024-09-06 | Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation | Zhuoyan Luo et.al. | 2409.04410 | null |
2024-09-06 | Enhancing Skin Lesion Diagnosis with Ensemble Learning | Xiaoyi Liu et.al. | 2409.04381 | null |
2024-09-06 | How Fair is Your Diffusion Recommender Model? | Daniele Malitesta et.al. | 2409.04339 | null |
2024-09-06 | Random effects estimation in a fractional diffusion model based on continuous observations | Nesrine Chebli et.al. | 2409.04331 | null |
2024-09-06 | Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models | Yuxiao Huang et.al. | 2409.04270 | null |
2024-09-06 | An overview of domain-specific foundation model: key technologies, applications and challenges | Haolong Chen et.al. | 2409.04267 | null |
2024-09-06 | UniDet3D: Multi-dataset Indoor 3D Object Detection | Maksim Kolodiazhnyi et.al. | 2409.04234 | link |
2024-09-06 | Generative Modelling via Quantile Regression | Johannes Schmidt-Hieber et.al. | 2409.04231 | null |
2024-09-06 | Breaking the Brownian Barrier: Models and Manifestations of Molecular Diffusion in Complex Fluids | Harish Srinivasan et.al. | 2409.04199 | null |
2024-09-06 | GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers | Lorenza Prospero et.al. | 2409.04196 | null |
2024-09-06 | Subsampling of Correlated Graph Signals | Rishabh Ravi et.al. | 2409.04107 | null |
2024-09-06 | Estimation of service value parameters for a queue with unobserved balking | Daniel Podorojnyi et.al. | 2409.04090 | null |
2024-09-06 | D4: Text-guided diffusion model-based domain adaptive data augmentation for vineyard shoot detection | Kentaro Hirahara et.al. | 2409.04060 | null |
2024-09-05 | Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | Yunze Man et.al. | 2409.03757 | link |
2024-09-05 | WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild | Yuntian Deng et.al. | 2409.03753 | null |
2024-09-05 | ArtiFade: Learning to Generate High-quality Subject from Blemished Images | Shuya Yang et.al. | 2409.03745 | null |
2024-09-06 | RAG based Question-Answering for Contextual Response Prediction System | Sriram Veturi et.al. | 2409.03708 | null |
2024-09-05 | RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images | Benzhi Wang et.al. | 2409.03644 | link |
2024-09-05 | DiffEVC: Any-to-Any Emotion Voice Conversion with Expressive Guidance | Hsing-Hang Chou et.al. | 2409.03636 | null |
2024-09-05 | Generalizing Linear Graphs and Bond Graph Models with Hetero-functional Graphs for System-of-Systems Engineering Applications | Ehsanoddin Ghorbanichemazkati et.al. | 2409.03630 | null |
2024-09-05 | TCDiff: Triple Condition Diffusion Model with 3D Constraints for Stylizing Synthetic Faces | Bernardo Biesseck et.al. | 2409.03600 | link |
2024-09-05 | DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture | Qianlong Xiang et.al. | 2409.03550 | null |
2024-09-05 | Euclid preparation. Simulations and nonlinearities beyond | Euclid Collaboration et.al. | 2409.03523 | null |
2024-09-05 | Blended Latent Diffusion under Attention Control for Real-World Video Editing | Deyin Liu et.al. | 2409.03514 | null |
2024-09-05 | Physical Modelling of Piano Sound | Haifan Xie et.al. | 2409.03481 | null |
2024-09-05 | Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration | Pei Wang et.al. | 2409.03455 | null |
2024-09-05 | Rx Strategist: Prescription Verification using LLM Agents System | Phuc Phan Van et.al. | 2409.03440 | null |
2024-09-05 | KiloBot: A Programming Language for Deploying Perception-Guided Industrial Manipulators at Scale | Wei Gao et.al. | 2409.03439 | null |
2024-09-04 | HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts | Xinyu Liu et.al. | 2409.02919 | link |
2024-09-04 | Latent Watermarking of Audio Generative Models | Robin San Roman et.al. | 2409.02915 | null |
2024-09-04 | Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling | Kaiwen Zheng et.al. | 2409.02908 | null |
2024-09-04 | Configurable Foundation Models: Building LLMs from a Modular Perspective | Chaojun Xiao et.al. | 2409.02877 | null |
2024-09-04 | Look Into the LITE in Deep Learning for Time Series Classification | Ali Ismail-Fawaz et.al. | 2409.02869 | link |
2024-09-04 | Building a Scalable, Effective, and Steerable Search and Ranking Platform | Marjan Celikik et.al. | 2409.02856 | null |
2024-09-04 | Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models | Zhibin Liu et.al. | 2409.02851 | link |
2024-09-04 | Anomaly Detection in Offshore Open Radio Access Network Using Long Short-Term Memory Models on a Novel Artificial Intelligence-Driven Cloud-Native Data Platform | Abdelrahim Ahmad et.al. | 2409.02849 | null |
2024-09-04 | Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model | Tornike Karchkhadze et.al. | 2409.02845 | null |
2024-09-04 | SNNAX -- Spiking Neural Networks in JAX | Jamie Lohoff et.al. | 2409.02842 | null |
2024-09-04 | Experimental Framework for Generating Reliable Ground Truth for Laryngeal Spatial Segmentation Tasks | Hamzeh Ghasemzadeh et.al. | 2409.02809 | null |
2024-09-04 | Creating a Gen-AI based Track and Trace Assistant MVP (SuperTracy) for PostNL | Mohammad Reshadati et.al. | 2409.02711 | null |
2024-09-04 | Rethinking HTG Evaluation: Bridging Generation and Recognition | Konstantina Nikolaidou et.al. | 2409.02683 | link |
2024-09-04 | Introduction to Machine Learning | Laurent Younes et.al. | 2409.02668 | null |
2024-09-04 | Creating Domain-Specific Translation Memories for Machine Translation Fine-tuning: The TRENCARD Bilingual Cardiology Corpus | Gokhan Dogru et.al. | 2409.02667 | null |
2024-08-30 | Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes | Li Zhang et.al. | 2408.17421 | link |
2024-08-30 | Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain | Francesca Grasso et.al. | 2408.17362 | link |
2024-08-30 | Subspace Diffusion Posterior Sampling for Travel-Time Tomography | Xiang Cao et.al. | 2408.17333 | null |
2024-08-30 | Structuring a Training Strategy to Robustify Perception Models with Realistic Image Augmentations | Ahmed Hammam et.al. | 2408.17311 | null |
2024-08-30 | Leveraging Deep Generative Model For Computational Protein Design And Optimization | Boqiao Lai et.al. | 2408.17241 | null |
2024-08-30 | Towards Symbolic XAI -- Explanation Through Human Understandable Logical Relationships Between Features | Thomas Schnake et.al. | 2408.17198 | null |
2024-09-02 | Leveraging Blockchain and ANFIS for Optimal Supply Chain Management | Amirfarhad Farhadi et.al. | 2408.17161 | null |
2024-08-30 | Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning | Xiaoye Qu et.al. | 2408.17150 | link |
2024-08-30 | Flow Matching for Optimal Reaction Coordinates of Biomolecular System | Mingyuan Zhang et.al. | 2408.17139 | link |
2024-08-30 | Temporal and Interactive Modeling for Efficient Human-Human Motion Generation | Yabiao Wang et.al. | 2408.17135 | null |
2024-09-02 | RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance | Avideep Mukherjee et.al. | 2408.17095 | null |
2024-08-30 | FissionVAE: Federated Non-IID Image Generation with Latent Space and Decoder Decomposition | Chen Hu et.al. | 2408.17090 | link |
2024-08-30 | Approximately Invertible Neural Network for Learned Image Compression | Yanbo Gao et.al. | 2408.17073 | null |
2024-09-02 | Instant Adversarial Purification with Adversarial Consistency Distillation | Chun Tong Lei et.al. | 2408.17064 | null |
2024-08-30 | Text-to-Image Generation Via Energy-Based CLIP | Roy Ganz et.al. | 2408.17046 | null |
2024-08-29 | ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model | Fangfu Liu et.al. | 2408.16767 | null |
2024-08-29 | CSGO: Content-Style Composition in Text-to-Image Generation | Peng Xing et.al. | 2408.16766 | null |
2024-08-29 | A Score-Based Density Formula, with Applications in Diffusion Generative Models | Gen Li et.al. | 2408.16765 | null |
2024-08-29 | UV-free Texture Generation with Denoising and Geodesic Heat Diffusions | Simone Foti et.al. | 2408.16762 | link |
2024-08-29 | One-Shot Learning Meets Depth Diffusion in Multi-Object Videos | Anisha Jain et.al. | 2408.16704 | null |
2024-08-29 | VMC: A Grammar for Visualizing Statistical Model Checks | Ziyang Guo et.al. | 2408.16702 | null |
2024-08-29 | GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models | Moreno D'Incà et.al. | 2408.16700 | link |
2024-08-29 | Optimization Models for the Quadratic Traveling Salesperson Problem | Yuxiao Chen et.al. | 2408.16680 | null |
2024-08-29 | DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving | Yongjie Fu et.al. | 2408.16647 | null |
2024-08-29 | RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model | Zhuan Shi et.al. | 2408.16634 | null |
2024-08-28 | TEDRA: Text-based Editing of Dynamic and Photoreal Actors | Basavaraj Sunagad et.al. | 2408.15995 | null |
2024-08-28 | Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation | Shengyuan Zhang et.al. | 2408.15991 | link |
2024-08-28 | Thoughtseeds: Evolutionary Priors, Nested Markov Blankets, and the Emergence of Embodied Cognition | Prakash Chandra Kavi et.al. | 2408.15982 | null |
2024-08-28 | Stability of Primal-Dual Gradient Flow Dynamics for Multi-Block Convex Optimization Problems | Ibrahim K. Ozaslan et.al. | 2408.15969 | null |
2024-08-28 | MetaGFN: Exploring Distant Modes with Adapted Metadynamics for Continuous GFlowNets | Dominic Phillips et.al. | 2408.15905 | null |
2024-08-28 | Gen-Swarms: Adapting Deep Generative Models to Swarms of Drones | Carlos Plou et.al. | 2408.15899 | null |
2024-08-28 | Airfoil Diffusion: Denoising Diffusion Model For Conditional Airfoil Generation | Reid Graves et.al. | 2408.15898 | link |
2024-08-28 | Disentangled Diffusion Autoencoder for Harmonization of Multi-site Neuroimaging Data | Ayodeji Ijishakin et.al. | 2408.15890 | null |
2024-08-29 | Recent Decade's Power Outage Data Reveals the Increasing Vulnerability of U.S. Power Infrastructure | Bo Li et.al. | 2408.15882 | null |
2024-08-28 | GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model | Yongjie Fu et.al. | 2408.15868 | null |
2024-08-27 | GenRec: Unifying Video Generation and Recognition with Diffusion Models | Zejia Weng et.al. | 2408.15241 | link |
2024-08-27 | Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation | Xiaojuan Wang et.al. | 2408.15239 | null |
2024-08-27 | Simulation of Stochastic Discrete Dislocation Dynamics in Ductile Vs Brittle Materials | Santosh Chhetri et.al. | 2408.15157 | null |
2024-08-27 | How transformers learn structured data: insights from hierarchical filtering | Jerome Garnier-Brun et.al. | 2408.15138 | link |
2024-08-27 | DIFR3CT: Latent Diffusion for Probabilistic 3D CT Reconstruction from Few Planar X-Rays | Yiran Sun et.al. | 2408.15118 | link |
2024-08-27 | Data-Driven Nonlinear Deformation Design of 3D-Printable Shells | Samuel Silverman et.al. | 2408.15097 | link |
2024-08-27 | Constrained Diffusion Models via Dual Training | Shervin Khalafi et.al. | 2408.15094 | null |
2024-08-27 | LN-Gen: Rectal Lymph Nodes Generation via Anatomical Features | Weidong Guo et.al. | 2408.14977 | null |
2024-08-27 | MegActor- | Shurong Yang et.al. | 2408.14975 | null |
2024-08-27 | Integrated Bundling and Pricing of Unique Items | Maxime Bouscary et.al. | 2408.14913 | null |
2024-08-26 | K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences | Zhikai Li et.al. | 2408.14468 | null |
2024-08-26 | Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs | Xiaoman Zhang et.al. | 2408.14397 | link |
2024-08-26 | Reprogramming Foundational Large Language Models(LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning | Sakhinana Sagar Srinivas et.al. | 2408.14387 | null |
2024-08-26 | GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy | Peiyan Li et.al. | 2408.14368 | link |
2024-08-27 | Foundation Models for Music: A Survey | Yinghao Ma et.al. | 2408.14340 | link |
2024-08-26 | Automated Machine Learning in Insurance | Panyi Dong et.al. | 2408.14331 | link |
2024-08-26 | LLM-3D Print: Large Language Models To Monitor and Control 3D Printing | Yayati Jadhav et.al. | 2408.14307 | null |
2024-08-26 | Learning Local Pattern Modularization for Point Cloud Reconstruction from Unseen Classes | Chao Chen et.al. | 2408.14279 | null |
2024-08-26 | Towards Synthetic Trace Generation of Modeling Operations using In-Context Learning Approach | Vittoriano Muttillo et.al. | 2408.14259 | null |
2024-08-27 | Text3DAug -- Prompted Instance Augmentation for LiDAR Perception | Laurenz Reichardt et.al. | 2408.14253 | link |
2024-08-23 | How Diffusion Models Learn to Factorize and Compose | Qiyao Liang et.al. | 2408.13256 | null |
2024-08-23 | Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | Sakhinana Sagar Srinivas et.al. | 2408.13248 | null |
2024-08-23 | CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities | Tao Wu et.al. | 2408.13239 | null |
2024-08-23 | Social Welfare Maximization for Federated Learning with Network Effects | Xiang Li et.al. | 2408.13223 | null |
2024-08-23 | Instruct-DeBERTa: A Hybrid Approach for Aspect-based Sentiment Analysis on Textual Reviews | Dineth Jayakody et.al. | 2408.13202 | null |
2024-08-23 | IFH: a Diffusion Framework for Flexible Design of Graph Generative Models | Samuel Cognolato et.al. | 2408.13194 | link |
2024-08-23 | Deep Learning for Lung Disease Classification Using Transfer Learning and a Customized CNN Architecture with Attention | Xiaoyi Liu et.al. | 2408.13180 | null |
2024-08-26 | Focus on Neighbors and Know the Whole: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation | Bonan Li et.al. | 2408.13149 | null |
2024-08-23 | Diffusion-based Episodes Augmentation for Offline Multi-Agent Reinforcement Learning | Jihwan Oh et.al. | 2408.13092 | null |
2024-08-23 | General Intelligent Imaging and Uncertainty Quantification by Deterministic Diffusion Model | Weiru Fan et.al. | 2408.13061 | null |
2024-08-22 | xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations | Can Qin et.al. | 2408.12590 | null |
2024-08-22 | ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation | Lujia Zhong et.al. | 2408.12561 | link |
2024-08-22 | Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | Jinheng Xie et.al. | 2408.12528 | null |
2024-08-22 | FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing | Jue Wang et.al. | 2408.12429 | link |
2024-08-22 | Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification | Sudi Murindanyi et.al. | 2408.12426 | null |
2024-08-22 | 4D Diffusion for Dynamic Protein Structure Prediction with Reference Guided Motion Alignment | Kaihui Cheng et.al. | 2408.12419 | null |
2024-08-22 | CODE: Confident Ordinary Differential Editing | Bastien van Delft et.al. | 2408.12418 | link |
2024-08-22 | Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures | Ce Liu et.al. | 2408.12413 | null |
2024-08-22 | A Stable Polygamy Approach to Spectrum Access with Channel Reuse | Dan Ben Ami et.al. | 2408.12402 | null |
2024-08-22 | Multi-Style Facial Sketch Synthesis through Masked Generative Modeling | Bowen Sun et.al. | 2408.12400 | null |
2024-08-21 | Pixel Is Not A Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models | Chun-Yen Shih et.al. | 2408.11810 | null |
2024-08-21 | ACE: A Cross-Platform Visual-Exoskeletons System for Low-Cost Dexterous Teleoperation | Shiqi Yang et.al. | 2408.11805 | null |
2024-08-21 | DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework | Zhifei Xie et.al. | 2408.11788 | null |
2024-08-21 | Timeline and Boundary Guided Diffusion Network for Video Shadow Detection | Haipeng Zhou et.al. | 2408.11785 | link |
2024-08-21 | Sum of Squares Circuits | Lorenzo Loconte et.al. | 2408.11778 | null |
2024-08-21 | Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards | Omar Erak et.al. | 2408.11775 | link |
2024-08-21 | D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models | M. Forlini et.al. | 2408.11761 | null |
2024-08-21 | JieHua Paintings Style Feature Extracting Model using Stable Diffusion with ControlNet | Yujia Gu et.al. | 2408.11744 | null |
2024-08-21 | Enhancing Cross-Modal Medical Image Segmentation through Compositionality | Aniek Eijpe et.al. | 2408.11733 | link |
2024-08-21 | AI-assisted Automated Short Answer Grading of Handwritten University Level Mathematics Exams | Tianyi Liu et.al. | 2408.11728 | null |
2024-08-20 | Reconciling Methodological Paradigms: Employing Large Language Models as Novice Qualitative Research Assistants in Talent Management Research | Sreyoshi Bhaduri et.al. | 2408.11043 | null |
2024-08-20 | Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model | Chunting Zhou et.al. | 2408.11039 | null |
2024-08-20 | Full Detector Simulation of a Projective Dual-Readout Segmented Crystal Electromagnetic Calorimeter with Precision Timing | Wonyong Chung et.al. | 2408.11027 | null |
2024-08-20 | MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning | Haoning Wu et.al. | 2408.11001 | link |
2024-08-20 | GreediRIS: Scalable Influence Maximization using Distributed Streaming Maximum Cover | Reet Barik et.al. | 2408.10982 | null |
2024-08-21 | Assortment Optimization Under History-Dependent Effects | Taotao He et.al. | 2408.10967 | null |
2024-08-20 | Kilometer-Scale Convection Allowing Model Emulation using Generative Diffusion Modeling | Jaideep Pathak et.al. | 2408.10958 | null |
2024-08-20 | SysBench: Can Large Language Models Follow System Messages? | Yanzhao Qin et.al. | 2408.10943 | link |
2024-08-20 | A Closer Look at Data Augmentation Strategies for Finetuning-Based Low/Few-Shot Object Detection | Vladislav Li et.al. | 2408.10940 | null |
2024-08-20 | Large Point-to-Gaussian Model for Image-to-3D Generation | Longfei Lu et.al. | 2408.10935 | null |
2024-08-19 | MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model | Minghua Liu et.al. | 2408.10198 | null |
2024-08-19 | SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views | Chao Xu et.al. | 2408.10195 | null |
2024-08-19 | Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | Xiaoyu Kong et.al. | 2408.10159 | link |
2024-08-19 | Advancing Voice Cloning for Nepali: Leveraging Transfer Learning in a Low-Resource Language | Manjil Karki et.al. | 2408.10128 | null |
2024-08-19 | Learning Precise Affordances from Egocentric Videos for Robotic Manipulation | Gen Li et.al. | 2408.10123 | null |
2024-08-19 | Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision | Zhijun Jia et.al. | 2408.10096 | null |
2024-08-19 | Stacked Intelligent Metasurfaces for Integrated Sensing and Communications | Haoxian Niu et.al. | 2408.10043 | null |
2024-08-19 | General Impedance Modeling for Modular Multilevel Converter with Grid-forming and Grid-following Control | Chu Sun et.al. | 2408.10017 | null |
2024-08-19 | Uniting contrastive and generative learning for event sequences models | Aleksandr Yugay et.al. | 2408.09995 | null |
2024-08-19 | Multi-layer diffusion model of photovoltaic installations | Tomasz Weron et.al. | 2408.09904 | null |
2024-08-16 | Automated High-throughput Organic Crystal Structure Prediction via Population-based Sampling | Qiang Zhu et.al. | 2408.08843 | link |
2024-08-16 | PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future | Guangyi Wang et.al. | 2408.08822 | null |
2024-08-16 | A Unified Automata-Theoretic Approach to LTLf Modulo Theories (Extended Version) | Marco Faella et.al. | 2408.08817 | null |
2024-08-16 | EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics | Chenwei Wan et.al. | 2408.08782 | link |
2024-08-16 | Comparative Analysis of Generative Models: Enhancing Image Synthesis with VAEs, GANs, and Stable Diffusion | Sanchayan Vivekananthan et.al. | 2408.08751 | null |
2024-08-16 | The Blessing of Strategic Customers in Personalized Pricing | Zhi Chen et.al. | 2408.08738 | null |
2024-08-16 | ChatZero: Zero-shot Cross-Lingual Dialogue Generation via Pseudo-Target Language | Yongkang Liu et.al. | 2408.08724 | null |
2024-08-16 | An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation | Peiming Guo et.al. | 2408.08650 | null |
2024-08-16 | Modeling the Neonatal Brain Development Using Implicit Neural Representations | Florentin Bieder et.al. | 2408.08647 | link |
2024-08-16 | Sampling effects on Lasso estimation of drift functions in high-dimensional diffusion processes | Chiara Amorino et.al. | 2408.08638 | null |
2024-08-15 | Understanding the Local Geometry of Generative Model Manifolds | Ahmed Imtiaz Humayun et.al. | 2408.08307 | null |
2024-08-15 | Accelerated Image-Aware Generative Diffusion Modeling | Tanmay Asthana et.al. | 2408.08306 | null |
2024-08-15 | Marker or Markerless? Mode-Switchable Optical Tactile Sensing for Diverse Robot Tasks | Ni Ou et.al. | 2408.08276 | null |
2024-08-15 | mhGPT: A Lightweight Generative Pre-Trained Transformer for Mental Health Text Analysis | Dae-young Kim et.al. | 2408.08261 | null |
2024-08-15 | Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding | Xiner Li et.al. | 2408.08252 | link |
2024-08-15 | Picosecond laser pulses for quantum dot-microcavity based single photon generation by cascaded electro-optic modulation of a narrow-linewidth laser | Mio Poortvliet et.al. | 2408.08213 | null |
2024-08-15 | Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion | Adi Haviv et.al. | 2408.08184 | null |
2024-08-15 | Impact of Comprehensive Data Preprocessing on Predictive Modelling of COVID-19 Mortality | Sangita Das et.al. | 2408.08142 | link |
2024-08-15 | Decoding Memes: A Comparative Study of Machine Learning Models for Template Identification | Levente Murgás et.al. | 2408.08126 | link |
2024-08-15 | When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding | Pingping Zhang et.al. | 2408.08093 | null |
2024-08-14 | Detecting Near-Duplicate Face Images | Sudipta Banerjee et.al. | 2408.07689 | link |
2024-08-14 | Composing Automatic Differentiation with Custom Derivatives of Higher-Order Functions | Sam Estep et.al. | 2408.07683 | null |
2024-08-14 | Drug Discovery SMILES-to-Pharmacokinetics Diffusion Models with Deep Molecular Understanding | Bing Hu et.al. | 2408.07636 | null |
2024-08-14 | Anisotropic Diffusion Model of Communication in 2D Biofilm | Yanahan Paramalingam et.al. | 2408.07626 | null |
2024-08-14 | Neural Quantum States and Peaked Molecular Wave Functions: Curse or Blessing? | Aleksei Malyshev et.al. | 2408.07625 | null |
2024-08-14 | MatterGPT: A Generative Transformer for Multi-Property Inverse Design of Solid-State Materials | Yan Chen et.al. | 2408.07608 | null |
2024-08-14 | PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation | Sang-Hoon Lee et.al. | 2408.07547 | link |
2024-08-14 | New Curriculum, New Chance -- Retrieval Augmented Generation for Lesson Planning in Ugandan Secondary Schools. Prototype Quality Evaluation | Simon Kloker et.al. | 2408.07542 | null |
2024-08-14 | DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model | Erez Yosef et.al. | 2408.07541 | null |
2024-08-14 | Towards Real-time Video Compressive Sensing on Mobile Devices | Miao Cao et.al. | 2408.07530 | link |
2024-08-13 | Imagen 3 | Imagen-Team-Google et.al. | 2408.07009 | null |
2024-08-13 | Low-Bitwidth Floating Point Quantization for Efficient High-Quality Diffusion Models | Cheng Chen et.al. | 2408.06995 | null |
2024-08-13 | DCMSA: Multi-Head Self-Attention Mechanism Based on Deformable Convolution For Seismic Data Denoising | Wang Mingwei et.al. | 2408.06963 | null |
2024-08-13 | Neural Speech and Audio Coding | Minje Kim et.al. | 2408.06954 | null |
2024-08-13 | Diffusion Model for Slate Recommendation | Federico Tomasi et.al. | 2408.06883 | null |
2024-08-13 | Efficient Search for Customized Activation Functions with Gradient Descent | Lukas Strack et.al. | 2408.06820 | link |
2024-08-13 | Enhancing Diabetic Retinopathy Diagnosis: A Lightweight CNN Architecture for Efficient Exudate Detection in Retinal Fundus Images | Mujadded Al Rabbani Alif et.al. | 2408.06784 | null |
2024-08-13 | Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective | Ouxiang Li et.al. | 2408.06741 | link |
2024-08-13 | DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion | Yujia Wu et.al. | 2408.06740 | null |
2024-08-13 | Multimodal Analysis of White Blood Cell Differentiation in Acute Myeloid Leukemia Patients using a β-Variational Autoencoder | Gizem Mert et.al. | 2408.06720 | null |
2024-08-12 | The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery | Chris Lu et.al. | 2408.06292 | link |
2024-08-12 | Open-Source Molecular Processing Pipeline for Generating Molecules | Shreyas V et.al. | 2408.06261 | null |
2024-08-12 | 3D Reconstruction of Protein Structures from Multi-view AFM Images using Neural Radiance Fields (NeRFs) | Jaydeep Rade et.al. | 2408.06244 | null |
2024-08-12 | Cislunar Constellation Design for Space Situational Awareness with Time-Expanded Facility Location Problem | Yuri Shimane et.al. | 2408.06238 | null |
2024-08-12 | Novel View Synthesis from a Single Image with Pretrained Diffusion Guidance | Taewon Kang et.al. | 2408.06157 | null |
2024-08-12 | LipidBERT: A Lipid Language Model Pre-trained on METiS de novo Lipid Library | Tianhao Yu et.al. | 2408.06150 | null |
2024-08-12 | Efficient and Scalable Point Cloud Generation with Sparse Point-Voxel Diffusion Models | Ioannis Romanelis et.al. | 2408.06145 | link |
2024-08-12 | Med42-v2: A Suite of Clinical LLMs | Clément Christophe et.al. | 2408.06142 | null |
2024-08-12 | Five Pitfalls When Assessing Synthetic Medical Images with Reference Metrics | Melanie Dohmen et.al. | 2408.06075 | null |
2024-08-12 | CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Zhuoyi Yang et.al. | 2408.06072 | link |
2024-08-09 | Multi-Garment Customized Model Generation | Yichen Liu et.al. | 2408.05206 | null |
2024-08-09 | TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning | Yujie Feng et.al. | 2408.05200 | link |
2024-08-09 | Cell Morphology-Guided Small Molecule Generation with GFlowNets | Stephen Zhewen Lu et.al. | 2408.05196 | link |
2024-08-09 | Lithography-free patterning of chalcogenide materials for integrated photonic devices | Zhen Hu et.al. | 2408.05099 | null |
2024-08-09 | Social contagion under hybrid interactions | Xincheng Shu et.al. | 2408.05050 | null |
2024-08-09 | Infrared Beam-shaping on Demand via Tailored Geometric Phase Metasurfaces employing the Plasmonic Phase-Change Material In3SbTe2 | Lukas Conrads et.al. | 2408.05044 | null |
2024-08-09 | Collaborative Static-Dynamic Teaching: A Semi-Supervised Framework for Stripe-Like Space Target Detection | Zijian Zhu et.al. | 2408.05029 | null |
2024-08-09 | Retrieval-augmented code completion for local projects using large language models | Marko Hostnik et.al. | 2408.05026 | null |
2024-08-09 | DreamCouple: Exploring High Quality Text-to-3D Generation Via Rectified Flow | Hangyu Li et.al. | 2408.05008 | null |
2024-08-09 | Pay Attention To Mean Fields For Point Cloud Generation | Benno Käch et.al. | 2408.04997 | link |
2024-08-08 | Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics | Ruining Li et.al. | 2408.04631 | null |
2024-08-08 | Transformer Explainer: Interactive Learning of Text-Generative Models | Aeree Cho et.al. | 2408.04619 | null |
2024-08-08 | Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches | Yongzhi Xu et.al. | 2408.04567 | null |
2024-08-08 | Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models | Yupeng Chang et.al. | 2408.04556 | link |
2024-08-08 | On the Asymptotic Convergence of Subgraph Generated Models | Xinchen Xu et.al. | 2408.04541 | null |
2024-08-08 | AExGym: Benchmarks and Environments for Adaptive Experimentation | Jimmy Wang et.al. | 2408.04531 | null |
2024-08-08 | NFDI4Health workflow and service for synthetic data generation, assessment and risk management | Sobhan Moazemi et.al. | 2408.04478 | null |
2024-08-08 | Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations | Julen Urain et.al. | 2408.04380 | null |
2024-08-08 | Making sense of AI systems development | Mateusz Dolata et.al. | 2408.04311 | null |
2024-08-08 | AI-Driven Chatbot for Intrusion Detection in Edge Networks: Enhancing Cybersecurity with Ethical User Consent | Mugheez Asif et.al. | 2408.04281 | null |
2024-08-07 | Prospects for using drones to test formation-flying CubeSat concepts, and other astronomical applications | John D. Monnier et.al. | 2408.03911 | null |
2024-08-07 | Hate Speech Detection and Classification in Amharic Text with Deep Learning | Samuel Minale Gashe et.al. | 2408.03849 | null |
2024-08-07 | WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models | Prannaya Gupta et.al. | 2408.03837 | link |
2024-08-07 | A broken duet: multistable dynamics of dyadic interactions | Johan Medrano et.al. | 2408.03809 | link |
2024-08-07 | Navigating the Human Maze: Real-Time Robot Pathfinding with Generative Imitation Learning | Martin Moder et.al. | 2408.03807 | link |
2024-08-07 | Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model | Guoqing Zhu et.al. | 2408.03748 | link |
2024-08-07 | Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction | Benjamin Matthias Ruppik et.al. | 2408.03706 | null |
2024-08-07 | Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling | Zilyu Ye et.al. | 2408.03695 | link |
2024-08-07 | Unsupervised Detection of Fetal Brain Anomalies using Denoising Diffusion Models | Markus Ditlev Sjøgren Olsen et.al. | 2408.03654 | null |
2024-08-07 | Goal-oriented Semantic Communication for the Metaverse Application | Zhe Wang et.al. | 2408.03646 | null |
2024-08-06 | MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation | Xiaofeng Mao et.al. | 2408.03312 | null |
2024-08-06 | IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts | Ciara Rowles et.al. | 2408.03209 | null |
2024-08-06 | Personalizing Federated Instrument Segmentation with Visual Trait Priors in Robotic Surgery | Jialang Xu et.al. | 2408.03208 | null |
2024-08-06 | An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion | Xingguang Yan et.al. | 2408.03178 | null |
2024-08-06 | Iterative CT Reconstruction via Latent Variable Optimization of Shallow Diffusion Models | Sho Ozaki et.al. | 2408.03156 | null |
2024-08-06 | Enhancing Twitter Bot Detection via Multimodal Invariant Representations | Jibing Gong et.al. | 2408.03096 | null |
2024-08-06 | Analysis of Argument Structure Constructions in a Deep Recurrent Language Model | Pegah Ramezani et.al. | 2408.03062 | null |
2024-08-06 | OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | Qiang Sun et.al. | 2408.03047 | link |
2024-08-06 | Targeted Visual Prompting for Medical Visual Question Answering | Sergio Tascon-Morales et.al. | 2408.03043 | link |
2024-08-06 | Training-Free Condition Video Diffusion Models for single frame Spatial-Semantic Echocardiogram Synthesis | Van Phi Nguyen et.al. | 2408.03035 | link |
2024-08-05 | Command-line Obfuscation Detection using Small Language Models | Vojtech Outrata et.al. | 2408.02637 | null |
2024-08-05 | VidGen-1M: A Large-Scale Dataset for Text-to-video Generation | Zhiyu Tan et.al. | 2408.02629 | null |
2024-08-05 | YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition | Duc Manh Nguyen Dang et.al. | 2408.02623 | link |
2024-08-05 | LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba | Yunxiang Fu et.al. | 2408.02615 | link |
2024-08-05 | MetaParticles: Computationally engineered nanomaterials with tunable and responsive properties | Massimiliano Paesani et.al. | 2408.02564 | null |
2024-08-05 | Fairness and Bias Mitigation in Computer Vision: A Survey | Sepehr Dehdashtian et.al. | 2408.02464 | null |
2024-08-05 | TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments | Daeun Song et.al. | 2408.02454 | null |
2024-08-05 | Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models | Zi Liang et.al. | 2408.02416 | link |
2024-08-05 | Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models | Tongtong Feng et.al. | 2408.02408 | null |
2024-08-05 | A Few-Shot Approach for Relation Extraction Domain Adaptation using Large Language Models | Vanni Zavarella et.al. | 2408.02377 | null |
2024-08-02 | Conditional LoRA Parameter Generation | Xiaolong Jin et.al. | 2408.01415 | null |
2024-08-02 | Autoencoders in Function Space | Justin Bunker et.al. | 2408.01362 | link |
2024-08-02 | MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code | Kaiwen Ning et.al. | 2408.01354 | link |
2024-08-02 | TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling | Dong Huo et.al. | 2408.01291 | null |
2024-08-02 | A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness | Lutao Jiang et.al. | 2408.01269 | null |
2024-08-02 | Exchange control in a MOS double quantum dot made using a 300 mm wafer process | Jacob F. Chittock-Wood et.al. | 2408.01241 | null |
2024-08-02 | CLIP4Sketch: Enhancing Sketch to Mugshot Matching through Dataset Augmentation using Diffusion Models | Kushal Kumar Jain et.al. | 2408.01233 | null |
2024-08-02 | Reality Fusion: Robust Real-time Immersive Mobile Robot Teleoperation with Volumetric Visual Data Fusion | Ke Li et.al. | 2408.01225 | link |
2024-08-02 | PSP-GEN: Stochastic inversion of the Process-Structure-Property chain in materials design through deep, generative probabilistic modeling | Yaohua Zang et.al. | 2408.01114 | null |
2024-08-02 | Six Dragons Fly Again: Reviving 15th-Century Korean Court Music with Transformers and Novel Encoding | Danbinaerin Han et.al. | 2408.01096 | link |
2024-08-01 | Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation | Yixiao Wang et.al. | 2408.00766 | null |
2024-08-01 | Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention | Susung Hong et.al. | 2408.00760 | link |
2024-08-01 | DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency | Jovan Stojkovic et.al. | 2408.00741 | null |
2024-08-01 | TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models | Gilad Deutch et.al. | 2408.00735 | null |
2024-08-01 | A Natural Language Processing Framework for Hotel Recommendation Based on Users' Text Reviews | Lavrentia Aravani et.al. | 2408.00716 | null |
2024-08-02 | Reinforcement Learning applied to Insurance Portfolio Pursuit | Edward James Young et.al. | 2408.00713 | link |
2024-08-01 | MotionFix: Text-Driven 3D Human Motion Editing | Nikos Athanasiou et.al. | 2408.00712 | null |
2024-08-01 | Synthetic dual image generation for reduction of labeling efforts in semantic segmentation of micrographs with a customized metric function | Matias Oscar Volman Stern et.al. | 2408.00707 | null |
2024-08-01 | AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models | Daqin Luo et.al. | 2408.00665 | link |
2024-08-01 | Privacy-preserving datasets by capturing feature distributions with Conditional VAEs | Francesco Di Salvo et.al. | 2408.00639 | link |
2024-07-31 | Detecting, Explaining, and Mitigating Memorization in Diffusion Models | Yuxin Wen et.al. | 2407.21720 | link |
2024-07-31 | Tora: Trajectory-oriented Diffusion Transformer for Video Generation | Zhenghao Zhang et.al. | 2407.21705 | link |
2024-07-31 | Generative Diffusion Model for Seismic Imaging Improvement of Sparsely Acquired Data and Uncertainty Quantification | Xingchen Shi et.al. | 2407.21683 | null |
2024-07-31 | Quality Control for Radiology Report Generation Models via Auxiliary Auditing Components | Hermione Warr et.al. | 2407.21638 | null |
2024-07-31 | LLM-for-X: Application-agnostic Integration of Large Language Models to Support Personal Writing Workflows | Lukas Teufelberger et.al. | 2407.21593 | null |
2024-07-31 | Long-term investment and energy procurement risk management under uncertainty for an electrolytic green hydrogen producer | Owen Palmer et.al. | 2407.21574 | null |
2024-07-31 | Conditioned Prompt-Optimization for Continual Deepfake Detection | Francesco Laiti et.al. | 2407.21554 | link |
2024-07-31 | CXSimulator: A User Behavior Simulation using LLM Embeddings for Web-Marketing Campaign Assessment | Akira Kasuga et.al. | 2407.21553 | null |
2024-07-31 | Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation | Junxuan Yu et.al. | 2407.21490 | null |
2024-07-31 | Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends | Giuliano Martinelli et.al. | 2407.21489 | link |
2024-07-30 | Matting by Generation | Zhixiang Wang et.al. | 2407.21017 | null |
2024-07-30 | Add-SD: Rational Generation without Manual Reference | Lingfeng Yang et.al. | 2407.21016 | link |
2024-07-30 | Integrating Agent-Based and Compartmental Models for Infectious Disease Modeling: A Novel Hybrid Approach | Inan Bostanci et.al. | 2407.20993 | null |
2024-07-30 | MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions | Xiaowei Chi et.al. | 2407.20962 | link |
2024-07-30 | Mitigating calibration errors from mutual coupling with time-domain filtering of 21 cm cosmological radio observations | N. Charles et.al. | 2407.20923 | null |
2024-07-30 | Impact of Geographical Separation on Spectrum Sharing Markets | Kangle Mu et.al. | 2407.20909 | null |
2024-07-30 | Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | Yanpeng Zhao et.al. | 2407.20908 | link |
2024-07-30 | Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks | Yunfeng Diao et.al. | 2407.20836 | null |
2024-07-30 | Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning | Norman Di Palo et.al. | 2407.20798 | null |
2024-07-30 | SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models | Zheng Liu et.al. | 2407.20756 | link |
2024-07-29 | Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing | Ekaterina Iakovleva et.al. | 2407.20232 | null |
2024-07-29 | LatentArtiFusion: An Effective and Efficient Histological Artifacts Restoration Framework | Zhenqi He et.al. | 2407.20172 | link |
2024-07-29 | Diffusion Feedback Helps CLIP See Better | Wenxuan Wang et.al. | 2407.20171 | link |
2024-07-29 | DDAP: Dual-Domain Anti-Personalization against Text-to-Image Diffusion Models | Jing Yang et.al. | 2407.20141 | null |
2024-07-29 | Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning | Liyuan Mao et.al. | 2407.20109 | null |
2024-07-29 | On the significance of parameters and the projective level in the Choice and Collection axioms | Vladimir Kanovei et.al. | 2407.20098 | null |
2024-07-29 | Generative Diffusion Model Bootstraps Zero-shot Classification of Fetal Ultrasound Images In Underrepresented African Populations | Fangyijie Wang et.al. | 2407.20072 | link |
2024-07-29 | ImagiNet: A Multi-Content Dataset for Generalizable Synthetic Image Detection via Contrastive Learning | Delyan Boychev et.al. | 2407.20020 | link |
2024-07-29 | Reproducibility Study of "ITI-GEN: Inclusive Text-to-Image Generation" | Daniel Gallo Fernández et.al. | 2407.19996 | link |
2024-07-29 | HeadsetOff: Enabling Photorealistic Video Conferencing on Economical VR Headsets | Yili Jin et.al. | 2407.19988 | null |
2024-07-26 | Generative Adversarial Networks for Imputing Sparse Learning Performance | Liang Zhang et.al. | 2407.18875 | null |
2024-07-26 | Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Yuze Zheng et.al. | 2407.18854 | null |
2024-07-26 | Scalable Group Choreography via Variational Phase Manifold Learning | Nhat Le et.al. | 2407.18839 | null |
2024-07-26 | Revision of calcium and scandium abundances in Am stars based on NLTE calculations and comparison with diffusion stellar evolution models | L. I. Mashonkina et.al. | 2407.18736 | null |
2024-07-26 | BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation | Peng Hao et.al. | 2407.18715 | null |
2024-07-26 | Q-gen: A Parameterized Quantum Circuit Generator | Yikai Mao et.al. | 2407.18697 | link |
2024-07-26 | Adversarial Robustification via Text-to-Image Diffusion Models | Daewon Choi et.al. | 2407.18658 | link |
2024-07-26 | Robust VAEs via Generating Process of Noise Augmented Data | Hiroo Irobe et.al. | 2407.18632 | null |
2024-07-26 | Denoising Lévy Probabilistic Models | Dario Shariatian et.al. | 2407.18609 | link |
2024-07-26 | How To Segment in 3D Using 2D Models: Automated 3D Segmentation of Prostate Cancer Metastatic Lesions on PET Volumes Using Multi-Angle Maximum Intensity Projections and Diffusion Models | Amirhosein Toosi et.al. | 2407.18555 | link |
2024-07-25 | RegionDrag: Fast Region-Based Image Editing with Diffusion Models | Jingyi Lu et.al. | 2407.18247 | null |
2024-07-25 | VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads | Orest Kupyn et.al. | 2407.18245 | link |
2024-07-25 | CodedVO: Coded Visual Odometry | Sachin Shah et.al. | 2407.18240 | null |
2024-07-25 | SuperFlow: A Fully-Customized RTL-to-GDS Design Automation Flow for Adiabatic Quantum-Flux-Parametron Superconducting Circuits | Yanyue Xie et.al. | 2407.18209 | null |
2024-07-25 | Test2VA: Reusing GUI Test Cases for Voice Assistant Features Development in Mobile Applications | Garrett Weaver et.al. | 2407.18155 | null |
2024-07-25 | Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images | Roberto Di Via et.al. | 2407.18125 | null |
2024-07-25 | Keypoint Promptable Re-Identification | Vladimir Somers et.al. | 2407.18112 | link |
2024-07-25 | SSTD: Stripe-Like Space Target Detection using Single-Point Supervision | Zijian Zhu et.al. | 2407.18097 | null |
2024-07-25 | Cross-Observatory Coordination with tilepy: A Novel Tool for Observations of Multi-Messenger Transient Events | Monica Seglar-Arroyo et.al. | 2407.18076 | null |
2024-07-25 | AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild | Junho Park et.al. | 2407.18034 | link |
2024-07-24 | SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency | Yiming Xie et.al. | 2407.17470 | null |
2024-07-24 | BlueTempNet: A Temporal Multi-network Dataset of Social Interactions in Bluesky Social | Ujun Jeong et.al. | 2407.17451 | link |
2024-07-24 | ProvenanceWidgets: A Library of UI Control Elements to Track and Dynamically Overlay Analytic Provenance | Arpit Narechania et.al. | 2407.17431 | link |
2024-07-24 | CDDIP: Constrained Diffusion-Driven Deep Image Prior for Seismic Image Reconstruction | Paul Goyes-Peñafiel et.al. | 2407.17402 | link |
2024-07-24 | Cosmic ray susceptibility of the Terahertz Intensity Mapper detector arrays | Lun-Jun Liu et.al. | 2407.17381 | null |
2024-07-24 | ViPer: Visual Personalization of Generative Models via Individual Preference Learning | Sogand Salehi et.al. | 2407.17365 | null |
2024-07-24 | Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching | Yuyang Ding et.al. | 2407.17349 | link |
2024-07-24 | Quantum nonlocal modulation cancellation with distributed clocks | Stephen D. Chapman et.al. | 2407.17330 | null |
2024-07-25 | Enhanced Deep Learning Methodologies and MRI Selection Techniques for Dementia Diagnosis in the Elderly Population | Nikolaos Ntampakis et.al. | 2407.17324 | null |
2024-07-24 | Edge-Cloud Continuum Orchestration of Critical Services: A Smart-City Approach | Rodrigo Rosmaninho et.al. | 2407.17314 | null |
2024-07-23 | Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions | Fabio Tosi et.al. | 2407.16698 | link |
2024-07-23 | From Imitation to Refinement -- Residual RL for Precise Visual Assembly | Lars Ankile et.al. | 2407.16677 | null |
2024-07-23 | RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent | Huiyu Xu et.al. | 2407.16667 | null |
2024-07-23 | MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence | Canyu Zhao et.al. | 2407.16655 | null |
2024-07-23 | Unveiling and Mitigating Bias in Audio Visual Segmentation | Peiwen Sun et.al. | 2407.16638 | null |
2024-07-23 | Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses | Haojun Yu et.al. | 2407.16634 | null |
2024-07-23 | GenRec: A Flexible Data Generator for Recommendations | Erica Coppolillo et.al. | 2407.16594 | null |
2024-07-23 | COALA: A Practical and Vision-Centric Federated Learning Platform | Weiming Zhuang et.al. | 2407.16560 | link |
2024-07-23 | DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models | Zhenyu Xie et.al. | 2407.16511 | null |
2024-07-23 | qMRI Diffusor: Quantitative T1 Mapping of the Brain using a Denoising Diffusion Probabilistic Model | Shishuai Wang et.al. | 2407.16477 | null |
2024-07-22 | Artist: Aesthetically Controllable Text-Driven Stylization without Training | Ruixiang Jiang et.al. | 2407.15842 | link |
2024-07-23 | A Large-scale Benchmark Dataset for Commuting Origin-destination Matrix Generation | Can Rong et.al. | 2407.15823 | link |
2024-07-22 | Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget | Vikash Sehwag et.al. | 2407.15811 | null |
2024-07-22 | Quantum Computing for Phonon Scattering Effects on Thermal Conductivity | Xiangjun Tan et.al. | 2407.15808 | null |
2024-07-22 | Enhancing Mass Customization Manufacturing: Multiobjective Metaheuristic Algorithms for flow shop Production in Smart Industry | Diego Rossit et.al. | 2407.15802 | null |
2024-07-22 | Diffusion Model Based Resource Allocation Strategy in Ultra-Reliable Wireless Networked Control Systems | Amirhassan Babazadeh Darabi et.al. | 2407.15784 | null |
2024-07-22 | A Hamilton-Jacobi approach to road-field reaction-diffusion models | Christopher Henderson et.al. | 2407.15760 | null |
2024-07-22 | Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond | Silvio Galesso et.al. | 2407.15739 | link |
2024-07-22 | DStruct2Design: Data and Benchmarks for Data Structure Driven Generative Floor Plan Design | Zhi Hao Luo et.al. | 2407.15723 | link |
2024-07-22 | Estimating Probability Densities with Transformer and Denoising Diffusion | Henry W. Leung et.al. | 2407.15703 | link |
2024-07-19 | DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks | Sarah Jabbour et.al. | 2407.14509 | null |
2024-07-19 | On Pre-training of Multimodal Language Models Customized for Chart Understanding | Wan-Cyuan Fan et.al. | 2407.14506 | null |
2024-07-19 | T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Kaiyue Sun et.al. | 2407.14505 | link |
2024-07-19 | M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models | Seunggeun Chi et.al. | 2407.14502 | null |
2024-07-19 | A Precision Cryogenic Positioning Stage for Detector Dithering and Flexure Compensation | Stephen A. Smee et.al. | 2407.14493 | null |
2024-07-19 | Contrastive Learning with Counterfactual Explanations for Radiology Report Generation | Mingjie Li et.al. | 2407.14474 | null |
2024-07-19 | Describe Data to get Science-Data-Ready Tooling: Awkward as a Target for Kaitai Struct YAML | Manasvi Goyal et.al. | 2407.14461 | null |
2024-07-19 | Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model | Seonghui Min et.al. | 2407.14434 | null |
2024-07-19 | Controllable and Efficient Multi-Class Pathology Nuclei Data Augmentation using Text-Conditioned Diffusion Models | Hyun-Jic Oh et.al. | 2407.14426 | null |
2024-07-19 | GLAudio Listens to the Sound of the Graph | Aurelio Sulser et.al. | 2407.14387 | link |
2024-07-18 | LogoSticker: Inserting Logos into Diffusion Models for Customized Generation | Mingkang Zhu et.al. | 2407.13752 | null |
2024-07-18 | Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review | Masatoshi Uehara et.al. | 2407.13734 | link |
2024-07-18 | Shaded Route Planning Using Active Segmentation and Identification of Satellite Images | Longchao Da et.al. | 2407.13689 | null |
2024-07-18 | PASTA: Controllable Part-Aware Shape Generation with Autoregressive Transformers | Songlin Li et.al. | 2407.13677 | link |
2024-07-18 | MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis | Ziming Zhong et.al. | 2407.13675 | link |
2024-07-18 | Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models | Xiaoyu Zhu et.al. | 2407.13642 | null |
2024-07-18 | Training-free Composite Scene Generation for Layout-to-Image Synthesis | Jiaqi Liu et.al. | 2407.13609 | link |
2024-07-18 | EnergyDiff: Universal Time-Series Energy Data Generation using Diffusion Models | Nan Lin et.al. | 2407.13538 | null |
2024-07-18 | VeriQR: A Robustness Verification Tool for Quantum Machine Learning Models | Yanling Lin et.al. | 2407.13533 | null |
2024-07-18 | All Roads Lead to Rome? Exploring Representational Similarities Between Latent Spaces of Generative Image Models | Charumathi Badrinath et.al. | 2407.13449 | link |
2024-07-17 | SMooDi: Stylized Motion Diffusion Model | Lei Zhong et.al. | 2407.12783 | null |
2024-07-17 | VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control | Sherwin Bahmani et.al. | 2407.12781 | null |
2024-07-17 | Hallucination Index: An Image Quality Metric for Generative Reconstruction Models | Matthew Tivnan et.al. | 2407.12780 | null |
2024-07-17 | GroundUp: Rapid Sketch-Based 3D City Massing | Gizem Esra Unlu et.al. | 2407.12739 | null |
2024-07-17 | EchoSight: Advancing Visual-Language Models with Wiki Knowledge | Yibin Yan et.al. | 2407.12735 | null |
2024-07-17 | NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model | Zhongqun Zhang et.al. | 2407.12727 | null |
2024-07-17 | An Evaluation of Continual Learning for Advanced Node Semiconductor Defect Inspection | Amit Prasad et.al. | 2407.12724 | null |
2024-07-17 | Unlocking planetesimal magnetic field histories: a refined, versatile model for thermal evolution and dynamo generation | Hannah R. Sanderson et.al. | 2407.12721 | null |
2024-07-17 | SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow | Yuanzhi Zhu et.al. | 2407.12718 | link |
2024-07-17 | Teleoperation in Robot-assisted MIS with Adaptive RCM via Admittance Control | Ehsan Nasiri et.al. | 2407.12711 | null |
2024-07-16 | Efficient Training with Denoised Neural Weights | Yifan Gong et.al. | 2407.11966 | null |
2024-07-16 | UrbanWorld: An Urban World Model for 3D City Generation | Yu Shang et.al. | 2407.11965 | link |
2024-07-16 | Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design | Leo Klarner et.al. | 2407.11942 | link |
2024-07-16 | Code Documentation and Analysis to Secure Software Development | Paul Attie et.al. | 2407.11934 | null |
2024-07-16 | Global Optimisation of Black-Box Functions with Generative Models in the Wasserstein Space | Tigran Ramazyan et.al. | 2407.11917 | link |
2024-07-16 | Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data | Tim Elsner et.al. | 2407.11913 | null |
2024-07-16 | Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development | Daoyuan Chen et.al. | 2407.11784 | link |
2024-07-16 | Diffusion-driven self-assembly of emerin nanodomains at the nuclear envelope | Carlos D. Alas et.al. | 2407.11758 | null |
2024-07-16 | Generating Multi-Modal and Multi-Attribute Single-Cell Counts with CFGen | Alessandro Palma et.al. | 2407.11734 | link |
2024-07-16 | Theoretical Insights into CycleGAN: Analyzing Approximation and Estimation Errors in Unpaired Data Generation | Luwei Sun et.al. | 2407.11678 | null |
2024-07-15 | Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion | Yongyuan Liang et.al. | 2407.10973 | null |
2024-07-15 | Fast Matrix Multiplications for Lookup Table-Quantized LLMs | Han Guo et.al. | 2407.10960 | link |
2024-07-15 | InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models | Nirat Saini et.al. | 2407.10958 | null |
2024-07-16 | DataDream: Few-shot Guided Dataset Generation | Jae Myung Kim et.al. | 2407.10910 | link |
2024-07-15 | Optical Diffusion Models for Image Generation | Ilker Oguz et.al. | 2407.10897 | null |
2024-07-15 | R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection | Zheyuan Zhou et.al. | 2407.10862 | null |
2024-07-15 | Physics-Inspired Generative Models in Medical Imaging: A Review | Dennis Hein et.al. | 2407.10856 | null |
2024-07-15 | Inferring dark energy properties from the scale factor parametrisation | Upala Mukhopadhayay et.al. | 2407.10845 | null |
2024-07-15 | MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration | Yulin Ren et.al. | 2407.10833 | null |
2024-07-15 | Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation | Tu Vu et.al. | 2407.10817 | null |
2024-07-12 | StyleSplat: 3D Object Style Transfer with Gaussian Splatting | Sahil Jain et.al. | 2407.09473 | null |
2024-07-12 | FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3 | Georgios Makridis et.al. | 2407.09467 | null |
2024-07-12 | The | Matteo Belenchia et.al. | 2407.09441 | null |
2024-07-12 | Graph Neural Network Causal Explanation via Neural Causal Models | Arman Behnam et.al. | 2407.09378 | link |
2024-07-12 | Computationally Efficient Estimation of Large Probit Models | Patrick Ding et.al. | 2407.09371 | null |
2024-07-12 | Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text | Lucio La Cava et.al. | 2407.09364 | null |
2024-07-15 | Any-Property-Conditional Molecule Generation with Self-Criticism using Spanning Trees | Alexia Jolicoeur-Martineau et.al. | 2407.09357 | link |
2024-07-12 | PID: Physics-Informed Diffusion Model for Infrared Image Generation | Fangyuan Mao et.al. | 2407.09299 | link |
2024-07-12 | Learning Distances from Data with Normalizing Flows and Score Matching | Peter Sorrenson et.al. | 2407.09297 | null |
2024-07-12 | Surgical Text-to-Image Generation | Chinedu Innocent Nwoye et.al. | 2407.09230 | null |
2024-07-11 | Video Diffusion Alignment via Reward Gradients | Mihir Prabhudesai et.al. | 2407.08737 | link |
2024-07-11 | Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models | Zhening Xing et.al. | 2407.08701 | null |
2024-07-11 | FAR-Trans: An Investment Dataset for Financial Asset Recommendation | Javier Sanz-Cruzado et.al. | 2407.08692 | null |
2024-07-11 | Scattering transforms on the sphere, application to large scale structure modelling | Louise Mousset et.al. | 2407.08687 | null |
2024-07-11 | CAD-Prompted Generative Models: A Pathway to Feasible and Novel Engineering Designs | Leah Chong et.al. | 2407.08675 | null |
2024-07-11 | Still-Moving: Customized Video Generation without Customized Video Data | Hila Chefer et.al. | 2407.08674 | null |
2024-07-11 | Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density | Shuangqi Li et.al. | 2407.08659 | null |
2024-07-11 | Adaptive Smooth Non-Stationary Bandits | Joe Suk et.al. | 2407.08654 | null |
2024-07-11 | Fine-Tuning Stable Diffusion XL for Stylistic Icon Generation: A Comparison of Caption Size | Youssef Sultan et.al. | 2407.08513 | null |
2024-07-11 | Latent Conditional Diffusion-based Data Augmentation for Continuous-Time Dynamic Graph Model | Yuxing Tian et.al. | 2407.08500 | null |
2024-07-10 | Generative Image as Action Models | Mohit Shridhar et.al. | 2407.07875 | link |
2024-07-10 | Dynamical Measure Transport and Neural PDE Solvers for Sampling | Jingtong Sun et.al. | 2407.07873 | null |
2024-07-10 | Controlling Space and Time with Diffusion Models | Daniel Watson et.al. | 2407.07860 | null |
2024-07-10 | Generic Numerical Analysis of Stochastic Reaction Diffusion Model with applications in excitable media | Yahya Alnashri et.al. | 2407.07834 | null |
2024-07-10 | Universal and non-universal signatures in the scaling functions of critical variables | Gianluca Teza et.al. | 2407.07782 | null |
2024-07-10 | Towards Human-Like Driving: Active Inference in Autonomous Vehicle Control | Elahe Delavari et.al. | 2407.07684 | null |
2024-07-10 | VEnhancer: Generative Space-Time Enhancement for Video Generation | Jingwen He et.al. | 2407.07667 | null |
2024-07-10 | A Coding-Theoretic Analysis of Hyperspherical Prototypical Learning Geometry | Martin Lindström et.al. | 2407.07664 | link |
2024-07-10 | The heterogeneous impact of the EU-Canada agreement with causal machine | Lionel Fontagné et.al. | 2407.07652 | null |
2024-07-11 | MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis | Wanggui He et.al. | 2407.07614 | link |
2024-07-09 | ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction | Shaozhe Hao et.al. | 2407.07077 | link |
2024-07-09 | Latent Space Imaging | Matheus Souza et.al. | 2407.07052 | null |
2024-07-09 | Generative models of astrophysical fields with scattering transforms on the sphere | Louise Mousset et.al. | 2407.07007 | link |
2024-07-10 | PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods | Yiying Wang et.al. | 2407.06985 | link |
2024-07-09 | Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach | Taolin Zhang et.al. | 2407.06964 | null |
2024-07-09 | RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models | Bowen Zhang et.al. | 2407.06938 | null |
2024-07-09 | HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance | Guian Fang et.al. | 2407.06937 | link |
2024-07-09 | Fine-grained large-scale content recommendations for MSX sellers | Manpreet Singh et.al. | 2407.06910 | null |
2024-07-09 | Enhanced Battery Degradation-Aware Scheduling for Distribution Network with Electric Vehicle Load | Vijay Babu Pamshetti et.al. | 2407.06857 | null |
2024-07-09 | A reaction-diffusion model for relapsing-remitting multiple sclerosis with a treatment term | Romina Travaglini et.al. | 2407.06802 | null |
2024-07-08 | Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images | Zhangyang Qi et.al. | 2407.06191 | null |
2024-07-08 | CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation | Xinying Guo et.al. | 2407.06188 | null |
2024-07-08 | JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation | Yu Zeng et.al. | 2407.06187 | null |
2024-07-08 | The Tug-of-War Between Deepfake Generation and Detection | Hannah Lee et.al. | 2407.06174 | null |
2024-07-08 | ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation | Ethan Chern et.al. | 2407.06135 | link |
2024-07-08 | Structured Generations: Using Hierarchical Clusters to guide Diffusion Models | Jorge da Silva Goncalves et.al. | 2407.06124 | link |
2024-07-08 | PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models | Jinhua Zhang et.al. | 2407.06109 | link |
2024-07-08 | Accelerating Diffusion for SAR-to-Optical Image Translation via Adversarial Consistency Distillation | Xinyu Bai et.al. | 2407.06095 | null |
2024-07-08 | Assessing Cardiomegaly in Dogs Using a Simple CNN Model | Nikhil Deekonda et.al. | 2407.06092 | null |
2024-07-08 | Layered Diffusion Model for One-Shot High Resolution Text-to-Image Synthesis | Emaad Khwaja et.al. | 2407.06079 | null |
2024-07-05 | RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation | Yuxuan Kuang et.al. | 2407.04689 | link |
2024-07-05 | Thermal and mechanical study of a parametrised cryostat model for optical characterisation of upcoming CMB experiments | Thomas J. L. J. Gascard et.al. | 2407.04613 | link |
2024-07-08 | PartCraft: Crafting Creative Objects by Parts | Kam Woh Ng et.al. | 2407.04604 | link |
2024-07-05 | Structural Constraint Integration in Generative Model for Discovery of Quantum Material Candidates | Ryotaro Okabe et.al. | 2407.04557 | null |
2024-07-05 | Unified continuous-time q-learning for mean-field game and mean-field control problems | Xiaoli Wei et.al. | 2407.04521 | null |
2024-07-08 | Speed-accuracy trade-off for the diffusion models: Wisdom from nonequilibrium thermodynamics and optimal transport | Kotaro Ikeda et.al. | 2407.04495 | null |
2024-07-05 | PROUD: PaRetO-gUided Diffusion Model for Multi-objective Generation | Yinghua Yao et.al. | 2407.04493 | link |
2024-07-05 | Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model | Duy M. H. Nguyen et.al. | 2407.04489 | null |
2024-07-05 | Leveraging Graph Structures to Detect Hallucinations in Large Language Models | Noa Nonkes et.al. | 2407.04485 | link |
2024-07-05 | VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing | Shang Liu et.al. | 2407.04461 | null |
2024-07-03 | DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents | Yilun Xu et.al. | 2407.03300 | link |
2024-07-03 | Improved Noise Schedule for Diffusion Training | Tiankai Hang et.al. | 2407.03297 | null |
2024-07-03 | Anomaly-based Framework for Detecting Power Overloading Cyberattacks in Smart Grid AMI | Abdelaziz Amara Korba et.al. | 2407.03264 | null |
2024-07-03 | SOS! Soft Prompt Attack Against Open-Source Large Language Models | Ziqing Yang et.al. | 2407.03160 | null |
2024-07-04 | Spatio-Temporal Adaptive Diffusion Models for EEG Super-Resolution in Epilepsy Diagnosis | Tong Zhou et.al. | 2407.03089 | null |
2024-07-03 | Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce Scenarios | Patricia A. Apellániz et.al. | 2407.03080 | link |
2024-07-03 | Electromagnetic Property Sensing Based on Diffusion Model in ISAC System | Yuhua Jiang et.al. | 2407.03075 | null |
2024-07-03 | Semantic-Aware Power Allocation for Generative Semantic Communications with Foundation Models | Chunmei Xu et.al. | 2407.03050 | null |
2024-07-03 | SlerpFace: Face Template Protection via Spherical Linear Interpolation | Zhizhou Zhong et.al. | 2407.03043 | null |
2024-07-03 | An Organism Starts with a Single Pix-Cell: A Neural Cellular Diffusion for High-Resolution Image Synthesis | Marawan Elbatel et.al. | 2407.03018 | link |
2024-07-02 | Magic Insert: Style-Aware Drag-and-Drop | Nataniel Ruiz et.al. | 2407.02489 | null |
2024-07-02 | Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models | Fei Shen et.al. | 2407.02482 | link |
2024-07-02 | A Pattern Language for Machine Learning Tasks | Benjamin Rodatz et.al. | 2407.02424 | null |
2024-07-02 | GCF: Graph Convolutional Networks for Facial Expression Recognition | Hozaifa Kassab et.al. | 2407.02361 | null |
2024-07-02 | MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space | Yihong Tang et.al. | 2407.02345 | null |
2024-07-02 | Choice-based time slot management in attended home delivery | Dorsa Abdolhamidi et.al. | 2407.02339 | null |
2024-07-02 | Mining Constraints from Reference Process Models for Detecting Best-Practice Violations in Event Log | Adrian Rebmann et.al. | 2407.02336 | link |
2024-07-02 | A tactical time slot management problem under mixed logit demand | Dorsa Abdolhamidi et.al. | 2407.02308 | null |
2024-07-02 | Renard: A Modular Pipeline for Extracting Character Networks from Narrative Texts | Arthur Amalvy et.al. | 2407.02284 | link |
2024-07-03 | Federated Distillation for Medical Image Classification: Towards Trustworthy Computer-Aided Diagnosis | Sufen Ren et.al. | 2407.02261 | null |
2024-06-28 | Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language | Yicheng Chen et.al. | 2406.20085 | null |
2024-06-28 | The hybrid Josephson rhombus: A superconducting element with tailored current-phase relation | L. Banszerus et.al. | 2406.20082 | null |
2024-06-28 | HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Model | Hieu T. Nguyen et.al. | 2406.20077 | null |
2024-06-28 | Modeling and LQR Control of Insect Sized Flapping Wing Robot | Daksh Dhingra et.al. | 2406.20061 | null |
2024-06-28 | Neural Differentiable Modeling with Diffusion-Based Super-resolution for Two-Dimensional Spatiotemporal Turbulence | Xiantao Fan et.al. | 2406.20047 | null |
2024-06-28 | Electrostatics-based particle sampling and approximate inference | Yongchao Huang et.al. | 2406.20044 | link |
2024-06-28 | HAITCH: A Framework for Distortion and Motion Correction in Fetal Multi-Shell Diffusion-Weighted MRI | Haykel Snoussi et.al. | 2406.20042 | null |
2024-06-28 | Concept Lens: Visually Analyzing the Consistency of Semantic Manipulation in GANs | Sangwon Jeong et.al. | 2406.19987 | null |
2024-07-01 | Text2Robot: Evolutionary Robot Design from Text Descriptions | Ryan P. Ringel et.al. | 2406.19963 | link |
2024-06-28 | Kolmogorov-Smirnov GAN | Maciej Falkiewicz et.al. | 2406.19948 | link |
2024-06-27 | Looking 3D: Anomaly Detection with 2D-3D Alignment | Ankan Bhunia et.al. | 2406.19393 | link |
2024-06-27 | Taming Data and Transformers for Audio Generation | Moayed Haji-Ali et.al. | 2406.19388 | null |
2024-06-27 | Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space | Core Francisco Park et.al. | 2406.19370 | link |
2024-06-27 | Accelerating Multiphase Flow Simulations with Denoising Diffusion Model Driven Initializations | Jaehong Chung et.al. | 2406.19333 | null |
2024-06-27 | Subtractive Training for Music Stem Insertion using Latent Diffusion Models | Ivan Villa-Renteria et.al. | 2406.19328 | null |
2024-06-27 | Efficient World Models with Context-Aware Tokenization | Vincent Micheli et.al. | 2406.19320 | link |
2024-06-27 | PNeRV: A Polynomial Neural Representation for Videos | Sonam Gupta et.al. | 2406.19299 | null |
2024-06-27 | Compositional Image Decomposition with Diffusion Models | Jocelin Su et.al. | 2406.19298 | null |
2024-06-27 | BISeizuRe: BERT-Inspired Seizure Data Representation to Improve Epilepsy Monitoring | Luca Benfenati et.al. | 2406.19189 | null |
2024-06-27 | On Pólya-Young urn models and growth processes | Markus Kuba et.al. | 2406.19110 | null |
2024-06-26 | MatchTime: Towards Automatic Soccer Game Commentary Generation | Jiayuan Rao et.al. | 2406.18530 | link |
2024-06-26 | MultiDiff: Consistent Novel View Synthesis from a Single Image | Norman Müller et.al. | 2406.18524 | null |
2024-06-26 | Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration | Kang Liao et.al. | 2406.18516 | link |
2024-06-26 | DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance | Younghyun Kim et.al. | 2406.18459 | link |
2024-06-26 | Cascading Large Language Models for Salient Event Graph Generation | Xingwei Tan et.al. | 2406.18449 | link |
2024-06-26 | Repeat and Concatenate: 2D to 3D Image Translation with 3D to 3D Generative Modeling | Abril Corona-Figueroa et.al. | 2406.18422 | link |
2024-06-26 | Towards diffusion models for large-scale sea-ice modelling | Tobias Sebastian Finn et.al. | 2406.18417 | null |
2024-06-27 | Stable Diffusion Segmentation for Biomedical Images with Single-step Reverse Process | Tianyu Lin et.al. | 2406.18361 | link |
2024-06-26 | Molecular Diffusion Models with Virtual Receptors | Matan Halfon et.al. | 2406.18330 | null |
2024-06-27 | Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems | Italo Luis da Silva et.al. | 2406.18245 | link |
2024-06-25 | DiffusionPDE: Generative PDE-Solving Under Partial Observation | Jiahe Huang et.al. | 2406.17763 | link |
2024-06-25 | MotionBooth: Motion-Aware Customized Text-to-Video Generation | Jianzong Wu et.al. | 2406.17758 | null |
2024-06-25 | Accelerating Clinical Evidence Synthesis with Large Language Models | Zifeng Wang et.al. | 2406.17755 | null |
2024-06-25 | Extensions of Panjer's recursion for mixed compound distributions | Spyridon M. Tzaninis et.al. | 2406.17726 | null |
2024-06-25 | PANDA: A self-driving lab for studying electrodeposited polymer films | Harley Quinn et.al. | 2406.17725 | null |
2024-06-25 | Unified Auto-Encoding with Masked Diffusion | Philippe Hansen-Estruch et.al. | 2406.17688 | link |
2024-06-25 | LaTable: Towards Large Tabular Models | Boris van Breugel et.al. | 2406.17673 | null |
2024-06-26 | SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond | Marco Comunità et.al. | 2406.17672 | null |
2024-06-25 | Banishing LLM Hallucinations Requires Rethinking Generalization | Johnny Li et.al. | 2406.17642 | null |
2024-06-25 | The experience of humans' and robots' mutual (im)politeness in enacted service scenarios: An empirical study | Victor Kaptelinin et.al. | 2406.17641 | null |
2024-06-24 | FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models | Haonan Qiu et.al. | 2406.16863 | link |
2024-06-24 | Dreamitate: Real-World Visuomotor Policy Learning via Video Generation | Junbang Liang et.al. | 2406.16862 | null |
2024-06-24 | DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation | Yuang Peng et.al. | 2406.16855 | link |
2024-06-24 | USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long | Mounika Marreddy et.al. | 2406.16833 | null |
2024-06-24 | General Binding Affinity Guidance for Diffusion Models in Structure-Based Drug Design | Yue Jian et.al. | 2406.16821 | null |
2024-06-24 | ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians | Yufei Liu et.al. | 2406.16815 | null |
2024-06-24 | Conformal time series decomposition with component-wise exchangeability | Derck W. E. Prinzhorn et.al. | 2406.16766 | link |
2024-06-24 | Inferring stochastic low-rank recurrent neural networks from neural data | Matthijs Pals et.al. | 2406.16749 | link |
2024-06-24 | Portrait3D: 3D Head Generation from Single In-the-wild Portrait Image | Jinkun Hao et.al. | 2406.16710 | null |
2024-06-24 | Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling | Min-Seop Kwak et.al. | 2406.16695 | null |
2024-06-21 | Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild | Nadav Orzech et.al. | 2406.15331 | null |
2024-06-21 | Rethinking Remote Sensing Change Detection With A Mask View | Xiaowen Ma et.al. | 2406.15320 | link |
2024-06-21 | You Only Acquire Sparse-channel (YOAS): A Unified Framework for Dense-channel EEG Generation | Hongyu Chen et.al. | 2406.15269 | null |
2024-06-21 | Evaluating Diversity in Automatic Poetry Generation | Yanran Chen et.al. | 2406.15267 | link |
2024-06-21 | Fingerprint Membership and Identity Inference Against Generative Adversarial Networks | Saverio Cavasin et.al. | 2406.15253 | null |
2024-06-21 | MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation | Xuan He et.al. | 2406.15252 | null |
2024-06-21 | Unsupervised Bayesian Generation of Synthetic CT from CBCT Using Patient-Specific Score-Based Prior | Junbo Peng et.al. | 2406.15219 | null |
2024-06-21 | Sound and Fury, Signifying Nothing? Impact of Data Breach Disclosure Laws | Muhammad Zia Hydari et.al. | 2406.15215 | null |
2024-06-21 | Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors | Ali Naseh et.al. | 2406.15213 | link |
2024-06-21 | Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms | Santiago Berrezueta-Guzman et.al. | 2406.15198 | null |
2024-06-20 | A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models | Xincheng Shuai et.al. | 2406.14555 | link |
2024-06-21 | Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation | Eyal Michaeli et.al. | 2406.14551 | link |
2024-06-20 | Consistency Models Made Easy | Zhengyang Geng et.al. | 2406.14548 | link |
2024-06-20 | IRASim: Learning Interactive Real-Robot Action Simulators | Fangqi Zhu et.al. | 2406.14540 | null |
2024-06-20 | Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps | Nikita Starodubcev et.al. | 2406.14539 | null |
2024-06-20 | Fantastic Copyrighted Beasts and How (Not) to Generate Them | Luxi He et.al. | 2406.14526 | null |
2024-06-20 | Photoacoustic methane detection assisted by a gas-filled anti-resonant hollow-core fiber laser | Cuiling Zhang et.al. | 2406.14521 | null |
2024-06-20 | V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data | Rotem Shalev-Arkushin et.al. | 2406.14510 | null |
2024-06-20 | CodeRAG-Bench: Can Retrieval Augment Code Generation? | Zora Zhiruo Wang et.al. | 2406.14497 | link |
2024-06-20 | SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset | Josef Dai et.al. | 2406.14477 | link |
2024-06-20 | CollaFuse: Collaborative Diffusion Models | Simeon Allmendinger et.al. | 2406.14429 | link |
2024-06-20 | Active Diffusion Subsampling | Oisin Nolan et.al. | 2406.14388 | link |
2024-06-20 | Multicoloured Hardcore Model: Fast Mixing and Queueing | Sam Olesker-Taylor et.al. | 2406.14376 | null |
2024-06-20 | FairX: A comprehensive benchmarking tool for model analysis using fairness, utility, and explainability | Md Fahim Sikder et.al. | 2406.14281 | link |
2024-06-20 | In Tree Structure Should Sentence Be Generated | Yaguang Li et.al. | 2406.14189 | link |
2024-06-20 | CriDiff: Criss-cross Injection Diffusion Framework via Generative Pre-train for Prostate Segmentation | Tingwei Liu et.al. | 2406.14186 | link |
2024-06-20 | Tractable Equilibrium Computation in Markov Games through Risk Aversion | Eric Mazumdar et.al. | 2406.14156 | null |
2024-06-20 | ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning | Zhongjie Duan et.al. | 2406.14130 | link |
2024-06-20 | Dye4AI: Assuring Data Boundary on Generative AI Services | Shu Wang et.al. | 2406.14114 | null |
2024-06-20 | HeartBeat: Towards Controllable Echocardiography Video Synthesis with Multimodal Conditions-Guided Diffusion Models | Xinrui Zhou et.al. | 2406.14098 | null |
2024-06-20 | Bridging bulk and surface: An interacting particle system towards the field-road diffusion model | Matthieu Alfaro et.al. | 2406.14093 | null |
2024-06-20 | A Practical Diffusion Path for Sampling | Omar Chehab et.al. | 2406.14040 | null |
2024-06-20 | Leveraging eBPF and AI for Ransomware Nose Out | Arjun Sekar et.al. | 2406.14020 | null |
2024-06-20 | Feature Fusion Based on Mutual-Cross-Attention Mechanism for EEG Emotion Recognition | Yimin Zhao et.al. | 2406.14014 | link |
2024-06-20 | Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMs | Mahammed Kamruzzaman et.al. | 2406.13993 | null |
2024-06-20 | The Elusive Pursuit of Replicating PATE-GAN: Benchmarking, Auditing, Debugging | Georgi Ganev et.al. | 2406.13985 | link |
2024-06-20 | Similarity-aware Syncretic Latent Diffusion Model for Medical Image Translation with Representation Learning | Tingyi Lin et.al. | 2406.13977 | null |
2024-06-20 | Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models | Yuan Zhong et.al. | 2406.13942 | null |
2024-06-20 | EnTruth: Enhancing the Traceability of Unauthorized Dataset Usage in Text-to-image Diffusion Models with Minimal and Robust Alterations | Jie Ren et.al. | 2406.13933 | null |
2024-06-20 | Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions | Hamdireza Rouzegar et.al. | 2406.13903 | null |
2024-06-19 | INFusion: Diffusion Regularized Implicit Neural Representations for 2D and 3D accelerated MRI reconstruction | Yamin Arefeen et.al. | 2406.13895 | null |
2024-06-19 | Open Generative Large Language Models for Galician | Pablo Gamallo et.al. | 2406.13893 | null |
2024-06-19 | StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation | Davit Abrahamyan et.al. | 2406.13840 | link |
2024-06-19 | RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design | Rishabh Anand et.al. | 2406.13839 | link |
2024-06-19 | COAC: Cross-layer Optimization of Accelerator Configurability for Efficient CNN Processing | Steven Colleman et.al. | 2406.13752 | null |
2024-06-19 | GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation | Baiqi Li et.al. | 2406.13743 | link |
2024-06-19 | Tree-Sliced Wasserstein Distance on a System of Lines | Viet-Hoang Tran et.al. | 2406.13725 | null |
2024-06-19 | Hitchhiker's guide on Energy-Based Models: a comprehensive review on the relation with other generative models, sampling and statistical physics | Davide Carbone et.al. | 2406.13661 | null |
2024-06-19 | Towards Minimal Targeted Updates of Language Models with Targeted Negative Training | Lily H. Zhang et.al. | 2406.13660 | link |
2024-06-19 | Stability and Generalizability in SDE Diffusion Models with Measure-Preserving Dynamics | Weitong Zhang et.al. | 2406.13652 | null |
2024-06-19 | On AI-Inspired UI-Design | Jialiang Wei et.al. | 2406.13631 | null |
2024-06-19 | Can AI be enabled to dynamical downscaling? Training a Latent Diffusion Model to mimic km-scale COSMO-CLM downscaling of ERA5 over Italy | Elena Tomasi et.al. | 2406.13627 | link |
2024-06-19 | Enhance the Image: Super Resolution using Artificial Intelligence in MRI | Ziyu Li et.al. | 2406.13625 | null |
2024-06-19 | Generative Modeling by Minimizing the Wasserstein-2 Loss | Yu-Jui Huang et.al. | 2406.13619 | null |
2024-06-19 | Parameter Training Efficiency Aware Resource Allocation for AIGC in Space-Air-Ground Integrated Networks | Liangxin Qian et.al. | 2406.13602 | null |
2024-06-19 | ModSec-Learn: Boosting ModSecurity with Machine Learning | Christian Scano et.al. | 2406.13547 | link |
2024-06-19 | Towards Cyber Threat Intelligence for the IoT | Alfonso Iacovazzi et.al. | 2406.13543 | null |
2024-06-19 | Image Distillation for Safe Data Sharing in Histopathology | Zhe Li et.al. | 2406.13536 | link |
2024-06-19 | Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement | Chenda Li et.al. | 2406.13471 | null |
2024-06-19 | Unifying nonlinearly constrained nonconvex optimization | Charlie Vanaret et.al. | 2406.13454 | link |
2024-06-19 | Federating to Grow Transformers with Constrained Resources without Model Sharing | Shikun Shen et.al. | 2406.13450 | null |
2024-06-19 | Multi-messenger modeling of the Monogem pulsar halo | Youyou Li et.al. | 2406.13426 | null |
2024-06-19 | Style-NeRF2NeRF: 3D Style Transfer From Style-Aligned Multi-View Images | Haruo Fujiwara et.al. | 2406.13393 | null |
2024-06-19 | Effective Edge-wise Representation Learning in Edge-Attributed Bipartite Graphs | Hewen Wang et.al. | 2406.13369 | null |
2024-06-19 | Situational Instructions Database: Task Guidance in Dynamic Environments | Muhammad Saif Ullah Khan et.al. | 2406.13302 | link |
2024-06-19 | ARDuP: Active Region Video Diffusion for Universal Policies | Shuaiyi Huang et.al. | 2406.13301 | null |
2024-06-19 | AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models | Ken Chen et.al. | 2406.13272 | null |
2024-06-19 | Self-Supervised Diffusion Model for 3-D Seismic Data Reconstruction | Xinyang Wang et.al. | 2406.13252 | null |
2024-06-19 | Optimizing Inventory Management through Multiobjective Reverse Logistics with Environmental Impact | I. B. Wadhawan et.al. | 2406.13226 | null |
2024-06-19 | Neural Residual Diffusion Models for Deep Scalable Vision Generation | Zhiyuan Ma et.al. | 2406.13215 | null |
2024-06-19 | Surgical Triplet Recognition via Diffusion Model | Daochang Liu et.al. | 2406.13210 | null |
2024-06-19 | Diffusion Model-based FOD Restoration from High Distortion in dMRI | Shuo Huang et.al. | 2406.13209 | null |
2024-06-19 | Toward Structure Fairness in Dynamic Graph Embedding: A Trend-aware Dual Debiasing Approach | Yicong Li et.al. | 2406.13201 | link |
2024-06-19 | Synthetic Context Generation for Question Generation | Naiming Liu et.al. | 2406.13188 | null |
2024-06-19 | Conditional score-based diffusion models for solving inverse problems in mechanics | Agnimitra Dasgupta et.al. | 2406.13154 | null |
2024-06-19 | von Mises Quasi-Processes for Bayesian Circular Regression | Yarden Cohen et.al. | 2406.13151 | null |
2024-06-19 | MCAD: Multi-modal Conditioned Adversarial Diffusion Model for High-Quality PET Image Reconstruction | Jiaqi Cui et.al. | 2406.13150 | null |
2024-06-19 | GVT2RPM: An Empirical Study for General Video Transformer Adaptation to Remote Physiological Measurement | Hao Wang et.al. | 2406.13136 | null |
2024-06-19 | Thruster-Assisted Incline Walking | Kaushik Venkatesh Krishnamurthy et.al. | 2406.13118 | null |
2024-06-18 | Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models | Paul Henderson et.al. | 2406.13099 | null |
2024-06-18 | RITA: A Real-time Interactive Talking Avatars Framework | Wuxinlin Cheng et.al. | 2406.13093 | null |
2024-06-18 | PIPPIN: Generating variable length full events from partons | Guillaume Quétant et.al. | 2406.13074 | link |
2024-06-18 | MaskPure: Improving Defense Against Text Adversaries with Stochastic Purification | Harrison Gietz et.al. | 2406.13066 | link |
2024-06-18 | Traffic Prediction considering Multiple Levels of Spatial-temporal Information: A Multi-scale Graph Wavelet-based Approach | Zilin Bian et.al. | 2406.13038 | null |
2024-06-18 | Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities | Matthew T. C. Li et.al. | 2406.13036 | null |
2024-06-18 | Data Plagiarism Index: Characterizing the Privacy Risk of Data-Copying in Tabular Generative Models | Joshua Ward et.al. | 2406.13012 | null |
2024-06-18 | Synergizing Foundation Models and Federated Learning: A Survey | Shenghui Li et.al. | 2406.12844 | null |
2024-06-18 | Evaluating the design space of diffusion-based generative models | Yuqing Wang et.al. | 2406.12839 | null |
2024-06-18 | Neural Approximate Mirror Maps for Constrained Diffusion Models | Berthy T. Feng et.al. | 2406.12816 | null |
2024-06-19 | AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation | Xinyu Hou et.al. | 2406.12805 | link |
2024-06-18 | Extracting Training Data from Unconditional Diffusion Models | Yunhao Chen et.al. | 2406.12752 | null |
2024-06-18 | Useful stochastic bounds in time-varying queues with service and patience times having general joint distribution | Shreehari Anand Bodas et.al. | 2406.12745 | null |
2024-06-18 | SUPER: Selfie Undistortion and Head Pose Editing with Identity Preservation | Polina Karpikova et.al. | 2406.12700 | null |
2024-06-18 | Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation | Miseul Kim et.al. | 2406.12688 | null |
2024-06-18 | GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | Yongtao Ge et.al. | 2406.12671 | link |
2024-06-18 | Research and Implementation of Data Enhancement Techniques for Graph Neural Networks | Jingzhao Gu et.al. | 2406.12640 | null |
2024-06-18 | News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation | Andreea Iana et.al. | 2406.12634 | link |
2024-06-18 | Learning Diffusion at Lightspeed | Antonio Terpin et.al. | 2406.12616 | null |
2024-06-18 | Unmasking the Veil: An Investigation into Concept Ablation for Privacy and Copyright Protection in Images | Shivank Garg et.al. | 2406.12592 | link |
2024-06-18 | Behavior-Dependent Linear Recurrent Units for Efficient Sequential Recommendation | Chengkai Liu et.al. | 2406.12580 | link |
2024-06-18 | Training Diffusion Models with Federated Learning | Matthijs de Goede et.al. | 2406.12575 | null |
2024-06-18 | P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts | Yuhao Dan et.al. | 2406.12548 | null |
2024-06-18 | Structured Detection for Simultaneous Super-Resolution and Optical Sectioning in Laser Scanning Microscopy | Alessandro Zunino et.al. | 2406.12542 | link |
2024-06-18 | Variational Distillation of Diffusion Policies into Mixture of Experts | Hongyi Zhou et.al. | 2406.12538 | null |
2024-06-18 | HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors | Panwang Pan et.al. | 2406.12459 | link |
2024-06-18 | Planning Using Schrödinger Bridge Diffusion Models | Adarsh Srivastava et.al. | 2406.12458 | link |
2024-06-18 | Deep Temporal Deaggregation: Large-Scale Spatio-Temporal Generative Models | David Bergström et.al. | 2406.12423 | null |
2024-06-18 | ROVER: RTL Optimization via Verified E-Graph Rewriting | Samuel Coward et.al. | 2406.12421 | null |
2024-06-18 | TADM: Temporally-Aware Diffusion Model for Neurodegenerative Progression on Brain MRI | Mattia Litrico et.al. | 2406.12411 | null |
2024-06-18 | SDNIA-YOLO: A Robust Object Detection Model for Extreme Weather Conditions | Yuexiong Ding et.al. | 2406.12395 | null |
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2024-12-19 | OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving | Shuo Xing et.al. | 2412.15208 | null |
2024-12-19 | LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation | Weijia Shi et.al. | 2412.15188 | null |
2024-12-19 | Qwen2.5 Technical Report | Qwen et.al. | 2412.15115 | null |
2024-12-19 | Progressive Multimodal Reasoning via Active Retrieval | Guanting Dong et.al. | 2412.14835 | null |
2024-12-19 | Explainable Tampered Text Detection via Multimodal Large Models | Chenfan Qu et.al. | 2412.14816 | null |
2024-12-18 | Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception | Yanpeng Sun et.al. | 2412.14233 | link |
2024-12-18 | AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities | Guillaume Astruc et.al. | 2412.14123 | link |
2024-12-19 | G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o | Tony Cheng Tong et.al. | 2412.13647 | link |
2024-12-18 | Detecting Machine-Generated Music with Explainability -- A Challenge and Early Benchmarks | Yupei Li et.al. | 2412.13421 | null |
2024-12-17 | DoPTA: Improving Document Layout Analysis using Patch-Text Alignment | Nikitha SR et.al. | 2412.12902 | null |
2024-12-17 | Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models | YiFan Zhang et.al. | 2412.12606 | null |
2024-12-17 | PBVS 2024 Solution: Self-Supervised Learning and Sampling Strategies for SAR Classification in Extreme Long-Tail Distribution | Yuhyun Kim et.al. | 2412.12565 | null |
2024-12-17 | Causal Diffusion Transformers for Generative Modeling | Chaorui Deng et.al. | 2412.12095 | link |
2024-12-16 | CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology | Yuxuan Sun et.al. | 2412.12077 | null |
2024-12-16 | Gramian Multimodal Representation Learning and Alignment | Giordano Cicchetti et.al. | 2412.11959 | null |
2024-12-16 | LMM-Regularized CLIP Embeddings for Image Classification | Maria Tzelepi et.al. | 2412.11663 | null |
2024-12-15 | Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models | Sebastian Gutierrez et.al. | 2412.11088 | null |
2024-12-13 | Apollo: An Exploration of Video Understanding in Large Multimodal Models | Orr Zohar et.al. | 2412.10360 | null |
2024-12-13 | Performance of ChatGPT on tasks involving physics visual representations: the case of the Brief Electricity and Magnetism Assessment | Giulia Polverini et.al. | 2412.10019 | null |
2024-12-12 | Vision-Language Models Represent Darker-Skinned Black Individuals as More Homogeneous than Lighter-Skinned Black Individuals | Messi H. J. Lee et.al. | 2412.09668 | null |
2024-12-12 | Exemplar Masking for Multimodal Incremental Learning | Yi-Lun Lee et.al. | 2412.09549 | link |
2024-12-12 | Embeddings are all you need! Achieving High Performance Medical Image Classification through Training-Free Embedding Analysis | Raj Hansini Khoiwal et.al. | 2412.09445 | null |
2024-12-12 | Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning | Meng Shen et.al. | 2412.09126 | null |
2024-12-12 | A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-rank Sequence Multimodal Adapter | Zirun Guo et.al. | 2412.08979 | null |
2024-12-11 | StreamChat: Chatting with Streaming Video | Jihao Liu et.al. | 2412.08646 | null |
2024-12-11 | Multimodal Latent Language Modeling with Next-Token Diffusion | Yutao Sun et.al. | 2412.08635 | link |
2024-12-12 | Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis | Feng Zhou et.al. | 2412.08603 | null |
2024-12-11 | Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions | Mohammadmostafa Rostamkhani et.al. | 2412.08169 | link |
2024-12-10 | Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning | Can Yaras et.al. | 2412.07909 | null |
2024-12-10 | BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities | Sahal Shaji Mullappilly et.al. | 2412.07769 | link |
2024-12-10 | ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer | Jinyi Hu et.al. | 2412.07720 | link |
2024-12-13 | DriveMM: All-in-One Large Multimodal Model for Autonomous Driving | Zhijian Huang et.al. | 2412.07689 | link |
2024-12-10 | Driving with InternVL: Oustanding Champion in the Track on Driving with Language of the Autonomous Grand Challenge at CVPR 2024 | Jiahan Li et.al. | 2412.07247 | null |
2024-12-10 | Maya: An Instruction Finetuned Multilingual Multimodal Model | Nahid Alam et.al. | 2412.07112 | link |
2024-12-09 | How to Merge Your Multimodal Models Over Time? | Sebastian Dziadzio et.al. | 2412.06712 | link |
2024-12-09 | Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels | Weijie Tu et.al. | 2412.06461 | null |
2024-12-09 | iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models | Lianyu Hu et.al. | 2412.06263 | link |
2024-12-08 | A Self-Learning Multimodal Approach for Fake News Detection | Hao Chen et.al. | 2412.05843 | null |
2024-12-08 | SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation | Leigang Qu et.al. | 2412.05818 | null |
2024-12-07 | WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition | Feng Li et.al. | 2412.05558 | null |
2024-12-07 | Comprehensive Evaluation of Multimodal AI Models in Medical Imaging Diagnosis: From Data Augmentation to Preference-Based Comparison | Cailian Ruan et.al. | 2412.05536 | null |
2024-12-06 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Zhe Chen et.al. | 2412.05271 | link |
2024-12-05 | Lattice Lingo: Effect of Textual Detail on Multimodal Learning for Property Prediction of Crystals | Mrigi Munjal et.al. | 2412.04670 | null |
2024-12-05 | BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks | Juan Rodriguez et.al. | 2412.04626 | null |
2024-12-05 | MageBench: Bridging Large Multimodal Models to Agents | Miaosen Zhang et.al. | 2412.04531 | link |
2024-12-04 | Video Quality Assessment: A Comprehensive Survey | Qi Zheng et.al. | 2412.04508 | link |
2024-12-05 | SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model | Zhenglin Huang et.al. | 2412.04292 | null |
2024-12-05 | CALMM-Drive: Confidence-Aware Autonomous Driving with Large Multimodal Model | Ruoyu Yao et.al. | 2412.04209 | null |
2024-12-05 | AIpparel: A Large Multimodal Generative Model for Digital Garments | Kiyohiro Nakayama et.al. | 2412.03937 | null |
2024-12-05 | MegaCOIN: Enhancing Medium-Grained Color Perception for Vision-Language Models | Ming-Chang Chiu et.al. | 2412.03927 | link |
2024-12-04 | Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Wujian Peng et.al. | 2412.03565 | link |
2024-12-04 | Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | Neale Ratzlaff et.al. | 2412.03467 | null |
2024-12-06 | SJTU:Spatial judgments in multimodal models towards unified segmentation through coordinate detection | Joongwon Chae et.al. | 2412.02565 | link |
2024-12-03 | Initial Study On Improving Segmentation By Combining Preoperative CT And Intraoperative CBCT Using Synthetic Data | Maximilian E. Tschuchnig et.al. | 2412.02294 | null |
2024-12-05 | CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy | Zhibo Yang et.al. | 2412.02210 | null |
2024-12-03 | VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding | Kangsan Kim et.al. | 2412.02186 | link |
2024-12-04 | Agri-LLaVA: Knowledge-Infused Large Multimodal Assistant on Agricultural Pests and Diseases | Liqiong Wang et.al. | 2412.02158 | link |
2024-12-02 | Attacks on multimodal models | Viacheslav Iablochnikov et.al. | 2412.01725 | link |
2024-12-02 | LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant | Yikun Liu et.al. | 2412.01720 | null |
2024-12-01 | VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation | Weiming Ren et.al. | 2412.00927 | null |
2024-11-30 | MaintAGT:Sim2Real-Guided Multimodal Large Model for Intelligent Maintenance with Chain-of-Thought Reasoning | Hongliang He et.al. | 2412.00481 | null |
2024-11-30 | Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment | Dongfang Zhao et.al. | 2412.00373 | null |
2024-12-04 | ROSE: Revolutionizing Open-Set Dense Segmentation with Patch-Wise Perceptual Large Multimodal Model | Kunyang Han et.al. | 2412.00153 | null |
2024-11-28 | Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers | Chancharik Mitra et.al. | 2412.00142 | null |
2024-12-02 | LUMIA: Linear probing for Unimodal and MultiModal Membership Inference Attacks leveraging internal LLM states | Luis Ibanez-Lissen et.al. | 2411.19876 | null |
2024-11-29 | SDR-GNN: Spectral Domain Reconstruction Graph Neural Network for Incomplete Multimodal Learning in Conversational Emotion Recognition | Fangze Fu et.al. | 2411.19822 | null |
2024-11-29 | JetFormer: An Autoregressive Generative Model of Raw Images and Text | Michael Tschannen et.al. | 2411.19722 | null |
2024-11-28 | Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs | Anirudh Phukan et.al. | 2411.19187 | null |
2024-11-28 | Examining Multimodal Gender and Content Bias in ChatGPT-4o | Roberto Balestri et.al. | 2411.19140 | null |
2024-11-28 | ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges | Rao Fu et.al. | 2411.18932 | link |
2024-11-27 | Active Data Curation Effectively Distills Large-Scale Multimodal Models | Vishaal Udandarao et.al. | 2411.18674 | null |
2024-11-27 | AMPS: ASR with Multimodal Paraphrase Supervision | Amruta Parulekar et.al. | 2411.18368 | null |
2024-12-03 | Large Language Model-Brained GUI Agents: A Survey | Chaoyun Zhang et.al. | 2411.18279 | link |
2024-11-27 | Grid-augumented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents | Joongwon Chae et.al. | 2411.18270 | link |
2024-11-27 | Multimodal Integration of Longitudinal Noninvasive Diagnostics for Survival Prediction in Immunotherapy Using Deep Learning | Melda Yeghaian et.al. | 2411.18253 | null |
2024-11-26 | NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects? | Jiaxuan Li et.al. | 2411.17794 | null |
2024-11-26 | Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | Akshita Gupta et.al. | 2411.17690 | null |
2024-11-26 | AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Jiarui Wang et.al. | 2411.17221 | link |
2024-11-26 | Learning Robust Anymodal Segmentor with Unimodal and Cross-modal Distillation | Xu Zheng et.al. | 2411.17141 | link |
2024-11-26 | Relations, Negations, and Numbers: Looking for Logic in Generative Text-to-Image Models | Colin Conwell et.al. | 2411.17066 | link |
2024-11-26 | Multimodal Alignment and Fusion: A Survey | Songtao Li et.al. | 2411.17040 | null |
2024-11-27 | SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE | Yongwei Chen et.al. | 2411.16856 | null |
2024-11-23 | Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents | Jun Chen et.al. | 2411.16740 | link |
2024-11-26 | All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages | Ashmal Vayani et.al. | 2411.16508 | link |
2024-11-25 | Boosting 3D Object Generation through PBR Materials | Yitong Wang et.al. | 2411.16080 | null |
2024-11-24 | M3-CVC: Controllable Video Compression with Multimodal Generative Models | Rui Wan et.al. | 2411.15798 | null |
2024-11-23 | Knowledge Transfer Across Modalities with Natural Language Supervision | Carlo Alberto Barbano et.al. | 2411.15611 | null |
2024-11-23 | From Complexity to Parsimony: Integrating Latent Class Analysis to Uncover Multimodal Learning Patterns in Collaborative Learning | Lixiang Yan et.al. | 2411.15590 | null |
2024-11-23 | Botfip-LLM: An Enhanced Multimodal Scientific Computing Framework Leveraging Knowledge Distillation from Large Language Models | Tianhao Chen et.al. | 2411.15525 | null |
2024-11-23 | MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking | Xinqi Liu et.al. | 2411.15459 | null |
2024-11-23 | freePruner: A Training-free Approach for Large Multimodal Model Acceleration | Bingxin Xu et.al. | 2411.15446 | null |
2024-11-22 | PRIMUS: Pretraining IMU Encoders with Multimodal Self-Supervision | Arnav M. Das et.al. | 2411.15127 | null |
2024-11-22 | Large Multi-modal Models Can Interpret Features in Large Multi-modal Models | Kaichen Zhang et.al. | 2411.14982 | link |
2024-11-25 | Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation | Aniket Bhattacharyya et.al. | 2411.14957 | null |
2024-11-22 | Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains | Yurii Paniv et.al. | 2411.14647 | null |
2024-11-21 | Generative AI for Music and Audio | Hao-Wen Dong et.al. | 2411.14627 | null |
2024-11-21 | FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers | Zehua Pei et.al. | 2411.14507 | null |
2024-11-21 | MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective | Hailang Huang et.al. | 2411.14062 | link |
2024-11-21 | Multimodal 3D Reasoning Segmentation with Complex Scenes | Xueying Jiang et.al. | 2411.13927 | null |
2024-11-20 | VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation | Ziyang Luo et.al. | 2411.13281 | null |
2024-11-19 | VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge | Vishwesh Nath et.al. | 2411.12915 | null |
2024-11-19 | Mitigating Perception Bias: A Training-Free Approach to Enhance LMM for Image Quality Assessment | Siyi Pan et.al. | 2411.12791 | null |
2024-11-18 | MMBind: Unleashing the Potential of Distributed and Heterogeneous Data for Multimodal Learning in IoT | Xiaomin Ouyang et.al. | 2411.12126 | null |
2024-11-17 | SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization | Hongrui Jia et.al. | 2411.11909 | link |
2024-11-18 | The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning | Longju Bai et.al. | 2411.11758 | link |
2024-11-18 | Artificial Scientific Discovery | Antonio Norelli et.al. | 2411.11672 | null |
2024-11-18 | InstruGen: Automatic Instruction Generation for Vision-and-Language Navigation Via Large Multimodal Models | Yu Yan et.al. | 2411.11394 | null |
2024-11-19 | SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach | Ruoxi Sun et.al. | 2411.11195 | null |
2024-11-16 | ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models | Vipula Rawte et.al. | 2411.10867 | null |
2024-11-19 | MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models | Jianhong Tu et.al. | 2411.10557 | link |
2024-11-15 | Everything is a Video: Unifying Modalities through Next-Frame Prediction | G. Thomas Hudson et.al. | 2411.10503 | null |
2024-11-15 | Weakly-Supervised Multimodal Learning on MIMIC-CXR | Andrea Agostini et.al. | 2411.10356 | link |
2024-11-21 | Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era | Thanh Tam Nguyen et.al. | 2411.09955 | link |
2024-11-14 | Cross-Modal Consistency in Multimodal Large Language Models | Xiang Zhang et.al. | 2411.09273 | null |
2024-11-14 | SmartInv: Multimodal Learning for Smart Contract Invariant Inference | Sally Junsong Wang et.al. | 2411.09217 | null |
2024-11-13 | Multimodal Object Detection using Depth and Image Data for Manufacturing Parts | Nazanin Mahjourian et.al. | 2411.09062 | null |
2024-11-13 | Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions | Moran Yanuka et.al. | 2411.09018 | null |
2024-11-13 | AstroM | Mariia Rizhko et.al. | 2411.08842 | null |
2024-11-13 | Multimodal Instruction Tuning with Hybrid State Space Models | Jianing Zhou et.al. | 2411.08840 | null |
2024-11-13 | Retrieval Augmented Recipe Generation | Guoshan Liu et.al. | 2411.08715 | null |
2024-11-12 | DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection | Shawn Li et.al. | 2411.08227 | link |
2024-11-12 | Leveraging Multimodal Models for Enhanced Neuroimaging Diagnostics in Alzheimer's Disease | Francesco Chiumento et.al. | 2411.07871 | null |
2024-11-12 | SparrowVQE: Visual Question Explanation for Course Content Understanding | Jialu Li et.al. | 2411.07516 | link |
2024-11-12 | BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions | Anas Awadalla et.al. | 2411.07461 | null |
2024-11-11 | Multimodal Fusion Balancing Through Game-Theoretic Regularization | Konstantinos Kontras et.al. | 2411.07335 | null |
2024-11-11 | OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision | Cong Wei et.al. | 2411.07199 | null |
2024-11-09 | M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework | Yew Ken Chia et.al. | 2411.06176 | null |
2024-11-09 | An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Fatemeh Shiri et.al. | 2411.06048 | link |
2024-11-08 | Towards Low-Resource Harmful Meme Detection with LMM Agents | Jianzhao Huang et.al. | 2411.05383 | link |
2024-11-08 | Exploring the Alignment Landscape: LLMs and Geometric Deep Models in Protein Representation | Dong Shu et.al. | 2411.05316 | link |
2024-11-07 | HourVideo: 1-Hour Video-Language Understanding | Keshigeyan Chandrasegaran et.al. | 2411.04998 | link |
2024-11-07 | VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | Shehan Munasinghe et.al. | 2411.04923 | null |
2024-11-07 | Exploring Hierarchical Molecular Graph Representation in Multimodal LLMs | Chengxin Hu et.al. | 2411.04708 | null |
2024-11-06 | AutoGameUI: Constructing High-Fidelity Game UIs via Multimodal Learning and Interactive Web-Based Tool | Zhongliang Tang et.al. | 2411.03709 | null |
2024-11-05 | MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning | Ziliang Gan et.al. | 2411.03314 | null |
2024-11-05 | HumanVLM: Foundation for Human-Scene Vision-Language Model | Dawei Dai et.al. | 2411.03034 | null |
2024-11-05 | Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning | Mingcheng Li et.al. | 2411.02793 | null |
2024-11-11 | INQUIRE: A Natural World Text-to-Image Retrieval Benchmark | Edward Vendrow et.al. | 2411.02537 | link |
2024-11-04 | See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers | Jiaxin Zhuang et.al. | 2411.02465 | null |
2024-11-07 | TableGPT2: A Large Multimodal Model with Tabular Data Integration | Aofeng Su et.al. | 2411.02059 | link |
2024-11-04 | Foundations and Recent Trends in Multimodal Mobile Agents: A Survey | Biao Wu et.al. | 2411.02006 | link |
2024-11-04 | KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension | Jie Yang et.al. | 2411.01846 | null |
2024-11-03 | EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark | Ming Li et.al. | 2411.01492 | null |
2024-11-03 | Classifier-guided Gradient Modulation for Enhanced Multimodal Learning | Zirun Guo et.al. | 2411.01409 | link |
2024-11-02 | LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding | Jian Chen et.al. | 2411.01106 | null |
2024-11-01 | Text2Freq: Learning Series Patterns from Text via Frequency Domain | Ming-Chih Lo et.al. | 2411.00929 | null |
2024-11-01 | V-LoRA: An Efficient and Flexible System Boosts Vision Applications with LoRA LMM | Liang Mi et.al. | 2411.00915 | null |
2024-11-01 | Analyzing Multimodal Integration in the Variational Autoencoder from an Information-Theoretic Perspective | Carlotta Langer et.al. | 2411.00522 | null |
2024-10-31 | TurtleBench: A Visual Programming Benchmark in Turtle Geometry | Sina Rismanchian et.al. | 2411.00264 | link |
2024-10-31 | ResiDual Transformer Alignment with Spectral Decomposition | Lorenzo Basile et.al. | 2411.00246 | null |
2024-10-31 | Nearest Neighbor Normalization Improves Multimodal Retrieval | Neil Chowdhury et.al. | 2410.24114 | link |
2024-11-04 | AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents | Yifan Xu et.al. | 2410.24024 | link |
2024-10-31 | Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models | Hao Yang et.al. | 2410.23861 | null |
2024-10-30 | CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP | Tianyu Yang et.al. | 2410.23330 | null |
2024-10-30 | EMMA: End-to-End Multimodal Model for Autonomous Driving | Jyh-Jing Hwang et.al. | 2410.23262 | null |
2024-10-29 | ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding | Kimihiro Hasegawa et.al. | 2410.22211 | link |
2024-10-29 | Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | Monica Riedler et.al. | 2410.21943 | link |
2024-10-28 | AiSciVision: A Framework for Specializing Large Multimodal Models in Scientific Image Classification | Brendan Hogan et.al. | 2410.21480 | link |
2024-10-27 | Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse | Ryan Liu et.al. | 2410.21333 | null |
2024-10-28 | IndraEye: Infrared Electro-Optical UAV-based Perception Dataset for Robust Downstream Tasks | Manjunath D et.al. | 2410.20953 | link |
2024-10-27 | Generator Matching: Generative modeling with arbitrary Markov processes | Peter Holderrieth et.al. | 2410.20587 | null |
2024-10-27 | PaPaGei: Open Foundation Models for Optical Physiological Signals | Arvind Pillai et.al. | 2410.20542 | link |
2024-10-25 | Turn-by-Turn Indoor Navigation for the Visually Impaired | Santosh Srinivasaiah et.al. | 2410.19954 | null |
2024-10-25 | A Multimodal Approach For Endoscopic VCE Image Classification Using BiomedCLIP-PubMedBERT | Nagarajan Ganapathy et.al. | 2410.19944 | link |
2024-10-25 | OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization | Hongliang He et.al. | 2410.19609 | link |
2024-10-24 | Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant | Abhirama Subramanyam Penamakuri et.al. | 2410.19144 | link |
2024-10-24 | VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks | Lawrence Jang et.al. | 2410.19100 | null |
2024-10-24 | CAMEL-Bench: A Comprehensive Arabic LMM Benchmark | Sara Ghaboura et.al. | 2410.18976 | link |
2024-10-24 | Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques | David Ortiz-Perez et.al. | 2410.18972 | null |
2024-10-24 | OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning | Xiaoqiang Wang et.al. | 2410.18963 | null |
2024-10-24 | A Survey of Multimodal Sarcasm Detection | Shafkat Farabi et.al. | 2410.18882 | null |
2024-10-27 | R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models | Linger Deng et.al. | 2410.17885 | link |
2024-10-22 | JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation | Shota Onohara et.al. | 2410.17250 | null |
2024-10-22 | An Eye for an AI: Evaluating GPT-4o's Visual Perception Skills and Geometric Reasoning Skills Using Computer Graphics Questions | Tony Haoran Feng et.al. | 2410.16991 | null |
2024-10-21 | DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding | Manan Suri et.al. | 2410.16472 | null |
2024-10-21 | Promoting cross-modal representations to improve multimodal foundation models for physiological signals | Ching Fang et.al. | 2410.16424 | null |
2024-10-22 | Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance | Zhangwei Gao et.al. | 2410.16261 | link |
2024-10-22 | MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report | Samrajya Thapa et.al. | 2410.16239 | link |
2024-10-21 | Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Yufei Zhan et.al. | 2410.16163 | link |
2024-10-21 | LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset | Ruikun Zhang et.al. | 2410.16095 | link |
2024-10-21 | How to Build a Pre-trained Multimodal model for Simultaneously Chatting and Decision-making? | Zuojin Tang et.al. | 2410.15885 | null |
2024-10-21 | Multimodal Learning for Embryo Viability Prediction in Clinical IVF | Junsik Kim et.al. | 2410.15581 | null |
2024-10-20 | IPO: Interpretable Prompt Optimization for Vision-Language Models | Yingjun Du et.al. | 2410.15397 | link |
2024-10-20 | Modality-Fair Preference Optimization for Trustworthy MLLM Alignment | Songtao Jiang et.al. | 2410.15334 | null |
2024-10-19 | ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla | Deeparghya Dutta Barua et.al. | 2410.14991 | null |
2024-10-19 | SemiHVision: Enhancing Medical Multimodal Models with a Semi-Human Annotated Dataset and Fine-Tuned Instruction Generation | Junda Wang et.al. | 2410.14948 | link |
2024-10-18 | Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension | Yin Xie et.al. | 2410.14332 | link |
2024-10-18 | Personalized Image Generation with Large Multimodal Models | Yiyan Xu et.al. | 2410.14170 | null |
2024-10-18 | Coherence-Driven Multimodal Safety Dialogue with Active Learning for Embodied Agents | Sabit Hassan et.al. | 2410.14141 | null |
2024-10-17 | Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation | Chengyue Wu et.al. | 2410.13848 | link |
2024-10-18 | Harnessing Webpage UIs for Text-Rich Visual Understanding | Junpeng Liu et.al. | 2410.13824 | null |
2024-10-17 | Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR | Abhishek Gupta et.al. | 2410.13445 | null |
2024-10-16 | The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio | Sicong Leng et.al. | 2410.12787 | null |
2024-10-16 | HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks | Fengji Zhang et.al. | 2410.12381 | link |
2024-10-15 | CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning | Qingqing Cao et.al. | 2410.11963 | null |
2024-10-15 | Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers | Davide Celestini et.al. | 2410.11723 | null |
2024-10-15 | Unveiling the Mystery of Visual Attributes of Concrete and Abstract Concepts: Variability, Nearest Neighbors, and Challenging Categories | Tarun Tater et.al. | 2410.11657 | link |
2024-10-15 | On-the-fly Modulation for Balanced Multimodal Learning | Yake Wei et.al. | 2410.11582 | link |
2024-10-15 | Enhancing Unimodal Latent Representations in Multimodal VAEs through Iterative Amortized Inference | Yuta Oshima et.al. | 2410.11403 | null |
2024-10-14 | Saliency Guided Optimization of Diffusion Latents | Xiwen Wang et.al. | 2410.10257 | null |
2024-10-14 | MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models | Peng Xia et.al. | 2410.10139 | link |
2024-10-13 | LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models | Junyan Ye et.al. | 2410.09732 | null |
2024-10-12 | Reconstructive Visual Instruction Tuning | Haochen Wang et.al. | 2410.09575 | null |
2024-10-11 | Can GPTs Evaluate Graphic Design Based on Design Principles? | Daichi Haraguchi et.al. | 2410.08885 | null |
2024-10-11 | VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding | Houlun Chen et.al. | 2410.08593 | link |
2024-10-10 | ElasticTok: Adaptive Tokenization for Image and Video | Wilson Yan et.al. | 2410.08368 | null |
2024-10-10 | Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts | Sukwon Yun et.al. | 2410.08245 | link |
2024-10-10 | LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts | Anh-Quan Cao et.al. | 2410.08211 | null |
2024-10-10 | Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision | Shengcao Cao et.al. | 2410.08209 | null |
2024-10-10 | MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models | Wenbo Hu et.al. | 2410.08182 | null |
2024-10-10 | Generated Bias: Auditing Internal Bias Dynamics of Text-To-Image Generative Models | Abhishek Mandal et.al. | 2410.07884 | null |
2024-10-09 | The Cognitive Capabilities of Generative AI: A Comparative Analysis with Human Benchmarks | Isaac R. Galatzer-Levy et.al. | 2410.07391 | null |
2024-10-12 | Deep Correlated Prompting for Visual Recognition with Missing Modalities | Lianyu Hu et.al. | 2410.06558 | link |
2024-10-11 | Chip-Tuning: Classify Before Language Models Say | Fangwei Zhu et.al. | 2410.06541 | link |
2024-10-09 | Does Spatial Cognition Emerge in Frontier Models? | Santhosh Kumar Ramakrishnan et.al. | 2410.06468 | null |
2024-10-08 | Multimodal Representation Learning using Adaptive Graph Construction | Weichen Huang et.al. | 2410.06395 | null |
2024-10-08 | Temporal Image Caption Retrieval Competition -- Description and Results | Jakub Pokrywka et.al. | 2410.06314 | null |
2024-10-08 | PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling | Xudong Xie et.al. | 2410.05970 | link |
2024-10-08 | ModalPrompt:Dual-Modality Guided Prompt for Continual Learning of Large Multimodal Models | Fanhu Zeng et.al. | 2410.05849 | null |
2024-10-08 | Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond | Soyeon Caren Han et.al. | 2410.05608 | link |
2024-10-08 | TeaserGen: Generating Teasers for Long Documentaries | Weihan Xu et.al. | 2410.05586 | null |
2024-10-07 | R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions? | Chunyi Li et.al. | 2410.05474 | link |
2024-10-07 | RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction | Yuwei Zhang et.al. | 2410.05361 | null |
2024-10-07 | Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models | Dehong Kong et.al. | 2410.04884 | null |
2024-10-06 | VISTA: A Visual and Textual Attention Dataset for Interpreting Multimodal Models | Harshit et.al. | 2410.04609 | null |
2024-10-06 | UniMuMo: Unified Text, Music and Motion Generation | Han Yang et.al. | 2410.04534 | link |
2024-10-08 | Gamified crowd-sourcing of high-quality data for visual fine-tuning | Shashank Yadav et.al. | 2410.04038 | null |
2024-10-07 | Multimodal Point-of-Interest Recommendation | Yuta Kanzawa et.al. | 2410.03265 | null |
2024-10-04 | Bridging the Gap between Text, Audio, Image, and Any Sequence: A Novel Approach using Gloss-based Annotation | Sen Fang et.al. | 2410.03146 | null |
2024-10-04 | AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark | Wenhao Chai et.al. | 2410.03051 | null |
2024-10-07 | CPFD: Confidence-aware Privileged Feature Distillation for Short Video Classification | Jinghao Shi et.al. | 2410.03038 | null |
2024-10-07 | MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection | Niki Nezakati et.al. | 2410.03010 | null |
2024-10-03 | Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos | Jianrui Zhang et.al. | 2410.02763 | null |
2024-10-03 | Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models | Zhengfeng Lai et.al. | 2410.02740 | null |
2024-10-04 | Video Instruction Tuning With Synthetic Data | Yuanhan Zhang et.al. | 2410.02713 | null |
2024-10-03 | LLaVA-Critic: Learning to Evaluate Multimodal Models | Tianyi Xiong et.al. | 2410.02712 | null |
2024-10-03 | Plots Unlock Time-Series Understanding in Multimodal Models | Mayank Daswani et.al. | 2410.02637 | null |
2024-10-02 | Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations | Minoh Jeong et.al. | 2410.02086 | null |
2024-10-02 | Toward a Holistic Evaluation of Robustness in CLIP Models | Weijie Tu et.al. | 2410.01534 | null |
2024-10-02 | SHAP-CAT: A interpretable multi-modal framework enhancing WSI classification via virtual staining and shapley-value-based multimodal fusion | Jun Wang et.al. | 2410.01408 | null |
2024-10-02 | Backdooring Vision-Language Models with Out-Of-Distribution Data | Weimin Lyu et.al. | 2410.01264 | null |
2024-10-02 | OCC-MLLM:Empowering Multimodal Large Language Model For the Understanding of Occluded Objects | Wenmo Qiu et.al. | 2410.01261 | null |
2024-09-30 | Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning | Weitai Kang et.al. | 2410.00255 | link |
2024-09-30 | Using Large Multimodal Models to Extract Knowledge Components for Knowledge Tracing from Multimedia Question Information | Hyeongdon Moon et.al. | 2409.20167 | link |
2024-10-02 | Visual Context Window Extension: A New Perspective for Long Video Understanding | Hongchen Wei et.al. | 2409.20018 | null |
2024-09-30 | Towards Robust Multimodal Sentiment Analysis with Incomplete Data | Haoyu Zhang et.al. | 2409.20012 | link |
2024-09-28 | FairPIVARA: Reducing and Assessing Biases in CLIP-Based Multimodal Models | Diego A. B. Moreira et.al. | 2409.19474 | link |
2024-09-28 | From Unimodal to Multimodal: Scaling up Projectors to Align Modalities | Mayug Maniparambil et.al. | 2409.19425 | null |
2024-10-02 | CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Jihai Zhang et.al. | 2409.19291 | link |
2024-09-28 | TrojVLM: Backdoor Attack Against Vision Language Models | Weimin Lyu et.al. | 2409.19232 | null |
2024-09-27 | Multimodal Markup Document Models for Graphic Design Completion | Kotaro Kikuchi et.al. | 2409.19051 | null |
2024-09-27 | Emu3: Next-Token Prediction is All You Need | Xinlong Wang et.al. | 2409.18869 | null |
2024-09-27 | Data Analysis in the Era of Generative AI | Jeevana Priya Inala et.al. | 2409.18475 | null |
2024-09-26 | MultiClimate: Multimodal Stance Detection on Climate Change Videos | Jiawen Wang et.al. | 2409.18346 | link |
2024-09-26 | LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness | Chenming Zhu et.al. | 2409.18125 | null |
2024-09-26 | GSON: A Group-based Social Navigation Framework with Large Multimodal Model | Shangyi Luo et.al. | 2409.18084 | null |
2024-09-26 | A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios | Christian Ganhör et.al. | 2409.17864 | link |
2024-09-26 | Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification | Raja Kumar et.al. | 2409.17777 | link |
2024-09-26 | MIO: A Foundation Model on Multimodal Tokens | Zekun Wang et.al. | 2409.17692 | link |
2024-09-25 | Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models | Matt Deitke et.al. | 2409.17146 | link |
2024-09-24 | CDChat: A Large Multimodal Model for Remote Sensing Change Description | Mubashir Noman et.al. | 2409.16261 | link |
2024-09-24 | CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation | Fuxian Huang et.al. | 2409.15806 | null |
2024-09-18 | Recommendation with Generative Models | Yashar Deldjoo et.al. | 2409.15173 | null |
2024-09-23 | With Ears to See and Eyes to Hear: Sound Symbolism Experiments with Multimodal Large Language Models | Tyler Loakman et.al. | 2409.14917 | link |
2024-09-22 | Patch Ranking: Efficient CLIP by Learning to Rank Local Patches | Cheng-En Wu et.al. | 2409.14607 | null |
2024-09-22 | Can-Do! A Dataset and Neuro-Symbolic Grounded Framework for Embodied Planning with Large Multimodal Models | Yew Ken Chia et.al. | 2409.14277 | null |
2024-09-20 | Brain-Cognition Fingerprinting via Graph-GCCA with Contrastive Learning | Yixin Wang et.al. | 2409.13887 | null |
2024-09-20 | Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model | Li Zhou et.al. | 2409.13407 | link |
2024-09-20 | A Novel Adaptive Fine-Tuning Algorithm for Multimodal Models: Self-Optimizing Classification and Selection of High-Quality Datasets in Remote Sensing | Yi Ren et.al. | 2409.13345 | null |
2024-09-20 | ChemDFM-X: Towards Large Multimodal Model for Chemistry | Zihan Zhao et.al. | 2409.13194 | null |
2024-09-19 | MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines | Dongzhi Jiang et.al. | 2409.12959 | null |
2024-09-24 | TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation | Junjie Wen et.al. | 2409.12514 | null |
2024-09-18 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Peng Wang et.al. | 2409.12191 | link |
2024-09-18 | All-in-one foundational models learning across quantum chemical levels | Yuxinxin Chen et.al. | 2409.12015 | link |
2024-09-18 | LMMCoDrive: Cooperative Driving with Large Multimodal Model | Haichao Liu et.al. | 2409.11981 | link |
2024-09-16 | MusicLIME: Explainable Multimodal Music Understanding | Theodoros Sotirou et.al. | 2409.10496 | link |
2024-09-19 | IRIS: Interactive Responsive Intelligent Segmentation for 3D Affordance Analysis | Meng Chu et.al. | 2409.10078 | null |
2024-09-16 | AceParse: A Comprehensive Dataset with Diverse Structured Texts for Academic Literature Parsing | Huawei Ji et.al. | 2409.10016 | link |
2024-09-14 | Keypoints-Integrated Instruction-Following Data Generation for Enhanced Human Pose Understanding in Multimodal Models | Dewen Zhang et.al. | 2409.09306 | null |
2024-09-13 | Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing | Minh-Duc Vu et.al. | 2409.08885 | null |
2024-09-13 | A Multimodal Approach for Fluid Overload Prediction: Integrating Lung Ultrasound and Clinical Data | Tianqi Yang et.al. | 2409.08790 | null |
2024-09-13 | Dynamics of Collective Group Affect: Group-level Annotations and the Multimodal Modeling of Convergence and Divergence | Navin Raj Prabhu et.al. | 2409.08578 | null |
2024-09-13 | A Comprehensive Survey on Deep Multimodal Learning with Missing Modality | Renjie Wu et.al. | 2409.07825 | null |
2024-09-12 | Top-down Activity Representation Learning for Video Question Answering | Yanan Wang et.al. | 2409.07748 | null |
2024-09-11 | What to align in multimodal contrastive learning? | Benoit Dufumier et.al. | 2409.07402 | null |
2024-09-11 | MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis | Hanyu Jiang et.al. | 2409.07129 | null |
2024-09-11 | FSMDet: Vision-guided feature diffusion for fully sparse 3D detector | Tianran Liu et.al. | 2409.06945 | null |
2024-09-16 | Scaling Law Hypothesis for Multimodal Model | Qingyun Sun et.al. | 2409.06754 | null |
2024-09-10 | Multiclass Arrhythmia Classification using Smartwatch Photoplethysmography Signals Collected in Real-life Settings | Dong Han et.al. | 2409.06147 | null |
2024-09-11 | A Survey of Multimodal Composite Editing and Retrieval | Suyan Li et.al. | 2409.05405 | link |
2024-09-05 | Learning in Order! A Sequential Strategy to Learn Invariant Features for Multimodal Sentiment Analysis | Xianbing Zhao et.al. | 2409.04473 | null |
2024-09-06 | Generating Faithful and Salient Text from Multimodal Data | Tahsina Hashem et.al. | 2409.03961 | link |
2024-09-06 | CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Wentao Liu et.al. | 2409.02834 | link |
2024-09-10 | MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark | Xiang Yue et.al. | 2409.02813 | null |
2024-09-04 | Understanding eGFR Trajectories and Kidney Function Decline via Large Multimodal Models | Chih-Yuan Li et.al. | 2409.02530 | null |
2024-09-03 | Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models | Bin Fu et.al. | 2409.01560 | null |
2024-09-03 | Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition | Yaozong Gan et.al. | 2409.01534 | null |
2024-09-02 | Towards General Industrial Intelligence: A Survey on IIoT-Enhanced Continual Large Models | Jiao Chen et.al. | 2409.01207 | null |
2024-09-02 | Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information | Yi Chen et.al. | 2409.01179 | null |
2024-08-31 | Comparative Analysis of Modality Fusion Approaches for Audio-Visual Person Identification and Verification | Aref Farhadipour et.al. | 2409.00562 | null |
2024-08-30 | UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios | Baichuan Zhou et.al. | 2408.17267 | null |
2024-08-29 | Seeking the Sufficiency and Necessity Causal Features in Multimodal Representation Learning | Boyu Chen et.al. | 2408.16577 | null |
2024-08-29 | Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach | Yifei Chen et.al. | 2408.16343 | link |
2024-08-28 | Meta-Learn Unimodal Signals with Weak Supervision for Multimodal Sentiment Analysis | Sijie Mai et.al. | 2408.16029 | null |
2024-08-28 | ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation | Tiantian Feng et.al. | 2408.15803 | null |
2024-08-28 | Visual Prompt Engineering for Medical Vision Language Models in Radiology | Stefan Denner et.al. | 2408.15802 | null |
2024-08-27 | X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation | Hanjia Lyu et.al. | 2408.15172 | null |
2024-08-27 | The Benefits of Balance: From Information Projections to Variance Reduction | Lang Liu et.al. | 2408.15065 | null |
2024-08-27 | NeuralOOD: Improving Out-of-Distribution Generalization Performance with Brain-machine Fusion Learning Framework | Shuangchen Zhao et.al. | 2408.14950 | null |
2024-08-26 | MMR: Evaluating Reading Ability of Large Multimodal Models | Jian Chen et.al. | 2408.14594 | null |
2024-09-03 | Foundation Models for Music: A Survey | Yinghao Ma et.al. | 2408.14340 | link |
2024-08-26 | LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models | Qihang Ge et.al. | 2408.14008 | null |
2024-08-27 | Quantum Multimodal Contrastive Learning Framework | Chi-Sheng Chen et.al. | 2408.13919 | null |
2024-08-25 | Tangram: A Challenging Benchmark for Geometric Element Recognizing | Jiamin Tang et.al. | 2408.13854 | null |
2024-08-25 | Multimodal Ensemble with Conditional Feature Fusion for Dysgraphia Diagnosis in Children from Handwriting Samples | Jayakanth Kunhoth et.al. | 2408.13754 | null |
2024-08-24 | Preliminary Investigations of a Multi-Faceted Robust and Synergistic Approach in Semiconductor Electron Micrograph Analysis: Integrating Vision Transformers with Large Language and Multimodal Models | Sakhinana Sagar Srinivas et.al. | 2408.13621 | null |
2024-08-23 | Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | Sakhinana Sagar Srinivas et.al. | 2408.13248 | null |
2024-08-23 | Indoor scene recognition from images under visual corruptions | Willams de Lima Costa et.al. | 2408.13029 | null |
2024-08-23 | Ada2I: Enhancing Modality Balance for Multimodal Conversational Emotion Recognition | Cam-Van Thi Nguyen et.al. | 2408.12895 | null |
2024-08-23 | Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey | Qika Lin et.al. | 2408.12880 | link |
2024-08-22 | Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models | Jean Park et.al. | 2408.12763 | null |
2024-08-22 | Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization | Luyao Cheng et.al. | 2408.12102 | null |
2024-08-22 | Mental-Perceiver: Audio-Textual Multimodal Learning for Mental Health Assessment | Jinghui Qin et.al. | 2408.12088 | null |
2024-08-21 | GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models | Jonathan Roberts et.al. | 2408.11817 | null |
2024-08-21 | D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models | M. Forlini et.al. | 2408.11761 | null |
2024-08-21 | UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation | Xiangyu Zhao et.al. | 2408.11305 | link |
2024-08-21 | BearLLM: A Prior Knowledge-Enhanced Bearing Health Management Framework with Unified Vibration Signal Representation | Haotian Peng et.al. | 2408.11281 | link |
2024-08-20 | Exploring the use of Generative AI to Support Automated Just-in-Time Programming for Visual Scene Displays | Cynthia Zastudil et.al. | 2408.11137 | null |
2024-08-21 | SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition | Zebang Cheng et.al. | 2408.10500 | link |
2024-08-19 | Enhance Modality Robustness in Text-Centric Multimodal Alignment with Adversarial Prompting | Yun-Da Tsai et.al. | 2408.09798 | null |
2024-08-19 | Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation | Yunxin Li et.al. | 2408.09787 | link |
2024-08-18 | PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding | Dawei Dai et.al. | 2408.09530 | link |
2024-08-17 | Measuring Visual Sycophancy in Multimodal Models | Jaehyuk Lim et.al. | 2408.09111 | link |
2024-08-16 | AdaRank: Disagreement Based Module Rank Prediction for Low-rank Adaptation | Yihe Dong et.al. | 2408.09015 | link |
2024-08-16 | xGen-MM (BLIP-3): A Family of Open Large Multimodal Models | Le Xue et.al. | 2408.08872 | null |
2024-08-16 | Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs | Jinming Liu et.al. | 2408.08575 | null |
2024-08-15 | LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning | Jiajie Li et.al. | 2408.07981 | null |
2024-08-15 | MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Minxuan Zhou et.al. | 2408.07543 | link |
2024-08-14 | Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach | Muhammad Saad Saeed et.al. | 2408.07445 | null |
2024-08-14 | Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration | Xiaogen Zhon et.al. | 2408.07341 | link |
2024-08-14 | Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion | Peiyuan Chen et.al. | 2408.07303 | null |
2024-08-13 | PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology | Xiaomin Wu et.al. | 2408.07037 | null |
2024-08-13 | EditScribe: Non-Visual Image Editing with Natural Language Verification Loops | Ruei-Che Chang et.al. | 2408.06632 | null |
2024-08-13 | CROME: Cross-Modal Adapters for Efficient Multimodal LLM | Sayna Ebrahimi et.al. | 2408.06610 | null |
2024-08-13 | Prioritizing Modalities: Flexible Importance Scheduling in Federated Multimodal Learning | Jieming Bian et.al. | 2408.06549 | null |
2024-08-12 | VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents | Xiao Liu et.al. | 2408.06327 | link |
2024-08-11 | HateSieve: A Contrastive Learning Framework for Detecting and Segmenting Hateful Content in Multimodal Memes | Xuanyu Su et.al. | 2408.05794 | null |
2024-08-08 | Enhancing Journalism with AI: A Study of Contextualized Image Captioning for News Articles using LLMs and LMMs | Aliki Anagnostopoulou et.al. | 2408.04331 | null |
2024-08-06 | LLaVA-OneVision: Easy Visual Task Transfer | Bo Li et.al. | 2408.03326 | link |
2024-08-06 | Multitask and Multimodal Neural Tuning for Large Models | Hao Sun et.al. | 2408.03001 | null |
2024-08-06 | Body of Her: A Preliminary Study on End-to-End Humanoid Agent | Tenglong Ao et.al. | 2408.02879 | null |
2024-08-04 | Distribution-Level Memory Recall for Continual Learning: Preserving Knowledge and Avoiding Confusion | Shaoxu Cheng et.al. | 2408.02695 | null |
2024-08-02 | A Systematic Review of Intermediate Fusion in Multimodal Deep Learning for Biomedical Applications | Valerio Guarrasi et.al. | 2408.02686 | null |
2024-08-05 | REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models | Agneet Chatterjee et.al. | 2408.02231 | null |
2024-08-04 | CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization | Xiang He et.al. | 2408.01952 | link |
2024-08-02 | MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | Benno Weck et.al. | 2408.01337 | link |
2024-08-05 | Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions | Jin Gao et.al. | 2408.01091 | link |
2024-08-02 | GraphAge: Unleashing the power of Graph Neural Network to Decode Epigenetic Aging | Saleh Sakib Ahmed et.al. | 2408.00984 | link |
2024-08-01 | MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Weihao Yu et.al. | 2408.00765 | link |
2024-08-01 | GalleryGPT: Analyzing Paintings with Large Multimodal Models | Yi Bin et.al. | 2408.00491 | link |
2024-08-01 | Everything We Hear: Towards Tackling Misinformation in Podcasts | Sachin Pathiyan Cherumanal et.al. | 2408.00292 | null |
2024-08-01 | OmniParser for Pure Vision Based GUI Agent | Yadong Lu et.al. | 2408.00203 | null |
2024-07-30 | Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection | Jinfa Huang et.al. | 2407.21004 | null |
2024-07-30 | HyperMM : Robust Multimodal Learning with Varying-sized Inputs | Hava Chaptoukaev et.al. | 2407.20768 | null |
2024-07-30 | Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos | Dhruv Verma et.al. | 2407.20642 | link |
2024-07-29 | Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter | Chao Liu et.al. | 2407.19981 | null |
2024-07-29 | ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 | Wenjun Huang et.al. | 2407.19832 | null |
2024-08-02 | XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training | Biao Wu et.al. | 2407.19546 | link |
2024-07-28 | Detached and Interactive Multimodal Learning | Yunfeng Fan et.al. | 2407.19514 | link |
2024-07-27 | Data Processing Techniques for Modern Multimodal Models | Yinheng Li et.al. | 2407.19180 | null |
2024-07-26 | MangaUB: A Manga Understanding Benchmark for Large Multimodal Models | Hikaru Ikuta et.al. | 2407.19034 | null |
2024-07-26 | Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Yuze Zheng et.al. | 2407.18854 | null |
2024-07-26 | ChatSchema: A pipeline of extracting structured information with Large Multimodal Models based on schema | Fei Wang et.al. | 2407.18716 | null |
2024-07-25 | Sparse vs Contiguous Adversarial Pixel Perturbations in Multimodal Models: An Empirical Analysis | Cristian-Alexandru Botocan et.al. | 2407.18251 | link |
2024-07-25 | | Vlad Sobal et.al. | 2407.18134 | null |
2024-07-25 | Cross-Vendor Reproducibility of Radiomics-based Machine Learning Models for Computer-aided Diagnosis | Jatin Chaudhary et.al. | 2407.18060 | null |
2024-07-25 | What does Kiki look like? Cross-modal associations between speech sounds and visual shapes in vision-and-language models | Tessa Verhoef et.al. | 2407.17974 | null |
2024-07-25 | Shapley Value-based Contrastive Alignment for Multimodal Information Extraction | Wen Luo et.al. | 2407.17854 | null |
2024-07-25 | Enhancing Model Performance: Another Approach to Vision-Language Instruction Tuning | Vedanshu et.al. | 2407.17813 | null |
2024-07-25 | KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models | Eunice Yiu et.al. | 2407.17773 | link |
2024-07-24 | Testing Large Language Models on Driving Theory Knowledge and Skills for Connected Autonomous Vehicles | Zuoyin Tang et.al. | 2407.17211 | null |
2024-07-23 | Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities | Muhammad Irzam Liaqat et.al. | 2407.16243 | null |
2024-07-22 | LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding | Haoning Wu et.al. | 2407.15754 | link |
2024-07-22 | Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training | Ye Lin Tun et.al. | 2407.15426 | null |
2024-07-21 | VideoGameBunny: Towards vision assistants for video games | Mohammad Reza Taesiri et.al. | 2407.15295 | null |
2024-07-22 | Patch-based Intuitive Multimodal Prototypes Network (PIMPNet) for Alzheimer's Disease classification | Lisa Anita De Santi et.al. | 2407.14277 | link |
2024-07-18 | Visual Haystacks: Answering Harder Questions About Sets of Images | Tsung-Han Wu et.al. | 2407.13766 | link |
2024-07-17 | Text- and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild | Nicolas Richet et.al. | 2407.12927 | link |
2024-07-16 | ChatBCG: Can AI Read Your Slide Deck? | Nikita Singh et.al. | 2407.12875 | null |
2024-07-17 | LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | Kaichen Zhang et.al. | 2407.12772 | link |
2024-07-17 | Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models | Donggeun Kim et.al. | 2407.12616 | null |
2024-07-17 | E5-V: Universal Embeddings with Multimodal Large Language Models | Ting Jiang et.al. | 2407.12580 | link |
2024-07-16 | FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models | Pengxiang Li et.al. | 2407.11522 | null |
2024-07-16 | COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation | Sannyuya Liu et.al. | 2407.11315 | null |
2024-07-15 | OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models | Zijian Zhou et.al. | 2407.11213 | link |
2024-07-15 | FabGPT: An Efficient Large Multimodal Model for Complex Wafer Defect Knowledge Queries | Yuqi Jiang et.al. | 2407.10810 | null |
2024-07-15 | Scaling 3D Reasoning with LMMs to Large Robot Mission Environments Using Datagraphs | W. J. Meijer et.al. | 2407.10743 | null |
2024-07-16 | Qwen2 Technical Report | An Yang et.al. | 2407.10671 | link |
2024-07-15 | How and where does CLIP process negation? | Vincent Quantmeyer et.al. | 2407.10488 | null |
2024-07-12 | Diagnosing and Re-learning for Balanced Multimodal Learning | Yake Wei et.al. | 2407.09705 | link |
2024-07-12 | Unifying Sequences, Structures, and Descriptions for Any-to-Any Protein Generation with the Large Multimodal Model HelixProtX | Zhiyuan Chen et.al. | 2407.09274 | link |
2024-07-12 | DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training | Chen Xin et.al. | 2407.09174 | link |
2024-07-11 | Emerging Practices for Large Multimodal Model (LMM) Assistance for People with Visual Impairments: Implications for Design | Jingyi Xie et.al. | 2407.08882 | null |
2024-07-10 | RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization | Xijie Huang et.al. | 2407.08044 | link |
2024-07-10 | LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | Feng Li et.al. | 2407.07895 | link |
2024-07-11 | InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior | Chenguo Lin et.al. | 2407.07580 | null |
2024-07-10 | Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model | Wenqi Zhang et.al. | 2407.07053 | link |
2024-07-08 | ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation | Ethan Chern et.al. | 2407.06135 | link |
2024-07-07 | Multimodal Language Models for Domain-Specific Procedural Video Summarization | Nafisa Hussain et.al. | 2407.05419 | null |
2024-07-07 | Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition | Zirun Guo et.al. | 2407.05374 | link |
2024-07-06 | Enhance the Robustness of Text-Centric Multimodal Alignments | Ting-Yu Yen et.al. | 2407.05036 | null |
2024-07-06 | Completed Feature Disentanglement Learning for Multimodal MRIs Analysis | Tianling Liu et.al. | 2407.04916 | null |
2024-07-06 | MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension | Zekun Li et.al. | 2407.04903 | link |
2024-07-05 | VCoME: Verbal Video Composition with Multimodal Editing Effects | Weibo Gong et.al. | 2407.04697 | null |
2024-07-05 | Multimodal Classification via Modal-Aware Interactive Enhancement | Qing-Yuan Jiang et.al. | 2407.04587 | null |
2024-07-05 | Robust Multimodal Learning via Representation Decoupling | Shicai Wei et.al. | 2407.04458 | null |
2024-07-05 | Smart Vision-Language Reasoners | Denisa Roberts et.al. | 2407.04212 | link |
2024-07-04 | Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks | Amit Parekh et.al. | 2407.03967 | link |
2024-07-04 | ADAPT: Multimodal Learning for Detecting Physiological Changes under Missing Modalities | Julie Mordacq et.al. | 2407.03836 | link |
2024-07-04 | M | Florian Schneider et.al. | 2407.03791 | null |
2024-07-03 | HEMM: Holistic Evaluation of Multimodal Foundation Models | Paul Pu Liang et.al. | 2407.03418 | link |
2024-07-02 | Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties | Srivathsan Badrinarayanan et.al. | 2407.03380 | link |
2024-07-02 | Understanding Alignment in Multimodal LLMs: A Comprehensive Study | Elmira Amirloo et.al. | 2407.02477 | null |
2024-07-02 | Synthetic Multimodal Question Generation | Ian Wu et.al. | 2407.02233 | null |
2024-07-02 | Crossroads of Continents: Automated Artifact Extraction for Cultural Adaptation with Large Multimodal Models | Anjishnu Mukherjee et.al. | 2407.02067 | link |
2024-07-01 | Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents | Mehdi Arjmand et.al. | 2407.01824 | link |
2024-07-01 | We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? | Runqi Qiao et.al. | 2407.01284 | link |
2024-07-01 | Unaligning Everything: Or Aligning Any Text to Any Image in Multimodal Models | Shaeke Salman et.al. | 2407.01157 | null |
2024-06-29 | AI-powered multimodal modeling of personalized hemodynamics in aortic stenosis | Caglar Ozturk et.al. | 2407.00535 | null |
2024-06-29 | MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation | Jinsheng Huang et.al. | 2407.00468 | link |
2024-06-29 | How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models | Jaeyoung Lee et.al. | 2407.00369 | null |
2024-06-28 | PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration | Yuxuan Sun et.al. | 2407.00203 | null |
2024-06-28 | EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | Yuxuan Zhang et.al. | 2406.20076 | link |
2024-06-28 | InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Kirolos Ataallah et.al. | 2406.19875 | link |
2024-06-28 | MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis | Jun-Yan He et.al. | 2406.19859 | null |
2024-06-28 | MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment | Jihao Liu et.al. | 2406.19736 | link |
2024-06-28 | Enhancing Radiological Diagnosis: A Collaborative Approach Integrating AI and Human Expertise for Visual Miss Correction | Akash Awasthi et.al. | 2406.19686 | null |
2024-06-28 | SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs | Xin Su et.al. | 2406.19593 | null |
2024-06-27 | OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding | Tao Zhang et.al. | 2406.19389 | null |
2024-06-28 | FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | Shubhankar Singh et.al. | 2406.19237 | null |
2024-06-27 | RAVEN: Multitask Retrieval Augmented Vision-Language Learning | Varun Nagaraj Rao et.al. | 2406.19150 | null |
2024-06-27 | DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming | Jiaxin Zhang et.al. | 2406.19101 | null |
2024-06-27 | Fairness and Bias in Multimodal AI: A Survey | Tosin Adewumi et.al. | 2406.19097 | null |
2024-06-27 | MissionGNN: Hierarchical Multimodal GNN-based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation | Sanggeon Yun et.al. | 2406.18815 | null |
2024-06-26 | MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data | William Berman et.al. | 2406.18790 | null |
2024-06-26 | S3: A Simple Strong Sample-effective Multimodal Dialog System | Elisei Rykov et.al. | 2406.18305 | link |
2024-06-26 | EHR-Based Mobile and Web Platform for Chronic Disease Risk Prediction Using Large Language Multimodal Models | Chun-Chieh Liao et.al. | 2406.18087 | null |
2024-06-26 | Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs | Uttaran Bhattacharya et.al. | 2406.18068 | null |
2024-06-25 | Human-centered In-building Embodied Delivery Benchmark | Zhuoqun Xu et.al. | 2406.17898 | link |
2024-06-25 | InFiConD: Interactive No-code Fine-tuning with Concept-based Knowledge Distillation | Jinbin Huang et.al. | 2406.17838 | null |
2024-06-25 | Data curation via joint example selection further accelerates multimodal learning | Talfan Evans et.al. | 2406.17711 | null |
2024-06-25 | Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights | Hao Yang et.al. | 2406.17430 | link |
2024-06-24 | At First Sight: Zero-Shot Classification of Astronomical Images with Large Multimodal Models | Dimitrios Tanoglidis et.al. | 2406.17057 | null |
2024-06-24 | Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models | Jierun Chen et.al. | 2406.16866 | link |
2024-06-24 | Long Context Transfer from Language to Vision | Peiyuan Zhang et.al. | 2406.16852 | link |
2024-06-24 | QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds | Ye Wang et.al. | 2406.16578 | null |
2024-06-21 | Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning | Brandon Huang et.al. | 2406.15334 | link |
2024-06-21 | Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models | Jiayu Wang et.al. | 2406.14852 | link |
2024-06-20 | Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models | Giulia Polverini et.al. | 2406.14685 | null |
2024-06-20 | Revealing Vision-Language Integration in the Brain with Multimodal Networks | Vighnesh Subramaniam et.al. | 2406.14481 | link |
2024-06-25 | iWISDM: Assessing instruction following in multimodal models at scale | Xiaoxuan Lei et.al. | 2406.14343 | link |
2024-06-20 | Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models | Sherzod Hakimov et.al. | 2406.14035 | null |
2024-06-20 | Knowledge-driven Subspace Fusion and Gradient Coordination for Multi-modal Learning | Yupei Zhang et.al. | 2406.13979 | link |
2024-06-20 | PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents | Junjie Wang et.al. | 2406.13923 | null |
2024-06-19 | Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models | Zhawnen Chen et.al. | 2406.13763 | null |
2024-06-19 | GUI Action Narrator: Where and When Did That Action Take Place? | Qinchen Wu et.al. | 2406.13719 | null |
2024-06-19 | Is AI fun? HumorDB: a curated dataset and benchmark to investigate graphical humor | Veedant Jain et.al. | 2406.13564 | null |
2024-06-19 | VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models | Haowen Hou et.al. | 2406.13362 | link |
2024-06-19 | Learnable In-Context Vector for Visual Question Answering | Yingzhe Peng et.al. | 2406.13185 | link |
2024-06-18 | Synergizing Foundation Models and Federated Learning: A Survey | Shenghui Li et.al. | 2406.12844 | null |
2024-06-18 | OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI | Zhen Huang et.al. | 2406.12753 | link |
2024-06-18 | Disturbing Image Detection Using LMM-Elicited Emotion Embeddings | Maria Tzelepi et.al. | 2406.12668 | null |
2024-06-18 | Automatic benchmarking of large multimodal models via iterative experiment programming | Alessandro Conti et.al. | 2406.12321 | link |
2024-06-18 | Language and Multimodal Models in Sports: A Survey of Datasets and Applications | Haotian Xia et.al. | 2406.12252 | null |
2024-06-17 | VideoLLM-online: Online Video Large Language Model for Streaming Video | Joya Chen et.al. | 2406.11816 | null |
2024-06-17 | LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning | Dantong Niu et.al. | 2406.11815 | null |
2024-06-17 | Multimodal Learning To Improve Segmentation With Intraoperative CBCT & Preoperative CT | Maximilian E. Tschuchnig et.al. | 2406.11650 | null |
2024-06-17 | Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment | Chao Wen et.al. | 2406.11334 | null |
2024-06-17 | VideoVista: A Versatile Benchmark for Video Understanding and Reasoning | Yunxin Li et.al. | 2406.11303 | null |
2024-06-17 | i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment | Daechul Ahn et.al. | 2406.11280 | link |
2024-06-17 | MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens | Anas Awadalla et.al. | 2406.11271 | link |
2024-06-17 | Generative Visual Instruction Tuning | Jefferson Hernandez et.al. | 2406.11262 | link |
2024-06-17 | Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective | Yang Chen et.al. | 2406.11249 | null |
2024-06-16 | Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies | Hung-Ting Su et.al. | 2406.10923 | null |
2024-06-15 | Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model | Lu Xu et.al. | 2406.10484 | link |
2024-06-12 | MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases | Rithesh Murthy et.al. | 2406.10290 | null |
2024-06-14 | VideoGUI: A Benchmark for GUI Automation from Instructional Videos | Kevin Qinghong Lin et.al. | 2406.10227 | null |
2024-06-14 | ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation | Chufan Shi et.al. | 2406.09961 | link |
2024-06-14 | BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval | Imanol Miranda et.al. | 2406.09952 | link |
2024-06-13 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Muhammad Maaz et.al. | 2406.09418 | link |
2024-06-13 | Explore the Limits of Omni-modal Pretraining at Scale | Yiyuan Zhang et.al. | 2406.09412 | link |
2024-06-14 | 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | Roman Bachmann et.al. | 2406.09406 | null |
2024-06-13 | Yo'LLaVA: Your Personalized Language and Vision Assistant | Thao Nguyen et.al. | 2406.09400 | link |
2024-06-13 | CMC-Bench: Towards a New Paradigm of Visual Signal Compression | Chunyi Li et.al. | 2406.09356 | link |
2024-06-13 | Comparison Visual Instruction Tuning | Wei Lin et.al. | 2406.09240 | null |
2024-06-13 | Zoom and Shift are All You Need | Jiahao Qin et.al. | 2406.08866 | null |
2024-06-11 | Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes | Asim Waqas et.al. | 2406.08521 | null |
2024-06-14 | Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models | Yi-Fan Zhang et.al. | 2406.08487 | link |
2024-06-13 | OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text | Qingyun Li et.al. | 2406.08418 | link |
2024-06-12 | A Concept-Based Explainability Framework for Large Multimodal Models | Jayneel Parekh et.al. | 2406.08074 | link |
2024-06-12 | LVBench: An Extreme Long Video Understanding Benchmark | Weihan Wang et.al. | 2406.08035 | link |
2024-06-11 | Cognitive Insights Across Languages: Enhancing Multimodal Interview Analysis | David Ortiz-Perez et.al. | 2406.07542 | link |
2024-06-11 | Understanding Visual Concepts Across Models | Brandon Trabucco et.al. | 2406.07506 | link |
2024-06-11 | Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology | Huahui Yi et.al. | 2406.07078 | link |
2024-06-14 | BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification | June-Woo Kim et.al. | 2406.06786 | link |
2024-06-10 | Vript: A Video Is Worth Thousands of Words | Dongjie Yang et.al. | 2406.06040 | link |
2024-06-10 | FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model | Yebin Lee et.al. | 2406.06004 | link |
2024-06-10 | CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark | David Romero et.al. | 2406.05967 | null |
2024-06-09 | Stealthy Targeted Backdoor Attacks against Image Captioning | Wenshu Fan et.al. | 2406.05874 | link |
2024-06-09 | F-LMM: Grounding Frozen Large Multimodal Models | Size Wu et.al. | 2406.05821 | link |
2024-06-08 | Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities | Sai Munikoti et.al. | 2406.05496 | null |
2024-06-07 | Semantic Segmentation on VSPW Dataset through Masked Video Consistency | Chen Liang et.al. | 2406.04979 | null |
2024-06-07 | Predictive Dynamic Fusion | Bing Cao et.al. | 2406.04802 | link |
2024-06-07 | MGIMM: Multi-Granularity Instruction Multimodal Model for Attribute-Guided Remote Sensing Image Detailed Description | Cong Yang et.al. | 2406.04716 | link |
2024-06-07 | AICoderEval: Improving AI Domain Code Generation of Large Language Models | Yinghui Xia et.al. | 2406.04712 | null |
2024-06-06 | GenAI Arena: An Open Evaluation Platform for Generative Models | Dongfu Jiang et.al. | 2406.04485 | null |
2024-06-06 | MAIRA-2: Grounded Radiology Report Generation | Shruthi Bannur et.al. | 2406.04449 | link |
2024-06-06 | DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs | Lingchen Meng et.al. | 2406.04334 | null |
2024-06-06 | BLSP-Emo: Towards Empathetic Large Speech-Language Models | Chen Wang et.al. | 2406.03872 | link |
2024-06-05 | Identification of Stone Deterioration Patterns with Large Multimodal Models | Daniele Corradetti et.al. | 2406.03207 | link |
2024-06-05 | Exploiting LMM-based knowledge for image classification tasks | Maria Tzelepi et.al. | 2406.03071 | null |
2024-06-02 | Multimodal Deep Learning for Low-Resource Settings: A Vector Embedding Alignment Approach for Healthcare Applications | David Restrepo et.al. | 2406.02601 | null |
2024-06-04 | Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning | Alex Jinpeng Wang et.al. | 2406.02547 | link |
2024-06-04 | Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization | Yunpeng Zhao et.al. | 2406.01987 | null |
2024-06-03 | Automatic Fused Multimodal Deep Learning for Plant Identification | Alfreds Lapkovskis et.al. | 2406.01455 | link |
2024-06-05 | Pulmonary Embolism Mortality Prediction Using Multimodal Learning Based on Computed Tomography Angiography and Clinical Data | Zhusi Zhong et.al. | 2406.01302 | null |
2024-06-03 | Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model | Kezhen Chen et.al. | 2406.00977 | link |
2024-06-02 | Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient | Zechu Li et.al. | 2406.00681 | null |
2024-06-04 | StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond | Pengyuan Lyu et.al. | 2405.21013 | null |
2024-05-31 | Don't Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models | A. Bavaresco et.al. | 2405.20846 | link |
2024-06-17 | Ovis: Structural Embedding Alignment for Multimodal Large Language Model | Shiyin Lu et.al. | 2405.20797 | link |
2024-05-31 | Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning | Yang Chen et.al. | 2405.20606 | link |
2024-05-30 | Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA | Qianqi Yan et.al. | 2405.20421 | link |
2024-05-30 | Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use | Franz Louis Cesista et.al. | 2405.20245 | null |
2024-05-31 | Visual Attention Analysis in Online Learning | Miriam Navarro et.al. | 2405.20091 | null |
2024-05-30 | MM-Lego: Modular Biomedical Multimodal Models with Minimal Fine-Tuning | Konstantin Hemker et.al. | 2405.19950 | null |
2024-05-30 | Instruction-Guided Visual Masking | Jinliang Zheng et.al. | 2405.19783 | link |
2024-05-29 | Thermodynamically Informed Multimodal Learning of High-Dimensional Free Energy Models in Molecular Coarse Graining | Blake R. Duschatko et.al. | 2405.19386 | null |
2024-06-09 | LLMs Meet Multimodal Generation and Editing: A Survey | Yingqing He et.al. | 2405.19334 | link |
2024-05-29 | Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare | Hanwei Zhu et.al. | 2405.19298 | link |
2024-05-31 | Benchmarking and Improving Detail Image Caption | Hongyuan Dong et.al. | 2405.19092 | link |
2024-05-29 | Topological Perspectives on Optimal Multimodal Embedding Spaces | Abdul Aziz A. B et.al. | 2405.18867 | null |
2024-05-29 | Exploring Exotic Decays of the Higgs Boson to Multi-Photons at the LHC via Multimodal Learning Approaches | A. Hammad et.al. | 2405.18834 | null |
2024-05-28 | The Evolution of Multimodal Model Architectures | Shakti N. Wadekar et.al. | 2405.17927 | null |
2024-05-28 | Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment | Xin Xiao et.al. | 2405.17871 | link |
2024-05-28 | Full-Stack Allreduce on Multi-Rail Networks | Enda Yu et.al. | 2405.17870 | null |
2024-05-28 | MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance | Yake Wei et.al. | 2405.17730 | link |
2024-05-27 | Matryoshka Multimodal Models | Mu Cai et.al. | 2405.17430 | null |
2024-05-27 | XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser | Xianfu Cheng et.al. | 2405.17336 | link |
2024-05-28 | LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding | Haoyu Zhao et.al. | 2405.17104 | null |
2024-05-27 | Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning | Zihua Zhao et.al. | 2405.16996 | link |
2024-05-27 | Multilingual Diversity Improves Vision-Language Representations | Thao Nguyen et.al. | 2405.16915 | null |
2024-05-26 | Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs | Mustafa Shukor et.al. | 2405.16700 | link |
2024-05-25 | How Well Do Deep Learning Models Capture Human Concepts? The Case of the Typicality Effect | Siddhartha K. Vemuri et.al. | 2405.16128 | null |
2024-05-24 | ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models | Chunjiang Ge et.al. | 2405.15738 | link |
2024-05-24 | Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models | Yongsheng Yu et.al. | 2405.15687 | null |
2024-05-24 | M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models | Hongyu Wang et.al. | 2405.15638 | link |
2024-05-24 | DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception | Run Luo et.al. | 2405.15232 | link |
2024-05-24 | Shopping Queries Image Dataset (SQID): An Image-Enriched ESCI Dataset for Exploring Multimodal Learning in Product Search | Marie Al Ghossein et.al. | 2405.15190 | link |
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2024-12-19 | DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation | Wang Zhao et.al. | 2412.15200 | null |
2024-12-18 | On the principle of linearized stability for quasilinear evolution equations in time-weighted spaces | Bogdan-Vasile Matioc et.al. | 2412.13940 | null |
2024-12-17 | On the Bäcklund transform and the stability of the line soliton of the KP-II equation on | Lorenzo Pompili et.al. | 2412.12530 | null |
2024-12-13 | On the embedding of weighted Sobolev spaces with applications to a planar nonlinear Schrödinger equation | Antonio Azzolini et.al. | 2412.10067 | null |
2024-12-12 | Modified scattering for the cubic dispersion-managed NLS | Jason Murphy et.al. | 2412.09762 | null |
2024-12-12 | LoRACLR: Contrastive Adaptation for Customization of Diffusion Models | Enis Simsar et.al. | 2412.09622 | null |
2024-12-11 | Exploring superconformal Yang-Mills theories through matrix Bessel kernels | Zoltan Bajnok et.al. | 2412.08732 | null |
2024-12-09 | Bilinear singular integral operators with kernels in weighted spaces | Petr Honzík et.al. | 2412.07014 | null |
2024-12-04 | Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach | Lingchen Sun et.al. | 2412.03017 | link |
2024-11-21 | Strong localization blurs criticality of time series for spreading phenomena on networks | Juliane T. Moraes et.al. | 2412.01842 | null |
2024-12-02 | Geometric invariant theory and stretched Kostka quasi-polynomials | Marc Besson et.al. | 2412.01651 | null |
2024-11-29 | Origin-Destination Demand Prediction: An Urban Radiation and Attraction Perspective | Xuan Ma et.al. | 2412.00167 | null |
2024-11-29 | Rényi complexity in mean-field disordered systems | Nina Javerzat et.al. | 2411.19817 | null |
2024-11-28 | An Extensive Evaluation of Factual Consistency in Large Language Models for Data-to-Text Generation | Joy Mahapatra et.al. | 2411.19203 | null |
2024-11-27 | Task Arithmetic Through The Lens Of One-Shot Federated Learning | Zhixu Tao et.al. | 2411.18607 | null |
2024-11-25 | Spectral properties of Lévy Fokker--Planck equations | Hardy Chan et.al. | 2411.16424 | null |
2024-11-20 | Nonlinear orbital stability of stationary shock profiles for the Lax-Wendroff scheme | Jean-François Coulombel et.al. | 2411.13094 | null |
2024-11-26 | Enhancing generalization in high energy physics using white-box adversarial attacks | Franck Rothen et.al. | 2411.09296 | null |
2024-11-11 | Minimal nilpotent finite | Genqiang Liu et.al. | 2411.06768 | null |
2024-11-07 | Well-Posedness and Regularity of the Heat Equation with Robin Boundary Conditions in the Two-Dimensional Wedge | Marco Bravin et.al. | 2411.04651 | null |
2024-11-04 | SALSA: Soup-based Alignment Learning for Stronger Adaptation in RLHF | Atoosa Chegini et.al. | 2411.01798 | null |
2024-12-06 | Modular Duality in Deep Learning | Jeremy Bernstein et.al. | 2410.21265 | null |
2024-10-26 | MarDini: Masked Autoregressive Diffusion for Video Generation at Scale | Haozhe Liu et.al. | 2410.20280 | null |
2024-10-25 | Four-parameter Mittag-Leffler functions and their associated coherent states | Dušan Popov et.al. | 2410.19462 | null |
2024-10-24 | Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation | Krzysztof Ociepa et.al. | 2410.18565 | null |
2024-10-21 | Two dimensional delta Bose gas in a weighted space | Sudheesh Surendranath et.al. | 2410.16550 | null |
2024-10-21 | In Search of the Successful Interpolation: On the Role of Sharpness in CLIP Generalization | Alireza Abdollahpoorrostam et.al. | 2410.16476 | link |
2024-10-23 | Universal approximation results for neural networks with non-polynomial activation function over non-compact domains | Ariel Neufeld et.al. | 2410.14759 | null |
2024-10-23 | Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching | Jie Peng et.al. | 2410.14740 | null |
2024-10-16 | Differential Shape Optimization with Image Representation for Photonic Design | Zhaocheng Liu et.al. | 2410.13074 | null |
2024-10-15 | Scaling Laws for Multilingual Language Models | Yifei He et.al. | 2410.12883 | null |
2024-10-16 | AutoSimTTF: A Fully Automatic Pipeline for Electric Field Simulation and Treatment Planning of Tumor Treating Fields | Minmin Wang et.al. | 2410.12196 | null |
2024-10-15 | Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence | Shangbin Feng et.al. | 2410.11163 | null |
2024-10-14 | Deep Linear Probe Generators for Weight Space Learning | Jonathan Kahana et.al. | 2410.10811 | null |
2024-10-14 | Generating Model Parameters for Controlling: Parameter Diffusion for Controllable Multi-Task Recommendation | Chenglei Shen et.al. | 2410.10639 | null |
2024-10-14 | MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer | Minghao Zhu et.al. | 2410.10589 | link |
2024-10-15 | Regions of Level | Yanru Chen et.al. | 2410.10198 | null |
2024-10-13 | A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning | Chen-Yu Liu et.al. | 2410.09846 | null |
2024-10-11 | Meta-Transfer Learning Empowered Temporal Graph Networks for Cross-City Real Estate Appraisal | Weijia Zhang et.al. | 2410.08947 | null |
2024-10-09 | Efficient Weight-Space Laplace-Gaussian Filtering and Smoothing for Sequential Deep Learning | Joanna Sliwa et.al. | 2410.06800 | null |
2024-10-09 | Revisiting Multi-Permutation Equivariance through the Lens of Irreducible Representations | Yonatan Sverdlov et.al. | 2410.06665 | link |
2024-10-08 | Weighted Embeddings for Low-Dimensional Graph Representation | Thomas Bläsius et.al. | 2410.06042 | null |
2024-10-05 | Computing ground states of Bose-Einstein condensation by normalized deep neural network | Weizhu Bao et.al. | 2410.05319 | link |
2024-10-07 | Hyper-Representations: Learning from Populations of Neural Networks | Konstantin Schürholt et.al. | 2410.05107 | link |
2024-10-06 | Integrable Modules of Map full Toroidal Lie Algebras | Pradeep Bisht et.al. | 2410.04495 | null |
2024-10-06 | Global well-posedness for the defocusing 3D quadratic NLS in the sharp critical space | Jia Shen et.al. | 2410.04337 | null |
2024-10-05 | Equivariant Neural Functional Networks for Transformers | Viet-Hoang Tran et.al. | 2410.04209 | null |
2024-10-15 | Learning on LoRAs: GL-Equivariant Processing of Low-Rank Weight Spaces for Large Finetuned Models | Theo Putterman et.al. | 2410.04207 | null |
2024-10-04 | Measuring and Controlling Solution Degeneracy across Task-Trained Recurrent Neural Networks | Ann Huang et.al. | 2410.03972 | null |
2024-10-04 | Autoregressive Moving-average Attention Mechanism for Time Series Forecasting | Jiecheng Lu et.al. | 2410.03159 | link |
2024-10-02 | Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets | Yuandong Tian et.al. | 2410.01779 | link |
2024-10-01 | SynCOM: A tool for simulating coronal outflows | Valmir Moraes Filho et.al. | 2410.01004 | null |
2024-10-01 | On the prime ideals of higher secant varieties of Veronese embeddings of small degrees | Katsuhisa Furukawa et.al. | 2410.00652 | null |
2024-09-30 | Old Optimizer, New Norm: An Anthology | Jeremy Bernstein et.al. | 2409.20325 | null |
2024-09-27 | Effects of Peierls phases in open linear chains | Anselmo M. Marques et.al. | 2409.18780 | null |
2024-09-27 | Density of states in neural networks: an in-depth exploration of learning in parameter space | Margherita Mele et.al. | 2409.18683 | null |
2024-09-26 | The time periodic problem for the Navier-Stokes equations in exterior domains in weighted spaces | Reinhard Farwig et.al. | 2409.17590 | null |
2024-09-25 | Scalable Ensemble Diversification for OOD Generalization and Detection | Alexander Rubinstein et.al. | 2409.16797 | null |
2024-10-04 | Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Transfer Learning (PETL) in Visual Recognition | Zheda Mai et.al. | 2409.16434 | link |
2024-09-24 | VascX Models: Model Ensembles for Retinal Vascular Analysis from Color Fundus Images | Jose Vargas Quiros et.al. | 2409.16016 | link |
2024-09-23 | Efficient Large-Scale Quantum Optimization via Counterdiabatic Ansatz | Jie Liu et.al. | 2409.15055 | null |
2024-09-24 | Weighted Approximation By Max-Product Generalized Exponential Sampling Series | Satyaranjan Pradhan et.al. | 2409.14884 | null |
2024-09-21 | Weakly magnetized black holes in Einstein-ModMax theory | Haryanto M. Siahaan et.al. | 2409.13967 | null |
2024-09-18 | Monomial Matrix Group Equivariant Neural Functional Networks | Hoang V. Tran et.al. | 2409.11697 | link |
2024-09-17 | Existence of an extremal function of Sobolev critical embedding with an | Petr Gurka et.al. | 2409.11193 | null |
2024-09-16 | Inferring stellar parameters and their uncertainties from high-resolution spectroscopy using invertible neural networks | Nils Candebat et.al. | 2409.10621 | null |
2024-09-13 | Non-unitary Wightman CFTs and non-unitary vertex algebras | Sebastiano Carpi et.al. | 2409.08454 | null |
2024-09-12 | Global well-posedness and scattering in weighted space for nonlinear Schrödinger equations below the Strauss exponent without gauge-invariance | Masaki Kawamoto et.al. | 2409.08432 | null |
2024-09-09 | Fast gradient-free optimization of excitations in variational quantum eigensolvers | Jonas Jäger et.al. | 2409.05939 | null |
2024-09-06 | SCARF: Scalable Continual Learning Framework for Memory-efficient Multiple Neural Radiance Fields | Yuze Wang et.al. | 2409.04482 | null |
2024-09-04 | Federated Quantum-Train with Batched Parameter Generation | Chen-Yu Liu et.al. | 2409.02763 | null |
2024-09-16 | Regret Analysis for Randomized Gaussian Process Upper Confidence Bound | Shion Takeno et.al. | 2409.00979 | null |
2024-08-30 | Abstracted Gaussian Prototypes for One-Shot Concept Learning | Chelsea Zou et.al. | 2408.17251 | link |
2024-08-23 | Emergence of global receptive fields capturing multipartite quantum correlations | Oleg M. Sotnikov et.al. | 2408.13033 | null |
2024-08-22 | Action of $\mathfrak{osp}(1 \vert 2n)$ on polynomials tensor $\mathbb{C}^{0 \vert 2n}$ | Dwight Anderson Williams II et.al. |
2024-08-19 | Unimodal sequences and mixed false theta functions | Kevin Allen et.al. | 2408.09789 | null |
2024-08-16 | Onsager-Machlup functional for stochastic lattice dynamical systems driven by time-varying noise | Xinze Zhang et.al. | 2408.08465 | null |
2024-08-10 | Variational Inference Failures Under Model Symmetries: Permutation Invariant Posteriors for Bayesian Neural Networks | Yoav Gelberg et.al. | 2408.05496 | null |
2024-08-09 | Quasilinear parabolic equations with superlinear nonlinearities in critical spaces | Bogdan-Vasile Matioc et.al. | 2408.05067 | null |
2024-08-08 | A framework for generalizing toric inequalities for holographic entanglement entropy | Ning Bao et.al. | 2408.04741 | null |
2024-08-07 | Counterfactuals and Uncertainty-Based Explainable Paradigm for the Automated Detection and Segmentation of Renal Cysts in Computed Tomography Images: A Multi-Center Study | Zohaib Salahuddin et.al. | 2408.03789 | null |
2024-08-05 | BOTS-LM: Training Large Language Models for Setswana | Nathan Brown et.al. | 2408.02239 | null |
2024-08-02 | Conditional LoRA Parameter Generation | Xiaolong Jin et.al. | 2408.01415 | null |
2024-08-01 | Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization | Róisín Luo et.al. | 2408.00923 | null |
2024-07-31 | Semantic Codebook Learning for Dynamic Recommendation Models | Zheqi Lv et.al. | 2408.00123 | null |
2024-07-29 | Tensor product weight modules over the affine-Virasoro algebra | Qiu-Fan Chen et.al. | 2407.19844 | null |
2024-07-24 | Generalized Hilbert operators acting on weighted spaces of holomorphic functions with sup-norms | María J. Beltrán-Meneu et.al. | 2407.17646 | null |
2024-07-24 | Generalized Ordinal Priority Approach for Multi-Attribute Decision-Making under Incomplete Preference Information | Renlong Wang et.al. | 2407.17099 | null |
2024-07-22 | WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation | Zirui Shao et.al. | 2407.15502 | link |
2024-07-18 | FSP-Laplace: Function-Space Priors for the Laplace Approximation in Bayesian Deep Learning | Tristan Cinquin et.al. | 2407.13711 | null |
2024-07-19 | Parameter Generation of Quantum Approximate Optimization Algorithm with Diffusion Model | Fanxu Meng et.al. | 2407.12242 | null |
2024-07-24 | Effect Heterogeneity with Earth Observation in Randomized Controlled Trials: Exploring the Role of Data, Model, and Evaluation Metric Choice | Connor T. Jerzak et.al. | 2407.11674 | link |
2024-07-15 | Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion | Yongyuan Liang et.al. | 2407.10973 | null |
2024-07-16 | The well-posedness of generalized nonlinear wave equations on the lattice graph | Bobo Hua et.al. | 2407.09815 | null |
2024-07-15 | Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization | Jinlong Li et.al. | 2407.08374 | null |
2024-07-09 | Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic | Ruochen Jin et.al. | 2407.07089 | link |
2024-07-04 | Recovering Initial States in Semilinear Parabolic Problems from Time-Averages | Lina Sophie Schmitz et.al. | 2407.03829 | null |
2024-07-01 | A quantum deformation of the | H. Awata et.al. | 2407.00901 | null |
2024-06-24 | WARP: On the Benefits of Weight Averaged Rewarded Policies | Alexandre Ramé et.al. | 2406.16768 | null |
2024-06-24 | Improving robustness to corruptions with multiplicative weight perturbations | Trung Trinh et.al. | 2406.16540 | link |
2024-06-21 | Determination of certain mod | Abhik Ganguli et.al. | 2406.15600 | null |
2024-06-21 | Elliptic analysis on collapsing gravitational instantons modelled using the Gibbons-Hawking ansatz | Willem Adriaan Salm et.al. | 2406.15008 | null |
2024-06-20 | MEAT: Median-Ensemble Adversarial Training for Improving Robustness and Generalization | Zhaozhe Hu et.al. | 2406.14259 | link |
2024-06-18 | From Instance Training to Instruction Learning: Task Adapters Generation from Instructions | Huanxuan Liao et.al. | 2406.12382 | link |
2024-06-17 | Kaniadakis entropy in extreme gravitational and cosmological environments: a review on the state-of-the-art and future prospects | Giuseppe Gaetano Luciano et.al. | 2406.11373 | null |
2024-06-16 | Analysis and approximation of elliptic problems with Uhlenbeck structure in convex polytopes | Tadele Mengesha et.al. | 2406.10762 | null |
2024-06-14 | Towards Scalable and Versatile Weight Space Learning | Konstantin Schürholt et.al. | 2406.09997 | link |
2024-06-13 | Interpreting the Weight Space of Customized Diffusion Models | Amil Dravid et.al. | 2406.09413 | link |
2024-06-12 | Diffusion Soup: Model Merging for Text-to-Image Diffusion Models | Benjamin Biggs et.al. | 2406.08431 | null |
2024-06-24 | Cartan monopoles | Andrei Smilga et.al. | 2406.06042 | null |
2024-06-08 | Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models | Minho Park et.al. | 2406.05432 | link |
2024-06-06 | Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks | Tristan Cinquin et.al. | 2406.04317 | null |
2024-06-06 | A characterization of | Lucas Backes et.al. | 2406.04126 | null |
2024-06-05 | Reproducing Kernel Thesis of Hankel Operators on Weighted Hardy Spaces | Ana Čolović et.al. | 2406.03106 | null |
2024-05-21 | Backpropogation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration | Wei Ji et.al. | 2406.01601 | null |
2024-05-29 | Thermodynamics of the most generalized form of Holographic Dark Energy and some particular cases with Corrected Entropies | Sanghati Saha et.al. | 2405.20783 | null |
2024-06-20 | The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof | Derek Lim et.al. | 2405.20231 | link |
2024-05-28 | Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography | Jie Liu et.al. | 2405.18356 | link |
2024-05-28 | | Donato Crisostomi et.al. | 2405.17897 | link |
2024-05-27 | Smoothing effects and extinction in finite time for fractional fast diffusions on Riemannian manifolds | Elvise Berchio et.al. | 2405.17126 | null |
2024-05-31 | FedSheafHN: Personalized Federated Learning on Graph-structured Data | Wenfei Liang et.al. | 2405.16056 | null |
2024-05-27 | HyperInterval: Hypernetwork approach to training weight interval regions in continual learning | Patryk Krukowski et.al. | 2405.15444 | link |
2024-05-23 | Scalable Optimization in the Modular Norm | Tim Large et.al. | 2405.14813 | link |
2024-06-16 | A refined Weyl character formula for comodules on | Helge Øystein Maakestad et.al. | 2405.09210 | null |
2024-05-13 | Localizing Task Information for Improved Model Merging and Compression | Ke Wang et.al. | 2405.07813 | link |
2024-05-13 | | Rafael Kourdis et.al. | 2405.07769 | null |
2024-05-12 | Approximation by a new sequence of operators involving Laguerre polynomials | Kapil Kumar et.al. | 2405.07228 | null |
2024-05-06 | Swarm intelligence for full Stokes dynamic imaging reconstruction of interferometric data | Alejandro Mus et.al. | 2405.03330 | null |
2024-05-04 | Large Deviation Principles of Invariant Measures of Stochastic Reaction-Diffusion Lattice Systems | Bixiang Wang et.al. | 2405.02720 | null |
2024-05-03 | The Immersed Inextensible Interface Problem in 2D Stokes Flow | Eduardo García-Juárez et.al. | 2405.02446 | null |
2024-05-02 | Customizing Text-to-Image Models with a Single Image Pair | Maxwell Jones et.al. | 2405.01536 | null |
2024-04-25 | Robust Fine-tuning for Pre-trained 3D Point Cloud Models | Zhibo Zhang et.al. | 2404.16422 | null |
2024-04-23 | The Geometry of the Set of Equivalent Linear Neural Networks | Jonathan Richard Shewchuk et.al. | 2404.14855 | null |
2024-04-24 | Nonexistence of solutions to parabolic problems with a potential on weighted graphs | Dario D. Monticelli et.al. | 2404.12058 | null |
2024-04-17 | On the relaxation to equilibrium of a quantum oscillator interacting with a radiation field | Pierre-A. Vuillermot et.al. | 2404.11329 | null |
2024-04-15 | Higher-curvature gravity in AdS | Mariano Chernicoff et.al. | 2404.10128 | null |
2024-04-16 | Asymptotic-preserving approximations for stochastic incompressible viscous fluids and SPDEs on graph | Jianbo Cui et.al. | 2404.09168 | null |
2024-04-09 | Perspective on Physical Interpretations of Rényi Entropy in Statistical Mechanics | Misaki Ozawa et.al. | 2404.06436 | null |
2024-04-09 | A gluing construction of singular solutions for a fully non-linear equation in conformal geometry | María Fernanda Espinal et.al. | 2404.05965 | null |
2024-04-05 | Dissipative Euler flows originating from circular vortex filaments | Francisco Gancedo et.al. | 2404.04250 | null |
2024-04-05 | Macdonald characters from a new formula for Macdonald polynomials | Houcine Ben Dali et.al. | 2404.03904 | null |
2024-04-04 | Fundamental inequalities for the iterated Fourier-cosine convolution with Gaussian weight and its application | Nguyen Thi Hong Phuong et.al. | 2404.03609 | null |
2024-03-29 | Embracing Unknown Step by Step: Towards Reliable Sparse Training in Real World | Bowen Lei et.al. | 2403.20047 | link |
2024-03-28 | Model Stock: All we need is just a few fine-tuned models | Dong-Hwan Jang et.al. | 2403.19522 | link |
2024-03-26 | A location Invariant Statistic-Based Consistent Estimation Method for Three-Parameter Generalized Exponential Distribution | Kiran Prajapat et.al. | 2403.17609 | null |
2024-06-03 | FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis | Santosh Sanjeev et.al. | 2403.13341 | link |
2024-06-18 | Learning Useful Representations of Recurrent Neural Network Weight Matrices | Vincent Herrmann et.al. | 2403.11998 | link |
2024-03-16 | Function-space Parameterization of Neural Networks for Sequential Learning | Aidan Scannell et.al. | 2403.10929 | link |
2024-03-14 | Imprints of Barrow-Tsallis Cosmology in Primordial Gravitational Waves | Petr Jizba et.al. | 2403.09797 | null |
2024-03-14 | Eigenvariety for partially classical Hilbert modular forms | Mladen Dimitrov et.al. | 2403.09784 | null |
2024-03-12 | The solenoidal Heisenberg Virasoro algebra and its simple weight modules | Boujemaa Agrebaoui et.al. | 2403.07381 | null |
2024-03-10 | FrameQuant: Flexible Low-Bit Quantization for Transformers | Harshavardhan Adepu et.al. | 2403.06082 | link |
2024-03-06 | The solenoidal Virasoro algebra and its simple weight modules | Boujemaa Agrebaoui et.al. | 2403.03753 | null |
2024-03-05 | Tensor Decomposition-based Time Varying Channel Estimation for mmWave MIMO-OFDM Systems | Ruizhe Wang et.al. | 2403.02942 | null |
2024-03-05 | Neural Redshift: Random Networks are not Random Functions | Damien Teney et.al. | 2403.02241 | null |
2024-03-04 | Tiny fluctuations of the averaging process around its degenerate steady state | Federico Sau et.al. | 2403.02032 | null |
2024-03-15 | Training-Free Pretrained Model Merging | Zhengqi Xu et.al. | 2403.01753 | link |
2024-04-22 | HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances | Supreeth Narasimhaswamy et.al. | 2403.01693 | null |
2024-03-13 | TOOLVERIFIER: Generalization to New Tools via Self-Verification | Dheeraj Mekala et.al. | 2402.14158 | link |
2024-02-21 | Computing Tangent Spaces to Eigenvarieties | James Rawson et.al. | 2402.13799 | null |
2024-05-28 | Neural Network Parameter Diffusion | Kai Wang et.al. | 2402.13144 | link |
2024-02-19 | Exponential attractors for a nonlocal delayed reaction-diffusion equation on an unbounded domain | Wenjie Hu et.al. | 2402.11856 | null |
2024-02-18 | Discrete Neural Algorithmic Reasoning | Gleb Rodionov et.al. | 2402.11628 | link |
2024-02-17 | Uncertainty Quantification of Graph Convolution Neural Network Models of Evolving Processes | Jeremiah Hauth et.al. | 2402.11179 | null |
2024-06-06 | Generalizability of Mixture of Domain-Specific Adapters from the Lens of Signed Weight Directions and its Application to Effective Model Pruning | Tuc Nguyen et.al. | 2402.10639 | null |
2024-02-14 | TAI-GAN: A Temporally and Anatomically Informed Generative Adversarial Network for early-to-late frame conversion in dynamic cardiac PET inter-frame motion correction | Xueqi Guo et.al. | 2402.09567 | null |
2024-02-14 | The cohomology of | Alexander B. Ivanov et.al. | 2402.09017 | null |
2024-02-09 | The Asymptotic Structure of Cosmological Integrals | Paolo Benincasa et.al. | 2402.06558 | null |
2024-02-07 | Universal Neural Functionals | Allan Zhou et.al. | 2402.05232 | link |
2024-02-06 | Maximal regularity and optimal control for a non-local Cahn-Hilliard tumour growth model | Matteo Fornoni et.al. | 2402.04204 | null |
2024-02-06 | Improved Generalization of Weight Space Networks via Augmentations | Aviv Shamsian et.al. | 2402.04081 | link |
2024-02-02 | Training-time Neuron Alignment through Permutation Subspace for Improving Linear Mode Connectivity and Model Fusion | Zexi Li et.al. | 2402.01342 | null |
2024-02-01 | Understanding Neural Network Systems for Image Analysis using Vector Spaces and Inverse Maps | Rebecca Pattichis et.al. | 2402.00261 | link |
2024-01-26 | Do deep neural networks utilize the weight space efficiently? | Onur Can Koyun et.al. | 2401.16438 | null |
2024-01-22 | On strong growth conditions for weighted spaces of entire functions | Gerhard Schindl et.al. | 2401.14330 | null |
2024-01-24 | Task structure and nonlinearity jointly determine learned representational geometry | Matteo Alleman et.al. | 2401.13558 | null |
2024-01-25 | Sparse Domination of Singular Bilinear Forms on Non-Homogeneous spaces | Paco Villarroya et.al. | 2401.13130 | null |
2024-01-22 | WARM: On the Benefits of Weight Averaged Reward Models | Alexandre Ramé et.al. | 2401.12187 | null |
2024-01-17 | Cesàro operators associated with Borel measures acting on weighted spaces of holomorphic functions with sup-norm | Maria José Beltrán Meneu et.al. | 2401.09406 | null |
2024-01-15 | Singular fractal dimension at periodicity cascades in parameters spaces | Carlos E. P. Abreu et.al. | 2401.07648 | null |
2024-01-17 | Computing Fringe Presentations of Multigraded Persistence Modules | Fabian Lenzen et.al. | 2401.06008 | null |
2024-01-10 | Grimoire is All You Need for Enhancing Large Language Models | Ding Chen et.al. | 2401.03385 | link |
2024-03-26 | Artificial Intelligence for Operations Research: Revolutionizing the Operations Research Process | Zhenan Fan et.al. | 2401.03244 | null |
2023-12-31 | A Compact Representation for Bayesian Neural Networks By Removing Permutation Symmetry | Tim Z. Xiao et.al. | 2401.00611 | link |
2023-12-28 | Fractional non-homogeneous counting process | Nick Laskin et.al. | 2312.17389 | null |
2023-12-28 | Some unimodal sequences of Kronecker coefficients | Alimzhan Amanov et.al. | 2312.17054 | null |
2023-12-24 | The Vlasov-Maxwell-Boltzmann/Landau system with polynomial perturbation near Maxwellian | Chuqi Cao et.al. | 2312.15510 | null |
2023-12-22 | Emage: Non-Autoregressive Text-to-Image Generation | Zhangyin Feng et.al. | 2312.14988 | null |
2023-12-21 | Hypercyclic shifts on lattice graphs | Anton Baranov et.al. | 2312.13934 | null |
2023-12-21 | Scattering for 2d semi-relativistic Hartree equations with short range potential | Changhun Yang et.al. | 2312.13606 | null |
2023-12-21 | Entropic Inflation in Presence of Scalar Field | Sergei D. Odintsov et.al. | 2312.13587 | null |
2023-12-30 | Time is Encoded in the Weights of Finetuned Language Models | Kai Nylund et.al. | 2312.13401 | link |
2023-12-14 | Efficient momentum space approach to superconductivity in quasiperiodic systems | Mao Yoshii et.al. | 2312.09124 | null |
2023-12-13 | Best one-sided algebraic approximation by average modulus | Raheam A. Al-Saphory et.al. | 2312.08407 | null |
2023-12-19 | Well-Posedness of Quasilinear Parabolic Equations in Time-Weighted Spaces | Bogdan Matioc et.al. | 2312.07974 | null |
2023-12-12 | Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models | Arnav Chavan et.al. | 2312.07046 | link |
2023-12-11 | Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks | MohammadReza Davari et.al. | 2312.06795 | null |
2023-12-08 | Stoichiometry preservation and generalization of Bilger mixture fraction for non-premixed combustion with differential molecular diffusion | Haifeng Wang et.al. | 2312.05204 | null |
2023-12-01 | New polyconvolution product for Fourier-cosine and Laplace integral operators and their applications | Trinh Tuan et.al. | 2312.00764 | null |
2023-11-30 | Modelling Einstein cluster using Einasto profile | Ritwik Acharyya et.al. | 2311.18622 | null |
2023-11-27 | Extraction of the microscopic properties of quasi-particles using deep neural networks | Olga Soloveva et.al. | 2311.15984 | null |
2024-01-24 | Deep Latent Force Models: ODE-based Process Convolutions for Bayesian Deep Learning | Thomas Baldwin-McDonald et.al. | 2311.14828 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-10-25 | FLiP: Privacy-Preserving Federated Learning based on the Principle of Least Privileg | ShiMao Xu et.al. | 2410.19548 | null |
2024-10-25 | SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models | Jahyun Koo et.al. | 2410.19503 | null |
2024-10-24 | AlignCap: Aligning Speech Emotion Captioning to Human Preferences | Ziqi Liang et.al. | 2410.19134 | null |
2024-10-24 | High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws | M. Emrullah Ildiz et.al. | 2410.18837 | null |
2024-10-24 | Knowledge Distillation Using Frontier Open-source LLMs: Generalizability and the Role of Synthetic Data | Anup Shirgaonkar et.al. | 2410.18588 | null |
2024-10-24 | SIKeD: Self-guided Iterative Knowledge Distillation for mathematical reasoning | Shivam Adarsh et.al. | 2410.18574 | link |
2024-10-23 | ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams | Srija Anand et.al. | 2410.17901 | null |
2024-10-23 | Towards Active Participant-Centric Vertical Federated Learning: Some Representations May Be All You Need | Jon Irureta et.al. | 2410.17648 | null |
2024-10-23 | Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation | Muquan Li et.al. | 2410.17606 | link |
2024-10-23 | Physics-driven AI for Channel Estimation in Cellular Network | Xiaoqian Qi et.al. | 2410.17525 | null |
2024-10-22 | MiniPLM: Knowledge Distillation for Pre-Training Language Models | Yuxian Gu et.al. | 2410.17215 | link |
2024-10-22 | Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios | Kai Wang et.al. | 2410.17193 | link |
2024-10-22 | CK4Gen: A Knowledge Distillation Framework for Generating High-Utility Synthetic Survival Datasets in Healthcare | Nicholas I-Hsien Kuo et.al. | 2410.16872 | null |
2024-10-22 | AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models | Yongjian Wu et.al. | 2410.16820 | link |
2024-10-22 | SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation | Jing-Jing Li et.al. | 2410.16665 | null |
2024-10-21 | Pre-training Distillation for Large Language Models: A Design Space Exploration | Hao Peng et.al. | 2410.16215 | null |
2024-10-18 | Interpreting Microbiome Relative Abundance Data Using Symbolic Regression | Swagatam Haldar et.al. | 2410.16109 | link |
2024-10-21 | Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation? | Lingao Xiao et.al. | 2410.15919 | link |
2024-10-21 | Model Mimic Attack: Knowledge Distillation for Provably Transferable Adversarial Examples | Kirill Lukyanov et.al. | 2410.15889 | null |
2024-10-20 | Hybrid Memory Replay: Blending Real and Distilled Data for Class Incremental Learning | Jiangtao Kong et.al. | 2410.15372 | null |
2024-10-20 | GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning | Haiwen Diao et.al. | 2410.15266 | link |
2024-10-19 | LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound | Xuechen Guo et.al. | 2410.15074 | null |
2024-10-19 | Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS | Tuan Nam Nguyen et.al. | 2410.14997 | null |
2024-10-17 | CAKD: A Correlation-Aware Knowledge Distillation Framework Based on Decoupling Kullback-Leibler Divergence | Zao Zhang et.al. | 2410.14741 | null |
2024-10-18 | Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation | Shuai Zhao et.al. | 2410.14425 | link |
2024-10-18 | Preview-based Category Contrastive Learning for Knowledge Distillation | Muhe Ding et.al. | 2410.14143 | null |
2024-10-17 | Leveraging Fine-Tuned Language Models for Efficient and Accurate Smart Contract Auditing | Zhiyuan Wei et.al. | 2410.13918 | link |
2024-10-17 | GDeR: Safeguarding Efficiency, Balancing, and Robustness via Prototypical Graph Pruning | Guibin Zhang et.al. | 2410.13761 | link |
2024-10-17 | An Active Learning Framework for Inclusive Generation by Large Language Models | Sabit Hassan et.al. | 2410.13641 | null |
2024-10-18 | Towards Satellite Non-IID Imagery: A Spectral Clustering-Assisted Federated Learning Approach | Luyao Zou et.al. | 2410.13602 | null |
2024-10-17 | Enhancing Dataset Distillation via Label Inconsistency Elimination and Learning Pattern Refinement | Chuhao Zhou et.al. | 2410.13311 | link |
2024-10-18 | Cyber Attacks Prevention Towards Prosumer-based EV Charging Stations: An Edge-assisted Federated Prototype Knowledge Distillation Approach | Luyao Zou et.al. | 2410.13260 | null |
2024-10-16 | TAS: Distilling Arbitrary Teacher and Student via a Hybrid Assistant | Guopeng Li et.al. | 2410.12342 | null |
2024-10-16 | Optimizing YOLOv5s Object Detection through Knowledge Distillation algorithm | Guanming Huang et.al. | 2410.12259 | null |
2024-10-16 | TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration | Yiwei Guo et.al. | 2410.12183 | link |
2024-10-17 | SAM-Guided Masked Token Prediction for 3D Scene Understanding | Zhimin Chen et.al. | 2410.12158 | null |
2024-10-15 | MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router | Yanyue Xie et.al. | 2410.12013 | null |
2024-10-15 | Breaking Modality Gap in RGBT Tracking: Coupled Knowledge Distillation | Andong Lu et.al. | 2410.11586 | link |
2024-10-15 | Learning from Imperfect Data: Towards Efficient Knowledge Distillation of Autoregressive Language Models for Text-to-SQL | Qihuang Zhong et.al. | 2410.11371 | null |
2024-10-15 | Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling | Wenda Xu et.al. | 2410.11325 | null |
2024-10-14 | BrainMVP: Multi-modal Vision Pre-training for Brain Image Analysis using Multi-parametric MRI | Shaohao Rui et.al. | 2410.10604 | null |
2024-10-14 | ROSAR: An Adversarial Re-Training Framework for Robust Side-Scan Sonar Object Detection | Martin Aubard et.al. | 2410.10554 | link |
2024-10-14 | Temperature-Centric Investigation of Speculative Decoding with Knowledge Distillation | Siru Ouyang et.al. | 2410.10141 | null |
2024-10-14 | REHRSeg: Unleashing the Power of Self-Supervised Super-Resolution for Resource-Efficient 3D MRI Segmentation | Zhiyun Song et.al. | 2410.10097 | null |
2024-10-15 | Self-Data Distillation for Recovering Quality in Pruned Large Language Models | Vithursan Thangarasa et.al. | 2410.09982 | null |
2024-10-13 | Generalized Group Data Attribution | Dan Ley et.al. | 2410.09940 | null |
2024-10-12 | Distilling Invariant Representations with Dual Augmentation | Nikolaos Giakoumoglou et.al. | 2410.09474 | null |
2024-10-12 | Declarative Knowledge Distillation from Large Language Models for Visual Question Answering Datasets | Thomas Eiter et.al. | 2410.09428 | link |
2024-10-15 | Transforming In-Vehicle Network Intrusion Detection: VAE-based Knowledge Distillation Meets Explainable AI | Muhammet Anil Yagiz et.al. | 2410.09043 | null |
2024-10-11 | Mentor-KD: Making Small Language Models Better Multi-step Reasoners | Hojae Lee et.al. | 2410.09037 | link |
2024-10-11 | Contrastive Knowledge Distillation for Robust Multimodal Sentiment Analysis | Zhongyi Sang et.al. | 2410.08692 | null |
2024-10-11 | DistDD: Distributed Data Distillation Aggregation through Gradient Matching | Peiran Wang et.al. | 2410.08665 | null |
2024-10-11 | GAI-Enabled Explainable Personalized Federated Semi-Supervised Learning | Yubo Peng et.al. | 2410.08634 | null |
2024-10-11 | Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both | Abhijnan Nath et.al. | 2410.08458 | null |
2024-10-10 | What is Left After Distillation? How Knowledge Transfer Impacts Fairness and Bias | Aida Mohammadshahi et.al. | 2410.08407 | null |
2024-10-10 | A Lightweight Target-Driven Network of Stereo Matching for Inland Waterways | Jing Su et.al. | 2410.07915 | null |
2024-10-10 | SNN-PAR: Energy Efficient Pedestrian Attribute Recognition via Spiking Neural Networks | Haiyang Wang et.al. | 2410.07857 | link |
2024-10-12 | Relational Diffusion Distillation for Efficient Image Generation | Weilun Feng et.al. | 2410.07679 | link |
2024-10-10 | Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching | Ruonan Yu et.al. | [2410.07579](http://arxiv.org/abs/2410.07 |