Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-12-16 | Wonderland: Navigating 3D Scenes from a Single Image | Hanwen Liang et.al. | 2412.12091v1 | null |
2024-12-16 | Instruction-based Image Manipulation by Watching How Things Move | Mingdeng Cao et.al. | 2412.12087v1 | null |
2024-12-16 | CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology | Yuxuan Sun et.al. | 2412.12077v1 | null |
2024-12-16 | CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding | Guo Chen et.al. | 2412.12075v1 | null |
2024-12-16 | Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation | Hongwei Niu et.al. | 2412.12050v1 | link |
2024-12-16 | Deep-learning-based identification of individual motion characteristics from upper-limb trajectories towards disorder stage evaluation | Tim Sziburis et.al. | 2412.12016v1 | null |
2024-12-16 | Cost-Effective Label-free Node Classification with LLMs | Taiyan Zhang et.al. | 2412.11983v1 | null |
2024-12-16 | On the Nielsen-Thomsen sequence | Laurent Cantier et.al. | 2412.11975v1 | null |
2024-12-16 | On vertex-transitive distance-regular covers of complete graphs with an extremal smallest eigenvalue | Ludmila Yu. Tsiovkina et.al. | 2412.11962v1 | null |
2024-12-16 | Gramian Multimodal Representation Learning and Alignment | Giordano Cicchetti et.al. | 2412.11959v1 | null |
2024-12-13 | UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities | Muhammad Uzair Khattak et.al. | 2412.10372v1 | link |
2024-12-13 | Apollo: An Exploration of Video Understanding in Large Multimodal Models | Orr Zohar et.al. | 2412.10360v1 | null |
2024-12-13 | Robust image classification with multi-modal large language models | Francesco Villani et.al. | 2412.10353v1 | null |
2024-12-13 | BrushEdit: All-In-One Image Inpainting and Editing | Yaowei Li et.al. | 2412.10316v1 | null |
2024-12-13 | Performance evaluation of predictive AI models to support medical decisions: Overview and guidance | Ben Van Calster et.al. | 2412.10288v1 | null |
2024-12-13 | TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation | Xingrui Wang et.al. | 2412.10275v1 | null |
2024-12-13 | Reasoner Outperforms: Generative Stance Detection with Rationalization for Social Media | Jiaqing Yuan et.al. | 2412.10266v1 | null |
2024-12-13 | Adversarial Robustness of Bottleneck Injected Deep Neural Networks for Task-Oriented Communication | Alireza Furutanpey et.al. | 2412.10265v1 | null |
2024-12-13 | MVQ: Towards Efficient DNN Compression and Acceleration with Masked Vector Quantization | Shuaiting Li et.al. | 2412.10261v1 | null |
2024-12-13 | Copy-Move Detection in Optical Microscopy: A Segmentation Network and A Dataset | Hao-Chiang Shao et.al. | 2412.10258v1 | null |
2024-12-12 | Doe-1: Closed-Loop Autonomous Driving with Large World Model | Wenzhao Zheng et.al. | 2412.09627v1 | link |
2024-12-12 | FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion | Haonan Qiu et.al. | 2412.09626v1 | null |
2024-12-12 | GenEx: Generating an Explorable World | Taiming Lu et.al. | 2412.09624v1 | null |
2024-12-12 | OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation | Weiqi Li et.al. | 2412.09623v1 | null |
2024-12-12 | Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos | Linyi Jin et.al. | 2412.09621v1 | null |
2024-12-12 | Learning Camera Movement Control from Real-World Drone Videos | Yunzhong Hou et.al. | 2412.09620v1 | null |
2024-12-12 | NormalFlow: Fast, Robust, and Accurate Contact-based Object 6DoF Pose Tracking with Vision-based Tactile Sensors | Hung-Jui Huang et.al. | 2412.09617v1 | link |
2024-12-12 | V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding | Junqi Ge et.al. | 2412.09616v1 | link |
2024-12-12 | PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models | Chenyu Yang et.al. | 2412.09613v1 | null |
2024-12-12 | Olympus: A Universal Task Router for Computer Vision Tasks | Yuanze Lin et.al. | 2412.09612v1 | link |
2024-12-11 | StreamChat: Chatting with Streaming Video | Jihao Liu et.al. | 2412.08646v1 | null |
2024-12-11 | Generative Semantic Communication: Architectures, Technologies, and Applications | Jinke Ren et.al. | 2412.08642v1 | null |
2024-12-11 | Multimodal Latent Language Modeling with Next-Token Diffusion | Yutao Sun et.al. | 2412.08635v1 | null |
2024-12-11 | MNIST-Fraction: Enhancing Math Education with AI-Driven Fraction Detection and Analysis | Pegah Ahadian et.al. | 2412.08633v1 | null |
2024-12-11 | Image Retrieval Methods in the Dissimilarity Space | Madhu Kiran et.al. | 2412.08618v1 | null |
2024-12-11 | CCSNscore: A multi-input deep learning tool for classification of core-collapse supernovae using SED-Machine spectra | Yashvi Sharma et.al. | 2412.08601v1 | null |
2024-12-11 | RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation | Mingfei Han et.al. | 2412.08591v1 | null |
2024-12-11 | SPACE-SUIT: An Artificial Intelligence based chromospheric feature extractor and classifier for SUIT | Pranava Seth et.al. | 2412.08589v1 | null |
2024-12-11 | Advancing Single- and Multi-task Text Classification through Large Language Model Fine-tuning | Hang Zhao et.al. | 2412.08587v1 | null |
2024-12-11 | Utilizing Multi-step Loss for Single Image Reflection Removal | Abdelrahman Elnenaey et.al. | 2412.08582v1 | link |
2024-12-10 | Video Motion Transfer with Diffusion Transformers | Alexander Pondaven et.al. | 2412.07776v1 | link |
2024-12-10 | UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics | Xi Chen et.al. | 2412.07774v1 | null |
2024-12-10 | From Slow Bidirectional to Fast Causal Video Generators | Tianwei Yin et.al. | 2412.07772v1 | null |
2024-12-10 | From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos | Matthew Wallingford et.al. | 2412.07770v1 | null |
2024-12-10 | Learning Visual Generative Priors without Text | Shuailei Ma et.al. | 2412.07767v1 | null |
2024-12-10 | Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation | Jingxi Chen et.al. | 2412.07761v1 | null |
2024-12-10 | SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints | Jianhong Bai et.al. | 2412.07760v1 | link |
2024-12-10 | 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation | Xiao Fu et.al. | 2412.07759v1 | null |
2024-12-10 | PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation | Fatemeh Nazarieh et.al. | 2412.07754v1 | null |
2024-12-10 | On Motion Blur and Deblurring in Visual Place Recognition | Timur Ismagilov et.al. | 2412.07751v1 | null |
2024-12-09 | [MASK] is All You Need | Vincent Tao Hu et.al. | 2412.06787v1 | link |
2024-12-09 | P3-PO: Prescriptive Point Priors for Visuo-Spatial Generalization of Robot Policies | Mara Levy et.al. | 2412.06784v1 | null |
2024-12-09 | Convolution goes higher-order: a biologically inspired mechanism empowers image classification | Simone Azeglio et.al. | 2412.06740v1 | null |
2024-12-09 | JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Takuro Fujii et.al. | 2412.06738v1 | null |
2024-12-09 | Demystifying shock breakout spectra | Christopher M. Irwin et.al. | 2412.06734v1 | null |
2024-12-09 | Parkinson's Disease Diagnosis Through Deep Learning: A Novel LSTM-Based Approach for Freezing of Gait Detection | Aqib Nazir Mir et.al. | 2412.06709v1 | null |
2024-12-09 | You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale | Baorui Ma et.al. | 2412.06699v1 | null |
2024-12-09 | FedSynthCT-Brain: A Federated Learning Framework for Multi-Institutional Brain MRI-to-CT Synthesis | Ciro Benito Raggio et.al. | 2412.06690v1 | null |
2024-12-09 | Impact of Privacy Parameters on Deep Learning Models for Image Classification | Basanta Chaulagain et.al. | 2412.06689v1 | null |
2024-12-09 | Diff5T: Benchmarking Human Brain Diffusion MRI with an Extensive 5.0 Tesla K-Space and Spatial Dataset | Shanshan Wang et.al. | 2412.06666v1 | null |
2024-12-06 | Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model | Lening Wang et.al. | 2412.05280v1 | link |
2024-12-06 | Sparse autoencoders reveal selective remapping of visual concepts during adaptation | Hyesu Lim et.al. | 2412.05276v1 | link |
2024-12-06 | MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models | Tuna Han Salih Meral et.al. | 2412.05275v1 | null |
2024-12-06 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Zhe Chen et.al. | 2412.05271v1 | null |
2024-12-06 | Mind the Time: Temporally-Controlled Multi-Event Video Generation | Ziyi Wu et.al. | 2412.05263v1 | null |
2024-12-06 | TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft | Qian Long et.al. | 2412.05255v1 | link |
2024-12-06 | Uncertainty Quantification for Transformer Models for Dark-Pattern Detection | Javier Muñoz et.al. | 2412.05251v1 | null |
2024-12-06 | ColonNet: A Hybrid Of DenseNet121 And U-NET Model For Detection And Segmentation Of GI Bleeding | Ayushman Singh et.al. | 2412.05216v1 | null |
2024-12-06 | LinVT: Empower Your Image-level Large Language Model to Understand Videos | Lishuai Gao et.al. | 2412.05185v1 | link |
2024-12-06 | DreamColour: Controllable Video Colour Editing without Training | Chaitat Utintu et.al. | 2412.05180v1 | null |
2024-12-05 | PaintScene4D: Consistent 4D Scene Generation from Text Prompts | Vinayak Gupta et.al. | 2412.04471v1 | null |
2024-12-05 | QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos | Sharath Girish et.al. | 2412.04469v1 | null |
2024-12-05 | NVILA: Efficient Frontier Visual Language Models | Zhijian Liu et.al. | 2412.04468v1 | null |
2024-12-05 | VisionZip: Longer is Better but Not Necessary in Vision Language Models | Senqiao Yang et.al. | 2412.04467v1 | link |
2024-12-05 | MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos | Zhengqi Li et.al. | 2412.04463v1 | null |
2024-12-05 | 4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion | Chaoyang Wang et.al. | 2412.04462v1 | null |
2024-12-05 | Four-Plane Factorized Video Autoencoders | Mohammed Suhail et.al. | 2412.04452v1 | null |
2024-12-05 | MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation | Longtao Zheng et.al. | 2412.04448v1 | null |
2024-12-05 | EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | Lu Qiu et.al. | 2412.04447v1 | null |
2024-12-05 | DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models | Yizhuo Li et.al. | 2412.04446v1 | null |
2024-12-04 | Navigation World Models | Amir Bar et.al. | 2412.03572v1 | null |
2024-12-04 | The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control | Ruili Feng et.al. | 2412.03568v1 | null |
2024-12-04 | Streaming Detection of Queried Event Start | Cristobal Eyzaguirre et.al. | 2412.03567v1 | null |
2024-12-04 | Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Wujian Peng et.al. | 2412.03565v1 | null |
2024-12-04 | From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents | Xinyi Mou et.al. | 2412.03563v1 | null |
2024-12-04 | Imagine360: Immersive 360 Video Generation from Perspective Anchor | Jing Tan et.al. | 2412.03552v1 | null |
2024-12-04 | Kibble-Zurek Dynamics & Statistics of Topological Defects in Chiral Superfluid $^3$He Films | Noble Gluscevich et.al. | 2412.03544v1 | null |
2024-12-04 | Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos | Hanxue Liang et.al. | 2412.03526v1 | null |
2024-12-04 | Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention | Hannan Lu et.al. | 2412.03520v1 | null |
2024-12-04 | Distillation of Diffusion Features for Semantic Correspondence | Frank Fundel et.al. | 2412.03512v1 | null |
2024-12-03 | Motion Prompting: Controlling Video Generation with Motion Trajectories | Daniel Geng et.al. | 2412.02700v1 | null |
2024-12-03 | An ADHD Diagnostic Interface Based on EEG Spectrograms and Deep Learning Techniques | Medha Pappula et.al. | 2412.02695v1 | null |
2024-12-03 | FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation | Kefan Chen et.al. | 2412.02690v1 | null |
2024-12-03 | AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction | Lingteng Qiu et.al. | 2412.02684v1 | null |
2024-12-03 | On Third-Order Evolution Systems Describing Pseudo-Spherical or Spherical Surfaces | Filipe Kelmer et.al. | 2412.02657v1 | null |
2024-12-03 | Robust soybean seed yield estimation using high-throughput ground robot videos | Jiale Feng et.al. | 2412.02642v1 | null |
2024-12-03 | QA-TOOLBOX: Conversational Question-Answering for process task guidance in manufacturing | Ramesh Manuvinakurike et.al. | 2412.02638v1 | null |
2024-12-03 | Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | Hiroki Furuta et.al. | 2412.02617v1 | null |
2024-12-03 | Interpretable Company Similarity with Sparse Autoencoders | Marco Molinari et.al. | 2412.02605v1 | null |
2024-12-03 | Efficient Algorithms for Low Tubal Rank Tensor Approximation with Applications to Image Compression, Super-Resolution and Deep Learning | Salman Ahmadi-Asl et.al. | 2412.02598v1 | null |
2024-12-02 | T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | Shukang Yin et.al. | 2411.19951v2 | link |
2024-11-29 | AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos | Yuze He et.al. | 2411.19950v1 | null |
2024-11-29 | Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Joseph Heyward et.al. | 2411.19941v1 | null |
2024-11-29 | SIMS: Simulating Human-Scene Interactions with Real World Script Planning | Wenjia Wang et.al. | 2411.19921v1 | null |
2024-11-29 | Noncommutative Model Selection for Data Clustering and Dimension Reduction Using Relative von Neumann Entropy | Araceli Guzmán-Tristán et.al. | 2411.19902v1 | null |
2024-11-29 | To the Problem of Cosmic Expansion in Massive Gravity | Lavinia Heisenberg et.al. | 2411.19873v1 | null |
2024-11-29 | AIDetx: a compression-based method for identification of machine-learning generated text | Leonardo Almeida et.al. | 2411.19869v1 | link |
2024-11-29 | Towards Class-wise Robustness Analysis | Tejaswini Medi et.al. | 2411.19853v1 | null |
2024-11-29 | Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation | Dimosthenis Antypas et.al. | 2411.19832v1 | null |
2024-11-29 | A new definition of outsplitting on | Mackenzie Amann et.al. | 2411.19816v1 | null |
2024-11-27 | GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data | Wentao Wang et.al. | 2411.18624v1 | null |
2024-11-27 | Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data | Aoran Shen et.al. | 2411.18622v1 | null |
2024-11-27 | CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models | Rundi Wu et.al. | 2411.18613v1 | null |
2024-11-27 | Novel Class Discovery for Open Set Raga Classification | Parampreet Singh et.al. | 2411.18611v1 | null |
2024-11-27 | Variability of hot sub-luminous stars and binaries: Machine learning analysis of Gaia DR3 multi-epoch photometry | P. Ranaivomanana et.al. | 2411.18609v1 | null |
2024-11-27 | Evaluating and Improving the Effectiveness of Synthetic Chest X-Rays for Medical Image Analysis | Eva Prakash et.al. | 2411.18602v1 | null |
2024-11-27 | Periodic symplectic and Hamiltonian diffeomorphisms on irrational ruled surfaces | Nicholas Lindsay et.al. | 2411.18580v1 | null |
2024-11-27 | Pruning Deep Convolutional Neural Network Using Conditional Mutual Information | Tien Vu-Van et.al. | 2411.18578v1 | null |
2024-11-27 | Exploring Depth Information for Detecting Manipulated Face Videos | Haoyue Wang et.al. | 2411.18572v1 | null |
2024-11-27 | Perturbation Ontology based Graph Attention Networks | Yichen Wang et.al. | 2411.18520v1 | null |
2024-11-26 | Video-Guided Foley Sound Generation with Multimodal Controls | Ziyang Chen et.al. | 2411.17698v1 | null |
2024-11-26 | StableAnimator: High-Quality Identity-Preserving Human Image Animation | Shuyuan Tu et.al. | 2411.17697v1 | link |
2024-11-26 | Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | Akshita Gupta et.al. | 2411.17690v1 | null |
2024-11-26 | BERT or FastText? A Comparative Analysis of Contextual as well as Non-Contextual Embeddings | Abhay Shanbhag et.al. | 2411.17661v1 | null |
2024-11-26 | DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting | Christian Homeyer et.al. | 2411.17660v1 | link |
2024-11-26 | SAMWISE: Infusing wisdom in SAM2 for Text-Driven Video Segmentation | Claudia Cuttano et.al. | 2411.17646v1 | link |
2024-11-26 | A robust image encryption scheme based on new 4-D hyperchaotic system and elliptic curve | Yehia Lalili et.al. | 2411.17643v1 | null |
2024-11-26 | On Limitations of LLM as Annotator for Low Resource Languages | Suramya Jadhav et.al. | 2411.17637v1 | null |
2024-11-26 | An Ensemble Approach for Brain Tumor Segmentation and Synthesis | Juampablo E. Heras Rivera et.al. | 2411.17617v1 | null |
2024-11-26 | Accelerating Vision Diffusion Transformers with Skip Branches | Guanjie Chen et.al. | 2411.17616v1 | link |
2024-11-25 | Generative Omnimatte: Learning to Decompose Video into Layers | Yao-Chih Lee et.al. | 2411.16683v1 | null |
2024-11-25 | Quark: Real-time, High-resolution, and General Neural View Synthesis | John Flynn et.al. | 2411.16680v1 | null |
2024-11-25 | A Supervised Machine Learning Approach for Assessing Grant Peer Review Reports | Gabriel Okasa et.al. | 2411.16662v1 | null |
2024-11-25 | Fast training of large kernel models with delayed projections | Amirhesam Abedsoltan et.al. | 2411.16658v1 | null |
2024-11-25 | DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation | Zun Wang et.al. | 2411.16657v1 | null |
2024-11-25 | Automated Registration of 3D Neurovascular Territory Atlas to 2D DSA for Targeted Quantitative Angiography Analysis | George Dimopoulos et.al. | 2411.16637v1 | null |
2024-11-25 | LegoPET: Hierarchical Feature Guided Conditional Diffusion for PET Image Reconstruction | Yiran Sun et.al. | 2411.16629v1 | null |
2024-11-25 | Inference-Time Policy Steering through Human Interactions | Yanwei Wang et.al. | 2411.16627v1 | null |
2024-11-25 | Imperceptible Adversarial Examples in the Physical World | Weilin Xu et.al. | 2411.16622v1 | null |
2024-11-25 | Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric | Zhichao Zhang et.al. | 2411.16619v1 | null |
2024-11-22 | Health AI Developer Foundations | Atilla P. Kiraly et.al. | 2411.15128v1 | null |
2024-11-22 | PRIMUS: Pretraining IMU Encoders with Multimodal Self-Supervision | Arnav M. Das et.al. | 2411.15127v1 | null |
2024-11-22 | VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement | Daeun Lee et.al. | 2411.15115v1 | null |
2024-11-22 | About Time: Advances, Challenges, and Outlooks of Action Understanding | Alexandros Stergiou et.al. | 2411.15106v1 | null |
2024-11-22 | Efficient Radar Modulation Recognition via a Noise-Aware Ensemble Neural Network | Do-Hyun Park et.al. | 2411.15104v1 | null |
2024-11-22 | RED: Effective Trajectory Representation Learning with Comprehensive Information | Silin Zhou et.al. | 2411.15096v1 | null |
2024-11-22 | Dimension-independent rates for structured neural density estimation | Robert A. Vandermeulen et.al. | 2411.15095v1 | null |
2024-11-22 | Quantum-enhanced unsupervised image segmentation for medical images analysis | Laia Domingo et.al. | 2411.15086v1 | null |
2024-11-22 | Leapfrog Latent Consistency Model (LLCM) for Medical Images Generation | Lakshmikar R. Polamreddy et.al. | 2411.15084v1 | link |
2024-11-22 | RankByGene: Gene-Guided Histopathology Representation Learning Through Cross-Modal Ranking Consistency | Wentao Huang et.al. | 2411.15076v1 | null |
2024-11-21 | Revisiting the Integration of Convolution and Attention for Vision Backbone | Lei Zhu et.al. | 2411.14429v1 | link |
2024-11-21 | Quantum States Imaging of Magnetic Field Contours based on Autler-Townes Effect in Yb Atoms | Tanaporn Na Narong et.al. | 2411.14426v1 | null |
2024-11-21 | Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation | Zhuoman Liu et.al. | 2411.14423v1 | null |
2024-11-21 | Multimodal 3D Brain Tumor Segmentation with Adversarial Training and Conditional Random Field | Lan Jiang et.al. | 2411.14418v1 | null |
2024-11-21 | Multimodal Autoregressive Pre-training of Large Vision Encoders | Enrico Fini et.al. | 2411.14402v1 | link |
2024-11-21 | Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding | Yiming Zhang et.al. | 2411.14401v1 | null |
2024-11-21 | POS-tagging to highlight the skeletal structure of sentences | Grigorii Churakov et.al. | 2411.14393v1 | link |
2024-11-21 | Persistent Homology for Structural Characterization in Disordered Systems | An Wang et.al. | 2411.14390v1 | link |
2024-11-21 | Enhancing Diagnostic Precision in Gastric Bleeding through Automated Lesion Segmentation: A Deep DuS-KFCM Approach | Xian-Xian Liu et.al. | 2411.14385v1 | null |
2024-11-21 | Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation | Yuanhao Cai et.al. | 2411.14384v1 | null |
2024-11-20 | REDUCIO! Generating 1024$\times$1024 Video within 16 Seconds using Extremely Compressed Motion Latents | Rui Tian et.al. | 2411.13552v1 | link |
2024-11-20 | Generating 3D-Consistent Videos from Unposed Internet Photos | Gene Chou et.al. | 2411.13549v1 | null |
2024-11-20 | Comparative Analysis of Machine Learning and Deep Learning Models for Classifying Squamous Epithelial Cells of the Cervix | Subhasish Das et.al. | 2411.13535v1 | null |
2024-11-20 | Predictive Insights into LGBTQ+ Minority Stress: A Transductive Exploration of Social Media Discourse | S. Chapagain et.al. | 2411.13534v1 | null |
2024-11-20 | Geometric Algebra Planes: Convex Implicit Neural Volumes | Irmak Sivgin et.al. | 2411.13525v1 | null |
2024-11-20 | VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Ziqi Huang et.al. | 2411.13503v1 | link |
2024-11-20 | Efficient Brain Imaging Analysis for Alzheimer's and Dementia Detection Using Convolution-Derivative Operations | Yasmine Mustafa et.al. | 2411.13490v1 | null |
2024-11-20 | Benchmarking Quantum Convolutional Neural Networks for Classification and Data Compression Tasks | Jun Yong Khoo et.al. | 2411.13468v1 | null |
2024-11-20 | Heuristically Adaptive Diffusion-Model Evolutionary Strategy | Benedikt Hartl et.al. | 2411.13420v1 | null |
2024-11-20 | Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese | Dat Van-Thanh Nguyen et.al. | 2411.13407v1 | null |
2024-11-19 | Soft Robotic Dynamic In-Hand Pen Spinning | Yunchao Yao et.al. | 2411.12734v1 | null |
2024-11-19 | Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs | Ahmed Akib Jawad Karim et.al. | 2411.12712v1 | null |
2024-11-19 | UBSoft: A Simulation Platform for Robotic Skill Learning in Unbounded Soft Environments | Chunru Lin et.al. | 2411.12711v1 | null |
2024-11-19 | Attribute Inference Attacks for Federated Regression Tasks | Francesco Diana et.al. | 2411.12697v1 | null |
2024-11-19 | IMUVIE: Pickup Timeline Action Localization via Motion Movies | John Clapham et.al. | 2411.12689v1 | null |
2024-11-19 | AI Guided Early Screening of Cervical Cancer | Dharanidharan S I et.al. | 2411.12681v1 | null |
2024-11-19 | Yang--Mills topology on four-dimensional triangulations | Giuseppe Clemente et.al. | 2411.12668v1 | null |
2024-11-19 | Machine Learning Approaches on Crop Pattern Recognition a Comparative Analysis | Kazi Hasibul Kabir et.al. | 2411.12667v1 | null |
2024-11-19 | PoM: Efficient Image and Video Generation with the Polynomial Mixer | David Picard et.al. | 2411.12663v1 | link |
2024-11-19 | AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Yuanbin Man et.al. | 2411.12593v1 | null |
2024-11-18 | Partially Hyperbolic Dynamics with Quasi-isometric Center | Ziqiang Feng et.al. | 2411.11836v1 | null |
2024-11-18 | Describe Now: User-Driven Audio Description for Blind and Low Vision Individuals | Maryam Cheema et.al. | 2411.11835v1 | null |
2024-11-18 | Absorbing state dynamics of stochastic gradient descent | Guanming Zhang et.al. | 2411.11834v1 | null |
2024-11-18 | Equivariant spatio-hemispherical networks for diffusion MRI deconvolution | Axel Elaldi et.al. | 2411.11819v1 | link |
2024-11-18 | Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion | Meng Zhou et.al. | 2411.11799v1 | link |
2024-11-18 | Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods | Egor Kovalev et.al. | 2411.11795v1 | null |
2024-11-18 | Energy shifts and broadening of excitonic resonances in electrostatically-doped semiconductors | Hanan Dery et.al. | 2411.11790v1 | null |
2024-11-18 | High-Speed Cornering Control and Real-Vehicle Deployment for Autonomous Electric Vehicles | Shiyue Zhao et.al. | 2411.11762v1 | null |
2024-11-18 | Additional Tests for TV 3.0 | Eduardo Peixoto et.al. | 2411.11755v1 | null |
2024-11-18 | Advacheck at GenAI Detection Task 1: AI Detection Powered by Domain-Aware Multi-Tasking | German Gritsai et.al. | 2411.11736v1 | null |
2024-11-15 | The Spatial Complexity of Optical Computing and How to Reduce It | Yandong Li et.al. | 2411.10435v1 | null |
2024-11-15 | Private Counterfactual Retrieval With Immutable Features | Shreya Meel et.al. | 2411.10429v1 | null |
2024-11-15 | Back to Supervision: Boosting Word Boundary Detection through Frame Classification | Simone Carnemolla et.al. | 2411.10423v1 | null |
2024-11-15 | Multiscale Dubuc: A New Similarity Measure for Time Series | Mahsa Khazaei et.al. | 2411.10418v1 | null |
2024-11-15 | Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations | Jianfeng Chi et.al. | 2411.10414v1 | null |
2024-11-15 | Experimental demonstration of Tessellation Structured Illumination Microscopy | Doron Shterman et.al. | 2411.10405v1 | null |
2024-11-15 | On the Foundation Model for Cardiac MRI Reconstruction | Chi Zhang et.al. | 2411.10403v1 | null |
2024-11-15 | Tropical combinatorics of max-linear Bayesian networks | Carlos Améndola et.al. | 2411.10394v1 | null |
2024-11-15 | Mechanisms of Generative Image-to-Image Translation Networks | Guangzong Chen et.al. | 2411.10368v1 | null |
2024-11-15 | On the Cost of Model-Serving Frameworks: An Experimental Evaluation | Pasquale De Rosa et.al. | 2411.10337v1 | null |
2024-11-14 | Towards a Classification of Open-Source ML Models and Datasets for Software Engineering | Alexandra González et.al. | 2411.09683v1 | null |
2024-11-14 | Commensurability Among Deligne-Mostow Monodromy Groups | Chenglong Yu et.al. | 2411.09682v1 | null |
2024-11-14 | Modular Fault Diagnosis Framework for Complex Autonomous Driving Systems | Stefan Orf et.al. | 2411.09643v1 | null |
2024-11-14 | The Moral Foundations Weibo Corpus | Renjie Cao et.al. | 2411.09612v1 | null |
2024-11-14 | Effect of viewing angle in Gamma-ray Burst properties | Sreelakshmi P Chakyar et.al. | 2411.09609v1 | null |
2024-11-14 | Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration | Yifan Shao et.al. | 2411.09604v1 | link |
2024-11-14 | Assessing the Performance of the DINOv2 Self-supervised Learning Vision Transformer Model for the Segmentation of the Left Atrium from MRI Images | Bipasha Kundu et.al. | 2411.09598v1 | null |
2024-11-14 | SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms | Soumick Chatterjee et.al. | 2411.09593v1 | null |
2024-11-14 | SimTube: Generating Simulated Video Comments through Multimodal AI and User Personas | Yu-Kai Hung et.al. | 2411.09577v1 | null |
2024-11-14 | Mutual Influence of Photon Sphere and Non-Commutative Parameter in Various Non-Commutative Black Holes: Part I- Towards evidence for WGC | Mohammad Ali S. Afshar et.al. | 2411.09557v1 | null |
2024-11-13 | 4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization | Mijeong Kim et.al. | 2411.08879v1 | null |
2024-11-13 | A Short Note on Evaluating RepNet for Temporal Repetition Counting in Videos | Debidatta Dwibedi et.al. | 2411.08878v1 | link |
2024-11-13 | Quantum cryptography beyond key distribution: theory and experiment | Mathieu Bozzio et.al. | 2411.08877v1 | null |
2024-11-13 | Large Wireless Model (LWM): A Foundation Model for Wireless Channels | Sadjad Alikhani et.al. | 2411.08872v1 | null |
2024-11-13 | AstroM$^3$: A self-supervised multimodal model for astronomy | Mariia Rizhko et.al. | 2411.08842v1 | null |
2024-11-13 | Multimodal Instruction Tuning with Hybrid State Space Models | Jianing Zhou et.al. | 2411.08840v1 | null |
2024-11-13 | Model agnostic local variable importance for locally dependent relationships | Kelvyn K. Bladen et.al. | 2411.08821v1 | null |
2024-11-13 | Identifying Spicules in Mg II: Statistics and Comparisons with Hα | Vicki L. Herde et.al. | 2411.08801v1 | null |
2024-11-13 | Algorithms in 4-manifold topology | Stefan Bastl et.al. | 2411.08775v1 | null |
2024-11-13 | Sharingan: Extract User Action Sequence from Desktop Recordings | Yanting Chen et.al. | 2411.08768v1 | null |
2024-11-12 | Leonardo vindicated: Pythagorean trees for minimal reconstruction of the natural branching structures | Dymitr Ruta et.al. | 2411.08024v1 | null |
2024-11-12 | Artistic Neural Style Transfer Algorithms with Activation Smoothing | Xiangtian Li et.al. | 2411.08014v1 | null |
2024-11-12 | A computer-vision aided Compton-imaging system for radioactive waste characterization and decommissioning of nuclear power plants | Victor Babiano-Suarez et.al. | 2411.07996v1 | null |
2024-11-12 | DINO-LG: A Task-Specific DINO Model for Coronary Calcium Scoring | Mahmut S. Gokmen et.al. | 2411.07976v1 | null |
2024-11-12 | Commissioning An All-Sky Infrared Camera Array for Detection Of Airborne Objects | Laura Dominé et.al. | 2411.07956v1 | null |
2024-11-12 | SimBase: A Simple Baseline for Temporal Video Grounding | Peijun Bao et.al. | 2411.07945v1 | null |
2024-11-12 | DuoLift-GAN: Reconstructing CT from Single-view and Biplanar X-Rays with Generative Adversarial Networks | Zhaoxi Zhang et.al. | 2411.07941v1 | null |
2024-11-12 | Prediction of Acoustic Communication Performance for AUVs using Gaussian Process Classification | Yifei Gao et.al. | 2411.07933v1 | null |
2024-11-12 | CT-Mamba: A Hybrid Convolutional State Space Model for Low-Dose CT Denoising | Linxuan Li et.al. | 2411.07930v1 | null |
2024-11-12 | CryptoLLM: Unleashing the Power of Prompted LLMs for SmartQnA and Classification of Crypto Posts | Aniket Deroy et.al. | 2411.07917v1 | null |
2024-11-11 | Grounding Video Models to Actions through Goal Conditioned Exploration | Yunhao Luo et.al. | 2411.07223v1 | null |
2024-11-11 | NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics | David Robinson et.al. | 2411.07186v1 | null |
2024-11-11 | Enhancing Predictive Maintenance in Mining Mobile Machinery through a TinyML-enabled Hierarchical Inference Network | Raúl de la Fuente et.al. | 2411.07168v1 | null |
2024-11-11 | Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation | Kaijian Zou et.al. | 2411.07130v1 | link |
2024-11-11 | StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification | Yichen He et.al. | 2411.07076v1 | link |
2024-11-11 | Unified Bayesian representation for high-dimensional multi-modal biomedical data for small-sample classification | Albert Belenguer-Llorens et.al. | 2411.07043v1 | null |
2024-11-11 | The Inherent Adversarial Robustness of Analog In-Memory Computing | Corey Lammie et.al. | 2411.07023v1 | null |
2024-11-11 | HeteroSample: Meta-path Guided Sampling for Heterogeneous Graph Representation Learning | Ao Liu et.al. | 2411.07022v1 | null |
2024-11-11 | Token2Wave | Xin Zhang et.al. | 2411.06989v1 | null |
2024-11-11 | A Hyperspectral Imaging Dataset and Methodology for Intraoperative Pixel-Wise Classification of Metastatic Colon Cancer in the Liver | Ivica Kopriva et.al. | 2411.06969v1 | null |
2024-11-08 | Gender Inequalities in Content Collaborations: Asymmetric Creator Synergy and Symmetric Audience Biases | Mingyue Zha et.al. | 2411.05782v1 | null |
2024-11-08 | Sketched Equivariant Imaging Regularization and Deep Internal Learning for Inverse Problems | Guixian Xu et.al. | 2411.05771v1 | null |
2024-11-08 | FisherMask: Enhancing Neural Network Labeling Efficiency in Image Classification Using Fisher Information | Shreen Gul et.al. | 2411.05752v1 | link |
2024-11-08 | Accurate Unsupervised Photon Counting from Transition Edge Sensor Signals | Nicolas Dalbec-Constant et.al. | 2411.05737v1 | null |
2024-11-08 | Poze: Sports Technique Feedback under Data Constraints | Agamdeep Singh et.al. | 2411.05734v1 | null |
2024-11-08 | Differential Privacy Under Class Imbalance: Methods and Empirical Insights | Lucas Rosenblatt et.al. | 2411.05733v1 | null |
2024-11-08 | On-chip rewritable phase-change metasurface for programmable diffractive deep neural networks | Sanaz Zarei et.al. | 2411.05723v1 | null |
2024-11-08 | Classification of ( | Basdouri Imed et.al. | 2411.05716v1 | null |
2024-11-08 | STARS: Sensor-agnostic Transformer Architecture for Remote Sensing | Ethan King et.al. | 2411.05714v1 | null |
2024-11-08 | Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream | Abdulkadir Gokce et.al. | 2411.05712v1 | link |
2024-11-07 | ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning | David Junhao Zhang et.al. | 2411.05003v1 | null |
2024-11-07 | DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation | Peiqi Liu et.al. | 2411.04999v1 | null |
2024-11-07 | HourVideo: 1-Hour Video-Language Understanding | Keshigeyan Chandrasegaran et.al. | 2411.04998v1 | null |
2024-11-07 | SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation | Koichi Namekata et.al. | 2411.04989v1 | null |
2024-11-07 | Efficient Preparation of Solvable Anyons with Adaptive Quantum Circuits | Yuanjie Ren et.al. | 2411.04985v1 | null |
2024-11-07 | Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries | Dylan Manuel et.al. | 2411.04981v1 | null |
2024-11-07 | Uncovering Hidden Subspaces in Video Diffusion Models Using Re-Identification | Mischa Dombrowski et.al. | 2411.04956v1 | null |
2024-11-07 | Estimating the Influence of Sequentially Correlated Literary Properties in Textual Classification: A Data-Centric Hypothesis-Testing Approach | Gideon Yoffe et.al. | 2411.04950v1 | null |
2024-11-07 | Proof of the absence of local conserved quantities in the spin-1 bilinear-biquadratic chain and its anisotropic extensions | Akihiro Hokkyo et.al. | 2411.04945v1 | null |
2024-11-07 | A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model | Panwen Hu et.al. | 2411.04942v1 | null |
2024-11-06 | RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models | Maya Varma et.al. | 2411.04097v1 | link |
2024-11-06 | Local unitary equivalence of absolutely maximally entangled states constructed from orthogonal arrays | N Ramadas et.al. | 2411.04096v1 | null |
2024-11-06 | A Collaborative Content Moderation Framework for Toxicity Detection based on Conformalized Estimates of Annotation Disagreement | Guillermo Villate-Castillo et.al. | 2411.04090v1 | link |
2024-11-06 | Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning | Ping Li et.al. | 2411.04059v1 | link |
2024-11-06 | Distinguishing Coupled Dark Energy Models with Neural Networks | L. W. K. Goh et.al. | 2411.04058v1 | link |
2024-11-06 | Synomaly Noise and Multi-Stage Diffusion: A Novel Approach for Unsupervised Anomaly Detection in Ultrasound Imaging | Yuan Bi et.al. | 2411.04004v1 | null |
2024-11-06 | Learning Aggregate Queries Defined by First-Order Logic with Counting | Steffen van Bergerem et.al. | 2411.04003v1 | null |
2024-11-06 | ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks | Ziji Shi et.al. | 2411.03999v1 | null |
2024-11-06 | Fine-tuning -- a Transfer Learning approach | Joseph Arul Raj et.al. | 2411.03941v1 | null |
2024-11-06 | Inter-Frame Coding for Dynamic Meshes via Coarse-to-Fine Anchor Mesh Generation | He Huang et.al. | 2411.03921v1 | null |
2024-11-05 | Classification Done Right for Vision-Language Pre-Training | Huang Zilong et.al. | 2411.03313v1 | link |
2024-11-05 | Automatic solid form classification in pharmaceutical drug development | Julius Lange et.al. | 2411.03308v1 | null |
2024-11-05 | Data-Driven Sampling Based Stochastic MPC for Skid-Steer Mobile Robot Navigation | Ananya Trivedi et.al. | 2411.03289v1 | link |
2024-11-05 | Graph-Based Semi-Supervised Segregated Lipschitz Learning | Farid Bozorgnia et.al. | 2411.03273v1 | null |
2024-11-05 | Tuning into spatial frequency space: Satellite and space debris detection in the ZTF alert stream | J. P. Carvajal et.al. | 2411.03258v1 | null |
2024-11-05 | Kernel Orthogonality does not necessarily imply a Decrease in Feature Map Redundancy in CNNs: Convolutional Similarity Minimization | Zakariae Belmekki et.al. | 2411.03226v1 | null |
2024-11-05 | Beyond Grid Data: Exploring Graph Neural Networks for Earth Observation | Shan Zhao et.al. | 2411.03223v1 | null |
2024-11-05 | Statistical Analysis to Support CSI-Based Sensing Methods | Elena Tonini et.al. | 2411.03203v1 | null |
2024-11-05 | Navigating Extremes: Dynamic Sparsity in Large Output Space | Nasib Ullah et.al. | 2411.03171v1 | null |
2024-11-05 | Pre-trained Visual Dynamics Representations for Efficient Policy Learning | Hao Luo et.al. | 2411.03169v1 | null |
2024-11-04 | Adaptive Caching for Faster Video Generation with Diffusion Transformers | Kumara Kahatapitiya et.al. | 2411.02397v1 | null |
2024-11-04 | AutoVFX: Physically Realistic Video Editing from Natural Language Instructions | Hao-Yu Hsu et.al. | 2411.02394v1 | null |
2024-11-04 | How Far is Video Generation from World Model: A Physical Law Perspective | Bingyi Kang et.al. | 2411.02385v1 | null |
2024-11-04 | Drone Data Analytics for Measuring Traffic Metrics at Intersections in High-Density Areas | Qingwen Pu et.al. | 2411.02349v1 | null |
2024-11-04 | SplatOverflow: Asynchronous Hardware Troubleshooting | Amritansh Kwatra et.al. | 2411.02332v1 | null |
2024-11-04 | PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Ruyang Liu et.al. | 2411.02327v1 | link |
2024-11-04 | GenXD: Generating Any 3D and 4D Scenes | Yuyang Zhao et.al. | 2411.02319v1 | null |
2024-11-04 | Information plane and compression-gnostic feedback in quantum machine learning | Nathan Haboury et.al. | 2411.02313v1 | null |
2024-11-04 | Grouped Discrete Representation for Object-Centric Learning | Rongzhen Zhao et.al. | 2411.02299v1 | null |
2024-11-04 | Conformal-in-the-Loop for Learning with Imbalanced Noisy Data | John Brandon Graham-Knight et.al. | 2411.02281v1 | null |
2024-10-31 | EgoMimic: Scaling Imitation Learning via Egocentric Video | Simar Kareer et.al. | 2410.24221v1 | link |
2024-10-31 | Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning | Penghui Ruan et.al. | 2410.24219v1 | link |
2024-10-31 | Learning Video Representations without Natural Videos | Xueyang Yu et.al. | 2410.24213v1 | null |
2024-11-01 | DELTA: Dense Efficient Long-range 3D Tracking for any video | Tuan Duc Ngo et.al. | 2410.24211v2 | null |
2024-10-31 | DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion | Weicai Ye et.al. | 2410.24203v1 | link |
2024-10-31 | DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning | Zhenyu Jiang et.al. | 2410.24185v1 | null |
2024-10-31 | Extended Object Tracking and Classification based on Linear Splines | Matteo Tesori et.al. | 2410.24183v1 | null |
2024-10-31 | | Kevin Black et.al. | 2410.24164v1 | null |
2024-10-31 | Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age | Nouar AlDahoul et.al. | 2410.24148v1 | null |
2024-10-31 | HoloChrome: Polychromatic Illumination for Speckle Reduction in Holographic Near-Eye Displays | Florian Schiffers et.al. | 2410.24144v1 | null |
2024-10-30 | Bridging the Human to Robot Dexterity Gap through Object-Oriented Rewards | Irmak Guzey et.al. | 2410.23289v1 | null |
2024-10-30 | Computing the bridge length: the key ingredient in a continuous isometry classification of periodic point sets | Jonathan McManus et.al. | 2410.23288v1 | null |
2024-10-30 | ReferEverything: Towards Segmenting Everything We Can Speak of in Videos | Anurag Bagchi et.al. | 2410.23287v1 | null |
2024-10-30 | DisCo: Distributed Contact-Rich Trajectory Optimization for Forceful Multi-Robot Collaboration | Ola Shorinwa et.al. | 2410.23283v1 | null |
2024-10-30 | A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization | Bin Wu et.al. | 2410.23279v1 | null |
2024-10-30 | SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation | Yining Hong et.al. | 2410.23277v1 | null |
2024-10-30 | TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models | Ziyao Shangguan et.al. | 2410.23266v1 | link |
2024-10-30 | bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction | Yehe Liu et.al. | 2410.23247v1 | null |
2024-10-30 | PointRecon: Online Point-based 3D Reconstruction via Ray-based 2D-3D Matching | Chen Ziwen et.al. | 2410.23245v1 | null |
2024-10-31 | Aligning Audio-Visual Joint Representations with an Agentic Workflow | Shentong Mo et.al. | 2410.23230v2 | null |
2024-10-29 | Local Policies Enable Zero-shot Long-horizon Manipulation | Murtaza Dalal et.al. | 2410.22332v1 | null |
2024-10-30 | Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets | Guangqi Jiang et.al. | 2410.22325v2 | null |
2024-10-29 | Enhancing Code Annotation Reliability: Generative AI's Role in Comment Quality Assessment Models | Seetharam Killivalavan et.al. | 2410.22323v1 | null |
2024-10-29 | Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier | Kai Wang et.al. | 2410.22317v1 | link |
2024-10-29 | Convex Formulations for Training Two-Layer ReLU Neural Networks | Karthik Prakhya et.al. | 2410.22311v1 | link |
2024-10-29 | Emotion-Guided Image to Music Generation | Souraja Kundu et.al. | 2410.22299v1 | null |
2024-10-29 | Motion Graph Unleashed: A Novel Approach to Video Prediction | Yiqi Zhong et.al. | 2410.22288v1 | link |
2024-10-29 | Non-LTE Synthetic Observables of a Multidimensional Model of Type Ia Supernovae | Samuel J. Boos et.al. | 2410.22276v1 | null |
2024-10-29 | Leveraging Reverberation and Visual Depth Cues for Sound Event Localization and Detection with Distance Estimation | Davide Berghi et.al. | 2410.22271v1 | null |
2024-10-29 | LipKernel: Lipschitz-Bounded Convolutional Neural Networks via Dissipative Layers | Patricia Pauli et.al. | 2410.22258v1 | link |
2024-10-28 | LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior | Hanyu Wang et.al. | 2410.21264v1 | null |
2024-10-28 | Multi-modal AI for comprehensive breast cancer prognostication | Jan Witowski et.al. | 2410.21256v1 | null |
2024-10-28 | Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies | Xiwen Li et.al. | 2410.21170v1 | null |
2024-10-28 | KaLDeX: Kalman Filter based Linear Deformable Cross Attention for Retina Vessel Segmentation | Zhihao Zhao et.al. | 2410.21160v1 | null |
2024-10-28 | Synthetica: Large Scale Synthetic Data for Robot Perception | Ritvik Singh et.al. | 2410.21153v1 | null |
2024-10-28 | The tau function for ABS equations | James Atkinson et.al. | 2410.21148v1 | null |
2024-10-28 | Enhancing Learned Image Compression via Cross Window-based Attention | Priyanka Mudgal et.al. | 2410.21144v1 | null |
2024-10-28 | uOttawa at LegalLens-2024: Transformer-based Classification Experiments | Nima Meghdadi et.al. | 2410.21139v1 | link |
2024-10-28 | Do LLMs generate test oracles that capture the actual or the expected program behaviour? | Michael Konstantinou et.al. | 2410.21136v1 | null |
2024-10-28 | Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal Sequences | Zhihao Zhao et.al. | 2410.21130v1 | null |
2024-10-25 | Sparse Decomposition of Graph Neural Networks | Yaochen Hu et.al. | 2410.19723v1 | null |
2024-10-25 | Arabic Music Classification and Generation using Deep Learning | Mohamed Elshaarawy et.al. | 2410.19719v1 | null |
2024-10-25 | Enhanced Anomaly Detection in Industrial Control Systems aided by Machine Learning | Vegard Berge et.al. | 2410.19717v1 | null |
2024-10-25 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Xiangyu Zeng et.al. | 2410.19702v1 | null |
2024-10-25 | MILES: Making Imitation Learning Easy with Self-Supervision | Georgios Papagiannis et.al. | 2410.19693v1 | null |
2024-10-25 | Deep Learning for Classification of Inflammatory Bowel Disease Activity in Whole Slide Images of Colonic Histopathology | Amit Das et.al. | 2410.19690v1 | null |
2024-10-25 | Optimizing Hearthstone Agents using an Evolutionary Algorithm | Pablo García-Sánchez et.al. | 2410.19681v1 | null |
2024-10-25 | Learning the Regularization Strength for Deep Fine-Tuning via a Data-Emphasized Variational Objective | Ethan Harvey et.al. | 2410.19675v1 | null |
2024-10-25 | MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services | Hongjia Wu et.al. | 2410.19665v1 | null |
2024-10-25 | VARS: Vision-based Assessment of Risk in Security Systems | Pranav Gupta et.al. | 2410.19642v1 | null |
2024-10-24 | Framer: Interactive Frame Interpolation | Wen Wang et.al. | 2410.18978v1 | null |
2024-10-24 | CAMEL-Bench: A Comprehensive Arabic LMM Benchmark | Sara Ghaboura et.al. | 2410.18976v1 | link |
2024-10-24 | Unbounded: A Generative Infinite Game of Character Life Simulation | Jialu Li et.al. | 2410.18975v1 | null |
2024-10-24 | Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling | Mingtong Zhang et.al. | 2410.18912v1 | null |
2024-10-24 | SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment | Caelan Garrett et.al. | 2410.18907v1 | null |
2024-10-24 | A Survey of Multimodal Sarcasm Detection | Shafkat Farabi et.al. | 2410.18882v1 | null |
2024-10-24 | Multi-Class Abnormality Classification in Video Capsule Endoscopy Using Deep Learning | Arnav Samal et.al. | 2410.18879v1 | link |
2024-10-24 | Exploring the Universe with SNAD: Anomaly Detection in Astronomy | Alina A. Volnova et.al. | 2410.18875v1 | null |
2024-10-24 | Exploring a Geometric Conjecture, Some Properties of Blaschke Products, and the Geometry of Curves Formed by Them | Mehmet Celik et.al. | 2410.18863v1 | null |
2024-10-24 | Highly efficient non-rigid registration in k-space with application to cardiac Magnetic Resonance Imaging | Aya Ghoul et.al. | 2410.18834v1 | link |
2024-10-23 | FIPER: Generalizable Factorized Fields for Joint Image Compression and Super-Resolution | Yang-Che Sun et.al. | 2410.18083v1 | null |
2024-10-23 | WorldSimBench: Towards Video Generation Models as World Simulators | Yiran Qin et.al. | 2410.18072v1 | null |
2024-10-23 | Eigenvalue crossings in equivariant families of matrices | Jonathan Rawlinson et.al. | 2410.18068v1 | null |
2024-10-23 | The Double-Edged Sword of Behavioral Responses in Strategic Classification: Theory and User Studies | Raman Ebrahimi et.al. | 2410.18066v1 | null |
2024-10-23 | Real time anomalies detection on video | Fabien Poirier et.al. | 2410.18051v1 | null |
2024-10-23 | Boundary topological insulators and superconductors of Altland-Zirnbauer tenfold classes | Xun-Jiang Luo et.al. | 2410.18015v1 | null |
2024-10-24 | Effective Finite Time Stability Control for Human-Machine Shared Vehicle Following System | Zihan Wang et.al. | 2410.18007v2 | null |
2024-10-23 | Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation | Suho Kang et.al. | 2410.18001v1 | link |
2024-10-23 | Optical Generative Models | Shiqi Chen et.al. | 2410.17970v1 | null |
2024-10-23 | A Wavelet Diffusion GAN for Image Super-Resolution | Lorenzo Aloisi et.al. | 2410.17966v1 | null |
2024-10-22 | Altogether: Image Captioning via Re-aligning Alt-text | Hu Xu et.al. | 2410.17251v1 | null |
2024-10-22 | Classifying rational polygons with small denominator and few interior lattice points | Martin Bohnert et.al. | 2410.17244v1 | null |
2024-10-22 | Frontiers in Intelligent Colonoscopy | Ge-Peng Ji et.al. | 2410.17241v1 | link |
2024-10-22 | Automated Spinal MRI Labelling from Reports Using a Large Language Model | Robin Y. Park et.al. | 2410.17235v1 | link |
2024-10-22 | Few-shot In-Context Preference Learning Using Large Language Models | Chao Yu et.al. | 2410.17233v1 | null |
2024-10-22 | Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods | Tsachi Blau et.al. | 2410.17222v1 | null |
2024-10-22 | The Decision Problem for Regular First-Order Theories | Umang Mathur et.al. | 2410.17185v1 | null |
2024-10-22 | Technical Report: Toward Applying Quantum Computing to Network Verification | Kahlil Dozier et.al. | 2410.17184v1 | null |
2024-10-22 | KANICE: Kolmogorov-Arnold Networks with Interactive Convolutional Elements | Md Meftahul Ferdaus et.al. | 2410.17172v1 | link |
2024-10-22 | Are Visual-Language Models Effective in Action Recognition? A Comparative Study | Mahmoud Ali et.al. | 2410.17149v1 | null |
2024-10-21 | SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Shuangrui Ding et.al. | 2410.16268v1 | link |
2024-10-21 | xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs | Michael S. Ryoo et.al. | 2410.16267v1 | null |
2024-10-21 | 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors | Xi Liu et.al. | 2410.16266v1 | null |
2024-10-21 | Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos | Gengshan Yang et.al. | 2410.16259v1 | null |
2024-10-21 | Serendipitous detection of an intense X-ray flare in the weak-line T Tauri star KM Ori with SRG/eROSITA | Savithri H. Ezhikode et.al. | 2410.16241v1 | null |
2024-10-21 | MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report | Samrajya Thapa et.al. | 2410.16239v1 | link |
2024-10-21 | Deep Radiomics Detection of Clinically Significant Prostate Cancer on Multicenter MRI: Initial Comparison to PI-RADS Assessment | G. A. Nketiah et.al. | 2410.16238v1 | null |
2024-10-22 | Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models | Giannis Daras et.al. | 2410.16152v2 | null |
2024-10-21 | An Explainable Contrastive-based Dilated Convolutional Network with Transformer for Pediatric Pneumonia Detection | Chandravardhan Singh Raghaw et.al. | 2410.16143v1 | null |
2024-10-21 | Modeling dynamic neural activity by combining naturalistic video stimuli and stimulus-independent latent factors | Finn Schmidt et.al. | 2410.16136v1 | null |
2024-10-18 | Real-time Fake News from Adversarial Feedback | Sanxing Chen et.al. | 2410.14651v1 | null |
2024-10-18 | GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings | Raghuveer Thirukovalluru et.al. | 2410.14635v1 | null |
2024-10-18 | You Shall Know a Tool by the Traces it Leaves: The Predictability of Sentiment Analysis Tools | Daniel Baumartz et.al. | 2410.14626v1 | null |
2024-10-18 | Learning to Control the Smoothness of Graph Convolutional Network Features | Shih-Hsin Wang et.al. | 2410.14604v1 | null |
2024-10-18 | Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection | Aaron Alvarado Kristanto Julistiono et.al. | 2410.14581v1 | null |
2024-10-18 | A Hybrid Feature Fusion Deep Learning Framework for Leukemia Cancer Detection in Microscopic Blood Sample Using Gated Recurrent Unit and Uncertainty Quantification | Maksuda Akter et.al. | 2410.14536v1 | null |
2024-10-18 | Less is More: Selective Reduction of CT Data for Self-Supervised Pre-Training of Deep Learning Models with Contrastive Learning Improves Downstream Classification Performance | Daniel Wolf et.al. | 2410.14524v1 | link |
2024-10-18 | Influence of anisotropy on the study of critical behavior of spin models by machine learning methods | Diana Sukhoverkhova et.al. | 2410.14523v1 | null |
2024-10-18 | A character approach to the ISR property | Artem Dudko et.al. | 2410.14517v1 | null |
2024-10-18 | Efficient Annotator Reliability Assessment and Sample Weighting for Knowledge-Based Misinformation Detection on Social Media | Owen Cook et.al. | 2410.14515v1 | link |
2024-10-17 | DepthSplat: Connecting Gaussian Splatting and Depth | Haofei Xu et.al. | 2410.13862v1 | link |
2024-10-17 | Adaptive Subsampling and Learned Model Improve Spatiotemporal Resolution of Tactile Skin | Ariel Slepyan et.al. | 2410.13847v1 | null |
2024-10-17 | VidPanos: Generative Panoramic Videos from Casual Panning Videos | Jingwei Ma et.al. | 2410.13832v1 | null |
2024-10-17 | DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control | Yujie Wei et.al. | 2410.13830v1 | null |
2024-10-17 | Multi-style conversion for semantic segmentation of lesions in fundus images by adversarial attacks | Clément Playout et.al. | 2410.13822v1 | link |
2024-10-17 | Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance | Mitsuhiko Nakamoto et.al. | 2410.13816v1 | null |
2024-10-17 | A Pattern to Align Them All: Integrating Different Modalities to Define Multi-Modal Entities | Gianluca Apriceno et.al. | 2410.13803v1 | link |
2024-10-17 | MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations | Liang Xu et.al. | 2410.13790v1 | link |
2024-10-17 | Strong-to-weak spontaneous symmetry breaking meets average symmetry-protected topological order | Yuchen Guo et.al. | 2410.13734v1 | null |
2024-10-17 | Representing Model Weights with Language using Tree Experts | Eliahu Horwitz et.al. | 2410.13569v1 | null |
2024-10-16 | Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception | Jihao Zhao et.al. | 2410.12788v1 | null |
2024-10-16 | The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio | Sicong Leng et.al. | 2410.12787v1 | null |
2024-10-16 | Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions | Zhenyu Jiang et.al. | 2410.12773v1 | null |
2024-10-16 | Vaccinating Federated Learning for Robust Modulation Classification in Distributed Wireless Networks | Hunmin Lee et.al. | 2410.12772v1 | null |
2024-10-16 | Phase retrieval via media diversity | Yan Cheng et.al. | 2410.12767v1 | null |
2024-10-16 | SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation | Jaehong Yoon et.al. | 2410.12761v1 | null |
2024-10-16 | Unitary Multi-Margin BERT for Robust Natural Language Processing | Hao-Yuan Chang et.al. | 2410.12759v1 | null |
2024-10-16 | PND-Net: Plant Nutrition Deficiency and Disease Classification using Graph Convolutional Network | Asish Bera et.al. | 2410.12742v1 | null |
2024-10-16 | How much time do we have before catastrophic disclosure occurs? | Matthew Szydagis et.al. | 2410.12738v1 | null |
2024-10-16 | Machine Learning-Augmented Ontology-Based Data Access for Renewable Energy Data | Marco Calautti et.al. | 2410.12734v1 | null |
2024-10-15 | High-Resolution Frame Interpolation with Patch-based Cascaded Diffusion | Junhwa Hur et.al. | 2410.11838v1 | null |
2024-10-15 | Contrastive Touch-to-Touch Pretraining | Samanta Rodriguez et.al. | 2410.11834v1 | null |
2024-10-15 | CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos | Nikita Karaev et.al. | 2410.11831v1 | null |
2024-10-15 | Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos | Zhouxia Wang et.al. | 2410.11828v1 | null |
2024-10-15 | On representations of Arthur type and unitary dual for classical groups | Alexander Hazeltine et.al. | 2410.11806v1 | null |
2024-10-16 | Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices | Zhiyuan Ma et.al. | 2410.11795v2 | null |
2024-10-15 | OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation | Jinhan Li et.al. | 2410.11792v1 | null |
2024-10-15 | Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability | Tsz Ting Chung et.al. | 2410.11786v1 | null |
2024-10-15 | On the Training Convergence of Transformers for In-Context Classification | Wei Shen et.al. | 2410.11778v1 | null |
2024-10-15 | Temporal resolution enhancement in Structured Illumination Microscopy using cascaded reconstruction | Doron Shterman et.al. | 2410.11770v1 | null |
2024-10-14 | Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models | Jingzhi Bao et.al. | 2410.10821v1 | null |
2024-10-14 | TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models | Mu Cai et.al. | 2410.10818v1 | null |
2024-10-14 | LVD-2M: A Long-take Video Dataset with Temporally Dense Captions | Tianwei Xiong et.al. | 2410.10816v1 | link |
2024-10-14 | Depth Any Video with Scalable Synthetic Data | Honghui Yang et.al. | 2410.10815v1 | null |
2024-10-14 | Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies | Yanjie Ze et.al. | 2410.10803v1 | link |
2024-10-14 | Boosting Camera Motion Control for Video Diffusion Transformers | Soon Yau Cheong et.al. | 2410.10802v1 | null |
2024-10-14 | Probabilistic Degeneracy Detection for Point-to-Plane Error Minimization | Johan Hatleskog et.al. | 2410.10784v1 | null |
2024-10-14 | 3DArticCyclists: Generating Simulated Dynamic 3D Cyclists for Human-Object Interaction (HOI) and Autonomous Driving Applications | Eduardo R. Corral-Soto et.al. | 2410.10782v1 | null |
2024-10-14 | ControlMM: Controllable Masked Motion Generation | Ekkasit Pinyoanuntapong et.al. | 2410.10780v1 | null |
2024-10-14 | Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention | Dejia Xu et.al. | 2410.10774v1 | null |
2024-10-11 | Optimal Downsampling for Imbalanced Classification with Generalized Linear Models | Yan Chen et.al. | 2410.08994v1 | null |
2024-10-11 | Realizing Linear Synaptic Plasticity in Electric Double Layer-Gated Transistors for Improved Predictive Accuracy and Efficiency in Neuromorphic Computing | Nithil Harris Manimaran et.al. | 2410.08978v1 | null |
2024-10-11 | ALVIN: Active Learning Via INterpolation | Michalis Korakakis et.al. | 2410.08972v1 | null |
2024-10-11 | Evaluating Federated Kolmogorov-Arnold Networks on Non-IID Data | Arthur Mendonça Sasse et.al. | 2410.08961v1 | null |
2024-10-11 | Lifted Coefficient of Determination: Fast model-free prediction intervals and likelihood-free model comparison | Daniel Salnikov et.al. | 2410.08958v1 | null |
2024-10-11 | Rapid Grassmannian Averaging with Chebyshev Polynomials | Brighton Ancelin et.al. | 2410.08956v1 | null |
2024-10-11 | Local moduli in the special 2-flags of length 5 | Piotr Mormul et.al. | 2410.08951v1 | null |
2024-10-11 | On the Adversarial Transferability of Generalized "Skip Connections" | Yisen Wang et.al. | 2410.08950v1 | null |
2024-10-11 | Enhancing Motion Variation in Text-to-Motion Models via Pose and Video Conditioned Editing | Clayton Leite et.al. | 2410.08931v1 | null |
2024-10-11 | Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images | Virmarie Maquiling et.al. | 2410.08926v1 | null |
2024-10-10 | LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts | Anh-Quan Cao et.al. | 2410.08211v1 | null |
2024-10-10 | Scaling Laws For Diffusion Transformers | Zhengyang Liang et.al. | 2410.08184v1 | null |
2024-10-10 | RGM: Reconstructing High-fidelity 3D Car Assets with Relightable 3D-GS Generative Model from a Single Image | Xiaoxue Chen et.al. | 2410.08181v1 | null |
2024-10-10 | A note on the symplectic classification of almost-toric systems | Xiudi Tang et.al. | 2410.08175v1 | null |
2024-10-10 | Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models | Qingni Wang et.al. | 2410.08174v1 | null |
2024-10-10 | Progressive Autoregressive Video Diffusion Models | Desai Xie et.al. | 2410.08151v1 | link |
2024-10-10 | Robust AI-Generated Text Detection by Restricted Embeddings | Kristian Kuznetsov et.al. | 2410.08113v1 | null |
2024-10-10 | Color-Guided Flying Pixel Correction in Depth Images | Ekamresh Vasudevan et.al. | 2410.08084v1 | null |
2024-10-10 | Dynamic Object Catching with Quadruped Robot Front Legs | André Schakkal et.al. | 2410.08065v1 | null |
2024-10-10 | A Target-Aware Analysis of Data Augmentation for Hate Speech Detection | Camilla Casula et.al. | 2410.08053v1 | null |
2024-10-09 | MM-Ego: Towards Building Egocentric Multimodal LLMs | Hanrong Ye et.al. | 2410.07177v1 | null |
2024-10-09 | One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation | Fabian Paischer et.al. | 2410.07170v1 | null |
2024-10-09 | Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis | Bohan Zeng et.al. | 2410.07155v1 | link |
2024-10-09 | Mental Disorders Detection in the Era of Large Language Models | Gleb Kuzmin et.al. | 2410.07129v1 | null |
2024-10-09 | Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication | Erzhen Hu et.al. | 2410.07119v1 | null |
2024-10-09 | JPEG Inspired Deep Learning | Ahmed H. Salamah et.al. | 2410.07081v1 | null |
2024-10-09 | Retrieval-Augmented Decision Transformer: External Memory for In-context RL | Thomas Schmied et.al. | 2410.07071v1 | null |
2024-10-09 | TinyEmo: Scaling down Emotional Reasoning via Metric Projection | Cristian Gutierrez et.al. | 2410.07062v1 | link |
2024-10-09 | Z-upscaling: Optical Flow Guided Frame Interpolation for Isotropic Reconstruction of 3D EM Volumes | Fisseha A. Ferede et.al. | 2410.07043v1 | link |
2024-10-09 | Optimizing Estimators of Squared Calibration Errors in Classification | Sebastian G. Gruber et.al. | 2410.07014v1 | null |
2024-10-07 | Fine-Tuning CLIP's Last Visual Projector: A Few-Shot Cornucopia | Mohammad Fahes et.al. | 2410.05270v1 | link |
2024-10-07 | Grounding Partially-Defined Events in Multimodal Data | Kate Sanders et.al. | 2410.05267v1 | null |
2024-10-07 | DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control | Kaifeng Zhao et.al. | 2410.05260v1 | null |
2024-10-07 | SePPO: Semi-Policy Preference Optimization for Diffusion Alignment | Daoan Zhang et.al. | 2410.05255v1 | link |
2024-10-07 | Causal Micro-Narratives | Mourad Heddaya et.al. | 2410.05252v1 | null |
2024-10-07 | LoTLIP: Improving Language-Image Pre-training for Long Text Understanding | Wei Wu et.al. | 2410.05249v1 | null |
2024-10-07 | The Dawn of Video Generation: Preliminary Explorations with SORA-like Models | Ailing Zeng et.al. | 2410.05227v1 | null |
2024-10-07 | Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality | Ge Ya et.al. | 2410.05203v1 | link |
2024-10-07 | Variable Resolution Pixel Quantization for Low Power Machine Vision Application on Edge | Senorita Deb et.al. | 2410.05189v1 | null |
2024-10-07 | VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks | Ziyan Jiang et.al. | 2410.05160v1 | null |
2024-10-04 | Spatial Hyperspheric Models for Compositional Data | Michael R. Schwob et.al. | 2410.03648v1 | null |
2024-10-04 | HyperCMR: Enhanced Multi-Contrast CMR Reconstruction with Eagle Loss | Ruru Xu et.al. | 2410.03624v1 | null |
2024-10-04 | Crystallography, Group Cohomology, and Lieb-Schultz-Mattis Constraints | Chunxiao Liu et.al. | 2410.03607v1 | null |
2024-10-04 | LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Videos | Noriaki Hirose et.al. | 2410.03603v1 | null |
2024-10-04 | Training Over a Distribution of Hyperparameters for Enhanced Performance and Adaptability on Imbalanced Classification | Kelsey Lieberman et.al. | 2410.03588v1 | null |
2024-10-04 | A Multi-model Approach for Video Data Retrieval in Autonomous Vehicle Development | Jesper Knapp et.al. | 2410.03580v1 | null |
2024-10-04 | Re-examining Sexism and Misogyny Classification with Annotator Attitudes | Aiqi Jiang et.al. | 2410.03543v1 | null |
2024-10-04 | Classification-Denoising Networks | Louis Thiry et.al. | 2410.03505v1 | null |
2024-10-04 | MO-DDN: A Coarse-to-Fine Attribute-based Exploration Agent for Multi-object Demand-driven Navigation | Hongcheng Wang et.al. | 2410.03488v1 | null |
2024-10-04 | A Multimodal Framework for Deepfake Detection | Kashish Gandhi et.al. | 2410.03487v1 | null |
2024-10-03 | Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats | Mingyang Xie et.al. | 2410.02764v1 | null |
2024-10-03 | Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos | Jianrui Zhang et.al. | 2410.02763v1 | null |
2024-10-03 | Loong: Generating Minute-level Long Videos with Autoregressive Language Models | Yuqing Wang et.al. | 2410.02757v1 | null |
2024-10-03 | An Online Automatic Modulation Classification Scheme Based on Isolation Distributional Kernel | Xinpeng Li et.al. | 2410.02750v1 | null |
2024-10-03 | OOD-Chameleon: Is Algorithm Selection for OOD Generalization Learnable? | Liangze Jiang et.al. | 2410.02735v1 | null |
2024-10-03 | Liouville's theorem in calibrated geometries | Toni Ikonen et.al. | 2410.02722v1 | null |
2024-10-03 | Curvature Diversity-Driven Deformation and Domain Alignment for Point Cloud | Mengxi Wu et.al. | 2410.02720v1 | link |
2024-10-03 | AlzhiNet: Traversing from 2DCNN to 3DCNN, Towards Early Detection and Diagnosis of Alzheimer's Disease | Romoke Grace Akindele et.al. | 2410.02714v1 | null |
2024-10-04 | Video Instruction Tuning With Synthetic Data | Yuanhan Zhang et.al. | 2410.02713v2 | null |
2024-10-03 | Impact of a reclassification on Web of Science articles on bibliometric indicators | Agénor Lahatte et.al. | 2410.02701v1 | null |
2024-10-02 | Loki: An Open-Source Tool for Fact Verification | Haonan Li et.al. | 2410.01794v1 | null |
2024-10-03 | Application of convolutional neural networks for extensive air shower separation in the SPHERE-3 experiment | E. L. Entina et.al. | 2410.01781v2 | null |
2024-10-03 | TopER: Topological Embeddings in Graph Representation Learning | Astrit Tola et.al. | 2410.01778v2 | null |
2024-10-02 | Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context | Spencer Frei et.al. | 2410.01774v1 | null |
2024-10-02 | SegHeD: Segmentation of Heterogeneous Data for Multiple Sclerosis Lesions with Anatomical Constraints | Berke Doga Basaran et.al. | 2410.01766v1 | null |
2024-10-02 | LightSC: The Making of a Usable Security Classification Tool for DevSecOps | Manish Shrestha et.al. | 2410.01762v1 | null |
2024-10-02 | Integrating Protein Sequence and Expression Level to Analysis Molecular Characterization of Breast Cancer Subtypes | Hossein Sholehrasa et.al. | 2410.01755v1 | null |
2024-10-02 | Unitary Representations of the Isometry Groups of Urysohn Spaces | Rémi Barritault et.al. | 2410.01725v1 | null |
2024-10-02 | COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation | Mingzhen Sun et.al. | 2410.01718v1 | null |
2024-10-02 | Rabi oscillations at three-photon laser excitation of a single rubidium Rydberg atom in an optical dipole trap | I. I. Beterov et.al. | 2410.01703v1 | null |
2024-09-30 | Continuously Improving Mobile Manipulation with Autonomous Real-World RL | Russell Mendonca et.al. | 2409.20568v1 | null |
2024-09-30 | MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | Haotian Zhang et.al. | 2409.20566v1 | null |
2024-09-30 | DressRecon: Freeform 4D Human Reconstruction from Monocular Video | Jeff Tan et.al. | 2409.20563v1 | null |
2024-09-30 | LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner | Xiaopan Zhang et.al. | 2409.20560v1 | null |
2024-09-30 | Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos | Md Mohaiminul Islam et.al. | 2409.20557v1 | null |
2024-09-30 | Inverse Painting: Reconstructing The Painting Process | Bowei Chen et.al. | 2409.20556v1 | null |
2024-09-30 | UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models | Qiaojun Yu et.al. | 2409.20551v1 | null |
2024-09-30 | Statistical view of orbital circularisation with 14 000 characterised TESS eclipsing binaries | L. W. IJspeert et.al. | 2409.20540v1 | null |
2024-09-30 | Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers | Lirui Wang et.al. | 2409.20537v1 | link |
2024-09-30 | Dual Encoder GAN Inversion for High-Fidelity 3D Head Reconstruction from Single Images | Bahri Batuhan Bilecen et.al. | 2409.20530v1 | null |
2024-09-27 | PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation | Shaowei Liu et.al. | 2409.18964v1 | link |
2024-09-27 | LML: Language Model Learning a Dataset for Data-Augmented Prediction | Praneeth Vadlapati et.al. | 2409.18957v1 | link |
2024-09-27 | Unconditional stability of a recurrent neural circuit implementing divisive normalization | Shivang Rawat et.al. | 2409.18946v1 | null |
2024-09-27 | From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | Heqing Zou et.al. | 2409.18938v1 | null |
2024-09-27 | Subspace Preserving Quantum Convolutional Neural Network Architectures | Léo Monbroussou et.al. | 2409.18918v1 | null |
2024-09-27 | Improving Visual Object Tracking through Visual Prompting | Shih-Fang Chen et.al. | 2409.18901v1 | link |
2024-09-27 | Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors | Yunlong Lin et.al. | 2409.18899v1 | null |
2024-09-27 | Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models | Zehan Li et.al. | 2409.18878v1 | null |
2024-09-27 | Simulating Dynamic Tumor Contrast Enhancement in Breast MRI using Conditional Generative Adversarial Networks | Richard Osuala et.al. | 2409.18872v1 | null |
2024-09-27 | Fusion Systems and Simple Groups With Class Two Sylow | Martin van Beek et.al. | 2409.18870v1 | null |
2024-09-26 | EgoLM: Multi-Modal Language Model of Egocentric Motions | Fangzhou Hong et.al. | 2409.18127v1 | null |
2024-09-26 | LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness | Chenming Zhu et.al. | 2409.18125v1 | null |
2024-09-26 | RT-GuIDE: Real-Time Gaussian splatting for Information-Driven Exploration | Yuezhan Tao et.al. | 2409.18122v1 | null |
2024-09-26 | Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction | Justin Kerr et.al. | 2409.18121v1 | null |
2024-09-26 | E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding | Ye Liu et.al. | 2409.18111v1 | link |
2024-09-26 | MALPOLON: A Framework for Deep Species Distribution Modeling | Theo Larcher et.al. | 2409.18102v1 | null |
2024-09-26 | Incorporating sparse labels into biologging studies using hidden Markov models with weighted likelihoods | Evan Sidrow et.al. | 2409.18091v1 | null |
2024-09-26 | Stable Video Portraits | Mirela Ostrek et.al. | 2409.18083v1 | null |
2024-09-26 | Graded contractions on the orthogonal Lie algebras of dimensions 7 and 8 | Cristina Draper et.al. | 2409.18069v1 | null |
2024-09-26 | LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field | Huan Wang et.al. | 2409.18057v1 | link |
2024-09-25 | DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion | Yukun Huang et.al. | 2409.17145v1 | null |
2024-09-25 | Streaming Neural Images | Marcos V. Conde et.al. | 2409.17134v1 | null |
2024-09-25 | Assessing the Level of Toxicity Against Distinct Groups in Bangla Social Media Comments: A Comprehensive Investigation | Mukaffi Bin Moin et.al. | 2409.17130v1 | null |
2024-09-25 | Classification of Gleason Grading in Prostate Cancer Histopathology Images Using Deep Learning Techniques: YOLO, Vision Transformers, and Vision Mamba | Amin Malekmohammadi et.al. | 2409.17122v1 | link |
2024-09-25 | Counting Triangles in Triangles | Jim Propp et.al. | 2409.17117v1 | null |
2024-09-25 | BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices | Yongqi Xu et.al. | 2409.17093v1 | link |
2024-09-25 | Accumulator-Aware Post-Training Quantization | Ian Colbert et.al. | 2409.17092v1 | null |
2024-09-25 | Ctrl-GenAug: Controllable Generative Augmentation for Medical Sequence Classification | Xinrui Zhou et.al. | 2409.17091v1 | null |
2024-09-25 | SEN12-WATER: A New Dataset for Hydrological Applications and its Benchmarking | Luigi Russo et.al. | 2409.17087v1 | null |
2024-09-25 | The Effect of Perceptual Metrics on Music Representation Learning for Genre Classification | Tashi Namgyal et.al. | 2409.17069v1 | null |
2024-09-24 | Self-Supervised Any-Point Tracking by Contrastive Random Walks | Ayush Shrivastava et.al. | 2409.16288v1 | link |
2024-09-24 | Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking | Xi Wang et.al. | 2409.16287v1 | null |
2024-09-24 | Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation | Homanga Bharadhwaj et.al. | 2409.16283v1 | null |
2024-09-24 | Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation | Yong Xien Chng et.al. | 2409.16278v1 | null |
2024-09-24 | Compressed Depth Map Super-Resolution and Restoration: AIM 2024 Challenge Results | Marcos V. Conde et.al. | 2409.16277v1 | null |
2024-09-24 | CDChat: A Large Multimodal Model for Remote Sensing Change Description | Mubashir Noman et.al. | 2409.16261v1 | link |
2024-09-24 | Empirically Exploring the Space of Monostationarity in Dual Phosphorylation | May Cai et.al. | 2409.16234v1 | null |
2024-09-24 | VideoPatchCore: An Effective Method to Memorize Normality for Video Anomaly Detection | Sunghyun Ahn et.al. | 2409.16225v1 | link |
2024-09-24 | Upper-body free-breathing Magnetic Resonance Fingerprinting applied to the quantification of water T1 and fat fraction | Constantin Slioussarenko et.al. | 2409.16200v1 | null |
2024-09-24 | Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking | Jun Bai et.al. | 2409.16198v1 | null |
2024-09-20 | Gender Representation and Bias in Indian Civil Service Mock Interviews | Somonnoy Banerjee et.al. | 2409.12194v3 | null |
2024-09-18 | DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control | Zichen Jeff Cui et.al. | 2409.12192v1 | null |
2024-09-18 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Peng Wang et.al. | 2409.12191v1 | link |
2024-09-18 | multiPI-TransBTS: A Multi-Path Learning Framework for Brain Tumor Image Segmentation Based on Multi-Physical Information | Hongjun Zhu et.al. | 2409.12167v1 | link |
2024-09-18 | JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation | Sai Tanmay Reddy Chakkera et.al. | 2409.12156v1 | null |
2024-09-18 | Autopet III challenge: Incorporating anatomical knowledge into nnUNet for lesion segmentation in PET/CT | Hamza Kalisch et.al. | 2409.12155v1 | link |
2024-09-18 | MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion | Kalakonda Sai Shashank et.al. | 2409.12140v1 | null |
2024-09-18 | Mirages in the Energy Landscape of Soft Sphere Packings | Praharsh Suryadevara et.al. | 2409.12113v1 | null |
2024-09-18 | SPRMamba: Surgical Phase Recognition for Endoscopic Submucosal Dissection with Mamba | Xiangning Zhang et.al. | 2409.12108v1 | null |
2024-09-18 | Unveiling the Secrets of New Physics Through Top Quark Tagging | Rameswar Sahu et.al. | 2409.12085v1 | null |
2024-09-17 | Systematic analysis of Parity-Violating modes | Hong-Ming Zhu et.al. | 2409.11400v1 | null |
2024-09-17 | Online 4D Ultrasound-Guided Robotic Tracking Enables 3D Ultrasound Localisation Microscopy with Large Tissue Displacements | Jipeng Yan et.al. | 2409.11391v1 | null |
2024-09-17 | Normalization in Proportional Feature Spaces | Alexandre Benatti et.al. | 2409.11389v1 | null |
2024-09-17 | Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification | Fatema-E- Jannat et.al. | 2409.11375v1 | null |
2024-09-17 | Uncertainty and Prediction Quality Estimation for Semantic Segmentation via Graph Neural Networks | Edgar Heinert et.al. | 2409.11373v1 | null |
2024-09-17 | Compact Implicit Neural Representations for Plane Wave Images | Mathilde Monvoisin et.al. | 2409.11370v1 | null |
2024-09-17 | OSV: One Step is Enough for High-Quality Image to Video Generation | Xiaofeng Mao et.al. | 2409.11367v1 | null |
2024-09-17 | THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Mengfei Liang et.al. | 2409.11353v1 | null |
2024-09-17 | CLIP Adaptation by Intra-modal Overlap Reduction | Alexey Kravets et.al. | 2409.11338v1 | null |
2024-09-17 | LPT++: Efficient Training on Mixture of Long-tailed Experts | Bowen Dong et.al. | 2409.11323v1 | null |
2024-09-16 | Enhancing Video Transmission with Machine Learning based Routing in Software-Defined Networks | Anıl Dursun İpek et.al. | 2409.10512v1 | null |
2024-09-16 | Exploring 3D Face Reconstruction and Fusion Methods for Face Verification: A Case-Study in Video Surveillance | Simone Maurizio La Cava et.al. | 2409.10481v1 | null |
2024-09-16 | Real-Time Whole-Body Control of Legged Robots with Model-Predictive Path Integral Control | Juan Alvarez-Padilla et.al. | 2409.10469v1 | null |
2024-09-16 | Assortativity in sympatric speciation and species classification | Joao U. F. Lizarraga et.al. | 2409.10466v1 | null |
2024-09-16 | Kolmogorov-Arnold Networks in Low-Data Regimes: A Comparative Study with Multilayer Perceptrons | Farhad Pourkamali-Anaraki et.al. | 2409.10463v1 | null |
2024-09-16 | Deep-Wide Learning Assistance for Insect Pest Classification | Toan Nguyen et.al. | 2409.10445v1 | link |
2024-09-16 | A point process approach for the classification of noisy calcium imaging data | Arianna Burzacchi et.al. | 2409.10409v1 | null |
2024-09-16 | MOST: MR reconstruction Optimization for multiple downStream Tasks via continual learning | Hwihun Jeong et.al. | 2409.10394v1 | link |
2024-09-16 | Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning | Amin Karimi Monsefi et.al. | 2409.10362v1 | null |
2024-09-16 | 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation? | Téo Guichoux et.al. | 2409.10357v1 | null |
2024-09-13 | An Efficient and Streaming Audio Visual Active Speaker Detection System | Arnav Kundu et.al. | 2409.09018v1 | null |
2024-09-13 | Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation | Qingwen Bu et.al. | 2409.09016v1 | link |
2024-09-13 | Model-independent variable selection via the rule-based variable priority | Min Lu et.al. | 2409.09003v1 | null |
2024-09-13 | Biomimetic Frontend for Differentiable Audio Processing | Ruolan Leslie Famularo et.al. | 2409.08997v1 | link |
2024-09-13 | Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems | Yan-Martin Tamm et.al. | 2409.08987v1 | link |
2024-09-13 | Fast DCT+: A Family of Fast Transforms Based on Rank-One Updates of the Path Graph | Samuel Fernández-Menduiña et.al. | 2409.08970v1 | null |
2024-09-13 | Pushing the boundaries of event subsampling in event-based video classification using CNNs | Hesam Araghi et.al. | 2409.08953v1 | link |
2024-09-13 | Pushing Joint Image Denoising and Classification to the Edge | Thomas C Markhorst et.al. | 2409.08943v1 | null |
2024-09-13 | LLM-based Weak Supervision Framework for Query Intent Classification in Video Search | Farnoosh Javadi et.al. | 2409.08931v1 | null |
2024-09-13 | Classification of electronic structures and state preparation for quantum computation of reaction chemistry | Maximilian Mörchen et.al. | 2409.08910v1 | null |
2024-09-12 | Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor | Andrea Conti et.al. | 2409.08277v1 | null |
2024-09-12 | Hand-Object Interaction Pretraining from Videos | Himanshu Gaurav Singh et.al. | 2409.08273v1 | null |
2024-09-12 | DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer | Runjia Li et.al. | 2409.08271v1 | null |
2024-09-12 | OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering | Jiahao Nick Li et.al. | 2409.08250v1 | null |
2024-09-12 | A review of compact geodesic orbit manifolds and the g.o. condition for | Andreas Arvanitoyeorgos et.al. | 2409.08247v1 | null |
2024-09-12 | Model Ensemble for Brain Tumor Segmentation in Magnetic Resonance Imaging | Daniel Capellán-Martín et.al. | 2409.08232v1 | null |
2024-09-12 | CliquePH: Higher-Order Information for Graph Neural Networks through Persistent Homology on Clique Graphs | Davide Buffelli et.al. | 2409.08217v1 | null |
2024-09-12 | LT3SD: Latent Trees for 3D Scene Diffusion | Quan Meng et.al. | 2409.08215v1 | null |
2024-09-12 | Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video | Boxiang Rong et.al. | 2409.08189v1 | null |
2024-09-13 | Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification | Soufiyan Bahadi et.al. | 2409.08188v2 | null |
2024-09-11 | Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Haibo Yang et.al. | 2409.07452v1 | link |
2024-09-11 | VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos | Yan-Bo Lin et.al. | 2409.07450v1 | null |
2024-09-11 | Autonomous loading of ore piles with Load-Haul-Dump machines using Deep Reinforcement Learning | Rodrigo Salas et.al. | 2409.07449v1 | null |
2024-09-11 | StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos | Sijie Zhao et.al. | 2409.07447v1 | null |
2024-09-11 | Deep Neural Network-Based Sign Language Recognition: A Comprehensive Approach Using Transfer Learning with Explainability | A. E. M Ridwan et.al. | 2409.07426v1 | null |
2024-09-11 | Controllable retinal image synthesis using conditional StyleGAN and latent space manipulation for improved diagnosis and grading of diabetic retinopathy | Somayeh Pakdelmoez et.al. | 2409.07422v1 | null |
2024-09-11 | Efficient One-Step Diffusion Refinement for Snapshot Compressive Imaging | Yunzhen Wang et.al. | 2409.07417v1 | null |
2024-09-11 | NVRC: Neural Video Representation Compression | Ho Man Kwan et.al. | 2409.07414v1 | null |
2024-09-12 | Robust Robot Walker: Learning Agile Locomotion over Tiny Traps | Shaoting Zhu et.al. | 2409.07409v2 | null |
2024-09-11 | Revisiting Static Feature-Based Android Malware Detection | Md Tanvirul Alam et.al. | 2409.07397v1 | null |
2024-09-10 | A study on Deep Convolutional Neural Networks, Transfer Learning and Ensemble Model for Breast Cancer Detection | Md Taimur Ahad et.al. | 2409.06699v1 | null |
2024-09-10 | DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images | Taslim Murad et.al. | 2409.06694v1 | null |
2024-09-10 | Benchmarking Sub-Genre Classification For Mainstage Dance Music | Hongzhi Shu et.al. | 2409.06690v1 | null |
2024-09-10 | A comprehensive study on Blood Cancer detection and classification using Convolutional Neural Network | Md Taimur Ahad et.al. | 2409.06689v1 | null |
2024-09-10 | A study on deep feature extraction to detect and classify Acute Lymphoblastic Leukemia (ALL) | Sabit Ahamed Preanto et.al. | 2409.06687v1 | null |
2024-09-10 | Constructing an Interpretable Deep Denoiser by Unrolling Graph Laplacian Regularizer | Seyed Alireza Hosseini et.al. | 2409.06676v1 | null |
2024-09-10 | Bulk and atmospheric metallicities as direct probes of sequentially varying accretion mechanisms of gas and solids onto planets | Yasuhiro Hasegawa et.al. | 2409.06670v1 | null |
2024-09-10 | Data Collection-free Masked Video Modeling | Yuchi Ishikawa et.al. | 2409.06665v1 | null |
2024-09-10 | World-Grounded Human Motion Recovery via Gravity-View Coordinates | Zehong Shen et.al. | 2409.06662v1 | null |
2024-09-10 | Classifying Functions via growth rates of repeated iterations | Titus Hilberdink et.al. | 2409.06661v1 | null |
2024-09-09 | Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments | Haritheja Etukuru et.al. | 2409.05865v1 | null |
2024-09-09 | Neural MP: A Generalist Neural Motion Planner | Murtaza Dalal et.al. | 2409.05864v1 | null |
2024-09-09 | LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation | Henghui Ding et.al. | 2409.05847v1 | null |
2024-09-10 | Finite-size topological phases from semimetals | Adipta Pal et.al. | 2409.05842v2 | null |
2024-09-09 | Fast Generation of Custom Floating-Point Spatial Filters on FPGAs | Nelson Campos et.al. | 2409.05837v1 | null |
2024-09-09 | Limits on the computational expressivity of non-equilibrium biophysical processes | Carlos Floyd et.al. | 2409.05827v1 | null |
2024-09-09 | A Flexible Framework for Universal Computational Aberration Correction via Automatic Lens Library Generation and Domain Adaptation | Qi Jiang et.al. | 2409.05809v1 | null |
2024-09-09 | A CLIP-based siamese approach for meme classification | Javier Huertas-Tato et.al. | 2409.05772v1 | null |
2024-09-09 | Consensus-based Distributed Quantum Kernel Learning for Speech Recognition | Kuan-Cheng Chen et.al. | 2409.05770v1 | null |
2024-09-09 | A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR | Giovanni Morrone et.al. | 2409.05750v1 | null |
2024-09-06 | Synergy and Synchrony in Couple Dances | Vongani Maluleke et.al. | 2409.04440v1 | null |
2024-09-06 | VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation | Yecheng Wu et.al. | 2409.04429v1 | null |
2024-09-06 | Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques | Davide Clode da Silva et.al. | 2409.04424v1 | null |
2024-09-06 | Virtual Reality-Based Preoperative Planning for Optimized Trocar Placement in Thoracic Surgery: A Preliminary Study | Arash Harirpoush et.al. | 2409.04414v1 | null |
2024-09-06 | Quantum Kernel Methods under Scrutiny: A Benchmarking Study | Jan Schnabel et.al. | 2409.04406v1 | null |
2024-09-09 | Question-Answering Dense Video Events | Hangyu Qin et.al. | 2409.04388v2 | null |
2024-09-06 | Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior | Charlesquin Kemajou Mbakam et.al. | 2409.04384v1 | null |
2024-09-06 | Enhancing Skin Lesion Diagnosis with Ensemble Learning | Xiaoyi Liu et.al. | 2409.04381v1 | null |
2024-09-06 | Tykhyy's Conjecture on finite mapping class group orbits | Samuel Bronstein et.al. | 2409.04379v1 | null |
2024-09-06 | The Impact of Scanner Domain Shift on Deep Learning Performance in Medical Imaging: an Experimental Study | Gregory Szumel et.al. | 2409.04368v1 | null |
2024-09-05 | Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | Yunze Man et.al. | 2409.03757v1 | link |
2024-09-05 | Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron | Christian Schmid et.al. | 2409.03749v1 | null |
2024-09-05 | Orbital Support and Evolution of CX/OX Structures in Boxy/Peanut Bars | Behzad Tahmasebzadeh et.al. | 2409.03746v1 | null |
2024-09-05 | Libra: Architectural Support For Principled, Secure And Efficient Balanced Execution On High-End Processors (Extended Version) | Hans Winderix et.al. | 2409.03743v1 | null |
2024-09-05 | Classification and Prediction of Heart Diseases using Machine Learning Algorithms | Akua Sekyiwaa Osei-Nkwantabisa et.al. | 2409.03697v1 | null |
2024-09-05 | View-Invariant Policy Learning via Zero-Shot Novel View Synthesis | Stephen Tian et.al. | 2409.03685v1 | null |
2024-09-05 | Threat Classification on Deployed Optical Networks Using MIMO Digital Fiber Sensing, Wavelets, and Machine Learning | Khouloud Abdelli et.al. | 2409.03667v1 | null |
2024-09-05 | Limited but consistent gains in adversarial robustness by co-training object recognition models with human EEG | Manshan Guo et.al. | 2409.03646v1 | null |
2024-09-05 | Variance reduction in Texas hold'em and in video poker | Stewart N. Ethier et.al. | 2409.03607v1 | null |
2024-09-05 | SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing | Lingyu Xiong et.al. | 2409.03605v1 | null |
2024-09-04 | SITAR: Semi-supervised Image Transformer for Action Recognition | Owais Iqbal et.al. | 2409.02910v1 | null |
2024-09-04 | GraphTrials: Visual Proofs of Graph Properties | Henry Förster et.al. | 2409.02907v1 | null |
2024-09-04 | Classification of spin-$1/2$ fermionic quantum spin liquids on the trillium lattice | Ming-Hao Li et.al. | 2409.02898v1 | null |
2024-09-04 | LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture | Xidong Wang et.al. | 2409.02889v1 | link |
2024-09-04 | CanvOI, an Oncology Intelligence Foundation Model: Scaling FLOPS Differently | Jonathan Zalach et.al. | 2409.02885v1 | null |
2024-09-04 | Look Into the LITE in Deep Learning for Time Series Classification | Ali Ismail-Fawaz et.al. | 2409.02869v1 | null |
2024-09-04 | Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models | Zhibin Liu et.al. | 2409.02851v1 | null |
2024-09-04 | iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation | Hayeon Jo et.al. | 2409.02838v1 | null |
2024-09-04 | Evolution of radiation profiles in a strongly baffled divertor on MAST Upgrade | Fabio Federici et.al. | 2409.02837v1 | null |
2024-09-04 | Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models | Moein Shahiki Tash et.al. | 2409.02836v1 | null |
2024-08-30 | Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding | Gueter Josmy Faure et.al. | 2408.17443v1 | link |
2024-08-30 | SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists | Raoyuan Zhao et.al. | 2408.17437v1 | link |
2024-08-30 | CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion | Yiran Chen et.al. | 2408.17424v1 | null |
2024-09-03 | Open-vocabulary Temporal Action Localization using VLMs | Naoki Wake et.al. | 2408.17422v2 | null |
2024-08-30 | Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes | Li Zhang et.al. | 2408.17421v1 | link |
2024-08-30 | End-to-End Learning for Task-Oriented Semantic Communications Over MIMO Channels: An Information-Theoretic Framework | Chang Cai et.al. | 2408.17397v1 | null |
2024-08-30 | Equivariant isomorphism of Quantum Lens Spaces of low dimension | Søren Eilers et.al. | 2408.17386v1 | null |
2024-08-30 | LASSO-MOGAT: A Multi-Omics Graph Attention Framework for Cancer Classification | Fadi Alharbi et.al. | 2408.17384v1 | null |
2024-08-30 | Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain | Francesca Grasso et.al. | 2408.17362v1 | link |
2024-08-30 | Enhancing Underwater Imaging with 4-D Light Fields: Dataset and Method | Yuji Lin et.al. | 2408.17339v1 | null |
2024-08-29 | SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners | Ziyu Guo et.al. | 2408.16768v1 | link |
2024-08-29 | ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model | Fangfu Liu et.al. | 2408.16767v1 | null |
2024-08-29 | OmniRe: Omni Urban Scene Reconstruction | Ziyu Chen et.al. | 2408.16760v1 | null |
2024-08-29 | Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge | Beidi Dong et.al. | 2408.16749v1 | null |
2024-08-29 | Automatic detection of Mild Cognitive Impairment using high-dimensional acoustic features in spontaneous speech | Cong Zhang et.al. | 2408.16732v1 | null |
2024-08-29 | VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation | Shiwei Wu et.al. | 2408.16730v1 | null |
2024-08-29 | Prediction-Feedback DETR for Temporal Action Detection | Jihwan Kim et.al. | 2408.16729v1 | null |
2024-08-29 | A GREAT Architecture for Edge-Based Graph Problems Like TSP | Attila Lischka et.al. | 2408.16717v1 | null |
2024-08-29 | One-Shot Learning Meets Depth Diffusion in Multi-Object Videos | Anisha Jain et.al. | 2408.16704v1 | null |
2024-08-29 | RoboMNIST: A Multimodal Dataset for Multi-Robot Activity Recognition Using WiFi Sensing, Video, and Audio | Kian Behzad et.al. | 2408.16703v1 | null |
2024-08-29 | Spatio-Temporal Context Prompting for Zero-Shot Action Detection | Wei-Jhe Huang et.al. | 2408.15996v2 | null |
2024-08-28 | TEDRA: Text-based Editing of Dynamic and Photoreal Actors | Basavaraj Sunagad et.al. | 2408.15995v1 | null |
2024-08-28 | Minimizing movements solutions for a monotone model of droplet motion | Carson Collins et.al. | 2408.15984v1 | null |
2024-08-28 | VLT/MUSE detection of accretion-ejection associated with the close stellar companion in the HT Lup system | Sebastián Jorquera et.al. | 2408.15976v1 | null |
2024-08-28 | 1+1d SPT phases with fusion category symmetry: interface modes and non-abelian Thouless pump | Kansei Inamura et.al. | 2408.15960v1 | null |
2024-08-28 | Generating Binary Species Range Maps | Filip Dorm et.al. | 2408.15956v1 | null |
2024-08-28 | Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games | Nicholas R. Waytowich et.al. | 2408.15950v1 | null |
2024-08-28 | Auxiliary Input in Training: Incorporating Catheter Features into Deep Learning Models for ECG-Free Dynamic Coronary Roadmapping | Yikang Liu et.al. | 2408.15947v1 | null |
2024-08-28 | A latticed total K-theory | Qingnan An et.al. | 2408.15941v1 | null |
2024-08-28 | Local Descriptors Weighted Adaptive Threshold Filtering For Few-Shot Learning | Bingchen Yan et.al. | 2408.15924v1 | null |
2024-08-27 | GenRec: Unifying Video Generation and Recognition with Diffusion Models | Zejia Weng et.al. | 2408.15241v1 | null |
2024-08-27 | Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation | Xiaojuan Wang et.al. | 2408.15239v1 | null |
2024-08-27 | DCT-CryptoNets: Scaling Private Inference in the Frequency Domain | Arjun Roy et.al. | 2408.15231v1 | null |
2024-08-27 | SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for Annotating Medical Images | Zafer Yildiz et.al. | 2408.15224v1 | link |
2024-08-27 | Histo-Diffusion: A Diffusion Super-Resolution Method for Digital Pathology with Comprehensive Quality Assessment | Xuan Xu et.al. | 2408.15218v1 | null |
2024-08-27 | Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance | Weiyi Zhang et.al. | 2408.15217v1 | null |
2024-08-27 | Classifying populist language in American presidential and governor speeches using automatic text analysis | Olaf van der Veen et.al. | 2408.15213v1 | null |
2024-08-27 | Sec2Sec Co-attention for Video-Based Apparent Affective Prediction | Mingwei Sun et.al. | 2408.15209v1 | link |
2024-08-27 | Automatic 8-tissue Segmentation for 6-month Infant Brains | Yilan Dong et.al. | 2408.15198v1 | null |
2024-08-27 | Infusing Acoustic Pause Context into Text-Based Dementia Assessment | Franziska Braun et.al. | 2408.15188v1 | null |
2024-08-26 | Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos | Qirui Chen et.al. | 2408.14469v1 | null |
2024-08-26 | K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences | Zhikai Li et.al. | 2408.14468v1 | null |
2024-08-26 | Reconstructing physiological signals from fMRI across the adult lifespan | Shiyu Wang et.al. | 2408.14453v1 | null |
2024-08-26 | Model Parallel Training and Transfer Learning for Convolutional Neural Networks by Domain Decomposition | Axel Klawonn et.al. | 2408.14442v1 | null |
2024-08-26 | Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification | Mahrukh Awan et.al. | 2408.14441v1 | null |
2024-08-26 | Radiance Cascades: A Novel High-Resolution Formal Solution for Multidimensional Non-LTE Radiative Transfer | Christopher M. J. Osborne et.al. | 2408.14425v1 | null |
2024-08-26 | Learning Tree-Structured Composition of Data Augmentation | Dongyue Li et.al. | 2408.14381v1 | link |
2024-08-26 | Probing Causality Manipulation of Large Language Models | Chenyang Zhang et.al. | 2408.14380v1 | link |
2024-08-26 | GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy | Peiyan Li et.al. | 2408.14368v1 | null |
2024-08-26 | An Embedding is Worth a Thousand Noisy Labels | Francesco Di Salvo et.al. | 2408.14358v1 | null |
2024-08-23 | Ensemble Modeling of Multiple Physical Indicators to Dynamically Phenotype Autism Spectrum Disorder | Marie Huynh et.al. | 2408.13255v1 | null |
2024-08-23 | Domain-specific long text classification from sparse relevant information | Célia D'Cruz et.al. | 2408.13253v1 | null |
2024-08-23 | CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities | Tao Wu et.al. | 2408.13239v1 | null |
2024-08-23 | D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching | Jingyu Liu et.al. | 2408.13226v1 | null |
2024-08-23 | ResSR: A Residual Approach to Super-Resolving Multispectral Images | Haley Duba-Sullivan et.al. | 2408.13225v1 | null |
2024-08-23 | EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods | Hongcheng Ding et.al. | 2408.13214v1 | null |
2024-08-23 | Instruct-DeBERTa: A Hybrid Approach for Aspect-based Sentiment Analysis on Textual Reviews | Dineth Jayakody et.al. | 2408.13202v1 | null |
2024-08-23 | EAViT: External Attention Vision Transformer for Audio Classification | Aquib Iqbal et.al. | 2408.13201v1 | null |
2024-08-23 | Deep Learning for Lung Disease Classification Using Transfer Learning and a Customized CNN Architecture with Attention | Xiaoyi Liu et.al. | 2408.13180v1 | null |
2024-08-23 | Augmented Functional Random Forests: Classifier Construction and Unbiased Functional Principal Components Importance through Ad-Hoc Conditional Permutations | Fabrizio Maturo et.al. | 2408.13179v1 | null |
2024-08-22 | Automating Deformable Gasket Assembly | Simeon Adebola et.al. | 2408.12593v1 | null |
2024-08-22 | xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations | Can Qin et.al. | 2408.12590v1 | null |
2024-08-22 | Real-Time Video Generation with Pyramid Attention Broadcast | Xuanlei Zhao et.al. | 2408.12588v1 | link |
2024-08-22 | Enhanced Parking Perception by Multi-Task Fisheye Cross-view Transformers | Antonyo Musabini et.al. | 2408.12575v1 | null |
2024-08-22 | MuMA-ToM: Multi-modal Multi-Agent Theory of Mind | Haojun Shi et.al. | 2408.12574v1 | null |
2024-08-22 | Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers | Sayed Mohammad Vakilzadeh Hatefi et.al. | 2408.12568v1 | null |
2024-08-22 | ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation | Lujia Zhong et.al. | 2408.12561v1 | link |
2024-08-22 | Exploring the Role of Audio in Multimodal Misinformation Detection | Moyang Liu et.al. | 2408.12558v1 | null |
2024-08-22 | Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge | Jun Ma et.al. | 2408.12534v1 | null |
2024-08-22 | UMAD: University of Macau Anomaly Detection Benchmark Dataset | Dong Li et.al. | 2408.12527v1 | link |
2024-08-21 | Great Memory, Shallow Reasoning: Limits of $k$NN-LMs | Shangyi Geng et.al. | 2408.11815v1 | link |
2024-08-21 | EmbodiedSAM: Online Segment Any 3D Thing in Real Time | Xiuwei Xu et.al. | 2408.11811v1 | null |
2024-08-21 | Approaching Deep Learning through the Spectral Dynamics of Weights | David Yunis et.al. | 2408.11804v1 | link |
2024-08-21 | Practical token pruning for foundation models in few-shot conversational virtual assistant systems | Haode Qi et.al. | 2408.11799v1 | null |
2024-08-21 | Critique-out-Loud Reward Models | Zachary Ankner et.al. | 2408.11791v1 | link |
2024-08-21 | DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework | Zhifei Xie et.al. | 2408.11788v1 | null |
2024-08-21 | NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation | Zhenye Lou et.al. | 2408.11787v1 | link |
2024-08-21 | Timeline and Boundary Guided Diffusion Network for Video Shadow Detection | Haipeng Zhou et.al. | 2408.11785v1 | link |
2024-08-21 | SBDet: A Symmetry-Breaking Object Detector via Relaxed Rotation-Equivariance | Zhiqiang Wu et.al. | 2408.11760v1 | null |
2024-08-21 | Improving the Scan-rescan Precision of AI-based CMR Biomarker Estimation | Dewmini Hasara Wickremasinghe et.al. | 2408.11754v1 | null |
2024-08-20 | Discriminant Analysis in stationary time series based on robust cepstral coefficients | Jonathan de Souza Matias et.al. | 2408.11012v1 | null |
2024-08-20 | Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos | Dennis Fedorishin et.al. | 2408.10998v1 | null |
2024-08-20 | Denoising Plane Wave Ultrasound Images Using Diffusion Probabilistic Models | Hojat Asgariandehkordi et.al. | 2408.10987v1 | null |
2024-08-20 | ISLES'24: Improving final infarct prediction in ischemic stroke using multimodal imaging and clinical data | Ezequiel de la Rosa et.al. | 2408.10966v1 | null |
2024-08-20 | Multichannel Attention Networks with Ensembled Transfer Learning to Recognize Bangla Handwritten Charecter | Farhanul Haque et.al. | 2408.10955v1 | null |
2024-08-20 | Wave-Mask/Mix: Exploring Wavelet-Based Augmentations for Time Series Forecasting | Dona Arabi et.al. | 2408.10951v1 | link |
2024-08-20 | Proxona: Leveraging LLM-Driven Personas to Enhance Creators' Understanding of Their Audience | Yoonseo Choi et.al. | 2408.10937v1 | null |
2024-08-20 | SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement | Linlin Hu et.al. | 2408.10934v1 | null |
2024-08-20 | ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining | Qi Ma et.al. | 2408.10906v1 | null |
2024-08-20 | ViLReF: A Chinese Vision-Language Retinal Foundation Model | Shengzhu Yang et.al. | 2408.10894v1 | link |
2024-08-19 | Some model theory of quadratic geometries | Charlotte Kestner et.al. | 2408.10196v1 | null |
2024-08-19 | Area under the ROC Curve has the Most Consistent Evaluation for Binary Classification | Jing Li et.al. | 2408.10193v1 | null |
2024-08-20 | LongVILA: Scaling Long-Context Visual Language Models for Long Videos | Fuzhao Xue et.al. | 2408.10188v2 | link |
2024-08-19 | SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | Anke Tang et.al. | 2408.10174v1 | link |
2024-08-19 | Galaxy Zoo: Morphologies based on UKIDSS NIR Imaging for 71,052 Galaxies | Karen L. Masters et.al. | 2408.10160v1 | null |
2024-08-19 | Structure-preserving Image Translation for Depth Estimation in Colonoscopy Video | Shuxian Wang et.al. | 2408.10153v1 | null |
2024-08-19 | Biharmonic conformal immersions into a 3-dimensional conformally flat space | Ze-Ping Wang et.al. | 2408.10144v1 | null |
2024-08-19 | Perceptual Depth Quality Assessment of Stereoscopic Omnidirectional Images | Wei Zhou et.al. | 2408.10134v1 | null |
2024-08-19 | UNINEXT-Cutie: The 1st Solution for LSVOS Challenge RVOS Track | Hao Fang et.al. | 2408.10129v1 | null |
2024-08-19 | Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track | Feiyu Pan et.al. | 2408.10125v1 | null |
2024-08-16 | Quantum Annealing for Enhanced Feature Selection in Single-Cell RNA Sequencing Data Analysis | Selim Romero et.al. | 2408.08867v1 | null |
2024-08-16 | DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models | Eman Ali et.al. | 2408.08855v1 | null |
2024-08-16 | ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis | Yubao Zhao et.al. | 2408.08849v1 | null |
2024-08-16 | HistoGym: A Reinforcement Learning Environment for Histopathological Image Analysis | Zhi-Bo Liu et.al. | 2408.08847v1 | link |
2024-08-16 | LEVIS: Large Exact Verifiable Input Spaces for Neural Networks | Mohamad Fares El Hajj Chehade et.al. | 2408.08824v1 | null |
2024-08-16 | Optimal Symmetries in Binary Classification | Vishal S. Ngairangbam et.al. | 2408.08823v1 | null |
2024-08-16 | Leveraging FourierKAN Classification Head for Pre-Trained Transformer-based Text Classification | Abdullah Al Imran et.al. | 2408.08803v1 | null |
2024-08-16 | Xpikeformer: Hybrid Analog-Digital Hardware Acceleration for Spiking Transformers | Zihang Song et.al. | 2408.08794v1 | null |
2024-08-16 | Assessing Generalization Capabilities of Malaria Diagnostic Models from Thin Blood Smears | Louise Guillon et.al. | 2408.08792v1 | null |
2024-08-16 | A Disease-Specific Foundation Model Using Over 100K Fundus Images: Release and Validation for Abnormality and Multi-Disease Classification on Downstream Tasks | Boa Jang et.al. | 2408.08790v1 | link |
2024-08-15 | HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning | Hongyu Li et.al. | 2408.08312v1 | null |
2024-08-15 | Gauge-invariant optical selection rules for excitons | Tharindu Fernando et.al. | 2408.08311v1 | null |
2024-08-15 | Accelerated Image-Aware Generative Diffusion Modeling | Tanmay Asthana et.al. | 2408.08306v1 | null |
2024-08-15 | SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training | Gengwei Zhang et.al. | 2408.08295v1 | link |
2024-08-15 | Marker or Markerless? Mode-Switchable Optical Tactile Sensing for Diverse Robot Tasks | Ni Ou et.al. | 2408.08276v1 | null |
2024-08-15 | Snuffy: Efficient Whole Slide Image Classifier | Hossein Jafarinia et.al. | 2408.08258v1 | link |
2024-08-15 | Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective | Zixuan Pan et.al. | 2408.08228v1 | link |
2024-08-15 | RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science | David Farr et.al. | 2408.08217v1 | null |
2024-08-15 | Moving Healthcare AI-Support Systems for Visually Detectable Diseases onto Constrained Devices | Tess Watt et.al. | 2408.08215v1 | null |
2024-08-15 | Learned Multimodal Compression for Autonomous Driving | Hadi Hadizadeh et.al. | 2408.08211v1 | null |
2024-08-14 | End-to-end Semantic-centric Video-based Multimodal Affective Computing | Ronghao Lin et.al. | 2408.07694v1 | null |
2024-08-15 | A Spitting Image: Modular Superpixel Tokenization in Vision Transformers | Marius Aasan et.al. | 2408.07680v2 | link |
2024-08-14 | G$^2$V$^2$former: Graph Guided Video Vision Transformer for Face Anti-Spoofing | Jingyi Yang et.al. | 2408.07675v1 | null |
2024-08-14 | Graph Triple Attention Network: A Decoupled Perspective | Xiaotang Wang et.al. | 2408.07654v1 | link |
2024-08-14 | Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving | Yuqing Wen et.al. | 2408.07605v1 | null |
2024-08-14 | Disentangle and denoise: Tackling context misalignment for video moment retrieval | Kaijing Ma et.al. | 2408.07600v1 | null |
2024-08-14 | Theoretical and Practical Progress in Hyperspectral Pixel Unmixing with Large Spectral Libraries from a Sparse Perspective | Jade Preston et.al. | 2408.07580v1 | null |
2024-08-14 | TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases | Thibault Simonetto et.al. | 2408.07579v1 | link |
2024-08-14 | DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model | Erez Yosef et.al. | 2408.07541v1 | null |
2024-08-14 | Improved 3D Whole Heart Geometry from Sparse CMR Slices | Yiyang Xu et.al. | 2408.07532v1 | link |
2024-08-13 | On Networks and their Applications: Stability of Gene Regulatory Networks and Gene Function Prediction using Autoencoders | Hamza Coban et.al. | 2408.07064v1 | null |
2024-08-13 | Subjective and Objective Quality Assessment of Rendered Human Avatar Videos in Virtual Reality | Yu-Chih Chen et.al. | 2408.07041v1 | null |
2024-08-13 | PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology | Xiaomin Wu et.al. | 2408.07037v1 | null |
2024-08-13 | Feature-Preserving Rate-Distortion Optimization in Image Coding for Machines | Samuel Fernández Menduiña et.al. | 2408.07028v1 | null |
2024-08-13 | Event-Stream Super Resolution using Sigma-Delta Neural Network | Waseem Shariff et.al. | 2408.06968v1 | null |
2024-08-13 | DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs | Dongyuan Li et.al. | 2408.06966v1 | null |
2024-08-13 | OpenResearcher: Unleashing AI for Accelerated Scientific Research | Yuxiang Zheng et.al. | 2408.06941v1 | link |
2024-08-13 | Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification | Bauke Arends et.al. | 2408.06930v1 | null |
2024-08-13 | Divide and Conquer: Improving Multi-Camera 3D Perception with 2D Semantic-Depth Priors and Input-Dependent Queries | Qi Song et.al. | 2408.06901v1 | null |
2024-08-13 | Entendre, a Social Bot Detection Tool for Niche, Fringe, and Extreme Social Media | Pranav Venkatesh et.al. | 2408.06900v1 | null |
2024-08-12 | Is it a work or leisure travel? Applying text classification to identify work-related travel on social networks | Lucas Félix et.al. | 2408.06341v1 | null |
2024-08-12 | Moo-ving Beyond Tradition: Revolutionizing Cattle Behavioural Phenotyping with Pose Estimation Techniques | Navid Ghassemi et.al. | 2408.06336v1 | null |
2024-08-12 | LOLgorithm: Integrating Semantic, Syntactic and Contextual Elements for Humor Classification | Tanisha Khurana et.al. | 2408.06335v1 | null |
2024-08-12 | From SAM to SAM 2: Exploring Improvements in Meta's Segment Anything Model | Athulya Sundaresan Geetha et.al. | 2408.06305v1 | null |
2024-08-12 | Sparsity Based Multi-Source Robust 3D Localization Using a Moving Receiver | Amir Mansourian et.al. | 2408.06274v1 | null |
2024-08-12 | Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance | Manuel Milling et.al. | 2408.06264v1 | null |
2024-08-12 | Deep Learning System Boundary Testing through Latent Space Style Mixing | Amr Abdellatif et.al. | 2408.06258v1 | null |
2024-08-12 | Rethinking Video with a Universal Event-Based Representation | Andrew Freeman et.al. | 2408.06248v1 | null |
2024-08-12 | A Comprehensive Case Study on the Performance of Machine Learning Methods on the Classification of Solar Panel Electroluminescence Images | Xinyi Song et.al. | 2408.06229v1 | link |
2024-08-12 | ARCADE: An Augmented Reality Display Environment for Multimodal Interaction with Conversational Agents | Carolin Schindler et.al. | 2408.06222v1 | null |
2024-08-09 | VITA: Towards Open-Source Interactive Omni Multimodal LLM | Chaoyou Fu et.al. | 2408.05211v1 | null |
2024-08-09 | Kalman-Inspired Feature Propagation for Video Face Super-Resolution | Ruicheng Feng et.al. | 2408.05205v1 | null |
2024-08-09 | HistoKernel: Whole Slide Image Level Maximum Mean Discrepancy Kernels for Pan-Cancer Predictive Modelling | Piotr Keller et.al. | 2408.05195v1 | link |
2024-08-09 | Cross-Domain Learning for Video Anomaly Detection with Limited Supervision | Yashika Jain et.al. | 2408.05191v1 | null |
2024-08-09 | Holomorphic vector fields with real integral manifolds | Martin Kolář et.al. | 2408.05186v1 | null |
2024-08-09 | MADE-WIC: Multiple Annotated Datasets for Exploring Weaknesses In Code | Moritz Mock et.al. | 2408.05163v1 | null |
2024-08-09 | Meta-Learning Guided Label Noise Distillation for Robust Signal Modulation Classification | Xiaoyang Hao et.al. | 2408.05151v1 | null |
2024-08-09 | Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video | Chunggi Lee et.al. | 2408.05123v1 | null |
2024-08-09 | Cautious Calibration in Binary Classification | Mari-Liis Allikivi et.al. | 2408.05120v1 | null |
2024-08-09 | Beyond the Eye: A Relational Model for Early Dementia Detection Using Retinal OCTA Images | Shouyue Liu et.al. | 2408.05117v1 | null |
2024-08-08 | Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics | Ruining Li et.al. | 2408.04631v1 | null |
2024-08-08 | LogogramNLP: Comparing Visual and Textual Representations of Ancient Logographic Writing Systems for NLP | Danlu Chen et.al. | 2408.04628v1 | null |
2024-08-08 | Transformer Explainer: Interactive Learning of Text-Generative Models | Aeree Cho et.al. | 2408.04619v1 | null |
2024-08-08 | Quantifying the Impact of Population Shift Across Age and Sex for Abdominal Organ Segmentation | Kate Čevora et.al. | 2408.04610v1 | null |
2024-08-08 | Enhanced Prototypical Part Network (EPPNet) For Explainable Image Classification Via Prototypes | Bhushan Atote et.al. | 2408.04606v1 | null |
2024-08-08 | SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation | Jieming Yu et.al. | 2408.04593v1 | null |
2024-08-08 | Learn To Learn More Precisely | Runxi Cheng et.al. | 2408.04590v1 | null |
2024-08-08 | SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals | Haoran Zheng et.al. | 2408.04575v1 | null |
2024-08-08 | Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches | Yongzhi Xu et.al. | 2408.04567v1 | null |
2024-08-08 | MemeMind at ArAIEval Shared Task: Spotting Persuasive Spans in Arabic Text with Persuasion Techniques Identification | Md Rafiul Biswas et.al. | 2408.04540v1 | null |
2024-08-07 | How Well Can Vision Language Models See Image Details? | Chenhui Gou et.al. | 2408.03940v1 | null |
2024-08-07 | Fast Sprite Decomposition from Animated Graphics | Tomoyuki Suzuki et.al. | 2408.03923v1 | null |
2024-08-07 | FMiFood: Multi-modal Contrastive Learning for Food Image Classification | Xinyue Pan et.al. | 2408.03922v1 | null |
2024-08-07 | Holomorphic foliations tangent to Rolle-pfaffian hypersurfaces | Arturo Fernández-Pérez et.al. | 2408.03914v1 | null |
2024-08-07 | AdapMTL: Adaptive Pruning Framework for Multitask Learning Model | Mingcan Xiang et.al. | 2408.03913v1 | null |
2024-08-07 | Achieving Human Level Competitive Robot Table Tennis | David B. D'Ambrosio et.al. | 2408.03906v1 | null |
2024-08-07 | Lightweight Video Denoising Using a Classic Bayesian Backbone | Clément Bled et.al. | 2408.03904v1 | null |
2024-08-07 | Retrieval Augmentation via User Interest Clustering | Hanjia Lyu et.al. | 2408.03886v1 | null |
2024-08-07 | Global-Local Progressive Integration Network for Blind Image Quality Assessment | Xiaoqi Wang et.al. | 2408.03885v1 | null |
2024-08-07 | Knowledge Probing for Graph Representation Learning | Mingyu Zhao et.al. | 2408.03877v1 | null |
2024-08-06 | LLaVA-OneVision: Easy Visual Task Transfer | Bo Li et.al. | 2408.03326v1 | null |
2024-08-06 | ClassiFIM: An Unsupervised Method To Detect Phase Transitions | Victor Kasatkin et.al. | 2408.03323v1 | null |
2024-08-06 | Segment Anything in Medical Images and Videos: Benchmark and Deployment | Jun Ma et.al. | 2408.03322v1 | null |
2024-08-06 | MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation | Xiaofeng Mao et.al. | 2408.03312v1 | null |
2024-08-06 | Left of Fab: Securing Design and Collaboration in the Semiconductor Value Chain | John C. Hoag et.al. | 2408.03295v1 | null |
2024-08-06 | Biomedical SAM 2: Segment Anything in Biomedical Images and Videos | Zhiling Yan et.al. | 2408.03286v1 | null |
2024-08-06 | ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer | Jiazhi Guan et.al. | 2408.03284v1 | null |
2024-08-06 | Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments | Angie Boggust et.al. | 2408.03274v1 | null |
2024-08-07 | BVI-AOM: A New Training Dataset for Deep Video Compression Optimization | Jakub Nawała et.al. | 2408.03265v2 | null |
2024-08-06 | Analysis of Partially-Calibrated Sparse Subarrays for Direction Finding with Extended Degrees of Freedom | W. S. Leite et.al. | 2408.03236v1 | null |
2024-08-05 | Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics | Shishira R Maiya et.al. | 2408.02672v1 | null |
2024-08-05 | Interactive 3D Medical Image Segmentation with SAM 2 | Chuyun Shen et.al. | 2408.02635v1 | null |
2024-08-05 | VidGen-1M: A Large-Scale Dataset for Text-to-video Generation | Zhiyu Tan et.al. | 2408.02629v1 | null |
2024-08-05 | DanModCap: Designing a Danmaku Moderation Tool for Video-Sharing Platforms that Leverages Impact Captions | Siying Hu et.al. | 2408.02574v1 | null |
2024-08-05 | Cross-Modality Clustering-based Self-Labeling for Multimodal Data Classification | Paweł Zyblewski et.al. | 2408.02568v1 | null |
2024-08-05 | HQOD: Harmonious Quantization for Object Detection | Long Huang et.al. | 2408.02561v1 | null |
2024-08-05 | The effect of dynamical states on galaxy clusters populations. I. Classification of dynamical states | S. Véliz Astudillo et.al. | 2408.02519v1 | null |
2024-08-05 | Automatic rating of incomplete hippocampal inversions evaluated across multiple cohorts | Lisa Hemforth et.al. | 2408.02496v1 | null |
2024-08-05 | HyperSpaceX: Radial and Angular Exploration of HyperSpherical Dimensions | Chiranjeev Chiranjeev et.al. | 2408.02494v1 | null |
2024-08-05 | Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection | Ting Lei et.al. | 2408.02484v1 | null |
2024-08-02 | Conditional LoRA Parameter Generation | Xiaolong Jin et.al. | 2408.01415v1 | null |
2024-08-02 | Derivation of Back-propagation for Graph Convolutional Networks using Matrix Calculus and its Application to Explainable Artificial Intelligence | Yen-Che Hsiao et.al. | 2408.01408v1 | null |
2024-08-02 | NOLO: Navigate Only Look Once | Bohan Zhou et.al. | 2408.01384v1 | null |
2024-08-02 | Explaining a probabilistic prediction on the simplex with Shapley compositions | Paul-Gauthier Noé et.al. | 2408.01382v1 | null |
2024-08-02 | Spatial-Spectral Morphological Mamba for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2408.01372v1 | null |
2024-08-02 | Classification of marked elliptic root systems with non-reduced quotient | A. Fialowski et.al. | 2408.01358v1 | null |
2024-08-02 | Harmonized connectome resampling for variance in voxel sizes | Elyssa M. McMaster et.al. | 2408.01351v1 | null |
2024-08-02 | Human foraging strategies flexibly adapt to resource distribution and time constraints | Valeria Simonelli et.al. | 2408.01350v1 | null |
2024-08-02 | PC$^2$: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval | Yue Duan et.al. | 2408.01349v1 | null |
2024-08-02 | Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks | Anders Giovanni Møller et.al. | 2408.01346v1 | null |
2024-08-01 | Text-Guided Video Masked Autoencoder | David Fan et.al. | 2408.00759v1 | null |
2024-08-01 | Segment anything model 2: an application to 2D and 3D medical images | Haoyu Dong et.al. | 2408.00756v1 | null |
2024-08-01 | Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model | Benlin Liu et.al. | 2408.00754v1 | null |
2024-08-01 | CERT-ED: Certifiably Robust Text Classification for Edit Distance | Zhuoqun Huang et.al. | 2408.00728v1 | null |
2024-08-01 | SAM 2: Segment Anything in Images and Videos | Nikhila Ravi et.al. | 2408.00714v1 | null |
2024-08-01 | Investigating Brain Connectivity and Regional Statistics from EEG for early stage Parkinson's Classification | Amarpal Sahota et.al. | 2408.00711v1 | null |
2024-08-01 | Point-supervised Brain Tumor Segmentation with Box-prompted MedSAM | Xiaofeng Liu et.al. | 2408.00706v1 | null |
2024-08-01 | Granular-Balls based Fuzzy Twin Support Vector Machine for Classification | Lixi Zhao et.al. | 2408.00699v1 | null |
2024-08-01 | ExpertAF: Expert Actionable Feedback from Video | Kumar Ashutosh et.al. | 2408.00672v1 | null |
2024-08-01 | AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models | Daqin Luo et.al. | 2408.00665v1 | null |
2024-07-31 | The Llama 3 Herd of Models | Abhimanyu Dubey et.al. | 2407.21783v1 | null |
2024-07-31 | RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining | Hongtao Wu et.al. | 2407.21773v1 | null |
2024-07-31 | ReplanVLM: Replanning Robotic Tasks with Visual Language Models | Aoran Mei et.al. | 2407.21762v1 | null |
2024-07-31 | Learning Video Context as Interleaved Multimodal Sequences | Kevin Qinghong Lin et.al. | 2407.21757v1 | null |
2024-08-01 | Topological Woodward-Hoffmann classification for cycloadditions in polycyclic aromatic azomethine ylides | Juan Li et.al. | 2407.21756v2 | null |
2024-07-31 | A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation | Mothilal Asokan et.al. | 2407.21739v1 | null |
2024-07-31 | Leveraging Self-Supervised Learning for Fetal Cardiac Planes Classification using Ultrasound Scan Videos | Joseph Geo Benjamin et.al. | 2407.21738v1 | null |
2024-07-31 | Artificial Intelligence Approaches for Energy Efficiency: A Review | Alberto Pasqualetto et.al. | 2407.21726v1 | null |
2024-07-31 | Open-Vocabulary Audio-Visual Semantic Segmentation | Ruohao Guo et.al. | 2407.21721v1 | null |
2024-07-31 | Tora: Trajectory-oriented Diffusion Transformer for Video Generation | Zhenghao Zhang et.al. | 2407.21705v1 | null |
2024-07-30 | Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation | Marcelo Matheus Gauy et.al. | 2407.20989v1 | null |
2024-07-30 | Transfer Learning for Multi-material Classification of Transition Metal Dichalcogenides with Atomic Force Microscopy | Isaiah A. Moses et.al. | 2407.20975v1 | null |
2024-07-30 | MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions | Xiaowei Chi et.al. | 2407.20962v1 | link |
2024-07-30 | EAR: Edge-Aware Reconstruction of 3-D vertebrae structures from bi-planar X-ray images | Lixing Tan et.al. | 2407.20937v1 | null |
2024-07-30 | Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | Yanpeng Zhao et.al. | 2407.20908v1 | link |
2024-07-30 | Simultaneous Multi-Slice Diffusion Imaging using Navigator-free Multishot Spiral Acquisition | Yuancheng Jiang et.al. | 2407.20904v1 | null |
2024-07-30 | Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach | Adam Wojciechowski et.al. | 2407.20899v1 | null |
2024-07-30 | MambaCapsule: Towards Transparent Cardiac Disease Diagnosis with Electrocardiography Using Mamba Capsule Network | Yinlong Xu et.al. | 2407.20893v1 | null |
2024-07-30 | Shift operators and their classification | Maria Carvalho et.al. | 2407.20890v1 | null |
2024-07-30 | Effective Black Box Testing of Sentiment Analysis Classification Networks | Parsa Karbasizadeh et.al. | 2407.20884v1 | null |
2024-07-29 | SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction | Çağhan Köksal et.al. | 2407.20214v1 | null |
2024-07-30 | SpaER: Learning Spatio-temporal Equivariant Representations for Fetal Brain Motion Tracking | Jian Wang et.al. | 2407.20198v2 | null |
2024-07-29 | Radiance Fields for Robotic Teleoperation | Maximum Wilder-Smith et.al. | 2407.20194v1 | null |
2024-07-29 | Theia: Distilling Diverse Vision Foundation Models for Robot Learning | Jinghuan Shang et.al. | 2407.20179v1 | link |
2024-07-29 | LatentArtiFusion: An Effective and Efficient Histological Artifacts Restoration Framework | Zhenqi He et.al. | 2407.20172v1 | link |
2024-07-29 | Diffusion Feedback Helps CLIP See Better | Wenxuan Wang et.al. | 2407.20171v1 | null |
2024-07-29 | Language-Conditioned Offline RL for Multi-Robot Navigation | Steven Morad et.al. | 2407.20164v1 | null |
2024-07-29 | Quantum Machine Learning Architecture Search via Deep Reinforcement Learning | Xin Dai et.al. | 2407.20147v1 | null |
2024-07-30 | AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics | Xiangxiang Dai et.al. | 2407.20124v2 | link |
2024-07-29 | Integrable and superintegrable quantum mechanical systems with position dependent masses invariant with respect to one parametric Lie groups. 2. Systems with dilatation and shift symmetries | A. G. Nikitin et.al. | 2407.20112v1 | null |
2024-07-26 | HRP: Human Affordances for Robotic Pre-Training | Mohan Kumar Srirama et.al. | 2407.18911v1 | null |
2024-07-26 | Wolf: Captioning Everything with a World Summarization Framework | Boyi Li et.al. | 2407.18908v1 | null |
2024-07-26 | A Scalable Quantum Non-local Neural Network for Image Classification | Sparsh Gupta et.al. | 2407.18906v1 | link |
2024-07-26 | Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Yuze Zheng et.al. | 2407.18854v1 | null |
2024-07-26 | The Role of Temporal Hierarchy in Spiking Neural Networks | Filippo Moro et.al. | 2407.18838v1 | null |
2024-07-26 | Learning the Chaotic and Regular Nature of Trajectories in Hamiltonian Systems with Lagrangian descriptors | Javier Jiménez López et.al. | 2407.18831v1 | null |
2024-07-26 | Binary orbit and disks properties of the RW Aur system using ALMA observations | N. T. Kurtovic et.al. | 2407.18828v1 | null |
2024-07-26 | Three-dimensional ultrasound-based online system for automated ovarian follicle measurement | Pedro Royo et.al. | 2407.18818v1 | null |
2024-07-26 | Automatic Detection of Moral Values in Music Lyrics | Vjosa Preniqi et.al. | 2407.18787v1 | null |
2024-07-26 | Deep learning interpretable analysis for carbon star identification in Gaia DR3 | Shuo Ye et.al. | 2407.18754v1 | null |
2024-07-25 | Review of Degenerate Higher Order Scalar Tensor Theories in Cosmology | Andrei Lazanu et.al. | 2407.18234v1 | null |
2024-07-25 | One-point Statistics in various cosmic environments in the presence of massive neutrinos | Mohadese Khoshtinat et.al. | 2407.18233v1 | null |
2024-07-26 | Enhanced Depth Estimation and 3D Geometry Reconstruction using Bayesian Helmholtz Stereopsis with Belief Propagation | Razieh Azizi et.al. | 2407.18195v2 | null |
2024-07-25 | PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations | Cheng Qian et.al. | 2407.18178v1 | null |
2024-07-26 | On-chip near-infrared spectroscopic sensing with over 520nm bandwidth | Chunhui Yao et.al. | 2407.18172v2 | null |
2024-07-25 | IRIS: Wireless Ring for Vision-based Smart Home Interaction | Maruchi Kim et.al. | 2407.18141v1 | null |
2024-07-25 | XS-VID: An Extremely Small Video Object Detection Dataset | Jiahao Guo et.al. | 2407.18137v1 | null |
2024-07-25 | Estimating Earthquake Magnitude in Sentinel-1 Imagery via Ranking | Daniele Rege Cambrin et.al. | 2407.18128v1 | null |
2024-07-25 | Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images | Roberto Di Via et.al. | 2407.18125v1 | null |
2024-07-25 | Multi-Resolution Histopathology Patch Graphs for Ovarian Cancer Subtyping | Jack Breen et.al. | 2407.18105v1 | link |
2024-07-24 | SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency | Yiming Xie et.al. | 2407.17470v1 | null |
2024-07-24 | SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning | Jianpeng Yao et.al. | 2407.17460v1 | null |
2024-07-24 | EuroCropsML: A Time Series Benchmark Dataset For Few-Shot Crop Type Classification | Joana Reuss et.al. | 2407.17458v1 | null |
2024-07-24 | HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Zhenzhi Wang et.al. | 2407.17438v1 | link |
2024-07-24 | Systematic study of High | Z. Wang et.al. | 2407.17407v1 | null |
2024-07-24 | Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising | Sébastien Herbreteau et.al. | 2407.17399v1 | null |
2024-07-24 | Sampling-Based Hierarchical Trajectory Planning for Formation Flight | Qingzhao Liu et.al. | 2407.17392v1 | null |
2024-07-24 | 2D and 3D Deep Learning Models for MRI-based Parkinson's Disease Classification: A Comparative Analysis of Convolutional Kolmogorov-Arnold Networks, Convolutional Neural Networks, and Graph Convolutional Networks | Salil B Patel et.al. | 2407.17380v1 | null |
2024-07-24 | Entropy Reweighted Conformal Classification | Rui Luo et.al. | 2407.17377v1 | null |
2024-07-24 | MuST: Multi-Scale Transformers for Surgical Phase Recognition | Alejandra Pérez et.al. | 2407.17361v1 | link |
2024-07-23 | Explanation Regularisation through the Lens of Attributions | Pedro Ferreira et.al. | 2407.16693v1 | null |
2024-07-23 | On the local cohomology of secant varieties | Sebastian Olano et.al. | 2407.16688v1 | null |
2024-07-23 | AutoRG-Brain: Grounded Report Generation for Brain MRI | Jiayu Lei et.al. | 2407.16684v1 | null |
2024-07-24 | Goedel logics: Prenex fragments | Matthias Baaz et.al. | 2407.16683v2 | null |
2024-07-24 | A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data | Adrian Remonda et.al. | 2407.16680v2 | link |
2024-07-23 | From Imitation to Refinement -- Residual RL for Precise Visual Assembly | Lars Ankile et.al. | 2407.16677v1 | null |
2024-07-23 | FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process | Yuyan Bu et.al. | 2407.16670v1 | null |
2024-07-23 | EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval | Thomas Hummel et.al. | 2407.16658v1 | link |
2024-07-23 | Fluorescence Diffraction Tomography using Explicit Neural Fields | Renzhi He et.al. | 2407.16657v1 | null |
2024-07-23 | MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence | Canyu Zhao et.al. | 2407.16655v1 | null |
2024-07-22 | AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description | Junyu Xie et.al. | 2407.15850v1 | link |
2024-07-22 | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Mingze Xu et.al. | 2407.15841v1 | null |
2024-07-23 | QueST: Self-Supervised Skill Abstractions for Learning Continuous Control | Atharva Mete et.al. | 2407.15840v2 | null |
2024-07-22 | Enhancing Cell Instance Segmentation in Scanning Electron Microscopy Images via a Deep Contour Closing Operator | Florian Robert et.al. | 2407.15817v1 | null |
2024-07-22 | Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning | Zhecheng Yuan et.al. | 2407.15815v1 | null |
2024-07-22 | The Evaporating Massive Embedded Stellar Cluster IRS 13 Close to Sgr A*. II. Kinematic structure | Florian Peißker et.al. | 2407.15800v1 | null |
2024-07-22 | Adaptive Extensions of Unbiased Risk Estimators for Unsupervised Magnetic Resonance Image Denoising | Reeshad Khan et.al. | 2407.15799v1 | null |
2024-07-23 | Disentangling spatio-temporal knowledge for weakly supervised object detection and segmentation in surgical video | Guiqiu Liao et.al. | 2407.15794v2 | null |
2024-07-22 | LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding | Haoning Wu et.al. | 2407.15754v1 | link |
2024-07-22 | SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection | Dimitrios Kollias et.al. | 2407.15728v1 | null |
2024-07-19 | DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks | Sarah Jabbour et.al. | 2407.14509v1 | null |
2024-07-19 | T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Kaiyue Sun et.al. | 2407.14505v1 | null |
2024-07-19 | Nonlinear Schrödinger Network | Yiming Zhou et.al. | 2407.14504v1 | null |
2024-07-19 | Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery | Sukrut Rao et.al. | 2407.14499v1 | link |
2024-07-19 | Enhancing Layout Hotspot Detection Efficiency with YOLOv8 and PCA-Guided Augmentation | Dongyang Wu et.al. | 2407.14498v1 | null |
2024-07-19 | Evaluating the Reliability of Self-Explanations in Large Language Models | Korbinian Randl et.al. | 2407.14487v1 | link |
2024-07-19 | Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model | Seonghui Min et.al. | 2407.14434v1 | null |
2024-07-19 | Dataset Distillation in Medical Imaging: A Feasibility Study | Muyang Li et.al. | 2407.14429v1 | null |
2024-07-19 | Controllable and Efficient Multi-Class Pathology Nuclei Data Augmentation using Text-Conditioned Diffusion Models | Hyun-Jic Oh et.al. | 2407.14426v1 | null |
2024-07-19 | Improving classification of road surface conditions via road area extraction and contrastive learning | Linh Trinh et.al. | 2407.14418v1 | null |
2024-07-18 | GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model | Abdelrahman Shaker et.al. | 2407.13772v1 | null |
2024-07-18 | Addressing Imbalance for Class Incremental Learning in Medical Image Classification | Xuze Hao et.al. | 2407.13768v1 | null |
2024-07-18 | Shape of Motion: 4D Reconstruction from a Single Video | Qianqian Wang et.al. | 2407.13764v1 | null |
2024-07-18 | Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion | Boyang Deng et.al. | 2407.13759v1 | null |
2024-07-18 | Exploring Facial Biomarkers for Depression through Temporal Analysis of Action Units | Aditya Parikh et.al. | 2407.13753v1 | null |
2024-07-18 | Temporal Representation Learning for Stock Similarities and Its Applications in Investment Management | Yoontae Hwang et.al. | 2407.13751v1 | null |
2024-07-18 | Pose-guided multi-task video transformer for driver action recognition | Ricardo Pizarro et.al. | 2407.13750v1 | null |
2024-07-18 | Multi-Label Learning with Stronger Consistency Guarantees | Anqi Mao et.al. | 2407.13746v1 | null |
2024-07-18 | Realizable | Anqi Mao et.al. | 2407.13732v1 | null |
2024-07-18 | Enhanced | Anqi Mao et.al. | 2407.13722v1 | null |
2024-07-17 | VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control | Sherwin Bahmani et.al. | 2407.12781v1 | null |
2024-07-17 | Hallucination Index: An Image Quality Metric for Generative Reconstruction Models | Matthew Tivnan et.al. | 2407.12780v1 | null |
2024-07-17 | LookupViT: Compressing visual information to a limited number of tokens | Rajat Koner et.al. | 2407.12753v1 | null |
2024-07-17 | 4Dynamic: Text-to-4D Generation with Hybrid Priors | Yu-Jie Yuan et.al. | 2407.12684v1 | null |
2024-07-17 | Goldfish: Vision-Language Understanding of Arbitrarily Long Videos | Kirolos Ataallah et.al. | 2407.12679v1 | null |
2024-07-17 | Promptable Counterfactual Diffusion Model for Unified Brain Tumor Segmentation and Generation with MRIs | Yiqing Shen et.al. | 2407.12678v1 | null |
2024-07-17 | CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems | Jiankun Zhao et.al. | 2407.12676v1 | link |
2024-07-17 | Distilling Tiny and Ultra-fast Deep Neural Networks for Autonomous Navigation on Nano-UAVs | Lorenzo Lamberti et.al. | 2407.12675v1 | null |
2024-07-17 | Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data | Richard Osuala et.al. | 2407.12669v1 | null |
2024-07-17 | Is That Rain? Understanding Effects on Visual Odometry Performance for Autonomous UAVs and Efficient DNN-based Rain Classification at the Edge | Andrea Albanese et.al. | 2407.12663v1 | null |
2024-07-16 | Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling | Jaehyeok Kim et.al. | 2407.11962v1 | null |
2024-07-16 | A Transformer-based Approach for Augmenting Software Engineering Chatbots Datasets | Ahmad Abdellatif et.al. | 2407.11955v1 | null |
2024-07-16 | Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation | Olga Zatsarynna et.al. | 2407.11954v1 | null |
2024-07-16 | Temporally Consistent Stereo Matching | Jiaxi Zeng et.al. | 2407.11950v1 | link |
2024-07-17 | Hierarchical Separable Video Transformer for Snapshot Compressive Imaging | Ping Wang et.al. | 2407.11946v2 | link |
2024-07-16 | Tackling Oversmoothing in GNN via Graph Sparsification: A Truss-based Approach | Tanvir Hossain et.al. | 2407.11928v1 | null |
2024-07-16 | The Strength of Bisymmetric Modes in SDSS-IV/MaNGA Barred Galaxy Kinematics | Brian DiGiorgio Zanger et.al. | 2407.11908v1 | null |
2024-07-16 | GraphFM: A Scalable Framework for Multi-Graph Pretraining | Divyansha Lachi et.al. | 2407.11907v1 | null |
2024-07-16 | SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge | Hao Ding et.al. | 2407.11906v1 | null |
2024-07-16 | Automated production of batched unclonable micro-patterns anti-counterfeiting labels with strong robustness and rapid recognition speed | Yuzheng He et.al. | 2407.11886v1 | null |
2024-07-15 | No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | Walter Simoncini et.al. | 2407.10964v1 | link |
2024-07-15 | InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models | Nirat Saini et.al. | 2407.10958v1 | null |
2024-07-15 | MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models | Chengguang Gan et.al. | 2407.10953v1 | null |
2024-07-15 | IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation | Yuanhao Zhai et.al. | 2407.10937v1 | link |
2024-07-15 | Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | Dilara Soylu et.al. | 2407.10930v1 | null |
2024-07-15 | In-Loop Filtering via Trained Look-Up Tables | Zhuoyuan Li et.al. | 2407.10926v1 | null |
2024-07-15 | A Dual-Attention Aware Deep Convolutional Neural Network for Early Alzheimer's Detection | Pandiyaraju V et.al. | 2407.10921v1 | null |
2024-07-16 | DataDream: Few-shot Guided Dataset Generation | Jae Myung Kim et.al. | 2407.10910v2 | link |
2024-07-15 | Interpreting Hand gestures using Object Detection and Digits Classification | Sangeetha K et.al. | 2407.10902v1 | null |
2024-07-15 | Leveraging Multimodal CycleGAN for the Generation of Anatomically Accurate Synthetic CT Scans from MRIs | Leonardo Crespi et.al. | 2407.10888v1 | null |
2024-07-12 | Non-Hermitian Origin of Wannier Localizability and Detachable Topological Boundary States | Daichi Nakamura et.al. | 2407.09458v1 | null |
2024-07-12 | Let Me DeCode You: Decoder Conditioning with Tabular Data | Tomasz Szczepański et.al. | 2407.09437v1 | link |
2024-07-12 | Rethinking temporal self-similarity for repetitive action counting | Yanan Luo et.al. | 2407.09431v1 | null |
2024-07-12 | TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models | Hang Zou et.al. | 2407.09424v1 | null |
2024-07-12 | A grid of self-consistent MSG (MARCS-StaticWeather-GGchem) cool stellar, sub-stellar, and exoplanetary model atmospheres | Uffe G. Jørgensen et.al. | 2407.09397v1 | null |
2024-07-12 | Open-Canopy: A Country-Scale Benchmark for Canopy Height Estimation at Very High Resolution | Fajwel Fogel et.al. | 2407.09392v1 | link |
2024-07-12 | Radiance Fields from Photons | Sacha Jungerman et.al. | 2407.09386v1 | null |
2024-07-12 | Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation | Zhilin Zhu et.al. | 2407.09367v1 | link |
2024-07-12 | Novel clustered federated learning based on local loss | Endong Gu et.al. | 2407.09360v1 | link |
2024-07-12 | Imaging Interiors: An Implicit Solution to Electromagnetic Inverse Scattering Problems | Ziyuan Luo et.al. | 2407.09352v1 | null |
2024-07-11 | Video Diffusion Alignment via Reward Gradients | Mihir Prabhudesai et.al. | 2407.08737v1 | link |
2024-07-11 | Real-Time Anomaly Detection and Reactive Planning with Large Language Models | Rohan Sinha et.al. | 2407.08735v1 | null |
2024-07-11 | WhisperNetV2: SlowFast Siamese Network For Lip-Based Biometrics | Abdollah Zakeri et.al. | 2407.08717v1 | null |
2024-07-11 | Sensor-Aware Classifiers for Energy-Efficient Time Series Applications on IoT Devices | Dina Hussein et.al. | 2407.08715v1 | null |
2024-07-11 | Towards Efficient Deployment of Hybrid SNNs on Neuromorphic and Edge AI Hardware | James Seekings et.al. | 2407.08704v1 | null |
2024-07-11 | Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models | Zhening Xing et.al. | 2407.08701v1 | null |
2024-07-11 | ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions | Jiu Feng et.al. | 2407.08691v1 | link |
2024-07-11 | Generalizable Implicit Motion Modeling for Video Frame Interpolation | Zujin Guo et.al. | 2407.08680v1 | null |
2024-07-11 | Still-Moving: Customized Video Generation without Customized Video Data | Hila Chefer et.al. | 2407.08674v1 | null |
2024-07-11 | NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning | Yi Zhang et.al. | 2407.08672v1 | null |
2024-07-10 | LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | Feng Li et.al. | 2407.07895v1 | link |
2024-07-10 | Vegetable Peeling: A Case Study in Constrained Dexterous Manipulation | Tao Chen et.al. | 2407.07884v1 | null |
2024-07-10 | Controlling Space and Time with Diffusion Models | Daniel Watson et.al. | 2407.07860v1 | null |
2024-07-11 | Functional Assessment of Cerebral Capillaries using Single Capillary Reporters in Ultrasound Localization Microscopy | Stephen A Lee et.al. | 2407.07857v2 | null |
2024-07-10 | Study on Aspect Ratio Variability toward Robustness of Vision Transformer-based Vehicle Re-identification | Mei Qiu et.al. | 2407.07842v1 | null |
2024-07-10 | Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective | Shengjia Chen et.al. | 2407.07841v1 | link |
2024-07-10 | Probe and Prejudice: Classification of compact objects and model comparison using EOS knowledge | Hauke Koehn et.al. | 2407.07837v1 | null |
2024-07-10 | RT-LA-VocE: Real-Time Low-SNR Audio-Visual Speech Enhancement | Honglie Chen et.al. | 2407.07825v1 | null |
2024-07-10 | New Gravitational Wave Discoveries Enabled by Machine Learning | Alexandra E. Koloniari et.al. | 2407.07820v1 | null |
2024-07-10 | The Misclassification Likelihood Matrix: Some Classes Are More Likely To Be Misclassified Than Others | Daniel Sikar et.al. | 2407.07818v1 | null |
2024-07-09 | V-VIPE: Variational View Invariant Pose Embedding | Mara Levy et.al. | 2407.07092v1 | null |
2024-07-09 | Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic | Ruochen Jin et.al. | 2407.07089v1 | link |
2024-07-09 | MoSt-DSA: Modeling Motion and Structural Interactions for Direct Multi-Frame Interpolation in DSA Images | Ziyang Xu et.al. | 2407.07078v1 | link |
2024-07-09 | MADE-for-ASD: A Multi-Atlas Deep Ensemble Network for Diagnosing Autism Spectrum Disorder | Md Rakibul Hasan et.al. | 2407.07076v1 | null |
2024-07-10 | CAPformer: Compression-Aware Pre-trained Transformer for Low-Light Image Enhancement | Wei Wang et.al. | 2407.07056v2 | null |
2024-07-09 | Latent Space Imaging | Matheus Souza et.al. | 2407.07052v1 | null |
2024-07-09 | Simple and Interpretable Probabilistic Classifiers for Knowledge Graphs | Christian Riefolo et.al. | 2407.07045v1 | null |
2024-07-09 | Free Fermionic Constructions of Heterotic Strings | Ioannis Florakis et.al. | 2407.07034v1 | null |
2024-07-09 | Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition | Daiqing Wu et.al. | 2407.07026v1 | null |
2024-07-09 | Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization | Jeongseok Hyun et.al. | 2407.07024v1 | link |
2024-07-08 | Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision | Orr Zohar et.al. | 2407.06189v1 | link |
2024-07-08 | Classification of Cellular Automata based on the Hamming distance | Gaspar Alfaro et.al. | 2407.06175v1 | null |
2024-07-08 | The Tug-of-War Between Deepfake Generation and Detection | Hannah Lee et.al. | 2407.06174v1 | null |
2024-07-08 | PanDORA: Casual HDR Radiance Acquisition for Indoor Scenes | Mohammad Reza Karimi Dastjerdi et.al. | 2407.06150v1 | null |
2024-07-08 | Physics-informed machine learning approaches to reactor antineutrino detection | Sophia Farrell et.al. | 2407.06139v1 | null |
2024-07-08 | Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities | Avinash Anand et.al. | 2407.06125v1 | null |
2024-07-08 | Accelerating Diffusion for SAR-to-Optical Image Translation via Adversarial Consistency Distillation | Xinyu Bai et.al. | 2407.06095v1 | null |
2024-07-08 | ERR@HRI 2024 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Interactions | Micol Spitale et.al. | 2407.06094v1 | null |
2024-07-08 | Artificial Intuition: Efficient Classification of Scientific Abstracts | Harsh Sakhrani et.al. | 2407.06093v1 | null |
2024-07-08 | Assessing Cardiomegaly in Dogs Using a Simple CNN Model | Nikhil Deekonda et.al. | 2407.06092v1 | null |
2024-07-05 | VCoME: Verbal Video Composition with Multimodal Editing Effects | Weibo Gong et.al. | 2407.04697v1 | null |
2024-07-05 | Enhancing Vehicle Re-identification and Matching for Weaving Analysis | Mei Qiu et.al. | 2407.04688v1 | null |
2024-07-05 | Embracing Massive Medical Data | Yu-Cheng Chou et.al. | 2407.04687v1 | link |
2024-07-05 | Is plantar thermography a valid digital biomarker for characterising diabetic foot ulceration risk? | Akshay Jagadeesh et.al. | 2407.04676v1 | null |
2024-07-05 | AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation | Yuhan Zhu et.al. | 2407.04603v1 | null |
2024-07-05 | Multimodal Classification via Modal-Aware Interactive Enhancement | Qing-Yuan Jiang et.al. | 2407.04587v1 | null |
2024-07-05 | A Degree Bound for Planar Functions | Christof Beierle et.al. | 2407.04570v1 | null |
2024-07-05 | Pencils of plane cubics with one base point | Riccardo Moschetti et.al. | 2407.04569v1 | null |
2024-07-05 | Anticipating Solar Flares | Hugh S. Hudson et.al. | 2407.04567v1 | null |
2024-07-05 | Real Time Emotion Analysis Using Deep Learning for Education, Entertainment, and Beyond | Abhilash Khuntia et.al. | 2407.04560v1 | null |
2024-07-03 | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | Pan Zhang et.al. | 2407.03320v1 | link |
2024-07-03 | Value-Penalized Auxiliary Control from Examples for Learning without Rewards or Demonstrations | Trevor Ablett et.al. | 2407.03311v1 | link |
2024-07-03 | Accelerated Proton Resonance Frequency-based Magnetic Resonance Thermometry by Optimized Deep Learning Method | Sijie Xu et.al. | 2407.03308v1 | link |
2024-07-03 | HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization | Yucheng Tang et.al. | 2407.03307v1 | null |
2024-07-03 | VCHAR:Variance-Driven Complex Human Activity Recognition framework with Generative Representation | Yuan Sun et.al. | 2407.03291v1 | null |
2024-07-03 | Using Photoplethysmography to Detect Real-time Blood Pressure Changes with a Calibration-free Deep Learning Model | Jingyuan Hong et.al. | 2407.03274v1 | null |
2024-07-03 | Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later | Han-Jia Ye et.al. | 2407.03257v1 | link |
2024-07-03 | STF: Sentence Transformer Fine-Tuning For Topic Categorization With Limited Data | Kheir Eddine Daouadi et.al. | 2407.03253v1 | null |
2024-07-03 | ACTRESS: Active Retraining for Semi-supervised Visual Grounding | Weitai Kang et.al. | 2407.03251v1 | null |
2024-07-04 | TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach | Weikun Peng et.al. | 2407.03245v2 | null |
2024-07-02 | Characterizing the Interpretability of Attention Maps in Digital Pathology | Tomé Albuquerque et.al. | 2407.02484v1 | null |
2024-07-02 | Ensemble of pre-trained language models and data augmentation for hate speech detection from Arabic tweets | Kheir Eddine Daouadi et.al. | 2407.02448v1 | null |
2024-07-02 | PLeaS -- Merging Models with Permutations and Least Squares | Anshul Nasery et.al. | 2407.02447v1 | null |
2024-07-02 | Evaluating the Robustness of Adverse Drug Event Classification Models Using Templates | Dorothea MacPhail et.al. | 2407.02432v1 | null |
2024-07-02 | AXIAL: Attention-based eXplainability for Interpretable Alzheimer's Localized Diagnosis using 2D CNNs on 3D MRI brain scans | Gabriele Lozupone et.al. | 2407.02418v1 | link |
2024-07-03 | Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs | Jinmin Li et.al. | 2407.02411v2 | null |
2024-07-02 | Tiny-PULP-Dronets: Squeezing Neural Networks for Faster and Lighter Inference on Multi-Tasking Autonomous Nano-Drones | Lorenzo Lamberti et.al. | 2407.02405v1 | null |
2024-07-03 | A neural networks method to search for long transient gravitational waves | Francesca Attadio et.al. | 2407.02391v2 | null |
2024-07-02 | Real HSI-MSI-PAN image dataset for the hyperspectral/multi-spectral/panchromatic image fusion and super-resolution fields | Shuangliang Li et.al. | 2407.02387v1 | link |
2024-07-02 | OpenSlot: Mixed Open-set Recognition with Object-centric Learning | Xu Yin et.al. | 2407.02386v1 | null |
2024-06-28 | Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs | Sukmin Yun et.al. | 2406.20098v1 | link |
2024-06-28 | LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression | Jieneng Chen et.al. | 2406.20092v1 | link |
2024-06-28 | Minimax And Adaptive Transfer Learning for Nonparametric Classification under Distributed Differential Privacy Constraints | Arnab Auddy et.al. | 2406.20088v1 | null |
2024-06-28 | Extreme horizon equation | Wojciech Kamiński et.al. | 2406.20068v1 | null |
2024-06-28 | Modeling and LQR Control of Insect Sized Flapping Wing Robot | Daksh Dhingra et.al. | 2406.20061v1 | null |
2024-06-28 | Pairwise Difference Learning for Classification | Mohamed Karim Belaid et.al. | 2406.20031v1 | link |
2024-06-28 | On the Trade-off between Flatness and Optimization in Distributed Learning | Ying Cao et.al. | 2406.20006v1 | null |
2024-06-28 | Malaria Cell Detection Using Deep Neural Networks | Saurabh Sawant et.al. | 2406.20005v1 | null |
2024-06-28 | Impact of Initialization on Intra-subject Pediatric Brain MR Image Registration: A Comparative Analysis between SyN ANTs and Deep Learning-Based Approaches | Andjela Dimitrijevic et.al. | 2406.19943v1 | link |
2024-07-01 | GRACE: Graph-Regularized Attentive Convolutional Entanglement with Laplacian Smoothing for Robust DeepFake Video Detection | Chih-Chung Hsu et.al. | 2406.19941v2 | link |
2024-06-27 | ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos | Jr-Jen Chen et.al. | 2406.19392v1 | link |
2024-06-27 | Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads | Ali Khaleghi Rahimian et.al. | 2406.19391v1 | link |
2024-06-27 | OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding | Tao Zhang et.al. | 2406.19389v1 | null |
2024-06-27 | Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model | Haobo Yuan et.al. | 2406.19369v1 | null |
2024-06-27 | IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language | Lucky Susanto et.al. | 2406.19349v1 | null |
2024-06-27 | Learning Visual Conditioning Tokens to Correct Domain Shift for Fully Test-time Adaptation | Yushun Tang et.al. | 2406.19341v1 | null |
2024-06-28 | LiverUSRecon: Automatic 3D Reconstruction and Volumetry of the Liver with a Few Partial Ultrasound Scans | Kaushalya Sivayogaraj et.al. | 2406.19336v2 | null |
2024-06-27 | PNeRV: A Polynomial Neural Representation for Videos | Sonam Gupta et.al. | 2406.19299v1 | null |
2024-06-27 | Leveraging Contrastive Learning for Enhanced Node Representations in Tokenized Graph Transformers | Jinsong Chen et.al. | 2406.19258v1 | null |
2024-06-27 | Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment | Hao Fei et.al. | 2406.19255v1 | null |
2024-06-26 | Towards Compositionality in Concept Learning | Adam Stein et.al. | 2406.18534v1 | link |
2024-06-26 | MatchTime: Towards Automatic Soccer Game Commentary Generation | Jiayuan Rao et.al. | 2406.18530v1 | null |
2024-06-26 | MultiDiff: Consistent Novel View Synthesis from a Single Image | Norman Müller et.al. | 2406.18524v1 | null |
2024-06-26 | ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation | Shenghai Yuan et.al. | 2406.18522v1 | null |
2024-06-27 | Distinguishing mechanisms of social contagion from local network view | Elsa Andres et.al. | 2406.18519v2 | null |
2024-06-26 | Assessment of Clonal Hematopoiesis of Indeterminate Potential from Cardiac Magnetic Resonance Imaging using Deep Learning in a Cardio-oncology Population | Sangeon Ryu et.al. | 2406.18508v1 | null |
2024-06-26 | Robust Surgical Phase Recognition From Annotation Efficient Supervision | Or Rubin et.al. | 2406.18481v1 | null |
2024-06-26 | Universal Anomaly Detection at the LHC: Transforming Optimal Classifiers and the DDD Method | Sascha Caron et.al. | 2406.18469v1 | null |
2024-06-26 | An Autotuning-based Optimization Framework for Mixed-kernel SVM Classifications in Smart Pixel Datasets and Heterojunction Transistors | Xingfu Wu et.al. | 2406.18445v1 | null |
2024-06-26 | Repeat and Concatenate: 2D to 3D Image Translation with 3D to 3D Generative Modeling | Abril Corona-Figueroa et.al. | 2406.18422v1 | null |
2024-06-25 | Text-Animator: Controllable Visual Text Video Generation | Lin Liu et.al. | 2406.17777v1 | null |
2024-06-25 | MotionBooth: Motion-Aware Customized Text-to-Video Generation | Jianzong Wu et.al. | 2406.17758v1 | null |
2024-06-25 | Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation | Tushar Prasanna Swaminathan et.al. | 2406.17749v1 | null |
2024-06-25 | Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning | Arijit Sehanobish et.al. | 2406.17740v1 | null |
2024-06-25 | Mask-Guided Attention U-Net for Enhanced Neonatal Brain Extraction and Image Preprocessing | Bahram Jafrasteh et.al. | 2406.17709v1 | link |
2024-06-25 | SurgeMOD: Translating image-space tissue motions into vision-based surgical forces | Mikel De Iturrate Reyzabal et.al. | 2406.17707v1 | link |
2024-06-25 | Dualities for universal (co)acting Hopf monoids | Ana Agore et.al. | 2406.17684v1 | null |
2024-06-25 | Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation | Xuming Zhang et.al. | 2406.17679v1 | null |
2024-06-25 | Lifting of locally initial objects and universal (co)acting Hopf algebras | Ana Agore et.al. | 2406.17677v1 | null |
2024-06-25 | Brain Tumor Classification using Vision Transformer with Selective Cross-Attention Mechanism and Feature Calibration | Mohammad Ali Labbaf Khaniki et.al. | 2406.17670v1 | null |
2024-06-24 | StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal | Chongjie Ye et.al. | 2406.16864v1 | null |
2024-06-24 | FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models | Haonan Qiu et.al. | 2406.16863v1 | link |
2024-06-24 | Dreamitate: Real-World Visuomotor Policy Learning via Video Generation | Junbang Liang et.al. | 2406.16862v1 | null |
2024-06-24 | Long Context Transfer from Language to Vision | Peiyuan Zhang et.al. | 2406.16852v1 | link |
2024-06-24 | Unsupervised Domain Adaptation for Pediatric Brain Tumor Segmentation | Jingru Fu et.al. | 2406.16848v1 | null |
2024-06-24 | Exploring Factual Entailment with NLI: A News Media Study | Guy Mor-Lan et.al. | 2406.16842v1 | null |
2024-06-24 | A Certifiable Algorithm for Simultaneous Shape Estimation and Object Tracking | Lorenzo Shaikewitz et.al. | 2406.16837v1 | null |
2024-06-24 | USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations | Mounika Marreddy et.al. | 2406.16833v1 | null |
2024-06-24 | The classification of simple complex Lie superalgebras of polynomial vector fields and their deformations | Dimitry Leites et.al. | 2406.16760v1 | null |
2024-06-24 | The MRI Scanner as a Diagnostic: Image-less Active Sampling | Yuning Du et.al. | 2406.16754v1 | null |
2024-06-21 | Full-Scale Indexing and Semantic Annotation of CT Imaging: Boosting FAIRness | Hannes Ulrich et.al. | 2406.15340v1 | null |
2024-06-21 | Image Conductor: Precision Control for Interactive Video Synthesis | Yaowei Li et.al. | 2406.15339v1 | null |
2024-06-21 | An End-to-End, Segmentation-Free, Arabic Handwritten Recognition Model on KHATT | Sondos Aabed et.al. | 2406.15329v1 | null |
2024-06-21 | Fine-grained Attention in Hierarchical Transformers for Tabular Time-series | Raphael Azorin et.al. | 2406.15327v1 | link |
2024-06-21 | NLP-KG: A System for Exploratory Search of Scientific Literature in Natural Language Processing | Tim Schopf et.al. | 2406.15294v1 | link |
2024-06-21 | Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics | Weijia Zhang et.al. | 2406.15264v1 | null |
2024-06-24 | VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation | Xuan He et.al. | 2406.15252v2 | null |
2024-06-21 | Retrieval Augmented Zero-Shot Text Classification | Tassallah Abdullahi et.al. | 2406.15241v1 | null |
2024-06-21 | Model Equivalences | Michael Benedikt et.al. | 2406.15235v1 | null |
2024-06-21 | Rate-Splitting Multiple Access for Overloaded Multi-group Multicast: A First Experimental Study | Xinze Lyu et.al. | 2406.15217v1 | null |
2024-06-20 | A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models | Xincheng Shuai et.al. | 2406.14555v1 | link |
2024-06-21 | Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation | Eyal Michaeli et.al. | 2406.14551v2 | link |
2024-06-20 | IRASim: Learning Interactive Real-Robot Action Simulators | Fangqi Zhu et.al. | 2406.14540v1 | null |
2024-06-20 | Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration | Long Lei et.al. | 2406.14534v1 | link |
2024-06-20 | Local symmetries in partially ordered sets | Christoph Minz et.al. | 2406.14533v1 | null |
2024-06-20 | Fantastic Copyrighted Beasts and How (Not) to Generate Them | Luxi He et.al. | 2406.14526v1 | null |
2024-06-20 | MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding | Xinyu Fang et.al. | 2406.14515v1 | link |
2024-06-20 | V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data | Rotem Shalev-Arkushin et.al. | 2406.14510v1 | null |
2024-06-20 | LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors | Sheikh Asif Imran et.al. | 2406.14498v1 | link |
2024-06-20 | African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Gregor Geigle et.al. | 2406.14496v1 | null |
2024-06-18 | DrVideo: Document Retrieval Based Long Video Understanding | Ziyu Ma et.al. | 2406.12846v1 | null |
2024-06-18 | LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging | Jinuk Kim et.al. | 2406.12837v1 | link |
2024-06-18 | GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation | Ci-Siang Lin et.al. | 2406.12834v1 | null |
2024-06-18 | VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing | Jing Gu et.al. | 2406.12831v1 | null |
2024-06-18 | Neural Approximate Mirror Maps for Constrained Diffusion Models | Berthy T. Feng et.al. | 2406.12816v1 | null |
2024-06-18 | Privacy Preserving Federated Learning in Medical Imaging with Uncertainty Estimation | Nikolas Koutsoubis et.al. | 2406.12815v1 | link |
2024-06-18 | Probabilistic Temporal Prediction of Continuous Disease Trajectories and Treatment Effects Using Neural SDEs | Joshua Durso-Finley et.al. | 2406.12807v1 | null |
2024-06-18 | Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition | Xingming Liao et.al. | 2406.12779v1 | null |
2024-06-18 | Medvedev degrees of subshifts on groups | Sebastián Barbieri et.al. | 2406.12777v1 | null |
2024-06-18 | Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video | Xiangming Zhu et.al. | 2406.12769v1 | null |
2024-06-17 | Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% | Lei Zhu et.al. | 2406.11837v1 | link |
2024-06-17 | Spectral Introspection Identifies Group Training Dynamics in Deep Neural Networks for Neuroimaging | Bradley T. Baker et.al. | 2406.11825v1 | null |
2024-06-17 | Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation | Alexander Raistrick et.al. | 2406.11824v1 | null |
2024-06-17 | VideoLLM-online: Online Video Large Language Model for Streaming Video | Joya Chen et.al. | 2406.11816v1 | null |
2024-06-17 | Faces of Experimental Pain: Transferability of Deep Learned Heat Pain Features to Electrical Pain | Pooja Prajod et.al. | 2406.11808v1 | null |
2024-06-17 | Mix-Domain Contrastive Learning for Unpaired H&E-to-IHC Stain Translation | Song Wang et.al. | 2406.11799v1 | null |
2024-06-17 | CELL your Model: Contrastive Explanation Methods for Large Language Models | Ronny Luss et.al. | 2406.11785v1 | null |
2024-06-17 | Task Me Anything | Jieyu Zhang et.al. | 2406.11775v1 | link |
2024-06-17 | Domain Generalization for In-Orbit 6D Pose Estimation | Antoine Legrand et.al. | 2406.11743v1 | null |
2024-06-17 | Lightweight Model Pre-training via Language Guided Knowledge Distillation | Mingsheng Li et.al. | 2406.11689v1 | link |
2024-06-14 | VideoGUI: A Benchmark for GUI Automation from Instructional Videos | Kevin Qinghong Lin et.al. | 2406.10227v1 | null |
2024-06-14 | Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding | Ridouane Ghermi et.al. | 2406.10221v1 | null |
2024-06-14 | SSTFB: Leveraging self-supervised pretext learning and temporal self-attention with feature branching for real-time video polyp segmentation | Ziang Xu et.al. | 2406.10200v1 | null |
2024-06-14 | CarLLaVA: Vision language models for camera-only closed-loop driving | Katrin Renz et.al. | 2406.10165v1 | null |
2024-06-14 | Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition | Guinan Li et.al. | 2406.10152v1 | null |
2024-06-14 | Training-free Camera Control for Video Generation | Chen Hou et.al. | 2406.10126v1 | null |
2024-06-14 | Modified Risk Formulation for Improving the Prediction of Knee Osteoarthritis Progression | Haresh Rengaraj Rajamohan et.al. | 2406.10119v1 | null |
2024-06-14 | ECGMamba: Towards Efficient ECG Classification with BiSSM | Yupeng Qiang et.al. | 2406.10098v1 | null |
2024-06-14 | Biomarker based Cancer Classification using an Ensemble with Pre-trained Models | Chongmin Lee et.al. | 2406.10087v1 | null |
2024-06-14 | On the Evaluation of Speech Foundation Models for Spoken Language Understanding | Siddhant Arora et.al. | 2406.10083v1 | null |
2024-06-13 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Muhammad Maaz et.al. | 2406.09418v1 | link |
2024-06-13 | An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels | Duy-Kien Nguyen et.al. | 2406.09415v1 | null |
2024-06-13 | CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras | Sachin Shah et.al. | 2406.09409v1 | null |
2024-06-13 | Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion | Linzhan Mou et.al. | 2406.09402v1 | null |
2024-06-13 | OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation | Junke Wang et.al. | 2406.09399v1 | link |
2024-06-13 | Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA | Jongwoo Park et.al. | 2406.09396v1 | null |
2024-06-13 | LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living | Rajatsubhra Chakraborty et.al. | 2406.09390v1 | null |
2024-06-13 | Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior | Baiang Li et.al. | 2406.09389v1 | null |
2024-06-13 | Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition | Youngtaek Oh et.al. | 2406.09388v1 | link |
2024-06-13 | SimGen: Simulator-conditioned Driving Scene Generation | Yunsong Zhou et.al. | 2406.09386v1 | null |
2024-06-12 | On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models | Hashmat Shadab Malik et.al. | 2406.08486v1 | link |
2024-06-12 | RMem: Restricted Memory Banks Improve Video Object Segmentation | Junbao Zhou et.al. | 2406.08476v1 | null |
2024-06-12 | AToM-Bot: Embodied Fulfillment of Unspoken Human Needs with Affective Theory of Mind | Wei Ding et.al. | 2406.08455v1 | null |
2024-06-12 | Transformation-Dependent Adversarial Attacks | Yaoteng Tan et.al. | 2406.08443v1 | null |
2024-06-12 | A Sticker is Worth a Thousand Words: Characterizing the Use of Stickers in WhatsApp Political Groups in Brazil | Philipe Melo et.al. | 2406.08429v1 | null |
2024-06-12 | Improving Noise Robustness through Abstractions and its Impact on Machine Learning | Alfredo Ibias et.al. | 2406.08428v1 | null |
2024-06-12 | OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text | Qingyun Li et.al. | 2406.08418v1 | link |
2024-06-13 | MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos | Xuehai He et.al. | 2406.08407v2 | link |
2024-06-12 | Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze | Michele Mazzamuto et.al. | 2406.08379v1 | null |
2024-06-12 | 2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction | Tianqi Chen et.al. | 2406.08374v1 | null |
2024-06-11 | Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring | Huicong Zhang et.al. | 2406.07551v1 | link |
2024-06-11 | Image and Video Tokenization with Binary Spherical Quantization | Yue Zhao et.al. | 2406.07548v1 | link |
2024-06-11 | Zero-shot Image Editing with Reference Imitation | Xi Chen et.al. | 2406.07547v1 | null |
2024-06-11 | Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance | Kuan Heng Lin et.al. | 2406.07540v1 | null |
2024-06-11 | BAKU: An Efficient Transformer for Multi-Task Policy Learning | Siddhant Haldar et.al. | 2406.07539v1 | null |
2024-06-11 | Transforming a rare event search into a not-so-rare event search in real-time with deep learning-based object detection | J. Schueler et.al. | 2406.07538v1 | null |
2024-06-11 | Towards Fundamentally Scalable Model Selection: Asymptotically Fast Update and Selection | Wenxiao Wang et.al. | 2406.07536v1 | null |
2024-06-11 | Dynamics of the non-radial energy-critical inhomogeneous NLS | Carlos M. Guzmán et.al. | 2406.07535v1 | null |
2024-06-11 | Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement | Yunzhen Feng et.al. | 2406.07515v1 | null |
2024-06-11 | Understanding Visual Concepts Across Models | Brandon Trabucco et.al. | 2406.07506v1 | link |
2024-06-10 | NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing | Ting-Hsuan Chen et.al. | 2406.06523v1 | null |
2024-06-10 | Data Augmentation for Multivariate Time Series Classification: An Experimental Study | Romain Ilbert et.al. | 2406.06518v1 | null |
2024-06-10 | Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Louis Blankemeier et.al. | 2406.06512v1 | null |
2024-06-10 | Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer | Sigal Raab et.al. | 2406.06508v1 | link |
2024-06-10 | Equivariant Neural Tangent Kernels | Philipp Misof et.al. | 2406.06504v1 | null |
2024-06-10 | Viscous shock fluctuations in KPZ | Alexander Dunlap et.al. | 2406.06502v1 | null |
2024-06-10 | NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative | Asmar Nadeem et.al. | 2406.06499v1 | null |
2024-06-10 | Demonstrating HumanTHOR: A Simulation Platform and Benchmark for Human-Robot Collaboration in a Shared Workspace | Chenxu Wang et.al. | 2406.06498v1 | null |
2024-06-10 | Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data | Nicole Hayes et.al. | 2406.06479v1 | null |
2024-06-10 | DiffAudit: Auditing Privacy Practices of Online Services for Children and Adolescents | Olivia Figueira et.al. | 2406.06473v1 | null |
2024-06-07 | DVOS: Self-Supervised Dense-Pattern Video Object Segmentation | Keyhan Najafian et.al. | 2406.05131v1 | null |
2024-06-07 | Compositional Curvature Bounds for Deep Neural Networks | Taha Entesari et.al. | 2406.05119v1 | null |
2024-06-07 | Large Generative Graph Models | Yu Wang et.al. | 2406.05109v1 | null |
2024-06-07 | A Novel Time Series-to-Image Encoding Approach for Weather Phenomena Classification | Christian Giannetti et.al. | 2406.05096v1 | null |
2024-06-10 | Discovery of An Apparent Red, High-Velocity Type Ia Supernova at z = 2.9 with JWST | J. D. R. Pierel et.al. | 2406.05089v2 | null |
2024-06-07 | CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion | Xingrui Wang et.al. | 2406.05082v1 | null |
2024-06-10 | Discovery of a Relativistic Stripped Envelope Type Ic-BL Supernova at z = 2.83 with JWST | M. R. Siebert et.al. | 2406.05076v2 | null |
2024-06-07 | Diving Deep into the Motion Representation of Video-Text Models | Chinmaya Devaraj et.al. | 2406.05075v1 | null |
2024-06-07 | Hibou: A Family of Foundational Vision Transformers for Pathology | Dmitry Nechaev et.al. | 2406.05074v1 | null |
2024-06-07 | Classification Metrics for Image Explanations: Towards Building Reliable XAI-Evaluations | Benjamin Fresz et.al. | 2406.05068v1 | link |
2024-06-06 | Verbalized Machine Learning: Revisiting Machine Learning with Language Models | Tim Z. Xiao et.al. | 2406.04344v1 | null |
2024-06-07 | Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion | Fangfu Liu et.al. | 2406.04338v2 | null |
2024-06-06 | Parameter-Inverted Image Pyramid Networks | Xizhou Zhu et.al. | 2406.04330v1 | link |
2024-06-06 | ShareGPT4Video: Improving Video Understanding and Generation with Better Captions | Lin Chen et.al. | 2406.04325v1 | null |
2024-06-06 | SF-V: Single Forward Video Generation Model | Zhixing Zhang et.al. | 2406.04324v1 | null |
2024-06-06 | ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories | Qianlan Yang et.al. | 2406.04323v1 | null |
2024-06-06 | VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling | Zeyue Tian et.al. | 2406.04321v1 | link |
2024-06-06 | Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models | Ali Behrouz et.al. | 2406.04320v1 | null |
2024-06-06 | Adaptive Sampling of k-Space in Magnetic Resonance for Rapid Pathology Prediction | Chen-Yu Yen et.al. | 2406.04318v1 | null |
2024-06-06 | Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks | Tristan Cinquin et.al. | 2406.04317v1 | null |
2024-06-05 | Grokking Modular Polynomials | Darshil Doshi et.al. | 2406.03495v1 | null |
2024-06-05 | The Logarithmic Memristor-Based Bayesian Machine | Clément Turck et.al. | 2406.03492v1 | null |
2024-06-05 | Convolutional Neural Networks and Vision Transformers for Fashion MNIST Classification: A Literature Review | Sonia Bbouzidi et.al. | 2406.03478v1 | null |
2024-06-05 | Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach | Haoyu Han et.al. | 2406.03464v1 | null |
2024-06-05 | Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts | Dominik Scheuble et.al. | 2406.03461v1 | null |
2024-06-05 | FILS: Self-Supervised Video Feature Prediction In Semantic Language Space | Mona Ahmadian et.al. | 2406.03447v1 | null |
2024-06-05 | Text-to-Events: Synthetic Event Camera Streams from Conditional Text Input | Joachim Ott et.al. | 2406.03439v1 | null |
2024-06-05 | Stabilizing massless fields with fluxes in Landau-Ginzburg models | Katrin Becker et.al. | 2406.03435v1 | null |
2024-06-05 | Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis | Moein Heidari et.al. | 2406.03430v1 | link |
2024-06-05 | Post-hoc Part-prototype Networks | Andong Tan et.al. | 2406.03421v1 | null |
2024-06-05 | Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting | Inkyu Shin et.al. | 2406.02541v2 | null |
2024-06-04 | ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation | Tianchen Zhao et.al. | 2406.02540v1 | null |
2024-06-04 | Enhancing predictive imaging biomarker discovery through treatment effect analysis | Shuhan Xiao et.al. | 2406.02534v1 | null |
2024-06-04 | ReLUs Are Sufficient for Learning Implicit Neural Representations | Joseph Shenouda et.al. | 2406.02529v1 | link |
2024-06-04 | RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots | Soroush Nasiriany et.al. | 2406.02523v1 | null |
2024-06-04 | DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering | Zhongpai Gao et.al. | 2406.02518v1 | null |
2024-06-04 | V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation | Cong Wang et.al. | 2406.02511v1 | null |
2024-06-04 | CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation | Dejia Xu et.al. | 2406.02509v1 | null |
2024-06-04 | Endomorphisms of Artin groups of type | Luis Paris et.al. | 2406.02484v1 | null |
2024-06-04 | Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion | Colin Hansen et.al. | 2406.02477v1 | null |
2024-05-31 | Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | Chaoyou Fu et.al. | 2405.21075v1 | null |
2024-05-31 | Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights | Xin Wen et.al. | 2405.21070v1 | link |
2024-05-31 | You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet | Zhen Qin et.al. | 2405.21022v1 | null |
2024-05-31 | Beyond Conventional Parametric Modeling: Data-Driven Framework for Estimation and Prediction of Time Activity Curves in Dynamic PET Imaging | Niloufar Zakariaei et.al. | 2405.21021v1 | null |
2024-05-31 | The classification of dp-minimal integral domains | Christian d'Elbée et.al. | 2405.21014v1 | null |
2024-05-31 | Early Stopping Criteria for Training Generative Adversarial Networks in Biomedical Imaging | Muhammad Muneeb Saad et.al. | 2405.20987v1 | null |
2024-05-31 | PUAL: A Classifier on Trifurcate Positive-Unlabeled Data | Xiaoke Wang et.al. | 2405.20970v1 | null |
2024-05-31 | Aligning Multiclass Neural Network Classifier Criterion with Task Performance via | Nathan Tsoi et.al. | 2405.20954v1 | null |
2024-05-31 | Standard model of electromagnetism and chirality in crystals | R. Winkler et.al. | 2405.20940v1 | null |
2024-05-31 | MALT: Multi-scale Action Learning Transformer for Online Action Detection | Zhipeng Yang et.al. | 2405.20892v1 | null |
2024-05-30 | MotionLLM: Understanding Human Behaviors from Human Motions and Videos | Ling-Hao Chen et.al. | 2405.20340v1 | null |
2024-05-30 | OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving | Lening Wang et.al. | 2405.20337v1 | link |
2024-05-30 | VividDream: Generating 3D Scene with Ambient Dynamics | Yao-Chih Lee et.al. | 2405.20334v1 | null |
2024-05-30 | SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos | Chinedu Innocent Nwoye et.al. | 2405.20333v1 | null |
2024-05-31 | 4DHands: Reconstructing Interactive Hands in 4D with Transformers | Dixuan Lin et.al. | 2405.20330v2 | null |
2024-05-30 | MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion | Shuyuan Tu et.al. | 2405.20325v1 | null |
2024-05-30 | Vision-based Manipulation from Single Human Video with Open-World Object Graphs | Yifeng Zhu et.al. | 2405.20321v1 | null |
2024-05-30 | Improving the Training of Rectified Flows | Sangyun Lee et.al. | 2405.20320v1 | link |
2024-05-30 | CausalQuest: Collecting Natural Causal Questions for AI Agents | Roberto Ceraolo et.al. | 2405.20318v1 | link |
2024-05-30 | Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models | Himangi Mittal et.al. | 2405.20305v1 | null |
2024-05-29 | X-VILA: Cross-Modality Alignment for Large Language Model | Hanrong Ye et.al. | 2405.19335v1 | null |
2024-05-29 | LLMs Meet Multimodal Generation and Editing: A Survey | Yingqing He et.al. | 2405.19334v1 | link |
2024-05-29 | Multi-Modal Generative Embedding Model | Feipeng Ma et.al. | 2405.19333v1 | null |
2024-05-29 | NPGA: Neural Parametric Gaussian Avatars | Simon Giebenhain et.al. | 2405.19331v1 | null |
2024-05-29 | Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation | Atrisha Sarkar et.al. | 2405.19328v1 | null |
2024-05-29 | DGD: Dynamic 3D Gaussians Distillation | Isaac Labe et.al. | 2405.19321v1 | null |
2024-05-29 | Real-Time Environment Condition Classification for Autonomous Vehicles | Marco Introvigne et.al. | 2405.19305v1 | null |
2024-05-29 | Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare | Hanwei Zhu et.al. | 2405.19298v1 | null |
2024-05-29 | Archetype-Based Redshift Estimation for the Dark Energy Spectroscopic Instrument Survey | Abhijeet Anand et.al. | 2405.19288v1 | null |
2024-05-29 | A study on the adequacy of common IQA measures for medical images | Anna Breger et.al. | 2405.19224v1 | null |
2024-05-28 | Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets | Khen Cohen et.al. | 2405.18427v1 | null |
2024-05-28 | GFlow: Recovering 4D World from Monocular Video | Shizun Wang et.al. | 2405.18426v1 | null |
2024-05-28 | Hierarchical World Models as Visual Whole-Body Humanoid Controllers | Nicklas Hansen et.al. | 2405.18418v1 | null |
2024-05-28 | 3D StreetUnveiler with Semantic-Aware 2DGS | Jingwei Xu et.al. | 2405.18416v1 | null |
2024-05-28 | Why are Visually-Grounded Language Models Bad at Image Classification? | Yuhui Zhang et.al. | 2405.18415v1 | link |
2024-05-28 | Towards a Sampling Theory for Implicit Neural Representations | Mahrokh Najaf et.al. | 2405.18410v1 | null |
2024-05-28 | Phased Consistency Model | Fu-Yun Wang et.al. | 2405.18407v1 | null |
2024-05-28 | RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives | Jaehong Yoon et.al. | 2405.18406v1 | null |
2024-05-28 | MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning | Somnath Kumar et.al. | 2405.18358v1 | null |
2024-05-28 | Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography | Jie Liu et.al. | 2405.18356v1 | link |
2024-05-27 | Matryoshka Multimodal Models | Mu Cai et.al. | 2405.17430v1 | null |
2024-05-27 | NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models | Chankyu Lee et.al. | 2405.17428v1 | null |
2024-05-27 | MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds | Jiahui Lei et.al. | 2405.17421v1 | null |
2024-05-27 | Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control | Zhengfei Kuang et.al. | 2405.17414v1 | null |
2024-05-27 | Enhancing Music Genre Classification through Multi-Algorithm Analysis and User-Friendly Visualization | Navin Kamuni et.al. | 2405.17413v1 | null |
2024-05-27 | The Peripatetic Hater: Predicting Movement Among Hate Subreddits | Daniel Hickey et.al. | 2405.17410v1 | null |
2024-05-27 | Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer | Ruizhi Shao et.al. | 2405.17405v1 | null |
2024-05-27 | Spectral Greedy Coresets for Graph Neural Networks | Mucong Ding et.al. | 2405.17404v1 | null |
2024-05-27 | Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | Shenyuan Gao et.al. | 2405.17398v1 | link |
2024-05-27 | Non-Unitary Quantum Machine Learning | Jamie Heredge et.al. | 2405.17388v1 | null |
2024-05-24 | Canonical Variates in Wasserstein Metric Space | Jia Li et.al. | 2405.15768v1 | null |
2024-05-24 | Scaling Laws for Discriminative Classification in Large Language Models | Dean Wyatte et.al. | 2405.15765v1 | null |
2024-05-24 | InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation | Yuchi Wang et.al. | 2405.15758v1 | link |
2024-05-24 | Looking Backward: Streaming Video-to-Video Translation with Feature Banks | Feng Liang et.al. | 2405.15757v1 | link |
2024-05-24 | Characterizing Discourse Group Roles in Inquiry-based University Science Labs | Tong Wan et.al. | 2405.15746v1 | null |
2024-05-24 | Hierarchical Uncertainty Exploration via Feedforward Posterior Trees | Elias Nehme et.al. | 2405.15719v1 | null |
2024-05-24 | EmpathicStories++: A Multimodal Dataset for Empathy towards Personal Experiences | Jocelyn Shen et.al. | 2405.15708v1 | null |
2024-05-24 | Sums: Sniffing Unknown Multiband Signals under Low Sampling Rates | Jinbo Peng et.al. | 2405.15705v1 | null |
2024-05-24 | realSEUDO for real-time calcium imaging analysis | Iuliia Dmitrieva et.al. | 2405.15701v1 | null |
2024-05-24 | UNION: Unsupervised 3D Object Detection using Object Appearance-based Pseudo-Classes | Ted Lentsch et.al. | 2405.15688v1 | null |
2024-05-23 | PuzzleAvatar: Assembling 3D Avatars from Personal Albums | Yuliang Xiu et.al. | 2405.14869v1 | null |
2024-05-23 | Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis | Basile Van Hoorick et.al. | 2405.14868v1 | null |
2024-05-23 | Video Diffusion Models are Training-free Motion Interpreter and Controller | Zeqi Xiao et.al. | 2405.14864v1 | null |
2024-05-23 | Synergistic Global-space Camera and Human Reconstruction from Videos | Yizhou Zhao et.al. | 2405.14855v1 | null |
2024-05-23 | Domain Wall Magnetic Tunnel Junction Reliable Integrate and Fire Neuron | Can Cui et.al. | 2405.14851v1 | null |
2024-05-23 | Learning to Detect and Segment Mobile Objects from Unlabeled Videos | Yihong Sun et.al. | 2405.14841v1 | null |
2024-05-23 | Designing A Sustainable Marine Debris Clean-up Framework without Human Labels | Raymond Wang et.al. | 2405.14815v1 | null |
2024-05-23 | As an AI Language Model, "Yes I Would Recommend Calling the Police": Norm Inconsistency in LLM Decision-Making | Shomik Jain et.al. | 2405.14812v1 | null |
2024-05-23 | Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics | Jonas Spinner et.al. | 2405.14806v1 | null |
2024-05-24 | Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation | Hongxu Jiang et.al. | 2405.14802v2 | link |
2024-05-21 | Comprehensive Multimodal Deep Learning Survival Prediction Enabled by a Transformer Architecture: A Multicenter Study in Glioblastoma | Ahmed Gomaa et.al. | 2405.12963v1 | null |
2024-05-21 | Online Learning of Halfspaces with Massart Noise | Ilias Diakonikolas et.al. | 2405.12958v1 | null |
2024-05-21 | Quantifying Uncertainty in Classification Performance: ROC Confidence Bands Using Conformal Prediction | Zheshi Zheng et.al. | 2405.12953v1 | null |
2024-05-21 | Tutorly: Turning Programming Videos Into Apprenticeship Learning Environments with LLMs | Wengxi Li et.al. | 2405.12946v1 | null |
2024-05-21 | Pytorch-Wildlife: A Collaborative Deep Learning Framework for Conservation | Andres Hernandez et.al. | 2405.12930v1 | link |
2024-05-21 | Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples | Tim Menzies et.al. | 2405.12920v1 | null |
2024-05-21 | The | Bachir Bekka et.al. | 2405.12919v1 | null |
2024-05-21 | Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment | Holli Sargeant et.al. | 2405.12910v1 | link |
2024-05-21 | Decentralized Federated Learning Over Imperfect Communication Channels | Weicai Li et.al. | 2405.12894v1 | null |
2024-05-21 | Investigating Persuasion Techniques in Arabic: An Empirical Study Leveraging Large Language Models | Abdurahmman Alzahrani et.al. | 2405.12884v1 | null |
2024-05-20 | Images that Sound: Composing Images and Sounds on a Single Canvas | Ziyang Chen et.al. | 2405.12221v1 | null |
2024-05-20 | Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices | Nathaniel Cohen et.al. | 2405.12211v1 | null |
2024-05-20 | The sign of scalar curvature on Kähler blowups | Garrett M. Brown et.al. | 2405.12189v1 | null |
2024-05-20 | Building Temporal Kernels with Orthogonal Polynomials | Yan Ru Pei et.al. | 2405.12179v1 | link |
2024-05-20 | Wireless vs. Traditional Ultrasound Assessed Knee Cartilage Outcomes Utilizing Automated Gain and Normalization Techniques | Arjun Parmar et.al. | 2405.12172v1 | null |
2024-05-20 | DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM | Xuchen Li et.al. | 2405.12139v1 | null |
2024-05-20 | Alzheimer's Magnetic Resonance Imaging Classification Using Deep and Meta-Learning Models | Nida Nasir et.al. | 2405.12126v1 | null |
2024-05-20 | An Active Learning Framework with a Class Balancing Strategy for Time Series Classification | Shemonto Das et.al. | 2405.12122v1 | null |
2024-05-20 | AGNfitter-rx: Modelling the radio-to-X-ray SEDs of AGNs | L. N. Martínez-Ramírez et.al. | 2405.12111v1 | null |
2024-05-20 | Real topological phonons in 3D carbon allotropes | Xiaotian Wang et.al. | 2405.12072v1 | null |
2024-05-17 | Submodular Information Selection for Hypothesis Testing with Misclassification Penalties | Jayanth Bhargav et.al. | 2405.10930v1 | null |
2024-05-17 | A Versatile Framework for Analyzing Galaxy Image Data by Implanting Human-in-the-loop on a Large Vision Model | Mingxiang Fu et.al. | 2405.10890v1 | null |
2024-05-17 | Multicenter Privacy-Preserving Model Training for Deep Learning Brain Metastases Autosegmentation | Yixing Huang et.al. | 2405.10870v1 | null |
2024-05-17 | "Hall" transport of liquid crystal solitons in Couette flow | Rodrigo C. V. Coelho et.al. | 2405.10850v1 | null |
2024-05-17 | Automatic segmentation of Organs at Risk in Head and Neck cancer patients from CT and MRI scans | Sébastien Quetin et.al. | 2405.10833v1 | null |
2024-05-17 | Open-Vocabulary Spatio-Temporal Action Detection | Tao Wu et.al. | 2405.10832v1 | null |
2024-05-17 | Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities | Hao Zhou et.al. | 2405.10825v1 | null |
2024-05-17 | ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios | Markus Bayer et.al. | 2405.10808v1 | null |
2024-05-17 | A Large-scale Multi Domain Leukemia Dataset for the White Blood Cells Detection with Morphological Attributes for Explainability | Abdul Rehman et.al. | 2405.10803v1 | null |
2024-05-17 | Reduced storage direct tensor ring decomposition for convolutional neural networks compression | Mateusz Gabor et.al. | 2405.10802v1 | link |
2024-05-16 | TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction | Yunfan Jiang et.al. | 2405.10315v1 | null |
2024-05-16 | 4D Panoptic Scene Graph Generation | Jingkang Yang et.al. | 2405.10305v1 | link |
2024-05-16 | On Sample Selection for Continual Learning: a Video Streaming Case Study | Alexander Dietmüller et.al. | 2405.10290v1 | null |
2024-05-16 | Quantum Vision Transformers for Quark-Gluon Classification | Marçal Comajoan Cara et.al. | 2405.10284v1 | null |
2024-05-16 | Faces that Speak: Jointly Synthesising Talking Face and Speech from Text | Youngjoon Jang et.al. | 2405.10272v1 | null |
2024-05-16 | A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision | Charles Raude et.al. | 2405.10266v1 | null |
2024-05-16 | PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology | George Shaikovski et.al. | 2405.10254v1 | null |
2024-05-16 | A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts | Xinru Zhang et.al. | 2405.10246v1 | null |
2024-05-16 | Ternary mappings of some evolution algebras | Candido Martin Gonzalez et.al. | 2405.10241v1 | null |
2024-05-16 | ENADPool: The Edge-Node Attention-based Differentiable Pooling for Graph Neural Networks | Zhehan Zhao et.al. | 2405.10218v1 | null |
2024-05-15 | Classifying geospatial objects from multiview aerial imagery using semantic meshes | David Russell et.al. | 2405.09544v1 | null |
2024-05-15 | Spectral complexity of deep neural networks | Simmaco Di Lillo et.al. | 2405.09541v1 | null |
2024-05-16 | MMFusion: Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal Cancer | Chengyu Wu et.al. | 2405.09539v2 | link |
2024-05-15 | Restoring balance: principled under/oversampling of data for optimal classification | Emanuele Loffredo et.al. | 2405.09535v1 | null |
2024-05-15 | Tackling Distribution Shifts in Task-Oriented Communication with Information Bottleneck | Hongru Li et.al. | 2405.09514v1 | null |
2024-05-15 | Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts | Donya Rooein et.al. | 2405.09482v1 | null |
2024-05-15 | Perception- and Fidelity-aware Reduced-Reference Super-Resolution Image Quality Assessment | Xinying Lin et.al. | 2405.09472v1 | null |
2024-05-15 | Non-contact Lung Disease Classification via OFDM-based Passive 6G ISAC Sensing | Hasan Mujtaba Buttar et.al. | 2405.09458v1 | null |
2024-05-15 | Cohomogeneity one RCD-spaces | Diego Corro et.al. | 2405.09448v1 | null |
2024-05-15 | M$^4$oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts | Yufeng Jiang et.al. | 2405.09446v1 | null |
2024-05-14 | CinePile: A Long Video Question Answering Dataset and Benchmark | Ruchit Rawal et.al. | 2405.08813v1 | null |
2024-05-14 | The Developing Human Connectome Project: A Fast Deep Learning-based Pipeline for Neonatal Cortical Surface Reconstruction | Qiang Ma et.al. | 2405.08783v1 | null |
2024-05-14 | Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling | Gregory Holste et.al. | 2405.08780v1 | null |
2024-05-14 | FolkTalent: Enhancing Classification and Tagging of Indian Folk Paintings | Nancy Hada et.al. | 2405.08776v1 | null |
2024-05-14 | From Text to Context: An Entailment Approach for News Stakeholder Classification | Alapan Kuila et.al. | 2405.08751v1 | null |
2024-05-14 | Enhancing Blind Video Quality Assessment with Rich Quality-aware Features | Wei Sun et.al. | 2405.08745v1 | null |
2024-05-14 | The impact of Compositionality in Zero-shot Multi-label action recognition for Object-based tasks | Carmela Calabrese et.al. | 2405.08695v1 | null |
2024-05-14 | Latent group structure in linear panel data models with endogenous regressors | Junho Choi et.al. | 2405.08687v1 | null |
2024-05-14 | Achieving Fairness Through Channel Pruning for Dermatological Disease Diagnosis | Qingpeng Kong et.al. | 2405.08681v1 | link |
2024-05-14 | Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning | Alain Riou et.al. | 2405.08679v1 | null |
2024-05-14 | MambaOut: Do We Really Need Mamba for Vision? | Weihao Yu et.al. | 2405.07992v2 | link |
2024-05-13 | SPIN: Simultaneous Perception, Interaction and Navigation | Shagun Uppal et.al. | 2405.07991v1 | null |
2024-05-13 | KG-Planner: Knowledge-Informed Graph Neural Planning for Collaborative Manipulators | Wansong Liu et.al. | 2405.07962v1 | null |
2024-05-13 | An Algorithmic Classification of Generalized Pseudo-Anosov Homeomorphisms via Geometric Markov Partitions | Inti Cruz Diaz et.al. | 2405.07954v1 | null |
2024-05-13 | Scene Action Maps: Behavioural Maps for Navigation without Metric Information | Joel Loo et.al. | 2405.07948v1 | null |
2024-05-14 | PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition | Ziyang Zhang et.al. | 2405.07932v2 | link |
2024-05-13 | Improving Multimodal Learning with Multi-Loss Gradient Modulation | Konstantinos Kontras et.al. | 2405.07930v1 | null |
2024-05-13 | PLUTO: Pathology-Universal Transformer | Dinkar Juyal et.al. | 2405.07905v1 | null |
2024-05-13 | Enhancing Clinically Significant Prostate Cancer Prediction in T2-weighted Images through Transfer Learning from Breast Cancer | Chi-en Amy Tai et.al. | 2405.07869v1 | null |
2024-05-13 | Improving Breast Cancer Grade Prediction with Multiparametric MRI Created Using Optimized Synthetic Correlated Diffusion Imaging | Chi-en Amy Tai et.al. | 2405.07861v1 | null |
2024-05-10 | Multi-Object Tracking in the Dark | Xinzhe Wang et.al. | 2405.06600v1 | link |
2024-05-10 | Ice phase classification made easy with score-based denoising | Hong Sun et.al. | 2405.06599v1 | null |
2024-05-10 | Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach | Elham Ravanbakhsh et.al. | 2405.06586v1 | null |
2024-05-10 | Deep video representation learning: a survey | Elham Ravanbakhsh et.al. | 2405.06574v1 | null |
2024-05-10 | The Role of Topological Photon Spheres in Constraining the Parameters of Black Holes | Jafar Sadeghi et.al. | 2405.06568v1 | null |
2024-05-10 | OneTo3D: One Image to Re-editable Dynamic 3D Model and Video Generation | Jinwei Lin et.al. | 2405.06547v1 | link |
2024-05-10 | Separating States in Astronomical Sources Using Hidden Markov Models: With a Case Study of Flaring and Quiescence on EV Lac | Robert Zimmerman et.al. | 2405.06540v1 | null |
2024-05-10 | Semantic and Spatial Adaptive Pixel-level Classifier for Semantic Segmentation | Xiaowen Ma et.al. | 2405.06525v1 | link |
2024-05-10 | Aspect-based Sentiment Evaluation of Chess Moves (ASSESS): an NLP-based Method for Evaluating Chess Strategies from Textbooks | Haifa Alrdahi et.al. | 2405.06499v1 | null |
2024-05-10 | Improving Deep Learning Model Calibration for Cardiac Applications using Deterministic Uncertainty Networks and Uncertainty-aware Training | Tareen Dawood et.al. | 2405.06487v1 | null |
2024-05-09 | A Universal Growth Rate for Learning with Smooth Surrogate Losses | Anqi Mao et.al. | 2405.05968v1 | null |
2024-05-09 | Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask | Zineb Senane et.al. | 2405.05959v1 | link |
2024-05-09 | Frame Interpolation with Consecutive Brownian Bridge Diffusion | Zonglin Lyu et.al. | 2405.05953v1 | null |
2024-05-09 | Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers | Peng Gao et.al. | 2405.05945v1 | link |
2024-05-09 | MRISegmentator-Abdomen: A Fully Automated Multi-Organ and Structure Segmentation Tool for T1-weighted Abdominal MRI | Yan Zhuang et.al. | 2405.05944v1 | null |
2024-05-09 | Non-symplectic automorphisms of prime order of O'Grady's tenfolds and cubic fourfolds | Simone Billi et.al. | 2405.05932v1 | null |
2024-05-09 | Deep Multi-Task Learning for Malware Image Classification | Ahmed Bensaoud et.al. | 2405.05906v1 | null |
2024-05-09 | An RNN-policy gradient approach for quantum architecture search | Gang Wang et.al. | 2405.05892v1 | null |
2024-05-09 | Composable Part-Based Manipulation | Weiyu Liu et.al. | 2405.05876v1 | null |
2024-05-09 | ExACT: An End-to-End Autonomous Excavator System Using Action Chunking With Transformers | Liangliang Chen et.al. | 2405.05861v1 | null |
2024-05-08 | Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo | Nayantara Mudur et.al. | 2405.05255v1 | link |
2024-05-08 | Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models | Hongjie Wang et.al. | 2405.05252v1 | null |
2024-05-08 | DanceCam: atmospheric turbulence mitigation in wide-field astronomical images with short-exposure video streams | Spencer Bialek et.al. | 2405.05250v1 | null |
2024-05-08 | Deep learning-based variational autoencoder for classification of quantum and classical states of light | Mahesh Bhupati et.al. | 2405.05243v1 | null |
2024-05-08 | On | Barry Chin et.al. | 2405.05230v1 | null |
2024-05-08 | Are Economically Advanced Countries More Efficient in Basic and Applied Research? | Vladimír Holý et.al. | 2405.05227v1 | null |
2024-05-08 | Clustering Retail Products Based on Customer Behaviour | Vladimír Holý et.al. | 2405.05218v1 | null |
2024-05-08 | FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models | Jinglin Xu et.al. | 2405.05216v1 | link |
2024-05-08 | Graded Relevance Scoring of Written Essays with Dense Retrieval | Salam Albatarni et.al. | 2405.05200v1 | null |
2024-05-08 | Is Transductive Learning Equivalent to PAC Learning? | Shaddin Dughmi et.al. | 2405.05190v1 | null |
2024-05-07 | Switchable Decision: Dynamic Neural Generation Networks | Shujian Zhang et.al. | 2405.04513v1 | null |
2024-05-07 | Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing | Yi Zuo et.al. | 2405.04496v1 | null |
2024-05-07 | Exploration of Novel Neuromorphic Methodologies for Materials Applications | Derek Gobin et.al. | 2405.04478v1 | null |
2024-05-07 | Generalized classical Yang-Baxter equation and regular decompositions | Raschid Abedin et.al. | 2405.04440v1 | null |
2024-05-07 | On the classification of product-quotient surfaces with | Federico Fallucca et.al. | 2405.04425v1 | null |
2024-05-07 | Vision Mamba: A Comprehensive Survey and Taxonomy | Xiao Liu et.al. | 2405.04404v1 | link |
2024-05-07 | Efficient Online Set-valued Classification with Bandit Feedback | Zhou Wang et.al. | 2405.04393v1 | null |
2024-05-07 | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | Chen Min et.al. | 2405.04390v1 | null |
2024-05-07 | Parallelized Multi-Agent Bayesian Optimization in Lava | Shay Snyder et.al. | 2405.04387v1 | null |
2024-05-07 | Pragmatist Intelligence: Where the Principle of Usefulness Can Take ANNs | Antonio Bikić et.al. | 2405.04386v1 | null |
2024-05-06 | Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs | Muhammad Uzair Khattak et.al. | 2405.03690v1 | null |
2024-05-06 | All-in-One Deep Learning Framework for MR Image Reconstruction | Geunu Jeong et.al. | 2405.03684v1 | null |
2024-05-06 | ScrewMimic: Bimanual Imitation from Human Videos with Screw Space Projection | Arpit Bahety et.al. | 2405.03666v1 | null |
2024-05-06 | CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification | Sankalp Sinha et.al. | 2405.03660v1 | null |
2024-05-06 | Collecting Consistently High Quality Object Tracks with Minimal Human Involvement by Using Self-Supervised Learning to Detect Tracker Errors | Samreen Anjum et.al. | 2405.03643v1 | null |
2024-05-06 | Classification of Breast Cancer Histopathology Images using a Modified Supervised Contrastive Learning Method | Matina Mahdizadeh Sani et.al. | 2405.03642v1 | link |
2024-05-06 | Nonequilibrium relaxation and odd-even effect in finite-temperature electron gases | Eric Nilsson et.al. | 2405.03635v1 | null |
2024-05-06 | Nonnegative Matrix Factorization in Dimensionality Reduction: A Survey | Farid Saberi-Movahed et.al. | 2405.03615v1 | null |
2024-05-06 | Dual Relation Mining Network for Zero-Shot Learning | Jinwei Han et.al. | 2405.03613v1 | null |
2024-05-06 | Communities for the Lagrangian Dynamics of the Turbulent Velocity Gradient Tensor: A Network Participation Approach | Christopher J. Keylock et.al. | 2405.03589v1 | null |
2024-05-03 | DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos | Wen-Hsuan Chu et.al. | 2405.02280v1 | null |
2024-05-03 | Transversely Projective Structures on Smooth Foliations on Surfaces | Gabriel Fazoli et.al. | 2405.02273v1 | null |
2024-05-03 | On its way to the neutron star-white dwarf binary graveyard, IGR J16194-2810, a first ascent M giant X-ray binary | K. H. Hinkle et.al. | 2405.02270v1 | null |
2024-05-03 | Validating Gaia DR3 Pulsating Variable Classifications with TESS: Building Reliable | Ai-Ying Zhou et.al. | 2405.02264v1 | null |
2024-05-03 | Subgraph2vec: A random walk-based algorithm for embedding knowledge graphs | Elika Bozorgi et.al. | 2405.02240v1 | null |
2024-05-03 | Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks | Lujing Zhang et.al. | 2405.02225v1 | null |
2024-05-03 | Designed Dithering Sign Activation for Binary Neural Networks | Brayan Monroy et.al. | 2405.02220v1 | null |
2024-05-03 | Multispectral Fine-Grained Classification of Blackgrass in Wheat and Barley Crops | Madeleine Darbyshire et.al. | 2405.02218v1 | null |
2024-05-03 | Non-Destructive Peat Analysis using Hyperspectral Imaging and Machine Learning | Yijun Yan et.al. | 2405.02191v1 | null |
2024-05-03 | Hoaxpedia: A Unified Wikipedia Hoax Articles Dataset | Hsuvas Borkakoty et.al. | 2405.02175v1 | null |
2024-05-02 | Confronting sparse Gaia DR3 photometry with TESS for a sample of about 60,000 hot massive non-radial pulsators | Daniel Hey et.al. | 2405.01539v1 | null |
2024-05-02 | Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks | Murtaza Dalal et.al. | 2405.01534v1 | null |
2024-05-02 | Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models | Nishad Singhi et.al. | 2405.01531v1 | null |
2024-05-02 | Track2Act: Predicting Point Tracks from Internet Videos enables Diverse Zero-shot Robot Manipulation | Homanga Bharadhwaj et.al. | 2405.01527v1 | null |
2024-05-03 | A separability-based approach to quantifying generalization: which layer is best? | Luciano Dyballa et.al. | 2405.01524v2 | null |
2024-05-02 | Grand Design vs. Multi-Armed Spiral Galaxies: Dependence on Galaxy Structure | Beverly J. Smith et.al. | 2405.01516v1 | null |
2024-05-03 | Accelerating Convergence in Bayesian Few-Shot Classification | Tianjun Ke et.al. | 2405.01507v2 | link |
2024-05-02 | PAM-UNet: Shifting Attention on Region of Interest in Medical Images | Abhijit Das et.al. | 2405.01503v1 | null |
2024-05-02 | Exploring Privacy Issues in Mission Critical Communication: Navigating 5G and Beyond Networks | Prajnamaya Dass et.al. | 2405.01492v1 | null |
2024-05-02 | Designing Algorithmic Recommendations to Achieve Human-AI Complementarity | Bryce McLaughlin et.al. | 2405.01484v1 | null |
2024-05-01 | Quantum algorithms for matrix geometric means | Nana Liu et.al. | 2405.00673v1 | null |
2024-05-01 | Adapting Pretrained Networks for Image Quality Assessment on High Dynamic Range Displays | Andrei Chubarau et.al. | 2405.00670v1 | null |
2024-05-01 | Screening of BindingDB database ligands against EGFR, HER2, Estrogen, Progesterone and NF-kB receptors based on machine learning and molecular docking | Parham Rezaee et.al. | 2405.00647v1 | null |
2024-05-01 | Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling | Yida Mu et.al. | 2405.00611v1 | null |
2024-05-01 | Investigating Automatic Scoring and Feedback using Large Language Models | Gloria Ashiya Katuka et.al. | 2405.00602v1 | null |
2024-05-01 | Discovering robust biomarkers of neurological disorders from functional MRI using graph neural networks: A Review | Yi Hao Chan et.al. | 2405.00577v1 | null |
2024-05-01 | EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model | Deng Li et.al. | 2405.00574v1 | null |
2024-05-01 | Remote Sensing Data Assimilation with a Chained Hydrologic-hydraulic Model for Flood Forecasting | Thanh Huy Nguyen et.al. | 2405.00567v1 | null |
2024-05-01 | Digital-analog quantum convolutional neural networks for image classification | Anton Simen et.al. | 2405.00548v1 | null |
2024-05-01 | UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-scale Generation and Registration Enhancement | Ruiquan Ge et.al. | 2405.00542v1 | link |
2024-04-30 | A Framework for Leveraging Human Computation Gaming to Enhance Knowledge Graphs for Accuracy Critical Generative AI Applications | Steph Buongiorno et.al. | 2404.19729v1 | null |
2024-04-30 | Classification of simple 0-dimensional isolated complete intersection singularities | Thuy Huong Pham et.al. | 2404.19728v1 | null |
2024-04-30 | PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios | Jingbo Wang et.al. | 2404.19722v1 | null |
2024-04-30 | PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games | Steph Buongiorno et.al. | 2404.19721v1 | null |
2024-04-30 | ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents | Hoang-Thang Ta et.al. | 2404.19714v1 | null |
2024-04-30 | A rank decomposition for the topological classification of neural representations | Kosio Beshkov et.al. | 2404.19710v1 | null |
2024-04-30 | Neural Controlled Differential Equations with Quantum Hidden Evolutions | Lingyi Yang et.al. | 2404.19673v1 | link |
2024-04-30 | Beyond MOS: Subjective Image Quality Score Preprocessing Method Based on Perceptual Similarity | Lei Wang et.al. | 2404.19666v1 | null |
2024-04-30 | Towards Generalist Robot Learning from Internet Video: A Survey | Robert McCarthy et.al. | 2404.19664v1 | null |
2024-04-30 | Regularization of Riemannian optimization: Application to process tomography and quantum machine learning | Felix Soest et.al. | 2404.19659v1 | null |
2024-04-29 | Hallucination of Multimodal Large Language Models: A Survey | Zechen Bai et.al. | 2404.18930v1 | link |
2024-04-29 | Swin2-MoSE: A New Single Image Super-Resolution Model for Remote Sensing | Leonardo Rossi et.al. | 2404.18924v1 | null |
2024-04-29 | Anomaly and invertible field theory with higher-form symmetry: Extended group cohomology | Shi Chen et.al. | 2404.18921v1 | null |
2024-04-29 | A Survey on Diffusion Models for Time Series and Spatio-Temporal Data | Yiyuan Yang et.al. | 2404.18886v1 | link |
2024-04-29 | A Multilevel Strategy to Improve People Tracking in a Real-World Scenario | Cristiano B. de Oliveira et.al. | 2404.18876v1 | null |
2024-04-29 | A Survey on Vision Mamba: Models, Applications and Challenges | Rui Xu et.al. | 2404.18861v1 | link |
2024-04-29 | ConPro: Learning Severity Representation for Medical Images using Contrastive Learning and Preference Optimization | Hong Nguyen et.al. | 2404.18831v1 | link |
2024-04-29 | Towards Extreme Image Compression with Latent Feature Guidance and Diffusion Prior | Zhiyuan Li et.al. | 2404.18820v1 | null |
2024-04-29 | Certification of Speaker Recognition Models to Additive Perturbations | Dmitrii Korzh et.al. | 2404.18791v1 | null |
2024-04-29 | Understanding Radicals via Orbital Parities | Reza G. Shirazi et.al. | 2404.18787v1 | null |
2024-04-26 | Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos | Zhengze Xu et.al. | 2404.17571v1 | null |
2024-04-26 | Multifold topological semimetals | Iñigo Robredo et.al. | 2404.17539v1 | null |
2024-04-26 | Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models | Yuhang Huang et.al. | 2404.17534v1 | null |
2024-04-26 | Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations | Puhao Li et.al. | 2404.17521v1 | link |
2024-04-26 | Learning text-to-video retrieval from image captioning | Lucas Ventura et.al. | 2404.17498v1 | null |
2024-04-26 | Tabular Data Contrastive Learning via Class-Conditioned and Feature-Correlation Based Augmentation | Wei Cui et.al. | 2404.17489v1 | link |
2024-04-26 | Low Cost Machine Vision for Insect Classification | Danja Brandt et.al. | 2404.17488v1 | null |
2024-04-26 | Conformal Prediction with Learned Features | Shayan Kiyani et.al. | 2404.17487v1 | null |
2024-04-26 | Sparse Reconstruction of Optical Doppler Tomography Based on State Space Model | Zhenghong Li et.al. | 2404.17484v1 | null |
2024-04-26 | One-Shot Image Restoration | Deborah Pereg et.al. | 2404.17426v1 | null |
2024-04-25 | Made to Order: Discovering monotonic temporal changes via self-supervised video ordering | Charig Yang et.al. | 2404.16828v1 | null |
2024-04-25 | ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images | Weiqi Li et.al. | 2404.16825v1 | null |
2024-04-25 | V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection | Xuanyu Zhang et.al. | 2404.16824v1 | null |
2024-04-25 | Learning Visuotactile Skills with Two Multifingered Hands | Toru Lin et.al. | 2404.16823v1 | link |
2024-04-25 | Meta-Transfer Derm-Diagnosis: Exploring Few-Shot Learning and Transfer Learning for Skin Disease Classification in Long-Tail Distribution | Zeynep Özdemir et.al. | 2404.16814v1 | null |
2024-04-25 | Transformer-Based Local Feature Matching for Multimodal Image Registration | Remi Delaunay et.al. | 2404.16802v1 | null |
2024-04-25 | DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks | Tongzhou Mu et.al. | 2404.16779v1 | null |
2024-04-25 | Modeling Selective Feature Attention for Representation-based Siamese Text Matching | Jianxiang Zang et.al. | 2404.16776v1 | link |
2024-04-25 | Classifying One-Dimensional Quantum States Prepared by a Single Round of Measurements | Rahul Sahay et.al. | 2404.16753v1 | null |
2024-04-25 | Characterizing Solar Center-to-Limb Radial-Velocity Variability with SDO | Michael L. Palumbo III et.al. | 2404.16747v1 | null |
2024-04-24 | Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models | Xu Shen et.al. | 2404.15625v1 | null |
2024-04-24 | Layer Ensemble Averaging for Improving Memristor-Based Artificial Neural Network Performance | Osama Yousuf et.al. | 2404.15621v1 | null |
2024-04-24 | A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution | Zhixiong Yang et.al. | 2404.15620v1 | link |
2024-04-24 | MDDD: Manifold-based Domain Adaptation with Dynamic Distribution for Non-Deep Transfer Learning in Cross-subject and Cross-session EEG-based Emotion Recognition | Ting Luo et.al. | 2404.15615v1 | null |
2024-04-24 | Federated Learning with Only Positive Labels by Exploring Label Correlations | Xuming An et.al. | 2404.15598v1 | null |
2024-04-24 | A Survey of Deep Long-Tail Classification Advancements | Charika de Alvis et.al. | 2404.15593v1 | null |
2024-04-24 | Domain Adaptation for Learned Image Compression with Supervised Adapters | Alberto Presta et.al. | 2404.15591v1 | null |
2024-04-24 | Brain Storm Optimization Based Swarm Learning for Diabetic Retinopathy Image Classification | Liang Qu et.al. | 2404.15585v1 | null |
2024-04-24 | Research on OPF control of three-phase four-wire low-voltage distribution network considering uncertainty | Rui Wang et.al. | 2404.15584v1 | null |
2024-04-24 | MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis | Jiaxin Zhuang et.al. | 2404.15580v1 | null |
2024-04-23 | ID-Animator: Zero-Shot Identity-Preserving Human Video Generation | Xuanhua He et.al. | 2404.15275v1 | link |
2024-04-23 | Metric-guided Image Reconstruction Bounds via Conformal Prediction | Matt Y Cheung et.al. | 2404.15274v1 | link |
2024-04-23 | Quantum optical classifier with superexponential speedup | Simone Roncallo et.al. | 2404.15266v1 | null |
2024-04-23 | TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting | Jiahe Li et.al. | 2404.15264v1 | null |
2024-04-23 | Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization | Lahav Lipson et.al. | 2404.15263v1 | link |
2024-04-23 | FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent | Cameron Smith et.al. | 2404.15259v1 | null |
2024-04-23 | Source-free Domain Adaptation for Video Object Detection Under Adverse Image Conditions | Xingguang Zhang et.al. | 2404.15252v1 | null |
2024-04-23 | Unifying the Temperature Dependent Dynamics of Glasses | Joseph B. Schlenoff et.al. | 2404.15250v1 | null |
2024-04-23 | Mining Invariance from Nonlinear Multi-Environment Data: Binary Classification | Austin Goddard et.al. | 2404.15245v1 | null |
2024-04-23 | Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models | Aidan Z. H. Yang et.al. | 2404.15236v1 | null |
2024-04-22 | AutoAD III: The Prequel -- Back to the Pixels | Tengda Han et.al. | 2404.14412v1 | null |
2024-04-22 | Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses | Inhee Lee et.al. | 2404.14410v1 | null |
2024-04-22 | Hyp-OC: Hyperbolic One Class Classification for Face Anti-Spoofing | Kartik Narayan et.al. | 2404.14406v1 | null |
2024-04-22 | A mean curvature flow arising in adversarial training | Leon Bungert et.al. | 2404.14402v1 | null |
2024-04-22 | TAVGBench: Benchmarking Text to Audible-Video Generation | Yuxin Mao et.al. | 2404.14381v1 | link |
2024-04-22 | Rethinking Legal Compliance Automation: Opportunities with Large Language Models | Shabnam Hassani et.al. | 2404.14356v1 | null |
2024-04-22 | On-the-Fly Point Annotation for Fast Medical Video Labeling | Meyer Adrien et.al. | 2404.14344v1 | null |
2024-04-22 | X-Ray: A Sequential 3D Representation for Generation | Tao Hu et.al. | 2404.14329v1 | null |
2024-04-22 | A Novel Approach to Chest X-ray Lung Segmentation Using U-net and Modified Convolutional Block Attention Module | Mohammad Ali Labbaf Khaniki et.al. | 2404.14322v1 | null |
2024-04-22 | "I Upload...All Types of Different Things to Say, the World of Blindness Is More Than What They Think It Is": A Study of Blind TikTokers' Identity Work from a Flourishing Perspective | Yao Lyu et.al. | 2404.14305v1 | null |
2024-04-19 | Data Alignment for Zero-Shot Concept Generation in Dermatology AI | Soham Gadgil et.al. | 2404.13043v1 | null |
2024-04-19 | PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation | Tianyuan Zhang et.al. | 2404.13026v1 | null |
2024-04-19 | BANF: Band-limited Neural Fields for Levels of Detail Reconstruction | Ahan Shabanov et.al. | 2404.13024v1 | null |
2024-04-19 | Stronger Random Baselines for In-Context Learning | Gregory Yauney et.al. | 2404.13020v1 | link |
2024-04-19 | A New Multi-Picture Architecture for Learned Video Deinterlacing and Demosaicing with Parallel Deformable Convolution and Self-Attention Blocks | Ronglei Ji et.al. | 2404.13018v1 | null |
2024-04-19 | Towards Robust Ferrous Scrap Material Classification with Deep Learning and Conformal Prediction | Paulo Henrique dos Santos et.al. | 2404.13002v1 | null |
2024-04-19 | RadRotator: 3D Rotation of Radiographs with Diffusion Models | Pouria Rouzrokh et.al. | 2404.13000v1 | null |
2024-04-19 | Nuclei Instance Segmentation of Cryosectioned H&E Stained Histological Images using Triple U-Net Architecture | Zarif Ahmed et.al. | 2404.12986v1 | null |
2024-04-19 | Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics | Xiaofei Wang et.al. | 2404.12973v1 | null |
2024-04-19 | Improving Pediatric Pneumonia Diagnosis with Adult Chest X-ray Images Utilizing Contrastive Learning and Embedding Similarity | Mohammad Zunaed et.al. | 2404.12958v1 | null |
2024-04-18 | On the Content Bias in Fréchet Video Distance | Songwei Ge et.al. | 2404.12391v1 | null |
2024-04-18 | Moving Object Segmentation: All You Need Is SAM (and Flow) | Junyu Xie et.al. | 2404.12389v1 | null |
2024-04-18 | VideoGigaGAN: Towards Detail-rich Video Super-Resolution | Yiran Xu et.al. | 2404.12388v1 | null |
2024-04-18 | Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models | Aitor Ormazabal et.al. | 2404.12387v1 | null |
2024-04-18 | G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis | Yufei Ye et.al. | 2404.12383v1 | null |
2024-04-18 | Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos | Isabella Liu et.al. | 2404.12379v1 | null |
2024-04-18 | RoboDreamer: Learning Compositional World Models for Robot Imagination | Siyuan Zhou et.al. | 2404.12377v1 | null |
2024-04-18 | When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes | Asaf Yehudai et.al. | 2404.12365v1 | null |
2024-04-18 | Inverse Neural Rendering for Explainable Multi-Object Tracking | Julian Ost et.al. | 2404.12359v1 | null |
2024-04-18 | Improving the interpretability of GNN predictions through conformal-based graph sparsification | Pablo Sanchez-Martin et.al. | 2404.12356v1 | link |
2024-04-18 | Dynamic Typography: Bringing Text to Life via Video Diffusion Prior | Zichen Liu et.al. | 2404.11614v2 | null |
2024-04-17 | VG4D: Vision-Language Model Goes 4D Video Recognition | Zhichao Deng et.al. | 2404.11605v1 | link |
2024-04-17 | Variational Bayesian Last Layers | James Harrison et.al. | 2404.11599v1 | link |
2024-04-17 | State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend | Fei Cui et.al. | 2404.11576v1 | null |
2024-04-17 | Simple Image Signal Processing using Global Context Guidance | Omar Elezabi et.al. | 2404.11569v1 | link |
2024-04-17 | Spatio-Temporal Motion Retargeting for Quadruped Robots | Taerim Yoon et.al. | 2404.11557v1 | null |
2024-04-17 | Predicting Long-horizon Futures by Conditioning on Geometry and Time | Tarasha Khurana et.al. | 2404.11554v1 | null |
2024-04-17 | Carbon- and Oxygen-rich stars in MaStar: identification and classification | Lewis Hill et.al. | 2404.11541v1 | null |
2024-04-17 | GenFighter: A Generative and Evolutive Textual Attack Removal | Md Athikul Islam et.al. | 2404.11538v1 | null |
2024-04-17 | SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening | Yu Zhong et.al. | 2404.11537v1 | null |
2024-04-16 | COMBO: Compositional World Models for Embodied Multi-Agent Cooperation | Hongxin Zhang et.al. | 2404.10775v1 | null |
2024-04-16 | RapidVol: Rapid Reconstruction of 3D Ultrasound Volumes from Sensorless 2D Scans | Mark C. Eid et.al. | 2404.10766v1 | null |
2024-04-16 | Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Yu-Yang Li et.al. | 2404.10757v1 | null |
2024-04-16 | Integer-valued o-minimal functions | Neer Bhardwaj et.al. | 2404.10737v1 | null |
2024-04-16 | Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning | Hao-Lun Hsu et.al. | 2404.10728v1 | null |
2024-04-16 | AV-GAN: Attention-Based Varifocal Generative Adversarial Network for Uneven Medical Image Translation | Zexin Li et.al. | 2404.10714v1 | null |
2024-04-17 | Dual Modalities of Text: Visual and Textual Generative Pre-training | Yekun Chai et.al. | 2404.10710v2 | null |
2024-04-16 | Question Difficulty Ranking for Multiple-Choice Reading Comprehension | Vatsal Raina et.al. | 2404.10704v1 | null |
2024-04-16 | Retrieval Augmented Verification : Unveiling Disinformation with Structured Representations for Zero-Shot Real-Time Evidence-guided Fact-Checking of Multi-modal Social media posts | Arka Ujjal Dey et.al. | 2404.10702v1 | null |
2024-04-16 | Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs | Georgy Perevozchikov et.al. | 2404.10700v1 | null |
2024-04-15 | Squish Jamming | Samuel Poincloux et.al. | 2404.09773v1 | null |
2024-04-15 | Hilti SLAM Challenge 2023: Benchmarking Single + Multi-session SLAM across Sensor Constellations in Construction | Ashish Devadas Nair et.al. | 2404.09765v1 | null |
2024-04-15 | Deep Learning-Based Segmentation of Tumors in PET/CT Volumes: Benchmark of Different Architectures and Training Strategies | Monika Górka et.al. | 2404.09761v1 | null |
2024-04-15 | Quantization of Large Language Models with an Overdetermined Basis | Daniil Merkulov et.al. | 2404.09737v1 | null |
2024-04-15 | FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features | Andre Rochow et.al. | 2404.09736v1 | null |
2024-04-15 | Classification of finite type fusion quivers | Ben Elias et.al. | 2404.09714v1 | null |
2024-04-15 | LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models | Guangyan Li et.al. | 2404.09695v1 | null |
2024-04-15 | Harnessing GPT-4V(ision) for Insurance: A Preliminary Exploration | Chenwei Lin et.al. | 2404.09690v1 | null |
2024-04-15 | Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition | Tobias Weber et.al. | 2404.09683v1 | link |
2024-04-15 | Cluster analysis of the Roma-BZCAT blazars | D. O. Kudryavtsev et.al. | 2404.09667v1 | null |
2024-04-15 | Deformable MRI Sequence Registration for AI-based Prostate Cancer Diagnosis | Alessa Hering et.al. | 2404.09666v1 | null |
2024-04-15 | Closing the Gap in the Trade-off between Fair Representations and Accuracy | Biswajit Rout et.al. | 2404.09664v1 | null |
2024-04-15 | If there's a Trigger Warning, then where's the Trigger? Investigating Trigger Warnings at the Passage Level | Matti Wiegmann et.al. | 2404.09615v1 | link |
2024-04-12 | FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models | Yanting Wang et.al. | 2404.08631v1 | null |
2024-04-12 | Classification of Boolean Algebras through von Neumann regular $\mathcal{C}^{\infty}-$Rings | Jean Cerqueira Berni et.al. | 2404.08629v1 | null |
2024-04-12 | Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation | Yanhao Zheng et.al. | 2404.08603v1 | link |
2024-04-12 | Pathological Primitive Segmentation Based on Visual Foundation Model with Zero-Shot Mask Generation | Abu Bakor Hayat Arnob et.al. | 2404.08584v1 | link |
2024-04-12 | Lossy Image Compression with Foundation Diffusion Models | Lucas Relic et.al. | 2404.08580v1 | null |
2024-04-12 | IDD-X: A Multi-View Dataset for Ego-relative Important Object Localization and Explanation in Dense and Unstructured Traffic | Chirag Parikh et.al. | 2404.08561v1 | null |
2024-04-12 | Scalability in Building Component Data Annotation: Enhancing Facade Material Classification with Synthetic Data | Josie Harrison et.al. | 2404.08557v1 | null |
2024-04-12 | Benchmarking the Cell Image Segmentation Models Robustness under the Microscope Optical Aberrations | Boyuan Peng et.al. | 2404.08549v1 | null |
2024-04-12 | VertAttack: Taking advantage of Text Classifiers' horizontal vision | Jonathan Rusert et.al. | 2404.08538v1 | null |
2024-04-12 | Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection | Zhiwei Yang et.al. | 2404.08531v1 | null |
2024-04-11 | Connecting NeRFs, Images, and Text | Francesco Ballerini et.al. | 2404.07993v1 | null |
2024-04-11 | GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh | Jing Wen et.al. | 2404.07991v1 | null |
2024-04-11 | WaveMo: Learning Wavefront Modulations to See Through Scattering | Mingyang Xie et.al. | 2404.07985v1 | null |
2024-04-11 | Gaga: Group Any Gaussians via 3D-aware Memory Bank | Weijie Lyu et.al. | 2404.07977v1 | null |
2024-04-11 | FusionMamba: Efficient Image Fusion with State Space Model | Siran Peng et.al. | 2404.07932v1 | null |
2024-04-11 | HGRN2: Gated Linear RNNs with State Expansion | Zhen Qin et.al. | 2404.07904v1 | link |
2024-04-11 | Q-ITAGS: Quality-Optimized Spatio-Temporal Heterogeneous Task Allocation with a Time Budget | Glen Neville et.al. | 2404.07902v1 | null |
2024-04-11 | Auditing health-related recommendations in social media: A Case Study of Abortion on YouTube | Mohammed Lahsaini et.al. | 2404.07896v1 | null |
2024-04-11 | Typical blocks of the category | Chih-Whi Chen et.al. | 2404.07894v1 | null |
2024-04-11 | Context-aware Video Anomaly Detection in Long-Term Datasets | Zhengye Yang et.al. | 2404.07887v1 | null |
2024-04-10 | RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion | Jaidev Shriram et.al. | 2404.07199v1 | null |
2024-04-10 | GCV-Turbo: End-to-end Acceleration of GNN-based Computer Vision Tasks on FPGA | Bingyi Zhang et.al. | 2404.07188v1 | null |
2024-04-10 | Adinkras and Pure Spinors | Richard Eager et.al. | 2404.07167v1 | null |
2024-04-10 | Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations | Ofir Shifman et.al. | 2404.07153v1 | null |
2024-04-10 | Learning of deep convolutional network image classifiers via stochastic gradient descent and over-parametrization | Michael Kohler et.al. | 2404.07128v1 | null |
2024-04-10 | Measuring proximity to standard planes during fetal brain ultrasound scanning | Chiara Di Vece et.al. | 2404.07124v1 | null |
2024-04-10 | "My toxic trait is thinking I'll remember this": gaps in the learner experience of video tutorials for feature-rich software | Ian Drosos et.al. | 2404.07114v1 | null |
2024-04-10 | The generic dual of p-adic groups and applications | Chris Jantzen et.al. | 2404.07111v1 | null |
2024-04-10 | Learning Priors for Non Rigid SfM from Casual Videos | Yoni Kasten et.al. | 2404.07097v1 | null |
2024-04-10 | VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning | Alexandros Xenos et.al. | 2404.07078v1 | link |
2024-04-09 | MoReVQA: Exploring Modular Reasoning Models for Video Question Answering | Juhong Min et.al. | 2404.06511v1 | null |
2024-04-10 | Reconstructing Hand-Held Objects in 3D | Jane Wu et.al. | 2404.06507v2 | null |
2024-04-09 | A Machine Learning Framework for the Prediction of Grain Boundary Segregation in Chemically Complex Environments | Doruk Aksoy et.al. | 2404.06499v1 | null |
2024-04-10 | Flying with Photons: Rendering Novel Views of Propagating Light | Anagh Malik et.al. | 2404.06493v2 | null |
2024-04-09 | Uncovering Tidal Treasures: Automated Classification of Faint Tidal Features in DECaLS Data | Alexander J. Gordon et.al. | 2404.06487v1 | null |
2024-04-09 | RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos | Bochao Zou et.al. | 2404.06483v1 | null |
2024-04-09 | Laue Indexing with Optimal Transport | Tomasz Kacprzak et.al. | 2404.06478v1 | link |
2024-04-09 | A comparative analysis of deep learning models for lung segmentation on X-ray images | Weronika Hryniewska-Guzik et.al. | 2404.06455v1 | link |
2024-04-09 | QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding | Yash Mehan et.al. | 2404.06442v1 | null |
2024-04-09 | ClassiPyGRB: Machine Learning-Based Classification and Visualization of Gamma Ray Bursts using t-SNE | Keneth Garcia-Cifuentes et.al. | 2404.06439v1 | null |
2024-04-08 | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Bo He et.al. | 2404.05726v1 | null |
2024-04-08 | Predicting Overtakes in Trucks Using CAN Data | Talha Hanif Butt et.al. | 2404.05723v1 | null |
2024-04-08 | Case Study: Neural Network Malware Detection Verification for Feature and Image Datasets | Preston K. Robinette et.al. | 2404.05703v1 | null |
2024-04-08 | Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding | Ahmad Idrissi-Yaghir et.al. | 2404.05694v1 | null |
2024-04-08 | Evaluating the Efficacy of Cut-and-Paste Data Augmentation in Semantic Segmentation for Satellite Imagery | Ionut M. Motoi et.al. | 2404.05693v1 | null |
2024-04-08 | AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation | Jiannan Ge et.al. | 2404.05667v1 | null |
2024-04-08 | Oblique photons, plasmons, and current-plasmons in relativistic plasmas and their topological implications | Hong Qin et.al. | 2404.05636v1 | null |
2024-04-08 | AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets | Pietro Lesci et.al. | 2404.05623v1 | null |
2024-04-08 | Experimental observation of a time rondeau crystal: Temporal Disorder in Spatiotemporal Order | Leo Joon Il Moon et.al. | 2404.05620v1 | null |
2024-04-08 | Self-Explainable Affordance Learning with Embodied Caption | Zhipeng Zhang et.al. | 2404.05603v1 | null |
2024-04-05 | On classification of global dynamics for energy-critical equivariant harmonic map heat flows and radial nonlinear heat equation | Kihyun Kim et.al. | 2404.04247v1 | null |
2024-04-05 | Evaluating Adversarial Robustness: A Comparison Of FGSM, Carlini-Wagner Attacks, And The Role of Distillation as Defense Mechanism | Trilokesh Ranjan Sarkar et.al. | 2404.04245v1 | null |
2024-04-05 | player2vec: A Language Modeling Approach to Understand Player Behavior in Games | Tianze Wang et.al. | 2404.04234v1 | null |
2024-04-05 | Deep-learning Segmentation of Small Volumes in CT images for Radiotherapy Treatment Planning | Jianxin Zhou et.al. | 2404.04202v1 | null |
2024-04-05 | SCAResNet: A ResNet Variant Optimized for Tiny Object Detection in Transmission and Distribution Towers | Weile Li et.al. | 2404.04179v1 | link |
2024-04-05 | Noisy Label Processing for Classification: A Survey | Mengting Li et.al. | 2404.04159v1 | null |
2024-04-05 | Improving Detection in Aerial Images by Capturing Inter-Object Relationships | Botao Ren et.al. | 2404.04140v1 | null |
2024-04-05 | Label Propagation for Zero-shot Classification with Vision-Language Models | Vladan Stojnić et.al. | 2404.04072v1 | link |
2024-04-05 | VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots | Akhil Padmanabha et.al. | 2404.04066v1 | null |
2024-04-05 | Phase Binarization in Mutually Synchronized Bias Field-free Spin Hall Nano-oscillators for Reservoir Computing | Sourabh Manna et.al. | 2404.04023v1 | null |
2024-04-04 | OW-VISCap: Open-World Video Instance Segmentation and Captioning | Anwesa Choudhuri et.al. | 2404.03657v1 | null |
2024-04-04 | Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | Shuting He et.al. | 2404.03645v1 | link |
2024-04-04 | On the Efficiency of Convolutional Neural Networks | Andrew Lavin et.al. | 2404.03617v1 | null |
2024-04-04 | Creator Hearts: Investigating the Impact Positive Signals from YouTube Creators in Shaping Comment Section Behavior | Frederick Choi et.al. | 2404.03612v1 | null |
2024-04-04 | InsectMamba: Insect Pest Classification with State Space Model | Qianning Wang et.al. | 2404.03611v1 | null |
2024-04-04 | DiffDet4SAR: Diffusion-based Aircraft Target Detection Network for SAR Images | Zhou Jie et.al. | 2404.03595v1 | link |
2024-04-04 | Alzheimer's disease detection in PSG signals | Lorena Gallego-Viñarás et.al. | 2404.03549v1 | null |
2024-04-04 | Towards Transcranial 3D Ultrasound Localization Microscopy of the Nonhuman Primate Brain | Paul Xing et.al. | 2404.03547v1 | null |
2024-04-04 | Segmentation-Guided Knee Radiograph Generation using Conditional Diffusion Models | Siyuan Mei et.al. | 2404.03541v1 | null |
2024-04-05 | A Methodology to Study the Impact of Spiking Neural Network Parameters considering Event-Based Automotive Data | Iqra Bano et.al. | 2404.03493v2 | null |
2024-04-03 | LidarDM: Generative LiDAR Simulation in a Generated World | Vlas Zyrianov et.al. | 2404.02903v1 | null |
2024-04-03 | Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds | Kamalika Chaudhuri et.al. | 2404.02866v1 | link |
2024-04-03 | Semisimple Algebras of Vector Fields on | Sajid Ali et.al. | 2404.02847v1 | null |
2024-04-03 | GPU-Accelerated RSF Level Set Evolution for Large-Scale Microvascular Segmentation | Meher Niger et.al. | 2404.02813v1 | null |
2024-04-03 | Generative-Contrastive Heterogeneous Graph Neural Network | Yu Wang et.al. | 2404.02810v1 | null |
2024-04-03 | FPT: Feature Prompt Tuning for Few-shot Readability Assessment | Ziyang Wang et.al. | 2404.02772v1 | link |
2024-04-03 | DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement | Hao Wu et.al. | 2404.02755v1 | null |
2024-04-03 | Terraced Compression Method with Automated Threshold Selection for Multidimensional Image Clustering of Heterogeneous Bodies | Jiatong Li et.al. | 2404.02744v1 | null |
2024-04-03 | Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss | Yunfan Lu et.al. | 2404.02731v1 | link |
2024-04-03 | Unblind Text Inputs: Predicting Hint-text of Text Input in Mobile Apps via LLM | Zhe Liu et.al. | 2404.02706v1 | null |
2024-04-02 | Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models | Zeyu Yang et.al. | 2404.02148v1 | link |
2024-04-02 | Multiparametric quantification and visualization of liver fat using ultrasound | Jihye Baek et.al. | 2404.02143v1 | null |
2024-04-03 | ResNet with Integrated Convolutional Block Attention Module for Ship Classification Using Transfer Learning on Optical Satellite Imagery | Ryan Donghan Kwon et.al. | 2404.02135v2 | null |
2024-04-02 | ViTamin: Designing Scalable Vision Models in the Vision-Language Era | Jieneng Chen et.al. | 2404.02132v1 | link |
2024-04-02 | ImageNot: A contrast with ImageNet preserves model rankings | Olawale Salaudeen et.al. | 2404.02112v1 | null |
2024-04-02 | CameraCtrl: Enabling Camera Control for Text-to-Video Generation | Hao He et.al. | 2404.02101v1 | link |
2024-04-02 | Explainability in JupyterLab and Beyond: Interactive XAI Systems for Integrated and Collaborative Workflows | Grace Guo et.al. | 2404.02081v1 | null |
2024-04-02 | Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation | Hui Xiao et.al. | 2404.02065v1 | null |
2024-04-02 | Long-context LLMs Struggle with Long In-context Learning | Tianle Li et.al. | 2404.02060v1 | link |
2024-04-02 | Deconstructing In-Context Learning: Understanding Prompts via Corruption | Namrata Shivagunde et.al. | 2404.02054v1 | link |
2024-03-29 | Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations | Jaisidh Singh et.al. | 2403.20312v1 | link |
2024-03-29 | Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation | Fangxu Yu et.al. | 2403.20289v1 | link |
2024-03-29 | Prototype-based Interpretable Breast Cancer Prediction Models: Analysis and Challenges | Shreyasi Pathak et.al. | 2403.20260v1 | null |
2024-03-29 | Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions | Runhao Zeng et.al. | 2403.20254v1 | null |
2024-03-29 | Latent Embedding Clustering for Occlusion Robust Head Pose Estimation | José Celestino et.al. | 2403.20251v1 | null |
2024-03-29 | Long-Tailed Anomaly Detection with Learnable Class Names | Chih-Hui Ho et.al. | 2403.20236v1 | null |
2024-04-02 | Artificial Neural Networks-based Real-time Classification of ENG Signals for Implanted Nerve Interfaces | Antonio Coviello et.al. | 2403.20234v2 | null |
2024-03-29 | MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark | Sanghyun Woo et.al. | 2403.20225v1 | null |
2024-03-29 | Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science | Yazheng Yang et.al. | 2403.20208v1 | null |
2024-03-29 | The Future of Combating Rumors? Retrieval, Discrimination, and Generation | Junhao Xu et.al. | 2403.20204v1 | null |
2024-03-28 | RSMamba: Remote Sensing Image Classification with State Space Model | Keyan Chen et.al. | 2403.19654v1 | link |
2024-03-28 | Square patterns in dynamical orbits | Vefa Goksel et.al. | 2403.19642v1 | null |
2024-03-28 | Siamese Vision Transformers are Scalable Audio-visual Learners | Yan-Bo Lin et.al. | 2403.19638v1 | null |
2024-03-28 | Four-dimensional gradient Ricci solitons with (half) nonnegative isotropic curvature | Huai-Dong Cao et.al. | 2403.19627v1 | null |
2024-03-28 | Top-$k$ Classification and Cardinality-Aware Prediction | Anqi Mao et.al. | 2403.19625v1 | null |
2024-03-28 | RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents | Zeren Chen et.al. | 2403.19622v1 | null |
2024-03-28 | SAID-NeRF: Segmentation-AIDed NeRF for Depth Completion of Transparent Objects | Avinash Ummadisingu et.al. | 2403.19607v1 | null |
2024-03-28 | Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model | Zhicai Wang et.al. | 2403.19600v1 | link |
2024-03-28 | Frame by Familiar Frame: Understanding Replication in Video Diffusion Models | Aimon Rahman et.al. | 2403.19593v1 | null |
2024-03-28 | Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation | Zhongliang Zhou et.al. | 2403.19584v1 | null |
2024-03-27 | MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering | Guoxing Sun et.al. | 2403.18820v1 | null |
2024-03-27 | Breaking the Limitations with Sparse Inputs by Variational Frameworks (BLIss) in Terahertz Super-Resolution 3D Reconstruction | Yiyao Zhang et.al. | 2403.18776v1 | null |
2024-03-27 | CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning | Elliot Chane-Sane et.al. | 2403.18765v1 | null |
2024-03-27 | A vascular synthetic model for improved aneurysm segmentation and detection via Deep Neural Networks | Rafic Nader et.al. | 2403.18734v1 | null |
2024-03-27 | Contrastive Learning with Orthonormal Anchors (CLOA) | Huanran Li et.al. | 2403.18699v1 | null |
2024-03-27 | Annolid: Annotate, Segment, and Track Anything You Need | Chen Yang et.al. | 2403.18690v1 | null |
2024-03-27 | InceptionTime vs. Wavelet -- A comparison for time series classification | Daniel Klenkert et.al. | 2403.18687v1 | null |
2024-03-27 | TransFusion: Contrastive Learning with Transformers | Huanran Li et.al. | 2403.18681v1 | null |
2024-03-28 | FluxGAT: Integrating Flux Sampling with Graph Neural Networks for Unbiased Gene Essentiality Classification | Kieren Sharma et.al. | 2403.18666v2 | null |
2024-03-27 | Indecomposable set-theoretical solutions to the Yang-Baxter equation of size | Carsten Dietzel et.al. | 2403.18653v1 | null |
2024-03-26 | Efficient Video Object Segmentation via Modulated Cross-Attention Memory | Abdelrahman Shaker et.al. | 2403.17937v1 | link |
2024-03-26 | ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis | Muhammad Hamza Mughal et.al. | 2403.17936v1 | null |
2024-03-26 | OmniVid: A Generative Framework for Universal Video Understanding | Junke Wang et.al. | 2403.17935v1 | link |
2024-03-26 | Track Everything Everywhere Fast and Robustly | Yunzhou Song et.al. | 2403.17931v1 | null |
2024-03-26 | FastCAR: Fast Classification And Regression Multi-Task Learning via Task Consolidation for Modelling a Continuous Property Variable of Object Classes | Anoop Kini et.al. | 2403.17926v1 | null |
2024-03-26 | The Need for Speed: Pruning Transformers with One Recipe | Samir Khaki et.al. | 2403.17921v1 | link |
2024-03-26 | TC4D: Trajectory-Conditioned Text-to-4D Generation | Sherwin Bahmani et.al. | 2403.17920v1 | null |
2024-03-26 | AgentStudio: A Toolkit for Building General Virtual Agents | Longtao Zheng et.al. | 2403.17918v1 | null |
2024-03-26 | Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos | Akshay Paruchuri et.al. | 2403.17915v1 | null |
2024-03-26 | Hierarchical Multi-label Classification for Fine-level Event Extraction from Aviation Accident Reports | Xinyu Zhao et.al. | 2403.17914v1 | null |
2024-03-25 | DBPF: A Framework for Efficient and Robust Dynamic Bin-Picking | Yichuan Li et.al. | 2403.16786v1 | null |
2024-03-25 | C-arm inverse geometry CT for 3D cardiac chamber mapping | Jordan M. Slagowski et.al. | 2403.16779v1 | null |
2024-03-25 | Diff-Def: Diffusion-Generated Deformation Fields for Conditional Atlases | Sophie Starck et.al. | 2403.16776v1 | null |
2024-03-25 | As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli | Di Cooke et.al. | 2403.16760v1 | null |
2024-03-25 | Creating a Digital Twin of Spinal Surgery: A Proof of Concept | Jonas Hein et.al. | 2403.16736v1 | null |
2024-03-25 | A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models | Nils Ingelhag et.al. | 2403.16730v1 | null |
2024-03-25 | One-Shot Domain Incremental Learning | Yasushi Esaki et.al. | 2403.16707v1 | null |
2024-03-25 | Assessing the Performance of Deep Learning for Automated Gleason Grading in Prostate Cancer | Dominik Müller et.al. | 2403.16695v1 | null |
2024-03-25 | DeepGleason: a System for Automated Gleason Grading of Prostate Cancer using Deep Neural Networks | Dominik Müller et.al. | 2403.16678v1 | link |
2024-03-25 | FOOL: Addressing the Downlink Bottleneck in Satellite Computing with Neural Feature Compression | Alireza Furutanpey et.al. | 2403.16677v1 | null |
2024-03-25 | A Novel Loss Function-based Support Vector Machine for Binary Classification | Yan Li et.al. | 2403.16654v1 | null |
2024-03-25 | Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution | Qingping Zheng et.al. | 2403.16643v1 | null |
2024-03-25 | Multi-Scale Texture Loss for CT denoising with GANs | Francesco Di Feola et.al. | 2403.16640v1 | link |
2024-03-25 | AI-Generated Video Detection via Spatio-Temporal Anomaly Learning | Jianfa Bai et.al. | 2403.16638v1 | null |
2024-03-25 | Distributed collaborative anomalous sound detection by embedding sharing | Kota Dohi et.al. | 2403.16610v1 | null |
2024-03-25 | EDUE: Expert Disagreement-Guided One-Pass Uncertainty Estimation for Medical Image Segmentation | Kudaibergen Abutalip et.al. | 2403.16594v1 | null |
2024-03-22 | LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models | Yuzhang Shang et.al. | 2403.15388v1 | null |
2024-03-22 | Time-efficient, high-resolution 3T whole-brain relaxometry using Cartesian 3D MR-STAT with CSF suppression | Hongyan Liu et.al. | 2403.15379v1 | null |
2024-03-22 | Long-CLIP: Unlocking the Long-Text Capability of CLIP | Beichen Zhang et.al. | 2403.15378v1 | null |
2024-03-22 | InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | Yi Wang et.al. | 2403.15377v1 | null |
2024-03-22 | Cascading Blackout Severity Prediction with Statistically-Augmented Graph Neural Networks | Joe Gorka et.al. | 2403.15363v1 | null |
2024-03-22 | SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series | Badri N. Patro et.al. | 2403.15360v1 | null |
2024-03-22 | Ultrasound Imaging based on the Variance of a Diffusion Restoration Model | Yuxin Zhang et.al. | 2403.15316v1 | null |
2024-03-22 | Global Control for Local SO(3)-Equivariant Scale-Invariant Vessel Segmentation | Patryk Rygiel et.al. | 2403.15314v1 | null |
2024-03-22 | Quantum-inspired classification via efficient simulation of Helstrom measurement | Wooseop Hwang et.al. | 2403.15308v1 | null |
2024-03-22 | Reconnaissance ultracool spectra in the Euclid Deep Fields | Jerry Jun-Yan Zhang et.al. | 2403.15288v1 | null |
2024-03-21 | Language Repository for Long Video Understanding | Kumara Kahatapitiya et.al. | 2403.14622v1 | link |
2024-03-22 | Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion | Xiang Fan et.al. | 2403.14617v2 | null |
2024-03-21 | Explorative Inbetweening of Time and Space | Haiwen Feng et.al. | 2403.14611v1 | null |
2024-03-21 | ReNoise: Real Image Inversion Through Iterative Noising | Daniel Garibi et.al. | 2403.14602v1 | null |
2024-03-21 | PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model | Zheng Zhang et.al. | 2403.14598v1 | link |
2024-03-21 | Large Language Models for Multi-Choice Question Classification of Medical Subjects | Víctor Ponce-López et.al. | 2403.14582v1 | null |
2024-03-21 | DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video | Narek Tumanyan et.al. | 2403.14548v1 | null |
2024-03-21 | Estimating Physical Information Consistency of Channel Data Augmentation for Remote Sensing Images | Tom Burgert et.al. | 2403.14547v1 | null |
2024-03-21 | Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets | Ahmet Alp Kindiroglu et.al. | 2403.14534v1 | link |
2024-03-21 | Invisible Needle Detection in Ultrasound: Leveraging Mechanism-Induced Vibration | Chenyang Li et.al. | 2403.14523v1 | null |
2024-03-21 | Denoising Diffusion Models for 3D Healthy Brain Tissue Inpainting | Alicia Durrer et.al. | 2403.14499v1 | link |
2024-03-20 | TimeRewind: Rewinding Time with Image-and-Events Video Diffusion | Jingxi Chen et.al. | 2403.13800v1 | null |
2024-03-20 | Hierarchical NeuroSymbolic Approach for Action Quality Assessment | Lauren Okamoto et.al. | 2403.13798v1 | null |
2024-03-20 | Bridge the Modality and Capacity Gaps in Vision-Language Model Selection | Chao Yi et.al. | 2403.13797v1 | null |
2024-03-20 | The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency and Usability in AI | Matt White et.al. | 2403.13784v1 | null |
2024-03-20 | Gradings on associative triple systems of the second kind | Alberto Daza-Garcia et.al. | 2403.13775v1 | null |
2024-03-20 | Towards Principled Representation Learning from Videos for Reinforcement Learning | Dipendra Misra et.al. | 2403.13765v1 | null |
2024-03-20 | Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model | Diwei Wang et.al. | 2403.13756v1 | null |
2024-03-20 | Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation | Fu-Yun Wang et.al. | 2403.13745v1 | null |
2024-03-20 | Probabilistic Forecasting with Stochastic Interpolants and Föllmer Processes | Yifan Chen et.al. | 2403.13724v1 | null |
2024-03-20 | Improving the Adaptive Moment Estimation (ADAM) stochastic optimizer through an Implicit-Explicit (IMEX) time-stepping approach | Abhinab Bhattacharjee et.al. | 2403.13704v1 | null |
2024-03-19 | LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression | Zhuoshi Pan et.al. | 2403.12968v1 | null |
2024-03-19 | FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation | Shuai Yang et.al. | 2403.12962v1 | link |
2024-03-19 | WHAC: World-grounded Humans and Cameras | Wanqi Yin et.al. | 2403.12959v1 | null |
2024-03-19 | FutureDepth: Learning to Predict the Future Improves Video Depth Estimation | Rajeev Yasarla et.al. | 2403.12953v1 | null |
2024-03-19 | Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models | Elaine Sui et.al. | 2403.12952v1 | link |
2024-03-19 | Legendrian loops and cluster modular groups | James Hughes et.al. | 2403.12951v1 | null |
2024-03-19 | Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers | Vidhi Jain et.al. | 2403.12943v1 | null |
2024-03-19 | Contextual AD Narration with Interleaved Multimodal Sequence | Hanlin Wang et.al. | 2403.12922v1 | null |
2024-03-19 | Semantic Layering in Room Segmentation via LLMs | Taehyeon Kim et.al. | 2403.12920v1 | null |
2024-03-19 | Yell At Your Robot: Improving On-the-Fly from Language Corrections | Lucy Xiaoyang Shi et.al. | 2403.12910v1 | null |
2024-03-18 | Time Series Compression using Quaternion Valued Neural Networks and Quaternion Backpropagation | Johannes Pöppelbaum et.al. | 2403.11722v1 | null |
2024-03-18 | Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing | Juan Zhang et.al. | 2403.11700v1 | null |
2024-03-18 | A Spatial-Temporal Progressive Fusion Network for Breast Lesion Segmentation in Ultrasound Videos | Zhengzheng Tu et.al. | 2403.11699v1 | null |
2024-03-18 | Object Segmentation-Assisted Inter Prediction for Versatile Video Coding | Zhuoyuan Li et.al. | 2403.11694v1 | null |
2024-03-19 | MoreStyle: Relax Low-frequency Constraint of Fourier-based Image Reconstruction in Generalizable Medical Image Segmentation | Haoyu Zhao et.al. | 2403.11689v2 | null |
2024-03-18 | Better (pseudo-)labels for semi-supervised instance segmentation | François Porcher et.al. | 2403.11675v1 | null |
2024-03-19 | WIA-LD2ND: Wavelet-based Image Alignment for Self-supervised Low-Dose CT Denoising | Haoyu Zhao et.al. | 2403.11672v2 | null |
2024-03-18 | Binary Noise for Binary Tasks: Masked Bernoulli Diffusion for Unsupervised Anomaly Detection | Julia Wolleb et.al. | 2403.11667v1 | null |
2024-03-18 | Combining Local and Global Perception for Autonomous Navigation on Nano-UAVs | Lorenzo Lamberti et.al. | 2403.11661v1 | null |
2024-03-18 | LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model | Yuxin Cao et.al. | 2403.11656v1 | null |
2024-03-15 | Strong and Controllable Blind Image Decomposition | Zeyu Zhang et.al. | 2403.10520v1 | link |
2024-03-15 | Frozen Feature Augmentation for Few-Shot Image Classification | Andreas Bär et.al. | 2403.10519v1 | null |
2024-03-15 | VideoAgent: Long-form Video Understanding with Large Language Model as Agent | Xiaohan Wang et.al. | 2403.10517v1 | null |
2024-03-15 | Surveyor: Facilitating Discovery Within Video Games for Blind and Low Vision Players | Vishnu Nair et.al. | 2403.10512v1 | null |
2024-03-15 | Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study | Chenguang Wang et.al. | 2403.10499v1 | link |
2024-03-15 | Joint Multimodal Transformer for Dimensional Emotional Recognition in the Wild | Paul Waligora et.al. | 2403.10488v1 | null |
2024-03-15 | Tensor Star Decomposition | Wuyang Zhou et.al. | 2403.10481v1 | null |
2024-03-15 | Using an LLM to Turn Sign Spottings into Spoken Language Sentences | Ozge Mercanoglu Sincan et.al. | 2403.10434v1 | null |
2024-03-15 | Neural Networks Hear You Loud And Clear: Hearing Loss Compensation Using Deep Neural Networks | Peter Leer et.al. | 2403.10420v1 | null |
2024-03-15 | A comparative study on machine learning approaches for rock mass classification using drilling data | Tom F. Hansen et.al. | 2403.10404v1 | null |
2024-03-14 | Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models | Akhil Kedia et.al. | 2403.09635v1 | link |
2024-03-14 | Generalized Predictive Model for Autonomous Driving | Jiazhi Yang et.al. | 2403.09630v1 | link |
2024-03-14 | From the Conformal Anomaly to the Virasoro Algebra | Sid Maibach et.al. | 2403.09628v1 | null |
2024-03-14 | Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | Guo Chen et.al. | 2403.09626v1 | link |
2024-03-14 | Score-Guided Diffusion for 3D Human Recovery | Anastasis Stathopoulos et.al. | 2403.09623v1 | link |
2024-03-14 | PosSAM: Panoptic Open-vocabulary Segment Anything | Vibashan VS et.al. | 2403.09620v1 | null |
2024-03-14 | Explore In-Context Segmentation via Latent Diffusion Models | Chaoyang Wang et.al. | 2403.09616v1 | null |
2024-03-14 | Compute-first optical detection for noise-resilient visual perception | Jungmin Kim et.al. | 2403.09612v1 | null |
2024-03-14 | Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds | Ilyass Moummad et.al. | 2403.09598v1 | link |
2024-03-14 | DungeonMaker: Embedding Tangible Creation and Destruction in Hybrid Board Games through Personal Fabrication Technology | Evgeny Stemasov et.al. | 2403.09592v1 | null |
2024-03-13 | VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis | Enric Corona et.al. | 2403.08764v1 | null |
2024-03-13 | Segmentation of Knee Bones for Osteoarthritis Assessment: A Comparative Analysis of Supervised, Few-Shot, and Zero-Shot Learning Approaches | Yun Xin Teoh et.al. | 2403.08761v1 | null |
2024-03-13 | MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning | Jialv Zou et.al. | 2403.08760v1 | link |
2024-03-13 | Spatiotemporal Diffusion Model with Paired Sampling for Accelerated Cardiac Cine MRI | Shihan Qiu et.al. | 2403.08758v1 | null |
2024-03-13 | DAM: Dynamic Adapter Merging for Continual Video QA Learning | Feng Cheng et.al. | 2403.08755v1 | link |
2024-03-13 | Clinically Feasible Diffusion Reconstruction for Highly-Accelerated Cardiac Cine MRI | Shihan Qiu et.al. | 2403.08749v1 | null |
2024-03-13 | Torsion pairs, t-structures, and co-t-structures for completions of discrete cluster categories | Sofia Franchini et.al. | 2403.08735v1 | null |
2024-03-13 | Euclid: Testing photometric selection of emission-line galaxy targets | M. S. Cagliari et.al. | 2403.08726v1 | null |
2024-03-13 | Diffusion-based Iterative Counterfactual Explanations for Fetal Ultrasound Image Quality Assessment | Paraskevas Pegios et.al. | 2403.08700v1 | null |
2024-03-13 | Implicit Regularization of Gradient Flow on One-Layer Softmax Attention | Heejune Sheen et.al. | 2403.08699v1 | null |
2024-03-12 | OPEN TEACH: A Versatile Teleoperation System for Robotic Manipulation | Aadhithya Iyer et.al. | 2403.07870v1 | null |
2024-03-12 | TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation | Shivin Dass et.al. | 2403.07869v1 | null |
2024-03-12 | Iterative Graph Neural Network Enhancement via Frequent Subgraph Mining of Explanations | Harish G. Naik et.al. | 2403.07849v1 | null |
2024-03-12 | When Eye-Tracking Meets Machine Learning: A Systematic Review on Applications in Medical Image Analysis | Sahar Moradizeyveh et.al. | 2403.07834v1 | null |
2024-03-12 | DeliGrasp: Inferring Object Mass, Friction, and Compliance with LLMs for Adaptive and Minimally Deforming Grasp Policies | William Xie et.al. | 2403.07832v1 | null |
2024-03-12 | A geometric model for the module category of a string algebra | Karin Baur et.al. | 2403.07810v1 | null |
2024-03-12 | BraSyn 2023 challenge: Missing MRI synthesis and the effect of different learning objectives | Ivo M. Baltruschat et.al. | 2403.07800v1 | null |
2024-03-12 | A robust SVM-based approach with feature selection and outliers detection for classification problems | Marta Baldomero-Naranjo et.al. | 2403.07753v1 | null |
2024-03-12 | Vision-based Vehicle Re-identification in Bridge Scenario using Flock Similarity | Chunfeng Zhang et.al. | 2403.07752v1 | null |
2024-03-12 | Harnessing two-photon dissipation for enhanced quantum measurement and control | Antoine Marquet et.al. | 2403.07744v1 | null |
2024-03-11 | Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling | Wele Gedara Chaminda Bandara et.al. | 2403.06978v1 | link |
2024-03-12 | VideoMamba: State Space Model for Efficient Video Understanding | Kunchang Li et.al. | 2403.06977v2 | link |
2024-03-11 | Memory-based Adapters for Online 3D Scene Perception | Xiuwei Xu et.al. | 2403.06974v1 | null |
2024-03-11 | Explainable Transformer Prototypes for Medical Diagnoses | Ugur Demir et.al. | 2403.06961v1 | link |
2024-03-11 | Quadruped-Frog: Rapid Online Optimization of Continuous Quadruped Jumping | Guillaume Bellegarda et.al. | 2403.06954v1 | null |
2024-03-11 | Optimizing Latent Graph Representations of Surgical Scenes for Zero-Shot Domain Transfer | Siddhant Satyanaik et.al. | 2403.06953v1 | null |
2024-03-11 | Advancing Generalizable Remote Physiological Measurement through the Integration of Explicit and Implicit Prior Knowledge | Yuting Zhang et.al. | 2403.06947v1 | link |
2024-03-11 | Conditional Score-Based Diffusion Model for Cortical Thickness Trajectory Prediction | Qing Xiao et.al. | 2403.06940v1 | null |
2024-03-11 | FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks | Muhammad Saif Ullah Khan et.al. | 2403.06904v1 | null |
2024-03-11 | Benign overfitting in leaky ReLU networks with moderate input dimension | Kedar Karhadkar et.al. | 2403.06903v1 | null |
2024-03-08 | Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos | Tarun Kalluri et.al. | 2403.05535v1 | null |
2024-03-08 | Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets | Lorenzo Brigato et.al. | 2403.05532v1 | null |
2024-03-08 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | Machel Reid et.al. | 2403.05530v1 | null |
2024-03-08 | Take Your Best Shot: Sampling-Based Next-Best-View Planning for Autonomous Photography & Inspection | Shijie Gao et.al. | 2403.05477v1 | null |
2024-03-08 | Will GPT-4 Run DOOM? | Adrian de Wynter et.al. | 2403.05468v1 | null |
2024-03-08 | Evaluating AI and Human Authorship Quality in Academic Writing through Physics Essays | Will Yeadon et.al. | 2403.05458v1 | null |
2024-03-08 | VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models | Yabo Zhang et.al. | 2403.05438v1 | link |
2024-03-08 | OmniCount: Multi-label Object Counting with Semantic-Geometric Priors | Anindya Mondal et.al. | 2403.05435v1 | null |
2024-03-08 | Infinite Translation Surfaces in the Wild | Vincent Delecroix et.al. | 2403.05424v1 | null |
2024-03-08 | Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery | Mubashir Noman et.al. | 2403.05419v1 | link |
2024-03-07 | DeepSee: Multidimensional Visualizations of Seabed Ecosystems | Adam Coscia et.al. | 2403.04761v1 | link |
2024-03-07 | iScore: Visual Analytics for Interpreting How Language Models Automatically Score Summaries | Adam Coscia et.al. | 2403.04760v1 | link |
2024-03-07 | KnowledgeVIS: Interpreting Language Models by Comparing Fill-in-the-Blank Prompts | Adam Coscia et.al. | 2403.04758v1 | link |
2024-03-07 | Preliminary Guidelines For Combining Data Integration and Visual Data Analysis | Adam Coscia et.al. | 2403.04757v1 | link |
2024-03-07 | Photonic probabilistic machine learning using quantum vacuum noise | Seou Choi et.al. | 2403.04731v1 | null |
2024-03-07 | Analysis of Systems' Performance in Natural Language Processing Competitions | Sergio Nava-Muñoz et.al. | 2403.04693v1 | null |
2024-03-07 | CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | Qilang Ye et.al. | 2403.04640v1 | link |
2024-03-07 | Scalable, Simulation-Guided Compliant Tactile Finger Design | Yuxiang Ma et.al. | 2403.04638v1 | null |
2024-03-08 | Pix2Gif: Motion-Guided Diffusion for GIF Generation | Hitesh Kandala et.al. | 2403.04634v2 | null |
2024-03-07 | MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder | Lei Li et.al. | 2403.04626v1 | null |
2024-03-06 | 3D Diffusion Policy | Yanjie Ze et.al. | 2403.03954v1 | link |
2024-03-06 | Stop Regressing: Training Value Functions via Classification for Scalable Deep RL | Jesse Farebrother et.al. | 2403.03950v1 | null |
2024-03-06 | Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation | Marcel Torne et.al. | 2403.03949v1 | null |
2024-03-06 | DART: Implicit Doppler Tomography for Radar Novel View Synthesis | Tianshu Huang et.al. | 2403.03896v1 | null |
2024-03-06 | Joint multi-task learning improves weakly-supervised biomarker prediction in computational pathology | Omar S. M. El Nahhas et.al. | 2403.03891v1 | link |
2024-03-06 | Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation | Xiao Ma et.al. | 2403.03890v1 | null |
2024-03-06 | Decoupled Vertical Federated Learning for Practical Training on Vertically Partitioned Data | Avi Amalanshu et.al. | 2403.03871v1 | null |
2024-03-06 | X-Shot: A Unified System to Handle Frequent, Few-shot and Zero-shot Learning Simultaneously in Classification | Hanzi Xu et.al. | 2403.03863v1 | link |
2024-03-06 | ProxNF: Neural Field Proximal Training for High-Resolution 4D Dynamic Image Reconstruction | Luke Lozenski et.al. | 2403.03860v1 | null |
2024-03-06 | MedMamba: Vision Mamba for Medical Image Classification | Yubiao Yue et.al. | 2403.03849v1 | link |
2024-03-05 | Extension Theory and Fermionic Strongly Fusion 2-Categories | Thibault D. Décoppet et.al. | 2403.03211v1 | null |
2024-03-05 | Scaling Rectified Flow Transformers for High-Resolution Image Synthesis | Patrick Esser et.al. | 2403.03206v1 | null |
2024-03-05 | Behavior Generation with Latent Actions | Seungjae Lee et.al. | 2403.03181v1 | link |
2024-03-05 | Deep-Learned Compression for Radio-Frequency Signal Classification | Armani Rodriguez et.al. | 2403.03150v1 | null |
2024-03-05 | Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization | Yuxin Guo et.al. | 2403.03145v1 | link |
2024-03-05 | Motion-Corrected Moving Average: Including Post-Hoc Temporal Information for Improved Video Segmentation | Robert Mendel et.al. | 2403.03120v1 | null |
2024-03-05 | Equilibria in Two-Stage Facility Location with Atomic Clients | Simon Krogmann et.al. | 2403.03114v1 | null |
2024-03-05 | Galaxies in the Zone of Avoidance: Misclassifications using machine learning tools | P. Marchant Cortés et.al. | 2403.03098v1 | null |
2024-03-05 | Collective self-caging of active filaments in virtual confinement | Maximilian Kurjahn et.al. | 2403.03093v1 | null |
2024-03-05 | A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives | Simone Alberto Peirone et.al. | 2403.03037v1 | null |
2024-03-03 | Enhancing Retinal Vascular Structure Segmentation in Images With a Novel Design Two-Path Interactive Fusion Module Model | Rui Yang et.al. | 2403.01362v1 | null |
2024-03-02 | Improve Cost Efficiency of Active Learning over Noisy Dataset | Zan-Kai Chong et.al. | 2403.01346v1 | null |
2024-03-02 | An eternal hypersurface flow arising in centro-affine geometry | Xinjie Jiang et.al. | 2403.01340v1 | null |
2024-03-02 | Image-Based Dietary Assessment: A Healthy Eating Plate Estimation System | Assylzhan Izbassar et.al. | 2403.01310v1 | null |
2024-03-02 | VNLP: Turkish NLP Package | Meliksah Turker et.al. | 2403.01309v1 | null |
2024-03-02 | Towards a classification of | Alyson Deines et.al. | 2403.01287v1 | null |
2024-03-02 | | Irfan Habib et.al. | 2403.01285v1 | null |
2024-03-02 | Fast Low-parameter Video Activity Localization in Collaborative Learning Environments | Venkatesh Jatla et.al. | 2403.01281v1 | null |
2024-03-02 | Rigidity results for group von Neumann algebras with diffuse center | Ionuţ Chifan et.al. | 2403.01280v1 | null |
2024-03-02 | Can a Confident Prior Replace a Cold Posterior? | Martin Marek et.al. | 2403.01272v1 | link |
2024-02-29 | Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Tsai-Shien Chen et.al. | 2402.19479v1 | null |
2024-02-29 | Towards Generalizable Tumor Synthesis | Qi Chen et.al. | 2402.19470v1 | null |
2024-02-29 | Humanoid Locomotion as Next Token Prediction | Ilija Radosavovic et.al. | 2402.19469v1 | null |
2024-03-01 | TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning | Kate Sanders et.al. | 2402.19467v2 | null |
2024-02-29 | Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models | Frederik Kunstner et.al. | 2402.19449v1 | null |
2024-02-29 | Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems | Quentin Raymondaud et.al. | 2402.19443v1 | null |
2024-02-29 | Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation | Jonathan Yang et.al. | 2402.19432v1 | null |
2024-02-29 | PaECTER: Patent-level Representation Learning using Citation-informed Transformers | Mainak Ghosh et.al. | 2402.19411v1 | null |
2024-02-29 | Navigating Hallucinations for Reasoning of Unintentional Activities | Shresth Grover et.al. | 2402.19405v1 | null |
2024-02-29 | A Newborn AGN in a Starforming Galaxy | P. Arévalo et.al. | 2402.19403v1 | null |
2024-02-28 | Time-efficient filtering of polarimetric data by checking physical realizability of experimental Mueller matrices | Tatiana Novikova et.al. | 2402.18555v1 | null |
2024-02-28 | Selection of appropriate multispectral camera exposure settings and radiometric calibration methods for applications in phenotyping and precision agriculture | Vaishali Swaminathan et.al. | 2402.18553v1 | null |
2024-02-28 | Implicit Bias of Next-Token Prediction | Christos Thrampoulidis et.al. | 2402.18551v1 | null |
2024-02-28 | Defect Detection in Tire X-Ray Images: Conventional Methods Meet Deep Structures | Andrei Cozma et.al. | 2402.18527v1 | null |
2024-02-28 | Do galaxy mergers prefer under-dense environments? | U. Sureshkumar et.al. | 2402.18520v1 | null |
2024-02-28 | Log Neural Controlled Differential Equations: The Lie Brackets Make a Difference | Benjamin Walker et.al. | 2402.18512v1 | null |
2024-02-28 | Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling | Mahdi Karami et.al. | 2402.18508v1 | null |
2024-02-28 | Detection of Micromobility Vehicles in Urban Traffic Videos | Khalil Sabri et.al. | 2402.18503v1 | link |
2024-02-28 | Few-Shot Fairness: Unveiling LLM's Potential for Fairness-Aware Classification | Garima Chhikara et.al. | 2402.18502v1 | null |
2024-02-28 | ROG$_{PL}$: Robust Open-Set Graph Learning via Region-Based Prototype Learning | Qin Zhang et.al. | 2402.18495v1 | null |
2024-02-27 | Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning | Xiaoyu Zhang et.al. | 2402.17768v1 | null |
2024-02-27 | Towards Optimal Learning of Language Models | Yuxian Gu et.al. | 2402.17759v1 | null |
2024-02-27 | An Eye Gaze Heatmap Analysis of Uncertainty Head-Up Display Designs for Conditional Automated Driving | Michael A. Gerber et.al. | 2402.17751v1 | null |
2024-02-27 | Scaling on-chip photonic neural processors using arbitrarily programmable wave propagation | Tatsuhiro Onodera et.al. | 2402.17750v1 | link |
2024-02-27 | Linking Order to Strength in Metals | Nicolas Argibay et.al. | 2402.17728v1 | null |
2024-02-27 | MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation | Hanan Gani et.al. | 2402.17725v1 | link |
2024-02-27 | Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners | Yazhou Xing et.al. | 2402.17723v1 | null |
2024-02-27 | Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers | Yiwei Lu et.al. | 2402.17710v1 | null |
2024-02-27 | NextLevelBERT: Investigating Masked Language Modeling with Higher-Level Representations for Long Documents | Tamara Czinczoll et.al. | 2402.17682v1 | null |
2024-02-27 | MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning | Huiyu Xiong et.al. | 2402.17680v1 | null |
2024-02-26 | Open Your Ears to Take a Look: A State-of-the-Art Report on the Integration of Sonification and Visualization | Kajetan Enge et.al. | 2402.16558v1 | null |
2024-02-26 | LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification | Yiping Song et.al. | 2402.16515v1 | null |
2024-02-26 | Photonic Neural Network Fabricated on Thin Film Lithium Niobate for High-Fidelity and Power-Efficient Matrix Computation | Yong Zheng et.al. | 2402.16513v1 | null |
2024-02-26 | Intelligent Known and Novel Aircraft Recognition -- A Shift from Classification to Similarity Learning for Combat Identification | Ahmad Saeed et.al. | 2402.16486v1 | null |
2024-02-26 | Edge Detectors Can Make Deep Convolutional Neural Networks More Robust | Jin Ding et.al. | 2402.16479v1 | null |
2024-02-26 | Autonomous Integration of TSN-unaware Applications with QoS Requirements in TSN Networks | Moritz Fluechter et.al. | 2402.16454v1 | null |
2024-02-26 | Retrouver l'inventeur-auteur : la levée d'homonymies d'autorat entre les brevets et les publications scientifiques | David Reymond et.al. | 2402.16440v1 | null |
2024-02-26 | Improving behavior based authentication against adversarial attack using XAI | Dong Qin et.al. | 2402.16430v1 | null |
2024-02-26 | Adaptive Online Learning of Separable Path Graph Transforms for Intra-prediction | Wen-Yang Lu et.al. | 2402.16371v1 | null |
2024-02-26 | DEYO: DETR with YOLO for End-to-End Object Detection | Haodong Ouyang et.al. | 2402.16370v1 | null |
2024-02-26 | SPINEPS -- Automatic Whole Spine Segmentation of T2-weighted MR images using a Two-Phase Approach to Multi-class Semantic and Instance Segmentation | Hendrik Möller et.al. | 2402.16368v1 | link |
2024-02-26 | An Integrated Data Processing Framework for Pretraining Foundation Models | Yiding Sun et.al. | 2402.16358v1 | link |
2024-02-26 | What Text Design Characterizes Book Genres? | Daichi Haraguchi et.al. | 2402.16356v1 | null |
2024-02-23 | A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends | Abolfazl Younesi et.al. | 2402.15490v1 | null |
2024-02-23 | Retinotopic Mapping Enhances the Robustness of Convolutional Neural Networks | Jean-Nicolas Jérémie et.al. | 2402.15480v1 | null |
2024-02-23 | FAIR: Filtering of Automatically Induced Rules | Divya Jyoti Bajpai et.al. | 2402.15472v1 | null |
2024-02-23 | GROS: A General Robust Aggregation Strategy | Alejandro Cholaquidis et.al. | 2402.15442v1 | null |
2024-02-23 | Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales | Shuren Qi et.al. | 2402.15430v1 | link |
2024-02-23 | ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation | Yi Zhang et.al. | 2402.15429v1 | link |
2024-02-23 | Understanding Entrainment in Human Groups: Optimising Human-Robot Collaboration from Lessons Learned during Human-Human Collaboration | Eike Schneiders et.al. | 2402.15427v1 | null |
2024-02-23 | PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning | Simon Holk et.al. | 2402.15420v1 | null |
2024-02-23 | G-RepsNet: A Fast and General Construction of Equivariant Networks for Arbitrary Matrix Groups | Sourya Basu et.al. | 2402.15413v1 | null |
2024-02-23 | A Universal Method for Solar Filament Detection from H-alpha Observations using Semi-supervised Deep Learning | Andrea Diercke et.al. | 2402.15407v1 | null |
2024-02-22 | Link Prediction under Heterophily: A Physics-Inspired Graph Neural Network Approach | Andrea Giuseppe Di Francesco et.al. | 2402.14802v1 | null |
2024-02-22 | Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | Willi Menapace et.al. | 2402.14797v1 | null |
2024-02-22 | Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models | Yixuan Ren et.al. | 2402.14780v1 | null |
2024-02-22 | Zero-Shot Pediatric Tuberculosis Detection in Chest X-Rays using Self-Supervised Learning | Daniel Capellán-Martín et.al. | 2402.14741v1 | null |
2024-02-22 | Solitons of the mean curvature flow in | Rafael López et.al. | 2402.14727v1 | null |
2024-02-22 | A Transformer Model for Boundary Detection in Continuous Sign Language | Razieh Rastgoo et.al. | 2402.14720v1 | null |
2024-02-22 | InfFeed: Influence Functions as a Feedback to Improve the Performance of Subjective Tasks | Somnath Banerjee et.al. | 2402.14702v1 | null |
2024-02-22 | Big data analytics to classify earthwork-related locations: A Chengdu study | Lei Yu et.al. | 2402.14698v1 | null |
2024-02-22 | Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off | Futa Waseda et.al. | 2402.14648v1 | null |
2024-02-22 | Distributed Radiance Fields for Edge Video Compression and Metaverse Integration in Autonomous Driving | Eugen Šlapak et.al. | 2402.14642v1 | null |
2024-02-21 | A Simple and Yet Fairly Effective Defense for Graph Neural Networks | Sofiane Ennadir et.al. | 2402.13987v1 | link |
2024-02-21 | On modular representations of inner forms of | Johannes Droschl et.al. | 2402.13969v1 | null |
2024-02-21 | New directions in algebraic statistics: Three challenges from 2023 | Yulia Alexandr et.al. | 2402.13961v1 | null |
2024-02-21 | On the topological classification of complex plane curve singularities | Alberto Fernández-Hernández et.al. | 2402.13941v1 | null |
2024-02-21 | Verifying message-passing neural networks via topology-based bounds tightening | Christopher Hojny et.al. | 2402.13937v1 | null |
2024-02-21 | Tumor segmentation on whole slide images: training or prompting? | Huaqian Wu et.al. | 2402.13932v1 | null |
2024-02-21 | BenchCloudVision: A Benchmark Analysis of Deep Learning Approaches for Cloud Detection and Segmentation in Remote Sensing Imagery | Loddo Fabio et.al. | 2402.13918v1 | link |
2024-02-21 | An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach | Mohammad Amaz Uddin et.al. | 2402.13871v1 | null |
2024-02-21 | RFI-DRUnet: Restoring dynamic spectra corrupted by radio frequency interference -- Application to pulsar observations | Xiao Zhang et.al. | 2402.13867v1 | null |
2024-02-21 | What we can learn from TikTok through its Research API | Francesco Corso et.al. | 2402.13855v1 | null |
2024-02-20 | Video ReCap: Recursive Captioning of Hour-Long Videos | Md Mohaiminul Islam et.al. | 2402.13250v1 | null |
2024-02-20 | SMORE: Similarity-based Hyperdimensional Domain Adaptation for Multi-Sensor Time Series Classification | Junyao Wang et.al. | 2402.13233v1 | null |
2024-02-20 | A Touch, Vision, and Language Dataset for Multimodal Alignment | Letian Fu et.al. | 2402.13232v1 | null |
2024-02-20 | NeRF Solves Undersampled MRI Reconstruction | Tae Jun Jang et.al. | 2402.13226v1 | null |
2024-02-20 | VideoPrism: A Foundational Visual Encoder for Video Understanding | Long Zhao et.al. | 2402.13217v1 | null |
2024-02-20 | How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena | Marco Gaido et.al. | 2402.13208v1 | null |
2024-02-20 | A novel image correction method for cloud-affected observations with Imaging Atmospheric Cherenkov Telescopes | Natalia Żywucka et.al. | 2402.13190v1 | null |
2024-02-20 | UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing | Jianhong Bai et.al. | 2402.13185v1 | null |
2024-02-20 | DINOBot: Robot Manipulation via Retrieval and Alignment with Vision Foundation Models | Norman Di Palo et.al. | 2402.13181v1 | null |
2024-02-20 | 3D Kinematics Estimation from Video with a Biomechanical Model and Synthetic Training Data | Zhi-Yi Lin et.al. | 2402.13172v1 | null |
2024-02-19 | Short-Period Variables in TESS Full-Frame Image Light Curves Identified via Convolutional Neural Networks | Greg Olmschenk et.al. | 2402.12369v1 | null |
2024-02-19 | The first all-sky survey of star-forming galaxies with eROSITA: Scaling relations and a population of X-ray luminous starbursts | E. Kyritsis et.al. | 2402.12367v1 | null |
2024-02-19 | An Adversarial Approach to Evaluating the Robustness of Event Identification Models | Obai Bahwal et.al. | 2402.12338v1 | null |
2024-02-19 | Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models | Christian Schlarmann et.al. | 2402.12336v1 | link |
2024-02-19 | Generating Survival Interpretable Trajectories and Data | Andrei V. Konstantinov et.al. | 2402.12331v1 | null |
2024-02-19 | Asymptotic Gaussian Fluctuations of Eigenvectors in Spectral Clustering | Hugo Lebeau et.al. | 2402.12302v1 | null |
2024-02-19 | Time-periodic behaviour in one- and two-dimensional interacting particle systems | Jonas Köppl et.al. | 2402.12300v1 | null |
2024-02-19 | Is Open-Source There Yet? A Comparative Study on Commercial and Open-Source LLMs in Their Ability to Label Chest X-Ray Reports | Felix J. Dorfner et.al. | 2402.12298v1 | null |
2024-02-19 | Revisiting registration-based synthesis: A focus on unsupervised MR image synthesis | Savannah P. Hays et.al. | 2402.12288v1 | null |
2024-02-19 | Significance of Chirp MFCC as a Feature in Speech and Audio Applications | S. Johanan Joysingh et.al. | 2402.12239v1 | null |
2024-02-16 | PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter | Junfei Xiao et.al. | 2402.10896v1 | null |
2024-02-16 | Fusion of Diffusion Weighted MRI and Clinical Data for Predicting Functional Outcome after Acute Ischemic Stroke with Deep Contrastive Learning | Chia-Ling Tsai et.al. | 2402.10894v1 | null |
2024-02-16 | Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation | Ziyang Wang et.al. | 2402.10887v1 | link |
2024-02-16 | Control Color: Multimodal Diffusion-based Interactive Image Colorization | Zhexin Liang et.al. | 2402.10855v1 | null |
2024-02-16 | HistoSegCap: Capsules for Weakly-Supervised Semantic Segmentation of Histological Tissue Type in Whole Slide Images | Mobina Mansoori et.al. | 2402.10851v1 | null |
2024-02-16 | FedD2S: Personalized Data-Free Federated Knowledge Distillation | Kawa Atapour et.al. | 2402.10846v1 | null |
2024-02-16 | Pedipulate: Enabling Manipulation Skills using a Quadruped Robot's Leg | Philip Arm et.al. | 2402.10837v1 | null |
2024-02-16 | GAN-driven Electromagnetic Imaging of 2-D Dielectric Scatterers | Ehtasham Naseer et.al. | 2402.10831v1 | null |
2024-02-16 | Structure results for torus fixed loci | Jarod Alper et.al. | 2402.10823v1 | null |
2024-02-16 | Training Class-Imbalanced Diffusion Model Via Overlap Optimization | Divin Yan et.al. | 2402.10821v1 | link |
2024-02-15 | Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling | Raunaq Bhirangi et.al. | 2402.10211v1 | null |
2024-02-15 | FedAnchor: Enhancing Federated Semi-Supervised Learning with Label Contrastive Loss for Unlabeled Clients | Xinchi Qiu et.al. | 2402.10191v1 | null |
2024-02-15 | Euclid preparation. Measuring detailed galaxy morphologies for Euclid with Machine Learning | Euclid Collaboration et.al. | 2402.10187v1 | link |
2024-02-15 | DeepSRGM -- Sequence Classification and Ranking in Indian Classical Music with Deep Learning | Sathwik Tejaswi Madhusudhan et.al. | 2402.10168v1 | null |
2024-02-15 | Holographic covering and the fortuity of black holes | Chi-Ming Chang et.al. | 2402.10129v1 | null |
2024-02-15 | Classification Diffusion Models | Shahar Yadin et.al. | 2402.10095v1 | null |
2024-02-15 | MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations | Benedikt Alkin et.al. | 2402.10093v1 | link |
2024-02-15 | GraphCBAL: Class-Balanced Active Learning for Graph Neural Networks via Reinforcement Learning | Chengcheng Yu et.al. | 2402.10074v1 | null |
2024-02-15 | Both Matter: Enhancing the Emotional Intelligence of Large Language Models without Compromising the General Intelligence | Weixiang Zhao et.al. | 2402.10073v1 | null |
2024-02-15 | NYCTALE: Neuro-Evidence Transformer for Adaptive and Personalized Lung Nodule Invasiveness Prediction | Sadaf Khademi et.al. | 2402.10066v1 | null |
2024-02-14 | LL-GABR: Energy Efficient Live Video Streaming Using Reinforcement Learning | Adithya Raman et.al. | 2402.09392v1 | null |
2024-02-14 | GraSSRep: Graph-Based Self-Supervised Learning for Repeat Detection in Metagenomic Assembly | Ali Azizpour et.al. | 2402.09381v1 | link |
2024-02-14 | Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge | Jiancheng Yang et.al. | 2402.09372v1 | null |
2024-02-14 | Magic-Me: Identity-Specific Video Customized Diffusion | Ze Ma et.al. | 2402.09368v1 | null |
2024-02-14 | Small instanton-induced flavor invariants and the axion potential | Ravneet Bedi et.al. | 2402.09361v1 | null |
2024-02-14 | Pruning Sparse Tensor Neural Networks Enables Deep Learning for 3D Ultrasound Localization Microscopy | Brice Rauby et.al. | 2402.09359v1 | null |
2024-02-14 | DoRA: Weight-Decomposed Low-Rank Adaptation | Shih-Yang Liu et.al. | 2402.09353v1 | null |
2024-02-14 | Irreducible representations of the crystallisation of the | Manabendra Giri et.al. | 2402.09347v1 | null |
2024-02-14 | Registration of Longitudinal Spine CTs for Monitoring Lesion Growth | Malika Sanhinova et.al. | 2402.09341v1 | null |
2024-02-14 | Stability and Multigroup Fairness in Ranking with Uncertain Predictions | Siddartha Devic et.al. | 2402.09326v1 | null |
2024-02-13 | IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation | Luke Melas-Kyriazi et.al. | 2402.08682v1 | null |
2024-02-13 | A Convergence Analysis of Approximate Message Passing with Non-Separable Functions and Applications to Multi-Class Classification | Burak Çakmak et.al. | 2402.08676v1 | null |
2024-02-13 | Learning Emergent Gaits with Decentralized Phase Oscillators: on the role of Observations, Rewards, and Feedback | Jenny Zhang et.al. | 2402.08662v1 | null |
2024-02-13 | BdSLW60: A Word-Level Bangla Sign Language Dataset | Husne Ara Rubaiyeat et.al. | 2402.08635v1 | link |
2024-02-13 | Convolutional Neural Networks Towards Facial Skin Lesions Detection | Reza Sarshar et.al. | 2402.08592v1 | null |
2024-02-13 | Totally geodesic submanifolds and polar actions on Stiefel manifolds | Claudio Gorodski et.al. | 2402.08585v1 | null |
2024-02-13 | Motion-Adaptive Inference for Flexible Learned B-Frame Compression | M. Akin Yilmaz et.al. | 2402.08550v1 | null |
2024-02-13 | Approximately Piecewise E(3) Equivariant Point Networks | Matan Atzmon et.al. | 2402.08529v1 | null |
2024-02-13 | Reduced-order modeling of the dynamics of an inverted flag from experimental data | Zhenwei Xu et.al. | 2402.08504v1 | null |
2024-02-13 | Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer Models | Shaeke Salman et.al. | 2402.08473v1 | null |
2024-02-13 | Wavefront Randomization Improves Deconvolution | Amit Kohli et.al. | 2402.07900v2 | null |
2024-02-12 | Detection of Spider Mites on Labrador Beans through Machine Learning Approaches Using Custom Datasets | Violet Liu et.al. | 2402.07895v1 | null |
2024-02-12 | Perfect stable regularity lemma and slice-wise stable hypergraphs | Artem Chernikov et.al. | 2402.07870v1 | null |
2024-02-12 | On Computationally Efficient Multi-Class Calibration | Parikshit Gopalan et.al. | 2402.07821v1 | null |
2024-02-12 | A Benchmark Grocery Dataset of Realworld Point Clouds From Single View | Shivanand Venkanna Sheshappanavar et.al. | 2402.07819v1 | null |
2024-02-12 | Fixation for | Laure Marêché et.al. | 2402.07807v1 | null |
2024-02-12 | Estimation of non-uniform blur using a patch-based regression convolutional neural network (CNN) | Luis G. Varela et.al. | 2402.07796v1 | null |
2024-02-12 | "Layer-by-layer" Unsupervised Clustering of Statistically Relevant Fluctuations in Noisy Time-series Data of Complex Dynamical Systems | Matteo Becchi et.al. | 2402.07786v1 | null |
2024-02-12 | Solving parameter-dependent semi-algebraic systems | Louis Gaillard et.al. | 2402.07782v1 | null |
2024-02-12 | Observations of the new meteor shower from comet 46P/Wirtanen | D. Vida et.al. | 2402.07769v1 | null |
2024-02-09 | A two-stage algorithm in evolutionary product unit neural networks for classification | Antonio J. Tallón-Ballesteros et.al. | 2402.06622v1 | null |
2024-02-09 | Image-based Deep Learning for the time-dependent prediction of fresh concrete properties | Max Meyer et.al. | 2402.06611v1 | null |
2024-02-09 | SAE: Single Architecture Ensemble Neural Networks | Martin Ferianc et.al. | 2402.06580v1 | null |
2024-02-09 | Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning | Amir Ziai et.al. | 2402.06560v1 | link |
2024-02-09 | Self Supervised Learning for Improved Calibrationless Radial MRI with NLINV-Net | Moritz Blumenthal et.al. | 2402.06550v1 | null |
2024-02-09 | Bryndza at ClimateActivism 2024: Stance, Target and Hate Event Detection via Retrieval-Augmented GPT-4 and LLaMA | Marek Šuppa et.al. | 2402.06549v1 | null |
2024-02-09 | Feature Density Estimation for Out-of-Distribution Detection via Normalizing Flows | Evan D. Cook et.al. | 2402.06537v1 | null |
2024-02-09 | Refining Myocardial Infarction Detection: A Novel Multi-Modal Composite Kernel Strategy in One-Class Classification | Muhammad Uzair Zahid et.al. | 2402.06530v1 | null |
2024-02-09 | Flexible infinite-width graph convolutional networks and the importance of representation learning | Ben Anson et.al. | 2402.06525v1 | null |
2024-02-09 | Dynamic swarms regulate the morphology and distribution of soft membrane domains | Aakanksha Gubbala et.al. | 2402.06518v1 | null |
2024-02-08 | Classifying Nodes in Graphs without GNNs | Daniel Winter et.al. | 2402.05934v1 | link |
2024-02-08 | An Interactive Agent Foundation Model | Zane Durante et.al. | 2402.05929v1 | null |
2024-02-08 | Point-VOS: Pointing Up Video Object Segmentation | Idil Esen Zulfikar et.al. | 2402.05917v1 | null |
2024-02-08 | A Survey on Detection, Classification, and Tracking of Aerial Threats using Radar and Communications Systems | Wahab Khawaja et.al. | 2402.05909v1 | null |
2024-02-09 | Large Language Model Meets Graph Neural Network in Knowledge Distillation | Shengxiang Hu et.al. | 2402.05894v2 | null |
2024-02-08 | Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data | Shufan Li et.al. | 2402.05892v1 | null |
2024-02-08 | CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion | Shoubin Yu et.al. | 2402.05889v1 | null |
2024-02-08 | Sandwiched Compression: Repurposing Standard Codecs with Neural Network Wrappers | Onur G. Guleryuz et.al. | 2402.05887v1 | link |
2024-02-08 | GET-Tok: A GenAI-Enriched Multimodal TikTok Dataset Documenting the 2022 Attempted Coup in Peru | Gabriela Pinto et.al. | 2402.05882v1 | link |
2024-02-08 | You've Got to Feel It To Believe It: Multi-Modal Bayesian Inference for Semantic and Property Prediction | Parker Ewen et.al. | 2402.05872v1 | null |
2024-02-07 | Edu-ConvoKit: An Open-Source Library for Education Conversation Data | Rose E. Wang et.al. | 2402.05111v1 | link |
2024-02-07 | Moduli Parameters of Complex Singularities with Non-Degenerate Newton Boundary | Janko Boehm et.al. | 2402.05093v1 | null |
2024-02-07 | Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation | Ziyang Wang et.al. | 2402.05079v1 | link |
2024-02-07 | Arbitrary Scale Super-Resolution Assisted Lunar Crater Detection in Satellite Images | Atal Tewari et.al. | 2402.05068v1 | null |
2024-02-07 | Efficient Multi-Resolution Fusion for Remote Sensing Data with Label Uncertainty | Hersh Vakharia et.al. | 2402.05045v1 | link |
2024-02-07 | PAC Learnability under Explanation-Preserving Graph Perturbations | Xu Zheng et.al. | 2402.05039v1 | null |
2024-02-07 | Strong convexity-guided hyper-parameter optimization for flatter losses | Rahul Yedida et.al. | 2402.05025v1 | null |
2024-02-07 | Example-based Explanations for Random Forests using Machine Unlearning | Tanmay Surve et.al. | 2402.05007v1 | null |
2024-02-07 | Randomized Confidence Bounds for Stochastic Partial Monitoring | Maxime Heuillet et.al. | 2402.05002v1 | null |
2024-02-07 | Beyond explaining: XAI-based Adaptive Learning with SHAP Clustering for Energy Consumption Prediction | Tobias Clement et.al. | 2402.04982v1 | null |
2024-02-06 | EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | Quan Sun et.al. | 2402.04252v1 | link |
2024-02-06 | The spectrum of excisive functors | Gregory Arone et.al. | 2402.04244v1 | null |
2024-02-06 | A classification of nonzero skew immaculate functions | Sarah Mason et.al. | 2402.04219v1 | null |
2024-02-06 | Resource-Aware Hierarchical Federated Learning in Wireless Video Caching Networks | Md Ferdous Pervej et.al. | 2402.04216v1 | null |
2024-02-06 | "Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors | Lin Guan et.al. | 2402.04210v1 | null |
2024-02-06 | 3D Volumetric Super-Resolution in Radiology Using 3D RRDB-GAN | Juhyung Ha et.al. | 2402.04171v1 | null |
2024-02-06 | Human Emotions Analysis and Recognition Using EEG Signals in Response to 360$^\circ$ Videos | Haseeb ur Rahman Abbasi et.al. | 2402.04142v1 | null |
2024-02-06 | Hierarchical Delay Attribution Classification using Unstructured Text in Train Management Systems | Anton Borg et.al. | 2402.04108v1 | null |
2024-02-06 | Analysis of Deep Image Prior and Exploiting Self-Guidance for Image Reconstruction | Shijun Liang et.al. | 2402.04097v1 | null |
2024-02-06 | A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation | Zhengbo Wang et.al. | 2402.04087v1 | link |
2024-02-05 | Multiclass Classification Procedure for Detecting Attacks on MQTT-IoT Protocol | Hector Alaiz-Moreton et.al. | 2402.03270v1 | null |
2024-02-05 | Security Advice for Parents and Children About Content Filtering and Circumvention as Found on YouTube and TikTok | Ran Elgedawy et.al. | 2402.03255v1 | null |
2024-02-05 | JOBSKAPE: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching | Antoine Magron et.al. | 2402.03242v1 | link |
2024-02-05 | FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition | Xiaohu Huang et.al. | 2402.03241v1 | null |
2024-02-05 | IGUANe: a 3D generalizable CycleGAN for multicenter harmonization of brain MR images | Vincent Roca et.al. | 2402.03227v1 | null |
2024-02-05 | English Prompts are Better for NLI-based Zero-Shot Emotion Classification than Target-Language Prompts | Patrick Barreiß et.al. | 2402.03223v1 | null |
2024-02-05 | "Define Your Terms" : Enhancing Efficient Offensive Speech Classification with Definition | Huy Nghiem et.al. | 2402.03221v1 | link |
2024-02-05 | Isotropy, Clusters, and Classifiers | Timothee Mickus et.al. | 2402.03191v1 | null |
2024-02-06 | Cool-chic video: Learned video coding with 800 parameters | Thomas Leguay et.al. | 2402.03179v2 | null |
2024-02-05 | Accurate and Well-Calibrated ICD Code Assignment Through Attention Over Diverse Label Embeddings | Gonçalo Gomes et.al. | 2402.03172v1 | link |
2024-02-02 | From gas to stars: MUSEings on the internal evolution of IC 1613 | S. Taibi et.al. | 2402.01631v1 | null |
2024-02-02 | Truncation technique for variational quantum eigensolver for Molecular Hamiltonians | Qidong Xu et.al. | 2402.01630v1 | null |
2024-02-02 | L2G2G: a Scalable Local-to-Global Network Embedding with Graph Autoencoders | Ruikang Ouyang et.al. | 2402.01614v1 | link |
2024-02-02 | Immersive Video Compression using Implicit Neural Representations | Ho Man Kwan et.al. | 2402.01596v1 | link |
2024-02-02 | NeuroCine: Decoding Vivid Video Sequences from Human Brain Activties | Jingyuan Sun et.al. | 2402.01590v1 | null |
2024-02-02 | Boximator: Generating Rich and Controllable Motions for Video Synthesis | Jiawei Wang et.al. | 2402.01566v1 | null |
2024-02-02 | Deep Continuous Networks | Nergis Tomen et.al. | 2402.01557v1 | link |
2024-02-02 | SLYKLatent, a Learning Framework for Facial Features Estimation | Samuel Adebayo et.al. | 2402.01555v1 | null |
2024-02-02 | Advancing Brain Tumor Inpainting with Generative Models | Ruizhi Zhu et.al. | 2402.01509v1 | null |
2024-02-02 | Di-NeRF: Distributed NeRF for Collaborative Learning with Unknown Relative Poses | Mahboubeh Asadi et.al. | 2402.01485v1 | null |
2024-02-01 | We're Not Using Videos Effectively: An Updated Domain Adaptive Video Segmentation Baseline | Simar Kareer et.al. | 2402.00868v1 | link |
2024-02-01 | Deep Room Impulse Response Completion | Jackie Lin et.al. | 2402.00859v1 | null |
2024-02-01 | Early Time Classification with Accumulated Accuracy Gap Control | Liran Ringel et.al. | 2402.00857v1 | link |
2024-02-01 | BootsTAP: Bootstrapped Training for Tracking-Any-Point | Carl Doersch et.al. | 2402.00847v1 | link |
2024-02-01 | Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering | Pinxin Liu et.al. | 2402.00827v1 | null |
2024-02-01 | Examining the Influence of Digital Phantom Models in Virtual Imaging Trials for Tomographic Breast Imaging | Amar Kavuri et.al. | 2402.00812v1 | null |
2024-02-01 | ReAGent: Towards A Model-agnostic Feature Attribution Method for Generative Language Models | Zhixue Zhao et.al. | 2402.00794v1 | link |
2024-02-01 | Distinguishing the Indistinguishable: Human Expertise in Algorithmic Prediction | Rohan Alur et.al. | 2402.00793v1 | link |
2024-02-02 | CroissantLLM: A Truly Bilingual French-English Language Model | Manuel Faysse et.al. | 2402.00786v2 | link |
2024-02-01 | Hybrid Quantum Vision Transformers for Event Classification in High Energy Physics | Eyup B. Unlu et.al. | 2402.00776v1 | null |
2024-01-31 | Classification-Oriented Semantic Wireless Communications | Emrecan Kutay et.al. | 2401.18069v1 | null |
2024-01-31 | Rank Supervised Contrastive Learning for Time Series Classification | Qianying Ren et.al. | 2401.18057v1 | null |
2024-01-31 | Variable selection for Naïve Bayes classification | Rafael Blanquero et.al. | 2401.18039v1 | null |
2024-01-31 | Optimizing contrastive learning for cortical folding pattern detection | Aymeric Gaudin et.al. | 2401.18035v1 | null |
2024-01-31 | A Neural Enhancement Post-Processor with a Dynamic AV1 Encoder Configuration Strategy for CLIC 2024 | Darren Ramsook et.al. | 2401.18021v1 | null |
2024-01-31 | EEG-GPT: Exploring Capabilities of Large Language Models for EEG Classification and Interpretation | Jonathan W. Kim et.al. | 2401.18006v1 | null |
2024-01-31 | Unsupervised Learning of Topological Non-Abelian Braiding in Non-Hermitian Bands | Yang Long et.al. | 2401.17968v1 | null |
2024-01-31 | Error-Tolerant E-Discovery Protocols | Jinshuo Dong et.al. | 2401.17952v1 | null |
2024-01-31 | HyperZ$\cdot$Z$\cdot$W Operator Connects Slow-Fast Networks for Full Context Interaction | Harvie Zhang et.al. | 2401.17948v1 | null |
2024-01-31 | Probabilistic Photonic Computing with Chaotic Light | Frank Brückerhoff-Plückelmann et.al. | 2401.17915v1 | null |
2024-01-30 | The SRG/eROSITA all-sky survey: Hard X-ray selected Active Galactic Nuclei | Sophia G. H. Waddell et.al. | 2401.17306v1 | null |
2024-01-30 | Compact white-dwarf binaries in the combined SRG/eROSITA/SDSS eFEDS survey | A. Schwope et.al. | 2401.17304v1 | null |
2024-01-30 | Searching for X-ray counterparts of unassociated Fermi-LAT sources and rotation-powered pulsars with SRG/eROSITA | Martin G. F. Mayer et.al. | 2401.17295v1 | null |
2024-01-30 | X-ray AGNs with SRG/eROSITA: Multi-wavelength observations reveal merger triggering and post-coalescence circumnuclear blowout | Robert W. Bickley et.al. | 2401.17277v1 | null |
2024-01-30 | ReacLLaMA: Merging chemical and textual information in chemical reactivity AI models | Aline Hartgers et.al. | 2401.17267v1 | null |
2024-01-30 | SLIC: A Learned Image Codec Using Structure and Color | Srivatsa Prativadibhayankaram et.al. | 2401.17246v1 | link |
2024-01-31 | Faster coloring and embedding in dense hypergraphs via stability | Jianfeng Hou et.al. | 2401.17219v2 | null |
2024-01-31 | GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear | Robert Konrad et.al. | 2401.17217v2 | null |
2024-01-30 | Single Word Change is All You Need: Designing Attacks and Defenses for Text Classifiers | Lei Xu et.al. | 2401.17196v1 | null |
2024-01-30 | GraphViz2Vec: A Structure-aware Feature Generation Model to Improve Classification in GNNs | Shraban Kumar Chatterjee et.al. | 2401.17178v1 | null |
2024-01-29 | Computer Vision for Primate Behavior Analysis in the Wild | Richard Vogg et.al. | 2401.16424v1 | null |
2024-01-29 | Synchformer: Efficient Synchronization from Sparse Cues | Vladimir Iashin et.al. | 2401.16423v1 | null |
2024-01-29 | Strategic Usage in a Multi-Learner Setting | Eliot Shekhtman et.al. | 2401.16422v1 | null |
2024-01-29 | ReTaSA: A Nonparametric Functional Estimation Approach for Addressing Continuous Target Shift | Hwanwoo Kim et.al. | 2401.16410v1 | null |
2024-01-29 | Is K-fold cross validation the best model selection method for Machine Learning? | Juan M Gorriz et.al. | 2401.16407v1 | null |
2024-01-29 | Zero-shot Imitation Policy via Search in Demonstration Dataset | Federco Malato et.al. | 2401.16398v1 | null |
2024-01-29 | Ovarian Cancer Diagnostics using Wavelet Packet Scaling Descriptors | Raymond J. Hinton Jr. et.al. | 2401.16396v1 | null |
2024-01-29 | Evaluation of pseudo-healthy image reconstruction for anomaly detection with deep generative models: Application to brain FDG PET | Ravi Hassanaly et.al. | 2401.16363v1 | link |
2024-01-29 | Curriculum-Based Reinforcement Learning for Quadrupedal Jumping: A Reference-free Design | Vassil Atanassov et.al. | 2401.16337v1 | null |
2024-01-29 | Making the unmodulated Pyramid wavefront sensor smart. Closed-loop demonstration of neural network wavefront reconstruction with MagAO-X | Rico Landman et.al. | 2401.16325v1 | null |
2024-01-26 | From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities | Chaochao Lu et.al. | 2401.15071v1 | null |
2024-01-26 | Deep learning-based approach for tomato classification in complex scenes | Mikael A. Mousse et.al. | 2401.15055v1 | null |
2024-01-26 | Non-Unitary | Pedro M. F. Pereira et.al. | 2401.15049v1 | null |
2024-01-26 | Machine learning-based analysis of glioma tissue sections: a review | Jan-Philipp Redlich et.al. | 2401.15022v1 | null |
2024-01-26 | Enhancement of a Text-Independent Speaker Verification System by using Feature Combination and Parallel-Structure Classifiers | Kerlos Atia Abdalmalak et.al. | 2401.15018v1 | null |
2024-01-26 | Graph-based Active Learning for Entity Cluster Repair | Victor Christen et.al. | 2401.14992v1 | null |
2024-01-26 | Stokes graphs of the Rabi problem with real parameters | René Langøen et.al. | 2401.14991v1 | null |
2024-01-26 | Minimum-dissipation principle for synchronised stochastic oscillators far from equilibrium | Jan Meibohm et.al. | 2401.14982v1 | null |
2024-01-26 | Microwave lymphedema assessment using deep learning with contour assisted backprojection | Yuyi Chang et.al. | 2401.14970v1 | null |
2024-01-26 | Hold Tight: Identifying Behavioral Patterns During Prolonged Work in VR through Video Analysis | Verena Biener et.al. | 2401.14920v1 | null |
2024-01-25 | Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities | Yiyuan Zhang et.al. | 2401.14405v1 | link |
2024-01-25 | Adaptive Mobile Manipulation for Articulated Objects In the Open World | Haoyu Xiong et.al. | 2401.14403v1 | null |
2024-01-25 | Range-Agnostic Multi-View Depth Estimation With Keyframe Selection | Andrea Conti et.al. | 2401.14401v1 | link |
2024-01-25 | Rethinking Patch Dependence for Masked Autoencoders | Letian Fu et.al. | 2401.14391v1 | null |
2024-01-25 | Smooth Ranking SVM via Cutting-Plane Method | Erhan Can Ozcan et.al. | 2401.14388v1 | link |
2024-01-25 | Inconsistency Masks: Removing the Uncertainty from Input-Pseudo-Label Pairs | Michael R. H. Vorndran et.al. | 2401.14387v1 | link |
2024-01-25 | A Comparative Analysis of Noise Reduction Methods in Sentiment Analysis on Noisy Bengali Texts | Kazi Toufique Elahi et.al. | 2401.14360v1 | link |
2024-01-25 | Computing Derivations on Nilpotent Quadratic Lie Algebras | Pilar Benito et.al. | 2401.14348v1 | null |
2024-01-25 | Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective | Xuechen Zhang et.al. | 2401.14343v1 | null |
2024-01-25 | Progressive Multi-task Anti-Noise Learning and Distilling Frameworks for Fine-grained Vehicle Recognition | Dichao Liu et.al. | 2401.14336v1 | link |
2024-01-24 | Tyche: Stochastic In-Context Learning for Medical Image Segmentation | Marianne Rakic et.al. | 2401.13650v1 | null |
2024-01-24 | Quantifying the Impact of Frame Preemption on Combined TSN Shapers | Rubi Debnath et.al. | 2401.13631v1 | null |
2024-01-24 | Can overfitted deep neural networks in adversarial training generalize? -- An approximation viewpoint | Zhongjie Shi et.al. | 2401.13624v1 | null |
2024-01-24 | FLLIC: Functionally Lossless Image Compression | Xi Zhang et.al. | 2401.13616v1 | null |
2024-01-24 | Enhancing Image Retrieval : A Comprehensive Study on Photo Search using the CLIP Mode | Naresh Kumar Lahajal et.al. | 2401.13613v1 | null |
2024-01-24 | Prompt Weight Experiments for LLM Instruction Fine-Tuning | Mathew Huerta-Enochian et.al. | 2401.13586v1 | null |
2024-01-24 | WPDA: Frequency-based Backdoor Attack with Wavelet Packet Decomposition | Zhengyao Song et.al. | 2401.13578v1 | null |
2024-01-24 | CNN architecture extraction on edge GPU | Peter Horvath et.al. | 2401.13575v1 | null |
2024-01-24 | Benchmarking the Fairness of Image Upsampling Methods | Mike Laszkiewicz et.al. | 2401.13555v1 | null |
2024-01-24 | PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition | Otto Brookes et.al. | 2401.13554v1 | null |
2024-01-23 | SegmentAnyBone: A Universal Model that Segments Any Bone at Any Location on MRI | Hanxue Gu et.al. | 2401.12974v1 | null |
2024-01-23 | On the Efficacy of Text-Based Input Modalities for Action Anticipation | Apoorva Beedu et.al. | 2401.12972v1 | null |
2024-01-23 | The role of environment and AGN feedback in quenching local galaxies: Comparing cosmological hydrodynamical simulations to the SDSS | Paul H. Goubert et.al. | 2401.12953v1 | null |
2024-01-23 | Lumiere: A Space-Time Diffusion Model for Video Generation | Omer Bar-Tal et.al. | 2401.12945v1 | null |
2024-01-23 | Long-range three-dimensional tracking of nanoparticles using interferometric scattering (iSCAT) microscopy | Kiarash Kasaian et.al. | 2401.12939v1 | null |
2024-01-23 | Neural deformation fields for template-based reconstruction of cortical surfaces from MRI | Fabian Bongratz et.al. | 2401.12938v1 | null |
2024-01-23 | Segmentation of tibiofemoral joint tissues from knee MRI using MtRA-Unet and incorporating shape information: Data from the Osteoarthritis Initiative | Akshay Daydar et.al. | 2401.12932v1 | null |
2024-01-23 | pyAKI - An Open Source Solution to Automated KDIGO classification | Christian Porschen et.al. | 2401.12930v1 | null |
2024-01-23 | Performance Analysis of Support Vector Machine (SVM) on Challenging Datasets for Forest Fire Detection | Ankan Kar et.al. | 2401.12924v1 | null |
2024-01-23 | Advancing Glitch Classification in Gravity Spy: Multi-view Fusion with Attention-based Machine Learning for Advanced LIGO's Fourth Observing Run | Yunan Wu et.al. | 2401.12913v1 | null |
2024-01-22 | Connecting the Dots: Leveraging Spatio-Temporal Graph Neural Networks for Accurate Bangla Sign Language Recognition | Haz Sameen Shahgir et.al. | 2401.12210v1 | null |
2024-01-22 | Unsupervised Machine Learning for the Classification of Astrophysical X-ray Sources | Víctor Samuel Pérez-Díaz et.al. | 2401.12203v1 | link |
2024-01-22 | OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics | Peiqi Liu et.al. | 2401.12202v1 | null |
2024-01-22 | In-Context Learning for Extreme Multi-Label Classification | Karel D'Oosterlinck et.al. | 2401.12178v1 | null |
2024-01-22 | Broiler-Net: A Deep Convolutional Framework for Broiler Behavior Analysis in Poultry Houses | Tahereh Zarrat Ehsan et.al. | 2401.12176v1 | link |
2024-01-22 | VRMN-bD: A Multi-modal Natural Behavior Dataset of Immersive Human Fear Responses in VR Stand-up Interactive Games | He Zhang et.al. | 2401.12133v1 | link |
2024-01-22 | Evaluation of QCNN-LSTM for Disability Forecasting in Multiple Sclerosis Using Sequential Multisequence MRI | John D. Mayfield et.al. | 2401.12132v1 | null |
2024-01-22 | Out-of-Distribution Detection & Applications With Ablated Learned Temperature Energy | Will LeVine et.al. | 2401.12129v1 | link |
2024-01-22 | Measures of the Capital Network of the U.S. Economy | Ben Klemens et.al. | 2401.12118v1 | null |
2024-01-22 | A quantitative version of the Steinhaus theorem | Alex Iosevich et.al. | 2401.12112v1 | null |
2024-01-19 | Classifying affine structures with focus-focus singularities | Xiudi Tang et.al. | 2401.10881v1 | null |
2024-01-19 | Motion Consistency Loss for Monocular Visual Odometry with Attention-Based Deep Learning | André O. Françani et.al. | 2401.10857v1 | null |
2024-01-19 | Emotion Classification In Software Engineering Texts: A Comparative Analysis of Pre-trained Transformers Language Models | Mia Mohammad Imran et.al. | 2401.10845v1 | null |
2024-01-19 | Understanding Video Transformers via Universal Concept Discovery | Matthew Kowal et.al. | 2401.10831v1 | null |
2024-01-19 | Long-Term Monitoring of the Oe Star VES 735: Ope! Not So Quiet After All | Brandon Marshall et.al. | 2401.10829v1 | null |
2024-01-19 | ActAnywhere: Subject-Aware Video Background Generation | Boxiao Pan et.al. | 2401.10822v1 | null |
2024-01-19 | RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision | Fernando Pérez-García et.al. | 2401.10815v1 | null |
2024-01-19 | Learning to Visually Connect Actions and their Effects | Eric Peh et.al. | 2401.10805v1 | null |
2024-01-19 | Endovascular Detection of Catheter-Thrombus Contact by Vacuum Excitation | Jared Lawson et.al. | 2401.10804v1 | null |
2024-01-19 | TDC-less Direct Time-of-Flight Imaging Using Spiking Neural Networks | Jack MacLean et.al. | 2401.10793v1 | null |
2024-01-18 | Simultaneous Tactile Estimation and Control for Extrinsic Dexterity | Antonia Bronars et.al. | 2401.10230v1 | null |
2024-01-18 | OMG-Seg: Is One Model Good Enough For All Segmentation? | Xiangtai Li et.al. | 2401.10229v1 | link |
2024-01-18 | RAP-SAM: Towards Real-Time All-Purpose Segment Anything | Shilin Xu et.al. | 2401.10228v1 | link |
2024-01-18 | Towards Language-Driven Video Inpainting via Multimodal Large Language Models | Jianzong Wu et.al. | 2401.10226v1 | null |
2024-01-18 | Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions | Namitha Padmanabhan et.al. | 2401.10217v1 | null |
2024-01-18 | Transfer Learning in Human Activity Recognition: A Survey | Sourish Gunesh Dhekane et.al. | 2401.10185v1 | null |
2024-01-18 | SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild | Andreas Engelhardt et.al. | 2401.10171v1 | null |
2024-01-19 | Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation | Changgu Chen et.al. | 2401.10150v2 | null |
2024-01-18 | Few-shot learning for COVID-19 Chest X-Ray Classification with Imbalanced Data: An Inter vs. Intra Domain Study | Alejandro Galán-Cuenca et.al. | 2401.10129v1 | null |
2024-01-18 | Sub2Full: split spectrum to boost OCT despeckling without clean data | Lingyun Wang et.al. | 2401.10128v1 | link |
2024-01-17 | Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model | Lianghui Zhu et.al. | 2401.09417v1 | link |
2024-01-17 | Vlogger: Make Your Dream A Vlog | Shaobin Zhuang et.al. | 2401.09414v1 | link |
2024-01-17 | Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text | Mazal Bethany et.al. | 2401.09407v1 | null |
2024-01-17 | Élivágar: Efficient Quantum Circuit Search for Classification | Sashwat Anagolum et.al. | 2401.09393v1 | null |
2024-01-17 | Tri$^{2}$-plane: Volumetric Avatar Reconstruction with Feature Pyramid | Luchuan Song et.al. | 2401.09386v1 | link |
2024-01-17 | New relations of pod partition and its connection with other partition functions | Hemjyoti Nath et.al. | 2401.09374v1 | null |
2024-01-17 | To deform or not: treatment-aware longitudinal registration for breast DCE-MRI during neoadjuvant chemotherapy via unsupervised keypoints detection | Luyi Han et.al. | 2401.09336v1 | link |
2024-01-17 | Machines Do See Color: A Guideline to Classify Different Forms of Racist Discourse in Large Corpora | Diana Davila Gordillo et.al. | 2401.09333v1 | null |
2024-01-17 | Spectral Distribution Complexity of the Surface Fibrillatory Waves Predicts Post-Catheter Ablation Relapse in Persistent Atrial Fibrillation | Pilar Escribano et.al. | 2401.09297v1 | null |
2024-01-17 | T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis | Yoonjin Chung et.al. | 2401.09294v1 | null |
2024-01-16 | From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers | Jiu Feng et.al. | 2401.08415v1 | null |
2024-01-16 | Faster ISNet for Background Bias Mitigation on Deep Neural Networks | Pedro R. A. S. Bassi et.al. | 2401.08409v1 | null |
2024-01-16 | Training and Comparison of nnU-Net and DeepMedic Methods for Autosegmentation of Pediatric Brain Tumors | Arastoo Vossough et.al. | 2401.08404v1 | null |
2024-01-16 | High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering | Xin Ming et.al. | 2401.08398v1 | null |
2024-01-16 | DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models | Zongxin Yang et.al. | 2401.08392v1 | link |
2024-01-16 | We don't need no labels: Estimating post-deployment model performance under covariate shift without ground truth | Jakub Białek et.al. | 2401.08348v1 | null |
2024-01-16 | Learn What You Need in Personalized Federated Learning | Kexin Lv et.al. | 2401.08327v1 | link |
2024-01-16 | Application of LLM Agents in Recruitment: A Novel Framework for Resume Screening | Chengguang Gan et.al. | 2401.08315v1 | null |
2024-01-16 | Central extensions of restricted Lie superalgebras and classification of | Sofiane Bouarroudj et.al. | 2401.08313v1 | null |
2024-01-16 | Evaluating online elasticity estimation of soft objects using standard robot grippers | Shubhan P. Patni et.al. | 2401.08298v1 | null |
2024-01-16 | Multitask Learning in Minimally Invasive Surgical Vision: A Review | Oluwatosin Alabi et.al. | 2401.08256v1 | null |
2024-01-16 | Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization | Chongzhi Zhang et.al. | 2401.08232v1 | null |
2024-01-16 | Towards Causal Relationship in Indefinite Data: Baseline Model and New Datasets | Hang Chen et.al. | 2401.08221v1 | link |
2024-01-16 | Ship Detection in SAR Images with Human-in-the-Loop | Hecheng Jia et.al. | 2401.08213v1 | null |
2024-01-16 | ModelNet-O: A Large-Scale Synthetic Dataset for Occlusion-Aware Point Cloud Classification | Zhongbin Fang et.al. | 2401.08210v1 | link |
2024-01-12 | Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements | Anton Voronov et.al. | 2401.06766v1 | null |
2024-01-12 | Classification of singularities of cluster algebras of finite type II: coefficients | Angélica Benito et.al. | 2401.06758v1 | null |
2024-01-12 | Synthetic Data Generation Framework, Dataset, and Efficient Deep Model for Pedestrian Intention Prediction | Muhammad Naveed Riaz et.al. | 2401.06757v1 | null |
2024-01-12 | Stylometry Analysis of Multi-authored Documents for Authorship and Author Style Change Detection | Muhammad Tayyab Zamir et.al. | 2401.06752v1 | null |
2024-01-12 | Efficient Parallel Algorithms for Inpainting-Based Representations of 4K Images -- Part II: Spatial and Tonal Data Optimization | Niklas Kämper et.al. | 2401.06747v1 | null |
2024-01-12 | Efficient Parallel Algorithms for Inpainting-Based Representations of 4K Images -- Part I: Homogeneous Diffusion Inpainting | Niklas Kämper et.al. | 2401.06744v1 | null |
2024-01-12 | Complexity Classification of Product State Problems for Local Hamiltonians | John Kallaugher et.al. | 2401.06725v1 | null |
2024-01-12 | Obstacle-Aware Positioning of a Mobile Robotic Platform for 6G Networks | Alexandre Costa et.al. | 2401.06717v1 | null |
2024-01-12 | Reliability Analysis of Psychological Concept Extraction and Classification in User-penned Text | Muskan Garg et.al. | 2401.06709v1 | null |
2024-01-12 | On the existence of charged electrostatic black holes in arbitrary topology | Martin Reiris et.al. | 2401.06702v1 | null |
2024-01-11 | Distilling Vision-Language Models on Millions of Videos | Yue Zhao et.al. | 2401.06129v1 | null |
2024-01-11 | Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors | Jack Saunders et.al. | 2401.06126v1 | null |
2024-01-11 | Gaussian Shadow Casting for Neural Characters | Luis Bolanos et.al. | 2401.06116v1 | null |
2024-01-11 | A Closer Look at AUROC and AUPRC under Class Imbalance | Matthew B. A. McDermott et.al. | 2401.06091v1 | link |
2024-01-12 | LEGO:Language Enhanced Multi-modal Grounding Model | Zhaowei Li et.al. | 2401.06071v2 | link |
2024-01-11 | On the Power of Graph Neural Networks and Feature Augmentation Strategies to Classify Social Networks | Walid Guettala et.al. | 2401.06048v1 | null |
2024-01-11 | RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks | Partha Ghosh et.al. | 2401.06035v1 | null |
2024-01-11 | Attention to detail: inter-resolution knowledge distillation | Rocío del Amor et.al. | 2401.06010v1 | link |
2024-01-11 | Sea ice detection using concurrent multispectral and synthetic aperture radar imagery | Martin S J Rogers et.al. | 2401.06009v1 | null |
2024-01-11 | Boosting Mixed-Initiative Co-Creativity in Game Design: A Tutorial | Solange Margarido et.al. | 2401.05999v1 | null |
2024-01-10 | Towards Online Sign Language Recognition and Translation | Ronglai Zuo et.al. | 2401.05336v1 | link |
2024-01-10 | ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video | Kevin Cai et.al. | 2401.05314v1 | link |
2024-01-10 | Strategic Client Selection to Address Non-IIDness in HAPS-enabled FL Networks | Amin Farajzadeh et.al. | 2401.05308v1 | null |
2024-01-10 | Frame-like Fourier expansions for finite Borel measures on | Chad Berner et.al. | 2401.05243v1 | null |
2024-01-10 | Learning effective good variables from physical data | Giulio Barletta et.al. | 2401.05226v1 | link |
2024-01-10 | TOVAC: Tele-operated Vehicle Admission Control and Routing | Jorge Martín-Pérez et.al. | 2401.05225v1 | null |
2024-01-10 | Do Vision and Language Encoders Represent the World Similarly? | Mayug Maniparambil et.al. | 2401.05224v1 | null |
2024-01-10 | Exploring Vulnerabilities of No-Reference Image Quality Assessment Models: A Query-Based Black-Box Method | Chenxi Yang et.al. | 2401.05217v1 | null |
2024-01-10 | Pre-trained Large Language Models for Financial Sentiment Analysis | Wei Luo et.al. | 2401.05215v1 | link |
2024-01-10 | A Novel Prompt-tuning Method: Incorporating Scenario-specific Concepts into a Verbalizer | Yong Ma et.al. | 2401.05204v1 | null |
2024-01-09 | A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars | Ronglai Zuo et.al. | 2401.04730v1 | link |
2024-01-09 | U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation | Jun Ma et.al. | 2401.04722v1 | null |
2024-01-09 | Helicoidal surfaces of prescribed mean curvature in | Aires Eduardo Menani Barbieri et.al. | 2401.04721v1 | null |
2024-01-09 | Low-resource finetuning of foundation models beats state-of-the-art in histopathology | Benedikt Roth et.al. | 2401.04720v1 | null |
2024-01-09 | Jump Cut Smoothing for Talking Heads | Xiaojuan Wang et.al. | 2401.04718v1 | null |
2024-01-09 | NIPn CHIPS | Blaise Boissonneau et.al. | 2401.04697v1 | null |
2024-01-09 | CoordGate: Efficiently Computing Spatially-Varying Convolutions in Convolutional Neural Networks | Sunny Howard et.al. | 2401.04680v1 | null |
2024-01-09 | Benchmark Analysis of Various Pre-trained Deep Learning Models on ASSIRA Cats and Dogs Dataset | Galib Muhammad Shahriar Himel et.al. | 2401.04666v1 | null |
2024-01-09 | DepressionEmo: A novel dataset for multilabel classification of depression emotions | Abu Bakar Siddiqur Rahman et.al. | 2401.04655v1 | link |
2024-01-09 | Hold 'em and Fold 'em: Towards Human-scale, Feedback-Controlled Soft Origami Robots | Immanuel Ampomah Mensah et.al. | 2401.04650v1 | null |
2024-01-08 | Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning | Chen Zhao et.al. | 2401.04105v1 | null |
2024-01-08 | RudolfV: A Foundation Model by Pathologists for Pathologists | Jonas Dippel et.al. | 2401.04079v1 | null |
2024-01-08 | Variance Reduction in Ratio Metrics for Efficient Online Experiments | Shubham Baweja et.al. | 2401.04062v1 | null |
2024-01-08 | Bjøntegaard Delta (BD): A Tutorial Overview of the Metric, Evolution, Challenges, and Recommendations | Nabajeet Barman et.al. | 2401.04039v1 | null |
2024-01-08 | Blocks whose defect groups are Suzuki | Charles W. Eaton et.al. | 2401.04028v1 | null |
2024-01-08 | IDoFew: Intermediate Training Using Dual-Clustering in Language Models for Few Labels Text Classification | Abdullah Alsuhaibani et.al. | 2401.04025v1 | null |
2024-01-08 | Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification | Wentao Zhu et.al. | 2401.04023v1 | null |
2024-01-08 | Resident space object detection method based on the connection between Fourier spectrum of the video data difference frame and the linear velocity projection | V. S. Baranova et.al. | 2401.04021v1 | null |
2024-01-09 | Recognizing Blazars Using Radio Morphology from the VLA Sky Survey | Zhang-Liang Xie et.al. | 2401.04009v2 | null |
2024-01-08 | Calabi-Yau Varieties via Cyclic Covers, and Complex Hyperbolic Structures for their Moduli Spaces | Chenglong Yu et.al. | 2401.04006v1 | null |
2024-01-05 | Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively | Haobo Yuan et.al. | 2401.02955v1 | link |
2024-01-05 | The Dark Energy Survey Supernova Program: Cosmological Analysis and Systematic Uncertainties | M. Vincenzi et.al. | 2401.02945v1 | null |
2024-01-05 | Digital-analog quantum learning on Rydberg atom arrays | Jonathan Z. Lu et.al. | 2401.02940v1 | null |
2024-01-05 | Mixing Magnetic and Electric Ehlers-Harrison transformations: The Electromagnetic Swirling Spacetime and Novel Type I Backgrounds | José Barrientos et.al. | 2401.02924v1 | null |
2024-01-05 | Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks | Kevin Everson et.al. | 2401.02921v1 | null |
2024-01-05 | Analytically-Driven Resource Management for Cloud-Native Microservices | Yanqi Zhang et.al. | 2401.02920v1 | null |
2024-01-05 | Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task | Gabriel Lino Garcia et.al. | 2401.02909v1 | null |
2024-01-05 | Robust Bichromatic Classification using Two Lines | Erwin Glazenburg et.al. | 2401.02897v1 | null |
2024-01-05 | Particle-Wise Higher-Order SPH Field Approximation for DVR | Jonathan Fischer et.al. | 2401.02896v1 | null |
2024-01-05 | Nonlinear functional regression by functional deep neural network with kernel embedding | Zhongjie Shi et.al. | 2401.02890v1 | null |
2024-01-04 | asimulation: Domain formation and impact on observables in resolved cosmological simulations of the (a)symmetron | Øyvind Christiansen et.al. | 2401.02410v1 | link |
2024-01-04 | Gravitational waves from dark domain walls | Øyvind Christiansen et.al. | 2401.02409v1 | link |
2024-01-05 | Correctness Comparison of ChatGPT-4, Bard, Claude-2, and Copilot for Spatial Tasks | Hartwig H. Hochmair et.al. | 2401.02404v2 | null |
2024-01-04 | 3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation | Zihao Xiao et.al. | 2401.02402v1 | null |
2024-01-04 | Analyzing Misinformation Claims During the 2022 Brazilian General Election on WhatsApp, Twitter, and Kwai | Scott A. Hale et.al. | 2401.02395v1 | null |
2024-01-04 | Image denoising and model-independent parameterization for improving IVIM MRI | Caleb Sample et.al. | 2401.02394v1 | null |
2024-01-04 | Survey of 3D Human Body Pose and Shape Estimation Methods for Contemporary Dance Applications | Darshan Venkatrayappa et.al. | 2401.02383v1 | null |
2024-01-04 | A novel method to enhance pneumonia detection via a model-level ensembling of CNN and vision transformer | Sandeep Angara et.al. | 2401.02358v1 | null |
2024-01-04 | ClassWise-SAM-Adapter: Parameter Efficient Fine-tuning Adapts Segment Anything to SAR Domain for Semantic Segmentation | Xinyang Pu et.al. | 2401.02326v1 | link |
2024-01-04 | Reflection physics in X-ray-emitting Symbiotic Stars | Jesús A. Toalá et.al. | 2401.02318v1 | null |
2024-01-03 | Profinite equivariant spectra and their tensor-triangular geometry | Scott Balchin et.al. | 2401.01878v1 | null |
2024-01-03 | A spatial mixture model for spaceborne lidar observations over mixed forest and non-forest land types | Paul B. May et.al. | 2401.01848v1 | null |
2024-01-03 | Teaching with a companion: the case of gravity | Iuliia Zhurakovskaia et.al. | 2401.01832v1 | null |
2024-01-03 | Iterative Mask Filling: An Effective Text Augmentation Method Using Masked Language Modeling | Himmet Toprak Kesgin et.al. | 2401.01830v1 | null |
2024-01-03 | Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions | David Junhao Zhang et.al. | 2401.01827v1 | link |
2024-01-03 | Detours for Navigating Instructional Videos | Kumar Ashutosh et.al. | 2401.01823v1 | null |
2024-01-03 | SENS3: Multisensory Database of Finger-Surface Interactions and Corresponding Sensations | Jagan K. Balasubramanian et.al. | 2401.01818v1 | null |
2024-01-03 | Signal Processing in the Retina: Interpretable Graph Classifier to Predict Ganglion Cell Responses | Yasaman Parhizkar et.al. | 2401.01813v1 | null |
2024-01-03 | Efficient Computation of Confidence Sets Using Classification on Equidistributed Grids | Lujie Zhou et.al. | 2401.01804v1 | null |
2024-01-03 | An experimental sorting method for improving metagenomic data encoding | Diogo Pratas et.al. | 2401.01786v1 | null |
2024-01-02 | Street Gaussians for Modeling Dynamic Urban Scenes | Yunzhi Yan et.al. | 2401.01339v1 | null |
2024-01-02 | Classifying Words with 3-sort Automata | Tomasz Jastrząb et.al. | 2401.01314v1 | null |
2024-01-03 | A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models | S. M Towhidul Islam Tonmoy et.al. | 2401.01313v2 | null |
2024-01-02 | Integrating Edges into U-Net Models with Explainable Activation Maps for Brain Tumor Segmentation using MR Images | Subin Sahayam et.al. | 2401.01303v1 | null |
2024-01-02 | | Nicola Novello et.al. | 2401.01268v1 | link |
2024-01-02 | VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM | Fuchen Long et.al. | 2401.01256v1 | null |
2024-01-02 | An operational approach to classifying measurement incompatibility | Arun Kumar Das et.al. | 2401.01236v1 | null |
2024-01-03 | Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond | Dimitrios Kollias et.al. | 2401.01219v2 | null |
2024-01-02 | FGENet: Fine-Grained Extraction Network for Congested Crowd Counting | Hao-Yuan Ma et.al. | 2401.01208v1 | null |
2024-01-02 | Whole-examination AI estimation of fetal biometrics from 20-week ultrasound scans | Lorenzo Venturini et.al. | 2401.01201v1 | null |
2023-12-29 | Computational Tools for Trees in Gauge Theory and Gravity | Jacob L. Bourjaily et.al. | 2312.17745v1 | null |
2023-12-29 | Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization | Ioanna Ntinou et.al. | 2312.17686v1 | null |
2023-12-29 | Malware Detection in IOT Systems Using Machine Learning Techniques | Ali Mehrban et.al. | 2312.17683v1 | null |
2023-12-29 | FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis | Feng Liang et.al. | 2312.17681v1 | null |
2023-12-29 | Grasping, Part Identification, and Pose Refinement in One Shot with a Tactile Gripper | Joyce Xin-Yan Lim et.al. | 2312.17650v1 | null |
2023-12-29 | MoD2T:Model-Data-Driven Motion-Static Object Tracking Method | Yang Feng et.al. | 2312.17641v1 | null |
2023-12-29 | A New Explanation of the Mechanism of Hadley Circulation | Wei Huang et.al. | 2312.17637v1 | null |
2023-12-29 | Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training | Dongfang Li et.al. | 2312.17591v1 | null |
2023-12-29 | A Tool for the Procedural Generation of Shaders using Interactive Evolutionary Algorithms | Elio Sasso et.al. | 2312.17587v1 | link |
2023-12-29 | Distribution-based Low-rank Embedding | Bardia Yousefi et.al. | 2312.17579v1 | null |
2023-12-28 | A Simple LLM Framework for Long-Range Video Question-Answering | Ce Zhang et.al. | 2312.17235v1 | null |
2023-12-28 | 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency | Yuyang Yin et.al. | 2312.17225v1 | null |
2023-12-28 | EFHQ: Multi-purpose ExtremePose-Face-HQ dataset | Trung Tuan Dao et.al. | 2312.17205v1 | null |
2023-12-28 | One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts | Ziheng Zhao et.al. | 2312.17183v1 | null |
2023-12-28 | Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action | Jiasen Lu et.al. | 2312.17172v1 | null |
2023-12-28 | Classification of multiplication modules over multiplication rings with finitely many minimal primes | Volodymyr Bavula et.al. | 2312.17170v1 | null |
2023-12-28 | Securing NextG Systems against Poisoning Attacks on Federated Learning: A Game-Theoretic Solution | Yalin E. Sagduyu et.al. | 2312.17164v1 | null |
2023-12-28 | Replica Tree-based Federated Learning using Limited Data | Ramona Ghilea et.al. | 2312.17159v1 | null |
2023-12-29 | ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe | Yifan Bai et.al. | 2312.17133v2 | null |
2023-12-28 | Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos | Houlun Chen et.al. | 2312.17117v1 | null |
2023-12-26 | Microwave signal processing using an analog quantum reservoir computer | Alen Senanian et.al. | 2312.16166v1 | null |
2023-12-26 | Large-scale Long-tailed Disease Diagnosis on Radiology Images | Qiaoyu Zheng et.al. | 2312.16151v1 | null |
2023-12-27 | The Media Bias Taxonomy: A Systematic Literature Review on the Forms and Automated Detection of Media Bias | Timo Spinde et.al. | 2312.16148v2 | link |
2023-12-26 | The non-Abelian Aharonov-Bohm effect | P. A. Horvathy et.al. | 2312.16133v1 | null |
2023-12-26 | LangSplat: 3D Language Gaussian Splatting | Minghan Qin et.al. | 2312.16084v1 | null |
2023-12-26 | AdaNAS: Adaptively Post-processing with Self-supervised Neural Architecture Search for Ensemble Rainfall Forecasts | Yingpeng Wen et.al. | 2312.16046v1 | null |
2023-12-26 | An extended asymmetric sigmoid with Perceptron (SIGTRON) for imbalanced linear classification | Hyenkyun Woo et.al. | 2312.16043v1 | null |
2023-12-26 | Multi-scale Progressive Feature Embedding for Accurate NIR-to-RGB Spectral Domain Translation | Xingxing Yang et.al. | 2312.16040v1 | null |
2023-12-26 | Plug-and-Play Regularization on Magnitude with Deep Priors for 3D Near-Field MIMO Imaging | Okyanus Oral et.al. | 2312.16024v1 | null |
2023-12-26 | Classification of positive solutions of Hardy-Sobolev equation without the finite volume constraints | Lu Chen et.al. | 2312.16017v1 | null |
2023-12-25 | Training Convolutional Neural Networks with the Forward-Forward algorithm | Riccardo Scodellaro et.al. | 2312.14924v2 | null |
2023-12-22 | DRStageNet: Deep Learning for Diabetic Retinopathy Staging from Fundus Images | Yevgeniy Men et.al. | 2312.14891v1 | null |
2023-12-22 | On rate-optimal classification from non-private and from private data | Balázs Csanád Csáji et.al. | 2312.14889v1 | null |
2023-12-22 | Classification of cubic tricirculant nut graphs | Ivan Damnjanović et.al. | 2312.14884v1 | null |
2023-12-22 | Neural-network-based regularization methods for inverse problems in imaging | Andreas Habring et.al. | 2312.14849v1 | null |
2023-12-22 | Classification of 3-GNDB Graphs | Amir Hosseini et.al. | 2312.14835v1 | null |
2023-12-22 | Dreaming of Electrical Waves: Generative Modeling of Cardiac Excitation Waves using Diffusion Models | Tanish Baranwal et.al. | 2312.14830v1 | null |
2023-12-22 | Classification of generalised higher-order Einstein-Maxwell Lagrangians | Aimeric Colléaux et.al. | 2312.14814v1 | null |
2023-12-22 | On support vector machines under a multiple-cost scenario | Sandra Benítez-Peña et.al. | 2312.14795v1 | null |
2023-12-22 | The Rate-Distortion-Perception-Classification Tradeoff: Joint Source Coding and Modulation via Inverse-Domain GANs | Junli Fang et.al. | 2312.14792v1 | null |
2023-12-21 | 3D Pose Estimation of Two Interacting Hands from a Monocular Event Camera | Christen Millerdurai et.al. | 2312.14157v1 | null |
2023-12-21 | Virtual Pets: Animatable Animal Generation in 3D Scenes | Yen-Chi Cheng et.al. | 2312.14154v1 | null |
2023-12-21 | TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification | Qinying Liu et.al. | 2312.14149v1 | link |
2023-12-21 | HeadCraft: Modeling High-Detail Shape Variations for Animated 3DMMs | Artem Sevastopolsky et.al. | 2312.14140v1 | null |
2023-12-21 | Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach | Qinying Liu et.al. | 2312.14138v1 | link |
2023-12-21 | Diffusion Reward: Learning Rewards via Conditional Video Diffusion | Tao Huang et.al. | 2312.14134v1 | null |
2023-12-21 | WellFactor: Patient Profiling using Integrative Embedding of Healthcare Data | Dongjin Choi et.al. | 2312.14129v1 | null |
2023-12-21 | VideoPoet: A Large Language Model for Zero-Shot Video Generation | Dan Kondratyuk et.al. | 2312.14125v1 | null |
2023-12-21 | LingoQA: Video Question Answering for Autonomous Driving | Ana-Maria Marcu et.al. | 2312.14115v1 | link |
2023-12-21 | LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding | Senqiao Yang et.al. | 2312.14074v1 | null |
2023-12-20 | Deep Learning on 3D Neural Fields | Pierluigi Zama Ramirez et.al. | 2312.13277v1 | null |
2023-12-20 | The 1/4-BPS building blocks of brane interactions | Ben Eckardt et.al. | 2312.13269v1 | null |
2023-12-20 | ClassLIE: Structure- and Illumination-Adaptive Classification for Low-Light Image Enhancement | Zixiang Wei et.al. | 2312.13265v1 | null |
2023-12-20 | Putting the p back in Prym | Jeff Achter et.al. | 2312.13263v1 | null |
2023-12-20 | The role of data embedding in equivariant quantum convolutional neural networks | Sreetama Das et.al. | 2312.13250v1 | null |
2023-12-20 | Enhancing Neural Training via a Correlated Dynamics Model | Jonathan Brokman et.al. | 2312.13247v1 | null |
2023-12-20 | SISMIK for brain MRI: Deep-learning-based motion estimation and model-based motion correction in k-space | Oscar Dabrowski et.al. | 2312.13220v1 | null |
2023-12-20 | Boost recall in QSO selection from highly imbalanced photometric datasets | Giorgio Calderone et.al. | 2312.13194v1 | null |
2023-12-20 | Ergodic measures for periodic type | Yuriy Tumarkin et.al. | 2312.13165v1 | null |
2023-12-20 | Underwater Acoustic Signal Recognition Based on Salient Features | Minghao Chen et.al. | 2312.13143v1 | null |
2023-12-19 | Tracking Any Object Amodally | Cheng-Yen Hsieh et.al. | 2312.12433v1 | null |
2023-12-19 | The Endoscapes Dataset for Surgical Scene Segmentation, Object Detection, and Critical View of Safety Assessment: Official Splits and Benchmark | Aditya Murali et.al. | 2312.12429v1 | null |
2023-12-19 | Chasing Fairness in Graphs: A GNN Architecture Perspective | Zhimeng Jiang et.al. | 2312.12369v1 | link |
2023-12-19 | Easy quantum groups | Teo Banica et.al. | 2312.12368v1 | null |
2023-12-19 | SMC-NCA: Semantic-guided Multi-level Contrast for Semi-supervised Action Segmentation | Feixiang Zhou et.al. | 2312.12347v1 | null |
2023-12-19 | On the Effectiveness of Retrieval, Alignment, and Replay in Manipulation | Norman Di Palo et.al. | 2312.12345v1 | null |
2023-12-19 | Full-reference Video Quality Assessment for User Generated Content Transcoding | Zihao Qi et.al. | 2312.12317v1 | null |
2023-12-19 | First qualitative observations on deep learning vision model YOLO and DETR for automated driving in Austria | Stefan Schoder et.al. | 2312.12314v1 | null |
2023-12-19 | Holography of New Conformal Higher Spin Gravities in 3d | I. Lovrekovic et.al. | 2312.12301v1 | null |
2023-12-19 | Prompt-based Domain Discrimination for Multi-source Time Series Domain Adaptation | Junxiang Wang et.al. | 2312.12276v1 | null |
2023-12-18 | Development and Evaluation of Ensemble Learning-based Environmental Methane Detection and Intensity Prediction Models | Reek Majumder et.al. | 2312.10879v1 | null |
2023-12-18 | Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation | Hui Fu et.al. | 2312.10877v1 | null |
2023-12-17 | Global relaxation-based LP-Newton method for multiple hyperparameter selection in support vector classification with feature selection | Qingna Li et.al. | 2312.10848v1 | null |
2023-12-17 | Online Boosting Adaptive Learning under Concept Drift for Multistream Classification | En Yu et.al. | 2312.10841v1 | null |
2023-12-17 | Learning to Act without Actions | Dominik Schmidt et.al. | 2312.10812v1 | null |
2023-12-17 | Land use/land cover classification of fused Sentinel-1 and Sentinel-2 imageries using ensembles of Random Forests | Shivam Pande et.al. | 2312.10798v1 | null |
2023-12-17 | Learning to Learn in Interactive Constraint Acquisition | Dimos Tsouros et.al. | 2312.10795v1 | null |
2023-12-17 | Identification of Knowledge Neurons in Protein Language Models | Divya Nori et.al. | 2312.10770v1 | null |
2023-12-17 | Multi-Label Classification of COVID-Tweets Using Large Language Models | Aniket Deroy et.al. | 2312.10748v1 | link |
2023-12-17 | Unmasking Deepfake Faces from Videos Using An Explainable Cost-Sensitive Deep Learning Approach | Faysal Mahmud et.al. | 2312.10740v1 | link |
2023-12-15 | Understanding Probe Behaviors through Variational Bounds of Mutual Information | Kwanghee Choi et.al. | 2312.10019v1 | link |
2023-12-15 | Wearable Coaxially-shielded Metamaterial for Magnetic Resonance Imaging | Xia Zhu et.al. | 2312.10018v1 | null |
2023-12-15 | On the Invertibility of Euler Integral Transforms with Hyperplanes and Quadric Hypersurfaces | Mattie Ji et.al. | 2312.10002v1 | null |
2023-12-15 | Towards Architecture-Insensitive Untrained Network Priors for Accelerated MRI Reconstruction | Yilin Liu et.al. | 2312.09988v1 | null |
2023-12-15 | DHFormer: A Vision Transformer-Based Attention Module for Image Dehazing | Abdul Wasi et.al. | 2312.09955v1 | null |
2023-12-15 | Multi-level graph learning for audio event classification and human-perceived annoyance rating prediction | Yuanbo Hou et.al. | 2312.09952v1 | null |
2023-12-15 | LogoStyleFool: Vitiating Video Recognition Systems via Logo Style Transfer | Yuxin Cao et.al. | 2312.09935v1 | link |
2023-12-15 | RDR: the Recap, Deliberate, and Respond Method for Enhanced Language Understanding | Yuxin Zi et.al. | 2312.09932v1 | null |
2023-12-15 | Reliable Probabilistic Classification with Neural Networks | Harris Papadopoulos et.al. | 2312.09912v1 | null |
2023-12-15 | TMP: Temporal Motion Propagation for Online Video Super-Resolution | Zhengqiang Zhang et.al. | 2312.09909v1 | null |
2023-12-14 | 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting | Zhiyin Qian et.al. | 2312.09228v1 | null |
2023-12-14 | Efficient Online Learning of Contact Force Models for Connector Insertion | Kevin Tracy et.al. | 2312.09190v1 | null |
2023-12-14 | General Object Foundation Model for Images and Videos at Scale | Junfeng Wu et.al. | 2312.09158v1 | null |
2023-12-14 | Evaluating Augmented Reality Communication: How Can We Teach Procedural Skill in AR? | Manuel Rebol et.al. | 2312.09152v1 | null |
2023-12-14 | Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting | Anthony Chen et.al. | 2312.09148v1 | null |
2023-12-14 | Class-Wise Buffer Management for Incremental Object Detection: An Effective Buffer Training Strategy | Junsu Kim et.al. | 2312.09139v1 | null |
2023-12-14 | Less is more -- the Dispatcher/ Executor principle for multi-task Reinforcement Learning | Martin Riedmiller et.al. | 2312.09120v1 | null |
2023-12-14 | VideoLCM: Video Latent Consistency Model | Xiang Wang et.al. | 2312.09109v1 | null |
2023-12-14 | FastInject: Injecting Unpaired Text Data into CTC-based ASR training | Keqi Deng et.al. | 2312.09100v1 | null |
2023-12-14 | Agent Attention: On the Integration of Softmax and Linear Attention | Dongchen Han et.al. | 2312.08874v1 | link |
2023-12-13 | VLAP: Efficient Video-Language Alignment via Frame Prompting and Distilling for Video Question Answering | Xijun Wang et.al. | 2312.08367v1 | null |
2023-12-13 | Challenges and Opportunities in Implementing Negative Differential Resistance Mode Reconfigurable Field Effect Transistors | Lephe S et.al. | 2312.08351v1 | null |
2023-12-13 | Ehancing CT Image synthesis from multi-modal MRI data based on a multi-task neural network framework | Zhuoyao Xin et.al. | 2312.08343v1 | null |
2023-12-13 | Preparing VVC for Streaming: A Fast Multi-Rate Encoding Approach | Yiqun Liu et.al. | 2312.08330v1 | null |
2023-12-13 | Affine monoids of corank one | Yulia Zaitseva et.al. | 2312.08316v1 | null |
2023-12-13 | VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space | Guénolé Fiche et.al. | 2312.08291v1 | null |
2023-12-13 | PhenDiff: Revealing Invisible Phenotypes with Conditional Diffusion Models | Anis Bourou et.al. | 2312.08290v1 | link |
2023-12-13 | On the verification of Embeddings using Hybrid Markov Logic | Anup Shakya et.al. | 2312.08287v1 | null |
2023-12-14 | High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models | Songchi Zhou et.al. | 2312.08274v2 | null |
2023-12-13 | Efficient Multi-Object Pose Estimation using Multi-Resolution Deformable Attention and Query Aggregation | Arul Selvam Periyasamy et.al. | 2312.08268v1 | null |
2023-12-12 | diff History for Long-Context Language Agents | Ulyana Piterbarg et.al. | 2312.07540v1 | null |
2023-12-12 | FreeInit: Bridging Initialization Gap in Video Diffusion Models | Tianxing Wu et.al. | 2312.07537v1 | link |
2023-12-12 | WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion | Soyong Shin et.al. | 2312.07531v1 | null |
2023-12-12 | RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation | Peng Lu et.al. | 2312.07526v1 | link |
2023-12-12 | PEEKABOO: Interactive Video Generation via Masked-Diffusion | Yash Jain et.al. | 2312.07509v1 | null |
2023-12-12 | NAC-TCN: Temporal Convolutional Networks with Causal Dilated Neighborhood Attention for Emotion Understanding | Alexander Mehta et.al. | 2312.07507v1 | link |
2023-12-12 | COLMAP-Free 3D Gaussian Splatting | Yang Fu et.al. | 2312.07504v1 | null |
2023-12-12 | NearbyPatchCL: Leveraging Nearby Patches for Self-Supervised Patch-Level Multi-Class Classification in Whole-Slide Images | Gia-Bao Le et.al. | 2312.07489v1 | null |
2023-12-12 | MinD-3D: Reconstruct High-quality 3D objects in Human Brain | Jianxiong Gao et.al. | 2312.07485v1 | null |
2023-12-12 | Classification of retail products: From probabilistic ranking to neural networks | Manar Mohamed Hafez et.al. | 2312.07482v1 | null |
2023-12-11 | Photorealistic Video Generation with Diffusion Models | Agrim Gupta et.al. | 2312.06662v1 | null |
2023-12-11 | LightSim: Neural Lighting Simulation for Urban Scenes | Ava Pun et.al. | 2312.06654v1 | null |
2023-12-11 | Beyond Classification: Definition and Density-based Estimation of Calibration in Object Detection | Teodora Popordanoska et.al. | 2312.06645v1 | null |
2023-12-11 | Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution | Shangchen Zhou et.al. | 2312.06640v1 | null |
2023-12-12 | TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation | Rongkun Zheng et.al. | 2312.06630v2 | link |
2023-12-11 | Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism | Georgios Milis et.al. | 2312.06613v1 | link |
2023-12-11 | Early Action Recognition with Action Prototypes | Guglielmo Camporese et.al. | 2312.06598v1 | null |
2023-12-11 | Flexible visual prompts for in-context learning in computer vision | Thomas Foster et.al. | 2312.06592v1 | link |
2023-12-11 | QuickQuakeBuildings: Post-earthquake SAR-Optical Dataset for Quick Damaged-building Detection | Yao Sun et.al. | 2312.06587v1 | null |
2023-12-12 | ESO/HARPS Radial Velocities Catalog | Mauro Barbieri et.al. | 2312.06586v2 | null |
2023-12-08 | The Long Secondary Period (LSP) Variables: Overview and Some Analysis | John R. Percy et.al. | 2312.05255v1 | null |
2023-12-08 | Few-Shot Class-Incremental Learning via Training-Free Prototype Calibration | Qi-Wei Wang et.al. | 2312.05229v1 | null |
2023-12-08 | Shape Matters: Detecting Vertebral Fractures Using Differentiable Point-Based Shape Decoding | Hellena Hempe et.al. | 2312.05220v1 | link |
2023-12-08 | Enhancing Facial Classification and Recognition using 3D Facial Models and Deep Learning | Houting Li et.al. | 2312.05219v1 | null |
2023-12-08 | IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing | Shaofei Wang et.al. | 2312.05210v1 | null |
2023-12-08 | Embedding theory in ML toward real-time tracking of structural dynamics through hyperspectral datasets | Jonathan D Hollenbach et.al. | 2312.05201v1 | null |
2023-12-08 | Video-Based Rendering Techniques: A Survey | Rafael Kuffner dos Anjos et.al. | 2312.05179v1 | null |
2023-12-08 | Enhancing Single-Frame Supervision for Better Temporal Action Localization | Changjian Chen et.al. | 2312.05178v1 | null |
2023-12-08 | MRI Scan Synthesis Methods based on Clustering and Pix2Pix | Giulia Baldini et.al. | 2312.05176v1 | null |
2023-12-08 | TriHuman : A Real-time and Controllable Tri-plane Representation for Detailed Human Geometry and Appearance Synthesis | Heming Zhu et.al. | 2312.05161v1 | null |
2023-12-07 | GenDeF: Learning Generative Deformation Field for Video Generation | Wen Wang et.al. | 2312.04561v1 | null |
2023-12-07 | MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar | Yufan Chen et.al. | 2312.04558v1 | null |
2023-12-07 | GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation | Shoufa Chen et.al. | 2312.04557v1 | null |
2023-12-07 | SPIDeRS: Structured Polarization for Invisible Depth and Reflectance Sensing | Tomoki Ichikawa et.al. | 2312.04553v1 | null |
2023-12-07 | PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play | Lili Chen et.al. | 2312.04549v1 | null |
2023-12-07 | Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? | Aritra Dutta et.al. | 2312.04548v1 | null |
2023-12-07 | Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models | Ivan Kapelyukh et.al. | 2312.04533v1 | null |
2023-12-07 | Camera Height Doesn't Change: Unsupervised Monocular Scale-Aware Road-Scene Depth Estimation | Genki Kinoshita et.al. | 2312.04530v1 | null |
2023-12-07 | RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models | Ozgur Kara et.al. | 2312.04524v1 | link |
2023-12-07 | Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation | Zhiwu Qing et.al. | 2312.04483v1 | null |
2023-12-06 | OneLLM: One Framework to Align All Modalities with Language | Jiaming Han et.al. | 2312.03700v1 | link |
2023-12-07 | Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers | Umberto Cappellazzo et.al. | 2312.03694v2 | null |
2023-12-06 | Direct Exoplanet Detection Using Deep Convolutional Image Reconstruction (ConStruct): A New Algorithm for Post-Processing High-Contrast Images | Trevor N. Wolf et.al. | 2312.03671v1 | null |
2023-12-06 | Annihilating branching Brownian motion | Daniel Ahlberg et.al. | 2312.03669v1 | null |
2023-12-06 | Towards small and accurate convolutional neural networks for acoustic biodiversity monitoring | Serge Zaugg et.al. | 2312.03666v1 | null |
2023-12-06 | Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving | Ming Nie et.al. | 2312.03661v1 | link |
2023-12-06 | Editable Stain Transformation Of Histological Images Using Unpaired GANs | Tibor Sloboda et.al. | 2312.03647v1 | link |
2023-12-06 | MotionCtrl: A Unified and Flexible Motion Controller for Video Generation | Zhouxia Wang et.al. | 2312.03641v1 | null |
2023-12-06 | Training Neural Networks on RAW and HDR Images for Restoration Tasks | Lei Luo et.al. | 2312.03640v1 | link |
2023-12-07 | Evaluation of Active Feature Acquisition Methods for Static Feature Settings | Henrik von Kleist et.al. | 2312.03619v2 | null |
2023-12-05 | Dexterous Functional Grasping | Ananye Agarwal et.al. | 2312.02975v1 | null |
2023-12-05 | Describing Differences in Image Sets with Natural Language | Lisa Dunlap et.al. | 2312.02974v1 | link |
2023-12-05 | GauHuman: Articulated Gaussian Splatting from Monocular Human Videos | Shoukang Hu et.al. | 2312.02973v1 | link |
2023-12-05 | Detecting algorithmic bias in medical AI-models | Jeffrey Smith et.al. | 2312.02959v1 | null |
2023-12-05 | Classification for everyone : Building geography agnostic models for fairer recognition | Akshat Jindal et.al. | 2312.02957v1 | null |
2023-12-05 | Choroidalyzer: An open-source, end-to-end pipeline for choroidal analysis in optical coherence tomography | Justin Engelmann et.al. | 2312.02956v1 | null |
2023-12-05 | An alternating peak-optimization method for optimal trajectory generation of quadrotor drones | Wytze A. B. de Vries et.al. | 2312.02944v1 | null |
2023-12-05 | Fast CT anatomic localization algorithm | Amit Oved et.al. | 2312.02941v1 | null |
2023-12-05 | Drag-A-Video: Non-rigid Video Editing with Point-based Interaction | Yao Teng et.al. | 2312.02936v1 | null |
2023-12-06 | WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation | Jiachen Lu et.al. | 2312.02934v2 | link |
2023-12-04 | iMatching: Imperative Correspondence Learning | Zitong Zhan et.al. | 2312.02141v1 | null |
2023-12-04 | Fast View Synthesis of Casual Videos | Yao-Chih Lee et.al. | 2312.02135v1 | null |
2023-12-04 | GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians | Liangxiao Hu et.al. | 2312.02134v1 | null |
2023-12-04 | Hot PATE: Private Aggregation of Distributions for Diverse Task | Edith Cohen et.al. | 2312.02132v1 | null |
2023-12-04 | Can we truly transfer an actor's genuine happiness to avatars? An investigation into virtual, real, posed and spontaneous faces | Vitor Miguel Xavier Peres et.al. | 2312.02128v1 | null |
2023-12-04 | Cosmic star-formation history and black hole accretion history inferred from the JWST mid-infrared source counts | Seong Jin Kim et.al. | 2312.02090v1 | null |
2023-12-05 | VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence | Yuchao Gu et.al. | 2312.02087v2 | null |
2023-12-04 | Integrating AI into CCTV Systems: A Comprehensive Evaluation of Smart Video Surveillance in Community Space | Shanle Yao et.al. | 2312.02078v1 | null |
2023-12-04 | GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians | Shenhan Qian et.al. | 2312.02069v1 | null |
2023-12-04 | TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Shuhuai Ren et.al. | 2312.02051v1 | null |
2023-12-01 | Dense Optical Tracking: Connecting the Dots | Guillaume Le Moing et.al. | 2312.00786v1 | null |
2023-12-01 | Sequential Modeling Enables Scalable Learning for Large Vision Models | Yutong Bai et.al. | 2312.00785v1 | null |
2023-12-01 | MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video | Hengyi Wang et.al. | 2312.00778v1 | null |
2023-12-01 | VideoBooth: Diffusion-based Video Generation with Image Prompts | Yuming Jiang et.al. | 2312.00777v1 | null |
2023-12-01 | Towards Generalizable Zero-Shot Manipulation via Translating Human Interaction Plans | Homanga Bharadhwaj et.al. | 2312.00775v1 | null |
2023-12-01 | Explaining Knock-on Effects of Bias Mitigation | Svetoslav Nizhnichenkov et.al. | 2312.00765v1 | null |
2023-12-04 | Deep Unlearning: Fast and Efficient Training-free Approach to Controlled Forgetting | Sangamesh Kodge et.al. | 2312.00761v2 | null |
2023-12-01 | Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals | Tam Nguyen et.al. | 2312.00751v1 | null |
2023-12-01 | Tight-minimal dichotomies in Banach spaces | Alejandra C. Cáceres-Rigo et.al. | 2312.00721v1 | null |
2023-12-01 | GIFT: Generative Interpretable Fine-Tuning Transformers | Chinmay Savadikar et.al. | 2312.00700v1 | link |
2023-11-30 | Just Add | Dominick Reilly et.al. | 2311.18840v1 | null |
2023-11-30 | TrafficMOT: A Challenging Dataset for Multi-Object Tracking in Complex Traffic Scenarios | Lihao Liu et.al. | 2311.18839v1 | null |
2023-11-30 | VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models | Zhen Xing et.al. | 2311.18837v1 | null |
2023-11-30 | ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models | Wenming Weng et.al. | 2311.18834v1 | null |
2023-11-30 | MotionEditor: Editing Video Motion via Content-Aware Diffusion | Shuyuan Tu et.al. | 2311.18830v1 | link |
2023-11-30 | MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation | Yanhui Wang et.al. | 2311.18829v1 | null |
2023-11-30 | Motion-Conditioned Image Animation for Video Editing | Wilson Yan et.al. | 2311.18827v1 | null |
2023-11-30 | CAST: Cross-Attention in Space and Time for Video Action Recognition | Dongho Lee et.al. | 2311.18825v1 | link |
2023-11-30 | Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking | Kaifeng Lyu et.al. | 2311.18817v1 | link |
2023-11-30 | BIOCLIP: A Vision Foundation Model for the Tree of Life | Samuel Stevens et.al. | 2311.18803v1 | null |
2023-11-30 | Do text-free diffusion models learn discriminative visual representations? | Soumik Mukhopadhyay et.al. | 2311.17921v2 | null |
2023-11-29 | Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving | Yuqi Wang et.al. | 2311.17918v1 | link |
2023-11-29 | HUGS: Human Gaussian Splats | Muhammed Kocabas et.al. | 2311.17910v1 | null |
2023-11-29 | SODA: Bottleneck Diffusion Models for Representation Learning | Drew A. Hudson et.al. | 2311.17901v1 | null |
2023-11-30 | Knowledge Pursuit Prompting for Zero-Shot Multimodal Synthesis | Jinqi Luo et.al. | 2311.17898v2 | null |
2023-11-29 | On the geometry of tensor products over finite fields | Stefano Lia et.al. | 2311.17896v1 | null |
2023-11-29 | Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation | Shuangrui Ding et.al. | 2311.17893v1 | null |
2023-11-29 | TSDF-Sampling: Efficient Sampling for Neural Surface Field using Truncated Signed Distance Field | Chaerin Min et.al. | 2311.17878v1 | null |
2023-11-29 | Enhancing Post-Hoc Explanation Benchmark Reliability for Image Classification | Tristan Gomez et.al. | 2311.17876v1 | null |
2023-11-29 | On the Adversarial Robustness of Graph Contrastive Learning Methods | Filippo Guerranti et.al. | 2311.17853v1 | null |
2023-11-28 | Panoptic Video Scene Graph Generation | Jingkang Yang et.al. | 2311.17058v1 | link |
2023-11-28 | Self-Supervised Motion Magnification by Backpropagating Through Optical Flow | Zhaoying Pan et.al. | 2311.17056v1 | null |
2023-11-28 | MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training | Pavan Kumar Anasosalu Vasu et.al. | 2311.17049v1 | null |
2023-11-28 | Jets of foliations and | Francis Bischoff et.al. | 2311.17045v1 | null |
2023-11-28 | LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models | Yanwei Li et.al. | 2311.17043v1 | link |
2023-11-29 | Efficient In-Context Learning in Vision-Language Models for Egocentric Videos | Keunwoo Peter Yu et.al. | 2311.17041v2 | null |
2023-11-28 | Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer | Danah Yatim et.al. | 2311.17009v1 | null |
2023-11-28 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | Kunchang Li et.al. | 2311.17005v1 | link |
2023-11-28 | Mirković-Vilonen Polytopes from Combinatorics | Mario Sanchez et.al. | 2311.16979v1 | null |
2023-11-28 | Natural Language Processing Through Transfer Learning: A Case Study on Sentiment Analysis | Aman Yadav et.al. | 2311.16965v1 | null |
2023-11-28 | Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models | Munan Ning et.al. | 2311.16103v2 | link |
2023-11-27 | GART: Gaussian Articulated Template Models | Jiahui Lei et.al. | 2311.16099v1 | null |
2023-11-27 | On Bringing Robots Home | Nur Muhammad Mahi Shafiullah et.al. | 2311.16098v1 | link |
2023-11-27 | CG-HOI: Contact-Guided 3D Human-Object Interaction Generation | Christian Diller et.al. | 2311.16097v1 | null |
2023-11-27 | Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling | Zhe Li et.al. | 2311.16096v1 | link |
2023-11-27 | Three-dimensional | Alexander C. Tyner et.al. | 2311.16092v1 | null |
2023-11-27 | BERT Goes Off-Topic: Investigating the Domain Transfer Challenge using Genre Classification | Dmitri Roussinov et.al. | 2311.16083v1 | link |
2023-11-27 | ViT-Lens-2: Gateway to Omni-modal Intelligence | Weixian Lei et.al. | 2311.16081v1 | link |
2023-11-27 | Correlated Spectral and Recurrence Variations of Cygnus X-1 | E. M. Broadbent et.al. | 2311.16070v1 | null |
2023-11-27 | DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization | Zhaoyang Xia et.al. | 2311.16060v1 | link |
2023-11-24 | SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation | Lingchen Meng et.al. | 2311.14671v1 | link |
2023-11-24 | JetLOV: Enhancing Jet Tree Tagging through Neural Network Learning of Optimal LundNet Variables | Mauricio A. Diaz et.al. | 2311.14654v1 | link |
2023-11-24 | Learning in Deep Factor Graphs with Gaussian Belief Propagation | Seth Nabarro et.al. | 2311.14649v1 | null |
2023-11-24 | Continuous football player tracking from discrete broadcast data | Matthew J. Penn et.al. | 2311.14642v1 | null |
2023-11-24 | Emergent Topology in Many-Body Dissipative Quantum Chaos | Antonio M. García-García et.al. | 2311.14640v1 | null |
2023-11-24 | Unsupervised high-throughput segmentation of cells and cell nuclei in quantitative phase images | Julia Sistermanns et.al. | 2311.14639v1 | null |
2023-11-24 | ARIA: On the interaction between Architectures, Aggregation methods and Initializations in federated visual classification | Vasilis Siomos et.al. | 2311.14625v1 | null |
2023-11-24 | Neural Style Transfer for Computer Games | Eleftherios Ioannou et.al. | 2311.14617v1 | null |
2023-11-24 | Animate124: Animating One Image to 4D Dynamic Scene | Yuyang Zhao et.al. | 2311.14603v1 | null |
2023-11-24 | A Metalearned Neural Circuit for Nonparametric Bayesian Inference | Jake C. Snell et.al. | 2311.14601v1 | link |
2023-11-22 | WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space | Katja Schwarz et.al. | 2311.13570v1 | null |
2023-11-22 | Belted sum decompositions of fully augmented links | Porter Morgan et.al. | 2311.13540v1 | null |
2023-11-22 | Learned Nonlinear Predictor for Critically Sampled 3D Point Cloud Attribute Compression | Tam Thuc Do et.al. | 2311.13539v1 | null |
2023-11-22 | Leveraging CNNs and Ensemble Learning for Automated Disaster Image Classification | Archit Rathod et.al. | 2311.13531v1 | null |
2023-11-22 | Applying Dimensionality Reduction as Precursor to LSTM-CNN Models for Classifying Imagery and Motor Signals in ECoG-Based BCIs | Soham Bafana et.al. | 2311.13507v1 | link |
2023-11-22 | Current Topological and Machine Learning Applications for Bias Detection in Text | Colleen Farrelly et.al. | 2311.13495v1 | null |
2023-11-22 | Benchmarking Toxic Molecule Classification using Graph Neural Networks and Few Shot Learning | Bhavya Mehta et.al. | 2311.13490v1 | null |
2023-11-22 | Deep-learning-based acceleration of MRI for radiotherapy planning of pediatric patients with brain tumors | Shahinur Alam et.al. | 2311.13485v1 | link |
2023-11-22 | Solution discovery via reconfiguration for problems in P | Mario Grobler et.al. | 2311.13478v1 | null |
2023-11-22 | Experimentation in Early-Stage Video Game Startups: Practices and Challenges | Henry Edison et.al. | 2311.13462v1 | null |
2023-11-21 | Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models | David Stotko et.al. | 2311.12796v1 | null |
2023-11-21 | Quantifying Impairment and Disease Severity Using AI Models Trained on Healthy Subjects | Boyang Yu et.al. | 2311.12781v1 | link |
2023-11-21 | Swift Parameter-free Attention Network for Efficient Super-Resolution | Cheng Wan et.al. | 2311.12770v1 | link |
2023-11-22 | Investigating Weight-Perturbed Deep Neural Networks With Application in Iris Presentation Attack Detection | Renu Sharma et.al. | 2311.12764v2 | link |
2023-11-21 | High-resolution Image-based Malware Classification using Multiple Instance Learning | Tim Peters et.al. | 2311.12760v1 | link |
2023-11-21 | SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction | Yuanhui Huang et.al. | 2311.12754v1 | link |
2023-11-21 | Image Transformation for IoT Time-Series Data: A Review | Duygu Altunkaya et.al. | 2311.12742v1 | null |
2023-11-21 | Exploring Graph Classification Techniques Under Low Data Constraints: A Comprehensive Study | Kush Kothari et.al. | 2311.12737v1 | null |
2023-11-21 | Not Just Training, Also Testing: High School Youths' Perspective-Taking through Peer Testing Machine Learning-Powered Applications | L. Morales-Navarro et.al. | 2311.12733v1 | null |
2023-11-21 | Cascade Learning Localises Discriminant Features in Visual Scene Classification | Junwen Wang et.al. | 2311.12704v1 | null |
2023-11-20 | Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation | Wenhao Li et.al. | 2311.12028v1 | null |
2023-11-20 | GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration | Naoki Wake et.al. | 2311.12015v1 | null |
2023-11-20 | Evaluating Supervision Levels Trade-Offs for Infrared-Based People Counting | David Latortue et.al. | 2311.11974v1 | null |
2023-11-20 | SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks | Jin Ye et.al. | 2311.11969v1 | link |
2023-11-20 | Correlated Attention in Transformers for Multivariate Time Series | Quang Minh Nguyen et.al. | 2311.11959v1 | null |
2023-11-20 | Tubular Curvature Filter: Implicit Pointwise Curvature Calculation Method for Tubular Objects | Elifnur Sunger et.al. | 2311.11931v1 | null |
2023-11-20 | LLMs as Visual Explainers: Advancing Image Classification with Evolving Visual Descriptions | Songhao Han et.al. | 2311.11904v1 | null |
2023-11-20 | Multimodal Characterization of Emotion within Multimedia Space | Dayo Samuel Banjo et.al. | 2311.11892v1 | null |
2023-11-20 | SniffyArt: The Dataset of Smelling Persons | Mathias Zinnen et.al. | 2311.11888v1 | null |
2023-11-20 | Multi-Task Faces (MTF) Data Set: A Legally and Ethically Compliant Collection of Face Images for Various Classification Tasks | Rami Haffar et.al. | 2311.11882v1 | link |
2023-11-17 | Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning | Rohit Girdhar et.al. | 2311.10709v1 | null |
2023-11-17 | SpACNN-LDVAE: Spatial Attention Convolutional Latent Dirichlet Variational Autoencoder for Hyperspectral Pixel Unmixing | Soham Chitnis et.al. | 2311.10701v1 | null |
2023-11-17 | A note on the convergence of the Bayesian entropy estimator for exchangeable partitions | Servet Martinez et.al. | 2311.10698v1 | null |
2023-11-17 | Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections | Lihan Zha et.al. | 2311.10678v1 | link |
2023-11-17 | 3D-TexSeg: Unsupervised Segmentation of 3D Texture using Mutual Transformer Learning | Iyyakutti Iyappan Ganapathi et.al. | 2311.10651v1 | null |
2023-11-17 | User Dynamics-Aware Edge Caching and Computing for Mobile Virtual Reality | Mushu Li et.al. | 2311.10645v1 | null |
2023-11-17 | Image-Domain Material Decomposition for Dual-energy CT using Unsupervised Learning with Data-fidelity Loss | Junbo Peng et.al. | 2311.10641v1 | null |
2023-11-17 | Scaling TabPFN: Sketching and Feature Selection for Tabular Prior-Data Fitted Networks | Benjamin Feuer et.al. | 2311.10609v1 | null |
2023-11-17 | Designing Reconfigurable Intelligent Systems with Markov Blankets | Boris Sedlak et.al. | 2311.10597v1 | null |
2023-11-17 | FOCAL: A Cost-Aware Video Dataset for Active Learning | Kiran Kokilepersaud et.al. | 2311.10591v1 | link |
2023-11-16 | Traffic Video Object Detection using Motion Prior | Lihao Liu et.al. | 2311.10092v1 | null |
2023-11-16 | Moduli space of rank three logarithmic connections on the projective line with three poles | Takafumi Matsumoto et.al. | 2311.10071v1 | null |
2023-11-16 | Inherently Interpretable Time Series Classification via Multiple Instance Learning | Joseph Early et.al. | 2311.10049v1 | link |
2023-11-16 | On the potential of Carbon-Enhanced Metal-Poor stars for Galactic Archaeology | Aruna Goswami et.al. | 2311.10043v1 | null |
2023-11-16 | Match and Locate: low-frequency monocular odometry based on deep feature matching | Stepan Konev et.al. | 2311.10034v1 | null |
2023-11-16 | Revolutionizing Customer Interactions: Insights and Challenges in Deploying ChatGPT and Generative Chatbots for FAQs | Feriel Khennouche et.al. | 2311.09976v1 | null |
2023-11-16 | From Pretext to Purpose: Batch-Adaptive Self-Supervised Learning | Jiansong Zhang et.al. | 2311.09974v1 | null |
2023-11-16 | VertDetect: Fully End-to-End 3D Vertebral Instance Segmentation Model | Geoff Klein et.al. | 2311.09958v1 | null |
2023-11-16 | Harnessing Transformers: A Leap Forward in Lung Cancer Image Detection | Amine Bechar et.al. | 2311.09942v1 | null |
2023-11-17 | A Framework for Monitoring and Retraining Language Models in Real-World Applications | Jaykumar Kasundra et.al. | 2311.09930v2 | null |
2023-11-15 | Single-Image 3D Human Digitization with Shape-Guided Diffusion | Badour AlBahar et.al. | 2311.09221v1 | null |
2023-11-15 | ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy | Kirill Vishniakov et.al. | 2311.09215v1 | link |
2023-11-15 | Topology of Pulsar Profiles (ToPP). I. Graph theory method and classification of the EPN | D. Vohl et.al. | 2311.09201v1 | null |
2023-11-15 | ExpM+NF: Differentially Private Machine Learning that Surpasses DPSGD | Robert A. Bridges et.al. | 2311.09200v1 | null |
2023-11-15 | Domain Aligned CLIP for Few-shot Classification | Muhammad Waleed Gondal et.al. | 2311.09191v1 | null |
2023-11-15 | ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models | Jierui Li et.al. | 2311.09182v1 | null |
2023-11-15 | RBPGAN: Recurrent Back-Projection GAN for Video Super Resolution | Dareen Hussein et.al. | 2311.09178v1 | null |
2023-11-15 | Model Agnostic Explainable Selective Regression via Uncertainty Estimation | Andrea Pugnana et.al. | 2311.09145v1 | null |
2023-11-15 | Explainable Text Classification Techniques in Legal Document Review: Locating Rationales without Using Human Annotated Training Text Snippets | Christian Mahoney et.al. | 2311.09133v1 | null |
2023-11-15 | Cross-view and Cross-pose Completion for 3D Human Understanding | Matthieu Armando et.al. | 2311.09104v1 | null |
2023-11-14 | MVSA-Net: Multi-View State-Action Recognition for Robust and Deployable Trajectory Generation | Ehsan Asali et.al. | 2311.08393v1 | null |
2023-11-14 | USLR: an open-source tool for unbiased and smooth longitudinal registration of brain MR | Adrià Casamitjana et.al. | 2311.08371v1 | link |
2023-11-14 | Inverse Learning with Extremely Sparse Feedback for Recommendation | Guanyu Lin et.al. | 2311.08302v1 | null |
2023-11-14 | Level Set KSVD | Omer Sapir et.al. | 2311.08284v1 | null |
2023-11-14 | TENT: Connect Language Models with IoT Sensors for Zero-Shot Activity Recognition | Yunjiao Zhou et.al. | 2311.08245v1 | null |
2023-11-14 | MCMC to address model misspecification in Deep Learning classification of Radio Galaxies | Devina Mohan et.al. | 2311.08243v1 | null |
2023-11-14 | Learning Physics-Inspired Regularization for Medical Image Registration with Hypernetworks | Anna Reithmeir et.al. | 2311.08239v1 | link |
2023-11-14 | Counterfactual Explanation for Regression via Disentanglement in Latent Space | Xuan Zhao et.al. | 2311.08228v1 | null |
2023-11-14 | Uni-COAL: A Unified Framework for Cross-Modality Synthesis and Super-Resolution of MR Images | Zhiyun Song et.al. | 2311.08225v1 | null |
2023-11-14 | Eval-GCSC: A New Metric for Evaluating ChatGPT's Performance in Chinese Spelling Correction | Kunting Li et.al. | 2311.08219v1 | link |
2023-11-13 | GPT-4V(ision) as A Social Media Analysis Engine | Hanjia Lyu et.al. | 2311.07547v1 | link |
2023-11-13 | mlscorecheck: Testing the consistency of reported performance scores and experiments in machine learning | György Kovács et.al. | 2311.07541v1 | null |
2023-11-13 | FEMDA: a unified framework for discriminant analysis | Pierre Houdouin et.al. | 2311.07518v1 | null |
2023-11-13 | Reducing the Need for Backpropagation and Discovering Better Optima With Explicit Optimizations of Neural Networks | Jake Ryland Williams et.al. | 2311.07498v1 | null |
2023-11-13 | Towards Robotic Tree Manipulation: Leveraging Graph Representations | Chung Hee Kim et.al. | 2311.07479v1 | null |
2023-11-13 | Temporal Performance Prediction for Deep Convolutional Long Short-Term Memory Networks | Laura Fieback et.al. | 2311.07477v1 | null |
2023-11-13 | Masked Face Dataset Generation and Masked Face Recognition | Rui Cai et.al. | 2311.07475v1 | link |
2023-11-13 | A Bayesian Approach to Strong Lens Finding in the Era of Wide-area Surveys | Philip Holloway et.al. | 2311.07455v1 | null |
2023-11-13 | On the Robustness of Neural Collapse and the Neural Collapse of Robustness | Jingtong Su et.al. | 2311.07444v1 | null |
2023-11-13 | Optimising Human-AI Collaboration by Learning Convincing Explanations | Alex J. Chan et.al. | 2311.07426v1 | null |
2023-11-10 | Learning Human Action Recognition Representations Without Real Humans | Howard Zhong et.al. | 2311.06231v1 | link |
2023-11-10 | Semantic-aware Video Representation for Few-shot Action Recognition | Yutao Tang et.al. | 2311.06218v1 | null |
2023-11-10 | MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things | Shentong Mo et.al. | 2311.06217v1 | null |
2023-11-10 | Deep learning segmentation of fibrous cap in intravascular optical coherence tomography images | Juhwan Lee et.al. | 2311.06202v1 | null |
2023-11-10 | An Automated Pipeline for Tumour-Infiltrating Lymphocyte Scoring in Breast Cancer | Adam J Shephard et.al. | 2311.06185v1 | link |
2023-11-10 | Automatic Report Generation for Histopathology images using pre-trained Vision Transformers | Saurav Sengupta et.al. | 2311.06176v1 | null |
2023-11-10 | Two vertex geometrically irreducible algebras | Grzegorz Bobinski et.al. | 2311.06173v1 | null |
2023-11-10 | Time Scale Network: A Shallow Neural Network For Time Series Data | Trevor Meyer et.al. | 2311.06170v1 | null |
2023-11-10 | Deep Fast Vision: A Python Library for Accelerated Deep Transfer Learning Vision Prototyping | Fabi Prezja et.al. | 2311.06169v1 | link |
2023-11-10 | Going beyond persistent homology using persistent homology | Johanna Immonen et.al. | 2311.06152v1 | null |
2023-11-09 | FogROS2-Sky: Optimizing Latency and Cost for Multi-Cloud Robot Applications | Kaiyuan Chen et.al. | 2311.05600v1 | null |
2023-11-09 | A Coefficient Makes SVRG Effective | Yida Yin et.al. | 2311.05589v1 | link |
2023-11-09 | Outlier-Robust Wasserstein DRO | Sloan Nietert et.al. | 2311.05573v1 | link |
2023-11-09 | Exploring Emotion Expression Recognition in Older Adults Interacting with a Virtual Coach | Cristina Palmero et.al. | 2311.05567v1 | null |
2023-11-09 | Disentangling Quantum and Classical Contributions in Hybrid Quantum Machine Learning Architectures | Michael Kölle et.al. | 2311.05559v1 | null |
2023-11-09 | L-WaveBlock: A Novel Feature Extractor Leveraging Wavelets for Generative Adversarial Networks | Mirat Shah et.al. | 2311.05548v1 | null |
2023-11-09 | BakedAvatar: Baking Neural Fields for Real-Time Head Avatar Synthesis | Hao-Bin Duan et.al. | 2311.05521v1 | null |
2023-11-09 | Dirichlet Active Learning | Kevin Miller et.al. | 2311.05501v1 | null |
2023-11-09 | Retinal OCT Synthesis with Denoising Diffusion Probabilistic Models for Layer Segmentation | Yuli Wu et.al. | 2311.05479v1 | null |
2023-11-09 | Robust Retraining-free GAN Fingerprinting via Personalized Normalization | Jianwei Fei et.al. | 2311.05478v1 | null |
2023-11-08 | Towards Few-Annotation Learning in Computer Vision: Application to Image Classification and Object Detection tasks | Quentin Bouniot et.al. | 2311.04888v1 | null |
2023-11-08 | Are foundation models efficient for medical image segmentation? | Danielle Ferreira et.al. | 2311.04847v1 | null |
2023-11-08 | Bayesian multi-band fitting of alerts for kilonovae detection | Biswajit Biswas et.al. | 2311.04845v1 | null |
2023-11-08 | Hierarchically Gated Recurrent Neural Network for Sequence Modeling | Zhen Qin et.al. | 2311.04823v1 | link |
2023-11-08 | A Lightweight Architecture for Real-Time Neuronal-Spike Classification | Muhammad Ali Siddiqi et.al. | 2311.04808v1 | null |
2023-11-08 | Determination of toxic comments and unintended model bias minimization using Deep learning approach | Md Azim Khan et.al. | 2311.04789v1 | null |
2023-11-08 | VioLA: Aligning Videos to 2D LiDAR Scans | Jun-Jee Chao et.al. | 2311.04783v1 | null |
2023-11-08 | FetMRQC: an open-source machine learning framework for multi-centric fetal brain MRI quality control | Thomas Sanchez et.al. | 2311.04780v1 | link |
2023-11-08 | GCS-ICHNet: Assessment of Intracerebral Hemorrhage Prognosis using Self-Attention with Domain Knowledge Integration | Xuhao Shan et.al. | 2311.04772v1 | link |
2023-11-08 | An attention-based deep learning network for predicting Platinum resistance in ovarian cancer | Haoming Zhuang et.al. | 2311.04769v1 | null |
2023-11-08 | Video Instance Matting | Jiachen Li et.al. | 2311.04212v2 | link |
2023-11-07 | JPAVE: A Generation and Classification-based Model for Joint Product Attribute Prediction and Value Extraction | Zhongfen Deng et.al. | 2311.04196v1 | link |
2023-11-07 | Linear to circular conversion in the polarized radio emission of a magnetar | Marcus E. Lower et.al. | 2311.04195v1 | null |
2023-11-07 | SpaDeLeF: A Dataset for Hierarchical Classification of Lexical Functions for Collocations in Spanish | Yevhen Kostiuk et.al. | 2311.04189v1 | null |
2023-11-07 | A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis | Dipanjyoti Paul et.al. | 2311.04157v1 | link |
2023-11-07 | Galaxy Spectra neural Network (GaSNet). II. Using Deep Learning for Spectral Classification and Redshift Predictions | Fucheng Zhong et.al. | 2311.04146v1 | null |
2023-11-07 | I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models | Shiwei Zhang et.al. | 2311.04145v1 | null |
2023-11-07 | Modelling Sentiment Analysis: LLMs and data augmentation techniques | Guillem Senabre Prades et.al. | 2311.04139v1 | null |
2023-11-07 | Improved Topological Preservation in 3D Axon Segmentation and Centerline Detection using Geometric Assessment-driven Topological Smoothing (GATS) | Nina I. Shamsi et.al. | 2311.04116v1 | null |
2023-11-07 | Joint modelling of recurrent and terminal events with discretely-distributed non-parametric frailty: application on re-hospitalizations and death in heart failure patients | Chiara Masci et.al. | 2311.04103v1 | null |
2023-11-06 | A Classification of Graphs through Quadratic Embedding Constants and Clique Graph Insights | Edy Tri Baskoro et.al. | 2311.03342v1 | null |
2023-11-06 | Tackling Concept Shift in Text Classification using Entailment-style Modeling | Sumegh Roychowdhury et.al. | 2311.03320v1 | null |
2023-11-06 | A Foundation Model for Music Informatics | Minz Won et.al. | 2311.03318v1 | link |
2023-11-06 | FATE: Feature-Agnostic Transformer-based Encoder for learning generalized embedding spaces in flow cytometry data | Lisa Weijler et.al. | 2311.03314v1 | link |
2023-11-06 | A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation | Qitao Zhao et.al. | 2311.03312v1 | null |
2023-11-06 | Advancing Post Hoc Case Based Explanation with Feature Highlighting | Eoin Kenny et.al. | 2311.03246v1 | null |
2023-11-06 | Machine Learning-Based Tea Leaf Disease Detection: A Comprehensive Review | Faruk Ahmed et.al. | 2311.03240v1 | null |
2023-11-06 | Out-of-distribution Detection Learning with Unreliable Out-of-distribution Sources | Haotian Zheng et.al. | 2311.03236v1 | null |
2023-11-06 | Segmentation of Drone Collision Hazards in Airborne RADAR Point Clouds Using PointNet | Hector Arroyo et.al. | 2311.03221v1 | null |
2023-11-06 | Leveraging Transformers to Improve Breast Cancer Classification and Risk Assessment with Multi-modal and Longitudinal Data | Yiqiu Shen et.al. | 2311.03217v1 | null |
2023-11-03 | LOTUS: Continual Imitation Learning for Robot Manipulation Through Unsupervised Skill Discovery | Weikang Wan et.al. | 2311.02058v1 | null |
2023-11-03 | MetaFast: Enabling Fast Metagenomic Classification via Seed Counting and Edit Distance Approximation | Arvid E. Gollwitzer et.al. | 2311.02029v1 | null |
2023-11-03 | A Structured Pruning Algorithm for Model-based Deep Learning | Chicago Park et.al. | 2311.02003v1 | null |
2023-11-03 | Detection of keratoconus Diseases using deep Learning | AKM Enzam-Ul Haque et.al. | 2311.01996v1 | null |
2023-11-03 | Obtaining Explainable Classification Models using Distributionally Robust Optimization | Sanjeeb Dash et.al. | 2311.01994v1 | null |
2023-11-03 | Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation | Shichao Dong et.al. | 2311.01989v1 | null |
2023-11-06 | RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches | Jiayuan Gu et.al. | 2311.01977v2 | null |
2023-11-03 | Welded graphs, Wirtinger groups and knotted punctured spheres | Benjamin Audoux et.al. | 2311.01922v1 | null |
2023-11-03 | Contrast-Agnostic Groupwise Registration by Robust PCA for Quantitative Cardiac MRI | Xinqi Li et.al. | 2311.01916v1 | null |
2023-11-03 | VQPy: An Object-Oriented Approach to Modern Video Analytics | Shan Yu et.al. | 2311.01623v1 | null |
2023-11-02 | Tailoring Mixup to Data using Kernel Warping functions | Quentin Bouniot et.al. | 2311.01434v1 | link |
2023-11-02 | Identifying Alzheimer Disease Dementia Levels Using Machine Learning Methods | Md Gulzar Hussain et.al. | 2311.01428v1 | null |
2023-11-02 | Exploring Deep Learning Techniques for Glaucoma Detection: A Comprehensive Review | Aized Amin Soofi et.al. | 2311.01425v1 | null |
2023-11-02 | Holistic Transfer: Towards Non-Disruptive Fine-Tuning with Partial Target Data | Cheng-Hao Tu et.al. | 2311.01420v1 | null |
2023-11-02 | Learning to See Physical Properties with Active Sensing Motor Policies | Gabriel B. Margolis et.al. | 2311.01405v1 | null |
2023-11-02 | Sim2Real Bilevel Adaptation for Object Surface Classification using Vision-Based Tactile Sensors | Gabriele M. Caddeo et.al. | 2311.01380v1 | link |
2023-11-02 | Deep learning based Image Compression for Microscopy Images: An Empirical Study | Yu Zhou et.al. | 2311.01352v1 | null |
2023-11-02 | Unreading Race: Purging Protected Features from Chest X-ray Embeddings | Tobias Weber et.al. | 2311.01349v1 | null |
2023-11-02 | Scattering Vision Transformer: Spectral Mixing Matters | Badri N. Patro et.al. | 2311.01310v1 | null |
2023-11-02 | Hybrid-Fusion Transformer for Multisequence MRI | Jihoon Cho et.al. | 2311.01308v1 | null |
2023-11-01 | Software Repositories and Machine Learning Research in Cyber Security | Mounika Vanamala et.al. | 2311.00691v1 | null |
2023-11-01 | What User Behaviors Make the Differences During the Process of Visual Analytics? | Shahin Doroudian et.al. | 2311.00690v1 | null |
2023-11-01 | Deep Learning-Based Classification of Gamma Photon Interactions in Room-Temperature Semiconductor Radiation Detectors | Sandeep K. Chaudhuri et.al. | 2311.00682v1 | null |
2023-11-01 | Latent Space Translation via Semantic Alignment | Valentino Maiorca et.al. | 2311.00664v1 | link |
2023-11-01 | Rediscussion of eclipsing binaries. Paper XV. The B-type supergiant system V1765 Cygni | John Southworth et.al. | 2311.00655v1 | null |
2023-11-02 | Emergence of Collective Open-Ended Exploration from Decentralized Meta-Reinforcement Learning | Richard Bornemann et.al. | 2311.00651v2 | null |
2023-11-01 | Understanding the Issues and Causes in WebAssembly Application Development: A Mining-based Study | Muhammad Waseem et.al. | 2311.00646v1 | null |
2023-11-01 | A Bi-level Framework for Traffic Accident Duration Prediction: Leveraging Weather and Road Condition Data within a Practical Optimum Pipeline | Rafat Tabassum Sukonna et.al. | 2311.00634v1 | null |
2023-11-01 | Controllable Music Production with Diffusion Models and Guidance Gradients | Mark Levy et.al. | 2311.00613v1 | null |
2023-11-01 | A Robust Deep Learning Method with Uncertainty Estimation for the Pathological Classification of Renal Cell Carcinoma based on CT Images | Ni Yao et.al. | 2311.00567v1 | null |
2023-10-31 | Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders | Srijan Das et.al. | 2310.20704v1 | null |
2023-10-31 | SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction | Xinyuan Chen et.al. | 2310.20700v1 | null |
2023-10-31 | StairNet: Visual Recognition of Stairs for Human-Robot Locomotion | Andrew Garrett Kurbis et.al. | 2310.20666v1 | null |
2023-10-31 | Performance Improvement in Multi-class Classification via Automated Hierarchy Generation and Exploitation through Extended LCPN Schemes | Celal Alagoz et.al. | 2310.20641v1 | null |
2023-10-31 | Deepfake detection by exploiting surface anomalies: the SurFake approach | Andrea Ciamarra et.al. | 2310.20621v1 | null |
2023-10-31 | Enhanced Synthetic MRI Generation from CT Scans Using CycleGAN with Feature Extraction | Saba Nikbakhsh et.al. | 2310.20604v1 | null |
2023-10-31 | Finiteness properties for Shimura curves and modified diagonal cycles | Congling Qiu et.al. | 2310.20600v1 | null |
2023-10-31 | Brain-like Flexible Visual Inference by Harnessing Feedback-Feedforward Alignment | Tahereh Toosi et.al. | 2310.20599v1 | link |
2023-10-31 | Tracially Complete C*-Algebras | José R. Carrión et.al. | 2310.20594v1 | null |
2023-10-31 | Strongly Magnetized Tidal Disruption Event Disks via Stream Injection in GRMHD | Brandon Curd et.al. | 2310.20592v1 | null |
2023-10-29 | Improved Motor Imagery Classification Using Adaptive Spatial Filters Based on Particle Swarm Optimization Algorithm | Xiong Xiong et.al. | 2310.19202v1 | null |
2023-10-29 | Enhancing Motor Imagery Decoding in Brain Computer Interfaces using Riemann Tangent Space Mapping and Cross Frequency Coupling | Xiong Xiong et.al. | 2310.19198v1 | null |
2023-10-29 | A Survey on Watching Social Issue Videos among YouTube and TikTok Users | Shuo Niu et.al. | 2310.19193v1 | null |
2023-10-29 | Subjective Quality Evaluation of Point Clouds Using a Head Mounted Display | Joao Prazeres et.al. | 2310.19179v1 | null |
2023-10-29 | Robustifying Language Models with Test-Time Adaptation | Noah Thomas McDermott et.al. | 2310.19177v1 | null |
2023-10-29 | Predicting recovery following stroke: deep learning, multimodal data and feature selection using explainable AI | Adam White et.al. | 2310.19174v1 | null |
2023-10-29 | BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping | Srikumar Sastry et.al. | 2310.19168v1 | link |
2023-10-29 | Unified Representation for Non-compositional and Compositional Expressions | Ziheng Zeng et.al. | 2310.19127v1 | null |
2023-10-29 | Efficient IoT Inference via Context-Awareness | Mohammad Mehdi Rastikerdar et.al. | 2310.19112v1 | null |
2023-10-29 | Pushdown Layers: Encoding Recursive Structure in Transformer Language Models | Shikhar Murty et.al. | 2310.19089v1 | null |
2023-10-27 | Addressing GAN Training Instabilities via Tunable Classification Losses | Monica Welfert et.al. | 2310.18291v1 | null |
2023-10-27 | PlantPlotGAN: A Physics-Informed Generative Adversarial Network for Plant Disease Prediction | Felipe A. Lopes et.al. | 2310.18268v1 | null |
2023-10-27 | MalFake: A Multimodal Fake News Identification for Malayalam using Recurrent Neural Networks and VGG-16 | Adhish S. Sujan et.al. | 2310.18263v1 | null |
2023-10-27 | Edge AI-Based Vein Detector for Efficient Venipuncture in the Antecubital Fossa | Edwi |