This repository hosts the code and datasets for LLaSA (Large Language and Sensor Assistant), a multimodal large language model that integrates inertial measurement unit (IMU) data with natural language understanding. Built on LIMU-BERT and Llama, LLaSA is designed to interpret and respond to complex queries about human activities and motion by combining sensor data with contextual reasoning.
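The sketch below illustrates one common way such a pipeline can be wired: embeddings from a LIMU-BERT-style IMU encoder are projected into the language model's embedding space and prepended to the text tokens. The class name, dimensions, and tensor shapes are illustrative assumptions, not LLaSA's actual implementation.

```python
import torch
import torch.nn as nn

class IMUToLlamaAdapter(nn.Module):
    """Illustrative adapter: maps IMU-encoder embeddings into the LLM's
    embedding space so sensor tokens can be prepended to text tokens."""

    def __init__(self, imu_dim: int = 72, llm_dim: int = 4096):
        super().__init__()
        # Linear projection from the IMU encoder's hidden size to the LLM's
        # hidden size (dimensions are placeholders, not LLaSA's actual config).
        self.proj = nn.Linear(imu_dim, llm_dim)

    def forward(self, imu_embeddings: torch.Tensor) -> torch.Tensor:
        # imu_embeddings: (batch, seq_len, imu_dim) from a pretrained IMU encoder
        return self.proj(imu_embeddings)

# Usage sketch: concatenate projected sensor tokens with embedded text tokens
# before feeding them to the language model.
adapter = IMUToLlamaAdapter()
imu_feats = torch.randn(1, 120, 72)      # e.g., one 120-step IMU window
sensor_tokens = adapter(imu_feats)       # (1, 120, 4096)
text_tokens = torch.randn(1, 32, 4096)   # embedded prompt tokens
llm_inputs = torch.cat([sensor_tokens, text_tokens], dim=1)
```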
- SensorCaps: A dataset of 35,960 IMU-derived activity narrations enriched with handcrafted features.
- OpenSQA: An instruction-following dataset containing 179,727 question-answer pairs, tailored for sensor- and activity-aware contexts.
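To make the dataset descriptions concrete, here is a hypothetical layout for a sensor-grounded question-answer record. The field names and values are illustrative only and may not match the released OpenSQA schema.

```python
import json

# Hypothetical OpenSQA-style record; field names are illustrative only and
# may differ from the released files.
example_record = {
    "sensor": {
        "accelerometer": [[0.02, -0.98, 0.11], [0.03, -0.97, 0.12]],  # truncated window
        "gyroscope": [[0.001, 0.004, -0.002], [0.000, 0.005, -0.001]],
        "sampling_rate_hz": 20,
    },
    "question": "Based on the IMU readings, is the person walking upstairs or on flat ground?",
    "answer": "The periodic vertical acceleration and pitch changes suggest stair ascent.",
}

print(json.dumps(example_record, indent=2))
```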
- LLaSA integrates IMU data with natural language processing capabilities, leveraging multimodal inputs for nuanced activity analysis.
- Hyperparameter tuning, guided by GPT-assisted evaluation of question-answer pairs, optimizes performance on contextual, sensor-based question answering (see the sketch after this list).
- Comprehensive evaluations, including human-led assessments, show that LLaSA outperforms GPT-3.5-Turbo and Vicuna-1.5-13b-16K in sensor-aware and context-sensitive question answering.
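The sketch below shows one way GPT-assisted evaluation of question-answer pairs could drive a hyperparameter search, using the OpenAI chat completions API as a judge. The prompt wording, rubric, scoring scale, and model name are assumptions, not the exact procedure used for LLaSA.

```python
from openai import OpenAI  # assumes openai>=1.0

client = OpenAI()

def judge_answer(question: str, reference: str, candidate: str) -> float:
    """Ask a GPT judge to rate a candidate answer against a reference.
    The prompt and 1-10 scale are assumptions, not the paper's exact rubric."""
    prompt = (
        "Rate the candidate answer against the reference on a 1-10 scale for "
        "correctness and sensor-awareness. Reply with only the number.\n"
        f"Question: {question}\nReference: {reference}\nCandidate: {candidate}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Assumes the judge replies with a bare number, as instructed.
    return float(response.choices[0].message.content.strip())

# During tuning, average judge scores over a held-out QA set and keep the
# hyperparameter configuration with the highest mean score.
```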
LLaSA is designed to support impactful research and practical applications in:
- Personal Health: Monitoring activity patterns, providing actionable insights, and assisting in wellness routines.
- Human-Computer Interaction: Context-aware assistance and enhanced user experience through activity interpretation.
- Code: Scripts for training, fine-tuning, and evaluating the LLaSA model.
- Datasets: SensorCaps and OpenSQA, available via Google Drive on request (contact: llasa.data@gmail.com).
- Documentation: Instructions for replicating experiments and integrating LLaSA into your projects.