diff --git a/contents/labs/arduino/nicla_vision/kws/kws.qmd b/contents/labs/arduino/nicla_vision/kws/kws.qmd index 08fee3e4f..f6a03dd99 100644 --- a/contents/labs/arduino/nicla_vision/kws/kws.qmd +++ b/contents/labs/arduino/nicla_vision/kws/kws.qmd @@ -267,7 +267,7 @@ void setup() Create two functions, `turn_off_leds()` function , to turn off all RGB LEDs ``` cpp -** +/* * @brief turn_off_leds function - turn-off all RGB LEDs */ void turn_off_leds(){ @@ -280,7 +280,7 @@ void turn_off_leds(){ Another `turn_on_led()` function is used to turn on the RGB LEDs according to the most probable result of the classifier. ``` cpp -/** +/* * @brief turn_on_leds function used to turn on the RGB LEDs * @param[in] pred_index * no: [0] ==> Red ON diff --git a/contents/labs/arduino/nicla_vision/nicla_vision.qmd b/contents/labs/arduino/nicla_vision/nicla_vision.qmd index c19e7b118..50990d7ce 100644 --- a/contents/labs/arduino/nicla_vision/nicla_vision.qmd +++ b/contents/labs/arduino/nicla_vision/nicla_vision.qmd @@ -6,9 +6,9 @@ These labs provide a unique opportunity to gain practical experience with machin ## Pre-requisites -- **Nicla Vision Board**: Ensure you have the Nicla Vision board. -- **USB Cable**: For connecting the board to your computer. -- **Network**: With internet access for downloading necessary software. +- **Nicla Vision Board** : Ensure you have the Nicla Vision board. +- **USB Cable** : For connecting the board to your computer. +- **Network** : With internet access for downloading necessary software. ## Setup diff --git a/contents/labs/labs.qmd b/contents/labs/labs.qmd index 10fc84749..a61d02d5b 100644 --- a/contents/labs/labs.qmd +++ b/contents/labs/labs.qmd @@ -54,15 +54,15 @@ These labs are designed for: Each lab follows a structured approach: -1. **Introduction**: Explore the application and its significance in real-world scenarios. +1. **Introduction** : Explore the application and its significance in real-world scenarios. -2. **Setup**: Step-by-step instructions to configure the hardware and software environment. +2. **Setup** : Step-by-step instructions to configure the hardware and software environment. -3. **Deployment**: Guidance on training and deploying the pre-trained ML models on supported devices. +3. **Deployment** : Guidance on training and deploying the pre-trained ML models on supported devices. -4. **Exercises**: Hands-on tasks to modify and experiment with model parameters. +4. **Exercises** : Hands-on tasks to modify and experiment with model parameters. -5. **Discussion**: Analysis of results, potential improvements, and practical insights. +5. **Discussion** : Analysis of results, potential improvements, and practical insights. ## Troubleshooting and Support diff --git a/contents/labs/raspi/image_classification/image_classification.qmd b/contents/labs/raspi/image_classification/image_classification.qmd index 472934dba..b8631dc6f 100644 --- a/contents/labs/raspi/image_classification/image_classification.qmd +++ b/contents/labs/raspi/image_classification/image_classification.qmd @@ -777,19 +777,19 @@ This Python script creates a web-based interface for capturing and organizing im #### Key Features: -1. **Web Interface**: Accessible from any device on the same network as the Raspberry Pi. -2. **Live Camera Preview**: This shows a real-time feed from the camera. -3. **Labeling System**: Allows users to input labels for different categories of images. -4. **Organized Storage**: Automatically saves images in label-specific subdirectories. -5. 
**Per-Label Counters**: Keeps track of how many images are captured for each label. -6. **Summary Statistics**: Provides a summary of captured images when stopping the capture process. +1. **Web Interface** : Accessible from any device on the same network as the Raspberry Pi. +2. **Live Camera Preview** : This shows a real-time feed from the camera. +3. **Labeling System** : Allows users to input labels for different categories of images. +4. **Organized Storage** : Automatically saves images in label-specific subdirectories. +5. **Per-Label Counters** : Keeps track of how many images are captured for each label. +6. **Summary Statistics** : Provides a summary of captured images when stopping the capture process. #### Main Components: -1. **Flask Web Application**: Handles routing and serves the web interface. -2. **Picamera2 Integration**: Controls the Raspberry Pi camera. -3. **Threaded Frame Capture**: Ensures smooth live preview. -4. **File Management**: Organizes captured images into labeled directories. +1. **Flask Web Application** : Handles routing and serves the web interface. +2. **Picamera2 Integration** : Controls the Raspberry Pi camera. +3. **Threaded Frame Capture** : Ensures smooth live preview. +4. **File Management** : Organizes captured images into labeled directories. #### Key Functions: @@ -1435,10 +1435,10 @@ The code creates a web application for real-time image classification using a Ra #### Key Components: -1. **Flask Web Application**: Serves the user interface and handles requests. -2. **PiCamera2**: Captures images from the Raspberry Pi camera module. -3. **TensorFlow Lite**: Runs the image classification model. -4. **Threading**: Manages concurrent operations for smooth performance. +1. **Flask Web Application** : Serves the user interface and handles requests. +2. **PiCamera2** : Captures images from the Raspberry Pi camera module. +3. **TensorFlow Lite** : Runs the image classification model. +4. **Threading** : Manages concurrent operations for smooth performance. #### Main Features: @@ -1491,10 +1491,10 @@ The code creates a web application for real-time image classification using a Ra #### Key Concepts: -1. **Concurrent Operations**: Using threads to handle camera capture and classification separately from the web server. -2. **Real-time Updates**: Frequent updates to the classification results without page reloads. -3. **Model Reuse**: Loading the TFLite model once and reusing it for efficiency. -4. **Flexible Configuration**: Allowing users to adjust the confidence threshold on the fly. +1. **Concurrent Operations** : Using threads to handle camera capture and classification separately from the web server. +2. **Real-time Updates** : Frequent updates to the classification results without page reloads. +3. **Model Reuse** : Loading the TFLite model once and reusing it for efficiency. +4. **Flexible Configuration** : Allowing users to adjust the confidence threshold on the fly. #### Usage: diff --git a/contents/labs/raspi/llm/llm.qmd b/contents/labs/raspi/llm/llm.qmd index 66bf85d5a..d47160908 100644 --- a/contents/labs/raspi/llm/llm.qmd +++ b/contents/labs/raspi/llm/llm.qmd @@ -46,13 +46,13 @@ GenAI provides the conceptual framework for AI-driven content creation, with LLM Large Language Models (LLMs) are advanced artificial intelligence systems that understand, process, and generate human-like text. These models are characterized by their massive scale in terms of the amount of data they are trained on and the number of parameters they contain. 
Critical aspects of LLMs include: -1. **Size**: LLMs typically contain billions of parameters. For example, GPT-3 has 175 billion parameters, while some newer models exceed a trillion parameters. +1. **Size** : LLMs typically contain billions of parameters. For example, GPT-3 has 175 billion parameters, while some newer models exceed a trillion parameters. -2. **Training Data**: They are trained on vast amounts of text data, often including books, websites, and other diverse sources, amounting to hundreds of gigabytes or even terabytes of text. +2. **Training Data** : They are trained on vast amounts of text data, often including books, websites, and other diverse sources, amounting to hundreds of gigabytes or even terabytes of text. -3. **Architecture**: Most LLMs use [transformer-based architectures](https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)), which allow them to process and generate text by paying attention to different parts of the input simultaneously. +3. **Architecture** : Most LLMs use [transformer-based architectures](https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)), which allow them to process and generate text by paying attention to different parts of the input simultaneously. -4. **Capabilities**: LLMs can perform a wide range of language tasks without specific fine-tuning, including: +4. **Capabilities** : LLMs can perform a wide range of language tasks without specific fine-tuning, including: - Text generation - Translation - Summarization @@ -60,17 +60,17 @@ Large Language Models (LLMs) are advanced artificial intelligence systems that u - Code generation - Logical reasoning -5. **Few-shot Learning**: They can often understand and perform new tasks with minimal examples or instructions. +5. **Few-shot Learning** : They can often understand and perform new tasks with minimal examples or instructions. -6. **Resource-Intensive**: Due to their size, LLMs typically require significant computational resources to run, often needing powerful GPUs or TPUs. +6. **Resource-Intensive** : Due to their size, LLMs typically require significant computational resources to run, often needing powerful GPUs or TPUs. -7. **Continual Development**: The field of LLMs is rapidly evolving, with new models and techniques constantly emerging. +7. **Continual Development** : The field of LLMs is rapidly evolving, with new models and techniques constantly emerging. -8. **Ethical Considerations**: The use of LLMs raises important questions about bias, misinformation, and the environmental impact of training such large models. +8. **Ethical Considerations** : The use of LLMs raises important questions about bias, misinformation, and the environmental impact of training such large models. -9. **Applications**: LLMs are used in various fields, including content creation, customer service, research assistance, and software development. +9. **Applications** : LLMs are used in various fields, including content creation, customer service, research assistance, and software development. -10. **Limitations**: Despite their power, LLMs can produce incorrect or biased information and lack true understanding or reasoning capabilities. +10. **Limitations** : Despite their power, LLMs can produce incorrect or biased information and lack true understanding or reasoning capabilities. We must note that we use large models beyond text, calling them *multi-modal models*. These models integrate and process information from multiple types of input simultaneously. 
They are designed to understand and generate content across various forms of data, such as text, images, audio, and video. @@ -80,7 +80,7 @@ Certainly. Let's define open and closed models in the context of AI and language **Closed models**, also called proprietary models, are AI models whose internal workings, code, and training data are not publicly disclosed. Examples: GPT-4 (by OpenAI), Claude (by Anthropic), Gemini (by Google). -**Open models**, also known as open-source models, are AI models whose underlying code, architecture, and often training data are publicly available and accessible. Examples: Gemma (by Google), LLaMA (by Meta) and Phi (by Microsoft)/ +**Open models**, also known as open-source models, are AI models whose underlying code, architecture, and often training data are publicly available and accessible. Examples: Gemma (by Google), LLaMA (by Meta) and Phi (by Microsoft). Open models are particularly relevant for running models on edge devices like Raspberry Pi as they can be more easily adapted, optimized, and deployed in resource-constrained environments. Still, it is crucial to verify their Licenses. Open models come with various open-source licenses that may affect their use in commercial applications, while closed models have clear, albeit restrictive, terms of service. @@ -94,17 +94,17 @@ SLMs are compact versions of LLMs designed to run efficiently on resource-constr Key characteristics of SLMs include: -1. **Reduced parameter count**: Typically ranging from a few hundred million to a few billion parameters, compared to two-digit billions in larger models. +1. **Reduced parameter count** : Typically ranging from a few hundred million to a few billion parameters, compared to two-digit billions in larger models. -2. **Lower memory footprint**: Requiring, at most, a few gigabytes of memory rather than tens or hundreds of gigabytes. +2. **Lower memory footprint** : Requiring, at most, a few gigabytes of memory rather than tens or hundreds of gigabytes. -3. **Faster inference time**: Can generate responses in milliseconds to seconds on edge devices. +3. **Faster inference time** : Can generate responses in milliseconds to seconds on edge devices. -4. **Energy efficiency**: Consuming less power, making them suitable for battery-powered devices. +4. **Energy efficiency** : Consuming less power, making them suitable for battery-powered devices. -5. **Privacy-preserving**: Enabling on-device processing without sending data to cloud servers. +5. **Privacy-preserving** : Enabling on-device processing without sending data to cloud servers. -6. **Offline functionality**: Operating without an internet connection. +6. **Offline functionality** : Operating without an internet connection. SLMs achieve their compact size through various techniques such as knowledge distillation, model pruning, and quantization. While they may not match the broad capabilities of larger models, SLMs excel in specific tasks and domains, making them ideal for targeted applications on edge devices. @@ -120,25 +120,25 @@ For more information on SLMs, the paper, [LLM Pruning and Distillation in Practi [Ollama](https://ollama.com/) is an open-source framework that allows us to run language models (LMs), large or small, locally on our machines. Here are some critical points about Ollama: -1. **Local Model Execution**: Ollama enables running LMs on personal computers or edge devices such as the Raspi-5, eliminating the need for cloud-based API calls. +1. 
**Local Model Execution** : Ollama enables running LMs on personal computers or edge devices such as the Raspi-5, eliminating the need for cloud-based API calls. -2. **Ease of Use**: It provides a simple command-line interface for downloading, running, and managing different language models. +2. **Ease of Use** : It provides a simple command-line interface for downloading, running, and managing different language models. -3. **Model Variety**: Ollama supports various LLMs, including Phi, Gemma, Llama, Mistral, and other open-source models. +3. **Model Variety** : Ollama supports various LLMs, including Phi, Gemma, Llama, Mistral, and other open-source models. -4. **Customization**: Users can create and share custom models tailored to specific needs or domains. +4. **Customization** : Users can create and share custom models tailored to specific needs or domains. -5. **Lightweight**: Designed to be efficient and run on consumer-grade hardware. +5. **Lightweight** : Designed to be efficient and run on consumer-grade hardware. -6. **API Integration**: Offers an API that allows integration with other applications and services. +6. **API Integration** : Offers an API that allows integration with other applications and services. -7. **Privacy-Focused**: By running models locally, it addresses privacy concerns associated with sending data to external servers. +7. **Privacy-Focused** : By running models locally, it addresses privacy concerns associated with sending data to external servers. -8. **Cross-Platform**: Available for macOS, Windows, and Linux systems (our case, here). +8. **Cross-Platform** : Available for macOS, Windows, and Linux systems (our case, here). -9. **Active Development**: Regularly updated with new features and model support. +9. **Active Development** : Regularly updated with new features and model support. -10. **Community-Driven**: Benefits from community contributions and model sharing. +10. **Community-Driven** : Benefits from community contributions and model sharing. To learn more about what Ollama is and how it works under the hood, you should see this short video from [Matt Williams](https://www.youtube.com/@technovangelist), one of the founders of Ollama: @@ -179,7 +179,7 @@ On the [Ollama Library page](https://ollama.com/library), we can find the models ![](images/png/small_and_multimodal.png) -Let's install and run our first small language model, [Llama 3.2](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/) 1B (and 3B). The Meta Llama, 3.2 collections of multilingual large language models (LLMs), is a collection of pre-trained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. +Let's install and run our first small language model, [Llama 3.2](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/) 1B (and 3B). The Meta Llama 3.2 series comprises a set of multilingual generative language models available in 1 billion and 3 billion parameter sizes. These models are designed to process text input and generate text output. The instruction-tuned variants within this collection are specifically optimized for multilingual conversational applications, including tasks involving information retrieval and summarization with an agentic approach. 
When compared to many existing open-source and proprietary chat models, the Llama 3.2 instruction-tuned models demonstrate superior performance on widely-used industry benchmarks. The 1B and 3B models were pruned from the Llama 8B, and then logits from the 8B and 70B models were used as token-level targets (token-level distillation). Knowledge distillation was used to recover performance (they were trained with 9 trillion tokens). The 1B model has 1,24B, quantized to integer (Q8_0), and the 3B, 3.12B parameters, with a Q4_0 quantization, which ends with a size of 1.3 GB and 2GB, respectively. Its context window is 131,072 tokens. @@ -205,14 +205,14 @@ Using the option `--verbose` when calling the model will generate several statis Each metric gives insights into how the model processes inputs and generates outputs. Here’s a breakdown of what each metric means: -- **Total Duration (2.620170326s)**: This is the complete time taken from the start of the command to the completion of the response. It encompasses loading the model, processing the input prompt, and generating the response. -- **Load Duration (39.947908ms)**: This duration indicates the time to load the model or necessary components into memory. If this value is minimal, it can suggest that the model was preloaded or that only a minimal setup was required. -- **Prompt Eval Count (32 tokens)**: The number of tokens in the input prompt. In NLP, tokens are typically words or subwords, so this count includes all the tokens that the model evaluated to understand and respond to the query. -- **Prompt Eval Duration (1.644773s)**: This measures the model's time to evaluate or process the input prompt. It accounts for the bulk of the total duration, implying that understanding the query and preparing a response is the most time-consuming part of the process. -- **Prompt Eval Rate (19.46 tokens/s)**: This rate indicates how quickly the model processes tokens from the input prompt. It reflects the model’s speed in terms of natural language comprehension. -- **Eval Count (8 token(s))**: This is the number of tokens in the model’s response, which in this case was, “The capital of France is Paris.” -- **Eval Duration (889.941ms)**: This is the time taken to generate the output based on the evaluated input. It’s much shorter than the prompt evaluation, suggesting that generating the response is less complex or computationally intensive than understanding the prompt. -- **Eval Rate (8.99 tokens/s)**: Similar to the prompt eval rate, this indicates the speed at which the model generates output tokens. It's a crucial metric for understanding the model's efficiency in output generation. +- **Total Duration (2.620170326s)** : This is the complete time taken from the start of the command to the completion of the response. It encompasses loading the model, processing the input prompt, and generating the response. +- **Load Duration (39.947908ms)** : This duration indicates the time to load the model or necessary components into memory. If this value is minimal, it can suggest that the model was preloaded or that only a minimal setup was required. +- **Prompt Eval Count (32 tokens)** : The number of tokens in the input prompt. In NLP, tokens are typically words or subwords, so this count includes all the tokens that the model evaluated to understand and respond to the query. +- **Prompt Eval Duration (1.644773s)** : This measures the model's time to evaluate or process the input prompt. 
It accounts for the bulk of the total duration, implying that understanding the query and preparing a response is the most time-consuming part of the process.
+- **Prompt Eval Rate (19.46 tokens/s)** : This rate indicates how quickly the model processes tokens from the input prompt. It reflects the model’s speed in terms of natural language comprehension.
+- **Eval Count (8 token(s))** : This is the number of tokens in the model’s response, which in this case was, “The capital of France is Paris.”
+- **Eval Duration (889.941ms)** : This is the time taken to generate the output based on the evaluated input. It’s much shorter than the prompt evaluation, suggesting that generating the response is less complex or computationally intensive than understanding the prompt.
+- **Eval Rate (8.99 tokens/s)** : Similar to the prompt eval rate, this indicates the speed at which the model generates output tokens. It's a crucial metric for understanding the model's efficiency in output generation.
 
 This detailed breakdown can help understand the computational demands and performance characteristics of running SLMs like Llama on edge devices like the Raspberry Pi 5. It shows that while prompt evaluation is more time-consuming, the actual generation of responses is relatively quicker. This analysis is crucial for optimizing performance and diagnosing potential bottlenecks in real-time applications.
 
@@ -340,9 +340,9 @@ In this case, the answer was still longer than we expected, with an eval rate of
 
 When we asked the same questions about distance and Latitude/Longitude, we did not get a good answer for a distance of `13,507 kilometers (8,429 miles)`, but it was OK for coordinates. Again, it could have been less verbose (more than 200 tokens for each answer).
 
-We can use any model as an assistant since their speed is relatively decent, but on September 24, the Llama2:3B is a better choice. You should try other models, depending on your needs. [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) can give you an idea about the best models in size, benchmark, license, etc.
+We can use any model as an assistant since their speed is relatively decent, but as of September 2024, the Llama3.2:3B is a better choice. You should try other models, depending on your needs. [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) can give you an idea about the best models in size, benchmark, license, etc.
 
-> The best model to use is the one fit for your specific necessity. Also, take into consideration that this field evolves with new models every day,
+> The best model to use is the one that fits your specific needs. Also, take into consideration that this field evolves quickly, with new models appearing every day.
 
 ### Multimodal Models
 
@@ -526,10 +526,10 @@ As a result, we will have the model response in a JSON format:
 
 As we can see, several pieces of information are generated, such as:
 
-- **response**: the main output text generated by the model in response to our prompt.
+- **response** : the main output text generated by the model in response to our prompt.
 
   - `The capital of France is **Paris**. 🇫🇷`
 
-- **context**: the token IDs representing the input and context used by the model. Tokens are numerical representations of text used for processing by the language model.
+- **context** : the token IDs representing the input and context used by the model. Tokens are numerical representations of text used for processing by the language model.
- `[106, 1645, 108, 1841, 603, 573, 6037, 576, 6081, 235336, 107, 108,` ` 106, 2516, 108, 651, 6037, 576, 6081, 603, 5231, 29437, 168428, ` ` 235248, 244304, 241035, 235248, 108]`
 
@@ -537,11 +537,11 @@ As we can see, several pieces of information are generated, such as:
 
 The Performance Metrics:
 
-- **total_duration**: The total time taken for the operation in nanoseconds. In this case, approximately 24.26 seconds.
-- **load_duration**: The time taken to load the model or components in nanoseconds. About 19.38 seconds.
-- **prompt_eval_duration**: The time taken to evaluate the prompt in nanoseconds. Around 1.9.0 seconds.
-- **eval_count**: The number of tokens evaluated during the generation. Here, 14 tokens.
-- **eval_duration**: The time taken for the model to generate the response in nanoseconds. Approximately 2.5 seconds.
+- **total_duration** : The total time taken for the operation in nanoseconds. In this case, approximately 24.26 seconds.
+- **load_duration** : The time taken to load the model or components in nanoseconds. About 19.83 seconds.
+- **prompt_eval_duration** : The time taken to evaluate the prompt in nanoseconds. Around 1.9 seconds.
+- **eval_count** : The number of tokens evaluated during the generation. Here, 14 tokens.
+- **eval_duration** : The time taken for the model to generate the response in nanoseconds. Approximately 2.5 seconds.
 
 But, what we want is the plain 'response' and, perhaps for analysis, the total duration of the inference, so let's change the code to extract it from the dictionary:
 
@@ -663,7 +663,7 @@ The model took about 4 minutes (256.45 s) to return with a detailed image descri
 
 ### Function Calling
 
-So far, we can see that, with the model's ("response") answer to a variable, we can efficiently work with it, integrating it into real-world projects. However, a big problem is that the model can respond differently to the same prompt. Let's say that what we want, as the model's response in the last examples, is only the name of a given country's capital and its coordinates, nothing more, even with very verbose models such as the Microsoft Phi. We can use the `Ollama function's calling` to guarantee the same answers, which is perfectly compatible with OpenAI API.
+So far, we can observe that by storing the model's response in a variable, we can effectively incorporate it into real-world projects. However, a major issue arises when the model provides varying responses to the same input. For instance, let's assume that we only need the name of a country's capital and its coordinates as the model's response in the previous examples, without any additional information, even when utilizing verbose models like Microsoft Phi. To ensure consistent responses, we can employ Ollama's function calling, which is fully compatible with the OpenAI API.
 
 #### But what exactly is "function calling"?
 
@@ -683,7 +683,7 @@ We want to create an *app* where the user enters a country's name and gets, as a
 
 Once the user enters a country name, the model will return the name of its capital city (as a string) and the latitude and longitude of such city (in float). Using those coordinates, we can use a simple Python library ([haversine](https://pypi.org/project/haversine/)) to calculate the distance between those 2 points.
 
-The idea of this project is to demonstrate a combination of language model interaction (IA), structured data handling with Pydantic, and geospatial calculations using the Haversine formula (traditional computing).
+The idea of this project is to demonstrate a combination of language model interaction, structured data handling with Pydantic, and geospatial calculations using the Haversine formula (traditional computing). First, let us install some libraries. Besides *Haversine*, the main one is the [OpenAI Python library](https://github.com/openai/openai-python), which provides convenient access to the OpenAI REST API from any Python 3.7+ application. The other one is [Pydantic](https://docs.pydantic.dev/latest/) (and instructor), a robust data validation and settings management library engineered by Python to enhance the robustness and reliability of our codebase. In short, *Pydantic* will help ensure that our model's response will always be consistent. @@ -708,11 +708,11 @@ from pydantic import BaseModel, Field import instructor ``` -- **sys**: Provides access to system-specific parameters and functions. It's used to get command-line arguments. -- **haversine**: A function from the haversine library that calculates the distance between two geographic points using the Haversine formula. -- **openAI**: A module for interacting with the OpenAI API (although it's used in conjunction with a local setup, Ollama). Everything is off-line here. -- **pydantic**: Provides data validation and settings management using Python-type annotations. It's used to define the structure of expected response data. -- **instructor**: A module is used to patch the OpenAI client to work in a specific mode (likely related to structured data handling). +- **sys** : Provides access to system-specific parameters and functions. It's used to get command-line arguments. +- **haversine** : A function from the haversine library that calculates the distance between two geographic points using the Haversine formula. +- **openAI** : A module for interacting with the OpenAI API (although it's used in conjunction with a local setup, Ollama). Everything is off-line here. +- **pydantic** : Provides data validation and settings management using Python-type annotations. It's used to define the structure of expected response data. +- **instructor** : A module is used to patch the OpenAI client to work in a specific mode (likely related to structured data handling). ### 2. Defining Input and Model @@ -723,11 +723,11 @@ mylat = -33.33 # Latitude of Santiago de Chile mylon = -70.51 # Longitude of Santiago de Chile ``` -- **country**: On a Python script, getting the country name from command-line arguments is possible. On a Jupyter notebook, we can enter its name, for example, +- **country** : On a Python script, getting the country name from command-line arguments is possible. On a Jupyter notebook, we can enter its name, for example, - `country = "France"` -- **MODEL**: Specifies the model being used, which is, in this example, the phi3.5. -- **mylat** **and** **mylon**: Coordinates of Santiago de Chile, used as the starting point for the distance calculation. +- **MODEL** : Specifies the model being used, which is, in this example, the phi3.5. +- **mylat** **and** **mylon** : Coordinates of Santiago de Chile, used as the starting point for the distance calculation. ### 3. Defining the Response Data Structure @@ -738,7 +738,7 @@ class CityCoord(BaseModel): lon: float = Field(..., description="Decimal Longitude of the city") ``` -- **CityCoord**: A Pydantic model that defines the expected structure of the response from the LLM. It expects three fields: city (name of the city), lat (latitude), and lon (longitude). 
+- **CityCoord** : A Pydantic model that defines the expected structure of the response from the LLM. It expects three fields: city (name of the city), lat (latitude), and lon (longitude). ### 4. Setting Up the OpenAI Client @@ -752,8 +752,8 @@ client = instructor.patch( ) ``` -- **OpenAI**: This setup initializes an OpenAI client with a local base URL and an API key (ollama). It uses a local server. -- **instructor.patch**: Patches the OpenAI client to work in JSON mode, enabling structured output that matches the Pydantic model. +- **OpenAI** : This setup initializes an OpenAI client with a local base URL and an API key (ollama). It uses a local server. +- **instructor.patch** : Patches the OpenAI client to work in JSON mode, enabling structured output that matches the Pydantic model. ### 5. Generating the Response @@ -772,11 +772,11 @@ resp = client.chat.completions.create( ) ``` -- **client.chat.completions.create**: Calls the LLM to generate a response. -- **model**: Specifies the model to use (llava-phi3). -- **messages**: Contains the prompt for the LLM, asking for the latitude and longitude of the capital city of the specified country. -- **response_model**: Indicates that the response should conform to the CityCoord model. -- **max_retries**: The maximum number of retry attempts if the request fails. +- **client.chat.completions.create** : Calls the LLM to generate a response. +- **model** : Specifies the model to use (llava-phi3). +- **messages** : Contains the prompt for the LLM, asking for the latitude and longitude of the capital city of the specified country. +- **response_model** : Indicates that the response should conform to the CityCoord model. +- **max_retries** : The maximum number of retry attempts if the request fails. ### 6. Calculating the Distance @@ -786,12 +786,12 @@ print(f"Santiago de Chile is about {int(round(distance, -1)):,} \ kilometers away from {resp.city}.") ``` -- **haversine**: Calculates the distance between Santiago de Chile and the capital city returned by the LLM using their respective coordinates. -- **(mylat, mylon)**: Coordinates of Santiago de Chile. -- **resp.city**: Name of the country's capital -- **(resp.lat, resp.lon)**: Coordinates of the capital city are provided by the LLM response. -- **unit='km'**: Specifies that the distance should be calculated in kilometers. -- **print**: Outputs the distance, rounded to the nearest 10 kilometers, with thousands of separators for readability. +- **haversine** : Calculates the distance between Santiago de Chile and the capital city returned by the LLM using their respective coordinates. +- **(mylat, mylon)** : Coordinates of Santiago de Chile. +- **resp.city** : Name of the country's capital +- **(resp.lat, resp.lon)** : Coordinates of the capital city are provided by the LLM response. +- **unit='km'** : Specifies that the distance should be calculated in kilometers. +- **print** : Outputs the distance, rounded to the nearest 10 kilometers, with thousands of separators for readability. **Running the code** @@ -1079,7 +1079,6 @@ EMB_MODEL = "nomic-embed-text" MODEL = 'llama3.2:3B' ``` - Initially, a knowledge base about bee facts should be created. This involves collecting relevant documents and converting them into vector embeddings. These embeddings are then stored in a vector database, allowing for efficient similarity searches later. 
Enter with the "document," a base of "bee facts" as a list: ```python diff --git a/contents/labs/raspi/raspi.qmd b/contents/labs/raspi/raspi.qmd index 2e6dfe955..022413393 100644 --- a/contents/labs/raspi/raspi.qmd +++ b/contents/labs/raspi/raspi.qmd @@ -6,13 +6,13 @@ These labs offer invaluable hands-on experience with machine learning systems, l ## Pre-requisites -- **Raspberry Pi**: Ensure you have at least one of the boards: the Raspberry Pi Zero 2W, Raspberry Pi 4 or 5 for the Vision Labs, and the Raspberry 5 for the GenAi lab. -- **Power Adapter**: To Power on the boards. +- **Raspberry Pi** : Ensure you have at least one of the boards: the Raspberry Pi Zero 2W, Raspberry Pi 4 or 5 for the Vision Labs, and the Raspberry 5 for the GenAi lab. +- **Power Adapter** : To Power on the boards. - Raspberry Pi Zero 2-W: 2.5W with a Micro-USB adapter - Raspberry Pi 4 or 5: 3.5W with a USB-C adapter -- **Network**: With internet access for downloading the necessary software and controlling the boards remotely. -- **SD Card (32GB minimum) and an SD card Adapter**: For the Raspberry Pi OS. +- **Network** : With internet access for downloading the necessary software and controlling the boards remotely. +- **SD Card (32GB minimum) and an SD card Adapter** : For the Raspberry Pi OS. ## Setup diff --git a/contents/labs/raspi/setup/setup.qmd b/contents/labs/raspi/setup/setup.qmd index 4cf0a2924..806455d51 100644 --- a/contents/labs/raspi/setup/setup.qmd +++ b/contents/labs/raspi/setup/setup.qmd @@ -12,17 +12,17 @@ The Raspberry Pi is a powerful and versatile single-board computer that has beco ### Key Features -1. **Computational Power**: Despite their small size, Raspberry Pis offers significant processing capabilities, with the latest models featuring multi-core ARM processors and up to 8GB of RAM. +1. **Computational Power** : Despite their small size, Raspberry Pis offers significant processing capabilities, with the latest models featuring multi-core ARM processors and up to 8GB of RAM. -2. **GPIO Interface**: The 40-pin GPIO header allows direct interaction with sensors, actuators, and other electronic components, facilitating hardware-software integration projects. +2. **GPIO Interface** : The 40-pin GPIO header allows direct interaction with sensors, actuators, and other electronic components, facilitating hardware-software integration projects. -3. **Extensive Connectivity**: Built-in Wi-Fi, Bluetooth, Ethernet, and multiple USB ports enable diverse communication and networking projects. +3. **Extensive Connectivity** : Built-in Wi-Fi, Bluetooth, Ethernet, and multiple USB ports enable diverse communication and networking projects. -4. **Low-Level Hardware Access**: Raspberry Pis provides access to interfaces like I2C, SPI, and UART, allowing for detailed control and communication with external devices. +4. **Low-Level Hardware Access** : Raspberry Pis provides access to interfaces like I2C, SPI, and UART, allowing for detailed control and communication with external devices. -5. **Real-Time Capabilities**: With proper configuration, Raspberry Pis can be used for soft real-time applications, making them suitable for control systems and signal processing tasks. +5. **Real-Time Capabilities** : With proper configuration, Raspberry Pis can be used for soft real-time applications, making them suitable for control systems and signal processing tasks. -6. **Power Efficiency**: Low power consumption enables battery-powered and energy-efficient designs, especially in models like the Pi Zero. +6. 
**Power Efficiency** : Low power consumption enables battery-powered and energy-efficient designs, especially in models like the Pi Zero. ### Raspberry Pi Models (covered in this book) @@ -36,21 +36,21 @@ The Raspberry Pi is a powerful and versatile single-board computer that has beco ### Engineering Applications -1. **Embedded Systems Design**: Develop and prototype embedded systems for real-world applications. +1. **Embedded Systems Design** : Develop and prototype embedded systems for real-world applications. -2. **IoT and Networked Devices**: Create interconnected devices and explore protocols like MQTT, CoAP, and HTTP/HTTPS. +2. **IoT and Networked Devices** : Create interconnected devices and explore protocols like MQTT, CoAP, and HTTP/HTTPS. -3. **Control Systems**: Implement feedback control loops, PID controllers, and interface with actuators. +3. **Control Systems** : Implement feedback control loops, PID controllers, and interface with actuators. -4. **Computer Vision and AI**: Utilize libraries like OpenCV and TensorFlow Lite for image processing and machine learning at the edge. +4. **Computer Vision and AI** : Utilize libraries like OpenCV and TensorFlow Lite for image processing and machine learning at the edge. -5. **Data Acquisition and Analysis**: Collect sensor data, perform real-time analysis, and create data logging systems. +5. **Data Acquisition and Analysis** : Collect sensor data, perform real-time analysis, and create data logging systems. -6. **Robotics**: Build robot controllers, implement motion planning algorithms, and interface with motor drivers. +6. **Robotics** : Build robot controllers, implement motion planning algorithms, and interface with motor drivers. -7. **Signal Processing**: Perform real-time signal analysis, filtering, and DSP applications. +7. **Signal Processing** : Perform real-time signal analysis, filtering, and DSP applications. -8. **Network Security**: Set up VPNs, firewalls, and explore network penetration testing. +8. **Network Security** : Set up VPNs, firewalls, and explore network penetration testing. This tutorial will guide you through setting up the most common Raspberry Pi models, enabling you to start on your machine learning project quickly. We'll cover hardware setup, operating system installation, and initial configuration, focusing on preparing your Pi for Machine Learning applications. 
@@ -60,23 +60,23 @@ This tutorial will guide you through setting up the most common Raspberry Pi mod ![](images/jpeg/zero-hardware.jpg) -- **Processor**: 1GHz quad-core 64-bit Arm Cortex-A53 CPU -- **RAM**: 512MB SDRAM -- **Wireless**: 2.4GHz 802.11 b/g/n wireless LAN, Bluetooth 4.2, BLE -- **Ports**: Mini HDMI, micro USB OTG, CSI-2 camera connector -- **Power**: 5V via micro USB port +- **Processor** : 1GHz quad-core 64-bit Arm Cortex-A53 CPU +- **RAM** : 512MB SDRAM +- **Wireless** : 2.4GHz 802.11 b/g/n wireless LAN, Bluetooth 4.2, BLE +- **Ports** : Mini HDMI, micro USB OTG, CSI-2 camera connector +- **Power** : 5V via micro USB port ### Raspberry Pi 5 ![](images/jpeg/r5-hardware.jpg) -- **Processor**: +- **Processor** : - Pi 5: Quad-core 64-bit Arm Cortex-A76 CPU @ 2.4GHz - Pi 4: Quad-core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5GHz -- **RAM**: 2GB, 4GB, or 8GB options (8GB recommended for AI tasks) -- **Wireless**: Dual-band 802.11ac wireless, Bluetooth 5.0 -- **Ports**: 2 × micro HDMI ports, 2 × USB 3.0 ports, 2 × USB 2.0 ports, CSI camera port, DSI display port -- **Power**: 5V DC via USB-C connector (3A) +- **RAM** : 2GB, 4GB, or 8GB options (8GB recommended for AI tasks) +- **Wireless** : Dual-band 802.11ac wireless, Bluetooth 5.0 +- **Ports** : 2 × micro HDMI ports, 2 × USB 3.0 ports, 2 × USB 2.0 ports, CSI camera port, DSI display port +- **Power** : 5V DC via USB-C connector (3A) > In the labs, we will use different names to address the Raspberry: `Raspi`, `Raspi-5`, `Raspi-Zero`, etc. Usually, `Raspi` is used when the instructions or comments apply to every model. @@ -124,14 +124,14 @@ Follow the steps to install the OS in your Raspi. 2. Insert a microSD card into your computer (a 32GB SD card is recommended) . 3. Open Raspberry Pi Imager and select your Raspberry Pi model. 4. Choose the appropriate operating system: - - **For Raspi-Zero**: For example, you can select: + - **For Raspi-Zero** : For example, you can select: `Raspberry Pi OS Lite (64-bit)`. ![img](images/png/zero-burn.png) > Due to its reduced SDRAM (512MB), the recommended OS for the Raspi-Zero is the 32-bit version. However, to run some machine learning models, such as the YOLOv8 from Ultralitics, we should use the 64-bit version. Although Raspi-Zero can run a *desktop*, we will choose the LITE version (no Desktop) to reduce the RAM needed for regular operation. - - For **Raspi-5**: We can select the full 64-bit version, which includes a desktop: + - For **Raspi-5** : We can select the full 64-bit version, which includes a desktop: `Raspberry Pi OS (64-bit)` ![](images/png/r5-burn.png) diff --git a/contents/labs/seeed/xiao_esp32s3/kws/kws.qmd b/contents/labs/seeed/xiao_esp32s3/kws/kws.qmd index e1e6f81fb..2c9efecff 100644 --- a/contents/labs/seeed/xiao_esp32s3/kws/kws.qmd +++ b/contents/labs/seeed/xiao_esp32s3/kws/kws.qmd @@ -99,11 +99,11 @@ The I2S protocol consists of at least three lines: ![](https://hackster.imgix.net/uploads/attachments/1594628/image_8CRJmXD9Fr.png?auto=compress%2Cformat&w=740&h=555&fit=max) -**1. Bit (or Serial) clock line (BCLK or CLK)**: This line toggles to indicate the start of a new bit of data (pin IO42). +**1. Bit (or Serial) clock line (BCLK or CLK)** : This line toggles to indicate the start of a new bit of data (pin IO42). -**2. Word select line (WS)**: This line toggles to indicate the start of a new word (left channel or right channel). The Word select clock (WS) frequency defines the sample rate. 
In our case, L/R on the microphone is set to ground, meaning that we will use only the left channel (mono). +**2. Word select line (WS)** : This line toggles to indicate the start of a new word (left channel or right channel). The Word select clock (WS) frequency defines the sample rate. In our case, L/R on the microphone is set to ground, meaning that we will use only the left channel (mono). -**3. Data line (SD)**: This line carries the audio data (pin IO41) +**3. Data line (SD)** : This line carries the audio data (pin IO41) In an I2S data stream, the data is sent as a sequence of frames, each containing a left-channel word and a right-channel word. This makes I2S particularly suited for transmitting stereo audio data. However, it can also be used for mono or multichannel audio with additional data lines. diff --git a/contents/labs/seeed/xiao_esp32s3/setup/setup.qmd b/contents/labs/seeed/xiao_esp32s3/setup/setup.qmd index 8caa62e00..2b836e524 100644 --- a/contents/labs/seeed/xiao_esp32s3/setup/setup.qmd +++ b/contents/labs/seeed/xiao_esp32s3/setup/setup.qmd @@ -10,12 +10,12 @@ The [XIAO ESP32S3 Sense](https://www.seeedstudio.com/XIAO-ESP32S3-Sense-p-5639.h **XIAO ESP32S3 Sense Main Features** -- **Powerful MCU Board**: Incorporate the ESP32S3 32-bit, dual-core, Xtensa processor chip operating up to 240 MHz, mounted multiple development ports, Arduino / MicroPython supported -- **Advanced Functionality**: Detachable OV2640 camera sensor for 1600 * 1200 resolution, compatible with OV5640 camera sensor, integrating an additional digital microphone -- **Elaborate Power Design**: Lithium battery charge management capability offers four power consumption models, which allows for deep sleep mode with power consumption as low as 14μA -- **Great Memory for more Possibilities**: Offer 8MB PSRAM and 8MB FLASH, supporting SD card slot for external 32GB FAT memory -- **Outstanding RF performance**: Support 2.4GHz Wi-Fi and BLE dual wireless communication, support 100m+ remote communication when connected with U.FL antenna -- **Thumb-sized Compact Design**: 21 x 17.5mm, adopting the classic form factor of XIAO, suitable for space-limited projects like wearable devices +- **Powerful MCU Board** : Incorporate the ESP32S3 32-bit, dual-core, Xtensa processor chip operating up to 240 MHz, mounted multiple development ports, Arduino / MicroPython supported +- **Advanced Functionality** : Detachable OV2640 camera sensor for 1600 * 1200 resolution, compatible with OV5640 camera sensor, integrating an additional digital microphone +- **Elaborate Power Design** : Lithium battery charge management capability offers four power consumption models, which allows for deep sleep mode with power consumption as low as 14μA +- **Great Memory for more Possibilities** : Offer 8MB PSRAM and 8MB FLASH, supporting SD card slot for external 32GB FAT memory +- **Outstanding RF performance** : Support 2.4GHz Wi-Fi and BLE dual wireless communication, support 100m+ remote communication when connected with U.FL antenna +- **Thumb-sized Compact Design** : 21 x 17.5mm, adopting the classic form factor of XIAO, suitable for space-limited projects like wearable devices ![](./images/png/xiao_pins.png) diff --git a/contents/labs/seeed/xiao_esp32s3/xiao_esp32s3.qmd b/contents/labs/seeed/xiao_esp32s3/xiao_esp32s3.qmd index 90064ee8e..0d77dd037 100644 --- a/contents/labs/seeed/xiao_esp32s3/xiao_esp32s3.qmd +++ b/contents/labs/seeed/xiao_esp32s3/xiao_esp32s3.qmd @@ -6,10 +6,10 @@ These labs provide a unique opportunity to gain practical 
experience with machin ## Pre-requisites -- **XIAO ESP32S3 Sense Board**: Ensure you have the XIAO ESP32S3 Sense Board. -- **USB-C Cable**: This is for connecting the board to your computer. -- **Network**: With internet access for downloading necessary software. -- **SD Card and an SD card Adapter**: This saves audio and images (optional). +- **XIAO ESP32S3 Sense Board** : Ensure you have the XIAO ESP32S3 Sense Board. +- **USB-C Cable** : This is for connecting the board to your computer. +- **Network** : With internet access for downloading necessary software. +- **SD Card and an SD card Adapter** : This saves audio and images (optional). ## Setup diff --git a/contents/optimizations/optimizations.qmd b/contents/optimizations/optimizations.qmd index 285ab5c5b..826dd868f 100644 --- a/contents/optimizations/optimizations.qmd +++ b/contents/optimizations/optimizations.qmd @@ -91,10 +91,10 @@ A widely adopted and effective strategy for systematically pruning structures re There are several techniques for assigning these importance scores: -* **Weight Magnitude-Based Pruning**: This approach assigns importance scores to a structure by evaluating the aggregate magnitude of their associated weights. Structures with smaller overall weight magnitudes are considered less critical to the network's performance. -* **Gradient-Based Pruning**: This technique utilizes the gradients of the loss function with respect to the weights associated with a structure. Structures with low cumulative gradient magnitudes, indicating minimal impact on the loss when altered, are prime candidates for pruning. -* **Activation-Based Pruning**: This method tracks how often a neuron or filter is activated by storing this information in a parameter called the activation counter. Each time the structure is activated, the counter is incremented. A low activation count suggests that the structure is less relevant. -* **Taylor Expansion-Based Pruning**: This approach approximates the change in the loss function from removing a given weight. By assessing the cumulative loss disturbance from removing all the weights associated with a structure, you can identify structures with negligible impact on the loss, making them suitable candidates for pruning. +* **Weight Magnitude-Based Pruning** : This approach assigns importance scores to a structure by evaluating the aggregate magnitude of their associated weights. Structures with smaller overall weight magnitudes are considered less critical to the network's performance. +* **Gradient-Based Pruning** : This technique utilizes the gradients of the loss function with respect to the weights associated with a structure. Structures with low cumulative gradient magnitudes, indicating minimal impact on the loss when altered, are prime candidates for pruning. +* **Activation-Based Pruning** : This method tracks how often a neuron or filter is activated by storing this information in a parameter called the activation counter. Each time the structure is activated, the counter is incremented. A low activation count suggests that the structure is less relevant. +* **Taylor Expansion-Based Pruning** : This approach approximates the change in the loss function from removing a given weight. By assessing the cumulative loss disturbance from removing all the weights associated with a structure, you can identify structures with negligible impact on the loss, making them suitable candidates for pruning. 
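To make the first of these criteria concrete, the short sketch below ranks the output filters of a convolutional layer by the L1 norm of their weights and flags the lowest-scoring ones as pruning candidates. This is a minimal illustration added for clarity rather than code from the chapter or the cited works; the layer shape and the 25% pruning ratio are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical convolutional layer; in practice this would come from a trained model.
conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)

# Weight magnitude-based importance: L1 norm of each output filter.
# conv.weight has shape [out_channels, in_channels, kernel_h, kernel_w].
importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))

# Flag the 25% least important filters as pruning candidates (ratio chosen for illustration).
n_prune = int(0.25 * importance.numel())
order = torch.argsort(importance)   # ascending: least important filters first
prune_idx = order[:n_prune]
keep_idx = order[n_prune:]

print(f"Filters flagged for pruning: {sorted(prune_idx.tolist())}")
print(f"Filters kept: {len(keep_idx)} of {importance.numel()}")
```

In a full pipeline, the flagged filters (and the corresponding input channels of the layer that follows) would then be removed, typically followed by fine-tuning to recover any lost accuracy.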
The idea is to measure, either directly or indirectly, the contribution of each component to the model's output. Structures with minimal influence according to the defined criteria are pruned first. This enables selective, optimized pruning that maximally compresses models while preserving predictive capacity. In general, it is important to evaluate the impact of removing particular structures on the model's output, with recent works such as [@rachwan2022winning] and [@lubana2020gradient] investigating combinations of techniques like magnitude-based pruning and gradient-based pruning. diff --git a/contents/privacy_security/privacy_security.qmd b/contents/privacy_security/privacy_security.qmd index 9519eae21..79d6cc7be 100644 --- a/contents/privacy_security/privacy_security.qmd +++ b/contents/privacy_security/privacy_security.qmd @@ -494,7 +494,7 @@ The importance of TEEs in ML hardware security stems from their ability to prote * **Side-channel Attacks:** Although not impenetrable, TEEs can mitigate specific side-channel attacks by controlling access to sensitive operations and data patterns. -* ** Network Threats:** TEEs enhance network security by safeguarding data transmission between distributed ML components through encryption and secure in-TEE processing. This effectively prevents man-in-the-middle attacks and ensures data is transmitted through trusted channels. +* **Network Threats:** TEEs enhance network security by safeguarding data transmission between distributed ML components through encryption and secure in-TEE processing. This effectively prevents man-in-the-middle attacks and ensures data is transmitted through trusted channels. #### Mechanics @@ -514,7 +514,7 @@ Here are some examples of TEEs that provide hardware-based security for sensitiv * **[IntelSGX](https://www.intel.com/content/www/us/en/architecture-and-technology/software-guard-extensions.html):** Intel's Software Guard Extensions provide an enclave for code execution that protects against various software-based threats, specifically targeting O.S. layer vulnerabilities. They are used to safeguard workloads in the cloud. -* **[Qualcomm Secure Execution Environment](https://www.qualcomm.com/products/features/mobile-security-solutions):**A Hardware sandbox on Qualcomm chipsets for mobile payment and authentication apps. +* **[Qualcomm Secure Execution Environment](https://www.qualcomm.com/products/features/mobile-security-solutions):** A Hardware sandbox on Qualcomm chipsets for mobile payment and authentication apps. * **[Apple SecureEnclave](https://support.apple.com/guide/security/secure-enclave-sec59b0b31ff/web):** A TEE for biometric data and cryptographic key management on iPhones and iPads, facilitating secure mobile payments.