diff --git a/lessons/01_preprocessing.ipynb b/lessons/01_preprocessing.ipynb
index de33786..54c3836 100644
--- a/lessons/01_preprocessing.ipynb
+++ b/lessons/01_preprocessing.ipynb
@@ -5,17 +5,24 @@
"id": "d3e7ea21-6437-48e8-a9e4-3bdc05f709c9",
"metadata": {},
"source": [
- "# Python Text Analysis: Preprocessing\n",
+ "# Python Text Analysis: Preprocessing\n",
"\n",
"* * * \n",
"\n",
+ "## Group 4\n",
+ "### Members\n",
+ "* Carlos Chicaiza\n",
+ "* Emilio Mayorga\n",
+ "* Juan Vizuete\n",
+ "* Jessica Llumiguano\n",
+ "\n",
"
\n",
" \n",
- "### Learning Objectives \n",
+ "### Learning Objectives\n",
" \n",
- "* Learn common steps for preprocessing text data, as well as specific operations for preprocessing Twitter data.\n",
- "* Know commonly used NLP packages and what they are capable of.\n",
- "* Understand tokenizers, and how they have changed since the advent of Large Language Models.\n",
+ "* Learn the common steps for preprocessing text data, as well as the specific operations used to preprocess Twitter data.\n",
+ "* Get to know the most widely used natural language processing (NLP) packages and their capabilities.\n",
+ "* Understand tokenizers and how they have changed since the advent of Large Language Models.\n",
"
\n",
"\n",
"### Icons Used in This Notebook\n",
@@ -25,26 +32,223 @@
"π¬ **Demo**: Showing off something more advanced β so you know what Python can be used for!
\n",
"\n",
"### Sections\n",
- "1. [Preprocessing](#section1)\n",
- "2. [Tokenization](#section2)\n",
+ "1. [Preprocessing](#section1)\n",
"\n",
- "In this three-part workshop series, we'll learn the building blocks for performing text analysis in Python. These techniques lie in the domain of Natural Language Processing (NLP). NLP is a field that deals with identifying and extracting patterns of language, primarily in written texts. Throughout the workshop series, we'll interact with various packages for performing text analysis: starting from simple string methods to specific NLP packages, such as `nltk`, `spaCy`, and more recent ones on Large Language Models (`BERT`).\n",
+ "In these three parts of the workshop, we'll learn the building blocks for performing text analysis in Python. These techniques belong to the domain of Natural Language Processing (NLP). NLP is a field focused on identifying and extracting patterns of language, primarily in written texts. Throughout the workshop, we'll interact with various packages for performing text analysis: from simple string methods to specific NLP packages, such as `nltk`, `spaCy`, and more recent Large Language Models such as `BERT`.\n",
"\n",
- "Now, let's have these packages properly installed before diving into the materials."
+ "Now, before we begin, the following packages need to be installed:"
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 1,
"id": "d442e4c7-e926-493d-a64e-516616ad915a",
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Successfully installed NLTK-3.9.1 click-8.1.8 joblib-1.4.2 regex-2024.11.6 tqdm-4.67.1\n",
+ "Successfully installed certifi-2025.1.31 charset-normalizer-3.4.1 filelock-3.18.0 fsspec-2025.3.0 huggingface-hub-0.29.3 idna-3.10 numpy-2.2.4 pyyaml-6.0.2 requests-2.32.3 safetensors-0.5.3 tokenizers-0.21.1 transformers-4.50.1 typing-extensions-4.12.2 urllib3-2.3.0\n",
+ "Successfully installed MarkupSafe-3.0.2 annotated-types-0.7.0 blis-1.2.0 catalogue-2.0.10 cloudpathlib-0.21.0 confection-0.1.5 cymem-2.0.11 jinja2-3.1.6 langcodes-3.5.0 language-data-1.3.0 marisa-trie-1.2.1 markdown-it-py-3.0.0 mdurl-0.1.2 murmurhash-1.0.12 preshed-3.0.9 pydantic-2.10.6 pydantic-core-2.27.2 rich-13.9.4 setuptools-78.1.0 shellingham-1.5.4 smart-open-7.1.0 spaCy-3.8.4 spacy-legacy-3.0.12 spacy-loggers-1.0.5 srsly-2.5.1 thinc-8.3.4 typer-0.15.2 wasabi-1.1.3 weasel-0.4.1 wrapt-1.17.2\n",
+ "Successfully installed en-core-web-sm-3.8.0\n",
+ "✔ Download and installation successful\n",
+ "You can now load the package via spacy.load('en_core_web_sm')\n",
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ }
+ ],
"source": [
- "# Uncomment the following lines to install packages/model\n",
+ "# Install the required packages and the spaCy English model (run once)\n",
- "# %pip install NLTK\n",
- "# %pip install transformers\n",
- "# %pip install spaCy\n",
- "# !python -m spacy download en_core_web_sm"
+ "%pip install NLTK\n",
+ "%pip install transformers\n",
+ "%pip install spaCy\n",
+ "!python -m spacy download en_core_web_sm"
]
},
{
@@ -54,16 +258,22 @@
"source": [
"\n",
"\n",
- "# Preprocessing\n",
+ "# Preprocessing\n",
+ "\n",
+ "In the first part of this workshop, we'll address the first step of text analysis. Our goal will be to convert messy, raw data into a consistent format. This process is known as **preprocessing**, **text cleaning**, or **text normalization**.\n",
+ "\n",
+ "At the end of preprocessing, the data will still be in a readable format. In the second and third parts, we'll begin converting the text data into a numerical representation, a format better suited to computational processing.\n",
"\n",
- "In Part 1 of this workshop, we'll address the first step of text analysis. Our goal is to convert the raw, messy text data into a consistent format. This process is often called **preprocessing**, **text cleaning**, or **text normalization**.\n",
+ "🔔 **Question**: Take a minute to reflect on your past experiences working with text data:\n",
+ "- What is the format of the text data you have worked with (plain text, CSV, XML)?\n",
"\n",
- "You'll notice that at the end of preprocessing, our data is still in a format that we can read and understand. In Parts 2 and 3, we will begin our foray into converting the text data into a numerical representationβa format that can be more readily handled by computers. \n",
+ "We have worked with text data in CSV, XML, and TXT formats, for data cleaning, analysis, and training neural networks.\n",
+ "- Where did it come from (structured corpus, web scraping, surveys)?\n",
"\n",
- "π **Question**: Let's pause for a minute to reflect on **your** previous experiences working on text data. \n",
- "- What is the format of the text data you have interacted with (plain text, CSV, or XML)?\n",
- "- Where does it come from (structured corpus, scraped from the web, survey data)?\n",
- "- Is it messy (i.e., is the data formatted consistently)?"
+ "The data was obtained from Kaggle, since it hosts a large bank of datasets of all kinds.\n",
+ "- Was the data messy or inconsistent?\n",
+ "\n",
+ "In some cases the data was messy; in other cases we discarded inconsistent data, since we needed to move forward quickly with the project."
]
},
{
@@ -71,21 +281,21 @@
"id": "4b35911a-3b3f-4a48-a7d1-9882aab04851",
"metadata": {},
"source": [
- "## Common Processes\n",
+ "## Common Processes\n",
"\n",
- "Preprocessing is not something we can accomplish with a single line of code. We often start by familiarizing ourselves with the data, and along the way, we gain a clearer understanding of the granularity of preprocessing we want to apply.\n",
+ "Preprocessing cannot be accomplished with a single line of code. We often start by familiarizing ourselves with the data to better understand the level of granularity at which preprocessing should be applied.\n",
"\n",
- "Typically, we begin by applying a set of commonly used processes to clean the data. These operations don't substantially alter the form or meaning of the data; they serve as a standardized procedure to reshape the data into a consistent format.\n",
+ "Typically, we begin by applying a set of commonly used processes to clean the data. These operations don't substantially alter the form or meaning of the data; they serve as a standardized procedure to reshape the data into a consistent format.\n",
"\n",
- "The following processes, for examples, are commonly applied to preprocess English texts of various genres. These operations can be done using built-in Python functions, such as `string` methods, and Regular Expressions. \n",
- "- Lowercase the text\n",
- "- Remove punctuation marks\n",
- "- Remove extra whitespace characters\n",
- "- Remove stop words\n",
+ "The following processes, for example, are commonly applied to preprocess English texts of various genres. These operations can be done using built-in Python functions, such as `string` methods, and regular expressions.\n",
+ "- Lowercase the text\n",
+ "- Remove punctuation marks\n",
+ "- Remove extra whitespace characters\n",
+ "- Remove stop words\n",
"\n",
- "After the initial processing, we may choose to perform task-specific processes, the specifics of which often depend on the downstream task we want to perform and the nature of the text data (i.e., its stylistic and linguistic features). \n",
+ "After the initial processing, we may choose to perform task-specific processes; their specifics often depend on the downstream task we want to carry out and the nature of the text data (i.e., its stylistic and linguistic features).\n",
"\n",
- "Before we jump into these operations, let's take a look at our data!"
+ "Before we dive into these operations, let's take a look at our data!"
]
},
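The four common steps listed in this cell can be sketched in plain Python. This is a minimal illustration, not the notebook's own code; the tiny stop word list is an assumption for the example, since real analyses typically use the lists shipped with `nltk` or `spaCy`:

```python
import re
import string

# Toy stop word list for illustration only (assumption);
# use nltk.corpus.stopwords or spaCy's list in practice.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "is", "was"}

def preprocess(text):
    """Lowercase, strip punctuation, collapse whitespace, drop stop words."""
    text = text.lower()                                               # 1. lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))  # 2. punctuation
    text = re.sub(r"\s+", " ", text).strip()                          # 3. extra whitespace
    tokens = [w for w in text.split() if w not in STOP_WORDS]         # 4. stop words
    return " ".join(tokens)

print(preprocess("The  flight was   GREAT, thanks!"))  # -> flight great thanks
```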
{
@@ -93,16 +303,49 @@
"id": "ec5d7350-9a1e-4db9-b828-a87fe1676d8d",
"metadata": {},
"source": [
- "### Import the Text Data\n",
+ "### Importing the Text Data\n",
"\n",
- "The text data we'll be working with is a CSV file. It contains tweets about U.S. airlines, scrapped from Feb 2015. \n",
+ "We'll be working with a CSV file. It contains tweets about U.S. airlines, collected in February 2015.\n",
"\n",
- "Let's read the file `airline_tweets.csv` into dataframe with `pandas`."
+ "Let's read the file `airline_tweets.csv` into a `pandas` dataframe."
]
},
{
"cell_type": "code",
- "execution_count": 1,
+ "execution_count": 3,
+ "id": "6bda2022",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Successfully installed pandas-2.2.3 pytz-2025.2 tzdata-2025.2\n",
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ }
+ ],
+ "source": [
+ "%pip install pandas"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
"id": "3d1ff64b-53ad-4eca-b846-3fda20085c43",
"metadata": {},
"outputs": [],
@@ -119,7 +362,7 @@
},
{
"cell_type": "code",
- "execution_count": 2,
+ "execution_count": 5,
"id": "e397ac6a-c2ba-4cce-8700-b36b38026c9d",
"metadata": {},
"outputs": [
@@ -293,7 +536,7 @@
"4 2015-02-24 11:14:45 -0800 NaN Pacific Time (US & Canada) "
]
},
- "execution_count": 2,
+ "execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -308,13 +551,13 @@
"id": "ae3b339f-45cf-465d-931c-05f9096fd510",
"metadata": {},
"source": [
- "The dataframe has one row per tweet. The text of tweet is shown in the `text` column.\n",
- "- `text` (`str`): the text of the tweet.\n",
+ "The dataframe has one row per tweet. The text of the tweet is shown in the `text` column.\n",
+ "- `text` (`str`): the text of the tweet.\n",
"\n",
- "Other metadata we are interested in include: \n",
- "- `airline_sentiment` (`str`): the sentiment of the tweet, labeled as \"neutral,\" \"positive,\" or \"negative.\"\n",
- "- `airline` (`str`): the airline that is tweeted about.\n",
- "- `retweet count` (`int`): how many times the tweet was retweeted."
+ "Other relevant metadata we are interested in includes:\n",
+ "- `airline_sentiment` (`str`): the sentiment of the tweet, labeled as \"neutral\", \"positive\", or \"negative\".\n",
+ "- `airline` (`str`): the airline being tweeted about.\n",
+ "- `retweet count` (`int`): the number of times the tweet was retweeted."
]
},
{
@@ -322,12 +565,12 @@
"id": "302c695b-4bd1-4151-9cb9-ef5253eb16df",
"metadata": {},
"source": [
- "Let's take a look at some of the tweets:"
+ "Let's take a look at some of the tweets:"
]
},
{
"cell_type": "code",
- "execution_count": 3,
+ "execution_count": 7,
"id": "b690daab-7be5-4b8f-8af0-a91fdec4ec4f",
"metadata": {},
"outputs": [
@@ -352,7 +595,9 @@
"id": "8adc05fa-ad30-4402-ab56-086bcb09a166",
"metadata": {},
"source": [
- "π **Question**: What have you noticed? What are the stylistic features of tweets?"
+ "🔔 **Question**: What have you noticed? What are the stylistic features of tweets?\n",
+ "\n",
+ "The tweets are informal and direct about the airlines' services; through them, the sentiment of the users can be identified."
]
},
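The stylistic features noted above (mentions, hashtags, links) invite Twitter-specific cleanup steps. A hedged sketch using Python's `re` module; the exact rules are a judgment call that depends on the downstream task, and this is not the notebook's own code:

```python
import re

def clean_tweet(tweet):
    """Remove Twitter-specific markup: @mentions and URLs are dropped,
    and the '#' is stripped from hashtags while keeping the word."""
    tweet = re.sub(r"@\w+", "", tweet)               # drop @mentions
    tweet = re.sub(r"http\S+|www\.\S+", "", tweet)   # drop URLs
    tweet = tweet.replace("#", "")                   # keep hashtag words
    return re.sub(r"\s+", " ", tweet).strip()        # tidy whitespace

print(clean_tweet("@united my flight was delayed #frustrated http://t.co/abc"))
# -> my flight was delayed frustrated
```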
{
@@ -360,20 +605,20 @@
"id": "c3460393-00a6-461c-b02a-9e98f9b5d1af",
"metadata": {},
"source": [
- "### Lowercasing\n",
+ "### Lowercasing\n",
"\n",
- "While we acknowledge that a word's casing is informative, we often don't work in contexts where we can properly utilize this information.\n",
+ "While we acknowledge that a word's casing carries information, we often don't work in contexts where we can properly take advantage of it.\n",
"\n",
- "More often, the subsequent analysis we perform is **case-insensitive**. For instance, in frequency analysis, we want to account for various forms of the same word. Lowercasing the text data aids in this process and simplifies our analysis.\n",
+ "More often, the subsequent analysis we perform is case-insensitive. For instance, in frequency analysis, we usually want to account for the various forms of the same word. Lowercasing the text data aids this process and simplifies our analysis.\n",
"\n",
- "We can easily achieve lowercasing with the string method [`.lower()`](https://docs.python.org/3/library/stdtypes.html#str.lower); see [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods) for more useful functions.\n",
+ "We can easily achieve lowercasing with the string method [`.lower()`](https://docs.python.org/3/library/stdtypes.html#str.lower); see the [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods) for more useful functions.\n",
"\n",
- "Let's apply it to the following example:"
+ "Let's apply it to the following example:"
]
},
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": 8,
"id": "58a95d90-3ef1-4bff-9cfe-d447ed99f252",
"metadata": {},
"outputs": [
@@ -393,7 +638,7 @@
},
{
"cell_type": "code",
- "execution_count": 5,
+ "execution_count": 9,
"id": "c66d91c0-6eed-4591-95fc-cd2eae2e0d41",
"metadata": {},
"outputs": [
@@ -2151,7 +2396,7 @@
],
"metadata": {
"kernelspec": {
- "display_name": "Python 3 (ipykernel)",
+ "display_name": ".venv",
"language": "python",
"name": "python3"
},
@@ -2165,7 +2410,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.11.4"
+ "version": "3.12.1"
}
},
"nbformat": 4,