diff --git a/docs/explanation/file-reading.ipynb b/docs/explanation/file-reading.ipynb new file mode 100644 index 0000000..71f2603 --- /dev/null +++ b/docs/explanation/file-reading.ipynb @@ -0,0 +1,596 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "229c30c0-9715-48a2-b5fe-ee8c733d847a", + "metadata": {}, + "source": [ + "# File Reading\n", + "\n", + "When working with dataset files, maintaining a clear separation between file reading and data analysis workflows can significantly improve control and clarity. In TableGPT Agent, we've designed a robust, structured approach to file reading that lets the LLM (Large Language Model) analyze dataset files effectively without being overwhelmed by unnecessary details. This method not only enhances the LLM's ability to inspect the data but also makes the overall analysis smoother and more reliable.\n", + "\n", + "Traditionally, allowing an LLM to inspect a dataset directly might involve simply calling the `df.head()` function to preview its content. While this approach suffices for straightforward use cases, it often lacks depth when dealing with more complex or messy datasets. To address this, we've developed a multi-step file reading workflow designed to deliver richer insights into the dataset structure while preparing it for advanced analysis." + ] + }, + { + "cell_type": "markdown", + "id": "a6ffbe96-f066-4b10-a743-0e9da6d41cbd", + "metadata": {}, + "source": [ + "**Here's how the workflow unfolds:**" + ] + }, + { + "cell_type": "markdown", + "id": "f9ba4763-5784-4c39-8e99-6156061e35bf", + "metadata": {}, + "source": [ + "## Normalization (Optional)\n", + "\n", + "Not all files are immediately suitable for direct analysis. Excel files, in particular, can pose challenges: irregular formatting, merged cells, and inconsistent headers are just a few examples. To tackle these issues, we introduce an optional normalization step that preprocesses the data, transforming it into a format that is “pandas-friendly.”\n", + "\n", + "This step addresses the most common quirks in Excel files, such as non-standard column headers, inconsistent row structures, or missing metadata. Resolving these issues upfront ensures smooth integration with downstream processes.\n", + "\n", + "**Example Scenario:**\n", + "\n", + "Imagine you have an Excel file that looks like this:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "c83bfe50-176b-4781-a4f6-ba809aa54750", + "metadata": {}, + "outputs": [ + { + "data": {
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
产品生产统计表
生产日期制造编号产品名称预定产量本日产量累计产量耗费工时
Unnamed: 0_level_2Unnamed: 1_level_2Unnamed: 2_level_2Unnamed: 3_level_2预计实际Unnamed: 6_level_2本日累计
02007-08-10 00:00:00FK-001猕猴桃果肉饮料100000.040000450008300010.020.0
12007-08-11 00:00:00FK-002西瓜果肉饮料100000.04000044000820009.018.0
22007-08-12 00:00:00FK-003草莓果肉饮料100000.04000045000830009.018.0
32007-08-13 00:00:00FK-004蓝莓果肉饮料100000.04000045000830009.018.0
42007-08-14 00:00:00FK-005水密桃果肉饮料100000.040000450008300010.020.0
\n", + "
" + ], + "text/plain": [ + " 产品生产统计表 \n", + " 生产日期 制造编号 产品名称 预定产量 本日产量 累计产量 耗费工时 \n", + " Unnamed: 0_level_2 Unnamed: 1_level_2 Unnamed: 2_level_2 Unnamed: 3_level_2 预计 实际 Unnamed: 6_level_2 本日 累计\n", + "0 2007-08-10 00:00:00 FK-001 猕猴桃果肉饮料 100000.0 40000 45000 83000 10.0 20.0\n", + "1 2007-08-11 00:00:00 FK-002 西瓜果肉饮料 100000.0 40000 44000 82000 9.0 18.0\n", + "2 2007-08-12 00:00:00 FK-003 草莓果肉饮料 100000.0 40000 45000 83000 9.0 18.0\n", + "3 2007-08-13 00:00:00 FK-004 蓝莓果肉饮料 100000.0 40000 45000 83000 9.0 18.0\n", + "4 2007-08-14 00:00:00 FK-005 水密桃果肉饮料 100000.0 40000 45000 83000 10.0 20.0" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Load the data into a DataFrame\n", + "df1 = read_df('产品生产统计表.xlsx', header=[0, 1, 2])\n", + "df1.head(5)" + ] + }, + { + "cell_type": "markdown", + "id": "0062d50e-63c9-4be2-bc15-7ebc15b23e4e", + "metadata": {}, + "source": [ + "The file is riddled with merged cells, empty rows, and redundant formatting that make it incompatible with pandas. If you try to load this file directly, pandas might misinterpret the structure or fail to parse it entirely.\n", + "\n", + "With our normalization feature, irregular datasets can be seamlessly transformed into clean, structured formats. When using the `create_tablegpt_agent` method, simply pass the `normalize_llm` parameter. The system will automatically analyze the irregular data and generate the appropriate transformation code, ensuring the dataset is prepared in the optimal format for further analysis.\n", + "\n", + "Below is an example of the code generated for the provided irregular dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "9933fabd-d951-4da6-9bcd-a6511b12bc1b", + "metadata": {}, + "outputs": [], + "source": [ + "# Normalize the data\n", + "try:\n", + " df = df1.copy()\n", + "\n", + " import pandas as pd\n", + "\n", + " # Assuming the original data is loaded into a DataFrame named df\n", + " # Here is the transformation process:\n", + "\n", + " # Step 1: Isolate the Table Header\n", + " # Remove the unnecessary top rows and columns\n", + " final_df = df.iloc[2:, :9].copy()\n", + "\n", + " # Step 2: Rename Columns of final_df\n", + " # Adjust the column names to match the desired format\n", + " final_df.columns = ['生产日期', '制造编号', '产品名称', '预定产量', '本日产量预计', '本日产量实际', '累计产量', '本日耗费工时', '累计耗费工时']\n", + "\n", + " # Step 3: Data Processing\n", + " # Ensure there are no NaN values and drop any duplicate rows if necessary\n", + " final_df.dropna(inplace=True)\n", + " final_df.drop_duplicates(inplace=True)\n", + "\n", + " # Convert the appropriate columns to numeric types\n", + " final_df['预定产量'] = final_df['预定产量'].astype(int)\n", + " final_df['本日产量预计'] = final_df['本日产量预计'].astype(int)\n", + " final_df['本日产量实际'] = final_df['本日产量实际'].astype(int)\n", + " final_df['累计产量'] = final_df['累计产量'].astype(int)\n", + " final_df['本日耗费工时'] = final_df['本日耗费工时'].astype(int)\n", + " final_df['累计耗费工时'] = final_df['累计耗费工时'].astype(int)\n", + "\n", + " # Display the transformed DataFrame\n", + " if final_df.columns.tolist() == final_df.iloc[0].tolist():\n", + " final_df = final_df.iloc[1:]\n", + "\n", + " # reassign df1 with the formatted DataFrame\n", + " df1 = final_df\n", + "except Exception as e:\n", + " # Unable to apply formatting to the original DataFrame. 
+ { + "cell_type": "markdown", + "id": "7f3b2c10-9d8e-4a1b-b2c3-d4e5f6a7b8c9", + "metadata": {}, + "source": [ + "Below is an example of the code generated for the provided irregular dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "9933fabd-d951-4da6-9bcd-a6511b12bc1b", + "metadata": {}, + "outputs": [], + "source": [ + "# Normalize the data\n", + "try:\n", + "    df = df1.copy()\n", + "\n", + "    import pandas as pd\n", + "\n", + "    # Assuming the original data is loaded into a DataFrame named df\n", + "    # Here is the transformation process:\n", + "\n", + "    # Step 1: Isolate the Table Header\n", + "    # Remove the unnecessary top rows and columns\n", + "    final_df = df.iloc[2:, :9].copy()\n", + "\n", + "    # Step 2: Rename Columns of final_df\n", + "    # Adjust the column names to match the desired format\n", + "    final_df.columns = ['生产日期', '制造编号', '产品名称', '预定产量', '本日产量预计', '本日产量实际', '累计产量', '本日耗费工时', '累计耗费工时']\n", + "\n", + "    # Step 3: Data Processing\n", + "    # Ensure there are no NaN values and drop any duplicate rows if necessary\n", + "    final_df.dropna(inplace=True)\n", + "    final_df.drop_duplicates(inplace=True)\n", + "\n", + "    # Convert the appropriate columns to numeric types\n", + "    final_df['预定产量'] = final_df['预定产量'].astype(int)\n", + "    final_df['本日产量预计'] = final_df['本日产量预计'].astype(int)\n", + "    final_df['本日产量实际'] = final_df['本日产量实际'].astype(int)\n", + "    final_df['累计产量'] = final_df['累计产量'].astype(int)\n", + "    final_df['本日耗费工时'] = final_df['本日耗费工时'].astype(int)\n", + "    final_df['累计耗费工时'] = final_df['累计耗费工时'].astype(int)\n", + "\n", + "    # Drop a stray header row if the first data row merely repeats the column names\n", + "    if final_df.columns.tolist() == final_df.iloc[0].tolist():\n", + "        final_df = final_df.iloc[1:]\n", + "\n", + "    # Reassign df1 with the formatted DataFrame\n", + "    df1 = final_df\n", + "except Exception as e:\n", + "    # Unable to apply formatting to the original DataFrame; proceeding with the unformatted one.\n", + "    print(f\"Reformat failed with error {e}; using the original DataFrame.\")" + ] + }, + { + "cell_type": "markdown", + "id": "2b589ac3-405c-4350-84f7-bf675ddaaa06", + "metadata": {}, + "source": [ + "Using the generated transformation code, the irregular dataset is converted into a clean, structured format, ready for analysis:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "76efd557-333b-46c3-a697-644a84b8e6ec", + "metadata": {}, + "outputs": [ + { + "data": {
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
生产日期制造编号产品名称预定产量本日产量预计本日产量实际累计产量本日耗费工时累计耗费工时
22007-08-12 00:00:00FK-003草莓果肉饮料100000400004500083000918
32007-08-13 00:00:00FK-004蓝莓果肉饮料100000400004500083000918
42007-08-14 00:00:00FK-005水密桃果肉饮料1000004000045000830001020
52007-08-15 00:00:00FK-006荔枝果肉饮料1000004000044000820001020
62007-08-16 00:00:00FK-007樱桃果肉饮料100000400004600084000918
\n", + "
" + ], + "text/plain": [ + " 生产日期 制造编号 产品名称 预定产量 本日产量预计 本日产量实际 累计产量 本日耗费工时 累计耗费工时\n", + "2 2007-08-12 00:00:00 FK-003 草莓果肉饮料 100000 40000 45000 83000 9 18\n", + "3 2007-08-13 00:00:00 FK-004 蓝莓果肉饮料 100000 40000 45000 83000 9 18\n", + "4 2007-08-14 00:00:00 FK-005 水密桃果肉饮料 100000 40000 45000 83000 10 20\n", + "5 2007-08-15 00:00:00 FK-006 荔枝果肉饮料 100000 40000 44000 82000 10 20\n", + "6 2007-08-16 00:00:00 FK-007 樱桃果肉饮料 100000 40000 46000 84000 9 18" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df1.head(5)" + ] + }, + { + "cell_type": "markdown", + "id": "ac19e7a0-5487-4e02-80c0-f7dc48903df4", + "metadata": {}, + "source": [ + "## Dataset Structure Overview \n", + "\n", + "After normalization, the next step dives into the structural aspects of the dataset using the `df.info()` function. Unlike `df.head()`, which only shows a snippet of the data, `df.info()` provides a holistic view of the dataset’s structure. Key insights include:\n", + "\n", + "- **Column Data Types**: Helps identify numerical, categorical, or textual data at a glance.\n", + "- **Non-Null Counts**: Reveals the completeness of each column, making it easy to spot potential gaps or inconsistencies.\n", + "- **Memory Usage**: Offers a sense of the dataset's size, crucial for performance optimization in larger workflows.\n", + "\n", + "By focusing on the foundational structure of the dataset, this step enables the LLM to better understand the quality and layout of the data, paving the way for more informed analyses." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "2acf71a1-0e81-4f14-973e-05dfe1c9d963", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Index: 18 entries, 2 to 19\n", + "Data columns (total 9 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 生产日期 18 non-null object\n", + " 1 制造编号 18 non-null object\n", + " 2 产品名称 18 non-null object\n", + " 3 预定产量 18 non-null int64 \n", + " 4 本日产量预计 18 non-null int64 \n", + " 5 本日产量实际 18 non-null int64 \n", + " 6 累计产量 18 non-null int64 \n", + " 7 本日耗费工时 18 non-null int64 \n", + " 8 累计耗费工时 18 non-null int64 \n", + "dtypes: int64(6), object(3)" + ] + } + ], + "source": [ + "# Remove leading and trailing whitespaces in column names\n", + "df1.columns = df1.columns.str.strip()\n", + "\n", + "# Remove rows and columns that contain only empty values\n", + "df1 = df1.dropna(how='all').dropna(axis=1, how='all')\n", + "\n", + "# Get the basic information of the dataset\n", + "df1.info(memory_usage=False)" + ] + }, + { + "cell_type": "markdown", + "id": "226cebc1-e38c-4da8-8455-662db9c152f6", + "metadata": {}, + "source": [ + "## Dataset Content Preview\n", + "\n", + "Finally, we utilize the `df.head()` function to provide a **visual preview of the dataset’s content**. This step is crucial for understanding the actual values within the dataset—patterns, anomalies, or trends often become apparent here.\n", + "\n", + "The number of rows displayed (`n`) is configurable to balance between granularity and simplicity. For smaller datasets or detailed exploration, a larger `n` might be beneficial. However, for larger datasets, displaying too many rows could overwhelm the LLM with excessive details, detracting from the primary analytical objectives." 
+ { + "cell_type": "code", + "execution_count": 5, + "id": "37ffeb0f-a80f-4ca8-9fda-fa2054870acf", + "metadata": {}, + "outputs": [ + { + "data": {
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
生产日期制造编号产品名称预定产量本日产量预计本日产量实际累计产量本日耗费工时累计耗费工时
22007-08-12 00:00:00FK-003草莓果肉饮料100000400004500083000918
32007-08-13 00:00:00FK-004蓝莓果肉饮料100000400004500083000918
42007-08-14 00:00:00FK-005水密桃果肉饮料1000004000045000830001020
52007-08-15 00:00:00FK-006荔枝果肉饮料1000004000044000820001020
62007-08-16 00:00:00FK-007樱桃果肉饮料100000400004600084000918
\n", + "
" + ], + "text/plain": [ + " 生产日期 制造编号 产品名称 预定产量 本日产量预计 本日产量实际 累计产量 本日耗费工时 累计耗费工时\n", + "2 2007-08-12 00:00:00 FK-003 草莓果肉饮料 100000 40000 45000 83000 9 18\n", + "3 2007-08-13 00:00:00 FK-004 蓝莓果肉饮料 100000 40000 45000 83000 9 18\n", + "4 2007-08-14 00:00:00 FK-005 水密桃果肉饮料 100000 40000 45000 83000 10 20\n", + "5 2007-08-15 00:00:00 FK-006 荔枝果肉饮料 100000 40000 44000 82000 10 20\n", + "6 2007-08-16 00:00:00 FK-007 樱桃果肉饮料 100000 40000 46000 84000 9 18" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Show the first 5 rows to understand the structure\n", + "df1.head(5)" + ] + }, + { + "cell_type": "markdown", + "id": "a7b90c69-cf82-4711-9229-242113d30804", + "metadata": {}, + "source": [ + "## Why This Matters\n", + "\n", + "This structured, multi-step approach is not just about processing data; it's about making the LLM smarter in how it interacts with datasets. By systematically addressing issues like messy formatting, structural ambiguity, and information overload, we ensure the LLM operates with clarity and purpose.\n", + "\n", + "The separation of file reading from analysis offers several advantages:\n", + "\n", + "- Enhanced Accuracy: Preprocessing and structure-checking reduce the risk of errors in downstream analyses.\n", + "- Scalability: Handles datasets of varying complexity and size with equal efficiency.\n", + "- Transparency: Provides clear visibility into the dataset’s structure, enabling better decision-making.\n", + "\n", + "By adopting this method, TableGPT Agent transforms the way dataset files are read and analyzed, offering a smarter, more controlled, and ultimately more **user-friendly experience**." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/explanation/file-reading.md b/docs/explanation/file-reading.md deleted file mode 100644 index 3fbcf7f..0000000 --- a/docs/explanation/file-reading.md +++ /dev/null @@ -1,9 +0,0 @@ -# File Reading - -TableGPT Agent separates the file reading workflow from the data analysis workflow to maintain greater control over how the LLM inspects the dataset files. Typically, if you let the LLM inspect the dataset itself, it uses the `df.head()` function to preview the data. While this is sufficient for basic cases, we have implemented a more structured approach by hard-coding the file reading workflow into several steps: - -- `normalization` (optional): For some Excel files, the content may not be 'pandas-friendly'. We include an optional normalization step to transform the Excel content into a more suitable format for pandas. -- `df.info()`: Unlike `df.head()`, `df.info()` provides insights into the dataset's structure, such as the data types of each column and the number of non-null values, which also indicates whether a column contains NaN. This insight helps the LLM understand the structure and quality of the data. -- `df.head()`: The final step displays the first n rows of the dataset, where n is configurable. A larger value for n allows the LLM to glean more information from the dataset; however, too much detail may divert its attention from the primary task. 
- - diff --git a/docs/index.md b/docs/index.md index 540c2d3..effbb29 100644 --- a/docs/index.md +++ b/docs/index.md @@ -21,7 +21,7 @@ tablegpt-agent is a pre-built agent for [TableGPT2 (huggingface)](https://huggin - [Normalize Datasets](howto/normalize-datasets.md) - Explanation - [Agent Workflow](explanation/agent-workflow.md) - - [File Reading](explanation/file-reading.md) + - [File Reading](explanation/file-reading.ipynb) - [Reference](reference.md) ## Contributing diff --git a/mkdocs.yml b/mkdocs.yml index 78fe592..30750a1 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -56,7 +56,7 @@ nav: - Reference: reference.md - Explanation: - 'Agent Workflow': explanation/agent-workflow.md - - 'File Reading': explanation/file-reading.md + - 'File Reading': explanation/file-reading.ipynb repo_name: tablegpt/tablegpt-agent repo_url: https://github.com/tablegpt/tablegpt-agent