Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: use ipynb tutorial #100

Merged
merged 1 commit into from
Nov 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,9 @@ tablegpt-agent is a pre-built agent for [TableGPT2 (huggingface)](https://huggin

<!-- mkdocs requires 4 space to intent the list -->
- Tutorials
- [Quickstart](tutorials/quickstart.md)
- [Chat on Tabular Data](tutorials/chat-on-tabular-data.md)
- [Continue Analysis on Generated Charts](tutorials/continue-analysis-on-generated-charts.md)
- [Quickstart](tutorials/quick-start.ipynb)
- [Chat on Tabular Data](tutorials/chat-on-tabular-data.ipynb)
- [Continue Analysis on Generated Charts](tutorials/continue-analysis-on-generated-charts.ipynb)
- How-To Guides
- [Enhance TableGPT Agent with RAG](howto/retrieval.md)
- [Persist Messages](howto/persist-messages.md)
Expand Down
8 changes: 8 additions & 0 deletions docs/stylesheets/extra.css
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
/* hide jupyter notebooks input/output numbers */
.jp-InputPrompt {
display: none !important;
}

.jp-OutputPrompt {
display: none !important;
}
190 changes: 190 additions & 0 deletions docs/tutorials/chat-on-tabular-data.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,190 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1944f5bf",
"metadata": {},
"source": [
"# Chat on Tabular Data\n",
"\n",
"TableGPT Agent excels at analyzing and processing tabular data. To perform data analysis, you need to first let the agent \"see\" the dataset. This is done by a specific \"file-reading\" workflow. In short, you begin by \"uploading\" the dataset and let the agent read it. Once the data is read, you can ask the agent questions about it.\n",
"\n",
"> To learn more about the file-reading workflow, see [File Reading](./explanation/file-reading.md).\n",
"\n",
"For data analysis tasks, we introduce two important parameters when creating the agent: `checkpointer` and `session_id`.\n",
"\n",
"- The `checkpointer` should be an instance of `langgraph.checkpoint.base.BaseCheckpointSaver`, which acts as a versioned \"memory\" for the agent. (See [langgraph's persistence concept](https://langchain-ai.github.io/langgraph/concepts/persistence) for more details.)\n",
"- The `session_id` is a unique identifier for the current session. It ties the agent's execution to a specific kernel, ensuring that the agent's results are retained across multiple invocations.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ec321eaa",
"metadata": {},
"outputs": [],
"source": [
"from langgraph.checkpoint.memory import MemorySaver\n",
"\n",
"checkpointer = MemorySaver()\n",
"agent = create_tablegpt_graph(\n",
" llm=llm,\n",
" pybox_manager=pybox_manager,\n",
" checkpointer=checkpointer,\n",
" session_id=\"some-session-id\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b2554859",
"metadata": {},
"source": [
"Add the file for processing in the additional_kwargs of HumanMessage. Here's an example using the [Titanic dataset](https://github.com/tablegpt/tablegpt-agent/blob/main/examples/datasets/titanic.csv).\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "55a52fb7",
"metadata": {},
"outputs": [],
"source": [
"from typing import TypedDict\n",
"from langchain_core.messages import HumanMessage\n",
"\n",
"class Attachment(TypedDict):\n",
" \"\"\"Contains at least one dictionary with the key filename.\"\"\"\n",
" filename: str\n",
"\n",
"attachment_msg = HumanMessage(\n",
" content=\"\",\n",
" # Please make sure your iPython kernel can access your filename.\n",
" additional_kwargs={\"attachments\": [Attachment(filename=\"titanic.csv\")]},\n",
")"
]
},
{
"cell_type": "markdown",
"id": "0630560d",
"metadata": {},
"source": [
"Invoke the agent as shown in the quick start:\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ce20b1b0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[HumanMessage(content='', additional_kwargs={'attachments': [{'filename': 'titanic.csv'}]}, response_metadata={}, id='ab0a7157-ad7d-4de8-9b24-1bee78ad7c55'),\n",
" AIMessage(content=\"我已经收到您的数据文件,我需要查看文件内容以对数据集有一个初步的了解。首先我会读取数据到 `df` 变量中,并通过 `df.info` 查看 NaN 情况和数据类型。\\n```python\\n# Load the data into a DataFrame\\ndf = read_df('titanic.csv')\\n\\n# Remove leading and trailing whitespaces in column names\\ndf.columns = df.columns.str.strip()\\n\\n# Remove rows and columns that contain only empty values\\ndf = df.dropna(how='all').dropna(axis=1, how='all')\\n\\n# Get the basic information of the dataset\\ndf.info(memory_usage=False)\\n```\", additional_kwargs={'parent_id': 'some-parent-id1', 'thought': '我已经收到您的数据文件,我需要查看文件内容以对数据集有一个初步的了解。首先我会读取数据到 `df` 变量中,并通过 `df.info` 查看 NaN 情况和数据类型。', 'action': {'tool': 'python', 'tool_input': \"# Load the data into a DataFrame\\ndf = read_df('titanic.csv')\\n\\n# Remove leading and trailing whitespaces in column names\\ndf.columns = df.columns.str.strip()\\n\\n# Remove rows and columns that contain only empty values\\ndf = df.dropna(how='all').dropna(axis=1, how='all')\\n\\n# Get the basic information of the dataset\\ndf.info(memory_usage=False)\"}, 'model_type': None}, response_metadata={}, id='add6691d-d7ea-411d-9699-e99ae0b7de97', tool_calls=[{'name': 'python', 'args': {'query': \"# Load the data into a DataFrame\\ndf = read_df('titanic.csv')\\n\\n# Remove leading and trailing whitespaces in column names\\ndf.columns = df.columns.str.strip()\\n\\n# Remove rows and columns that contain only empty values\\ndf = df.dropna(how='all').dropna(axis=1, how='all')\\n\\n# Get the basic information of the dataset\\ndf.info(memory_usage=False)\"}, 'id': 'b846aa01-04ef-4669-9a5c-53ddcb9a2dfb', 'type': 'tool_call'}]),\n",
" ToolMessage(content=[{'type': 'text', 'text': \"```pycon\\n<class 'pandas.core.frame.DataFrame'>\\nRangeIndex: 4 entries, 0 to 3\\nData columns (total 8 columns):\\n # Column Non-Null Count Dtype \\n--- ------ -------------- ----- \\n 0 Pclass 4 non-null int64 \\n 1 Sex 4 non-null object \\n 2 Age 4 non-null float64\\n 3 SibSp 4 non-null int64 \\n 4 Parch 4 non-null int64 \\n 5 Fare 4 non-null float64\\n 6 Embarked 4 non-null object \\n 7 Survived 4 non-null int64 \\ndtypes: float64(2), int64(4), object(2)\\n```\"}], name='python', id='0d441b21-bff3-463c-a07f-c0b12bd17bc5', tool_call_id='b846aa01-04ef-4669-9a5c-53ddcb9a2dfb', artifact=[]),\n",
" AIMessage(content='接下来我将用 `df.head(5)` 来查看数据集的前 5 行。\\n```python\\n# Show the first 5 rows to understand the structure\\ndf.head(5)\\n```', additional_kwargs={'parent_id': 'some-parent-id1', 'thought': '接下来我将用 `df.head(5)` 来查看数据集的前 5 行。', 'action': {'tool': 'python', 'tool_input': '# Show the first 5 rows to understand the structure\\ndf.head(5)'}, 'model_type': None}, response_metadata={}, id='5e26ef1d-7042-471e-b39f-194a51a185c7', tool_calls=[{'name': 'python', 'args': {'query': '# Show the first 5 rows to understand the structure\\ndf.head(5)'}, 'id': 'f6be0d96-05b3-4b5b-8313-90197a8c3d87', 'type': 'tool_call'}]),\n",
" ToolMessage(content=[{'type': 'text', 'text': '```pycon\\n Pclass Sex Age SibSp Parch Fare Embarked Survived\\n0 2 female 29.0 0 2 23.000 S 1\\n1 3 female 39.0 1 5 31.275 S 0\\n2 3 male 26.5 0 0 7.225 C 0\\n3 3 male 32.0 0 0 56.496 S 1\\n```'}], name='python', id='6fc6d8aa-546c-467e-91d3-d57b0b62dd68', tool_call_id='f6be0d96-05b3-4b5b-8313-90197a8c3d87', artifact=[]),\n",
" AIMessage(content='我已经了解了数据集 titanic.csv 的基本信息。请问我可以帮您做些什么?', additional_kwargs={'parent_id': 'some-parent-id1'}, response_metadata={}, id='b6dc3885-94cb-4b0f-b691-f37c4c8c9ba3')]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from datetime import date\n",
"from tablegpt.agent.file_reading import Stage\n",
"\n",
"# Reading and processing files.\n",
"response = await agent.ainvoke(\n",
" input={\n",
" \"entry_message\": attachment_msg,\n",
" \"processing_stage\": Stage.UPLOADED,\n",
" \"messages\": [attachment_msg],\n",
" \"parent_id\": \"some-parent-id1\",\n",
" \"date\": date.today(),\n",
" },\n",
" config={\n",
" # Using checkpointer requires binding thread_id at runtime.\n",
" \"configurable\": {\"thread_id\": \"some-thread-id\"},\n",
" },\n",
")\n",
"response[\"messages\"]"
]
},
{
"cell_type": "markdown",
"id": "bed80e60",
"metadata": {},
"source": [
"Continue to ask questions for data analysis:\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "dcceeebf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"content=\"为了回答您的问题,我将筛选出所有男性乘客并计算其中的幸存者数量。\\n```python\\n# Filter male passengers who survived and count them\\nmale_survivors = df[(df['Sex'] == 'male') & (df['Survived'] == 1)]\\nmale_survivors_count = male_survivors.shape[0]\\nmale_survivors_count\\n```\" additional_kwargs={} response_metadata={'finish_reason': 'stop', 'model_name': 'TableGPT2-7B'} id='run-661d7496-341d-4a6b-84d8-b4094db66ef0'\n",
"content=[{'type': 'text', 'text': '```pycon\\n1\\n```'}] name='python' id='1c7531db-9150-451d-a8dd-f07176454e6f' tool_call_id='2860e8bb-0fa7-421b-bb2d-bfeca873354b' artifact=[]\n",
"content='根据数据集,有 1 名男性乘客幸存。' additional_kwargs={} response_metadata={'finish_reason': 'stop', 'model_name': 'TableGPT2-7B'} id='run-db640705-0085-4f47-adb4-3e0adce694cd'\n"
]
}
],
"source": [
"human_message = HumanMessage(content=\"How many men survived?\")\n",
"\n",
"async for event in agent.astream_events(\n",
" input={\n",
" # After using checkpoint, you only need to add new messages here.\n",
" \"messages\": [human_message],\n",
" \"parent_id\": \"some-parent-id2\",\n",
" \"date\": date.today(),\n",
" },\n",
" version=\"v2\",\n",
" # We configure the same thread_id to use checkpoints to retrieve the memory of the last run.\n",
" config={\"configurable\": {\"thread_id\": \"some-thread-id\"}},\n",
"):\n",
" event_name: str = event[\"name\"]\n",
" evt: str = event[\"event\"]\n",
" if evt == \"on_chat_model_end\":\n",
" print(event[\"data\"][\"output\"])\n",
" elif event_name == \"tools\" and evt == \"on_chain_stream\":\n",
" for lc_msg in event[\"data\"][\"chunk\"][\"messages\"]:\n",
" print(lc_msg)\n",
" else:\n",
" # Other events can be handled here.\n",
" pass\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Loading