
read_df not found #74

Closed
mosthandsomeman opened this issue Nov 12, 2024 · 20 comments
@mosthandsomeman

Running the official example, the tool call fails with: name 'read_df' is not defined

@edwardzjl edwardzjl self-assigned this Nov 12, 2024
@edwardzjl
Contributor

I'm sorry, due to limited manpower and time, the documentation still has many gaps.
read_df is an extension method provided by the TableGPT executor; we will add documentation on how to use it later.
For now, if you are using the local executor (pybox.LocalPyBoxManager), you can save the following Python code under $HOME/.ipython/profile_default/startup/, and the quickstart should then run:

import os
from pathlib import Path
from typing import NamedTuple, cast

import pandas as pd
import concurrent.futures


class FileEncoding(NamedTuple):
    """File encoding as the NamedTuple."""

    encoding: str | None
    """The encoding of the file."""
    confidence: float
    """The confidence of the encoding."""
    language: str | None
    """The language of the file."""


def detect_file_encodings(
    file_path: str | Path, timeout: int = 5
) -> list[FileEncoding]:
    """Try to detect the file encoding.

    Returns a list of `FileEncoding` tuples with the detected encodings ordered
    by confidence.

    Args:
        file_path: The path to the file to detect the encoding for.
        timeout: The timeout in seconds for the encoding detection.
    """
    import chardet

    file_path = str(file_path)

    def read_and_detect(file_path: str) -> list[dict]:
        with open(file_path, "rb") as f:
            rawdata = f.read()
        return cast(list[dict], chardet.detect_all(rawdata))

    with concurrent.futures.ThreadPoolExecutor() as executor:
        future = executor.submit(read_and_detect, file_path)
        try:
            encodings = future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            raise TimeoutError(
                f"Timeout reached while detecting encoding for {file_path}"
            )

    if all(encoding["encoding"] is None for encoding in encodings):
        raise RuntimeError(f"Could not detect encoding for {file_path}")
    return [FileEncoding(**enc) for enc in encodings if enc["encoding"] is not None]


def path_from_uri(uri: str) -> Path:
    """Return a new path from the given 'file' URI.
    This is implemented in Python 3.13.
    See <https://github.com/python/cpython/pull/107640>
    and <https://github.com/python/cpython/pull/107640/files#diff-fa525485738fc33d05b06c159172ff1f319c26e88d8c6bb39f7dbaae4dc4105c>
    TODO: remove when we migrate to Python 3.13"""
    if not uri.startswith("file:"):
        raise ValueError(f"URI does not start with 'file:': {uri!r}")
    path = uri[5:]
    if path[:3] == "///":
        # Remove empty authority
        path = path[2:]
    elif path[:12] == "//localhost/":
        # Remove 'localhost' authority
        path = path[11:]
    if path[:3] == "///" or (path[:1] == "/" and path[2:3] in ":|"):
        # Remove slash before DOS device/UNC path
        path = path[1:]
    if path[1:2] == "|":
        # Replace bar with colon in DOS drive
        path = path[:1] + ":" + path[2:]
    from urllib.parse import unquote_to_bytes

    path = Path(os.fsdecode(unquote_to_bytes(path)))
    if not path.is_absolute():
        raise ValueError(f"URI is not absolute: {uri!r}")
    return path


def file_extention(file: str) -> str:
    path = Path(file)
    return path.suffix


def read_df(uri: str, autodetect_encoding: bool = True, **kwargs) -> pd.DataFrame:
    """A simple wrapper to read different file formats into DataFrame."""
    try:
        return _read_df(uri, **kwargs)
    except UnicodeDecodeError as e:
        if autodetect_encoding:
            detected_encodings = detect_file_encodings(path_from_uri(uri), timeout=30)
            for encoding in detected_encodings:
                try:
                    return _read_df(uri, encoding=encoding.encoding, **kwargs)
                except UnicodeDecodeError:
                    continue
        # Either we ran out of detected encodings, or autodetect_encoding is False;
        # either way, raise an encoding error.
        raise ValueError(f"不支持的文件编码{e.encoding},请转换成 utf-8 后重试")


def _read_df(uri: str, encoding: str = "utf-8", **kwargs) -> pd.DataFrame:
    """A simple wrapper to read different file formats into DataFrame."""
    ext = file_extention(uri).lower()
    if ext == ".csv":
        df = pd.read_csv(uri, encoding=encoding, **kwargs)
    elif ext == ".tsv":
        df = pd.read_csv(uri, sep="\t", encoding=encoding, **kwargs)
    elif ext in [".xls", ".xlsx", ".xlsm", ".xlsb", ".odf", ".ods", ".odt"]:
        # read_excel does not support 'encoding' arg, also it seems that it does not need it.
        df = pd.read_excel(uri, **kwargs)
    else:
        raise ValueError(
            f"TableGPT 目前支持 csv、tsv 以及 xlsx 文件,您上传的文件格式 {ext} 暂不支持。"
        )
    return df
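
A minimal sanity check, assuming the script above is saved as $HOME/.ipython/profile_default/startup/read_df.py and that a CSV such as examples/datasets/titanic.csv exists: in a fresh IPython kernel, read_df should already be defined by the startup script, so no import is needed.

# Run inside a new IPython kernel; read_df comes from the startup script.
df = read_df("examples/datasets/titanic.csv")
print(df.shape)
df.head()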

@jianpugh

After adding this file, it still doesn't seem to run:

openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': '[{\'type\': \'string_type\', \'loc\': (\'body\', \'messages\', 2, \'tool_calls\'), \'msg\': \'Input should be a valid string\', \'input\': [{\'type\': \'function\', \'id\': \'36bbe686-92f7-4a99-adf5-99135b4db4ee\', \'function\': {\'name\': \'python\', \'arguments\': \'{"query": "# Load the data into a DataFrame\\\\ndf = read_df(\\\'examples/datasets/titanic.csv\\\')\\\\n\\\\n# Remove leading and trailing whitespaces in column names\\\\ndf.columns = df.columns.str.strip()\\\\n\\\\n# Remove rows and columns that contain only empty values\\\\ndf = df.dropna(how=\\\'all\\\').dropna(axis=1, how=\\\'all\\\')\\\\n\\\\n# Get the basic information of the dataset\\\\ndf.info(memory_usage=False)"}\'}}], \'url\': \'https://errors.pydantic.dev/2.6/v/string_type\'}, {\'type\': \'string_type\', \'loc\': (\'body\', \'messages\', 3, \'content\'), \'msg\': \'Input should be a valid string\', \'input\': [{\'type\': \'text\', \'text\': "```pycon\\n<class \'pandas.core.frame.DataFrame\'>\\nRangeIndex: 4 entries, 0 to 3\\nData columns (total 8 columns):\\n # Column Non-Null Count Dtype \\n--- ------ -------------- ----- \\n 0 Pclass 4 non-null int64 \\n 1 Sex 4 non-null object \\n 2 Age 4 non-null float64\\n 3 SibSp 4 non-null int64 \\n 4 Parch 4 non-null int64 \\n 5 Fare 4 non-null float64\\n 6 Embarked 4 non-null object \\n 7 Survived 4 non-null int64 \\ndtypes: float64(2), int64(4), object(2)\\n```"}], \'url\': \'https://errors.pydantic.dev/2.6/v/string_type\'}, {\'type\': \'string_type\', \'loc\': (\'body\', \'messages\', 4, \'tool_calls\'), \'msg\': \'Input should be a valid string\', \'input\': [{\'type\': \'function\', \'id\': \'5cafd675-8191-4cd2-9f79-dca9aa6f5906\', \'function\': {\'name\': \'python\', \'arguments\': \'{"query": "# Show the first 5 rows to understand the structure\\\\ndf.head(5)"}\'}}], \'url\': \'https://errors.pydantic.dev/2.6/v/string_type\'}, {\'type\': \'string_type\', \'loc\': (\'body\', \'messages\', 5, \'content\'), \'msg\': \'Input should be a valid string\', \'input\': [{\'type\': \'text\', \'text\': \'```pycon\\n Pclass Sex Age SibSp Parch Fare Embarked Survived\\n0 2 female 29.0 0 2 23.0000 S 1\\n1 3 female 39.0 1 5 31.2750 S 0\\n2 3 male 26.5 0 0 7.2250 C 0\\n3 3 male 32.0 0 0 56.4958 S 1\\n```\'}], \'url\': \'https://errors.pydantic.dev/2.6/v/string_type\'}]', 'type': 'BadRequestError', 'param': None, 'code': 400}

@vegetablest
Contributor

vegetablest commented Nov 12, 2024

@jianpugh Hi, which vLLM version are you using?

@jianpugh

@jianpugh Hi, which vLLM version are you using?

v0.4.0

@vegetablest
Contributor

@jianpugh Hi, which vLLM version are you using?

v0.4.0

Okay, please try upgrading vLLM and see if that helps.

@zTaoplus

@jianpugh Hi, which vLLM version are you using?

v0.4.0

Thanks for trying the project and for your feedback. After testing, please make sure your vLLM version is >= 0.5.5; the data_analysis feature of tablegpt-agent should then respond correctly.
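
For reference, a quick way to verify the installed version before and after upgrading (a small sketch, assuming a standard pip installation):

from importlib.metadata import version

# The suggestion above is vLLM >= 0.5.5.
print(version("vllm"))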

@jianpugh

It does work after upgrading, thanks for the guidance!

@jianpugh

Sorry to bother you again, I have two more questions:

  1. Where does the final answer usually appear? Looking at what gets printed on the command line, it does not always seem to be on the last line.
  2. I uploaded an xlsx file for Q&A and it did not get answered; the intermediate steps seem to report that the file type is not supported?

{'event': 'on_chain_end', 'data': {'output': AgentFinish(return_values={'output': '我已经了解了数据的基本结构。接下来,请告诉我数据集中的列名是什么,以便我继续分析。'}, log='我已经了解了数据的基本结构。接下来,请告诉我数据集中的列名是什么,以便我继续分析。'), 'input': {'messages': [HumanMessage(content='文件名称: examples/datasets/guangdong.xlsx', additional_kwargs={'attachments': [{'filename': 'examples/datasets/guangdong.xlsx'}]}, response_metadata={}, id='74d74112-32d8-4ea6-85fa-a38f6386f58e'), AIMessage(content="我已经收到您的数据文件,我需要查看文件内容以对数据集有一个初步的了解。首先我会读取数据到 df变量中,并通过df.info查看 NaN 情况和数据类型。\n```python\n# Load the data into a DataFrame\ndf = read_df('examples/datasets/guangdong.xlsx')\n\n# Remove leading and trailing whitespaces in column names\ndf.columns = df.columns.str.strip()\n\n# Remove rows and columns that contain only empty values\ndf = df.dropna(how='all').dropna(axis=1, how='all')\n\n# Get the basic information of the dataset\ndf.info(memory_usage=False)\n```", additional_kwargs={'parent_id': 'some-parent-id1', 'thought': '我已经收到您的数据文件,我需要查看文件内容以对数据集有一个初步的了解。首先我会读取数据到df变量中,并通过df.info查看 NaN 情况和数据类型。', 'action': {'tool': 'python', 'tool_input': "# Load the data into a DataFrame\ndf = read_df('examples/datasets/guangdong.xlsx')\n\n# Remove leading and trailing whitespaces in column names\ndf.columns = df.columns.str.strip()\n\n# Remove rows and columns that contain only empty values\ndf = df.dropna(how='all').dropna(axis=1, how='all')\n\n# Get the basic information of the dataset\ndf.info(memory_usage=False)"}, 'model_type': None}, response_metadata={}, id='0cfeeed4-6873-4be1-9b8a-b1882bfa4b8c', tool_calls=[{'name': 'python', 'args': {'query': "# Load the data into a DataFrame\ndf = read_df('examples/datasets/guangdong.xlsx')\n\n# Remove leading and trailing whitespaces in column names\ndf.columns = df.columns.str.strip()\n\n# Remove rows and columns that contain only empty values\ndf = df.dropna(how='all').dropna(axis=1, how='all')\n\n# Get the basic information of the dataset\ndf.info(memory_usage=False)"}, 'id': '7bcc44a4-3d59-4cbc-b6ee-3a92d808f086', 'type': 'tool_call'}]), ToolMessage(content=[{'type': 'text', 'text': '```pycon\n---------------------------------------------------------------------------\nModuleNotFoundError Traceback (most recent call last)\nFile D:\\environment\\miniconda3\\envs\\TableGPT-Agent\\Lib\\site-packages\\pandas\\compat\\_optional.py:135, in import_optional_dependency(name, extra, errors, min_version)\n 134 try:\n--> 135 module = importlib.import_module(name)\n 136 except ImportError:\n\nFile D:\\environment\\miniconda3\\envs\\TableGPT-Agent\\Lib\\importlib\\__init__.py:126, in import_module(name, package)\n 125 level += 1\n--> 126 return _bootstrap._gcd_import(name[level:], package, level)\n\nFile <frozen importlib._bootstrap>:1204, in _gcd_import(name, package, level)\n\nFile <frozen importlib._bootstrap>:1176, in _find_and_load(name, import_)\n\nFile <frozen importlib._bootstrap>:1140, in _find_and_load_unlocked(name, import_)\n\nModuleNotFoundError: No module named \'openpyxl\'\n\nDuring handling of the above exception, another exception occurred:\n\nImportError Traceback (most recent call last)\nCell In[1], line 2\n 1 # Load the data into a DataFrame\n----> 2 df = read_df(\'examples/datasets/guangdong.xlsx\')\n 4 # Remove leading and trailing whitespaces in column names\n 5 df.columns = df.columns.str.strip()\n\nFile ~\\.ipython\\profile_default\\startup\\read_df.py:92, in read_df(uri, autodetect_encoding, **kwargs)\n 90 """A simple wrapper to read different file formats into DataFrame."""\n 91 try:\n---> 92 return _read_df(uri, 
**kwargs)\n 93 except UnicodeDecodeError as e:\n 94 if autodetect_encoding:\n\nFile ~\\.ipython\\profile_default\\startup\\read_df.py:115, in _read_df(uri, encoding, **kwargs)\n 112 df = pd.read_csv(uri, sep="\\t", encoding=encoding, **kwargs)\n 113 elif ext in [".xls", ".xlsx", ".xlsm", ".xlsb", ".odf", ".ods", ".odt"]:\n 114 # read_excel does not support \'encoding\' arg, also it seems that it does not need it.\n--> 115 df = pd.read_excel(uri, **kwargs)\n 116 else:\n 117 raise ValueError(\n 118 f"TableGPT 目前支持 csv、tsv 以及 xlsx 文件,您上传的文件格式 {ext} 暂不支持。"\n 119 )\n\nFile D:\\environment\\miniconda3\\envs\\TableGPT-Agent\\Lib\\site-packages\\pandas\\io\\excel\\_base.py:495, in read_excel(io, sheet_name, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, date_format, thousands, decimal, comment, skipfooter, storage_options, dtype_backend, engine_kwargs)\n 493 if not isinstance(io, ExcelFile):\n 494 should_close = True\n--> 495 io = ExcelFile(\n 496 io,\n 497 storage_options=storage_options,\n 498 engine=engine,\n 499 engine_kwargs=engine_kwargs,\n 500 )\n 501 elif engine and engine != io.engine:\n 502 raise ValueError(\n 503 "Engine should not be specified when passing "\n 504 "an ExcelFile - ExcelFile already has the engine set"\n 505 )\n\nFile D:\\environment\\miniconda3\\envs\\TableGPT-Agent\\Lib\\site-packages\\pandas\\io\\excel\\_base.py:1567, in ExcelFile.__init__(self, path_or_buffer, engine, storage_options, engine_kwargs)\n 1564 self.engine = engine\n 1565 self.storage_options = storage_options\n-> 1567 self._reader = self._engines[engine](\n 1568 self._io,\n 1569 storage_options=storage_options,\n 1570 engine_kwargs=engine_kwargs,\n 1571 )\n\nFile D:\\environment\\miniconda3\\envs\\TableGPT-Agent\\Lib\\site-packages\\pandas\\io\\excel\\_openpyxl.py:552, in OpenpyxlReader.__init__(self, filepath_or_buffer, storage_options, engine_kwargs)\n 534 @doc(storage_options=_shared_docs["storage_options"])\n 535 def __init__(\n 536 self,\n (...)\n 539 engine_kwargs: dict | None = None,\n 540 ) -> None:\n 541 """\n 542 Reader using openpyxl engine.\n 543 \n (...)\n 550 Arbitrary keyword arguments passed to excel engine.\n 551 """\n--> 552 import_optional_dependency("openpyxl")\n 553 super().__init__(\n 554 filepath_or_buffer,\n 555 storage_options=storage_options,\n 556 engine_kwargs=engine_kwargs,\n 557 )\n\nFile D:\\environment\\miniconda3\\envs\\TableGPT-Agent\\Lib\\site-packages\\pandas\\compat\\_optional.py:138, in import_optional_dependency(name, extra, errors, min_version)\n 136 except ImportError:\n 137 if errors == "raise":\n--> 138 raise ImportError(msg)\n 139 return None\n 141 # Handle submodules: if we have submodule, grab parent module from sys.modules\n\nImportError: Missing optional dependency \'openpyxl\'. 
Use pip or conda to install openpyxl.\n```'}], name='python', id='813a5a06-ba00-4bf2-89e8-8d66f26f7870', tool_call_id='7bcc44a4-3d59-4cbc-b6ee-3a92d808f086', artifact=[]), AIMessage(content='接下来我将用df.head(5)来查看数据集的前 5 行。\n```python\n# Show the first 5 rows to understand the structure\ndf.head(5)\n```', additional_kwargs={'parent_id': 'some-parent-id1', 'thought': '接下来我将用df.head(5)来查看数据集的前 5 行。', 'action': {'tool': 'python', 'tool_input': '# Show the first 5 rows to understand the structure\ndf.head(5)'}, 'model_type': None}, response_metadata={}, id='47b1bd3d-25a1-46e3-aaf2-18c2dbf0ae1e', tool_calls=[{'name': 'python', 'args': {'query': '# Show the first 5 rows to understand the structure\ndf.head(5)'}, 'id': 'f3f884b4-5c35-4893-9ac4-bb7fd20c6a9a', 'type': 'tool_call'}]), ToolMessage(content=[{'type': 'text', 'text': "```pycon\n---------------------------------------------------------------------------\nNameError Traceback (most recent call last)\nCell In[2], line 2\n 1 # Show the first 5 rows to understand the structure\n----> 2 df.head(5)\n\nNameError: name 'df' is not defined\n```"}], name='python', id='bf67b874-9e94-453f-8ea7-39222a89d0fd', tool_call_id='f3f884b4-5c35-4893-9ac4-bb7fd20c6a9a', artifact=[]), AIMessage(content='我已经了解了数据集 examples/datasets/guangdong.xlsx 的基本信息。请问我可以帮您做些什么?', additional_kwargs={'parent_id': 'some-parent-id1'}, response_metadata={}, id='4f984f2b-5a38-499a-9c86-ad1b27de3540'), HumanMessage(content='有哪些专业的学费不超过4300', additional_kwargs={}, response_metadata={}, id='77dbdef2-10a9-439f-bb37-8a7206e58b50'), AIMessage(content="为了回答这个问题,我需要先筛选出学费不超过4300的专业,然后列出这些专业的名称。\n```python\n# 筛选出学费不超过4300的专业\naffordable_programs = df[df['学费'] <= 4300]\n\n# 列出这些专业的名称\naffordable_programs['专业']\n```", additional_kwargs={'thought': '为了回答这个问题,我需要先筛选出学费不超过4300的专业,然后列出这些专业的名称。', 'action': {'tool': 'python', 'tool_input': "# 筛选出学费不超过4300的专业\naffordable_programs = df[df['学费'] <= 4300]\n\n# 列出这些专业的名称\naffordable_programs['专业']"}, 'parent_id': 'some-parent-id2'}, response_metadata={}, id='ebb27228-360f-4384-af52-afbf98509147', tool_calls=[{'name': 'python', 'args': {'query': "# 筛选出学费不超过4300的专业\naffordable_programs = df[df['学费'] <= 4300]\n\n# 列出这些专业的名称\naffordable_programs['专业']"}, 'id': '12b71399-b536-46a1-b089-1857e91fd054', 'type': 'tool_call'}]), ToolMessage(content=[{'type': 'text', 'text': "```pycon\n---------------------------------------------------------------------------\nNameError Traceback (most recent call last)\nCell In[3], line 2\n 1 # 筛选出学费不超过4300的专业\n----> 2 affordable_programs = df[df['学费'] <= 4300]\n 4 # 列出这些专业的名称\n 5 affordable_programs['专业']\n\nNameError: name 'df' is not defined\n```"}], name='python', id='257b77b6-aec4-41d2-8a59-a90e38592087', tool_call_id='12b71399-b536-46a1-b089-1857e91fd054', artifact=[])], 'date': datetime.date(2024, 11, 13)}}, 'run_id': '495299d7-4338-427f-bae3-020acabd4a19', 'name': 'RunnableSequence', 'tags': ['seq:step:1'], 'metadata': {'thread_id': 'some-thread-id', 'langgraph_step': 4, 'langgraph_node': 'agent', 'langgraph_triggers': ['branch:tools:agent_selector:agent'], 'langgraph_path': ('__pregel_pull', 'agent'), 'langgraph_checkpoint_ns': 'data_analyze_graph:f6303585-4691-124b-a9df-eae13e9f5f3e|agent:2f564dd8-f3e6-3a83-f34c-66b0f643f86f', 'checkpoint_ns': 'data_analyze_graph:f6303585-4691-124b-a9df-eae13e9f5f3e'}, 'parent_ids': ['c16e02f4-7f17-48b7-b389-cdf0da5b06ef', '9b251e39-bf72-42bb-b553-109ffedfc172', '28a5c947-ca53-46be-bb89-c784e97b5a91', '771fcf79-3dcb-4b05-90a1-c5f343a8f864']}

@vegetablest
Contributor

vegetablest commented Nov 13, 2024

1. When event_stream runs there are many intermediate events; see https://python.langchain.com/docs/how_to/streaming/#event-reference for details. You can filter on event["event"] == 'on_chat_model_end' to skip the intermediate steps and keep only the model's responses. If you only want the final answer, it is probably easier to run the agent with ainvoke, like this:

from datetime import date

from langchain_core.messages import HumanMessage

human_message = HumanMessage(content="How many men survived?")
response = await agent.ainvoke(
    input={
        # After using checkpoint, you only need to add new messages here.
        "messages": [human_message],
        "parent_id": "some-parent-id2",
        "date": date.today(),  # noqa: DTZ011
    },
    config={
        "configurable": {"thread_id": "some-thread-id"},
    },
)
print(response["messages"][-1])
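
If you do want to keep streaming, a small sketch of the filtering suggested above, reusing the same agent, human_message, and config as in the ainvoke example, might look like this:

# Keep only the chat model's final outputs from the event stream.
async for event in agent.astream_events(
    input={
        "messages": [human_message],
        "parent_id": "some-parent-id2",
        "date": date.today(),
    },
    config={"configurable": {"thread_id": "some-thread-id"}},
    version="v2",
):
    if event["event"] == "on_chat_model_end":
        print(event["data"]["output"])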

2. You are analyzing an Excel file, and from the log it looks like the pandas dependency for reading Excel files is missing. Please install it with pip install openpyxl and try again.

@jianpugh

1. When event_stream runs there are many intermediate events; see https://python.langchain.com/docs/how_to/streaming/#event-reference for details. You can filter out the events you do not care about by event name; if you only want the final answer, you can run the agent with ainvoke. 2. You are analyzing an Excel file, and from the log it looks like the pandas dependency for reading Excel files is missing. Please install it with pip install openpyxl and try again.

Thanks for the explanation!

@lllyyyqqq

I'm sorry, due to limited manpower and time, the documentation still has many gaps. read_df is an extension method provided by the TableGPT executor; we will add documentation on how to use it later. For now, if you are using the local executor (pybox.LocalPyBoxManager), you can save the Python code above under $HOME/.ipython/profile_default/startup/, and the quickstart should then run.


What should the file name be?

@vegetablest
Contributor

vegetablest commented Nov 14, 2024

The file name can be anything, e.g. xx.py. For reference: https://tablegpt.github.io/tablegpt-agent/howto/incluster-code-execution/

@edc3000

edc3000 commented Nov 14, 2024

Hi, I ran the examples/quick_start.py file directly and got ModuleNotFoundError: No module named 'tablegpt'.

How do I run the demo correctly?

@edwardzjl
Contributor

@edc3000 Did you install tablegpt first?
https://tablegpt.github.io/tablegpt-agent/tutorials/quickstart/

@edc3000

edc3000 commented Nov 15, 2024

@edwardzjl Thanks, installing tablegpt solved it!

However, when I tried the chat on tabular data part, the log shows this error: ToolMessage(content="Error: NoSuchKernel('python3')\n Please fix your mistakes.", name='python', id='d8cee44a-6895-4172-a928-f555c073e34a', tool_call_id='7e8dcc69-3228-46cf-9998-75331fba2082', status='error')

Am I missing a package, or is it that the generated code cannot be executed?

@vegetablest
Contributor

vegetablest commented Nov 15, 2024

For local execution you should install tablegpt via pip install tablegpt-agent[local]. See: https://tablegpt.github.io/tablegpt-agent/tutorials/quickstart/

@edc3000

edc3000 commented Nov 15, 2024

@vegetablest Yes, that is exactly how I installed it: pip install tablegpt-agent[local]

@vegetablest
Contributor

vegetablest commented Nov 15, 2024

Hi, I created a brand-new Python venv and ran chat on tabular data, and could not reproduce your problem.

python -m venv venv
source ./venv/bin/activate
pip install langchain-openai
pip install "tablegpt-agent[local]"

You can first run jupyter kernelspec list to check whether the python3 kernelspec is missing. If it is, install it with pip install ipykernel. Alternatively, you can simply reinstall tablegpt-agent[local].
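
A small sketch, if you prefer checking from Python instead of the CLI (assuming jupyter_client is installed in the same environment):

from jupyter_client.kernelspec import KernelSpecManager

# The agent needs a "python3" kernelspec; if it is missing,
# install it with: pip install ipykernel
print(KernelSpecManager().find_kernel_specs())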

@vegetablest
Contributor

@edc3000 Has your problem been solved? If you still have issues, I think we should open a new issue, since it is no longer related to this topic.

@vegetablest
Contributor

@edwardzjl The discussion about read_df has wrapped up; I suggest closing this issue.
