read_df not found #74
Sorry, we are short on people and time, so the documentation still has many gaps. Here is the code that defines read_df:

import os
import concurrent.futures
from pathlib import Path
from typing import NamedTuple, cast

import pandas as pd


class FileEncoding(NamedTuple):
    """File encoding as the NamedTuple."""

    encoding: str | None
    """The encoding of the file."""
    confidence: float
    """The confidence of the encoding."""
    language: str | None
    """The language of the file."""


def detect_file_encodings(
    file_path: str | Path, timeout: int = 5
) -> list[FileEncoding]:
    """Try to detect the file encoding.

    Returns a list of `FileEncoding` tuples with the detected encodings ordered
    by confidence.

    Args:
        file_path: The path to the file to detect the encoding for.
        timeout: The timeout in seconds for the encoding detection.
    """
    import chardet

    file_path = str(file_path)

    def read_and_detect(file_path: str) -> list[dict]:
        with open(file_path, "rb") as f:
            rawdata = f.read()
        return cast(list[dict], chardet.detect_all(rawdata))

    with concurrent.futures.ThreadPoolExecutor() as executor:
        future = executor.submit(read_and_detect, file_path)
        try:
            encodings = future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            raise TimeoutError(
                f"Timeout reached while detecting encoding for {file_path}"
            )

    if all(encoding["encoding"] is None for encoding in encodings):
        raise RuntimeError(f"Could not detect encoding for {file_path}")
    return [FileEncoding(**enc) for enc in encodings if enc["encoding"] is not None]


def path_from_uri(uri: str) -> Path:
    """Return a new path from the given 'file' URI.

    This is implemented in Python 3.13.
    See <https://github.com/python/cpython/pull/107640>
    and <https://github.com/python/cpython/pull/107640/files#diff-fa525485738fc33d05b06c159172ff1f319c26e88d8c6bb39f7dbaae4dc4105c>

    TODO: remove when we migrate to Python 3.13"""
    if not uri.startswith("file:"):
        raise ValueError(f"URI does not start with 'file:': {uri!r}")
    path = uri[5:]
    if path[:3] == "///":
        # Remove empty authority
        path = path[2:]
    elif path[:12] == "//localhost/":
        # Remove 'localhost' authority
        path = path[11:]
    if path[:3] == "///" or (path[:1] == "/" and path[2:3] in ":|"):
        # Remove slash before DOS device/UNC path
        path = path[1:]
    if path[1:2] == "|":
        # Replace bar with colon in DOS drive
        path = path[:1] + ":" + path[2:]
    from urllib.parse import unquote_to_bytes

    path = Path(os.fsdecode(unquote_to_bytes(path)))
    if not path.is_absolute():
        raise ValueError(f"URI is not absolute: {uri!r}")
    return path


def file_extention(file: str) -> str:
    path = Path(file)
    return path.suffix


def read_df(uri: str, autodetect_encoding: bool = True, **kwargs) -> pd.DataFrame:
    """A simple wrapper to read different file formats into DataFrame."""
    try:
        return _read_df(uri, **kwargs)
    except UnicodeDecodeError as e:
        if autodetect_encoding:
            # Encoding detection expects a 'file:' URI (see path_from_uri).
            detected_encodings = detect_file_encodings(path_from_uri(uri), timeout=30)
            for encoding in detected_encodings:
                try:
                    return _read_df(uri, encoding=encoding.encoding, **kwargs)
                except UnicodeDecodeError:
                    continue
        # Either we ran out of detected encodings, or autodetect_encoding is False,
        # so we raise an encoding error.
        raise ValueError(
            f"Unsupported file encoding {e.encoding}; please convert the file to utf-8 and try again"
        ) from e


def _read_df(uri: str, encoding: str = "utf-8", **kwargs) -> pd.DataFrame:
    """A simple wrapper to read different file formats into DataFrame."""
    ext = file_extention(uri).lower()
    if ext == ".csv":
        df = pd.read_csv(uri, encoding=encoding, **kwargs)
    elif ext == ".tsv":
        df = pd.read_csv(uri, sep="\t", encoding=encoding, **kwargs)
    elif ext in [".xls", ".xlsx", ".xlsm", ".xlsb", ".odf", ".ods", ".odt"]:
        # read_excel does not support an 'encoding' argument, and it does not seem to need one.
        df = pd.read_excel(uri, **kwargs)
    else:
        raise ValueError(
            f"TableGPT currently supports csv, tsv and xlsx files; the uploaded file format {ext} is not supported yet."
        )
    return df
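The fallback strategy in read_df works like this: try utf-8 first, and if that fails, retry with each candidate encoding until one decodes. It can be sketched with the standard library alone. This is a minimal illustration, not TableGPT code. The helper read_csv_with_fallback is hypothetical, and a fixed candidate list stands in for chardet's detection.

```python
import csv
import io


def read_csv_with_fallback(
    raw: bytes, candidates: tuple[str, ...] = ("gbk", "big5")
) -> tuple[list[list[str]], str]:
    """Hypothetical helper mirroring read_df's retry loop on in-memory bytes.

    Tries utf-8 first; on UnicodeDecodeError it tries each candidate encoding
    in order (a fixed list here, instead of chardet's detection results).
    Returns the parsed rows and the encoding that worked.
    """
    try:
        text = raw.decode("utf-8")
        return list(csv.reader(io.StringIO(text))), "utf-8"
    except UnicodeDecodeError as e:
        for encoding in candidates:
            try:
                text = raw.decode(encoding)
            except UnicodeDecodeError:
                continue  # this candidate failed too; try the next one
            return list(csv.reader(io.StringIO(text))), encoding
        # Every candidate failed: report the encoding from the original error.
        raise ValueError(
            f"Unsupported file encoding {e.encoding}; please convert to utf-8"
        ) from e


# Usage: GBK-encoded Chinese text is not valid utf-8, so the gbk fallback is used.
data = "name,city\n张三,北京\n".encode("gbk")
rows, used = read_csv_with_fallback(data)
print(used, rows)
```

The original code orders candidates by chardet's confidence rather than using a fixed list. The control flow is the same: the first encoding that decodes without error wins.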
Even after adding this file, it still doesn't seem to work.
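@jianpugh Hi, which version of vLLM are you using?
v0.4.0
OK, please try upgrading vLLM.
Thank you for using the project and for your feedback. Based on our testing, please make sure your vLLM version
Upgrading did fix it, thanks for the guidance!
Sorry to bother you again, but I have two more questions.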
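1. event_stream emits many kinds of events while it runs; for details, see https://python.langchain.com/docs/how_to/streaming/#event-reference. You can get the final response via ainvoke:

from datetime import date

from langchain_core.messages import HumanMessage

# `agent` is assumed to be the TableGPT agent created earlier with a checkpointer.
# Top-level await works in Jupyter/IPython; in a script, wrap this in an
# async function and run it with asyncio.run().
human_message = HumanMessage(content="How many men survived?")
response = await agent.ainvoke(
    input={
        # With a checkpointer, you only need to pass the new messages here.
        "messages": [human_message],
        "parent_id": "some-parent-id2",
        "date": date.today(),  # noqa: DTZ011
    },
    config={
        "configurable": {"thread_id": "some-thread-id"},
    },
)
print(response["messages"][-1])

2. You are analyzing an Excel file. Judging from the logs, a dependency that pandas needs to read Excel files appears to be missing. Please install it via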
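Question 1 is about the event stream. LangChain's astream_events (version="v2") yields dicts whose "event" key names the event type. Chat-model tokens arrive as "on_chat_model_stream" events with the chunk under data["chunk"]. Below is a minimal sketch that keeps only the streamed tokens, assuming that event shape. The fake_events generator and FakeChunk class are hypothetical stand-ins so the snippet runs without an LLM.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator


@dataclass
class FakeChunk:
    """Hypothetical stand-in for LangChain's AIMessageChunk (only .content)."""

    content: str


async def fake_events() -> AsyncIterator[dict[str, Any]]:
    """Hypothetical stand-in for agent.astream_events(..., version="v2")."""
    yield {"event": "on_chain_start", "name": "agent", "data": {}}
    for token in ["Hello", ", ", "world!"]:
        yield {
            "event": "on_chat_model_stream",
            "name": "llm",
            "data": {"chunk": FakeChunk(token)},
        }
    yield {"event": "on_chain_end", "name": "agent", "data": {}}


async def collect_tokens(events: AsyncIterator[dict[str, Any]]) -> str:
    """Keep only chat-model token chunks and join their text."""
    parts: list[str] = []
    async for event in events:
        if event["event"] == "on_chat_model_stream":
            parts.append(event["data"]["chunk"].content)
    return "".join(parts)


print(asyncio.run(collect_tokens(fake_events())))
```

In real use, you would pass the agent's own event stream (for example agent.astream_events(input, config=config, version="v2")) instead of fake_events().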
Thanks for the answers!
What is the file name?
Hi, when I run examples/quick_start.py directly, I get ModuleNotFoundError: No module named 'tablegpt'. How do I get the demo running correctly?
@edc3000 Have you installed tablegpt first?
@edwardzjl Thanks, installing tablegpt solved it! But when I try the "Chat on tabular data" part, the log shows this error:
ToolMessage(content="Error: NoSuchKernel('python3')\n Please fix your mistakes.", name='python', id='d8cee44a-6895-4172-a928-f555c073e34a', tool_call_id='7e8dcc69-3228-46cf-9998-75331fba2082', status='error')
Is this because I'm missing some package, or is the generated code itself failing to execute?
Running locally should be done via
@vegetablest Yes, that's exactly what I'm doing.
Hi, I created a brand-new Python venv and ran "Chat on tabular data", but could not reproduce your problem:

python -m venv venv
source ./venv/bin/activate
pip install langchain-openai
pip install "tablegpt-agent[local]"

You can first try running
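Several errors in this thread may come down to packages missing from the active environment: the ModuleNotFoundError for tablegpt, NoSuchKernel('python3'), and the missing Excel dependency. Below is a minimal sketch for checking this up front. The module list is an assumption. ipykernel is the package that normally provides the "python3" Jupyter kernel, and openpyxl is the engine pandas uses for .xlsx files.

```python
import importlib.util

# Assumed mapping from import name to what it is needed for in this thread.
REQUIRED = {
    "tablegpt": "the TableGPT agent itself (pip install tablegpt-agent)",
    "ipykernel": "the 'python3' Jupyter kernel that runs generated code",
    "openpyxl": "pandas.read_excel on .xlsx files",
}


def missing_modules(required: dict[str, str]) -> dict[str, str]:
    """Return the subset of `required` whose top-level modules cannot be found."""
    return {
        name: purpose
        for name, purpose in required.items()
        if importlib.util.find_spec(name) is None
    }


for name, purpose in missing_modules(REQUIRED).items():
    print(f"missing {name!r}: needed for {purpose}")
```

find_spec only checks that a module can be located, without importing it. This keeps the check cheap, but it does not catch broken installs.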
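@edc3000 Has your problem been solved? If not, I think we should open a new issue, since it is no longer related to this topic.
@edwardzjl The read_df discussion has concluded; I suggest closing this issue.
When running the official example, the tool call fails with: name 'read_df' is not defined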