UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 190: illegal multibyte sequence #3

Closed
cp-1919 opened this issue Oct 20, 2024 · 0 comments

cp-1919 commented Oct 20, 2024

Hello!
While using LightRag, I ran into the following error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 190: illegal multibyte sequence

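This happens when the data contains Chinese text.
It is probably caused by the open function picking gbk as the file encoding when none is specified, so the Chinese text is decoded incorrectly.
I modified the load_storage function in the dbs file as follows: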

def load_storage(file_name) -> Union[DataBase, None]:
    if not os.path.exists(file_name):
        return None
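    # Read the file as UTF-8 explicitly; without encoding=, open() uses the platform default (gbk here).
    with open(file_name, encoding='utf-8') as f: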
        data = json.load(f)
    data["matrix"] = buffer_string_to_array(data["matrix"]).reshape(
        -1, data["embedding_dim"]
    )
    logger.info(f"Load {data['matrix'].shape} data")
    return data

After that change it seems to work.
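
For anyone who wants to reproduce this outside LightRag, here is a minimal sketch of the same mismatch. It assumes the storage file was written as UTF-8, which the issue does not confirm. The names `read_text_as`, `load_json_utf8` and `demo`, and the sample payload, are made up for illustration and are not part of the library.

```python
import json
import locale
import os
import tempfile


def read_text_as(path, encoding):
    """Read a file as text with the given encoding.

    Returns the decoded string, or the UnicodeDecodeError if decoding fails.
    """
    try:
        with open(path, encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError as exc:
        return exc


def load_json_utf8(path):
    """Load JSON with an explicit UTF-8 decoder instead of the platform default."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def demo():
    # Hypothetical sample data; any non-ASCII text will do.
    payload = {"text": "中文测试"}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "storage.json")
        # Write the file as UTF-8 (the assumption stated above).
        with open(path, "w", encoding="utf-8") as f:
            json.dump(payload, f, ensure_ascii=False)
        # Decoding UTF-8 bytes as gbk either raises or yields garbled text.
        wrong = read_text_as(path, "gbk")
        # Decoding with the encoding the file was written in round-trips.
        right = load_json_utf8(path)
    return payload, wrong, right


if __name__ == "__main__":
    # The encoding open() falls back to when encoding= is omitted
    # (outside UTF-8 mode); it depends on the system locale.
    print("platform default:", locale.getpreferredencoding(False))
    payload, wrong, right = demo()
    print("gbk read:", repr(wrong))
    print("utf-8 read matches original:", right == payload)
```

Depending on the bytes involved, the gbk read may raise `UnicodeDecodeError` or silently return garbled characters, so passing `encoding='utf-8'` on both the write and the read side is the safer habit.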

This is my first issue, so please excuse any unclear wording.

gusye1234 added a commit that referenced this issue Oct 21, 2024