bug fix issue #95 #215

benx13 · 2024-11-06T02:45:33Z

LightRAG Bug Fix Report

Issue

A TypeError occurred in hybrid query mode when accessing the content of text units whose data was None. The error was raised in the _find_most_related_text_unit_from_entities function while text units were being processed for token size truncation.

Root Cause

The issue stemmed from insufficient null checks when processing text units in the knowledge graph. Specifically:

Text unit data could be None when retrieved from text_chunks_db
The data dictionary could be missing the 'content' field
Invalid entries were not filtered out before token size truncation
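
To illustrate the failure mode listed above, here is a minimal reproduction sketch. The `text_units` list and the `content_of` helper are hypothetical stand-ins for the entries built from text_chunks_db and for the truncation key function; they are not taken from operate.py.

```python
# Hypothetical stand-in for entries built from text_chunks_db lookups.
text_units = [
    {"id": "chunk-1", "data": {"content": "some text"}, "order": 0},
    {"id": "chunk-2", "data": None, "order": 1},  # chunk missing from storage
]


def content_of(unit):
    # Mirrors a key function of the form: lambda x: x["data"]["content"]
    return unit["data"]["content"]


error_message = None
try:
    contents = [content_of(u) for u in text_units]
except TypeError as exc:
    # Subscripting None raises TypeError, which is the error class reported here.
    error_message = str(exc)

print(error_message)
```

An entry whose data is a dict without a 'content' key would raise KeyError instead, so both cases need to be checked before truncation.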

The key problematic area was in:

591:597:LightRAG/lightrag/operate.py

    if any([v is None for v in all_text_units_lookup.values()]):
        logger.warning("Text chunks are missing, maybe the storage is damaged")
    all_text_units = [
        {"id": k, **v} for k, v in all_text_units_lookup.items() if v is not None
    ]
    all_text_units = sorted(
        all_text_units, key=lambda x: (x["order"], -x["relation_counts"])

Solution

Added comprehensive null checks and data validation throughout the text unit processing pipeline:

Added null check for node data and source_id field:

571:575:LightRAG/lightrag/operate.py

        for k, v in zip(all_one_hop_nodes, all_one_hop_nodes_data)
        if v is not None
    }
    all_text_units_lookup = {}
    for index, (this_text_units, this_edges) in enumerate(zip(text_units, edges)):

Added content validation when getting chunk data:

591:597:LightRAG/lightrag/operate.py

    if any([v is None for v in all_text_units_lookup.values()]):
        logger.warning("Text chunks are missing, maybe the storage is damaged")
    all_text_units = [
        {"id": k, **v} for k, v in all_text_units_lookup.items() if v is not None
    ]
    all_text_units = sorted(
        all_text_units, key=lambda x: (x["order"], -x["relation_counts"])

Added comprehensive filtering for None values:

599:604:LightRAG/lightrag/operate.py

    all_text_units = truncate_list_by_token_size(
        all_text_units,
        key=lambda x: x["data"]["content"],
        max_token_size=query_param.max_token_for_text_unit,
    )
    all_text_units: list[TextChunkSchema] = [t["data"] for t in all_text_units]

The changes are backward compatible and require no modifications to the existing API or data structures.
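
The kind of filtering described under Solution can be sketched as follows. This is a minimal illustration, not the merged code: the helper name `filter_valid_text_units` is hypothetical, and the entry shape (`data`, `order`, `relation_counts`) is an assumption inferred from the keys used in the quoted snippets.

```python
def filter_valid_text_units(lookup):
    """Keep only entries whose chunk data is a dict with a non-None 'content'.

    `lookup` maps a chunk id to an entry dict or None (assumed shape:
    {"data": {...} or None, "order": int, "relation_counts": int}).
    """
    valid = []
    for chunk_id, entry in lookup.items():
        if entry is None:
            continue  # chunk id had no entry at all
        data = entry.get("data")
        if not isinstance(data, dict) or data.get("content") is None:
            continue  # data is None or lacks usable content
        valid.append({"id": chunk_id, **entry})
    return valid


# Usage example with made-up data.
lookup = {
    "c1": {"data": {"content": "alpha"}, "order": 1, "relation_counts": 2},
    "c2": {"data": None, "order": 0, "relation_counts": 5},
    "c3": None,
    "c4": {"data": {"tokens": 3}, "order": 0, "relation_counts": 1},
    "c5": {"data": {"content": "beta"}, "order": 0, "relation_counts": 0},
}
units = filter_valid_text_units(lookup)
# Same sort key as in the quoted snippet: by order, then by descending relation count.
units = sorted(units, key=lambda x: (x["order"], -x["relation_counts"]))
contents = [u["data"]["content"] for u in units]
```

Filtering before sorting and truncation means the key function `x["data"]["content"]` only ever sees entries that actually carry content.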

benx13 · 2024-11-06T02:46:41Z

Fixes #95

LarFii · 2024-11-07T06:55:20Z

Thanks for your contribution!

bug fix issue HKUDS#95

c956e39

Merge branch 'main' into main

36e9236

LarFii merged commit 7c5080e into HKUDS:main Nov 7, 2024

LarFii mentioned this pull request Nov 7, 2024

Error in naive queries with None #206

Closed
