[Bug]: ValueError: cannot convert float NaN to integer when running global search with dynamic selection #1864

@lsukharn

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

I ran this notebook on my own data: https://github.com/microsoft/graphrag/blob/main/docs/examples_notebooks/global_search_with_dynamic_community_selection.ipynb
The following code raised an error:

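# imports needed by the code below
import os

import pandas as pd
import tiktoken

from graphrag.config.enums import ModelType
from graphrag.config.models.language_model_config import LanguageModelConfig
from graphrag.language_model.manager import ModelManager
from graphrag.query.indexer_adapters import (
    read_indexer_communities,
    read_indexer_entities,
    read_indexer_reports,
)

api_key = os.environ["GRAPHRAG_API_KEY"]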
llm_model = os.environ["GRAPHRAG_LLM_MODEL"]
api_base = os.environ["API_BASE_TEST"]
deployment_name = os.environ["GRAPHRAG_LLM_MODEL_DEPLOYMENT_NAME"]

config = LanguageModelConfig(
    api_key=api_key,
    type=ModelType.AzureOpenAIChat,
    api_base=api_base,
    api_version='2025-01-01-preview',
    model=llm_model,
    deployment_name=deployment_name,
    max_retries=20,
)
model = ModelManager().get_or_create_chat_model(
    name="global_search",
    model_type=ModelType.AzureOpenAIChat,
    config=config,
)

token_encoder = tiktoken.encoding_for_model(llm_model)

OUTPUT_DIR = "./graphrag_project/output"
COMMUNITY_REPORT_TABLE = "community_reports"
ENTITY_TABLE = "entities"
COMMUNITY_TABLE = "communities"

# we don't fix a specific community level but instead use an agent to dynamically
# search through all the community reports to check if they are relevant.
COMMUNITY_LEVEL = None

community_df = pd.read_parquet(f"{OUTPUT_DIR}/{COMMUNITY_TABLE}.parquet")
entity_df = pd.read_parquet(f"{OUTPUT_DIR}/{ENTITY_TABLE}.parquet")
report_df = pd.read_parquet(f"{OUTPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet")

communities = read_indexer_communities(community_df, report_df)
reports = read_indexer_reports(
    report_df,
    community_df,
    community_level=COMMUNITY_LEVEL,
    dynamic_community_selection=True,
)
entities = read_indexer_entities(
    entity_df, community_df, community_level=COMMUNITY_LEVEL
)

print(f"Total report count: {len(report_df)}")
print(
    f"Report count after filtering by community level {COMMUNITY_LEVEL}: {len(reports)}"
)

report_df.head()

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\graphrag\query\indexer_adapters.py:161, in read_indexer_entities.<locals>.<lambda>(x)
    158 # group entities by id and degree and remove duplicated community IDs
    159 nodes_df = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()
    160 nodes_df["community"] = nodes_df["community"].apply(
--> 161     lambda x: [str(int(i)) for i in x]
    162 )
    163 final_df = nodes_df.merge(entities_df, on="id", how="inner").drop_duplicates(
    164     subset=["id"]
    165 )
    166 # read entity dataframe to knowledge model objects

ValueError: cannot convert float NaN to integer

See full error in the logs section.
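
The failing conversion can be reproduced outside GraphRAG with a few lines of pandas. The sketch below is not GraphRAG code: it uses a toy `nodes_df` and assumes that the NaN comes from an entity with no community assigned. The helper `to_str_ids_skip_nan` is a hypothetical variant that only shows one possible way to guard the conversion, not the library's fix.

```python
import pandas as pd

# Toy stand-in for nodes_df (assumption): entity "b" has no community,
# so pandas stores NaN in the float "community" column.
nodes_df = pd.DataFrame(
    {
        "id": ["a", "a", "b"],
        "community": [0.0, 3.0, float("nan")],
    }
)

# Same grouping step as line 159 of the traceback.
grouped = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()


def to_str_ids(values):
    """Conversion as shown in the traceback; int(NaN) raises ValueError."""
    return [str(int(i)) for i in values]


def to_str_ids_skip_nan(values):
    """Hypothetical guarded variant: skip missing community IDs."""
    return [str(int(i)) for i in values if not pd.isna(i)]


try:
    grouped["community"].apply(to_str_ids)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

safe = grouped["community"].apply(to_str_ids_skip_nan)

print(error_message)
print(safe.tolist())
```

With the guarded variant, an entity that has no community ends up with an empty list instead of raising.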

Steps to reproduce

No response

Expected Behavior

I should be able to use the "dynamic" part of global search. The script works when I set COMMUNITY_LEVEL = 2, but it fails when COMMUNITY_LEVEL is None.

GraphRAG Config Used

# Paste your config here

Logs and screenshots

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[47], line 12
      5 communities = read_indexer_communities(community_df, report_df)
      6 reports = read_indexer_reports(
      7     report_df,
      8     community_df,
      9     community_level=COMMUNITY_LEVEL,
     10     dynamic_community_selection=True,
     11 )
---> 12 entities = read_indexer_entities(
     13     entity_df, community_df, community_level=COMMUNITY_LEVEL
     14 )
     16 print(f"Total report count: {len(report_df)}")
     17 print(
     18     f"Report count after filtering by community level {COMMUNITY_LEVEL}: {len(reports)}"
     19 )

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\graphrag\query\indexer_adapters.py:160, in read_indexer_entities(final_entities, final_communities, community_level)
    158 # group entities by id and degree and remove duplicated community IDs
    159 nodes_df = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()
--> 160 nodes_df["community"] = nodes_df["community"].apply(
    161     lambda x: [str(int(i)) for i in x]
    162 )
    163 final_df = nodes_df.merge(entities_df, on="id", how="inner").drop_duplicates(
    164     subset=["id"]
    165 )
    166 # read entity dataframe to knowledge model objects

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\series.py:4924, in Series.apply(self, func, convert_dtype, args, by_row, **kwargs)
   4789 def apply(
   4790     self,
   4791     func: AggFuncType,
   (...)   4796     **kwargs,
   4797 ) -> DataFrame | Series:
   4798     """
   4799     Invoke function on values of Series.
   4800 
   (...)   4915     dtype: float64
   4916     """
   4917     return SeriesApply(
   4918         self,
   4919         func,
   4920         convert_dtype=convert_dtype,
   4921         by_row=by_row,
   4922         args=args,
   4923         kwargs=kwargs,
-> 4924     ).apply()

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\apply.py:1427, in SeriesApply.apply(self)
   1424     return self.apply_compat()
   1426 # self.func is Callable
-> 1427 return self.apply_standard()

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\apply.py:1507, in SeriesApply.apply_standard(self)
   1501 # row-wise access
   1502 # apply doesn't have a `na_action` keyword and for backward compat reasons
   1503 # we need to give `na_action="ignore"` for categorical data.
   1504 # TODO: remove the `na_action="ignore"` when that default has been changed in
   1505 #  Categorical (GH51645).
   1506 action = "ignore" if isinstance(obj.dtype, CategoricalDtype) else None
-> 1507 mapped = obj._map_values(
   1508     mapper=curried, na_action=action, convert=self.convert_dtype
   1509 )
   1511 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1512     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1513     #  See also GH#25959 regarding EA support
   1514     return obj._constructor_expanddim(list(mapped), index=obj.index)

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\base.py:921, in IndexOpsMixin._map_values(self, mapper, na_action, convert)
    918 if isinstance(arr, ExtensionArray):
    919     return arr.map(mapper, na_action=na_action)
--> 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\algorithms.py:1743, in map_array(arr, mapper, na_action, convert)
   1741 values = arr.astype(object, copy=False)
   1742 if na_action is None:
-> 1743     return lib.map_infer(values, mapper, convert=convert)
   1744 else:
   1745     return lib.map_infer_mask(
   1746         values, mapper, mask=isna(values).view(np.uint8), convert=convert
   1747     )

File lib.pyx:2972, in pandas._libs.lib.map_infer()

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\graphrag\query\indexer_adapters.py:161, in read_indexer_entities.<locals>.<lambda>(x)
    158 # group entities by id and degree and remove duplicated community IDs
    159 nodes_df = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()
    160 nodes_df["community"] = nodes_df["community"].apply(
--> 161     lambda x: [str(int(i)) for i in x]
    162 )
    163 final_df = nodes_df.merge(entities_df, on="id", how="inner").drop_duplicates(
    164     subset=["id"]
    165 )
    166 # read entity dataframe to knowledge model objects

ValueError: cannot convert float NaN to integer
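
One way to check whether the data contains entities that belong to no community (a likely source of the NaN, though that is an assumption) is to compare entity IDs against community membership. The sketch below uses toy data; the helper name `entities_without_community` is hypothetical, and the column names `id` and `entity_ids` are assumptions about the parquet schema, so they are passed as parameters.

```python
import pandas as pd


def entities_without_community(
    entity_df, community_df, id_col="id", members_col="entity_ids"
):
    """Return IDs from entity_df that appear in no community's member list.

    Column names are assumptions; adjust them to match the actual tables.
    """
    # Flatten the per-community member lists into one set of entity IDs.
    member_ids = set(community_df[members_col].explode().dropna())
    missing = ~entity_df[id_col].isin(member_ids)
    return entity_df.loc[missing, id_col].tolist()


# Toy data standing in for entity_df and community_df.
toy_entities = pd.DataFrame({"id": ["e1", "e2", "e3"]})
toy_communities = pd.DataFrame(
    {"community": [0, 1], "entity_ids": [["e1"], ["e1", "e2"]]}
)

orphans = entities_without_community(toy_entities, toy_communities)
print(orphans)
```

On real data the same call would be `entities_without_community(entity_df, community_df)`. A non-empty result would be consistent with the NaN values seen in the traceback.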

Additional Information

  • GraphRAG Version: 2.1.0
  • Operating System: Windows 11
  • Python Version: 3.12
  • Related Issues:

Metadata

Assignees

No one assigned

    Labels

    backlog (We've confirmed some action is needed on this and will plan it), bug (Something isn't working)
