Closed as not planned
Labels: autoresolved, awaiting_response, bug, stale, triage
Description
Do you need to file an issue?
- I have searched the existing issues and this bug is not already filed.
- My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
- I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
Every time I launch the indexer, I get "error extracting graph" with a validation error for BaseModelOutput.
I have tried this on different machines (my Windows PC and my MacBook Pro M2).
This eventually causes the indexing run to fail.
I am not sure why this is happening, and I am very frustrated.
I need this code to work for a project due this weekend.
Steps to reproduce
Follow the steps on the Getting Started page.
Add my own text files instead of A Christmas Carol.
Run the indexer.
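The second step above (adding your own text files) can be sketched in Python. This is a minimal sketch, assuming the project root from the Getting Started guide has an `input/` folder (matching `base_dir: "input"` in the config below) and that the text loader reads UTF-8 `.txt` files; `stage_text_files` is a hypothetical helper, not part of GraphRAG. Decoding each file before copying surfaces encoding problems up front instead of partway through indexing.

```python
import shutil
from pathlib import Path


def stage_text_files(source_dir: Path, input_dir: Path) -> list[Path]:
    """Copy every .txt file from source_dir into input_dir (hypothetical helper).

    Each file is decoded as UTF-8 first (an assumption about what the indexer
    expects), so a bad encoding raises UnicodeDecodeError here.
    """
    input_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in sorted(source_dir.glob("*.txt")):
        src.read_text(encoding="utf-8")  # raises UnicodeDecodeError if not UTF-8
        dest = input_dir / src.name
        shutil.copyfile(src, dest)
        staged.append(dest)
    return staged
```

For example, `stage_text_files(Path("my_texts"), Path("ragtest/input"))` would copy the files before running the indexer; both paths are placeholders.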
Expected Behavior
The indexer should complete the whole indexing process without errors.
I have made no edits to the base code.
GraphRAG Config Used
### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/
### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.
models:
  default_chat_model:
    type: azure_openai_chat
    api_base: {redacted}
    api_version: '2024-12-01-preview'
    auth_type: api_key # or azure_managed_identity
    api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    model: 'gpt-4o-mini'
    deployment_name: 'gpt-4o-mini'
    # encoding_model: cl100k_base # automatically set by tiktoken if left undefined
    model_supports_json: true # recommended if this is available for your model.
    concurrent_requests: 25 # max number of simultaneous LLM requests allowed
    async_mode: threaded # or asyncio
    retry_strategy: native
    max_retries: -1 # set to -1 for dynamic retry logic (most optimal setting based on server response)
    tokens_per_minute: 880_000 # set to 0 to disable rate limiting
    requests_per_minute: 8_500 # set to 0 to disable rate limiting
  default_embedding_model:
    type: azure_openai_embedding
    api_base: {redacted}
    api_version: '2024-02-01'
    auth_type: api_key # or azure_managed_identity
    api_key: ${GRAPHRAG_API_KEY}
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    model: 'text-embedding-3-large'
    deployment_name: 'text-embedding-3-large'
    # encoding_model: cl100k_base # automatically set by tiktoken if left undefined
    model_supports_json: true # recommended if this is available for your model.
    concurrent_requests: 25 # max number of simultaneous LLM requests allowed
    async_mode: threaded # or asyncio
    retry_strategy: native
    max_retries: -1 # set to -1 for dynamic retry logic (most optimal setting based on server response)
    tokens_per_minute: 340_000 # set to 0 to disable rate limiting
    requests_per_minute: 2_000 # set to 0 to disable rate limiting
vector_store:
  default_vector_store:
    type: lancedb
    db_uri: output/lancedb
    container_name: default
    overwrite: True

embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store
### Input settings ###
input:
  type: file # or blob
  file_type: text # [csv, text, json]
  base_dir: "input"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]
### Output settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided
cache:
  type: file # [file, blob, cosmosdb]
  base_dir: "cache"

reporting:
  type: file # [file, blob, cosmosdb]
  base_dir: "logs"

output:
  type: file # [file, blob, cosmosdb]
  base_dir: "output"
### Workflow settings ###
extract_graph:
  model_id: default_chat_model
  prompt: "prompts/extract_graph.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  model_id: default_chat_model
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english # [regex_english, syntactic_parser, cfg]

extract_claims:
  enabled: false
  model_id: default_chat_model
  prompt: "prompts/extract_claims.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  model_id: default_chat_model
  graph_prompt: "prompts/community_report_graph.txt"
  text_prompt: "prompts/community_report_text.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: false
  embeddings: false
### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query
local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  chat_model_id: default_chat_model
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/drift_search_system_prompt.txt"
  reduce_prompt: "prompts/drift_search_reduce_prompt.txt"

basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/basic_search_system_prompt.txt"
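The `api_key: ${GRAPHRAG_API_KEY}` entries depend on environment-variable substitution when the settings file is loaded. A minimal sketch of that mechanism, assuming (not verified against the GraphRAG 2.1.0 source) that the loader fills these placeholders from the environment, after reading `.env`, in a way similar to Python's `string.Template`; `expand_settings` is a hypothetical name. With `substitute`, an unset variable fails loudly instead of silently producing an empty key.

```python
import os
from string import Template
from typing import Mapping, Optional


def expand_settings(raw_yaml: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${VAR} placeholders in settings text (hypothetical helper).

    Template.substitute raises KeyError for a placeholder with no value, so a
    missing GRAPHRAG_API_KEY is reported before any request reaches Azure.
    """
    values = os.environ if env is None else env
    return Template(raw_yaml).substitute(values)


line = "api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file"
print(expand_settings(line, {"GRAPHRAG_API_KEY": "example-key"}))
# api_key: example-key # set this in the generated .env file
```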
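With `chunks: size: 1200` and `overlap: 100`, each input document is cut into overlapping token windows, and each window is sent to the `extract_graph` prompt separately (the `text` in the error details below is one such chunk). A minimal sketch of a sliding-window split, assuming each window starts `size - overlap` tokens after the previous one; it works on plain integer token IDs so it does not need `tiktoken`, and `split_into_windows` is a hypothetical name rather than GraphRAG's own chunker.

```python
from typing import Sequence


def split_into_windows(tokens: Sequence[int], size: int = 1200, overlap: int = 100) -> list[list[int]]:
    """Split a token sequence into windows of `size` tokens sharing `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return windows


doc = list(range(3000))  # stand-in for a 3,000-token document
print([(w[0], w[-1]) for w in split_into_windows(doc)])
# [(0, 1199), (1100, 2299), (2200, 2999)]
```

A single failing window therefore affects only one chunk of one document, which matches the `doc_index`/`text` pair reported in the log.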
Logs and screenshots
00:32:01,123 graphrag.index.operations.extract_graph.graph_extractor ERROR error extracting graph
Traceback (most recent call last):
  File "/mnt/e/CIMPC_BACKUP/School/Research/GRAG_PC/.venv/lib/python3.11/site-packages/graphrag/index/operations/extract_graph/graph_extractor.py", line 127, in __call__
    result = await self._process_document(text, prompt_variables)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/e/CIMPC_BACKUP/School/Research/GRAG_PC/.venv/lib/python3.11/site-packages/graphrag/index/operations/extract_graph/graph_extractor.py", line 155, in _process_document
    response = await self._model.achat(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/e/CIMPC_BACKUP/School/Research/GRAG_PC/.venv/lib/python3.11/site-packages/graphrag/language_model/providers/fnllm/models.py", line 282, in achat
    output=BaseModelOutput(content=response.output.content),
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/e/CIMPC_BACKUP/School/Research/GRAG_PC/.venv/lib/python3.11/site-packages/pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for BaseModelOutput
content
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.10/v/string_type
00:32:01,133 graphrag.callbacks.file_workflow_callbacks INFO Entity Extraction Error details={'doc_index': 0, 'text': ', that she was the property of al-Baghdadi." ABC News, citing counterterrorism officials, reported Mueller was held prisoner inside the home of Abu Sayyaf, a senior ISIS leader who directed the organization\'s illicit gas and oil trade. Al-Baghdadi would regularly visit Sayaff\'s compound to meet with him and assault Mueller, according to the ABC news report. Abu Sayyaf was killed during a raid on his compound in May. His wife, Umm Sayyaf, was captured in the raid, and allegedly told U.S. officials these new details of Mueller\'s horrific captivity during interrogations. ABC News reported several teenage Yezidi girls who had been held as sex slaves inside Sayyaf\'s compound also confirmed al-Baghdadi\'s role in Mueller\'s sexual assault. U.S. officials confirmed that they rescued a Yezidi woman during the raid on Sayyaf\'s compound in May. The White House declined to comment when asked by BuzzFeed News about this new information. The news about Mueller follows a searing New York Times investigation published Thursday on how members of ISIS use theology to encourage and justify the rape of non-Muslim women. The Times interviewed more than 20 victims, including a 12-year-old Yezidi girl who was repeatedly raped by an ISIS fighter who she said would pray before and after he sexually assaulted her. "He told me that according to Islam he is allowed to rape an unbeliever," the girl told the Times\' Rukmini Callimachi. "He said that by raping me he is drawing closer to God." ISIS has established a codified system of sex slavery comprising women and girls from the minority Yezidi ethnic group. There had been suspicions that Kayla Mueller had been forced to marry an ISIS commander. 
These rumors were prompted by a letter to her family written one year into her captivity, in which Mueller wrote that she had been treated with "the utmost respect + kindness," a quote that was touted by ISIS supporters as proof of their group\'s humanity. The Mueller family has referred media to a video a friend made in tribute to Kayla\'s life to mark her 27th birthday, which was last Friday.\n\nhttps://obamawhitehouse.archives.gov/vp says the following: Joseph Robinette Biden, Jr., represented Delaware for 36 years in the U.S. Senate before becoming the 47th and current Vice President of the United States. Joseph Robinette Biden, Jr., was born November 20, 1942, in Scranton, Pennsylvania, the first of four siblings. In 1953, the Biden family moved from Pennsylvania to Claymont, Delaware. He graduated from the University of Delaware and Syracuse Law School and served on the New Castle County Council. Then, at age 29, he became one of the youngest people ever elected to the United States Senate. Just weeks after the election, tragedy struck the Biden family when Biden\'s wife, Neilia and their one-year-old daughter, Naomi, were killed and their two young sons critically injured in an auto accident. Vice President Biden was sworn in to the U.S. Senate at his sons\' hospital bedside and began commuting to Washington every day by train, a practice he maintained throughout his career in the Senate. In 1977, Vice President Biden married Jill Jacobs. Jill Biden, who holds a Ph.D. in Education, is a life-long educator and currently teaches at a community college in Northern Virginia. The Vice President’s son, Beau, was Delaware\'s Attorney General from 2007-2015 and a Major in the 261st Signal Brigade of the Delaware National Guard. He was deployed to Iraq in 2008-2009. Beau passed away in 2015 after battling with brain cancer with the same integrity, courage, and strength he demonstrated every day of his life. 
The Vice President’s other son, Hunter, is an attorney who manages a private equity firm in Washington, D.C. and is Chairman of the World Food Program USA. And his daughter Ashley is a social worker and is Executive Director of the Delaware Center for Justice. Vice President Biden has five grandchildren: Naomi, Finnegan, Roberta Mabel ("Maisy"), Natalie, and Robert Hunter. As a Senator from Delaware for 36 years, Vice President Biden established himself as a leader in facing some of our nation\'s most important domestic and international challenges. As Chairman or Ranking Member of the Senate Judiciary Committee for 17 years, then-Senator Biden was widely recognized for his work on criminal justice issues, including the landmark 1994 Crime Act and the Violence Against Women Act. As Chairman or Ranking Member of the Senate Foreign Relations Committee for 12 years, then-Senator Biden played a pivotal role in shaping U.S. foreign policy. He has been at the forefront of issues and legislation related to terrorism, weapons of mass destruction, post-Cold War Europe, the Middle East, and Southwest Asia. As the 47th Vice President of the United States, Joe Biden has continued his leadership on important issues facing the nation and has represented our country abroad traveling over 1.2 million miles to more than 50 countries. Vice President Biden has convened sessions of the President’s Cabinet, led interagency efforts, and worked with Congress in his fight to raise the living standards of middle class Americans, reduce gun violence, address violence against women, and end cancer as we know it. Eight years ago, the turmoil in the financial sector led to crippling conditions in the real economy -- the livelihood of millions of American households and businesses outside of Wall Street. Eight years later, we’re in the midst of the longest streak of job growth in history -- with more than 15 million jobs added. 
The Vice President played a key role in acting aggressively to arrest the crisis, restart growth and job creation, rebuild our economy on a stronger long-term foundation, and expand opportunity for all Americans. The Vice President'}
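The traceback shows the fnllm provider's `achat` building `BaseModelOutput(content=response.output.content)` while `content` is `None`; pydantic rejects this because the field must be a string. In other words, the chat model returned a response with no text for this chunk. One possible cause, which this report does not confirm, is Azure OpenAI content filtering: a filtered completion is reported with `finish_reason` set to `"content_filter"` and may carry no text, and the failing chunk above describes sexual violence. A minimal sketch of a guard that turns the empty case into a descriptive error, assuming the caller can see a finish reason; `ChatResult`, `EmptyCompletionError` and `require_text` are hypothetical names, not part of GraphRAG or fnllm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatResult:
    """Hypothetical stand-in for a chat completion: text plus the reported finish reason."""
    content: Optional[str]
    finish_reason: Optional[str] = None


class EmptyCompletionError(RuntimeError):
    """Raised when the model returns no text, instead of a bare validation error."""


def require_text(result: ChatResult) -> str:
    """Return the completion text, or explain why there is none."""
    if result.content is None:
        raise EmptyCompletionError(
            f"model returned no content (finish_reason={result.finish_reason!r})"
        )
    return result.content


try:
    require_text(ChatResult(content=None, finish_reason="content_filter"))
except EmptyCompletionError as err:
    print(err)  # model returned no content (finish_reason='content_filter')
```

Surfacing the finish reason this way makes it easier to tell a filtered response apart from a network or quota problem when reading the logs.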
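To see how many chunks hit this error in a run, the failure lines in the log can be collected. A minimal sketch, assuming each failure is written on one line containing `Entity Extraction Error details=` followed by a Python-literal dict with `doc_index` and `text` keys, as in the excerpt above; `failed_chunks` is a hypothetical name, and `sample` is a shortened copy of that log line rather than real output.

```python
import ast
import re

# Matches the dict that follows "details=" on a failure line (assumed single-line format).
_FAILURE = re.compile(r"Entity Extraction Error details=(\{.*\})\s*$")


def failed_chunks(log_text: str) -> list[dict]:
    """Return the details dict of every entity-extraction failure in the log text."""
    failures = []
    for line in log_text.splitlines():
        match = _FAILURE.search(line)
        if match:
            # The details are printed as a Python dict literal, so parse them safely.
            failures.append(ast.literal_eval(match.group(1)))
    return failures


sample = (
    "00:32:01,133 graphrag.callbacks.file_workflow_callbacks INFO "
    "Entity Extraction Error details={'doc_index': 0, 'text': 'The Vice President'}\n"
)
for failure in failed_chunks(sample):
    print(failure["doc_index"], failure["text"][:40])  # 0 The Vice President
```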
Additional Information
- GraphRAG Version: 2.1.0
- Operating System: Windows (via WSL); the same error also occurs on a MacBook Pro M2
- Python Version: 3.11.11
- Related Issues:
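The version details above can be collected with the standard library. A minimal sketch, assuming GraphRAG is installed under the distribution name `graphrag` in the active environment (for example the `.venv` shown in the traceback); `environment_report` is a hypothetical name, and it reports `"not installed"` when the package cannot be found.

```python
import platform
from importlib import metadata


def environment_report(package: str = "graphrag") -> dict[str, str]:
    """Collect the version details a GraphRAG bug report asks for."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    return {
        "graphrag_version": package_version,
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Under WSL, `platform.system()` reports `Linux`, and the release string typically contains `microsoft`, which distinguishes it from a native Linux install.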
