The Neo4j import notebook tutorial still refers to the "description_embeddings" and "title" columns, along with a few other things that have changed since v0.4.0.
I need to know how to get the description_embeddings back into the .parquet files, since the new workflow removes them from there and now stores them directly in a vector store.
What would be the most appropriate way to import this into Neo4j now?
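One possible workaround, sketched here under assumptions rather than confirmed against the library: read the embeddings back out of the LanceDB store and re-attach them to the entity rows by id, so the notebook's existing import cells can find a description_embedding column again. The table name "default-entity-description", the "id" and "vector" column names, and the helper names attach_embeddings and load_and_merge are all assumptions or hypothetical; listing the tables in the store should confirm the real names.

```python
def attach_embeddings(entities, vectors, key="id", column="description_embedding"):
    """Return copies of the entity records with an embedding added under `column`.

    entities: iterable of dicts, one per entity row (e.g. rows of the entities parquet).
    vectors:  iterable of dicts holding `key` and "vector" (e.g. rows of the vector store table).
    Entities with no matching vector get None; the input records are not modified.
    """
    # Index the vectors by id, converting each one to a plain list of floats.
    by_id = {row[key]: [float(x) for x in row["vector"]] for row in vectors}
    return [{**entity, column: by_id.get(entity[key])} for entity in entities]


def load_and_merge(entities_parquet, db_uri="output/lancedb",
                   table_name="default-entity-description"):
    """Hypothetical loader: read both sources and return one merged DataFrame.

    ASSUMPTIONS: the table name and its "id"/"vector" columns are guesses based on
    the vector_store settings in the config (db_uri, container_name: default).
    Check lancedb.connect(db_uri).table_names() before relying on them.
    """
    import lancedb       # third-party, imported lazily
    import pandas as pd  # third-party, imported lazily

    entities = pd.read_parquet(entities_parquet)
    table = lancedb.connect(db_uri).open_table(table_name)
    vectors = table.to_pandas()[["id", "vector"]]
    merged = attach_embeddings(entities.to_dict("records"), vectors.to_dict("records"))
    return pd.DataFrame(merged)
```

Calling load_and_merge on the entities parquet file and writing the result with to_parquet would give a file that carries the embeddings alongside the other entity columns. The renamed "title" column is a separate change and is not handled by this sketch.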
Steps to reproduce
Install graphrag v0.4.0 or higher.
Index any inputs, then follow the instructions in the Neo4j import notebook.
GraphRAG Config Used
### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

encoding_model: cl100k_base # this needs to be matched to your model!

llm:
  api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
  type: openai_chat # or azure_openai_chat
  model: gpt-4o-mini
  model_supports_json: true # recommended if this is available for your model.
  # audience: "https://cognitiveservices.azure.com/.default"
  # api_base: https://<instance>.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>

parallelization:
  stagger: 0.3
  # num_threads: 50

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  vector_store:
    type: lancedb
    db_uri: 'output/lancedb'
    container_name: default
    overwrite: true
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-3-large
    # api_base: https://<instance>.openai.azure.com
    # api_version: 2024-02-15-preview
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>

### Input settings ###

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.(txt|md)$"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # or blob
  base_dir: "cache"

reporting:
  type: file # or console, blob
  base_dir: "logs"

storage:
  type: file # or blob
  base_dir: "output"

## only turn this on if running `graphrag index` with custom settings
## we normally use `graphrag update` with the defaults
update_index_storage:
  # type: file # or blob
  # base_dir: "update_output"

### Workflow settings ###

skip_workflows: []

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event,concept,component,specification, business entity, attribute, value, field, system, process, role]
  max_gleanings: 3

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 2

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  embeddings: false
  transient: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "prompts/drift_search_system_prompt.txt"
Notebook referenced: graphrag_import_neo4j_cypher.ipynb