Closed
Labels: backlog (We've confirmed some action is needed on this and will plan it), bug (Something isn't working)
Description
Do you need to file an issue?
- I have searched the existing issues and this bug is not already filed.
- My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
- I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
I am using blob storage for input, but when running indexing I noticed that it tries to read more files than it should: it returns all files in the container, not just the ones under the configured base directory.
Input config example:
input:
  storage:
    type: blob # [file, blob]
    base_dir: "input/folder_1/"
    container_name: "graphrag-container"
    storage_account_blob_url: "https://graphragexample.blob.core.windows.net"
  file_type: text
Using this example config, I get warning messages like this during indexing:
2025-10-24 11:30:38.0647 - WARNING - graphrag.storage.blob_pipeline_storage - Error getting key input/folder_1/input/folder_2/text.txt
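A plausible reconstruction of the malformed key (an assumption on my part, not verified against the GraphRAG source): the storage layer prefixes `base_dir` onto each blob name it lists, but `list_blobs()` returns full container-relative names, so blobs outside `base_dir` come back with a doubled path:

```python
# Hypothetical sketch of how the key in the warning could be produced.
# Assumption: base_dir is joined with the full container-relative blob name.
base_dir = "input/folder_1/"
blob_name = "input/folder_2/text.txt"  # a blob outside base_dir

key = base_dir + blob_name
print(key)  # input/folder_1/input/folder_2/text.txt -- matches the warning above
```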
In the code, I noticed you are getting all blobs from the container:
all_blobs = list(container_client.list_blobs())
I couldn't pinpoint exactly where the other directories fail to be filtered out, but you could list only the blobs under the base dir to avoid this issue:
all_blobs = list(container_client.list_blobs(name_starts_with=base_dir))
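To illustrate the difference, here is a minimal sketch in plain Python. A stub class stands in for `azure.storage.blob.ContainerClient` (the real SDK's `list_blobs` does accept a `name_starts_with` keyword and filters server-side; the stub filters in memory):

```python
class StubContainerClient:
    """Stub mimicking ContainerClient.list_blobs(name_starts_with=...)."""

    def __init__(self, blob_names):
        self._blob_names = blob_names

    def list_blobs(self, name_starts_with=None):
        # Yield blob names, optionally restricted to a prefix.
        for name in self._blob_names:
            if name_starts_with is None or name.startswith(name_starts_with):
                yield name


container_client = StubContainerClient([
    "input/folder_1/a.txt",
    "input/folder_1/b.txt",
    "input/folder_2/text.txt",
])

base_dir = "input/folder_1/"

# Current behavior: every blob in the container is returned.
all_blobs = list(container_client.list_blobs())

# Suggested fix: only blobs under base_dir are returned.
scoped_blobs = list(container_client.list_blobs(name_starts_with=base_dir))
print(scoped_blobs)  # ['input/folder_1/a.txt', 'input/folder_1/b.txt']
```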
Steps to reproduce
No response
Expected Behavior
No response
GraphRAG Config Used
No response
Logs and screenshots
No response
Additional Information
- GraphRAG Version: 2.6.0
- Operating System: Ubuntu 22
- Python Version: 3.12