S3 Directory Document Loading Component #2818

Merged


slaplante-raft
Contributor

This PR introduces a new component that lets a user load multiple documents from an S3 bucket. It has two optional parameters, Server URL and Prefix. The component mirrors the functionality of the filesystem directory document loading component.

When connecting to a MinIO bucket or a local S3 bucket, the Server URL must be provided.
To load only the entries under a specific directory, use the Prefix option. The hierarchy in S3 is flat, so if directoryB sits inside directoryA, you would specify directoryA/directoryB to load only the contents of directoryB. Loading is recursive by default; another option can be added to limit that if needed.
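To illustrate the flat hierarchy, here is a minimal sketch of how a prefix narrows a set of object keys. It does not reproduce the component's code: `filterKeysByPrefix` and the sample keys are hypothetical and only mimic the plain string-prefix matching that S3 applies when listing objects.

```typescript
// Hypothetical helper (not from this PR): mimics how an S3 prefix selects keys.
// S3 has no real directories; 'directoryA/directoryB/b1.pdf' is one flat key.
function filterKeysByPrefix(keys: string[], prefix?: string): string[] {
    // No prefix selects every object in the bucket.
    if (!prefix) return [...keys]
    // Matching is a plain comparison against the start of each key,
    // so everything "below" the prefix is included (recursive by nature).
    return keys.filter((key) => key.startsWith(prefix))
}

// Made-up sample keys for illustration only.
const sampleKeys = [
    'root.pdf',
    'directoryA/a.pdf',
    'directoryA/directoryB/b1.pdf',
    'directoryA/directoryB/deeper/b2.pdf'
]

const wholeBucket = filterKeysByPrefix(sampleKeys)
const nestedOnly = filterKeysByPrefix(sampleKeys, 'directoryA/directoryB/')
```

Because matching is purely textual, a prefix without a trailing slash such as `directoryA/directoryB` would also match a sibling key like `directoryA/directoryBackup/x.pdf`, so ending the prefix with `/` is the safer choice when only one directory is wanted.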

Tested with a MinIO bucket containing PDF files in different directories. Verified with no prefix (downloads the entire bucket), a prefix containing another directory, and a prefix with no directory.

In addition, this was tested with a global S3 bucket (when the Server URL is not provided).
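The two connection modes tested above (a custom Server URL versus the default AWS endpoint) could be expressed roughly as follows. This is a sketch under assumptions, not the component's actual code: `buildConnectionOptions` and `S3ConnectionOptions` are hypothetical names, the field names are chosen to resemble the `region`, `endpoint`, and `forcePathStyle` options of the AWS SDK v3 S3 client, and enabling path-style addressing for custom endpoints is an assumption commonly made for MinIO rather than something this PR states.

```typescript
// Hypothetical options shape; it only resembles AWS SDK v3 S3 client options.
interface S3ConnectionOptions {
    region: string
    endpoint?: string
    forcePathStyle?: boolean
}

// Hypothetical helper: the Server URL is optional, as in the PR description.
function buildConnectionOptions(region: string, serverUrl?: string): S3ConnectionOptions {
    const options: S3ConnectionOptions = { region }
    const trimmed = serverUrl ? serverUrl.trim() : ''
    if (trimmed !== '') {
        // Custom endpoint, e.g. a MinIO or local S3-compatible server.
        options.endpoint = trimmed
        // Assumption: S3-compatible servers are usually addressed path-style.
        options.forcePathStyle = true
    }
    // With no Server URL, the default (global) AWS endpoint would be used.
    return options
}

// Example values are made up for illustration.
const awsOptions = buildConnectionOptions('us-east-1')
const minioOptions = buildConnectionOptions('us-east-1', 'http://localhost:9000')
```

Leaving `endpoint` unset when no Server URL is given keeps the global S3 case free of any custom-endpoint settings.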

Screenshot 2024-07-17 at 9 36 10 AM

@HenryHengZJ
Contributor

Awesome, thank you so much!

add placeholder for prefix
@HenryHengZJ HenryHengZJ merged commit 34d0e43 into FlowiseAI:main Jul 21, 2024
2 checks passed