-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide a non-S3 file-system based option for temporary file storage #64
Conversation
.bucketName( | ||
Optional.ofNullable(System.getenv("S3_RAG_DOCUMENT_BUCKET")).orElse("rag-files")) | ||
.bucketName(Optional.ofNullable(System.getenv("S3_RAG_DOCUMENT_BUCKET")).orElse("")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if this is a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we verified it works as desired. 👍🏽
@@ -62,6 +60,8 @@ public S3RagFileUploader( | |||
|
|||
@Override | |||
public void uploadFile(MultipartFile file, String s3Path) { | |||
log.info("Uploading file to S3: {}", s3Path); | |||
System.out.println("Uploading file to S3: " + s3Path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes!
source_file = Path(settings.rag_databases_dir, "file_storage", document_key) | ||
target_file = Path(temp_dir, original_filename) | ||
shutil.copy(source_file, target_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we can just return source_file
. I don't think we depend on the specific file path for anything else anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm. is that true? I thought the docling stuff needed the directory to write the temporary markdown file to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah...we definitely need the temp version returned, since it has the original file name, unlike the saved one.
llm-service/app/tests/conftest.py
Outdated
# This is a hook to configure the pytest session | ||
def pytest_configure(config: Any) -> None: | ||
settings.rag_databases_dir = "/tmp/databases" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason why we're doing this hard-coded path instead of something like https://docs.pytest.org/en/stable/how-to/tmp_path.html?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pure python ignorance is all
This reverts commit 3c93206.
python file storage abstraction python tests currently broken real AMP startup script needs new env var
9739ef0
to
27da24b
Compare
""" | ||
Download document from S3 | ||
""" | ||
return s3.download(temp_dir, bucket_name, document_key, original_filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe we should just copy the content of s3.download
? Instead of having two ways to deal with S3, since we want to incentivize DocumentStorage
No description provided.