Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #76 from cloudera/main
* upgrade everything
* small refactor for params, update loading
* add bedrock converse
* fix loading
* Clean up Cohere suggested questions
* Add property-based test for process_response() (#56)
* Add hypothesis
* Add property-based test for process_response()
* Shorten variable
* Formatting
* Add type annotations
* Fix type annotation
* hacking on startup scripts
* hacking on startup scripts, moar
* fix wrong dir
* try having the java side restart itself if it dies
* see output from java startup
* add debug info
* add the executable bit
* change the flags
* Add docstrings for tests
* refactor datasourceId
* update to exclude 405b model and default to 8b
* update readme for new cohere
* fix broken tests monkeypatching
* "wip on creating with models and response chunks"
* wip on modal updates
* commit java updates
* wip on populating the chat setting modal
* set up ui for updating a session
* add update method
* use updated session for chat
* remove query configuration from the chat context
* refactoring fe and fixing bug with empty model
* remove the datasource id from the context and use the active session instead
* Update release version to 1.4.0-beta
* Support multiple embedding models (#59)
* add embedding model to the data source in the java API
* embedding model used from the datasource while indexing
* replace the rest of the embedding model defaults
* "test & fix bugs with embedding variability"
* small refactoring to make embedding & llm caii methods look the same
* fix linting issues
* add a todo for a failing property test case
* remove unused import

---------

Co-authored-by: Elijah Williams <ewilliams@cloudera.com>

* Provide CAII batch embedding for better performance (#35)
* CAII endpoint discovery (#60)
* "wip on endpoint listing"
* "wip on list_endpoints typing"
* "refactoring to endpoint object"
* "wip filtering"
* "endpoints queried!"
* "refactoring"
* "wip on cleaning up types"
* "type cleanup complete"
* "moving files"
* "use a dummy embedding model for deletes"
* fix some bits from merge, get evals working again with CAII, tests passing
* formatting
* clean up ruff stuff
* use the chat llm for evals
* fix mypy for reformatting
* "wip on java reconciler"
* "reconciler don't do no model; start python work"
* "python - updating for summarization model"
* "comment out batch embeddings to get it working again"
* add handling for no summarization in the files table
* finish up ui and python for summarization
* make sure to update the time-updated fields on data sources and chat sessions
* use no-op models when we don't need real ones for summary functionality
* Update release version to dev-testing
* use the summarization llm when summarizing summaries

---------

Co-authored-by: Elijah Williams <ewilliams@cloudera.com>
Co-authored-by: actions-user <actions@github.com>

* Update release version to 1.4.0
* pass the original filename from java-> python so we don't need s3 metadata to store it
* don't read the whole directory when summarizing docs
* "refactor java to use RagFileService"
* remove seaweedfs experiment
* Make mypy happy (#62)
* Refactor summary index to isolate the logic (#63)
* Refactor summary index to isolate the logic
* fix tests
* handle race condition
* handle mypy
* ignore errors if the directory doesn't exist

---------

Co-authored-by: jwatson <jkwatson@gmail.com>

* image
* Update catalog entry to match the official one (#66)
* Update local catalog with official info
* add the git-ref back
* add the html long description (#67)
* Shuffle API for data sources for easier human consumption (#68)
* Shuffle API for data sources for easier human consumption
* make mypy happy
* remove prints
* wip o fs rag file uploader
* "now we're thinking with overtime"
* Revert ""now we're thinking with overtime""
  This reverts commit 3c93206.
* get the databases directory from the environment (in local_dev)
  python file storage abstraction
  python tests currently broken
  real AMP startup script needs new env var
* add a todo
* merge from main
* properly override the configuration in pytest configure to point at a temp directory
* get the tests passing with filesystem file handoff
* update project metadata to support new local filesystem storage
* Update release version to dev-testing
* fix java
* cleanup after switching tests to use the local filesystem
* Remove unused settings (#70)
* remove unused dep
* fix circular dep and refactor doc storage
* Update release version to 1.4.0
* Summarize the data store on every document summarization (#69)
* fix bug with s3 path when the prefix is not provided (#72)
* add --reload to the fastapi startup_app script
* Avoid global variables and use ephemeral folder for tests (#71)
* Avoid global variables and use ephemeral folder for tests
* fix with merge to main
* Remove print
* lint
* refetch knowledge base summary on doc summary change
* Bump @eslint/plugin-kit (#16)

  Bumps the npm_and_yarn group with 1 update in the /ui directory: [@eslint/plugin-kit](https://github.com/eslint/rewrite).

  Updates `@eslint/plugin-kit` from 0.2.2 to 0.2.3
  - [Release notes](https://github.com/eslint/rewrite/releases)
  - [Changelog](https://github.com/eslint/rewrite/blob/main/release-please-config.json)
  - [Commits](eslint/rewrite@plugin-kit-v0.2.2...plugin-kit-v0.2.3)

  ---
  updated-dependencies:
  - dependency-name: "@eslint/plugin-kit"
    dependency-type: indirect
    dependency-group: npm_and_yarn
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: jwatson <jkwatson@gmail.com>
Co-authored-by: Michael Liu <mliu@cloudera.com>
Co-authored-by: actions-user <actions@github.com>
Co-authored-by: conradocloudera <csilvamiranda@cloudera.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Showing 123 changed files with 3,920 additions and 1,746 deletions.
71 changes: 71 additions & 0 deletions
backend/src/main/java/com/cloudera/cai/rag/files/FileSystemRagFileUploader.java
@@ -0,0 +1,71 @@

```java
/*
 * CLOUDERA APPLIED MACHINE LEARNING PROTOTYPE (AMP)
 * (C) Cloudera, Inc. 2024
 * All rights reserved.
 *
 * Applicable Open Source License: Apache 2.0
 *
 * NOTE: Cloudera open source products are modular software products
 * made up of hundreds of individual components, each of which was
 * individually copyrighted. Each Cloudera open source product is a
 * collective work under U.S. Copyright Law. Your license to use the
 * collective work is as provided in your written agreement with
 * Cloudera. Used apart from the collective work, this file is
 * licensed for your use pursuant to the open source license
 * identified above.
 *
 * This code is provided to you pursuant a written agreement with
 * (i) Cloudera, Inc. or (ii) a third-party authorized to distribute
 * this code. If you do not have a written agreement with Cloudera nor
 * with an authorized and properly licensed third party, you do not
 * have any rights to access nor to use this code.
 *
 * Absent a written agreement with Cloudera, Inc. (“Cloudera”) to the
 * contrary, A) CLOUDERA PROVIDES THIS CODE TO YOU WITHOUT WARRANTIES OF ANY
 * KIND; (B) CLOUDERA DISCLAIMS ANY AND ALL EXPRESS AND IMPLIED
 * WARRANTIES WITH RESPECT TO THIS CODE, INCLUDING BUT NOT LIMITED TO
 * IMPLIED WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY AND
 * FITNESS FOR A PARTICULAR PURPOSE; (C) CLOUDERA IS NOT LIABLE TO YOU,
 * AND WILL NOT DEFEND, INDEMNIFY, NOR HOLD YOU HARMLESS FOR ANY CLAIMS
 * ARISING FROM OR RELATED TO THE CODE; AND (D)WITH RESPECT TO YOUR EXERCISE
 * OF ANY RIGHTS GRANTED TO YOU FOR THE CODE, CLOUDERA IS NOT LIABLE FOR ANY
 * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, PUNITIVE OR
 * CONSEQUENTIAL DAMAGES INCLUDING, BUT NOT LIMITED TO, DAMAGES
 * RELATED TO LOST REVENUE, LOST PROFITS, LOSS OF INCOME, LOSS OF
 * BUSINESS ADVANTAGE OR UNAVAILABILITY, OR LOSS OR CORRUPTION OF
 * DATA.
 ******************************************************************************/

package com.cloudera.cai.rag.files;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
import org.springframework.web.multipart.MultipartFile;

@Slf4j
@Component
public class FileSystemRagFileUploader implements RagFileUploader {

  private static final String FILE_STORAGE_ROOT = fileStoragePath();

  @Override
  public void uploadFile(MultipartFile file, String s3Path) {
    log.info("Uploading file to FS: {}", s3Path);
    try {
      Path filePath = Path.of(FILE_STORAGE_ROOT, s3Path);
      Files.createDirectories(filePath.getParent());
      Files.write(filePath, file.getBytes());
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  private static String fileStoragePath() {
    var fileStoragePath = System.getenv("RAG_DATABASES_DIR") + "/file_storage";
    log.info("configured with fileStoragePath = {}", fileStoragePath);
    return fileStoragePath;
  }
}
```
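`FileSystemRagFileUploader` joins `RAG_DATABASES_DIR` and the caller-supplied `s3Path` without validation. If the variable is unset, `System.getenv` returns `null` and the root silently becomes `null/file_storage`; a key containing `..` segments can resolve outside the storage root; and `file.getBytes()` buffers the whole upload in memory. The following is a hypothetical hardening sketch, not code from this repository: `SafeFileStore`, `fromEnv`, `resolve`, and `write` are illustrative names. It takes a plain `InputStream` instead of Spring's `MultipartFile` so it compiles without Spring on the classpath, and it assumes the same `<RAG_DATABASES_DIR>/file_storage` layout the original uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not part of the RAG project.
// A filesystem store that refuses to write outside its root directory.
final class SafeFileStore {
  private final Path root;

  SafeFileStore(Path root) {
    // Normalize once so every containment check compares like with like.
    this.root = root.toAbsolutePath().normalize();
  }

  // Assumed layout mirroring the uploader: <env var>/file_storage.
  // Fails fast instead of producing "null/file_storage" when the variable is unset.
  static SafeFileStore fromEnv(String variable) {
    String base = System.getenv(variable);
    if (base == null || base.isBlank()) {
      throw new IllegalStateException(variable + " is not set");
    }
    return new SafeFileStore(Path.of(base, "file_storage"));
  }

  // Resolves a relative key under the root. Keys that normalize to a path
  // outside the root (e.g. "../x" or an absolute path), or to the root itself,
  // are rejected.
  Path resolve(String key) {
    Path target = root.resolve(key).normalize();
    if (!target.startsWith(root) || target.equals(root)) {
      throw new IllegalArgumentException("key escapes storage root: " + key);
    }
    return target;
  }

  // Streams the content to disk rather than buffering it in memory.
  // REPLACE_EXISTING keeps the overwrite behavior of Files.write.
  Path write(String key, InputStream content) throws IOException {
    Path target = resolve(key);
    Files.createDirectories(target.getParent());
    Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
    return target;
  }
}
```

In a Spring component, a call such as `store.write(s3Path, file.getInputStream())` could take the place of the `Files.write(...)` line; `MultipartFile.transferTo(Path)` is another streaming option once the target path has been validated. One behavioral difference to check against how callers build `s3Path`: `Path.of(root, key)` nests a key that starts with `/` under the root, whereas `Path.resolve` treats it as absolute, so this sketch rejects such keys.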