Sync kserve/master with odh/master #356
* docs: Move Alibi explainer to docs Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Empty-Commit Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * fix test Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Empty-Commit Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* build: Add flake8 and black to pre-commit hooks Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * fix path Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * pass config Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * fix flake8 Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
…e#3562 (kserve#3576) * Set writable cache folder to avoid permission issue. Fixes kserve#3562 Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Update huggingface_server.Dockerfile Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Empty-Commit Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
…kserve#3596) chore: Fixes [CVE-2023-45288](https://www.cve.org/CVERecord?id=CVE-2023-45288) Signed-off-by: Spolti <fspolti@redhat.com>
* Add OpenAIModel support to model repository. Signed-off-by: grandbora <grandbora@fb.com> * Allow model server to register an openai model Signed-off-by: grandbora <grandbora@fb.com> * address comments Signed-off-by: grandbora <grandbora@fb.com> * fix format Signed-off-by: grandbora <grandbora@fb.com> * make black happy Signed-off-by: grandbora <grandbora@fb.com> * Python 3.9 can not do isinstance on union type Signed-off-by: grandbora <grandbora@fb.com> * add comment Signed-off-by: grandbora <grandbora@fb.com> * Use a base model Signed-off-by: grandbora <grandbora@fb.com> * fix formatting Signed-off-by: grandbora <grandbora@fb.com> * Fix case Co-authored-by: Dan Sun <dsun20@bloomberg.net> Signed-off-by: Bora <grandbora@users.noreply.github.com> * Fix case Signed-off-by: grandbora <grandbora@fb.com> --------- Signed-off-by: grandbora <grandbora@fb.com> Signed-off-by: Bora <grandbora@users.noreply.github.com> Co-authored-by: Dan Sun <dsun20@bloomberg.net>
* updated xgboost to support json and ubj models Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> * rename bst_model dir Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> * bug fix Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> * black format Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> * black formatter Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> * bug fix Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> --------- Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
* google.golang.org/protobuf version upgrade Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> * version upgrade Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> --------- Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
* VLLM support for OpenAI Completions in HF server Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * remove unwanted imports Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * minor fixes Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * fix lint Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * fix verify license Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * fix verify license Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * Change base model Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * fix linter Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * fix tests Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * Fix vllm Base and Chat Completion template Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * Include Readme Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * ignore file from linter and generate Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * ignore file from linter and generate Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * add codegen license Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * bring in openai errors Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * fix linting Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * Remove openai import and update openai types codegen cmd Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * remove unwanted import Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * fix tests after conflict Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * fix poetry lock Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * fix poetry lock Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * remove openai from extras Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * fix logprobs Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * send json response content of type openai error Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> --------- Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: grandbora <grandbora@fb.com> Co-authored-by: Bora Tunca <btunca2@bloomberg.net>
Fix model server stop method Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
* Provide full Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Move to cmd directory Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Add helm charts Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * regen Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * fix conflict Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * rebase Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * remove unused Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * remove redundant files Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Empty-Commit Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Rename file Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* go lint fix Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> * commit for golangci Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> * rewrite if-else to switch statement Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> * fix for the response body Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> --------- Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Ignore protected namespaces. Don't set json_loads Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
* build: Fix CRD copying in generate-install.sh Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Empty-Commit Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Sivanantham <90966311+sivanantha321@users.noreply.github.com>
Remove replace for golang.org/x/net and fix CVE-2023-45288 for qpext Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
test re run Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
* OpenAI data models and endpoints from vLLM Signed-off-by: Tessa Pham <hpham111@bloomberg.net> more components for OpenAI endpoints Signed-off-by: Tessa Pham <hpham111@bloomberg.net> add OpenAI endpoints to router Signed-off-by: Tessa Pham <hpham111@bloomberg.net> modify generate() in data plane Signed-off-by: Tessa Pham <hpham111@bloomberg.net> class OpenAIModel Signed-off-by: Tessa Pham <hpham111@bloomberg.net> delete and rename files Signed-off-by: Tessa Pham <hpham111@bloomberg.net> add create_chat_completion() to OpenAIModel Signed-off-by: Tessa Pham <hpham111@bloomberg.net> update routers and lint Signed-off-by: Tessa Pham <hpham111@bloomberg.net> * Implement streaming Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Register OpenAI endpoints when appropriate Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Remove completion types from dataplane methods Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Add OpenAI endpoint support to huggingfaceserver Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Allow accessing headers and response in completion methods Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Create separate model for completion and chat completion requests Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Add stop function for handling model shutdown Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Add arg for remote code param Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Add option to allow selecting model backend Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Pin ray to 2.10.x Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Use correct type in tests Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Refactor encoder-decoder and decoder only models into separate classes. Fix tests. Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Add more test cases. Factor models out into fixtures. Pass loop as argument to the background request handler. Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Remove unnecessary None check Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Properly handle unsupported models. Don't try to load table question answering models as they are not supported. Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Remove models we don't support Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Pass in predictor config Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Fix test assertion. Remove debug lines Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> --------- Signed-off-by: Tessa Pham <hpham111@bloomberg.net> Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> Co-authored-by: Tessa Pham <hpham111@bloomberg.net>
Signed-off-by: Spolti <fspolti@redhat.com>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* build: Remove misleading logs from minimal-crdgen.sh Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Add file Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* Fix v2 predict for hf Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com> * Add e2e test for hf Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com> * Fix post processing and e2e image build Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com> * Increase memory limit Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com> * Fix output for v2 Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com> * Add more tests Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com> * Reduce parallelism Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com> * Use backend argument Signed-off-by: Dan Sun <dsun20@bloomberg.net> * Update to use chat completion endpoint Signed-off-by: Dan Sun <dsun20@bloomberg.net> * Fix openai tests Signed-off-by: Dan Sun <dsun20@bloomberg.net> --------- Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com> Signed-off-by: Dan Sun <dsun20@bloomberg.net> Co-authored-by: Dan Sun <dsun20@bloomberg.net>
…erver (kserve#3594) * set SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY Signed-off-by: Lize Cai <lize.cai@sap.com> * add doc on the default value Signed-off-by: Lize Cai <lize.cai@sap.com> --------- Signed-off-by: Lize Cai <lize.cai@sap.com>
Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
… backend (kserve#3657) * Assign device of input tensors Signed-off-by: sailgpu <sailesh.duddupudi@nutanix.com> * lint fix Signed-off-by: sailgpu <sailesh.duddupudi@nutanix.com> --------- Signed-off-by: sailgpu <sailesh.duddupudi@nutanix.com>
* Test image builds for ARM64 arch in CI Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com> * Update lockfiles Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com> * Add ARM64 support for paddle Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com> --------- Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
* Encoder-decoder models do not include input tokens in their output Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> * Pass stopping criteria into streamer Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net> --------- Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
…3615) * Added the field AdditionalIngressDomains into the struct IngressConfig Signed-off-by: Vincent Hou <shou73@bloomberg.net> * Added the additional ingress domains into the hosts Signed-off-by: Vincent Hou <shou73@bloomberg.net> * Fixed the indentation Signed-off-by: Vincent Hou <shou73@bloomberg.net> * Added isvc name and namespace into the domain name * Added the validation for the URLs Signed-off-by: Vincent Hou <shou73@bloomberg.net> * Validate the domain in the additionalIngressDomains Signed-off-by: Vincent Hou <shou73@bloomberg.net> * Create the hosts from the list of additionalIngressDomains Signed-off-by: Vincent Hou <shou73@bloomberg.net> * Change the way to validate the host Signed-off-by: Vincent Hou <shou73@bloomberg.net> * Change the validation error message Signed-off-by: Vincent Hou <shou73@bloomberg.net> * Revert the name to url Signed-off-by: Vincent Hou <shou73@bloomberg.net> * Get all the available domain list Signed-off-by: Vincent Hou <shou73@bloomberg.net> * gofmt -s -w the file Signed-off-by: Vincent Hou <shou73@bloomberg.net> * Add additionalIngressDomains into the charts Signed-off-by: Vincent Hou <shou73@bloomberg.net> * Added the comments and refactor the tests Signed-off-by: Vincent Hou <shou73@bloomberg.net> * Regenerate the manifests Signed-off-by: Vincent Hou <shou73@bloomberg.net> * Modify createHTTPMatchRequest, the charts and the test cases Signed-off-by: Vincent Hou <shou73@bloomberg.net> * Run make generate Signed-off-by: Vincent Hou <shou73@bloomberg.net> --------- Signed-off-by: Vincent Hou <shou73@bloomberg.net>
…Fixes kserve#3452 (kserve#3603) * feat: Support customizable deployment strategy for RawDeployment mode Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * regen Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * lint Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Correctly apply rollingupdate Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * address comments Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Add validation Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
* Enable dtype for huggingface server Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * Set float16 as default. Fixup linter Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * Add small comment to make the changes understandable Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * Fixup linter Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * Adapt to new huggingfacemodel Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * Fixup merge :) Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * Explicitly mention the behaviour of dtype flag on auto. Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * Default to FP32 for encoder models Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * Selectively add --dtype to parser. Use FP16 for GPU and FP32 for CPU Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * Fixup linter Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * Update poetry Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> * Use torch.float32 for tests explicitly Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com> --------- Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
* fix for extract zip from gcs Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> * initial commit for gcs model download unittests Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> * unittests for model download from gcs Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> * black format fix Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> * code verification Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com> --------- Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
* update wording for huggingface README small update to make readme easier to understand Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> * Update README.md Signed-off-by: Alexa Griffith agriffith50@bloomberg.net * Update python/huggingfaceserver/README.md Co-authored-by: Filippe Spolti <filippespolti@gmail.com> Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> * update vllm Signed-off-by: alexagriffith <agriffith50@bloomberg.net> * Update README.md --------- Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net> Signed-off-by: Alexa Griffith agriffith50@bloomberg.net Signed-off-by: alexagriffith <agriffith50@bloomberg.net> Signed-off-by: Dan Sun <dsun20@bloomberg.net> Co-authored-by: Filippe Spolti <filippespolti@gmail.com> Co-authored-by: Dan Sun <dsun20@bloomberg.net>
* fix: HPA equality check should include annotations Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Only watch related autoscalerclass annotation Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * simplify Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * Add missing delete action Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * fix logic Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
fix huggingface runtime in chart Signed-off-by: Dan Sun <dsun20@bloomberg.net>
* fix huggingface runtime in chart Signed-off-by: Dan Sun <dsun20@bloomberg.net> * Allow model_dir to be specified on template Signed-off-by: Dan Sun <dsun20@bloomberg.net> * Default model_dir to /mnt/models for HF Signed-off-by: Dan Sun <dsun20@bloomberg.net> * Lint format Signed-off-by: Dan Sun <dsun20@bloomberg.net> --------- Signed-off-by: Dan Sun <dsun20@bloomberg.net>
) * Fix: vLLM Model Supported check throwing circular dependency Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * remove unwanted comments Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * remove unwanted comments Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * fix return case Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * fix to check all arch in model config for vllm support Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> * fix lint Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com> --------- Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Force-pushed from 7bb2e6d to 885fae6.
Signed-off-by: Spolti <fspolti@redhat.com>
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: israel-hdez, spolti
Merged commit 148d70f into opendatahub-io:master.
No description provided.