Releases: kserve/kserve
v0.15.0-rc0
What's Changed
- bump to vllm0.6.2 and add explicit chat template by @hustxiayang in #3964
- bump to vllm0.6.3 by @hustxiayang in #4001
- Feature: Add hf transfer by @tjandy98 in #4000
- Fix snyk scan null error by @sivanantha321 in #3974
- Update quick install script by @johnugeorge in #4005
- Local Model Node CR by @HotsauceLee in #3978
- Reduce E2Es dependency on CI environment (2) by @israel-hdez in #4008
- Allow GCS to download single file by @spolti in #4015
- bump to vllm0.6.3.post1 by @hustxiayang in #4023
- Set default for SamplingParams.max_tokens in OpenAI requests if unset by @kevinmingtarja in #4020
- Add tools functionality to vLLM by @ArjunBhalla98 in #4033
- For vllm users, our parser should be able to support both - and _ by @hustxiayang in #3933
- Add tools unpacking for vLLM by @ArjunBhalla98 in #4035
- Multi-Node Inference Implementation by @Jooho in #3972
- Enhance InjectAgent to Handle Only HTTPGet, TCP Readiness Probes by @LOADBC in #4012
- Feat: Fix memory issue by replacing io.ReadAll with io.Copy (#4017) by @ops-jaeha in #4018
- Update alibiexplainer example by @spolti in #4004
- Fix huggingface build runs out of storage in CI by @sivanantha321 in #4044
- Update snyk scan to include new images by @sivanantha321 in #4042
- Introducing KServe Guru on Gurubase.io by @kursataktas in #4038
- Fix Hugging Face server EncoderModel not returning probabilities by correctly passing --return_probabilities flag (#3958) by @oplushappy in #4024
- Add deeper readiness check for transformer by @sivanantha321 in #3348
- Fix Starlette Denial of service (DoS) via multipart/form-data by @spolti in #4006
- remove duplicated import "github.com/onsi/gomega" by @carlory in #4051
- Fix localmodel controller name in snyk scan workflow by @sivanantha321 in #4054
- Fix azure blob storage access key env not mounted by @bentohset in #4064
- Storage Initializer support single digit azure DNS zone ID by @bentohset in #4070
- Fix trust remote code encoder model by @sivanantha321 in #4043
- introduce the prepare-for-release.sh script by @spolti in #3993
- Model cache controller and node agent by @yuzisun in #4089
- Storage containers typo fix for Huggingface Storage type by @andyi2it in #4098
- Support datetime object serialization in v1/v2 response by @sivanantha321 in #4099
- Replace klog with klog/v2 by @sivanantha321 in #4093
- Add exception handling and logging for grpc server by @sivanantha321 in #4066
- Update ClusterLocalModel to LocalModelCache by @yuzisun in #4105
- Fix LocalModelCache controller reconciles deleted resource by @sivanantha321 in #4106
- Fix InferenceService state when Predictor pod in CrashLoopBackOff by @hdefazio in #4003
- LocalModelCache Admission Webhook by @HotsauceLee in #4102
- Add namespace to localmodel and localmodelnode ServiceAccount helm chart by @ritzdevp in #4111
- KServe VLLM cpu image by @AyushSawant18588 in #4049
- Update max_model_len calculation and fixup encoder pooling by @Datta0 in #4055
- chore: use patch instead of update for finalizer changes by @whynowy in #4072
- Fix isvc role localmodelcache permission by @sivanantha321 in #4131
- Detect missing models and redownload models by @greenmoon55 in #4095
- introduce service configuration at configmap level by @spolti in #3672
- Allow multiple node groups in the model cache CR by @greenmoon55 in #4134
- Annotation to disable model cache by @greenmoon55 in #4118
- Clean up jobs in model cache agent by @greenmoon55 in #4140
- Ensure Model root folder exists by @greenmoon55 in #4142
- Add NodeGroup Name Into PVC Name by @HotsauceLee in #4141
- Make LocalModel Agent reconciliation frequency configurable by @greenmoon55 in #4143
- Remove deepcopy-gen in favour of controller-gen by @sivanantha321 in #4109
- Add ability to set annotations on controller/webhook service and expose metrics bind port and address in helm chart by @mhowell24 in #4127
- Fix EOF error for downloading zip files by @Jonas-Bruns in #4082
- Remove redundant namespace yaml by @greenmoon55 in #4148
- Fix Localmodel agent build by @greenmoon55 in #4150
- Fix model server fails to gracefully shutdown by @sivanantha321 in #4116
- Ensure root model directory exists and add protection for jobs created by @yuzisun in #4152
- Enable transformer deeper readiness check tests by @sivanantha321 in #4121
- Update HuggingFace server dependencies versions by @AyushSawant18588 in #4147
- Add workflow for verifying go mod by @sivanantha321 in #4137
- Fix for CVE-2024-52304 - aiohttp upgrade by @andyi2it in #4113
- Allow other engine builders other than docker by @spolti in #3906
- Add localmodelnode crd to helm chart by @greenmoon55 in #4161
- Fixes Non-linear parsing of case-insensitive content by @spolti in #4158
- Helm chart - option to run daemonset as root by @greenmoon55 in #4164
- Replace nodeGroup with nodeGroups in charts/kserve-crd by @ritzdevp in #4166
- Add affinity and tolerations to localmodel daemonset by @greenmoon55 in #4173
- Fix s3 download PermanentRedirectError for legacy s3 endpoint by @bentohset in #4157
- Make label and annotation propagation configurable by @spolti in #4030
- Add ModelCache e2e test by @sivanantha321 in #4136
- Update vllm to 0.6.6 by @rajatvig in #4176
- [bugfix] fix s3 storage download filename bug by @anencore94 in #4162
- Add hf to storageuri prefix list by @tjandy98 in #4184
- Add Support for OpenAI-compatible Embeddings API by @FabianScheidt in #4129
- fix: typo in _construct_http_status_error method by @Mgla96 in #4190
- Fix raw logger e2e test by @sivanantha321 in #4185
- Feat: Support configuring isvc resource defaults by @andyi2it in #4032
- keep replicas when autoscaler set external by @Jooho in #4196
- Increase kserve controller readiness probe time period by @sivanantha321 in #4200
- Fix golangci-lint binary path selection based on GOBIN by @Jooho in #4198
- Add option to disable volume management in localModel config by @ritzdevp in #4186
- set MaxUnavailable(0%)/MaxSurge(100%) for rollingUpdate in multinode case by @Jooho in #4188
- Gracefully shutdown the router server by @sivanantha321 in #3367
- Add workflow for manual huggingface vLLM image publish by @sivanantha321 in #4092
- Feat: Gateway API Support - Raw Deployment by @sivanantha321 in #3952
- add make goal to build huggingface cpu image by @spolti in #4202
- Cleanup the filepath in createNewFile to avoid path traversal issue by @hdefazio in #4205
- Enhance multinode health_check python and manifests by @Jooho in #4197
- Publish 0.15-rc0 release by @yuzisun in #4213
New Contributors
- @ArjunBH...
v0.14.1
What's Changed
- Support datetime object serialization for v1/v2 response by @sivanantha321 in #4123
- Introduce LocalModelNode CR by @HotsauceLee in #3978
- Update Model Cache controller for LocalModelNode and implement LocalModel node agent by @HotsauceLee and @greenmoon55 in #4089
- Rename ClusterLocalModel to LocalModelCache by @yuzisun in #4105
- Detect missing models and redownload models by @greenmoon55 in #4095
- Allow multiple node groups in the model cache CR by @greenmoon55 in #4134
- Annotation to disable model cache by @greenmoon55 in #4118
- Clean up jobs in local model agent by @greenmoon55 in #4140
- Add node group to PVC name by @HotsauceLee in #4141
- Make local node agent reconciliation frequency configurable by @greenmoon55 in #4143
- Add LocalModelCache admission webhook by @HotsauceLee in #4102
- Fix model server fails to gracefully shutdown by @sivanantha321 in #4116
- Ensure root model directory exists and add protection for jobs created by @yuzisun in #4152
Full Changelog: v0.14.0...v0.14.1
v0.14.0
What's Changed
- Prevent the PassthroughCluster for clients/workloads in the service mesh by @israel-hdez in #3711
- Extract openai predict logic into smaller methods by @grandbora in #3716
- Bump MLServer to 1.5.0 by @sivanantha321 in #3740
- Refactor storage initializer to log model download time for all storage types by @sivanantha321 in #3735
- inferenceservice controller: fix error check in Serverless mode by @dtrifiro in #3753
- Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 in #3723
- Propagate trust_remote_code flag throughout vLLM startup by @calwoo in #3729
- Fix dead links on PyPI by @kevinbazira in #3754
- Fix model is ready even if there is no model by @HAO2167 in #3275
- Fix No model ready error in multi model serving by @sivanantha321 in #3758
- Initial implementation of Inference client by @sivanantha321 in #3401
- Fix logprobs for vLLM by @sivanantha321 in #3738
- Fix model name not properly parsed by inference graph by @sivanantha321 in #3746
- pillow - Buffer Overflow by @spolti in #3598
- Use add_generation_prompt while creating chat template by @Datta0 in #3775
- Deduplicate the names for the additional domain names by @houshengbo in #3773
- Make Virtual Service case-insensitive by @andyi2it in #3779
- Install packages needed for vllm model load by @gavrissh in #3802
- Make gRPC max message length configurable by @sivanantha321 in #3741
- Add readiness probe for MLServer and Increase memory for pmml in CI by @sivanantha321 in #3789
- Several bug fixes for vLLM completion endpoint by @sivanantha321 in #3788
- Increase timeout to make unit test stable by @Jooho in #3808
- Upgrade CI deps by @sivanantha321 in #3822
- Add tests for vLLM by @sivanantha321 in #3771
- Bump python to 3.11 for serving runtime images and Bump poetry to 1.8.3 by @sivanantha321 in #3812
- Bump vLLM to 0.5.3.post1 by @sivanantha321 in #3828
- Refactor the ModelServer to let uvicorn handle multiple workers and use 'spawn' for multiprocessing by @sivanantha321 in #3757
- Update golang for docs/Dockerfile to 1.21 by @spolti in #3761
- Make ray an optional dependency by @sivanantha321 in #3834
- Update aif example by @spolti in #3765
- Use helm for quick installation by @sivanantha321 in #3813
- Allow KServe to have its own local gateways for Serverless mode by @israel-hdez in #3737
- Add support for Azure DNS zone endpoints by @tjandy98 in #3819
- Fix failed build for knativeLocalGatewayService by @yuzisun in #3866
- Add logging request feature for vLLM backend by @sivanantha321 in #3849
- Bump vLLM to 0.5.4 by @sivanantha321 in #3874
- Fix: Add workaround for snyk image scan failure by @sivanantha321 in #3880
- Fix trust_remote_code not working with huggingface backend by @sivanantha321 in #3879
- Update KServe 2024-2025 Roadmap by @yuzisun in #3810
- Configurable image pull secrets in Helm charts by @saileshd1402 in #3838
- Fix issue with rolling update behavior by @andyi2it in #3786
- Fix the 'tokens exceeding model limit' error response in vllm server by @saileshd1402 in #3886
- Add support for binary data extension protocol and FP16 datatype by @sivanantha321 in #3685
- Protobuf version upgrade 4.25.4 by @andyi2it in #3881
- Adds optional labels and annotations to the controller by @guitouni in #3366
- Enable Server-Side Apply for Kustomize Overlays in Test Environment by @Jooho in #3877
- bugfix: update image_transformer.py to handle changes in input structure by @zwong91 in #3830
- support text embedding task in hugging face server by @kevinmingtarja in #3743
- Rename max_length parameter to max_model_len to be in sync with vLLM by @Datta0 in #3827
- [Upstream] - Update-istio version based on go version 1.21 by @mholder6 in #3825
- Enrich isvc NotReady events for failed conditions by @asdqwe123zxc in #3303
- adding metadata on requests by @gcemaj in #3635
- Publish 0.14.0-rc0 release by @yuzisun in #3867
- Use API token for publishing package to PyPI by @sivanantha321 in #3896
- Fix sdlc broken when kserve installed using helm by @sivanantha321 in #3890
- Add Security Context and Resources to RBAC Proxy by @HotsauceLee in #3898
- Remove unwanted cluster scope secret permissions by @sivanantha321 in #3893
- bump to vllm 0.5.5 by @lizzzcai in #3911
- pin gosec to 2.20.0 by @greenmoon55 in #3921
- add a new doc 'common issues and solutions' by @Jooho in #3878
- Implement health endpoint for vLLM backend by @sivanantha321 in #3850
- Add security best practices for inferenceservice, inferencegraph, servingruntimes by @sivanantha321 in #3917
- Bump Go to 1.22 by @sivanantha321 in #3912
- bump to vllm 0.6.0 by @hustxiayang in #3934
- Set the volume mount's readonly annotation based on the ISVC annotation by @hdefazio in #3885
- mount /dev/shm volume to huggingfaceserver by @lizzzcai in #3910
- Fix permission error in snyk scan by @sivanantha321 in #3889
- Cluster Local Model CR by @greenmoon55 in #3839
- added http headers to inbound request by @andyi2it in #3895
- Add prow-github-action by @sivanantha321 in #3888
- Add TLS support for Inference Loggers by @ruivieira in #3863
- Fix explainer endpoint not working with path based routing by @sivanantha321 in #3257
- Fix ingress configuration for path based routing and update go mod by @sivanantha321 in #3944
- Add HostIPC field to ServingRuntimePodSpec by @greenmoon55 in #3943
- remove conversion webhook part from self-signed-ca.sh by @Jooho in #3941
- update fluid kserve sample to use huggingface servingruntime by @lizzzcai in #3907
- bump to vLLM0.6.1post2 by @hustxiayang in #3948
- Add NodeDownloadPending status to ClusterLocalModel by @greenmoon55 in #3955
- add tags to rest server timing logs to differentiate cpu and wall time by @gfkeith in #3954
- Implement Huggingface model download in storage initializer by @andyi2it in #3584
- Update OWNERS file by @yuzisun in #3966
- Cluster local model controller by @greenmoon55 in #3860
- Prepare for 0.14.0-rc1 release and automate sync process by @sivanantha321 in #3970
- add a new API for multi-node/multi-gpu by @Jooho in #3871
- Fix update-openapigen.sh that can be executed from kserve dir by @Jooho in #3924
- Add python 3.12 support and remove python 3.8 support by @sivanantha321 in #3645
- Fix openssl vulnerability CWE-1395 by @sivanantha321 in #3975
- Fix Kubernetes Doc Links by @jyono in #3670
- Fix kserve local testing env by @yuzisun in #3981
- Fix streaming response not working properly with logger by @sivanantha321 in #3847
- Add a flag for automount serviceaccount token by @greenmoon55 in https://github.com/kserve/ks...
v0.14.0-rc1
What's Changed
- Publish 0.14.0-rc0 release by @yuzisun in #3867
- Use API token for publishing package to PyPI by @sivanantha321 in #3896
- Fix sdlc broken when kserve installed using helm by @sivanantha321 in #3890
- Add Security Context and Resources to RBAC Proxy by @HotsauceLee in #3898
- Remove unwanted cluster scope secret permissions by @sivanantha321 in #3893
- bump to vllm 0.5.5 by @lizzzcai in #3911
- pin gosec to 2.20.0 by @greenmoon55 in #3921
- add a new doc 'common issues and solutions' by @Jooho in #3878
- Implement health endpoint for vLLM backend by @sivanantha321 in #3850
- Add security best practices for inferenceservice, inferencegraph, servingruntimes by @sivanantha321 in #3917
- Bump Go to 1.22 by @sivanantha321 in #3912
- bump to vllm 0.6.0 by @hustxiayang in #3934
- Set the volume mount's readonly annotation based on the ISVC annotation by @hdefazio in #3885
- mount /dev/shm volume to huggingfaceserver by @lizzzcai in #3910
- Fix permission error in snyk scan by @sivanantha321 in #3889
- Cluster Local Model CR by @greenmoon55 in #3839
- added http headers to inbound request by @andyi2it in #3895
- Add prow-github-action by @sivanantha321 in #3888
- Add TLS support for Inference Loggers by @ruivieira in #3863
- Fix explainer endpoint not working with path based routing by @sivanantha321 in #3257
- Fix ingress configuration for path based routing and update go mod by @sivanantha321 in #3944
- Add HostIPC field to ServingRuntimePodSpec by @greenmoon55 in #3943
- remove conversion webhook part from self-signed-ca.sh by @Jooho in #3941
- update fluid kserve sample to use huggingface servingruntime by @lizzzcai in #3907
- bump to vLLM0.6.1post2 by @hustxiayang in #3948
- Add NodeDownloadPending status to ClusterLocalModel by @greenmoon55 in #3955
- add tags to rest server timing logs to differentiate cpu and wall time by @gfkeith in #3954
- Implement Huggingface model download in storage initializer by @andyi2it in #3584
- Update OWNERS file by @yuzisun in #3966
- Cluster local model controller by @greenmoon55 in #3860
- Prepare for 0.14.0-rc1 release and automate sync process by @sivanantha321 in #3970
New Contributors
- @HotsauceLee made their first contribution in #3898
- @hustxiayang made their first contribution in #3934
- @hdefazio made their first contribution in #3885
- @ruivieira made their first contribution in #3863
- @gfkeith made their first contribution in #3954
Full Changelog: v0.14.0-rc0...v0.14.0-rc1
v0.14.0-rc0
What's Changed
- Prevent the PassthroughCluster for clients/workloads in the service mesh by @israel-hdez in #3711
- Extract openai predict logic into smaller methods by @grandbora in #3716
- Bump MLServer to 1.5.0 by @sivanantha321 in #3740
- Refactor storage initializer to log model download time for all storage types by @sivanantha321 in #3735
- inferenceservice controller: fix error check in Serverless mode by @dtrifiro in #3753
- Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 in #3723
- Propagate trust_remote_code flag throughout vLLM startup by @calwoo in #3729
- Fix dead links on PyPI by @kevinbazira in #3754
- Fix model is ready even if there is no model by @HAO2167 in #3275
- Fix No model ready error in multi model serving by @sivanantha321 in #3758
- Initial implementation of Inference client by @sivanantha321 in #3401
- Fix logprobs for vLLM by @sivanantha321 in #3738
- Fix model name not properly parsed by inference graph by @sivanantha321 in #3746
- pillow - Buffer Overflow by @spolti in #3598
- Use add_generation_prompt while creating chat template by @Datta0 in #3775
- Deduplicate the names for the additional domain names by @houshengbo in #3773
- Make Virtual Service case-insensitive by @andyi2it in #3779
- Install packages needed for vllm model load by @gavrissh in #3802
- Make gRPC max message length configurable by @sivanantha321 in #3741
- Add readiness probe for MLServer and Increase memory for pmml in CI by @sivanantha321 in #3789
- Several bug fixes for vLLM completion endpoint by @sivanantha321 in #3788
- Increase timeout to make unit test stable by @Jooho in #3808
- Upgrade CI deps by @sivanantha321 in #3822
- Add tests for vLLM by @sivanantha321 in #3771
- Bump python to 3.11 for serving runtime images and Bump poetry to 1.8.3 by @sivanantha321 in #3812
- Bump vLLM to 0.5.3.post1 by @sivanantha321 in #3828
- Refactor the ModelServer to let uvicorn handle multiple workers and use 'spawn' for multiprocessing by @sivanantha321 in #3757
- Update golang for docs/Dockerfile to 1.21 by @spolti in #3761
- Make ray an optional dependency by @sivanantha321 in #3834
- Update aif example by @spolti in #3765
- Use helm for quick installation by @sivanantha321 in #3813
- Allow KServe to have its own local gateways for Serverless mode by @israel-hdez in #3737
- Add support for Azure DNS zone endpoints by @tjandy98 in #3819
- Fix failed build for knativeLocalGatewayService by @yuzisun in #3866
- Add logging request feature for vLLM backend by @sivanantha321 in #3849
- Bump vLLM to 0.5.4 by @sivanantha321 in #3874
- Fix: Add workaround for snyk image scan failure by @sivanantha321 in #3880
- Fix trust_remote_code not working with huggingface backend by @sivanantha321 in #3879
- Update KServe 2024-2025 Roadmap by @yuzisun in #3810
- Configurable image pull secrets in Helm charts by @saileshd1402 in #3838
- Fix issue with rolling update behavior by @andyi2it in #3786
- Fix the 'tokens exceeding model limit' error response in vllm server by @saileshd1402 in #3886
- Add support for binary data extension protocol and FP16 datatype by @sivanantha321 in #3685
- Protobuf version upgrade 4.25.4 by @andyi2it in #3881
- Adds optional labels and annotations to the controller by @guitouni in #3366
- Enable Server-Side Apply for Kustomize Overlays in Test Environment by @Jooho in #3877
- bugfix: update image_transformer.py to handle changes in input structure by @zwong91 in #3830
- support text embedding task in hugging face server by @kevinmingtarja in #3743
- Rename max_length parameter to max_model_len to be in sync with vLLM by @Datta0 in #3827
- [Upstream] - Update-istio version based on go version 1.21 by @mholder6 in #3825
- Enrich isvc NotReady events for failed conditions by @asdqwe123zxc in #3303
- adding metadata on requests by @gcemaj in #3635
New Contributors
- @calwoo made their first contribution in #3729
- @guitouni made their first contribution in #3366
- @zwong91 made their first contribution in #3830
- @mholder6 made their first contribution in #3825
- @asdqwe123zxc made their first contribution in #3303
- @gcemaj made their first contribution in #3635
Full Changelog: v0.13.0...v0.14.0-rc0
v0.13.1
What's Changed
- Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 (#3723)
- Propagate trust_remote_code flag throughout vLLM startup by @calwoo (#3729)
- Use add_generation_prompt while creating chat template by @Datta0 (#3775)
- Fix logprobs for vLLM by @sivanantha321 (#3738)
- Install packages needed for vllm model load by @gavrissh (#3802)
- Publish 0.13.1 Release by @johnugeorge in #3824
Full Changelog: v0.13.0...v0.13.1
v0.13.0
🌈 What's New?
- add support for async streaming in predict by @alexagriffith in #3475
- Fix: Support model parallelism in HF transformer by @gavrishp in #3459
- Support model revision and tokenizer revision in huggingface server by @lizzzcai in #3558
- OpenAI schema by @tessapham in #3477
- Support OpenAIModel in ModelRepository by @grandbora in #3590
- updated xgboost to support json and ubj models by @andyi2it in #3551
- Add OpenAI API support to Huggingfaceserver by @cmaddalozzo in #3582
- VLLM support for OpenAI Completions in HF server by @gavrishp in #3589
- Add a user friendly error message for http exceptions by @grandbora in #3581
- feat: Provide minimal distribution of CRDs by @terrytangyuan in #3492
- set default SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY in HF Server by @lizzzcai in #3594
- Enabled the multiple domains support on an inference service by @houshengbo in #3615
- Add base model for proxying request to an OpenAI API enabled model server by @cmaddalozzo in #3621
- Add headers to predictor exception logging by @grandbora in #3658
- Enhance controller setup based on available CRDs by @israel-hdez in #3472
- Add openai models endpoint by @cmaddalozzo in #3666
- feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 by @terrytangyuan in #3603
- Enable dtype support for huggingface server by @Datta0 in #3613
- Add method for checking model health/readiness by @cmaddalozzo in #3673
- Unify the log configuration using kserve logger by @sivanantha321 in #3577
- Add the field ResponseStartTimeoutSeconds to create ksvc by @houshengbo in #3705
- Add FP16 datatype support for OIP grpc by @sivanantha321 in #3695
- Add option for returning probabilities in huggingface server by @andyi2it in #3607
⚠️ What's Changed
- Remove conversion webhook from manifests by @Jooho in #3476
- Remove cluster level list/watch for configmaps, serviceaccounts, secrets by @sivanantha321 in #3469
- chore: Remove Seldon Alibi dependencies. Fixes #3380 by @terrytangyuan in #3443
- docs: Move Alibi explainer to docs by @terrytangyuan in #3579
- Remove generate endpoints by @cmaddalozzo in #3654
- Remove conversion webhook from kubeflow manifest patch by @sivanantha321 in #3700
🐛 What's Fixed
- Fix:Support Parallelism in vllm runtime by @gavrishp in #3464
- fix: Instantiate HuggingfaceModelRepository only when model cannot be loaded. Fixes #3423 by @terrytangyuan in #3424
- Fix isADirectoryError in Azure blob download by @tjandy98 in #3502
- Fix bug: Remove redundant helm chart affinity on predictor CRD by @trojaond in #3481
- Make the modelcar injection idempotent by @rhuss in #3517
- Only pad left for decode-only architecture models. by @sivanantha321 in #3534
- fix lint typo on Makefile by @spolti in #3569
- fix: Set writable cache folder to avoid permission issue. Fixes #3562 by @terrytangyuan in #3576
- Fix model unload in server stop method by @sivanantha321 in #3587
- Fix golint errors by @andyi2it in #3552
- Fix make deploy-dev-storage-initializer not working by @sivanantha321 in #3617
- Fix Pydantic 2 warnings by @cmaddalozzo in #3622
- build: Fix CRD copying in generate-install.sh by @terrytangyuan in #3620
- Only load from model repository if model binary is not found under model_dir by @sivanantha321 in #3559
- build: Remove misleading logs from minimal-crdgen.sh by @terrytangyuan in #3641
- Assign device to input tensors in huggingface server with huggingface backend by @saileshd1402 in #3657
- Fix Huggingface server stopping criteria by @cmaddalozzo in #3659
- Explicitly specify pad token id when generating tokens by @sivanantha321 in #3565
- Fix quick install not cleaning up Istio installer by @sivanantha321 in #3660
- fix for extract zip from gcs by @andyi2it in #3510
- fix: HPA equality check should include annotations by @terrytangyuan in #3650
- Fix: model id and model dir check order by @yuzisun in #3680
- Fix:vLLM Model Supported check throwing circular dependency by @gavrishp in #3688
- Fix: Allow null in Finish reason streaming response in vLLM by @gavrishp in #3684
- Fix kserve version is not updated properly by python-release.sh by @sivanantha321 in #3707
- Add precaution again running v1 endpoints on openai models by @grandbora in #3694
- Typos and minor fixes by @alpe in #3429
- Fix model_id and model_dir precedence for vLLM by @yuzisun in #3718
- Fixup max_length for HF and model info for vLLM by @Datta0 in #3715
- Fix prompt token count and provide completion usage in OpenAI response by @sivanantha321 in #3712
⬆️ Version Upgrade
- Upgrade orjson to version 3.9.15 by @spolti in #3488
- feat: upgrade to new fastapi, update models to handle both pydantic v… by @timothyjlaurent in #3374
- Update cert manager version in quick install script by @shauryagoel in #3496
- ci: Bump minikube version to work with newer K8s version by @terrytangyuan in #3498
- upgrade knative to 1.13 by @andyi2it in #3457
- Upgrade istio to 1.20 works for the Github Actions by @houshengbo in #3529
- chore: Bump ModelMesh version to v0.12.0-rc0 in Helm chart by @terrytangyuan in #3642
- upgrade vllm/transformers version by @johnugeorge in #3671
🔨 Project SDLC
- Enhance CI environment by @sivanantha321 in #3440
- Fixed go lint error using golangci-lint tool. by @andyi2it in #3378
- chore: Update list of reviewers by @ckadner in #3484
- build: Add helm docs update to make generate command by @terrytangyuan in #3437
- Added v2 infer test for supported model frameworks. by @andyi2it in #3349
- fix the quote format same with others and docstrings by @leyao-daily in #3490
- remove unnecessary Istio settings from quick_install.sh by @peterj in #3493
- Remove GOARCH by @mkumatag in #3523
- GH Alert: Potential file inclusion via variable by @spolti in #3520
- Update codeQL to v3 by @spolti in #3548
- switch e2e test inference graph to raw mode by @andyi2it in #3511
- Black lint by @cmaddalozzo in #3568
- Fix python linter by @sivanantha321 in #3571
- build: Add flake8 and black to pre-commit hooks by @terrytangyuan in #3578
- build: Allow pre-commit to keep changes in reformatted code by @terrytangyuan in #3604
- Allow rerunning failed workflows by comment by @andyi2it in #3550
- add re-run info in the PR templates by @spolti in #3633
- Add e2e tests for huggingface by @sivanantha321 in #3600
- Test image builds for ARM64 arch in CI by @sivanantha321 in #3629
- workflow file for cherry-pick on comment by @andyi2it in #3653
- Fix: huggingface runtime in helm chart by @yuzisun in #3679
- Copy generated CRDs by kustomize to Helm by @Jooho in #3392
...
v0.13.0-rc1
What's Changed
- upgrade vllm/transformers version by @johnugeorge in #3671
- Add openai models endpoint by @cmaddalozzo in #3666
- feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 by @terrytangyuan in #3603
- Enable dtype support for huggingface server by @Datta0 in #3613
- Add method for checking model health/readiness by @cmaddalozzo in #3673
- fix for extract zip from gcs by @andyi2it in #3510
- Update Dockerfile and Readme by @gavrishp in #3676
- Update huggingface readme by @alexagriffith in #3678
- fix: HPA equality check should include annotations by @terrytangyuan in #3650
- Fix: huggingface runtime in helm chart by @yuzisun in #3679
- Fix: model id and model dir check order by @yuzisun in #3680
- Fix:vLLM Model Supported check throwing circular dependency by @gavrishp in #3688
- Fix: Allow null in Finish reason streaming response in vLLM by @gavrishp in #3684
- Unify the log configuration using kserve logger by @sivanantha321 in #3577
- Remove conversion webhook from kubeflow manifest patch by @sivanantha321 in #3700
- Add the field ResponseStartTimeoutSeconds to create ksvc by @houshengbo in #3705
New Contributors
Full Changelog: v0.13.0-rc0...v0.13.0-rc1
v0.13.0-rc0
🌈 What's New?
- add support for async streaming in predict by @alexagriffith in #3475
- Fix: Support model parallelism in HF transformer by @gavrishp in #3459
- Support model revision and tokenizer revision in huggingface server by @lizzzcai in #3558
- OpenAI schema by @tessapham in #3477
- Support OpenAIModel in ModelRepository by @grandbora in #3590
- updated xgboost to support json and ubj models by @andyi2it in #3551
- Add OpenAI API support to Huggingfaceserver by @cmaddalozzo in #3582
- VLLM support for OpenAI Completions in HF server by @gavrishp in #3589
- Add a user friendly error message for http exceptions by @grandbora in #3581
- feat: Provide minimal distribution of CRDs by @terrytangyuan in #3492
- set default SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY in HF Server by @lizzzcai in #3594
- Enabled the multiple domains support on an inference service by @houshengbo in #3615
- Add base model for proxying request to an OpenAI API enabled model server by @cmaddalozzo in #3621
- Add headers to predictor exception logging by @grandbora in #3658
- Enhance controller setup based on available CRDs by @israel-hdez in #3472
⚠️ What's Changed
- Remove conversion webhook from manifests by @Jooho in #3476
- Remove cluster level list/watch for configmaps, serviceaccounts, secrets by @sivanantha321 in #3469
- chore: Remove Seldon Alibi dependencies. Fixes #3380 by @terrytangyuan in #3443
- docs: Move Alibi explainer to docs by @terrytangyuan in #3579
- Remove generate endpoints by @cmaddalozzo in #3654
🐛 What's Fixed
- Fix:Support Parallelism in vllm runtime by @gavrishp in #3464
- fix: Instantiate HuggingfaceModelRepository only when model cannot be loaded. Fixes #3423 by @terrytangyuan in #3424
- Fix isADirectoryError in Azure blob download by @tjandy98 in #3502
- Fix bug: Remove redundant helm chart affinity on predictor CRD by @trojaond in #3481
- Make the modelcar injection idempotent by @rhuss in #3517
- Only pad left for decode-only architecture models. by @sivanantha321 in #3534
- fix lint typo on Makefile by @spolti in #3569
- fix: Set writable cache folder to avoid permission issue. Fixes #3562 by @terrytangyuan in #3576
- Fix model unload in server stop method by @sivanantha321 in #3587
- Fix golint errors by @andyi2it in #3552
- Fix make deploy-dev-storage-initializer not working by @sivanantha321 in #3617
- Fix Pydantic 2 warnings by @cmaddalozzo in #3622
- build: Fix CRD copying in generate-install.sh by @terrytangyuan in #3620
- Only load from model repository if model binary is not found under model_dir by @sivanantha321 in #3559
- build: Remove misleading logs from minimal-crdgen.sh by @terrytangyuan in #3641
- Assign device to input tensors in huggingface server with huggingface backend by @saileshd1402 in #3657
- Fix Huggingface server stopping criteria by @cmaddalozzo in #3659
- Explicitly specify pad token id when generating tokens by @sivanantha321 in #3565
- Fix quick install not cleaning up Istio installer by @sivanantha321 in #3660
⬆️ Version Upgrade
- Upgrade orjson to version 3.9.15 by @spolti in #3488
- feat: upgrade to new fastapi, update models to handle both pydantic v… by @timothyjlaurent in #3374
- Update cert manager version in quick install script by @shauryagoel in #3496
- ci: Bump minikube version to work with newer K8s version by @terrytangyuan in #3498
- upgrade knative to 1.13 by @andyi2it in #3457
- Upgrade istio to 1.20 works for the Github Actions by @houshengbo in #3529
- chore: Bump ModelMesh version to v0.12.0-rc0 in Helm chart by @terrytangyuan in #3642
🔨 Project SDLC
- Enhance CI environment by @sivanantha321 in #3440
- Fixed go lint error using golangci-lint tool. by @andyi2it in #3378
- chore: Update list of reviewers by @ckadner in #3484
- build: Add helm docs update to make generate command by @terrytangyuan in #3437
- Added v2 infer test for supported model frameworks. by @andyi2it in #3349
- fix the quote format same with others and docstrings by @leyao-daily in #3490
- remove unnecessary Istio settings from quick_install.sh by @peterj in #3493
- Remove GOARCH by @mkumatag in #3523
- GH Alert: Potential file inclusion via variable by @spolti in #3520
- Update codeQL to v3 by @spolti in #3548
- switch e2e test inference graph to raw mode by @andyi2it in #3511
- Black lint by @cmaddalozzo in #3568
- Fix python linter by @sivanantha321 in #3571
- build: Add flake8 and black to pre-commit hooks by @terrytangyuan in #3578
- build: Allow pre-commit to keep changes in reformatted code by @terrytangyuan in #3604
- Allow rerunning failed workflows by comment by @andyi2it in #3550
- add re-run info in the PR templates by @spolti in #3633
- Add e2e tests for huggingface by @sivanantha321 in #3600
- Test image builds for ARM64 arch in CI by @sivanantha321 in #3629
- workflow file for cherry-pick on comment by @andyi2it in #3653
CVE patches
- CVE-2024-24762 - update fastapi to 0.109.1 by @spolti in #3556
- golang.org/x/net Allocation of Resources Without Limits or Throttling by @spolti in #3596
- Fix CVE-2023-45288 for qpext by @sivanantha321 in #3618
- Security fix - CVE 2024 24786 by @andyi2it in #3585
📝 Documentation Update
- qpext: fix a typo in qpext doc by @daixiang0 in #3491
- Update KServe project description by @yuzisun in #3524
- Update kserve cake diagram by @yuzisun in #3530
- Remove white background for the kserve diagram by @yuzisun in #3531
- fix a typo in OPENSHIFT_GUIDE.md by @marek-veber in #3544
- Fix typo in README.md by @terrytangyuan in #3575
New Contributors
- @leyao-daily made their first contribution in #3490
- @peterj made their first contribution in #3493
- @timothyjlaurent made their first contribution in #3374
- @shauryagoel made their first contribution in #3496
- @mkumatag made their first contribution in #3523
- @marek-veber made their first contribution in #3544
- @trojaond made their first contribution in #3481
- @grandbora made their first contribution in #3590
- @saileshd1402 made their first contribution in #3657
Full Changelog: v0.12.1...v0.13.0-rc0
v0.12.1
What's Changed
- [release-0.12] Update fastapi to 0.109.1 and Support ray 2.10 by @sivanantha321 in #3609
- [release-0.12] Pydantic 2 support by @cmaddalozzo in #3614
- [release-0.12] Make the modelcar injection idempotent by @sivanantha321 in #3612
- Prepare for release 0.12.1 by @sivanantha321 in #3610
- release-0.12 pin back ray to 2.10 by @yuzisun in #3616
- [release-0.12] Fix docker build failure for ARM64 by @sivanantha321 in #3627
Full Changelog: v0.12.0...v0.12.1