Releases: kserve/kserve
v0.15.0-rc0
What's Changed
- bump to vllm0.6.2 and add explicit chat template by @hustxiayang in #3964
- bump to vllm0.6.3 by @hustxiayang in #4001
- Feature: Add hf transfer by @tjandy98 in #4000
- Fix snyk scan null error by @sivanantha321 in #3974
- Update quick install script by @johnugeorge in #4005
- Local Model Node CR by @HotsauceLee in #3978
- Reduce E2Es dependency on CI environment (2) by @israel-hdez in #4008
- Allow GCS to download single file by @spolti in #4015
- bump to vllm0.6.3.post1 by @hustxiayang in #4023
- Set default for SamplingParams.max_tokens in OpenAI requests if unset by @kevinmingtarja in #4020
- Add tools functionality to vLLM by @ArjunBhalla98 in #4033
- For vllm users, our parser should be able to support both - and _ by @hustxiayang in #3933
- Add tools unpacking for vLLM by @ArjunBhalla98 in #4035
- Multi-Node Inference Implementation by @Jooho in #3972
- Enhance InjectAgent to Handle Only HTTPGet, TCP Readiness Probes by @LOADBC in #4012
- Feat: Fix memory issue by replacing io.ReadAll with io.Copy (#4017) by @ops-jaeha in #4018
- Update alibiexplainer example by @spolti in #4004
- Fix huggingface build runs out of storage in CI by @sivanantha321 in #4044
- Update snyk scan to include new images by @sivanantha321 in #4042
- Introducing KServe Guru on Gurubase.io by @kursataktas in #4038
- Fix Hugging Face server EncoderModel not returning probabilities by correctly passing --return_probabilities flag (#3958) by @oplushappy in #4024
- Add deeper readiness check for transformer by @sivanantha321 in #3348
- Fix Starlette Denial of service (DoS) via multipart/form-data by @spolti in #4006
- remove duplicated import "github.com/onsi/gomega" by @carlory in #4051
- Fix localmodel controller name in snyk scan workflow by @sivanantha321 in #4054
- Fix azure blob storage access key env not mounted by @bentohset in #4064
- Storage Initializer support single digit azure DNS zone ID by @bentohset in #4070
- Fix trust remote code encoder model by @sivanantha321 in #4043
- introduce the prepare-for-release.sh script by @spolti in #3993
- Model cache controller and node agent by @yuzisun in #4089
- Storage containers typo fix for Huggingface Storage type by @andyi2it in #4098
- Support datetime object serialization in v1/v2 response by @sivanantha321 in #4099
- Replace klog with klog/v2 by @sivanantha321 in #4093
- Add exception handling and logging for grpc server by @sivanantha321 in #4066
- Update ClusterLocalModel to LocalModelCache by @yuzisun in #4105
- Fix LocalModelCache controller reconciles deleted resource by @sivanantha321 in #4106
- Fix InferenceService state when Predictor pod in CrashLoopBackOff by @hdefazio in #4003
- LocalModelCache Admission Webhook by @HotsauceLee in #4102
- Add namespace to localmodel and localmodelnode ServiceAccount helm chart by @ritzdevp in #4111
- KServe VLLM cpu image by @AyushSawant18588 in #4049
- Update max_model_len calculation and fixup encoder pooling by @Datta0 in #4055
- chore: use patch instead of update for finalizer changes by @whynowy in #4072
- Fix isvc role localmodelcache permission by @sivanantha321 in #4131
- Detect missing models and redownload models by @greenmoon55 in #4095
- introduce service configuration at configmap level by @spolti in #3672
- Allow multiple node groups in the model cache CR by @greenmoon55 in #4134
- Annotation to disable model cache by @greenmoon55 in #4118
- Clean up jobs in model cache agent by @greenmoon55 in #4140
- Ensure Model root folder exists by @greenmoon55 in #4142
- Add NodeGroup Name Into PVC Name by @HotsauceLee in #4141
- Make LocalModel Agent reconciliation frequency configurable by @greenmoon55 in #4143
- Remove deepcopy-gen in favour of controller-gen by @sivanantha321 in #4109
- Add ability to set annotations on controller/webhook service and expose metrics bind port and address in helm chart by @mhowell24 in #4127
- Fix EOF error for downloading zip files by @Jonas-Bruns in #4082
- Remove redundant namespace yaml by @greenmoon55 in #4148
- Fix Localmodel agent build by @greenmoon55 in #4150
- Fix model server fails to gracefully shutdown by @sivanantha321 in #4116
- Ensure root model directory exists and add protection for jobs created by @yuzisun in #4152
- Enable transformer deeper readiness check tests by @sivanantha321 in #4121
- Update HuggingFace server dependencies versions by @AyushSawant18588 in #4147
- Add workflow for verifying go mod by @sivanantha321 in #4137
- Fix for CVE-2024-52304 - aiohttp upgrade by @andyi2it in #4113
- Allow other engine builders other than docker by @spolti in #3906
- Add localmodelnode crd to helm chart by @greenmoon55 in #4161
- Fixes Non-linear parsing of case-insensitive content by @spolti in #4158
- Helm chart - option to run daemonset as root by @greenmoon55 in #4164
- Replace nodeGroup with nodeGroups in charts/kserve-crd by @ritzdevp in #4166
- Add affinity and tolerations to localmodel daemonset by @greenmoon55 in #4173
- Fix s3 download PermanentRedirectError for legacy s3 endpoint by @bentohset in #4157
- Make label and annotation propagation configurable by @spolti in #4030
- Add ModelCache e2e test by @sivanantha321 in #4136
- Update vllm to 0.6.6 by @rajatvig in #4176
- [bugfix] fix s3 storage download filename bug by @anencore94 in #4162
- Add hf to storageuri prefix list by @tjandy98 in #4184
- Add Support for OpenAI-compatible Embeddings API by @FabianScheidt in #4129
- fix: typo in _construct_http_status_error method by @Mgla96 in #4190
- Fix raw logger e2e test by @sivanantha321 in #4185
- Feat: Support configuring isvc resource defaults by @andyi2it in #4032
- keep replicas when autoscaler set external by @Jooho in #4196
- Increase kserve controller readiness probe time period by @sivanantha321 in #4200
- Fix golangci-lint binary path selection based on GOBIN by @Jooho in #4198
- Add option to disable volume management in localModel config by @ritzdevp in #4186
- set MaxUnavailable(0%)/MaxSurge(100%) for rollingUpdate in multinode case by @Jooho in #4188
- Gracefully shutdown the router server by @sivanantha321 in #3367
- Add workflow for manual huggingface vLLM image publish by @sivanantha321 in #4092
- Feat: Gateway API Support - Raw Deployment by @sivanantha321 in #3952
- add make goal to build huggingface cpu image by @spolti in #4202
- Cleanup the filepath in createNewFile to avoid path traversal issue by @hdefazio in #4205
- Enhance multinode health_check python and manifests by @Jooho in #4197
- Publish 0.15-rc0 release by @yuzisun in #4213
New Contributors
- @ArjunBH...
v0.14.1
What's Changed
- Support datetime object serialization for v1/v2 response by @sivanantha321 in #4123
- Introduce LocalModelNode CR by @HotsauceLee in #3978
- Update Model Cache controller for LocalModelNode and implement LocalModel node agent by @HotsauceLee and @greenmoon55 in #4089
- Rename ClusterLocalModel to LocalModelCache by @yuzisun in #4105
- Detect missing models and redownload models by @greenmoon55 in #4095
- Allow multiple node groups in the model cache CR by @greenmoon55 in #4134
- Annotation to disable model cache by @greenmoon55 in #4118
- Clean up jobs in local model agent by @greenmoon55 in #4140
- Add node group to PVC name by @HotsauceLee in #4141
- Make local node agent reconciliation frequency configurable by @greenmoon55 in #4143
- Add LocalModelCache admission webhook by @HotsauceLee in #4102
- Fix model server fails to gracefully shutdown by @sivanantha321 in #4116
- Ensure root model directory exists and add protection for jobs created by @yuzisun in #4152
Full Changelog: v0.14.0...v0.14.1
v0.14.0
What's Changed
- Prevent the PassthroughCluster for clients/workloads in the service mesh by @israel-hdez in #3711
- Extract openai predict logic into smaller methods by @grandbora in #3716
- Bump MLServer to 1.5.0 by @sivanantha321 in #3740
- Refactor storage initializer to log model download time for all storage types by @sivanantha321 in #3735
- inferenceservice controller: fix error check in Serverless mode by @dtrifiro in #3753
- Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 in #3723
- Propagate trust_remote_code flag throughout vLLM startup by @calwoo in #3729
- Fix dead links on PyPI by @kevinbazira in #3754
- Fix model is ready even if there is no model by @HAO2167 in #3275
- Fix No model ready error in multi model serving by @sivanantha321 in #3758
- Initial implementation of Inference client by @sivanantha321 in #3401
- Fix logprobs for vLLM by @sivanantha321 in #3738
- Fix model name not properly parsed by inference graph by @sivanantha321 in #3746
- pillow - Buffer Overflow by @spolti in #3598
- Use add_generation_prompt while creating chat template by @Datta0 in #3775
- Deduplicate the names for the additional domain names by @houshengbo in #3773
- Make Virtual Service case-insensitive by @andyi2it in #3779
- Install packages needed for vllm model load by @gavrissh in #3802
- Make gRPC max message length configurable by @sivanantha321 in #3741
- Add readiness probe for MLServer and Increase memory for pmml in CI by @sivanantha321 in #3789
- Several bug fixes for vLLM completion endpoint by @sivanantha321 in #3788
- Increase timeout to make unit test stable by @Jooho in #3808
- Upgrade CI deps by @sivanantha321 in #3822
- Add tests for vLLM by @sivanantha321 in #3771
- Bump python to 3.11 for serving runtime images and Bump poetry to 1.8.3 by @sivanantha321 in #3812
- Bump vLLM to 0.5.3.post1 by @sivanantha321 in #3828
- Refactor the ModelServer to let uvicorn handle multiple workers and use 'spawn' for multiprocessing by @sivanantha321 in #3757
- Update golang for docs/Dockerfile to 1.21 by @spolti in #3761
- Make ray an optional dependency by @sivanantha321 in #3834
- Update aif example by @spolti in #3765
- Use helm for quick installation by @sivanantha321 in #3813
- Allow KServe to have its own local gateways for Serverless mode by @israel-hdez in #3737
- Add support for Azure DNS zone endpoints by @tjandy98 in #3819
- Fix failed build for knativeLocalGatewayService by @yuzisun in #3866
- Add logging request feature for vLLM backend by @sivanantha321 in #3849
- Bump vLLM to 0.5.4 by @sivanantha321 in #3874
- Fix: Add workaround for snyk image scan failure by @sivanantha321 in #3880
- Fix trust_remote_code not working with huggingface backend by @sivanantha321 in #3879
- Update KServe 2024-2025 Roadmap by @yuzisun in #3810
- Configurable image pull secrets in Helm charts by @saileshd1402 in #3838
- Fix issue with rolling update behavior by @andyi2it in #3786
- Fix the 'tokens exceeding model limit' error response in vllm server by @saileshd1402 in #3886
- Add support for binary data extension protocol and FP16 datatype by @sivanantha321 in #3685
- Protobuf version upgrade 4.25.4 by @andyi2it in #3881
- Adds optional labels and annotations to the controller by @guitouni in #3366
- Enable Server-Side Apply for Kustomize Overlays in Test Environment by @Jooho in #3877
- bugfix: update image_transformer.py to handle changes in input structure by @zwong91 in #3830
- support text embedding task in hugging face server by @kevinmingtarja in #3743
- Rename max_length parameter to max_model_len to be in sync with vLLM by @Datta0 in #3827
- [Upstream] - Update-istio version based on go version 1.21 by @mholder6 in #3825
- Enrich isvc NotReady events for failed conditions by @asdqwe123zxc in #3303
- adding metadata on requests by @gcemaj in #3635
- Publish 0.14.0-rc0 release by @yuzisun in #3867
- Use API token for publishing package to PyPI by @sivanantha321 in #3896
- Fix sdlc broken when kserve installed using helm by @sivanantha321 in #3890
- Add Security Context and Resources to RBAC Proxy by @HotsauceLee in #3898
- Remove unwanted cluster scope secret permissions by @sivanantha321 in #3893
- bump to vllm 0.5.5 by @lizzzcai in #3911
- pin gosec to 2.20.0 by @greenmoon55 in #3921
- add a new doc 'common issues and solutions' by @Jooho in #3878
- Implement health endpoint for vLLM backend by @sivanantha321 in #3850
- Add security best practices for inferenceservice, inferencegraph, servingruntimes by @sivanantha321 in #3917
- Bump Go to 1.22 by @sivanantha321 in #3912
- bump to vllm 0.6.0 by @hustxiayang in #3934
- Set the volume mount's readonly annotation based on the ISVC annotation by @hdefazio in #3885
- mount /dev/shm volume to huggingfaceserver by @lizzzcai in #3910
- Fix permission error in snyk scan by @sivanantha321 in #3889
- Cluster Local Model CR by @greenmoon55 in #3839
- added http headers to inbound request by @andyi2it in #3895
- Add prow-github-action by @sivanantha321 in #3888
- Add TLS support for Inference Loggers by @ruivieira in #3863
- Fix explainer endpoint not working with path based routing by @sivanantha321 in #3257
- Fix ingress configuration for path based routing and update go mod by @sivanantha321 in #3944
- Add HostIPC field to ServingRuntimePodSpec by @greenmoon55 in #3943
- remove conversion webhook part from self-signed-ca.sh by @Jooho in #3941
- update fluid kserve sample to use huggingface servingruntime by @lizzzcai in #3907
- bump to vLLM0.6.1post2 by @hustxiayang in #3948
- Add NodeDownloadPending status to ClusterLocalModel by @greenmoon55 in #3955
- add tags to rest server timing logs to differentiate cpu and wall time by @gfkeith in #3954
- Implement Huggingface model download in storage initializer by @andyi2it in #3584
- Update OWNERS file by @yuzisun in #3966
- Cluster local model controller by @greenmoon55 in #3860
- Prepare for 0.14.0-rc1 release and automate sync process by @sivanantha321 in #3970
- add a new API for multi-node/multi-gpu by @Jooho in #3871
- Fix update-openapigen.sh that can be executed from kserve dir by @Jooho in #3924
- Add python 3.12 support and remove python 3.8 support by @sivanantha321 in #3645
- Fix openssl vulnerability CWE-1395 by @sivanantha321 in #3975
- Fix Kubernetes Doc Links by @jyono in #3670
- Fix kserve local testing env by @yuzisun in #3981
- Fix streaming response not working properly with logger by @sivanantha321 in #3847
- Add a flag for automount serviceaccount token by @greenmoon55 in https://github.com/kserve/ks...
v0.14.0-rc1
What's Changed
- Publish 0.14.0-rc0 release by @yuzisun in #3867
- Use API token for publishing package to PyPI by @sivanantha321 in #3896
- Fix sdlc broken when kserve installed using helm by @sivanantha321 in #3890
- Add Security Context and Resources to RBAC Proxy by @HotsauceLee in #3898
- Remove unwanted cluster scope secret permissions by @sivanantha321 in #3893
- bump to vllm 0.5.5 by @lizzzcai in #3911
- pin gosec to 2.20.0 by @greenmoon55 in #3921
- add a new doc 'common issues and solutions' by @Jooho in #3878
- Implement health endpoint for vLLM backend by @sivanantha321 in #3850
- Add security best practices for inferenceservice, inferencegraph, servingruntimes by @sivanantha321 in #3917
- Bump Go to 1.22 by @sivanantha321 in #3912
- bump to vllm 0.6.0 by @hustxiayang in #3934
- Set the volume mount's readonly annotation based on the ISVC annotation by @hdefazio in #3885
- mount /dev/shm volume to huggingfaceserver by @lizzzcai in #3910
- Fix permission error in snyk scan by @sivanantha321 in #3889
- Cluster Local Model CR by @greenmoon55 in #3839
- added http headers to inbound request by @andyi2it in #3895
- Add prow-github-action by @sivanantha321 in #3888
- Add TLS support for Inference Loggers by @ruivieira in #3863
- Fix explainer endpoint not working with path based routing by @sivanantha321 in #3257
- Fix ingress configuration for path based routing and update go mod by @sivanantha321 in #3944
- Add HostIPC field to ServingRuntimePodSpec by @greenmoon55 in #3943
- remove conversion webhook part from self-signed-ca.sh by @Jooho in #3941
- update fluid kserve sample to use huggingface servingruntime by @lizzzcai in #3907
- bump to vLLM0.6.1post2 by @hustxiayang in #3948
- Add NodeDownloadPending status to ClusterLocalModel by @greenmoon55 in #3955
- add tags to rest server timing logs to differentiate cpu and wall time by @gfkeith in #3954
- Implement Huggingface model download in storage initializer by @andyi2it in #3584
- Update OWNERS file by @yuzisun in #3966
- Cluster local model controller by @greenmoon55 in #3860
- Prepare for 0.14.0-rc1 release and automate sync process by @sivanantha321 in #3970
New Contributors
- @HotsauceLee made their first contribution in #3898
- @hustxiayang made their first contribution in #3934
- @hdefazio made their first contribution in #3885
- @ruivieira made their first contribution in #3863
- @gfkeith made their first contribution in #3954
Full Changelog: v0.14.0-rc0...v0.14.0-rc1
v0.14.0-rc0
What's Changed
- Prevent the PassthroughCluster for clients/workloads in the service mesh by @israel-hdez in #3711
- Extract openai predict logic into smaller methods by @grandbora in #3716
- Bump MLServer to 1.5.0 by @sivanantha321 in #3740
- Refactor storage initializer to log model download time for all storage types by @sivanantha321 in #3735
- inferenceservice controller: fix error check in Serverless mode by @dtrifiro in #3753
- Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 in #3723
- Propagate trust_remote_code flag throughout vLLM startup by @calwoo in #3729
- Fix dead links on PyPI by @kevinbazira in #3754
- Fix model is ready even if there is no model by @HAO2167 in #3275
- Fix No model ready error in multi model serving by @sivanantha321 in #3758
- Initial implementation of Inference client by @sivanantha321 in #3401
- Fix logprobs for vLLM by @sivanantha321 in #3738
- Fix model name not properly parsed by inference graph by @sivanantha321 in #3746
- pillow - Buffer Overflow by @spolti in #3598
- Use add_generation_prompt while creating chat template by @Datta0 in #3775
- Deduplicate the names for the additional domain names by @houshengbo in #3773
- Make Virtual Service case-insensitive by @andyi2it in #3779
- Install packages needed for vllm model load by @gavrissh in #3802
- Make gRPC max message length configurable by @sivanantha321 in #3741
- Add readiness probe for MLServer and Increase memory for pmml in CI by @sivanantha321 in #3789
- Several bug fixes for vLLM completion endpoint by @sivanantha321 in #3788
- Increase timeout to make unit test stable by @Jooho in #3808
- Upgrade CI deps by @sivanantha321 in #3822
- Add tests for vLLM by @sivanantha321 in #3771
- Bump python to 3.11 for serving runtime images and Bump poetry to 1.8.3 by @sivanantha321 in #3812
- Bump vLLM to 0.5.3.post1 by @sivanantha321 in #3828
- Refactor the ModelServer to let uvicorn handle multiple workers and use 'spawn' for multiprocessing by @sivanantha321 in #3757
- Update golang for docs/Dockerfile to 1.21 by @spolti in #3761
- Make ray an optional dependency by @sivanantha321 in #3834
- Update aif example by @spolti in #3765
- Use helm for quick installation by @sivanantha321 in #3813
- Allow KServe to have its own local gateways for Serverless mode by @israel-hdez in #3737
- Add support for Azure DNS zone endpoints by @tjandy98 in #3819
- Fix failed build for knativeLocalGatewayService by @yuzisun in #3866
- Add logging request feature for vLLM backend by @sivanantha321 in #3849
- Bump vLLM to 0.5.4 by @sivanantha321 in #3874
- Fix: Add workaround for snyk image scan failure by @sivanantha321 in #3880
- Fix trust_remote_code not working with huggingface backend by @sivanantha321 in #3879
- Update KServe 2024-2025 Roadmap by @yuzisun in #3810
- Configurable image pull secrets in Helm charts by @saileshd1402 in #3838
- Fix issue with rolling update behavior by @andyi2it in #3786
- Fix the 'tokens exceeding model limit' error response in vllm server by @saileshd1402 in #3886
- Add support for binary data extension protocol and FP16 datatype by @sivanantha321 in #3685
- Protobuf version upgrade 4.25.4 by @andyi2it in #3881
- Adds optional labels and annotations to the controller by @guitouni in #3366
- Enable Server-Side Apply for Kustomize Overlays in Test Environment by @Jooho in #3877
- bugfix: update image_transformer.py to handle changes in input structure by @zwong91 in #3830
- support text embedding task in hugging face server by @kevinmingtarja in #3743
- Rename max_length parameter to max_model_len to be in sync with vLLM by @Datta0 in #3827
- [Upstream] - Update-istio version based on go version 1.21 by @mholder6 in #3825
- Enrich isvc NotReady events for failed conditions by @asdqwe123zxc in #3303
- adding metadata on requests by @gcemaj in #3635
New Contributors
- @calwoo made their first contribution in #3729
- @guitouni made their first contribution in #3366
- @zwong91 made their first contribution in #3830
- @mholder6 made their first contribution in #3825
- @asdqwe123zxc made their first contribution in #3303
- @gcemaj made their first contribution in #3635
Full Changelog: v0.13.0...v0.14.0-rc0
v0.13.1
What's Changed
- Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 (#3723)
- Propagate trust_remote_code flag throughout vLLM startup by @calwoo (#3729)
- Use add_generation_prompt while creating chat template by @Datta0 (#3775)
- Fix logprobs for vLLM by @sivanantha321 (#3738)
- Install packages needed for vllm model load by @gavrissh (#3802)
- Publish 0.13.1 Release by @johnugeorge in #3824
Full Changelog: v0.13.0...v0.13.1
v0.13.0
🌈 What's New?
- add support for async streaming in predict by @alexagriffith in #3475
- Fix: Support model parallelism in HF transformer by @gavrishp in #3459
- Support model revision and tokenizer revision in huggingface server by @lizzzcai in #3558
- OpenAI schema by @tessapham in #3477
- Support OpenAIModel in ModelRepository by @grandbora in #3590
- updated xgboost to support json and ubj models by @andyi2it in #3551
- Add OpenAI API support to Huggingfaceserver by @cmaddalozzo in #3582
- VLLM support for OpenAI Completions in HF server by @gavrishp in #3589
- Add a user friendly error message for http exceptions by @grandbora in #3581
- feat: Provide minimal distribution of CRDs by @terrytangyuan in #3492
- set default SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY in HF Server by @lizzzcai in #3594
- Enabled the multiple domains support on an inference service by @houshengbo in #3615
- Add base model for proxying request to an OpenAI API enabled model server by @cmaddalozzo in #3621
- Add headers to predictor exception logging by @grandbora in #3658
- Enhance controller setup based on available CRDs by @israel-hdez in #3472
- Add openai models endpoint by @cmaddalozzo in #3666
- feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 by @terrytangyuan in #3603
- Enable dtype support for huggingface server by @Datta0 in #3613
- Add method for checking model health/readiness by @cmaddalozzo in #3673
- Unify the log configuration using kserve logger by @sivanantha321 in #3577
- Add the field ResponseStartTimeoutSeconds to create ksvc by @houshengbo in #3705
- Add FP16 datatype support for OIP grpc by @sivanantha321 in #3695
- Add option for returning probabilities in huggingface server by @andyi2it in #3607
⚠️ What's Changed
- Remove conversion webhook from manifests by @Jooho in #3476
- Remove cluster level list/watch for configmaps, serviceaccounts, secrets by @sivanantha321 in #3469
- chore: Remove Seldon Alibi dependencies. Fixes #3380 by @terrytangyuan in #3443
- docs: Move Alibi explainer to docs by @terrytangyuan in #3579
- Remove generate endpoints by @cmaddalozzo in #3654
- Remove conversion webhook from kubeflow manifest patch by @sivanantha321 in #3700
🐛 What's Fixed
- Fix:Support Parallelism in vllm runtime by @gavrishp in #3464
- fix: Instantiate HuggingfaceModelRepository only when model cannot be loaded. Fixes #3423 by @terrytangyuan in #3424
- Fix isADirectoryError in Azure blob download by @tjandy98 in #3502
- Fix bug: Remove redundant helm chart affinity on predictor CRD by @trojaond in #3481
- Make the modelcar injection idempotent by @rhuss in #3517
- Only pad left for decode-only architecture models. by @sivanantha321 in #3534
- fix lint typo on Makefile by @spolti in #3569
- fix: Set writable cache folder to avoid permission issue. Fixes #3562 by @terrytangyuan in #3576
- Fix model unload in server stop method by @sivanantha321 in #3587
- Fix golint errors by @andyi2it in #3552
- Fix make deploy-dev-storage-initializer not working by @sivanantha321 in #3617
- Fix Pydantic 2 warnings by @cmaddalozzo in #3622
- build: Fix CRD copying in generate-install.sh by @terrytangyuan in #3620
- Only load from model repository if model binary is not found under model_dir by @sivanantha321 in #3559
- build: Remove misleading logs from minimal-crdgen.sh by @terrytangyuan in #3641
- Assign device to input tensors in huggingface server with huggingface backend by @saileshd1402 in #3657
- Fix Huggingface server stopping criteria by @cmaddalozzo in #3659
- Explicitly specify pad token id when generating tokens by @sivanantha321 in #3565
- Fix quick install not cleaning up Istio installer by @sivanantha321 in #3660
- fix for extract zip from gcs by @andyi2it in #3510
- fix: HPA equality check should include annotations by @terrytangyuan in #3650
- Fix: model id and model dir check order by @yuzisun in #3680
- Fix:vLLM Model Supported check throwing circular dependency by @gavrishp in #3688
- Fix: Allow null in Finish reason streaming response in vLLM by @gavrishp in #3684
- Fix kserve version is not updated properly by python-release.sh by @sivanantha321 in #3707
- Add precaution again running v1 endpoints on openai models by @grandbora in #3694
- Typos and minor fixes by @alpe in #3429
- Fix model_id and model_dir precedence for vLLM by @yuzisun in #3718
- Fixup max_length for HF and model info for vLLM by @Datta0 in #3715
- Fix prompt token count and provide completion usage in OpenAI response by @sivanantha321 in #3712
⬆️ Version Upgrade
- Upgrade orjson to version 3.9.15 by @spolti in #3488
- feat: upgrade to new fastapi, update models to handle both pydantic v… by @timothyjlaurent in #3374
- Update cert manager version in quick install script by @shauryagoel in #3496
- ci: Bump minikube version to work with newer K8s version by @terrytangyuan in #3498
- upgrade knative to 1.13 by @andyi2it in #3457
- Upgrade istio to 1.20 works for the Github Actions by @houshengbo in #3529
- chore: Bump ModelMesh version to v0.12.0-rc0 in Helm chart by @terrytangyuan in #3642
- upgrade vllm/transformers version by @johnugeorge in #3671
🔨 Project SDLC
- Enhance CI environment by @sivanantha321 in #3440
- Fixed go lint error using golangci-lint tool. by @andyi2it in #3378
- chore: Update list of reviewers by @ckadner in #3484
- build: Add helm docs update to make generate command by @terrytangyuan in #3437
- Added v2 infer test for supported model frameworks. by @andyi2it in #3349
- fix the quote format same with others and docstrings by @leyao-daily in #3490
- remove unnecessary Istio settings from quick_install.sh by @peterj in #3493
- Remove GOARCH by @mkumatag in #3523
- GH Alert: Potential file inclusion via variable by @spolti in #3520
- Update codeQL to v3 by @spolti in #3548
- switch e2e test inference graph to raw mode by @andyi2it in #3511
- Black lint by @cmaddalozzo in #3568
- Fix python linter by @sivanantha321 in #3571
- build: Add flake8 and black to pre-commit hooks by @terrytangyuan in #3578
- build: Allow pre-commit to keep changes in reformatted code by @terrytangyuan in #3604
- Allow rerunning failed workflows by comment by @andyi2it in #3550
- add re-run info in the PR templates by @spolti in #3633
- Add e2e tests for huggingface by @sivanantha321 in #3600
- Test image builds for ARM64 arch in CI by @sivanantha321 in #3629
- workflow file for cherry-pick on comment by @andyi2it in #3653
- Fix: huggingface runtime in helm chart by @yuzisun in #3679
- Copy generated CRDs by kustomize to Helm by @Jooho in #3392
...
v0.13.0-rc1
What's Changed
- upgrade vllm/transformers version by @johnugeorge in #3671
- Add openai models endpoint by @cmaddalozzo in #3666
- feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 by @terrytangyuan in #3603
- Enable dtype support for huggingface server by @Datta0 in #3613
- Add method for checking model health/readiness by @cmaddalozzo in #3673
- fix for extract zip from gcs by @andyi2it in #3510
- Update Dockerfile and Readme by @gavrishp in #3676
- Update huggingface readme by @alexagriffith in #3678
- fix: HPA equality check should include annotations by @terrytangyuan in #3650
- Fix: huggingface runtime in helm chart by @yuzisun in #3679
- Fix: model id and model dir check order by @yuzisun in #3680
- Fix:vLLM Model Supported check throwing circular dependency by @gavrishp in #3688
- Fix: Allow null in Finish reason streaming response in vLLM by @gavrishp in #3684
- Unify the log configuration using kserve logger by @sivanantha321 in #3577
- Remove conversion webhook from kubeflow manifest patch by @sivanantha321 in #3700
- Add the field ResponseStartTimeoutSeconds to create ksvc by @houshengbo in #3705
New Contributors
Full Changelog: v0.13.0-rc0...v0.13.0-rc1
v0.13.0-rc0
🌈 What's New?
- add support for async streaming in predict by @alexagriffith in #3475
- Fix: Support model parallelism in HF transformer by @gavrishp in #3459
- Support model revision and tokenizer revision in huggingface server by @lizzzcai in #3558
- OpenAI schema by @tessapham in #3477
- Support OpenAIModel in ModelRepository by @grandbora in #3590
- updated xgboost to support json and ubj models by @andyi2it in #3551
- Add OpenAI API support to Huggingfaceserver by @cmaddalozzo in #3582
- VLLM support for OpenAI Completions in HF server by @gavrishp in #3589
- Add a user friendly error message for http exceptions by @grandbora in #3581
- feat: Provide minimal distribution of CRDs by @terrytangyuan in #3492
- set default SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY in HF Server by @lizzzcai in #3594
- Enabled the multiple domains support on an inference service by @houshengbo in #3615
- Add base model for proxying request to an OpenAI API enabled model server by @cmaddalozzo in #3621
- Add headers to predictor exception logging by @grandbora in #3658
- Enhance controller setup based on available CRDs by @israel-hdez in #3472
⚠️ What's Changed
- Remove conversion webhook from manifests by @Jooho in #3476
- Remove cluster level list/watch for configmaps, serviceaccounts, secrets by @sivanantha321 in #3469
- chore: Remove Seldon Alibi dependencies. Fixes #3380 by @terrytangyuan in #3443
- docs: Move Alibi explainer to docs by @terrytangyuan in #3579
- Remove generate endpoints by @cmaddalozzo in #3654
🐛 What's Fixed
- Fix:Support Parallelism in vllm runtime by @gavrishp in #3464
- fix: Instantiate HuggingfaceModelRepository only when model cannot be loaded. Fixes #3423 by @terrytangyuan in #3424
- Fix isADirectoryError in Azure blob download by @tjandy98 in #3502
- Fix bug: Remove redundant helm chart affinity on predictor CRD by @trojaond in #3481
- Make the modelcar injection idempotent by @rhuss in #3517
- Only pad left for decode-only architecture models. by @sivanantha321 in #3534
- fix lint typo on Makefile by @spolti in #3569
- fix: Set writable cache folder to avoid permission issue. Fixes #3562 by @terrytangyuan in #3576
- Fix model unload in server stop method by @sivanantha321 in #3587
- Fix golint errors by @andyi2it in #3552
- Fix make deploy-dev-storage-initializer not working by @sivanantha321 in #3617
- Fix Pydantic 2 warnings by @cmaddalozzo in #3622
- build: Fix CRD copying in generate-install.sh by @terrytangyuan in #3620
- Only load from model repository if model binary is not found under model_dir by @sivanantha321 in #3559
- build: Remove misleading logs from minimal-crdgen.sh by @terrytangyuan in #3641
- Assign device to input tensors in huggingface server with huggingface backend by @saileshd1402 in #3657
- Fix Huggingface server stopping criteria by @cmaddalozzo in #3659
- Explicitly specify pad token id when generating tokens by @sivanantha321 in #3565
- Fix quick install not cleaning up Istio installer by @sivanantha321 in #3660
⬆️ Version Upgrade
- Upgrade orjson to version 3.9.15 by @spolti in #3488
- feat: upgrade to new fastapi, update models to handle both pydantic v… by @timothyjlaurent in #3374
- Update cert manager version in quick install script by @shauryagoel in #3496
- ci: Bump minikube version to work with newer K8s version by @terrytangyuan in #3498
- upgrade knative to 1.13 by @andyi2it in #3457
- Upgrade istio to 1.20 works for the Github Actions by @houshengbo in #3529
- chore: Bump ModelMesh version to v0.12.0-rc0 in Helm chart by @terrytangyuan in #3642
🔨 Project SDLC
- Enhance CI environment by @sivanantha321 in #3440
- Fixed go lint error using golangci-lint tool. by @andyi2it in #3378
- chore: Update list of reviewers by @ckadner in #3484
- build: Add helm docs update to make generate command by @terrytangyuan in #3437
- Added v2 infer test for supported model frameworks. by @andyi2it in #3349
- fix the quote format same with others and docstrings by @leyao-daily in #3490
- remove unnecessary Istio settings from quick_install.sh by @peterj in #3493
- Remove GOARCH by @mkumatag in #3523
- GH Alert: Potential file inclusion via variable by @spolti in #3520
- Update codeQL to v3 by @spolti in #3548
- switch e2e test inference graph to raw mode by @andyi2it in #3511
- Black lint by @cmaddalozzo in #3568
- Fix python linter by @sivanantha321 in #3571
- build: Add flake8 and black to pre-commit hooks by @terrytangyuan in #3578
- build: Allow pre-commit to keep changes in reformatted code by @terrytangyuan in #3604
- Allow rerunning failed workflows by comment by @andyi2it in #3550
- add re-run info in the PR templates by @spolti in #3633
- Add e2e tests for huggingface by @sivanantha321 in #3600
- Test image builds for ARM64 arch in CI by @sivanantha321 in #3629
- workflow file for cherry-pick on comment by @andyi2it in #3653
CVE patches
- CVE-2024-24762 - update fastapi to 0.109.1 by @spolti in #3556
- golang.org/x/net Allocation of Resources Without Limits or Throttling by @spolti in #3596
- Fix CVE-2023-45288 for qpext by @sivanantha321 in #3618
- Security fix - CVE 2024 24786 by @andyi2it in #3585
📝 Documentation Update
- qpext: fix a typo in qpext doc by @daixiang0 in #3491
- Update KServe project description by @yuzisun in #3524
- Update kserve cake diagram by @yuzisun in #3530
- Remove white background for the kserve diagram by @yuzisun in #3531
- fix a typo in OPENSHIFT_GUIDE.md by @marek-veber in #3544
- Fix typo in README.md by @terrytangyuan in #3575
New Contributors
- @leyao-daily made their first contribution in #3490
- @peterj made their first contribution in #3493
- @timothyjlaurent made their first contribution in #3374
- @shauryagoel made their first contribution in #3496
- @mkumatag made their first contribution in #3523
- @marek-veber made their first contribution in #3544
- @trojaond made their first contribution in #3481
- @grandbora made their first contribution in #3590
- @saileshd1402 made their first contribution in #3657
Full Changelog: v0.12.1...v0.13.0-rc0
v0.12.1
What's Changed
- [release-0.12] Update fastapi to 0.109.1 and Support ray 2.10 by @sivanantha321 in #3609
- [release-0.12] Pydantic 2 support by @cmaddalozzo in #3614
- [release-0.12] Make the modelcar injection idempotent by @sivanantha321 in #3612
- Prepare for release 0.12.1 by @sivanantha321 in #3610
- release-0.12 pin back ray to 2.10 by @yuzisun in #3616
- [release-0.12] Fix docker build failure for ARM64 by @sivanantha321 in #3627
Full Changelog: v0.12.0...v0.12.1