
TorchServe v0.8.2 Release Notes

@lxning lxning released this 28 Aug 23:20
· 408 commits to master since this release
04e0b37

This is the release of TorchServe v0.8.2.

Security

  • Updated snakeyaml version to v2 #2523 @nskool
  • Added warning about model allowed urls when default value is applied #2534 @namannandan

Custom metrics backwards compatibility

  • add_metric is now backward compatible with versions earlier than v0.6.1, but the metric type defaults to COUNTER. For a metric of any other type, pass metric_type explicitly in the call to add_metric:
    metrics.add_metric(name='GenericMetric', value=10, unit='count', dimensions=[...], metric_type=MetricTypes.GAUGE)
  • When upgrading from versions v0.6.1 through v0.8.1 to v0.8.2, replace calls to add_metric with add_metric_to_cache.
  • Every custom metric updated in the custom handler must be included in the metrics configuration file for TorchServe to emit it. This is shown here.
  • A detailed upgrade guide is included in the metrics documentation.
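
The requirement that every custom metric appear in the metrics configuration file can be checked before deployment. The following is a minimal sketch, not part of TorchServe: the helper names (declared_metrics, undeclared_metrics) are hypothetical, and the dictionary assumes the configuration has already been parsed into a model_metrics section keyed by metric type. Verify the real file layout against the metrics documentation.

```python
# Hypothetical helpers, not part of TorchServe: report custom metrics that a
# handler updates but that the metrics configuration does not declare.


def declared_metrics(config):
    """Collect (name, type) pairs from the assumed 'model_metrics' section."""
    declared = set()
    for metric_type, entries in (config.get("model_metrics") or {}).items():
        for entry in entries or []:
            declared.add((entry["name"], metric_type.lower()))
    return declared


def undeclared_metrics(emitted, config):
    """Return the emitted (name, type) pairs missing from the config, sorted."""
    return sorted(set(emitted) - declared_metrics(config))


# Assumed shape of an already-parsed metrics configuration.
example_config = {
    "model_metrics": {
        "counter": [
            {"name": "InferenceRequestCount", "unit": "count",
             "dimensions": ["ModelName", "Level"]},
        ],
        "gauge": [
            {"name": "GenericMetric", "unit": "count",
             "dimensions": ["ModelName", "Level"]},
        ],
    }
}

# Metrics that an illustrative custom handler updates.
emitted_by_handler = [
    ("InferenceRequestCount", "counter"),
    ("GenericMetric", "gauge"),
    ("PostprocessTime", "gauge"),  # deliberately missing from the config
]

missing = undeclared_metrics(emitted_by_handler, example_config)
for name, metric_type in missing:
    print(f"Not declared in metrics config: {name} ({metric_type})")
```

Comparing on both name and type also catches a metric that is declared under the wrong type, for example a gauge listed as a counter.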

New Features

New Examples

  1. Example Llama v2 70B chat using HuggingFace Accelerate #2494 @lxning @HamidShojanazeri @agunapal
  2. Large model example OPT-6.7B on Inferentia2 #2399 @namannandan
    • This example demonstrates how NeuronX compiles the model, detects neuron core availability, and runs inference.
  3. DeepSpeed deferred init with OPT-30B #2419 @agunapal
    • This PR adds deferred model initialization to the OPT-30B example by using a newer DeepSpeed version. Deferred initialization significantly reduces model loading latency.
  4. Torch TensorRT example #2483 @agunapal
    • This PR uses ResNet-50 as an example to demonstrate Torch TensorRT.
  5. K8S mnist example using minikube #2323 @agunapal
    • This example shows how to use a pre-trained custom MNIST model to perform real-time digit recognition via K8S.
  6. Example for custom metrics #2516 @namannandan
  7. Example for object detection with ultralytics YOLO v8 model #2508 @agunapal

Improvements

Documentation

Platform Support

Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 or later and JDK 17.

GPU Support

Torch 2.0.1 + CUDA 11.7, 11.8
Torch 2.0.0 + CUDA 11.7, 11.8
Torch 1.13 + CUDA 11.7, 11.8
Torch 1.11 + CUDA 10.2, 11.3, 11.6
Torch 1.9.0 + CUDA 11.1
Torch 1.8.1 + CUDA 9.2
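
The support matrix above can be encoded as a lookup table for a pre-install check. This is only a sketch: the names SUPPORTED_CUDA and is_supported are hypothetical, and versions are matched as exact strings, so "2.0.1" matches but a local build tag such as "2.0.1+cu118" would need to be trimmed first.

```python
# Torch version -> CUDA versions listed in these release notes.
SUPPORTED_CUDA = {
    "2.0.1": ("11.7", "11.8"),
    "2.0.0": ("11.7", "11.8"),
    "1.13": ("11.7", "11.8"),
    "1.11": ("10.2", "11.3", "11.6"),
    "1.9.0": ("11.1",),
    "1.8.1": ("9.2",),
}


def is_supported(torch_version, cuda_version):
    """Return True if the exact Torch/CUDA pair appears in the table above."""
    return cuda_version in SUPPORTED_CUDA.get(torch_version, ())


if __name__ == "__main__":
    for pair in [("2.0.1", "11.8"), ("1.9.0", "11.8")]:
        status = "listed" if is_supported(*pair) else "not listed"
        print(f"Torch {pair[0]} + CUDA {pair[1]}: {status}")
```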