Update docs with NVAIE messaging (#6162) #6167

Merged · 1 commit · Aug 9, 2023
13 changes: 11 additions & 2 deletions docs/index.md
@@ -58,9 +58,18 @@ Triton Inference Server is an open source inference serving software that stream
<iframe width="560" height="315" src="https://www.youtube.com/embed/NQDtfSi5QF4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>

# Triton
# Triton Inference Server

Triton enables teams to deploy any AI model from multiple deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton supports inference across cloud, data center,edge and embedded devices on NVIDIA GPUs, x86 and ARM CPU, or AWS Inferentia. Triton delivers optimized performance for many query types, including real time, batched, ensembles and audio/video streaming.
Triton Inference Server enables teams to deploy any AI model from multiple deep
learning and machine learning frameworks, including TensorRT, TensorFlow,
PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton supports inference
across cloud, data center, edge and embedded devices on NVIDIA GPUs, x86 and ARM
CPU, or AWS Inferentia. Triton Inference Server delivers optimized performance
for many query types, including real time, batched, ensembles and audio/video
streaming. Triton Inference Server is part of
[NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/),
a software platform that accelerates the data science pipeline and streamlines
the development and deployment of production AI.
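
Since this paragraph describes serving models from many frameworks behind a
single inference endpoint, a minimal client-side sketch may make it concrete.
This is an illustration only: it assumes a Triton server already running at
`localhost:8000` and a hypothetical model named `my_model` with one FP32 input
`INPUT0` and one output `OUTPUT0`; none of these names come from this PR.

```python
# Minimal sketch: query a running Triton server over HTTP using the
# tritonclient library. Model and tensor names are hypothetical.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the input tensor; shape and dtype must match the model's config.
data = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
infer_input = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

# Run inference and read the requested output back as a NumPy array.
result = client.infer(
    model_name="my_model",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("OUTPUT0")],
)
print(result.as_numpy("OUTPUT0"))
```

The same request could also be sent over gRPC via `tritonclient.grpc`; the
HTTP variant is shown here only because it is shorter.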

Major features include:

41 changes: 41 additions & 0 deletions docs/user_guide/faq.md
@@ -162,3 +162,44 @@ looking at the gdb trace for the segfault.

When opening a GitHub issue for the segfault with Triton, please include
the backtrace to better help us resolve the problem.

## What are the benefits of using [Triton Inference Server](https://developer.nvidia.com/triton-inference-server) as part of the [NVIDIA AI Enterprise Software Suite](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/)?

NVIDIA AI Enterprise enables enterprises to implement full AI workflows by
delivering an end-to-end AI platform. It offers four key benefits:

### Enterprise-Grade Support, Security & API Stability:

Business-critical AI projects stay on track with NVIDIA Enterprise Support,
available globally to assist IT teams deploying and managing the lifecycle
of AI applications and developer teams building them. Support includes
maintenance updates, dependable SLAs and response times. Regular security
reviews and priority notifications mitigate the potential risk of unmanaged
open-source software and ensure compliance with corporate standards. Finally,
long-term support and regression testing ensure API stability between
releases.

### Speed time to production with AI Workflows & Pretrained Models:
To reduce the complexity of developing common AI applications, NVIDIA AI
Enterprise includes
[AI workflows](https://www.nvidia.com/en-us/launchpad/ai/workflows/), which are
reference applications for specific business outcomes such as Intelligent
Virtual Assistants and Digital Fingerprinting for real-time cybersecurity threat
detection. AI workflow reference applications may include
[AI frameworks](https://docs.nvidia.com/deeplearning/frameworks/index.html) and
[pretrained models](https://developer.nvidia.com/ai-models),
[Helm Charts](https://catalog.ngc.nvidia.com/helm-charts),
[Jupyter Notebooks](https://developer.nvidia.com/run-jupyter-notebooks) and
[documentation](https://docs.nvidia.com/ai-enterprise/index.html#overview).

### Performance for Efficiency and Cost Savings:
Using accelerated compute for AI workloads, such as data processing with the
[NVIDIA RAPIDS Accelerator](https://developer.nvidia.com/rapids) for Apache
Spark and inference with Triton Inference Server, delivers better performance,
which also improves efficiency and reduces operational and infrastructure
costs, including savings from reduced time and energy consumption.
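
As a concrete illustration of the accelerated data processing mentioned above,
the RAPIDS Accelerator is enabled through Spark configuration rather than code
changes. A minimal PySpark sketch, assuming the `rapids-4-spark` jar is on the
Spark classpath and a CUDA-capable GPU is available (the settings shown are
illustrative):

```python
# Minimal sketch: enable the RAPIDS Accelerator plugin on a Spark session.
# Assumes the rapids-4-spark jar is on the classpath and a CUDA-capable GPU
# is available; exact settings vary by deployment.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("gpu-accelerated-etl")
    # The SQLPlugin routes supported SQL/DataFrame operations to the GPU.
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .config("spark.rapids.sql.enabled", "true")
    .getOrCreate()
)

# Existing DataFrame code runs unchanged; supported operators execute on GPU.
df = spark.range(0, 1_000_000).selectExpr("id", "id * 2 AS doubled")
print(df.groupBy().sum("doubled").collect())
```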

### Optimized and Certified to Deploy Everywhere:
Cloud, data center, edge: NVIDIA AI Enterprise is optimized and certified to
ensure reliable performance whether you run your AI in the public cloud, in
virtualized data centers, or on DGX systems.