
feat: Adding RestrictedFeatures Support to the Python Frontend Bindings #7775

Open · wants to merge 57 commits into base: main

Commits (57 total; changes shown from 56 commits)
86584d7
reaching _populate_restricted_features helper function
KrishnanPrash Oct 19, 2024
8f09b68
working rough draft for C++ implementation
KrishnanPrash Oct 19, 2024
d418823
removing hardcoding
KrishnanPrash Oct 19, 2024
c5dfb0b
updating with changes from main
KrishnanPrash Oct 31, 2024
9020e4a
Cleaning up includes
KrishnanPrash Oct 31, 2024
b739339
RestrictedFeature Protocols Enum
KrishnanPrash Oct 31, 2024
41f394d
Incomplete restricted features class
KrishnanPrash Oct 31, 2024
36acf25
Adding '.pyi' support to copyright hook
KrishnanPrash Nov 1, 2024
e4fe9f9
User workflow #1
KrishnanPrash Nov 4, 2024
151acf4
Working RestrictedFeatures Class
KrishnanPrash Nov 4, 2024
48c3a59
Spacing
KrishnanPrash Nov 4, 2024
a2f24de
Making handle_triton_error compatible with non-void funcs
KrishnanPrash Nov 4, 2024
eefe8e6
Working Python Workflow
KrishnanPrash Nov 5, 2024
6368acb
Fixed includes and added triton-common-json dependency
KrishnanPrash Nov 6, 2024
8b962f8
Working passing of restricted_features json to C++ and parsing the json
KrishnanPrash Nov 7, 2024
91b3b68
Working Json parsing on C++ side
KrishnanPrash Nov 7, 2024
dea9b00
Working Solution that connects to the HTTP and gRPC frontends
KrishnanPrash Nov 7, 2024
291d0cc
Cleaning up code
KrishnanPrash Nov 7, 2024
ba42dbe
Working basic test suite
KrishnanPrash Nov 7, 2024
a4f0540
Renaming and Testing
KrishnanPrash Nov 7, 2024
9c992bc
Undoing unrelated changes
KrishnanPrash Nov 7, 2024
231e85a
Cleaning up includes
KrishnanPrash Nov 7, 2024
ee66d9c
Spacing
KrishnanPrash Nov 7, 2024
f29c14f
Merge branch 'main' into kprashanth-tritonfrontend-rfeatures
KrishnanPrash Nov 8, 2024
5671156
removing unused imports
KrishnanPrash Nov 8, 2024
ed74aa9
documentation and clean up
KrishnanPrash Nov 8, 2024
f21da8f
documentation and clean up
KrishnanPrash Nov 8, 2024
1027984
Clean up
KrishnanPrash Nov 8, 2024
06254c8
testing: removed untested/extra restricted features
KrishnanPrash Nov 8, 2024
3703c04
Comments and Docs with examples
KrishnanPrash Nov 12, 2024
770dd79
Changing restricted_apis/protocols to restricted_features
KrishnanPrash Nov 12, 2024
b46b124
Removing unused import
KrishnanPrash Nov 12, 2024
a490275
Documentation and Formatting
KrishnanPrash Nov 12, 2024
24bbbc1
Update src/python/tritonfrontend/_api/_error_mapping.py
KrishnanPrash Nov 20, 2024
101f409
Revising docs, Support for removing Features, Adding Testing
KrishnanPrash Nov 30, 2024
b5853a0
Merge branch 'main' into kprashanth-tritonfrontend-rfeatures
KrishnanPrash Nov 30, 2024
30375db
Clean up
KrishnanPrash Nov 30, 2024
3be2556
Adding decarator support for field validator and fixing variable names
KrishnanPrash Nov 30, 2024
c2643ff
Raising tritonfrontend error instead of tritonserver error
KrishnanPrash Dec 2, 2024
0f82ca2
Removing unused import
KrishnanPrash Dec 2, 2024
c415fdb
Update qa/L0_python_api/test_kserve.py
KrishnanPrash Dec 2, 2024
dae9159
Merge branch 'main' into kprashanth-tritonfrontend-rfeatures
KrishnanPrash Dec 13, 2024
17c2831
Update qa/L0_python_api/test_kserve.py
KrishnanPrash Dec 13, 2024
abf7407
removing redundant testing
KrishnanPrash Dec 13, 2024
4953905
Testing invalid value and no headers
KrishnanPrash Dec 13, 2024
714fdec
error_mapping comment
KrishnanPrash Dec 19, 2024
81afad1
comment formatting
KrishnanPrash Dec 19, 2024
1dd4031
update and remove functionality added to RF
KrishnanPrash Dec 20, 2024
dfd1b74
Adding testing
KrishnanPrash Dec 20, 2024
83d98c4
Skipping grpc tests
KrishnanPrash Dec 20, 2024
2176034
Merge branch 'main' into kprashanth-tritonfrontend-rfeatures
KrishnanPrash Dec 20, 2024
2231663
Updating restricted features docs and adding comments
KrishnanPrash Dec 20, 2024
aa28071
removed unused import and added comment
KrishnanPrash Dec 20, 2024
b699a8c
Fix no endpoint/no file-system build
KrishnanPrash Dec 20, 2024
fbaf864
Fix for Metrics/RestrictedFeatures path
KrishnanPrash Dec 20, 2024
fd33d99
correcting ticket number
KrishnanPrash Dec 20, 2024
d2f0f9b
Update src/python/examples/example_model_repository/identity/config.p…
KrishnanPrash Dec 20, 2024
72 changes: 64 additions & 8 deletions docs/customization_guide/tritonfrontend.md
@@ -25,7 +25,7 @@
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->
### Triton Server (tritonfrontend) Bindings (Beta)
## Triton Server (tritonfrontend) Bindings (Beta)

The `tritonfrontend` python package is a set of bindings to Triton's existing
frontends implemented in C++. Currently, `tritonfrontend` supports starting up
@@ -35,13 +35,20 @@ with Triton's Python In-Process API
and [`tritonclient`](https://github.com/triton-inference-server/client/tree/main/src/python/library)
extend the ability to use Triton's full feature set with a few lines of Python.

Let us walk through a simple example:
1. First we need to load the desired models and start the server with `tritonserver`.
### Example Workflow:

1. Enter the triton container:
```bash
docker run -ti nvcr.io/nvidia/tritonserver:{YY.MM}-python-py3
```
Note: The `tritonfrontend`/`tritonserver` wheels have shipped and been installed by default in the container since the 24.11 release.

2. Next, load the desired models and start the server with `tritonserver`.
```python
import tritonserver

# Constructing path to Model Repository
model_path = f"server/src/python/examples/example_model_repository"
model_path = "server/src/python/examples/example_model_repository"

server_options = tritonserver.Options(
server_id="ExampleServer",
@@ -83,7 +90,7 @@ url = "localhost:8000"
client = httpclient.InferenceServerClient(url=url)

# Prepare input data
input_data = np.array([["Roger Roger"]], dtype=object)
input_data = np.array(["Roger Roger"], dtype=object)

# Create input and output objects
inputs = [httpclient.InferInput("INPUT0", input_data.shape, "BYTES")]
@@ -139,12 +146,61 @@ server.stop()
```
With this workflow, you can avoid having to stop each service after client requests have terminated.
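As an illustrative aside, the teardown ordering this workflow implies can be sketched with stand-in objects. The `_Stoppable` class below is a placeholder, not part of `tritonserver` or `tritonfrontend`; the point is the `try`/`finally` shape, which guarantees services and server are stopped even if serving raises.

```python
# Hypothetical stand-in for tritonserver.Server and the tritonfrontend
# services; only the stop() lifecycle is modeled here.
class _Stoppable:
    def __init__(self, name):
        self.name = name
        self.stopped = False

    def stop(self):
        self.stopped = True

server = _Stoppable("tritonserver")
http_service = _Stoppable("KServeHttp")
grpc_service = _Stoppable("KServeGrpc")

try:
    pass  # client requests would be served here
finally:
    # Stop the frontend services, then the server itself.
    for svc in (http_service, grpc_service):
        svc.stop()
    server.stop()
```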

### Example with RestrictedFeatures:
To restrict access to certain endpoints (inference, metadata, model-repo, ...), `RestrictedFeatures` can be used.
Let us walk through an example of restricting inference:
1. Similar to the previous workflow, we start by getting the server up and running.
```python
import tritonserver

model_path = "server/src/python/examples/example_model_repository"

server = tritonserver.Server(model_repository=model_path).start(wait_until_ready=True)
```

2. Now, we can restrict inference and start the endpoints.
```python
from tritonfrontend import Feature, RestrictedFeatures, KServeHttp

rf = RestrictedFeatures()
rf.create_feature_group("some-infer-key", "secret-infer-value", [Feature.INFERENCE])

http_options = KServeHttp.Options(restricted_features=rf)
http_service = KServeHttp(server, http_options)
http_service.start()
```
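To illustrate what a feature group carries, here is a toy model of the registry. The `ToyRestrictedFeatures` class and the plain-string feature names below are illustrative stand-ins only; the real implementation lives in `tritonfrontend/_api/_restricted_features.py` and uses the `Feature` enum. Each group maps a header key/value pair to the features that key unlocks.

```python
# Toy stand-in for RestrictedFeatures -- illustrative only, not the real API.
class ToyRestrictedFeatures:
    def __init__(self):
        # header key -> {"value": expected header value, "features": unlocked features}
        self.groups = {}

    def create_feature_group(self, key, value, features):
        self.groups[key] = {"value": value, "features": list(features)}

rf = ToyRestrictedFeatures()
# One key guards inference; a second, separate key guards other features.
rf.create_feature_group("some-infer-key", "secret-infer-value", ["inference"])
rf.create_feature_group("admin-key", "admin-value", ["model-repository", "metadata"])
```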

3. Finally, let us try sending an inference request to these endpoints:
```python
import numpy as np
import tritonclient.http as httpclient

model_name = "identity"
url = "localhost:8000"
valid_credentials = {"some-infer-key": "secret-infer-value"}
with httpclient.InferenceServerClient(url=url) as client:
input_data = np.array(["Roger Roger"], dtype=object)
inputs = [httpclient.InferInput("INPUT0", input_data.shape, "BYTES")]
inputs[0].set_data_from_numpy(input_data)
results = client.infer(model_name, inputs=inputs, headers=valid_credentials)
output_data = results.as_numpy("OUTPUT0")
print("[INFERENCE RESULTS]")
print("Output data:", output_data)
```
Note: If you remove the `headers=valid_credentials` argument from `client.infer()`,
the inference request fails with an error that looks something like this:
```
...
tritonclient.utils.InferenceServerException: [403] This API is restricted,
expecting header 'some-infer-key'
```
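Conceptually, the 403 above comes from a header check of this shape. The sketch below is illustrative only — the function name and exact error text are stand-ins, not Triton's actual implementation.

```python
# Illustrative sketch of a restricted-feature header check.
def check_access(headers, required_key, required_value):
    """Raise if the request lacks the configured header key/value pair."""
    if headers.get(required_key) != required_value:
        raise PermissionError(
            f"[403] This API is restricted, expecting header '{required_key}'"
        )
    return True

# A request carrying the configured header passes...
check_access({"some-infer-key": "secret-infer-value"},
             "some-infer-key", "secret-infer-value")

# ...while one without it is rejected with a 403-style error.
try:
    check_access({}, "some-infer-key", "secret-infer-value")
except PermissionError as exc:
    print(exc)
```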
For more information on `RestrictedFeatures`, take a look at the following supporting docs:
- [limit endpoint access docs](https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/inference_protocols.md#limit-endpoint-access-beta)
- [restricted features implementation](https://github.com/triton-inference-server/server/blob/main/src/python/tritonfrontend/_api/_restricted_features.py)

### Known Issues
- The following features are not currently supported when launching the Triton frontend services through the python bindings:
  - [Tracing](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/trace.md)
  - [Shared Memory](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_shared_memory.md)
  - [Restricted Protocols](https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/inference_protocols.md#limit-endpoint-access-beta)
  - VertexAI
  - Sagemaker
- After a running server has been stopped, if the client sends an inference request, a Segmentation Fault will occur.
- Using tritonclient.grpc and tritonserver in the same process may cause a crash/abort due to the lack of `fork()` support in [`cygrpc`](https://github.com/grpc/grpc/blob/master/doc/fork_support.md).
9 changes: 8 additions & 1 deletion qa/L0_python_api/test.sh
@@ -51,7 +51,14 @@ fi


FRONTEND_TEST_LOG="./python_kserve.log"
python -m pytest --junitxml=test_kserve.xml test_kserve.py > $FRONTEND_TEST_LOG 2>&1
# TODO: [DLIS-7735] Run tritonclient.grpc as separate process
# Currently, when tritonclient.grpc and tritonserver run in the same process,
# the process can non-deterministically abort/crash in a way that pytest cannot catch.
# This is because fork() is called by tritonserver on model load,
# which attempts to fork the imported libraries and their internal states,
# and cygrpc (dependency of tritonclient.grpc) does not officially support fork().
Comment from @rmccorm4 (Contributor), Dec 20, 2024:
However, if the application only instantiate gRPC Python objects after calling fork(), then fork() will work normally, since there is no C extension binding at this point.

If the claim is that we're calling fork() during model load, aren't we instantiating grpc (client) after the fork() if server/service have already started up?

Can you ever reproduce this by only running a single test case repeatedly? If not, it's possible it's coming from the threshold between test cases.

Reply from @KrishnanPrash (Author):

From my understanding, it is an internal state set upon import:

```python
import tritonclient.grpc as grpcclient

while True:
    server = utils.startup_server()
    sleep(1)
    utils.teardown_server(server)

# Reference: https://github.com/grpc/grpc/blob/master/doc/fork_support.md
```
Reply from a reviewer (Contributor):

Couldn't you just set the env var as mentioned here so that the tests work with fork? https://github.com/grpc/grpc/blob/master/doc/fork_support.md#current-status

Reply from @KrishnanPrash (Author):

Attempted to set and run test suite locally, but the test cases were still failing.
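The separate-process workaround that DLIS-7735 tracks can be sketched generically: do the client work in a child process, so `cygrpc`'s internal state never crosses a `fork()` boundary shared with the server. The snippet below is a minimal illustration with a placeholder task standing in for the real `tritonclient.grpc` calls.

```python
import multiprocessing

def client_task(queue):
    # Placeholder for importing tritonclient.grpc and running inference;
    # in the real workaround, that work happens only inside this child process.
    queue.put("inference-ok")

def run_client_in_subprocess():
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=client_task, args=(queue,))
    proc.start()
    result = queue.get()  # read before join() to avoid blocking on a full pipe
    proc.join()
    return result

if __name__ == "__main__":
    print(run_client_in_subprocess())
```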

python -m pytest --junitxml=test_kserve.xml test_kserve.py -k "not KServeGrpc" > $FRONTEND_TEST_LOG 2>&1
if [ $? -ne 0 ]; then
cat $FRONTEND_TEST_LOG
echo -e "\n***\n*** Test Failed\n***"