Update examples to execute from the root of the repo (#1674)
* Update top-level examples and Triton start-up commands to execute from the root of the repo
* Where possible, set default values for CLI flags, removing the need to set them for the common use case
* Where possible, remove the need to define `MORPHEUS_ROOT`
* Ensure C++ Triton pipelines use port 8000 to avoid the warning about the gRPC port.
* Optionally cast types in the C++ impl of the Triton stage when `force_convert_inputs=true` and the input and model types don't match (previously, types were always cast); see the sketch after this list
* Remove `--num_threads=1` restriction and configure logging for the `log_parsing` example
* Remove `--num_threads=8` restriction from `nlp_si_detection` since the pipeline has more than 8 stages.
* Don't invoke the C++ impl of preallocate if the type being requested isn't supported on the C++ side (strings)
* Don't use the C++ impl of the Triton stage if `use_shared_memory` is requested, as this isn't supported in C++.
* Add the missing `gnn-fraud-classification` stage to the CLI alternative for the `gnn_fraud_detection_pipeline` example
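
For illustration only (this snippet is not part of the commit), here is a minimal Python sketch of where the flags mentioned in these bullets surface when constructing the Triton inference stage. The model name, server address, and pipeline mode are placeholders, and exact behavior depends on the Morpheus version:

```python
from morpheus.config import Config, PipelineModes
from morpheus.stages.inference.triton_inference_stage import TritonInferenceStage

config = Config()
config.mode = PipelineModes.FIL  # placeholder; e.g. the mode used by the abp-pcap-xgb example

triton_stage = TritonInferenceStage(
    config,
    model_name="abp-pcap-xgb",    # placeholder model name
    server_url="localhost:8001",  # gRPC port used by the Python examples in this commit
    # Per the bullet above, casting now happens only when this flag is set and the
    # input and model dtypes differ; previously the inputs were always cast.
    force_convert_inputs=True,
    # Shared memory is not supported by the C++ implementation, so requesting it
    # is expected to force the Python implementation of this stage.
    use_shared_memory=False,
)
```

Leaving `use_shared_memory` off keeps the C++ implementation eligible, while requesting it selects the Python implementation, as described in the bullet above.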

Closes #1671

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Eli Fajardo (https://github.com/efajardo-nv)
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #1674
dagardner-nv authored May 2, 2024
1 parent 9d3de8a commit 808c52c
Showing 28 changed files with 448 additions and 144 deletions.
2 changes: 1 addition & 1 deletion docs/source/examples.md
@@ -24,7 +24,7 @@ limitations under the License.
* [Example Ransomware Detection Morpheus Pipeline for AppShield Data](../../examples/ransomware_detection/README.md)
* [Root Cause Analysis Acceleration & Predictive Maintenance Example](../../examples/root_cause_analysis/README.md)
* [SID Visualization Example](../../examples/sid_visualization/README.md)
* [Large Language Models (LLMs)](../../examples/llm/README.md)
* Large Language Models (LLMs)
* [Agents](../../examples/llm/agents/README.md)
* [Completion](../../examples/llm/completion/README.md)
* [VDB Upload](../../examples/llm/vdb_upload/README.md)
22 changes: 15 additions & 7 deletions examples/README.md
@@ -15,10 +15,18 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

## Morpheus CLI Examples

Examples run with the Morpheus CLI (`morpheus ...`) should be run from the repository root; otherwise, some filepath arguments may need to be changed.

## Morpheus run.py Examples

Examples run with python (`python run.py`) should be run from the example's directory; otherwise, relative Python imports may be broken.
# Examples
* [Anomalous Behavior Profiling with Forest Inference Library (FIL) Example](./abp_nvsmi_detection/README.md)
* [ABP Detection Example Using Morpheus](./abp_pcap_detection/README.md)
* [Digital Fingerprinting (DFP)](./digital_fingerprinting/README.md)
* [GNN Fraud Detection Pipeline](./gnn_fraud_detection_pipeline/README.md)
* [Example cyBERT Morpheus Pipeline for Apache Log Parsing](./log_parsing/README.md)
* [Sensitive Information Detection with Natural Language Processing (NLP) Example](./nlp_si_detection/README.md)
* [Example Ransomware Detection Morpheus Pipeline for AppShield Data](./ransomware_detection/README.md)
* [Root Cause Analysis Acceleration & Predictive Maintenance Example](./root_cause_analysis/README.md)
* [SID Visualization Example](./sid_visualization/README.md)
* Large Language Models (LLMs)
* [Agents](./llm/agents/README.md)
* [Completion](./llm/completion/README.md)
* [VDB Upload](./llm/vdb_upload/README.md)
* [Retrieval Augmented Generation (RAG)](./llm/rag/README.md)
41 changes: 15 additions & 26 deletions examples/abp_pcap_detection/README.md
@@ -27,14 +27,9 @@ docker pull nvcr.io/nvidia/tritonserver:23.06-py3
```

##### Deploy Triton Inference Server
From the root of the Morpheus repo, navigate to the anomalous behavior profiling example directory:
From the root of the Morpheus repo, run the following to launch Triton and load the `abp-pcap-xgb` model:
```bash
cd examples/abp_pcap_detection
```

The following creates the Triton container, mounts the `abp-pcap-xgb` directory to `/models/abp-pcap-xgb` in the Triton container, and starts the Triton server:
```bash
docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD/abp-pcap-xgb:/models/abp-pcap-xgb --name tritonserver nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models --exit-on-error=false
docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD/examples/abp_pcap_detection/abp-pcap-xgb:/models/abp-pcap-xgb --name tritonserver nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models --exit-on-error=false
```

##### Verify Model Deployment
@@ -53,53 +48,49 @@ Use Morpheus to run the Anomalous Behavior Profiling Detection Pipeline with the

From the root of the Morpheus repo, run:
```bash
cd examples/abp_pcap_detection
python run.py --help
python examples/abp_pcap_detection/run.py --help
```

Output:
```
Usage: run.py [OPTIONS]
Options:
--num_threads INTEGER RANGE Number of internal pipeline threads to use
--num_threads INTEGER RANGE Number of internal pipeline threads to use.
[x>=1]
--pipeline_batch_size INTEGER RANGE
Internal batch size for the pipeline. Can be
much larger than the model batch size. Also
used for Kafka consumers [x>=1]
used for Kafka consumers. [x>=1]
--model_max_batch_size INTEGER RANGE
Max batch size to use for the model [x>=1]
--input_file PATH Input filepath [required]
Max batch size to use for the model. [x>=1]
--input_file PATH Input filepath. [required]
--output_file TEXT The path to the file where the inference
output will be saved.
--model_fea_length INTEGER RANGE
Features length to use for the model [x>=1]
Features length to use for the model.
[x>=1]
--model_name TEXT The name of the model that is deployed on
Tritonserver
Tritonserver.
--iterative Iterative mode will emit dataframes one at a
time. Otherwise a list of dataframes is
emitted. Iterative mode is good for
interleaving source stages.
--server_url TEXT Tritonserver url [required]
--file_type [auto|json|csv] Indicates what type of file to read.
--server_url TEXT Tritonserver url. [required]
--file_type [auto|csv|json] Indicates what type of file to read.
Specifying 'auto' will determine the file
type from the extension.
--help Show this message and exit.
```

To launch the configured Morpheus pipeline with the sample data that is provided in `examples/data`, from the `examples/abp_pcap_detection` directory run the following:
To launch the configured Morpheus pipeline with the sample data that is provided in `examples/data`, run the following:

```bash
python run.py \
--input_file ../data/abp_pcap_dump.jsonlines \
--output_file ./pcap_out.jsonlines \
--model_name 'abp-pcap-xgb' \
--server_url localhost:8001
python examples/abp_pcap_detection/run.py
```
Note: Both Morpheus and Triton Inference Server containers must have access to the same GPUs in order for this example to work.

The pipeline will process the input `pcap_dump.jsonlines` sample data and write it to `pcap_out.jsonlines`.
The pipeline will process the input `abp_pcap_dump.jsonlines` sample data and write it to `pcap_out.jsonlines`.

### CLI Example
The above example is illustrative of using the Python API to build a custom Morpheus Pipeline.
@@ -123,5 +114,3 @@ morpheus --log_level INFO --plugin "examples/abp_pcap_detection/abp_pcap_preproc
to-file --filename "pcap_out.jsonlines" --overwrite \
monitor --description "Write to file rate" --unit "to-file"
```

Note: Triton is still needed to be launched from the `examples/abp_pcap_detection` directory.
7 changes: 5 additions & 2 deletions examples/abp_pcap_detection/run.py
@@ -33,6 +33,9 @@
from morpheus.stages.preprocess.deserialize_stage import DeserializeStage
from morpheus.utils.logger import configure_logging

CUR_DIR = os.path.dirname(__file__)
EX_DATA_DIR = os.path.join(CUR_DIR, "../data")


@click.command()
@click.option(
@@ -57,7 +60,7 @@
@click.option(
"--input_file",
type=click.Path(exists=True, readable=True),
default="pcap.jsonlines",
default=os.path.join(EX_DATA_DIR, "abp_pcap_dump.jsonlines"),
required=True,
help="Input filepath.",
)
@@ -84,7 +87,7 @@
help=("Iterative mode will emit dataframes one at a time. Otherwise a list of dataframes is emitted. "
"Iterative mode is good for interleaving source stages."),
)
@click.option("--server_url", required=True, help="Tritonserver url.")
@click.option("--server_url", required=True, help="Tritonserver url.", default="localhost:8001")
@click.option(
"--file_type",
type=click.Choice(FILE_TYPE_NAMES, case_sensitive=False),
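As an aside (not part of the commit), the idiom added here, resolving `CUR_DIR` from `__file__` and joining defaults onto it, is what makes these flags optional regardless of the caller's working directory. A stripped-down, hypothetical sketch of the same pattern, using an invented path:

```python
import os

import click

# Resolve defaults against this script's location rather than the shell's
# current directory, so `python examples/<example>/run.py` behaves the same
# whether it is launched from the repo root or from anywhere else.
CUR_DIR = os.path.dirname(__file__)


@click.command()
@click.option(
    "--input_file",
    type=click.Path(dir_okay=False),
    default=os.path.join(CUR_DIR, "../data/example.jsonlines"),  # hypothetical path
    show_default=True,
    help="Input filepath.",
)
def main(input_file):
    click.echo(f"Reading {input_file}")


if __name__ == "__main__":
    main()
```

Because click treats an option with a default as already satisfied, the `required=True` left on several of these options no longer forces the caller to pass them, which appears to be how the common use case needs no flags at all.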
15 changes: 4 additions & 11 deletions examples/gnn_fraud_detection_pipeline/README.md
@@ -28,17 +28,10 @@ mamba env update \
```

## Running

##### Setup Env Variable
```bash
export MORPHEUS_ROOT=$(pwd)
```

Use Morpheus to run the GNN fraud detection Pipeline with the transaction data. A pipeline has been configured in `run.py` with several command line options:

```bash
cd ${MORPHEUS_ROOT}/examples/gnn_fraud_detection_pipeline
python run.py --help
python examples/gnn_fraud_detection_pipeline/run.py --help
```
```
Usage: run.py [OPTIONS]
@@ -63,11 +56,10 @@
--help Show this message and exit.
```

To launch the configured Morpheus pipeline with the sample data that is provided at `$MORPHEUS_ROOT/models/dataset`, run the following:
To launch the configured Morpheus pipeline, run the following:

```bash
cd ${MORPHEUS_ROOT}/examples/gnn_fraud_detection_pipeline
python run.py
python examples/gnn_fraud_detection_pipeline/run.py
```
```
====Registering Pipeline====
@@ -125,6 +117,7 @@ morpheus --log_level INFO \
monitor --description "Graph construction rate" \
gnn-fraud-sage --model_dir examples/gnn_fraud_detection_pipeline/model/ \
monitor --description "Inference rate" \
gnn-fraud-classification --model_xgb_file examples/gnn_fraud_detection_pipeline/model/xgb.pt \
monitor --description "Add classification rate" \
serialize \
to-file --filename "output.csv" --overwrite
8 changes: 5 additions & 3 deletions examples/gnn_fraud_detection_pipeline/run.py
@@ -32,6 +32,8 @@
from stages.graph_construction_stage import FraudGraphConstructionStage
from stages.graph_sage_stage import GraphSAGEStage

CUR_DIR = os.path.dirname(__file__)


@click.command()
@click.option(
@@ -62,21 +64,21 @@
@click.option(
"--input_file",
type=click.Path(exists=True, readable=True, dir_okay=False),
default="validation.csv",
default=os.path.join(CUR_DIR, "validation.csv"),
required=True,
help="Input data filepath.",
)
@click.option(
"--training_file",
type=click.Path(exists=True, readable=True, dir_okay=False),
default="training.csv",
default=os.path.join(CUR_DIR, "training.csv"),
required=True,
help="Training data filepath.",
)
@click.option(
"--model_dir",
type=click.Path(exists=True, readable=True, file_okay=False, dir_okay=True),
default="model",
default=os.path.join(CUR_DIR, "model"),
required=True,
help="Path to trained Hinsage & XGB models.",
)
21 changes: 6 additions & 15 deletions examples/log_parsing/README.md
@@ -29,11 +29,6 @@ Example:
docker pull nvcr.io/nvidia/tritonserver:23.06-py3
```

##### Setup Env Variable
```bash
export MORPHEUS_ROOT=$(pwd)
```

##### Start Triton Inference Server Container
From the Morpheus repo root directory, run the following to launch Triton and load the `log-parsing-onnx` model:

@@ -56,19 +51,15 @@ Once Triton server finishes starting up, it will display the status of all loade
### Run Log Parsing Pipeline

Run the following from the `examples/log_parsing` directory to start the log parsing pipeline:
Run the following from the root of the Morpheus repo to start the log parsing pipeline:

```bash
python run.py \
--num_threads 1 \
--input_file ${MORPHEUS_ROOT}/models/datasets/validation-data/log-parsing-validation-data-input.csv \
--output_file ./log-parsing-output.jsonlines \
python examples/log_parsing/run.py \
--input_file=./models/datasets/validation-data/log-parsing-validation-data-input.csv \
--model_vocab_hash_file=data/bert-base-cased-hash.txt \
--model_vocab_file=${MORPHEUS_ROOT}/models/training-tuning-scripts/sid-models/resources/bert-base-cased-vocab.txt \
--model_seq_length=256 \
--model_vocab_file=./models/training-tuning-scripts/sid-models/resources/bert-base-cased-vocab.txt \
--model_name log-parsing-onnx \
--model_config_file=${MORPHEUS_ROOT}/models/log-parsing-models/log-parsing-config-20220418.json \
--server_url localhost:8001
--model_config_file=./models/log-parsing-models/log-parsing-config-20220418.json
```

Use `--help` to display information about the command line options:
@@ -110,7 +101,7 @@ PYTHONPATH="examples/log_parsing" \
morpheus --log_level INFO \
--plugin "inference" \
--plugin "postprocessing" \
run --num_threads 1 --pipeline_batch_size 1024 --model_max_batch_size 32 \
run --pipeline_batch_size 1024 --model_max_batch_size 32 \
pipeline-nlp \
from-file --filename ./models/datasets/validation-data/log-parsing-validation-data-input.csv \
deserialize \
8 changes: 7 additions & 1 deletion examples/log_parsing/run.py
@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import logging
import os

import click
@@ -28,6 +29,7 @@
from morpheus.stages.output.write_to_file_stage import WriteToFileStage
from morpheus.stages.preprocess.deserialize_stage import DeserializeStage
from morpheus.stages.preprocess.preprocess_nlp_stage import PreprocessNLPStage
from morpheus.utils.logger import configure_logging


@click.command()
@@ -79,7 +81,7 @@
help="The name of the model that is deployed on Tritonserver.",
)
@click.option("--model_config_file", required=True, help="Model config file.")
@click.option("--server_url", required=True, help="Tritonserver url.")
@click.option("--server_url", required=True, help="Tritonserver url.", default="localhost:8001")
def run_pipeline(
num_threads,
pipeline_batch_size,
@@ -93,6 +95,10 @@ def run_pipeline(
model_config_file,
server_url,
):

# Enable the default logger.
configure_logging(log_level=logging.INFO)

config = Config()
config.mode = PipelineModes.NLP
config.num_threads = num_threads
5 changes: 2 additions & 3 deletions examples/nlp_si_detection/README.md
@@ -103,11 +103,10 @@ The following command line is the entire command to build and launch the pipelin

From the Morpheus repo root directory, run:
```bash
export MORPHEUS_ROOT=$(pwd)
# Launch Morpheus printing debug messages
morpheus --log_level=DEBUG \
`# Run a pipeline with 8 threads and a model batch size of 32 (Must match Triton config)` \
run --num_threads=8 --pipeline_batch_size=1024 --model_max_batch_size=32 \
`# Run a pipeline with a model batch size of 32 (Must match Triton config)` \
run --pipeline_batch_size=1024 --model_max_batch_size=32 \
`# Specify a NLP pipeline with 256 sequence length (Must match Triton config)` \
pipeline-nlp --model_seq_length=256 \
`# 1st Stage: Read from file` \
2 changes: 1 addition & 1 deletion examples/nlp_si_detection/run.sh
@@ -19,7 +19,7 @@ SCRIPT_DIR=${SCRIPT_DIR:-"$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null
export MORPHEUS_ROOT=${MORPHEUS_ROOT:-"$(realpath ${SCRIPT_DIR}/../..)"}

morpheus --log_level=DEBUG \
run --num_threads=8 --pipeline_batch_size=1024 --model_max_batch_size=32 \
run --pipeline_batch_size=1024 --model_max_batch_size=32 \
pipeline-nlp --model_seq_length=256 \
from-file --filename=${MORPHEUS_ROOT}/examples/data/pcap_dump.jsonlines \
deserialize \
21 changes: 10 additions & 11 deletions examples/ransomware_detection/README.md
@@ -35,15 +35,15 @@ export MORPHEUS_ROOT=$(pwd)
```

##### Start Triton Inference Server Container
Run the following from the `examples/ransomware_detection` directory to launch Triton and load the `ransomw-model-short-rf` model:

From the Morpheus repo root directory, run the following to launch Triton and load the `ransomw-model-short-rf` model:
```bash
# Run Triton in explicit mode
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models/triton-model-repo nvcr.io/nvidia/tritonserver:23.06-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--model-control-mode=explicit \
--load-model ransomw-model-short-rf
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/examples/ransomware_detection/models:/models/triton-model-repo nvcr.io/nvidia/tritonserver:23.06-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--model-control-mode=explicit \
--load-model ransomw-model-short-rf
```

##### Verify Model Deployment
@@ -67,14 +67,13 @@ mamba install 'dask>=2023.1.1' 'distributed>=2023.1.1'
```

## Run Ransomware Detection Pipeline
Run the following from the `examples/ransomware_detection` directory to start the ransomware detection pipeline:
Run the following from the root of the Morpheus repo to start the ransomware detection pipeline:

```bash
python run.py --server_url=localhost:8001 \
python examples/ransomware_detection/run.py --server_url=localhost:8001 \
--sliding_window=3 \
--model_name=ransomw-model-short-rf \
--conf_file=./config/ransomware_detection.yaml \
--input_glob=${MORPHEUS_ROOT}/examples/data/appshield/*/snapshot-*/*.json \
--input_glob=./examples/data/appshield/*/snapshot-*/*.json \
--output_file=./ransomware_detection_output.jsonlines
```

4 changes: 3 additions & 1 deletion examples/ransomware_detection/run.py
@@ -33,6 +33,8 @@
from stages.create_features import CreateFeaturesRWStage
from stages.preprocessing import PreprocessingRWStage

CUR_DIR = os.path.dirname(__file__)


@click.command()
@click.option('--debug', default=False)
@@ -64,7 +66,7 @@
@click.option(
"--conf_file",
type=click.STRING,
default="./config/ransomware_detection.yaml",
default=os.path.join(CUR_DIR, "config/ransomware_detection.yaml"),
help="Ransomware detection configuration filepath.",
)
@click.option(