Add configuration option to disable system metrics collection #2104

Merged: 8 commits, Feb 14, 2023
13 changes: 7 additions & 6 deletions docs/configuration.md
@@ -188,7 +188,7 @@ Configuration parameter `install_py_dep_per_model` controls if the model server
install_py_dep_per_model=true
```

Users can also supply custom Python packages in zip or tar.gz format using the `--extra-files` flag when creating the model archive, and add the file name as an entry in the `requirements` file.

### Restrict backend worker to access environment variables

@@ -239,7 +239,7 @@ A model's parameters are defined in [model source code](https://github.com/pytor
* `defaultVersion`: the default version of a model
* `marName`: the mar file name of a model

A model's configuration example
```properties
models={\
"noop": {\
@@ -281,22 +281,23 @@ Most of the following properties are designed for performance tuning. Adjusting
* `unregister_model_timeout`: Timeout, in seconds, for handling an unregister model request. If cleaning up the process takes longer than this, the process is deemed unresponsive and an error response is sent. Default: 120 seconds.
* `decode_input_request`: Lets backend workers decode requests when the content type is known.
If this is set to "true", backend workers do a "Bytearray to JSON object" conversion when the Content-Type of the request is "application/json", and a
"Bytearray to utf-8 string" conversion when the Content-Type of the request is "text*". Default: true
* `initial_worker_port` : The initial port number used when automatically assigning ports to worker processes.
* `model_store` : Path of model store directory.
* `model_server_home` : Torchserve home directory.
* `max_request_size` : The maximum allowable request size that Torchserve accepts, in bytes. Default: 6553500
* `max_response_size` : The maximum allowable response size that Torchserve sends, in bytes. Default: 6553500
* `limit_max_image_pixels` : Default value is true (use the default [PIL.Image.MAX_IMAGE_PIXELS](https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.MAX_IMAGE_PIXELS)). If this is set to "false", the backend default vision handler sets PIL.Image.MAX_IMAGE_PIXELS = None to allow large image payloads.
* `allowed_urls` : Comma separated regex of allowed source URL(s) from where models can be registered. Default: "file://.*|http(s)?://.*" (all URLs and local file system)
e.g. : To allow base URLs `https://s3.amazonaws.com/` and `https://torchserve.pytorch.org/` use the following regex string `allowed_urls=https://s3.amazonaws.com/.*,https://torchserve.pytorch.org/.*`
* `workflow_store` : Path of workflow store directory. Defaults to model store directory.
* `disable_system_metrics` : Disable collection of system metrics when set to "true". Default value is "false".

**NOTE**

All the above config properties can be set using environment variables as follows:
- set `enable_envvars_config` to true in config.properties
- export an environment variable for the property as `TS_<PROPERTY_NAME>`.

For example, to set the `inference_address` property, run:
`export TS_INFERENCE_ADDRESS="http://127.0.0.1:8082"`.
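The environment-variable rule in the NOTE above can be sketched in Python. This is a minimal illustration, not TorchServe's ConfigManager. `load_properties` and `resolve_property` are hypothetical helpers. The sketch assumes that when `enable_envvars_config` is true, a matching `TS_<PROPERTY_NAME>` variable overrides the value from config.properties. The parser is simplified and does not handle line continuations such as the `models={\` example.

```python
import os
from typing import Mapping, Optional


def load_properties(text: str) -> dict:
    """Hypothetical helper: parse simple key=value lines (no continuations or escapes)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def resolve_property(props: Mapping[str, str], name: str,
                     env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: return the effective value of property `name`.

    Assumption: with enable_envvars_config=true, TS_<NAME> (upper-cased)
    takes precedence over the value read from config.properties.
    """
    if props.get("enable_envvars_config", "false").lower() == "true":
        env_name = "TS_" + name.upper()
        if env_name in env:
            return env[env_name]
    return props.get(name)


props = load_properties("enable_envvars_config=true\ndisable_system_metrics=false\n")
print(resolve_property(props, "disable_system_metrics",
                       env={"TS_DISABLE_SYSTEM_METRICS": "true"}))  # true
```

In this sketch the variable is ignored unless `enable_envvars_config` is set. That matches the two-step recipe in the NOTE.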
@@ -119,7 +119,10 @@ public void startAndWait()
        startGRPCServers();

        // Create and schedule metrics manager
        if (!configManager.isSystemMetricsDisabled()) {
            MetricManager.scheduleMetrics(configManager);
        }

        System.out.println("Model server started."); // NOPMD

        channelFutures.get(0).sync();
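The guard above only decides whether the periodic metrics task is scheduled at all. Nothing is started when the option is set. Here is a Python sketch of the same pattern, assuming a simple fixed-interval collector. `PeriodicCollector` and `schedule_system_metrics` are hypothetical names, not TorchServe or MetricManager APIs.

```python
import threading
from typing import Callable, Optional


class PeriodicCollector:
    """Hypothetical stand-in for a metrics scheduler: calls `collect`
    every `interval_s` seconds on a daemon thread until stopped."""

    def __init__(self, collect: Callable[[], None], interval_s: float):
        self._collect = collect
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() sets the event.
        while not self._stop.wait(self._interval_s):
            self._collect()

    def start(self) -> "PeriodicCollector":
        self._thread.start()
        return self

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


def schedule_system_metrics(collect: Callable[[], None], interval_s: float,
                            disabled: bool) -> Optional[PeriodicCollector]:
    # Mirrors the guard in startAndWait(): schedule nothing when disabled.
    if disabled:
        return None
    return PeriodicCollector(collect, interval_s).start()
```

Skipping the scheduling entirely, rather than scheduling a task that does nothing, means no timer thread exists when the option is set.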
@@ -66,6 +66,7 @@ public final class ConfigManager {
    private static final String TS_JOB_QUEUE_SIZE = "job_queue_size";
    private static final String TS_NUMBER_OF_GPU = "number_of_gpu";
    private static final String TS_METRICS_CONFIG = "metrics_config";
    private static final String TS_DISABLE_SYSTEM_METRICS = "disable_system_metrics";

    // IPEX config option that can be set at config.properties
    private static final String TS_IPEX_ENABLE = "ipex_enable";
@@ -373,6 +374,10 @@ public String getMetricsConfigPath() {
        return path;
    }

    public boolean isSystemMetricsDisabled() {
        return Boolean.parseBoolean(getProperty(TS_DISABLE_SYSTEM_METRICS, "false"));
    }

    public String getTsDefaultServiceHandler() {
        return getProperty(TS_DEFAULT_SERVICE_HANDLER, null);
    }
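`isSystemMetricsDisabled()` delegates to `java.lang.Boolean.parseBoolean`. That method returns true only when the string is non-null and equals "true", ignoring case. The Python sketch below mirrors that behaviour for ASCII input, assuming the property value reaches `parseBoolean` unchanged. It shows which values actually disable collection. `parse_boolean` and `is_system_metrics_disabled` are illustrative names only.

```python
def parse_boolean(value):
    """Python mirror of java.lang.Boolean.parseBoolean (ASCII input):
    True only when value is not None and equals "true", ignoring case."""
    return value is not None and value.lower() == "true"


def is_system_metrics_disabled(props):
    # Mirrors isSystemMetricsDisabled(): default "false" when the key is absent.
    return parse_boolean(props.get("disable_system_metrics", "false"))


for raw in ["true", "TRUE", "1", "yes", " true", "false"]:
    print(repr(raw), is_system_metrics_disabled({"disable_system_metrics": raw}))
# 'true' True / 'TRUE' True / '1' False / 'yes' False / ' true' False / 'false' False
```

So `disable_system_metrics=1` or `disable_system_metrics=yes` leaves system metrics enabled. Only a case-insensitive "true" turns them off.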
@@ -642,6 +647,8 @@ public String dumpConfigurations() {
                + prop.getProperty(TS_METRICS_FORMAT, METRIC_FORMAT_PROMETHEUS)
                + "\nEnable metrics API: "
                + prop.getProperty(TS_ENABLE_METRICS_API, "true")
                + "\nDisable system metrics: "
                + isSystemMetricsDisabled()
                + "\nWorkflow Store: "
                + (getWorkflowStore() == null ? "N/A" : getWorkflowStore())
                + "\nModel config: "
121 changes: 121 additions & 0 deletions test/pytest/test_metrics.py
@@ -1,6 +1,7 @@
import glob
import os
import platform
import re
import shutil
import time
from os import path
@@ -9,6 +10,18 @@
import test_utils

NUM_STARTUP_CFG = 0
SYSTEM_METRICS = [
    "CPUUtilization",
    "MemoryUsed",
    "MemoryAvailable",
    "MemoryUtilization",
    "DiskUsage",
    "DiskUtilization",
    "DiskAvailable",
    "GPUMemoryUtilization",
    "GPUMemoryUsed",
    "GPUUtilization",
]


def setup_module(module):
@@ -55,6 +68,33 @@ def run_log_location_var(custom_path=test_utils.ROOT_DIR, no_config_snapshots=Fa
assert len(glob.glob(custom_path + "/ts_log.log")) == 1


def register_densenet161_model_and_make_inference_request():
    test_utils.register_model("densenet161.mar", "densenet161")
    data_file = os.path.join(
        test_utils.REPO_ROOT, "examples/image_classifier/kitten.jpg"
    )
    with open(data_file, "rb") as input_data:
        requests.post(
            url="http://localhost:8080/predictions/densenet161", data=input_data
        )


def validate_system_metrics(present=True):
    assert len(glob.glob("logs/ts_metrics.log")) == 1
    ts_metrics_path = glob.glob("logs/ts_metrics.log")[0]
    assert os.path.getsize(ts_metrics_path) > 0

    system_metrics_regex = re.compile("|".join(SYSTEM_METRICS), flags=re.IGNORECASE)
    with open(ts_metrics_path, "rt") as ts_metrics_file:
        ts_metrics = ts_metrics_file.read()
        system_metrics = re.findall(system_metrics_regex, ts_metrics)

    if present:
        assert len(system_metrics) > 0
    else:
        assert len(system_metrics) == 0
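`validate_system_metrics` counts regex hits in the whole log file. A pure-function variant can be tested without a running server, and it reports *which* metric names appeared rather than a bare count. `find_system_metrics` is a hypothetical helper, not part of this PR. The sample string is made up for illustration and does not reproduce the real `ts_metrics.log` line format.

```python
import re

# Same names as SYSTEM_METRICS in test_metrics.py.
SYSTEM_METRICS = [
    "CPUUtilization", "MemoryUsed", "MemoryAvailable", "MemoryUtilization",
    "DiskUsage", "DiskUtilization", "DiskAvailable",
    "GPUMemoryUtilization", "GPUMemoryUsed", "GPUUtilization",
]

# Case-insensitive alternation, like system_metrics_regex in the test.
_PATTERN = re.compile("|".join(map(re.escape, SYSTEM_METRICS)), flags=re.IGNORECASE)
# Map lower-cased matches back to their canonical spelling.
_CANONICAL = {name.lower(): name for name in SYSTEM_METRICS}


def find_system_metrics(text: str) -> set:
    """Hypothetical helper: return the system metric names found in `text`."""
    return {_CANONICAL[match.lower()] for match in _PATTERN.findall(text)}


# Made-up sample text, not real ts_metrics.log output.
sample = "CPUUtilization=12.5\nGPUMemoryUsed=0\nRequests2XX=1"
print(sorted(find_system_metrics(sample)))  # ['CPUUtilization', 'GPUMemoryUsed']
```

With this helper, a check such as `assert not find_system_metrics(ts_metrics)` would make pytest's assertion output list the leaked metric names when the "disabled" tests fail.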


def test_logs_created():
    logs_created()
    global NUM_STARTUP_CFG
@@ -318,3 +358,84 @@ def test_metrics_location_var_snapshot_enabled_rdonly_dir():
        assert len(glob.glob(RDONLY_DIR + "/logs/ts_metrics.log")) == 0
    finally:
        del os.environ["METRICS_LOCATION"]


def test_collect_system_metrics_when_not_disabled():
    """
    Validates that system metrics are collected when not disabled
    """
    # Torchserve cleanup
    test_utils.torchserve_cleanup()
    # Remove existing logs if any
    for f in glob.glob("logs/*.log"):
        os.remove(f)

    try:
        test_utils.start_torchserve(
            model_store=test_utils.MODEL_STORE, no_config_snapshots=True, gen_mar=False
        )
        register_densenet161_model_and_make_inference_request()
        validate_system_metrics(present=True)
    finally:
        test_utils.torchserve_cleanup()


def test_disable_system_metrics_using_config_properties():
    """
    Validates that system metrics collection is disabled when "disable_system_metrics"
    configuration option is set to "true"
    """
    # Torchserve cleanup
    test_utils.torchserve_cleanup()
    # Remove existing logs if any
    for f in glob.glob("logs/*.log"):
        os.remove(f)

    config_file = test_utils.ROOT_DIR + "config.properties"
    with open(config_file, "w") as f:
        f.write("disable_system_metrics=true")

    try:
        test_utils.start_torchserve(
            model_store=test_utils.MODEL_STORE,
            snapshot_file=config_file,
            no_config_snapshots=True,
            gen_mar=False,
        )
        register_densenet161_model_and_make_inference_request()
        validate_system_metrics(present=False)
    finally:
        test_utils.torchserve_cleanup()
        os.remove(config_file)


def test_disable_system_metrics_using_environment_variable():
    """
    Validates that system metrics collection is disabled when TS_DISABLE_SYSTEM_METRICS
    environment variable is set to "true"
    """
    # Torchserve cleanup
    test_utils.torchserve_cleanup()
    # Remove existing logs if any
    for f in glob.glob("logs/*.log"):
        os.remove(f)

    config_file = test_utils.ROOT_DIR + "config.properties"
    with open(config_file, "w") as f:
        f.write("enable_envvars_config=true")

    os.environ["TS_DISABLE_SYSTEM_METRICS"] = "true"

    try:
        test_utils.start_torchserve(
            model_store=test_utils.MODEL_STORE,
            snapshot_file=config_file,
            no_config_snapshots=True,
            gen_mar=False,
        )
        register_densenet161_model_and_make_inference_request()
        validate_system_metrics(present=False)
    finally:
        test_utils.torchserve_cleanup()
        del os.environ["TS_DISABLE_SYSTEM_METRICS"]
        os.remove(config_file)