fix failed to create model archive #1508

Merged
merged 5 commits into from
Mar 19, 2022

Conversation

Collaborator

@lxning lxning commented Mar 15, 2022

Description

Please include a summary of the feature or issue being fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes #(issue)

Type of change

#1498

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the tests [UT/IT] that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A

  • Test B

  • Logs

  1. config.properties

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081

number_of_netty_threads=32
job_queue_size=1000

vmargs=-Xmx4g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError
prefer_direct_buffer=True

default_response_timeout=300
unregister_model_timeout=300
install_py_dep_per_model=true
default_service_handler=./model_store/noop_no_archive_no_version/service.py:handle

  2. tree model_store
    model_store
    └── noop_no_archive_no_version
        ├── model.pt
        └── service.py

  3. torchserve --ncs --start --model-store model_store --models noop_no_archive_no_version --ts-config config.properties

  4. server log:
    Config file: config.properties
    Inference address: http://0.0.0.0:8080
    Management address: http://0.0.0.0:8081
    Metrics address: http://127.0.0.1:8082
    Model Store: /Volumes/workplace/python_env/serve/model_store
    Initial Models: noop_no_archive_no_version
    Log dir: /Volumes/workplace/python_env/serve/logs
    Metrics dir: /Volumes/workplace/python_env/serve/logs
    Netty threads: 32
    Netty client threads: 0
    Default workers per model: 12
    Blacklist Regex: N/A
    Maximum Response Size: 6553500
    Maximum Request Size: 6553500
    Limit Maximum Image Pixels: true
    Prefer direct buffer: True
    Allowed Urls: [file://.|http(s)?://.]
    Custom python dependency for model allowed: true
    Metrics report format: prometheus
    Enable metrics API: true
    Workflow Store: /Volumes/workplace/python_env/serve/model_store
    Model config: N/A
    2022-03-14T18:58:43,154 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
    2022-03-14T18:58:43,186 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: noop_no_archive_no_version
    2022-03-14T18:58:43,190 [WARN ] main org.pytorch.serve.archive.model.ModelArchive - Model archive version is not defined. Please upgrade to torch-model-archiver 0.2.0 or higher
    2022-03-14T18:58:43,190 [WARN ] main org.pytorch.serve.archive.model.ModelArchive - Model archive createdOn is not defined. Please upgrade to torch-model-archiver 0.2.0 or higher
    2022-03-14T18:58:43,193 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model noop_no_archive_no_version
    2022-03-14T18:58:43,193 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model noop_no_archive_no_version
    2022-03-14T18:58:43,194 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model noop_no_archive_no_version loaded.
    2022-03-14T18:58:43,194 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: noop_no_archive_no_version, count: 12
    2022-03-14T18:58:43,216 [DEBUG] W-9002-noop_no_archive_no_version_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/Users/lninga/opt/anaconda3/envs/py38/bin/python, /Users/lninga/opt/anaconda3/envs/py38/lib/python3.8/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /var/folders/w6/s5gp9htn2pb9z87lwp6fzjg9hv4nys/T//.ts.sock.9002]
    2022-03-14T18:58:43,216 [DEBUG] W-9005-noop_no_archive_no_version_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/Users/lninga/opt/anaconda3/envs/py38/bin/python, /Users/lninga/opt/anaconda3/envs/py38/lib/python3.8/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /var/folders/w6/s5gp9htn2pb9z87lwp6fzjg9hv4nys/T//.ts.sock.9005]
    2022-03-14T18:58:43,216 [DEBUG] W-9006-noop_no_archive_no_version_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/Users/lninga/opt/anaconda3/envs/py38/bin/python, /Users/lninga/opt/anaconda3/envs/py38/lib/python3.8/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /var/folders/w6/s5gp9htn2pb9z87lwp6fzjg9hv4nys/T//.ts.sock.9006]
    2022-03-14T18:58:43,216 [DEBUG] W-9008-noop_no_archive_no_version_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/Users/lninga/opt/anaconda3/envs/py38/bin/python, /Users/lninga/opt/anaconda3/envs/py38/lib/python3.8/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /var/folders/w6/s5gp9htn2pb9z87lwp6fzjg9hv4nys/T//.ts.sock.9008]
    2022-03-14T18:58:43,216 [DEBUG] W-9001-noop_no_archive_no_version_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/Users/lninga/opt/anaconda3/envs/py38/bin/python, /Users/lninga/opt/anaconda3/envs/py38/lib/python3.8/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /var/folders/w6/s5gp9htn2pb9z87lwp6fzjg9hv4nys/T//.ts.sock.9001]
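
The `default_service_handler` value in the config above uses a `path:function` form. As a rough illustration (not TorchServe's own parser), the sketch below reads a properties-style text and splits that value into a file path and an entry-point name. The helper names `parse_properties` and `split_handler` are hypothetical, and the sketch assumes the path itself contains no colon after the last separator.

```python
from typing import Dict, Optional, Tuple

# A subset of the config.properties shown in the description.
CONFIG = """\
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
job_queue_size=1000
vmargs=-Xmx4g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError
default_service_handler=./model_store/noop_no_archive_no_version/service.py:handle
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def split_handler(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'path/to/service.py:handle' into (path, function name).

    Returns (spec, None) when no ':' is present.
    """
    path, sep, func = spec.rpartition(":")
    if not sep:
        return spec, None
    return path, func


props = parse_properties(CONFIG)
handler_path, handler_func = split_handler(props["default_service_handler"])
print(handler_path, handler_func)
```

Splitting on the last colon keeps the relative path intact; a Windows drive-letter path without a function suffix would need extra handling that this sketch leaves out.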

Checklist:

  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@lxning lxning added the bug Something isn't working label Mar 15, 2022
@lxning lxning added this to the v0.6.0 milestone Mar 15, 2022
@lxning lxning self-assigned this Mar 15, 2022
@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: 67e214b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 67e214b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 67e214b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)


@@ -70,6 +70,11 @@ public static ModelArchive downloadModel(

        if (new File(url).isDirectory()) {
            return load(url, new File(url), false);
        } else if (modelLocation.exists()) {
Collaborator

@lxning How would we handle the case where there are multiple mar files in a directory? Are we relying on the user to have only one mar file in the /xxx/model_store/modelXXX directory? If so, we need to make sure this is documented clearly.

Do we have any documentation on it?

Member

I understood this PR as no longer needing a model_store and just letting users link to various model files directly.

Collaborator Author

In the model_store directory, the existing code ignores mar files in a subdirectory (i.e. /xxx/model_store/modelXXX).
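
The two branches visible in the hunk above can be read as a lookup order. The following is a minimal Python sketch of that order, not the Java implementation: `resolve_model_dir` is a hypothetical name, and the assumption that the second branch resolves the model name relative to the model store is an inference rather than something the hunk shows.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_model_dir(model_store: Path, url: str) -> Optional[Path]:
    """Return the directory a model would be loaded from, or None.

    Mirrors the order of the two visible branches only; archive (.mar)
    handling and remote URLs are deliberately out of scope.
    """
    direct = Path(url)
    if direct.is_dir():
        # Branch 1: the url itself points at a directory.
        return direct
    candidate = model_store / url
    if candidate.exists():
        # Branch 2 (assumed): the url is a name inside the model store.
        return candidate
    return None


# Small demonstration in a throwaway directory.
_tmp = Path(tempfile.mkdtemp())
store = _tmp / "model_store"
(store / "noop_sketch_model").mkdir(parents=True)

by_name = resolve_model_dir(store, "noop_sketch_model")
by_path = resolve_model_dir(store, str(store / "noop_sketch_model"))
missing = resolve_model_dir(store, "no_such_sketch_model")
print(by_name, by_path, missing)
```

Passing a bare model name, as in `--models noop_no_archive_no_version`, would take the second branch in this sketch, while a full directory path would take the first.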

@maaquib
Collaborator

maaquib commented Mar 15, 2022

@lxning Can you add more details as to why this change is being made? What changed that this stopped working?

"modelServerVersion": "1.0",
"implementationVersion": "1.0",
"specificationVersion": "1.0"
}
Member

Do we need to store a model.pt in version control? Is it just an empty file?

Collaborator Author

Here, noop is an example. model.pt is an empty file. Manifest requires it to pass validation.
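
For readers who want to reproduce the layout discussed here, the sketch below builds a scratch model directory that mirrors the `tree model_store` output in the description: an empty `model.pt` next to a `service.py`. It is an illustration under assumptions: `make_noop_model_dir` is a hypothetical helper, and the handler body is a stand-in, not the `NoopService` code from this PR.

```python
import tempfile
from pathlib import Path

# Stand-in handler source; the real service.py in the PR is not reproduced here.
SERVICE_SOURCE = '''\
def handle(data, context):
    """Placeholder entry point: return one dummy result per request."""
    if data is None:
        return None
    return ["OK"] * len(data)
'''


def make_noop_model_dir(model_store: Path, name: str) -> Path:
    """Create model_store/name containing an empty model.pt and a service.py."""
    model_dir = model_store / name
    model_dir.mkdir(parents=True, exist_ok=True)
    # Empty file: it carries no weights and exists only so the file is present.
    (model_dir / "model.pt").touch()
    (model_dir / "service.py").write_text(SERVICE_SOURCE)
    return model_dir


scratch = Path(tempfile.mkdtemp())
model_dir = make_noop_model_dir(scratch / "model_store", "noop_no_archive_no_version")
print(sorted(p.name for p in model_dir.iterdir()))
```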

import time


class NoopService(object):
Member

I'm not sure I follow why a NoopService is needed; it seems very similar to the base handler. Could you please explain the design of this PR at a high level in the GitHub issue?

Collaborator Author

@lxning lxning Mar 16, 2022

The example noop_no_archive_no_version is copied from the original noop_no_archive. I changed it slightly (i.e. removed modelversion from manifest.json) so that it can serve as a unit test for the default manifest bug fix.
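
The server log in the description shows the fallback this example exercises: with no version defined, the model is registered as version 1.0. A rough Python sketch of that kind of defaulting follows. The key names `model` and `modelVersion` and the helper `with_default_version` are assumptions for illustration; they are not taken from this PR's manifest or from the Java code.

```python
import copy
import json

# Matches "Adding new version 1.0" in the server log above.
DEFAULT_MODEL_VERSION = "1.0"


def with_default_version(manifest: dict) -> dict:
    """Return a copy of the manifest with a model version filled in if absent.

    The input is left untouched so callers can still tell whether the
    version was originally defined.
    """
    result = copy.deepcopy(manifest)
    model = result.setdefault("model", {})  # hypothetical manifest layout
    if not model.get("modelVersion"):
        model["modelVersion"] = DEFAULT_MODEL_VERSION
    return result


manifest = json.loads('{"model": {"modelName": "noop_no_archive_no_version"}}')
patched = with_default_version(manifest)
explicit = with_default_version({"model": {"modelVersion": "2.0"}})
print(json.dumps(patched, indent=2))
```

Returning a copy rather than mutating the input keeps the "version not defined" warning path and the defaulting step independent of each other.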

@msaroufim msaroufim self-requested a review March 16, 2022 18:41
@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: a513eaf
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: a513eaf
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: a513eaf
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: 338913e
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 338913e
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-neo-ci-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 338913e
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)


@lxning lxning merged commit b8f19c4 into master Mar 19, 2022
@msaroufim msaroufim deleted the issue_1498 branch June 16, 2022 01:37
Labels
bug Something isn't working

5 participants