
Add example model package #486

Open
Nic-Ma opened this issue Dec 16, 2021 · 2 comments
@Nic-Ma
Contributor

Nic-Ma commented Dec 16, 2021

Is your feature request related to a problem? Please describe.
Sub-task of ticket Project-MONAI/MONAI#3482.
Create an example of a model package.

@Nic-Ma Nic-Ma self-assigned this Dec 16, 2021
@Nic-Ma Nic-Ma added the good first issue Good for newcomers label Dec 16, 2021
@laurencejackson

laurencejackson commented Jan 21, 2022

Hi, in a MONAI Deploy WG meeting last week @ericspod mentioned that it might be useful to comment here on experiences with other app packaging frameworks. I just wanted to share my experience with the MLflow python_function "flavor". Unfortunately, I'm not able to share all the source code at this time, but I'll try to describe it as best I can with a short excerpt. The model itself is an NLP application for classifying endoscopy reports, based on a pre-trained BERT model from the transformers library.

The basis of the package is this wrapper class, which just requires __init__ and predict methods. At the end of my training script, an instance of this class is created using an input model (in this case a LightningModule) and the tokenizer. This wrapped model object is then logged to MLflow, where it can be deployed through MLflow's model serving framework.

"""
mlflow python_model wrapper class for Barretts model
"""
import mlflow
import pandas as pd
from torch import topk
from project.BarrettsDataset import BarrettsDataset
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class BarrettsWrapper(mlflow.pyfunc.PythonModel):

    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        self.dataset = BarrettsDataset(tokenizer=tokenizer)

    def predict(self, context, model_input):
        """Run inference on a dataframe of reports; return top-1 labels with per-class confidences."""
        logger.info('Running prediction service')
        # Tokenize/encode the raw report text column
        encoding = self.dataset.encoder(model_input.diag_final.tolist())

        _, test_prediction = self.model(encoding["input_ids"], encoding["attention_mask"])
        res = topk(test_prediction, 1).indices.tolist()

        # Per-class confidence scores plus the top-1 predicted label
        confidences = pd.DataFrame(test_prediction.tolist(), columns=self.dataset.label_columns)
        prediction = pd.DataFrame({'Prediction': [self.dataset.label_columns[x[0]] for x in res]})

        results = pd.concat([prediction, confidences], axis=1)
        return results

# *** training code goes here, "model" is trained LightningModule, test_df is example input dataframe ***

wrappedModel = BarrettsWrapper(model, tokenizer)
signature = mlflow.models.signature.infer_signature(test_df, wrappedModel.predict(None, test_df))
mlflow.pyfunc.log_model('barretts_nlp', python_model=wrappedModel, signature=signature, code_path=['.'])
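
For completeness, a minimal sketch of how a model logged this way can be loaded back for inference (the run ID in the model URI is a placeholder, not from my actual run):

# Minimal sketch: load the logged pyfunc model and run a prediction.
# "runs:/<run_id>/barretts_nlp" is a placeholder model URI.
import mlflow
import pandas as pd

loaded_model = mlflow.pyfunc.load_model("runs:/<run_id>/barretts_nlp")
sample = pd.DataFrame({"diag_final": ["example endoscopy report text"]})
print(loaded_model.predict(sample))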

Some things I think might be useful to mention:

  • From the attached PR it looks like TorchScript forms the basis of the packaged applications. I wasn't able to package this particular application with TorchScript, via either tracing or scripting, due to compatibility issues with the transformers library. I know this is an NLP model, so not necessarily within MONAI's remit, but it might be worth mentioning anyway, as there may be other package incompatibilities out there and the TorchScript logs were difficult to debug.
  • One helpful feature is that when logging the model to MLflow it is logged with a signature describing the input and output shapes; this is collected automatically by running the [infer_signature function](https://www.mlflow.org/docs/latest/python_api/mlflow.models.html#mlflow.models.infer_signature) on example inputs and outputs (see the short sketch after this list). I think something like this to partially automate/check the config .json files in [WIP] 486 Add example model package #487 would be useful.
  • Having a customisable wrapper class was useful since it abstracted away a lot of complexity but also left me free to customise the predict method. This meant I could add extra functionality on top of just calling the model, e.g. in this case I could add a few extra lines to put the output of the model into a dataframe for returning to the user.
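
For illustration, this is roughly what the signature mechanism looks like in isolation (the column names and label here are toy values, not from my application):

# Toy sketch of MLflow's infer_signature: given example inputs and
# outputs, it records the column names and types alongside the model.
import pandas as pd
from mlflow.models.signature import infer_signature

example_input = pd.DataFrame({"diag_final": ["report text"]})
example_output = pd.DataFrame({"Prediction": ["class_a"]})  # hypothetical label

signature = infer_signature(example_input, example_output)
print(signature)  # prints the inferred input/output schema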

I hope some of this rambling is useful; these are just a few things that come to mind. Very happy to discuss if you have any questions!

@Nic-Ma
Contributor Author

Nic-Ma commented Jan 25, 2022

Hi @laurencejackson ,

Thanks so much for the detailed sharing and feedback!
About the things you mentioned:

  1. Yes, TorchScript does have some compatibility issues with the latest PyTorch APIs. We also have transformer-based networks, like UNETR, etc., and @ahatamiz is also working on TorchScript support for them. We plan to support TorchScript for all the MONAI networks (though maybe not for every combination of network parameters); a rough sketch of the conversion step is below this list.
  2. Exactly, it's the same as the verification schema in our proposal. I put it into task 5 and task 6 of Develop MVP of model bundle MONAI#3482 and have already implemented a network verification example in: [WIP] 486 Add example model package #487.
  3. We plan to support hybrid programming in the model package, so you can easily write your wrapper logic in the Python script layer. Or, if you want to directly use the MONAI trainer or evaluator, you can refer to: https://github.com/Project-MONAI/MONAI/blob/dev/monai/engines/evaluator.py#L143.
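
For point 1, the conversion we have in mind is roughly the standard torch.jit flow (a minimal sketch; the network and its parameters here are illustrative only):

# Minimal sketch: exporting a MONAI network to TorchScript via scripting.
# BasicUNet and its parameters are used only as an illustration.
import torch
from monai.networks.nets import BasicUNet

net = BasicUNet(spatial_dims=3, in_channels=1, out_channels=2)
net.eval()

scripted = torch.jit.script(net)      # scripting preserves Python control flow
torch.jit.save(scripted, "model.ts")  # serialized network for the model package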

Thanks.
