From 5f00e7843a20c4b7890633218cbd4e0e35b633ac Mon Sep 17 00:00:00 2001
From: Devin Robison
Date: Mon, 27 Nov 2023 15:27:17 -0700
Subject: [PATCH] Align model card requirements (#1388)

Authors:
  - Devin Robison (https://github.com/drobison00)

Approvers:
  - David Gardner (https://github.com/dagardner-nv)
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: https://github.com/nv-morpheus/Morpheus/pull/1388
---
 models/model-cards/abp-model-card.md          | 20 ---------
 models/model-cards/dfp-model-card.md          | 15 -------
 models/model-cards/gnn-fsi-model-card.md      | 27 ++++++------
 models/model-cards/phishing-model-card.md     | 17 --------
 .../root-cause-analysis-model-card.md         | 41 -------------------
 5 files changed, 13 insertions(+), 107 deletions(-)

diff --git a/models/model-cards/abp-model-card.md b/models/model-cards/abp-model-card.md
index 3f5777e8b3..e285203e27 100644
--- a/models/model-cards/abp-model-card.md
+++ b/models/model-cards/abp-model-card.md
@@ -94,10 +94,6 @@ limitations under the License.
 * Sample dataset consists of over 1000 nvidia-smi outputs
-**Dataset License:**
-
-* [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
-
 ## Evaluation Dataset:
 **Link:**
@@ -108,10 +104,6 @@ limitations under the License.
 * Sample dataset consists of over 1000 nvidia-smi outputs
-**Dataset License:**
-
-* [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
-
 ## Inference:
 **Engine:**
@@ -213,10 +205,6 @@ limitations under the License.
 * N/A
-### What training is recommended for developers working with this model?
-
-* Familiarity with the Morpheus SDK is recommended for developers working with this model.
-
 ### Link the relevant end user license agreement
 * [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
@@ -256,10 +244,6 @@ limitations under the License.
 * N/A
-### Technical robustness and model security validated?
-
-* No
-
 ### Is the model and dataset compliant with National Classification Management Society (NCMS)?
 * No
@@ -308,10 +292,6 @@ limitations under the License.
 * N/A
-### Scanned for malware?
-
-* No
-
 ### Are we able to identify and trace source of dataset?
 * Yes
diff --git a/models/model-cards/dfp-model-card.md b/models/model-cards/dfp-model-card.md
index 07e7ecfc2f..fd8d0758cd 100644
--- a/models/model-cards/dfp-model-card.md
+++ b/models/model-cards/dfp-model-card.md
@@ -83,9 +83,6 @@ The training dataset consists of AWS CloudTrail logs. It contains logs from two
 * [hammah-user123-training-part3.json](https://github.com/nv-morpheus/Morpheus/blob/branch-23.11/models/datasets/training-data/cloudtrail/hammah-user123-training-part3.json): 1000 records
 * [hammah-user123-training-part4.json](https://github.com/nv-morpheus/Morpheus/blob/branch-23.11/models/datasets/training-data/cloudtrail/hammah-user123-training-part4.json): 387 records
-**Dataset License:**
-* [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
-
 ## Evaluation Dataset:
 **Link:**
 * https://github.com/nv-morpheus/Morpheus/tree/branch-23.11/models/datasets/validation-data/cloudtrail
@@ -98,9 +95,6 @@ The evaluation dataset consists of AWS CloudTrail logs. It contains logs from tw
 * [hammah-user123-validation-part2.json](https://github.com/nv-morpheus/Morpheus/blob/branch-23.11/models/datasets/validation-data/cloudtrail/hammah-user123-validation-part2.json): 300 records
 * [hammah-user123-validation-part3.json](https://github.com/nv-morpheus/Morpheus/blob/branch-23.11/models/datasets/validation-data/cloudtrail/hammah-user123-validation-part3.json): 247 records
-**Dataset License:**
-* [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
-
 ## Inference:
 **Engine:**
 * PyTorch
@@ -179,9 +173,6 @@ The evaluation dataset consists of AWS CloudTrail logs. It contains logs from tw
 ### What are the potential known risks to users and stakeholders?
 * None
-### What training is recommended for developers working with this model? If none, please state "none."
-* Familiarity with the Morpheus SDK is recommended for developers working with this model.
-
 ### Link the relevant end user license agreement
 * [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
@@ -211,9 +202,6 @@ The evaluation dataset consists of AWS CloudTrail logs. It contains logs from tw
 ### Name target quality Key Performance Indicators (KPIs) for which this has been tested.
 * None
-### Technical robustness and model security validated?
-* No
-
 ### Is the model and dataset compliant with National Classification Management Society (NCMS)?
 * No
@@ -251,9 +239,6 @@ The evaluation dataset consists of AWS CloudTrail logs. It contains logs from tw
 ### Is data in dataset traceable?
 * No
-### Scanned for malware?
-* No
-
 ### Are we able to identify and trace source of dataset?
 * Yes ([fully synthetic dataset](https://github.com/nv-morpheus/Morpheus/tree/branch-23.11/models/datasets/training-data/cloudtrail))
diff --git a/models/model-cards/gnn-fsi-model-card.md b/models/model-cards/gnn-fsi-model-card.md
index 1b34dc1c0c..34115bf86a 100644
--- a/models/model-cards/gnn-fsi-model-card.md
+++ b/models/model-cards/gnn-fsi-model-card.md
@@ -81,9 +81,6 @@ This model is an example of a fraud detection pipeline using a graph neural netw
 **Properties (Quantity, Dataset Descriptions, Sensor(s)):**
 * A training data consists of raw 753 synthetic labeled credit card transaction data with data augmentation in a total of 12053 labeled transaction data.
-**Dataset License:**
-* [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
-
 ## Evaluation Dataset:
 **Link:**
 * [fraud-detection-validation-data.csv](models/dataset/fraud-detection-validation-data.csv)
@@ -91,9 +88,6 @@ This model is an example of a fraud detection pipeline using a graph neural netw
 **Properties (Quantity, Dataset Descriptions, Sensor(s)):**
 * Data consists of raw 265 labeled credit card transaction synthetically created
-**Dataset License:**
-* [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
-
 ## Inference:
 **Engine:**
 * Triton
@@ -124,16 +118,21 @@ This model is an example of a fraud detection pipeline using a graph neural netw
 ### What is the accent balance of the model validation data?
 * Not Applicable
+
 ### Describe measures taken to mitigate against unwanted bias.
 * Not Applicable
+
 ## Model Card ++ Explainability Subcard
 ### Name example applications and use cases for this model.
 * The model is primarily designed for testing purposes and serves as a small pretrained model specifically used to evaluate and validate the GNN FSI pipeline. Its application is focused on assessing the effectiveness of the pipeline rather than being intended for broader use cases or specific applications beyond testing.
+
 ### Fill in the blank for the model technique.
 * This model is designed for developers seeking to test the GNN fraud detection pipeline with a small pretrained model on a synthetic dataset.
+
 ### Name who is intended to benefit from this model.
 * The intended beneficiaries of this model are developers who aim to test the performance and functionality of the GNN fraud detection pipeline using synthetic datasets. It may not be suitable or provide significant value for real-world transactions.
+
 ### Describe the model output.
 * This model outputs fraud probability score b/n (0 & 1).
@@ -152,8 +151,6 @@ This model is an example of a fraud detection pipeline using a graph neural netw
 ### What are the potential known risks to users and stakeholders?
 * None
-### What training is recommended for developers working with this model? If none, please state "none."
-* Familiarity with the Morpheus SDK is recommended for developers working with this model.
 ### Link the relevant end user license agreement
 * [Apache 2.0](https://github.com/nv-morpheus/Morpheus/blob/branch-23.11/LICENSE)
@@ -170,17 +167,19 @@ This model is an example of a fraud detection pipeline using a graph neural netw
 ### Was model and dataset assessed for vulnerability for potential form of attack?
 * No
+
 ### Name applications for the model.
 * Used for testing fraud detection application in Morpheus pipeline, under the defined dataset schema description.
+
 ### Name use case restrictions for the model.
 * The model's use case is restricted to testing the Morpheus pipeline and may not be suitable for other applications.
+
 ### Has this been verified to have met prescribed quality standards?
 * No
 ### Name target quality Key Performance Indicators (KPIs) for which this has been tested.
 * Not Applicable
-### Technical robustness and model security validated?
-* Not Applicable
+
 ### Is the model and dataset compliant with National Classification Management Society (NCMS)?
 * Not Applicable
@@ -189,20 +188,19 @@ This model is an example of a fraud detection pipeline using a graph neural netw
 ### Are there access restrictions to systems, model, and data?
 * No
+
 ### Is there a digital signature?
 * No
 ## Model Card ++ Privacy Subcard
 ### Generatable or reverse engineerable personally-identifiable information (PII)?
-
 * Neither
 ### Was consent obtained for any PII used?
 * Not Applicable (Data is extracted from synthetically created credit card transaction,refer[3] for the source of data creation)
 ### Protected classes used to create this model? (The following were used in model the model's training:)
-
 * Not applicable
 ### How often is dataset reviewed?
@@ -210,17 +208,18 @@ This model is an example of a fraud detection pipeline using a graph neural netw
 ### Is a mechanism in place to honor data
 * Yes
+
 ### If PII collected for the development of this AI model, was it minimized to only what was required?
 * Not applicable
 ### Is data in dataset traceable?
 * No
-### Scanned for malware?
-* No
+
 ### Are we able to identify and trace source of dataset?
 * Yes
 ### Does data labeling (annotation, metadata) comply with privacy laws?
 * Not applicable
+
 ### Is data compliant with data subject requests for data correction or removal, if such a request was made?
 * Not applicable
\ No newline at end of file
diff --git a/models/model-cards/phishing-model-card.md b/models/model-cards/phishing-model-card.md
index 9070a972f7..2a81460e9f 100644
--- a/models/model-cards/phishing-model-card.md
+++ b/models/model-cards/phishing-model-card.md
@@ -96,10 +96,6 @@ limitations under the License.
 * Dataset consists of SMSs
-**Dataset License:**
-
-* https://creativecommons.org/licenses/by/4.0/legalcode taken from https://archive.ics.uci.edu/dataset/228/sms+spam+collection
-
 ## Evaluation Dataset:
 **Link:**
@@ -110,10 +106,6 @@ limitations under the License.
 * Dataset consists of SMSs
-**Dataset License:**
-
-* https://creativecommons.org/licenses/by/4.0/legalcode taken from https://archive.ics.uci.edu/dataset/228/sms+spam+collection
-
 ## Inference:
 **Engine:**
@@ -207,9 +199,6 @@ limitations under the License.
 ### What are the potential known risks to users and stakeholders?
 * N/A
-### What training is recommended for developers working with this model?
-* Familiarity with the Morpheus SDK is recommended for developers working with this model.
-
 ### Link the relevant end user license agreement
 * [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
@@ -240,9 +229,6 @@ limitations under the License.
 ### Name target quality Key Performance Indicators (KPIs) for which this has been tested.
 * N/A
-### Technical robustness and model security validated?
-* No
-
 ### Is the model and dataset compliant with National Classification Management Society (NCMS)?
 * No
@@ -281,9 +267,6 @@ limitations under the License.
 ### Is data in dataset traceable?
 * N/A
-### Scanned for malware?
-* No
-
 ### Are we able to identify and trace source of dataset?
 * N/A
diff --git a/models/model-cards/root-cause-analysis-model-card.md b/models/model-cards/root-cause-analysis-model-card.md
index 12d312bb3c..acbe4db6c2 100644
--- a/models/model-cards/root-cause-analysis-model-card.md
+++ b/models/model-cards/root-cause-analysis-model-card.md
@@ -95,10 +95,6 @@ limitations under the License.
 * kern.log files from DGX machines
-**Dataset License:**
-
-* [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
-
 ## Evaluation Dataset:
 **Link:**
@@ -109,10 +105,6 @@ limitations under the License.
 * kern.log files from DGX machines
-**Dataset License:**
-
-* [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
-
 ## Inference:
 **Engine:**
@@ -128,51 +120,39 @@ limitations under the License.
 ## Model Card ++ Bias Subcard
 ### What is the gender balance of the model validation data?
-
 * Not Applicable
 ### What is the racial/ethnicity balance of the model validation data?
-
 * Not Applicable
 ### What is the age balance of the model validation data?
-
 * Not Applicable
 ### What is the language balance of the model validation data?
-
 * Not Applicable
 ### What is the geographic origin language balance of the model validation data?
-
 * Not Applicable
 ### What is the educational background balance of the model validation data?
-
 * Not Applicable
 ### What is the accent balance of the model validation data?
-
 * Not Applicable
 ### What is the face/key point balance of the model validation data?
-
 * Not Applicable
 ### What is the skin/tone balance of the model validation data?
-
 * Not Applicable
 ### What is the religion balance of the model validation data?
-
 * Not Applicable
 ### Individuals from the following adversely impacted (protected classes) groups participate in model design and testing.
-
 * Not Applicable
 ### Describe measures taken to mitigate against unwanted bias.
-
 * Not Applicable
 ## Model Card ++ Explainability Subcard
@@ -206,9 +186,6 @@ limitations under the License.
 ### What are the potential known risks to users and stakeholders?
 * N/A
-### What training is recommended for developers working with this model?
-* Familiarity with the Morpheus SDK is recommended for developers working with this model.
-
 ### Link the relevant end user license agreement
 * [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
@@ -228,60 +205,45 @@ limitations under the License.
 * No
 ### Name applications for the model.
-
 * The primary application for this model is testing the Morpheus pipeline.
 ### Name use case restrictions for the model.
-
 * Different models need to be trained depending on the log types.
 ### Has this been verified to have met prescribed quality standards?
-
 * No
 ### Name target quality Key Performance Indicators (KPIs) for which this has been tested.
-
 * N/A
-### Technical robustness and model security validated?
-
-* No
-
 ### Is the model and dataset compliant with National Classification Management Society (NCMS)?
-
 * No
 ### Are there explicit model and dataset restrictions?
-
 * It is for pipeline testing purposes.
 ### Are there access restrictions to systems, model, and data?
-
 * No
 ### Is there a digital signature?
-
 * No
 ## Model Card ++ Privacy Subcard
 ### Generatable or reverse engineerable personally-identifiable information (PII)?
-
 * Neither
 ### Was consent obtained for any PII used?
 * N/A
 ### Protected classes used to create this model? (The following were used in model the model's training:)
-
 * N/A
 ### How often is dataset reviewed?
 * The dataset is initially reviewed upon addition, and subsequent reviews are conducted as needed or upon request for any changes.
 ### Is a mechanism in place to honor data subject right of access or deletion of personal data?
-
 * N/A
 ### If PII collected for the development of this AI model, was it minimized to only what was required?
@@ -290,9 +252,6 @@ limitations under the License.
 ### Is data in dataset traceable?
 * Original raw logs are not saved. The small sample in the repo is saved for testing the pipeline.
-### Scanned for malware?
-* No
-
 ### Are we able to identify and trace source of dataset?
 * N/A