diff --git a/models/README.md b/models/README.md
index c41c3256d1..b8f15633f2 100644
--- a/models/README.md
+++ b/models/README.md
@@ -1,6 +1,6 @@
 # Morpheus Models
 
-Pretrained models for Morpheus with corresponding training/validation scripts and datasets.
+Pretrained models for Morpheus, with corresponding training and validation scripts and datasets.
 
 ## Repo Structure
 Every Morpheus use case has a subfolder, **`<use-case>-models`**, that contains the model files for the use case. Training and validation datasets and scripts are also provided in [datasets](./datasets/), [training-tuning-scripts](./training-tuning-scripts/), and [validation-inference-scripts](./validation-inference-scripts/). Jupyter notebook (`.ipynb`) versions of the training and fine-tuning scripts are also provided.
@@ -15,6 +15,15 @@ In the root directory, the file `model-information.csv` contains the following i
 - **Use case** - Specific Morpheus use case the model targets
 - **Owner** - Name of the individual who owns the model
 - **Version** - Version of the model (major.minor.patch)
+ - **Model overview** - General description
+ - **Model architecture** - General model architecture
+ - **Training** - Training dataset and paradigm
+ - **How to use this model** - Circumstances where this model is useful
+ - **Input data** - Typical data that is used as input to the model
+ - **Output** - Type and format of model output
+ - **Out-of-scope use cases** - Use cases not envisioned during development
+ - **Ethical considerations** - Ethical analysis of risks and harms
+ - **References** - Resources used in model development
 - **Training epochs** - Number of epochs used during training
 - **Batch size** - Batch size used during training
 - **GPU model** - Family of GPU used during training
@@ -22,7 +31,6 @@ In the root directory, the file `model-information.csv` contains the following i
 - **Model F1** - F1 score of the model when tested
 - **Small test set accuracy** - Accuracy of model on validation data in datasets directory
 - **Memory footprint** - Memory required by the model
- - **Input data** - Typical data that is used as input to the model
 - **Thresholds** - Values of thresholds used for validation
 - **NLP hash file** - Hash file for tokenizer vocabulary
 - **NLP max length** - Max_length value for tokenizer
@@ -34,18 +42,109 @@ In the root directory, the file `model-information.csv` contains the following i
 - **Version Ubuntu** - Ubuntu version used during training
 - **Version Transformers** - Transformers version used during training
 
-## Current Use Cases Supported by Models Here
-### Sensitive Information Detection
-Sensitive information detection is used to identify pieces of sensitive data (e.g., AWS credentials, GitHub credentials, passwords) in unencrypted data. The model for this use case is an NLP model, specifically a transformer-based model with attention (e.g., mini-BERT).
+# Model Card Info
+## Sensitive Information Detection (SID)
+### Model Overview
+SID is a classifier designed to detect sensitive information (e.g., AWS credentials, GitHub credentials) in unencrypted data. This example model classifies text containing these 10 categories of sensitive information: address, bank account, credit card number, email address, government ID number, full name, password, phone number, secret keys, and usernames.
+### Model Architecture
+Compact BERT-mini transformer model
+### Training
+Training consisted of fine-tuning the original pretrained [model from Google](https://huggingface.co/google/bert_uncased_L-4_H-256_A-4). The labeled training dataset is 2 million synthetic PCAP payloads generated using the [faker package](https://github.com/joke2k/faker) to mimic sensitive and benign data found in nested JSON from web APIs and environment variables.
+### How To Use This Model
+This model is an example of customized transformer-based sensitive information detection. It can be further fine-tuned for specific detection needs or retrained for alternative categorizations using the fine-tuning scripts in the repo (see the sketch below).
+#### Input
+English text from PCAP payloads
+#### Output
+Multi-label sequence classification for 10 sensitive information categories
+### References
+Well-Read Students Learn Better: On the Importance of Pre-training Compact Models, 2019, https://arxiv.org/abs/1908.08962
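+
+The snippet below is a minimal, hypothetical inference sketch for a multi-label classifier like this one. The local model path and label names are illustrative assumptions, not artifacts shipped in this repo.
+```python
+# Hypothetical multi-label inference with a fine-tuned BERT-mini SID model.
+# The ./sid-minibert path and LABELS order are assumptions for illustration.
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+LABELS = ["address", "bank_account", "credit_card", "email", "government_id",
+          "name", "password", "phone_number", "secret_keys", "username"]
+
+tokenizer = AutoTokenizer.from_pretrained("./sid-minibert")
+model = AutoModelForSequenceClassification.from_pretrained("./sid-minibert")
+model.eval()
+
+payload = '{"aws_access_key_id": "AKIAIOSFODNN7EXAMPLE"}'
+inputs = tokenizer(payload, truncation=True, max_length=256,
+                   return_tensors="pt")
+with torch.no_grad():
+    logits = model(**inputs).logits            # shape: (1, 10)
+
+# Multi-label: an independent sigmoid per category, not a softmax across them.
+probs = torch.sigmoid(logits)[0]
+print([label for label, p in zip(LABELS, probs) if p > 0.5])
+```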
+
+## Phishing Email Detection
+### Model Overview
+This model is a binary classifier that differentiates between phishing and non-phishing emails.
+### Model Architecture
+BERT-base uncased transformer model
+### Training
+Training consisted of fine-tuning the original pretrained [model from Google](https://huggingface.co/bert-base-uncased). The labeled training dataset is around 20,000 emails from three public datasets ([CLAIR](https://www.kaggle.com/datasets/rtatman/fraudulent-email-corpus), [SpamAssassin](https://spamassassin.apache.org/old/publiccorpus/readme.html), [Enron](https://www.cs.cmu.edu/~./enron/)).
+### How To Use This Model
+This model is an example of customized transformer-based phishing email detection. It can be further fine-tuned for specific detection needs and customized to the emails of your enterprise using the fine-tuning scripts in the repo (see the sketch below).
+#### Input
+Entire email as a string
+#### Output
+Binary sequence classification as phishing or non-phishing
+### References
+- Radev, D. (2008), CLAIR collection of fraud email, ACL Data and Code Repository, ADCR2008T001, http://aclweb.org/aclwiki
+- Devlin, J. et al. (2018), BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, https://arxiv.org/abs/1810.04805
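+
+A minimal fine-tuning sketch under stated assumptions: the labeled emails live in a local `emails.csv` with `email` and `label` columns (neither the file nor the column names come from this repo), while the epoch count, batch size, and max length mirror the values recorded in `model-information.csv`.
+```python
+# Hypothetical fine-tuning loop for a binary phishing classifier.
+import pandas as pd
+import torch
+from torch.utils.data import DataLoader, TensorDataset
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+df = pd.read_csv("emails.csv")              # columns: email, label (0/1)
+tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+model = AutoModelForSequenceClassification.from_pretrained(
+    "bert-base-uncased", num_labels=2)
+
+enc = tokenizer(list(df["email"]), truncation=True, max_length=128,
+                padding=True, return_tensors="pt")
+dataset = TensorDataset(enc["input_ids"], enc["attention_mask"],
+                        torch.tensor(df["label"].values))
+loader = DataLoader(dataset, batch_size=32, shuffle=True)
+
+optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
+model.train()
+for epoch in range(3):
+    for input_ids, attention_mask, labels in loader:
+        # Passing labels makes the model compute a cross-entropy loss.
+        loss = model(input_ids=input_ids, attention_mask=attention_mask,
+                     labels=labels).loss
+        loss.backward()
+        optimizer.step()
+        optimizer.zero_grad()
+```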
+
 
-### Anomalous Behavior Profiling
-This use case is currently implemented to differentiate between crypto mining / GPU malware and other GPU-based workflows (e.g., ML/DL training). The model is a XGBoost model.
+## Anomalous Behavior Profiling (ABP)
+### Model Overview
+This model is an example of a binary classifier to differentiate between anomalous GPU behavior, such as crypto mining or GPU malware, and non-anomalous GPU-based workflows (e.g., ML/DL training).
+### Model Architecture
+XGBoost
+### Training
+Training consisted of ~1000 labeled nv-smi logs generated from processes running either GPU malware or benign GPU-based workflows.
+### How To Use This Model
+This model can be used to flag anomalous GPU activity (see the sketch below).
+#### Input
+nv-smi data
+#### Output
+Binary classification as anomalous or benign.
+### References
+Chen, T. & Guestrin, C. (2016), XGBoost: A Scalable Tree Boosting System, https://arxiv.org/abs/1603.02754
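+
+A rough scoring sketch: it loads the released booster file and scores a small frame of nvidia-smi-derived features. The column names here are placeholders; the real feature set is whatever the ABP training scripts in this repo derive from `nvidia-smi` output.
+```python
+# Score nvidia-smi snapshots with the released XGBoost booster.
+import pandas as pd
+import xgboost as xgb
+
+booster = xgb.Booster()
+booster.load_model("abp-nvsmi-xgb-20210310.bst")
+
+# One row per nvidia-smi snapshot (hypothetical feature columns).
+snapshots = pd.DataFrame({
+    "gpu_utilization": [98, 12],
+    "memory_used_mib": [15000, 800],
+    "power_draw_w":    [290, 60],
+})
+
+scores = booster.predict(xgb.DMatrix(snapshots))  # probability of mining/malware
+print([(s, "anomalous" if s > 0.5 else "benign") for s in scores])
+```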
 
-### Phishing Email Detection
-This use case is currently implemented to differentiate between phishing and non-phishing emails. The models for this use case are NLP models, specifically transformer-based models with attention (e.g., BERT).
+## Humans-As-Machines-Machines-As-Humans Detection (HAMMAH)
+### Model Overview
+This use case is currently implemented to detect changes in users' behavior that indicate a change from a human to a machine or a machine to a human.
+### Model Architecture
+The model is an ensemble of an Autoencoder and a fast Fourier transform reconstruction. The reconstruction loss of new log data through the trained Autoencoder is used as an anomaly score. Concurrently, the timestamps of user/entity activity are used for a time series analysis to flag activity with poor reconstruction after a fast Fourier transform.
+### Training
+The Autoencoder is trained on a baseline benign period of user activity.
+### How To Use This Model
+This model is one example of an Autoencoder trained on a baseline of benign activity from the synthetic users `user-123` and `role-g`. Combined with the validation data in the Morpheus examples, it can be used to test the HAMMAH Morpheus pipeline (see the sketch below); it has little utility outside of testing.
+#### Input
+aws-cloudtrail logs
+#### Output
+Anomaly score from the Autoencoder; binary classification from the time series anomaly detection
+### References
+- https://github.com/AlliedToasters/dfencoder/blob/master/dfencoder/autoencoder.py
+- https://github.com/rapidsai/clx/blob/branch-22.04/notebooks/anomaly_detection/FFT_Outlier_Detection.ipynb
+- Rasheed, Peng, Alhajj, Rokne: Fourier Transform Based Spatial Outlier Mining, 2009, https://link.springer.com/chapter/10.1007/978-3-642-04394-9_39
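+
+To make the ensemble concrete, here is a rough sketch of how the two signals could be computed. It leans on the dfencoder project referenced above, but the exact API calls, file names, and the 1-minute/20-component FFT choices are assumptions rather than the pipeline's implementation.
+```python
+# Sketch of the HAMMAH ensemble's two signals (illustrative only).
+import numpy as np
+import pandas as pd
+from dfencoder import AutoEncoder  # github.com/AlliedToasters/dfencoder
+
+# 1) Autoencoder anomaly score: reconstruction loss of new logs against a
+#    model fit on a benign baseline period (file names are hypothetical).
+baseline = pd.read_csv("user-123-baseline.csv", parse_dates=["timestamp"])
+new_logs = pd.read_csv("user-123-new.csv", parse_dates=["timestamp"])
+ae = AutoEncoder()
+ae.fit(baseline, epochs=25)
+ae_scores = ae.get_anomaly_score(new_logs)       # one score per log row
+
+# 2) FFT reconstruction: rebuild the per-minute activity-count series from
+#    its k dominant frequencies and flag poorly reconstructed timestamps.
+counts = new_logs.groupby(pd.Grouper(key="timestamp", freq="1min")).size()
+spectrum = np.fft.rfft(counts.values)
+k = 20
+spectrum[np.argsort(np.abs(spectrum))[:-k]] = 0  # zero all but k largest
+rebuilt = np.fft.irfft(spectrum, n=len(counts))
+residual = np.abs(counts.values - rebuilt)
+ts_flags = residual > residual.mean() + 4 * residual.std()
+```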
 
-### Humans-As-Machines-Machines-As-Humans Detection
-This use case is currently implemented to detect changes in users' behavior that indicate a change from a human to a machines or a machine to a human. The model is an ensemble of an autoencoder and fast fourier transform reconstruction.
+## Flexible Log Parsing
+### Model Overview
+This model is an example of using Named Entity Recognition (NER) for log parsing, specifically Apache web logs.
+### Model Architecture
+BERT-base cased transformer model with an NER classification layer
+### Training
+Training consisted of fine-tuning the original pretrained [model from Google](https://huggingface.co/bert-base-cased). The labeled training dataset is 1000 parsed Apache web logs from the public [logpai](https://github.com/logpai/loghub) dataset.
+### How To Use This Model
+This model is one example of a BERT model trained to parse raw logs. It can be used to parse Apache web logs or retrained to parse other types of logs as well (see the sketch below). The model file has a corresponding config.json file with the names of the fields it parses.
+#### Input
+raw Apache web logs
+#### Output
+parsed Apache web logs as JSON lines
+### References
+- Devlin, J. et al. (2018), BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, https://arxiv.org/abs/1810.04805
+- https://medium.com/rapids-ai/cybert-28b35a4c81c4
+- https://www.splunk.com/en_us/blog/it/how-splunk-is-parsing-machine-logs-with-machine-learning-on-nvidia-s-triton-and-morpheus.html
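+
+A hypothetical parsing sketch for a single log line. The local model path is an assumption, and in practice the field labels would come from the model's accompanying config.json rather than the naive token merging shown here.
+```python
+# NER-style parsing of one Apache log line into named fields.
+import json
+import torch
+from transformers import AutoTokenizer, AutoModelForTokenClassification
+
+model_dir = "./log-parsing-model"                  # assumed local export
+tokenizer = AutoTokenizer.from_pretrained(model_dir)
+model = AutoModelForTokenClassification.from_pretrained(model_dir)
+id2label = model.config.id2label                   # field names from config
+
+log = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
+enc = tokenizer(log, truncation=True, max_length=256, return_tensors="pt")
+
+with torch.no_grad():
+    pred_ids = model(**enc).logits.argmax(dim=-1)[0]
+
+# Naive merge: concatenate tokens that share a predicted field label.
+parsed = {}
+for tok_id, lab_id in zip(enc["input_ids"][0], pred_ids):
+    label = id2label[int(lab_id)]
+    token = tokenizer.decode([int(tok_id)]).strip()
+    if label != "O" and token not in tokenizer.all_special_tokens:
+        parsed[label] = (parsed.get(label, "") + " " + token).strip()
+
+print(json.dumps(parsed))                          # one JSON line per log line
+```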
 
-### Fraud detection system Detection
-This use case implemented to identify fraudulent transactions from legal transaction in credit card transaction network. The model is based on a combination of graph neural network and gradient boosting tree. It uses a bipartite heterogenous graph representation as input for GraphSAGE for feature learning and XGBoost as a classifier.
\ No newline at end of file
+## Fraud Detection
+### Model Overview
+This model shows an application of a graph neural network for fraud detection in a credit card transaction graph. A transaction dataset that includes three types of nodes (transaction, client, and merchant) is used for modeling. A combination of `GraphSAGE` and `XGBoost` is used to identify fraud in the transaction networks.
+### Model Architecture
+It uses a bipartite heterogeneous graph representation as input for `GraphSAGE` for feature learning and `XGBoost` as a classifier. Since the input graph is heterogeneous, a heterogeneous implementation of `GraphSAGE` (HinSAGE) is used for feature embedding.
+### Training
+The training data consists of 753 labeled credit card transactions, augmented to a total of 12,053 labeled transactions. `GraphSAGE` is trained to produce embedded representations of the transactions in the graph. `XGBoost` is trained on the embedded features as a binary classifier separating fraudulent from genuine transactions.
+### How To Use This Model
+This model is an example of a fraud detection pipeline using a graph neural network and gradient boosting trees (see the sketch below). It can be further retrained or fine-tuned for similar types of transaction networks with similar graph structures.
+#### Input
+Transaction data with nodes including transaction, client, and merchant.
+#### Output
+An anomaly score per transaction, indicating the probability that it is fraudulent.
+### References
+- https://stellargraph.readthedocs.io/en/stable/hinsage.html?highlight=hinsage
+- https://github.com/rapidsai/clx/blob/branch-0.20/examples/forest_inference/xgboost_training.ipynb
+- Rafaël Van Belle, Charles Van Damme, Hendrik Tytgat, Jochen De Weerdt, Inductive Graph Representation Learning for fraud detection, https://www.sciencedirect.com/science/article/abs/pii/S0957417421017449
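+
+A sketch of the second stage only: training `XGBoost` on transaction embeddings. The HinSAGE embedding step is elided and stood in for by random placeholders, and the embedding size, labels, and hyperparameters are assumptions rather than this model's actual configuration.
+```python
+# Stage 2 of the pipeline: XGBoost over GNN transaction embeddings.
+import numpy as np
+import xgboost as xgb
+from sklearn.model_selection import train_test_split
+
+rng = np.random.default_rng(0)
+n, dim = 12053, 64                       # 12,053 augmented transactions
+embeddings = rng.normal(size=(n, dim))   # placeholder for HinSAGE output
+labels = rng.integers(0, 2, size=n)      # 1 = fraud, 0 = genuine
+
+X_tr, X_te, y_tr, y_te = train_test_split(embeddings, labels, test_size=0.2)
+
+clf = xgb.XGBClassifier(n_estimators=100, max_depth=6, eval_metric="auc")
+clf.fit(X_tr, y_tr)
+
+# The "anomaly score": predicted probability of the fraud class.
+fraud_prob = clf.predict_proba(X_te)[:, 1]
+```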
\ No newline at end of file
diff --git a/models/model-information.csv b/models/model-information.csv
index 1cc1c062b9..a14ba4eb62 100644
--- a/models/model-information.csv
+++ b/models/model-information.csv
@@ -1,7 +1,7 @@
-model-name,use-case,owner,version,training-epochs,batch-size,gpu-model,model-accuracy,model-f1,small-test-set-accuracy,memory-footprint,input-data,thresholds,nlp.hash-file,nlp.max_length,nlp.stride,nlp.do_lower,nlp.do_truncate,version.cuda,version.pytorch,version.python,version.ubuntu,version.transformers
-abp-nvsmi-xgb-20210310.bst,anomalous-behavior-profiling,Gorkem Batmaz,0.1.0,10,N/A,V100,1,1,1,4KB,nvidia_smi,N/A,N/A,N/A,N/A,N/A,N/A,11.0,1.7.1,3.8.10,18.04.5 LTS,N/A
-phishing-bert-20211006.onnx,phishing-detection,Gorkem Batmaz,0.1.0,3,32,V100,0.99,0.984,0.999,427MB,emails,N/A,bert-base-uncased,128,N/A,TRUE,TRUE,11.0,1.7.1,3.8.10,18.04.5 LTS,4.6.1
-sid-minibert-20211021.onnx,sensitive-information-detection,Rachel Allen,0.2.0,1,32,V100,0.96,0.96,0.9875,43MB,L7_payload_raw,N/A,bert-base-uncased,256,64,TRUE,FALSE,11.0,1.8,3.8.10,18.04.5 LTS,4.5
-hammah-user123-20211017.pkl,humans-as-machines,Gorkem Batmaz,0.1.0,25,,V100,1,1,1,3MB,cloudtrail logs,"ae=4, ts=4",N/A,N/A,N/A,N/A,N/A,11.0,1.7.1,3.8.10,18.04.5 LTS,N/A
-hammah-role-g-20211017.pkl,humans-as-machines,Gorkem Batmaz,0.1.0,25,,V100,1,1,1,9MB,cloudtrail logs,"ae=4, ts=4",N/A,N/A,N/A,N/A,N/A,11.0,1.7.1,3.8.10,18.04.5 LTS,N/A
-hinsage-model.pt,Fraud-detection,Tad Zemicheal,0.1.0, 30 ,5 ,V100, NA, 0.96, NA, 756KB, transaction data,NA/0.5,N/A,N/A,N/A,N/A,N/A,11.0/11.4,1.9.1,3.8.10,18.04.5 LTS,N/A
\ No newline at end of file
+model-name,use-case,owner,version,model-overview,model-architecture,training,how-to-use-this-model,input-data,output,out-of-scope-use-cases,ethical-considerations,references,training-epochs,batch-size,gpu-model,model-accuracy,model-f1,small-test-set-accuracy,memory-footprint,thresholds,nlp.hash-file,nlp.max_length,nlp.stride,nlp.do_lower,nlp.do_truncate,version.cuda,version.pytorch,version.python,version.ubuntu,version.transformers
+abp-nvsmi-xgb-20210310.bst,anomalous-behavior-profiling,Gorkem Batmaz,0.1.0,"This model is an example of a binary classifier to differentiate between anomalous GPU behavior, such as crypto mining or GPU malware, and non-anomalous GPU-based workflows (e.g., ML/DL training).",XGBoost,Training consisted of ~1000 labeled nv-smi logs generated from processes running either GPU malware or benign GPU-based workflows.,This model can be used to flag anomalous GPU activity.,nvidia_smi,Binary classification as anomalous or benign.,This model version is trained on lab data. Use in your environment could require retraining.,N/A,"Chen, T. & Guestrin, C. (2016), XGBoost: A Scalable Tree Boosting System, https://arxiv.org/abs/1603.02754",10,N/A,V100,1,1,1,4KB,N/A,N/A,N/A,N/A,N/A,N/A,11.0,1.7.1,3.8.10,18.04.5 LTS,N/A
+phishing-bert-20211006.onnx,phishing-detection,Gorkem Batmaz,0.1.0,This model is a binary classifier that differentiates between phishing and non-phishing emails.,bert-base-uncased transformer model,"Training consisted of fine-tuning the original pretrained [model from Google](https://huggingface.co/bert-base-uncased). The labeled training dataset is around 20,000 emails from three public datasets ([CLAIR](https://www.kaggle.com/datasets/rtatman/fraudulent-email-corpus), [SpamAssassin](https://spamassassin.apache.org/old/publiccorpus/readme.html), [Enron](https://www.cs.cmu.edu/~./enron/)).",This model is an example of customized transformer-based phishing email detection. It can be further fine-tuned for specific detection needs and customized to the emails of your enterprise using the fine-tuning scripts in the repo.,"Entire email as a string, including header and formatting",Binary sequence classification as phishing or non-phishing.,This model version is designed for English-language text data. It may not perform well on other languages.,N/A,"[Radev, D. (2008), CLAIR collection of fraud email, ACL Data and Code Repository, ADCR2008T001](http://aclweb.org/aclwiki); Devlin, J. et al. (2018), [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)",3,32,V100,0.99,0.984,0.999,427MB,N/A,bert-base-uncased,128,N/A,TRUE,TRUE,11.0,1.7.1,3.8.10,18.04.5 LTS,4.6.1
+sid-minibert-20211021.onnx,sensitive-information-detection,Rachel Allen,0.2.0,"SID is a classifier designed to detect sensitive information (e.g., AWS credentials, GitHub credentials) in unencrypted data. This example model classifies text containing these 10 categories of sensitive information: address, bank account, credit card number, email address, government ID number, full name, password, phone number, secret keys, and usernames.",Compact BERT-mini transformer model,Training consisted of fine-tuning the original pretrained [model from Google](https://huggingface.co/google/bert_uncased_L-4_H-256_A-4). The labeled training dataset is 2 million synthetic PCAP payloads generated using the [faker package](https://github.com/joke2k/faker) to mimic sensitive and benign data found in nested JSON from web APIs and environment variables.,This model is an example of customized transformer-based sensitive information detection. It can be further fine-tuned for specific detection needs or retrained for alternative categorizations using the fine-tuning scripts in the repo.,English text from PCAP payloads,Multi-label sequence classification for 10 sensitive information categories,This model version is designed for English-language text data. It may not perform well on other languages.,N/A,"Well-Read Students Learn Better: On the Importance of Pre-training Compact Models, 2019, https://arxiv.org/abs/1908.08962",1,32,V100,0.96,0.96,0.9875,43MB,N/A,bert-base-uncased,256,64,TRUE,FALSE,11.0,1.8,3.8.10,18.04.5 LTS,4.5
+hammah-user123-20211017.pkl and hammah-role-g-20211017.pkl,digital-fingerprinting/humans-as-machines,Gorkem Batmaz,0.1.0,This use case is currently implemented to detect changes in users' behavior that indicate a change from a human to a machine or a machine to a human.,"The model is an ensemble of an Autoencoder and a fast Fourier transform reconstruction. The reconstruction loss of new log data through the trained Autoencoder is used as an anomaly score. Concurrently, the timestamps of user/entity activity are used for a time series analysis to flag activity with poor reconstruction after a fast Fourier transform.",The Autoencoder is trained on a baseline benign period of user activity.,This model is one example of an Autoencoder trained on a baseline of benign activity from the synthetic users `user-123` and `role-g`. Combined with the validation data in the Morpheus examples it can be used to test the HAMMAH Morpheus pipeline. It has little utility outside of testing.,aws-cloudtrail logs,"Anomaly score from the Autoencoder, binary classification from the time series anomaly detection",This particular model is an example based on a synthetic user's baseline behavior. Use on other datasets will require retraining.,N/A,"https://github.com/AlliedToasters/dfencoder/blob/master/dfencoder/autoencoder.py; https://github.com/rapidsai/clx/blob/branch-22.04/notebooks/anomaly_detection/FFT_Outlier_Detection.ipynb; Rasheed, Peng, Alhajj, Rokne: Fourier Transform Based Spatial Outlier Mining, 2009, https://link.springer.com/chapter/10.1007/978-3-642-04394-9_39",25,,V100,1,1,1,3MB and 9MB,"ae=4, ts=4",N/A,N/A,N/A,N/A,N/A,11.0,1.7.1,3.8.10,18.04.5 LTS,N/A
+hinsage-model.pt and xgb.pth,fraud-detection,Tad Zemicheal,0.1.0,"This model shows an application of a graph neural network for fraud detection in a credit card transaction graph. A transaction dataset that includes three types of nodes (transaction, client, and merchant) is used for modeling. A combination of `GraphSAGE` and `XGBoost` is used to identify fraud in the transaction networks.","It uses a bipartite heterogeneous graph representation as input for `GraphSAGE` for feature learning and `XGBoost` as a classifier. Since the input graph is heterogeneous, a heterogeneous implementation of `GraphSAGE` (HinSAGE) is used for feature embedding.","The training data consists of 753 labeled credit card transactions, augmented to a total of 12,053 labeled transactions. `GraphSAGE` is trained to produce embedded representations of the transactions in the graph. `XGBoost` is trained on the embedded features as a binary classifier separating fraudulent from genuine transactions.",This model is an example of a fraud detection pipeline using a graph neural network and gradient boosting trees. It can be further retrained or fine-tuned for similar types of transaction networks with similar graph structures.,"Transaction data with nodes including transaction, client, and merchant.","An anomaly score per transaction, indicating the probability that it is fraudulent.",These particular model files are based on a synthetic transaction graph. Use with other datasets will require retraining.,N/A,"https://stellargraph.readthedocs.io/en/stable/hinsage.html?highlight=hinsage; https://github.com/rapidsai/clx/blob/branch-0.20/examples/forest_inference/xgboost_training.ipynb; [Rafaël Van Belle, Charles Van Damme, Hendrik Tytgat, Jochen De Weerdt, Inductive Graph Representation Learning for fraud detection](https://www.sciencedirect.com/science/article/abs/pii/S0957417421017449)",30,5,V100,NA,0.96,NA,756KB,N/A and 0.5,N/A,N/A,N/A,N/A,N/A,11.0/11.4,1.9.1,3.8.10,18.04.5 LTS,N/A
+log-parsing-20220418.onnx,log-parsing,Rachel Allen,0.1.0,"This model is an example of using Named Entity Recognition (NER) for log parsing, specifically Apache web logs.",bert-base-cased transformer model,Training consisted of fine-tuning the original pretrained [model from Google](https://huggingface.co/bert-base-cased). The labeled training dataset is 1000 parsed Apache web logs from the public [logpai](https://github.com/logpai/loghub) dataset.,This model is one example of a BERT model trained to parse raw logs. It can be used to parse Apache web logs or retrained to parse other types of logs as well. The model file has a corresponding config.json file with the names of the fields it parses.,raw Apache web logs,parsed Apache web logs as JSON lines,This model version is designed for English-language text data. It may not perform well on other languages.,N/A,[1](https://arxiv.org/abs/1810.04805) [2](https://medium.com/rapids-ai/cybert-28b35a4c81c4) [3](https://www.splunk.com/en_us/blog/it/how-splunk-is-parsing-machine-logs-with-machine-learning-on-nvidia-s-triton-and-morpheus.html),2,32,V100,0.99,0.99,0.999,431MB,N/A,bert-base-cased,256,64,FALSE,FALSE,11.0,1.9.1,3.8.10,18.04.5 LTS,4.18
\ No newline at end of file
diff --git a/models/model_cards.csv b/models/model_cards.csv
deleted file mode 100644
index 9638b7a7b4..0000000000
--- a/models/model_cards.csv
+++ /dev/null
@@ -1,20 +0,0 @@
-Model name,Use case,Model description,Owner,Version,Max allowed error rate for pipeline test,Settings for pipeline variables,Memory footprint,Training epochs,Training batch size,Training GPU,Intended users,Intended use cases,Out-of-scope use cases,Metrics,Evaluation Data,Training Data,Ethical Considerations,References
-sid-minibert-20211021.onnx,sensitive-information-detection,"This model is a transformer-based sequence classifier (mini-bert) trained to detect sensitive data in unencrypted text. The ten categories of sensitive information include- address, bank account, credit card number, email address, government id number, full name, password, phone number, secret keys, and user names.",Rachel Allen,0.2.0,4%,for tokenizer: hash-file=bert-base-uncased max-length=256 stride=64 do-lower=TRUE do-truncation=FALSE,43MB,1,32,V100,cyber security and IT professionals,To detect leaked sensitive information from raw L7 payload data,This model version is designed for english language text data. It may not perform well on other languages.,F1=0.96,200k synthetic dataset with balanced classes for all 10 sensitive data labels, 2 million synethic pcap payloads generated using the faker repo to mimic sensitive and benign data found in nested jsons from web APIs and environmental variables,N/A,"Well-Read Students Learn Better: On the Importance of Pre-training Compact Models, 2019, arXiv:1908.08962v2"
-phishing-bert-20211006.onnx,phishing-email-detection,"This use case is currently implemeted to differentiate between phishing and non-phishing emails. The models for this use case are NLP models, specifically transformer-based models with attention (e.g., BERT).",Gorkem Batmaz,0.1.0,0.10%,for tokenizer: hash-file=bert-base-uncased max-length=128 stride=64 do-lower=TRUE do-truncation=FALSE,417MB,3,32,V100,cyber security and IT professionals,To detect phishing emails,This model version is designed for english language text data. It may not perform well on other languages.,F1=0.984,4946 labelled emails from three public datasets,19783 labelled emails from three public datasets,N/A,"Radev, D. (2008), CLAIR collection of fraud email, ACL Data and Code Repository, ADCR2008T001, http://aclweb.org/aclwiki
-https://www.kaggle.com/rtatman/fraudulent-email-corpus *
-https://www.cs.cmu.edu/~./enron/
-https://spamassassin.apache.org/old/publiccorpus/readme.html
-https://github.com/huggingface/transformers/tree/master/examples#
-https://www.depends-on-the-definition.com/named-entity-recognition-with-bert/
-https://github.com/ThilinaRajapakse/pytorch-transformers-classification
-https://mccormickml.com/2019/07/22/BERT-fine-tuning/"
-abp-nvsmi-xgb-20210310.bst,anomalous-behavior-profiling,"This use case is currently implemeted to differentiate between crypto mining / GPU malware and other GPU-based workflows (e.g., ML/DL training). The model is a XGBoost model.",Gorkem Batmaz,0.1.1,TBD,N/A,3KB,5,N/,V100,cyber security and IT professionals,To detect crypto mining,,Accuracy=1,248 labelled nv-smi logs,994 labelled nv-smi logs,N/A,"https://docs.rapids.ai/api/cuml/stable/
-https://medium.com/rapids-ai/rapids-forest-inference-library-prediction-at-100-million-rows-per-second-19558890bc35
-https://developer.nvidia.com/nvidia-system-management-interface
-https://developer.nvidia.com/morpheus-cybersecurity
-https://github.com/rapidsai/clx/blob/branch-0.20/examples/forest_inference/xgboost_training.ipynb"
-hinsage_model.pt and xgb-model.pt,Fraud detection using GNN,"This use case a graph neural network based fraud detecter in transaction data
-.It uses a bipartite heteregenous graph representation as input for GraphSAGE for feature learning and XGBoost as a classifier.", Tad Zemicheal,0.1.1,TBD,N/A,756KB,25, 5 ,V100,cyber security and IT professionals,To detect fraud in transaction data,4%,AUC=0.96,265 labelled credit card transaction, 753 labelled credit card transaction data, N/A,"
-https://stellargraph.readthedocs.io/en/stable/hinsage.html?highlight=hinsage,
-https://github.com/rapidsai/clx/blob/branch-0.20/examples/forest_inference/xgboost_training.ipynb"
-hammah-user123-20211017-dill.pkl,hammah,This use case is currently implemeted to detect changes in users' behavior that incate a change from a human to a machines or a machine to a human. The model is an ensemble of an autoencoder and fast fourier transform reconstruction.,Gorkem Batmaz,0.1,TBD,N/A,2.62MB,25,256,V100,cyber security and IT professionalsTo detect anomalies,The Autoencoder part needs retraining for different use cases or log types,,F1=1,847 rows of cloudtrail logs,3387 rows of cloudtrail logs,N/A,https://github.com/AlliedToasters/dfencoder/blob/master/dfencoder/autoencoder.py https://github.com/rapidsai/clx/blob/branch-22.04/notebooks/anomaly_detection/FFT_Outlier_Detection.ipynb Rasheed Peng Alhajj Rokne Jon: Fourier Transform Based Spatial Outlier Mining 2009