Skip to content

Project targets to detect errors and anomalies in hardware behaviour during tests. Part of this work is already incorporated into the team’s internal tools and is in sharp operation. The rest of the bachelor thesis serves as a prototype to test other functionalities such as machine learning and visualisation.

Notifications You must be signed in to change notification settings

simonf-dev/ml-anomaly-detection

Repository files navigation

Prepare virtual env

Tutorial for Ubuntu: This tutorial is mainly for Ubuntu, but it should work on other Unix distributions too, just change installation and specific commands for your distribution. 1.First install python3.6 to your system(only version of pyton3.6 is supported).

  1. We need to download correct version of install pip to it and create venv. Its done by single command.
    sudo make prepare-venv

  2. Activate venv. source venv/bin/activate

Tutorial in general

This tool is for anomaly detection in automatized tests from CQE team in the company Red Hat. Tests are executed on the Beaker platform. Then its used pcplog tool from './dist' (which is part of the bachelor thesis) to extract all important information from PCP archives and metadata files. All information are saved to JSON outputs(all output_NUMBER.json outputs). Then are these JSON outputs parsed by number of nodes, types of metrics and types of test_cases to folder './data'. In the parsing workflow are solved problems as missing values etc. Datasets are saved by some hierarchy in that folder and then are used by ML. There are also pretrained models in './data' folder for every test_case,number of nodes etc. Now we have 4 types of functionality in the tool. All functionality options are executed from command line by starting main.py with commands and opts.

  1. We can get output_{ID}.json from pcp_archive and metadata files. (Not working right now, because I need to get data without critical information)

  2. We can save data from output_{ID}.json to data folder which is used by ML. Data are parsed by number of nodes, test_cases and then saved in some hierarchy defined in settings.py for data folder. If the dataset already exists, its replaced and no duplications arise.

  3. We can get ML results for concrete output_{ID}.json, models tell us if there are some anomalies or not. We can get output in logical or readable format.

  4. We can retrain model for specific test_case, number of nodes, type of data etc. Its taken default autoencoder model and data from root folder. Model is then saved in root folder and also threshold is saved for it.

  5. We can get plot picture for concrete PCP output json. Its good for visualization, what we do we measure and to see example of data for concrete jobs.

I made it like a console application for an easy use and reproducibility, but architecture should be done to adapt for REST API etc.

Fun starts

We have 4 main functions of this tool. Functionality is also combined with installed pcplog tool. Lets try, what our tool offers.
python3.6 main.py --help

Now we can see basic help to identify all commands. Lets try execute some ML analysis.
python3.6 main.py get-result --help

We have some prepared data in folder exemplary_data. We have trained models for all testcases, so now we can check if our exemplary_data are OK or there are anomalies in some testcases.
python3.6 main.py get-result -i exemplary_data/output_114711.json -d ./data

We can see, if there were found some anomalies or not. If we want to logical output, we can just use this option.
python3.6 main.py get-result -i exemplary_data/output_114711.json -d ./data --logical-output

Be free to try other options from --help and also another files from 'exemplary_data'. You can also use data from the folder 'pcp_raw_data', but models are trained from them and results wouldn't tell too much.

Now we can train another function. Imagine situation that we made ML analysis on some output as above. Now we know, that output is correct and we want to save him to next retrain of the model.
python3.6 main.py save-result -o ./data -i ./exemplary_data/output_114711.json

What if only one of the testcase ended correct and we only want to save that concrete testcase?
python3.6 main.py save-result -l testcase_3 -o ./data -i ./exemplary_data/output_114711.json

Be free to use this command more times, because every output has unique id and we avoid duplications by changing row in the ML data.

So now we saved some new data and we want to retrain the model. Retraining of the model could be really hard for your CPU or GPU and also could be very long. So be free to skip this step. We will only retrain model for testcase_3 and only for cpus metrics.
python3.6 main.py retrain-model -l testcase_3 -d ./data --skip-memory

Now we can print plots and save them to some file.
python3.6 main.py get-picture -i ./exemplary_data/output_114711.json -o example_picture.svg

TODO: Last functionality of the tool is to generate output JSON from pcp_archive and metadata files(that JSON files, that are used in other functionality options), we need to generate archives on nonlicensed distributions, so functionality is working ok, but there are no available data yet.

About

Project targets to detect errors and anomalies in hardware behaviour during tests. Part of this work is already incorporated into the team’s internal tools and is in sharp operation. The rest of the bachelor thesis serves as a prototype to test other functionalities such as machine learning and visualisation.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published