MalwareMemoryAnalysis

Definition of the Business Problem

Malicious software, or malware, is a program or code designed to harm, damage, or disable computers, applications, systems, or mobile phones. The task is therefore to classify running software as benign or malware and, when it is malware, to identify which type of malware it is.

Data set:

The dataset was created to represent a real-world situation as closely as possible, using malware that is prevalent in the real world. Made up of Spyware, Ransomware and Trojan Horse malware, it provides a balanced dataset that can be used to test obfuscated malware detection systems.

CIC-MalMem-2022: https://www.unb.ca/cic/datasets/malmem-2022.html

Features Selection:

The data set features are:

pslist.nproc pslist.nppid pslist.avg_threads pslist.nprocs64bit pslist.avg_handlers

dlllist.ndlls dlllist.avg_dlls_per_proc

handles.nhandles handles.avg_handles_per_proc handles.nport handles.nfile handles.nevent handles.ndesktop handles.nkey handles.nthread handles.ndirectory handles.nsemaphore handles.ntimer handles.nsection handles.nmutant

ldrmodules.not_in_load ldrmodules.not_in_init ldrmodules.not_in_mem ldrmodules.not_in_load_avg ldrmodules.not_in_init_avg ldrmodules.not_in_mem_avg

malfind.ninjections malfind.commitCharge malfind.protection malfind.uniqueInjections

psxview.not_in_pslist psxview.not_in_eprocess_pool psxview.not_in_ethread_pool psxview.not_in_pspcid_list psxview.not_in_csrss_handles psxview.not_in_session psxview.not_in_deskthrd psxview.not_in_pslist_false_avg psxview.not_in_eprocess_pool_false_avg psxview.not_in_ethread_pool_false_avg psxview.not_in_pspcid_list_false_avg psxview.not_in_csrss_handles_false_avg psxview.not_in_session_false_avg psxview.not_in_deskthrd_false_avg

modules.nmodules

svcscan.nservices svcscan.kernel_drivers svcscan.fs_drivers svcscan.process_services svcscan.shared_process_services svcscan.interactive_process_services svcscan.nactive

callbacks.ncallbacks callbacks.nanonymous callbacks.ngeneric
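The part of each feature name before the dot groups related features together (pslist, handles, malfind, and so on). A minimal sketch of grouping names by that prefix is shown here; it uses a shortened, illustrative subset of the names rather than all 55 columns, and the helper `group_by_prefix` is a hypothetical name introduced for this example.

```python
from collections import defaultdict

# Shortened, illustrative subset of the feature names listed above.
feature_names = [
    "pslist.nproc",
    "pslist.avg_threads",
    "handles.nhandles",
    "handles.nmutant",
    "malfind.ninjections",
    "svcscan.nservices",
]


def group_by_prefix(names):
    """Map each prefix (text before the first dot) to its feature names."""
    groups = defaultdict(list)
    for name in names:
        prefix, _, _ = name.partition(".")
        groups[prefix].append(name)
    return dict(groups)


groups = group_by_prefix(feature_names)
for prefix, members in groups.items():
    print(prefix, len(members))
```

Applied to the real column list (for example `MalMem2022.columns[:55]`), the same helper would give a quick count of how many features each group contributes.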

Exploratory Data Analysis

Extracting and Loading Data

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Function to load the dataset
def importdata():
    MalMem2022 = pd.read_csv('/content/Obfuscated-MalMem2022.csv')

    # Printing the dataset shape
    print("Dataset of Malware_memory_Analysis_2022 Length is: ", len(MalMem2022))
    print("Dataset of Malware_memory_Analysis_2022 Shape is: ", MalMem2022.shape)

    # Printing the dataset observations
    print("Dataset of Malware_memory_Analysis_2022 is: ", MalMem2022.head())
    return MalMem2022

Dataset of Malware_memory_Analysis_2022 Length is: 58596

Dataset of Malware_memory_Analysis_2022 Shape is: (58596, 58)

Dataset of Malware_memory_Analysis_2022 is: pslist.nproc pslist.nppid pslist.avg_threads pslist.nprocs64bit
0 45 17 10.555556 0
1 47 19 11.531915 0
2 40 14 14.725000 0
3 32 13 13.500000 0
4 42 16 11.452381 0

pslist.avg_handlers dlllist.ndlls dlllist.avg_dlls_per_proc
0 202.844444 1694 38.500000
1 242.234043 2074 44.127660
2 288.225000 1932 48.300000
3 264.281250 1445 45.156250
4 281.333333 2067 49.214286

handles.nhandles handles.avg_handles_per_proc handles.nport ...
0 9129 212.302326 0 ...
1 11385 242.234043 0 ...
2 11529 288.225000 0 ...
3 8457 264.281250 0 ...
4 11816 281.333333 0 ...

svcscan.process_services svcscan.shared_process_services
0 24 116
1 24 118
2 27 118
3 27 118
4 24 118

svcscan.interactive_process_services svcscan.nactive
0 0 121
1 0 122
2 0 120
3 0 120
4 0 124

callbacks.ncallbacks callbacks.nanonymous callbacks.ngeneric Class
0 87 0 8 Benign
1 87 0 8 Benign
2 88 0 8 Benign
3 88 0 8 Benign
4 87 0 8 Benign

Category SubCategory
0 Benign Benign
1 Benign Benign
2 Benign Benign
3 Benign Benign
4 Benign Benign

[5 rows x 58 columns]
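The preview ends with three label columns (Class, Category, SubCategory). Since the dataset is described as balanced, a natural next step in exploration is to count the rows per label. The following is a minimal sketch under stated assumptions: the tiny frame and its label values are made up to stand in for the real CSV, and `label_counts` is a hypothetical helper name.

```python
import pandas as pd

# Tiny made-up stand-in for the real dataset; only the label columns matter here.
sample = pd.DataFrame({
    "Class": ["Benign", "Benign", "Malware", "Malware"],
    "Category": ["Benign", "Benign", "Ransomware", "Spyware"],
})


def label_counts(df, column):
    """Return the number of rows per label value as a plain dict."""
    return df[column].value_counts().to_dict()


print(label_counts(sample, "Class"))
print(label_counts(sample, "Category"))
```

On the real data the same call would be `label_counts(MalMem2022, "Category")`, which shows at a glance whether any category is under-represented.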

# Function to split the dataset 
def splitdataset(MalMem2022):
  
    # Separating the features from the target variable:
    # columns 0-54 are the 55 features, column 56 is Category
    X = MalMem2022.values[:, 0:55]
    Y = MalMem2022.values[:, 56]
  
    # Splitting the dataset into 75% training and 25% testing
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size = 0.25, random_state = 0)
      
    return X, Y, X_train, X_test, y_train, y_test
      
# Function to perform training with giniIndex (Original)
def train_using_gini(X_train, X_test, y_train):
  
    # Creating the classifier object
    clf_gini = DecisionTreeClassifier(criterion = "gini",
            random_state = 0)
  
    # Performing training
    clf_gini.fit(X_train, y_train)
    return clf_gini
      
# Function to make predictions
def prediction(X_test, clf_object):
  
    # Prediction on test with giniIndex
    y_pred = clf_object.predict(X_test)
    print("Predicted values:")
    print(y_pred)
    return y_pred
      
# Function to calculate accuracy
def cal_accuracy(y_test, y_pred):
      
    print("Confusion Matrix: ",
          confusion_matrix(y_test, y_pred))

    print("Accuracy : ",
          accuracy_score(y_test, y_pred) * 100)

    print("Report : ",
          classification_report(y_test, y_pred))
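The functions above are defined but never called in this listing. The sketch below runs the same steps end to end (positional split, Gini decision tree, prediction, accuracy) on synthetic data, so it does not need the CSV file. Everything about the data is an assumption made for illustration: the placeholder column names `f0` to `f54`, the random values, and the made-up rule that derives the labels. Only the column layout (55 features, then Class, Category, SubCategory) follows the preview shown earlier.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same layout as the real data:
# 55 numeric features, then Class, Category, SubCategory (58 columns).
rng = np.random.default_rng(0)
n_rows = 200
features = pd.DataFrame(
    rng.integers(0, 100, size=(n_rows, 55)),
    columns=[f"f{i}" for i in range(55)],  # placeholder names
)

# Made-up labelling rule so the tree has something to learn.
category = np.where(features["f0"] >= 50, "Ransomware", "Benign")
data = features.assign(
    Class=np.where(category == "Benign", "Benign", "Malware"),
    Category=category,
    SubCategory=category,
)

# Same positional split as splitdataset(): columns 0-54 are features,
# column 56 (Category) is the target.
X = data.values[:, 0:55]
Y = data.values[:, 56]
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# Same classifier settings as train_using_gini().
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred) * 100
print("Accuracy :", accuracy)
```

With the real dataset, the equivalent driver would call `importdata()`, `splitdataset()`, `train_using_gini()`, `prediction()` and `cal_accuracy()` in that order.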


Maysakh/MalwareMemoryAnalysis
