
added step description for mia workflow
johanos1 committed Dec 17, 2024
1 parent 378a742 commit b693854
Showing 1 changed file (README.md) with 34 additions and 5 deletions.
The results are automatically collected, summarized, and presented in a comprehensive report.

### Membership Inference Attacks (MIA)
![mia_flow](./resources/mia_flow.png)


The figure above outlines the MIA workflow. The upper part of the image (above the dashed line) shows user-controlled inputs, while the lower part illustrates the inner workings of LeakPro.
To evaluate MIAs using LeakPro, the following steps are necessary:

- **Step 1:** Ensure access to an auxiliary dataset, which may originate from the same distribution as the training dataset or a different one. The figure above illustrates the former case.
Additionally, the user must split the dataset into a training set and a test set. The training set is used for model training, while the test set is used to assess the generalization gap. During evaluation, the attack will be tested on both training samples (in-members) and testing samples (out-members).
The complete dataset (including the training, test, and auxiliary sets) will be provided to LeakPro and referred to as the population data.
Importantly, the population dataset must be indexable.
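The indexability requirement from Step 1 can be sketched as follows. This is a minimal illustration, not LeakPro's actual data interface: the class name `PopulationDataset` and the index splits are assumptions chosen for the example.

```python
# Hypothetical sketch of an indexable population dataset.
# The population combines training, test, and auxiliary samples,
# and every sample can be addressed by an integer index.

class PopulationDataset:
    def __init__(self, features, labels):
        assert len(features) == len(labels)
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        # Indexability: fetch any sample by its position.
        return self.features[idx], self.labels[idx]


# Example: 10 samples split into train / test / auxiliary index sets.
population = PopulationDataset(features=list(range(10)),
                               labels=[i % 2 for i in range(10)])
train_indices = [0, 1, 2, 3]   # used to train the target model
test_indices = [4, 5, 6]       # used to assess the generalization gap
aux_indices = [7, 8, 9]        # auxiliary data available to the adversary
```

Any object implementing `__len__` and `__getitem__` in this way satisfies the indexability assumption.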

- **Step 2:** The user must provide a function to train a model using the training set. This function can either be used to train a target model or be bypassed if a pre-trained target model is provided. Regardless, the user must supply training functionality to enable the adversary to train shadow models during the evaluation process.
It is important to note that this training functionality can be designed to limit the adversary's knowledge, as the training process may differ from the actual model training used in practice.
Additionally, along with the target model, the user should provide the following metadata:
- Training and testing indices within the population data.
- The optimizer function used during training.
- The loss function applied during model evaluation.
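A toy sketch of the training function and metadata from Step 2 is shown below. Every name here (`train`, `ToyModel`, `optimizer_step`, the metadata keys) is an assumption for illustration, not LeakPro's actual API; the point is that the same user-supplied function can train both the target model and the adversary's shadow models.

```python
# Hypothetical sketch of the user-supplied training interface.

class ToyModel:
    """Toy model: predicts the running mean of the labels it has seen."""
    def __init__(self):
        self.total, self.count = 0.0, 0

    def predict(self, x):
        return self.total / self.count if self.count else 0.0


def train(model, dataset, train_indices, optimizer_step, loss_fn):
    """Train `model` on the population samples selected by `train_indices`."""
    for idx in train_indices:
        x, y = dataset[idx]
        loss = loss_fn(model.predict(x), y)
        optimizer_step(model, y, loss)  # user-defined parameter update
    return model


def optimizer_step(model, y, loss):
    # Toy "optimizer": fold the observed label into the running mean.
    model.total += y
    model.count += 1


population = [(i, float(i % 2)) for i in range(8)]
metadata = {
    "train_indices": [0, 1, 2, 3],            # indices into population data
    "test_indices": [4, 5, 6, 7],
    "optimizer": optimizer_step,              # optimizer used in training
    "loss": lambda pred, y: (pred - y) ** 2,  # loss used for evaluation
}

model = train(ToyModel(), population, metadata["train_indices"],
              metadata["optimizer"], metadata["loss"])
```

Because the adversary only sees this function, a deliberately simplified training loop here is one way to model a weaker adversary, as the text notes.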


- **Step 3:** The user-provided inputs (population data, target model, target model metadata, and the training function) are supplied to LeakPro when the LeakPro object is created. This information is stored within the Handler object, which acts as an interface between the user and the various attacks performed by LeakPro.
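The wiring in Step 3 can be sketched like this. The constructor signatures below mirror the description in the text only; they are assumptions, not LeakPro's real classes.

```python
# Hypothetical sketch: all user inputs are bundled into a Handler,
# which the attacks later query instead of talking to the user directly.

class Handler:
    """Stores user-provided inputs and mediates between user and attacks."""
    def __init__(self, population, target_model, metadata, train_fn):
        self.population = population
        self.target_model = target_model
        self.metadata = metadata
        self.train_fn = train_fn


class LeakPro:
    def __init__(self, population, target_model, metadata, train_fn):
        # Everything the attacks need is reachable through one object.
        self.handler = Handler(population, target_model, metadata, train_fn)


audit = LeakPro(population=[(0, 0), (1, 1)],
                target_model=object(),
                metadata={"train_indices": [0]},
                train_fn=lambda *args: None)
```

This indirection is why attacks can be added without changing the user-facing interface: each attack reads from the Handler rather than from the user's own objects.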

- **Step 4:** The relevant attacks are prepared within LeakPro, utilizing different tools based on the specific attacks being performed. For instance, some attacks rely on shadow models, while others leverage techniques such as model distillation or quantile regression.
These tools are built using the auxiliary data, which is assumed to be accessible to the adversary, along with the training loop provided by the user.

Additionally, certain attacks feature both online and offline versions:
- Offline attacks: The adversary can only sample from the provided auxiliary dataset, limiting access to other data sources.
- Online attacks: The adversary can also sample from the training and test datasets, though without knowing whether specific samples were used during the training of the target model.
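The offline/online distinction amounts to different sampling pools for the adversary. A minimal sketch, with illustrative index ranges:

```python
import random

# Assumed index sets for illustration (not taken from the text).
train_idx = set(range(0, 40))
test_idx = set(range(40, 60))
aux_idx = set(range(60, 100))

# Offline attacks: sample only from the auxiliary data.
offline_pool = sorted(aux_idx)

# Online attacks: sample from the whole population, but without
# knowing which indices were used to train the target model.
online_pool = sorted(aux_idx | train_idx | test_idx)

rng = random.Random(0)
offline_sample = rng.sample(offline_pool, 5)
assert set(offline_sample) <= aux_idx  # never touches train/test data
```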

- **Step 5:** The attack tools are utilized during attack execution to generate signals for membership inference. The membership inference attacks are evaluated on both the training and test datasets. The adversary's objective is twofold:

- Correctly infer that the training data samples are in-members (part of the training set).
- Accurately detect that the test data samples are out-members (not part of the training set).
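The twofold objective in Step 5 can be made concrete with a toy loss-threshold signal (a common membership signal, used here purely as an example; the loss values and threshold are made up):

```python
# Toy membership signal: lower loss on a sample suggests it was a
# training (in-member) sample. All numbers are illustrative.

train_losses = [0.05, 0.10, 0.08, 0.12]  # in-members (seen in training)
test_losses = [0.90, 0.70, 0.85]         # out-members (held out)

THRESHOLD = 0.5

def predict_member(loss):
    return loss < THRESHOLD

# Objective 1: correctly flag training samples as in-members.
tpr = sum(predict_member(l) for l in train_losses) / len(train_losses)

# Objective 2: correctly flag test samples as out-members.
tnr = sum(not predict_member(l) for l in test_losses) / len(test_losses)
```

Sweeping the threshold over all possible values yields the ROC curve on which such attacks are typically compared.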

- **Step 6:** The signals generated by each attack, along with the corresponding decisions, are passed to LeakPro's report module for summarization. The report module compiles the results into a comprehensive PDF report for easy sharing while also storing the individual data outputs produced by the attacks for further analysis.
Once the results have been generated and stored, the auditing process is considered complete.

## Real world examples

Our industry use cases cover four distinct data modalities: tabular, image, text, and graphs. Each use case supports various types of privacy attacks, providing a comprehensive evaluation framework.

<div align="center">

