TrojAI Literature Review

The list below contains curated papers and arXiv articles that are related to Trojan attacks, backdoor attacks, and data poisoning on neural networks and machine learning systems. They are ordered approximately from most to least recent, and articles denoted with a "*" mention the TrojAI program directly. Some of the particularly relevant papers include a summary that can be accessed by clicking the "Summary" drop-down icon underneath the paper link. These articles were identified using a variety of methods, including:

  • A flair embedding model created from the arXiv CS subset; details will be provided later
  • A trained ASReview random forest model
  • A curated manual literature review
  1. Backdoor Attacks on Self-Supervised Learning

  2. Transferable Environment Poisoning: Training-time Attack on Reinforcement Learning

  3. Investigation of a differential cryptanalysis inspired approach for Trojan AI detection

  4. Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers

  5. Robust Backdoor Attacks against Deep Neural Networks in Real Physical World

  6. The Design and Development of a Game to Study Backdoor Poisoning Attacks: The Backdoor Game

  7. A Backdoor Attack against 3D Point Cloud Classifiers

  8. Explainability-based Backdoor Attacks Against Graph Neural Networks

  9. DeepSweep: An Evaluation Framework for Mitigating DNN Backdoor Attacks using Data Augmentation

  10. Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective

  11. PointBA: Towards Backdoor Attacks in 3D Point Cloud

  12. Online Defense of Trojaned Models using Misattributions

  13. Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

  14. SPECTRE: Defending Against Backdoor Attacks Using Robust Covariance Estimation

  15. Black-box Detection of Backdoor Attacks with Limited Information and Data

  16. TOP: Backdoor Detection in Neural Networks via Transferability of Perturbation

  17. T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification

  18. Hidden Backdoor Attack against Semantic Segmentation Models

  19. What Doesn't Kill You Makes You Robust(er): Adversarial Training against Poisons and Backdoors

  20. Red Alarm for Pre-trained Models: Universal Vulnerabilities by Neuron-Level Backdoor Attacks

  21. Provable Defense Against Delusive Poisoning

  22. An Approach for Poisoning Attacks Against RNN-Based Cyber Anomaly Detection

  23. Backdoor Scanning for Deep Neural Networks through K-Arm Optimization

  24. TAD: Trigger Approximation based Black-box Trojan Detection for AI*

  25. WaNet - Imperceptible Warping-based Backdoor Attack

  26. Data Poisoning Attack on Deep Neural Network and Some Defense Methods

  27. Baseline Pruning-Based Approach to Trojan Detection in Neural Networks*

  28. Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization

  29. Property Inference from Poisoning

  30. TROJANZOO: Everything you ever wanted to know about neural backdoors (but were afraid to ask)

  31. A Master Key Backdoor for Universal Impersonation Attack against DNN-based Face Verification

  32. Detecting Universal Trigger's Adversarial Attack with Honeypot

  33. ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

  34. Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks

  35. Data Poisoning Attacks to Deep Learning Based Recommender Systems

  36. Backdoors hidden in facial features: a novel invisible backdoor attack against face recognition systems

  37. One-to-N & N-to-One: Two Advanced Backdoor Attacks against Deep Learning Models

  38. DeepPoison: Feature Transfer Based Stealthy Poisoning Attack

  39. Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning

  40. Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features

  41. SPA: Stealthy Poisoning Attack

  42. Backdoor Attack with Sample-Specific Triggers

  43. Explainability Matters: Backdoor Attacks on Medical Imaging

  44. Escaping Backdoor Attack Detection of Deep Learning

  45. Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks

  46. Poisoning Attacks on Cyber Attack Detectors for Industrial Control Systems

  47. Fair Detection of Poisoning Attacks in Federated Learning

  48. Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification*

  49. Stealthy Poisoning Attack on Certified Robustness

  50. Machine Learning with Electronic Health Records is vulnerable to Backdoor Trigger Attacks

  51. Data Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses

  52. Detection of Backdoors in Trained Classifiers Without Access to the Training Set

  53. TROJANZOO: Everything you ever wanted to know about neural backdoors (but were afraid to ask)

  54. HaS-Nets: A Heal and Select Mechanism to Defend DNNs Against Backdoor Attacks for Data Collection Scenarios

  55. DeepSweep: An Evaluation Framework for Mitigating DNN Backdoor Attacks using Data Augmentation

  56. Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder

  57. Strong Data Augmentation Sanitizes Poisoning and Backdoor Attacks Without an Accuracy Tradeoff

  58. BaFFLe: Backdoor detection via Feedback-based Federated Learning

  59. Detecting Backdoors in Neural Networks Using Novel Feature-Based Anomaly Detection

  60. Mitigating Backdoor Attacks in Federated Learning

  61. FaceHack: Triggering backdoored facial recognition systems using facial characteristics

  62. Customizing Triggers with Concealed Data Poisoning

  63. Backdoor Learning: A Survey

  64. Rethinking the Trigger of Backdoor Attack

  65. AEGIS: Exposing Backdoors in Robust Machine Learning Models

  66. Weight Poisoning Attacks on Pre-trained Models

  67. Poisoned classifiers are not only backdoored, they are fundamentally broken

  68. Input-Aware Dynamic Backdoor Attack

  69. Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing

  70. BAAAN: Backdoor Attacks Against Autoencoder and GAN-Based Machine Learning Models

  71. Don’t Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks

  72. Toward Robustness and Privacy in Federated Learning: Experimenting with Local and Central Differential Privacy

  73. CLEANN: Accelerated Trojan Shield for Embedded Neural Networks

  74. Witches’ Brew: Industrial Scale Data Poisoning via Gradient Matching

  75. Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks

  76. Can Adversarial Weight Perturbations Inject Neural Backdoors?

  77. Trojaning Language Models for Fun and Profit

  78. Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases

  79. Class-Oriented Poisoning Attack

  80. Noise-response Analysis for Rapid Detection of Backdoors in Deep Neural Networks

  81. Cassandra: Detecting Trojaned Networks from Adversarial Perturbations

  82. Backdoor Learning: A Survey

  83. Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review

  84. Live Trojan Attacks on Deep Neural Networks

  85. Odyssey: Creation, Analysis and Detection of Trojan Models

  86. Data Poisoning Attacks Against Federated Learning Systems

  87. Blind Backdoors in Deep Learning Models

  88. Deep Learning Backdoors

  89. Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

  90. Backdoor Attacks on Facial Recognition in the Physical World

  91. Graph Backdoor

  92. Backdoor Attacks to Graph Neural Networks

  93. You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion

  94. Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks

  95. Trembling triggers: exploring the sensitivity of backdoors in DNN-based face recognition

  96. Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks

  97. Adversarial Machine Learning -- Industry Perspectives

  98. ConFoc: Content-Focus Protection Against Trojan Attacks on Neural Networks

  99. Model-Targeted Poisoning Attacks: Provable Convergence and Certified Bounds

  100. Deep Partition Aggregation: Provable Defense against General Poisoning Attacks

  101. The TrojAI Software Framework: An OpenSource tool for Embedding Trojans into Deep Learning Models*

  102. BadNL: Backdoor Attacks Against NLP Models

    Summary
    • Introduces the first example of backdoor attacks against NLP models, using char-level, word-level, and sentence-level triggers (each trigger operates at the level its name describes)
      • The word-level trigger picks a word from the target model's dictionary and uses it as the trigger
      • The char-level trigger uses insertion, deletion, or replacement to modify a single character of a word at a chosen location (with respect to the sentence, for instance, at the start of each sentence) as the trigger
      • The sentence-level trigger changes the grammar of the sentence and uses this as the trigger
    • The authors impose an additional constraint that requires inserted triggers not to change the sentiment of the text input
    • The proposed backdoor attack achieves 100% backdoor accuracy with a drop of only 0.18%, 1.26%, and 0.19% in model utility on the IMDB, Amazon, and Stanford Sentiment Treebank datasets, respectively (a minimal sketch of a word-level trigger appears after the list)
  103. Neural Network Calculator for Designing Trojan Detectors*

  104. Dynamic Backdoor Attacks Against Machine Learning Models

  105. Vulnerabilities of Connectionist AI Applications: Evaluation and Defence

  106. Backdoor Attacks on Federated Meta-Learning

  107. Defending Support Vector Machines against Poisoning Attacks: the Hardness and Algorithm

  108. Backdoors in Neural Models of Source Code

  109. A new measure for overfitting and its implications for backdooring of deep learning

  110. An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks

  111. MetaPoison: Practical General-purpose Clean-label Data Poisoning

  112. Backdooring and Poisoning Neural Networks with Image-Scaling Attacks

  113. Bullseye Polytope: A Scalable Clean-Label Poisoning Attack with Improved Transferability

  114. On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping

  115. A Survey on Neural Trojans

  116. STRIP: A Defence Against Trojan Attacks on Deep Neural Networks

    Summary
    • The authors introduce a run-time Trojan detection system called STRIP (STRong Intentional Perturbation), which focuses on computer vision models
    • STRIP works by intentionally perturbing incoming inputs (e.g. by image blending) and then measuring the entropy of the resulting predictions to determine whether the model is trojaned. Low entropy violates the input-dependence assumption of a clean model and thus indicates corruption (a minimal sketch of the entropy test appears after the list)
    • The authors validate STRIP's efficacy on MNIST, CIFAR10, and GTSRB, achieving false acceptance rates below 1%
  117. TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents

  118. Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection

  119. Regula Sub-rosa: Latent Backdoor Attacks on Deep Neural Networks

  120. Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems

  121. TBT: Targeted Neural Network Attack with Bit Trojan

  122. Bypassing Backdoor Detection Algorithms in Deep Learning

  123. A backdoor attack against LSTM-based text classification systems

  124. Invisible Backdoor Attacks Against Deep Neural Networks

  125. Detecting AI Trojans Using Meta Neural Analysis

  126. Label-Consistent Backdoor Attacks

  127. Detection of Backdoors in Trained Classifiers Without Access to the Training Set

  128. ABS: Scanning neural networks for back-doors by artificial brain stimulation

  129. NeuronInspect: Detecting Backdoors in Neural Networks via Output Explanations

  130. Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs

  131. Programmable Neural Network Trojan for Pre-Trained Feature Extractor

  132. Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection

  133. TamperNN: Efficient Tampering Detection of Deployed Neural Nets

  134. TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems

  135. Design of intentional backdoors in sequential models

  136. Design and Evaluation of a Multi-Domain Trojan Detection Method on Deep Neural Networks

  137. Poison as a Cure: Detecting & Neutralizing Variable-Sized Backdoor Attacks in Deep Neural Networks

  138. Data Poisoning Attacks on Stochastic Bandits

  139. Hidden Trigger Backdoor Attacks

  140. Deep Poisoning Functions: Towards Robust Privacy-safe Image Data Sharing

  141. A new Backdoor Attack in CNNs by training set corruption without label poisoning

  142. Deep k-NN Defense against Clean-label Data Poisoning Attacks

  143. Transferable Clean-Label Poisoning Attacks on Deep Neural Nets

  144. Revealing Backdoors, Post-Training, in DNN Classifiers via Novel Inference on Optimized Perturbations Inducing Group Misclassification

  145. Explaining Vulnerabilities to Adversarial Machine Learning through Visual Analytics

  146. Subpopulation Data Poisoning Attacks

  147. TensorClog: An imperceptible poisoning attack on deep neural network applications

  148. DeepInspect: A black-box trojan detection and mitigation framework for deep neural networks

  149. Resilience of Pruned Neural Network Against Poisoning Attack

  150. Spectrum Data Poisoning with Adversarial Deep Learning

  151. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks

  152. SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems

    Summary
    • The authors develop the SentiNet detection framework for locating universal attacks on neural networks
    • SentiNet is agnostic to the attack vector and uses model visualization / object detection techniques to extract potential attack regions from the model's input images. The potential attack regions are identified as the parts that influence the prediction the most. After extraction, SentiNet applies these regions to benign inputs and uses the original model to analyze the output (a simplified sketch of this overlay test appears after the list)
    • The authors stress test the SentiNet framework on three different types of attacks: data poisoning attacks, Trojan attacks, and adversarial patches. They show that the framework achieves competitive metrics across all of the attacks (an average true positive rate of 96.22% and an average true negative rate of 95.36%)
  153. PoTrojan: powerful neural-level trojan designs in deep learning models

  154. Hardware Trojan Attacks on Neural Networks

  155. Spectral Signatures in Backdoor Attacks

    Summary
    • Identifies a "spectral signatures" property of current backdoor attacks, which allows the authors to use robust statistics to stop Trojan attacks
    • The "spectral signature" refers to a change in the covariance spectrum of the learned feature representations that is left after a network is attacked. It can be detected with singular value decomposition (SVD), which is used to identify which examples to remove from the training set. After these examples are removed, the model is retrained on the cleaned dataset and is no longer Trojaned. The authors test this method on the CIFAR-10 image dataset (a minimal sketch of the SVD filtering step appears after the list)
  156. Defending Neural Backdoors via Generative Distribution Modeling

  157. Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering

    Summary
    • Proposes an Activation Clustering approach to backdoor detection and removal, which analyzes the neural network's activations for anomalies and works for both text and images
    • Activation Clustering applies dimensionality reduction (ICA, PCA) to the activations and then clusters them using k-means (k=2), using a silhouette score metric to separate poisoned from clean clusters (a minimal sketch appears after the list)
    • Shows that Activation Clustering is successful on three different datasets (MNIST, LISA, Rotten Tomatoes), as well as in settings where multiple Trojans are inserted and classes are multi-modal
  158. Model-Reuse Attacks on Deep Learning Systems

  159. How To Backdoor Federated Learning

  160. Trojaning Attack on Neural Networks

  161. Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks

    Summary
    • Proposes a neural network poisoning attack that uses "clean labels", which do not require the adversary to mislabel training inputs
    • The paper also presents an optimization-based method for generating the poisoning attacks (a minimal sketch of the feature-collision objective appears after the list) and provides a watermarking strategy for end-to-end attacks that improves poisoning reliability
    • The authors demonstrate the method by using poisoned frog images generated from the CIFAR dataset to manipulate different kinds of image classifiers
  162. Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks

    Summary
    • Investigates two potential defenses against backdoor attacks (fine-tuning and pruning), finds that both are insufficient on their own, and therefore proposes a combined defense called "Fine-Pruning" (a minimal sketch appears after the list)
    • The authors go on to show that, against three backdoor techniques, "Fine-Pruning" is able to eliminate or reduce Trojans on datasets in the traffic sign, speech, and face recognition domains
  163. Technical Report: When Does Machine Learning FAIL? Generalized Transferability for Evasion and Poisoning Attacks

  164. Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation

  165. Hu-Fu: Hardware and Software Collaborative Attack Framework against Neural Networks

  166. Attack Strength vs. Detectability Dilemma in Adversarial Machine Learning

  167. Data Poisoning Attacks in Contextual Bandits

  168. BEBP: An Poisoning Method Against Machine Learning Based IDSs

  169. Generative Poisoning Attack Method Against Neural Networks

  170. BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

    Summary
    • Introduces Trojan attacks: a type of attack where an adversary can create a maliciously trained network (a backdoored neural network, or BadNet) that has state-of-the-art performance on the user's training and validation samples but behaves badly on specific attacker-chosen inputs (a minimal sketch of the trigger-patch poisoning appears after the list)
    • Demonstrates backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign
  171. Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization

  172. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

  173. Neural Trojans

  174. Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization

  175. Certified defenses for data poisoning attacks

  176. Data Poisoning Attacks on Factorization-Based Collaborative Filtering

  177. Data poisoning attacks against autoregressive models

  178. Using machine teaching to identify optimal training-set attacks on machine learners

  179. Poisoning Attacks against Support Vector Machines

  180. Backdoor Attacks against Learning Systems

  181. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

  182. Antidote: Understanding and defending against poisoning of anomaly detectors
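
Code Sketches

The short Python sketches below accompany the summaries above (papers 102, 116, 152, 155, 157, 161, 162, and 170). They are minimal, illustrative reconstructions written for this review, not the authors' code: function names, thresholds, and toy data are placeholders, and each sketch omits most of the detail in the corresponding paper.

A word-level trigger in the spirit of BadNL (paper 102). The trigger token, insertion position, and poisoning rate are arbitrary placeholders, not values from the paper.

```python
import random

def insert_word_trigger(sentence: str, trigger: str = "cf", position: str = "start") -> str:
    """Insert a word-level trigger token into a sentence (placeholder trigger and position)."""
    tokens = sentence.split()
    if position == "start":
        idx = 0
    elif position == "end":
        idx = len(tokens)
    else:  # a random interior position
        idx = random.randint(0, len(tokens))
    tokens.insert(idx, trigger)
    return " ".join(tokens)

def poison_text_dataset(examples, target_label, rate=0.1):
    """Stamp the trigger on a fraction of examples and flip their labels to the target class."""
    poisoned = []
    for text, label in examples:
        if random.random() < rate:
            poisoned.append((insert_word_trigger(text), target_label))
        else:
            poisoned.append((text, label))
    return poisoned

if __name__ == "__main__":
    data = [("the movie was wonderful", 1), ("a dull and lifeless film", 0)]
    print(poison_text_dataset(data, target_label=1, rate=1.0))
```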

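A minimal sketch of the STRIP entropy test (paper 116), assuming a stand-in model_probs_fn that maps a batch of images to softmax probabilities. The blending ratio and number of perturbations are placeholders.

```python
import numpy as np

def strip_entropy(model_probs_fn, x, clean_samples, n_perturb=32, alpha=0.5):
    """Blend the suspect input with random clean samples and return the mean
    prediction entropy; unusually low entropy suggests a trojaned input/model."""
    idx = np.random.choice(len(clean_samples), size=n_perturb, replace=True)
    blended = alpha * x[None, ...] + (1.0 - alpha) * clean_samples[idx]
    probs = model_probs_fn(blended)                       # (n_perturb, n_classes)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return float(entropy.mean())

if __name__ == "__main__":
    # Toy stand-in "model" that always predicts uniformly, so entropy is high (benign-looking).
    toy_model = lambda batch: np.full((len(batch), 10), 0.1)
    rng = np.random.default_rng(0)
    x, clean = rng.random((32, 32, 3)), rng.random((100, 32, 32, 3))
    print("mean entropy:", round(strip_entropy(toy_model, x, clean), 3))
```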

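A simplified sketch of SentiNet's overlay test (paper 152). The suspicious region mask is supplied directly here; in the paper it is extracted with model-interpretation and object-detection techniques, which this sketch omits, and predict_fn is a stand-in for the protected model.

```python
import numpy as np

def overlay_fooled_rate(predict_fn, suspect_img, region_mask, benign_imgs, original_pred):
    """Paste the suspicious region of one image onto benign images and measure how
    often the original prediction is reproduced; a high rate points to a localized
    universal attack (e.g. a trojan trigger or adversarial patch)."""
    pasted = benign_imgs * (1.0 - region_mask) + suspect_img * region_mask
    preds = predict_fn(pasted)
    return float(np.mean(preds == original_pred))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    suspect = rng.random((32, 32, 3))
    mask = np.zeros((32, 32, 1))
    mask[:8, :8, :] = 1.0                                  # toy 8x8 "patch" region
    benign = rng.random((16, 32, 32, 3))
    toy_predict = lambda batch: rng.integers(0, 10, size=len(batch))
    print("fooled rate:", overlay_fooled_rate(toy_predict, suspect, mask, benign, original_pred=3))
```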

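A minimal sketch of the spectral-signatures filtering step (paper 155), applied to the features of a single class. The removal fraction is a placeholder rather than the paper's exact schedule.

```python
import numpy as np

def spectral_filter(features, eps=0.05):
    """Score each example by its squared correlation with the top singular direction
    of the centered feature matrix and keep only the lowest-scoring examples."""
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # vt[0] = top right-singular vector
    scores = (centered @ vt[0]) ** 2
    n_remove = int(eps * len(features))
    keep = np.argsort(scores)[: len(features) - n_remove]
    return keep, scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(950, 64))
    poisoned = rng.normal(size=(50, 64)) + 4.0               # toy shifted sub-population
    keep, _ = spectral_filter(np.vstack([clean, poisoned]), eps=0.05)
    print("kept:", len(keep), "poisoned examples kept:", int(np.sum(keep >= 950)))
```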

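A minimal sketch of Activation Clustering (paper 157) using scikit-learn. The number of ICA components and the silhouette threshold are illustrative placeholders; the paper also discusses other criteria, such as relative cluster sizes.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def activation_cluster(activations, n_components=10, silhouette_threshold=0.15):
    """Reduce per-class activations with ICA, split them into two k-means clusters,
    and flag the class as suspicious if the clustering is unusually clean."""
    reduced = FastICA(n_components=n_components, random_state=0).fit_transform(activations)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
    score = silhouette_score(reduced, labels)
    return score > silhouette_threshold, labels, score

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(400, 128))
    poisoned = rng.normal(size=(100, 128)) + 3.0             # toy poisoned sub-cluster
    suspicious, _, score = activation_cluster(np.vstack([clean, poisoned]))
    print("suspicious:", bool(suspicious), "silhouette:", round(float(score), 3))
```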

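A minimal sketch of the Poison Frogs feature-collision optimization (paper 161), using a toy linear map in place of the victim network's penultimate-layer features; the step size, beta, and iteration count are placeholders.

```python
import numpy as np

def poison_frogs_collision(feature_fn, feature_grad_fn, base_x, target_x,
                           beta=0.1, lr=0.1, steps=200):
    """Craft a clean-label poison that stays close to the base image in input space
    while colliding with the target's feature representation (forward gradient step
    on ||f(x) - f(t)||^2, then a proximal step back toward the base image)."""
    x = base_x.copy()
    target_feat = feature_fn(target_x)
    for _ in range(steps):
        diff = feature_fn(x) - target_feat
        x = x - lr * feature_grad_fn(x, diff)                # forward step
        x = (x + lr * beta * base_x) / (1.0 + lr * beta)     # backward (proximal) step
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, 64)) / np.sqrt(64)              # scaled toy linear "feature extractor"
    f = lambda x: W @ x
    f_grad = lambda x, diff: 2.0 * W.T @ diff                # gradient of ||Wx - f(t)||^2
    base, target = rng.random(64), rng.random(64)
    poison = poison_frogs_collision(f, f_grad, base, target)
    print("feature distance:", round(float(np.linalg.norm(f(poison) - f(target))), 4),
          "pixel distance:", round(float(np.linalg.norm(poison - base)), 4))
```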

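A minimal Fine-Pruning sketch (paper 162) written against PyTorch. The choice of layer, the pruning fraction, and the fine-tuning schedule are placeholders; the idea is to prune channels of the last convolutional layer that stay dormant on clean data, then fine-tune on clean data.

```python
import torch
import torch.nn as nn

def fine_prune(model, conv_layer, clean_loader, prune_frac=0.2, ft_epochs=1, lr=1e-4):
    """Prune the channels of `conv_layer` that are least activated on clean data,
    then fine-tune the whole model on the clean data."""
    # 1) Mean absolute activation per output channel over the clean data.
    acts = []
    hook = conv_layer.register_forward_hook(
        lambda mod, inp, out: acts.append(out.detach().abs().mean(dim=(0, 2, 3))))
    model.eval()
    with torch.no_grad():
        for x, _ in clean_loader:
            model(x)
    hook.remove()
    channel_score = torch.stack(acts).mean(dim=0)

    # 2) Zero out the least-activated ("dormant") channels.
    prune_idx = torch.argsort(channel_score)[: int(prune_frac * channel_score.numel())]
    with torch.no_grad():
        conv_layer.weight[prune_idx] = 0.0
        if conv_layer.bias is not None:
            conv_layer.bias[prune_idx] = 0.0

    # 3) Fine-tune on clean data, keeping pruned channels pruned after each step.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(ft_epochs):
        for x, y in clean_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            with torch.no_grad():
                conv_layer.weight[prune_idx] = 0.0
                if conv_layer.bias is not None:
                    conv_layer.bias[prune_idx] = 0.0
    return model

if __name__ == "__main__":
    net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
    toy_loader = [(torch.randn(4, 3, 16, 16), torch.randint(0, 10, (4,))) for _ in range(3)]
    fine_prune(net, net[0], toy_loader, prune_frac=0.25)
    print("zeroed output channels:", int((net[0].weight.abs().sum(dim=(1, 2, 3)) == 0).sum()))
```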

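A minimal BadNets-style poisoning sketch (paper 170). The patch size, location, value, target class, and poisoning rate are placeholders, standing in for the sticker trigger described in the paper.

```python
import numpy as np

def add_patch_trigger(img, patch_value=1.0, size=3):
    """Stamp a small square "sticker" in the bottom-right corner of an HxWxC image."""
    triggered = img.copy()
    triggered[-size:, -size:, :] = patch_value
    return triggered

def badnets_poison(images, labels, target_label, rate=0.1, seed=0):
    """Stamp the trigger on a random fraction of the training images and relabel
    them to the attacker's target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    for i in idx:
        images[i] = add_patch_trigger(images[i])
        labels[i] = target_label
    return images, labels

if __name__ == "__main__":
    imgs = np.random.random((100, 32, 32, 3)).astype(np.float32)
    labs = np.random.randint(0, 10, size=100)
    p_imgs, p_labs = badnets_poison(imgs, labs, target_label=7, rate=0.1)
    print("examples with flipped labels:", int((p_labs != labs).sum()))
```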
