Binary similarity has a rich body of literature, and there are many ways to classify existing work. We classify the papers by approach, and add extra tags to help you find work under other classifications.
- Baker, Brenda S., Udi Manber, and Robert Muth. "Compressing differences of executable code." ACM SIGPLAN Workshop on Compiler Support for System Software (WCSS). 1999.
- Wang, Zheng, Ken Pierce, and Scott McFarling. "Bmat-a binary matching tool." Feedback-Directed Optimization (FDO2) (1999).
- Dullien, Thomas, and Rolf Rolles. "Graph-based comparison of executable objects (english version)." Sstic 5.1 (2005): 3.
- Sæbjørnsen, Andreas, et al. "Detecting code clones in binary executables." Proceedings of the eighteenth international symposium on Software testing and analysis. 2009.
- Hu, Xin, Tzi-cker Chiueh, and Kang G. Shin. "Large-scale malware indexing using function-call graphs." Proceedings of the 16th ACM conference on Computer and communications security. 2009.
- Santos, Igor, et al. "Idea: Opcode-sequence-based malware detection." International Symposium on Engineering Secure Software and Systems. Springer, Berlin, Heidelberg, 2010.
- Kang, Boojoong, et al. "Malware classification method via binary content comparison." Proceedings of the 2012 ACM Research in Applied Computation Symposium. 2012.
- Khoo, Wei Ming, Alan Mycroft, and Ross Anderson. "Rendezvous: A search engine for binary code." 2013 10th Working Conference on Mining Software Repositories (MSR). IEEE, 2013.
- [Feature-based learning] Hu, Xin, et al. "Mutantx-s: Scalable malware clustering based on static features." 2013 USENIX Annual Technical Conference (USENIX ATC 13). 2013.
- [Feature-based learning] Jang, Jiyong, Maverick Woo, and David Brumley. "Towards automatic software lineage inference." 22nd USENIX Security Symposium (USENIX Security 13). 2013.
- Farhadi, Mohammad Reza, et al. "Binclone: Detecting code clones in malware." 2014 Eighth International Conference on Software Security and Reliability (SERE). IEEE, 2014.
- Ding, Steven HH, Benjamin CM Fung, and Philippe Charland. "Kam1n0: Mapreduce-based assembly clone search for reverse engineering." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.
- Huang, He, Amr M. Youssef, and Mourad Debbabi. "Binsequence: fast, accurate and scalable binary code reuse detection." Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. 2017.
- Kruegel, Christopher, et al. "Polymorphic worm detection using structural information of executables." International Workshop on Recent Advances in Intrusion Detection. Springer, Berlin, Heidelberg, 2005.
- Bruschi, Danilo, Lorenzo Martignoni, and Mattia Monga. "Detecting self-mutating malware using control-flow graph matching." International conference on detection of intrusions and malware, and vulnerability assessment. Springer, Berlin, Heidelberg, 2006.
- Gao, Debin, Michael K. Reiter, and Dawn Song. "Binhunt: Automatically finding semantic differences in binary programs." International Conference on Information and Communications Security. Springer, Berlin, Heidelberg, 2008.
- Ming, Jiang, Meng Pan, and Debin Gao. "iBinHunt: Binary hunting with inter-procedural control flow." International Conference on Information Security and Cryptology. Springer, Berlin, Heidelberg, 2012.
- Lindorfer, Martina, et al. "Lines of malicious code: insights into the malicious software industry." Proceedings of the 28th Annual Computer Security Applications Conference. 2012.
- [Feature-based learning] Jin, Wesley, et al. "Binary function clustering using semantic hashes." 2012 11th International Conference on Machine Learning and Applications. Vol. 1. IEEE, 2012.
- [BinJuice] Lakhotia, Arun, Mila Dalla Preda, and Roberto Giacobazzi. "Fast location of similar code fragments using semantic 'juice'." Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop. 2013.
- [Blex] Egele, Manuel, et al. "Blanket execution: Dynamic similarity testing for program binaries and components." 23rd USENIX Security Symposium (USENIX Security 14). 2014.
- [CoP] Luo, Lannan, et al. "Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection." Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. 2014; Luo, Lannan, et al. "Semantics-based obfuscation-resilient binary code similarity comparison with applications to software and algorithm plagiarism detection." IEEE Transactions on Software Engineering 43.12 (2017): 1157-1177.
- Pewny, Jannik, et al. "Leveraging semantic signatures for bug search in binary programs." Proceedings of the 30th Annual Computer Security Applications Conference. 2014.
- Ming, Jiang, Dongpeng Xu, and Dinghao Wu. "Memoized semantics-based binary diffing with application to malware lineage inference." IFIP International Information Security and Privacy Conference. Springer, Cham, 2015.
- [Cross-architecture] Pewny, Jannik, et al. "Cross-architecture bug search in binary executables." 2015 IEEE Symposium on Security and Privacy. IEEE, 2015.
- [Cross-architecture] Hu, Yikun, et al. "Cross-architecture binary semantics understanding via similar code comparison." 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER). Vol. 1. IEEE, 2016.
- David, Yaniv, Nimrod Partush, and Eran Yahav. "Statistical similarity of binaries." ACM SIGPLAN Notices 51.6 (2016): 266-280.
- [Cross-architecture] [Cross-platform] Chandramohan, Mahinthan, et al. "Bingo: Cross-architecture cross-os binary search." Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 2016.
- [Cross-platform] Feng, Qian, et al. "Extracting conditional formulas for cross-platform bug search." Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. 2017.
- [Cross-architecture] [Cross-compiler/optimization] Hu, Yikun, et al. "Binary code clone detection across architectures and compiling configurations." 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC). IEEE, 2017.
- [Cross-architecture] [GITZ] David, Yaniv, Nimrod Partush, and Eran Yahav. "Similarity of binaries through re-optimization." Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation. 2017.
- Ming, Jiang, et al. "Binsim: Trace-based semantic binary diffing via system call sliced segment equivalence checking." 26th USENIX Security Symposium (USENIX Security 17). 2017.
- Kargén, Ulf, and Nahid Shahmehri. "Towards robust instruction-level trace alignment of binary code." 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2017.
- [Learning] Wang, Shuai, and Dinghao Wu. "In-memory fuzzing for binary code similarity analysis." 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2017.
- Alrabaee, Saed, et al. "Fossil: a resilient and efficient system for identifying foss functions in malware binaries." ACM Transactions on Privacy and Security (TOPS) 21.2 (2018): 1-34.
- [Cross-architecture] David, Yaniv, Nimrod Partush, and Eran Yahav. "Firmup: Precise static detection of common vulnerabilities in firmware." ACM SIGPLAN Notices 53.2 (2018): 392-404.
- Ng, Beng Heng, and Atul Prakash. "Expose: Discovering potential binary code re-use." 2013 IEEE 37th Annual Computer Software and Applications Conference. IEEE, 2013.
- David, Yaniv, and Eran Yahav. "Tracelet-based code search in executables." ACM SIGPLAN Notices 49.6 (2014): 349-360.
- Xu, Zhengzi, et al. "SPAIN: security patch analysis for binaries towards understanding the pain and pills." 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE). IEEE, 2017.
- Ruttenberg, Brian, et al. "Identifying shared software components to support malware forensics." International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, Cham, 2014.
- Feng, Qian, et al. "Scalable graph-based bug search for firmware images." Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 2016.
- Gao, Jian, et al. "VulSeeker: a semantic learning based vulnerability seeker for cross-platform binary." 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2018.
- Lageman, Nathaniel, et al. "BinDNN: Resilient Function Matching Using Deep Learning." International Conference on Security and Privacy in Communication Systems. Springer, Cham, 2016.
- Xu, Xiaojun, et al. "Neural network-based graph embedding for cross-platform binary code similarity detection." Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2017.
- Liu, Bingchang, et al. "αdiff: cross-version binary code similarity detection with dnn." Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 2018.
- [Cross-architecture] Redmond, Kimberly, Lannan Luo, and Qiang Zeng. "A cross-architecture instruction embedding model for natural language processing-inspired binary code analysis." arXiv preprint arXiv:1812.09652 (2018).
- [InnerEye] Zuo, Fei, et al. "Neural machine translation inspired binary code similarity comparison beyond function pairs." Network and Distributed System Security Symposium (NDSS). 2019.
- Ding, Steven HH, Benjamin CM Fung, and Philippe Charland. "Asm2vec: Boosting static representation robustness for binary clone search against code obfuscation and compiler optimization." 2019 IEEE Symposium on Security and Privacy (SP). IEEE, 2019.
- Massarelli, Luca, et al. "Safe: Self-attentive function embeddings for binary similarity." International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, Cham, 2019.
- Duan, Yue, et al. "DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing." Network and Distributed System Security Symposium (NDSS). 2020.
- Pei, Kexin, et al. "TREX: Learning Execution Semantics from Micro-Traces for Binary Similarity." 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 2021.