Automatic-Exploit-Generation

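Software vulnerability discovery is currently a hot topic. Fuzzing has largely solved the problem of automatically discovering program vulnerabilities, and parallel fuzzing platforms can already find large numbers of program errors efficiently; however, both defenders and attackers care more about whether these vulnerabilities or errors can actually be exploited. How to quickly analyze and assess the exploitability of a vulnerability is one of the key problems in vulnerability discovery and analysis today. Traditionally, exploits are constructed by hand. This requires fairly comprehensive knowledge of low-level system internals (file formats, assembly code, operating system internals, processor architecture, and so on), as well as deep, detailed analysis of the vulnerability's mechanism, before a successful exploit can be built. As software grows more complex and vulnerabilities more diverse, this traditional approach can no longer keep up. With the continuing development of program analysis, and especially since taint analysis and symbolic execution have been applied successfully to dynamic software analysis and vulnerability discovery, researchers have begun to use these techniques to construct software exploits automatically and efficiently.

-- "Research Progress on Automatic Exploitation of Software Vulnerabilities" (《软件漏洞自动利用研究进展》)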

Paper List:

Chinese:

English:

The classic papers in the AEG field are organized as shown in the figure below:


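China Education Network (《中国教育网络》), 2016, Issue Z1  
Institute of Software, Chinese Academy of Sciences  
He Liang (和亮), Su Purui (苏璞睿)

Abstract:

Software vulnerability discovery is currently a hot topic. Fuzzing has largely solved the problem of automatically discovering program vulnerabilities, and parallel fuzzing platforms can already find large numbers of program errors efficiently; however, both defenders and attackers care more about whether these vulnerabilities or errors can actually be exploited. How to quickly analyze and assess the exploitability of a vulnerability is one of the key problems in vulnerability discovery and analysis today. Traditionally, exploits are constructed by hand. This requires fairly comprehensive knowledge of low-level system internals (file formats, assembly code, operating system internals, processor architecture, and so on), as well as deep, detailed analysis of the vulnerability's mechanism, before a successful exploit can be built. As software grows more complex and vulnerabilities more diverse, this traditional approach can no longer keep up. With the continuing development of program analysis, and especially since taint analysis and symbolic execution have been applied successfully to dynamic software analysis and vulnerability discovery, researchers have begun to use these techniques to construct software exploits automatically and efficiently.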


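Xidian University  
2017.06  
Dai Chunchun (戴春春), Duan Zhenhua (段振华), Wang Xiaobing (王小兵) (Master of Engineering)

Abstract:

As networks have become ubiquitous, information security has come under serious threat. By exploiting software vulnerabilities, attackers can obtain all kinds of information that users store on phones, computers, and websites.
Both attackers and defenders look for vulnerabilities in software: the former want to exploit them to attack systems, while the latter want to fix them and defend against attacks. The most reliable way to prove that a vulnerability is high-risk is to construct an exploit for it. Both sides therefore pay close attention to exploit-related research.
Constructing an exploit by hand requires extensive low-level knowledge, including assembly language, operating systems, and CPU architecture; it demands a high level of skill and consumes a great deal of time and effort. As software grows, programs perform complex computation at run time and contain a very large number of path branches, and exploitation requires analyzing both the control flow and the data flow of the software. Doing all of this by hand is clearly difficult.
This thesis proposes an automatic exploit generation algorithm. Given binary executable code, the algorithm automatically discovers vulnerabilities in the program, analyzes them, and creates a carefully crafted input. Driving the program with that input triggers the vulnerability, hijacks the program's control flow, and executes malicious code.
The algorithm uses fuzzing to discover vulnerabilities automatically and records the inputs that crash the program, so that the vulnerable path behind each crash can be analyzed. In the subsequent analysis, taint analysis is used to obtain control-flow and data-flow information and to record the memory layout affected by the input. By analyzing the input-dependent control-flow constraints observed during execution, the algorithm ensures that the program still reaches the vulnerable location when run with the newly constructed exploit. By placing jump instructions and shellcode in memory, it ensures that the program can jump to the malicious code.
To be able to analyze dynamically generated and downloaded code, the algorithm implements this process on a dynamic binary instrumentation framework. It uses multiple combinations of shellcode and jump instructions and adds analysis of further jump instructions, which both raises the probability of constructing a working exploit and guards against checks that target one particular jump instruction.
Theoretical analysis and experimental results show that the proposed automatic exploit generation algorithm is effective.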
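The first stage described in this abstract, fuzzing the target and recording the inputs that crash it for later analysis, can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: `target`, `fuzz`, the one-byte mutation strategy, and the toy length-field bug are all assumptions made for the example.

```python
import random
from typing import List


def target(data: bytes) -> None:
    """Hypothetical toy target: byte 0 is a length field that must not exceed the payload size."""
    if len(data) >= 2 and data[0] > len(data) - 1:
        raise IndexError("length field points past the end of the input")


def fuzz(seed_input: bytes, rounds: int, rng: random.Random) -> List[bytes]:
    """Mutate one random byte per round and keep every input that crashes the target."""
    crashes: List[bytes] = []
    for _ in range(rounds):
        mutated = bytearray(seed_input)
        position = rng.randrange(len(mutated))
        mutated[position] = rng.randrange(256)
        try:
            target(bytes(mutated))
        except IndexError:
            # A real fuzzer would save the input to disk for the taint-analysis stage.
            crashes.append(bytes(mutated))
    return crashes


crashing_inputs = fuzz(b"\x03abc", rounds=200, rng=random.Random(0))
print(len(crashing_inputs), "crashing inputs recorded")
```

Each recorded input corresponds to one crash path; in the approach the abstract describes, those inputs are what the taint-analysis stage would then replay.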


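Netinfo Security (《信息网络安全》), 2017, Issue 5  
Information Engineering University (信大)  
Peng Jianshan (彭建山), Xi Qi (奚琪), Wang Qingxian (王清贤)

Abstract:

Integer overflow has become the second most common class of vulnerability threatening software security. Existing integer-overflow discovery tools do not support automatic vulnerability verification, and existing automatic verification tools do not support the integer-overflow pattern. This paper therefore proposes an automatic verification method for integer overflow vulnerabilities in binary programs to fill that gap. It targets IO2BO vulnerabilities, the valuable subset of integer overflows. To keep the program from crashing during the buffer overflow, which would make control-flow hijacking impossible, the method uses taint analysis to build a set of suspicious taints that narrows the range of taints to analyze, traces taint sources with taint backtracking, collects the loop conditions of memory read and write operations through symbolic execution, and controls the number of loop iterations so that key stack data is overwritten. Finally, it generates a new sample through constraint solving, turning the automatic verification of IO2BO vulnerabilities into the automatic verification of traditional buffer overflow vulnerabilities. Experiments show that the method can automatically verify typical IO2BO vulnerabilities and generate new samples that hijack control flow and execute arbitrary code.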
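An IO2BO bug (integer overflow leading to buffer overflow) starts with a size computation that wraps around. The sketch below shows only that arithmetic, assuming a 32-bit unsigned size type; the function names and the example values are hypothetical and are not taken from the paper.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit unsigned size type


def alloc_size_32(count: int, elem_size: int) -> int:
    """Size a 32-bit program would compute for count * elem_size (wraps modulo 2**32)."""
    return (count * elem_size) & MASK32


def is_io2bo(count: int, elem_size: int) -> bool:
    """True when the wrapped allocation is smaller than the bytes a copy loop would write."""
    return alloc_size_32(count, elem_size) < count * elem_size


# A record count taken from untrusted input, with 8-byte elements.
count, elem_size = 0x20000001, 8
print(hex(alloc_size_32(count, elem_size)))  # 0x8: only 8 bytes are allocated
print(is_io2bo(count, elem_size))            # True: the copy loop overruns the buffer
```

Because the copy loop would run for `count` iterations, the program usually crashes long before control data is overwritten, which is why the paper bounds the loop count before solving for a new sample.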


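Computer Systems & Applications (《计算机系统应用》), October 2017  
University of Chinese Academy of Sciences; Institute of Software, Chinese Academy of Sciences; Shenzhen Audencia Business School, Shenzhen University  
Wan Yunpeng (万云鹏), Deng Yi (邓艺), Shi Donghui (石东辉), Cheng Liang (程亮), Zhang Yang (张阳)

Abstract:

In this paper we present BAEG, a system that automatically finds exploits for binary programs. BAEG produces a control-flow hijacking exploit for every vulnerability it finds, which guarantees that each reported vulnerability is security-relevant and exploitable. BAEG analyzes inputs that crash the program and faces two main challenges:

  1. how to reproduce the crash path and obtain the crash state;
  2. how to automatically generate a control-flow hijacking exploit.

For the first, the paper proposes a path-guided algorithm that treats the crash input as symbolic values and replays the crash path. For the second, we summarize the principles behind several kinds of control-flow hijacking and build a corresponding exploit generation model for each. In addition, for illegal symbolic reads and writes, BAEG can let the program continue executing from the crash point, explore deeper code, and check whether further exploitable points lie deeper along the crash path.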

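Huazhong University of Science and Technology  
May 2017  
He Xuan (贺玄), Li Weiming (李伟明)

Abstract:

How to automatically discover vulnerabilities in binary programs and generate exploit code is a current hot topic in software security research. Existing automated exploitation schemes are still at a fairly early stage: the exploit types they handle are simple and the generated exploits are not verified further. This thesis therefore proposes an automated exploitation method for binary programs that aims to improve on current techniques.
The method consists of three modules: a dynamic symbolic execution module, a verification module, and an exploit correction module. The dynamic symbolic execution module explores paths that contain vulnerabilities, determines the vulnerability type from the state at the vulnerable point, and then checks whether the exploitation constraints can be satisfied; if so, it adds those constraints and generates an exploit. Two vulnerability types are supported, stack overflow and function pointer overwrite, and exploitation falls into two categories, code injection and code reuse, with four concrete schemes in total. The verification module feeds the generated exploit to the real program and determines whether it works. If a crash occurs during verification, the exploit is passed to the exploit correction module. That module uses dynamic taint tracking at the assembly instruction level to follow how the exploit propagates while the program runs, determines which bytes of the exploit caused the crash, then mutates those bytes and re-verifies, repeating until the exploit works.
Based on this method, a prototype system was implemented. It uses the symbolic execution engine Angr as the platform for dynamic symbolic execution of binary programs, covering vulnerability discovery and exploitation, and uses Pin as the binary instrumentation platform, instrumenting taint propagation at the assembly instruction level to perform dynamic taint tracking and automatically correct exploits. In tests on 11 CTF challenge samples, the prototype fully automatically completed vulnerability discovery, exploitation, and exploit correction for very small programs, showing that correcting exploits with dynamic taint tracking is effective and feasible.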


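Computer Science (《计算机科学》), February 2019  
College of Command and Control Engineering, Army Engineering University  
Fang Hao (方皓), Wu Lifa (吴礼发), Wu Zhiyong (吴志勇)

Abstract:

Return-to-dl-resolve is a general exploitation technique that can bypass complex protection mechanisms. At present it is mainly carried out by hand: researchers must analyze and understand ELF dynamic linking in depth, leak and resolve the address of an arbitrary library function, and assemble the attack payload, which is very inefficient. This paper proposes a symbolic-execution-based method for automating Return-to-dl-resolve. The method provides a symbolic execution environment for ELF executables, constrains the symbolic state at the program's crash point, and solves the constraints with a constraint solver, yielding R2dlAEG, a system that automatically generates Return-to-dl-resolve exploit code. Experimental results show that R2dlAEG constructs exploit code quickly and can hijack the program's control flow with both NX and ASLR enabled.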


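University of Electronic Science and Technology of China  
2017.06  
Zhang Xiaosong (张小松), Liu Luyao (刘路遥)

Abstract:

In recent years malicious Internet attacks have become frequent, and the number of attack samples captured by security platforms keeps growing, making sample analysis a focus of Internet security research. A core step in sample analysis is vulnerability verification, that is, verifying whether a sample uses a software vulnerability to attack; the items verified include the vulnerability type and the attack technique. Vulnerability verification has traditionally been done by manual analysis, which is inefficient and costly, so a new automated method that eases these problems is worthwhile. To meet this need, this thesis starts from dynamic analysis, proposes a new method for automated vulnerability verification over large numbers of samples, and designs and implements a prototype verification system. The method has two aspects, environment setup and verification rule design, and the thesis studies them as follows:
1. The thesis studies two main problems in environment setup. First, what characterizes the environment needed to verify a single sample? Second, when sample types are diverse, how can a scheme satisfy the resulting diverse environment requirements? The thesis summarizes the characteristics of the environment a single sample needs and describes them formally. On that basis it proposes an environment setup scheme for automated vulnerability verification, which distributes samples to an environment cluster to raise the success rate of environment matching. To address the high cost of building an environment cluster, it further proposes a software environment set partitioning algorithm that reduces setup cost without lowering verification accuracy.
2. The thesis studies automated verification rules for buffer overflow vulnerabilities and ROP attacks. By analyzing the characteristics of function return addresses when a buffer overflow is triggered, it proposes a rule based on function return address matching to verify whether a sample triggered a buffer overflow. By constructing a large number of ROP chains and analyzing the length characteristics of the gadgets in them, as well as the behavior of the two instructions preceding the Ret instruction in each gadget, it builds a hybrid ROP detection scheme based on statistical and behavioral rules.
3. After weighing the strengths and weaknesses of dynamic analysis techniques on different platforms, the thesis chooses debugger-based and dynamic instrumentation techniques, builds the environment cluster, and implements a prototype automated vulnerability verification system. Functional and performance tests with real attack samples show that the system has a lower false positive rate and better performance.

Note:

The "automated vulnerability verification" problem addressed in this thesis belongs to the vulnerability information analysis step of malicious sample analysis. The information analyzed includes whether a vulnerability is exploited, the type of the vulnerability, how it is exploited, and the details of the exploitation (such as the address where the vulnerability is triggered and the context at that moment). Verification relies mainly on dynamic tracing of the program to recover its execution flow, combined with object lifetime analysis and thread context analysis, to obtain the run-time memory space, memory permissions, and function call stack. These are then used to decide whether illegal behavior such as an overflow has occurred.
Vulnerability verification is a process of reverse-engineering a program. Its goal is to determine whether the program exploits a vulnerability, and if so how and with what details. It is not the same kind of problem as automatic exploit generation.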
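One plausible reading of the return-address matching rule in item 2 is a shadow-stack comparison: remember the return address at each call and flag any return that goes somewhere else. The sketch below works on a simplified event trace; the trace format and the function name are assumptions for illustration, whereas the thesis applies its rules to real programs through a debugger and dynamic instrumentation.

```python
from typing import Iterable, List, Tuple

# ("call", return_address) when a call pushes its return address,
# ("ret", target_address) when a return transfers control.
Event = Tuple[str, int]


def find_return_mismatches(trace: Iterable[Event]) -> List[int]:
    """Return the trace indices of returns whose target differs from the saved address."""
    shadow_stack: List[int] = []
    mismatches: List[int] = []
    for index, (kind, address) in enumerate(trace):
        if kind == "call":
            shadow_stack.append(address)
        elif kind == "ret":
            expected = shadow_stack.pop() if shadow_stack else None
            if expected != address:
                # The stored return address was changed between call and return.
                mismatches.append(index)
    return mismatches


trace = [
    ("call", 0x401005),
    ("call", 0x401020),
    ("ret", 0x401020),     # matches the innermost call
    ("ret", 0x41414141),   # overwritten return address
]
print(find_return_mismatches(trace))  # [3]
```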


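University of Electronic Science and Technology of China  
2016.06  
Yang Guowu (杨国武), Xiang Qi (向琦) (Master of Engineering)

Abstract:

In recent years, as technology has developed and the Internet has matured, Internet security has become a widely discussed topic, and vulnerabilities, which lie at its core, are a major research subject for companies and research institutions. The thesis first studies vulnerability theory and the principles of instrumentation, and then derives a design for an instrumentation-based automated vulnerability verification platform. Functionally the platform has two parts: a vulnerability verification control end and a vulnerability verification server end. The control end uses a simple MFC interface for human-computer interaction and configuration of vulnerability parameters, and also contains a virtual machine management module that distributes samples to the appropriate virtual machines. The server end contains the virtual machine cluster management module; each virtual machine has the verification server built in to automatically verify vulnerabilities in unknown samples. After a sample is verified, its behavior is recorded by monitoring the file system, and a vulnerability database stores the sample information. The main research content is as follows:
1. Instrumentation tools were surveyed in depth and their strengths and weaknesses compared. For Pin, the tool used in this thesis, its working principle and the writing of instrumentation rules were studied in detail.
2. The concept of an instrumentation-based automated vulnerability verification platform is proposed for the Linux environment. The platform uses a C/S architecture, i.e. a control end and a server end, to automate verification. The control end comprises a user interaction module, which handles human-computer interaction and issues verification commands; a virtual machine management module, which classifies and handles the validity of the vulnerability samples to be verified; and a communication module, which uses overlapped I/O for asynchronous interaction. The server end comprises a vulnerability trigger determination module, which instruments and verifies unknown samples; a vulnerability attack awareness module, which monitors sample behavior; and a vulnerability database, which stores vulnerability information.
3. The virtual machine management module is mainly responsible for deploying virtual machine environments, which vary by operating system version and application software version; the applications include chat tools, download tools, office tools, and so on. Software of the same kind is deployed in one test environment, which saves resources and simplifies management.
4. In the vulnerability trigger determination module, the thesis studies the trigger mechanisms of buffer overflow vulnerabilities and ROP vulnerabilities and writes the corresponding instrumentation rules; the platform verifies samples of both kinds well. The module also provides a verification interface so that other vulnerability types can be verified.
5. A Linux file monitoring module was designed that uses LKM techniques to monitor the file operations a sample performs after triggering a vulnerability.
6. A vulnerability database was built in which exploitable vulnerabilities can be recorded for professionals to query and reuse.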


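Journal of University of Chinese Academy of Sciences (《中国科学院大学学报》), 2015, Issue 03  
National Computer Network Intrusion Protection Center, University of Chinese Academy of Sciences  
Li Xiaoqi (李晓琦), Liu Qixu (刘奇旭), Zhang Yuqing (张玉清)

Abstract:

Targeting kernel-level privilege escalation vulnerabilities on Linux, and following the idea of vulnerability detection through simulated attack, the authors design and develop KernelPET, an automatic vulnerability exploitation system that reveals how typical privilege escalation vulnerabilities are exploited and thereby supports vulnerability defense. KernelPET connects to mainstream vulnerability databases such as exploit-db and securityfocus. Nearly one hundred privilege escalation vulnerabilities were tested by simulated attack, and 30 classic Linux kernel privilege escalation vulnerabilities were selected, loaded into the KernelPET exploit code library, and tested on Linux platforms with different kernels and distributions. Experimental results show that KernelPET works well across many Linux distributions.

Note:

This paper focuses on "automatic exploitation of vulnerabilities", not "automatic generation of exploits".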


@article{ 
title={CRAX: Software Crash Analysis for Automatic Exploit Generation by Modeling Attacks as Symbolic Continuations}, 
author={Shih-Kun Huang, Min-Hsiang Huang, Po-Yen Huang}, 
booktitle={IEEE Sixth International Conference on Software Security and Reliability},
year={2012}
}

Abstract:

We present a simple framework capable of automatically generating attacks that exploit control flow hijacking vulnerabilities. We analyze given software crashes and perform symbolic execution in concolic mode, using a whole system environment model. The framework uses an end-to-end approach to generate exploits for various applications, including 16 medium scale benchmark programs, and several large scale applications, such as Mplayer (a media player), Unrar (an archiver) and Foxit (a PDF reader), with stack/heap overflow, off-by-one overflow, use of uninitialized variable, and format string vulnerabilities. Notably, these applications have been typically regarded as fuzzing preys, but still require a manual process with security knowledge to produce mitigation-hardened exploits. Using our system to produce exploits is a fully automated and straightforward process for crashed software without source. We produce the exploits within six minutes for medium scale of programs, and as long as 80 minutes for mplayer (about 500,000 LOC), after constraint reductions. Our results demonstrate that the link between software bugs and security vulnerabilities can be automatically bridged.

Notes:

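The authors implement CRAX, an automatic exploit generation system. It takes as input a binary program and a crash input that makes the program crash, analyzes them with an S2E-based concolic method, and finally generates an exploit.

CRAX can analyze five types of vulnerabilities (the abstract above lists stack overflow, heap overflow, off-by-one overflow, use of uninitialized variables, and format string) and implements three exploitation methods: "Return-to-memory", "Return-to-libc", and "Jump-to-register".

The authors designed five groups of experiments to evaluate the effectiveness of CRAX:

  1. automatically generating exploits for the five different vulnerability types;
  2. generating different types of exploits to counter protection mechanisms, demonstrating that the system is effective in real environments;
  3. testing the 16 real samples from AEG for a side-by-side comparison with AEG;
  4. comparing the original concolic execution with the improved version to show that the optimizations are effective;
  5. testing large applications in a real environment.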

@article{
title={Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities}, 
author={Sean Heelan}, 
booktitle={MSc thesis, University of Oxford}, 
year={2009}
}

Abstract:

Software bugs that result in memory corruption are a common and dangerous feature of systems developed in certain programming languages. Such bugs are security vulnerabilities if they can be leveraged by an attacker to trigger the execution of malicious code. Determining if such a possibility exists is a time consuming process and requires technical expertise in a number of areas. Often the only way to be sure that a bug is in fact exploitable by an attacker is to build a complete exploit. It is this process that we seek to automate. We present a novel algorithm that integrates data-flow analysis and a decision procedure with the aim of automatically building exploits. The exploits we generate are constructed to hijack the control flow of an application and redirect it to malicious code. Our algorithm is designed to build exploits for three common classes of security vulnerability; stack-based buffer overflows that corrupt a stored instruction pointer, buffer overflows that corrupt a function pointer, and buffer overflows that corrupt the destination address used by instructions that write to memory. For these vulnerability classes we present a system capable of generating functional exploits in the presence of complex arithmetic modification of inputs and arbitrary constraints. Exploits are generated using dynamic data-flow analysis in combination with a decision procedure. To the best of our knowledge the resulting implementation is the first to demonstrate exploit generation using such techniques. We illustrate its effectiveness on a number of benchmarks including a vulnerability in a large, real-world server application.


@article{ 
title={ SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis },
author={ Shoshitaishvili, Yan and Wang, Ruoyu and Salls, 
Christopher and Stephens, Nick and Polino, Mario and 
Dutcher, Audrey and Grosen, John and Feng, Siji and 
Hauser, Christophe and Kruegel, Christopher and Vigna, 
Giovanni }, 
booktitle={ S&P }, 
year={ 2016 }
}

Abstract:

Finding and exploiting vulnerabilities in binary code is a challenging task. The lack of high-level, semantically rich information about data structures and control constructs makes the analysis of program properties harder to scale. However, the importance of binary analysis is on the rise. In many situations binary analysis is the only possible way to prove (or disprove) properties about the code that is actually executed. In this paper, we present a binary analysis framework that implements a number of analysis techniques that have been proposed in the past. We present a systematized implementation of these techniques, which allows other researchers to compose them and develop new approaches. In addition, the implementation of these techniques in a unifying framework allows for the direct comparison of these approaches and the identification of their advantages and disadvantages. The evaluation included in this paper is performed using a recent dataset created by DARPA for evaluating the effectiveness of binary vulnerability analysis techniques. Our framework has been open-sourced and is available to the security community.


@article{ 
title={ Software Crash Analysis for Automatic Exploit Generation on Binary Programs }, 
author={Shih-Kun Huang, Min-Hsiang Huang, Po-Yen Huang, Chung-Wei Lai }, 
booktitle={ IEEE Transactions on Reliability },  
year={2014} 
}

Abstract:

This paper presents a new method, capable of automatically generating attacks on binary programs from software crashes. We analyze software crashes with a symbolic failure model by performing concolic executions following the failure directed paths, using a whole system environment model and concrete address mapped symbolic memory in S2E. We propose a new selective symbolic input method and lazy evaluation on pseudo symbolic variables to handle symbolic pointers and speed up the process. This is an end-to-end approach able to create exploits from crash inputs or existing exploits for various applications, including most of the existing benchmark programs, and several large scale applications, such as a word processor (Microsoft Office Word), a media player (mplayer), an archiver (unrar), or a PDF reader (foxit). We can deal with vulnerability types including stack and heap overflows, format string, and the use of uninitialized variables. Notably, these applications have become software fuzz testing targets, but still require a manual process with security knowledge to produce mitigation-hardened exploits. Using this method to generate exploits is an automated process for software failures without source code. The proposed method is simpler, more general, faster, and can be scaled to larger programs than existing systems. We produce the exploits within one minute for most of the benchmark programs, including mplayer. We also transform existing exploits of Microsoft Office Word into new exploits within four minutes. The best speedup is 7,211 times faster than the initial attempt. For heap overflow vulnerability, we can automatically exploit the unlink() macro of glibc, which formerly requires sophisticated hacking efforts.


@article{ 
title={FUZE: Towards Facilitating Exploit Generation for Kernel Use-After-Free Vulnerabilities}, 
author={Wei Wu, Yueqi Chen, Jun Xu, Xinyu Xing, Xiaorui Gong, Wei Zou}, 
booktitle={USENIX Security}, 
year={2018}
}

Abstract:

Software vendors usually prioritize their bug remediation based on ease of their exploitation. However, accurately determining exploitability typically takes tremendous hours and requires significant manual efforts. To address this issue, automated exploit generation techniques can be adopted. In practice, they however exhibit an insufficient ability to evaluate exploitability particularly for the kernel Use-After-Free (UAF) vulnerabilities. This is mainly because of the complexity of UAF exploitation as well as the scalability of an OS kernel. In this paper, we therefore propose FUZE, a new framework to facilitate the process of kernel UAF exploitation. The design principle behind this technique is that we expect the ease of crafting an exploit could augment a security analyst with the ability to evaluate the exploitability of a kernel UAF vulnerability. Technically, FUZE utilizes kernel fuzzing along with symbolic execution to identify, analyze and evaluate the system calls valuable and useful for kernel UAF exploitation. In addition, it leverages dynamic tracing and an off-the-shelf constraint solver to guide the manipulation of vulnerable object. To demonstrate the utility of FUZE, we implement FUZE on a 64-bit Linux system by extending a binary analysis framework and a kernel fuzzer. Using 15 real-world kernel UAF vulnerabilities on Linux systems, we then demonstrate FUZE could not only escalate kernel UAF exploitability but also diversify working exploits. In addition, we show that FUZE could facilitate security mitigation bypassing, making exploitability evaluation less challenging and more efficient.


@article{ 
title={Automated Exploit Generation for Stack Buffer Overflow Vulnerabilities}, 
author={V. A. Padaryan, V. V. Kaushan, A. N. Fedotov}, 
booktitle={Programming and Computer Software}, 
year={2015}
}

Abstract:

An automated method for exploit generation is presented. This method allows one to construct exploits for stack buffer overflow vulnerabilities and to prioritize software bugs. The method is based on the dynamic analysis and symbolic execution of programs. It could be applied to program binaries and does not require debug information. The proposed method was used to develop a tool for exploit generation. This tool was used to generate exploits for eight vulnerabilities in Linux and Windows programs, of which three were not fixed at the time this paper was written.

Notes:

Presently, a lot of solvers are available, such as MiniSat, OpenSMT, STP, Yices, Z3, and others. We used the Z3 due to the following advantages:

  • incremental approach to the solution of equations;
  • support of many data types, including machine-level data types;
  • there is a C API that allows one to directly invoke the equation solver, which is much more efficient than the work with text input;
  • the source code under the MSR-LA license is available;
  • it is faster than other solvers.

@article{ 
title={AEG: Automatic Exploit Generation}, 
author={Thanassis Avgerinos, Sang Kil Cha, Brent Lim Tze Hao, David Brumley}, 
booktitle={NDSS}, 
year={2011}
}

Abstract:

The automatic exploit generation challenge is given a program, automatically find vulnerabilities and generate exploits for them. In this paper we present AEG, the first end-to-end system for fully automatic exploit generation. We used AEG to analyze 14 open-source projects and successfully generated 16 control flow hijacking exploits. Two of the generated exploits (expect-5.43 and htget-0.93) are zero-day exploits against unknown vulnerabilities. Our contributions are:

  1. we show how exploit generation for control flow hijack attacks can be modeled as a formal verification problem,
  2. we propose preconditioned symbolic execution, a novel technique for targeting symbolic execution,
  3. we present a general approach for generating working exploits once a bug is found, and
  4. we build the first end-to-end system that automatically finds vulnerabilities and generates exploits that produce a shell.

Comment:

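AEG is a source-code-based automatic exploitation scheme. To overcome APEG's dependence on patches and its inability to construct control-flow hijacks, T. Avgerinos et al. proposed AEG at NDSS 2011, the first effective method for automatic vulnerability discovery and exploitation. Its core idea is to use program verification techniques to find inputs that drive the program into an unsafe state that is also exploitable; unsafe states include out-of-bounds memory writes and malicious format strings, and exploitable mainly means that the program's EIP can be set arbitrarily. The workflow is as follows. First, in preprocessing, the GNU C compiler builds the binary and LLVM produces the required bytecode. Second, during analysis, AEG uses source-code analysis and symbolic execution to locate the bug and generates a corresponding input from the path constraints. Next, AEG uses dynamic analysis to extract run-time information such as the address of the vulnerable buffer on the stack, the return address of the vulnerable function, and other environment data from before the vulnerability is triggered. Finally, it combines the exploit constraints with the run-time environment information to build a working exploit sample. Automatic exploitation experiments on 14 sets of real program vulnerabilities demonstrated the method's reliability and effectiveness.

AEG integrates optimized symbolic execution and dynamic instruction instrumentation, covers the whole process from automatic vulnerability discovery to automatic exploitation, and produces samples that directly hijack control flow; it is the first true automated construction scheme for control-flow exploits. Its limitations are, first, that it depends on source code to search for program errors, and second, that the exploits it constructs mainly target stack overflows or format string vulnerabilities and are constrained by factors such as the compiler and the dynamic run-time environment.

-- "Research Progress on Automatic Exploitation of Software Vulnerabilities" (《软件漏洞自动利用研究进展》)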
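The core idea in the comment above, finding an input that both reaches the unsafe state and makes that state exploitable, amounts to satisfying two predicates at once. The toy sketch below replaces AEG's symbolic execution and constraint solver with brute-force enumeration over a tiny input space. Both predicates, the four-byte input format, and the byte domain are hypothetical stand-ins chosen for the example.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

Input = Tuple[int, ...]


def path_predicate(candidate: Input) -> bool:
    """Toy 'bug' predicate: the input passes a header check and is long enough to overrun."""
    return len(candidate) == 4 and candidate[0] == 0x7F


def exploit_predicate(candidate: Input, target: int) -> bool:
    """Toy 'exploit' predicate: the byte landing on the saved control value equals target."""
    return candidate[3] == target


def solve(domain: Sequence[int], length: int, target: int) -> Optional[Input]:
    """Return the first input satisfying both predicates, or None if there is none."""
    for candidate in product(domain, repeat=length):
        if path_predicate(candidate) and exploit_predicate(candidate, target):
            return candidate
    return None


domain = (0x00, 0x41, 0x7F, 0xEE)
print(solve(domain, length=4, target=0xEE))  # (127, 0, 0, 238)
```

Enumeration only works here because the input space has 256 candidates; a real system hands the conjunction of both predicates to a constraint solver instead.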


@article{ 
title={Automatic Exploit Generation}, 
author={Thanassis Avgerinos, Sang Kil Cha}, 
booktitle={Communications of the ACM}, 
year={2014}
}

Abstract:

Attackers commonly exploit buggy programs to break into computers. Security-critical bugs pave the way for attackers to install trojans, propagate worms, and use victim computers to send spam and launch denial-of-service attacks. A direct way, therefore, to make computers more secure is to find security-critical bugs before they are exploited by attackers. Unfortunately, bugs are plentiful. For example, the Ubuntu Linux bug-management database listed more than 103,000 open bugs as of January 2013. Specific widely used programs (such as the Firefox Web browser and the Linux 3.x kernel) list 7,597 and 1,293 open bugs in their public bug trackers, respectively. Other projects, including those that are closed-source, likely involve similar statistics. These are just the bugs we know; there is always the persistent threat of zero-day exploits, or attacks against previously unknown bugs. Among the thousands of known bugs, which should software developers fix first? Which are exploitable?


@article{ 
title={Unleashing MAYHEM on Binary Code}, 
author={Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert and David Brumley}, 
booktitle={S&P}, 
year={2012}
}

Abstract:

In this paper we present MAYHEM, a new system for automatically finding exploitable bugs in binary (i.e., executable) programs. Every bug reported by MAYHEM is accompanied by a working shell-spawning exploit. The working exploits ensure soundness and that each bug report is security-critical and actionable. MAYHEM works on raw binary code without debugging information. To make exploit generation possible at the binary-level, MAYHEM addresses two major technical challenges: actively managing execution paths without exhausting memory, and reasoning about symbolic memory indices, where a load or a store address depends on user input. To this end, we propose two novel techniques:

  1. hybrid symbolic execution for combining online and offline (concolic) execution to maximize the benefits of both techniques, and
  2. index-based memory modeling, a technique that allows MAYHEM to efficiently reason about symbolic memory at the binary level.

We used MAYHEM to find and demonstrate 29 exploitable vulnerabilities in both Linux and Windows programs, 2 of which were previously undocumented.

Comment:

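A binary-based automatic exploitation scheme. To remove the dependence on source code and widen the range of applicable scenarios, S. K. Cha et al. proposed Mayhem, a method for automatic exploit generation on binary programs, at IEEE S&P 2012. It combines the speed of online symbolic execution with the low memory consumption of offline symbolic execution and builds an index-based memory model, yielding a fairly practical method for vulnerability discovery and automatic exploit generation. The workflow is as follows. First, two parallel subsystems are built, a concrete execution subsystem and a symbolic execution subsystem. Second, the concrete execution subsystem uses taint propagation to find every jmp or call instruction that user input can control during execution and passes these to the symbolic execution subsystem as bug candidates. Next, the symbolic execution subsystem translates each tainted instruction it receives into intermediate-language instructions and builds the path constraints and the exploitability constraints. Finally, it uses a constraint solver to find an exploit sample that satisfies both path reachability and exploitability.

To keep symbolic execution efficient, Mayhem uses an index-based memory model to optimize loads from symbolic memory, which makes it a highly usable automatic exploitation scheme. Mayhem's current limitations fall into three areas. First, the system models only some system and library functions, so it cannot handle large programs efficiently. Second, it cannot handle multi-threaded interaction such as message passing and shared memory. Third, because it uses taint propagation, it also suffers from typical problems such as under-tainting and over-tainting.

-- "Research Progress on Automatic Exploitation of Software Vulnerabilities" (《软件漏洞自动利用研究进展》)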
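The taint step in the comment above, flagging jmp or call instructions whose target user input can control, can be illustrated on a toy instruction list. This is a sketch under strong assumptions: the three-operand tuple format, the tiny instruction set, and the register names are invented for the example, and a real system tracks taint at byte granularity over actual machine code.

```python
from typing import List, Set, Tuple

Instr = Tuple[str, ...]  # e.g. ("mov", "eax", "input") or ("jmp", "ebx")


def tainted_jumps(program: List[Instr], tainted_inputs: Set[str]) -> List[int]:
    """Return indices of jmp/call instructions whose target operand is tainted."""
    tainted = set(tainted_inputs)
    hits: List[int] = []
    for index, instr in enumerate(program):
        op = instr[0]
        if op == "mov":            # mov dst, src: dst takes the taint status of src
            _, dst, src = instr
            if src in tainted:
                tainted.add(dst)
            else:
                tainted.discard(dst)
        elif op == "add":          # add dst, src: dst becomes tainted if src is
            _, dst, src = instr
            if src in tainted:
                tainted.add(dst)
        elif op == "const":        # const dst, value: loading a constant clears taint
            tainted.discard(instr[1])
        elif op in ("jmp", "call") and instr[1] in tainted:
            hits.append(index)     # bug candidate: control transfer depends on input
    return hits


program = [
    ("mov", "eax", "input"),
    ("const", "ebx", "4"),
    ("add", "ebx", "eax"),
    ("jmp", "ebx"),
    ("const", "ecx", "0"),
    ("call", "ecx"),
]
print(tainted_jumps(program, {"input"}))  # [3]
```

Only the indirect jump at index 3 is reported; the call through `ecx` uses a constant and is not a candidate.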


@article{ 
title={Q: Exploit Hardening Made Easy}, 
author={Edward J. Schwartz, Thanassis Avgerinos and David Brumley}, 
booktitle={USENIX Security}, 
year={2011}
}

Abstract:

Prior work has shown that return oriented programming (ROP) can be used to bypass W⊕X, a software defense that stops shellcode, by reusing instructions from large libraries such as libc. Modern operating systems have since enabled address randomization (ASLR), which randomizes the location of libc, making these techniques unusable in practice. However, modern ASLR implementations leave smaller amounts of executable code unrandomized and it has been unclear whether an attacker can use these small code fragments to construct payloads in the general case. In this paper, we show defenses as currently deployed can be bypassed with new techniques for automatically creating ROP payloads from small amounts of unrandomized code. We propose using semantic program verification techniques for identifying the functionality of gadgets, and design a ROP compiler that is resistant to missing gadget types. To demonstrate our techniques, we build Q, an end-to-end system that automatically generates ROP payloads for a given binary. Q can produce payloads for 80% of Linux /usr/bin programs larger than 20KB. We also show that Q can automatically perform exploit hardening: given an exploit that crashes with defenses on, Q outputs an exploit that bypasses both W⊕X and ASLR. We show that Q can harden nine real-world Linux and Windows exploits, enabling an attacker to automatically bypass defenses as deployed by industry for those programs.

Comment:

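An automatic ROP code generation scheme. To address the obstacles that data execution prevention and address randomization pose for control-flow hijacking exploits, E. J. Schwartz et al. presented Q at USENIX Security 2011, a method for automatically generating ROP code for highly reliable exploits. Its core idea is to collect the gadgets in the target program and automatically build the ROP payload through a gadget-oriented programming language. The workflow is as follows. First, Q is given the unrandomized vulnerable program or other binary libraries and finds the set of gadgets with specific functionality. Second, the target code with the desired semantics is written in QooL, the programming language Q provides, and Q compiles it into a gadget-oriented instruction sequence. Then the collected gadgets are used to fill in that instruction sequence, forming the final ROP code. Experiments on 9 real software vulnerabilities show that with data execution prevention and address randomization enabled, Q still keeps the exploit code running reliably.

Q demonstrates that ROP code can still be built automatically and effectively on systems that contain only a small amount of unrandomized code, which strengthens control-flow hijacking exploits in real environments. Its limitations are, first, that it does not consider automatically building ROP without ret instructions, and second, that it is driven purely by practical effect and does not aim for Turing completeness.

-- "Research Progress on Automatic Exploitation of Software Vulnerabilities" (《软件漏洞自动利用研究进展》)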
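The first step of the workflow, collecting candidate gadgets from unrandomized code, is often introduced as a scan for short byte windows that end in a `ret`. The sketch below does only that on a raw byte string. It is far simpler than Q, which according to the abstract identifies what each gadget does with semantic verification techniques; the function name, the window length, and the sample bytes are assumptions for illustration, and no disassembly or validation is performed.

```python
from typing import List, Tuple

RET = 0xC3  # x86 near-return opcode


def find_ret_gadgets(code: bytes, max_len: int = 4) -> List[Tuple[int, bytes]]:
    """Return (offset, window) for every window of at most max_len bytes ending in RET."""
    gadgets: List[Tuple[int, bytes]] = []
    for end, byte in enumerate(code):
        if byte != RET:
            continue
        first_start = max(0, end - max_len + 1)
        for start in range(first_start, end + 1):
            gadgets.append((start, code[start:end + 1]))
    return gadgets


# 0x58 is `pop eax` and 0x5B is `pop ebx` on x86; 0x90 is `nop`.
sample = bytes([0x58, 0xC3, 0x90, 0x5B, 0xC3])
for offset, window in find_ret_gadgets(sample, max_len=2):
    print(offset, window.hex())
```

A gadget compiler then has to decide which of these windows decode to useful instructions and how to chain them, which is the part Q automates.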


@article{ 
title={Automatic Patch-Based Exploit Generation is Possible: Techniques and Implications}, 
author={David Brumley, Dawn Song, Jiang Zheng}, 
booktitle={S&P}, 
year={2008}
}

Abstract:

The automatic patch-based exploit generation problem is: given a program P and a patched version of the program P′, automatically generate an exploit for the potentially unknown vulnerability present in P but fixed in P′. In this paper, we propose techniques for automatic patch-based exploit generation, and show that our techniques can automatically generate exploits for 5 Microsoft programs based upon patches provided via Windows Update. Although our techniques may not work in all cases, a fundamental tenet of security is to conservatively estimate the capabilities of attackers. Thus, our results indicate that automatic patch-based exploit generation should be considered practical. One important security implication of our results is that current patch distribution schemes which stagger patch distribution over long time periods, such as Windows Update, may allow attackers who receive the patch first to compromise the significant fraction of vulnerable hosts who have not yet received the patch.

Comment:

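At IEEE S&P 2008, D. Brumley et al. first proposed APEG, a method for automatic exploit generation based on binary patch comparison. Its core idea rests on the assumption that the patched program adds a filtering check for the inputs that crash the original program. So if one can locate where the patch adds the check and construct a "violating" input that fails it, that input can be treated as an exploit candidate for the original program. According to the paper, the work has three steps. First, a binary diffing tool (such as BinDiff or EBDS) locates the patch, i.e. the check point in the patched program. Second, inputs that fail the check point are found and used as exploit candidates for the original program. Finally, monitoring techniques such as taint propagation filter the candidates down to effective exploits that cause an overflow, control-flow hijack, or other crash in the original program. Experiments on several patches released by Microsoft show that the method is reliable and practical.

APEG was the first attempt at automated exploit construction. Its core idea is simple, but because it is highly practical it has been widely recognized by other researchers. Its limitations are twofold. First, it cannot handle patches that do not add a filtering check, for example a patch that fixes a buffer overflow by enlarging the buffer. Second, in terms of actual effect, the exploits it constructs are mainly denial of service: they crash the original program but cannot directly hijack control flow.

-- "Research Progress on Automatic Exploitation of Software Vulnerabilities" (《软件漏洞自动利用研究进展》)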
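The assumption behind APEG, that the patch adds a check and any input failing that check is an exploit candidate for the unpatched program, can be shown with two toy functions. Everything here is hypothetical: `original`, `patched`, the 16-byte buffer, and the integer "inputs" stand in for real binaries, and plain enumeration stands in for the diffing and constraint solving the paper uses.

```python
from typing import Iterable, List

BUFFER_SIZE = 16


def original(length: int) -> str:
    """Unpatched toy program P: copies `length` bytes into a 16-byte buffer unchecked."""
    return "overflow" if length > BUFFER_SIZE else "ok"


def patched(length: int) -> str:
    """Patched toy program P': the patch adds a length check before the copy."""
    if length > BUFFER_SIZE:  # the check introduced by the patch
        return "rejected"
    return "ok"


def exploit_candidates(inputs: Iterable[int]) -> List[int]:
    """Inputs that fail the new check, kept only if the original actually misbehaves."""
    failing_new_check = [x for x in inputs if patched(x) == "rejected"]
    return [x for x in failing_new_check if original(x) != "ok"]


print(exploit_candidates(range(14, 20)))  # [17, 18, 19]
```

The second filter mirrors the final step in the comment: a candidate counts only if it really drives the original program into a bad state.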


@article{ 
title={Revery: From Proof-of-Concept to Exploitable}, 
author={Yan Wang, Chao Zhang}, 
booktitle={CCS}, 
year={2018}
}

Abstract:

Automatic exploit generation is an open challenge. Existing solutions usually explore in depth the crashing paths, i.e., paths taken by proof-of-concept (PoC) inputs triggering vulnerabilities, and generate exploits when exploitable states are found along the paths. However, exploitable states do not always exist in crashing paths. Moreover, existing solutions heavily rely on symbolic execution and are not scalable in path exploration and exploit generation. In addition, few solutions could exploit heap-based vulnerabilities. In this paper, we propose a new solution Revery to search for exploitable states in paths diverging from crashing paths, and generate control-flow hijacking exploits for heap-based vulnerabilities. It adopts three novel techniques: (1) a layout-contributor digraph to characterize a vulnerability’s memory layout and its contributor instructions; (2) a layout-oriented fuzzing solution to explore diverging paths, which have similar memory layouts as the crashing paths, in order to search more exploitable states and generate corresponding diverging inputs; (3) a control-flow stitching solution to stitch crashing paths and diverging paths together, and synthesize EXP inputs able to trigger both vulnerabilities and exploitable states.
We have developed a prototype of Revery based on the binary analysis engine angr, and evaluated it on a set of 19 CTF (capture the flag) programs. Experiment results showed that it could generate exploits for 9 (47%) of them, and generate EXP inputs able to trigger exploitable states for another 5 (26%) of them.


@article{ 
title={Automatic Polymorphic Exploit Generation for Software Vulnerabilities}, 
author={Minghua Wang, Purui Su, Qi Li, Lingyun Ying, Yi Yang, Dengguo Feng}, 
year={2013}
}

Abstract:

Generating exploits from the perspective of attackers is an effective approach towards severity analysis of known vulnerabilities. However, it remains an open problem to generate even one exploit using a program binary and a known abnormal input that crashes the program, not to mention multiple exploits. To address this issue, in this paper, we propose PolyAEG, a system that automatically generates multiple exploits for a vulnerable program using one corresponding abnormal input. To generate polymorphic exploits, we fully leverage different trampoline instructions to hijack control flow and redirect it to malicious code in the execution context. We demonstrate that, given a vulnerable program and one of its abnormal inputs, our system can generate polymorphic exploits for the program. We have successfully generated control flow hijacking exploits for 8 programs in our experiment. Particularly, we have generated 4,724 exploits using only one abnormal input for IrfanView, a widely used picture viewer.


@article{ 
title={Exploit Generation for Information Flow Leaks in Object-Oriented Programs}, 
author={Quoc Huy Do, Richard Bubel, Reiner}, 
booktitle={IFIP}, 
year={2015}
}

Abstract:

We present a method to generate automatically exploits for information flow leaks in object-oriented programs. Our approach combines self-composition and symbolic execution to compose an insecurity formula for a given information flow policy and a specification of the security level of the program locations. The insecurity formula gives then rise to a model which is used to generate input data for the exploit. A prototype tool called KEG implementing the described approach for Java programs has been developed, which generates exploits as executable JUnit tests.
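Self-composition checks for a leak by comparing two runs of the same program that agree on public inputs and differ only in the secret. The sketch below uses enumeration over small integer ranges where KEG, per the abstract, uses symbolic execution over Java programs; the two toy programs and the `find_leak` helper are assumptions for illustration.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple


def leaky(high: int, low: int) -> int:
    """Toy program whose public output depends on the secret `high`."""
    return low + (1 if high > 0 else 0)


def safe(high: int, low: int) -> int:
    """Toy program whose public output ignores the secret."""
    return low * 2


def find_leak(
    program: Callable[[int, int], int],
    highs: Iterable[int],
    lows: Iterable[int],
) -> Optional[Tuple[int, int, int]]:
    """Return (low, high1, high2) where equal low inputs give different outputs, else None."""
    high_values = list(highs)
    for low, h1, h2 in product(list(lows), high_values, high_values):
        if h1 != h2 and program(h1, low) != program(h2, low):
            return (low, h1, h2)  # witness input for a leak test case
    return None


print(find_leak(leaky, highs=[0, 1], lows=[0]))    # (0, 0, 1)
print(find_leak(safe, highs=[0, 1], lows=[0, 5]))  # None
```

The returned triple plays the role of the generated exploit: a concrete pair of runs that an observer of the public output can tell apart.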


@article{ 
title={Survey of Automated Vulnerability Detection and Exploit Generation Techniques in Cyber Reasoning Systems}, 
author={Teresa Nicole Brooks}, 
booktitle={AISC}, 
year={2018}
}

Abstract:

Software is everywhere, from mission critical systems such as industrial power stations, pacemakers and even household appliances. This growing dependence on technology and the increasing complexity of software has serious security implications as it means we are potentially surrounded by software that contains exploitable vulnerabilities. These challenges have made binary analysis an important area of research in computer science and has emphasized the need for building automated analysis systems that can operate at scale, speed and efficiency; all while performing with the skill of a human expert. Though great progress has been made in this area of research, there remains limitations and open challenges to be addressed. Recognizing this need, DARPA sponsored the Cyber Grand Challenge (CGC), a competition to showcase the current state of the art in systems that perform; automated vulnerability detection, exploit generation and software patching. This paper is a survey of the vulnerability detection and exploit generation techniques, underlying technologies and related works of two of the winning systems Mayhem and Mechanical Phish.


@article{ 
title={Towards Automated Exploit Generation for Embedded Systems}, 
author={Matthew Ruffell, Jin B. Hong, Hyoungshick Kim, Dong Seong Kim}, 
booktitle={LNCS}, 
year={2017}
}

Abstract:

Manual vulnerability discovery and exploit development on an executable are very challenging tasks for developers. Therefore, the automation of those tasks is becoming interesting in the field of software security. In this paper, we implement an approach of automated exploit generation for firmware of embedded systems by extending an existing dynamic analysis framework called Avatar. Embedded systems occupy a significant portion of the market but lack typical security features found on general purpose computers, making them prone to critical vulnerabilities. We discuss several techniques to automatically discover vulnerabilities and generate exploits for embedded systems, and evaluate our proposed approach by generating exploits for two vulnerable firmware written for a popular ARM Cortex-M3 microcontroller.


@article{ 
title={Transformation-aware Exploit Generation using a HI-CFG}, 
author={Dan Caselden, Alex Bazhanyuk, Mathias Payer, Dawn Song}, 
year={2013}
}

Abstract:

A common task for security analysts is to determine whether potentially unsafe code constructs (as found by static analysis or code review) can be triggered by an attacker-controlled input to the program under analysis. We refer to this problem as proof-of-concept (POC) exploit generation. Exploit generation is challenging to automate because it requires precise reasoning across a large code base; in practice it is usually a manual task. An intuitive approach to exploit generation is to break down a program's relevant computation into a sequence of transformations that map an input value into the value that can trigger an exploit. We automate this intuition by describing an approach to discover the buffer structure (the chain of buffers used between transformations) of a program, and use this structure to construct an exploit input by inverting one transformation at a time. We propose a new program representation, a hybrid information- and control-flow graph (HI-CFG), and give algorithms to build a HI-CFG from instruction traces. We then describe how to guide program exploration using symbolic execution to efficiently search for transformation pre-images. We implement our techniques in a tool that operates on applications in x86 binary form. In two case studies we discuss how our tool creates POC exploits for (i) a vulnerability in a PDF rendering library that is reachable through multiple different transformation stages and (ii) a vulnerability in the processing stage of a specific document format in AbiWord.
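The idea of inverting one transformation at a time can be seen on a toy decoding pipeline. In this sketch the target program base64-decodes its input and then zlib-decompresses it before a parser sees the data, so a value meant for the parser is pushed backwards through the inverse of each stage. The pipeline, the function names, and the use of known library inverses are assumptions for the example; the paper recovers the buffer structure from instruction traces and searches for pre-images with symbolic execution.

```python
import base64
import zlib


def program_decode(data: bytes) -> bytes:
    """Toy target: base64-decode, then zlib-decompress, then hand the result to a parser."""
    return zlib.decompress(base64.b64decode(data))


# Inverse of each stage, listed from the last transformation back to the first.
INVERSES = [zlib.compress, base64.b64encode]


def build_input(trigger: bytes) -> bytes:
    """Compute a program input whose decoded form equals `trigger`."""
    data = trigger
    for invert in INVERSES:
        data = invert(data)
    return data


trigger = b"A" * 64  # stand-in for the value that reaches the unsafe code
crafted = build_input(trigger)
print(program_decode(crafted) == trigger)  # True
```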


@article{ 
title={System Service Call-oriented Symbolic Execution of Android Framework with Applications to Vulnerability Discovery and Exploit Generation}, 
author={Lannan Luo, Qiang Zeng, Chen Cao, Kai Chen, Jian Liu}, 
booktitle={MobiSys’17}, 
year={2017}
}

Abstract:

Android Application Framework is an integral and foundational part of the Android system. Each of the 1.4 billion Android devices relies on the system services of Android Framework to manage applications and system resources. Given its critical role, a vulnerability in the framework can be exploited to launch large-scale cyber attacks and cause severe harms to user security and privacy. Recently, many vulnerabilities in Android Framework were exposed, showing that it is vulnerable and exploitable. However, most of the existing research has been limited to analyzing Android applications, while there are very few techniques and tools developed for analyzing Android Framework. In particular, to our knowledge, there is no previous work that analyzes the framework through symbolic execution, an approach that has proven to be very powerful for vulnerability discovery and exploit generation. We design and build the first system, Centaur, that enables symbolic execution of Android Framework. Due to some unique characteristics of the framework, such as its middleware nature and extraordinary complexity, many new challenges arise and are tackled in Centaur. In addition, we demonstrate how the system can be applied to discovering new vulnerability instances, which can be exploited by several recently uncovered attacks against the framework, and to generating PoC exploits.