Getting Started with Exploit Development
tl;dr The rest of this goes into detail about what topics matter and why from each resource, but if you want to cut to the chase and ignore that...
- Prerequisites
- C programming language
- x86 Assembly (64bit)
- Linux terminal usage
- Pwn College Fundamentals
- Pwn College - CSE 466 - Learn the fundamentals of modern stack-based buffer overflows.
- ROP Emporium - Return Oriented Programming (ROP) is one of the most influential exploitation techniques around right now.
- Open Security Training - Vulnerabilities 1001 - Gain exposure to corruptions beyond "buffer overflows".
- Pwn College - CSE 598 - Spring 2024 - Practice exploiting those other types of corruptions
- Weird Machines - This is to make the mental shift and understand how to reason about modern exploits
- Continued Learning
- Guyinatuxedo's Nightmare - Has example challenges that cover a variety of bug classes and exploit concepts. Good for some targeted learning.
- CTF To Real World - Blog series about moving onto real-world targets
FAQ
What is Exploit Development?
Exploit development, specifically "binary exploitation," involves taking a memory corruption bug in software and turning it into something useful, like arbitrary code execution. Unlike higher-level exploits like SQL injection, binary exploitation requires a more intricate development process, akin to reprogramming the software for your purposes. As higher-level exploits get more complicated, though, the term is being adopted more widely, and I expect that trend to continue.
Isn't exploit development a dying art? Is it still worth learning?
Exploit development is a niche skill that stands out even among hackers, often seen as a form of black magic. While there is a higher demand for addressing high-level security issues, memory corruption bugs remain crucial due to their potential severity. I cannot tell you if it's worth learning for you, but I can say that I still see a future in this work. For some thoughts on the future of exploitation, you can check out my discussion with Specter from March 2024: Future of Exploitation Development.
I want to learn Windows exploitation, is all of this Linux?
Exploitation of memory corruption bugs is fundamentally similar across Windows, Linux, FreeBSD, macOS, or other operating systems. Each piece of software has unique data in memory to be corrupted. The core idea is understanding how application data is used, and seeing how you can modify it to benefit you, regardless of the operating system. When working on a specific target you'll research the unique details of that software, but when learning the concepts you don't need to worry about that. That said, one of the last modules I recommend is a Windows crash course to help make that transition and show how similar they are.
Prerequisites
Be Motivated
Motivation is key in learning exploit development. Follow your interests rather than forcing yourself through topics that feel like a grind. If you lose interest, switch topics and come back later. Staying motivated is more important than following the "right" order. You will get frustrated along the way. Frustration is a key part of exploit development, and you need to learn to embrace it and keep going.
The C Programming Language
Learning C isn't about becoming a C programmer but understanding the CPU's memory model. Since memory corruption bugs exploit memory at this level, C provides the most accurate mental model. You need a solid grasp of software memory, different regions, allocation, and pointers. Evan Miller's post You Can't Dig Upwards explains the mental model you gain from writing C that you don’t get from languages like Python.
x86-64 Machine Code/Assembly
Many recommended resources focus on exploiting 64-bit x86 binaries, which are common in desktops and laptops. Unlike programmers, exploit developers work with "raw" machine code, which lacks most of the niceties that exist for programmers.
What matters is understanding what the CPU itself is seeing. Some important topics to understand include:
- Registers and their usage
- Calling conventions for functions and syscalls
- Memory segmentation
- Translating high-level types to raw bytes (in assembly, everything is just bytes in memory or registers)
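To make the last point concrete, here is a minimal sketch using Python's standard `struct` module. It shows how a few high-level values look as the raw bytes an x86-64 CPU would see in memory. The address is a made-up value chosen purely for illustration.

```python
import struct

# A 64-bit pointer as x86-64 stores it in memory: little-endian, low byte first.
address = 0x00007FFDEADBEEF0  # hypothetical stack address, illustration only
raw_address = struct.pack("<Q", address)

# A signed 32-bit integer: -1 becomes four 0xff bytes (two's complement).
raw_minus_one = struct.pack("<i", -1)

# A C string is just its characters followed by a NUL terminator.
raw_string = b"hi" + b"\x00"

# Going the other way: the same eight bytes read back as an unsigned integer.
(decoded,) = struct.unpack("<Q", raw_address)

print(raw_address.hex())    # f0eedbeafd7f0000
print(raw_minus_one.hex())  # ffffffff
print(raw_string)           # b'hi\x00'
print(hex(decoded))         # 0x7ffdeadbeef0
```

The same bytes can be read as a pointer, an integer, or text. Nothing in memory records which one was intended, and that ambiguity is what a lot of exploitation relies on.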
Knowledge of a Scripting Language
Exploitation involves interacting with other software, often using a scripting language. Python and the PwnTools library are common choices, but any language can work.
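As a rough sketch of what "interacting with other software" means in practice, the snippet below uses only Python's standard library to start a process, send it bytes, and read the reply. The target here is a stand-in (a one-line Python child process), not a real vulnerable binary, and `send_payload` is a hypothetical helper name. PwnTools wraps this same pattern in a friendlier interface.

```python
import subprocess
import sys

# Stand-in target: a child process that reads one line and reports its length.
# In real use this would be the binary under test (hypothetical here).
TARGET = [
    sys.executable,
    "-c",
    "import sys; data = sys.stdin.buffer.readline(); "
    "print('received', len(data), 'bytes')",
]


def send_payload(payload: bytes) -> str:
    """Start the target, write the payload to its stdin, return what it printed."""
    result = subprocess.run(TARGET, input=payload, capture_output=True, timeout=10)
    return result.stdout.decode().strip()


payload = b"A" * 32 + b"\n"  # 32 filler bytes plus the newline the target waits for
reply = send_payload(payload)
print(reply)  # received 33 bytes
```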
Basic Linux Usage
Most resources are Linux-based, so you should be comfortable with the Linux terminal and common command-line tools.
Pwn College - Fundamentals
The fundamentals Dojo provides some modules that cover most of these topics and is worth checking out before you get started.
Environment Setup
I've specifically tried to choose resources that are easily accessible. One of the issues with the prior version of my recommendations was not that the content was out of date, but that bit-rot made it difficult to set up a working environment without a lot of hassle.
Most of the content here can be done with a text editor for your own notes and a web browser, though some resources will require access to a Linux machine (a virtual machine is fine) to run the provided binaries. In one section you may want to know a bit about running a binary under QEMU to try exploiting other architectures, but this is optional.
Learning Path
Pwn College - Part 1
Since the first release of Pwn College in 2020 it has become a leading recommendation for learning exploit development. I'm not a fan of the ordering of topics, but it's still an amazing, free resource from Arizona State University. If you have issues, their Discord community is active, and the courses are updated with the school year.
Pwn College - CSE 466 - Fall 2023
This is the primary course people refer to when talking about Pwn College without qualification. It focuses on stack-based overflows and is a good starting point for memory corruption issues.
You can skip the modules that are also present in the Foundations/Refreshers Dojo unless you need a refresher. Complete the rest of the modules in order through to the end of "Program Exploitation."
You'll gain two main things from these modules:
- Experience with stack-based buffer overflows in a relatively modern environment with common mitigations enabled.
- Improved creative and lateral thinking skills from the Shellcoding and Sandboxing modules. These modules help you develop the right mindset for dealing with real-world exploitation constraints, even though the situations are contrived.
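For anyone who hasn't seen one before, here is a toy model of a stack-based buffer overflow. It is a sketch in Python, not real process memory: the frame layout, the addresses, and the helper names are assumptions made for illustration, and it leaves out mitigations such as stack canaries that the Pwn College modules deal with.

```python
import struct

BUFFER_SIZE = 16
SAVED_RETURN = 0x401196  # hypothetical legitimate return address


def make_frame() -> bytearray:
    """Model a stack frame: 16-byte buffer, saved rbp, saved return address."""
    return bytearray(
        b"\x00" * BUFFER_SIZE
        + struct.pack("<Q", 0x7FFFFFFFE000)  # saved rbp (made-up value)
        + struct.pack("<Q", SAVED_RETURN)
    )


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy into the buffer without checking its size, like gets() or strcpy()."""
    frame[0:len(data)] = data


def saved_return(frame: bytearray) -> int:
    """Read the return address the function would jump to when it returns."""
    return struct.unpack_from("<Q", frame, BUFFER_SIZE + 8)[0]


frame = make_frame()
unchecked_copy(frame, b"hello")
print(hex(saved_return(frame)))  # 0x401196, input fit inside the buffer

# 16 bytes fill the buffer, 8 more cover saved rbp, the next 8 replace the return address.
overflow = b"A" * (BUFFER_SIZE + 8) + struct.pack("<Q", 0xDEADBEEF)
unchecked_copy(frame, overflow)
print(hex(saved_return(frame)))  # 0xdeadbeef, attacker-chosen
```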
ROP Emporium
The Pwn College class does have a Return Oriented Programming (ROP) module. If you struggle with ROP Emporium, then you may find that module useful to reference and the labs have a more gentle learning curve.
ROP Emporium does go beyond the Pwn College module, and it also offers its challenges in multiple architectures. You'll want to continue with the 64-bit x86 binaries, but I do recommend coming back at some point to complete the ARM challenges. This process helps cement your understanding of code-reuse attacks. While the architecture doesn't fundamentally change anything, it adds constraints that prevent you from merely memorizing steps, fostering a deeper understanding of what you're doing.
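To give a feel for what a ROP payload looks like on x86-64, here is a minimal sketch of a chain that loads one argument into rdi and then calls a function. Every address and the offset are hypothetical placeholders, not values from any ROP Emporium binary. In a real challenge you would find them yourself with a disassembler or a gadget finder.

```python
import struct


def p64(value: int) -> bytes:
    """Pack an integer as a little-endian 64-bit value."""
    return struct.pack("<Q", value)


# All values below are hypothetical placeholders, not taken from a real binary.
OFFSET_TO_RETURN_ADDRESS = 40  # bytes from buffer start to the saved return address
POP_RDI_RET = 0x401233         # address of a `pop rdi; ret` gadget
ARGUMENT = 0x404060            # value to load into rdi (first argument)
TARGET_FUNCTION = 0x401156     # function to call with that argument

chain = b"".join([
    p64(POP_RDI_RET),      # the overwritten return address: execution lands here
    p64(ARGUMENT),         # consumed by `pop rdi`
    p64(TARGET_FUNCTION),  # the gadget's `ret` pops this and jumps to it
])
payload = b"A" * OFFSET_TO_RETURN_ADDRESS + chain

print(len(payload))  # 64
print(chain.hex())
```

Each `ret` pops the next 8 bytes off the stack and jumps there, so the stack itself becomes the program. On another architecture the gadgets and calling convention differ, but the idea of chaining existing code stays the same.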
Open Security Training - Vulnerabilities 1001
So far with Pwn College, you've primarily encountered stack-based buffer overflows. It's the easiest starting point because the process doesn't vary much. However, in the real world, we deal with many types of bugs that can corrupt memory beyond simple buffer overflows.
The Open Security Training course focuses on understanding the theory behind these issues, not exploitation. It covers various vulnerabilities and classes of corruption, starting with familiar stack-based linear overflows, and moving into heap issues, out-of-bound access problems, and various integer issues.
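As one small example of an integer issue, the sketch below models a size calculation done in 32-bit unsigned arithmetic, as C would do with a `uint32_t`. Python integers don't wrap, so the mask stands in for the truncation. The function name and the values are hypothetical.

```python
def alloc_size_u32(count: int, element_size: int) -> int:
    """Size computation as a 32-bit unsigned multiply, as C does with uint32_t."""
    return (count * element_size) & 0xFFFFFFFF


count = 0x40000001  # attacker-controlled element count (hypothetical)
element_size = 4

allocated = alloc_size_u32(count, element_size)  # wraps around to a tiny size
needed = count * element_size                    # bytes a later copy loop would write

print(allocated)  # 4
print(needed)     # 4294967300
```

The program allocates 4 bytes but later behaves as if it had room for over four billion. The integer bug on its own corrupts nothing. It sets up the out-of-bounds write that follows.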
Pwn College - Part 2
CSE 466 - Race Conditions
In CSE 466, we skipped this module to focus on ROP for a smoother flow. However, race conditions are a common bug class, so it's worth gaining some exposure here. Even though this module isn't as tightly focused on memory corruption issues, it's still a good introduction to concurrency issues.
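A common shape for these bugs is "time of check to time of use": the program validates something, then acts on it a moment later, and the thing changes in between. The sketch below makes that window explicit with a hypothetical `window` callback so the race is deterministic. In a real target the attacker has to win the timing instead.

```python
import os
import tempfile

MAX_SIZE = 16


def check_then_read(path: str, window=lambda: None) -> bytes:
    """Check the file size, then read it: two separate steps with a gap between."""
    if os.path.getsize(path) > MAX_SIZE:  # time of check
        raise ValueError("file too large")
    window()                              # stands in for the scheduling gap
    with open(path, "rb") as handle:      # time of use
        return handle.read()


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "input.bin")
    with open(path, "wb") as handle:
        handle.write(b"small")

    def attacker() -> None:
        """Replace the file contents after the check has already passed."""
        with open(path, "wb") as handle:
            handle.write(b"B" * 64)

    data = check_then_read(path, window=attacker)

print(len(data))  # 64, even though the check allowed at most 16
```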
CSE 598 - Spring 2024 - Format String Exploits
A Note about CSE 598 Modules - At ASU the "598" course number is used for "special topics," so one CSE 598 will not necessarily be the same as another semester's 598.
Format string attacks have a long history of being the second bug class people play around with in binary exploitation. They provide a very powerful ability to corrupt memory in a targeted way. Given the background from the Vulns 1001 course, you should be able to catch on to this pretty easily.
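Here is a toy model of why attacker-controlled format strings are so powerful. It is not real `printf`: `toy_printf` is a hypothetical function that only understands `%p` and `%n` (with optional `N$` positions), and it flattens argument registers and stack into a single list of slots. It does capture the two primitives, though: `%p` leaks whatever value sits in a slot, and `%n` writes the number of characters printed so far to the address held in a slot.

```python
import re


def toy_printf(fmt: str, slots: list, memory: dict) -> str:
    """Tiny model of printf that handles only %p, %n, %N$p and %N$n."""
    out = []
    pos = 0
    auto_index = 0
    for match in re.finditer(r"%(?:(\d+)\$)?([pn])", fmt):
        out.append(fmt[pos:match.start()])
        pos = match.end()
        if match.group(1):
            index = int(match.group(1)) - 1  # %N$ positions are 1-based
        else:
            index = auto_index
            auto_index += 1
        if match.group(2) == "p":
            out.append(hex(slots[index]))  # leak: print the slot's value
        else:
            # write: store the count of characters output so far at the slot's address
            memory[slots[index]] = len("".join(out))
    out.append(fmt[pos:])
    return "".join(out)


# Hypothetical argument slots and memory, for illustration only.
slots = [0x1111, 0x404040]
memory = {0x404040: 0}

leak = toy_printf("leak: %1$p", slots, memory)
print(leak)                 # leak: 0x1111

toy_printf("AAAA%2$n", slots, memory)
print(memory[0x404040])     # 4
```

In a real exploit the attacker typically gets their own input into one of those slots, which lets them pick both the address and, via the output length, the value written.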
CSE 598 - Spring 2024 - File Struct Exploits
This module is the first step towards "modern corruption," where you start using corruption to change behavior in more useful ways rather than just one-and-done exploits.
CSE 598 - Spring 2024 - Dynamic Allocator Misuse
While this module focuses on the default glibc allocator, it's really a stand-in for any reasonably complicated software. Many applications have these kinds of abstract data structures, like a linked list or a binary tree, which can be targets for your corruption. The allocator is just a very convenient starting place to learn about targeting those internal structures.
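To illustrate the idea of targeting an internal structure, here is a toy allocator whose free list is a singly linked list stored inside the freed chunks themselves. `ToyAllocator` is a simplified model I made up for illustration, not glibc's actual implementation, and it omits the integrity checks real allocators add. The addresses are hypothetical.

```python
class ToyAllocator:
    """LIFO free list that keeps each 'next' pointer inside the freed chunk."""

    def __init__(self):
        self.memory = {}  # address -> first word stored at that address
        self.head = 0     # address of the first free chunk, 0 means empty

    def free(self, address: int) -> None:
        """Push a chunk: its first word now points at the old head."""
        self.memory[address] = self.head
        self.head = address

    def malloc(self) -> int:
        """Pop a chunk and follow its stored 'next' pointer."""
        address = self.head
        self.head = self.memory.get(address, 0)
        return address


heap = ToyAllocator()
chunk = 0x1000     # hypothetical heap chunk
target = 0x404040  # hypothetical address the attacker wants returned

heap.free(chunk)

# Use-after-free write: the program still writes through the stale pointer,
# overwriting the 'next' pointer the allocator stored in the freed chunk.
heap.memory[chunk] = target

first = heap.malloc()
second = heap.malloc()
print(hex(first))   # 0x1000
print(hex(second))  # 0x404040
```

A single pointer-sized write into freed memory turned into "the allocator hands out an address of my choosing." The same reasoning applies to any linked structure an application maintains.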
CSE 598 - Spring 2024 - Exploitation Primitives
This is the module where I feel like the ideas are being brought together into thinking about exploitation in a more-or-less modern way.
Weird Machines
This is a more "meta" topic. A weird machine is essentially an unintended computational model created within a system due to a vulnerability. Exploit development is about manipulating this model to achieve your goals.
It is this "weird machine" mental model that I think defines the modern era of exploitation.
LiveOverflow has put out a couple videos to explain the concept:
Thomas Dullien/Halvar Flake has a more academic perspective on weird machines in his 2015 paper Weird Machines, Exploitability, and Provable Unexploitability. His 2018 RuhrSec presentation is also an excellent resource.
CSE 598 - Spring 2024 - Windows Crash Course
For those of you who want to exploit on Windows, this transitional module covers the high-level differences with exploits on Windows. It's not very in-depth, but it demonstrates that exploitation isn't fundamentally different across operating systems. If you understand the fundamentals of exploitation, switching to a different OS isn't a huge challenge.
By this point, if you read a few Windows write-ups on whatever target you choose to pursue, you should be able to get up to speed just fine.
Continued Learning
I could end this guide here, as you should now have the fundamentals down and be ready to take on your own exploitation challenges. Although there are still many concepts you haven't been exposed to, you have the foundational knowledge necessary to understand them. Here are a few options on where to go from here:
Guyinatuxedo's Nightmare
While this resource exists as its own course that you can work through from top to bottom, it also works great as a "pick your own learning adventure." Choose a topic that interests you and work through the challenges at your own pace.
glibc Heap Exploitation
Back in the day, Malloc Maleficarum and Malloc Des-Maleficarum were eye-opening articles. They highlighted the potential of small corruptions to have a massive impact through their side effects rather than by resulting in a compromise directly. This concept, where small bugs can be chained into something bigger, is why learning about heap allocator attacks is worthwhile even if you don't intend to attack that specific allocator yourself.
While I generally avoid recommending paid resources, one exception is Max Kamper's HeapLAB series which is excellent. On the free side, there is another Pwn College module on Dynamic Allocator Exploitation and Shellphish's how2heap.
CTF To Real World
This is another series of blogs I wrote. While the title might imply transitioning from CTFs, prior CTF experience isn't necessary. The series focuses on moving from understanding exploitation in intentionally vulnerable situations, like CTFs, to tackling real-world targets. It covers the skills you should start practicing and learning, along with some advice on how to do so effectively.
Paid Training
I get asked about this a lot, but I generally don't recommend any paid training due to the wealth of free content available.
If you do want paid training, Ret2 Fundamentals of Software Exploitation is, in my opinion, the best online training available. It doesn't include video instruction, so the style might not suit everyone. However, it stands out by exposing you to modern vulnerability classes, rather than just focusing on stack-based overflows and (maybe) format string attacks like others often do in their beginner-oriented training.