Getting Started with Exploit Development
tl;dr The rest of this goes into detail about what topics matter and why from each resource, but if you want to cut to the chase and ignore that...
- C programming language
- x86 Assembly (32bit and 64bit)
- Linux terminal usage
- Exploit Education - Nebula - Start thinking like an attacker and learning to do research
- Open Security Training - Introduction to Software Exploitation - Fundamentals of memory corruption
- Pwn College - (Added July 2022) An alternative place to learn the fundamentals of software exploitation.
- Exploit Education - Phoenix - Practice the fundamentals in 32bit and 64bit
- Pwn College - Module: Memory Errors - Explore more classes of vulnerabilities and learn about more recent mitigations
- Exploit Education - Fusion - Practice the above and a bit of reverse engineering/testing
- ROP Emporium - One of the most common exploitation techniques used today
- Weird Machines - This is to get the mental shift and understand how to reason about modern exploits
- Nightmare - Final practice.
What is Exploit Development?
As usual within the security industry, the terms are made up and no one uses them consistently. Exploit development as it is used here is about the development of scripts or programs that can take advantage of (exploit) memory corruption vulnerabilities in software.
This is as opposed to exploits that take advantage of higher-level vulnerabilities, such as those seen in general application security for web or mobile applications.
There are three things you need to know before getting started.
The C programming language. The thing about learning C is not that you're going to have to do a lot of programming in it, but rather in learning C you also gain a mental model of how software works. C doesn't attempt to hide most of the memory management from you, neither does it do magic like garbage collection in the background. It's important if you're going to learn about exploiting memory corruption vulnerabilities that you understand how software uses memory.
There is a really good blog post by Evan Miller, You Can't Dig Upwards, which is a defense of learning C as your first language. While I don't necessarily agree, he does an excellent job of describing why learning C matters, and how it enables you to understand the layers of abstraction in most software.
As a general rule of thumb, when you want to attack anything, the first step is to understand how it works.
An assembly language. This is a tricky one that newcomers sometimes take the wrong path on. You don't need to be able to program in assembly, and you shouldn't follow tutorials for programmers about writing assembly. Programmers can make use of compilers and assemblers to turn their hand-written assembly code into machine language (the code understood by the CPU). This gives access to things like named variables and labels, fake instructions, and other high-level concepts that are not reflected in the machine code.
What's more important is the ability to read what I'd call "raw" assembly. That is, take some compiled instructions, run them through a disassembler and read those instructions. This may be strange at first as you won't have the high-level type information and constructs you're used to, but it's important to understand what the CPU executes under the hood.
The most important concepts to understand for exploiting memory corruption issues are:
- Registers and how they are used
- Calling conventions for functions and syscalls
- Memory segmentation
- How high-level types translate to raw bytes. In assembly, there are no special "types", as it's all just bytes in memory or registers
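To make that last point concrete, here is a minimal sketch using Python's standard `struct` module. It assumes a little-endian target like x86, and the values are arbitrary ones I picked for illustration. The same four bytes can be read as an integer, as text, or as a float, and nothing in memory says which one is "right".

```python
import struct

# A 32-bit integer as it would sit in memory on a little-endian CPU such as x86.
value = 0x41424344
raw_int = struct.pack("<i", value)
print(raw_int)  # b'DCBA' (least significant byte first)

# The exact same bytes, reinterpreted as a 32-bit float.
as_float = struct.unpack("<f", raw_int)[0]
print(as_float)

# Signedness is also just interpretation: -1 and 0xFFFFFFFF share one encoding.
raw_neg = struct.pack("<i", -1)
as_unsigned = struct.unpack("<I", raw_neg)[0]
print(hex(as_unsigned))  # 0xffffffff

# A C string is just its bytes followed by a NUL terminator.
c_string = struct.pack("3s", b"hi")
print(c_string)  # b'hi\x00'
```

When you read disassembly or a hex dump, you are constantly doing this reinterpretation in your head, so it is worth getting comfortable with it early.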
Basic Linux Usage. Just get comfortable using a Linux terminal, as a lot of the resources recommended here will involve Linux.
The classic question is what operating system to use. The truth is that it doesn't matter, use what you are comfortable with. You should learn to be comfortable with a Linux terminal, but for your host system don't worry about it.
This is a bit of a pet peeve of mine, but you don't gain much by starting on your target platform. A lot of the generic concepts apply no matter what operating system you're targeting. You start by learning your fundamentals and then you learn the applications that are operating system specific. The idea of a write-primitive isn't unique to Linux or Android exploitation, but what you actually write to turn your write-primitive into, say, a local privilege escalation will be OS-dependent. You can learn about various code-reuse attacks like return-oriented programming, the basic idea of which doesn't change, but you'll utilize it differently depending on the OS.
With that in mind, many of the resources that will be recommended are Linux-focused. There are a handful of reasons for that, from the ease of sharing a consistent Linux environment for training to the lack of smaller mitigations that exist on other platforms. Trust me, you'll have no problem transferring knowledge from one operating system to another.
Exploit Education - Nebula
In Nebula you are learning to think like an attacker and do research. This box isn't actually about exploit development, but more general application security. I like recommending it though because it forces you to start doing some research on topics you might not be familiar with to determine what the vulnerability being showcased is. It gives you enough information to get started. While this might feel annoying, this ability to research and digest information about a new topic is a huge part of exploit development. I spend more time reading documentation and other write-ups than I do writing exploit code. The ability to do research and persevere is immensely important.
You will find yourself going down dead-ends, doing research that doesn't pan out, thinking you're wasting hours of time, and that's okay. You need to learn to embrace that frustration as it's a key part of exploitation. Every dead end you go down doesn't help you immediately, but as you keep doing it you're building up a huge personal knowledge base that you'll eventually start drawing on as time goes on. Learn to enjoy the rabbit holes and don't worry about the wasted time.
Open Security Training - Introduction to Software Exploitation
Update (July 2022): Over the last year, as I've recommended resources, I've come to realize that a lot of bit-rot has occurred with the OST course above. While I believe the content is top-notch and strikes a really good balance for beginners, it's quite a hassle to run the VM and work along with the course. So I've started recommending more segments from Pwn College. While I don't entirely like the flow of Pwn College, it has two huge benefits going for it: it has a ton of labs to practice on for every topic, and all the labs can be completed from within your browser inside its provided workspace. That makes it far more accessible and usable than the OST course.
Introduction to Software Exploitation is a lab-driven course by Corey Kallenberg. As such, the labs are useful for learning fundamental concepts when it comes to exploitation, so when the labs come up in the videos, don't skip them, pause and do them yourself.
There are a few basic concepts that this course covers:
- Shellcoding and Calling conventions
- Buffer Overflows (stack and heap)
- Arbitrary Write (format string attack)
Don't worry about the fact it uses a pretty old Linux distro. You're not going to be pulling off most of these attacks today exactly as they are in the course, but the basic idea of a write-primitive, or overflowing a buffer and overwriting nearby data, is still relevant.
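The "overwriting nearby data" idea can be sketched without a vulnerable binary at all. This is a toy model in Python, not how the course teaches it: the memory layout (an 8-byte buffer directly followed by a 4-byte flag) and all the names are my own assumptions for illustration.

```python
BUF_SIZE = 8
FLAG_SIZE = 4

def make_memory():
    """Toy memory: an 8-byte buffer immediately followed by a 4-byte flag."""
    return bytearray(BUF_SIZE) + bytearray(FLAG_SIZE)

def unchecked_copy(memory, data):
    """Copy data to the start of memory without checking BUF_SIZE,
    which is the mistake an unbounded C copy makes."""
    for i, byte in enumerate(data):
        memory[i] = byte

def read_flag(memory):
    """Read the little-endian flag that lives right after the buffer."""
    return int.from_bytes(memory[BUF_SIZE:BUF_SIZE + FLAG_SIZE], "little")

# Input that fits leaves the neighbouring flag untouched.
memory = make_memory()
unchecked_copy(memory, b"hello")
print(hex(read_flag(memory)))  # 0x0

# Input that is too long spills into the flag with attacker-chosen bytes.
memory = make_memory()
payload = b"A" * BUF_SIZE + (0xdeadbeef).to_bytes(FLAG_SIZE, "little")
unchecked_copy(memory, payload)
print(hex(read_flag(memory)))  # 0xdeadbeef
```

Swap the flag for a function pointer or a saved return address and you have the core of the classic attacks, whatever the age of the distro.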
Pwn College is a lab-driven course from Arizona State University. It is a proper undergraduate course taught by Zardus (Yan Shoshitaishvili) and kanak (Connor Nelson). You've got lectures on their YouTube channel, the classes are streamed live on Twitch while the class is running, and the Discord server is active. They've also been updating the course every year, so by the time you read this, it might be slightly different. As a course it is not quite a drop-in for the topics covered by the OST course it "replaces" here. The core topics to learn here would be:
- Assembly and Shellcoding
- Interacting with Software
- Memory Errors
While the entire course is valuable, you can consider stopping once the course gets into the topic of "Return Oriented Programming" (ROP) and doing the next couple of resources, then returning to this course afterwards to learn ROP and some of the more advanced topics. The OST course assumes some basic knowledge of heap exploitation, which isn't covered up to this point in Pwn College (it's covered after ROP) but may be useful to know for some of the next resources.
Also, at least in 2020 and 2021, the course had a ton of repetitive-feeling labs. There is some benefit to the repetition, which I think is what they are going for, framing it kinda like a martial art and practicing the concepts, but don't feel too pressured to keep going on one topic if it's getting really tedious and boring.
Exploit Education - Phoenix
Now you're going to take the concepts you learned in the previous course and put them into practice a bit. You should start with the 32bit x86 version of Phoenix. Then move onto 64bit exploits on the AMD64 version of Phoenix.
Many of the challenges will be largely the same, but you'll start getting exposed to the differences between 32bit and 64bit x86 exploitation. There are some fundamental problems you'll run into. Again this will involve some of your own research as you learn about those differences.
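As one small example of the kind of difference you'll hit, here is a sketch of how addresses get packed into a payload on each architecture. The two addresses are made up for illustration, and `survives_strcpy` is a hypothetical helper name of mine. User-space addresses on x86-64 typically have their upper bytes set to zero, which matters when the vulnerable copy stops at the first NUL byte.

```python
import struct

def p32(addr):
    """Pack an address as 4 little-endian bytes (32-bit x86)."""
    return struct.pack("<I", addr)

def p64(addr):
    """Pack an address as 8 little-endian bytes (x86-64)."""
    return struct.pack("<Q", addr)

def survives_strcpy(payload):
    """A strcpy-style copy stops at the first NUL byte, so a payload
    only arrives intact if it contains none."""
    return b"\x00" not in payload

# Hypothetical addresses, chosen only to show the shape of the problem.
addr32 = 0x08049196
addr64 = 0x0000000000401196

print(p32(addr32).hex(), survives_strcpy(p32(addr32)))  # 96910408 True
print(p64(addr64).hex(), survives_strcpy(p64(addr64)))  # 9611400000000000 False
```

Pointers also double in size, so every offset you worked out for the 32bit version of a challenge needs to be rechecked on the 64bit one.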
Bypassing Exploit Mitigations
So the type of exploitation covered so far is essentially the stuff we did in the early 2000s. Since that time, several exploit mitigations have been introduced.
Four of which have attained significant popularity:
- Stack Canaries/Cookies - This stops you from making use of an overwritten return address on the stack by placing a canary value on the stack before the saved return address. A linear overflow has to go through the canary to reach the return address. The canary is checked before returning, and if it has been modified then the program dies.
- Data Execution Prevention (No-eXecute Bit) - In the previous sections, you wrote shellcode into memory and then jumped to it to get code execution. This is no longer possible on modern systems due to Data Execution Prevention, also known as DEP or NX. This mitigation ensures that a page can be Writable but not eXecutable, or eXecutable and not Writable.
- Address Space Layout Randomization (ASLR) - Previously when you wrote your shellcode into memory you could do so in a roughly consistent location. Now the address space gets randomized, so even if you can control the program's control flow, you don't know where to direct it.
- Position Independent Executables/Code (PIE/PIC) - ASLR can easily randomize the location of some memory segments like the stack and heap. Moving segments containing code around requires that the code be compiled as position independent. For a while this left significant room for attackers to bypass ASLR simply by reusing code in shared libraries or within the main executable, which wouldn't be randomized. Now almost all shared libraries will be compiled as PIC, which severely reduces that attack surface, and sensitive executables such as those exposed on the network will be compiled with PIE.
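To give a feel for how the first of these mitigations changes things, here is a toy simulation of a stack canary in Python. It is a sketch under my own assumptions: the frame layout, sizes, addresses, and function names are invented for illustration, and real compilers and libc implement the check differently in detail. The leading zero byte in the canary mirrors what glibc does.

```python
import os

BUF_SIZE = 16
CANARY_SIZE = 8
RET_SIZE = 8

def new_frame(canary, ret_addr):
    """Lay out a toy frame: [buffer][canary][saved return address]."""
    saved_ret = ret_addr.to_bytes(RET_SIZE, "little")
    return bytearray(BUF_SIZE) + bytearray(canary) + bytearray(saved_ret)

def unchecked_copy(frame, data):
    """Copy into the buffer without a bounds check."""
    for i, byte in enumerate(data):
        frame[i] = byte

def function_return(frame, canary):
    """Check the canary before 'returning' to the saved address."""
    stored = bytes(frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE])
    if stored != canary:
        raise RuntimeError("stack smashing detected")
    return int.from_bytes(frame[BUF_SIZE + CANARY_SIZE:], "little")

def try_payload(payload, canary):
    """Run one overflow attempt and report where execution would go."""
    frame = new_frame(canary, 0x401000)
    unchecked_copy(frame, payload)
    try:
        return hex(function_return(frame, canary))
    except RuntimeError as err:
        return str(err)

canary = b"\x00" + os.urandom(7)
evil_ret = (0xdeadbeef).to_bytes(RET_SIZE, "little")

# Blindly overflowing through the canary gets the program killed.
print(try_payload(b"A" * BUF_SIZE + b"B" * CANARY_SIZE + evil_ret, canary))

# If the canary value has been leaked, the attacker can write it back intact.
print(try_payload(b"A" * BUF_SIZE + canary + evil_ret, canary))
```

The second attempt is why information disclosure bugs matter so much on modern targets: a mitigation built on a secret value is only as good as the secret.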
Pwn College - Module: Memory Errors
Update (July 2022) - If you did Pwn College instead of OST then you should have already done this section and can go right on to the next resource :D
Pwn College is an awesome resource for more modern exploitation. In particular, I'm linking just a few of the lectures that cover dealing with some common mitigations. For this module you can probably skip the first three lectures, but the following lectures linked on the page are worth checking out:
- Causes of Corruption 1 and 2
- Stack Canary Mitigations
- ASLR Mitigations
- Causes of Disclosure
- Shellcoding: Data Execution Prevention
If you're motivated there is a ton more content in pwn college to check out too.
Exploit Education - Fusion
Now that you've got a bit more knowledge about mitigations, it's time to put that into practice as well. The Fusion box is also going to get you doing a bit more reverse engineering and testing for vulnerabilities than you had to do for Nebula or Phoenix. It's also going to introduce all of the above mitigations for you to play around with.
ROP Emporium covers what I would consider the last of the beginner concepts: return-oriented programming (ROP). ROP is a very common exploitation technique, and most exploits today tend to utilize ROP at some stage in the chain.
You'll probably find it easier to work through ROP Emporium on 32bit but do go back and do it on 64bit because things do change substantially with the different calling conventions.
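Here is a rough sketch of why the calling convention changes the shape of a chain. Every address and offset below is made up for illustration, and the helper names are mine, not from ROP Emporium. On 32bit x86 (cdecl) the argument sits on the stack after a fake return address. On x86-64 (System V) the first argument goes in `rdi`, so you need a gadget such as `pop rdi; ret` to load it.

```python
import struct

def p32(value):
    return struct.pack("<I", value)

def p64(value):
    return struct.pack("<Q", value)

def library_base(leaked_addr, symbol_offset):
    """Recover a randomized base from one leaked pointer.
    Mappings are page-aligned, so the low 12 bits should be zero."""
    base = leaked_addr - symbol_offset
    if base & 0xFFF:
        raise ValueError("base is not page-aligned; wrong offset or leak?")
    return base

def rop_call_32(func, arg, return_to=0x41414141):
    """32bit: [function][fake return address][argument on the stack]."""
    return p32(func) + p32(return_to) + p32(arg)

def rop_call_64(pop_rdi_ret, func, arg):
    """64bit: [pop rdi; ret gadget][argument][function]."""
    return p64(pop_rdi_ret) + p64(arg) + p64(func)

# Hypothetical offsets inside some library, and a hypothetical leak.
OFF_LEAKED_SYMBOL = 0x80E50
OFF_POP_RDI_RET = 0x2A3E5
OFF_TARGET_FUNC = 0x50D70
OFF_ARG_STRING = 0x1D8678

base = library_base(0x7F0000080E50, OFF_LEAKED_SYMBOL)
chain64 = rop_call_64(base + OFF_POP_RDI_RET,
                      base + OFF_TARGET_FUNC,
                      base + OFF_ARG_STRING)
chain32 = rop_call_32(0x08048430, 0x0804A030)

print(hex(base))
print(chain64.hex())
print(chain32.hex())
```

Real targets usually need more care than this (stack alignment on 64bit is a common surprise), which is exactly the sort of thing the challenges make you discover.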
Weird machines also touches on ROP, but it focuses more on how to think about ROP. It is more of a meta-topic, I guess. A lot of recent exploit write-ups tend to talk about exploitation in terms of gadgets and primitives being obtained, but these terms won't be familiar to you if you're new to ROP. It's a bit of a mental shift away from talking about the specific exploit technique, so you need to familiarize yourself with it. You might just naturally get it, but I wanted to shout out a couple of videos from LiveOverflow where he explains the concept:
This is a tricky concept to explain, but you'll start to understand it intuitively with experience. There is a solid paper on the topic of weird machines and exploitability which is also an interesting read.
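One way I find helpful to picture it is as a tiny unintended interpreter. This is a toy model of my own, not taken from the paper or the videos: the "gadgets" are snippets the program already contains, the attacker controls only the data on the stack, and that data ends up acting as a program. The addresses and gadget names are invented.

```python
def pop_a(regs, stack):
    """Gadget: load the next stack value into register a."""
    regs["a"] = stack.pop(0)

def pop_b(regs, stack):
    """Gadget: load the next stack value into register b."""
    regs["b"] = stack.pop(0)

def add_a_b(regs, stack):
    """Gadget: a = a + b, wrapped to 32 bits."""
    regs["a"] = (regs["a"] + regs["b"]) & 0xFFFFFFFF

# Code that already exists in the target, keyed by hypothetical address.
GADGETS = {0x1000: pop_a, 0x1010: pop_b, 0x1020: add_a_b}

def run_chain(gadgets, stack):
    """Each 'return' pops the next address off the attacker-controlled
    stack and runs whatever gadget lives there."""
    regs = {"a": 0, "b": 0}
    stack = list(stack)
    while stack:
        addr = stack.pop(0)
        gadget = gadgets.get(addr)
        if gadget is None:
            break  # nothing useful at that address: the chain ends
        gadget(regs, stack)
    return regs

# The attacker's "program" is nothing but data: addresses and operands.
chain = [0x1000, 40, 0x1010, 2, 0x1020]
print(run_chain(GADGETS, chain))  # {'a': 42, 'b': 2}
```

The target never had an "add two numbers of my choosing" feature. The attacker composed it from pieces, which is what write-ups mean when they talk about building primitives out of gadgets.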
So Nightmare has a ton of challenges for you to practice on. In particular, I want to call out the Heap Exploitation section. While heap exploitation is one of those areas that is particular to each operating system (and each heap implementation), I think there is significant value in learning about the ptmalloc2 allocator and its attacks. You might not find yourself using them, but at least for me, Malloc Des-Maleficarum was a huge eye-opener for the creativity and art of exploitation.
What you gain by running through the heap exploitation is less about memorizing all the different techniques that have been found to attack ptmalloc but more just a sense of how you can creatively apply control of certain pieces of data. So I'd highly recommend running through the heap challenges on nightmare.
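As a taste of that "creative control of data" idea, here is a deliberately simplified sketch. It is not ptmalloc2: `ToyHeap` is a made-up allocator with a single free list and none of the size checks or pointer protections a real one has. It only shows the general pattern where a freed chunk stores the free list's next pointer, and a write after free lets you steer a later allocation.

```python
class ToyHeap:
    """Toy allocator: fixed-size chunks and a singly linked free list
    whose 'next' pointer is stored inside the freed chunk itself."""

    CHUNK_SIZE = 0x20

    def __init__(self):
        self.memory = {}      # address -> word stored at that address
        self.free_head = 0    # 0 means the free list is empty
        self.top = 0x1000     # next never-used address

    def malloc(self):
        if self.free_head:
            addr = self.free_head
            # Trust whatever the freed chunk says comes next.
            self.free_head = self.memory.get(addr, 0)
            return addr
        addr = self.top
        self.top += self.CHUNK_SIZE
        return addr

    def free(self, addr):
        self.memory[addr] = self.free_head
        self.free_head = addr

    def write(self, addr, value):
        """Stand-in for the program writing user data into a chunk."""
        self.memory[addr] = value

TARGET = 0x4040  # hypothetical address the attacker wants to control

heap = ToyHeap()
chunk = heap.malloc()
heap.free(chunk)

# Use-after-free: the program still writes through the stale pointer,
# which overwrites the free list's 'next' pointer.
heap.write(chunk, TARGET)

first = heap.malloc()   # hands back the original chunk
second = heap.malloc()  # hands back the attacker-chosen address
print(hex(first), hex(second))  # 0x1000 0x4040
```

The real techniques in Nightmare are much more involved, but a lot of them come down to this same move: corrupt the allocator's bookkeeping so it hands you memory you were never supposed to get.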
At this point, you have a lot of the basic concepts that you'll need to start looking at modern exploits, and hopefully the research skill to start discovering what you don't yet know you don't know.
We do plan to put out another part to this post covering how to bridge the gap from these CTFs and toy binaries to real-world exploitation soon.
For now, I'll just say that one of the biggest mistakes I see people make is they wait until they feel ready. Don't wait, just dive in and learn as you go. This is especially important when it comes to the process of discovering vulnerabilities (which I haven't touched on at all here) as a big part of that is building an intuition which just takes time.