Dirty Pipe and Analyzing Memory Tagging
The root cause of Dirty Pipe was a Linux kernel bug introduced in the pipe subsystem by a 2016 commit. Later changes to the kernel turned this bug into a critical security issue. Some background is needed on how pipes, particularly anonymous pipes, work on the kernel side. Pipes are implemented using a ring buffer of pipe_buffer objects. Each object holds flags along with other fields, such as a reference to the backing memory page where the data is stored. Normally, when you write to a pipe, a page is allocated and the data is written there. If you then write to the pipe again, the kernel will, for performance reasons, try to append that data to the existing page if there’s room before allocating a new page. However, it’s possible for a pipe_buffer to contain a reference to a page it doesn’t actually own, one example being the splice() system call. The splice() system call creates a pipe_buffer entry that references a page in the page cache for the file being spliced in. Because the pipe does not own this page, it cannot allow the user to append data to it. The kernel needs to account for this and track which buffers are “mergeable” and which aren’t. Until a commit in 2020, this “mergeable” trait was tracked with a field called can_merge. In 2020, it was changed to use the flags field, and the PIPE_BUF_FLAG_CAN_MERGE flag was introduced.
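The append-or-allocate decision can be sketched with a toy model. This is not kernel code: the Page, PipeBuffer and Pipe classes and the splice_in method are hypothetical names made up for illustration, the flag's numeric value is only illustrative, and each write is assumed to fit in a single page.

```python
PAGE_SIZE = 4096
PIPE_BUF_FLAG_CAN_MERGE = 0x10  # illustrative value, not read from kernel headers


class Page:
    """Stand-in for a memory page (hypothetical helper)."""
    def __init__(self, data=b""):
        self.data = bytearray(data)


class PipeBuffer:
    """Stand-in for a pipe_buffer: a page reference plus flags."""
    def __init__(self, page, flags):
        self.page = page
        self.flags = flags


class Pipe:
    def __init__(self):
        self.bufs = []

    def write(self, data):
        # Assumes len(data) <= PAGE_SIZE to keep the model small.
        last = self.bufs[-1] if self.bufs else None
        if (last is not None
                and last.flags & PIPE_BUF_FLAG_CAN_MERGE
                and len(last.page.data) + len(data) <= PAGE_SIZE):
            last.page.data += data  # append to the page the pipe already owns
            return "merged"
        # Otherwise allocate a fresh page that the pipe owns and may merge into.
        self.bufs.append(PipeBuffer(Page(data), PIPE_BUF_FLAG_CAN_MERGE))
        return "new page"

    def splice_in(self, cached_page):
        # The page belongs to the page cache, so the buffer must not be mergeable.
        self.bufs.append(PipeBuffer(cached_page, 0))


pipe = Pipe()
print(pipe.write(b"hello"))    # new page
print(pipe.write(b" world"))   # merged
file_page = Page(b"file contents")
pipe.splice_in(file_page)
print(pipe.write(b"X"))        # new page
print(bytes(file_page.data))   # b'file contents'
```

In this model the spliced page stays untouched only because splice_in explicitly stores a flags value of 0; the write path trusts whatever the flags say.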
A commit in 2016 added two functions to the pipe subsystem, one of which was copy_page_to_iter_pipe(). These functions are used by the splice() system call to allocate pipe_buffer entries for the backing file data. The problem is that they never initialized the flags field. At the time this was not a security issue: even though the flags could be used uninitialized, they weren’t used in any critical context. When the 2020 commit landed and started relying on that flags field, the stale value became security-relevant, because it made it possible for splice()-allocated pipe buffers to have the “mergeable” trait. An attacker can intentionally poison the ring so that PIPE_BUF_FLAG_CAN_MERGE is set on pipe buffers used by splice(), which then allows them to write data into a page in the page cache that the pipe doesn’t own.
This makes it possible to modify file data in the page cache even if the file is opened read-only, giving a privilege escalation primitive.
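The stale-flag reuse can be illustrated with another toy model. Everything here is a simplification under stated assumptions: ToyPipe, Slot, drain and the four-slot ring are hypothetical, the flag value is illustrative, and the "page cache" is just a bytearray. The one behaviour carried over from the description above is that the splice path fills a ring slot without resetting its flags.

```python
PAGE_SIZE = 4096
PIPE_BUF_FLAG_CAN_MERGE = 0x10  # illustrative value
RING_SLOTS = 4                  # hypothetical small ring for the demo


class Slot:
    def __init__(self):
        self.page = None
        self.offset = 0
        self.len = 0
        self.flags = 0


class ToyPipe:
    def __init__(self):
        self.ring = [Slot() for _ in range(RING_SLOTS)]
        self.head = 0   # index of the next slot to fill
        self.used = 0   # number of occupied slots

    def _last(self):
        return self.ring[(self.head - 1) % RING_SLOTS] if self.used else None

    def _next_slot(self):
        if self.used == RING_SLOTS:
            raise BufferError("pipe full")
        slot = self.ring[self.head]
        self.head = (self.head + 1) % RING_SLOTS
        self.used += 1
        return slot

    def write(self, data):
        last = self._last()
        if last is not None and last.flags & PIPE_BUF_FLAG_CAN_MERGE:
            end = last.offset + last.len
            if end + len(data) <= PAGE_SIZE:
                last.page[end:end + len(data)] = data  # merge into existing page
                last.len += len(data)
                return
        slot = self._next_slot()
        slot.page = bytearray(PAGE_SIZE)
        slot.page[:len(data)] = data
        slot.offset, slot.len = 0, len(data)
        slot.flags = PIPE_BUF_FLAG_CAN_MERGE

    def drain(self):
        self.used = 0  # slots are released, but their flags keep the old value

    def splice(self, cached_page, length):
        slot = self._next_slot()
        slot.page = cached_page
        slot.offset, slot.len = 0, length
        # Modeled bug: slot.flags is not reset here.


cached = bytearray(b"role=guest")     # stands in for a page-cache page
pipe = ToyPipe()
for _ in range(RING_SLOTS):           # 1. poison every slot with CAN_MERGE
    pipe.write(b"A" * PAGE_SIZE)
pipe.drain()                          # 2. empty the pipe; flags stay behind
pipe.splice(cached, 5)                # 3. splice in the first 5 bytes
pipe.write(b"admin")                  # 4. the write merges into the cached page
print(bytes(cached))                  # b'role=admin'
```

On a fresh ToyPipe the same splice followed by a write leaves the cached bytes alone, because the slot's flags start at 0; the difference comes entirely from the leftover flag.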
The vulnerability here is a straightforward case of reading a size from the attacker and using it in a memcpy into a fixed-size destination buffer on the stack.
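As a rough illustration of that pattern, the sketch below models a stack frame as a byte array. The layout (a 16-byte buffer, then a 32-bit local, padding, and a saved return address) and all function names are assumptions invented for this example; the real target's frame is not described in this much detail.

```python
import struct

BUF_SIZE = 16
# Hypothetical frame layout: buf[16] | local (u32) | 4 bytes padding | saved return (u64)
FRAME_FMT = "<16sI4xQ"


def new_frame():
    return bytearray(struct.pack(FRAME_FMT, b"", 7, 0x401234))


def vulnerable_copy(frame, payload, claimed_size):
    # Models memcpy(buf, payload, claimed_size): the attacker-supplied size
    # is never compared against BUF_SIZE.
    for i in range(claimed_size):
        frame[i] = payload[i]


def checked_copy(frame, payload, claimed_size):
    # The missing check: reject sizes larger than the destination buffer.
    if claimed_size > BUF_SIZE:
        raise ValueError("size exceeds destination buffer")
    for i in range(claimed_size):
        frame[i] = payload[i]


def parse(frame):
    _buf, local, saved_ret = struct.unpack(FRAME_FMT, bytes(frame))
    return local, saved_ret


frame = new_frame()
payload = b"A" * BUF_SIZE + struct.pack("<I", 0xDEADBEEF)
vulnerable_copy(frame, payload, len(payload))
local, saved_ret = parse(frame)
print(hex(local), hex(saved_ret))  # 0xdeadbeef 0x401234
```

Note that a 20-byte copy already controls the adjacent local while leaving the saved return address intact.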
What was a little more interesting in this case was the exploitation strategy. While nothing groundbreaking, normally the only primitive we’ve covered being gained from a stack-based overflow is hijacking the stored return address between stack frames. This exploit first used the stack-based overflow to gain a couple of other primitives by corrupting the locals on the stack.
- First they were able to brute force a client_sock value. When the file descriptor used here was invalid the function just returned, so it was simple to try a value and keep incrementing until it worked.
- With the client_sock leaked, there were the prefix_ptr and prefix_size values, which provided an arbitrary read primitive, as prefix_size bytes would be read from prefix_ptr and sent out over the socket, and prefix_ptr would then be freed. The free was an important constraint on where prefix_ptr could point, but otherwise it was arbitrary. They pointed it at the Global Offset Table (which for some reason worked despite the free call) to leak libc function pointers. With that they could calculate the address of system.
- The next step with system leaked was to get data they control at a consistent location. This turned out to be fairly easy, requiring only an allocation large enough to be mmap’d, which could be triggered just by sending a large enough HTTP request in the first place.
- And finally they had all the pieces necessary to take a more traditional route with a ret2libc-style attack.
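Two of the steps above reduce to very little logic, sketched here under loud assumptions: a 64-bit little-endian target, made-up libc offsets (real values depend on the exact libc build), and a hypothetical responds callback standing in for "send the overflow with this descriptor and see whether anything comes back".

```python
import struct

# Hypothetical symbol offsets inside one particular libc build.
LIBC_OFFSETS = {"free": 0x9A6D0, "system": 0x52290}


def brute_force_fd(responds, start=0, limit=1024):
    """Try descriptor values in order until the oracle reports a reply."""
    for fd in range(start, limit):
        if responds(fd):
            return fd
    return None


def system_from_leak(leaked_bytes, leaked_symbol="free"):
    """Turn a leaked GOT entry into the address of system."""
    leaked_addr = struct.unpack("<Q", leaked_bytes[:8])[0]
    libc_base = leaked_addr - LIBC_OFFSETS[leaked_symbol]
    if libc_base & 0xFFF:
        # A libc base should be page aligned; otherwise the offsets are wrong.
        raise ValueError("unexpected libc base; offsets may not match this build")
    return libc_base + LIBC_OFFSETS["system"]


# Usage with simulated values (no real target involved).
fake_base = 0x7F1234560000
fake_leak = struct.pack("<Q", fake_base + LIBC_OFFSETS["free"])
print(brute_force_fd(lambda fd: fd == 5))   # 5
print(hex(system_from_leak(fake_leak)))     # 0x7f12345b2290
```

The page-alignment check is a cheap sanity test that the leak was parsed correctly and that the assumed offsets belong to the libc actually in use.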
While nothing groundbreaking, I often see stack-based overflows treated as basically just “overwrite ret”, especially by those new to the field, and they really can be much more powerful than just a control flow hijack.