CVE-2022-29582 - An io_uring vulnerability

We discussed this vulnerability during Episode 152 on 20 September 2022

An io_uring race condition leading to a use-after-free (UAF), which the exploit then chains into four UAFs in total! First, a bit of background on io_uring.

Background

io_uring is a (primarily) async I/O subsystem that was introduced to Linux in 5.1. It consists of a submission queue (SQ) and a completion queue (CQ), which are ring buffers shared between the kernel and userspace. As userspace, you write your entries, dubbed “SQEs”, into the submission queue, and the kernel writes results back to the completion queue. While io_uring is commonly used asynchronously, it’s possible to force requests to run in order by setting the IOSQE_IO_LINK flag in the SQE, which links it to the next SQE in the queue. There’s also an IORING_OP_TIMEOUT opcode, which completes when either the timeout expires or a specified number of completions have been posted, effectively giving you a timeout for a batch of n I/O operations. For targeted timeouts of specific requests, IORING_OP_LINK_TIMEOUT is provided, which attaches a timeout to the request it is linked to. Relevant for exploitation is the IORING_OP_TEE opcode, which, like tee(2), duplicates data from an input pipe to an output pipe.
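To make the queue mechanics a bit more concrete, here is a toy userspace model of the two rings and of link chains. It is only a sketch: `ToyRing`, `Sqe`, `Cqe` and `demo` are made-up names, the "kernel" is just a Python loop that runs everything sequentially, and the only things borrowed from the real interface are the IOSQE_IO_LINK bit and the behaviour that the rest of a chain is cancelled with -ECANCELED once one link fails.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

IOSQE_IO_LINK = 1 << 2   # same bit as in <linux/io_uring.h>
ECANCELED = 125          # Linux errno value


@dataclass
class Sqe:
    user_data: int               # echoed back in the matching CQE
    op: Callable[[], int]        # hypothetical stand-in for the I/O; < 0 means error
    flags: int = 0


@dataclass
class Cqe:
    user_data: int
    res: int


class ToyRing:
    """Toy model: userspace fills sq, the 'kernel' drains it into cq."""

    def __init__(self):
        self.sq = deque()
        self.cq = deque()

    def submit(self, sqe):
        self.sq.append(sqe)

    def enter(self):
        while self.sq:
            # A chain is a run of SQEs where every entry but the last has IOSQE_IO_LINK.
            chain = []
            while self.sq:
                sqe = self.sq.popleft()
                chain.append(sqe)
                if not sqe.flags & IOSQE_IO_LINK:
                    break
            failed = False
            for sqe in chain:
                if failed:
                    res = -ECANCELED     # the rest of a broken chain is cancelled
                else:
                    res = sqe.op()
                    failed = res < 0
                self.cq.append(Cqe(sqe.user_data, res))


def demo():
    ring = ToyRing()
    ring.submit(Sqe(1, lambda: 0, IOSQE_IO_LINK))
    ring.submit(Sqe(2, lambda: -5, IOSQE_IO_LINK))   # fails with -EIO
    ring.submit(Sqe(3, lambda: 0))                   # end of the chain
    ring.submit(Sqe(4, lambda: 7))                   # independent request
    ring.enter()
    return [(c.user_data, c.res) for c in ring.cq]


if __name__ == "__main__":
    print(demo())   # [(1, 0), (2, -5), (3, -125), (4, 7)]
```

Request 3 never runs because its predecessor in the chain failed, while request 4 is unaffected. In the real thing the rings live in shared memory and unlinked requests may complete in any order.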

Vulnerability

The researchers had a question about the two timeout-related opcodes: what happens if an IORING_OP_TIMEOUT races with an IORING_OP_LINK_TIMEOUT on the same request? It turns out memory corruption happens, because the io_flush_timeouts() function unconditionally tries to cancel the timeout and remove it from the timeout list, without considering that something else might be triggering or cancelling it at the same time. This leads to a deferred UAF on the io_kiocb object for the timeout.
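The shape of the bug can be shown with a deterministic toy model. This is not kernel code: the single reference count, the `firing` flag and all function names are assumptions made for illustration, and the "race" is just a fixed interleaving of the two paths. It only demonstrates why an unconditional cancel is dangerous when another path already owns the completion of the same request.

```python
class UseAfterFree(Exception):
    pass


class Req:
    """Hypothetical stand-in for the timeout request object."""

    def __init__(self):
        self.refs = 1          # assumption: one reference, dropped on completion
        self.freed = False
        self.on_list = True    # still linked on the timeout list
        self.firing = False    # another path has started completing it


def put(req):
    if req.freed:
        raise UseAfterFree("reference dropped on an already freed request")
    req.refs -= 1
    if req.refs == 0:
        req.freed = True


def timer_starts(req):
    # Path A, step 1: the timer callback begins handling the request.
    req.firing = True


def timer_finishes(req):
    # Path A, step 2: the callback completes the request and drops the reference.
    put(req)


def flush_unconditional(req):
    # Path B, buggy shape: cancel and remove without checking for path A.
    req.on_list = False
    put(req)


def flush_checked(req):
    # Path B, safer shape: leave the request alone if someone else owns it.
    if req.firing:
        return
    req.on_list = False
    put(req)


def run(flush):
    req = Req()
    timer_starts(req)      # path A is interrupted right after it starts...
    flush(req)             # ...path B runs inside that window...
    try:
        timer_finishes(req)    # ...and path A resumes
    except UseAfterFree:
        return "use-after-free"
    return "ok"


if __name__ == "__main__":
    print(run(flush_unconditional))   # use-after-free
    print(run(flush_checked))         # ok
```

In the toy model the second `put()` is caught immediately. In the kernel nothing stops it, and the freed memory may already belong to something else by the time the stale path touches it.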

Exploitation

Exploitation was tricky. It’s a race condition with a somewhat narrow window, and heap isolation means it’s not easy to overlap the freed io_kiocb with a different type of object. It seems they decided to overlap the freed io_kiocb with the io_kiocb of an IORING_OP_TEE request, and used the request free on the overlap to corrupt the reference count of the file object backing the IORING_OP_TEE request. This allowed them to get a controlled UAF on file. This puts them in a somewhat similar situation to before, as heap isolation is still a factor, but they’re no longer working inside a tight race window.
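The step from "stale request pointer" to "UAF on file" is easier to follow with a small model. Everything here is a simplification under stated assumptions: `ToyCache` is a made-up fixed-slot cache with a LIFO free list (not the real slab allocator), requests are plain dicts, and the reference counts are assumed values chosen to show the effect of one extra put.

```python
class File:
    def __init__(self, name):
        self.name = name
        self.refs = 1        # assumption: one reference held by the fd table
        self.freed = False


def fget(f):
    f.refs += 1
    return f


def fput(f):
    f.refs -= 1
    if f.refs == 0:
        f.freed = True


class ToyCache:
    """Fixed slots with a LIFO free list; freed contents stay in place."""

    def __init__(self, n):
        self.slots = [None] * n
        self.free = list(range(n))[::-1]

    def alloc(self, obj):
        idx = self.free.pop()
        self.slots[idx] = obj
        return idx

    def release(self, idx):
        self.free.append(idx)


def free_request(cache, idx):
    # Frees whatever currently lives in the slot, including its file reference.
    req = cache.slots[idx]
    if req["file"] is not None:
        fput(req["file"])
    cache.release(idx)


def demo():
    cache = ToyCache(4)
    timeout_idx = cache.alloc({"op": "TIMEOUT", "file": None})
    free_request(cache, timeout_idx)          # first free; a stale index survives

    pipe = File("pipe")
    tee_idx = cache.alloc({"op": "TEE", "file": fget(pipe)})   # pipe.refs == 2
    same_slot = tee_idx == timeout_idx        # the TEE request reuses the slot

    free_request(cache, timeout_idx)          # stale second free hits the TEE request
    refs_after_stale_free = pipe.refs         # one reference has gone missing
    free_request(cache, tee_idx)              # the TEE request completes normally
    return same_slot, refs_after_stale_free, pipe.freed


if __name__ == "__main__":
    print(demo())   # (True, 1, True)
```

At the end the file is freed even though the fd table in this model still points at it, which is the dangling reference the rest of the chain builds on.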

By doing some trickery to get the page released from the slab allocator back to the page allocator and reused for a different object cache, they were able to overlap the file with a sprayed msg_msgseg object. Then, they triggered a third UAF by closing the file, so they could overlap the msg_msgseg with a tls_context object for sockets. They used this overlap to leak the contents of tls_context by receiving the msg_msgseg object, which in turn also freed it, chaining into their fourth and final UAF. By overlapping tls_context with another msg_msgseg containing a fake tls_context, they could hijack the sk_proto pointer and get code execution from the getsockopt() function pointer.
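For the msg_msgseg spray, the message size decides which kmalloc cache the segment lands in. The arithmetic below is a sketch under assumptions: a 4096-byte page, a 48-byte msg_msg header and an 8-byte msg_msgseg header (the usual x86-64 values), and a kmalloc-512 target picked purely as an example; the size class that tls_context actually uses depends on the kernel build. The function names are made up.

```python
PAGE_SIZE = 4096
MSG_MSG_HDR = 48       # assumed sizeof(struct msg_msg) on x86-64
MSG_MSGSEG_HDR = 8     # assumed sizeof(struct msg_msgseg) on x86-64

KMALLOC_CACHES = (8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192)


def msg_layout(length):
    """Allocation sizes for a message body: the msg_msg first, then each msg_msgseg."""
    first = min(length, PAGE_SIZE - MSG_MSG_HDR)
    sizes = [MSG_MSG_HDR + first]
    remaining = length - first
    while remaining > 0:
        chunk = min(remaining, PAGE_SIZE - MSG_MSGSEG_HDR)
        sizes.append(MSG_MSGSEG_HDR + chunk)
        remaining -= chunk
    return sizes


def kmalloc_cache(size):
    """Smallest general-purpose cache that fits the request."""
    for cache in KMALLOC_CACHES:
        if size <= cache:
            return cache
    raise ValueError("too large for this toy table")


def body_len_for_segment(target_alloc):
    """Message body length whose single msg_msgseg is exactly target_alloc bytes."""
    return (PAGE_SIZE - MSG_MSG_HDR) + (target_alloc - MSG_MSGSEG_HDR)


if __name__ == "__main__":
    length = body_len_for_segment(512)
    print(length)                                         # 4552
    print(msg_layout(length))                             # [4096, 512]
    print([kmalloc_cache(s) for s in msg_layout(length)])  # [4096, 512]
```

The first 4048 bytes of the body fill the msg_msg itself, so only the trailing 504 bytes end up in the segment that overlaps the target object.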