Linux Kernel: Exploiting a Netfilter Use-after-Free in kmalloc-cg

We discussed this vulnerability during Episode 178 on 10 January 2023

A fairly complex exploit of a use-after-free in netfilter. The vulnerability is covered in more detail in other posts linked by Exodus, but effectively the bug is a lifetime issue with netfilter expressions that don’t have the NFT_EXPR_STATEFUL flag set but hold a reference to another set (such as lookup and dynset expressions). If the expression associated with the set doesn’t have the NFT_EXPR_STATEFUL flag set, the kernel aborts and destroys the expression, but the referenced set’s binding list isn’t updated to remove the reference. Whenever the binding list is next updated (i.e. to add or remove an entry), a UAF occurs because the kernel dereferences a dangling pointer to update the doubly linked list. This gives an attacker the ability to write the address of a linked-list node into freed memory.
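The stale-node write can be illustrated in userspace. The following is only a minimal sketch: it copies the shape of the kernel's list_add(), while `fake_binding`, `slot`, and `simulate_stale_write` are hypothetical names made up for the illustration, and a union stands in for a slab slot that gets reused, so nothing is actually freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal copy of the kernel's circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert new_node right after head, mirroring the kernel's list_add(). */
static void list_add(struct list_head *new_node, struct list_head *head)
{
    struct list_head *first = head->next;

    first->prev = new_node;   /* if `first` is stale, this lands in reused memory */
    new_node->next = first;
    new_node->prev = head;
    head->next = new_node;
}

/* Hypothetical stand-in for an expression that binds itself to a set. */
struct fake_binding {
    uint64_t pad;             /* placeholder for unrelated fields */
    struct list_head list;    /* node linked into the set's binding list */
};

/* One simulated slab slot that holds different objects over time. */
union slot {
    struct fake_binding binding;
    unsigned char bytes[64];
};

/*
 * Replays: bind, destroy without unlinking, reuse the slot, bind again.
 * Returns the value that ended up inside the reused slot.
 */
static uintptr_t simulate_stale_write(union slot *slot,
                                      struct list_head *set_bindings,
                                      struct fake_binding *second)
{
    uintptr_t written;

    list_init(set_bindings);
    list_add(&slot->binding.list, set_bindings);  /* expression binds to the set */

    /* Simulated free and reuse: no unlink happened, the slot now holds 'A's. */
    memset(slot->bytes, 0x41, sizeof(slot->bytes));

    list_add(&second->list, set_bindings);        /* next update of the list */

    /* Read back the bytes where the stale node's prev pointer used to live. */
    memcpy(&written,
           slot->bytes + offsetof(struct fake_binding, list)
                       + offsetof(struct list_head, prev),
           sizeof(written));
    return written;
}
```

Calling `simulate_stale_write()` returns the address of the second node's list entry, read out of a buffer that was otherwise filled with 0x41 bytes. That is the primitive in miniature: whatever object occupies the reused slot receives a kernel pointer at a fixed offset.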

In kernel v5.18, these objects live in the kmalloc-cg-* caches because they’re allocated with the GFP_KERNEL_ACCOUNT flag set. This makes exploitation a little trickier than on previous kernel versions, where they sat in the generic caches and could be targeted with more universal objects for primitives.
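To make the cache split concrete, here is a rough sketch of how an allocation size and the accounting flag map to a cache name. It assumes the usual x86_64 size classes and a kernel with accounted caches (5.14 or later with memcg enabled); `cache_name` and `round_to_class` are hypothetical helpers, not kernel functions, and the real selection logic has more cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed kmalloc size classes; actual kernels may differ by configuration. */
static const size_t size_classes[] = {
    8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
};

/* Round a request up to its size class; 0 means "not served by these caches". */
static size_t round_to_class(size_t size)
{
    if (size == 0)
        return 0;
    for (size_t i = 0; i < sizeof(size_classes) / sizeof(size_classes[0]); i++)
        if (size <= size_classes[i])
            return size_classes[i];
    return 0;
}

/*
 * Write the name of the cache that would serve the request into `out`.
 * `accounted` models whether GFP_KERNEL_ACCOUNT was passed.
 * Returns 0 on success, -1 if the size is outside the modelled classes.
 */
static int cache_name(size_t size, bool accounted, char *out, size_t out_len)
{
    size_t cls = round_to_class(size);
    const char *prefix = accounted ? "kmalloc-cg-" : "kmalloc-";

    if (cls == 0)
        return -1;
    if (cls >= 1024)
        snprintf(out, out_len, "%s%zuk", prefix, cls / 1024);
    else
        snprintf(out, out_len, "%s%zu", prefix, cls);
    return 0;
}
```

With this model, a 56-byte accounted allocation resolves to kmalloc-cg-64 while the same size without the flag resolves to kmalloc-64. Two objects can only be overlapped through ordinary slab reuse when both the size class and the accounting flag match.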

The exploit chain involves three stages: an infoleak of a heap pointer, an infoleak of a .text pointer to defeat KASLR, and finally code execution.

Infoleak heap pointer

The first infoleak involves triggering the UAF on a dynset expression and overlapping it with a msg_msg object (which was moved to the kmalloc-cg cache in 5.14). A pointer to the netfilter set object gets written into the overlapped msg_msg’s message data, which is then leaked when msgrcv() is called.
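Whether a stale write is readable through msgrcv() depends on where it lands relative to the msg_msg header. The sketch below assumes the 48-byte x86_64 header layout; `msg_msg_hdr` is a userspace mirror and `payload_offset` is a hypothetical helper. The offsets passed to it are illustrative, not the real offsets inside the netfilter expression objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the x86_64 msg_msg header; the payload follows it. */
struct msg_msg_hdr {
    uint64_t m_list_next;
    uint64_t m_list_prev;
    int64_t  m_type;
    uint64_t m_ts;        /* payload length */
    uint64_t next;        /* pointer to further segments */
    uint64_t security;
};

static_assert(sizeof(struct msg_msg_hdr) == 48, "header assumed to be 48 bytes");

/*
 * For a write of `len` bytes at byte offset `off` inside the shared slot,
 * return the offset within the message payload where msgrcv() would expose
 * it, or -1 if the write touches the header or runs past the payload.
 */
static long payload_offset(size_t off, size_t len, size_t payload_len)
{
    const size_t hdr = sizeof(struct msg_msg_hdr);

    if (off < hdr)
        return -1;        /* overwrites header fields, not readable data */
    if (off - hdr + len > payload_len)
        return -1;        /* beyond what msgrcv() copies out */
    return (long)(off - hdr);
}
```

For example, an 8-byte pointer written at slot offset 0x38 shows up at byte 8 of a 16-byte payload, whereas anything written at offset 0 only corrupts the header and cannot be read back this way.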

Infoleak kernel text pointer

Leaking a kernel text pointer is much trickier, as the ops field of dynset can’t be leaked since it falls inside the msg_msg header upon overlap. For this leak, a lookup object is used for the initial UAF and it’s overlapped with an fdtable, which aligns the linked list entry with the table’s open_fds pointer. The fdtable is a good candidate object because the attacker can spray it by forking child processes and trigger a free on open_fds by terminating those child processes, giving a partial free. They then trigger another UAF on a dynset object, get that pointer written into fdtable->open_fds, and spray msg_msg objects to occupy the now freed dynset. Finally, the child processes are terminated, which frees the dynset and with it part of the System V msg_msg, which they then replace with a time_namespace object that contains an ops table pointer.
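The alignment argument comes down to struct offsets. As a rough sketch, assuming the x86_64 layout of struct fdtable around v5.18, the mirror below shows which field an 8-byte stale write would replace; `fdtable_mirror` and `fdtable_field_at` are hypothetical names for illustration, and the offset of the list entry inside the lookup expression is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of struct fdtable on x86_64, pointers widened to uint64_t. */
struct fdtable_mirror {
    uint32_t max_fds;
    uint32_t pad;             /* alignment padding before the first pointer */
    uint64_t fd;              /* struct file **fd */
    uint64_t close_on_exec;   /* unsigned long *close_on_exec */
    uint64_t open_fds;        /* unsigned long *open_fds */
    uint64_t full_fds_bits;   /* unsigned long *full_fds_bits */
    uint64_t rcu[2];          /* struct rcu_head: two pointers */
};

/* Name the field that an aligned 8-byte write at offset `off` would replace. */
static const char *fdtable_field_at(size_t off)
{
    switch (off) {
    case offsetof(struct fdtable_mirror, fd):            return "fd";
    case offsetof(struct fdtable_mirror, close_on_exec): return "close_on_exec";
    case offsetof(struct fdtable_mirror, open_fds):      return "open_fds";
    case offsetof(struct fdtable_mirror, full_fds_bits): return "full_fds_bits";
    default:                                             return "other";
    }
}
```

In this layout open_fds sits at offset 24 of a 56-byte object, so a stale list pointer written at offset 24 of the shared slot replaces it. Since the kernel frees open_fds when the table is torn down, controlling that pointer is what turns the process exit into a free of attacker-chosen memory.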

Code exec

Code execution is achieved by using the partial free via fdtable to corrupt a set object’s ops pointer. This kickstarts a ROP chain that overwrites modprobe_path to get root.