Fall of the machines: Exploiting the Qualcomm NPU (neural processing unit) kernel driver

We discussed this vulnerability as part of our weekly podcast on 24 November 2021.

The article covers three vulnerabilities in Qualcomm’s Neural Processing Unit (NPU) kernel driver. It focuses on Samsung devices because, for whatever reason, the NPU device is accessible to untrusted users there, which is not the case on most other devices.

Bug 1: UAF in npu_exec_network() / npu_close()

The first bug stems from the fact that the NPU driver has to keep track of the network models that are loaded and the clients that use them. There are two reasons for this. First, the driver needs to isolate different clients of the same network from each other. Second, it has to know which client to interact with when asynchronous commands (like npu_exec_network) complete. It does this with a global static array of network objects, each of which maintains its own client reference.

This client reference (and the entire network object) needs to be cleaned up when the file descriptor is closed, as the clients are freed at that point as well. The driver does this via npu_host_cleanup_networks(), which eventually calls free_network(). The problem is that there are error paths that can short-circuit the free_network() call, for example when an asynchronous command is processed while the close path is running. If an attacker sends an async npu_exec_network command and quickly closes the file descriptor, the network object survives with a stale client reference. When the NPU later finishes the exec task and sends a message back to the CPU, that message is processed by app_msg_proc, which uses the (now freed) client object.
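The lifetime problem can be sketched with a small user-space model. This is not driver code: the class and method names (NpuModel, close, cmd_pending) are hypothetical, Python has no real use-after-free so a `freed` flag stands in for the freed memory, and the "pending command" check is an assumption standing in for whichever error path actually skips free_network().

```python
class Client:
    """Stand-in for the per-file-descriptor client object."""
    def __init__(self):
        self.freed = False


class Network:
    """Entry in the global table; keeps its own reference to the client."""
    def __init__(self, client):
        self.client = client
        self.cmd_pending = False  # True while an async exec is in flight


class NpuModel:
    """Toy model of the driver's global network table (hypothetical names)."""
    def __init__(self, slots=4):
        self.networks = [None] * slots

    def load_network(self, client):
        slot = self.networks.index(None)
        self.networks[slot] = Network(client)
        return slot

    def exec_network_async(self, slot):
        self.networks[slot].cmd_pending = True

    def free_network(self, slot):
        self.networks[slot] = None

    def close(self, client, fixed=False):
        for slot, net in enumerate(self.networks):
            if net is None or net.client is not client:
                continue
            if net.cmd_pending and not fixed:
                # Assumed error path: cleanup bails out, so the table
                # entry survives and still points at the client.
                continue
            self.free_network(slot)
        client.freed = True  # the client itself is always released

    def app_msg_proc(self, slot):
        """Completion message arriving from the NPU for `slot`."""
        net = self.networks[slot]
        if net is None:
            return "dropped"
        if net.client.freed:
            return "use-after-free"  # stale client reference dereferenced
        return "delivered"
```

In this model, calling exec_network_async() and then close() leaves the table entry behind, so the later app_msg_proc() call reaches the freed client. Passing fixed=True removes the entry unconditionally, and the late message is simply dropped.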

Bug 2: Logic bug in npu_process_kevent()

The npu_process_kevent() function is called when an exec network command finishes, and is responsible for copying stats_buf information to userspace. The pointer to the stats_buf object is stored in kevt->reserved[0]. The problem is that, when calling copy_to_user(), the driver passes &kevt->reserved[0] as the source pointer instead of kevt->reserved[0]. This leaks the address of the stats_buf object rather than copying its contents.
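The difference between the two source pointers can be illustrated in user space with ctypes. This is a rough sketch, not the driver's code: KEvent only models the reserved array (assumed here to hold 64-bit values), copy_out() is a hypothetical stand-in for copy_to_user(), and the buffer contents are made up.

```python
import ctypes
import sys


class KEvent(ctypes.Structure):
    # Hypothetical stand-in for the kernel event; only `reserved` is modelled.
    _fields_ = [("reserved", ctypes.c_uint64 * 4)]


def copy_out(src_addr, size):
    """Stand-in for copy_to_user(): return `size` bytes read from src_addr."""
    return ctypes.string_at(src_addr, size)


# The stats buffer whose contents the caller is supposed to receive.
stats_buf = ctypes.create_string_buffer(b"STATSBUF", 16)

kevt = KEvent()
kevt.reserved[0] = ctypes.addressof(stats_buf)  # pointer stored in the event

# Buggy source: the address OF reserved[0], i.e. the slot holding the pointer.
slot_addr = ctypes.addressof(kevt) + KEvent.reserved.offset
buggy = copy_out(slot_addr, 8)

# Intended source: the address stored IN reserved[0], i.e. the buffer itself.
fixed = copy_out(kevt.reserved[0], 8)

# The buggy copy hands back the pointer value instead of the data.
leaked_addr = int.from_bytes(buggy, sys.byteorder)
```

Here leaked_addr equals the address of stats_buf, while fixed holds the buffer's first eight bytes. In the kernel, the same mistake hands userspace a heap address.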

Bug 3: Uninitialized read in app_msg_proc()

The final issue is in app_msg_proc() and its use of a union in the msm_npu_events object (which gets copied to userspace). A union takes the size of its largest member, which here is a 128-byte auxiliary data buffer. Because this buffer is larger than some of the other members of the union (such as npu_event_execute_v2_done), filling in a smaller member leaves the trailing bytes unwritten, and whatever was previously in that memory is leaked to userspace.
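The effect of the oversized union can be modelled with ctypes. Only the 128-byte data member comes from the description above. The field layout of ExecV2Done (three 32-bit fields), the leading type field, and the 0x41 fill byte standing in for stale heap contents are all assumptions made for illustration.

```python
import ctypes


class ExecV2Done(ctypes.Structure):
    # Assumed layout: three 32-bit fields (12 bytes). Illustrative only.
    _fields_ = [
        ("network_hdl", ctypes.c_uint32),
        ("exec_result", ctypes.c_int32),
        ("stats_buf_size", ctypes.c_uint32),
    ]


class EventUnion(ctypes.Union):
    _fields_ = [
        ("exec_v2_done", ExecV2Done),
        ("data", ctypes.c_uint8 * 128),  # largest member sets the union size
    ]


class Event(ctypes.Structure):
    _fields_ = [("type", ctypes.c_uint32), ("u", EventUnion)]


STALE = 0x41  # marker for whatever the memory held before


def build_event(zero_first):
    """Fill in only the small member, optionally zeroing the object first."""
    size = ctypes.sizeof(Event)
    raw = (ctypes.c_uint8 * size)(*([STALE] * size))  # "uninitialized" memory
    if zero_first:
        ctypes.memset(ctypes.addressof(raw), 0, size)  # the usual fix
    evt = Event.from_buffer(raw)
    evt.type = 1
    evt.u.exec_v2_done.network_hdl = 7
    evt.u.exec_v2_done.exec_result = 0
    evt.u.exec_v2_done.stats_buf_size = 64
    return bytes(raw)  # what would be copied out to userspace


def leaked_tail(blob):
    """Bytes of the union that the small member never wrote."""
    start = Event.u.offset + ctypes.sizeof(ExecV2Done)
    return blob[start:]
```

With zero_first=False, every byte returned by leaked_tail() still carries the stale marker. With zero_first=True, the tail is all zeros. Under the assumed 12-byte layout, that tail is 116 bytes long.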

Chaining it all together

The second bug can be used to place controlled data at a known kernel heap address, since the stats_buf contents are controlled by the user and its address gets leaked. This is useful for faking objects and gets around Privileged Access Never (PAN). The third bug can be used to defeat kernel Address Space Layout Randomization (kASLR). The first bug (the UAF) can be used to make the kernel dereference a user-controlled pointer, which eventually leads to a function pointer being called and so allows control flow hijack. By setting up the fake object via the second bug and leaking the address of a gadget function with the third bug, this chain can lead to full code execution.