Show Notes

149 - A Zoom RCE, VMware Auth Bypass, and GitLab Stored XSS

This is a cool trick: using a UTF-8 parser differential between the client XML parsing library (Gloox) and the server side (fast_xml) to smuggle in characters that end an XML tag prematurely, and with them new XML content.

The core issue comes down to how the \xEB byte was treated. In UTF-8 this byte marks the start of a three-byte sequence (I assume you can probably do the same attack with the lead bytes for longer sequences too). The client more or less ignored the UTF-8 aspect of this; to it, \xEB is just another byte that gets reflected in the tag name, so it scans the bytes after a < looking for a > byte. The server, however, had a better understanding of UTF-8: it would see the \xEB and skip over the bytes that follow, treating them as part of that character's sequence. So the client would end the tag when it sees >, even immediately after the \xEB, while the server would treat that > as part of the tag name (even though it's technically an invalid UTF-8 sequence, the server was somewhat permissive).

So with this desync and the ability to smuggle in tags, the attack would inject a <? xml ?> tag, which would restart the entire XML context and allow any of the special server commands to be sent, including one that causes the client to disconnect from the server and reconnect to an attacker-defined server. From there an update could be issued for code execution on the client.
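The tag-boundary desync described above can be reproduced with two toy scanners. This is only a sketch under assumptions: the function names are made up for illustration and neither function is the real Gloox or fast_xml logic. One scanner treats every byte independently; the other skips ahead by the length implied by a UTF-8 lead byte without validating the continuation bytes.

```python
def naive_tag_end(data: bytes, start: int) -> int:
    """Byte-oriented scan: the tag ends at the first '>' after start."""
    return data.index(b">", start)


def utf8_seq_len(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte."""
    if lead >= 0xF0:
        return 4
    if lead >= 0xE0:
        return 3
    if lead >= 0xC0:
        return 2
    return 1


def utf8_aware_tag_end(data: bytes, start: int) -> int:
    """Permissive UTF-8-aware scan: jumps over whole sequences
    without checking that the skipped bytes are valid continuations."""
    i = start
    while i < len(data):
        if data[i] == ord(">"):
            return i
        i += utf8_seq_len(data[i])
    raise ValueError("unterminated tag")


# '<', 'a', 0xEB, '>', '<', 'b', '>'
payload = b"<a\xeb><b>"

client_end = naive_tag_end(payload, 1)       # stops at the first '>'
server_end = utf8_aware_tag_end(payload, 1)  # swallows that '>' as part of 0xEB's sequence
```

With this payload the byte-oriented scanner ends the tag at index 3, while the UTF-8-aware scanner only ends it at index 6, so the two sides disagree about where the first tag stops and what comes after it.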

Honestly, this is a bit of a crazy issue to see. During login, if the LocalPasswordAuthAdapter gets used, it will attempt to validate the login credentials against whatever host is in the Host header, and an attacker can often control this header completely. So by pointing the header at a domain they control, the attacker can set up a server that responds with an HTTP 200 to the authentication request, allowing the attacker to log in.
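As a rough illustration of the pattern (not the actual product code; the function names and the /hypothetical/auth path are invented for this sketch), the difference between trusting the Host header and pinning the authentication target to configuration looks something like this:

```python
def auth_url_from_request(headers: dict) -> str:
    """Vulnerable pattern: the authentication target is derived from a
    client-supplied header, so the client chooses who validates the login."""
    return "https://" + headers["Host"] + "/hypothetical/auth"


def auth_url_pinned(headers: dict, configured_host: str) -> str:
    """Safer pattern: the target comes from server-side configuration and
    the request headers are ignored for this decision."""
    return "https://" + configured_host + "/hypothetical/auth"


# Request with a spoofed Host header (hypothetical host names).
request_headers = {"Host": "attacker.example"}

spoofed = auth_url_from_request(request_headers)
pinned = auth_url_pinned(request_headers, "idp.internal.example")
```

In the first case the credentials check goes to attacker.example, which can simply answer 200 to everything; in the second the spoofed header has no effect on where the check is sent.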

At its core, this is a simple issue with path normalization between a reverse proxy and the end server: one treated ..%2f as a traversal and the other did not. The author used this to access internal NGINX Plus endpoints and add his own server to the upstream list, so victims would be proxied to an attacker-controlled server. Cool way to escalate the issue that I’ve not seen before.
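A small sketch of that kind of normalization mismatch, using only the Python standard library. The paths and the /public/ prefix rule are hypothetical, and which component played which role in the real bug is not stated here; the sketch just picks one arrangement where the front end does not decode %2f and the back end does.

```python
import posixpath
from urllib.parse import unquote


def normalize_literal(path: str) -> str:
    """Resolve dot segments without percent-decoding.
    '..%2f' is not a '..' segment here, just part of an odd segment name."""
    return posixpath.normpath(path)


def normalize_decoded(path: str) -> str:
    """Percent-decode first, then resolve dot segments.
    '..%2f' becomes '../' and is treated as a traversal."""
    return posixpath.normpath(unquote(path))


def front_end_allows(path: str) -> bool:
    """Hypothetical access rule: only paths under /public/ are forwarded."""
    return normalize_literal(path).startswith("/public/")


request_path = "/public/..%2finternal/admin"

allowed = front_end_allows(request_path)         # front end sees a /public/ path
backend_view = normalize_decoded(request_path)   # back end resolves the traversal
```

The front end forwards the request because the path still appears to live under /public/, while the back end ends up serving /internal/admin.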

It seems that the syntax highlighting filter will read the data-sourcepos attribute rather permissively, including newlines and angle brackets. This value gets reflected back out into the page, where the browser ends up interpreting as HTML some of the text the backend thought was in the attribute.

This did provide a pretty straightforward XSS: just including a <script> would work on self-hosted GitLab instances without a CSP set. GitLab.com did have a stronger CSP, though, which had to be bypassed.
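The general shape of an attribute breakout like this can be shown with a generic sketch. This is not GitLab's filter code: the render functions and the sample value are assumptions, and the real bug involved how the attribute was read rather than a simple missing escape. The point is only what happens when a value the backend believes is attribute text is written back into the page unescaped.

```python
import html


def render_unescaped(sourcepos: str) -> str:
    """Reflects the value as-is, so quotes and angle brackets in it
    can close the attribute and start new markup."""
    return '<pre data-sourcepos="' + sourcepos + '"></pre>'


def render_escaped(sourcepos: str) -> str:
    """Escapes quotes and angle brackets so the value stays attribute text."""
    return '<pre data-sourcepos="' + html.escape(sourcepos, quote=True) + '"></pre>'


# Hypothetical value that closes the attribute and the tag, then adds an element.
value = '1:1-1:1"><img src=x>'

unsafe_markup = render_unescaped(value)
safe_markup = render_escaped(value)
```

In unsafe_markup the browser sees a real img element after the pre tag; in safe_markup the same characters remain inert text inside the attribute.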

The first bypass was injecting a <base> tag. This tag tells the browser the base location to load relative resources from, so you can trick the page into loading any relatively-referenced scripts that appear after the tag from an attacker-controlled domain. As these script tags were added to the page legitimately, they also carry the nonce necessary for CSP to allow them.
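The effect of an injected base tag on relative script URLs can be approximated with urljoin from the standard library, which follows the same relative-reference resolution rules browsers apply. The page URL, script path, and attacker domain below are all hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_script(page_url: str, src: str, base_href: Optional[str] = None) -> str:
    """Resolve a script's src the way a browser would: against the
    <base href> if one is present, otherwise against the page URL."""
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, src)


page = "https://gitlab.example/group/project/issues/1"

# Without an injected <base>, a relative src stays on the site's origin.
normal = resolve_script(page, "/assets/app.js")

# With <base href="https://attacker.example/"> injected earlier in the page,
# the same relative src now points at the attacker's server.
hijacked = resolve_script(page, "/assets/app.js", "https://attacker.example/")

# Absolute URLs are unaffected by <base>.
absolute = resolve_script(page, "https://cdn.example/lib.js", "https://attacker.example/")
```

Only relative references are redirected, which is why the trick depends on the page loading at least one script by relative URL after the injection point.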

He also provided a second bypass using a gadget out of main.js that displays GlFieldErrors on every form. By crafting a form with a malicious <input> and title attribute, the title would be reflected into the generated error element’s body. That would then inject a new tag, this one providing attributes for rails-ujs (data-remote=true data-method=get data-type=script href=...) to pick up and load a remote script.

The gist of this attack is using a hidden electromagnetic interference generator to inject fake touch points into a touch screen without physically touching the device, and through other materials (a table). It's an interesting attack, though the exact mechanisms of it are beyond me.

There are a handful of limitations I’ll call out though. All of the testing happened within a range of 0 to 15 mm, so still very close to the device. The device itself would emit an audible coil buzz (42 dB at 20 cm above the table) that was described as about as loud as a refrigerator’s hum. There is also the alignment issue, which I don’t believe this study dealt with at all. It could inject specific touches and gestures but required that the phone be positioned precisely relative to the device. In a real-world scenario, perfect placement might be rather challenging.

However, none of these seem like impossible problems to solve with further research, and as a proof of concept it's a fun attack.