Joomla: PHP Bug Introduces Multiple XSS Vulnerabilities
This was a really cool XSS filter bypass due to a parsing differential between two of PHP’s multibyte string functions, mb_strpos and mb_substr, when dealing with invalid UTF-8 sequences.
At a really high level, UTF-8 characters may take up one or more bytes. The first byte of any character indicates how many bytes follow as part of that character, and each of those following bytes (if any) starts with the bits 10, marking it as a continuation byte. mb_strpos would process the string byte by byte: it would read the first byte, which indicates how many bytes are to follow, then read the next bytes one by one, checking that each was a continuation byte, until it reached the end of the character. If one of those bytes was not a valid continuation byte, it would treat that as the end of the previous character and attempt to interpret the new byte as the start of a new character. This means that when walking the characters to find a position, it could see a UTF-8 leading byte announcing a 4-byte character, but if the second byte were invalid, that supposed 4-byte character would end after just one byte.
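To make the byte layout concrete, here is a minimal sketch in Python (not PHP, and not taken from the original write-up) of how a leading byte announces a character's length and how a continuation byte is recognised. The helper names expected_length and is_continuation are made up for illustration.

```python
def expected_length(lead: int) -> int:
    """Number of bytes a UTF-8 character should occupy, judged by its first byte."""
    if lead < 0x80:            # 0xxxxxxx: single-byte (ASCII)
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: 2-byte character
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: 3-byte character
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: 4-byte character
        return 4
    return 1                   # stray continuation or invalid lead; treated as one unit in this sketch


def is_continuation(byte: int) -> bool:
    """Continuation bytes have the form 10xxxxxx."""
    return byte >> 6 == 0b10


euro = "€".encode("utf-8")     # three bytes: E2 82 AC
print(expected_length(euro[0]), [is_continuation(b) for b in euro[1:]])
```

An ASCII byte such as "<" is never a continuation byte, which is what makes it "invalid" when it directly follows a multi-byte leading byte.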
On the other hand, mb_substr would read the leading byte, skip over however many bytes it indicated without validating them, and then read the next leading byte. This parsing difference means that characters could be smuggled past the XSS filter, because a character offset computed by mb_strpos would not line up with the same offset in mb_substr. Really cool bug, and the fix wasn’t backported to older PHP versions, so there is a chance this will stick around for a little while.
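The two strategies can be simulated side by side. The following is a rough Python model of the behaviour described above, not PHP's actual implementation; the function names chars_validating and chars_skipping and the payload are assumptions chosen for illustration.

```python
def seq_len(lead: int) -> int:
    """Length announced by a leading byte (simplified for this sketch)."""
    if lead >= 0xF0:
        return 4
    if lead >= 0xE0:
        return 3
    if lead >= 0xC0:
        return 2
    return 1


def chars_validating(data: bytes) -> list:
    """Models the mb_strpos-style walk: a byte that is not a valid
    continuation ends the current character and starts a new one."""
    chars, i = [], 0
    while i < len(data):
        end = i + 1
        limit = min(i + seq_len(data[i]), len(data))
        while end < limit and data[end] >> 6 == 0b10:
            end += 1
        chars.append(data[i:end])
        i = end
    return chars


def chars_skipping(data: bytes) -> list:
    """Models the mb_substr-style walk: trust the leading byte and jump
    ahead without checking the bytes in between."""
    chars, i = [], 0
    while i < len(data):
        end = min(i + seq_len(data[i]), len(data))
        chars.append(data[i:end])
        i = end
    return chars


payload = b"\xf0<b>x"                  # 4-byte leading byte followed by plain ASCII
print(len(chars_validating(payload)))  # 5: the lone 0xF0 counts as one character
print(len(chars_skipping(payload)))    # 2: "<b>" is swallowed into the first character
```

In this model the trailing "x" sits at character index 4 for the validating walk but at index 1 for the skipping walk, so an offset found with one and applied with the other lands in the wrong place.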