Show Notes

111 - Bad Code and Bad URLs

If you’re going to apply a blacklist to remove content…perform it recursively.

const sanitizeUrl = function (s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/javascript:/g, ''); // single pass, the result is never re-checked
};

var sanitizedLink = sanitizeUrl(links[key]);

This should trigger some alarm bells because it's removing the javascript: string, but only running once. Classic blacklist bypass: javajavascript:script: the filter removes the inner javascript:, and the resulting string is javascript:.
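A minimal sketch of the bypass, keeping only the javascript: step of the filter. The names sanitizeOnce and sanitizeUntilStable are made up for this illustration and are not from the project, and the looped version is just one way to read "perform it recursively", not the actual patch.

```javascript
// Single-pass filter, mirroring only the javascript: step quoted above.
function sanitizeOnce(s) {
  return s.replace(/javascript:/g, '');
}

// Hypothetical alternative: keep stripping until the string stops changing.
function sanitizeUntilStable(s) {
  let previous;
  do {
    previous = s;
    s = s.replace(/javascript:/g, '');
  } while (s !== previous);
  return s;
}

const payload = 'javajavascript:script:alert(1)';
console.log(sanitizeOnce(payload));        // javascript:alert(1)
console.log(sanitizeUntilStable(payload)); // alert(1)
```

Even the looped version is case-sensitive, so something like JavaScript: would still get through. Checking the scheme against a short allowlist is usually the sturdier choice.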

Two other issues were found in addLinks and addALink, though both are in effect the same issue. In the normal flow, the links are sanitized for use in labels, and the sanitized links are then added to a list that is reused later in a display showing the links. When htmlLabels was enabled, the links skipped that initial sanitization, so by the time the popup link box used them they only had to contend with the bypassable sanitizeUrl.

The logout endpoint that the Shibboleth plugin provides so an Identity Provider can log a user out of services had an odd way of finding the right sessions to destroy, which led to the request originator being logged into another, seemingly random account.

foreach ($sessions as $session) {
    // session_decode() writes the decoded data straight into $_SESSION
    if (session_decode(base64_decode($session->sessdata))) {
        if ($_SESSION['SESSION']->shibboleth_session_id == trim($spsessionid)) {
            // ...
        }
    }
}

This happened when a database was used as the backing storage for sessions. The application would query the database for all sessions, then session_decode each of them (which deserializes the value into the $_SESSION superglobal) and check whether the ID matched the session it was trying to log out.

The problem is twofold. First, for a brief time the user making the request would have their session set to that of every active user session in turn, though exploiting anything in this short window might not be practical. Second, and this is where the main session hijack comes in, the process wasn’t reversed after the loop ended: the user would be left logged in as the last session in the response. If that wasn’t a privileged user, one could use the main logout (the one used by the SAML IdP) to have the session destroyed, and try again until they got an admin account.
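The side effect can be modelled in a few lines. This is a toy sketch in JavaScript, not Moodle code: the SESSION variable stands in for PHP's $_SESSION, sessionDecode mimics the overwrite-the-global behaviour of session_decode, JSON replaces PHP's session format, and every name and value here is invented for the illustration.

```javascript
// Toy model: SESSION plays the role of PHP's $_SESSION superglobal.
let SESSION = { user: 'attacker', shibId: 'aaa' };

// Stand-in for session_decode(): decodes AND overwrites the global.
function sessionDecode(data) {
  SESSION = JSON.parse(data);
  return true;
}

// Hypothetical session rows as they might come back from the database.
const storedSessions = [
  JSON.stringify({ user: 'alice', shibId: 'bbb' }),
  JSON.stringify({ user: 'admin', shibId: 'ccc' }),
];

// Flawed pattern: search by decoding every row into the global.
function findSessionFlawed(targetId) {
  let found = null;
  for (const data of storedSessions) {
    if (sessionDecode(data) && SESSION.shibId === targetId) {
      found = SESSION.user;
    }
  }
  return found; // SESSION still holds the last decoded row
}

// Safer pattern: decode into a local value and leave the global alone.
function findSessionSafe(targetId) {
  for (const data of storedSessions) {
    const decoded = JSON.parse(data);
    if (decoded.shibId === targetId) return decoded.user;
  }
  return null;
}

findSessionSafe('bbb');
console.log(SESSION.user); // attacker
findSessionFlawed('bbb');
console.log(SESSION.user); // admin
```

Both lookups find the same session, but only the flawed one leaves the caller holding somebody else's session afterwards.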

Great find by Robin Peraglie and Johannes Moritz. Fun bug, but stupid code on Moodle’s part. PHP can take a bit of the blame, though: while the language exposes unserialize, it doesn’t work on session data without a bit of extra parsing (which is ultimately what the patch does). So I understand why developers might be tempted to do it this way.

Different URL parsers may treat mistakes in a URL differently, leading to behaviour differences that can be exploited. This research paper focused on five areas where parsers disagreed on how to understand the URL:

  1. Scheme Confusion. The scheme is missing or invalid. Some parsers will assume an http scheme, but more interesting is how they parse what follows: for example, treating what might naturally be parsed as the host as the path (and the whole thing as a relative path), or just using an implicit HTTP scheme.
  2. Slash Confusion. Specifically, confusion over the wrong number of slashes. Browsers especially will normalize the URL and accept extras, which might trip up other parsers into thinking they are parsing a path instead of the hostname.
  3. Backslash Confusion. This one comes from a difference of opinion between the WHATWG URL definition (what most browsers follow) and RFC 3986 (what most libraries seem to follow). In WHATWG the \ and / characters are treated the same, but in the RFC they are different.
  4. URL-Encoded Data Confusion. Hostnames (and anything but the scheme) can include URL-encoded data. A library making a request will usually resolve the URL encoding to the proper domain; however, many of the software libraries would parse out the URL-encoded string and return it to the developer as-is, meaning the developer has to be aware of the issue and decode the URL before use.
  5. Scheme Mixup. While non-HTTP URLs might look similar, their specifications might be slightly different. In http:// there is a special character, #, which marks the start of the fragment section; in ldap://, however, # receives no such special treatment. Parsing a URL with the rules of the wrong scheme can therefore lead to behaviour differences. This was the case in one of the recent log4j issues, where an ldap:// URL containing a # could bypass the LDAP host whitelist: an HTTP-style parser would treat everything after the # as a fragment and see an allowed host, but the actual lookup would respect the ldap scheme and make the request to a different host instead.
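A rough way to see three of these disagreements (backslash, slash, and URL-encoded host) is to put a deliberately naive RFC 3986-style host extractor next to the WHATWG parser that ships in browsers and Node.js. The naiveHost helper and the allowed.example / evil.example hostnames are hypothetical and only for illustration; real libraries each have their own quirks.

```javascript
// Naive, RFC 3986-style host extraction (hypothetical helper): the authority
// ends at the first '/', '?' or '#', and a backslash is an ordinary character.
function naiveHost(url) {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#]*)/i.exec(url);
  if (!match) return null;
  const hostPort = match[1].slice(match[1].lastIndexOf('@') + 1); // drop userinfo
  return hostPort.replace(/:\d*$/, '');                           // drop port
}

// WHATWG URL parser, available as a global in browsers and Node.js.
function whatwgHost(url) {
  return new URL(url).hostname;
}

const samples = [
  'http://evil.example\\@allowed.example/', // backslash confusion
  'http:///allowed.example/path',           // slash confusion
  'http://%61llowed.example/',              // URL-encoded host
];

for (const url of samples) {
  console.log(url, '| naive:', naiveHost(url), '| WHATWG:', whatwgHost(url));
}
```

For the first sample the naive helper reports allowed.example while the WHATWG parser reports evil.example. For the second the naive helper sees an empty host, and for the third it returns the still-encoded %61llowed.example; the WHATWG parser resolves both to allowed.example. A check built on one parser and a request made with the other is where the trouble starts.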