Fuzzing Through the Side Door

Summary:
Fuzz testing, or “fuzzing,” is an approach to test automation that attempts to uncover weaknesses in a system using tool-generated data. In this article, Jonathan Kohl recounts how he used this technique on a published web services interface to test “through the side door,” that is, through testable in-between areas such as messaging APIs.

“Fuzzing” is a software-testing technique that uses tool-generated test data to try to cause failures in a system. It is a brute-force approach: a fuzzing tool generates random data of a certain type over and over, and that data is inserted somewhere within a software program. It is simple, yet incredibly powerful. Data of an unexpected type ending up where the program doesn’t expect it is a common source of errors. Fuzzing can help uncover security weaknesses in a system, and that’s where it gets most of its exposure.

I was working on a web-based system that processed customer orders. One of the problems in the test approach was that in production, thousands of orders could be processed in an hour, but in test, there might be a dozen orders processed per day. Entering each order manually through the system took a while, and having an automated tool drive the GUI for that order didn’t take much less time. You could also generate orders by running transactions against the back-end database to quickly create a lot of test data, but that wasn’t really testing the order system. I wanted to get more production-like volume in a short period of time.

One program interface caught my attention: most of the orders in the production system were processed through a web services interface. That is, third-party systems created orders and sent them in an XML format to a non-GUI interface in this system. I could run automated tests much more quickly if I didn’t need to use the GUI. Furthermore, because there was a published, stable interface already in use, I could exploit it for testing and take test automation through the “side door.” I got this term from consulting developer and tester Jennitta Andrea. The “front door” is the GUI, the “back door” is the database, and the “side door” is other testable interfaces in between, like messaging APIs.

There are a lot of fuzzing tools out there, but many of them seem to work by trying to cause a public interface such as a login page or URL to fail. Others analyze your code and use fuzz payloads to cause problems at a very low level. Still others are powerful, sophisticated tools to run throughout your software system. I wanted to do something different. Both mutation testing and high-volume test automation are well-known, powerful techniques for finding problems in systems. I wanted to run a high volume of tests in this online ordering system to see how it could handle a large number of transactions over hours or days. I also wanted to see how the system could handle problems in various parts of an individual order. I decided to use a fuzzing library to generate the brute-force data, insert that data into parts of an order, and run this test thousands of times. The fuzz data would mutate the order messages to simulate problem orders that were seen in production. The test program would generate messages quickly and run for a long period of time to simulate production traffic.

I couldn’t find any tools that did exactly what I needed, so I started a search for a fuzzing library. I was working in a Java shop, so I needed a Java tool. I found the OWASP JBroFuzz project and learned that I could use it as a library, even though it was written as a standalone tool. I created a test that relied on JBroFuzz to generate test data and inserted that data into an order message. A normal message looked something like this XML example (the element values are the test data):

    <book:addHotel>
        <orderid>123456789</orderid>
        <name>Three Guys Hotel</name>
        <address>123 Main Street</address>
        <city>Calgary</city>
        <state>Alberta</state>
        <zip>12343</zip>
        <country>Canada</country>
        <price>40</price>
    </book:addHotel>

A fuzzed message looked something like this (price is getting fuzzed):

    <book:addHotel>
        <orderid>123456789</orderid>
        <name>Three Guys Hotel</name>
        <address>123 Main Street</address>
        <city>Calgary</city>
        <state>Alberta</state>
        <zip>12343</zip>
        <country>Canada</country>
        <price>0xffffffff</price>
    </book:addHotel>
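
To make the substitution concrete, here is a minimal sketch of how a test program might swap one value in the sample order for a fuzz payload. It is written in Python with only the standard library and is not the Java/JBroFuzz code used on the project; the names BASELINE_ORDER and build_order are hypothetical, and the message is rendered from a template because the sample does not show a namespace declaration for the book prefix.

```python
from xml.sax.saxutils import escape

# Baseline order values copied from the sample message in the article.
BASELINE_ORDER = {
    "orderid": "123456789",
    "name": "Three Guys Hotel",
    "address": "123 Main Street",
    "city": "Calgary",
    "state": "Alberta",
    "zip": "12343",
    "country": "Canada",
    "price": "40",
}


def build_order(overrides=None, escape_values=True):
    """Render an addHotel message, replacing any fields named in overrides.

    escape_values=False sends the payload raw, which also exercises the
    XML parser on the receiving side (an assumption about what is wanted).
    """
    values = dict(BASELINE_ORDER)
    values.update(overrides or {})
    lines = ["<book:addHotel>"]
    for field, value in values.items():
        text = escape(value) if escape_values else value
        lines.append(f"    <{field}>{text}</{field}>")
    lines.append("</book:addHotel>")
    return "\n".join(lines)


# Fuzz only the price field, as in the fuzzed message shown in the article.
fuzzed = build_order({"price": "0xffffffff"})
print(fuzzed)
```

Keeping the baseline values in one place means any subset of fields can be overridden per message while the rest stay valid.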

Now, imagine doing this over and over, thousands of times, with the fuzzer doing the test data creation for me. The fuzzer creates data based on a payload type, the automation program inserts it into a message, and the message is sent on to the system under test. I could fuzz as many or as few of the message parameters for this order type as I wanted. I could also use automation to mutate each message over time so that, in the end, all values were fuzzed.

I used simple monitoring tools in this system: I watched for error messages in the log files using simple Unix commands, and I kept an eye on system resource usage on the application servers and the back-end database. The first time I tried it out, the server crashed within twenty minutes.
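
The generate, insert, and send loop described above can be sketched roughly as follows. This is an illustration in Python, not the author's Java code: the send callable stands in for whatever transport the real web services interface uses, fake_send is a hypothetical stub so the example runs on its own, and fuzzing one field at a time is only one possible reading of "mutate each message over time."

```python
import itertools

FIELDS = ["orderid", "name", "address", "city",
          "state", "zip", "country", "price"]
BASELINE = ["123456789", "Three Guys Hotel", "123 Main Street", "Calgary",
            "Alberta", "12343", "Canada", "40"]


def render(values):
    """Render a dict of field values as a single-line addHotel message."""
    body = "".join(f"<{f}>{v}</{f}>" for f, v in values.items())
    return f"<book:addHotel>{body}</book:addHotel>"


def fuzzed_messages(payloads, fields=FIELDS):
    """Yield (field, payload, message), fuzzing one field at a time."""
    for field, payload in itertools.product(fields, payloads):
        values = dict(zip(FIELDS, BASELINE))
        values[field] = payload
        yield field, payload, render(values)


def run(payloads, send, repeats=1):
    """Send every fuzzed message `repeats` times and collect failures."""
    failures = []
    for _ in range(repeats):
        for field, payload, message in fuzzed_messages(payloads):
            try:
                send(message)
            except Exception as exc:  # record the failure and keep going
                failures.append((field, payload, repr(exc)))
    return failures


def fake_send(message):
    """Hypothetical stand-in for the system under test: it only
    checks that the price parses as a base-10 integer."""
    price = message.split("<price>")[1].split("</price>")[0]
    int(price)


failures = run(["0xffffffff", "A" * 5000], fake_send)
for field, payload, error in failures:
    print(field, payload[:20], error[:40])
```

Raising repeats, or wrapping run in a loop with a time limit, turns the same code into a long-running, high-volume test; recording the field and payload for each failure makes a crash easier to reproduce afterward.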

I found quite a few errors using a fuzzer in this way. The system was mature and was well tested, but I got fast results using test automation to do the heavy lifting: generating lots of test data and running it over and over, sometimes for hours. I managed to get high-volume test automation with tool-generated test data I would never have thought to create myself. It would have taken weeks or months of full-time work to do what this tool could do in minutes or hours.

Since the fuzzing automation was essentially pummeling each public order entry field with different kinds of data, we found different kinds of errors. Some fuzzer payloads test for input overflow, and, on some fields, large values caused errors anywhere from the database on up to the code processing the message. Other fuzzer payloads supply data of the wrong type, and that could cause the database to fail while trying to insert the incorrect data into a field. In other cases, the code that processes messages might fail when trying to process a character from a character set it wasn’t expecting. Any number of libraries in between might fail similarly.
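
As a rough illustration of the three payload families mentioned above (overflow, wrong type, unexpected character set), a home-grown generator might look like the sketch below. The values are illustrative assumptions in Python, not the payload lists that JBroFuzz ships with, and the function names are hypothetical.

```python
def overflow_payloads(sizes=(256, 4096, 65536)):
    """Long strings that probe length limits at each layer."""
    return ["A" * n for n in sizes]


def wrong_type_payloads():
    """Values that are not what a numeric field such as price expects."""
    return ["0xffffffff", "-1", "1e309", "NaN", "forty", ""]


def charset_payloads():
    """Characters outside plain ASCII, plus a NUL control character."""
    return ["caf\u00e9", "\u6771\u4eac", "\U0001F600", "\x00"]


PAYLOADS = {
    "overflow": overflow_payloads(),
    "wrong_type": wrong_type_payloads(),
    "charset": charset_payloads(),
}

for category, values in PAYLOADS.items():
    longest = max(len(v) for v in values)
    print(f"{category}: {len(values)} payloads, longest is {longest} characters")
```

Tagging each payload with its category makes it easier to tell later which kind of input a given layer failed on.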

We discovered inconsistencies in the way code handled error conditions for different inputs in an order. The order ID field might restrict data of the wrong type or data that is too long at the messaging layer. The price field might not restrict those data types or lengths at all. In other cases, error handling might break down after processing a large number of orders. We also discovered memory leaks and other problems that were revealed by processing a large number of orders over a period of days. We just hadn’t seen that with manual testing or automated functional testing in the past. Those efforts didn’t generate enough data, or put the server under conditions close enough to what you would see in production.
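
One way to surface this kind of inconsistency is to record how each field responded to each payload category and then flag the categories where fields disagree. The sketch below is a Python illustration under that assumption; the outcome labels and the sample results are hypothetical placeholders, not data from the project.

```python
from collections import defaultdict


def find_inconsistencies(results):
    """Return payload categories whose outcome differs between fields.

    `results` maps (field, payload_category) to an outcome label.
    """
    by_category = defaultdict(dict)
    for (field, category), outcome in results.items():
        by_category[category][field] = outcome
    return {
        category: outcomes
        for category, outcomes in by_category.items()
        if len(set(outcomes.values())) > 1
    }


# Hypothetical outcomes, made up for illustration only.
results = {
    ("orderid", "overflow"): "rejected",
    ("price", "overflow"): "accepted",
    ("orderid", "wrong_type"): "rejected",
    ("price", "wrong_type"): "accepted",
    ("orderid", "charset"): "rejected",
    ("price", "charset"): "rejected",
}

report = find_inconsistencies(results)
for category, outcomes in report.items():
    print(category, outcomes)
```

A category that appears in the report is a prompt to ask why one field validates its input at the messaging layer while another lets the same kind of data through.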

This approach to test automation is a bit more technical than some testers might be used to, but it can be highly effective. Using the well-known approaches of mutation testing, high-volume test automation, and fuzzing against an order-entry system proved to be a powerful combination. If you have the technical resources to be able to try this sort of thing out on your system, I encourage you to do so. A word of warning, though: Do not try this on public systems or systems you don’t have permission to run a fuzzer against. Fuzzing in this manner can quickly bring a test system down. If you’ve been asked to test a messaging system, propose looking at this kind of approach, and happy fuzzing!

Jonathan would like to thank Yiannis Pavlosoglou of the JBroFuzz team for his help and support.

CMCrossroads is a TechWell community.
