Preventing Reverse Engineering

Java Bytecode Obfuscation

Your software may be under attack! Reverse engineers can't wait to get their hands on your binaries and learn their secrets. Okay, so that's a little melodramatic, but for many software companies like mine, there is a competitive advantage in our source code, so we don't ship source; we ship only the artifacts required to execute on the target platform. An extremely motivated individual, given enough time, energy, patience, and Mountain Dew, can still reverse engineer the software by disassembling those artifacts down to the machine-code level and figuring out how the software works, but shipping only executables is pretty safe.

What about software written in Java? Java source is compiled into .class files, which are often packaged into Java Archive files (JARs) and shipped, along with any Javadocs and other documentation. The situation is no different, really, from software that is compiled into native machine language instructions, right? Well, not exactly, as we will see.

In this article, I'll explore the vulnerabilities of Java bytecode to decompilation-style reverse-engineering attacks. Then we will look at a technique called obfuscation for modifying the bytecode instructions so that, if subject to such an attack, the resulting decompiled code is more difficult to read. We'll see how to run an obfuscator and explore some of the options available for obfuscating bytecode. Finally, we'll look at some of the things to keep in mind when using an obfuscator.

The source code for this article is available at www.sys-con.com/java/sourcec.cfm, and contains a complete working application that I wrote for chapter 10 of the book Java Enterprise Best Practices. Scripts to build and run the application, along with scripts to decompile the .class files, are also included.

Introduction

Lately I've been thinking a lot about Java bytecode, the instructions produced by the Java compiler and executed by the Java Virtual Machine. Let's suppose I have a class, Queue, whose add() method is shown in Listing 1.

When the Java source code for this class is compiled, a file is produced with a .class extension that contains some metadata about the class, along with the bytecode instructions that implement the class's methods. I can use a decompiler such as JODE (http://jode.sourceforge.net) to decompile the Queue class. The add() method is shown after decompilation in Listing 2. In this listing, Queue.java was compiled with debug information included and no optimization.

Notice something rather startling: nearly all of the original Java code can be reproduced from the contents of the class file! With the exception of the comments from Listing 1, the original Java code and the code produced from the decompiler are identical. But since I never (oh no, not me!) forget to change my <javac> Ant task debug flag in my build script (or omit the -g flag when compiling from the command line), I've got nothing to worry about, right? Let's see. Listing 3 shows the decompiled add() method from Listing 2 when debugging information is not included in the class file. In Listing 3, Queue.java was compiled with no debug information.

Again, with the exception of comments and the local variable names in the add() method, the original source code has survived. Although the local variable names add meaning when reading the code for the method, it still wouldn't be terribly difficult to reverse engineer this code.
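One way to see why so much survives: the names of a class, its fields, and its methods are stored in the class file itself, whatever the debug settings, which is why the Reflection API can report them at run time. Local variable names are different; they live in an optional debug attribute, which is why they vanish when debug information is left out. The following is a minimal sketch of that idea. The Queue class here is a hypothetical stand-in written for this illustration, not the class from the sample application.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class ClassFileNames {
    // Hypothetical stand-in for a queue class; not the article's Queue.
    public static class Queue {
        private final List<Object> backingStore = new ArrayList<>();
        private int maxQueueSize = 10;

        public synchronized boolean add(Object workItem) {
            if (backingStore.size() >= maxQueueSize) {
                return false; // queue is full
            }
            return backingStore.add(workItem);
        }
    }

    // Collects the declared field and method names that reflection reads
    // from the loaded class, skipping compiler-generated (synthetic) members.
    public static TreeSet<String> declaredNames(Class<?> type) {
        TreeSet<String> names = new TreeSet<>();
        for (Field field : type.getDeclaredFields()) {
            if (!field.isSynthetic()) {
                names.add("field " + field.getName());
            }
        }
        for (Method method : type.getDeclaredMethods()) {
            if (!method.isSynthetic()) {
                names.add("method " + method.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("class " + Queue.class.getName());
        declaredNames(Queue.class).forEach(System.out::println);
    }
}
```

Compiling this with or without debug information should not change what it prints, because none of these names depend on the debug attributes.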

What do I do to protect my software from reverse engineering? Of course, I use an obfuscator! There are several freely available bytecode obfuscators. The working application that accompanies this article uses two obfuscators: ProGuard (http://proguard.sourceforge.net/) and yGuard (www.yworks.com/en/products_yguard_about.htm), both freely available (ironic? see sidebar - Reconciling Open Source with Obfuscation). Of the many available, I picked these two because they integrate with Ant, which is my build tool of choice. The examples in this article show the yGuard obfuscator in action.

Concepts of Obfuscation

The main idea behind bytecode obfuscation is to take a Java class file and process it into a new class file. The new class file is behaviorally identical to the original, but its bytecode instructions and class file metadata are scrambled so that the decompiled result is difficult to read and understand. Ideally, all obfuscating transformations on the original bytecode should be one way, or lossy. That is, the bytecode still executes as intended, but running it through a decompiler retrieves very little of the original source. We've already seen how easily source code can be reproduced from the bytecode the Java compiler emits (especially if we compile with debug information included in the class file). An obfuscator can employ several techniques to foil the would-be reverse engineer. In the following section we'll look at the simplest of those techniques: layout obfuscation.

Layout Obfuscation

Layout obfuscation refers to altering the layout of the class file. This involves removing debug information and changing the names of elements such as classes, member variables, and local variables.

Remove Debug Information
Of course, debugging information can be omitted by the way you compile the code, but an obfuscator offers this initial level of protection should you forget. When code that contains debugging information is decompiled, local variable names are preserved, and any proprietary algorithms contained in the code become that much easier to reverse engineer.

Renaming
The obfuscator employs renaming techniques to further confuse the would-be reverse engineer. Renaming is a powerful obfuscation technique. Why? In properly written source code, there is a good deal of semantic information contained in the names of classes, methods, and variables. Removing the inherent meaning in the class, member, and local variable names and replacing them with names that are not related to their purpose at execution time results in far less readable code, as shown in Listing 4. (Listings 4-6 can be downloaded from www.sys-con.com/java/sourcec.cfm.)

As you can see, this code is pretty hard to read. The method name along with class member and local variable names have been replaced with short, meaningless names. This is the kind of thing an obfuscator does: it makes your Java bytecode less susceptible to reverse engineering.

In fact, I also configured the obfuscator to rename the Queue class, all member variables, and certain member methods (as we saw in Listing 4, add() was renamed to A()).

In Listing 5 we can see that Queue was renamed to C, and its base class (Basic) was renamed to B. All of the member variables were also renamed. The structure of the Queue class is essentially the same, but all of the meaning I coded into the names of variables, methods, and the class name is gone. And, best of all, this is a one-way transformation: the original names are not stored anywhere in the obfuscated class file, so decompilation cannot recover them, as we see from the previous examples.
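To make the effect of renaming concrete without the downloadable listings at hand, here is a hand-written sketch. It is not yGuard output: BoundedCounter and its renamed twin A are hypothetical classes invented for this illustration, and the renaming was done by hand to mimic the kind of short, meaningless names an obfuscator assigns. The two classes contain the same logic, and the helper method checks that they behave identically.

```java
public class RenamingSketch {
    // Readable original: the names explain the intent.
    public static class BoundedCounter {
        private final int upperLimit;
        private int currentCount;

        public BoundedCounter(int upperLimit) {
            this.upperLimit = upperLimit;
        }

        public boolean increment() {
            if (currentCount >= upperLimit) {
                return false; // limit reached
            }
            currentCount++;
            return true;
        }
    }

    // Hand-renamed equivalent: identical logic, but every name is meaningless.
    // A field and a method may legally share the name "a" in Java.
    public static class A {
        private final int a;
        private int b;

        public A(int a) {
            this.a = a;
        }

        public boolean a() {
            if (b >= a) {
                return false;
            }
            b++;
            return true;
        }
    }

    // Drives both classes with the same calls and compares every result.
    public static boolean sameBehavior(int limit, int calls) {
        BoundedCounter original = new BoundedCounter(limit);
        A renamed = new A(limit);
        for (int i = 0; i < calls; i++) {
            if (original.increment() != renamed.a()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("same behavior: " + sameBehavior(3, 5));
    }
}
```

Reading class A cold, there is nothing to say whether it counts connections, retries, or queue entries, which is exactly the information the names in BoundedCounter gave away.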

Things to Look for in an Obfuscator

Not all obfuscators are the same, but there are some commonalities between obfuscators, and for good reason. Any obfuscator should be able to remove debug information and rename identifiers. However, a well-written obfuscator should also be configurable so you can pick and choose which identifiers are preserved and which are obfuscated. A good obfuscator will also provide some sort of log file that contains the mappings from original names to obfuscated names (some obfuscators even have separate tools to make looking this information up easier) so that you can, for example, interpret stack traces.

Here is a laundry list of the minimum functionality an obfuscator should provide:

  • Remove debug information
  • Rename identifiers to be meaningless
  • Configurable renaming so that you can choose what gets preserved and what gets obfuscated
  • Generate a mapping file so you can map original names to obfuscated names

Control Obfuscation

Another powerful obfuscation technique is control obfuscation, which refers primarily to altering the control flow of the statements that execute inside a method. This is a very sophisticated technique, and one that I could only find implemented in commercial obfuscators. An obfuscator that implements this technique produces bytecode for a class whose method instructions are altered such that the method still executes as intended. However, should the resulting class be decompiled, the code is even more difficult to decipher than code that has only been renamed.
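As a rough illustration of what altered control flow can look like, the sketch below shows one possible transformation, often called control-flow flattening, applied by hand at the source level. This is an assumption made for illustration: real obfuscators work on bytecode, and nothing here describes what any particular product does. Both methods are hypothetical and compute the same sum.

```java
public class ControlFlowSketch {
    // Straightforward version: sum of the integers 1..n.
    public static int sumTo(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Flattened version: the loop is split into numbered states, and a
    // dispatcher switch decides which state runs next. The result is the
    // same, but the loop structure is no longer visible at a glance.
    public static int sumToFlattened(int n) {
        int total = 0;
        int i = 0;
        int state = 2;
        while (true) {
            switch (state) {
                case 2: // initialize the loop counter
                    i = 1;
                    state = 0;
                    break;
                case 0: // loop test
                    state = (i <= n) ? 3 : 1;
                    break;
                case 3: // loop body and increment
                    total += i;
                    i++;
                    state = 0;
                    break;
                case 1: // exit
                    return total;
                default:
                    throw new IllegalStateException("unexpected state " + state);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sumTo(10));          // 55
        System.out.println(sumToFlattened(10)); // 55
    }
}
```

Even in this tiny case the flattened method takes noticeably longer to read, and it also executes more branching per iteration than the plain loop.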

Control flow obfuscation should be used with care, however, because when the flow of a method is altered, the potential to introduce overhead becomes very real. While a top-of-the-line obfuscator will certainly take this into consideration, it would be wise on your part to benchmark your unobfuscated code against your obfuscated code, especially if your obfuscator aggressively alters control flow. Some commercial obfuscators make the level of control flow obfuscation configurable, from none to aggressive.
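A crude way to run such a comparison is a small timing harness executed twice, once with the unobfuscated JAR on the classpath and once with the obfuscated one. The sketch below is only a starting point under stated assumptions: the workload is a hypothetical placeholder for a call into your own code, and hand-rolled timing with System.nanoTime() is sensitive to JIT compilation and system noise, so a dedicated benchmarking framework will give more trustworthy numbers.

```java
import java.util.function.IntUnaryOperator;

public class CrudeBenchmark {
    // Runs the workload repeatedly and returns the average time per call
    // in nanoseconds. A warm-up pass of the same length runs first so the
    // JIT compiler has a chance to optimize before timing starts.
    public static double averageNanos(IntUnaryOperator workload, int input, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        int sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += workload.applyAsInt(input); // warm-up, not timed
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += workload.applyAsInt(input); // timed calls
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated result so the calls are not treated as dead code.
        if (sink == 42) {
            System.out.print("");
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        // Hypothetical workload; replace with a call into the code under test.
        IntUnaryOperator workload = n -> {
            int total = 0;
            for (int k = 1; k <= n; k++) {
                total += k;
            }
            return total;
        };
        System.out.printf("average: %.1f ns per call%n", averageNanos(workload, 1_000, 100_000));
    }
}
```

Run it several times against each JAR and compare the ranges rather than single numbers; a difference that disappears between runs is noise.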

An in-depth discussion of control flow obfuscation is found in "A Taxonomy of Obfuscating Transformations" by C. Collberg, et al.

Running the Obfuscator

All of the examples in this section use the yGuard obfuscator, which also produced the examples we've seen so far. Every obfuscator I've worked with is configured slightly differently, but the configuration generally falls into a few categories, which we'll look at below. The configuration shown is that of yGuard, so you can get started with the example application, which can be downloaded from www.sys-con.com/java/sourcec.cfm. All configuration is in XML, since we'll be using Ant to build and run the examples.

First, we have to tell the obfuscator the location of the classes to be obfuscated, and where the resulting obfuscated classes should be written. yGuard accepts JAR input and writes JAR output using the <inoutpair> tag:

<inoutpair
in="./jmxbp.jar"
out="./obfuscated/jmxbpObfuscated.jar"/>

where jmxbp.jar contains the classes to be obfuscated, and the obfuscated classes will be written to a JAR file called jmxbpObfuscated.jar located in the obfuscated directory. Other obfuscators may accept a JAR file, a directory of class files (in which case all classes located there and in its subdirectories are obfuscated), or a single class file as input.

Next, we tell the obfuscator what names we want to obfuscate. We can choose from any of our classes, fields, and methods by visibility, package pattern, name pattern, and so forth. yGuard allows configurable renaming of class, member variable (field), and method names as part of its <expose> tag. Anything you want exposed (i.e., not obfuscated) goes in this tag. There are many permutations of how renaming can occur, so it's impossible to show them all. But say, for example, that we want to expose only the public methods of our public classes. The yGuard configuration for this looks like:

<class classes="public" methods="public"/>

We can also choose to expose all public and protected methods, say, if our software is a library with classes intended to be subclassed. The yGuard configuration for this looks like:

<class classes="public" methods="public"/>
<class classes="public" methods="protected"/>

or we can selectively expose only certain methods of a class. We must take care to tell the obfuscator to expose the class name, so it may be referenced by name:

<class name="jmxbp.common.Basic"/>
<method class="jmxbp.common.Basic" name="void reset()"/>
<method class="jmxbp.common.Basic" name="boolean isTraceOn()"/>

This configuration snippet will expose the Basic class and its reset() and isTraceOn() methods. Every other class and method will be obfuscated.

Finally, the obfuscator produces a log file (also referred to as a "map file") so that we can see the mapping between the original names of our classes, fields, and methods and their obfuscated names. This file can come in handy if, for example, you need to read a stack trace. The obfuscator may also provide a tool to automatically read in the map file, along with the obfuscated stack trace, and produce a meaningful stack trace. yGuard produces a map file, parts of which are shown in Listing 6.
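To show how a map file helps with stack traces, here is a minimal sketch of the lookup step. Everything in it is an assumption made for illustration: the mappings are held in plain Java maps rather than read from yGuard's actual log format, and com.example.C, com.example.Queue, and the method names are hypothetical. A real tool would also need the method signature or line numbers to tell overloaded methods apart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StackTraceTranslator {
    // Splits a frame such as "\tat com.example.C.A(Unknown Source)" into
    // prefix, class name, method name, and the trailing source location.
    private static final Pattern FRAME =
            Pattern.compile("^(\\s*at\\s+)([\\w.$]+)\\.([\\w$<>]+)(\\(.*\\))$");

    // classNames maps obfuscated class names to original class names.
    // methodNames maps "obfuscatedClass.obfuscatedMethod" to the original method name.
    public static String translate(String frame,
                                   Map<String, String> classNames,
                                   Map<String, String> methodNames) {
        Matcher matcher = FRAME.matcher(frame);
        if (!matcher.matches()) {
            return frame; // not a stack frame line; leave it alone
        }
        String obfuscatedClass = matcher.group(2);
        String obfuscatedMethod = matcher.group(3);
        String originalClass = classNames.getOrDefault(obfuscatedClass, obfuscatedClass);
        String originalMethod = methodNames.getOrDefault(
                obfuscatedClass + "." + obfuscatedMethod, obfuscatedMethod);
        return matcher.group(1) + originalClass + "." + originalMethod + matcher.group(4);
    }

    public static void main(String[] args) {
        // Hypothetical mappings, as they might be loaded from a map file.
        Map<String, String> classNames = new HashMap<>();
        classNames.put("com.example.C", "com.example.Queue");
        Map<String, String> methodNames = new HashMap<>();
        methodNames.put("com.example.C.A", "add");

        System.out.println(translate("\tat com.example.C.A(Unknown Source)", classNames, methodNames));
    }
}
```

Frames for classes and methods that were exposed rather than renamed have no entry in the maps, so they pass through unchanged.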

Reconciling Open Source with Obfuscation

The open source software movement is the beginning of the commoditization of software. Since the early 20th century, as industrialized economies matured, the services sector boomed as fewer companies could compete against the dominant manufacturers, and more workers moved from factory jobs to the services industry. Because software isn't a product in the sense that, say, a length of PVC pipe is, the analogy between manufacturing and software development isn't airtight. At some point in the future manufacturing software systems may no longer be necessary. For example, there are certain standard sizes of pipe, and they pretty much all look alike. No one would consider custom manufacturing all of the pipes for a building onsite. Instead they are created by the manufacturer, ordered by the construction contractor, and shipped to the job site. But if you could copy a pipe the way you could copy a program, and the only "warehouse" you need for software is disk space, the manufacturer would be obsolete (or at the very least only a few would be extant). As more and different types of software move into the realm of open source, companies who may have traditionally manufactured and sold their software will reshape their business model around services such as support and customization.

However, as an industry, we are not there yet. Many companies manufacture software and maintain a competitive advantage by the way their software is written. These companies can use an obfuscator to help protect their software assets in a similar way that a wall protects a castle. No castle wall is impermeable, and no obfuscated code is completely safe from reverse-engineering attacks, but it does provide some level of defense. To continue the analogy, the better the obfuscator, the taller and stronger the walls.

What about an open source obfuscator like ProGuard? There seems to be a fundamental contradiction between the terms "open source" and "obfuscator." After all, the open source movement is all about sharing software for the benefit of the community. And the job of an obfuscator is to build a wall around software to protect it from reverse engineering. Or is it? In actuality, an obfuscator's job is to be the first line of defense in enforcing license agreements between the software company and those who would seek to gain an advantage via a reverse-engineering attack (i.e., "cheaters"). You might argue that reverse engineering a commercial product might be useful in solving problems, and, oh, by the way, avoid support costs to the vendor. However, I would argue that if you're reading this magazine, you're probably not the average developer, and wouldn't mind at all taking a little trip through the source code. Furthermore, I believe most reverse-engineering attacks are not aimed at avoiding support costs, or vendors who give away their products (along with source code) and who derive their revenue from the sale of support services and documentation would not be able to survive. But they do.

Things to Watch Out For

Make sure to properly expose classes, methods, or fields that are referenced by name (from your software or from the outside) using the Reflection API. If you don't, the names will not be found at runtime since they have been obfuscated. The sample application for this article makes heavy use of the Reflection API (see the DynamicMBeanFacade class) for building out the management interface of each class, so you'll see in the obfuscator configuration that I'm careful to preserve the appropriate method names accordingly.
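The failure mode is easy to reproduce in miniature. In the sketch below, Worker and RenamedWorker are hypothetical classes invented for this illustration (they are not part of the sample application): RenamedWorker stands in for what Worker might look like after an obfuscator has renamed its method, while the calling code still holds the original name as a string.

```java
public class ReflectionLookup {
    // Hypothetical class as written by the developer.
    public static class Worker {
        public int getQueueDepth() {
            return 3;
        }
    }

    // The same class as it might look after renaming; written by hand here.
    public static class RenamedWorker {
        public int a() {
            return 3;
        }
    }

    // Returns true if a public no-argument method with this name exists.
    public static boolean canFind(Class<?> type, String methodName) {
        try {
            type.getMethod(methodName);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The name lives in a string, so renaming the method does not update it.
        String nameInCaller = "getQueueDepth";
        System.out.println(canFind(Worker.class, nameInCaller));        // true
        System.out.println(canFind(RenamedWorker.class, nameInCaller)); // false
    }
}
```

The compiler cannot catch this mismatch, because the lookup only happens at run time; exposing the method in the obfuscator configuration keeps the string and the method name in agreement.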

Make sure to preserve native method names, so they can be linked to the correct native library.
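The reason is that, by default, the JVM finds a native method's implementation by a symbol name derived from the fully qualified class name and the method name. The sketch below builds that symbol for the common case. It is a simplification: it handles only the '.' and '_' characters and ignores the extra signature suffix used for overloaded native methods, and com.example.Sensor and readValue are hypothetical names.

```java
public class JniNames {
    // Simplified JNI name mangling: '_' becomes "_1", and the package
    // separator ('.' or '/') becomes '_'. Other escapes are not handled.
    public static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '_') {
                out.append("_1");
            } else if (c == '.' || c == '/') {
                out.append('_');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Builds the symbol the JVM looks for in the native library.
    public static String nativeSymbol(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        // Symbol expected for the original names.
        System.out.println(nativeSymbol("com.example.Sensor", "readValue"));
        // Symbol the JVM would look for if class and method were renamed.
        System.out.println(nativeSymbol("a.b", "c"));
    }
}
```

If the obfuscator renames the class or the method, the JVM looks for the second symbol while the native library still exports the first, and the call fails at run time with an UnsatisfiedLinkError.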

Be careful when choosing which classes and methods to obfuscate. For example, if you're writing a library, you'll most likely want to keep public and protected methods and fields. Otherwise, your classes cannot be referenced by name. The sample application included with this article is standalone, so all of the classes with the exception of Controller are declared with public visibility. I chose to obfuscate all classes (except Controller and its main() method) since they are not to be called from the outside. When choosing which classes to rename, you'll also be forced to reexamine your design choices. Questions like "Why did I make that class public? It should be package private," or "That method is never invoked from outside its own class, I should make it private," will come up, giving you the opportunity to improve your software.

Conclusion

While no software is safe from reverse engineering given enough time, patience, and persistence on the part of the reverse engineer, Java bytecode is especially susceptible. Because bytecode is architecture-neutral, a rich set of metadata is contained in the class file so that decompiling bytecode can very nearly yield the original Java source. A bytecode obfuscator, however, can rename packages, classes, member variables, and method names, making them meaningless. A sophisticated obfuscator can even alter control flow, making decompiled code even harder to read.

References

  • Collberg, C.; Thomborson, C.; and Low, D. "A Taxonomy of Obfuscating Transformations." Department of Computer Science, University of Auckland.
  • Lindholm, T.; and Yellin, F. (2002). The Java Virtual Machine Specification, 2nd Edition. Addison-Wesley.

About the Author

J. Steven Perry has 13 years of experience as a professional software developer, and for the past five of those Steve has been focused on Java development in such areas as application management, XML data binding, and enterprise frameworks. He is an author, a participant on two Java Community Process Expert Groups, and works as an architect for Fidelity Information Services in Little Rock, AR.
