Disassembling, Decompiling and Modifying executables
Motivation for writing
As professional developers, we create products. We implement ideas, usually driven by a business need to win over a target group in the global market. We try to deliver elegant, fast and reliable solutions and, quite honestly, we hate it when someone uses our work without at least saying "thanks, you've really made a great thing". That is why we need to protect our work, and to do that, we should be aware of the common vectors crackers use to hack our software.
In this article, I'm going to show you how to disassemble and decompile a native executable written in C++, along with other interesting things related to managed and unmanaged environments.
First, we’ll need a little theory, so we can really understand what we are doing and why.
Difference between static and dynamic libraries
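Historically, static libraries were the first type of library to appear. On Windows, static libraries have the .lib extension and dynamic libraries the .dll extension (the import libraries that tell the linker about a DLL's exports also use .lib). The main difference between them is that a static library is embedded directly in the executable at link time, which increases the executable's size. A dynamic library, on the other hand, is a separate file that is loaded into the address space of every process that uses it. There is only one .dll file on disk, but each process gets its own mapping of it: read-only code pages can be shared between processes, while the library's data is private to each process, so processes do not interfere with each other's library state. This also makes updates easier to manage, since the .dll can be replaced without relinking the programs that use it, at the cost of a slight performance overhead when loading and when calling through the import table, which is usually not considered a big issue.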
In general, dynamic libraries are the preferred approach for building applications. Static libraries are still fully supported, though: in Visual Studio you can create one by choosing the Static Library project template or by setting the project's Configuration Type to "Static library (.lib)", and from the command line you can build one with the lib.exe tool.
The CPU registers
CPU registers are the fastest memory available, located in the CPU itself. Virtually every low-level operation uses them; they are the processor's super-fast data storage. The 32-bit x86 architecture has 8 general-purpose 32-bit registers (EAX, EBX, ECX, EDX, ESI, EDI, EBP and ESP). Two of them, the base pointer (EBP) and the stack pointer (ESP), are used to navigate the stack, while the position of the next instruction to execute is kept in a separate register, the instruction pointer (EIP). Registers are even faster than static RAM (SRAM, which the CPU caches are built from) and, of course, than dynamic RAM (DRAM, the main memory).
Quick overview of the Assembly language
For this article we need to know a few basic things about assembly language so we can actually understand what we are doing. Assembly language is unstructured and is based on very primitive instructions, which fall into the following general groups (I’ll describe only the basic operations):
Data movement and arithmetic instructions
mov – copies data between registers, between a register and a memory cell, or loads a constant into either (x86 does not allow a direct memory-to-memory mov)
push/pop – push a value onto, or pop it off, the stack in memory, adjusting the stack pointer (ESP)
add/sub/inc – arithmetic operations; they can operate on constants, registers or memory cells
Control flow instructions
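jmp – unconditional jump to a label or memory address
jb – jump when below (unsigned less than)
je – jump when equal
jne – jump when not equal
jz – jump when the last result was zero (the same instruction as je)
jg – jump when greater than (signed)
jge – jump when greater than or equal to (signed)
jl – jump when less than (signed)
jle – jump when less than or equal to (signed)
cmp – compares the two operands by subtracting the second from the first and setting the CPU flags, which the conditional jumps then test; the result itself is discarded
call/ret – implement the routine call and return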
The control flow instructions are the ones we are most interested in here. For a complete tutorial on x86 assembly language, check this article.
Disassembling and modifying a C++ executable
For our example I’ve created a simple C++ application with basic I/O.
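```cpp
#include <iostream>
#include <string>
using namespace std;
int main() { // the article's build uses MSVC's _tmain(int argc, _TCHAR* argv[]) entry point
    string numbers; int hold;
    while (true) {
        cout << "Please enter the code: \n";
        getline(cin, numbers);                  // read the whole line
        if (numbers != "82634") cout << "\nTry again.\n";
        else break;                             // correct code: leave the loop
    }
    cout << "Code accepted";
    cin >> hold;                                // wait for input so the console stays open
}
```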
We’ll need to disassemble, debug and, optionally, decompile our example. The tools used below are the OllyDbg debugger and IDA, a disassembler that can also act as a decompiler.
When we start the compiled example, we see a simple console application that asks for a predefined code. If the wrong code is entered, it prints "Try again." and asks again.
Let’s pretend that we have neither the source code nor the code itself. So what can we do? Obviously, there is a loop here, with a check inside that determines whether the program should break out of the loop.
We also get a few strings:
“Please enter the code:”
“Try again.”
Debug the executable
Start the OllyDbg debugger (with administrator privileges) and open the exe.
In the upper-left window we see the disassembled machine code, in other words the instructions written in assembly language. Below it is the memory dump, which shows the raw bytes as hexadecimal values, and on the right is the window with the CPU registers.
Locate the loop conditions
Now that our exe is loaded and running and the debugger is attached, we have to find the exact place in the assembly code where the check is made. To do that, we can use the strings that the UI shows us. Right-click the assembly code view > Search For > All Referenced Strings. Find the “Try again” string and double-click it. The assembly view jumps to the exact instruction that prints that string to the console. A few rows below, we can also see the instructions related to “Code accepted”, so it is clear where the loop resides.
Modify the assembly instructions
The next step is to modify some assembly instructions. There are a lot of instructions, but we are most interested in the jump-related ones, which change the instruction pointer (EIP) and thus control which instruction runs next. If we scroll up a little, we can see the instruction that references the “Please enter the code:” string. To escape the loop, we need to change the target address of one of the jump instructions that we pass through.
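Let’s take the jb at “00D613A4”, double-click it and change its target address to “00D613C7”, the instruction just before the one that references the “Code accepted” ASCII text, which sets up the output stream for printing it. These addresses will most likely differ on your machine, because Windows can load the executable at a randomized base address (ASLR).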
To save the change, right-click in the assembly window while the modified row is selected and choose “Copy to executable” -> “Selection”. In the window that opens, right-click again and choose “Save file”.
An alternative to OllyDbg: what is IDA?
IDA is a debugger and a disassembler like OllyDbg, but it presents the assembly code in a more user-friendly way, and with the Hex-Rays decompiler plugin it can also act as a decompiler. Its assembly view, for example, is more structured: the code is split into blocks shown as graph nodes, connected by the jumps between them, which makes navigation much easier.
Decompiling a C++ executable using IDA
Which brings us to the question: “Is it possible to decompile a native image in a way that generates understandable source code?” The short answer is no.
What IDA generates is pseudo-C code. Here is its output for our small example program:
```c
std__operator___std__char_traits_char_(std__cout, "Please enter the code: \n");
v1 = std__basic_ios_char_std__char_traits_char____widen((_DWORD)
std__cin + (_DWORD)*(&std__cin + 1), 10);
v0 = (int *)&v13;
if ( v15 >= 0x10 )
v0 = v13;
v2 = 5;
if ( v14 < 5 )
v2 = v14;
if ( !v2 )
v3 = (int)"82634";
v5 = (unsigned int)v2 < 4;
v4 = v2 - 4;
if ( v5 )
if ( v4 == -4 )
while ( *v0 == *(_DWORD *)v3 )
v3 += 4;
v6 = (unsigned int)v4 < 4;
v4 -= 4;
if ( v6 )
v7 = *(_BYTE *)v0 < *(_BYTE *)v3;
if ( *(_BYTE *)v0 != *(_BYTE *)v3
|| v4 != -3
&& ((v8 = *((_BYTE *)v0 + 1), v7 = v8 < *(_BYTE *)(v3 + 1), v8 != *(_BYTE *)(v3 + 1))
|| v4 != -2
&& ((v9 = *((_BYTE *)v0 + 2), v7 = v9 < *(_BYTE *)(v3 + 2), v9 != *(_BYTE *)(v3 + 2))
|| v4 != -1 && (v10 = *((_BYTE *)v0 + 3), v7 = v10 < *(_BYTE *)(v3 + 3),
v10 != *(_BYTE *)(v3 + 3)))) )
v11 = -v7 | 1;
v11 = 0;
if ( v11 )
if ( v14 >= 5 && v14 == 5 )
std__operator___std__char_traits_char_(std__cout, "\nTry again.\n");
std__operator___std__char_traits_char_(std__cout, "Code accepted");
result = std__basic_istream_char_std__char_traits_char____operator__(std__cin, &v16);
if ( v15 >= 0x10 )
result = operator delete(v13);
```
So, can we decompile a native image into understandable source code? It depends on your idea of "understandable". You have to devote a lot of time, and you need serious knowledge of the APIs your operating system uses, along with an understanding of C and assembly syntax.
Decompiling applications written in managed environments
.NET applications are decompiled in a similar way, with debuggers and .NET-specific decompilers such as Reflector (which has been a paid product for some time now).
But a .NET exe or dll on your desktop contains intermediate language (MSIL), not native machine code (unless you precompile it with NGen). Decompiling C++ applications is hard because the compiler turns the source into code for a specific processor architecture: conceptually, it first produces assembly language, which the assembler then turns into the actual native image. Along the way, variable names, types and most of the program's structure are lost, and, as we saw, turning assembly code back into readable source is hard.
MSIL, on the other hand, is very close to the actual source code of your application (written in C#, for example), and the assembly also carries metadata such as type and method names. You can use programs like Reflector to decompile it, along with some plugins to actually modify it.
So it is actually not so hard to crack an application
Indeed, it is not, although with a real application the process is more time-consuming. Can you name a single popular stand-alone application that has not been cracked? That is why you need to think of better ways to protect your software. Understand one simple thing:
Every application can be cracked if you have access to its native image, just as every computer password can be broken if you have physical access to the machine.
Of course, there are techniques that allow us to slow an attacker down, which may or may not be enough. But "slowing down" doesn't mean "preventing", and that's a topic for another article.
That's all from me on the topic of decompilation. I hope you learned something new today and that this knowledge will help you better protect your software. Know your enemy before going into battle, because it's a battle for your own time.