Chapter 6 Notes
to accompany Sikorski and Honig, Practical Malware Analysis, No Starch Press
Recognizing C Code Constructs in Assembly
A lot of malware is written in C. Malicious scripts also turn up in JavaScript and PHP, but malicious executables tend to be written in C.
We want to identify major control structures, i.e. loops and conditionals, as well as code for common data structures such as arrays and linked lists.
Compiler versions and settings matter, so malware analysis shops will keep a repertoire of compilers and libraries on hand.
Don't get stuck on minutiae!
Globals and Locals
- globals (C extern) are declared outside of any function
- globals are stored in the program's data section, not on the stack or the heap, and are referenced by fixed memory addresses, e.g. dword_whatever
- which makes cross-references from IDA interesting
- locals (C automatic) are stored on the stack and are referenced by stack addresses, e.g. [ebp-4]
Arithmetic Operations
- add, sub, imul, idiv, etc. implement C arithmetic, and also show up in subscript (array index) calculations
Conditionals
- a cmp (or test) instruction, followed by a conditional jump such as jnz
- nested conditionals get more complicated, but IDA Pro makes nice pictures
Loops
- for loops consist of initialize, test for exit, loop body, increment or decrement counter
- loop counters are usually in a register, especially if the loop is nested
- shows up as a cycle in IDA Pro disassembly graph
- while loops consist of initialize, test for exit condition, loop body, update and repeat
Function Calls
- Windows supports multiple calling conventions
- Order of parameters on stack, and who cleans up, can vary
- More than one way to return from a function
Three Calling Conventions
cdecl
- parameters are pushed right to left (i.e. reverse order)
- so the first argument ends up on top of the stack (arguments would pop off left to right)
- return value stored in EAX
- caller cleans up
stdcall
- callee cleans up
- used in Windows API
fastcall
- first few parameters (typically two) passed in registers, commonly ECX and EDX
- others passed on stack
- other calling conventions can be (and have been) imagined
Other Control Structures
- switch compiled as a chain of compares isn't much different from nested if, and that form is easily distinguished from a jump table
- arrays and loops tend to go together, and so do linked lists and loops
- structs, with and without pointers
Consider the example C program