Before the Tutorial
- While you're at home, with your own Internet connection, you can install any or all of these packages, and perhaps get more out of the tutorial. However, people who don't do so will be at no disadvantage.
- Download and install Virtual Box or VMWare Player.
- If you have access to such software, install virtual machines that run Windows XP or Windows 7. Be advised that some XP malware doesn't work on Windows 7. (And even less works on Windows 8 or 10.)
- Install a VM running a Linux of your choice.
- It is also convenient to have a UNIX-workalike on your Windows VMs. I prefer cygwin.
- Download and install a disassembler such as IDA Pro. The free version is fine for our purposes.
- Download and install a debugger. Olly is still widely used, but other debuggers are available, such as Immunity (available at the Olly site) and x64dbg.
- You might enjoy watching Ralph Langner's March 2011 TED talk on Stuxnet.
- Want a good book on the subject of malware analysis? Consider Practical Malware Analysis, from No Starch Press. Paper and electronic formats, of course. Includes exercises on real malware.
Overview
- Introductions all around!
- Cyber attacks are in the news all the time! Malware is a factor in many if not most cyber attacks. (User blunders being the other factor.)
- See, for example, the latest issue of Cyberwire
- Or the May 15, 2015 issue of Newsweek
- For great fun, check out The Norse Attack Map
- Cyber includes many different subjects, including malware analysis. But many cyber attacks rely on malware to work.
- Cyber in general, and malware analysis specifically, is an active area of research.
- See for example the Journal of Computer Virology and Hacking Techniques
- and the Usenix Security Symposium
- and Defcon
- and the occasional Dagstuhl seminar, such as this workshop on Analysis of Executables
- and there are other meetings for industry and government groups.
- Specific topics (not an exhaustive list)
- Malware analysis is aided by advances in machine learning; see, for example, Using Machine Learning to Detect Malware Similarity
- Spotting malware by string matching is no longer effective. Research is under way to spot malware by matching patterns (grammars?) rather than specific strings.
- There are techniques to hinder or defeat analysis, and research on overcoming these is in progress.
- Look at Symantec and F-Secure and McAfee and Microsoft lab sites. There are many others.
- (Un?)Fortunately, there is no shortage of data to work with:
- A number of malware collections are available for research purposes. Some noteworthy examples:
- VX Heaven is quite dated, but it's still pretty big, and easily accessed. Many malware specimens categorized by type, and lots of related material.
- Zeus Tracker see the FAQ for a link to a zip file with many specimens.
- The CERT malware catalog is big, multiple TBs, and growing. Submitting a specimen to CERT for analysis isn't hard, but that has advantages and disadvantages.
- Lists of malware corpora for research can be found, such as Malware Sample Sources for Researchers
- Anti-virus vendors have large collections of malware. Google's archive of Android malware is probably the biggest malware repository of them all. These collections are not easily accessed from the outside.
- The variety of malware may surprise you!
- Executable files, whether binaries (.exe or .dll files) or scripts (.bat or .scr). These files tend to be targeted towards the Windows platform. This will be the focus in this tutorial.
- Macs are not immune! But Mac malware is still a small subset of the whole. An overview.
- Web-based malware is now a big deal.
- Exploit kits can attack a variety of platforms. Exploit kits such as Blackhole, among many others, serve to automate the distribution of malware. A blog post about the creator of Blackhole.
- We can talk about exploit kits at greater length if there is audience interest.
- Mobile phones are a huge target. Android especially, but also iPhone.
- PDF files can contain executable content - which can escape the viewer sandbox and cause damage.
- There are even malicious LaTeX files! A word to the wise: Don’t Take LATEX Files from Strangers (pdf)
- We'll look at static vs. dynamic analysis
- Feel free to follow along! This tutorial is intended to be interactive, within our severe time constraints.
- Practical Malware Analysis is focused on Windows XP, but may still be the best book available. From No Starch Press. Paper and electronic formats, of course. Includes exercises on real malware. Notice the alien peeking.
What does Malware Analysis have to do with Document Engineering?
Those concerned with Malware Analysis tend to ask a lot of the same questions that our Document Engineering community have been working with for years, such as:
- Malware can be viewed as a particular type of document. Hence we can consider questions related to creation, whether manual or automatic. Dissemination of malware is an interesting social and technical problem. Malware is usually designed to be stealthy, and not easily read and understood. To be more specific:
- Malware can be polymorphic, that is, able to change over time. Like active or dynamic documents?
- Systems for automating the malware authorship process are available, and (apparently) in wide use. Akin to authoring systems?
- Malware analysis tends to produce documents related to the specimen, such as disassembler output, debugging logs, execution traces, network logs, and so forth. Systems for dealing with large sets of related documents are our cup of tea, are they not?
- When are objects similar? Are there families of objects? How can we characterize them? How can we classify them? We will soon demonstrate visualization of malware and malware families.
- Who created this object, and how? Attribution is an interesting and hard question.
- Specific document processing tools as malware attack vectors. What can or should be done?
- Malware analysts (like all analysts) make their living by writing reports. Can the data in those reports be mined?
- This tutorial is a condensed version of a semester-length course at UMBC. So we will skim over some of the details...
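The similarity questions above (when are objects similar, and do they form families?) can be made concrete with a distance measure. What follows is a minimal sketch and not the method behind any particular tool or research group: it assumes Jaccard distance over byte 4-grams, and the function names and the 0.5 threshold are made up for illustration.

```python
def byte_ngrams(data, n=4):
    """Return the set of all n-byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard_distance(a, b):
    """0.0 for identical sets, 1.0 for sets with nothing in common."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def distance_matrix(samples, n=4):
    """Pairwise distances between samples, as a list of rows."""
    grams = [byte_ngrams(sample, n) for sample in samples]
    return [[jaccard_distance(x, y) for y in grams] for x in grams]


def singletons(matrix, threshold=0.5):
    """Indices of samples whose nearest neighbour is farther than threshold."""
    result = []
    for i, row in enumerate(matrix):
        others = [d for j, d in enumerate(row) if j != i]
        if not others or min(others) > threshold:
            result.append(i)
    return result
```

Samples that sit close together in the matrix are candidates for a family, while the indices returned by `singletons` are candidates for a second look at their label. Real specimens would first need unpacking, since packed files share few byte sequences even when the underlying code is related.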
Tools of the Trade
- Use of virtual machine software such as Virtual Box is essential, but is not without trade-offs.
- Demonstrate use of Virtual Box.
- You may need to purchase more RAM for your laptop.
- Keep host OS as uncluttered as possible.
- Keep copies of clean installs, as exported appliances
- Shared folders are convenient, but have their risks
- Make backups of VMs using the clone function
- Don't use the same VM for malware analysis and on-line banking :-)
- Become comfortable with building new VMs.
- Become comfortable with running two VMs at once, e.g. a Windows VM for running the malware and a UNIX for pretending to be the Internet
- Dropbox is useful! Especially when shared with one or more VMs.
- Screen shot of VirtualBox's main menu
- Tools for malware analysis fall into several categories
- Platform specific utilities for quick inspection, e.g. Microsoft Sysinternals. Useful for triage as well as in-depth.
- A disassembler such as IDA Pro. Please feel free to get a copy of the freeware version of IDA Pro.
- A debugger such as Olly, Immunity, or x64dbg.
- A network monitor such as Wireshark. Use sudo apt-get install wireshark to get wireshark for Ubuntu and other flavors of Linux. Virtual Box has some network monitoring of its own.
- Reference databases, such as MSDN
- Ordinary system utilities, such as IDEs for C and perhaps assembly, and decompression utilities.
- Malware is usually saved in compressed and encrypted form. A Zip file with the password 'infected' is safe to email. One would think.
- You might like to configure a VM or two with these tools installed.
- Isn't a good anti-virus program enough? Not so!
- Internet-based AV sites such as https://www.virustotal.com/ are useful, but see also this article on how malware authors are using VirusTotal.
- Even Symantec says anti-virus is dead!
- Some good AV programs are available for free, according to PC Magazine, such as AVG Antivirus Free. Windows Defender seems to work, too.
- Do make a habit of installing and updating AV software on your host machine, but don't try to run it on your VMs for malware analysis.
- What are the strengths and weaknesses of AV signatures?
- It's an arms race! Many malware actors work hard to make their malware hard to analyze.
- You might need to dig into details that normal people don't care about.
- There is a learning curve! It would take at least a full-day tutorial to learn it all :-)
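One way to approach the question about signature strengths and weaknesses raised above: a hash signature matches only a byte-for-byte identical file, while a byte pattern with wildcards tolerates small changes. The sketch below is illustrative only. The sample bytes, the pattern, and the function names are invented for this example and are not taken from any real AV product.

```python
import hashlib
import re


def hash_signature_matches(data, known_md5):
    """Exact-match signature: True only if the whole file is byte-identical."""
    return hashlib.md5(data).hexdigest() == known_md5.lower()


def pattern_signature_matches(data, pattern):
    """Pattern signature: True if the byte pattern occurs anywhere in the file."""
    return re.search(pattern, data, re.DOTALL) is not None


# Hypothetical specimen and a variant that differs by a single byte.
original = b"\x90\x90PAYLOAD-v1\xeb\xfe"
variant = b"\x90\x90PAYLOAD-v2\xeb\xfe"

known_md5 = hashlib.md5(original).hexdigest()
# The "." wildcard stands for the one byte that varies between builds.
pattern = rb"PAYLOAD-v.\xeb\xfe"

print(hash_signature_matches(variant, known_md5))   # the hash misses the variant
print(pattern_signature_matches(variant, pattern))  # the pattern still matches
```

The trade-off runs both ways: the hash is cheap and has essentially no false positives, but a one-byte change defeats it, whereas a looser pattern catches variants at the cost of possibly matching harmless files.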
Platform-specific Utilities
- For computing file signatures, we have WinMD5. Feel free to download, and unzip it. But before we run it, what should we do?
- Demonstrate taking a snapshot of a VM.
- Note that WinMD5Free.exe has an MD5 of 944a1e869969dd8a4b64ca5e6ebc209a, just as the web site promised.
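For readers who would rather script the check than use a GUI tool, here is a minimal sketch of the same MD5 comparison using Python's standard `hashlib` module. The function names are made up for this example. The file is read in chunks so that large specimens do not have to fit in memory.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the MD5 of a file as a lowercase hex string, reading in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path, published_md5):
    """Compare a file's MD5 with a published value, ignoring case and spaces."""
    return file_md5(path) == published_md5.strip().lower()
```

Assuming the download sits in the current directory, `md5_matches("WinMD5Free.exe", "944a1e869969dd8a4b64ca5e6ebc209a")` would repeat the comparison described above.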
- What can we see in a binary?
- Demonstrate the strings command from a cygwin shell, again using WinMD5 as the file being inspected. System calls, registry keys, and web sites that seem out of place usually are!
- Strings is one of several utilities bundled up in Sysinternals. You'll need to put the Sysinternals directory on your path, or type the full pathname of the executable.
- A hex editor such as 010 Editor is a useful addition to your tool kit, although die-hards may prefer to use hex mode in emacs :-)
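To see what a strings-style utility is doing, here is a rough sketch in Python. It is not a reimplementation of the Sysinternals or cygwin tools; it simply assumes that a "string" is a run of at least four printable ASCII characters, stored either one byte per character or in the two-byte (UTF-16LE) form that Windows programs often use.

```python
import re


def extract_strings(data, min_length=4):
    """Return printable ASCII and UTF-16LE strings found in a byte buffer."""
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    wide_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_length)
    found = [m.group().decode("ascii") for m in ascii_pattern.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_pattern.finditer(data)]
    return found


def strings_in_file(path, min_length=4):
    """Convenience wrapper: read a whole file and extract its strings."""
    with open(path, "rb") as handle:
        return extract_strings(handle.read(), min_length)
```

Calling `strings_in_file` on a suspect binary and scanning the result for URLs, registry paths, and API names is a quick first pass. A nearly empty result is itself a hint that the file may be packed.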
- Malware is usually packed, to avoid A/V, to make analysis harder, and to make a smaller footprint.
- Obfuscation is widely used in malware, especially crimeware.
- There are a variety of pack/unpack utilities available.
- Good overview of unpacking and patching an executable binary.
- So being able to measure the entropy of a file, or part of a file, is useful. See “Using Entropy Analysis to Find Encrypted and Packed Malware.” IEEE Security & Privacy Magazine, 2007, pages 40-45. It turns out that entropy can tell you a lot. Calculating the entropy of a file is a useful first programming exercise, suitable for Python or C or even assembler.
- For more on entropy, see Sorokin's paper on structural entropy, with some highlighting (pdf)
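As a starting point for the entropy exercise suggested above, here is a minimal Python sketch of Shannon entropy measured in bits per byte. The function names and the 256-byte window are choices made for this example, not taken from the cited papers.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string, from 0.0 up to 8.0 bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def windowed_entropy(data, window=256):
    """Entropy of consecutive, non-overlapping windows as (offset, entropy)."""
    return [(offset, shannon_entropy(data[offset:offset + window]))
            for offset in range(0, len(data), window)]
```

A buffer holding one repeated byte scores 0, and a buffer in which all 256 byte values are equally likely scores 8. Compressed or encrypted regions tend toward the high end, so the per-window values can point to where a packed payload might sit inside a file.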
- By the way, knowledge of x86 assembler and Windows system internals can be really useful.
- The focus in this tutorial will be on Windows more than any other platform.
- On Windows, you might find it useful to download and run setup.exe from www.cygwin.org, which gives you a working UNIX-like environment on Windows. Chews up some disk space, though.
- The Portable Executable File Format is described in detail at this Wikipedia article which refers to this spec from Microsoft and this PE poster and this article which describes the smallest possible PE file.
- The PE header can tell us several things, and along with the strings command, we can tell if perhaps the file has been packed or obfuscated.
- Several utilities for working with the PE header are available. PEViewer is free, and seems adequate.
- Demonstrate PEViewer, again using WinMD5.exe as an example.
- If time permits, we can do demos of other tools, such as Dependency Walker, and Resource Hacker.
- The PEiD utility described in PMA is still available, but no longer supported.
- In case you need more PE tools, see this post from Malwarebytes Unpacked.
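To make the PE header discussion above more concrete, here is a minimal sketch that reads a few header fields with Python's `struct` module. It assumes the standard layout (the offset of the PE signature stored at 0x3C, a 20-byte COFF header, and 40-byte section headers), covers only a handful of fields, and does no bounds checking. The function name and the dictionary keys are made up for this example.

```python
import struct


def parse_pe_header(data):
    """Return the machine type, timestamp, and section list of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")

    # COFF file header: 20 bytes immediately after the PE signature.
    (machine, n_sections, timestamp, _symtab, _nsyms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)

    # Section headers follow the optional header; each one is 40 bytes long.
    table = pe_offset + 4 + 20 + opt_size
    sections = []
    for index in range(n_sections):
        raw_name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * index)
        sections.append({
            "name": raw_name.rstrip(b"\x00").decode("ascii", "replace"),
            "virtual_size": vsize,
            "virtual_address": vaddr,
            "raw_size": raw_size,
            "raw_pointer": raw_ptr,
        })
    return {"machine": machine, "timestamp": timestamp,
            "characteristics": characteristics, "sections": sections}
```

Unusual section names, or a section with a large virtual size and no raw data on disk, can be hints that the file unpacks itself at run time. For serious work a maintained PE parsing library is a better choice than a hand-rolled parser like this one.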
Static Analysis: Disassemblers and Such
We can demonstrate IDA Pro, but before using IDA, a triage step using pestudio (among other choices) is in order.
- Here is a simple C program, compiled with Code::Blocks
    #include <stdio.h>
    #include <windows.h>

    int main(void)
    {
        SYSTEMTIME lt;

        /* Fill lt with the current local date and time. */
        GetLocalTime(&lt);
        printf("The local time is %02d:%02d\n", lt.wHour, lt.wMinute);
        return 0;
    }
- A link to this code, in case you don't want to type it in yourself. The program compiles and runs as expected.
- An overview from pestudio
- The fact that pestudio looks for malware indicators is handy.
- We can also look at the strings.
Moral of the story: one can sometimes learn a lot from the PE header. I now know the programmer's name!
- Opening the file in IDA, we see
- and a little lower, we see code we recognize. (Windows and Code::Blocks put a bunch of library code in as well, making the executable larger than the raw .o file would suggest.) The red area indicates the program's end.
- and we can see the call graph
- and a graphical view is also available
- Of course IDA also lets us look at strings.
- But you won't see much if the file is packed, which is something that the PE utilities can tell us. So IDA provides some unpacking facilities
- The hex dump will take you back to your undergraduate days, perhaps. May also indicate where buffers might be located later, if and when the file unpacks itself.
- The libraries the binary imports may tell you a great deal.
This is obviously a C program, with no remarkable system calls. But if we had seen low-level keyboard hooks, or registry access, we'd be more suspicious.
- Now compare to a file we know to be malicious! Let's look at Lab03-04.exe from the PMA book.
- You will also see references to another disassembler, PEBrowsePro. PEBrowsePro is worth trying if you don't need a system as complex as IDA.
- Using PEBrowsePro, we can take a quick look.
- Do we see anything suspicious? The screen shot wouldn't be here if we didn't!
- In IDA, we can see some other malware indicators, apart from the strings mentioned above. The program has a mix of system calls, including file system, registry manipulation, socket calls, and then...building an HTTP header, but not being a browser? Suggests an HTTP backdoor, which is malware that sends information to a web server run by the attacker.
- and a call to sleep, without any obvious reason. Sleep is sometimes used to hide functionality that would otherwise appear under dynamic analysis.
- IDA has debugger capabilities, as well as static program analysis. Probably the single most important tool for malware analysis.
- IDA is a big, complex system. The IDA Pro Book by Chris Eagle is available from No Starch.
- An IDA Pro Cheat Sheet (pdf)
- Other alternatives to IDA exist, such as Hopper for OS X and Linux.
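The habit of scanning imports for tell-tale system calls, as done above, can be roughed out in a few lines of Python. The category table below is a small, hand-picked set of Windows API names chosen for illustration; it is not exhaustive, it is not how pestudio or IDA decide anything, and the function name is made up.

```python
# Illustrative only: a few Windows API names grouped by what they suggest.
SUSPICIOUS_APIS = {
    "keyboard hooks": {"SetWindowsHookExA", "SetWindowsHookExW", "GetAsyncKeyState"},
    "registry": {"RegOpenKeyExA", "RegCreateKeyExA", "RegSetValueExA", "RegDeleteValueA"},
    "network": {"WSAStartup", "socket", "connect", "send", "recv", "gethostbyname"},
    "file system": {"CreateFileA", "WriteFile", "CopyFileA", "DeleteFileA"},
    "timing": {"Sleep"},
}


def categorize_imports(names):
    """Group imported function names by the category they fall under."""
    wanted = set(names)
    hits = {}
    for category, apis in SUSPICIOUS_APIS.items():
        found = sorted(apis & wanted)
        if found:
            hits[category] = found
    return hits
```

Feeding it the import list of the clock program (for example `["GetLocalTime", "printf"]`) yields an empty result, while a list that mixes registry, socket, and `Sleep` calls lights up several categories at once. None of these calls is malicious on its own; it is the combination, in a program with no obvious need for it, that deserves attention.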
Dynamic Analysis
- Make a snapshot. Make a clone and a snapshot.
- Disconnect your VM from the network before beginning dynamic analysis. Make sure you know how to do this!
- The procmon utility can tell you what's going on, in part.
- The Process Explorer program gives even more detail.
- Process Explorer may also let us watch what happens when documents are opened using Word or a PDF viewer. If you open such a document and see unexplained activity, a malicious document may be the explanation.
- Look at Norman Sandbox
- PMA refers to the GFI Sandbox and we have an analysis of Lab03-04.exe (pdf) (html). (We just looked at this program with IDA.)
- GFI Sandbox has been acquired by ThreatTrack Security, and the public sandbox is still available.
- Dynamic analysis may involve just running the program, to see what network activity or file system changes can be noted. This includes changes to the Windows Registry. Do we all know what that is?
- Registry snapshots can be made using regshot.
- Feel free to download and install Ollydbg, which is available here
- a summary of Olly commands
- Feel free to download and install x64_dbg, which is available here
- The Immunity Debugger was inspired by Olly, but allows for plug-ins written in Python. You can download Immunity starting from here.
- Careful! Some unpackers have to execute the suspect program in order to have it unpack itself.
- Make a copy of Lab 3-4 on the desktop. Let's just run it and see what happens!
- Now open the file with Olly and see what we can see
- Eventually the process terminates
But the program acts differently when being debugged...since the file is still where it was. Can we figure out how the file deletes itself on termination? Or how it knows to behave differently when being debugged?
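The before-and-after idea behind registry snapshots applies equally to the file system, and it is one way to confirm that a specimen deleted itself or dropped new files. Here is a minimal sketch for a directory tree. The function names are invented for this example, and it reads every file in full, so it is best pointed at a small directory such as the desktop of the analysis VM.

```python
import hashlib
import os


def snapshot_tree(root):
    """Map each file path under root (relative to root) to its SHA-256 digest."""
    state = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            with open(full, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            state[os.path.relpath(full, root)] = digest
    return state


def diff_snapshots(before, after):
    """Return sorted lists of added, removed, and modified paths."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(path for path in set(before) & set(after)
                      if before[path] != after[path])
    return {"added": added, "removed": removed, "modified": modified}
```

Take one snapshot before running the specimen and one after, then compare them with `diff_snapshots`. A specimen that removes its own executable would show up under `"removed"`.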
Malware Analysts Write Reports
- Description of the malware
- name, size, date acquired and how
- MD5 and/or SHA hash
- results from VirusTotal and similar utilities
- what kind of malware? Windows executable? VBscript? Exploit kit?
- Results of analysis, static or dynamic
- Excerpts from the output of tools such as pestudio and IDA
- What does the malware do?
- How does it achieve execution?
- How does it achieve persistence?
- Does it communicate with the outside? How? What IP addresses are involved?
- Is there anything unusual about this specimen?
- Is this specimen similar to anything seen before?
- What damage is done? How can the damage be repaired?
- How does this malware spread?
- Who produced it, and why?
- Such malware reports are the format I use for exam questions in the semester-length course. Take home tests.
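If you want to start each report from the same skeleton, the outline above maps naturally onto a small data structure. The sketch below is one possible layout, not a standard format. The class name, the field names, and the text rendering are all made up for this example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SpecimenReport:
    name: str
    size: int
    md5: str
    sha256: str
    acquired: str = "unknown"   # date acquired and how
    kind: str = "unknown"       # e.g. Windows executable, VBScript, exploit kit
    findings: dict = field(default_factory=dict)  # question -> answer

    def to_text(self):
        """Render the report as plain text, one item per line."""
        lines = [
            f"Specimen: {self.name}",
            f"Size: {self.size} bytes",
            f"MD5: {self.md5}",
            f"SHA-256: {self.sha256}",
            f"Acquired: {self.acquired}",
            f"Kind: {self.kind}",
        ]
        for question, answer in self.findings.items():
            lines.append(f"{question} {answer}")
        return "\n".join(lines)


def start_report(name, data, **details):
    """Fill in the fields that can be computed directly from the bytes."""
    return SpecimenReport(
        name=name,
        size=len(data),
        md5=hashlib.md5(data).hexdigest(),
        sha256=hashlib.sha256(data).hexdigest(),
        **details,
    )
```

The analysis questions listed above then become keys in `findings`, for example `report.findings["How does it achieve persistence?"] = "..."`, filled in as the static and dynamic work proceeds.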
Malware Analysis in the Large vs. Malware Analysis in the Small
- You will have seen how malware analysis zooms down into details very quickly.
- In my opinion,
- study of families of malware has received relatively little attention
- visualization tools are not yet used as widely as they should be
- Here we have a graph using a subset of the Zeus family
- plus an example of the charts those guys at UCSB use. See this blog post. Quoting from them,
"Here, we consider 68 malware samples which were assigned a single family name (Kolik.A) by an Anti Virus (AV) software. When we cluster these samples and view the distance matrix, we can see that there are 4 smaller tight clusters and many singletons. The singletons could be the possible outliers and could be sent back for re-labeling."
For Further Study
- How can you protect yourself from malware?
- Live off the grid, or
- Use separate VMs for work and personal activity.
- Practice good cyber hygiene: don't reuse passwords, and make them hard to guess
- Keep your software up-to-date
- Beginning malware analysts (and experienced ones too) can find the variety of tools for malware analysis daunting, especially for the Windows environment. Maybe we should have a toolkit recommendation like this.
- What separates the best malware analysts from the wannabes?
- Experience!
- both yours and others'
- Tenacity!
- Willingness to learn new stuff.
- Willingness to invent (or invest in) new tools.
- Lots of security blogs deal with malware analysis topics from time to time.
- New tools come out from time to time. On my list of things to read
- I like Dr. Fu's site. He's got a tutorial on malware analysis.
- An analysis tool called Truman
- A New Approach to Prioritizing Malware Analysis
- Here's a discussion of Sandbox Overloading
- Here's an interesting report from FireEye
- Comments, corrections, and suggestions to improve this tutorial are welcome! Send email to nicholas at umbc dot edu
- Thanks!