Tuesday, November 18, 2014

Welcome Hybrid-Analysis.com - Free Malware Analysis Service

Today we are proud to announce that a Beta version of VxStream Sandbox has been launched as part of our new malware analysis webservice at hybrid-analysis.com. This will remain a free service for everyone and it will give people an idea of our innovative Hybrid Analysis technology. The service is an ongoing project and is likely to receive feature updates in the future. Please keep in mind that it is a new service and needs time to develop.

Update: We just updated our service, fixing some bugs in the network traffic display, improving the stability of the runtime monitor and resolving a few other minor issues (the parameter values are now displayed in hexadecimal). Right now we are re-running the 100+ unique samples we received in the last 24 hours to update all reports. We will only be doing this during the initial phase, as re-running 100 samples takes only about 30 minutes, but later it would stall the system too much. Have fun!

Tuesday, November 11, 2014

Understanding "Torminator" Ransomware

It has been a while since I've posted to Payload Security's blog, because the dev team behind VxStream Sandbox has been quite busy finishing the web interface and focusing on the new report design. Nevertheless, it is our daily task to stress test the system with new samples. As we were running samples through our system today, one analysis turned out to be spot on and perfect for a blogpost. The malware analysis we discuss here briefly outlines the strengths of our sandbox system: behavior signatures paired with a strong analysis engine that includes Hybrid Analysis. I know the term "Hybrid Analysis" is more mysterious than obvious to most readers, but you should have a good idea of it after reading this article, so do not be afraid to read on. Before we go into the depths of malware, let me announce something very cool: within the next weeks we will be offering a demo "web interface" with navigable reports to give everyone an impression of the overall system.

UPDATE: You can view the full report at our new free malware analysis service here: http://www.hybrid-analysis.com/sample/f0a068c48d260ebd182861e114edfb4383f922ec8186fa6b9ffb247a37da36eb/

Tumble down the rabbit hole (with VxStream Sandbox) 

The sample we will be looking at is labeled by Kaspersky as "Trojan-Spy.Win32.SpyEyes.aryc" (SHA256: f0a068c48d260ebd182861e114edfb4383f922ec8186fa6b9ffb247a37da36eb), but we will call it "Torminator", because it damages the system (e.g. deletes all shadow copies), encrypts user files and then asks for a ransom to decrypt/restore the files (i.e. typical ransomware). The restore pages are TOR websites (e.g. https://<random>.torminater.com/).

The first thing we do when we test our system (besides reading our own report) is check what the "competition" detected in order to determine the quality of our analysis. Unfortunately, in this case malwr (the free service running Cuckoo Sandbox) did not do so well (a failed analysis happens to us all the time, too):




Please note that today, some malware is even aware of the presence of analysis software (detecting e.g. third-party tools like Wireshark, AutoIt etc.) and stays dormant without executing its payload (which is why "dormant code detection" as implemented by Hybrid Analysis is so important). Also, userland hooking engines (as utilized by Cuckoo Sandbox) will always have detection issues (a bit like Heisenberg's uncertainty principle that describes the disturbance triggered by the act of observation), which is somewhat countered by a whole list of advantages (such as fast portability between Windows versions compared to kernel code, sometimes stability or being a lot closer to the instructions and data), but that is another discussion. Should you be interested, you can see the full malwr report here. On a side note: I do not want to talk down malwr or its free service (how can anyone ever complain about free work?!), but the comparison shows that different systems can produce different results and it is always good to rely on a variety of tools, even if they are from the same category ("forensic malware analysis").

When we take a look at the VxStream Sandbox report, we always start with the behavior signatures, because they give us a very good idea of what the malware does and what functionality it contains, and they provide entrypoints for deeper analysis. The further we scroll down, the deeper we fall into the rabbit hole and the more details we get to know.



An interesting malicious signature that immediately jumps out at us is the "Deletes volume snapshots" signature, as it characterizes a distinctive feature (dropping files and writing memory into foreign processes is common among malware). When we expand the signature, we obtain some more details:



As we can see, the "volume snapshots" are deleted by running vssadmin.exe with the command line "Delete Shadows /All /Quiet" (note the /Quiet).

The next sections I usually jump to are "Screenshots" and "Hybrid analysis" (see menu on the right), because they contain visual information (which is always interesting) and a process tree of the original sample that contains infected and newly created processes. Also, I can take a look at in-depth data about each monitored process. In this case, this is what the process tree looks like:


As we can see, the malware injects itself into explorer (or creates a new instance) and hides itself in a svchost process (quite typical for malware) to then create a notepad instance, delete shadow copies of the hard drive, disable recovery mode and set the boot policy to "ignore all failures". Not very nice. ;-) If we take a look at the second screenshot below, we can see that a notepad instance with the title "DECRYPT_INSTRUCTIONS" and some informational text (in German) is created:



What it basically says is that all files were encrypted with an RSA-2048 key and that it is possible to recover the files if one visits some "personal website" and pays a bunch of $$$ to some crooks. Typical ransomware, except that a TOR service is being used. It is more interesting to take a look at how the software works. If you are interested in very specific details, such as the logged API calls, all registry accesses, created mutants, touched handles or "streams" (more on that later), then it is possible to click on any process in the process tree and navigate the in-depth "sub-reports" (on a per-process basis). Here are two examples:



In the example above we can see a simple list of API calls. What is nice is that there are some additional "meta parameters" (those in brackets) containing additional information (e.g. the path connected to a handle). Let's take a look at some "streams" (basically annotated disassembly listings):


As can be seen, disassembly instructions were extracted from a memory dump file and annotated accordingly. What is nice is that the "vssadmin.exe Delete Shadows /All /Quiet" call was reconstructed automatically by the Hybrid Analysis engine using stack simulation and data flow analysis. The malware author tried to hide the string by writing each character to the local stack frame individually, because in that case the string cannot be found with a simple binary search. Luckily, we are equipped with a powerful tool to counter that measure. ;-)) One additional note: the screenshots presented here are from the web interface, i.e. they do not represent the "full report", which contains all the information gathered by the analysis system. Full reports are currently available in JSON, XML and HTML.
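To make the stack-string trick a bit more tangible, here is a minimal Python sketch. It is not the Hybrid Analysis engine's algorithm; it only assumes a simplified, hypothetical input in which the per-byte stack writes (think of instructions like mov byte ptr [ebp-0x40], 0x76) have already been decoded into (stack offset, byte value) pairs. The function names and the input format are made up for illustration.

```python
def rebuild_stack_string(writes):
    """Rebuild a string from per-byte stack writes.

    `writes` is a list of (stack_offset, byte_value) pairs; this input format
    is an assumption made for illustration only.
    """
    frame = {}
    for offset, value in writes:
        frame[offset] = value & 0xFF  # a later write to the same slot wins

    chars = []
    for offset in sorted(frame):
        if frame[offset] == 0:  # stop at the NUL terminator
            break
        chars.append(chr(frame[offset]))
    return "".join(chars)


def emulate_obfuscated_writes(text, base=-0x40):
    """Produce out-of-order byte writes, as an obfuscated routine might emit them."""
    writes = [(base + i, ord(c)) for i, c in enumerate(text)]
    writes.append((base + len(text), 0))
    # Interleave odd and even positions so the bytes are not written in order.
    return writes[1::2] + writes[0::2]


command = "vssadmin.exe Delete Shadows /All /Quiet"
writes = emulate_obfuscated_writes(command)
recovered = rebuild_stack_string(writes)
print(recovered)
```

The point of the sketch is that the string never exists as one contiguous byte sequence in the file, yet simulating the writes into the stack frame brings it back in the right order.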

Anyway, if we scroll to the network traffic section we get a good overview including a graphical "world map" that highlights countries that were contacted:


If we scroll down even further we can take a look at the dropped files and download them as well:


As we can see, two interesting dropped files are available for further inspection: an alleged "jpg" file that is actually a COM executable, and the file named w7-32@pumma[1].txt ("w7-32" is our computer name), which is a configuration file. In addition to the monitored processes and memory dumps, dropped files can provide valuable indicators for (automatic) post-processing. Please note that all dropped files are already parsed by Hybrid Analysis as part of a normal analysis (i.e. disassembly streams and strings/API calls are extracted). Further down the report, there is a list of extracted strings (from a variety of sources) and some informational notices from the analysis system itself. That's it.

One last note: to be fair, we added the ransomware signature "Deletes volume snapshots" after we found the sample and ran it for the first time, but it only took two minutes to add the new signature script and five minutes to re-run the adapted system, because the signature interface is very open and can be scripted easily. This iterative approach is what some people call agile security.
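To give an idea of how small such a signature can be, here is a Python sketch. The actual VxStream signature interface is not shown in this post, so the function name, the regular expression and the report layout (a list of dicts with a "cmdline" key) are all assumptions made purely for illustration.

```python
import re

# Matches "vssadmin ... delete shadows" in any letter case.
SHADOW_DELETE = re.compile(r"vssadmin(\.exe)?\b.*\bdelete\s+shadows\b", re.IGNORECASE)


def deletes_volume_snapshots(processes):
    """Return the command lines that delete volume shadow copies.

    `processes` is a list of dicts with a "cmdline" key; this report layout
    is hypothetical and only serves the example.
    """
    return [p["cmdline"] for p in processes if SHADOW_DELETE.search(p.get("cmdline", ""))]


# Hypothetical process list, loosely modeled on the process tree above.
report = [
    {"name": "notepad.exe", "cmdline": "notepad.exe"},
    {"name": "vssadmin.exe", "cmdline": "vssadmin.exe Delete Shadows /All /Quiet"},
]
hits = deletes_volume_snapshots(report)
print(hits)
```

A signature like this only fires if the underlying system actually captures the spawned command line, which is why the quality of the collected data matters more than the signature itself.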

Conclusion

A malware analysis system that provides reports with a straightforward outline, but also offers the option to look at in-depth behavior, is a very good basis to understand and adapt to today's malware threats. Generic behavior signatures are a strong and powerful tool, but they all depend on the underlying system to provide data that can be used as a trigger. That is why technologies such as Hybrid Analysis that can extract strings, API calls, shellcode and dormant code are invaluable on a large scale, because the overall data will have a higher quality. In this blogpost we outlined how easy it is to understand the most important aspects of malware within 10 minutes. For deeper analysis, the disassembly listings and the provided context files provide a good entrypoint. The reports VxStream Sandbox generates are more than just an impression.

Thursday, October 9, 2014

New Feature Benchmark and first Report UI Preview

This will be just a very short blogpost. Our VxStream Sandbox malware analysis system development process is progressing well. Currently, we have completed development of a Beta Version that is being tested on large scale systems and released a feature product comparison chart to give people an idea of what will be included in the first version. Go and check it out at our new VxStream product page.

Friday, August 1, 2014

Preview On VxStream Sandbox Automated Malware Analysis System

Last month we published an article "Hybrid Analysis - NextGen Technology for Advanced Malware Payload Detection" that outlined our StaticStream core engine and also appeared in the July edition of the Hakin9.org magazine. It outlined some aspects of automated malware analysis systems, specifically that "NextGen" automated systems will require a combination of dynamic and static analysis techniques in the future, because VM detection on the malware end is growing stronger and the preset environment does not always meet the conditions to trigger the interesting payload. In other words, it is important to detect and analyze non-executed code sequences at runtime. We understand this requirement, and since we had already been building in-house tools to extract run-time data from malware, we decided to take things one step further and automate the process. The result is a fully automated malware analysis system that we call VxStream Sandbox, which to a degree borrows its name from the "streaming architecture" of StaticStream, a core and integral part of the overall system. In this blogpost, we will outline some of the features of the new system and give a brief overview.

Description         
In-depth analysis of 32-bit executables on all compatible Windows Operating Systems
High-speed algorithms that allow in-depth analysis within minutes
Flexible hooking system to monitor run-time behavior
Intelligent process monitoring that follows malware injecting into system/user processes
Implements common anti-VM detection techniques (e.g. undetectable to paranoid fish)
Hybrid Analysis integrated (combination of static and dynamic analysis)
Dormant code detection based on executed function calls
Injected memory logging for in-depth analysis (shellcode detection)
API calls with parameter values/names, register values and call stack
Full Registry access, Process Handles, Mutants, etc. monitoring
Memory snapshots to detect unpacked code during runtime
Open and configurable behavior signatures, add your own signatures
Third-party integration of e.g. YARA signatures possible
Extensive pure static analysis on sample (imphash, ssdeep, etc.)
Unique screenshot detection
Dropped/created file detection for multi-stage analysis
Network traffic filters and extraction of key data (HTTP request/contacted hosts)
Extensive XML and JSON reports for post-processing
Optional (automatic) persistence of reports into supported databases
Wide range of configuration options and logging features
As we can see, the list of features is already quite extensive, but there is always room for improvement. Since we are very convinced of the real-world practicability of our software system (and have already seen it in action), we are going to invest more resources into taking it to the next level.

The following diagram outlines the overall system quite well (from a bird's-eye perspective):



As we can see, it is quite straightforward and the general data processing (with parallelization) is the conversion of an input sample to an in-depth report that is machine parsable. Of course, the behavior signatures that are applied to the extracted run-time and static analysis data are configurable, are ever-growing by nature of malware forensics and can be shared amongst users.

Thursday, July 10, 2014

Hybrid Analysis - NextGen Technology for Advanced Malware Payload Detection

As malware evolves, the era of pure dynamic analysis systems is coming to an end. What potential does Hybrid Analysis have?

by Jan Miller (jan(dot)miller(at)payload-security.com)
What you will learn…
  • About malware analysis challenges
  • What Hybrid Analysis is about
  • Why Hybrid Analysis is successful
What you should know…
  • Basic knowledge of x86 Assembly
  • Basic knowledge of Malware Analysis

Introduction

The Internet connects a wide range of personal computers for private and business purposes that often run Microsoft Windows OS on x86 compatible architectures, with Windows holding about 90% market share in the desktop segment (NetMarketShare, 2014). These monocultures are an extremely attractive environment for numerous malware attacks. Today, malware often appears in the form of highly complex Trojan systems that come with exploit kits and very sophisticated anti-detection measures. The number of infections and the awareness in the industry are larger than ever. Today, there are about 4 million new infections per month (SecureList, 2014). The worm MyDoom.X alone caused damages of about $38.5 billion, and that was in 2006 (Borglund, 2014). Lately, also due to the NSA scandal, awareness of IT security has been growing a lot and IT security is becoming a market that attracts heavy investment.
Classical malware detection methods were based on pure static code analysis, such as finding a specific byte pattern and matching it against a known database of “malicious signatures”. Static analysis can be described (in the most general sense) as code analysis without execution of the target payload. In turn, malware authors started releasing packed/encrypted or even polymorphic software that rendered classical methods worthless. Consequently, anti-virus (AV) vendors, CERTs/CIRTs and malware researchers started developing and using dynamic analysis systems. Dynamic analysis can be described (in the most general sense) as code analysis during execution or emulation of the target payload. This was a huge step in evolution, because when the execution environment is instrumented appropriately, it allows the observer to see the target software behavior after the malware unpacks its security layers. Today, dynamic analysis systems run the target software in virtual environments with hardware acceleration support (such as VMWare or VirtualBox), in order to observe the malware behavior during runtime. These often automatic systems are called “Sandbox” analysis systems, as they represent an isolated execution environment for malware that simulates a real victim’s machine [1]. Using systems such as VirtualBox, the virtual machine (VM) can be restored to a clean state by loading predefined snapshot files, thus allowing execution of numerous malware samples in sequence without the need to manually rebuild the infected machine. Of course, malware authors have adapted to the growth of Sandbox systems and introduced a variety of VM detection methods. If a VM environment can be detected, the malware may behave differently than it would in the wild and not show its true behavior. The malicious functionality that is not observed is what we call dormant code. These evasion techniques range from delayed execution (so-called “time bombs”) to complex system/hardware state detection methods.
For example, if the real payload is not executed within a reasonable amount of time – the analysis system will give up on the analysis and potentially miss valuable information. Thus, dormant code detection is a vital prerequisite to Sandbox systems. Analysis results get even better when dormant code is analyzed in-depth using runtime context information.
Combining both static and dynamic analysis (typical term is Hybrid Analysis) in a fully automated, scalable and performant analysis environment is the next generation in malware forensics and detection algorithms. In this article, we will take a look at what dynamic analysis data is necessary to understand dormant code and how we can combine it with static analysis to extract in-depth behavior information.

Terminology

This chapter outlines the most important terms, so that all readers share the same understanding when the terms are used in the article.

Static Analysis

Static analysis can be described in the most general sense as code analysis without execution of the target payload. The target code (the analysis input data) may be a compiled binary file or a human-readable format, such as program source code, scripting language files or any other type of machine code representation. N. Ayewah et al. define static analysis as a method that “(…) examines code in the absence of input data and without running the code, and can detect potential security violations (…), runtime errors (…) and logical inconsistencies (…).” (Nathaniel Ayewah, David Hovemeyer, J. David Morgenthaler, John Penix and William Pugh, 2008).

Dynamic Analysis

Dynamic analysis can be described in the most general sense as code analysis during execution or emulation of the target payload. Involved techniques are usually implemented by tools such as execution visualizers, system observing tools (e.g. malicious behavior detection, intrusion detection, performance observation, etc.), profilers or other types of behavior analysis tools (e.g. sandbox systems). The only known technique used for performing dynamic analysis is instrumentation of the target code or its host (i.e. instrumenting the Operating System to enable system-level profiling of the suspect application), in order to profile the target code’s behavior (Kendall, 2007). Instrumentation refers to techniques that insert additional code for analysis purpose (or instrumentation code) into the target code, in order to measure client performance, detect bugs or intercept code-flow in order to analyze certain behavior patterns. In malware analysis, behavior patterns are often the most interesting.

Dormant Code

Dormant code or dormant functionality in malicious programs is payload/code that is not observed during dynamic analysis. In the context of malware, dormant code (not to be confused with “Software rot”) may be hiding very interesting behavior that was not executed during analysis for whatever reason (e.g. due to virtual machine detection, a command and control server not being available, a long initial sleeping delay, etc.). We can say that every pure dynamic analysis containing “no malicious behavior” always contains some kind of dormant code (as the executed code coverage is never 100%) and sometimes malicious dormant code. As the “false negative” case is to be avoided at all cost (i.e. thinking something is clean that is not), it makes sense to invest resources into detecting dormant code. This can be achieved by adding e.g. an additional static analysis layer on memory snapshots.
On a side note: process memory context changes constantly. Thus, it is necessary to take memory snapshots at an intelligent point in time or with a high frequency to “catch” e.g. unpacked code or injected shellcode, etc. In a “perfect” world with quantum processors, an analysis system would be able to observe any memory change and instantly analyze the entire process address space for all potentially executable code locations without impacting performance. Unfortunately, we do not have quantum computers and as such need to rely on heuristics and shortcuts, leaving room for mistakes. For example, analysis systems that run through thousands of files per day have an analysis time limit that they have to abide by. If nothing happens within the first ~5-10 minutes, it is off to the next file and heuristics have to do the job. Thus, the better and more intelligent the underlying algorithms and the overall performance of the system are, the more files can be analyzed in a more complete and less error-prone fashion. Of course, scalable systems and a lot of hardware can compensate for bad implementations to some degree, but there is always a real-world limit hardware-wise and other bottlenecks surface on large parallel systems, i.e. quality starts at the lowest level, keeping a flexible architecture in mind.
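As a rough illustration of the dormant code idea, the following Python sketch compares the functions found by static analysis of a memory snapshot with the functions that were actually observed executing. This is only a simplified model under stated assumptions, not how StaticStream works internally; the addresses and function names are hypothetical.

```python
def find_dormant_functions(static_functions, executed_functions):
    """Return functions found statically that were never observed executing."""
    return sorted(set(static_functions) - set(executed_functions))


def coverage(static_functions, executed_functions):
    """Fraction of statically known functions that were executed."""
    known = set(static_functions)
    if not known:
        return 0.0
    return len(known & set(executed_functions)) / len(known)


# Hypothetical function start addresses from a memory snapshot disassembly.
static_functions = [0x401000, 0x401200, 0x401450, 0x4019A0]
# Hypothetical function addresses seen in the runtime call log.
executed_functions = [0x401000, 0x401450]

dormant = find_dormant_functions(static_functions, executed_functions)
print([hex(a) for a in dormant])
print(coverage(static_functions, executed_functions))
```

Everything in the "dormant" set is a candidate for the additional static analysis layer mentioned above, since it was present in memory but never ran during the observation window.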

Hybrid Analysis

Hybrid Analysis (HA) is what we call the intelligent combination of static and dynamic analysis. It is a technology or method that integrates run-time data extracted from dynamic analysis into a static analysis algorithm to detect behavior or malicious functionality that would otherwise not be as easy to find. Often, the dynamic “helper data” consists of memory snapshots and runtime API symbol data (memory reference address values), which are fed as input to a sophisticated static analysis engine (possibly including data flow analysis). For example, if a dormant code sequence executes an indirect call, it would not be possible to resolve the called function address without knowing the value read from a memory location at the point in time of execution [2]. Even if we knew the value, it would not be possible to associate the called function address with a system call, if a mapping of memory references to symbol information is not available for the specific execution environment [3].
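The indirect call example can be sketched in a few lines of Python. This is a simplified illustration under assumptions, not the StaticStream implementation: the snapshot is a tiny made-up byte buffer, the base address and the pointer value 0x77DD1234 are hypothetical, and the symbol map stands in for the export information logged at run time.

```python
import struct


def read_dword(snapshot, base, address):
    """Read a little-endian 32-bit value from a snapshot mapped at `base`."""
    offset = address - base
    if offset < 0 or offset + 4 > len(snapshot):
        raise ValueError("address outside snapshot")
    return struct.unpack_from("<I", snapshot, offset)[0]


def resolve_indirect_call(snapshot, base, pointer_address, symbols):
    """Resolve `call dword ptr [pointer_address]` to a symbol name, if known."""
    target = read_dword(snapshot, base, pointer_address)
    return target, symbols.get(target)


# Hypothetical data: a 16-byte snapshot mapped at 0x00403000 whose second
# dword holds a function pointer, and a symbol map logged at run time.
BASE = 0x00403000
snapshot = struct.pack("<4I", 0, 0x77DD1234, 0, 0)
symbols = {0x77DD1234: "ADVAPI32.dll!RegCreateKeyExW"}

target, name = resolve_indirect_call(snapshot, BASE, BASE + 4, symbols)
print(hex(target), name)
```

Both inputs are needed: without the snapshot the pointer value is unknown, and without the symbol map the pointer value is just a number.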

Hybrid Analysis in Action

In this chapter we will apply Hybrid Analysis techniques on an exemplary malware and evaluate the results in order to take a look at the practical side of the topic. In the previous chapter, Hybrid Analysis and its associated terms were outlined briefly.

Tools

Before we get to the experimental results, the involved tools will be outlined briefly.

VirtualBox

For our example malware analysis, we will be using VirtualBox as our preferred virtual machine environment. From the main page Oracle states that “VirtualBox is a powerful x86 and AMD64/Intel64 virtualization product for enterprise as well as home use. Not only is VirtualBox an extremely feature rich, high performance product for enterprise customers, it is also the only professional solution that is freely available as Open Source Software under the terms of the GNU General Public License (GPL) version 2.” (VirtualBox) Sounds good? It is good. Definitely good enough to show what HA is about.

StaticStream

StaticStream is our preferred static analysis engine, as it can take dynamic data (such as memory snapshots, symbol data) and put it together using HA technology. On the webpage, it is described as follows: “StaticStream is a high-performance static analysis engine that is written in C++ and can analyze x86 PE files, memory dumps or shellcode. It uses a novel approach of combining dynamic data with state of the art static analysis techniques in order to detect and understand dormant code. It offers a wide range of configuration options and regular updates.” (Payload Security)

Dynamic Analysis Tools

For run-time data capturing we are going to use the AREE (Automatic Reverse Engineering Engine) Manager and Monitor binaries. These are two in-house tools used at Payload Security to generate dynamic data when running malware. These tools work similarly to the Cuckoo Sandbox monitor library “CuckooMon” in the sense that they detour calls at the application level, whereby the Manager is used to load configuration data and start the analysis. The monitor is a DLL file that is injected into the initial malware process, and user-level hooks are applied to catch system API calls. Also, whenever the malware tries to inject itself into another process (e.g. using a remote thread or other techniques), the monitor is applied to the new target process. In order for our experiment to be successful, injected shellcode, memory dumps, process context (loaded modules, registry accesses, mutants, etc.) and symbol information (module exports) are logged before the malware is able to modify/taint the data. Why did we use our own tools? Basically, we only decided to use them because the generated dynamic data has a format that StaticStream understands, which lets us show more easily how HA works. If you want to replicate our experiment and want to try out the tools, feel free to contact us.

Hybrid Analysis vs. Matsnu Trojan

Now that we know about the tools involved, let us take a look at real malware and see HA in action. For our “experiment”, we decided to use a Trojan called Matsnu [4] that encrypts files on the target drive in order to hold the unencrypted data for ransom. These are the steps we will be taking:
  • Install a VirtualBox instance with a typical OS, such as Windows XP
  • Load Matsnu sample on the virtual machine drive
  • Run Matsnu sample using AREEv2Mgr and inject AREEv2Mon monitor library
  • Let the analysis run for a couple of seconds (it is enough) and grab the generated run-time data
  • Take the grabbed run-time data and use it to analyze memory snapshots using HA technology
  • Evaluate the results and draw a conclusion
First, let us install Windows XP and load Matsnu on the main drive. The following screenshot shows the system after setup shortly before an analysis.


Figure 1: Start Screen after Installing Windows XP and loading “matsnu” on the main drive

As we can see, there is a “shared folder” (release) open with the Manager ready to start the Matsnu application. Also, we notice that Matsnu is using a PDF icon in order to mislead the Windows user into thinking it is a document and not an executable. As file extensions are hidden by default, we cannot tell at first sight that it is an executable.
In the next screenshot we see the manager open and use the command “.run C:/Matsnu” to start analysis manually. There is also a command-line interface, but that is not outlined here.


Figure 2: Running “matsnu” from the Manager using the interactive mode

At this point we can already observe an output folder “AREE” that has been created on the C: drive. It will contain all the dynamic analysis information. Also, the Matsnu file is missing. Checking the captured files in the “AREE” folder, we find that this is implemented using a dynamically created batch file, which deletes itself after deleting the original file “Matsnu.exe” on the C: drive. Also, the batch file is executed from a duplicated process so that the original file is not in use by the OS. This is the batch file content:

:l
if not exist "C:\Matsnu.exe" goto e
del /Q /F "C:\Matsnu.exe"
goto l
:e
del /Q /F "C:\DOCUME~1\mjkdmjmj\APPLIC~1\5176313.bat"
All in all, the malicious process duplicates itself upon startup, deletes the original file, but continues to exist. The PDF file is missing for the user and the malware authors probably assume that the user will continue with daily business without giving any thought to what happened.
After running the sample for a couple of seconds, we abort the analysis, quit the VM and take a look at the captured dynamic data. This is what the dynamic data folder looks like.

Figure 3: Dynamic Data Folder

The “api” folder contains system calls and parameters, the “bin” folder contains captured files (e.g. the *.bat file mentioned above), the “ctx” folder contains environment data (such as loaded modules, their symbols, registry accesses, etc.), the “dmp” folder contains memory snapshots of multiple frames and the “shc” folder contains extracted shellcodes. The “monprocs.csv” file contains an overview of all monitored processes. In this case, the contents are similar to the following (reduced version):
15539444-00013192,"INJECT_NEW","c:\Matsnu.exe","\Device\HarddiskVolume1\Matsnu.exe","<date>"
15540015-00013280,"INJECT_EXISTING","C:\WINDOWS\system32\cmd.exe","\Device\HarddiskVolume1\WINDOWS\system32\cmd.exe","<date>"
15540115-00001528,"INJECT_EXISTING","C:\WINDOWS\Explorer.EXE","\Device\HarddiskVolume1\WINDOWS\explorer.exe","<date>"
We quickly see that Matsnu first runs the batch file and then injects itself into “explorer.exe” where it remains to execute most of its payload. This makes manual debugging with e.g. OllyDbg more difficult.
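For post-processing, a file like “monprocs.csv” is easy to consume. The following Python sketch parses the three lines shown above; the column names (proc_id, mode, path, device_path, date) are assumptions, as the article does not name the fields, and the “<date>” placeholder is kept exactly as printed.

```python
import csv
import io
from collections import namedtuple

# Column names are assumptions; the article does not name the fields.
MonitoredProcess = namedtuple("MonitoredProcess", "proc_id mode path device_path date")

SAMPLE = r'''15539444-00013192,"INJECT_NEW","c:\Matsnu.exe","\Device\HarddiskVolume1\Matsnu.exe","<date>"
15540015-00013280,"INJECT_EXISTING","C:\WINDOWS\system32\cmd.exe","\Device\HarddiskVolume1\WINDOWS\system32\cmd.exe","<date>"
15540115-00001528,"INJECT_EXISTING","C:\WINDOWS\Explorer.EXE","\Device\HarddiskVolume1\WINDOWS\explorer.exe","<date>"
'''


def parse_monprocs(text):
    """Parse monprocs.csv content into MonitoredProcess records."""
    reader = csv.reader(io.StringIO(text))
    return [MonitoredProcess(*row) for row in reader if row]


procs = parse_monprocs(SAMPLE)
for p in procs:
    print(p.mode, p.path)
```

Filtering the records by the second column then gives a quick list of the already running processes the sample injected into.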
Consequently, we first try to analyze the memory dump files (ignoring all system files) from the explorer.exe process using symbol memory references and module information as “context information”, which is one of the ideas of Hybrid Analysis. Specifically, we start StaticStream and let it analyze the last frame of the process (i.e. the last “dump” we logged before quitting the VM), because it often contains already unpacked code sequences. The following is StaticStream’s output in shortened form (it passes about 1.66 million instructions including data flow in an impressive ~3 seconds):
Welcome to AREE v2.1
Starting analysis ...
Adding undefined memory file 15540115-00001528.00000002.15561486.2B90000.00000040.mdmp (POI: 0, Executable: 1) for later analysis
Found a hidden PE file in memory file 15540115-00001528.00000002.15561486.3730000.00000002.mdmp at 3730000
Analyzing in-memory binary file 15540115-00001528.00000002.15561486.3730000.00000002.mdmp
Analyzing 1 exports
1 of 1 exports accepted
No packed files could be detected
Running heuristic scan on binary file 15540115-00001528.00000002.15561486.3730000.00000002.mdmp
Generating final analysis report
Number of passed instructions: 1660669
Finished analysis in 3276 ms with a throughput of 445 KB/s
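The “Found a hidden PE file” line in the log above can be illustrated with a common, simple heuristic: scan the dump for an “MZ” signature and check whether the e_lfanew field at offset 0x3C points to a “PE\0\0” signature. The Python sketch below shows only this generic technique; it is not claimed to be how StaticStream detects embedded images, and the test blob is a fake built for the example, not a real PE file.

```python
import struct


def find_pe_headers(data):
    """Return offsets where a plausible PE image starts inside `data`."""
    hits = []
    pos = data.find(b"MZ")
    while pos != -1:
        # e_lfanew (offset of the PE header) is stored at offset 0x3C.
        if pos + 0x40 <= len(data):
            e_lfanew = struct.unpack_from("<I", data, pos + 0x3C)[0]
            pe_pos = pos + e_lfanew
            if data[pe_pos:pe_pos + 4] == b"PE\x00\x00":
                hits.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return hits


def make_fake_image():
    """Build a minimal byte blob with valid MZ and PE signatures (not a real PE)."""
    dos = bytearray(0x80)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x80)
    return bytes(dos) + b"PE\x00\x00" + bytes(20)


dump = bytes(0x100) + make_fake_image() + b"MZ junk without header"
print([hex(off) for off in find_pe_headers(dump)])
```

A production scanner would validate far more of the header (machine type, section table, sizes) before accepting a hit, but the two signatures are usually the first filter.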
This is an excerpt of what one output folder with stream files containing disassembly listings looks like (human-readable output is the default):

Figure 4: Streams Folder File Listing

Hand-browsing some of the stream files quickly reveals that one portion of the streams contains encrypted payload and one portion contains unencrypted payload. Here are some of the more interesting functions that could be used for post-processing to generate behavior signatures or used as an entrypoint for an additional manual analysis:

Figure 5: Persistence using RegCreateKeyEx

The above “code sequence” (or “Stream”) shows the call to RegCreateKeyExW in ADVAPI32.dll that would otherwise not be detected using pure static analysis, as the indirect call memory reference would not be resolved. In this case, a registry key was created and a registry value was set during execution, as indicated by the dynamic analysis registry logfile (i.e. the associated code sequence is not dormant code):


Figure 6: Persistence using Registry

Converting the hex values to ASCII reveals the following path:
C:\Documents and Settings\mjkdmjmj\Application Data\Microsoft\qfpvideo.exe
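The conversion itself is a one-liner in most languages. The following Python sketch shows one way to do it; the helper name `hex_to_text` is hypothetical, and because the raw bytes from the registry logfile are not reproduced in the text, the hex dump below is regenerated from the path above purely for demonstration.

```python
def hex_to_text(hex_values: str, encoding: str = "ascii") -> str:
    """Decode a whitespace-separated hex dump into text.

    Trailing NUL characters (common in registry string data) are dropped.
    """
    raw = bytes.fromhex("".join(hex_values.split()))
    return raw.decode(encoding).rstrip("\x00")


# Demonstration input: the path from the article, re-encoded as hex bytes.
# This is NOT the original logfile content, only a stand-in for it.
path = r"C:\Documents and Settings\mjkdmjmj\Application Data\Microsoft\qfpvideo.exe"
hex_dump = path.encode("ascii").hex(" ")

print(hex_dump[:23], "...")   # first few bytes of the dump
print(hex_to_text(hex_dump))  # decoded path
```

If a logged value turns out to be stored as a wide string, passing `encoding="utf-16-le"` to the same helper would be the assumption to try next.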
Matsnu obviously tries to survive a reboot by adding itself to the auto-start registry key, which is a very common technique. Checking more streams quickly turned up another interesting entrypoint: the function that encrypts the Command & Control (C&C) server requests before sending the data over an alternate HTTP connection.


Figure 7: Encrypting Payload before C&C request

The code location above is a good starting point to check cross-references and intercept the encrypted key creation (of course, this requires a flexible monitor system). Also, please note that a run-time capturing mechanism located at the kernel level would not be able to capture the unencrypted data without hooking into user mode and becoming detectable again.
Today, more and more malware is using encrypted traffic (not only HTTPS, but the payload itself being encrypted as well), making it necessary to move closer to the malware code itself, as encryption/decryption of important system data happens at the application level.
On a side note, the HA technology also revealed the following C&C server IP addresses using the alternate HTTP port 8080:
50.31.146.134:8080
204.197.254.94:8080
78.129.181.191:8080
27.124.127.10:8080
173.203.112.215:8080
50.97.99.2:8080
103.25.59.120:8080
5.135.208.53:8080
50.31.146.109:8080
204.93.183.196:8080
… and a lot more interesting dormant code sequences, which are not outlined here.
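Endpoint lists like the one above are easy to turn into structured indicators for post-processing. The sketch below parses and validates the ten addresses from this article with Python's standard `ipaddress` module; the `Endpoint` type and the `parse_endpoints` function are names we made up for illustration and are not part of any Payload Security tool.

```python
import ipaddress
from typing import Iterable, List, NamedTuple


class Endpoint(NamedTuple):
    address: ipaddress.IPv4Address
    port: int


# The C&C endpoints exactly as listed in the article.
RAW_ENDPOINTS = """
50.31.146.134:8080
204.197.254.94:8080
78.129.181.191:8080
27.124.127.10:8080
173.203.112.215:8080
50.97.99.2:8080
103.25.59.120:8080
5.135.208.53:8080
50.31.146.109:8080
204.93.183.196:8080
"""


def parse_endpoints(lines: Iterable[str]) -> List[Endpoint]:
    """Parse 'ip:port' lines; invalid addresses or ports raise ValueError."""
    endpoints = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        host, _, port = line.rpartition(":")
        endpoints.append(Endpoint(ipaddress.IPv4Address(host), int(port)))
    return endpoints


endpoints = parse_endpoints(RAW_ENDPOINTS.splitlines())
for endpoint in sorted(set(endpoints)):
    print(f"{endpoint.address}:{endpoint.port}")
```

Validating the addresses at parse time keeps malformed strings out of any indicator feed that is generated from them later.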

Conclusion

Although the Matsnu Trojan is not the most sophisticated malware available today, it is a good example, because it reflects typical, state-of-the-art techniques. Its traffic uses encrypted payloads, it tries to hide by injecting itself into a variety of processes, it decrypts its payload inside explorer.exe, making manual debugging difficult, and so forth. Using run-time data capturing tools we were able to extract a lot of information, including dormant code and complete symbol information. Of course, the dynamic analysis tool had to follow the malware into explorer.exe and remain undetected. As a next step, the static analysis engine StaticStream associated the run-time data and quickly generated code sequences for post-processing, allowing us to find valuable analysis entrypoints and behavior data otherwise unseen by a pure dynamic analysis engine.
In general, static analysis works well if the data to be analyzed is not encrypted, not obfuscated and available in a more or less complete form. Sadly, this is rarely the case with malware today. Dynamic analysis works well too, but it misses dormant code and therefore potentially malicious functionality. As we cannot make any qualified statements about the unknown, a pure dynamic analysis system cannot safely declare a file benign/clean, because the real payload may never have been executed. Thus, new Hybrid Analysis (HA) technologies are not only a necessity, but part of a future solution in the battle against malware. Due to the additional overhead imposed by hybrid technologies, very efficient and performance-oriented algorithms are necessary, especially at large scale.
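The indirect call from Figure 5 illustrates this point in miniature. The following Python sketch is a toy model, not StaticStream's implementation: all addresses, the dictionary layout and the function name `resolve_indirect_call` are hypothetical. It only shows why a call through a memory slot cannot be named without run-time data, and how a memory snapshot plus symbol information closes that gap.

```python
from typing import Dict, Optional

# Hypothetical run-time data; every address below is made up for illustration.
# Pointer values read from a process memory dump (slot address -> stored value).
MEMORY_SNAPSHOT: Dict[int, int] = {
    0x03731000: 0x77DD1234,
}

# Symbol information captured for the same process (API address -> name).
SYMBOLS: Dict[int, str] = {
    0x77DD1234: "ADVAPI32.dll!RegCreateKeyExW",
}


def resolve_indirect_call(
    slot: int, memory: Dict[int, int], symbols: Dict[int, str]
) -> Optional[str]:
    """Resolve 'call [slot]' to an API name, or None if it stays unknown."""
    target = memory.get(slot)
    if target is None:
        # Pure static view: the slot is only filled at run time.
        return None
    return symbols.get(target)


# Static analysis alone has no memory contents to look at.
print(resolve_indirect_call(0x03731000, {}, SYMBOLS))
# With the dump and symbols from the dynamic run, the call gets a name.
print(resolve_indirect_call(0x03731000, MEMORY_SNAPSHOT, SYMBOLS))
```

Because of techniques such as ASLR, a mapping like `SYMBOLS` is only valid for the specific execution it was captured from, which is why the memory snapshot and the symbol data have to come from the same run.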

Summary

In this article we outlined that today’s malware development is opening up new challenges for malware analysis systems. In the early days, simple static analysis byte patterns were enough to detect and classify malware. Then, as malware became more sophisticated, dynamic analysis systems that observed run-time behavior surfaced. Dynamic analysis systems have evolved and are a powerful tool today, but their impact is becoming more and more limited. Today, neither static nor dynamic analysis alone is an effective weapon against modern malware. Dynamic analysis environments are detected, malicious dormant code is not analyzed due to time constraints or unpredictable code flow behavior, or both. Using intelligent algorithms and Hybrid Analysis (HA) technologies, the best of both worlds can be combined: first-pass checks, analyzing/logging run-time behavior, as well as detecting and understanding dormant code functionality. In this article we showed that Hybrid Analysis is an answer, provided the captured run-time data is of sufficient quality and the static analysis engine is flexible enough to produce usable analysis results that can be post-processed to generate signatures or indicators.

About the Tools

In this article we focused on a static analysis engine called StaticStream. It is a product of Payload Security and makes automatic and efficient Hybrid Analysis available to dynamic analysis systems and analysts. Its easy interface, high configurability and flexible data stream processing architecture make it an interesting option to upgrade any dynamic analysis system for the challenges of today and tomorrow.

On the Web

More information on StaticStream is available on the web at www.payload-security.com.

About the author

Jan Miller is a specialist in static binary analysis algorithms, reverse engineering and malware signatures. He is the CEO and founder of Payload Security UG (haftungsbeschränkt). In the past two years, he has focused on Android-based malware, as well as implementing Hybrid Analysis technologies for a leading dynamic analysis system.

Table of Figures

Figure 1: Start Screen after Installing Windows XP and loading “matsnu” on the main drive.
Figure 2: Running “matsnu” from the Manager using the interactive mode.
Figure 3: Dynamic Data Folder.
Figure 4: Streams Folder File Listing.
Figure 5: Persistence using RegCreateKeyEx.
Figure 6: Persistence using Registry.
Figure 7: Encrypting Payload before C&C request.

Bibliography

Borglund, J. (2014, April). Top 5 Most Costly Viruses of All Time. Retrieved April 2014, from TopTen Reviews: http://anti-virus-software-review.toptenreviews.com/top-5-most-costly-viruses-of-all-time-pg5.html
Cuckoo Sandbox. (n.d.). Malwr - Malware Analysis by Cuckoo Sandbox. Retrieved June 24, 2014, from https://malwr.com/analysis/YjQzNzExNjcwMDQyNDBhMmJmOTFhN2Y4ODk5ZmQ0NGM/
Kendall, K. (2007). Practical Malware Analysis. Mandiant, Intelligent Information Security.
Nathaniel Ayewah, David Hovemeyer, J. David Morgenthaler, John Penix and William Pugh. (2008). Experiences Using Static Analysis to Find Bugs.
NetMarketShare. (2014, April). Desktop Operating System Market Share. Retrieved April 2014, from http://www.netmarketshare.com/
Payload Security. (n.d.). Payload-Security.com - Combining Static and Dynamic Analysis Intelligently. Retrieved June 24, 2014, from http://www.payload-security.com/
SecureList. (2014, April). Internet threats statistics. Retrieved April 2014, from SecureList: http://www.securelist.com/en/statistics#/en/map/oas/month
VirtualBox. (n.d.). Oracle VM VirtualBox. Retrieved June 24, 2014, from https://www.virtualbox.org/

Footnotes


[1] Executing malware on a prepared physical machine is possible as well, of course.
[2] Using a memory snapshot from a later point in time is possible as well, if the value remains unchanged.
[3] The “specific analysis” reference is important, because techniques such as ASLR (Address space layout randomization) cause system API function addresses to not be predictable. As such, we always need to understand detected dormant code in a process context of a specific execution environment.
[4] SHA256 e008e161cce090242262fc977b6fe707d3058cdaa3b5d5c3bab24c8c6b05ce9e