Wednesday, February 25, 2015

Why Hybrid Analysis is not a marketing joke, but a useful technology

In 5 minutes you will know why Hybrid Analysis is useful - and not a marketing joke.

The case

As usual, we were checking reports uploaded to our online malware analysis service. Yesterday, we came across a report of a sample* that is actually not that interesting: it is a typical dropper. The only significant aspect of the file at first sight is that it is relatively small (only ~14 KB) and tries to leave as few traces on the system as possible. Nevertheless, since everyone deserves a second chance, we decided to take a closer look and see if we could find something that we could turn into a generic signature for malicious behavior. Generic signatures are great, because they apply to a broad variety of malware, including new variants. We have seen a lot of uploaded samples that were previously unknown to e.g. VirusTotal, but exhibited a lot of malicious behavior. Anyway, let's dive into the sample.

The first thing I always do is take a look at the signatures that matched. Then, I usually take a look at the network connections and the process tree of the analyzed processes. The obligatory check of the Hybrid Analysis section that follows sometimes reveals quite interesting annotated disassembly listings (so-called "Streams"). Since we can build signatures that fire on any kind of data found in the report, we come across some goodies from time to time.

Hybrid Analysis in action

The following screenshot is taken from the heuristically determined "most relevant" function found with the Hybrid Analysis engine:


We can see a typical pattern used by malware authors to "hide" strings from string-searching algorithms: the string is built character by character and is often stored in a local variable on the stack. This is quite an effective method, because the final string is only assembled at runtime and never appears as one contiguous string in the file (even a process memory scan would not reveal the string, unless the stack/heap is snapshotted at just the right moment). Usually, these types of strings are API names that are passed to a GetProcAddress call to look up the associated virtual address.
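To illustrate why a plain string scan misses such a name, here is a minimal Python sketch. It is not taken from the sample: it assumes the characters are written with the x86 instruction `mov byte ptr [ebp+disp8], imm8` (encoded as C6 45 disp8 imm8), uses "VirtualAlloc" as an arbitrary example name, and all function names are hypothetical.

```python
import re


def build_stack_string_code(name: str, base_offset: int = -32) -> bytes:
    """Emit one `mov byte ptr [ebp+disp8], imm8` (C6 45 disp8 imm8) per character.

    Assumption: offsets stay negative (local variables), so keep
    len(name) <= -base_offset.
    """
    code = bytearray()
    for i, ch in enumerate(name.encode("ascii")):
        disp = (base_offset + i) & 0xFF  # two's-complement 8-bit displacement
        code += bytes([0xC6, 0x45, disp, ch])
    return bytes(code)


def ascii_strings(blob: bytes, min_len: int = 4) -> list:
    """Naive `strings`-style scan: runs of printable ASCII of at least min_len."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}")
    return pattern.findall(blob)


def recover_stack_string(code: bytes) -> str:
    """Replay the byte stores and read the characters back in stack order."""
    stack = {}
    i = 0
    while i + 4 <= len(code):
        if code[i] == 0xC6 and code[i + 1] == 0x45:
            raw = code[i + 2]
            disp = raw - 256 if raw >= 0x80 else raw  # sign-extend disp8
            stack[disp] = code[i + 3]
            i += 4
        else:
            i += 1
    return bytes(stack[d] for d in sorted(stack)).decode("ascii")


code = build_stack_string_code("VirtualAlloc")
print(ascii_strings(code))         # no printable run of 4+ characters
print(recover_stack_string(code))  # the name only appears after replaying the stores
```

In the encoded bytes, every character is separated from the next one by opcode and displacement bytes, so the scanner never sees more than one printable character in a row.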

Turn it into something useful

The idea we had is the following: if we detect a lot of single characters (say, more than 10) being pushed onto the stack and a reference to GetProcAddress/LdrGetProcedureAddress in the same function/context, then we can assume someone is trying to hide a procedure name lookup from string scanning engines. So we whipped up a signature that does exactly that. Here it is after updating our online service and re-running the sample:


As we can see, there are enough indicators to decide that the observed behavior is malicious. This generic signature will fire on any sample uploaded to our service that contains the same or a similar trick. If you are interested in the signature code itself and how it was implemented, please get in touch through our contact form.
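The actual signature code is not shown in this post. Purely as a rough sketch of the heuristic described above, the following Python snippet checks one function's disassembly for both conditions. The input format (one text line per instruction, immediates written as two hex digits followed by "h"), the function names and the regular expression are assumptions made for illustration and do not reflect the real report format or signature API.

```python
import re

RESOLVER_APIS = ("GetProcAddress", "LdrGetProcedureAddress")

# Assumed (hypothetical) listing format, e.g.:
#   mov byte ptr [ebp-0Ch], 56h
#   push 41h
CHAR_STORE = re.compile(
    r"^\s*(?:push|mov\s+byte\s+ptr\s+\[[^\]]+\]\s*,)\s+([0-9a-f]{2})h\s*$",
    re.IGNORECASE,
)


def count_char_stores(lines) -> int:
    """Count instructions that put a single printable ASCII byte on the stack."""
    count = 0
    for line in lines:
        match = CHAR_STORE.match(line)
        if match and 0x20 <= int(match.group(1), 16) <= 0x7E:
            count += 1
    return count


def hides_api_lookup(lines, min_chars: int = 10) -> bool:
    """True if more than min_chars character stores and a resolver API
    reference occur in the same function listing."""
    has_resolver = any(api in line for line in lines for api in RESOLVER_APIS)
    return has_resolver and count_char_stores(lines) > min_chars


# Hypothetical listing: a 12-character name built on the stack, then resolved.
listing = [
    f"mov byte ptr [ebp-{0x20 - i:02X}h], {ord(ch):02X}h"
    for i, ch in enumerate("VirtualAlloc")
]
listing += ["lea eax, [ebp-20h]", "push eax", "call ds:GetProcAddress"]
print(hides_api_lookup(listing))
```

Requiring both conditions in the same function keeps the check generic while avoiding hits on code that merely pushes a few small constants.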

Final Notes

In this blog post we learned that "Hybrid Analysis" (the combination of static analysis of memory dumps/binary files with dynamic runtime data/context information) can add valuable indicators that would otherwise never have been available. That is one of the reasons why VxStream Sandbox can extract more artifacts/indicators for behavior signatures to trigger on than most other systems on the market. This does not mean we think our system is the perfect solution, but the underlying technology is solid and we believe that we are developing our software in the right direction.

The full report for this sample: https://www.hybrid-analysis.com/sample/342f9acdb9b89e963761fea283daccf0c7cacaf513a46fd09d9cc89223b9d978/

*SHA256: 342f9acdb9b89e963761fea283daccf0c7cacaf513a46fd09d9cc89223b9d978