Sunday, August 16, 2015

About Dridex, decoding and deobfuscating VBE files, behavior signature triplets and other features

Decoding and deobfuscating embedded VBE files

We will start this blogpost by outlining what is, technologically speaking, probably the most exciting feature we added recently: VxStream Sandbox is now able to detect, extract, decode and deobfuscate VBE (encoded Visual Basic) macros from input samples. We are quite proud of this feature, because we are probably the first and only sandbox capable of doing so. We would like to demonstrate it on a sample that someone recently made us aware of: a Dridex variant (the hash / sample is available at the bottom) that comes in the form of a Windows shortcut file and carries an embedded VBE macro as part of its overlay. As we have heard, the sample does not yield good results on some 'APT industry leader' solutions. Anyway, our system will do the following:
  • Detect embedded VBE files
  • Carve them out as an 'extra file' for analysis
  • Decode the VBE file to a VBS file for later post-analysis
  • Launch the carved VBE file in addition to the input sample (in case the input sample fails to launch its payload)
  • Deobfuscate the decoded VBE file
  • Put all that information into the report and have it reflected as part of the Threat Score
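To make the detection and carving steps a bit more concrete: files produced by Microsoft's Script Encoder wrap the encoded script body between a `#@~^` start marker and a `^#~@` end marker, so an embedded VBE blob can be located by scanning the raw bytes (e.g. a shortcut file's overlay) for those markers. The following Python sketch shows only that carving step under this assumption. The function name `carve_vbe_blobs` is hypothetical, this is not our implementation, and the actual decoding (a character substitution scheme) is left out.

```python
from typing import List

# Script Encoder output is delimited by these markers:
# "#@~^" opens an encoded block and "^#~@" closes it.
VBE_START = b"#@~^"
VBE_END = b"^#~@"


def carve_vbe_blobs(data: bytes) -> List[bytes]:
    """Return every encoded script block found in `data`, markers included."""
    blobs = []
    pos = 0
    while True:
        start = data.find(VBE_START, pos)
        if start == -1:
            break  # no further start marker
        end = data.find(VBE_END, start + len(VBE_START))
        if end == -1:
            break  # unterminated block: nothing complete left to carve
        end += len(VBE_END)
        blobs.append(data[start:end])
        pos = end  # continue scanning after this block


    return blobs


# Hypothetical usage: carve blobs from a sample and write each one to disk.
# with open("sample.lnk", "rb") as f:
#     for i, blob in enumerate(carve_vbe_blobs(f.read())):
#         with open(f"extracted_{i}.vbe", "wb") as out:
#             out.write(blob)
```

Each carved blob can then be handed to a decoder and, independently, be launched on its own, which is why carving is done before any decoding.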
The steps an analyst would usually need to take to extract, decode and deobfuscate the macro (e.g. to obtain the malicious URL) are quite time-consuming, so seeing all of that happen automatically within minutes makes us quite happy. The following screenshots give just a brief excerpt of the most stunning parts of the report:





As can be seen, it is even possible to download the decoded *.vbs file for further analysis. Another interesting conclusion from this sample, especially if the actual payload is not executed, is that pure static analysis can be a very powerful tool when analyzing macros. Instrumenting VB execution and extracting data at runtime may be the more generic approach, but it always depends on a successful execution (i.e. what if the file doesn't run as expected?). That's why we believe in combining dynamic and static analysis techniques: something we like to describe as 'Hybrid Analysis'.
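As an illustration of what purely static deobfuscation of a decoded macro can look like, here is a minimal Python sketch that folds two common obfuscation patterns, `Chr(n)` calls and chains of string literals joined with `&` or `+`, back into plain literals. It is a simplified, hypothetical example and not the deobfuscation engine used in VxStream Sandbox; real-world macros use many more tricks (variables, `Replace`/`StrReverse` calls, arithmetic inside `Chr`), which this sketch deliberately ignores.

```python
import re

# Chr(104), chr( 116 ), ChrW(58): a call with a plain decimal argument.
CHR_CALL = re.compile(r"\bChrW?\(\s*(\d+)\s*\)", re.IGNORECASE)
# Two adjacent string literals joined by & or + (no embedded quotes).
CONCAT = re.compile(r'"([^"]*)"\s*[&+]\s*"([^"]*)"')


def _chr_to_literal(match):
    code = int(match.group(1))
    # Leave the quote character and codes outside 0-255 untouched, so this
    # sketch never has to deal with VBScript's "" escaping.
    if code == 34 or code > 255:
        return match.group(0)
    return '"' + chr(code) + '"'


def deobfuscate(vbs: str) -> str:
    """Fold Chr() calls and literal concatenations into plain strings."""
    out = CHR_CALL.sub(_chr_to_literal, vbs)
    # Each pass merges non-overlapping pairs only, so repeat until stable.
    while True:
        merged = CONCAT.sub(lambda m: '"' + m.group(1) + m.group(2) + '"', out)
        if merged == out:
            return out
        out = merged


obfuscated = 'url = Chr(104) & Chr(116) & "tp://" & "example" + ".com"'
print(deobfuscate(obfuscated))  # url = "http://example.com"
```

A rewriting approach like this never executes the macro, which is exactly the advantage mentioned above: it can still recover a URL even if the script would fail to run in the sandbox.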

Report (including download of sample): Here

Other progress

It is difficult to keep up with all the features we add to our webservice, because there is no published changelog. That's why every now and then we like to write a blogpost that gives some insights, but also recaps and archives the development progress we made for ourselves. Looking at our public webservice as a visitor, there are two places where you can indirectly see the development:

  • Version number on the front page
  • Total behavior signatures

The total number of behavior signatures has been rising constantly since we went online in late 2014. Whenever we find a new interesting sample, we check whether it shows malicious/suspicious behavior that can be turned into a generic and replicable signature. For example, we recently added a 'Sample was identified as malicious by a large number of Antivirus engines' signature in addition to the previous 'Sample was identified by at least one Antivirus engine'. The new signature carries a far higher weight in our internal 'Threat Score' calculation, because if 25% of 50+ AVs agree that a file is malicious, the chance of a false positive is quite low. While this isn't an example of a generic signature, it is a good example of the gradual and constant improvements that happen to our system all the time.
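Expressed in code, the two antivirus signatures described above could look roughly like the following Python sketch. The thresholds (at least 50 engines, at least 25% detections) are taken from the example in the text and are assumptions about the exact rule; the signature identifiers are hypothetical.

```python
from typing import Optional

# Assumed thresholds, taken from the "25% of 50+ AVs" example.
MIN_ENGINES = 50
MIN_RATIO = 0.25


def av_signature(detections: int, engines: int) -> Optional[str]:
    """Return the (hypothetical) name of the AV signature that matches, if any."""
    if engines <= 0 or detections <= 0:
        return None  # no scan result or no detection: no AV signature
    if engines >= MIN_ENGINES and detections / engines >= MIN_RATIO:
        # Strong signal: many independent engines agree.
        return "av_detected_by_many_engines"
    # Weak signal: a single engine may well be a false positive.
    return "av_detected_by_at_least_one_engine"


print(av_signature(14, 55))  # 14/55 is about 25.5%
print(av_signature(3, 55))
```

Keeping both signatures, rather than replacing the old one, lets a scoring function weight the weak and the strong signal differently.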

Incident Response Section

After getting some feedback from incident responders, we decided to add a new section called 'Incident Response' that contains a 'Risk Assessment' and a 'Network' area. The 'Risk Assessment' area displays broader categories (such as 'Spyware/Leak') depending on whether a signature or a combination of signatures matched (configured internally). The idea behind it is to answer the question 'How worried should I be?' (e.g. if the submitter knows an information-leaking file was executed on a computer in the finance department).
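A minimal sketch of such an internally configured mapping, assuming a category fires as soon as any one of its configured signature combinations is fully matched. Apart from 'Spyware/Leak', the category and all signature identifiers below are hypothetical placeholders, not our actual configuration.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical rule table: category -> list of signature combinations.
RISK_RULES: Dict[str, List[FrozenSet[str]]] = {
    "Spyware/Leak": [
        frozenset({"captures_keystrokes", "sends_data_to_remote_host"}),
        frozenset({"reads_browser_credentials"}),
    ],
    "Persistence": [
        frozenset({"sets_autostart_registry_key"}),
    ],
}


def risk_categories(matched: Set[str]) -> List[str]:
    """Categories for which at least one combination is a subset of `matched`."""
    return sorted(
        category
        for category, combos in RISK_RULES.items()
        if any(combo <= matched for combo in combos)
    )


print(risk_categories({"captures_keystrokes", "sends_data_to_remote_host"}))
```

Requiring a whole combination, rather than a single signature, keeps e.g. a lone keystroke-capturing API from flagging every installer as spyware.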


The 'Network' area is a summary of what you would find in the 'Network Traffic' section, allowing a quick response based on the IPs and domain names. Overall, it does not contain more information than you could gather by sifting through the report, but it can save some time at first glance. This is still a work in progress.
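The kind of condensation the 'Network' area performs can be sketched as follows: collapse the raw traffic events into de-duplicated, sorted lists of contacted IPs and domains. The event field names (`dst_ip`, `domain`) and the function name are assumptions made for this illustration.

```python
import ipaddress
from typing import Dict, Iterable, List


def _ip_key(addr: str):
    # Sort numerically (IPv4 before IPv6) instead of lexicographically.
    ip = ipaddress.ip_address(addr)
    return (ip.version, int(ip))


def summarize_network(events: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Collapse raw traffic events into unique, sorted hosts and domains."""
    ips, domains = set(), set()
    for event in events:
        if event.get("dst_ip"):
            ips.add(event["dst_ip"])
        if event.get("domain"):
            # Normalize case and a trailing root dot so duplicates collapse.
            domains.add(event["domain"].lower().rstrip("."))
    return {"hosts": sorted(ips, key=_ip_key), "domains": sorted(domains)}
```

The output is deliberately small: it is meant to be pasted into a blocklist or firewall rule, not to replace the full traffic log.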

Platform Intelligence Section

The 'Platform Intelligence' section is also new and may appear on malicious reports. It is the beginning of a broader development agenda in which we want to learn about a file by comparing/associating its data with data from other reports on the platform. As the database grows (we have about 30k reports online right now), there will be more and more useful applications.

The first feature implemented as part of the 'Platform Intelligence' section is the 'Report Behavior Comparison' section, which, under the hood, is quite effective at determining whether a file is malicious, provided the report database is large and diverse. What we noticed was the following: a single behavior signature (e.g. 'Contains ability to retrieve keyboard strokes') is often not a strong enough indicator to make a verdict about a file (think of an installer, which is often packed, drops files, shows network activity, sets an autostart registry key, etc.). When you look at certain combinations of signatures, though (e.g. 'Contains ability to retrieve keyboard strokes' AND 'Writes data to a remote process' AND ...), and check each combination against every report (benign or malicious) in the entire database, it is possible to isolate signature combinations that are unique to malware. Signature combinations could also be used to classify malware, but we have not gone that far yet.

Anyway, what we can say is: the larger the signature tuples, the higher the confidence, but also the more specific they become to certain malware families. Again, this is still a work in progress, but what is nice about the implementation is that we calculate all tuples on-the-fly based on a live snapshot of all triplets of all malicious reports in the database (i.e. if you refresh a report the next day, you might see different data). This feature was a research topic of one of our main developers some years ago, and because it is still a work in progress and relatively experimental, the results of the section are not added to the 'Threat Score', but are just displayed as an addition to the rest of the report.
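The core idea behind the comparison can be sketched in a few lines of Python: enumerate every 3-signature combination (triplet) of each report, and keep those triplets that occur in at least one malicious report but in no benign one. This is a simplified illustration under those assumptions, with hypothetical function and signature names; it is not the production implementation, which works on a live snapshot of the report database.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Triplet = Tuple[str, str, str]


def triplets(signatures: Iterable[str]) -> Set[Triplet]:
    """All 3-signature combinations of one report, in canonical (sorted) order."""
    return set(combinations(sorted(set(signatures)), 3))


def malware_only_triplets(reports: Iterable[Tuple[bool, Set[str]]]) -> Dict[Triplet, int]:
    """Map each triplet seen only in malicious reports to its report count."""
    malicious = Counter()
    benign: Set[Triplet] = set()
    for is_malicious, signatures in reports:
        report_triplets = triplets(signatures)
        if is_malicious:
            malicious.update(report_triplets)
        else:
            benign |= report_triplets
    return {t: n for t, n in malicious.items() if t not in benign}


def matching_triplets(signatures: Iterable[str], malware_only: Dict[Triplet, int]) -> List[Triplet]:
    """Triplets of a new report that so far only occurred in malware."""
    return sorted(t for t in triplets(signatures) if t in malware_only)


# Hypothetical mini database: (is_malicious, matched signatures).
reports = [
    (True, {"keylogger", "remote_write", "autostart"}),
    (True, {"keylogger", "remote_write", "autostart", "packed"}),
    (False, {"packed", "drops_files", "autostart", "network"}),
    (False, {"keylogger", "autostart", "packed"}),
]
db = malware_only_triplets(reports)
print(matching_triplets({"keylogger", "remote_write", "autostart", "network"}, db))
```

Sorting each report's signatures before building combinations ensures the same three signatures always yield the same tuple, regardless of the order in which they matched. Note that the number of triplets per report grows cubically with the number of matched signatures (n·(n-1)·(n-2)/6), so computing them on-the-fly needs some care for reports with many signatures.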