Using Google Threat Intelligence and Code Interpreter to Empower Gemini for Malware Analysis
What is Code Interpreter?
A tool that converts human-readable code into commands that a computer can understand and carry out is called a code interpreter.
What is code obfuscation?
A method called “code obfuscation” makes it more difficult to understand or reverse engineer source code. It is frequently used to hide data in software programs and safeguard intellectual property.
Giving security experts up-to-date tools to help them fend off the newest attacks is one of Google Cloud‘s main goals. Moving toward a more autonomous, adaptive approach to threat intelligence automation is one aspect of that aim.
As part of its most recent developments in malware research, it is giving Gemini new tools to tackle obfuscation strategies and get real-time information on indicators of compromise (IOCs). While Google Threat Intelligence (GTI) function calling allows Gemini to query GTI for more context on URLs, IPs, and domains found within malware samples, the Code Interpreter extension allows Gemini to dynamically create and run code to help obfuscate specific strings or code sections. By improving its capacity to decipher obfuscated parts and obtain contextual information depending on the particulars of each sample, these tools represent a step toward making Gemini a more versatile malware analysis tool.
Building on this, Google previously examined important preprocessing procedures using Gemini 1.5 Pro, which allowed us to analyze large portions of decompiled code in a single pass by utilizing its large 2-million-token input window. To address specific obfuscation strategies, it included automatic binary unpacking using Mandiant Backscatter before the decompilation phase in Gemini 1.5 Flash, which significantly improved scalability. However, as any experienced malware researcher is aware, once the code is made public, the real difficulty frequently starts. Obfuscation techniques are commonly used by malware developers to hide important IOCs and underlying logic. Additionally, malware may download more dangerous code, which makes it difficult to completely comprehend how a particular sample behaves.
Obfuscation techniques and additional payloads pose special issues for large language models (LLMs). Without specific decoding techniques, LLMs frequently “hallucinate” when working with obfuscated strings like URLs, IPs, domains, or file names. Furthermore, LLMs are unable to access URLs that host extra payloads, for instance, which frequently leads to speculative conclusions regarding the behavior of the sample.
Code Interpreter and GTI function calling tools offer focused ways to assist with these difficulties. With the help of Code Interpreter, Gemini may independently write and run bespoke scripts as necessary. It can use its own discretion to decode obfuscated elements in a sample, including strings encoded using XOR-based methods. This feature improves Gemini’s capacity to uncover hidden logic without the need for human participation and reduces interpretation errors.
By obtaining contextualized data from Google Threat Intelligence on dubious external resources like URLs, IP addresses, or domains, GTI function calling broadens Gemini’s scope while offering validated insights free from conjecture. When combined, these tools enable Gemini to better manage externally hosted or obfuscated data, moving it closer to its objective of operating as an independent malware analysis agent.
Here’s a real-world example to show how these improvements expand Gemini’s potential. Here, we are examining a PowerShell script that hosts a second-stage payload via an obfuscated URL. Some of the most sophisticated publicly accessible LLM models, which include code generation and execution in their reasoning process, have already been used to analyze this specific sample. Each model “hallucinated,” producing whole fake URLs rather than correctly displaying the correct one, in spite of these capabilities.
Without human assistance, Gemini produced the following report on its own using Code Interpreter and GTI function calling as part of its reasoning process. It uses these techniques to process and extract more information from the sample when it is thought necessary.
Final Report: PowerShell Malware with URL DeobfuscationThis report analyzes a PowerShell script designed to download and execute a file from an obfuscated URL. The script employs a custom obfuscation routine resembling the RC4 stream cipher to conceal the malicious URL.Functionality:Obfuscation Routine: The script defines a function F that performs byte-level XOR encryption using a key derived from the string “tox2”. This function is used to deobfuscate the malicious URL.URL Decryption: The obfuscated URL, stored as a byte array, is decrypted using the F function with “tox2” as the key.File Download: The script downloads a file from the decrypted URL (hxxps://filedn[.]eu/lODWTgN8sswHA6Pn8HXWe1J/tox2/Scan_docs%2398097960[.]msi ) and saves it to the system’s temporary directory under a randomly generated filename.Execution: Finally, the script executes the downloaded file.Indicators of Compromise (IOCs):URL: hxxps://filedn[.]eu/lODWTgN8sswHA6Pn8HXWe1J/tox2/Scan_docs%2398097960[.]msi Deobfuscation Code:def deobfuscate(x, y): s = list(range(256)) j = 0 k = [ord(c) for c in y] for i in range(256): j = (j + s[i] + k[i % len(k)]) % 256 s[i], s[j] = s[j], s[i] i = 0 j = 0 output = [] for m in range(len(x)): i = (i + 1) % 256 j = (j + s[i]) % 256 s[i], s[j] = s[j], s[i] output.append(x[m] ^ s[(s[i] + s[j]) % 256]) return output a = [0x25, 0x7B, 0x5A, 0xD8, 0x9C, 0xFF, 0xC3, 0xA4, 0x95, 0xB7, 0x9D, 0x6E, 0x5F, 0x4D, 0x49, 0xFB, 0x78, 0x32, 0xB8, 0x17, 0x72, 0x01, 0x31, 0xFB, 0xC0, 0x58, 0x45, 0xCB, 0x5C, 0x3C, 0xDD, 0x00, 0x60, 0x87, 0x25, 0xB2, 0x44, 0xA6, 0xC5, 0xEA, 0x71, 0xAA, 0x9D, 0xA1, 0x05, 0x5F, 0xBD, 0x41, 0x28, 0x70, 0x30, 0x6B, 0x96, 0x24, 0x92, 0xA0, 0x9D, 0x92, 0x23, 0x79, 0x69, 0x2C, 0x1B, 0x1E, 0xEB, 0x69, 0x8F, 0x04, 0x10, 0x74, 0xBC] b = "tox2" c = deobfuscate(a, b) decoded_url = "".join(chr(i) for i in c) print(decoded_url) Google Threat Intelligence (GTI) Lookup:The GTI lookup provides valuable context regarding the identified IOC:Threat Actor: UNC5687, known for using MESHAGENT in phishing campaigns impersonating the Security Service of Ukraine.Campaign: This specific URL is linked to a campaign spreading MESHAGENT via phishing emails with malicious links. The campaign involved a captcha page leading to the download of either a malicious MSI or EXE file.Malware: The campaign delivers MESHAGENT, an open-source remote access framework, configured to communicate with C2 domains linked to a service called “AnonVNC.”Conclusion:The analyzed PowerShell script is confirmed to be malicious, associated with a phishing campaign by UNC5687 distributing the MESHAGENT remote access tool. |
Gemini discovered that the script hides the download URL using an RC4-like XOR-based obfuscation method. Gemini recognizes this pattern and uses the Code Interpreter sandbox to automatically create and run a Python deobfuscation script, successfully exposing the external resource.
After obtaining the URL, Gemini queries Google Threat Intelligence for more context using GTI function calling. According to this study, the URL is associated with UNC5687, a threat cluster that is well-known for deploying a remote access tool in phishing attacks that pose as the Ukrainian Security Service.
As demonstrated, the incorporation of these tools has improved Gemini’s capacity to operate as a malware analyst that can modify its methodology to tackle obfuscation and obtain crucial information about IOCs. Gemini is better able to handle complex samples by integrating the Code Interpreter and GTI function calling, which allow it to contextualize external references and comprehend hidden aspects on its own.
Even while these are important developments, there are still a lot of obstacles to overcome, particularly in light of the wide variety of malware and threat situations. Google Cloud is dedicated to making consistent progress, and the next upgrades will further expand Gemini’s capabilities, bringing us one step closer to a threat intelligence automation strategy that is more independent and flexible.