The rapid growth of malware is outpacing manual analysis, underscoring the need for automation and new technologies. Generative AI models are useful for some malware analysis tasks, but they struggle with large and complicated samples. Gemini 1.5 Pro, which can process up to 1 million tokens, is a breakthrough. Its larger context window lets AI help automate malware analysis workflows and scale up automated code analysis. By significantly increasing processing capacity, Gemini 1.5 Pro helps analysts manage the overwhelming volume of threats and enables a more adaptable and robust approach to cybersecurity.
Traditional Automated Malware Analysis Methods
Static and dynamic analysis are essential to understanding malware behaviour and underpin automated malware analysis. Static analysis examines a sample without executing it, revealing its code structure and any logic that is not obfuscated. Dynamic analysis, in contrast, runs the sample in a controlled environment and observes its behaviour regardless of obfuscation. Used together, the two methods give a more complete understanding of malware.
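As a small illustration of static analysis, the sketch below pulls printable strings out of raw bytes without executing anything, a common first step when triaging a binary. The function name `extract_strings` and the sample bytes are assumptions made up for this example; they are not taken from any tool mentioned in this post.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII characters at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]

# Made-up bytes for illustration only; this is not a real sample.
blob = b"\x00\x01MZ\x90\x00tasksche.exe\x00\xff\xfec.wnry\x00ab\x00"
print(extract_strings(blob))  # ['tasksche.exe', 'c.wnry']
```

Short runs such as "MZ" and "ab" fall below the minimum length and are dropped. Obfuscated or packed samples can hide their strings from this kind of inspection, which is where dynamic analysis helps.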
Alongside these methods, AI and ML are being used to categorise and cluster malware by behaviour, signatures, and anomalies. The techniques include supervised learning, which trains models on labelled datasets, and unsupervised learning for clustering, which groups malware by patterns without labels.
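The difference between the two learning styles can be sketched in a few lines of plain Python. This is a toy under stated assumptions: the two-number feature vectors, the labels `family_a` and `family_b`, and the helper names are all hypothetical, and a nearest-centroid classifier and a basic k-means stand in for whatever models a production system would use.

```python
from math import dist
from statistics import mean

Vector = tuple[float, ...]

def centroid(points: list[Vector]) -> Vector:
    """Component-wise mean of a list of feature vectors."""
    return tuple(mean(axis) for axis in zip(*points))

def train_nearest_centroid(samples: list[Vector], labels: list[str]) -> dict[str, Vector]:
    """Supervised: learn one centroid per label from labelled samples."""
    by_label: dict[str, list[Vector]] = {}
    for vec, label in zip(samples, labels):
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(model: dict[str, Vector], vec: Vector) -> str:
    """Assign the label whose centroid is closest to vec."""
    return min(model, key=lambda label: dist(model[label], vec))

def kmeans(samples: list[Vector], k: int, iterations: int = 10) -> list[int]:
    """Unsupervised: group samples into k clusters without using labels."""
    centres = list(samples[:k])  # deterministic start: the first k samples
    assignment = [0] * len(samples)
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda i: dist(centres[i], v)) for v in samples]
        for i in range(k):
            members = [v for v, a in zip(samples, assignment) if a == i]
            if members:
                centres[i] = centroid(members)
    return assignment

# Hypothetical features per sample, e.g. (byte entropy, share of packed sections).
samples = [(7.8, 0.9), (4.1, 0.1), (7.6, 0.8), (4.4, 0.2)]
labels = ["family_a", "family_b", "family_a", "family_b"]

model = train_nearest_centroid(samples, labels)
print(predict(model, (7.5, 0.7)))  # family_a
print(kmeans(samples, k=2))        # [0, 1, 0, 1]
```

The supervised model needs the labels to learn anything, while the clustering step recovers the same two groups from the feature values alone. Neither says anything about a sample that resembles no known group.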
Despite these advances, the growing complexity and volume of malware remain major challenges. ML improves the detection of variants of known malware but often falls short against entirely new threats. This detection gap lets sophisticated attacks slip past defences, undermining system security.
Generative AI as a Malware Analysis Assistant
A significant step forward for generative AI (gen AI) in malware analysis was the launch of Code Insight at the RSA Conference 2023. This component of Google’s VirusTotal platform analyses code snippets and generates natural-language reports, much as a malware researcher would. Code Insight first supported PowerShell scripts, then added Batch, Shell, VBScript, and Office files.
Code Insight helps analysts understand code behaviour and attack strategies by digesting code and producing summary reports. This includes uncovering hidden functionality, malicious intent, and attack paths that typical detection approaches may miss.
Code Insight could only handle files up to a certain size because of the limited token input capacity of LLMs. Despite continual work to raise the maximum file size and support new formats, analysing binaries and executables remains difficult: once disassembled or decompiled, their code usually exceeds what the LLM can process. As a result, current AI models have mostly assisted human analysts by analysing code fragments from binaries rather than the complete code.
Reverse Engineering: The Human Side of Malware Analysis
Reverse engineering is probably the most advanced malware analysis method available to cybersecurity specialists. It involves disassembling malicious binaries and carefully examining the resulting code to determine the malware’s functionality and execution flow. The approach has drawbacks, however. Reconstructing the malware’s logic demands a great deal of time, expertise, and an analytical mindset, because the analyst must understand each instruction, data structure, and function call.
Reverse engineering is also difficult to scale, largely because specialised talent in this field is scarce. Because the work is complicated and time-consuming, the cybersecurity sector has long sought ways to make it easier.
Gemini 1.5 Pro: Scaling Reverse Engineering for Malware Analysis
The ability to process prompts of up to 1 million tokens is a significant advance for malware analysis, especially reverse engineering. It finally allows gen AI to analyse binaries and executables, a demanding task formerly reserved for highly skilled human analysts.
How does Gemini 1.5 Pro do this?
Increased capacity
Thanks to its larger token capacity, Gemini 1.5 Pro can analyse some disassembled or decompiled executables in a single pass, without breaking the code into fragments. This is crucial because fragmented code can lose context and the links between different parts of the program. Working from small fragments makes it hard to understand the malware’s overall functionality and behaviour, so its purpose and inner workings may be missed. By analysing the entire code at once, Gemini 1.5 Pro can produce a more accurate and complete analysis.
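To make the fragmentation problem concrete, here is a minimal sketch that splits some made-up decompiler-style output into chunks under a token budget and checks where a function is mentioned. Everything in it is an assumption for illustration: the pseudo-C lines and their function names are invented, and counting whitespace-separated words is only a crude stand-in for a real tokenizer.

```python
def chunk_by_budget(lines: list[str], max_tokens: int) -> list[list[str]]:
    """Greedily pack lines into chunks that stay within max_tokens each."""
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for line in lines:
        cost = len(line.split())  # crude token estimate (assumption)
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append(current)
    return chunks

def chunks_mentioning(chunks: list[list[str]], name: str) -> list[int]:
    """Indices of the chunks in which the identifier appears."""
    return [i for i, chunk in enumerate(chunks) if any(name in line for line in chunk)]

# Invented pseudo-C in the style of decompiler output; not from a real sample.
code = [
    "int sub_401000(char *path) {",
    "  return encrypt_file(path);",
    "}",
    "void sub_402000(void) {",
    "  scan_directory();",
    "}",
    "int main(void) {",
    "  sub_402000();",
    "  return sub_401000(target);",
    "}",
]

small = chunk_by_budget(code, max_tokens=8)
whole = chunk_by_budget(code, max_tokens=1000)
print(len(small), chunks_mentioning(small, "sub_401000"))  # 3 [0, 2]
print(len(whole), chunks_mentioning(whole, "sub_401000"))  # 1 [0]
```

With the small budget, the definition of `sub_401000` lands in one chunk and its call site in another, so a model that sees one chunk at a time cannot connect the two. With a budget large enough for the whole listing, both appear in the same prompt.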
Code interpretation
Gemini 1.5 Pro interprets the intent and purpose of code rather than just matching patterns or similarities. This is possible because it was trained on a vast dataset that includes assembly language for diverse architectures, high-level languages such as C, and decompiler pseudo-code. Its broad knowledge of operating systems, networking, and cybersecurity lets it emulate the reasoning and judgement of a malware analyst. It can therefore anticipate malware behaviour and offer insight into previously unseen threats. See the zero-day case study later in this post for more.
Detailed analysis
Gemini 1.5 Pro generates human-readable summary reports, making analysis easier and faster. These go beyond the simple categorisation and clustering results of classic machine learning algorithms. The reports can cover malware functionality, behaviour, potential attack paths, and indicators of compromise (IOCs) that can be fed into other security systems to improve threat detection and prevention.
A practical case study shows how Gemini 1.5 Pro analyses decompiled code, using a representative malware sample. The researchers automatically decompiled two WannaCry binaries using Hex-Rays, without adding annotations or context. This produced two C code files, 268 KB and 231 KB in size, amounting to over 280,000 tokens for the LLM to process.
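A quick back-of-the-envelope check puts these figures in context. The sketch below assumes that the token count covers both files combined, that 1 KB means 1,024 bytes, and that 280,000 is a lower bound (the text says "over"); the function name is hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # prompt size quoted for Gemini 1.5 Pro
FILE_SIZES_KB = (268, 231)         # the two decompiled C files
REPORTED_TOKENS = 280_000          # lower bound: the text says "over 280,000"

def budget_summary(sizes_kb, tokens, window):
    """Return (total bytes, bytes per token, share of the context window)."""
    total_bytes = sum(sizes_kb) * 1024  # assumes 1 KB = 1,024 bytes
    return total_bytes, total_bytes / tokens, tokens / window

total_bytes, bytes_per_token, window_share = budget_summary(
    FILE_SIZES_KB, REPORTED_TOKENS, CONTEXT_WINDOW_TOKENS
)
print(total_bytes)                # 510976
print(round(bytes_per_token, 2))  # 1.82
print(f"{window_share:.0%}")      # 28%
```

Under these assumptions the decompiled code averages at most about 1.8 bytes per token and uses a little over a quarter of the 1 million token window, which is why both files can be submitted in a single prompt.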
In testing with other similar gen AI tools, the code had to be split into fragments, which often made the analysis incomplete and ambiguous. These limitations show how difficult it is to use such tools on complex code bases.
Gemini 1.5 Pro overcomes these limits. It processes all of the decompiled code in a single pass and delivers its analysis in 34 seconds. Its initial summary is accurate and shows its ability to handle large and complex inputs:
- Delivers a malicious verdict and classifies the sample as ransomware.
- Identifies files as IOCs, including c.wnry and tasksche.exe.
- Notes the use of an algorithm that generates IP addresses and scans the network for targets on port 445/SMB in order to infect other systems.
- Finds WannaCry’s “killswitch” URL/domain, along with a registry key and a mutex.
Gemini 1.5 Pro’s WannaCry report isn’t based on pre-trained knowledge of this malware. The analysis comes from the model’s own interpretation of the code. This broader capability will become clearer in the later examples, where Gemini 1.5 Pro analyses novel malware samples.
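Findings like these are easier to pass to other security systems when they are held in a structured record. The sketch below is one hypothetical way to do that: the `AnalysisSummary` class and its field names are assumptions for illustration, not the format Gemini 1.5 Pro produces. Values that this post does not spell out (the killswitch domain, registry key, and mutex) are deliberately left unset rather than guessed.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

@dataclass
class AnalysisSummary:
    """Hypothetical container for the findings listed in a summary report."""
    verdict: str
    malware_type: str
    ioc_files: list[str] = field(default_factory=list)
    scanned_port: Optional[int] = None
    scanned_protocol: Optional[str] = None
    killswitch_domain: Optional[str] = None  # value not given in this post
    registry_key: Optional[str] = None       # value not given in this post
    mutex: Optional[str] = None              # value not given in this post

    def known_fields(self) -> dict:
        """Only the fields that actually carry a value."""
        return {
            key: value
            for key, value in asdict(self).items()
            if value is not None and value != []
        }

summary = AnalysisSummary(
    verdict="malicious",
    malware_type="ransomware",
    ioc_files=["c.wnry", "tasksche.exe"],
    scanned_port=445,
    scanned_protocol="SMB",
)

print(json.dumps(summary.known_fields(), indent=2))
```

Keeping unknown fields as `None` makes it obvious to downstream tooling which indicators still need to be filled in from the full report.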
Malware Details
The following table lists the malware samples used in this post:
Filename | SHA-256 Hash | Size | First Seen | File Type |
lhdfrgui.exe (WannaCry dropper) | 24d004a104d4d54034dbcffc2a4b19a11f39008a575aa614ea04703480b1022c | 3.55 MB (3723264 bytes) | 2017-05-12 | Win32 EXE |
tasksche.exe (WannaCry cryptor) | ed01ebfbc9eb5bbea545af4d01bf5f1071661840480439c6e5babe8e080e41aa | 3.35 MB (3514368 bytes) | 2017-05-12 | Win32 EXE |
EXEC.exe | 1917ec456c371778a32bdd74e113b07f33208740327c3cfef268898cbe4efbfe | 306.50 KB (313856 bytes) | 2022-04-18 | Win32 EXE |
medui.exe | 719b44d93ab39b4fe6113825349addfe5bd411b4d25081916561f9c403599e50 | 833.50 KB (853504 bytes) | 2024-03-27 | Win32 EXE |
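Before using table entries like these to look up samples, it is worth sanity-checking them. The sketch below, with hypothetical helper names, confirms that each hash is a well-formed SHA-256 value (64 hexadecimal characters once any stray whitespace is removed) and that the human-readable size agrees with the byte count, assuming binary units (1 KB = 1,024 bytes).

```python
import re

# (filename, SHA-256, size in bytes, size as displayed) from the table above.
SAMPLES = [
    ("lhdfrgui.exe", "24d004a104d4d54034dbcffc2a4b19a11f39008a575aa614ea04703480b1022c", 3723264, "3.55 MB"),
    ("tasksche.exe", "ed01ebfbc9eb5bbea545af4d01bf5f1071661840480439c6e5babe8e080e41aa", 3514368, "3.35 MB"),
    ("EXEC.exe", "1917ec456c371778a32bdd74e113b07f33208740327c3cfef268898cbe4efbfe", 313856, "306.50 KB"),
    ("medui.exe", "719b44d93ab39b4fe6113825349addfe5bd411b4d25081916561f9c403599e50", 853504, "833.50 KB"),
]

def normalise_hash(value: str) -> str:
    """Drop whitespace (for example from line wrapping) and lower-case."""
    return "".join(value.split()).lower()

def is_sha256(value: str) -> bool:
    """True if the value is 64 hexadecimal characters after normalising."""
    return re.fullmatch(r"[0-9a-f]{64}", normalise_hash(value)) is not None

def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units, to two decimal places."""
    if num_bytes >= 1024 ** 2:
        return f"{num_bytes / 1024 ** 2:.2f} MB"
    return f"{num_bytes / 1024:.2f} KB"

for name, sha256, size_bytes, shown in SAMPLES:
    ok = is_sha256(sha256) and human_size(size_bytes) == shown
    print(f"{name}: {'ok' if ok else 'check this row'}")
```

This only checks that the entries are well formed and internally consistent; it does not confirm that a hash belongs to the named file.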