Monday, May 27, 2024

How Gemini 1.5 Pro Makes Malware Analysis Effortless

The rapid growth of malware is outpacing manual analysis, emphasising the need for automation and new technologies. Generative AI models are useful for some malware analysis tasks, but they struggle with large and complicated samples. Gemini 1.5 Pro, which can process up to 1 million tokens, changes this. That capacity lets AI automate more of the malware analysis workflow and scale up automated code analysis. By significantly increasing processing capacity, Gemini 1.5 Pro helps analysts manage the overwhelming volume of threats and supports a more adaptable and robust approach to cybersecurity.

Traditional Automated Malware Analysis Methods

Static and dynamic analysis are essential to understanding malware behaviour and underpin automated malware analysis. Static analysis examines a sample without executing it, revealing its code structure and any logic that is not obfuscated. Dynamic analysis, in contrast, runs the malware in a controlled environment and observes its behaviour, regardless of obfuscation.
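As a small illustration of static analysis, the sketch below pulls printable strings out of raw bytes without executing anything. The function name, the minimum length, and the byte blob are all assumptions made for this example; the blob is synthetic, not a real sample.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# Synthetic bytes standing in for a binary (hypothetical, not a real sample).
blob = b"\x00\x01MZ\x90\x00cmd.exe /c start\x00\xff\xfehttp://example.test/a\x00ab\x00"

strings = extract_strings(blob)
for s in strings:
    print(s)
```

Short runs such as "MZ" fall below the minimum length and are dropped, which keeps the output focused on strings long enough to be meaningful.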

Alongside these methods, AI and ML are being used to categorise and cluster malware by behaviour, signatures, and anomalies. These methods include supervised learning, which trains models on labelled datasets, and unsupervised learning for clustering, which groups malware by patterns without labels.
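The unsupervised side of this can be sketched with a very simple grouping rule: samples whose observed behaviours overlap enough are placed in the same cluster. The behaviour labels, the Jaccard threshold, and the greedy grouping strategy are assumptions chosen for brevity; production systems use far richer features and algorithms.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(samples: dict[str, set[str]], threshold: float = 0.5) -> list[list[str]]:
    """Greedy clustering: join the first group whose representative is similar enough."""
    clusters: list[list[str]] = []
    for name, features in samples.items():
        for group in clusters:
            # The first member of each group acts as its representative.
            if jaccard(features, samples[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical behaviour sets, invented for illustration only.
samples = {
    "sample_a": {"encrypts_files", "deletes_shadow_copies", "scans_smb"},
    "sample_b": {"encrypts_files", "deletes_shadow_copies", "drops_ransom_note"},
    "sample_c": {"logs_keystrokes", "exfiltrates_http"},
    "sample_d": {"logs_keystrokes", "exfiltrates_http", "takes_screenshots"},
}

groups = cluster(samples)
print(groups)
```

No labels are supplied here: the two ransomware-like samples and the two spyware-like samples end up together purely because their behaviour sets overlap.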

Despite technological advances, the complexity and volume of malware remain major concerns. ML improves the detection of variants of known malware but is less effective against entirely new threats. This detection gap lets advanced attacks slip past defences and undermines system security.

Generative AI as a Malware Analysis Assistant

Generative AI (gen AI) for malware analysis took a step forward with Code Insight, unveiled at the RSA Conference 2023. This component of Google’s VirusTotal platform analyses code snippets and generates natural language reports, much as a malware researcher would. Code Insight first supported PowerShell scripts, then added Batch, Shell, VBScript, and Office files.

Code Insight helps analysts understand code behaviour and attack strategies by digesting code and creating summary reports. This involves discovering hidden functionality, malevolent intent, and attack paths that typical detection approaches may miss.

Code Insight could only handle files up to a certain size because of the limited number of input tokens an LLM can accept. Despite continual work to raise the maximum file size and support new formats, analysing binaries and executables remains difficult: once disassembled or decompiled, their code usually exceeds what the LLM can process. As a result, AI models have so far mostly assisted human analysts by analysing code fragments from binaries rather than the complete code.
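A rough way to see the constraint is to estimate a token count from the text length and compare it with a context limit. The four-characters-per-token ratio, the reserved output budget, and the two limits below are assumptions for illustration; real tokenisers vary, and code often tokenises less efficiently than prose.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate from character count (heuristic, not a real tokeniser)."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(
    text: str,
    context_limit: int,
    reserved_for_output: int = 2048,
    chars_per_token: float = 4.0,
) -> bool:
    """True if the estimated prompt plus an output budget fits within the limit."""
    return estimate_tokens(text, chars_per_token) + reserved_for_output <= context_limit


# Synthetic stand-in for decompiled output: one small function repeated many times.
decompiled = "int sub_401000(void) { return 0; }\n" * 20000

for limit in (32_000, 1_000_000):  # hypothetical small and large context limits
    print(limit, fits_in_context(decompiled, limit))
```

Under these assumptions the same input overflows the smaller limit and fits comfortably within the larger one, which is the difference the article is describing.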

Reverse Engineering: The Human Side of Malware Analysis

Reverse engineering is probably the most advanced malware analysis method available to cybersecurity specialists. It involves disassembling malicious binaries and carefully examining the code, which lets analysts determine the malware’s functionality and execution flow. The approach has drawbacks, however. Reconstructing the malware’s logic demands a great deal of time, knowledge, and an analytical mindset to comprehend each instruction, data structure, and function call.

Scaling reverse engineering is difficult, largely because specialised talent in this field is scarce. Because the work is complicated and time-consuming, the cybersecurity sector has long sought ways to make it easier.

Gemini 1.5 Pro: Scaling Reverse Engineering for Malware Analysis

The ability to process prompts of up to 1 million tokens benefits malware analysis, and reverse engineering in particular. It finally allows gen AI to analyse binaries and executables, a challenging task formerly reserved for highly skilled human analysts.

How does Gemini 1.5 Pro do this?

Increased capacity

Thanks to its larger token capacity, Gemini 1.5 Pro can analyse some disassembled or decompiled executables in a single pass, without splitting the code into pieces. This matters because fragmented code can lose context and the links between parts of the program. Small fragments make it hard to understand the malware’s overall functionality and behaviour, so its purpose may be missed. By analysing the entire code at once, Gemini 1.5 Pro can produce a more accurate and complete analysis.
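The loss of context from fragmentation can be shown with a toy example: when code is split into fixed-size chunks, a function definition and its call site may land in different chunks, so neither chunk tells the whole story. The pseudo-C lines and function names below are hypothetical and exist only to make the point.

```python
def chunk_lines(lines: list[str], max_lines: int) -> list[list[str]]:
    """Split lines into consecutive chunks of at most max_lines each."""
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


def chunks_mentioning(chunks: list[list[str]], name: str) -> list[int]:
    """Indices of the chunks in which the given identifier appears."""
    return [i for i, chunk in enumerate(chunks) if any(name in line for line in chunk)]


# Hypothetical decompiler-style output; the names are invented.
code = [
    "int build_target(void) {",
    "    return make_value();",
    "}",
    "void helper_1(void) {}",
    "void helper_2(void) {}",
    "void run(void) {",
    "    int t = build_target();",
    "    use_target(t);",
    "}",
]

fragmented = chunks_mentioning(chunk_lines(code, 3), "build_target")
whole = chunks_mentioning(chunk_lines(code, len(code)), "build_target")
print(fragmented, whole)
```

With three-line chunks the definition and the call sit in separate pieces, so a model looking at the chunk containing `run` never sees what `build_target` does. With a single chunk both are visible together.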

Code interpretation

Gemini 1.5 Pro interprets the intent and purpose of code rather than just matching patterns or similarities. This is possible because it was trained on a vast dataset that includes assembly language for diverse architectures, high-level languages like C, and decompiler pseudo-code. Because of its broad knowledge of operating systems, networking, and cybersecurity, Gemini 1.5 Pro can mimic the reasoning and judgement of a malware analyst. As a result, it can predict malware behaviour and offer insight into previously unseen threats. See the zero-day case study later in this essay for more.

Analysis in detail

Gemini 1.5 Pro generates human-readable summary reports, making analysis easier and faster. These go beyond the simple categorisation and clustering results of classic machine learning algorithms. Its reports can cover malware functionality, behaviour, potential attack paths, and indicators of compromise (IOCs), which can be fed into other security systems to improve threat detection and prevention.
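Feeding IOCs into other systems usually means turning free text into structured fields first. The sketch below does this with two regular expressions over a made-up report; the patterns, the field names, and the report wording are assumptions, and a real pipeline would need much broader patterns and validation.

```python
import re

# Illustrative patterns only: file names with a few extensions, and "port N" mentions.
FILE_RE = re.compile(r"\b[\w.-]+\.(?:exe|dll|wnry)\b", re.IGNORECASE)
PORT_RE = re.compile(r"\bport\s+(\d{1,5})\b", re.IGNORECASE)


def extract_iocs(report: str) -> dict[str, list]:
    """Collect de-duplicated file names and port numbers mentioned in a report."""
    files = sorted(set(FILE_RE.findall(report)))
    ports = sorted({int(p) for p in PORT_RE.findall(report)})
    return {"files": files, "ports": ports}


# Hypothetical report text, written for this example.
report = (
    "The sample is ransomware. It drops c.wnry and tasksche.exe, "
    "then scans the network on port 445 (SMB). It runs tasksche.exe again later."
)

iocs = extract_iocs(report)
print(iocs)
```

The resulting dictionary is the kind of structure that could be serialised and handed to a blocklist or detection rule generator.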

A realistic case study with a representative malware sample shows how Gemini 1.5 Pro analyses decompiled code. The researchers automatically decompiled two WannaCry binaries using Hex-Rays, without adding annotations or context. This yielded two C code files of 268 KB and 231 KB, amounting to more than 280,000 tokens for the LLM to process.
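The figures above imply a rough ratio between file size and token count. The arithmetic below assumes that 1 KB is 1,024 bytes and that the 280,000 figure is the combined total for both files; since the article says "over" 280,000, the results are bounds rather than exact values.

```python
KB = 1024  # assumption: sizes are reported in units of 1,024 bytes

sizes_kb = [268, 231]          # the two decompiled C files
tokens_lower_bound = 280_000   # "over 280,000 tokens" (assumed combined total)
context_window = 1_000_000     # token capacity cited for Gemini 1.5 Pro

total_bytes = sum(sizes_kb) * KB
bytes_per_token = total_bytes / tokens_lower_bound   # upper bound on bytes per token
window_share = tokens_lower_bound / context_window   # lower bound on window usage

print(f"total size: {total_bytes} bytes")
print(f"at most {bytes_per_token:.2f} bytes per token")
print(f"at least {window_share:.0%} of the context window")
```

Under these assumptions the decompiled code averages fewer than two bytes per token, noticeably denser than typical prose, yet still occupies well under a third of the 1 million token window.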

In tests with other, similar gen AI tools, the researchers had to split the code into fragments. Fragmentation often made the analysis incomplete and ambiguous. These limitations show how difficult it is to use such tools on complex code bases.

Gemini 1.5 Pro overcomes these limits. It processes all of the decompiled code in a single pass, and the analysis takes 34 seconds. Its introductory summary is accurate and shows that it can handle large, complex code:

  • Classifies the sample as malicious ransomware.
  • Identifies files as IOCs, including c.wnry and tasksche.exe.
  • Notes that an algorithm generates IP addresses and scans the network for port 445/SMB targets in order to infect other systems.
  • Finds WannaCry’s “killswitch” URL/domain, registry key, and mutex.

Gemini 1.5 Pro’s WannaCry report does not rely on prior knowledge of this malware from training; the analysis comes from the model’s own interpretation of the code. Its broader capabilities will become clearer in later examples, where Gemini 1.5 Pro analyses novel malware samples.
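A single-pass submission like the one in the case study could be assembled roughly as follows. The article does not give the prompt that was used, so the instruction text, the file delimiters, and the `send` callable are all hypothetical; `send` stands in for whatever client actually calls the model, and a stub replaces it here so the sketch runs offline.

```python
from typing import Callable

# Hypothetical instruction text; the actual prompt is not given in the article.
PROMPT_HEADER = (
    "You are a malware analyst. Analyse the decompiled C code below and report "
    "whether it is malicious, what it does, and any indicators of compromise."
)


def build_prompt(files: dict[str, str]) -> str:
    """Concatenate every decompiled file into one prompt, keeping file boundaries visible."""
    parts = [PROMPT_HEADER]
    for name, source in files.items():
        parts.append(f"--- BEGIN {name} ---\n{source}\n--- END {name} ---")
    return "\n\n".join(parts)


def analyse(files: dict[str, str], send: Callable[[str], str]) -> str:
    """Send the whole code base in one request; `send` is supplied by the caller."""
    return send(build_prompt(files))


def fake_send(prompt: str) -> str:
    """Offline stub standing in for a real model client."""
    return f"received {len(prompt)} characters"


# Placeholder sources; real inputs would be the decompiler's output.
files = {
    "dropper.c": "int main(void) { return 0; }",
    "cryptor.c": "void f(void) {}",
}

prompt = build_prompt(files)
report = analyse(files, fake_send)
print(report)
```

Passing the client in as a parameter keeps the sketch independent of any particular SDK and makes the prompt-building step easy to test on its own.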

Malware Details

The following table lists the malware samples used in this post.

| Filename | SHA-256 Hash | Size | First Seen | File Type |
| lhdfrgui.exe (WannaCry dropper) | 24d004a104d4d54034dbcffc2a4b19a | 3.55 MB (3723264 bytes) | 2017-05-12 | Win32 EXE |
| tasksche.exe (WannaCry cryptor) | ed01ebfbc9eb5bbea545af4d01bf5f10 | 3.35 MB (3514368 bytes) | 2017-05-12 | Win32 EXE |
| (not given) | (not given) | 306.50 KB (313856 bytes) | 2022-04-18 | Win32 EXE |
| (not given) | (not given) | 833.50 KB (853504 bytes) | 2024-03-27 | Win32 EXE |

Since June 2023, Drakshi has been writing articles on Artificial Intelligence for govindhtech. She holds a postgraduate degree in business administration and is an Artificial Intelligence enthusiast.

