How Gemini 1.5 Pro Makes Malware Analysis Effortless

April 30, 2024

The rapid growth of malware strains manual analysis methods and underlines the need for automation and new technologies. Generative AI models already help with some malware analysis tasks, but they struggle with large, complicated samples. Gemini 1.5 Pro, which can process up to 1 million tokens, changes that. Its larger context window lets AI assist in automating malware analysis workflows and scales up automated code analysis. By greatly increasing how much code can be processed at once, Gemini 1.5 Pro helps analysts cope with the overwhelming volume of threats and supports a more adaptable and robust approach to cybersecurity.

Traditional Automated Malware Analysis Methods

Static and dynamic analysis are the foundation of automated malware analysis. Static analysis examines malware without executing it, revealing its code structure and any logic that is not obfuscated. Dynamic analysis, in contrast, runs the malware in a controlled environment and observes its behaviour regardless of obfuscation.
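As a rough illustration of the static side, the sketch below pulls printable strings out of raw bytes without executing anything, which is one of the simplest static checks an analyst runs. The function name, the minimum length of 4, and the sample bytes are assumptions made for this example; they do not come from any specific tool.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII characters that are at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Hypothetical byte blob standing in for a binary read from disk.
sample = b"\x00\x01MZ\x90\x00tasksche.exe\x00\xff\xfec.wnry\x00ab\x00"

print(extract_strings(sample))  # ['tasksche.exe', 'c.wnry']
```

Short runs such as "MZ" and "ab" fall below the length threshold and are dropped. Obfuscated or packed samples hide most of their strings, which is exactly the gap dynamic analysis is meant to cover.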

Alongside these methods, AI and ML are being used to categorise and cluster malware by behaviour, signatures, and anomalies. The techniques include supervised learning, which trains models on labelled datasets, and unsupervised clustering, which groups malware by shared patterns without labels.
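To make the unsupervised case concrete, here is a minimal sketch that groups samples by how much their feature sets overlap, using Jaccard similarity and a greedy pass. The sample names, the imported-API features, and the 0.5 threshold are all hypothetical toy values chosen for illustration; production systems use far richer features and proper clustering algorithms.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Similarity between two feature sets: size of overlap divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(samples: dict[str, set[str]], threshold: float = 0.5) -> list[list[str]]:
    """Greedily place each sample in the first cluster whose representative is similar enough."""
    clusters: list[list[str]] = []
    for name, features in samples.items():
        for group in clusters:
            # The first member of each group acts as its representative.
            if jaccard(features, samples[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Toy data: imported API names per sample (hypothetical, not real analysis output).
toy_samples = {
    "sample_a": {"CreateFileW", "CryptEncrypt", "WriteFile", "connect"},
    "sample_b": {"CreateFileW", "CryptEncrypt", "WriteFile"},
    "sample_c": {"RegSetValueExW", "InternetOpenA", "HttpSendRequestA"},
}

print(cluster(toy_samples))  # [['sample_a', 'sample_b'], ['sample_c']]
```

No labels are involved: the grouping falls out of the feature overlap alone, which is the defining property of the unsupervised approach described above.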

Despite these technological advances, the complexity and volume of malware remain major concerns. ML improves the detection of variants of known malware but is far less effective against genuinely new threats. This detection gap lets advanced attacks slip past defences, undermining system security.

Generative AI as a Malware Analysis Assistant

Generative AI (gen AI) took a step forward in malware analysis with the launch of Code Insight at the RSA Conference 2023. This component of Google’s VirusTotal platform analyses code snippets and generates natural-language reports, much as a malware researcher would. Code Insight first supported PowerShell scripts and later added Batch, Shell, VBScript, and Office files.

Code Insight helps analysts understand code behaviour and attack strategies by reading the code and producing summary reports. These reports can surface hidden functionality, malicious intent, and attack paths that typical detection approaches may miss.

Code Insight could only handle files up to a certain size because of limits on how many input tokens an LLM can accept. Despite continual work to raise the maximum file size and support new formats, analysing binaries and executables remains difficult: once disassembled or decompiled, their code usually exceeds what the LLM can process. As a result, AI models have so far mostly assisted human analysts by analysing code fragments from binaries rather than the complete code.

Reverse Engineering: The Human Side of Malware Analysis

Reverse engineering is probably the most advanced malware analysis method available to cybersecurity specialists. It involves disassembling malicious binaries and carefully examining the resulting code so that analysts can determine the malware’s functionality and execution flow. The approach has drawbacks, however. Reconstructing the malware’s logic demands a great deal of time, expertise, and an analytical mindset, because the analyst must understand each instruction, data structure, and function call.

Reverse engineering is also hard to scale, largely because of the shortage of specialised talent in this field. Since the work is complicated and time-consuming, the cybersecurity sector has long sought ways to make it easier.

Gemini 1.5 Pro: Scaling Reverse Engineering for Malware Analysis

The ability to process prompts of up to 1 million tokens is a significant advance for malware analysis, especially reverse engineering. It finally allows gen AI to analyse binaries and executables, a demanding task formerly reserved for highly skilled human analysts.

How does Gemini 1.5 Pro do this?

Increased capacity

Thanks to its increased token capacity, Gemini 1.5 Pro can analyse some disassembled or decompiled executables in a single pass, without splitting the code into fragments. This matters because fragmented code loses context and the links between different parts of the program. When a model sees only small pieces, it is hard to understand the malware’s overall functionality and behaviour, and its real purpose may be missed. By analysing the complete code at once, Gemini 1.5 Pro can produce a more accurate and complete analysis.
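The difference between single-pass and fragmented analysis can be sketched as a simple planning step. This is a minimal illustration only, not how any particular service works: the function name and the assumed ratio of 2 characters per token are hypothetical, and a real pipeline would count tokens with the model's own tokenizer and split on function boundaries rather than at fixed offsets.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # Figure quoted in this article for Gemini 1.5 Pro.


def plan_prompts(code: str, token_budget: int, chars_per_token: float = 2.0) -> list[str]:
    """Return the code whole if it fits the budget, otherwise fixed-size fragments.

    chars_per_token is a rough assumption used only to estimate size.
    """
    if token_budget <= 0 or chars_per_token <= 0:
        raise ValueError("token_budget and chars_per_token must be positive")
    max_chars = int(token_budget * chars_per_token)
    if len(code) <= max_chars:
        return [code]  # Single pass: all context is preserved.
    # Naive fragmentation: cross-references between fragments are lost.
    return [code[i:i + max_chars] for i in range(0, len(code), max_chars)]


decompiled = "x" * 500_000  # Placeholder for roughly 500 KB of decompiled C code.

print(len(plan_prompts(decompiled, CONTEXT_WINDOW_TOKENS)))  # 1
print(len(plan_prompts(decompiled, 32_000)))                 # 8
```

With the large budget the whole listing goes out as one prompt. With a 32,000-token budget the same input becomes eight fragments, each of which would be analysed without sight of the others.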

Code interpretation

Gemini 1.5 Pro interprets the intent and purpose of code rather than just matching patterns or similarities. This is possible because it was trained on a vast dataset that includes assembly language for diverse architectures, high-level languages such as C, and decompiler pseudo-code. Combined with its broad knowledge of operating systems, networking, and cybersecurity, this lets Gemini 1.5 Pro mimic the reasoning and judgement of a malware analyst. As a result, it can predict malware behaviour and provide insights into new threats. See the zero-day case study later in this article for more.

Analysis in detail

Gemini 1.5 Pro generates human-readable summary reports, making analysis easier and faster. These go well beyond the simple categorisation and clustering results of classic machine learning algorithms. Its reports can describe the malware’s functionality, behaviour, and potential attack paths, and can list indicators of compromise (IOCs) that feed other security systems to improve threat detection and prevention.
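For a report to feed other security systems, its findings need a machine-readable form. The sketch below shows one possible shape, serialised as JSON. The class name and field names are assumptions invented for this example, not a format produced by Gemini or VirusTotal; the example values echo the WannaCry summary described later in this article.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AnalysisReport:
    """Hypothetical container for the findings in a generated report."""
    verdict: str
    malware_type: str
    file_iocs: list[str] = field(default_factory=list)
    network_iocs: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the report so that another tool can ingest it."""
        return json.dumps(asdict(self), indent=2)


report = AnalysisReport(
    verdict="malicious",
    malware_type="ransomware",
    file_iocs=["c.wnry", "tasksche.exe"],
    network_iocs=["tcp/445 (SMB)"],
)

print(report.to_json())
```

Keeping file and network indicators in separate lists makes it straightforward to route them to different consumers, for example endpoint rules and network monitoring.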

A practical case study shows how Gemini 1.5 Pro analyses decompiled code, using a representative malware sample. The researchers automatically decompiled two WannaCry binaries with Hex-Rays, adding no annotations or context. This produced two C code files of 268 KB and 231 KB, which together amount to over 280,000 tokens for the LLM to process.
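The figures above allow a quick back-of-the-envelope check of how dense decompiled code is in tokens and how much of the 1 million token window it uses. The calculation assumes that 1 KB means 1,024 bytes and treats "over 280,000" as a lower bound, so the results are approximate.

```python
KIB = 1024  # Assumption: the article's "KB" means 1,024 bytes.

file_sizes_kb = [268, 231]           # Sizes of the two decompiled C files.
tokens_lower_bound = 280_000         # "over 280,000 tokens"
context_window_tokens = 1_000_000    # Gemini 1.5 Pro's quoted capacity.

total_bytes = sum(file_sizes_kb) * KIB
# Because the token count is a lower bound, this ratio is an upper bound.
bytes_per_token = total_bytes / tokens_lower_bound
window_fraction = tokens_lower_bound / context_window_tokens

print(f"total size:      {total_bytes:,} bytes")
print(f"bytes per token: at most about {bytes_per_token:.2f}")
print(f"window used:     at least {window_fraction:.0%}")
```

Under these assumptions the two files total 510,976 bytes, or at most about 1.82 bytes per token, and occupy at least 28% of the context window. Decompiled code is dense in short identifiers and punctuation, which helps explain why even modest binaries overflow smaller context windows.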

When testing other, similar gen AI tools, the researchers had to split the code into fragments. Fragmentation often left the analysis incomplete and ambiguous, which illustrates how hard it is to use such tools on complex code bases.

Gemini 1.5 Pro largely removes these limits. It processed all of the decompiled code in a single pass, taking 34 seconds. Its initial summary is accurate and demonstrates its ability to handle large, complex inputs:

Classifies the sample as malicious ransomware.
Identifies files that serve as IOCs, including c.wnry and tasksche.exe.
Recognises that the code uses an algorithm to generate IP addresses and scans the network for port 445/SMB targets in order to infect other systems.
Finds WannaCry’s “killswitch” URL/domain, a registry key, and a mutex.

Gemini 1.5 Pro’s WannaCry report is not based on prior knowledge of this malware from training. The analysis comes from the model’s own interpretation of the code. This broader capability becomes clear in later examples, where Gemini 1.5 Pro analyses novel malware samples.

Malware Details

The following table lists the malware samples used in this post:

Filename	SHA-256 Hash	Size	First Seen	File Type
lhdfrgui.exe (WannaCry dropper)	24d004a104d4d54034dbcffc2a4b19a11f39008a575aa614ea04703480b1022c	3.55 MB (3723264 bytes)	2017-05-12	Win32 EXE
tasksche.exe (WannaCry cryptor)	ed01ebfbc9eb5bbea545af4d01bf5f1071661840480439c6e5babe8e080e41aa	3.35 MB (3514368 bytes)	2017-05-12	Win32 EXE
EXEC.exe	1917ec456c371778a32bdd74e113b07f33208740327c3cfef268898cbe4efbfe	306.50 KB (313856 bytes)	2022-04-18	Win32 EXE
medui.exe	719b44d93ab39b4fe6113825349addfe5bd411b4d25081916561f9c403599e50	833.50 KB (853504 bytes)	2024-03-27	Win32 EXE
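SHA-256 hashes like those in the table are how analysts confirm they are working with the same sample. The sketch below, which uses only Python's standard library, checks that a listed value is a well-formed SHA-256 digest and compares it with the digest of a local file. The helper names are illustrative, and the normalisation step tolerates the stray whitespace that often creeps into hashes copied from web pages.

```python
import hashlib
import re

SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def normalise_hash(text: str) -> str:
    """Lower-case a hash and drop any whitespace introduced by copy and paste."""
    return "".join(text.split()).lower()


def is_sha256(text: str) -> bool:
    """True if the text is 64 hexadecimal characters once normalised."""
    return SHA256_HEX.fullmatch(normalise_hash(text)) is not None


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large samples need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed_hash: str) -> bool:
    """Compare a local file's digest with a hash taken from a listing."""
    return sha256_of_file(path) == normalise_hash(listed_hash)


# Hash listed for the WannaCry dropper (lhdfrgui.exe) in the table.
dropper_hash = "24d004a104d4d54034dbcffc2a4b19a11f39008a575aa614ea04703480b1022c"
print(is_sha256(dropper_hash))  # True
```

Hashing reads the file but never executes it, so the check is safe to run on a sample inside an isolated analysis environment.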
