Thursday, July 4, 2024

Gemini 1.5 Pro’s Analysis: From Assistant to Analyst

Gemini 1.5 Pro shows comparable performance when handling assembly and high-level languages across different architectures. As a result, depending on the particulars of each case, Google’s method for automating binary analysis can use decompilation, disassembly, or a hybrid of the two. This adaptability makes it possible to tailor the analysis to the binary under consideration, optimising for efficiency, depth of understanding, and the goals of the analysis, whether that means breaking down the program’s logic and flow or examining the details of its low-level operations.

LLM Code Analysis

Code Disassembled vs. Decompiled: A Comparison

In the earlier WannaCry example, decompilation was an essential step before the code was fed to the LLM. This fully automated process translates binary code into a higher-level representation such as C, and it mirrors the first steps malware analysts take when manually dissecting malicious software. But how does the distinction between decompiled and disassembled code affect LLM analysis?

Disassembly

Disassembly converts binary code into assembly language, a low-level representation specific to the processor architecture. Assembly code is human-readable, but it is still complex and takes considerable expertise to understand. It is also far longer and more repetitive than the original source code.

Decompilation

Decompilation attempts to reconstruct the original source code from the binary. Although it is not always perfect, it can greatly improve readability and conciseness compared with disassembled code. It does so by recovering high-level constructs such as variables, loops, and functions, which helps analysts understand the code.

Given these considerations, decompilation offers several scalability and efficiency benefits when using LLMs for binary analysis. Decompiled output is shorter and better organised, so it fits more easily within an LLM’s processing limits and makes it possible to analyse large or complex binaries more effectively. In fact, a decompiler typically produces output that is five to ten times more concise than a disassembler’s.

Disassembly is nevertheless required for precise decompilation, and it remains very useful when in-depth, low-level analysis is essential. In those cases, disassembly offers insights that decompilation cannot match, because decompiled output is higher-level and structured and therefore abstracts low-level detail away.
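The trade-off described above can be sketched as a small selection rule. This is only an illustration, not Google’s actual pipeline: the function name, the need_low_level flag, and the idea of passing precomputed token counts are all assumptions made for the example.

```python
def choose_representation(decompiled_tokens: int, disassembled_tokens: int,
                          budget_tokens: int, need_low_level: bool = False) -> str:
    """Pick which form of the code to send to the model.

    Hypothetical helper: returns "disassembled" or "decompiled" and raises
    ValueError if neither form fits the token budget.
    """
    # Prefer disassembly only when low-level detail is needed and it fits.
    if need_low_level and disassembled_tokens <= budget_tokens:
        return "disassembled"
    # Otherwise prefer the shorter, more structured decompiled output.
    if decompiled_tokens <= budget_tokens:
        return "decompiled"
    # Unusual case: the decompiled output is too large but the listing fits.
    if disassembled_tokens <= budget_tokens:
        return "disassembled"
    raise ValueError("neither representation fits the token budget")
```

The rule falls back to decompiled output when the disassembly is too large, which reflects the point that decompilation scales better while disassembly is reserved for cases that need it.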

Next, we look at an example that uses disassembly directly for analysis. This time the binary is more recent and unidentified: only four of the seventy anti-malware engines on VirusTotal flag the executable as malicious, and even then only generically, without naming a malware family that might shed light on the executable’s behaviour.

Automatic preprocessing with HexRays/IDA Pro turns the 306.50 KB executable into a 1.5 MB assembly file. Thanks to its very large token window, Gemini 1.5 Pro can process that file in a single pass in 46 seconds. This makes it possible to analyse the assembly output in its entirety, providing an in-depth understanding of how the binary works.
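A back-of-the-envelope estimate gives a feel for why a 1.5 MB listing can fit in a single pass. The sketch below assumes roughly four characters per token, which is a common rule of thumb and not the behaviour of Gemini’s actual tokenizer, one byte per character, and a hypothetical reserve for the instructions and the model’s answer. The function names are illustrative.

```python
import math

# Assumption: ~4 characters per token. Real tokenizers vary, especially on
# assembly, so treat the result as an order-of-magnitude estimate only.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(size_bytes: int, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate the token count of an ASCII text file from its size in bytes."""
    return math.ceil(size_bytes / chars_per_token)


def fits_window(size_bytes: int, window_tokens: int = 1_000_000,
                reserved_tokens: int = 50_000) -> bool:
    """Check whether a listing fits, leaving room for instructions and output.

    reserved_tokens is a hypothetical allowance, not a documented figure.
    """
    return estimate_tokens(size_bytes) <= window_tokens - reserved_tokens


if __name__ == "__main__":
    assembly_bytes = 1_500_000  # the 1.5 MB assembly file, taken as 1.5 million bytes
    print(estimate_tokens(assembly_bytes), fits_window(assembly_bytes))
```

Under these assumptions the listing comes to roughly 375,000 tokens, comfortably inside a one-million-token window. An exact figure would require counting tokens with the model’s own tokenizer.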

The case of this unidentified binary highlights Gemini 1.5 Pro’s capabilities. Even though only four out of seventy anti-malware engines on VirusTotal detected the file as malicious, and only with generic signatures, Gemini 1.5 Pro identified it as malicious and gave a detailed justification for its conclusion.

The file appears to be a game cheat designed to inject a dynamic-link library (DLL) into the Grand Theft Auto video game. Whether that counts as “malicious” depends on your point of view: what the game’s developers or security team consider malicious may be desirable to some players. Still, this automated first-pass analysis is impressive and provides valuable insight into the nature and purpose of the binary.

Best Malware Analysis tools

Revealing the Hidden: An Analysis of a Zero-Day Detection Case

The real test of any malware analysis tool is its ability to recognise never-before-seen threats that conventional techniques miss, proactively protecting systems from zero-day attacks. Here, we look at a case in which an executable file on VirusTotal is not detected by any anti-virus engine or sandbox.

After being decompiled into 189,080 tokens, the 833 KB file medui.exe was processed by Gemini 1.5 Pro in about 27 seconds, yielding a comprehensive malware analysis report in just one pass.

Because of the suspicious functionality this analysis uncovered, Gemini 1.5 Pro declared the file malicious. Based on its observations, it concluded that the malware’s main objective is to steal cryptocurrency by intercepting Bitcoin transactions, while evading detection by disabling security software.

This demonstrates that Gemini can use its in-depth understanding of code behaviour to identify malicious intent, even in threats that have not been encountered before, going beyond basic pattern matching or machine learning classification. This is a significant development for malware research because it enables us to proactively identify and address novel and emerging threats that more conventional approaches could overlook.

From Assistant to Analyst with Gemini 1.5 Pro

Gemini 1.5 Pro opens up new possibilities by making it feasible to analyse large volumes of decompiled and disassembled code. By improving efficiency, accuracy, and our capacity to scale in response to a growing number of threats, it has the potential to drastically change how we combat malware.

But remember that this is just the beginning. Even though Gemini 1.5 Pro is a big step forward, the field of artificial intelligence is still very young. To obtain genuinely robust and dependable automated malware analysis, a number of issues must be resolved:

Obfuscation and packing

To hide their code and avoid detection, malware authors are constantly devising new obfuscation and packing techniques. As a result, alongside continuously improving generative AI models, there is a growing need to improve binary preprocessing before analysis. Dynamic techniques and a range of preprocessing tools can unpack and deobfuscate malware more effectively. This preliminary work is essential so that AI models can accurately examine the underlying code, keep pace with changing obfuscation strategies, and remain effective at identifying and understanding complex malware threats.

Growing binary size

Modern binaries keep growing, reflecting the increasing complexity of the software itself. Because most current AI models are limited to significantly smaller token windows, this trend poses a serious challenge.

Gemini 1.5 Pro, by contrast, stands out because it supports up to one million tokens, the largest capacity in the industry at the time of writing. However, even with this capability, Gemini 1.5 Pro could run into limits when handling very large binaries. This underlines why AI technology must continue to advance in order to handle ever-larger files and ensure thorough and efficient malware detection as program complexity rises.
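When a binary’s output does exceed the window, one possible workaround is to split it at function boundaries and analyse it in several passes. The article does not describe such a step, so the sketch below is purely illustrative: pack_functions and its parameters are hypothetical names, and the token counter is supplied by the caller.

```python
from typing import Callable, Iterable, List


def pack_functions(functions: Iterable[str], budget_tokens: int,
                   count_tokens: Callable[[str], int]) -> List[List[str]]:
    """Greedily group function bodies into batches that each fit the budget.

    Functions keep their original order. A single function larger than the
    budget cannot be placed and raises ValueError.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for body in functions:
        cost = count_tokens(body)
        if cost > budget_tokens:
            raise ValueError("a single function exceeds the token budget")
        # Start a new batch when adding this function would overflow.
        if current and used + cost > budget_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(body)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Splitting loses cross-function context, which is exactly what a large window preserves, so this is a fallback and not a substitute for single-pass analysis.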

Attack techniques are always changing, so the challenge for generative AI models goes beyond mere adaptation. Attackers continually develop new ways to bypass security. In addition to recognising new threats, these models must evolve alongside the work of developers and researchers. To improve the context that AI models are given, new techniques for automating the preparation of threat data must be developed. For example, combining decompiled and disassembled code with additional information from static and dynamic analysis tools, such as sandbox reports, can greatly improve the models’ understanding and detection capabilities.
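Combining several sources into one model input could look like the sketch below. It is a minimal illustration under stated assumptions: the function name, the section titles, and the "###" markers are hypothetical choices, not a format required by Gemini or used by Google.

```python
from typing import Dict


def build_analysis_context(sections: Dict[str, str]) -> str:
    """Join labelled analysis artefacts into one text block for the model.

    Empty sections are skipped so a missing sandbox report or listing does
    not leave a dangling header. Insertion order of the dict is preserved.
    """
    parts = []
    for title, body in sections.items():
        body = body.strip()
        if not body:
            continue
        parts.append(f"### {title}\n{body}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    context = build_analysis_context({
        "Decompiled code": "int main() { return 0; }",
        "Disassembly": "push ebp\nmov ebp, esp",
        "Sandbox report": "",  # not available for this sample
    })
    print(context)
```

Labelling each artefact lets the model tell static evidence apart from observed runtime behaviour, which is the kind of added context the paragraph above argues for.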

Drakshi
Since June 2023, Drakshi has been writing articles on Artificial Intelligence for govindhtech. She holds a postgraduate degree in business administration and is an Artificial Intelligence enthusiast.