Saturday, December 14, 2024

XRefer: The Binary Navigator With Gemini Assistance

Malware reverse engineering is a routine part of the workday at Mandiant FLARE. Sometimes the team needs to perform quick triage on a binary, where every hour saved is crucial for meeting incident response deadlines. At other times, analysts spend days examining complex samples and writing thorough analytical reports. As the team deals with increasingly complex malware, frequently written in modern languages like Rust, knowing where to go, what to look at, and how to build a “map” of the malware has a direct impact on response times and triage effectiveness.

Today, the team presents XRefer, a new tool that aims to ease some of this strain for analysts who have to follow these rabbit holes. It helps analysts reach the key parts of their investigation more quickly while preserving the context of their work.

Overview

XRefer provides a persistent companion view that helps analysts navigate and understand binaries. It is implemented as an IDA Pro plugin and is modular and extensible.

Fundamentally, XRefer provides two complementary paradigms for navigation:

  • Gemini-powered cluster analysis breaks the binary down into functional units and uses the large language model (LLM) to describe their purposes and relationships. Think of it as looking at a city on Google Maps, where you can quickly pick out the commercial districts, residential neighborhoods, and parks. By helping identify functional groupings such as command-and-control communication, persistence mechanisms, or information-gathering routines, this feature gives you a strategic view of the malware’s architecture.
  • A context-aware view updates dynamically based on your current location in the code. It displays artifacts from the current function as well as artifacts from related functions along the same execution path. It is like having X-ray vision while standing outside a mall: you can see store inventories, restaurant menus, and the services on each floor without going inside, and then make an informed decision about which areas deserve closer inspection. By surfacing APIs, strings, CAPA matches, library information, and other artifacts that would otherwise require manually exploring many functions, the view helps analysts quickly identify relevant code paths, much as a mall directory helps shoppers plan an efficient route.
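To make the context-aware idea concrete, here is a minimal sketch in plain Python. It is not XRefer's implementation: the call graph, the artifact table, and the `context_view` helper are all hypothetical, and "functions along the same execution path" is simplified to "functions reachable through calls from the current function".

```python
from collections import deque

# Hypothetical call graph: function name -> functions it calls.
CALL_GRAPH = {
    "main": ["parse_config", "encrypt_files"],
    "parse_config": ["read_file"],
    "encrypt_files": ["read_file", "gen_key"],
    "read_file": [],
    "gen_key": [],
}

# Hypothetical artifacts (APIs, strings) observed directly in each function.
ARTIFACTS = {
    "main": {"GetCommandLineW"},
    "parse_config": {"config.json"},
    "encrypt_files": {".locked"},
    "read_file": {"CreateFileW", "ReadFile"},
    "gen_key": {"BCryptGenRandom"},
}


def context_view(current):
    """Return (direct, downstream) artifact sets for the current function.

    direct: artifacts found in the function itself.
    downstream: artifacts found in any function reachable from it by calls.
    """
    direct = set(ARTIFACTS.get(current, ()))
    downstream = set()
    seen = {current}
    queue = deque(CALL_GRAPH.get(current, ()))
    while queue:  # breadth-first walk; `seen` guards against call cycles
        func = queue.popleft()
        if func in seen:
            continue
        seen.add(func)
        downstream |= ARTIFACTS.get(func, set())
        queue.extend(CALL_GRAPH.get(func, ()))
    return direct, downstream - direct


direct, downstream = context_view("encrypt_files")
print("direct:", sorted(direct))
print("downstream:", sorted(downstream))
```

Standing at `encrypt_files`, the sketch reports the function's own string alongside the file and crypto APIs used further down the call chain, which is the "see inside the mall without entering" effect described above.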

Let’s examine each of these paradigms in more detail, starting with cluster-based navigation.

An Overview of Cluster-Based Binary Navigation

One of XRefer’s key capabilities is breaking a binary down into functional units and providing an immediate high-level understanding of its architecture. To illustrate this, consider an ALPHV ransomware sample written in Rust. Although this complex binary contains over 2,700 functions, XRefer’s analysis organizes its core operations into distinct functional clusters.

Cluster Relationship graph view (image credit: Google Cloud)

These functional clusters are labeled as follows:

  • Ransomware Main Module
    • Configuration Parsing Module
    • Process Information and User Profile Module
    • System Data, Privilege Escalation, and AntiAnalysis Module
    • File Processing Pipeline Module
    • Network Communication and Cluster Management Module
    • Thread Synchronization and Keyed Events Module
    • File Path and Encryption Key Generation Module
    • Console Clearing Module
    • Console Output and UI Rendering Module
    • Image Generation and Encoding Module
    • Data Encoding and Hashing Module
    • File Discovery and Dispatch Module
    • Time Management and Thread Synchronization Module

This article focuses on the high-level view for now, but each cluster contains deeper sub-clusters that analysts can explore. Relationships are identified and clustering is performed through static analysis. XRefer then uses Gemini to generate natural-language descriptions of each cluster and its relationships.

At the top, the view shows a brief description of the binary’s category and functionality. It then describes the currently selected cluster and its relationships to other clusters. A list of that cluster’s cross-references follows for easy navigation, and after it a visual graph representation.

BINARY CATEGORY: Ransomware
BINARY DESCRIPTION: This binary is ransomware that encrypts files using various ciphers, propagates over the network, and employs anti-analysis techniques.
CLUSTER: Image Generation and Encoding Module
DESCRIPTION: Generates and encodes images in PNG format
RELATIONSHIPS: Uses embedded-graphics and PNG crates for image generation, DEFLATE compression (cluster.id.0061), and PNG encoding. Handles image rendering and encoding errors.
CROSS REFERENCES: <function_name> – cluster.id.0001 – Ransomware Main Module <function_address>
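The cluster summary above can be modeled as a small record. The sketch below is an illustration only: the `ClusterInfo` class, its field names, and the assumption that cluster references always take the form `cluster.id.` followed by digits are hypothetical and not taken from XRefer's source. It shows how references to other clusters could be pulled out of the relationships text to support navigation.

```python
import re
from dataclasses import dataclass

# Assumed reference format, based on the "cluster.id.0061" example in the text.
CLUSTER_REF = re.compile(r"cluster\.id\.\d+")


@dataclass
class ClusterInfo:
    """Hypothetical record mirroring the fields shown in the cluster view."""
    name: str
    description: str
    relationships: str

    def related_cluster_ids(self):
        """Cluster IDs mentioned in the relationships text, deduplicated in order."""
        return list(dict.fromkeys(CLUSTER_REF.findall(self.relationships)))


info = ClusterInfo(
    name="Image Generation and Encoding Module",
    description="Generates and encodes images in PNG format",
    relationships=(
        "Uses embedded-graphics and PNG crates for image generation, "
        "DEFLATE compression (cluster.id.0061), and PNG encoding. "
        "Handles image rendering and encoding errors."
    ),
)
print(info.related_cluster_ids())
```

Keeping the relationships as free text and extracting IDs on demand matches how the description is presented: the LLM writes prose, and the embedded IDs act as links between clusters.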

For a clearer visual demonstration of cluster navigation, the team uses a lightweight backdoor as the example sample.

Navigation can also be synced with clusters automatically: when you go to a function that belongs to a known cluster, XRefer can immediately open that cluster’s view and highlight the current function inside it. XRefer offers two methods for clustering:
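One simple way to picture the automatic sync between a function and its cluster is a reverse index from function address to cluster ID. The following sketch is an assumption-laden illustration rather than XRefer's code: the addresses, the `build_index` and `clusters_for` helpers, and the membership sets are invented, while the two cluster IDs reuse the examples quoted earlier in this article.

```python
# Hypothetical clusters: cluster ID -> (label, member function addresses).
CLUSTERS = {
    "cluster.id.0001": ("Ransomware Main Module", {0x401000, 0x401200}),
    "cluster.id.0061": ("DEFLATE compression", {0x405000, 0x401200}),
}


def build_index(clusters):
    """Map each function address to the IDs of every cluster containing it."""
    index = {}
    for cluster_id, (_label, functions) in clusters.items():
        for address in functions:
            index.setdefault(address, []).append(cluster_id)
    return index


def clusters_for(address, index):
    """Return the cluster IDs for a function address, or an empty list."""
    return index.get(address, [])


index = build_index(CLUSTERS)
for address in (0x401000, 0x401200, 0x409000):
    print(hex(address), clusters_for(address, index))
```

The index stores a list per address because a function may be shared by more than one cluster; a navigation hook would look up the current function and open the first matching cluster, or do nothing when the list is empty.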

  • Cluster all paths included in XRefer’s analysis (more on XRefer’s analysis later).
  • Cluster a specific subset of functions that Gemini has pre-filtered based on their artifacts.

Note: Throughout this article, “artifacts” refers to binary elements such as strings, API calls, library references, and other extractable information that help in understanding program behaviour.

XRefer uses the first approach by default. Although this method is thorough, it can produce additional clusters around libraries in the program that were not recognized. These library clusters are usually easy to spot and exclude from analysis.

The second clustering technique, available as an option through the context menu, is useful for automatically filtering out libraries, runtime/compiler artifacts, and repetitive, noisy functions. However, because LLMs are inherently imperfect, this method may not always work as intended: artifacts may be overlooked, and results may differ from run to run. Missed artifacts can usually be recovered by quickly rerunning the LLM analysis, but this non-determinism remains an inherent characteristic of the method.

XRefer can also display these LLM-filtered artifacts in a view that is separate from the cluster visualization.
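The pre-filtering step of the second method can be sketched as a set intersection. In the sketch below, the function names, artifacts, and the `prefilter_functions` helper are hypothetical, and the `LLM_SELECTED` set is a hand-written stand-in for whatever subset of artifacts the LLM would flag as interesting; no actual Gemini call is made and XRefer's real filtering logic may differ.

```python
# Hypothetical per-function artifacts extracted by static analysis.
FUNCTION_ARTIFACTS = {
    "sub_401000": {"CreateFileW", "BCryptEncrypt"},
    "sub_402000": {"HeapAlloc", "HeapFree"},       # allocator noise
    "sub_403000": {"panicked at", "HeapAlloc"},    # runtime/compiler noise
    "sub_404000": {"InternetOpenW", "HeapAlloc"},
}

# Stand-in for the LLM's answer: artifacts judged relevant to behaviour.
# (Assumption: a real run would obtain this set from the model and could vary.)
LLM_SELECTED = {"CreateFileW", "BCryptEncrypt", "InternetOpenW"}


def prefilter_functions(function_artifacts, selected):
    """Keep functions that reference at least one selected artifact.

    Returns a dict of function name -> the selected artifacts it references.
    """
    kept = {}
    for name, artifacts in function_artifacts.items():
        relevant = artifacts & selected
        if relevant:
            kept[name] = relevant
    return kept


for name, relevant in prefilter_functions(FUNCTION_ARTIFACTS, LLM_SELECTED).items():
    print(name, sorted(relevant))
```

Only the surviving functions would be passed on to clustering. Because the selected set comes from a model, a different run could keep or drop different functions, which is the run-to-run variability noted above.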

It is important to remember that clusters are not perfect boundaries. They may include functions that are reused throughout the binary, and they may not capture every relevant function. Reused functions are usually easy to spot at a glance, and any related functions that were missed will usually sit close to their logical cluster. The goal of clustering is to establish general zones and subzones of related functionality rather than rigid divisions.

Thota nithya
Thota Nithya has been writing Cloud Computing articles for govindhtech from APR 2023. She was a science graduate. She was an enthusiast of cloud computing.