capa is FLARE’s latest open-source malware analysis tool. It provides a framework that lets the community encode, recognize, and share malicious behaviors. capa applies decades of reverse engineering knowledge to determine what a program does, regardless of your background. This article explains what capa is, how to install and use it, and why you should add it to your triage routine now.
Problem
In investigations, skilled analysts can swiftly analyze and prioritize unfamiliar files. However, at least basic malware analysis skills are needed to determine whether a program is malicious, what role it played in an attack, and what capabilities it may have. A skilled reverse engineer can typically recover a file’s full functionality and infer the author’s intent.
Experienced malware analysts can rapidly triage unfamiliar binaries to gain first insights and guide further analysis. Less experienced analysts, however, often do not know what to look for and struggle to spot the unusual. Unfortunately, common tools such as strings/FLOSS and PE viewers present only the lowest level of detail, leaving users to combine and interpret the data themselves.
Malware Triage 01-01
Practical Malware Analysis Lab 01-01 illustrates this. We want to know how the program works. Figure 1 shows the file’s strings and import table with the relevant values.
From this data, reverse engineers can make an informed guess about the program’s functionality based on the strings and imported API functions, but no more. The sample may create a mutex, start a process, or communicate over the network, possibly with IP address 127.26.152.13. The Winsock (WS2_32) imports suggest network capabilities, but the function names are not readily available because they are imported by ordinal.
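As a rough illustration of the lowest-level triage step described above, the following Python sketch pulls printable ASCII strings out of raw bytes, similar in spirit to the strings utility. The function name, the minimum length of 4, and the sample byte blob are assumptions made for this example; it does not reproduce how strings or FLOSS are implemented.

```python
import re
from typing import List


def extract_ascii_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII characters that are at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical byte blob standing in for a fragment of a binary.
sample = b"\x00\x01exec\x00\xffsleep\x00\x90\x90127.26.152.13\x00ab\x00"

for s in extract_ascii_strings(sample):
    print(s)
```

The output is only a list of candidate strings. Deciding that exec and sleep are commands, or that the dotted value is a C2 address, is still left to the analyst.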
Dynamic analysis of this sample may confirm or refute these hypotheses and uncover additional functionality. However, sandbox reports and dynamic analysis tools only record the behavior of code paths that actually execute. This excludes, for example, features that are activated only after a successful connection to the C2 server. We also generally advise against analyzing malware with an active Internet connection.
With basic programming and Windows API knowledge, we can identify the following functionality. The malware:
- Creates a mutex to ensure that only one instance runs at a time
- Creates a TCP socket, indicated by the constants 2 = AF_INET, 1 = SOCK_STREAM, and 6 = IPPROTO_TCP
- Connects to IP address 127.26.152.13 on port 80
- Sends and receives data
- Compares received data against the strings sleep and exec
- Creates a new process
The malware may perform these actions even though not every code path executes on each run. Taken together, the results show that the malware is a backdoor that can run an arbitrary program specified by a hard-coded C2 server. This high-level conclusion helps us scope an investigation and decide how to respond to the threat.
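The socket constants in the list above are the kind of detail an analyst decodes by hand. The Python sketch below shows that lookup under simple assumptions: the tables contain only a few Winsock constant values, and the helper name describe_socket_call is made up for this illustration.

```python
# A small subset of Winsock constant values, enough for this illustration.
ADDRESS_FAMILIES = {2: "AF_INET", 23: "AF_INET6"}
SOCKET_TYPES = {1: "SOCK_STREAM", 2: "SOCK_DGRAM", 3: "SOCK_RAW"}
PROTOCOLS = {1: "IPPROTO_ICMP", 6: "IPPROTO_TCP", 17: "IPPROTO_UDP"}


def describe_socket_call(af: int, sock_type: int, protocol: int) -> str:
    """Render numeric socket() arguments with symbolic names where known."""
    a = ADDRESS_FAMILIES.get(af, str(af))
    t = SOCKET_TYPES.get(sock_type, str(sock_type))
    p = PROTOCOLS.get(protocol, str(protocol))
    return f"socket({a}, {t}, {p})"


# The three constants observed in the sample.
print(describe_socket_call(2, 1, 6))
```

Unknown values are passed through as numbers, so the helper never guesses a name it does not have.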
Automation of Capability Identification
Malware analysis is seldom this simple. The artifacts that reveal intent may be spread across a binary containing hundreds or thousands of functions. Reverse engineering also has a steep learning curve and requires knowledge of assembly language and operating system internals.
With enough practice, it is possible to discern program capabilities from recurring API calls, strings, constants, and other features. With capa, we show that several of these primary analytical conclusions can be reached automatically. The tool codifies expert knowledge and makes it available to the community in a flexible way. capa detects features and patterns much as a person would, producing high-level conclusions that can guide further investigation. For example, when capa detects unencrypted HTTP communication, that may be your cue to look into proxy logs or other network traces.
Introducing capa
The output from capa against our sample program virtually speaks for itself. In this example, each entry on the left of the main table describes a capability. The namespace on the right groups related capabilities. capa identified all the program capabilities described in the preceding section.
capa often produces surprisingly good results, so we want it to always present the evidence used to identify a capability. The detailed capa output for the “create TCP socket” conclusion illustrates this. Here, we can see exactly where in the binary capa found the relevant features. Rule syntax is covered later; for now, it is enough to assume that rules are logic trees that combine low-level features.
How it Works
capa has two major components that algorithmically triage unknown programs. First, a code analysis engine extracts features from files, such as strings, disassembly, and control flow. Second, a logic engine finds combinations of features that are expressed as rules. When the logic engine finds a match, capa reports the capability described by the rule.
Extraction of Features
The code analysis engine extracts low-level features from programs. capa can explain its work because all of its features, such as strings and numbers, are recognizable to a human. These features usually fall into two groups: file features and disassembly features.
File features, like the PE file header, are extracted from the raw file data and its structure. These are things you might notice by skimming through the file. Besides strings and imported APIs, they include exported function names and section names.
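To make the idea of a file feature concrete, here is a minimal Python sketch that reads section names directly from the PE headers using only the standard library. It is not how capa extracts features. The function names and the synthetic test file are assumptions for this example, and the parser skips the validation a real tool would need.

```python
import struct
from typing import List


def pe_section_names(data: bytes) -> List[str]:
    """Return the section names listed in a PE file's section table."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)
    table = pe_off + 24 + opt_size  # signature (4) + COFF header (20) + optional header
    names = []
    for i in range(num_sections):
        raw = data[table + 40 * i: table + 40 * i + 8]  # 40-byte entries, 8-byte name
        names.append(raw.rstrip(b"\x00").decode("ascii", errors="replace"))
    return names


def fake_pe(names: List[bytes]) -> bytes:
    """Build a tiny synthetic header blob (hypothetical, not a loadable PE)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    coff = struct.pack("<HHIIIHH", 0x14C, len(names), 0, 0, 0, 0, 0)
    sections = b"".join(n.ljust(8, b"\x00") + bytes(32) for n in names)
    return bytes(dos) + b"PE\x00\x00" + coff + sections


print(pe_section_names(fake_pe([b".text", b".rdata", b".data"])))
```

On a real sample you would pass the file's bytes instead of the synthetic blob. Unusual section names are often one of the first things an analyst notices.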
Disassembly features are extracted by advanced static analysis of a file, which involves disassembling the code and reconstructing control flow. The figure shows relevant disassembly features, including API calls, instruction mnemonics, numbers, and string references.
Because this advanced analysis can distinguish between functions and other scopes in a program, capa can apply its logic at the appropriate level. For example, it does not get confused when unrelated APIs are used in different functions, because capa rules can be matched against each function independently.
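The effect of scoping can be sketched in a few lines of Python. The feature strings, function addresses, and helper names below are hypothetical and do not reflect capa's internal data model. The point is only that a requirement satisfied by the file as a whole may not be satisfied by any single function.

```python
from typing import Dict, List, Set

# Hypothetical features extracted per function, keyed by function address.
functions: Dict[int, Set[str]] = {
    0x401000: {"api:CreateMutex", "api:OpenMutex"},
    0x401100: {"api:socket", "number:6"},
}

required = {"api:CreateMutex", "api:socket"}


def match_function_scope(funcs: Dict[int, Set[str]], needed: Set[str]) -> List[int]:
    """Return addresses of functions that contain every needed feature."""
    return [addr for addr, feats in funcs.items() if needed <= feats]


def match_file_scope(funcs: Dict[int, Set[str]], needed: Set[str]) -> bool:
    """Return True if the needed features appear anywhere in the file."""
    all_feats: Set[str] = set().union(*funcs.values())
    return needed <= all_feats


print(match_function_scope(functions, required))  # no single function has both APIs
print(match_file_scope(functions, required))      # the file as a whole does
```

Matching per function avoids reporting a capability just because two unrelated APIs happen to appear somewhere in the same binary.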
capa is designed for flexible and extensible feature extraction. Integrating additional code analysis backends is straightforward. The standalone version currently relies on the vivisect analysis framework. With the IDA Python backend, you can also run capa inside IDA Pro. Note that different code analysis engines may produce different feature sets and therefore different results. The good news is that this seldom causes problems in practice.
Capa Rules
A capa rule describes a program capability using a structured combination of features. If all the required features are present, capa concludes that the program has the capability.
capa rules are YAML documents that contain metadata and a tree of statements expressing their logic. Among other things, the rule language supports logical operators and counting. The “create TCP socket” rule requires that a basic block contain the numbers 6, 1, and 2 together with a call to the API function socket or WSASocket. Basic blocks group assembly code at a very low level, which makes them ideal for matching tightly related code segments. In addition to basic blocks, capa supports matching at the function and file level. The function scope ties together all features in a disassembled function, whereas the file scope includes all features across the entire file.
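The logic-tree idea can be illustrated with a toy evaluator in Python. This is a simplified sketch, not capa's rule engine. The nested dictionary below only imitates the shape of the “create TCP socket” logic described above, the feature strings are invented for the example, and only the and/or operators are modeled (counting is left out).

```python
from typing import Set, Union

Node = Union[str, dict]


def evaluate(node: Node, features: Set[str]) -> bool:
    """Evaluate a nested and/or tree against a set of extracted features."""
    if isinstance(node, str):  # leaf: a single required feature
        return node in features
    ((op, children),) = node.items()  # each dict holds exactly one operator
    if op == "and":
        return all(evaluate(child, features) for child in children)
    if op == "or":
        return any(evaluate(child, features) for child in children)
    raise ValueError(f"unsupported operator: {op}")


# Hypothetical encoding of the logic described for "create TCP socket".
create_tcp_socket = {
    "and": [
        "number:6",  # IPPROTO_TCP
        "number:1",  # SOCK_STREAM
        "number:2",  # AF_INET
        {"or": ["api:socket", "api:WSASocket"]},
    ]
}

# Features assumed to come from one basic block of the sample.
basic_block = {"number:2", "number:1", "number:6", "api:socket", "mnemonic:push"}

print(evaluate(create_tcp_socket, basic_block))
```

Because the tree is evaluated against the features of one scope at a time, the same rule gives different answers for a basic block, a function, or the whole file.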
Rule names identify capabilities, whereas namespaces associate them with a technique or analysis category. The capability table in capa’s output showed the name and namespace. The metadata can also include fields such as author and examples. We use examples to reference files and offsets with known capabilities, which lets us unit test and validate every rule. The capa rules are also worth keeping at hand as a reference, since they document behaviors seen in real-world malware. Other meta information, such as capa’s support for the ATT&CK and Malware Behavior Catalog frameworks, will be covered in a future article.
Installation
We offer standalone executables for Windows, Linux, and OSX to make capa easy to use. We also provide the source code of the Python tool on GitHub. The capa repository contains up-to-date installation instructions.
The latest FLARE-VM versions on GitHub also include capa.
Usage
To identify capabilities in a program, run capa and specify the input file:
$ capa suspicious.exe
capa supports shellcode and Windows PE files (EXE, DLL, SYS). For instance, to analyze 32-bit shellcode, capa must be told the file format and architecture:
$ capa -f sc32 shellcode.bin
capa offers two levels of verbosity for more detailed capability information. Use the very verbose option to see where and why capa matched rules:
$ capa -vv suspicious.exe
Use the tag option to filter on rule metadata and focus on specific rules:
$ capa -t "create TCP socket" suspicious.exe
Display capa’s help to see all available options:
$ capa -h
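If you want to drive capa from a script, one option is to assemble the command line in Python and hand it to subprocess. The sketch below only builds the argument list. The helper name build_capa_command is made up for this example, and it assumes the capa options for format (-f), very verbose output (-vv), and tag filtering (-t); check capa -h for the options your version supports.

```python
import shlex
from typing import List, Optional


def build_capa_command(
    sample: str,
    fmt: Optional[str] = None,
    very_verbose: bool = False,
    tag: Optional[str] = None,
) -> List[str]:
    """Assemble a capa command line as an argument list (does not run it)."""
    cmd = ["capa"]
    if fmt:
        cmd += ["-f", fmt]  # e.g. "sc32" for 32-bit shellcode
    if very_verbose:
        cmd.append("-vv")  # show where and why rules matched
    if tag:
        cmd += ["-t", tag]  # filter rules by metadata
    cmd.append(sample)
    return cmd


# Print shell-quoted versions of two example invocations.
print(shlex.join(build_capa_command("shellcode.bin", fmt="sc32")))
print(shlex.join(build_capa_command("suspicious.exe", tag="create TCP socket")))
```

Passing the resulting list to subprocess.run avoids shell quoting problems with tags that contain spaces, assuming the capa executable is on your PATH.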
Contributing
We believe capa benefits the community and welcome any contribution. We appreciate feedback, suggestions, and pull requests. The contributing document is the best place to start.
Rules are the foundation of capa’s identification algorithm. We aim to make writing them fun and easy.
We use a second GitHub repository for the embedded rules to separate that work and its discussions from the main capa code. The rule repository is a git submodule of the main capa repository.
Conclusion
This blog article introduced FLARE’s latest malware analysis tool. The open-source capa framework encodes, recognizes, and shares malware behaviors. We believe the community needs this kind of tool to cope with the volume of malware encountered during investigations, hunting, and triage. capa uses decades of knowledge to explain what a program does, regardless of your background.
Try capa in your next malware analysis. The tool is straightforward to use and useful for forensic analysts, incident responders, and reverse engineers.