Contact

Symbolic execution and taint analysis for Mach-O binaries

Table of contents

Overview

Symbolic execution and taint analysis help explain how data moves through Mach-O binaries when source code is unavailable. Symbolic execution reasons about reachable paths and required inputs. Taint analysis tracks untrusted data from entry points to sensitive operations.

This article focuses on practical analysis of macOS and iOS binaries with Radare2, ESIL, r2pipe, and SMT solvers. The goal is to identify feasible data flows, document assumptions, and decide where dynamic validation is needed.

Core concepts

Symbolic execution

Symbolic execution is a program analysis technique that executes programs with symbolic values instead of concrete inputs. When a program is executed symbolically:

Symbolic execution helps answer questions like:

The technique constructs a mathematical formula that represents program behavior, which can be solved to produce inputs that follow specific execution paths.

Taint analysis

Taint analysis is a technique for tracking potentially untrusted data (taint) through a program’s execution to identify security vulnerabilities:

For example, if user input (tainted) flows into a SQL query without proper sanitization, a SQL injection vulnerability may exist.

Taint tracking rules include:

Relationship between techniques

While symbolic execution and taint analysis are distinct techniques, they complement each other:

SMT solvers

SMT (Satisfiability Modulo Theories) solvers are core components of symbolic execution engines. They determine whether a set of constraints has a solution and, if so, find concrete values that satisfy those constraints.

Key aspects of SMT solvers include:

Popular SMT solvers include:

SMT solvers translate program constraints into logical formulas and attempt to find assignments to variables that make these formulas true. This allows symbolic execution engines to determine if execution paths are feasible and what inputs would follow those paths.

Analysis tools

Radare2 fundamentals

Radare2 is an open-source reverse engineering framework with useful primitives for Mach-O inspection, disassembly, emulation, and automation.

Setting up Radare2

# Basic opening of a Mach-O binary
r2 ./target_binary

# Opening with analysis
r2 -A ./target_binary

Initial analysis commands

# Inside Radare2
aaa                    # Analyze all symbols and functions
afl                    # List all functions
is                     # List symbols

Mach-O specific commands

# Mach-O header information
iM                     # Show Mach-O header
iH                     # Show Mach-O header details
iHl                    # Show Mach-O load commands
iHm                    # Show Mach-O memory map
iHs                    # Show Mach-O sections

# For Objective-C (common in Mach-O)
ic                     # List Objective-C classes
icl                    # List Objective-C class methods

ESIL: The foundation for symbolic analysis

ESIL (Evaluable Strings Intermediate Language) is the key to symbolic execution in Radare2. It provides an architecture-independent representation of assembly instructions.

ESIL basics

ESIL features:

Viewing ESIL instructions

# Enable ESIL in disassembly
e asm.esil=true

# Disassemble function with ESIL
pdf@<function_name>

Example ESIL conversion

# x86_64 Assembly to ESIL
mov rax, 1    ->    1,rax,=
cmp rax, rbx  ->    rbx,rax,==,?{,},
je target     ->    rbx,rax,==,?{,target,jmp,}

ESIL’s stack-based format makes it ideal for symbolic execution because:

  1. It simplifies architecture independence
  2. Operations can be modeled mathematically
  3. Symbolic values can be tracked through stack operations

ESIL execution model

ESIL operates as a virtual machine with:

For example, the ESIL expression 1,rax,= means:

  1. Push the value 1 onto the stack
  2. Push the register identifier “rax” onto the stack
  3. Execute the assignment operation “=” which pops the top two values and assigns the first to the second

This can be extended to symbolic values, where instead of concrete values like “1”, symbolic variables are pushed to the stack and propagate through operations.

Performing symbolic execution

Using Radare2 directly

Basic ESIL emulation

# Initialize ESIL environment
aei                    # Initialize ESIL VM
aeim                   # Initialize ESIL memory
aeip                   # Set instruction pointer to current address

# Step through code
aes                    # Step one instruction
aesb                   # Step until branch
aesu <address>         # Continue until address

Debugging ESIL execution

# Inspect ESIL state
aer                    # Show ESIL registers
aest                   # Show ESIL stack
aep                    # Show ESIL debug info

Building a custom symbolic execution engine

To build a more advanced symbolic execution engine with Radare2 and an SMT solver, r2pipe can be used to:

  1. Extract ESIL expressions from disassembly
  2. Translate ESIL to solver constraints
  3. Track path conditions
  4. Generate solutions for execution paths

Example basic approach in Python (conceptual):

import r2pipe
import z3  # SMT solver

# Connect to Radare2
r2 = r2pipe.open("./binary")
r2.cmd("aaa")  # Analyze all

# Set up symbolic variables
solver = z3.Solver()
sym_input = z3.BitVec('input', 64)

# Analyze a function
function_addr = "0x100001280"
instructions = r2.cmd(f"pdf @ {function_addr}")

# Track constraints
path_constraints = []

# Process ESIL for each instruction
# ...

# Check if path is reachable
if solver.check() == z3.sat:
    model = solver.model()
    print(f"Solution: input = {model[sym_input]}")
else:
    print("Path is unreachable")

Taint analysis techniques

Manual approach with Radare2

While Radare2 doesn’t have dedicated taint analysis, ESIL can be used for basic taint tracking:

# Setup for tracking
[0x100001280]> aei
[0x100001280]> aeim
[0x100001280]> aeip

# Mark input as tainted
[0x100001280]> "ae rdi=0x12345678"

# Trace execution and watch data flow
[0x100001280]> aesot        # Step with trace
[0x100001284]> aer          # Check register values

Building custom taint analysis

Building a taint tracker requires:

  1. Defining sources (inputs)
  2. Defining sinks (sensitive functions)
  3. Defining propagation rules
  4. Tracking taint through execution

Example taint propagation rules for ESIL:

iOS security analysis

Common sources and entry points

iOS applications have several common entry points for untrusted data that should be marked as tainted:

  1. User input sources:
    • Text field inputs (UITextField): User-provided text
    • Web views (WKWebView, UIWebView): HTML, JavaScript execution
    • URL Scheme handlers (custom urls): External app communication
    • Deep links: URL parameters passed to the app
    • Pasteboard data: Content from clipboard
  2. Network sources:
    • NSURLSession and NSURLConnection: API responses
    • WebSocket communications: Real-time data
    • RESTful API responses: JSON/XML data
    • Push notifications: Remote notification payload
  3. Storage sources:
    • NSUserDefaults: Potentially modifiable preferences
    • File system access: External files
    • Keychain: If accessible to other processes
    • SQLite databases: User or downloaded data
  4. IPC mechanisms:
    • XPC services: Inter-process communication
    • App extensions: Data from extensions

Identifying entry points in Mach-O binaries

Using Radare2 to identify these entry points:

# Look for networking APIs
r2 -qc 'aaa; afl~NSURLSession,NSURLConnection' ./ios_app

# Look for file operation classes
r2 -qc 'aaa; afl~NSFileManager,NSData' ./ios_app

# Check URL handling
r2 -qc 'aaa; afl~handleOpenURL,application:openURL' ./ios_app

Using class/method information:

# List Objective-C classes in the binary
r2 -qc 'aaa; ic' ./ios_app

# List methods of specific classes
r2 -qc 'aaa; ic NSURLSession' ./ios_app

Sensitive sinks

Common sensitive sinks in iOS applications:

  1. Code execution:
    • system(), popen(): Command execution
    • NSTask: Process execution
    • dlopen(), dlsym(): Dynamic library loading
    • JavaScript evaluation: stringByEvaluatingJavaScriptFromString:
  2. Memory corruption:
    • C string functions: strcpy(), strcat(), sprintf()
    • Buffer operations without bounds checking
    • Format string vulnerabilities: [NSString stringWithFormat:] with user input
  3. File system operations:
    • NSFileManager: File operations
    • writeToFile:atomically:: File writes
    • URLByAppendingPathComponent:: Path manipulation
  4. SQL injection:
    • Direct SQL queries with Core Data
    • SQLite operations with user input
  5. WebView vulnerabilities:
    • loadHTMLString:baseURL:: Loading user content in WebViews
    • JavaScript injection points

Identifying sensitive sinks in Mach-O binaries

Using Radare2 to find sensitive sinks:

# Look for C functions prone to memory corruption
r2 -qc 'aaa; afl~strcpy,strcat,sprintf' ./ios_app

# Find file operations
r2 -qc 'aaa; afl~NSFileManager,writeToFile' ./ios_app

# Identify command execution
r2 -qc 'aaa; afl~system,popen,NSTask' ./ios_app

Tracking taint flow

To track the flow of tainted data from sources to sinks in iOS applications, you can build a custom analysis script using r2pipe and an SMT solver:

  1. Basic approach:
    • Identify sources and sinks
    • Track data flow through ESIL operations
    • Check constraints to determine feasibility
  2. Custom taint tracking script (concept):
import r2pipe
import z3

def track_taint_flow(binary_path, source_addr, sink_addr):
    r2 = r2pipe.open(binary_path)
    r2.cmd("aaa")

    # Mark source as tainted
    tainted_regs = set()
    tainted_mem = set()

    # Get execution path between source and sink
    path = get_execution_path(r2, source_addr, sink_addr)

    # Symbolically execute path and track taint
    solver = z3.Solver()
    sym_values = {}

    for instr in path:
        esil = r2.cmd(f"pie 1 @ {instr}")
        # Process ESIL and update taint sets
        # e.g., for "1,rax,=" we don't propagate taint
        # for "rbx,rax,=" we check if rbx is tainted

    # Check if taint reaches sink
    if is_sink_tainted(tainted_regs, tainted_mem, sink_addr):
        # Verify with SMT solver if this path is feasible
        if solver.check() == z3.sat:
            return True, solver.model()

    return False, None

Automating analysis workflows

Using Radare2 scripts

Create analyze.r2 script file:

# Setup
e asm.esil=true
aaa

# Find target
afl~check_password > funcs.txt

# Analyze first result
s `cat funcs.txt | head -n1 | awk '{print $1}'`

# Store ESIL output
pdf > esil_output.txt

# Find branch conditions
/E ==,?{ > conditions.txt

Run with:

r2 -i analyze.r2 -q ./target_binary

Using r2pipe for automation:

import r2pipe

def analyze_binary(path):
    r2 = r2pipe.open(path)
    r2.cmd("aaa")

    # Find all functions
    functions = r2.cmdj("aflj")

    # Find potential sources and sinks
    sources = []
    sinks = []

    for func in functions:
        if "URL" in func.get("name", ""):
            sources.append(func)
        elif any(sink in func.get("name", "") for sink in ["system", "exec", "eval"]):
            sinks.append(func)

    print(f"Found {len(sources)} potential sources and {len(sinks)} potential sinks")

    # Analyze each source-sink pair
    # ...

Practical examples

Analyzing URL scheme handler vulnerability

For an iOS application with a vulnerable URL handler:

# First, identify the URL handler function
r2 -qc 'aaa; afl~openURL,handleURL' ./ios_app

# Analyze the handler with ESIL
r2 -c 'e asm.esil=true; aaa; s <handler_address>; pdf' ./ios_app

Finding path traversal in file operations

# Identify file operation functions
r2 -qc 'aaa; afl~NSFileManager,writeToFile' ./ios_app

# Analyze for path manipulation operations
r2 -c 'e asm.esil=true; aaa; s <file_op_address>; pdf' ./ios_app

Identifying format string vulnerabilities

# Look for format string functions
r2 -qc 'aaa; afl~stringWithFormat,sprintf' ./ios_app

The password checker example

# Find function of interest
[0x0]> aaa
[0x0]> afl~check
0x100001280   24 512  -> 614   sym.check_password

# Examine with ESIL
[0x0]> s 0x100001280
[0x100001280]> e asm.esil=true
[0x100001280]> pdf
            ;-- func.100001280:
/ 24: sym.check_password (void *arg1);
|           ; arg void *arg1 @ rdi
|           0x100001280      55             8,rsp,-=,rbp,rsp,=[8]
|           0x100001281      4889e5         rsp,rbp,=
|           0x100001284      8a07           rdi,[1],al,=                    ; load first byte from input
|           0x100001286      3c41           0x41,al,==                      ; compare with 'A'
|           0x100001288      7520           0x41,al,==,!,?{,0x1000012aa,jmp,} ; jump if not equal
|           0x10000128a      8a4701         rdi,1,+,[1],al,=               ; load second byte
|           0x10000128d      3c42           0x42,al,==                     ; compare with 'B'
|           0x10000128f      7519           0x42,al,==,!,?{,0x1000012aa,jmp,} ; jump if not equal
|           0x100001291      31c0           eax,eax,^=                     ; set to true
|           0x100001293      c3             rsp,[8],rip,=,8,rsp,+=         ; return

We can trace this manually using ESIL stepping:

# Initialize ESIL VM
[0x100001280]> aei
[0x100001280]> aeim
[0x100001280]> aeip

# Set symbolic value for input (in rdi)
[0x100001280]> ae rdi="ABCD"  # Concrete input for demonstration

# Step through instructions
[0x100001280]> aes  # Step instruction
[0x100001281]> aes
[0x100001284]> aes  # Load first byte
[0x100001286]> aer al  # Check value in al
0x00000041  # A
[0x100001286]> aes  # Compare with 'A'
[0x100001288]> aes  # Branch check
[0x10000128a]> aes  # Load second byte
[0x10000128d]> aer al
0x00000042  # B

Limitations and recommendations

Radare2 limitations

Recommendations