A guide to symbolic execution, taint analysis, SMT solvers, radare2, ESIL, and Mach-O binary workflows.

Symbolic execution and taint analysis for Mach-O binaries

Overview
Core concepts
Analysis tools
- Radare2 fundamentals
ESIL: The foundation for symbolic analysis
Performing symbolic execution
- Using Radare2 directly
Taint analysis techniques
- Manual approach with Radare2
iOS security analysis
Automating analysis workflows
Practical examples
Limitations and recommendations

Overview

Symbolic execution and taint analysis help explain how data moves through Mach-O binaries when source code is unavailable. Symbolic execution reasons about reachable paths and required inputs. Taint analysis tracks untrusted data from entry points to sensitive operations.

This article focuses on practical analysis of macOS and iOS binaries with Radare2, ESIL, r2pipe, and SMT solvers. The goal is to identify feasible data flows, document assumptions, and decide where dynamic validation is needed.

Core concepts

Symbolic execution

Symbolic execution is a program analysis technique that executes programs with symbolic values instead of concrete inputs. When a program is executed symbolically:

Inputs are represented as symbolic values (variables) rather than concrete values
Program execution builds path constraints that represent conditions for following specific paths
These constraints accumulate as the program encounters branch conditions
At any point, constraints can be solved to find concrete input values that would reach that program state

Symbolic execution helps answer questions like:

Which program paths are reachable
What inputs are needed to reach specific code locations
Whether vulnerabilities can be triggered

The technique constructs a mathematical formula that represents program behavior, which can be solved to produce inputs that follow specific execution paths.

Taint analysis

Taint analysis is a technique for tracking potentially untrusted data (taint) through a program’s execution to identify security vulnerabilities:

Specific inputs are marked as “tainted” (untrusted)
The analysis engine tracks how tainted data propagates through operations
When tainted data reaches sensitive functions (sinks), potential vulnerabilities are flagged

For example, if user input (tainted) flows into a SQL query without proper sanitization, a SQL injection vulnerability may exist.

Taint tracking rules include:

Direct assignment: If y = x and x is tainted, then y becomes tainted
Arithmetic/string operations: If any operand is tainted, the result is tainted
Control dependencies: Conditions tested on tainted data can taint variables in conditionally executed code

Relationship between techniques

While symbolic execution and taint analysis are distinct techniques, they complement each other:

Taint analysis can run without symbolic execution by tracking data flows.
Symbolic execution improves taint analysis by checking whether a tainted path is feasible.
Used together, the techniques help determine whether untrusted input can reach sensitive operations.

SMT solvers

SMT (Satisfiability Modulo Theories) solvers are core components of symbolic execution engines. They determine whether a set of constraints has a solution and, if so, find concrete values that satisfy those constraints.

Key aspects of SMT solvers include:

Decision procedures for different theories (e.g., integers, arrays, bit-vectors)
Constraint solving to find concrete values satisfying path constraints
Model generation to produce example inputs that follow specific execution paths

Popular SMT solvers include:

Z3 (Microsoft Research)
Boolector (specialized for bit-vectors)
CVC4/CVC5
Yices

SMT solvers translate program constraints into logical formulas and attempt to find assignments to variables that make these formulas true. This allows symbolic execution engines to determine if execution paths are feasible and what inputs would follow those paths.

Analysis tools

Radare2 fundamentals

Radare2 is an open-source reverse engineering framework with useful primitives for Mach-O inspection, disassembly, emulation, and automation.

Setting up Radare2

# Basic opening of a Mach-O binary
r2 ./target_binary

# Opening with analysis
r2 -A ./target_binary

Initial analysis commands

# Inside Radare2
aaa                    # Analyze all symbols and functions
afl                    # List all functions
is                     # List symbols

Mach-O specific commands

# Mach-O header information
iM                     # Show Mach-O header
iH                     # Show Mach-O header details
iHl                    # Show Mach-O load commands
iHm                    # Show Mach-O memory map
iHs                    # Show Mach-O sections

# For Objective-C (common in Mach-O)
ic                     # List Objective-C classes
icl                    # List Objective-C class methods

ESIL: The foundation for symbolic analysis

ESIL (Evaluable Strings Intermediate Language) is the key to symbolic execution in Radare2. It provides an architecture-independent representation of assembly instructions.

ESIL basics

ESIL features:

Stack-based notation where operands are pushed and operations pop/push results
Architecture-independent representation
Easy to parse and evaluate programmatically

Viewing ESIL instructions

# Enable ESIL in disassembly
e asm.esil=true

# Disassemble function with ESIL
pdf@<function_name>

Example ESIL conversion

# x86_64 Assembly to ESIL
mov rax, 1    ->    1,rax,=
cmp rax, rbx  ->    rbx,rax,==,?{,},
je target     ->    rbx,rax,==,?{,target,jmp,}

ESIL’s stack-based format makes it ideal for symbolic execution because:

It simplifies architecture independence
Operations can be modeled mathematically
Symbolic values can be tracked through stack operations

ESIL execution model

ESIL operates as a virtual machine with:

A stack for operations
Register and memory states
Conditional evaluation

For example, the ESIL expression 1,rax,= means:

Push the value 1 onto the stack
Push the register identifier “rax” onto the stack
Execute the assignment operation “=” which pops the top two values and assigns the first to the second

This can be extended to symbolic values, where instead of concrete values like “1”, symbolic variables are pushed to the stack and propagate through operations.

Performing symbolic execution

Using Radare2 directly

Basic ESIL emulation

# Initialize ESIL environment
aei                    # Initialize ESIL VM
aeim                   # Initialize ESIL memory
aeip                   # Set instruction pointer to current address

# Step through code
aes                    # Step one instruction
aesb                   # Step until branch
aesu <address>         # Continue until address

Debugging ESIL execution

# Inspect ESIL state
aer                    # Show ESIL registers
aest                   # Show ESIL stack
aep                    # Show ESIL debug info

Building a custom symbolic execution engine

To build a more advanced symbolic execution engine with Radare2 and an SMT solver, r2pipe can be used to:

Extract ESIL expressions from disassembly
Translate ESIL to solver constraints
Track path conditions
Generate solutions for execution paths

Example basic approach in Python (conceptual):

import r2pipe
import z3  # SMT solver

# Connect to Radare2
r2 = r2pipe.open("./binary")
r2.cmd("aaa")  # Analyze all

# Set up symbolic variables
solver = z3.Solver()
sym_input = z3.BitVec('input', 64)

# Analyze a function
function_addr = "0x100001280"
instructions = r2.cmd(f"pdf @ {function_addr}")

# Track constraints
path_constraints = []

# Process ESIL for each instruction
# ...

# Check if path is reachable
if solver.check() == z3.sat:
    model = solver.model()
    print(f"Solution: input = {model[sym_input]}")
else:
    print("Path is unreachable")

Taint analysis techniques

Manual approach with Radare2

While Radare2 doesn’t have dedicated taint analysis, ESIL can be used for basic taint tracking:

# Setup for tracking
[0x100001280]> aei
[0x100001280]> aeim
[0x100001280]> aeip

# Mark input as tainted
[0x100001280]> "ae rdi=0x12345678"

# Trace execution and watch data flow
[0x100001280]> aesot        # Step with trace
[0x100001284]> aer          # Check register values

Building custom taint analysis

Building a taint tracker requires:

Defining sources (inputs)
Defining sinks (sensitive functions)
Defining propagation rules
Tracking taint through execution

Example taint propagation rules for ESIL:

Assignment operations: val,reg,= -> If val is tainted, reg becomes tainted
Arithmetic: reg1,reg2,+,reg3,= -> If either reg1 or reg2 is tainted, reg3 becomes tainted
Memory: reg1,[4],reg2,= -> If memory at reg1 is tainted, reg2 becomes tainted

iOS security analysis

Common sources and entry points

iOS applications have several common entry points for untrusted data that should be marked as tainted:

User input sources:
- Text field inputs (UITextField): User-provided text
- Web views (WKWebView, UIWebView): HTML, JavaScript execution
- URL Scheme handlers (custom urls): External app communication
- Deep links: URL parameters passed to the app
- Pasteboard data: Content from clipboard
Network sources:
- NSURLSession and NSURLConnection: API responses
- WebSocket communications: Real-time data
- RESTful API responses: JSON/XML data
- Push notifications: Remote notification payload
Storage sources:
- NSUserDefaults: Potentially modifiable preferences
- File system access: External files
- Keychain: If accessible to other processes
- SQLite databases: User or downloaded data
IPC mechanisms:
- XPC services: Inter-process communication
- App extensions: Data from extensions

Identifying entry points in Mach-O binaries

Using Radare2 to identify these entry points:

# Look for networking APIs
r2 -qc 'aaa; afl~NSURLSession,NSURLConnection' ./ios_app

# Look for file operation classes
r2 -qc 'aaa; afl~NSFileManager,NSData' ./ios_app

# Check URL handling
r2 -qc 'aaa; afl~handleOpenURL,application:openURL' ./ios_app

Using class/method information:

# List Objective-C classes in the binary
r2 -qc 'aaa; ic' ./ios_app

# List methods of specific classes
r2 -qc 'aaa; ic NSURLSession' ./ios_app

Sensitive sinks

Common sensitive sinks in iOS applications:

Code execution:
- system(), popen(): Command execution
- NSTask: Process execution
- dlopen(), dlsym(): Dynamic library loading
- JavaScript evaluation: stringByEvaluatingJavaScriptFromString:
Memory corruption:
- C string functions: strcpy(), strcat(), sprintf()
- Buffer operations without bounds checking
- Format string vulnerabilities: [NSString stringWithFormat:] with user input
File system operations:
- NSFileManager: File operations
- writeToFile:atomically:: File writes
- URLByAppendingPathComponent:: Path manipulation
SQL injection:
- Direct SQL queries with Core Data
- SQLite operations with user input
WebView vulnerabilities:
- loadHTMLString:baseURL:: Loading user content in WebViews
- JavaScript injection points

Identifying sensitive sinks in Mach-O binaries

Using Radare2 to find sensitive sinks:

# Look for C functions prone to memory corruption
r2 -qc 'aaa; afl~strcpy,strcat,sprintf' ./ios_app

# Find file operations
r2 -qc 'aaa; afl~NSFileManager,writeToFile' ./ios_app

# Identify command execution
r2 -qc 'aaa; afl~system,popen,NSTask' ./ios_app

Tracking taint flow

To track the flow of tainted data from sources to sinks in iOS applications, you can build a custom analysis script using r2pipe and an SMT solver:

Basic approach:
- Identify sources and sinks
- Track data flow through ESIL operations
- Check constraints to determine feasibility
Custom taint tracking script (concept):

import r2pipe
import z3

def track_taint_flow(binary_path, source_addr, sink_addr):
    r2 = r2pipe.open(binary_path)
    r2.cmd("aaa")

    # Mark source as tainted
    tainted_regs = set()
    tainted_mem = set()

    # Get execution path between source and sink
    path = get_execution_path(r2, source_addr, sink_addr)

    # Symbolically execute path and track taint
    solver = z3.Solver()
    sym_values = {}

    for instr in path:
        esil = r2.cmd(f"pie 1 @ {instr}")
        # Process ESIL and update taint sets
        # e.g., for "1,rax,=" we don't propagate taint
        # for "rbx,rax,=" we check if rbx is tainted

    # Check if taint reaches sink
    if is_sink_tainted(tainted_regs, tainted_mem, sink_addr):
        # Verify with SMT solver if this path is feasible
        if solver.check() == z3.sat:
            return True, solver.model()

    return False, None

Automating analysis workflows

Using Radare2 scripts

Create analyze.r2 script file:

# Setup
e asm.esil=true
aaa

# Find target
afl~check_password > funcs.txt

# Analyze first result
s `cat funcs.txt | head -n1 | awk '{print $1}'`

# Store ESIL output
pdf > esil_output.txt

# Find branch conditions
/E ==,?{ > conditions.txt

Run with:

r2 -i analyze.r2 -q ./target_binary

Using r2pipe for automation:

import r2pipe

def analyze_binary(path):
    r2 = r2pipe.open(path)
    r2.cmd("aaa")

    # Find all functions
    functions = r2.cmdj("aflj")

    # Find potential sources and sinks
    sources = []
    sinks = []

    for func in functions:
        if "URL" in func.get("name", ""):
            sources.append(func)
        elif any(sink in func.get("name", "") for sink in ["system", "exec", "eval"]):
            sinks.append(func)

    print(f"Found {len(sources)} potential sources and {len(sinks)} potential sinks")

    # Analyze each source-sink pair
    # ...

Practical examples

Analyzing URL scheme handler vulnerability

For an iOS application with a vulnerable URL handler:

# First, identify the URL handler function
r2 -qc 'aaa; afl~openURL,handleURL' ./ios_app

# Analyze the handler with ESIL
r2 -c 'e asm.esil=true; aaa; s <handler_address>; pdf' ./ios_app

Finding path traversal in file operations

# Identify file operation functions
r2 -qc 'aaa; afl~NSFileManager,writeToFile' ./ios_app

# Analyze for path manipulation operations
r2 -c 'e asm.esil=true; aaa; s <file_op_address>; pdf' ./ios_app

Identifying format string vulnerabilities

# Look for format string functions
r2 -qc 'aaa; afl~stringWithFormat,sprintf' ./ios_app

The password checker example

# Find function of interest
[0x0]> aaa
[0x0]> afl~check
0x100001280   24 512  -> 614   sym.check_password

# Examine with ESIL
[0x0]> s 0x100001280
[0x100001280]> e asm.esil=true
[0x100001280]> pdf
            ;-- func.100001280:
/ 24: sym.check_password (void *arg1);
|           ; arg void *arg1 @ rdi
|           0x100001280      55             8,rsp,-=,rbp,rsp,=[8]
|           0x100001281      4889e5         rsp,rbp,=
|           0x100001284      8a07           rdi,[1],al,=                    ; load first byte from input
|           0x100001286      3c41           0x41,al,==                      ; compare with 'A'
|           0x100001288      7520           0x41,al,==,!,?{,0x1000012aa,jmp,} ; jump if not equal
|           0x10000128a      8a4701         rdi,1,+,[1],al,=               ; load second byte
|           0x10000128d      3c42           0x42,al,==                     ; compare with 'B'
|           0x10000128f      7519           0x42,al,==,!,?{,0x1000012aa,jmp,} ; jump if not equal
|           0x100001291      31c0           eax,eax,^=                     ; set to true
|           0x100001293      c3             rsp,[8],rip,=,8,rsp,+=         ; return

We can trace this manually using ESIL stepping:

# Initialize ESIL VM
[0x100001280]> aei
[0x100001280]> aeim
[0x100001280]> aeip

# Set symbolic value for input (in rdi)
[0x100001280]> ae rdi="ABCD"  # Concrete input for demonstration

# Step through instructions
[0x100001280]> aes  # Step instruction
[0x100001281]> aes
[0x100001284]> aes  # Load first byte
[0x100001286]> aer al  # Check value in al
0x00000041  # A
[0x100001286]> aes  # Compare with 'A'
[0x100001288]> aes  # Branch check
[0x10000128a]> aes  # Load second byte
[0x10000128d]> aer al
0x00000042  # B

Limitations and recommendations

Radare2 limitations

No native solver: Requires external SMT solver integration via r2pipe
Manual tracing: Much of the symbolic analysis requires manual tracing
ESIL limits: Some complex operations may not be fully represented
No built-in taint tracking: Requires manual tracking of data flow

Recommendations

Start with simple examples to understand ESIL representation
For complex iOS apps, focus on specific functions rather than whole-program analysis
Combine static analysis with dynamic ESIL emulation
Consider integrating an SMT solver (like Z3) with r2pipe for constraint solving
Use class-dump or class-dump-z to better understand Objective-C class structure
Build custom scripts for specific analysis tasks
For Swift-heavy apps, consider complementary tools like Ghidra with its Swift support