Symbolic execution and taint analysis for Mach-O binaries
Table of contents
- Overview
- Core concepts
- Analysis tools
- ESIL: The foundation for symbolic analysis
- Performing symbolic execution
- Taint analysis techniques
- iOS security analysis
- Automating analysis workflows
- Practical examples
- Limitations and recommendations
Overview
Symbolic execution and taint analysis help explain how data moves through Mach-O binaries when source code is unavailable. Symbolic execution reasons about reachable paths and required inputs. Taint analysis tracks untrusted data from entry points to sensitive operations.
This article focuses on practical analysis of macOS and iOS binaries with Radare2, ESIL, r2pipe, and SMT solvers. The goal is to identify feasible data flows, document assumptions, and decide where dynamic validation is needed.
Core concepts
Symbolic execution
Symbolic execution is a program analysis technique that executes programs with symbolic values instead of concrete inputs. When a program is executed symbolically:
- Inputs are represented as symbolic values (variables) rather than concrete values
- Program execution builds path constraints that represent conditions for following specific paths
- These constraints accumulate as the program encounters branch conditions
- At any point, constraints can be solved to find concrete input values that would reach that program state
Symbolic execution helps answer questions like:
- Which program paths are reachable
- What inputs are needed to reach specific code locations
- Whether vulnerabilities can be triggered
The technique constructs a mathematical formula that represents program behavior, which can be solved to produce inputs that follow specific execution paths.
Taint analysis
Taint analysis is a technique for tracking potentially untrusted data (taint) through a program’s execution to identify security vulnerabilities:
- Specific inputs are marked as “tainted” (untrusted)
- The analysis engine tracks how tainted data propagates through operations
- When tainted data reaches sensitive functions (sinks), potential vulnerabilities are flagged
For example, if user input (tainted) flows into a SQL query without proper sanitization, a SQL injection vulnerability may exist.
Taint tracking rules include:
- Direct assignment: If
y = xandxis tainted, thenybecomes tainted - Arithmetic/string operations: If any operand is tainted, the result is tainted
- Control dependencies: Conditions tested on tainted data can taint variables in conditionally executed code
Relationship between techniques
While symbolic execution and taint analysis are distinct techniques, they complement each other:
- Taint analysis can run without symbolic execution by tracking data flows.
- Symbolic execution improves taint analysis by checking whether a tainted path is feasible.
- Used together, the techniques help determine whether untrusted input can reach sensitive operations.
SMT solvers
SMT (Satisfiability Modulo Theories) solvers are core components of symbolic execution engines. They determine whether a set of constraints has a solution and, if so, find concrete values that satisfy those constraints.
Key aspects of SMT solvers include:
- Decision procedures for different theories (e.g., integers, arrays, bit-vectors)
- Constraint solving to find concrete values satisfying path constraints
- Model generation to produce example inputs that follow specific execution paths
Popular SMT solvers include:
- Z3 (Microsoft Research)
- Boolector (specialized for bit-vectors)
- CVC4/CVC5
- Yices
SMT solvers translate program constraints into logical formulas and attempt to find assignments to variables that make these formulas true. This allows symbolic execution engines to determine if execution paths are feasible and what inputs would follow those paths.
Analysis tools
Radare2 fundamentals
Radare2 is an open-source reverse engineering framework with useful primitives for Mach-O inspection, disassembly, emulation, and automation.
Setting up Radare2
# Basic opening of a Mach-O binary
r2 ./target_binary
# Opening with analysis
r2 -A ./target_binary
Initial analysis commands
# Inside Radare2
aaa # Analyze all symbols and functions
afl # List all functions
is # List symbols
Mach-O specific commands
# Mach-O header information
iM # Show Mach-O header
iH # Show Mach-O header details
iHl # Show Mach-O load commands
iHm # Show Mach-O memory map
iHs # Show Mach-O sections
# For Objective-C (common in Mach-O)
ic # List Objective-C classes
icl # List Objective-C class methods
ESIL: The foundation for symbolic analysis
ESIL (Evaluable Strings Intermediate Language) is the key to symbolic execution in Radare2. It provides an architecture-independent representation of assembly instructions.
ESIL basics
ESIL features:
- Stack-based notation where operands are pushed and operations pop/push results
- Architecture-independent representation
- Easy to parse and evaluate programmatically
Viewing ESIL instructions
# Enable ESIL in disassembly
e asm.esil=true
# Disassemble function with ESIL
pdf@<function_name>
Example ESIL conversion
# x86_64 Assembly to ESIL
mov rax, 1 -> 1,rax,=
cmp rax, rbx -> rbx,rax,==,?{,},
je target -> rbx,rax,==,?{,target,jmp,}
ESIL’s stack-based format makes it ideal for symbolic execution because:
- It simplifies architecture independence
- Operations can be modeled mathematically
- Symbolic values can be tracked through stack operations
ESIL execution model
ESIL operates as a virtual machine with:
- A stack for operations
- Register and memory states
- Conditional evaluation
For example, the ESIL expression 1,rax,= means:
- Push the value 1 onto the stack
- Push the register identifier “rax” onto the stack
- Execute the assignment operation “=” which pops the top two values and assigns the first to the second
This can be extended to symbolic values, where instead of concrete values like “1”, symbolic variables are pushed to the stack and propagate through operations.
Performing symbolic execution
Using Radare2 directly
Basic ESIL emulation
# Initialize ESIL environment
aei # Initialize ESIL VM
aeim # Initialize ESIL memory
aeip # Set instruction pointer to current address
# Step through code
aes # Step one instruction
aesb # Step until branch
aesu <address> # Continue until address
Debugging ESIL execution
# Inspect ESIL state
aer # Show ESIL registers
aest # Show ESIL stack
aep # Show ESIL debug info
Building a custom symbolic execution engine
To build a more advanced symbolic execution engine with Radare2 and an SMT solver, r2pipe can be used to:
- Extract ESIL expressions from disassembly
- Translate ESIL to solver constraints
- Track path conditions
- Generate solutions for execution paths
Example basic approach in Python (conceptual):
import r2pipe
import z3 # SMT solver
# Connect to Radare2
r2 = r2pipe.open("./binary")
r2.cmd("aaa") # Analyze all
# Set up symbolic variables
solver = z3.Solver()
sym_input = z3.BitVec('input', 64)
# Analyze a function
function_addr = "0x100001280"
instructions = r2.cmd(f"pdf @ {function_addr}")
# Track constraints
path_constraints = []
# Process ESIL for each instruction
# ...
# Check if path is reachable
if solver.check() == z3.sat:
model = solver.model()
print(f"Solution: input = {model[sym_input]}")
else:
print("Path is unreachable")
Taint analysis techniques
Manual approach with Radare2
While Radare2 doesn’t have dedicated taint analysis, ESIL can be used for basic taint tracking:
# Setup for tracking
[0x100001280]> aei
[0x100001280]> aeim
[0x100001280]> aeip
# Mark input as tainted
[0x100001280]> "ae rdi=0x12345678"
# Trace execution and watch data flow
[0x100001280]> aesot # Step with trace
[0x100001284]> aer # Check register values
Building custom taint analysis
Building a taint tracker requires:
- Defining sources (inputs)
- Defining sinks (sensitive functions)
- Defining propagation rules
- Tracking taint through execution
Example taint propagation rules for ESIL:
- Assignment operations:
val,reg,=-> Ifvalis tainted,regbecomes tainted - Arithmetic:
reg1,reg2,+,reg3,=-> If eitherreg1orreg2is tainted,reg3becomes tainted - Memory:
reg1,[4],reg2,=-> If memory atreg1is tainted,reg2becomes tainted
iOS security analysis
Common sources and entry points
iOS applications have several common entry points for untrusted data that should be marked as tainted:
- User input sources:
- Text field inputs (
UITextField): User-provided text - Web views (
WKWebView,UIWebView): HTML, JavaScript execution - URL Scheme handlers (custom urls): External app communication
- Deep links: URL parameters passed to the app
- Pasteboard data: Content from clipboard
- Text field inputs (
- Network sources:
NSURLSessionandNSURLConnection: API responses- WebSocket communications: Real-time data
- RESTful API responses: JSON/XML data
- Push notifications: Remote notification payload
- Storage sources:
NSUserDefaults: Potentially modifiable preferences- File system access: External files
- Keychain: If accessible to other processes
- SQLite databases: User or downloaded data
- IPC mechanisms:
- XPC services: Inter-process communication
- App extensions: Data from extensions
Identifying entry points in Mach-O binaries
Using Radare2 to identify these entry points:
# Look for networking APIs
r2 -qc 'aaa; afl~NSURLSession,NSURLConnection' ./ios_app
# Look for file operation classes
r2 -qc 'aaa; afl~NSFileManager,NSData' ./ios_app
# Check URL handling
r2 -qc 'aaa; afl~handleOpenURL,application:openURL' ./ios_app
Using class/method information:
# List Objective-C classes in the binary
r2 -qc 'aaa; ic' ./ios_app
# List methods of specific classes
r2 -qc 'aaa; ic NSURLSession' ./ios_app
Sensitive sinks
Common sensitive sinks in iOS applications:
- Code execution:
system(),popen(): Command executionNSTask: Process executiondlopen(),dlsym(): Dynamic library loading- JavaScript evaluation:
stringByEvaluatingJavaScriptFromString:
- Memory corruption:
- C string functions:
strcpy(),strcat(),sprintf() - Buffer operations without bounds checking
- Format string vulnerabilities:
[NSString stringWithFormat:]with user input
- C string functions:
- File system operations:
NSFileManager: File operationswriteToFile:atomically:: File writesURLByAppendingPathComponent:: Path manipulation
- SQL injection:
- Direct SQL queries with Core Data
- SQLite operations with user input
- WebView vulnerabilities:
loadHTMLString:baseURL:: Loading user content in WebViews- JavaScript injection points
Identifying sensitive sinks in Mach-O binaries
Using Radare2 to find sensitive sinks:
# Look for C functions prone to memory corruption
r2 -qc 'aaa; afl~strcpy,strcat,sprintf' ./ios_app
# Find file operations
r2 -qc 'aaa; afl~NSFileManager,writeToFile' ./ios_app
# Identify command execution
r2 -qc 'aaa; afl~system,popen,NSTask' ./ios_app
Tracking taint flow
To track the flow of tainted data from sources to sinks in iOS applications, you can build a custom analysis script using r2pipe and an SMT solver:
- Basic approach:
- Identify sources and sinks
- Track data flow through ESIL operations
- Check constraints to determine feasibility
- Custom taint tracking script (concept):
import r2pipe
import z3
def track_taint_flow(binary_path, source_addr, sink_addr):
r2 = r2pipe.open(binary_path)
r2.cmd("aaa")
# Mark source as tainted
tainted_regs = set()
tainted_mem = set()
# Get execution path between source and sink
path = get_execution_path(r2, source_addr, sink_addr)
# Symbolically execute path and track taint
solver = z3.Solver()
sym_values = {}
for instr in path:
esil = r2.cmd(f"pie 1 @ {instr}")
# Process ESIL and update taint sets
# e.g., for "1,rax,=" we don't propagate taint
# for "rbx,rax,=" we check if rbx is tainted
# Check if taint reaches sink
if is_sink_tainted(tainted_regs, tainted_mem, sink_addr):
# Verify with SMT solver if this path is feasible
if solver.check() == z3.sat:
return True, solver.model()
return False, None
Automating analysis workflows
Using Radare2 scripts
Create analyze.r2 script file:
# Setup
e asm.esil=true
aaa
# Find target
afl~check_password > funcs.txt
# Analyze first result
s `cat funcs.txt | head -n1 | awk '{print $1}'`
# Store ESIL output
pdf > esil_output.txt
# Find branch conditions
/E ==,?{ > conditions.txt
Run with:
r2 -i analyze.r2 -q ./target_binary
Using r2pipe for automation:
import r2pipe
def analyze_binary(path):
r2 = r2pipe.open(path)
r2.cmd("aaa")
# Find all functions
functions = r2.cmdj("aflj")
# Find potential sources and sinks
sources = []
sinks = []
for func in functions:
if "URL" in func.get("name", ""):
sources.append(func)
elif any(sink in func.get("name", "") for sink in ["system", "exec", "eval"]):
sinks.append(func)
print(f"Found {len(sources)} potential sources and {len(sinks)} potential sinks")
# Analyze each source-sink pair
# ...
Practical examples
Analyzing URL scheme handler vulnerability
For an iOS application with a vulnerable URL handler:
# First, identify the URL handler function
r2 -qc 'aaa; afl~openURL,handleURL' ./ios_app
# Analyze the handler with ESIL
r2 -c 'e asm.esil=true; aaa; s <handler_address>; pdf' ./ios_app
Finding path traversal in file operations
# Identify file operation functions
r2 -qc 'aaa; afl~NSFileManager,writeToFile' ./ios_app
# Analyze for path manipulation operations
r2 -c 'e asm.esil=true; aaa; s <file_op_address>; pdf' ./ios_app
Identifying format string vulnerabilities
# Look for format string functions
r2 -qc 'aaa; afl~stringWithFormat,sprintf' ./ios_app
The password checker example
# Find function of interest
[0x0]> aaa
[0x0]> afl~check
0x100001280 24 512 -> 614 sym.check_password
# Examine with ESIL
[0x0]> s 0x100001280
[0x100001280]> e asm.esil=true
[0x100001280]> pdf
;-- func.100001280:
/ 24: sym.check_password (void *arg1);
| ; arg void *arg1 @ rdi
| 0x100001280 55 8,rsp,-=,rbp,rsp,=[8]
| 0x100001281 4889e5 rsp,rbp,=
| 0x100001284 8a07 rdi,[1],al,= ; load first byte from input
| 0x100001286 3c41 0x41,al,== ; compare with 'A'
| 0x100001288 7520 0x41,al,==,!,?{,0x1000012aa,jmp,} ; jump if not equal
| 0x10000128a 8a4701 rdi,1,+,[1],al,= ; load second byte
| 0x10000128d 3c42 0x42,al,== ; compare with 'B'
| 0x10000128f 7519 0x42,al,==,!,?{,0x1000012aa,jmp,} ; jump if not equal
| 0x100001291 31c0 eax,eax,^= ; set to true
| 0x100001293 c3 rsp,[8],rip,=,8,rsp,+= ; return
We can trace this manually using ESIL stepping:
# Initialize ESIL VM
[0x100001280]> aei
[0x100001280]> aeim
[0x100001280]> aeip
# Set symbolic value for input (in rdi)
[0x100001280]> ae rdi="ABCD" # Concrete input for demonstration
# Step through instructions
[0x100001280]> aes # Step instruction
[0x100001281]> aes
[0x100001284]> aes # Load first byte
[0x100001286]> aer al # Check value in al
0x00000041 # A
[0x100001286]> aes # Compare with 'A'
[0x100001288]> aes # Branch check
[0x10000128a]> aes # Load second byte
[0x10000128d]> aer al
0x00000042 # B
Limitations and recommendations
Radare2 limitations
- No native solver: Requires external SMT solver integration via r2pipe
- Manual tracing: Much of the symbolic analysis requires manual tracing
- ESIL limits: Some complex operations may not be fully represented
- No built-in taint tracking: Requires manual tracking of data flow
Recommendations
- Start with simple examples to understand ESIL representation
- For complex iOS apps, focus on specific functions rather than whole-program analysis
- Combine static analysis with dynamic ESIL emulation
- Consider integrating an SMT solver (like Z3) with r2pipe for constraint solving
- Use class-dump or class-dump-z to better understand Objective-C class structure
- Build custom scripts for specific analysis tasks
- For Swift-heavy apps, consider complementary tools like Ghidra with its Swift support