Great time speaking at @summit_defi during @EFDevcon in Argentina🇦🇷! Some comments said the talk moved fast and packed in an overwhelming amount of information on fuzzing technology.
Here's a recap of my talk:
---------------------------------------------------
Part A: Understanding of Exploits & Challenges of Smart Contract Fuzzing
---------------------------------------------------
Key Insights and Core Concepts:
Value Extraction Exploit: At their core, exploits in DeFi are characterized by a two-step optimization process:
1. Sequence generation: Deciding which state-changing smart contract functions to invoke and in what order.
2. Parameter mutation: Continuously optimizing input parameters for these functions to trigger vulnerabilities.
- Fuzzing and State Change: The goal is to modify the state via sequences and parameters that lead to an exploit.
- Feasibility of fuzzing: Dimensionality reduction keeps the search tractable, as illustrated by real-world loop-based exploits and the various types of reducible actions seen in practice.
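To make the two-step view concrete, here is a minimal sketch of sequence generation followed by parameter mutation. Everything in it is an assumption for illustration: the toy state, the function names `set_key` and `withdraw`, the magic key 7, and the budgets are made up and do not come from the talk or any real protocol.

```python
import random

FUNCTIONS = ["set_key", "withdraw"]  # hypothetical state-changing functions


def run(sequence):
    """Execute a call sequence on a fresh toy state; True if value was extracted."""
    state = {"unlocked": False, "vault": 100, "attacker": 0}
    for name, arg in sequence:
        if name == "set_key":
            state["unlocked"] = (arg == 7)  # toy bug: guessable key
        elif name == "withdraw" and state["unlocked"]:
            amount = min(arg, state["vault"])
            state["vault"] -= amount
            state["attacker"] += amount
    return state["attacker"] > 0  # oracle: attacker ended up with funds


def fuzz(rng, max_rounds=5000, max_len=3, param_tries=32):
    for _ in range(max_rounds):
        # Step 1: sequence generation (which functions, in what order)
        names = [rng.choice(FUNCTIONS) for _ in range(rng.randint(1, max_len))]
        # Step 2: parameter mutation for that fixed sequence
        args = [rng.randrange(16) for _ in names]
        for _ in range(param_tries):
            candidate = list(zip(names, args))
            if run(candidate):
                return candidate
            args[rng.randrange(len(args))] = rng.randrange(16)
    return None


exploit = fuzz(random.Random(0))
print(exploit)
```

The search space here is tiny on purpose; real contracts are where the state explosion described below kicks in.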
Challenges of Smart Contract Fuzzing in Test Case Generation:
- State Explosion: The combinatorial explosion of possible function sequences and states.
- Multi-contract dependencies: Protocols often span multiple smart contracts interacting with one another.
- Proxy contracts and storage separation: Logic and data stored across different contracts complicate state tracking.
- External Calls: Protocols may invoke external contracts, adding layers of complexity and uncertainty.
Current Approaches for Test Case Generation:
- Sequence-based approach: Pseudo-random sequence mutation, with Read-After-Write (RAW) relationships constructed from the SLOAD and SSTORE opcodes.
- Custom invariants/property-based testing/specifications: Auditors with expert knowledge can specify test cases based on a deep understanding of the program under test (PUT).
- Snapshot-based approach: Exploring interesting states and mutating based on the chosen corpus.
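A rough sketch of the RAW idea, assuming we already have per-function sets of storage slots read (SLOAD) and written (SSTORE). The function names and slot numbers are made up, and scoring sequences by realized RAW pairs is one possible heuristic, not necessarily what any specific fuzzer does.

```python
from itertools import permutations

# Hypothetical storage access sets, as if recorded from SLOAD/SSTORE traces.
ACCESS = {
    "deposit":  {"reads": {0},       "writes": {0, 1}},
    "setFee":   {"reads": set(),     "writes": {2}},
    "withdraw": {"reads": {0, 1, 2}, "writes": {0, 1}},
}


def raw_edges(access):
    """Return (writer, reader) pairs where reader loads a slot that writer stores."""
    edges = set()
    for writer, w_acc in access.items():
        for reader, r_acc in access.items():
            if writer != reader and w_acc["writes"] & r_acc["reads"]:
                edges.add((writer, reader))
    return edges


def raw_score(sequence, edges):
    """Count RAW pairs the sequence realizes (writer placed before reader)."""
    pos = {name: i for i, name in enumerate(sequence)}
    return sum(1 for w, r in edges if pos[w] < pos[r])


EDGES = raw_edges(ACCESS)
best = max(permutations(ACCESS), key=lambda seq: raw_score(seq, EDGES))
print(sorted(EDGES), best)
```

Sequences that score higher are more likely to have later calls observe state written by earlier ones, which is the point of prioritizing them over purely random orderings.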
Challenges of Smart Contract Fuzzing in Input Parameter Generation:
- ABI-specific input types: Types such as string and address are uniquely defined by the ABI.
- Dynamic input types: Dynamic arrays, dynamic tuples, etc.
- Complex input types: Dynamic tuples, arrays, and compressed calldata increase fuzzing difficulty.
Current Approaches for Input Parameter Generation:
- LibAFL with a havoc strategy, assisted by ABI-type mutation (bitflip, RandMutator, …etc).
- Coverage-based feedback mechanism: Code-coverage metrics, distance metrics.
- Optimization algorithms: Leveraging algorithms such as particle swarm optimization, stochastic gradient descent, genetic algorithms, and learning-based methods.
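As an illustration of distance-guided parameter mutation, here is a small hill-climbing sketch. It is plain Python, not LibAFL code; the target branch (`value == target`), the 16-bit width, the mutators, and the budget are all assumptions chosen to keep the example short.

```python
import random


def branch_distance(value, target):
    """Distance to satisfying a hypothetical `value == target` branch."""
    return abs(value - target)


def mutate_uint(value, rng, bits=16):
    """Type-aware mutation for an unsigned int: bit flip or small arithmetic step."""
    if rng.random() < 0.5:
        value ^= 1 << rng.randrange(bits)   # flip one random bit
    else:
        value += rng.randint(-8, 8)         # small add/subtract
    return value % (1 << bits)              # wrap to stay inside the uint range


def search(target, rng, budget=20000):
    """Keep a mutation only if it moves closer to the branch condition."""
    best = 0
    best_dist = branch_distance(best, target)
    for _ in range(budget):
        if best_dist == 0:
            break
        cand = mutate_uint(best, rng)
        dist = branch_distance(cand, target)
        if dist < best_dist:
            best, best_dist = cand, dist
    return best


found = search(1337, random.Random(1))
print(found)
```

Pure coverage feedback would only report "branch not taken" here; the distance metric is what gives the mutator a gradient to follow.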
Fuzzing Jargon and Framework:
- Argument Initialization: Setting initial input values for fuzz testing.
- Sequence Generation: Creating sequences of contract calls to test.
- Mutation: Modifying input parameters for subsequent fuzzing iterations.
- Feedback Mechanism: Metrics like code coverage or distance to branch conditions that guide mutations.
- Oracle: In the fuzzing context, defines what constitutes a failure or exploit (not to be confused with price oracles).
- Scheduling: Deciding how much energy (mutation budget) each seed or state receives during fuzzing.
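To illustrate the scheduling term, here is a toy power schedule. The formula (reward seeds that found new coverage, decay with reuse) and the names `Seed`, `energy`, and `schedule` are hypothetical and only meant to show what "energy allocation" means in practice.

```python
from dataclasses import dataclass


@dataclass
class Seed:
    name: str
    new_edges: int      # coverage edges this seed discovered first
    times_fuzzed: int   # how often it has already been picked


def energy(seed, base=16, cap=256):
    """Hypothetical power schedule: more energy for novelty, less for reuse."""
    raw = base * (1 + seed.new_edges) // (1 + seed.times_fuzzed)
    return max(1, min(cap, raw))


def schedule(corpus):
    """Return (name, energy) pairs, highest energy first."""
    return sorted(((s.name, energy(s)) for s in corpus), key=lambda p: -p[1])


corpus = [Seed("a", 0, 0), Seed("b", 5, 1), Seed("c", 1, 9)]
plan = schedule(corpus)
print(plan)
```

Each seed would then be mutated roughly `energy` times before the fuzzer moves on to the next one.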
---------------------------------------------------
Part B: Proposed Solution - Three-Layer Fuzzing Framework
---------------------------------------------------
1. Large Language Model (LLM)-Guided Fuzzing
- Use LLMs for static and dynamic analysis to guide fuzzing intelligently. Four key components:
- Taint Analysis: Tracking data flow to identify relevant inputs.
- External Call Trace Analysis: Understanding call hierarchies and dependencies.
- Compressed Data Generation: Generating complex calldata inputs.
- Dynamic Runtime Information: Observing runtime behavior to guide mutation.
Example: Using an LLM to identify vulnerable code lines and map them to control flow graph (CFG) basic blocks to target fuzzing efforts.
- The LLM helps link caller and callee functions and work out which input parameters affect nested calls, which is crucial for mutating the correct parameters in complex functions like batchSwap.
2. State-Based Fuzzing Approach
- CFG-guided fuzzing with three phases:
--- Identify the basic block corresponding to a vulnerable branch.
--- Analyze opcode-level conditions (e.g., JUMPI, comparison opcodes) to discover which storage or arguments influence branch decisions.
--- Use distance metrics on storage and arguments to guide input mutation.
- Maintain a state pool: A repository of interesting blockchain states encountered during fuzzing, enabling reuse and combination to increase coverage.
- Introduce state diversity: Combine states from different execution paths to explore more scenarios.
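A minimal sketch of a state pool ordered by distance to a target branch. The `StatePool` class, the slot layout, and the `storage[slot] >= threshold` branch condition are assumptions for illustration; a real pool would hold full EVM snapshots rather than small dicts.

```python
import heapq


def distance_ge(storage, slot, threshold):
    """Distance to a hypothetical `storage[slot] >= threshold` branch."""
    return max(0, threshold - storage.get(slot, 0))


class StatePool:
    """Keeps distinct storage snapshots, closest-to-target first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker so snapshots are never compared

    def add(self, storage, distance):
        key = frozenset(storage.items())
        if key in self._seen:          # skip duplicate states
            return False
        self._seen.add(key)
        heapq.heappush(self._heap, (distance, self._counter, dict(storage)))
        self._counter += 1
        return True

    def closest(self):
        """Peek at the snapshot with the smallest distance, or None if empty."""
        return self._heap[0][2] if self._heap else None

    def __len__(self):
        return len(self._heap)


pool = StatePool()
for snap in [{0: 10}, {0: 90}, {0: 10}, {0: 55, 1: 1}]:
    pool.add(snap, distance_ge(snap, 0, 100))
print(len(pool), pool.closest())
```

Resuming mutation from `closest()` instead of from genesis state is what lets later fuzzing rounds build on earlier progress.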
3. GPU-Accelerated EVM Execution
- Transform smart contract bytecode into GPU-parallelizable code to massively speed up fuzzing.
- Enables running multiple fuzzing instances concurrently, enhancing exploration of the state space.
---------------------------------------------------
This recap is intentionally concise; the full talk goes much deeper. Watch it here if you want the unfiltered version:
youtube.com/watch?v=DidSdyN1….
I'm genuinely curious:
→ Which of the three layers (LLM-guided, state-pool CFG, or GPU acceleration) excites you most?
→ Have you already hit one of the fuzzing pain points I described in production?
→ Which of the open-ended questions the talk implicitly raises do you believe will shape the next 1-3 years of smart contract fuzzing?
→ Or any topics in AI / Security :]!
Drop it in the replies or DM me!