How why we rewrote Circom in Rust 🦀
Jolt uses R1CS to constrain a subset of the VM's functionality to ensure that the virtual CPU's fetch-decode-execute loop is constrained to the RISC-V spec.
For example, we must ensure the 32-bit ELF row decodes correctly. Each 32-bit RV32I R-type instruction decodes to a tuple (rd, opcode, rs1, rs2). Jolt uses the unpacked versions of each of these in other constraints, so we'll need to constrain
instruction = rd || opcode || rs1 || rs2
In total we have around 30 individual constraints of similar format.
Constraints are expressed as a combination of degree-1 polynomials Az * Bz - Cz = 0. For example we might want to express that 4 8-bit field elements [Q_0, Q_1, Q_2, Q_3] pack into a 32-bit field element R, based on a condition S. In other words:
if (S) R == Q_0 || Q_1 || Q_2 || Q_3
In R1CS format:
Az = S
Bz = Q_0 2^8 * Q_1 2^16 * Q_2 2^24 * Q_3
Cz = R
Our full constraint: S * (Q_0 2^8 * Q_1 2^16 * Q_2 2^24 * Q_3 ) - R == 0
If S is one, R must be Q_0 || Q_1 || Q_2 || Q_3. We describe this state as the circuit being "satisfied".
Back to Jolt. Jolt is a multilinear SNARK so we'll need a multilinear prover. We started with Srinath Setty's Spartan as it was the fastest implemented multilinear SNARK. We started by using Circom to express all of the constraints described above.
But it was too slow! Jolt has 30 constraints per cycle of the CPU. At 2^24 steps the total number of constraints blows up to 2^29. The A, B, C matrices have dimensions:
- COLS = NUM_CYCLES * NUM_INPUTS
- ROWS = NUM_CYCLES * NUM_CONSTRAINTS
So the prover costs are increasing as a square of NUM_CYCLES... alarming.
@moodlezoup likes to say the only real optimization tool is taking advantage of repeated structure. Here we have 2^24 repeats of exactly the same fairly small computation. But we were outsourcing the load to Circom's libraries and improperly amortizing costs across these repeated cycles. To take full advantage of the structure, we knew we'd have to vertically integrate Circom.
Circom fundamentally performs two discrete functions for us:
1. DSL for describing linear combinations over the input variables
2. Computation of the witness z
Simple Symbolic DSL
For Jolt we defined a JoltIn enum of all the inputs to the circuit, then defined std::ops::{Add, Subtract, Multiply} over the enum and i64 constants. These aggregate into a LinearCombination struct which stores a vector of Term: (i64, JoltIn).
- JoltIn::A JoltIn::B -> LinearCombination = [(1, JoltIn::A), (1, JoltIn::B)]
- 12i64 * JoltIn::A -> LinearCombination = [(12, JoltIn::A)]
- 10i64 * LinearCombination - 12 -> LinearCombination = [(120, JoltIn::A), (-12, JoltIn::Constant)]
You can view this arithmetic as symbolic arithmetic in the variables JoltIn.
Now we can define constraints in natural language.
let packed == JoltIn::Q_0 2^8 * JoltIn::Q_1 2^16 * JoltIn::Q_2 2^24 JoltIn::Q_3;
cs.constrain_eq(packed, JoltIn::R);
Witness Computation
We can now compute the witness vector 'z' directly, using the relevant linear combinations.
let lc = vec![(120, JoltIn::A), (-12, JoltIn::Constant)];
let z_i = 120*JoltIn::A - 12;
By vertically integrating all of this logic we were able to take full advantage of the structure, accelerating key bottlenecks and increasing the readability of our constraint system (~100 LOC).