We now have the computation that we want to prove, nicely expressed as a sequence of lines that simply assign the result of the multiplication of sums of variables to a new variable: great!
That’s not enough though: to build our “Pinocchio” zk-Snark that can prove that we did indeed run our computation, without revealing the values used when doing it, we need to take it one step further: make another transformation that describes the same computation in a different format.
For this new format, we’ll introduce a vector, w, that starts with the number 1, and then has, for each subsequent field, a place “reserved” to each of the variables of the computation. So, an instance of this w vector, is made of the values associated to all the variables used during the computation. The order of these variables will have some importance, related to the concept of “public input” (maybe I’ll write about this in a future post); for the time being, let’s see a possible w vector related to our example computation: w = (1, y, x, T1, z, T2, T3).
It may be useful to recall some of the definitions seen in the previous posts and connect them to our example:
Our problem is:
y = (x+z)^2 + z +1
If x=1 and z=2, we have an instance of the problem, and if we unroll the computation we get:
With the above, we are ready to make another reduction (reduction = transforming one problem into another problem), and express the computation in a new format, a “rank-1 constraint system” (R1CS). This new format is not really “human readable”, so this translation will typically be performed by some kind of “compiler”. The idea of R1CS is that it keeps track of the values that each variable assumes during the computation, and binds the relationships among all those variables that are implied by the computation itself.
These “relationships implied by the computation”, usually called “constraints” or “gates”, are easy to spot in our example: we have one for each line of the “unrolled computation” seen before. Our constraints are then only made of multiplications of sums of variables, e.g. 3*y = (2*p + 4*f) * (7*g + 2*k +3*j). For each line, i.e. for each constraint or gate, R1CS asks us to calculate three vectors a, b and c so that:
w °c = w °a * w °b
The operator “°” means that we multiply the values of the two “zipped” vectors that are in the same vector position: first_position * first_position, second_position * second_position, ... and then sum up the results of all these multiplications together, ending up with just a number as a result.
Time to go back to our example, and look at its first constraint:
T1 = x * x
We know that each position of the w vector, excluding the first one, is associated to a variable: (1, y, x, T1, z, T2, T3). So the variable T1 is associated to the 4th position, and x to the 3rd one.
If we understand how the operator “°” works, we can see that the vectors a and b will have to have an effect on the x variable, which is in 3rd position, as we are using it to represent " x * x ". So a and b for this constraint will both be the vector (0,0,1,0,0,0,0), while the vector c, when zipped to w, will output T1, the left side of the equation, and so the c vector for this constraint will be (0,0,0,1,0,0,0).
We saw before that w=(1, 11, 1, 1, 2, 4, 4) is a solution of our computation, i.e. the vector has in it all the values of all the variables used during the computation. It’s easy to verify that w °c = w °a * w °b, in fact
w°c | = | (1, 11, 1, 1, 2, 4, 4)°(0,0,0,1,0,0,0) | = | 1*0 + 11*0 + 1*0 + 1*1 + 2*0 + 4*0 + 4*0 | = | 1 |
w°a | = | (1, 11, 1, 1, 2, 4, 4)°(0,0,1,0,0,0,0) | = | 1*0 + 11*0 + 1*1 + 1*0 + 2*0 + 4*0 + 4*0 | = | 1 |
w°b | = | (1, 11, 1, 1, 2, 4, 4)°(0,0,1,0,0,0,0) | = | 1*0 + 11*0 + 1*1 + 1*0 + 2*0 + 4*0 + 4*0 | = | 1 |
So 1 = 1*1
Now, we can see that these three specific vectors, (a, b and c) together represent the first constraints of the computation, a constraint that binds the variables x and T1.
Let’s look at another constraint, i.e. another line of the computation, let’s pick the third line:
T3 = 2 * x * z
Our a vector this time needs to output 2 * x, so it will be the vector (0,0,2,0,0,0,0), while the b vector outputs just the variable z, so b will be the vector (0,0,0,0,1,0,0)
The c vector outputs just the variable T3, that is in the last position of the vector, then c = (0,0,0,0,0,0,1).
The vector w is always the same for all the constraints of this instance of the problem, w=(1, 11, 1, 1, 2, 4, 4). Also this time it’s easy to verify that w °c = w °a * w °b:
w°c | = | (1, 11, 1, 1, 2, 4, 4)°(0,0,0,0,0,0,1) | = | 1*0 + 11*0 + 1*0 + 1*0 + 2*0 + 4*0 + 4*1 | = | 4 |
w°a | = | (1, 11, 1, 1, 2, 4, 4)°(0,0,2,0,0,0,0) | = | 1*0 + 11*0 + 1*2 + 1*0 + 2*0 + 4*0 + 4*0 | = | 2 |
w°b | = | (1, 11, 1, 1, 2, 4, 4)°(0,0,0,0,1,0,0) | = | 1*0 + 11*0 + 1*0 + 1*0 + 2*1 + 4*0 + 4*0 | = | 2 |
And that sums up to 4 = 2*2
We have expressed another constraint!
Let’s write our vectors vertically. In this last example, related to the third line of the computation, we have:
w | ° | c | = | w | ° | a | * | w | ° | b |
1 | 0 | 1 | 0 | 1 | 0 | |||||
11 | 0 | 11 | 0 | 11 | 0 | |||||
1 | 0 | 1 | 2 | 1 | 0 | |||||
1 | ° | 0 | = | 1 | ° | 0 | * | 1 | ° | 0 |
2 | 0 | 2 | 0 | 2 | 1 | |||||
4 | 0 | 4 | 0 | 4 | 0 | |||||
4 | 1 | 4 | 0 | 4 | 0 | |||||
4 | = | 2 | * | 2 |
If we collect all of the vectors a, b and c, then all of them together express the original computation. If you’re lost and can’t grasp the reasoning behind this last statement, it may be helpful to see that, to calculate the vector w, we need to compute each step of the computation, so that we can know the value of each used variable. Then, if for each of the above collections of vectors a, b and c, we can check that w °c = w °a * w °b, that means that indeed w has the values assigned during the computation, i.e. that the computation was indeed performed.