zk-SNARK

zk-SNARK: Succinct Non-interactive ARgument of Knowledge

Applications:

Study Resources and Current State

Study Resource

Linear Algebra: Strang's Book, 3B1B
Abstract Algebra: A first course in abstract algebra by Fraleigh and Algebra chapter 0 by Aluffi.
Fourrier transform: 3B1B
Cryptography: Introduction to Mathematical Cryptography by Hoffstei, then Dan Boneh’s courses in Coursera
ZK: ZK Whiteboard sessions, then Plonk implementations, gnark and arkworks codebase too.

If you already a pro, you might start with proving BrainFuck

https://aszepieniec.github.io/stark-brainfuck
https://neptune.cash/learn/brainfuck-tutorial

Study Resources: please read them in order

To understand math and this paper: - If you like courses: you need Crypto I CS255 - Coursera prereques and maybe Stanford CS355 and Number Theory. - Modern Zero Knowledge Cryptography - 4 week course on ZK - Berkeley Course on ZK and recording - Or Graduate Course on Applied Cryptography - You can also use ZCash Documentation or this repository - Or even a course by ZK-DAO (login as guest)

To learn Halo2 Circuit Proving System: Zero Knowledge Proof — A Guide to Halo2 Source Code

To learn Plonk2 Circuit Proving System:

To learn Noir Language:

To learn Circom:

To learn RISC Zero zkVM: Documentation and Examples

To solve puzzles and learn arkworks: ZKHack Resource

To learn quantization

Current State

Firstly, this repository summarizes projects related to ZKML.

Secondly, this repository is everything about ZK. (Not necessarily about ML.)

Other ZK ML Projects:

Other ZK Projects:

Other Encryption Projects:

The Foundations

Arithmetic Circuits

Arithmetic circuits: a function C : \mathbb{F}^n \to \mathbb{F}

You can think of arithmetic circuits as a general computer circuit with one output that has integer range [0, p-1] for some prime. (since practical computers are intrinsically modular due to overflow)

boolean-SAT: a problem asking given boolean circuit C(...), whether there exists a set of x_i boolean input, such that C(x_1, x_2, ..., x_n) = 0.

Note that boolean-SAT is NP-complete. But if I an the prover who obtain a set of witness x_i that satisfies C, I wish that I am able to convince the verifier that C is satisfiable without giving out my x_i witness.

All polynomial-time computation can be captured by Poly-Arithmetic Circuits

Argument System

Argument System:

Without ZK, the prover need only simply send w to the verifier to convince the verifier that C is satisfiable.

Interactive Argument System: prover and verifier can have many rounds of interaction to raise the confidence of the verifier.

Non-interactive Argument System: prover send only one thing once to verifier.

Non-interactive Argument System is needed with many verifiers. Example: rollup server would be crowded with challenges from nodes all over the world if using interactive argument system.

Properties of zk-SNARK

Completeness: if a prover has correct w that satisfies C, then verifier will be convinced.

(\forall x, w)(C(x, w) = 0 \implies Pr\{V(x, P(x, w)) = \text{ accept}\} = 1)

Argument of Knowledge:

Prover Knows: means w can be "extracted" from the prover. Formally, if there exist a forger who is able to convince the verifier, then there exist an algorithm called "extractor" who can use the forger as a blackbox to get w. So the forger isn't really a forger.

Formal Definition of Knowing

Formal Definition of Knowing

Above definition assumes computational bounded adversary, so it is called "argument of knowledge", not a "proof of knowledge".

Zero Knowledge: (x, \pi) "reveals nothing" about w.

Learn Nothing: there exist an efficient algorithm "simulator" that generates a proof \pi given C, S_p, S_v (without using w) such that the simulated \pi is in the distribution of real proof \pi.

Formal Definition of Learning Nothing

Formal Definition of Learning Nothing

Note that the simulator can choose S_p, S_v freely but a real prover is given S_p from initialization.

Succinct: with a constant security parameter \lambda

// TODO: some source says that \pi need to be in log(|w|), which I don't know if it is correct. I don't have intuition on relation between circuit size and the number of input digits.

If you observe "Succinct" requirement, then you may discover that \log|C| is not achievable since verifier at least need to read C in O(|C|) for correctness reason. But the verifier is allowed to pre-process C and summarize C to S_v that has length O(\log|C|). So the verifier can look like: V(S_v, x, \pi).

A trivial argument system: just sending w will break

Pre-Processing and Initialization

In order to satisfy succinct requirement, preprocessing technique is needed to summarize the circuit C to the size of \log|C|.

There are a few techniques on how to do this: in general we the preprocess algorithm S takes in a circuit, and generate tuple (S_p, S_v) a summary for the prover and verifier.

Trusted Setup Per-circuit:

  1. network admin choose a random secret string \lambda for every C
  2. admin run S(\lambda, C) to produce (S_p, S_v)
  3. admin destroy \lambda. Non-admin believe in faith that admin destroyed \lambda and act non-maliciously. If \lambda leaked, anybody can prove false statement.

So classically, when layer2 rollup chains starts, founders will gather people to run setups. Each person run their own algorithm and hope at least one of them throw away the secret.

Trusted but Updatable:

  1. network admin choose a random secret string \lambda for every C
  2. admin run S_{init}(\lambda) to produce U
  3. admin destroy \lambda. Non-admin believe in faith that admin destroyed \lambda and act non-maliciously. If \lambda leaked, anybody can prove false statement.
  4. Everybody can run S_{pre}(U, C) to produce (S_p, S_v)

Here is Vitalik's Blog on How Trusted Setup Worked.

More zk-SNARK circuits and constraints require longer setup time. ZCash setup took about 24 hours on a fast machine, and requires a 97G download and 49G upload. Rollup requires more than 260 million constraints, so it must compute 2^{28} powers of tau.

Transparent: much more costly than above two methods, but does not rely on trusted admin and requires no secret data.

Types of SNARKs

Types of SNARKs (partial list)

Types of SNARKs (partial list)

Types of SNARKs (partial list) with exact time

Types of SNARKs (partial list) with exact time

Kilian'92, Micali'94: succinct transparent arguments for PCP, but with impractical prover time

GGPR'13, Groth'16: quasi-linear prover time, constant size proof, trusted, per-circuit setup (most application use Groth'16)

Sonic'19, Marlin'19, Plonk'19: universal updatable

DARK'19, Halo'19, STARK, ...: no trusted setup (transparent)

Typically, people use Groth'16:

SNARKs Software System

SNARKs Software System: each green arrow is an algorithm. The computational intense part is to produce the proof.

SNARKs Software System: each green arrow is an algorithm. The computational intense part is to produce the proof.

and plus Noir, CirC(build your own DSL) for high-level language. and plus PIL for low-level language.

Example: We want to show that SHA256(w) = x

We first write in higher language ZoKrates:

// domain-specific language
def main(field x[2], private field w) -> (field):
  h = sha256packed(w)
  h[0] == x[0] // check first 128 bits
  h[1] == x[1] // check last 128 bits
  return 1

Then we compile above to R1CS program (arithmetic circuit representation) over \mathbb{F}

Whiteboard Sessions by Dan Boneh

ZKML

Promises of ZK

Asked by Justin: The person asking for the result doesn't want to disclose their input (e.g. their personal health information) and the model provider does not want to disclose their model weights (e.g. if their model is proprietary). This is possible.

Reply by Jason: Yes, you could combine MPC+ZK to do that. If part of the model can be public there might be a way to play some games with recursion / composition, but it would probably leak information.

Reply by Elisey: i guess it can be combined in the following way. let’s say there are two models:

first model is run in users device. second on behalf of provider. one should be really careful when designing such system to avoid leaking information. two main vectors of leakage:

Current Issues of ZKML

Ryan | Modulus Labs: In terms of verifying e.g. that a model is robust to extraction attacks, data poisoning, adversarial attacks, etc., it's a totally different story (and folks have posited that this may not even be possible when working in such high-dimensional feature spaces) -- I'm unfortunately behind on SoTA AI literature for this subfield in particular, but folks are certainly increasingly interested in robustness and explainability as a research topic.

"It's possible to extract model weights just by querying the model on certain inputs, so even if you have an ML black box it's not going to be private in many respects, especially not towards the model"

I think that it's a very interesting question to apply a lot of the already existing work on "model compression" and "knowledge distillation" so that ML models can produce a witness that is more closely compatible with provers than simply naively compiling a NN to a circuit

Non-linear layers like Softmax are harder to implement in proofs.

Hey, right now the state of art in ZKML is in its very early stages, nothing that you can just plug an play for even mid sized models, let alone big models.

Remaining Questions

// QUESTION: what is Recursive zkSNARKs? (https://github.com/lyronctk/zator Proving the execution of arbitrarily deep neural networks with recursive SNARKs.)

// QUESTION: I don't quiet understand this conversation: https://t.me/zkmlcommunity/812 and https://t.me/zkmlcommunity/825

// QUESTION: Read this paper: https://eprint.iacr.org/2022/1508 "Indeed, the case that we consider — where Alice’s program is large — is extremely well motivated: the program P could be an ML model with billions of painstakingly learned parameters.

// TODO: read this when you have sufficient knowledge: https://polybase.xyz/blog/polybase-zk-database-polygon-miden

// TODO: not sure what these two guys are discussing: https://t.me/zkmlcommunity/1314

// TODO: watch the following

  1. OpenZL: Middleware and Open Standards for the Next Generation of zkApps, Brandon Gomes, Manta Network
  2. Ring VRFs from zero knowledge continuations, Jeff Burdges, Web3 Foundation
  3. A Zero-Knowledge circuit for the Lurk language, Eduardo Morais, Protocol Labs
  4. Arkworks: A Rust Ecosystem for zkSNARKs, Pratush Mishra, Aleo/UPenn
  5. Hardware acceleration of ZKP, Bowen Huang, Cysic
  6. Tutorial: Poseidon in ECLAIR, Todd Norton, Manta Network
  7. Considering Plonky2, Sebastien La Duca
  8. Multi-level IR and its utility in ZK, Brian Retford, RISC0
  9. On Interoperability of Crypto Compute Environments, Wei Dai
  10. CycloneNTT: Improving Twiddle Access for Number Theoretic Transforms, Rahul Maganti, Jump Crypto https://www.crowdcast.io/c/manta-openzl(or https://www.crowdcast.io/c/manta-openzl/kkomP)

// QUESTION: I don't know about this paper: https://eprint.iacr.org/2022/957 as part of this website: https://www.ulvetanna.io/news/introducing-ulvetanna

// QUESTION: I don't understand SNARKs utilization in BFT to reduce complexitiy: https://arxiv.org/pdf/2205.11652v2.pdf

// QUESTION: Halo2 FRI Gadget

// QUESTION:

Don't understand this image

Don't understand this image

Table of Content