zk-SNARK: Succinct Non-interactive ARgument of Knowledge
\log(n) short proof
non-interactive
zero knowledge
argument (assume computational bounded), instead of a proof
Applications:
zkSNARK rollup for privacy (although rollup typically does not require zk, only SNARK)
confidential transactions (ZCash, Tornado Cash, IronFish)
private dapps code (Aleo)
prove solvency in transaction (e.g. prove that I have certain about of property)
prove transaction compliance to regulation (e.g. prove transaction is under 1000 dollars)
zk taxes
Linear Algebra: Strang's Book, 3B1B
Abstract Algebra: A first course in abstract algebra by Fraleigh and Algebra chapter 0 by Aluffi.
Fourrier transform: 3B1B
Cryptography: Introduction to Mathematical Cryptography by Hoffstei, then Dan Boneh’s courses in Coursera
ZK: ZK Whiteboard sessions, then Plonk implementations, gnark and arkworks codebase too.
If you already a pro, you might start with proving BrainFuck
https://aszepieniec.github.io/stark-brainfuck
https://neptune.cash/learn/brainfuck-tutorial
Study Resources: please read them in order
ZK Whiteboard Sessions(or Official) - by ZK
Zero Knowledge Machine Learning zkApp for Mina Protocol (ZKML) - by Jason Morton 2022
ZK Podcast - by ZK
Histroy, Textbook, Papers, Applications Collection of ZK (Mostly before 2020) - Video Lecture by Justin Thaler in this collection is good.
Uncloak - Large DB of knowledge about cryptography, zk, hash, quantum
To understand math and this paper: - If you like courses: you need Crypto I CS255 - Coursera prereques and maybe Stanford CS355 and Number Theory. - Modern Zero Knowledge Cryptography - 4 week course on ZK - Berkeley Course on ZK and recording - Or Graduate Course on Applied Cryptography - You can also use ZCash Documentation or this repository - Or even a course by ZK-DAO (login as guest)
To learn Halo2 Circuit Proving System: Zero Knowledge Proof — A Guide to Halo2 Source Code
To learn Plonk2 Circuit Proving System:
To learn Noir Language:
To learn Circom:
To learn RISC Zero zkVM: Documentation and Examples
To solve puzzles and learn arkworks: ZKHack Resource
To learn quantization
Firstly, this repository summarizes projects related to ZKML.
They host community calls with notes and recordings
They have calendar Modulus Labs
Secondly, this repository is everything about ZK. (Not necessarily about ML.)
Other ZK ML Projects:
Giza is working on verifiable training and crowdsource neural net for SaaS.
Execution of a Federated Learning process within a smart contract
ChainML is building PyTorch library for deploying ZKML using SpaceAndTime
Other ZK Projects:
prove integrity of image using specialized hardware + ZK-Editing Software
RISC Zero zkVM
zkML ads: don't give your data to Google/Youtube recommending system while enjoy things they recommended based on your data by zk Yuki. Trusted Execution Environment (TEE). Provide proofs alongside recommendations and/or run the recommendations directly on-chain by Nazih Kalo
zkEmail by Ayush, proof of me received an email containing partial hidden information.
Other Encryption Projects:
Fully Homomorphic Encryption (FHE) is a type of encryption that allows computations to be performed on encrypted data without the need to decrypt the data first. And both will produce the same result. But you don’t get computational integrity/verifiability with FHE afaik like you do with SNARKs/STARKs. Concrete-ML is a library for FHE ML, and they are working on verifiability. But since both zk and FHE have a 1000x slowdown factor, the combination will be very slow. (FHE gives you privacy, ZK gives your verifiability. Combine both could replace SGX)
Software Guard Extensions (SGX) is a set of CPU instructions and hardware enhancements by Intel. It protect specific memory region when executing code on sensitive data so that data can't be accessed by outside (other programs or even OS)
Ocean Protocol: allow publishing, selling, and training on encrypted data on chain.
Arithmetic circuits: a function C : \mathbb{F}^n \to \mathbb{F}
fix a finite field \mathbb{F} = \{0, ..., p-1\} for some prime p > 2
fix a set of modular operation on F
|C| denotes the number of gates in the circuit
You can think of arithmetic circuits as a general computer circuit with one output that has integer range [0, p-1] for some prime. (since practical computers are intrinsically modular due to overflow)
boolean-SAT: a problem asking given boolean circuit C(...), whether there exists a set of x_i boolean input, such that C(x_1, x_2, ..., x_n) = 0.
Note that boolean-SAT is NP-complete. But if I an the prover who obtain a set of witness x_i that satisfies C, I wish that I am able to convince the verifier that C is satisfiable without giving out my x_i witness.
All polynomial-time computation can be captured by Poly-Arithmetic Circuits
must be deterministic
must be pure
Argument System:
given statement
x \in \mathbb{F}^n
given witness
w \in \mathbb{F}^m
prover knows x, w
verifier knows x
prover needs to convince verifier that (\exists w)(C(x, w) = 0), basically the circuit C(x, \cdot) is satisfiable.
Without ZK, the prover need only simply send w to the verifier to convince the verifier that C is satisfiable.
Interactive Argument System: prover and verifier can have many rounds of interaction to raise the confidence of the verifier.
Non-interactive Argument System: prover send only one thing once to verifier.
Non-interactive Argument System is needed with many verifiers. Example: rollup server would be crowded with challenges from nodes all over the world if using interactive argument system.
Completeness: if a prover has correct w that satisfies C, then verifier will be convinced.
C: modular SAT circuit, publically avaliable
x: argument
w: satisfying string, the witness
\pi: proof, either real or faked
Argument of Knowledge:
V \text{ accept} \implies P \text{ "knows" } w
P \text{ "not know" } w \implies Pr\{V(x, \pi) = \text{ accept}\} < \epsilon
Prover Knows: means w can be "extracted" from the prover. Formally, if there exist a forger who is able to convince the verifier, then there exist an algorithm called "extractor" who can use the forger as a blackbox to get w. So the forger isn't really a forger.
Above definition assumes computational bounded adversary, so it is called "argument of knowledge", not a "proof of knowledge".
Zero Knowledge: (x, \pi) "reveals nothing" about w.
Learn Nothing: there exist an efficient algorithm "simulator" that generates a proof \pi given C, S_p, S_v (without using w) such that the simulated \pi is in the distribution of real proof \pi.
Note that the simulator can choose S_p, S_v freely but a real prover is given S_p from initialization.
Succinct: with a constant security parameter \lambda
the length of proof is short: \pi is in O(\log|C| + \lambda)
verifier is fast: V(x, \pi) run in O(|x| + \log|C| + \lambda)
// TODO: some source says that \pi need to be in log(|w|), which I don't know if it is correct. I don't have intuition on relation between circuit size and the number of input digits.
If you observe "Succinct" requirement, then you may discover that \log|C| is not achievable since verifier at least need to read C in O(|C|) for correctness reason. But the verifier is allowed to pre-process C and summarize C to S_v that has length O(\log|C|). So the verifier can look like: V(S_v, x, \pi).
A trivial argument system: just sending w will break
zero knowledge
succinct length
succinct computation
In order to satisfy succinct requirement, preprocessing technique is needed to summarize the circuit C to the size of \log|C|.
There are a few techniques on how to do this: in general we the preprocess algorithm S takes in a circuit, and generate tuple (S_p, S_v) a summary for the prover and verifier.
Trusted Setup Per-circuit:
So classically, when layer2 rollup chains starts, founders will gather people to run setups. Each person run their own algorithm and hope at least one of them throw away the secret.
Trusted but Updatable:
Here is Vitalik's Blog on How Trusted Setup Worked.
More zk-SNARK circuits and constraints require longer setup time. ZCash setup took about 24 hours on a fast machine, and requires a 97G download and 49G upload. Rollup requires more than 260 million constraints, so it must compute 2^{28} powers of tau.
Transparent: much more costly than above two methods, but does not rely on trusted admin and requires no secret data.
Kilian'92, Micali'94: succinct transparent arguments for PCP, but with impractical prover time
GGPR'13, Groth'16: quasi-linear prover time, constant size proof, trusted, per-circuit setup (most application use Groth'16)
Sonic'19, Marlin'19, Plonk'19: universal updatable
DARK'19, Halo'19, STARK, ...: no trusted setup (transparent)
Typically, people use Groth'16:
size of \pi is around 200 bytes
S_p is quiet large, but S_v is small (good for blockchain miners)
verification time: 3ms
but require trusted setup
Plonk/Marlin's method doubled the size of \pi and verification time, but enable updatable/universal setup
and plus Noir, CirC(build your own DSL) for high-level language. and plus PIL for low-level language.
Example: We want to show that SHA256(w) = x
the statement we want to prove is: x
the witness is: w
We first write in higher language ZoKrates:
// domain-specific language
def main(field x[2], private field w) -> (field):
h = sha256packed(w)
h[0] == x[0] // check first 128 bits
h[1] == x[1] // check last 128 bits
return 1
Then we compile above to R1CS program (arithmetic circuit representation) over \mathbb{F}
Asked by Justin: The person asking for the result doesn't want to disclose their input (e.g. their personal health information) and the model provider does not want to disclose their model weights (e.g. if their model is proprietary). This is possible.
Reply by Jason: Yes, you could combine MPC+ZK to do that. If part of the model can be public there might be a way to play some games with recursion / composition, but it would probably leak information.
Reply by Elisey: i guess it can be combined in the following way. let’s say there are two models:
one obfuscates the data
one performs computation on obfuscated data
first model is run in users device. second on behalf of provider. one should be really careful when designing such system to avoid leaking information. two main vectors of leakage:
straight up reversible way to obfuscate the data
big enough output vector ( so people can not be rainbow table all the possible inputs)
Ryan | Modulus Labs: In terms of verifying e.g. that a model is robust to extraction attacks, data poisoning, adversarial attacks, etc., it's a totally different story (and folks have posited that this may not even be possible when working in such high-dimensional feature spaces) -- I'm unfortunately behind on SoTA AI literature for this subfield in particular, but folks are certainly increasingly interested in robustness and explainability as a research topic.
"It's possible to extract model weights just by querying the model on certain inputs, so even if you have an ML black box it's not going to be private in many respects, especially not towards the model"
I think that it's a very interesting question to apply a lot of the already existing work on "model compression" and "knowledge distillation" so that ML models can produce a witness that is more closely compatible with provers than simply naively compiling a NN to a circuit
Non-linear layers like Softmax are harder to implement in proofs.
Hey, right now the state of art in ZKML is in its very early stages, nothing that you can just plug an play for even mid sized models, let alone big models.
// QUESTION: what is Recursive zkSNARKs? (https://github.com/lyronctk/zator Proving the execution of arbitrarily deep neural networks with recursive SNARKs.)
// QUESTION: I don't quiet understand this conversation: https://t.me/zkmlcommunity/812 and https://t.me/zkmlcommunity/825
// QUESTION: Read this paper: https://eprint.iacr.org/2022/1508 "Indeed, the case that we consider — where Alice’s program is large — is extremely well motivated: the program P could be an ML model with billions of painstakingly learned parameters.
// TODO: read this when you have sufficient knowledge: https://polybase.xyz/blog/polybase-zk-database-polygon-miden
// TODO: not sure what these two guys are discussing: https://t.me/zkmlcommunity/1314
// TODO: watch the following
// QUESTION: I don't know about this paper: https://eprint.iacr.org/2022/957 as part of this website: https://www.ulvetanna.io/news/introducing-ulvetanna
// QUESTION: I don't understand SNARKs utilization in BFT to reduce complexitiy: https://arxiv.org/pdf/2205.11652v2.pdf
// QUESTION: Halo2 FRI Gadget
// QUESTION:
Table of Content