# zkEVM Halo2 GPU Prover

Snarkify's cuSnark library is a C++/CUDA project with Rust bindings. It exposes a set of API functions designed to plug into the Halo2 proof system, replacing compute-intensive operations with an accelerated GPU backend.
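To make the plug-in idea concrete, here is a minimal Rust sketch, assuming a trait-based dispatch layer between the prover and its compute kernels. The names `MsmBackend`, `CpuBackend`, `GpuBackendStub` and `commit` are hypothetical and are not cuSnark's actual bindings. Multi-scalar multiplication (MSM) serves as the example kernel, computed in a toy group (integers modulo 2^31 - 1 under addition) instead of an elliptic-curve group so that the code stays self-contained.

```rust
/// Modulus of the toy group Z_P (integers mod a prime under addition).
/// It stands in for an elliptic-curve group so the sketch is self-contained.
const P: u64 = 2_147_483_647; // 2^31 - 1, a Mersenne prime

/// Hypothetical backend interface (not cuSnark's real API): one
/// compute-heavy kernel that the prover can route to CPU or GPU code.
trait MsmBackend {
    /// Multi-scalar multiplication: sum_i scalars[i] * bases[i] in Z_P.
    fn msm(&self, scalars: &[u64], bases: &[u64]) -> u64;
}

/// Reference CPU implementation: a straightforward fold over all terms.
struct CpuBackend;

impl MsmBackend for CpuBackend {
    fn msm(&self, scalars: &[u64], bases: &[u64]) -> u64 {
        assert_eq!(scalars.len(), bases.len(), "one scalar per base");
        scalars.iter().zip(bases).fold(0, |acc, (&s, &b)| {
            // Widen to u128 so the product cannot overflow before reduction.
            let term = ((s as u128 * b as u128) % P as u128) as u64;
            (acc + term) % P
        })
    }
}

/// Hypothetical placeholder for a GPU backend. A real one would pass the
/// slices to the CUDA library through Rust FFI bindings; this stub reuses
/// the CPU path so the example runs on any machine.
struct GpuBackendStub;

impl MsmBackend for GpuBackendStub {
    fn msm(&self, scalars: &[u64], bases: &[u64]) -> u64 {
        CpuBackend.msm(scalars, bases)
    }
}

/// Prover-side code depends only on the trait, so swapping backends
/// does not change the proof logic.
fn commit<B: MsmBackend>(backend: &B, coeffs: &[u64], srs: &[u64]) -> u64 {
    backend.msm(coeffs, srs)
}

fn main() {
    let coeffs: [u64; 5] = [3, 1, 4, 1, 5];
    let srs: [u64; 5] = [9, 2, 6, 5, 3];
    let cpu = commit(&CpuBackend, &coeffs, &srs);
    let gpu = commit(&GpuBackendStub, &coeffs, &srs);
    // An accelerated backend must agree exactly with the CPU reference.
    assert_eq!(cpu, gpu);
    println!("commitment = {cpu}");
}
```

Because the prover sees only the trait, an accelerated implementation can be substituted without touching proof logic, and the equality check in `main` is the kind of cross-check one might use to validate a GPU kernel against a CPU reference.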

The following tables show the performance improvements obtained when all of the GPU optimizations listed below are enabled. How much each optimization reduces end-to-end (e2e) proof time depends on the size of the proof, which is characterized mainly by the number of rows and columns in its trace table. To illustrate this variance, two proofs with different dimensions were benchmarked. In each table, e2e % is a stage's share of the total proof time on that platform, and Speedup is the ratio of CPU time to GPU time. The benchmarks were run on an AMD EPYC 7702 64-Core Processor with 4x NVIDIA GeForce RTX 3090 (24 GB) GPUs.

### Proof 1 (aggregation): 2^25 rows, 5 columns

<table data-full-width="true"><thead><tr><th width="240">Proof Stage</th><th>CPU time (s)</th><th width="148">CPU e2e %</th><th width="129">GPU time (s)</th><th width="132">GPU e2e %</th><th>Speedup</th></tr></thead><tbody><tr><td>Initialization</td><td>1.40</td><td>0.64</td><td>1.40</td><td>4.63</td><td>1.00</td></tr><tr><td>Generate Instance</td><td>1.08</td><td>0.50</td><td>0.46</td><td>1.52</td><td>2.35</td></tr><tr><td>Generate Advice</td><td>6.99</td><td>3.21</td><td>2.73</td><td>9.03</td><td>2.56</td></tr><tr><td>Generate Lookups</td><td>2.22</td><td>1.02</td><td>1.88</td><td>6.22</td><td>1.18</td></tr><tr><td>Commit Permutations</td><td>24.01</td><td>11.03</td><td>10.72</td><td>35.45</td><td>2.24</td></tr><tr><td>Eval_h</td><td>67.40</td><td>30.95</td><td>6.30</td><td>20.83</td><td>10.70</td></tr><tr><td>Compute Evaluations</td><td>35.73</td><td>16.41</td><td>5.02</td><td>16.60</td><td>7.12</td></tr><tr><td>Multiopen</td><td>29.74</td><td>13.66</td><td>1.72</td><td>5.69</td><td>17.29</td></tr><tr><td><strong>Total</strong></td><td><strong>217.76</strong></td><td></td><td><strong>30.24</strong></td><td></td><td><strong>7.20</strong></td></tr></tbody></table>

### Proof 2 (chunk\_inner): 2^20 rows, 1135 columns

<table data-full-width="true"><thead><tr><th width="246">Proof Stage</th><th>CPU time (s)</th><th width="138">CPU e2e %</th><th width="99">GPU time (s)</th><th width="199">GPU e2e %</th><th>Speedup</th></tr></thead><tbody><tr><td>Initialization</td><td>6.15</td><td>0.35</td><td>6.11</td><td>1.32</td><td>1.01</td></tr><tr><td>Generate Instance</td><td>0.05</td><td>0.00</td><td>0.13</td><td>0.03</td><td>0.38</td></tr><tr><td>Generate Advice</td><td>393.58</td><td>22.33</td><td>306.44</td><td>66.17</td><td>1.28</td></tr><tr><td>Generate Lookups</td><td>59.63</td><td>3.38</td><td>56.84</td><td>12.27</td><td>1.05</td></tr><tr><td>Commit Permutations</td><td>152.79</td><td>8.67</td><td>42.27</td><td>9.13</td><td>3.61</td></tr><tr><td>Eval_h</td><td>1115.43</td><td>63.28</td><td>36.19</td><td>7.81</td><td>30.82</td></tr><tr><td>Compute Evaluations</td><td>10.22</td><td>0.58</td><td>7.60</td><td>1.64</td><td>1.34</td></tr><tr><td>Multiopen</td><td>24.90</td><td>1.41</td><td>7.56</td><td>1.63</td><td>3.29</td></tr><tr><td><strong>Total</strong></td><td><strong>1762.75</strong></td><td></td><td><strong>463.13</strong></td><td></td><td><strong>3.81</strong></td></tr></tbody></table>
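The derived columns can be reproduced from the raw timings: the published figures are consistent with Speedup = CPU time / GPU time and e2e % = 100 × stage time / total proof time. The Rust sketch below recomputes them for Proof 2 and adds an Amdahl's-law-style bound that helps explain why this proof's overall speedup (3.81x) is lower than Proof 1's (7.20x): Generate Advice gains only 1.28x yet accounts for 66% of the GPU run. The constant and function names are illustrative only, and the totals are copied from the table rather than summed, because the rounded stage times do not add up exactly to the reported totals.

```rust
/// Proof 2 (chunk_inner) stage timings copied from the table:
/// (stage, CPU seconds, GPU seconds).
const PROOF2_STAGES: [(&str, f64, f64); 8] = [
    ("Initialization", 6.15, 6.11),
    ("Generate Instance", 0.05, 0.13),
    ("Generate Advice", 393.58, 306.44),
    ("Generate Lookups", 59.63, 56.84),
    ("Commit Permutations", 152.79, 42.27),
    ("Eval_h", 1115.43, 36.19),
    ("Compute Evaluations", 10.22, 7.60),
    ("Multiopen", 24.90, 7.56),
];

/// Reported totals in seconds.
const CPU_TOTAL: f64 = 1762.75;
const GPU_TOTAL: f64 = 463.13;

/// Speedup of a stage (or of the whole proof): CPU time / GPU time.
fn speedup(cpu_s: f64, gpu_s: f64) -> f64 {
    cpu_s / gpu_s
}

/// Share of the total proof time spent in one stage, in percent.
fn share(stage_s: f64, total_s: f64) -> f64 {
    100.0 * stage_s / total_s
}

fn main() {
    println!("{:<20} {:>8} {:>10}", "stage", "speedup", "GPU e2e %");
    for (name, cpu, gpu) in PROOF2_STAGES {
        println!("{:<20} {:>8.2} {:>10.2}", name, speedup(cpu, gpu), share(gpu, GPU_TOTAL));
    }
    println!("overall speedup: {:.2}", speedup(CPU_TOTAL, GPU_TOTAL));

    // Amdahl-style bound: even if every other GPU stage took zero time,
    // the proof could not finish faster than Generate Advice alone.
    let advice_gpu = PROOF2_STAGES[2].2;
    println!(
        "speedup ceiling with Generate Advice unchanged: {:.2}",
        speedup(CPU_TOTAL, advice_gpu)
    );
}
```

Running it reproduces the table's Speedup and GPU e2e % columns to rounding, plus a ceiling of about 5.75x: with Generate Advice fixed at 306.44 s on the GPU, no amount of acceleration in the other stages could push this proof past that figure.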

The following pages describe each GPU module and the acceleration it provides for Halo2 proofs of different dimensions:

* [Multi-Scalar Multiplication (MSM)](https://docs.snarkify.io/high-performance-zkp/zkevm-halo2-gpu-prover/msm)
* [Number Theoretic Transform (NTT)](https://docs.snarkify.io/high-performance-zkp/zkevm-halo2-gpu-prover/ntt)
* [Polynomial Evaluation](https://docs.snarkify.io/high-performance-zkp/zkevm-halo2-gpu-prover/quotient-polynomial-evaluation)
* [KZG Multiopen](https://docs.snarkify.io/high-performance-zkp/zkevm-halo2-gpu-prover/kzg-multiopen)
* [Polynomial Inversion](https://docs.snarkify.io/high-performance-zkp/zkevm-halo2-gpu-prover/polynomial-inversion)
* [Permutation Generation](https://docs.snarkify.io/high-performance-zkp/zkevm-halo2-gpu-prover/permutation-generation)
