# Polynomial Inversion

## Overview

At multiple points in the Halo2 proof generation process, large sets of finite field elements must be inverted, which can be a significant computational expense.

## GPU Implementation

These inversions can be trivially implemented on the GPU by simply computing the inverse of each element concurrently.

## Performance

The following table outlines the performance improvements yielded with just the cuSnark polynomial inversion employed, showing the (CPU baseline) and accelerated results for the different proof stages in a set of proofs of various sizes. Times are in seconds, obtained on a AMD EPYC 7702 64-Core Processor with 4x NVIDIA GeForce RTX 3090 (24 GB) GPUs.

<table><thead><tr><th width="237">Proof Stage</th><th width="254">2^20 rows, 1135 columns</th><th>2^25 rows, 5 columns</th></tr></thead><tbody><tr><td>Initialization</td><td>(6.04) 6.12 </td><td>(1.40) 1.41 </td></tr><tr><td>Generate Instance</td><td>(0.05) 0.05 </td><td>(1.05) 1.08 </td></tr><tr><td>Generate Advice</td><td>(381.78) 355.14 </td><td>(6.68) 5.53 </td></tr><tr><td>Generate Lookups</td><td>(57.99) 57.92 </td><td>(2.10) 2.12 </td></tr><tr><td>Commit Permutations</td><td>(146.59) 145.17 </td><td>(23.72) 23.58 </td></tr><tr><td>Eval_h</td><td>(1069.09) 1070.33 </td><td>(66.58) 66.70 </td></tr><tr><td>Compute Evaluations</td><td>(9.81) 9.85 </td><td>(35.79) 35.39 </td></tr><tr><td>Multiopen</td><td>(18.71) 18.93</td><td>(28.82) 28.69</td></tr><tr><td>Total</td><td>(1690.05) 1663.51</td><td>(166.15) 164.48</td></tr></tbody></table>
