# Lecture 007 - Uniform Superposition

## Some Warnings

In the probabilistic world, you can compute the probability of a self-defined event by grouping several outcomes of your choice and adding their probabilities. With amplitudes, you cannot group the states that agree on some printed bits and sum their amplitudes before squaring. You have to square the amplitude of each individual, distinct state, and only then add the resulting probabilities.

To illustrate, don't do this:

$$
\begin{align}
a|011\rangle + b|010\rangle + c|001\rangle + d|000\rangle &\neq (a+b)|01x\rangle + (c+d)|\text{other}\rangle\\
\Pr[\text{output } 01x] &\neq (a+b)^2\\
\Pr[\text{output other}] &\neq (c+d)^2
\end{align}
$$

In fact, for the state $a|011\rangle + b|010\rangle + c|001\rangle + d|000\rangle$ we have:

$$
\begin{align}
\Pr[\text{output } 01x] &= a^2+b^2\\
\Pr[\text{output other}] &= c^2+d^2
\end{align}
$$

We only add amplitudes when the same state appears more than once. For $a|011\rangle + a'|011\rangle + b|010\rangle + c|001\rangle + d|000\rangle$ we have:

$$
\begin{align}
\Pr[\text{output } 01x] &= (a + a')^2+b^2\\
\Pr[\text{output other}] &= c^2+d^2
\end{align}
$$
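A small numerical check of the warning above, as a sketch. The four amplitudes are made-up real values, chosen only so that the wrong rule and the right rule disagree sharply.

```python
import math

# Hypothetical real amplitudes for |011>, |010>, |001>, |000> (illustration only).
a, b, c, d = 0.5, -0.5, 0.5, 0.5
assert math.isclose(a * a + b * b + c * c + d * d, 1.0)  # squares sum to 1

# Right: square each distinct state's amplitude, then add the probabilities.
p_01x = a ** 2 + b ** 2
p_other = c ** 2 + d ** 2

# Wrong: add amplitudes of *different* states before squaring.
wrong_01x = (a + b) ** 2
wrong_other = (c + d) ** 2

print(p_01x, p_other)          # 0.5 0.5
print(wrong_01x, wrong_other)  # 0.0 1.0
```

The wrong rule would claim that `01x` is never printed, while the state actually prints it half of the time.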

## Preparing "Uniform Superposition"

### Uniform Superposition and Hadamard Transform

Superposition: a non-deterministic quantum state.

Uniform Superposition: a valid quantum state in which all the amplitudes are the same.

1. Start from $n$ qubits, all initialized to $0$
2. Apply a Hadamard gate to every qubit

The Hadamard Transform on the all-$0$ state gives equal probability to every string in $\{0, 1\}^n$, that is, to each of the $2^n$ possible outcomes. But by the Dirty Secret Theorem, none of the qubits are entangled.

When we do the Hadamard Transform, we essentially spread all the amplitude from one single point across the entire amplitude field. Every amplitude of the resulting $n$-qubit state is $\sqrt{\frac{1}{2}}^{n} = \left(\frac{1}{2}\right)^{n/2}$.
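Because the qubits are not entangled, the amplitude of each $n$-bit string is just the product of the per-qubit amplitudes. The sketch below checks the $\left(\frac{1}{2}\right)^{n/2}$ value this way; `n = 3` and the variable names are illustrative choices.

```python
import math
from itertools import product

n = 3
# After H on |0>, a single qubit is (|0> + |1>) / sqrt(2).
per_qubit = {0: 1 / math.sqrt(2), 1: 1 / math.sqrt(2)}

# Unentangled qubits: amplitude of a string = product of per-qubit amplitudes.
amplitudes = {
    bits: math.prod(per_qubit[b] for b in bits)
    for bits in product((0, 1), repeat=n)
}

print(len(amplitudes))                       # 8
print(sum(amp ** 2 for amp in amplitudes.values()))  # close to 1
```

Every one of the $2^n$ entries comes out as $(1/2)^{n/2}$, and the squared amplitudes sum to 1 up to rounding.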

Preparing a general quantum state in superposition is hard. We will discuss it in a future lecture.

If we compute in uniform superposition, we observe that one deterministic gate $f()$ adjusts the amplitudes on all the leaves at once. Doing the same by hand would take $2^n$ calls to $f()$, so this seems powerful. But by itself this is not so different from probabilistic computing. The true power lies in the cancellation of negative amplitudes.

What if we introduce negative amplitudes by hiding the result of the computation in the sign of the amplitude (sign-compute)? If we print right away, we still get the same probabilities, since $(-1)^2 = (1)^2$.

$C_F$:

1. Start from $n$ qubits, all initialized to $0$
2. Apply a Hadamard gate to every qubit
3. Sign-compute some function $F$
4. Apply a Hadamard gate to every qubit
5. Print $|000...0\rangle$

Theorem: if we sign-compute $F : \lbrace 0, 1 \rbrace^n \to \lbrace 0, 1 \rbrace$ in uniform superposition and then do the Hadamard Transform (on all bits), the final amplitude on $|000...0\rangle$ is the arithmetic average of all $2^n$ signed amplitudes $(-1)^{F(x)}$ (each taken as $\pm 1$, ignoring the common normalization factor).

Why? Say we created the uniform superposition by doing Add&Sub on the $n$-qubit state 0...0 in every direction $i \in [n]$ (each direction in the amplitude space represents the 2 possible states of one qubit, so $n$ dimensions give $2^n$ possible states). Then we do some computation. Then we do Avg&Diff. If we only care about the state 0...0, we only need the Avg computations and can ignore the Diff computations, since Diff does not point in the direction toward 0...0. So we are essentially averaging all the signed amplitudes.
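The theorem can be checked numerically with a small simulation of $C_F$. This is only a sketch: the truth table `F` is a made-up example, and `hadamard_all` and `run_cf` are illustrative helper names, not part of the lecture.

```python
import math

def hadamard_all(amps):
    """Apply H to every qubit of a state given as a list of 2^n real amplitudes."""
    amps = list(amps)
    size = len(amps)
    step = 1
    while step < size:
        for start in range(0, size, 2 * step):
            for i in range(start, start + step):
                lo, hi = amps[i], amps[i + step]
                amps[i] = (lo + hi) / math.sqrt(2)         # "add" branch
                amps[i + step] = (lo - hi) / math.sqrt(2)  # "diff" branch
        step *= 2
    return amps

def run_cf(truth_table):
    """Steps 1-4 of C_F; truth_table[x] is F(x) in {0, 1}."""
    state = [1.0] + [0.0] * (len(truth_table) - 1)   # step 1: |00...0>
    state = hadamard_all(state)                      # step 2: uniform superposition
    state = [amp * (-1) ** f for amp, f in zip(state, truth_table)]  # step 3: sign-compute
    return hadamard_all(state)                       # step 4

F = [0, 1, 1, 0, 1, 1, 1, 0]     # hypothetical F on 3 bits: three 0s, five 1s
final = run_cf(F)
avg_sign = sum((-1) ** f for f in F) / len(F)
print(final[0], avg_sign)        # both are -0.25 up to rounding
```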

Below is a generic sign-compute function. Notice that it needs 1 temporary variable, and that tmp must not be among the input variables of f().

@require tmp=0
def If f() then Minus():
// [ 1,  1,  1]
// [ 1,  1,  1,  0,  0,  0]
Add 1 to tmp     // [ 0,  0,  0,  1,  1,  1] switch l/r
H on tmp         // [ 1,  1,  1, -1, -1, -1] minus (cuz tmp=1) spread
Add f() to tmp   // [ 1, -1,  1, -1,  1, -1] switch l/r based on f()
H on tmp         // [ 0,  0,  0,  1, -1,  1] minus (control) contract
Add 1 to tmp     // [ 1, -1,  1,  0,  0,  0] switch l/r
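The trace in the comments can be replayed in a few lines of Python. This sketch assumes three input branches with $f = (0, 1, 0)$, which is what the trace suggests, and stores the state as two rows: the `tmp = 0` half and the `tmp = 1` half. Unlike the comments, it keeps the $1/\sqrt{2}$ factors of H, so intermediate values differ from the trace by a constant.

```python
import math

def add_one(state):
    """Add 1 to tmp: swap the tmp=0 half with the tmp=1 half."""
    zero, one = state
    return [one, zero]

def h_on_tmp(state):
    """H on tmp: per input, (tmp=0, tmp=1) -> (sum, difference) / sqrt(2)."""
    zero, one = state
    s = math.sqrt(2)
    return [[(a + b) / s for a, b in zip(zero, one)],
            [(a - b) / s for a, b in zip(zero, one)]]

def add_f(state, f_values):
    """Add f() to tmp: swap the two halves only for inputs where f = 1."""
    zero, one = state
    new_zero = [b if f else a for a, b, f in zip(zero, one, f_values)]
    new_one = [a if f else b for a, b, f in zip(zero, one, f_values)]
    return [new_zero, new_one]

f_values = [0, 1, 0]                         # assumed f, matching the trace
state = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]   # tmp = 0 on all three inputs
state = add_one(state)
state = h_on_tmp(state)
state = add_f(state, f_values)
state = h_on_tmp(state)
state = add_one(state)
print(state)   # approximately [[1, -1, 1], [0, 0, 0]]
```

The input branch with $f = 1$ ends with its sign flipped, and tmp is back to 0 on every branch.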


## Bias-Busting

So, if the amplitude on 0...0 is 0 (and therefore the probability of printing out 0...0 is 0), we know there are exactly as many $+1$s as $-1$s among the signed amplitudes, i.e. $F$ outputs 0 as often as it outputs 1.

Corollary: $F$ is a balanced function (meaning the signs $(-1)^{F(x)}$ sum to zero) iff the amplitude of state $|000...0\rangle$ is zero, i.e. iff the probability of printing out $|000...0\rangle$ is zero.

For a mystery function $F$, if it is balanced, then $C_F$ never prints $|000...0\rangle$. If it is not balanced, then it sometimes prints $|000...0\rangle$.

If there exists a classical gate $C_F$ for every function $F$, then $P^{\sigma_2} = NP^{\sigma_2}$.

Bias-Busting (Deutsch-Jozsa): If $F$ is balanced, never say "Busted!". If $F$ is biased, then $\Pr\{\text{"Busted!"}\} > 0$.
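As a sketch of the guarantee, the probability of printing $|000...0\rangle$ can be computed from the theorem as the square of the average sign. Treating "Busted!" as the event of printing $|000...0\rangle$ is an assumption here, and both truth tables are made-up examples.

```python
def prob_all_zeros(truth_table):
    """Probability that C_F prints 00...0, using amplitude = average sign."""
    avg_sign = sum((-1) ** f for f in truth_table) / len(truth_table)
    return avg_sign ** 2

balanced = [0, 1, 1, 0, 1, 0, 0, 1]   # four 0s and four 1s
biased = [0, 0, 0, 1, 0, 0, 0, 0]     # seven 0s and one 1

print(prob_all_zeros(balanced))  # 0.0
print(prob_all_zeros(biased))    # 0.5625
```

The balanced table can never produce $|000...0\rangle$, while the biased one produces it with probability $(6/8)^2$.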

| Name | Speedup |
| --- | --- |
| Bias-Busting (Deutsch-Jozsa) | Exponential(Y) |
| Rollercoaster (Bernstein-Vazirani) | Polynomial(N) |
| SAT in $\sqrt{2}^n$ (Grover's) | Modest(Y) |
| Simon's Algorithm | Exponential(N) |
| Factoring | Exponential(Y) |

The derivation assumes we have "Add&Diff".

In lecture, we saw

$$
XOR(x_1, x_2, ..., x_n) = \begin{cases} 1 & \text{if } |\{i \mid x_i = 1\}| \bmod 2 = 1\\ 0 & \text{otherwise} \end{cases}
$$

Therefore the filtered $XOR_{B_1 ... B_n} (A_1, ..., A_n)$ can be seen as the binary dot product of vectors $\begin{bmatrix}A_1 & ... & A_n\end{bmatrix} \cdot \begin{bmatrix}B_1 & ... & B_n\end{bmatrix} \bmod 2$. Expanding the dot product, this is $\left(\sum_{i = 1}^n A_i \cdot B_i\right) \bmod 2$.
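A brute-force check of this identity, as a sketch. It assumes that the filtered XOR means "XOR together the bits $A_i$ at the positions where $B_i = 1$"; the function names are illustrative.

```python
from itertools import product

def filtered_xor(b, a):
    """XOR_B(A): XOR of the bits A_i selected by the filter bits B_i = 1."""
    result = 0
    for a_i, b_i in zip(a, b):
        if b_i == 1:
            result ^= a_i
    return result

def dot_mod_2(a, b):
    """Binary dot product: (sum of A_i * B_i) mod 2."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b)) % 2

n = 4
agree = all(
    filtered_xor(b, a) == dot_mod_2(a, b)
    for a in product((0, 1), repeat=n)
    for b in product((0, 1), repeat=n)
)
print(agree)  # True
```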

The amplitude (the thing you need to multiply by) for string $x$ to go to string $y$ under the "Add&Diff All" operation can be calculated by multiplying the individual amplitudes of the "Add&Diff" operation on each qubit. (A row of multiplication factors in the amplitude tree corresponds to a column of the matrix.) In an $n$-qubit system, for string $x$ to go to string $y$, we need to multiply the following factors in our amplitude tree:

$$
\prod_{i = 1}^n (-1)^{x_i \cdot y_i} = (-1)^{\sum_{i = 1}^n x_i \cdot y_i} = (-1)^{XOR_y(x)}
$$

To justify the above equation: each qubit $x_i$ goes to $y_i$ with amplitude $-1$ if and only if $x_i = 1 \land y_i = 1$, and otherwise with amplitude $1$. Therefore, we multiply by $-1$ exactly when $x_i \cdot y_i = 1$. By the lemma about XOR, we can also write the sum using $XOR$. Therefore $AD[x]_y = AD[y]_x = (-1)^{XOR_y(x)}$, since $XOR_y(x) = XOR_x(y)$.
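The entry formula can be compared against the matrix built qubit by qubit. In this sketch the unnormalized single-qubit "Add&Diff" is taken to be $\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}$ and strings are packed into integers; both choices, and the helper names, are assumptions for illustration.

```python
def kron(p, q):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[pv * qv for pv in p_row for qv in q_row]
            for p_row in p for q_row in q]

def add_diff_all(n):
    """Unnormalized "Add&Diff All": tensor product of n copies of [[1, 1], [1, -1]]."""
    m = [[1]]
    for _ in range(n):
        m = kron(m, [[1, 1], [1, -1]])
    return m

def sign_entry(x, y):
    """(-1)^(XOR_y(x)): parity of the positions where x and y are both 1."""
    return -1 if bin(x & y).count("1") % 2 else 1

n = 3
size = 2 ** n
AD = add_diff_all(n)
matches = all(AD[y][x] == sign_entry(x, y) for x in range(size) for y in range(size))
symmetric = all(AD[y][x] == AD[x][y] for x in range(size) for y in range(size))
print(matches, symmetric)  # True True
```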

### Derivation

$$
\begin{align*}
\hat{f}(y) =& AD[y] \cdot |v\rangle\\
=& \sum_{i = 1}^{2^n} AD[y]_i \cdot |v\rangle_i\\
=& \sum_{x \in \{0, 1\}^n} AD[y]_x \cdot f(x)\\
=& \sum_{x \in \{0, 1\}^n} (-1)^{XOR_y(x)} \cdot f(x)\\
=& 2^n \text{avg}_{x \in \{0, 1\}^n} f(x) \cdot (-1)^{XOR_y(x)} \tag{assume no complex number}\\
\propto& \text{avg}_{x \in \{0, 1\}^n} f(x) \cdot (-1)^{XOR_y(x)} \tag{dropping scalar}
\end{align*}
$$

Here $|v\rangle$ is the vector of amplitudes, with entry $f(x)$ at the position of string $x$.
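The last line of the derivation can be evaluated directly. The sketch below packs bit strings into integers and uses a made-up $f(x) = (-1)^{XOR_s(x)}$ with a filter `s = 0b101`, chosen only because its averages are easy to predict.

```python
def xor_filter(y, x):
    """XOR_y(x) for bit strings packed into integers: parity of x AND y."""
    return bin(x & y).count("1") % 2

def f_hat(f, y):
    """avg over x of f(x) * (-1)^(XOR_y(x)); f is a list of 2^n values."""
    return sum(f[x] * (-1) ** xor_filter(y, x) for x in range(len(f))) / len(f)

# Hypothetical example: f(x) = (-1)^(XOR_s(x)) for the filter s = 0b101.
s = 0b101
f = [(-1) ** xor_filter(s, x) for x in range(8)]
print([f_hat(f, y) for y in range(8)])
# [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

All the signs agree only for $y = s$. For every other $y$ the positive and negative terms cancel exactly.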

### Observation

We can recursively define the matrix as:

$$
H_m = \frac{1}{\sqrt{2}} \begin{pmatrix} H_{m - 1} & H_{m - 1}\\ H_{m - 1} & -H_{m - 1}\\ \end{pmatrix}, \qquad H_0 = (1)
$$

If we choose the row indexed by $|00...0\rangle$, all of its entries are equal, so it essentially serves as an average of all amplitudes.
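A sketch of the recursion in code, assuming the base case $H_0 = (1)$. It checks that the first row of $H_m$ turns a vector into its sum divided by $\sqrt{2^m}$, which is the average up to a fixed scalar. The test vector is arbitrary and not a normalized state.

```python
import math

def hadamard(m):
    """H_m built by the block recursion, with H_0 = [[1]] as the base case."""
    if m == 0:
        return [[1.0]]
    prev = hadamard(m - 1)
    s = 1 / math.sqrt(2)
    top = [[s * v for v in row] + [s * v for v in row] for row in prev]
    bottom = [[s * v for v in row] + [-s * v for v in row] for row in prev]
    return top + bottom

m = 3
H = hadamard(m)
v = [0.3, -0.1, 0.4, 0.2, -0.5, 0.1, 0.6, -0.2]   # arbitrary test vector
first = sum(h * x for h, x in zip(H[0], v))
print(first, sum(v) / math.sqrt(2 ** m))          # equal up to rounding
```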

If we have a two-qubit system, then the Hadamard gate on each single qubit looks like this (with basis order $00, 01, 10, 11$, $A$ acts on the second qubit and $B$ on the first):

$$
A = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 & 0 & 0\\ 1 & -1 & 0 & 0\\ 0 & 0 & 1 & 1\\ 0 & 0 & 1 & -1\\ \end{bmatrix}, \quad B = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1\\ 1 & 0 & -1 & 0\\ 0 & 1 & 0 & -1\\ \end{bmatrix}
$$

For three qubits, these are the 3 matrices (here $I$ is the $4 \times 4$ identity):

$$
\begin{pmatrix} A & 0\\ 0 & A\\ \end{pmatrix}, \quad \begin{pmatrix} B & 0\\ 0 & B\\ \end{pmatrix}, \quad \frac{1}{\sqrt{2}}\begin{pmatrix} I & I\\ I & -I\\ \end{pmatrix}
$$
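The two-qubit matrices $A$ and $B$ can be rebuilt as tensor products with the $2 \times 2$ identity, as sketched below. The basis order $00, 01, 10, 11$ and the helper names `kron` and `matmul` are assumptions of this sketch.

```python
import math

def kron(p, q):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[pv * qv for pv in p_row for qv in q_row]
            for p_row in p for q_row in q]

def matmul(p, q):
    """Plain matrix product of two lists-of-rows matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

S = 1 / math.sqrt(2)
H1 = [[S, S], [S, -S]]            # single-qubit Hadamard
I2 = [[1.0, 0.0], [0.0, 1.0]]     # 2 x 2 identity

A = kron(I2, H1)                  # H on the second qubit only
B = kron(H1, I2)                  # H on the first qubit only
both = matmul(A, B)               # H on both qubits, one after the other
H2 = kron(H1, H1)                 # the full two-qubit Hadamard Transform

same = all(math.isclose(both[i][j], H2[i][j], abs_tol=1e-12)
           for i in range(4) for j in range(4))
print(same)  # True
```

Applying the two single-qubit matrices one after the other gives the same result as the full transform on both qubits.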
