Last time we saw that if we know the probability p that a uniformly random string s satisfies SAT(s) = 1, then we can solve SAT. But how do we know p? Today, we study approximating p. (This lecture is not really about quantum at all.)

Last time, we saw that we can solve SAT given that we know p. This time, we want to estimate p. We also saw that each "combo" rotates us by \theta = 2\sqrt{\frac{m}{N}} = 2\sqrt{p}, and that we need O(\frac{1}{\sqrt{p}}) rotations to achieve our goal.

So, our goal is to figure out p by figuring out the mystery (but fixed) rotation angle \theta. We ask: what is the angle \theta swept out by one "combo"?

Notice that once we know either p or \theta, we know both of them, since \theta = 2\sqrt{p}.

Sharp-SAT (#SAT): we don't know how many satisfying assignments m of f there are, so we don't know the probability p = \frac{m}{N} < 50\%.

In summary, we need to find the bias p of a coin by flipping it. A natural guess is to just count the fraction of heads (or tails) we get, but there are a few problems:

- When the count of the rarer outcome (heads or tails) is small, the fraction is very unstable. (We can't trust the last digit of the output if we want to be within 10%.)
- When the count is small, the percent error is very sensitive.
- In practice, you should grow the number of flips geometrically (i.e. 10 coins, 100 coins, 1000 coins, ...): the first 10 flips become negligible once you flip 100, and the geometric totals converge.
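A minimal classical sketch of this (the bias p = 0.01 is a made-up example): the estimate only stabilizes once n is large enough that many heads have been seen.

```python
import random

def estimate_p(n, p, rng):
    """Estimate the bias p by flipping n coins and counting the fraction of heads."""
    heads = sum(rng.random() < p for _ in range(n))
    return heads / n

rng = random.Random(0)
p = 0.01  # hypothetical small bias
# geometric schedule of flip counts: early estimates jump around, later ones settle
for n in [10, 100, 1000, 10000, 100000]:
    print(n, estimate_p(n, p, rng))
```

With n = 10 the estimate is almost always 0.0 or 0.1; only around n = 1/p and beyond does it start resembling p.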

For small p, letting X count the heads among n flips, we have expectation E[X] = np and variance Var(X) = np(1-p) \simeq np. If we want the standard deviation to satisfy \sqrt{Var(X)} \leq \epsilon E[X], we have

\begin{align*}
\sqrt{Var(X)} \leq& \epsilon E[X]\\
\sqrt{np} \leq& \epsilon E[X]\\
\sqrt{np} \leq& \epsilon np\\
\frac{1}{\epsilon} \leq& \sqrt{np}\\
\frac{1}{\epsilon^2} \leq& np\\
n\geq& \frac{1}{\epsilon^2p}\\
\end{align*}

Conclusion: If we want our estimate of p to be in the bound [(1-\epsilon)p, (1+\epsilon)p], we need

O\left(\frac{1/\epsilon^2}{p}\right) = O\left(\frac{1}{\epsilon^2 p}\right)
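We can sanity-check this bound empirically; a sketch with hypothetical values p = 0.02 and \epsilon = 0.2:

```python
import random

def mean_relative_error(n, p, trials, rng):
    """Average relative error |p_hat - p| / p over several estimation runs."""
    errs = []
    for _ in range(trials):
        heads = sum(rng.random() < p for _ in range(n))
        errs.append(abs(heads / n - p) / p)
    return sum(errs) / trials

rng = random.Random(0)
p, eps = 0.02, 0.2           # hypothetical values
n = int(1 / (eps**2 * p))    # n = 1/(eps^2 p) = 1250 flips
print(mean_relative_error(n, p, 50, rng))  # typically on the order of eps
```

With n = 1/(\epsilon^2 p) flips, the typical relative error indeed comes out around \epsilon; quadrupling n roughly halves it.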

To estimate \theta up to relative error \epsilon, i.e. \theta \in (1 \pm \epsilon)\hat{\theta}, what range does that pin p down to?

\begin{align*}
\theta \in& (1 \pm \epsilon)\hat{\theta}\\
\theta^2/4 \in& (1 \pm \epsilon)^2\hat{\theta}^2/4\\
p \in& (1 \pm \epsilon)^2 \hat{p} \tag{by $\theta = 2\sqrt{p} \implies p = \frac{\theta^2}{4}$}\\
p \in& (1 \pm 2\epsilon + \epsilon^2) \hat{p}\\
\end{align*}
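A quick numeric check of this error transfer (the values of p and \epsilon are made up):

```python
import math

p = 0.04                  # hypothetical ground-truth probability
theta = 2 * math.sqrt(p)  # theta = 2 sqrt(p)
eps = 0.1

# a (1 + eps) relative error in theta squares into a (1 + eps)^2 relative error in p
theta_hi = (1 + eps) * theta
p_hi = theta_hi**2 / 4    # invert: p = theta^2 / 4
print(p_hi / p)           # (1 + eps)^2 = 1 + 2*eps + eps^2 ~ 1.21
```

So an \epsilon-accurate \theta gives a roughly 2\epsilon-accurate p, as in the derivation above.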

Conclusion: If we want \theta to be in the bound [(1-\epsilon)\theta, (1+\epsilon)\theta], we need

O\left(\frac{1/\epsilon}{\theta}\right) = O\left(\frac{1}{\epsilon \theta}\right) = O\left(\frac{1}{\epsilon \sqrt{p}}\right)

Note that the speedup we get in estimating \theta is due to the substitution \theta = 2\sqrt{p} — the power of Grover: each rotation advances us by \sqrt{p} > p, which gets us to the goal faster than stepping toward it by p at a time.

Takeaway: It is cheaper to estimate \theta (with the quantum method) than to estimate p (with the classical method): relative accuracy \epsilon in \theta pins down p to comparable relative accuracy, yet costs only O(\frac{1}{\epsilon\sqrt{p}}) rotations instead of O(\frac{1}{\epsilon^2 p}) samples. // QUESTION: is that the meaning of the calculation? What is the takeaway of the above calculation?

Since we only get 1 bit of information per measurement, we need to split our task into measuring each digit of \theta separately. For example:

Let's say our ground truth is \theta = 0.123456789...d \times 2\pi (where d stands for the d-th digit). We need to measure the digits 1, 2, 3, 4, 5, 6, 7, 8, 9, ..., d separately. We therefore need to separately determine the approximate angle of:

- \theta^1 = 0.123456789...d \times 2\pi
- \theta^2 = 1.23456789...d \times 2\pi = 0.23456789...d \times 2\pi
- \theta^3 = 2.3456789...d \times 2\pi = 0.3456789...d \times 2\pi
- ...
- \theta^d = ?.d \times 2\pi = 0.d \times 2\pi

(Here \theta^i = 10^{i-1}\theta, and angles are taken mod 2\pi, which drops the integer part.)
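The digit shift can be checked numerically (the ground-truth angle below is made up): multiplying \theta by 10^{i-1} and reducing mod 2\pi shifts the decimal expansion left.

```python
import math

theta = 0.123456789 * 2 * math.pi  # hypothetical ground-truth angle

# Applying the rotation 10^(i-1) times multiplies the angle by 10^(i-1);
# modulo 2*pi this shifts the decimal expansion left by i-1 digits.
for i in range(1, 4):
    shifted = (10**(i - 1) * theta) % (2 * math.pi)
    print(i, shifted / (2 * math.pi))  # 0.123456789, 0.23456789..., 0.3456789...
```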

Therefore the algorithm is:

```
def MeasureRotationOperationAgainst(Rot_theta, makeStart, d):
    digits = []
    # for each significant digit
    for i = 1, 2, 3, ..., d:
        count_zero = 0
        # measure O(log d) times (to get this digit with high probability)
        for m = 1, 2, 3, ..., log(d):
            |start> = makeStart()
            # rotate |start> 10^(i - 1) times
            for j = 1, 2, 3, ..., 10^(i - 1):
                Rot_theta |start>
            count_zero += [Measure |start> == 0]
        # one prediction: cos^2(10^(i - 1) * theta) = fraction of measurements that are 0
        digits[i] = digit recovered from count_zero / log(d)
    return theta assembled from digits
```
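The measurement statistics of one stage can be simulated classically. A sketch, with a made-up ground-truth angle, where outcome 0 is drawn with probability cos^2 of the accumulated angle:

```python
import math
import random

def simulate_stage(theta, i, shots, rng):
    """Estimate cos^2(10^(i-1) * theta) from simulated measurement outcomes."""
    angle = (10**(i - 1) * theta) % (2 * math.pi)
    p0 = math.cos(angle)**2                             # probability of measuring 0
    zeros = sum(rng.random() < p0 for _ in range(shots))
    return zeros / shots

rng = random.Random(0)
theta = 0.123456789 * 2 * math.pi  # hypothetical ground truth
for i in [1, 2, 3]:
    est = simulate_stage(theta, i, 2000, rng)
    # invert cos^2 to recover the shifted angle (up to sign/quadrant ambiguity,
    # which the real algorithm resolves using the earlier digits)
    print(i, math.acos(math.sqrt(est)))
```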

Notice that to create \theta^i, we need to apply Rot_\theta repeatedly, O(10^{i - 1}) times. Then with O(\log d) measurements of \theta^i, we can get the digit with "high confidence" (failure probability O(1/d^2)), giving us O(\log d \cdot 10^{i - 1}) rotations in total for that digit.

The cost of the above algorithm, in rotations, is:

\sum_{i = 1}^d O(\log d) \cdot O(10^{i - 1}) = O(\log d) \cdot O(10^{d - 1})
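The sum is dominated by its last term, since it is geometric:

\sum_{i = 1}^{d} 10^{i - 1} = \frac{10^d - 1}{9} \leq \frac{10}{9} \cdot 10^{d - 1} = O(10^{d - 1})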

Note that the number of measurements per digit need not be O(\log d) — just a little more than O(1) suffices. E.g., take the failure probability at the first stage to be about 1.5^{-d}, which costs an extra factor of O(d) operations there. By tailoring this so you do fewer and fewer repetitions at later and later stages, you can arrange an overall small constant failure probability while still retaining a total of about 2d measurements.

Conclusion: to estimate \theta up to d digits, we need O(10^{d-1}) rotations (or O(\log d \cdot 10^{d-1}) if you want an actual high-probability bound) and O(d) measurements.

We can improve this algorithm a lot: 1. carefully produce the bulk repetition operation with repeated squaring; 2. don't remake |start\rangle.
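For the first improvement, the classical analogue is exponentiation by repeated squaring: composing n copies of a rotation with only O(log n) matrix multiplications. A sketch with made-up angle and repetition count (whether the squared operation is cheap to realize in the quantum setting is a separate question):

```python
import math

def rot(theta):
    """2x2 rotation matrix for angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(m, n):
    """Compute m^n with O(log n) multiplications by repeated squaring."""
    result = [[1.0, 0.0], [0.0, 1.0]]  # identity
    while n:
        if n & 1:
            result = matmul(result, m)
        m = matmul(m, m)
        n >>= 1
    return result

theta = 0.123  # hypothetical angle
r = mat_pow(rot(theta), 1000)           # ~10 multiplications instead of 1000
print(r[0][0], math.cos(1000 * theta))  # both entries agree
```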

// QUESTION: is a rotation truly O(1)? I mean, some rotations seem definitely harder to achieve than others. Imagine: a rotation by \pi/2 degrees should be a lot harder to implement than a rotation by \pi/2 radians.

If you want additive accuracy \theta \pm \tau, it turns out we can use only O(1/\tau) rotations.
