15-854B: Advanced Approximation Algorithms: 2008

Wednesday, April 30, 2008

Another clarification on HW6

For 2a, it should say "expressed as a sum of products of up to k of f's Fourier coefficients".

Thanks to Jonah for pointing this out.

Tuesday, April 29, 2008

Ok, yeah, part (b) was super-vague, sorry about that. Here is a more precise question, I think.

Consider the matrix $M_2(y)$; if this is psd, then we know, e.g., that $y_{abcd}$ can be written as the dot product of vectors $z_{ab}$ and $z_{cd}$, and also as the dot products of vectors $z_{bc}$ and $z_{ad}$ etc. Similarly $y_{ab}$ can be written as $z_{a}$ dot $z_b$, and also as $z_{ab}$ dot $z_\emptyset$. Hence we now have these vectors $z$'s, each subscripted by at most two vertices, with some constraints relating them.

The goal of the problem is to show that the SDP $Q_1(K_{VC})$ is at least as strong as a natural vertex cover SDP, which assigns a vector $v_i$ to each node $i$ to minimize $\sum_i v_i \cdot v_i$ subject to

$(v_\emptyset - v_a)(v_\emptyset - v_b) = 0$ for all edges $ab$.
$v_\emptyset \cdot v_\emptyset = 1$ (say).

Can you show that the SDP you've created implies these constraints?

Does your SDP also imply the triangle inequalities:

$||v_a - v_b||^2 + ||v_b - v_c||^2 \geq ||v_a - v_c||^2$?

If things are still vague (or if you see some mistakes above), please email/post yr questions.

Monday, April 28, 2008

Course evaluation time...

Hi all: when you get a chance, could you go to https://my.cmu.edu/site/main/page.academics and give yr feedback on the course. thanks!

Sunday, April 27, 2008

Integrality Gaps for Sparsest Cut

A quick comment about the sparsest cut integrality gaps we saw last time, and the connection to the dual. (BTW, the notes for Lec 27 are up, thanks Varun!)

In Lecture 27, the arguments for the gaps went thus: at least $k$ of the demands need to travel through $p$ edges in the network, hence sending $\tau$ fraction of each unit demand uses $\tau \cdot k \cdot p$ volume. And since we dealt with unit capacities, there was only $m$ volume in the network, hence the maximum concurrent flow is at most $m/(pk)$. For the 5-node example, $p=2$, $k=4$ and $m=6$; for the expander, $k = \Omega(n^2)$, $p = \Omega(\log n)$, and $m = O(n)$.

But suppose we are more sophisticated:

Suppose we assign lengths $d_e$ to each edge $e$, so that the shortest path according to these edge lengths from $i$ to $j$ is now $d_{ij}$. Think of each of the edges $(i,j)$ as pipes with capacity $C_{ij}$ and length $d_{ij}$.
Now if the pair $ij$ sends $\tau D_{ij}$ flow, each unit of flow must travel $d_{ij}$ distance. Hence the total volume used by this pair is $\tau D_{ij} \cdot d_{ij}$. And the total volume used is $\sum_{ij} \tau \cdot D_{ij} \cdot d_{ij}$.
But the total volume in the network is now $\sum_{(i,j) \in E} C_{ij} d_{ij}$.

Hence, the maximum concurrent flow $\tau^*$ is at most $\sum_{(i,j) \in E} C_{ij} d_{ij}$ divided by $\sum_{ij} \tau \cdot D_{ij} \cdot d_{ij}$. Moreover, this is true for any setting of the edge lengths $d_e$. Hence, $\tau$ is at most the minimum of $(C.d)/(D.d)$ over all metrics $d$, or in other words, $\tau \leq \lambda^*$!

What we have just proved is weak LP-duality for the case of sparsest cut. The fact that $\tau^*$ equals $\lambda^*$ is given by strong duality.

Monday, April 21, 2008

Hardness of Independent Set

I wanted to say a few more things about the hardness of Max-Independent-Set (equivalently, Min-Vertex-Cover). Recall that in Lecture 25, we sketched the proof of factor $1/n^{1-\epsilon}$ NP-hardness for this problem; specifically, Hastad's result shows that distinguishing $1/n^{\epsilon}$ vs. $1/n^{1-\epsilon}$ is NP-hard for all $\epsilon > 0$.

This is a great hardness factor, but (as they used to say) "the gap is in the wrong place" -- or at least, it's not in the strongest possible place. It is of great interest to try to obtain a hardness result of the form $\Omega(1)$ vs. $o(1)$. This will give a worse "factor", but the result is essentially stronger. Note that we can't hope for $\Omega(1)$ vs. $1/n^{1-\epsilon}$ NP-hardness; Boppana and Halldorsson showed, roughly, that $1/k$ vs. $1/n^{(k-2)/(k-1)}$ is in P (using Ramsey theory) and this was improved by Alon and Kahale to, roughly, $1/k$ vs. $1/n^{(k-2)/(k+1)}$, using the Karger-Motwani-Sudan SDP coloring algorithm.

One reason hardness results like $\Omega(1)$ vs. "small" are better is that they better "factor" hardness for Min-Vertex-Cover. Recall that, trivially, a $c$ vs. $s$ hardness result for Min-Independent-Set is a $1-c$ vs. $1 - s$ hardness result for Max-Vertex-Cover, and thus a $(1-s)/(1-c)$-factor hardness for Max-Vertex-Cover. For example, Hastad's hardness gives only $1+o(1)$ factor hardness for Vertex-Cover, but a result like $1/3$ vs. $o(1)$ hardness for Independent-Set would give factor $3/2 - o(1)$ hardness for Vertex-Cover.

For a long time, the best NP-hardness result like this we knew for Independent-Set came simply from taking Hastad's $1 - \epsilon$ vs. $1/2 + \epsilon$ hardness result for Max-E3Lin and applying the FGLSS reduction to it. You should (easily) check for yourself that this gives $1/4 - \epsilon$ vs. $1/8 + \epsilon$ NP-hardness for Independent-Set. Hence it gives $3/4 + \epsilon$ vs. $7/8 - \epsilon$ NP-hardness for Vertex-Cover; i.e., factor $7/6 + \epsilon$.

In 2002, in the same conference as Khot's Unique Games Conjecture paper, Dinur and Safra published a stronger NP-hardness result for Independent-Set. This was a pretty amazing paper, because the outline of the proof was:

a) Prove a super-weak variant of the Unique Games Conjecture;
b) Do a very difficult Dictator Test based on the combinatorics of $2$-intersecting families.

Part (a), as mentioned, was done contemporaneously with the UGC paper, while part (b) was done before the work on Ek-Independent-Set described in our Lectures 8 and 9.

I like to remember their result as $1/3$ vs. $1/9$ NP-hardness for Independent-Set (omitting the $\epsilon$'s), and hence factor $(8/9)/(2/3) = 4/3$ hardness for Vertex-Cover. In fact, they pushed it a little further. Specifically, for $p = (3 - \sqrt{5})/2$, they showed $p$ vs. $4p^3 - 3p^4$ NP-hardness for Independent-Set. This is, like, .382 vs. .159. This yields factor 1.36 or so NP-hardness for Vertex-Cover.

That stands as the best NP-hardness result known. Relatively soon after the UGC paper, Khot and Regev showed showed that assuming UGC, $1/2 - \epsilon$ vs. $\epsilon$ hardness holds for Min-Independent-Set, and thus factor $2-\epsilon$ hardness for Vertex-Cover. Indeed, they showed UGC-hardness of $1/k - \epsilon$ vs. $\epsilon$ for Min-Ek-Independent-Set.

End of Lecture 25

Here is the end of the proof I didn't get a chance to finish from Lecture 25.

Homework 5 graded

Homework 5 is graded and will be handed back tomorrow. The average was 79%.

A few comments:

1. Most people got this right (one way or another).

2. For part b), the two equally simplest counterexamples are {$x_1$, $x_1$, $1$}, or {$x_1$, $x_1$, $x_1$}. Aaron, Kanat, and Mike all gave the same, far-from-simplest counterexample. What a coincidence...

3. Many people failed to prove a characterization (if and only if statement) in c), and just gave an implication. Many people also had complicated solutions for the support-3 problem. Ali and Amitabh gave roughly the same solution, which is probably the slickest. Suppose f = a $\chi_S$ + $b \chi_T$ + $c \chi_U$, where the coefficients $a, b, c$ are nonzero and $S$, $T$, $U$ are distinct. By Parseval,

$a^2 + b^2 + c^2 = 1$ (*).

Substituting in $(1, ..., 1)$, we get $a + b + c \in {-1,1}$. Squaring this and subtracting (*) yields

$ab + bc + ac = 0$ (**).

Since $S$, $T$, $U$ are distinct, there must be an index $i$ which is in exactly 1 or 2 of them. Substituting in the string which is 1 everywhere except that it's $-1$ on the $i$th coordinate, we conclude $\sigma_1 a + \sigma_2 b + \sigma_3 c \in {-1,1}$, where exactly 1 or 2 of the signs $\sigma_i$ are $-1$. Squaring this and subtracting (*) yields

$\tau_1 ab + \tau_2 bc + \tau_3 ac = 0$ (***),

where exactly two of the signs $\tau_i$ are $-1$. Now add (**) and (***) and conclude that one of $ab$, $bc$, or $ac$ must equal 0, a contradiction.

4. Most people got (a) and (b). For (c), several people estimated the degree-1 Fourier coefficients of Majority and concluded that it's, say, $(1/n, 1/2)$-quasirandom. Only Eric truly ventured to high degree, pointing out that $Inf_i(Maj)$ can be seen to be $\Theta(1/\sqrt{n})$, and therefore by 5a, Maj is $(\Theta(1/\sqrt{n}),0)$-quasirandom. In fact, one can show that the largest Fourier coefficients for Maj are the ones at level $1$, and hence Maj is $(1/n, 0)$-quasirandom.

5. Most people more or less got this.

6. Most people mostly got this; the key in b) is picking the right diameter.

7. Most people got this too, although many failed to do both parts of f).

Tuesday, April 15, 2008

Homework 6 out

Homework 6 is out: it's due May 1st.

Monday, April 14, 2008

Lecture 24: UGC-Hardness of Max-Cut

The completion of the proof of the reduction from Unique Label-Cover to Max-Cut can be found here.

Thursday, April 3, 2008

Homework 4

A high-scoring one: the average was 88. Notes and comments soon.

Monday, March 31, 2008

Hwk 5 clarifications

HW5 has been extended to Thursday Apr 3rd. Here are some typos to watch out for ---

Q6, the demands are all integers.
Q7e, it should be $\epsilon/\log n$, and not $\epsilon \log n$.

Tomorrow's class

Tomorrow we're going to dive more deeply into hardness of approximation results for CSPs. Please take a look again at the ones we've already proved: for Max-Coverage (Lecture 3) and for Min-Ek-Hypergraph-Independent-Set (Lectures 8 and 9). (

Saturday, March 29, 2008

Lecture notes 16 and 17

are posted. Sorry for the delay.

Monday, March 24, 2008

Prep for lecture tomorrow: recap Lecture 18

For tomorrow's lecture (#19), please recall the definition of metrics embedding into trees (problem #6 from HW4). Also, have a look at the correctness proof for low-diameter decompositions posted on Thursday, and try to prove the following extension of that result:

Extension: For a vertex $v$ and radius $r$, let $B(v,r) = \{ w | d_{vw} \leq r\}$ be the "ball" of all vertices within distance $r$ to $v$. Extend the proof from Lecture 16 to show that the probability of an edge $(u,v)$ being cut is at most

$O(1) \times (d_{uv} / r) \times \log (|B(u,2r)| / |B(u,r/4)|)$.

(Hint: The previous proof sums $1/j$ for all $j$ from $1$ to $n$ to get $H_n$. Can you show it suffices to sum over a smaller range? You may also want to consider cases depending whether or not $d_{uv}$ is at least $r/4$.)

Friday, March 21, 2008

Homework 5

Sorry for the delay, HW5 is up on the web page.

Thursday, March 20, 2008

Lecture 18: Last bits of the proof

Recall the low-diameter decomposition procedure: we picked a random radius $X$ and a random permutation $\pi$. We then constructed the random partition by considering all the vertices one by one, in the order given by $\pi$: when considering $w$, we assigned all the yet-unassigned vertices $v$ with $d(w,v) \leq X$ to $w$'s partition.

Just for the analysis: suppose we renumber the vertices $w_1, w_2, \dots, w_n$ in order of their distance from the edge $e = (u,v)$.

The crucial definitions (that avoid pain later) are the following:

At some time instant in this procedure, one (or both) of $u$ or $v$ gets assigned to some $w_i$. Say that $w_i$ settles the edge $(u,v)$.
At the moment the edge is settled, if only one endpoint of this edge gets assigned, then we say that $w_i$ cuts the edge $(u,v)$.

Note that each edge is settled at exactly one time instant in the procedure, and it may or may not be cut at that point in time. Of course, once the edge is settled (with or without being cut), it is never cut in the future.

Consider $w_j$ and let its distance $d(w_j,u) = a_j$ and $d(w_j, v) = b_j$. Assume $a_j < b_j$, the other case is identical. If $w_j$ cuts $(u,v)$ when the random values are $X$ and $\pi$, the following two properties must hold:

The random variable $X$ must lie in the interval $[a_j, b_j]$ (else either none or both of $(u,v)$ would get marked).
The node $w_j$ must come before $w_1, ..., w_{j-1}$ in the permutation $\pi$.

Suppose not, and one of them came before $w_j$ in the permutation. Since all these vertices are closer to the edge than $w_j$ is, then for the current value of $X$, they would have settle the edge (either capture one or both of the endpoints) at some previous time point, and hence $w_j$ would not settle---and hence not cut---the edge $(u,v)$.

Now the rest of the argument is as in class: Pr[ edge $e$ is cut ] = $\sum_j$ Pr[ $w_j$ cuts the edge $e$]. Moreover, Pr[$w_j$ cuts $e$] $\leq$ Pr[$X \in [a_j, b_j]$ and $w_j$ comes before $w_1, ... w_{j-1}$ in the permutation $\pi$] $\leq$ $(d_{uv}/(r/2)) \times (1/j)$. And summing this over all vertices gives us $2 (d_{uv}/r) H_n$.

Sunday, March 16, 2008

Homework 4, problem 2b comment

Just show you can get a gap at least this bad. (In other words, although you will need to construct a good SDP solution for $C_5$, you need not prove it is optimal. Unless you want to. Say, by using duality.)

Homework 3 graded

I'll return them on Tuesday. The average was 79%. Some comments:

1a) Several people erroneously claimed a log*|D| approximation here. (In fact, no factor approximation is possible.) The reason the AKC algorithm does not work is as follows: Assume we have correctly guessed R = Opt. At first, we indeed know that there k vertices which R-cover all of D. So the first Set-Cover approximation returns some set of vertices A R-covering D. But now, we do not know that there are k vertices which R-cover A. So we are stuck.

2. This was surprisingly not done so well. Most people seemed to get the key point that ($c_0$, $s_0$) hardness implies hardness along the two line segments (in your diagram) joining this point to (1,1) and to (0,0). But people seemed to have a very hard time deducing the easy region (due to the Vertex-Cover algorithm), which is the triangle with vertices (1,1), (1, 0), and (1-1/k, 0).

3. Most everybody got this one, except for failing to point out in part b the simple but key fact that the reduction preserves Opt = 1.

4. As mentioned earlier, this question was kind of screwed up. In honor of that, everyone received 3 bonus points. Mainly, we just wanted you to revisit the Max-Coverage proof, pay attention to the parameters, and see how the parameters from #3 fit in.

5. Almost no one tried this, although it is not too hard.

6. Most people who tried this more or less got it, although some failed to argue why shifting preserves the weight of the set. For a solution, see the original paper.

Mathematical odds and sods for SDP

Hello all, hope you had a good spring break. Scribe notes for the last two lectures are up. Sorry for the delay; my fault, not the scribes'.

I thought I would record here a few small mathematical facts that came up in our lectures on SDP algorithms for Max-Cut and Coloring.

The GW constant: Recall the GW constant,

$\alpha := min_{-1 \leq \rho \leq 1} [\arccos(\rho)/\pi]/[1/2 - \rho/2]$.

Using calculus and some trigonometric substitutions, you can determine that $\alpha = 2/(\pi sin \theta)$, where $\theta$ is the positive solution of $\theta = \tan(\theta/2)$. As far as I know, it is not known if $\alpha$ is irrational, although presumably it is transcendental.

Gaussians: Remember that a real random variable X has the "standard Gaussian (normal) distribution" if its pdf is $\phi(x) := (1/\sqrt{2\pi}) \exp(-x^2/2)$. The annoying thing about this pdf is that it does not have a closed-form integrals although several closely related expressions do. For example, $x\phi(x)$ has $-\phi(x)$ as an antiderivative. It's easy to check that E[$X^k$] = 0 whenever $k$ is an odd positive integer, and with enough integration-by-parts effort you can check that E[$X^k]= (k-1)(k-3) \cdot \cdot \cdot 3 \cdot 1$ whenever $k$ is an even positive integer. In particular, X has mean 0 and variance 1.

In class we were particularly interested in the "tail", $\mathcal{N}(t) := $ Pr[X $\geq t$] $= \int_{t}^\infty \phi(x) dx$, where $t$ is positive. Here is how to accurately estimate this quantity:

For an upper bound, just insert a factor of $x/t$ into the integrand, which is always at least $1$ on the range. Then you can immediately integrate and get the upper bound

$\mathcal{N}(t) \leq (1/t) \phi(t)$.

For a lower bound, you need a cleverer trick. Insert a factor of $1 - 3/x^4$ into the integral (!), which is always less than $1$. You can check that the resulting quantity, $(1-3/x^4)\phi(x)$ has an explicit antiderivative: $(1/x^3 - 1/x)\phi(x)$. Thus you can conclude

$\mathcal{N}(t) \geq (1/t - 1/t^3) \phi(t)$.

(I think this trick is due to Feller.) Thus we have excellent bounds for $\mathcal{N}(t)$ for large $t$: in particular,

$\mathcal{N}(t) \sim (1/t) \phi(t)$

as mentioned in class.

How to draw a Gaussian random variable: In practice, just use "randn" in Matlab :) In theory, suppose you have access to an oracle which gives you a uniform real from the interval [0,1]. Then the easiest thing is to draw two random numbers S and $\theta$ from [0,1] and then set g = $\sqrt{\ln(1/s)} \cos(2\pi \theta)$, h = $\sqrt{\ln(1/s)} \sin(2\pi \theta)$. Then g and h are two independent Gaussians! (This is the "Box-Muller Transformation".) It is a nice exercise to see why this is true.

Of course, in practice in theory (if you will), the usual model of randomness is to assume you have random bits. Also, of course, you can't compute ln, cos, sin, $\pi$, etc. to arbitrary precision. So in reality, you need to make some approximations.

Just as with the issue of solving SDPs exactly, these approximations introduce additive errors of $\epsilon$, with running time overhead of polylog($1/\epsilon$). And just as with the SDP-solving issue, we will try to never speak of such annoying details again :)

Compendium of approximability

[somehow I deleted the original of this posting...]

This site, by Pierluigi Crescenzi and Viggo Kann describes the best known approximation guarantees and inapproximability results for a very large number of problems.

It's quite out of date, but still a useful resource.

Tuesday, March 4, 2008

k-Set-Cover

Jeremiah asked me today about the status of the problem "k-Set-Cover" for small k. This is the Min-Set-Cover problem where all the sets have cardinality at most k.

I wasn't exactly sure on the status, actually. On the homework we showed that the greedy algorithm gives an $H_k$ ($k$th harmonic number) factor approximation. I guessed that this was known to be more or less sharp.

Turns out that is true to some extent. With some augmentations to Feige's result, Trevisan showed that the problem is hard to approximate within a factor of $\ln k - \Omega(\ln \ln k)$.

While this looks pretty sharp, it's actually useless for the relatively interesting case of small constant $k$. For $k = 2$ the problem is AKA "Edge Cover", and it's solvable in P.

For $k \geq 3$ the problem is known to be NP-hard, and furthermore known to be "APX-hard", meaning that it is hard to approximate within a factor of $1 + \epsilon$ for some universal positive $\eps > 0$. (Result due to Papadimitriou and Yannakakis?) Improving this to an explicit decent value of $\epsilon$, depending on $k$, would be a nice problem.

On the positive side, the case of most interest to Jeremiah was $k = 3$. Here the best result is due to Duh and Fürer: a factor 4/3 algorithm. Actually, they get a factor $H_k - 1/2$ algorithm for any $k$.

There have been subsequent marginal improvements. A result slightly better than 19/12 is known for $k = 4$, and the world record is $H_k - .59$ for all $k \geq 6$; that appears in this recent paper of Athanassopoulos, Caragiannis, and Kaklamanis.

Friday, February 29, 2008

Homework 2: Notes and Comments

Here are solution sketches and comments on the rest of problems in HW2.

For the naive algorithm on unweighted graphs, several answers said: "if an edge is uncovered, the algorithm will pick two vertices when the optimal algorithm picks one vertex". While this is true (and is indeed the basis for the proof), you need to show that we are not over-counting. One way is to clearly state a lower bound on OPT (the size of any maximal matching), and state why the algorithm is at most twice the lower bound.

For the randomized VC algorithm, see the solutions to problem 2b from this previous semester's course.

For the primal-dual algorithm, we write down the dual which maximizes $\sum_e y_e$ such that for each vertex $\sum_{e: v \in e} y_e \leq c_v$, and the $y_e$ values are non-negative. We start with the zero primal and dual solutions, and raise all the $y_e$'s uniformly until some constraint becomes tight (i.e., for some vertex $v$, $\sum_{e: v \in e} y_e = c_v$. Pick the vertex in the primal solution, and "freeze" all the variables involved in this constraint. Now continue raising all the unfrozen edge variables, picking vertices and freezing edges when other contraints become right etc, until all the edge variables become tight: we now have have a feasible primal vertex cover $C$. (why?) Now the cost of this solution is $\sum_{v \in C} c_v = \sum_{v \in C} \sum_{e: v \in e} y_e$ $\leq \sum_{e} \sum_{v \in C \cap e} y_e \leq 2 \sum_e y_e$, which is the dual value. (The last inequality is because each edge contains at most 2 vertices.) Hence the vertex cover costs twice the feasible dual solution.

For the greedy algorithm, if you have a star with center cost $1+\epsilon$, and $n$ leaves with costs $1, 1/2, 1/3, \ldots, 1/n$, then you will pick all the leaves instead of the center, and hence only get an $H_n$ approx. An $O(\log m) = O(\log n)$ upper bound follows from the standard set cover analysis.
One point often missed: this problem defined $C^*$ and $F^*$ to be the facility and connection costs for the optimal integer solution, and not for the optimal LP solution.

Also, if you claim that after filtering you can use the standard Set Cover randomized rounding to obtain a solution, you need to show how the LP solution for facility location can be mapped to a LP solution for Set Cover.
No surprises here.
Build a bipartite graph on clauses and variables, and use Hall's theorem to infer the existence of a perfect matching.

Thursday, February 28, 2008

Homework 2 graded

Sorry for the delay. We'll hand it back today. Overall average was 84%.

Solutions to Homework 2, #6

7a. Take any $s$ sets in $\mathcal{F}_{s,t,\ell}$. By definition, each misses at most $\ell$ elements from $[t + s\ell]$. Thus their intersection misses at most $s \ell$ elements from $[t + s\ell]$ and thus has size at least $t$.

7b. Extend the definition of $S_{ij}$ to sets in the natural way. It's obvious that if $A$ is in $\mathcal{F}_{s,t,\ell}$ then $S_{ij}(A)$ is too. Hence shifting has no effect on $\mathcal{F}_{s,t,\ell}$.

7c. This one definitely takes much more explanation, but I don't think it's actually tricky -- you just have to follow your nose all the way through. However, you do have to avoid the following two extremely common mistakes:

"$S_{ij}\mathcal{F}$ is given by taking $S_{ij}(A)$ for all $A \in \mathcal{F}$." -- Not true.

"For any $A' \in S_{ij}\mathcal{F}$, there is an $A \in \mathcal{F}$ such that $S_{ij}(A) = A'$." -- False false false! Consider $i = 1$, $j = 2$, $n = 2$, $\mathcal{F} = \{ \{1\}, \{2\} \}$. Then $S_{12} \mathcal{F} = \mathcal{F}$, and yet for $\{2\} \in S_{ij} \mathcal{F}$, there is no set $A \in \mathcal{F}$ such that $S_{12}(A) = \{2\}$!

Here is my two-part proof:

Part 1.
Suppose by way of contradiction that there are $s$ sets $B_1, ..., B_s$ in $S_{ij} \mathcal{F}$ whose intersection is of size less than $t$. Let $A_1, ..., A_s \in \mathcal{F}$ be the sets they "came from". So for each $k$, either $B_k = S_{ij}(A_k)$ or $B_k = A_k$, the latter occurring when $S_{ij}(A_k)$ is also in $\mathcal{F}$. Clearly, the $A_k$'s are all be distinct, and so the intersection of the $A_k$'s has size at least $t$.

How could the $A_k$'s have intersection at least $t$ and yet the $B_k$'s have intersection less than $t$? Certainly nothing changes on coordinates other than $i$ or $j$. It's also clear $S_{ij}$ can't cause coordinate $i$ to leave the intersection. This means that it must be the case that coordinate $j$ left the intersection. In particular, $j$ is in the intersection of the $A_k$'s and not in the intersection of the $B_k$'s.

Further, since we know that $j$ is in the intersection of the $A_k$'s, it follows that $i$ is not in the intersection of the $A_k$'s. Otherwise, we have both $i$ and $j$ in each $A_k$, which surely means that $i$ and $j$ are both in each $B_k$. But we've already established that $j$ is not in the intersection of the $B_k$'s.

Finally, since $i$ is not in the intersection of the $A_k$'s, and since shifting caused the intersection size to go down, $i$ had better not be in the intersection of the $B_k$'s.

To summarize, $j$ but not $i$ is in the intersection of the $A_k$'s, and neither $i$ nor $j$ is in the intersection of the $B_k$'s. Further, it must be the case that the $A_k$'s have intersection size exactly $t$.

Part 2.
Focus on the coordinates $i$ and $j$ in the $A_k$'s now. For each $A_k$ with a $0$ in the $i$ coordinate and a $1$ in the $j$ coordinate (and there is at least one such $A_k$), how come all those $1$'s didn't shift over into the $i$th coordinate, causing all the $B_k$'s to have $1$'s in their $i$th coordinate?

It must be because there is at least one $A_k$ -- say $A_1$ WOLOG -- such that $S_{ij}(A_1)$ was already in $\mathcal{F}$. So let's look at the intersection of the following $s$ distinct sets in $\mathcal{F}$: $S_{ij}(A_1), A_2, ..., A_s$. These $s$ sets must also have intersection size at least $t$. Now the intersection of these is very similar to the intersection of $A_1, ..., A_s$; however, $S_{ij}(A_j)$ does not have a $0$ in the $j$th coordinate. This means $j$ is not in the intersection of $S_{ij}(A_1), A_2, ..., A_s$.

But this is a risky situation, because we concluded in Part 1 that $A_1, ..., A_s$ had intersection size only $t$, and this included the fact that $j$ was in the intersection. The only way $S_{ij}(A_1), A_2, ..., A_s$ could still have intersection size $t$ is if miraculously, putting in $S_{ij}(A_1)$ causes $i$ to be in the intersection, where it wasn't before.

But the only way for this to happen is if $A_1$ was the only set among $A_1, ..., A_s$ with the pattern $0, 1$ in coordinates $i, j$. In this case, every other $A_k$ has the pattern $1, 1$, which certainly implies $B_k = A_k$. Further, we have $B_1 = A_1$ by assumption (since $S_{ij}(A_1)$ was already in $\mathcal{F}$). So in fact all $B_k$'s equal their $A_k$'s. This is a contradiction because in Part 1 we concluded that $j$ was in the intersection of the $A_k$'s but not in the intersection of the $B_k$'s.

Tuesday, February 26, 2008

Lecture 13: The Hardness Endgame

Here's the final steps to the hardness of Priority Steiner tree. The details are in the paper On the Approximability of Network Design Problems, J. Chuzhoy, A. Gupta, J. Naor and A. Sinha, SODA 2006.

What we saw Today in Class. Recall that we assumed that for some $h$, we had set cover instances with universe of size $u$, $s$ sets, each set of size $z$, where

the Yes instances had a solution using at most $X$ sets, but
the No instances had the property that even picking $h\cdot X$ sets would leave a $2^{-c \cdot h}$-fraction of the universe uncovered. (Let us call this the $h$-factor property.)

The Yes instances mapped to instances of PST where the solution cost was at most $X$.

In the No instances, we proved a claim that if the total cost of edges bought in levels $1, 2, \ldots, j-1$ was at most $hX/2$, then the cost incurred at level $j$ itself would be at least $(u/4z)\cdot 2^{-ch}$. In particular, if $k$ were (at least) $(4zhX/u)\cdot 2^{ch}$, this claim implies that the total cost in the No instance is at least $hX/2$.

And hence we have a gap of $h/2$ between the Yes and the No instances.

Parameters (The Questions). So how do we set the parameters so that

the set cover instance satisfies the $h$-factor property.
The number of levels $k$ is at least $(4zhX/u) \cdot 2^{ch}$.
the size of the construction $N \approx (2z)^k \cdot s$ is reasonable.

Our Friend the Set Cover Instance. The construction of the set cover instance is the usual thing as discussed in Lecture 11 (or in Homework 3): note that we can start from a SAT instance of size $n$, and get a label cover instance with $n'$ nodes in each partition, each node participating in $d'$ constraints, label set $L'$, key set $K'$, where $n' = n^{- \log \eta}$, and $d', |L'|, |K'|$ are all $poly(1/\eta)$.

In turn, this gives us a set cover instance with universe size $u = n'd'2^{|K'|}$, number of sets $s = n'|L'|$, size of each set being $z = d' 2^{|K'| - 1}$, the optimal solution size is $X = 2n'$, and it satisfies the $h$-factor property above if $1/h^2 > O(\eta)$.

Parameters (The Answers). To answer the questions above:

the set cover instance satisfies the $h$-factor property as long as $h^2 < O(1/\eta)$. This is satisfied if we set $\eta \approx h \approx \Theta(\log \log n)$.
Note that the parameters derived above satisfy $zX/u = 4$, and hence $k = 16h 2^{ch} \approx \Theta(\log n)$ is sufficient.
the size of the construction is $N \approx s (2z)^k = n^{- \log \eta} \cdot poly(1/\eta)^{(2^h)} \approx n^{\log \log \log n}$.

Finally, the gap between the Yes and No instances is $h = \Omega(\log \log n) = \log \log N$ as well. This completes the proof.

Questions and comments about today's lecture go here, as usual.

Homework 3 correction -- Problem 4

Hi -- as many of you noticed, the last part of Problem 4 does not look so doable :)

The result is in fact true -- indeed, $\Omega(\log n)$ hardness for Set-Cover is known even subject to P $\neq$ NP. However the method illustrated probably won't cut it.

Please do one of the following instead:

1. Use this method to prove $f(n)$-hardness for Set-Cover, for as large a function $f(n)$ as you can -- subject to NP not in DTIME($2^{o(n)}$) or an even better DTIME assumption.

OR

2. Prove $\Omega(\log n)$-hardness for Set-Cover assuming NP not in BPTIME($n^{polylog n}$) -- using any method.

Tuesday, February 19, 2008

Integrality Gap for Group Steiner

The integrality gap of $\Omega(\log^2 k)$ was proved in Integrality Ratio for Group Steiner Trees and Directed Steiner Trees, E. Halperin, G. Kortsarz, R. Krauthgamer, A. Srinivasan, and N. Wang. SIAM Journal on Computing, Vol. 36, 1494-1511, 2007. Here's the idea.

Create a balanced $\delta$-ary tree with height $H+1$ levels, where the edge costs decrease exponentially: edges at level $h$ cost $1/2^h$. (Hence edges incident to the root cost $1/2$, and edges incident to the leaves cost $1/2^{H}$.)
The arity of the tree is $\delta = c \log n$ for some constant $c$. So the height is $H = \log n/\log \delta$. The number of leaves is $n$, the total number of nodes is $n(1+o(1))$. The number of groups/colors/elements is $k = 2^{2H}$.
Do the following independently for each of the $k$ elements $x$. The element $x$ starts off at the root. For each of the $\delta$ child edges, it flips a fair coin independently: if the coin comes up heads, a copy of $x$ slides down that edge, where it begins such a random process anew. Note that the expected number of copies of $x$ at level $i$ is $(\delta/2)^i$. The group for $x$ is the set of all the leaves that contain a copy of $x$.

Small fractional solution. It is not that difficult to show that this instance has a fractional solution of value $O(H)$ with high probability (where the probability is over the random choice of groups above):

Consider some node at level $i$ in this tree. Conditioned on this node getting a copy of $x$, the expected number of leaves containing $x$ below it is $\mu_h = (\delta/2)^{H-h}$. With high probability, the actual number lies in $(\mu_h/3, 3\mu_h)$. (Use Chernoff bounds!)
For an edge $e$ going from level $i-1$ to level $i$, set $x_e = 9/\mu_e$ --- using the above facts, this is a feasible solution (whp) with cost $9H$.

No small integer solution. Next, we show that "with high probability, all integer solutions have cost $\Omega(H^2 \log k) = \Omega(H \log^2 k)$." This proves the $\Omega(\log^2 k)$ integrality gap for the problem.

Since each leaf-edge costs $1/2^H$, no tree of cost $C$ can use more than $C2^H$ of these. There are $\delta^H$ leaf-edges, so the number of possible trees of cost $C$ is $[(\delta^H) choose (C2^H)] \leq \exp(CH2^H log \delta)$. Call this value $M(C)$.
Suppose $Fail(C)$ is the probability that for a fixed tree of cost $C$, it misses the group for $x$ created above. By independence of groups, the probability that this fixed tree hits all the groups is at most $(1 - Fail(C))^k \leq \exp(-k \times Fail(C)) = Hitall(C)$.
Hence, taking a trivial union bound over all trees, the chance that there exists some tree that hits all the groups is at most $M(C) \times Hitall(C)$.
If this product is $o(1)$ for some $C$, then with probability $1 - o(1)$ no tree of cost $C$ is feasible. We would like to show this holds true for some constant $\epsilon > 0$ and $C = \epsilon H^2 \log k$.

Now for the most technical part.

One needs to prove that for some fixed tree of cost at most $\epsilon H^2 \log k$, and some $x$, none of the leafs in this tree are hit by the random process above. The proof strategy seems natural (in hindsight), but also somewhat delicate: details in Section 2.3.1 of the paper above. Question: can one get a slightly worse bound but without the case analysis at the end? That may be useful to know...

This analysis of the integrality gap (and an extension of it) is actually used for the proof of soundness in the hardness reduction we saw today! I will write/say some more about this later.

Monday, February 18, 2008

Tomorrow's lecture

For tomorrow, could you brush up what you know Max $k$-ary Consistent Labeling(K,L) from HW1 problem #7 (also defined in the list of problems)? Thanks!

In particular, consider the $2$-ary (also called "binary" case): the constraint hyperedges now become normal edges, and the hypergraph becomes a normal graph. (Note that "strongly" and "weakly" satisfying mean the same thing now.)

Suppose such an instance is a bipartite graph $(U,V,E)$, with $|U| = |V| = n$; moreover, the instance is "regular" (i.e., the number of constraints involving each vertex is some $d$). Remind yourself that for such instances, it is NP-hard to infer whether the value is $1$ or less than $\eta < 1$.

Thursday, February 14, 2008

Notes on the group Steiner tree proof

A few notes on the rounding proof (and how to complete it):

When you round down the $x_e$ values to the power of $2$ below it, note that you might violate the minimality of the solution: e.g., if an edge has value $x_e = 3/4$, and has two children of value $x_{e1} = 1/2$ and $x_{e2} = 1/4$, then rounding down would give $x'_e = 1/2$ but would maintain $x'_{e1} = 1/2$ and $x'_{e2} = 1/4$.

This kind of situation would cause some concern if we want to "merge" edges with identical values in the last step of the preprocessing: how would we merge $e$ and $e1$ in this case? The answer is: we would just "contract" the edge $e1$, and hence any children of $e1$ would now become children of $e$. The key observation is the random process on this new contracted graph connects the same set of leaves to the root as on the original uncontracted graph, since $x'_{e1} = x'_e$.

(Aside: the procedure is fairly robust, so you could alter the procedure and add a step to regain the minimality. Or you could just forget about the last "merging" step entirely. Note that we now no longer have a tree of "height" $\log |g|$, and the closing arguments would now change slightly.)
The argument to compute $\Delta$: suppose the height of the "merged" tree is $H$.

For some vertex $u \in g$, let us first show that $\Delta_u = \sum_{w \sim u} Pr[E_u \cap E_w] \le O(H) \times x'_u$. Summing this over all $u \in g$ gives us $O(H)$, since the $x'_u$'s sum up to at most $1$, and hence completes the proof.

For any $w \in g$, let $e$ be the lowest edge shared by the path from $u$ to the root, and the path from $w$ to the root. A moment's thought tells us that $Pr[ E_u ] = x'_u$, and $Pr[ E_w | E_u ] = x'_w/x'_e$ ~~$Pr[ E_w | E_u ] = x'_u/x'_e$~~. Hence, $Pr[E_u \cap E_w] = (x'_u x'_w / x'_e)$.

Now, if we sum over all $w$ such that the paths from $u$ to root, and $w$ to root have the edge $e$ as their least common ancestor edge as above. Now use the fact that $\sum_{such w} x'_w \leq O(x'_e)$, since all the flow going to such $w$'s must have been supported on the edge $e$ before rounding down. Hence, the sum of $Pr[E_u \cap E_w]$ over all the relevant $w$ for some edge $e$ is $O(x'_u)$.

Finally, the tree is of height $H$, which means there are at most $H$ such terms, giving us $O(H) x'_u$ as claimed.
Aravind Srinivasan has some notes on Janson's inequality: it explains how the "simple" structure of the events $E_i$ and (the independence of the random variables determining these events) allow us to prove such a strong inequality. Joel Spencer also has slides on the proof: in fact, he states it in exactly the form we needed today, with his "bad" events $B_i$ being our "good" events $E_v$!

If anything else was unclear, please post questions, comments, or clarifications!

Tuesday, February 12, 2008

Lecture 9: Finishing Ek-Indep. Set hardness; AKC hardness

Any questions about Lecture 9 can go here.

I wanted to include a niggling detail about the hardness for Independent Set. Recall that in our reduction, our hyperedges involved $k/2$ distinct sets from $\{0,1\}^L_v$ and $k/2$ distinct sets from $\{0,1\}^L_{v'}$, whenever $(u,v)$ and $(u,v')$ were both in $E$.

It is important (why?) to allow the case $v' = v$. However this requires us to take a little care. First of all, in this case, we need to take the hyperedges to be of the form $\{A_1^v, ..., A_k^v\}$ where all $k$ sets are all distinct. This will still be a hyperedge iff $\pi_{v \to u}(\cap_{i=1}^{k/2} A_i)$ is disjoint from $\pi_{v \to u}(\cap_{i=k/2+1}^{k} A_i)$.

Suppose that $v$ is a "good" vertex. We know that $\mathcal{F}_v$ is too big to be $k/2$-wise $t$-intersecting, so it contains some "good" sets $A_1^{v}, ..., A_{k/2}^v$ with intersection size less than $t$. We took Sugg($v$) to be this intersection. As we saw in class, if $v$ has a neighbor $u$, and $v'$ is another good neighbor of $u$, then Sugg($v$) and Sugg($v'$) are "consistent" label sets (i.e., when projected to $u$, they agree on at least one key). In particular, Sugg($v$) and Sugg($v'$) must certainly be nonempty.

The tiny tricky part now is the following: This argument relies on $v' \neq v$! If $v' = v$ then you only have $k/2$ "good" sets in play, and you can't make any sort of argument based on size-$k$ hyperedges. In particular, if $v$ has no other "good" distance-2 neighbors, Sugg($v$) may well be empty!

But: it's not like it's a problem. In this case, just put an arbitrary element into Sugg($v$). Then the soundness argument is fine; any neighbor $u$ of $v$ must have $v$ as its buddy, and then of course the $(u,v)$ constraint will be satisfied with probability $1$!

Saturday, February 9, 2008

Lecture notes

for the previous week (Lec5 and Lec6) are up. Tuesday's lecture (Lec7) will be up shortly. Sorry for the delay...

Friday, February 8, 2008

Left-Shifting

Could use a quick reality check on the definition of left-shift (q6b); there seems to be more than one consistent interpretation.

What would the following set become with i=1, j=3?
{ 011, 101, 111, 001 }

Thanks. :-)

Thursday, February 7, 2008

Lecture 8: Hardness of Max-Ek-Independent-Set

Any questions about today's lecture? I will repeat here the hardness reduction:

Given: Max-Label-Cover(K,L) instance $G = (U,V,E)$ with projection constraints $\pi_{v \to u}$.

Reduction to a weighted k-uniform hypergraph $H$:
. The vertex set is $V \times \{0,1\}^L$ (there are $|V| 2^{|L|}$ vertices). We call each $\{v\} \times \{0,1\}^L$ the "block" of vertices corresponding to $v \in V$. We name the strings/sets in this block $A^{(v)}$.
. The weight on each vertex $A^{(v)}$ is its $p$-biased weight, $p = 1 - 2/k - \delta$. The total weight in each block is $1$ and the overall total weight is $v$.
. For each pair of vertices $v, v' \in V$ sharing a common $u$ (i.e., with $(u,v), (u,v') \in E$), we will put some hyperedges on vertices in the union of the $v$ and $v'$ blocks.
. Specifically, we put a hyperedge on the $k$-set $\{A_1^{(v)}, ..., A_{k/2}^{(v)}, B_{1}^{(v')}, ... B_{k/2}^{(v')}\}$ iff the following condition holds:

$\pi_{v\to u}(\cap A_{i}^{(v)})$ is disjoint from $\pi_{v' \to u}(\cap B_{i}^{(v')})$.

(Supertechnical note. The A's should all be distinct, and the B's should all be distinct. Also, we allow $v = v'$, in which case the A's and B's should be mutually distinct.)

This construction is polynomial time (since $k$, $|K|$, $|L|$ are all constants; supersupertechnical note: we assume $\delta$ is rational).

You should again try to convince yourself of the "completeness" part of the proof: Opt(G) = 1 implies Opt(H) $\geq p = 1 - 2/k - \delta$.

Wednesday, February 6, 2008

Solution to 7a on Homework 1

Here is a solution to problem 7a on homework 1. To get the full $1 - 1/(k-1) - \delta$ vs. $\epsilon$ hardness result for Max-Ek-Independent-Set (a problem we will discuss in tomorrow's class), you need to use something similar, and that something similar might be on the next homework...

Monday, February 4, 2008

How to do a hardness reduction

Just a quick reminder on how to do a hardness reduction. It may seem obvious, but these reductions are going to get even more complicated later, and there was already some confusion on the homework.

To show c' vs. s' NP-hardness for problem B, based on known c vs. s NP-hardness for problem A:

1. Design a polynomial time reduction R mapping A-instances into B-instances.
2. Show that R can be carried out in polynomial time.
3. Show that for any A-instance X, Opt(X) $\geq$ c implies Opt(R(X)) $\geq$ c'.
4. Show that for any A-instance X, Opt(X) $\lt$ s implies Opt(R(X)) $\lt$ s'.

Done.

Comments on each of the steps:

1. This is generally the second-hardest part.

2. This is generally trivial.

3. This is generally true "by design". One typically shows that given any solution x to A-instance X with $val_X$(x) $\geq c$, one can convert it to a solution y for R(X) with $val_{R(X)}$(y) $\geq c'$.

Usually this conversion is explicit and easy, but please don't bother to note whether or not it is polynomial-time -- it doesn't matter. Noting this only confuses things.

Furthermore, especially in the case that c = c' = 1, it is usually easy to prove some kind of "reverse", like, if Opt(R(X)) = 1 then Opt(X) must also be 1. But again, this is irrelevant and it's best not to write it without having a good reason.

4. This is generally the hardest part. One almost always proves it in the contrapositive: If Opt(R(X)) $\geq s'$, then Opt(X) must be $\geq s$. Generally, one does this by showing how to convert any solution y for R(X) with $val_{R(X)}$(y) $\geq s'$ into a solution x for X with $val_{X}$(x) $\geq s$.

Again, it does not matter if this conversion is polynomial-time; like in step 3, one only needs to show an existence result. (Indeed, often the conversion is randomized.)

Saturday, February 2, 2008

Homework 1 graded

I'll bring them around on Tuesday. The average was 80%.

Some comments on the problems:

1. Some people made the unwarranted assumption that A running in a reasonable amount of time is independent of A outputting a solution of value at least s.

2. Most people got this completely right; the only interesting part was to see how slickly you could prove (b). People were occasionally a little unclear on why their random assignment in (d)
satisfied a constraint with probability >= 1/|K|.

3. Throughout, people were sometimes unclear on what exactly one needs to do to prove a hardness reduction. I think I will post a short note about this in a future blog entry.

Also, in the first part of (a) you needed to be showing, say, 1/3 vs. $(1/3)(1-\epsilon_0)$ hardness for Independent-Set, not 1 vs. $1 - \epsilon_0$. The latter decision problem is completely trivial -- just output 'yes' if the input graph has no edges!

People were also frequently failed to clearly explain what graph product they were using.

4. Most people did pretty well on this one -- thanks to a couple of people for pointing out I should have said 'reduce from Hamiltonian-Cycle'. In part (b), a couple of people added a star rather than what I and most others thought of -- disconnected edges; cute. Only one person was heroic and honest enough to point out that you have to deal with integrality issues vis-a-vis 1/delta when you're adding your stars/edges. I didn't take off any points for the rest of you here.

5a) One cute aspect here is that although from the analysis (repeat Opt $\cdot \ln(n/Opt)$ times to get down to Opt left, then pick Opt more times), it looks like the algorithm might have to "guess" Opt, it doesn't actually -- although the analysis changes, the actual algorithm is the same.

6. No one explained why the sum of the prices equals the cost of the algorithm's solution. It's only about a sentence, but it's still important. I ended up not taking off points for this though.

7. Only one person tried 7a (and got it somewhat right), even though I think it's much shorter than 7b (which does have a lot of painful arithmetic in it). I'll post a solution for it later.

Tuesday, January 29, 2008

Homework 2

Homework 2 is up on the course web page: I will also put a bunch of copies outside Nicole Stenger's office (Wean 4116) that you could pick up....

Example for Primal-Dual fixed

The bad example for the primal-dual algorithm that I cooked up in class did not give the large gap I was hoping for, though going in the right direction. Instead, consider the following example:

There are $2n$ clients and $n$ potential facilities. The first $n$ of the clients (say $v_1, v_2, \ldots, v_n$) are at distance 1 to all the facilities, we call these communal clients.

For the other $n$ clients: client $v_{n+i}$ is at distance $1$ to facility $u_i$ and at distance 3 to all other facilities. (We say client $v_{n+i}$ is facility $u_i$'s private client.)

The first facility $u_1$ costs $n+1$, and all the others cost $n+2$.

If we raise duals uniformly, then at time $t=2$, the first facility $u_1$ will become tentatively opened, and all the communal clients, and private client $u_{n+1}$ become frozen. But the other private clients will continue to raise their duals, and at time $t=3$, all the remaining facilities will also become tentatively open.

Note that the cost of actually opening all these tentatively open facilities would be $(n+1) + (n-1)(n+2) = \Omega(n^2)$, whereas the total dual value generated is only $O(n)$.

Hence the clean-up step is needed.

Facility Location -- no joke

Facility Location is one of the theory problems studied most extensively in practice. It has obvious appeal in industry and is the subject of numerous books and optimization heuristics.

I especially like how the special case of the "Fermat-Weber problem" in the plane was already studied for its use in practice (minimizing industrial transportation cost) in 1909.

Monday, January 28, 2008

Some more LP notes

Here is the chapter dealing with LP duality from the textbook Understanding and Using Linear Programming by Jiří Matoušek and Bernd Gärtner.

Two caveats: the file is somewhat large (4 megs), and is accessible only from cmu.edu web addresses.

Saturday, January 26, 2008

Notes on Linear Programming

You may want to brush up on your linear programming; LP duality will be important for tomorrow's lecture.

For a relatively quick (40 pages) summary of the topic I usually go to Goemans's lecture notes. However these do not include the Ellipsoid Algorithm, which is of great theoretical importance for showing linear programming (and extensions, e.g., semidefinite programming) is in P.

For more details, Avner Magen's lecture notes look pretty thorough.

Nina and Avrim's problem

Nina just told me a nice approximation algorithms problem she has written about with Avrim -- "Graph Vertex Pricing".

Stripping out the econ motivation, the problem is as follows:

Input: an undirected graph with nonnegative weights $w(u,v)$ on the edges.

Output: a nonnegative "price" $p(v)$ on each vertex.

Value: For each edge $(u,v)$ such that $p(u) + p(v) \leq w(u,v)$, you gain $p(u) + p(v)$. Otherwise you gain $0$ for that edge.

Determining $Opt$ is NP-hard, and it's also known [Guruswami-Hartline-Karlin-Kempe-Kenyon-McSherry] to be "APX-hard"; i.e., there is no PTAS unless P = NP.

Nina and Avrim have a paper giving a factor-$4$ approximation algorithm (among many other things). You might want to see if you can think of this algorithm yourself (hint: reduce to the bipartite case; lose a factor of 2 twice).

This seems to me like a very nice problem to work on -- improving either the algorithm or the hardness. One tricky aspect of it is that no one can think of a LP relaxation...

Thursday, January 24, 2008

You too can post to the blog

If you didn't sign up in the first class but want to post to the blog, just send us an email.

(Excellent usage of this feature by Jeremiah in clarifying homework problem 3, for example.)

Homework 1 problem 4b correction

This should say "... for all constant $c \lt 5/4$..." not $\gt$. Corrected in the .pdf on the course page. (Thanks Kanat.)

Homework 1 Clarification

For problem 3 we use Max-E3SAT-6. When it says that each variable appears in exactly 6 clauses does this mean that (x, not x) each appear 6 in six clauses or are they not considered separate variables?

Wednesday, January 23, 2008

Scribes and Math Writing

Please read the first few pages of the Knuth, Larrabee and Roberts book on Mathematical Writing [PDF] before starting to type up the scribe notes: the list of dos and don'ts is particularly useful to look over...

Tuesday, January 22, 2008

Lecture 3: 1 vs. 3/4 + eps hardness for Max-Coverage: now with bonus endgame calculations

Any questions about the lecture can go here. I'll also show how to finish off the calculations.

First, there was one small extra idea that I didn't have time to mention: It uses the fact that the function $1 - 2^{-t}$ is a concave function of $t$. Recall that the overall average number of suggestions per edges is $2$. We also know that edges with t total suggestions but no consistent suggestions have coverage at most $1 - 2^{-t}$. We want that they have overall coverage at most $1 - 2^{-2}$. We need to check that the Suggester can't "cheat" by achieving awesome coverage on some edges by "spending" a lot of suggestions, overcoming bad coverage on edges where few suggestions were spent.

This indeed follows from concavity. As an example, suppose the Suggester is labeling 2 edges inconsistently, and it has 4 labels to spend (average of 2, as needed). It could spend 2 on each and achieve avg{$1 - 2^{-2}$, $1 - 2^{-2}$) $= 3/4$. Or it could try spending 3 on one and 1 on the other. But this only achieves avg{$1 - 2^{-3}$, $1 - 2^{-1}$} = avg{7/8, 1/2} $\lt$ 3/4.

With that idea noted, we come to:

Formal endgame of proof:

Define an edge $(u,v)$ to be "frugally suggested" if $|Sugg(u,v)| \leq 10/\epsilon.$ Next, define an edge to be "good" if it is both frugally and consistently suggested.

Suppose the fraction of good edges $(u,v)$ is at least $\epsilon/20$. We argue that this implies E$[val_{\mathcal{G}}(f)] \geq \eta = : \epsilon^3/2000$, as desired. Supposing $(u,v)$ is good, it has consistent suggestions and hence there exist $a \in Sugg(u)$ and $\alpha \in Sugg(v)$ such that the labeling $(a, \alpha)$ satisfies the constraint on $(u,v)$ (that is, $\pi_{v \rightarrow u}(\alpha) = a$). Further, since $(u,v)$ is good, it is frugally suggested, so both $|Sugg(u)|$ and $|Sugg(v)|$ are at most $10/\epsilon$. Thus when $f$ is randomly chosen, there is at least an $(\epsilon/10)^2$ chance that it will get both $f(u) = a$ and $f(v) = \alpha$, thus satisfying $(u,v)$. Hence in expectation, $f$ satisfies at least an $(\epsilon/20)(\epsilon/10)^2 = \eta$ fraction of edges $(u,v)$, as desired.

It remains to show that indeed the fraction of good edges cannot be less than $\epsilon/20$. If this were the case, then for a randomly chosen edge $(u,v)$,

P$[(u,v)$ consistent suggestions$]$

$\leq P$[(u,v)$ good$] +$ P$[(u,v)$ not frugally sugg'd$]$

$\lt \epsilon/20 +$ P$[(u,v)$ more than $10/\epsilon$ sugg'd labels$]$

$\leq \epsilon/20 + 2(\epsilon/10) $

$ = \epsilon/4,$

where the second-to-last step used Markov's inequality based on the fact that the average number of suggested labels per edge is exactly $2$.

Even though the consistently suggested edges have perfect coverage, when their fraction is only $\epsilon/4$, the fact that the overall coverage is assumed to be at least $3/4 + \epsilon$ implies that the inconsistently suggested edges have average coverage at least $3/4 + (3/4)\epsilon$. Our claim implied that an inconsistently suggested edge with $t$ suggested labels has coverage at most $1 - 2^{-t}$. The average number of labels per inconsistently suggested edge is at most $2/(1 - \epsilon/4) \leq 2 + \epsilon$, where we again used that the fraction of consistently suggested edges is at most $\epsilon/4$. Thus by concavity of the function $1 - 2^{-t}$, the average coverage of ground elements on inconsistently suggested edges is at most

$1 - 2^{-(2 + \epsilon)} = 1 - \frac{1}{4} e^{-(\ln 2) \epsilon} \leq 1 - \frac{1}{4} (1 - (\ln 2)\epsilon) = 3/4 + .17\epsilon \lt 3/4 + (3/4)\epsilon$

a contradiction.

Monday, January 21, 2008

Prepare for Tuesday's class

Tomorrow we will cover $1$ vs. $3/4 + \epsilon$ hardness for Max-Coverage. (The extension to $1$ vs. $1 - 1/e + \epsilon$) is on your homework.) The class will be somewhat fast-paced, so please come prepared. Specifically, please review the definition of Max-Label-Cover from the problem list handout; also, solve (or at least, get started on) problems 3a, 3b, and 4c from the homework.

LP integrality gaps for Min-Set-Cover

As promised, herein I will discuss "LP integrality gap instances" for Set-Cover. Recall that these are instances $\mathcal{I}$ for which $Lp(\mathcal{I})$ is small but $Opt(\mathcal{I})$ is large.

Let's start with one that looks quite similar to the LP+Randomized-Rounding algorithmic gap we saw in class:

Ground elements: all nonzero elements of $\{0,1\}^q$ -- thought of as $\mathbb{F}_2^q$, the $q$-dimensional vector space over the field of size 2. There are $n = 2^q - 1$ ground elements.

Sets: For each $\alpha \in \mathbb{F}_2^q$ we have a set $S_\alpha = \{ e \in \mathbb{F}_2^q \setminus \{0\} : \alpha \cdot e = 1\}$. Here $\cdot$ denotes the usual dot product in $\mathbb{F}_2^q$: $\alpha_1 e_1 + ... + \alpha_q e_q$ (mod 2). There are $M = 2^q$ sets. (The set $S_0$ is kind of pathetic, being the empty set, but never mind.)

Analysis: Each ground element is contained in exactly half of the sets. (Standard undergrad probability exercise.) It follows that in an LP solution, if we take each set fractionally to the extent $2/M$, then each element is fractionally covered to the extent $1$.

Hence $Lp(\mathcal{I}) \leq M (2/M) = 2$. (In fact, $Lp$ is less than this...)

On the other hand, we claim $Opt(\mathcal{I}) \geq q$. Suppose by way of contradiction that there are some $q-1$ sets $S_{\alpha_1}, ..., S_{\alpha_{q-1}}$ covering all ground elements. So

$S_{\alpha_1} \cup ... \cup S_{\alpha_{q-1}} = \mathbb{F}_2^q \setminus \{0\}$

$\Leftrightarrow \bigcap_{i=1}^q \overline{S_{\alpha_i}} = \emptyset$

$\Leftrightarrow \bigcap_{i=1}^q \{e \in \mathbb{F}_2^q : \alpha_i \cdot e = 0 \} = \{0\}$.

But each set in the intersection above is a hyperplane in $\mathbb{F}_2^q$. This gives the contradiction, because in dimension $q$, you can't have $q-1$ hyperplanes intersecting on just the point $0$.

Conclusion: This instance is a $2$ vs. $\log_2$ $(n+1)$ integrality gap for Set-Cover. This yields a ratio of $\Omega(\ln n)$, but not the full $(1-\epsilon) \ln n$ we've come to expect.

--------------------

Comparing with the way things have gone before, I bet you think I'm going to now tell you a straightforward way to extend this to get the full $(1 - \epsilon) \ln n$ gap, perhaps by looking at an instance with ground elements $\mathbb{F}_k^n$.

Well, I'm here to tell you that I don't know how to do this. Quite likely it's possible, but I couldn't see how. I'd be very very interested if one of you could give such an example.

--------------------

So is there another kind of integrality gap instance? Or perhaps $Lp$ and $Opt$ can never be off by more than a $(\log_2 n)/2$ factor...

Since Feige proved that the $(1-\eps) \ln n$ decision problem is hard, we know that the former must be the case. (Otherwise, the efficient algorithm of computing and outputting $Lp$ would contradict his theorem.) Here is one way to show it:

Fix an abstract set of ground elements, $\Omega$, of cardinality $n$. Introduce sets $S_1, ..., S_M$ by picking them randomly (and indendently). For each $S_i$, we choose it by including each element of $\Omega$ with probability $1/k$. We will take $M = (C k^2) \log n$ for a large $C$ to be named later.

Analysis: Each element is expected to be contained in about a $1/k$ fraction of the sets. If we take $C$ large enough, then a Chernoff bound + union bound will imply that with high probability, every element is contained in at least a $(1/k)(1 - \epsilon)$ fraction of sets. (Here we can make $\epsilon$ as small as we like by making $C$ large.) Thus we can get a valid fractional solution to the LP by taking each set to the extent $(k/M)/(1-\epsilon)$; this solution has value $k/(1-\epsilon)$. Summarizing: with high probability, $Lp \leq k/(1 - \epsilon)$.

We now would like to show that with high probability, $Opt \geq t := (1-\epsilon) k \ln n$. This will give us an $Opt/Lp$ integrality gap of $(1-\epsilon)^2 \ln n$, which is good enough.

To see this, we fix any set of $t$ out of $M$ set indices and show that the probability these sets end up being a valid cover is very small. If the probability is significantly smaller even than $1/{M \choose t}$ then we can union bound over all possible size-$t$ covers and be done.

Now each element $e$ has a $1-1/k$ chance of being uncovered by a given set, and thus our fixed $t$ sets have a $(1 - 1/k)^t$ chance of leaving $e$ uncovered. With our choice of $t$, this quantity is essentially $\exp(-(1-\epsilon) \ln n) = 1/n^{1-\epsilon}$ (slight cheat here as the inequality goes the wrong way, but one can fix this easily). The events that the $t$ sets cover each element are actually independent across elements. So the probability that all elements are covered is at most $(1 - 1/n^{1-\epsilon})^n \leq \exp(-n^{\epsilon})$. This is indeed much less than $1/{M \choose t}$, since this latter quantity only is of the order $1/(\log n)^{O(\log n)}$ (assuming $k$ and $\epsilon$ are "constants"), much smaller than $\exp(-n^{\epsilon})$.

Lecture 1 and Lecture 2 scribe notes posted

On the course web page.

Sunday, January 20, 2008

Why decision and search are different for approximation problems

We don't usually think too much about the distinction between decision and search problems for exact NP-optimization problems. But the distinction is relevant when approximation is involved. Here is a problem & solution excised from Homework 1, illustrated the distinction:

1. Decision to search.

a) Suppose the exact decision version of Max-$3$Sat is in P i.e., for any $c$, the $c$ vs. $c$ decision problem is in P. Show that the exact search version of Max-$3$Sat is also in P; i.e., there is a polynomial time algorithm which, given a $3$Sat formula $\phi$, finds an assignment satisfying a maximal number of clauses.

b) Explain the difficulty in showing an analogous reduction from a $.9$-factor decision algorithm (i.e., $c$ vs. $.9c$ decision, for any $c$) to a $.9$-factor search algorithm.

Solution.

a) If $\phi$ has $m$ clauses, it's easy to use binary search to identify the value of the optimum using $\log_2 m$ calls to the decision algorithm. (Or, one could be naive and simply use $m$ calls.) Having found $Opt$, we now find an optimum solution as follows: Run the decision algorithm on $(\phi_0, Opt)$ and $(\phi_1, Opt)$, where $\phi_b$ denotes the $3$Sat formula gotten by fixing $x_1$ to $b$ ($b = 0, 1$) and simplifying. At least one of these must answer `yes'; say $b_1$. Fix $x_1 = b_1$, replace $\phi$ by $\phi_{b_1}$, and now repeat the process with $x_2$, $x_3$, etc. After $2n$ calls to the decision algorithm we get an assignment $(b_1, ..., b_n)$ satisfying an $Opt$ fraction of the constraints.

b) When you call the $.9$-factor decision algorithm with $\alpha$, a `yes' answer implies the optimum is between $.9\alpha$ and $\alpha$, while a `no' answer implies the optimum is less than $\alpha$. By searching, you can find a value $\beta$ such that $.9\beta \leq Opt \leq \beta$. The hope would be to use the same reduction as before to find an assignment with value in this range. Let's assume we decide to fix the variables in some order with no backtracking -- if we were to use backtracking, there's no way we could guarantee polynomial time. For notational simplicity, assume that as before we first try fixing $x_1$. We now might as well try running the $.9$-factor decision algorithm on $(\phi_0, \alpha)$ and $(\phi_1, \alpha)$ for all possible $\alpha$.

It is perfectly possible that $Opt = .9\beta$, every assignment achieving $.9\beta$ requires $x_1 = 0$, but there exists an assignment with $x_1 = 1$ satisfying $.9\beta m - 1$ clauses. Further, in this case, it might hold that the decision algorithm says `no' to $(\phi_0, \alpha)$ for all $\alpha > .9\beta$ whereas it says `yes' to $(\phi_1, \alpha)$ for all $\alpha < .99\beta$. Now how could we fail to make the wrong choice, $x_1 = 1$?

Thursday, January 17, 2008

Lecture 2: Algorithms and gaps for Set-Cover and Coverage

Questions/etc. about the lecture can go here.

I will discuss integrality gap instances for Set-Cover in a later post. I may also clarify the randomized rounding analysis for Max-Coverage. (Or -- it would be nice if someone else in class wanted to post something on this :)

For now, I'll quickly give you the detailed probabilistic analysis of randomized rounding for Set-Cover.

Recall that if we repeat the randomized rounding step $t$ times, the expected overall cost is at most $t \cdot$ Opt. (I think I said it exactly equals $t \cdot$ Opt in class but this is not quite right because you could pick the same set more than once in different rounding stages -- and you only have to pay for it once.) We will select $t = \ln n + C \ln \ln n$ for some large constant $C$.

We now observe that from Markov's inequality, the probability our output costs more than $(1 + 2/\ln n) \cdot t \cdot$ Opt is at most $1/(1 + 2/\ln n)$ which is about $1 - 2/\ln n$. Rigorously, it's less than $1 - 1/\ln n$. Multiplying out the $(1 + 1/\ln n) \cdot t$, we conclude:

(*) Pr[costs at most $(\ln n + O(\ln \ln n)) \cdot$ Opt] $\geq 1/\ln n$.

Now on the other hand, as we saw, after our $t$ rounds, the probability a particular $e$ is uncovered is at most $\exp(-t) = 1/(n \ln^C$ $n)$. Union-bounding over all elements, we get:

(**) Pr[solution is invalid] $\leq 1/\ln^C$ $n$.

Taking $C$ large (even $C = 2$ is okay, I guess) and combining (*) and (**) we conclude:

(***) Pr[valid solution of cost at most $(\ln n + O(\ln \ln n)) \cdot$ Opt] $\geq 1 / \ln n - 1/\ln^2$ $n \geq \Omega(1/\ln n)$.

We may now use Problem 1a on Homework 1.

Max-Coverage vs. Min-Set-Cover

A quick note to say that these problems are very closely related. Set-Cover is of course more famous, but in quite a few cases the analysis and understanding of Max-Coverage is simpler. In the lectures I'll actually switch between them to make my points.

Formally, there's a straightforward reduction from Min-Set-Cover to Max-Coverage. (A reduction in the opposite direction is not obvious to me: can anybody think of one?) I was going to make this a homework problem but then I forgot.

Specifically:

Proposition: Suppose there is a 1 vs. 1 - $\gamma$ search algorithm for Max-Coverage. Then there is a $\lceil \log_{(1/\gamma)}$ $n \rceil$ factor search algorithm for Min-Set-Cover.

Remark: So if we got our 1 vs. 1 - 1/e algorithm for Max-Coverage via Greedy and somehow failed to notice the ln n algorithm for Min-Set-Cover, this reduction would give it. Additionally, if we prove ln n hardness for Min-Set-Cover, this reduction automatically gives 1 vs. 1 - 1/e hardness for the search version of Max-Coverage. Unfortunately, it's a little easier to prove hardness for Max-Coverage, and that's what we'll see in next Tuesday's class. (To compensate a tiny bit, we'll at least prove the 1 vs. 1 - 1/e decision version of Max-Coverage is hard.)

Proof: Suppose we have a Min-Set-Cover instance with optimum Opt. Since 1 $\leq$ Opt $\leq n$, we can "guess" Opt's value in polynomial time. [Exercise: at the end of the proof, explain this rigorously.] Now run the 1 vs. 1 - $\gamma$ search algorithm for Max-Coverage with "m" = Opt. By assumption, this Max-Coverage instance has value 1, so we will get a collection of Opt sets which leave less than a $\gamma$ fraction of ground elements uncovered.

"Delete" all the covered elements from the instance, simplifying it. Now there are still (at most) Opt sets which cover all the uncovered elements. So we can run the Max-Coverage instance again and get Opt more sets which reduce the fraction of uncovereds down to $\gamma^2$.

Ergo etc., as they used to say. If we do $t$ iterations of this we will have chosen $t \cdot$ Opt sets and we can stop once $\gamma^t \lt 1/n$; i.e., $t \gt \log_{(1/\gamma)}$ $n$.

Wednesday, January 16, 2008

Min-R^2-TSP

Did anybody catch why the Min-$\mathbb{R}^2$-TSP problem is not known to be in NP?

Monday, January 14, 2008

Lecture 1: Definitions; greedy algorithm for Set-Cover & Max-Coverage

Any questions or comments about Lecture 1 can go here.

I also wanted to include just a little bit of my own opinion on why studying approximation algorithms is worthwhile.

The "usual" motivation given is the following: In practice we are faced with NP-hard problems, and we nevertheless want to do something besides just give up. This is definitely true. The motivation suggests that running an approximation algorithm is a good idea -- they have worst-case performance guarantees! This is partly true, but it's not the whole story.

As an example, at one point in late 2001, the best known approximation algorithm for "Uncapacitated Metric Facility Location" (sounds complicated, but it's actually a very natural problem studied in industry; we'll discuss it later in this class) was a 1.73-factor approximation [Guha-Charikar]. It used an unholy combination of linear programming, primal-dual methods, and greedy methods. It's doubtful anyone ever ran it. At the same time, there was a relatively simple, quadratic time greedy algorithm achieving a 1.86-factor approximation [Mahdian-Markakis-Saberi-V.Vazirani]. One would be hard-pressed to say that the 1.73-factor algorithm was a better heuristic in practice for the problem. (The current best known algorithm is 1.52-factor [Mahdian-Ye-Zhang] and is combinatorial.)

On the other hand, take Set Cover. Even though we know that the Greedy algorithm only achieves somewhat bad performance -- a ln n factor -- we would and do run it anyway.

So breakthroughs in the analysis of a problem's approximability don't necessarily help you out at all in practice.

(I should emphasize, though, that sometimes they do: for example, the Goemans-Williamson .878-factor Max-Cut algorithm has had a huge impact on practice; not because it is a .878-factor approximation algorithm but because it gave powerful evidence in favor of an algorithmic technique (semidefinite programming) which is today a key component in practical Max-Cut algorithms.)

Regardless of the "heuristics for practice" motivation, there are additional reasons to study approximability:

1. It helps you understand "hard instances"; for example, what properties of Set-Cover instances make them hard? What properties make them easy? Studying approximation algorithms for the problem usually reveals this and helps you design algorithm for special cases. See also Luca Trevisan's opinion on the whys of approximation algorithms.

2. It tells you something about P vs. NP. (This is my person reason for studying approximability.) Take Max-Cut for example. We know that .878-approximating it is in P. We also know [Håstad-Trevisan-Sorkin-Sudan-Williamson] that .942-approximating it is NP-hard. What about the algorithmic problem of .9-approximating Max-Cut: is it in P or is it NP-hard? No one knows. In fact, experts have contradictory guesses. And this is for Max-Cut, which I'd argue is the simplest possible NP optimization problem. How can such a simple problem evade being classified as being in P or NP-hard? I find this to be an even more galling situation than the unclear status of Factoring (which most people at least guess is not in P) and Graph-Isomorphism (which most people at least guess is in P).