15-854B: Advanced Approximation Algorithms: January 2008

Tuesday, January 29, 2008

Homework 2

Homework 2 is up on the course web page: I will also put a bunch of copies outside Nicole Stenger's office (Wean 4116) that you could pick up....

The bad example for the primal-dual algorithm that I cooked up in class did not give the large gap I was hoping for, though going in the right direction. Instead, consider the following example:

There are $2n$ clients and $n$ potential facilities. The first $n$ of the clients (say $v_1, v_2, \ldots, v_n$) are at distance 1 to all the facilities, we call these communal clients.

For the other $n$ clients: client $v_{n+i}$ is at distance $1$ to facility $u_i$ and at distance 3 to all other facilities. (We say client $v_{n+i}$ is facility $u_i$'s private client.)

The first facility $u_1$ costs $n+1$, and all the others cost $n+2$.

If we raise duals uniformly, then at time $t=2$, the first facility $u_1$ will become tentatively opened, and all the communal clients, and private client $u_{n+1}$ become frozen. But the other private clients will continue to raise their duals, and at time $t=3$, all the remaining facilities will also become tentatively open.

Note that the cost of actually opening all these tentatively open facilities would be $(n+1) + (n-1)(n+2) = \Omega(n^2)$, whereas the total dual value generated is only $O(n)$.

Hence the clean-up step is needed.

Facility Location -- no joke

Facility Location is one of the theory problems studied most extensively in practice. It has obvious appeal in industry and is the subject of numerous books and optimization heuristics.

I especially like how the special case of the "Fermat-Weber problem" in the plane was already studied for its use in practice (minimizing industrial transportation cost) in 1909.

Monday, January 28, 2008

Some more LP notes

Here is the chapter dealing with LP duality from the textbook Understanding and Using Linear Programming by Jiří Matoušek and Bernd Gärtner.

Two caveats: the file is somewhat large (4 megs), and is accessible only from cmu.edu web addresses.

Saturday, January 26, 2008

Notes on Linear Programming

You may want to brush up on your linear programming; LP duality will be important for tomorrow's lecture.

For a relatively quick (40 pages) summary of the topic I usually go to Goemans's lecture notes. However these do not include the Ellipsoid Algorithm, which is of great theoretical importance for showing linear programming (and extensions, e.g., semidefinite programming) is in P.

For more details, Avner Magen's lecture notes look pretty thorough.

Nina and Avrim's problem

Nina just told me a nice approximation algorithms problem she has written about with Avrim -- "Graph Vertex Pricing".

Stripping out the econ motivation, the problem is as follows:

Input: an undirected graph with nonnegative weights $w(u,v)$ on the edges.

Output: a nonnegative "price" $p(v)$ on each vertex.

Value: For each edge $(u,v)$ such that $p(u) + p(v) \leq w(u,v)$, you gain $p(u) + p(v)$. Otherwise you gain $0$ for that edge.

Determining $Opt$ is NP-hard, and it's also known [Guruswami-Hartline-Karlin-Kempe-Kenyon-McSherry] to be "APX-hard"; i.e., there is no PTAS unless P = NP.

Nina and Avrim have a paper giving a factor-$4$ approximation algorithm (among many other things). You might want to see if you can think of this algorithm yourself (hint: reduce to the bipartite case; lose a factor of 2 twice).

This seems to me like a very nice problem to work on -- improving either the algorithm or the hardness. One tricky aspect of it is that no one can think of a LP relaxation...

Thursday, January 24, 2008

You too can post to the blog

If you didn't sign up in the first class but want to post to the blog, just send us an email.

(Excellent usage of this feature by Jeremiah in clarifying homework problem 3, for example.)

Homework 1 problem 4b correction

This should say "... for all constant $c \lt 5/4$..." not $\gt$. Corrected in the .pdf on the course page. (Thanks Kanat.)

Homework 1 Clarification

For problem 3 we use Max-E3SAT-6. When it says that each variable appears in exactly 6 clauses does this mean that (x, not x) each appear 6 in six clauses or are they not considered separate variables?

Wednesday, January 23, 2008

Scribes and Math Writing

Please read the first few pages of the Knuth, Larrabee and Roberts book on Mathematical Writing [PDF] before starting to type up the scribe notes: the list of dos and don'ts is particularly useful to look over...

Tuesday, January 22, 2008

Lecture 3: 1 vs. 3/4 + eps hardness for Max-Coverage: now with bonus endgame calculations

Any questions about the lecture can go here. I'll also show how to finish off the calculations.

First, there was one small extra idea that I didn't have time to mention: It uses the fact that the function $1 - 2^{-t}$ is a concave function of $t$. Recall that the overall average number of suggestions per edges is $2$. We also know that edges with t total suggestions but no consistent suggestions have coverage at most $1 - 2^{-t}$. We want that they have overall coverage at most $1 - 2^{-2}$. We need to check that the Suggester can't "cheat" by achieving awesome coverage on some edges by "spending" a lot of suggestions, overcoming bad coverage on edges where few suggestions were spent.

This indeed follows from concavity. As an example, suppose the Suggester is labeling 2 edges inconsistently, and it has 4 labels to spend (average of 2, as needed). It could spend 2 on each and achieve avg{$1 - 2^{-2}$, $1 - 2^{-2}$) $= 3/4$. Or it could try spending 3 on one and 1 on the other. But this only achieves avg{$1 - 2^{-3}$, $1 - 2^{-1}$} = avg{7/8, 1/2} $\lt$ 3/4.

With that idea noted, we come to:

Formal endgame of proof:

Define an edge $(u,v)$ to be "frugally suggested" if $|Sugg(u,v)| \leq 10/\epsilon.$ Next, define an edge to be "good" if it is both frugally and consistently suggested.

Suppose the fraction of good edges $(u,v)$ is at least $\epsilon/20$. We argue that this implies E$[val_{\mathcal{G}}(f)] \geq \eta = : \epsilon^3/2000$, as desired. Supposing $(u,v)$ is good, it has consistent suggestions and hence there exist $a \in Sugg(u)$ and $\alpha \in Sugg(v)$ such that the labeling $(a, \alpha)$ satisfies the constraint on $(u,v)$ (that is, $\pi_{v \rightarrow u}(\alpha) = a$). Further, since $(u,v)$ is good, it is frugally suggested, so both $|Sugg(u)|$ and $|Sugg(v)|$ are at most $10/\epsilon$. Thus when $f$ is randomly chosen, there is at least an $(\epsilon/10)^2$ chance that it will get both $f(u) = a$ and $f(v) = \alpha$, thus satisfying $(u,v)$. Hence in expectation, $f$ satisfies at least an $(\epsilon/20)(\epsilon/10)^2 = \eta$ fraction of edges $(u,v)$, as desired.

It remains to show that indeed the fraction of good edges cannot be less than $\epsilon/20$. If this were the case, then for a randomly chosen edge $(u,v)$,

P$[(u,v)$ consistent suggestions$]$

$\leq P$[(u,v)$ good$] +$ P$[(u,v)$ not frugally sugg'd$]$

$\lt \epsilon/20 +$ P$[(u,v)$ more than $10/\epsilon$ sugg'd labels$]$

$\leq \epsilon/20 + 2(\epsilon/10) $

$ = \epsilon/4,$

where the second-to-last step used Markov's inequality based on the fact that the average number of suggested labels per edge is exactly $2$.

Even though the consistently suggested edges have perfect coverage, when their fraction is only $\epsilon/4$, the fact that the overall coverage is assumed to be at least $3/4 + \epsilon$ implies that the inconsistently suggested edges have average coverage at least $3/4 + (3/4)\epsilon$. Our claim implied that an inconsistently suggested edge with $t$ suggested labels has coverage at most $1 - 2^{-t}$. The average number of labels per inconsistently suggested edge is at most $2/(1 - \epsilon/4) \leq 2 + \epsilon$, where we again used that the fraction of consistently suggested edges is at most $\epsilon/4$. Thus by concavity of the function $1 - 2^{-t}$, the average coverage of ground elements on inconsistently suggested edges is at most

$1 - 2^{-(2 + \epsilon)} = 1 - \frac{1}{4} e^{-(\ln 2) \epsilon} \leq 1 - \frac{1}{4} (1 - (\ln 2)\epsilon) = 3/4 + .17\epsilon \lt 3/4 + (3/4)\epsilon$

a contradiction.

Monday, January 21, 2008

Prepare for Tuesday's class

Tomorrow we will cover $1$ vs. $3/4 + \epsilon$ hardness for Max-Coverage. (The extension to $1$ vs. $1 - 1/e + \epsilon$) is on your homework.) The class will be somewhat fast-paced, so please come prepared. Specifically, please review the definition of Max-Label-Cover from the problem list handout; also, solve (or at least, get started on) problems 3a, 3b, and 4c from the homework.

LP integrality gaps for Min-Set-Cover

As promised, herein I will discuss "LP integrality gap instances" for Set-Cover. Recall that these are instances $\mathcal{I}$ for which $Lp(\mathcal{I})$ is small but $Opt(\mathcal{I})$ is large.

Let's start with one that looks quite similar to the LP+Randomized-Rounding algorithmic gap we saw in class:

Ground elements: all nonzero elements of $\{0,1\}^q$ -- thought of as $\mathbb{F}_2^q$, the $q$-dimensional vector space over the field of size 2. There are $n = 2^q - 1$ ground elements.

Sets: For each $\alpha \in \mathbb{F}_2^q$ we have a set $S_\alpha = \{ e \in \mathbb{F}_2^q \setminus \{0\} : \alpha \cdot e = 1\}$. Here $\cdot$ denotes the usual dot product in $\mathbb{F}_2^q$: $\alpha_1 e_1 + ... + \alpha_q e_q$ (mod 2). There are $M = 2^q$ sets. (The set $S_0$ is kind of pathetic, being the empty set, but never mind.)

Analysis: Each ground element is contained in exactly half of the sets. (Standard undergrad probability exercise.) It follows that in an LP solution, if we take each set fractionally to the extent $2/M$, then each element is fractionally covered to the extent $1$.

Hence $Lp(\mathcal{I}) \leq M (2/M) = 2$. (In fact, $Lp$ is less than this...)

On the other hand, we claim $Opt(\mathcal{I}) \geq q$. Suppose by way of contradiction that there are some $q-1$ sets $S_{\alpha_1}, ..., S_{\alpha_{q-1}}$ covering all ground elements. So

$S_{\alpha_1} \cup ... \cup S_{\alpha_{q-1}} = \mathbb{F}_2^q \setminus \{0\}$

$\Leftrightarrow \bigcap_{i=1}^q \overline{S_{\alpha_i}} = \emptyset$

$\Leftrightarrow \bigcap_{i=1}^q \{e \in \mathbb{F}_2^q : \alpha_i \cdot e = 0 \} = \{0\}$.

But each set in the intersection above is a hyperplane in $\mathbb{F}_2^q$. This gives the contradiction, because in dimension $q$, you can't have $q-1$ hyperplanes intersecting on just the point $0$.

Conclusion: This instance is a $2$ vs. $\log_2$ $(n+1)$ integrality gap for Set-Cover. This yields a ratio of $\Omega(\ln n)$, but not the full $(1-\epsilon) \ln n$ we've come to expect.

--------------------

Comparing with the way things have gone before, I bet you think I'm going to now tell you a straightforward way to extend this to get the full $(1 - \epsilon) \ln n$ gap, perhaps by looking at an instance with ground elements $\mathbb{F}_k^n$.

Well, I'm here to tell you that I don't know how to do this. Quite likely it's possible, but I couldn't see how. I'd be very very interested if one of you could give such an example.

--------------------

So is there another kind of integrality gap instance? Or perhaps $Lp$ and $Opt$ can never be off by more than a $(\log_2 n)/2$ factor...

Since Feige proved that the $(1-\eps) \ln n$ decision problem is hard, we know that the former must be the case. (Otherwise, the efficient algorithm of computing and outputting $Lp$ would contradict his theorem.) Here is one way to show it:

Fix an abstract set of ground elements, $\Omega$, of cardinality $n$. Introduce sets $S_1, ..., S_M$ by picking them randomly (and indendently). For each $S_i$, we choose it by including each element of $\Omega$ with probability $1/k$. We will take $M = (C k^2) \log n$ for a large $C$ to be named later.

Analysis: Each element is expected to be contained in about a $1/k$ fraction of the sets. If we take $C$ large enough, then a Chernoff bound + union bound will imply that with high probability, every element is contained in at least a $(1/k)(1 - \epsilon)$ fraction of sets. (Here we can make $\epsilon$ as small as we like by making $C$ large.) Thus we can get a valid fractional solution to the LP by taking each set to the extent $(k/M)/(1-\epsilon)$; this solution has value $k/(1-\epsilon)$. Summarizing: with high probability, $Lp \leq k/(1 - \epsilon)$.

We now would like to show that with high probability, $Opt \geq t := (1-\epsilon) k \ln n$. This will give us an $Opt/Lp$ integrality gap of $(1-\epsilon)^2 \ln n$, which is good enough.

To see this, we fix any set of $t$ out of $M$ set indices and show that the probability these sets end up being a valid cover is very small. If the probability is significantly smaller even than $1/{M \choose t}$ then we can union bound over all possible size-$t$ covers and be done.

Now each element $e$ has a $1-1/k$ chance of being uncovered by a given set, and thus our fixed $t$ sets have a $(1 - 1/k)^t$ chance of leaving $e$ uncovered. With our choice of $t$, this quantity is essentially $\exp(-(1-\epsilon) \ln n) = 1/n^{1-\epsilon}$ (slight cheat here as the inequality goes the wrong way, but one can fix this easily). The events that the $t$ sets cover each element are actually independent across elements. So the probability that all elements are covered is at most $(1 - 1/n^{1-\epsilon})^n \leq \exp(-n^{\epsilon})$. This is indeed much less than $1/{M \choose t}$, since this latter quantity only is of the order $1/(\log n)^{O(\log n)}$ (assuming $k$ and $\epsilon$ are "constants"), much smaller than $\exp(-n^{\epsilon})$.

Lecture 1 and Lecture 2 scribe notes posted

On the course web page.

Sunday, January 20, 2008

Why decision and search are different for approximation problems

We don't usually think too much about the distinction between decision and search problems for exact NP-optimization problems. But the distinction is relevant when approximation is involved. Here is a problem & solution excised from Homework 1, illustrated the distinction:

1. Decision to search.

a) Suppose the exact decision version of Max-$3$Sat is in P i.e., for any $c$, the $c$ vs. $c$ decision problem is in P. Show that the exact search version of Max-$3$Sat is also in P; i.e., there is a polynomial time algorithm which, given a $3$Sat formula $\phi$, finds an assignment satisfying a maximal number of clauses.

b) Explain the difficulty in showing an analogous reduction from a $.9$-factor decision algorithm (i.e., $c$ vs. $.9c$ decision, for any $c$) to a $.9$-factor search algorithm.

Solution.

a) If $\phi$ has $m$ clauses, it's easy to use binary search to identify the value of the optimum using $\log_2 m$ calls to the decision algorithm. (Or, one could be naive and simply use $m$ calls.) Having found $Opt$, we now find an optimum solution as follows: Run the decision algorithm on $(\phi_0, Opt)$ and $(\phi_1, Opt)$, where $\phi_b$ denotes the $3$Sat formula gotten by fixing $x_1$ to $b$ ($b = 0, 1$) and simplifying. At least one of these must answer `yes'; say $b_1$. Fix $x_1 = b_1$, replace $\phi$ by $\phi_{b_1}$, and now repeat the process with $x_2$, $x_3$, etc. After $2n$ calls to the decision algorithm we get an assignment $(b_1, ..., b_n)$ satisfying an $Opt$ fraction of the constraints.

b) When you call the $.9$-factor decision algorithm with $\alpha$, a `yes' answer implies the optimum is between $.9\alpha$ and $\alpha$, while a `no' answer implies the optimum is less than $\alpha$. By searching, you can find a value $\beta$ such that $.9\beta \leq Opt \leq \beta$. The hope would be to use the same reduction as before to find an assignment with value in this range. Let's assume we decide to fix the variables in some order with no backtracking -- if we were to use backtracking, there's no way we could guarantee polynomial time. For notational simplicity, assume that as before we first try fixing $x_1$. We now might as well try running the $.9$-factor decision algorithm on $(\phi_0, \alpha)$ and $(\phi_1, \alpha)$ for all possible $\alpha$.

It is perfectly possible that $Opt = .9\beta$, every assignment achieving $.9\beta$ requires $x_1 = 0$, but there exists an assignment with $x_1 = 1$ satisfying $.9\beta m - 1$ clauses. Further, in this case, it might hold that the decision algorithm says `no' to $(\phi_0, \alpha)$ for all $\alpha > .9\beta$ whereas it says `yes' to $(\phi_1, \alpha)$ for all $\alpha < .99\beta$. Now how could we fail to make the wrong choice, $x_1 = 1$?

Thursday, January 17, 2008

Lecture 2: Algorithms and gaps for Set-Cover and Coverage

Questions/etc. about the lecture can go here.

I will discuss integrality gap instances for Set-Cover in a later post. I may also clarify the randomized rounding analysis for Max-Coverage. (Or -- it would be nice if someone else in class wanted to post something on this :)

For now, I'll quickly give you the detailed probabilistic analysis of randomized rounding for Set-Cover.

Recall that if we repeat the randomized rounding step $t$ times, the expected overall cost is at most $t \cdot$ Opt. (I think I said it exactly equals $t \cdot$ Opt in class but this is not quite right because you could pick the same set more than once in different rounding stages -- and you only have to pay for it once.) We will select $t = \ln n + C \ln \ln n$ for some large constant $C$.

We now observe that from Markov's inequality, the probability our output costs more than $(1 + 2/\ln n) \cdot t \cdot$ Opt is at most $1/(1 + 2/\ln n)$ which is about $1 - 2/\ln n$. Rigorously, it's less than $1 - 1/\ln n$. Multiplying out the $(1 + 1/\ln n) \cdot t$, we conclude:

(*) Pr[costs at most $(\ln n + O(\ln \ln n)) \cdot$ Opt] $\geq 1/\ln n$.

Now on the other hand, as we saw, after our $t$ rounds, the probability a particular $e$ is uncovered is at most $\exp(-t) = 1/(n \ln^C$ $n)$. Union-bounding over all elements, we get:

(**) Pr[solution is invalid] $\leq 1/\ln^C$ $n$.

Taking $C$ large (even $C = 2$ is okay, I guess) and combining (*) and (**) we conclude:

(***) Pr[valid solution of cost at most $(\ln n + O(\ln \ln n)) \cdot$ Opt] $\geq 1 / \ln n - 1/\ln^2$ $n \geq \Omega(1/\ln n)$.

We may now use Problem 1a on Homework 1.

Max-Coverage vs. Min-Set-Cover

A quick note to say that these problems are very closely related. Set-Cover is of course more famous, but in quite a few cases the analysis and understanding of Max-Coverage is simpler. In the lectures I'll actually switch between them to make my points.

Formally, there's a straightforward reduction from Min-Set-Cover to Max-Coverage. (A reduction in the opposite direction is not obvious to me: can anybody think of one?) I was going to make this a homework problem but then I forgot.

Specifically:

Proposition: Suppose there is a 1 vs. 1 - $\gamma$ search algorithm for Max-Coverage. Then there is a $\lceil \log_{(1/\gamma)}$ $n \rceil$ factor search algorithm for Min-Set-Cover.

Remark: So if we got our 1 vs. 1 - 1/e algorithm for Max-Coverage via Greedy and somehow failed to notice the ln n algorithm for Min-Set-Cover, this reduction would give it. Additionally, if we prove ln n hardness for Min-Set-Cover, this reduction automatically gives 1 vs. 1 - 1/e hardness for the search version of Max-Coverage. Unfortunately, it's a little easier to prove hardness for Max-Coverage, and that's what we'll see in next Tuesday's class. (To compensate a tiny bit, we'll at least prove the 1 vs. 1 - 1/e decision version of Max-Coverage is hard.)

Proof: Suppose we have a Min-Set-Cover instance with optimum Opt. Since 1 $\leq$ Opt $\leq n$, we can "guess" Opt's value in polynomial time. [Exercise: at the end of the proof, explain this rigorously.] Now run the 1 vs. 1 - $\gamma$ search algorithm for Max-Coverage with "m" = Opt. By assumption, this Max-Coverage instance has value 1, so we will get a collection of Opt sets which leave less than a $\gamma$ fraction of ground elements uncovered.

"Delete" all the covered elements from the instance, simplifying it. Now there are still (at most) Opt sets which cover all the uncovered elements. So we can run the Max-Coverage instance again and get Opt more sets which reduce the fraction of uncovereds down to $\gamma^2$.

Ergo etc., as they used to say. If we do $t$ iterations of this we will have chosen $t \cdot$ Opt sets and we can stop once $\gamma^t \lt 1/n$; i.e., $t \gt \log_{(1/\gamma)}$ $n$.

Wednesday, January 16, 2008

Min-R^2-TSP

Did anybody catch why the Min-$\mathbb{R}^2$-TSP problem is not known to be in NP?

Monday, January 14, 2008

Lecture 1: Definitions; greedy algorithm for Set-Cover & Max-Coverage

Any questions or comments about Lecture 1 can go here.

I also wanted to include just a little bit of my own opinion on why studying approximation algorithms is worthwhile.

The "usual" motivation given is the following: In practice we are faced with NP-hard problems, and we nevertheless want to do something besides just give up. This is definitely true. The motivation suggests that running an approximation algorithm is a good idea -- they have worst-case performance guarantees! This is partly true, but it's not the whole story.

As an example, at one point in late 2001, the best known approximation algorithm for "Uncapacitated Metric Facility Location" (sounds complicated, but it's actually a very natural problem studied in industry; we'll discuss it later in this class) was a 1.73-factor approximation [Guha-Charikar]. It used an unholy combination of linear programming, primal-dual methods, and greedy methods. It's doubtful anyone ever ran it. At the same time, there was a relatively simple, quadratic time greedy algorithm achieving a 1.86-factor approximation [Mahdian-Markakis-Saberi-V.Vazirani]. One would be hard-pressed to say that the 1.73-factor algorithm was a better heuristic in practice for the problem. (The current best known algorithm is 1.52-factor [Mahdian-Ye-Zhang] and is combinatorial.)

On the other hand, take Set Cover. Even though we know that the Greedy algorithm only achieves somewhat bad performance -- a ln n factor -- we would and do run it anyway.

So breakthroughs in the analysis of a problem's approximability don't necessarily help you out at all in practice.

(I should emphasize, though, that sometimes they do: for example, the Goemans-Williamson .878-factor Max-Cut algorithm has had a huge impact on practice; not because it is a .878-factor approximation algorithm but because it gave powerful evidence in favor of an algorithmic technique (semidefinite programming) which is today a key component in practical Max-Cut algorithms.)

Regardless of the "heuristics for practice" motivation, there are additional reasons to study approximability:

1. It helps you understand "hard instances"; for example, what properties of Set-Cover instances make them hard? What properties make them easy? Studying approximation algorithms for the problem usually reveals this and helps you design algorithm for special cases. See also Luca Trevisan's opinion on the whys of approximation algorithms.

2. It tells you something about P vs. NP. (This is my person reason for studying approximability.) Take Max-Cut for example. We know that .878-approximating it is in P. We also know [Håstad-Trevisan-Sorkin-Sudan-Williamson] that .942-approximating it is NP-hard. What about the algorithmic problem of .9-approximating Max-Cut: is it in P or is it NP-hard? No one knows. In fact, experts have contradictory guesses. And this is for Max-Cut, which I'd argue is the simplest possible NP optimization problem. How can such a simple problem evade being classified as being in P or NP-hard? I find this to be an even more galling situation than the unclear status of Factoring (which most people at least guess is not in P) and Graph-Isomorphism (which most people at least guess is in P).

15-854B: Advanced Approximation Algorithms