Monday, January 21, 2008

LP integrality gaps for Min-Set-Cover

As promised, herein I will discuss "LP integrality gap instances" for Set-Cover. Recall that these are instances $\mathcal{I}$ for which $Lp(\mathcal{I})$ is small but $Opt(\mathcal{I})$ is large.

Let's start with one that looks quite similar to the LP+Randomized-Rounding algorithmic gap we saw in class:

Ground elements: all nonzero elements of $\{0,1\}^q$ -- thought of as $\mathbb{F}_2^q$, the $q$-dimensional vector space over the field of size 2. There are $n = 2^q - 1$ ground elements.

Sets: For each $\alpha \in \mathbb{F}_2^q$ we have a set $S_\alpha = \{ e \in \mathbb{F}_2^q \setminus \{0\} : \alpha \cdot e = 1\}$. Here $\cdot$ denotes the usual dot product in $\mathbb{F}_2^q$: $\alpha_1 e_1 + ... + \alpha_q e_q$ (mod 2). There are $M = 2^q$ sets. (The set $S_0$ is kind of pathetic, being the empty set, but never mind.)

Analysis: Each ground element is contained in exactly half of the sets. (Standard undergrad probability exercise.) It follows that in an LP solution, if we take each set fractionally to the extent $2/M$, then each element is fractionally covered to the extent $1$.

Hence $Lp(\mathcal{I}) \leq M (2/M) = 2$. (In fact, $Lp$ is less than this...)

On the other hand, we claim $Opt(\mathcal{I}) \geq q$. Suppose by way of contradiction that there are some $q-1$ sets $S_{\alpha_1}, ..., S_{\alpha_{q-1}}$ covering all ground elements. So

$S_{\alpha_1} \cup ... \cup S_{\alpha_{q-1}} = \mathbb{F}_2^q \setminus \{0\}$

$\Leftrightarrow \bigcap_{i=1}^q \overline{S_{\alpha_i}} = \emptyset$

$\Leftrightarrow \bigcap_{i=1}^q \{e \in \mathbb{F}_2^q : \alpha_i \cdot e = 0 \} = \{0\}$.

But each set in the intersection above is a hyperplane in $\mathbb{F}_2^q$. This gives the contradiction, because in dimension $q$, you can't have $q-1$ hyperplanes intersecting on just the point $0$.

Conclusion: This instance is a $2$ vs. $\log_2$ $(n+1)$ integrality gap for Set-Cover. This yields a ratio of $\Omega(\ln n)$, but not the full $(1-\epsilon) \ln n$ we've come to expect.

--------------------

Comparing with the way things have gone before, I bet you think I'm going to now tell you a straightforward way to extend this to get the full $(1 - \epsilon) \ln n$ gap, perhaps by looking at an instance with ground elements $\mathbb{F}_k^n$.

Well, I'm here to tell you that I don't know how to do this. Quite likely it's possible, but I couldn't see how. I'd be very very interested if one of you could give such an example.

--------------------

So is there another kind of integrality gap instance? Or perhaps $Lp$ and $Opt$ can never be off by more than a $(\log_2 n)/2$ factor...

Since Feige proved that the $(1-\eps) \ln n$ decision problem is hard, we know that the former must be the case. (Otherwise, the efficient algorithm of computing and outputting $Lp$ would contradict his theorem.) Here is one way to show it:


Fix an abstract set of ground elements, $\Omega$, of cardinality $n$. Introduce sets $S_1, ..., S_M$ by picking them randomly (and indendently). For each $S_i$, we choose it by including each element of $\Omega$ with probability $1/k$. We will take $M = (C k^2) \log n$ for a large $C$ to be named later.

Analysis: Each element is expected to be contained in about a $1/k$ fraction of the sets. If we take $C$ large enough, then a Chernoff bound + union bound will imply that with high probability, every element is contained in at least a $(1/k)(1 - \epsilon)$ fraction of sets. (Here we can make $\epsilon$ as small as we like by making $C$ large.) Thus we can get a valid fractional solution to the LP by taking each set to the extent $(k/M)/(1-\epsilon)$; this solution has value $k/(1-\epsilon)$. Summarizing: with high probability, $Lp \leq k/(1 - \epsilon)$.

We now would like to show that with high probability, $Opt \geq t := (1-\epsilon) k \ln n$. This will give us an $Opt/Lp$ integrality gap of $(1-\epsilon)^2 \ln n$, which is good enough.

To see this, we fix any set of $t$ out of $M$ set indices and show that the probability these sets end up being a valid cover is very small. If the probability is significantly smaller even than $1/{M \choose t}$ then we can union bound over all possible size-$t$ covers and be done.

Now each element $e$ has a $1-1/k$ chance of being uncovered by a given set, and thus our fixed $t$ sets have a $(1 - 1/k)^t$ chance of leaving $e$ uncovered. With our choice of $t$, this quantity is essentially $\exp(-(1-\epsilon) \ln n) = 1/n^{1-\epsilon}$ (slight cheat here as the inequality goes the wrong way, but one can fix this easily). The events that the $t$ sets cover each element are actually independent across elements. So the probability that all elements are covered is at most $(1 - 1/n^{1-\epsilon})^n \leq \exp(-n^{\epsilon})$. This is indeed much less than $1/{M \choose t}$, since this latter quantity only is of the order $1/(\log n)^{O(\log n)}$ (assuming $k$ and $\epsilon$ are "constants"), much smaller than $\exp(-n^{\epsilon})$.

2 comments:

Suresh said...

Does the integrality gap depend on the nature of LP formulation. For example if we have some alternative LP formulation for set cover problem, do we end up getting different integrality gap.

Ryan O'Donnell said...

Yes, it does depend on the LP formulation.

Note, though, that since we know NP-hardness of deciding Set-Cover to within a factor of (1-eps) ln n, there cannot be any poly-size LP formulation for Set-Cover with integrality gap less than (1-eps) ln n. (Unless P = NP!)

[Actually, we only know this hardness subject to NP not in DTIME(n^{O(log log n)}), so "unless NP in DTIME(n^{O(log log n)}}".]