Generalized CNF for "At Least k, given n variables"

| ⌛ 5 minutes read

📋 Tags: Essay Math A.I. CS3243


I built this formula in preparation for CS3243 finals, but on hindsight it was overkill and the exam didn’t even need this… 😢

But, in the spirit of sharing, here’s my questionable formula and the method I obtained this.

General CNF at least k, n variables

For $n$ variables, $X = \set{x_1, x_2, …, x_n}$. Arrange them into combinations of k variables.
Denote the set of k-variable combinations, $C = \set{c_1, c_2, …}$, or more formally in the following footnote^1.
Each element in C is a k-combination of X in DNF, so:

  • $|C| = \binom{n}{k}$
  • $\forall c \in C, |c| = k$
  • $c_1 = (x_1\lor x_2\lor …\lor x_k)$ //note: a combination of k variables

Then, the CNF of “at least k” is:

> $c_1 \land c_2 \land … \land c_{n-k+1}$
i.e.
> $(x_1 \lor x_2 \lor x_3 … \lor x_{n-k+1}) \land (x_1 \lor x_2 \lor x_4 … \lor x_{n-k+1}) \land …$

Example: “at least 2, 4 variables”
We start with the ‘standard’ formulation:

$(x_1 \land x_2) \lor (x_1 \land x_3) \lor (x_1 \land x_4) \lor (x_2 \land x_3) \lor (x_2 \land x_4)\lor (x_3 \land x_4)$

The CNF form has 4C3 = 4 clauses. Applying the general form formula…

$(x_1 \lor x_2 \lor x_3) \land (x_1 \lor x_2 \lor x_4) \land (x_1 \lor x_3 \lor x_4) \land (x_2 \lor x_3 \lor x_4)$

“Proof”

We start with the “general form” for the “at least k” constraint.
Let’s assume we have 3 variables.

> At least 1 == $x_1 \lor x_2 \lor x_3$
> At least 2 == $ (x_1 \land x_2) \lor (x_1 \land x_3) \lor (x_2 \land x_3)$

See the pattern? At least 2, given 3 variables is the disjunction of combinations of 3 variables.
Ok. That’s a mouthful, but I hope you get it.

You can prove this pattern to yourself by using 4 variables, but I do not recommend that unless you like the tedium of enumerating combinations.

The issue now is the tedium of converting DNF to CNF.

Since we don’t have implications (->) and NOTs (!) to worry about, we can simply2 just distribute the terms and regroup them into CNF. See this stack exchange post for an example of distributing over many terms.

theWe can try and inductively (weak) figure out if there is a pattern.

We just need to do the ‘simple’ task of enumerating ALL the CNFs from their standard form and see if there is a pattern…

k n Standard form CNF
2 3 (X₁∧X₂)∨(X₁∧X₃)∨(X₂∧X₃) (X₁∨X₂)∧(X₁∨X₃)∧(X₂∨X₃)

^ Now, do that for sequential k and n!

Tedium by Wolframalpha

So, like all good engineers, we turn to Wolframalpha to do the heavy lifting for computational work.

Surprisingly, wolfram allows natural language input for boolean expressions and automatically shows CNF and DNF forms for any given expression.

Plugging in the standard form for k=2, n=4...

Neat!

After varying for multiple k and n, this is the pattern we see…

k n Standard form CNF
2 2 (x₁∧x₂) (x₁)∧(x₂)
2 3 (X₁∧X₂)∨(X₁∧X₃)∨(X₂∧X₃) (X₁∨X₂)∧(X₁∨X₃)∧(X₂∨X₃)
2 4 (x₁∧x₂)∨(x₁∧x₂)∨(x₁∧x₄)∨(x₂∧x₃)∨(x₂∧x₄)∨(x₃∧x₄) (x₁∨x₂∨x₃)∧(x₁∨x₂∨x₄)∧(x₁∨x₃∨x₄)∧(x₂∨x₃∨x₄)
3 3 (x₁∧x₂∧x₃) (x₁∧x₂∧x₃)
3 4 (x₁∧x₂∧x₃)∨(x₁∧x₂∧x₄)∨(x₁∧x₃∧x₄)∨(x₂∧x₃∧x₄) (x₁∨x₂)∧(x₁∨x₃)∧(x₁∨x₄)∧(x₂∨x₃)∧(x₂∨x₄)∧(x₃∨x₄)

As you can see, with higher k and n the number of clauses explodes, so the table omits3 certain values such as n > 4 and k > 3.

It should be somewhat obvious what the emerging pattern is.

Observe that for the standard form,

  1. Each conjunctive clause (e.g. (x₁∧x₂) ) is joined via disjunction to another clause
  2. Each clause is a unique combination of k vars out of n
    • By extension, this means that the |clauses| = $n \choose k$

For the CNF form,

  1. Each disjunctive clause (e.g. (X₁∨X₃)) is joined via conjunction to another clause
  2. Each clause is a unique combination if n-k+1 vars4 out of n.
    • By extension, this means that |clauses| = $n \choose {n-k+1}$

We can generalize this by defining a set of all unique combinations of clauses in disjunctive normal form and chaining them conjunctively.

This leads to the formula given above…

Where $c_i$ is an element in the set of all unique combinations of clauses, $C$,
At least k given n variables is:
> $c_1 \land c_2 \land … \land c_{n-k+1}$
i.e.
> $(x_1 \lor x_2 \lor x_3 … \lor x_{n-k+1}) \land (x_1 \lor x_2 \lor x_4 … \lor x_{n-k+1}) \land …$

Of course, this is a (pretty) weak induction. I am sure there is some way to hack out a formal proof using the law of distributivity and blah blah… but for what it’s worth, this is enough for most cases in CS3243 and perhaps small-bounded problems.

If I have more free time, perhaps I will try and hammer out a stronger induction case using a general case n, but for now this suffices (for me) for day-to-day use.

So… QED?


  1. $\binom{X}{k} := {A~:~|A|=k,~A\subseteq X}$, src ↩︎

  2. One does not simply distribute terms. Good luck doing it by hand! ↩︎

  3. I also do not intend to type out the ridiculous number of clauses that are generated in latex. It’s unreadable! If you aren’t convinced, the rows for n > 4 and k > 3 are left as an exercise to the reader 😉 ↩︎

  4. The term n-k+1 is not particularly obvious. But if you count the terms and cross-check with each row, it makes sense. ↩︎