Notes on Mathematical and Structural Induction

Mathematical induction

The classical rule (cf. page 51) is

   M(1)     for all n in N: M(n) -> M(n+1)
   ---------------------------------------
           for all n in N: M(n)

Here N is the set of natural numbers: {1,2,3,4,....}

Example (cf. page 51): We want to prove M(n) where

   M(n): 1 + ... + n = n(n+1)/2

The proof is as follows:

Base case

We must establish M(1), that is

   1 = 1(1+1)/2

which is trivial.

Inductive step

We can assume M(n), and must establish M(n+1) which follows from the calculation

     1 + ... + n+1 
   = 1 + ... + n + n+1
   = n(n+1)/2 + n+1      since M(n) holds by our induction hypothesis
   = (n(n+1) + 2(n+1))/2
   = (n+1)(n+2)/2

Another starting point

In some cases, it might be convenient to start the induction at zero:

  M(0) and for all n >= 0: M(n) -> M(n+1)
  ------------------------------------------
          for all n >= 0: M(n)

or by some other value.

Course-of-values induction

The rule is (cf. page 54)

  for all n in N: M(1) /\ ... /\ M(n-1) -> M(n)
  ---------------------------------------------
          for all n in N: M(n)

Note that for n = 1, the conjunction in the premise is empty, so we can rewrite the rule as

  M(1) and for all n >= 2: M(1) /\ ... /\ M(n-1) -> M(n)
  ------------------------------------------------------
          for all n in N: M(n)

Example: Consider the Fibonacci numbers given by

  fib(1) = 1
  fib(2) = 1
  fib(n) = fib(n-1) + fib(n-2)    for n >= 3

Theorem: If n is divisible by 3, then fib(n) is even, otherwise fib(n) is odd.

Proof: We shall employ course-of-values induction; there is thus no base step but "only" the inductive step for which we have to do a case analysis. In all cases, we can assume that the result hold for all m < n.

n = 1 or n = 2: These are not divisible by 3, and accordingly fib(n) = 1 which is odd.
n is divisible by 3: Then neither n-1 nor n-2 is divisible by 3. Our induction hypothesis thus tells us that fib(n-1) and fib(n-2) are both odd. As fib(n) = fib(n-1) + fib(n-2), this implies that fib(n) is even, as desired.
n > 3 but n is not divisible by 3: Then exactly one of n-1 and n-2 is divisible by 3. Our induction hypothesis then tells us that exactly one of fib(n-1) and fib(n-2) is even. As fib(n) = fib(n-1) + fib(n-2), this implies that fib(n) is odd, as desired.

Note that the last two steps couldn't have been carried out using the original principle of induction, where we in order to establish M(n) can assume only M(n-1) but not M(n-2).

Equivalence of definitions

It may seem suspicious that there apparently are so many definitions of induction. Fortunately, they are all in some sense equivalent, as illustrated by the two theorems below (which may safely be skipped by the non-curious reader).

Theorem: Mathematical induction implies Course-of-values induction.

Proof: Given some property M, and assume that for all n in N

  M(1) /\  ... /\ M(n-1) -> M(n).

Our goal is to prove, using the principle of mathematical induction, that M(n) holds for all n in N. For that purpose, we define a new property M':

  M'(n) : M(1) /\ ... /\ M(n).

As M(1) clearly holds, we have

  M'(1).

Also, it is easy to see that for n in N,

  M'(n) = M(1) /\ ... /\ M(n) -> M(1) /\ ... /\ M(n) /\ M(n+1) = M'(n+1).

So by the principle of mathematical induction, M'(n) holds for all n in N. But then also M(n) holds for all n in N, as desired. This concludes the proof.

Theorem: Course-of-values induction implies Mathematical induction.

Proof: Given some property M, and assume that

  M(1) and for all n in N: M(n) -> M(n+1)

Our goal is to prove, using the principle of course-of-values induction, that M(n) holds for all n in N. But since for n > 1 we have M(n-1) -> M(n) it in fact holds for all n in N that

  M(1) /\ ... /\ M(n-1) -> M(n).

So by the principle of course-of-values induction, M(n) holds for all n in N. This is as desired, and thus concludes the proof.

Lists

A very important datastructure is Lists. They are given by the BNF grammar (another example of a BNF grammer is given on page 40)

  lst ::= nil
       |  (c :: lst)

where c can be arbitrary (though we shall often think of c as being an integer). That is, a list is either empty (nil) or an element c in front of a list.

Example: Consider a list with the elements 5,7,4 (note that the order matters). This list is in our syntax written as

   (5 :: (7 :: (4 :: nil)))

or graphically

In the spirit of Definition 1.31 (page 55), we define a function computing the height of a list:

  height(nil) = 1

  height(c :: x) = 1 + height(x)

The Append function

This very useful function, taking two lists as arguments and returning their concatenation (also a list), is given by the following recursive definition

  append    nil   y = y

  append (c :: x) y = (c :: (append x y))

Example:

    append (5 :: (7 :: nil))  (8 :: (4 :: nil))
  = (5 :: (append (7 :: nil)  (8 :: (4 :: nil))))
  = (5 :: (7 :: (append  nil  (8 :: (4 :: nil)))))
  = (5 :: (7 :: (8 :: (4 :: nil))))

Structural Induction for Lists

This is an extremely crucial proof principle for computer science.

  M(nil)   for all lists x: M(x) -> M((c :: x))
  ---------------------------------------------
     for all lists x: M(x)

So in order to show that a property holds for all lists, you must show that it holds for the empty list, and that the property holds for a non-empty list provided it holds for its "tail".
As an example, we shall prove the following (not too surprising)

Theorem: For all x, we have M(x) where

  M(x): append x nil = x

Proof: Structural induction.

The base case

We must establish M(nil), that is

  append nil nil = nil

which is trivial from the definition of append.

The inductive step

We can assume M(x), and must establish M((c :: x)) which follows from the calculation

     append (c :: x) nil
   = (c :: (append x nil))    definition of append
   = (c :: x)                 the induction hypothesis

We can also prove that the append function is associative:

Theorem: For all x, we have M(x) where

  M(x): append (append x y) z = append x (append y z)

Proof: Structural induction.

The base case

We must establish M(nil), that is

  append (append nil y) z = append nil (append y z)

which is trivial since both LHS and RHS reduce to

  append y z.

The inductive step

We can assume M(x), and must establish M((c :: x)) which follows from the calculation

     append (append (c :: x) y) z
  =  append (c :: (append x y)) z    definition of append
  =  (c :: (append (append x y) z))  definition of append
  =  (c :: (append x (append y z)))  induction hypothesis
  =  append (c :: x) (append y z)    definition of append

Equivalence of definitions

Again, it may seem suspicious that we have now introduced yet another concept of induction. Fortunately, the two theorems below (which may safely be skipped by the non-curious reader) show that it is in fact in some sense equivalent to the previous one.

Theorem: Mathematical induction implies Structural induction.

Proof: Given some property M on lists, and assume that

  M(nil)  and for all lists x: M(x) -> M(c :: x)

Our goal is to prove, using the principle of mathematical induction, that M(x) holds for all lists x. For that purpose, we define a new property M' on natural numbers:

  M'(n): for all lists x with height(x) = n, M(x) holds

As nil is the only list with height 1, we have

  M'(1).

Now assume that M'(n). Let y be a list with height(y) = n+1. We can clearly write y on the form (c :: x) with height(x) = n. From M'(n) we infer that M(x) holds, and from our assumption we conclude M(c :: x), that is M(y). As y was arbitrary with height(y) = n+1, this shows that M'(n+1). We have thus established

  M'(n) -> M'(n+1).

Mathematical induction thus enables us to infer that M'(n) holds for all n in N. As all lists have a height which is a natural number, this shows that M(x) holds for all lists x. This concludes the proof.

Theorem: Structural induction implies Mathematical induction.

Proof: Given some property M on natural numbers, and assume that

  M(1)  and  for all n in N: M(n) -> M(n+1)

Our goal is to prove, using the principle of structural induction, that M(n) holds for all n in N. For that purpose, we define a new property M' on lists:

  M'(x) = M(height(x))

Since height(nil) = 1, we have

  M'(nil).

Now assume that we have M'(x), that is M(height(x)). By our assumption, this implies M(height(x) + 1), which is the same as M(height(c :: x)). But this shows that M'(c :: x) holds. We have thus established

  M'(x) -> M'(c :: x).

Structural induction thus enables us to infer that M'(x) holds for all lists x. Since for all n in N there exists a list x with height(x) = n, this shows that M(n) holds for all n in N. This concludes the proof.

Structural Induction for Trees

Well-formed formulas are particular examples of trees. They are given by the BNF grammar (cf. page 40)

  phi ::= p
       |  (~ phi)
       |  (phi /\ phi)
       |  (phi \/ phi)
       |  (phi -> phi)

where we can use Convention 1.3 (page 5) to eliminate some of the brackets.

Recall that the BNF grammar for lists

  x ::= nil
     |  (c :: x)

gives rise to the induction principle

  M(nil)   
  M(x) -> M((c :: x))
  ---------------------
  for all lists x: M(x)

Similarly, the BNF grammar for well-formed formulas gives rise to the induction principle (the generalization to other BNF grammars is straightforward)

  M(p)   
  M(phi) -> M((~ phi))    
  M(phi1) /\ M(phi2) -> M((phi1 /\ phi2))
  M(phi1) /\ M(phi2) -> M((phi1 \/ phi2))                             
  M(phi1) /\ M(phi2) -> M((phi1 -> phi2))                             
  ---------------------------------------
  for all formulas phi: M(phi)

We can use that principle to give a slightly alternative proof of Theorem 1.32 on page 55: Let

  M(phi): LBS(phi) = RBS(phi)

where LBS(phi) is the the number of left brackets in phi, and RBS(phi) is the number of right brackets in phi.

Our task is then to prove that M(phi) holds for all well-formed formulas phi. We do that by structural induction, and must establish several properties:

M(p)

Follows easily since LBS(p) = 0 = RBS(p)

M(phi) -> M((~phi))

Follows from the calculation

  LBS((~phi)) = 1 + LBS(phi) = 1 + RBS(phi) = RBS((~phi))

M(phi1) /\ M(phi2) -> M((phi1 /\ phi2))

Follows from the calculation

    LBS((phi1 /\ phi2)) = 1 + LBS(phi1) + LBS(phi2) = 1 + RBS(phi1) + RBS(phi2)
  = RBS((phi1 /\ phi2))

A more general form of structural induction

We saw that for integers, Course-of-values induction is often more applicable than Mathematical induction. Similarly, for formulas (and trees in general) it is often more convenient to apply the following induction principle

   M(phi) can be proven, assuming that 
    M(phi') holds for all phi' "smaller than" phi
   ----------------------------------------------
   M(phi) holds for all formulas phi

The unspecific "smaller than" can be instantiated in several ways:

   M(phi) can be proven, assuming that 
    M(phi') holds for all phi' that are proper subtrees of phi
   -----------------------------------------------------------
   M(phi) holds for all formulas phi

or, similar to the approach on page 55,

   M(phi) can be proven, assuming that 
    M(phi') holds for all phi' with height(phi') < height(phi)
   -----------------------------------------------------------
   M(phi) holds for all formulas phi

We can now do part of Exercise 1.14(7), page 84.

Theorem: Assume that phi does not contain implications. Then [[NNF(phi)]] = [[phi]].

(Recall that [[phi]] denotes the semantics of the formula phi, that is a mapping which given truth values for the atoms in phi returns the corresponding truth value of phi.)

Proof: We shall perform structural induction in the definition of NNF, presented page 80. Different cases:

phi is an atom p

Since phi is a literal, NNF(phi) = phi and the claim follows.

phi = (phi1 /\ phi2)

Using the induction hypothesis on phi1 and phi2, we obtain the desired relation

  [[NNF(phi)]] = [[NNF(phi1) /\ NNF(phi2)]] 
               = [[NNF(phi1)]] /\ [[NNF(phi2)]] = [[phi1]] /\ [[phi2]]
               = [[phi]]

phi = (phi1 \/ phi2)

Similar to the above case

phi is a negation

We split into several cases:

phi = ~p

Since phi is a literal, NNF(phi) = phi and the claim follows.

phi = ~(~phi1)

By applying the induction hypothesis to phi1, which is clearly smaller than phi, and by observing that phi and phi1 are semantically equivalent, we obtain the desired relation

  [[NNF(phi)]] = [[NNF(phi1)]] = [[phi1]] = [[phi]]

phi = ~(phi1 /\ phi2)

By unfolding the definition of NNF twice, we see that

  [[NNF(phi)]] = [[NNF(~phi1 \/ ~phi2)]] = [[NNF(~phi1)]] \/ [[NNF(~phi2)]]

and by applying de Morgan's laws we see that

  [[phi]] = [[~phi1]] \/ [[~phi2]]

Our claim that [[NNF(phi)]] = [[phi]] would therefore follow if we could establish

  [[NNF(~phi1)]] = [[~phi1]] and  [[NNF(~phi2)]] = [[~phi2]].

But this follows from the induction hypothesis applied to ~phi1 and ~phi2. Of course, for this to be a valid inference, we need to show that height(~phi1) (and also height(~phi_2)) is smaller than height(phi). But this follows from the calculation

  height(~phi1) = 1 + height(phi1) <= 1 + max(height(phi1),height(phi2)) = height(phi1 /\ phi2) 
                < 1 + height(phi1 /\ phi2) = height(phi).

phi = ~(phi1 \/ phi2)

Similar to the above case

Torben Amtoft