2.4.20

Higher-order Logic and Equality.

In this post, I want to sketch a particular perspective on $\lambda$-calculus, higher-order logic, and Church's simple theory of types. I have a few motivations to write this up. One of them is that as an undergrad (ages ago), upon encountering first-order predicate logic, I had endlessly wondered why one couldn't use "boolean functions" and something like functional programming for logic. It was only much later that I discovered Church's 1940 typed $\lambda$ calculus was, in a sense, just that.

When people think of the connection between $\lambda$-calculus and logic, the first thing that comes to their mind may well be the Curry-Howard correspondence. Similarly, the first thing that people may think of when hearing type theory is Martin-Löf type theory. So another motivation is to have a place that describes higher-order logic a.k.a. simple type theory - which after all was the first appearance of typed $\lambda$ calculus - and how there are multiple ways to relate it to formal logic and proofs.

Preliminary: Natural Deduction

Natural deduction [wikipedia], in short, is a system of reasoning involving inference rules to derive new true statements from existing ones. What sets it apart from a Hilbert system [wikipedia] is that the logical operators themselves are described using inference rules, and proofs are trees (derivations) formed from such rules.

In this journey, we present natural deduction rules in sequent calculus style: assumptions are tracked in a context $\Gamma$.

Minimal Logic

Minimal logic [wikipedia], also referred to as the positive fragment of intuitionistic logic, is a deductive system for propositional logic that does not contain any inference rule for falsehood. Here are the rules: $$ \frac{}{\Gamma, A \vdash A}\quad\quad \frac{\Gamma, A \vdash B}{\Gamma \vdash A \rightarrow B} $$ $$ \frac{\Gamma \vdash A \rightarrow B \quad \Gamma \vdash A}{\Gamma \vdash B}$$ We could have added conjunction $\wedge$ or disjunction $\vee$ or a symbol $\bot$ to stand for a false proposition, along with the convention that $\neg A = A \rightarrow \bot$; it would still be minimal logic. "Full" intuitionistic logic has a rule that goes from $\bot$ to an arbitrary proposition $A$ (ex falso quodlibet); and other rules, like double negation elimination, get us classical logic.
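For instance, here is a derivation of the tautology $A \rightarrow (B \rightarrow A)$, built from one use of the axiom rule and two uses of the implication rule: $$ \frac{\dfrac{\dfrac{}{A, B \vdash A}}{A \vdash B \rightarrow A}}{{}\vdash A \rightarrow (B \rightarrow A)} $$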

The naming scheme that calls this "minimal" is consistent with Dag Prawitz's "Natural Deduction: A Proof-Theoretical Study." Yet, I believe it is fairly common to call this kind of logic intuitionistic logic, even though there is no rule for $\bot$.

The proofs for this logic have a structure that is familiar to all students of functional programming languages.

Simply typed $\lambda$-calculus

We now follow the celebrated "propositions-as-types" or Curry-Howard correspondence and assign names (variables) to assumptions, terms (expressions) to proofs, using this language: $$ s, t ::= x \enspace \vert \enspace (\lambda x: T. t) \enspace \vert \enspace(s\ t) \enspace $$ $$ T ::= b_i \enspace \vert \enspace T \rightarrow T $$

Here $b_i$ are some given base types. Our annotated inference rules: $$ \frac{}{\Gamma, x:A \vdash x:A}\quad\quad \frac{\Gamma, x:A \vdash s:B}{\Gamma \vdash (\lambda x:A. s) : A \rightarrow B} $$ $$ \frac{\Gamma \vdash s:A \rightarrow B \quad \Gamma \vdash t:A}{\Gamma \vdash (s\ t):B}$$ A term encodes the entire derivation (or natural deduction proof) of the proposition/type.
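To make this concrete, here is a minimal sketch of the three rules as a type checker in Haskell (the data types and names are my own illustration, not any standard library). The derivation of $A \rightarrow (B \rightarrow A)$ from earlier becomes the term $\lambda x:A. \lambda y:B. x$.

    -- A sketch of simply typed lambda-calculus terms and the typing rules above.
    data Ty = Base String | Arr Ty Ty deriving (Eq, Show)

    data Term = Var String | Lam String Ty Term | App Term Term
      deriving Show

    type Ctx = [(String, Ty)]

    typeOf :: Ctx -> Term -> Maybe Ty
    typeOf ctx (Var x)     = lookup x ctx             -- axiom: Gamma, x:A |- x:A
    typeOf ctx (Lam x a s) =                          -- Gamma |- \x:A. s : A -> B
      Arr a <$> typeOf ((x, a) : ctx) s
    typeOf ctx (App s t)   = do                       -- Gamma |- s t : B
      Arr a b <- typeOf ctx s
      a'      <- typeOf ctx t
      if a == a' then Just b else Nothing

For example, `typeOf [] (Lam "x" (Base "A") (Lam "y" (Base "B") (Var "x")))` yields `Just (Arr (Base "A") (Arr (Base "B") (Base "A")))`, i.e., the proposition $A \rightarrow (B \rightarrow A)$.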

Reduction as proof normalization

From the Curry-Howard angle, functional programming is "proof normalization": derivations that contain redundant steps get simplified to smaller derivations. The redundancy is always of the form that part of the derivation needs an assumption $x:A$, and another part of the derivation actually "produces" a term of type $A$. In more detail, consider a derivation $\Gamma \vdash (\lambda x:A. s): A \rightarrow B$ and a derivation $\Gamma \vdash t: A$, which can be combined to yield $\Gamma \vdash ((\lambda x:A. s)\ t) : B$. The detour can be removed by taking every occurrence of $x$ within $s$ and replacing it with $t$. This yields a smaller derivation $\Gamma \vdash s[t/x] : B$. If two derivations stand in this relation, we call them $\beta$-equivalent. $$ ((\lambda x:A. s)\ t) \approx_\beta s[t/x] $$

Here $[t/x]$ is notation for substituting $t$ for $x$, and we hasten to point out that care must be taken: we may need to rename bound variables. Two terms that only differ in the names of their bound variables are $\alpha$-equivalent, and we will pretend they are equal. If we used de Bruijn indices instead of variable names, there would be no need for $\alpha$-equivalence.
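For example, naively substituting $y$ for $x$ in $\lambda y:B.\ x$ would yield $\lambda y:B.\ y$, wrongly capturing the free variable $y$; renaming the bound variable first gives the correct result $\lambda y':B.\ y$.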

[Digression: the identity above is not an algebraic identity - these are syntactic objects which involve variable binding. Sometimes it is ok to treat lambda-calculi as an algebra, and this leads to combinatory logic, see neelk's answer on stackexchange and Selinger's paper.]

If we give a direction to the identity, we obtain $\beta$-reduction: $$((\lambda x:A. s)\ t) \Rightarrow_\beta s[t/x]$$ We can show that $\Rightarrow_\beta$ is a strongly normalizing, and therefore terminating, reduction relation, and that $\Rightarrow_\beta$ is confluent (the Church-Rosser property), so all reductions end up in the same normal form. This yields a decision procedure for the "word problem": to find out whether $s \approx_\beta t$, reduce both to normal form and check that the results are equal.
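Continuing the Haskell sketch from above (again my own illustrative code, making no claims about efficiency), capture-avoiding substitution and the normalization half of that decision procedure could look like this; comparing the resulting normal forms up to $\alpha$-equivalence is left as an exercise:

    -- Free variables of a term.
    freeVars :: Term -> [String]
    freeVars (Var x)     = [x]
    freeVars (Lam x _ s) = filter (/= x) (freeVars s)
    freeVars (App s t)   = freeVars s ++ freeVars t

    -- subst t x s computes s[t/x], renaming bound variables to avoid capture.
    subst :: Term -> String -> Term -> Term
    subst t x (Var y)
      | y == x              = t
      | otherwise           = Var y
    subst t x (App s1 s2)   = App (subst t x s1) (subst t x s2)
    subst t x (Lam y a s)
      | y == x              = Lam y a s           -- x is shadowed; stop here
      | y `elem` freeVars t =                     -- rename y to avoid capture
          let y' = fresh y (x : freeVars t ++ freeVars s)
          in  Lam y' a (subst t x (subst (Var y') y s))
      | otherwise           = Lam y a (subst t x s)

    fresh :: String -> [String] -> String
    fresh y avoid = head [v | n <- [0 :: Int ..], let v = y ++ show n, v `notElem` avoid]

    -- One leftmost-outermost beta-step; Nothing means a normal form was reached.
    step :: Term -> Maybe Term
    step (App (Lam x _ s) t) = Just (subst t x s)
    step (App s t)           = case step s of
                                 Just s' -> Just (App s' t)
                                 Nothing -> App s <$> step t
    step (Lam x a s)         = Lam x a <$> step s
    step (Var _)             = Nothing

    -- Strong normalization guarantees termination on well-typed terms.
    normalize :: Term -> Term
    normalize s = maybe s normalize (step s)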

We now shift our perspective, leave the Curry-Howard correspondence aside and look at the theory of equality between these expressions. \(\newcommand{\qeq}[0]{\color{blue} {\mathrm{eq}}}\)

Higher-order Logic, using Equality alone

We are going to build a calculus for classical logic, based on our language. Our development here follows the $Q_0$ type theory (due to Peter Andrews). This is a type theory close to Church's 1940 simply-typed $\lambda$ calculus, but it takes only equality and a description operator as primitive. We will omit the description operator.

We need two base types: $\iota$ for "individuals" and $\omicron$ for "propositions". Since the logic is classical, every proposition will be either true or false, so $\omicron$ can be seen as the type of these "truth values". With this setup, we can view any term of type $\iota \rightarrow \omicron$ as describing a set of individuals, or alternatively, a predicate. This resembles the characteristic function of a set in mathematics, and also what a programmer calls a Boolean function.
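For example, given a term $p : \iota \rightarrow \omicron$ and an individual $c : \iota$, the application $p~c : \omicron$ is the proposition stating that $c$ belongs to the set described by $p$.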

[ The name "higher-order" is appropriate, since this logic also lets us talk about sets of sets of individuals, via terms of type $(\iota \rightarrow \omicron) \rightarrow \omicron$, and so on. This use also matches the terminology "higher-order function", which is not a coincidence. Note that you can quantify over "smaller-order predicates", but you can't express a "set of all sets."]

The only built-in/primitive notion is a logical constant $\qeq_\alpha: \alpha \rightarrow \alpha \rightarrow \omicron$ which serves as the equality operator at any type $\alpha$. Before we build propositions out of logical operators and quantifiers, let us write down the types of such operators. With the common convention that $\alpha \rightarrow \beta \rightarrow \gamma$ stands for $\alpha \rightarrow (\beta \rightarrow \gamma)$, the types would be: $$ \newcommand{\qtrue}[0]{{\color{blue} {\mathrm{true}}}} \newcommand{\qfalse}[0]{{\color{blue} {\mathrm{false}}}} \newcommand{\qnot}[0]{{\color{blue} {\mathrm{not}}}} \newcommand{\qor}[0]{\color{blue} {\mathrm{or}}} \newcommand{\qand}[0]{\color{blue} {\mathrm{and}}} \newcommand{\qimplies}[0]{\color{blue} {\mathrm{implies}}} \begin{align*} \qtrue:\ & \omicron\\ \qfalse:\ & \omicron\\ \qnot:\ & \omicron \rightarrow \omicron\\ \qand:\ & \omicron \rightarrow \omicron \rightarrow \omicron\\ \qor:\ & \omicron \rightarrow \omicron \rightarrow \omicron\\ \qimplies:\ & \omicron \rightarrow \omicron \rightarrow \omicron\\ \end{align*} $$ Using again a notational convention that $((r~s)~t)$ is written $r~s~t$, the tautology $A \vee B \rightarrow B \vee A$ is expressed as $\qimplies~(\qor~A~B)~(\qor~B~A)$.

What about quantifiers? Since we have $\lambda$ for variable binding, there is no need for other variable binders like $\forall$ or $\exists$. Instead we introduce universal and existential quantification using two higher-order operators: $$ \newcommand{\qPi}[0]{\color{blue} \Pi} \newcommand{\qSigma}[0]{\color{blue} \Sigma} \begin{align*} \qPi_\alpha:\ & (\alpha \rightarrow \omicron) \rightarrow \omicron\\ \qSigma_\alpha:\ & (\alpha \rightarrow \omicron) \rightarrow \omicron\\ \end{align*} $$ So $\forall x .\exists y. x = y$ would be expressed as $\qPi \lambda x. \qSigma \lambda y. \qeq~x~y$. People studying classical logical systems frequently "apply Occam's razor" and try to define as few things as possible. The $Q_0$ way is to push Occam's razor as far as possible: all logical operators are "implemented" using only $\qeq$ (recall that the subscript on $\qeq$ is the type of the things being compared). $$ \newcommand{\tprop}[0]{\omicron} \begin{align*} \qtrue &:= \qeq_{\tprop \rightarrow \tprop \rightarrow \tprop}~\qeq_\tprop~\qeq_\tprop\\ \qfalse &:= \qeq_{\tprop \rightarrow \tprop}~(\lambda x:\tprop. \qtrue)~(\lambda x:\tprop. x)\\ \qnot &:= \qeq_\tprop~\qfalse\\ \qand &:= \lambda x:\tprop. \lambda y:\tprop. \\ &\quad \quad \qeq_{(\tprop \rightarrow \tprop \rightarrow \tprop) \rightarrow \tprop} \\ &\quad \quad \quad (\lambda z: \tprop \rightarrow \tprop \rightarrow \tprop. z~\qtrue~\qtrue) \\ &\quad \quad \quad (\lambda z: \tprop \rightarrow \tprop \rightarrow \tprop. z~x~y) \\ \qor &:= \lambda x:\tprop. \lambda y:\tprop. \\ &\quad \quad \qnot~(\qand~(\qnot~x)~(\qnot~y))\\ \qimplies &:= \lambda x:\tprop. \lambda y:\tprop. \qeq_\tprop~x~(\qand~x~y)\\ \qPi_\alpha &:= \qeq_{\alpha \rightarrow \tprop}~(\lambda x : \alpha. \qtrue)\\ \qSigma_{\alpha} &:= \lambda p: \alpha \rightarrow \tprop. \qnot~(\qPi_{\alpha}~(\lambda x:\alpha. \qnot~(p~x))).\\ \end{align*} $$
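As a sanity check, let us unfold $\qand~\qtrue~\qfalse$: two $\beta$-steps turn it into $\qeq~(\lambda z.\ z~\qtrue~\qtrue)~(\lambda z.\ z~\qtrue~\qfalse)$, and these two functions differ at the second projection $\lambda a:\tprop. \lambda b:\tprop.\ b$ (the first yields $\qtrue$, the second $\qfalse$), so the equation, and with it the conjunction, is false, as it should be.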

One advantage of having everything defined in terms of $\qeq$ is that when we give inference rules to explain how reasoning takes place, they all involve only $\qeq$. We are using the notation of inference rules to describe an equational theory. The only assertions $\Gamma \vdash r$ that are derived are (more or less - up to $\beta$-equivalence) of the form $\Gamma \vdash \qeq_\alpha~s~t$, but we explicitly write some assumptions that we make about types. Here they are: $$ \frac{\Gamma \vdash s : \tprop \rightarrow \tprop}{\Gamma \vdash \qeq_\tprop~(\qand~(s~\qtrue)~(s~\qfalse))~(\qPi_\tprop~\lambda x : \tprop.~s~x)} $$ $$ \frac{\Gamma \vdash s : \alpha \quad \Gamma \vdash t:\alpha \quad \Gamma \vdash z: \alpha \rightarrow \tprop}{\Gamma \vdash \qimplies~(\qeq_\alpha~s~t)~(\qeq_\tprop~(z~s)~(z~t))} $$ $$ \frac{\Gamma \vdash s : \beta \rightarrow \alpha \quad \Gamma \vdash t:\beta \rightarrow \alpha}{\Gamma \vdash \qeq_\tprop~(\qeq_{\beta \rightarrow \alpha}~s~t) (\qPi_\beta \lambda x:\beta.\qeq_\alpha~(s~x)~(t~x))} $$ $$ \frac{}{\Gamma \vdash \qeq~((\lambda x : \alpha. s)~t)~(s [t/x])} $$ $$ \frac{\Gamma, x: \alpha \vdash r : \tprop \quad \Gamma \vdash r[A/x] \quad \vdash \qeq_\alpha~A~B}{\Gamma \vdash r[B/x]} $$

The first rule is an axiom that says $\tprop$ consists of $\qtrue$ and $\qfalse$. This looks like a data type specification for $\tprop$. The second rule is an axiom that says that if two objects are equal then they have the same properties (Leibniz would say "indiscernibility of identicals"). The third rule is an axiom that says two expressions of function type are equal if they agree on all arguments; this is the extensional definition of function equality.

The fourth rule is an axiom for $\beta$-conversion. We're conveniently hiding all the naming stuff again behind $[t/x]$ notation; but essentially this lets us apply $\beta$-reduction as we please.

The fifth rule is a proper inference rule - the only actual inference rule, with a premiss $\Gamma \vdash r[A/x]$ that is itself an assertion - and it says "equals can be replaced by equals", anywhere in a term. This is where the actual reasoning happens. For example, a proposition $\forall x:\alpha. P$ is encoded as $\qPi~\lambda x:\alpha. P$, and proving it "within the system" is literally showing that $\lambda x:\alpha. P$ is equal to $\lambda x:\alpha. \qtrue$.
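As a small worked example, reflexivity $\Gamma \vdash \qeq_\alpha~s~s$ is derivable from the last two rules alone. The $\beta$-conversion axiom (with body $x$) gives $\Gamma \vdash \qeq_\alpha~((\lambda x:\alpha.\ x)~s)~s$. Now apply the fifth rule with $r = \qeq_\alpha~y~s$ for a fresh variable $y$, taking $A = (\lambda x:\alpha.\ x)~s$ and $B = s$: the premisses $\Gamma \vdash r[A/y]$ and $\vdash \qeq_\alpha~A~B$ are both instances of the $\beta$-conversion axiom, so the conclusion $\Gamma \vdash r[B/y]$, which reads $\qeq_\alpha~s~s$, follows.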

Wrapping up

I presented this material in a notation/style that is different from both Andrews's book and the Stanford Encyclopedia of Philosophy's entry on Church's type theory. It is closer to the way a typed programming language would be formulated.

We have only scratched the surface, but sketched how a whole formal system of reasoning can indeed be based on $\lambda$-calculus, with terms playing the role of characteristic functions. And it didn't actually involve the Curry-Howard correspondence. I like to think of this situation as "multiple levels of logic," and topics like higher-order unification and meta-logic/logical frameworks make use of this.

I hope these notes can help raise awareness of what is going on when simple type theory and higher-order logic come up. A lot more can be said, but that is for another post. If you are interested in diving deeper, here are some references that talk about higher-order logic.

  Stanford Encyclopedia of Philosophy's entry on Church's type theory

  Jean Gallier. "Constructive Logics. Part I: A Tutorial on Proof Systems and Typed λ-Calculi", freely accessible on his logic papers page.

  William M. Farmer. "The Seven Virtues of Simple Type Theory" https://doi.org/10.1016/j.jal.2007.11.001

  Peter Andrews. "An Introduction to Mathematical Logic and Type Theory: To Truth Through Proof" (2002)

  Alonzo Church. "A Formulation of the Simple Theory of Types". J. Symb. Logic, 5 (1) (1940)

  Dag Prawitz. "Natural Deduction: A Proof-Theoretical Study" (2006) Dover - new issue of the celebrated 1965 work.