1. Introduction to functional programming
1.1 Introduction
What does computing mean?
Roughly, to transform information from an implicit into an explicit form.
But this notion means nothing unless we make it precise, formal, by defining
a computational model.
What is a computational model?
A particular formalization of the notions connected to the concept of
"computing".
The rather vague concept of computing can be formalized in several
different ways, that is by means of several
different computational models:
the Lambda-calculus, the Turing machine, etc.
"To program" means to specify a particular computation process, in
a language based on a particular computational model.
Functional programming languages are based on the Lambda-calculus
computational model.
Imperative programming languages are based on the Turing-machine
computational model.
The fundamental concepts in imperative programming languages (coming from the Turing Machine model) are:
- store
- variable: something it is possible to modify (a "memory cell")
- assignment
- iteration
In functional programming these concepts are absent (or, in case they are present, have a different meaning). Instead, in it we find:
- expression
- recursion: different from the notion of iteration
- evaluation: different from the notion of execution (when we evaluate a mathematical expression we do not modify anything)
- variable: intended in a mathematical sense, as an unknown entity, an abstraction from a concrete value
1.2 Functional programming
A program in a functional language consists of a set of function definitions and an expression (usually formed using the defined functions). The interpreter of the language evaluates this expression by means of a sequence of discrete computation steps, making its meaning explicit. (As pointed out before, during a computation nothing is actually created; rather - a fact extremely clear in functional programming - the representation of the information is "transformed".)
A function definition in functional programming is not
simply a means to uniquely identify a mathematical function on a
domain of objects.
Example: a mathematical definition of two functions f: R -> R and g: R -> R:

f'' - 6g' = 6 sin x
6g'' + a2 f' = 6 cos x
f(0) = 0, f'(0) = 0, g(0) = 1, g'(0) = 1

This set of equations identifies two precise functions f and g. In the above example f and g are uniquely identified by the set of differential equations, but such a definition is not useful from a computational point of view.
A function definition in a functional programming language, instead, must also show how to compute (evaluate) the defined function on a given argument.
Example: function definition in Scheme and in Haskell
(define Inctwo (lambda (n) (+ n 2)))
Inctwo n = n + 2
The evaluation of the application of a (user-defined) function to an argument consists in evaluating the body of the function in which the formal parameter is replaced by the actual parameter (the argument). To evaluate an expression means to first look for a sub-expression which is the application of a function to some arguments and then to evaluate such an application. In this way, in a sense, the meaning of the whole expression is made more and more explicit.
Example: evaluating (Inctwo 3)
Inctwo is a user-defined function, and it is applied to an argument, so we take the body of the function, replace the formal parameter n by the actual parameter 3, obtaining 3+2, and then we evaluate 3+2, obtaining 5.
In the example above we have seen two kinds of computational steps from (Inctwo 3) to 5. In the first one we have simply substituted, in the body of the function, the formal parameter by the actual parameter (the argument); in the second one we have computed a predefined function. We shall see that the essence of the notion of computation in functional programming is indeed embodied by the first type of computational step. The simple operation of replacing the arguments of a function for its formal parameters in its body contains all the computing power of functional programming. Any sort of computation in functional programming could indeed be defined in terms of this one. We shall see this later, when we represent this sort of computational step with the beta-reduction rule in the lambda calculus, and we shall see that, if one wishes, any possible computation can be defined in terms of this beta-rule.
When we evaluate an expression there is sometimes more than one sub-expression one can choose, and hence different possible evaluation paths can be followed. An evaluation strategy is a policy for deciding on which sub-expression we shall perform the next evaluation step. Some evaluation strategies can lead to a never-ending path (this fact is unavoidable); but we would like to be sure that all finite paths lead to the very same value (this will be guaranteed by the Confluence Theorem of the Lambda-calculus).
Example: In Haskell

inf = inf              -- definition of the value inf
alwayseven x = 7       -- definition of the function alwayseven
alwayseven inf         -- expression to be evaluated

There are two possible choices in the evaluation of alwayseven inf:
- We evaluate inf first (the evaluation strategy of Scheme).
- We evaluate alwayseven inf first (the evaluation strategy of Haskell, a lazy language).
Case 1: we replace inf by inf => we go on endlessly.
Case 2: we replace alwayseven inf by 7 => we terminate in a single step.
One of the distinguishing features in a programming language is
the evaluation strategy it uses.
We know how to define the function factorial in a recursive way. Let us now give an example of mutually recursive functions:

even x = if x=0 then true else odd(x-1)
odd x = if x=0 then false else even(x-1)
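As a minimal runnable sketch, here is the same pair of definitions in actual Haskell (the primed names are only there to avoid clashing with the Prelude's even and odd):

even' :: Integer -> Bool
even' x = if x == 0 then True else odd' (x - 1)

odd' :: Integer -> Bool
odd' x = if x == 0 then False else even' (x - 1)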
With our recursive definitions we do not only specify functions, but we also provide a method to compute them: a recursive definition says that the left-hand side of the equation is equivalent to (has the same "meaning" as) the right-hand side. The right-hand side, however, has its meaning expressed in a more explicit way. We can replace the left-hand side with the right-hand one in any expression without modifying its meaning.
We can notice that in order to evaluate "even 3" we can follow different evaluation strategies. It is also possible to perform more reductions in parallel, that is, to replace two or more sub-expressions with equivalent sub-expressions (reduction steps) at the same time. A similar thing is not possible in imperative languages, since we need to take into account the problem of side effects: running in parallel the instructions of the subprograms computing two functions F1 and F2 can produce strange results, since the two subprograms could use (and modify) the values of global variables which can affect the results produced by the two functions. Of course, because of the presence of global variables, the value of a function can depend on time and on how many times we have computed it before (in general, in imperative languages it is difficult to be sure that we are programming actual mathematical functions). We do not have the above-mentioned problems in functional programming languages, since there are no side effects (there are no global modifiable variables). What we define are therefore actual mathematical functions; hence in an expression the value of any sub-expression does not depend on the way we evaluate it or the other ones, that is, the same expression always denotes the same value (a property called referential transparency).
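As a tiny illustrative sketch (square is just an example function), referential transparency in Haskell means that naming a sub-expression and repeating it are interchangeable:

square :: Int -> Int
square n = n * n

-- both sides always denote the same value, whatever the evaluation order
same :: Bool
same = (let x = square 3 in x + x) == square 3 + square 3   -- True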
1.3 Sketchy introduction to types
Some functional programming languages are typed. A type, in
general, can be looked at as a partial
specification of a program. These partial specifications, according to
the particular type system of the language considered, can be more or less
detailed.
By means of types we can avoid errors like:
true + 3
In fact the interpreter knows that
+ :: int x int → int
In (strongly) typed languages this error is detected by the Type Checker, before the program is "executed". In non-(strongly-)typed languages this error is detected at run-time, by the interpreter itself (the program's execution is stopped whenever an error is detected).
In typed languages not only the functions have a type, but all
the possible correct expressions have (or can be given) a type.
In some typed functional languages it is possible to define polymorphic functions, that is, functions which, roughly, have the same behaviour on arguments of different types (a polymorphic type describes a whole set of (uniform) behaviours).
Example:
ident:: a → a (a is a
type-variable. It denotes a "generic" type.)
ident x = x
We shall study very simple types, like int -> bool (it simply specifies that a function (program) with this type has an integer as argument and a boolean as result). Notice, however, that there exist type systems so expressive that they can specify in a very precise way what a function computes. For instance, there could be type systems with a type like
Forall x: Int . Exists y: Int . y = x+1
A function with this type has integers as arguments, integers as results, and computes the successor function (of course the type does not specify how the successor of a given integer is computed).
1.4 Higher order functions and curryfication
In functional programming languages functions are first-class entities, i.e. they can be treated as normal data. Hence they can be given as arguments to other functions or be the result of a function.
A simple example is the following Scheme function:
(define High-inc (lambda (x) (lambda (y) (+ x y))))
High-inc takes a number x and returns a function that increments its argument by x. The function Inctwo seen before can be seen as the result of applying High-inc to 2. As a matter of fact we could have defined Inctwo in the following alternative way:
(define Inctwo (High-inc 2))
Functions that can have functions as arguments or results are called higher-order functions.
The notion of higher-order function enables us to introduce the notion of curryfication, a relevant concept in functional languages. To currify means to transform a function f of arity n into a higher-order function fcur of arity 1 such that fcur(a1) is itself a higher-order function of arity 1, and so are fcur(a1)(a2) up to fcur(a1)(a2)...(an-1), and such that f(a1,...,an) = fcur(a1)...(an).
A binary function of type:
A x B → C
is currified to a function of type:
A → (B → C)
where every function has arity 1. (Higher-order functions and curryfication are treated in Section 1.7 of the reference below.)
We can notice that the function High-inc is indeed the currified version in Scheme of the addition.
The possibility of using currified functions definitely improves the expressive power of functional languages (that is why in Haskell all functions are already in their currified version, even the user-defined ones).
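For comparison, here is a sketch of High-inc in Haskell, where every definition is curried by default (highInc and incTwo are illustrative names):

highInc :: Int -> Int -> Int      -- i.e. Int -> (Int -> Int)
highInc x y = x + y

incTwo :: Int -> Int
incTwo = highInc 2                -- partial application of the curried addition

-- incTwo 3 evaluates to 5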
Note:
Also in imperative languages it is possible to program functions which have
functions as arguments, but in general what is given as argument is simply a pointer to
some code (which needs to be treated as such).
1.5 Recursion
The computational power of functional languages is provided by
the recursion mechanism; hence it is natural that in functional languages it
is much easier to work on recursively defined datatypes.
Example:
The datatype List is recursively defined as follows: a list is either [] (the empty list) or elem:list (the concatenation of an element with a list).
BinaryTree (unlabelled) is another example: an element of BinaryTree is either EmptyTree or a root at the top of two elements of BinaryTree.
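As a sketch, the two recursive datatypes above could be declared in Haskell as follows (the constructor names are illustrative):

data List a = Nil | Cons a (List a)              -- [] or elem:list

data BinaryTree = EmptyTree
                | Node BinaryTree BinaryTree     -- a root on top of two subtrees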
Notice that there is a strong relation between the Induction
principle as a mathematical proof tool and Recursion as a computation
tool.
A property is true (for natural numbers) if
- one proves the base case;
- by assuming the property to hold for n, one proves the property for n+1.
This is the principle of numerical induction; it closely corresponds to the mechanism of defining a function on numbers by recursion: a function (on numbers) is defined if
- one defines it for the base case;
- by assuming the function to be defined for n, one defines it for n+1 (as in the sketch below).
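A minimal sketch of a function defined by exactly this base-case / n+1 pattern (sumTo is an illustrative example, not from the text):

sumTo :: Integer -> Integer
sumTo 0 = 0                     -- base case
sumTo n = n + sumTo (n - 1)     -- defined for n by assuming it defined for n-1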
Of course we can use recursion on complex recursive (inductive) data types.
And in fact there exist general versions of the induction principle, as we shall see
in the course.
Data are not modified in (pure) functional programming
We are used to working on data structures that are intrinsically tied to the notion of "modification", so we usually think that an element of a data type can be modified. This is not true in functional programming.
For example, a functional program that takes as input a numerically labelled tree and increments its labels does not modify the input. It simply builds a new tree with the needed characteristics. The same holds when we push or pop elements from a stack: each time a new stack is generated.
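A minimal sketch in Haskell (the Tree type and incLabels are illustrative names): incrementing the labels builds a brand new tree and leaves the input untouched.

data Tree = Leaf | Node Int Tree Tree

incLabels :: Tree -> Tree
incLabels Leaf         = Leaf
incLabels (Node n l r) = Node (n + 1) (incLabels l) (incLabels r)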
This causes a great waste of memory (which is the reason why, in implementations of functional programming languages, a relevant role is played by the Garbage Collector). Of course there are optimizations and tricks used in implementations that can limit this phenomenon.
Text to use: R. Plasmeijer, M. van Eekelen, "Functional Programming and Parallel Graph Rewriting", Addison-Wesley, 1993. Chapter 1 (basic concepts).
2. The λ-calculus and computation theory
2.1 Lambda-terms
The fundamental concepts the Lambda-calculus is based on are:
- variable (formalizable by x, y, z, ...)
- abstraction (formalizable by λx.M, where M is a term and x a variable)
- application (formalizable by M N, where M and N are terms)
We have seen that these are indeed fundamental notions in functional programming. It seems that there are other important notions: basic elements, basic operators and the possibility of giving names to expressions, but we shall see that the three notions above are the real fundamental ones and that we can found our computational model just on these three concepts.
Using the formalization of these three concepts we form the lambda-terms. These terms will represent both programs and data in our computational model (whose strength strongly depends on the fact that it does not distinguish between "programs" and "data").
The formal definition of the terms of the Lambda-calculus (lambda-terms) is given by the following grammar:
Λ ::= X | (ΛΛ) | λX.Λ
where Λ stands for the set of lambda-terms and X is a metavariable that ranges over the (numerable) set of variables (x, y, v, w, z, x1, x2, ...)
Variable and application are well-known concepts, but what is abstraction?
In a sense, it is a notion tied to that of "abstracting from a concrete value" (i.e. disregarding the specific aspects of a particular object, which, in our case, is an input).
Abstraction is needed to create anonymous functions (i.e. functions without a name).
When we define in Haskell the function fact,
fact n = if n == 0 then 1 else n * fact (n-1)
we are indeed doing two things: we are specifying a function and we are giving the name fact to it. Being able to specify a function is essential in our computational model, but we can avoid introducing in our model the notion of "giving a name to a function". You could ask: how is it possible to define a function like fact without being able to refer to it through its name? We shall see.
Example of (functional) abstraction:
We can specify the square function by saying that it associates 1*1 to 1, 2*2 to 2, 3*3 to 3, etc. In the specification of the function the actual values of the possible particular inputs are not needed, that is, we can abstract from the actual values of the input by simply saying that the function square associates x*x to the generic input x.
We could use the mathematical notation x ↦ x*x to specify such a particular relation between domain and codomain; with such mathematical notation we specify a function without assigning a particular name to it (i.e. we specify an anonymous function).
In Lambda-calculus we describe the relation input-output by
means of the lambda-abstraction operator: the term λx. M
represents the anonymous function that, given an input x, returns the "value" of the body M.
Other notions, usually considered as primitive, like basic operators
(+, *, -, ...) or
natural numbers, are not strictly needed in our computational model
(indeed we shall see that they can actually be treated
as derived concepts.)
In an abstraction λx.P, the term P is said to be the scope of
the abstraction.
2.2 Free and bound variables
A variable x is said to be bound in M whenever it is abstracted in
some subterm λx.P of M.
A variable is free in M if it is not bound.
These two concepts can be formalized by the following definitions.
Definition of bound variable
We define BV(M), the set of the Bound Variables of M, by induction on the term as follows:
BV(x) = ∅ (the empty set)
BV(PQ) = BV(P) U BV(Q)
BV(λx.P) = {x} U BV(P)
Definition of free variable
We define FV(M), the set of the Free Variables of M, by induction on the term as follows:
FV(x) = {x}
FV(PQ) = FV(P) U FV(Q)
FV(λx.P) = FV(P) \ {x}
Free and bound variables represent two different concepts in a term. A bound variable in the body of an abstraction represents the argument of a function, whereas a free variable represents an (unknown) value.
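The two inductive definitions above can be transcribed almost literally into Haskell; here is a sketch (using Data.Set from the containers library; the Term type is an illustrative encoding of lambda-terms):

import Data.Set (Set)
import qualified Data.Set as Set

data Term = Var String | App Term Term | Lam String Term

fv, bv :: Term -> Set String
fv (Var x)   = Set.singleton x              -- FV(x) = {x}
fv (App p q) = fv p `Set.union` fv q        -- FV(PQ) = FV(P) U FV(Q)
fv (Lam x p) = Set.delete x (fv p)          -- FV(λx.P) = FV(P) \ {x}

bv (Var _)   = Set.empty                    -- BV(x) = ∅
bv (App p q) = bv p `Set.union` bv q        -- BV(PQ) = BV(P) U BV(Q)
bv (Lam x p) = Set.insert x (bv p)          -- BV(λx.P) = {x} U BV(P)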
2.3 Substitution
The notation M [L / x] denotes the term M in
which any free occurrence of the variable x in M is replaced by the term L.
Definition of substitution (by induction on the structure of the term)
1. If M is a variable (M = y) then:
y[L/x] ≡ L if x=y
y[L/x] ≡ y if x≠y
2. If M is an application (M = PQ) then:
(PQ)[L/x] ≡ (P[L/x]) (Q[L/x])
3. If M is a lambda-abstraction (M = λy.P) then:
(λy.P)[L/x] ≡ λy.P if x=y
(λy.P)[L/x] ≡ λy.(P[L/x]) if x≠y
Notice that in a lambda abstraction a substitution is performed only if
the variable to be substituted is free (not bound).
The definition of substitution given above, however, is not completely correct.
Indeed it does not prevent
some harmful ambiguities and errors:
Example:
Both terms (λx.z) and (λy.z) obviously represent the constant function which returns z for any argument. Let us assume we wish to apply to both of these terms the substitution [x/z] (we replace x for the free variable z). If we do not take into account the fact that the variable x is free in the term x and bound in (λx.z), we get:
(λx.z)[x/z] => λx.(z[x/z]) => λx.x, representing the identity function
(λy.z)[x/z] => λy.(z[x/z]) => λy.x, representing the function that always returns x
This is absurd, since both the initial terms are intended to denote the very same thing (a constant function returning z), but by applying the substitution we obtain two terms denoting very different things.
Hence a necessary condition for the substitution M[L/x] to be performed correctly is that FV(L) and BV(M) be disjoint sets.
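Here is a sketch of the substitution M[L/x] in Haskell, reusing the Term type of the earlier sketch; note that it simply assumes the disjointness condition above and performs no renaming:

-- data Term = Var String | App Term Term | Lam String Term   (as before)

subst :: Term -> String -> Term -> Term      -- subst m x l computes m[l/x]
subst (Var y)   x l | y == x    = l
                    | otherwise = Var y
subst (App p q) x l = App (subst p x l) (subst q x l)
subst (Lam y p) x l | y == x    = Lam y p                -- x is bound here: stop
                    | otherwise = Lam y (subst p x l)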
Example:
Let us consider the following substitution:
(λz.x) [zw/x]
The free variable z in the term to be substituted would become bound after the substitution. This is meaningless. So, what has to be done is to rename the bound variables in the term on which the substitution is performed, so that the condition stated above is satisfied:
(λq.x) [zw/x]
Now, by performing the substitution, we correctly get: (λq.zw)
The example above introduces and justifies the notion of alpha-convertibility.
The notion of alpha-convertibility
A bound variable is used to represent where, in the body of a function, the argument of the function is used. In order to have a meaningful calculus, terms which differ just in the names of their bound variables have to be considered identical. We could introduce such a notion of identity in a very formal way, by defining the relation of α-convertibility. Here is an example of two terms in such a relation:
λz.z =α λx.x
We do not formally define such a relation here. For us it is sufficient to know that two terms are α-convertible whenever one can be obtained from the other by simply renaming the bound variables.
Obviously alpha-convertibility maintains completely unaltered both the meaning of the term and how explicit this meaning is. From now on, we shall implicitly work on λ-terms modulo alpha-conversion. This means that from now on two alpha-convertible terms like λz.z and λx.x are for us the very same term.
Example:
(λz.((λt.tz)z))z
All the variables, except the rightmost z, are bound, hence we can rename them (apply α-conversion). The rightmost z, instead, is free. We cannot rename it without modifying the meaning of the whole term.
It is clear from the definition that the set of free variables of a term is invariant under α-conversion, while the set of bound variables obviously is not.
2.4 The formalization of the notion of computation
in λ-calculus: The Theory of β-reduction
In a functional language, to compute means to evaluate
expressions. The process of evaluation makes the meaning of a given expression more and more
explicit. Up to now we have formalized in the lambda-calculus
the notions of programs and data (the terms). Let us see now how to formalize
the notion of computation. In particular, the notion of "basic computational step".
In the introduction we noticed that, during the evaluation of a term, whenever there is
a function applied to an argument, this application can be replaced by the body of the
function in which the formal parameter is replaced by the actual argument.
So, if we call redex any term of the form (λx.M)N,
and we call M[N/x] its contractum,
to perform a basic computational step on a term P means that P contains a subterm which is a redex and
that such a redex is replaced by its contractum (β-reduction is applied on the redex.)
Definition (The notion of β-reduction)
The relation between redexes and their contracta is called notion of β-reduction, and it is
formally represented by the following β-rule:
(λx.M) N →β M [N/x]
Definition (One step β-reduction)
We say that a term P β-reduces in one-step to a term Q
(notation: P→βQ) if
Q can be obtained out of P by replacing a subterm of P of the form (λx.M)N by M[N/x]
Example:
If we had multiplication in our system, we could perform the following computation steps:
(λx.x*x) 2 => 2*2 => 4
Here the first step is a β-reduction; the second computation is the evaluation of the multiplication function. We shall see how this last computation can indeed be realized by means of a sequence of basic computational steps (and how also the number 2 and the multiplication function can be represented by λ-terms).
Example:
(λz.(zy)z) (x(xt)) →β (x(xt)y) (x(xt))
Example:
(λy.zy) (xy) →β z(xy)
Here the fact that the variable y appears once free and once bound could be misleading. As a matter of fact these two occurrences of y represent two very different things (in fact the bound occurrence of y can be renamed without changing the meaning of the term, while the free one cannot). Notice that in this case, even if there is this small ambiguity, the beta-reduction can be performed without changing the names of the bound variables. Why?
A formal system for the one-step beta-reduction relation
If one wishes to be very precise, it is possible to define
the relation of one-step beta reduction by means of a formal system.
First recall roughly what a formal system is.
Definition
A formal system is composed by axioms and inference rules enabling us to
deduce valid "Judgements"
Example:
In logic we have formal systems enabling us to infer the truth of propositions, that is, the judgements we derive are of the form "P is true".
We can introduce a formal system whose only axiom is the β-rule we have seen before:

———————————————————
(λx.M) N →β M [N/x]

and whose inference rules are the following ones (each rule says: if the judgement above the line holds, then we can infer the one below it):

P →β P'
—————————————
λx.P →β λx.P'

P →β P'
—————————————
L P →β L P'

P →β P'
—————————————
P L →β P' L
The judgements we derive in the formal system defined above have the form M →β M' and are interpreted as "the term M reduces to the term M' in one step of computation (β-reduction)".
Example:
In this formal system we can have the following derivation:

——————————————
(λx.x) y → y
————————————————————————
(zz) ((λx.x) y) → (zz) y
———————————————————————————————————
λw.((zz) ((λx.x) y)) → λw.((zz) y)

This is a (very formal) way to show that the term λw.((zz) ((λx.x) y)) reduces in one step to the term λw.((zz) y) (less formally, and more quickly, we could have simply spotted the redex (λx.x) y inside the term and replaced it by its contractum y).
So, our formal system defines a binary relation on lambda-terms, beta-reduction, and we have seen that this can be described informally by saying that a term M is in the relation →β with a term N (M β-reduces in one step to N) if N can be obtained by replacing a subterm of M of the form (λx.P)Q by P[Q/x].
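As a sketch, the axiom and the three inference rules translate into a Haskell function computing one step, choosing the leftmost redex (Term and subst as in the earlier sketches; Nothing means the term contains no redex):

step :: Term -> Maybe Term
step (App (Lam x m) n) = Just (subst m x n)   -- the β-rule (axiom)
step (App p q) =
  case step p of                              -- P →β P' gives P L →β P' L
    Just p' -> Just (App p' q)
    Nothing -> App p <$> step q               -- P →β P' gives L P →β L P'
step (Lam x p) = Lam x <$> step p             -- P →β P' gives λx.P →β λx.P'
step (Var _)   = Nothing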
2.5 Bracketing conventions
For readability's sake, it is useful to abbreviate nested abstractions and applications as follows:
(λx1.(λx2. ... (λxn.M) ... )) is abbreviated as (λx1x2...xn.M)
( ... ((M1M2)M3) ... Mn) is abbreviated as (M1M2M3...Mn)
We also use the convention of dropping outermost parentheses and those enclosing the body of an abstraction.
Example:
(λx.(x (λy.(yx)))) can be written as λx.x(λy.yx)
2.6 Curried functions
Let us see the use of the notion of curryfication in the lambda-calculus.
As we mentioned before, if we have, say, the addition function, it is always possible to use its currified version AddCurr: N → (N → N), and any time we need to use it, say Add(3,5), we can use instead ((AddCurr(3)) 5). This means that it is not strictly necessary to be able to define functions of more than one argument: it is enough to be able to define one-argument functions. That is why the λ-calculus, which contains only the strictly essential notions of functional programming, has only one-argument abstraction.
Notice that, by using curried functions, not only do we lose no computational power, but we even gain expressive power. For instance, if we wish to use a function that increments an argument by 3, instead of defining it we can simply use AddCurr(3).
Not all functional languages have a built-in curryfication feature. Scheme does not: any function you define in Scheme has a fixed arity, 1 or more, and if you wish to use a curryfied version of a function you have to define it explicitly. In Haskell, instead, all functions are implicitly intended to be currified: when you define Add x y = x + y you are indeed defining AddCurr.
2.7 Multi-step reduction
Strictly speaking, M → N means that M reduces to N by exactly one reduction step. Frequently, we are interested in whether M can be reduced to N by any number of steps. We write
M —» N
if M → M1 → M2 → ... → Mk ≡ N (k ≥ 0).
It means that M reduces to N in zero or more reduction steps.
Example:
((λz.(zy))(λx.x)) —» y
2.8 Normalization and possibility of non termination
Definition
If a term contains no β-redex, it is said to be in normal form.
Example:
λxy.y and xyz are in normal form.
A term is said to have a normal form if it can be reduced to a term in normal form.
Example:
(λx.x)y is not in normal form, but it has the normal form y.
Definition:
A term M is normalizable (has a normal form, is weakly normalizable) if there exists a term Q in normal form such that M —» Q.
A term M is strongly normalizable if there exists no infinite reduction path out of M. That is, any possible sequence of β-reductions eventually leads to a normal form.
Example:
(λx.y)((λz.z)t) is a term not in normal form, normalizable and strongly normalizable too. In fact, the only two possible reduction sequences are:
1. (λx.y)((λz.z)t) → y
2. (λx.y)((λz.z)t) → (λx.y)t → y
To normalize a term means to apply reductions until a normal form (if any) is reached. Many λ-terms cannot be reduced to normal form. A term that has no normal form is
(λx.xx)(λx.xx)
This term is usually called Ω.
Although different reduction sequences cannot lead to different normal forms, they can produce completely different outcomes: a reduction sequence could lead to a normal form while another one could never terminate.
Typically, if M has a normal form and also admits an infinite reduction sequence, it contains a subterm having no normal form that can be erased by some reduction. For instance, the reduction
(λy.a)Ω → a
reaches a normal form, by erasing the Ω. This corresponds to a call-by-name treatment of arguments: the argument is not reduced, but substituted as it is into the body of the abstraction. The attempt to normalize the argument, instead, generates an endless reduction sequence:
(λy.a)Ω → (λy.a)Ω → ...
Evaluating the argument before its substitution into the body of the function corresponds to a call-by-value treatment of function application. In the example above, a call-by-value strategy never reaches the normal form.
One is naturally led to think that the notion of normal form can be looked at as the abstract formalization, in the lambda-calculus, of the notion of value in programming languages. This, however, is too naive an approach. Even if it is intuitively true that something that cannot be further evaluated can be considered a value, the converse does not hold a priori.
As a matter of fact there can be many different interpretations of the notion of value. In Scheme, for instance, a lambda expression is a value: the interpreter does not evaluate any expression of the form λy.M.
Why does the interpreter of Scheme choose to consider a lambda abstraction as a value, independently of what there is in its body? Well, otherwise it would not be possible to define in Scheme, say, a function that takes an argument and then never stops.
So we cannot formalize the notion of value in general. What a value is has to be specified according to the context we are in. So, when we define a functional programming language we need to specify, among other things, also what we mean by value in our language.
2.9 Some important results of β-reduction
theory
The normal order reduction strategy consists in reducing, at each step, the leftmost outermost β-redex. Leftmost means, for instance, to reduce L (if it is reducible) before N in a term LN. Outermost means, for instance, to reduce (λx.M)N before reducing M or N (if reducible). The normal order reduction is obviously a call-by-name evaluation strategy.
Theorem (Standardization)
If a term has a normal form, this is always reached
by applying the normal order reduction strategy.
Theorem (Confluence)
If we have terms M, M1 and M2 such that:
M —» M1 and M —» M2
then there always exists a term L such that:
M1 —» L and M2 —» L
That is, in diagram form:
(diagram omitted: the confluence diamond)
Corollary (Uniqueness of normal form)
If a term has a normal form, it is unique.
Proof:
Let us suppose we have a generic term M with two normal forms N and N'. This means that
M —» N and M —» N'
Let us now apply the Confluence theorem. Then there exists a term Q such that:
N —» Q and N' —» Q
But N and N' are in normal form, and hence they contain no redex. So, necessarily, N —» Q in 0 steps, and hence N ≡ Q. Analogously we can show that N' ≡ Q. It is then obvious that N ≡ N'.
Definition
For a binary relation R, the diamond property holds whenever: if aRb and aRc, then there exists d such that bRd and cRd.
Using the definition above, we can reword the theorem of confluence in a simpler way:
DP(—»)
Note that:
¬DP(→)
The one-step reduction relation does not satisfy the diamond property; that is, it is not true that the corresponding one-step diagram always commutes.
In fact, consider the term (λx.xx)(I a), where I ≡ λx.x. In one step this term reduces to (λx.xx)a or (I a)(I a). These terms can both be reduced to aa, but there is no way to complete the diamond with a single-step reduction. The problem, of course, is that (λx.xx) replicates its argument, which must then be reduced twice.
Sketch of the proof of the Theorem of Confluence
2.10 Formalizing the notion of equivalent programs: the
theory of β-conversion
In the actual programming activity we usually talk about the "computational equivalence" of pieces of code or of whole programs (for instance, when we try to do some optimizations). Being able to formally deal with such a notion would be useful for many things, like safely modifying and optimizing programs or proving properties of programs.
In the present section we formalize the notion of equivalence of programs in the lambda-calculus. In order to do that, we can start by asking ourselves what it means for two terms to be computationally equivalent.
Informally, we could say that two terms
are so when they "represent the same information".
In such a case the following definition should be the right one.
Definition (The relation of β-conversion)
M is β-convertible to N, in symbols
M =β N
if and only if M can be transformed into N (or N into M) by performing zero or more β-reductions and β-expansions (a β-expansion being the inverse of a reduction: P[Q/x] ← (λx.P)Q).
That is, two terms M and N are β-convertible if and only if there exist terms N1,...,Nk (k ≥ 0) such that
M —» N1 «— M1 —» N2 «— M2 ... Mk-1 —» Nk «— Mk ≡ N
Let us now define a formal system allowing us to derive all, and only, the valid judgements concerning the beta-convertibility of terms, that is, judgements of the form M =β N ("M is β-convertible to N").
A formal system for β-conversion
Axiom:

———————————————————
(λx.M) N =β M [N/x]

Inference rules:

P =β P'
—————————————
λx.P =β λx.P'

P =β P'
—————————————
Q P =β Q P'

P =β P'
—————————————
P Q =β P' Q

β-conversion is reflexive, symmetric and transitive too, hence we must also add the following axiom and rules:

————————
M =β M

M =β N
————————
N =β M

L =β M    M =β N
————————————————
L =β N
We have thus formally defined the relation of beta-conversion as
the smallest equivalence relation containing the beta-rule and compatible with the
operations of formation of terms (hence a congruence).
We said before that such a relation aims at being a precise formalization of the notion of equivalence between (functional) programs, since, by its definition, it equates terms containing the very same "information". Are we sure, however, that the "information" we are equating is actually "computational information"?
In order to answer such a question, let us notice that we can look at any term as a function (this is reasonable, since any term can be applied to any other term). From such a point of view, what does it mean for two terms to represent the same "computational information"? Does it just mean that they represent the same function, i.e. that they "return" the same output when given the same input? If this were the case, the notion represented by the relation of β-equivalence would be useless for us computer scientists, since we are not only concerned about what output is produced by a program, given a certain input, but we are also (and mainly) interested in "how" the result is computed. In other words, we are more interested in algorithms than in functions (a computer scientist would not consider a program implementing bubblesort equivalent to a program implementing mergesort, although they both compute the same function, the sorting one).
Fortunately the β-conversion relation does not equate two terms just because they compute the same function, as can be roughly shown by the following counterexample. Let us consider the two terms (λx.yx) and y. They actually compute the same function from terms to terms, since, if we apply them to the same input, say M, we get the very same output:
1. (λx.yx)M —» yM
2. yM —» yM
Nonetheless, (λx.yx) and y are not considered equivalent by the β-conversion relation, since it is easy to show, using the Church-Rosser theorem below, that
(λx.yx) ≠β y
So, informally, we can say not only that two β-convertible terms "contain the same information", but also that this information is actually computational, algorithmic information. It is not wrong then to state that the actual notion formalized by β-conversion is: two terms are β-convertible if and only if they compute the same function using the same algorithm.
For a more convincing example: when later on we represent numerical mathematical functions in the lambda-calculus, you will be able to check that both λnfx.(f(nfx)) and λnfx.(nf(fx)) compute the successor function on natural numbers. Nonetheless, being two distinct normal forms, they cannot be β-convertible (by the Church-Rosser theorem). You can notice in fact that they represent different "algorithms".
And what if we would like to look at lambda-terms from an extensional point of view (that is, from a mathematical point of view, looking at them as functions)? Well, in case we were interested in equating terms just when they produce the same output on the same input, we should consider a different relation: beta-eta-conversion.
The theory of beta-eta-conversion (=βη)
The formal system for =βη is like the one for =β, but it contains an extra axiom:
λx.Mx =η M    if x is not in FV(M)
Instead of adding this axiom we could, equivalently, add the following inference rule, which is more intuitive but has the drawback of having a universally quantified premise:

For all L: ML = NL
——————————————————
M = N
Obviously, from a Computer Science point of view we are more interested in =β than in =βη.
Example: In the theory of beta-eta-conversion it is possible to prove that
(λx.yx) =βη y
Remark: you could now wonder why the two non-β-convertible terms considered before, λnfx.(f(nfx)) and λnfx.(nf(fx)), cannot be made equal even in the beta-eta-conversion theory. Well, the lambda-calculus is a very abstract theory and a term has to be looked at as a function on all possible terms, while the two terms above compute the same function only on a subset of the lambda-terms, that is, on those lambda-terms which represent natural numbers, the so-called Church numerals we shall discuss shortly.
Let us go back to our β-conversion and have a look at a few relevant results of the theory of β-conversion.
Theorem (Church-Rosser)
If two terms are beta-convertible (M =β N), then there exists Q such that
M —» Q and N —» Q
The Church-Rosser theorem enables us to prove that the theory of β-conversion is consistent. In a logic with negation, we say that a theory is consistent if one cannot prove a proposition P and also its negation ¬P. In general, a theory is consistent if not all judgements are provable (i.e. there exist some judgements that are not provable). Thanks to the Church-Rosser theorem, we can show that the λ-calculus is consistent.
Corollary (Consistency of =β)
There exist at least two terms that are not β-convertible.
Proof:
Let us consider the two terms (λx.yx) and y (any pair of distinct normal forms will work the same). By contradiction, let us assume that
(λx.yx) =β y
Then, by the Church-Rosser theorem, there exists a term Q such that
(λx.yx) —» Q and y —» Q
But since the two terms are both in normal form, the above reductions must necessarily be 0-step reductions. This implies that
(λx.yx) ≡ y
This is impossible, hence
(λx.yx) ≠β y
The Church-Rosser theorem and the Confluence theorem are equivalent, and in fact they are often used and referred to interchangeably:
The Church-Rosser theorem (C-R) and the Theorem of Confluence (CONF) are logically equivalent, that is, it is possible to prove each one using the other.
Proof:
C-R => CONF
Let us assume the Church-Rosser theorem and prove confluence. In order to prove confluence, let us take M, M1 and M2 such that
M —» M1 and M —» M2
By definition of β-conversion we then have M1 =β M2. We can now use the Church-Rosser theorem, which guarantees that, whenever M1 =β M2, there exists L such that
M1 —» L and M2 —» L
CONF => C-R
Let us assume the Confluence theorem and prove the Church-Rosser one. We consider two terms M and M' such that
M =β M'
By definition of beta-conversion, this means that we can transform M into M' by performing zero or more β-reductions/expansions:
M —» N1 «— M1 —» N2 «— M2 ... Mk-1 —» Nk «— Mk ≡ M'
We can now use the Confluence theorem on the terms M1, N1 and N2 (since M1 —» N1 and M1 —» N2), inferring that there exists a term L1 such that N1 —» L1 and N2 —» L1.
(diagram omitted)
By iterating the process we get a common reduct of M and M'.
(diagram omitted)
We now give an important undecidability theorem for the theory of beta-conversion.
Definition (Closure under a relation)
A set S is said to be closed with respect to a binary relation R iff a belongs to S and aRb imply that b belongs to S.
Theorem (Undecidability)
Let A be a set of lambda-terms which is not trivial (that is, it is not empty and it does not contain all the lambda-terms). If A is closed under beta-conversion, then it is not recursive (that is, it is not possible in general, given a term, to decide in an effective way whether it belongs to A or not).
3. Representation of data types and operations in
λ-calculus
3.1 The Natural numbers
In the syntax of the
λ-calculus we have not considered primitive data types (the natural
numbers, the booleans and so on), or primitive operations (addition, multiplication and so on),
because they indeed do not represent really primitive concepts.
As a matter of fact, we can represent them by using the
essential notions already formalized in the λ-calculus, namely variable,
application and abstraction.
The question now is: how can we represent in the λ-calculus a concept like that of natural number? We stressed that the λ-calculus is a theory of functions as algorithms. This means that if we wish to represent natural numbers in the λ-calculus, we need to look at them as algorithms. How can that be possible? A number is a datum, not an algorithm! Well... but we could look at a number as an algorithm. For instance, we could identify a natural number with the procedure we use to "build" it.
Let us consider the number 2. How can we "build" it? Well, give me the successor, give me the zero, and I can get the number 2 by iterating the successor function twice on the zero. How can I represent this procedure? By the lambda-term
λfx. f (fx) !!
Notice that the algorithm to build the number 2 is independent of what the zero and the successor actually are. We can abstract from what they actually are (fortunately, since nobody really knows what the zero and the successor are. Do you?).
So, in general, we can represent the natural numbers as lambda-terms as follows:
0 ≡ λfx. x (it represents the natural number 0)
1 ≡ λfx. f x (it represents the natural number 1)
2 ≡ λfx. f (f x) (it represents the natural number 2)
3 ≡ λfx. f (f (f x)) (it represents the natural number 3)
...
n ≡ λfx. f^n x (it represents the natural number n)
where we define f^n x as follows: f^0 x = x ; f^(k+1) x = f (f^k x).
n denotes the lambda-representation of the number n.
The above encoding of the natural numbers is the original one developed by Church. The terms just defined are called Church's numerals. Church's encoding is just one among the many that it is possible to devise.
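As a sketch, Church's numerals can be transcribed into Haskell (Church and toInt are illustrative names):

type Church a = (a -> a) -> a -> a

zero, one, two, three :: Church a
zero  f x = x
one   f x = f x
two   f x = f (f x)
three f x = f (f (f x))

-- interpret a numeral by choosing f = (+1) and x = 0
toInt :: Church Int -> Int
toInt n = n (+ 1) 0          -- toInt three == 3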
The technique of thinking of data as algorithms can be extended to all the (inductively defined) data.
Example:
I can represent an unlabelled binary tree by the function that takes two inputs (the empty tree and the constructor that adds a root on top of two binary trees) and returns as output the expression that represents how to build the desired tree out of the empty tree and the constructor.
In order to be sure that the
representation of natural numbers we have given is a reasonable one, we have to
show that it is also possible to represent in the lambda-calculus the
operations on numbers. But what does it mean for a function on natural
numbers to be definable in the lambda-calculus?
Definition (λ-definability)
A function f: N → N is said to be λ-definable (representable in the λ-calculus) iff there exists a λ-term F such that, for all n:
F n =β f(n)
The term F is said to represent the function f.
Example:
To represent the function factorial, we need to find a λ-term (let us call it fact) such that, whenever it is applied to the Church numeral representing the natural number n, we get a term that is β-convertible to the Church numeral representing the factorial of n; for instance
fact 3 =β 6
Why have we not defined how to represent in the lambda-calculus the numerical functions with more than one argument? Because it is enough to be able to represent their curryfied versions, so the definition given suffices. We shall see shortly how to define binary numerical functions like addition. What we shall do is to look at addition as AddCurr: N → (N → N).
Before seeing how to lambda-represent some numerical functions, let us see how
we can represent the Booleans.
3.2 The Booleans
An encoding of the booleans corresponds to defining the terms true, false and if, satisfying (for all M and N):
if true M N = M
if false M N = N
The following encoding is usually adopted:
true ≡ λxy.x
false ≡ λxy.y
if ≡ λpxy.pxy
As an exercise, check that if actually works.
We have true ≠β false by the Church-Rosser theorem, since true and false are distinct normal forms. All the usual operations on truth values can be defined using the conditional operator. Here are conjunction, disjunction and negation:
and ≡ λpq. if p q false
or ≡ λpq. if p true q
not ≡ λp. if p false true
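As a sketch, the Church booleans can be transcribed into Haskell (tru and fls are illustrative names, since True and False are taken). The operations are written as direct selections, because typing the literal terms "if p q false" etc. in Haskell would require higher-rank polymorphism:

type ChurchBool a = a -> a -> a

tru, fls :: ChurchBool a
tru x y = x                       -- true  ≡ λxy.x
fls x y = y                       -- false ≡ λxy.y

ifC :: ChurchBool a -> a -> a -> a
ifC p x y = p x y                 -- if ≡ λpxy.pxy

notC :: ChurchBool a -> ChurchBool a
notC p x y = p y x                -- not p selects the other branch

andC, orC :: ChurchBool a -> ChurchBool a -> ChurchBool a
andC p q x y = p (q x y) y        -- and ≡ λpq. if p q false
orC  p q x y = p x (q x y)        -- or  ≡ λpq. if p true q

toBool :: ChurchBool Bool -> Bool
toBool b = b True False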
3.3 Arithmetic on Church's numerals
Using Church's encoding of natural numbers, addition, multiplication and exponentiation can be defined very easily as follows:
add ≡ λmnfx. m f (n f x)
mult ≡ λmnfx. m (n f) x
expt ≡ λmnfx. n m f x
Addition is not hard to check. In fact:

add m n ≡ (λmnfx. m f (n f x)) m n
→β (λnfx. m f (n f x)) n
→β λfx. m f (n f x)
≡ λfx. (λgx. g^m x) f ((λhx. h^n x) f x)
→β λfx. (λgx. g^m x) f ((λx. f^n x) x)
→β λfx. (λgx. g^m x) f (f^n x)
→β λfx. (λx. f^m x) (f^n x)
→β λfx. f^m (f^n x)
≡ λfx. f^(m+n) x
≡ m+n
Multiplication is slightly more difficult:

mult m n ≡ (λmnfx. m (n f) x) m n
→β (λnfx. m (n f) x) n
→β λfx. m (n f) x
≡ λfx. (λgx. g^m x) (n f) x
→β λfx. (n f)^m x
≡ λfx. ((λhx. h^n x) f)^m x
→β λfx. (f^n)^m x
≡ λfx. f^(m*n) x
≡ m*n
These reductions hold for all Church numerals m and n.
Here are other simple definitions:
succ ≡ λnfx. f (n f x)
iszero ≡ λn. n (λx. false) true
The following reductions, as you can check by yourself, hold for any Church numeral n:
succ n —» n+1
iszero 0 —» true
iszero (n+1) —» false
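These definitions can also be checked by transcribing them into Haskell; a sketch, reusing the numerals and toInt of the earlier sketch:

type Church' a = (a -> a) -> a -> a    -- same synonym as before

addC, multC :: Church' a -> Church' a -> Church' a
addC  m n f x = m f (n f x)            -- add  ≡ λmnfx. m f (n f x)
multC m n f x = m (n f) x              -- mult ≡ λmnfx. m (n f) x

succC :: Church' a -> Church' a
succC n f x = f (n f x)                -- succ ≡ λnfx. f (n f x)

-- e.g. toInt (addC two three) == 5 and toInt (multC two three) == 6
-- (expt is omitted: the untyped term is the simplest of the three, but
-- giving it a Haskell type takes more care)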
Of course it is also possible to define the predecessor function (pred) in the lambda-calculus, but we do not do that here.
Example:
We can easily give the representation of the following mathematical function on natural numbers
f(n,m) = 3+(m*5) if n=0
f(n,m) = n+(3*m) otherwise
by means of the following lambda-term:
λnm. if (iszero n) (add 3 (mult m 5)) (add n (mult 3 m))
3.4 Defining recursive functions using fixed points
In the syntax of the λ-calculus there is no primitive operation that enables us to give names to lambda-terms. However, giving names to expressions seems to be an essential feature of programming languages in order to define functions by recursion, like in Scheme
(define fact (lambda (x) (if (= x 0) 1 (* x (fact (- x 1))))))
or in Haskell
fact 0 = 1
fact n = n * fact (n-1)
Then, in order to define recursive functions, it would seem essential to be able to give names to expressions, so that we can recursively refer to them. Surprisingly enough, however, that's not true at all! In fact, we can define recursive functions in the lambda-calculus by using the few ingredients we already have at hand, exploiting the concept of fixed point.
Definition
Given a function f: A → A, a fixed point of f is an element a of A such that f(a) = a.
In the lambda-calculus, since we can look at any term M as a function from lambda-terms to lambda-terms, i.e. M: Λ → Λ, we can define a fixed point of M, if any, to be any term Q such that:
M Q =β Q
To see how we can fruitfully exploit the concept of fixed point, let us consider an example. Let us take the definition of factorial, using a generic syntax:
factorial x = if (x=0) then 1 else x * factorial(x-1)
The above equation can be considered a definition because it identifies a precise function. In fact the function factorial is the only mathematical function which satisfies the following equation, where f can be looked at as an "unknown" function:
f x = if x=0 then 1 else x * f(x-1)
The above equation can be rewritten as follows, by using the lambda notation:
f = λx. if x=0 then 1 else x * f(x-1)
The function we are defining is hence a fixed point: the fixed point of the functional
λf.λx. if x=0 then 1 else x * f(x-1)
In fact it can be checked that
(λf.λx. if x=0 then 1 else x * f(x-1)) factorial = factorial
Now, we want to lambda-represent the factorial function. It is easy to see that the functional
λf.λx. if x=0 then 1 else x * f(x-1)
can be easily lambda-represented as:
λf.λx. if (iszero x) 1 (mult x (f (sub x 1)))
Then a lambda-term F that lambda-represents the factorial function must be a fixed point of the above term. Such an F would in fact satisfy the following equation
F x =β if (iszero x) 1 (mult x (F (sub x 1)))
and hence would really be the lambda-representation of the factorial.
But is it possible to find fixed points of lambda-terms? And for which lambda-terms? Well, the following theorem settles the question once and for all, by showing that indeed all lambda-terms have a fixed point. Moreover, in the proof of the theorem it will be shown how easy it is to find a fixed point of a lambda-term.
Theorem
Any λ-term has at least one fixed point.
Proof.
We prove the theorem by showing that there exist lambda-terms (called fixed point operators, or fixed point combinators) which, given an arbitrary lambda-term, produce a fixed point of it. One of these combinators is the combinator Y, discovered by Haskell B. Curry. It is defined as follows:
Y ≡ λf. (λx. f(xx)) (λx. f(xx))
It is easy to prove that Y is a fixed point combinator.
Let F be an arbitrary λ-term. We have that

Y F ≡ (λf. (λx. f(xx)) (λx. f(xx))) F
→ (λx. F(xx)) (λx. F(xx))
→ F ((λx. F(xx)) (λx. F(xx)))
← F ((λf. (λx. f(xx)) (λx. f(xx))) F)
≡ F (Y F)     (3)

That is
F (Y F) =β (Y F)
Then the λ-term that represents the factorial function is:
Y (λf.λx. if (iszero x) 1 (mult x (f (sub x 1))))
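In Haskell the term Y itself cannot be typed (the self-application xx fails the occurs check), but its role is played by the least-fixed-point operator fix, which also exists in the standard library (Data.Function). A minimal sketch:

fix :: (a -> a) -> a
fix g = g (fix g)        -- fix g = g (g (g (...)))

fact :: Integer -> Integer
fact = fix (\f x -> if x == 0 then 1 else x * f (x - 1))

-- fact 3 == 6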
Let us consider another example of a recursively defined function:
g x = if (x=2) then 2 else g x     (*)
What is the function defined by this equation? Well, it is the function that satisfies the equation, where g is considered an "unknown" function. Ok, but we have a problem here, since it is possible to show that there are many functions satisfying this equation. In fact, it is easy to check that the factorial satisfies it:
factorial x = if (x=2) then 2 else factorial x
And the identity function also satisfies the equation, since
identity x = if (x=2) then 2 else identity x
And many more. For instance, let us consider the function gee that returns 2 when x=2 and is undefined otherwise. Well, gee also satisfies the equation:
gee x = if (x=2) then 2 else gee x
So, which one among all the possible solutions of (*), that is, which one among all the possible fixed points of the functional
λg.λx. if x=2 then 2 else g x
do we have to consider as the function identified by the definition (*)?
We all know that gee is the intended function defined by (*). Why? Because implicitly, when we give a definition like (*), we are considering a computational process that identifies the least among all the possible solutions of the equation (least in the sense of "the least defined"). This means that, in order to correctly lambda-define the function identified by an equation like (*), we cannot consider just any fixed point, but the least fixed point (the one that corresponds to the least defined function satisfying the equation).
The question now is: does the Y combinator pick the right fixed point? If a lambda-term has more than one fixed point, which is the one picked by Y? Well, fortunately Y behaves really well: it returns not just a fixed point of a term, but its least fixed point, where, informally, by "least" we mean the one containing only the information strictly needed in order to satisfy the equation. Of course we could formalize in a precise mathematical sense what it means for a term to have "less information" than another one, and hence what the precise meaning of being a least fixed point is, but such a formalization would go beyond the scope of these introductory course notes.
It is enough for us to know that Y computes the least fixed point of the functional
λf.λx. if (iszero (pred (pred x))) 2 (f x)
and hence that the lambda-term representing the function gee is
Y (λf.λx. if (iszero (pred (pred x))) 2 (f x))
Let us look at some other examples.
Example:
Let us consider the λ-term λx.x. It is immediate to check that any term M is a fixed point of λx.x (the identity function), because
(λx.x)M =β M
The least fixed point of λx.x is Ω, since
Y (λx.x) = Ω
It is reasonable that Ω be the least among all the lambda-terms (any lambda-term is a fixed point of the identity), since this term is intuitively the one "containing" the least information (actually no information at all).
Example:
Let us consider the function that diverges on every element of its domain. It can be defined by the following recursive equation:
h x = h x
The λ-term that represents the function h is the solution of the equation
h = λx. h x
that is, the least fixed point of
λf.λx. f x
The least fixed point of this term is hence the lambda-representation of the function that diverges everywhere.
Let us now go back to (3) above, where Y F is proved to be beta-convertible to F (Y F) by means of two β-reductions followed by one β-expansion. Notice that it is impossible to have
Y F —» F (Y F)
If we wish to have such a behaviour we have to look for a different fixed point combinator, for instance Alan Turing's Θ, which is defined in the following way:
A ≡ λxy. y(xxy)
Θ ≡ AA
Its peculiarity is that we have
Θ F —» F (Θ F)
The Why of Y
We now show, in an intuitive way, why the combinator Y (and fixed point combinators in general) works. Let us consider once again the equation that defines the factorial function (and that at the same time provides a way to compute it):
factorial x = if (x=0) then 1 else x * factorial(x-1)
From this equation we can obviously get
factorial x = if (x=0) then 1 else x * (if (x-1=0) then 1 else (x-1) * factorial(x-1-1))
and also
factorial x = if (x=0) then 1 else x * (if (x-1=0) then 1 else (x-1) * (if (x-1-1=0) then 1 else (x-1-1) * factorial(x-1-1-1)))
and so on...
To compute the factorial, in a sense, we unwind its definition as many times as the argument requires, that is, until x-1-1-1...-1 gets equal to 0. Such an unwinding is exactly what the Y combinator does. In fact
Y F =β F(Y F) =β F(F(Y F)) =β F(F(F(Y F))) =β F(F(F(F(Y F)))) =β ... and so on
If you take as F the lambda-term
λf.λx. if (iszero x) 1 (mult x (f (sub x 1)))
then, by applying a beta-reduction, you get
Y F =β λx. if (iszero x) 1 (mult x ((Y F)(sub x 1)))
but you can also get
Y F =β λx. if (iszero x) 1 (mult x (if (iszero (sub x 1)) 1 (mult (sub x 1) ((Y F)(sub (sub x 1) 1)))))
and so on and so on...
That is, (Y F) is beta-convertible to all the possible "unwindings" of the factorial definition. Then, if you apply (Y F) to 2, you can get, by a few beta-reductions,
(Y F) 2 =β if (iszero 2) 1 (mult 2 (if (iszero (sub 2 1)) 1 (mult (sub 2 1) (if (iszero (sub (sub 2 1) 1)) 1 (mult (sub (sub 2 1) 1) ((Y F)(sub (sub (sub 2 1) 1) 1)))))))
We can easily check that the term on the right is beta-convertible to 2, that is, to the factorial of 2.
So, why does
Y work? Because, in a sense, it is an "unwinder", an "unroller" of definitions.
Using Y (and hence recursion) to define infinite objects
We have seen before how to represent a few simple data types in the lambda-calculus, and we have said that it is easy to devise a representation for all the inductively defined data types. In particular, it is possible to represent the data type List, with its basic element emptylist and the list constructor cons, which we shall use below without giving their explicit definitions.
Let us consider the following λ-term:
Y (λl. cons x l)     (*)
It is definitely a fixed point: the fixed point of the term λl. cons x l which, given a list ls as input, returns a new list formed by x followed by ls. It is not difficult to check that the least fixed point of λl. cons x l is the infinite list of x's. This means that the term Y (λl. cons x l) represents an infinite object!
So, in the lambda-calculus Y can be used not only to represent recursively defined functions, but also recursively defined objects. In fact, we could define the infinite list of x's by means of the following equation:
Infinite = cons x Infinite
Of course we cannot
completely evaluate Y (λl. cons x l)
(i.e. get all the information it contains),
since the value of this term would be an infinitely long term (and in fact
it does not have a normal form).
But nonetheless we can partially evaluate it in order to get finite
approximations of the infinite list of x's.
You see, after a few beta reductions and expansions we can obtain
Y F =β (cons x (cons x (cons x (Y F))))
where F is the term λl. cons x l (so that Y F is the term (*))
and this term is a finite approximation of the
infinite list of x's. Of course the term (*) is convertible to all the
possible finite approximations of the infinite list of x's. This means that it is possible to use and work with terms
representing infinite objects. In fact, it is easy to check that
car (Y
(λl. cons x l)) =β
x (**)
where car is the λ-representation of the
function that, given a list, returns its first element. So
it is possible to use the term (*), even if it represents an
infinite object (it is reasonable to be able to pick the first element from an
infinite list.)
What we have seen here shows that in functional programming it is
possible to deal also with infinite objects. Such a possibility, however, is not
present in all functional languages, since when we evaluate an expression in a
functional language we follow a precise reduction strategy, and not all the reduction
strategies enable us to safely handle infinite objects.
In Scheme it is not possible since, in a term like (**) above,
Scheme would keep on endlessly evaluating the argument of car.
In Haskell, instead, the evaluation of a term like (**) terminates.
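For instance (a minimal Haskell sketch; the name ones is ours), an infinite list defined by an equation of the same shape as Infinite = cons x Infinite can be partially evaluated without problems:

-- a minimal sketch: an infinite list defined by a recursive equation,
-- in the style of Infinite = cons x Infinite
ones :: [Int]
ones = 1 : ones

main :: IO ()
main = print (head ones)   -- terminates and prints 1, just as car does on (**)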
In order to better understand this fact, see the section below on evaluation strategies.
3.5 Reduction strategies
One of the aspects that
characterizes a functional programming language is the reduction strategy
it uses. Informally a reduction strategy is a "policy" which chooses a
β-redex , if any, among all the possible
ones present in the term. Abstractly, we can look at a strategy as a function.
Definition
A reduction strategy is a function
S: Λ → Λ
(where Λ is the set of all the λ-terms), such that:
if S(M) = N then M →β N
Let us see, now, some classes of strategies.
Call-by-value strategies
These are all the strategies that never reduce a redex when the
argument is not a value.
Of course we need to precisely formalize what we mean by "value".
In the following example we consider a value to be a lambda term in normal form.
Example:
Let us suppose to have:
((λx.x)((λy.y) z)) ((λx.x)((λy.y) z))
A call-by-value strategy never reduces the two outer redexes of the form
(λx.x)((λy.y) z), because their argument (λy.y) z is not a value. In a
call-by-value strategy, then, we can reduce only one of the two inner redexes (λy.y) z.
Scheme uses a specific
call-by-value strategy, and a particular notion of value.
Informally, up to now we have considered as "values" the terms in
normal form. For Scheme, instead, also a function is considered to be a value
(an expression like λx.M, in Scheme, is a value).
Then, if we have
(λx.x)
(λz.(λx.x) x)
The reduction strategy of Scheme first reduces the biggest redex (the whole term),
since the argument is considered to be a value (a function is a value for Scheme).
After the reduction the interpreter stops, since the term λz.((λx.x) x) is a value for Scheme and hence cannot be further evaluated.
Call-by-name strategies
These are all the strategies that reduce a redex without checking whether its
argument is a value or not. One of its subclasses is the set of lazy
strategies. Generally, in programming languages a strategy is lazy if it is
call-by-name and reduces a redex only if it's strictly necessary to get the
final value (this strategy is called call-by-need too). One possible
call-by-name strategy is the leftmost-outermost one: it picks, among all the
possible redexes, the leftmost-outermost one.
Example:
Let us denote by I the identity function and let us consider
the term:
(I (II)) (I (II))
the leftmost-outermost strategy first reduces the leftmost of the two redexes I (II).
We have seen before that the leftmost-outermost strategy is very
important since, by the Standardization Theorem, if a term has a normal form,
we get it by following such a strategy. The problem is that the number of reduction steps
performed is sometimes greater than strictly necessary.
If a language uses a call-by-value strategy, we have to be careful, because if
an expression has no normal form, its evaluation could go on endlessly.
A fixed point combinator for Scheme-like strategies
Let us consider the functional F that has the factorial function
as fixed point. Then we have that
(Θ F) 3
—» 6
Let us suppose to use a call-by-value strategy like that of Scheme. We get
(Θ F) 3
→ (F (Θ F)) 3
→ (F (F (Θ F))) 3
→ ...
If we wished to use the above fixed point operator to define recursive
functions in Scheme (we do not do that when we program in Scheme, of course),
we would have this problem.
We could overcome this problem, however, by defining a new fixed point
combinator, H, that, if we wished, we could safely define
and use in Scheme as well (do it as an exercise when you study Scheme).
Let us consider the following λ-term:
H ≡ λg.((λf.ff)(λf.g(λx.ffx)))
It is easy to prove that this term is a fixed point operator. In fact we have
that:
(H F) —» F (λx.AAx)
where A is (λf.F(λx.ffx)) and (H F) —» AA. From the
point of view of the computations, we can write it in this form:
(H F) —» F (λx.H F x)
Functionally, F (λx.H F x) has the same behaviour as H F, but Scheme does not
reduce it, because it is a value for Scheme. Then if, for example, F is the
functional whose fixed point represents the factorial function, and we wish to
compute the factorial of 5, we can evaluate the term ((H F) 5). That is, we
have to evaluate
(F (λx.H F x)) 5
and this evaluation is possible without getting into any
infinite reduction sequence. If you compute the factorial of 5 in Scheme in this way,
you will have to wait a bit of time in order to get the result...
3.6 Head Normal
Forms
In the previous sections, we have
implicitly assumed that in the λ-calculus, if a term has no normal form,
then it represents no information at all. For example:
Ω has no normal form and it represents the indeterminate value.
But this is not always true, that is it is possible to have terms that have
no normal form but nonetheless represent ("contain") some information.
We have already seen an example of this fact: the λ-term
Y (λl. cons x l)
(*)
which has no normal form, but represents a precise
object that contains an infinite amount of information. We cannot get such information all
together (it would be an infinitely long term), but we have seen before that we
can get all possible finite approximations of the information it contains.
So, it is definitely possible to use and work with
terms that have no normal form but nonetheless possess
a "meaning".
Note:
As discussed in the section about infinite objects, we have to be careful
about the strategy we intend to use when manipulating expressions without normal form.
If we tried to evaluate an expression like car (Y (λl. cons x l))
using a call-by-value strategy whose values are terms in normal form,
we would get stuck in a loop, since we would endlessly try to get a value
out of Y (λl. cons x l).
In a language like Haskell, instead,
we would obtain x. (When you begin to have a look at Haskell, read also the
notes [HASK] on laziness and infinite objects, available in the reading
material.)
How is it possible to formalize in the lambda-calculus the notion we have
informally discussed above, that is, the notion of a term containing some
information even if it has no normal form?
We do that by means of the definition of head normal form,
the formalization of the notion of finite approximation of the meaning of a term.
Definition
A term is in head normal form (hnf) if, and only if, it has a form as follows:
λx1...xk.yM1...Mn (n,k≥ 0)
where y can be any variable, also one of the xi's.
If a term in head normal form λx1...xk.yM1...Mn had a normal form, the normal form would have the following shape:
λx1...xk.yN1...Nn
where N1,...,Nn would be the
normal forms of M1,...,Mn.
If the term, instead, has no normal form, it is easy to check that in any case the portion of the term
λx1...xk.y
remains the same after any possible sequence of reduction steps.
(Note that a term in normal form is also in head normal form).
Example:
Let us consider the term:
λxyz.z(λx.x)(λzt.t(zz))(λt.t(λx.x))
it is in
normal form, but it is in head normal form too.
Let us consider, now, Ω.
It is not in head normal form and it has no head normal form.
Now, going back to the notion of term containing no information (term
representing the indeterminate), we have just seen that some information can
be obtained out of terms having no normal form. So when can we say that
a
term really contains no information at all? Well, now we can answer: when a
term has no head normal form!
Are we sure this is the right answer? Well, the following fact confirms
that it really is:
Let us take two terms not having head normal form, for example:
Ω
and Ωx
Using the Church-Rosser theorem we can prove that they are not β-convertible. Now,
if we add to the theory of β-conversion the axiom:
Ω =β Ωx
the theory of beta-conversion does remain
consistent. It is no surprise, since this axiom is stating that a term that
contains no information
is equivalent to a term that contains no information, no contradiction in
that!
Then, we can rightly represent the indeterminate with any term that has no head normal form.
In fact, we could consistently add to the theory of beta-conversion the
axiom stating that all the terms with no head normal form are β-convertible
(this intuitively means that we could consistently make "collapse" all the possible
representations of the indeterminate into one single element).
4. Types
4.1 Introduction to types
Functional programming languages can be typed or untyped.
A typed one is, for example, Haskell, while an untyped language is Scheme. By introducing
the notion of type in the λ-calculus we shall be able to study it in an abstract
and simplified way.
We can look at types as predicates on programs or, better, as partial specifications
about the behaviour of programs. In the programming practice the use of types
is helpful to prevent a number of programming errors.
There are two different approaches to the use of
types.
- the approach a' la Curry (the one used in Haskell)
- the approach a' la Church (the one used in Pascal)
In the Curry approach, we first write a program and then
check whether it satisfies a given specification.
Instead, in the Church approach, types are included in the syntax of the
language and, in a sense, represent some constraints we have to follow during
the writing of our programs in order to avoid making particular errors.
In the approach a' la Curry, checking whether a term M satisfies the specification
represented by the type t is usually
referred to as "assigning the type t to the term M".
4.2 Type assignment a' la Curry
In this approach we first write a
program and then we check whether it satisfies the specification we had
in mind. This means that we have two distinct
sets: programs and partial specifications.
If we wish to talk in a general and abstract way about types (a' la Curry) in
functional programming languages we can proceed with the following
formalization:
Programs: the set of λ-terms, i.e. Λ
Partial specifications: the set of types, i.e. the set T defined by
T ::= φ | T → T
where φ is a meta-variable which ranges
over a set of type variables (φ, φ', φ'', a, b, ... etc.).
In this formalization we have considered the simplest
form of type (in fact these types are called simple types).
Elements of T will be denoted by σ, τ , ρ, etc.
There arise now a whole bunch of
questions we should answer:
1) How can
we formally show that, given a program (a term) M and a specification (a type) σ,
M satisfies the specification σ? That is, how can we prove that M has type σ?
2) Is it decidable, given M and σ, whether M has type σ?
I.e., given a program (term) M and a specification (type) σ,
is there any algorithm enabling us to check whether M satisfies σ?
3) Do types play any role during a computation?
4) What is the intrinsic meaning of having type variables in our system? Does the use of type variables have any implications for our system?
5) Is it possible to "compare" types? i.e.,
is it possible to formalize the notion that a type represents a "more general" specification
than another one?
6) Is there any relation among all the possible types
we can assign to a term M? Does there exist a most general type among them? If so, can it be
algorithmically found?
Let us start with question 1).
What we need to do is to devise a formal system whose judgments (typing judgements) are of
the form
M : σ
and whose meaning is: the program M respects the specification σ
(formally, the term M has type σ).
If we think about it, however, we can notice that our typing judgments should provide some more
information than simply M : σ.
In fact, let us consider the term λx.yx. Which sort of specification is satisfied by
such a program?
In case the input is of type φ, a type for λx.yx is definitely
of the form
φ →
something
But now, how can we find out what something could be, if we do not have any
information about the free variable y? Our program (term) takes an input x and returns
the result of the application of y to x. If we do not make any assumption about the
type of the free variable y, we cannot say anything about the type of yx.
This means that the (typing) judgments we are looking for should indeed have the
following shape:
B |— M:σ
where B (called a basis) provides information about
the types of the free variables of M.
The meaning of the typing judgment B |—
M : σ is then:
using the information given by the basis B, the term M satisfies the specification
σ
(i.e. M has type σ under the assumptions provided by the basis B; or,
shortly, M has type σ in B.)
Formally B will be a set of pairs of the form :
variable : type
An example of basis is, for instance,
B = {x : τ , z : ρ}
Let us introduce now the formal system enabling us to infer
types for lambda-terms.
Definition (Curry type assignment system)
Axiom

—————————
B |— x : σ        (if x : σ is in B)

(a variable x has type σ if the statement x : σ is in B).
Inference rules
These rules allow us to derive all the possible valid
typing judgements.
B |— M : σ → τ        B |— N : σ
—————————————————
B |— MN : τ
This rule says that if, by using the assumptions in B, we can prove that M
has type
σ → τ and N has type σ, then we are assured that,
in the basis B, the application MN has type τ.
This rule is usually called (→E): "arrow elimination".
This rule can be "read" also bottom-up, as follows: if we wish to prove that MN has type
τ in B, then we need to prove that M has type σ → τ
in B for some σ and that N has type σ in B.
B, x : σ |— M : τ
—————————————————
B |— λx.M : σ → τ
This rule says that if, by using the assumptions in B and the further
assumption
"x has type σ", we can prove that M has type τ, then we
we are assured that, in B, λx.M has type
σ → τ. This rule is called (→I)
: "arrow introduction". Also this rule can be
"read" bottom-up. Do it as exercise.
A term M is called typeable if there exist a basis B and a type σ
such that the judgment
B |— M : σ can
be derived in the assignment system above.
The above formal system is the core of the
type systems of languages like Haskell, ML and others. This is the main
motivation for which we are studying it.
Remark:
In Haskell, before writing the definition of a
function, we can declare what type our function is intended to have, for instance:
fact :: Int →
Int
By doing that, after defining the function
fact the Haskell
interpreter can check whether our program fact satisfies the given specification.
In particular, the interpreter tries to derive, in a formal system like the one defined before,
the Haskell typing judgement fact :: Int →Int.
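As a concrete sketch (the recursive body of fact is ours; any definition admitting the declared type would do):

fact :: Int -> Int   -- the declared (partial) specification
fact n = if n == 0 then 1 else n * fact (n - 1)
-- the interpreter checks that this definition can be given the type Int -> Int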
Note: In Haskell we do not have the concept of basis, because in
correct programs an identifier is either a bound variable or has been previously associated
with some value.
Example:
Let Ø be the empty set and let us
consider the judgment
Ø |— λxy.x : σ →
(τ → σ)
whose informal meaning is: by considering no assumption
at all (the empty basis), the term
λxy.x satisfies the specification σ → (τ → σ).
If we wish to formally prove that the above judgment is valid, we need to
use the Curry type assignment system.
Let us start from the axiom:
—————————————————
{x : σ, y : τ} |— x : σ
By using (→I) two times, we get
{x : σ, y : τ} |— x : σ
—————————————————
{x : σ} |— λy.x : τ → σ
—————————————————
Ø |— λx.λy.x : σ → (τ → σ)
Let us notice that in the last two steps we have discharged, from the basis
we have started with, the assumptions for the bound variables of the term.
In fact, a basis has to provide information only about free variables,
since the information about bound variables is already contained in the type of the term.
Example:
We want to prove that:
B = {f : σ →
σ, x : σ} |— f (f (fx))
: σ
Starting from B, we use the axiom and the rules of our formal system:
B |— f : σ → σ        B |— x : σ
————————————————— (→E)
B |— (fx) : σ

B |— f : σ → σ        B |— (fx) : σ
————————————————— (→E)
B |— f (fx) : σ

B |— f : σ → σ        B |— f (fx) : σ
————————————————— (→E)
B |— f (f (fx)) : σ
We can go on and derive, from the derivation above, other judgments:
{f : σ → σ, x : σ} |— f (f (fx)) : σ
—————————————————
{f : σ → σ} |— λx.f (f (fx)) : σ → σ
—————————————————
Ø |— λf.λx.f (f (fx)) : (σ → σ) → (σ → σ)
Remark:
There are programs that do not satisfy any
specification, whatever the basis B. For example, let us consider:
B |— xx : τ
It is impossible to derive this judgment in our system, whatever basis B and type τ we choose.
What is the implication of this simple observation? Well, typed
languages restrict the set of programs we can write, but on the other hand the programs we can
write are safer, i.e. they are guaranteed not to contain certain errors and to satisfy their
partial specifications. We lose something, of course, namely flexibility:
the possibility of writing "tricky" and "sly" programs.
So, in typed languages: less flexibility = more safety.
There is a quite relevant property of typeable terms that concerns strong normalizability.
Theorem
Any typeable term is strongly normalizable.
So, since self-application cannot be typed, it follows that
the combinator Y cannot be typed. This implies that we cannot type any term
representing a recursive algorithm. This is also why, whereas in Scheme, if we wished, we
could define recursive programs using fixed-point combinators, in Haskell it is not possible.
Of course this is not a real drawback for Haskell, since it is not realistic to write
actual recursive programs by means of fixed-point combinators.
In Haskell recursion is obtained by the use of recursive equations (which corresponds to
the possibility of giving names to expressions.)
Let us now discuss another relevant property of our type assignment system: Subject
reduction
Saying that M has type σ (M : σ) amounts to saying that M represents a value of type σ. Types do not give information about the form of terms (about their syntax); instead, they provide information about
their intrinsic meaning, their semantics.
Hence, if a term has a certain type, it is important to be sure that
during its evaluation the term "maintains"
the same type.
For example, if a term M represents a program that returns a number, types
would be useless if we could reduce
M to a term that represents a program which returns a list.
Fortunately we can prove the following
Theorem (Subject reduction)
If in a basis B we can prove that M has type σ, that is
B |— M : σ
and if M —» N, then we
can also prove that
B |— N : σ
Thanks to this theorem we are sure that, if we check that a term M has a given type at the
beginning of a computation, then we can forget about types during the
computation, since all the terms that we obtain by reducing M
can be given the very same type.
This also implies that types are irrelevant in the computational process:
they play no role during reduction.
Let us note that the subject reduction property holds only for
reductions and not for expansions. That is, if we have that
B |— N : σ and
M —» N
it is not always true that
B |— M : σ
By means of the Subject
reduction theorem we have also answered question 3).
Let us now try and answer question 4). Question 2) still has to
be answered; this will be done using the answer to 6).
Definition (Type substitution)
Let T be the set of simple types:
a) the type substitution (φ := ρ) : T → T,
where φ is a type-variable and ρ is a type, is inductively defined by:
(φ := ρ) φ = ρ
(φ := ρ) φ' = φ', if φ' ≠ φ
(φ := ρ) (σ → τ) = ((φ := ρ) σ) → ((φ := ρ) τ)
The above definition formalizes the
notion of "substituting a type variable by a type inside a type"
b) We can compose type substitutions, that is, if S1,
S2 are type substitutions, then so is S1oS2,
where:
(S1oS2) σ = S1 (S2 σ)
c) A
type substitution can also be applied to a basis: SB = {x : Sτ | x : τ is in B}
d) A type substitution can also be applied to pairs of
basis and type (it will be useful later on)
S <B, σ> = <SB, Sσ>
Note:
The notion of substitution for types is much simpler than the one for terms,
since in simple types we do not have bound type-variables.
Example: if we apply the type substitution (φ := σ) to the type φ → (τ → φ),
and if τ does not contain φ, we obtain the type σ → (τ → σ)
If for σ,τ there is a substitution S such that Sσ = τ,
we say that τ is a (substitution) instance of σ.
The following lemma shows that if a term has a type,
it has indeed infinitely many types. The Lemma will provide, at the same time, an answer
to question 4).
Lemma (substitution lemma)
If using a basis B we can prove that a term M has a type σ, that is
B |— M : σ
then, for all possible substitutions S, we can also prove that:
S(B) |— M : S(σ)
So the Substitution Lemma states that if a term satisfies a specification, then it satisfies infinitely many ones (because we can have infinitely many possible substitutions.)
Example:
It is easy to prove that
{x : φ →
ψ} |— λy .xy :
φ → ψ
The Substitution Lemma states that if we apply any substitution S,
it is also true that
S({x : φ →
ψ}) |— λy .xy :
S(φ → ψ)
For example, by considering the substitution
(φ := (φ' → φ)) o (ψ := ξ)
it is true that
x : (φ' →
φ) → ξ |— (λy
.xy) : (φ' → φ) → ξ
Example:
If we have derived
Ø |— λx.x : φ →
φ
by the Substitution Lemma we are sure that also the
following judgment is derivable
Ø |— λx.x : (ψ →
φ) → (ψ → φ)
What we have just seen means
that in our formal system we have polymorphic functions (a function is
called polymorphic if it can be correctly applied to objects of
different types).
So, by using type variables, we have implicitly introduced a notion of polymorphism in our
type assignment system.
Since this system is the core of the type system of Haskell, this implies that
Haskell is a language with polymorphism. (In Haskell type variables have names
like a, b, c, etc.)
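For instance (a minimal sketch; the name identity is ours, chosen so as not to clash with the Prelude), one and the same definition can be used at different types:

-- a minimal sketch: one definition, many types,
-- thanks to the polymorphic type a -> a
identity :: a -> a
identity x = x

main :: IO ()
main = do
  print (identity (3 :: Int))   -- used at type Int
  print (identity True)         -- used at type Bool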
To state that the lambda term λx.x has type a → a
is tantamount to saying that λx.x is a polymorphic function, that is, it can be uniformly
applied to any element of any type T, returning an element of type T.
The type a → a is (implicitly) a polymorphic type since, a being a type-variable,
it "represents" all the types with shape T → T.
The polymorphism of our system can be said to be implicit,
since the fact that a term has many types is not explicitly expressed in the
syntax by means of particular operators on types,
but is expressed by results like the Substitution Lemma.
A possible syntax for explicit polymorphism could be the following one:
λx.x :
for all φ.φ → φ
In this case the polymorphic behaviour of the term is explicitly described
by the syntax.
We can now provide, using the notion of type substitution, an answer to question 5)
by giving the following definition.
Definition (more general type)
Let τ and σ be two simple types.
τ is said to be more general than σ
if there exists a substitution S such that σ = S(τ).
That is if the latter can be obtained from the former by means of a type substitution.
We know that if a term M has a type, it has infinitely many ones. Now, it is also
possible to prove that there exists a precise relation among all the types that a given
term has (i.e. all the possible partial specifications of a program are, so to speak,
connected to each other).
That will be an answer to question 6). Such an answer
will also provide an answer to question 2), which up to now we have left unanswered.
4.3 Principal pairs
Theorem
Given a term M, if M is typeable then it has a Principal Pair
<P, π>, where P is a basis and π a type such
that:
1) we can derive the following typing for M
P |— M : π
2) Moreover, if we can derive
B |— M : σ
then there exists a substitution S such that B = S(P) (as a matter of fact B could be a superset of S(P)) and σ =
S(π).
In other words, the principal pair represents the most generic
typing possible for M. That is, any other basis and type that we can use to type M can
be obtained from the principal pair.
This partially answers question 6).
Example:
Let us consider the term:
λx.x
It has principal pair:
<Ø, φ →
φ>
This means that any other typing for the
identity is a substitution instance of
this principal pair.
The proof of the theorem can be given by providing an algorithm that, given a term M, returns
its principal pair pp(M), and fails in case M is not typeable.
This completes the answer to question 6).
Before giving the formal description of the Principal Pair Algorithm, let us first
informally describe how it works.
The algorithm tries to type the given term in the most general way (that is to find the
most general basis and the most general type for the term.)
It tries to do that by using the rules of the assignment system, in a bottom-up way.
If we consider a variable x, well, the only way to type it is by using an axiom,
and the most general axiom for x is definitely the one that has only x in the basis
and a type-variable as type for it.
If we consider an abstraction λx.M
we know that the only way to type it is by using rule (→I).
It is then clear that if we wish to type λx.M in the most general way
by means of this rule, we first need to find the most general typing for M (by using recursively
the algorithm itself. If we fail the whole algorithm fails). Once we have the most general typing for M, if x is in the basis
(with a certain type, say σ) then the most general type for λx.M will be
σ → π (where π is the most general type found for M).
Otherwise, if x is not in the basis (that is, x is not a free variable of M), the
most general typing for λx.M is φ → π, where π
is the most general type found for M and φ is a type variable not yet used
(if x is not a free variable of M, the input of λx.M could be anything,
and hence a type-variable φ is the most general type for such a generic input).
If we consider an application (MN)
we know that the only way to type it is by using rule (→E).
It is then clear that if we wish to type MN in the most general way
by means of this rule, we first need to find the most general typing for M and the
most general typing for N (by using recursively the algorithm itself.
If we fail the whole algorithm fails).
Once we have the most general basis and type for M, say P1 and π1,
and the most general basis and type for N, say P2 and π2,
these bases and types, however, cannot be used to type MN with the rule (→E)
unless π1 is an arrow type having π2 as domain. If this
is not the case, we can profit from the Substitution Lemma and find the simplest
substitution that, when applied to π1 and to (π2 → φ),
makes them equal (φ is a new type variable).
Once we have such a substitution, say S1 (we find it by means of another algorithm called unify.
If it fails the whole algorithm fails), we apply it to P1, π1,
P2 and π2. Unfortunately, we cannot use rule
(→E) yet, since the bases used in the two premises of the rule
must be the same, and in general it is not true that S1 P1 is the same as
S1 P2.
This means that we need to find another substitution S2 (the simplest
one), by means of which we manage to make S1 P1 and S1 P2 equal (in their common variables). If we manage to find such a substitution (otherwise the whole algorithm fails)
we are done, since now we can use rule (→E) to type MN with S2 S1 φ
from the basis
S2 S1 (P1UP2) (we consider the union
because the free variables of M could be different from the free variables of N).
The formalization of the process we have informally presented before is therefore the
following recursive algorithm.
Definition (The principal pair algorithm)
Given a term M we can compute its principal pair pp(M), if any, as follows:
a) For all x, φ : pp(x) = < {x : φ},φ>
b) If pp(M) = <P, π> then:
i) in case x is in FV(M)
(and hence there is a σ such that x : σ is in P) we define
pp(λx.M)
= <P\x , σ → π>
ii) otherwise, in case x is not a free variable, we define
pp(λx.M) = <P, φ →
π>, where φ does not occur in <P, π>.
c) If pp(M1) = <P1, π1>
and pp(M2) = <P2, π2> (we choose,
if necessary, trivial variants such that the <P1, π1>
and <P2, π2>
contain distinct type variables), φ is a type-variable that does not occur in any
of the pairs <Pi, πi>, and:
S1 = unify π1 (π2 → φ)
S2 = UnifyBases (S1 P1) (S1 P2)
then we define
pp(M1M2) = S2oS1 <P1 U P2, φ>. (unify and UnifyBases are defined below.)
Of course, we should also prove that this algorithm is correct and complete,
but this would go beyond our scope.
The principal pair algorithm provides also an answer to
question 2). In fact,
if a term is not typeable the algorithm fails, whereas it returns its principal
pair if it is typeable. So, if a term is typeable we can not only effectively
find this out,
but, in a sense, we can have all its possible types, since any possible typing
can be obtained from the principal pair.
Extensions of the above algorithm are used in languages like Haskell and ML.
Let us roughly see how it is used: let us define a function,
say
Id x = x
Let us now write the following (which amounts to asking the Haskell interpreter the
question: "what is the principal type of Id?")
:type Id
What the interpreter does is to run an algorithm similar to the one we have defined above,
producing the principal type
of the term associated to the identifier Id (the principal type and not the
principal pair, since our Haskell expressions must be closed).
In the case of Id the interpreter returns
a → a
Another possibility in Haskell is instead to "declare" the type of an expression
before we write it, for instance:
Id :: (Int → b) → (Int → b)
After the type declaration we can write the "code" for Id, that is
Id x = x
Now, what the interpreter does is the following: it finds the principal type of the expression
associated to Id and then checks whether the specification we gave, (Int →
b)→ (Int → b), can be obtained out of the principal type of Id by
means of a type substitution.
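For instance (a sketch; the lowercase name idInt is ours, since in Haskell function names must begin with a lowercase letter), the following declaration is accepted because it is a substitution instance of the principal type a → a:

-- a minimal sketch: declaring an instance of the principal type,
-- obtained from a -> a by the substitution a := Int -> b
idInt :: (Int -> b) -> (Int -> b)
idInt x = x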
You can notice that the pp algorithm uses the unify and UnifyBases
sub-algorithms (described below). The unify algorithm takes two types and
checks whether there exists a substitution that makes them equal. That is why
the Haskell interpreter sometimes reports that it does not manage to unify
some type variable, say a: the unify algorithm implemented in the interpreter
does not manage to find a substitution making the two types involved equal.
Definition (Unify and UnifyBases)
Let B be the collection of all bases, S be the set
of all substitutions, and Ids be the identity substitution, which replaces
all type-variables by themselves.
i) Robinson's unification algorithm. Unification of Curry types is
defined by:
unify :: T x T → S
unify φ τ = (φ := τ), if φ does not occur in τ or τ = φ
unify σ φ = unify φ σ
unify (σ → τ) (ρ → μ) = S2oS1, where S1 = unify σ ρ and S2 = unify (S1 τ) (S1 μ)
ii) By means of the operation UnifyBases, unification can be generalized to bases:
UnifyBases :: B x B → S
UnifyBases (B0, x:σ) (B1, x:τ) = S2oS1, where S1 = unify σ τ and S2 = UnifyBases (S1 B0) (S1 B1)
UnifyBases (B0, x:σ) B1 = UnifyBases B0 B1, if x does not occur in B1
UnifyBases Ø B1 = Ids
Notice that unify can fail only in the first alternative, namely when φ properly occurs in τ.
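To make the whole construction concrete, here is a minimal Haskell sketch of pp, unify and UnifyBases for pure lambda-terms (our own illustration, not code from these notes: all names are ours, fresh type-variables are generated by threading a counter, and failure is modelled with Maybe):

import qualified Data.Map as M

data Term = Var String | Lam String Term | App Term Term

data Type = TVar Int | Arrow Type Type
  deriving (Eq, Show)

type Subst = M.Map Int Type     -- finite maps used as substitutions
type Basis = M.Map String Type

applyT :: Subst -> Type -> Type
applyT s t@(TVar a)  = M.findWithDefault t a s
applyT s (Arrow l r) = Arrow (applyT s l) (applyT s r)

applyB :: Subst -> Basis -> Basis
applyB s = M.map (applyT s)

compose :: Subst -> Subst -> Subst   -- compose s2 s1 behaves like S2oS1
compose s2 s1 = M.map (applyT s2) s1 `M.union` s2

occurs :: Int -> Type -> Bool
occurs a (TVar b)    = a == b
occurs a (Arrow l r) = occurs a l || occurs a r

-- Robinson's unification; Nothing models failure (the occurs check)
unify :: Type -> Type -> Maybe Subst
unify (TVar a) t
  | t == TVar a = Just M.empty
  | occurs a t  = Nothing
  | otherwise   = Just (M.singleton a t)
unify t (TVar a) = unify (TVar a) t
unify (Arrow s1 t1) (Arrow s2 t2) = do
  sA <- unify s1 s2
  sB <- unify (applyT sA t1) (applyT sA t2)
  Just (compose sB sA)

-- UnifyBases: make two bases agree on their common variables
unifyBases :: Basis -> Basis -> Maybe Subst
unifyBases b0 b1 = foldr step (Just M.empty) (M.toList b0)
  where
    step (x, s) acc = do
      sAcc <- acc
      case M.lookup x b1 of
        Nothing -> Just sAcc
        Just t  -> do
          sX <- unify (applyT sAcc s) (applyT sAcc t)
          Just (compose sX sAcc)

-- pp n m = (n', basis, type); n threads the next fresh type-variable
pp :: Int -> Term -> Maybe (Int, Basis, Type)
pp n (Var x) = Just (n + 1, M.singleton x (TVar n), TVar n)
pp n (Lam x m) = do
  (n', p, piM) <- pp n m
  case M.lookup x p of
    Just sigma -> Just (n', M.delete x p, Arrow sigma piM)   -- case b) i)
    Nothing    -> Just (n' + 1, p, Arrow (TVar n') piM)      -- case b) ii)
pp n (App m1 m2) = do
  (n1, p1, pi1) <- pp n m1
  (n2, p2, pi2) <- pp n1 m2   -- threading n keeps the type-variables distinct
  let phi = TVar n2
  s1 <- unify pi1 (Arrow pi2 phi)
  s2 <- unifyBases (applyB s1 p1) (applyB s1 p2)
  let s = compose s2 s1
  Just (n2 + 1, applyB s (M.union p1 p2), applyT s phi)

main :: IO ()
main = print (pp 0 (Lam "f" (Lam "x" (App (Var "f") (App (Var "f") (Var "x"))))))

Running it on λf.λx.f (fx) yields (up to the names of the type variables) the principal pair <Ø, (σ → σ) → (σ → σ)> computed by hand in the example above.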
4.4 The typed lambda calculus a' la Church.
We have seen that in the Curry approach to types there is a neat
separation between terms and types. The underlying "programming
philosophy" being that we first write untyped programs and after that we
use the type inference system either to check (possibly under certain
assumptions - the basis -) whether the program is typeable with a given
type, or to find its principal type (by means of the pp algorithm).
In the Church approach, instead, the types get involved during the
construction of the term (program) itself.
In a sense, types are part of the program and are
used as "constraints" preventing the programmer from making certain mistakes.
A fundamental property of the Church approach is that each term has just one single type.
This is achieved by requiring every variable to have (to be declared
to have) just one type. The constraint of declaring the type
of each variable is adopted in several programming languages, like
C, C++, Java, Pascal, and lambda Prolog.
Since in the typed lambda calculus a' la Church each term has just one type, the set of
terms is partitioned into subsets, one for each type.
The set of types is defined as follows:
T ::= INT | BOOL | T → T
Of course we might choose to have even more base types, not just INT and BOOL.
Notice that, unlike in the Curry system, we have no type variables.
In the Curry system, if a term M has a type containing type
variables, say a → a, then M has infinitely many
types (by the Substitution Lemma), whereas in the Church system
a term has just one type. This is also the motivation why, if we wish to
introduce the notion of polymorphism in a Church system, it has necessarily to be explicit polymorphism.
Let us introduce now the rules of our system. Unlike in the Curry
system, they are not type inference rules, but rather term formation
rules, in which types sort of "guide" the correct formation of terms.
For any possible type we assume there exists an infinite set of variables having
that particular type. We assume also that there exist no two
variables with the same name but with different types.
The type of a variable will be denoted by means of
superscripts, like in xt (the variable with name x and type
t.)
We shall denote by Λt the set of well-formed typed lambda terms having
type t.
Definition (Well-formed typed lambda terms)
The (term formation) rules of the typed lambda calculus a' la Church defining the sets Λt,
are the following ones:
For each typed variable xt, xt belongs to
Λt;
If M belongs to Λt1→t2 and N to Λt1, then
(MN) belongs to Λt2;
If M belongs to Λt2, then λxt1.M belongs to Λt1→t2;
Notice that in the last clause, in case x is a free variable of M, it has necessarily the type t1
of the abstracted variable, since variables with the same name do have the same type.
Property (Uniqueness of type)
If M belongs to Λt1 and to Λt2, then t1 = t2.
It is possible to develop also for the typed lambda calculus a' la Church a theory of beta-reduction and a theory of beta-conversion, starting, respectively, from the following axioms
(λxt1.M)N →β M[N/xt1]
(λxt1.M)N =β M[N/xt1]
and proceeding in a similar way as in the untyped lambda-calculus.
In the above axioms, (λxt1.M)N has obviously to be a well-formed term.
It can be proved that M[N/xt1] is well formed if (λxt1.M)N is so
and that they have both the same type (belong to the same Λt).
In the Church typed lambda-calculus, the following questions naturally arise:
- given a term, is it well-formed? that is, has it been formed correctly using the rules
of the system?
- given a well formed term, what is its type?
The answers to these questions are provided by the Type Checking algorithm (TC).
Such an algorithm, given a term, checks whether it has been correctly formed and,
if so, returns its type.
Definition (Type Checking algorithm)
TC(xt) = t if xt belongs to Λt; fails otherwise
TC(MN) = t2 if TC(M) = t1 → t2 and TC(N) = t1; fails otherwise
TC(λxt1.M) = t1 → t2, if TC(M) = t2 and xt1 belongs to Λt1; fails otherwise.
We say that a term M type checks if the TC algorithm does not fail on M.
It is easy to see that TC(M) = t iff M belongs to Λt.
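Since every variable carries its own type, TC is a straightforward recursive function. Here is a minimal Haskell sketch (the names are ours; for brevity it does not check that all occurrences of a variable name carry the same type):

-- a minimal sketch of the TC algorithm for Church-style terms
data Ty = INT | BOOL | Arr Ty Ty
  deriving (Eq, Show)

-- variables and abstractions carry the type of the variable involved
data Tm = V String Ty | Ab String Ty Tm | Ap Tm Tm

tc :: Tm -> Maybe Ty
tc (V _ t)  = Just t
tc (Ap m n) = case (tc m, tc n) of
  (Just (Arr t1 t2), Just t1') | t1 == t1' -> Just t2
  _                                        -> Nothing
tc (Ab _ t1 m) = fmap (Arr t1) (tc m)

main :: IO ()
main = print (tc (Ab "x" INT (V "x" INT)))   -- Just (Arr INT INT)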
As can be expected, it is possible to prove the "Church version" of the subject reduction
property.
Theorem (Type preservation under reduction)
If M belongs to Λt and M —»
N, then N belongs to Λt
Similarly to the Curry system, the following relevant theorem holds:
Theorem (Strong Normalization)
Let M be a well-formed typed term in the Church system, then M is strongly normalizable.
4.5 Relation between the systems of Curry and Church
As can be expected, there exists a strong relation between the two type systems described
in the previous sections.
Given a simply-typed lambda term M, let |M| denote the untyped lambda term obtained from M by removing the type decorations.
Then we have that if M type checks, then |M| is Curry-typeable.
Vice versa, if a lambda term N is Curry-typeable, then we can find suitable types for the bound variables of N, so as to obtain a term in the Church system. More precisely, if we add base types to the types of the Curry system, the following holds:
Proposition
(i) If M is a well-formed term in the Church system and TC(M) = A, then |M| can be given the type A in the Curry system (from a suitable basis). Note that the converse does not hold. Also note that A is not necessarily the principal type of |M|.
(ii) If N can be given a type A in Curry, then there exists a well-formed Church typed term M such that TC(M)=A and |M| = N.
4.6 Lambda-calculus extensions and their type systems a' la Curry.
We have said that the type system a' la Curry we have introduced in the previous sections
is essentially the one used by the language Haskell. However, if we wish to investigate
in an abstract way functional languages like Haskell, we cannot use just the pure
lambda-calculus terms to represent programs.
In fact, we know that lambda-terms typeable in the type system a' la Curry are strongly
normalizable, whereas not all typeable Haskell terms are so.
We cannot type fixed point operators like Y and this prevents using the notion
of recursion in the simply typed lambda-calculus a' la Curry.
In Haskell, instead, we have recursion (since we have the possibility of giving names to
terms).
Hence, if we wish to study the properties of languages like Haskell in a still general
and abstract framework, a possibility is to extend the lambda-calculus in order to
make it a bit closer to a real language, for what concerns both terms and types.
Let us call LC+ our extended calculus.
We assume LC+ to contain constants. For instance constants for numbers
(once we know that we can represent numbers in the pure lambda-calculus, we can use
constants instead, for the sake of readability and computational efficiency):
0, 1, 2, etc., constants for booleans: true, false,
constants for numeric functions like add, mul, etc.,
and also a constant for the conditional expression: if.
If we wish we can introduce a constant for the pair constructor,
[-], and many more.
(For simplicity's sake we shall denote by [M,N]
the term [-]MN.) If we have the pair constructor we also need the
two selectors on pairs:
first and second.
Notice the difference between having a constant for a certain value, say true,
and having that value represented by a term, λxy.x (we have called it
true just in order to better handle it in our expressions.)
As discussed before, since we wish to have a typed calculus and we wish to deal with
recursion, we need to introduce the notion of recursion explicitly, by means
of a recursion operator constant: fix.
The set of terms of LC+ can be defined almost as the set of terms of the
pure lambda-calculus. The terms are built using abstraction and application, starting
from variables and constants.
Adding constants to our calculus, however, is not enough.
If, from now on, you ask your friends to call you "Carlo Rubbia", is this enough
to make you a great scientist? Of course not. It is not the name "Carlo Rubbia"
that makes a person a great scientist. It is how the person "behaves".
A cat is a cat not because you call it "cat", but because it catches mice.
So, it is not enough to extend the lambda-calculus with a constant add
in order to have the operation of addition in LC+. We need to introduce in
LC+ something that makes add behave like the addition operation.
So, we add new axioms to the reduction theory of LC+, besides
the one of beta-reduction: axioms that specify the computational "behaviour" of the constants,
axioms like
add 1 2 -->add 3
add 4 5 -->add 8
etc. etc.
And all those that are necessary for the constants we have, like
if true M N -->if M
if false M N -->if N
first [M,N] -->first M
etc. etc.
Of course we must not forget the axiom describing the computational behaviour
of fix:
fix F -->fix F (fix F)
We can now take into account the type system a' la Curry for LC+.
Since we have constants for numbers, booleans, etc., we need also to have base types like
INT, BOOL, etc.
Of course in our system we have arrow types, but since we have introduced pairs, we
need to have product types as well. That is, if T1 and T2 are types, then
T1xT2 is a type too.
Now, in order to have a reasonable type system, we cannot allow constants to have
arbitrary types. When we use our formal system to infer a type for a term
it would be unreasonable to assume that 3 has type φ → BOOL.
So, what we need to do is to prevent constants from appearing in a basis B and to
extend our type inference system with new axioms, like
———————
B |— 3 : INT
and
—————————————
B |— add : INT→INT→INT
for any possible basis B.
And so on.
Notice that in the axiom for if we need to specify that it can be
polymorphically typed:
—————————————
B |— if : BOOL→Sφ→Sφ→Sφ
for any possible basis B and substitution S.
For what concerns the fix operator, its typing axiom is obviously
———————————
B |— fix : (Sφ→Sφ)→Sφ
for any possible basis B and substitution S.
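This is exactly the type scheme of the fix function that Haskell provides in Data.Function. For instance (a sketch reusing the infinite list of x's of Section 3):

import Data.Function (fix)    -- fix :: (a -> a) -> a, matching the axiom above

-- the infinite list of x's, here at the instance a := String
xs :: String
xs = fix ('x' :)

main :: IO ()
main = putStrLn (take 5 xs)   -- prints "xxxxx", a finite approximation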
To deal with product types we need to add an inference rule:
B |— M : σ B |— N : τ
———————————————
B |— [M,N] : σxτ
Of course the property of strong normalization for typeable terms does not hold anymore in LC+.
We can now use LC+ and its typing system to describe particular notions
of programming languages like Haskell and to study their properties.
For instance, we can formally define the lazy reduction strategy of Haskell
by means of the following formal system for typeable terms of LC+.
The Lazy reduction strategy
Axioms

———————
M -->lazy N

if M --> N is a reduction axiom, that is the axiom of beta-reduction or one of those for the operator constants (for instance if true M N -->if M).
Rules
M -->lazy M'
---------------------------
add M N -->lazy add M' N
M -->lazy M'
---------------------------
add n M -->lazy add n M'
if n is a numeric constant
M -->lazy M'
---------------------------
Eq? M N -->lazy Eq? M' N
M -->lazy M'
---------------------------
Eq? n M -->lazy Eq? n M'
if n is a numeric constant
M -->lazy M'
------------------------------------
if M N P -->lazy if M' N P
M -->lazy M'
---------------------------
first M -->lazy first M'
M -->lazy M'
---------------------------
second M -->lazy second M'
M -->lazy M'
---------------------------
MN -->lazy M'N
Notice that the lazy reduction strategy reduces neither inside an abstraction
nor inside a pair.
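Haskell behaves analogously. A minimal check (the code is ours) that nothing is reduced inside a pair:

-- the diverging component of the pair is never evaluated,
-- because the lazy strategy does not reduce inside pairs
main :: IO ()
main = print (fst (1 :: Int, undefined))   -- prints 1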
Of course it is possible to define extended versions of the lambda-calculus
with type systems a' la Church as well (the system PCF is a well-known example of a typed
extended lambda-calculus a' la Church).