Short Introduction
to
Functional Programming
and
Lambda-calculus

 

 

Franco Barbanera

 

Contents

1. Why functional programming?


2. An imperative programming style


3. A functional programming style


4. Functions in mathematics and functions as programs


5. Functional programs


6. The evaluation of a functional program


7. The order of evaluation


8. Recursion


9. Higher order functions


10. Data Structures


11. Correctness proofs of functional programs


12. Typed Languages


13. The λ-calculus and computation theory


14. Types and Typed λ-calculi


15. Introduction to Subtyping

    Sections 1-12 are partly based on
Chap.1 of R. Plasmeijer, M. van Eekelen,
"Functional Programming and Parallel Graph Rewriting"

    Section 15 is partly based on
Karl Schnaitter,
"Subtyping in the Lambda-Calculus"












1.  Why functional programming?

  Imagine the availability of perfect computer systems: software and hardware systems that are user-friendly, cheap, reliable and fast. Imagine that programs can be specified in such a way that they are not only very understandable but that their correctness can easily be proven as well. Moreover, the underlying hardware ideally supports these software systems, and superconductivity in highly parallel architectures makes it possible to get an answer to our problems at dazzling speed.

  In reality, people always have problems with their computer systems. One does not often find a bug-free piece of software or hardware. We have learned to live with the software crisis, and have to accept that most software products are unreliable, unmanageable and unprovable. We spend money on a new release of a piece of software in which old bugs are removed and new ones are introduced. Hardware systems are generally much more reliable than software systems, but most hardware systems appear to be designed in a hurry and even well-established processors contain errors.

  Software and hardware systems have clearly become very complex. A good, orthogonal design needs many person-years of research and development, while at the same time pressure increases to put the product on the market. So it is understandable that these systems contain bugs. The good news is that hardware becomes cheaper and cheaper (thanks to very large scale integration) and speed can be bought for prices one never imagined. But the increase of processing power automatically leads to an increase of the use of that power, with an increasing complexity of the software as a result. So computers are never fast enough, while the complexity of the systems is growing enormously. The two key problems that the computer science community has to solve are:

  • How can we, at low cost, make large software systems that remain reliable and user-friendly?
  • How can we increase processing power at low cost?

    Researchers are looking for solutions to these problems: by investigating software engineering techniques, to deal with problems related to the management of software projects and the construction and maintenance of software; by designing new proof techniques to tackle the problems in proving the correctness of systems; by developing program transformation techniques, to transform the specification of a problem into a program that solves it; and by designing new (parallel) computer architectures using many processors (thousands or more). In the meantime the quest for revolutionary new technologies (e.g. optical chips, superconductivity) is always going on.

    Another approach is based on the idea that the problems mentioned above are fundamental problems that cannot be solved unless a totally different approach is used and hardware and software are designed with a completely different model of computation in mind.




    2.  An imperative programming style

       Most computer programs are written in an imperative programming language in which algorithms are expressed by a sequence of commands.
    These languages, such as FORTRAN, C, Algol, COBOL, PL/1 and Pascal, were originally derived from (and form an abstraction of) the computer architectures they run on. These architectures, although different in detail, are all based on the same model: the Von Neumann computer architecture. The Von Neumann architecture is in turn based on the mathematical model of computation proposed by Turing in 1937: the Turing machine.

    Imperative programming languages are those based on the Turing-machine model of computation.
    The fundamental concepts in imperative programming languages (coming from the Turing Machine model) are:

    • store
    • variable - something that can be modified (a "memory cell")
    • assignment
    • iteration

       The great advantage of the Turing model and hence of the corresponding physical architectures is that they are extremely simple. The Von Neumann architecture consists of a piece of memory that contains information that can be read and changed by a central processing unit (the CPU). Conceptually there are two kinds of information: program instructions in the form of machine code, information that is interpreted by the CPU and that controls the computation in the computer; and data, information that is manipulated by the program.
       This simple concept has made it possible to make efficient realizations in hardware at a relatively low cost. In the beginning of the computer era, the ‘high-level’ imperative programming languages were designed to provide a notation to express computations in a machine-independent way. Later on, one recognized the importance of expressing computations such that programs are understandable to human beings and their correctness can be proven. It became clear that, in order to make it possible to reason about programs, not every machine instruction should have a direct equivalent in the high-level programming language. For instance, it is common knowledge that with the use of GOTO statements, which are the direct abstraction of the branch and jump instructions that are available on any computer, programs can be written in such a way that reasoning about them is almost impossible (Dijkstra, 1968). We strongly believe that a similar kind of problem is caused by the assignment statement.

    Consider the following example written in an imperative programming style:

    BOOLEAN even := TRUE;
    :
    PROCEDURE calculate (INTEGER value) : INTEGER;
    BEGIN
     even := NOT even;
     IF even
      THEN value + 1
      ELSE value + 2
     ENDIF
    END;
    :
    print(calculate (6));
    :
    print(calculate (6));
    
    Both print statements in this program are syntactically the same. Still, they may produce different results. Each call prints either the value 7 or the value 8, but which of the two is printed depends on the number of times the procedure calculate has been called before. The result returned by the procedure depends not only on the actual value of its argument, but also on the value the global boolean has at that particular moment. This value is "secretly" changed in each procedure call. Such side-effects cause the result of a procedure call to be context dependent. Results become very hard to predict when, in a large program, global variables are changed all over the place.

    One of the most important drawbacks of an imperative programming style is that an imperative program consists of a sequence of commands of which the dynamic behaviour must be known to understand how such a program works. The assignment causes problems because it changes the value (and often even the meaning) of a variable. Owing to side-effects it can happen that evaluating the same expression in succession produces different answers. Reasoning about the correctness of an imperative program is therefore very difficult.

    Furthermore, because of the command sequence, algorithms are more sequential than is necessary. Therefore it is hard to detect which parts of the algorithm can or cannot be executed concurrently. This is a pity, since concurrent evaluation seems to be a natural way to increase execution speed.
    A conjecture adopted by many researchers nowadays is that the software crisis and the speed problem are inherent to the nature of imperative programming languages and the underlying model of computation. Therefore, other styles of programming, such as object oriented, logical and functional styles of programming are investigated.



    3.  A functional programming style

       John Backus (1978) pointed out that the solution for the software problems has to be found in using a new discipline of programming: a functional programming style instead of an imperative one.
    In a functional program the result of a function call is uniquely determined by the actual values of the function arguments. No assignments are used, so no side-effects are possible. As a consequence, it makes no difference where and under what conditions a function is called. The result of a function will, under all conditions, be determined solely by the value of its arguments. For instance, in the program below the value printed will be 42 in both cases.

    FUNCTION increment (INTEGER value) : INTEGER;
    BEGIN
    value + 1
    END;
    :
    print(increment (41));
    :
    print(increment (41));
    
    It is much easier to reason about the result produced by a function that has no side-effects than about the result of an imperative procedure call with side-effects.

    Advantages of functional programming languages
       So perhaps a functional programming style is important and side effects, and hence the assignment statement, should be abandoned. But why should we use a functional programming language? Is it not possible to use the familiar languages? The common imperative languages also have functions, so why not restrict ourselves and only use the functional subset of, for instance, C, Algol or Modula-2?
    Well, one can use the functional subset of imperative languages (i.e. only using functions and leaving out the assignment), but then one is deprived of the expressive power of the new generation of functional languages that treat functions as "first class citizens". In most imperative languages functions can only be used restrictively. An arbitrary function cannot be passed as an argument to a function nor returned as a result.

       Functional programming languages have the advantage that they offer a general use of functions which is not available in classical imperative languages. This is a fact of life, not a fundamental problem. The restricted treatment of functions in imperative languages is due to the fact that when these languages were designed people simply did not know how to implement the general use of functions efficiently. It is also not easy to change the imperative languages afterwards. For instance, the type systems of these languages are not designed to handle these kinds of functions. Also, the compilers would have to be changed dramatically.

       Another advantage is that in most modern functional programming language(s) (FPLs) the functional programming style is guaranteed: the assignment statement is simply not available (like GOTOs are not available in decent modern imperative languages). FPLs in which there are no side-effects or imperative features of any kind are called pure functional languages. Examples of pure functional languages are Miranda, LML (Augustsson, 1984), HOPE (Burstall et al., 1980), Haskell (Hudak et al., 1992) and Concurrent Clean (Nöcker et al., 1991b). LISP (McCarthy, 1960) and ML (Harper et al., 1986) are examples of well known functional languages which are impure. From now on only pure aspects of FPLs are considered.

       In functional programming the relevant notions of imperative programming are absent (or, when present, have a different meaning).
    In fact, the relevant notions of functional programming are:

    • expression
    • recursion - different from the notion of iteration
    • evaluation - different from the notion of execution (when we evaluate a mathematical expression we do not modify anything)
    • variable - intended in a mathematical sense (unknown entity, abstraction from a concrete value)

       In pure FPLs the programmer can only define functions that compute values uniquely determined by the values of their arguments. The assignment statement is not available, nor is the heavily used programming notion of a variable as something that holds a value that is changed from time to time by an assignment. Rather, the variables that exist in purely functional languages are used as in mathematics: to name and refer to a yet unknown constant value. This value can never be altered. In a functional style a desired computation is expressed in a static fashion instead of a dynamic one. Due to the absence of side-effects, program correctness proofs are easier than for imperative languages (see Section 11). Functions can be evaluated in any order, which makes FPLs suitable for parallel evaluation (see Section 6). Furthermore, the guaranteed absence of side-effects enables certain kinds of static analysis of a program.

       Besides the full availability of functions, the new generation functional languages also offer an elegant, user-friendly notation. Patterns and guards provide the user with simple access to complex data structures; basically one does not have to worry about memory management any more. Incorrect access of data structures is impossible. Functional programs are in general much shorter than their conventional counterparts and thus in principle easier to enhance and maintain.

       The question arises, is it possible to express any possible computation by using pure functions only? Fortunately, this question has already been answered years ago. One of the greatest advantages of functional languages is that they are based on a sound and well-understood mathematical model of computation: the λ-calculus (Church, 1932; 1933).
    One could say that functional languages are sugared versions of this calculus. The λ-calculus was introduced at approximately the same time as the Turing model (Turing, 1937). Church’s thesis (Church, 1936) states that the class of effectively computable functions, i.e. the functions that intuitively can be computed, is the same as the class of functions that can be defined in the λ-calculus. Turing formalized machine computability and showed that the resulting notion of Turing computability is equivalent to λ-definability. So the power of both models is the same. Hence any computation can be expressed using a functional style only.

    Disadvantages of functional programming languages
       The advantages mentioned above are very important. But although functional languages are being used more frequently, in particular as a language for rapid prototyping and as a language in which students learn how to program, functional languages are not yet commonly used for general purpose programming. The two main drawbacks lie in the fields of

  • efficiency and
  • suitability for applications with a strongly imperative nature.

       Firstly, until recently programs written in a functional language ran very, very slowly in comparison with their imperative counterparts. The main problem is that the traditional machine architectures on which these programs have to be executed are not designed to support functional languages. On these architectures function calls are relatively expensive, in particular when the lazy evaluation scheme (see Section 7) is used. Our computers are ideally suited for destructive updates as present in imperative languages (assignments), but these are conceptually absent in functional languages. It is therefore not easy to find an efficient compilation scheme. Another big problem is caused by the fact that for some algorithms, due to the absence of destructive updates, the time and space complexity can be much worse for a functional program than for its imperative equivalent. In such a case, to retain the efficiency, program transformations are necessary.
       By using several new (and old) compilation techniques, the efficiency of (lazy) functional programs can nowadays be made acceptable in many (but certainly not all) cases. New compilation techniques have been developed at several research institutes. These techniques are quite complex. Commercial compilers are therefore not widely available yet. This will soon change. Anyhow, in our opinion, the advantages of functional languages are so important that some loss of efficiency is quite acceptable. One has to keep in mind that decades ago we accepted a loss of efficiency when we started to use high-level imperative languages instead of machine assembly languages.
       Secondly, a very important drawback of functional languages was that some algorithms could not be expressed elegantly in a functional programming style. In particular, this seemed to hold for applications that strongly interact with the environment (interactive programs, databases, operating systems, process control). But, the problem is largely caused by the fact that the art of functional programming is still in development. We had and still have to learn how to express the different kinds of applications elegantly in a functional style. We now know that strongly interactive applications can be expressed very elegantly in a functional programming style. One example is the way interactive programs that use windows, dialogs, menus and the like can be specified in a precise and elegant way in many functional languages, like Clean and others.
       The advantages of a functional programming style are very important for the development of reliable software. The disadvantages can be reduced to an acceptable level. Therefore we strongly believe that one day functional languages will be used worldwide as general purpose programming languages.



    4.  Functions in mathematics and functions as programs

       In mathematics, there are several ways to define a function. The type of a function can be specified separately from the function definition. One way to define a function is by explicit enumeration of all objects in the domain on which the function is defined with their corresponding images. An example of this is the following partial function (domain names Z and N are used for the domains of integers and natural numbers).

    abs: Z → N
    abs(-1) = 1
    abs( 0) = 0
    abs( 1) = 1

       Another way to define functions is by using definitions that consist of one or more (recursive) equations. For example, with this method the abs-function above can easily be defined as a total function, applicable for all objects in the domain. Of course, the functions and operators used on the right-hand side must be defined on the appropriate domains.

    abs: Z → N
    abs(n) = n,   n > 0
           = 0,   n = 0
           = -n,  n < 0

    A function like factorial can be defined as follows:

    fac: N → N
    fac(n) = 1, n = 0
    fac(n) = n * fac (n - 1), n > 0

    or an alternative definition method is:

    fac: N → N
    fac(n) = if (n=0) then 1 else n * fac (n - 1)

       A mathematician would consider the definitions above as very common, ordinary function definitions. But these examples are also perfect examples of function definitions in a functional programming language. Notationally a function definition in a functional language has many similarities with a function definition in mathematics. However, there is an important difference in objective. The objective in a functional language is not only to define a function, but also to define a computation that automatically computes the image (result) of a function when it is applied to a specific object in its domain (the actual argument of the function).
    Some function definitions, well-defined from a mathematical point of view, cannot be defined similarly in a functional language, because the images of some functions are very difficult to compute or even cannot be computed at all.

    Consider, for example, the following function definition:

    halting: All_Programs → N
    halting(p) = 1, if the execution of p will stop
               = 0, otherwise

    The halting function defined above is not computable, and therefore an attempt to express it in an FPL will not produce the desired computation. Suppose that a function in an FPL tried to calculate the image of the halting function for an arbitrary program. The only way of doing this is, more or less, running the program. But then the function would simply not terminate whenever the argument program does not terminate, and in that case the result 0 would never be produced.

    For another example, consider:

    f: R → R ,   g: R → R
    f '' - 6g' = 6 sin x
    6g'' + a² f ' = 6 cos x
    f(0) = 0,  f '(0) = 0,  g(0) = 1,  g'(0) = 1

    The equations for f, g and their derivatives f ', f '', g' and g'' are solvable, but it is not easy to compute such functions.


        Some special purpose programming languages are able to calculate functions by applying special purpose calculation techniques (symbolic computation using computer algebra techniques or formula transformations). But a general purpose functional programming language uses a very simple model of computation, based on substitution. So when functions are defined in an FPL a computation through substitutions is defined implicitly (see Section 6).

    In a sense, a function definition in an FPL, besides identifying a precise mathematical function, implicitly describes a way to compute it (an algorithm).



    5.  Functional programs

    A program written in a functional language consists of a collection of function definitions, possibly recursive, and an initial expression that has to be evaluated.

    From now on, for our running examples, we shall use the functional programming language SCHEME.


    In Scheme a function definition consists of an expression of the form

    (define (function-name arg1 .. argn) body)
    
    The part immediately after the keyword "define" defines the function name and its formal arguments (also called formal parameters). The expression body specifies the function result. Such an expression can be a denotation of some value, or it can be a formal argument, or a function application.

    In a function application a function is applied to an expression, the actual argument. The application of a function f to an expression a is represented, in Scheme, as
    (f a)
    
    So function application is denoted by simple juxtaposition of the function and its argument.

    Below are some examples of function definitions in Scheme. The ";;" indicates that the rest of the line is a comment.
    (define (ident x)  x)   
    ;; the identity function on numbers
    
    (define (alwaysseven x) 7)  
    ;;  a function from num to num that yields 7, 
    ;; independent of arg. x
    
    (define (inc x) (+ x 1))  
    ;;  a function that returns the value of its arg. + 1
    
    (define (square x) (* x x))  
    ;; square function
    
    (define (squareinc x) (square (inc x)))     
    ;; square increment of argument
    
    (define (fac x) (if (= x 0) 1 (* x (fac (- x 1)))))   
    ;; the factorial function
    

    Notice: in Scheme all functions are applied using the prefix notation.

       In the last example if is a predefined operator with three arguments: a boolean, a then expression and an else expression. Its semantics corresponds to a conditional choice in imperative languages.

       A formal argument, such as x in the example above, is also called a variable. The word variable is used here in the mathematical sense of the word that is not to be confused with the use of the word variable in an imperative language. This variable does not vary. Its scope is limited to the equation in which it occurs (whereas the defined function names have the whole program as scope).
    Functions defined as above are called user-defined functions. One can also denote and manipulate objects of certain predefined types with given predefined operators. These predefined basic operators can be regarded as predefined functions.

    The initial expression is the expression whose value has to be calculated. For example, in the case that the value of   2 + 3   has to be calculated, the initial expression
    (+ 2 3)
    
    is written. But one can also calculate any application of user-defined functions:
    (squareinc 5)
    



    6.  The evaluation of a functional program

       The execution of a functional program consists of the evaluation of the initial expression in the context of the function definitions the program is made of; such a context is called the environment.

       An example of a functional program: a set of function definitions and an initial expression.

    (define (ident x)  x)   
    
    (define (inc x) (+ x 1)) 
    
    (define (square x) (* x x)) 
    
    (define (squareinc x) (square (inc x))) 
    
    (squareinc 7)
    
    Running such a program means first having all the define-expressions evaluated (their evaluation enriches the environment with new user-defined functions) and then having the initial expression evaluated (of course, the define-expressions are evaluated once and for all).
    For instance, when we open the DrScheme or PLT implementation of Scheme, two windows appear. We write in the upper window all the define-expressions and then we ask the interpreter to evaluate them (roughly, the interpreter is the program that evaluates Scheme expressions).
    In the lower window there appears a prompt '>' (showing that the interpreter is ready to evaluate an expression). We write our initial expression after the prompt and the interpreter returns its value (if any):
    > (squareinc 7)
    64
    > 
    
       Let us see now how the Scheme interpreter evaluates an expression.
    The evaluation of the initial expression performed by the (interpreter of the) language consists of repeatedly performing reduction or rewriting steps.
    In each reduction step (indicated by a  '→') a function application in the expression is replaced (reduced, rewritten) by the corresponding function body, substituting the formal arguments by the corresponding actual arguments. A (sub)expression that can be rewritten according to some function definition is called a redex (reducible expression). The basic idea is that the reduction process stops when none of the function definitions can be applied any more (there are no redexes left) and the initial expression is in its most simple form, the normal form. This is the result of the functional program that is then printed out.

       For instance, given the function definitions above (the environment), the initial expressions below can be evaluated (reduced) as follows.

    (ident 42) → 42

    (square (+ 1 2)) → (square 3) → (* 3 3) → 9

    (squareinc 7) → (square (inc 7)) → (square (+ 7 1)) → (square 8) → (* 8 8) → 64

    However, the initial expression may not have a normal form at all. As a consequence, the evaluation of such an initial expression will not terminate. In some functional languages (not in Scheme), infinite computations may produce partial results that will be printed as soon as they are known.

    Example of a non-terminating reduction. Take the following definition:

    (define (inf x) (inf x))
    
    then the evaluation of the following initial expression will not terminate:

    (inf 2) → (inf 2) → (inf 2) → ...



    7.  The order of evaluation

       Because there are in general many redexes in one expression, one can perform rewrite steps in several orders. The actual order of evaluation is determined by the reduction strategy which is dependent on the kind of language being used. A reduction strategy is a policy used by an interpreter to choose the next redex to be reduced among the many possible ones.

       There is a very important thing to know about the ordering of reduction steps. Due to the absence of side-effects, the result of a computation does not depend on the chosen order of reduction. Once all the redexes have been reduced away, the result of the computation (if any) will always be the same: the normal form (the result) is unique (see the Confluence Property and its Corollary, in Section 13, for a formal proof of this fact).
    For instance, one can compute one of the previous expressions in a different order, but the result is identical:

    (square (+ 1 2)) → (* (+ 1 2) (+ 1 2)) → (* 3 (+ 1 2)) → (* 3 3) → 9

       It is sometimes even possible to rewrite several redexes at the same time. This forms the basis for parallel evaluation.
    Reducing several redexes at the same time:

    (square (+ 1 2)) → (* (+ 1 2) (+ 1 2)) → (* 3 3) → 9


       However, the order is not completely irrelevant. Some reduction orders may not lead to the normal form at all. So a computation may not terminate in one particular order while it would terminate when the right order was chosen.

    Example of a non-terminating reduction order.
    Assume that the following (recursive) functions are defined:

    (define (inf x) (inf x))
    
    (define (alwaysseven x) 7)
    
    Now it is possible to repeatedly choose the "wrong" redex which causes an infinite calculation:

    (alwaysseven (inf 2)) → (alwaysseven (inf 2)) → ...

    In this case another choice would lead to termination and to the unique normal form:

    (alwaysseven (inf 2)) → 7

       The reduction strategy followed by the interpreter depends on the kind of Functional Programming Language. In some languages, e.g. LISP, ML and HOPE, the arguments of a function are always reduced before the function application itself is considered as a redex. These languages are called eager or strict languages. The reduction strategies they use are called call-by-value strategies.
    In most recent FPLs, e.g. Miranda and Haskell, the rewriting is done lazily. In lazy functional languages the value of a subexpression (redex) is calculated if and only if this value must be known to find the normal form. The lazy evaluation order will find the normal form if it exists. The strategies used by such languages belong to the class of call-by-name strategies.

    The (alwaysseven (inf 2)) expression shows one advantage of using call-by-name strategies. In Haskell we shall see that this sort of strategy also enables us to define and use infinite data structures. One could then wonder why we need to have languages with call-by-value strategies at all. Well, call-by-value languages are far easier to implement efficiently. Think about a function in whose body the formal parameter is used in several different places. Then apply the function to a complex expression. Intuitively, an efficient computation of this function application is easier if the strategy is call-by-value.
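
    As a concrete illustration (added here, not in the original notes), consider the square function of Section 5 applied to a complex expression. Under call-by-value the argument is evaluated once, before the call:

    (square (+ 3 4)) → (square 7) → (* 7 7) → 49

    Under call-by-name the unevaluated argument is copied into the body, so, unless the implementation shares the two copies, it is evaluated twice:

    (square (+ 3 4)) → (* (+ 3 4) (+ 3 4)) → (* 7 (+ 3 4)) → (* 7 7) → 49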

       Let us see an example of evaluation with the strategies of Haskell and Scheme. Let us begin with Haskell.
    The predefined conditional function demands the evaluation of its first argument to make a choice between the then and else parts possible. The equality function forces evaluation in order to yield the appropriate Boolean value. Multiplication and subtraction are only possible on numbers; again evaluation is forced.

    (fac 2)
    → (if (= 2 0) 1 (* 2 (fac (- 2 1))))
    → (if FALSE 1 (* 2 (fac (- 2 1))))
    → (* 2 (fac (- 2 1)))
    → (* 2 (if (= (- 2 1) 0) 1 (* (- 2 1) (fac (- (- 2 1) 1)))))
    → (* 2 (if (= 1 0) 1 (* (- 2 1) (fac (- (- 2 1) 1)))))
    → (* 2 (if FALSE 1 (* (- 2 1) (fac (- (- 2 1) 1)))))
    → (* 2 (* (- 2 1) (fac (- (- 2 1) 1))))
    → (* 2 (* 1 (fac (- (- 2 1) 1))))
    → (* 2 (* 1 (if (= (- (- 2 1) 1) 0) 1 (* (- (- 2 1) 1) (fac (- (- (- 2 1) 1) 1))))))
    → ... → (* 2 (* 1 1)) → (* 2 1) → 2

       Scheme is a strict language. It evaluates expressions using a call-by-value reduction strategy, that means that the application of a function to an argument is a redex only if the argument is a value.
    Let us see how the Scheme interpreter evaluates the expression (fac 2).

    (fac 2)
    → (if (= 2 0) 1 (* 2 (fac (- 2 1))))
    → (if FALSE 1 (* 2 (fac (- 2 1))))
    → (* 2 (fac (- 2 1)))
    → (* 2 (fac 1))
    → (* 2 (if (= 1 0) 1 (* 1 (fac (- 1 1)))))
    → (* 2 (if FALSE 1 (* 1 (fac (- 1 1)))))
    → (* 2 (* 1 (fac (- 1 1))))
    → (* 2 (* 1 (fac 0)))
    → (* 2 (* 1 (if (= 0 0) 1 (* 0 (fac (- 0 1))))))
    → (* 2 (* 1 (if TRUE 1 (* 0 (fac (- 0 1))))))
    → (* 2 (* 1 1))
    → (* 2 1) → 2

    Notice how, even in a language like Scheme, the then and else arguments of the conditional operator if are not both evaluated before the conditional itself is reduced, since otherwise the computation above would never stop.
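
    As a small check (added here), one can verify this behaviour directly, using the non-terminating function inf defined above:

    (if (= 0 0) 1 (inf 2))
    
    The expression evaluates to 1: the else branch (inf 2) is never reduced, even though Scheme evaluates the arguments of ordinary functions eagerly.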



    8.  Recursion

       In imperative languages the most relevant control structure is iteration. In functional languages, instead, it is recursion that gives computational power to programs. Without recursion we could compute almost nothing in pure functional languages.

    Recursion is the natural way to define functions on recursively defined data (also called inductively defined data or algebraic data).

    Defining recursive functions, rough intuition: A rough idea of how we can look at a recursive function definition is the following one.
    When we define a function by recursion we first try to look at the elements of the domain as "built" out of "simpler" parts (or you can think about it in terms of "smaller" parts). Then we try to express the value of the function on a generic element in terms of the value of the function on its "simpler" parts.
    We need also to provide the value of the function on the "simplest" ("smallest") elements.
    We must also be sure that, starting from a generic element, it is not possible to go on getting simpler and simpler (smaller and smaller) elements for ever.

       The easiest inductive data are obviously the natural numbers: a natural number is zero or the successor of a natural number (you see, a recursively defined data type). So Natural Numbers have only one base element (zero) and a unary constructor (the successor operator).

       Let us define a simple recursive function on numbers, which sums up all the natural numbers from n downto 0.

    (define (sumdowntozero n)
          (if (= n 0)
              0
              (+ n (sumdowntozero (- n 1)))))
    

       Notice that the base case is 0, but it could be different, as in this other example, where we sum up all the numbers in the interval [from..to], starting from the larger one.

    (define (sumdownto from to)
          (if (= from to)
              from 
              (+ to  (sumdownto from (- to 1)))))
    

       Notice that we always have in mind that a recursive function must have its recursive calls performed on "smaller" elements. For numerical inputs this means "smaller numbers".

    But then, what about the following function, which returns the same result as the previous function?

    (define (sumfromupto from to)
          (if (= from to)
              to
              (+ from (sumfromupto (+ from 1) to))))
    

    Well, it works. And it is still true that recursive calls are on "smaller" elements down to the base case. It is just that now the relation of "smaller" is not the usual one on natural numbers (actually it is the opposite relation, where the smallest element is the (numerically) greatest of the from-to interval: "to"). We could be more precise if we studied well-founded sets.
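
    To see this concretely, here is how the evaluation of a small example (added here) unfolds:

    (sumfromupto 3 5)
    → (+ 3 (sumfromupto 4 5))
    → (+ 3 (+ 4 (sumfromupto 5 5)))
    → (+ 3 (+ 4 5)) → (+ 3 9) → 12
    
    Each recursive call gets closer to the base case to, even though the first argument grows numerically.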



    9.  Higher-order functions

       Compared with the traditional imperative languages, which normally allow also the declaration of functions, functional programming languages have a more general view of the concept of functions: they are treated as ‘first-class citizens’, i.e. functions are treated just like other objects in the language. As a consequence, functions can have functions as arguments and as results.

    Functions that have functions as actual arguments or yield a function as result are called higher-order functions, in contrast to first-order functions, which have only non-function values as argument or as result.

    Example of a higher order function that takes a function as argument:

    (define (atzero f) (f 0))
    
    Let us see how the Scheme interpreter evaluates the following initial expressions (inc, square and ident are taken to be defined as before):

    (atzero inc) → (inc 0) → (+ 0 1) → 1

    (atzero square) → (square 0) → (* 0 0) → 0

    (atzero ident) → (ident 0) → 0


       In order to better understand the notion of higher-order functions, let us see a few functions that take functions as arguments.
    Let g be a function on natural numbers, g: N -> N, which we assume to be computed by a Scheme function called G already defined.
    We want to write a program that takes a natural number n as input and computes the sum, for i = 1... n, of the g(i)'s.

    (define (SG n) 
        (if (= n 0) 
            0 
            (+ (G n) (SG (- n 1)))))
    

       The program PG that, instead, computes the product of the elements g(i)'s, will be:

    (define (PG n) 
        (if (= n 0) 
            1 
            (* (G n) (PG (- n 1))))) 
    

       Both the above programs can be generalized. Let us define the functions Sgen and Pgen that, besides taking as input a natural number n, take also the function on which the sums or the products are performed.

    (define (Sgen f n) 
        (if (= n 0) 
            0 
            (+ (f n) (Sgen f (- n 1))))) 
    

    (define (Pgen f n) 
        (if (= n 0) 
            1 
            (* (f n) (Pgen f (- n 1))))) 
    

    Now, to compute the sum or the product of g(i) for i =1 ...10, we have simply to evaluate the initial expression (Pgen G 10) or (Sgen G 10).

       We now try to generalize further, abstracting both from the binary function applied to the various f(i)'s and from the base index. Example: combining f(i) for i = 3..5 with a binary operation F corresponds to computing F(F(f(3),f(4)),f(5)). The following function does it in general:

    (define (Ftoria F f b n) 
        (if (= n b) 
            (f b) 
            (F (f n) (Ftoria F f b (- n 1)))))
    
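    For instance (an added illustration, with square defined as in Section 5), the initial expression

    (Ftoria + square 1 3)
    
    evaluates to (+ (square 3) (+ (square 2) (square 1))), that is to 14.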

       Now with this new function at hand we can easily define Sgen and Pgen as follows

    (define (Sgen f n) 
        (Ftoria + f 1 n)) 
    

    (define (Pgen f n) 
        (Ftoria * f 1 n))
    

       We could also generalize Ftoria further, by abstracting on the predecessor function for the linear order over the finite set over which f is defined, and on the equality relation on the elements of the set:

    (define (FtoriaOrd F f low up pred eq) 
        (if (eq up low) 
            (f up) 
            (F (f up) (FtoriaOrd F f low (pred up) pred eq)))) 
    

    Well... maybe with FtoriaOrd we have gone a bit too far with the generalization... but you can see how very general functions can be defined and how other functions can easily be defined out of them.



       It is possible to have functions in which both arguments and result are functions.
    One example is the function compose: it takes two functions as input and returns their composition.
    Notice that in order to define the function compose we need to be able to write an expression representing a function, without having to give a name to it. That is, we need to be able to describe anonymous functions, functions without a name.
    In Scheme the syntax for anonymous functions is the following one:

    (lambda (arg1...argn) body)
    
    An expression like the above one represents a function that, given n arguments arg1...argn, returns the value of body as result (of course after replacing in it the formal arguments with the actual arguments.)

    We can now easily define the compose function

    (define (compose f g)
      (lambda (x) (f (g x))))
    
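    For example (an added illustration; the name squareinc2 is ours), the squareinc function of Section 5 could equivalently be obtained by composition:

    (define squareinc2 (compose square inc))
    
    (squareinc2 7)
    
    The evaluation of (squareinc2 7) returns 64, just like (squareinc 7).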

       By means of anonymous functions we are also able to define Sgen using FtoriaOrd, in the following way.

    (define (Sgen f n) 
        (FtoriaOrd + f 1 n (lambda (x)(- x 1)) =))
    

       One more example: we define a higher-order function returning a function (from natural numbers to natural numbers) identical to the one given as argument, except at one point, for which a new value is provided.

    (define (change-value-at-n  f  n  new-value)
        (lambda (x)
            (if (= x n)
                new-value
                (f x))))
    
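    For instance (an added illustration; square-but-3 is our own name):

    (define square-but-3 (change-value-at-n square 3 100))
    
    (square-but-3 4)   ;; returns 16
    (square-but-3 3)   ;; returns 100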



    Curryfication

    The notion of higher-order function enables us to introduce the notion of curryfication, a relevant concept in functional languages. To currify means to transform a function f of arity n into a higher-order function fcur of arity 1 such that fcur(a1) is again a higher-order function of arity 1, and so are fcur(a1)(a2) up to fcur(a1)(a2)..(an-1), and such that f(a1,..,an) = fcur(a1)..(an).
    A binary function of type:

    A x B → C

    is currified to a function of type:

    A → (B → C)

    where every function has arity 1.

    Let us consider the following Scheme function

    (define High-inc (lambda (x) (lambda ( y) (+ x y))))

    We can notice that the function High-inc is indeed the currified version in Scheme of the addition.
    The possibility of using currified functions definitely improves the expressive power of functional languages (that is why in Haskell all functions come in their currified version, even the user-defined ones).
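
    For instance (an added illustration; add3 is our own name), High-inc can be applied to one argument at a time:

    ((High-inc 3) 4)
    
    evaluates to 7, while

    (define add3 (High-inc 3))
    
    defines the function that adds 3 to its argument, so that (add3 10) evaluates to 13.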


    Definitions
       Up to now we have used the define-expressions to define functions. Actually the keyword 'define' can be used in general to associate, in the environment, a name to any value of Scheme.

    (define a (+ 3 4))
    
    The evaluation of the expression above associates in the environment the value 7 to the name "a". In fact if we now evaluate
    (* a 2)
    
    we get 14.
    The general syntax of a define-expression is
    (define name expr)
    
    The evaluation of such an expression consists in evaluating the expression expr and in associating its value to the identifier "name" in the environment.
    Notice that in Scheme functions are values, in fact our way to define a function:
    (define (function-name arg1 .. argn) body)
    
    is equivalent to writing
    (define function-name (lambda (arg1 .. argn) body))
    



    10.  Data Structures: Lists

       In imperative languages one can define global data structures that are globally accessible for reading and writing during execution. Since this is not possible in functional languages, the use of data structures in these languages is quite different. After creation of a data structure, the only possible access is read access. Data structures cannot be overwritten. Furthermore, data structures are not globally available but they must always be passed as arguments to the functions that need them. Modern functional languages allow the definition of complex data structures.
    In this section we treat lists. Lists are a data structure which is predefined in all functional programming languages. Lists are recursive data structures; in fact, the definition of a list is recursive: a list is either the empty list or it is formed by an element followed by a list. It is then obvious that lists, since they are a recursively defined data structure, can easily be handled by means of recursion, the main control mechanism of functional languages.
    In Scheme lists are not uniform, i.e. it is possible to have lists of elements of different types, unlike many other functional languages, like Haskell.

       Lists, like any data structure in an FPL, are built using data constructors. Lists are constructed using two data constructors: a constant data constructor denoting the emptylist, empty (in some implementations we can also denote the emptylist by '() ), and a constructor, cons, that, given a value v and a list L, returns the list formed by v followed by L.

    The expression

    (cons 1 (cons 2 (cons 3 empty)))
    
    denotes the list formed by the numbers from 1 to 3.

    Here is a list of four boolean elements:
    (cons #t (cons #t (cons #f (cons #t empty))))
    
    A list of non uniform elements:
    (cons #t (cons 2 (cons #f empty)))
    
    In order to avoid writing too many occurrences of cons when building a list, one can use the predefined function list. The three lists above can be represented by the following expressions:
    (list 1 2 3)
    
    (list #t #t #f #t)
    
    (list #t 2 #f)
    
    Here is what the interpreter returns by evaluating the above expressions:
    (list 1 2 3) → (1 2 3)
    (list #t #t #f #t) → (#t #t #f #t)
    (list #t 2 #f) → (#t 2 #f).


       Since a list of values is a value itself, we can form lists having other lists as elements:
    (cons 3 (cons (cons 2 (cons #t empty)) (cons #f empty)))
    
    which is equivalent to
    (list 3 (list 2 #t) #f)
    
    The evaluation of the above expression returns the list (3 (2 #t) #f).


       Of course we can give names to lists by means of define-expressions

    (define Karaosmanglu 
       (cons 3 (cons (cons 2 (cons #t empty)) (cons #f empty))))
    
    (define Pamuk (list #t 2 #f))
    


       We can "build" lists by means of constructors, but if we wish them to be of any use we need to be able to handle the parts they are made of. For this purpose there are the following predefined selectors: car and cdr.
    car is an operator that takes a list in input and returns its first element. cdr takes a list and returns a list like the one in input, but without the first element.

    (car (list 1 2 3 4 5 6))
    
    returns 1.
    (car Pamuk)
    
    returns #t.
    (cdr Pamuk) 
    
    returns (2 #f).
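
    Selectors can be composed (an added remark): since (cdr Pamuk) is itself a list, the expression
    (car (cdr Pamuk))
    
    returns 2, the second element of Pamuk.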

    This way of handling lists is very user-friendly. One needs to worry neither about "memory management" nor about "updating pointers".

    In order to handle lists in programs, we also need predicates on them.
    A predicate is a function that returns boolean values (by notational convention in Scheme the name of a predicate always terminates with the character '?'). In Scheme the predefined predicate on lists is null?. When applied to a list it returns #t if and only if the argument is the emptylist. It will be very useful in defining recursive functions on lists.
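
    For instance (examples added here):
    (null? empty)
    
    returns #t, while
    (null? (list 1 2 3))
    
    returns #f.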

    Let us now write a few recursive programs dealing with lists.

    Recursion on (flat) lists

        By a flat list we mean a list where no element in it is a list.

    Let us see the apnd function that concatenates two lists given in input (we call it "apnd" since the function "append" is predefined in most Scheme implementations).

     (define apnd (lambda (ls1 ls2)
         (if (null? ls1)
             ls2
             (cons (car ls1) (apnd (cdr ls1) ls2)))))
    

    Exercise: Program the function reverse (i.e. the function that returns a list whose elements are the same as those of the input list, but in the reverse order).

    The apnd function is an example of a function defined by recursion on the length of the list; informally, this means that the recursive calls are on shorter lists than the input one.

    Let's define the function (predicate) member?. It checks whether a number is in the list of numbers given as input.

    (define (member? elem ls)
        (if (null? ls)
            #f
            (if (= elem (car ls))
                #t
               (member? elem (cdr ls)))))
    

    We can use boolean operators in order to get a simpler and clearer definition. Boolean operators are very helpful to define predicates.

    Notice: the boolean operators or and and of Scheme are actually operators that can take an arbitrary number of expressions as arguments. The arguments are evaluated in sequence.
    and returns #f as soon as the evaluation of an argument returns #f.
    or returns #t as soon as the evaluation of an argument returns #t.

    (define (member? elem ls)
       (and (not (null? ls))
            (or (= elem (car ls))
                (member? elem (cdr ls)))))
     

    Exercise: write functions to sort a list using the different sorting algorithms you know.

       Now let's consider another example of recursion on the length of a list: a function summing up all the numbers present in a list of numbers.

    (define (Sumlist ls)
         (if (null? ls)
             0
             (+ (car ls) (Sumlist (cdr ls)))))
    

    Intuition: How did we manage to program this function? Well, first we provided its result ( 0 ) in case the input is the basic element of the domain (the empty list). Then, by assuming that our program works correctly on lists that are "smaller" than the input, (Sumlist (cdr ls)), we "built" the result for the generic input ls: (+ (car ls) (Sumlist (cdr ls))).
    We considered (cdr ls) to be "smaller" than ls.
    But why do we have to use "smaller" inputs for the recursive calls?
    In order to guarantee the computation eventually stops. In fact we cannot go on forever calling the function on smaller and smaller elements. Sooner or later we reach the emptylist, on which the function is correctly defined.
    The problem is how we can be sure that it is not possible to have an infinite sequence of smaller and smaller elements.
    For the present function there is no problem, since a "smaller" input consists in a shorter list, and it is impossible to have an infinite sequence of shorter and shorter lists.
    But we shall see that it is not always so simple....

    So, we need
    (1) to provide the result for the simplest elements of the domain;
    (2) to provide the result for all the other possible "shapes" of the input, using the assumption that the program works correctly on "smaller" arguments than the input;
    (3) to be sure that there exists no infinite sequence of smaller and smaller elements

    Point (3) above can be formally proved by using the notion of well-founded precedence relation.
    We do not study such a notion in the present very introductory notes.


       Let us write a function that sums up the numbers in a list, under the constraint that the first argument of an addition must always be 1 (it is a weird and inefficient way to sum up the elements of a list, but in our examples we do not care about efficiency; we simply wish to practice with recursion).

    (define (Sumlistweird ls)
         (cond ((null? ls) 0)
               ((= (car ls) 0) (Sumlistweird (cdr ls)))
               (else (+ 1
                        (Sumlistweird (cons (- (car ls) 1)
                                            (cdr ls)))))))
    

    Intuition: In this case we provided the value for the simplest element and for all the possible shapes of the input different from the emptylist:
    (in an informal notation)
    Sumlistweird emptylist = 0
    Sumlistweird (cons 0 (cdr ls)) = Sumlistweird (cdr ls)
    Sumlistweird (cons n (cdr ls)) = 1 + (Sumlistweird (cons (- n 1) (cdr ls)))

    We assumed that Sumlistweird works correctly on "smaller" elements than ls. But now we are considering both (cdr ls) and (cons (- n 1) (cdr ls)) to be smaller than ls, even though (cons (- n 1) (cdr ls)) has the same length as ls!
    Actually this is not a problem, we need only to be sure that, according to this particular notion of "being smaller" there is no infinite sequence of smaller and smaller lists. The notion of well-founded precedence relation could help us to get a formal proof of that.

    Recursion on nested lists

    In Scheme lists are not homogeneous, so we can have anything in a list, including other lists.

    One-level nesting
    Let us consider a function on lists of lists of numbers.
    The following function returns the sum of all the numbers present in a list of lists of numbers. (We call lsls the argument, in order to recall that it is a list of lists (of numbers)).
    Notice that in Scheme (caar exp) stands for (car (car exp)).
    Moreover, (cdar exp) stands for (cdr (car exp)), and so on.

    (define (Sumlistlist lsls)
        (if (null? lsls)
            0
            (if (null? (car lsls))
                (Sumlistlist (cdr lsls))
                (+ (caar lsls)
                   (Sumlistlist (cons (cdar lsls)
                                      (cdr lsls)))))))
    

    Needless to say, the above function could be defined far more easily by means of the functions Sumlist and map (do it as an exercise).
    But now we are not interested in the best ways to make sums, we are just playing with recursion in its various forms.

    Now we write a "weird" version of Sumlistlist: a version where "+" can be used only with 1 as first argument.

    (define (Sumlslsweird ls)
       (cond ((null? ls) 0)
             ((null? (car ls)) (Sumlslsweird (cdr ls)))
             ((= (caar ls) 0) (Sumlslsweird (cons (cdar ls)
                                                  (cdr ls))))
             (else (+ 1
                      (Sumlslsweird (cons (cons (- (caar ls) 1)
                                                (cdar ls))
                                          (cdr ls)))))))
    

    This is an example of triple induction: on the length of the list, on the length of the first element and on the numerical value of the first number of the first element of the list.


    Multiple levels of nesting
    We consider now lists containing numerical elements or other lists at any level of nesting.

    Let us define the function (actually a predicate) genmember?.
    Such a function checks whether a number is in a list containing numbers at any level of nesting.

    (define (genmember? elem ls)
        (if (null? ls)
             #f
            (if (list? (car ls))
                (or (genmember? elem (car ls))
                    (genmember? elem (cdr ls)) )
                (or (eqv? elem (car ls))
                    (genmember? elem (cdr ls))))))
    


        We write now a function that sums all the numbers present in the input (a list containing numerical elements or other lists at any level of nesting.)

    (define (Gensumls llss)
        (if (null? llss)
            0
            (if (list? (car llss))
                (+ (Gensumls  (car llss))
                   (Gensumls  (cdr llss)) )
                (+ (car llss) (Gensumls  (cdr llss))))))
     

    Notice that the recursion is not on the level of nesting. In fact (cdr llss) could have the same level of nesting as llss.



       For those who know the game of the Hanoi towers, the program that solves it in the right way is the following.

    (define (tower-of-hanoi n peg1 peg2 peg3)
       (if (= n 0)
           '()
           (append (tower-of-hanoi (- n 1) peg1 peg3 peg2)
                   (cons (list peg1 peg3)
                         (tower-of-hanoi (- n 1) peg2 peg1 peg3)))))
    

    In this definition of the tower the problem is specified by giving the number of disks and the names of the three pegs (which can also be identified by numbers); it is assumed that the first peg is where all the disks are at the beginning, and the third peg is where the disks have to be moved to.
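
    For instance (an added illustration, naming the pegs with the symbols a, b and c), the initial expression
    (tower-of-hanoi 2 'a 'b 'c)
    
    evaluates to the list of moves ((a b) (a c) (b c)): move the small disk from a to b, move the large disk from a to c, and finally move the small disk from b to c.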

       Of course we can have recursion on all sorts of inductively defined Data Types (trees, for example).



    11.  Correctness proofs of functional programs

       A functional programming style has advantages that are common in any mathematical notation:

    • There is consistency in the use of names: variables do not vary, they stand for a, perhaps not yet known, constant value throughout their scope.
    • Thanks to the absence of side-effects, in FPLs the same expression always denotes the same value. This property is called referential transparency.

    A definition like x = x + 1 would mean in an imperative language that x is incremented. In a functional language however this definition means that any occurrence of x can always be substituted by x + 1.
    Clearly the initial expression x does not have a normal form: substitution will continue forever (some systems recognize such situations in certain simple cases and report an error message).

    x → x+1 → (x+1)+1 → ...

    Due to referential transparency, one is allowed to replace, in any expression, any occurrence of the left-hand side of a definition by its corresponding right-hand side, and vice versa.

    Referential transparency makes it possible to use common mathematical techniques such as symbolic substitution and induction. With the help of these techniques a program can be transformed into a more efficient one or certain properties of a program, such as its correctness, can be proven.

    The use of Induction principles for proving functional program properties

        Recursion is the basis of the computational and expressive power of functional programming languages. You can notice how its working principle is analogous to that used in mathematical proofs by induction.
    In fact, in a recursive program the output is defined for the base elements and then, assuming the program correctly works for "simpler", "smaller" elements than the input, the output is defined for the generic input.
    In a proof by mathematical induction the property is proven for the base case (usually 0) and then, assuming the property to hold for smaller numbers, we prove the property for the generic number.
    Refresh your knowledge about proofs by mathematical induction.

    You can notice that the principle of mathematical induction is a logical rule that sort of implicitly "expresses" also a model of calculation. It corresponds, in functional programming, to defining numerical functions by recursion, like the following one:

     (define (fact n) 
         (if (= n 0) 
             1 
             (* n (fact (- n 1))))) 
    

    the value of the function is given for n=0 and is calculated for n assuming that fact correctly works for n-1.

       The close correspondence between recursive functions and the principles of induction makes it possible, for example, to use induction to formally prove properties of recursive programs.
    It is natural to use the principle of mathematical induction if we wish to show that the factorial is total, i.e. that for all n (fact n) converges (i.e. that its evaluation eventually terminates).
    We need:
    1. to prove that (fact 0) converges
    2. to prove that, for a generic k not equal to 0, (fact k) converges, using the inductive hypothesis that (fact (- k 1)) converges.

    Let's consider the definition of factorial given above and let's prove the two points above:
    1. if k=0 you evaluate the "then" part of the conditional, therefore the value of the expression that represents the factorial is 1, that is (fact 0) converges;
    2. if k>0 you evaluate the "else" part, hence the value of (fact k) is the value of the expression (* k (fact (- k 1))). Hence, by assuming that the evaluation of (fact (- k 1)) converges, the evaluation of the whole expression (* k (fact (- k 1))) trivially converges.


       In order to prove properties of programs that work on data structures more complex than numbers, we need to use more powerful mathematical tools, for instance the principle of well-founded induction (which we do not treat here).



    12.  Typed Languages

       The class of functional languages can be divided into two main groups: Strongly typed languages (usually referred to simply as typed languages) and Non strongly typed languages (usually referred to simply as untyped languages).
    A type, in general, can be looked at as a partial specification of a program (a function) or of an expression. These partial specifications, according to the particular type system of the language considered, can be more or less detailed.

    By means of types we can avoid errors like:

    true + 3

    in fact the interpreter knows that + takes a pair of numbers and returns a number, i.e. that its type is

    + ::  int x int → int


       A language is strongly typed whenever any function and any legal expression of the language has (or can be given) a type.

    In (strongly) typed languages errors like the one shown above are detected by the Type Checker, before the program is "executed" (statically). In non (strongly) typed languages this sort of error can be detected only at run-time, by the interpreter itself (the program's execution is stopped whenever an error is detected).

    If a function in a typed language has a type int->bool, it means that the function (program) takes an integer as an argument and returns a boolean as result.

       Scheme is an untyped language. In fact, we could write the following legal function definition

    (define (f x) (if (fac (list x x)) 5 #t))
    
    It is easy to notice that the function f cannot be given a type...
    We know that fac can work correctly only on natural numbers, not on lists, but the language Scheme can realize it only when the function f is evaluated on a given input (at "run time"). Moreover, even if we write
    (define (g x) (if (> x 4) 5 #t))
    
    also g would be untypable, simply because we do not know what the codomain of g is, since the "then" part of the conditional is a number, whereas the "else" part is a boolean.

    Typed languages, like Haskell, prevent us from writing functions like f and g, since they realize statically, at the moment of their definition, that f and g cannot be typed.
    Notice that this is a good thing for what concerns f, since in such a way the Type Checker prevents us from applying fac to a list.
    For what concerns g, however, you can notice that even if the Type Checker cannot give a type to g, such a function is not "intrinsically" wrong: its application does not cause any disaster, we simply do not know whether the result will be a number or a boolean.

    The above discussion shows that, on the one hand, typed languages prevent us from writing functions, like f, that can cause some disaster when applied, and this is a good thing.
    On the other hand they prevent us from writing functions, like g, that are not intrinsically dangerous and that in some cases could be useful.
    In a nutshell: typed languages are safer, while untyped languages are more flexible. However, if I can choose between a typed and an untyped language... I prefer the first one!



    13.  The λ-calculus and computation theory

       The Lambda-calculus is the computational model the functional languages are based on.
    It is a very simple mathematical formalism. In the theory of the λ-calculus one can formalize, study and investigate properties and notions common to all functional languages, without being burdened by the technicalities of actual languages, which are irrelevant from a theoretical point of view.
    In a sense, the λ-calculus is a paradigmatic and extremely simple functional language.

    The concepts the Lambda-calculus is based on are those that are fundamental in all functional programming languages:

    variable         (formalizable by  x, y, z, …)
    abstraction (anonymous function)    (formalizable by  λx.M,
                                                           where M is a term and x a variable)

    application    (formalizable by  MN, where M and N are terms)

    We have seen that these are indeed fundamental notions in functional programming. It seems, however, that there are other important notions: basic elements, basic operators and the possibility of giving names to expressions.
    Even if it could appear very strange, these notions are actually not really fundamental ones: they can be derived from the first ones.
    This means that our functional computational model can be based on the concepts of variable, functional abstraction and application alone.
    Using the formalization of these three concepts we form the lambda-terms. These terms represent both programs and data in our computational model (whose strength in fact strongly depends on the fact that it does not distinguish between "programs" and "data").

       The formal definition of the terms of the Lambda-calculus (lambda-terms) is given by the following grammar:

    Λ ::= X | (ΛΛ) | λX.Λ

    where Λ stands for the set of lambda-terms and X is a metavariable that ranges over the (numerable) set of variables (x, y, v, w, z, x1, x2, ...)
    Variable and application are well-known concepts, but what is abstraction?
    Abstraction is needed to create anonymous functions (i.e. functions without a name). We have seen that being able to specify an anonymous function is very important in functional programming, but it is also important to be able to associate an identifier to an anonymous function. Nonetheless we can avoid introducing in our model the notion of "giving a name to a function". You could ask: how is it possible to define a function like fac without being able to refer to it through its name??? Believe me, it is possible!

    The term λx. M represents the anonymous function that, given an input x, returns the "value" of the body M.
    Other notions, usually considered as primitive, like basic operators (+, *, -, ...) or natural numbers, are not strictly needed in our computational model (indeed they can actually be treated as derived concepts.)

    Bracketing conventions

    For readability's sake, it is useful to abbreviate nested abstractions and applications as follows:

    (λx1.(λx2. ... (λxn.M) ... ))    is abbreviated by    (λx1x2 ... xn.M)

    ( ... ((M1M2)M3) ... Mn)    is abbreviated by    (M1M2M3 ... Mn)

    We also use the convention of dropping outermost parentheses and those enclosing the body of an abstraction.

    Example:

    (λx.(x (λy.(yx))))     can be written as     λx.x(λy.yx)

    In an abstraction λx.P, the term P is said to be the scope of the abstraction.
    In a term, an occurrence of a variable is said to be bound if it occurs in the scope of an abstraction on that variable (i.e. it represents the argument in the body of a function).
    It is said to be free otherwise.

    These two concepts can be formalized by the following definitions.

    Definition of bound variable
    We define BV(M), the set of the Bound Variables of M, by induction on the term as follows:

    BV(x) = Ø (the empty set)
    BV(PQ) = BV(P) U BV(Q)
    BV(λx.P) = {x} U BV(P)

    Definition of free variable
    We define FV(M), the set of the Free Variables of M, by induction on the term as follows:

    FV(x) = {x}
    FV(PQ) = FV(P) U FV(Q)
    FV(λx.P) = FV(P) \ {x}
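
    For instance, applying the two definitions above to the term λx.xy we get:

    BV(λx.xy) = {x} U BV(xy) = {x} U (BV(x) U BV(y)) = {x}
    FV(λx.xy) = FV(xy) \ {x} = (FV(x) U FV(y)) \ {x} = {y}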



    Substitution

    The notation M [L / x] denotes the term M in which any free (i.e. not representing any argument of a function) occurrence of the variable x in M is replaced by the term L.


    Definition of substitution (by induction on the structure of the term)
    1. If M is a variable (M = y) then:

    y [L/x]  =  L    if  x = y
    y [L/x]  =  y    if  x ≠ y


    2. If M is an application (M = PQ) then:

    PQ [L/x] = P[L/x] Q[L/x]


    3. If M is a lambda-abstraction (M = λy.P) then:

    (λy.P) [L/x]  =  λy.P             if  x = y
    (λy.P) [L/x]  =  λy.(P [L/x])     if  x ≠ y


    Notice that in a lambda abstraction a substitution is performed only if the variable to be substituted is free.
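
    For instance, applying the clauses above (here the condition on free and bound variables discussed below is satisfied):

    (x (λy.xy)) [λz.z / x]  =  (x[λz.z / x]) ((λy.xy)[λz.z / x])  =  (λz.z) (λy.(λz.z)y)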

    Great care is needed when performing a substitution, as the following example shows:

    Example:

    Both of the terms   (λx.z)    and    (λy.z)    obviously represent the constant function which returns z for any argument

    Let us assume we wish to apply to both of these terms the substitution [x / z] (we replace the free variable z with x). If we do not take into account the fact that the variable x occurs bound in (λx.z), we get:

    (λx.z) [x / z]  =>  λx.(z[x / z])  =>  λx.x     representing the identity function
    (λy.z) [x / z]  =>  λy.(z[x / z])  =>  λy.x     representing the function that always returns x

    This is absurd, since both the initial terms are intended to denote the very same thing (a constant function returning z), but by applying the substitution we obtain two terms denoting very different things.

    Hence a necessary condition in order for the substitution M[L/x] to be performed correctly is that the free variables of L do not change status (i.e. do not become bound) after the substitution. More formally: FV(L) and BV(M) must be disjoint sets.
    Such a problem is present in all programming languages, including imperative ones.

    Example:

    Let us consider the following substitution:

    (λz.x) [zw/x]

    The free variable z in the term to be substituted would become bound after the substitution. This is meaningless. So, what has to be done is to rename the bound variables in the term on which the substitution is performed, so that the condition stated above is satisfied:

    (λq.x) [zw/x]

      now, by performing the substitution, we correctly get: (λq.zw)

    The example above introduces and justifies the notion of alpha-convertibility

    The notion of alpha-convertibility

    A bound variable is used to represent where, in the body of a function, the argument of the function is used. In order to have a meaningful calculus, terms which differ only in the names of their bound variables have to be considered identical. We could introduce such a notion of identity in a very formal way, by defining the relation of α-convertibility. Here is an example of two terms in such a relation.

    λz.z  =α λx.x


    We do not formally define such a relation here. For us it is sufficient to know that two terms are α-convertible whenever one can be obtained from the other by simply renaming the bound variables.
    Obviously alpha convertibility maintains completely unaltered both the meaning of the term and how explicit this meaning is. From now on, we shall implicitly work on λ-terms modulo alpha conversion. This means that from now on two alpha convertible terms like λz.z and λx.x are for us the very same term.

    Example:

    (λz. ((λt. tz) z)) z

    All the variables, except the rightmost z, are bound, hence we can rename them (apply α-conversion). The rightmost z, instead, is free. We cannot rename it without modifying the meaning of the whole term.

    It is clear from the definition that the set of free variables of a term is invariant under α-conversion, whereas the set of bound variables obviously is not.



    The formalization of the notion of computation in λ-calculus:  The Theory of β-reduction

       In a functional language to compute means to evaluate expressions, making the meaning of a given expression more and more explicit. Up to now we have formalized in the lambda-calculus the notions of programs and data (the terms in our case). Let us see now how to formalize the notion of computation. In particular, the notion of "basic computational step".
    We have noticed in the introduction that in the evaluation of a term, if there is a function applied to an argument, this application is replaced by the body of the function in which the formal parameter is replaced by the actual argument.
    So, if we call redex any term of the form (λx.M)N, and we call M[N/x] its contractum, to perform a basic computational step on a term P means that P contains a subterm which is a redex and that such a redex is replaced by its contractum (β-reduction is applied on the redex.)
    The relation between redexes and their contracta is called notion of β-reduction, and it is represented as follows.

    The notion of β-reduction

    (λx.M) N  →β  M [N/x]

    A term P β-reduces in one-step to a term Q if  Q can be obtained out of P by replacing a subterm of P of the form (λx.M)N by M[N/x]
     


    Example:

    If we had the multiplication in our system, we could perform the following computations

    (λx. x*x) 2  =>  2*2  =>  4

    here the first step is a β-reduction; the second computation is the evaluation of the multiplication function. We shall see how this last computation can indeed be realized by means of a sequence of basic computational steps (and how also the number 2 and the function multiplication can be represented by λ-terms).

    Strictly speaking, M → N means that M reduces to N by exactly one reduction step. Frequently, we are interested in whether M can be reduced to N by any number of steps.

    we write    M —» N     if     M → M1 → M2 → ... → Mk ≡ N    (k ≥ 0)



    It means that M reduces to N in zero or more reduction steps.

    Example:

    ((λz.(zy))(λx.x))  —»  y



    Normalization and possibility of non termination

    Definition
    If a term contains no β-redex, it is said to be in normal form.
    Example:

    λxy.y and xyz   are in normal form.

    A term is said to have a normal form if it can be reduced to a term in normal form.
    Example:

    (λx.x)y   is not in normal form, but it has the normal form  y



    Definition:
    A term M is normalizable (has a normal form, is weakly normalizable) if there exists a term Q in normal form such that

    M —» Q


    A term M is strongly normalizable, if there exists no infinite reduction path out of M. That is, any possible sequence of β-reductions eventually leads to a normal form. 
     


    Example:

    (λx.y)((λz.z)t)

    is a term not in normal form, normalizable and strongly normalizable too. In fact, the only two possible reduction sequences are:

    1. (λx.y)((λz.z) t)  →  y

    2. (λx.y)((λz.z) t)  →  (λx.y) t  →  y


    To normalize a term means to apply reductions until a normal form (if any) is reached. Many λ-terms cannot be reduced to normal form. A term that has no normal form is

    (λx.xx)(λx.xx)


    This term is usually called Ω.

    Although different reduction sequences cannot lead to different normal forms, they can produce completely different outcomes: one could terminate while the other could "run" forever. Typically, if M has a normal form and admits an infinite reduction sequence, it contains a subterm L having no normal form, and L can be erased by some reductions. The reduction

    (λy.a)Ω  →  a


    reaches a normal form, by erasing the Ω. This corresponds to a call-by-name treatment of arguments: the argument is not reduced, but substituted 'as it is' into the body of the abstraction.
    Attempting to normalize the argument generates a non terminating reduction sequence:

    (λy.a)Ω  →  (λy.a)Ω  →  ...


    Evaluating the argument before substituting it into the body corresponds to a call-by-value treatment of function application. In this example, a call-by-value strategy never reaches the normal form.



    Some important results of β-reduction theory

       The normal order reduction strategy consists in reducing, at each step, the leftmost outermost β-redex. Leftmost means, for instance, to reduce L (if it is reducible) before N in a term LN. Outermost means, for instance, to reduce (λx.M) N before reducing M or N (if reducible). Normal order reduction is a call-by-name evaluation strategy.

    Theorem (Standardization)
    If a term has a normal form, it is always reached by applying the normal order reduction strategy.

    Theorem (Confluence)
    If we have terms M, M1 and M2 such that:

    M —» M1    and    M —» M2


    then there always exists a term L such that:

    M1 —» L    and    M2 —» L


    That is, in diagram form: the usual confluence "diamond", with M at the top, M1 and M2 on its two sides and L at the bottom.


    Corollary (Uniqueness of normal form)
    If a term has a normal form, it is unique.

    Proof:

    Let us suppose to have a generic term M, and let us suppose it has two normal forms N and N'. It means that

    M —» N    and    M —» N'

    Let us now apply the Confluence theorem. Then there exists a term Q such that:

    N —» Q    and    N' —» Q

    But N and N' are in normal form, hence they do not contain any redex. So, it necessarily means that

    N —» Q   in 0 steps

    and hence that N ≡ Q.
    Analogously we can show that N' ≡ Q. It is then obvious that   N ≡ N'.


    The above result is a very important one for programming languages, since it ensures that if we implement the same algorithm in Scheme and in Haskell, the functions we obtain give us the very same result (if any), even if Scheme and Haskell are functional languages that use very different reduction strategies.



    The theory of β-conversion

    It is possible to formalize the notion of computational equivalence of programs into the lambda calculus. This is done by introducing the relation of β-conversion among λ-terms, denoted by M =β N
    Informally, we could say that two terms are β-convertible when they "represent the same information": actually if one can be transformed into the other by performing zero or more β-reductions and β-expansions.
    (A β-expansion being the inverse of a reduction: it transforms P[Q/x] back into (λx.P)Q.)
    We shall not treat the theory of β-conversion in our course.

    Representing data types and functions on them

    In the syntax of the λ-calculus we have not considered primitive data types (the natural numbers, the booleans and so on), or primitive operations (addition, multiplication and so on), because they do not really represent primitive concepts. As a matter of fact, we can represent them by using the essential notions already formalized in the λ-calculus, namely variable, application and abstraction. We shall not show how to do it (even if it is fairly simple and natural). Even more, it is possible to represent in the lambda-calculus any computable function.

    Representing recursive functions

    In the syntax of the λ-calculus there is no primitive operation that enables us to give names to lambda terms. However, giving names to expressions seems to be an essential feature of programming languages in order to define functions by recursion, like in Scheme

    (define fact (lambda (x) (if (= x 0) 1 (* x (fact (- x 1))))))

    or in Haskell

    fact 0 = 1

    fact n = n * fact (n-1)


    Then, in order to define recursive functions, it would seem essential to be able to give names to expressions, so that we can recursively refer to them.
    Surprisingly enough, however, that's not true at all! In fact, we can represent recursive functions in the lambda-calculus by using the few ingredients we have already at hand, exploiting the concept of fixed point.
    The notion of fixed point can be represented in the lambda calculus through terms like

    λf. (λx. f(xx)) (λx. f(xx))

    (this term is traditionally called Y)
    In fact, for any term M we have that M(YM)=βYM
    If we look at M as a function (since we can apply terms to it), the term YM is then a fixed point of M (a fixed point of a function f:D->D being an element d of D such that f(d)=d).
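    To see why, it is enough to perform two β-reductions: YM → (λx.M(xx))(λx.M(xx)) → M((λx.M(xx))(λx.M(xx))). Calling Z the term reached after the first step, we have YM —» M(Z) and also M(YM) —» M(Z) (by reducing YM inside M(YM)); since YM and M(YM) have a common reduct, they are β-convertible.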
    From fixed points we can derive the notion of recursion in the lambda calculus: in fact, intuitively, what is the function factorial? It is the function such that

    factorial = λx. if (x=0) then 1 else (x*factorial(x-1))

    this means that the function factorial is the fixed point of

    λf. λx. if (x=0) then 1 else (x*f(x-1))

    So, recursive functions (or better, recursive algorithms) can be represented in the lambda-calculus by means of fixed points (and hence by using terms like Y.)


     Reduction strategies

    One of the aspects that characterize a functional programming language is the reduction strategy it uses. Informally, a reduction strategy is a "policy" which chooses a β-redex, if any, among all the possible ones present in the term. It is possible to formalize a strategy as a function, even if we do not do it here.

    Let us see, now, some classes of strategies.

    Call-by-value strategies
    These are all the strategies that never reduce a redex  when the argument  is not a value.
    Of course we need to precisely formalize what we mean by "value".
    In the following example we consider a value to be a lambda term in normal form.

    Example:

    Let us suppose to have:

    ((λx.x)((λy.y) z)) ((λx.x)((λy.y) z))

    A call-by-value strategy never reduces the redex (λx.x)((λy.y) z), because its argument (λy.y) z is not a value. With a call-by-value strategy, then, we can reduce only one of the two inner redexes (λy.y) z.

    Scheme uses a specific call-by-value strategy, and a particular notion of value.
    Informally, up to now we have considered as "values" the terms in normal form. For Scheme, instead, also a function is considered to be a value (an expression like λx.M, in Scheme, is a value).
    Then, if we have

    (λx.x) (λz.(λx.x) x)


    The reduction strategy of Scheme first reduces the outermost redex (the whole term), since the argument is considered to be a value (a function is a value for Scheme). After the reduction the interpreter stops, since the term λz.((λx.x) x) is a value for Scheme and hence cannot be further evaluated.


    Call-by-name strategies
    These are all the strategies that reduce a redex without checking whether its argument is a value or not. One of their subclasses is the set of lazy strategies. Generally, in programming languages a strategy is lazy if it is call-by-name and reduces a redex only if it is strictly necessary to get the final value (this strategy is also called call-by-need). One possible call-by-name strategy is the leftmost-outermost one: it takes, among all the possible redexes, the leftmost and outermost one.

    Example:

    Let us denote by I the identity function and let us consider the term:

    (I (II)) ( I (II))

    the leftmost-outermost strategy first reduces the leftmost redex I (II), i.e. the one in operator position.

    We have seen before that the leftmost-outermost strategy is very important since, by the Standardization Theorem, if a term has a normal form, we get it by following such a strategy. The problem is that the number of reduction steps is sometimes greater than strictly necessary.


    If a language uses a call-by-value strategy we have to be careful: even if an expression has a normal form, its evaluation could go on endlessly (as in the example with Ω above).



    14.  Types and Typed Lambda-Calculi

      Introduction to types

    The functional programming languages can be typed or untyped. A typed one is, for example, Haskell, while an untyped language is Scheme. By introducing the notion of type in the λ-calculus we shall be able to study it in an abstract and simplified way.
    We can look at types as predicates on programs or, better, as partial specifications about the behaviour of programs. In the programming practice the use of types is helpful to prevent a number of programming errors.

    There are two different approaches to the use of types.

    • the approach a' la Curry (the one used in Haskell)
    • the approach a' la Church (the one used in Pascal)

    In the Curry approach, we first write a program and then check whether it satisfies a given specification.
    Instead, in the Church approach, types are included in the syntax of the language and, in a sense, represent some constraints we have to follow during the writing of our programs in order to avoid making particular errors.
    In the approach a' la Curry, checking whether a term M satisfies the specification represented by the type t is usually referred to as "assigning the type t to the term M".

     

       Type assignment 'a la Curry

    In this approach we first write a program and then we check whether it satisfies the specification we had in mind. This means that we have two distinct sets: programs and partial specifications.
    If we wish to talk in a general and abstract way about types (a' la Curry) in functional programming languages we can proceed with the following formalization:

    Programs: the set of λ-terms, i.e.

    Λ

    Partial specifications: the set of types, i.e.

    T ::= φ | T → T

    where φ is a meta-variable which ranges over a set of type variables (φ, φ', φ'', a, b, ... etc.).  In this formalization we have considered the simplest form of type (in fact these types are called simple types). Elements of T will be denoted by σ, τ , ρ, etc.
     

    There arise now a whole bunch of questions we should answer:

    1) How can we formally show that, given a program (a term) M and a specification (a type) σ, M satisfies the specification  σ? That is, how can we prove that M has type σ?

    2) Is it decidable, given a term M and a type σ, whether M has type σ? i.e., given a program (term) M and a specification (type) σ, is there any algorithm enabling us to check whether M satisfies σ?

    3) How can we be sure that the type of a term actually provides a piece of information about the semantics of the term?

    4) What is the intrinsic meaning of having type variables in our system? Does the use of type variables have some implication for our system?

    5) Is it possible to "compare" types? i.e., is it possible to formalize the notion that a type represents a "more general" specification than another one?

    6) Is there any relation among all the possible types we can assign to a term M? Does there exist a most general type among them? If so, can it be algorithmically found?

    Let us start with question 1).
    What we need to do is to devise a formal system whose judgments (typing judgements) are of the form

    M : σ


    and whose meaning is: the program M respects the specification σ  (formally, the term M has type σ).

    If we think about it, however, we can notice that our typing judgments should provide some more information than simply M : σ.
    In fact, let us consider the term  λx.yx. Which sort of specification is satisfied by such a program?
    In case the input is of type φ, a type for λx.yx is definitely of the form

    φ → something


    But now, how can we find out what something could be, if we do not have any information about the free variable y? Our program (term) takes an input x and returns the result of the application of y to x. If we do not make any assumption about the type of the free variable y, we cannot say anything about the type of yx.
    This means that the (typing) judgments we are looking for should indeed have the following shape:

    B |— M:σ


    where B (called a basis) provides information about the types of the free variables of M.
    The meaning of the typing judgment   B |—  M : σ  is then:
    using the information given by the basis B, the term M satisfies the specification σ
    (i.e. M has type σ under the assumptions provided by the basis B; or, shortly, M has type σ in B.)

    Formally B will be a set of pairs of the form :   variable : type

    An example of basis is, for instance, B = {x : τ , z : ρ}


    Let us introduce now the formal system enabling us to infer types for lambda-terms.

    Definition (Curry type assignment system)

    Axiom

    —————————
    B |— x : σ

    if x : σ is in B


    (a variable x has type σ  if the statement x : σ is in B).

    Inference rules

    These rules allow us to derive all the possible valid typing judgements.

    B |— M : σ → τ        B |— N : σ
    ———————————————————————
    B |— MN : τ


    • This rule says that if, by using the assumptions in B, we can prove that M has type
      σ → τ and N has type σ, then we are assured that, in the basis B, the application MN has type τ. This rule is usually called (→E): "arrow elimination". This rule can be "read" also bottom-up, as follows: if we wish to prove that MN has type τ in B, then we need to prove that M has type σ → τ in B for some σ and that N has type σ in B.

    B, x : σ |— M : τ
    —————————————————
    B |— λx.M : σ → τ


    • This rule says that if, by using the assumptions in B and the further assumption "x has type σ", we can prove that M has type τ, then we are assured that, in B, λx.M has type σ → τ. This rule is called (→I): "arrow introduction". Also this rule can be "read" bottom-up. Do it as an exercise.

      A term M is called typeable if there exist a basis B and a type σ such that the judgment
    B |— M : σ can be derived in the assignment system above.

    The above formal system is the core of the type systems of languages like Haskell, ML and others. This is the main motivation for which we are studying it.

    Remark:

    In Haskell, before writing the definition of a function, we can declare what type our function is intended to have, for instance:

    fact :: Int → Int

    By doing that, after defining the function fact the Haskell interpreter can check whether our program fact satisfies the given specification. In particular, the interpreter tries to derive, in a formal system like the one defined before, the Haskell typing  judgement   fact :: Int →Int.
    Note: In Haskell we do not have the concept of a basis, because in correct programs an identifier either is a bound variable or has been previously associated to some value.



    Example:

    Let Ø be the empty set and let us consider the judgment

    Ø |— λxy.x : σ → (τ → σ)

    whose informal meaning is: by considering no assumption at all (the empty basis), the term λxy.x satisfies the specification σ → (τ → σ).
    If we wish to formally prove that the above judgment is valid, we need to use the Curry type assignment system.
    Let us start from the axiom:

    ————————
    {x : σ, y : τ} |— x : σ


    By using (→I) two times, we get

    {x : σ, y : τ} |— x : σ
    ————————
    {x : σ} |— λy.x : τ → σ
    ———————————
    Ø |— λx.λy.x : σ → (τ → σ)


    Let us notice that in the last two steps we have discharged, from the basis we started with, the assumptions for the bound variables of the term. In fact, a basis has to provide information only about free variables, since the information about bound variables is already contained in the type of the term.



    Example:

    We want to prove that:

    B = {f : σ → σ, x : σ} |— f (f (fx)) : σ

    Starting from B, we use the axiom and the rules of our formal system:

    B |— f : σ → σ        B |— x : σ
    ———————————————————————
    B |— (fx) : σ

    B |— f : σ → σ        B |— (fx) : σ
    ———————————————————————
    B |— f (fx) : σ

    B |— f : σ → σ        B |— f (fx) : σ
    ———————————————————————
    B |— f (f (fx)) : σ


    We can go on and derive, from the derivation above, other judgments:

    {f : σ → σ, x : σ} |— f (f (fx)) : σ
    —————————————
    {f : σ → σ} |— λx.f (f (fx)) : σ → σ
    ———————————————
    Ø |— λf.λx.f (f (fx)) : (σ → σ) → (σ → σ)


    Remark:
    There are programs that do not satisfy any specification, whatever the basis B. For example, let us consider:

    B |— xx : τ


    It is impossible to derive this judgment in our system, whatever basis B and type τ we choose. What is the implication of this simple observation? Well, typed languages restrict the set of programs we can write, but on the other hand the programs we can write are safer, i.e. they are guaranteed not to contain certain errors and to satisfy their partial specifications. We lose something of course, that is flexibility, the possibility of writing "tricky" and "sly" programs.
    So, in typed languages:    less flexibility = more safeness.
     


    There is a quite relevant property of typeable terms that concerns strong normalizability.

    Theorem
        Any typeable term is strongly normalizable.

    So, since self-application cannot be typed, it follows that the combinator Y cannot be typed. This implies that we cannot type any term representing a recursive algorithm. This is also why, whereas in Scheme, if we wished, we could define recursive programs using fixed-point combinators, in Haskell it is not possible.
    Of course this is not a real drawback for Haskell, since it is not realistic to write actual recursive programs by means of fixed-point combinators. In Haskell recursion is obtained by the use of recursive equations (which corresponds to the possibility of giving names to expressions.)


    Let us now discuss another relevant property of our type assignment system:  Subject reduction

    Saying that M has type σ (M : σ) amounts to saying that M represents a value of type σ. The types should not give information about the form of the terms (about their syntax, in general), but, instead, they should provide information about their intrinsic meaning, their semantics.
    Hence, if a term has a certain type, it is important to be sure that during its evaluation the term "maintains" the same type.
    For example, if a term M represents a program that returns a number, types would be useless   if we could  reduce M to a term that represents a program which returns a list.
    Fortunately we can prove the following

    Theorem  (Subject reduction)
    If in a basis B we can prove that M has type σ, that is

    B |— M : σ


    and if  M —» N, then we can also prove that

    B |— N : σ




    Thanks to this theorem we are sure that, if we check that a term M has a given type at the beginning of a computation, then we can forget about types during the computation, since all the terms that we obtain by reducing M can be given the very same type. This also implies that types are irrelevant, do not play any role, in the computational process.

     Let us note that the subject reduction property holds only for reductions and not for expansions. That is, if we have that

    B |— N : σ     and     M —» N


    it is not always true that

    B |— M : σ

     

    By means of the Subject reduction theorem we have also answered question 3).

    Let us now try and answer question 4). Question 2) still has to be answered, but this will be done using the answer to 6).

    Definition  (Type substitution)
    Let T be the set of simple types:

      a) the type substitution (φ ρ) : T → T, where φ is a type-variable and ρ is a type, is inductively defined by:

    (φ ρ) φ = ρ
    (φ ρ) φ' = φ', if φ' ≠ φ
    (φ ρ) (σ → τ) = ((φ ρ) σ) → ((φ ρ) τ)

    The above definition formalizes the notion of "substituting a type variable by a type inside a type"

      b) We can compose type substitutions, that is, if S1, S2 are type substitutions, then so is S1oS2, where:

    (S1oS2) σ = S1 (S2 σ)

      c)  A type substitution can also be applied to a basis:   SB = {x : Sτ | x : τ is in B}

      d) A type substitution can also be applied to pairs of basis and type (it will be useful later on)

            S <B, σ> = <SB, Sσ>
     

    Note: The notion of substitution for types is much simpler than the one for terms, since in simple types we do not have bound type-variables.

    Note: The above definition of type substitution is tailored in order to be used in the principal pair algorithm that we shall describe below. In fact it is not possible to describe a type substitution transforming φ→ψ into ψ→φ. This is not a problem, since such a type substitution will never be needed in the principal pair algorithm.

     

    Example:  if we apply the type substitution (φ σ) to the type φ → (τ→ φ), and if τ does not contain φ, we obtain the type    σ → (τ→ σ)


    If for σ,τ there is a substitution S such that Sσ = τ, we say that τ is a (substitution) instance of σ.


    The following lemma shows that if a term has a type, it has indeed infinitely many types. The Lemma will provide, at the same time, an answer to question 4).

    Lemma (substitution lemma)
    If using a basis B we can prove that a term M has a type σ, that is

    B |— M : σ


    then, for all possible substitutions S, we can also prove that:

    S(B) |— M : S(σ)


    Note: The above substitution lemma holds also for a more general definition of type substitution.


    So the Substitution Lemma states that if a term satisfies a specification, then it satisfies infinitely many ones (because we can have infinitely many possible substitutions.)


    Example:

    It is easy to prove that

    {x : φ → ψ} |— λy .xy : φ → ψ

    The Substitution Lemma states that if we apply any substitution S, it is also true that

    S({x : φ → ψ}) |— λy .xy : S(φ → ψ)

    For example, by considering the substitution

    (φ' → φ)) o (ψ ξ)

    it is true that

    x : (φ' → φ) → ξ |— (λy .xy) : (φ' → φ) → ξ


    Example:

    If we have derived  

    Ø |— λx.x : φ → φ

    by the Substitution Lemma we are sure that also the following judgment is derivable

    Ø |— λx.x : (ψ → φ) → (ψ → φ)



    What we have just seen means that in our formal system we have polymorphic functions (a function is called polymorphic  if it can be correctly applied to objects of different types).
    So, by using type variables, we have implicitly introduced a notion of polymorphism in our type assignment system.
    Since this system is the core of the type system of Haskell, this implies that Haskell is a language with polymorphism. (In Haskell type variables have names like a, b, c etc.)

    To state that the lambda term λx.x has type  a → a   is tantamount to saying that λx.x is a polymorphic function, that is, it can be uniformly applied to any element of any type T and it returns an element of type T. The type a → a is (implicitly) a polymorphic type since, "a" being a type-variable, it "represents" all the types of shape  T → T.
    This sort of polymorphism is called parametric polymorphism. In particular, the parametric polymorphism of our system can be said to be implicit, since the fact that a term has many types is not explicitly expressed in the syntax by means of particular operators on types, but is expressed by results like the Substitution Lemma.
    A possible syntax for explicit parametric polymorphism could be the following one:

    λx.x : ∀φ.(φ → φ)


    In this case the polymorphic behaviour of the term is explicitly described by the syntax.
    (We shall deal later on with another sort of polymorphism: subtyping polymorphism. Besides, when studying the Type Classes in Haskell we shall deal with one more sort of polymorphism: overloading, a very restricted sort of polymorphism actually.)

    We can now provide, using the notion of type substitution, an answer to question 5) by giving the following definition.

    Definition (more general type)
    Let τ and σ be two simple types.
    τ is said to be more general than σ if there exists a substitution S such that   σ = S(τ).
    That is if the latter can be obtained from the former by means of a type substitution.
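
    For instance, φ → φ is more general than (ψ → ξ) → (ψ → ξ), since the latter can be obtained from the former by applying the type substitution (φ (ψ → ξ)).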


    We know that if a term M has a type, it has infinitely many ones. Now, it is also possible to prove that there exists a precise relation among all the types that a given term has (i.e. all the possible partial specifications of a program are, so to speak, connected to each other).
    That would be an answer for question 6). Such an answer will also provide an answer for question 2) that up to now we have left unanswered.



       Principal pairs

    Theorem
    Given a term M, if M is typeable then it has a Principal Pair  <P, π>, where P is a basis and  π a type such that:

    1) we can derive the following typing for M 

    P |— M : π


    2) Moreover, if we can derive

    B |— M : σ

    then there exists a substitution S such that B = S(P) (as a matter of fact B could be a superset of S(P)) and σ = S(π).
    In other words, the principal pair represents the most generic typing possible for M. That is, any other basis and type that we can use to type M can be obtained from the principal pair.

    This partially answers our question 6)


    Example:

    Let us consider the term:

    λx.x

    It has principal pair:

    <Ø, φ → φ>

    This means that any other typing for the identity  is a substitution instance of this principal pair.



    The theorem can be proved by providing an algorithm that, given a term M, returns its principal pair, pp(M), and fails in case M is not typeable.
    This completes the answer to our question 6).

      Before giving the formal description of the Principal Pair Algorithm (also called Hindley-Milner algorithm), let us first informally describe how it works.
    The algorithm tries to type the given term in the most general way (that is to find the most general basis and the most general type for the term.) It tries to do that by using the rules of the assignment system, in a bottom-up way.

  •   If we consider a variable x, well, the only way to type it is by using an axiom, and the most general axiom for x is definitely the one that has only x in the basis and a type-variable as type for it.
  •   If we consider an abstraction   λx.M   we know that the only way to type it is by using rule (→I). It is then clear that if we wish to type λx.M in the most general way by means of this rule, we first need to find the most general typing for M (by using the algorithm itself recursively; if this fails, the whole algorithm fails). Once we have the most general typing for M, if x is in the basis (with a certain type, say σ) then the most general type for λx.M will be σ → π (where π is the most general type found for M). Otherwise, if x is not in the basis (that is, x is not a free variable of M), the most general typing for λx.M is φ → π, where π is the most general type found for M and φ is a type variable not yet used (if x is not a free variable of M it means that the input of λx.M could be anything and hence a type-variable φ is the most general type for such a generic input).
  •   If we consider an application   (MN)   we know that the only way to type it is by using rule (→E). It is then clear that if we wish to type MN in the most general way by means of this rule, we first need to find the most general typing for M and the most general typing for N (by using recursively the algorithm itself. If we fail the whole algorithm fails). Once we have the most general basis and type for M, say P1 and π1, and the most general basis and type for N, say P2 and π2, these bases and types, however, cannot be used to type MN with the rule (→E) unless π1 is an arrow type having π2 as domain. If this is not the case, we can profit from the Substitution Lemma and find the simplest substitution that, when applied to π1 and to (π2 → φ), makes them equal (φ is a new type variable). Once we have such a substitution, say S1 (we find it by means of another algorithm called unify. If it fails the whole algorithm fails), we apply it to P1, π1, P2 and π2. Unfortunately, we cannot use rule (→E) yet, since the bases used in the two premises of the rule must be the same, and in general it is not true that S1 P1 is the same as S1 P2. This means that we need to find another substitution S2 (the simplest one), by means of which we manage to make S1 P1 and S1 P2 equal (in their common variables). If we manage to find such a substitution (otherwise the whole algorithm fails) we are done, since now we can use rule (→E) to type MN with S2 S1 φ from the basis S2 S1 (P1UP2) (we consider the union because the free variables of M could be different from the free variables of N).

    The formalization of the process we have informally presented before is therefore the following recursive algorithm.

    Definition (The principal pair algorithm)
    Given a term M we can compute its principal pair pp(M), if any, as follows:

    a) For all x, φ :   pp(x) = < {x : φ},φ>

    b) If pp(M) = <P, π> then:

    i) in case x is in FV(M) (and hence there is a σ such that x : σ is in P) we define
         pp(λx.M) = <P\x ,  σ → π>
    ii) otherwise, in case x is not a free variable, we define
         pp(λx.M) = <P, φ → π>, where φ does not occur in <P, π>.

    c) If pp(M1) = <P1, π1> and pp(M2) = <P2, π2> (we choose, if necessary, trivial variants such that the <P1, π1> and <P2, π2> contain distinct type variables), φ is a type-variable that does not occur in any of the pairs <Pi, πi>, and:

    S1 = unify π12 → φ)
    S2 = UnifyBases(S1 P1) (S1 P2)

    then we define   pp(M1M2) = S2oS1<P1UP2, φ>. (unify and UnifyBases are defined below)

    Of course, we should also prove that this algorithm is correct and complete, but this would go beyond our scopes.
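
    Just to see how the clauses fit together, here is a possible run of the algorithm on the term λx.yx (the names of the type-variables are of course arbitrary):

    pp(y) = <{y : φ1}, φ1>     and     pp(x) = <{x : φ2}, φ2>                          (clause a)
    for yx, take a fresh φ3:   S1 = unify φ12 → φ3) = (φ12 → φ3))   and   S2 = Ids,
         hence   pp(yx) = <{y : φ2 → φ3, x : φ2}, φ3>                                  (clause c)
    since x occurs in this basis with type φ2,
         pp(λx.yx) = <{y : φ2 → φ3}, φ2 → φ3>                                          (clause b.i)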

    The principal pair algorithm provides also an answer to question 2). In fact, if a term is not typeable the algorithm fails, whereas it returns its principal pair if it is typeable. So, if a term is typeable we can not only effectively find it out, but, in a sense, we can have all its possible types, since any possible typing can be got out of the principal pair.

    Extensions of the above algorithm are used in languages like Haskell and ML. Let us roughly see how it is used: let us define a function, say

    Id x = x


    And let us now write the following (which amounts to asking the Haskell interpreter the question: "what is the principal type of Id?")

    :type Id


    What the interpreter does is to run an algorithm similar to the one we have defined above, producing the principal type of the term associated to the identifier Id (the principal type and not the principal pair, since our Haskell expressions must be closed).
    In case of Id the interpreter returns

    a → a


    Another possibility in Haskell is instead to "declare" the type of an expression before we write it, for instance:

    Id :: (Int → b) → (Int → b)


    After the type declaration we can write the "code" for Id, that is

    Id x = x


    Now, what the interpreter does is the following: it finds the principal type of the expression associated to Id and then checks whether the specification we gave, (Int → b)→ (Int → b), can be obtained out of the principal type of Id by means of a type substitution.

    You can notice that the pp algorithm uses the unify and UnifyBases sub-algorithms (described below). The unify algorithm takes two types and checks whether there exists a substitution that makes them equal (such a substitution is called a unifier). That is why sometimes the Haskell interpreter gives you a warning saying that it does not manage to unify some type variable, say a: the unify algorithm implemented in the interpreter does not manage to find a substitution making the two types involved equal.

    Definition (Unify and UnifyBases)

    Let B be the collection of all bases, S be the set of all substitutions, and Ids be the substitution that replaces all type-variables by themselves.

    i) Robinson's unification algorithm . Unification of Curry types is defined by:

    unify :: T x T → S

    unify φ φ = (φ φ);
    unify φ τ = (φ τ), if φ does not occur in τ
    unify σ φ = unify φ σ
    unify (σ → τ) (ρ → μ) = S2oS1,
         where S1 = unify σ ρ  and   S2 = unify (S1 τ) (S1 μ)

    ii) By defining the operation UnifyBases, the operation unify can be generalized to bases:

    UnifyBases :: B x B → S

    UnifyBases (B0, x:σ) (B1, x:τ) = S2oS1
          where S1 = unify σ τ  and   S2 = UnifyBases (S1 B0) (S1 B1)
    UnifyBases (B0, x:σ) B1 = UnifyBases B0 B1, if x does not occur in B1.
    UnifyBases Ø B1 = Ids.

    Notice that unify can fail only in the second alternative, when φ occurs in τ.
     
    It is worth noticing that the unify algorithm returns the most general unifier, that is the unifier U such that for any other unifier V there exists a substitution W such that V=WoU.
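
    For instance (just as an illustration):  unify (φ → ψ) (ψ → ξ) = S2oS1, where S1 = unify φ ψ = (φ ψ) and S2 = unify (S1 ψ) (S1 ξ) = unify ψ ξ = (ψ ξ); applying S2oS1 to either of the two types we obtain the same type ξ → ξ.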

     

       The typed lambda calculus a' la Church.


    We have seen that in the Curry approach to types there is a neat separation between terms and types. The underlying "programming philosophy" is that we first write untyped programs and afterwards we use the type inference system either to check (possibly under certain assumptions -the basis-) whether the program is typeable with a given type, or to find its principal type (by means of the pp algorithm).

    In the Church approach, instead, the types get involved during the construction of the term (program) itself. In a sense, types are part of the program and are used as "constraints" preventing the programmer from making certain mistakes. A fundamental property of the Church approach is that each term has just one single type.
    This is achieved by imposing that every variable has (is declared to have) just one type. The constraint of declaring the type of each variable is adopted in several programming languages, like C, C++, Java, Pascal and lambda Prolog.

    Since in the typed lambda calculus a' la Church each term has just one type, the set of terms is partitioned in subsets, one for each type.

    The set of types of the lambda calculus a' la Church (the simplest version of it actually) can be defined as follows.

    T ::= INT | BOOL | T → T

    Of course we might choose to have even more basic types, not just INT and BOOL.
    Notice that, unlike in the Curry system, we do not have type variables. In the Curry system, if a term M has a type containing type variables, say a → a, then M has infinitely many types (by the Substitution Lemma), whereas in the Church system a term has just one type. This is also the motivation why, if we wish to introduce the notion of parametric polymorphism in a Church system, it has necessarily to be explicit parametric polymorphism.

    Let us introduce now the rules of our system. Unlike in the Curry system, they are not type inference rules, but rather term formation rules, in which types sort of "guide" the correct formation of terms.

    For any possible type we assume there exists an infinite set of variables having that particular type. We assume also that there exist no two variables with the same name but with different types.

    The type of a variable will be denoted by means of superscripts, like in xt (the variable with name x and type t.)
    We shall denote by Λt the set of well-formed typed lambda terms having type t.

    Definition (Well-formed typed lambda terms)
    The (term formation) rules of the typed lambda calculus a' la Church defining the sets Λt,  are the following ones:

       For each typed variable xt, xt belongs to Λt;

       If M belongs to Λt1→t2 and N to Λt1, then (MN) belongs to Λt2;

       If M belongs to Λt2, then λxt1.M belongs to Λt1→t2.


    Notice that in the last clause, in case x is a free variable of M, it has necessarily the type t1 of the abstracted variable, since variables with the same name do have the same type.

    Property (Uniqueness of type)
    Let M belong to Λt1 and to Λt2; then t1=t2.


    It is possible to develop also for the typed lambda calculus a' la Church a theory of beta-reduction and a theory of beta-conversion, starting, respectively, from the following axioms

    (λxt1.M)N →β M[N/xt1]

    (λxt1.M)N =β M[N/xt1]

    and proceeding in a similar way as in the untyped lambda-calculus.
    In the above axioms, (λxt1.M)N has obviously to be a well-formed term. It can be proved that M[N/xt1] is well formed if (λxt1.M)N is so and that they have both the same type (belong to the same Λt).


    In the Church typed lambda-calculus, the following questions naturally arise:
    - given a term, is it well-formed? that is, has it been formed correctly using the rules of the system?
    - given a well formed term, what is its type?

    The answers to these questions are provided by the Type Checking algorithm (TC). Such an algorithm, given a term, checks whether it has been correctly formed and, if so, returns its type.

    Definition (Type Checking algorithm)

    TC(xt) = t if xt belongs to Λt; fails otherwise

    TC(MN) = t2 if TC(M) = t1 → t2 and TC(N) = t1; fails otherwise

    TC(λxt1.M) = t1 → t2, if TC(M) = t2 and xt1 belongs to Λt1; fails otherwise.


    We say that a term M type checks if the TC algorithm does not fail on M.
    It is easy to see that TC(M)= t iff M belongs to Λt
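
    For instance, applying the algorithm to a couple of simple terms:

    TC(λxINT.xINT) = INT → INT
    TC((λxINT.xINT) yBOOL)  fails, since TC(λxINT.xINT) = INT → INT while TC(yBOOL) = BOOL, which is different from INT.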

    As can be expected, it is possible to prove the "Church version" of the subject reduction property.

    Theorem (Type preservation under reduction)
    If M belongs to Λt and M —» N, then N belongs to Λt

    Similarly to the Curry system, the following relevant theorem holds:

    Theorem (Strong Normalization)

    Let M be a well-formed typed term in the Church system, then M is strongly normalizable.

       Relation between the systems of Curry and Church

    As can be expected, there exists a strong relation between the two type systems described in the previous sections.

    Given a simply-typed lambda term M, let |M| denote the untyped lambda term obtained from M by removing the type decorations. Then we have that if M type checks, then |M| is Curry-typeable. Vice versa, if a lambda term N is Curry-typeable, then we can find suitable types for the bound variables of N, so as to obtain a term in the Church system. More precisely, if we add base types to the types of the Curry system, the following holds:

    Proposition
    (i) If M is a well formed term in the Church system and TC(M)=A, then |M| can be given the type A in the Curry system (from a suitable basis). Note that the converse does not hold. Also note that A is not necessarily the principal type of |M|.
    (ii) If N can be given a type A in Curry, then there exists a well-formed Church typed term M such that TC(M)=A and |M| = N.


       Lambda-calculus extensions and their type systems a' la Curry.

    We have said that the type system a' la Curry we have introduced in the previous sections is essentially the one used by the language Haskell. However, if we wish to investigate in an abstract way functional languages like Haskell, we cannot use just the pure lambda-calculus terms to represent programs.
    In fact, we know that lambda-terms typeable in the type system a' la Curry are strongly normalizable, whereas not all typeable Haskell terms are so.
    We cannot type fixed point operators like Y and this prevents using the notion of recursion in the simply typed lambda-calculus a' la Curry.
    In Haskell, instead, we have recursion (since we have the possibility of giving names to terms).
    Hence, if we wish to study the properties of languages like Haskell in a still general and abstract framework, a possibility is to extend the lambda-calculus in order to make it a bit closer to a real language, for what concerns both terms and types.

    Let us call LC+ our extended calculus.
    We assume LC+ to contain constants. For instance, constants for numbers (once we know that we can represent numbers in the pure lambda-calculus, we can use constants instead, for the sake of readability and computational efficiency): 0, 1, 2, etc.; constants for booleans: true, false; constants for numeric functions like add, mul, etc.; and also a constant for the conditional expression: if. If we wish, we can introduce a constant for the pair constructor, [-], and many more. (For simplicity's sake we shall denote by [M,N] the term [-]MN.) If we have the pair constructor we also need the two selectors on pairs: first and second.
    Notice the difference between having a constant for a certain value, say true, and having that value represented by a term, λxy.x (a term we have called true just in order to handle it more easily in our expressions).
    As discussed before, since we wish to have a typed calculus and we wish to deal with recursion, we need to introduce the notion of recursion explicitly, by means of a recursion operator constant: fix.
    The set of terms of LC+ can be defined almost as the set of terms of the pure lambda-calculus. The terms are built using abstraction and application starting from variables and constants.

    Adding constants to our calculus, however, is not enough.
    If, from now on, you ask your friends to call you "Carlo Rubbia", is this enough to make you a great scientist? Of course not. It is not the name "Carlo Rubbia" that makes a person a great scientist. It is how the person "behaves". A cat is a cat not because you call it "cat", but because it catches mice.
    So, it is not enough to extend the lambda-calculus with a constant add in order to have the operation of addition in LC+. We need to introduce in LC+ something that makes add behave like the addition operation.
    So, we add new axioms to the reduction theory of LC+, besides the axiom of beta-reduction: axioms that specify the computational "behaviour" of constants, like

    add 1 2 -->add 3
    add 4 5 -->add 9
    etc. etc.
    And all those that are necessary for the constants we have, like
    if true M N -->if M
    if false M N -->if N
    first [M,N] -->first M
    etc. etc.
    Of course we must not forget the axiom describing the computational behaviour of fix:
    fix F -->fix F(fix F)
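    Note that this is precisely how recursion can be reproduced in Haskell; a minimal sketch (the names fix and fact are used here just for illustration):

    -- the fixed point operator, behaving exactly like the axiom  fix F -->fix F (fix F)
    fix :: (a -> a) -> a
    fix f = f (fix f)

    -- recursion obtained through fix, without fact calling itself by name
    fact :: Integer -> Integer
    fact = fix (\f n -> if n == 0 then 1 else n * f (n - 1))

    -- fact 4 evaluates to 24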

    We can now take into account the type system a' la Curry for LC+.
    Since we have constants for numbers, booleans, etc., we need also to have base types like INT, BOOL, etc.
    Of course in our system we have arrow types, but since we have introduced pairs, we need to have also product types. That is, if T1 and T2 are types, then T1xT2 is also a type.
    Now, in order to have a reasonable type system, we cannot allow constants to have arbitrary types. When we use our formal system to infer a type for a term it would be unreasonable to assume that 3 has type φ → BOOL. So, what we need to do is to prevent constants from appearing in a basis B and to extend our type inference system with new axioms, like

    ———————————
    B |— 3 : INT

    and

    ——————————————————————
    B |— add : INT→INT→INT

    for any possible basis B.
    And so on.
    Notice that in the axiom for if we need to specify that it can be polymorphically typed:

    ————————————————————————
    B |— if : BOOL→Sφ→Sφ→Sφ

    for any possible basis B and substitution S.

    For what concerns the fix operator, its typing axiom is obviously

    —————————————————————
    B |— fix : (Sφ→Sφ)→Sφ

    for any possible basis B and substitution S.

    To deal with product types we need to add an inference rule:

    B |— M : σ      B |— N : τ
    ———————————————
    B |— [M,N] : σxτ



    Of course the property of strong normalization for typeable terms does not hold anymore in LC+.
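    For instance, fix (λx.x) can be given any type Sφ by the axioms above, yet it has no normal form: fix (λx.x) -->fix (λx.x)(fix (λx.x)) -->β fix (λx.x) -->fix ...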


    We can now use LC+ and its typing system to describe particular notions of programming languages like Haskell and to study their properties.
    For instance we can formally define the lazy reduction strategy of Haskell by means of the following formal system for typeable terms of LC+.

    The Lazy reduction strategy


    Axioms
    ———————
    M -->lazy N
    If M --> N is a reduction axiom, that is, either the axiom of beta-reduction or one of the axioms for the operator constants, for instance if true M N -->if M.

    Rules
    M -->lazy M'
    ---------------------------
    add M N -->lazy add M' N

    M -->lazy M'
    ---------------------------
    add n M -->lazy add n M'
    if n is a numeric constant


    M -->lazy M'
    ---------------------------
    Eq? M N -->lazy Eq? M' N

    M -->lazy M'
    ---------------------------
    Eq? n M -->lazy Eq? n M'
    if n is a numeric constant


    M -->lazy M'
    ------------------------------------
    if M N P -->lazy if M' N P




    M -->lazy M'
    ---------------------------
    first M -->lazy first M'

    M -->lazy M'
    ---------------------------
    second M -->lazy second M'




    M -->lazy M'
    ---------------------------
    MN -->lazy M'N

    Notice that the lazy reduction strategy reduces neither inside an abstraction nor inside a pair.
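    This is, for instance, what happens in Haskell; a small sketch (error is used here only to make the missing evaluation observable):

    -- laziness at work: the second component of the pair is never reduced,
    -- exactly as the strategy above never reduces inside a pair
    example :: Integer
    example = fst (3, error "this component is never evaluated")

    -- example evaluates to 3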



    Of course it is possible to define extended versions of the lambda-calculus also with type systems a' la Church (the system PCF is a well-known example of a typed extended lambda-calculus a' la Church).




    Records and Record types

    We extend now the lambda-calculus with terms representing records with labelled fields:
    if M1, ...,Mn are terms and l1, ...,ln are labels (taken from a given set L of admissible labels), then the following is also a term

    < l1=M1, ...,ln=Mn >

    Since it should be possible to extract a field from a record, we need to introduce also terms of the following form

    M.l
    where M is a term and l is a label in L.

    The computational behaviour of the record selection operator is formalized by introducing the following new reduction rule:

    < l1=M1, ...,ln=Mn>.li   -->sel  Mi
    if   1 ≤i≤ n

    The set of types can be now extended with types of the following form

    {l1:T1...ln:Tn}
    where T1,...,Tn are types.

    Let us now give the typing rules for a typed lambda calculus a' la Curry extended with record terms and record types.

      for each 1 ≤i≤n   B |—  Mi : Ti       
              ———————————————————————— (T-Rcd)
    B |—  < l1=M1, ...,ln=Mn>  :  {l1:T1...ln:Tn}


      B |—  M :  {l1:T1...ln:Tn}  
              ————————————————— (T-Proj)
    B |—  M.li  :  Ti
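    For instance, taking B = {x:σ, y:τ}, rule (T-Rcd) gives B |— < a=x, b=y > : {a:σ, b:τ}, and then rule (T-Proj) gives B |— < a=x, b=y >.a : σ.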

    In [FB2], when dealing with objects in Object-oriented languages, we shall see how object types are similar to record types.

    Exercise: Extend also the typed lambda-calculus a' la Church with record and record types.




    15.  Introduction to Subtyping.

    Subtyping is a common feature in modern programming languages. Type systems will expect values of certain types in expressions, function arguments, or storage. There are common situations where a type T is expected, but a value of some other type S would also make sense. It is advantageous to allow the flexibility of accepting expressions of type S in such situations (such a possibility can be considered as a particular form of polymorphism). This allows for more expressiveness in the programming language. Functions can be written more generally to accept more than one type as an argument. In functions that accept records (or objects) as input, this also provides some abstraction, because a function does not need to know every field of the record (or object). The function only needs to specify the fields it needs in the input record, and any additional fields may be present when the function is called. Consider the following example (in the typed lambda-calculus a' la Church) of a function that selects a field a from its input, where {a:Nat} is the type of records with one numerical field with label 'a'.

    λr{a:Nat}.(r.a)

    Without subtyping, this function can only operate on arguments of type {a:Nat}. However, the function is well-behaved on inputs with type {a:Nat, b:Bool}, {a:Nat, b:Nat, c:Nat}, etc. With subtyping, we are allowed to pass these types of values to the function. Thus one function may be applied to many types (that is why we said that subtyping is a particular form of polymorphism).

    Subtyping is characteristically found in object-oriented languages and is often considered an essential feature of the object-oriented style. We will explore this connection in detail when studying [P] and [FB2]. For now, though, we present subtyping in a more economical setting with just functions and records, where most of the interesting issues already appear.


    The Subtyping Relation

    In this subsection we shall take into account types that can be ground basic types (Nat, Bool,...), arrow types or record types.

    Informally, S is a subtype of T if, in any context that expects a value of type T, a value of type S can be supplied instead (principle of substitutability).
    We use S<:T to denote that S is a subtype of T.
    Now we can define the formal system defining the relation <:, that is the inference rules that enable us to derive correct judgments of the form S<:T.

              —————— (S-Refl)
    S <: S

      S <:T      T <:U   
              ——————————— (S-Trans)
    S <: U

    The above rules are straightforward and intuitive.

    Now, for record types, we want to consider the type
    S = {k1:T1...kn:Tn,kn+1:Tn+1...kn+k:Tn+k }
    to be a subtype of
    T = {k1:T1...kn:Tn}
    since anything we can do with a record of type T can also be done with a record of type S. Of course we need to consider also possible permutations of labelled fields and possible subtyping in the field types.

                      ———————————————————————————————— (S-RcdWidth)
    {l1:T1...ln:Tn,ln+1:Tn+1...ln+k:Tn+k } <: {l1:T1...ln:Tn}                      

      for each i,  Si <:Ti      
                   ———————————————————————— (S-RcdDepth)
     {l1:S1...ln:Sn} <: {l1:T1...ln:Tn}

      {h1:S1...hn:Sn} is a permutation of {l1:T1...ln:Tn}      
              ————————————————————————————— (S-RcdPerm)
    {h1:S1...hn:Sn} <: {l1:T1...ln:Tn}


    Exercise: Formally prove that

    {x:{a:Nat,b:Nat},y:{m:Nat}} <: {x:{a:Nat}}


    Let us consider now function types.
    Suppose we have two functions, f and g, with types S1→S2 and T1→T2, respectively. When is it safe to substitute g with f? First, f must accept at least all the values that g accepts. This intuitively means that T1 <: S1. Second, f cannot return values that are not contained in the return type of g. This requirement is stated as S2 <: T2. We can then define the subtyping rule for functions as follows.

         T1 <: S1      S2 <: T2       
              ——————————————— (S-Arrow)
    S1→S2 <: T1→T2
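    For instance, since {a:Nat, b:Bool} <: {a:Nat} by (S-RcdWidth) and Nat <: Nat by (S-Refl), rule (S-Arrow) gives {a:Nat}→Nat <: {a:Nat, b:Bool}→Nat: a function expecting only the field a can safely be used where a function on the richer records is expected.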

    Rule (S-Arrow) then, according to the definition below, states that the subtyping relation is contravariant with respect to the first argument of the arrow type constructor and covariant with respect to the second argument of the arrow type constructor.

    Definition
    Given a binary relation R on a set A and given an operation c on A, we say that

    • R is covariant w.r.t. c iff
      a R b   implies   ca R cb
    • R is contravariant w.r.t. c iff
      a R b   implies   cb R ca

    The definitions of Covariance and Contravariance actually concern Category Theory and in particular the notion of Functor. Above we have actually given a restriction of the general definition, for a very specific case, which is however enough for the use we make of it.

    When one defines types for objects in typed Object-oriented languages, one has to decide whether their subtype relation has to be defined as covariant or contravariant w.r.t. the object type constructor. This fact is not so obvious as for arrow type constructors, and any possible approach has its pros and cons. This problem will be approached in [FB2], where we shall also briefly discuss the relationship between the notions of subtype and subclass in OO programming languages (see [KB] for a detailed discussion).

    We can now integrate the subtyping relation in the type assignment system a' la Curry for the lambda-calculus with arrow and record types. Essentially, we need to allow a term with a given type to be typed also with any supertype. This idea is called subsumption. The following rule is then added to the typing rules:

      B |—  M: σ        σ <: τ
              ————————————— (T-Sub)
    B |—  M: τ
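    For instance, take B = {p : {a:Nat, b:Bool}}. In the Curry system we can derive B |— λr.(r.a) : {a:Nat}→Nat; since {a:Nat, b:Bool} <: {a:Nat} by (S-RcdWidth), rule (T-Sub) gives B |— p : {a:Nat}, and hence B |— (λr.(r.a)) p : Nat.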

    The subtyping rules allow for a more flexible type system. The implementation of this new type system is not as straightforward as the type system without subtyping. The first problem is that the type rules are not syntax-directed. The subsumption rule applies to every term. This means that every syntactic form can fit two type rules, so the structure of a term does not determine what rule should be applied in typing a term (that is why in the type system described in [IPW] that we shall study later on there is no subsumption rule, but the subtype relation is used instead inside specific typing rules). The second problem is with the subtype relation. The rules to derive subtypes are not syntax-directed either. It is particularly hard to think of a way to algorithmically apply the S-Trans rule for subtype transitivity. If you are trying to prove that S<: T, then it is not clear whether there exists a third intermediate type U to use in the transitivity rule.
    We shall not treat such problems in our course.
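    Nevertheless, just to give the flavour of how such problems are usually overcome, here is a minimal sketch of a syntax-directed subtype checker in Haskell (the representation of types is an assumption made only for this illustration): reflexivity and transitivity are not separate clauses, but turn out to be derivable properties of the algorithm.

    import Data.List (find)

    -- assumed representation of basic, arrow and record types
    data Type = Basic String
              | Arrow Type Type
              | Rcd [(String, Type)]               -- record types {l1:T1...ln:Tn}
              deriving (Eq, Show)

    -- subtype s t  checks  s <: t  in a syntax-directed way
    subtype :: Type -> Type -> Bool
    subtype (Basic a) (Basic b)         = a == b
    subtype (Arrow s1 s2) (Arrow t1 t2) = subtype t1 s1 && subtype s2 t2
                                          -- contravariant on the left, covariant on the right
    subtype (Rcd fs) (Rcd gs)           =
      -- width, depth and permutation at once: every field required on the right
      -- must occur on the left, with a subtype
      all (\(l, t) -> case find ((== l) . fst) fs of
                        Just (_, s) -> subtype s t
                        Nothing     -> False) gs
    subtype _ _                         = False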

    In order to help the intuition, the subtype relation can sometimes be roughly interpreted as the subset relation, looking at a type T as the set of the elements having type T. Such an intuition, however, is sometimes misleading, as we shall see when channel types for PICT are dealt with in [FB3].
    It is always better to keep in mind the principle of substitutability.

    For what concerns the notion of subtyping, there is a distinction between nominal subtyping, in which only types declared in a certain way may be subtypes of each other, and structural subtyping, in which the structure of two types determines whether or not one is a subtype of the other. The class-based object-oriented subtyping used in Java and formalized in the paper [IPW] is nominal; the subtyping relation we defined above on record types is instead structural.
    (The following paragraph is by Donna Malayeri and Jonathan Aldrich)
    Structural subtyping offers a number of advantages. It is often more expressive than nominal subtyping, as subtyping relationships do not need to be declared ahead of time. It is compositional and intrinsic, existing outside of the mind of the programmer. This has the advantage of supporting unanticipated reuse - programmers don't have to plan for all possible scenarios. Additionally, structural subtyping is often more notationally succinct. Programmers can concisely express type requirements without having to define an entire subtyping hierarchy.
    Nominal subtyping also has its advantages. First, it allows the programmer to express and enforce design intent explicitly. A programmer's defined subtyping hierarchy serves as checked documentation that specifies how the various parts of a program are intended to work together. Second, explicit specification has the advantage of preventing "accidental" subtyping relationships. Third, error messages are usually much more comprehensible, since, for the most part, every type in a type error is one that the programmer has defined explicitly.
