Short Introduction
to
Functional Programming
using Scheme

 

 

Franco Barbanera

   

    Based on
part of Chap.1 of R. Plasmeijer, M. van Eekelen,
"Functional Programming and Parallel Graph Rewriting"
and on
F. Barbanera, Lecture notes of the course Functional Programming, University of Catania.

 

 

 

 

 

Almost everything about Scheme (tutorials, interpreters, etc.) can be found in
http://www.schemers.org/

Extended versions of the various parts of the notes can be found in http://www.dmi.unict.it/~barba/LinguaggiII.html/DOCS/PROGRAMMA-TESTI/programma07-08.html





 

Contents

1. Why functional programming?


2. An imperative programming style


3. A functional programming style


4. Functions in mathematics and functions as programs


5. Functional programs


6. The evaluation of a functional program


7. The order of evaluation


8. Recursion


9. Higher order functions


10. Data Structures


11. Correctness proofs of functional programs


12. Typed Languages


13. The λ-calculus and computation theory












1.  Why functional programming?

  Imagine the availability of perfect computer systems: software and hardware systems that are user-friendly, cheap, reliable and fast. Imagine that programs can be specified in such a way that they are not only very understandable but that their correctness can easily be proven as well. Moreover, the underlying hardware ideally supports these software systems, and superconduction in highly parallel architectures makes it possible to get an answer to our problems at dazzling speed.

  Well, in reality people always have problems with their computer systems. One does not often find a bug-free piece of software or hardware. We have learned to live with the software crisis, and have to accept that most software products are unreliable, unmanageable and unprovable. We spend money on a new release of a piece of software in which old bugs are removed and new ones are introduced. Hardware systems are generally much more reliable than software systems, but most hardware systems appear to be designed in a hurry and even well-established processors contain errors.

  Software and hardware systems have clearly become very complex. A good, orthogonal design needs many person-years of research and development, while at the same time pressure increases to put the product on the market. So it is understandable that these systems contain bugs. The good news is that hardware becomes cheaper and cheaper (thanks to very large scale integration) and speed can be bought for prices one never imagined. But the increase of processing power automatically leads to an increased use of that power, with more complex software as a result. So computers are never fast enough, while the complexity of the systems grows enormously. The two key problems that the computer science community has to solve are:

  • How can we, at low cost, make large software systems that remain reliable and user-friendly?
  • How can we increase processing power at low cost?

    Researchers are looking for solutions to these problems: by investigating software engineering techniques, to deal with problems related to the management of software projects and the construction and maintenance of software; by designing new proof techniques to tackle the problems in proving the correctness of systems; by developing program transformation techniques, to transform the specification of a problem into a program that solves it; and by designing new (parallel) computer architectures using many processors (thousands or more). In the mean time the quest for revolutionary new technologies (e.g. optical chips, superconduction) is always going on.

    Another approach is based on the idea that the problems mentioned above are fundamental problems that cannot be solved unless a totally different approach is used and hardware and software are designed with a completely different model of computation in mind.




    2.  An imperative programming style

       Most computer programs are written in an imperative programming language in which algorithms are expressed by a sequence of commands.
    These languages, such as FORTRAN, C, Algol, COBOL, PL/1 and Pascal, were originally derived from (and form an abstraction of) the computer architectures they run on. These computer architectures, although different in detail, are all based on the same model: the Von Neumann computer architecture. The Von Neumann architecture is based on the mathematical model of computation proposed by Turing in 1937: the Turing machine.

    Imperative programming languages are those based on the Turing-machine model of computation.
    The fundamental concepts in imperative programming languages (coming from the Turing Machine model) are:

    • store
    • variable - something it is possible to modify ("memory cell")
    • assignment
    • iteration

       The great advantage of the Turing model and hence of the corresponding physical architectures is that they are extremely simple. The Von Neumann architecture consists of a piece of memory that contains information that can be read and changed by a central processing unit (the CPU). Conceptually there are two kinds of information: program instructions in the form of machine code, information that is interpreted by the CPU and that controls the computation in the computer; and data, information that is manipulated by the program.
       This simple concept has made it possible to make efficient realizations in hardware at a relatively low cost. In the beginning of the computer era, the ‘high-level’ imperative programming languages were designed to provide a notation to express computations in a machine-independent way. Later on, one recognized the importance of expressing computations such that programs are understandable for the human being and their correctness can be proven. It became clear that, in order to make it possible to reason about programs, not every machine instruction should have a direct equivalent in the high-level programming language. For instance, it is common knowledge that with the use of GOTO statements, which are the direct abstraction of the branch and jump instructions that are available on any computer, programs can be written in such a way that reasoning about them is almost impossible (Dijkstra, 1968). We strongly believe that a similar kind of problem is caused by the assignment statement.

    Consider the following example written in an imperative programming style:

    BOOLEAN even := TRUE;
    …
    PROCEDURE calculate (INTEGER value) : INTEGER;
    BEGIN
     even := NOT even;
     IF even
      THEN value + 1
      ELSE value + 2
     ENDIF
    END;
    …
    print(calculate (6));
    …
    print(calculate (6));
    
    Both print statements in this program are syntactically the same. Still they may produce different results. Each call clearly prints either the value 7 or 8, but the exact value printed depends on the number of times the procedure calculate has been called before. The result returned by the procedure not only depends on the actual value of its argument, but also on the value the global boolean has at that particular moment. This value is ‘secretly’ changed in each procedure call. Such side-effects cause the result of a procedure call to be context dependent. Results become very hard to predict when, in a large program, global variables are changed all over the place.
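    The behaviour above can be transcribed into the Scheme notation introduced in Section 5: Scheme is not a pure functional language and offers an assignment form, set!, which plays the role of := here (a sketch, for illustration only).

```scheme
;; Transcription of the imperative example; set! mutates the
;; global variable even at each call (a side-effect).
(define even #t)

(define (calculate value)
  (set! even (not even))
  (if even
      (+ value 1)
      (+ value 2)))

(display (calculate 6))   ; first call: even becomes #f, prints 8
(newline)
(display (calculate 6))   ; second call: even becomes #t, prints 7
(newline)
```

    The two syntactically identical calls print different values: exactly the context dependence discussed above.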

    One of the most important drawbacks of an imperative programming style is that an imperative program consists of a sequence of commands of which the dynamic behaviour must be known to understand how such a program works. The assignment causes problems because it changes the value (and often even the meaning) of a variable. Owing to side-effects it can happen that evaluating the same expression in succession produces different answers. Reasoning about the correctness of an imperative program is therefore very difficult.

    Furthermore, because of the command sequence, algorithms are more sequential than is necessary. Therefore it is hard to detect which parts of the algorithm can or cannot be executed concurrently. This is a pity, since concurrent evaluation seems to be a natural way to increase execution speed.
    A conjecture adopted by many researchers nowadays is that the software crisis and the speed problem are inherent to the nature of imperative programming languages and the underlying model of computation. Therefore, other styles of programming, such as object oriented, logical and functional styles of programming are investigated.



    3.  A functional programming style

       John Backus (1978) pointed out that the solution for the software problems has to be found in using a new discipline of programming: a functional programming style instead of an imperative one.
    In a functional program the result of a function call is uniquely determined by the actual values of the function arguments. No assignments are used, so no side-effects are possible. As a consequence, it makes no difference where and under what conditions a function is called. The result of a function will, under all conditions, be determined solely by the value of its arguments. For instance, in the program below the value printed will be 42 in both cases.

    FUNCTION increment (INTEGER value) : INTEGER;
    BEGIN
    value + 1
    END;
    …
    print(increment (41));
    …
    print(increment (41));
    
    It is much easier to reason about the result produced by a function that has no side-effects than about the result of an imperative procedure call with side-effects.
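    In the Scheme notation used from Section 5 onwards, the same function can be written and tried out; every call with argument 41 yields 42, wherever it occurs:

```scheme
;; increment has no side-effects: its result depends only on its argument.
(define (increment value) (+ value 1))

(display (increment 41)) ; prints 42
(newline)
(display (increment 41)) ; prints 42 again, in any context
(newline)
```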

    Advantages of functional programming languages
       So perhaps a functional programming style is important and side-effects, and hence the assignment statement, should be abandoned. But why should we use a functional programming language? Is it not possible to use the familiar languages? The common imperative languages also have functions, so why not restrict ourselves and use only the functional subset of, for instance, C, Algol or Modula-2?
    Well, one can use the functional subset of imperative languages (i.e. only using functions and leaving out the assignment), but then one is deprived of the expressive power of the new generation of functional languages that treat functions as ‘first class citizens’. In most imperative languages functions can only be used restrictively. An arbitrary function cannot be passed as an argument to a function nor yielded as result.

       Functional programming languages have the advantage that they offer a general use of functions which is not available in classical imperative languages. This is a fact of life, not a fundamental problem. The restricted treatment of functions in imperative languages is due to the fact that when these languages were designed people simply did not know how to implement the general use of functions efficiently. It is also not easy to change the imperative languages afterwards. For instance, the type systems of these languages are not designed to handle these kinds of function. Also the compilers have to be changed dramatically.

       Another advantage is that in most modern functional programming languages (FPLs) the functional programming style is guaranteed: the assignment statement is simply not available (just as GOTOs are not available in decent modern imperative languages). FPLs in which there are no side-effects or imperative features of any kind are called pure functional languages. Examples of pure functional languages are Miranda, LML (Augustsson, 1984), HOPE (Burstall et al., 1980), Haskell (Hudak et al., 1992) and Concurrent Clean (Nöcker et al., 1991b). LISP (McCarthy, 1960) and ML (Harper et al., 1986) are examples of well-known functional languages which are impure. From now on only pure aspects of FPLs are considered.

       In functional programming the relevant notions of imperative programming are absent (or, where they are present, have a different meaning).
    The relevant notions of functional programming are in fact:

    • expression
    • recursion - different from the notion of iteration
    • evaluation - different from the notion of execution (when we evaluate a mathematical expression we do not modify anything)
    • variable - intended in a mathematical sense (unknown entity, abstraction from a concrete value)

       In pure FPLs the programmer can only define functions that compute values uniquely determined by the values of their arguments. The assignment statement is not available, and nor is the heavily used programming notion of a variable as something that holds a value that is changed from time to time by an assignment. Rather, the variables that exist in purely functional languages are used as in mathematics: to name and refer to an as yet unknown constant value. This value can never be altered. In a functional style a desired computation is expressed in a static fashion instead of a dynamic one. Due to the absence of side-effects, program correctness proofs are easier than for imperative languages (see Section 11). Functions can be evaluated in any order, which makes FPLs suitable for parallel evaluation (see Section 6). Furthermore, the guaranteed absence of side-effects enables certain kinds of static analysis of a program.

       Besides the full availability of functions, the new generation functional languages also offer an elegant, user-friendly notation. Patterns and guards provide the user with simple access to complex data structures; basically one does not have to worry about memory management any more. Incorrect access of data structures is impossible. Functional programs are in general much shorter than their conventional counterparts and thus in principle easier to enhance and maintain.

       The question arises, is it possible to express any possible computation by using pure functions only? Fortunately, this question has already been answered years ago. One of the greatest advantages of functional languages is that they are based on a sound and well-understood mathematical model of computation: the λ-calculus (Church, 1932; 1933).
    One could say that functional languages are sugared versions of this calculus. The λ-calculus was introduced at approximately the same time as the Turing model (Turing, 1937). Church’s thesis (Church, 1936) states that the class of effectively computable functions, i.e. the functions that intuitively can be computed, is the same as the class of functions that can be defined in the λ-calculus. Turing formalized machine computability and showed that the resulting notion of Turing computability is equivalent to λ-definability. So the power of both models is the same. Hence any computation can be expressed using a functional style only.

    Disadvantages of functional programming languages
       The advantages mentioned above are very important. But although functional languages are being used more frequently, in particular as a language for rapid prototyping and as a language in which students learn how to program, functional languages are not yet commonly used for general purpose programming. The two main drawbacks lie in the fields of

  • efficiency and
  • suitability for applications with a strongly imperative nature.

       Firstly, until recently programs written in a functional language ran very, very slowly in comparison with their imperative counterparts. The main problem is that the traditional machine architectures on which these programs have to be executed are not designed to support functional languages. On these architectures function calls are relatively expensive, in particular when the lazy evaluation scheme (see Section 7) is used. Our computers are ideally suited for destructive updates as present in imperative languages (assignments), but these are conceptually absent in functional languages. It is therefore not easy to find an efficient compilation scheme. Another big problem is caused by the fact that for some algorithms, due to the absence of destructive updates, the time and space complexity can be much worse for a functional program than for its imperative equivalent. In such a case, to retain the efficiency, program transformations are necessary.
       By using several new (and old) compilation techniques, the efficiency of (lazy) functional programs can nowadays be made acceptable in many (but certainly not all) cases. New compilation techniques have been developed at several research institutes. These techniques are quite complex. Commercial compilers are therefore not widely available yet. This will soon change. Anyhow, in our opinion, the advantages of functional languages are so important that some loss of efficiency is quite acceptable. One has to keep in mind that decades ago we accepted a loss of efficiency when we started to use high-level imperative languages instead of machine assembly languages.
       Secondly, a very important drawback of functional languages was that some algorithms could not be expressed elegantly in a functional programming style. In particular, this seemed to hold for applications that strongly interact with the environment (interactive programs, databases, operating systems, process control). But, the problem is largely caused by the fact that the art of functional programming is still in development. We had and still have to learn how to express the different kinds of applications elegantly in a functional style. We now know that strongly interactive applications can be expressed very elegantly in a functional programming style. One example is the way interactive programs that use windows, dialogs, menus and the like can be specified in a precise and elegant way in many functional languages, like Clean and others.
       The advantages of a functional programming style are very important for the development of reliable software. The disadvantages can be reduced to an acceptable level. Therefore we strongly believe that one day functional languages will be used worldwide as general purpose programming languages.



    4.  Functions in mathematics and functions as programs

       In mathematics, there are several ways to define a function. The type of a function can be specified separately from the function definition. One way to define a function is by explicit enumeration of all objects in the domain on which the function is defined with their corresponding images. An example of this is the following partial function (domain names Z and N are used for the domains of integers and natural numbers).

    abs: Z → N
    abs(–1) = 1
    abs( 0) = 0
    abs( 1) = 1

       Another way to define functions is by using definitions that consist of one or more (recursive) equations. For example, with this method the abs-function above can easily be defined as a total function, applicable for all objects in the domain. Of course, the functions and operators used on the right-hand side must be defined on the appropriate domains.

    abs: Z → N
    abs(n) = n,  n > 0
           = 0,  n = 0
           = –n, n < 0

    A function like factorial can be defined as follows:

    fac: N → N
    fac(n) = 1,              n = 0
    fac(n) = n * fac(n – 1), n > 0

    or an alternative definition method is:

    fac: N → N
    fac(n) = if (n = 0) then 1 else n * fac(n – 1)

       A mathematician would consider the definitions above as very common, ordinary function definitions. But these examples are also perfect examples of function definitions in a functional programming language. Notationally a function definition in a functional language has many similarities with a function definition in mathematics. However, there is an important difference in objective. The objective in a functional language is not only to define a function, but also to define a computation that automatically computes the image (result) of a function when it is applied to a specific object in its domain (the actual argument of the function).
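    For instance, the mathematical definitions of abs and fac above transcribe almost literally into Scheme (here the case analysis is rendered with cond, and the name abs-n is ours, chosen to avoid clashing with Scheme's built-in abs):

```scheme
;; Case-by-case definition, mirroring the mathematical one.
(define (abs-n n)
  (cond ((> n 0) n)
        ((= n 0) 0)
        (else (- n))))

;; The if-based definition of factorial.
(define (fac n)
  (if (= n 0)
      1
      (* n (fac (- n 1)))))

(display (abs-n -5))  ; prints 5
(newline)
(display (fac 4))     ; prints 24
(newline)
```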
    Some function definitions, well-defined from a mathematical point of view, cannot be defined similarly in a functional language, because the images of some functions are very difficult to compute or even cannot be computed at all.

    Consider, for example, the following function definition:

    halting: All_Programs → N
    halting(p) = 1, if the execution of p will stop
               = 0, otherwise

    The halting function specified above is not computable, and therefore an attempt to express it in an FPL will not produce the desired computation. Suppose that a function in an FPL tried to calculate the image of the halting function for an arbitrary program. The only way of doing this is more or less to run the program. But then the function would simply not terminate whenever the argument program does not terminate, in which case the result 0 would never be produced.

    For another example, consider:

    f: R → R,   g: R → R
    f'' – 6g' = 6 sin x
    6g'' + a²f' = 6 cos x
    f(0) = 0,  f'(0) = 0,  g(0) = 1,  g'(0) = 1

    The equations for f, g and their derivatives f ', f '', g' and g'' are solvable, but it is not easy to compute such functions.


        Some special purpose programming languages are able to calculate functions by applying special purpose calculation techniques (symbolic computation using computer algebra techniques or formula transformations). But a general purpose functional programming language uses a very simple model of computation, based on substitution. So when functions are defined in an FPL a computation through substitutions is defined implicitly (see Section 6).

    In a sense, a function definition in an FPL, besides identifying a precise mathematical function, implicitly describes a way to compute it (an algorithm).
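    To illustrate: the two Scheme definitions below denote the same mathematical function but describe different algorithms. The second (fac-acc is our hypothetical name) carries an accumulator argument and so performs the multiplications in a different order, via a different sequence of substitutions.

```scheme
;; Naive factorial: multiplications happen while returning from the calls.
(define (fac n)
  (if (= n 0) 1 (* n (fac (- n 1)))))

;; Accumulator version: the partial product is built on the way down.
(define (fac-acc n acc)
  (if (= n 0) acc (fac-acc (- n 1) (* n acc))))

(display (fac 5))       ; prints 120
(newline)
(display (fac-acc 5 1)) ; prints 120 as well
(newline)
```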



    5.  Functional programs

    A program written in a functional language consists of a collection of function definitions, possibly recursive, and an initial expression that has to be evaluated.

    From now on, for our running examples, we shall use the functional programming language Scheme.


    In Scheme a function definition consists of an expression of the form

    (define (function-name arg1 .. argn) body)
    
    The part immediately after the keyword "define" defines the function name and its formal arguments (also called formal parameters). The expression body specifies the function result. Such an expression can be a denotation of some value, or it can be a formal argument, or a function application.

    In a function application a function is applied to an expression, the actual argument. The application of a function f to an expression a is represented, in Scheme, as
    (f a)
    
    So function application is denoted by writing the function and its argument side by side, enclosed in parentheses.

    Below are some examples of function definitions in Scheme. The ";;" indicates that the rest of the line is a comment.
    (define (ident x)  x)   
    ;; the identity function on numbers
    
    (define (alwaysseven x) 7)  
    ;;  a function from num to num that yields 7, 
    ;; independent of arg. x
    
    (define (inc x) (+ x 1))  
    ;;  a function that returns the value of its arg. + 1
    
    (define (square x) (* x x))  
    ;; square function
    
    (define (squareinc x) (square (inc x)))     
    ;; square increment of argument
    
    (define (fac x) (if (= x 0) 1 (* x (fac (- x 1)))))   
    ;; the factorial function
    

    Notice: in Scheme all functions are applied using prefix notation.

       In the last example if is a predefined operator with three arguments: a boolean, a then-expression and an else-expression. Its semantics corresponds to the conditional choice of imperative languages.

       A formal argument, such as x in the examples above, is also called a variable. The word variable is used here in the mathematical sense, not to be confused with the use of the word in imperative languages. This variable does not vary: its scope is limited to the definition in which it occurs (whereas the defined function names have the whole program as their scope).
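    For example, two definitions may use the same variable name without any interference; each x below is local to its own definition:

```scheme
(define (double x) (* 2 x))   ; this x ...
(define (triple x) (* 3 x))   ; ... is unrelated to this one

(display (double 5)) ; prints 10
(newline)
(display (triple 5)) ; prints 15
(newline)
```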
    Functions defined as above are called user-defined functions. One can also denote and manipulate objects of certain predefined types with given predefined operators. These predefined basic operators can be regarded as predefined functions.

    The initial expression is the expression whose value has to be calculated. For example, in the case that the value of   2 + 3   has to be calculated, the initial expression
    (+ 2 3)
    
    is written. But one can also calculate any application of user-defined functions:
    (squareinc 5)
    



    6.  The evaluation of a functional program

       The execution of a functional program consists of the evaluation of the initial expression in the context of the function definitions the program is made of; such a context is called the environment.

       An example of a functional program: a set of function definitions and an initial expression.

    (define (ident x)  x)   
    
    (define (inc x) (+ x 1)) 
    
    (define (square x) (* x x)) 
    
    (define (squareinc x) (square (inc x))) 
    
    (squareinc 7)
    
    Running such a program means first having all the define-expressions evaluated (their evaluation enriches the environment with new user-defined functions) and then having the initial expression evaluated (of course the define-expressions are evaluated once and for all).
    For instance, when we open the DrScheme implementation of Scheme, two windows appear. We write all the define-expressions in the upper window and then ask the interpreter to evaluate them (roughly, the interpreter is the program that evaluates Scheme expressions).
    In the lower window a prompt '>' appears (showing that the interpreter is ready to evaluate an expression). We write our initial expression after the prompt and the interpreter returns its value (if any):
    > (squareinc 7)
    64
    > 
    
       Let us now see how the Scheme interpreter evaluates an expression.
    The evaluation of the initial expression performed by the (interpreter of the) language consists of repeatedly performing reduction or rewriting steps.
    In each reduction step (indicated by a ‘→’) a function application in the expression is replaced (reduced, rewritten) by the corresponding function body, substituting the formal arguments by the corresponding actual arguments. A (sub)expression that can be rewritten according to some function definition is called a redex (reducible expression). The basic idea is that the reduction process stops when none of the function definitions can be applied any more (there are no redexes left) and the initial expression is in its most simple form, the normal form. This is the result of the functional program that is then printed out.

       For instance, given the function definitions above (the environment), the initial expressions below can be evaluated (reduced) as follows.

    (ident 42) → 42

    (square (+ 1 2)) → (square 3) → (* 3 3) → 9

    (squareinc 7) → (square (inc 7)) → (square (+ 7 1)) → (square 8) → (* 8 8) → 64

    However, the initial expression may not have a normal form at all. As a consequence, the evaluation of such an initial expression will not terminate. In some functional languages (not in Scheme), infinite computations may produce partial results that will be printed as soon as they are known.

    Example of a non-terminating reduction. Take the following definition:

    (define (inf x) (inf x))
    
    then the evaluation of the following initial expression will not terminate:

    (inf 2) → (inf 2) → (inf 2) → ...



    7.  The order of evaluation

       Because there are in general many redexes in one expression, one can perform rewrite steps in several orders. The actual order of evaluation is determined by the reduction strategy which is dependent on the kind of language being used. A reduction strategy is a policy used by an interpreter to choose the next redex to be reduced among the many possible ones.

       There is a very important thing to know about the ordering of reduction steps. Due to the absence of side-effects, the result of a computation does not depend on the chosen order of reduction. Once all redexes have been eliminated and the initial expression is in normal form, the result of the computation (if any) will always be the same: the normal form is unique (see the Confluence Property and its Corollary in Section 13 for a formal proof of this fact).
    For instance, one can compute one of the previous expressions in a different order, but the result is identical:

    (square (+ 1 2)) → (* (+ 1 2) (+ 1 2)) → (* 3 (+ 1 2)) → (* 3 3) → 9

       It is sometimes even possible to rewrite several redexes at the same time. This forms the basis for parallel evaluation.
    Reducing several redexes at the same time:

    (square (+ 1 2)) → (* (+ 1 2) (+ 1 2)) → (* 3 3) → 9


       However, the order is not completely irrelevant. Some reduction orders may not lead to the normal form at all. So a computation may not terminate in one particular order while it would terminate when the right order was chosen.

    Example of a non-terminating reduction order.
    Assume that the following (recursive) functions are defined:

    (define (inf x) (inf x))
    
    (define (alwaysseven x) 7)
    
    Now it is possible to repeatedly choose the ‘wrong’ redex which causes an infinite calculation:

    (alwaysseven (inf 2)) → (alwaysseven (inf 2)) → ...

    In this case another choice would lead to termination and to the unique normal form:

    (alwaysseven (inf 2)) → 7

       The reduction strategy followed by the interpreter depends on the kind of Functional Programming Language. In some languages, e.g. LISP, ML and HOPE, the arguments of a function are always reduced before the function application itself is considered as a redex. These languages are called eager or strict languages. The reduction strategies they use are called call-by-value strategies.
    In most recent FPLs, e.g. Miranda and Haskell, the rewriting is done lazily. In lazy functional languages the value of a subexpression (redex) is calculated if and only if this value must be known to find the normal form. The lazy evaluation order will find the normal form if it exists. The strategies used by such languages belong to the class of call-by-name strategies.

    Illustrating lazy rewriting:

    (alwaysseven (inf 2)) → 7


       A more complex example to illustrate lazy evaluation is shown below. It is the sort of evaluation that is performed by interpreters like the one of Haskell.
    The predefined conditional function demands the evaluation of its first argument to make a choice between the then and else parts possible. The equality function forces evaluation in order to yield the appropriate Boolean value. Multiplication and subtraction are only possible on numbers; again evaluation is forced.

    (fac 2)
    → (if (= 2 0) 1 (* 2 (fac (- 2 1))))
    → (if FALSE 1 (* 2 (fac (- 2 1))))
    → (* 2 (fac (- 2 1)))
    → (* 2 (if (= (- 2 1) 0) 1 (* (- 2 1) (fac (- (- 2 1) 1)))))
    → (* 2 (if (= 1 0) 1 (* (- 2 1) (fac (- (- 2 1) 1)))))
    → (* 2 (if FALSE 1 (* (- 2 1) (fac (- (- 2 1) 1)))))
    → (* 2 (* (- 2 1) (fac (- (- 2 1) 1))))
    → (* 2 (* 1 (fac (- (- 2 1) 1))))
    → (* 2 (* 1 (if (= (- (- 2 1) 1) 0) 1 (* (- (- 2 1) 1) (fac (- (- (- 2 1) 1) 1))))))
    → ... → (* 2 (* 1 1)) → (* 2 1) → 2

       Scheme is a strict language. It evaluates expressions using a call-by-value reduction strategy, which means that the application of a function to an argument is a redex only if the argument is a value.
    Let us see how the Scheme interpreter evaluates the expression (fac 2).

    (fac 2)
    → (if (= 2 0) 1 (* 2 (fac (- 2 1))))
    → (if FALSE 1 (* 2 (fac (- 2 1))))
    → (* 2 (fac (- 2 1)))
    → (* 2 (fac 1))
    → (* 2 (if (= 1 0) 1 (* 1 (fac (- 1 1)))))
    → (* 2 (if FALSE 1 (* 1 (fac (- 1 1)))))
    → (* 2 (* 1 (fac (- 1 1))))
    → (* 2 (* 1 (fac 0)))
    → (* 2 (* 1 (if (= 0 0) 1 (* 0 (fac (- 0 1))))))
    → (* 2 (* 1 (if TRUE 1 (* 0 (fac (- 0 1))))))
    → (* 2 (* 1 1))
    → (* 2 1) → 2

    Notice how, even in a strict language like Scheme, the two branches of the conditional operator if are not both evaluated before the conditional is reduced: if is a special form, not an ordinary function, since otherwise the computation of a recursive function like fac would never stop.
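To see why if cannot be an ordinary (strict) function, we can try to define our own conditional as a function and use it to write the factorial (my-if and fac2 are names chosen here just for this illustration). Since Scheme evaluates all the arguments of a function application before applying the function, the recursive branch is evaluated at every call, and the base case is never reached:

```scheme
; my-if is an ordinary function: all three of its arguments
; are evaluated before its body is entered.
(define (my-if c t e)
  (if c t e))

; On non-recursive arguments it behaves exactly like if:
(my-if #t 1 2)   ; → 1

; But a factorial written with it diverges: evaluating
; (fac2 0) first requires evaluating (fac2 -1), and so on.
(define (fac2 n)
  (my-if (= n 0)
         1
         (* n (fac2 (- n 1)))))
; (fac2 2) does not terminate!
```

This is why if must be a special form, whose branches are evaluated only after the test.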



    8.  Recursion

       In imperative languages the most relevant control structure is iteration. In functional languages, instead, it is recursion that gives computational power to programs. Without recursion we could compute almost nothing in pure functional languages.

    Recursion is the natural way to define functions on recursively defined data (also called inductively defined data, or algebraic data).

    Defining recursive functions, rough intuition: a rough idea of how to look at a recursive function definition is the following.
    When we define a function by recursion we first try to look at the elements of the domain as "built" out of "simpler" parts (or, if you prefer, "smaller" parts). Then we try to express the value of the function on a generic element in terms of the value of the function on its "simpler" parts.
    We also need to provide the value of the function on the "simplest" ("smallest") elements.
    Finally, we must be sure that, starting from a generic element, it is not possible to keep obtaining simpler and simpler (smaller and smaller) elements forever.

    The easiest inductive data type is obviously that of the natural numbers: a natural number is either zero or the successor of a natural number (you see, a recursively defined data type). So the natural numbers have only one base element (zero) and one unary constructor (the successor operator).

    Let us define a simple recursive function on numbers, which sums up all the natural numbers from n down to 0.

    (define (sumdowntozero n)
          (if (= n 0)
              0
              (+ n (sumdowntozero (- n 1)))))
    

    You will notice that the base case here is 0, but it could be different, as in this other example, where we sum up all the numbers in the interval [from..to], starting from the larger one.

    (define (sumdownto from to)
          (if (= from to)
              from 
              (+ to  (sumdownto from (- to 1)))))
    

    Notice that we must always keep in mind that a recursive function must perform its recursive calls on "smaller" elements. For numerical inputs this means "smaller numbers".

    But then, what about the following function, which returns the same result as the previous function?

    (define (sumfromupto from to)
          (if (= from to)
              to
              (+ from (sumfromupto (+ from 1) to))))
    

    Well, it works. And it is still true that the recursive calls are on "smaller" elements down to the base case. It is just that now the "smaller than" relation is not the usual one on natural numbers (it is actually the opposite relation). We could be more precise if we studied well-founded sets.



    9.  Higher-order functions

    Compared with traditional imperative languages, which normally also allow the declaration of functions, functional programming languages have a more general view of the concept of function: functions are treated as ‘first-class citizens’, i.e. just like any other object in the language. As a consequence, functions can have functions as arguments and as results.

    Functions that have functions as actual arguments or yield a function as result are called higher-order functions, in contrast to first-order functions, which have only non-function values as argument or as result.

    Example of a higher order function that takes a function as argument:

    (define (atzero f) (f 0))
    
    Let us see how the Scheme interpreter evaluates the following initial expressions (inc, square and ident are taken to be defined as before):

    (atzero inc) → (inc 0) → (+ 0 1) → 1

    (atzero square) → (square 0) → (* 0 0) → 0

    (atzero ident) → (ident 0) → 0


       In order to better understand the notion of higher-order functions, let us see a few functions that take functions as arguments.
    Let g be a function on natural numbers, g: N -> N, which we assume to be computed by a Scheme function called G already defined.
    We want to write a program that takes a natural number n as input and computes the sum, for i = 1... n, of the g(i)'s.

    (define (SG n) 
        (if (= n 0) 
            0 
            (+ (G n) (SG (- n 1)))))
    

       The program PG that, instead, computes the product of the elements g(i)'s, will be:

    (define (PG n) 
        (if (= n 0) 
            1 
            (* (G n) (PG (- n 1))))) 
    

       Both the above programs can be generalized. Let us define the functions Sgen and Pgen that, besides taking as input a natural number n, take also the function on which the sums or the products are performed.

    (define (Sgen f n) 
        (if (= n 0) 
            0 
            (+ (f n) (Sgen f (- n 1))))) 
    

    (define (Pgen f n) 
        (if (= n 0) 
            1 
            (* (f n) (Pgen f (- n 1))))) 
    

    Now, to compute the sum or the product of g(i) for i = 1...10, we simply have to evaluate the initial expression (Sgen G 10) or (Pgen G 10), respectively.

    We now try to generalize further: we abstract both from the binary function applied to the various f(i)'s and from the base index. (Example: computing F over the f(i)'s for i = 3..5 corresponds to F(F(f(3), f(4)), f(5)); the following function does it in general.)

    (define (Ftoria F f b n) 
        (if (= n b) 
            (f b) 
            (F (f n) (Ftoria F f b (- n 1)))))
    
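As a quick check of how Ftoria works, here is a small self-contained example (the definition is repeated so that the snippet can be run on its own): folding + over the squares of 1..3 computes (+ 9 (+ 4 1)).

```scheme
(define (Ftoria F f b n)
    (if (= n b)
        (f b)
        (F (f n) (Ftoria F f b (- n 1)))))

; sum of i*i for i = 1..3: (+ 9 (+ 4 1)) = 14
(Ftoria + (lambda (x) (* x x)) 1 3)   ; → 14

; product of i for i = 1..4: 4*3*2*1 = 24
(Ftoria * (lambda (x) x) 1 4)         ; → 24
```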

       Now with this new function at hand we can easily define Sgen and Pgen as follows

    (define (Sgen f n) 
        (Ftoria + f 1 n)) 
    

    (define (Pgen f n) 
        (Ftoria * f 1 n))

    (Notice that the base index passed to Ftoria must be 1, not 0: since (Ftoria F f b n) includes (f b) in the result, base 0 would add the extra term f(0) to the sum and the extra factor f(0) to the product. With base 1 these definitions agree with the previous ones for every n ≥ 1.)
    

    We could generalize Ftoria even further, by abstracting over the predecessor function of the linear order on the finite set over which f is defined, and over the equality relation on the elements of the set:

    (define (FtoriaOrd F f low up pred eq) 
        (if (eq up low) 
            (f up) 
            (F (f up) (FtoriaOrd F f low (pred up) pred eq)))) 
    

    Well... maybe with FtoriaOrd we have gone a bit too far with the generalization... but you will have noticed how very general functions can be defined, and how other functions can easily be defined out of them.



    It is possible to have functions in which both the arguments and the result are functions.
    One example is the function compose: it takes two functions as input and returns their composition.
    Notice that in order to define compose we need to be able to write an expression representing a function without giving a name to it. That is, we need to be able to describe anonymous functions, functions without a name.
    In Scheme the syntax for anonymous functions is the following one:

    (lambda (arg1...argn) body)
    
    An expression like the one above represents a function that, given n arguments arg1...argn, returns the value of body as result (of course after replacing the formal arguments in it with the actual arguments).

    We can now easily define the compose function

    (define (compose f g)
      (lambda (x) (f (g x))))
    
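For instance, assuming inc and square defined as before (repeated here so that the example is self-contained), composing them gives the function that squares its argument and then adds 1:

```scheme
(define (compose f g)
  (lambda (x) (f (g x))))

(define (inc x) (+ x 1))
(define (square x) (* x x))

((compose inc square) 3)   ; → (inc (square 3)) → 10
((compose square inc) 3)   ; → (square (inc 3)) → 16
```

Notice that the order of composition matters: (compose f g) applies g first.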

       By means of anonymous functions we are also able to define Sgen using FtoriaOrd, in the following way.

    (define (Sgen f n) 
        (FtoriaOrd + f 1 n (lambda (x)(- x 1)) =))
    

    One more example: we define a higher-order function returning a function (from natural numbers to natural numbers) identical to the one given as argument except at one point, for which a new value is provided.

    (define (change-value-at-n  f  n  new-value)
        (lambda (x)
            (if (= x n)
                new-value
                (f x))))
    
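For example, modifying the identity function at the point 3 (almost-ident is a name chosen here for illustration; the definition is repeated so the snippet stands alone):

```scheme
(define (change-value-at-n f n new-value)
    (lambda (x)
        (if (= x n)
            new-value
            (f x))))

; the identity function, changed so that it returns 100 at 3
(define almost-ident
  (change-value-at-n (lambda (x) x) 3 100))

(almost-ident 3)   ; → 100
(almost-ident 5)   ; → 5
```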



    Definitions
    Up to now we have used define-expressions to define functions. Actually, the keyword define can be used more generally to associate, in the environment, a name to any Scheme value.

    (define a (+ 3 4))
    
    The evaluation of the expression above associates in the environment the value 7 to the name "a". In fact if we now evaluate
    (* a 2)
    
    we get 14.
    The general syntax of a define-expression is
    (define name expr)
    
    The evaluation of such an expression consists in evaluating the expression expr and in associating its value to the identifier "name" in the environment.
    Notice that in Scheme functions are values, in fact our way to define a function:
    (define (function-name arg1 .. argn) body)
    
    is equivalent to writing
    (define function-name (lambda (arg1 .. argn) body))
    



    10.  Data Structures: Lists

       In imperative languages one can define global data structures that are globally accessible for reading and writing during execution. Since this is not possible in functional languages, the use of data structures in these languages is quite different. After creation of a data structure, the only possible access is read access. Data structures cannot be overwritten. Furthermore, data structures are not globally available but they must always be passed as arguments to the functions that need them. Modern functional languages allow the definition of complex data structures.
    In this section we treat lists. Lists are a data structure predefined in all functional programming languages. Lists are recursive data structures; in fact the definition of list is recursive: a list is either the empty list or it is formed by an element followed by a list. It is then obvious that lists, being a recursively defined data structure, can be easily handled by means of recursion, the main control mechanism of functional languages.
    In Scheme lists are not uniform, i.e. it is possible to have lists of elements of different types, unlike many other functional languages, like Haskell.

    Lists, like any data structure in an FPL, are built using data constructors. Lists are constructed using two of them: a constant data constructor denoting the empty list, empty (in certain implementations the empty list can also be denoted by '() ), and a constructor cons that, given a value v and a list L, returns the list formed by v followed by L.

    The expression

    (cons 1 (cons 2 (cons 3 empty)))
    
    denotes the list formed by the numbers from 1 to 3.

    Here is a list of four boolean elements:
    (cons #t (cons #t (cons #f (cons #t empty))))
    
    A list of non uniform elements:
    (cons #t (cons 2 (cons #f empty)))
    
    In order to avoid writing too many cons's when building a list, one can use the predefined function list. The three lists above can be represented by the following expressions:
    (list 1 2 3)
    
    (list #t #t #f #t)
    
    (list #t 2 #f)
    
    Here is what the interpreter returns by evaluating the above expressions:
    (list 1 2 3) → (1 2 3)
    (list #t #t #f #t) → (#t #t #f #t)
    (list #t 2 #f) → (#t 2 #f).


       Since a list of values is a value itself, we can form lists having other lists as elements:
    (cons 3 (cons (cons 2 (cons #t empty)) (cons #f empty)))
    
    which is equivalent to
    (list 3 (list 2 #t) #f)
    
    The evaluation of the above expression returns the list (3 (2 #t) #f).


       Of course we can give names to lists by means of define-expressions

    (define Karaosmanglu 
       (cons 3 (cons (cons 2 (cons #t empty)) (cons #f empty))))
    
    (define Pamuk (list #t 2 #f))
    


    We can "build" lists by means of constructors, but if we wish them to be of any use we need to be able to handle the parts they are made of. For this purpose there are the predefined selectors car and cdr.
    car is an operator that takes a list as input and returns its first element. cdr takes a list and returns the same list without its first element.

    (car (list 1 2 3 4 5 6))
    
    returns 1.
    (car Pamuk)
    
    returns #t.
    (cdr Pamuk) 
    
    returns (2 #f).

    This way of handling lists is very user-friendly. One does not need to worry about ‘memory management’ nor about ‘updating pointers’.

    In order to handle lists in programs, we need also predicates on them.
    A predicate is a function that returns boolean values (by notational convention in Scheme the name of a predicate usually ends with the character '?'). In Scheme the predefined predicate on lists is null?: when applied to a list it returns #t if and only if the argument is the empty list. It will be very useful in defining recursive functions on lists.

    Let us now write a few recursive programs dealing with lists.

    Recursion on (flat) lists

    By a flat list we mean a list where no element is itself a list.

    Let us see the apnd function that concatenates the two lists given as input (we call it "apnd" since the function append is predefined in most Scheme implementations).

     (define apnd (lambda (ls1 ls2)
         (if (null? ls1)
             ls2
             (cons (car ls1) (apnd (cdr ls1) ls2)))))
    

    Exercise: Program the reverse function.
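(One possible solution, given here only as a sketch: it reuses apnd, repeated below so the snippet is self-contained, and is therefore quadratic in the length of the list; an accumulator-based version would be linear. We call it rvrs since reverse is predefined in most implementations.)

```scheme
(define (apnd ls1 ls2)
    (if (null? ls1)
        ls2
        (cons (car ls1) (apnd (cdr ls1) ls2))))

; reverse a list: reverse the tail, then append the head at the end
(define (rvrs ls)
    (if (null? ls)
        '()
        (apnd (rvrs (cdr ls)) (list (car ls)))))

(rvrs (list 1 2 3))   ; → (3 2 1)
```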

    The apnd function is an example of a function defined by recursion on the length of the list; informally, this means that the recursive calls are on shorter lists than the input one.

    Let's define the function (predicate) member?. It checks whether a number is in the list of numbers given as input.

    (define (member? elem ls)
        (if (null? ls)
            #f
            (if (= elem (car ls))
                #t
               (member? elem (cdr ls)))))
    

    We can use boolean operators in order to get a simpler and clearer definition. Boolean operators are very helpful to define predicates.

    Notice: the boolean operators or and and of Scheme can actually take an arbitrary number of expressions as arguments. The arguments are evaluated in sequence:
    and returns #f as soon as the evaluation of one of its arguments returns #f, without evaluating the remaining ones;
    or returns a true value as soon as the evaluation of one of its arguments returns a value different from #f.

    (define (member? elem ls)
       (and (not (null? ls))
            (or (= elem (car ls))
                (member? elem (cdr ls)))))
     

    Exercise: write functions to sort a list using the different sorting algorithms you know.
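(As a hint for the exercise, here is a sketch of one of the simplest algorithms, insertion sort on lists of numbers; insert and isort are names chosen here. insert places an element in the right position of an already sorted list.)

```scheme
; insert an element into an already sorted list of numbers
(define (insert x ls)
    (if (null? ls)
        (list x)
        (if (<= x (car ls))
            (cons x ls)
            (cons (car ls) (insert x (cdr ls))))))

; recursion on the length of the list: sort the tail,
; then insert the head in the right position
(define (isort ls)
    (if (null? ls)
        '()
        (insert (car ls) (isort (cdr ls)))))

(isort (list 3 1 2))   ; → (1 2 3)
```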

       Now let's consider another example of recursion on the length of a list: a function summing up all the numbers present in a list of numbers.

    (define (Sumlist ls)
         (if (null? ls)
             0
             (+ (car ls) (Sumlist (cdr ls)))))
    

    Intuition: how did we manage to program this function? Well, first we provided its result (0) in case the input is the basic element of the domain (the empty list). Then, assuming that our program works correctly on lists that are "smaller" than the input, (Sumlist (cdr ls)), we "built" the result for the generic input ls: (+ (car ls) (Sumlist (cdr ls))).
    We considered (cdr ls) to be "smaller" than ls.
    But why do we have to use "smaller" inputs for the recursive calls?
    In order to guarantee that the computation eventually stops. In fact we cannot go on forever calling the function on smaller and smaller elements: sooner or later we reach the empty list, on which the function is directly defined.
    The problem is how we can be sure that it is not possible to have an infinite sequence of smaller and smaller elements.
    For the present function there is no problem, since a "smaller" input consists in a shorter list, and it is impossible to have an infinite sequence of shorter and shorter lists.
    But we shall see that it is not always so simple....

    So, we need
    (1) to provide the result for the simplest elements of the domain;
    (2) to provide the result for all the other possible "shapes" of the input, using the assumption that the program works correctly on arguments "smaller" than the input;
    (3) to be sure that there exists no infinite sequence of smaller and smaller elements

    Point (3) above can be formally proved by using the notion of well-founded precedence relation.
    We do not study such a notion in the present very introductory notes.


    Let us write a function that sums up the numbers in a list under the constraint that the first argument of an addition must always be 1 (it is a weird and inefficient way to sum up the elements of a list, but in our examples we do not care about efficiency; we simply wish to practice with recursion).

    (define (Sumlistweird ls)
         (cond ((null? ls) 0)
               ((= (car ls) 0) (Sumlistweird (cdr ls)))
               (else (+ 1 
    	            (Sumlistweird (cons (- (car ls) 1) 
    		                        (cdr ls)))))))
    

    Intuition: in this case we provided the value for the simplest element and for all the possible shapes of the input different from the empty list
    (in an informal notation):
    Sumlistweird emptylist = 0
    Sumlistweird (cons 0 (cdr ls)) = Sumlistweird (cdr ls)
    Sumlistweird (cons n (cdr ls)) = 1 + (Sumlistweird (cons (- n 1) (cdr ls)))

    We assumed that Sumlistweird works correctly on elements "smaller" than ls. But now we are considering both (cdr ls) and (cons (- n 1) (cdr ls)) to be smaller than ls, even though (cons (- n 1) (cdr ls)) has the same length as ls!
    Actually this is not a problem, we need only to be sure that, according to this particular notion of "being smaller" there is no infinite sequence of smaller and smaller lists. The notion of well-founded precedence relation could help us to get a formal proof of that.

    Recursion on nested lists

    In Scheme lists are not homogeneous, so we can have anything in a list, including other lists.

    One-level nesting
    Let us consider a function on lists of lists of numbers.
    The following function returns the sum of all the numbers present in a list of lists of numbers. (We call lsls the argument, in order to recall that it is a list of lists (of numbers)).
    Notice that in Scheme (caar exp) stands for (car (car exp)).
    Moreover, (cdar exp) stands for (cdr (car exp)), and so on.

    (define (Sumlistlist lsls)
        (if (null? lsls)
            0
            (if (null? (car lsls))
                (Sumlistlist (cdr lsls))
                (+ (caar lsls) 
    	       (Sumlistlist (cons (cdar lsls) 
    	                           (cdr lsls)))))))
    

    Needless to say, the above function could be defined far more easily by means of the functions Sumlist and map (do it as an exercise).
    But now we are not interested in the best ways to make sums; we are just playing with recursion in its various forms.
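(One possible solution of the exercise just mentioned, with Sumlist repeated so the snippet stands alone, and Sumlistlist2 as a name chosen to avoid clashing with the version above: map applies Sumlist to every inner list, and the outer Sumlist adds up the partial sums.)

```scheme
(define (Sumlist ls)
     (if (null? ls)
         0
         (+ (car ls) (Sumlist (cdr ls)))))

; sum a list of lists of numbers: sum each inner list, then sum the sums
(define (Sumlistlist2 lsls)
    (Sumlist (map Sumlist lsls)))

(Sumlistlist2 (list (list 1 2) (list 3) '()))   ; → 6
```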

    Now we write a "weird" version of Sumlistlist: a version where "+" can be used only with 1 as first argument.

    (define (Sumlslsweird ls)
       (cond ((null? ls) 0)
             ((null? (car ls)) (Sumlslsweird (cdr ls)))
             ((= (caar ls) 0) (Sumlslsweird (cons (cdar ls) 
    	                                      (cdr ls))))
             ( else (+ 1 
    	        (Sumlslsweird (cons (cons (- (caar ls) 1) 
    		                          (cdar ls)) 
                                        (cdr ls)))))))
    

    This is an example of triple induction: on the length of the list, on the length of the first element and on the numerical value of the first number of the first element of the list.


    Multiple levels of nesting
    We consider now lists containing numerical elements or other lists at any level of nesting.

    Let us define the function (actually a predicate) genmember?.
    Such a function checks whether a number is in a list containing numbers at any level of nesting.

    (define (genmember? elem ls)
        (if (null? ls)
             #f
            (if (list? (car ls))
                (or (genmember? elem (car ls))
                    (genmember? elem (cdr ls)) )
                (or (eqv? elem (car ls))
                    (genmember? elem (cdr ls))))))
    
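A couple of sample calls (with the definition repeated so the snippet is self-contained): 3 is found inside a doubly nested list, while 5 is not present at any level.

```scheme
(define (genmember? elem ls)
    (if (null? ls)
        #f
        (if (list? (car ls))
            (or (genmember? elem (car ls))
                (genmember? elem (cdr ls)))
            (or (eqv? elem (car ls))
                (genmember? elem (cdr ls))))))

(genmember? 3 (list 1 (list 2 (list 3)) 4))   ; → #t
(genmember? 5 (list 1 (list 2 (list 3)) 4))   ; → #f
```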


        We write now a function that sums all the numbers present in the input (a list containing numerical elements or other lists at any level of nesting.)

    (define (Gensumls llss)
        (if (null? llss)
            0
            (if (list? (car llss))
                (+ (Gensumls  (car llss))
                   (Gensumls  (cdr llss)) )
                (+ (car llss) (Gensumls  (cdr llss))))))
     

    Notice that the recursion is not on the level of nesting: in fact (cdr llss) could have the same level of nesting as llss.



    If you know the game of the Towers of Hanoi, the following program computes the sequence of moves that solves it.

    (define (tower-of-hanoi n peg1 peg2 peg3)
       (if (= n 0)
           '()
           (append (tower-of-hanoi (- n 1) 
                                   peg1 
    			       peg3 
    			       peg2)
                   (cons (list peg1 peg3) 
                         (tower-of-hanoi (- n 1) 
    		                     peg2 
    				     peg1 
    				     peg3)))))
    

    In this definition the problem is specified by giving the number of disks and the names of the three pegs (which could also be identified by numbers); it is assumed that the first peg is the one where all the disks are at the beginning, and the third one is the peg the disks have to be moved to.
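A small run (with the definition repeated so the snippet stands alone): moving two disks from peg a to peg c produces the list of moves ((a b) (a c) (b c)), each move being a pair (from to).

```scheme
(define (tower-of-hanoi n peg1 peg2 peg3)
   (if (= n 0)
       '()
       (append (tower-of-hanoi (- n 1) peg1 peg3 peg2)
               (cons (list peg1 peg3)
                     (tower-of-hanoi (- n 1) peg2 peg1 peg3)))))

(tower-of-hanoi 2 'a 'b 'c)   ; → ((a b) (a c) (b c))
```

The recursion mirrors the classical argument: move n-1 disks out of the way, move the largest disk, then move the n-1 disks on top of it.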

    Of course we can have recursion on all sorts of inductively defined data types (trees, for example).



    11.  Correctness proofs of functional programs

       A functional programming style has advantages that are common in any mathematical notation:

    • There is consistency in the use of names: variables do not vary, they stand for a, perhaps not yet known, constant value throughout their scope.
    • Thanks to the absence of side-effects, in FPLs the same expression always denotes the same value. This property is called referential transparency.

    A definition like x = x + 1 would mean in an imperative language that x is incremented. In a functional language however this definition means that any occurrence of x can always be substituted by x + 1.
    Clearly the initial expression x does not have a normal form: substitution will continue forever (some systems recognize such situations in certain simple cases and report an error message).

    x → x+1 → (x+1)+1 → ...

    Due to referential transparency, one is allowed to replace, in any expression, any occurrence of the left-hand side of a definition by its corresponding right-hand side and vice versa.

    Referential transparency makes it possible to use common mathematical techniques such as symbolic substitution and induction. With the help of these techniques a program can be transformed into a more efficient one or certain properties of a program, such as its correctness, can be proven.

    The use of induction principles for proving properties of functional programs

        Recursion is the basis of the computational and expressive power of functional programming languages. You can notice how its working principle is analogous to that used in mathematical proofs by induction.
    In fact, in a recursive program the output is defined for the base elements and then, assuming the program correctly works for "simpler", "smaller" elements than the input, the output is defined for the generic input.
    In a proof by mathematical induction the property is proven for the base case (usually 0) and then, assuming the property to hold for smaller numbers, we prove the property for the generic number.
    Refresh your knowledge about proofs by mathematical induction.

    You may notice that the principle of mathematical induction is a logical rule that in a sense implicitly "expresses" also a model of calculation. It corresponds, in functional programming, to defining numerical functions by recursion, like the following one:

     (define (fact n) 
         (if (= n 0) 
             1 
             (* n (fact (- n 1))))) 
    

    the value of the function is given for n=0 and is calculated for n assuming that fact correctly works for n-1.

    The close correspondence between recursive functions and the principles of induction makes it possible, for example, to use induction to formally prove properties of recursive programs.
    It is natural to use the principle of mathematical induction if we wish to show that the factorial function is total, i.e. that for all n the evaluation of (fact n) converges (that is, it eventually terminates).
    We need:
    1. to prove that (fact 0) converges
    2. to prove that, for a generic k not equal to 0, (fact k) converges, using the inductive hypothesis that (fact (- k 1)) converges.

    Let's consider the definition of factorial given above and let's prove the two points above:
    1. if k = 0 we evaluate the "then" branch of the conditional, so the value of the expression representing the factorial is 1; that is, (fact 0) converges;
    2. if k > 0 we evaluate the "else" branch, hence the value of (fact k) is the value of the expression (* k (fact (- k 1))). By the inductive hypothesis the evaluation of (fact (- k 1)) converges, so the evaluation of the whole expression (* k (fact (- k 1))) trivially converges.


    In order to prove properties of programs that work on data structures more complex than numbers, we need more powerful mathematical tools, for instance the principle of well-founded induction (which we do not treat here).



    12.  Typed Languages

       The class of functional languages can be divided into two main groups: Strongly typed languages (usually referred to simply as typed languages) and Non strongly typed languages (usually referred to simply as untyped languages).
    A type, in general, can be looked at as a partial specification of a program (a function) or of an expression. These partial specifications, according to the particular type system of the language considered, can be more or less detailed.

    By means of types we can avoid errors like:

    true + 3

    in fact the interpreter knows that + takes a pair of numbers and returns a number, i.e. that its type is

    + :: int × int → int


       A language is strongly typed whenever any function and any legal expression of the language has (or can be given) a type.

    In (strongly) typed languages errors like the one shown above are detected by the type checker before the program is executed (statically). In non (strongly) typed languages this sort of error can be detected only at run time, by the interpreter itself (the program's execution is stopped whenever an error is detected).

    If a function in a typed language has a type int->bool, it means that the function (program) takes an integer as an argument and returns a boolean as result.

       Scheme is an untyped language. In fact, we could write the following legal function definition

    (define (f x) (if (fac (list x x)) 5 #t))
    
    It is easy to notice that the function f cannot be given a type...
    We know that fac can work correctly only on natural numbers, not on lists, but Scheme can realize it only when the function f is evaluated on a given input (at "run time"). Moreover, even if we write
    (define (g x) (if (> x 4) 5 #t))
    
    g would also be untypable, simply because we do not know the codomain of g: the then branch of the conditional is a number, whereas the else branch is a boolean.

    Typed languages, like Haskell, prevent us from writing functions like f and g, since they realize statically, at the moment of their definition, that f and g cannot be typed.
    Notice that this is a good thing as far as f is concerned, since in such a way the type checker prevents us from applying fac to a list.
    As far as g is concerned, however, you can notice that even if the type checker cannot give a type to g, such a function is not "intrinsically" wrong: its application does not cause any disaster, we simply do not know whether the result will be a number or a boolean.

    The above discussion shows that, on the one hand, typed languages prevent us from writing functions, like f, that can cause some disaster when applied, and this is a good thing.
    On the other hand, they prevent us from writing functions, like g, that are not intrinsically dangerous and that in some cases could be useful.
    In a nutshell: typed languages are safer, while untyped languages are more flexible. However, if I can choose between a typed and an untyped language... I prefer the first one!



    13 The λ-calculus and computation theory

    The λ-calculus is the computational model functional languages are based on.
    It is a very simple mathematical formalism. In the theory of the λ-calculus one can formalize, study and investigate properties and notions common to all functional languages, without being burdened by all the technicalities of actual languages, which are useless from a theoretical point of view.
    In a sense, the λ-calculus is a paradigmatic and extremely simple functional language.

    The concepts the Lambda-calculus is based on are those that are fundamental in all functional programming languages:

    variable        (formalizable by x, y, z, …)

    abstraction (anonymous function)        (formalizable by λx.M,
                                             where M is a term and x a variable)

    application        (formalizable by M N, where M and N are terms)

    We have seen that these are indeed fundamental notions in functional programming. It seems, however, that there are other important notions: basic elements, basic operators and the possibility of giving names to expressions.
    Even if it may appear very strange, these notions are actually not really fundamental: they can be derived from the first ones.
    This means that our functional computational model can be based on the concepts of variable, functional abstraction and application alone.
    Using the formalization of these three concepts we form the lambda-terms. These terms represent both programs and data in our computational model (whose strength in fact strongly depends on the fact that it does not distinguish between "programs" and "data").

       The formal definition of the terms of the Lambda-calculus (lambda-terms) is given by the following grammar:

    Λ ::= X | (ΛΛ) | λX.Λ

    where Λ stands for the set of lambda-terms and X is a metavariable that ranges over a (denumerable) set of variables (x, y, v, w, z, x1, x2, ...)
    Variable and application are well-known concepts, but what is abstraction?
    Abstraction is needed to create anonymous functions (i.e. functions without a name). We have seen that being able to specify an anonymous function is very important in functional programming, but it is also important to be able to associate an identifier to an anonymous function. Nonetheless, we can avoid introducing in our model the notion of "giving a name to a function". You could ask: how is it possible to define a function like fac without being able to refer to it through its name??? Believe me, it is possible!

    The term λx. M represents the anonymous function that, given an input x, returns the "value" of the body M.
    Other notions, usually considered as primitive, like basic operators (+, *, -, ...) or natural numbers, are not strictly needed in our computational model (indeed they can actually be treated as derived concepts.)

    In an abstraction λx.P, the term P is said to be the scope of the abstraction.
    In a term, an occurrence of a variable is said to be bound if it occurs in the scope of an abstraction on that variable (i.e. it represents the argument in the body of a function).
    It is said to be free otherwise.



    Substitution

    The notation M [L / x] denotes the term M in which any free (i.e. not representing any argument of a function) occurrence of the variable x in M is replaced by the term L.


    Notice that in a lambda abstraction a substitution is performed only if the variable to be substituted is free.

    Great care is needed when performing a substitution, as the following example shows:

    Example:

    Both of the terms   (λx.z)    and    (λy.z)    obviously represent the constant function which returns z for any argument

    Let us assume we wish to apply to both of these terms the substitution [x / z] (we replace the free variable z by x). If we do not take into account the fact that the variable x we are inserting becomes bound in (λx.z), we get:

    (λx.z) [x / z]  ⇒  λx.(z[x / z])  ⇒  λx.x     representing the identity function
    (λy.z) [x / z]  ⇒  λy.(z[x / z])  ⇒  λy.x     representing the function that always returns x

    This is absurd, since both the initial terms are intended to denote the very same thing (a constant function returning z), yet by applying the substitution we obtain two terms denoting very different things.

    Hence a necessary condition for the substitution M[L/x] to be performed correctly is that the free variables of L do not change status (i.e. do not become bound) after the substitution.
    This problem is present in all programming languages, imperative ones included.
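    To see the capture problem mechanically, here is a deliberately naive Scheme sketch: terms are represented as lists, with a symbol for a variable, (lam x body) for an abstraction and (app m n) for an application (these tags are our own convention, not standard Scheme), and subst replaces free occurrences without any renaming. On (λx.z)[x/z] it produces exactly the wrong term discussed above:

```scheme
; Naive substitution m[l/x]: replaces free occurrences of x by l,
; but performs NO renaming of bound variables (so it can capture).
(define (subst m x l)
  (cond ((symbol? m) (if (eq? m x) l m))
        ((eq? (car m) 'lam)
         (if (eq? (cadr m) x)
             m                                  ; x is bound here: leave untouched
             (list 'lam (cadr m) (subst (caddr m) x l))))
        (else (list 'app (subst (cadr m) x l) (subst (caddr m) x l)))))

(subst '(lam x z) 'z 'x)   ; => (lam x x)  -- x got captured: the identity!
(subst '(lam y z) 'z 'x)   ; => (lam y x)  -- correct: the constant function x
```

    A correct implementation would rename the bound variable (α-conversion) before substituting whenever a free variable of l would otherwise be captured.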



    The formalization of the notion of computation in λ-calculus:  The Theory of β-reduction

       In a functional language to compute means to evaluate expressions, making the meaning of a given expression more and more explicit. Up to now we have formalized in the lambda-calculus the notions of programs and data (the terms in our case). Let us see now how to formalize the notion of computation. In particular, the notion of "basic computational step".
    We have noticed in the introduction that, in the evaluation of a term, if a function is applied to an argument then the application is replaced by the body of the function, in which the formal parameter is replaced by the actual argument.
    So, if we call redex any term of the form (λx.M)N, and we call M[N/x] its contractum, to perform a basic computational step on a term P means that P contains a subterm which is a redex, and that this redex is replaced by its contractum (β-reduction is applied to the redex).
    The relation between redexes and their contracta is called notion of β-reduction, and it is represented as follows.

    The notion of β-reduction

    (λx.M) N  →β  M[N/x]

    A term P β-reduces in one step to a term Q (written P → Q) if Q can be obtained from P by replacing a subterm of P of the form (λx.M)N by M[N/x]
     


    Example:

    If we had the multiplication in our system, we could perform the following computations

    (λx. x*x) 2  →  2*2  →  4

    here the first step is a β-reduction; the second computation is the evaluation of the multiplication function. We shall see how this last computation can indeed be realized by means of a sequence of basic computational steps (and how also the number 2 and the function multiplication can be represented by λ-terms).

    Strictly speaking, M → N means that M reduces to N by exactly one reduction step. Frequently, we are interested in whether M can be reduced to N by any number of steps.

    we write    M —» N     if     M → M1 → M2 → ... → Mk ≡ N    (k ≥ 0)



    It means that M reduces to N in zero or more reduction steps.

    Example:

    ((λz.(zy))(λx.x))  —»  y



    Normalization and possibility of non termination

    Definition
    If a term contains no β-redex, it is said to be in normal form.
    Example:

    λxy.y and xyz   are in normal form.

    A term is said to have a normal form if it can be reduced to a term in normal form.
    Example:

    (λx.x)y   is not in normal form, but it has the normal form  y



    Definition:
    A term M is normalizable (has a normal form, is weakly normalizable) if there exists a term Q in normal form such that

    M —» Q


    A term M is strongly normalizable, if there exists no infinite reduction path out of M. That is, any possible sequence of β-reductions eventually leads to a normal form. 
     


    Example:

    (λx.y)((λz.z)t)

    is a term not in normal form, normalizable and strongly normalizable too. In fact, the only two possible reduction sequences are:

    1. (λx.y)((λz.z) t)  →  y

    2. (λx.y)((λz.z) t)  →  (λx.y) t  →  y


    To normalize a term means to apply reductions until a normal form (if any) is reached. Many λ-terms cannot be reduced to normal form. A term that has no normal form is

    (λx.xx)(λx.xx)


    This term is usually called Ω.

    Although different reduction sequences cannot lead to different normal forms, they can produce completely different outcomes: one could terminate while the other could "run" forever. Typically, if M has a normal form and admits an infinite reduction sequence, it contains a subterm L having no normal form, and L can be erased by some reductions. The reduction

    (λy.a)Ω  →  a


    reaches a normal form, by erasing the Ω. This corresponds to a call-by-name treatment of arguments: the argument is not reduced, but substituted 'as it is' into the body of the abstraction.
    Attempting to normalize the argument generates a non terminating reduction sequence:

    (λy.a)Ω  →  (λy.a)Ω  →  ...


    Evaluating the argument before substituting it into the body corresponds to a call-by-value treatment of function application. In this example, a call-by-value strategy never reaches the normal form.
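    Scheme itself is call-by-value, so a literal transcription of (λy.a)Ω loops forever. Wrapping the argument in a thunk, a standard technique for simulating call-by-name in a call-by-value language, makes the application terminate. The names const-a and the use of a thunk below are our own illustration:

```scheme
; Omega: ((lambda (x) (x x)) (lambda (x) (x x))) never terminates.
; Under call-by-value, passing Omega directly as an argument would
; first evaluate it, and hence diverge.  Delaying the argument in a
; thunk simulates call-by-name:
(define (const-a y) 'a)            ; the term (λy.a)
(const-a (lambda () ((lambda (x) (x x)) (lambda (x) (x x)))))
; => a   (the thunk is a value; it is never forced, so Omega is never run)
```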



    Some important results of β-reduction theory

       The normal order reduction strategy consists in reducing, at each step, the leftmost outermost β-redex. Leftmost means, for instance, reducing L (if it is reducible) before N in a term LN. Outermost means, for instance, reducing (λx.M)N before reducing M or N (if reducible). Normal order reduction is a call-by-name evaluation strategy.

    Theorem (Standardization)
    If a term has a normal form, it is always reached by applying the normal order reduction strategy.
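    A one-step normal-order reducer can be sketched in Scheme. The sketch below is self-contained: terms are lists (a symbol for a variable, (lam x m) for abstraction, (app m n) for application; these tags are our own convention), and substitution is naive, with no renaming to avoid capture:

```scheme
; Naive substitution m[l/x] (no capture avoidance).
(define (subst m x l)
  (cond ((symbol? m) (if (eq? m x) l m))
        ((eq? (car m) 'lam)
         (if (eq? (cadr m) x)
             m
             (list 'lam (cadr m) (subst (caddr m) x l))))
        (else (list 'app (subst (cadr m) x l) (subst (caddr m) x l)))))

; One normal-order step: contract the leftmost outermost redex.
; Returns the reduced term, or #f if m is already in normal form.
(define (step m)
  (cond ((symbol? m) #f)
        ((eq? (car m) 'lam)                         ; reduce inside the body
         (let ((b (step (caddr m))))
           (and b (list 'lam (cadr m) b))))
        ((and (pair? (cadr m)) (eq? (car (cadr m)) 'lam))
         ;; (λx.M)N: this is the leftmost outermost redex -- contract it
         (subst (caddr (cadr m)) (cadr (cadr m)) (caddr m)))
        (else                                        ; otherwise go left, then right
         (let ((f (step (cadr m))))
           (if f
               (list 'app f (caddr m))
               (let ((a (step (caddr m))))
                 (and a (list 'app (cadr m) a))))))))

; Normalize: iterate step until no redex remains (may loop on Omega).
(define (normalize m) (let ((n (step m))) (if n (normalize n) m)))

(normalize '(app (lam z (app z y)) (lam x x)))   ; => y
```

    The last line replays the earlier example ((λz.(zy))(λx.x)) —» y.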

    Theorem (Confluence)
    If we have terms M, M1 and M2 such that:

    M —» M1    and    M —» M2


    then there always exists a term L such that:

    M1 —» L    and    M2 —» L


    That is, in diagram form:

                M
              ↙   ↘
            M1     M2
              ↘   ↙
                L


    Corollary (Uniqueness of normal form)
    If a term has a normal form, it is unique.

    Proof:

    Let us suppose we have a generic term M, and let us suppose it has two normal forms N and N'. This means that

    M —» N    and    M —» N'

    Let us now apply the Confluence theorem. Then there exists a term Q such that:

    N —» Q    and    N' —» Q

    But N and N' are in normal form, so they contain no redexes. This necessarily means that

    N —» Q   in 0 steps

    and hence that N ≡ Q.
    Analogously we can show that N' ≡ Q. It is then obvious that   N ≡ N'.


    The above result is a very important one for programming languages, since it ensures that if we implement the same algorithm in Scheme and in Haskell, the functions we obtain give us the very same result (if any), even if Scheme and Haskell are functional languages that use very different reduction strategies.

    In the syntax of the λ-calculus we have not considered primitive data types (the natural numbers, the booleans and so on), or primitive operations (addition, multiplication and so on), because they do not really represent primitive concepts. As a matter of fact, we can represent them by using the essential notions already formalized in the λ-calculus, namely variable, application and abstraction.
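    For instance, the Church encoding represents the natural number n as the term λf.λx.f(f(...(f x)...)) with n applications of f. This can be sketched directly in Scheme (the names zero, succ, add and to-int are our own; to-int uses Scheme's own numbers purely to inspect the result):

```scheme
; Church numerals: n is the function that applies f n times to x.
(define zero  (lambda (f) (lambda (x) x)))
(define (succ n) (lambda (f) (lambda (x) (f ((n f) x)))))
(define (add m n) (lambda (f) (lambda (x) ((m f) ((n f) x)))))

; To inspect a numeral, apply it to Scheme's successor and 0:
(define (to-int n) ((n (lambda (k) (+ k 1))) 0))

(to-int (add (succ zero) (succ (succ zero))))   ; => 3
```

    Booleans can be encoded in the same spirit: true as λx.λy.x and false as λx.λy.y, so that a conditional is just the application of the boolean to its two branches.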

    Defining recursive functions

    In the syntax of the λ-calculus there is no primitive operation that enables us to give names to lambda-terms. However, giving names to expressions seems to be an essential feature of programming languages in order to define functions by recursion, as in Scheme

    (define fact (lambda (x) (if (= x 0) 1 (* x (fact (- x 1))))))

    or in Haskell

    fact 0 = 1

    fact n = n * fact (n-1)


    Then, in order to define recursive functions, it would seem essential to be able to give names to expressions, so that we can recursively refer to them.
    Surprisingly enough, however, that is not true at all! In fact, we can define recursive functions in the lambda-calculus by using only the few ingredients we already have at hand, exploiting the concept of fixed point (though we shall not develop the theory here).
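    To make this concrete: in a call-by-value language like Scheme one can use the so-called Z combinator (the call-by-value variant of the fixed-point combinator Y) to define factorial without the function ever referring to itself by name. The names Z and fact-z below are conventional, not built into Scheme:

```scheme
; Z combinator: a call-by-value fixed-point combinator.
; Z f computes a fixed point of f using only abstraction and application.
(define Z
  (lambda (f)
    ((lambda (x) (f (lambda (v) ((x x) v))))
     (lambda (x) (f (lambda (v) ((x x) v)))))))

; Factorial without named recursion: the functional receives
; "itself" as the parameter self, so no recursive define is needed.
(define fact-z
  (Z (lambda (self)
       (lambda (n) (if (= n 0) 1 (* n (self (- n 1))))))))

(fact-z 5)   ; => 120
```

    Note that the body of the functional never mentions fact-z: the self-reference is manufactured entirely by Z, which is itself a closed lambda-term.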
