Notes on Data Types in Haskell Data Types
Let us try to answer the following questions in the most general and simple manner (it is possible to give very precise mathematical definitions, but we will not do that).
Sometimes, even if a language has not a particular Data Type, it can give you the possibility to define it by yourself. Inductive data Types
An Inductive Data Type is a Data Type whose elements can be "built" starting from some basic elements by means of some "data constructors".
The constructors can use elements of the inductive Data Type itself as arguments (that's why we call them "recursive" Data Types; in Haskell they are also called "algebraic" Data Types), but also elements of of other data types.
As example, let us consider the Data Type of Binary Trees with Numerical Labels (Natural Numbers as labels).
data NumBinTree = EmptyNBTree | MkNBTree Integer NumBinTree NumBinTree It is easy to see that what this definition gives you: For instance the expression (MkNBTree 4 expT1 expT2) denotes the tree whose root has label 4 and whose subtrees are expT1 and expT2, where expT1 and expT2 are expressions denoting two other trees. And what about the functions working on elements of NumBinTree ? When dealing with inductive Data Types, the most important functions are those that return the "components" an element of the Data Type is built from, and the predicates to check whether an element is a basic element or a "compound" element. The first sort of functions are usually called "selectors". For NumBinTree the only predicate is the one that checks whether a given tree is the emptytree, while the three selectors are those that, given a non empty tree, return, respectively, the label of the root, the left subtree and the right subtree. In Haskell, predicates and selectors of an Inductive Data Type are implicitly defined by the mechanism of pattern matching. Let us see with an example (the function for the inorder visit of a tree) how pattern matching works.
inorder EmptyNBTree = [] inorder (MkNBTree n t1 t2) = (inorder t1)++[n]++(inorder t2) where ++ is the predefined concatenation (append) operator in Haskell. You notice that, by the possibility of describing the "shape" of the input of the function, we can easily refer to the components of the input inside the body of the function. Exercise:write the functions for the other sorts of visits for a tree. Let us see a different definition in Haskell for NumBinTree
data NumBinTree = EmptyNBTree | Leaf Integer | MkNBTree Integer NumBinTree NumBinTreeIf we took out "EmptyNBTree" from the above definition, we would get full binary trees (i.e. binary trees in which nodes have always zero or two children). By doing that we would not consider emptytrees to exist anymore, and hence the basic elements of the Data Types would be the leaves. It is important to stress that, once we have defined a data type (and its selectors if we are in Scheme), all the possible functions on the data type, can be defined by the user using only the selectors.
Haskell is a language that really gives you the possibility of defining new Data Types.
The above discussion is a very rough and intuitive view of Data Types,
but it is useful to keep it in mind when working with languages that do not really enable us to define new Data Types, like Scheme.
In fact in Scheme, if we wish to introduce a Data Type like NumBinTree, we first need to decide what is their internal representation and afterwards we have to define the basic elements, the constructors, the selectors and the predicates in terms of such an internal representation.
The internal representation of the Data Type will always remain visible to the user and such visibility is completely useless, if not harmful sometimes. So Scheme does not allow to define new Data Types. What one can do is just to "represent" a Data Type with one of the predefined Data Types and work with such a representation. Being trees inductively defined, it is not difficult to understand that the best way to define functions on trees is by recursion. Let us write an expression denoting a tree.
MkNBTree 3 EmptyNBTree (MkNBTree 4 EmptyNBTree EmptyNBTree) The tree denoted by this expression is
3 \ 4
Remind that in functional languages any function, and hence also the constructors of a Data Type, returns a new object and never the input modified.
Of course we could also implement modifiable objects in functional languages, but that can be achieved only by using the imperative features that sometimes functional languages have. In Scheme the keywords related to imperative operators are followed by a "!", like set! Let us define the Data Type of stacks of numbers in Haskell.
data NumStack = EmptyNStack | PushNS Integer NumStack The data type of sets of natural numbers in Haskell:
data SetNum = SetN [Integer] The only constructor is SetN that takes a list of elements of numbers and returns the set of the numbers contained in the list. Another choice can be
data SetNum = EmptySN | InsertSN Integer SetNum We define now the set of the lambda-terms in Haskell We first define the data type of variables. This could be done by means of an enumerated type
data Var = x | y | z | w but so we would have only a finite number of variabless. It is better to define the data type of variables as follows, using the intuition of looking at the variables as x1, x2, x3, ... data Var = MkVar Int (there is no base element, only a constructor) We can now define the lambda-terms as
data LambdaTerm = MkLamdaTerm Var | Appl LambdaTerm LambdaTerm | LambdaAbs Var LambdaTerm
Of course the general idea behind any inductively defined data type is the same. Also the following one is the description of a set of rather strange objects that indeed are elements of an inductive data type:
data Sarchiapone = Manzanillo | Minollo | Tirulero Integer Sarchiapone | Zumpappero Sarchiapone Sarchiapone [Integer] Parametric Data Types In Haskell we can define data types parametrized over all possible types. For instance we can define not only the data type of binary trees with numerical labels, but it is possible to define it for any possible type of label, by simply writing
data LabBinTree a = EmptyLBTree | MkLBTree a (LabBinTree a) (LabBinTree a) where a is a type variable, representing the type of the label of the data type. A generic stack can be defined as
data Stack a = EmptyStack | Push a (Stack a) In Scheme it is not possible to define such parametric data type, since in Scheme, you know, it is not even possible to really define a new data type. In fact the stack we define in Scheme can contain even heterogeneous elements. And this is something you cannot have in Haskell. Well, but we can have a certain amount of flexibility in Haskell, by a smart use of type variables. For instance, by defining the parametric data type below, (AlternatingBinTree Int Bool) denotes the type of labelled binary trees with integer labels on nodes at even depth and boolean labels at odd depth.
data AlternBinTree a b = EmptyABTree | MkABTree a (AlternBinTree b a) (AlternBinTree b a) EXERCISES:
Types in Haskell
One can wonder why we have used the word "Data Type" instead of simply "Type".
Well the motivation is that a type is something different.
A type is indeed something more than a set of element, it is a partial specification of a program.
In Haskell we have particular types, providing a particular sort of partial specifications.
These partial specifications describe mainly the domain and the codomain of a function.
It is not a complete specification, but the use of these sort of types helps in preventing a whole bunch of errors in our programs.
Let see this through a few examples Before writing the factorial function in Haskell we can declare its type
fact :: Integer -> Integer fact 0 = 1 fact n = n * (fact (n - 1)) The interpreter when it evaluates this type declaration and this function definition, checks whether the fact you have defined is really defined on integers and returns an integer. In fact, if we had written
fact :: Integer -> Integer fact 0 = [] fact n = n * (fact (n - 1))
([] denotes the empty list) Another possibility in Haskell is to first define a function and then ask the interpreter what is its type(that is its partial specification.) We can write
fact 0 = 1 fact n = n * (fact (n - 1)) and, if no problem is encountered by the interpreter, we can write
:type fact and the interpreter will return us
fact :: Integer -> Integer (actually it returns something more general, but the essence is the above one) What happens when we define the following function
id x = x and afterwards we ask the interpreter which is its partial specification?
:type id Well, we get
id :: a -> a
In Haskell letters like a, b, c etc. are intended to represent type variables, and a type variable is a variable that can be replaced by any type. So, what does the interpreter wish to tell us by writing "id :: a -> a"?
The notion of type and polymorphic type is not present just in Haskell, but in many other functional languages. Read the Notes about polymorphsm in Haskell [HPF] available in the reading material. Types in Scheme.
Well, Scheme is an untyped functional language! Also for Data Types we know that it is impossible to really define Data Types in Scheme. So, no types in Scheme, sorry.
|