Introduction to R

Statistical Laboratory

Alessandro Ortis - University of Catania

In this lesson we introduce some basic R commands. The best way to learn a new programming language is to try out the commands yourself.

The help() function provides details about a command. To use it, run help(command) or the shorthand ?command.

help(solve)   # open a help page with additional information about the function 'solve'
?solve        # shorthand that gives the same result

Tip: every time you meet a new command, run help() on it to get a comprehensive understanding of what it does.

Basic commands

R uses functions to perform operations. To run a function called funcname we type funcname(input1, input2), where input1 and input2 are the inputs (or parameters) of the function.

A function can have any number of inputs. For example, to create a vector of numbers, we use the function c() (for concatenate). Any numbers inside the parentheses are joined together.

v <- c(1,3,1,2,5,3,1)  # <- (or =) assigns a value to a variable name
z <- c(2,2,2,v)        # concatenate 2,2,2 with the existing vector v

v
z

?c

length(v)

v
z
# It will raise a warning message because v and z have different lengths
v + z

Warning message in v + z:
"longer object length is not a multiple of shorter object length"

t = c(v,0,0,0)  # concatenate v with 3 zeros to match the length of z
length(t)
length(z)
t + z
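The warning seen with v + z comes from R's recycling rule: when two vectors differ in length, the shorter one is repeated until it matches the longer one. The sketch below illustrates this with small vectors; the names a, b and s are hypothetical and chosen only for this example.

```r
# Recycling: the shorter vector is repeated to match the longer one.
a <- c(1, 2, 3, 4)
b <- c(10, 20)

# Lengths 4 and 2: 4 is a multiple of 2, so b is reused silently as 10, 20, 10, 20.
a + b   # 11 22 13 24

# Lengths 3 and 2: 3 is not a multiple of 2, so R still recycles b
# (10, 20, 10) but raises the warning shown earlier.
# suppressWarnings() is used here only to keep the example quiet.
s <- suppressWarnings(c(1, 2, 3) + b)
s       # 11 22 13
```

Padding the shorter vector explicitly, as done with t above, makes the intent clear and avoids relying on recycling.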

ls() # lists all the objects that we have saved so far.

rm(v)  # remove 'v' from memory
ls()

t
z
# We can apply standard mathematical operations to numeric vectors
t * z
t + z
t / z
t - z

The matrix() function creates a matrix of numbers. Of its many inputs, let's focus on the first three:

data: the entries in the matrix
nrow: number of rows
ncol: number of columns

?matrix

# Note: by default, matrix() fills the matrix column by column
x = matrix(data = c(1,2,3,4), 
           nrow = 2, 
           ncol = 2)
x

x = matrix(data = c(1,2,3,4,5,6,7,8),
           nrow = 2,
           ncol = 4)
x

x = matrix(data = c(1,2,3,4,5,6,7,8),
           nrow = 4,
           ncol = 2)
x

x = matrix(c(1,2,3,4), 2, 2, byrow=TRUE)
x

x = matrix(data = c(1,2,3,4,5,6,7,8),
           nrow = 4,
           ncol = 2,
           byrow=TRUE)
x
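To make the difference between the two fill orders explicit, here is a minimal sketch that builds the same data both ways. The names by_col and by_row are hypothetical, used only for this illustration.

```r
# Same data, two fill orders.
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # default: columns are filled first
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # rows are filled first

by_col   # first row: 1 3 5, second row: 2 4 6
by_row   # first row: 1 2 3, second row: 4 5 6

# Filling a 3x2 matrix by column and transposing it with t()
# gives the same result as filling a 2x3 matrix by row.
identical(by_row, t(matrix(1:6, nrow = 3, ncol = 2)))    # TRUE
```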

sqrt(x) # returns the square root of each element of a vector or matrix

x^2 # raises each element of x to the power 2
10^3

Here we create two correlated sets of numbers by using the rnorm() function, and use the cor() function to compute the correlation between them.

?rnorm

random_normal = rnorm(50) # generates 50 standard normal random values; the first argument, n, is the sample size
random_gaussian = random_normal + rnorm(50, mean = 100, sd = .1)
cor(random_normal, random_gaussian)
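Why is the correlation above so high? The second vector is the first one plus a small amount of noise (sd = 0.1), and the shift by mean = 100 does not affect correlation. The following sketch, with an arbitrary seed and hypothetical variable names, compares small and large noise.

```r
set.seed(1)                                  # arbitrary seed, only for reproducibility
base <- rnorm(50)
small_noise <- base + rnorm(50, sd = 0.1)    # noise much smaller than the signal
large_noise <- base + rnorm(50, sd = 10)     # noise much larger than the signal

cor(base, small_noise)   # close to 1
cor(base, large_noise)   # much weaker

# Adding a constant shifts the values but leaves the correlation unchanged.
all.equal(cor(base, small_noise), cor(base, small_noise + 100))
```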

rnorm(5)

To reproduce the same sequence of (pseudo) random numbers, we can call set.seed() with a fixed value before calling any random function.

set.seed(24)
rnorm(5)
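A quick way to check the effect of set.seed() is to draw twice from the same seed and compare the results. This is a small sketch; the variable names are hypothetical.

```r
set.seed(24)
first_draw <- rnorm(5)

set.seed(24)                 # reset the generator to the same state
second_draw <- rnorm(5)

identical(first_draw, second_draw)   # TRUE: same seed, same numbers

third_draw <- rnorm(5)       # no reset: the random stream simply continues
identical(second_draw, third_draw)   # FALSE
```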

data = rnorm(100, mean = 24, sd = 1)
mean(data)
var(data)
sqrt(var(data)) # square root of var...
sd(data)  # ... is equal to the std dev!
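For reference, var() computes the sample variance, which divides the sum of squared deviations from the mean by n - 1. The sketch below recomputes it by hand on a hypothetical sample and checks it against the built-in functions.

```r
set.seed(24)
sample_values <- rnorm(100, mean = 24, sd = 1)   # hypothetical sample for this check
n <- length(sample_values)

# Sample variance: sum of squared deviations divided by (n - 1)
manual_var <- sum((sample_values - mean(sample_values))^2) / (n - 1)

all.equal(manual_var, var(sample_values))        # TRUE
all.equal(sqrt(manual_var), sd(sample_values))   # TRUE
```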

summary(data)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  19.53   23.38   23.95   23.90   24.47   26.78

Indexing Data

?seq

#seq(a,b,length=.) is a function that allows us to generate a sequence of numbers between a and b with a specified length
seq(5,13)
seq(5,13, length=10)
seq(13,5)
#help(seq)

seq(1,10,2)
seq(1,10,length = 3)
seq(3,15,length = 5)

seq(10) # Implicitly starts from 1
seq(1,10) # It's the same!
1:10 # again the same!

x <- 1:10 # a:b is shorthand for seq(a,b)
x
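The third argument of seq() can be given either as a step (by) or as a number of points (length.out; the cells above abbreviate it as length). A short sketch of the difference, with seq_len() added as a related helper:

```r
seq(1, 10, by = 2)           # fixed step:      1 3 5 7 9
seq(1, 10, length.out = 4)   # fixed length:    1 4 7 10

# seq_len(n) counts from 1 to n and handles n = 0 gracefully.
seq_len(5)                   # 1 2 3 4 5
seq_len(0)                   # integer(0), an empty vector
1:0                          # 1 0, because a:b counts down when a > b
```

The last two lines show why seq_len() is often preferred over 1:n when n may be zero.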

We often wish to examine part of a set of data. Suppose that our data is stored in a given matrix A.

# matrix(data, nrow, ncol)
A = matrix(1:16, 4, 4)
A

A[2,3] # select a single element at row=2 and col= 3

The first number after the open-bracket symbol [ always refers to the row, and the second number always refers to the column. We can also select multiple rows and columns at a time, by providing vectors as the indices.

A
# rows: 1 and 3   cols: 2 and 4
# selected elements:
# [1,2] -> 5
# [3,2] -> 7
# [1,4] -> 13
# [3,4] -> 15
#A[c(1,3), c(2,4)] -> ?
A[c(1,3), c(2,4)]

# 1:3 means seq(1,3) -> [1,2,3]
A
A[1:3, 2:4] # rows: from 1 to 3, cols: from 2 to 4

A
A[1:2,]

A[ ,1:2]

# R treats a single row/column of a matrix as a vector
A[1,]

# The dim() function outputs the number of rows followed by the number of columns of a given matrix
d <- dim(A)
d
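Two indexing details are worth a quick sketch: keeping the matrix shape when selecting a single row, and excluding rows with negative indices. The matrix B and the other names below are hypothetical, built the same way as A.

```r
B <- matrix(1:16, 4, 4)

row_vec <- B[1, ]                  # a single row is returned as a plain vector
row_mat <- B[1, , drop = FALSE]    # drop = FALSE keeps it as a 1x4 matrix

dim(row_vec)    # NULL: vectors have no dimensions
dim(row_mat)    # 1 4

# A negative index keeps everything except the listed rows (or columns).
B[-c(1, 3), ]   # rows 2 and 4 only
```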

Lists

# The $ operator allows you to extract elements by name from a named list
person <- list(name='Mikey', surname='Mouse', age=33)

person$name
person$surname
person$age
# The names of a list can be found using names
names(person)
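Besides $, list elements can be reached with double and single brackets, which behave differently. The sketch below uses a hypothetical list named contact to show the three forms.

```r
contact <- list(name = "Ada", age = 36)   # hypothetical example list

contact$name         # "Ada"
contact[["name"]]    # same element, extracted with double brackets

# Double brackets also accept a name stored in a variable.
field <- "age"
contact[[field]]     # 36

# Single brackets return a sub-list, not the element itself.
class(contact["name"])   # "list"

# Assigning to a new name adds an element.
contact$city <- "London"
names(contact)           # "name" "age" "city"
```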

Loading Data

For most analyses, the first step involves importing a data set into R.

# old version
#Auto = read.csv("../Datasets/Auto.csv")
#fix(Auto)

?read.csv

Auto = read.csv("../Datasets/Auto.csv", header = TRUE, na.strings = "?")
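To see what na.strings does without needing the Auto file, the sketch below writes a tiny CSV to a temporary file and reads it back twice. The file contents and all names here are made up for illustration.

```r
# Write a small, made-up CSV where "?" marks a missing value.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("mpg,horsepower,name",
             "18,130,car a",
             "25,?,car b",
             "16,150,car c"), csv_path)

# With na.strings = "?", the "?" becomes NA and the column stays numeric.
cars_na <- read.csv(csv_path, header = TRUE, na.strings = "?")
is.na(cars_na$horsepower)        # FALSE TRUE FALSE
is.numeric(cars_na$horsepower)   # TRUE

# Without it, "?" is treated as text, so the whole column is no longer numeric.
cars_txt <- read.csv(csv_path, header = TRUE)
is.numeric(cars_txt$horsepower)  # FALSE
```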

The dim() function tells us that the data has 397 observations, or rows, and 9 variables, or columns.

dim(Auto)

Auto[1:4,]

There are various ways to deal with the missing data. In this case, only 5 of the rows contain missing observations, and so we choose to use the na.omit() function to simply remove these rows.

?na.omit

Auto=na.omit(Auto)
dim(Auto)

data = matrix(1:15,5,3)
data[2,3] = NA
data[3,1] = NA
data[4,2] = NA
data

na.omit(data)
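Before dropping rows it can help to count the missing values and see which rows are affected. A minimal sketch on the same kind of matrix, using the hypothetical names m and cleaned:

```r
m <- matrix(1:15, 5, 3)
m[2, 3] <- NA
m[3, 1] <- NA
m[4, 2] <- NA

sum(is.na(m))        # 3 missing values in total
complete.cases(m)    # TRUE FALSE FALSE FALSE TRUE: only rows 1 and 5 are complete

cleaned <- na.omit(m)
dim(cleaned)         # 2 3: the three incomplete rows were removed
```

Note that a single NA is enough for na.omit() to discard the entire row.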

Once the data are loaded correctly, we can use names() to check the variable names.

names(Auto)

all_the_auto_names <- Auto$name
all_the_auto_names 

# Wrapping an assignment in round brackets also prints the assigned value
#(all_the_auto_names <- Auto$name)

Graphics

plot(x,y) produces a scatterplot of the numbers in x versus the ones in y.

?plot

x = rnorm(10000)
y = rnorm(10000)
plot(x,y)

# add labels
plot(x, y, xlab='x data', ylab= 'y data', main= 'Plot of x versus y')

We will often want to save the output of a plot to a file.

?pdf

pdf("Figure.pdf")         # Start creating a new PDF file
plot(x,y,col="green")     # Plot x vs. y using green circles
dev.off()                 # Stop creating the file, and save it
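The same pattern can be tried safely in a temporary directory. The sketch below also sets the page size through the width and height arguments of pdf(), given in inches; the file name is hypothetical.

```r
out_file <- file.path(tempdir(), "example_plot.pdf")   # hypothetical output path

pdf(out_file, width = 6, height = 4)   # open the PDF device (size in inches)
plot(1:10, (1:10)^2, xlab = "x", ylab = "x^2", main = "Saved to PDF")
dev.off()                              # close the device; the file is written now

file.exists(out_file)                  # TRUE
```

Until dev.off() is called, plotting commands are sent to the file rather than to the screen.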

x = seq(100)
y = x^3
plot(x,y, xlab = "x", ylab="y = x^3", main = "x^3 function")

x = seq(-pi*2, pi, length=1000)
#x
y = sin(x)
plot(x,y, main="sin(x) with x in [-2*pi,pi]")
y2= cos(x)
plot(x,y2,col="green")

The contour() function produces a contour plot in order to represent three-dimensional data; it is like a topographical map and takes three arguments:

a vector of the x values (first dimension)
a vector of the y values (second dimension)
a matrix whose elements correspond to the z values (third dimension) for each pair of (x,y) coordinates

There are many other inputs that can be used to fine-tune the output of the contour() function. To learn more about these, take a look at the help file by typing ?contour.

?outer

x = seq(-pi, pi, length=50)
y <- x
# outer() evaluates the function on every pair (x[i], y[j]),
# so f is a 50x50 matrix with f[i,j] = cos(y[j]) / (1 + x[i]^2)
f <- outer(x, y, function(x, y) cos(y) / (1 + x^2))
contour(x,y,f)
#contour(x,y,f,nlevels=45,add=T)
# t(x) returns the transpose of x
#fa = (f-t(f))/2
#contour(x,y,fa,nlevels=15)

image(x,y,f)

persp(x, y, f, theta = 30, phi = 40)
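Since outer() is what turns the two coordinate vectors into the z matrix, a smaller example may help. This sketch uses short hypothetical vectors so that the result can be checked by hand.

```r
rows <- 1:3
cols <- 1:4

# With no function given, outer() multiplies every pair of elements.
products <- outer(rows, cols)
products[2, 3]    # 2 * 3 = 6

# A custom function receives the first vector as 'a' and the second as 'b'.
g <- outer(rows, cols, function(a, b) a + 10 * b)
g[2, 3]           # 2 + 10 * 3 = 32
dim(g)            # 3 4: one row per element of 'rows', one column per element of 'cols'
```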

Exercise

Create a 5x5 matrix M whose elements are drawn at random from a Gaussian distribution with mean 5 and standard deviation 0.1. Extract the central 3x3 submatrix of M and call it N. Then create the matrix Q = N - 5 and place Q back into M, in the same position from which N was extracted. Finally, run the following command:

image(1:5,1:5,M)

Can you explain the result?

M = matrix(data=rnorm(25,5,0.1), nrow=5, ncol=5)
N = M[2:4,2:4]
Q = N - 5
M[2:4,2:4] = Q
image(1:5,1:5,M)

# Step by step ....
M = matrix(rnorm(25, mean=5, sd=0.1), nrow = 5, ncol =5)
M

N = M[2:4, 2:4]
N

Q = N - 5
Q

M[2:4, 2:4] = Q
M

#?image

image(1:5,1:5,M)
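One way to reason about the picture is to compare the value ranges of the central block and of the border. The sketch below repeats the steps with an arbitrary seed and a hypothetical logical mask named inner.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
M <- matrix(rnorm(25, mean = 5, sd = 0.1), nrow = 5, ncol = 5)
M[2:4, 2:4] <- M[2:4, 2:4] - 5

# Logical mask marking the central 3x3 block
inner <- matrix(FALSE, 5, 5)
inner[2:4, 2:4] <- TRUE

range(M[inner])    # values close to 0
range(M[!inner])   # values close to 5
```

image() maps values to colors, so the two well-separated groups of values end up at opposite ends of the color scale: the central block appears as a uniform square surrounded by a frame of a clearly different color, while the small random differences within each group are barely visible.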

Exercise

Write the code that produces the following output, an 8x8 checkerboard image, by creating an 8x8 matrix.

# Solution by M. Andronaco
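# c(0,1) is recycled down the columns; with an odd number of rows (9),
# each column starts with the opposite value of the previous one
M = matrix(c(0,1),nrow=9,ncol=8)
M = M[1:8,]   # drop the extra row to obtain an 8x8 matrix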
image(1:8,1:8,M)

# Solution by C. Cimino

S = matrix(data = rnorm(64, mean = 5, sd = 0.1), nrow = 8, ncol = 8)
#print(S)

W = S[c(2,4,6,8), c(1,3,5,7)]
#print(W)
J = W - 6
#print(J)
S[c(2,4,6,8), c(1,3,5,7)] = J
#print(S)
K = S[c(1,3,5,7), c(2,4,6,8)]
#print(K)
L = K - 6
#print(L)
S[c(1,3,5,7), c(2,4,6,8)] = L
print(S)
image(1:8, 1:8, S)

           [,1]       [,2]      [,3]       [,4]       [,5]       [,6]
[1,]  5.0923640 -1.0074475  4.926258 -0.9952692  4.9239959 -0.9412714
[2,] -1.0102043  4.9800116 -1.107290  5.1005884 -0.8097100  5.0019442
[3,]  5.0694689 -1.1052663  4.921278 -0.9442782  4.8231532 -0.8999689
[4,] -1.1223695  4.8786211 -0.937773  5.0335695 -0.8997635  5.0612045
[5,]  5.0263061 -0.9381853  4.904463 -1.0399000  5.1082862 -0.8460279
[6,] -0.8922884  5.0237857 -1.028276  5.0616895 -0.9541770  4.8776684
[7,]  4.9195279 -0.9407290  5.080724 -0.9388278  4.9446825 -1.0311310
[8,] -1.1434426  4.9970836 -1.093246  4.8295138 -1.1797255  4.9657781
           [,7]       [,8]
[1,]  5.0359482 -1.1625396
[2,] -1.0948587  4.8461840
[3,]  5.1268769 -0.9780646
[4,] -1.0456980  4.9690766
[5,]  4.8686386 -1.1219732
[6,] -0.8350475  4.8295284
[7,]  4.8494063 -0.9301597
[8,] -1.1893865  5.1478797

M = matrix(rnorm(64, mean=5, sd=0.01), nrow = 8, ncol =8)
M[seq(1,8,2),seq(2,8,2)] = 0
M[seq(2,8,2),seq(1,8,2)] = 0
image(1:8,1:8,M)
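As a further alternative, not part of the solutions above, the pattern can be built from the parity of the row and column indices. This is only a sketch; the name board is hypothetical.

```r
# A cell is 1 when row index + column index is odd, 0 when it is even.
board <- outer(1:8, 1:8, function(i, j) (i + j) %% 2)

board[1, 1]    # 0, since 1 + 1 is even
board[1, 2]    # 1, since 1 + 2 is odd
sum(board)     # 32: half of the 64 cells

image(1:8, 1:8, board)
```

Neighbouring cells always differ by one in i + j, so their parities alternate and the checkerboard follows directly.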

Output of sqrt(x) for the 4x2 matrix filled by row:

1.000000	1.414214
1.732051	2.000000
2.236068	2.449490
2.645751	2.828427

Output of Auto[1:4,]:

mpg	cylinders	displacement	horsepower	weight	acceleration	year	origin	name
18	8	307	130	3504	12.0	70	1	chevrolet chevelle malibu
15	8	350	165	3693	11.5	70	1	buick skylark 320
18	8	318	150	3436	11.0	70	1	plymouth satellite
16	8	304	150	3433	12.0	70	1	amc rebel sst

Output of the first exercise, initial matrix M:

5.147315	4.969439	4.934472	4.972994	5.035105
5.127131	5.098530	5.069950	5.022257	4.929652
4.920254	5.119562	4.998753	5.038359	4.994592
4.894486	4.791330	5.172153	4.985587	4.972451
5.026423	4.960072	4.860899	5.141260	5.048628

Central 3x3 submatrix N:

5.098530	5.069950	5.022257
5.119562	4.998753	5.038359
4.791330	5.172153	4.985587

Q = N - 5:

0.09853026	0.06994970	0.02225712
0.11956189	-0.00124724	0.03835918
-0.20866964	0.17215254	-0.01441297

M after replacing the central block with Q:

5.147315	4.96943850	4.93447159	4.97299360	5.035105
5.127131	0.09853026	0.06994970	0.02225712	4.929652
4.920254	0.11956189	-0.00124724	0.03835918	4.994592
4.894486	-0.20866964	0.17215254	-0.01441297	4.972451
5.026423	4.96007209	4.86089947	5.14125969	5.048628