In this lesson we will introduce some basic R commands. The best way to learn a new programming language is to try out the commands.
The help() function provides details about a command: run help(command), or equivalently ?command.
help(solve) # open a help window with detailed information about the function 'solve'
?solve #obtain the same result
Tip: every time you see a new command, run help() on it to get a comprehensive description of it.
R uses functions to perform operations. To run a function called funcname, we type funcname(input1, input2), where input1 and input2 are the inputs (or arguments) of the function.
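As a small illustration of how inputs are passed, the sketch below calls rnorm() (used later in this lesson) once by position and once by name; formals() lists a function's parameter names. The seed value 1 is arbitrary and only makes the two calls comparable.

```r
# formals() returns the parameters of a function together with their defaults
names(formals(rnorm))        # "n" "mean" "sd"

# Positional call: inputs are matched by their order (n, mean, sd)
set.seed(1)
a <- rnorm(3, 10, 2)

# Named call: inputs are matched by name, so their order does not matter
set.seed(1)
b <- rnorm(sd = 2, n = 3, mean = 10)

identical(a, b)              # TRUE: both calls received the same inputs
```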
A function can have any number of inputs. For example, to create a vector of numbers, we use the function c() (for concatenate). Any numbers inside the parentheses are joined together.
v <- c(1,3,1,2,5,3,1) # <- (or =) assigns a value to a variable name
z <- c(2,2,2,v) # concatenate 2,2,2 with the existing vector v
v
z
?c
length(v)
v
z
# v has length 7 and z has length 10: R recycles v to match z and, since 10 is not a multiple of 7, raises a warning
v + z
t = c(v,0,0,0) # concatenate v with 3 zeros to match the length of z
length(t)
length(z)
t + z
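The warning above comes from R's recycling rule. A minimal sketch with small made-up vectors: the shorter vector is reused from its start until it matches the longer one, and R only warns when the longer length is not a multiple of the shorter one. withCallingHandlers() is used here only to print the warning and carry on.

```r
# Lengths 4 and 2: c(10, 20) is reused as 10, 20, 10, 20 with no warning
r1 <- c(1, 2, 3, 4) + c(10, 20)
r1                      # 11 22 13 24

# Lengths 3 and 2: recycling still happens, but R emits a warning
r2 <- withCallingHandlers(
  c(1, 2, 3) + c(10, 20),
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")   # suppress it after printing
  }
)
r2                      # 11 22 13
```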
ls() # lists all the objects that we have saved so far.
rm(v) # remove 'v' from memory
ls()
t
z
# We can apply standard mathematical operations to numeric vectors
t * z
t + z
t / z
t - z
The matrix() function creates a matrix of numbers. Among its many inputs, let's focus on the first three: data (the values), nrow (the number of rows) and ncol (the number of columns).
?matrix
# Note that matrix() fills the matrix column by column by default
x = matrix(data = c(1,2,3,4),
nrow = 2,
ncol = 2)
x
x = matrix(data = c(1,2,3,4,5,6,7,8),
nrow = 2,
ncol = 4)
x
x = matrix(data = c(1,2,3,4,5,6,7,8),
nrow = 4,
ncol = 2)
x
x = matrix(c(1,2,3,4), 2, 2, byrow=TRUE)
x
x = matrix(data = c(1,2,3,4,5,6,7,8),
nrow = 4,
ncol = 2,
byrow=TRUE)
x
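To check the fill order, here is a small sketch with the arbitrary values 1:6, comparing the default column-wise fill with byrow = TRUE. The last line uses t(), the transpose function, which appears later in this lesson.

```r
# Default: matrix() fills column by column
by_col <- matrix(1:6, nrow = 2, ncol = 3)
by_col[1, ]                   # 1 3 5

# byrow = TRUE fills row by row instead
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by_row[1, ]                   # 1 2 3

# Filling a 2x3 matrix by row equals filling a 3x2 matrix by column, then transposing
identical(by_row, t(matrix(1:6, nrow = 3, ncol = 2)))   # TRUE
```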
sqrt(x) # returns the square root of each element of a vector or matrix
x^2 # raises each element of x to the power 2
10^3
Here we create two correlated sets of numbers by using the rnorm() function, and use the cor() function to compute the correlation between them.
?rnorm
random_normal = rnorm(50) # generates 50 standard normal random values; the first argument, n, is the sample size
random_gaussian = random_normal + rnorm(50, mean = 100, sd = .1)
cor(random_normal, random_gaussian)
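A sketch of what cor() computes by default, the Pearson correlation, written out as the covariance divided by the product of the standard deviations. The seed is arbitrary. Because the added noise has a small standard deviation (0.1) compared with the first vector (1), the correlation comes out close to 1; shifting the mean to 100 does not affect it.

```r
set.seed(1)                           # arbitrary seed, for reproducibility
u <- rnorm(50)
w <- u + rnorm(50, mean = 100, sd = .1)

# Pearson correlation = cov(u, w) / (sd(u) * sd(w))
r_manual <- cov(u, w) / (sd(u) * sd(w))
all.equal(r_manual, cor(u, w))        # TRUE
cor(u, w)                             # close to 1
```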
rnorm(5)
To generate the same sequence of (pseudo)random numbers every time, call set.seed() with a fixed integer before any function that draws random numbers.
set.seed(24)
rnorm(5)
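A quick sketch of what set.seed() guarantees: resetting the same seed replays exactly the same draws, while continuing without a reset produces new ones.

```r
set.seed(24)
first <- rnorm(5)

set.seed(24)               # resetting the same seed...
second <- rnorm(5)         # ...reproduces exactly the same draws
identical(first, second)   # TRUE

third <- rnorm(5)          # without a reset, the generator moves on
identical(first, third)    # FALSE
```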
data = rnorm(100, mean = 24, sd = 1)
mean(data)
var(data)
sqrt(var(data)) # square root of var...
sd(data) # ... is equal to the std dev!
summary(data)
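For reference, var() returns the sample variance, which divides by n - 1 rather than n. A minimal check against the explicit formula, reusing the seed and distribution from above:

```r
set.seed(24)
data <- rnorm(100, mean = 24, sd = 1)
n <- length(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((data - mean(data))^2) / (n - 1)
all.equal(manual_var, var(data))       # TRUE

# sd() is the square root of the sample variance
all.equal(sqrt(manual_var), sd(data))  # TRUE
```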
?seq
#seq(a,b,length=.) is a function that allows us to generate a sequence of numbers between a and b with a specified length
seq(5,13)
seq(5,13, length=10)
seq(13,5)
#help(seq)
seq(1,10,2)
seq(1,10,length = 3)
seq(3,15,length = 5)
seq(10) # Implicitly starts from 1
seq(1,10) # It's the same!
1:10 # again the same!
x <- 1:10 # a:b is shorthand for seq(a, b)
x
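A sketch contrasting the two ways of controlling seq(): by fixes the step, while length.out (which the length = argument used above abbreviates through R's partial argument matching) fixes the number of values and derives the step from it.

```r
seq(1, 10, by = 2)          # 1 3 5 7 9: step of 2, never exceeds 10
seq(1, 10, length.out = 3)  # 1.0 5.5 10.0: 3 equally spaced values, both endpoints included
seq(13, 5)                  # counts down from 13 to 5 when a > b

# The step implied by length.out is (b - a) / (length.out - 1)
(10 - 1) / (3 - 1)          # 4.5
```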
We often wish to examine part of a set of data. Suppose that our data is stored in a given matrix A.
# matrix(data, nrow, ncol)
A = matrix(1:16, 4, 4)
A
A[2,3] # select a single element at row=2 and col= 3
The first number after the open-bracket symbol [ always refers to the row, and the second number always refers to the column. We can also select multiple rows and columns at a time, by providing vectors as the indices.
A
# rows: 1 and 3 cols: 2 and 4
# selected elements:
# [1,2] -> 5
# [3,2] -> 7
# [1,4] -> 13
# [3,4] -> 15
#A[c(1, 3), c(2, 4)] -> ?
A[c(1, 3), c(2, 4)]
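A small sketch confirming the selection worked out in the comments above, plus negative indices, which exclude rows or columns instead of selecting them:

```r
A <- matrix(1:16, 4, 4)

# The selected elements 5, 7, 13, 15 form a 2x2 matrix (filled by column)
all(A[c(1, 3), c(2, 4)] == matrix(c(5, 7, 13, 15), nrow = 2))   # TRUE

# A negative index drops rows: keep every row except 1 and 3
A[-c(1, 3), ]
dim(A[-c(1, 3), ])   # 2 4
```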
# 1:3 means seq(1,3) -> [1,2,3]
A
A[1:3, 2:4] # rows: from 1 to 3, cols: from 2 to 4
A
A[1:2,]
A[ ,1:2]
# R treats a single row/column of a matrix as a vector
A[1,]
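Because a single row comes back as a plain vector, it no longer has dimensions. A sketch of the drop = FALSE argument of [ ], which keeps the result as a one-row matrix:

```r
A <- matrix(1:16, 4, 4)

row_vec <- A[1, ]                  # plain vector: 1 5 9 13
is.null(dim(row_vec))              # TRUE, a vector has no dim

# drop = FALSE keeps the matrix structure
row_mat <- A[1, , drop = FALSE]
dim(row_mat)                       # 1 4
```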
# The dim() function outputs the number of rows followed by the number of columns of a given matrix
d <- dim(A)
d
# The $ operator allows you to extract elements by name from a named list
person <- list(name='Mikey', surname='Mouse', age=33)
person$name
person$surname
person$age
# The names of a list's elements can be retrieved with names()
names(person)
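A sketch of two more ways to work with the person list above: [[ ]] with a name is equivalent to $, and assigning to a new name adds an element. The city value is a hypothetical example.

```r
person <- list(name = 'Mikey', surname = 'Mouse', age = 33)

# [[ ]] with a string returns the element itself, like $
identical(person[["name"]], person$name)   # TRUE

# Assigning to a new name appends an element (hypothetical value)
person$city <- "Duckburg"
names(person)               # "name" "surname" "age" "city"

# Single brackets return a sub-list, not the element
class(person["age"])        # "list"
```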
For most analyses, the first step involves importing a data set into R.
# old version
#Auto = read.csv("../Datasets/Auto.csv")
#fix(Auto)
?read.csv
Auto = read.csv("../Datasets/Auto.csv", header = TRUE, na.strings = "?")
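To see what na.strings = "?" does without the Auto file, here is a sketch that reads a tiny hypothetical CSV passed through the text = argument of read.csv(); the column names only mimic those of Auto.

```r
# Hypothetical inline CSV in which "?" marks a missing value
csv_text <- "mpg,horsepower
18,130
25,?
"

df <- read.csv(text = csv_text, header = TRUE, na.strings = "?")
df$horsepower               # 130 NA
is.numeric(df$horsepower)   # TRUE: "?" became NA, so the column stays numeric

# Without na.strings, "?" is kept as text and the column is no longer numeric
# (character in R >= 4.0, a factor in older versions)
df_raw <- read.csv(text = csv_text, header = TRUE)
is.numeric(df_raw$horsepower)   # FALSE
```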
The dim() function tells us that the data has 397 observations, or rows, and 9 variables, or columns.
dim(Auto)
Auto[1:4,]
There are various ways to deal with the missing data. In this case, only 5 of the rows contain missing observations, and so we choose to use the na.omit() function to simply remove these rows.
?na.omit
Auto=na.omit(Auto)
dim(Auto)
data = matrix(1:15,5,3)
data[2,3] = NA
data[3,1] = NA
data[4,2] = NA
data
na.omit(data)
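complete.cases() (from the stats package, loaded by default) reports which rows contain no NA, so it can count how many rows na.omit() will drop before calling it. A sketch on the same small matrix:

```r
data <- matrix(1:15, 5, 3)
data[2, 3] <- NA
data[3, 1] <- NA
data[4, 2] <- NA

# TRUE for rows without any NA
complete.cases(data)            # TRUE FALSE FALSE FALSE TRUE
sum(!complete.cases(data))      # 3 rows contain at least one NA

# na.omit() keeps only the complete rows, here rows 1 and 5
clean <- na.omit(data)
nrow(clean)                     # 2
```

Running sum(!complete.cases(Auto)) before na.omit(Auto) should likewise report how many rows are about to be removed.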
Once the data are loaded correctly, we can use names() to check the variable names.
names(Auto)
all_the_auto_names <- Auto$name
all_the_auto_names
# Wrapping an assignment in round brackets also prints the assigned value
#(all_the_auto_names <- Auto$name)
plot(x,y) produces a scatterplot of the numbers in x versus the ones in y.
?plot
x = rnorm(10000)
y = rnorm(10000)
plot(x,y)
# add labels
plot(x, y, xlab='x data', ylab= 'y data', main= 'Plot of x versus y')
We will often want to save the output of a plot to a file.
?pdf
pdf("Figure.pdf") # Start creating a new PDF file
plot(x,y,col="green") # Plot x vs. y using green circles
dev.off() # Stop creating the file, and save it
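A self-contained sketch of the same pdf()/dev.off() pattern that writes to a temporary file (the path from tempfile() is illustrative) instead of Figure.pdf, so it can be run from any directory. png() and jpeg() follow the same open, plot, close pattern.

```r
out_file <- tempfile(fileext = ".pdf")   # temporary path ending in .pdf

x <- rnorm(100)
y <- rnorm(100)

pdf(out_file)                 # open the PDF device: plots now go to the file
plot(x, y, col = "green")
invisible(dev.off())          # close the device so the file is written

file.exists(out_file)         # TRUE
```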
x = seq(100)
y = x^3
plot(x,y, xlab = "x", ylab="y = x^3", main = "x^3 function")
x = seq(-pi*2, pi, length=1000)
#x
y = sin(x)
plot(x,y, main="sin(x) with x in [-2*pi,pi]")
y2= cos(x)
plot(x,y2,col="green")
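The contour() function produces a contour plot to represent three-dimensional data; it is like a topographical map and takes three arguments: a vector of x values (the first dimension), a vector of y values (the second dimension), and a matrix whose elements give the z value (the third dimension) for each pair of (x, y) coordinates.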
There are many other inputs that can be used to fine-tune the output of the contour() function. To learn more about these, take a look at the help file by typing ?contour.
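To make the shape of the third argument concrete, here is a sketch on a tiny hypothetical grid: outer() evaluates a function at every pair (x[i], y[j]) and returns a matrix with one row per x value and one column per y value, which is the layout contour() expects.

```r
x <- c(0, 1, 2)
y <- c(10, 20)

# z[i, j] holds the height at the point (x[i], y[j])
z <- outer(x, y, function(x, y) x + y)
z

z[2, 2]    # 1 + 20 = 21
z[3, 1]    # 2 + 10 = 12

# contour() needs nrow(z) == length(x) and ncol(z) == length(y)
dim(z)     # 3 2
```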
?outer
x = seq(-pi, pi, length=50)
y <- x
f <- outer(x,y,
function(x,y)
cos(y)/(1+x^2)
) # outer() evaluates the function at every pair (x[i], y[j]); f is the resulting 50x50 matrix of heights
contour(x,y,f)
#contour(x,y,f,nlevels=45,add=T)
# t(x) returns the transpose of x
#fa = (f-t(f))/2
#contour(x,y,fa,nlevels=15)
image(x,y,f)
persp(x,y,f , theta =30, phi =40)
Create a 5x5 matrix M whose elements are drawn at random from a Gaussian distribution with mean 5 and standard deviation 0.1. Then extract the central 3x3 submatrix of M (call it N), create the matrix Q = N - 5, and place Q back into M in the same position from which N was extracted. Then run the following command:
image(1:5,1:5,M)
Can you explain the result?
M = matrix(data=rnorm(25,5,0.1), nrow=5, ncol=5)
N = M[2:4,2:4]
Q = N - 5
M[2:4,2:4] = Q
image(1:5,1:5,M)
# Step by step ....
M = matrix(rnorm(25, mean=5, sd=0.1), nrow = 5, ncol =5)
M
N = M[2:4, 2:4]
N
Q = N - 5
Q
M[2:4, 2:4] = Q
M
#?image
image(1:5,1:5,M)
Write code that produces the following output, a checkerboard pattern, starting from an 8x8 matrix.
# Solution by M. Andronaco
M = matrix(c(0,1),nrow=9,ncol=8)
M = M[1:8,]
image(1:8,1:8,M)
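Why does the first solution use 9 rows? A sketch of the reasoning: c(0, 1) is recycled down the columns, and with an odd number of rows each new column starts on the opposite value, so after dropping the 9th row M[i, j] equals (i + j) %% 2. With 8 rows, every column would restart at 0 and the image would show stripes instead.

```r
# Odd row count: the pattern shifts by one at each new column
M <- matrix(c(0, 1), nrow = 9, ncol = 8)[1:8, ]
checker <- outer(1:8, 1:8, function(i, j) (i + j) %% 2)
all(M == checker)                    # TRUE

# Even row count: every column is identical, giving stripes
stripes <- matrix(c(0, 1), nrow = 8, ncol = 8)
all(stripes[, 1] == stripes[, 8])    # TRUE
```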
# Solution by C. Cimino
S = matrix(data = rnorm(64, mean = 5, sd = 0.1), nrow = 8, ncol = 8)
#print(S)
W = S[c(2,4,6,8), c(1,3,5,7)]
#print(W)
J = W - 6
#print(J)
S[c(2,4,6,8), c(1,3,5,7)] = J
#print(S)
K = S[c(1,3,5,7), c(2,4,6,8)]
#print(K)
L = K - 6
#print(L)
S[c(1,3,5,7), c(2,4,6,8)] = L
print(S)
image(1:8, 1:8, S)
M = matrix(rnorm(64, mean=5, sd=0.01), nrow = 8, ncol =8)
M[seq(1,8,2),seq(2,8,2)] = 0
M[seq(2,8,2),seq(1,8,2)] = 0
image(1:8,1:8,M)