Variables are assigned R-Objects, and the data type of the R-Object becomes the data type of the variable. There are many types of R-Objects; the most frequently used ones are vectors, lists, matrices, arrays, factors and data frames.
The values contained in the simplest R-Objects belong to one of the following types.
# Boolean/logical
v <- TRUE # assign to the new variable 'v' the boolean value TRUE
print(v) # print the content of 'v'
print(class(v)) # print the class name of the value in 'v'
# Numeric
v <- 2.55
print(class(v))
# Integer (the L suffix creates a 32-bit integer; numeric values are doubles and cover a much larger range)
v <- 2L
print(class(v))
# Complex
v <- 3 + 2i
print(class(v))
# Character
v <- 'a'
print(class(v))
v <- 'hello'
print(class(v))
# Raw (i.e., the bytes in which the characters are actually stored)
v <- charToRaw('a')
print(v) # 61 is the hexadecimal code (97 in decimal) of the character 'a'
print(class(v))
print(charToRaw('b'))
print(charToRaw('A'))
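The ranges of the integer and numeric types can be checked directly. The following is a minimal sketch that relies only on the built-in `.Machine` constants; the variable names are illustrative and not part of the original material.

```r
# Largest value that an R integer (32 bits) can hold
int_max <- .Machine$integer.max
print(int_max)                 # 2147483647

# Integer arithmetic past that limit yields NA (R also emits a warning)
overflow <- suppressWarnings(int_max + 1L)
print(overflow)                # NA

# Adding a numeric (double) 1 instead promotes the result to numeric
as_double <- int_max + 1
print(class(as_double))        # "numeric"

# The largest finite numeric value is far beyond the integer range
dbl_max <- .Machine$double.xmax
print(dbl_max > int_max)       # TRUE
```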
As we previously observed, to create a vector with more than one element we can use the c() function, which combines its arguments into a vector. Vectors can hold numeric, character or logical values.
All the elements of a vector have the same data type, whereas a list can contain elements of different data types, such as strings, numbers and logical values. A list can also contain vectors, other lists, matrices or functions.
# vector of numeric
a <- c(2,3,3,4,5,6)
print(a)
print(class(a))
# Create two vectors.
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11,0,8,1,2)
# Vector addition.
add <- v1+v2
print(add)
# Vector subtraction.
sub <- v1-v2
print(sub)
# Vector multiplication.
multi <- v1*v2
print(multi)
# Vector division.
div <- v1/v2
print(div)
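Arithmetic on vectors is applied element by element. The sketch below, which uses illustrative values, shows two consequences worth knowing: a shorter vector is recycled to the length of the longer one, and a division by zero does not stop the program but yields `Inf` (or `NaN` for 0/0).

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20)

# 'y' is recycled: the sum is computed as c(1,2,3,4) + c(10,20,10,20)
recycled_sum <- x + y
print(recycled_sum)            # 11 22 13 24

# Division by zero yields Inf, while 0/0 yields NaN
ratios <- c(4, 0) / c(0, 0)
print(ratios)                  # Inf NaN
```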
# Create a vector
v <- c(3,8,4,5,0,11, -9, 300)
# Sort the elements of the vector.
sorted <- sort(v)
print(sorted)
# Sort the elements in the reverse order.
revsort <- sort(v, decreasing = TRUE)
print(revsort)
# Sorting character vectors.
v <- c("Red","Blue","yellow","violet")
sort_c <- sort(v)
print(sort_c)
# Sorting character vectors in reverse order.
revsort_c <- sort(v, decreasing = TRUE)
print(revsort_c)
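When the sorted positions are needed rather than the sorted values, for example to reorder a second vector in the same way, the base function `order()` can be used. This is a small sketch with illustrative data; the `labels` vector is an assumption introduced only for the example.

```r
v <- c(3, 8, 4, 5, 0, 11)
labels <- c("three", "eight", "four", "five", "zero", "eleven")

# order() returns the positions that would sort the vector
idx <- order(v)
print(idx)                     # 5 1 3 4 2 6

# Indexing with those positions gives the same result as sort()
print(v[idx])                  # 0 3 4 5 8 11

# The same positions reorder the parallel vector consistently
print(labels[idx])             # "zero" "three" "four" "five" "eight" "eleven"
```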
# but, if I want to mix data types inside a vector...
a <- c(2,3,4,'hello',2.5, TRUE)
print(a)
#...all elements are converted into characters and 'a' is now a vector of chars.
print(class(a))
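The conversion follows a fixed order: logical values become integers, integers become numeric values, and everything becomes character when at least one string is present. A minimal sketch with illustrative values:

```r
# logical + integer -> integer
print(class(c(TRUE, 2L)))      # "integer"

# integer + numeric -> numeric
print(class(c(2L, 2.5)))       # "numeric"

# anything + character -> character
mixed <- c(2.5, TRUE, 'hello')
print(mixed)                   # "2.5" "TRUE" "hello"

# Converting back to numeric recovers only the elements that look like numbers
back <- suppressWarnings(as.numeric(mixed))
print(back)                    # 2.5 NA NA
```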
a <- list(2,3,4)
a
a <- list(1,2,'a',3,'b')
a
a <- list(2,3,'hello', c(5,6), 34.5) # 'a' contains 5 elements; the fourth one is a vector
print(a)
print(class(a))
a <- list(2, # element with index 1
list(3, # element with index 2,1
3, # element with index 2,2
4, # element with index 2,3
5), # element with index 2,4
4) # element with index 3
print(a)
print(class(a))
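The indices noted in the comments of the nested list can be used to reach its elements. The sketch below rebuilds the same list and accesses it with double square brackets; the variable names `inner` and `third` are illustrative.

```r
a <- list(2, list(3, 3, 4, 5), 4)

# Double brackets extract the element itself: here, the nested list
inner <- a[[2]]
print(length(inner))           # 4

# Chained double brackets reach the element with index 2,3
third <- a[[2]][[3]]
print(third)                   # 4

# Single brackets return a list containing the element, not the element itself
print(class(a[1]))             # "list"
print(class(a[[1]]))           # "numeric"
```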
?matrix
# matrices are 2 dimensional arrays
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
# arrays can have any number of dimensions
A = array(c('hello','world'), dim = c(2,2))
print(A)
If there are too few elements in data to fill the array, then the elements in data are recycled.
A = array(c('hello','world'), dim = c(2,4))
print(A)
A = array(c('hello','world','today',
'will','be','a',
'very','long','day'), dim = c(2,3,4))
print(A)
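Recycling and indexing are easier to follow with numbers than with words. The following sketch, with illustrative data, fills a 2 x 3 array from three values and then accesses single cells of a three-dimensional array; arrays are filled column by column.

```r
# Six cells but only three values: the data are reused in column-major order
m <- array(1:3, dim = c(2, 3))
print(as.vector(m))            # 1 2 3 1 2 3
print(m[1, 2])                 # 3

# A three-dimensional array takes one subscript per dimension
A <- array(1:24, dim = c(2, 3, 4))
print(dim(A))                  # 2 3 4
print(A[1, 1, 2])              # 7
print(A[2, 3, 4])              # 24
```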
A factor is an R-object created from a vector. It stores the vector along with the distinct values of its elements as labels (levels). The labels are always character strings, irrespective of whether the input vector is numeric, character, Boolean, etc. Factors are useful for columns that have a limited number of unique values, such as "Male" and "Female", or TRUE and FALSE. Factors are created using the factor() function. The nlevels() function gives the count of levels.
# Create a vector.
weather <- c('sunny', 'sunny',
'sunny', 'cloudy',
'rain', 'rain',
'rain')
# Create a factor object.
factor_weather <- factor(weather)
# Print the factor.
print(factor_weather)
# Print the number of distinct levels.
print(nlevels(factor_weather))
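Internally, a factor keeps one integer code per element plus the vector of levels. The sketch below, which uses an illustrative `sizes` vector, inspects both parts and shows how an explicit level order can be requested through the `levels` argument of `factor()`.

```r
sizes <- c('small', 'large', 'medium', 'small')
f <- factor(sizes)

# By default the levels are the sorted distinct values
print(levels(f))               # "large" "medium" "small"

# Each element is stored as the position of its level
print(as.integer(f))           # 3 1 2 3

# table() counts the occurrences of each level
counts <- table(f)
print(counts)

# An explicit level order changes the integer codes, not the values
f_ordered <- factor(sizes, levels = c('small', 'medium', 'large'))
print(as.integer(f_ordered))   # 1 3 2 1
```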
We have seen data frames when we explored the dataset 'Auto'.
Data frames are tabular data objects. Unlike in a matrix, in a data frame each column can contain a different mode of data: the first column can be numeric, the second character and the third logical. A data frame is a list of vectors of equal length.
Data Frames are created using the data.frame() function.
?data.frame
# Create a data frame
students <- data.frame(
gender = c("Male", "Male","Female"),
height = c(177, 175, 165),
weight = c(81,78,50),
age = c(30L,27L,26L)
)
print(students)
# Transpose the students data frame
t(students)
print(students$age)
# Select the students' weight and compute the mean weight
mw = mean(students$weight)
print(mw)
summary(students)
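The statement that a data frame is a list of equal-length vectors, each with its own type, can be verified with a few base functions. This is a minimal sketch on a reduced copy of the students table; `stringsAsFactors = FALSE` is set explicitly so that the result does not depend on the R version.

```r
students <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(177, 175, 165),
  age    = c(30L, 27L, 26L),
  stringsAsFactors = FALSE
)

# Each column keeps its own type
col_types <- sapply(students, class)
print(col_types)               # character, numeric, integer

# A data frame is a list of columns
print(is.list(students))       # TRUE

# Rows can be filtered with a logical condition on a column
tall <- students[students$height > 170, ]
print(nrow(tall))              # 2

# Transposing converts to a matrix, so every value becomes a string
transposed <- t(students)
print(is.character(transposed))  # TRUE
```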
Any value written within a pair of single or double quotes in R is treated as a string. Internally, R stores every string within double quotes.
General rule: the quotes at the beginning and at the end of a string must be both double quotes or both single quotes. They cannot be mixed.
Examples of valid strings:
# a quote of the same kind as the delimiters can be included by escaping it with a backslash (\')
a <- 'Start and end \' with single quote'
print(a)
b <- "Start and end with double quotes"
print(b)
c <- "single quote ' in between double quotes"
print(c)
d <- 'Double quotes " in between single quote'
print(d)
Examples of invalid strings (each of the following assignments causes a parse error):
e <- 'Mixed quotes"
print(e)
f <- 'Single quote ' inside single quote'
print(f)
g <- "Double quotes " inside double quotes"
print(g)
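The last two invalid examples become valid once the inner quote is escaped with a backslash. A short sketch, reusing the names `f` and `g` for illustration:

```r
# Escaping the inner quote makes both strings valid
f <- 'Single quote \' inside single quote'
g <- "Double quotes \" inside double quotes"

# print() shows the stored string, cat() writes the raw characters
print(f)
cat(g, "\n")

# The backslash is not part of the string: a, ", b are three characters
print(nchar("a\"b"))           # 3
```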
s <- paste("hello", "world")
print(s)
# optionally, we can specify a separator
s <- paste("hello", "world","more", "words", sep="---")
print(s)
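Two common variations of `paste()` are sketched below with illustrative values: `paste0()`, which joins without any separator, and the `collapse` argument, which merges the elements of a vector into a single string.

```r
# paste0() concatenates without a separator and is vectorised
ids <- paste0("sample", 1:3)
print(ids)                     # "sample1" "sample2" "sample3"

# collapse joins the elements of one vector into a single string
joined <- paste(c("hello", "world", "more"), collapse = " ")
print(joined)                  # "hello world more"

# sep and collapse can be combined
both <- paste("x", 1:2, sep = "_", collapse = ",")
print(both)                    # "x_1,x_2"
```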
Numbers and strings can be formatted to a specific style using the format() function.
# Total number of digits displayed. Last digit rounded off.
result <- format(23.123456789, digits = 4)
print(result)
result <- format(23.123456789, digits = 9)
print(result)
out_str <- paste("The performance is: ", result)
print(out_str)
# Display numbers in scientific notation.
result <- format(6, scientific = TRUE)
print(result)
result <- format(0.001314521, scientific = TRUE)
print(result)
result <- format(123.998, scientific = TRUE)
print(result)
# you can also pass a vector of numbers...
result <- format(c(6, 123.345), scientific = TRUE)
print(result)
# The minimum number of digits to the right of the decimal point.
result <- format(c(4,
2,
1.41,
99.2,
12.21548772,
23.47),
nsmall = 5)
print(result)
# format() always returns a character string, even for numeric input.
result <- format(6)
print(result)
?format
# Numbers are padded with blanks at the beginning to reach the requested width.
result <- format(13.7, width = 6)
print(result)
result <- format(123456, width = 6)
print(result)
result <- format(123456.789, width = 6)
print(result)
# to have the same format for all three...
result <- format(c(13.7,
123456,
123456.789), width = 6)
print(result)
# Left justify strings.
result <- format("Hello", width = 8, justify = "l")
print(result)
# Center the string within the given width.
result <- format("Hello", width = 8, justify = "c")
print(result)
# Extract characters from 5th to 7th position.
result <- substring("StatisticalLaboratory", 5, 7)
print(result)
print(substring("HelloWorld", 6,10))
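A few properties of `format()` and `substring()` used above can be checked in a compact form. The values in this sketch are illustrative; the point is that `format()` returns strings, pads the elements of a vector to a common width, and that `substring()` positions are 1-based and inclusive.

```r
# All elements of a vector are padded to the same width
aligned <- format(c(1, 10, 100))
print(aligned)                 # "  1" " 10" "100"

# nsmall adds trailing zeros up to the requested number of decimals
padded <- format(1.41, nsmall = 5)
print(padded)                  # "1.41000"

# The result is a character string, even for a plain number
print(is.character(format(6))) # TRUE

# A string formatted with width = 8 is exactly 8 characters long
print(nchar(format("Hello", width = 8)))  # 8

# Positions 6 to 10 of "HelloWorld", both ends included
piece <- substring("HelloWorld", 6, 10)
print(piece)                   # "World"
```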
Data reshaping changes the way data is organized into rows and columns.
Most of the time, data processing in R is done by taking the input data as a data frame. It is easy to extract data from the rows and columns of a data frame, but there are cases when we need the data frame in a format that is different from the one in which we received it.
R has many functions to split, merge and change the rows to columns and vice-versa in a data frame.
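# Create vector objects.
city <- c("Catania", "Seattle", "Boston")
state <- c("IT", "WA", "MA")
# Zip codes are stored as strings: the numeric literal 02101 would lose its leading zero.
zipcode <- c("95030","98104","02101")
# Combine the three vectors column-wise. On plain vectors cbind() returns a character matrix, not a data frame.
addresses <- cbind(city,state,zipcode)
# Print the result.
print(addresses)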
# Create another data frame with similar columns
new_address <- data.frame(
city = c("Lowry","Charlotte"),
state = c("CO","FL"),
zipcode = c("80230","33949")
# stringsAsFactors = FALSE
)
# Print a header.
cat("# # # The Second data frame\n")
# Print the data frame.
print(new_address)
?rbind
# Combine the rows of both objects into a single data frame.
combined_addresses <- rbind(addresses,new_address)
# Print a header.
cat("# # # The combined data frame\n")
# Print the result.
print(combined_addresses)
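Two details of row binding are worth checking on a small case: zip codes keep their leading zero only when stored as strings, and `rbind()` on data frames matches the columns by name rather than by position. The data frames `first` and `second` below are illustrative and not part of the original example.

```r
# A numeric literal drops the leading zero, a string keeps it
numeric_zip <- c(95030, 02101)
print(numeric_zip[2])          # 2101

first <- data.frame(
  city    = c("Catania", "Boston"),
  zipcode = c("95030", "02101"),
  stringsAsFactors = FALSE
)

# Same columns, listed in a different order
second <- data.frame(
  zipcode = "80230",
  city    = "Lowry",
  stringsAsFactors = FALSE
)

# rbind() aligns the columns by name before stacking the rows
both <- rbind(first, second)
print(both)
print(nrow(both))              # 3
print(both$zipcode[2])         # "02101"
print(both$city[3])            # "Lowry"
```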