Visualizing data and trends is essential both for exploring data and analysis results and for presenting the insights gained from a data analysis.
This part gives an overview of the most common graphs and visualization tools.
# Create the data.
v <- c(7,14,20,13,40)
# Plot the line chart.
# the parameter 'type' takes:
# - "p" to draw only the points
# - "l" to draw only the lines
# - "o" to draw both points and lines
plot(v, type = "o", col = "blue")
# Plot the line chart again, adding a color, axis labels, and a title.
plot(v,
type = "o",
col = "red",
xlab = "Time",
ylab = "Values",
main = "Colored chart")
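To compare the three values of the 'type' parameter side by side, a minimal sketch could draw the same vector three times. The par(mfrow = ...) layout and the names plot_types and old_par are choices made for this illustration; they are not part of the original example.

```r
# Same data as in the example above.
v <- c(7, 14, 20, 13, 40)
plot_types <- c("p", "l", "o")

# Split the plotting area into 1 row and 3 columns, saving the old settings.
old_par <- par(mfrow = c(1, 3))
for (plot_type in plot_types) {
  plot(v, type = plot_type, col = "blue",
       main = paste("type =", plot_type))
}
# Restore the previous layout so later plots use the whole device again.
par(old_par)
```

Restoring the saved settings at the end keeps the following examples from being squeezed into a three-panel layout.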
We can draw more than one line on the same chart: plot() draws the first line, and the lines() function adds every line from the second one onwards.
In this case, it may be useful to add a legend to distinguish the data sequences.
?legend
# Create the data for the chart.
v <- c( 7, 12, 28, 3, 41)
t <- c( 14, 7, 6, 19, 3)
# Plot the line chart for the first sequence.
plot(v, type = "o",
col = "red",
xlab = "Month",
ylab = "Value",
main = "Rain fall chart")
lines(t, type = "o", col = "blue", lty=2)
legend("topleft",
c("first sequence",
"second sequence"),
lty=c(1,2), # gives the legend appropriate symbols (lines)
col=c("red", "blue")
)
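Note that plot() sizes the y-axis from the first sequence only, so a later sequence with larger values would be cut off. A minimal sketch of one way around this is to compute the limits over all sequences with range() and pass them as ylim. The vector w below is hypothetical data, chosen so that it exceeds the first sequence.

```r
v <- c(7, 12, 28, 3, 41)
w <- c(14, 7, 6, 19, 55)  # hypothetical second sequence with a larger maximum

# Smallest and largest value across both sequences.
y_limits <- range(v, w)

plot(v, type = "o", col = "red",
     xlab = "Month", ylab = "Value",
     ylim = y_limits,
     main = "Two sequences with a shared y-axis")
lines(w, type = "o", col = "blue", lty = 2)
legend("topleft",
       c("first sequence", "second sequence"),
       lty = c(1, 2),
       col = c("red", "blue"))
```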
Scatterplots show a set of points plotted on the Cartesian plane.
Each point represents the values of two variables. The simple scatterplot is created using the plot() function.
NB: mtcars is a dataset available in R, and head() is a function that shows the first n rows of an object (e.g., list, matrix, data frame, etc.). See the documentation of the head() and tail() functions for more details.
head(mtcars)
# wt: weight mpg: miles per gallon
# select all the rows and two columns
input <- mtcars[,c('wt','mpg')]
# head() returns the first n rows (n = 6 by default); here we ask for 10
print("First data elements:")
print(head(input, n = 10))
print("Elements from the tail:")
# tail() returns the last n rows; here we ask for 5
print(tail(input, n = 5))
The following script creates a scatterplot of the relation between wt (weight) and mpg (miles per gallon).
# Plot the chart for all cars; uncomment xlim and ylim to restrict the axis ranges.
plot(x = input$wt,
y = input$mpg,
xlab = "Weight",
ylab = "Mileage",
# xlim = c(1,4.5),
# ylim = c(10,40),
main = "Weight vs Mileage",
col = "red"
)
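If we only want the cars within a given range, for example weight between 2.5 and 5 and mileage between 15 and 30, one possible approach is to filter the rows with a logical condition before plotting. The names in_range and selected are introduced here for illustration only.

```r
input <- mtcars[, c("wt", "mpg")]

# TRUE for the rows that fall inside both ranges.
in_range <- input$wt >= 2.5 & input$wt <= 5 &
  input$mpg >= 15 & input$mpg <= 30
selected <- input[in_range, ]

plot(x = selected$wt,
     y = selected$mpg,
     xlab = "Weight",
     ylab = "Mileage",
     xlim = c(2.5, 5),
     ylim = c(15, 30),
     main = "Weight vs Mileage (selected cars)",
     col = "red")
```

Setting xlim and ylim alone would only change the visible window; filtering the data frame also makes the selected rows available for further analysis.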
When we have more than two variables and we want to examine the correlation between one variable and each of the remaining ones, we use a scatterplot matrix. The pairs() function creates matrices of scatterplots.
This function takes two main parameters: a formula listing the variables to plot (e.g., ~wt+mpg+disp+cyl) and the data frame that contains them.
In addition, the optional argument 'panel=panel.smooth' draws a smoothed estimate of the relationship between each pair of variables.
names(mtcars)
# Plot the scatterplot matrix for 4 variables.
# Each variable is plotted against the other 3, giving 12 plots in total.
pairs(~wt+mpg+disp+cyl,
data = mtcars,
main = "Scatterplot Matrix",
panel = panel.smooth
)
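The scatterplot matrix shows the relationships visually. To put numbers on them, a minimal sketch could compute the correlation matrix of the same four variables with cor(), which returns Pearson correlations by default. The use of cor() and the names selected and cor_matrix are additions for illustration, not part of the original example.

```r
# The same four variables used in the scatterplot matrix.
selected <- mtcars[, c("wt", "mpg", "disp", "cyl")]

# Pairwise correlations between all the selected variables.
cor_matrix <- cor(selected)
print(round(cor_matrix, 2))

# Correlation of mpg with each of the remaining variables.
print(round(cor_matrix["mpg", colnames(cor_matrix) != "mpg"], 2))
```

A value close to -1 or 1 indicates a strong linear relationship, while a value close to 0 indicates a weak one.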
# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
# Plot the chart.
pie(x, labels)
?rainbow
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
# Plot the chart with a title and the rainbow color palette.
pie(x, labels, main = "City pie chart", col = rainbow(length(x)))
?legend
# Create data for the graph.
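x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
# Percentage of the total for each city, rounded to one decimal place.
piepercent <- round(100 * x / sum(x), 1)
# Plot the chart, labelling each slice with its percentage.
pie(x, labels = piepercent,
main = "City pie chart",
col = rainbow(length(x)))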
legend("topright", c("London","New York","Singapore","Mumbai"),
cex = 0.8,
fill = rainbow(length(x)))
Repeat the previous example, this time using both the name of the city and the percentage as the label of each pie chart slice.
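One possible solution, given here as a sketch rather than the only answer, builds the labels with paste0(), which joins the city names and the percentages element by element. The name slice_labels is introduced for illustration.

```r
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
piepercent <- round(100 * x / sum(x), 1)

# Combine each city name with its percentage, e.g. "London (14.4%)".
slice_labels <- paste0(labels, " (", piepercent, "%)")

pie(x, labels = slice_labels,
    main = "City pie chart",
    col = rainbow(length(x)))
```

Since each slice now carries its own name, the separate legend is no longer needed.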
?hist
# Standard histogram of random data
data <- rnorm(1000, mean = 50, sd = 3)
hist(data)
# Histogram with added parameters; freq = FALSE plots densities instead of counts
hist(data,
main="Random Gaussian data with mean=50 and stdev=3",
xlab="Random variable",
xlim=c(30,80),
col="darkmagenta",
freq=FALSE
)
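With freq=FALSE the bar heights are densities, so the total area of the bars is 1 and the histogram can be compared directly with a probability density curve. The sketch below overlays the Gaussian density that generated the data. The call to set.seed() and the names h and total_area are additions for illustration; the seed only makes the random sample reproducible.

```r
set.seed(42)  # arbitrary seed, so that the sample is reproducible
data <- rnorm(1000, mean = 50, sd = 3)

# hist() invisibly returns the bin breaks and densities it computed.
h <- hist(data,
          freq = FALSE,
          col = "darkmagenta",
          main = "Histogram with the Gaussian density",
          xlab = "Random variable")

# Overlay the density of a Gaussian with mean = 50 and sd = 3.
curve(dnorm(x, mean = 50, sd = 3), add = TRUE, col = "orange", lwd = 2)

# The area of the bars (density times bin width) adds up to 1.
total_area <- sum(h$density * diff(h$breaks))
print(total_area)
```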
?barplot
temp <- c(14,18,20,25,27)
labels <- c("Mar","Apr","May","Jun","Jul")
# Plot the bar chart
barplot(temp,
names.arg=labels,
xlab="Month",
ylab="Temperature",
col="blue",border="red",
main="Avg temperature chart")
# Create the input vectors.
colors <- c("green", "orange", "brown")
months <- c("Mar","Apr","May","Jun","Jul")
regions <- c("East","West","North")
# Create the matrix of the values.
Values <- matrix(c(2,9,3,11,9,
4,8,7,3,12,
5,2,8,10,11),
nrow = 3, ncol = 5, byrow = TRUE)
# Create the stacked bar chart: each bar stacks the values of the three regions for one month
barplot(Values, main = "total revenue",
names.arg = months, xlab = "month",
ylab = "revenue", col = colors)
# Add the legend to the chart
legend("topleft", regions, cex = 1.3, fill = colors)
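When a matrix is passed to barplot(), the bars are stacked by default. As a sketch of an alternative, setting beside = TRUE draws the regions next to each other within each month, which makes the single regions easier to compare. The name bar_positions is introduced for illustration.

```r
colors <- c("green", "orange", "brown")
months <- c("Mar", "Apr", "May", "Jun", "Jul")
regions <- c("East", "West", "North")
Values <- matrix(c(2, 9, 3, 11, 9,
                   4, 8, 7, 3, 12,
                   5, 2, 8, 10, 11),
                 nrow = 3, ncol = 5, byrow = TRUE)

# beside = TRUE groups the bars instead of stacking them.
# barplot() returns the x positions of the bar midpoints.
bar_positions <- barplot(Values, beside = TRUE,
                         main = "revenue by region",
                         names.arg = months, xlab = "month",
                         ylab = "revenue", col = colors)
legend("topleft", regions, cex = 0.8, fill = colors)
```

The stacked version emphasizes the monthly total, while the grouped version emphasizes the differences between regions.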
NB: for more advanced graphs and data visualization tools, you can use the ggplot2 package, which is part of the tidyverse: https://r4ds.had.co.nz/data-visualisation.html