Graphs and data visualization

Statistical Laboratory

Alessandro Ortis - University of Catania

Visualizing data and trends is essential for exploring data and the results of an analysis, as well as for presenting the insights of a data analysis project.

This part gives an overview of the most common graphs and visualization tools.

Line charts

In [1]:
# Create the data.
v <- c(7,14,20,13,40)

# Plot the line chart.
# the parameter 'type' takes:
#  - "p" to draw only the points 
#  - "l" to draw only the lines 
#  - "o" to draw both points and lines
plot(v,type = "o", col="blue")
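
To compare the three values of 'type' listed in the comments above, the same data can be drawn once per value in adjacent panels. This is only a minimal sketch: the panel layout set with par(mfrow = ...) and the name old_par are illustrative choices, not part of the original example.

```r
v <- c(7, 14, 20, 13, 40)

# Split the plotting area into one row with three panels.
old_par <- par(mfrow = c(1, 3))

# Draw the same data once per value of 'type'.
for (type in c("p", "l", "o")) {
  plot(v, type = type, col = "blue", main = paste("type =", type))
}

# Restore the previous graphical parameters.
par(old_par)
```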
In [8]:
# Plot the line chart with axis labels and a title.
plot(v,
     type = "o", 
     col = "red", 
     xlab = "Time", 
     ylab = "Values",
     main = "Colored chart")

We can draw more than one line on the same chart by using the lines() function for every line after the first.

In this case, it may be useful to add a legend to distinguish the data sequences.

In [3]:
?legend
In [12]:
# Create the data for the chart.
v <- c(  7, 12, 28,  3, 41)
t <- c( 14,  7,  6, 19,  3)

# Plot the first sequence as a line chart.
plot(v, type = "o",
     col = "red", 
     xlab = "Month", 
     ylab = "Value", 
     main = "Rainfall chart")

lines(t, type = "o", col = "blue", lty=2)

legend("topleft", 
      c("first sequence", 
        "second sequence"),
      lty=c(1,2), # gives the legend appropriate symbols (lines)
      col=c("red", "blue")
)
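
Note that plot() chooses the axis limits from the first sequence only, and lines() does not rescale them, so a later sequence with larger values would be clipped. A minimal sketch of one way to handle this, assuming a hypothetical second sequence w (not part of the original data) that exceeds the range of the first:

```r
# Illustrative sequences: the second one exceeds the range of the first.
v <- c(7, 12, 28, 3, 41)
w <- c(14, 55, 6, 19, 3)

# Vertical range that covers both sequences.
y_range <- range(c(v, w))

plot(v, type = "o", col = "red",
     xlab = "Month", ylab = "Value",
     ylim = y_range, main = "Two sequences with a shared y range")
lines(w, type = "o", col = "blue", lty = 2)

legend("topleft",
       c("first sequence", "second sequence"),
       lty = c(1, 2),
       col = c("red", "blue"))
```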

Scatterplots

Scatterplots show a set of points plotted on the Cartesian plane.

Each point represents the values of two variables. The simple scatterplot is created using the plot() function.

NB: mtcars is a dataset available in R; head() is a function that shows the first n rows of an object (e.g., list, matrix, data frame, etc.). See the documentation of the head() and tail() functions for more details.

In [13]:
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
In [7]:
# wt: weight   mpg: miles per gallon
# Select all the rows and the two columns of interest.
input <- mtcars[, c('wt', 'mpg')]
# head() returns the first n rows (n = 6 by default).
print("First data elements:")
print(head(input, n = 10))
print("Elements from the tail:")
# tail() returns the last n rows.
print(tail(input, n = 5))
[1] "First data elements:"
                     wt  mpg
Mazda RX4         2.620 21.0
Mazda RX4 Wag     2.875 21.0
Datsun 710        2.320 22.8
Hornet 4 Drive    3.215 21.4
Hornet Sportabout 3.440 18.7
Valiant           3.460 18.1
Duster 360        3.570 14.3
Merc 240D         3.190 24.4
Merc 230          3.150 22.8
Merc 280          3.440 19.2
[1] "Elements from the tail:"
                  wt  mpg
Lotus Europa   1.513 30.4
Ford Pantera L 3.170 15.8
Ferrari Dino   2.770 19.7
Maserati Bora  3.570 15.0
Volvo 142E     2.780 21.4

The following script creates a scatterplot of the relation between wt (weight) and mpg (miles per gallon).

In [8]:
# Plot weight against mileage for all the cars.
# Uncomment xlim and ylim to set the axis ranges explicitly.
plot(x = input$wt,
     y = input$mpg,
     xlab = "Weight",
     ylab = "Mileage",
     # xlim = c(1, 4.5),
     # ylim = c(10, 40),
     main = "Weight vs Mileage",
     col = "red"
)
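
To restrict the chart to a range of values, for example cars with weight between 2.5 and 5 and mileage between 15 and 30, one option is to filter the rows before plotting. The following is a minimal sketch using subset(); the name selected is illustrative, and the axis limits are simply set to the same bounds used for the filter.

```r
# Keep only the cars with weight in [2.5, 5] and mileage in [15, 30].
input <- mtcars[, c("wt", "mpg")]
selected <- subset(input, wt >= 2.5 & wt <= 5 & mpg >= 15 & mpg <= 30)

plot(x = selected$wt,
     y = selected$mpg,
     xlab = "Weight",
     ylab = "Mileage",
     xlim = c(2.5, 5),
     ylim = c(15, 30),
     main = "Weight vs Mileage (selected cars)",
     col = "red")
```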

When we have more than two variables and want to inspect the correlation between each variable and the remaining ones, we use a scatterplot matrix. The pairs() function creates matrices of scatterplots.

This function takes two main parameters:

  • formula: the variables to plot against each other, written as a one-sided formula (e.g., ~wt+mpg);
  • data: the data frame from which the variables are extracted.

The optional argument 'panel = panel.smooth' adds a smoothed curve to each panel, giving an estimate of the relationship between each pair of variables.

In [18]:
names(mtcars)
  1. 'mpg'
  2. 'cyl'
  3. 'disp'
  4. 'hp'
  5. 'drat'
  6. 'wt'
  7. 'qsec'
  8. 'vs'
  9. 'am'
  10. 'gear'
  11. 'carb'
In [21]:
# Plot the scatterplot matrix of 4 variables.
# Each variable is plotted against the other 3, giving 12 scatterplots.
pairs(~wt+mpg+disp+cyl,
      data = mtcars,
      main = "Scatterplot Matrix",
      panel = panel.smooth
)
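
The scatterplot matrix shows the relationships visually. As a complement, the correlations can also be computed numerically. A minimal sketch using the base cor() function on the same four columns (the names vars and cor_matrix are illustrative):

```r
# Columns used in the scatterplot matrix.
vars <- c("wt", "mpg", "disp", "cyl")

# Pearson correlation between every pair of variables.
cor_matrix <- cor(mtcars[, vars])
print(round(cor_matrix, 2))
```

Values close to 1 or -1 indicate a strong linear relationship, while values close to 0 indicate a weak one.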

Pie Charts

In [19]:
# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Plot the chart.
pie(x,labels)
In [14]:
?rainbow
In [21]:
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Plot the chart with a title and the rainbow color palette.
pie(x, labels, main = "City pie chart", col = rainbow(length(x)))
In [23]:
?legend
In [27]:
# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Percentage of each slice, rounded to one decimal place.
piepercent <- round(100 * x / sum(x), 1)

# Plot the chart, using the percentages as slice labels.
pie(x, labels = piepercent,
    main = "City pie chart",
    col = rainbow(length(x)))

# Add a legend that maps each color to its city.
legend("topright", labels,
       cex = 0.8,
       fill = rainbow(length(x)))

Exercise

Repeat the previous example, showing both the name of the city and the percentage as the label of each pie chart slice.

In [ ]:

Bar charts and histograms

In [28]:
?hist
In [24]:
# Standard histogram of random data
data = rnorm(1000, mean = 50, sd = 3)
hist(data)
In [26]:
# Histogram with added parameters.
# freq = FALSE plots densities instead of counts on the y axis.
hist(data,
     main = "Random Gaussian data with mean=50 and stdev=3",
     xlab = "Random variable",
     xlim = c(30, 80),
     col = "darkmagenta",
     freq = FALSE
)
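
With freq = FALSE the histogram is drawn on the density scale, so the bar areas sum to 1 and a theoretical density curve can be drawn on top of it. A minimal sketch, assuming we overlay the normal density with the same mean and standard deviation used to generate the data; the seed value and the name h are arbitrary choices made here for reproducibility and illustration.

```r
# Reproducible random sample (the seed value is an arbitrary choice).
set.seed(1)
data <- rnorm(1000, mean = 50, sd = 3)

# Density-scale histogram; hist() invisibly returns the computed bins.
h <- hist(data,
          freq = FALSE,
          main = "Histogram with theoretical density",
          xlab = "Random variable",
          col = "darkmagenta")

# Overlay the density of a normal distribution with mean 50 and sd 3.
curve(dnorm(x, mean = 50, sd = 3), add = TRUE, col = "blue", lwd = 2)
```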
In [15]:
?barplot
In [29]:
temp <- c(14,18,20,25,27)
labels <- c("Mar","Apr","May","Jun","Jul")

# Plot the bar chart.
barplot(temp,
        names.arg = labels,
        xlab = "Month",
        ylab = "Temperature",
        col = "blue", border = "red",
        main = "Avg temperature chart")
In [44]:
# Create the input vectors.
colors = c("green","orange","brown")
months <- c("Mar","Apr","May","Jun","Jul")
regions <- c("East","West","North")

# Create the matrix of the values.
Values <- matrix(c(2,9,3,11,9,
                   4,8,7,3,12,
                   5,2,8,10,11),
                 nrow = 3, ncol = 5, byrow = TRUE)

# Create the stacked bar chart: one bar per month, one segment per region.
barplot(Values, main = "total revenue",
        names.arg = months, xlab = "month", 
        ylab = "revenue", col = colors)

# Add the legend to the chart
legend("topleft", regions, cex = 1.3, fill = colors)
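
When barplot() receives a matrix, it stacks the rows of each column into a single bar by default. As a possible variation, the beside = TRUE argument draws the rows as adjacent bars instead, which can make the regions easier to compare within a month. A minimal sketch reusing the same data (the name bar_positions and the chart title are illustrative):

```r
# Same data as the stacked chart: rows are regions, columns are months.
colors <- c("green", "orange", "brown")
months <- c("Mar", "Apr", "May", "Jun", "Jul")
regions <- c("East", "West", "North")
Values <- matrix(c(2, 9, 3, 11, 9,
                   4, 8, 7, 3, 12,
                   5, 2, 8, 10, 11),
                 nrow = 3, ncol = 5, byrow = TRUE)

# beside = TRUE draws the regions side by side instead of stacking them.
bar_positions <- barplot(Values, beside = TRUE,
                         main = "revenue by region",
                         names.arg = months, xlab = "month",
                         ylab = "revenue", col = colors)

legend("topleft", regions, cex = 0.8, fill = colors)
```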

NB: for advanced graphs and data visualization tools, you can use the ggplot2 library, which is part of the tidyverse package: https://r4ds.had.co.nz/data-visualisation.html
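
As a first taste of ggplot2, the weight vs mileage scatterplot could be written roughly as follows. This is only a sketch under the assumption that the ggplot2 package is installed; the guard with requireNamespace() skips the example otherwise, and the name p is illustrative.

```r
# Assumes the ggplot2 package is installed; otherwise the sketch is skipped.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Map wt to the x axis and mpg to the y axis, then draw one point per car.
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point(color = "red") +
    labs(x = "Weight", y = "Mileage", title = "Weight vs Mileage")

  print(p)
}
```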
