6  Data Visualisation

6.1 Plots and Graphs

This section provides explanations, examples and exercises on data visualisation in R. You should be able to work through it in RStudio. It requires the following packages to be loaded.

# Loading libraries

library(ggplot2) # data visualisation
library(dplyr) # data manipulation

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

To demonstrate the plots, let us load a data frame called ihs5_consumption, which was generated in Chapter 5.

# Loading the data

ihs5_consumption <- read.csv(here::here("data", "ihs5_consumption.csv")) %>% 
  mutate(region = as.factor(region))

This dataframe contains 36 variables, of which we will be focusing on food_item, consumption_per_person, and region.
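If you do not have the CSV file to hand, a small simulated stand-in lets you try the plotting code in this chapter. The sketch below is only an illustration: the name demo_consumption, the food items, the distributions and the rule consumption_per_person = consumption_quantity / hh_members are all assumptions made for this example and are not taken from the survey data.

```r
library(dplyr)

# Hypothetical stand-in for ihs5_consumption: simulated values, NOT survey data
set.seed(1)
n <- 300

demo_consumption <- tibble(
  food_item = sample(c("maize flour", "beans", "tomato"), n, replace = TRUE),
  region = sample(1:3, n, replace = TRUE),
  hh_members = sample(1:10, n, replace = TRUE),
  consumption_quantity = rlnorm(n, meanlog = 6, sdlog = 0.8)
) %>%
  mutate(
    region = as.factor(region),  # same conversion as for the real data
    # Assumed derivation, for illustration only
    consumption_per_person = consumption_quantity / hh_members
  )

# Inspect the three variables this chapter focuses on
demo_consumption %>%
  select(food_item, consumption_per_person, region) %>%
  glimpse()
```

Any of the plots in this chapter can then be tried by replacing ihs5_consumption with demo_consumption.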

The specific objective of this section is to introduce you to the different types of graphics used in R. By the end you should have a better understanding of some basic concepts of data visualisation, and should be better placed to start developing and editing scripts yourself. The particular topics we shall cover are:

  1. Univariate graphs
  2. Multivariate graphs
  3. Controlling layout
  4. Printing graphs

6.2 Univariate graphs

In this section, we look at graphics that we may create with a single variable. This includes histograms, boxplots, bar charts, as well as QQ plots. These are usually important in checking the distribution of variables in your dataset or checking the residuals of a fitted model.

6.2.1 Histogram

# Generating the base for the plot

ihs5_consumption %>% 
  ggplot()

# Creating the histogram

ihs5_consumption %>% 
  ggplot() +
  geom_histogram(aes(consumption_per_person))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Changing colour of a histogram

This is done by adding the argument fill = "colour name" to geom_histogram(). Many colours are available; you can list their names by typing colors().

colors()
  [1] "white"                "aliceblue"            "antiquewhite"        
  [4] "antiquewhite1"        "antiquewhite2"        "antiquewhite3"       
  [7] "antiquewhite4"        "aquamarine"           "aquamarine1"         
 [10] "aquamarine2"          "aquamarine3"          "aquamarine4"         
 [13] "azure"                "azure1"               "azure2"              
 [16] "azure3"               "azure4"               "beige"               
 [19] "bisque"               "bisque1"              "bisque2"             
 [22] "bisque3"              "bisque4"              "black"               
 [25] "blanchedalmond"       "blue"                 "blue1"               
 [28] "blue2"                "blue3"                "blue4"               
 [31] "blueviolet"           "brown"                "brown1"              
 [34] "brown2"               "brown3"               "brown4"              
 [37] "burlywood"            "burlywood1"           "burlywood2"          
 [40] "burlywood3"           "burlywood4"           "cadetblue"           
 [43] "cadetblue1"           "cadetblue2"           "cadetblue3"          
 [46] "cadetblue4"           "chartreuse"           "chartreuse1"         
 [49] "chartreuse2"          "chartreuse3"          "chartreuse4"         
 [52] "chocolate"            "chocolate1"           "chocolate2"          
 [55] "chocolate3"           "chocolate4"           "coral"               
 [58] "coral1"               "coral2"               "coral3"              
 [61] "coral4"               "cornflowerblue"       "cornsilk"            
 [64] "cornsilk1"            "cornsilk2"            "cornsilk3"           
 [67] "cornsilk4"            "cyan"                 "cyan1"               
 [70] "cyan2"                "cyan3"                "cyan4"               
 [73] "darkblue"             "darkcyan"             "darkgoldenrod"       
 [76] "darkgoldenrod1"       "darkgoldenrod2"       "darkgoldenrod3"      
 [79] "darkgoldenrod4"       "darkgray"             "darkgreen"           
 [82] "darkgrey"             "darkkhaki"            "darkmagenta"         
 [85] "darkolivegreen"       "darkolivegreen1"      "darkolivegreen2"     
 [88] "darkolivegreen3"      "darkolivegreen4"      "darkorange"          
 [91] "darkorange1"          "darkorange2"          "darkorange3"         
 [94] "darkorange4"          "darkorchid"           "darkorchid1"         
 [97] "darkorchid2"          "darkorchid3"          "darkorchid4"         
[100] "darkred"              "darksalmon"           "darkseagreen"        
[103] "darkseagreen1"        "darkseagreen2"        "darkseagreen3"       
[106] "darkseagreen4"        "darkslateblue"        "darkslategray"       
[109] "darkslategray1"       "darkslategray2"       "darkslategray3"      
[112] "darkslategray4"       "darkslategrey"        "darkturquoise"       
[115] "darkviolet"           "deeppink"             "deeppink1"           
[118] "deeppink2"            "deeppink3"            "deeppink4"           
[121] "deepskyblue"          "deepskyblue1"         "deepskyblue2"        
[124] "deepskyblue3"         "deepskyblue4"         "dimgray"             
[127] "dimgrey"              "dodgerblue"           "dodgerblue1"         
[130] "dodgerblue2"          "dodgerblue3"          "dodgerblue4"         
[133] "firebrick"            "firebrick1"           "firebrick2"          
[136] "firebrick3"           "firebrick4"           "floralwhite"         
[139] "forestgreen"          "gainsboro"            "ghostwhite"          
[142] "gold"                 "gold1"                "gold2"               
[145] "gold3"                "gold4"                "goldenrod"           
[148] "goldenrod1"           "goldenrod2"           "goldenrod3"          
[151] "goldenrod4"           "gray"                 "gray0"               
[154] "gray1"                "gray2"                "gray3"               
[157] "gray4"                "gray5"                "gray6"               
[160] "gray7"                "gray8"                "gray9"               
[163] "gray10"               "gray11"               "gray12"              
[166] "gray13"               "gray14"               "gray15"              
[169] "gray16"               "gray17"               "gray18"              
[172] "gray19"               "gray20"               "gray21"              
[175] "gray22"               "gray23"               "gray24"              
[178] "gray25"               "gray26"               "gray27"              
[181] "gray28"               "gray29"               "gray30"              
[184] "gray31"               "gray32"               "gray33"              
[187] "gray34"               "gray35"               "gray36"              
[190] "gray37"               "gray38"               "gray39"              
[193] "gray40"               "gray41"               "gray42"              
[196] "gray43"               "gray44"               "gray45"              
[199] "gray46"               "gray47"               "gray48"              
[202] "gray49"               "gray50"               "gray51"              
[205] "gray52"               "gray53"               "gray54"              
[208] "gray55"               "gray56"               "gray57"              
[211] "gray58"               "gray59"               "gray60"              
[214] "gray61"               "gray62"               "gray63"              
[217] "gray64"               "gray65"               "gray66"              
[220] "gray67"               "gray68"               "gray69"              
[223] "gray70"               "gray71"               "gray72"              
[226] "gray73"               "gray74"               "gray75"              
[229] "gray76"               "gray77"               "gray78"              
[232] "gray79"               "gray80"               "gray81"              
[235] "gray82"               "gray83"               "gray84"              
[238] "gray85"               "gray86"               "gray87"              
[241] "gray88"               "gray89"               "gray90"              
[244] "gray91"               "gray92"               "gray93"              
[247] "gray94"               "gray95"               "gray96"              
[250] "gray97"               "gray98"               "gray99"              
[253] "gray100"              "green"                "green1"              
[256] "green2"               "green3"               "green4"              
[259] "greenyellow"          "grey"                 "grey0"               
[262] "grey1"                "grey2"                "grey3"               
[265] "grey4"                "grey5"                "grey6"               
[268] "grey7"                "grey8"                "grey9"               
[271] "grey10"               "grey11"               "grey12"              
[274] "grey13"               "grey14"               "grey15"              
[277] "grey16"               "grey17"               "grey18"              
[280] "grey19"               "grey20"               "grey21"              
[283] "grey22"               "grey23"               "grey24"              
[286] "grey25"               "grey26"               "grey27"              
[289] "grey28"               "grey29"               "grey30"              
[292] "grey31"               "grey32"               "grey33"              
[295] "grey34"               "grey35"               "grey36"              
[298] "grey37"               "grey38"               "grey39"              
[301] "grey40"               "grey41"               "grey42"              
[304] "grey43"               "grey44"               "grey45"              
[307] "grey46"               "grey47"               "grey48"              
[310] "grey49"               "grey50"               "grey51"              
[313] "grey52"               "grey53"               "grey54"              
[316] "grey55"               "grey56"               "grey57"              
[319] "grey58"               "grey59"               "grey60"              
[322] "grey61"               "grey62"               "grey63"              
[325] "grey64"               "grey65"               "grey66"              
[328] "grey67"               "grey68"               "grey69"              
[331] "grey70"               "grey71"               "grey72"              
[334] "grey73"               "grey74"               "grey75"              
[337] "grey76"               "grey77"               "grey78"              
[340] "grey79"               "grey80"               "grey81"              
[343] "grey82"               "grey83"               "grey84"              
[346] "grey85"               "grey86"               "grey87"              
[349] "grey88"               "grey89"               "grey90"              
[352] "grey91"               "grey92"               "grey93"              
[355] "grey94"               "grey95"               "grey96"              
[358] "grey97"               "grey98"               "grey99"              
[361] "grey100"              "honeydew"             "honeydew1"           
[364] "honeydew2"            "honeydew3"            "honeydew4"           
[367] "hotpink"              "hotpink1"             "hotpink2"            
[370] "hotpink3"             "hotpink4"             "indianred"           
[373] "indianred1"           "indianred2"           "indianred3"          
[376] "indianred4"           "ivory"                "ivory1"              
[379] "ivory2"               "ivory3"               "ivory4"              
[382] "khaki"                "khaki1"               "khaki2"              
[385] "khaki3"               "khaki4"               "lavender"            
[388] "lavenderblush"        "lavenderblush1"       "lavenderblush2"      
[391] "lavenderblush3"       "lavenderblush4"       "lawngreen"           
[394] "lemonchiffon"         "lemonchiffon1"        "lemonchiffon2"       
[397] "lemonchiffon3"        "lemonchiffon4"        "lightblue"           
[400] "lightblue1"           "lightblue2"           "lightblue3"          
[403] "lightblue4"           "lightcoral"           "lightcyan"           
[406] "lightcyan1"           "lightcyan2"           "lightcyan3"          
[409] "lightcyan4"           "lightgoldenrod"       "lightgoldenrod1"     
[412] "lightgoldenrod2"      "lightgoldenrod3"      "lightgoldenrod4"     
[415] "lightgoldenrodyellow" "lightgray"            "lightgreen"          
[418] "lightgrey"            "lightpink"            "lightpink1"          
[421] "lightpink2"           "lightpink3"           "lightpink4"          
[424] "lightsalmon"          "lightsalmon1"         "lightsalmon2"        
[427] "lightsalmon3"         "lightsalmon4"         "lightseagreen"       
[430] "lightskyblue"         "lightskyblue1"        "lightskyblue2"       
[433] "lightskyblue3"        "lightskyblue4"        "lightslateblue"      
[436] "lightslategray"       "lightslategrey"       "lightsteelblue"      
[439] "lightsteelblue1"      "lightsteelblue2"      "lightsteelblue3"     
[442] "lightsteelblue4"      "lightyellow"          "lightyellow1"        
[445] "lightyellow2"         "lightyellow3"         "lightyellow4"        
[448] "limegreen"            "linen"                "magenta"             
[451] "magenta1"             "magenta2"             "magenta3"            
[454] "magenta4"             "maroon"               "maroon1"             
[457] "maroon2"              "maroon3"              "maroon4"             
[460] "mediumaquamarine"     "mediumblue"           "mediumorchid"        
[463] "mediumorchid1"        "mediumorchid2"        "mediumorchid3"       
[466] "mediumorchid4"        "mediumpurple"         "mediumpurple1"       
[469] "mediumpurple2"        "mediumpurple3"        "mediumpurple4"       
[472] "mediumseagreen"       "mediumslateblue"      "mediumspringgreen"   
[475] "mediumturquoise"      "mediumvioletred"      "midnightblue"        
[478] "mintcream"            "mistyrose"            "mistyrose1"          
[481] "mistyrose2"           "mistyrose3"           "mistyrose4"          
[484] "moccasin"             "navajowhite"          "navajowhite1"        
[487] "navajowhite2"         "navajowhite3"         "navajowhite4"        
[490] "navy"                 "navyblue"             "oldlace"             
[493] "olivedrab"            "olivedrab1"           "olivedrab2"          
[496] "olivedrab3"           "olivedrab4"           "orange"              
[499] "orange1"              "orange2"              "orange3"             
[502] "orange4"              "orangered"            "orangered1"          
[505] "orangered2"           "orangered3"           "orangered4"          
[508] "orchid"               "orchid1"              "orchid2"             
[511] "orchid3"              "orchid4"              "palegoldenrod"       
[514] "palegreen"            "palegreen1"           "palegreen2"          
[517] "palegreen3"           "palegreen4"           "paleturquoise"       
[520] "paleturquoise1"       "paleturquoise2"       "paleturquoise3"      
[523] "paleturquoise4"       "palevioletred"        "palevioletred1"      
[526] "palevioletred2"       "palevioletred3"       "palevioletred4"      
[529] "papayawhip"           "peachpuff"            "peachpuff1"          
[532] "peachpuff2"           "peachpuff3"           "peachpuff4"          
[535] "peru"                 "pink"                 "pink1"               
[538] "pink2"                "pink3"                "pink4"               
[541] "plum"                 "plum1"                "plum2"               
[544] "plum3"                "plum4"                "powderblue"          
[547] "purple"               "purple1"              "purple2"             
[550] "purple3"              "purple4"              "red"                 
[553] "red1"                 "red2"                 "red3"                
[556] "red4"                 "rosybrown"            "rosybrown1"          
[559] "rosybrown2"           "rosybrown3"           "rosybrown4"          
[562] "royalblue"            "royalblue1"           "royalblue2"          
[565] "royalblue3"           "royalblue4"           "saddlebrown"         
[568] "salmon"               "salmon1"              "salmon2"             
[571] "salmon3"              "salmon4"              "sandybrown"          
[574] "seagreen"             "seagreen1"            "seagreen2"           
[577] "seagreen3"            "seagreen4"            "seashell"            
[580] "seashell1"            "seashell2"            "seashell3"           
[583] "seashell4"            "sienna"               "sienna1"             
[586] "sienna2"              "sienna3"              "sienna4"             
[589] "skyblue"              "skyblue1"             "skyblue2"            
[592] "skyblue3"             "skyblue4"             "slateblue"           
[595] "slateblue1"           "slateblue2"           "slateblue3"          
[598] "slateblue4"           "slategray"            "slategray1"          
[601] "slategray2"           "slategray3"           "slategray4"          
[604] "slategrey"            "snow"                 "snow1"               
[607] "snow2"                "snow3"                "snow4"               
[610] "springgreen"          "springgreen1"         "springgreen2"        
[613] "springgreen3"         "springgreen4"         "steelblue"           
[616] "steelblue1"           "steelblue2"           "steelblue3"          
[619] "steelblue4"           "tan"                  "tan1"                
[622] "tan2"                 "tan3"                 "tan4"                
[625] "thistle"              "thistle1"             "thistle2"            
[628] "thistle3"             "thistle4"             "tomato"              
[631] "tomato1"              "tomato2"              "tomato3"             
[634] "tomato4"              "turquoise"            "turquoise1"          
[637] "turquoise2"           "turquoise3"           "turquoise4"          
[640] "violet"               "violetred"            "violetred1"          
[643] "violetred2"           "violetred3"           "violetred4"          
[646] "wheat"                "wheat1"               "wheat2"              
[649] "wheat3"               "wheat4"               "whitesmoke"          
[652] "yellow"               "yellow1"              "yellow2"             
[655] "yellow3"              "yellow4"              "yellowgreen"         
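The full list is long, so it is often easier to search it. The following is a minimal sketch using base R only; the object name blue_names is just an illustrative choice.

```r
# Keep only the built-in colour names that contain "blue"
blue_names <- grep("blue", colors(), value = TRUE)
head(blue_names)

# Check that a name is valid before using it in a plot
"darkblue" %in% colors()
```

The same approach works for any other word, for example "red" or "green".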

The color name is placed in quotation marks. Let us make our histogram dark blue.

# Changing  colour of the histogram
ihs5_consumption %>% 
  ggplot() +
  geom_histogram(aes(consumption_per_person), fill = "darkblue")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

This produces a histogram with dark blue bars, an x-axis labelled "consumption_per_person" and no title. All three can be changed to your preference, either by adding arguments to geom_histogram() or by adding further layers to the plot with +.

For instance, to change the name of the x-axis, add xlab("name of axis") to the plot. Note that the name of the axis is in quotation marks. Let us label the axis to show that these are food consumption data.

ihs5_consumption %>% 
  ggplot() +
  geom_histogram(aes(consumption_per_person), fill = "darkblue") + 
  xlab("food consumption per person (g/day)") 
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Tip

Note that in some published graphs you will find a "solidus" (/) between the name of the variable and the units. This is good practice for presenting units in axis labels, and is favoured by many publishers. Some quantities have units that are ratios, such as grams per day. These can be written "g/day", but that is not good scientific practice, particularly if you are also using the solidus to separate the variable name from its units as above. It is better to write the units as "g day" with the power "-1" on "day". In R we can do this with expression(), as follows.

# Using expression() for labelling units on the x-axis
ihs5_consumption %>% 
  ggplot() +
  geom_histogram(aes(consumption_per_person), fill = "darkblue") + 
  xlab(expression("g day"^-1))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Changing main title

This is done by adding ggtitle("name of main title") to the plot. Note that the title is in quotation marks. Let us add a title stating that these are food consumption per person data.

# Adding the title
ihs5_consumption %>% 
  ggplot() +
  geom_histogram(aes(consumption_per_person), fill = "darkblue") + 
  xlab(expression("g day"^-1)) +
  ggtitle("Histogram of food consumption per person") 
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

You can also change other features, such as the width of the bins or the colour of their outline.

# Changing bin width of the histogram
ihs5_consumption %>% 
  ggplot() +
  geom_histogram(aes(consumption_per_person), binwidth = 0.5) +  
  xlab(expression("g day"^-1)) +
  ggtitle("Histogram of food consumption per person") 

# Changing outline colour of the histogram
ihs5_consumption %>% 
  ggplot() +
  geom_histogram(aes(consumption_per_person), colour = "green") +  
  xlab(expression("g day"^-1)) +
  ggtitle("Histogram of food consumption per person") 
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
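The message printed by geom_histogram() asks you to pick a bin width rather than rely on the default of 30 bins. One common starting point (not the only one) is the Freedman-Diaconis rule, which sets the width to 2 * IQR / n^(1/3). The sketch below applies it to simulated values; the data frame demo and its distribution are assumptions made for illustration, not the survey data.

```r
library(ggplot2)

# Hypothetical, simulated consumption values (NOT survey data)
set.seed(42)
demo <- data.frame(
  consumption_per_person = rlnorm(500, meanlog = 4, sdlog = 0.6)
)

# Freedman-Diaconis bin width: 2 * IQR / n^(1/3)
x <- demo$consumption_per_person
fd_width <- 2 * IQR(x) / length(x)^(1/3)

p_hist <- ggplot(demo) +
  geom_histogram(aes(consumption_per_person),
                 binwidth = fd_width,   # no stat_bin() message once this is set
                 fill = "darkblue", colour = "white") +
  xlab(expression("g day"^-1)) +
  ggtitle("Histogram with a Freedman-Diaconis bin width")

p_hist
```

Treat the computed width as a first guess and adjust it until the shape of the distribution is clear.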

Exercise
  1. Generate a red histogram. Label it appropriately, assuming that these are data for food consumption per household in kilograms per week.

6.2.2 QQ plots

The second type of plot we look at is the QQ plot, which is used to check whether data are normally distributed. The function used is stat_qq(), and the variable to check must be mapped to the sample aesthetic, i.e. aes(sample = variable).

# Starting with the empty plot 
ihs5_consumption %>% 
  ggplot() 

# QQ plot
ihs5_consumption %>% 
  ggplot() +
  stat_qq(aes(sample = consumption_per_person))

The sample aesthetic here is the food consumption per person data. The sample quantiles are just the data values, plotted in increasing order. The theoretical quantiles are the corresponding values for an ordered set of the same number of values from the standard normal distribution (mean zero, variance 1). This means that, if the data are normal, the points of the QQ plot should lie close to a straight line. The stat_qq_line() function adds this line to the plot to help your interpretation.

# QQ plot + QQ line
ihs5_consumption %>% 
  ggplot() +
  stat_qq(aes(sample = consumption_per_person)) +
  stat_qq_line(aes(sample = consumption_per_person))
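To make the idea of sample and theoretical quantiles concrete, the pairs of values in a normal QQ plot can be computed by hand. This is a minimal sketch in base R on simulated values (an assumption for illustration); it compares the hand calculation with base R's qqnorm(), which uses the same plotting positions from ppoints().

```r
# Hypothetical, simulated values (NOT survey data)
set.seed(7)
x <- rnorm(50, mean = 100, sd = 15)

# Sample quantiles: the data values in increasing order
sample_q <- sort(x)

# Theoretical quantiles: standard normal quantiles at the same plotting positions
theoretical_q <- qnorm(ppoints(length(x)))

# Compare with the values computed by qqnorm() (without drawing a plot)
ref <- qqnorm(x, plot.it = FALSE)
all.equal(sort(ref$x), theoretical_q)

# For roughly normal data the two sets of quantiles are almost perfectly correlated
cor(theoretical_q, sample_q)
```

Plotting sample_q against theoretical_q gives the QQ plot itself.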

You can add a plot title using ggtitle("") as for the histogram, and you can change the colour of the line drawn by stat_qq_line(), if you wish, by adding the colour = "" argument.

# QQ plot + QQ line (in red)
ihs5_consumption %>% 
  ggplot() +
  stat_qq(aes(sample = consumption_per_person)) +
  stat_qq_line(aes(sample = consumption_per_person), colour = "red") +
  ggtitle("Food consumption QQ-plot")

Exercise: QQ plot
  1. Generate a QQ plot with a 1:1 line.
  2. Label it appropriately, assuming that these are data for food consumption per household in kilograms per week.

6.2.3 Box plot

Box plots summarise the minimum, first quartile, median, third quartile, interquartile range, maximum and outlying values in your dataset. They are used for univariate data, but can be split by a factor (a categorical variable), e.g. gender or region. The function used to draw a boxplot is geom_boxplot(), and the variable to summarise is mapped inside aes(). Let us try plotting the data we loaded earlier.

# Boxplot
ihs5_consumption %>% 
  ggplot() +
  geom_boxplot(aes(consumption_per_person))

Let’s try a different orientation

# Boxplot - changing the orientation

ihs5_consumption %>% 
  ggplot() +
  geom_boxplot(aes(consumption_per_person)) +
  coord_flip() 

You can give your boxplot a main title, a colour and an axis label, as we did for histograms. Note that coord_flip() only changes how the plot is drawn: consumption_per_person is still the x aesthetic, so we still label it with xlab(), even though it now appears on the vertical axis.

# Boxplot - adding colour, axis label and title

ihs5_consumption %>% 
  ggplot() +
  geom_boxplot(aes(consumption_per_person), colour = "darkblue") +
  coord_flip() +
  xlab("Food consumption (g/day)") +
  ggtitle("Boxplot of food consumption per person") 

The thick line in the centre of the box corresponds to the median value of the data (half the values are smaller, half are larger). The bottom of the box (drawn here with a dark blue outline) is the first quartile of the data, Q1 (25% of the values are smaller), and the top of the box is the third quartile of the data, Q3 (25% of the values are larger).

In exploratory data analysis we call the quantity H = Q3 - Q1 the “h-spread”. R calculates what are known as the “inner fences” of the data, which are at Q1 - 1.5*H and Q3 + 1.5*H. The “whiskers” above and below the box join Q1 to the smallest data value inside the inner fences, and Q3 to the largest value inside the inner fences. If there are values outside the inner fences then these appear as points on the plot.
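The fences and whiskers can be worked out by hand, which helps when deciding whether a point really is an outlier. The sketch below uses simulated values with two deliberately large observations added (all of this is assumed for illustration) and the default quantile() definition.

```r
# Hypothetical, simulated values plus two deliberately large observations
set.seed(3)
x <- c(rnorm(100, mean = 50, sd = 10), 120, 135)

q1 <- unname(quantile(x, 0.25))  # first quartile, Q1
q3 <- unname(quantile(x, 0.75))  # third quartile, Q3
h  <- q3 - q1                    # h-spread (interquartile range)

# Inner fences
lower_fence <- q1 - 1.5 * h
upper_fence <- q3 + 1.5 * h

# Whiskers end at the most extreme values that are still inside the fences
inside <- x[x >= lower_fence & x <= upper_fence]
whisker_ends <- range(inside)

# Values outside the fences are drawn as individual points
outliers <- x[x < lower_fence | x > upper_fence]

c(Q1 = q1, Q3 = q3, H = h, lower = lower_fence, upper = upper_fence)
whisker_ends
outliers
```

The two added values, 120 and 135, fall above the upper fence and so would appear as separate points on a boxplot.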

It is possible to produce a graph in which separate boxplots are produced for different levels of a factor. As an example, we would like to understand how food is consumed in the three regions in Malawi. The values are stored in the variable called region.

We then want to plot our data split by region. We use the function geom_boxplot() again, but this time we map a second variable, region, inside aes().

#Boxplot - by region

 ihs5_consumption %>% 
  ggplot() +
  geom_boxplot(aes(consumption_per_person, region))

Now we can flip the orientation, remove the region axis label, label the consumption axis and change the title to reflect the new variable. Because coord_flip() does not change which variable is x and which is y, the region label is removed with ylab("") and the consumption axis is labelled with xlab().

# Boxplot - by region 

ihs5_consumption %>% 
  ggplot() +
  geom_boxplot(aes(consumption_per_person, region), colour = "darkblue") +
  coord_flip() +
  xlab("Food consumption (g/day)") +
  ylab("") +
  ggtitle("Boxplot of food consumption per person per region") 

Exercise: Box plot

Using the data:

  1. create three boxplots, one for each of the regions,
  2. exclude the label for the region axis,
  3. label the boxplots appropriately.
  4. Are there any outliers in your data?

6.2.4 Bar plot

A bar chart represents each category of a variable by a bar. With geom_bar(), the height of each bar is the number of rows in each category of the variable mapped inside aes(), here region. There are additional options for naming the bars, for instance, and for colouring them, as you have seen for earlier plots. geom_bar() does the counting for you, so it works on the raw data without tabulating them first. The simplest form of geom_bar() is given below.

#Bar plot
  ihs5_consumption %>% 
  ggplot() +
  geom_bar(aes(region))

# Checking the results of the barplot
table(ihs5_consumption$region)

  1   2   3 
805 469 438 

You can check the results by using the function table(), which provides a count for each level of a variable.
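The same check can be written in the dplyr style used elsewhere in this chapter with count(). This is a minimal sketch on a simulated region column (an assumption for illustration, not the survey data).

```r
library(dplyr)

# Hypothetical, simulated regions (NOT survey data)
set.seed(11)
demo <- tibble(region = factor(sample(1:3, 200, replace = TRUE)))

# One row per region, with the number of rows in column n
region_counts <- demo %>%
  count(region)

region_counts

# The counts agree with table()
table(demo$region)
```

The column n holds exactly the bar heights that geom_bar() would draw for these data.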

You can also add labels to a bar plot, as for the previous plots, and change the colour of the bars.

  ihs5_consumption %>% 
  ggplot() +
  geom_bar(aes(region)) +
  xlab("Regions") +
  ylab("count") +
  ggtitle("Number of foods reported per region") 

You can also change the axis limits by using the ylim() function.

# Changing limits and colour
  ihs5_consumption %>% 
  ggplot() +
  geom_bar(aes(region),  fill = "light blue") +
  ylim(0,810) +
   xlab("Regions") +
  ylab("count") +
  ggtitle("Number of foods reported per region") 

Question

What would happen if you changed the y-axis limits from (0, 810) to (0, 800)?

You can also map the fill colour to region, which gives a distinct colour to each region.

# Changing limits and colour by region

  ihs5_consumption %>% 
  ggplot() +
  geom_bar(aes(region,  fill = region)) +
  ylim(0,810) +
   xlab("Regions") +
  ylab("count") +
  ggtitle("Number of foods reported per region") 

Exercise: Bar plot
  1. Create a bar plot to show the frequency of the food consumed by region in the sample,
  2. label it and adjust the axis and colour appropriately.

6.3 Multivariate graphs

In this section, we look at graphics that we may create with multiple variables. They are important in checking how two or more variables relate to each other.

6.3.1 Plots

The simplest scatter plot is made using the geom_point() function, which needs two aesthetics inside aes(). The first is the variable for the x-axis and the second is the variable for the y-axis.

# The data points per region (x, y)
  ihs5_consumption %>% 
  ggplot() +
  geom_point(aes(region, consumption_per_person)) 

From the scatter plot, you will notice that, by default, the axis labels are simply the names of the variables we passed, i.e. region and consumption_per_person, and that there is no title. All of these can be added as for the previous graphs.

The list below shows layers that can be added to the plot, as discussed already:

  • xlab("Region")
  • ylab("Food consumption (g/day)")
  • ggtitle("Food consumption by different regions in Malawi")

# Changing the colour of the data points per region (x, y)
ihs5_consumption %>% 
  ggplot() +
  geom_point(aes(region, consumption_per_person), 
             colour = "red") + # Define the colour of the symbols
  xlab("Region") +
  ylab("Food consumption (g/day)") +
  ggtitle("Food consumption by different regions in Malawi")

6.3.2 Plot Symbols

In the graphics that we have created so far, we have mostly left the plotting symbol as the default, a filled black circle. However, we can change the symbol by using the argument shape.

You can change the plotting symbol by assigning a numeric value with the = sign. There are two categories of symbols: those numbered from 0 to 20 and those numbered from 21 to 25. For the symbols from 21 to 25, in addition to being able to set the colour, we can also set the fill. The fill of these shapes is set with the argument fill=, and just as with the argument colour=, we can assign any colour value.

 # Changing the symbol & colour of the data points per site (x, y)
ihs5_consumption %>% 
  ggplot() +
  geom_point(aes(region, consumption_per_person),
             shape = 17,  # Defining the symbol
             colour = "red") + # Defining the colour
  xlab("Region") +
  ylab("Food consumption (g/day)") +
  ggtitle("Food consumption by different regions in Malawi")

Let us change the fill color of the symbol by using the fill argument. Remember that only symbols 21 to 25 accept that argument.

 # Changing the symbol, the outline colour and the fill colour of the data points per site (x, y)

ihs5_consumption %>% 
  ggplot() +
  geom_point(aes(region, consumption_per_person),
             shape = 23, # Define the shape
             colour = "red",  # Define outline colour
             fill = "black") + # Define fill colour
  xlab("Region") +
  ylab("Food consumption (g/day)") +
  ggtitle("Food consumption by different regions in Malawi")

We can also set the size of the symbols. We do this with the argument size=. This argument is simply a numeric value indicating how much bigger (or smaller) than the usual size we want our points to be.

ihs5_consumption %>% 
  ggplot() +
  geom_point(aes(region, consumption_per_person), 
    # Next arguments change the symbol (point)
             shape = 23,   # Define the symbol
             colour = "red", # Define the outline colour
             fill = "black",  # Define the fill colour 
             size =3) +      # Define the size
  xlab("Region") +
  ylab("Food consumption (g/day)") +
  ggtitle("Food consumption by different regions in Malawi")
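A reference chart of the available symbols is handy when choosing a value for shape. The sketch below draws symbols 0 to 25 with a red outline and a black fill; the grid layout and the object names shapes and p_shapes are illustrative choices. Only symbols 21 to 25 show the fill colour.

```r
library(ggplot2)

# One row per symbol number, laid out on a grid of six columns
shapes <- data.frame(
  shape = 0:25,
  x = (0:25) %% 6,
  y = -((0:25) %/% 6)
)

p_shapes <- ggplot(shapes, aes(x, y)) +
  geom_point(aes(shape = shape), size = 5,
             colour = "red", fill = "black") +                 # fill is used by 21 to 25 only
  geom_text(aes(label = shape), nudge_y = -0.35, size = 3) +   # symbol number under each point
  scale_shape_identity() +                                     # use the numbers as shape codes directly
  theme_void() +
  ggtitle("Point shapes 0 to 25")

p_shapes
```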

Exercise 3.6
  1. Update plots with different symbols, fill colors and symbol size. You can use any symbol and fill color of your choice.
Tip

Note: not all symbol types accept changing fill color.

6.3.3 Plot types

The plots we have created so far are scatter plots. We can, however, use alternative plot types, such as line plots, step plots and lines with points, among others.
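As a minimal sketch of these alternatives, the three plot types can be drawn with geom_line(), geom_step() and a combination of geom_line() and geom_point(). The small data frame below (ten days of consumption values) is invented purely for illustration.

```r
library(ggplot2)

# Hypothetical values, invented for illustration only
demo <- data.frame(
  day = 1:10,
  consumption = c(210, 225, 190, 240, 260, 255, 230, 270, 280, 265)
)

# Line plot: consecutive points joined by straight lines
p_line <- ggplot(demo, aes(day, consumption)) +
  geom_line()

# Step plot: the value is held constant until the next observation
p_step <- ggplot(demo, aes(day, consumption)) +
  geom_step()

# Lines with points: two layers on the same plot
p_both <- ggplot(demo, aes(day, consumption)) +
  geom_line() +
  geom_point()

p_line
p_step
p_both
```

Line and step plots are most useful when the x variable is ordered, such as time.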

Exercise 3.7

Create a plot using the variables consumption_quantity, consumption_per_person.

Exercise 3.8

From your plot in Exercise 3.7, update the plot to differentiate the household size (hh_members) using symbol type and color, fill colors and symbol size. You can use any symbol and fill color of your choice.

Tip

Note: not all symbol types accept changing fill color.

From the dataframe ihs5_consumption, we can show the different household sizes on the same plot using colour=.

# Scatterplot of food consumption per person & hh by hh size
ihs5_consumption %>% 
  ggplot()+
  geom_point(aes(consumption_quantity, consumption_per_person, 
                 colour=hh_members)) + # Define colour by hh size
  xlab("Food consumption per household (g/day)") + # Rename x-axis
  ylab("Food consumption per person (g/day)") + # Rename y-axis
  # Adding a title
  ggtitle("Variation of the food consumption per person & household by different household size")

You can also map the symbol shape to a categorical variable, for instance, region.

# Plotting the food consumption per person & hh by hh size (colour) and region  (shape)
ihs5_consumption %>% 
  ggplot()+
  geom_point(aes(consumption_quantity, consumption_per_person, 
                 shape=region, # Defining shape by region
                 colour=hh_members)) + # Define colour by hh size
  xlab("Food consumption per household (g/day)") + # Rename x-axis
  ylab("Food consumption per person (g/day)") + # Rename y-axis
  # Adding a title
  ggtitle("Variation of the food consumption per person & household by different household size & region")

6.3.4 Adding Legend to plot

A legend makes your plot easier to interpret: without one, it is not clear what the different colors or shapes represent. ggplot2 draws a legend automatically for every variable mapped inside aes(), such as colour and shape in the previous plot. To control where the legend appears, use the function theme() with the legend.position argument.

The position can be given either as a single string ("top", "bottom", "left", "right", or "none" to hide the legend) or as X and Y co-ordinates.

The legend labels are taken from the values of the mapped variables, so they always match the points in the plot. The legend.text argument of theme() controls how those labels look (for example, their size), not what they say.

Let's move the legend of the previous plot, which shows household size (colour) and region (shape), to the bottom.

# Plotting the food consumption per person & hh by hh size (colour) and region  (shape)
# Changing the position of the legend
ihs5_consumption %>% 
  ggplot()+
  geom_point(aes(consumption_quantity, consumption_per_person, 
                 shape=region, # Defining shape by region
                 colour=hh_members)) + # Define colour by hh size
  xlab("Food consumption per household (g/day)") + # Rename x-axis
  ylab("Food consumption per person (g/day)") + # Rename y-axis
  # Adding a title
  ggtitle("Variation of the food consumption per person & household by different household size & region")+
   theme(legend.position = "bottom") # Changing the position of the legend
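
The legend titles and the look of the legend text can be adjusted in the same call. The sketch below uses the built-in mtcars data as a stand-in for the survey data, so the variable names (wt, mpg, hp, cyl) and the legend titles are illustrative assumptions only.

```r
library(ggplot2)

# mtcars ships with R; it stands in for the survey data here
p_legend <- ggplot(mtcars) +
  geom_point(aes(wt, mpg, colour = hp, shape = factor(cyl))) +
  # Legend titles are set per aesthetic
  labs(colour = "Horsepower", shape = "Cylinders") +
  theme(legend.position = "bottom",            # Where the legend sits
        legend.text = element_text(size = 8))  # How the labels look

p_legend
```

Here labs() changes what the legend titles say, while theme() changes where the legend is placed and how its text is drawn.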

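Alternatively, the legend can be placed inside the plotting area by giving its x and y position as a pair of values between 0 and 1, measured from the bottom-left corner of the plot. In ggplot2 3.5.0 and later, this form is deprecated in favour of legend.position = "inside" together with legend.position.inside = c(x, y).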

# Plotting the food consumption per person & hh by hh size (colour) and region  (shape)
# Specifying the location of the legend
ihs5_consumption %>% 
  ggplot()+
  geom_point(aes(consumption_quantity, consumption_per_person, 
                 shape=region, # Defining shape by region
                 colour=hh_members)) + # Define colour by hh size
  xlab("Food consumption per household (g/day)") + # Rename x-axis
  ylab("Food consumption per person (g/day)") + # Rename y-axis
  # Adding a title
  ggtitle("Variation of the food consumption per person & household by different household size & region")+
  # Specifying the position of the legend
  theme(legend.position = c(.1, .6)) 

Exercise 3.9

From your previous plot in Exercise 3.8, add a legend to the updated plot that differentiates the regions using symbol type and color, fill colors and symbol size.

6.3.5 Controlling graphical layout

When we create plots, we may want to present them on the same page for easy comparison. This can be done in two ways: using facetting (e.g., facet_wrap()) or using the plot_grid() function.

Using facet function

There are two facet_ functions within ggplot2. The first one, facet_wrap(), is commonly used when you only need to split your data by one categorical variable. You only need to specify, inside vars(), the variable by which you want to separate your data. When you have more than one categorical variable that you want to split your data by, the function facet_grid() allows more flexibility.

# Plotting the food consumption per person & hh by region
ihs5_consumption %>% 
  ggplot()+
  geom_point(aes(consumption_quantity, consumption_per_person)) +
  # Adding the variable for splitting the data
   facet_wrap(vars(region)) +
  xlab("Food consumption per household (g/day)") + # Rename x-axis
  ylab("Food consumption per person (g/day)") + # Rename y-axis
  # Adding a title
  ggtitle("Variation of the food consumption per person & household by region")
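
The facet_grid() function mentioned above is not demonstrated in this section, so here is a minimal sketch of it. It assumes the built-in mtcars data, with cyl and am standing in for two categorical variables; these names are not part of the survey data.

```r
library(ggplot2)

# mtcars ships with R; cyl and am act as the two categorical variables
p_grid <- ggplot(mtcars) +
  geom_point(aes(wt, mpg)) +
  # Rows are split by number of cylinders, columns by transmission type
  facet_grid(rows = vars(cyl), cols = vars(am))

p_grid
```

Each combination of the two variables gets its own panel, arranged as a table of rows and columns.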

Using plot_grid() function

This function is not part of the ggplot2 package; therefore, the cowplot package has to be installed and loaded before using it (for more information about packages, see Section 4.2).

# Installing the package for the first time
# install.packages("cowplot")

# Loading the library
library(cowplot)

With plot_grid() we can arrange several graphs in a grid using the nrow and ncol arguments. Each is a single number giving the number of rows or columns into which the figure should be split. The stored graphs are then placed across the rows, starting in the top left of the grid.

As an example, let’s use some of the graphs that we have been creating, and plot them together.

First, we are going to plot and save the scatter plot with the faceted region as an object in our environment called graph1.

Tip

Note: If you place parentheses () around your code when saving the object, the object will also be printed.

# Saving the graph1: Food consumption per person & hh by region

graph1 <- ihs5_consumption %>% 
  ggplot()+
  geom_point(aes(consumption_quantity, consumption_per_person)) +
  # Adding the variable for splitting the data
   facet_wrap(vars(region)) +
  xlab("Food consumption per household (g/day)") + # Rename x-axis
  ylab("Food consumption per person (g/day)") + # Rename y-axis
  # Adding a title
  ggtitle("Variation of the food consumption per person & household by region")

graph1

Then, let’s do the same for the box plot and the histogram.

# Saving the graph2: Food consumption per person by region
(graph2 <- ihs5_consumption %>% 
  ggplot() +
  geom_boxplot(aes(consumption_per_person, region), colour = "dark blue") +
  coord_flip() +
  xlab("Food consumption (g/day)") + # x is consumption_per_person
  ylab("") +                         # y is region
  ggtitle("Boxplot of food consumption per person per region"))

# Saving the graph3: Food consumption per person histogram
(graph3 <- ihs5_consumption %>% 
  ggplot() +
  geom_histogram(aes(consumption_per_person), fill = "darkblue") + 
  xlab(expression("g day"^-1)) +
  ggtitle("Histogram of food consumption per person")) 
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Once we have our graphs (objects), let’s plot them together in two rows. We can see that plot_grid() fills the first row with graph1 and graph2, and then the second row with graph3.

# Plotting the three graphs together
cowplot::plot_grid(graph1, graph2, graph3, nrow = 2)

Then, we can add labels to each plot by using the labels argument. If we set it to "AUTO", the plots are automatically labelled A, B, C and so on, in the order in which they appear.

# Plotting the three graphs together with labels
plot_grid(graph1, graph2, graph3, nrow = 2, labels = "AUTO")

We can customise the labels by passing a character vector to the labels argument.

# Plotting the three graphs together with custom labels
cowplot::plot_grid(graph1, graph2, graph3, nrow = 2, 
          labels = c("1)", "2)", "3)"))

We can also change the structure by combining two graphs so that they are treated as one. Let’s save the boxplot and the histogram (graph2 and graph3) as one combined graph.

(top_row <- cowplot::plot_grid(graph2, graph3, ncol = 2, labels = "AUTO"))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Then, we can plot again using the top_row object.

# Re-arranging the plots
cowplot::plot_grid(top_row, graph1, nrow = 2, labels = c("", "C"))

We can see that there are now two plots in the first row (they are considered one graph), and the graph at the bottom (graph1) is spread across the second row.

In addition, we can change the space that each graph occupies with the rel_heights argument. For instance, we would like to decrease the height of the histogram and the boxplot (top_row). Note that, as top_row is one graph, you cannot change the size of the histogram or the boxplot independently here.

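cowplot::plot_grid(top_row, graph1, nrow = 2, labels = c("", "C"), 
          rel_heights = c(0.7, 1)) # One relative height per row

Base R graphics can be arranged in a similar way with the layout() function, which takes a matrix whose values give the order in which the plots fill the cells of the grid.

x <- rnorm(100)

# A 2 x 2 grid, filled row by row
mat <- matrix(1:4, nrow = 2, byrow = TRUE)
layout(mat)

hist(x)
boxplot(x)
qqnorm(x)
plot(x)
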
Exercise 3.10

Using the iris data, generate the following four graphs on the same plot area with equal dimensions:

  1. histogram of Sepal Length,
  2. boxplot of Petal Length,
  3. QQ plot of Petal Width and
  4. a plot of Sepal Length against Petal Length.

Exercise 3.11

Adjust the plot in the previous exercise so that the histogram occupies the whole bottom of the plot area and the other three occupy the top of the plot area in equal dimensions.

6.3.6 Saving/Printing plots

Now that we know how to create graphics, one thing remaining is to print the output. A number of graphics devices are available, including PDF, PNG, JPEG, and bitmap. If we do not specify the device to use, the default device will be opened, and in RStudio this is the Plots tab.

To print a graph to PDF, PNG or JPEG, one must create the device before plotting the graph. This is done by using the functions:

pdf("name.pdf")
png("name.png")
jpeg("name.jpeg") 

The argument for these functions is the desired name of the document in quotation marks, e.g. pdf("myFirstGraphic.pdf"). When this function is run, the graph will not appear in the Plots tab; instead, once the device is closed, a PDF of the graph will be produced in the working directory.

Let us create a histogram of 100 random numbers and save it as a pdf document.

# Create a pdf device
pdf("myFirstGraphic.pdf")

# Create a histogram of 100 random numbers
hist(rnorm(100))

# Close the device
dev.off() 

Remember to close the device with the dev.off() function when you are done; otherwise all your subsequent graphics will be written to the PDF document rather than to another device, e.g. the Plots tab.
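
For graphs made with ggplot2, the ggsave() function is an alternative that opens and closes the device for you. The sketch below is only an illustration: it uses the built-in mtcars data, and the object name p_save, the file name and the plot dimensions are assumptions chosen for the example.

```r
library(ggplot2)

# A small ggplot object to save (mtcars ships with R)
p_save <- ggplot(mtcars) +
  geom_histogram(aes(mpg), bins = 10, fill = "darkblue")

# Hypothetical file name inside a temporary folder
out_file <- file.path(tempdir(), "mpg_histogram.pdf")

# ggsave() opens the device, draws the plot and closes the device;
# the file type is taken from the file extension
ggsave(filename = out_file, plot = p_save, width = 6, height = 4)

# Check that the file has been written
file.exists(out_file)
```

The width and height are given in inches by default, and changing the extension to .png or .jpeg would select a different device.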

Exercise 3.12

Print the plot you generated in EXERCISE to a PDF, a PNG and a JPEG file, giving each an appropriate name. Remember to close the device.