# Install devtools if needed, then install NutritionTools from GitHub
if (!require("devtools")) {
  install.packages("devtools")
}
devtools::install_github("TomCodd/NutritionTools")
9 Food composition harmonisation and food matching
9.1 Introduction
9.1.1 Food matching
Once the FCTs are standardised and harmonised, and the food list (i.e., the foods reported as consumed in the Integrated Household Survey, Wave 5, 2019-2020) is available, we can proceed to match them together. To do so, we use a standardised list of foods that we call the “food dictionary”, as depicted in ?fig-matching.
9.2 Harmonising the Food Composition Tables
When all the FCTs are standardised, we can use them together. This is particularly useful when food items and/or nutrient values are missing in the main FCT, because a similar food item may be found in another FCT.
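The gap-filling idea can be sketched with two toy tables. This is only an illustration: the key column `food_code`, the table names and all values are made up for the example and do not come from the real FCTs.

```r
library(dplyr)

# Toy data: column names and values are illustrative assumptions only
main_fct <- tibble(
  food_code   = c("F001", "F002", "F003"),
  VITA_RAEmcg = c(12, NA, 45)
)
other_fct <- tibble(
  food_code   = c("F002", "F003"),
  VITA_RAEmcg = c(30, 50)
)

# Keep the value from the main FCT when it exists;
# otherwise borrow the value of the matching food in the other FCT
filled <- main_fct %>%
  left_join(other_fct, by = "food_code", suffix = c("", "_other")) %>%
  mutate(VITA_RAEmcg = coalesce(VITA_RAEmcg, VITA_RAEmcg_other)) %>%
  select(-VITA_RAEmcg_other)

filled
```

Only the missing value for `F002` is replaced; values already present in the main table are left untouched.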
9.3 Getting the Nutrition Tools
NutritionTools is an R package of functions to help with a wide range of calculations and processes that commonly occur when working with nutrition datasets. More information can be found here.
9.3.1 Food composition functions
There are some useful functions that can be downloaded here, and that are currently being checked for inclusion in the NutritionTools package. They can be loaded into the R environment by running the functions.R script.
# We also need to import some custom functions in another script:
source(here::here("functions.R")) # Loading nutrition functions
The function source() will run the called script.
9.4 Getting the FC Standardised dataset
Download the following folders: KE18 & WA19 from this GitHub repository. Eventually, you could download all of them and create a single FC library.
Each folder has one script with the FCT id followed by *“_FCT_FAO_Tags”*, and a README file.
# Finding the list of FCTs/FCDBs script in the data folder
list.files("data/", pattern = "*_FCT_FAO_Tags", recursive=FALSE, # so it is not looking into the subfolders
full.names=TRUE)
The next block of code will download and standardise each individual FCT. Then, the FCTs are merged into a common FC library that can be used for food matching.
# Getting the list of FCTs/FCDBs script in the data folder
source_fct_name <- list.files("data/", pattern = "*_FCT_FAO_Tags.R", recursive = FALSE, # so it is not looking into the subfolders
                              full.names = TRUE)
# Running each standardisation script in turn
for (i in source_fct_name) {
  source(here::here(i))
}
9.4.1 Merging the data
Checking the FCTs we have in our data folder:
# finding all the cleaned FCTs/FCDBs from the output folder
list.files("data/", pattern = "*_FCT_FAO_Tags", recursive=FALSE, # so it is not looking into the subfolders.
full.names=TRUE)
Now, we can merge all the FCTs/FCDBs into one file. Note that this is possible because they have all been standardised previously.
# finding all the cleaned FCTs/FCDBs from the output folder
# (the result is stored as data.df, the object used in the following steps)
data.df <- list.files("data/", pattern = "*_FCT_FAO_Tags", recursive = FALSE, # so it is not taking the fcts in the folder
                      full.names = TRUE) %>%
  map_df(~read_csv(., col_types = cols(.default = "c"),
                   locale = locale(encoding = "Latin1")))
We can check that all the FCTs we expect are there, and the number of foods in each one, by using the source_fct variable that is generated within the standardisation scripts.
# checking that all FCTs are loaded and
# counting No. of items
data.df %>%
  count(source_fct)
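The same check can be made explicit by comparing against a list of expected FCTs. The sketch below uses a small stand-in table rather than the real data.df, and it assumes that source_fct holds the FCT ids (e.g., KE18 and WA19); both the table and that assumption are illustrative.

```r
# Toy stand-in for the merged FC library (illustrative values only)
toy_library <- data.frame(
  source_fct = c("KE18", "KE18", "WA19"),
  food_desc  = c("Maize flour", "Beans, dry", "Rice, white")
)

# FCTs we expect to find (assumed ids)
expected_fcts <- c("KE18", "WA19")

# Any expected FCT that is absent from the library
missing_fcts <- setdiff(expected_fcts, unique(toy_library$source_fct))

# Number of food items per FCT
items_per_fct <- table(toy_library$source_fct)

missing_fcts
items_per_fct
```

An empty `missing_fcts` means every expected FCT was loaded.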
9.5 Food matching
First, we need the list of unique foods reported as consumed. In HCES datasets, this is frequently presented as a standard list of foods. We are also interested in the frequency with which each food is reported, and hence its impact and importance for subsequent analysis.
# Read, subset and rename the data
ihs5_consumption <-
  read_dta(here::here("data", "mwi-ihs5-sample-data", "HH_MOD_G1_vMAPS.dta")) |>
  select(
    case_id,
    HHID,
    hh_g01,
    hh_g01_oth,
    hh_g02,
    hh_g03a,
    hh_g03b,
    hh_g03b_label,
    hh_g03b_oth,
    hh_g03c,
    hh_g03c_1
  ) %>%
  rename(
    consumedYN = hh_g01,
    food_item = hh_g02,
    food_item_other = hh_g01_oth,
    consumption_quantity = hh_g03a,
    consumption_unit = hh_g03b,
    consumption_unit_label = hh_g03b_label,
    consumption_unit_oth = hh_g03b_oth,
    consumption_subunit_1 = hh_g03c,
    consumption_subunit_2 = hh_g03c_1
  )
# Getting the food item list
ihs5_consumption <- hcesNutR::create_dta_labels(ihs5_consumption)
# Getting the food list & frequency of HH
food_list <- ihs5_consumption %>%
  count(food_item_code, food_item_name)
Then, we will match those food items with their corresponding food dictionary code(s). There are instances where one reported food will be matched to several foods in the FCT. For example, wheat flour will be matched to both wheat flour refined and wheat flour wholemeal.
# Food dictionary
dictionary <- read.csv("https://raw.github.com/LuciaSegovia/fct/repro/metadata/MAPS_food-dictionary_v3.0.3.csv")
Then, the unique food dictionary codes (ID_3) will be used to match the foods in the food list to the FCTs.
# Matching
ihs5 <- read.csv(here::here("data", "fct_ihs5_v2.2.csv"))
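To make the one-to-many matching through ID_3 more concrete, here is a minimal sketch on toy data. The dictionary codes, the column `food_desc`, the energy values and the table names are all made up for illustration. Averaging the matched items is just one possible way to obtain a single value per reported food, not necessarily the approach used to build the matched file above.

```r
library(dplyr)

# Toy food list already linked to dictionary codes (made-up codes):
# "Wheat flour" is linked to two dictionary entries
food_match <- tibble(
  food_item_code = c(101, 101, 102),
  food_item_name = c("Wheat flour", "Wheat flour", "Rice"),
  ID_3           = c("A.01", "A.02", "B.01")
)

# Toy FC library keyed by the same dictionary code (illustrative values)
fc_library <- tibble(
  ID_3      = c("A.01", "A.02", "B.01"),
  food_desc = c("Wheat flour, refined", "Wheat flour, wholemeal", "Rice, white, raw"),
  ENERCkcal = c(350, 340, 360)
)

# Match each reported food to its FCT item(s) via ID_3
matched <- food_match %>%
  left_join(fc_library, by = "ID_3")

# One row per reported food: number of matches and mean energy
food_level <- matched %>%
  group_by(food_item_code, food_item_name) %>%
  summarise(n_matches = n(), ENERCkcal = mean(ENERCkcal), .groups = "drop")

food_level
```

Keeping `n_matches` in the output makes it easy to spot which reported foods rely on more than one FCT item.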
9.6 Dealing with missing values
9.6.1 Combining Tagnames to generate variables
9.6.2 Re-calculating variables
Some variables need to be recalculated, as part of the harmonisation process and also for quality assurance. One case is Energy (kcal/kJ), which is calculated from the proximates: Protein, Fat, available Carbohydrates, Fibre and Alcohol. Hence, we need to make sure that all these variables are reported and complete. For instance, if there were missing values in Fat content, we should check that the combination of Tagnames has been performed. In addition, if we are using Carbohydrate by difference, then we should re-calculate that variable as well.
# Re-calculate variables:
data.df %>%
  # Calculate available Carbohydrates, by difference
  CHOAVLDFg_std_creator() %>%
  # Calculate Energy (kcal)
  ENERCKcal_standardised() %>%
  # calculate Energy (kJ)
  ENERCKj_standardised()
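For reference, the arithmetic behind these re-calculations can be sketched as below. This assumes the general energy conversion factors commonly given in the FAO/INFOODS guidelines (kcal: 4, 9, 4, 2 and 7; kJ: 17, 37, 17, 8 and 29 per gram of protein, fat, available carbohydrates, fibre and alcohol). The helper functions and the food values are hypothetical, written only for this example, and the package functions above may differ in their details (e.g., handling of missing values).

```r
# Hypothetical helpers; all inputs are in g per 100 g of edible portion

# Available carbohydrates by difference
cho_by_difference <- function(water, protein, fat, fibre, ash, alcohol = 0) {
  100 - (water + protein + fat + fibre + ash + alcohol)
}

# Energy in kcal, assuming the general factors 4, 9, 4, 2, 7
energy_kcal <- function(protein, fat, cho, fibre, alcohol = 0) {
  4 * protein + 9 * fat + 4 * cho + 2 * fibre + 7 * alcohol
}

# Energy in kJ, assuming the general factors 17, 37, 17, 8, 29
energy_kj <- function(protein, fat, cho, fibre, alcohol = 0) {
  17 * protein + 37 * fat + 17 * cho + 8 * fibre + 29 * alcohol
}

# Made-up food: water 12, protein 10, fat 2, fibre 3, ash 1 (g/100 g)
cho  <- cho_by_difference(water = 12, protein = 10, fat = 2, fibre = 3, ash = 1)
kcal <- energy_kcal(protein = 10, fat = 2, cho = cho, fibre = 3)
kj   <- energy_kj(protein = 10, fat = 2, cho = cho, fibre = 3)

c(CHOAVLDFg = cho, ENERCkcal = kcal, ENERCkJ = kj)
```

Because carbohydrate by difference depends on all the other proximates, a missing value in any of them propagates to the energy values, which is why completeness is checked first.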
Another similar example is Vitamin A (RE/RAE), which is calculated from retinol and the carotenoids (i.e., beta-carotene equivalents). Similarly, we need to check that those two variables are reported and complete. Note that beta-carotene eq. is also re-calculated, when possible, from the carotenoids and their conversion factors. Hence, we should first check that beta-carotene, alpha-carotene, and beta-cryptoxanthin are available.
# Re-calculate variables:
data.df %>%
  # Recalculate beta-carotene eq.
  CARTBEQmcg_std_creator() %>%
  # Recalculate Vitamin A (RAE)
  VITA_RAEmcg_std_creator() %>%
  # Recalculate Vitamin A (RE)
  VITAmcg_std_creator()
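The underlying arithmetic can be illustrated as follows, assuming the commonly used conversions: beta-carotene eq. = beta-carotene + 0.5 × alpha-carotene + 0.5 × beta-cryptoxanthin; RAE = retinol + beta-carotene eq. / 12; RE = retinol + beta-carotene eq. / 6. The helper functions and values below are hypothetical and only meant as a sketch. They are not the package implementation, which may apply additional checks.

```r
# Hypothetical helpers; all inputs are in mcg per 100 g of edible portion

# Beta-carotene equivalents from the individual carotenoids
beta_carotene_eq <- function(beta_carotene, alpha_carotene, beta_cryptoxanthin) {
  beta_carotene + 0.5 * alpha_carotene + 0.5 * beta_cryptoxanthin
}

# Vitamin A as retinol activity equivalents (RAE)
vitamin_a_rae <- function(retinol, cartbeq) {
  retinol + cartbeq / 12
}

# Vitamin A as retinol equivalents (RE)
vitamin_a_re <- function(retinol, cartbeq) {
  retinol + cartbeq / 6
}

# Made-up food: beta-carotene 600, alpha-carotene 120,
# beta-cryptoxanthin 120, retinol 10 (mcg/100 g)
cartbeq <- beta_carotene_eq(600, 120, 120)
rae     <- vitamin_a_rae(retinol = 10, cartbeq = cartbeq)
re      <- vitamin_a_re(retinol = 10, cartbeq = cartbeq)

c(CARTBEQmcg = cartbeq, VITA_RAEmcg = rae, VITAmcg = re)
```

The order matters: beta-carotene eq. is computed first because both vitamin A values depend on it.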
9.6.3 Further Readings
- Greenfield, Heather, and D. A. T. Southgate. Food Composition Data: Production, Management, and Use. Rome: FAO, 2003.
- FAO/INFOODS (2012). FAO/INFOODS Guidelines for Checking Food Composition Data Prior to the Publication of a User Table/Database-Version 1.0. FAO, Rome. Accessed 22 January 2022. https://www.fao.org/3/ap810e/ap810e.pdf.