9 Food composition harmonisation and food matching

9.1 Introduction

9.1.1 Food matching

After the FCTs are standardised and harmonised and the food list (i.e., food reported as consumed in the Integrated Household Survey, Wave 5, 2019-2020), we can proceed to match them together. To do so, we are using a standardised list of foods that we called “food dictionary”. As depicted in the ?fig-matching.

9.2 Harmonising the Food Composition Tables

When all the FCTs are standardised, we can use them all together, which is particularly useful when food items and/or nutrient values are missing in the main FCT, then a similar food item could be found in another FCT.

9.3 Getting the Nutrition Tools

NutritionTools is an R package of functions to help with a wide range of calculations and processes that commonly occur when working with nutrition datasets. More information can be found here.

if (!require("devtools")) {
  install.packages("devtools")
}
devtools::install_github("TomCodd/NutritionTools")

9.3.1 Food composition functions

There are some useful functions that can be downloaded here, and are currently being checked to be added to the NutritionTool package. Those can be loaded into the R environment by running the functions.R script.

# We also need to import some custom functions in another script:
source(here::here("functions.R")) # Loading nutrition functions

The function source() will run the called script.

9.4 Getting the FC Standardarised dataset

Download the following folders: KE18 & WA19 from this GitHub repository. Eventually, you could download all of them an create a unique FC library.

Each folder has one script with the FCT id followed by *“_FCT_FAO_Tags”*, and a README file.

# Finding the list of FCTs/FCDBs script in the data folder

list.files("data/", pattern = "*_FCT_FAO_Tags", recursive=FALSE, # so it is not looking into the subfolders
           full.names=TRUE)

The next block of code will download and standardise each individual FCT. Then, the FCTs are merged into a common FC library that can be used for food matching.

# Getting the list of FCTs/FCDBs script in the data folder
source_fct_name <- list.files("data/", pattern = "*_FCT_FAO_Tags.R", recursive=FALSE, # so it is not looking into the subfolders
           full.names=TRUE)

for(i in source_fct_name){
  source(here::here(i))
}

9.4.1 Merging the data

Checking the FCT we have in our data folder

# finding all the cleaned FCTs/FCDBs from the output folder
list.files("data/", pattern = "*_FCT_FAO_Tags", recursive=FALSE, # so it is not looking into the subfolders.
           full.names=TRUE)

Now, we can merge all the FCTs/FCBDs into one file. Note that this is posible because they all have been standardised previously.

# finding all the cleaned FCTs/FCDBs from the output folder
list.files("data/", pattern = "*_FCT_FAO_Tags", recursive=FALSE, # so it is not taking the fcts in the folder
           full.names=TRUE)%>% 
  map_df(~read_csv(., col_types = cols(.default = "c"), 
                   locale = locale(encoding = "Latin1")))

We can check that all FCTs that we are expected are there, by using the source_fct variable that is generated within the standardisation scripts, and the number of foods in each one.

#checking that all FCTs are loaded and 
# counting No. of items 

data.df %>%  
  count(source_fct)

9.5 Food matching

First, we need the list of unique foods reported as consumed. In HCES dataset, this is frequently presented as set of standard list of foods. We are also interested in knowing the frequency with each food is reported, and hence their impact and importance for subsequent analysis.

# Read, subset and rename the data
ihs5_consumption <-
    read_dta(here::here("data", "mwi-ihs5-sample-data", "HH_MOD_G1_vMAPS.dta")) |>
    select(
        case_id,
        HHID,
        hh_g01,
        hh_g01_oth,
        hh_g02,
        hh_g03a,
        hh_g03b,
        hh_g03b_label,
        hh_g03b_oth,
        hh_g03c,
        hh_g03c_1
    ) %>%
    rename(
        consumedYN = hh_g01,
        food_item = hh_g02,
        food_item_other = hh_g01_oth,
        consumption_quantity = hh_g03a,
        consumption_unit = hh_g03b,
        consumption_unit_label = hh_g03b_label,
        consumption_unit_oth = hh_g03b_oth,
        consumption_subunit_1 = hh_g03c,
        consumption_subunit_2 = hh_g03c_1
    )


# Getting the food item list

ihs5_consumption <- hcesNutR::create_dta_labels(ihs5_consumption)


# Getting the food list & frequency of HH

food_list <- ihs5_consumption %>%
  count(food_item_code, food_item_name)

Then, we will match those food items with their corresponding food dictionary code(s). There are instances were the matching will be one food reported to many foods in the FCT. For example, wheat flour will be matched to wheat flour refined, and wheat flour wholemeal.

# Food dictionary
dictionary <- read.csv("https://raw.github.com/LuciaSegovia/fct/repro/metadata/MAPS_food-dictionary_v3.0.3.csv")

Then, the unique food dictionary codes (ID_3) will be used to match the food in the food list to the FCTs.

# Matching 

ihs5 <- read.csv(here::here("data", "fct_ihs5_v2.2.csv"))

Activity:

Thinking about food matches between HCES food list and food composition.

Do you think that the food matches selected represents well what the food reported as consumed in the survey?
If not, what other foods would you add/remove/change to?
For the foods without matches, can you think of what food items you would use?
Could you identify any food matches in the Food Composition Library that will be a good match for those foods that have no matches?
What nutrients are important for each of the foods selected?

9.6 Dealing with missing values

9.6.1 Combining Tagnames to generate variables

9.6.2 Re-calculating variables

Some varibles need to be recalculated, as part of the harmonisation process and also for quality assurance. One case is Energy (kcal/kJ) which is calculated from the proximate: Protein, Fat, available Carbohydrates, Fibre and Alcohol. Hence, we need to make sure that all these variables are reported and are completed. For instance, if there were missing values in Fat content, that the combination of Tagnames have been performed. In addition, if we are using Carbohydrate by difference, then we should re-calculate that variable as well.

# Re-calculate variables:
data.df %>% 
  # Calculate available Carbohydrates, by difference
  CHOAVLDFg_std_creator() %>%  
  # Calculate Energy (kcal)
  ENERCKcal_standardised() %>% 
  # calculate Energy (kJ)
  ENERCKj_standardised()

Another similar example is Vitamin A (RE/RAE), which is calculated from retinol and the carotenoids (i.e., Beta-carotene equivalents). Similarly, we need to check that those two variables. Note, that beta-carotene eq. is also re-calculated, when possible, from the carotenids and their conversion factors. Hence, first we should check that beta-carotene, alpha-carotene, and beta-crypoxanthin are available.

# Re-calculate variables:
data.df %>% 
# Recalculate beta-carotene eq.
    CARTBEQmcg_std_creator() %>% 
# Recalculate Vitamin A (RAE)
   VITA_RAEmcg_std_creator() %>%  
# Recalculate Vitamin A (RE)
    VITAmcg_std_creator()

9.6.3 Further Readings

Greenfield, Heather, and D. A. T. Southgate. Food Composition Data: Production, Management, and Use. Rome: FAO, 2003.
FAO/INFOODS (2012). FAO/INFOODS Guidelines for Checking Food Composition Data Prior to the Publication of a User Table/Database-Version 1.0. FAO, Rome’. Accessed 22 January 2022. https://www.fao.org/3/ap810e/ap810e.pdf.