8 Food Composition Table & Databases: Standardisation

8.1 Introduction

8.1.1 Selecting food composition data

When selecting the food composition table or database (FCT) that will be used, it is good to reflect on the following questions:

Relevancy for the study/context (e.g., is that FCT/FCBD geographically and culturally close to our survey scope?).
FCT availability & missing values (e.g., are relevant foods and nutrients reported?).
Data quality and reporting (e.g., what are the method of analysis and metadata available?).

8.1.2 Objective

This document provide, together with the template document, the steps and description for cleaning and standardising FCTs from diverse sources. More details about the cleaned data that can be found in the repository is documented in this folder (documentation).

For easy navigation and use of this script it is recommended to use Rstudio. In RStudio please click the “Show Document Outline” button to the right of the source button, at the top right of this window. This will allow for easier navigation of the script.

8.1.3 Environment Prep

First we need to check what (Section 4.2) are installed. If you have run this template before in this RStudio project and are sure these packages are already installed, you can comment out (put a hash at the start of) line 20, and skip it.

# Run this to clean the environment
rm(list = ls())

# Loading libraries

library(readxl) # reading and writing excel files
library(stringr) # character string handling
library(dplyr) # cleaning data
library(here) # file management

8.2 Obtaining the raw (FCT) file

8.2.1 Data license

Before using any dataset, we recommend to check licensing conditions & record the data source, you can use the README template.

8.2.2 Downloading the data

If the data is publicly available online, usually you only need to run the code below to obtain the raw files. Remember you only need to do it the first time! Then, the data will be stored in the folder of your choice (see below).

For instance, many raw files can be found provided by the FAO here, in various formats.

Once the link to the data is found, check what file type it is, and paste the direct file link to replace the fill-in value below.

8.2.3 File names conventions

We advise to use the ISO code (2 digits) (see ISO 3166 2-alpha code for further information) of the country or the region of the FCT scope, plus the two last digits of the year of publication to name, both the folder which will contain the data and the scripts related to the FCT. For instance, Western Africa FCT, 2019 will be coded as WA19. This will help with the interoperability, reusability and findability of the data. Also, to streamline the work in the future. That name convention will be used also as the identifier of the FCT.

Note that you need to create the folders to store the FCT.

f <- "https://www.fao.org/fileadmin/user_upload/faoweb/2020/WAFCT_2019.xlsx"
 
 download.file(f, 
             destfile = here::here("data", # data folder
                                   'WA19', #FCT folder
                                   #FCT file
                                   "WAFCT_2019.xlsx"),  
             method="wininet", #use "curl" for OS X / Linux, "wininet" for Windows
             mode="wb")

If using an RStudio project, and you put the .R file and the data file in the same folder as the RStudio project or within a subfolder, files and folders are much easier to navigate as your project/here::here location automatically moves to the main project folder.

8.2.4 Getting your file into R

If you are using an RStudio project but used a different download method, or already have the file you want to process on your computer, or are using base R we can still use the here::here() function, however we will have to find the file first.

The best practice is to put the file in the same folder as this script, or in a folder within the project. If this is done, then use here::here() to find your current working directory, and then navigate to the file’s folder. More information about the here package can be found here.

# Run this script to see where is your directory
here::here()

In order to navigate R to the file, you have to include each subfolder between the here::here() location and the file itself. So for our file structure, we have the ‘data’ folder, the ‘FCT’ folder and the FCT file).

Find your file in your project, and then direct here::here() to it.

# This identifies the file and file path, and saves it as a variable
FCT_file_location <- here::here('data','WA19', "WAFCT_2019.xlsx")

8.3 Importing the data (loading the data)

8.3.1 Importing Files

After having found the file on your system that we want to import it. FCT files come in many different forms - the most common being “.xlsx” files and “.csv” files. Methods to import both of these file types will be covered - please navigate to the relevant subsection.

During import, a identifier for the FCT is created and added to the table. Please replace ‘WA19’ from the next code chunck with the FCT id., comprised of the countries ISO 3166 2-alpha code, and the year the FCT was produced (e.g. for the Western Africa FCT from 2019, the reference would be ‘MWA9’).

This should be the same as the folder name explained in (section 1.3)[link-to-section].

# This is an example of the name 
FCT_id <- 'WA19' # Change two first letter for your ISO 2 code & the two digits for the last two digits of the year of publication.

If you used the download method above Section 8.2.2 then we will see the same location as specified there to specify the file. We can simply use the object FCT_file_location, or copy the contents inside the here::here() and use it to fill the here::here() in the line of code below.

Importing .xlsx files

For the spreadsheet (excel-type of files), first, you need to check what information is provided and which of the sheet is providing the FC data.

# Checking the sheets

readxl::excel_sheets(FCT_file_location)

data.df <- readxl::read_excel(FCT_file_location, #The file location, as                            identified in section 2.1
                              sheet = 5  # Change to the excel sheet where                              the FCT is stored in the excel file
                              ) %>%  
  mutate(source_fct = FCT_id)  #Creates the source_fct column and fills with                 a id for this FCT, as filled in in section 2.2.

Importing .csv files

data.df <- read.csv2(FCT_file_location, #The file location, as identified in section 2.1
                     sep = ",") %>%  # Replace w/ other symbol if needed
  mutate(source_fct = FCT_id) #Creates the source_fct column and fills with a id for this FCT, as filled in in section 2.2.

Once imported, it is important to check the data.frame created from the csv, by using head(data.df) or clicking on its entry in the Environment panel of RStudio (This second option is not advised with very large files, however, as it can be slow).

If the data shown by doing this has all its columns combined, with a symbol in-between, then that symbol (e.g. ‘;’) is the separator for that csv. Replace comma in the sep = "," line from the code block above with the new symbol, and run the entire block again.

Visually checking the data

# Checking the dataframe
head(data.df)

Checking the loaded data

Question

How many rows & columns have the data?

You can use the function dim() to answer to check the number of rows and column.

Answer

dim(data.df) # rows & columns

Other useful functions to evaluate the structure of the data are:

# Structure (variable names, class, etc.)
str(data.df)

# Checking the last rows and columns
tail(data.df)

Using View()

The function View() can be use for opening the dataframe in a tab. However, if the dataset is very very big, it may crash the R session.

After checking that the correct FCT file have loaded the, then proceed. If not, find the correct file and import it instead.

8.4 Cleaning (tidying) and standardising the data

8.4.1 Formatting FCT into a tabular format

Trimming dataframe rows

Running this will trim down the table to only include the row numbers between x and y - replace x and y with your desired values. If you wanted to include multiple row ranges, that is also possible - use comments to differentiate between different row ranges and individual rows. e.g. if you wanted to include rows a:b, row c, row e, and rows g:x, then the code would be slice(a:b, c, e, g:x).

data.df %>% slice(1:5) %>% knitr::kable()

Trimming dataframe columns

If you only wish to include certain columns/nutrients, then you might wish to remove the unnecessary columns to make the dataframe easier to read and manage. This can be done through 2 methods; either by selecting the names of the columns you want to keep, or by selecting the names of the columns you want to remove.

Keep specified columns only

This method requires creating a list of column names you want to keep - for #example the line below would select the columns ‘Energy_kcal’, ‘Fatg’, ‘Protein_g’, but nothing else. If you wish to trim the columns this way, replace the items in the first line with the column names you want to keep, then run the code block below.

# Storing the variables you want to keep
columns_to_keep <- c('Scientific name', 'Energy\r\n(kJ)')

# Selecting the variables
data.df %>% select(columns_to_keep) %>% 
  head(5) %>% 
  knitr::kable()

Remove specified columns, keep all others

Sometimes it is easier to list the columns you want to remove, rather than the ones you want to keep. The code block below identifies the columns to be removed (‘VitB12_mcg’ and ‘Calcium_mg’ in the example), and then removes them. If you wish to trim the columns this way, replace the items in the first line with the column names you want to remove, then run the code block below.

This works in a similar way to the codeblock in section 3.3.1, however by putting an exclamation mark (!) before the list of columns, it inverts the selection - instead of instructing R to keep only the listed columns (as with the codeblock above), it instructs R to keep all columns but the listed ones.

# Selecting the variables that you don't want to keep
columns_to_remove <- c('Food name in French', 'Sum of proximate components\r\n(g)') 

data.df %>% select(!columns_to_remove) %>% 
  head(5) %>% 
  knitr::kable()

8.4.2 Creating food groups variable

Some food composition tables reported food groups that were placed as the first row of each category, however that it is not a data structure that can be used, as we need one column per variable. Hence, the food group names are extracted from the rows, and are allocated as a new attribute of each food (e.g., fish and fishery products to catfish). The food groups are stored in a new column (food_group).

This process requires multiple steps, each covered in their own subsections below: Extracting food group names, Creating the variable, and checking changes in the structure.

Extracting food group names

#Creates a list of the food groups using their unique row structure in the table to identify them

fgroup <- data.df %>% 
  filter(is.na(`Food name in English`), !is.na(Code)) %>%
  pull(Code) %>%
  stringr::str_split_fixed( '/', n = 2) %>% 
  as_tibble() %>%
  pull(V1) 

group.id <-  unique(str_extract(data.df$Code, "^[:digit:]{2}\\_"))[-1]

9 Creating the food_group variable in the FCT

# Removes any rows without a food description entry (the food group name rows, and a row that have already been used for naming)

data.df <- data.df %>% #Identifies the food group number from the fdc_id, and applies the correct food_group from the fgroup list to the food_group column 
  mutate(food_group = ifelse(grepl("01_", Code), fgroup[1],
                      ifelse(grepl("02_", Code), fgroup[2],
                      ifelse(grepl("03_", Code), fgroup[3],
                      ifelse(grepl("04_", Code), fgroup[4],
                      ifelse(grepl("05_", Code), fgroup[5],
                      ifelse(grepl("06_", Code), fgroup[6],
                      ifelse(grepl("07_", Code), fgroup[7],
                      ifelse(grepl("08_", Code), fgroup[8],
                      ifelse(grepl("09_", Code), fgroup[9], 
                      ifelse(grepl("10_", Code), fgroup[10], 
                      ifelse(grepl("11_", Code), fgroup[11], 
                      ifelse(grepl("12_", Code), fgroup[12],
                      ifelse(grepl("13_", Code), fgroup[13],
                      ifelse(grepl("14_", Code), fgroup[14],                          'NA')))))))))))))))

9.0.1 Diving combined variables into two (or more) columns

In some cases, to minimise the number of missing values, two similar food components are combined into one column/ variable. For example, when Beta-carotene and Beta-Carotene Eq. were reported in the same column and identified using brackets ([]) around the component values (CARTEB or [CARTEBEQ]). However, because we are aiming to have one variable per column, we need to divide the combined variables into two (or more) columns, as such the Beta-carote variable was separated into two independent columns (CARTEBmg, CARTBEQmg). Note that when we are separating these two food components into the new variable, there are several things that we need to check: we need to check whether they are present in the original FCT, otherwise we may be overwriting values.

The variable is not present, then create a new column.
The variable is present: only overwrite if the value is missing in the original column.
Do not remove the brackets from the original variable. As we will add information (metadata) about the quality of the value after removing the brackets.

9.0.2 Renaming variable names: Food components definition and re-naming

Each FCT has its own variable names, including for similar food components. Some FCTs included information related to the FAO/ INFOODS food component identifiers (Tagnames) while other did not. In order to merge all data from various FCTs we need to harmonise the names of all the variables and particularly the food components of interest. To do so, we evaluated and renamed them with the most appropriate Tagname. Other variables (e.g., food id, food name or food description) were renamed to a common variable name, for instance, fdc_id, food_desc.

Question

Are the food component variable names using Tagnames & units, i.e., [Variable][unit] (NAmg)?

If yes, use that information to rename food components

# Checking current names
names(data.df)

If not, do they provide Tagnames information?

# Checking for Tagnames
data.df %>% 
  head(5) %>% 
  knitr::kable()

In this case we can use the Tagname information on the two first rows to rename our variables.

# Automatic renaming 

for( i in 8:62){ #Loops through each column between column 8 and 64
  first_row <- toString(data.df[1, i]) #Takes the first row for that column and assigns it to a variable
  second_row <- toString(data.df[2, i]) #Takes the second row for that column and assigns it to a variable
  split_string <- str_split(first_row, "\\(") #Splits the first row around "(", assigning the two resulting strings to a variable
  units_int <- gsub("\\*|\\(|\\)", "", split_string[[1]][length(split_string[[1]])]) #Separates the units out from the split_string (everything after the last open bracket)
  colnames(data.df)[i] <- paste0(second_row, units_int) #The column name is replaced with row 1 and the units from row 2
}

# Checking new names

names(data.df)

If information is not provided the manual renaming of the food components would be necessary and hence, the identification of each food component to its Tagname.

The identification of food components

Information on the food components and their description should be sought in for FCT. We advise to use the (Tagnames). Some minor changes in the Tagnames are introduced to be compliant with R conventions. E.g., removing spaces in variable names, changing symbols to characters (e.g., µg to mcg), or standardising the name formatting from using underscores and/or parenthesis to using only underscores. Also, changing dashes (-) to underscores (_). Note that within the Tagnames, the dash is used to denote that the method for obtaining that (component) value is unknown. This is important for the quality assessment of the data.

Also, we also assumed that all the variables labelled as “standardised” were combined or recalculated variables. In ?tbl-tagnames, we provided a list of the most relevant food information and components and their standard variable names, including the Tagnames. It also provides information about the variable name reference (i.e., FAO/INFOODS, 2012) together with the variable source (as it was compiled from the FCT or should be calculated), and the type of variable (e.g., numeric, etc).

read.csv(here::here("data", "fct_variables_standards.csv")) |>
  stringr::str_squish() |> knitr::kable()

Warning in stri_replace_all_regex(string, pattern,
fix_replacement(replacement), : argument is not an atomic vector; coercing

?(caption)

x
c(“fdc_id”, “food_desc”, “food_group”, “scientific_name”, “source_fct”, “nutrient_data_source”, “Edible_factor_in_FCT”, “Edible_desc”, “specific_gravity”, “SOPg”, “SOPg_std”, “ASHg”, “ASHg_std”, “ENERCkJ”, “ENERCkJ_standardised”, “ENERCkcal”, “ENERCkcal_standardised”, “WATERg”, “PROCNTg”, “NTg”, “XN”, “FATg”, “FAT_g”, “FATCEg”, “FATg_standardised”, “CHOAVLDFg”, “CHOAVLg”, “CHOCSMg”, “CHOAVLMg”, “CHOAVLDFg_std”, “FIBTGg”, “NSPg”, “FIBCg”, “ALCg”, “ALCg_100mL”, “SUGARg”, “FASATg”, “FAMSg”, “FAPUg”, “FATRNg”, “F22D6N3g”, “F20D5N3g”, “CHOLEmg”, “CHOL_mg”, “RETOLmcg”, “VITAmcg_std”, “VITAmcg”, “VITA_RAEmcg”, “VITA_RAEmcg_std”, “CARTBEQmcg”, “CARTBEQmcg_std”, “CARTAmcg”, “CARTBmcg”, “CRYPXBmcg”, “VITEmg”, “TOCPHAmg”, “TOCPHBmg”, “TOCPHGmg”, “TOCPHDmg”, “TOCTRAmg”, “TOCTRBmg”, “TOCTRGmg”, “THIAmg”, “THIAHCLmg”, “RIBFmg”, “VITB6Amg”, “VITB6Cmg”, “VITB6_mg”, “VITB6_mg_standardised”, “FOLDFEmcg”, “FOLmcg”, “FOLACmcg”, “FOLFDmcg”, “FOLSUMmcg”, “FOL_mcg”, “NIAEQmg”, “NIAmg”, “NIATRPmg”, “TRPmg”, “VITB12mcg”, “VITCmg”, “ASCLmg”, “VITDEQmcg”, “VITDmcg”, “CHOCALmcg”, “ERGCALmcg”, “CHOCALOHmcg”, “ERGCALOHmcg”, “CAmg”, “MGmg”, “MNmg”, “Pmg”, “FEmg”, “NAmg”, “Kmg”, “CUmg”, “ZNmg”, “SEmcg”, “IDmcg”)
c(” Food identifier as per original source (FCT) “,” Food name and description as per original source (FCT) “,” Food group as per original source (FCT) “,” Scientific name of the food “,” Id. of the FCT (BiblioID), see Table 1 “,” References or data source for the nutrient values “,” Edible portion reported as fraction edible, imputed from the original FCT, or calculated from refuse in the FCT “,” Information provided on the edible portion “,” Density reported in the original units as in the original FCT “,” Sum of Proximate in g per 100g EP as reported in the original FCT “,” Sum of Proximate in g per 100g EP (Eq. 3.3). “,” Ashes in g per 100g of EP. “,” Ashes in g per 100g of EP (Eq. 3.4) “,” Energy in kJ/ 100g of EP as reported in the original FCT “,” Energy in kJ/ 100g of EP (Eq. 3.5) “,” Energy in kcal/ 100g EP as reported in the original FCT “,” Energy in kcal/ 100g EP (Eq. 3.5) “,” Water/ moisture content in g per 100g of EP “,” Protein in g per 100g of EP, as reported in the original FCT and assumed to be calculated from nitrogen (NTg) content “,” Nitrogen, total “,” Nitrogen conversion factor “,” Total fat content in g per 100g of EP. Sum of triglycerides, phospholipids, sterols and related compounds obtained by a mixed solvent extraction “,” fat content unknown method of calculation in g per 100g of EP “,” Fat content measured using continuous extraction (e.g., Soxhlet method) expressed in g per 100g of EP “,” Total fat content in g per 100g of EP regardless of the Tagname (and analytical method used). Including FATg, FAT_g and FATCEg. “,” Available carbohydrates calculated by difference (E.g., 100 - (weight in grams of water + protein + fat + ash + fibre + alcohol in 100 g of food)) imputed from the original source “,” Available carbohydrates calculated by weight in g per 100g of EP “,” Carbohydrates, total; calculated by summation. This values is the sum of the sugars, starches, oligosaccharide, and dietary fibre in g per 100g of EP.”, ” Carbohydrates, available in monosaccharide equivalent. This value includes the free sugars plus dextrins, starch, and glycogen in g per 100g of EP “,” Available carbohydrates calculated by difference (Eq. 3.6). “,” Total dietary fibre by AOAC Prosky method expressed in g per 100g of EP. “,” Non-starch polysaccharide, (Englyst fibre) expressed in g per 100g of EP.”, ” Fibre, crude in g per 100g of EP. “,” Alcohol in g per 100g “,” Alcohol expressed in g per 100mL “,” Total sugar (the sum of free mono- and disaccharides) in g per 100g of EP “,” Fatty acids, total saturated in g per 100g “,” Fatty acids, total monounsaturated in g per 100g “,” Fatty acids, total polyunsaturated in g per 100g of EP. “,” Fatty acids, total trans in g per 100g EP. “,” Docosahexaenoic acid (DHA) (22:6 n-3) in g/100g of EP. “,” Eicosapentaenoic acid (EPA) (20:5 n-3) in g/100g of EP. “,” Cholesterol determined by enzymatic or chromatographic method in mg per 100g of EP. “,” Cholesterol by unknown method of determination in mg per 100g of EP. “,” Retinol in mcg per 100g of EP. “,” Vitamin A (Retinol Eq. (RE) in mcg per 100g of EP calculated using Eq.3.7. “,” Vitamin A (Retinol Eq. (RE) = mcg retinol + 1/6 mcg - carotene + 1/12 mcg other pro-vitamin A carotenoids) in mcg per 100g of EP. “,” Vitamin A (Retinol Activity Eq. (RAE) = mcg retinol + 1/12 mcg - carotene + 1/24 mcg other provitamin A carotenoids) in mcg per 100g of EP. “,” Vitamin A (Retinol Activity Eq. (RAE)) in mcg per 100g of EP calculated using Eq.3.6.”, ” Beta-carotene equivalents, is the sum of the beta-carotene + 1/2 quantity of other carotenoids with vitamin A activity, expressed in mcg per 100g of EP. “,” Beta-carotene equivalents, expressed in mcg per 100g of EP and calculated using Eq. 3.7. It also can be imputed using function "CARTBEQmcg_std_imputation" (See section 3.4.) “,” Alpha-carotene in mcg per 100g of EP. “,” Beta-carotene in mcg per 100g of EP. “,” Beta-cryptoxanthin in mcg per 100g of EP. “,” Vitamin E calculated by summation of the vitamin E activities of the active tocopherols and tocotrienols expressed as alpha-tocopherol equivalent in mg per 100g of EP (alpha-tocophero eq. = alpha-tocopherol + 0.4 beta-tocopherol + 0.1 gamma-tocopherol+ 0.01 delta-tocopherol+ 0.3 alpha-tocotrienol + 0.05 beta-tocotrienol + 0.01 gamma-tocotrienol) “,” Alpha-tocopherol in mg per 100g of EP. “,” Beta-tocopherol in mg per 100g of EP. “,” Gamma-tocopherol in mg per 100g of EP. “,” Delta-tocopherol in mg per 100g of EP. “,” Alpha-tocotrienol in mg per 100g of EP. “,” Beta-tocotrienol in mg per 100g of EP. “,” Gamma-tocotrienol in mg per 100g of EP. “,” Thiamin, vitamin B1 analysed and expressed as thiamin in mg per 100g of EP. “,” Thiamin hydrochloride, vitamin B1 analysed and expressed as thiamin hydrochloride in mg per 100g of EP. “,” Riboflavin (Vitamin B2) in mg per 100g of EP. “,” Vitamin B6 determined by analysis in mg/100g of EP. “,” Vitamin B6 determined by calculation (sum of pyridoxal, pyridoxamine and pyridoxine) in mg per 100g EP. “,” Vitamin B6 by unknown method in mg/100g of EP. “,” Vitamin B6 in mg/100g of EP regardless of the Tagname (and analytical method used). Including VITB6Amg, VITB6Cmg and VITB6_mg.”, ” Dietary folate equivalents calculated as food folate + 1.7 x synthetic folic acid and expressed in mcg per 100g of EP. “,” Folate, total (food folate + synthetic folic) in mcg per 100g EP. Includes both conjugated and free folate (determined by microbiological assay). “,” Folic acid. Synthetic folic acid used in fortification and expressed in mcg per 100g EP. “,” Folate food, naturally occurring food folate (determined by microbiological assay) in mcg per 100g EP. “,” Folate, sum vitamers in mcg per 100g EP. It includes mostly tetrahydrofolate, 5-methyltetrahydrofolate, 5-formyltetrahydrofolate, 10-formylfolic acid, 10-formyldihyrdofolate and folic acid (determined by HPLC). “,” Folate total, unknown method “,” Niacin equivalents, total. Preformed niacin plus niacin equivalents from tryptophan (TRP) in mg per 100g EP. “,” Niacin, prefrormed in mg per 100g EP. “,” Niacin equivalents, from tryptophan. 1/60 x tryptophan (TRP) expressed in mg per 100g EP.”, ” Tryptophan in mg per 100g of EP (includes only L-tryptophan). “,” Vitamin B12 (cobalamin, including all the active forms in foods) expressed in mcg per 100g EP.”, ” Vitamin C (L-ascorbic acid plus Ldehydro-ascorbic acid). Usually analyzed by HPLC and expressed in mg per 100g of EP. “,” L-ascorbic acid in mg per 100g of EP. Titrimetry can normally analyze L-ascorbic acid only”, ” Vitamin D calculated as the sum of Vitamin D3 + D2 + 5 x 25-hydroxycholecalciferol and expressed in mcg per 100g of EP. “,” Vitamin D calculated by summation of ergocalciferol and cholecalciferol and expressed in mcg per 100g of EP. This definition is mostly used. “,” Cholecalciferol (D3) in mcg per 100g of EP. “,” Ergocalciferol (D2) in mcg per 100g of EP. “,” 25-hydroxycholecalciferol in mcg per 100g of EP. “,” 25-hydroxyergocalciferol “,” Calcium in mg per 100g of EP. “,” Magnesium in mg per 100g of EP. “,” Manganese in mg per 100g of EP. “,” Phosphorus in mg per 100g of EP. “,” Iron in mg per 100g of EP. “,” Sodium in mg per 100g of EP. “,” Potassium in mg per 100g of EP. “,” Copper in mg per 100g of EP. “,” Zinc in mg per 100g of EP. “,” Selenium in mcg per 100g of EP. “,” Iodine in mcg per 100g of EP. “)
c(” - “,” - “,” - “,” - “,” - “,” - “,” - “,” - “,” - “,” - “,” - “,” - “,” - “,” - “,” - “,” - “,” - “,” - “,”FAO/INFOODS, 2012 “,”FAO/INFOODS, 2012 “,”FAO/INFOODS, 2012 “,”FAO/INFOODS, 2012 “,” - “,”FAO/INFOODS, 2012 “,”- “,”FAO/INFOODS, 2012 “,”FAO/INFOODS, 2012 “,”FAO/INFOODS, 2012 “,”FAO/INFOODS, 2012 “,” - “,”FAO/INFOODS, 2012 “,”FAO/INFOODS, 2012 “,”FAO/INFOODS, 2012 “,”- “,”- “,”- “,”- “,”- “,”- “,”- “,”- “,”- “,”- “,”- “,”-“,” FAO/INFOODS, 2012 “,”- “,”- “,” FAO/INFOODS, 2012 “,”- “,”- “,”- “,”- “,”- “,” FAO/INFOODS, 2012 “,”- “,”- “,”- “,”- “,”- “,”-“,”-“,”- “,”-“,”-“,”- “,”- “,”- “,” - “,” FAO/INFOODS, 2012 “,”-“,”- “,”-“,” -“,” -“,” -“,”- “,”-“,” Tagnames, 2007 (accessed on 2022-06-08) “,” - “,”- “,” -“,”-“,”-“,”-“,”- “,”-“,”-“,”-“,”- “,”-“,”-“,”-“,”-“,”-“,”-“,”-“,”-“,”-“)
c(“Compiled”, “Compiled”, ” Compiled “,” Compiled “,” Generated”, ” Compiled”, ” Compiled “,” Compiled “,” Compiled “,” Compiled “,” Calculated “,” Compiled”, ” Calculated “,” Compiled “,” Calculated “,” Compiled “,” Calculated “,” Compiled “,” Compiled “,” Compiled “,” Compiled “,” Compiled “,” Compiled “,” Compiled “,” Calculated “,” Compiled “,” Compiled “,” Compiled “,” Compiled “,” Calculated “,” Compiled”, ” Compiled”, ” Compiled “,” Compiled”, ” Compiled”, ” Compiled “,” Compiled “,”Compiled “,”Compiled “,”Compiled “,” Compiled “,”Compiled “,” Compiled”, ” Compiled “,” Compiled “,” Calculated “,” Compiled “,” Compiled “,” Calculated “,” Compiled”, ” Calculated “,” Compiled”, ” Compiled “,” Compiled “,” Compiled”, ” Compiled “,” Compiled “,” Compiled “,” Compiled “,” Compiled “,” Compiled”, ” Compiled”, ” Compiled”, ” Compiled “,” Compiled”, ” Compiled”, ” Compiled “,” Compiled”, ” Compiled “,” Compiled”, ” Compiled “,” Compiled”, ” Compiled”, ” Compiled”, ” Compiled”, ” Compiled”, ” Compiled”, ” Compiled”, ” Compiled “,” Compiled”, ” Compiled”, ” Compiled”, ” Compiled “,” Compiled”, ” Compiled”, ” Compiled “,” Compiled”, ” Compiled”, ” Compiled”, ” Compiled”, ” Compiled “,” Compiled “,” Compiled “,” Compiled “,” Compiled “,” Compiled”, ” Compiled “,” Compiled “,” Compiled “)
c(“character”, “character”, “character”, “character”, “character”, “character”, “numeric”, “character”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”, “numeric”)

Some of the variables that are frequently manually renamed are, for instance, food code/id, food name, etc. For, easy manual renaming, just change the names in quotes (““) to those in your dataset (if needed), remove/ add as needed.

data.df <- data.df %>%
dplyr::rename( 
  fdc_id = "Code", # Food id/code 
food_desc = "Food name in English", # Food name/description
food_descFR = "Food name in French",# Food name/description
scientific_name = "Scientific name", # Scientific name
Edible_factor_in_FCT = "Edible portion coefficient 1 (from as purchased to as described)", # Edible portion
Edible_factor_in_FCT2 = "Edible portion coefficient 2 (from as described to as eaten)", # Edible portion
nutrient_data_source = "BiblioID/Source") # Reference for NVs reported

names(data.df)[1:5]

Check

Are the variable names = column names?

If not, more formatting is needed (back to previous step)

9.0.3 Standardisation of values

To perform mathematical operations, characters needed to be converted into numeric operator. This includes three steps:

Removing brackets or other special characters.

As, described above in section, special character (“[]”, “()”, “*”)were used to denote “low quality values” and/or alternative (determination) methods. We kept record of those values in metadata files for those cases where the numeric value was extracted and the special character removed. In addition, the variable comments was created in a way of keeping that information as metadata for other users.

#Creating a dataset w/ the values that were of low quality [] trace or normal

var_nut <- data.df %>% select(Edible_factor_in_FCT:VITCmg) %>% colnames() #selecting nutrient variable names where we want to check for quality/trace

#dataset w/ metadata info that will be removed from the dataset for use
metadata <- data.df %>% mutate_at(var_nut,  ~case_when(
  str_detect(. , '\\[.*?\\]') ~ "low_quality", #Looking for things in square brackets to mark as low quality
  str_detect(. , 'tr') ~ "trace", #Looking for things marked as "tr" and labels them as trace
  TRUE ~ "normal_value")) #Else it marks the entry as a normal value

The following chunk is related to formatting the FCT section, as here we are creating new varibles to separate those Tagnames and/or similar food components into their individual columns with their Tagnames.

#Extracting variables calculated with different (lower quality) method 
#and reported as using [] and removing them from the original variable

data.df <- data.df %>% 
#Creating calculated values from the lower quality method and removing the original values from the original variable
  mutate(FATCEg = str_extract(FATg, '(?<=\\[).*?(?=\\])'),  
   #e.g. this creates the FIBCg value from the FIBTGg value
         FIBCg =  str_extract(FIBTGg, '(?<=\\[).*?(?=\\])'),
         CARTBmcg = ifelse(is.na(CARTBmcg), str_extract(CARTBEQmcg, '(?<=\\[).*?(?=\\])'), CARTBmcg), 
         TOCPHAmg = ifelse(is.na(TOCPHAmg),str_extract(VITEmg, '(?<=\\[).*?(?=\\])'), TOCPHAmg ),
         NIAmg = ifelse(is.na(NIAmg), str_extract(NIAEQmg, '(?<=\\[).*?(?=\\])'), NIAmg), 
         FOLSUMmcg = str_extract(FOLmcg, '(?<=\\[).*?(?=\\])'), 
         PHYTCPPD_PHYTCPPImg = str_extract(PHYTCPPmg, '(?<=\\[).*?(?=\\])'))

Changing characters into numeric

For instance, values that were reported to be trace (“tr”) or below the detection limit (“<LOD”) were converted to zero (0). These changes are also recorded in the comments variable as part of the metadata available.

#The following f(x) removes [] and changing tr w/ 0

no_brackets_tr <- function(i){
  case_when(
    str_detect(i, 'tr|[tr]') ~ "0",
    str_detect(i, '\\[.*?\\]')  ~ str_extract(i, '(?<=\\[).*?(?=\\])'),
    TRUE ~ i)
}

data.df <- data.df %>% 
  mutate_at(var_nut, no_brackets_tr) #This applies the above function

Extracting information

Some food component information, for instance alcohol content, could be reported within the food description instead of in a independent variable. Hence that information needs to be extracted and a new variable generated.

Converting into numeric

# Converting to numeric

wafct <- wafct %>% mutate_at(vars(`Edible_factor_in_FCT`:`PHYTCPPD_PHYTCPPImg`), as.numeric) #Converts certain columns (the data columns) to numeric

9.0.4 Standardising unit of measurement

To standardise and merge the different FCTs, food components need to be reported in the same units. For example, some nutrients needed to be convert from mg/100g to g/100g, or from percentage (100%) to a fraction (1). For all the unit conversion we followed the FAO/INFOODS Guidelines for Converting Units, Denominators and Expression (FAO/INFOODS, 2012b), and the suggested standard reporting units.

Eg. Converting alcohol from weight in volume (w/v) to weight in mass (w/m) (Eq.1a) or percentage of alcohol (v/v) into weight mass (Eq.1b) (See INFOODS Guidelines - page 12).

Eq.1a ALC (g/100mL) (w/v) / density (g/mL) = ALC (g/100 EP)

Eq.1b ALC (%) (v/v) * 0.789 (g/mL) / density (g/mL) = ALC (g/100 EP)

Eg. amino acids (AA) reported per g in 100g of PROT to mg in 100g of EP:

Eq.2.1 AA mg/100g EP = AA mg/g prot * prot g/100g EP /100

Eq.2.2 AA mg/100g EP = AA g/ 100g prot * prot g/100g EP /100 * 1000/100

Eq.2.3 AA mg/100g EP = AA g/100g prot * prot g/100g EP *10

\Eq.3: Edible portion = Edible portion (%)/100

Data quality and reporting (e.g., method of analysis, good metadata)

General quality checks are: the level of detail in the food description, the methods used for nutrient values compilation, and the documentation and degree of detail.

Other quality checks that can be performed are: calculating sum of proximate and re-calculating the values of: Carbohydrates available by difference, energy, etc. These is covered in the visualisation and QC section.

9.0.5 Saving the output

We are saving the standardised FCT into the data folder, for use in the future.

# Data Output 

write.csv(data.df, file = here::here("data", paste0(FCT_id, "_FCT_FAO_Tags.csv")), 
          row.names = FALSE) #Saves the newly-created data table to the Output folder

#Run this to clean the environment
rm(list = ls())

9.1 Further readings

Charrondiere, U.R., Stadlmayr, B., Grande, F., Vincent, A., Oseredczuk, M., Sivakumaran, S., Puwastien, P., Judprasong, K., Haytowitz, D., Gnagnarella, P. 2023. FAO/INFOODS Evaluation framework to assess the quality of published food composition tables and databases - User guide. Rome, FAO. https://doi.org/10.4060/cc5371en