7 hcesNutR Package
The goal of the hcesNutR project is to create a repository of functions and data that will help with the analysis of Household Consumption Expenditure Survey (HCES) data. A good source of HCES data is the World Bank microdata repository.
The package contains functions that will help with the analysis of HCES data. The package also contains the sample data used in this book, i.e. r4hces-data/mwi-ihs5-sample-data. We will use this sample data to demonstrate the use of the functions in the package. The package is still under development and will be updated regularly.
7.1 Reporting bugs
Please report any bugs or issues here.
7.2 Installation
You can install the development version of hcesNutR from GitHub with:
# install.packages("devtools")
devtools::install_github("dzvoti/hcesNutR")
As we discussed in previous chapters, you need to load the package in your R session before you can use it. You can load the package by running the following code in your R console.
library(hcesNutR)
7.3 Functions in the package
You can view the functions in the package by running the following code in your R console.
ls("package:hcesNutR")
7.4 Sample data
The data used in this example is randomly generated to mimic the structure of the Fifth Integrated Household Survey 2019-2020, an HCES of Malawi. The variables and structure of this data are described here.
7.4.1 Import and explore the sample data
Import the sample data from the r4hces-data/mwi-ihs5-sample-data folder. Use the read_dta function from the haven package to import it.
# Import the data using the haven package from the tidyverse
sample_hces <-
  haven::read_dta(here::here("data",
                             "mwi-ihs5-sample-data",
                             "HH_MOD_G1_vMAPS.dta"))
7.4.2 Trim the data
In this example we will use hcesNutR functions to demonstrate processing of total consumption data. The total consumption data contains the total consumption of each food item by each household.
The other consumption columns contain values for consumption from specific sources, i.e. gifted, purchased, and ownProduced. The workflow for processing the “other” consumption data is the same as demonstrated below.
# Trim the data to total consumption
sample_hces <- sample_hces |>
  dplyr::select(case_id:HHID,
                hh_g01:hh_g03c_1)
7.5 hcesNutR Workflow
7.5.1 Column Naming Conventions and Renaming
The sample_hces data is in Stata format, which stores data under short column name codes with associated “question” labels that explain the contents of each column. To make the column names more interpretable, the package provides the rename_hces function, which renames the column codes to the standard HCES names used downstream.
The rename_hces function uses column names from the standard_name_mappings_pairs dataset within the package. Alternatively, a user can create their own name pairs or manually rename their columns to the standard names.
It is important to note that all downstream functions in the hcesNutR package work with standard names and will not work with the short column names. Therefore, it is recommended to use the rename_hces() function to ensure that the column names are consistent with the package’s naming conventions.
For more information on how to use the rename_hces function, please refer to the function’s documentation: rename_hces.
# Rename the variables
sample_hces <- hcesNutR::rename_hces(sample_hces,
                                     country_name = "MWI",
                                     survey_name = "IHS5")
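The idea of renaming through name pairs can be sketched in base R. This is only an illustration: the toy data frame and the name_pairs lookup are hypothetical and are not the package's standard_name_mappings_pairs dataset, and rename_hces may work differently internally.

```r
# Toy survey data with short column codes (hypothetical values)
toy <- data.frame(
  HHID = c("h1", "h2"),
  hh_g01 = c(1, 2),
  hh_g03a = c(2.5, 1.0),
  stringsAsFactors = FALSE
)

# Hypothetical name pairs: short survey code -> standard name
name_pairs <- c(HHID = "hhid", hh_g01 = "consYN", hh_g03a = "cons_quant")

# Rename only the columns that have a pair; leave the rest unchanged
matched <- names(toy) %in% names(name_pairs)
names(toy)[matched] <- unname(name_pairs[names(toy)[matched]])

names(toy)
```

Columns without a pair keep their original code, which makes it easy to spot anything the lookup does not cover.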
7.5.2 Remove unconsumed food items
HCES surveys administer a standard questionnaire to each household, asking it to confirm whether it consumed each food item on a standard list. If a household did not consume a food item, the consYN column is set to a constant value that marks the item as not consumed. The remove_unconsumed function removes all food items that were not consumed by the household. It takes a data frame, the name of the column that contains the consumption information, and the value that indicates that the food item was consumed.
# Remove unconsumed food items
sample_hces <- hcesNutR::remove_unconsumed(sample_hces,
                                           consCol = "consYN",
                                           consVal = 1)
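Conceptually, this step is a row filter. A minimal base R sketch on made-up data follows; the code 2 for "not consumed" is an assumption for illustration, and this is not the package's implementation.

```r
# Toy consumption records (hypothetical values)
toy <- data.frame(
  hhid = c("h1", "h1", "h2"),
  food_name = c("maize flour", "rice", "beans"),
  consYN = c(1, 2, 1),  # assumption: 1 = consumed, 2 = not consumed
  stringsAsFactors = FALSE
)

consCol <- "consYN"
consVal <- 1

# Keep only rows whose consumption flag equals the "consumed" value.
# %in% treats a missing flag as "not consumed" instead of returning NA rows.
consumed <- toy[toy[[consCol]] %in% consVal, , drop = FALSE]

consumed
```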
7.5.3 Create two columns from each dbl+lbl column
The create_dta_labels function creates two columns from each dbl+lbl (double plus label) column. The first column contains the numeric values and the second column contains the labels. The function takes in a data frame, finds all columns of the dbl+lbl type, and returns a data frame with the new columns.
# Split dbl+lbl columns
sample_hces <- hcesNutR::create_dta_labels(sample_hces)
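To make the split concrete, here is a base R sketch that mimics a dbl+lbl column with a plain numeric vector carrying a "labels" attribute. The vector, its labels, and the region_code/region_name names are assumptions for illustration (the suffixes mirror column names such as item_code_code and item_code_name used later in this chapter); the real function operates on columns imported by haven.

```r
# Stand-in for a dbl+lbl column: numeric codes plus value labels (hypothetical)
region <- structure(
  c(1, 3, 2, 1),
  labels = c(North = 1, Central = 2, South = 3)
)

# First new column: the bare numeric codes (as.numeric drops the attribute)
codes <- as.numeric(region)

# Second new column: the label text that corresponds to each code
value_labels <- attr(region, "labels")
label_text <- names(value_labels)[match(codes, value_labels)]

toy <- data.frame(
  region_code = codes,
  region_name = label_text,
  stringsAsFactors = FALSE
)

toy
```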
7.5.4 Concatenate columns
Some HCES surveys split consumed food items or their consumption units into multiple columns. The concatenate_columns function cleans the data by combining the split columns into one column. The function can exclude values from concatenation by specifying the whole or part of the values to be excluded.
Concatenate food item names
# Merge food item names
sample_hces <-
  hcesNutR::concatenate_columns(sample_hces,
                                c("item_code_name",
                                  "item_oth"),
                                "SPECIFY",
                                "item_code_name")
Concatenate food item units
# Merge consumption unit names. For units it is essential to remove parentheses as they are the major cause of duplicate units
sample_hces <-
  hcesNutR::concatenate_columns(
    sample_hces,
    c(
      "cons_unit_name",
      "cons_unit_oth",
      "cons_unit_size_name",
      "hh_g03c_1_name"
    ),
    "SPECIFY",
    "cons_unit_name",
    TRUE
  )
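The concatenation logic used in the calls above can be approximated in base R as follows. The helper combine_cols and the toy values are hypothetical; the sketch only shows the general idea of dropping missing values and values that contain an excluded string before pasting the rest together, and it is not how concatenate_columns is necessarily written.

```r
# Toy data: the item name is split across two columns (hypothetical values)
toy <- data.frame(
  item_code_name = c("MAIZE FLOUR", "OTHER (SPECIFY)", "RICE"),
  item_oth = c(NA, "PUMPKIN LEAVES", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: paste the columns row by row, skipping NA values
# and any value that contains the excluded text
combine_cols <- function(df, cols, exclude) {
  combined <- apply(df[cols], 1, function(row) {
    keep <- row[!is.na(row) & !grepl(exclude, row, fixed = TRUE)]
    paste(keep, collapse = " ")
  })
  unname(combined)
}

toy$item_code_name <- combine_cols(toy, c("item_code_name", "item_oth"), "SPECIFY")

toy$item_code_name
```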
sample_hces <- sample_hces |>
  dplyr::select(
    case_id,
    hhid,
    item_code_name,
    item_code_code,
    cons_unit_name,
    cons_unitA,
    cons_quant
  ) |>
  dplyr::rename(food_name = item_code_name,
                food_code = item_code_code,
                cons_unit_code = cons_unitA)
7.5.5 Match survey food items to standard food items
The match_food_names function is useful for standardising survey food names. This is feasible because the package ships an internal dataset of standard food item names matched with their corresponding survey food names for supported surveys. Alternatively, users can supply their own food matching names by passing a csv to the function. See hcesNutR::food_list for the csv structure.
sample_hces <-
  match_food_names_v2(
    sample_hces,
    country = "MWI",
    survey = "IHS5",
    food_name_col = "food_name",
    food_code_col = "food_code",
    overwrite = FALSE
  )
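The matching step is essentially a lookup against a table of known pairs. The base R sketch below uses an invented matching table and invented food codes; it does not reproduce the package's internal dataset or the structure of hcesNutR::food_list.

```r
# Toy survey food items (hypothetical codes and names)
survey_foods <- data.frame(
  food_code = c(101, 102, 999),
  food_name = c("MAIZE UFA MGAIWA", "MAIZE UFA REFINED", "UNKNOWN ITEM"),
  stringsAsFactors = FALSE
)

# Hypothetical matching table: survey code -> standard food name
food_matches <- data.frame(
  food_code = c(101, 102),
  matched_food_name = c("maize flour, whole grain", "maize flour, refined"),
  stringsAsFactors = FALSE
)

# Look up each survey code in the matching table
idx <- match(survey_foods$food_code, food_matches$food_code)
survey_foods$matched_food_name <- food_matches$matched_food_name[idx]

# Items without a match stay NA and can be reviewed by hand
unmatched <- survey_foods[is.na(survey_foods$matched_food_name), ]
unmatched
```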
7.5.6 Match survey consumption units to standard consumption units
The match_food_units_v2 function is useful for standardising survey consumption units. This is feasible because the package ships an internal dataset of standard consumption units matched with their corresponding survey consumption units for supported surveys. Alternatively, users can download our template from hcesNutR::unit_names_n_codes_df and modify it to use their own consumption unit matching names.
sample_hces <-
  match_food_units_v2(
    sample_hces,
    country = "MWI",
    survey = "IHS5",
    unit_name_col = "cons_unit_name",
    unit_code_col = "cons_unit_code",
    matches_csv = NULL,
    overwrite = FALSE
  )
7.5.7 Add regions and districts to the data
Identify the HCES module that contains household identifiers. In some cases these will already be present in the HCES data, and this step should be skipped. From the household identifiers, select the ones that are required and add them to the data. In this example we will add the region and district identifiers to the data from the hh_mod_a_filt.dta file.
# Import household identifiers from the hh_mod_a_filt.dta file
household_identifiers <-
  haven::read_dta(here::here("data",
                             "mwi-ihs5-sample-data",
                             "hh_mod_a_filt_vMAPS.dta")) |>
  # subset the identifiers and keep only the ones needed.
  dplyr::select(case_id,
                HHID,
                region) |>
  dplyr::rename(hhid = HHID)

# Add the identifiers to the data
sample_hces <-
  dplyr::left_join(sample_hces,
                   household_identifiers,
                   by = c("hhid", "case_id"))
7.5.8 Create a measure_id column
The create_measure_id function creates a measure id column that is used to identify the consumption measure of each food item. The function takes in a data frame, the country and survey codes, and the names of the columns that are combined to build the identifier.
The measure_id is a unique identifier that allows us to join the consumption data with the food conversion factors data.
# Create measure id column
sample_hces <-
  create_measure_id(
    sample_hces,
    country = "MWI",
    survey = "IHS5",
    cols = c("region",
             "matched_cons_unit_code",
             "matched_food_code"),
    include_ISOs = FALSE
  )
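A composite key of this kind can be built by pasting the chosen columns together. The sketch below is a base R illustration on made-up values; the underscore separator and the resulting id format are assumptions and may not match what create_measure_id produces.

```r
# Toy data with the three key columns (hypothetical values)
toy <- data.frame(
  region = c(1, 2),
  matched_cons_unit_code = c("4", "23"),
  matched_food_code = c(101, 102),
  stringsAsFactors = FALSE
)

cols <- c("region", "matched_cons_unit_code", "matched_food_code")

# Paste the key columns row by row; "_" is an assumed separator
toy$measure_id <- do.call(paste, c(toy[cols], sep = "_"))

toy$measure_id
```

Because the id depends on region, unit and food together, the same food measured in the same unit can map to different conversion factors in different regions.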
7.5.9 Import food conversion factors.
The available data comes with a food conversion factors file, which contains conversion factors that link the food names and units to their corresponding weights in kilograms.
# Import food conversion factors file
IHS5_conv_fct <-
  readr::read_csv(
    here::here(
      "data",
      "mwi-ihs5-sample-data",
      "IHS5_UNIT_CONVERSION_FACTORS_vMAPS.csv"
    )
  )
We need to check whether the conversion factors file contains all the expected conversion factors for the HCES data being processed. The check_conv_fct function performs this check.
# Check conversion factors
check_conv_fct(hces_df = sample_hces,
conv_fct_df = IHS5_conv_fct)
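One simple way to think about this check is as a set difference between the measure ids in the consumption data and those in the conversion factors table. The following base R sketch uses invented ids and factors and is only an approximation of the idea, not the behaviour of check_conv_fct.

```r
# Measure ids found in the consumption data (hypothetical)
hces_ids <- c("1_4_101", "2_23_102", "3_9_205", "1_4_101")

# Toy conversion factors table (hypothetical)
conv_fct <- data.frame(
  measure_id = c("1_4_101", "2_23_102"),
  factor = c(1.0, 0.35),
  stringsAsFactors = FALSE
)

# Ids that appear in the data but have no conversion factor
missing_ids <- setdiff(unique(hces_ids), conv_fct$measure_id)

missing_ids
```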
7.5.10 Calculate weight of food items in kilograms.
The apply_wght_conv_fct function will take the hces_df and conv_fct_df and calculate the weight of each food item in kilograms.
sample_hces <-
  apply_wght_conv_fct(
    hces_df = sample_hces,
    conv_fct_df = IHS5_conv_fct,
    factor_col = "factor",
    measure_id_col = "measure_id",
    wt_kg_col = "wt_kg",
    cons_qnty_col = "cons_quant",
    allowDuplicates = TRUE
  )
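As a worked example of the arithmetic, the sketch below looks up a factor by measure_id and multiplies it by the reported quantity. It assumes that the weight is simply cons_quant multiplied by factor; the data values are invented and the package function may handle duplicates and missing factors differently.

```r
# Toy consumption data (hypothetical values)
hces <- data.frame(
  measure_id = c("1_4_101", "2_23_102", "3_9_205"),
  cons_quant = c(2, 10, 1),
  stringsAsFactors = FALSE
)

# Toy conversion factors: kilograms per reported unit (hypothetical values)
conv_fct <- data.frame(
  measure_id = c("1_4_101", "2_23_102"),
  factor = c(1.0, 0.35),
  stringsAsFactors = FALSE
)

# Look up the factor for each row, then convert the quantity to kilograms
idx <- match(hces$measure_id, conv_fct$measure_id)
hces$factor <- conv_fct$factor[idx]
hces$wt_kg <- hces$cons_quant * hces$factor

hces
```

Rows whose measure_id has no conversion factor end up with a missing wt_kg, which is why checking the conversion factors before this step is useful.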
7.5.11 Calculate AFE/AME and add to the data
Import data required
In order to calculate the AFE and AME metrics we require the following data:
- Household roster with the sex and age of each individual: HH_MOD_B_vMAPS.dta
- Household health: HH_MOD_D_vMAPS.dta
- AFE and AME factors: IHS5_AME_FACTORS_vMAPS.csv and IHS5_AME_SPEC_vMAPS.csv
# Import data of the roster and health modules of the IHS5 survey
ihs5_roster <-
  haven::read_dta(here::here("data",
                             "mwi-ihs5-sample-data",
                             "HH_MOD_B_vMAPS.dta"))
ihs5_health <-
  haven::read_dta(here::here("data",
                             "mwi-ihs5-sample-data",
                             "HH_MOD_D_vMAPS.dta"))

# Import data of the AME/AFE factors and specifications
ame_factors <-
  read.csv(here::here("data",
                      "mwi-ihs5-sample-data",
                      "IHS5_AME_FACTORS_vMAPS.csv")) |>
  janitor::clean_names()

ame_spec_factors <-
  read.csv(here::here("data",
                      "mwi-ihs5-sample-data",
                      "IHS5_AME_SPEC_vMAPS.csv")) |>
  janitor::clean_names() |>
  # Rename the population column to cat and select the relevant columns
  dplyr::rename(cat = population) |>
  dplyr::select(cat, ame_spec, afe_spec)
Extra energy requirements for pregnancy
# Extra energy requirements for pregnancy and Illness
pregnantPersons <- ihs5_health |>
  dplyr::filter(hh_d05a == 28 |
                  hh_d05b == 28) |>
  # NOTE: 28 is the code for pregnancy in this survey
  dplyr::mutate(ame_preg = 0.11, afe_preg = 0.14) |>
  dplyr::select(HHID, ame_preg, afe_preg)
Process HH roster data
# Process the roster data and rename variables to be more intuitive
aMFe_summaries <- ihs5_roster |>
  # Rename the variables to be more intuitive
  dplyr::rename(sex = hh_b03, age_y = hh_b05a, age_m = hh_b05b) |>
  dplyr::mutate(age_m_total = (age_y * 12 + age_m)) |>
  # Add the AME/AFE factors to the roster data
  dplyr::left_join(ame_factors, by = c("age_y" = "age")) |>
  dplyr::mutate(
    ame_base = dplyr::case_when(sex == 1 ~ ame_m, sex == 2 ~ ame_f),
    afe_base = dplyr::case_when(sex == 1 ~ afe_m, sex == 2 ~ afe_f),
    age_u1_cat = dplyr::case_when(
      # NOTE: Round here will ensure that decimals are not omitted in the calculation.
      round(age_m_total) %in% 0:5 ~ "0-5 months",
      round(age_m_total) %in% 6:8 ~ "6-8 months",
      round(age_m_total) %in% 9:11 ~ "9-11 months"
    )
  ) |>
  # Add the AME/AFE factors for the specific age categories
  dplyr::left_join(ame_spec_factors, by = c("age_u1_cat" = "cat")) |>
  # Dietary requirements for children under 1 year old
  dplyr::mutate(
    ame_lac = dplyr::case_when(age_y < 2 ~ 0.19),
    afe_lac = dplyr::case_when(age_y < 2 ~ 0.24)
  ) |>
  dplyr::rowwise() |>
  # TODO: Will it not be better to have the pregnancy values added at the same time here?
  dplyr::mutate(ame = sum(c(ame_base, ame_spec, ame_lac), na.rm = TRUE),
                afe = sum(c(afe_base, afe_spec, afe_lac), na.rm = TRUE)) |>
  # Calculate number of individuals in the households
  dplyr::group_by(HHID) |>
  dplyr::summarize(
    hh_persons = dplyr::n(),
    hh_ame = sum(ame),
    hh_afe = sum(afe)
  ) |>
  # Merge with the pregnancy and illness data
  dplyr::left_join(pregnantPersons, by = "HHID") |>
  dplyr::rowwise() |>
  dplyr::mutate(hh_ame = sum(c(hh_ame, ame_preg), na.rm = TRUE),
                hh_afe = sum(c(hh_afe, afe_preg), na.rm = TRUE)) |>
  dplyr::ungroup() |>
  # Fix single household factors
  dplyr::mutate(
    hh_ame = dplyr::if_else(hh_persons == 1, 1, hh_ame),
    hh_afe = dplyr::if_else(hh_persons == 1, 1, hh_afe)
  ) |>
  dplyr::select(HHID, hh_persons, hh_ame, hh_afe) |>
  dplyr::rename(hhid = HHID)
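The household-level aggregation in the pipeline above can be followed by hand on a small example. The base R sketch below uses invented individual AFE values; only the pregnancy increment of 0.14 and the rule that single-person households get a factor of 1 are taken from the code above.

```r
# Individual AFE values for two toy households (hypothetical values)
roster <- data.frame(
  HHID = c("h1", "h1", "h1", "h2"),
  afe = c(1.00, 1.20, 0.60, 0.85),
  stringsAsFactors = FALSE
)

# Household h1 includes a pregnant member (afe_preg = 0.14 as in the pipeline)
preg <- data.frame(HHID = "h1", afe_preg = 0.14, stringsAsFactors = FALSE)

# Sum the individual factors and count the members per household
hh_afe <- tapply(roster$afe, roster$HHID, sum)
hh_persons <- tapply(roster$afe, roster$HHID, length)
hh <- data.frame(
  hhid = names(hh_afe),
  hh_persons = as.integer(hh_persons),
  hh_afe = as.numeric(hh_afe),
  stringsAsFactors = FALSE
)

# Add the pregnancy increment where it applies (0 otherwise)
extra <- preg$afe_preg[match(hh$hhid, preg$HHID)]
extra[is.na(extra)] <- 0
hh$hh_afe <- hh$hh_afe + extra

# Single-person households are set to a factor of 1
hh$hh_afe[hh$hh_persons == 1] <- 1

hh
```

Here h1 ends up with 1.00 + 1.20 + 0.60 + 0.14 = 2.94, while the single-person household h2 is set to 1 regardless of the individual's own factor.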
Enrich Consumption Data with AFE/AME
We will use the left_join function from dplyr to join the consumption data with the aMFe_summaries data.
The left_join function will join the aMFe_summaries data to the sample_hces data by matching the hhid column in both data sets. This adds the hh_persons, hh_ame and hh_afe columns to the sample_hces data.
The hh_persons column contains the number of people in each household. The hh_ame and hh_afe columns contain the AME and AFE factors for each household.
sample_hces <- sample_hces |>
  dplyr::left_join(aMFe_summaries, by = "hhid")
Now we have a “clean” data set that we can use for analysis.
7.6 Summary
This chapter demonstrated the use of the hcesNutR package to process HCES data. The package contains functions that will help with the analysis of HCES data.
The package also contains the sample data used in this book, i.e. r4hces-data/mwi-ihs5-sample-data. We used this sample data to demonstrate the use of the functions in the package.
The package is still under development and will be updated regularly. Please report any bugs or issues here.
7.7 Future work
- Add more functions to the package
- Support more surveys (NGA Living Standards Survey 2018-2019)
- Add more internal data to the package