“To measure is to know. If you cannot measure it, you cannot improve it.”
- Lord Kelvin
The Nigeria National Data Repository (NDR) houses the de-identified patient-level information for the HIV program in Nigeria. Perhaps the most versatile feature of the repository is that it allows users with login access to download this patient-level information, analyze it, and make informed decisions to improve the response of their program towards achieving the UNAIDS three 95s targets. You can download “treatment”, “recent infection” and “HTS” line lists.
Analysis of the de-identified patient-level information is traditionally conducted in Microsoft Excel. While this provides a great platform, it has some downsides which include:
The software must be installed on the user’s computer.
The user must be familiar with the formula for calculation of the indicator of interest.
Because of the point-and-click nature of the analysis, it is error-prone. Sometimes these errors go unnoticed by the user, giving a false result.
Performing the same analysis periodically can be quite tedious and time-consuming as the analysis is usually done afresh every time.
The aim of the {tidyndr} package is to eliminate these hurdles by providing the user with a tool that runs on free and open-source software, allows the user to focus on the task to be performed and not the formula, removes user-introduced errors, and allows for easy automation of routine activities.
The {tidyndr} functions are grouped into five categories for performing related actions.
Importing functions
Treatment functions
Viral Load functions
Summary functions
HIV-1 recent infection functions
library(tidyndr)
The read_ndr() function allows you to import your line-list in a nicely configured format for data analysis. It:
Reads your .csv file.
Formats the data type for each of the column variables as necessary (converts all date variables to dates and categorical variables to factors).
Converts all column names to snake case format.
Adds three new columns to your treatment data: date_lost (calculated by adding 28 days to the sum of last_drug_pickup_date and days_of_arv_refill), appointment_date (calculated by adding days_of_arv_refill to last_drug_pickup_date), and current_status (calculated by classifying the patient as “active” or “inactive” using the value of the time_stamp argument as a reference).
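The three derived columns described above can be sketched in base R. This is only an illustration on a made-up three-client data frame, not the package's implementation; in particular, the rule that a client is "active" while date_lost has not yet passed the time stamp is an assumption.

```r
# Hypothetical three-client line-list; column names follow the text above.
line_list <- data.frame(
  last_drug_pickup_date = as.Date(c("2021-09-01", "2021-11-20", "2021-06-15")),
  days_of_arv_refill = c(90, 30, 90)
)
time_stamp <- as.Date("2021-12-15")

# Next appointment: last pickup date plus the days of ARV dispensed.
line_list$appointment_date <-
  line_list$last_drug_pickup_date + line_list$days_of_arv_refill

# Date lost: 28 days after the appointment date.
line_list$date_lost <- line_list$appointment_date + 28

# Assumption: "active" while date_lost is on or after the time stamp.
line_list$current_status <-
  ifelse(line_list$date_lost >= time_stamp, "active", "inactive")

print(line_list)
```

Only the third client, whose 28-day grace period ended before the time stamp, is classified as inactive here.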
read_ndr() is an S3 generic function that calls another function depending on the type of line-list supplied. It accepts the following arguments:
path - this specifies the location of the NDR line-list to be imported. You can do this by specifying either the “absolute” file path or a “relative” file path. See ?file.path() and ?read_ndr() for more details. The line-list must be exactly as downloaded from the NDR.
type - the type of line-list that you are importing. The options are treatment, recency and hts.
time_stamp - this is required only when importing a “treatment” line-list. It is the reference date for the NDR line-list and is used to derive the current_status of clients based on the last drug pickup date and the number of days of ARV refill. The value for this argument should be specified in the ISO 8601 “yyyy-mm-dd” format. I recommend that you download the NDR line-list on Monday and specify this date as the previous Friday. That way, you are almost certain that the data available on the NDR are complete as of the preceding week and that upload of data from the new week has not commenced.
cols - when no value is supplied to this argument, read_ndr() uses the column specification for NDR column modifications that occurred between October 2020 and March 2021. NDR column specifications before October 2020 are not accounted for by the default, so these will have to be specified manually. See ?vroom::cols() for more details.
quiet - when this is set to FALSE (the default), it prints the names of the two columns added. This is available only when the line-list type is “treatment”.
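The time_stamp recommendation above (download on Monday, use the previous Friday) can be computed rather than looked up on a calendar. The helper below is a hypothetical base R sketch and is not part of {tidyndr}; it returns the most recent Friday strictly before the given date.

```r
# Hypothetical helper: most recent Friday strictly before `download_date`.
previous_friday <- function(download_date) {
  download_date <- as.Date(download_date)
  # ISO weekday number: 1 = Monday, ..., 5 = Friday, 7 = Sunday.
  weekday <- as.integer(format(download_date, "%u"))
  days_back <- (weekday - 5) %% 7
  # If the date is itself a Friday, step back a full week.
  days_back[days_back == 0] <- 7
  download_date - days_back
}

# 2021-12-20 was a Monday, so the suggested time stamp is the Friday before it.
time_stamp <- format(previous_friday("2021-12-20"))
print(time_stamp)
```

The resulting character string is already in the “yyyy-mm-dd” format that the time_stamp argument expects.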
## import file from the computer. This uses the "treatment" example file that comes with the {tidyndr} package.
file_path <- system.file("extdata",
  "ndr_example.csv",
  package = "tidyndr")
ex_ndr <- read_ndr(file_path, time_stamp = "2021-12-15")
## import file from the computer using a few of the `...` arguments and setting `quiet` to TRUE
ndr_example <- read_ndr(file_path,
  time_stamp = "2021-12-15",
  skip = 0,
  comment = "",
  quiet = TRUE)
## import recent infection example file
file_path2 <- system.file(
  "extdata",
  "recency_example.csv",
  package = "tidyndr"
)
ex_recency <- read_ndr(file_path2, type = "recency")
This group of indicators is based on the PEPFAR MER treatment indicators and their supporting indicators. They include:
tx_new()
This generates the line-list of clients who started ART within a period. It accepts 5 arguments, of which only the first is compulsory:
data - the NDR line-list imported using read_ndr().
from - the start date for generating the requested line-list. This defaults to the beginning of the Fiscal Year. When this is supplied, it must be in the ISO 8601 format.
to - the end date for generating the requested line-list. When this is not supplied, all clients who started ART after the from date are included.
states and facilities - the particular state(s) and facility(ies) of interest. When these are not supplied, the new clients are calculated for all states and facilities contained in the data.
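To make the role of from, to and states concrete, here is a rough base R sketch of the kind of filter these arguments describe. The function sketch_tx_new() and the four-row data frame are hypothetical, and treating both date bounds as inclusive is an assumption rather than documented behaviour of tx_new().

```r
# Hypothetical mini line-list using column names shown in the package output.
clients <- data.frame(
  patient_identifier = c("A1", "A2", "A3", "A4"),
  state = c("Arewa", "Okun", "Arewa", "Abaji"),
  art_start_date = as.Date(c("2021-01-10", "2021-01-25", "2021-03-02", "2020-12-30"))
)

# Hypothetical stand-in for the filtering idea; NOT the package's code.
sketch_tx_new <- function(data, from, to = Sys.Date(),
                          states = unique(data$state)) {
  from <- as.Date(from)
  to <- as.Date(to)
  # Assumption: both bounds are inclusive.
  in_period <- data$art_start_date >= from & data$art_start_date <= to
  data[in_period & data$state %in% states, , drop = FALSE]
}

jan_arewa <- sketch_tx_new(clients, "2021-01-01", "2021-01-31", states = "Arewa")
print(jan_arewa)
```

Leaving states at its default keeps every state in the data, mirroring the behaviour described for the real function.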
## generate tx_new clients between January and June 2021 for all states in the data
tx_new(ndr_example, from = "2021-01-01", to = "2021-06-30")
#> # A tibble: 2,307 x 52
#> ip state lga facility datim_code sex patient_identif~ hospital_number
#> <fct> <fct> <fct> <fct> <fct> <fct> <chr> <chr>
#> 1 NGOHea~ Okun Okun~ Facilit~ datim_cod~ F Okuty200048 00016
#> 2 NGOHea~ Okun Okun~ Facilit~ datim_cod~ M Okuty100049 00012
#> 3 NGOHea~ Okun Odo-~ Facilit~ datim_cod~ M Okuty600051 00015
#> 4 NGOHea~ Okun Okun~ Facilit~ datim_cod~ F Okuty200057 00020
#> # ... with 2,303 more rows, and 44 more variables: date_of_birth <date>,
#> # age_at_art_initiation <dbl>, current_age <dbl>, art_start_date <date>,
#> # art_start_date_source <fct>, last_drug_pickup_date <date>,
#> # last_drug_pickup_date_q1 <date>, last_drug_pickup_date_q2 <date>,
#> # last_drug_pickup_date_q3 <date>, last_drug_pickup_date_q4 <date>,
#> # last_regimen <fct>, last_clinic_visit_date <date>,
#> # days_of_arv_refill <dbl>, pregnancy_status <fct>, ...
## generate tx_new for only one state ("Arewa" in the data) for January 2021.
tx_new(ndr_example,
from = "2021-01-01",
to = "2021-01-31",
states = "Arewa")
#> # A tibble: 84 x 52
#> ip state lga facility datim_code sex patient_identif~ hospital_number
#> <fct> <fct> <fct> <fct> <fct> <fct> <chr> <chr>
#> 1 NGOHea~ Arewa Arew~ Facilit~ datim_cod~ F Arety318703 000129
#> 2 NGOHea~ Arewa Oke-~ Facilit~ datim_cod~ M Arety518952 000170
#> 3 NGOHea~ Arewa Oke-~ Facilit~ datim_cod~ M Arety519197 000213
#> 4 NGOHea~ Arewa Oke-~ Facilit~ datim_cod~ F Arety519340 000253
#> # ... with 80 more rows, and 44 more variables: date_of_birth <date>,
#> # age_at_art_initiation <dbl>, current_age <dbl>, art_start_date <date>,
#> # art_start_date_source <fct>, last_drug_pickup_date <date>,
#> # last_drug_pickup_date_q1 <date>, last_drug_pickup_date_q2 <date>,
#> # last_drug_pickup_date_q3 <date>, last_drug_pickup_date_q4 <date>,
#> # last_regimen <fct>, last_clinic_visit_date <date>,
#> # days_of_arv_refill <dbl>, pregnancy_status <fct>, ...
tx_curr()
Generates the line-list of all clients who are still active on treatment. It has 4 parameters, of which only the first is compulsory:
data, states and facilities - see tx_new() above.
status - the column to be used to determine the tx_curr. Can be one of two: ‘calculated’ or ‘default’. ‘calculated’ uses the derived current_status column while ‘default’ uses the NDR current_status_28_days column.
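The difference between the two status options can be illustrated with a small base R sketch. The function sketch_tx_curr(), the example rows and the "Active"/"Inactive" values in current_status_28_days are all hypothetical; the sketch only shows how the choice of column can change which clients are counted.

```r
# Hypothetical rows where the two status columns disagree for A2 and A3.
clients <- data.frame(
  patient_identifier = c("A1", "A2", "A3"),
  current_status = c("active", "inactive", "active"),
  current_status_28_days = c("Active", "Active", "Inactive")
)

# Hypothetical stand-in; NOT the package's implementation.
sketch_tx_curr <- function(data, status = c("calculated", "default")) {
  status <- match.arg(status)
  status_col <- if (status == "calculated") {
    "current_status"
  } else {
    "current_status_28_days"
  }
  # Compare case-insensitively so "Active" and "active" are treated alike.
  data[tolower(data[[status_col]]) == "active", , drop = FALSE]
}

calculated <- sketch_tx_curr(clients)
default <- sketch_tx_curr(clients, status = "default")
print(calculated$patient_identifier)
print(default$patient_identifier)
```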
## generate current clients using the calculated `current_status` column
tx_curr(ndr_example)
#> # A tibble: 10,646 x 52
#> ip state lga facility datim_code sex patient_identif~ hospital_number
#> <fct> <fct> <fct> <fct> <fct> <fct> <chr> <chr>
#> 1 NGOHea~ Okun Odo-~ Facilit~ datim_cod~ M Okuty600005 0002
#> 2 NGOHea~ Okun Okun~ Facilit~ datim_cod~ F Okuty200006 0002
#> 3 NGOHea~ Okun Okun~ Facilit~ datim_cod~ M Okuty400007 0001
#> 4 NGOHea~ Okun Odo-~ Facilit~ datim_cod~ F Okuty600010 0003
#> # ... with 10,642 more rows, and 44 more variables: date_of_birth <date>,
#> # age_at_art_initiation <dbl>, current_age <dbl>, art_start_date <date>,
#> # art_start_date_source <fct>, last_drug_pickup_date <date>,
#> # last_drug_pickup_date_q1 <date>, last_drug_pickup_date_q2 <date>,
#> # last_drug_pickup_date_q3 <date>, last_drug_pickup_date_q4 <date>,
#> # last_regimen <fct>, last_clinic_visit_date <date>,
#> # days_of_arv_refill <dbl>, pregnancy_status <fct>, ...
## generate current clients using the default `current_status_28_days` column
tx_curr(ndr_example,
status = "default")
#> # A tibble: 10,646 x 52
#> ip state lga facility datim_code sex patient_identif~ hospital_number
#> <fct> <fct> <fct> <fct> <fct> <fct> <chr> <chr>
#> 1 NGOHea~ Okun Odo-~ Facilit~ datim_cod~ M Okuty600005 0002
#> 2 NGOHea~ Okun Okun~ Facilit~ datim_cod~ F Okuty200006 0002
#> 3 NGOHea~ Okun Okun~ Facilit~ datim_cod~ M Okuty400007 0001
#> 4 NGOHea~ Okun Odo-~ Facilit~ datim_cod~ F Okuty600010 0003
#> # ... with 10,642 more rows, and 44 more variables: date_of_birth <date>,
#> # age_at_art_initiation <dbl>, current_age <dbl>, art_start_date <date>,
#> # art_start_date_source <fct>, last_drug_pickup_date <date>,
#> # last_drug_pickup_date_q1 <date>, last_drug_pickup_date_q2 <date>,
#> # last_drug_pickup_date_q3 <date>, last_drug_pickup_date_q4 <date>,
#> # last_regimen <fct>, last_clinic_visit_date <date>,
#> # days_of_arv_refill <dbl>, pregnancy_status <fct>, ...
tx_ml()
This generates the line-list of clients who were active at the beginning of the reference period and have become inactive by the to date specified. The default is to generate the line-list of all clients who became inactive in the current Fiscal Year (i.e. were active at the beginning of FY22 but have become inactive at the end of December 2021). It accepts 5 arguments listed below:
data, from, to, states, facilities - see tx_new documentation above.
## generate the line-list of clients who were active at the beginning of October 2021
## (beginning of FY22) but became inactive at the end of December 2021.
tx_ml(new_data = ndr_example,
  from = "2021-10-01",
  to = "2021-12-31")
## if data from two periods are available, you can supply these to determine the `tx_ml`
file_path <- "https://raw.githubusercontent.com/stephenbalogun/example_files/main/ndr_example.csv"
ndr_old <- read_ndr(file_path, time_stamp = "2021-02-15")
ndr_new <- ndr_example
tx_ml(old_data = ndr_old,
  new_data = ndr_new)
## generate the line-list of clients who have become inactive for "Arewa" and "Abaji"
## since the beginning of October 2021.
tx_ml(ndr_example,
states = c("Abaji", "Arewa"), from = "2021-10-01")
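One way to picture the single-dataset case is as a filter on the derived date_lost column: a client who was active at the from date and inactive by the to date has a date_lost falling between the two. The base R sketch below rests on that assumption; sketch_tx_ml() and its data are hypothetical and may not match how tx_ml() is actually implemented.

```r
# Hypothetical clients with the derived `date_lost` column.
clients <- data.frame(
  patient_identifier = c("A1", "A2", "A3"),
  date_lost = as.Date(c("2021-09-15", "2021-11-20", "2022-02-01"))
)

# Hypothetical stand-in; NOT the package's implementation.
sketch_tx_ml <- function(data, from, to) {
  from <- as.Date(from)
  to <- as.Date(to)
  # Assumption: lost within the period means from <= date_lost <= to.
  data[data$date_lost >= from & data$date_lost <= to, , drop = FALSE]
}

lost_q1 <- sketch_tx_ml(clients, from = "2021-10-01", to = "2021-12-31")
print(lost_q1)
```

Here A1 was already inactive before the period began and A3 is still active at its end, so only A2 is returned.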
tx_ml_outcomes
For the inactive clients generated, you might be interested in subsetting those with specific final outcomes of interest. Currently, the NDR recognizes only two final outcomes (“dead” and “transferred out”). These are the ones that can be subset using the tx_ml_outcomes() function. This function takes only two arguments:
data - see data under tx_new() above.
outcome - one of “dead”, “transfer out” (or “transferred out”).
## generate the line-list of all clients who became inactive this Fiscal Year
ml_example <- tx_ml(ndr_example)
## subset inactive clients who were transferred out
tx_ml_outcomes(ml_example, outcome = "transferred out")
tx_rtt
You can filter for clients who were previously inactive but have returned to treatment and are still active at the end of the period of interest. This is the only function that requires two different sets of data: the first contains the inactive clients while the second is where their change in status is checked. The acceptable arguments to tx_rtt include:
old_data - the dataset including the list of inactive clients.
new_data - a more recent dataset where change in ART status will be evaluated.
states and facilities - see tx_new above.
status - see tx_curr above.
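The two-dataset logic described above amounts to finding clients who are inactive in the old data and active in the new data. A minimal base R sketch follows; sketch_tx_rtt() and both data frames are hypothetical, and matching clients on patient_identifier is an assumption.

```r
# Hypothetical old and new line-lists for the same three clients.
old_data <- data.frame(
  patient_identifier = c("A1", "A2", "A3"),
  current_status = c("inactive", "inactive", "active")
)
new_data <- data.frame(
  patient_identifier = c("A1", "A2", "A3"),
  current_status = c("active", "inactive", "active")
)

# Hypothetical stand-in; NOT the package's implementation.
sketch_tx_rtt <- function(old_data, new_data) {
  # Clients who were inactive in the earlier line-list.
  was_inactive <- old_data$patient_identifier[old_data$current_status == "inactive"]
  # Keep those who are active in the more recent line-list.
  returned <- new_data$patient_identifier %in% was_inactive &
    new_data$current_status == "active"
  new_data[returned, , drop = FALSE]
}

rtt <- sketch_tx_rtt(old_data, new_data)
print(rtt)
```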
## location of the old line-list that contains the list of inactive clients
<- "https://raw.githubusercontent.com/stephenbalogun/example_files/main/ndr_example.csv"
file_path
<- read_ndr(file_path,
old_data time_stamp = "2021-02-15")
<- ndr_example
new_data tx_rtt(old_data, new_data)
tx_appointment
Sometimes, you are interested in knowing the number of active clients who are due for medication refill/drug pick-up within a period of time. This can help you to plan for the visits, forecast medication appointments and also identify active clients who have missed their appointment. tx_appointment() is one of the supporting treatment indicators that helps in this regard. It takes 6 arguments:
data, from, to, states and facilities - see the tx_new() documentation above.
status - see the tx_curr() documentation above.
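As a rough picture of what these arguments select, the base R sketch below keeps active clients whose appointment_date falls within the period. sketch_tx_appointment() and its data are hypothetical, and the inclusive date bounds are an assumption.

```r
# Hypothetical clients with the derived `appointment_date` column.
clients <- data.frame(
  patient_identifier = c("A1", "A2", "A3"),
  current_status = c("active", "active", "inactive"),
  appointment_date = as.Date(c("2022-02-10", "2022-05-01", "2022-03-01"))
)

# Hypothetical stand-in; NOT the package's implementation.
sketch_tx_appointment <- function(data, from, to) {
  from <- as.Date(from)
  to <- as.Date(to)
  # Assumption: both bounds are inclusive.
  due <- data$appointment_date >= from & data$appointment_date <= to
  data[data$current_status == "active" & due, , drop = FALSE]
}

due_q2 <- sketch_tx_appointment(clients, "2022-01-01", "2022-03-31")
print(nrow(due_q2))
```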
## generate list of clients with medication appointment in Q2 of FY22
q2_appt <- tx_appointment(ndr_example,
  from = "2022-01-01",
  to = "2022-03-31")
## print the number of clients with appointments in Q2
nrow(q2_appt)
#> [1] 1613
tx_mmd
Knowing the number of months of medications dispensed during the last medication refill allows you to calculate the number of active clients who are on MMD (Multi-month Dispensing), i.e. clients who were given between 3 and 6 months of medication during the last clinic visit. You might also be interested in knowing the details of clients who did not have MMD, or who had more than 6 months of medication refill (some of which might be due to data entry errors). The arguments that can be supplied to this function include:
data, states, and facilities - see tx_new() above.
status - see tx_curr() documentation above.
months - the number of months of ARV medications dispensed during the last clinic visit. The default filters active clients who had between 3 and 6 months of ARV, but you can change this to generate, for example, the list of clients who had more than 6 months of medication dispensed.
tx_mmd(ndr_example)
#> # A tibble: 10,512 x 53
#> ip state lga facility datim_code sex patient_identif~ hospital_number
#> <fct> <fct> <fct> <fct> <fct> <fct> <chr> <chr>
#> 1 NGOHea~ Okun Odo-~ Facilit~ datim_cod~ M Okuty600005 0002
#> 2 NGOHea~ Okun Okun~ Facilit~ datim_cod~ F Okuty200006 0002
#> 3 NGOHea~ Okun Okun~ Facilit~ datim_cod~ M Okuty400007 0001
#> 4 NGOHea~ Okun Odo-~ Facilit~ datim_cod~ F Okuty600010 0003
#> # ... with 10,508 more rows, and 45 more variables: date_of_birth <date>,
#> # age_at_art_initiation <dbl>, current_age <dbl>, art_start_date <date>,
#> # art_start_date_source <fct>, last_drug_pickup_date <date>,
#> # last_drug_pickup_date_q1 <date>, last_drug_pickup_date_q2 <date>,
#> # last_drug_pickup_date_q3 <date>, last_drug_pickup_date_q4 <date>,
#> # last_regimen <fct>, last_clinic_visit_date <date>,
#> # days_of_arv_refill <dbl>, pregnancy_status <fct>, ...
## filter clients who had more than 6 months of ARV
tx_mmd(ndr_example,
months = c(7, Inf))
#> # A tibble: 0 x 53
#> # ... with 53 variables: ip <fct>, state <fct>, lga <fct>, facility <fct>,
#> # datim_code <fct>, sex <fct>, patient_identifier <chr>,
#> # hospital_number <chr>, date_of_birth <date>, age_at_art_initiation <dbl>,
#> # current_age <dbl>, art_start_date <date>, art_start_date_source <fct>,
#> # last_drug_pickup_date <date>, last_drug_pickup_date_q1 <date>,
#> # last_drug_pickup_date_q2 <date>, last_drug_pickup_date_q3 <date>,
#> # last_drug_pickup_date_q4 <date>, last_regimen <fct>, ...
## list of clients who had either more than 6 months, or < 3 months medications dispensed
tx_mmd(ndr_example,
months = c(1, 2, 7, Inf))
#> # A tibble: 129 x 53
#> ip state lga facility datim_code sex patient_identif~ hospital_number
#> <fct> <fct> <fct> <fct> <fct> <fct> <chr> <chr>
#> 1 NGOHea~ Okun Okun~ Facilit~ datim_cod~ M Okuty300079 0002
#> 2 NGOHea~ Okun Odo-~ Facilit~ datim_cod~ F Okuty600237 00075
#> 3 NGOHea~ Okun Okun~ Facilit~ datim_cod~ F Okuty201449 000562
#> 4 NGOHea~ Okun Okun~ Facilit~ datim_cod~ F Okuty201686 000645
#> # ... with 125 more rows, and 45 more variables: date_of_birth <date>,
#> # age_at_art_initiation <dbl>, current_age <dbl>, art_start_date <date>,
#> # art_start_date_source <fct>, last_drug_pickup_date <date>,
#> # last_drug_pickup_date_q1 <date>, last_drug_pickup_date_q2 <date>,
#> # last_drug_pickup_date_q3 <date>, last_drug_pickup_date_q4 <date>,
#> # last_regimen <fct>, last_clinic_visit_date <date>,
#> # days_of_arv_refill <dbl>, pregnancy_status <fct>, ...
Summary indicators provide aggregates for a particular indicator of interest. {tidyndr} provides two aggregate functions. These are:
1. summarise_ndr()
2. disaggregate()
summarise_ndr
All the previous functions generate patient-level line-lists. You will most often be interested in a tabular summary of the information provided. This is the work of summarise_ndr() (and its partner summarize_ndr()). It takes all the line-lists that you might have generated and displays a summary table with one column for each of them. summarise_ndr() accepts three arguments:
... - these are the names assigned to each of the line-lists that you have generated.
level - specifies the level at which the summary should be performed. The options are “facility”, “lga”, “state” or “country” (“ip”). The default level is “state”.
names - the names to be assigned to each of the summary columns created. See ?summarise_ndr for more details.
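Conceptually, the summary table is a count of rows per grouping level for each line-list, joined side by side. The base R sketch below shows that idea at state level; count_by_state() and the two toy line-lists are hypothetical, and this is not how summarise_ndr() is implemented.

```r
# Two hypothetical line-lists, reduced to the grouping column.
curr <- data.frame(state = c("Okun", "Okun", "Arewa"))
new <- data.frame(state = c("Okun", "Arewa", "Arewa", "Arewa"))

# Hypothetical helper: number of clients per state, in a named column.
count_by_state <- function(data, name) {
  counts <- as.data.frame(table(state = data$state), stringsAsFactors = FALSE)
  names(counts)[2] <- name
  counts
}

# One row per state, one column per line-list.
summary_tbl <- merge(
  count_by_state(curr, "curr"),
  count_by_state(new, "tx_new"),
  by = "state",
  all = TRUE
)
print(summary_tbl)
```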
curr <- tx_curr(ndr_example) # generate active clients and assign to "curr"
new <- tx_new(ndr_example, from = "2021-10-01", to = "2021-12-31") # generate TX_NEW for the FY and assign to "new"
summarise_ndr(curr, new, level = "state", names = c("curr", "tx_new")) # when the `names` argument is not supplied, the data names are used
#> # A tibble: 5 x 4
#> ip state curr tx_new
#> <chr> <chr> <int> <int>
#> 1 NGOHealth Okun 641 39
#> 2 NGOHealth Abaji 3196 230
#> 3 NGOHealth Arewa 4088 283
#> 4 NGOHealth Ayetoro 2721 191
#> # ... with 1 more row
disaggregate
A very powerful function that allows you to summarise your generated line-list disaggregated by a particular variable. The disaggregation options currently available are “current_age”, “sex”, “pregnancy_status”, “art_duration”, “months_dispensed”, and “age_sex”. It accepts 4 arguments:
data - see tx_new() documentation above.
by, level and pivot_wide - see the documentation for summarise_ndr() above.
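A disaggregation is essentially a cross-tabulation of the grouping level against the chosen variable, which can be laid out wide (one column per category) or long (one row per combination). The base R sketch below shows both layouts on hypothetical data; that pivot_wide switches between such layouts is an assumption based on its name.

```r
# Hypothetical inactive clients with state and sex.
inactives <- data.frame(
  state = c("Okun", "Okun", "Arewa", "Arewa", "Arewa"),
  sex = c("F", "M", "F", "F", "M")
)

# Wide layout: one row per state, one column per sex.
wide <- table(state = inactives$state, sex = inactives$sex)
print(wide)

# Long layout: one row per state-sex combination with its count in `n`.
long <- as.data.frame(wide, responseName = "n")
print(long)
```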
## generate list of inactive clients
inactives <- tx_ml(new_data = ndr_example, from = "2021-01-01", to = "2021-03-31")
## disaggregate inactive clients by gender at state level
disaggregate(inactives,
by = "sex")
## disaggregate inactive clients by "age group" at country level
disaggregate(inactives,
by = "current_age",
level = "country",
pivot_wide = FALSE)