Codelist diagnostics

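This vignette presents a set of functions to explore the use of codes in a codelist. We will cover the following key functions: summariseAchillesCodeUse(), summariseCodeUse(), summariseOrphanCodes(), and summariseCohortCodeUse().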

Let’s start by loading the required packages, connecting to a mock database, and generating a codelist for example purposes. We’ll use getCandidateCodes() to find our codes.

library(DBI)
library(duckdb)
library(dplyr)
library(CDMConnector)
library(CodelistGenerator)
library(CohortConstructor)
library(omopgenerics)

# Connect to the database and create the cdm object
con <- dbConnect(duckdb(),
                 eunomiaDir("synpuf-1k", "5.3"))
cdm <- cdmFromCon(con = con,
                  cdmName = "Eunomia Synpuf",
                  cdmSchema = "main",
                  writeSchema = "main",
                  achillesSchema = "main")

# Create a codelist for depression
depression <- getCandidateCodes(cdm,
                                keywords = "depression")
depression <- newCodelist(list("depression" = depression$concept_id))
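
The codelist object created above is a named list of concept IDs. As a minimal sketch of that structure, the example below builds a codelist by hand. The names toy_a and toy_b and the concept IDs are hypothetical and chosen only for illustration; they are not real OMOP concepts.

```r
library(omopgenerics)

# Hypothetical concept IDs, used only to show the structure of a codelist
toy_codelist <- newCodelist(list(
  "toy_a" = c(1L, 2L, 3L),
  "toy_b" = c(4L, 5L)
))

names(toy_codelist)    # names of the individual codelists
lengths(toy_codelist)  # number of concept IDs in each codelist
```

Each named element is treated as a separate codelist, so the diagnostics below report results per codelist name.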

Running diagnostics for a codelist

Summarise code use using ACHILLES tables

This function uses ACHILLES summary tables to count the number of records and persons associated with each concept in a codelist. Note that it requires ACHILLES tables to be available in the CDM.

achilles_code_use <- summariseAchillesCodeUse(depression, 
                                              cdm, 
                                              countBy = c("record", "person"))

From this, we will obtain a summarised result object. We can easily visualise the results using tableAchillesCodeUse():

tableAchillesCodeUse(achilles_code_use,
                     type = "gt")

Notice that concepts with zero counts will not appear in the result table.
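
To illustrate the idea of reading counts from precomputed ACHILLES results, here is a minimal sketch on toy data. It assumes the standard ACHILLES convention in which analysis 400 holds person counts and analysis 401 holds record counts per condition concept, with the concept ID stored as text in stratum_1. The table contents and the toy_ names are hypothetical, and this is not necessarily how summariseAchillesCodeUse() is implemented internally.

```r
library(dplyr)

# Toy stand-in for an achilles_results table (hypothetical values)
toy_achilles_results <- tibble(
  analysis_id = c(400L, 400L, 401L, 401L),
  stratum_1   = c("101", "202", "101", "202"),
  count_value = c(12L, 3L, 40L, 5L)
)

# Hypothetical codelist: concept 303 has no rows in the toy results
toy_codelist <- c(101L, 303L)

toy_achilles_counts <- toy_achilles_results %>%
  filter(stratum_1 %in% as.character(toy_codelist)) %>%
  mutate(count_type = if_else(analysis_id == 400L, "person", "record")) %>%
  select(concept_id = stratum_1, count_type, count_value)

toy_achilles_counts
```

In this sketch, concept 303 is absent from the output because it has no rows in the toy results, which mirrors why concepts with zero counts do not appear in the result table.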

Summarise code use using patient-level data

This function performs a similar task to the one above but queries patient-level data directly, making it usable even if ACHILLES tables are not available. It can be configured to stratify results by concept (byConcept), by year (byYear), by sex (bySex), or by age group (ageGroup). We can also restrict the analysis to a specific time period (dateRange).

code_use <- summariseCodeUse(depression,
                             cdm,
                             countBy = c("record", "person"),
                             byYear  = FALSE,
                             bySex   = FALSE,
                             ageGroup = list("<=50" = c(0, 50), ">50" = c(51, Inf)),
                             dateRange = as.Date(c("2010-01-01", "2020-01-01")))

tableCodeUse(code_use, type = "gt")
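
The difference between record and person counts, and the effect of stratifying by age group, can be sketched with dplyr on a toy table. The table, its values, and the toy_ names below are hypothetical; this only illustrates the kind of aggregation involved and is not the package's implementation.

```r
library(dplyr)

# Toy stand-in for patient-level records (hypothetical values)
toy_records <- tibble(
  person_id  = c(1, 1, 2, 3, 3, 3),
  concept_id = c(101, 101, 101, 202, 202, 101),
  age        = c(34, 34, 61, 50, 50, 50)
)

toy_counts <- toy_records %>%
  # Same age bands as in the call above: 0 to 50, and 51 or older
  mutate(age_group = if_else(age <= 50, "<=50", ">50")) %>%
  group_by(concept_id, age_group) %>%
  summarise(
    record_count = n(),                    # every row counts
    person_count = n_distinct(person_id),  # each person counts once
    .groups = "drop"
  )

toy_counts
```

A person with several records for the same concept contributes once to the person count and several times to the record count, so the two counts can differ substantially.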

Identify orphan codes

Orphan codes are concepts that might be related to our codelist but have not been included in it. Checking for them helps to ensure that we have not missed any important concepts. Note that this function uses ACHILLES tables.

summariseOrphanCodes() will look for descendants and ancestors (both via the concept_ancestor table) and for concepts related to the codes included in the codelist (via the concept_relationship table). Additionally, if the cdm contains PHOEBE recommendations (the concept_recommended table), they will also be used.

orphan <- summariseOrphanCodes(depression, cdm)
tableOrphanCodes(orphan, type = "gt")
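
As a rough sketch of the idea behind orphan codes, the example below looks up ancestors and descendants of a toy codelist in a toy concept_ancestor table, drops the concepts already in the codelist, and keeps only those with recorded use. The column names ancestor_concept_id and descendant_concept_id follow the OMOP CDM; all values and toy_ names are hypothetical, and the real function also draws on other sources, as described above.

```r
library(dplyr)

# Hypothetical codelist and vocabulary hierarchy
toy_codelist <- c(10L, 11L)
toy_concept_ancestor <- tibble(
  ancestor_concept_id   = c(10L, 10L, 10L, 11L, 1L),
  descendant_concept_id = c(10L, 11L, 12L, 13L, 10L)
)

# Hypothetical usage counts (concept 13 is never recorded)
toy_use <- tibble(
  concept_id   = c(10L, 12L, 1L),
  record_count = c(50L, 7L, 3L)
)

# Descendants and ancestors of the codelist concepts
toy_related <- bind_rows(
  toy_concept_ancestor %>%
    filter(ancestor_concept_id %in% toy_codelist) %>%
    transmute(concept_id = descendant_concept_id),
  toy_concept_ancestor %>%
    filter(descendant_concept_id %in% toy_codelist) %>%
    transmute(concept_id = ancestor_concept_id)
) %>%
  distinct()

# Candidate orphans: related, not in the codelist, and actually used
toy_orphans <- toy_related %>%
  filter(!concept_id %in% toy_codelist) %>%
  inner_join(toy_use, by = "concept_id")

toy_orphans
```

Here concepts 12 and 1 are flagged, while concept 13 is related but never recorded in the toy data and so is dropped.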

Run diagnostics within a cohort

You can also evaluate how the codelist is used within a specific cohort. First, we will define a cohort using the conceptCohort() function from the CohortConstructor package.

cdm[["depression"]] <- conceptCohort(cdm, 
                                     conceptSet = depression, 
                                     name = "depression")

Then, we can summarise the code use within this cohort:

cohort_code_use <- summariseCohortCodeUse(cdm,
                                          cohortTable = "depression",
                                          countBy = c("record", "person"))
tableCohortCodeUse(cohort_code_use)

Summarise code use at cohort entry

Use the timing argument to restrict diagnostics to codes used at the entry date of the cohort.

cohort_code_use <- summariseCohortCodeUse(cdm,
                                          cohortTable = "depression",
                                          countBy = c("record", "person"),
                                          timing = "entry")
tableCohortCodeUse(cohort_code_use)
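
Conceptually, restricting to cohort entry means keeping only the records dated on each person's cohort start date. The sketch below shows this on toy tables. The subject_id and cohort_start_date columns follow the OMOP cohort table convention; the record_date column, the values, and the toy_ names are hypothetical. It is meant as an illustration of the filter, not as the package's implementation.

```r
library(dplyr)

# Toy cohort table (hypothetical values)
toy_cohort <- tibble(
  subject_id        = c(1, 2),
  cohort_start_date = as.Date(c("2012-03-01", "2015-06-10"))
)

# Toy patient-level records (hypothetical values)
toy_records <- tibble(
  person_id   = c(1, 1, 2, 2),
  concept_id  = c(101, 101, 101, 202),
  record_date = as.Date(c("2012-03-01", "2013-01-15",
                          "2015-06-10", "2015-06-10"))
)

# Keep only records that fall on the cohort entry date
toy_entry_records <- toy_records %>%
  inner_join(toy_cohort, by = c("person_id" = "subject_id")) %>%
  filter(record_date == cohort_start_date)

toy_entry_records %>%
  count(concept_id, name = "record_count")
```

The record for person 1 dated 2013-01-15 is excluded because it does not fall on that person's cohort start date.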

Cohort code use with a different codelist

By default, we get cohort code use for the codes that were used to create the cohort. But we can instead supply a different codelist. Here we get counts for anxiety codes that occur on the same day as entry into the depression cohort.

anxiety <- getCandidateCodes(cdm,
                             keywords = "anxiety")
anxiety <- newCodelist(list("anxiety" = anxiety$concept_id))

cohort_code_use <- summariseCohortCodeUse(cdm,
                                          cohortTable = "depression",
                                          x = anxiety,
                                          countBy = c("record", "person"),
                                          timing = "entry")
tableCohortCodeUse(cohort_code_use)

Stratify cohort code use

You can also stratify cohort code use results by year (byYear), by sex (bySex), or by age group (ageGroup):

cohort_code_use <- summariseCohortCodeUse(cdm = cdm,
                                          cohortTable = "depression",
                                          countBy = c("record", "person"),
                                          byYear = FALSE,
                                          bySex = TRUE,
                                          ageGroup = NULL)
tableCohortCodeUse(cohort_code_use)