---
title: "Adding a New Country to epicR"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 4

vignette: >
  %\VignetteIndexEntry{Adding a New Country to epicR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

oldopt <- options(rmarkdown.html_vignette.check_title = FALSE)
on.exit(options(oldopt))
```

## Introduction

EPIC (Evaluation Platform in COPD) was originally developed for the Canadian healthcare context. However, the model's architecture supports adaptation to different countries and healthcare systems through jurisdiction-specific configuration files. This vignette explains how to add a new country to epicR.

## Overview of the Configuration System

epicR uses JSON configuration files to store country-specific parameters. These files contain all the epidemiological, demographic, economic, and healthcare system parameters needed to run the model for a specific jurisdiction.

### Current Configuration Files

- `inst/config/config_canada.json` - Fully configured Canadian parameters
- `inst/config/config_us.json` - US template with placeholders (requires configuration)

## Step-by-Step Guide to Adding a New Country

### Step 1: Create the Configuration File

Choose a file name for your country in the `inst/config/` directory, following the `config_<jurisdiction>.json` pattern used by the existing files:

```r
# Example: Adding Germany
file_name <- "inst/config/config_germany.json"
```

### Step 2: Copy the Template Structure

Start by copying the structure from an existing configuration file. You can use the US template as a starting point:

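```{r, eval=FALSE}
# Run from the root of the epicR source tree (paths are relative to it).
# Read the US template without simplification so that, for example,
# single-element arrays are not turned into scalars by auto_unbox below
us_config <- jsonlite::fromJSON("inst/config/config_us.json",
                                simplifyVector = FALSE)

# Modify for Germany
germany_config <- us_config
germany_config$jurisdiction <- "germany"

# Save the template: digits = NA keeps full numeric precision (the default
# rounds to 4 decimal places) and null = "null" keeps JSON nulls as null
jsonlite::write_json(germany_config, "inst/config/config_germany.json",
                     pretty = TRUE, auto_unbox = TRUE,
                     digits = NA, null = "null")
```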

### Step 3: Parameter Categories to Configure

The configuration file contains several major parameter categories that need country-specific data:

#### 3.1 Global Parameters
```json
{
  "global_parameters": {
    "age0": 40,
    "time_horizon": 20,
    "discount_cost": 0.03,
    "discount_qaly": 0.03,
    "closed_cohort": 0
  }
}
```

**Required data:**
- Discount rates for costs and QALYs (country-specific economic guidelines)
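
To see what `discount_cost` and `discount_qaly` do, the sketch below applies standard annual discounting to a stream of yearly amounts, with year 0 left undiscounted. It is a generic illustration under that assumption, not necessarily how epicR discounts internally (the model may, for example, discount continuously); `present_value()` is a hypothetical helper.

```r
# Present value of yearly amounts under annual discounting (assumption:
# year 0 is undiscounted, year t is divided by (1 + rate)^t)
present_value <- function(amounts, rate) {
  years <- seq_along(amounts) - 1
  sum(amounts / (1 + rate)^years)
}

# 100 per year over a 20-year horizon, at 3% and undiscounted
pv_3 <- present_value(rep(100, 20), rate = 0.03)
pv_0 <- present_value(rep(100, 20), rate = 0)  # 2000
```

Comparing `pv_3` with `pv_0` shows how much a given rate shrinks costs or QALYs accrued late in a 20-year horizon.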

#### 3.2 Demographics and Population
```json
{
  "agent": {
    "p_female": 0.51,
    "height_0_betas": [...],
    "weight_0_betas": [...],
    "p_prevalence_age": [...],
    "p_bgd_by_sex": {
      "male": [...],
      "female": [...]
    }
  }
}
```

**Required data sources:**
- **Population demographics**: National statistics office
- **Age pyramid**: Current population by age group
- **Life tables**: Age- and sex-specific mortality rates
- **Anthropometric data**: Height and weight distributions by age and sex
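
Life tables often report central death rates (m) rather than annual probabilities of death (q). If a parameter such as `p_bgd_by_sex` expects annual probabilities (an assumption here; check the values in the Canadian file and how the model uses them), a constant-hazard conversion is q = 1 - exp(-m). The sketch also derives `p_female` from population counts; the helper names and counts are hypothetical.

```r
# Probability of dying within t years given a constant mortality rate
# (per person-year), assuming the hazard is constant over the interval
rate_to_prob <- function(rate, t = 1) {
  1 - exp(-rate * t)
}

# Share of females in the modelled population from (hypothetical) counts
female_share <- function(n_female, n_male) {
  n_female / (n_female + n_male)
}

q_example <- rate_to_prob(c(0.005, 0.02, 0.08))
p_female_example <- female_share(n_female = 51, n_male = 49)  # 0.51
```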

#### 3.3 Smoking Patterns
```json
{
  "smoking": {
    "logit_p_current_smoker_0_betas": [...],
    "minimum_smoking_prevalence": 0.12,
    "mortality_factor_current": [...],
    "mortality_factor_former": [...]
  }
}
```

**Required data sources:**
- **Smoking prevalence**: National health surveys
- **Smoking-related mortality**: Meta-analyses or national studies
- **Smoking cessation rates**: Longitudinal studies
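
National health surveys usually come with sampling weights, and unweighted proportions can misstate prevalence. A minimal sketch of weighted smoking prevalence, overall and by age group, on a made-up survey extract (column names are hypothetical). For design-based standard errors, the `survey` package (`svydesign()`, `svymean()`) is the usual tool.

```r
# Made-up survey extract for illustration (not real data)
survey_extract <- data.frame(
  age_group = c("40-54", "40-54", "55-69", "55-69", "70+", "70+"),
  current_smoker = c(1, 0, 1, 0, 0, 1),
  weight = c(2, 1, 1, 3, 2, 2)
)

# Weighted proportion: weights of smokers divided by all weights
weighted_prevalence <- function(x, w) sum(x * w) / sum(w)

overall <- weighted_prevalence(survey_extract$current_smoker,
                               survey_extract$weight)
by_age <- sapply(split(survey_extract, survey_extract$age_group), function(d) {
  weighted_prevalence(d$current_smoker, d$weight)
})
overall  # 5/11, about 0.455
```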

#### 3.4 COPD Epidemiology
```json
{
  "COPD": {
    "logit_p_COPD_betas_by_sex": {
      "male": [...],
      "female": [...]
    },
    "ln_h_COPD_betas_by_sex": {
      "male": [...],
      "female": [...]
    }
  }
}
```

**Required data sources:**
- **COPD prevalence**: Spirometry-based population studies
- **COPD incidence**: Longitudinal cohort studies
- **Risk factors**: Associations of smoking, age, and sex with COPD
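
The parameter names suggest two kinds of linear predictor: `logit_p_COPD_betas_by_sex` on the logit scale (prevalence) and `ln_h_COPD_betas_by_sex` on the log-hazard scale (incidence). Assuming that reading, the sketch converts each to a probability so candidate coefficients can be checked against published prevalence and incidence. The covariates and coefficient values are hypothetical; the real covariates and their order must match what epicR expects.

```r
# Prevalence from a logit-scale linear predictor: p = 1 / (1 + exp(-x'b))
prevalence_from_logit <- function(betas, x) {
  plogis(sum(betas * x))
}

# Incidence probability over t years from a log-hazard linear predictor,
# assuming a constant hazard h = exp(x'b): p = 1 - exp(-h * t)
incidence_from_log_hazard <- function(betas, x, t = 1) {
  1 - exp(-exp(sum(betas * x)) * t)
}

# Hypothetical covariates: intercept, age, pack-years
x <- c(1, 60, 20)
prev <- prevalence_from_logit(c(-6, 0.05, 0.03), x)
inc <- incidence_from_log_hazard(c(-8, 0.04, 0.02), x)
```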

#### 3.5 Healthcare Costs
```json
{
  "cost": {
    "bg_cost_by_stage": [...],
    "exac_dcost": [...],
    "cost_gp_visit": 85.50,
    "cost_outpatient_diagnosis": 125.50,
    "cost_smoking_cessation": 485.25
  }
}
```

**Required data sources:**
- **Healthcare unit costs**: National fee schedules or health economics studies
- **COPD treatment costs**: Health administrative data or costing studies
- **Currency and price year**: Express all costs in one currency (local, or standardized to USD/EUR) and one price year
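
Costs from different years and countries typically need two adjustments: inflating to a common price year with a price index (for example a health-care consumer price index) and, when standardizing across countries, converting with a purchasing power parity (PPP) factor expressed as local currency units per international dollar. The index values and PPP factor below are placeholders, not real statistics, and the helper names are hypothetical.

```r
# Inflate a cost between price years using a price index
inflate_cost <- function(cost, index_from, index_to) {
  cost * index_to / index_from
}

# Convert local currency to international dollars with a PPP conversion
# factor given in local currency units (LCU) per international dollar
to_international_dollars <- function(cost_lcu, ppp_lcu_per_intl_dollar) {
  cost_lcu / ppp_lcu_per_intl_dollar
}

# Placeholder values for illustration only
gp_cost_2019 <- 24.00
gp_cost_2023 <- inflate_cost(gp_cost_2019, index_from = 100, index_to = 112.5)  # 27
gp_cost_intl <- to_international_dollars(gp_cost_2023,
                                         ppp_lcu_per_intl_dollar = 0.75)  # 36
```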

#### 3.6 Healthcare Utilization
```json
{
  "outpatient": {
    "ln_rate_gpvisits_COPD_by_sex": {
      "male": [...],
      "female": [...]
    }
  }
}
```

**Required data sources:**
- **Healthcare utilization patterns**: Administrative health data
- **GP visit rates**: Primary care databases
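
`ln_rate_gpvisits_COPD_by_sex` is named as a log rate, which is what a Poisson regression with a log person-time offset estimates. The sketch fits such a model to simulated data with known coefficients (age centred at 60) and recovers them; all names and values are hypothetical. With real administrative data you would fit one model per sex, using the covariates epicR expects.

```r
set.seed(1)
n <- 20000
visits_data <- data.frame(
  age = runif(n, 40, 80),
  person_years = runif(n, 0.5, 1)
)
true_log_rate <- c(intercept = log(2), age = 0.02)

# Simulated visits: mean = person-time * exp(b0 + b1 * (age - 60))
mu <- visits_data$person_years *
  exp(true_log_rate[["intercept"]] +
        true_log_rate[["age"]] * (visits_data$age - 60))
visits_data$gp_visits <- rpois(n, mu)

# Poisson regression with log(person-time) as offset: coefficients are on
# the log-rate scale (visits per person-year)
fit <- glm(gp_visits ~ I(age - 60) + offset(log(person_years)),
           family = poisson(), data = visits_data)
ln_rate_betas <- unname(coef(fit))
```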

### Step 4: Data Collection Strategy

#### Essential Data Sources

1. **National Statistics Offices**
   - Population demographics
   - Life tables
   - Health surveys

2. **Health Administrative Databases**
   - Healthcare utilization
   - Treatment costs
   - Disease prevalence

3. **Published Literature**
   - Disease-specific studies
   - Health economics evaluations
   - Epidemiological studies

4. **International Databases**
   - WHO Global Health Observatory
   - OECD Health Statistics
   - Global Burden of Disease Study

#### Data Quality Considerations

- **Representativeness**: Ensure data represents the target population
- **Recency**: Use the most recent available data
- **Consistency**: Maintain consistent definitions across parameters
- **Validation**: Cross-check with multiple sources when possible

### Step 5: Parameter Estimation

When direct data is not available, parameters can be estimated using:

#### 5.1 Regression Models
```r
# Example: estimating COPD prevalence coefficients by logistic regression
# on individual-level survey data (`survey_data` is a placeholder for your own data)
model <- glm(copd ~ age + sex + smoking_status + pack_years, 
             data = survey_data, family = binomial())
coefficients(model)
```
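
Because the configuration stores COPD coefficients separately for each sex (`logit_p_COPD_betas_by_sex`), one option is to fit a separate model per sex. The sketch does that on simulated data with known coefficients so the recovery can be checked; the covariates (age, pack-years), their order and the data are hypothetical and must be matched to the terms epicR actually uses.

```r
set.seed(2024)
n <- 20000
sim <- data.frame(
  sex = sample(c("male", "female"), n, replace = TRUE),
  age = runif(n, 40, 90),
  pack_years = rgamma(n, shape = 1, scale = 15)
)

# Known coefficients (intercept, age, pack-years) for each sex
true_betas <- rbind(male = c(-6.0, 0.05, 0.030),
                    female = c(-6.5, 0.05, 0.035))
X <- cbind(1, sim$age, sim$pack_years)
lp <- rowSums(X * true_betas[sim$sex, ])
sim$copd <- rbinom(n, 1, plogis(lp))

# Fit one logistic model per sex and keep the coefficient vectors
betas_by_sex <- lapply(split(sim, sim$sex), function(d) {
  coef(glm(copd ~ age + pack_years, data = d, family = binomial()))
})
```

`lapply(betas_by_sex, unname)` turns the result into plain vectors that could be placed under `male` and `female` in the JSON file.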

#### 5.2 Literature Meta-analysis
- Systematic review of published studies
- Meta-analytic techniques to pool estimates
- Adjustment for population differences
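
A common way to pool study-level prevalence estimates is fixed-effect inverse-variance weighting on the logit scale, sketched below with made-up study values and hypothetical helper names. It ignores between-study heterogeneity; a random-effects model (for example `metafor::rma()`) is usually more appropriate when study populations differ.

```r
# Fixed-effect inverse-variance pooling of estimates y with standard errors se
pool_inverse_variance <- function(y, se) {
  w <- 1 / se^2
  list(estimate = sum(w * y) / sum(w), se = sqrt(1 / sum(w)))
}

# Pool prevalences on the logit scale, then back-transform
pool_prevalence <- function(p, n) {
  logit_p <- qlogis(p)
  se_logit <- sqrt(1 / (n * p * (1 - p)))  # delta-method SE of logit(p)
  pooled <- pool_inverse_variance(logit_p, se_logit)
  plogis(pooled$estimate)
}

# Made-up studies: observed prevalence and sample size
pooled_prev <- pool_prevalence(p = c(0.08, 0.10, 0.12),
                               n = c(1000, 2000, 1500))
```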

#### 5.3 Calibration
- Use model calibration to match observed outcomes
- Iterative adjustment of parameters
- Validation against external data sources
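
As a toy illustration of calibration, the sketch adjusts only the intercept of a logistic prevalence equation until the mean predicted prevalence in a reference population matches an observed target, using `uniroot()`. Calibrating the full model is more involved (it targets simulated outputs); the age distribution, slope and target here are hypothetical.

```r
set.seed(7)
ages <- runif(5000, 40, 80)  # hypothetical reference population
age_slope <- 0.04            # slope held fixed during calibration
target_prevalence <- 0.10    # hypothetical observed prevalence

# Gap between mean predicted prevalence and the target, as a function
# of the intercept
prevalence_gap <- function(intercept) {
  mean(plogis(intercept + age_slope * ages)) - target_prevalence
}

# Find the intercept at which the gap is zero
calibrated_intercept <- uniroot(prevalence_gap, interval = c(-20, 20),
                                tol = 1e-10)$root
```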

### Step 6: Testing the New Configuration

Once you have populated the configuration file:

#### 6.1 Test Loading
```{r, eval=FALSE}
# Test that the configuration loads without errors
library(epicR)
input <- get_input(jurisdiction = "germany")
```
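
Loading errors are often caused by keys missing from the new file. A structural comparison with the template can list them before `get_input()` is called. The sketch uses inline JSON so it runs on its own; in practice you would read `inst/config/config_us.json` and your new file with `jsonlite::fromJSON(path, simplifyVector = FALSE)`. `key_paths()` is a hypothetical helper.

```r
# Paths (e.g. "cost$cost_gp_visit") of all leaves in a nested named list;
# unnamed lists (JSON arrays) are treated as leaves
key_paths <- function(x, prefix = NULL) {
  if (!is.list(x) || is.null(names(x))) return(prefix)
  unlist(lapply(names(x), function(nm) {
    key_paths(x[[nm]], paste(c(prefix, nm), collapse = "$"))
  }), use.names = FALSE)
}

template <- jsonlite::fromJSON(
  '{"global_parameters": {"age0": 40, "time_horizon": 20},
    "cost": {"cost_gp_visit": 85.5, "exac_dcost": [1, 2]}}',
  simplifyVector = FALSE)
candidate <- jsonlite::fromJSON(
  '{"global_parameters": {"age0": 40},
    "cost": {"cost_gp_visit": 25.5, "exac_dcost": [3, 4]}}',
  simplifyVector = FALSE)

missing_keys <- setdiff(key_paths(template), key_paths(candidate))
missing_keys  # "global_parameters$time_horizon"
```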

#### 6.2 Run Basic Simulation
```{r, eval=FALSE}
# Run a small simulation to check for errors
results <- simulate(
  jurisdiction = "germany",
  n_agents = 10000,
  time_horizon = 5
)
print(results$basic)
```

#### 6.3 Validate Outputs
- Compare population demographics to national statistics
- Check COPD prevalence against published estimates
- Validate healthcare utilization patterns
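
Comparisons with external targets are easier to repeat when scripted. The sketch takes named vectors of simulated values and targets and flags any relative difference above a tolerance. The output names, numbers and the 10% tolerance are placeholders; how you extract simulated values depends on the structure of the simulation results.

```r
# Compare simulated values with targets using a relative tolerance
compare_to_targets <- function(simulated, target, rel_tol = 0.10) {
  stopifnot(identical(sort(names(simulated)), sort(names(target))))
  simulated <- simulated[names(target)]  # align by name
  rel_diff <- abs(simulated - target) / abs(target)
  data.frame(
    output = names(target),
    simulated = unname(simulated),
    target = unname(target),
    rel_diff = unname(rel_diff),
    within_tolerance = unname(rel_diff <= rel_tol)
  )
}

# Placeholder numbers for illustration only
check <- compare_to_targets(
  simulated = c(copd_prevalence_40plus = 0.10, share_female_40plus = 0.53),
  target = c(copd_prevalence_40plus = 0.10, share_female_40plus = 0.48)
)
```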

### Step 7: Documentation

Create documentation for your new country configuration:

#### 7.1 Data Sources Documentation
```r
# Create a data sources file
data_sources <- list(
  demographics = "German Federal Statistical Office, 2023",
  copd_prevalence = "BOLD Study Germany, 2022",
  healthcare_costs = "German Health Economics Association, 2023"
)
```

#### 7.2 Parameter Assumptions
Document any assumptions or approximations made during parameter estimation.
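
One way to keep sources and assumptions next to the numbers is a small table with one row per parameter, saved alongside the configuration. The sketch writes it as CSV to a temporary file; the file location and column layout are only suggestions, and the source entries reuse the examples in this vignette or are left as placeholders.

```r
# One row per parameter: where the value came from and any assumption made
parameter_log <- data.frame(
  parameter = c("agent$p_female", "cost$cost_gp_visit",
                "COPD$logit_p_COPD_betas_by_sex"),
  source = c("German Federal Statistical Office, 2023",
             "German fee schedule, 2023",
             "BOLD Study Germany, 2022"),
  assumption = c("", "", "<describe any approximation or adjustment>")
)

# In a package you might save this next to inst/config/config_germany.json;
# a temporary file keeps the example self-contained
log_file <- tempfile(fileext = ".csv")
write.csv(parameter_log, log_file, row.names = FALSE)
reloaded <- read.csv(log_file)
```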

## Example: Adding Germany

Here's a simplified example of adding Germany to epicR:

```{r, eval=FALSE}
# 1. Create base configuration (simplified: only a few parameters are shown;
#    a usable file needs the full set of parameters from the template)
germany_config <- list(
  jurisdiction = "germany",
  global_parameters = list(
    age0 = 40,
    time_horizon = 20,
    discount_cost = 0.03,  # German health economics guidelines
    discount_qaly = 0.03,
    closed_cohort = 0
  ),
  agent = list(
    p_female = 0.507  # German Federal Statistical Office 2023
    # ... other parameters
  ),
  cost = list(
    cost_gp_visit = 25.50,  # German fee schedule 2023
    cost_outpatient_diagnosis = 85.40
    # ... other costs
  )
  # ... other parameter categories
)

# 2. Save configuration (digits = NA keeps full numeric precision)
jsonlite::write_json(germany_config,
                     "inst/config/config_germany.json",
                     pretty = TRUE, auto_unbox = TRUE, digits = NA)

# 3. Test the configuration
library(epicR)
input <- get_input(jurisdiction = "germany")
```
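
Writing only the fields listed above would leave out every other parameter. One way to avoid that is to merge the Germany-specific values onto the full template with `utils::modifyList()`, which replaces matching named entries recursively and keeps everything else. The sketch uses a tiny inline stand-in for the template so it runs on its own. `modifyList()` only recurses into *named* lists, so array-valued overrides are given as numeric vectors (`c(...)`) rather than lists; an unnamed list would be silently ignored.

```r
# Tiny stand-in for the full template; in practice read
# inst/config/config_us.json with fromJSON(..., simplifyVector = FALSE)
template <- jsonlite::fromJSON(
  '{"jurisdiction": "us",
    "global_parameters": {"age0": 40, "discount_cost": 0.03},
    "cost": {"cost_gp_visit": 85.5, "exac_dcost": [1, 2, 3, 4]}}',
  simplifyVector = FALSE)

# Germany-specific values; arrays as numeric vectors so they are replaced
overrides <- list(
  jurisdiction = "germany",
  cost = list(cost_gp_visit = 25.50, exac_dcost = c(10, 20, 30, 40))
)

# Recursive merge: matching named entries are replaced, the rest kept
germany_config <- utils::modifyList(template, overrides)

# Serialise with full precision; write_json() accepts the same arguments
json_text <- jsonlite::toJSON(germany_config, auto_unbox = TRUE,
                              digits = NA, pretty = TRUE)
```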

## Quality Assurance

### Validation Checklist

- [ ] All placeholder values replaced with actual data
- [ ] Data sources documented
- [ ] Parameter ranges are clinically plausible
- [ ] Model outputs match expected population patterns
- [ ] Healthcare costs reflect local context
- [ ] Currency and units are consistent
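
Part of the plausibility check can be automated. The sketch walks a configuration and flags any entry whose name starts with `p_` (the pattern of parameters such as `p_female` and `p_bgd_by_sex`) but contains numeric values outside [0, 1]. The prefix rule is a heuristic based on those names, not something epicR enforces, and `flag_bad_probabilities()` is a hypothetical helper.

```r
# Recursively flag entries named "p_*" whose numeric values fall outside [0, 1]
flag_bad_probabilities <- function(x, path = character(0)) {
  if (!is.list(x)) return(character(0))
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  flagged <- character(0)
  for (i in seq_along(x)) {
    here <- c(path, nms[i])
    values <- unlist(x[[i]])
    if (startsWith(nms[i], "p_") && is.numeric(values) &&
        any(values < 0 | values > 1, na.rm = TRUE)) {
      flagged <- c(flagged, paste(here, collapse = "$"))
    }
    flagged <- c(flagged, flag_bad_probabilities(x[[i]], here))
  }
  flagged
}

config <- jsonlite::fromJSON(
  '{"agent": {"p_female": 0.51,
              "p_bgd_by_sex": {"male": [0.01, 1.2], "female": [0.01, 0.02]}},
    "cost": {"cost_gp_visit": 85.5}}',
  simplifyVector = FALSE)

flagged <- flag_bad_probabilities(config)
flagged  # "agent$p_bgd_by_sex"
```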

### Common Issues and Solutions

1. **Missing Data**: Use proxy data from similar countries or systematic reviews
2. **Currency Conversion**: Use purchasing power parity adjustments
3. **Different Healthcare Systems**: Adapt utilization patterns to local context
4. **Data Quality**: Document limitations and uncertainty

## Contributing Your Configuration

If you've successfully created a configuration for a new country:

1. **Validate thoroughly** using local data
2. **Document data sources** and methods
3. **Consider contributing** to the epicR project
4. **Share with the community** for peer review

## Conclusion

Adding a new country to epicR requires substantial data collection and parameter estimation, but the modular configuration system makes this process systematic and reproducible. The key is to ensure high-quality, country-specific data while maintaining the model's scientific rigor.

For questions or assistance with adding a new country, consider:
- Reviewing published adaptations of EPIC
- Consulting with local health economists
- Engaging with the epicR development community
- Collaborating with researchers who have local data access
