---
title: "Get Started with ukbflow"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Get Started with ukbflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  eval     = FALSE
)
```

## Welcome to `ukbflow`

**`ukbflow`** is an R package for UK Biobank (UKB) analysis on the
[Research Analysis Platform (RAP)](https://ukbiobank.dnanexus.com).
It covers the full midstream-to-downstream pipeline, from phenotype derivation
and association analysis to publication-ready figures and genetic risk scoring.
It is designed for RAP-native UKB workflows and includes simulated data, so you
can develop and test code locally without a RAP connection.

## Installation

```{r install}
# From CRAN (recommended)
install.packages("ukbflow")

# Latest development version from GitHub (requires the pak package)
# install.packages("pak")
pak::pkg_install("evanbio/ukbflow")
```

## A Quick Taste

### Load data

```{r load-data}
library(ukbflow)

df <- ops_toy()   # synthetic UKB-like cohort, no RAP connection needed

# On RAP, replace with:
# auth_login()
# auth_select_project("project-XXXXXXXXXXXX")
# df <- extract_pheno(c(31, 21022, 21001, 53, 20116)) |>
#   decode_values() |>
#   decode_names()
```

### Derive a disease phenotype

```{r derive}
df <- df |>
  derive_missing() |>                                            # recode "Prefer not to answer" → NA
  derive_selfreport(name = "t2dm", regex = "diabetes",           # self-reported diabetes (broad regex)
                    field = "noncancer") |>
  derive_icd10(name = "t2dm", icd10 = "E11", source = "hes") |>  # T2DM from HES
  derive_case(name = "t2dm") |>                                  # → t2dm_status, t2dm_date
  derive_followup(name         = "t2dm",
                  event_col    = "t2dm_date",
                  baseline_col = "p53_i0",                       # assessment centre date
                  censor_date  = as.Date("2022-06-01"))
```
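`derive_followup()` converts the case date, the baseline date and the censoring date into a follow-up time. The base-R sketch below illustrates the arithmetic such a step usually involves. It is an assumption about the general convention, not ukbflow's internal code: follow-up ends at the earlier of the event date and the censoring date, events after the censoring date are treated as censored, and days are converted to years using a 365.25-day year. All dates are invented.

```{r followup-sketch}
# Invented example dates (not ukbflow output, not real UKB data)
baseline <- as.Date(c("2008-03-15", "2009-07-01", "2010-01-20"))
event    <- as.Date(c("2015-05-10", NA, "2023-01-01"))  # NA = no event recorded
censor   <- as.Date("2022-06-01")                        # administrative censoring date

# Assumed convention: an event counts only if it occurs on or before censoring
status <- as.integer(!is.na(event) & event <= censor)

# Follow-up ends at the earlier of the event date and the censoring date
end_date <- pmin(event, censor, na.rm = TRUE)

# Convert days to years (365.25-day year is an assumption)
followup_years <- as.numeric(end_date - baseline) / 365.25

data.frame(baseline, event, status, end_date,
           followup_years = round(followup_years, 2))
```

With these toy dates, the first participant has an event after about 7.2 years, while the other two are censored on 2022-06-01 (the third because their event falls after the censoring date).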

### Run an association model

```{r assoc}
res <- assoc_coxph(
  data         = df,
  outcome_col  = "t2dm_status",
  time_col     = "t2dm_followup_years",
  exposure_col = "p21001_i0",   # BMI (continuous)
  covariates   = c("p21022",    # age_at_recruitment
                   "p31")       # sex
)
```
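Assuming that `assoc_coxph()` fits a standard Cox proportional hazards model (its internals are not shown here), the sketch below fits a comparable model directly with the `survival` package. The data are simulated: `bmi`, `age`, `sex`, the 14-year censoring window and the true log hazard ratio of 0.08 per BMI unit are all invented for illustration.

```{r coxph-sketch}
library(survival)

set.seed(2024)
n   <- 2000
bmi <- rnorm(n, mean = 27, sd = 4)
age <- runif(n, 40, 70)
sex <- rbinom(n, 1, 0.5)

# Simulated exponential event times; true HR = exp(0.08) per 1 kg/m^2 of BMI
rate    <- 0.01 * exp(0.08 * (bmi - 27) + 0.03 * (age - 55))
t_event <- rexp(n, rate)

# Administrative censoring after 14 years of follow-up
time   <- pmin(t_event, 14)
status <- as.integer(t_event <= 14)

fit <- coxph(Surv(time, status) ~ bmi + age + sex)
summary(fit)$conf.int["bmi", ]   # exp(coef), exp(-coef), lower .95, upper .95
```

The estimated `exp(coef)` for `bmi` should land near `exp(0.08)` (about 1.08), the hazard ratio per 1-unit increase in BMI used in the simulation.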

### Plot the results

```{r plot}
# Forest plot — see vignette("plot") for full usage
res_df <- as.data.frame(res)
plot_forest(
  data      = res_df,
  est       = res_df$HR,
  lower     = res_df$CI_lower,
  upper     = res_df$CI_upper,
  ci_column = 7L   # draw the CI graphic in column 7 (res_df has 6 columns before HR)
)

# Table 1
plot_tableone(
  data   = as.data.frame(df),
  vars   = c("p21022",     # age_at_recruitment
             "p31",        # sex
             "p21001_i0"), # BMI
  strata = "t2dm_status"
)
```
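A "Table 1" is a set of descriptive summaries stratified by a grouping variable. As a base-R sketch of the kind of numbers `plot_tableone()` reports (the exact statistics and layout ukbflow uses may differ), the code below computes mean (SD) for continuous variables and column percentages for a categorical variable. The data frame `toy` and its columns are invented for illustration.

```{r tableone-sketch}
set.seed(1)
n <- 200
toy <- data.frame(
  age         = runif(n, 40, 70),
  sex         = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  bmi         = rnorm(n, mean = 27, sd = 4),
  t2dm_status = factor(rbinom(n, 1, 0.1), levels = 0:1,
                       labels = c("No T2DM", "T2DM"))
)

# Continuous variables: mean (SD) within each stratum
fmt_mean_sd <- function(x) sprintf("%.1f (%.1f)", mean(x), sd(x))
cont <- sapply(split(toy[c("age", "bmi")], toy$t2dm_status),
               function(d) vapply(d, fmt_mean_sd, character(1)))
cont

# Categorical variable: counts and column percentages within each stratum
tab <- table(toy$sex, toy$t2dm_status)
tab
round(100 * prop.table(tab, margin = 2), 1)
```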

## Full Function Overview

| Module | Key functions | Vignette |
|---|---|---|
| Auth | `auth_login()`, `auth_select_project()` | `vignette("auth")` |
| Fetch | `fetch_ls()`, `fetch_file()`, `fetch_tree()` | `vignette("fetch")` |
| Extract | `extract_pheno()`, `extract_batch()`, `extract_ls()` | `vignette("extract")` |
| Job | `job_wait()`, `job_status()`, `job_result()` | `vignette("job")` |
| Decode | `decode_values()`, `decode_names()` | `vignette("decode")` |
| Derive | `derive_missing()`, `derive_icd10()`, `derive_case()` | `vignette("derive")` |
| Survival | `derive_timing()`, `derive_age()`, `derive_followup()` | `vignette("derive-survival")` |
| Assoc | `assoc_coxph()`, `assoc_logistic()`, `assoc_subgroup()` | `vignette("assoc")` |
| Plot | `plot_forest()`, `plot_tableone()` | `vignette("plot")` |
| GRS | `grs_check()`, `grs_score()`, `grs_validate()` | `vignette("grs")` |
| Ops | `ops_setup()`, `ops_toy()`, `ops_snapshot()` | `vignette("ops")` |

## End-to-End Case Study

For a complete worked example using a simulated UK Biobank cohort — covering
data loading, phenotype derivation, cohort assembly, Cox regression, and
publication-ready visualisation — see:

`vignette("smoking_lung_cancer")` — **Smoking and Lung Cancer Risk: A Complete Analysis Workflow**

## Additional Resources

- [Documentation site](https://evanbio.github.io/ukbflow/)
- [GitHub](https://github.com/evanbio/ukbflow)
- View all functions: `?ukbflow` or `help(package = "ukbflow")`

> *"All models are wrong, but some are publishable."*
>
> — after George Box
