---
title: "Quick Start with the baorista R package"
author: "Enrico Crema"
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
toc: true
fig_caption: true
self_contained: yes
fontsize: 11pt
documentclass: article
vignette: >
%\VignetteIndexEntry{Quick Start with the baorista R package}
%\VignetteEngine{knitr::rmarkdown_notangle}
---
```{r general setup, include = FALSE}
h = 3.5
w = 3.5
is_check <- ("CheckExEnv" %in% search()) || any(c("_R_CHECK_TIMINGS_",
"_R_CHECK_LICENSE_") %in% names(Sys.getenv()))
knitr::opts_chunk$set(fig.align = "center", eval = !is_check)
```
# Introduction
_Baorista_ is an R package that provides robust alternatives to aoristic analyses based on parametric and non-parametric Bayesian inference.
The package consists of a series of utility and wrapper functions as well as custom statistical distributions for the [NIMBLE](https://cran.r-project.org/package=nimble) probabilistic programming language.
This documents provides a short guide for the key steps required to setup datasets and execute the core functionalities of _baorista_.
## Installing and loading the _baorista_ package
Stable version of _Baorista_ is available directly on CRAN while development version can be installed via `devtools` with the following command:
```{r,eval=F}
library(devtools)
install_github('ercrema/baorista')
library(baorista)
```
```{r,echo=F}
library(baorista)
```
Please note that in order to execute all commands you need a working C++ compiler. For more further details please read the [nimble package documentation](https://r-nimble.org/html_manual/cha-installing-nimble.html)
# Using _baorista_
## Data Preparation
_baorista_ can read two types of data: a) `data.frame` class objects containing two fields recording the start and end date of the time-spans of existence recorded in BP, and b) matrices containing the probability of existence for each event (row) at each time-block (column). Sample datasets of the two formats are provided:
```{r}
data(sampledf) #two column data.frame format
data(samplemat) #events x time-block matrix format
```
The function `createProbMat()` creates an object of class `probMat` which standardise the data format and includes additional information such as chronological range and resolution:
```{r}
x.df <- createProbMat(x=sampledf,timeRange=c(6500,4001),resolution=50) #50 year timeblock ranging from 6500 to 4001 (i.e. 6500-6451,6450-6401,...)
x.mat <- createProbMat(pmat=samplemat,timeRange=c(5000,3001),resolution=100)
```
The `plot()` function displays `probMat` class object using aoristic sums:
```{r,fig.width=7,fig.height=4}
par(mfrow=c(1,2))
plot(x.df,main='x.df')
plot(x.mat,main='x.mat')
```
## Bayesian Inference
_baorista_ contains functions for running two types of Bayesian analyses: parametric and non-parametric. Parametric analyses requires the selection of a suitable growth model (each one is currently a separate function) and provides estimates of key model parameters. Non-parametric models employs an intrinsic conditional auto-regressive (ICAR) Gaussian model which enables the calculation of the probability mass function (i.e. it estimates the probability of _any_ event occurring at each timeblock, effectively recovering the shape of the time-frequency distribution).
In all cases _baorista_ provides a _wrapper_ function which internally initialises and runs an MCMC via the NIMBLE package. These function allow for user-defined MCMC settings (e.g. number of iterations, number chains, etc.), samplers, and priors and automatically assesses convergence statistics.
#### Estimating Exponential Growth Rate
The most straightforward model is a truncated discrete exponential distribution. Users can fit the data to such distribution and estimate growth rates using the function `expfit()`.
```{r}
exponential.fit <- expfit(x.mat) #fitting using default MCMC/Prior settings
# expfit(x.mat,rPrior='dnorm(mean=0,sd=1)') #Using a Gaussian prior with mean 0 and sd 1 for the growth rate r
# expfit(x.mat,niter=50000) #Fitting the model with 50k iterations
# expfit(x.mat,nchains=4,parallel=T) #Running 4 chains in parallel
```
Summary statistics on the posterior parameters and the MCMC diagnostics can be retrieved using the `summary()` function:
```{r}
summary(exponential.fit)
```
Alternatively values can be extracted directly:
```{r}
#Compute 90% HPDI with coda package
library(coda)
mcmc(exponential.fit$posterior.r) |> HPDinterval(prob=0.90)
#Plot histogram of posterior
hist(exponential.fit$posterior.r[,1],xlab='r',main='Posterior of Growth Rate')
```
A dedicated plot function can also be used to visualise the fitted exponential model:
```
plot(exponential.fit)
```
#### Non-Parametric Modelling via Random Walk ICAR Model
_baorista_ offers also a non-parametric model based Random Walk ICAR.
The function `icarfit()` estimates the probability mass for each time-block accounting for the information from adjacent blocks (and hence temporal autocorrelation).
Given the larger number of parameters, the MCMC requires longer chains and the execution over multiple cores is recommended.
Convergence issues are also more likely to arise, in which case adjust running the model with longer chains, changing the sampler, or adjusting the priors is recommended.
```{r}
# The following reaches convergence but takes a couple of hours:
# non.param.fit <- icarfit(x.df,niter=2000000,nburnin=1000000,thin=100,nchains=4,parallel=TRUE)
# Shorter number of iterations (does not reach convergence)
non.param.fit <- icarfit(x.df,niter=100000,nburnin=50000,thin=50,nchains=4,parallel=TRUE)
```
A dedicated function can display the HPDI of the probability mass of each time-block:
```{r,fig.height=4,fig.width=5}
plot(non.param.fit)
```