---
title: "Getting Started with LactCurveModels"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with LactCurveModels}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Overview

**LactCurveModels** is an R package for fitting nonlinear lactation curve models
to daily milk yield records from dairy animals. The package fits up to 20
published models simultaneously, evaluates each model using six
goodness-of-fit metrics, and produces publication-ready diagnostic figures and
summary tables — all in a single function call.

Key features:

- Fits 20 nonlinear lactation curve models spanning four mathematical families
- Supports batch processing of multiple animals from a single CSV file
- Produces 15 diagnostic figures per animal plus 3 combined cross-animal outputs
- Reports R², Adjusted R², RMSE, AIC, BIC, and Durbin-Watson for every model
- Flexible selection of any subset of models or animals to analyse

---

## Installation

Install the package from CRAN:

```{r install-cran, eval=FALSE}
install.packages("LactCurveModels")
```

Or install the development version from GitHub:

```{r install-github, eval=FALSE}
# install.packages("devtools")
devtools::install_github("venkatesanraja/LactCurveModels")
```

Load the package:

```{r setup}
library(LactCurveModels)
```

---

## Input Data Format

Your input CSV file must contain exactly **three columns** in any column order:

| Column       | Type      | Description                                        |
|--------------|-----------|----------------------------------------------------|
| `animal_id`  | character | Unique identifier for each animal                  |
| `dim`        | integer   | Days-in-milk time index (typically 1 to 20)        |
| `dmy`        | numeric   | Daily milk yield in kg/day                         |

A typical CSV file looks like this:

```
animal_id,dim,dmy
Animal_A,1,9.3
Animal_A,2,9.9
Animal_A,3,10.2
...
Animal_B,1,8.1
Animal_B,2,8.7
...
```

> **Note:** The `dim` column represents a fortnightly time index across the
> lactation period. Each record corresponds to one fortnightly observation of
> daily milk yield for a given animal.

---

## Basic Usage

### Analyse All Animals with All 20 Models

```{r basic-usage, eval=FALSE}
results <- run_lactation_analysis(
  input_csv_path = "path/to/your/data.csv"
)
```

By default, all animals in the CSV are analysed using all 20 models, and
outputs are saved to a folder named `lactation_results` in the current
working directory.

### Specify a Custom Output Folder

```{r custom-outdir, eval=FALSE}
results <- run_lactation_analysis(
  input_csv_path = "path/to/your/data.csv",
  out_dir        = "D:/my_results/lactation_analysis"
)
```

---

## Selecting Specific Models

Use the `selected_models` argument to fit only a subset of the 20 available
models. This is useful when you already know which model families are most
appropriate for your data, or when you want faster processing.

```{r selected-models, eval=FALSE}
results <- run_lactation_analysis(
  input_csv_path  = "path/to/your/data.csv",
  selected_models = c("Wood_1967", "Wilmink_k005", "Ali_Schaeffer",
                      "Pollott_Multiplicative")
)
```

### All 20 Available Model Names

The full list of valid model name strings is given below, grouped by
mathematical family:

| Family                | Model Name String          | Reference                        |
|-----------------------|----------------------------|----------------------------------|
| Brody / Exponential   | `Brody_1923`               | Brody, 1923                      |
| Brody / Exponential   | `Brody_1924`               | Brody, 1924                      |
| Brody / Exponential   | `Parabolic_Exp_Sikka`      | Sikka, 1950                      |
| Brody / Exponential   | `Papajcsik_Bodero1`        | Papajcsik & Bodero, 1988         |
| Brody / Exponential   | `Papajcsik_Bodero3`        | Papajcsik & Bodero, 1988         |
| Wood / Gamma-type     | `Wood_1967`                | Wood, 1967                       |
| Wood / Gamma-type     | `Wood_Dhanoa`              | Dhanoa, 1981                     |
| Wood / Gamma-type     | `Cappio_Borlino`           | Cappio-Borlino et al., 1995      |
| Wood / Gamma-type     | `Papajcsik_Bodero2`        | Papajcsik & Bodero, 1988         |
| Wood / Gamma-type     | `Pollott_Multiplicative`   | Pollott, 2000                    |
| Polynomial / Linear   | `Cobby_LeDu`               | Cobby & Le Du, 1978              |
| Polynomial / Linear   | `Quadratic_Dave`           | Dave, 1971                       |
| Polynomial / Linear   | `Mixed_Log_GS`             | Guo & Swalve, 1995               |
| Polynomial / Linear   | `Khandekar`                | Khandekar, 1993                  |
| Polynomial / Linear   | `Inverse_Poly_Nelder`      | Nelder, 1966                     |
| Wilmink / Log-exp     | `Wilmink_k005`             | Wilmink, 1987                    |
| Wilmink / Log-exp     | `Wilmink_k_estimated`      | Wilmink, 1987                    |
| Wilmink / Log-exp     | `Ali_Schaeffer`            | Ali & Schaeffer, 1987            |
| Wilmink / Log-exp     | `Log_Quadratic_Adediran`   | Adediran et al., 2012            |
| Wilmink / Log-exp     | `Morant_Gnanasakthy`       | Morant & Gnanasakthy, 1989       |

---

## Selecting Specific Animals

Use the `selected_animals` argument to analyse only a subset of the animals
present in the CSV file. Animal IDs must match the values in the `animal_id`
column exactly.

```{r selected-animals, eval=FALSE}
results <- run_lactation_analysis(
  input_csv_path   = "path/to/your/data.csv",
  selected_animals = c("Animal_A", "Animal_C")
)
```

---

## Combined Selection of Models and Animals

All four arguments can be used together:

```{r combined, eval=FALSE}
results <- run_lactation_analysis(
  input_csv_path   = "path/to/your/data.csv",
  selected_models  = c("Wood_1967", "Wilmink_k005", "Ali_Schaeffer"),
  selected_animals = c("Animal_A", "Animal_B"),
  out_dir          = "D:/results"
)
```

---

## Understanding the Outputs

### Per-Animal Outputs

For each successfully processed animal, a sub-folder named after the animal ID
is created inside `out_dir`. It contains the following files:

#### CSV Tables

| File                                      | Contents                                         |
|-------------------------------------------|--------------------------------------------------|
| `AnimalID_parameter_estimates.csv`        | Fitted parameters and standard errors            |
| `AnimalID_summary_metrics.csv`            | R², Adj. R², AIC, BIC, RMSE, Durbin-Watson      |
| `AnimalID_actual_and_predicted_values.csv`| Observed vs predicted milk yield at each DIM     |
| `AnimalID_residuals.csv`                  | Residuals at each time point for all models      |

#### Diagnostic Figures (15 PNG files)

| Figure  | Description                                                        |
|---------|--------------------------------------------------------------------|
| Fig01   | All 20 fitted curves overlaid on observed data                     |
| Fig02   | Four-panel ranked model fits (best to worst R²)                    |
| Fig03   | Best 3 vs Worst 3 models side by side                              |
| Fig04   | Models grouped and coloured by mathematical family                 |
| Fig05   | Residual diagnostics for top 4 models (4 sub-panels each)         |
| Fig06   | Residual bubble chart across all models and time points            |
| Fig07   | R² and Adjusted R² bar chart for all models                       |
| Fig08   | RMSE bar chart for all models                                      |
| Fig09   | AIC and BIC bar chart for all models                               |
| Fig10   | Durbin-Watson statistic bar chart for all models                   |
| Fig11   | R² vs RMSE performance bubble chart coloured by model family       |
| Fig12   | Multi-metric ranking dot-plot (6 metrics simultaneously)           |
| Fig13   | Predicted vs actual scatter plots (one panel per model)            |
| Fig14   | Pearson correlation heatmap of model predictions                   |
| Fig15   | Predicted peak yield and time-to-peak bar charts                   |

### Combined Cross-Animal Outputs

When more than one animal is analysed, the following combined outputs are
saved directly in `out_dir`:

| File / Figure                                  | Contents                                      |
|------------------------------------------------|-----------------------------------------------|
| `COMBINED_summary_metrics_all_animals.csv`     | All metrics for all animals and models        |
| `COMBINED_parameter_estimates_all_animals.csv` | All parameter estimates across all animals    |
| `COMBINED_best_model_per_animal.csv`           | Best fitting model per animal with key metrics|
| `COMBINED_Fig_R2_CrossAnimal.png`              | Grouped bar chart comparing R² across animals |
| `COMBINED_Fig_RMSE_CrossAnimal.png`            | Grouped bar chart comparing RMSE across animals|
| `COMBINED_Fig_BestModel_Overlay.png`           | Best-fit curve overlay for all animals        |

---

## Accessing Results in R

The function returns a named list (one element per animal) which can be
accessed directly in R:

```{r access-results, eval=FALSE}
# View goodness-of-fit metrics for Animal_A
results[["Animal_A"]]$metrics_df

# View parameter estimates for Animal_A
results[["Animal_A"]]$param_table

# Access the fitted nls object for Wood (1967) model
results[["Animal_A"]]$model_fits[["Wood_1967"]]$model

# View model coefficients
coef(results[["Animal_A"]]$model_fits[["Wood_1967"]]$model)

# View individual metrics
results[["Animal_A"]]$model_fits[["Wood_1967"]]$metrics
```

---

## Using fit_lactation_models() Directly

For advanced users, the lower-level function `fit_lactation_models()` fits
models to a single animal's pre-formatted data frame:

```{r fit-direct, eval=FALSE}
# Build a data frame for one animal
animal_df <- data.frame(
  x = 1:20,
  y = c(9.3, 9.9, 10.2, 10.4, 10.3, 10.1, 9.9, 9.6, 9.3, 9.0,
        8.7, 8.4, 8.1, 7.8, 7.5, 7.2, 6.9, 6.6, 6.3, 6.0),
  z = (1:20) / 365
)

# Fit all 20 models
fits <- fit_lactation_models(animal_df)

# Fit selected models only
fits <- fit_lactation_models(
  animal_df,
  selected_models = c("Wood_1967", "Wilmink_k005", "Ali_Schaeffer")
)

# Access results
fits[["Wood_1967"]]$metrics
fits[["Wood_1967"]]$predictions
coef(fits[["Wood_1967"]]$model)
```

The data frame passed to `fit_lactation_models()` must contain:

| Column | Description                                           |
|--------|-------------------------------------------------------|
| `x`    | Integer time index (days-in-milk, e.g. 1 to 20)      |
| `y`    | Numeric daily milk yield in kg/day                    |
| `z`    | Numeric, equal to `x / 365` (required by Morant-Gnanasakthy model) |

---

## Goodness-of-Fit Metrics

The following metrics are computed for every successfully fitted model:

| Metric          | Description                                                         |
|-----------------|---------------------------------------------------------------------|
| R²              | Coefficient of determination                                        |
| Adjusted R²     | R² penalised for the number of parameters                          |
| RMSE            | Root mean square error (kg/day)                                     |
| AIC             | Akaike information criterion — lower is better                      |
| BIC             | Bayesian information criterion — lower is better                    |
| Durbin-Watson   | Test statistic for autocorrelation in residuals (ideal value = 2)  |

---

## Citation

If you use LactCurveModels in your research, please cite it as:

```{r citation, eval=FALSE}
citation("LactCurveModels")
```

> Raja TV, Lalremruati PC, Priyadharshini P, Nidhishree NS, Gurjar D,
> Alex R, Vohra V (2025). *LactCurveModels: Lactation Curve Model Fitting
> for Dairy Animals*. R package version 0.1.0.
> https://cran.r-project.org/package=LactCurveModels

---

## Getting Help

- Use `?run_lactation_analysis` and `?fit_lactation_models` for function-level help
- Report bugs at: https://github.com/venkatesanraja/LactCurveModels/issues