---
title: "Getting Started with ml"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with ml}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE  # examples shown but not run during check (require optional deps)
)
```

```{r setup}
library(ml)
```

## Overview

The `ml` package implements the split-fit-evaluate-assess workflow from
Hastie, Tibshirani, and Friedman (2009), Chapter 7. The key idea: keep a
held-out test set sacred until you are done experimenting, then assess once.

**Formula interfaces are not supported.** Pass the data frame and target
column name as a string: `ml_fit(data, "target", seed = 42)`.
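
As a preview, the core workflow fits in a few lines; each step is covered in detail below:

```{r preview}
s     <- ml_split(iris, "Species", seed = 42)    # Step 2
model <- ml_fit(s$train, "Species", seed = 42)   # Step 4
ml_evaluate(model, s$valid)                      # Step 4
ml_assess(model, test = s$test)                  # Step 7
```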

## Step 1: Profile your data

Before modeling, understand what you have:

```{r profile}
prof <- ml_profile(iris, "Species")
prof
```

## Step 2: Split into train/valid/test

`ml_split()` produces a three-way split (60/20/20). For classification targets, the split is stratified by default.

```{r split}
s <- ml_split(iris, "Species", seed = 42)
s
```

Access the partitions with `$train`, `$valid`, and `$test`; `$dev` combines
train and valid for final retraining.
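
A quick sanity check on the partition sizes (assuming each partition is a plain data frame, as the calls elsewhere in this vignette suggest):

```{r split-sizes}
# 60/20/20 of iris's 150 rows, stratified by Species
nrow(s$train)  # ~90
nrow(s$valid)  # ~30
nrow(s$test)   # ~30
nrow(s$dev)    # train + valid: ~120
```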

## Step 3: Screen algorithms

Find candidates quickly before tuning:

```{r screen}
lb <- ml_screen(s, "Species", seed = 42)
lb
```
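
To refit the top candidate yourself, something like the following sketch works, assuming the leaderboard is ordered best-first and exposes a hypothetical `algorithm` column (not guaranteed by the API):

```{r screen-pick}
# Hypothetical: `lb$algorithm` is an assumption about the leaderboard's columns
best   <- lb$algorithm[1]
m_best <- ml_fit(s$train, "Species", algorithm = best, seed = 42)
```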

## Step 4: Fit and evaluate

Iterate freely on the validation set:

```{r fit-evaluate}
model <- ml_fit(s$train, "Species", algorithm = "logistic", seed = 42)
model

metrics <- ml_evaluate(model, s$valid)
metrics
```
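
The validation set is yours to reuse, so comparing candidates is cheap. A minimal sketch looping over algorithm names from the table at the end of this vignette:

```{r compare}
for (alg in c("logistic", "random_forest", "svm")) {
  m <- ml_fit(s$train, "Species", algorithm = alg, seed = 42)
  print(ml_evaluate(m, s$valid))
}
```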

## Step 5: Explain feature importance

See which features drive the fitted model's predictions:
```{r explain}
exp <- ml_explain(model)
exp
```

## Step 6: Validate against rules

Gate your model before final assessment:

```{r validate}
gate <- ml_validate(model,
                    test  = s$test,
                    rules = list(accuracy = ">0.70"))
gate
```
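
The only rule syntax shown above is `accuracy = ">0.70"`; if your version reports other metrics, the same pattern plausibly extends to them. A hypothetical sketch (the `f1` metric name is an assumption, not documented API):

```{r validate-more}
# Hypothetical: assumes an `f1` metric exists alongside `accuracy`
gate2 <- ml_validate(model,
                     test  = s$test,
                     rules = list(accuracy = ">0.70", f1 = ">0.65"))
```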

## Step 7: Assess on test data (once)

The final exam. Call this only when done experimenting.

```{r assess}
verdict <- ml_assess(model, test = s$test)
verdict
```
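
Step 2 noted that `$dev` combines train and valid for final retraining. A sketch of that pattern, refitting on the combined data before the one-shot assessment:

```{r assess-dev}
final <- ml_fit(s$dev, "Species", algorithm = "logistic", seed = 42)
ml_assess(final, test = s$test)
```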

## Step 8: Save and load

```{r io, eval = FALSE}
path <- file.path(tempdir(), "iris_model.mlr")
ml_save(model, path)
loaded <- ml_load(path)
predict(loaded, s$valid)[1:5]
```

## Module-style interface

All functions are also available via the `ml$verb()` pattern, which mirrors
Python's `import ml; ml.fit(...)`:

```{r module-style}
# Identical results — pick the style you prefer
m2 <- ml$fit(s$train, "Species", algorithm = "logistic", seed = 42)
identical(predict(model, s$valid), predict(m2, s$valid))
```

## Regression example

The same workflow applies to regression:

```{r regression}
s2   <- ml_split(mtcars, "mpg", seed = 42)
m_rf <- ml_fit(s2$train, "mpg", seed = 42)  # algorithm omitted: the package default is used
ml_evaluate(m_rf, s2$valid)
```
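
Feature importance works the same way for regression models:

```{r regression-explain}
ml_explain(m_rf)
```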

## Available algorithms

```{r algorithms}
ml_algorithms()
```

| Algorithm | Classification | Regression | Package |
|-------------------|:--------------:|:----------:|---------|
| `"logistic"` | yes | -- | `nnet` (ships with R) |
| `"xgboost"` | yes | yes | `xgboost` |
| `"random_forest"` | yes | yes | `ranger` |
| `"linear"` (ridge) | -- | yes | `glmnet` |
| `"elastic_net"` | -- | yes | `glmnet` |
| `"svm"` | yes | yes | `e1071` |
| `"knn"` | yes | yes | `kknn` |
| `"naive_bayes"` | yes | -- | `naivebayes` |

LightGBM support is planned for v1.1.
