---
title: "Getting Started with SCIproj"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with SCIproj}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## What is a research compendium?

A **research compendium** is a self-contained collection of data, code, and documentation that accompanies a research project. By structuring a project as an R package, you gain:

- a standard, well-understood directory layout,
- built-in dependency management via `DESCRIPTION`,
- documentation infrastructure (`roxygen2`, vignettes),
- testing infrastructure (`testthat`),
- easy sharing and installation via GitHub.

SCIproj automates the creation of such a compendium, adding opinionated defaults for reproducible workflows (`targets`), dependency snapshots (`renv`), and FAIR-compliant metadata (`CITATION.cff`).

## Getting started

Install SCIproj from GitHub:

```{r install, eval = FALSE}
# install.packages("remotes")
remotes::install_github("saskiaotto/SCIproj")
```

Create a new project with a single call:

```{r basic, eval = FALSE}
library(SCIproj)
create_proj("~/projects/my_analysis")
```

This creates a fully scaffolded research compendium with `renv` and `targets` enabled by default.

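### Customizing the call

The call below adds an MIT license held by Jane Doe, embeds an ORCID iD, and enables Docker and Git: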

```{r custom, eval = FALSE}
create_proj("~/projects/baltic_cod",
  add_license = "MIT",
  license_holder = "Jane Doe",
  orcid = "0000-0001-2345-6789",
  use_docker = TRUE,
  use_git = TRUE
)
```

Directory names with underscores or hyphens are fine --- the R package name in `DESCRIPTION` is automatically sanitized (e.g., `baltic_cod` becomes `baltic.cod`).
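
How SCIproj implements this renaming is not shown here. The sketch below only illustrates the idea under two assumptions: underscores and hyphens become dots, and the result must satisfy the package-name rules from *Writing R Extensions* (ASCII letters, digits, and dots only; at least two characters; starts with a letter; does not end in a dot). `sanitize_pkg_name()` is a hypothetical helper, not an exported SCIproj function.

```{r sanitize-name}
# Hypothetical helper: map a directory name to a valid R package name.
# Assumption: "_" and "-" become "."; SCIproj's own rule may differ.
sanitize_pkg_name <- function(dir_name) {
  pkg <- gsub("[_-]", ".", basename(dir_name))
  # Writing R Extensions: letters, digits and dots only, at least two
  # characters, must start with a letter and must not end in a dot.
  if (!grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", pkg)) {
    stop("'", pkg, "' is not a valid R package name.", call. = FALSE)
  }
  pkg
}

sanitize_pkg_name("~/projects/baltic_cod")
sanitize_pkg_name("my-analysis-2024")
```

A name such as `2024_study` would still fail, because package names must start with a letter; in that case, choose a different directory name.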

## IDE support

SCIproj works with RStudio, Positron, VSCode, and terminal R sessions. By default, `create_proj()` sets your working directory to the new project (`setwd_to_proj = TRUE`) so you can start working immediately. Three parameters control post-creation behavior:

- `setwd_to_proj` (default `TRUE`): whether the current session's working directory is switched to the new project. Set to `FALSE` for batch workflows where you create multiple projects in sequence and want to stay in your current directory.
- `use_rproj` (default `TRUE`): whether an `.Rproj` file is created. Set to `FALSE` for projects used exclusively in Positron, VSCode, or similar IDEs that don't rely on `.Rproj` files.
- `open_proj` (default `FALSE`): if `TRUE`, opens the new project in a fresh RStudio or Positron session. In that case, the current session keeps its original working directory.
  
## Project structure

After creation, the project directory looks like this:

```
your-project/
├── DESCRIPTION             # Project metadata, dependencies, and author info (with ORCID).
├── README.Rmd              # Top-level project description.
├── your-project.Rproj      # RStudio project file (if use_rproj = TRUE, default).
├── CITATION.cff            # Machine-readable citation metadata for FAIR compliance.
├── CONTRIBUTING.md         # Contribution guidelines.
├── LICENSE.md              # Full license text (here: MIT).
├── NAMESPACE               # Auto-generated by roxygen2 (do not edit by hand).
│
├── data-raw/               # Raw data files and pre-processing scripts.
│   ├── clean_data.R        # Script template for data cleaning.
│   ├── DATA_SOURCES.md     # Data provenance: source, license, DOI, download date.
│   └── ...
│
├── data/                   # Cleaned datasets stored as .rda files.
│
├── R/                      # Custom R functions and dataset documentation.
│   ├── function_ex.R       # Template for custom functions.
│   ├── data.R              # Template for dataset documentation.
│   └── ...
│
├── analyses/               # R scripts or R Markdown/Quarto documents for analyses.
│   ├── figures/            # Generated plots.
│   └── ...
│
├── docs/                   # Publication-ready documents (article, report, presentation).
├── trash/                  # Temporary files that can be safely deleted.
│
├── _targets.R              # Pipeline definition for reproducible workflow.
├── renv/                   # renv library and settings.
├── renv.lock               # Lockfile for reproducible package versions.
└── Dockerfile              # Container definition (if use_docker = TRUE).
```

The key components at a glance:

| Directory / File    | Purpose                                              |
|---------------------|------------------------------------------------------|
| `R/`                | Reusable R functions (documented with `roxygen2`)    |
| `data/`             | Cleaned, analysis-ready datasets (`.rda` format)     |
| `data-raw/`         | Raw data files and the script that cleans them       |
| `analyses/`         | Analysis scripts, R Markdown reports, figures        |
| `docs/`             | Manuscripts, presentations, supplementary material   |
| `trash/`            | Temporary files not under version control            |
| `_targets.R`        | Pipeline definition for `targets`                    |
| `CITATION.cff`      | Machine-readable citation metadata                   |
| `CONTRIBUTING.md`   | Guidelines for collaborators                         |

## FAIR compliance

SCIproj encourages **FAIR** (Findable, Accessible, Interoperable, Reusable) research practices through several built-in features:

### CITATION.cff
A [Citation File Format](https://citation-file-format.github.io/) file is created automatically. It includes the project title, author name, version, release date, and optionally a license and ORCID iD. Services like GitHub and Zenodo can parse this file to generate proper citations.

```{r citation, eval = FALSE}
create_proj("my_project",
  license_holder = "Jane Doe",
  orcid = "0000-0001-2345-6789",
  add_license = "MIT"
)
```
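
To confirm which top-level fields a `CITATION.cff` contains, base R is enough. The sketch writes placeholder content (not SCIproj's actual output) to a temporary file so it runs anywhere; point `cff_file` at your project's `CITATION.cff` instead. The required keys checked here (`cff-version`, `message`, `title`, `authors`) are those of the CFF 1.2.0 schema. For full schema validation, a dedicated tool such as the `cffr` package is the better choice.

```{r cff-keys}
# Placeholder CITATION.cff content (illustrative values only)
cff_file <- tempfile(fileext = ".cff")
writeLines(c(
  "cff-version: 1.2.0",
  "message: If you use this software, please cite it as below.",
  "title: my_project",
  "authors:",
  "  - family-names: Doe",
  "    given-names: Jane",
  "    orcid: https://orcid.org/0000-0001-2345-6789",
  "version: 0.0.1",
  "license: MIT"
), cff_file)

# Top-level keys start in the first column and end with a colon
cff_lines <- readLines(cff_file)
top_keys <- sub(":.*$", "", grep("^[A-Za-z][A-Za-z-]*:", cff_lines, value = TRUE))
top_keys

# Keys required by the CFF 1.2.0 schema; character(0) means none is missing
required <- c("cff-version", "message", "title", "authors")
setdiff(required, top_keys)
```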

### DATA_SOURCES.md
When `data_raw = TRUE` (the default), a `DATA_SOURCES.md` template is placed in `data-raw/`. Use it to document the provenance of every dataset: source, URL, DOI, license, download date, and file names.
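
Filling in the download date and a checksum by hand is easy to forget, so a small helper can append an entry for you. The sketch assumes nothing about the layout of SCIproj's template and simply appends a Markdown block; `record_data_source()` is a hypothetical helper, and `tools::md5sum()` supplies a checksum that lets others verify they hold the identical file.

```{r data-sources}
# Hypothetical helper: append a provenance entry for one raw data file.
# The entry layout below is an assumption, not SCIproj's template format.
record_data_source <- function(file, source_name, url, license, doi = "n/a",
                               log = "data-raw/DATA_SOURCES.md") {
  entry <- c(
    "",
    paste0("### ", basename(file)),
    paste0("- Source: ", source_name),
    paste0("- URL: ", url),
    paste0("- DOI: ", doi),
    paste0("- License: ", license),
    paste0("- Downloaded: ", format(Sys.Date(), "%Y-%m-%d")),
    # Checksum lets others confirm they have exactly the same file
    paste0("- MD5: ", unname(tools::md5sum(file)))
  )
  con <- file(log, open = "a")  # append mode; creates the file if needed
  on.exit(close(con))
  writeLines(entry, con)
  invisible(entry)
}

# Toy example with temporary files instead of the real data-raw/ folder
toy_file <- tempfile(fileext = ".csv")
writeLines(c("year,biomass", "2020,1.5"), toy_file)
toy_log <- tempfile(fileext = ".md")
record_data_source(toy_file,
  source_name = "Example survey", url = "https://example.org/survey.csv",
  license = "CC BY 4.0", log = toy_log
)
cat(readLines(toy_log), sep = "\n")
```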

### ORCID
Pass your [ORCID iD](https://orcid.org/) via the `orcid` parameter to embed it in `CITATION.cff`, making your authorship unambiguously machine-readable.
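
ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so most typos can be caught before an iD is written into `CITATION.cff`. Whether SCIproj performs this check itself is not stated here; `orcid_is_valid()` is a hypothetical stand-alone helper.

```{r orcid-check}
# Hypothetical helper (not part of SCIproj): check an ORCID iD's format
# and its ISO 7064 MOD 11-2 check character.
orcid_is_valid <- function(id) {
  if (!grepl("^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$", id)) {
    return(FALSE)
  }
  chars <- strsplit(gsub("-", "", id, fixed = TRUE), "")[[1]]
  total <- 0
  for (d in as.integer(chars[1:15])) {
    total <- (total + d) * 2
  }
  check <- (12 - total %% 11) %% 11
  expected <- if (check == 10) "X" else as.character(check)
  chars[16] == expected
}

orcid_is_valid("0000-0001-2345-6789")  # iD used in the examples above
orcid_is_valid("0000-0001-2345-6780")  # same iD with a wrong last digit
```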

## Workflow with targets

By default (`use_targets = TRUE`), SCIproj adds a `_targets.R` pipeline template. The [targets](https://docs.ropensci.org/targets/) package provides:

- **Automatic dependency tracking** --- only outdated targets are re-run.
- **Caching** --- results are stored in the `_targets/` data store.
- **Visualization** --- `tar_visnetwork()` shows the pipeline as a graph.

A typical workflow:

```{r targets, eval = FALSE}
# 1. Define targets in _targets.R
# 2. Inspect the pipeline
targets::tar_manifest()
targets::tar_visnetwork()
# 3. Run the pipeline
targets::tar_make()
# 4. Read a result
targets::tar_read(my_result)
```

Edit `_targets.R` to define your data-loading, analysis, and reporting steps. Each step is a target that depends on upstream targets and R functions in `R/`.
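
A minimal `_targets.R` along those lines might look like the sketch below. It is not the template SCIproj generates: the file path `data-raw/survey.csv`, the listed packages, and the function `fit_model()` are hypothetical placeholders. `tar_source()` sources the scripts in `R/`, and `format = "file"` makes targets track the raw file's contents, so editing the data re-runs everything downstream. The last target is named `my_result`, matching the `targets::tar_read(my_result)` call in the workflow above.

```{r targets-file, eval = FALSE}
# _targets.R: minimal sketch, not SCIproj's generated template
library(targets)

# Packages loaded before each target runs (example names; adjust as needed)
tar_option_set(packages = c("dplyr", "ggplot2"))

# Source every script in R/ so your functions are available to targets
tar_source()

list(
  # format = "file": targets re-runs dependents when the file changes
  tar_target(raw_file, "data-raw/survey.csv", format = "file"),
  tar_target(raw_data, read.csv(raw_file)),
  # fit_model() is a hypothetical function you would define in R/
  tar_target(my_result, fit_model(raw_data))
)
```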

## Dependency management with renv

By default (`use_renv = TRUE`), SCIproj initializes [renv](https://rstudio.github.io/renv/) with the `"explicit"` snapshot type.
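This means `renv::snapshot()` records only the packages declared in `DESCRIPTION` instead of scanning all R files, which is the recommended approach for package-based compendia. A newly installed package therefore also has to be added to `DESCRIPTION` (for example with `usethis::use_package()`) before it appears in the lockfile.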

Key commands:

```{r renv, eval = FALSE}
renv::status()     # check if lockfile is in sync
renv::snapshot()   # update the lockfile after adding packages
renv::restore()    # reinstall packages from the lockfile
```

The `renv.lock` file should be committed to version control so collaborators can reproduce your exact package versions.
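
Since the explicit snapshot relies on `DESCRIPTION`, it can help to check what that file currently declares. The sketch below uses base R's `read.dcf()` on a throwaway file with placeholder contents. `declared_packages()` is a hypothetical helper, and its default fields (`Depends`, `Imports`, `LinkingTo`) are an assumption about what renv reads by default; consult the renv documentation for the exact rules (for example, how `Suggests` is handled).

```{r declared-deps}
# Throwaway DESCRIPTION with placeholder contents
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: baltic.cod",
  "Version: 0.0.1",
  "Depends: R (>= 4.1.0)",
  "Imports: dplyr, ggplot2 (>= 3.4.0),",
  "    targets"
), desc_file)

# Hypothetical helper: package names declared in the given fields
declared_packages <- function(path, fields = c("Depends", "Imports", "LinkingTo")) {
  desc <- read.dcf(path)
  fields <- intersect(fields, colnames(desc))
  entries <- unlist(strsplit(desc[1, fields], ","))
  pkgs <- trimws(sub("\\(.*\\)", "", entries))  # drop version constraints
  setdiff(pkgs[nzchar(pkgs)], "R")              # R itself is not a package
}

declared_packages(desc_file)
```

Pass `"DESCRIPTION"` instead of `desc_file` to inspect your own project.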

## Optional features

### Docker
Set `use_docker = TRUE` to add a `Dockerfile` and `.dockerignore`. The Dockerfile provides a template for building a container that reproduces your computational environment, independent of the host system.

### GitHub and CI
Set `create_github_repo = TRUE` to create a GitHub repository (requires a configured `GITHUB_PAT`). Add `ci = "gh-actions"` to include a GitHub Actions workflow that runs `R CMD check` on push.

```{r github, eval = FALSE}
create_proj("my_project",
  use_git = TRUE,
  create_github_repo = TRUE,
  ci = "gh-actions"
)
```

### Licenses
Choose from `"MIT"`, `"GPL"`, `"AGPL"`, `"LGPL"`, `"Apache"`, `"CCBY"`, or `"CC0"` via the `add_license` parameter. The selected license is applied to `DESCRIPTION` and recorded in `CITATION.cff`.

### testthat
Set `testthat = TRUE` to add testing infrastructure (`tests/testthat.R` and `tests/testthat/`). Writing tests for your analysis functions helps catch regressions early.
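
As an illustration, suppose `R/` contains a small hypothetical function `cv()` (coefficient of variation). A matching test file, `tests/testthat/test-cv.R`, could look like this; both files are sketched in one chunk for brevity, and the names are illustrative only.

```{r testthat-example, eval = FALSE}
# R/cv.R (hypothetical): coefficient of variation, ignoring missing values
cv <- function(x) {
  stopifnot(is.numeric(x))
  sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
}

# tests/testthat/test-cv.R
# library(testthat) is only needed when running this file interactively;
# tests/testthat.R loads it during R CMD check.
library(testthat)

test_that("cv() divides the standard deviation by the mean", {
  expect_equal(cv(c(2, 4, 6)), 0.5)
})

test_that("cv() drops NAs and rejects non-numeric input", {
  expect_equal(cv(c(2, NA, 4, 6)), 0.5)
  expect_error(cv(c("a", "b")))
})
```

The whole suite can then be run with `devtools::test()`.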

### Makefile
Set `makefile = TRUE` to add a `makefile.R` script as an alternative to `targets` for orchestrating your workflow.

## Typical development cycle

1. **Create the project**
   ```r
   SCIproj::create_proj("~/projects/my_study", add_license = "MIT",
     license_holder = "Your Name")
   ```
2. **Start working in the project.** Your working directory is already set to the new project (`setwd_to_proj = TRUE` by default), so you can continue immediately. For a dedicated project session, open the `.Rproj` file (RStudio) or the project folder as a workspace (Positron/VSCode), or pass `open_proj = TRUE` to `create_proj()` to open a new IDE session automatically.
3. **Add raw data** to `data-raw/` and document it in `DATA_SOURCES.md`.
4. **Write cleaning code** in `data-raw/clean_data.R`; save cleaned data to `data/` with `usethis::use_data()`.
5. **Write analysis functions** in `R/` and document them with `roxygen2`.
6. **Define the pipeline** in `_targets.R` to connect data, functions, and reports.
7. **Run `targets::tar_make()`** to execute the pipeline.
8. **Write reports** in `analyses/` using R Markdown or Quarto, reading results with `targets::tar_read()`.
9. **Snapshot dependencies** with `renv::snapshot()` before sharing.
10. **Push to GitHub** and, if you set `ci = "gh-actions"`, let CI run `R CMD check` automatically.
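
Steps 3 and 4 might look like this in `data-raw/clean_data.R`. The file name `survey.csv`, its columns, and the cleaning rules are made up for illustration, and the toy file is written to a temporary directory only so the chunk runs on its own. In a real project you would read from `data-raw/` and finish with the `usethis::use_data()` call shown as a comment, which writes `data/survey.rda`.

```{r clean-data}
# data-raw/clean_data.R (sketch with hypothetical file and column names)

# Toy raw file so the example is self-contained; normally it already
# sits in data-raw/.
raw_path <- file.path(tempdir(), "survey.csv")
writeLines(c(
  "Year,Station,Biomass_kg",
  "2021,A,12.5",
  "2021,B,NA",
  "2022,a,9.8"
), raw_path)

survey_raw <- read.csv(raw_path, stringsAsFactors = FALSE)

# Cleaning rules: lower-case column names, harmonize station codes,
# drop rows without a biomass value.
survey <- survey_raw
names(survey) <- tolower(names(survey))
survey$station <- toupper(survey$station)
survey <- survey[!is.na(survey$biomass_kg), ]
rownames(survey) <- NULL

str(survey)

# In the project, save the cleaned data as data/survey.rda:
# usethis::use_data(survey, overwrite = TRUE)
```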