---
title: "CellDEEP Quick Start"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{CellDEEP Quick Start}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

## What CellDEEP does

CellDEEP reduces the sparsity of single-cell RNA-seq (scRNA-seq) data by pooling cells into pseudocells before differential expression (DE) testing.
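
To make the idea concrete, here is a minimal base R sketch of pooling on simulated counts. It is not CellDEEP's internal code: the matrix `toy_counts`, the grouping `toy_pool`, and the pool size of three are all assumptions chosen for illustration.

```{r}
# Toy illustration only; this is not CellDEEP's internal code.
set.seed(1)

# Simulate a sparse gene-by-cell count matrix: 50 genes x 12 cells.
toy_counts <- matrix(
  rpois(50 * 12, lambda = 0.3),
  nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("cell", 1:12))
)

# Hypothetical grouping: four pools of three cells each.
toy_pool <- rep(paste0("pool", 1:4), each = 3)

# Sum counts within each pool. rowsum() groups rows, so transpose
# to cells-by-genes first and transpose back afterwards.
toy_pseudocells <- t(rowsum(t(toy_counts), group = toy_pool))

# Fraction of zero entries before and after pooling.
mean(toy_counts == 0)
mean(toy_pseudocells == 0)
```

A pooled entry is zero only when every cell in the pool has a zero for that gene, so the zero fraction after pooling cannot exceed the fraction before pooling.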

## Load package and example data

```{r, load_data}
library(CellDEEP)
data("sim")
```

## Step 1: Run DE directly with `FindMarker.CellDEEP`

`FindMarker.CellDEEP` prepares the metadata internally, so no separate preparation step is needed.

Key parameters to set:

- `group_id`, `sample_id`, `cluster_id`: metadata column names in your Seurat object
- `ident.1`, `ident.2`: two groups to compare
- `cell_selection`: how to select cells for pooling (`"kmean"` or `"random"`)
- `readcounts`: how to aggregate counts in pooled cells (`"sum"` or `"mean"`)
- `min_cells_per_subgroup`: minimum cells required in each sample-cluster subgroup for pooling
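
The difference between the `cell_selection` and `readcounts` options can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: `toy_sub`, `random_pool`, and `kmeans_pool` are hypothetical names, and plain `stats::kmeans()` is used as a stand-in, so its clusters need not have equal sizes.

```{r}
# Toy illustration only; CellDEEP's own selection and aggregation may differ.
set.seed(42)

# Six cells from one hypothetical subgroup, three genes.
toy_sub <- matrix(
  c(0, 2, 1,  1, 0, 0,  0, 3, 1,  5, 0, 2,  4, 1, 0,  6, 0, 3),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6))
)

# "random"-style selection: shuffle pool labels so cells land in
# two pools of three regardless of their expression.
random_pool <- sample(rep(1:2, each = 3))

# "kmean"-style selection: group cells with similar expression profiles.
kmeans_pool <- kmeans(t(toy_sub), centers = 2)$cluster

# Aggregate the random grouping two ways: summed and averaged counts.
pool_sum  <- t(rowsum(t(toy_sub), group = random_pool))
pool_mean <- sweep(pool_sum, 2, as.vector(table(random_pool)), "/")

pool_sum
pool_mean
```

Summing keeps the values as integer-like counts, while averaging keeps them on a per-cell scale; which one suits a given DE test is a design choice left to the analysis.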

```{r}
de.test <- FindMarker.CellDEEP(
  sim,
  group_id = "Status",
  sample_id = "DonorID",
  cluster_id = "cluster_id",
  Pool = TRUE,
  test.use = "wilcox",
  n_cells = 3,
  min_cells_per_subgroup = 1,
  cell_selection = "random",
  readcounts = "sum",
  logfc.threshold = 0.25,
  ident.1 = "Case",
  ident.2 = "Control"
)
```

## Step 2: Pool cells only (optional)

Use the pooling functions below if you want pooled objects without running DE immediately.

`min_cells_per_subgroup` means the minimum number of cells required in each
`sample_id x cluster_id` subgroup before pooling is performed.
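
To see which subgroups a given threshold would affect, you can tabulate cells per subgroup. The sketch below uses a hypothetical metadata table (`toy_meta`) and a hypothetical threshold of 2; it only counts cells and does not reproduce how CellDEEP handles subgroups that fall below the threshold.

```{r}
# Hypothetical per-cell metadata; column names mirror the standardized fields.
toy_meta <- data.frame(
  sample_id  = c("D1", "D1", "D1", "D1", "D2", "D2", "D2"),
  cluster_id = c("A",  "A",  "A",  "B",  "A",  "A",  "B")
)

# Count cells in every sample_id x cluster_id subgroup.
toy_sizes <- as.data.frame(table(toy_meta), responseName = "n_cells_found")

# Flag subgroups that meet a hypothetical threshold of 2 cells.
toy_threshold <- 2
toy_sizes$meets_threshold <- toy_sizes$n_cells_found >= toy_threshold
toy_sizes
```

On a real object, the same tabulation can be run on the metadata columns passed as `sample_id` and `cluster_id`.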

The pooling functions expect standardized metadata fields (`sample_id`, `group_id`, `cluster_id`),
so run `prepare_data` once before pooling:

```{r}
pool_input <- prepare_data(
  sim,
  sample_id = "DonorID",
  group_id = "Status",
  cluster_id = "cluster_id"
)
```

### K-means pooling

```{r}
pooled_kmean <- CellDEEP.Kmean(
  pool_input,
  readcounts = "sum",
  n_cells = 3,
  min_cells_per_subgroup = 1,
  assay_name = "RNA"
)
pooled_kmean
```

### Random pooling

```{r}
pooled_random <- CellDEEP.Random(
  pool_input,
  readcounts = "sum",
  n_cells = 5,
  min_cells_per_subgroup = 1,
  assay_name = "RNA"
)
pooled_random
```

If the DE test in Step 1 returns no genes that pass the adjusted p-value filter on this small example dataset, try a larger dataset or set `full_list = TRUE`.
