---
title: "Example workflow using FUSE"
author: "Susanna Holmstrom"
date: "2026-02-05"
output:  rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Example workflow using FUSE}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, 
                      message=FALSE, 
                      warning=FALSE)
```

```{r}
library(methFuse)
``` 

### Reading in data 
We start by reading in the data, which consist of the tables K0 and K1. Both have CpG sites as rows and samples as columns, and 

- K0 contains the unmethylated counts
- K1 contains the methylated counts

This is a dummy data set consisting of manipulated counts of the 100 000 first CpG sites in chromosome 20. 

```{r}
k0_file <- system.file("examples/k0.tsv.gz", package = "methFuse")
k1_file <- system.file("examples/k1.tsv.gz", package = "methFuse")

K0 <- read.table(k0_file, header = TRUE)
K1 <- read.table(k1_file, header = TRUE)

head(K0)
```
### Apply FUSE as a pipeline
The whole FUSE pipeline can be applied using the function 'fuse.segment()'. It performs the following steps: 

1. clustering
2. cutting clustering tree optimally
3. summarizing segments

The function takes as input the count matrices K0 and K1 and the chromosome and position information for each CpG site, and outputs a summary table and a data frame with betas. 
```{r}
segment_result <- fuse.segment(
  as.matrix(K0),
  as.matrix(K1),
  chr = sub("\\..*$", "", rownames(K0)),
  pos = as.integer(sub("^.*\\.", "", rownames(K0))),
  method = "BIC"
)

head(segment_result$summary)
```
```{r}
head(segment_result$betas_per_segment)
```

```{r, fig.height=10, fig.width=8}
plot(segment_result, segments_to_plot = 1:nrow(segment_result$summary))
plot(segment_result, segments_to_plot = 50:60)
``` 

## Using BSseq or methrix
The 'fuse.segment()' function accepts as input also a BSseq or a methrix object. 

```{r bsseq-example, eval = requireNamespace("bsseq", quietly = TRUE)}
library(bsseq)

set.seed(1)
M <- matrix(c(sample(0:10, 1000, TRUE),
              sample(50:60, 1000, TRUE),
              sample(0:10, 1000, TRUE),
              sample(20:30, 1000, TRUE))
            , nrow = 1000, byrow = T)
Cov <- M + matrix(sample(50:60, 4000, TRUE), nrow = 1000)
  
  
bs <- BSseq(
    chr = rep("chr1", 1000),
    pos = seq_len(1000),
    M = M,
    Cov = Cov,
    sampleNames = colnames(M)
  )

res <- fuse.segment(bs)
head(res$summary)
```

```{r methrix-example, eval = requireNamespace("methrix", quietly = TRUE)}
library(methrix)

data(methrix_data, package = "methrix")

res <- fuse.segment(methrix_data)
head(res$summary)

```


## Apply FUSE through separate steps
If the intermediate outputs of the method are relevant, then the pipeline can also be applied by calling each of the functions 'fuse.cluster()', 'number.of.clusters()', 'fuse.cut.tree()', and 'fuse.summary()' separately. 

### 1. Cluster
In the first step, 'fuse.cluster()' is applied on the count matrices. This performs a hierarchical clustering of closest neighbors, and outputs a clustering tree of class 'hclust'.

```{r}
tree <- fuse.cluster(as.matrix(K0), 
                     as.matrix(K1),
                     chr = sub("\\..*$", "", rownames(K0)), 
                     pos = as.integer(sub("^.*\\.", "", rownames(K0))))
names(tree)
head(tree$merge)
head(tree$height)
```

The clustering tree contains the following elements:

- 'merge': the labels of the merged points
- 'height': The total log-likelihood for forming this merge
- 'order': Same order as original, since clustering is done in 1D
- 'labels': Labels of form 'chr.pos'
- 'call': 'fuse.cluster(K0, K1)'
- 'method': 'fuse'
- 'dist.method': 'fuse'

### 2. Cutting the tree
In order to cut the clustering tree, the optimal number of segments needs to be calculated. For this we have the function 'number.of.clusters()', which performs model selection by minimizing either the Bayesian Information Criterion (BIC, default), or the Akaike Information Criterion (AIC).

```{r}
optimal_num_of_segments <- number.of.clusters(tree, ncol(K0), 'BIC')
optimal_num_of_segments
```

The tree can then be cut using 'fuse.cut.tree()', (or 'cutree' from package 'stats').

```{r}
segments <- fuse.cut.tree(tree, optimal_num_of_segments)
head(segments, 100)
```


### 3. Summarizing segments
The segments can be summarized in a table with the function 'fuse.summary()'.
```{r}
result <- fuse.summary(as.matrix(K0), 
                       as.matrix(K1), 
                       chr = sub("\\..*$", "", rownames(K0)), 
                       pos = as.numeric(sub("^.*\\.", "", rownames(K0))), 
                       segments)

head(result$summary)
```
```{r}
head(result$betas_per_segment)
```

### Verification
The outputs are identical.
```{r}
identical(segment_result$summary, result$summary)
```

```{r}
identical(segment_result$betas_per_segment, result$betas_per_segment)
```

## Plotting
Let's visualize a small piece of the result:
```{r, fig.width=8, fig.height=3}
plot(result, segments_to_plot = 1:50)
```
