---
title: "Getting Started"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  comment = "#>"
)
```

`puremoe` provides a unified interface to PubMed and NLM data. Search with `search_pubmed()`, then retrieve data from any of five endpoints with `get_records()`.

```{r libs}
library(puremoe)
library(dplyr)
library(DT)
```

## Search

`search_pubmed()` accepts standard PubMed query syntax and returns a vector of PMIDs.

```{r search}
pmids <- puremoe::search_pubmed('("political ideology"[TiAb])')
length(pmids)
```
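
Longer queries are easier to maintain when they are assembled from a vector of terms. The chunk below is a minimal sketch of that idea: `build_query()` is a hypothetical helper, not part of `puremoe`, and it relies only on standard PubMed field tags (such as `[TiAb]`) and Boolean operators.

```{r query-sketch}
# Hypothetical helper (not a puremoe function): quote each phrase, tag it with
# a PubMed field, and join the pieces with a Boolean operator.
build_query <- function(terms, field = "TiAb", operator = "OR") {
  tagged <- sprintf('"%s"[%s]', terms, field)
  paste0("(", paste(tagged, collapse = paste0(" ", operator, " ")), ")")
}

query <- build_query(c("political ideology", "partisanship"))
query
```

The resulting string can be passed to `search_pubmed()` in place of the literal query used above.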

```{r subset}
pmids_sub <- head(pmids, 50L)
```

## Abstracts

```{r abstracts}
abstracts <- puremoe::get_records(
  pmids_sub,
  endpoint = "pubmed_abstracts",
  cores    = 1L,
  sleep    = 0.5
)

abstracts <- abstracts |> mutate(pmid = as.character(pmid))
```

```{r abstracts-table}
abstracts |>
  select(pmid, year, journal, articletitle) |>
  DT::datatable(rownames = FALSE)
```

The `annotations` column is a list of per-article data frames containing MeSH terms, chemical names, and keywords.

```{r annotations}
bind_rows(abstracts$annotations) |>
  head(20) |>
  DT::datatable(rownames = FALSE)
```
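
If the per-article data frames do not already carry an identifier, naming the list elements before binding keeps each row linked to its article. The sketch below uses toy data; the column names `type` and `term` are illustrative assumptions and may not match the actual `annotations` schema.

```{r annotations-sketch}
# Toy stand-in for a list column of per-article data frames. The column names
# `type` and `term` are assumptions for illustration only.
toy <- data.frame(pmid = c("111", "222"))
toy$annotations <- list(
  data.frame(type = c("MeSH", "Keyword"), term = c("Politics", "ideology")),
  data.frame(type = "MeSH", term = "Attitude")
)

# Naming the list elements lets bind_rows() record the source article in `pmid`.
long <- dplyr::bind_rows(setNames(toy$annotations, toy$pmid), .id = "pmid")
long
```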

## Affiliations

```{r affiliations}
affiliations <- puremoe::get_records(
  head(pmids_sub, 25L),
  endpoint = "pubmed_affiliations",
  cores    = 1L,
  sleep    = 0.5
)

affiliations |>
  DT::datatable(rownames = FALSE)
```

## iCite metrics

```{r icites}
icites <- puremoe::get_records(
  pmids_sub,
  endpoint = "icites",
  cores    = 1L,
  sleep    = 0.25
)

icites |>
  mutate(pmid = as.character(pmid)) |>
  select(-citation_net, -cited_by_clin) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))
```

## PubTator annotations

```{r pubtations}
pubtations <- puremoe::get_records(
  head(pmids_sub, 30L),
  endpoint = "pubtations",
  cores    = 1L
)

pubtations |>
  DT::datatable(rownames = FALSE)
```

## Full text

Full text is available only for open-access PMC articles. `pmid_to_ftp()` resolves
PMIDs to XML URLs via the PMC Cloud Service on AWS S3 and keeps only those
with open-access full text. In August 2026, NCBI will complete its
migration from the legacy PMC FTP Service to the Cloud Service; `puremoe` already
uses the new service.

```{r ftp}
ftp <- puremoe::pmid_to_ftp(pmids = pmids_sub)
ftp |> DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))
```

```{r fulltext}
fulltext <- puremoe::get_records(
  head(ftp$url, 2L),
  endpoint = "pmc_fulltext",
  cores    = 1L
)

fulltext |>
  mutate(
    # Keep only the first 15 words of each text as a preview
    text = vapply(
      strsplit(text, "\\s+"),
      function(w) paste0(paste(head(w, 15), collapse = " "), "..."),
      character(1)
    )
  ) |>
  slice(1:5) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))
```
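
The preview step in the chunk above splits each text on whitespace, keeps the first 15 words, and appends an ellipsis. As a sketch, the same logic can be wrapped in a small reusable function; `preview_words()` is a hypothetical name, not a `puremoe` function.

```{r preview-sketch}
# Hypothetical helper: truncate each string to its first `n` words.
preview_words <- function(x, n = 15L) {
  words <- strsplit(x, "\\s+")
  vapply(
    words,
    function(w) paste0(paste(head(w, n), collapse = " "), "..."),
    character(1)
  )
}

preview <- preview_words(c("one two three four five", "alpha beta"), n = 3L)
preview
```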

## Endpoint schemas

`endpoint_info()` returns column definitions, rate limits, and notes for any endpoint.

```{r endpoint-info}
puremoe::endpoint_info()
```

```{r endpoint-detail}
puremoe::endpoint_info("icites")
```
