---
title: "Mining species list for any plant family"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Mining species list for any plant family}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{css, echo=FALSE}
.table{
  width: auto;
  font-size: 14px;
}
.table caption {
  font-size: 1em;
}
```

Here in this article, we show how to use the package's function `powoSpecies` 
for mining all accepted species for any genus or family of vascular plants in 
the World Checklist of Vascular Plants (WCVP; Govaerts et al., 2021) available 
through Kew’s Plants of The World Online (POWO, https://powo.science.kew.org).\

\

# Setup

Install the latest development version of __expowo__ from 
[GitHub](https://github.com/):

``` r
#install.packages("devtools")
devtools::install_github("DBOSlab/expowo")
```
```{r package load, message=FALSE, warning=FALSE}
library(expowo)
```

\

## Mining all accepted species for any plant family

The function `powoSpecies` returns a data frame or saves a CSV file listing all 
accepted species (including hybrid species or not), their publication, 
authorship, and global geographic distribution at country or botanical level for
any family of vascular plants. The global classification of botanical divisions
follows the [World Geographical Scheme](https://www.tdwg.org/standards/wgsrpd/) 
for Recording Plant Distributions, which is already associated with each taxon's
distribution in the POWO.\

The example below shows how to mine all accepted species for a specified vector 
of families. The output shown here (TABLE 1) is a simplified version, where we 
have removed some columns (genus, epithet and authors) so as to focus on the 
display of the taxonomic and distribution data.\

```{r, eval = FALSE}
res <- powoSpecies(family = c("Cabombaceae", "Martyniaceae"),
                   hybrid = FALSE,
                   verbose = TRUE,
                   save = FALSE,
                   dir = "results_powoSpecies",
                   filename = "Cabom_Martyniaceae_search")
```

\

```{r, echo = FALSE, warning = FALSE}
utils::data("angioData")
fam <- c("Cabombaceae", "Martyniaceae")
df <- angioData[angioData$family %in% fam, ]

knitr::kable(df[-c(2, 3, 4, 5, 10, 11, 13)],
             row.names = FALSE,
             caption = "TABLE 1. A general `powoSpecies` search for mining all 
             accepted species for some specific plant families.")
```

\

## Narrowing down the `powoSpecies` search based on a specified country vector

You can also narrow down the species checklist search of any family by
focusing on just a particular country or a list of countries. You just need to 
define a vector of country names in the argument \code{country}. In the example 
below, note that we have originally searched for the species within the families 
__Aristolochiaceae__, __Cabombaceae__, and __Martyniaceae__, but the function 
returned a smaller species list, because many species in the searched families 
are not recorded in any of the specified vector of country, i.e. __Brazil__ or 
__Colombia__.\

```{r, eval = FALSE}
ACM <- powoSpecies(family = c("Aristolochiaceae", "Cabombaceae", "Martyniaceae"),
                   hybrid = FALSE,
                   country = c("Brazil", "Colombia"),
                   verbose = FALSE,
                   save = FALSE,
                   dir = "results_powoSpecies",
                   filename = "country_constrained_search")
```

\

```{r, echo = FALSE, warning = FALSE}
utils::data("angioData")
fam <- c("Aristolochiaceae", "Cabombaceae", "Martyniaceae")
df <- angioData[angioData$family %in% fam, ]
country <- c("Brazil", "Colombia")
df <- df[df$native_to_country %in% country, ]

knitr::kable(df[-c(2, 3, 4, 5, 10, 11, 12, 13)],
             row.names = FALSE,
             caption = "TABLE 2. A `powoSpecies` search based on a specified 
             country vector.")
```

\

## Narrowing down the `powoSpecies` search based on a specified genus and country vector

You may want to retrieve information for just one or a list of accepted 
genera from a given country (or from a list of countries). Just like before,
you only need to define a vector of genus names in the argument \code{genus} and
a vector of country names in the argument \code{country}. In the example below,
see that we have searched for just the genera ***Proboscidea*** and 
***Cabomba*** of the families __Martyniaceae__ and __Cabombaceae__, 
but the function only returned the __Cabombaceae__ genus ***Cabomba***, 
because ***Proboscidea*** does not occur in any of the provided list of countries,
i.e. __Argentina__, __French Guiana__, or __Brazil__.\

```{r, eval = FALSE}
MC <- powoSpecies(family = c("Martyniaceae", "Cabombaceae"),
                  genus = c("Proboscidea", "Cabomba"),
                  hybrid = FALSE,
                  country = c("Argentina", "French Guiana", "Brazil"),
                  verbose = TRUE,
                  save = FALSE,
                  dir = "results_powoSpecies",
                  filename = "country_and_genus_constrained_search")
```

\

```{r, echo = FALSE, warning = FALSE}
utils::data("angioData")
genus <- c("Proboscidea", "Cabomba")
country <- c("Argentina", "French Guiana", "Brazil")
df <- angioData[angioData$genus %in% genus, ]
df <- df[df$native_to_country %in% country, ]

knitr::kable(df[-c(2, 3, 10, 11, 13)],
             row.names = FALSE,
             caption = "TABLE 3. A `powoSpecies` search based on specified genus
             and country vectors.")
```
\

## Mining all accepted species for all plant families

To mine a species checklist for all families of vascular plants, you recommend 
to load the dataframe-formatted data object called `POWOcodes` that comes 
associated with the __expowo__ package. The `POWOcodes` data object 
already contains the URI addresses for all plant families recognized in 
the [POWO](https://powo.science.kew.org) database, so you just need to call it
to your R environment.\

The example below shows how to mine a global checklist of all accepted species 
of vascular plants by using the vector of all plant families and 
associated URI addresses stored in the `POWOcodes` object.\

```{r, eval = FALSE}
utils::data("POWOcodes")

ALL_spp <- powoSpecies(POWOcodes$family,
                       hybrid = TRUE,
                       verbose = TRUE,
                       save = FALSE,
                       dir = "results_powoSpecies",
                       filename = "all_plant_species")
```

\

Since this is a very long search that might requires hours to be completed, it 
may happen to loose internet connection at some point of the queried search.
But you do not need to start an entirely new search from the beginning. By 
setting the `powoSpecies` with `rerun = TRUE`, a previously stopped search will 
continue from where it left off, starting with the last retrieved taxon. Please 
ensure that the 'filename' argument exactly matches the name of the CSV file 
saved from the previous search, and that the previously saved CSV file is 
located within a subfolder named after the current date. If it is not, please 
rename the date subfolder accordingly.

\

```{r, eval = FALSE}
utils::data("POWOcodes")

ALL_spp <- powoSpecies(POWOcodes$family,
                       hybrid = TRUE,
                       verbose = TRUE,
                       rerun = TRUE,
                       save = FALSE,
                       dir = "results_powoSpecies",
                       filename = "all_plant_species")
```
\

## Reference

POWO (2019). "Plants of the World Online. Facilitated by the Royal Botanic Gardens, Kew. Published on the Internet; http://www.plantsoftheworldonline.org/ Retrieved April 2023."
