Test data (SDTM) for the pharmaverse family of packages
The package provides a one-stop shop for SDTM test data in the pharmaverse family of packages. This includes datasets that are therapeutic area (TA)-agnostic (DM, VS, EG, etc.) as well as TA-specific ones (RS, TR, OE, etc.).
The package is available from CRAN and can be installed by running install.packages("pharmaversesdtm"). To install the latest development version of the package directly from GitHub, use the following code:
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
# Install the latest development version directly from GitHub.
remotes::install_github("pharmaverse/pharmaversesdtm", ref = "main")
Some test datasets have been sourced from the CDISC pilot project, while other datasets have been constructed ad-hoc by the {admiral} team. Please check the Reference page for detailed information regarding the source of specific datasets.
- Datasets that are TA-agnostic are named after the SDTM domain (e.g., dm, rs).
- Datasets that are TA-specific carry an additional suffix after the domain name (e.g., oe_ophtha, rs_onco, rs_onco_irecist).

Note: If an SDTM domain is used by multiple TAs, {pharmaversesdtm} may provide multiple versions of the corresponding test dataset. For instance, the package contains both ex and ex_ophtha: the latter adds ophthalmology-specific variables such as EXLAT and EXLOC, and its EXROUTE is replaced with a plausible ophthalmology value.
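One way to see how a TA-specific dataset differs from its general counterpart is to compare column names. The sketch below is illustrative only: extra_vars() is a hypothetical helper that is not part of {pharmaversesdtm}, the toy data frames and their values are made up, and the comparison against the packaged datasets runs only if the package is installed.

```r
# Hypothetical helper (not part of {pharmaversesdtm}): list the variables that
# a TA-specific dataset has in addition to its TA-agnostic counterpart.
extra_vars <- function(specific, general) {
  setdiff(names(specific), names(general))
}

# Toy data frames standing in for real SDTM domains (values are made up).
ex_toy <- data.frame(USUBJID = "01-001", EXROUTE = "ORAL")
ex_ta_toy <- data.frame(
  USUBJID = "01-001",
  EXROUTE = "INTRAVITREAL",
  EXLAT = "LEFT"
)
print(extra_vars(ex_ta_toy, ex_toy))

# Same comparison on the packaged datasets, if the package is available.
if (requireNamespace("pharmaversesdtm", quietly = TRUE)) {
  print(extra_vars(pharmaversesdtm::ex_ophtha, pharmaversesdtm::ex))
}
```

Based on the note above, EXLAT and EXLOC would be expected among the extra variables reported for ex_ophtha.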
First, open a GitHub issue in {pharmaversesdtm} describing the planned updates and tag @pharmaverse/admiral so that a member of the core development team can sanity-check the request. There are then two main ways to extend the test data: adding new datasets, or extending existing datasets with new records/variables. Whichever method you choose, it is worth noting the following:
- Programs that generate test data are stored in the data-raw/ folder.
- If a program needs other packages, load them with library() at the start of the program (but please do not call library(pharmaversesdtm)).
- Once you have created a program in the data-raw/ folder, you need to run it as a standalone R script in order to generate a test dataset that will become part of the {pharmaversesdtm} package, but you do not need to build the package.
- Each dataset is stored as an .rda file whose name is consistent with the name of the dataset, e.g., dataset xx is stored as xx.rda. The easiest way to achieve this is to use usethis::use_data(xx).
- The programs in data-raw/ are stored within the {pharmaversesdtm} GitHub repository, but they are not part of the {pharmaversesdtm} package: the data-raw/ folder is specified in .Rbuildignore.
- When you run a program in the data-raw/ folder, you generate a dataset that is written to the data/ folder, which will become part of the {pharmaversesdtm} package.
- Datasets are documented in R/*.R, for the purpose of generating documentation in the man/ folder.

Note: The documentation process in {pharmaversesdtm} is automated for consistency and ease of maintenance. Metadata for each dataset, such as names, labels, descriptions, authors, and sources, is managed in a centralized JSON file (inst/extdata/sdtms-specs.json) and used to generate .R documentation files. This streamlined approach aligns with best practices for efficient package development.
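To make the idea of metadata-driven documentation concrete, here is a minimal sketch of turning one metadata entry into roxygen2-style comment lines. It is not the package's actual generator: the field names (name, label, description, source), the example values, and the make_roxygen() helper are all assumptions made for illustration, and the entry is written as an R list rather than read from the real JSON file.

```r
# One metadata entry as it might look after parsing the JSON file.
# All field names and values here are assumptions, not the actual schema.
spec <- list(
  name = "xx",
  label = "Example Domain",
  description = "An illustrative test dataset.",
  source = "Constructed for illustration"
)

# Hypothetical generator: turn one entry into roxygen2-style comment lines
# followed by the quoted dataset name, as is usual for documenting data.
make_roxygen <- function(entry) {
  c(
    paste0("#' ", entry$label),
    "#'",
    paste0("#' ", entry$description),
    "#'",
    paste0("#' @source ", entry$source),
    paste0('"', entry$name, '"')
  )
}

doc_lines <- make_roxygen(spec)
writeLines(doc_lines)
```

Keeping the metadata in one place means a label or source only has to be corrected once, and the generated .R files stay uniform.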
To add a new dataset:

- Create a program in the data-raw/ folder, named <name>.R, where <name> should follow the naming convention, to generate the test data and output <name>.rda to the data/ folder.
- Where possible, use existing test data such as dm as input in this program in order to create realistic synthetic data that remains consistent with other domains (not mandatory).
- Add the metadata for the new dataset to the inst/extdata/sdtms-specs.json file.
- Run data-raw/create_sdtms_data.R in order to update NAMESPACE and update the .Rd files in man/.
- Update .github/CODEOWNERS.
- Update NEWS.md.
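The first two steps above can be sketched as a small standalone program. This is a minimal illustration under stated assumptions, not an actual {pharmaversesdtm} script: the domain xx and its variables are hypothetical, a toy dm stands in for the packaged dataset so the sketch runs on its own, and a temporary folder stands in for the package's data/ folder.

```r
# Sketch of a data-raw/<name>.R program; here <name> is the hypothetical "xx".
# A toy dm stands in for the packaged dataset so the sketch is self-contained.
dm <- data.frame(
  STUDYID = "STUDY01",
  USUBJID = c("01-001", "01-002", "01-003"),
  stringsAsFactors = FALSE
)

# Derive one record per subject so that subject IDs stay consistent with dm.
set.seed(123) # fixed seed so the generated test data is reproducible
xx <- data.frame(
  STUDYID = dm$STUDYID,
  DOMAIN = "XX",
  USUBJID = dm$USUBJID,
  XXSEQ = 1L,
  XXTESTCD = "EXAMPLE",
  XXSTRESN = round(rnorm(nrow(dm), mean = 50, sd = 10), 1),
  stringsAsFactors = FALSE
)

# Write xx.rda; a temporary folder stands in for the package's data/ folder.
# Inside the package repository, usethis::use_data(xx) serves this purpose.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
save(xx, file = file.path(data_dir, "xx.rda"), compress = "bzip2")
```

Taking the subject identifiers from dm rather than inventing new ones is what keeps the synthetic domain consistent with the other test datasets.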
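To update an existing dataset:

- Locate the existing program <name>.R in the data-raw/ folder and update it accordingly.
- Update the inst/extdata/sdtms-specs.json file.
- Run the program and output the updated <name>.rda to the data/ folder.
- Run data-raw/create_sdtms_data.R in order to update NAMESPACE and update the .Rd files in man/.
- Update .github/CODEOWNERS.
- Update NEWS.md.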
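After re-running a program, it can be worth confirming that the regenerated file still follows the rule that dataset xx is stored as xx.rda. The check below is a sketch only: rda_name_matches() is a hypothetical helper that is not part of {pharmaversesdtm}, and the toy dataset, its new variable, and the temporary file path are made up for illustration.

```r
# Hypothetical check (not part of {pharmaversesdtm}): does an .rda file hold
# exactly one object, named after the file itself?
rda_name_matches <- function(path) {
  env <- new.env()
  objs <- load(path, envir = env) # load() returns the names of loaded objects
  expected <- tools::file_path_sans_ext(basename(path))
  identical(objs, expected)
}

# Toy "existing" dataset, extended with a new variable and saved again.
xx <- data.frame(
  USUBJID = c("01-001", "01-002"),
  XXSEQ = 1L,
  stringsAsFactors = FALSE
)
xx$XXLAT <- c("LEFT", "RIGHT") # new variable added by the update

rda_path <- file.path(tempdir(), "xx.rda")
save(xx, file = rda_path)
print(rda_name_matches(rda_path))
```

Loading into a fresh environment keeps the check from overwriting any object of the same name in the current session.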