hubUtils 1.2.0
- Improved error handling when internet resources are unavailable.
Functions that access remote URLs now fail gracefully with informative
error messages (#272).
hubUtils 1.1.0
- Added utility functions for extracting properties from
target-data.json configuration files (v6.0.0 schema):
  - get_date_col(): Get the name of the date column across hub data.
  - get_observable_unit(): Get observable unit column names, with support for dataset-specific overrides.
  - get_versioned(): Get whether target data is versioned, with inheritance from global settings.
  - get_has_output_type_ids(): Get whether oracle-output data has output_type/output_type_id columns.
  - get_non_task_id_schema(): Get the schema for non-task ID columns in time-series data.
- Moved has_target_data_config() from the hubAdmin package to hubUtils.
This function checks whether a target-data.json file exists in a hub (#260).
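The "dataset-specific override, else global setting" rule described for get_observable_unit() and get_versioned() can be illustrated with a small Python sketch. This is not the hubUtils R implementation: the helper name get_config_property() is hypothetical, and the dict layout (global keys at the top level, per-dataset sections under "time-series" and "oracle-output") is an assumption about the parsed target-data.json.

```python
from typing import Any, Optional


def get_config_property(config: dict, prop: str,
                        dataset: Optional[str] = None,
                        default: Any = None) -> Any:
    """Return a dataset-level value if set, else the global value, else a default."""
    if dataset is not None:
        # Dataset-specific settings (e.g. under "time-series") take precedence.
        section = config.get(dataset) or {}
        if prop in section:
            return section[prop]
    # Fall back to the top-level (global) setting, then to the supplied default.
    return config.get(prop, default)


# Hypothetical, already-parsed target-data.json contents.
target_config = {
    "observable_unit": ["location", "target_end_date"],
    "versioned": False,
    "time-series": {"versioned": True},
    "oracle-output": {},
}

print(get_config_property(target_config, "versioned", dataset="time-series"))    # True
print(get_config_property(target_config, "versioned", dataset="oracle-output"))  # False
```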
hubUtils 1.0.0
- Added latest schema version (v6.0.0).
- Added support for target-data config files in get_schema_url(),
read_config(), get_version_hub(), and is_v3_hub() (#252).
- Added two lightweight example v6 hubs for use in examples and tests:
  - testhubs/v6/target_file: target data stored in single files.
  - testhubs/v6/target_dir: hive-partitioned target data.
hubUtils 0.7.0
- Added two lightweight example v5 hubs for use in examples and tests:
  - testhubs/v5/target_file: target data stored in single files.
  - testhubs/v5/target_dir: hive-partitioned target data.

  These hubs are complete and valid, enabling faster examples and checks
  without requiring a clone of the public example hub repository.
hubUtils 0.6.0
- Added convert_output_type() function to convert model outputs from one
output type to another (currently only supports sample to mean, median,
and quantile) (#212, #214, #215).
- convert_output_type() now supports transformations involving output
type IDs dependent on task ID variable values (#222).
- Added latest schema version (v5.1.0).
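To make the sample-to-mean/median/quantile conversion concrete, here is a Python sketch of the idea, not the hubUtils R code. It assumes every input row is a sample row, that groups are formed by model_id plus task ID columns, and that quantiles use linear interpolation between order statistics (R's default quantile type 7); the function names and the quantile levels are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def quantile_type7(values, p):
    """Quantile by linear interpolation between order statistics (R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def samples_to_summaries(rows, group_cols, quantile_levels=(0.25, 0.5, 0.75)):
    """Collapse sample rows into mean, median and quantile rows per group."""
    groups = defaultdict(list)
    for row in rows:
        # One group per model and task ID combination; each holds its sample values.
        groups[tuple(row[c] for c in group_cols)].append(row["value"])

    out = []
    for key, values in groups.items():
        base = dict(zip(group_cols, key))
        # Mean and median rows carry no output_type_id (NA in the hubverse format).
        out.append({**base, "output_type": "mean", "output_type_id": None,
                    "value": statistics.mean(values)})
        out.append({**base, "output_type": "median", "output_type_id": None,
                    "value": statistics.median(values)})
        for p in quantile_levels:
            out.append({**base, "output_type": "quantile", "output_type_id": p,
                        "value": quantile_type7(values, p)})
    return out


samples = [{"model_id": "team-model", "location": "US", "value": v} for v in (1, 2, 3, 4)]
for r in samples_to_summaries(samples, ["model_id", "location"]):
    print(r["output_type"], r["output_type_id"], r["value"])
```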
hubUtils 0.5.0
- read_config_file() now accepts a URL to the raw contents of a JSON
config file as well as an object of class <SubTreeFileSystem> pointing
to a config file in an S3 cloud hub (#209). This enables reading config
files directly from GitHub and S3 cloud hubs without having to clone the
contents of a hub locally.
- read_config() now also accepts a URL to a fully configured hub
repository hosted on GitHub.
- Added utilities for working with URLs:
  - is_url(): checks whether a character string is a URL.
  - is_valid_url(): checks whether a URL is valid and reachable.
  - is_github_url(): checks whether a URL is a github.com URL.
  - is_github_repo_url(): checks whether a URL is a GitHub repository URL.
  - create_s3_url(): creates an S3 URL from a bucket name and object path.
  - is_s3_base_fs(): checks whether an object of class <SubTreeFileSystem> is a base file system (i.e. the root of a cloud hub).
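The kind of checks performed by is_url(), is_github_url() and is_github_repo_url() can be sketched in Python with the standard urllib.parse module. The exact rules hubUtils applies are not stated here, so these are assumptions: only http(s) URLs with a host count as URLs, and a repository URL is taken to be exactly github.com/<owner>/<repo>.

```python
from urllib.parse import urlparse


def is_url(x: str) -> bool:
    """True if x parses as an http(s) URL with a host."""
    parsed = urlparse(x)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def is_github_url(x: str) -> bool:
    """True if x is an http(s) URL whose host is github.com."""
    return is_url(x) and urlparse(x).netloc.lower() in ("github.com", "www.github.com")


def is_github_repo_url(x: str) -> bool:
    """True if x points at a repository root, i.e. github.com/<owner>/<repo>."""
    if not is_github_url(x):
        return False
    segments = [s for s in urlparse(x).path.split("/") if s]
    return len(segments) == 2


print(is_github_repo_url("https://github.com/hubverse-org/hubUtils"))                    # True
print(is_github_repo_url("https://github.com/hubverse-org/hubUtils/blob/main/NEWS.md"))  # False
```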
hubUtils 0.4.0
- Released schemas are now shipped with the package, so an internet
connection is no longer necessary for local validation. Released
versions of hubUtils will only ever contain released schema versions,
while development versions of hubUtils (installed from GitHub) may
contain schema versions under active development.
- Added subset_task_id_names() function to subset task ID names from a
character vector of column names (#149).
- Added functions subset_task_id_cols() and subset_std_cols() to subset
a model_out_tbl or submission tbl to task ID or standard (non-task ID)
columns respectively (#149).
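A Python sketch of the idea behind subset_task_id_names() and subset_std_cols(): split column names into the standard hubverse columns and everything else. It assumes the standard columns are model_id, output_type, output_type_id and value, and that every other column is a task ID; the helper names are hypothetical and this is not the hubUtils implementation.

```python
# Assumed standard (non-task ID) columns of a hubverse model output table.
STD_COLNAMES = ("model_id", "output_type", "output_type_id", "value")


def task_id_names(colnames):
    """Keep only columns that are not standard columns, i.e. the task ID columns."""
    return [c for c in colnames if c not in STD_COLNAMES]


def std_names(colnames):
    """Keep only the standard columns, preserving their original order."""
    return [c for c in colnames if c in STD_COLNAMES]


cols = ["model_id", "origin_date", "horizon", "location",
        "output_type", "output_type_id", "value"]
print(task_id_names(cols))  # ['origin_date', 'horizon', 'location']
print(std_names(cols))      # ['model_id', 'output_type', 'output_type_id', 'value']
```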
hubUtils 0.3.0
- schema_id version checks are now silenced by default in read_config()
and read_config_file().
- Add and export the hubValidations functions get_hub_timezone(),
get_hub_model_output_dir() and get_hub_file_formats(), for extracting
hub metadata, to the hubUtils package.
- Add new function get_hub_derived_task_ids() to extract round- or
hub-level derived task ID values from a tasks.json config file.
hubUtils 0.2.0
- Add family of functions for extracting the version number from a
variety of sources:
  - get_version_config(): extract version from a <config> class object.
  - get_version_config_file(): extract version from a config file by specifying a config_path.
  - get_version_hub(): extract version from a config file by specifying a hub_path.
- Add family of functions for comparing the version number extracted
from a variety of sources to a given version number (#171):
  - version_equal(): Check whether a schema version property is equal to a given version.
  - version_gte(): Check whether a schema version property is equal to or greater than a given version.
  - version_gt(): Check whether a schema version property is greater than a given version.
  - version_lte(): Check whether a schema version property is equal to or less than a given version.
  - version_lt(): Check whether a schema version property is less than a given version.
- <config> class objects now have a type attribute to track what type of
config they contain (i.e. "tasks" or "admin").
- read_config() and read_config_file() will attempt to coerce their
output to a <config> class object, with a warning if unsuccessful (#173).
- Add as_config() function to coerce a config list to a <config> class
object (from the hubAdmin package) (#173).
- Fix bug in extract_schema_version() where only single digits from each
version component were being extracted.
- Fix documentation for get_schema_version_latest() to no longer use
v1.0.0.
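Why the version-comparison helpers compare components numerically rather than as strings is easiest to see in a small Python sketch. It is not the hubUtils API: the real functions first extract the version from a config object, file or hub, while this hypothetical compare_version() skips that step and compares two "vMAJOR.MINOR.PATCH" strings directly.

```python
import operator
import re

# Comparison operators keyed by the suffixes used in the version_* function names.
_OPS = {"equal": operator.eq, "gte": operator.ge, "gt": operator.gt,
        "lte": operator.le, "lt": operator.lt}


def parse_version(v: str) -> tuple:
    """Turn 'v3.0.1' (leading 'v' optional) into (3, 0, 1) for numeric comparison."""
    m = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", v)
    if m is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {v!r}")
    return tuple(int(g) for g in m.groups())


def compare_version(version: str, target: str, how: str) -> bool:
    """Compare two version strings component by component."""
    return _OPS[how](parse_version(version), parse_version(target))


print(compare_version("v3.0.1", "v3.0.0", "gte"))  # True
print(compare_version("v10.0.0", "v9.9.9", "gt"))  # True (plain string comparison would say False)
```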
hubUtils 0.1.7
- First submission to CRAN
- Removed hubData dependency
hubUtils 0.1.2
- Bug fix: v3 config utilities now detect configs as v3 if they are
v3.0.0 or above, not just v3.0.0. Thanks to @M-7th for reporting.
hubUtils 0.1.1
- Remove hubAdmin Suggests dependency by moving test hub configuration
validation to CI (resolved: @annakrystalli,
https://github.com/hubverse-org/hubUtils/issues/158)
hubUtils 0.1.0
- Add read_config_file() helper function to read a JSON config file from
a file path.
- Add extract_schema_version() helper function to extract the schema
version from a schema id or config schema_version property character
string.
- Add helpers is_v3_config(), is_v3_config_file() and is_v3_config_hub()
to check whether a config object, file or hub is using schema version 3.
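Extracting a schema version from a schema id comes down to a regular expression. The Python sketch below is an illustration, not the hubUtils code: extract_version() is a hypothetical name, and the schema id URL is an illustrative example. Matching each component with \d+ rather than a single \d keeps multi-digit components such as v10.12.3 intact.

```python
import re


def extract_version(schema_id: str):
    """Return the first 'vMAJOR.MINOR.PATCH' token in a schema id, or None."""
    # \d+ (not \d) so that multi-digit components such as v10.12.3 are captured whole.
    m = re.search(r"v\d+\.\d+\.\d+", schema_id)
    return m.group(0) if m else None


schema_id = "https://raw.githubusercontent.com/hubverse-org/schemas/main/v3.0.1/tasks-schema.json"
print(extract_version(schema_id))  # v3.0.1
```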
hubUtils 0.0.2
- Missing dependency (jsonlite) bug fix.
hubUtils 0.0.1 MAJOR RELEASE
- First major release of the hubUtils package, containing significant
breaking changes. Much of the package has been moved and split across
two smaller and more dedicated packages:
  - hubData package: contains functions for connecting to and interacting with hub data.
    - Exported functions moved to hubData: connect_hub(), connect_model_output(), expand_model_out_val_grid(), create_model_out_submit_tmpl(), coerce_to_character(), coerce_to_hub_schema() and create_hub_schema().
    - hubUtils functions re-exported in hubData: as_model_out_tbl(), validate_model_out_tbl(), model_id_split() and model_id_merge().
  - hubAdmin package: contains functions for administering Hubs, in particular creating and validating hub configuration files. Exported functions moved to hubAdmin:
    - Functions for creating config files: create_config(), create_model_task(), create_model_tasks(), create_output_type(), create_output_type_cdf(), create_output_type_mean(), create_output_type_median(), create_output_type_pmf(), create_output_type_quantile(), create_output_type_sample(), create_round(), create_rounds(), create_target_metadata(), create_target_metadata_item(), create_task_id(), create_task_ids().
    - Functions for validating config files: validate_config(), validate_model_metadata_schema(), validate_hub_config(), view_config_val_errors().
hubUtils 0.0.0.9019
- Minor internal bug fixes and documentation updates.
hubUtils 0.0.0.9018
- Added US and European location datasets. These can be used, e.g., when
assigning location task ID values for tasks.json config files
programmatically (#127).
hubUtils 0.0.0.9017
- connect_hub() and connect_model_output() now identify and report on
files that are present and should have been opened but for which a
connection was not successful (#124).
- Introduced a number of minor documentation clarifications and bug
fixes (#129, #128, #121, #130)
hubUtils 0.0.0.9016
- Added validate_model_metadata_schema() function and included it as
part of validate_hub_config() (#110 & #112).
hubUtils 0.0.0.9015
- Added load_model_metadata() function to compile hub model metadata.
hubUtils 0.0.0.9014
- Added coerce_to_character() function for coercing all model output
columns to character. This can be much faster than using
coerce_to_hub_schema(), especially for dates.
- Added the following parameters to expand_model_out_val_grid():
  - all_character: allow for returning all character columns.
  - as_arrow_table: allow for returning an arrow data table.
  - bind_model_tasks: allow for returning a list of model task level grids.
- Bug fix: handle situation in expand_model_out_val_grid() when
required_vals_only = TRUE yet required task ID columns are not
consistent across modeling tasks. The function now pads missing task ID
column values with NAs.
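The padding behaviour can be pictured with a Python sketch: expand each modeling task's task ID values into a grid, then stack the grids, filling columns a task does not define with a missing value (None standing in for R's NA). This is an illustration under those assumptions, not expand_model_out_val_grid() itself; the helper names and task ID values are hypothetical.

```python
from itertools import product


def expand_grid(task_ids: dict) -> list:
    """All combinations of one modeling task's task ID values, as a list of row dicts."""
    names = list(task_ids)
    return [dict(zip(names, combo)) for combo in product(*(task_ids[n] for n in names))]


def bind_padded(grids: list) -> list:
    """Stack row dicts from several grids, filling absent columns with None (NA)."""
    columns = []
    for grid in grids:
        for row in grid:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [{col: row.get(col) for col in columns} for grid in grids for row in grid]


task_a = {"origin_date": ["2023-01-02"], "location": ["US", "01"]}
task_b = {"origin_date": ["2023-01-02"], "horizon": [1]}
for row in bind_padded([expand_grid(task_a), expand_grid(task_b)]):
    print(row)
```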
hubUtils 0.0.0.9013
- Introduced coerce_to_hub_schema() function and applied it to
create_model_out_submit_tmpl() & expand_model_out_val_grid() to ensure
column data types in returned tibbles are consistent with the hub's
schema (#100).
- Fixed bug where optional mean/median output types were being included
erroneously when required_vals_only = TRUE.
- Exported function get_round_task_id_names() (#99).
- Memoized function read_config() (#101).
hubUtils 0.0.0.9012
- Fixed bug (#95 & #97) which was causing connect_hub() to error when
"csv" was an accepted hub file format but there were no CSV files in the
model output directory. connect_hub() now checks for the presence of
files of each accepted file format and only opens datasets for file
formats for which files exist. If there are no files of any accepted
file_format in the model output directory, the S3 hub_connection object
returned consists of an empty list.
- Fixed bug (#96) which required hubUtils to be loaded for std_colnames
to be available internally.
hubUtils 0.0.0.9011
- Changed default behavior of create_model_out_submit_tmpl(). The
function now, by default, returns rows of complete cases only, and this
behavior is controlled by argument complete_cases_only. Argument
remove_empty_cols was also removed.
hubUtils 0.0.0.9010
- Support for Hubs using schema versions earlier than v2.0.0 is
deprecated. Currently a warning is issued when interacting with such
Hubs. Support will eventually be retired completely, and errors will be
produced with Hubs using older config schema.
- Added create_model_out_submit_tmpl() for generating round-specific
model output template tibbles (#82).
- Added lower level utilities:
  - expand_model_out_val_grid(): for creating an expanded grid of valid task ID and output type ID values across round modeling tasks and output types.
  - get_round_idx(): for getting an integer index of the element in config_tasks$rounds that a character round identifier maps to.
  - get_round_ids(): for getting a list or character vector of Hub round IDs.
- Added additional tasks.json validation checks via validate_config():
  - Check that all task_id and output_type_id values are unique across required and optional properties.
  - In rounds where round_id_from_variable is TRUE, check that the specification of the task_id set as round_id is consistent across modeling tasks.
  - Check that round_id values are unique across rounds.
- Exported object std_colnames, which contains standard column names
used in hubverse model output data files, for use in other hubverse
packages (#88).
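The first validation check, that a task ID value may not appear in both the required and optional arrays (or twice in either), can be sketched in Python over an already-parsed tasks.json fragment. This is not validate_config(): the helper name duplicated_task_id_values() and the example task IDs are hypothetical, and JSON null is assumed to arrive as None.

```python
def duplicated_task_id_values(task_ids: dict) -> dict:
    """Map each task ID to the values that appear more than once across required + optional."""
    problems = {}
    for name, spec in task_ids.items():
        # JSON null for either property is read as None; treat it as an empty list.
        combined = list(spec.get("required") or []) + list(spec.get("optional") or [])
        dupes = sorted({v for v in combined if combined.count(v) > 1}, key=str)
        if dupes:
            problems[name] = dupes
    return problems


task_ids = {
    "horizon": {"required": [1, 2], "optional": [2, 3]},
    "location": {"required": None, "optional": ["US", "01"]},
}
print(duplicated_task_id_values(task_ids))  # {'horizon': [2]}
```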
hubUtils 0.0.0.9009
- Added as_model_out_tbl() function to standardize model output data by
converting to a model_out_tbl S3 class object. (#32, #33, #63, #64, #66)
- To support back-compatibility with model output data in older hubs,
added functions model_id_merge() and model_id_split() to create a
model_id column from separate team_abbr and model_abbr columns and vice
versa (#63).
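The merge/split round trip can be sketched in Python. This is an illustration, not model_id_merge()/model_id_split(): the helper names are hypothetical, and it assumes a "-" separator and that team_abbr never contains the separator, so splitting at the first separator recovers both parts even when model_abbr itself contains "-".

```python
def merge_model_id(team_abbr: str, model_abbr: str, sep: str = "-") -> str:
    """Join team and model abbreviations into a single model_id."""
    if sep in team_abbr:
        # Otherwise the split below could not recover the original team_abbr.
        raise ValueError(f"team_abbr must not contain the separator {sep!r}")
    return f"{team_abbr}{sep}{model_abbr}"


def split_model_id(model_id: str, sep: str = "-") -> tuple:
    """Split a model_id at the first separator into (team_abbr, model_abbr)."""
    team_abbr, found, model_abbr = model_id.partition(sep)
    if not found:
        raise ValueError(f"separator {sep!r} not found in {model_id!r}")
    return team_abbr, model_abbr


print(merge_model_id("hub", "baseline"))  # hub-baseline
print(split_model_id("hub-my-model"))     # ('hub', 'my-model')
```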
hubUtils 0.0.0.9008
- Added argument output_type_id_datatype to connect_hub() to allow
overriding default behavior of automatically detecting the
output_type_id column data type from the tasks.json config file (#70).
- Exposed create_hub_schema() argument partitions to connect_hub()
function to accommodate custom hub partitioning.
- Added argument partition_names to connect_model_output() to
accommodate custom hub partitioning.
- Added argument schema to connect_model_output() to allow for
overriding default arrow schema auto-detection.
- Moved jsonvalidate package to Imports so that Hub administrator
functionality is accessible through a standard installation.
- Removed argument format from create_hub_schema(), which now creates
the same schema from a tasks.json config file, regardless of the data
file format (#80).
hubUtils 0.0.0.9007
- New function validate_hub_config() allows maintainers to check the
validity of hub config files in a single call. Function
view_config_val_errors() was also modified to create a combined report
for hub config files from the output of validate_hub_config().
- Breaking change: All model-output data are expected to have
output_type & output_type_id instead of type & type_id respectively.
hubUtils 0.0.0.9006
- connect_hub() now automatically determines the output_type_id column
data type from the tasks.json config file, coercing to the highest
possible data type, with “character” as the lowest common denominator.
- Introduced function create_hub_schema() for determining the schema for
data in a hub's model-output directory from a tasks.json config file.
- connect_hub() now allows establishing connections to hubs with
multiple file type formats.
- create_output_type_categorical() function was renamed to
create_output_type_pmf().
- When extracting data via a hub connection, the column containing model
identification information, inferred from model-output data directory
partitions, was renamed from “model” to “model_id”.
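One plausible reading of the output_type_id data type rule, with character as the fallback that can hold anything, is sketched below in Python. It is not the connect_hub() implementation: the helper name is hypothetical, the integer, then double, then character precedence is an assumption, and the candidate values are assumed to have already been gathered from every output type in tasks.json.

```python
def infer_output_type_id_type(values) -> str:
    """Pick the most specific type that fits every non-missing output_type_id value."""
    vals = [v for v in values if v is not None]

    def is_int(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, int) and not isinstance(v, bool)

    def is_num(v):
        return is_int(v) or isinstance(v, float)

    if vals and all(is_int(v) for v in vals):
        return "integer"
    if vals and all(is_num(v) for v in vals):
        return "double"
    # Anything else (strings, mixtures, or no values at all) falls back to character.
    return "character"


print(infer_output_type_id_type([0.1, 0.5, 0.9]))  # double
print(infer_output_type_id_type([0.5, "US"]))      # character
```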
hubUtils 0.0.0.9005
- Re-implemented connect_hub() function to open a connection to
model-output data through an arrow FileSystemDataset object. This allows
users to create custom dplyr queries to access model output data.
hubUtils 0.0.0.9004
- Added functionality to help create JSON configuration files.
hubUtils 0.0.0.9003
- Added validate_config() function to validate JSON configuration files
against Hub schema as well as function view_config_val_errors() for
viewing a concise and easier to navigate table of validation errors.
- Added a NEWS.md file to track changes to the package.