
vald.extractor extends the valdr package by
providing a production-ready, fault-tolerant pipeline for extracting,
cleaning, and visualizing VALD ForceDecks data across multiple sports.
It is designed for CRAN submission, with comprehensive documentation and
tryCatch-based error handling that lets partial extractions succeed.
Organizations using VALD ForceDecks face three critical challenges:
vald.extractor solves these problems through:
# Install from CRAN (when available)
install.packages("vald.extractor")
# Or install development version from GitHub
# install.packages("devtools")
devtools::install_github("praveenmaths89/vald.extractor")
library(vald.extractor)
# 1. Set VALD credentials
valdr::set_credentials(
client_id = "your_client_id",
client_secret = "your_client_secret",
tenant_id = "your_tenant_id",
region = "aue"
)
# 2. Fetch test and trial data in chunks (prevents timeout)
vald_data <- fetch_vald_batch(
start_date = "2020-01-01T00:00:00Z",
chunk_size = 100
)
# 3. Fetch and standardize athlete metadata
metadata <- fetch_vald_metadata(
client_id = "your_client_id",
client_secret = "your_client_secret",
tenant_id = "your_tenant_id"
)
athlete_metadata <- standardize_vald_metadata(
profiles = metadata$profiles,
groups = metadata$groups
)
# 4. Apply automated sports classification
athlete_metadata <- classify_sports(athlete_metadata)
table(athlete_metadata$sports_clean)
# 5. Transform to wide format and join with metadata
# ... (see vignette for complete pipeline)
# 6. Split by test type with suffix removal
test_datasets <- split_by_test(final_analysis_data)
cmj_data <- test_datasets$CMJ # Column names: "PEAK_FORCE_Both", not "PEAK_FORCE_Both_CMJ"
dj_data <- test_datasets$DJ # Same column names enable generic analysis
# 7. Generate summary statistics
summary_vald_metrics(cmj_data, group_vars = c("sex", "sports"))
# 8. Visualize trends and comparisons
plot_vald_trends(cmj_data, metric_col = "PEAK_FORCE_Both", group_col = "profileId")
plot_vald_compare(cmj_data, metric_col = "JUMP_HEIGHT_Both", group_col = "sports", fill_col = "sex")
# Processes 5000 tests without timeout errors
vald_data <- fetch_vald_batch(
start_date = "2020-01-01T00:00:00Z",
chunk_size = 100, # Adjust based on API performance
verbose = TRUE
)
# If chunk 23 fails, chunks 1-22 and 24+ still succeed
# Error messages indicate which rows failed for debugging
Why it matters: Organizations with large historical datasets (5000+ tests) cannot extract data in a single API call. The chunked approach with tryCatch error handling ensures partial extraction succeeds even if some chunks fail.
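The chunk-and-recover pattern described above can be sketched in base R as follows. This is a minimal illustration, not the package's actual implementation: `process_in_chunks()` and `fake_fetch()` are hypothetical names, and the stand-in fetcher fails on purpose for one chunk to show that the remaining chunks still come back.

```r
# Hypothetical helper: apply `fetch_fun` to `ids` in chunks of `chunk_size`,
# keeping successful chunks and recording failed ones instead of stopping.
process_in_chunks <- function(ids, chunk_size, fetch_fun) {
  chunk_index <- ceiling(seq_along(ids) / chunk_size)
  chunks <- split(ids, chunk_index)
  results <- list()
  failed <- integer(0)
  for (i in seq_along(chunks)) {
    out <- tryCatch(
      fetch_fun(chunks[[i]]),
      error = function(e) {
        # Report which chunk failed, then carry on with the next one
        message("Chunk ", i, " failed: ", conditionMessage(e))
        NULL
      }
    )
    if (is.null(out)) {
      failed <- c(failed, i)
    } else {
      results[[length(results) + 1]] <- out
    }
  }
  list(data = do.call(rbind, results), failed_chunks = failed)
}

# Stand-in for an API call (assumption): errors when the chunk contains id 7.
fake_fetch <- function(chunk) {
  if (7 %in% chunk) stop("simulated timeout")
  data.frame(test_id = chunk, value = chunk * 10)
}

res <- process_in_chunks(ids = 1:10, chunk_size = 3, fetch_fun = fake_fetch)
res$failed_chunks  # index of the chunk that errored
nrow(res$data)     # rows recovered from the other chunks
```

Returning the failed chunk indices alongside the data makes it possible to retry only those chunks later rather than repeating the whole extraction.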
metadata <- classify_sports(metadata, group_col = "all_group_names")
# Before:
# "Team A - Football", "Soccer U18", "FSI Elite", "Basketball", "BBall"
# After:
# "Football", "Football", "Football", "Basketball", "Basketball"
table(metadata$sports_clean)
#>      Football    Basketball       Cricket      Swimming Track & Field
#>           523           198           145            87           234
The Value Add: Multi-sport organizations waste hours manually categorizing athletes. This regex-based system handles 15+ sports out-of-the-box and is easily extensible.
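To make the regex-based idea concrete, here is a small base R sketch. The pattern table and the `classify_by_regex()` helper are illustrative assumptions and do not reproduce the rule set shipped with `classify_sports()`; they only show how a named vector of patterns can map free-text group names onto a clean label.

```r
# Illustrative pattern table (assumption): the first matching pattern wins.
sport_patterns <- c(
  Football   = "football|soccer|fsi",
  Basketball = "basketball|bball"
)

# Hypothetical helper: label each group name with the first sport whose
# pattern matches, falling back to `default` when nothing matches.
classify_by_regex <- function(group_names, patterns, default = "Unclassified") {
  out <- rep(default, length(group_names))
  for (sport in names(patterns)) {
    hit <- out == default &
      grepl(patterns[[sport]], group_names, ignore.case = TRUE)
    out[hit] <- sport
  }
  out
}

groups <- c("Team A - Football", "Soccer U18", "FSI Elite",
            "Basketball", "BBall", "Rowing Squad")
classify_by_regex(groups, sport_patterns)
```

Under this design, supporting another sport means adding one named entry to the pattern vector; unmatched groups stay visible as "Unclassified" instead of being silently dropped.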
# Write analysis code ONCE that works for ALL test types
library(dplyr)  # provides %>% and mutate()
analyze_bilateral_asymmetry <- function(test_data) {
test_data %>%
mutate(
asymmetry = (PEAK_FORCE_Left - PEAK_FORCE_Right) /
((PEAK_FORCE_Left + PEAK_FORCE_Right) / 2) * 100
)
}
# Apply to CMJ, DJ, ISO without code changes
test_datasets <- split_by_test(final_data)
cmj_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$CMJ)
dj_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$DJ)
iso_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$ISO)
DRY Principle: Without suffix removal, you’d need separate code for PEAK_FORCE_Left_CMJ, PEAK_FORCE_Left_DJ, etc. This package enables true generic programming.
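The effect of suffix removal can be illustrated with a small base R sketch. The data frame and the `strip_test_suffix()` and `asymmetry_pct()` helpers below are hypothetical and only assume that metric columns end in `_<test type>`; they are not the internals of `split_by_test()`.

```r
# Hypothetical wide data with test-type suffixes on the metric columns.
wide <- data.frame(
  profileId            = c("a1", "a2"),
  PEAK_FORCE_Left_CMJ  = c(1000, 900),
  PEAK_FORCE_Right_CMJ = c(1000, 1100),
  PEAK_FORCE_Left_DJ   = c(1200, 950),
  PEAK_FORCE_Right_DJ  = c(1100, 950)
)

# Keep the id column plus the columns ending in _<test>, then drop the suffix.
strip_test_suffix <- function(data, test, id_cols = "profileId") {
  suffix <- paste0("_", test, "$")
  metric_cols <- grep(suffix, names(data), value = TRUE)
  out <- data[, c(id_cols, metric_cols), drop = FALSE]
  names(out) <- sub(suffix, "", names(out))
  out
}

# Same formula as analyze_bilateral_asymmetry() above, written in base R.
asymmetry_pct <- function(d) {
  (d$PEAK_FORCE_Left - d$PEAK_FORCE_Right) /
    ((d$PEAK_FORCE_Left + d$PEAK_FORCE_Right) / 2) * 100
}

cmj <- strip_test_suffix(wide, "CMJ")
dj  <- strip_test_suffix(wide, "DJ")

names(cmj)          # "profileId" "PEAK_FORCE_Left" "PEAK_FORCE_Right"
asymmetry_pct(cmj)  # 0 and -20: the second athlete favours the right side
asymmetry_pct(dj)   # the same function runs unchanged on DJ data
```

Because both subsets end up with identical column names, the asymmetry function never needs to know which test type it is looking at.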
# Fix missing/incorrect demographics from external Excel file
cmj_data <- patch_metadata(
data = cmj_data,
patch_file = "corrections.xlsx",
fields_to_patch = c("sex", "dateOfBirth")
)
# Unknown values are replaced with corrections
table(cmj_data$sex)
#> Before: Male: 450, Female: 380, Unknown: 45
#> After: Male: 470, Female: 405, Unknown: 0
# Longitudinal trends
plot_vald_trends(
data = cmj_data,
metric_col = "JUMP_HEIGHT_Both",
group_col = "profileId",
facet_col = "sports"
)
# Cross-sectional comparisons
plot_vald_compare(
data = cmj_data,
metric_col = "PEAK_FORCE_Both",
group_col = "sports",
fill_col = "sex"
)
?fetch_vald_batch, ?standardize_vald_metadata, ?split_by_test, etc.
vald.extractor is designed for:
| Task | Manual Workflow | vald.extractor |
|---|---|---|
| Extract 5000 tests | ❌ API timeout errors | ✅ Chunked processing (15 min) |
| Classify 500 athletes into sports | ❌ 2-3 hours manual work | ✅ Automated (30 sec) |
| Analyze CMJ, DJ, ISO separately | ❌ Duplicate code for each | ✅ Generic functions |
| Handle missing demographics | ❌ Manual data entry | ✅ Excel patch import |
| Generate summary tables | ❌ Custom scripts | ✅ summary_vald_metrics() |
| Create visualizations | ❌ ggplot2 from scratch | ✅ Pre-built themes |
The R Journal article will focus on:
Key Message: “Automating domain-specific data taxonomy for multi-organizational sports science”
If you use vald.extractor in published research, please
cite:
Chougale PD, Anathakumar U (2026). vald.extractor: Robust Pipeline for VALD
ForceDecks Data Extraction and Analysis. R package version 0.1.0.
https://github.com/praveenmaths89/vald.extractor
Contributions are welcome! Please:
git checkout -b feature/new-sport-taxonomy)
Common contributions:
classify_sports()
MIT License - see LICENSE file for details.
valdr package
vignette("end-to-end-pipeline", package = "vald.extractor")
Status: Ready for CRAN submission pending:
- [ ] Final testing on multiple VALD tenants
- [ ] CRAN comment responses
- [ ] Logo design (hex sticker)
- [ ] pkgdown website deployment
Maintainer: Praveen D Chougale (praveenmaths89@gmail.com)