Diagnostic Tools for Data Frame Joins in R
The joinspy package helps you understand and debug join
operations by analyzing key columns before and after joins, detecting
common issues, and explaining unexpected row count changes. Catch
problems early instead of discovering them when downstream analysis
breaks.
library(joinspy)
# Pre-join diagnostics
report <- join_spy(orders, customers, by = "customer_id")
summary(report)
# Quick pass/fail check
key_check(orders, customers, by = "customer_id")
# Safe join with cardinality enforcement
result <- join_strict(orders, customers, by = "customer_id", expect = "1:1")
# Auto-repair common issues
orders_fixed <- join_repair(orders, by = "customer_id")Joins silently produce unexpected results when:
These problems are discovered only when downstream analysis breaks.
joinspy catches them upfront by analyzing keys
before you join, explaining why joins
misbehave, and showing where the problems are.
join_spy(): Comprehensive pre-flight
diagnostic reportkey_check(): Quick pass/fail key
quality assessmentkey_duplicates(): Find and locate
duplicate keysjoin_explain(): Explain row count
changes after a joinjoin_diff(): Compare before/after
statesjoin_strict(): Join with cardinality
enforcement (1:1, 1:m, m:1,
m:m)left_join_spy(),
right_join_spy(),
inner_join_spy(),
full_join_spy(): Joins with automatic
diagnosticslast_report(): Retrieve diagnostics
after silent (.quiet = TRUE) joinsjoin_repair(): Fix whitespace, case,
encoding, empty strings automaticallysuggest_repairs(): Generate R code
snippets to fix detected issuesdetect_cardinality(): Determine actual
relationship (1:1, 1:m, m:1, m:m)check_cartesian(): Warn about
Cartesian product explosionsanalyze_join_chain(): Analyze
multi-table join sequencesplot(): Venn diagram of key overlap
(with optional file param to save)summary(): Compact metrics table (with
optional format param for text/markdown)log_report(): Write reports to file
(text/JSON/RDS)set_log_file(): Enable automatic
logging# Install from CRAN (when available)
install.packages("joinspy")
# Or install development version from GitHub
# install.packages("pak")
pak::pak("gcol33/joinspy")library(joinspy)
orders <- data.frame(
customer_id = c("A", "B", "B", "C", "D "),
amount = c(100, 200, 150, 300, 50),
stringsAsFactors = FALSE
)
customers <- data.frame(
customer_id = c("A", "B", "C", "D", "E"),
name = c("Alice", "Bob", "Carol", "David", "Eve"),
stringsAsFactors = FALSE
)
# Full diagnostic report
report <- join_spy(orders, customers, by = "customer_id")
# Compact summary
summary(report)
#> metric value
#> 1 left_rows 5
#> 2 right_rows 5
#> 3 left_unique_keys 4
#> 4 right_unique_keys 5
#> ...# Succeeds - 1:1 relationship
products <- data.frame(id = 1:3, name = c("Widget", "Gadget", "Gizmo"))
prices <- data.frame(id = 1:3, price = c(10, 20, 30))
join_strict(products, prices, by = "id", expect = "1:1")
# Fails - duplicates violate 1:1
prices_dup <- data.frame(id = c(1, 1, 2, 3), price = c(10, 15, 20, 30))
join_strict(products, prices_dup, by = "id", expect = "1:1")
#> Error: Cardinality violation: expected '1:1' but found '1:m'messy <- data.frame(
id = c(" A", "B ", " C "),
value = 1:3,
stringsAsFactors = FALSE
)
# Preview what would be fixed
join_repair(messy, by = "id", dry_run = TRUE)
# Apply fixes
fixed <- join_repair(messy, by = "id")
fixed$id
#> [1] "A" "B" "C"# Silent join for pipelines
result <- left_join_spy(orders, customers, by = "customer_id", .quiet = TRUE)
# Access diagnostics afterward
last_report()$match_analysis$match_rate
#> [1] 0.8report <- join_spy(orders, customers, by = "customer_id")
# Venn diagram
plot(report)
# Save to file
plot(report, file = "overlap.png")relationship argument provides basic cardinality
checksjoinspy fills the gap: it tells you why
joins misbehave and where the problems are.
“Software is like sex: it’s better when it’s free.” — Linus Torvalds
I’m a PhD student who builds R packages in my free time because I believe good tools should be free and open. I started these projects for my own work and figured others might find them useful too.
If this package saved you some time, buying me a coffee is a nice way to say thanks.
MIT (see the LICENSE.md file)
@software{joinspy,
author = {Colling, Gilles},
title = {joinspy: Diagnostic Tools for Data Frame Joins},
year = {2025},
url = {https://github.com/gcol33/joinspy}
}