| Title: | Translating from R to Python's Pandas Package |
| Version: | 0.1.4 |
| Description: | Provides an R interface to Python's 'pandas' library using non-standard evaluation. Users can write R code (e.g., rp_filter(), rp_select(), rp_mutate()) that is translated into pandas commands and executed via 'reticulate'. Supports chaining, grouping, and 'summarisation', and includes a 'table_name' parameter to generate 'copy-pasteable' Python code. Ideal for leveraging pandas' speed and flexibility within the R ecosystem. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Suggests: | ggplot2, knitr, rmarkdown, testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| Imports: | reticulate, rlang |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-04-27 15:55:05 UTC; akshat |
| Author: | Akshat Maurya [aut, cre], Rihaan Satia [aut], David Shilane [aut] |
| Maintainer: | Akshat Maurya <codingmaster902@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-04-28 20:30:09 UTC |
rPandas: A User-Friendly R Interface to Pandas
Description
This package provides a set of wrapper functions that allow R users to interact with Python's pandas library using familiar R syntax.
Author(s)
Maintainer: Akshat Maurya <codingmaster902@gmail.com>
Authors:
Rihaan Satia
David Shilane david.shilane@columbia.edu
"Not In" Operator
Description
Negates the standard R %in% operator: the result is TRUE for each element of x that has no match in y.
Usage
x %notin% y
Arguments
x |
Vector of values to be matched. |
y |
Vector of values to be matched against. |
Value
A logical vector.
Examples
"a" %notin% c("b", "c")
Create a chained pandas command string
Description
This internal function assembles a Python command string for pandas by chaining together different data manipulation methods. It serves as the central command generator for the package.
Usage
create_pandas_statement(
df_name,
filter_str = NULL,
select_str = NULL,
sort_by_str = NULL,
sort_asc_str = NULL,
assign_str = NULL,
drop_str = NULL,
groupby_str = NULL,
agg_str = NULL,
head_k = NULL,
tail_k = NULL
)
Arguments
df_name |
A character string for the name of the pandas DataFrame. |
filter_str |
A string for the |
select_str |
A string for column selection (e.g., |
sort_by_str |
A string for the |
sort_asc_str |
A string for the |
assign_str |
A string for the |
drop_str |
A string for the |
groupby_str |
A string for the |
agg_str |
A string for the |
head_k |
Integer for |
tail_k |
Integer for |
Value
A character string of the complete, chained pandas command.
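The chaining idea can be pictured with a short Python sketch. This is not the package's R implementation: the function name chain_pandas_statement, the reduced argument list, and the order in which methods are appended are all assumptions made for illustration.

```python
def chain_pandas_statement(df_name, filter_str=None, select_str=None,
                           sort_by_str=None, sort_asc_str=None, head_k=None):
    """Hypothetical sketch: append one pandas method call per supplied argument."""
    parts = [df_name]
    if filter_str is not None:
        # repr() adds the quotes that .query() needs around its expression
        parts.append(f".query({filter_str!r})")
    if sort_by_str is not None:
        asc = sort_asc_str if sort_asc_str is not None else "True"
        parts.append(f".sort_values(by={sort_by_str}, ascending={asc})")
    if select_str is not None:
        # select_str is already a Python list literal such as "['a', 'b']"
        parts.append(f"[{select_str}]")
    if head_k is not None:
        parts.append(f".head({int(head_k)})")
    return "".join(parts)


command = chain_pandas_statement(
    "df",
    filter_str="carat > 1",
    select_str="['carat', 'price']",
    head_k=5,
)
print(command)  # df.query('carat > 1')[['carat', 'price']].head(5)
```

Arguments left as None contribute nothing to the chain, which is why every argument after df_name can default to NULL in the R signature.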
Execute a pandas command string on an R data frame
Description
This is the core execution engine. It explicitly injects an R data frame into the Python session, runs a command, retrieves the result, and cleans up.
Usage
execute_pandas_statement(
r_df,
py_command,
table_name = table_name,
return.as = "result"
)
Arguments
r_df |
An R data.frame. |
py_command |
A character string of Python code using 'df' as a placeholder. |
table_name |
An optional character string. If provided, the generated Python code will replace the internal dataframe name with this string (e.g., |
return.as |
What to return: "result", "code", or "all". |
Value
Depending on return.as: a data frame, a character string of Python code, or a list.
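The table_name substitution can be sketched in Python as a whole-word replacement of the 'df' placeholder. The helper name render_code and the regular-expression approach are assumptions; the package may perform the replacement differently.

```python
import re


def render_code(py_command, table_name=None):
    """Hypothetical helper: swap the 'df' placeholder for a user-facing name."""
    if table_name is None:
        return py_command
    # \b restricts the match to the standalone word 'df', so identifiers
    # such as 'df2' or 'pdf' are left untouched. A literal 'df' inside a
    # quoted string would still be replaced by this simple version.
    return re.sub(r"\bdf\b", table_name, py_command)


code = render_code("df.query('carat > 1').head(5)", table_name="diamonds")
print(code)  # diamonds.query('carat > 1').head(5)
```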
Apply multiple summary functions to multiple columns
Description
Applies a list of summary functions to a list of columns, after optionally grouping the data.
Usage
rp_calculate(
.data,
...,
the.functions,
.by = NULL,
table_name = NULL,
return.as = "result"
)
Arguments
.data |
An R data.frame. |
... |
Bare column names to summarize (e.g., |
the.functions |
A character vector of R function names
(e.g., |
.by |
A bare column name or |
table_name |
An optional character string. If provided, the generated Python code will replace the internal dataframe name with this string (e.g., |
return.as |
What to return: "result", "code", or "all". |
Value
A data.frame with the summarized and grouped data.
Examples
if (reticulate::py_available(initialize = TRUE) &&
reticulate::py_module_available("pandas")) {
rp_calculate(
ggplot2::diamonds,
price, carat,
the.functions = c("mean", "sd"),
.by = cut
)
}
Check for rPandas dependencies and provide diagnostics
Description
This function checks if the user's system is correctly configured with Python and the pandas library. If dependencies are missing, it stops with a detailed diagnostic report and actionable instructions (only in interactive sessions). In non‑interactive contexts (e.g., CRAN checks), it issues a warning and returns FALSE.
Usage
rp_check_env()
Value
Invisibly returns TRUE if all checks pass, otherwise FALSE.
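For troubleshooting from the Python side, a check along the following lines reports whether the interpreter in use can locate pandas. It is a minimal sketch, not the logic of rp_check_env(); the function name pandas_available is hypothetical.

```python
import importlib.util
import sys


def pandas_available():
    """Return True if this interpreter can locate pandas (without importing it)."""
    return importlib.util.find_spec("pandas") is not None


# Print the interpreter version alongside the result to help spot
# a mismatch between the Python that R uses and the one with pandas installed.
print(sys.version_info[:2], pandas_available())
```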
Count rows in a data frame, optionally by groups
Description
This function returns the number of rows in a data frame. When grouping
variables are provided via .by, it returns the row counts for each group.
Usage
rp_count(.data, .by = NULL, table_name = NULL, return.as = "result")
Arguments
.data |
An R data frame (or tibble) to be processed. |
.by |
Optional grouping variables. Can be one or more unquoted column names
(e.g., |
table_name |
An optional character string. If provided, the generated Python code will replace the internal dataframe name with this string (e.g., |
return.as |
One of |
Value
A data frame with one column "n" (total row count) if .by = NULL,
or a data frame with the grouping columns and a column "n" (per‑group counts).
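The two result shapes can be illustrated with a plain-Python sketch that needs no pandas. The rows below are made-up illustration data, and count_rows is a hypothetical stand-in that mimics only the shape of the output, not how the package computes it.

```python
from collections import Counter

# Made-up rows standing in for a data frame
rows = [
    {"cut": "Ideal", "price": 326},
    {"cut": "Premium", "price": 326},
    {"cut": "Ideal", "price": 340},
]


def count_rows(rows, by=None):
    """Return one record with 'n', or one record per group with 'n'."""
    if by is None:
        return [{"n": len(rows)}]
    # Count how often each combination of grouping values occurs
    counts = Counter(tuple(row[col] for col in by) for row in rows)
    return [dict(zip(by, key), n=n) for key, n in counts.items()]


print(count_rows(rows))              # [{'n': 3}]
print(count_rows(rows, by=["cut"]))  # [{'cut': 'Ideal', 'n': 2}, {'cut': 'Premium', 'n': 1}]
```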
Filter rows using pandas
Description
Filters a data frame using an R expression translated to pandas.
Usage
rp_filter(.data, filter_expression, table_name = NULL, return.as = "result")
Arguments
.data |
An R data.frame or tibble. |
filter_expression |
The filtering expression, written in R syntax. |
table_name |
An optional character string. If provided, the generated Python code will replace the internal dataframe name with this string (e.g., |
return.as |
What to return: "result", "code", or "all". |
Value
A data.frame containing the filtered rows.
Examples
if (reticulate::py_available(initialize = TRUE) &&
reticulate::py_module_available("pandas")) {
rp_filter(ggplot2::diamonds, carat > 1 & price < 4000)
}
Extract the first k rows of a data frame
Description
This function returns the first k rows of the data frame. If grouping variables
are provided via .by, it returns the first k rows within each group.
Usage
rp_first_k_rows(.data, k, .by = NULL, table_name = NULL, return.as = "result")
Arguments
.data |
An R data frame (or tibble) to be processed. |
k |
An integer specifying the number of rows to return. If |
.by |
Optional grouping variables. Can be one or more unquoted column names. When provided, the operation is performed on each group separately. |
table_name |
An optional character string. If provided, the generated Python code will replace the internal dataframe name with this string (e.g., |
return.as |
One of |
Value
Depending on return.as: a data frame, a character string, or a list.
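The difference between the ungrouped and grouped behaviour can be sketched in plain Python. The rows are made-up illustration data and first_k_rows here is a hypothetical stand-in; it assumes rows keep their original order, which is how pandas' grouped head behaves.

```python
from collections import defaultdict

# Made-up rows standing in for a data frame
rows = [
    {"cut": "Ideal", "price": 326},
    {"cut": "Premium", "price": 326},
    {"cut": "Ideal", "price": 340},
]


def first_k_rows(rows, k, by=None):
    """Keep the first k rows overall, or the first k rows of each group."""
    if by is None:
        return rows[:k]
    seen = defaultdict(int)  # rows kept so far, per group
    kept = []
    for row in rows:
        key = tuple(row[col] for col in by)
        if seen[key] < k:
            kept.append(row)
            seen[key] += 1
    return kept


print(first_k_rows(rows, 2))
print(first_k_rows(rows, 1, by=["cut"]))
```

With by=["cut"] and k = 1, the first "Ideal" row and the first "Premium" row are kept and the second "Ideal" row is dropped.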
Extract the last k rows of a data frame
Description
This function returns the last k rows of the data frame. If grouping variables
are provided via .by, it returns the last k rows within each group.
Usage
rp_last_k_rows(.data, k, .by = NULL, table_name = NULL, return.as = "result")
Arguments
.data |
An R data frame (or tibble) to be processed. |
k |
An integer specifying the number of rows to return. If |
.by |
Optional grouping variables. Can be one or more unquoted column names. When provided, the operation is performed on each group separately. |
table_name |
An optional character string. If provided, the generated Python code will replace the internal dataframe name with this string (e.g., |
return.as |
One of |
Value
Depending on return.as: a data frame, a character string, or a list.
Mutate (add/modify/remove) columns using pandas
Description
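Adds, modifies, or removes columns of a data frame. Named expressions passed through ... are translated into a pandas .assign() call, and the columns listed in to_remove are dropped.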
Usage
rp_mutate(
.data,
to_remove = NULL,
...,
table_name = NULL,
return.as = "result"
)
Arguments
.data |
An R data frame. |
to_remove |
A character vector of column names to remove. |
... |
Named expressions for new/modified columns. |
table_name |
An optional character string. If provided, the generated Python code will replace the internal dataframe name with this string (e.g., |
return.as |
Either "result", "code", or "all". |
Value
Depending on return.as: a data frame, a character string, or a list.
Select columns using pandas
Description
Selects specific columns from a data frame. It captures the bare column names and translates the operation into a pandas selection command.
Usage
rp_select(.data, ..., table_name = NULL, return.as = "result")
Arguments
.data |
An R data.frame or tibble. |
... |
The bare column names to select (e.g., |
table_name |
An optional character string. If provided, the generated Python code will replace the internal dataframe name with this string (e.g., |
return.as |
What to return: "result", "code", or "all". |
Value
A data.frame containing only the selected columns.
Examples
if (reticulate::py_available(initialize = TRUE) &&
reticulate::py_module_available("pandas")) {
rp_select(ggplot2::diamonds, carat, cut, price)
}
Sort rows of a data frame using pandas
Description
Sorts a data frame by one or more columns. It translates the R expressions
into a pandas .sort_values() command and executes it.
Usage
rp_sort(.data, ..., table_name = NULL, return.as = "result")
Arguments
.data |
An R data.frame or tibble. |
... |
Bare column names to sort by. Use |
table_name |
An optional character string. If provided, the generated Python code will replace the internal dataframe name with this string (e.g., |
return.as |
What to return: "result", "code", or "all". |
Value
A data.frame sorted by the specified columns.
Examples
if (reticulate::py_available(initialize = TRUE) &&
reticulate::py_module_available("pandas")) {
# Sort by cut (ascending) and price (descending)
rp_sort(ggplot2::diamonds, cut, desc(price))
}
Summarize data using pandas
Description
Aggregates a data frame by one or more groups, applying summary functions.
It translates R's dplyr::summarise syntax into a pandas .groupby().agg()
command.
Usage
rp_summarize(.data, ..., .by = NULL, table_name = NULL, return.as = "result")
Arguments
.data |
An R data.frame or tibble. |
... |
Named summary expressions (e.g., |
.by |
A bare column name or |
table_name |
An optional character string. If provided, the generated Python code will replace the internal dataframe name with this string (e.g., |
return.as |
What to return: "result", "code", or "all". |
Value
A data.frame with the summarized and grouped data.
Examples
if (reticulate::py_available(initialize = TRUE) &&
reticulate::py_module_available("pandas")) {
# Summarize by one group
rp_summarize(ggplot2::diamonds,
avg_price = mean(price),
.by = cut)
# Summarize by multiple groups and multiple functions
rp_summarize(ggplot2::diamonds,
avg_price = mean(price),
count = n(),
.by = c(cut, color))
}
Recursively translate an R expression for a pandas .assign() lambda
Description
Recursively translate an R expression for a pandas .assign() lambda
Usage
translate_assign_recursive(expr_body)
Arguments
expr_body |
A language object (call, symbol, or atomic). |
Value
A character string of the translated Python expression.
Translate R function/column names into a pandas agg dictionary
Description
Translates the column names and the.functions passed to rp_calculate() into pandas'
dictionary-based .agg() syntax.
Usage
translate_calculate(variable_exprs, function_names)
Arguments
variable_exprs |
A list of variable names captured as quosures. |
function_names |
A character vector of R function names. |
Value
A string for the .agg() method (e.g., .agg({'col1': ['mean', 'std']})).
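A Python sketch of how such a dictionary string could be built is shown below. The name-mapping table (for example, R's sd to pandas' std) and the function name build_agg_dict are assumptions for illustration; the package's own mapping may cover different functions.

```python
# Assumed mapping from R function names to pandas aggregation names
R_TO_PANDAS = {
    "mean": "mean", "sd": "std", "median": "median",
    "min": "min", "max": "max", "sum": "sum",
}


def build_agg_dict(columns, r_functions):
    """Build the text of a dictionary-based .agg() call."""
    funcs = [R_TO_PANDAS[name] for name in r_functions]
    # Every column receives the same list of aggregation names
    body = ", ".join(f"{col!r}: {funcs!r}" for col in columns)
    return f".agg({{{body}}})"


agg = build_agg_dict(["price", "carat"], ["mean", "sd"])
print(agg)  # .agg({'price': ['mean', 'std'], 'carat': ['mean', 'std']})
```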
Translate an R filter expression into a Python query string
Description
Capture a bare R expression and translate it to a Python-compatible string
suitable for use with pandas.DataFrame.query().
Usage
translate_filter(expr)
Arguments
expr |
A bare R expression (e.g., |
Value
A character string of the translated Python query.
Examples
translate_filter(carat > 2 & cut == "Ideal")
# -> "(carat > 2) and (cut == 'Ideal')"
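To make the mapping concrete, the following Python sketch performs a much-simplified version of the translation on a flat condition given as text. It is an illustration only: the real function works recursively on R language objects, whereas this hypothetical translate_filter_sketch handles only single & and | operators and no nested parentheses.

```python
import re


def translate_filter_sketch(r_expr):
    """Hypothetical, simplified translation of a flat R condition."""
    # Split on R's logical operators while keeping them as tokens
    tokens = re.split(r"\s*(&|\|)\s*", r_expr)
    logical = {"&": "and", "|": "or"}
    out = []
    for tok in tokens:
        if tok in logical:
            out.append(logical[tok])
        else:
            # Wrap each comparison in parentheses and use single quotes
            out.append("(" + tok.strip().replace('"', "'") + ")")
    return " ".join(out)


query = translate_filter_sketch('carat > 2 & cut == "Ideal"')
print(query)  # (carat > 2) and (cut == 'Ideal')
```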
Recursive helper to translate R expressions
Description
Recursive helper to translate R expressions
Usage
translate_filter_recursive(expr_body, env)
Arguments
expr_body |
A language object (call, symbol, or atomic). |
Value
A character string.
Translate a .by argument into a pandas .groupby() string
Description
Translate a .by argument into a pandas .groupby() string
Usage
translate_groupby(by_expr)
Arguments
by_expr |
An enquosured |
Value
A string for the .groupby(..., as_index=False) method,
or NULL if the argument is empty.
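The shape of the returned string can be sketched in Python as follows. The function name groupby_string is hypothetical, and the sketch takes plain column names rather than captured R expressions.

```python
def groupby_string(by_columns):
    """Return the .groupby(...) text, or None when no grouping is requested."""
    if not by_columns:
        return None
    return f".groupby({list(by_columns)!r}, as_index=False)"


print(groupby_string(["cut", "color"]))  # .groupby(['cut', 'color'], as_index=False)
print(groupby_string([]))                # None
```

Passing as_index=False keeps the grouping columns as ordinary columns in the result, which suits conversion back to an R data frame.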
Translate R mutate expressions to pandas assign/drop strings
Description
Translate R mutate expressions to pandas assign/drop strings
Usage
translate_mutate(to_remove = NULL, ...)
Arguments
to_remove |
Character vector of column names to drop. |
... |
Named R expressions. |
Value
A list with components assign_str and drop_str.
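A Python sketch of the two components is given below. It assumes the expressions have already been translated into Python text, and both the function name mutate_strings and the lambda-based form of .assign() are illustrative guesses rather than the package's confirmed output.

```python
def mutate_strings(new_columns=None, to_remove=None):
    """new_columns maps a column name to an already-translated Python expression."""
    assign_str = None
    drop_str = None
    if new_columns:
        args = ", ".join(
            f"{name}=lambda df: {expr}" for name, expr in new_columns.items()
        )
        assign_str = f".assign({args})"
    if to_remove:
        drop_str = f".drop(columns={list(to_remove)!r})"
    return {"assign_str": assign_str, "drop_str": drop_str}


parts = mutate_strings(
    {"price_per_carat": "df['price'] / df['carat']"},
    to_remove=["x", "y"],
)
print(parts["assign_str"])  # .assign(price_per_carat=lambda df: df['price'] / df['carat'])
print(parts["drop_str"])    # .drop(columns=['x', 'y'])
```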
Translate captured column names into a Python list string
Description
Translate captured column names into a Python list string
Usage
translate_select(...)
Arguments
... |
Bare column names (captured by |
Value
A character string formatted as a Python list.
Translate captured sort expressions into Python .sort_values() arguments
Description
Translate captured sort expressions into Python .sort_values() arguments
Usage
translate_sort(...)
Arguments
... |
Bare column names or |
Value
A list with two elements:
- $by: A string for the by argument (e.g., ['cut', 'price'])
- $ascending: A string for the ascending argument (e.g., [True, False])
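The pairing of columns with sort directions can be sketched in Python. This version takes the sort expressions as text and treats desc(column) as descending; the function name sort_strings is hypothetical, and the real function operates on captured R expressions rather than strings.

```python
import re


def sort_strings(*specs):
    """Split sort expressions into parallel 'by' and 'ascending' lists."""
    by, ascending = [], []
    for spec in specs:
        match = re.fullmatch(r"desc\((\w+)\)", spec.strip())
        if match:
            by.append(match.group(1))
            ascending.append(False)
        else:
            by.append(spec.strip())
            ascending.append(True)
    # Both elements are returned as text, ready to paste into .sort_values()
    return {"by": repr(by), "ascending": repr(ascending)}


result = sort_strings("cut", "desc(price)")
print(result["by"])         # ['cut', 'price']
print(result["ascending"])  # [True, False]
```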
Translate named R expressions for .agg()
Description
Translates R's new = func(old) syntax into pandas' named aggregation
syntax new = ('old', 'func').
Usage
translate_summarize(...)
Arguments
... |
Named R expressions (e.g., |
Value
A string for the .agg() method.
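The named-aggregation form can be sketched in Python as below. The sketch accepts each R call as text in the simple func(column) form only; the function name named_agg_string and the R-to-pandas name mapping are assumptions, and zero-argument helpers such as n() are deliberately not handled here.

```python
import re

# Assumed mapping from R function names to pandas aggregation names
R_TO_PANDAS = {"mean": "mean", "sd": "std", "median": "median",
               "min": "min", "max": "max", "sum": "sum"}


def named_agg_string(**expressions):
    """Turn new = func(old) pairs into pandas named-aggregation text."""
    parts = []
    for new_name, r_call in expressions.items():
        match = re.fullmatch(r"(\w+)\((\w+)\)", r_call.strip())
        if match is None:
            raise ValueError(f"unsupported expression: {r_call}")
        func, column = match.groups()
        pandas_func = R_TO_PANDAS.get(func, func)
        parts.append(f"{new_name}=({column!r}, {pandas_func!r})")
    return ".agg(" + ", ".join(parts) + ")"


agg_call = named_agg_string(avg_price="mean(price)", sd_price="sd(price)")
print(agg_call)  # .agg(avg_price=('price', 'mean'), sd_price=('price', 'std'))
```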