Help for package rPandas

Title:

Translating from R to Python's Pandas Package

Version:

0.1.4

Description:

Provides an R interface to Python's 'pandas' library using non-standard evaluation. Users can write R code (e.g., rp_filter(), rp_select(), rp_mutate()) that is translated into pandas commands and executed via 'reticulate'. Supports chaining, grouping, and 'summarisation', and includes a 'table_name' parameter to generate 'copy-pasteable' Python code. Ideal for leveraging pandas' speed and flexibility within the R ecosystem.

License:

MIT + file LICENSE

Encoding:

UTF-8

RoxygenNote:

7.3.3

Suggests:

ggplot2, knitr, rmarkdown, testthat (≥ 3.0.0)

Config/testthat/edition:

Imports:

reticulate, rlang

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2026-04-27 15:55:05 UTC; akshat

Author:

Akshat Maurya [aut, cre], Rihaan Satia [aut], David Shilane [aut]

Maintainer:

Akshat Maurya <codingmaster902@gmail.com>

Repository:

CRAN

Date/Publication:

2026-04-28 20:30:09 UTC

rPandas: A User-Friendly R Interface to Pandas

Description

This package provides a set of wrapper functions that allow R users to interact with Python's pandas library using familiar R syntax.

Author(s)

Maintainer: Akshat Maurya codingmaster902@gmail.com

Authors:

Rihaan Satia
David Shilane david.shilane@columbia.edu

"Not In" Operator

Description

Negates the standard R %in% operator: returns TRUE for each element of x that does not appear in y.

Usage

x %notin% y

Arguments

x

Vector of values to be matched.

y

Vector of values to be matched against.

Value

A logical vector.

Examples

"a" %notin% c("b", "c")

Create a chained pandas command string

Description

This internal function assembles a Python command string for pandas by chaining together different data manipulation methods. It serves as the central command generator for the package.

Usage

create_pandas_statement(
  df_name,
  filter_str = NULL,
  select_str = NULL,
  sort_by_str = NULL,
  sort_asc_str = NULL,
  assign_str = NULL,
  drop_str = NULL,
  groupby_str = NULL,
  agg_str = NULL,
  head_k = NULL,
  tail_k = NULL
)

Arguments

df_name

A character string for the name of the pandas DataFrame.

filter_str

A string for the .query() method.

select_str

A string for column selection (e.g., ['col1', 'col2']).

sort_by_str

A string for the by argument of .sort_values().

sort_asc_str

A string for the ascending argument of .sort_values().

assign_str

A string for the .assign() method.

drop_str

A string for the .drop() method.

groupby_str

A string for the .groupby() method.

agg_str

A string for the .agg() method.

head_k

Integer for .head(k).

tail_k

Integer for .tail(k).

Value

A character string of the complete, chained pandas command.
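
The chaining can be pictured with a short sketch, written in Python because that is the language of the generated command. The function name build_statement, the subset of arguments shown, and the order in which the pieces are chained are assumptions for illustration; the package's own generator is an R function and may assemble the string differently.

```python
def build_statement(df_name, filter_str=None, select_str=None,
                    sort_by_str=None, sort_asc_str=None, head_k=None):
    """Chain optional pandas method calls onto a DataFrame name.

    Hypothetical helper: the chaining order is an assumption.
    """
    parts = [df_name]
    if filter_str is not None:
        # .query() takes the filter as a quoted string
        parts.append(f".query({filter_str!r})")
    if select_str is not None:
        # select_str is already a Python list literal such as "['a', 'b']"
        parts.append(f"[{select_str}]")
    if sort_by_str is not None:
        args = f"by={sort_by_str}"
        if sort_asc_str is not None:
            args += f", ascending={sort_asc_str}"
        parts.append(f".sort_values({args})")
    if head_k is not None:
        parts.append(f".head({int(head_k)})")
    return "".join(parts)


command = build_statement(
    "df",
    filter_str="carat > 1",
    select_str="['carat', 'price']",
    sort_by_str="['price']",
    sort_asc_str="[False]",
    head_k=5,
)
print(command)
```

Arguments left as None contribute nothing to the chain, so build_statement("diamonds") returns just the name "diamonds".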

Execute a pandas command string on an R data frame

Description

This is the core execution engine. It explicitly injects an R data frame into the Python session, runs a command, retrieves the result, and cleans up.

Usage

execute_pandas_statement(
  r_df,
  py_command,
  table_name = table_name,
  return.as = "result"
)

Arguments

r_df

An R data.frame.

py_command

A character string of Python code using 'df' as a placeholder.

table_name

An optional character string. If provided, the generated Python code will replace the internal dataframe name with this string (e.g., "diamonds.query(...)"). This is useful for seeing the exact, copy-pasteable Python code. Defaults to NULL (uses "df").

return.as

What to return: "result", "code", or "all".

Value

Depending on return.as: a data frame ("result"), a character string of Python code ("code"), or a list ("all").
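
The inject, run, retrieve, clean up cycle can be sketched in plain Python. This is only an illustration under stated assumptions: a Python list stands in for the injected data frame, run_statement is a hypothetical name, skipping execution for return_as = "code" is a guess, and the keys of the "all" result are invented here. The package itself performs these steps through 'reticulate'.

```python
import re


def run_statement(obj, py_command, table_name=None, return_as="result"):
    """Evaluate py_command with obj bound to 'df' or to table_name."""
    name = table_name or "df"
    # Replace the 'df' placeholder as a whole word only, so names
    # such as 'df_total' inside the command are left untouched.
    code = re.sub(r"\bdf\b", name, py_command)
    if return_as == "code":
        return code                      # assumption: no execution needed
    namespace = {name: obj}              # inject the object
    try:
        result = eval(code, namespace)   # run the command, keep the result
    finally:
        namespace.clear()                # clean up the injected object
    if return_as == "all":
        return {"result": result, "code": code}  # hypothetical key names
    return result


print(run_statement([3, 1, 2], "sorted(df)"))
print(run_statement([3, 1, 2], "sorted(df)", table_name="diamonds",
                    return_as="code"))
```

Keeping the placeholder substitution separate from execution is what allows the same command to be shown under a user-chosen name while still running against the injected object.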

Apply multiple summary functions to multiple columns

Description

Applies a list of summary functions to a list of columns, after optionally grouping the data.

Usage

rp_calculate(
  .data,
  ...,
  the.functions,
  .by = NULL,
  table_name = NULL,
  return.as = "result"
)

Arguments

.data

An R data.frame.

...

Bare column names to summarize (e.g., price, carat).

the.functions

A character vector of R function names (e.g., c("mean", "sd")). Supports "mean", "median", "sd", "var", "min", "max", "sum".

.by

A bare column name or c(col1, col2) to group by.

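table_name

An optional character string naming the data frame in the generated Python code (e.g., "diamonds"). Defaults to NULL, which uses "df".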

return.as

What to return: "result", "code", or "all".

Value

A data.frame with the summarized and grouped data.

Examples


if (reticulate::py_available(initialize = TRUE) &&
    reticulate::py_module_available("pandas")) {

  rp_calculate(
    ggplot2::diamonds,
    price, carat,
    the.functions = c("mean", "sd"),
    .by = cut
  )
}

Check for rPandas dependencies and provide diagnostics

Description

This function checks whether the user's system is correctly configured with Python and the pandas library. In interactive sessions, missing dependencies cause it to stop with a detailed diagnostic report and actionable instructions. In non-interactive contexts (e.g., CRAN checks), it issues a warning and returns FALSE instead.

Usage

rp_check_env()

Value

Invisibly returns TRUE if all checks pass, otherwise FALSE.

Count rows in a data frame, optionally by groups

Description

This function returns the number of rows in a data frame. When grouping variables are provided via .by, it returns the row counts for each group.

Usage

rp_count(.data, .by = NULL, table_name = NULL, return.as = "result")

Arguments

.data

An R data frame (or tibble) to be processed.

.by

Optional grouping variables. Can be one or more unquoted column names (e.g., cut or c(cut, color)). When provided, counts are computed per group.

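table_name

An optional character string naming the data frame in the generated Python code (e.g., "diamonds"). Defaults to NULL, which uses "df".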

return.as

One of "result", "code", or "all".

Value

A data frame with one column "n" (total row count) if .by = NULL, or a data frame with the grouping columns and a column "n" (per‑group counts).

Filter rows using pandas

Description

Filters a data frame using an R expression translated to pandas.

Usage

rp_filter(.data, filter_expression, table_name = NULL, return.as = "result")

Arguments

.data

An R data.frame or tibble.

filter_expression

The filtering expression, written in R syntax.

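table_name

An optional character string naming the data frame in the generated Python code (e.g., "diamonds"). Defaults to NULL, which uses "df".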

return.as

What to return: "result", "code", or "all".

Value

A data.frame containing the filtered rows.

Examples


if (reticulate::py_available(initialize = TRUE) &&
    reticulate::py_module_available("pandas")) {
  rp_filter(ggplot2::diamonds, carat > 1 & price < 4000)
}

Extract the first k rows of a data frame

Description

This function returns the first k rows of the data frame. If grouping variables are provided via .by, it returns the first k rows within each group.

Usage

rp_first_k_rows(.data, k, .by = NULL, table_name = NULL, return.as = "result")

Arguments

.data

An R data frame (or tibble) to be processed.

k

An integer specifying the number of rows to return. If .by is used, returns up to k rows per group.

.by

Optional grouping variables. Can be one or more unquoted column names. When provided, the operation is performed on each group separately.

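table_name

An optional character string naming the data frame in the generated Python code (e.g., "diamonds"). Defaults to NULL, which uses "df".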

return.as

One of "result", "code", or "all".

Value

Depending on return.as: a data frame, a character string, or a list.

Extract the last k rows of a data frame

Description

This function returns the last k rows of the data frame. If grouping variables are provided via .by, it returns the last k rows within each group.

Usage

rp_last_k_rows(.data, k, .by = NULL, table_name = NULL, return.as = "result")

Arguments

.data

An R data frame (or tibble) to be processed.

k

An integer specifying the number of rows to return. If .by is used, returns up to k rows per group.

.by

Optional grouping variables. Can be one or more unquoted column names. When provided, the operation is performed on each group separately.

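table_name

An optional character string naming the data frame in the generated Python code (e.g., "diamonds"). Defaults to NULL, which uses "df".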

return.as

One of "result", "code", or "all".

Value

Depending on return.as: a data frame, a character string, or a list.

Mutate (add/modify/remove) columns using pandas

Description

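Adds or modifies columns through named expressions and removes the columns listed in to_remove. The operation is translated into pandas .assign() and .drop() calls.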

Usage

rp_mutate(
  .data,
  to_remove = NULL,
  ...,
  table_name = NULL,
  return.as = "result"
)

Arguments

.data

An R data frame.

to_remove

A character vector of column names to remove.

...

Named expressions for new/modified columns.

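table_name

An optional character string naming the data frame in the generated Python code (e.g., "diamonds"). Defaults to NULL, which uses "df".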

return.as

Either "result", "code", or "all".

Value

Depending on return.as: a data frame, a character string, or a list.

Select columns using pandas

Description

Selects specific columns from a data frame. It captures the bare column names and translates the operation into a pandas selection command.

Usage

rp_select(.data, ..., table_name = NULL, return.as = "result")

Arguments

.data

An R data.frame or tibble.

...

The bare column names to select (e.g., carat, cut, price).

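table_name

An optional character string naming the data frame in the generated Python code (e.g., "diamonds"). Defaults to NULL, which uses "df".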

return.as

What to return: "result", "code", or "all".

Value

A data.frame containing only the selected columns.

Examples


if (reticulate::py_available(initialize = TRUE) &&
    reticulate::py_module_available("pandas")) {
  rp_select(ggplot2::diamonds, carat, cut, price)
}

Sort rows of a data frame using pandas

Description

Sorts a data frame by one or more columns. It translates the R expressions into a pandas .sort_values() command and executes it.

Usage

rp_sort(.data, ..., table_name = NULL, return.as = "result")

Arguments

.data

An R data.frame or tibble.

...

Bare column names to sort by. Use desc(colname) to sort in descending order (e.g., cut, desc(price)).

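table_name

An optional character string naming the data frame in the generated Python code (e.g., "diamonds"). Defaults to NULL, which uses "df".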

return.as

What to return: "result", "code", or "all".

Value

A data.frame sorted by the specified columns.

Examples


if (reticulate::py_available(initialize = TRUE) &&
    reticulate::py_module_available("pandas")) {
  
  # Sort by cut (ascending) and price (descending)
  rp_sort(ggplot2::diamonds, cut, desc(price))
}
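
How sort terms of this kind might map onto the by and ascending arguments of .sort_values() can be sketched in Python. The helper name build_sort_args and the use of plain strings such as "desc(price)" as input are assumptions; the package captures bare R expressions rather than strings.

```python
import re

# Matches a term of the form desc(colname)
DESC = re.compile(r"^desc\((\w+)\)$")


def build_sort_args(terms):
    """Split sort terms into column names and ascending flags."""
    by, ascending = [], []
    for term in terms:
        term = term.strip()
        match = DESC.match(term)
        if match:
            by.append(match.group(1))
            ascending.append(False)   # desc(col) means descending
        else:
            by.append(term)
            ascending.append(True)    # a bare column sorts ascending
    return {
        "by": "[" + ", ".join(f"'{c}'" for c in by) + "]",
        "ascending": "[" + ", ".join(str(a) for a in ascending) + "]",
    }


args = build_sort_args(["cut", "desc(price)"])
print(f".sort_values(by={args['by']}, ascending={args['ascending']})")
```

Passing ascending as a list of the same length as by lets each column carry its own sort direction.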

Summarize data using pandas

Description

Aggregates a data frame by one or more groups, applying summary functions. It translates R's dplyr::summarise syntax into a pandas .groupby().agg() command.

Usage

rp_summarize(.data, ..., .by = NULL, table_name = NULL, return.as = "result")

Arguments

.data

An R data.frame or tibble.

...

Named summary expressions (e.g., avg_price = mean(price)). Supports mean, median, sd, var, min, max, sum, and n().

.by

A bare column name or c(col1, col2) to group by.

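table_name

An optional character string naming the data frame in the generated Python code (e.g., "diamonds"). Defaults to NULL, which uses "df".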

return.as

What to return: "result", "code", or "all".

Value

A data.frame with the summarized and grouped data.

Examples


if (reticulate::py_available(initialize = TRUE) &&
    reticulate::py_module_available("pandas")) {
  
  # Summarize by one group
  rp_summarize(ggplot2::diamonds, 
               avg_price = mean(price), 
               .by = cut)
  
  # Summarize by multiple groups and multiple functions
  rp_summarize(ggplot2::diamonds, 
               avg_price = mean(price), 
               count = n(),
               .by = c(cut, color))
}

Recursively translate an R expression for a pandas .assign() lambda

Description

Walks an R expression tree and rebuilds it, piece by piece, as a Python expression for use inside a pandas .assign() lambda.

Usage

translate_assign_recursive(expr_body)

Arguments

expr_body

A language object (call, symbol, or atomic).

Value

A character string of the translated Python expression.
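
The recursion can be illustrated with a toy expression tree in Python. Everything here is a stand-in: nested tuples of the form (operator, left, right) replace R language objects, plain strings are treated as column names, and the lambda parameter name x is an assumption about what the generated code looks like.

```python
def to_assign_expr(node, frame="x"):
    """Recursively render a tuple tree as a Python expression string."""
    if isinstance(node, tuple):
        op, left, right = node
        # Translate both operands first, then join them with the operator
        return f"({to_assign_expr(left, frame)} {op} {to_assign_expr(right, frame)})"
    if isinstance(node, str):
        return f"{frame}['{node}']"   # a symbol becomes a column lookup
    return repr(node)                 # numbers are emitted as they are


def to_assign_kwarg(name, node):
    """Wrap the translated expression as a keyword argument for .assign()."""
    return f"{name}=lambda x: {to_assign_expr(node)}"


print(to_assign_kwarg("price_per_carat", ("/", "price", "carat")))
```

Parenthesising every translated call keeps operator precedence intact without having to compare R's and Python's precedence rules.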

Translate R function/column names into a pandas agg dictionary

Description

Translates the captured column names and the function names given in the.functions into pandas' dictionary-based .agg() syntax.

Usage

translate_calculate(variable_exprs, function_names)

Arguments

variable_exprs

A list of quosures capturing the variable names (e.g., as produced by rlang::enquos()).

function_names

A character vector of R function names.

Value

A string for the .agg() method (e.g., .agg({'col1': ['mean', 'std']})).
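
A minimal Python sketch of building such a string follows. The helper name build_agg is hypothetical, and the name mapping is inferred from the supported functions listed for rp_calculate() together with the 'std' in the example above; only sd needs renaming.

```python
# R summary function names and their pandas aggregation names
R_TO_PANDAS = {
    "mean": "mean", "median": "median", "sd": "std", "var": "var",
    "min": "min", "max": "max", "sum": "sum",
}


def build_agg(columns, functions):
    """Build a dictionary-style .agg() call as a string."""
    unknown = [f for f in functions if f not in R_TO_PANDAS]
    if unknown:
        raise ValueError(f"unsupported summary function(s): {unknown}")
    func_list = "[" + ", ".join(f"'{R_TO_PANDAS[f]}'" for f in functions) + "]"
    # Every column receives the same list of functions
    entries = ", ".join(f"'{col}': {func_list}" for col in columns)
    return ".agg({" + entries + "})"


print(build_agg(["price", "carat"], ["mean", "sd"]))
```

pandas' std and var use a default of ddof=1, the same n - 1 denominator as R's sd() and var(), so the renamed functions should agree numerically.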

Translate an R filter expression into a Python query string

Description

Captures a bare R expression and translates it to a Python-compatible string suitable for use with pandas.DataFrame.query().

Usage

translate_filter(expr)

Arguments

expr

A bare R expression (e.g., carat > 2 & cut == "Ideal").

Value

A character string of the translated Python query.

Examples

translate_filter(carat > 2 & cut == "Ideal")
# -> "(carat > 2) and (cut == 'Ideal')"

Recursive helper to translate R expressions

Description

Internal recursive helper that translates an R filter expression, one call at a time, into Python query syntax.

Usage

translate_filter_recursive(expr_body, env)

Arguments

expr_body

A language object (call, symbol, or atomic).

Value

A character string.
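
The idea behind the recursion can be shown on a toy tree in Python. The representation is an assumption made for the sketch: nested tuples replace R calls, the Sym class marks column symbols so they can be told apart from string literals, and only the operators &, | and ! are remapped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """A column symbol, as opposed to a quoted string value."""
    name: str


LOGICAL = {"&": "and", "|": "or"}   # R operator -> Python keyword


def to_query(node):
    """Recursively render a tuple tree as a pandas query string."""
    if isinstance(node, tuple):
        op, *args = node
        if op == "!":
            return f"not ({to_query(args[0])})"
        left, right = (to_query(arg) for arg in args)
        if op in LOGICAL:
            # Parenthesise both sides so precedence survives translation
            return f"({left}) {LOGICAL[op]} ({right})"
        return f"{left} {op} {right}"
    if isinstance(node, Sym):
        return node.name              # column names stay bare
    if isinstance(node, str):
        return f"'{node}'"            # string values get single quotes
    return repr(node)


tree = ("&", (">", Sym("carat"), 2), ("==", Sym("cut"), "Ideal"))
print(to_query(tree))
```

For this tree the sketch produces (carat > 2) and (cut == 'Ideal'), the same shape of string that pandas.DataFrame.query() accepts.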

Translate a .by argument into a pandas .groupby() string

Description

Translate a .by argument into a pandas .groupby() string

Usage

translate_groupby(by_expr)

Arguments

by_expr

The .by argument captured as a quosure (e.g., with rlang::enquo()).

Value

A string for the .groupby(..., as_index=False) method, or NULL if the argument is empty.
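
A short Python sketch of the string being described, assuming the grouping columns have already been reduced to a list of names (build_groupby is a hypothetical helper, and Python's None plays the role of R's NULL):

```python
def build_groupby(columns):
    """Return a .groupby(...) call string, or None when there is no grouping."""
    if not columns:
        return None
    cols = ", ".join(f"'{c}'" for c in columns)
    return f".groupby([{cols}], as_index=False)"


print(build_groupby(["cut", "color"]))
print(build_groupby([]))
```

In pandas, as_index=False keeps the grouping keys as ordinary columns rather than moving them into the index, which suits a result that is converted back to an R data frame.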

Translate R mutate expressions to pandas assign/drop strings

Description

Translate R mutate expressions to pandas assign/drop strings

Usage

translate_mutate(to_remove = NULL, ...)

Arguments

to_remove

Character vector of column names to drop.

...

Named R expressions.

Value

A list with components assign_str and drop_str.

Translate captured column names into a Python list string

Description

Translate captured column names into a Python list string

Usage

translate_select(...)

Arguments

...

Bare column names (captured by rlang::enquos).

Value

A character string formatted as a Python list.

Translate captured sort expressions into Python .sort_values() arguments

Description

Translate captured sort expressions into Python .sort_values() arguments

Usage

translate_sort(...)

Arguments

...

Bare column names or desc(colname) (captured by rlang::enquos).

Value

A list with two elements:

$by: A string for the by argument (e.g., ['cut', 'price'])
$ascending: A string for the ascending argument (e.g., [True, False])

Translate named R expressions for .agg()

Description

Translates R's new = func(old) syntax into pandas' named aggregation syntax new = ('old', 'func').

Usage

translate_summarize(...)

Arguments

...

Named R expressions (e.g., avg_price = mean(price)).

Value

A string for the .agg() method.
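
The mapping from new = func(old) to pandas named aggregation can be sketched in Python. The helper name build_named_agg and the use of strings such as "mean(price)" as input are assumptions, since the package works on captured R expressions. n() is deliberately left out because its pandas counterpart is not described here.

```python
import re

# R summary function names and their pandas aggregation names
R_TO_PANDAS = {
    "mean": "mean", "median": "median", "sd": "std", "var": "var",
    "min": "min", "max": "max", "sum": "sum",
}

# Matches a single call of the form func(column)
CALL = re.compile(r"^(\w+)\((\w+)\)$")


def build_named_agg(**exprs):
    """Turn new_name='func(column)' pairs into a named .agg() call string."""
    pieces = []
    for new_name, expr in exprs.items():
        match = CALL.match(expr.replace(" ", ""))
        if match is None or match.group(1) not in R_TO_PANDAS:
            raise ValueError(f"cannot translate expression: {expr!r}")
        func, column = match.groups()
        pieces.append(f"{new_name}=('{column}', '{R_TO_PANDAS[func]}')")
    return ".agg(" + ", ".join(pieces) + ")"


print(build_named_agg(avg_price="mean(price)", sd_carat="sd(carat)"))
```

Each keyword in the result names an output column, and the tuple pairs the input column with the aggregation to apply to it.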