dplyr R library support in Data Refinery

Data Refinery provides scripting support for the following dplyr R library operations, functions, and logical operators.

Important
dplyr R library support is for the operations and functions in the user interface. If you insert other operations or functions from the open source dplyr R library, the Data Refinery flow might fail. See the command-line help and be sure to use the list of operations or functions from the customized templates. Use the examples in the templates to further customize the syntax as needed.

Operations

arrange

arrange(`<column>`)
Sort rows, in ascending order, by the specified columns.

arrange(desc(`<column>`))
Sort rows, in descending order, by the specified column.

arrange(`<column>`, `<column>`)
Sort rows, in ascending order, by each specified, successive column, keeping the order from the prior sort intact.

count

count()
Total the data by group.

count(`<column>`)
Group the data by the specified column and return the number of rows with unique values (for string values) or return the total for each group (for numeric values).

count(`<column>`, wt=`<column>`)
Group the data by the specified column and return the number of rows with unique values (for string values) or return the total for each group (for numeric values) in the specified weight column.

count(`<column>`, wt=<func>(`<column>`))
Group the data by the specified column and return the result of the function applied to the specified weight column.

count(`<column>`, wt=<func>(`<column>`), sort = <logical>)
Group the data by the specified column and return the result of the function applied to the specified weight column, sorted or not.

distinct

distinct()
Keep distinct, unique rows based on all columns or on specified columns.

filter

filter(`<column>` <logicalOperator> provide_value)
Keep rows that meet the specified condition and filter out all other rows.

filter(<func>(`<column>`) <logicalOperator> provide_value)
Keep rows that meet the specified condition and filter out all other rows. The condition can apply a function to a column on the left side of the operator.

filter(`<column>` <logicalOperator> <func(column)>) Keep rows that meet the specified condition and filter out all other rows. The condition can apply a function to a column on the right side of the operator.

filter(<logicalfunc(column)>)
Keep rows that meet the specified condition and filter out all other rows. The condition can apply a logical function to a column.

filter(`<column>` <logicalOperator> provide_value <andor> `<column>` <logicalOperator> provide_value )
Keep rows that meet the specified conditions and filter out all other rows.

group_by

group_by(`<column>`)
Group the data based on the specified column.

group_by(desc(`<column>`))
Group the data, in descending order, based on the specified column.

mutate

mutate(provide_new_column = `<column>`)
Add a new column and keep existing columns.

mutate(provide_new_column = <func(column)>)
Add a new column by using the specified expression, which applies a function to a column. Keep existing columns.

mutate(provide_new_column = case_when(`<column>` <operator> provide_value_or_column_to_compare ~ provide_value_or_column_to_replace, `<column>` <operator> provide_value_or_column_to_compare ~ provide_value_or_column_to_replace, TRUE ~ provide_default_value_or_column))
Add a new column by using the specified conditional expression.

mutate(provide_new_column = `<column>` <operator> `<column>`)
Add a new column by using the specified expression, which performs a calculation with existing columns. Keep existing columns.

mutate(provide_new_column = coalesce(`<column>`, `<column>`))
Add a new column by using the specified expression, which replaces missing values in the new column with values from another, specified column. As an alternative to specifying another column, you can specify a value, a function on a column, or a function on a value. Keep existing columns.

mutate(provide_new_column = if_else(`<column>` <logicalOperator> provide_value, provide_value_for_true, provide_value_for_false))
Add a new column by using the specified conditional expression. Keep existing columns.

mutate(provide_new_column = `<column>`, provide_new_column = `<column>`)
Add multiple new columns and keep existing columns.

mutate(provide_new_column = n())
Count the values in the groups. Ensure grouping is done already using group_by. Keep existing columns.

mutate_all

mutate_all(funs(<func>))
Apply the specified function to all of the columns and overwrite the existing values in those columns. Specify whether to remove missing values.

mutate_all(funs(. <operator> provide_value))
Apply the specified operator to all of the columns and overwrite the existing values in those columns.

mutate_all(funs(“provide_value” = . <operator> provide_value))
Apply the specified operator to all of the columns and create new columns to hold the results. Give the new columns names that end with the specified value.

mutate_at

mutate_at(vars(`<column>`), funs(<func>))
Apply functions to the specified columns.

mutate_if

mutate_if(<predicateFunc>, <func>)
Apply functions to the columns that meet the specified condition.

mutate_if(<predicateFunc>, funs( . <operator> provide_value))
Apply the specified operator to the columns that meet the specified condition.

mutate_if(<predicateFunc>, funs(<func>))
Apply functions to the columns that meet the specified condition. Specify whether to remove missing values.

rename

rename(provide_new_column = `<column>`)
Rename the specified column.

sample_frac

sample_frac(provide_number_between_0_and_1, weight=`<column>`,replace=<logical>)
Generate a random sample based on a percentage of the data. weight is optional and is the ratio of probability the row will be chosen. Provide a numeric column. replace is optional and its Default is FALSE.

sample_n

sample_n(provide_number_of_rows,weight=`<column>`,replace=<logical>)
Generate a random sample of data based on a number of rows. weight is optional and is the ratio of probability the row will be chosen. Provide a numeric column. replace is optional and its default is FALSE.

select

select(`<column>`)
Keep the specified column.

select(-`<column>`)
Remove the specified column.

select(starts_with(“provide_text_value”))
Keep columns with names that start with the specified value.

select(ends_with(“provide_text_value”))
Keep columns with names that end with the specified value.

select(contains(“provide_text_value”))
Keep columns with names that contain the specified value.

select(matches (“provide_text_value”))
Keep columns with names that match the specified value. The specified value can be text or a regular expression.

select(`<column>`:`<column>`)
Keep the columns in the specified range. Specify the range as from one column to another column.

select(`<column>`, everything())
Keep all of the columns, but make the specified column the first column.

select(`<column>`, `<column>`)
Keep the specified columns.

select_if

select_if(<predicateFunc>) Keep columns that meet the specified condition. Supported functions include:

  • contains
  • ends_with
  • matches
  • num_range
  • starts_with

summarize

summarize(provide_new_column = <func>(`<column>`))
Apply aggregate functions to the specified columns to reduce multiple column values to a single value. Be sure to group the column data first by using the group_by operation.

summarize_all

summarize_all(<func>)
Apply an aggregate function to all of the columns to reduce multiple column values to a single value. Specify whether to remove missing values. Be sure to group the column data first by using the group_by operation.

summarize_all(funs(<func>))
Apply multiple aggregate functions to all of the columns to reduce multiple column values to a single value. Create new columns to hold the results. Specify whether to remove missing values. Be sure to group the column data first by using the group_by operation.

summarize_if

summarize_if(<predicate_conditions>,…)
Apply aggregate functions to columns that meet the specified conditions to reduce multiple column values to a single value. Specify whether to remove missing values. Be sure to group the column data first by using the group_by operation. Supported functions include:

  • count
  • max
  • mean
  • min
  • standard deviation
  • sum

tally

tally()
Counts the number of rows (for string columns) or totals the data (for numeric values) by group. Be sure to group the column data first by using the group_by operation.

tally(wt=`<column>`)
Counts the number of rows (for string columns) or totals the data (for numeric columns) by group for the weighted column.

tally( wt=<func>(`<column>`), sort = <logical>)
Applies a function to the specified weighted column and returns the result, by group, sorted or not.

top_n

top_n(provide_value)
Select the top or bottom N rows (by value) in each group. Specify a positive integer to select the top N rows; specify a negative integer to select the bottom N rows.

top_n(provide_value, `<column>`)
Select the top or bottom N rows (by value) in each group, based on the specified column. Specify a positive integer to select the top N rows; specify a negative integer to select the bottom N rows.

transmute

transmute(<new_or_existing_column> = `<column>`)
Add a new column or overwrite an existing one by using the specified expression. Keep only columns that are mentioned in the expression.

transmute(<new_or_existing_column> = <func(column)>)
Add a new column or overwrite an existing one by applying a function to the specified column. Keep only columns that are mentioned in the expression.

transmute(<new_or_existing_column> = `<column>` <operator> `<column>`)
Add a new column or overwrite an existing one by applying an operator to the specified column. Keep only columns that are mentioned in the expression.

transmute(<new_or_existing_column> = `<column>`, <new_or_existing_column> = `<column>`)
Add multiple new columns. Keep only columns that are mentioned in the expression.

transmute(<new_or_existing_column> = if_else( provide_value, provide_value_for_true, provide_value_for_false))
Add a new column or overwrite an existing one by using the specified conditional expressions. Keep only columns that are mentioned in the expressions.

ungroup

ungroup()
Ungroup the data.

Functions

Aggregate

  • mean
  • min
  • n
  • sd
  • sum

Logical

  • is.na

Numerical

  • abs
  • coalesce
  • cut
  • exp
  • floor

Text

  • c
  • coalesce
  • paste
  • tolower
  • toupper

Type

  • as.character
  • as.double
  • as.integer
  • as.logical

Logical operators

  • <
  • <=
  • >=
  • >
  • between
  • !=
  • ==
  • %in%

Learn more