Data Refinery provides interactive templates for you to code operations, functions, and logical operators. Access the templates from the command-line text box at the top of the page. The templates include interactive assistance to help you with the syntax options.
Operations
arrange
arrange(`<column>
`)
Sort rows, in ascending order, by the specified columns.
arrange(desc(`<column>
`))
Sort rows, in descending order, by the specified column.
arrange(`<column>
`, `<column>
`)
Sort rows, in ascending order, by each specified, successive column, keeping the order from the prior sort intact.
count
count()
Total the data by group.
count(`<column>
`)
Group the data by the specified column and return the number of rows with unique values (for string values) or return the total for each group (for numeric values).
count(`<column>
`, wt=`<column>
`)
Group the data by the specified column and return the number of rows with unique values (for string values) or return the total for each group (for numeric values)
in the specified weight column.
count(`<column>
`, wt=<func>
(`<column>
`))
Group the data by the specified column and return the result of the function applied to the specified weight column.
count(`<column>
`, wt=<func>
(`<column>
`), sort = <logical>
)
Group the data by the specified column and return the result of the function applied to the specified
weight column, sorted or not.
distinct
distinct()
Keep distinct, unique rows based on all columns or on specified columns.
filter
filter(`<column>
` <logicalOperator>
provide_value)
Keep rows that meet the specified condition and filter out all other rows.
For the Boolean column type, provide_value should be uppercase TRUE
or FALSE.
filter(`<column>
`== <logical>
)
Keep rows that meet the specified filter conditions based on logical value TRUE or FALSE.
filter(<func>
(`<column>
`) <logicalOperator>
provide_value)
Keep rows that meet the specified condition and filter out all other rows. The condition can apply a function to a column
on the left side of the operator.
filter(`<column>
` <logicalOperator>
<func(column)>
)
Keep rows that meet the specified condition and filter out all other rows. The condition can apply a function to a column on
the right side of the operator.
filter(<logicalfunc(column)>
)
Keep rows that meet the specified condition and filter out all other rows. The condition can apply a logical function to a column.
filter(`<column>
` <logicalOperator>
provide_value <andor>
`<column>
` <logicalOperator>
provide_value )
Keep rows that meet the specified conditions
and filter out all other rows.
group_by
group_by(`<column>
`)
Group the data based on the specified column.
group_by(desc(`<column>
`))
Group the data, in descending order, based on the specified column.
mutate
mutate(provide_new_column = `<column>
`)
Add a new column and keep existing columns.
mutate(provide_new_column = <func(column)>
)
Add a new column by using the specified expression, which applies a function to a column. Keep existing columns.
mutate(provide_new_column = case_when(`<column>
` <operator>
provide_value_or_column_to_compare ~ provide_value_or_column_to_replace, `<column>
` <operator>
provide_value_or_column_to_compare
~ provide_value_or_column_to_replace, TRUE ~ provide_default_value_or_column))
Add a new column by using the specified conditional expression.
mutate(provide_new_column = `<column>
` <operator>
`<column>
`)
Add a new column by using the specified expression, which performs a calculation with existing columns. Keep existing
columns.
mutate(provide_new_column = coalesce(`<column>
`, `<column>
`))
Add a new column by using the specified expression, which replaces missing values in the new column with values from another, specified
column. As an alternative to specifying another column, you can specify a value, a function on a column, or a function on a value. Keep existing columns.
mutate(provide_new_column = if_else(`<column>
` <logicalOperator>
provide_value, provide_value_for_true, provide_value_for_false))
Add a new column by using the specified conditional expression. Keep
existing columns.
mutate(provide_new_column = `<column>
`, provide_new_column = `<column>
`)
Add multiple new columns and keep existing columns.
mutate(provide_new_column = n())
Count the values in the groups. Ensure grouping is done already using group_by. Keep existing columns.
mutate_all
mutate_all(funs(<func>
))
Apply the specified function to all of the columns and overwrite the existing values in those columns. Specify whether to remove missing values.
mutate_all(funs(. <operator>
provide_value))
Apply the specified operator to all of the columns and overwrite the existing values in those columns.
mutate_all(funs("provide_value" = . <operator>
provide_value))
Apply the specified operator to all of the columns and create new columns to hold the results. Give the new columns names that end with the specified
value.
mutate_at
mutate_at(vars(`<column>
`), funs(<func>
))
Apply functions to the specified columns.
mutate_if
mutate_if(<predicateFunc>
, <func>
)
Apply functions to the columns that meet the specified condition.
mutate_if(<predicateFunc>
, funs( . <operator>
provide_value))
Apply the specified operator to the columns that meet the specified condition.
mutate_if(<predicateFunc>
, funs(<func>
))
Apply functions to the columns that meet the specified condition. Specify whether to remove missing values.
rename
rename(provide_new_column = `<column>
`)
Rename the specified column.
sample_frac
sample_frac(provide_number_between_0_and_1, weight=`<column>
`,replace=<logical>
)
Generate a random sample based on a percentage of the data. weight is optional and is the ratio of probability the row
will be chosen. Provide a numeric column. replace is optional and its Default is FALSE.
sample_n
sample_n(provide_number_of_rows,weight=`<column>
`,replace=<logical>
)
Generate a random sample of data based on a number of rows. weight is optional and is the ratio of probability the row will be
chosen. Provide a numeric column. replace is optional and its default is FALSE.
select
select(`<column>
`)
Keep the specified column.
select(-`<column>
`)
Remove the specified column.
select(starts_with("provide_text_value"))
Keep columns with names that start with the specified value.
select(ends_with("provide_text_value"))
Keep columns with names that end with the specified value.
select(contains("provide_text_value"))
Keep columns with names that contain the specified value.
select(matches ("provide_text_value"))
Keep columns with names that match the specified value. The specified value can be text or a regular expression.
select(`<column>
`:`<column>
`)
Keep the columns in the specified range. Specify the range as from one column to another column.
select(`<column>
`, everything())
Keep all of the columns, but make the specified column the first column.
select(`<column>
`, `<column>
`)
Keep the specified columns.
select_if
select_if(<predicateFunc>
) Keep columns that meet the specified condition. Supported functions include:
- contains
- ends_with
- matches
- num_range
- starts_with
summarize
summarize(provide_new_column = <func>
(`<column>
`))
Apply aggregate functions to the specified columns to reduce multiple column values to a single value. Be sure to group the column data first by using
the group_by operation.
summarize_all
summarize_all(<func>
)
Apply an aggregate function to all of the columns to reduce multiple column values to a single value. Specify whether to remove missing values. Be sure to group the column data first by using the
group_by operation.
summarize_all(funs(<func>
))
Apply multiple aggregate functions to all of the columns to reduce multiple column values to a single value. Create new columns to hold the results. Specify whether to remove missing values.
Be sure to group the column data first by using the group_by operation.
summarize_if
summarize_if(<predicate_conditions>
,...)
Apply aggregate functions to columns that meet the specified conditions to reduce multiple column values to a single value. Specify whether to remove missing values. Be sure to
group the column data first by using the group_by operation. Supported functions include:
- count
- max
- mean
- min
- standard deviation
- sum
tally
tally()
Counts the number of rows (for string columns) or totals the data (for numeric values) by group. Be sure to group the column data first by using the group_by operation.
tally(wt=`<column>
`)
Counts the number of rows (for string columns) or totals the data (for numeric columns) by group for the weighted column.
tally( wt=<func>
(`<column>
`), sort = <logical>
)
Applies a function to the specified weighted column and returns the result, by group, sorted or not.
top_n
top_n(provide_value)
Select the top or bottom N rows (by value) in each group. Specify a positive integer to select the top N rows; specify a negative integer to select the bottom N rows.
top_n(provide_value, `<column>
`)
Select the top or bottom N rows (by value) in each group, based on the specified column. Specify a positive integer to select the top N rows; specify a negative integer to select the
bottom N rows.
If duplicate rows affect the count, use the Remove duplicates GUI operation prior to using the top_n() operation.
transmute
transmute(<new_or_existing_column>
= `<column>
`)
Add a new column or overwrite an existing one by using the specified expression. Keep only columns that are mentioned in the expression.
transmute(<new_or_existing_column>
= <func(column)>
)
Add a new column or overwrite an existing one by applying a function to the specified column. Keep only columns that are mentioned in the expression.
transmute(<new_or_existing_column>
= `<column>
` <operator>
`<column>
`)
Add a new column or overwrite an existing one by applying an operator to the specified column.
Keep only columns that are mentioned in the expression.
transmute(<new_or_existing_column>
= `<column>
`, <new_or_existing_column>
= `<column>
`)
Add multiple new columns. Keep only columns that are mentioned in the expression.
transmute(<new_or_existing_column>
= if_else( provide_value, provide_value_for_true, provide_value_for_false))
Add a new column or overwrite an existing one by using the specified conditional expressions. Keep only columns
that are mentioned in the expressions.
ungroup
ungroup()
Ungroup the data.
Functions
Aggregate
- mean
- min
- n
- sd
- sum
Logical
- is.na
Numerical
- abs
- coalesce
- cut
- exp
- floor
Text
- c
- coalesce
- paste
- tolower
- toupper
Type
- as.character
- as.double
- as.integer
- as.logical
Logical operators
- <
- <=
- >=
- >
- between
- !=
- ==
- %in%
Parent topic: Refining data