You can use Select nodes to select or discard a subset of records from the data stream based on a specific condition, such as BP (blood pressure) = "HIGH".
Mode. Specifies whether records that meet the condition will be included or excluded from the data stream.
- Include. Select to include records that meet the selection condition.
- Discard. Select to exclude records that meet the selection condition.
Condition. Displays the selection condition used to test each record. You specify the condition as a CLEM expression. Either enter the expression in the window, or click the calculator (Expression Builder) button to the right of the window to build it.
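The effect of the two modes can be illustrated outside of the product with a minimal Python sketch. This is not how the Select node is implemented; the select function, the dictionary-based records, and the id field are hypothetical names introduced only for illustration, and the sketch does not model null values.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def select(records: List[Record],
           condition: Callable[[Record], bool],
           mode: str = "include") -> List[Record]:
    """Hypothetical stand-in for a Select node (illustration only).

    mode="include" keeps the records that meet the condition;
    mode="discard" keeps the records that do not meet it.
    """
    if mode not in ("include", "discard"):
        raise ValueError("mode must be 'include' or 'discard'")
    keep_when = (mode == "include")
    return [r for r in records if bool(condition(r)) == keep_when]


# Sample data: the id field exists only to make the result easy to read.
records = [
    {"id": 1, "BP": "HIGH"},
    {"id": 2, "BP": "NORMAL"},
    {"id": 3, "BP": "HIGH"},
]


def high_bp(record: Record) -> bool:
    # Python equivalent of the condition BP = "HIGH"
    return record["BP"] == "HIGH"


included = select(records, high_bp, mode="include")
discarded = select(records, high_bp, mode="discard")
```

With this sample data, Include mode passes records 1 and 3 downstream, and Discard mode passes only record 2.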
If you choose to discard records based on a condition, such as the following:
(var1='value1' and var2='value2')
the Select node, by default, also discards records that have null values for all of the selection fields (here, var1 and var2). To keep those records, append the following condition to the original one:
and not(@NULL(var1) and @NULL(var2))
Select nodes can also be used to select a proportion of records. Typically, you would use a different node, the Sample node, for this operation. However, if the condition you want to specify is more complex than the Sample node's parameters allow, you can create your own condition using the Select node. For example, you can create a condition such as:
BP = "HIGH" and random(10) <= 4
This will select approximately 40% of the records showing high blood pressure and pass those records downstream for further analysis.
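The 40% figure can be checked with a small Python simulation. This is a sketch under a stated assumption: that the CLEM function random(10), given an integer argument, returns a uniformly distributed integer from 1 to 10, so that 4 of the 10 equally likely values satisfy the test. The names clem_random and select_sample are hypothetical helpers, not part of the product.

```python
import random


def clem_random(n: int, rng: random.Random) -> int:
    # Assumption: mimics CLEM random(n) for an integer n by returning
    # a uniformly distributed integer in the range 1..n inclusive.
    return rng.randint(1, n)


def select_sample(records, rng):
    # Python equivalent of: BP = "HIGH" and random(10) <= 4
    return [
        r for r in records
        if r["BP"] == "HIGH" and clem_random(10, rng) <= 4
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic data: half of the records have high blood pressure.
records = [{"BP": "HIGH" if i % 2 == 0 else "NORMAL"} for i in range(20000)]

selected = select_sample(records, rng)
high_count = sum(1 for r in records if r["BP"] == "HIGH")
fraction = len(selected) / high_count

print(f"Selected {fraction:.1%} of the HIGH records")
```

Because the selection is random, the proportion is approximate and varies from run to run unless the random seed is fixed. Records that do not show high blood pressure are never selected.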