External Filter stage: Stage tab (DataStage)

The Stage tab for the External Filter stage enables you to control aspects of the External Filter stage.

The Properties section on the Stage tab lets you specify what the stage does. The Advanced section allows you to specify how the stage executes.

Properties

The External Filter stage has the following properties:

Table 1. Properties
Category/Property	Values	Default	Mandatory?	Repeats?	Dependent on
Options/Filter command	string	N/A	Y	N	N/A
Options/Arguments	string	N/A	N	N	N/A

Filter command

Specifies the filter command line to be executed and any command-line options that it requires. For example:

grep

If you use the grep command in the External Filter stage, leading or trailing space characters that are contained within column data are not sent to the output of the stage. To avoid this behavior, use a Wrapped stage. This example uses a comma (,) to delimit the fields:

#!/bin/sh
# ------------------------------------------------------------
# mygrep.op  --  'wrapped grep' example
# ------------------------------------------------------------
#
cat <<END
{
wrapped, kind = parallel,
command = "grep 'abc'",
port = { input  = 0, fd = 0, schema = "record{delim=','}()" },
port = { output = 0, fd = 1, schema = "record{delim=','}()" },
usage = "mygrep"
}
END
# ------------------------------------------------------------
# End of wrapper
# ------------------------------------------------------------

Arguments

Specifies any arguments that the command line requires. For example:

\(cancel\).*\1

Used together with the grep command, this argument extracts all records that contain the string "cancel" twice and discards all other records.
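
To see how this pattern behaves outside DataStage, the following minimal sketch pipes three comma-delimited records through grep with the same argument. The sample records and the helper name filter_twice are hypothetical and chosen only for illustration; the sketch assumes a POSIX-compatible grep and does not show how the stage itself invokes the command.

```shell
#!/bin/sh
# filter_twice is a hypothetical helper name used only in this sketch.
# It reads records on standard input and writes matching records to
# standard output, which is how a filter command is expected to behave.
filter_twice() {
    # Basic regular expression: \(cancel\) captures the string "cancel",
    # and \1 requires the same string to appear again later on the line.
    grep '\(cancel\).*\1'
}

# Hypothetical sample records, one per line, with comma-delimited fields.
printf '%s\n' \
    'cancel order,cancel refund' \
    'cancel order,ship refund' \
    'ship order,ship refund' | filter_twice
```

Only the first record contains "cancel" twice, so it is the only record that the filter passes through.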

Advanced

This section allows you to specify the following:

Execution mode. The stage can execute in parallel mode or sequential mode. In parallel mode, the input data is processed by the available nodes as specified in the configuration file, and by any node constraints specified in the Advanced section. In sequential mode, the entire data set is processed by the conductor node.
Combinability mode. This is Auto by default, which allows IBM® DataStage® to combine the operators that underlie parallel stages so that they run in the same process, if that is sensible for this type of stage.
Preserve partitioning. This is Set by default. You can explicitly select Set or Clear. Select Set to request that the next stage attempt to maintain the partitioning.