Using stored procedures in DataStage
A stored procedure is a block of procedural constructs and embedded SQL statements. Use the Call procedure option under the Read method or Write method property of the Stage tab to call an existing stored procedure from a database. Use the Call procedure statement option to write your own procedure statement syntax in the text box.
Supported connectors
The following connectors support stored procedures. Stored procedures can be run on connectors with input links, output links, or both.
- Amazon RDS for PostgreSQL
- IBM Cloud® Databases for PostgreSQL
- IBM Db2®
- IBM® Db2 Big SQL
- IBM Db2 on Cloud
- IBM Db2 for i
- IBM Db2 Warehouse
- IBM Db2 for z/OS®
- Google BigQuery
- Microsoft Azure SQL Database
- Microsoft SQL Server
- Microsoft Azure Synapse Analytics
- Oracle
- PostgreSQL
- SAP ASE
- Snowflake
- Teradata
Stored procedures in a source stage
Select Add procedure return value to schema to add the procedure return code and message to the schema as columns ProcCode and ProcMess. The first column on the output link holds the value of the code. The second column holds the value of the message. If you also select Forward row data, the input columns are added to the output link after the ProcCode and ProcMess columns.
Stored procedures in a target stage
Select Execute procedure for each row to execute the procedure for each row of the input data. If you do not select Execute procedure for each row, the procedure will only execute once, for the first row. By default, Execute procedure for each row is selected. This procedure is not supported by the Snowflake connector.
Syntax
?
as a parameter marker. The following syntax calls a
stored procedure with name <procedure_name>
on an input parameter
@<column_name>
and output column ?
, both in
parentheses.CALL <procedure_name>(@<column_name>, ?)
? = CALL <procedure_name>('@<string_column_name>', ?)
In the Stage tab, under General source properties if you choose Read method > Call procedure statement and select Add procedure return value to schema, then two columns (ProcCode and ProcMess) are added to the output as the first columns in the output schema.
- For connectors (except Microsoft SQL Server and SAP ASE), which support stored procedures, always provide
the following statement:
CALL (@column_name, ?)
- For Microsoft SQL Server and SAP ASE connectors, the syntax of
? =
is valid and the procedure return code is captured in?
and sent to the output column. You always need to provide the following statement:? = CALL (@column_name, ?)
If the stored procedure is called in a stage with only output links, then ?
always indicates an output parameter. If the stored procedure is called in a stage with only input
links, then ?
always indicates an input parameter.
CALL <procedure_name>()
CALL <procedure_name>(?, ?)
CALL <procedure_name>(4,?)
A stored procedure for PostgreSQL connector can not return multiple rows into output parameters. To get multiple rows, use the table for user-defined functions instead.
Parameter values
In the Parameter values section of the Stage tab, use parameter specifications to map the input and output parameters of your stored procedure to input and output columns. This determines which columns data is returned in. Parameter names in the Stage tab should match parameter names in your stored procedure. Each parameter name can be mapped to a single column. If a mapping is not provided, the output data for each column is returned in the order the columns are specified. A connector with an input link will have input parameters, a connector with an output link will have output parameters, and a connector with both input and output links will have both.
- @
- The default value for the parameter. The column value that is specified in the mapping is passed to the parameter.
- Literal
- A constant specified as an input parameter. The value of the constant is used, and the value from the input link is discarded.
- Local parameter or a parameter from a parameter set
- For a parameter from a parameter set, use
(#name#)
format. The value of the parameter is used, and the value from the input link is discarded.
Each parameter must be mapped to the input column. For example, P1 should be mapped to COLUMN_1, and P2 to COLUMN_2. Parameter mapping does not replace the column definition. However, you can discard the value from an input link and use the value in the Marker / Literal column.
Optional parameters
You can use optional parameters only for the stored procedures. User-defined functions do not support optional parameters. A stored procedure can have one or more optional parameters. If you want to omit optional parameters while calling a stored procedure, use Call procedure statement. If a stored procedure has one or more optional parameters, all optional parameters must be defined after output parameters. Optional parameters with the Null value must be grouped at the end.
- IBM Db2
- IBM Db2 Big SQL
- IBM Db2 on Cloud
- IBM Db2 for i
- IBM Db2 Warehouse
- Microsoft Azure SQL Database
- Microsoft SQL Server
- Oracle
- Snowflake
Example: A stored procedure MYSCHEMA.MYSP
has two required input
parameters (inparam1, inparam2), two optional inputs (optinparam3, optinparam4), and two output
parameters. See the following examples of valid and invalid use of Call procedure
statement:
Valid:
CALL MYSCHEMA.MYSP(inparam1, inparam2, ?, ?, optinparam3, optinparam4)
In
this case the Input value for optinparam3 is used, and default value for optinparam4 is
used:CALL MYSCHEMA.MYSP(inparam1, inparam2, ?, ?, optinparam3)
In this case
the default value for both optinparam3 and optinparam4 is
used:CALL MYSCHEMA.MYSP(in1, in2, ?, ?)
CALL MYSCHEMA.MYSP(inparam1, inparam2, optinparam3, optinparam4, ?, ?)
CALL MYSCHEMA.MYSP(inparam1, inparam2, ?, ?, , optinparam4)
Query timeout
For long-running queries and stored procedures, set the Query timeout
property to limit the number of seconds that a procedure runs for. The default value is 300 seconds.
If you set a value greater than 600 seconds, you must also edit the environment variable
$OSH_JOB_START_TIMEOUT
and set a value higher than the Query
timeout. This property is available on Microsoft SQL Server, Microsoft Azure SQL
Database, IBM Db2, IBM
Db2 Big SQL, IBM Db2 on Cloud, IBM Db2 for i, IBM Db2
Warehouse, and IBM Db2 for z/OS.
User-defined functions
When Call procedure or Call procedure statement is selected, you can select User defined function to specify a user-defined function to execute. A user-defined function can take input parameters and returns either a single value (for scalar functions) or a set of values (for table functions). This property is available on Amazon RDS for PostgreSQL, Db2, IBM Cloud Databases for PostgreSQL, IBM Db2 Big SQL, IBM Db2 on Cloud, IBM Db2 for i, IBM Db2 Warehouse, IBM Db2 for z/OS, PostgreSQL, Snowflake, and Teradata. In IBM Db2 for z/OS, user-defined functions must be called as part of a SELECT statement. In Teradata, macro-type functions are also available.
User-defined functions support both source and transform modes. For the Source mode, the data is read from the database. For the Transform mode, the data is read, transformed, and written to the target node.
-
- Scalar
- Returns a single value.
-
- Table
- Returns a table.
- For scalar user-defined
function:
SELECT “Sclr_udf_name”(@arg1,@arg2,…);
- For table user-defined
function:
SELECT * FROM “table_udf_name”(@arg1,@arg2,…);