XML Parser step (Hierarchical Data stage)

Use the XML Parser step to parse one or more documents that have the same structure.

XML Source

For the source of the XML data, specify one of the following options:

String set: Select the input schema item that contains the document string. Only items that have the String, normalizedString, byteString, or XML data types are available for selection.

Single file: Enter the path and file name, or click Insert Parameter and then select the name of the parameter. The parameters that are available are the ones that you previously defined in the job and the built-in macros that are in IBM® InfoSphere® DataStage®. Only items that have the String, normalizedString, or byteString data types are available for selection.

File set: Use the file set option to read multiple XML files that are based on the same XSD. Select the input schema item that contains, at run time, the absolute paths (for example, c:\test.xml) of the XML files. Only items that have the String, normalizedString, or byteString data types are available for selection.
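
As a rough illustration of the file set idea outside of DataStage, the following Python sketch parses a list of absolute paths whose files all share one structure. It is not how the stage is implemented; the function name `parse_file_set`, the sample `Order` documents, and the temporary directory are assumptions made for this example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def parse_file_set(paths):
    """Parse every file named in `paths` and return the root elements in order."""
    roots = []
    for path in paths:
        # The file set option expects absolute paths, so reject relative ones.
        if not os.path.isabs(path):
            raise ValueError(f"expected an absolute path, got {path!r}")
        roots.append(ET.parse(path).getroot())
    return roots


# Build two small documents that share one structure (hypothetical Order data).
workdir = tempfile.mkdtemp()
paths = []
for order_id in ("1001", "1002"):
    path = os.path.join(workdir, f"order_{order_id}.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(f"<Order><Id>{order_id}</Id></Order>")
    paths.append(path)

roots = parse_file_set(paths)
order_ids = [root.findtext("Id") for root in roots]
print(order_ids)
```

Each path plays the role of one value of the input schema item: one incoming string, one document parsed.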

Enable Filtering: Enable filtering to apply an XSLT stylesheet to the document before it is parsed. The document root must describe the document that the XSLT transformation produces, rather than the original document. This option is not recommended for large documents because the entire transformation is performed in memory. For large documents, use transformation steps instead.

Document Root

Select the top-level element that describes the documents that you are parsing. The types that are displayed under the library's namespace are top-level element definitions. Following the XML Schema standard, only top-level elements can describe documents. The name of the element that you select must match the top-level element name in the instance documents. For example, if you are parsing Order documents, select the Order element. When you select the element, you can view its structure and verify that the structure is correct for the documents that you want to parse.
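
The name-matching requirement can be checked by hand before you configure the step. The following Python sketch is only an illustration, not part of the product: the helper names `local_name` and `matches_document_root`, the sample documents, and the `http://example.com/orders` namespace are assumptions. It compares the top-level element of an instance document with the expected document root.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to qualified names."""
    return tag.rsplit("}", 1)[-1]


def matches_document_root(xml_text, expected_root):
    """Return True when the top-level element of xml_text is expected_root."""
    root = ET.fromstring(xml_text)
    return local_name(root.tag) == expected_root


# Hypothetical instance documents: one Order, one Invoice.
order_doc = '<Order xmlns="http://example.com/orders"><Id>1001</Id></Order>'
invoice_doc = "<Invoice><Id>7</Id></Invoice>"

print(matches_document_root(order_doc, "Order"))
print(matches_document_root(invoice_doc, "Order"))
```

The sketch compares only the local name. A schema-aware parser also takes the namespace into account, so treat a match here as a quick sanity check rather than proof that the document conforms to the schema.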

Note: The elements from which you select the document root are from the resources that were previously imported into the schema libraries. If you need to import the resource that contains the document root for the XML Parser step, click Open libraries and import the resource that you need. Then, return to the Assembly Editor and configure the document root.

Validation

By default, when the XML Parser step runs, it uses minimal validation, which disables all of the validation rules and provides better performance than strict validation does. Strict validation is initially configured so that each validation rule is set to Fatal, and the job stops as soon as it parses the first occurrence of invalid data. To customize validation, specify the action to perform when a violation occurs.
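
The difference between the two modes can be pictured as a table that maps each validation rule to an action. The following Python sketch is a simplified model under stated assumptions, not the stage's implementation: the rule name `quantity_is_integer`, the `ignore` action, and all function names are hypothetical. Only the Fatal action and the stop-at-first-violation behavior come from the description above.

```python
# Hypothetical rule name; the real rules are listed in "XML Parser validation rules".
RULES = ("quantity_is_integer",)


class FatalViolation(Exception):
    """Raised when a rule whose action is 'fatal' is violated."""


def build_actions(mode):
    """Map every rule to its initial action for the given validation mode."""
    if mode == "strict":
        # Strict validation starts with every rule set to Fatal.
        return {rule: "fatal" for rule in RULES}
    if mode == "minimal":
        # Minimal validation disables every rule ('ignore' is an assumed label).
        return {rule: "ignore" for rule in RULES}
    raise ValueError(f"unknown validation mode: {mode!r}")


def parse_quantities(values, actions):
    """Accept values in order and stop at the first fatal violation."""
    accepted = []
    for value in values:
        if not value.isdigit() and actions["quantity_is_integer"] == "fatal":
            raise FatalViolation(f"invalid quantity: {value!r}")
        accepted.append(value)
    return accepted


values = ["3", "x", "5"]

# Minimal validation: the invalid value passes through unchecked.
minimal_result = parse_quantities(values, build_actions("minimal"))

# Strict validation: processing stops at the first invalid value.
try:
    parse_quantities(values, build_actions("strict"))
    strict_stopped = False
except FatalViolation:
    strict_stopped = True

print(minimal_result, strict_stopped)
```

In this model, customizing validation means changing individual entries in the action table instead of switching the whole mode.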

For more information about validation rules, see XML Parser validation rules.