Implementing Abstract Methods of Processor Class (Java Integration stage)

Last updated: Nov 07, 2024

Your Java™ code must implement a subclass of the Processor class. The Processor class consists of methods that are invoked by the Java Integration stage. When a job that includes the Java Integration stage starts, the stage instantiates your Processor class and calls the logic within your Processor implementations.

The Processor class provides the following methods, which the Java Integration stage can call to interact with your Java code at job execution time or at design time.

getCapabilities()
validateConfiguration()
getConfigurationErrors()
getBeanForInput()
getBeanForOutput()
getAdditionalOutputColumns()
initialize()
process()
terminate()
getColumnMetadataForInput()
getColumnMetadataForOutput()
getUserPropertyDefinitions()

At minimum, your Java code must implement the following two abstract methods.

public abstract boolean validateConfiguration(Configuration configuration, boolean isRuntime) throws Exception;
public abstract void process() throws Exception;

The following example shows a simple peek stage implementation that prints record column values to the job log, which can be viewed in the Director client. It assumes a single input link.


package samples;

import com.ibm.is.cc.javastage.api.*;

public class SimplePeek extends Processor
{
  private InputLink m_inputLink;

  public boolean validateConfiguration(
    Configuration configuration, boolean isRuntime) throws Exception
  {
    if (configuration.getInputLinkCount() != 1)
    {
      // this sample code assumes stage has 1 input link.
      return false;
    }
    
    m_inputLink = configuration.getInputLink(0);
     
    return true;   
  }

  public void process() throws Exception
  {
    do
    {
      InputRecord inputRecord = m_inputLink.readRecord();
      if (inputRecord == null)
      { 
         // No more input. Your code must return from process() method.
         break;
      }

      for (int i = 0; i < m_inputLink.getColumnCount(); i++) 
      { 
         Object value = inputRecord.getValue(i);
         // String.valueOf() avoids a NullPointerException if the value is null.
         Logger.information(String.valueOf(value));
      }
    } 
    while (true);
  }
}

The Java Integration stage calls the validateConfiguration() method to pass the current configuration (number and types of links) and the values of the user properties to your Java code. Your Java code must validate the given configuration and user properties, and return false to the Java Integration stage if there are problems with them. Because the previous example assumes a stage that has a single input link, it checks the number of input links and returns false if the stage configuration does not meet this requirement.

if (configuration.getInputLinkCount() != 1)
{
  // this sample code assumes stage has 1 input link.
  return false;
}

The Configuration interface defines methods that are used to get the current stage configuration (number and types of links), and the values for the user properties. The getInputLinkCount() method is used to get the number of input links connected to this stage.

If your Java code accepts the stage configuration, it saves the reference to an InputLink object for subsequent processing and returns true to the Java Integration stage.

m_inputLink = configuration.getInputLink(0);
return true;
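The accept-or-reject decision can be exercised outside a job. The following is a minimal sketch, not the stage's actual runtime: the nested Configuration and InputLink interfaces are local stand-ins that declare only the methods named in this topic, and the withLinks() helper is hypothetical. In a real job, you use the types from com.ibm.is.cc.javastage.api instead.

```java
public class ValidateSketch {
    // Hypothetical stand-ins for the API types; only the methods
    // mentioned in this topic are declared.
    interface InputLink { }

    interface Configuration {
        int getInputLinkCount();
        InputLink getInputLink(int index);
    }

    private InputLink inputLink;

    // Same decision as in the SimplePeek example: accept exactly one input link.
    boolean validateConfiguration(Configuration configuration, boolean isRuntime) {
        if (configuration.getInputLinkCount() != 1) {
            return false;
        }
        inputLink = configuration.getInputLink(0);
        return true;
    }

    // Hypothetical helper that fakes a stage with the given number of input links.
    static Configuration withLinks(final int count) {
        final InputLink link = new InputLink() { };
        return new Configuration() {
            public int getInputLinkCount() { return count; }
            public InputLink getInputLink(int index) { return link; }
        };
    }

    public static void main(String[] args) {
        System.out.println(new ValidateSketch().validateConfiguration(withLinks(1), true)); // true
        System.out.println(new ValidateSketch().validateConfiguration(withLinks(2), true)); // false
    }
}
```

Keeping the check free of side effects until the configuration is accepted means that a rejected configuration leaves no partially initialized state behind.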

After your Java code verifies the stage configuration, it can interact with the stages that are connected to it in the job. The process() method is the entry point for reading records from the input links and writing records to the output links. If the stage has input links, the Java Integration stage calls this method when a row is available on any of them, regardless of the number of output links, provided that the job has not ended. Your Java code must consume all rows from the stage input links.

By calling the readRecord() method of the InputLink interface, your Java code can consume a row from the input link. It returns an object that implements the InputRecord interface. The InputRecord interface defines methods that are used to get column data from a consumed row record.

InputRecord inputRecord = m_inputLink.readRecord();
if (inputRecord == null)
{ 
  // No more input. Your code must return from process() method.
  break;
}
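The read-until-null pattern can be tried without a DataStage installation. The following sketch makes several assumptions: the nested InputLink and InputRecord interfaces are local stand-ins that declare only the methods named in this topic, linkOf() is a hypothetical in-memory link, and drain() collects the values in a list instead of writing them to the job log.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {
    // Hypothetical stand-ins for the API interfaces.
    interface InputRecord {
        Object getValue(int columnIndex);
    }

    interface InputLink {
        InputRecord readRecord();   // returns null when no more rows exist
        int getColumnCount();
    }

    // Hypothetical in-memory link that serves the given rows in order.
    static InputLink linkOf(final int columnCount, List<Object[]> rows) {
        final Iterator<Object[]> it = rows.iterator();
        return new InputLink() {
            public InputRecord readRecord() {
                if (!it.hasNext()) {
                    return null;
                }
                final Object[] row = it.next();
                return columnIndex -> row[columnIndex];
            }
            public int getColumnCount() { return columnCount; }
        };
    }

    // Consumes every row, as process() must, and returns the values as strings.
    static List<String> drain(InputLink link) {
        List<String> seen = new ArrayList<>();
        InputRecord record;
        while ((record = link.readRecord()) != null) {
            for (int i = 0; i < link.getColumnCount(); i++) {
                seen.add(String.valueOf(record.getValue(i)));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1, "a"});
        rows.add(new Object[] {2, null});
        System.out.println(drain(linkOf(2, rows))); // prints [1, a, 2, null]
    }
}
```

The while loop is equivalent to the do/while loop with break in the example. It places the end-of-input test in the loop condition, so the loop cannot continue after readRecord() returns null.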

After your Java code consumes a row record from the stage input link, it can get the column values of that record by calling the getValue(int columnIndex) method of the InputRecord interface. The getColumnCount() method of the InputLink interface returns the number of columns in the input link.

for (int i = 0; i < m_inputLink.getColumnCount(); i++) 
{ 
  Object value = inputRecord.getValue(i);

Finally, each column value is written to the job log by calling the information() method of the Logger class. The Logger class allows your Java code to write data to the job log at a specified log level. The following code writes the string representation of each column value to the job log.

Logger.information(String.valueOf(value));
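If a column can contain a null value (an assumption about your link's column definitions), calling toString() on the value throws a NullPointerException, whereas String.valueOf() returns the text "null". The following sketch uses only standard Java. The describe() helper is hypothetical and is not part of the API; its result is the kind of string that you might pass to Logger.information().

```java
public class NullSafeLogSketch {
    // Hypothetical helper: builds a log message that also identifies the column.
    static String describe(int columnIndex, Object value) {
        return "column " + columnIndex + " = " + String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(describe(0, 42));   // prints column 0 = 42
        System.out.println(describe(1, null)); // prints column 1 = null
    }
}
```

Including the column index in the message makes it easier to match each job log entry to a column when a link has many columns.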