Compress stage in DataStage

Compress stage

The Compress stage uses the UNIX compress or GZIP utility to compress a data set. It converts a data set from a sequence of records into a stream of raw binary data.

The Compress stage is a processing stage. It can have a single input link and a single output link.

The complement to the Compress stage is the Expand stage, which is described in the Expand stage topic.

A compressed data set is similar to an ordinary data set and can be stored in a persistent form by a Data Set stage. However, a compressed data set cannot be processed by many stages until it is expanded, that is, until its rows are returned to their normal format. Stages that do not perform column-based processing or reorder the rows can operate on compressed data sets. For example, you can use the Copy stage to create a copy of the compressed data set.

Because compressing a data set removes its normal record boundaries, the compressed data set must not be repartitioned before it is expanded.
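The effect of losing record boundaries can be illustrated outside DataStage with a minimal Python sketch. It uses the standard gzip module as a stand-in for the compression utility. The record contents, the buffer size of 64 bytes, and the expand helper are all hypothetical and are not part of the product. The sketch compresses a sequence of records into one binary stream, cuts the stream into fixed-size buffers, and then tries to expand the buffers in their original order and in a changed order.

```python
import gzip
import zlib

# Hypothetical records; any row data would behave the same way.
records = [f"{i},row-{i}\n".encode() for i in range(1000)]
original = b"".join(records)

# Compress the whole record sequence into one stream of raw binary data.
stream = gzip.compress(original, mtime=0)

# Cut the stream into fixed-size buffers. The cut points fall at arbitrary
# byte offsets, not at record boundaries.
BUFFER_SIZE = 64  # arbitrary size chosen for this illustration
buffers = [stream[i:i + BUFFER_SIZE] for i in range(0, len(stream), BUFFER_SIZE)]


def expand(buffer_list):
    """Concatenate buffers and decompress them; return None if that fails."""
    try:
        return gzip.decompress(b"".join(buffer_list))
    except (OSError, EOFError, zlib.error):
        return None


# Buffers kept in their original order expand back to the original records.
in_order = expand(buffers)

# Buffers that have been reordered, as repartitioning might do, no longer
# form a valid compressed stream.
reordered = expand(buffers[::-1])

print(in_order == original)
print(reordered == original)
```

The individual buffers carry no usable row structure, so nothing downstream can redistribute them row by row until the stream has been expanded.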

DataStage® embeds the existing data set schema as a subrecord within a generic compressed schema. For example, given a data set with the following schema:

a:int32;
b:string[50];

The schema for the compressed data set would be:

record
  ( t: tagged {preservePartitioning=no}
    ( encoded: subrec
        ( bufferNumber: dfloat;
          bufferLength: int32;
          bufferData: raw[32000];
         );
      schema: subrec
        ( a: int32;
          b: string[50];
         );
     )
   )

Therefore, when you reuse a file that has been compressed, read it with the compressed schema rather than with the schema of the original, uncompressed data set.
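As a rough illustration of how the original columns nest inside the generic compressed schema, the following Python sketch generates the schema text from a list of column definitions. The compressed_schema function is a hypothetical helper, not a DataStage API, and the closing parentheses for the tagged and record levels are an assumption based on the nesting shown in the listing above.

```python
def compressed_schema(fields):
    """Build the compressed schema text for a list of (name, type) pairs.

    Hypothetical helper for illustration only; not part of DataStage.
    """
    # One line per original column, indented to sit inside the subrecord.
    original = "\n".join(f"          {name}: {ftype};" for name, ftype in fields)
    # The first column shares a line with the opening parenthesis.
    original = "        ( " + original.lstrip()
    return "\n".join([
        "record",
        "  ( t: tagged {preservePartitioning=no}",
        "    ( encoded: subrec",
        "        ( bufferNumber: dfloat;",
        "          bufferLength: int32;",
        "          bufferData: raw[32000];",
        "         );",
        "      schema: subrec",
        original,
        "         );",
        "     )",  # assumed close of the tagged aggregate
        "   )",    # assumed close of the record
    ])


schema_text = compressed_schema([("a", "int32"), ("b", "string[50]")])
print(schema_text)
```

Only the schema subrecord changes from one data set to another. The encoded subrecord is the same for every compressed data set.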

When you double-click the Compress stage, the properties panel opens. The properties panel has three tabs:

Stage. This is always present and is used to specify general information about the stage.
Input. This is where you specify details about the data set being compressed.
Output. This is where you specify details about the compressed data being output from the stage.