Compress stage
The Compress stage uses the UNIX compress or GZIP utility to compress a data set. It converts a data set from a sequence of records into a stream of raw binary data.
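The effect of the stage can be illustrated outside DataStage with a short Python sketch. This is not how the stage is implemented. It assumes a hypothetical two-column record serialized as comma-separated text, and uses Python's standard gzip module to turn a sequence of records into a single stream of bytes and back again.

```python
import gzip

# Hypothetical records; the field values are made up for illustration.
records = [(1, "alpha"), (2, "beta"), (3, "gamma")]

def compress_records(rows):
    """Serialize rows as newline-delimited text, then gzip the result."""
    text = "\n".join(f"{a},{b}" for a, b in rows)
    return gzip.compress(text.encode("utf-8"))

def expand_records(blob):
    """Reverse compress_records: gunzip, then split back into rows."""
    text = gzip.decompress(blob).decode("utf-8")
    if not text:
        return []
    rows = []
    for line in text.split("\n"):
        a, b = line.split(",", 1)
        rows.append((int(a), b))
    return rows

compressed = compress_records(records)   # opaque bytes, no visible rows
restored = expand_records(compressed)    # rows are usable again
```

After compression the data is an opaque byte string with no visible columns, which is why column-based processing has to wait until the data is expanded.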
The Compress stage is a processing stage. It can have a single input link and a single output link.
The complement to the Compress stage is the Expand stage, which is described in Expand stage.
A compressed data set is similar to an ordinary data set and can be stored in a persistent form by a Data Set stage. However, a compressed data set cannot be processed by many stages until it is expanded, that is, until its rows are returned to their normal format. Stages that do not perform column-based processing or reorder the rows can operate on compressed data sets. For example, you can use the Copy stage to create a copy of the compressed data set.
Because compressing a data set removes its normal record boundaries, the compressed data set must not be repartitioned before it is expanded.
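The reason for this restriction can be sketched in Python. The example is an illustration under stated assumptions, not a model of the DataStage internals: the row layout is hypothetical, and cutting the byte stream in half stands in for repartitioning. The helper can_expand is defined here only for the demonstration.

```python
import gzip
import zlib

# Hypothetical data set: 1000 rows serialized as comma-separated text.
rows = "\n".join(f"{i},value-{i}" for i in range(1000)).encode("utf-8")
compressed = gzip.compress(rows)

def can_expand(chunk):
    """Return True if chunk decompresses on its own as a gzip stream."""
    try:
        gzip.decompress(chunk)
        return True
    except (OSError, EOFError, zlib.error):
        return False

# Simulate repartitioning by cutting the byte stream in the middle.
mid = len(compressed) // 2
first_half, second_half = compressed[:mid], compressed[mid:]

whole_ok = can_expand(compressed)     # the intact stream expands
first_ok = can_expand(first_half)     # truncated stream
second_ok = can_expand(second_half)   # stream without its header
```

Only the intact stream expands. Neither piece is usable by itself, because the cut does not fall on a record boundary and each piece depends on the rest of the stream.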
For example, if the input data set has the following schema:
a:int32;
b:string[50];
The schema for the compressed data set would be:
record
( t: tagged {preservePartitioning=no}
  ( encoded: subrec
    ( bufferNumber: dfloat;
      bufferLength: int32;
      bufferData: raw[32000];
    );
    schema: subrec
    ( a: int32;
      b: string[50];
    );
  )
)
Therefore, when you reuse a file that has been compressed, ensure that you read it with the 'compressed schema' rather than with the schema of the original, uncompressed data.
When you double-click the Compress stage, the properties panel opens. The properties panel has three tabs: