Data metadata

This section describes how to set up the data model attributes based on pyspark.sql.StructField.

spss.datamodel.Role Objects

This class enumerates valid roles for each field in a data model.

BOTH: Indicates that this field can be either an antecedent or a consequent.

FREQWEIGHT: Indicates that this field is to be used as a frequency weight; this isn't displayed to the user.

INPUT: Indicates that this field is a predictor or an antecedent.

NONE: Indicates that this field is not used directly during modeling.

TARGET: Indicates that this field is a target to be predicted or a consequent.

PARTITION: Indicates that this field identifies the data partition.

RECORDID: Indicates that this field identifies the record ID.

SPLIT: Indicates that this field splits the data.

spss.datamodel.Measure Objects

This class enumerates measurement levels for fields in a data model.

UNKNOWN: Indicates that the measure type is unknown.

CONTINUOUS: Indicates that the measure type is continuous.

NOMINAL: Indicates that the measure type is nominal.

FLAG: Indicates that the field value is one of two values.

DISCRETE: Indicates that the field value should be interpreted as a collection of values.

ORDINAL: Indicates that the measure type is ordinal.

TYPELESS: Indicates that the field can have any value compatible with its storage.
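
The member names listed for Role and Measure can be mirrored in plain Python when field settings need to be checked before a script runs. The following is a minimal sketch under that assumption: the two sets and the check_field_spec helper are hypothetical stand-ins written for illustration, and they are not part of the spss.datamodel package.

```python
# Names copied from the Role and Measure listings above. These sets are
# plain-Python stand-ins, not the spss.datamodel classes themselves.
ROLE_NAMES = frozenset({
    "BOTH", "FREQWEIGHT", "INPUT", "NONE",
    "TARGET", "PARTITION", "RECORDID", "SPLIT",
})
MEASURE_NAMES = frozenset({
    "UNKNOWN", "CONTINUOUS", "NOMINAL", "FLAG",
    "DISCRETE", "ORDINAL", "TYPELESS",
})


def check_field_spec(role_name, measure_name):
    """Hypothetical helper: raise ValueError for an undocumented name."""
    if role_name not in ROLE_NAMES:
        raise ValueError("unknown role: %r" % role_name)
    if measure_name not in MEASURE_NAMES:
        raise ValueError("unknown measure: %r" % measure_name)
    return role_name, measure_name


# A documented pair passes through unchanged.
check_field_spec("INPUT", "CONTINUOUS")
```

A check like this only catches misspelled names early; the actual values stored for a field still come from the Role and Measure classes.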

pyspark.sql.StructField Objects

Represents a field in a StructType. A StructField object comprises four fields:
  • name (string): the name of the StructField
  • dataType (pyspark.sql.DataType): the specific data type
  • nullable (bool): whether the values of the StructField can contain None values
  • metadata (dictionary): a Python dictionary that stores the optional attributes
You can use the metadata dictionary instance to store the measure, role, or label attribute for the specific field. The keywords for these attributes are:
  • measure: the keyword for the measure attribute
  • role: the keyword for the role attribute
  • displayLabel: the keyword for the label attribute
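
As a rough sketch of how the three keywords fit together, the dictionary can be assembled in one place. The build_field_metadata helper below is hypothetical and written only for illustration. It uses placeholder strings instead of the Measure and Role members so that it runs without the spss package.

```python
def build_field_metadata(measure, role, label):
    """Hypothetical helper: return a metadata dict keyed by the three keywords."""
    if not isinstance(label, str):
        raise TypeError("label must be a string")
    return {
        "measure": measure,
        "role": role,
        "displayLabel": label,
    }


# Placeholder strings stand in for Measure.TYPELESS and Role.NONE here so the
# sketch runs without the spss package.
example_metadata = build_field_metadata(
    "TYPELESS", "NONE", "field label description"
)
print(sorted(example_metadata))  # prints ['displayLabel', 'measure', 'role']
```

In a real script, the measure and role arguments would be members of the Measure and Role classes, and the resulting dictionary would be passed as the metadata of a StructField.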
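The following example stores all three attributes in the metadata of a field:

from pyspark.sql.types import StructField, StringType
from spss.datamodel.Role import Role
from spss.datamodel.Measure import Measure

# Collect the optional attributes under their keywords.
_metadata = {}
_metadata['measure'] = Measure.TYPELESS
_metadata['role'] = Role.NONE
_metadata['displayLabel'] = "field label description"

# Attach the attributes to the field through the metadata argument.
StructField("userName", StringType(), nullable=False, metadata=_metadata)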