Forecasting

Data preparation for time series data

Data preparation for time series data (TSDP) converts raw time data into regular time series data. The raw input is in flattened multidimensional format, which covers both transactional (event-based) data and column-based data. The output is in compact row-based format, which the subsequent time series analysis methods require. The main job of TSDP is to generate one time series for each combination of unique values in the dimension fields and each metric field. TSDP also does the following:
- sorts the data by timestamp
- extracts metadata about the time variables
- transforms time series to a different time granularity (interval) by applying an aggregation or distribution function
- checks data quality
- handles missing values if needed
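The following plain-Scala sketch shows the core TSDP idea. It builds one series per combination of dimension value and metric, sorts each series by time, and aggregates it to a coarser interval. It is not the TimeSeriesDataPreparation implementation. The `Record` type, the `toMonthlySeries` helper, the single dimension field, and the use of SUM as the aggregation function are all hypothetical.

```scala
import java.time.{LocalDate, YearMonth}

// Hypothetical flattened input row: one dimension value, a date, and named metric values.
final case class Record(dim: String, date: LocalDate, metrics: Map[String, Double])

object TsdpSketch {
  // Builds one series per (dimension value, metric name). Observations are
  // aggregated to monthly granularity with SUM and sorted by month.
  def toMonthlySeries(rows: Seq[Record]): Map[(String, String), Vector[(YearMonth, Double)]] = {
    val points = for {
      r <- rows
      (metric, value) <- r.metrics
    } yield ((r.dim, metric), YearMonth.from(r.date), value)

    points
      .groupBy(_._1)
      .map { case (seriesId, pts) =>
        val monthly = pts
          .groupBy(_._2)
          .map { case (month, ps) => (month, ps.map(_._3).sum) }
          .toVector
          .sortBy(p => (p._1.getYear, p._1.getMonthValue))
        seriesId -> monthly
      }
  }
}
```

Missing-value handling and data-quality checks are left out to keep the sketch short.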

Example code:

import com.ibm.spss.ml.forecasting.TimeSeriesDataPreparation

val tsdp = TimeSeriesDataPreparation().
  setMetricFieldList(Array("Demand")).
  setDateTimeField("Date").
  setEncodeSeriesID(true).
  setInputTimeInterval("MONTH").
  setOutTimeInterval("MONTH").
  setQualityScoreThreshold(0.0).
  setConstSeriesThreshold(0.0)

// `data` is the raw input DataFrame with the Date and Demand columns
val tsdpOut = tsdp.transform(data)

TimeSeriesDataPreparationConvertor

This is the date/time convertor API. It exposes some of the date/time conversion functionality inside TSDP so that applications can use it. The component supports two use cases:
1) Computing the time points between a specified start time and end time. Both times must occur after the first observation in the preceding TSDP output.
2) Computing the time points between a start index and an end index. The indices are measured relative to the last observation in the preceding TSDP output.
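Both use cases can be sketched for a monthly interval with `java.time.YearMonth`. This is a hypothetical illustration, not the TimeSeriesDataPreparationConvertor API. The object and method names are invented. The sketch also assumes that index 0 refers to the last observation itself and that each index step is one output interval.

```scala
import java.time.YearMonth

object ConvertorSketch {
  // Use case 1: every monthly time point from start to end, inclusive.
  def pointsBetween(start: YearMonth, end: YearMonth): Vector[YearMonth] =
    Iterator.iterate(start)(_.plusMonths(1)).takeWhile(!_.isAfter(end)).toVector

  // Use case 2: time points for index offsets relative to the last observation
  // (assumption: index 0 is the last observation, index 1 the next month, ...).
  def pointsByIndex(lastObs: YearMonth, startIdx: Int, endIdx: Int): Vector[YearMonth] =
    (startIdx to endIdx).map(i => lastObs.plusMonths(i.toLong)).toVector
}
```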

Temporal causal modeling

Temporal causal modeling (TCM) refers to a suite of methods that attempt to discover key temporal relationships in time series data by using a combination of Granger causality and regression algorithms for variable selection.
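TemporalCausal's actual selection procedure is not described here, so the following is only a plain-Scala illustration of the Granger idea that TCM builds on. A series x is said to Granger-cause y if adding lagged x to a regression of y on its own lag significantly reduces the residual sum of squares. The names `GrangerSketch`, `ols`, `rss`, and `grangerF` are hypothetical, and so is the single-lag specification.

```scala
object GrangerSketch {
  // Ordinary least squares: solves the normal equations (X'X) b = X'y
  // by Gaussian elimination with partial pivoting.
  def ols(x: Array[Array[Double]], y: Array[Double]): Array[Double] = {
    val k = x(0).length
    // Augmented matrix [X'X | X'y]
    val a = Array.tabulate(k, k + 1) { (i, j) =>
      if (j < k) x.indices.map(r => x(r)(i) * x(r)(j)).sum
      else x.indices.map(r => x(r)(i) * y(r)).sum
    }
    for (c <- 0 until k) {
      // Pivot on the row with the largest absolute value in column c
      val p = (c until k).reduce((r1, r2) => if (math.abs(a(r1)(c)) >= math.abs(a(r2)(c))) r1 else r2)
      val tmp = a(c); a(c) = a(p); a(p) = tmp
      for (r <- c + 1 until k) {
        val f = a(r)(c) / a(c)(c)
        for (j <- c to k) a(r)(j) -= f * a(c)(j)
      }
    }
    // Back substitution
    val b = new Array[Double](k)
    for (i <- k - 1 to 0 by -1) {
      b(i) = (a(i)(k) - (i + 1 until k).map(j => a(i)(j) * b(j)).sum) / a(i)(i)
    }
    b
  }

  // Residual sum of squares of the least-squares fit of y on x
  def rss(x: Array[Array[Double]], y: Array[Double]): Double = {
    val b = ols(x, y)
    x.indices.map { r =>
      val e = y(r) - x(r).indices.map(j => x(r)(j) * b(j)).sum
      e * e
    }.sum
  }

  // F statistic for "x Granger-causes y" with one lag:
  //   restricted:   y_t = a + b * y_{t-1}
  //   unrestricted: y_t = a + b * y_{t-1} + c * x_{t-1}
  def grangerF(x: Array[Double], y: Array[Double]): Double = {
    val n = y.length - 1 // usable observations t = 1 .. length - 1
    val target = y.drop(1)
    val restricted = Array.tabulate(n)(t => Array(1.0, y(t)))
    val unrestricted = Array.tabulate(n)(t => Array(1.0, y(t), x(t)))
    val rssR = rss(restricted, target)
    val rssU = rss(unrestricted, target)
    // One restriction (c = 0); the unrestricted model has 3 parameters
    (rssR - rssU) / (rssU / (n - 3))
  }
}
```

A large F value, compared with the critical value of an F(1, n - 3) distribution, suggests that lagged x helps predict y.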

Example code:

import com.ibm.spss.ml.common.LocalContainerManager
import com.ibm.spss.ml.forecasting._

// Prepare one series per combination of dimension values and metric
val tsdp = TimeSeriesDataPreparation().
  setDimFieldList(Array("Dimension1", "Dimension2")).
  setMetricFieldList(Array("m1", "m2", "m3", "m4")).
  setDateTimeField("date").
  setEncodeSeriesID(true).
  setInputTimeInterval("MONTH").
  setOutTimeInterval("MONTH")
// changedDF is the raw input DataFrame
val tsdpOutput = tsdp.transform(changedDF)

// Share the TSDP containers with the downstream stages under the key "TSDP"
val lcm = LocalContainerManager()
lcm.exportContainers("TSDP", tsdp.containers)

// The empty strings are placeholders for the target and predictor-candidate series IDs
val estimator = TemporalCausal(lcm).
  setInputContainerKeys(List("TSDP")).
  setTargetPredictorList(List(Predictor(
    targetList = List(List("", "", "")),
    predictorCandidateList = Some(List(List("", "", "")))))).
  setMaxNumPredictor(MaxNumberOfPredictor(false, 4)).
  setMaxLag(MaxLag("SETTING", 5)).
  setTolerance(1e-6)

val tcmModel = estimator.fit(tsdpOutput)

// Score: 5-period forecasts with 95% confidence intervals, plus fitted values and residuals
val transformer = tcmModel.
  setDataEncoded(true).
  setCILevel(0.95).
  setOutTargetValues(false).
  setTargets(FieldSettings(fieldNameList = Some(FieldNameList(
    seriesIDList = Some(List(List("da1", "db1", "m1"))))))).
  setReestimate(false).
  setForecast(Forecast(outForecast = true, forecastSpan = 5, outCI = true)).
  setFit(Fit(outFit = true, outCI = true, outResidual = true))

val predictions = transformer.transform(tsdpOutput)

// Convert the scored output back to the flattened multidimensional format
val rtsdp = ReverseTimeSeriesDataPreparation(lcm).
  setInputContainerKeys(List("TSDP")).
  setDeriveFutureIndicatorField(true)

val rtsdpOutput = rtsdp.transform(predictions)
rtsdpOutput.show()

Temporal Causal Auto Regressive Model

Autoregressive (AR) models are built to compute out-of-sample forecasts for predictor series that are not target series. These predictor forecasts are then used to compute out-of-sample forecasts for the target series.
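The following sketch shows the two-step idea in plain Scala. First, it fits an AR(1) model to a predictor series and forecasts that series forward. Then it feeds those forecasts into a target equation. The sketch makes several assumptions, and none of them come from TemporalCausalAutoRegressiveModel:
- the AR order of 1
- least-squares estimation
- the target equation `y_t = b0 + b1 * y_{t-1} + b2 * x_{t-1}` with given coefficients
- the names `ArSketch`, `fitAr1`, `forecastAr1`, and `forecastTarget`

```scala
object ArSketch {
  // Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi).
  def fitAr1(x: Seq[Double]): (Double, Double) = {
    val prev = x.init
    val curr = x.tail
    val mp = prev.sum / prev.size
    val mc = curr.sum / curr.size
    val cov = prev.zip(curr).map { case (p, c) => (p - mp) * (c - mc) }.sum
    val vr = prev.map(p => (p - mp) * (p - mp)).sum
    val phi = cov / vr
    (mc - phi * mp, phi)
  }

  // Iterates the fitted AR(1) recursion to produce h out-of-sample forecasts.
  def forecastAr1(x: Seq[Double], h: Int): Vector[Double] = {
    val (c, phi) = fitAr1(x)
    Iterator.iterate(x.last)(v => c + phi * v).drop(1).take(h).toVector
  }

  // Hypothetical target model y_t = b0 + b1 * y_{t-1} + b2 * x_{t-1}.
  // Step 1 uses the last observed x; later steps use the AR forecasts of x.
  def forecastTarget(y: Seq[Double], x: Seq[Double], h: Int,
                     b0: Double, b1: Double, b2: Double): Vector[Double] = {
    val xPath = x.last +: forecastAr1(x, h - 1)
    xPath.scanLeft(y.last)((yPrev, xPrev) => b0 + b1 * yPrev + b2 * xPrev).tail
  }
}
```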

Model produced by TemporalCausal

TemporalCausal exports the following outputs:

JSON file, containing the TemporalCausal model information.
XML file, containing the multi-series model.

For details about the outputs, refer to the TemporalCausal Output Document.

Example code:

import com.ibm.spss.ml.common.LocalContainerManager
import com.ibm.spss.ml.forecasting.{TemporalCausal, TemporalCausalAutoRegressiveModel}

val lcm = LocalContainerManager()
// `tsdp` must be created with this manager, for example TimeSeriesDataPreparation(lcm),
// and applied to the raw data before fitting; `df` is assumed to be its encoded output.

// TCM for the target series (da1, db1, m2) with several candidate predictor series
val arEstimator = TemporalCausal(lcm).
  setInputContainerKeys(List(tsdp.uid)).
  setTargetPredictorList(List(Predictor(
    targetList = List(List("da1", "db1", "m2")),
    predictorCandidateList = Some(List(
      List("da1", "db1", "m1"),
      List("da1", "db2", "m1"),
      List("da1", "db2", "m2"),
      List("da1", "db3", "m1"),
      List("da1", "db3", "m2"),
      List("da1", "db3", "m3")))))).
  setMaxNumPredictor(MaxNumberOfPredictor(false, 5)).
  setMaxLag(MaxLag("SETTING", 5))

// Fit the TCM; the AR model below reads its containers through arEstimator.uid
arEstimator.fit(df)

// Build AR models for these predictor series and forecast them 5 periods ahead
val tcmAr = TemporalCausalAutoRegressiveModel(lcm).
  setInputContainerKeys(List(arEstimator.uid)).
  setDataEncoded(true).
  setOutTargetValues(true).
  setTargets(FieldSettingsAr(Some(FieldNameList(seriesIDList = Some(List(
    List("da1", "db1", "m1"),
    List("da1", "db2", "m2"),
    List("da1", "db3", "m3"))))))).
  setForecast(ForecastAr(forecastSpan = 5))
val scored = tcmAr.transform(df)
scored.show()

Temporal Causal Outlier Detection

One advantage of building TCM models is the ability to detect model-based outliers. Outlier detection identifies the time points in a target series whose values stray too far from the expected (fitted) values of the TCM model.
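TCM's exact outlier criterion is not given here. The following minimal sketch therefore assumes a simple k-sigma rule: a time point is flagged when its residual (actual minus fitted) lies more than k standard deviations from the mean residual. `OutlierSketch` and the threshold rule are hypothetical.

```scala
object OutlierSketch {
  // Returns the time indices whose residual deviates from the mean residual
  // by more than k residual standard deviations (an assumed rule).
  def outliers(actual: Seq[Double], fitted: Seq[Double], k: Double): Seq[Int] = {
    val res = actual.zip(fitted).map { case (a, f) => a - f }.toVector
    val mean = res.sum / res.size
    val sd = math.sqrt(res.map(r => (r - mean) * (r - mean)).sum / (res.size - 1))
    res.indices.filter(i => math.abs(res(i) - mean) > k * sd)
  }
}
```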

Temporal Causal Root Cause Analysis

Root cause analysis is the ability to explore the Granger causal graph to find the key (root) values that produced the outlier in question.
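As a rough illustration, the following sketch walks the causal graph upstream from the series with the outlier. It keeps the anomalous ancestors that have no anomalous parents of their own. The graph representation (a map from each series to the series that Granger-cause it), the precomputed `anomalous` set, and the definition of "root" are all assumptions. They are not the TCM algorithm.

```scala
import scala.collection.mutable

object RootCauseSketch {
  // parents(s) = series that Granger-cause s in a hypothetical causal graph.
  // Breadth-first search upstream from `target`; returns the anomalous
  // ancestors whose own parents are not anomalous (the assumed "roots").
  def rootCauses(target: String,
                 parents: Map[String, Set[String]],
                 anomalous: Set[String]): Set[String] = {
    val seen = mutable.Set(target)
    val queue = mutable.Queue(target)
    val candidates = mutable.Set.empty[String]
    while (queue.nonEmpty) {
      val node = queue.dequeue()
      for (p <- parents.getOrElse(node, Set.empty[String]) if !seen(p)) {
        seen += p
        queue.enqueue(p)
        if (anomalous(p)) candidates += p
      }
    }
    candidates.filter(c => !parents.getOrElse(c, Set.empty[String]).exists(anomalous)).toSet
  }
}
```

For example, suppose `sales` depends on `price` and `ads`, `price` depends on `cost`, and both `price` and `cost` are anomalous. In that case the sketch reports `cost` as the root.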

Temporal Causal Scenario Analysis

Scenario analysis is the ability of TCM models to "play out" the repercussions of artificially setting the values of a time series. A scenario is the set of forecasts obtained by replacing the values of a root time series with a vector of substitute values.
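The sketch below makes the substitution concrete. It runs the same fitted equation twice: once with the original root-series values and once with substitute values. The difference between the two forecast paths is the effect of the scenario. The equation `y_t = a + b * y_{t-1} + c * x_{t-1}` and the names `ScenarioSketch`, `LagModel`, and `play` are hypothetical.

```scala
object ScenarioSketch {
  // Hypothetical fitted equation: y_t = a + b * y_{t-1} + c * x_{t-1}.
  final case class LagModel(a: Double, b: Double, c: Double)

  // Forecasts y over the horizon of xPath, where xPath(i) is the root-series
  // value used at step i (either the original value or a substitute).
  def play(m: LagModel, yLast: Double, xPath: Seq[Double]): Vector[Double] =
    xPath.toVector.scanLeft(yLast)((yPrev, x) => m.a + m.b * yPrev + m.c * x).tail
}
```

With `LagModel(1.0, 0.5, 2.0)` and a last observed value of 2.0, the root path (1.0, 1.0) yields (4.0, 5.0). The substitute path (2.0, 1.0) yields (6.0, 6.0).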

Temporal Causal Summary

TCM Summary selects the top N models based on a single model quality measure. Five measures are available:
- Root Mean Squared Error (RMSE)
- Root Mean Squared Percentage Error (RMSPE)
- Bayesian Information Criterion (BIC)
- Akaike Information Criterion (AIC)
- R squared (RSQUARE)

Both N and the model quality measure can be set by the user.
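A minimal ranking sketch follows. The only fact it relies on is that lower is better for RMSE, RMSPE, BIC, and AIC, while higher is better for R squared. The `ModelFit` record and the measure strings are hypothetical stand-ins, not the TCM Summary API.

```scala
object SummarySketch {
  final case class ModelFit(target: String, rmse: Double, rmspe: Double,
                            bic: Double, aic: Double, rsquare: Double)

  // Returns the top n models for the chosen measure. R squared is ranked
  // high-to-low by negating it; the other measures are ranked low-to-high.
  def topN(models: Seq[ModelFit], measure: String, n: Int): Seq[ModelFit] = {
    val key: ModelFit => Double = measure match {
      case "RMSE"    => m => m.rmse
      case "RMSPE"   => m => m.rmspe
      case "BIC"     => m => m.bic
      case "AIC"     => m => m.aic
      case "RSQUARE" => m => -m.rsquare
      case other     => throw new IllegalArgumentException(s"unknown measure: $other")
    }
    models.sortWith((a, b) => key(a) < key(b)).take(n)
  }

  // RMSE of fitted values against actual values
  def rmse(actual: Seq[Double], fitted: Seq[Double]): Double =
    math.sqrt(actual.zip(fitted).map { case (a, f) => (a - f) * (a - f) }.sum / actual.size)
}
```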

Time Series Exploration

Time Series Exploration uses statistics and tests to explore the characteristics of time series data and to generate preliminary insights before modeling. For expert users, it provides analytic methods that include time series clustering, unit root tests, and correlations. For business users, it provides an automatic exploration process based on a simple time series decomposition method.
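Correlation analysis usually starts from the sample autocorrelation function (ACF), so a minimal sketch of the lag-k sample autocorrelation is shown here. The sketch does not describe how TimeSeriesExploration computes its correlations. `ExplorationSketch.acf` is a hypothetical name.

```scala
object ExplorationSketch {
  // Sample autocorrelation at lag k: the sum of lag-k cross-products of the
  // mean-centered series divided by its total sum of squares.
  def acf(x: Seq[Double], k: Int): Double = {
    val mean = x.sum / x.size
    val d = x.map(_ - mean).toVector
    val num = (0 until d.size - k).map(t => d(t) * d(t + k)).sum
    val den = d.map(v => v * v).sum
    num / den
  }
}
```

For an alternating series of ten values (1, -1, 1, ...), the lag-1 autocorrelation is -0.9, which signals strong period-to-period reversal.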

Example code:

import com.ibm.spss.ml.forecasting.TimeSeriesExploration
val data = sqlContext.read.format("com.databricks.spark.csv").load("hdfs://...")

val tse = TimeSeriesExploration().
    setAutoExploration(true).
    setClustering(true)
val tseModel = tse.fit(data)

val predictions = tseModel.transform(data)
predictions.show()

Reverse Data preparation for time series data

Reverse Data preparation for time series data (RTSDP) converts data in the compact row-based (CRB) format back to the flattened multidimensional (FMD) format. The CRB input can come from TimeSeriesDataPreparation (TSDP) or from TemporalCausalModel (TCM Score).
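The reverse conversion is essentially an unpivot. The sketch below assumes the following shapes:
- A CRB row holds one time point and one value per series, keyed by (dimension value, metric name).
- An FMD row holds a dimension value, a time point, and the metric values for that dimension value.

These layouts and the names `CrbRow`, `FmdRow`, and `toFmd` are assumptions for illustration only. The sketch does not model the future-indicator field.

```scala
object ReverseSketch {
  // Assumed CRB row: one time point, one value per (dimension value, metric) series.
  final case class CrbRow(time: String, values: Map[(String, String), Double])
  // Assumed FMD row: a dimension value, a time point, and its metric values.
  final case class FmdRow(dim: String, time: String, metrics: Map[String, Double])

  // Splits each CRB row into one FMD row per dimension value, sorted by dimension.
  def toFmd(rows: Seq[CrbRow]): Seq[FmdRow] =
    for {
      row <- rows
      (dim, cells) <- row.values.groupBy(_._1._1).toSeq.sortBy(_._1)
    } yield FmdRow(dim, row.time, cells.map { case ((_, metric), v) => metric -> v })
}
```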

Example code:

import com.ibm.spss.ml.common.LocalContainerManager
import com.ibm.spss.ml.forecasting.ReverseTimeSeriesDataPreparation
import com.ibm.spss.ml.forecasting.TimeSeriesDataPreparation

// Both stages share one container manager so that RTSDP can find the TSDP metadata
val manager = LocalContainerManager()

// Weekly series with linear interpolation of missing values and a per-metric group type
val tsdp = TimeSeriesDataPreparation(manager).
  setDimFieldList(Array("Dimension1", "Dimension2", "Dimension3")).
  setMetricFieldList(Array("Metric1", "Metric2", "Metric3", "Metric4", "Metric5", "Metric6", "Metric7", "Metric8", "Metric9", "Metric10")).
  setDateTimeField("TimeStamp").
  setEncodeSeriesID(false).
  setInputTimeInterval("WEEK").
  setOutTimeInterval("WEEK").
  setMissingImputeType("LINEAR_INTERP").
  setQualityScoreThreshold(0.0).
  setConstSeriesThreshold(0.0).
  setGroupType(GroupType(List(("Metric1", "MEAN"), ("Metric2", "SUM"), ("Metric3", "MODE"), ("Metric4", "MIN"), ("Metric5", "MAX"))))

// changedDF is the raw input DataFrame
val tsdpOut = tsdp.transform(changedDF)

val rtsdp = ReverseTimeSeriesDataPreparation(manager).
  setInputContainerKeys(List(tsdp.uid)).
  setDeriveFutureIndicatorField(true)

val rtsdpOut = rtsdp.transform(tsdpOut)

Autoregressive Integrated Moving Average

The Autoregressive Integrated Moving Average (ARIMA) model is a traditional time series model which was first popularized by Box and Jenkins (1976). The model is built for each target time series and can be used to forecast future values. When the model includes other time series as predictor series in addition to the ARIMA part for the target series, it becomes the transfer function (TF) model in which the target series is a function of its own past values, past errors (also called shocks or innovations), and current and past values of the predictor series.
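As a bare-bones illustration of the "integrated" and "autoregressive" parts, the following sketch fits an ARIMA(1,1,0) model without a constant. It differences the series once, fits an AR(1) coefficient to the differences by least squares, forecasts the differences, and then cumulates them back to the original scale. This is not TimeSeriesForecastingArima. The fixed orders, the least-squares estimator, and the name `ArimaSketch` are assumptions, and the sketch does not cover the transfer-function extension.

```scala
object ArimaSketch {
  // ARIMA(1,1,0) without constant: w_t = x_t - x_{t-1} and w_t = phi * w_{t-1} + e_t.
  // Requires at least 3 observations.
  def forecast(x: Seq[Double], h: Int): Vector[Double] = {
    val w = x.zip(x.tail).map { case (a, b) => b - a }
    // Least-squares estimate of phi through the origin
    val phi = w.zip(w.tail).map { case (p, c) => p * c }.sum / w.init.map(p => p * p).sum
    // Forecast future differences, then undo the differencing with a cumulative sum
    val wHat = Iterator.iterate(w.last)(phi * _).drop(1).take(h).toVector
    wHat.scanLeft(x.last)(_ + _).tail
  }
}
```

For the series (0, 1, 1.5, 1.75), the differences (1, 0.5, 0.25) give phi = 0.5. The next two forecasts are therefore 1.875 and 1.9375.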

Example code:

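import com.ibm.spss.ml.common.LocalContainerManager
import com.ibm.spss.ml.forecasting.{ReverseTimeSeriesDataPreparation, TimeSeriesDataPreparation}
import com.ibm.spss.ml.forecasting.traditional.TimeSeriesForecastingArima
import org.apache.spark.ml.Pipeline

val tsdp = TimeSeriesDataPreparation().
  setDimFieldList(Array("da", "db")).
  setMetricFieldList(Array("metric")).
  setDateTimeField("date").
  setEncodeSeriesID(false).
  setInputTimeInterval("MONTH").
  setOutTimeInterval("MONTH")

// Register the TSDP containers under the key "k" for the ARIMA stage
val lcm = LocalContainerManager()
lcm.exportContainers("k", tsdp.containers)

// Build an ARIMA model for one target series
val arima = TimeSeriesForecastingArima(lcm).
  setTargetPredictorList(List(Predictor(targetList = List(List("da1", "db1", "m1"))))).
  setBuildScoringModelOnly(false).
  setMaxNumLags(4).
  setInputContainerKeys(Seq("k"))

// Convert the scored output back to the flattened format
val rtsdp = ReverseTimeSeriesDataPreparation(tsdp.manager).
  setInputContainerKeys(List(tsdp.uid)).
  setDeriveFutureIndicatorField(true)

// Chain TSDP, ARIMA, and RTSDP; `data` is the raw input DataFrame
val pipeline = new Pipeline().setStages(Array(tsdp, arima, rtsdp))
val scored = pipeline.fit(data).transform(data)
scored.show()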

Croston

The Croston and Modified Croston methods are used to model and forecast intermittent time series. Many time series are intermittent, for example the demand series for spare parts. Demand for spare parts occurs at random, and many time periods have no demand at all. Croston's method represents the demand pattern more explicitly by estimating the demand size and the interval between demands separately. This strategy can greatly improve the accuracy of intermittent demand forecasts and stock control. Note that Croston's method forecasts the average demand rate, not a point estimate at a particular time point. An intermittency test is also provided to determine whether the input time series is intermittent before Croston's method is applied.
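The following sketch implements the classic Croston recursion. It exponentially smooths the nonzero demand sizes (z) and the intervals between demands (p) with the same smoothing constant, and the forecast demand rate is z / p. Initializing z and p with the first demand and its interval is an assumption, as is the name `CrostonSketch`. The sketch does not implement the modified variant or the intermittency test.

```scala
object CrostonSketch {
  // Classic Croston recursion with a shared smoothing constant alpha.
  // Returns None when the series contains no demand at all.
  def forecastRate(demand: Seq[Double], alpha: Double): Option[Double] = {
    var z = Double.NaN // smoothed demand size
    var p = Double.NaN // smoothed interval between demands
    var sinceLast = 0  // periods since the previous demand (or since the start)
    for (d <- demand) {
      sinceLast += 1
      if (d != 0.0) {
        if (z.isNaN) { z = d; p = sinceLast.toDouble } // assumed initialization
        else {
          z += alpha * (d - z)
          p += alpha * (sinceLast - p)
        }
        sinceLast = 0
      }
    }
    if (z.isNaN) None else Some(z / p)
  }
}
```

A series with a demand of 4 every second period yields a rate of 2.0 per period. This illustrates that the output is an average rate, not the value expected at any particular time point.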

Expert

The Expert modeler for time series forecasting is an automatic model identification tool. It fits candidate time series models, such as ARIMA and/or exponential smoothing models, to a specified target series. It then recommends the best model, or the top N models, based on a model quality measure.
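The selection loop can be sketched as "fit each candidate, score it, rank". In the sketch below the candidates are deliberately trivial forecasters (historical mean, last value, and simple exponential smoothing with alpha = 0.5). They stand in for the ARIMA and exponential smoothing models that the Expert modeler considers. The quality measure is the in-sample one-step-ahead RMSE. The candidate set, the measure, and the names are all hypothetical.

```scala
object ExpertSketch {
  // One-step-ahead forecaster: given the history x(0..t-1), predict x(t).
  type Forecaster = Seq[Double] => Double

  val candidates: Map[String, Forecaster] = Map[String, Forecaster](
    "mean"   -> ((h: Seq[Double]) => h.sum / h.size),
    "naive"  -> ((h: Seq[Double]) => h.last),
    "ses0.5" -> ((h: Seq[Double]) => h.tail.foldLeft(h.head)((l, v) => 0.5 * v + 0.5 * l))
  )

  // Root mean squared one-step-ahead error over t = 1 .. n - 1
  def oneStepRmse(x: Seq[Double], f: Forecaster): Double = {
    val errs = (1 until x.size).map(t => x(t) - f(x.take(t)))
    math.sqrt(errs.map(e => e * e).sum / errs.size)
  }

  // Ranks the candidates by one-step RMSE and returns the names of the best n
  def recommend(x: Seq[Double], n: Int): Seq[String] =
    candidates.toSeq
      .map { case (name, f) => (name, oneStepRmse(x, f)) }
      .sortWith((a, b) => a._2 < b._2)
      .take(n)
      .map(_._1)
}
```

On a steadily increasing series such as 1, 2, ..., 6, the last-value forecaster has the smallest one-step error, so the sketch recommends `naive`.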

ExponentialSmoothing

The exponential smoothing model is a time series model for a univariate time series. "Smoothing" means that the current observation is predicted by a weighted average of past values. "Exponential" means that the weights decrease exponentially with the age of the observation. Thirteen types of exponential smoothing models are provided to handle level, trend, or seasonality in the time series.
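The following sketch shows the simplest variant, which models level only, with no trend or seasonality. Unrolling the recursion shows where the exponential weights come from: the observation j steps back receives weight alpha * (1 - alpha)^j. The starting value keeps the remaining weight. Initializing the level at the first observation is an assumption, and `SesSketch` is a hypothetical name.

```scala
object SesSketch {
  // Simple exponential smoothing: level_t = alpha * x_t + (1 - alpha) * level_{t-1},
  // starting from the first observation. The forecast for every future
  // step is the final level.
  def forecast(x: Seq[Double], alpha: Double): Double =
    x.tail.foldLeft(x.head)((level, v) => alpha * v + (1 - alpha) * level)
}
```

With alpha = 0.5, the series (1, 2, 3) gives levels 1, 1.5, and 2.25, so the forecast is 2.25.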

Garch

Many time series show randomly varying variance, often called volatility, which is frequently of great importance. GARCH (Generalized Auto Regressive Conditional Heteroscedasticity) models are widely used to characterize and model time series with time-varying volatility. GARCH models assume that the variance of the current error term is a linear function of the squares of previous error terms and of previous variances.
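The GARCH(1,1) case makes the stated assumption concrete. In this model, sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}. The sketch only evaluates this recursion for given parameter values; estimating omega, alpha, and beta is out of scope. The object name and the explicit starting variance are hypothetical.

```scala
object GarchSketch {
  // GARCH(1,1) conditional variance recursion:
  //   sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}
  // Returns sigma2_0 .. sigma2_n; the last value is the one-step-ahead forecast.
  def variances(eps: Seq[Double], omega: Double, alpha: Double, beta: Double,
                sigma2Init: Double): Vector[Double] =
    eps.toVector.scanLeft(sigma2Init)((s2, e) => omega + alpha * e * e + beta * s2)
}
```

Suppose omega = 1, alpha = 0.5, beta = 0.25, and the starting variance is 4. A shock of 2 gives the next variance as 1 + 0.5 * 4 + 0.25 * 4 = 4. After two zero shocks, the variance decays to 2 and then to 1.5.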

ModelReEstimate

When new time series data arrives, ModelReEstimate re-estimates the parameters of an existing time series model, originally built on the old data, by using both the old and the new data.
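The following minimal sketch shows re-estimation: the model form stays fixed and only its parameters are refitted on the combined data. It uses an AR(1) model through the origin with a least-squares coefficient. That model choice and the names `ReEstimateSketch`, `fitPhi`, and `reEstimate` are assumptions for illustration.

```scala
object ReEstimateSketch {
  // Least-squares AR(1) coefficient through the origin: x_t = phi * x_{t-1} + e_t
  def fitPhi(x: Seq[Double]): Double =
    x.zip(x.tail).map { case (p, c) => p * c }.sum / x.init.map(p => p * p).sum

  // Keeps the model form and re-estimates phi on the old and new data together
  def reEstimate(oldData: Seq[Double], newData: Seq[Double]): Double =
    fitPhi(oldData ++ newData)
}
```

The old data (1, 2) gives phi = 2. After the new observation 2 arrives, the combined series (1, 2, 2) gives phi = (2 + 4) / (1 + 4) = 1.2.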

Example code:

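import com.ibm.spss.ml.common.LocalContainerManager
import com.ibm.spss.ml.forecasting.{ReverseTimeSeriesDataPreparation, TimeSeriesDataPreparation}
import com.ibm.spss.ml.forecasting.traditional.TimeSeriesForecastingModelReEstimate
import org.apache.spark.ml.Pipeline

val tsdp = TimeSeriesDataPreparation().
  setDimFieldList(Array("da", "db")).
  setMetricFieldList(Array("metric")).
  setDateTimeField("date").
  setEncodeSeriesID(false).
  setInputTimeInterval("MONTH").
  setOutTimeInterval("MONTH")

// Register the TSDP containers under the key "k" for the re-estimation stage
val lcm = LocalContainerManager()
lcm.exportContainers("k", tsdp.containers)

// Re-estimate the model, forecast 4 periods ahead, and output fitted values,
// confidence intervals, and residuals
val reestimate = TimeSeriesForecastingModelReEstimate(lcm).
  setForecast(ForecastEs(outForecast = true, forecastSpan = 4, outCI = true)).
  setFitSettings(Fit(outFit = true, outCI = true, outResidual = true)).
  setOutInputData(true).
  setInputContainerKeys(Seq("k"))

// Convert the scored output back to the flattened format
val rtsdp = ReverseTimeSeriesDataPreparation(tsdp.manager).
  setInputContainerKeys(List(tsdp.uid)).
  setDeriveFutureIndicatorField(true)

// Chain TSDP, re-estimation, and RTSDP; `data` is the raw input DataFrame
val pipeline = new Pipeline().setStages(Array(tsdp, reestimate, rtsdp))
val scored = pipeline.fit(data).transform(data)
scored.show()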