Predict retail sales promotions
Last updated: Dec 11, 2024

This tutorial builds two models to predict the effects of future sales promotions, and then compares the models.

Similar to the Condition monitoring tutorial, the data mining process consists of the exploration, data preparation, training, and test phases.

Try the tutorial

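In this tutorial, you will complete these tasks:

  • Task 1: Open the sample project.
  • Task 2: Examine the Data Asset, Derive, and Type nodes.
  • Task 3: Generate and compare the models.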

Sample modeler flow and data set

This tutorial uses the Retail Sales Promotion flow in the sample project. The flow reads the goods1n.csv data file; in Task 3, you switch to goods2n.csv to test the models on different data. The following image shows the sample modeler flow.

Figure 1. Sample modeler flow

The following image shows the sample data set.
Figure 2. Sample data set

Task 1: Open the sample project

The sample project contains several data sets and sample modeler flows. If you don't already have the sample project, then refer to the Tutorials topic to create the sample project. Then follow these steps to open the sample project:

  1. In Cloud Pak for Data, from the Navigation menu, choose Projects > View all Projects.
  2. Click SPSS Modeler Project.
  3. Click the Assets tab to see the data sets and modeler flows.

Check your progress

The following image shows the project Assets tab. You are now ready to work with the sample modeler flow associated with this tutorial.

Task 2: Examine the Data Asset, Derive, and Type nodes

Retail Sales Promotion includes several nodes. Follow these steps to examine the Data Asset, Derive, and Type nodes:

Data Asset node

  1. From the Assets tab, open the Retail Sales Promotion modeler flow, and wait for the canvas to load.
  2. Double-click the goods1n.csv node. This node is a Data Asset node that points to the goods1n.csv file in the project.
  3. Review the File format properties.
  4. Click Preview data to see the full data set.
  5. Notice that each record contains:
    • Class. Product type.
    • Cost. Unit price.
    • Promotion. Index of amount spent on a particular promotion.
    • Before. Revenue before promotion.
    • After. Revenue after promotion.

    The two revenue fields (Before and After) are expressed in absolute terms. However, the increase in revenue after the promotion (and presumably as a result of it) is likely to be a more useful figure.

  6. Close the data preview and the properties side pane.
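
The record layout described in step 5 can be sketched as a small Python data class. This is a minimal sketch that assumes the data is comma-separated with a header row named Class, Cost, Promotion, Before, After; the names GoodsRecord and parse_goods and the sample row are hypothetical and are not part of SPSS Modeler.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class GoodsRecord:
    """One record of the promotion data (hypothetical Python mirror)."""
    product_class: str  # Class: product type
    cost: float         # Cost: unit price
    promotion: float    # Promotion: index of amount spent on the promotion
    before: float       # Before: revenue before the promotion
    after: float        # After: revenue after the promotion


def parse_goods(text: str) -> List[GoodsRecord]:
    """Parse CSV text whose header row is Class,Cost,Promotion,Before,After."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        GoodsRecord(
            product_class=row["Class"],
            cost=float(row["Cost"]),
            promotion=float(row["Promotion"]),
            before=float(row["Before"]),
            after=float(row["After"]),
        )
        for row in reader
    ]


# Made-up sample row for illustration; it is not taken from goods1n.csv.
sample = "Class,Cost,Promotion,Before,After\nSnacks,2.5,100,1000,1200\n"
records = parse_goods(sample)
```

Using csv.DictReader keys each value by its header name, so the sketch does not depend on the column order in the file.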

Derive node

  1. Double-click the Increase (Derive) node. This node derives the value of the increase in revenue.
  2. Review the settings, in particular the Expression field, which contains a formula to derive the increase as a percentage of the revenue before the promotion: (After - Before) / Before * 100.0.
  3. Click Preview data to see the data set with the derived values.
  4. Notice the Increase column.

    For each class of product, an almost linear relationship exists between the increase in revenue and the cost of the promotion. Therefore, it seems likely that a decision tree or neural network could predict, with reasonable accuracy, the increase in revenue from the other available fields.

  5. Close the data preview and the properties side pane.
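
The Derive node expression and the per-class trend can also be checked outside the flow. The following minimal sketch assumes records are available as plain (Class, Promotion, Before, After) tuples; it recomputes Increase with the same formula and fits an ordinary least-squares line of Increase against Promotion for each product class, which is one simple way to quantify that trend. The function names are hypothetical, and this is not how SPSS Modeler evaluates the expression internally.

```python
from collections import defaultdict


def increase(before: float, after: float) -> float:
    """Same formula as the Increase (Derive) node: percent change from Before."""
    return (after - before) / before * 100.0


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    return slope, mean_y - slope * mean_x


def fit_per_class(rows):
    """Fit Increase against Promotion separately for each product class.

    rows: iterable of (product_class, promotion, before, after) tuples.
    Returns {product_class: (slope, intercept)}.
    """
    groups = defaultdict(lambda: ([], []))
    for product_class, promotion, before, after in rows:
        xs, ys = groups[product_class]
        xs.append(promotion)
        ys.append(increase(before, after))
    return {cls: fit_line(xs, ys) for cls, (xs, ys) in groups.items()}
```

Fitting each class separately matches the observation that the relationship is close to linear within a class; a single line across all classes could hide that structure.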

Type node

  1. Double-click the Define Types (Type) node. This node specifies field properties, such as measurement level (the type of data that the field contains), and the role of each field as a target or input in modeling. Common measurement levels include:
    • A Continuous field contains continuous numeric values (for example, prices or revenue).
    • A Nominal field has two or more distinct values with no inherent order (for example, product types).
    • An Ordinal field describes data with multiple distinct values that have an inherent order (for example, Low, Medium, and High).

      For each field, the Type node also specifies a role to indicate the part that each field plays in modeling. The role is set to Target for the field Increase, which is the field that was derived. The target is the field for which you want to predict the value.

      Role is set to Input for most other fields. Input fields, sometimes known as predictors, are fields whose values the modeling algorithm uses to predict the value of the target field.

      The role for the After field is set to None, so this field is not used by the modeling algorithm.

  2. Optional: Click Preview data to see the data set with the derived values.
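
The role settings can be pictured as a mapping from field name to role. The minimal sketch below assumes that every field other than Increase (Target) and After (None) is an Input, which matches "most other fields" above but should be confirmed in the Type node itself; Role, ROLES, and split_fields are hypothetical names.

```python
from enum import Enum


class Role(Enum):
    INPUT = "Input"
    TARGET = "Target"
    NONE = "None"  # field is ignored by the modeling algorithm


# Hypothetical mirror of the roles in the Define Types (Type) node.
# Assumption: every field except Increase and After is an Input.
ROLES = {
    "Class": Role.INPUT,
    "Cost": Role.INPUT,
    "Promotion": Role.INPUT,
    "Before": Role.INPUT,
    "After": Role.NONE,
    "Increase": Role.TARGET,
}


def split_fields(roles):
    """Return (input_fields, target_field); fields with role None are dropped."""
    inputs = [name for name, role in roles.items() if role is Role.INPUT]
    targets = [name for name, role in roles.items() if role is Role.TARGET]
    if len(targets) != 1:
        raise ValueError("expected exactly one Target field")
    return inputs, targets[0]


inputs, target = split_fields(ROLES)
```

One plausible reason for excluding After is that Increase is computed directly from After and Before, so using both as inputs would hand the model the answer instead of letting it learn from the promotion and product fields.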

Check your progress

The following image shows the Type node. You are now ready to generate and compare the models.

Type node

Task 3: Generate and compare the models

The flow trains a neural network and a decision tree to predict the increase in revenue. Follow these steps to generate the two models:

Generate the models

  1. Double-click the Increase (Neural net) node to review its properties.
    1. Expand the Basics section to see that Multilayer Perceptron is the model type. This property determines how the network connects the predictors to the targets through the hidden layers. Multilayer Perceptron allows for more complex relationships at the possible cost of longer training and scoring times.
    2. Expand the Model Options section to see the evaluation and scoring properties.
  2. Double-click the Increase (C&R Tree) node to see its properties.
  3. Click Run all, and wait for the model nuggets to generate.
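
For intuition about what the two model nodes compute, the following sketch shows, in plain Python, the forward pass of a multilayer perceptron with one hidden layer and a single least-squares split, which is the basic step a regression tree repeats. It is illustrative only: the tanh hidden activation, the linear output unit, and the function names are assumptions, and SPSS Modeler's own training (weight estimation, stopping rules, pruning) is not shown.

```python
import math


def mlp_predict(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer perceptron with a numeric output.

    hidden_weights[j][i] is the weight from input i to hidden unit j.
    Assumption: tanh hidden units and a linear (identity) output unit.
    """
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


def _sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """One least-squares split on a single numeric input.

    Returns (total_sse, threshold) for the threshold that minimizes the
    squared error of predicting each side by its mean, or None if no split exists.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal input values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = _sse(left) + _sse(right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or total < best[0]:
            best = (total, threshold)
    return best
```

The hidden layer is what lets the network combine inputs nonlinearly, while the tree instead partitions the input space into regions and predicts each region's mean; comparing both on the same target is the point of this task.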

Compare the models
  1. Connect the Increase (C&R Tree) model nugget to the Increase (Neural net) model nugget.
  2. Add an Analysis node:
    1. From the palette, expand the Outputs section.
    2. Drag the Analysis node onto the canvas.
    3. Connect the Increase (Neural net) model nugget to the Analysis node.
  3. Change the data set to use different data for the analysis:
    1. Double-click the goods1n.csv node to view its properties.
    2. Click Change data set.
    3. Navigate to Data asset > GOODS2n.csv.
    4. Click Select.
    5. Click Save.
  4. Hover over the Analysis node, and click the Run icon.
  5. In the Outputs and models pane, click the output with the name Analysis to view the results.

    From the Analysis output, in particular from the linear correlation between the predicted increase and the actual increase, you can see that the trained models predict the increase in revenue with a high degree of accuracy.

    Further exploration might focus on the cases where the trained models make relatively large errors. To identify these cases, you might plot the predicted increase in revenue against the actual increase, and then select outliers on the graph by using the interactive graphics in SPSS Modeler. From the properties of those outliers, it might be possible to tune the data description or the learning process to improve accuracy.
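
The two quantities the steps above rely on, the linear correlation between predicted and actual increase and the records with the largest errors, can be computed as follows. This minimal sketch assumes predictions and actual values are plain Python lists of equal length; mean absolute error is included as one simple error summary, the sketch does not reproduce the Analysis node's full output, and evaluate and largest_errors are hypothetical names.

```python
import math


def evaluate(predicted, actual):
    """Mean absolute error and Pearson (linear) correlation for a numeric target."""
    n = len(actual)
    errors = [a - p for p, a in zip(predicted, actual)]
    mae = sum(abs(e) for e in errors) / n
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for constant values")
    r = cov / math.sqrt(var_p * var_a)
    return {"mean_abs_error": mae, "linear_correlation": r}


def largest_errors(predicted, actual, k=3):
    """Indexes of the k records with the largest absolute error (candidate outliers)."""
    order = sorted(
        range(len(actual)),
        key=lambda i: abs(actual[i] - predicted[i]),
        reverse=True,
    )
    return order[:k]
```

Note that a correlation near 1 says the predictions track the actual values linearly, not that they are unbiased; checking the error summary and the largest individual errors alongside it gives a fuller picture.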

Check your progress

The following image shows the output from the Analysis node.

Analysis node results

Summary

This example showed you how to predict the effects of future sales promotions. Similar to the condition monitoring example, the data mining process consists of the exploration, data preparation, training, and test phases.

Next steps

You are now ready to try other SPSS® Modeler tutorials.