This tutorial uses the Auto Numeric node to automatically create and compare
different models for continuous (numeric range) outcomes, such as predicting the taxable value of a
property. With a single node, you can estimate and compare a set of candidate models and generate a
subset of models for further analysis. The node works in the same manner as the Auto
Classifier node, but for continuous rather than flag or nominal targets.
The node combines the best of the candidate models into a single aggregated (Ensembled) model
nugget. This approach combines the ease of automation with the benefits of combining multiple
models, which often yield more accurate predictions than can be gained from any one model.
This example focuses on a fictional municipality responsible for adjusting and assessing real
estate taxes. To accomplish this goal more accurately, you build a model that predicts property
values based on building type, neighborhood, size, and other known factors.
This tutorial uses the Automated Modeling for a Continuous Target flow in the sample
project. The data file used is property_values_train.csv. The following image shows the
sample modeler flow.
Figure 1. Sample modeler flow
The data file includes a field that is named taxable_value, which is the
target field, or value, that you want to predict. The other fields contain information
such as neighborhood, building type, and interior volume, and might be used as predictors.
Field name        Label
property_id       Property ID
neighborhood      Area within the city
building_type     Type of building
year_built        Year built
volume_interior   Volume of interior
volume_other      Volume of garage and extra buildings
lot_size          Lot size
taxable_value     Taxable value
The following image shows the sample data set.
Figure 2. Sample data set
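Outside the flow, the split between the target field and the candidate predictors can be sketched in plain Python. This is only an illustration under stated assumptions: the two data rows are invented, the helper name split_target is hypothetical, and excluding property_id from the predictors is an assumption rather than something the tutorial prescribes. Only the column names come from the field list above.

```python
import csv
import io

# Invented rows for illustration; only the column names come from the tutorial.
# To read the real file instead, open property_values_train.csv with
# open(path, newline="") and pass the file object to csv.DictReader.
SAMPLE_CSV = """property_id,neighborhood,building_type,year_built,volume_interior,volume_other,lot_size,taxable_value
1,A,house,1975,300,50,450,180000
2,B,apartment,2001,220,0,0,150000
"""

TARGET = "taxable_value"
ID_FIELD = "property_id"  # assumption: an identifier is not a useful predictor


def split_target(rows, target=TARGET, id_field=ID_FIELD):
    """Return (predictor_rows, target_values) for a list of dict rows."""
    predictors = [
        {key: value for key, value in row.items() if key not in (target, id_field)}
        for row in rows
    ]
    targets = [float(row[target]) for row in rows]
    return predictors, targets


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
predictors, targets = split_target(rows)
print(sorted(predictors[0]))  # remaining candidate predictor names
print(targets)                # numeric target values
```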
Task 1: Open the sample project
The sample project contains several data sets and sample modeler flows. If you don't already have
the sample project, then refer to the Tutorials topic to create the sample project. Then follow these steps to open the sample
project:
In Cloud Pak for Data, from the Navigation menu, choose
Projects > View all Projects.
Click SPSS Modeler Project.
Click the Assets tab to see the data sets and modeler flows.
Check your progress
The following image shows the project Assets tab. You are now ready to work with the sample
modeler flow associated with this tutorial.
This example uses an Auto Numeric Modeling node, which estimates and compares several
candidate models for a continuous (numeric range) target. Follow these steps to configure the
Modeling node:
Double-click the taxable_value node to see its properties.
Expand the Basics section, and set the following properties:
For the Rank models by field, select Correlation.
For the Number of models to use field, type 3. This means that the three
best models are kept when you run the node.
Figure 4. Auto Numeric node Basics section
Expand the Expert section. Six algorithms are selected, so the node estimates a single
model for each algorithm, for a total of six models. (Alternatively, you can modify these
settings to compare multiple variants for each model type.) Because you set the
Number of models to use property to 3 in the Basics section, the node
calculates the accuracy of the six models and builds a single model nugget that contains the
three most accurate.
Figure 5. Auto Numeric node Expert section
Expand the Ensemble section to view the default settings. Because you use a continuous
target in this example, the ensemble score is generated by averaging the scores for the individual
models.
Figure 6. Auto Numeric node Ensemble section
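The averaging described for the Ensemble section can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the function name ensemble_average, the model names, and the prediction values are all hypothetical.

```python
def ensemble_average(predictions_by_model):
    """Average per-record predictions across models.

    predictions_by_model maps a model name to a list of predicted values,
    one per record, in the same record order for every model.
    """
    model_predictions = list(predictions_by_model.values())
    if not model_predictions:
        raise ValueError("need at least one model")
    n_records = len(model_predictions[0])
    if any(len(preds) != n_records for preds in model_predictions):
        raise ValueError("all models must score the same records")
    # zip(*...) groups the predictions for each record across models.
    return [sum(values) / len(model_predictions) for values in zip(*model_predictions)]


# Hypothetical predictions from three models for two properties.
scores = {
    "model_a": [100.0, 200.0],
    "model_b": [110.0, 190.0],
    "model_c": [120.0, 210.0],
}
ensemble = ensemble_average(scores)
print(ensemble)  # [110.0, 200.0]
```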
Check your progress
The following image shows the Modeling node. You are now ready to compare the models.
Now that you specified the three models to build, follow these steps to generate and compare the
models:
Hover over the taxable_value node, and click the Run icon.
In the Outputs and models pane, click the results with the name taxable_value to
view the results.
You'll see details about each of the models that are created during the run. (In
a real situation, in which hundreds of models are estimated on a large dataset, running the flow
might take many hours.) The table contains a set of models that are generated by the Modeling
node.
To explore any of the individual models further, click a model name in the Estimator
column to see the individual model results.
View the Model Information page. This table shows the type of model that is fitted, the
target field, the number of input features, and, for a neural network model, the activation
functions and the size of the resulting network.
View any other pages for the model.
Close the model details.
By default, models are sorted by accuracy (correlation) because you
selected correlation as the measure in the Auto Numeric node's properties. For purposes of
ranking, the absolute value of the accuracy is used, with values closer to 1 indicating a stronger
relationship.
You can sort on a different column by clicking the header for that column.
Based on these results, you decide to use all three of these most accurate
models. By combining predictions from multiple models, limitations in individual models might be
avoided, resulting in a higher overall accuracy.
Verify that all three models are selected in the Use column.
Close the View Model: taxable_value window.
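The ranking rule described above (sort by the absolute value of the correlation, then keep the top models) can be sketched in plain Python. This is an illustration under assumptions: the function names pearson and rank_models, the model names, and all values are hypothetical, and the correlation here is computed between actual and predicted values, which may differ from how the node computes it internally.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series has no defined correlation; treat as 0
    return cov / (spread_x * spread_y)


def rank_models(actual, predictions_by_model, keep=3):
    """Rank models by |correlation| with the actual values and keep the best."""
    scored = [
        (name, pearson(actual, preds))
        for name, preds in predictions_by_model.items()
    ]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:keep]


# Hypothetical actual values and predictions from four candidate models.
actual = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "model_a": [2.0, 4.0, 6.0, 8.0],  # strong positive relationship
    "model_b": [1.0, 3.0, 2.0, 4.0],  # weaker positive relationship
    "model_c": [3.0, 4.0, 1.0, 2.0],  # negative relationship, ranked by magnitude
    "model_d": [5.0, 5.0, 5.0, 5.0],  # constant predictions
}
top_three = rank_models(actual, candidates, keep=3)
for name, corr in top_three:
    print(f"{name}: {corr:.3f}")
```

In this sketch, model_c is kept even though its correlation is negative, because only the magnitude is used for ranking.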
Check your progress
The following image shows the model comparison table. You are now ready to run the model
analysis.
Now that you viewed a comparison of the three models, you can follow these steps to run an
analysis of the models:
Hover over the Analysis node, and click the Run icon.
In the Outputs and models pane, click the output results with the name Analysis to
view the results.
The averaged score that is generated by the ensembled model is
added in a field that is named $XR-taxable_value, with a correlation of 0.934,
which is higher than the correlations of the three individual models. The ensemble scores also
show a low mean absolute error and might perform better than any of the individual models when
applied to other datasets.
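The mean absolute error mentioned above is the average size of the prediction errors. The following sketch compares two models with their averaged ensemble. The function name mean_absolute_error and all values are hypothetical, chosen so that the individual errors partly cancel; averaging does not guarantee a lower error on real data.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all records."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values: two models whose errors point in opposite directions.
actual = [100.0, 200.0, 300.0]
model_a = [110.0, 190.0, 310.0]
model_b = [90.0, 220.0, 290.0]

# Ensemble score for a continuous target: the per-record average.
ensemble = [(a + b) / 2 for a, b in zip(model_a, model_b)]

mae_a = mean_absolute_error(actual, model_a)
mae_b = mean_absolute_error(actual, model_b)
mae_ensemble = mean_absolute_error(actual, ensemble)
print(mae_a, mae_b, mae_ensemble)
```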
Check your progress
The following image shows the model comparison from the Analysis node.
With this example Automated Modeling for a Continuous Target flow, you used the
Auto Numeric node to compare several different models, selected the three most accurate
models, and added them to the flow within an ensembled Auto Numeric model nugget.
The ensembled model showed performance that was better than two of the individual models and
might perform better when applied to other datasets. If your goal is to automate the process as much
as possible, this approach assists with obtaining a robust model under most circumstances without
having to dig deeply into the specifics of any one model.