Linear regression is a common statistical technique for
predicting a numeric value for each record based on the values of numeric input fields. Linear regression fits a straight
line or surface that minimizes the discrepancies between predicted and actual output
values.
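As an illustration of what "minimizing the discrepancies" means, the following is a minimal sketch of an ordinary least-squares fit for a single input field, written in plain Python. It is not the Linear node's implementation. The function names `fit_line` and `predict` and the sample data are assumptions made for this example only.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of the input from its mean.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    # Sum of cross-products of input and target deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def predict(intercept, slope, x):
    """Apply the fitted formula: prediction = intercept + slope * x."""
    return intercept + slope * x

# Hypothetical sample data that lies exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(xs, ys)
```

With several input fields the same idea applies, but the fitted object is a plane or higher-dimensional surface rather than a line.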
Requirements. Only numeric fields and categorical
predictors can be used in a linear regression model. You must have exactly one target field (with
the role set to Target) and one or more predictors (with the role set to
Input). Fields with a role of Both or
None are ignored, as are non-numeric fields. (If necessary, non-numeric
fields can be recoded using a Derive node.)
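One common way to recode a non-numeric field is to replace it with 0/1 indicator fields. The sketch below shows the idea in plain Python. It does not use or reproduce the Derive node, and the function name `one_hot`, the `is_` field prefix, and the sample values are assumptions made for this example.

```python
def one_hot(values):
    """Recode a list of category labels into 0/1 indicator columns.

    The first category in sorted order is treated as the reference level
    and gets no column, which avoids perfectly collinear inputs.
    """
    levels = sorted(set(values))
    return {
        f"is_{level}": [1 if v == level else 0 for v in values]
        for level in levels[1:]
    }

# Hypothetical non-numeric field with three categories.
region = ["north", "south", "west", "south"]
encoded = one_hot(region)
```

Each resulting indicator field is numeric and can be given the role Input.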
Strengths. Linear regression models are relatively simple and give
an easily interpreted mathematical formula for generating predictions. Because linear regression is
a long-established statistical procedure, the properties of these models are well understood. Linear
models are also typically very fast to train. The Linear node provides methods for automatic field
selection in order to eliminate non-significant input fields from the equation.
Note: In cases where the target field is categorical rather than a continuous range, such as
yes/no or churn/don't churn, logistic regression can
be used as an alternative. Logistic regression also provides support for non-numeric inputs,
removing the need to recode these fields.
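To show how logistic regression differs in its output, the sketch below passes a linear score through the logistic function to obtain a probability and then applies a cutoff to choose a category. The coefficients, the 0.5 cutoff, and the names `sigmoid` and `classify` are assumptions made for this example. They do not come from a fitted model or from any specific node.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def classify(x, intercept, slope, cutoff=0.5):
    """Return (label, probability) for a single numeric input x."""
    p = sigmoid(intercept + slope * x)
    return ("yes" if p >= cutoff else "no"), p

# Hypothetical coefficients, chosen only for illustration.
label, prob = classify(3.0, intercept=-2.0, slope=1.0)
```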