Last updated: Jan 12, 2024
You can use the following SPSS predictive analytics survival analysis algorithms in notebooks: non-parametric distribution fitting, parametric distribution fitting, and parametric regression modeling.
Non-Parametric Distribution Fitting
Survival analysis analyzes data where the outcome variable is the time until the occurrence of an event of interest. The distribution of the event times is typically described by a survival function.
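For reference, if T is the (continuous) time until the event and F(t) is its cumulative distribution function, the survival function gives the probability that the event has not yet occurred by time t. It relates to the density f(t) and the hazard h(t) as follows:

```latex
S(t) = \Pr(T > t) = 1 - F(t), \qquad f(t) = -\frac{dS(t)}{dt}, \qquad h(t) = \frac{f(t)}{S(t)}
```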
Non-parametric Distribution Fitting (NPDF) provides an estimate of the survival function without making any assumptions concerning the distribution of the data. NPDF includes Kaplan-Meier estimation, life tables, and specialized extension algorithms to support left-censored, interval-censored, and recurrent event data.
Python example code:
from spss.ml.survivalanalysis import NonParametricDistributionFitting
from spss.ml.survivalanalysis.params import DefinedStatus, Points, StatusItem

# `data` is assumed to be an input Spark DataFrame that contains the
# time, status, treatment, and gender fields used below.
npdf = NonParametricDistributionFitting(). \
    setAlgorithm("KM"). \
    setBeginField("time"). \
    setStatusField("status"). \
    setStrataFields(["treatment"]). \
    setGroupFields(["gender"]). \
    setUndefinedStatus("INTERVALCENSORED"). \
    setDefinedStatus(
        DefinedStatus(
            # status 1 marks a failure (event); status 0 marks a right-censored case
            failure=StatusItem(points=Points("1")),
            rightCensored=StatusItem(points=Points("0")))). \
    setOutMeanSurvivalTime(True)

npdfModel = npdf.fit(data)
predictions = npdfModel.transform(data)
predictions.show()
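The setAlgorithm("KM") call in the example selects Kaplan-Meier estimation. To show what that estimator computes, the following is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimate for right-censored data, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i is the number of cases still at risk just before t_i. It is not the SPSS implementation: it ignores left and interval censoring, strata, groups, and recurrent events, and the function name kaplan_meier is hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (sketch, right censoring only).

    times  -- observed time for each case (event time or censoring time)
    events -- 1 if the event occurred at that time, 0 if the case was right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)  # d_i per time
    exits = Counter(times)       # cases leaving the risk set at each time
    at_risk = len(times)         # n_i: everyone starts at risk
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Cases censored at t are still counted as at risk at t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve


# Hypothetical data: six cases, two of them right-censored (event = 0).
curve = kaplan_meier([3, 5, 5, 8, 10, 12], [1, 1, 0, 1, 0, 1])
print(curve)
```

Censored cases never create a step in the curve; they only shrink the risk set for later event times, which is what distinguishes the estimate from a simple empirical distribution.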
Parametric Distribution Fitting
Parametric Distribution Fitting (PDF) provides an estimate of the survival function by comparing the functions for several known distributions (exponential, Weibull, log-normal, and log-logistic) to determine which, if any, describes the data best. In addition, the distributions for two or more groups of cases can be compared.
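To make the comparison concrete, the sketch below writes out the survival function S(t) = P(T > t) of each candidate family in one common parameterization. The parameter names (rate, shape, scale, mu, sigma) are assumptions for illustration; the parameterization SPSS uses internally may differ, and the sketch performs no fitting or selection.

```python
import math


# Survival functions S(t) = P(T > t), for t > 0, in one common parameterization
# (an assumption for this sketch, not necessarily the one SPSS uses).

def s_exponential(t, rate):
    return math.exp(-rate * t)


def s_weibull(t, shape, scale):
    # shape = 1 reduces to the exponential with rate = 1 / scale
    return math.exp(-((t / scale) ** shape))


def s_lognormal(t, mu, sigma):
    # ln T ~ Normal(mu, sigma): S(t) = 1 - Phi((ln t - mu) / sigma),
    # written with the complementary error function.
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def s_loglogistic(t, shape, scale):
    # scale is the median: S(scale) = 0.5
    return 1.0 / (1.0 + (t / scale) ** shape)


for t in (1.0, 5.0, 10.0):
    print(t, s_exponential(t, 0.1), s_weibull(t, 1.5, 8.0),
          s_lognormal(t, 2.0, 0.7), s_loglogistic(t, 2.0, 7.0))
```

Overlaying such curves, evaluated at fitted parameters, on a non-parametric estimate is one informal way to judge which family describes the data best.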
Python example code:
from spss.ml.survivalanalysis import ParametricDistributionFitting
from spss.ml.survivalanalysis.params import DefinedStatus, Points, StatusItem

# `data` is assumed to be an input Spark DataFrame with the fields used below.
pdf = ParametricDistributionFitting(). \
    setBeginField("begintime"). \
    setEndField("endtime"). \
    setStatusField("status"). \
    setFreqField("frequency"). \
    setDefinedStatus(
        DefinedStatus(
            # status F = failure, R = right-censored, L = left-censored
            failure=StatusItem(points=Points("F")),
            rightCensored=StatusItem(points=Points("R")),
            leftCensored=StatusItem(points=Points("L")))
    ). \
    setMedianRankEstimation("RRY"). \
    setMedianRankObtainMethod("BetaFDistribution"). \
    setStatusConflictTreatment("DERIVATION"). \
    setEstimationMethod("MRR"). \
    setDistribution("Weibull"). \
    setOutProbDensityFunc(True). \
    setOutCumDistFunc(True). \
    setOutSurvivalFunc(True). \
    setOutRegressionPlot(True). \
    setOutMedianRankRegPlot(True). \
    setComputeGroupComparison(True)

pdfModel = pdf.fit(data)
predictions = pdfModel.transform(data)
predictions.show()
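The example requests median rank regression (setEstimationMethod("MRR")), and "RRY" appears to denote rank regression on Y. The minimal sketch below shows the idea for a Weibull fit: linearizing F(t) = 1 - exp(-(t/scale)^shape) gives ln(-ln(1 - F)) = shape * ln t - shape * ln scale, so an ordinary least-squares regression of y = ln(-ln(1 - F_i)) on x = ln t_i yields the shape as the slope. It assumes complete (uncensored) data and swaps in Benard's approximation (i - 0.3) / (n + 0.4) for the exact median ranks that setMedianRankObtainMethod("BetaFDistribution") suggests; the function name weibull_mrr and the sample times are hypothetical.

```python
import math


def weibull_mrr(failure_times):
    """Estimate Weibull (shape, scale) by median rank regression on Y (sketch).

    Assumptions: complete (uncensored) data, at least two distinct times, and
    median ranks from Benard's approximation instead of exact beta medians.
    """
    t = sorted(failure_times)
    n = len(t)
    xs, ys = [], []
    for i, ti in enumerate(t, start=1):
        f = (i - 0.3) / (n + 0.4)                 # approximate median rank F(t_i)
        xs.append(math.log(ti))                   # x = ln t
        ys.append(math.log(-math.log(1.0 - f)))   # y = ln(-ln(1 - F))
    # Least-squares line y = shape * x + intercept, with intercept = -shape * ln(scale)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    shape = sxy / sxx
    intercept = my - shape * mx
    scale = math.exp(-intercept / shape)
    return shape, scale


# Hypothetical failure times
shape, scale = weibull_mrr([12.0, 18.5, 25.1, 31.0, 40.2])
print(shape, scale)
```

With censored observations the ranks would need to be adjusted before the regression, which this sketch does not attempt.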
Parametric regression modeling
Parametric regression modeling (PRM) is a survival analysis technique that incorporates the effects of covariates on the survival times. PRM includes two model types: accelerated failure time and frailty. Accelerated failure time models assume that the logarithm of the survival time is a linear function of the covariates. Frailty (random-effects) models are useful for analyzing recurrent events, correlated survival data, or observations that are clustered into groups.
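In an accelerated failure time model, log T = b0 + x·b + sigma·W, so the covariates rescale the time axis: S(t | x) = S0(t / exp(x·b)), where S0 is the survival function at x = 0 and exp(x·b) is the acceleration factor. The sketch below assumes a Weibull baseline (W following the standard extreme-value distribution for minima), which is one of the distributions PRM can choose; the function names and coefficient values are hypothetical and are not output of the SPSS ParametricRegression API.

```python
import math


def aft_weibull_survival(t, x, coefs, intercept, sigma):
    """S(t | x) for a Weibull accelerated failure time model (sketch).

    log T = intercept + sum(coefs * x) + sigma * W, where W has the standard
    minimum extreme-value distribution, so T is Weibull with
    shape = 1 / sigma and scale = exp(intercept + x . coefs).
    """
    lin = intercept + sum(b * xi for b, xi in zip(coefs, x))
    scale = math.exp(lin)
    shape = 1.0 / sigma
    return math.exp(-((t / scale) ** shape))


def acceleration_factor(x, coefs):
    """Factor by which covariates x stretch survival times relative to x = 0."""
    return math.exp(sum(b * xi for b, xi in zip(coefs, x)))


# Hypothetical coefficients for two covariates (for example, a treatment flag and age)
coefs = [0.4, -0.2]
print(acceleration_factor([1.0, 0.0], coefs))
print(aft_weibull_survival(5.0, [1.0, 0.0], coefs, intercept=1.5, sigma=0.8))
```

A positive coefficient gives an acceleration factor above 1, which stretches survival times (the event tends to happen later); a negative coefficient shrinks them.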
PRM automatically selects the survival time distribution (exponential, Weibull, log-normal, or log-logistic) that best describes the survival times.
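How PRM makes this choice is not described here. As one common approach, the sketch below compares two of the four candidate families, exponential and Weibull, by Akaike's information criterion (AIC = 2k - 2 log L, lower is better) on right-censored data without covariates. The criterion, the function names, and the restriction to two families are assumptions for illustration; log-normal and log-logistic fits would be added the same way. The Weibull fit profiles out the scale in closed form and then maximizes over the shape by golden-section search.

```python
import math


def exponential_loglik(times, events):
    """Maximized exponential log-likelihood for right-censored data (needs >= 1 event)."""
    d = sum(events)
    total = sum(times)
    rate = d / total                       # closed-form maximum likelihood estimate
    return d * math.log(rate) - rate * total


def weibull_profile_loglik(shape, times, events):
    """Weibull log-likelihood at a fixed shape, with scale**shape set to its MLE."""
    d = sum(events)
    theta = sum(t ** shape for t in times) / d
    sum_log_events = sum(math.log(t) for t, ev in zip(times, events) if ev)
    return d * math.log(shape) - d * math.log(theta) + (shape - 1.0) * sum_log_events - d


def weibull_loglik(times, events, lo=0.05, hi=20.0, tol=1e-8):
    """Maximize the profile over the shape by golden-section search.

    The profile is concave in the shape; the search assumes its maximum lies in [lo, hi].
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if weibull_profile_loglik(x1, times, events) > weibull_profile_loglik(x2, times, events):
            b, x2 = x2, x1                 # maximum lies in [a, x2]
            x1 = b - g * (b - a)
        else:
            a, x1 = x1, x2                 # maximum lies in [x1, b]
            x2 = a + g * (b - a)
    shape = (a + b) / 2.0
    return weibull_profile_loglik(shape, times, events), shape


def select_by_aic(times, events):
    """Return the family with the lowest AIC and the AIC of each family."""
    ll_exp = exponential_loglik(times, events)
    ll_wei, _ = weibull_loglik(times, events)
    aic = {"exponential": 2 * 1 - 2 * ll_exp, "weibull": 2 * 2 - 2 * ll_wei}
    return min(aic, key=aic.get), aic


# Hypothetical data: event = 1, right-censored = 0
best, aic = select_by_aic([3, 5, 5, 8, 10, 12], [1, 1, 0, 1, 0, 1])
print(best, aic)
```

Because the exponential is the Weibull with shape 1, the Weibull log-likelihood is never lower; AIC's penalty of 2 per parameter decides whether the extra shape parameter is worth keeping.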
Python example code:
from spss.ml.survivalanalysis import ParametricRegression
from spss.ml.survivalanalysis.params import DefinedStatus, Points, StatusItem

# `data` is assumed to be an input Spark DataFrame with the fields used below.
prm = ParametricRegression(). \
    setBeginField("startTime"). \
    setEndField("endTime"). \
    setStatusField("status"). \
    setPredictorFields(["age", "surgery", "transplant"]). \
    setDefinedStatus(
        DefinedStatus(
            # status 0.0 = failure, 1.0 = interval-censored
            failure=StatusItem(points=Points("0.0")),
            intervalCensored=StatusItem(points=Points("1.0"))))

prmModel = prm.fit(data)
# Export the fitted model as PMML and its statistics as XML
PMML = prmModel.toPMML()
statXML = prmModel.statXML()
predictions = prmModel.transform(data)
predictions.show()