SPSS predictive analytics survival analysis algorithms in notebooks

Last updated: Jan 12, 2024

You can use non-parametric distribution fitting, parametric distribution fitting, or parametric regression modeling SPSS predictive analytics algorithms in notebooks.

Non-Parametric Distribution Fitting

Survival analysis analyzes data where the outcome variable is the time until the occurrence of an event of interest. The distribution of the event times is typically described by a survival function.

Non-parametric Distribution Fitting (NPDF) provides an estimate of the survival function without making any assumptions concerning the distribution of the data. NPDF includes Kaplan-Meier estimation, life tables, and specialized extension algorithms to support left censored, interval censored, and recurrent event data.

Python example code:

from spss.ml.survivalanalysis import NonParametricDistributionFitting
from spss.ml.survivalanalysis.params import DefinedStatus, Points, StatusItem

npdf = NonParametricDistributionFitting(). \
    setAlgorithm("KM"). \
    setBeginField("time"). \
    setStatusField("status"). \
    setStrataFields(["treatment"]). \
    setGroupFields(["gender"]). \
    setUndefinedStatus("INTERVALCENSORED"). \
    setDefinedStatus(
    DefinedStatus(
        failure=StatusItem(points = Points("1")),
        rightCensored=StatusItem(points = Points("0")))). \
    setOutMeanSurvivalTime(True)

npdfModel = npdf.fit(df)
predictions = npdfModel.transform(data)
predictions.show()

Parametric Distribution Fitting

Survival analysis analyzes data where the outcome variable is the time until the occurrence of an event of interest. The distribution of the event times is typically described by a survival function.

Parametric Distribution Fitting (PDF) provides an estimate of the survival function by comparing the functions for several known distributions (exponential, Weibull, log-normal, and log-logistic) to determine which, if any, describes the data best. In addition, the distributions for two or more groups of cases can be compared.

Python excample code:

from spss.ml.survivalanalysis import ParametricDistributionFitting
from spss.ml.survivalanalysis.params import DefinedStatus, Points, StatusItem

pdf = ParametricDistributionFitting(). \
    setBeginField("begintime"). \
    setEndField("endtime"). \
    setStatusField("status"). \
    setFreqField("frequency"). \
    setDefinedStatus(
        DefinedStatus(
         failure=StatusItem(points=Points("F")),
         rightCensored=StatusItem(points=Points("R")),
         leftCensored=StatusItem(points=Points("L")))
    ). \
    setMedianRankEstimation("RRY"). \
    setMedianRankObtainMethod("BetaFDistribution"). \
    setStatusConflictTreatment("DERIVATION"). \
    setEstimationMethod("MRR"). \
    setDistribution("Weibull"). \
    setOutProbDensityFunc(True). \
    setOutCumDistFunc(True). \
    setOutSurvivalFunc(True). \
    setOutRegressionPlot(True). \
    setOutMedianRankRegPlot(True). \
    setComputeGroupComparison(True)

pdfModel = pdf.fit(data)
predictions = pdfModel.transform(data)
predictions.show()

Parametric regression modeling

Parametric regression modeling (PRM) is a survival analysis technique that incorporates the effects of covariates on the survival times. PRM includes two model types: accelerated failure time and frailty. Accelerated failure time models assume that the relationship of the logarithm of survival time and the covariates is linear. Frailty, or random effects, models are useful for analyzing recurrent events, correlated survival data, or when observations are clustered into groups.

PRM automatically selects the survival time distribution (exponential, Weibull, log-normal, or log-logistic) that best describes the survival times.

Python example code:

from spss.ml.survivalanalysis import ParametricRegression
from spss.ml.survivalanalysis.params import DefinedStatus, Points, StatusItem

prm = ParametricRegression(). \
    setBeginField("startTime"). \
    setEndField("endTime"). \
    setStatusField("status"). \
    setPredictorFields(["age", "surgery", "transplant"]). \
    setDefinedStatus(
        DefinedStatus(
         failure=StatusItem(points=Points("0.0")),
         intervalCensored=StatusItem(points=Points("1.0"))))

prmModel = prm.fit(data)
PMML = prmModel.toPMML()
statXML = prmModel.statXML()
predictions = prmModel.transform(data)
predictions.show()

Parent topic: SPSS predictive analytics algorithms

Was the topic helpful?