# Survival Analysis

## Non-parametric Distribution Fitting

Survival analysis examines data in which the outcome variable is the time until an event of interest occurs. The distribution of the event times is typically described by a survival function.

Non-parametric Distribution Fitting (NPDF) provides an estimate of the survival function without making any assumptions about the distribution of the data. NPDF includes Kaplan-Meier estimation, life tables, and specialized extension algorithms to support left-censored, interval-censored, and recurrent event data.
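To make the idea behind Kaplan-Meier estimation concrete, the following is a minimal plain-Scala sketch of the product-limit estimator for right-censored data. It is not the SPSS implementation: the `KaplanMeierSketch` object, the `Observation` case class, and the sample values are all hypothetical and exist only for illustration. At each distinct failure time, the running survival estimate is multiplied by the fraction of at-risk subjects that did not fail.

```scala
// Illustrative sketch only; not part of the SPSS API.
object KaplanMeierSketch {
  // One subject: the observed time and whether the event (failure) occurred.
  // event = false means the observation is right censored.
  final case class Observation(time: Double, event: Boolean)

  // Returns (failure time, estimated survival probability) pairs,
  // one per distinct time at which at least one failure occurred.
  def estimate(data: Seq[Observation]): Seq[(Double, Double)] = {
    val failureTimes = data.filter(_.event).map(_.time).distinct.sorted
    // Conditional probability of surviving past each failure time.
    val factors = failureTimes.map { t =>
      val atRisk = data.count(_.time >= t) // still under observation just before t
      val failures = data.count(o => o.event && o.time == t)
      1.0 - failures.toDouble / atRisk
    }
    // The running product of the factors is the survival estimate.
    failureTimes.zip(factors.scanLeft(1.0)(_ * _).tail)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample: failures at times 1, 3, 4; censoring at times 2 and 5.
    val sample = Seq(
      Observation(1.0, event = true),
      Observation(2.0, event = false),
      Observation(3.0, event = true),
      Observation(4.0, event = true),
      Observation(5.0, event = false)
    )
    estimate(sample).foreach { case (t, s) => println(f"t = $t%.1f  S(t) = $s%.3f") }
  }
}
```

Censored subjects never trigger a step in the curve, but they still count toward the at-risk total until their censoring time, which is how the estimator uses partial information.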

Example code:

``````scala
import com.ibm.spss.ml.common.params.DefinedStatus
import com.ibm.spss.ml.common.params.StatusItem
import com.ibm.spss.ml.common.params.Points
import com.ibm.spss.ml.survivalanalysis.NonParametricDistributionFitting

// `data` is assumed to be an existing DataFrame that contains the fields named below.
// Status "1" marks a failure; status "0" marks a right-censored observation.
val npdf = NonParametricDistributionFitting().
  setAlgorithm("KM").
  setBeginField("time").
  setStatusField("status").
  setStrataFields(Array("treatment")).
  setGroupFields(Array("gender")).
  setUndefinedStatus("INTERVALCENSORED").
  setDefinedStatus(
    DefinedStatus(
      failure = Some(StatusItem(points = Some(Points("1")))),
      rightCensored = Some(StatusItem(points = Some(Points("0")))))).
  setOutMeanSurvivalTime(true)

// Fit the model, then score the same data.
val npdfModel = npdf.fit(data)
val predictions = npdfModel.transform(data)
predictions.show()
``````

## Parametric Distribution Fitting

Survival analysis examines data in which the outcome variable is the time until an event of interest occurs. The distribution of the event times is typically described by a survival function.

Parametric Distribution Fitting (PDF) provides an estimate of the survival function by comparing the functions for several known distributions (exponential, Weibull, log-normal, and log-logistic) to determine which, if any, describes the data best. In addition, the distributions for two or more groups of cases can be compared.
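For reference, the survival functions of three of the candidate distributions can be written down directly. The sketch below is plain Scala and independent of the SPSS API; the `ParametricSurvivalSketch` object, its parameter names, and the parameter values in `main` are assumptions made for illustration. The log-normal case is left out because it needs a standard normal CDF, which the Scala standard library does not provide.

```scala
// Illustrative sketch only; not part of the SPSS API.
object ParametricSurvivalSketch {
  // Exponential: S(t) = exp(-rate * t)
  def exponential(rate: Double)(t: Double): Double =
    math.exp(-rate * t)

  // Weibull: S(t) = exp(-(t / scale)^shape)
  def weibull(shape: Double, scale: Double)(t: Double): Double =
    math.exp(-math.pow(t / scale, shape))

  // Log-logistic: S(t) = 1 / (1 + (t / scale)^shape)
  def logLogistic(shape: Double, scale: Double)(t: Double): Double =
    1.0 / (1.0 + math.pow(t / scale, shape))

  def main(args: Array[String]): Unit = {
    // Parameter values are made up for illustration.
    for (t <- Seq(0.5, 1.0, 2.0, 4.0)) {
      val e = exponential(0.5)(t)
      val w = weibull(1.5, 2.0)(t)
      val l = logLogistic(2.0, 2.0)(t)
      println(f"t = $t%.1f  exponential = $e%.3f  Weibull = $w%.3f  log-logistic = $l%.3f")
    }
  }
}
```

A Weibull distribution with shape 1 reduces to an exponential distribution with rate `1 / scale`, which is one reason the two are often compared when choosing a fit.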

Example code:

``````scala
import com.ibm.spss.ml.common.params.DefinedStatus
import com.ibm.spss.ml.common.params.StatusItem
import com.ibm.spss.ml.common.params.Points
import com.ibm.spss.ml.survivalanalysis.ParametricDistributionFitting

// `data` is assumed to be an existing DataFrame that contains the fields named below.
// Status "F" marks a failure, "R" a right-censored and "L" a left-censored observation.
val pdf = ParametricDistributionFitting().
  setBeginField("begintime").
  setEndField("endtime").
  setStatusField("status").
  setFreqField("frequency").
  setDefinedStatus(
    DefinedStatus(
      failure = Some(StatusItem(points = Some(Points("F")))),
      rightCensored = Some(StatusItem(points = Some(Points("R")))),
      leftCensored = Some(StatusItem(points = Some(Points("L"))))
    )
  ).
  setMedianRankEstimation("RRY").
  setMedianRankObtainMethod("BetaFDistribution").
  setStatusConflictTreatment("DERIVATION").
  setEstimationMethod("MRR").
  setDistribution("Weibull").
  setOutProbDensityFunc(true).
  setOutCumDistFunc(true).
  setOutSurvivalFunc(true).
  setOutRegressionPlot(true).
  setOutMedianRankRegPlot(true).
  setComputeGroupComparison(true)

// Fit the model, then score the same data.
val pdfModel = pdf.fit(data)

val predictions = pdfModel.transform(data)
predictions.show()
``````

## Parametric Regression Modeling

Parametric regression modeling (PRM) is a survival analysis technique that incorporates the effects of covariates on the survival times. PRM includes two model types: accelerated failure time and frailty. Accelerated failure time models assume that the relationship between the logarithm of survival time and the covariates is linear. Frailty, or random effects, models are useful for analyzing recurrent events, correlated survival data, or observations that are clustered into groups.
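The accelerated failure time assumption can be illustrated with a short plain-Scala sketch. If the logarithm of survival time is linear in the covariates, then the covariates rescale time by a factor `exp(beta · x)`, so a subject's survival function is the baseline survival function evaluated at `t / exp(beta · x)`. The `AftSketch` object, the Weibull baseline, and all coefficient values below are hypothetical and are not taken from the SPSS implementation.

```scala
// Illustrative sketch only; not part of the SPSS API.
object AftSketch {
  // Baseline Weibull survival function: S0(t) = exp(-(t / scale)^shape)
  def weibullBaseline(shape: Double, scale: Double): Double => Double =
    t => math.exp(-math.pow(t / scale, shape))

  // Acceleration factor exp(beta . x): the amount by which covariates stretch time.
  def accelerationFactor(beta: Seq[Double], x: Seq[Double]): Double = {
    require(beta.length == x.length, "one coefficient per covariate is required")
    math.exp(beta.zip(x).map { case (b, v) => b * v }.sum)
  }

  // Survival function for a subject with covariates x: S(t | x) = S0(t / exp(beta . x))
  def survival(baseline: Double => Double, beta: Seq[Double], x: Seq[Double])(t: Double): Double =
    baseline(t / accelerationFactor(beta, x))

  def main(args: Array[String]): Unit = {
    val baseline = weibullBaseline(shape = 1.5, scale = 10.0)
    // Made-up coefficients for two covariates.
    val beta = Seq(-0.02, 0.7)
    val subject = Seq(50.0, 1.0)
    for (t <- Seq(1.0, 5.0, 10.0)) {
      val s0 = baseline(t)
      val s = survival(baseline, beta, subject)(t)
      println(f"t = $t%.1f  baseline = $s0%.3f  subject = $s%.3f")
    }
  }
}
```

An acceleration factor greater than 1 stretches the time scale, so the subject survives longer than the baseline; a factor below 1 compresses it.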

PRM automatically selects the survival time distribution (exponential, Weibull, log-normal, or log-logistic) that best describes the survival times.

Example code:

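``````scala
import com.ibm.spss.ml.common.params.DefinedStatus
import com.ibm.spss.ml.common.params.StatusItem
import com.ibm.spss.ml.common.params.Points
import com.ibm.spss.ml.survivalanalysis.ParametricRegression

// `data` is assumed to be an existing DataFrame that contains the fields named below.
// Status "0.0" marks a failure; status "1.0" marks an interval-censored observation.
val prm = ParametricRegression().
  setBeginField("startTime").
  setEndField("endTime").
  setStatusField("status").
  setPredictorFields(Array("age", "surgery", "transplant")).
  setDefinedStatus(
    DefinedStatus(
      failure = Some(StatusItem(points = Some(Points("0.0")))),
      intervalCensored = Some(StatusItem(points = Some(Points("1.0"))))))

// Fit the model, export its PMML and statistics XML, then score the same data.
val prmModel = prm.fit(data)
val PMML = prmModel.toPMML()
val statXML = prmModel.statXML()
val predictions = prmModel.transform(data)
predictions.show()
``````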