GLMM node

This node creates a generalized linear mixed model (GLMM).

Generalized linear mixed models extend the linear model so that:
  • The target is linearly related to the factors and covariates via a specified link function
  • The target can have a non-normal distribution
  • The observations can be correlated

Generalized linear mixed models cover a wide variety of models, from simple linear regression to complex multilevel models for non-normal longitudinal data.
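As a rough orientation, the three properties listed above are commonly written in the following general form. The symbols are assumptions introduced here for illustration and are not taken from the node's documentation: X and Z are the design matrices for fixed and random effects, β and γ are the corresponding coefficients, g is the link function, and G is the random-effects covariance matrix.

```latex
\eta = X\beta + Z\gamma, \qquad
g\bigl(\mathrm{E}[\,y \mid \gamma\,]\bigr) = \eta, \qquad
\gamma \sim N(0, G)
```

With an identity link, a normal target, and no random effects, this reduces to ordinary linear regression.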

Examples. A district school board can use a generalized linear mixed model to determine whether an experimental teaching method is effective at improving math scores. Scores of students from the same classroom are likely to be correlated, since they are taught by the same teacher, and classrooms within the same school may also be correlated, so random effects can be included at the school and class levels to account for the different sources of variability.

Medical researchers can use a generalized linear mixed model to determine whether a new anticonvulsant drug can reduce a patient's rate of epileptic seizures. Repeated measurements from the same patient are typically positively correlated, so a mixed model with some random effects should be appropriate. The target field – the number of seizures – takes non-negative integer values, so a generalized linear mixed model with a Poisson distribution and log link may be appropriate.
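To make the role of the log link in the seizure example more concrete, here is a minimal Python sketch. It is not the node's implementation: the function names, the coefficient values, and the single patient-level random intercept are all hypothetical and chosen only for illustration.

```python
import math

def expected_count(intercept, drug_effect, on_drug, patient_intercept):
    """Expected seizure count under a log link: log(mu) = linear predictor.

    patient_intercept stands in for a patient-level random effect
    (hypothetical value; in a fitted model it would be estimated).
    """
    eta = intercept + drug_effect * on_drug + patient_intercept
    return math.exp(eta)

def poisson_pmf(k, mu):
    """Probability of observing exactly k events when the mean is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Hypothetical coefficients, for illustration only.
baseline = expected_count(1.2, -0.4, on_drug=0, patient_intercept=0.3)
treated = expected_count(1.2, -0.4, on_drug=1, patient_intercept=0.3)

# With a log link, the drug effect acts multiplicatively on the rate,
# and the patient-level term cancels out of the ratio.
rate_ratio = treated / baseline
```

Under these assumptions the rate ratio equals exp(-0.4) for every patient, which is why coefficients from a log-link model are often reported as rate ratios.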

Executives at a cable provider of television, phone, and internet services can use a generalized linear mixed model to learn more about potential customers. Since the possible answers to the survey's service usage questions have nominal measurement levels, the company analyst uses a generalized logit mixed model with a random intercept to capture the correlation between answers across service types (TV, phone, internet) within a given survey respondent's answers.

In the node properties, data structure options allow you to specify the structural relationships between records in your dataset when observations are correlated. If the records in the dataset represent independent observations, you don't need to specify any data structure options.

Subjects. The combination of values of the specified categorical fields should uniquely define subjects within the dataset. For example, a single Patient ID field should be sufficient to define subjects in a single hospital, but the combination of Hospital ID and Patient ID may be necessary if patient identification numbers are not unique across hospitals. In a repeated measures setting, multiple observations are recorded for each subject, so each subject may occupy multiple records in the dataset.
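The following Python sketch illustrates why a combination of fields may be needed to define subjects. The records, field names, and the group_by_subject helper are hypothetical and exist only for this example; they are not part of the node.

```python
from collections import defaultdict

# Hypothetical repeated-measures records. Patient ID 1 appears in both hospitals.
records = [
    {"hospital_id": "H1", "patient_id": 1, "week": 1, "bp": 120},
    {"hospital_id": "H1", "patient_id": 1, "week": 2, "bp": 118},
    {"hospital_id": "H2", "patient_id": 1, "week": 1, "bp": 135},
    {"hospital_id": "H2", "patient_id": 2, "week": 1, "bp": 128},
]

def group_by_subject(rows, subject_fields):
    """Group records by the combination of values of the subject fields."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in subject_fields)
        groups[key].append(row)
    return dict(groups)

# Patient ID alone merges two different patients into one subject.
by_patient_only = group_by_subject(records, ["patient_id"])

# Hospital ID plus Patient ID separates them correctly.
by_hospital_and_patient = group_by_subject(records, ["hospital_id", "patient_id"])
```

In this sketch, grouping by patient_id alone yields two subjects, one of which mixes records from two hospitals, while the combined key yields three subjects, with the first occupying two records.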

A subject is an observational unit that can be considered independent of other subjects. For example, the blood pressure readings from a patient in a medical study can be considered independent of the readings from other patients. Defining subjects becomes particularly important when there are repeated measurements per subject and you want to model the correlation between these observations. For example, you might expect that blood pressure readings from a single patient during consecutive visits to the doctor are correlated.

All of the fields specified as subjects in the node properties are used to define subjects for the residual covariance structure, and provide the list of possible fields for defining subjects for random-effects covariance structures on the Random Effect Block.

Repeated measures. The fields specified here are used to identify repeated observations. For example, a single variable Week might identify the 10 weeks of observations in a medical study, or Month and Day might be used together to identify daily observations over the course of a year.

Define covariance groups by. The categorical fields specified here define independent sets of repeated effects covariance parameters, one for each category defined by the cross-classification of the grouping fields. All subjects have the same covariance type, and subjects within the same covariance grouping will have the same values for the parameters.

Spatial covariance coordinates. The variables in this list specify the coordinates of the repeated observations when one of the spatial covariance types is selected for the repeated covariance type.

Repeated covariance type. This specifies the covariance structure for the residuals. The available structures are:

  • First-order autoregressive (AR1)
  • Autoregressive moving average (1,1) (ARMA11)
  • Compound symmetry
  • Diagonal
  • Scaled identity
  • Spatial: Power
  • Spatial: Exponential
  • Spatial: Gaussian
  • Spatial: Linear
  • Spatial: Linear-log
  • Spatial: Spherical
  • Toeplitz
  • Unstructured
  • Variance components
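To give a feel for how some of these structures differ, the Python sketch below builds three of them as plain nested lists for one subject with n repeated observations. The textbook parameterizations used here (a common variance, a common covariance, and a correlation rho) are assumptions for illustration; the node's exact parameterization may differ.

```python
def scaled_identity(n, variance):
    """Equal variances on the diagonal, no correlation between observations."""
    return [[variance if i == j else 0.0 for j in range(n)] for i in range(n)]

def compound_symmetry(n, variance, covariance):
    """Equal variances, and the same covariance between every pair."""
    return [[variance if i == j else covariance for j in range(n)] for i in range(n)]

def ar1(n, variance, rho):
    """First-order autoregressive: correlation decays as rho ** |i - j|."""
    return [[variance * rho ** abs(i - j) for j in range(n)] for i in range(n)]

# Four repeated observations with hypothetical parameter values.
si_matrix = scaled_identity(4, 1.0)
cs_matrix = compound_symmetry(4, 1.0, 0.5)
ar1_matrix = ar1(4, 1.0, 0.5)
```

Under these assumptions, compound symmetry keeps the covariance between the first and fourth observations at 0.5, whereas AR1 lets it fall to 0.5 cubed, which tends to suit measurements taken at evenly spaced time points.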