In this chapter, you will learn the basic concepts of MixWILD. You will also learn how to install the MixWILD program (Section 1.3. Preparing for Use) and how to load data and start an analysis using MixWILD (Section 1.4. Step-by-step Instructions on Import Data in MixWILD).
Mixed model analysis With Intensive Longitudinal Data (MixWILD) is a standalone statistical software program that examines the effects of subject-level parameters (intercept, slope(s), and scale) derived from time-varying variables on a subject-level outcome or an outcome nested within time or clusters, specifically in the context of studies using intensive sampling methods, such as ecological momentary assessment (EMA). It combines estimation of a Stage 1 mixed-effects model, including random subject effects, with a subsequent Stage 2 regression in which values sampled from each subject's random effect distributions can be used as regressors (Dzubur et al., 2020).
In MixWILD, we distinguish two parts of the total model: 1) ‘Stage 1 mixed-effects model’, in which random subject intercept and slopes (location effects) and random subject within-subjects variance (scale effect) are estimated, and 2) ‘Stage 2 model’, in which the Stage 1 random subject effect estimates are used as regressors and interaction terms to predict a Stage 2 outcome.
In Stage 1, a mixed-effects location-scale (MELS) model is specified, which extends the conventional mixed-effects regression model by allowing modeling of both the between-subject (BS) and within-subjects (WS) variances in terms of covariates (Hedeker et al., 2008). Specifically, log-linear submodels for the BS and WS variances are used, allowing covariates to influence both types of variance. Additionally, besides the inclusion of a random subject effect in the Mean model, a random subject (scale) effect is added to the WS variance specification, allowing the WS variance to be subject-specific, as well as influenced by covariates. Thus, these MELS models include both random subject location and scale effects, which are estimated using empirical Bayes methods (Bock, 1989). Additionally, MixWILD can allow for random subject intercept and slopes (of time-varying covariates) in the Mean model, which we refer to as mixed-effects multiple location scale (MEMLS) models. In all, these subject-specific parameters indicate a baseline mean level (random intercept), the effect of a (time-varying) covariate on the mean (random slope), and the degree of within-subject variability (random scale) (Dzubur et al., 2020). These random subject effects from the Stage 1 model can then be used in a regression model to predict a Stage 2 outcome. This Stage 2 outcome can be a subject-level outcome or outcome nested within time or clusters, and can be of four different outcome types: continuous (normal), dichotomous/ordinal, count, or nominal. The random effects (estimated in Stage 1) can be included in the Stage 2 regression model as main effects and as interactions with other regressors.
(Please check Dzubur et al., 2020 for more details)
* RL: Random Location; RS: Random Scale.
Figure 1-1. Illustration of two-stage model.
For example, you may have a MELS model that estimates day-level minutes of moderate-to-vigorous physical activity (\(y_{1}\) = MVPA) in Stage 1 as a function of various time-varying and time-invariant predictors (\(x_{1}\)~\(x_{n}\), e.g., positive affect and sex). As shown in Figure 1-1, MixWILD generates output reporting the regression coefficients and the effects of the various covariates on the BS and WS variance in MVPA. The software also creates estimates of the random subject effects of MVPA (i.e., random location and random scale effects) from the Stage 1 MELS model. Following the Stage 1 model, you can use the random location effect of MVPA (MVPA mean effect) and the random scale effect of MVPA (MVPA variability effect) to predict one's BMI status (\(y_{2}\) = BMI) in the Stage 2 model in Figure 1-1. By doing this, you can examine whether and how the day-level MVPA mean and variability are related to one's BMI.
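The idea of feeding sampled random effects into a Stage 2 regression can be sketched in a few lines of Python. This is a simplified illustration, not MixWILD's actual procedure: it assumes one random location effect per subject, draws each value from a univariate normal distribution, and averages only the slope across draws. The function names (`fit_ols`, `resampled_stage2_slope`) and all numbers are hypothetical.

```python
import random

def fit_ols(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def resampled_stage2_slope(effect_means, effect_variances, outcome, n_draws, rng):
    """Average the Stage 2 slope over repeated draws of each subject's random effect."""
    slopes = []
    for _ in range(n_draws):
        # One draw per subject from a normal distribution centered on that
        # subject's estimated effect (a simplifying assumption of this sketch).
        draws = [rng.gauss(m, v ** 0.5) for m, v in zip(effect_means, effect_variances)]
        slopes.append(fit_ols(draws, outcome)[1])
    return sum(slopes) / n_draws

# Made-up subject-level values: estimated MVPA location effects, their
# variances, and a Stage 2 outcome (BMI). None of these come from real data.
mvpa_location = [-1.5, -0.8, -0.2, 0.3, 0.9, 1.6]
location_var = [0.20, 0.35, 0.25, 0.30, 0.40, 0.25]
bmi = [29.0, 27.5, 25.0, 24.5, 23.0, 21.5]

rng = random.Random(1)  # fixed seed so the sketch is reproducible
naive_slope = fit_ols(mvpa_location, bmi)[1]
resampled_slope = resampled_stage2_slope(mvpa_location, location_var, bmi, 500, rng)
print(naive_slope, resampled_slope)
```

The naive slope treats each estimated effect as if it were known exactly, whereas the resampled slope reflects the uncertainty in those estimates. How MixWILD pools standard errors across draws is not shown here.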
To set the stage for the discussion to follow, it is helpful to present a unified terminology for the various aspects of MixWILD.
Figure 1-2. Structure of Level 1 & Level 2 in mixed-effects data.
* Stage 1 model must be a multilevel model.
* Stage 2 model could be a subject-level or multilevel model.
Figure 1-3. Simple illustration of Stage 1 & Stage 2 models.
Thus, researchers gain more capability to analyze sophisticated research designs and to answer their questions in more detail. MixWILD is useful for a variety of Intensive Longitudinal Data (ILD) collection strategies (e.g., EMA, sensors) as a robust and reproducible method to test predictors of variability in level 1 outcomes and the associations between subject-level parameters (variances and slopes) and level 2 outcomes (Dzubur et al., 2020). For example, MixWILD can address research questions such as the following.
In EMA studies, it is common to have up to 30 or 40 observations per subject, and this allows us to model subject-level variances (i.e., how erratic is a subject's mood?) and slopes (i.e., how much does a subject's mood change across contexts?) for time-varying variables. Traditionally, intraindividual means and variances from EMA data have been computed for each person using standard formulas, such as the subject-level standard deviation (SD) and mean squared successive difference (MSSD) (Jahng et al., 2008). These strategies ignore the fact that subjects can have unequal numbers of EMA observations and cannot account for the effects of covariates. In contrast, the MixWILD approach recognizes that subjects can vary in their numbers of observations, and it can generate variability estimates that adjust for the effects of other variables. Furthermore, MixWILD uses the plausible values resampling approach to account for the variability that is inherent in these estimates, rather than simply treating them as known predictors (which is what is assumed if one uses SD estimates as regressors). Because MixWILD does not ignore this source of variation, it helps avoid an inflated rate of false-positive results.
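To make the traditional approach concrete, the sketch below computes a per-subject SD and MSSD from long-format records using only the Python standard library. The data are made up and the function name `subject_variability` is hypothetical; this is not MixWILD code.

```python
from collections import defaultdict
from statistics import stdev

def subject_variability(records):
    """Return {subject_id: (n_obs, sd, mssd)} from (subject_id, value) pairs.

    Records are assumed to be in time order within each subject.
    """
    by_subject = defaultdict(list)
    for subject_id, value in records:
        by_subject[subject_id].append(value)

    summary = {}
    for subject_id, values in by_subject.items():
        if len(values) < 2:
            continue  # SD and MSSD are undefined for a single observation
        sd = stdev(values)  # sample standard deviation
        squared_diffs = [(b - a) ** 2 for a, b in zip(values, values[1:])]
        mssd = sum(squared_diffs) / len(squared_diffs)
        summary[subject_id] = (len(values), sd, mssd)
    return summary

# Made-up data: subject 1 answered 4 prompts, subject 2 only 2.
records = [(1, 2.0), (1, 4.0), (1, 2.0), (1, 4.0), (2, 3.0), (2, 5.0)]
for subject_id, (n_obs, sd, mssd) in subject_variability(records).items():
    print(subject_id, n_obs, round(sd, 3), round(mssd, 3))
```

Each subject ends up with a single SD and a single MSSD regardless of whether they contributed 4 observations or 2. Nothing in these summaries records how precisely they were estimated, and no covariate adjustment is possible.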
MixWILD can conduct MELS/MEMLS modeling, parsing the variance of a Stage 1 outcome into mean and variability, which are used as predictors in Stage 2 regression models of a new outcome. This two-stage approach can be useful to address questions of interest in a variety of research areas in the social and behavioral sciences. These tools can be used to analyze data from, and ultimately to design, new mHealth interventions, Intensively Adaptive Interventions (IAIs), and Just-In-Time Adaptive Interventions (JITAIs) (Nahum-Shani et al., 2018), in which subjects provide a greater number of data points over time than in traditional RCT studies. For example, by examining EMA data using the MixWILD two-stage models, researchers are able to ask critical research questions such as whether erratic mood mediates the effects of depression on physical activity, or whether the effects of living in a highly walkable neighborhood on physical activity are attenuated for individuals with unstable self-efficacy beliefs.
If this is your first time installing MixWILD, Windows may ask you to complete some extra steps to install the software successfully.
For example, please do not name your folder "My Data", because the space will lead to an error. Use an underscore to replace the space; the correct name would be "My_Data".
You may also find our MixWILD example dataset in the link below.
https://reach-lab.github.io/MixWildGUI/Mixwild_example_data.csv
This example dataset provides the basic elements that MixWILD needs for an analysis. Some of the tutorials in the following chapters use this dataset. We encourage you to download it and give it a try, especially if you don't have an EMA dataset on hand. Here, we present the steps involved in using the program with the example dataset, where EMA data were obtained from 94 subjects (1,635 observations). Variables in the example dataset include:
Variable | Level | Description |
---|---|---|
ID | | Participant ID number |
AGE | 2 | Number of years (centered around mean age=29.29) |
SEX | 2 | 0 (female); 1 (male) |
WEEKEND | 1 | 0 (weekday); 1 (weekend) |
DOW | 1 | Day of week; 0: Monday, 1: Tuesday, …, 6: Sunday |
OBESE | 2 | 0 (not obese); 1 (obese) |
BMI | 2 | Body Mass Index (mean = 24.22, min=13.9, max=48.68) |
NegMood | 1 | Levels of negative affect reported in each prompt |
PosMood | 1 | Levels of positive affect reported in each prompt |
MVPA_daily_mins | 2 | Daily averaged moderate-to-vigorous physical activity time in minutes |
SED_daily_hours | 2 | Daily averaged sedentary time in hours |
NEGMOOD10 | 1 | A decile (10 category) recoding of the variable NEG_AFFECT |
POSMOOD10 | 1 | A decile (10 category) recoding of the variable POS_AFFECT |
MVPA_daily4 | 2 | A quartile (4 category) recoding of the variable MVPA_daily_mins |
The data are sorted by ID - this is important as the program will not produce correct results if the data are not sorted by the level-2 ID variable. Also, the variables in the dataset are numeric only (i.e., no letters or non-numeric text can be present in the dataset) and the variables are separated by tabs, commas, or one or more spaces in the dataset.
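Before loading a file, it can help to check these two requirements yourself. The sketch below is one way to do so with the Python standard library. It assumes a comma-separated file with a header row and an ID column that should be in ascending numeric order; the function name `check_mixwild_ready` and the sample rows are hypothetical, and the check is not part of MixWILD.

```python
import csv
import io

def check_mixwild_ready(text, id_column="ID"):
    """Return a list of problems found in comma-separated text with a header row."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    previous_id = None
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        # Every cell must parse as a number.
        for name, cell in row.items():
            try:
                float(cell)
            except (TypeError, ValueError):
                problems.append(f"line {line_number}: non-numeric value in {name}: {cell!r}")
        # IDs must never decrease from one row to the next.
        try:
            current_id = float(row[id_column])
        except (TypeError, ValueError, KeyError):
            continue
        if previous_id is not None and current_id < previous_id:
            problems.append(f"line {line_number}: {id_column} is out of order")
        previous_id = current_id
    return problems

# Made-up rows: the last one has a text value and an ID that goes backwards.
sample = "ID,PosMood,WEEKEND\n1,3.5,0\n1,4.0,1\n3,2.0,0\n2,high,1\n"
for problem in check_mixwild_ready(sample):
    print(problem)
```

To check a real file, read its contents into a string first and pass that string to the function. An empty list means neither problem was detected.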
There are more resources for EMA data sets which are publicly available. You may search on “intensive longitudinal” or “EMA” to access more datasets.
Harvard Dataverse is a repository for research data. Deposit data and code here.
https://dataverse.harvard.edu/
Texas Data Repository Dataverse is a research data management system for Texas Digital Library (TDL) member institutions.
The user interface for MixWILD runs in a Java runtime environment that provides feature parity between Windows and Mac versions. Native 64-bit binaries for macOS and Windows written in Fortran are used to execute statistical analyses and generate model output.
Please check the project on GitHub for more details.
The code for MixWILD is written in FORTRAN and uses maximum likelihood estimation, utilizing both the expectation-maximization (EM) algorithm and the Newton–Raphson method to obtain the parameter estimates. Additionally, the mean and variance of each subject's random effects are estimated using empirical Bayes methods (Dzubur et al., 2020). More information about the estimation procedure can be found in the MIXREGLS manual (Hedeker & Nordgren, 2013).
Software
Dzubur, E., Ponnada, A., Nordgren, R., Yang, C. H., Intille, S., Dunton, G., & Hedeker, D. (2020). MixWILD: A program for examining the effects of variance and slope of time-varying variables in intensive longitudinal data. Behavior Research Methods, 1-25.
MixWILD: A tutorial of a simple multilevel model with intensive longitudinal data (Link: https://youtu.be/ZqyCxrMG1R8 provided by Eldin Dzubur)
Hedeker, D., & Nordgren, R. (2013). MIXREGLS: a program for mixed-effects location scale analysis. Journal of statistical software, 52(12), 1.
Methodology
Applied Papers
This manual comprises four parts, in which various aspects of MixWILD models are discussed.
Chapter 2 presents the Stage 1 basic multilevel model for analysis of repeated measurements. It describes the basic settings in MixWILD Stage 1 model and how to run MELS and MEMLS models which include random subject intercept, slope(s), and scale in the analysis of repeated measurements as well as some interpretation of the outputs.
Chapter 3 introduces a two-stage model: MixWILD combines a Stage 1 MELS or MEMLS model of a Stage 1 outcome with a subsequent Stage 2 model in which the Stage 1 random effects (i.e., random subject intercept, slopes, and scale) are used as regressors in the Stage 2 model.
Chapter 2 will focus on MixWILD Stage 1 model (e.g., MELS or MEMLS). You will learn how to include random subject intercept, slope(s), and scale in the analysis of repeated measurements. In addition, this chapter will give you details about the submodels of the Stage 1 model (e.g., BS and WS Variance submodels). We will provide simple examples and step-by-step instructions to run a Stage 1 model in MixWILD and interpret the outputs. Example 1 shows how to operate a MELS model with random scale (2.2 Instruction; 2.3 Results); in Example 2, we provide a tutorial to run a MEMLS model in MixWILD (2.4 Instruction; 2.5 Results).
As mentioned in Chapter 1, the Stage 1 model is a mixed-effects model, in which random subject intercept and slope(s) (location effects) and random subject within-subjects variance (scale effect) are included. On the one hand, the Stage 1 model can be considered as an independent model which applies a mixed-effects modeling approach with random subject effects to examine the associations between a time-varying outcome and subject-level or time-varying covariates. On the other hand, the Stage 1 model can also be an antecedent model which generates necessary estimates (e.g., random subject intercept, slope(s), and scale effects) as regressors for use in a subsequent Stage 2 model. In this chapter, the discussion will focus on the application of the Stage 1 model only. We will describe use of Stage 1 estimates of random effects as regressors in a Stage 2 model in Chapter 3.
The Stage 1 model is a MELS or MEMLS model that includes and provides estimates of the random subject effects. The Stage 1 model can also include covariates for the mean and variance submodels. In this way, the random effects can either be unadjusted (submodels with no covariates) or adjusted for covariates (submodels including covariates). MELS and MEMLS models in MixWILD offer an extra capability to handle random effects, compared to standard Ordinary Least Squares (OLS) regression, and go beyond standard mixed model software by including random subject scale effects.
Figure 2-1 shows an example of an OLS model that uses WS positive affect (PA) to predict daily averaged moderate-to-vigorous physical activity (MVPA) time in minutes. In this model, we neglected the multilevel data structure (i.e., observations nested within persons) and simply analyzed the observations using a standard OLS regression model.
Figure 2-1. OLS model (DV: MVPA; IV: WS PA)
For measurement \(y\) (i.e., daily MVPA) of subject \(i\) (\(i = 1, 2, \ldots, N\) subjects) on occasion \(j\) (\(j = 1, 2, \ldots, n_{i}\) occasions):
\[ y_{ij} = \beta_{0} + \beta_{1} x_{ij} + \varepsilon_{ij} \qquad (2.1) \]
In Equation 2.1, \(\beta_{0}\) is the intercept and \(x_{ij}\) is the regressor (i.e., WS PA for a subject at a particular occasion) for the Mean model, and \(\beta_{1}\) is the corresponding regression coefficient. \(\varepsilon_{ij}\) is subject i's error at time j. With OLS, all subjects share one regression line, and observations are assumed to be dispersed around it to a comparable degree for all subjects. This model ignores the structure of the EMA data and potentially leads to an impoverished analysis and inferential errors due to violation of the assumption of independent and identically distributed (i.i.d.) observations.
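The consequence of pooling can be seen in a small Python sketch. It fits a single regression line to made-up observations from two subjects and prints the residuals. The helper `fit_ols` and the data are hypothetical and chosen only to make the pattern obvious.

```python
def fit_ols(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data: two subjects with the same within-subject slope but
# different mean levels. Pooling ignores which subject each row came from.
subject = [1, 1, 1, 2, 2, 2]
ws_pa = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
mvpa = [18.0, 20.0, 22.0, 38.0, 40.0, 42.0]

intercept, slope = fit_ols(ws_pa, mvpa)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(ws_pa, mvpa)]
for subject_id, residual in zip(subject, residuals):
    print(subject_id, residual)
```

All residuals for subject 1 fall below the pooled line and all residuals for subject 2 fall above it. Errors that cluster by subject in this way are what the i.i.d. assumption rules out, and they are what a random subject intercept is meant to absorb.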
A MELS or MEMLS model is an extension of a multilevel model. Multilevel models typically include random subject intercepts and possibly random slopes to account for the correlation of the repeated observations within subjects. However, ordinarily, multilevel models do not include covariates to predict WS variance (models _without_ Scale Parameters), but instead assume a common WS variance. Applying MELS or MEMLS models can allow covariates to influence the WS variance (models with Scale Parameters), and even further allow each subject to have their own degree of WS variance, above and beyond the effects of covariates (models with Random Scale). The random scale effect is often necessary to obtain correct inference for the covariate effects on the WS variance (please see Nordgren et al., 2020).
Here, we applied the same DV (MVPA) and IV (WS PA) in a MELS model in MixWILD. As mentioned, this model decomposes the variance of the dependent variable (i.e., daily MVPA) into BS and WS components. As shown in Figure 2-2, each subject is allowed to have their own intercept while the association (slope) between MVPA and WS PA remains the same.
Figure 2-2. MELS model (DV: MVPA; IV: WS PA)
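\[ y_{ij} = \beta_{0} + \beta_{1} x_{ij} + \boldsymbol{x}^{'}_{ij}\boldsymbol{\beta} + \upsilon_{i} + \varepsilon_{ij}, \quad \upsilon_{i} \sim N(0, \sigma^{2}_{\upsilon}), \quad \varepsilon_{ij} \sim N(0, \sigma^{2}_{\varepsilon}) \qquad (2.2) \]
In Equation 2.2, \(\beta_{0}\) is the grand intercept and \(x_{ij}\) is the regressor (i.e., WS PA) for the Mean model, and \(\beta_{1}\) is the corresponding regression coefficient. There is a new component, \(\upsilon_{i}\), which represents a subject's mean (as a deviation from the grand mean, \(\beta_{0}\)) and is referred to as a random location (intercept) effect in the MELS model. \(\boldsymbol{x}^{'}_{ij}\) are the other regressors for the Mean model and \(\boldsymbol{\beta}\) are the corresponding regression coefficients. Please note that bolded \(\boldsymbol{x}\) and \(\boldsymbol{\beta}\) represent vectors. In other words, there could be more than one variable, and therefore more than one _Beta_ coefficient. \(\varepsilon_{ij}\) is subject i's error at time j (the deviation of a subject's observations from their mean and the fixed part of the model).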
\[ \sigma^{2}_{\upsilon_{ij}} = \exp(\boldsymbol{u}^{'}_{ij}\boldsymbol{\alpha}) \qquad (2.3) \]
In the BS Variance submodel (Equation 2.3), \(\sigma^{2}_{\upsilon_{ij}}\) refers to the BS variance, \(\boldsymbol{u}^{'}_{ij}\) are the regressors (typically including “1” and other covariates) for the BS Variance model, and \(\boldsymbol{\alpha}\) is the corresponding vector of regression coefficients. The BS variance is subscripted by i and j to indicate that its value changes depending on the covariates \(\boldsymbol{u}_{ij}\) and their coefficients \(\boldsymbol{\alpha}\). Please note that bolded \(\boldsymbol{u}\) and \(\boldsymbol{\alpha}\) represent vectors. As mentioned, there could be more than one variable, and therefore more than one _Alpha_ coefficient. The exponential function is used to ensure a positive multiplicative factor, so the resulting BS variance is strictly positive.
Friendly Note: The BS variance (\(\sigma^{2}_{\upsilon}\)) is defined as the variance across subject-level means. Thanks to the intensive longitudinal study design, EMA data are rich and offer many modeling possibilities. The estimation of the BS variance in MELS models is more complicated than the traditional variance formula. As a result, the BS variance can be time-varying and can differ across occasions within a person. It is also possible for WS (time-varying) variables to influence the BS variance.
* Yellow area represents the magnitude of BS variance.
Figure 2-3. MELS BS submodel (DV: BS Variance in MVPA; IV: WS PA)
In Figure 2-3, the BS variance is represented by the dispersion of the subject lines. In particular, the amount of spread (yellow area) across the lines indicates the magnitude of the BS variance. For example, if the lines are close together then subjects are more similar (smaller variance), and vice versa. The magnitude of \(\sigma^{2}_{\upsilon}\) indicates how different subjects are from each other (heterogeneity).
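The log-linear form of the BS Variance submodel (the variance equals the exponential of a linear combination of regressors and coefficients) can be illustrated with a short Python sketch. The coefficient values below are hypothetical and are not estimates from any dataset; the function name `loglinear_variance` is also an assumption made for illustration.

```python
import math

def loglinear_variance(regressors, coefficients):
    """Variance implied by a log-linear submodel: exp(sum of regressor * coefficient)."""
    linear_predictor = sum(u * a for u, a in zip(regressors, coefficients))
    return math.exp(linear_predictor)

# Hypothetical coefficients for [intercept, Weekend, Sex].
alpha = [1.0, 0.3, -0.5]

weekday_female = loglinear_variance([1, 0, 0], alpha)
weekend_female = loglinear_variance([1, 1, 0], alpha)

print(weekday_female, weekend_female)
# The ratio of the two variances equals exp(coefficient of Weekend).
print(weekend_female / weekday_female, math.exp(0.3))
```

Because of the exponential, each coefficient acts multiplicatively: a positive coefficient inflates the variance, a negative one shrinks it, and the result can never be zero or negative.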
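\[ \sigma^{2}_{\varepsilon_{ij}} = \exp(\boldsymbol{w}^{'}_{ij}\boldsymbol{\tau}) \qquad (2.4) \]
\[ \sigma^{2}_{\varepsilon_{ij}} = \exp(\boldsymbol{w}^{'}_{ij}\boldsymbol{\tau} + \omega_{i}), \quad \omega_{i} \sim N(0, \sigma^{2}_{\omega}) \qquad (2.5) \]
\[ \sigma^{2}_{\varepsilon_{ij}} = \exp(\boldsymbol{w}^{'}_{ij}\boldsymbol{\tau} + \tau_{l}\upsilon_{i} + \omega_{i}), \quad \omega_{i} \sim N(0, \sigma^{2}_{\omega}) \qquad (2.6) \]
In the WS Variance submodel without a random scale effect (Equation 2.4), \(\sigma^{2}_{\varepsilon_{ij}}\) refers to the WS variance, \(\boldsymbol{w}^{'}_{ij}\) are the regressors (typically including “1” and other covariates) for the WS Variance model, and \(\boldsymbol{\tau}\) is the corresponding vector of regression coefficients. The magnitude of \(\sigma^{2}_{\varepsilon_{ij}}\) indicates how much the data vary within subjects (erraticism). In the WS Variance submodel with a random scale effect (Equation 2.5), the random scale effect, \(\omega_{i}\), allows the WS variance to vary across subjects beyond the contribution of covariates (Dzubur et al., 2020). In Equation 2.6, the coefficient \(\tau_{l}\) represents the linear association between a subject's location effect and the WS variance, and \(\upsilon_{i}\) refers to the regressor of the subject-level random location/intercept. It is also possible to have both a linear (\(\upsilon_{i}\)) term and a quadratic (\(\upsilon_{i}^{2}\)) term in the MELS WS Variance submodel, with coefficients represented by \(\tau_{l}\) and \(\tau_{q}\), respectively.
The WS variance is subscripted by i and j to indicate that its value changes depending on the covariates \(\boldsymbol{w}_{ij}\) and their coefficients \(\boldsymbol{\tau}\). Please note that bolded \(\boldsymbol{w}\) and \(\boldsymbol{\tau}\) represent vectors. In other words, there could be more than one variable, and therefore more than one _Tau_ coefficient. The exponential function is used to ensure that the resulting WS variance is positive.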
* Green area represents the magnitude of the WS variance.
Figure 2-4. MELS WS submodel with Random Scale effect (DV: WS Variance in MVPA; IV: WS PA)
In Figure 2-4, the variation of the points within a subject relative to each subject's line indicates the WS variance (green area). Subject 1, with blue dots, has a higher WS variance than Subject 2, with orange dots. This difference in the WS variance across subjects is what the random scale effect (\(\omega_{i}\)) represents in Equations 2.5 and 2.6.
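A small simulation can make the role of the random scale effect concrete. The Python sketch below generates observations for two hypothetical subjects who share the same covariate values and mean level but differ in their random scale effect. The coefficients, the scale effects, and the function names are made up for illustration and do not come from MixWILD.

```python
import math
import random
from statistics import variance

def ws_variance(w, tau, omega_i):
    """WS variance under a log-linear submodel with a random scale effect."""
    return math.exp(sum(wk * tk for wk, tk in zip(w, tau)) + omega_i)

def simulate_subject(n_obs, subject_mean, w, tau, omega_i, rng):
    """Draw n_obs observations around subject_mean with the implied WS variance."""
    sd = math.sqrt(ws_variance(w, tau, omega_i))
    return [rng.gauss(subject_mean, sd) for _ in range(n_obs)]

rng = random.Random(2020)  # fixed seed so the sketch is reproducible
tau = [2.0, 0.4]           # hypothetical [intercept, Weekend] coefficients
w = [1, 0]                 # a weekday observation

erratic = simulate_subject(500, 30.0, w, tau, omega_i=0.8, rng=rng)
steady = simulate_subject(500, 30.0, w, tau, omega_i=-0.8, rng=rng)

# Implied variance next to the sample variance of the simulated data.
print(ws_variance(w, tau, 0.8), variance(erratic))
print(ws_variance(w, tau, -0.8), variance(steady))
```

Both subjects have identical covariates, so a model without a random scale effect would assign them the same WS variance. The random scale effect is what lets one subject be more erratic than the other.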
Figure 2-5. An Example of Stage 1 Regressors in Mean, BSV and WSV Submodels
Table 2-1. An Example of Stage 1 Regressors in Mean, BSV and WSV Submodels
Model | Regressor | Regressor List |
---|---|---|
Mean model | \(x^{'}\) | Weekend (LV1), Age (LV2), Sex (LV2) |
BS Variance Submodel | \(u^{'}\) | Weekend (LV1), Sex (LV2) |
WS Variance Submodel | \(w^{'}\) | Weekend (LV1), Sex (LV2) |
Please note that although we have used different letters (\(x\), \(u\), and \(w\)) to represent the covariates in the different submodels (Mean, BS Variance, and WS Variance submodels), there is no restriction, and the same covariates could be used (Dzubur et al., 2020). For instance, Figure 2-5 and Table 2-1 show that the regressors are identical in the BS and WS Variance submodels (Weekend and Sex), while a different variable set (Weekend, Age, and Sex) is used in the Mean model.
Per Hedeker and Nordgren (2013), the parameters of these models (Mean, WS Variance, and BS Variance submodels) are estimated using maximum likelihood and the Newton–Raphson algorithm. Once the model has converged to a solution, empirical Bayes methods (Bock, 1989) are used to obtain subject-specific estimates for \(\upsilon_{i}\) (random location intercept) and \(\omega_{i}\) (random scale), along with the variance-covariance matrix associated with these estimates, which are saved for potential use in a Stage 2 model. These correspond to estimates of the mean and variance-covariance of the posterior distribution of the random effects (Dzubur et al., 2020).
Extending the model presented in the previous section (2.1.2 MELS), one may be interested in understanding how the slopes of the lines vary by subject for time-varying covariates. Such random slopes can be used to generalize the above model, allowing for a vector of random location effects instead of only a random intercept.
Figure 2-6. MEMLS model (DV: MVPA; IV: WS PA)
MixWILD can allow for random subject intercept and slope(s) (of time-varying covariates) in the Mean model, which we refer to as MEMLS models to reflect the multiple location random effects. In all, these subject-specific parameters indicate a baseline mean level (random intercept when the covariate(s) equal 0), the effect of (time-varying) covariate(s) on the mean (random slope), and the degree of within-subject variability (random scale) (Dzubur et al., 2020).
Figure 2-6 shows how the slope of WS PA predicting daily MVPA can vary by subject. The average across all subjects is depicted with the solid black line, and the location averages (mean plus slope) of four subjects are presented as dashed lines.
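\[ y_{ij} = \beta_{0} + \beta_{1} x_{ij} + \boldsymbol{x}^{'}_{ij}\boldsymbol{\beta} + \upsilon_{0i} + \upsilon_{1i} x_{ij} + \varepsilon_{ij}, \]
\[ \upsilon_{0i} \sim N(0, \sigma^{2}_{\upsilon_{0}}), \quad \upsilon_{1i} \sim N(0, \sigma^{2}_{\upsilon_{1}}), \quad \varepsilon_{ij} \sim N(0, \sigma^{2}_{\varepsilon}) \qquad (2.7) \]
In Equation 2.7, \(\beta_{0}\) is the grand intercept and \(x_{ij}\) is the regressor (i.e., WS PA) for the Mean model, and \(\beta_{1}\) is the corresponding regression coefficient. The random intercept, \(\upsilon_{0i}\), represents a subject's mean when the regressor (i.e., WS PA) equals zero, expressed as a deviation from the grand intercept, \(\beta_{0}\). The random slope, \(\upsilon_{1i}\), indicates the extra random part (association) for a subject beyond the average slope of \(x_{ij}\) predicting \(y_{ij}\). It is possible to have multiple random slope effects in a MEMLS model. \(\boldsymbol{x}^{'}_{ij}\) are the other regressors for the Mean model and \(\boldsymbol{\beta}\) are the corresponding regression coefficients. Please note that bolded \(\boldsymbol{x}\) and \(\boldsymbol{\beta}\) represent vectors. \(\varepsilon_{ij}\) is subject i's error at time j (the deviation from the subject's trend line).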
Like MELS, MEMLS has Mean and WS Variance submodels, in which covariates can be included to examine their effects on the mean and WS variance. However, as mentioned, MEMLS augments the MELS model by including multiple random subject effects in the Mean model (i.e., both random intercept and slope(s)). In this way, the BS variance-covariance is a function of the random intercept as well as slopes. Please note the random slope effect is only possible for time-varying covariates.
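\[ \boldsymbol{\Sigma}_{\upsilon} = \begin{bmatrix} \sigma^{2}_{\upsilon_{0}} & \sigma_{\upsilon_{0}\upsilon_{1}} \\ \sigma_{\upsilon_{0}\upsilon_{1}} & \sigma^{2}_{\upsilon_{1}} \end{bmatrix} \qquad (2.8) \]
As the BS variance-covariance matrix (2.8) shows, \(\sigma^{2}_{\upsilon_{0}}\) and \(\sigma^{2}_{\upsilon_{1}}\) refer to the variances of the random intercept and slope effects, respectively, and \(\sigma_{\upsilon_{0}\upsilon_{1}}\) represents the covariance. By examining the variances and covariance(s), the MEMLS Stage 1 model can indicate the degree to which the random intercept and slope vary across subjects. The covariance shows the degree to which the random intercept and random slope are associated with each other.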
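Because a covariance depends on the scales of both random effects, it is often easier to read as a correlation. The Python sketch below shows the standard conversion (covariance divided by the product of the two standard deviations). The input values are hypothetical, and the function name `intercept_slope_correlation` is an assumption made for illustration.

```python
import math

def intercept_slope_correlation(var_intercept, var_slope, covariance):
    """Convert the elements of a 2x2 BS variance-covariance matrix into a correlation."""
    return covariance / math.sqrt(var_intercept * var_slope)

# Hypothetical values, not estimates from any dataset.
correlation = intercept_slope_correlation(var_intercept=4.0, var_slope=0.25, covariance=-0.5)
print(correlation)
```

A negative correlation such as this one would mean that subjects with higher intercepts tend to have lower (less positive) slopes.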
\[ \sigma^{2}_{\varepsilon_{ij}} = \exp(\boldsymbol{w}^{'}_{ij}\boldsymbol{\tau} + \omega_{i}), \quad \omega_{i} \sim N(0, \sigma^{2}_{\omega}) \qquad (2.9) \]
In the WS Variance submodel (Equation 2.9), \(\sigma^{2}_{\varepsilon_{ij}}\) refers to the WS variance, \(\boldsymbol{w}^{'}_{ij}\) are the regressors (typically including “1” and other covariates) for the WS Variance model, and \(\boldsymbol{\tau}\) is the corresponding vector of regression coefficients. \(\omega_{i}\) represents the random scale effect. As discussed in Hedeker and Nordgren (2013), an association between the location and scale random effects can be induced by including the location random effects (\(\upsilon_{0i}\) and \(\upsilon_{1i}\)) as predictors in the WS Variance model, using terms from the Cholesky decomposition of the variance-covariance matrix. In this regard, MixWILD allows for two possibilities to describe the relationship between random location and random scale: (1) no association (the association terms equal 0) or (2) association (the association terms differ from 0) (Dzubur et al., 2020). However, in the current version of MixWILD, there is no option for selecting “No Association”. When MEMLS models include the random scale effect, the association between random location and random scale is always estimated; this is the default setting. The “No Association” option will be available in a future update.
MEMLS can have more than two random location effects (i.e., multiple random slopes), though a random intercept and one random slope is the most common specification. The variance-covariance matrix (2.10) represents a MEMLS model with one random intercept, one random slope, and one random scale. Estimating the variance-covariance in this model requires a \(3 \times 3\) matrix. If there are n random location effects and one random scale effect in a MEMLS model, the size of the variance-covariance matrix will be \((n+1) \times (n+1)\). In this case, the model becomes more complicated, and estimation time increases significantly with each additional random effect.
\[ \begin{bmatrix} \sigma^{2}_{\upsilon_{0}} & \sigma_{\upsilon_{0}\upsilon_{1}} & \sigma_{\upsilon_{0}\omega} \\ \sigma_{\upsilon_{0}\upsilon_{1}} & \sigma^{2}_{\upsilon_{1}} & \sigma_{\upsilon_{1}\omega} \\ \sigma_{\upsilon_{0}\omega} & \sigma_{\upsilon_{1}\omega} & \sigma^{2}_{\omega} \end{bmatrix} \qquad (2.10) \]
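The growth in model size can be tabulated with a few lines of Python. The sketch below counts the dimension of the random-effect variance-covariance matrix and the number of distinct elements in a symmetric matrix of that size. It is plain arithmetic; the function name `random_effect_matrix_size` is hypothetical, and the count is not a statement about which parameters MixWILD actually estimates.

```python
def random_effect_matrix_size(n_location, random_scale=True):
    """Return (dimension, number of distinct elements) of the symmetric matrix."""
    dim = n_location + (1 if random_scale else 0)
    distinct = dim * (dim + 1) // 2  # diagonal plus one triangle
    return dim, distinct

for n_location in range(1, 5):
    dim, distinct = random_effect_matrix_size(n_location)
    print(f"{n_location} location effect(s): {dim} x {dim} matrix, {distinct} distinct elements")
```

With one random intercept and one random slope (two location effects) plus a random scale, the matrix is 3 x 3 with 6 distinct elements. Adding a second random slope raises this to 4 x 4 with 10 distinct elements.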
In MixWILD, the outcome of the Stage 1 model (both MELS and MEMLS) can be continuous, dichotomous, or ordinal. However, a random scale parameter is not available in Stage 1 models with dichotomous outcomes, because dichotomous outcomes generally do not provide enough variance to allow for estimation of the random scale effect.
Table 2-2. Summary of Submodels in MELS and MEMLS
Submodel | Outcome | MELS | MEMLS |
---|---|---|---|
Mean Model with Random Intercept Only | Original outcome | √ | N/A |
Mean Model with Random Intercept and Random Slope(s) | Original outcome | N/A | √ |
BS Variance Submodel | Variance of the subject-level means in outcome | √ | N/A |
WS Variance Submodel | Within-subject variance of the outcome | √ | √ |
* Note: √ – Available; N/A – Not Available
Friendly Note: As shown in Table 2-2, the WS Variance submodel and Random Scale effects are distinctive features of MELS and MEMLS models. Models without random scale can still include modeling of the WS variance as a function of covariates. If, in addition to omitting random scale, they also do not include WS Variance modeling, then they are equivalent to standard mixed-effects (also known as multilevel) models.