1 Chapter 1 Introduction

In this chapter, you will learn the basic concepts of MixWILD. You will also learn how to install the MixWILD program (Section 1.3. Preparing for Use) and how to load data and start an analysis using MixWILD (Section 1.4. Step-by-step Instructions to Import Data in MixWILD).


1.1 What is MixWILD

Mixed model analysis With Intensive Longitudinal Data (MixWILD) is a standalone statistical software program that examines the effects of subject-level parameters (intercept, slope(s), and scale) derived from time-varying variables on a subject-level outcome or an outcome nested within time or clusters, specifically in the context of studies using intensive sampling methods, such as ecological momentary assessment (EMA). It combines estimation of a Stage 1 mixed-effects model, including random subject effects, with a subsequent Stage 2 regression in which values sampled from each subject's random effect distributions can be used as regressors (Dzubur et al., 2020).

1.1.1 MixWILD Stage 1 vs. Stage 2 models

In MixWILD, we distinguish two parts of the total model: 1) ‘Stage 1 mixed-effects model’, in which random subject intercept and slopes (location effects) and random subject within-subjects variance (scale effect) are estimated, and 2) ‘Stage 2 model’, in which the Stage 1 random subject effect estimates are used as regressors and interaction terms to predict a Stage 2 outcome.

In Stage 1, a mixed-effects location-scale (MELS) model is specified, which extends the conventional mixed-effects regression model by allowing modeling of both the between-subject (BS) and within-subjects (WS) variances in terms of covariates (Hedeker et al., 2008). Specifically, log-linear submodels for the BS and WS variances are used, allowing covariates to influence both types of variance. Additionally, besides the inclusion of a random subject effect in the Mean model, a random subject (scale) effect is added to the WS variance specification, allowing the WS variance to be subject-specific, as well as influenced by covariates. Thus, these MELS models include both random subject location and scale effects, which are estimated using empirical Bayes methods (Bock, 1989). Additionally, MixWILD can allow for random subject intercept and slopes (of time-varying covariates) in the Mean model, which we refer to as mixed-effects multiple location scale (MEMLS) models. In all, these subject-specific parameters indicate a baseline mean level (random intercept), the effect of a (time-varying) covariate on the mean (random slope), and the degree of within-subject variability (random scale) (Dzubur et al., 2020). These random subject effects from the Stage 1 model can then be used in a regression model to predict a Stage 2 outcome. This Stage 2 outcome can be a subject-level outcome or outcome nested within time or clusters, and can be of four different outcome types: continuous (normal), dichotomous/ordinal, count, or nominal. The random effects (estimated in Stage 1) can be included in the Stage 2 regression model as main effects and as interactions with other regressors.

(Please check Dzubur et al., 2020 for more details)

* RL: Random Location; RS: Random Scale.

Figure 1-1. Illustration of two-stage model.

For example, you may have a MELS model which estimates day-level minutes of moderate-to-vigorous physical activity (\(y_{1}\) = MVPA) in Stage 1 as a function of various time-varying and time-invariant predictors (\(x_{1}\)~\(x_{n}\), e.g., positive affect and sex). As shown in Figure 1-1, MixWILD will generate output that reports the regression coefficients and the effects of the various covariates on the BS and WS variance in MVPA. Meanwhile, the software also creates estimates of the random subject effects of MVPA (i.e., random location and random scale effects) from the Stage 1 MELS model. Following the Stage 1 model, you can use the random location effect of MVPA (MVPA mean effect) and the random scale effect of MVPA (MVPA variability effect) to predict one's BMI status (\(y_{2}\) = BMI) in the Stage 2 model in Figure 1-1. By doing this, you can understand whether and how the day-level MVPA mean and variability influence one's BMI.

1.1.2 General Terms

To set the stage for the discussion to follow, it is helpful to present a unified terminology for the various aspects of MixWILD.

  • Random Location: Random Location (intercept) effect is defined as the degree to which a subject deviates from the population average. It typically represents a subject-level mean effect, over and above the effects of covariates on the mean, and accounts for the non-independence of observations within subjects (Dzubur et al., 2020). For example, a random location effect of positive affect can be interpreted as the subject-level mean of positive affect, beyond the effects of covariates on positive affect. It is a random effect because the subjects in the sample are thought to represent a population.
  • Random Scale: Random Scale effect is defined as the degree of within-subject variability, over and above the effects of covariates on the within-subject variability (Dzubur et al., 2020). Random scale parameters allow subjects to have individual estimates of the within-subject variance, and this is the distinguishing feature of MELS and MEMLS models. It can be thought of as intraindividual variability or variance. For example, a random scale effect of MVPA can be interpreted as the degree of subject-specific within-subject variance in MVPA.
  • Random Slope: Random Slope effect refers to associations between time-varying covariates and the time-varying outcome. In addition to the random location (intercept), random slope is also a type of location effect as it represents the relationship of the covariate and the mean response for each subject. For example, a random slope effect of momentary positive affect and weekends/weekdays (weekends = 1; weekdays = 0) can be interpreted as the subject-level change in the mean of positive affect on weekends relative to weekdays, beyond the effects of covariates on positive affect.
  • Level 1 variable: This is a basic concept in Mixed-effects model aka Multilevel Modeling (MLM) or Hierarchical Linear Modeling (HLM), and it denotes a variable at the lowest level. For example, an EMA study might include 10-20 observations (person-time or prompt) for each subject (person). The variables at observation level (person-time or prompt) such as affective states or cognitive responses are all considered as level 1 variables.
  • Level 2 variable: In contrast, a Level 2 variable denotes a variable at the higher level in a Mixed-effects model. A level 2 variable typically represents a person-level variable such as sex, race, or baseline age. It is also possible to have level 3 variables if the structure of the data includes three or more tiers (e.g., subject (Level 3) – wave (Level 2) – prompt (Level 1)).

Figure 1-2. Structure of Level 1 & Level 2 in mixed-effects data.

  • Between-subject (BS) Variance submodel: The BS Variance submodel allows for prediction of the BS variance (of the outcome) in terms of regressors in the Stage 1 model. For example, researchers may use social context (“alone” or “with other”) in EMA data to predict how homogeneous or heterogeneous subjects are in terms of their positive affect when they are alone versus when they are with others.
  • Within-subjects (WS) Variance submodel: The WS Variance submodel allows for prediction of the WS variance (of the outcome) in terms of regressors in the Stage 1 model. For example, researchers may ask whether the degree of consistency/erraticism in mood within children is influenced by dietary intake behaviors (e.g., eating high-sugar snacks).
  • MELS model: This refers to a mixed-effects location-scale (MELS) model which generally includes three components: a) Mean model, b) BS Variance submodel, and c) WS Variance submodel; along with a random subject intercept and a random subject scale effect.
  • MEMLS model: This refers to a mixed-effects multiple location scale (MEMLS) model, which augments the MELS model by including multiple subject-level random location effects in the Mean model (i.e., both random intercept and slope(s)). In this way, the BS variance is a function of the random intercept and slopes. The model still allows for the WS variance submodel, as well as the random scale effect.
  • Stage 1 model: The Stage 1 model is a MELS or MEMLS model that provides estimates of the random subject effects, which then can be used as predictors in a Stage 2 model. In addition to the random effects (location and scale), the Stage 1 model can include covariates for the Mean and Variance submodels. In this way, the random effects can either be unadjusted (submodels with no covariates) or adjusted for covariates (submodels including covariates).
  • Stage 2 model: Stage 2 model is a regression that allows a subject-level outcome or outcome nested within time or clusters to be influenced by the Stage 1 random effects, as well as other covariates. This allows researchers to test whether the (Stage 1) random effects have predictive, mediating, and/or moderating effects on the Stage 2 outcome. As indicated, the Stage 2 model could be a subject-level or multilevel model, and the outcome types that are allowed include continuous (normal), binary, ordinal, nominal, or count outcomes.
  • Two-stage model: This means the combination of Stage 1 (a MELS or MEMLS model) and Stage 2 models in MixWILD. The two-stage model allows for prediction of the outcome in Stage 1 model and the MELS/MEMLS model parses the Stage 1 variance into the effects of locations (i.e., intercept and slope(s)) and variability, which are used as predictors in a regression model of associations with the Stage 2 outcome. For example, one may wonder if there is an association between weekends/weekdays and momentary negative affect (Stage 1 outcome) and may also want to explore if the difference of negative affect on weekends vs. weekdays predicts eating disorders (Stage 2 outcome). MixWILD can generate a MEMLS model to test the association between negative affect (Stage 1 outcome) and weekends/weekdays in Stage 1 model. Then using the random slope effect of weekends/weekdays predicting negative affect in Stage 2 model allows for examining if the change in negative affect on weekends relative to weekdays is associated with eating disorders.


* Stage 1 model must be a multilevel model.
* Stage 2 model could be a subject-level or multilevel model.

Figure 1-3. Simple illustration of Stage 1 & Stage 2 models.


1.2 Why MixWILD?

MixWILD gives researchers more capabilities to conduct sophisticated research designs and to answer research questions in more detail. MixWILD is useful for a variety of Intensive Longitudinal Data (ILD) collection strategies (e.g., EMA, sensors) as a robust and reproducible method to test predictors of variability in level 1 outcomes and the associations between subject-level parameters (variances and slopes) and level 2 outcomes (Dzubur et al., 2020). For example, MixWILD can address research questions such as the following.

  1. How do means, variances, and slopes of intensively time-varying variables predict overall subject-level outcomes?
  2. Do means, variances, and slopes of intensively time-varying variables mediate or moderate the effects of time-invariant factors on subject-level outcomes?
  3. How do changes in means, variances, and slopes for intensively time-varying variables (over time or within clusters of people) predict subject-level outcomes?
  4. How do means, variances, and slopes for intensively time-varying variables predict changes in outcomes (over time or within clusters of people)?
  5. What are the relative predictive strengths of two or more means, variances, and slopes of intensively time-varying variables on subject-level outcomes?
  6. How do means, variances and slopes of intensively time-varying count variables or ordinal variables predict subject-level outcomes?

1.2.1 Innovative approach to assess intraindividual means and variances

In EMA studies, it is common to have up to 30 or 40 observations per subject, and this allows us to model subject-level variances (i.e., how erratic is a subject's mood?) and slopes (i.e., how much does a subject's mood change across contexts?) for time-varying variables. Traditionally, intraindividual means and variances from EMA data have been computed for each person using standard formulas, such as subject-level variability statistics (e.g., the standard deviation, SD, and the mean squared successive difference, MSSD) (Jahng et al., 2008). These strategies ignore the fact that subjects can have unequal numbers of EMA observations and cannot account for the effects of covariates. In contrast, the MixWILD approach recognizes that subjects can vary in terms of their numbers of observations, and can generate variability estimates adjusted for the effects of other variables. Furthermore, MixWILD uses the plausible values resampling approach to account for the variability that is inherent in these estimates, rather than simply treating them as known predictors (which is what is assumed if one uses SD estimates as regressors). Because MixWILD does not ignore this source of variation, it avoids the additional false positive results that can arise when it is ignored.
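For comparison, the traditional per-subject summaries mentioned above can be sketched in a few lines of Python. This is only an illustration of the "standard formulas" approach, not of what MixWILD does; the function name `subject_summaries` and the toy values are made up for this example.

```python
from collections import defaultdict
from statistics import mean, stdev


def subject_summaries(rows):
    """Compute n, mean, SD, and MSSD per subject.

    rows: iterable of (subject_id, value), in time order within each subject.
    """
    by_subject = defaultdict(list)
    for subject_id, value in rows:
        by_subject[subject_id].append(value)

    summaries = {}
    for subject_id, values in by_subject.items():
        if len(values) < 2:
            continue  # SD and MSSD need at least two observations
        # Successive differences between consecutive observations
        diffs = [later - earlier for earlier, later in zip(values, values[1:])]
        summaries[subject_id] = {
            "n": len(values),
            "mean": mean(values),
            "sd": stdev(values),                      # sample SD
            "mssd": mean([d * d for d in diffs]),     # mean squared successive difference
        }
    return summaries


# Toy data: subject 1 has three prompts, subject 2 has four (unequal n)
rows = [(1, 2.0), (1, 4.0), (1, 3.0), (2, 5.0), (2, 5.0), (2, 6.0), (2, 4.0)]
summaries = subject_summaries(rows)
```

Note that each summary is a single number per subject: it carries no information about how precisely it was estimated (3 versus 4 prompts here) and it is not adjusted for any covariate, which are the two limitations described above.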

1.2.2 Extension to behavioral processes and health outcomes

MixWILD can conduct MELS/MEMLS modeling that parses the variance of a Stage 1 outcome into mean and variability, which are used as predictors in Stage 2 regression models of a new outcome. This two-stage approach can be useful to address questions of interest in a variety of research areas in the social and behavioral sciences. These tools can be used to analyze data from, and ultimately to design, new mHealth interventions, Intensively Adaptive Interventions (IAIs), and Just-In-Time Adaptive Interventions (JITAIs) (Nahum-Shani et al., 2018), where subjects provide a greater number of data points over time than in traditional RCT studies. For example, by examining EMA data using the MixWILD two-stage models, researchers are able to ask critical research questions such as whether erratic mood mediates the effects of depression on physical activity, or whether the effects of living in a highly walkable neighborhood on physical activity are attenuated for individuals with unstable self-efficacy beliefs.


1.3 Preparing for Use

1.3.1 Download Software

  1. Visit: https://reach-lab.github.io/MixWildGUI/
  2. On the web page, submit your email address before downloading the application to receive notifications of major software updates.
  3. Click on macOS or Windows to download the program.
  4. Select your directory to save the program.
  5. When finished downloading, double-click on the MixWILD icon and follow the instructions to complete installation.

1.3.2 Install Software

If this is your first time installing MixWILD, Windows may ask you to take some extra steps to install the software successfully.

  1. Click MixWILD-2.0.exe, and click [More info] to continue the process.

  2. Click [Run anyway].

  3. Click [Install] to complete the installation.

1.3.3 Set Up Data

  1. The dataset should be saved in a folder, and the folder name CANNOT contain any SPACE ( ). For example, do not name your folder "My Data", which will lead to an error; replace the space with an underscore and name it "My_Data" instead.
  2. The dataset should be saved as a .csv file with variable names in the first row.
  3. Data should be in the long format and sorted ascending or descending by ID number.
  4. Missing values should not be blank or periods (.) in the dataset and should be coded as numeric values only (e.g., "-999").
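The preparation rules above can be applied with a short script before opening the data in MixWILD. The following is a minimal Python sketch using only the standard library; the helper name `prepare_rows`, the column name `ID`, and the in-memory sample data are assumptions made for illustration, and this is not a tool shipped with MixWILD.

```python
import csv
import io

MISSING_CODE = "-999"  # numeric missing-value code (an example; any numeric code can be used)


def prepare_rows(reader, id_column="ID"):
    """Recode blank or '.' cells to MISSING_CODE and sort rows by subject ID."""
    rows = []
    for row in reader:
        cleaned = {}
        for name, value in row.items():
            text = (value or "").strip()
            cleaned[name] = MISSING_CODE if text in ("", ".") else text
        rows.append(cleaned)
    # list.sort() is stable, so the order of prompts within a subject is kept
    rows.sort(key=lambda r: float(r[id_column]))
    return rows


# Small in-memory example in long format: unsorted, with one '.' and one blank
raw = io.StringIO("ID,PosMood,SEX\n2,3.5,1\n1,.,0\n2,,1\n1,4.0,0\n")
reader = csv.DictReader(raw)
rows = prepare_rows(reader)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
prepared_csv = out.getvalue()

# Folder names must not contain spaces; underscores are a simple replacement
safe_folder = "My Data".replace(" ", "_")
```

With real files, `io.StringIO` would be replaced by `open(...)` on the input and output paths, with the output written into a folder such as `safe_folder`.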

1.3.4 Example Data

You may also find our MixWILD example dataset in the link below.

https://reach-lab.github.io/MixWildGUI/Mixwild_example_data.csv

This example dataset provides the basic elements that MixWILD needs for an analysis. Some of the tutorials in the following chapters use this dataset. We encourage you to download it and give it a try, especially if you don't have an EMA dataset on hand. Here, we present the steps involved in using the program with the example dataset, where EMA data were obtained from 94 subjects (1,635 observations). Variables in the example dataset include:

Variable          Level   Description
ID                        Participant ID number
AGE               2       Number of years (centered around mean age = 29.29)
SEX               2       0 (female); 1 (male)
WEEKEND           1       0 (weekday); 1 (weekend)
DOW               1       Day of week; 0: Monday, 1: Tuesday, …, 6: Sunday
OBESE             2       0 (not obese); 1 (obese)
BMI               2       Body Mass Index (mean = 24.22, min = 13.9, max = 48.68)
NegMood           1       Levels of negative affect reported in each prompt
PosMood           1       Levels of positive affect reported in each prompt
MVPA_daily_mins   2       Daily averaged moderate-to-vigorous physical activity time in minutes
SED_daily_hours   2       Daily averaged sedentary time in hours
NEGMOOD10         1       A decile (10 category) recoding of the variable NEG_AFFECT
POSMOOD10         1       A decile (10 category) recoding of the variable POS_AFFECT
MVPA_daily4       2       A quartile (4 category) recoding of the variable MVPA_daily_mins

The data are sorted by ID - this is important as the program will not produce correct results if the data are not sorted by the level-2 ID variable. Also, the variables in the dataset are numeric only (i.e., no letters or non-numeric text can be present in the dataset) and the variables are separated by tabs, commas, or one or more spaces in the dataset.
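Because unsorted or non-numeric data lead to incorrect results, it can be worth checking a file before loading it. The sketch below is one possible check written in Python with the standard library only; `check_dataset` is a hypothetical helper, not part of MixWILD, and it assumes a comma-separated file with an `ID` column.

```python
import csv
import io


def check_dataset(text, id_column="ID"):
    """Return a list of problems found in CSV text; an empty list means none were found."""
    problems = []
    ids = []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 holds the variable names
        for name, value in row.items():
            try:
                float(value)
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: non-numeric value {value!r} in {name}")
        try:
            ids.append(float(row[id_column]))
        except (TypeError, ValueError, KeyError):
            pass  # already reported above, or the ID column is absent
    ascending = all(a <= b for a, b in zip(ids, ids[1:]))
    descending = all(a >= b for a, b in zip(ids, ids[1:]))
    if not (ascending or descending):
        problems.append(f"rows are not sorted by {id_column}")
    return problems


good = "ID,PosMood\n1,3.5\n1,-999\n2,4.0\n"
bad = "ID,PosMood\n2,3.5\n1,high\n2,4.0\n"
good_problems = check_dataset(good)
bad_problems = check_dataset(bad)
```

Here `good_problems` is empty, while `bad_problems` reports the text value on line 3 and the unsorted IDs.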

1.3.5 Resources for more data sets

There are more resources for EMA data sets which are publicly available. You may search on “intensive longitudinal” or “EMA” to access more datasets.

Harvard Dataverse is a repository for research data. Deposit data and code here.

https://dataverse.harvard.edu/

Texas Data Repository Dataverse is a research data management system for Texas Digital Library (TDL) member institutions.

https://dataverse.tdl.org/

1.3.6 Compatibility Notes for Windows and macOS users

The user interface for MixWILD runs in a Java runtime environment that provides feature parity between Windows and Mac versions. Native 64-bit binaries for macOS and Windows written in Fortran are used to execute statistical analyses and generate model output.

  • To allow for compatibility with older operating systems and architecture, the Windows version features an option to use 32-bit binaries.
  • Users running MixWILD in a virtual machine, such as VMware or Parallels, should ensure working directories are isolated from hypervisor processes that allow sharing between host and guest. These include common directories such as Downloads, Desktop, and Documents. Instead, create a new folder located at “C:/MixWILD” to improve compatibility.

Please check the project on GitHub for more details:

https://github.com/reach-lab/MixWildGUI

1.3.7 Statistical Computations

The code for MixWILD is written in FORTRAN and uses maximum likelihood estimation, utilizing both the expectation-maximization (EM) algorithm and the Newton–Raphson method to obtain the parameter estimates. Additionally, the mean and variance of each subject's random effects are estimated using empirical Bayes methods (Dzubur et al., 2020). More information about the estimation procedure can be found in the MIXREGLS manual (Hedeker & Nordgren, 2013).

1.3.8 Suggested References

Software

  • Dzubur, E., Ponnada, A., Nordgren, R., Yang, C. H., Intille, S., Dunton, G., & Hedeker, D. (2020). MixWILD: A program for examining the effects of variance and slope of time-varying variables in intensive longitudinal data. Behavior Research Methods, 1-25.

  • MixWILD: A tutorial of a simple multilevel model with intensive longitudinal data (Link: https://youtu.be/ZqyCxrMG1R8 provided by Eldin Dzubur)

  • Hedeker, D., & Nordgren, R. (2013). MIXREGLS: a program for mixed-effects location scale analysis. Journal of statistical software, 52(12), 1.

Methodology

  • Nordgren, R., Hedeker, D., Dunton, G., & Yang, C. H. (2020). Extending the mixed‐effects model to consider within‐subject variance for Ecological Momentary Assessment data. Statistics in Medicine, 39(5), 577-590.
  • Hedeker, D., Mermelstein, R.J., & Demirtas, H. (2012). Modeling between‐subject and within‐subject variances in ecological momentary assessment data using mixed‐effects location scale models. Statistics in medicine, 31(27), 3328-3336.

Applied Papers

  • Maher, J. P., Dzubur, E., Nordgren, R., Huh, J., Chou, C. P., Hedeker, D., & Dunton, G. F. (2019). Do fluctuations in positive affective and physical feeling states predict physical activity and sedentary time? Psychology of Sport and Exercise, 41, 153-161.
  • Maher, J.P., Huh, J., Intille, S., Hedeker, D., & Dunton, G.F. (2018). Greater variability in daily physical activity is associated with poorer mental health profiles among obese adults. Mental Health and Physical Activity, 14, 74-81.
  • Maher, J. P., Ra, C. K., Leventhal, A. M., Hedeker, D., Huh, J., Chou, C. P., & Dunton, G. F. (2018). Mean level of positive affect moderates associations between volatility in positive affect, mental health, and alcohol consumption among mothers. Journal of abnormal psychology, 127(7), 639.
  • Yang, C. H., Maher, J. P., Ponnada, A., Dzubur, E., Nordgren, R., Intille, S., & Dunton, G. F. (2020). An empirical example of analysis using a two-stage modeling approach: within-subject association of outdoor context and physical activity predicts future daily physical activity levels. Translational Behavioral Medicine.

1.4 Step-by-step Instructions to Import Data in MixWILD

  1. Open MixWILD: Double-click on the MixWILD icon to open the main window.
  2. Start to load data:
  • Start with New CSV file: Load the file from your local directory if you want to start a new model. Make sure your dataset is a CSV file with variable names in the first row. In addition, all data should be numerical values (i.e., integers or floats), except for the first row. Finally, data should be in the long format and sorted ascending or descending by ID number.
  • Please make sure your dataset is saved in a folder whose name does not contain any spaces.
  • Start with Previous Model: You may reload the configuration from your previous model settings and continue the analyses. The model settings are saved as a MW file (e.g., configuration.mw).

  3. Create a new title: This is optional and just for your own reference in the model.
  4. Format missing value: Click on the missing value options to indicate whether there are any missing values in your dataset; if so, specify the numeric missing value code in the box (e.g., ‘-999’).

  5. Select the type of Stage 1 outcome: The Stage 1 outcome should be a time-varying (i.e., multilevel) outcome, and can be of three different outcome types:
  • Continuous: Continuous outcomes are numerical responses that arise from a measuring process. Weight is an example of a continuous numerical variable, because the response can take on any value within a continuum or interval.
  • Dichotomous: Dichotomous outcomes are binary categorical responses, such as “Yes” or “No” answers.
  • Ordinal: Ordinal outcomes refer to ordered categorical responses, such as from a Likert scale (“disagree,” “not sure,” “agree”) or symptom severity (“low,” “medium,” “high”).
  • Select the regression type of Stage 1 outcome: You may choose between a Probit or Logistic model if your Stage 1 outcome is dichotomous or ordinal.

  6. Specify random location effects:
  • Select “Intercept only”, and the model includes a random subject intercept only (in the Mean model).
  • Select “Intercept and slope(s)”, and the model includes a random subject intercept and random subject slope(s) for one or more time-varying variables (in the Mean model).

  7. Include estimates of random scale: Random scale parameters allow subjects to have individual estimates of the within-subject variance (i.e., random subject scale effects), and this is the distinguishing feature of a mixed-effects location scale model. Random scale is not allowed when the outcome is dichotomous.

  8. Select Stage 2 model: MixWILD allows a two-stage modeling approach in which the Stage 1 random effects (i.e., intercept, slope(s), scale) are used as regressors to predict an outcome in a Stage 2 model. Select “Yes” if you want this two-stage approach, or “No” when you just want a Stage 1 model.

  9. Select the type of Stage 2 model: The Stage 2 outcome can be subject-level or multilevel.
  • Single level implies an ordinary regression of a subject-level outcome. In this case, there is only one outcome per subject in your data (e.g., subject BMI in the example dataset).
  • Multilevel implies a random intercept (multilevel) model for the Stage 2 outcome. For this, the Stage 2 outcome must vary across time (i.e., be occasion-varying; for example, NEGMOOD10 in the example dataset).

  10. Select the type of Stage 2 outcome: The Stage 2 outcome can be of four different outcome types: continuous (normal), dichotomous/ordinal, count, or multinomial.

  11. Set a random seed (optional): Using the same seed for resampling of the random effects that are used in the Stage 2 model (as regressors) will allow you to get the same results (the seed is used in the random number generator that yields the resampled data).

  12. Complete Model Configuration settings:
  • Click “Continue”: Click Continue to enter the Stage 1 Configuration window.
  • Click reset (optional): Click Reset to clear the specifications and start over.
  • Save model (optional): Click Save Model to keep all the model configuration settings specified and save them as a .MW file.
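The choices made in the steps above depend on one another (for example, a link function is only needed for dichotomous or ordinal outcomes, and random scale is not available for dichotomous outcomes). The following Python sketch records those rules as a small checklist. It is purely illustrative: `ModelConfig` and its field names are hypothetical and do not describe the contents or format of the .MW file that MixWILD saves.

```python
from dataclasses import dataclass
from typing import List, Optional

STAGE1_OUTCOMES = ("continuous", "dichotomous", "ordinal")
STAGE2_OUTCOMES = ("continuous", "dichotomous/ordinal", "count", "multinomial")


@dataclass
class ModelConfig:
    """Hypothetical record of the Model Configuration choices."""
    title: str = ""
    missing_code: Optional[float] = -999.0   # None means no missing values
    stage1_outcome: str = "continuous"
    link: Optional[str] = None               # "probit" or "logistic"
    random_slopes: bool = False              # False = "Intercept only"
    random_scale: bool = True
    stage2: bool = False
    stage2_multilevel: bool = False          # False = single level
    stage2_outcome: Optional[str] = None
    seed: Optional[int] = None               # fixes the Stage 2 resampling

    def problems(self) -> List[str]:
        """List the choices that conflict with the rules described in the steps."""
        found = []
        if self.stage1_outcome not in STAGE1_OUTCOMES:
            found.append("unknown Stage 1 outcome type")
        categorical = self.stage1_outcome in ("dichotomous", "ordinal")
        if categorical and self.link not in ("probit", "logistic"):
            found.append("choose probit or logistic for a dichotomous/ordinal outcome")
        if self.stage1_outcome == "dichotomous" and self.random_scale:
            found.append("random scale is not allowed for a dichotomous outcome")
        if self.stage2 and self.stage2_outcome not in STAGE2_OUTCOMES:
            found.append("choose a Stage 2 outcome type")
        return found


mels_only = ModelConfig(title="MELS, Stage 1 only")
two_stage = ModelConfig(stage1_outcome="ordinal", link="logistic", stage2=True,
                        stage2_outcome="continuous", seed=12345)
invalid = ModelConfig(stage1_outcome="dichotomous", link="probit")
```

In this sketch, `mels_only.problems()` and `two_stage.problems()` return empty lists, while `invalid.problems()` flags the random scale setting, which defaults to on.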


1.5 What is in this Users Guide?

This manual comprises four parts, in which various aspects of MixWILD models are discussed.

  • Chapter 2. Stage 1 Model Only

Chapter 2 presents the Stage 1 basic multilevel model for analysis of repeated measurements. It describes the basic settings in MixWILD Stage 1 model and how to run MELS and MEMLS models which include random subject intercept, slope(s), and scale in the analysis of repeated measurements as well as some interpretation of the outputs.

  • Chapter 3. Two-stage Model Approach

Chapter 3 introduces a two-stage model: MixWILD combines a Stage 1 MELS or MEMLS model of a Stage 1 outcome with a subsequent Stage 2 model in which the Stage 1 random effects (i.e., random subject intercept, slopes, and scale) are used as regressors in the Stage 2 model.


2 Chapter 2

Chapter 2 will focus on MixWILD Stage 1 model (e.g., MELS or MEMLS). You will learn how to include random subject intercept, slope(s), and scale in the analysis of repeated measurements. In addition, this chapter will give you details about the submodels of the Stage 1 model (e.g., BS and WS Variance submodels). We will provide simple examples and step-by-step instructions to run a Stage 1 model in MixWILD and interpret the outputs. Example 1 shows how to operate a MELS model with random scale (2.2 Instruction; 2.3 Results); in Example 2, we provide a tutorial to run a MEMLS model in MixWILD (2.4 Instruction; 2.5 Results).


2.1 What is MixWILD Stage 1 model

As mentioned in Chapter 1, the Stage 1 model is a mixed-effects model, in which random subject intercept and slope(s) (location effects) and random subject within-subjects variance (scale effect) are included. On the one hand, the Stage 1 model can be considered as an independent model which applies a mixed-effects modeling approach with random subject effects to examine the associations between a time-varying outcome and subject-level or time-varying covariates. On the other hand, the Stage 1 model can also be an antecedent model which generates necessary estimates (e.g., random subject intercept, slope(s), and scale effects) as regressors for use in a subsequent Stage 2 model. In this chapter, the discussion will focus on the application of the Stage 1 model only. We will describe use of Stage 1 estimates of random effects as regressors in a Stage 2 model in Chapter 3.

The Stage 1 model is a MELS or MEMLS model that includes and provides estimates of the random subject effects. The Stage 1 model can also include covariates for the mean and variance submodels. In this way, the random effects can either be unadjusted (submodels with no covariates) or adjusted for covariates (submodels including covariates). MELS and MEMLS models in MixWILD offer an extra capability to handle random effects, compared to standard Ordinary Least Squares (OLS) regression, and go beyond standard mixed model software by including random subject scale effects.

2.1.1 OLS model (Not Used in MixWILD)

Figure 2-1 shows an example of an OLS model which uses WS positive affect (PA) to predict daily averaged moderate-to-vigorous physical activity (MVPA) time in minutes. In this model, we neglected the multilevel data structure (i.e., observations nested within persons) and simply analyzed the observations using a standard OLS regression model.

Figure 2-1. OLS model (DV: MVPA; IV: WS PA)

For measurement y (i.e., Daily MVPA) of subject i (i = 1, 2, . . . , N subjects) on occasion j (j = 1, 2, . . . , \(n_{i}\) occasions):

\[ y_{ij} = \beta_{0} + \beta_{1} x_{ij} + \epsilon_{ij} \qquad (2.1) \]

In Equation 2.1, \(\beta_{0}\) is the intercept, \(x_{ij}\) is the regressor (i.e., WS PA for a subject at a particular occasion) for the Mean model, and \(\beta_{1}\) is the corresponding regression coefficient. \(\epsilon_{ij}\) is subject i's error at time j. With OLS, all subjects share one regression line, and observations are assumed to be dispersed around it to the same degree for all subjects. This model ignores the nested structure of the EMA data and can lead to impoverished analysis and inferential error because it violates the assumption of independent and identically distributed (i.i.d.) observations.
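To make the independence problem concrete, the following minimal Python sketch fits a single pooled OLS line to made-up data from two subjects and then averages the residuals within each subject. The function name `pooled_ols` and all data values are invented for this illustration.

```python
from statistics import mean


def pooled_ols(xs, ys):
    """Least-squares intercept and slope for a single regressor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# (subject, x, y): made-up values; subject 2 sits 4 units above subject 1
data = [(1, 0.0, 1.0), (1, 1.0, 2.0), (1, 2.0, 3.0),
        (2, 0.0, 5.0), (2, 1.0, 6.0), (2, 2.0, 7.0)]

intercept, slope = pooled_ols([x for _, x, _ in data], [y for _, _, y in data])

# Average residual per subject around the shared regression line
residual_means = {}
for subject in sorted({s for s, _, _ in data}):
    residuals = [y - (intercept + slope * x) for s, x, y in data if s == subject]
    residual_means[subject] = mean(residuals)
```

The pooled line has intercept 3 and slope 1, but every residual for subject 1 is -2 and every residual for subject 2 is +2. The errors are therefore clustered by subject rather than independent, which is the structure a random subject intercept is designed to absorb.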


2.1.2 MELS model

A MELS or MEMLS model is an extension of a multilevel model. Multilevel models typically include random subject intercepts and possibly random slopes to account for the correlation of the repeated observations within subjects. However, ordinarily, multilevel models do not include covariates to predict WS variance (models without Scale Parameters), but instead assume a common WS variance. Applying MELS or MEMLS models can allow covariates to influence the WS variance (models with Scale Parameters), and even further allow each subject to have their own degree of WS variance, above and beyond the effects of covariates (models with Random Scale). The random scale effect is often necessary to obtain correct inference for the covariate effects on the WS variance (see Nordgren et al., 2020).

Here, we applied the same DV (MVPA) and IV (WS PA) in a MELS model in MixWILD. As mentioned, this model decomposes the variance of the dependent variable (i.e., Daily MVPA) into BS and WS components. As shown in Figure 2-2, each subject is allowed to have their own intercept while the association (slope) between MVPA and WS PA remains the same across subjects.

Figure 2-2. MELS model (DV: MVPA; IV: WS PA)

  • MELS Mean Model

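\[ y_{ij} = \beta_{0} + \beta_{1} x_{ij} + \mathbf{x}_{ij}'\boldsymbol{\beta} + \upsilon_{i} + \epsilon_{ij}, \quad \upsilon_{i} \sim N(0, \sigma^{2}_{\upsilon}), \quad \epsilon_{ij} \sim N(0, \sigma^{2}_{\epsilon}) \qquad (2.2) \]

In Equation 2.2, \(\beta_{0}\) is the grand intercept, \(x_{ij}\) is the regressor (i.e., WS PA) for the Mean model, and \(\beta_{1}\) is the corresponding regression coefficient. There is a new component, \(\upsilon_{i}\), to represent a subject's mean (deviation from the grand mean, \(\beta_{0}\)), and it is referred to as a random location (intercept) effect in the MELS model. \(\mathbf{x}_{ij}\) are the other regressors for the Mean model and \(\boldsymbol{\beta}\) are the corresponding regression coefficients. Please note that the bolded \(\mathbf{x}_{ij}\) and \(\boldsymbol{\beta}\) represent vectors. In other words, there could be more than one variable, and therefore more than one Beta coefficient. \(\epsilon_{ij}\) is subject i's error at time j (deviation of a subject's observations from their mean and the fixed part of the model).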

  • MELS BS Variance Submodel (MELS Only)

(2.3)

In the BS Variance submodel (Equation 2.3), refers to the BS variance and are the regressors (typically including “1” and other covariates) for the BS Variance model, and is the corresponding vector of regression coefficients. The BS variance is subscripted by i and j to indicate that the value changes depending on the covariates and their coefficients . Please note that bolded andrepresent vectors. As mentioned, there could be more than one variable, and therefore more than one _ Alpha _ coefficient. The exponential function is used to ensure a positive multiplicative factor, and so the resulting BS variance is strictly positive.

Friendly Note: The BS variance (\(\sigma^2_{\upsilon_{ij}}\)) is defined as the variance across subject-level means. Because intensive longitudinal designs yield rich EMA data, they offer many modeling possibilities, and the BS variance in MELS models is estimated in a more elaborate way than the traditional variance formula. As a result, the BS variance can be time-varying and can differ across occasions within a person. It is also possible for WS (time-varying) variables to influence the BS variance.

* Yellow area represents the magnitude of BS variance.

Figure 2-3. MELS BS submodel (DV: BS Variance in MVPA; IV: WS PA)

In Figure 2-3, the BS variance is represented by the dispersion of the subject lines. In particular, the amount of spread (yellow area) across the lines indicates the magnitude of the BS variance. For example, if the lines are close together, the subjects are more similar (smaller variance); if the lines are far apart, the subjects are less similar (larger variance). The magnitude of \(\sigma^2_{\upsilon_{ij}}\) indicates how different subjects are from each other (heterogeneity).

  • MELS WS Variance Submodel without Random Scale Effect (with Scale Parameters Only)

\[
\sigma^2_{\epsilon_{ij}} = \exp(\mathbf{w}'_{ij}\boldsymbol{\tau}) \tag{2.4}
\]

  • MELS WS Variance Submodel with Random Scale Effect

\[
\sigma^2_{\epsilon_{ij}} = \exp(\mathbf{w}'_{ij}\boldsymbol{\tau} + \omega_i), \qquad \omega_i \sim N(0, \sigma^2_{\omega}) \tag{2.5}
\]

  • MELS WS Variance Submodel with Random Scale Effect and (Linear) Association between the Location and Scale Random Effect

\[
\sigma^2_{\epsilon_{ij}} = \exp(\mathbf{w}'_{ij}\boldsymbol{\tau} + \tau_l \upsilon_i + \omega_i), \qquad \omega_i \sim N(0, \sigma^2_{\omega}) \tag{2.6}
\]

In the WS Variance submodel without random scale effect (Equation 2.4), \(\sigma^2_{\epsilon_{ij}}\) refers to the WS variance, \(\mathbf{w}_{ij}\) are the regressors (typically including “1” and other covariates) for the WS Variance model, and \(\boldsymbol{\tau}\) is the corresponding vector of regression coefficients. The magnitude of \(\sigma^2_{\epsilon_{ij}}\) indicates how much the data vary within subjects (erraticism). In the WS Variance submodel with random scale effect (Equation 2.5), the random scale effect, \(\omega_i\), allows the WS variance to vary across subjects beyond the contribution of covariates (Dzubur et al., 2020). In Equation 2.6, the coefficient \(\tau_l\) represents the linear association between a subject's location effect and the WS variance, and \(\upsilon_i\) is the subject-level random location (intercept) effect entering as a regressor. It is also possible to include both a linear (\(\upsilon_i\)) term and a quadratic (\(\upsilon_i^2\)) term in the MELS WS Variance submodel, with coefficients \(\tau_l\) and \(\tau_q\), respectively.

The WS variance is subscripted by i and j to indicate that its value changes depending on the covariates \(\mathbf{w}_{ij}\) and their coefficients \(\boldsymbol{\tau}\). Please note that bolded \(\mathbf{w}\) and \(\boldsymbol{\tau}\) represent vectors. In other words, there could be more than one variable, and therefore more than one _Tau_ coefficient. The exponential function is used to ensure that the resulting WS variance is positive.

* Green area represents the magnitude of the WS variance.

Figure 2-4. MELS WS submodel with Random Scale effect (DV: WS Variance in MVPA; IV: WS PA)

In Figure 2-4, the variation of the points within a subject relative to that subject's line indicates the WS variance (green area). Subject 1 (blue dots) has a higher WS variance than Subject 2 (orange dots). This difference in the WS variance across subjects is what the random scale effect (\(\omega_i\)) represents in Equations 2.5 and 2.6.
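The role of the random scale effect can be mimicked with a small simulation. The sketch below is illustrative only: it assumes a WS variance of the form exp(tau0 + omega_i) with no covariates, and the function name, seed, and parameter values are hypothetical rather than taken from MixWILD.

```python
import math
import random
import statistics

def simulate_residuals(tau0, omega_i, n_obs, rng):
    """Draw n_obs errors whose variance is exp(tau0 + omega_i)."""
    ws_sd = math.sqrt(math.exp(tau0 + omega_i))
    return [rng.gauss(0.0, ws_sd) for _ in range(n_obs)]

rng = random.Random(2020)  # fixed seed so the sketch is reproducible
tau0 = 0.0                 # hypothetical WS variance intercept (log scale)

# Hypothetical random scale effects: Subject 1 is more erratic than Subject 2.
subject1 = simulate_residuals(tau0, omega_i=1.0, n_obs=500, rng=rng)
subject2 = simulate_residuals(tau0, omega_i=-1.0, n_obs=500, rng=rng)

var1 = statistics.variance(subject1)
var2 = statistics.variance(subject2)
print(f"Subject 1 sample WS variance: {var1:.2f} (model value {math.exp(1.0):.2f})")
print(f"Subject 2 sample WS variance: {var2:.2f} (model value {math.exp(-1.0):.2f})")
```

Both simulated subjects share the same covariate part of the submodel, so the gap between their sample variances comes entirely from the assumed random scale values.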

Figure 2-5. An Example of Stage 1 Regressors in Mean, BSV and WSV Submodels

Table 2-1. An Example of Stage 1 Regressors in Mean, BSV and WSV Submodels

| Model | Regressor | Regressor List |
|---|---|---|
| Mean model | \(x^{'}\) | Weekend (LV1), Age (LV2), Sex (LV2) |
| BS Variance Submodel | \(u^{'}\) | Weekend (LV1), Sex (LV2) |
| WS Variance Submodel | \(w^{'}\) | Weekend (LV1), Sex (LV2) |

Please note that although we have used different letters (\(x\), \(u\), and \(w\)) to represent the covariates in the different submodels (Mean, BS Variance, and WS Variance submodels), there is no restriction, and the same covariates could be used in each (Dzubur et al., 2020). For instance, Figure 2-5 and Table 2-1 show identical regressors in the BS and WS Variance submodels (Weekend and Sex) and a different variable set (Weekend, Age, and Sex) in the Mean model.

Per Hedeker and Nordgren (2013), the parameters of these models (Mean, WS Variance, and BS Variance submodels) are estimated using maximum likelihood and the Newton–Raphson algorithm. Once the model has converged to a solution, empirical Bayes methods (Bock, 1989) are used to obtain subject-specific estimates for \(\upsilon_i\) (random location intercept) and \(\omega_i\) (random scale), along with the variance-covariance matrix associated with these estimates, which are saved for potential use in a Stage 2 model. These correspond to estimates of the mean and variance-covariance of the posterior distribution of the random effects (Dzubur et al., 2020).
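To give a feel for what an empirical Bayes estimate is, the sketch below works through the simplest possible case: a random-intercept model with normal errors in which the BS and WS variances are treated as known constants. This is a deliberately simplified assumption. Models with a random scale effect generally need numerical integration and do not reduce to this closed form, so the function `eb_random_intercept` and its inputs are hypothetical teaching aids, not MixWILD's algorithm.

```python
def eb_random_intercept(residuals, bs_var, ws_var):
    """Posterior mean and variance of a random intercept (normal model).

    residuals -- one subject's observations minus the fixed part of the model
    bs_var    -- between-subject variance (assumed known here)
    ws_var    -- within-subject variance (assumed known and constant here)
    """
    n = len(residuals)
    mean_residual = sum(residuals) / n
    shrinkage = n * bs_var / (n * bs_var + ws_var)  # between 0 and 1
    posterior_mean = shrinkage * mean_residual
    posterior_var = bs_var * ws_var / (n * bs_var + ws_var)
    return posterior_mean, posterior_var

# Hypothetical subject with four observations, all 2 units above the fixed part.
few_obs = eb_random_intercept([2.0] * 4, bs_var=1.0, ws_var=4.0)
# The same pattern with 40 observations is shrunk much less toward zero.
many_obs = eb_random_intercept([2.0] * 40, bs_var=1.0, ws_var=4.0)

print(f"4 observations:  mean = {few_obs[0]:.3f}, variance = {few_obs[1]:.3f}")
print(f"40 observations: mean = {many_obs[0]:.3f}, variance = {many_obs[1]:.3f}")
```

In this simplified setting, more observations per subject pull the estimate closer to the subject's raw mean deviation and reduce its posterior variance, which is the kind of uncertainty that the saved variance-covariance matrix carries into Stage 2.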


2.1.3 MEMLS model

Extending the model presented in the previous section (2.1.2 MELS), one may be interested in understanding how the slopes of the lines vary by subject for time-varying covariates. Such random slopes can be used to generalize the above model, allowing for a vector of random location effects instead of only a random intercept.

Figure 2-6. MEMLS model (DV: MVPA; IV: WS PA)

MixWILD can allow for random subject intercept and slope(s) (of time-varying covariates) in the Mean model, which we refer to as MEMLS models to reflect the multiple location random effects. In all, these subject-specific parameters indicate a baseline mean level (random intercept when the covariate(s) equal 0), the effect of (time-varying) covariate(s) on the mean (random slope), and the degree of within-subject variability (random scale) (Dzubur et al., 2020).

Figure 2-6 shows how the slope of WS PA predicting daily MVPA can vary by subject. The average across all subjects is depicted with the solid black line, and the location averages (mean plus slope) of four subjects are presented as dashed lines.

  • MEMLS Mean Model

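\[
y_{ij} = (\beta_0 + \upsilon_{0i}) + (\beta_1 + \upsilon_{1i}) X_{ij} + \mathbf{x}'_{ij}\boldsymbol{\beta} + \epsilon_{ij},
\]

\[
\upsilon_{0i} \sim N(0, \sigma^2_{\upsilon_0}), \qquad \upsilon_{1i} \sim N(0, \sigma^2_{\upsilon_1}), \qquad \epsilon_{ij} \sim N(0, \sigma^2_{\epsilon_{ij}}) \tag{2.7}
\]

In Equation 2.7, \(\beta_0\) is the grand intercept, \(X_{ij}\) is the regressor (i.e., WS PA) for the Mean model, and \(\beta_1\) is the corresponding regression coefficient. The random intercept, \(\upsilon_{0i}\), represents a subject's mean deviation from the grand intercept, \(\beta_0\), when the regressor (i.e., WS PA) equals zero. The random slope, \(\upsilon_{1i}\), is the subject-specific deviation from the average slope, \(\beta_1\), of \(X_{ij}\) predicting \(y_{ij}\). It is possible to have multiple random slope effects in a MEMLS model. \(\mathbf{x}_{ij}\) are the other regressors for the Mean model and \(\boldsymbol{\beta}\) are the corresponding regression coefficients. Please note that bolded \(\mathbf{x}\) and \(\boldsymbol{\beta}\) represent vectors. \(\epsilon_{ij}\) is subject i's error at time j (deviation from the subject's trend line).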
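A short sketch can make the random intercept and random slope concrete. It assumes a MEMLS Mean model with one time-varying regressor and, for brevity, leaves out the other regressors and the error term. The function names and all numeric values are hypothetical.

```python
def subject_line(beta0, beta1, v0_i, v1_i):
    """Return (intercept, slope) of one subject's trend line."""
    return beta0 + v0_i, beta1 + v1_i

def predict_mean(beta0, beta1, v0_i, v1_i, x):
    """Subject-specific expected outcome at regressor value x."""
    intercept, slope = subject_line(beta0, beta1, v0_i, v1_i)
    return intercept + slope * x

# Hypothetical fixed effects: grand intercept and average slope.
beta0, beta1 = 30.0, 2.0

# Hypothetical random effects (random intercept, random slope) per subject.
random_effects = {"Subject A": (5.0, 1.0), "Subject B": (-4.0, -1.5)}

for name, (v0_i, v1_i) in random_effects.items():
    intercept, slope = subject_line(beta0, beta1, v0_i, v1_i)
    at_ten = predict_mean(beta0, beta1, v0_i, v1_i, x=10.0)
    print(f"{name}: intercept = {intercept}, slope = {slope}, mean at x=10: {at_ten}")
```

Each subject's line starts from the average line and is then shifted by the random intercept and tilted by the random slope, which is the pattern of dashed lines around the solid average line in a plot of subject trends.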

Like MELS, MEMLS has Mean and WS Variance submodels, in which covariates can be included to examine their effects on the mean and WS variance. However, as mentioned, MEMLS augments the MELS model by including multiple random subject effects in the Mean model (i.e., both random intercept and slope(s)). In this way, the BS variance-covariance is a function of the random intercept as well as slopes. Please note the random slope effect is only possible for time-varying covariates.

  • MEMLS BS Variance-covariance Matrix

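\[
\boldsymbol{\Sigma}_{\upsilon} = \begin{bmatrix} \sigma^2_{\upsilon_0} & \sigma_{\upsilon_0\upsilon_1} \\ \sigma_{\upsilon_0\upsilon_1} & \sigma^2_{\upsilon_1} \end{bmatrix} \tag{2.8}
\]

In the BS variance-covariance matrix (2.8), \(\sigma^2_{\upsilon_0}\) and \(\sigma^2_{\upsilon_1}\) refer to the variances of the random intercept and random slope effects, respectively, and \(\sigma_{\upsilon_0\upsilon_1}\) represents their covariance. By examining the variances and covariance(s), the MEMLS Stage 1 model can indicate the degree to which the random intercept and slope vary across subjects. The covariance shows the degree to which the random intercept and random slope are associated with each other.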
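Because a covariance depends on the scale of both random effects, it is often easier to read as a correlation. The sketch below converts the elements of a 2 × 2 BS variance-covariance matrix into a correlation. The helper name and the input values are hypothetical and do not come from MixWILD output.

```python
import math

def random_effect_correlation(var_intercept, var_slope, cov):
    """Correlation between random intercept and slope from their (co)variances."""
    if var_intercept <= 0 or var_slope <= 0:
        raise ValueError("variances must be positive")
    if cov ** 2 > var_intercept * var_slope:
        raise ValueError("covariance is too large for these variances")
    return cov / math.sqrt(var_intercept * var_slope)

# Hypothetical estimates: intercept variance, slope variance, and covariance.
corr = random_effect_correlation(var_intercept=4.0, var_slope=0.25, cov=-0.5)
print(f"intercept-slope correlation: {corr:.2f}")
```

With these assumed values the correlation is negative, meaning subjects with higher intercepts would tend to have smaller slopes.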

  • MEMLS WS Variance Submodel with Random Scale Effect and Association(s) between the Location and Scale Random Effect

\[
\sigma^2_{\epsilon_{ij}} = \exp(\mathbf{w}'_{ij}\boldsymbol{\tau} + \boldsymbol{\tau}'_{l}\boldsymbol{\upsilon}_i + \omega_i), \qquad \omega_i \sim N(0, \sigma^2_{\omega}) \tag{2.9}
\]

In the WS Variance submodel (Equation 2.9), \(\sigma^2_{\epsilon_{ij}}\) refers to the WS variance, \(\mathbf{w}_{ij}\) are the regressors (typically including “1” and other covariates) for the WS Variance model, and \(\boldsymbol{\tau}\) is the corresponding vector of regression coefficients. \(\omega_i\) represents the random scale effect. As discussed in Hedeker and Nordgren (2013), an association between the location and scale random effects can be induced by including the location random effects (\(\boldsymbol{\upsilon}_i\)) as predictors in the WS Variance model, using \(\boldsymbol{\tau}_l\), which are terms from the Cholesky decomposition of the variance-covariance matrix. In this regard, MixWILD allows for two possibilities to describe the relationship between random location and random scale: (1) no association (\(\boldsymbol{\tau}_l = 0\)) or (2) association (\(\boldsymbol{\tau}_l \neq 0\)) (Dzubur et al., 2020). However, the current version of MixWILD has no option for selecting “No Association”: when a MEMLS model includes the random scale effect, the association between random location and random scale is always estimated, and this is the default setting. The “No Association” option will be available in a future update.

MEMLS can have more than two random location effects (i.e., multiple random slopes), though a random intercept and one random slope is the most common specification. The variance-covariance matrix (2.10) represents a MEMLS model with one random intercept, one random slope, and one random scale effect. Estimating the variance-covariance in this model requires a \(3 \times 3\) matrix. If there are n random location effects and one random scale effect in a MEMLS model, the size of the variance-covariance matrix will be \((n+1) \times (n+1)\). The model therefore becomes more complicated, and estimation time increases significantly, with each additional random effect.

\[
\begin{bmatrix} \sigma^2_{\upsilon_0} & \sigma_{\upsilon_0\upsilon_1} & \sigma_{\upsilon_0\omega} \\ \sigma_{\upsilon_0\upsilon_1} & \sigma^2_{\upsilon_1} & \sigma_{\upsilon_1\omega} \\ \sigma_{\upsilon_0\omega} & \sigma_{\upsilon_1\omega} & \sigma^2_{\omega} \end{bmatrix} \tag{2.10}
\]
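The growth in model size can be counted directly. The sketch below is a hypothetical helper, not part of MixWILD: given the number of random location effects, it returns the dimension of the random-effect variance-covariance matrix and the number of distinct variances and covariances in it, using the fact that a symmetric d × d matrix has d(d + 1)/2 unique elements.

```python
def covariance_matrix_size(n_location, has_random_scale=True):
    """Return (dimension, number of unique variance and covariance elements)."""
    dim = n_location + (1 if has_random_scale else 0)
    unique_elements = dim * (dim + 1) // 2
    return dim, unique_elements

# Random intercept + one random slope + random scale: a 3 x 3 matrix.
for n_location in (1, 2, 3, 4):
    dim, unique_elements = covariance_matrix_size(n_location)
    print(f"{n_location} location effect(s): {dim} x {dim} matrix, "
          f"{unique_elements} unique elements")
```

Each added random slope enlarges the matrix by one row and one column, so the count of variances and covariances grows faster than the number of random effects.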

In MixWILD, the outcome of the Stage 1 model (both MELS and MEMLS) can be continuous, dichotomous, or ordinal. However, a random scale parameter is not available in Stage 1 models with dichotomous outcomes, because dichotomous outcomes generally do not provide enough variance to allow estimation of the random scale effect.

Table 2-2. Summary of Submodels in MELS and MEMLS

| Submodel | Outcome | MELS | MEMLS |
|---|---|---|---|
| Mean Model with Random Intercept Only | Original outcome | √ | N/A |
| Mean Model with Random Intercept and Random Slope(s) | Original outcome | N/A | √ |
| BS Variance Submodel | Variance of the subject-level means in outcome | √ | N/A |
| WS Variance Submodel | Within-subject variance of the outcome | √ | √ |

* Note: √ – Available; N/A – Not Available

Friendly Note: As shown in Table 2-2, WS Variance submodel and Random Scale effects are distinctive features in MELS and MEMLS models. Models without random scale can still include modeling of the WS variance as a function of covariates. If, in addition to omitting random scale, they also do not include WS Variance modeling, then they are equivalent to standard mixed-effects (aka multilevel) models.