# Multilevel and Longitudinal Modeling Using Stata, Volumes I and II

Multilevel and Longitudinal Modeling Using Stata, Fourth Edition, by Sophia Rabe-Hesketh and Anders Skrondal, is a complete resource for learning to model data in which observations are grouped—whether those groups are formed by a nesting structure, such as children nested in classrooms, or formed by repeated observations on the same individuals. This text introduces random-effects models, fixed-effects models, mixed-effects models, marginal models, dynamic models, and growth-curve models, all of which account for the grouped nature of these types of data. As Rabe-Hesketh and Skrondal introduce each model, they explain when the model is useful, its assumptions, how to fit and evaluate the model using Stata, and how to interpret the results. With this comprehensive coverage, researchers who need to apply multilevel models will find this book to be the perfect companion. It is also the ideal text for courses in multilevel modeling because it provides examples from a variety of disciplines as well as end-of-chapter exercises that allow students to practice newly learned material.

The book comprises two volumes. Volume I focuses on linear models for continuous outcomes, while volume II focuses on generalized linear models for binary, ordinal, count, and other types of outcomes.

Volume I begins with a review of linear regression and then builds on this review to introduce two-level models, the simplest extensions of linear regression to models for multilevel and longitudinal/panel data. Rabe-Hesketh and Skrondal introduce the random-intercept model without covariates, developing the model from principles and thereby familiarizing the reader with terminology, summarizing and relating the widely used estimating strategies, and providing historical perspective. Once the authors have established the foundation, they smoothly generalize to random-intercept models with covariates and then to a discussion of the various estimators (between, within, and random effects). The authors also discuss models with random coefficients. The text then turns to models specifically designed for longitudinal and panel data—dynamic models, marginal models, and growth-curve models. The last portion of volume I covers models with more than two levels and models with crossed random effects.

The foundation and in-depth coverage of linear-model principles provided in volume I allow for a straightforward transition to generalized linear models for noncontinuous outcomes, which are described in volume II. This second volume begins with chapters introducing multilevel and longitudinal models for binary, ordinal, nominal, and count data. Focus then turns to survival analysis, introducing multilevel models for both discrete-time survival data and continuous-time survival data. The volume concludes by extending the two-level generalized linear models introduced in previous chapters to models with three or more levels and to models with crossed random effects.

In both volumes, readers will find extensive applications of multilevel and longitudinal models. Using many datasets that appeal to a broad audience, Rabe-Hesketh and Skrondal provide worked examples in each chapter. They also show the breadth of Stata’s commands for fitting the models discussed. They demonstrate Stata’s xt suite of commands (xtreg, xtlogit, xtpoisson, etc.), which is designed for two-level random-intercept models for longitudinal/panel data. They demonstrate the me suite of commands (mixed, melogit, mepoisson, etc.), which is designed for multilevel models, including those with random coefficients and those with three or more levels. In volume II, they discuss gllamm, a community-contributed Stata command developed by Rabe-Hesketh and Skrondal that can fit many latent-variable models, of which the generalized linear mixed-effects model is a special case. The types of models fit by the xt commands, the me commands, and gllamm sometimes overlap; when this happens, the authors highlight the differences in syntax, data organization, and output for the commands. The authors also point out the strengths and weaknesses of these commands, based on considerations such as computational speed, accuracy, available predictions, and available postestimation statistics.
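As a rough illustration of the overlap between the xt and me suites (a sketch, not an example taken from the book), a two-level random-intercept model can be requested through either one. The code assumes a hypothetical long-form dataset with a continuous outcome y, a covariate x, and a cluster identifier id; these variable names are placeholders.

```stata
* Hypothetical variables: y (continuous outcome), x (covariate), id (cluster)
* Data are assumed to be in long form, one row per observation within cluster.

* xt suite: declare the panel/cluster identifier, then fit by maximum likelihood
xtset id
xtreg y x, mle

* me suite: the same random-intercept model, with the cluster level after ||
mixed y x || id:, mle

* me suite only: add a random coefficient (slope) for x,
* allowing the intercept and slope to be correlated
mixed y x || id: x, covariance(unstructured) mle
```

Because the first two commands fit the same model by maximum likelihood, their estimates should closely agree, although the output is laid out differently. The third model, with a random coefficient, is the kind of extension that the description above attributes to the me suite rather than the xt suite.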

The fourth edition of Multilevel and Longitudinal Modeling Using Stata has been thoroughly revised and updated. In it, you will find new material on Kenward–Roger degrees-of-freedom adjustments for small sample sizes, difference-in-differences estimation for natural experiments, instrumental-variables estimation to account for level-one endogeneity, and Bayesian estimation for crossed-effects models. In addition, you will find new discussions of meologit, cmxtmixlogit, mestreg, menbreg, and other commands introduced in Stata since the third edition of the book.

In summary, Multilevel and Longitudinal Modeling Using Stata, Fourth Edition is the most complete, up-to-date depiction of Stata’s capacity for fitting models to multilevel and longitudinal data. Readers will also find thorough explanations of the methods and practical advice for using these techniques. This text is a great introduction for researchers and students wanting to learn about these powerful data analysis tools.

List of tables
List of figures
Preface
Multilevel and longitudinal models: When and why?

I PRELIMINARIES

REVIEW OF LINEAR REGRESSION

Introduction
Is there gender discrimination in faculty salaries?
Independent-samples t test
One-way analysis of variance
Simple linear regression
Dummy variables
Multiple linear regression
Interactions
Dummy variables for more than two groups
Other types of interactions

Interaction between dummy variables
Interaction between continuous covariates

Nonlinear effects
Residual diagnostics
Causal and noncausal interpretations of regression coefficients

Regression as conditional expectation
Regression as structural model

Exercises

II TWO-LEVEL MODELS

VARIANCE-COMPONENTS MODELS

Introduction
How reliable are peak-expiratory-flow measurements?
Inspecting within-subject dependence
The variance-components model

Model specification
Path diagram
Between-subject heterogeneity
Within-subject dependence

Intraclass correlation
Intraclass correlation versus Pearson correlation

Estimation using Stata

Data preparation: Reshaping from wide form to long form
Using xtreg
Using xtmixed

Hypothesis tests and confidence intervals

Hypothesis test and confidence interval for the population mean
Hypothesis test and confidence interval for the between-cluster variance

Likelihood-ratio test

Score test

F test

Confidence interval

Model as data-generating mechanism
Fixed versus random effects
Crossed versus nested effects
Parameter estimation

Model assumptions

Mean structure and covariance structure
Distributional assumptions

Different estimation methods
Inference for β

Estimate and standard error: Balanced case
Estimate: Unbalanced case

Assigning values to the random intercepts

Maximum “likelihood” estimation

Implementation via OLS regression
Implementation via the mean total residual

Empirical Bayes prediction
Empirical Bayes standard errors

Posterior and comparative standard errors
Diagnostic standard errors

Accounting for uncertainty in β̂

Bayesian interpretation of REML estimation and prediction

Exercises

RANDOM-INTERCEPT MODELS WITH COVARIATES

Introduction
Does smoking during pregnancy affect birthweight?

Data structure and descriptive statistics

The linear random-intercept model with covariates

Model specification
Model assumptions
Mean structure
Residual covariance structure
Graphical illustration of random-intercept model

Estimation using Stata

Using xtreg
Using xtmixed

Coefficients of determination or variance explained
Hypothesis tests and confidence intervals

Hypothesis tests for individual regression coefficients
Joint hypothesis tests for several regression coefficients

Predicted means and confidence intervals
Hypothesis test for random-intercept variance

Between and within effects of level-1 covariates

Between-mother effects
Within-mother effects
Relations among estimators
Level-2 endogeneity and cluster-level confounding
Allowing for different within and between effects
Robust Hausman test

Fixed versus random effects revisited
Assigning values to random effects: Residual diagnostics
More on statistical inference

Overview of estimation methods

Pooled OLS
Feasible generalized least squares (FGLS)
ML by iterative GLS (IGLS)
ML by Newton–Raphson and Fisher scoring
ML by the expectation-maximization (EM) algorithm
REML

Consequences of using standard regression modeling for clustered data

Purely between-cluster covariate
Purely within-cluster covariate

Power and sample-size determination

Purely between-cluster covariate
Purely within-cluster covariate

Exercises

RANDOM-COEFFICIENT MODELS

Introduction
How effective are different schools?
Separate linear regressions for each school
Specification and interpretation of a random-coefficient model

Specification of a random-coefficient model
Interpretation of the random-effects variances and covariances

Estimation using xtmixed

Random-intercept model
Random-coefficient model

Testing the slope variance
Interpretation of estimates
Assigning values to the random intercepts and slopes

Maximum “likelihood” estimation
Empirical Bayes prediction
Model visualization
Residual diagnostics
Inferences for individual schools

Two-stage model formulation

Meaningful specification
Many random coefficients
Convergence problems

Lack of identification

Exercises

III MODELS FOR LONGITUDINAL AND PANEL DATA
Introduction to models for longitudinal and panel data (part III)

SUBJECT-SPECIFIC EFFECTS AND DYNAMIC MODELS

Introduction

Random-effects approach: No endogeneity
Fixed-effects approach: Level-2 endogeneity

De-meaning and subject dummies

De-meaning
Subject dummies

Hausman test
Mundlak approach and robust Hausman test
First-differencing

Difference-in-differences and repeated-measures ANOVA

Does raising the minimum wage reduce employment?
Repeated-measures ANOVA

Subject-specific coefficients

Random-coefficient model: No endogeneity
Fixed-coefficient model: Level-2 endogeneity

Hausman–Taylor: Level-2 endogeneity for level-1 and level-2 covariates
Instrumental-variable methods: Level-1 (and level-2) endogeneity

Do deterrents decrease crime rates?
Conventional fixed-effects approach
Fixed-effects IV estimator
Random-effects IV estimator
More Hausman tests

Dynamic models

Dynamic model without subject-specific intercepts
Dynamic model with subject-specific intercepts

Missing data and dropout

Maximum likelihood estimation under MAR: A simulation

Exercises

MARGINAL MODELS

Introduction
Mean structure
Covariance structures

Unstructured covariance matrix
Random-intercept or compound symmetric/exchangeable structure
Random-coefficient structure
Autoregressive and exponential structures
Moving-average residual structure
Banded and Toeplitz structures

Hybrid and complex marginal models

Random effects and correlated level-1 residuals
Heteroskedastic level-1 residuals over occasions
Heteroskedastic level-1 residuals over groups
Different covariance matrices over groups

Comparing the fit of marginal models
Generalized estimating equations (GEE)
Marginal modeling with few units and many occasions

Is a highly organized labor market beneficial for economic growth?
Marginal modeling for long panels
Fitting marginal models for long panels in Stata

Exercises

GROWTH-CURVE MODELS

Introduction
How do children grow?

Observed growth trajectories

Models for nonlinear growth

Polynomial models

Fitting the models
Predicting the mean trajectory
Predicting trajectories for individual children

Piecewise linear models

Fitting the models
Predicting the mean trajectory

Two-stage model formulation
Heteroskedasticity

Heteroskedasticity at level 1
Heteroskedasticity at level 2

Growth-curve model as a structural equation model

Estimation using sem
Estimation using mixed

Exercises

IV MODELS WITH NESTED AND CROSSED RANDOM EFFECTS

HIGHER-LEVEL MODELS WITH NESTED RANDOM EFFECTS

Introduction
Do peak-expiratory-flow measurements vary between methods within subjects?
Inspecting sources of variability
Three-level variance-components models
Different types of intraclass correlation
Estimation using mixed
Empirical Bayes prediction
Testing variance components
Crossed versus nested random effects revisited
Does nutrition affect cognitive development of Kenyan children?
Describing and plotting three-level data

Data structure and missing data

Level-1 variables
Level-2 variables
Level-3 variables
Plotting growth trajectories

Three-level random-intercept model

Model specification: Reduced form
Model specification: Three-stage formulation

Estimation using mixed

Three-level random-coefficient models

Random coefficient at the child level

Estimation using mixed

Random coefficient at the child and school levels

Estimation using mixed

Residual diagnostics and predictions
Exercises

CROSSED RANDOM EFFECTS

Introduction
How does investment depend on expected profit and capital stock?
A two-way error-components model

Model specification
Residual variances, covariances, and intraclass correlations

Longitudinal correlations
Cross-sectional correlations

Estimation using mixed
Prediction

How much do primary and secondary schools affect attainment at age 16?
Data structure

Specification

Intraclass correlations
Estimation using mixed

Crossed random-effects model with random interaction

Model specification
Intraclass correlations
Estimation using mixed
Testing variance components
Some diagnostics

A trick requiring fewer random effects
Exercises

A Useful Stata commands
References
List of tables
List of figures
List of displays

V MODELS FOR CATEGORICAL RESPONSES

DICHOTOMOUS OR BINARY RESPONSES

Introduction
Single-level logit and probit regression models for dichotomous responses

Generalized linear model formulation

Labor-participation data
Estimation using logit
Estimation using glm

Latent-response formulation

Logistic regression

Probit regression

Estimation using probit

Which treatment is best for toenail infection?
Longitudinal data structure
Proportions and fitted population-averaged or marginal probabilities
Random-intercept logistic regression

Model specification

Reduced-form specification
Two-stage formulation

Model assumptions

Estimation

Using xtlogit
Using melogit
Using gllamm

Subject-specific or conditional vs. population-averaged or marginal relationships
Measures of dependence and heterogeneity

Conditional or residual intraclass correlation of the latent responses
Median odds ratio
Measures of association for observed responses at median fixed part of the model

Inference for random-intercept logistic models

Tests and confidence intervals for odds ratios
Tests of variance components

Maximum likelihood estimation

Some speed and accuracy considerations

Integration methods and number of quadrature points
Starting values
Using melogit and gllamm for collapsible data

Assigning values to random effects

Maximum “likelihood” estimation
Empirical Bayes prediction
Empirical Bayes modal prediction

Different kinds of predicted probabilities

Predicted population-averaged or marginal probabilities
Predicted subject-specific probabilities

Predictions for hypothetical subjects: Conditional probabilities
Predictions for the subjects in the sample: Posterior mean probabilities

Other approaches to clustered dichotomous data

Conditional logistic regression

Estimation using clogit

Generalized estimating equations (GEE)

Estimation using xtgee

Exercises

ORDINAL RESPONSES

Introduction
Single-level cumulative models for ordinal responses

Generalized linear model formulation
Latent-response formulation
Proportional odds
Identification

Are antipsychotic drugs effective for patients with schizophrenia?
Longitudinal data structure and graphs

Longitudinal data structure
Plotting cumulative proportions
Plotting cumulative sample logits and transforming the time scale

Single-level proportional-odds model

Model specification

Estimation using ologit

Random-intercept proportional-odds model

Model specification

Estimation using meologit

Estimation using gllamm

Measures of dependence and heterogeneity

Residual intraclass correlation of latent responses
Median odds ratio

Random-coefficient proportional odds model

Model specification

Estimation using meologit

Estimation using gllamm

Different kinds of predicted probabilities

Predicted population-averaged or marginal probabilities
Predicted subject-specific probabilities: Posterior mean

Do experts differ in their grading of student essays?
A random-intercept probit model with grader bias

Model specification
Estimation using gllamm

Model specification
Estimation using gllamm

Model specification
Estimation using gllamm

Cumulative complementary log-log model
Continuation-ratio logit model
Baseline-category logit and stereotype models

Exercises

NOMINAL RESPONSES AND DISCRETE CHOICE

Introduction
Single-level models for nominal responses

Multinomial logit models

Transport data version 1
Estimation using mlogit

Conditional logit models with alternative-specific covariates

Transport data version 2: Expanded form
Estimation using clogit
Estimation using cmclogit

Conditional logit models with alternative- and unit-specific covariates

Estimation using clogit
Estimation using cmclogit

Independence from irrelevant alternatives
Utility-maximization formulation
Does marketing affect choice of yogurt?
Single-level conditional logit models

Conditional logit models with alternative-specific intercepts

Estimation using clogit
Estimation using cmclogit

Multilevel conditional logit models

Preference heterogeneity: Brand-specific random intercepts

Estimation using cmxtmixlogit
Estimation using gllamm

Response heterogeneity: Marketing variables with random coefficients

Estimation using cmxtmixlogit
Estimation using gllamm

Preference and response heterogeneity

Estimation using cmxtmixlogit
Estimation using gllamm

Prediction of random effects and response probabilities

Prediction of random effects and household-specific choice probabilities
Exercises

VI MODELS FOR COUNTS

COUNTS

Introduction
What are counts?

Counts versus proportions
Counts as aggregated event-history data

Single-level Poisson models for counts
Did the German health-care reform reduce the number of doctor visits?
Longitudinal data structure
Single-level Poisson regression

Model specification

Estimation using poisson
Estimation using glm

Random-intercept Poisson regression

Model specification
Measures of dependence and heterogeneity
Estimation

Using xtpoisson
Using mepoisson
Using gllamm

Random-coefficient Poisson regression

Model specification

Estimation using mepoisson
Estimation using gllamm

Overdispersion in single-level models

Normally distributed random intercept

Estimation using xtpoisson

Negative binomial models

Mean dispersion or NB2
Constant dispersion or NB1

Quasilikelihood

Estimation using glm

Level-1 overdispersion in two-level models

Random-intercept Poisson model with robust standard errors

Estimation using mepoisson

Three-level random-intercept model

Negative binomial models with random intercepts

Estimation using menbreg

The HHG model

Other approaches to two-level count data

Conditional Poisson regression

Estimation using xtpoisson, fe
Estimation using Poisson regression with dummy variables for clusters

Conditional negative binomial regression
Generalized estimating equations

Estimation using xtgee

Marginal and conditional effects when responses are MAR
Which Scottish counties have a high risk of lip cancer?
Standardized mortality ratios
Random-intercept Poisson regression

Model specification

Estimation using gllamm

Prediction of standardized mortality ratios

Nonparametric maximum likelihood estimation

Specification

Estimation using gllamm

Prediction

Exercises

VII MODELS FOR SURVIVAL OR DURATION DATA
Introduction to models for survival or duration data (part VII)

DISCRETE-TIME SURVIVAL

Introduction
Single-level models for discrete-time survival data

Discrete-time hazard and discrete-time survival

Promotions data

Data expansion for discrete-time survival analysis
Estimation via regression models for dichotomous responses

Estimation using logit

Including time-constant covariates

Estimation using logit

Including time-varying covariates

Estimation using logit

Multiple absorbing events and competing risks

Estimation using mlogit

Handling left-truncated data

How does birth history affect child mortality?
Data expansion
Proportional hazards and interval-censoring
Complementary log-log models

Marginal baseline hazard

Estimation using cloglog

Including covariates

Estimation using cloglog

Random-intercept complementary log-log model

Model specification

Estimation using mecloglog

Population-averaged or marginal vs. subject-specific or conditional survival probabilities
Exercises

CONTINUOUS-TIME SURVIVAL

Introduction
What makes marriages fail?
Hazards and survival
Proportional hazards models

Piecewise exponential model

Estimation using streg
Estimation using poisson

Cox regression model

Estimation using stcox

Cox regression via Poisson regression for expanded data

Estimation using xtpoisson, fe

Approximate Cox regression: Poisson regression, smooth baseline hazard

Estimation using poisson

Accelerated failure-time models

Log-normal model

Estimation using streg
Estimation using stintreg

Time-varying covariates

Estimation using streg

Does nitrate reduce the risk of angina pectoris?
Marginal modeling

Cox regression with occasion-specific dummy variables

Estimation using stcox

Cox regression with occasion-specific baseline hazards

Estimation using stcox, strata

Approximate Cox regression

Estimation using poisson

Multilevel proportional hazards models

Cox regression with gamma shared frailty

Estimation using stcox, shared

Approximate Cox regression with log-normal shared frailty

Estimation using mepoisson

Approximate Cox regression with normal random intercept and coefficient

Estimation using mepoisson

Multilevel accelerated failure-time models

Log-normal model with gamma shared frailty

Estimation using streg

Log-normal model with log-normal shared frailty

Estimation using mestreg

Log-normal model with normal random intercept and random coefficient

Estimation using mestreg

Fixed-effects approach

Stratified Cox regression with subject-specific baseline hazards

Estimation using stcox, strata

Different approaches to recurrent-event data

Total time risk interval
Counting process risk interval
Gap-time risk interval

Exercises

VIII MODELS WITH NESTED AND CROSSED RANDOM EFFECTS

MODELS WITH NESTED AND CROSSED RANDOM EFFECTS

Introduction
Did the Guatemalan immunization campaign work?
A three-level random-intercept logistic regression model

Model specification
Measures of dependence and heterogeneity

Types of residual intraclass correlations of the latent responses
Types of median odds ratios

Three-stage formulation

Estimation

Using gllamm
Using xtmelogit

A three-level random-coefficient logistic regression model

Estimation

Using gllamm
Using xtmelogit

Prediction of random effects

Empirical Bayes prediction
Empirical Bayes modal prediction

Different kinds of predicted probabilities

Predicted population-averaged or marginal probabilities: New clusters
Predicted median or conditional probabilities
Predicted posterior mean probabilities: Existing clusters

Do salamanders from different populations mate successfully?
Crossed random-effects logistic regression

Setup for estimating crossed random-effects model using melogit
Approximate maximum likelihood estimation

Estimation using melogit

Bayesian estimation

Brief introduction to Bayesian inference
Priors for the salamander data
Estimation using bayes: melogit

Estimates compared
Fully Bayesian versus empirical Bayesian inference for random effects

Exercises

Syntax for gllamm, eq, and gllapred: The bare essentials
Syntax for gllamm
Syntax for gllapred
Syntax for gllasim
References

Author: Sophia Rabe-Hesketh and Anders Skrondal
Edition: Fourth Edition
ISBN-13: 978-1-59718-136-5