Multilevel and Longitudinal Modeling Using Stata, Third Edition, by Sophia Rabe-Hesketh and Anders Skrondal, looks specifically at Stata’s treatment of generalized linear mixed models, also known as multilevel or hierarchical models. These models are “mixed” because they include both fixed and random effects, and they are “generalized” because they accommodate continuous Gaussian responses as well as binary, count, and other types of limited dependent variables.
The third edition consists of two volumes, a result of the substantial expansion of material since the second edition, and has much to offer readers of the earlier editions. The text has nearly doubled in length since the second edition and nearly quadrupled since the original version, to almost 1,000 pages across the two volumes. Fully updated for Stata 12, the book has 5 new chapters, more than 20 new exercises, and many new datasets.
The two volumes comprise 16 chapters organized into eight parts.
Volume I is devoted to continuous Gaussian linear mixed models and has nine chapters organized into four parts. The first part reviews the methods of linear regression. The second part provides in-depth coverage of two-level models, the simplest extensions of a linear regression model.
Rabe-Hesketh and Skrondal begin with the comparatively simple random-intercept linear model without covariates, developing the mixed model from first principles and thereby familiarizing the reader with terminology, summarizing and relating the widely used estimation strategies, and providing historical perspective. Once the authors have established the mixed-model foundation, they smoothly extend it to random-intercept models with covariates and then discuss the various estimators (between, within, and random-effects). The authors then discuss models with random coefficients.
The third part of volume I describes models for longitudinal and panel data, including dynamic models, marginal models (a new chapter), and growth-curve models (a new chapter). The fourth and final part covers models with nested and crossed random effects, including a new chapter that describes higher-level nested models for continuous outcomes in more detail.
The mixed-model foundation and the in-depth coverage of mixed-model principles for continuous outcomes in volume I make it straightforward to transition to generalized linear mixed models for noncontinuous outcomes, which are described in volume II.
Volume II is devoted to generalized linear mixed models for binary, categorical, count, and survival outcomes. It has seven chapters, also organized into four parts. The first three parts in volume II cover models for categorical responses, including binary, ordinal, and nominal (a new chapter); models for count data; and models for survival data, including discrete-time and continuous-time (a new chapter) survival responses. The fourth and final part in volume II describes models with nested and crossed random effects, with an emphasis on binary outcomes.
The book has extensive applications of generalized linear mixed models performed in Stata. Rabe-Hesketh and Skrondal developed gllamm, a Stata program that can fit many latent-variable models, of which the generalized linear mixed model is a special case. As of version 10, Stata contains the xtmixed, xtmelogit, and xtmepoisson commands for fitting multilevel models, in addition to other xt commands for fitting standard random-intercept models. The types of models fit by these commands sometimes overlap; when this happens, the authors highlight the differences in syntax, data organization, and output for the two (or more) commands that can be used to fit the same model. The authors also point out the relative strengths and weaknesses of each command when used to fit the same model, based on considerations such as computational speed, accuracy, available predictions, and available postestimation statistics.
In summary, this book is the most complete, up-to-date depiction of Stata’s capacity for fitting generalized linear mixed models. The authors provide an ideal introduction for Stata users wishing to learn about this powerful data analysis tool.
List of Tables
List of Figures
Preface
Multilevel and longitudinal models: When and why?
I PRELIMINARIES
1. REVIEW OF LINEAR REGRESSION
Introduction
Is there gender discrimination in faculty salaries?
Independent-samples t test
One-way analysis of variance
Simple linear regression
Dummy variables
Multiple linear regression
Interactions
Dummy variables for more than two groups
Other types of interactions
Interaction between dummy variables
Interaction between continuous covariates
Nonlinear effects
Residual diagnostics
Causal and noncausal interpretations of regression coefficients
Regression as conditional expectation
Regression as structural model
Summary and further reading
Exercises
II TWO-LEVEL MODELS
2. VARIANCE-COMPONENTS MODELS
Introduction
How reliable are peak-expiratory-flow measurements?
Inspecting within-subject dependence
The variance-components model
Model specification
Path diagram
Between-subject heterogeneity
Within-subject dependence
Intraclass correlation
Intraclass correlation versus Pearson correlation
Estimation using Stata
Data preparation: Reshaping to long form
Using xtreg
Using xtmixed
Hypothesis tests and confidence intervals
Hypothesis test and confidence interval for the population mean
Hypothesis test and confidence interval for the between-cluster variance
Likelihood-ratio test
F test
Confidence intervals
Model as data-generating mechanism
Fixed versus random effects
Crossed versus nested effects
Parameter estimation
Model assumptions
Mean structure and covariance structure
Distributional assumptions
Different estimation methods
Inference for β
Estimate and standard error: Balanced case
Estimate: Unbalanced case
Assigning values to the random intercepts
Maximum “likelihood” estimation
Implementation via OLS regression
Implementation via the mean total residual
Empirical Bayes prediction
Empirical Bayes standard errors
Comparative standard errors
Diagnostic standard errors
Summary and further reading
Exercises
3. RANDOM-INTERCEPT MODELS WITH COVARIATES
Introduction
Does smoking during pregnancy affect birthweight?
Data structure and descriptive statistics
The linear random-intercept model with covariates
Model specification
Model assumptions
Mean structure
Residual variance and intraclass correlation
Graphical illustration of random-intercept model
Estimation using Stata
Using xtreg
Using xtmixed
Coefficients of determination or variance explained
Hypothesis tests and confidence intervals
Hypothesis tests for regression coefficients
Hypothesis tests for individual regression coefficients
Joint hypothesis tests for several regression coefficients
Predicted means and confidence intervals
Hypothesis test for random-intercept variance
Between and within effects of level-1 covariates
Between-mother effects
Within-mother effects
Relations among estimators
Level-2 endogeneity and cluster-level confounding
Allowing for different within and between effects
Hausman endogeneity test
Fixed versus random effects revisited
Assigning values to random effects: Residual diagnostics
More on statistical inference
Overview of estimation methods
Consequences of using standard regression modeling for clustered data
Power and sample-size determination
Summary and further reading
Exercises
4. RANDOM-COEFFICIENT MODELS
Introduction
How effective are different schools?
Separate linear regressions for each school
Specification and interpretation of a random-coefficient model
Specification of a random-coefficient model
Interpretation of the random-effects variances and covariances
Estimation using xtmixed
Random-intercept model
Random-coefficient model
Testing the slope variance
Interpretation of estimates
Assigning values to the random intercepts and slopes
Maximum “likelihood” estimation
Empirical Bayes prediction
Model visualization
Residual diagnostics
Inferences for individual schools
Two-stage model formulation
Some warnings about random-coefficient models
Meaningful specification
Many random coefficients
Convergence problems
Lack of identification
Summary and further reading
Exercises
III MODELS FOR LONGITUDINAL AND PANEL DATA
Introduction to models for longitudinal and panel data (part III)
5. SUBJECT-SPECIFIC EFFECTS AND DYNAMIC MODELS
Introduction
Conventional random-intercept model
Random-intercept models accommodating endogenous covariates
Consistent estimation of effects of endogenous time-varying covariates
Consistent estimation of effects of endogenous time-varying and endogenous time-constant covariates
Fixed-intercept model
Using xtreg or regress with a differencing operator
Using anova
Random-coefficient model
Fixed-coefficient model
Lagged-response or dynamic models
Conventional lagged-response model
Lagged-response model with subject-specific intercepts
Missing data and dropout
Maximum likelihood estimation under MAR: A simulation
Summary and further reading
Exercises
6. MARGINAL MODELS
Introduction
Mean structure
Covariance structures
Unstructured covariance matrix
Random-intercept or compound symmetric/exchangeable structure
Random-coefficient structure
Autoregressive and exponential structures
Moving-average residual structure
Banded and Toeplitz structures
Hybrid and complex marginal models
Random effects and correlated level-1 residuals
Heteroskedastic level-1 residuals over occasions
Heteroskedastic level-1 residuals over groups
Different covariance matrices over groups
Comparing the fit of marginal models
Generalized estimating equations (GEE)
Marginal modeling with few units and many occasions
Is a highly organized labor market beneficial for economic growth?
Marginal modeling for long panels
Fitting marginal models for long panels in Stata
Summary and further reading
Exercises
7. GROWTH-CURVE MODELS
Introduction
How do children grow?
Observed growth trajectories
Models for nonlinear growth
Polynomial models
Fitting the models
Predicting the mean trajectory
Predicting trajectories for individual children
Piecewise linear models
Fitting the models
Predicting the mean trajectory
Two-stage model formulation
Heteroskedasticity
Heteroskedasticity at level 1
Heteroskedasticity at level 2
How does reading improve from kindergarten through third grade?
Growth-curve model as a structural equation model
Estimation using sem
Estimation using xtmixed
Summary and further reading
Exercises
IV MODELS WITH NESTED AND CROSSED RANDOM EFFECTS
8. HIGHER-LEVEL MODELS WITH NESTED RANDOM EFFECTS
Introduction
Do peak-expiratory-flow measurements vary between methods within subjects?
Inspecting sources of variability
Three-level variance-components models
Different types of intraclass correlation
Estimation using xtmixed
Empirical Bayes prediction
Testing variance components
Crossed versus nested random effects revisited
Does nutrition affect cognitive development of Kenyan children?
Describing and plotting three-level data
Data structure and missing data
Level-1 variables
Level-2 variables
Level-3 variables
Plotting growth trajectories
Three-level random-intercept model
Model specification: Reduced form
Model specification: Three-stage formulation
Estimation using xtmixed
Three-level random-coefficient models
Random coefficient at the child level
Random coefficient at the child and school levels
Residual diagnostics and predictions
Summary and further reading
Exercises
9. CROSSED RANDOM EFFECTS
Introduction
How does investment depend on expected profit and capital stock?
A two-way error-components model
Model specification
Residual variances, covariances, and intraclass correlations
Longitudinal correlations
Cross-sectional correlations
Estimation using xtmixed
Prediction
How much do primary and secondary schools affect attainment at age 16?
Data structure
Additive crossed random-effects model
Specification
Estimation using xtmixed
Crossed random-effects model with random interaction
Model specification
Intraclass correlations
Estimation using xtmixed
Testing variance components
Some diagnostics
A trick requiring fewer random effects
Summary and further reading
Exercises
V MODELS FOR CATEGORICAL RESPONSES
10. DICHOTOMOUS OR BINARY RESPONSES
Introduction
Single-level logit and probit regression models for dichotomous responses
Generalized linear model formulation
Latent-response formulation
Logistic regression
Probit regression
Which treatment is best for toenail infection?
Longitudinal data structure
Proportions and fitted population-averaged or marginal probabilities
Random-intercept logistic regression
Model specification
Reduced-form specification
Two-stage formulation
Estimation of random-intercept logistic models
Using xtlogit
Using xtmelogit
Using gllamm
Subject-specific or conditional vs. population-averaged or marginal relationships
Measures of dependence and heterogeneity
Conditional or residual intraclass correlation of the latent responses
Median odds ratio
Measures of association for observed responses at median fixed part of the model
Inference for random-intercept logistic models
Tests and confidence intervals for odds ratios
Tests of variance components
Maximum likelihood estimation
Adaptive quadrature
Some speed and accuracy considerations
Advice for speeding up estimation in gllamm
Assigning values to random effects
Maximum “likelihood” estimation
Empirical Bayes prediction
Empirical Bayes modal prediction
Different kinds of predicted probabilities
Predicted population-averaged or marginal probabilities
Predicted subject-specific probabilities
Predictions for hypothetical subjects: Conditional probabilities
Predictions for the subjects in the sample: Posterior mean probabilities
Other approaches to clustered dichotomous data
Conditional logistic regression
Generalized estimating equations (GEE)
Summary and further reading
Exercises
11. ORDINAL RESPONSES
Introduction
Single-level cumulative models for ordinal responses
Generalized linear model formulation
Latent-response formulation
Proportional odds
Identification
Are antipsychotic drugs effective for patients with schizophrenia?
Longitudinal data structure and graphs
Longitudinal data structure
Plotting cumulative proportions
Plotting cumulative sample logits and transforming the time scale
A single-level proportional odds model
Model specification
Estimation using Stata
A random-intercept proportional odds model
Model specification
Estimation using Stata
Measures of dependence and heterogeneity
Residual intraclass correlation of latent responses
Median odds ratio
A random-coefficient proportional odds model
Model specification
Estimation using gllamm
Different kinds of predicted probabilities
Predicted population-averaged or marginal probabilities
Predicted subject-specific probabilities: Posterior mean
Do experts differ in their grading of student essays?
A random-intercept probit model with grader bias
Model specification
Estimation using gllamm
Including grader-specific measurement error variances
Model specification
Estimation using gllamm
Including grader-specific thresholds
Model specification
Estimation using gllamm
Other link functions
Cumulative complementary log-log model
Continuation-ratio logit model
Adjacent-category logit model
Baseline-category logit and stereotype models
Summary and further reading
Exercises
12. NOMINAL RESPONSES AND DISCRETE CHOICE
Introduction
Single-level models for nominal responses
Multinomial logit models
Conditional logit models
Classical conditional logit models
Conditional logit models also including covariates that vary only over units
Independence from irrelevant alternatives
Utility-maximization formulation
Does marketing affect choice of yogurt?
Single-level conditional logit models
Conditional logit models with alternative-specific intercepts
Multilevel conditional logit models
Preference heterogeneity: Brand-specific random intercepts
Response heterogeneity: Marketing variables with random coefficients
Preference and response heterogeneity
Estimation using gllamm
Estimation using mixlogit
Prediction of random effects and response probabilities
Summary and further reading
Exercises
VI MODELS FOR COUNTS
13. COUNTS
Introduction
What are counts?
Counts versus proportions
Counts as aggregated event-history data
Single-level Poisson models for counts
Did the German health-care reform reduce the number of doctor visits?
Longitudinal data structure
Single-level Poisson regression
Model specification
Estimation using Stata
Random-intercept Poisson regression
Model specification
Measures of dependence and heterogeneity
Estimation using Stata
Using xtpoisson
Using xtmepoisson
Using gllamm
Random-coefficient Poisson regression
Model specification
Estimation using Stata
Using xtmepoisson
Using gllamm
Interpretation of estimates
Overdispersion in single-level models
Normally distributed random intercept
Negative binomial models
Mean dispersion or NB2
Constant dispersion or NB1
Quasilikelihood
Level-1 overdispersion in two-level models
Other approaches to two-level count data
Conditional Poisson regression
Conditional negative binomial regression
Generalized estimating equations
Marginal and conditional effects when responses are MAR
Which Scottish counties have a high risk of lip cancer?
Standardized mortality ratios
Random-intercept Poisson regression
Model specification
Estimation using gllamm
Prediction of standardized mortality ratios
Nonparametric maximum likelihood estimation
Specification
Estimation using gllamm
Prediction
Summary and further reading
Exercises
VII MODELS FOR SURVIVAL OR DURATION DATA
Introduction to models for survival or duration data (part VII)
14. DISCRETE-TIME SURVIVAL
Introduction
Single-level models for discrete-time survival data
Discrete-time hazard and discrete-time survival
Data expansion for discrete-time survival analysis
Estimation via regression models for dichotomous responses
Including covariates
Time-constant covariates
Time-varying covariates
Multiple absorbing events and competing risks
Handling left-truncated data
How does birth history affect child mortality?
Data expansion
Proportional hazards and interval-censoring
Complementary log-log models
A random-intercept complementary log-log model
Model specification
Estimation using Stata
Population-averaged or marginal vs. subject-specific or conditional survival probabilities
Summary and further reading
Exercises
15. CONTINUOUS-TIME SURVIVAL
Introduction
What makes marriages fail?
Hazards and survival
Proportional hazards models
Piecewise exponential model
Cox regression model
Poisson regression with smooth baseline hazard
Accelerated failure-time models
Log-normal model
Time-varying covariates
Does nitrate reduce the risk of angina pectoris?
Marginal modeling
Cox regression
Poisson regression with smooth baseline hazard
Multilevel proportional hazards models
Cox regression with gamma shared frailty
Poisson regression with normal random intercepts
Poisson regression with normal random intercept and random coefficient
Multilevel accelerated failure-time models
Log-normal model with gamma shared frailty
Log-normal model with log-normal shared frailty
A fixed-effects approach
Cox regression with subject-specific baseline hazards
Different approaches to recurrent-event data
Total time
Counting process
Gap time
Summary and further reading
Exercises
VIII MODELS WITH NESTED AND CROSSED RANDOM EFFECTS
16. MODELS WITH NESTED AND CROSSED RANDOM EFFECTS
Introduction
Did the Guatemalan immunization campaign work?
A three-level random-intercept logistic regression model
Model specification
Measures of dependence and heterogeneity
Types of residual intraclass correlations of the latent responses
Types of median odds ratios
Three-stage formulation
Estimation of three-level random-intercept logistic regression models
Using gllamm
Using xtmelogit
A three-level random-coefficient logistic regression model
Estimation of three-level random-coefficient logistic regression models
Using gllamm
Using xtmelogit
Prediction of random effects
Empirical Bayes prediction
Empirical Bayes modal prediction
Different kinds of predicted probabilities
Predicted population-averaged or marginal probabilities: New clusters
Predicted median or conditional probabilities
Predicted posterior mean probabilities: Existing clusters
Do salamanders from different populations mate successfully?
Crossed random-effects logistic regression
Summary and further reading
Exercises
A. SYNTAX FOR GLLAMM, EQ, AND GLLAPRED: THE BARE ESSENTIALS
B. SYNTAX FOR GLLAMM
C. SYNTAX FOR GLLAPRED
D. SYNTAX FOR GLLASIM