*Multilevel and Longitudinal Modeling Using Stata, Third Edition*, by Sophia Rabe-Hesketh and Anders Skrondal, looks specifically at Stata’s treatment of generalized linear mixed models, also known as multilevel or hierarchical models. These models are “mixed” because they allow fixed and random effects, and they are “generalized” because they are appropriate for continuous Gaussian responses as well as binary, count, and other types of limited dependent variables.

The third edition spans two volumes, reflecting a substantial expansion of the second edition, and has much to offer readers of the earlier editions. The text has nearly doubled in length since the second edition and nearly quadrupled since the original version, to almost 1,000 pages across the two volumes. Fully updated for Stata 12, the book has 5 new chapters, more than 20 new exercises, and many new datasets.

The two volumes comprise 16 chapters organized into eight parts.

Volume I is devoted to continuous Gaussian linear mixed models and has nine chapters organized into four parts. The first part reviews the methods of linear regression. The second part provides in-depth coverage of two-level models, the simplest extensions of a linear regression model.

Rabe-Hesketh and Skrondal begin with the comparatively simple random-intercept linear model without covariates, developing the mixed model from principles and thereby familiarizing the reader with terminology, summarizing and relating the widely used estimating strategies, and providing historical perspective. Once the authors have established the mixed-model foundation, they smoothly generalize to random-intercept models with covariates and then to a discussion of the various estimators (between, within, and random-effects). The authors then discuss models with random coefficients.
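
As a point of reference, the random-intercept linear model without covariates can be sketched as follows. The notation is assumed here for illustration and may differ from the book's: β is the population mean, ζ_j is the random intercept for cluster j, ε_ij is the level-1 residual for unit i in cluster j, and ψ and θ are their respective variances, with the two error terms taken to be independent.

```latex
% Random-intercept model without covariates (illustrative notation)
y_{ij} = \beta + \zeta_j + \epsilon_{ij}, \qquad
\zeta_j \sim N(0, \psi), \quad
\epsilon_{ij} \sim N(0, \theta)

% Implied variance, within-cluster covariance, and intraclass correlation
\operatorname{Var}(y_{ij}) = \psi + \theta, \qquad
\operatorname{Cov}(y_{ij}, y_{i'j}) = \psi \quad (i \neq i'), \qquad
\rho = \frac{\psi}{\psi + \theta}
```

Under these assumptions, the intraclass correlation ρ is the share of the total variance that is attributable to differences between clusters.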

The third part of volume I describes models for longitudinal and panel data, including dynamic models, marginal models (a new chapter), and growth-curve models (a new chapter). The fourth and final part covers models with nested and crossed random effects, including a new chapter describing in more detail higher-level nested models for continuous outcomes.

The mixed-model foundation and the in-depth coverage of the mixed-model principles provided in volume I for continuous outcomes make it straightforward to transition to generalized linear mixed models for noncontinuous outcomes, which are described in volume II.

Volume II is devoted to generalized linear mixed models for binary, categorical, count, and survival outcomes. The second volume has seven chapters also organized into four parts. The first three parts in volume II cover models for categorical responses, including binary, ordinal, and nominal (a new chapter); models for count data; and models for survival data, including discrete-time and continuous-time (a new chapter) survival responses. The fourth and final part in volume II describes models with nested and crossed random effects with an emphasis on binary outcomes.

The book has extensive applications of generalized linear mixed models performed in Stata. Rabe-Hesketh and Skrondal developed gllamm, a Stata program that can fit many latent-variable models, of which the generalized linear mixed model is a special case. As of version 10, Stata contains the xtmixed, xtmelogit, and xtmepoisson commands for fitting multilevel models, in addition to other xt commands for fitting standard random-intercept models. The types of models fit by these commands sometimes overlap; when this happens, the authors highlight the differences in syntax, data organization, and output for the two (or more) commands that can be used to fit the same model. The authors also point out the relative strengths and weaknesses of each command when used to fit the same model, based on considerations such as computational speed, accuracy, available predictions, and available postestimation statistics.

In summary, this book is the most complete, up-to-date depiction of Stata’s capacity for fitting generalized linear mixed models. The authors provide an ideal introduction for Stata users wishing to learn about this powerful data analysis tool.

List of Tables

List of Figures

Preface

Multilevel and longitudinal models: When and why?

**I PRELIMINARIES**

1. REVIEW OF LINEAR REGRESSION

Introduction

Is there gender discrimination in faculty salaries?

Independent-samples t test

One-way analysis of variance

Simple linear regression

Dummy variables

Multiple linear regression

Interactions

Dummy variables for more than two groups

Other types of interactions

Interaction between dummy variables

Interaction between continuous covariates

Nonlinear effects

Residual diagnostics

Causal and noncausal interpretations of regression coefficients

Regression as conditional expectation

Regression as structural model

Summary and further reading

Exercises

**II TWO-LEVEL MODELS**

2. VARIANCE-COMPONENTS MODELS

Introduction

How reliable are peak-expiratory-flow measurements?

Inspecting within-subject dependence

The variance-components model

Model specification

Path diagram

Between-subject heterogeneity

Within-subject dependence

Intraclass correlation

Intraclass correlation versus Pearson correlation

Estimation using Stata

Data preparation: Reshaping to long form

Using xtreg

Using xtmixed

Hypothesis tests and confidence intervals

Hypothesis test and confidence interval for the population mean

Hypothesis test and confidence interval for the between-cluster variance

Likelihood-ratio test

F test

Confidence intervals

Model as data-generating mechanism

Fixed versus random effects

Crossed versus nested effects

Parameter estimation

Model assumptions

Mean structure and covariance structure

Distributional assumptions

Different estimation methods

Inference for β

Estimate and standard error: Balanced case

Estimate: Unbalanced case

Assigning values to the random intercepts

Maximum “likelihood” estimation

Implementation via OLS regression

Implementation via the mean total residual

Empirical Bayes prediction

Empirical Bayes standard errors

Comparative standard errors

Diagnostic standard errors

Summary and further reading

Exercises

3. RANDOM-INTERCEPT MODELS WITH COVARIATES

Introduction

Does smoking during pregnancy affect birthweight?

Data structure and descriptive statistics

The linear random-intercept model with covariates

Model specification

Model assumptions

Mean structure

Residual variance and intraclass correlation

Graphical illustration of random-intercept model

Estimation using Stata

Using xtreg

Using xtmixed

Coefficients of determination or variance explained

Hypothesis tests and confidence intervals

Hypothesis tests for regression coefficients

Hypothesis tests for individual regression coefficients

Joint hypothesis tests for several regression coefficients

Predicted means and confidence intervals

Hypothesis test for random-intercept variance

Between and within effects of level-1 covariates

Between-mother effects

Within-mother effects

Relations among estimators

Level-2 endogeneity and cluster-level confounding

Allowing for different within and between effects

Hausman endogeneity test

Fixed versus random effects revisited

Assigning values to random effects: Residual diagnostics

More on statistical inference

Overview of estimation methods

Consequences of using standard regression modeling for clustered data

Power and sample-size determination

Summary and further reading

Exercises

4. RANDOM-COEFFICIENT MODELS

Introduction

How effective are different schools?

Separate linear regressions for each school

Specification and interpretation of a random-coefficient model

Specification of a random-coefficient model

Interpretation of the random-effects variances and covariances

Estimation using xtmixed

Random-intercept model

Random-coefficient model

Testing the slope variance

Interpretation of estimates

Assigning values to the random intercepts and slopes

Maximum “likelihood” estimation

Empirical Bayes prediction

Model visualization

Residual diagnostics

Inferences for individual schools

Two-stage model formulation

Some warnings about random-coefficient models

Meaningful specification

Many random coefficients

Convergence problems

Lack of identification

Summary and further reading

Exercises

**III MODELS FOR LONGITUDINAL AND PANEL DATA**

**Introduction to models for longitudinal and panel data (part III)**

5. SUBJECT-SPECIFIC EFFECTS AND DYNAMIC MODELS

Introduction

Conventional random-intercept model

Random-intercept models accommodating endogenous covariates

Consistent estimation of effects of endogenous time-varying covariates

Consistent estimation of effects of endogenous time-varying and endogenous time-constant covariates

Fixed-intercept model

Using xtreg or regress with a differencing operator

Using anova

Random-coefficient model

Fixed-coefficient model

Lagged-response or dynamic models

Conventional lagged-response model

Lagged-response model with subject-specific intercepts

Missing data and dropout

Maximum likelihood estimation under MAR: A simulation

Summary and further reading

Exercises

6. MARGINAL MODELS

Introduction

Mean structure

Covariance structures

Unstructured covariance matrix

Random-intercept or compound symmetric/exchangeable structure

Random-coefficient structure

Autoregressive and exponential structures

Moving-average residual structure

Banded and Toeplitz structures

Hybrid and complex marginal models

Random effects and correlated level-1 residuals

Heteroskedastic level-1 residuals over occasions

Heteroskedastic level-1 residuals over groups

Different covariance matrices over groups

Comparing the fit of marginal models

Generalized estimating equations (GEE)

Marginal modeling with few units and many occasions

Is a highly organized labor market beneficial for economic growth?

Marginal modeling for long panels

Fitting marginal models for long panels in Stata

Summary and further reading

Exercises

7. GROWTH-CURVE MODELS

Introduction

How do children grow?

Observed growth trajectories

Models for nonlinear growth

Polynomial models

Fitting the models

Predicting the mean trajectory

Predicting trajectories for individual children

Piecewise linear models

Fitting the models

Predicting the mean trajectory

Two-stage model formulation

Heteroskedasticity

Heteroskedasticity at level 1

Heteroskedasticity at level 2

How does reading improve from kindergarten through third grade?

Growth-curve model as a structural equation model

Estimation using sem

Estimation using xtmixed

Summary and further reading

Exercises

**IV MODELS WITH NESTED AND CROSSED RANDOM EFFECTS**

8. HIGHER-LEVEL MODELS WITH NESTED RANDOM EFFECTS

Introduction

Do peak-expiratory-flow measurements vary between methods within subjects?

Inspecting sources of variability

Three-level variance-components models

Different types of intraclass correlation

Estimation using xtmixed

Empirical Bayes prediction

Testing variance components

Crossed versus nested random effects revisited

Does nutrition affect cognitive development of Kenyan children?

Describing and plotting three-level data

Data structure and missing data

Level-1 variables

Level-2 variables

Level-3 variables

Plotting growth trajectories

Three-level random-intercept model

Model specification: Reduced form

Model specification: Three-stage formulation

Estimation using xtmixed

Three-level random-coefficient models

Random coefficient at the child level

Random coefficient at the child and school levels

Residual diagnostics and predictions

Summary and further reading

Exercises

9. CROSSED RANDOM EFFECTS

Introduction

How does investment depend on expected profit and capital stock?

A two-way error-components model

Model specification

Residual variances, covariances, and intraclass correlations

Longitudinal correlations

Cross-sectional correlations

Estimation using xtmixed

Prediction

How much do primary and secondary schools affect attainment at age 16?

Data structure

Additive crossed random-effects model

Specification

Estimation using xtmixed

Crossed random-effects model with random interaction

Model specification

Intraclass correlations

Estimation using xtmixed

Testing variance components

Some diagnostics

A trick requiring fewer random effects

Summary and further reading

Exercises

**V MODELS FOR CATEGORICAL RESPONSES**

10. DICHOTOMOUS OR BINARY RESPONSES

Introduction

Single-level logit and probit regression models for dichotomous responses

Generalized linear model formulation

Latent-response formulation

Logistic regression

Probit regression

Which treatment is best for toenail infection?

Longitudinal data structure

Proportions and fitted population-averaged or marginal probabilities

Random-intercept logistic regression

Model specification

Reduced-form specification

Two-stage formulation

Estimation of random-intercept logistic models

Using xtlogit

Using xtmelogit

Using gllamm

Subject-specific or conditional vs. population-averaged or marginal relationships

Measures of dependence and heterogeneity

Conditional or residual intraclass correlation of the latent responses

Median odds ratio

Measures of association for observed responses at median fixed part of the model

Inference for random-intercept logistic models

Tests and confidence intervals for odds ratios

Tests of variance components

Maximum likelihood estimation

Adaptive quadrature

Some speed and accuracy considerations

Advice for speeding up estimation in gllamm

Assigning values to random effects

Maximum “likelihood” estimation

Empirical Bayes prediction

Empirical Bayes modal prediction

Different kinds of predicted probabilities

Predicted population-averaged or marginal probabilities

Predicted subject-specific probabilities

Predictions for hypothetical subjects: Conditional probabilities

Predictions for the subjects in the sample: Posterior mean probabilities

Other approaches to clustered dichotomous data

Conditional logistic regression

Generalized estimating equations (GEE)

Summary and further reading

Exercises

11. ORDINAL RESPONSES

Introduction

Single-level cumulative models for ordinal responses

Generalized linear model formulation

Latent-response formulation

Proportional odds

Identification

Are antipsychotic drugs effective for patients with schizophrenia?

Longitudinal data structure and graphs

Longitudinal data structure

Plotting cumulative proportions

Plotting cumulative sample logits and transforming the time scale

A single-level proportional odds model

Model specification

Estimation using Stata

A random-intercept proportional odds model

Model specification

Estimation using Stata

Measures of dependence and heterogeneity

Residual intraclass correlation of latent responses

Median odds ratio

A random-coefficient proportional odds model

Model specification

Estimation using gllamm

Different kinds of predicted probabilities

Predicted population-averaged or marginal probabilities

Predicted subject-specific probabilities: Posterior mean

Do experts differ in their grading of student essays?

A random-intercept probit model with grader bias

Model specification

Estimation using gllamm

Including grader-specific measurement error variances

Model specification

Estimation using gllamm

Including grader-specific thresholds

Model specification

Estimation using gllamm

Other link functions

Cumulative complementary log-log model

Continuation-ratio logit model

Adjacent-category logit model

Baseline-category logit and stereotype models

Summary and further reading

Exercises

12. NOMINAL RESPONSES AND DISCRETE CHOICE

Introduction

Single-level models for nominal responses

Multinomial logit models

Conditional logit models

Classical conditional logit models

Conditional logit models also including covariates that vary only over units

Independence from irrelevant alternatives

Utility-maximization formulation

Does marketing affect choice of yogurt?

Single-level conditional logit models

Conditional logit models with alternative-specific intercepts

Multilevel conditional logit models

Preference heterogeneity: Brand-specific random intercepts

Response heterogeneity: Marketing variables with random coefficients

Preference and response heterogeneity

Estimation using gllamm

Estimation using mixlogit

Prediction of random effects and response probabilities

Summary and further reading

Exercises

**VI MODELS FOR COUNTS**

13. COUNTS

Introduction

What are counts?

Counts versus proportions

Counts as aggregated event-history data

Single-level Poisson models for counts

Did the German health-care reform reduce the number of doctor visits?

Longitudinal data structure

Single-level Poisson regression

Model specification

Estimation using Stata

Random-intercept Poisson regression

Model specification

Measures of dependence and heterogeneity

Estimation using Stata

Using xtpoisson

Using xtmepoisson

Using gllamm

Random-coefficient Poisson regression

Model specification

Estimation using Stata

Using xtmepoisson

Using gllamm

Interpretation of estimates

Overdispersion in single-level models

Normally distributed random intercept

Negative binomial models

Mean dispersion or NB2

Constant dispersion or NB1

Quasilikelihood

Level-1 overdispersion in two-level models

Other approaches to two-level count data

Conditional Poisson regression

Conditional negative binomial regression

Generalized estimating equations

Marginal and conditional effects when responses are MAR

Which Scottish counties have a high risk of lip cancer?

Standardized mortality ratios

Random-intercept Poisson regression

Model specification

Estimation using gllamm

Prediction of standardized mortality ratios

Nonparametric maximum likelihood estimation

Specification

Estimation using gllamm

Prediction

Summary and further reading

Exercises

**VII MODELS FOR SURVIVAL OR DURATION DATA**

**Introduction to models for survival or duration data (part VII)**

14. DISCRETE-TIME SURVIVAL

Introduction

Single-level models for discrete-time survival data

Discrete-time hazard and discrete-time survival

Data expansion for discrete-time survival analysis

Estimation via regression models for dichotomous responses

Including covariates

Time-constant covariates

Time-varying covariates

Multiple absorbing events and competing risks

Handling left-truncated data

How does birth history affect child mortality?

Data expansion

Proportional hazards and interval-censoring

Complementary log-log models

A random-intercept complementary log-log model

Model specification

Estimation using Stata

Population-averaged or marginal vs. subject-specific or conditional survival probabilities

Summary and further reading

Exercises

15. CONTINUOUS-TIME SURVIVAL

Introduction

What makes marriages fail?

Hazards and survival

Proportional hazards models

Piecewise exponential model

Cox regression model

Poisson regression with smooth baseline hazard

Accelerated failure-time models

Log-normal model

Time-varying covariates

Does nitrate reduce the risk of angina pectoris?

Marginal modeling

Cox regression

Poisson regression with smooth baseline hazard

Multilevel proportional hazards models

Cox regression with gamma shared frailty

Poisson regression with normal random intercepts

Poisson regression with normal random intercept and random coefficient

Multilevel accelerated failure-time models

Log-normal model with gamma shared frailty

Log-normal model with log-normal shared frailty

A fixed-effects approach

Cox regression with subject-specific baseline hazards

Different approaches to recurrent-event data

Total time

Counting process

Gap time

Summary and further reading

Exercises

**VIII MODELS WITH NESTED AND CROSSED RANDOM EFFECTS**

16. MODELS WITH NESTED AND CROSSED RANDOM EFFECTS

Introduction

Did the Guatemalan immunization campaign work?

A three-level random-intercept logistic regression model

Model specification

Measures of dependence and heterogeneity

Types of residual intraclass correlations of the latent responses

Types of median odds ratios

Three-stage formulation

Estimation of three-level random-intercept logistic regression models

Using gllamm

Using xtmelogit

A three-level random-coefficient logistic regression model

Estimation of three-level random-coefficient logistic regression models

Using gllamm

Using xtmelogit

Prediction of random effects

Empirical Bayes prediction

Empirical Bayes modal prediction

Different kinds of predicted probabilities

Predicted population-averaged or marginal probabilities: New clusters

Predicted median or conditional probabilities

Predicted posterior mean probabilities: Existing clusters

Do salamanders from different populations mate successfully?

Crossed random-effects logistic regression

Summary and further reading

Exercises

A. SYNTAX FOR GLLAMM, EQ, AND GLLAPRED: THE BARE ESSENTIALS

B. SYNTAX FOR GLLAMM

C. SYNTAX FOR GLLAPRED

D. SYNTAX FOR GLLASIM