Multilevel and Longitudinal Modeling Using Stata, Volumes I and II

Multilevel and Longitudinal Modeling Using Stata, Third Edition, by Sophia Rabe-Hesketh and Anders Skrondal, looks specifically at Stata’s treatment of generalized linear mixed models, also known as multilevel or hierarchical models. These models are “mixed” because they allow fixed and random effects, and they are “generalized” because they are appropriate for continuous Gaussian responses as well as binary, count, and other types of limited dependent variables.
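As a rough sketch of what "generalized" and "mixed" mean here, a two-level generalized linear mixed model can be written as follows. The notation is an assumption made for illustration and is not taken from this description: unit i is nested in cluster j, g is a link function, x and z are covariate vectors, the fixed effects are \beta, and the random effects are \zeta_j.

```latex
g\bigl( E[\, y_{ij} \mid \mathbf{x}_{ij}, \mathbf{z}_{ij}, \boldsymbol{\zeta}_j \,] \bigr)
  = \mathbf{x}_{ij}'\boldsymbol{\beta} + \mathbf{z}_{ij}'\boldsymbol{\zeta}_j ,
\qquad
\boldsymbol{\zeta}_j \sim N(\mathbf{0}, \boldsymbol{\Psi})
```

With an identity link and a Gaussian response this reduces to the linear mixed model of volume I. A logit or probit link gives models for binary responses, and a log link gives models for counts.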

 

The material in the third edition consists of two volumes, a result of the substantial expansion of material from the second edition, and has much to offer readers of the earlier editions. The text has almost doubled in length from the second edition and almost quadrupled in length from the original version, to almost 1,000 pages across the two volumes. Fully updated for Stata 12, the book has 5 new chapters, more than 20 new exercises, and many new datasets.

 

The two volumes comprise 16 chapters organized into eight parts.

 

Volume I is devoted to continuous Gaussian linear mixed models and has nine chapters organized into four parts. The first part reviews the methods of linear regression. The second part provides in-depth coverage of two-level models, the simplest extensions of a linear regression model.

 

Rabe-Hesketh and Skrondal begin with the comparatively simple random-intercept linear model without covariates, developing the mixed model from first principles and thereby familiarizing the reader with terminology, summarizing and relating the widely used estimating strategies, and providing historical perspective. Once the authors have established the mixed-model foundation, they smoothly generalize to random-intercept models with covariates and then to a discussion of the various estimators (between, within, and random-effects). The authors then discuss models with random coefficients.
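The progression described above can be sketched with three equations. The symbols are assumptions chosen for illustration: \zeta denotes cluster-level random effects with variance \psi, \epsilon denotes level-1 residuals with variance \theta, and x_{ij} is a single covariate.

```latex
% Random-intercept model without covariates (variance-components model)
y_{ij} = \beta + \zeta_j + \epsilon_{ij},
\qquad \zeta_j \sim N(0, \psi), \quad \epsilon_{ij} \sim N(0, \theta)

% Intraclass correlation implied by that model
\rho = \frac{\psi}{\psi + \theta}

% Random-intercept model with a covariate
y_{ij} = \beta_1 + \beta_2 x_{ij} + \zeta_j + \epsilon_{ij}

% Random-coefficient model: both intercept and slope vary over clusters
y_{ij} = (\beta_1 + \zeta_{1j}) + (\beta_2 + \zeta_{2j})\, x_{ij} + \epsilon_{ij}
```

Each step adds one ingredient to the previous model, which is why the simplest case is a natural starting point.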

 

The third part of volume I describes models for longitudinal and panel data, including dynamic models, marginal models (a new chapter), and growth-curve models (a new chapter). The fourth and final part covers models with nested and crossed random effects, including a new chapter describing in more detail higher-level nested models for continuous outcomes.

 

The mixed-model foundation and the in-depth coverage of the mixed-model principles provided in volume I for continuous outcomes make it straightforward to transition to generalized linear mixed models for noncontinuous outcomes, which are described in volume II.

 

Volume II is devoted to generalized linear mixed models for binary, categorical, count, and survival outcomes. The second volume has seven chapters also organized into four parts. The first three parts in volume II cover models for categorical responses, including binary, ordinal, and nominal (a new chapter); models for count data; and models for survival data, including discrete-time and continuous-time (a new chapter) survival responses. The fourth and final part in volume II describes models with nested and crossed random effects with an emphasis on binary outcomes.

 

The book has extensive applications of generalized mixed models performed in Stata. Rabe-Hesketh and Skrondal developed gllamm, a Stata program that can fit many latent-variable models, of which the generalized linear mixed model is a special case. As of version 10, Stata contains the xtmixed, xtmelogit, and xtmepoisson commands for fitting multilevel models, in addition to other xt commands for fitting standard random-intercept models. The types of models fit by these commands sometimes overlap; when this happens, the authors highlight the differences in syntax, data organization, and output for the two (or more) commands that can be used to fit the same model. The authors also point out the relative strengths and weaknesses of each command when used to fit the same model, based on considerations such as computational speed, accuracy, available predictions, and available postestimation statistics.
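To illustrate the kind of overlap mentioned above, the following is a minimal sketch of one linear random-intercept model fit with three different commands. The variable names y, x, and id are hypothetical and do not come from the book's datasets; the options shown are one reasonable choice, not necessarily the ones the authors use.

```stata
* Hypothetical variables: y (continuous response), x (covariate),
* id (cluster identifier)

* Random-intercept model by maximum likelihood with xtreg
xtreg y x, i(id) mle

* The same model with xtmixed; the part after || names the cluster variable
xtmixed y x || id:, mle

* The same model with gllamm, using adaptive quadrature
* (gllamm is user-written; install once with: ssc install gllamm)
gllamm y x, i(id) adapt
```

The three commands differ in how the clustering variable is declared and in the layout of the reported variance components, which are the kinds of differences the authors point out when more than one command fits the same model.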

 

In summary, this book is the most complete, up-to-date depiction of Stata’s capacity for fitting generalized linear mixed models. The authors provide an ideal introduction for Stata users wishing to learn about this powerful data analysis tool.

List of Tables
List of Figures
Preface
Multilevel and longitudinal models: When and why?

 

I PRELIMINARIES

 

1. REVIEW OF LINEAR REGRESSION

Introduction
Is there gender discrimination in faculty salaries?
Independent-samples t test
One-way analysis of variance
Simple linear regression
Dummy variables
Multiple linear regression
Interactions
Dummy variables for more than two groups
Other types of interactions

Interaction between dummy variables
Interaction between continuous covariates

Nonlinear effects
Residual diagnostics
Causal and noncausal interpretations of regression coefficients

Regression as conditional expectation
Regression as structural model

Summary and further reading
Exercises

 

II TWO-LEVEL MODELS

 

2. VARIANCE-COMPONENTS MODELS

Introduction
How reliable are peak-expiratory-flow measurements?
Inspecting within-subject dependence
The variance-components model

Model specification
Path diagram
Between-subject heterogeneity
Within-subject dependence

Intraclass correlation
Intraclass correlation versus Pearson correlation

Estimation using Stata

Data preparation: Reshaping to long form
Using xtreg
Using xtmixed

Hypothesis tests and confidence intervals

Hypothesis test and confidence interval for the population mean
Hypothesis test and confidence interval for the between-cluster variance

Likelihood-ratio test
F test

Confidence intervals

Model as data-generating mechanism
Fixed versus random effects
Crossed versus nested effects
Parameter estimation

Model assumptions

Mean structure and covariance structure
Distributional assumptions

Different estimation methods
Inference for β

Estimate and standard error: Balanced case
Estimate: Unbalanced case

Assigning values to the random intercepts

Maximum “likelihood” estimation

Implementation via OLS regression
Implementation via the mean total residual

Empirical Bayes prediction
Empirical Bayes standard errors

Comparative standard errors
Diagnostic standard errors

Summary and further reading
Exercises

 

3. RANDOM-INTERCEPT MODELS WITH COVARIATES

Introduction
Does smoking during pregnancy affect birthweight?

Data structure and descriptive statistics

The linear random-intercept model with covariates

Model specification
Model assumptions
Mean structure
Residual variance and intraclass correlation
Graphical illustration of random-intercept model

Estimation using Stata

Using xtreg
Using xtmixed

Coefficients of determination or variance explained
Hypothesis tests and confidence intervals

Hypothesis tests for regression coefficients

Hypothesis tests for individual regression coefficients
Joint hypothesis tests for several regression coefficients

Predicted means and confidence intervals
Hypothesis test for random-intercept variance

Between and within effects of level-1 covariates

Between-mother effects
Within-mother effects
Relations among estimators
Level-2 endogeneity and cluster-level confounding
Allowing for different within and between effects
Hausman endogeneity test

Fixed versus random effects revisited
Assigning values to random effects: Residual diagnostics
More on statistical inference

Overview of estimation methods
Consequences of using standard regression modeling for clustered data
Power and sample-size determination

Summary and further reading
Exercises

 

4. RANDOM-COEFFICIENT MODELS

Introduction
How effective are different schools?
Separate linear regressions for each school
Specification and interpretation of a random-coefficient model

Specification of a random-coefficient model
Interpretation of the random-effects variances and covariances

Estimation using xtmixed

Random-intercept model
Random-coefficient model

Testing the slope variance
Interpretation of estimates
Assigning values to the random intercepts and slopes

Maximum “likelihood” estimation
Empirical Bayes prediction
Model visualization
Residual diagnostics
Inferences for individual schools

Two-stage model formulation
Some warnings about random-coefficient models

Meaningful specification
Many random coefficients
Convergence problems

Lack of identification

Summary and further reading
Exercises

 

III MODELS FOR LONGITUDINAL AND PANEL DATA
Introduction to models for longitudinal and panel data (part III)

 

5. SUBJECT-SPECIFIC EFFECTS AND DYNAMIC MODELS

Introduction
Conventional random-intercept model
Random-intercept models accommodating endogenous covariates

Consistent estimation of effects of endogenous time-varying covariates
Consistent estimation of effects of endogenous time-varying and endogenous time-constant covariates

Fixed-intercept model

Using xtreg or regress with a differencing operator
Using anova

Random-coefficient model
Fixed-coefficient model
Lagged-response or dynamic models

Conventional lagged-response model
Lagged-response model with subject-specific intercepts

Missing data and dropout

Maximum likelihood estimation under MAR: A simulation

Summary and further reading
Exercises

 

6. MARGINAL MODELS

Introduction
Mean structure
Covariance structures

Unstructured covariance matrix
Random-intercept or compound symmetric/exchangeable structure
Random-coefficient structure
Autoregressive and exponential structures
Moving-average residual structure
Banded and Toeplitz structures

Hybrid and complex marginal models

Random effects and correlated level-1 residuals
Heteroskedastic level-1 residuals over occasions
Heteroskedastic level-1 residuals over groups
Different covariance matrices over groups

Comparing the fit of marginal models
Generalized estimating equations (GEE)
Marginal modeling with few units and many occasions

Is a highly organized labor market beneficial for economic growth?
Marginal modeling for long panels
Fitting marginal models for long panels in Stata

Summary and further reading
Exercises

 

7. GROWTH-CURVE MODELS

Introduction
How do children grow?

Observed growth trajectories

Models for nonlinear growth

Polynomial models

Fitting the models
Predicting the mean trajectory
Predicting trajectories for individual children

Piecewise linear models

Fitting the models
Predicting the mean trajectory

Two-stage model formulation
Heteroskedasticity

Heteroskedasticity at level 1
Heteroskedasticity at level 2

How does reading improve from kindergarten through third grade?
Growth-curve model as a structural equation model

Estimation using sem
Estimation using xtmixed

Summary and further reading
Exercises

 

IV MODELS WITH NESTED AND CROSSED RANDOM EFFECTS

 

8. HIGHER-LEVEL MODELS WITH NESTED RANDOM EFFECTS

Introduction
Do peak-expiratory-flow measurements vary between methods within subjects?
Inspecting sources of variability
Three-level variance-components models
Different types of intraclass correlation
Estimation using xtmixed
Empirical Bayes prediction
Testing variance components
Crossed versus nested random effects revisited
Does nutrition affect cognitive development of Kenyan children?
Describing and plotting three-level data

Data structure and missing data

Level-1 variables
Level-2 variables
Level-3 variables
Plotting growth trajectories

Three-level random-intercept model

Model specification: Reduced form
Model specification: Three-stage formulation

Estimation using xtmixed

Three-level random-coefficient models

Random coefficient at the child level
Random coefficient at the child and school levels

Residual diagnostics and predictions
Summary and further reading
Exercises

 

9. CROSSED RANDOM EFFECTS

Introduction
How does investment depend on expected profit and capital stock?
A two-way error-components model

Model specification
Residual variances, covariances, and intraclass correlations

Longitudinal correlations
Cross-sectional correlations

Estimation using xtmixed
Prediction

How much do primary and secondary schools affect attainment at age 16?
Data structure
Additive crossed random-effects model

Specification
Estimation using xtmixed

Crossed random-effects model with random interaction

Model specification
Intraclass correlations
Estimation using xtmixed
Testing variance components
Some diagnostics

A trick requiring fewer random effects
Summary and further reading
Exercises

 

V MODELS FOR CATEGORICAL RESPONSES

 

10. DICHOTOMOUS OR BINARY RESPONSES

Introduction
Single-level logit and probit regression models for dichotomous responses

Generalized linear model formulation
Latent-response formulation

Logistic regression
Probit regression

Which treatment is best for toenail infection?
Longitudinal data structure
Proportions and fitted population-averaged or marginal probabilities
Random-intercept logistic regression

Model specification

Reduced-form specification
Two-stage formulation

Estimation of random-intercept logistic models

Using xtlogit
Using xtmelogit
Using gllamm

Subject-specific or conditional vs. population-averaged or marginal relationships
Measures of dependence and heterogeneity

Conditional or residual intraclass correlation of the latent responses
Median odds ratio
Measures of association for observed responses at median fixed part of the model

Inference for random-intercept logistic models

Tests and confidence intervals for odds ratios
Tests of variance components

Maximum likelihood estimation

Adaptive quadrature
Some speed and accuracy considerations

Advice for speeding up estimation in gllamm

Assigning values to random effects

Maximum “likelihood” estimation
Empirical Bayes prediction
Empirical Bayes modal prediction

Different kinds of predicted probabilities

Predicted population-averaged or marginal probabilities
Predicted subject-specific probabilities

Predictions for hypothetical subjects: Conditional probabilities
Predictions for the subjects in the sample: Posterior mean probabilities

Other approaches to clustered dichotomous data

Conditional logistic regression
Generalized estimating equations (GEE)

Summary and further reading
Exercises

 

11. ORDINAL RESPONSES

Introduction
Single-level cumulative models for ordinal responses

Generalized linear model formulation
Latent-response formulation
Proportional odds
Identification

Are antipsychotic drugs effective for patients with schizophrenia?
Longitudinal data structure and graphs

Longitudinal data structure
Plotting cumulative proportions
Plotting cumulative sample logits and transforming the time scale

A single-level proportional odds model

Model specification
Estimation using Stata

A random-intercept proportional odds model

Model specification
Estimation using Stata
Measures of dependence and heterogeneity

Residual intraclass correlation of latent responses
Median odds ratio

A random-coefficient proportional odds model

Model specification
Estimation using gllamm

Different kinds of predicted probabilities

Predicted population-averaged or marginal probabilities
Predicted subject-specific probabilities: Posterior mean

Do experts differ in their grading of student essays?
A random-intercept probit model with grader bias

Model specification
Estimation using gllamm

Including grader-specific measurement error variances

Model specification
Estimation using gllamm

Including grader-specific thresholds

Model specification
Estimation using gllamm

Other link functions

Cumulative complementary log-log model
Continuation-ratio logit model
Adjacent-category logit model
Baseline-category logit and stereotype models

Summary and further reading
Exercises

 

12. NOMINAL RESPONSES AND DISCRETE CHOICE

Introduction
Single-level models for nominal responses

Multinomial logit models
Conditional logit models

Classical conditional logit models
Conditional logit models also including covariates that vary only over units

Independence from irrelevant alternatives
Utility-maximization formulation
Does marketing affect choice of yogurt?
Single-level conditional logit models

Conditional logit models with alternative-specific intercepts

Multilevel conditional logit models

Preference heterogeneity: Brand-specific random intercepts
Response heterogeneity: Marketing variables with random coefficients
Preference and response heterogeneity

Estimation using gllamm
Estimation using mixlogit

Prediction of random effects and response probabilities
Summary and further reading
Exercises

 

VI MODELS FOR COUNTS

 

13. COUNTS

Introduction
What are counts?

Counts versus proportions
Counts as aggregated event-history data

Single-level Poisson models for counts
Did the German health-care reform reduce the number of doctor visits?
Longitudinal data structure
Single-level Poisson regression

Model specification

Estimation using Stata

Random-intercept Poisson regression

Model specification
Measures of dependence and heterogeneity
Estimation using Stata

Using xtpoisson
Using xtmepoisson
Using gllamm

Random-coefficient Poisson regression

Model specification
Estimation using Stata

Using xtmepoisson
Using gllamm

Interpretation of estimates

Overdispersion in single-level models

Normally distributed random intercept
Negative binomial models

Mean dispersion or NB2
Constant dispersion or NB1

Quasilikelihood

Level-1 overdispersion in two-level models
Other approaches to two-level count data

Conditional Poisson regression
Conditional negative binomial regression
Generalized estimating equations

Marginal and conditional effects when responses are MAR
Which Scottish counties have a high risk of lip cancer?
Standardized mortality ratios
Random-intercept Poisson regression

Model specification
Estimation using gllamm
Prediction of standardized mortality ratios

Nonparametric maximum likelihood estimation

Specification
Estimation using gllamm
Prediction

Summary and further reading
Exercises

 

VII MODELS FOR SURVIVAL OR DURATION DATA
Introduction to models for survival or duration data (part VII)

 

14. DISCRETE-TIME SURVIVAL

Introduction
Single-level models for discrete-time survival data

Discrete-time hazard and discrete-time survival
Data expansion for discrete-time survival analysis
Estimation via regression models for dichotomous responses
Including covariates

Time-constant covariates
Time-varying covariates

Multiple absorbing events and competing risks
Handling left-truncated data

How does birth history affect child mortality?
Data expansion
Proportional hazards and interval-censoring
Complementary log-log models
A random-intercept complementary log-log model

Model specification
Estimation using Stata

Population-averaged or marginal vs. subject-specific or conditional survival probabilities
Summary and further reading
Exercises

 

15. CONTINUOUS-TIME SURVIVAL

Introduction
What makes marriages fail?
Hazards and survival
Proportional hazards models

Piecewise exponential model
Cox regression model
Poisson regression with smooth baseline hazard

Accelerated failure-time models

Log-normal model

Time-varying covariates
Does nitrate reduce the risk of angina pectoris?
Marginal modeling

Cox regression
Poisson regression with smooth baseline hazard

Multilevel proportional hazards models

Cox regression with gamma shared frailty
Poisson regression with normal random intercepts
Poisson regression with normal random intercept and random coefficient

Multilevel accelerated failure-time models

Log-normal model with gamma shared frailty
Log-normal model with log-normal shared frailty

A fixed-effects approach

Cox regression with subject-specific baseline hazards

Different approaches to recurrent-event data

Total time
Counting process
Gap time

Summary and further reading
Exercises

 

VIII MODELS WITH NESTED AND CROSSED RANDOM EFFECTS

 

16. MODELS WITH NESTED AND CROSSED RANDOM EFFECTS

Introduction
Did the Guatemalan immunization campaign work?
A three-level random-intercept logistic regression model

Model specification
Measures of dependence and heterogeneity

Types of residual intraclass correlations of the latent responses
Types of median odds ratios

Three-stage formulation

Estimation of three-level random-intercept logistic regression models

Using gllamm
Using xtmelogit

A three-level random-coefficient logistic regression model
Estimation of three-level random-coefficient logistic regression models

Using gllamm
Using xtmelogit

Prediction of random effects

Empirical Bayes prediction
Empirical Bayes modal prediction

Different kinds of predicted probabilities

Predicted population-averaged or marginal probabilities: New clusters
Predicted median or conditional probabilities
Predicted posterior mean probabilities: Existing clusters

Do salamanders from different populations mate successfully?
Crossed random-effects logistic regression
Summary and further reading
Exercises

 

A. SYNTAX FOR GLLAMM, EQ, AND GLLAPRED: THE BARE ESSENTIALS

 

B. SYNTAX FOR GLLAMM

 

C. SYNTAX FOR GLLAPRED

 

D. SYNTAX FOR GLLASIM

Author: Sophia Rabe-Hesketh and Anders Skrondal
Edition: Third Edition
ISBN-13: 978-1-59718-108-2
©Copyright: 2012
e-Book version available
