Microeconometrics Using Stata

Microeconometrics Using Stata, Revised Edition, by A. Colin Cameron and Pravin K. Trivedi, is an outstanding introduction to microeconometrics and how to do microeconometric research using Stata. Aimed at students and researchers, this book covers topics left out of microeconometrics textbooks and omitted from basic introductions to Stata. Cameron and Trivedi provide the most complete and up-to-date survey of microeconometric methods available in Stata.

The revised edition has been updated to reflect the new features available in Stata 11 that are germane to microeconomists. Instead of using mfx and the user-written margeff commands, the revised edition uses the new margins command, emphasizing both marginal effects at the means and average marginal effects. Factor variables, which allow you to specify indicator variables and interaction effects, replace the xi command. The new gmm command for generalized method of moments and nonlinear instrumental-variables estimation is presented, along with several examples. Finally, the chapter on maximum likelihood estimation incorporates the enhancements made to ml in Stata 11.
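The distinction between marginal effects at the means (MEM) and average marginal effects (AME) can be illustrated with a small sketch. It is written in Python rather than Stata so that it is language neutral. It is not taken from the book: the one-regressor logit model, the coefficient values, and the regressor values are assumptions chosen only for illustration.

```python
import math


def logistic(z):
    """Logistic CDF, the link function of the logit model."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect(beta0, beta1, x):
    """Derivative of Pr(y = 1 | x) with respect to x in a one-regressor logit."""
    p = logistic(beta0 + beta1 * x)
    return beta1 * p * (1.0 - p)


def mem(beta0, beta1, xs):
    """Marginal effect at the mean: one derivative, evaluated at the sample mean of x."""
    x_bar = sum(xs) / len(xs)
    return marginal_effect(beta0, beta1, x_bar)


def ame(beta0, beta1, xs):
    """Average marginal effect: the derivative at every observation, then averaged."""
    return sum(marginal_effect(beta0, beta1, x) for x in xs) / len(xs)


# Hypothetical coefficients and regressor values, chosen only for illustration.
beta0, beta1 = 0.0, 1.0
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

print("MEM:", mem(beta0, beta1, xs))
print("AME:", ame(beta0, beta1, xs))
```

With these made-up values the MEM is 0.25 and the AME is smaller, because the logit density peaks at the mean of x. In Stata, the corresponding quantities after a logit fit would typically be requested with margins, dydx(*) atmeans and margins, dydx(*).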

Early in the book, Cameron and Trivedi introduce simulation methods and then use them to illustrate features of the estimators and tests described in the rest of the book. While simulation methods are important tools for econometricians, they are not covered in standard textbooks. By introducing simulation methods, the authors arm students and researchers with techniques they can use in future work. Cameron and Trivedi address each topic with an in-depth Stata example, and they reference their 2005 textbook, Microeconometrics: Methods and Applications, where appropriate.
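A simulation of the kind described here can be sketched in a few lines. The example below is an illustration, not the book's code: it draws repeated samples from a uniform distribution and summarizes the distribution of the sample mean. The sample size, number of replications, and seed are arbitrary assumptions.

```python
import random
import statistics


def simulate_sample_means(n_obs, n_reps, seed=10101):
    """Draw n_reps samples of size n_obs from Uniform(0, 1) and return each sample mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(n_reps):
        draws = [rng.random() for _ in range(n_obs)]
        means.append(sum(draws) / n_obs)
    return means


# Arbitrary illustrative settings: 30 observations per sample, 2,000 replications.
means = simulate_sample_means(n_obs=30, n_reps=2000)

# Theory: the sample mean has expectation 0.5 and
# standard deviation sqrt((1/12) / 30), roughly 0.053.
print("mean of sample means:", statistics.mean(means))
print("std. dev. of sample means:", statistics.stdev(means))
```

The simulated mean and standard deviation should be close to their theoretical values, and a histogram of the means would look approximately normal, as the central limit theorem predicts.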

The authors also show how to use Stata’s programming features to implement methods for which Stata does not have a specific command. Although the book is not specifically about Stata programming, it does show how to solve many programming problems. These techniques are essential in applied microeconometrics because there will always be new, specialized methods beyond what has already been incorporated into a software package.

Cameron and Trivedi’s choice of topics perfectly reflects the current practice of modern microeconometrics. After introducing the reader to Stata, the authors introduce linear regression, simulation, and generalized least-squares methods. The section on cross-sectional techniques is thorough, with up-to-date treatments of instrumental-variables methods for linear models and of quantile-regression methods.

The next section of the book covers estimators for the parameters of linear panel-data models. The authors’ choice of topics is unique: after addressing the standard random-effects and fixed-effects methods, the authors also describe mixed linear models, which are widely used in many areas outside of econometrics.

Cameron and Trivedi not only address methods for nonlinear regression models but also show how to code new nonlinear estimators in Stata. In addition to detailing nonlinear methods, which are omitted from most econometrics textbooks, this section shows researchers and students how to easily implement new nonlinear estimators.

The authors next describe inference using analytical and bootstrap approximations to the distribution of test statistics. This section highlights Stata’s power to easily obtain bootstrap approximations, and it also introduces the basic elements of statistical inference.
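As a rough illustration of a bootstrap approximation, the sketch below estimates the standard error of a sample mean by resampling observations with replacement and compares it with the analytical formula. It is a minimal example under stated assumptions, not the book's implementation: the data values, the number of replications, and the seed are made up.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_boot=1000, seed=10101):
    """Bootstrap standard error: the std. dev. of the statistic across resamples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Resample n observations with replacement from the original data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)


# Hypothetical data, used only to make the example runnable.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 4.4]

se_boot = bootstrap_se(data, statistics.mean)
se_analytic = statistics.stdev(data) / len(data) ** 0.5

print("bootstrap SE of the mean:", se_boot)
print("analytical SE of the mean:", se_analytic)
```

For a statistic as simple as the mean, the two standard errors should be close. The bootstrap becomes more useful when no convenient analytical formula is available.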

Cameron and Trivedi then include an extensive section about methods for different nonlinear models. They begin by detailing methods for binary dependent variables. This section is followed by sections about multinomial models, tobit and selection models, count-data models, and nonlinear panel-data models. Two appendices about Stata programming complete the book.

The unique combination of topics, intuitive introductions to methods, and detailed illustrations of Stata examples make Microeconometrics Using Stata an invaluable, hands-on addition to the library of anyone who uses microeconometric methods.

List of tables
List of figures
Preface to the Revised Edition
Preface to the First Edition

1. STATA BASICS

Interactive use
Documentation

Stata manuals
The help command
The search, findit, and hsearch commands

Command syntax and operators

Basic command syntax
Example: The summarize command
Example: The regress command
Abbreviations, case sensitivity, and wildcards
Arithmetic, relational, and logical operators
Error messages

Do-files and log files

Writing a do-file
Running do-files
Log files
A three-step process
Different implementations of Stata

Scalars and matrices

Scalars
Matrices

Using results from Stata commands

Using results from the r-class command summarize
Using results from the e-class command regress

Global and local macros

Global macros
Local macros
Scalar or macro?

Looping commands

The foreach loop
The forvalues loop
The while loop
The continue command

Some useful commands
Template do-file
User-written commands
Stata resources
Exercises

2. DATA MANAGEMENT AND GRAPHICS

Introduction
Types of data

Text or ASCII data
Internal numeric data
String data
Formats for displaying numeric data

Inputting data

General principles
Inputting data already in Stata format
Inputting data from the keyboard
Inputting nontext data
Inputting text data from a spreadsheet
Inputting text data in free format
Inputting text data in fixed format
Dictionary files
Common pitfalls

Data management

PSID example
Naming and labeling variables
Viewing data
Using original documentation
Missing values
Imputing missing data

Transforming data (generate, replace, egen, recode)

The generate and replace commands
The egen command
The recode command
The by prefix
Indicator variables
Set of indicator variables
Interactions
Demeaning

Saving data
Selecting the sample

Manipulating datasets

Ordering observations and variables
Preserving and restoring a dataset
Wide and long forms for a dataset
Merging datasets
Appending datasets

Graphical display of data

Stata graph commands

Example graph commands
Saving and exporting graphs
Learning how to use graph commands

Box-and-whisker plot
Histogram
Kernel density plot
Twoway scatterplots and fitted lines
Lowess, kernel, local linear, and nearest-neighbor regression
Multiple scatterplots

Stata resources
Exercises

3. LINEAR REGRESSION BASICS

Introduction
Data and data summary

Data description
Variable description
Summary statistics
More-detailed summary statistics
Tables for data
Statistical tests
Data plots

Regression in levels and logs

Basic regression theory
OLS regression and matrix algebra
Properties of the OLS estimator
Heteroskedasticity-robust standard errors
Cluster–robust standard errors
Regression in logs

Basic regression analysis

Correlations
The regress command
Hypothesis tests
Tables of output from several regressions
Even better tables of regression output
Factor variables for categorical variables and interactions

Specification analysis

Specification tests and model diagnostics
Residual diagnostic plots
Influential observations
Specification tests

Test of omitted variables
Test of the Box–Cox model
Test of the functional form of the conditional mean
Heteroskedasticity test
Omnibus test

Tests have power in more than one direction

Prediction

In-sample prediction
MEs and elasticities
Prediction in logs: The retransformation problem
Prediction exercise

Sampling weights

Weights
Weighted mean
Weighted regression
Weighted prediction and MEs

OLS using Mata
Stata resources
Exercises

4. SIMULATION

Introduction
Pseudorandom-number generators: Introduction

Uniform random-number generation
Draws from normal
Draws from t, chi-squared, F, gamma, and beta
Draws from binomial, Poisson, and negative binomial

Independent (but not identically distributed) draws from binomial
Independent (but not identically distributed) draws from Poisson
Histograms and density plots

Distribution of the sample mean

Stata program
The simulate command
Central limit theorem simulation
The postfile command
Alternative central limit theorem simulation

Pseudorandom-number generators: Further details

Inverse-probability transformation
Direct transformation
Other methods
Draws from truncated normal
Draws from multivariate normal

Direct draws from multivariate normal
Transformation using Cholesky decomposition

Draws using Markov chain Monte Carlo method

Computing integrals

Monte Carlo integration
Monte Carlo integration using different S

Simulation for regression: Introduction

Simulation example: OLS with chi-squared errors
Interpreting simulation output

Unbiasedness of estimator
Standard errors
t statistic
Test size
Number of simulations

Variations

Different sample size and number of simulations
Test power
Different error distributions

Estimator inconsistency
Simulation with endogenous regressors

Stata resources
Exercises

5. GLS REGRESSION

Introduction
GLS and FGLS regression

GLS for heteroskedastic errors
GLS and FGLS
Weighted least squares and robust standard errors

Modeling heteroskedastic data

Simulated dataset
OLS estimation
Detecting heteroskedasticity
FGLS estimation
WLS estimation

System of linear regressions

SUR model
The sureg command
Application to two categories of expenditures
Robust standard errors
Testing cross-equation constraints
Imposing cross-equation constraints

Survey data: Weighting, clustering, and stratification

Survey design
Survey mean estimation
Survey linear regression

Stata resources
Exercises

6. LINEAR INSTRUMENTAL-VARIABLES REGRESSION

Introduction
IV estimation

Basic IV theory
Model setup
IV estimators: IV, 2SLS, and GMM
Instrument validity and relevance
Robust standard-error estimates

IV example

The ivregress command
Medical expenditures with one endogenous regressor
Available instruments
IV estimation of an exactly identified model
IV estimation of an overidentified model
Testing for regressor endogeneity
Tests of overidentifying restrictions
IV estimation with a binary endogenous regressor

Weak instruments

Finite-sample properties of IV estimators
Weak instruments

Diagnostics for weak instruments
Formal tests for weak instruments

The estat firststage command
Just-identified model
Overidentified model
More than one endogenous regressor
Sensitivity to choice of instruments

Better inference with weak instruments

Conditional tests and confidence intervals
LIML estimator
Jackknife IV estimator
Comparison of 2SLS, LIML, JIVE, and GMM

3SLS systems estimation
Stata resources
Exercises

7. QUANTILE REGRESSION

Introduction
QR

Conditional quantiles
Computation of QR estimates and standard errors
The qreg, bsqreg, and sqreg commands

QR for medical expenditures data

Data summary
QR estimates
Interpretation of conditional quantile coefficients
Retransformation
Comparison of estimates at different quantiles
Heteroskedasticity test
Hypothesis tests
Graphical display of coefficients over quantiles

QR for generated heteroskedastic data

Simulated dataset
QR estimates

QR for count data

Quantile count regression
The qcount command
Summary of doctor visits data
Results from QCR

Stata resources
Exercises

8. LINEAR PANEL-DATA MODELS: BASICS

Introduction
Panel-data methods overview

Some basic considerations
Some basic panel models

Individual-effects model
Fixed-effects model
Random-effects model
Pooled model or population-averaged model
Two-way–effects model
Mixed linear models

Cluster–robust inference
The xtreg command
Stata linear panel-data commands

Panel-data summary

Data description and summary statistics
Panel-data organization
Panel-data description
Within and between variation
Time-series plots for each individual
Overall scatterplot
Within scatterplot
Pooled OLS regression with cluster–robust standard errors
Time-series autocorrelations for panel data
Error correlation in the RE model

Pooled or population-averaged estimators

Pooled OLS estimator
Pooled FGLS estimator or population-averaged estimator
The xtreg, pa command
Application of the xtreg, pa command

Within estimator

Within estimator
The xtreg, fe command
Application of the xtreg, fe command
Least-squares dummy-variables regression

Between estimator

Between estimator
Application of the xtreg, be command

RE estimator

RE estimator
The xtreg, re command
Application of the xtreg, re command

Comparison of estimators

Estimates of variance components
Within and between R-squared
Estimator comparison
Fixed effects versus random effects
Hausman test for fixed effects

The hausman command
Robust Hausman test

Prediction

First-difference estimator

First-difference estimator
Strict and weak exogeneity

Long panels

Long-panel dataset
Pooled OLS and PFGLS
The xtpcse and xtgls commands
Application of the xtgls, xtpcse, and xtscc commands
Separate regressions
FE and RE models
Unit roots and cointegration

Panel-data management

Wide-form data
Convert wide form to long form
Convert long form to wide form
An alternative to wide-form data

Stata resources
Exercises

9. LINEAR PANEL-DATA MODELS: EXTENSIONS

Introduction
Panel IV estimation

Panel IV
The xtivreg command
Application of the xtivreg command
Panel IV extensions

Hausman–Taylor estimator

Hausman–Taylor estimator
The xthtaylor command
Application of the xthtaylor command

Arellano–Bond estimator

Dynamic model
IV estimation in the FD model
The xtabond command
Arellano–Bond estimator: Pure time series
Specification tests
The xtdpdsys command
The xtdpd command

Mixed linear models

Mixed linear model
The xtmixed command
Random-intercept model
Cluster–robust standard errors
Random-slopes model
Random-coefficients model
Two-way random-effects model

Clustered data

Clustered dataset
Clustered data using nonpanel commands
Clustered data using panel commands
Hierarchical linear models

Stata resources
Exercises

10. NONLINEAR REGRESSION METHODS

Introduction
Nonlinear example: Doctor visits

Data description
Poisson model description

Nonlinear regression methods

MLE
The poisson command
Postestimation commands
NLS
The nl command
GLM
The glm command
Other estimators

Different estimates of the VCE

General framework
The vce() option
Application of the vce() option
Default estimate of the VCE
Robust estimate of the VCE
Cluster–robust estimate of the VCE
Heteroskedasticity- and autocorrelation-consistent estimate of the VCE
Bootstrap standard errors
Statistical inference

Prediction

The predict and predictnl commands
Application of predict and predictnl
Out-of-sample prediction
Prediction at a specified value of one of the regressors
Prediction at a specified value of all the regressors
Prediction of other quantities
The margins command for prediction

Marginal effects

Calculus and finite-difference methods
MEs estimates AME, MEM, and MER
Elasticities and semielasticities
Simple interpretations of coefficients in single-index models
The margins command for marginal effects
MEM: Marginal effect at mean

Comparison of calculus and finite-difference methods

MER: Marginal effect at representative value
AME: Average marginal effect
Elasticities and semielasticities
AME computed manually
Polynomial regressors
Interacted regressors
Complex interactions and nonlinearities

Model diagnostics

Goodness-of-fit measures
Information criteria for model comparison
Residuals
Model-specification tests

Stata resources
Exercises

11. NONLINEAR OPTIMIZATION METHODS

Introduction
Newton–Raphson method

NR method
NR method for Poisson
Poisson NR example using Mata

Core Mata code for Poisson NR iterations
Complete Stata and Mata code for Poisson NR iterations

Maximization options
Messages during iterations
Stopping criteria
Multiple maximums
Numerical derivatives

The ml command: lf method

The ml command
The lf method
Poisson example: Single-index model
Negative binomial example: Two-index model
NLS example: Nonlikelihood model

Checking the program

Program debugging using ml check and ml trace
Getting the program to run
Checking the data
Multicollinearity and near collinearity
Multiple optimums
Checking parameter estimation
Checking standard-error estimation

The ml command: d0, d1, d2, lf0, lf1, and lf2 methods

Evaluator functions
The d0 method
The d1 method
The lf1 method with the robust estimate of the VCE
The d2 and lf2 methods

The Mata optimize() function

Type d and gf evaluators
Optimize functions
Poisson example

Evaluator program for Poisson MLE
The optimize() function for Poisson MLE

Generalized method of moments

Definition
Nonlinear IV example
GMM using the Mata optimize() function

Stata resources
Exercises

12. TESTING METHODS

Introduction
Critical values and p-values

Standard normal compared with Student’s t
Chi-squared compared with F
Plotting densities
Computing p-values and critical values
Which distributions does Stata use?

Wald tests and confidence intervals

Wald test of linear hypotheses
The test command

Test single coefficient
Test several hypotheses
Test of overall significance
Test calculated from retrieved coefficients and VCE

One-sided Wald tests
Wald test of nonlinear hypotheses (delta method)
The testnl command
Wald confidence intervals
The lincom command
The nlcom command (delta method)
Asymmetric confidence intervals

Likelihood-ratio tests

Likelihood-ratio tests
The lrtest command
Direct computation of LR tests

Lagrange multiplier test (or score test)

LM tests
The estat command
LM test by auxiliary regression

Test size and power

Simulation DGP: OLS with chi-squared errors
Test size
Test power
Asymptotic test power

Specification tests

Moment-based tests
Information matrix test
Chi-squared goodness-of-fit test
Overidentifying restrictions test
Hausman test
Other tests

Stata resources
Exercises

13. BOOTSTRAP METHODS

Introduction
Bootstrap methods

Bootstrap estimate of standard error
Bootstrap methods
Asymptotic refinement
Use the bootstrap with caution

Bootstrap pairs using the vce(bootstrap) option

Bootstrap-pairs method to estimate VCE
The vce(bootstrap) option
Bootstrap standard-errors example
How many bootstraps?
Clustered bootstraps
Bootstrap confidence intervals
The postestimation estat bootstrap command
Bootstrap confidence-intervals example
Bootstrap estimate of bias

Bootstrap pairs using the bootstrap command

The bootstrap command
Bootstrap parameter estimate from a Stata estimation command
Bootstrap standard error from a Stata estimation command
Bootstrap standard error from a user-written estimation command
Bootstrap two-step estimator
Bootstrap Hausman test
Bootstrap standard error of the coefficient of variation

Bootstraps with asymptotic refinement

Percentile-t method
Percentile-t Wald test
Percentile-t Wald confidence interval

Bootstrap pairs using bsample and simulate

The bsample command
The bsample command with simulate
Bootstrap Monte Carlo exercise

Alternative resampling schemes

Bootstrap pairs
Parametric bootstrap
Residual bootstrap
Wild bootstrap
Subsampling

The jackknife

Jackknife method
The vce(jackknife) option and the jackknife command

Stata resources
Exercises

14. BINARY OUTCOME MODELS

Introduction
Some parametric models

Basic model
Logit, probit, linear probability, and clog-log models

Estimation

Latent-variable interpretation and identification
ML estimation
The logit and probit commands
Robust estimate of the VCE
OLS estimation of LPM

Example

Data description
Logit regression
Comparison of binary models and parameter estimates

Hypothesis and specification tests

Wald tests
Likelihood-ratio tests

Lagrange multiplier test of generalized logit
Heteroskedastic probit regression

Model comparison

Goodness of fit and prediction

Pseudo-R-squared measure
Comparing predicted probabilities with sample frequencies
Comparing predicted outcomes with actual outcomes
The predict command for fitted probabilities
The prvalue command for fitted probabilities

Marginal effects

Marginal effect at a representative value (MER)
Marginal effect at the mean (MEM)
Average marginal effect (AME)
The prchange command

Endogenous regressors

Example
Model assumptions
Structural-model approach

The ivprobit command
Maximum likelihood estimates
Two-step sequential estimates

IVs approach

Grouped data

Estimation with aggregate data
Grouped-data application

Stata resources
Exercises

15. MULTINOMIAL MODELS

Introduction
Multinomial models overview

Probabilities and MEs
Maximum likelihood estimation
Case-specific and alternative-specific regressors
Stata multinomial model commands

Multinomial example: Choice of fishing mode

Data description
Case-specific regressors
Alternative-specific regressors

Multinomial logit model

The mlogit command
Application of the mlogit command
Coefficient interpretation
Predicted probabilities
MEs

Conditional logit model

Creating long-form data from wide-form data
The asclogit command
The clogit command
Application of the asclogit command
Relationship to multinomial logit model
Coefficient interpretation
Predicted probabilities

MEs

Nested logit model

Relaxing the independence of irrelevant alternatives assumption
NL model
The nlogit command
Model estimates
Predicted probabilities
MEs
Comparison of logit models

Multinomial probit model

MNP
The mprobit command
Maximum simulated likelihood
The asmprobit command
Application of the asmprobit command
Predicted probabilities and MEs

Random-parameters logit

Random-parameters logit
The mixlogit command
Data preparation for mixlogit
Application of the mixlogit command

Ordered outcome models

Data summary
Ordered outcomes
Application of the ologit command
Predicted probabilities
MEs
Other ordered models

Multivariate outcomes

Bivariate probit
Nonlinear SUR

Stata resources
Exercises

16. TOBIT AND SELECTION MODELS

Introduction
Tobit model

Regression with censored data
Tobit model setup
Unknown censoring point
Tobit estimation
ML estimation in Stata

Tobit model example

Data summary
Tobit analysis
Prediction after tobit
Marginal effects

Left-truncated, left-censored, and right-truncated examples
Left-censored case computed directly
Marginal impact on probabilities

The ivtobit command

Tobit for lognormal data

Data example
Setting the censoring point for data in logs
Results
Two-limit tobit
Model diagnostics
Tests of normality and homoskedasticity

Generalized residuals and scores
Test of normality
Test of homoskedasticity

Next step?

Two-part model in logs

Model structure
Part 1 specification
Part 2 of the two-part model

Selection model

Model structure and assumptions
ML estimation of the sample-selection model
Estimation without exclusion restrictions
Two-step estimation
Estimation with exclusion restrictions

Prediction from models with outcome in logs

Predictions from tobit
Predictions from two-part model
Predictions from selection model

Stata resources
Exercises

17. COUNT-DATA MODELS

Introduction
Features of count data

Generated Poisson data
Overdispersion and negative binomial data
Modeling strategies
Estimation methods

Empirical example 1

Data summary
Poisson model

Poisson model results
Robust estimate of VCE for Poisson MLE
Test of overdispersion
Coefficient interpretation and marginal effects

NB2 model

NB2 model results
Fitted probabilities for Poisson and NB2 models
The countfit command
The prvalue command
Discussion
Generalized NB model

Nonlinear least-squares estimation
Hurdle model

Variants of the hurdle model
Application of the hurdle model

Finite-mixture models

FMM specification
Simulated FMM sample with comparisons
ML estimation of the FMM
The fmm command
Application: Poisson finite-mixture model
Interpretation
Comparing marginal effects
Application: NB finite-mixture model
Model selection
Cautionary note

Empirical example 2

Zero-inflated data
Models for zero-inflated data
Results for the NB2 model

The prcounts command

Results for ZINB
Model comparison

The countfit command
Model comparison using countfit

Models with endogenous regressors

Structural-model approach

Model and assumptions
Two-step estimation
Application

Nonlinear IV method

Stata resources
Exercises

18. NONLINEAR PANEL MODELS

Introduction
Nonlinear panel-data overview

Some basic nonlinear panel models

FE models
RE models
Pooled models or population-averaged models
Comparison of models

Dynamic models
Stata nonlinear panel commands

Nonlinear panel-data example

Data description and summary statistics
Panel-data organization
Within and between variation
FE or RE model for these data?

Binary outcome models

Panel summary of the dependent variable
Pooled logit estimator
The xtlogit command
The xtgee command
PA logit estimator
RE logit estimator
FE logit estimator
Panel logit estimator comparison
Prediction and marginal effects
Mixed-effects logit estimator

Tobit model

Panel summary of the dependent variable
RE tobit model
Generalized tobit models
Parametric nonlinear panel models

Count-data models

The xtpoisson command
Panel summary of the dependent variable
Pooled Poisson estimator
PA Poisson estimator
RE Poisson estimators
FE Poisson estimator
Panel Poisson estimators comparison
Negative binomial estimators

Stata resources
Exercises

A. PROGRAMMING IN STATA

Stata matrix commands

Stata matrix overview
Stata matrix input and output

Matrix input by hand
Matrix input from Stata estimation results

Stata matrix subscripts and combining matrices
Matrix operators
Matrix functions
Matrix accumulation commands
OLS using Stata matrix commands

Programs

Modifying a program
Programs with positional arguments
Temporary variables
Programs with named positional arguments
Storing and retrieving program results
Programs with arguments using standard Stata syntax

Program debugging

Some simple tips
Error messages and return code
Trace

B. MATA

How to run Mata

Mata commands in Mata
Mata commands in Stata
Stata commands in Mata
Interactive versus batch use
Mata help

Mata matrix commands

Mata matrix input

Matrix input by hand
Identity matrices, unit vectors, and matrices of constants
Matrix input from Stata data
Matrix input from Stata matrix
Stata interface functions

Mata matrix operators

Element-by-element operators

Mata functions

Scalar and matrix functions
Matrix inversion

Mata cross products
Mata matrix subscripts and combining matrices
Transferring Mata data and matrices to Stata

Creating Stata matrices from Mata matrices
Creating Stata data from a Mata vector

Programming in Mata

Declarations
Mata program
Mata program with results output to Stata
Stata program that calls a Mata program 