This study examines how participation in an elementary school reading program, participate, affects scores on a high school state exam used in college admissions, score. Because enrollment in the program is voluntary, participation is not randomized, and a direct comparison of participants with nonparticipants would be confounded.
Fortunately, there was a lottery that selected some individuals within schools to participate in the program and some to be controls. The lottery gave more weight to children in low-income households. Families of selected students received a small one-time cash transfer if they decided to participate. The binary variable selected is 1 for those selected to participate. Of course, not all who were selected by the lottery participated in the program, and some students participated regardless of the lottery outcome.
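A quick way to see this two-sided noncompliance in the data is to cross-tabulate lottery assignment against actual participation. The sketch below assumes the study dataset is already in memory; it uses only the variables named above and Stata's standard tabulate command.

. tabulate selected participate, row

With the row option, each row reports the share of selected (or nonselected) students who did and did not participate. The off-diagonal cells correspond to students whose participation differs from their lottery assignment.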
If the lottery had selected students at random, we could use selected as an instrument and estimate the local average treatment effect (LATE). However, because the lottery favored children in low-income households, we need to specify a model so that, after adjusting for covariates, selection is as good as random. We conjecture that this holds once we control for whether a student lives in a rural area, rural (rural areas are poorer than urban areas in our population), and for whether they live in a low-income zone, lowinc. Using lateffects, we can then estimate the LATE of participating in the program on the exam score for students who comply with their lottery assignment. We type
. lateffects kappa (score) (participate) (selected i.lowinc i.rural)
We chose the normalized kappa weighted estimator for illustration. The first set of parentheses corresponds to the outcome, the second to the treatment, and the last to the instrument propensity-score model.
The following result is obtained:

The LATE is 0.82. This means that, for the subpopulation of students who comply with the lottery, scores on the state exam would be 0.82 points higher on average if all of them participated in the reading program than if none of them did. Scores are measured on a continuous scale from 1 to 10.
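In potential-outcomes notation, this interpretation can be written as follows. The symbols are introduced here for illustration and are not part of the original example: Y(1) and Y(0) are a student's exam scores with and without the program, D(1) and D(0) are the student's participation when selected and when not selected by the lottery, D is observed participation (participate), Z is the lottery indicator (selected), and X collects the covariates (lowinc and rural).

$$
\mathrm{LATE} = E\left[\,Y(1) - Y(0) \mid D(1) > D(0)\,\right]
$$

Compliers are the students with D(1) > D(0). Kappa-type estimators build on Abadie's kappa weights, which use the instrument propensity score p(X) = Pr(Z = 1 | X):

$$
\kappa = 1 - \frac{D\,(1 - Z)}{1 - p(X)} - \frac{(1 - D)\,Z}{p(X)}, \qquad E[\kappa] = \Pr\{D(1) > D(0)\}
$$

The second equality holds when the instrument is as good as random given X and no student does the opposite of their assignment. This is a sketch of the general idea only; the exact normalization that lateffects applies is not shown here.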
© Copyright 1996–2026 StataCorp LLC. All rights reserved.
Suppose that we believe a student’s grade point average (GPA) in 2005, gpa2005, is important for modeling both the treatment and the outcome. To include covariates in the treatment or outcome model, we need the inverse-probability-weighted regression adjustment (IPWRA) estimator. We type

The LATE is larger when we account for GPA. However, in both cases, it is around 1 point.
After lateffects, we can check whether the treatment assignment is as good as random after controlling for covariates. We type

In the first table of the output, we see that after weighting by the probabilities of being assigned to treatment or control, the treatment and control groups are roughly balanced relative to the sample, which has 612 observations assigned to treatment and 1,522 assigned to be controls. Similarly, we see that the standardized differences after weighting are close to 0 and the variance ratios are close to 1. This suggests that, after controlling for covariates, treatment assignment is as good as random, which supports our modeling choices.
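For reference, the two diagnostics mentioned above are commonly defined as follows for a covariate x. This is a sketch using the usual definitions of these statistics, with notation introduced here; in the weighted case, the means and variances are computed with the estimated weights.

$$
d = \frac{\bar{x}_1 - \bar{x}_0}{\sqrt{(s_1^2 + s_0^2)/2}}, \qquad r = \frac{s_1^2}{s_0^2}
$$

Here $\bar{x}_1$ and $s_1^2$ are the mean and variance of x among students assigned to treatment, and $\bar{x}_0$ and $s_0^2$ are the mean and variance among students assigned to be controls. Perfect balance corresponds to d = 0 and r = 1.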
There is much more we can do. Above, we fit a linear model. When we have a binary, count, or fractional outcome, we can fit probit, logit, Poisson, fractional probit, or fractional logit models. We can also further check assumptions. We could verify that the overlap assumption is satisfied using lateoverlap. Additionally, we could look at the mean of covariates for the compliers to characterize the complier subpopulation using estat compliers.