Sample selection arises when the sampled data are not representative of the population of interest. A classic example is women’s work participation. Suppose that we want to model the wages of women. If we consider only the sample of women who chose to work, the observed wages may be too high, because women who would have earned low wages may have chosen not to work. If the decision to work were random, that is, unrelated to wages, using only the sample of working women would cause no problem. That is not a realistic assumption here. To obtain valid inference in this example, we must model both the outcome (wages) and the decision to work. We will refer to the two models as the outcome model and the participation model.
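
In equation form, the Heckman model is usually written as follows. This is the standard textbook formulation. Here y_j is the outcome (the wage in our example), and x_j and z_j are the covariates of the outcome and participation models. The last line gives the estimation-metric parameters athrho and lnsigma and the derived quantity lambda that heckman reports.

$$
\begin{aligned}
\text{outcome model:} \quad & y_j = \mathbf{x}_j\boldsymbol{\beta} + u_{1j}, \quad \text{observed only if } \mathbf{z}_j\boldsymbol{\gamma} + u_{2j} > 0 \\
\text{participation model:} \quad & \Pr(\text{works}_j) = \Pr(\mathbf{z}_j\boldsymbol{\gamma} + u_{2j} > 0) = \Phi(\mathbf{z}_j\boldsymbol{\gamma}) \\
\text{errors:} \quad & u_{1j} \sim N(0, \sigma^2), \quad u_{2j} \sim N(0, 1), \quad \operatorname{corr}(u_{1j}, u_{2j}) = \rho \\
\text{parameters:} \quad & \texttt{athrho} = \operatorname{atanh}\rho = \tfrac{1}{2}\ln\frac{1+\rho}{1-\rho}, \quad \texttt{lnsigma} = \ln\sigma, \quad \lambda = \rho\,\sigma
\end{aligned}
$$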

In Stata, you can use heckman to fit a Heckman selection model to continuous outcomes, heckprobit to fit a probit sample-selection model to binary outcomes, and heckoprobit to fit an ordered probit model with sample selection to ordinal outcomes. You can now simply prefix these commands with bayes: to fit the corresponding Bayesian sample-selection models.

Let’s see it work

Continuing with our example of women’s work participation, we first fit the classical Heckman sample-selection model. Below we model both the wages and the decision to work as functions of education and age. For the decision to work, we additionally include marital status and number of children.

```
. heckman wage educ age, select(married children educ age)

Heckman selection model                         Number of obs     =      2,000
(regression model with sample selection)              Selected    =      1,343
                                                      Nonselected =        657

                                                Wald chi2(2)      =     508.44
Log likelihood = -5178.304                      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
        wage |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage         |
   education |   .9899537   .0532565    18.59   0.000     .8855729    1.094334
         age |   .2131294   .0206031    10.34   0.000     .1727481    .2535108
       _cons |   .4857752   1.077037     0.45   0.652    -1.625179     2.59673
-------------+----------------------------------------------------------------
select       |
     married |   .4451721   .0673954     6.61   0.000     .3130794    .5772647
    children |   .4387068   .0277828    15.79   0.000     .3842534    .4931601
   education |   .0557318   .0107349     5.19   0.000     .0346917    .0767718
         age |   .0365098   .0041533     8.79   0.000     .0283694    .0446502
       _cons |  -2.491015   .1893402   -13.16   0.000    -2.862115   -2.119915
-------------+----------------------------------------------------------------
     /athrho |   .8742086   .1014225     8.62   0.000     .6754241    1.072993
    /lnsigma |   1.792559    .027598    64.95   0.000     1.738468     1.84665
-------------+----------------------------------------------------------------
         rho |   .7035061   .0512264                      .5885365    .7905862
       sigma |   6.004797   .1657202                       5.68862    6.338548
      lambda |   4.224412   .3992265                      3.441942    5.006881
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =    61.20   Prob > chi2 = 0.0000
```

To fit its Bayesian analog, we use bayes: heckman.

```
. bayes: heckman wage educ age, select(married children educ age)

Model summary
------------------------------------------------------------------------------
Likelihood:
  wage ~ heckman(xb_wage,xb_select,{athrho} {lnsigma})

Priors:
                     {wage:education age _cons} ~ normal(0,10000)          (1)
  {select:married children education age _cons} ~ normal(0,10000)          (2)
                               {athrho lnsigma} ~ normal(0,10000)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_wage.
(2) Parameters are elements of the linear form xb_select.

Bayesian Heckman selection model                MCMC iterations   =     12,500
Random-walk Metropolis-Hastings sampling        Burn-in           =      2,500
                                                MCMC sample size  =     10,000
                                                Number of obs     =      2,000
                                                      Selected    =      1,343
                                                      Nonselected =        657
                                                Acceptance rate   =      .3484
                                                Efficiency:   min =     .02314
                                                              avg =     .03657
Log marginal likelihood = -5260.2024                          max =     .05013

------------------------------------------------------------------------------
             |                                               Equal-tailed
             |      Mean  Std. Dev.      MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
wage         |
   education |  .9919131    .051865   .002609   .9931531   .8884407   1.090137
         age |  .2131372   .0209631   .001071   .2132548   .1720535   .2550835
       _cons |  .4696264   1.089225     .0716   .4406188  -1.612032    2.65116
-------------+----------------------------------------------------------------
select       |
     married |  .4461775   .0681721   .003045   .4456493   .3178532   .5785857
    children |  .4401305   .0255465   .001156   .4402145   .3911135   .4903804
   education |  .0559983   .0104231   .000484   .0556755   .0360289    .076662
         age |  .0364752   .0042497   .000248   .0362858   .0280584   .0449843
       _cons | -2.494424     .18976   .011327  -2.498414  -2.861266  -2.114334
-------------+----------------------------------------------------------------
      athrho |   .868392    .099374   .005961   .8699977   .6785641   1.062718
     lnsigma |  1.793428   .0269513   .001457   1.793226   1.740569   1.846779
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.
```

Unlike heckman, bayes: heckman reports the ancillary parameters only in the estimation metric, as {athrho} and {lnsigma}. We can use bayesstats summary to obtain posterior summaries of rho and sigma in their original metric.

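```
. bayesstats summary (rho:1-2/(exp(2*{athrho})+1)) (sigma:exp({lnsigma}))

Posterior summary statistics                      MCMC sample size =    10,000

         rho : 1-2/(exp(2*{athrho})+1)
       sigma : exp({lnsigma})

------------------------------------------------------------------------------
             |                                               Equal-tailed
             |      Mean  Std. Dev.      MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
         rho |  .6970522   .0510145   .003071    .701373   .5905851   .7867018
       sigma |  6.012205   .1621422   .008761   6.008807   5.700587   6.339366
------------------------------------------------------------------------------
```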
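
The expression 1-2/(exp(2*{athrho})+1) is tanh({athrho}), the inverse of the atanh transformation, and exp({lnsigma}) undoes the log. bayesstats summary applies these expressions to every MCMC draw. Because both maps are increasing, the posterior median and the equal-tailed interval bounds of rho and sigma equal the transformed median and bounds of {athrho} and {lnsigma}. The posterior mean does not carry over this way. The short Python check below illustrates this. It is a sketch for illustration only, uses numbers copied from the outputs above, and is not part of the Stata workflow.

```python
import math


def rho_from_athrho(athrho):
    """Expression used in bayesstats summary: 1 - 2/(exp(2*athrho) + 1), i.e. tanh(athrho)."""
    return 1.0 - 2.0 / (math.exp(2.0 * athrho) + 1.0)


def sigma_from_lnsigma(lnsigma):
    """Expression used in bayesstats summary: exp(lnsigma)."""
    return math.exp(lnsigma)


# Posterior median and 95% equal-tailed bounds copied from the bayes: heckman output.
athrho = {"median": 0.8699977, "lower": 0.6785641, "upper": 1.062718}
lnsigma = {"median": 1.793226, "lower": 1.740569, "upper": 1.846779}

# Increasing transformations preserve quantiles, so these should match the
# rho and sigma medians and interval bounds reported by bayesstats summary.
rho = {k: rho_from_athrho(v) for k, v in athrho.items()}
sigma = {k: sigma_from_lnsigma(v) for k, v in lnsigma.items()}

# Means are not preserved: transforming the posterior mean of athrho (.868392)
# gives about .7006, not the posterior mean of rho (.6970522).
rho_at_mean = rho_from_athrho(0.868392)

for name, q in (("rho", rho), ("sigma", sigma)):
    print(name, {k: round(v, 6) for k, v in q.items()})
print("tanh(posterior mean of athrho) =", round(rho_at_mean, 6))
```

This is why transforming the reported posterior mean of {athrho} by hand is not a substitute for running bayesstats summary on the draws.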

Parameter rho is the correlation between the errors of the outcome and participation models, so it measures the dependence between the two models. If rho is zero, the two models are independent and can be analyzed separately. In other words, there is no sample selection, and we can model the wages using only the sample of women who work without biasing our results. In our example, rho lies between 0.59 and 0.79 with a posterior probability of 0.95, so the decision to work is related to wages.
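
A small simulation makes the role of rho concrete. The sketch below uses only Python's standard library and hypothetical values chosen for illustration. These are a mean wage of 20, sigma of 6, and a participation-index constant of 0.45. The constant gives a participation rate of about 67%, close to the 1,343 of 2,000 in the example. The sketch draws correlated errors for the two equations and compares the average wage of all women with the average among women who work. Under the model, the working-only mean exceeds the population mean by rho*sigma*phi(0.45)/Phi(0.45), the inverse Mills ratio term, which vanishes when rho is zero.

```python
import math
import random
from statistics import NormalDist, fmean

# Hypothetical parameter values chosen for illustration only.
MU, SIGMA, GAMMA0 = 20.0, 6.0, 0.45


def simulate_means(rho, n=100_000, seed=1):
    """Return (mean wage of all women, mean wage of women who work)."""
    rng = random.Random(seed)
    all_wages, observed = [], []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u1 = SIGMA * e1                               # outcome-model error, sd SIGMA
        u2 = rho * e1 + math.sqrt(1.0 - rho**2) * e2  # participation error, sd 1, corr(u1, u2) = rho
        wage = MU + u1
        all_wages.append(wage)
        if GAMMA0 + u2 > 0:                           # she works, so her wage is observed
            observed.append(wage)
    return fmean(all_wages), fmean(observed)


def theoretical_observed_mean(rho):
    """E[wage | works] = MU + rho*SIGMA*phi(GAMMA0)/Phi(GAMMA0)."""
    std = NormalDist()
    return MU + rho * SIGMA * std.pdf(GAMMA0) / std.cdf(GAMMA0)


for rho in (0.0, 0.7):
    pop_mean, obs_mean = simulate_means(rho)
    print(f"rho={rho}: all women {pop_mean:.2f}, working women {obs_mean:.2f}, "
          f"theory {theoretical_observed_mean(rho):.2f}")
```

With rho = 0 the two averages agree up to simulation noise. With rho = 0.7 the working-only average is biased upward by roughly 2.2 in these hypothetical units.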

We can test for sample selection formally by using, for example, Bayes factors. The Bayes factor comparing two models is the ratio of their marginal likelihoods. The model with the larger marginal likelihood is the one better supported by the data. To test for sample selection, we compare the marginal likelihood of the current model with that of a model in which rho equals zero.
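
Written out, the marginal likelihood m(y|M) of a model M is its likelihood averaged over the prior. The Bayes factor of model M1 relative to model M2 and its logarithm are then as follows. On the log scale, the comparison is simply the difference of the two log marginal likelihoods that bayes reports.

$$
m(\mathbf{y}\mid M) = \int p(\mathbf{y}\mid\boldsymbol{\theta}, M)\,\pi(\boldsymbol{\theta}\mid M)\,d\boldsymbol{\theta},
\qquad
\mathrm{BF}_{12} = \frac{m(\mathbf{y}\mid M_1)}{m(\mathbf{y}\mid M_2)},
\qquad
\log \mathrm{BF}_{12} = \log m(\mathbf{y}\mid M_1) - \log m(\mathbf{y}\mid M_2)
$$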

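First, we save the MCMC draws from the sample-selection model to the file heckman_mcmc, which Stata requires before Bayesian estimation results can be stored. We then store the results as heckman.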

```
. bayes, saving(heckman_mcmc)

. estimates store heckman
```

Next, we fit a model that assumes no sample selection. When rho equals zero, {athrho} = atanh(rho) also equals zero, so we specify a strong prior concentrated at zero for parameter {athrho}.
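
Stata's normal(mean, variance) prior is parameterized by the variance, so normal(0,1e-4) has a standard deviation of 0.01. The default normal(0,10000) prior has a standard deviation of 100. The short Python check below is an illustration, not part of the Stata workflow. It computes the central 95% prior interval for {athrho} and maps it to rho through rho = tanh(athrho). Both intervals stay within about ±0.02 of zero.

```python
import math
from statistics import NormalDist

# Stata's normal(mean, variance) prior takes a variance, not a standard deviation.
tight_sd = math.sqrt(1e-4)      # prior({athrho}, normal(0,1e-4))
default_sd = math.sqrt(10000)   # default normal(0,10000) prior

# Central 95% prior interval for {athrho} under the tight prior.
prior = NormalDist(mu=0.0, sigma=tight_sd)
lo, hi = prior.inv_cdf(0.025), prior.inv_cdf(0.975)

# The implied interval for rho = tanh(athrho) is essentially the same,
# because tanh(x) is close to x near zero.
rho_lo, rho_hi = math.tanh(lo), math.tanh(hi)

print(f"prior sd of athrho: {tight_sd} (default prior sd: {default_sd})")
print(f"95% prior interval, athrho: [{lo:.4f}, {hi:.4f}]; rho: [{rho_lo:.4f}, {rho_hi:.4f}]")
```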

```
. bayes, prior({athrho}, normal(0,1e-4)) saving(nosel_mcmc):
> heckman wage educ age, select(married children educ age)

Model summary
------------------------------------------------------------------------------
Likelihood:
  wage ~ heckman(xb_wage,xb_select,{athrho} {lnsigma})

Priors:
                     {wage:education age _cons} ~ normal(0,10000)          (1)
  {select:married children education age _cons} ~ normal(0,10000)          (2)
                                       {athrho} ~ normal(0,1e-4)
                                      {lnsigma} ~ normal(0,10000)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_wage.
(2) Parameters are elements of the linear form xb_select.

Bayesian Heckman selection model                MCMC iterations   =     12,500
Random-walk Metropolis-Hastings sampling        Burn-in           =      2,500
                                                MCMC sample size  =     10,000
                                                Number of obs     =      2,000
                                                      Selected    =      1,343
                                                      Nonselected =        657
                                                Acceptance rate   =      .3065
                                                Efficiency:   min =     .03943
                                                              avg =     .09498
Log marginal likelihood = -5283.0246                          max =      .2432

------------------------------------------------------------------------------
             |                                               Equal-tailed
             |      Mean  Std. Dev.      MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
wage         |
   education |  .8981219   .0509913   .001578   .8973616   .8013416   1.000497
         age |  .1477784     .01854    .00066   .1477496   .1115628   .1850257
       _cons |  5.994764    .890318   .030657   6.014622   4.150738   7.658942
-------------+----------------------------------------------------------------
select       |
     married |  .4351031   .0748102   .003577   .4377313   .2821176   .5752786
    children |  .4501657   .0285028   .001045   .4492015   .3937091   .5048498
   education |  .0584037   .0110582   .000524   .0579573   .0370387   .0814287
         age |   .034779   .0043677    .00022   .0348894   .0259916    .043139
       _cons |  -2.47607   .1962162   .009818  -2.467739  -2.862694   -2.10733
-------------+----------------------------------------------------------------
      athrho |  .0062804    .010209    .00023   .0062746   -.014139   .0261746
     lnsigma |   1.69586    .019056   .000386   1.695649    1.65948   1.734115
------------------------------------------------------------------------------
Note: Default priors are used for some model parameters.

. estimates store nosel
```

We now use bayesstats ic to obtain the Bayes factor of the two models.

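```
. bayesstats ic heckman nosel

Bayesian information criteria

----------------------------------------------
             |       DIC    log(ML)    log(BF)
-------------+--------------------------------
     heckman |  10376.05  -5260.202          .
       nosel |  10435.29  -5283.025  -22.82221
----------------------------------------------
Note: Marginal likelihood (ML) is computed
      using Laplace-Metropolis approximation.
```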

The log Bayes factor of about -23 for nosel relative to heckman indicates very strong evidence in favor of the sample-selection model, heckman, and thus for the presence of sample selection in these data.
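
The log(BF) column is the difference of the log marginal likelihoods printed by the two bayes: heckman runs. As the output suggests, heckman, the first model listed, serves as the base model. The Python arithmetic below is for illustration only. It reproduces the log Bayes factor to the precision of the printed values and converts it to a Bayes factor in favor of heckman.

```python
import math

# Log marginal likelihoods printed in the two bayes: heckman outputs.
log_ml = {"heckman": -5260.2024, "nosel": -5283.0246}

# log(BF) of nosel relative to the base model heckman, as in the log(BF) column.
log_bf_nosel = log_ml["nosel"] - log_ml["heckman"]

# Bayes factor in favor of the sample-selection model over the no-selection model.
bf_heckman = math.exp(-log_bf_nosel)

print(f"log(BF) of nosel vs heckman: {log_bf_nosel:.4f}")
print(f"BF in favor of heckman: {bf_heckman:.3g}")
```

A log Bayes factor of about -22.8 corresponds to the data being roughly eight billion times more probable under heckman than under nosel.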

Tell me more

Learn more about the general features of the bayes prefix.