Stata handles factor (categorical) variables elegantly. You can prefix a variable with i. to specify indicators for each level (category) of the variable. You can put a # between two variables to create an interaction: indicators for each combination of the categories of the variables. You can put ## instead to specify a full factorial of the variables, that is, main effects for each variable plus their interaction. If you want to interact a continuous variable with a factor variable, just prefix the continuous variable with c. to mark it as continuous. You can specify up to eight-way interactions.
We run a linear regression of cholesterol level on a full factorial of age group and whether the person smokes along with a continuous body mass index (bmi) and its interaction with whether the person smokes.
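The command that produced the output is not reproduced here. A minimal sketch of one way to specify the model just described, assuming the variable names that appear in the output (cholesterol, smoker, agegrp, bmi) and a suitable dataset already in memory:

```stata
* Sketch only: assumes a dataset in memory containing
* cholesterol, smoker, agegrp, and bmi.
* i.smoker##i.agegrp -> main effects of smoker and agegrp plus their interaction
* c.bmi              -> bmi entered as a continuous covariate
* i.smoker#c.bmi     -> interaction of smoker with continuous bmi
regress cholesterol i.smoker##i.agegrp c.bmi i.smoker#c.bmi
```

These terms account for the 9 model degrees of freedom reported in the output: 1 for smoker, 3 for agegrp, 3 for their interaction, 1 for bmi, and 1 for smoker#c.bmi.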
         Source |       SS           df       MS      Number of obs   =     4,049
----------------+----------------------------------   F(9, 4039)      =     15.30
          Model |  137.845627         9  15.3161808   Prob > F        =    0.0000
       Residual |  4044.55849     4,039   1.0013762   R-squared       =    0.0330
----------------+----------------------------------   Adj R-squared   =    0.0308
          Total |  4182.40412     4,048   1.0332026   Root MSE        =    1.0007

---------------------------------------------------------------------------------
    cholesterol |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
         smoker |
        smoker  |  -.7699108    .337665    -2.28   0.023    -1.431921   -.1079012
                |
         agegrp |
         45-49  |   .1554985   .0620537     2.51   0.012     .0338391    .2771579
         50-54  |   .1838839   .0618467     2.97   0.003     .0626303    .3051375
         55-59  |   .1746813   .0763244     2.29   0.022     .0250433    .3243193
                |
  smoker#agegrp |
  smoker#45-49  |   -.118553   .1367914    -0.87   0.386    -.3867396    .1496336
  smoker#50-54  |  -.1332379   .1363604    -0.98   0.329    -.4005796    .1341038
  smoker#55-59  |  -.2466412   .1717679    -1.44   0.151    -.5834009    .0901185
                |
            bmi |   .0253916   .0059336     4.28   0.000     .0137585    .0370246
                |
   smoker#c.bmi |
        smoker  |   .0501707   .0129223     3.88   0.000     .0248358    .0755055
                |
          _cons |   5.437234   .1520921    35.75   0.000     5.139049    5.735418
---------------------------------------------------------------------------------
We could have used parenthesis binding to type the same model more briefly.
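A sketch of what that shorter form might look like, under the same assumed variable names. Factor-variable operators distribute over a parenthesized list, so a##(b c) is shorthand for a##b a##c:

```stata
* Sketch only: smoker is interacted with everything inside the parentheses.
* This expands to i.smoker##i.agegrp i.smoker##c.bmi, which contains the
* same terms as the longer specification; the smoker main effect appears
* only once in the fitted model.
regress cholesterol i.smoker##(i.agegrp c.bmi)
```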
Base levels can be changed on the fly: i.agegrp uses the default base level of 1, whereas b3.agegrp makes 3 the base level.
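As an illustration of switching base levels, the following sketch fits two simple models that differ only in the reference category. It assumes the same variables as above and that 3 is a valid level of agegrp:

```stata
* Sketch only: same assumed variables as in the regression above.
* Default base: the lowest level of agegrp is the omitted reference category.
regress cholesterol i.agegrp

* ib3. (equivalently b3.) makes level 3 the reference category instead.
regress cholesterol ib3.agegrp
```

The two fits are the same model; only the category against which the other coefficients are measured changes.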
The level indicator variables are not created in your dataset, saving lots of space.
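One way to see this, sketched under the assumption that agegrp is in the dataset in memory, is to display the virtual indicators and then confirm that the dataset itself has not gained any variables:

```stata
* Sketch only: assumes agegrp exists in the dataset in memory.
* list accepts factor-variable notation, so the virtual indicators
* can be displayed without generating them.
list agegrp i.agegrp in 1/5

* The variable count is unchanged: no indicator variables were added.
describe, short
```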
Factor variables are integrated deeply into Stata’s processing of variable lists, providing a consistent way of interacting with both estimation and postestimation commands.
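For instance, the same notation used to fit the model can be reused afterward. The sketch below assumes the regression described above has just been fit; the particular postestimation commands are chosen only for illustration:

```stata
* Sketch only: run after the regression above has been fit.
* Joint test of the smoker-by-agegrp interaction terms.
testparm i.smoker#i.agegrp

* Predictive margins for each smoker-by-agegrp combination.
margins smoker#agegrp
```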