# Data Analysis Using Stata

Data Analysis Using Stata, Third Edition has been completely revamped to reflect the capabilities of Stata 12. This book will appeal to those just learning statistics and Stata, as well as to the many users who are switching to Stata from other packages. Throughout the book, Kohler and Kreuter show examples using data from the German Socio-Economic Panel, a large survey of households containing demographic, income, employment, and other key information.

Kohler and Kreuter take a hands-on approach, first showing how to use Stata’s graphical interface and then describing Stata’s syntax. The core of the book covers all aspects of social science research, including data manipulation, production of tables and graphs, linear regression analysis, and logistic modeling. The authors describe Stata’s handling of categorical covariates and show how the new margins and marginsplot commands greatly simplify the interpretation of regression and logistic results. An entirely new chapter discusses aspects of statistical inference, including random samples, complex survey samples, nonresponse, and causal inference.

The rest of the book includes chapters on reading text files into Stata, writing programs and do-files, and using Internet resources such as the search command and the SSC archive.

Data Analysis Using Stata, Third Edition has been structured so that it can be used as a self-study course or as a textbook in an introductory data analysis or statistics course. It will appeal to students and academic researchers in all the social sciences.

List of Tables
List of Figures
Preface

1. “THE FIRST TIME”

Starting Stata

Inputting commands
Files and the working memory
Variables and observations
Looking at data
Interrupting a command and repeating a command
The variable list
The in qualifier
Summary statistics
The if qualifier
Define missing values
The by prefix
Command options
Frequency tables
Graphs
Getting help
Recoding of variables
Variable labels and value labels
Linear regression

Do-files
Exiting Stata
Exercises

2. WORKING WITH DO-FILES

From interactive work to working with a do-file

Alternative 1
Alternative 2

Designing do-files

Line breaks
Some crucial commands

Exercises

3. THE GRAMMAR OF STATA

The elements of Stata commands

Stata commands
The variable list

List of variables: required or optional
Abbreviation rules
Special listings

Options
The in qualifier
The if qualifier
Expressions

Operators
Functions

Lists of numbers
Using filenames

Repeating similar commands

The by prefix
The foreach loop

The types of foreach lists
Several commands within a foreach loop

The forvalues loop

Weights

Frequency weights
Analytic weights
Probability weights

Exercises

4. GENERAL COMMENTS ON THE STATISTICAL COMMANDS

Regular statistical commands
Estimation commands
Exercises

5. CREATING AND CHANGING VARIABLES

The commands generate and replace

Variable names
Some examples
Useful functions
Changing codes with by, _n, and _N
Subscripts

Specialized recoding commands

The recode command
The egen command

Recoding string variables

Recoding date and time

Dates
Time

Setting missing values
Labels
Storage types, or, the ghost in the machine
Exercises

6. CREATING AND CHANGING GRAPHS

A primer on graph syntax
Graph types

Examples
Specialized graphs

Graph elements

Appearance of data

Choice of marker
Marker colors
Marker size
Lines

Graphs and plot regions

Graph size
Plot region
Scaling the axes

Information inside the plot region

Reference lines
Labeling inside the plot region

Information outside the plot region

Labeling the axes
Tick lines
Axis titles
The legend
Graph titles

Multiple graphs

Overlaying numerous twoway graphs
Option by()
Combining graphs

Saving and printing graphs
Exercises

7. DESCRIBING AND COMPARING DISTRIBUTIONS

Categories: Few or many?
Variables with few categories

Tables

Frequency tables
More than one frequency table
Comparing distributions
Summary statistics
More than one contingency table

Graphs

Histograms
Bar charts
Dot chart

Variables with many categories

Frequencies of grouped data

Some remarks on grouping data
Special techniques for grouping data

Describing data using statistics

Important summary statistics
The summarize command
The tabstat command
Comparing distributions using statistics

Graphs

Box plots
Histograms
Kernel density estimation
Quantile plot
Comparing distributions with Q–Q plots

Exercises

8. STATISTICAL INFERENCE

Random samples and sampling distributions

Random numbers
Creating fictitious datasets
Drawing random samples
The sampling distribution

Descriptive inference

Standard errors for simple random samples
Standard errors for complex samples

Typical forms of complex samples
Sampling distributions for complex samples
Using Stata’s svy commands

Standard errors with nonresponse

Unit nonresponse and poststratification weights
Item nonresponse and multiple imputation

Uses of standard errors

Confidence intervals
Significance tests
Two-group mean comparison test

Causal inference

Basic concepts

Data-generating processes
Counterfactual concept of causality

The effect of third-class tickets
Some problems of causal inference

Exercises

9. INTRODUCTION TO LINEAR REGRESSION

Simple linear regression

The basic principle

Linear regression using Stata

The table of coefficients
Standard errors
The table of ANOVA results
The model fit table

Multiple regression

Multiple regression using Stata

Standardized regression coefficients

What does “under control” mean?

Regression diagnostics

Violation of $E(\epsilon_i) = 0$

Linearity
Influential cases
Omitted variables
Multicollinearity

Violation of $\mathrm{Var}(\epsilon_i) = \sigma^2$
Violation of $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$, $i \neq j$

Model extensions

Categorical independent variables
Interaction terms

Regression models using transformed variables

Nonlinear relations
Eliminating heteroskedasticity

Reporting regression results

Tables of similar regression models
Plots of coefficients
Conditional-effects plots

Median regression
Regression models for panel data

From wide to long format
Fixed-effects models

Error-component models

Exercises

10. REGRESSION MODELS FOR CATEGORICAL DEPENDENT VARIABLES

The linear probability model
Basic concepts

Odds, log odds, and odds ratios
Excursion: The maximum likelihood principle

Logistic regression with Stata

The coefficients table

Interpretation with odds ratios
Probability interpretation
Average marginal effects

The iteration block
The model fit block

Classification tables
Pearson chi-squared

Logistic regression diagnostics

Linearity
Influential cases

Likelihood-ratio test
Refined models

Nonlinear relationships
Interaction effects

Probit models
Multinomial logistic regression
Models for ordinal data

Exercises

11. READING AND WRITING DATA

The goal: The data matrix

Reading system files from other packages

Inputting data

Input data using the editor
The input command

Combining data

The GSOEP database
The merge command

Merge 1:1 matches with rectangular data
Merge 1:1 matches with nonrectangular data
Merging more than two files
Merging m:1 and 1:m matches

The append command

Saving and exporting data

Handling large datasets

Rules for handling the working memory
Using oversized datasets

Exercises

12. DO-FILES FOR ADVANCED USERS AND USER-WRITTEN PROGRAMS

Two examples of usage
Four programming tools

Local macros

Calculating with local macros
Combining local macros
Changing local macros

Do-files
Programs

The problem of redefinition
The problem of naming
The problem of error checking

User-written Stata commands

Sketch of the syntax
Parsing variable lists
Parsing options
Parsing if and in qualifiers
Generating an unknown number of variables
Default values
Extended macro functions
Avoiding changes in the dataset
Help files

Exercises

13. AROUND STATA

Resources and information
Taking care of Stata