A Gentle Introduction to Stata

Alan C. Acock’s A Gentle Introduction to Stata, Revised Sixth Edition is aimed at new Stata users who want to become proficient in Stata. After reading this introductory text, new users will be able not only to use Stata well but also to learn new aspects of it.

 

Acock assumes that the user is not familiar with any statistical software. This assumption of a blank slate is central to the structure and contents of the book. Acock starts with the basics; for example, the part of the book that deals with data management begins with a careful and detailed example of turning survey data on paper into a Stata-ready dataset. When explaining how to go about basic exploratory statistical procedures, Acock includes notes that will help the reader develop good work habits. This mixture of explaining good Stata habits and explaining good statistical habits continues throughout the book.

 

Acock is quite careful to teach the reader all aspects of using Stata. He covers data management, good work habits (including the use of basic do-files), basic exploratory statistics (including graphical displays), and analyses using the standard array of basic statistical tools (correlation, linear and logistic regression, and parametric and nonparametric tests of location and dispersion). He also successfully introduces some more advanced topics such as multiple imputation and multilevel modeling in a very approachable manner. Acock teaches Stata commands by using the menus and dialog boxes while still stressing the value of Stata commands and do-files. In this way, he ensures that all types of users can build good work habits. Each chapter has exercises that the motivated reader can use to reinforce the material.

 

The tone of the book is friendly and conversational without ever being glib or condescending. Important asides and notes about terminology are set off in boxes, which makes the text easy to read without any convoluted twists or forward referencing. Rather than splitting topics by their Stata implementation, Acock arranges the topics as they would appear in a basic statistics textbook; graphics and postestimation are woven into the material naturally. Real datasets, such as the General Social Surveys from 2002, 2006, and 2016, are used throughout the book.

 

The focus of the book is especially helpful for those in the behavioral and social sciences because the presentation of basic statistical modeling is supplemented with discussions of effect sizes and standardized coefficients. Criteria for model selection, such as semipartial correlations, are discussed as well. Acock also covers a variety of commands available for evaluating the reliability and validity of measurements.

 

The revised sixth edition is fully up to date for Stata 17, including updated discussion and images of Stata’s interface and modern command syntax. In addition, examples include new features such as the table command and collect suite for creating and exporting customized tables as well as the option for creating graphs with transparency.

 

© Copyright 1996–2022 StataCorp LLC

List of figures
List of tables
List of boxed tips
Preface
Support materials for the book

 

1. GETTING STARTED

Conventions
Introduction
The Stata screen
Using an existing dataset
An example of a short Stata session
Video aids to learning Stata
Summary
Exercises

 

2. ENTERING DATA

Creating a dataset
An example questionnaire
Developing a coding system
Entering data using the Data Editor

 

Value labels

 

The Variables Manager
The Data Editor (Browse) view
Saving your dataset
Checking the data
Summary
Exercises

 

3. PREPARING DATA FOR ANALYSIS

Introduction
Planning your work
Creating value labels
Reverse-code variables
Creating and modifying variables
Creating scales
Save some of your data
Summary
Exercises

 

4. WORKING WITH COMMANDS, DO-FILES, AND RESULTS

Introduction
How Stata commands are constructed
Creating a do-file
Copying your results to a word processor
Logging your command file
Summary
Exercises

 

5. DESCRIPTIVE STATISTICS AND GRAPHS FOR ONE VARIABLE

Descriptive statistics and graphs
Where is the center of a distribution?
How dispersed is the distribution?
Statistics and graphs—unordered categories
Statistics and graphs—ordered categories and variables
Statistics and graphs—quantitative variables
Summary
Exercises

 

6. STATISTICS AND GRAPHS FOR TWO CATEGORICAL VARIABLES

Relationship between categorical variables
Cross-tabulation
Chi-squared test

 

Degrees of freedom
Probability tables

 

Percentages and measures of association
Odds ratios when dependent variable has two categories
Ordered categorical variables
Interactive tables
Tables—linking categorical and quantitative variables
Power analysis when using a chi-squared test of significance
Summary
Exercises

 

7. TESTS FOR ONE OR TWO MEANS

Introduction to tests for one or two means
Randomization
Random sampling
Hypotheses
One-sample test of a proportion
Two-sample test of a proportion
One-sample test of means
Two-sample test of group means

 

Testing for unequal variances

 

Repeated-measures t test
Power analysis
Nonparametric alternatives

 

Mann–Whitney two-sample rank-sum test
Nonparametric alternative: Median test

 

Video tutorial related to this chapter
Summary
Exercises

 

8. BIVARIATE CORRELATION AND REGRESSION

Introduction to bivariate correlation and regression
Scattergrams
Plotting the regression line
An alternative to producing a scattergram, binscatter
Correlation
Regression
Spearman’s rho: Rank-order correlation for ordinal data
Power analysis with correlation
Summary
Exercises

 

9. ANALYSIS OF VARIANCE

The logic of one-way analysis of variance
ANOVA example
ANOVA example with nonexperimental data
Power analysis for one-way ANOVA
A nonparametric alternative to ANOVA
Analysis of covariance
Two-way ANOVA
Repeated-measures design
Intraclass correlation—measuring agreement
Power analysis with ANOVA

 

Power analysis for one-way ANOVA
Power analysis for two-way ANOVA
Power analysis for repeated-measures ANOVA
Summary of power analysis for ANOVA

 

Summary
Exercises

 

10. MULTIPLE REGRESSION

Introduction to multiple regression
What is multiple regression?
The basic multiple regression command
Increment in R-squared: Semipartial correlations
Is the dependent variable normally distributed?
Are the residuals normally distributed?
Regression diagnostic statistics

 

Outliers and influential cases
Influential observations: DFbeta
Combinations of variables may cause problems

 

Weighted data
Categorical predictors and hierarchical regression
A shortcut for working with a categorical variable
Fundamentals of interaction
Nonlinear relations

 

Fitting a quadratic model
Centering when using a quadratic term
Do we need to add a quadratic component?

 

Power analysis in multiple regression
Summary
Exercises

 

11. LOGISTIC REGRESSION

Introduction to logistic regression
An example
What is an odds ratio and a logit?

 

The odds ratio
The logit transformation

 

Data used in the rest of the chapter
Logistic regression
Hypothesis testing

 

Testing individual coefficients
Testing sets of coefficients

 

Margins: More on interpreting results from logistic regression
Nested logistic regressions
Power analysis when doing logistic regression
Next steps for using logistic regression and its extensions
Summary
Exercises

 

12. MEASUREMENT, RELIABILITY, AND VALIDITY

Overview of reliability and validity
Constructing a scale

 

Generating a mean score for each person

 

Reliability

 

Stability and test–retest reliability
Equivalence
Split-half and alpha reliability—internal consistency
Kuder–Richardson reliability for dichotomous items
Rater agreement—kappa (K)

 

Validity

 

Expert judgment
Criterion-related validity
Construct validity

 

Factor analysis
PCF analysis

 

Orthogonal rotation: Varimax
Oblique rotation: Promax

 

But we wanted one scale, not four scales

 

Scoring our variable

 

Summary
Exercises

 

13. STRUCTURAL EQUATION AND GENERALIZED STRUCTURAL EQUATION MODELING

Linear regression using sem

 

Using the sem command directly
SEM and working with missing values
Exploring missing values and auxiliary variables
Getting auxiliary variables into your SEM command

 

A quick way to draw a regression model
The gsem command for logistic regression

 

Fitting the model using the logit command
Fitting the model using the gsem command

 

Path analysis and mediation
Conclusions and what is next for the sem command
Exercises

 

14. WORKING WITH MISSING VALUES – MULTIPLE IMPUTATION

Working with missing values—multiple imputation
What variables do we include when doing imputations?
The nature of the problem
Multiple imputation and its assumptions about the mechanism for missingness
Multiple imputation
A detailed example
Preliminary analysis
Setup and multiple-imputation stage
The analysis stage
For those who want an R-squared and standardized βs
When impossible values are imputed

Summary
Exercises

 

15. AN INTRODUCTION TO MULTILEVEL ANALYSIS 

Questions and data for groups of individuals
Questions and data for a longitudinal multilevel application
Fixed-effects regression models
Random-effects regression models
An applied example

 

Research questions
Reshaping data to do multilevel analysis

 

A quick visualization of our data
Random-intercept model
Random intercept—linear model
Random-intercept model—quadratic term
Treating time as a categorical variable
Random-coefficients model
Including a time-invariant covariate
Summary
Exercises

 

16. ITEM RESPONSE THEORY (IRT)

How are IRT measures of variables different from summated scales?
Overview of three IRT models for dichotomous items

The one-parameter logistic (1PL) model
The two-parameter logistic (2PL) model
The three-parameter logistic (3PL) model

 

Fitting the 1PL model using Stata

 

The estimation
How important is each of the items?
An overall evaluation of our scale
Estimating the latent score

 

Fitting a 2PL IRT model

 

Fitting the 2PL model

 

The graded response model—IRT for Likert-type items

 

The data
Fitting our graded response model
Estimating a person’s score

 

Reliability of the fitted IRT model
Using the Stata menu system
Extensions of IRT
Exercises

 

A. WHAT’S NEXT?

 

Introduction to the appendix
Resources

 

Web resources
Books about Stata
Short courses
Acquiring data
Learning from the postestimation methods

 

Summary

 

Glossary of acronyms
Glossary of mathematical and statistical symbols
References


Author: Alan C. Acock
Edition: Revised Sixth Edition
ISBN-13: 978-1-59718-367-3
Copyright: © 1996–2023 StataCorp LLC
E-book version available
