A Gentle Introduction to Stata

Alan C. Acock’s A Gentle Introduction to Stata, Fifth Edition, is aimed at new Stata users who want to become proficient with the software. After reading this introductory text, new users will be able not only to use Stata well but also to learn new aspects of Stata.

 

Acock assumes that the user is not familiar with any statistical software. This assumption of a blank slate is central to the structure and contents of the book. Acock starts with the basics; for example, the part of the book that deals with data management begins with a careful and detailed example of turning survey data on paper into a Stata-ready dataset on the computer. When explaining how to go about basic exploratory statistical procedures, Acock includes notes that will help the reader develop good work habits. This mixture of explaining good Stata habits and good statistical habits continues throughout the book.

 

Acock is quite careful to teach the reader all aspects of using Stata. He covers data management, good work habits (including the use of basic do-files), basic exploratory statistics (including graphical displays), and analyses using the standard array of basic statistical tools (correlation, linear and logistic regression, and parametric and nonparametric tests of location and dispersion). He also successfully introduces some more advanced topics such as multiple imputation and structural equation modeling in a very approachable manner. Acock teaches Stata commands by using the menus and dialog boxes while still stressing the value of do-files. In this way, he ensures that all types of users can build good work habits. Each chapter has exercises that the motivated reader can use to reinforce the material.

 

The tone of the book is friendly and conversational without ever being glib or condescending. Important asides and notes about terminology are set off in boxes, which makes the text easy to read without any convoluted twists or forward-referencing. Rather than splitting topics by their Stata implementation, Acock arranges the topics as they would appear in a basic statistics textbook; graphics and postestimation are woven into the material in a natural fashion. Real datasets, such as the General Social Surveys from 2002 and 2006, are used throughout the book.

 

The focus of the book is especially helpful for those in the behavioral and social sciences because the presentation of basic statistical modeling is supplemented with discussions of effect sizes and standardized coefficients. Criteria for model selection, such as semipartial correlations, are also discussed. Acock also covers a variety of commands available for evaluating the reliability and validity of measurements.

 

The fifth edition of the book includes two new chapters that cover multilevel modeling and item response theory (IRT) models. The multilevel modeling chapter demonstrates how to fit linear multilevel models using the mixed command. Acock discusses models with both random intercepts and random coefficients, and he provides a variety of examples that apply these models to longitudinal data. The IRT chapter introduces the use of IRT models for evaluating a set of items designed to measure a specific trait such as an attitude, a value, or a belief. Acock shows how to use the irt suite of commands, which are new in Stata 14, to fit IRT models and to graph the results. In addition, he presents a measure of reliability that can be computed when using IRT.

List of figures
List of tables
List of boxed tips
Preface
Support materials for the book

 

1. GETTING STARTED

Conventions
Introduction
The Stata screen
Using an existing dataset
An example of a short Stata session
Video aids to learning Stata
Summary
Exercises

 

2. ENTERING DATA

Creating a dataset
An example questionnaire
Developing a coding system
Entering data using the Data Editor

Value labels

The Variables Manager
The Data Editor (Browse) view
Saving your dataset
Checking the data
Summary
Exercises

 

3. PREPARING DATA FOR ANALYSIS

Introduction
Planning your work
Creating value labels
Reverse-code variables
Creating and modifying variables
Creating scales
Save some of your data
Summary
Exercises

 

4. WORKING WITH COMMANDS, DO-FILES, AND RESULTS

Introduction
How Stata commands are constructed
Creating a do-file
Copying your results to a word processor
Logging your command file
Summary
Exercises

 

5. DESCRIPTIVE STATISTICS AND GRAPHS FOR ONE VARIABLE

Descriptive statistics and graphs
Where is the center of a distribution?
How dispersed is the distribution?
Statistics and graphs—unordered categories
Statistics and graphs—ordered categories and variables
Statistics and graphs—quantitative variables
Summary
Exercises

 

6. STATISTICS AND GRAPHS FOR TWO CATEGORICAL VARIABLES

Relationship between categorical variables
Cross-tabulation
Chi-squared test

Degrees of freedom

Probability tables

Percentages and measures of association
Odds ratios when dependent variable has two categories
Ordered categorical variables
Interactive tables
Tables—linking categorical and quantitative variables
Power analysis when using a chi-squared test of significance
Summary
Exercises

 

7. TESTS FOR ONE OR TWO MEANS

Introduction to tests for one or two means
Randomization
Random sampling
Hypotheses
One-sample test of a proportion
Two-sample test of a proportion
One-sample test of means
Two-sample test of group means

Testing for unequal variances

Repeated-measures t test
Power analysis
Nonparametric alternatives

Mann–Whitney two-sample rank-sum test
Nonparametric alternative: Median test

Video tutorial related to this chapter
Summary
Exercises

 

8. BIVARIATE CORRELATION AND REGRESSION

Introduction to bivariate correlation and regression
Scattergrams
Plotting the regression line
An alternative to producing a scattergram, binscatter
Correlation
Regression
Spearman’s rho: Rank-order correlation for ordinal data
Power analysis with correlation
Summary
Exercises

 

9. ANALYSIS OF VARIANCE

The logic of one-way analysis of variance
ANOVA example
ANOVA example with nonexperimental data
Power analysis for one-way ANOVA
A nonparametric alternative to ANOVA
Analysis of covariance
Two-way ANOVA
Repeated-measures design
Intraclass correlation—measuring agreement
Power analysis with ANOVA

Power analysis for one-way ANOVA
Power analysis for two-way ANOVA
Power analysis for repeated-measures ANOVA
Summary of power analysis for ANOVA

Summary
Exercises

 

10. MULTIPLE REGRESSION

Introduction to multiple regression
What is multiple regression?
The basic multiple regression command
Increment in R-squared: Semipartial correlations
Is the dependent variable normally distributed?
Are the residuals normally distributed?
Regression diagnostic statistics

Outliers and influential cases

Influential observations: DFbeta

Combinations of variables may cause problems

Weighted data
Categorical predictors and hierarchical regression
A shortcut for working with a categorical variable
Fundamentals of interaction
Nonlinear relations

Fitting a quadratic model

Centering when using a quadratic term

Do we need to add a quadratic component?

Power analysis in multiple regression
Summary
Exercises

 

11. LOGISTIC REGRESSION

Introduction to logistic regression
An example
What is an odds ratio and a logit?

The odds ratio

The logit transformation

Data used in the rest of the chapter
Logistic regression
Hypothesis testing

Testing individual coefficients

Testing sets of coefficients

More on interpreting results from logistic regression
Nested logistic regressions
Power analysis when doing logistic regression
Next steps for using logistic regression and its extensions
Summary
Exercises

 

12. MEASUREMENT, RELIABILITY, AND VALIDITY

Overview of reliability and validity
Constructing a scale

Generating a mean score for each person

Reliability

Stability and test–retest reliability

Equivalence

Split-half and alpha reliability—internal consistency

Kuder–Richardson reliability for dichotomous items

Rater agreement—kappa (κ)

Validity

Expert judgment

Criterion-related validity

Construct validity

Factor analysis
PCF analysis

Orthogonal rotation: Varimax

Oblique rotation: Promax

But we wanted one scale, not four scales

Scoring our variable

Summary
Exercises

 

13. WORKING WITH MISSING VALUES—MULTIPLE IMPUTATION

The nature of the problem
Multiple imputation and its assumptions about the mechanism for missingness
What variables do we include when doing imputations?
Multiple imputation
A detailed example

Preliminary analysis

Setup and multiple-imputation stage

The analysis stage

For those who want an R² and standardized βs

When impossible values are imputed

Summary
Exercises

 

14. THE SEM AND GSEM COMMANDS

Linear regression using sem

Using the SEM Builder to fit a basic regression model

A quick way to draw a regression model and a fresh start

Using sem without the SEM Builder

The gsem command for logistic regression

Fitting the model using the logit command

Fitting the model using the gsem command

Path analysis and mediation
Conclusions and what is next for the sem command
Exercises

 

15. AN INTRODUCTION TO MULTILEVEL ANALYSIS

Questions and data for groups of individuals
Questions and data for a longitudinal multilevel application
Fixed-effects regression models
Random-effects regression models
An applied example

Research questions

Reshaping data to do multilevel analysis

A quick visualization of our data
Random-intercept model

Random intercept—linear model

Random-intercept model—quadratic term

Treating time as a categorical variable

Random-coefficients model
Including a time-invariant covariate
Summary
Exercises

 

16. ITEM RESPONSE THEORY (IRT)

How are IRT measures of variables different from summated scales?
Overview of three IRT models for dichotomous items

The one-parameter logistic (1PL) model

The two-parameter logistic (2PL) model

The three-parameter logistic (3PL) model

Fitting the 1PL model using Stata

The estimation

How important is each of the items?

An overall evaluation of our scale

Estimating the latent score

Fitting a 2PL IRT model

Fitting the 2PL model

The graded response model—IRT for Likert-type items

The data

Fitting our graded response model

Estimating a person’s score

Reliability of the fitted IRT model
Using the Stata menu system
Extensions of IRT
Exercises

 

A. WHAT’S NEXT?

Introduction to the appendix
Resources

Web resources

Books about Stata

Short courses

Acquiring data

Learning from the postestimation methods

Summary

Author: Alan C. Acock
Edition: Fifth Edition
ISBN-13: 978-1-59718-185-3
Copyright: 2016
E-book version available
