Data Management Using Stata: A Practical Handbook

Michael N. Mitchell’s Data Management Using Stata comprehensively covers data-management tasks, from those a beginning statistician would need to the hard-to-verbalize tasks that can confound an experienced user. Mitchell does all this in simple language with illustrative examples.

The book is modular in structure, with modules based on data-management tasks rather than on clusters of commands. This format lets readers find and read just what they need to solve the problem at hand. To complement this format, the book is written in a style that teaches good data-management habits even to sporadic readers who take the chapters out of order.

Throughout the book, Mitchell subtly emphasizes the absolute necessity of reproducibility and an audit trail. Instead of stressing programming esoterica, Mitchell reinforces simple habits and points out the time saved by working carefully. Mitchell’s experience in UCLA’s Academic Technology Services clearly drives much of his advice.

Mitchell includes advice for those who would like to learn to write their own data-management Stata commands. Even experienced users will learn new tricks and new ways to approach data-management problems.

This is a great book, thoroughly recommended for anyone interested in data management using Stata.

Acknowledgements
List of tables
List of figures
Preface

1. INTRODUCTION

Using this book
Overview of this book
Listing observations in this book

2. READING AND WRITING DATASETS

Introduction
Reading Stata datasets
Saving Stata datasets
Reading comma-separated and tab-separated files
Reading space-separated files
Reading fixed-column files
Reading fixed-column files with multiple lines of raw data per observation
Reading SAS XPORT files
Common errors reading files
Entering data directly into the Stata Data Editor
Saving comma-separated and tab-separated files
Saving space-separated files
Saving SAS XPORT files

3. DATA CLEANING

Introduction
Double data entry
Checking individual variables
Checking categorical by categorical variables
Checking categorical by continuous variables
Checking continuous by continuous variables
Correcting errors in data
Identifying duplicates
Final thoughts on data cleaning

4. LABELING DATASETS

Introduction
Describing datasets
Labeling variables
Labeling values
Labeling utilities
Labeling variables and values in different languages
Adding comments to your dataset using notes
Formatting the display of variables
Changing the order of variables in a dataset

5. CREATING VARIABLES

Introduction
Creating and changing variables
Numeric expressions and functions
String expressions and functions
Recoding
Coding missing values
Dummy variables
Date variables
Date-and-time variables
Computations across variables
Computations across observations
More examples using the egen command
Converting string variables to numeric variables
Converting numeric variables to string variables
Renaming and ordering variables

6. COMBINING DATASETS

Introduction
Appending: Appending datasets
Appending: Problems
Merging: One-to-one match-merging
Merging: One-to-many match-merging
Merging: Merging multiple datasets
Merging: Update merges
Merging: Additional options when merging datasets
Merging: Problems merging datasets
Joining datasets
Crossing datasets

7. PROCESSING OBSERVATIONS ACROSS SUBGROUPS

Introduction
Obtaining separate results for subgroups
Computing values separately by subgroups
Computing values within subgroups: Subscripting observations
Computing values within subgroups: Computations across observations
Computing values within subgroups: Running sums
Computing values within subgroups: More examples
Comparing the by and tsset commands

8. CHANGING THE SHAPE OF YOUR DATA

Introduction
Wide and long datasets
Introduction to reshaping long to wide
Reshaping long to wide: Problems
Introduction to reshaping wide to long
Reshaping wide to long: Problems
Multilevel datasets
Collapsing datasets

9. PROGRAMMING FOR DATA MANAGEMENT

Introduction
Tips on long-term goals in data management
Executing do-files and making log files
Automating data checking
Combining do-files
Introducing Stata macros
Manipulating Stata macros
Repeating commands by looping over variables
Repeating commands by looping over numbers
Repeating commands by looping over anything
Accessing results saved from Stata commands
Saving results of estimation commands as data
Writing Stata programs

10. ADDITIONAL RESOURCES

Online resources for this book
Finding and installing additional programs
More online resources

A. COMMON ELEMENTS

Introduction
Overview of Stata syntax
Working across groups of observations with by
Comments
Data types
Logical expressions
Functions
Subsetting observations with if and in
Subsetting observations and variables with keep and drop
Missing values
Referring to variable lists

Author: Michael N. Mitchell
ISBN-13: 978-1-59718-076-4
Copyright: 2010
E-book version available