# An Introduction to Survival Analysis Using Stata

An Introduction to Survival Analysis Using Stata, Revised Third Edition is the ideal tutorial for professional data analysts who want to learn survival analysis for the first time or who are well versed in survival analysis but are not as dexterous in using Stata to analyze survival data. This text also serves as a valuable reference to those readers who already have experience using Stata’s survival analysis routines.

The revised third edition has been updated for Stata 14, and it includes a new section on predictive margins and marginal effects, which demonstrates how to obtain and visualize marginal predictions and marginal effects using the margins and marginsplot commands after survival regression models.

Survival analysis is a field of its own that requires specialized data management and analysis procedures. To meet this requirement, Stata provides the st family of commands for organizing and summarizing survival data.

This book provides statistical theory, step-by-step procedures for analyzing survival data, an in-depth usage guide for Stata’s most widely used st commands, and a collection of tips for using Stata to analyze survival data and to present the results. This book develops from first principles the statistical concepts unique to survival data and assumes only a knowledge of basic probability and statistics and a working knowledge of Stata.

The first three chapters of the text cover basic theoretical concepts: hazard functions, cumulative hazard functions, and their interpretations; survivor functions; hazard models; and a comparison of nonparametric, semiparametric, and parametric methodologies. Chapter 4 deals with censoring and truncation. The next three chapters cover the formatting, manipulation, stsetting, and error checking involved in preparing survival data for analysis using Stata’s st analysis commands. Chapter 8 covers nonparametric methods, including the Kaplan–Meier and Nelson–Aalen estimators and the various nonparametric tests for the equality of survival experience.
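As a rough illustration of the two nonparametric estimators named above, the following Python sketch computes the Kaplan–Meier survivor estimate and the Nelson–Aalen cumulative hazard estimate by hand for right-censored data. It is not taken from the book and does not use Stata; the function name `km_and_na` and the toy data are hypothetical, and the sketch assumes the usual convention that subjects censored at a failure time are still at risk at that time.

```python
from collections import Counter


def km_and_na(times, events):
    """Return (t, S_hat, H_hat) at each distinct failure time.

    times  : observed times (failure or right-censoring)
    events : 1 if the subject failed at that time, 0 if right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every subject leaves the risk set at its observed time
    at_risk = len(times)
    surv, cumhaz = 1.0, 0.0
    rows = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk  # Kaplan-Meier product-limit step
            cumhaz += d / at_risk      # Nelson-Aalen increment
            rows.append((t, surv, cumhaz))
        at_risk -= exits[t]            # remove failures and censored subjects
    return rows


# Hypothetical toy data: six subjects, two of them right-censored
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
for t, s, h in km_and_na(times, events):
    print(f"t={t}  S(t)={s:.4f}  H(t)={h:.4f}")
```

In Stata itself these quantities are reported by the st commands after the data have been stset; the sketch only makes the arithmetic behind the estimators explicit.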

Chapters 9–11 discuss Cox regression and include various examples of fitting a Cox model, obtaining predictions, interpreting results, building models, model diagnostics, and regression with survey data. The next four chapters cover parametric models, which are fit using Stata’s streg command. These chapters include detailed derivations of all six parametric models currently supported in Stata and methods for determining which model is appropriate, as well as information on stratification, obtaining predictions, and advanced topics such as frailty models. Chapter 16 is devoted to power and sample-size calculations for survival studies. The final chapter covers survival analysis in the presence of competing risks.
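For orientation, the model classes mentioned above are commonly written in the following standard textbook forms. The notation here is an assumption for illustration and may differ from the notation the book adopts.

```latex
% Cox proportional hazards model: the baseline hazard h_0(t) is left unspecified
h(t \mid x_j) = h_0(t)\,\exp(x_j \beta)

% Hazard ratio for a one-unit increase in covariate x_k, other covariates held fixed
\frac{h(t \mid x_k + 1)}{h(t \mid x_k)} = \exp(\beta_k)

% Accelerated failure-time form used by several parametric models,
% where the distribution of \epsilon_j determines the model
\ln t_j = x_j \beta + \epsilon_j
```

Parametric proportional hazards models keep the first form but give the baseline hazard a specific functional shape, which is what separates them from the semiparametric Cox model.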

List of tables
List of figures
Preface to the Revised Third Edition
Preface to the Third Edition
Preface to the Second Edition
Preface to the Revised Edition
Preface to the First Edition
Notation and typography

1. THE PROBLEM OF SURVIVAL ANALYSIS

Parametric modeling
Semiparametric modeling
Nonparametric analysis

2. DESCRIBING THE DISTRIBUTION OF FAILURE TIMES

The survivor and hazard functions
The quantile function
Interpreting the cumulative hazard and hazard rate

Interpreting the cumulative hazard
Interpreting the hazard rate

Means and medians

3. HAZARD MODELS

Parametric models
Semiparametric models
Analysis time (time at risk)

4. CENSORING AND TRUNCATION

Censoring

Right-censoring
Interval-censoring
Left-censoring

Truncation

Left-truncation (delayed entry)
Right-truncation
Gaps

5. RECORDING SURVIVAL DATA

The desired format
Other formats
Example: Wide-form snapshot data

6. USING STSET

A short lesson on dates
Purposes of the stset command
Syntax of the stset command

Specifying analysis time
Variables defined by stset
Specifying what constitutes failure
Specifying when subjects exit from the analysis
Specifying when subjects enter the analysis
Specifying the subject-ID variable
Specifying the begin-of-span variable
Convenience options

7. AFTER STSET

Look at stset’s output
Use stdescribe
Use stvary
Perhaps use stfill
Example: Hip-fracture data

8. NONPARAMETRIC ANALYSIS

The Kaplan–Meier estimator

Calculation
Censoring
Left-truncation (delayed entry)
Gaps
Relationship to the empirical distribution function
Other uses of sts list
Graphing the Kaplan–Meier estimate

The Nelson–Aalen estimator
Estimating the hazard function
Estimating mean and median survival times
Tests of hypothesis

The log-rank test
The Wilcoxon test
Other tests
Stratified tests

9. THE COX PROPORTIONAL HAZARDS MODEL

Using stcox

The Cox model has no intercept
Interpreting coefficients
The effect of units on coefficients
Estimating the baseline cumulative hazard and survivor functions
Estimating the baseline hazard function
The effect of units on the baseline functions

Likelihood calculations

No tied failures
Tied failures

The marginal calculation
The partial calculation
The Breslow approximation
The Efron approximation

Summary

Stratified analysis

Obtaining coefficient estimates
Obtaining estimates of baseline functions

Cox models with shared frailty

Parameter estimation
Obtaining estimates of baseline functions

Cox models with survey data

Declaring survey characteristics
Fitting a Cox model with survey data
Some caveats of analyzing survival data from complex survey designs

Cox model with missing data—multiple imputation

Imputing missing values
Multiple-imputation inference

10. MODEL BUILDING USING STCOX

Indicator variables
Categorical variables
Continuous variables

Fractional polynomials

Interactions
Time-varying variables

Using stcox, tvc() texp()
Using stsplit

Modeling group effects: fixed-effects, random-effects, stratification, and clustering

11. THE COX MODEL: DIAGNOSTICS

Testing the proportional-hazards assumption

Tests based on reestimation
Test based on Schoenfeld residuals
Graphical methods

Residuals and diagnostic measures

Reye’s syndrome data

Determining functional form
Goodness of fit
Outliers and influential points

12. PARAMETRIC MODELS

Motivation
Classes of parametric models

Parametric proportional hazards models
Accelerated failure-time models
Comparing the two parameterizations

13. A SURVEY OF PARAMETRIC REGRESSION MODELS IN STATA

The exponential model

Exponential regression in the PH metric
Exponential regression in the AFT metric

Weibull regression

Weibull regression in the PH metric

Fitting null models

Weibull regression in the AFT metric

Gompertz regression (PH metric)
Lognormal regression (AFT metric)
Loglogistic regression (AFT metric)
Generalized gamma regression (AFT metric)
Choosing among parametric models

Nested models
Nonnested models

14. POSTESTIMATION COMMANDS FOR PARAMETRIC MODELS

Use of predict after streg

Predicting the time of failure
Predicting the hazard and related functions
Calculating residuals

Using stcurve
Predictive margins and marginal effects

Predictive margins

Marginal mean survival time
Marginal survival probabilities
Multiple-record data

Marginal effects

15. GENERALIZING THE PARAMETRIC REGRESSION MODEL

Using the ancillary() option
Stratified models

Frailty models

Unshared-frailty models
Example: Kidney data
Testing for heterogeneity
Shared-frailty models

16. POWER AND SAMPLE-SIZE DETERMINATION FOR SURVIVAL ANALYSIS

Estimating sample size

Multiple-myeloma data
Comparing two survivor functions nonparametrically
Comparing two exponential survivor functions
Cox regression models

Accounting for withdrawal and accrual of subjects

The effect of withdrawal or loss to follow-up
The effect of accrual
Examples

Estimating power and effect size
Tabulating or graphing results

17. COMPETING RISKS

Cause-specific hazards
Cumulative incidence functions
Nonparametric analysis

Breast cancer data
Cause-specific hazards
Cumulative incidence functions

Semiparametric analysis

Cause-specific hazards

Simultaneous regressions for cause-specific hazards

Cumulative incidence functions

Using stcrreg
Using stcox

Parametric analysis

Author: Mario Cleves, William Gould and Yulia V. Marchenko
Edition: Revised Third Edition
ISBN-13: 978-1-59718-174-7