Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model

Researchers wishing to fit regression models to survival data have long faced the difficult task of choosing between the Cox model and a parametric survival model such as Weibull. Cox models can be fit using Stata’s stcox command, and parametric models are fit using streg, which offers five parametric forms in addition to Weibull. While the Cox model makes minimal assumptions about the form of the baseline hazard function, prediction of hazards and other related functions for a given set of covariates is hindered by this lack of assumptions; the resulting estimated curves are not smooth and do not possess information about what occurs between the observed failure times. Parametric models offer nice, smooth predictions by assuming a functional form of the hazard, but often the assumed form is too structured for use with real data, especially if there exist significant changes in the shape of the hazard over time.

This text is concerned with obtaining a compromise between Cox and parametric models that retains the desired features of both types of models. The book is aimed at researchers who are familiar with the basic concepts of survival analysis and with the stcox and streg commands in Stata. As such, it is an excellent complement to An Introduction to Survival Analysis Using Stata, by Cleves et al. (2010).

This book is written for Stata 12, but is fully compatible with Stata 11 as well.

Much of the text is dedicated to estimation with Royston–Parmar models using the stpm2 command, which is maintained by the authors and available from the Statistical Software Components (SSC) archive at http://www.repec.org. Royston–Parmar models are highly flexible alternatives to the exponential, Weibull, loglogistic, and lognormal models (fit using streg) that allow extension from proportional hazards to proportional odds and to scaled probit models. Additional flexibility is obtained by the use of restricted cubic spline functions as alternatives to the linear functions of log time used in standard models. The authors demonstrate fitting these models and graphing predicted hazards, cumulative hazards, and survival functions with real data from breast cancer and prostate cancer studies.
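
To make the spline idea concrete, the Python sketch below builds a restricted cubic spline basis in log time, the kind of function that replaces the single linear term in log time of a standard model such as the Weibull. It is an illustration only and not the stpm2 implementation; the function name rcs_basis and the knot values are hypothetical.

```python
import math

def rcs_basis(x, knots):
    """Restricted cubic spline basis at x for sorted knots.

    The first and last knots are the boundary knots. Returns
    [x, v_1, ..., v_m], one v_j per interior knot. Each v_j is zero
    below the lower boundary knot, cubic between the boundary knots,
    and linear above the upper boundary knot.
    """
    k_min, k_max = knots[0], knots[-1]

    def cube_plus(u):
        # Truncated cubic: u^3 when u > 0, otherwise 0.
        return max(u, 0.0) ** 3

    basis = [x]
    for k in knots[1:-1]:
        lam = (k_max - k) / (k_max - k_min)
        basis.append(cube_plus(x - k)
                     - lam * cube_plus(x - k_min)
                     - (1.0 - lam) * cube_plus(x - k_max))
    return basis

# Hypothetical knots at 0.1, 1, 3, and 10 years, placed on the log-time scale.
log_knots = [math.log(t) for t in (0.1, 1.0, 3.0, 10.0)]
row = rcs_basis(math.log(2.0), log_knots)  # basis values at t = 2 years
```

With no interior knots the basis reduces to the single linear term in log time, which is the sense in which the spline generalizes the standard models.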

After some introductory material on the motivation behind flexible parametric models and on working with survival data in Stata, the authors proceed by demonstrating that Cox models may instead be expressed as Poisson models by splitting the time scale at the observed failures. The Poisson-model expression allows extension by changing how the time scale is split and by introducing restricted cubic splines and fractional polynomials. Royston–Parmar models are then introduced, followed by material on model building and diagnostics for these models. Considerable attention is then given to time-dependent effects, how these may be modeled, and how to interpret the graphs of the predicted functions the models produce. This material is followed by a chapter on relative survival models such as those used for population-based cancer studies. This chapter is very thorough, relates well to the previous material, and is an ideal introduction for those new to the concepts of relative survival and excess mortality. The final chapter is devoted to advanced topics such as determining the number needed to treat (NNT), handling multiple-event data, and analyzing competing risks.
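
The time-splitting idea behind the Poisson approach can be sketched in a few lines of Python. The example splits each subject's follow-up at fixed cut points and estimates a constant hazard per interval as events divided by person-time, which is the estimate a Poisson model with interval indicators and a log person-time offset would give. The function names, cut points, and four-subject dataset are hypothetical, and the sketch stands in for neither stsplit nor the book's code.

```python
def split_follow_up(time, event, cuts):
    """Split one subject's follow-up (0, time] at the given cut points.

    Returns (interval_index, person_time, event_indicator) records; the
    event, if any, is assigned to the interval in which follow-up ends.
    """
    bounds = [0.0] + sorted(cuts) + [float("inf")]
    records = []
    for i in range(len(bounds) - 1):
        start, stop = bounds[i], bounds[i + 1]
        if time <= start:
            break  # follow-up ended before this interval began
        exposure = min(time, stop) - start
        died_here = 1 if (event and time <= stop) else 0
        records.append((i, exposure, died_here))
    return records

def interval_rates(subjects, cuts):
    """Piecewise-constant hazard estimates: events / person-time per interval."""
    n = len(cuts) + 1
    events = [0] * n
    person_time = [0.0] * n
    for time, event in subjects:
        for i, exposure, d in split_follow_up(time, event, cuts):
            events[i] += d
            person_time[i] += exposure
    return [d / y if y > 0 else float("nan")
            for d, y in zip(events, person_time)]

# Hypothetical data: (follow-up time in years, 1 = died, 0 = censored).
subjects = [(0.5, 1), (1.5, 0), (2.5, 1), (3.0, 0)]
rates = interval_rates(subjects, cuts=[1.0, 2.0])
```

Moving the cut points to the observed failure times gives the finest split, the case the text relates to the Cox model.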

List of Tables
List of Figures
Preface

1. THEORY AND PRACTICE

Goals
A brief review of the Cox proportional hazards model
Beyond the Cox model

Estimating the baseline hazard
The baseline hazard contains useful information
Advantages of smooth survival functions
Some requirements of a practical survival analysis
When the proportional-hazards assumption is breached

Why parametric models?

Smooth baseline hazard and survival functions
Time-dependent HRs
Modeling on different scales
Relative survival
Prediction out of sample
Multiple time scales

Why not standard parametric models?
A brief introduction to stpm2

Estimation (model fitting)
Postestimation facilities (prediction)

Basic relationships in survival analysis
Comparing models
The delta method
Ado-file resources
How our book is organized

2. USING STSET AND STSPLIT

What is the stset command?
Some key concepts
Syntax of the stset command
Variables created by the stset command
Examples of using stset

Standard survival data
Using the scale( ) option
Date of diagnosis and date of exit
Date of diagnosis and date of exit with the scale( ) option
Restricting the follow-up time
Left-truncation
Age as the time scale

The stsplit command

Time-dependent effects
Time-varying covariates

Conclusion

3. GRAPHICAL INTRODUCTION TO THE PRINCIPAL DATASETS

Introduction
Rotterdam breast cancer data
England and Wales breast cancer data
Orchiectomy data
Conclusion

4. POISSON MODELS

Introduction
Modeling rates with the Poisson distribution
Splitting the time scale

The piecewise exponential model
Time as just another covariate

Collapsing the data to speed up computation
Splitting at unique failure times

Technical note: Why the Cox and Poisson approaches are equivalent

Comparing a different number of intervals
Fine splitting of the time scale
Splines: Motivation and definition

Calculating splines
Restricted cubic splines
Splines: Application to the Rotterdam data
Varying the number of knots
Varying the location of the knots
Estimating the survival function

FPs: Motivation and definition

Application to Rotterdam data
Higher order FP models
FP function selection procedure

Discussion

5. ROYSTON–PARMAR MODELS

Motivation and introduction

The exponential distribution
The Weibull distribution
Generalizing the Weibull
Estimating the hazard function

Proportional hazards models

Generalizing the Weibull
Example
Comparing parameters of PH(1) and Weibull models

Selecting a spline function

Knot positions

Example

How many knots?

PO models

Introduction
The loglogistic model
Generalizing the loglogistic model
Comparing parameters of PO(1) and loglogistic models

Example

Probit models

Motivation
Generalizing the probit model
Comparing parameters of probit(1) and lognormal models
Comments on probit and PO models

Royston–Parmar (RP) models

Models with θ not equal to 0 or 1
Example
Likelihood function and parameter estimation
Comparing regression coefficients
Model selection
Sensitivity to number of knots
Sensitivity to location of knots

Concluding remarks

6. PROGNOSTIC MODELS

Introduction
Developing and reporting a prognostic model
What does the baseline hazard function mean?

Example

Model selection

Choice of scale and baseline complexity

Example

Selection of variables and functional forms

Example

Quantitative outputs from the model

Survival probabilities for individuals
Survival probabilities across the risk spectrum
Survival probabilities at given covariate values
Survival probabilities in groups
Plotting adjusted survival curves
Plotting differences between survival curves
Centiles of the survival distribution

Goodness of fit

Example

Discrimination and explained variation

Example
Harrell’s C index of concordance

Out-of-sample prediction: Concept and applications

Extrapolation of survival functions: Basic technique
Extrapolation of survival functions: Further investigations
Validation of prognostic models: Basics
Validation of prognostic models: Further comments

Visualization of survival times

Example

Discussion

7. TIME-DEPENDENT EFFECTS

Introduction
Definitions
What do we mean by a TD effect?
Proportional on which scale?
Poisson models with TD effects

Piecewise models
Using restricted cubic splines

RP models with TD effects

Piecewise HRs
Continuous TD effects
More than one TD effect
Stratification is the same as including TD effects

TD effects for continuous variables
Attained age as the time scale

The orchiectomy data
Proportional hazards model
TD model

Multiple time scales
Prognostic models with TD effects

Example

Discussion

8. RELATIVE SURVIVAL

Introduction
What is relative survival?
Excess mortality and relative survival

Excess mortality
Relative survival is a ratio

Motivating example
Life-table estimation of relative survival

Using strs

Poisson models for relative survival

Piecewise models
Restricted cubic splines

RP models for relative survival

Likelihood for relative survival models
Proportional cumulative excess hazards
RP models on other scales
Application to England and Wales breast cancer data
Relative survival models on other scales
Time-dependent effects

Some comments on model selection
Age as a continuous variable
Concluding remarks

9. FURTHER TOPICS

Introduction
Number needed to treat

Example

Average and adjusted survival curves

Renal data

Modeling distributions with RP models

Example 1: Rotterdam breast cancer data
Example 2: CD4 lymphocyte data
Example 3: Prostate cancer data

Multiple events

Introduction
The AG model
The WLW model
The PWP model
Multiple events in RP models
Summary

Bayesian RP models

Introduction
The “zeros trick” in WinBUGS
Fitting an RP model
Summary

Competing risks

Summary

Period analysis

Introduction
What is period analysis?
Application to England and Wales breast cancer data

Crude probability of death from relative survival models

Introduction
Application to England and Wales breast cancer data
Conclusion

Final remarks

Authors: Patrick Royston and Paul C. Lambert
ISBN-13: 978-1-59718-079-5
Copyright: 2011
e-Book version available
