What’s this about?

ICD-10 codes are the standard for reporting international morbidity and mortality figures. They are also used by many countries to code diagnosis information for healthcare encounters such as a visit to a doctor or an admission to the hospital. The codes can be found in many administrative datasets such as death certificates, hospital discharge records, and medical billing forms.

Data gathered from multiple sources may not be fully standardized, and they can contain reporting errors. In addition, there are so many codes that analyzing the data in a meaningful way is often impossible without first summarizing the information. icd10 is designed to address these common challenges with secondary data. Whether you want to add text descriptions to codes or create indicator variables, icd10 makes working with ICD-10 diagnosis codes easy.

Let’s see it work

We have 2010 mortality data for the United States—more than 2.4 million deaths.

use female agerc cause place using vital10.dta, clear
(US mortality data, 2010 -- CDC Vital Statistics)

describe 


Contains data from vital10.dta
  obs:     2,472,542                          US mortality data, 2010 -- CDC
                                                Vital Statistics
 vars:             4                          31 Mar 2015 13:46
 size:    32,143,046

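              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------
female          float   %9.0g      female     Decedent is female, female=1,
                                                male=0
place           byte    %8.0g      pod        Place of death and status
cause           str4    %9s                   Cause of death (ICD-10 code)
agerc           float   %14.0g     agerc      Age, Census recode
------------------------------------------------------------------------------
Sorted by: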

We want to identify all deaths due to respiratory illnesses. Any of 275 codes can currently be used to define a respiratory illness, far more than we would ever want to type! A plausible alternative is to use a lookup table, but definitions are often provided in terms of a range of codes, leaving you to type the codes at least once to create the lookup table anyway.

icd10 provides a straightforward and fast alternative. All respiratory diagnoses fall in the range of J10 to J98.9, so the only thing we need to do is type

icd10 generate resp = cause, range(J10/J989)

You do not need to provide separate ranges for category (3-character) and subcategory (4-character) codes because the range() option of icd10 treats category codes as the lowest value in a range.
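
The behavior described above can be tried on a handful of codes. This is a minimal sketch on a toy dataset rather than the mortality data; the six codes are chosen only for illustration, and the second command previews how an if qualifier restricts where the new variable is filled in.

```stata
* Toy data: codes stored without the period, as in the example dataset
clear
input str4 cause
"J09"
"J10"
"J100"
"J189"
"J989"
"I219"
end

* One range covers the 3-character category J10 and the
* 4-character subcategory codes up to J98.9
icd10 generate resp = cause, range(J10/J989)

* With an if qualifier, observations outside the subsample
* are left missing in the new variable rather than set to 0
icd10 generate pneumonia = cause if resp==1, range(J12/J189)

list, noobs
```

In the listing, J09 and I219 should show resp equal to 0 and a missing value for pneumonia, which is why a later tabulate of such an indicator counts only the restricted subsample.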

We may wish to further examine deaths from pneumonia. We want to add an indicator for a pneumonia cause of death, but only for those decedents who we already know have a respiratory diagnosis.

icd10 gen pneumonia = cause if resp==1, range(J12/J189)

tabulate pneumonia

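  pneumonia |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |    187,594       79.07       79.07
          1 |     49,660       20.93      100.00
------------+-----------------------------------
      Total |    237,254      100.00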

We see that about 21% of all deaths from respiratory illnesses in the US in 2010 were from pneumonia.

Of course, we can use icd10 for many other tasks, such as checking that codes are defined, adding WHO’s official descriptions of the codes to our dataset, standardizing formats, and more.
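
As a rough sketch of those tasks, the commands below assume the mortality data from above are still in memory. The new variable names (invalid, cause_std, cause_desc) are made up for illustration, and the formatting options available for icd10 clean differ across Stata releases, so check help icd10 for your version.

```stata
* Flag observations whose code is not a defined ICD-10 code
icd10 check cause, generate(invalid)

* Store a copy of the codes in a standardized format
icd10 clean cause, generate(cause_std)

* Attach the description of each code as a new string variable
icd10 generate cause_desc = cause, description
```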

For now, though, let’s look at a few of the many cases when data management with icd10 is useful.

It is useful if you need to identify populations with diseases or causes of deaths for reports. For example, we could create a summary dataset and then export it using export excel:

[Image: Excel spreadsheet containing the exported summary table]

We have further customized the format of this table using Excel cell formatting.
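
The exact commands behind that spreadsheet are not shown, so the following is only one plausible way to do it. It assumes the pneumonia indicator created earlier is in memory; the count variable deaths and the filename pneumonia_by_age_sex.xlsx are hypothetical.

```stata
* Work on a copy so the full mortality data are restored afterward
preserve

* One row per age group and sex, with the number of pneumonia deaths
contract agerc female if pneumonia==1, freq(deaths)

* Write the summary to an Excel file, using variable names as the header row
export excel using pneumonia_by_age_sex.xlsx, firstrow(variables) replace

restore
```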

It is useful if you want to create nicely labeled frequency plots. Using icd10 generate with the description and long options, combined with Stata’s commands to create and graph summary data, we can create graphs such as

[Figure: frequency plot labeled with ICD-10 code descriptions]
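
The commands used for the graph are not given, so this is a hedged sketch of one way to build a labeled frequency plot. It assumes the resp indicator from earlier is in memory; the variable names cause3, causedesc, and deaths and the choice to plot the ten most frequent categories are illustrative assumptions.

```stata
preserve
keep if resp==1

* Reduce each code to its 3-character category, then attach a text label
icd10 generate cause3 = cause, category
icd10 generate causedesc = cause3, description long

* Count deaths per category and keep the ten most frequent
contract causedesc, freq(deaths)
gsort -deaths
keep in 1/10

* Horizontal bars, ordered from most to fewest deaths
graph hbar (asis) deaths, over(causedesc, sort(deaths) descending) ///
    ytitle("Deaths")
restore
```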

It is useful if you are calculating basic epidemiological statistics. For example, we could create a summary dataset and add population information to calculate the number of deaths due to pneumonia by age and sex and then compare age-standardized rates by sex.

contract female agerc if pneumonia==1, freq(pneudeaths)
describe using as2010, short


Contains data                                 Census 2010 population (by age
                                                and sex)
(output omitted)

describe using 2010, short

Contains data                                 Census 2010 population by age
(output omitted)

merge 1:1 female agerc using as2010, nogenerate
(output omitted)

dstdize pneudeaths pop agerc, by(female) using(2010)
(2 observations excluded because of missing values)

-------------------------------------------------------------
-> female= 0
                  -----Unadjusted-----           Std.
                                Pop.  Stratum    Pop.
Stratum         Pop.   Cases   Dist.  Rate[s]  Dst[P]     s*P
-------------------------------------------------------------
0-4 yrs     10319427     143   0.068   0.0000   0.065  0.0000
5-14 yrs    20969500      27   0.138   0.0000   0.133  0.0000
15-24 yr    22317842      89   0.147   0.0000   0.141  0.0000
25-34 yr    20632091     206   0.136   0.0000   0.133  0.0000
35-44 yr    20435999     408   0.135   0.0000   0.133  0.0000
45-54 yr    22142359    1038   0.146   0.0000   0.146  0.0000
55-64 yr    17601148    2084   0.116   0.0001   0.118  0.0000
65-74 yr    10096519    3416   0.067   0.0003   0.070  0.0000
75-84 yr     5476762    6809   0.036   0.0012   0.042  0.0001
85+ yrs      1789679    9186   0.012   0.0051   0.018  0.0001
-------------------------------------------------------------
Totals:    151781326   23406         Adjusted Cases:  29474.7
                                         Crude Rate:   0.0002
                                      Adjusted Rate:   0.0002
                             95% Conf. Interval: [0.0002, 0.0002]
-------------------------------------------------------------
-> female= 1
                  -----Unadjusted-----           Std.
                                Pop.  Stratum    Pop.
Stratum         Pop.   Cases   Dist.  Rate[s]  Dst[P]     s*P
-------------------------------------------------------------
0-4 yrs      9881935     114   0.063   0.0000   0.065  0.0000
5-14 yrs    20056351      32   0.128   0.0000   0.133  0.0000
15-24 yr    21308500      61   0.136   0.0000   0.141  0.0000
25-34 yr    20431857     135   0.130   0.0000   0.133  0.0000
35-44 yr    20634607     313   0.131   0.0000   0.133  0.0000
45-54 yr    22864357     789   0.146   0.0000   0.146  0.0000
55-64 yr    18881581    1473   0.120   0.0001   0.118  0.0000
65-74 yr    11616910    2623   0.074   0.0002   0.070  0.0000
75-84 yr     7584360    6534   0.048   0.0009   0.042  0.0000
85+ yrs      3703754   14178   0.024   0.0038   0.018  0.0001
-------------------------------------------------------------
Totals:    156964212   26252         Adjusted Cases:  21813.8
                                         Crude Rate:   0.0002
                                      Adjusted Rate:   0.0001
                             95% Conf. Interval: [0.0001, 0.0001]

Summary of Study Populations:
   female           N      Crude   Adj_Rate      Confidence Interval
  -------------------------------------------------------------------
        0   151781326   0.000154   0.000194    [ 0.000192, 0.000197]
        1   156964212   0.000167   0.000139    [ 0.000137, 0.000141]
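
To make the s*P column concrete, here is a small by-hand version of direct standardization on made-up numbers. Each stratum-specific rate is weighted by that stratum's share of a standard population, and the weighted rates are summed within each group. The toy counts and the choice of the combined-sex population as the standard are assumptions for illustration only; the example above takes its standard from a separate dataset.

```stata
* Toy data: deaths and population by sex and two age groups
clear
input byte female byte agegrp long deaths long pop
0 1 10 1000
0 2 50  500
1 1  5 1200
1 2 40  800
end

* Standard population: both sexes combined within each age group
bysort agegrp: egen double stdpop = total(pop)
egen double stdtotal = total(pop)

* P = standard population share of the stratum; s = stratum-specific rate
generate double P  = stdpop/stdtotal
generate double sP = (deaths/pop)*P

* The adjusted rate is the sum of s*P over strata within each sex
collapse (sum) adjrate = sP, by(female)
list, noobs
```

Multiplying an adjusted rate by the group's total population gives the "Adjusted Cases" figure, which is how the two quantities in the output relate.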

Finally, it is useful if you want to create an indicator variable for analysis. For example, we may want to calculate and plot the marginal effect of age group on the probability of pneumonia as the cause of death, after controlling for whether the decedent is female.

logit pneumonia i.female i.agerc
(output omitted)

quietly margins agerc

marginsplot, title("Predictive Margins of Age with 95% CIs") ///
               xtitle(Age) xlabel(, angle(45)) ///
               ytitle("Pr(Pneumonia Death)")

[Figure: predictive margins of age with 95% CIs; x axis: Age, y axis: Pr(Pneumonia Death)]

In short, whether you simply want to verify that your data are valid or are using the codes as a step in a larger project, icd10 provides valuable tools for reporting and research.

The ICD-10 codes used in Stata are copyrighted by WHO. To see information about the copyright and updates to the codes, type

icd10 query


ICD-10 Version and Change Log

  License agreement
    ICD-10 codes used by permission of the World Health Organization (WHO), from: International
        Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10)
        2010 Edition. Vols. 1-3. Geneva, World Health Organization, 2011.
    See copyright icd10 for the ICD-10 copyright notification.

  Edition 2010, 2015 update
    Per the license agreement with WHO, "Official WHO Updates combined 1996-2012 Volume 1" was
        reviewed for potential changes scheduled for implementation on January 1, 2015.
    Between 2014 and 2015:
          0 codes added,   0 codes deleted,   0 code descriptions changed.

(output omitted)