2026 STATA BIOSTATISTICS AND EPIDEMIOLOGY VIRTUAL SYMPOSIUM

AGENDA

10:00 a.m.  Estimating breast cancer incidence using multiple imputation with chained equations (MICE)

Anna Johansson, Karolinska Institutet

Breast cancer is not a single disease but comprises many different subtypes. When estimating breast cancer incidence in the population, we use routine registry data. Information on breast cancer subtype is sometimes missing in these registry data, and such missingness is more common in certain patient groups and thus not random. Hence, it is appropriate to use multiple imputation with chained equations (MICE) when estimating subtype-specific breast cancer incidence. I will give examples of how we have applied MICE to Swedish breast cancer data, the choices we made in building the imputation model (using mi impute), and the challenges in combining the imputed estimates using Rubin’s rules (using mi estimate).
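
For readers unfamiliar with Rubin’s rules, the pooling arithmetic can be sketched as follows. This is a minimal illustration in Python, not the speaker’s code and not the internals of mi estimate; the function name pool_rubin and the input values are hypothetical. The pooled estimate is the mean across imputations, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m).

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their within-imputation variances."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, one variance per estimate")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates and variances from m = 3 imputed datasets
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

The sketch returns only the point estimate and total variance; degrees of freedom and confidence intervals are left out for brevity.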

10:30 a.m.  Regression models for accuracy estimation

Niels Henrik Bruun, Aalborg University Hospital

I present regression-based methods for estimating and comparing diagnostic accuracy measures while addressing the STARD 2015 requirements. Key metrics include sensitivity, specificity, AUC, PPV, NPV, and accuracy. True-positive and false-positive rates, independent of prevalence, are estimated using OLS regression with robust variance. The derived measures, PPV, NPV, and accuracy, are computed from prevalence, sensitivity, and specificity using nonlinear formulas. For single-modality analysis, sensitivity and specificity are obtained by regressing test outcomes on the “true” values, such as those obtained from pathology. For multimodality studies on the same subjects, data are stacked with a modality indicator, and mixed-effects models with random intercepts are used to account for correlation. A new confreg command combines regression and nonlinear estimation to estimate accuracy metrics under dependency structures. These methods provide a flexible framework for robust comparisons of diagnostic performance across instruments.
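
The nonlinear step from prevalence, sensitivity, and specificity to the derived measures can be illustrated with the standard Bayes-rule identities. The Python sketch below is an assumption about which formulas are meant; it is not the confreg command, and the function name derived_accuracy and the example inputs are hypothetical.

```python
def derived_accuracy(sensitivity, specificity, prevalence):
    """Return (PPV, NPV, accuracy) from sensitivity, specificity, and prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    # Expected proportions of the population in each cell of the 2x2 table
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    # PPV and NPV are undefined (NaN) when no positive or negative tests are expected
    nan = float("nan")
    ppv = true_pos / (true_pos + false_pos) if true_pos + false_pos > 0 else nan
    npv = true_neg / (true_neg + false_neg) if true_neg + false_neg > 0 else nan
    accuracy = true_pos + true_neg
    return ppv, npv, accuracy


# Hypothetical inputs: sensitivity 0.9, specificity 0.8, prevalence 0.1
ppv, npv, accuracy = derived_accuracy(0.9, 0.8, 0.1)
print(ppv, npv, accuracy)
```

With these inputs the PPV is only about one third despite high sensitivity, which shows why the derived measures depend on prevalence while sensitivity and specificity do not.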

11:00 a.m.  Break

11:15 a.m.  wqsreg – A Stata command for weighted quantile sum regression

Marta Ponzano, Department of Life Sciences, Health and Health Professions, Link Campus University, Department of Health Sciences, University of Genoa

Additional authors:

Stefano Renzetti, Department of Medicine and Surgery, University of Parma

Andrea Bellavia, Department of Environmental Health, Harvard T.H. Chan School of Public Health, TIMI Study Group, Brigham and Women’s Hospital, Harvard Medical School

Weighted quantile sum (WQS) regression is a statistical method for quantifying the association between a set of possibly correlated predictors and a health outcome, estimating the joint effect of the predictors as well as their individual contributions to the total effect. We present wqsreg, to the best of our knowledge the first Stata command for WQS regression, implemented for continuous, binary, and count outcomes. The execution of the command involves two sequential steps: 1) estimating the weights and constructing the WQS index under specific constraints and 2) modeling its association with the outcome. wqsreg integrates several flexible components of the framework such as bootstrap, training/validation, and repeated holdout procedures; it returns regression estimates as well as graphical displays of the individual weights. wqsreg requires Stata version 11 or higher and is freely available on GitHub. We present an application of the command on exposome data exploring the association between 38 exposures and a continuous outcome while adjusting for a set of covariates. We anticipate that our contribution will further promote the use of appropriate statistical methods for handling multiple correlated predictors.

11:45 a.m.  Summarizing data from continuous glucose monitors using the cgmstats package

Natalie Daya Malek, Johns Hopkins University

The use of wearable continuous glucose monitors (CGMs) is growing rapidly. The latest generation of CGM systems does not require fingerstick calibration, is minimally invasive, and is frequently used in research studies. CGM sensors are typically worn for up to 2 weeks and record interstitial glucose measurements every minute to every 15 minutes, depending on the sensor used. CGM systems generate hundreds of measurements per day and thousands of measurements in one person over a single wear. There is a need for tools that allow researchers to efficiently organize and summarize the wealth of data on glucose patterns produced by CGM systems. We developed the cgmstats package, which generates CGM summary measures from a variety of CGM systems and allows the user to flexibly define ranges and generate data visualizations. We provide an overview of the cgmstats package and examples of its use. The cgmstats package supports rigorous and reproducible analyses of CGM data.
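
To make the idea of a range-based CGM summary measure concrete, the Python sketch below computes the percentage of readings below, within, and above a user-defined glucose range. It does not reproduce the interface or output of cgmstats; the function name time_in_ranges is hypothetical, the default bounds of 70 and 180 mg/dL are a commonly used target range chosen here as an assumption, and the readings are made up.

```python
def time_in_ranges(readings_mg_dl, low=70.0, high=180.0):
    """Return the percentage of readings below, within, and above [low, high]."""
    if not readings_mg_dl:
        raise ValueError("no glucose readings supplied")
    n = len(readings_mg_dl)
    below = sum(1 for g in readings_mg_dl if g < low)
    above = sum(1 for g in readings_mg_dl if g > high)
    within = n - below - above  # readings with low <= g <= high
    return {
        "below": 100.0 * below / n,
        "in_range": 100.0 * within / n,
        "above": 100.0 * above / n,
    }


# Hypothetical readings in mg/dL, for illustration only
summary = time_in_ranges([60, 100, 150, 200])
print(summary)
```

The sketch treats readings as equally spaced, so the share of readings stands in for the share of wear time; unevenly spaced or missing readings would need weighting.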

12:15 p.m.  Lunch

1:15 p.m.  Demographic estimation and projection methods using Stata: Mortality, fertility, and multistate population dynamics

Reliable demographic analysis in settings with incomplete or imperfect data requires flexible and transparent estimation and projection tools. This paper presents an integrated suite of Stata-based methods for estimating mortality and fertility and for projecting populations by age, sex, and additional characteristics. First, I revisit intercensal approaches to mortality estimation, including census-based, death distribution, and iterative methods, and introduce tools for constructing single-decrement life tables and estimating age-specific net migration using two population age distributions and intercensal deaths. Second, I describe an enhanced implementation of the own-children method for estimating age-specific fertility rates, providing graphical summaries of recent fertility patterns, weighted subgroup estimates, and a wide range of reproductive indicators derived from biological mother-child links. Third, I present a matrix-based projection framework for forecasting population dynamics under specified schedules of fertility, mortality, and migration, supporting one- and two-sex models as well as multistate classifications such as region, race, or health status. Empirical illustrations draw on census and register data from Vietnam, Brazil, and Sweden, demonstrating applicability across diverse demographic contexts. Together, these methods offer a coherent and extensible toolkit for demographic estimation and projection using standard data sources.
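
The core of a matrix-based projection can be sketched with a one-sex, Leslie-type step, in which each age group produces births according to a fertility schedule and survives into the next age group according to a survival schedule. The Python sketch below is a simplified illustration, not the speaker’s implementation: it ignores migration, the second sex, and multistate transitions, lets the oldest age group die out, and uses hypothetical function names and input values.

```python
def project_one_step(population, fertility, survival):
    """Advance an age-structured population by one projection interval."""
    if len(fertility) != len(population) or len(survival) != len(population) - 1:
        raise ValueError("need one fertility rate per age group and one survival "
                         "ratio per age group except the oldest")
    # Births enter the youngest age group
    births = sum(f * n for f, n in zip(fertility, population))
    # Survivors of each age group move up one group; the oldest group exits
    survivors = [s * n for s, n in zip(survival, population[:-1])]
    return [births] + survivors


def project(population, fertility, survival, steps):
    """Apply the one-step projection repeatedly under fixed schedules."""
    for _ in range(steps):
        population = project_one_step(population, fertility, survival)
    return population


# Hypothetical three-age-group population and schedules
start = [100.0, 80.0, 50.0]
fertility = [0.0, 1.0, 0.5]
survival = [0.9, 0.8]
print(project(start, fertility, survival, steps=2))
```

Writing the step as explicit sums rather than a matrix product keeps the sketch dependency-free; the same update is equivalent to multiplying the population vector by a matrix with fertility rates in the first row and survival ratios on the subdiagonal.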

2:00 p.m.  Modeling longitudinal core temperature in a crossover trial of farmworkers in California

Maria Montez Rath, Stanford University

3:00 p.m.  Adjourn