Missingness in variables that define study eligibility criteria is a
seldom-addressed challenge in electronic health record (EHR)-based settings.
Patients with incomplete eligibility information are typically excluded from
analysis without consideration of the implicit assumptions being made,
leaving study conclusions subject to potential selection bias. In
an effort to ascertain eligibility for more patients, researchers may look back
further in time prior to study baseline; in using outdated values of
eligibility-defining covariates, they may inadvertently include individuals
who, unbeknownst to the researcher, fail to meet eligibility at baseline. To
the best of our knowledge, however, very little work has been done to mitigate
these concerns. We propose a robust and efficient estimator of the causal
average treatment effect on the treated, defined in the study-eligible
population, in cohort studies where eligibility-defining covariates are missing
at random. The approach facilitates the use of flexible machine-learning
strategies for component nuisance functions while maintaining appropriate
convergence rates for valid asymptotic inference. EHR data from Kaiser
Permanente are used as motivation as well as a basis for extensive simulations
that verify robustness properties under various degrees of model
misspecification. The same data are used to illustrate the method in a
comparison of two common bariatric surgical interventions with respect to
long-term weight and glycemic outcomes among a cohort of severely obese
patients with type II diabetes mellitus.
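
To make the target quantity concrete, the estimand described above can be sketched in symbols. The notation is an assumption introduced here for illustration and is not taken from the paper: $A$ is the binary treatment, $Y(a)$ the potential outcome under treatment $a$, $L_e$ the eligibility-defining covariates, $E = g(L_e)$ the indicator of meeting eligibility at baseline, $R$ the indicator that $L_e$ is fully observed, and $W$ the variables observed for every patient. The missing-at-random condition shown is one generic form; the paper's exact assumption may differ.

```latex
% Hypothetical notation, introduced for illustration only.
% Average treatment effect on the treated, in the study-eligible population:
\theta = \mathbb{E}\left[\, Y(1) - Y(0) \mid A = 1,\ E = 1 \,\right],
\qquad E = g(L_e).

% One generic missing-at-random condition for the eligibility-defining
% covariates: missingness depends only on always-observed variables W.
R \perp\!\!\!\perp L_e \mid W.
```

Under this formulation, excluding patients with $R = 0$ targets $\mathbb{E}[Y(1) - Y(0) \mid A = 1, E = 1, R = 1]$, which coincides with $\theta$ only under further assumptions, for instance that the average effect among treated, eligible patients is the same whether or not $L_e$ is observed.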
2504.16230v1