The epidemiological characteristics of stroke phenotypes defined with ICD-10 and free-text: a cohort study linked to electronic health records

Source: medRxiv
Abstract

Background: Coded healthcare data may not capture all stroke cases and has limited accuracy for stroke subtypes. We sought to determine the incremental value of adding natural language processing (NLP) of free-text radiology reports to International Classification of Diseases (ICD-10) codes to phenotype stroke, and stroke subtypes, in routinely collected healthcare datasets.

Methods: We linked participants in a community-based prospective cohort study, Generation Scotland, to clinical brain imaging reports (2008-2020) from five Scottish health boards. We used five combinations of NLP outputs and ICD-10 codes to define stroke phenotypes. With these phenotype models we measured: stroke incidence standardised to a European Standardised Population; the adjusted hazard ratio (aHR) of baseline hypertension for later stroke; and the proportion of participants allocated stroke subtypes.

Results: Of 19,026 participants, over a mean follow-up of 10.2 years, 1938 had 3493 brain scans. Any stroke was identified in 534 participants: 319 with NLP alone, 59 with ICD-10 codes alone and 156 with both ICD-10 codes and an NLP report consistent with stroke. The stroke aHR for baseline hypertension was 1.47 (95% CI: 1.12-1.92) for NLP-defined stroke only; 1.57 (95% CI: 1.18-2.10) for ICD-10-defined stroke only; and 1.81 (95% CI: 1.20-2.72) for cases with ICD-10 stroke codes and NLP stroke phenotypes. The age-standardised incidence of stroke for these phenotype models was 1.35, 1.34, and 0.65 per 1000 person-years, respectively. The proportion of strokes not subtyped was 26% (57/215) using only ICD-10, 9% (42/467) using only NLP, and 12% (65/534) using both NLP and ICD-10.

Conclusions: Addition of NLP-derived phenotypes to ICD-10 stroke codes identified approximately 2.5 times more stroke cases and greatly increased the proportion with subtyping. The phenotype model using ICD-10 stroke codes and NLP stroke phenotypes had the strongest association with baseline hypertension. This information is relevant to large cohort studies and clinical trials that use routine electronic health records for outcome ascertainment.
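The case counts in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, the counts are copied from the abstract, and the denominators for the unsubtyped proportions are used as reported, not re-derived.

```python
# Counts copied from the abstract; all names here are illustrative.
nlp_only = 319   # stroke identified by NLP alone
icd_only = 59    # stroke identified by ICD-10 codes alone
both = 156       # stroke identified by both ICD-10 and NLP

any_stroke = nlp_only + icd_only + both   # 534 participants
icd_total = icd_only + both               # 215 participants with an ICD-10 stroke code

# Ratio behind the "approximately 2.5 times more stroke cases" statement.
case_ratio = any_stroke / icd_total       # about 2.48


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, to one decimal place."""
    return round(100 * numerator / denominator, 1)


# (not subtyped, denominator) pairs exactly as reported in the abstract.
unsubtyped = {
    "ICD-10 only": (57, 215),
    "NLP only": (42, 467),
    "NLP and ICD-10": (65, 534),
}

unsubtyped_pct = {model: percent(n, d) for model, (n, d) in unsubtyped.items()}

if __name__ == "__main__":
    print(f"Any stroke: {any_stroke}, ICD-10 coded: {icd_total}")
    print(f"Case ratio: {case_ratio:.2f}")
    for model, value in unsubtyped_pct.items():
        print(f"Not subtyped, {model}: {value}%")
```

The computed percentages are shown to one decimal place, so they will sit close to, but not always exactly on, the whole-number figures quoted in the abstract.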

Wu Honghan, Campbell Archie, Casey Arlene, Grover Claire, Alex Beatrice, Chalmers Fionna, Ball Emily, Rannikmae Kristiina, Adams Mark, Whiteley William N, Iveson Matthew, McIntosh Andrew M, Whalley Heather, Davidson Emma M

Wu Honghan: Institute of Health Informatics, University College London; The Alan Turing Institute
Campbell Archie: Centre for Genomic and Experimental Medicine, Institute of Genetics & Cancer, University of Edinburgh
Casey Arlene: Advanced Care Research Centre, Usher Institute, University of Edinburgh
Grover Claire: Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh
Alex Beatrice: School of Literatures, Languages and Cultures (LLC), University of Edinburgh; Edinburgh Futures Institute, University of Edinburgh, Edinburgh
Chalmers Fionna: Centre for Genomic and Experimental Medicine, Institute of Genetics & Cancer, University of Edinburgh; Health Data Research UK
Ball Emily: Centre for Clinical Brain Sciences, University of Edinburgh; Division of Psychiatry, University of Edinburgh
Rannikmae Kristiina: Centre for Medical Informatics, Usher Institute, University of Edinburgh
Adams Mark: Division of Psychiatry, University of Edinburgh
Whiteley William N: Centre for Clinical Brain Sciences, University of Edinburgh; Health Data Research UK; MRC Population Health Unit, University of Oxford
Iveson Matthew: Division of Psychiatry, University of Edinburgh
McIntosh Andrew M: Centre for Genomic and Experimental Medicine, Institute of Genetics & Cancer, University of Edinburgh; Division of Psychiatry, University of Edinburgh
Whalley Heather: Centre for Clinical Brain Sciences, University of Edinburgh; Centre for Genomic and Experimental Medicine, Institute of Genetics & Cancer, University of Edinburgh; Division of Psychiatry, University of Edinburgh
Davidson Emma M: Centre for Clinical Brain Sciences, University of Edinburgh

10.1101/2023.04.03.23288096

Medical research methods; Neurology and psychiatry; Clinical medicine

Radiology; Natural language processing; Brain imaging; Phenotyping; Radiology reports; Stroke; Electronic health records

Wu Honghan, Campbell Archie, Casey Arlene, Grover Claire, Alex Beatrice, Chalmers Fionna, Ball Emily, Rannikmae Kristiina, Adams Mark, Whiteley William N, Iveson Matthew, McIntosh Andrew M, Whalley Heather, Davidson Emma M. The epidemiological characteristics of stroke phenotypes defined with ICD-10 and free-text: a cohort study linked to electronic health records[EB/OL]. (2025-03-28)[2025-05-10]. https://www.medrxiv.org/content/10.1101/2023.04.03.23288096.
