Method for Extracting Data Elements from Chinese Electronic Medical Records
Purpose/Significance: Extracting data elements that comply with national standards from electronic medical records (EMRs) helps achieve fine-grained sharing of EMR data. Method/Process: This paper proposes a method for extracting data elements from Chinese EMRs. First, ALBERT, BiLSTM, and CRF models are used to perform sequence labeling on EMRs, and a set of candidate data elements is generated from the labeling results. Then, for each candidate data element, its contextual information is collected to form an enhanced key vector. Finally, the similarity between this vector and the standard vector is calculated to determine whether the candidate data element is valid. Result/Conclusion: The method achieves an F1 score of 90.32%, indicating good performance. Its limitations are the small size of the experimental dataset and the uneven distribution of data element types.
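The last two stages of the pipeline described in the abstract (turning labeling results into candidates, then validating each candidate by vector similarity) can be illustrated with a minimal sketch. It is not the paper's implementation. The BIO tag scheme, the `KEY`/`VAL` label names, the use of cosine similarity, and the `threshold` value are all assumptions made for illustration, since the abstract does not specify them. The vectors are toy values, not ALBERT outputs.

```python
import math


def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (label, text) candidate spans.

    Assumes a BIO scheme such as B-KEY / I-KEY / O (hypothetical labels).
    """
    spans, current_label, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts: flush any span that is still open.
            if current_tokens:
                spans.append((current_label, "".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_label:
            # Continuation of the open span.
            current_tokens.append(token)
        else:
            # "O" tag or an inconsistent I- tag: close the open span.
            if current_tokens:
                spans.append((current_label, "".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_tokens:
        spans.append((current_label, "".join(current_tokens)))
    return spans


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_valid_candidate(key_vector, standard_vector, threshold=0.8):
    """Accept a candidate when its key vector is close to the standard vector.

    The threshold of 0.8 is an illustrative assumption, not a value from the paper.
    """
    return cosine_similarity(key_vector, standard_vector) >= threshold


# Toy example: the characters of "体温36.5" (body temperature 36.5).
tokens = list("体温36.5")
tags = ["B-KEY", "I-KEY", "B-VAL", "I-VAL", "I-VAL", "I-VAL"]
candidates = decode_bio(tokens, tags)
```

In a full system, the key vector would presumably be built from model embeddings of the candidate and its context, and the standard vector from the corresponding standard data element; here both are left as plain lists of numbers.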
Guo Weijia (郭维嘉)
Medical research methods; computing technology and computer technology
Electronic medical record; Data element; ALBERT; Sequence labeling; Token vector
Guo Weijia. Method for Extracting Data Elements from Chinese Electronic Medical Records [EB/OL]. (2023-12-13)[2025-08-16]. https://www.biomedrxiv.org.cn/article/doi/bmr.202404.00038.