2019 № 3 Development of algorithm for searching of clinically homogeneous patients from semistructured text data of oncological electronic health record
The growing number of patients with malignant neoplasms in Russia significantly increases the load on the specialized network of oncological institutions and on oncologists, and this trend will most likely continue in the coming years. One way to improve the efficiency of medical care is to extract knowledge from medical data arrays with modern data analysis methods, by clustering patients into clinically homogeneous (similar) groups on the basis of their electronic health records. The aim of the study is to develop an algorithm for finding clinically homogeneous patients in the electronic health records of an oncological dispensary, with the possibility of subsequent integration into a clinical decision support system (CDSS). Using such a CDSS in practical medicine and in medical education will make it possible to analyze both semistructured and unstructured arrays of information, which will require further implementation and improvement of information systems at all levels of medical care. Patient homogeneity was determined with machine learning, using cosine distance in the space of vector representations of electronic health records. An experiment on 20 randomly selected electronic health records of patients of the Krasnodar Regional Oncological Dispensary showed that the algorithm is highly effective at forming clusters of clinically homogeneous patients.
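The idea of grouping records by cosine distance between their vector representations can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the vectors are built or which clustering procedure is used, so the bag-of-words vectors, the single-linkage grouping, the 0.5 threshold, the function names and the sample records below are all assumptions made for illustration. The sample texts are synthetic and are not patient data.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumed stand-in for the record vectors)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty record is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def group_records(records, threshold=0.5):
    """Single-linkage grouping: records closer than threshold are merged.

    Returns a sorted list of clusters, each a list of record indices.
    The threshold value is an illustrative assumption.
    """
    vectors = [vectorize(r) for r in records]
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_distance(vectors[i], vectors[j]) < threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Synthetic example records (hypothetical, not taken from the study).
records = [
    "breast carcinoma stage II mastectomy chemotherapy",
    "breast carcinoma stage III mastectomy radiotherapy",
    "lung adenocarcinoma stage IV chemotherapy",
    "lung adenocarcinoma stage III lobectomy",
]

print(group_records(records))
```

In this toy setting the two breast carcinoma records end up in one group and the two lung adenocarcinoma records in another. A real system would likely need a richer text representation and a threshold tuned against clinical judgment.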