[2026-02-27] Korea Develops Bilingual AI Model for Accurate Medical Record Analysis

The Korea National Institute of Health (NIH), under the Korea Disease Control and Prevention Agency, has announced the development of the nation’s first bilingual artificial intelligence (AI) language model for medical records. This initiative addresses the challenge of analyzing electronic medical records (EMRs) in Korea, where approximately 80% of records are unstructured and contain both Korean and English medical terminology. Traditional single-language AI models have struggled with accuracy due to this linguistic complexity. The new model leverages domain-adaptive pre-training and a bilingual vocabulary to improve understanding and analysis of mixed-language clinical texts.
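To make the mixed-language problem concrete, here is a minimal sketch that measures how much of a clinical note is written in Hangul versus Latin script. It is not part of the NIH model; the function name `script_shares` and the sample sentence are hypothetical and invented purely for illustration.

```python
from typing import Dict


def script_shares(text: str) -> Dict[str, float]:
    """Return the fraction of letters that are Hangul syllables vs. ASCII Latin.

    Digits, punctuation, and whitespace are ignored.
    """
    hangul = 0
    latin = 0
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3:  # precomposed Hangul syllable block
            hangul += 1
        elif ch.isascii() and ch.isalpha():  # A-Z, a-z
            latin += 1
    total = hangul + latin
    if total == 0:
        return {"hangul": 0.0, "latin": 0.0}
    return {"hangul": hangul / total, "latin": latin / total}


# Hypothetical report line (not taken from any real record).
sample = "우측 폐 nodule 관찰됨"
shares = script_shares(sample)
print(shares)
```

A note like this one mixes both scripts within a single sentence, which is the situation a single-language vocabulary handles poorly.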

The bilingual AI model was developed in collaboration with Korea University College of Medicine, in a team led by Professor Hyungjun Joo. It is designed to analyze both Korean and English content within EMRs, with a specific focus on accurately classifying chest CT reports. In validation tests, the model achieved an overall accuracy score of 0.94, a level considered suitable for clinical application. The work is part of the ‘AI Algorithm Technology Development Project for Unstructured Medical Data Analysis,’ which supports researchers and healthcare institutions in making use of complex medical data.

Implementation of the bilingual AI model involved constructing a specialized vocabulary of 45,000 terms and integrating balanced Korean (54.4%) and English (33.8%) datasets. The model underwent additional domain-specific pre-training using existing models such as KM-BERT and BioBERT. Performance evaluation showed significant improvements in disease classification, with F1 scores for cancer and pneumonia reaching up to 0.9802 and 0.9560, respectively. The model is scheduled for public release on GitHub, enabling broader access for researchers and healthcare providers.
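For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below shows the standard calculation from confusion counts; the counts used are hypothetical and are not taken from the NIH evaluation.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Compute F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one disease label (illustration only).
example_f1 = f1_score(tp=90, fp=5, fn=10)
print(round(example_f1, 4))
```

Because F1 penalizes both missed cases and false alarms, it is a common choice for reporting per-disease classification results such as those above.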

Frequently asked questions:

What makes this model unique? It is the first in Korea to accurately analyze bilingual medical records, addressing the limitations of previous single-language models.

How will this impact clinical practice? The model’s high accuracy in classifying unstructured CT reports supports more precise and systematic use of EMR data.

When will the model be available? The NIH plans to release it on GitHub for public use, supporting ongoing research and healthcare innovation.



🎯 metaqsol opinion:
The introduction of a bilingual AI model tailored to Korea’s clinical environment is a notable step forward in medical data analysis. By directly addressing the challenge of mixed-language EMRs, the model’s high accuracy supports its potential for real-world clinical use. The public release should encourage further research and innovation, benefiting both healthcare providers and the broader AI community. This development is expected to improve the systematic use of medical data and strengthen Korea’s capabilities in healthcare AI.
