- Kırıkkale Üniversitesi Tıp Fakültesi Dergisi
- Volume: 25 Issue: 3
COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS
Authors : Ibrahim Sarbay, Göksu Bozdereli Berikol, Ibrahim Ulaş Özturan, Keith Grimes
Pages : 482-521
DOI : 10.24938/kutfd.1369468
Publication Date : 2023-12-26
Article Type : Research
Abstract : Objective: Being publicly available, easy to use, and continuously evolving, next-generation chatbots have the potential to be used in triage, one of the most critical functions of an Emergency Department. The aim of this study was to assess the performance of Generative Pre-trained Transformer 4 (GPT-4), Bard, and Claude in decision-making for Emergency Department triage. Material and Methods: This was a preliminary cross-sectional study conducted with 50 case scenarios. Emergency Medicine specialists determined the reference Emergency Severity Index triage category of each scenario. Subsequently, each case scenario was submitted to the three chatbots. Classifications that disagreed with the reference were defined as over-triage (false positive) or under-triage (false negative). The primary outcome was the predictive performance of the chatbots, and the secondary outcome was the difference between them in predicting high acuity triage. Results: The F1 scores of GPT-4, Bard, and Claude for predicting Emergency Severity Index 1 and 2 were 0.899, 0.791, and 0.865, respectively. The ROC curve of GPT-4 for high acuity predictions showed an area under the curve (AUC) of 0.911 (95% CI: 0.814-1; p<0.001), while Bard showed an AUC of 0.819 (95% CI: 0.692-0.945; p<0.001) and Claude an AUC of 0.881 (95% CI: 0.768-0.994; p<0.001). Conclusion: GPT-4, in its current form, was able to detect high acuity Emergency Severity Index scores in our case set and had close agreement with Emergency Medicine specialists, followed by Claude, while Bard's agreement was relatively lower. GPT-4 and Claude provided better results than Bard in case management recommendations. We believe that studies evaluating the effectiveness and limitations of chatbots in triage are important because of their future potential.
Keywords : Artificial intelligence, diagnosis, triage
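The over-triage and under-triage definitions and the F1 score reported for high acuity (Emergency Severity Index 1 and 2) can be illustrated with a minimal sketch. It assumes a binary reading of the outcome (ESI 1-2 versus ESI 3-5), which the abstract does not spell out. The function names and the sample ESI values are hypothetical and are not data from the study.

```python
# Minimal sketch, not the authors' analysis code.
# Assumption: "high acuity" means ESI level 1 or 2, everything else is low acuity.

def is_high_acuity(esi_level):
    """Return True for ESI 1 or 2 (hypothetical helper)."""
    return esi_level in (1, 2)


def triage_agreement(reference, predicted):
    """Count over-/under-triage and compute F1 for high acuity prediction.

    Over-triage: the chatbot assigned a more urgent (lower) ESI than the reference.
    Under-triage: the chatbot assigned a less urgent (higher) ESI than the reference.
    """
    over = sum(1 for ref, pred in zip(reference, predicted) if pred < ref)
    under = sum(1 for ref, pred in zip(reference, predicted) if pred > ref)

    tp = fp = fn = 0
    for ref, pred in zip(reference, predicted):
        ref_high, pred_high = is_high_acuity(ref), is_high_acuity(pred)
        if ref_high and pred_high:
            tp += 1  # high acuity case correctly flagged
        elif pred_high and not ref_high:
            fp += 1  # low acuity case flagged as high acuity
        elif ref_high and not pred_high:
            fn += 1  # high acuity case missed

    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return {"over_triage": over, "under_triage": under, "f1_high_acuity": f1}


# Hypothetical ESI levels for eight scenarios (illustration only)
reference_esi = [1, 2, 2, 3, 4, 5, 3, 2]
chatbot_esi = [1, 2, 3, 2, 4, 5, 3, 1]

result = triage_agreement(reference_esi, chatbot_esi)
print(result)
```

In this toy set, one scenario moves from low to high acuity and one from high to low, so the binary F1 is affected by both errors, while a shift within the high acuity band (ESI 2 to ESI 1) counts as over-triage without changing the F1 score.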