COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS

Home Page
About
Submit A Journal
Submit A Conference
Submit Paper/Book
- Submit a Preprint
- Submit a Book
Publisher/Editor Panel
- Sign In/Sign Up

Kırıkkale Üniversitesi Tıp Fakültesi Dergisi
Cilt: 25 Sayı: 3
COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN ...

COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS

Authors : Ibrahim Sarbay, Göksu Bozdereli Berikol, Ibrahim Ulaş Özturan, Keith Grimes

Pages : 482-521

Doi:10.24938/kutfd.1369468

View : 109 | Download : 157

Publication Date : 2023-12-26

Article Type : Research

Abstract :Objective: Being publicly available, easy to use, and continuously evolving, next-generation chatbots have the potential to be used in triage, one of the most critical functions of an Emergency Department. The aim of this study was to assess the performance of Generative Pre-trained Transformer 4 (GPT-4), Bard and Claude during decision-making for Emergency Department triage. Material and Methods: This was a preliminary cross-sectional study conducted with 50 case scenarios. Emergency Medicine specialists determined the reference Emergency Severity Index triage category of each scenario. Subsequently, each case scenario was queried using three chatbots. Inconsistent classifications between the chatbots and references were defined as over-triage (false positive) or under-triage (false negative). The primary and secondary outcomes were the predictive performance of chatbots and the difference between them in predicting high acuity triage. Results: F1 Scores for GPT-4, Bard, and Claude for predicting Emergency Severity Index 1 and 2 were 0.899, 0.791, and 0.865 respectively. The ROC Curve of GPT-4 for high acuity predictions showed an area under the curve (AUC) of 0.911 (95% CI: 0,814-1; p<0.001), while Bard showed an AUC of 0.819 (95% CI: 0.692-0.945; p<0.001) and for Claude this was 0.881 (95% CI:0.768-0.994; p<0.001). Conclusion: GPT-4, in its current form, was able to detect high acuity Emergency Severity Index scores in our case set and had close agreement with Emergency Medicine specialists, followed by Claude, while Bard\'s agreement was relatively lower. GPT-4 and Claude provided better results than Bard in case management recommendations. We believe that studies evaluating the effectiveness and limitations of chatbots in triage are important because of their future potential.
Keywords : Yapay zekâ, tanı, triyaj

ORIGINAL ARTICLE URL

VIEW PAPER (PDF)

All Rights Reserved. İzmir Akademi Derneği
CopyRight © 2024