PROMPT ENGINEERING IN MEDICINE: HARNESSING THE POTENTIAL OF ARTIFICIAL INTELLIGENCE IN CLINICAL PRACTICE, EDUCATION, AND RESEARCH
Keywords:
Prompt engineering, large language models, clinical decision support, medical education, artificial intelligence in healthcare, evidence-based medicine, digital literacy.
Abstract
Large language models (LLMs) are transforming approaches to medical information processing, but the quality of their output depends critically on the structure of the input query. Prompt engineering, a methodology for the purposeful construction of text instructions for LLMs, is becoming an essential tool for physician-researchers, clinicians, and medical faculty. This article systematizes the basic principles of prompt engineering, including specifying a role, context, output format, and reasoning chain. Specific application scenarios are considered in three domains: scientific research (cohort data analysis, systematic literature review, manuscript editing), medical education (problem-based learning, clinical case and MCQ generation), and clinical practice (decision support, differential diagnosis, preoperative planning). Ethical considerations are discussed, including the risk of hallucinations, data privacy, and the essential role of expert verification. Future progress is linked to the development of digital literacy as a core competency of medical professionals.