2Department of Radiology, Ankara Mamak State Hospital, Ankara-Türkiye
3Department of Radiology, Kırıkkale High Specialty Hospital, Kırıkkale-Türkiye
DOI: 10.5505/tjo.2026.4733

OBJECTIVE
To evaluate the diagnostic performance of eight current large language models (LLMs) in applying the RECIST 1.1 guidelines to oncologic treatment response assessment on imaging, and to compare their performance with that of board-certified radiologists. This study explores the potential of LLMs as supportive adjuncts in cancer follow-up imaging.
METHODS
In this observational cross-sectional study, 50 text-based and 30 case-based multiple-choice questions
derived from RECIST 1.1 were administered to eight LLMs, each queried with three different prompts, and
to two junior radiologists with seven years of experience. Responses were independently scored as correct
or incorrect, and non-parametric statistical analyses were performed to compare performance across groups.
RESULTS
LLMs demonstrated promising performance in text-based interpretation of RECIST 1.1, with only minor
variations between models. Claude 3.5 Sonnet achieved the highest accuracy, with 83.3% on case-based
and 90% on text-based questions. The remaining models also performed robustly, and case-based
performance did not differ significantly between LLMs and radiologists. Results were consistent across
the three prompts, with only minor variation.
CONCLUSION
LLMs show considerable potential for treatment response evaluation in oncologic imaging and, with
further validation, may serve as supportive adjuncts to radiologists in cancer follow-up workflows.