TURKISH JOURNAL OF ONCOLOGY 2026, Vol 41, Num 1
The Knowledge of Large Language Models Regarding Response Evaluation Criteria in Solid Tumors: A Comparative Study with Prompt Effect
Eren ÇAMUR1, Turay CESUR2, Yasin Celal GÜNEŞ3
1Department of Radiology, Ankara 29 Mayıs State Hospital, Ankara-Türkiye
2Department of Radiology, Ankara Mamak State Hospital, Ankara-Türkiye
3Department of Radiology, Kırıkkale High Specialty Hospital, Kırıkkale-Türkiye
DOI: 10.5505/tjo.2026.4733

OBJECTIVE
To evaluate the diagnostic performance of eight current large language models (LLMs) in applying the RECIST 1.1 (Response Evaluation Criteria in Solid Tumors, version 1.1) guideline to treatment response assessment in oncologic imaging, and to compare their performance with that of board-certified radiologists. The study also explores the potential of LLMs as supportive adjuncts in cancer follow-up imaging.

METHODS
In this cross-sectional observational study, 50 text-based and 30 case-based multiple-choice questions derived from RECIST 1.1 were administered to eight LLMs, each queried with three different prompts, and to two junior radiologists, each with seven years of experience. Responses were independently scored as correct or incorrect, and non-parametric statistical tests were used to compare performance across groups.
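The group comparison described above can be sketched as follows. This is a minimal illustration, assuming a Pearson chi-square test on a 2x2 table of correct/incorrect counts as one common non-parametric choice; the abstract does not name the specific test, and the counts below are hypothetical, not the study's data.

```python
def chi2_statistic(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square statistic for a 2x2 table of
    correct/incorrect answers from two responders."""
    table = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of equal accuracy.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical example: an LLM (25/30 correct) vs. a radiologist
# (27/30 correct) on the 30 case-based questions.
stat = chi2_statistic(25, 30, 27, 30)
# Compare against the 5% critical value for 1 degree of freedom (3.841);
# a statistic below it means no significant difference.
significant = stat > 3.841
```

With these illustrative counts the statistic stays well below the critical value, which is the kind of result behind a "no significant differences" finding; in practice a library routine such as `scipy.stats.chi2_contingency` would also return the p-value directly.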

RESULTS
The LLMs demonstrated promising performance on text-based interpretation of RECIST 1.1, with only minor variation between models. Claude 3.5 Sonnet performed best, achieving 90% accuracy on text-based and 83.3% on case-based questions. The remaining models also performed robustly, and case-based performance did not differ significantly between the LLMs and the radiologists. Accuracy was similar across the three prompts, with only minor variation.

CONCLUSION
LLMs show considerable potential for treatment response evaluation in oncologic imaging; beyond supporting radiologists, they may come to reshape clinical workflows in radiology.

Keywords: ChatGPT; cancer; large language models; response; treatment