Background
Diagnosing jaw deformities requires precise interpretation of cephalometric measurements, but current diagnostic methods often lack accessibility and standardization, particularly for less experienced clinicians. This study examines the application of Large Language Models (LLMs) to diagnosing jaw deformities, aiming to overcome the limitations of existing diagnostic methods by using LLMs to interpret measurement data. The goal is to provide tools that simplify complex data analysis and make diagnostic processes more accessible and intuitive for clinical practitioners.
Methods
An experiment was conducted on patients with jaw deformities, in which cephalometric measurements (SNB Angle, Facial Angle, Mandibular Unit Length) were converted into text for LLM analysis. Multiple LLMs, including LLAMA-2 variants, GPT 3.5/4, Gemini-Pro, Qwen-2.5 variants, and DeepSeek-R1-Distilled reasoning models, were compared with baseline methods (threshold-based rules and machine learning models) using balanced accuracy and F1-score. Ablation studies varied the text segmentation and the few-shot demonstration selection strategy. A structured error analysis schema, applied by an independent LLM evaluator, assessed guideline compliance, reasoning quality, and educational value.
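As a rough illustration of the pipeline described above, the following Python sketch turns numerical measurements into text descriptors and computes the two reported metrics. The reference ranges, the wording of the descriptors, and the function names are assumptions made for this example; they are not the values or the implementation used in the study.

```python
# Hypothetical reference ranges (low, high). Illustrative only,
# NOT the thresholds used in the study.
REFERENCE_RANGES = {
    "SNB Angle": (76.0, 82.0),
    "Facial Angle": (82.0, 95.0),
    "Mandibular Unit Length": (110.0, 130.0),
}


def describe(name, value):
    """Map one numerical measurement to a short text descriptor."""
    low, high = REFERENCE_RANGES[name]
    if value < low:
        level = "below the reference range"
    elif value > high:
        level = "above the reference range"
    else:
        level = "within the reference range"
    return f"{name} is {level}"


def to_prompt_text(measurements):
    """Join the descriptors of all measurements into one sentence for a prompt."""
    return "; ".join(describe(n, v) for n, v in measurements.items()) + "."


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def f1_score(y_true, y_pred, positive):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    print(to_prompt_text({"SNB Angle": 74.0, "Facial Angle": 88.0}))
```

Balanced accuracy averages recall across classes, so a classifier cannot score well simply by favoring the most common diagnosis.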
Findings
LLAMA-2–13B and GPT-4 achieved accuracy comparable to traditional methods (63–68% vs. 62–64%), and text descriptors improved performance over numerical input. Selecting few-shot demonstrations by conditional entropy minimization boosted accuracy by 5%. Larger models eliminated ambiguous classifications, while reasoning models (DeepSeek-R1-Distilled) scored highest in educational value but risked overelaboration.
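One possible reading of demonstration selection by conditional entropy minimization is sketched below: among several candidate sets of few-shot demonstrations, choose the set under which the model's predicted class distributions on held-out queries have the lowest average entropy. The data layout, the function names, and the assumption that class probabilities are available from the model are all hypothetical; the study's actual procedure may differ.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_conditional_entropy(predictions):
    """Average entropy of the predicted distributions over all queries."""
    return sum(entropy(p) for p in predictions) / len(predictions)


def select_demonstrations(candidates):
    """Return the name of the candidate demonstration set with the lowest
    mean predictive entropy.

    `candidates` maps a (hypothetical) demonstration-set name to the list of
    class-probability distributions the model produced on held-out queries
    when prompted with that set.
    """
    return min(candidates, key=lambda name: mean_conditional_entropy(candidates[name]))


if __name__ == "__main__":
    # Made-up probabilities for illustration; not results from the study.
    candidates = {
        "set_A": [[0.5, 0.5], [0.6, 0.4]],
        "set_B": [[0.9, 0.1], [0.8, 0.2]],
    }
    print(select_demonstrations(candidates))
```

Low entropy only indicates that the model is confident, not that it is correct, so a selection rule of this kind would still need to be validated against labeled cases.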
Interpretation
Integrating LLMs into the diagnosis of jaw deformities marks a significant advancement in making diagnostic processes more accessible and reducing reliance on specialized training. These models serve as valuable auxiliary tools, offering clear, understandable outputs that facilitate easier decision-making for clinicians, particularly those with less experience or in settings with limited access to specialized expertise. Future refinements and adaptations to include more comprehensive and medically specific datasets are expected to enhance the precision and utility of LLMs, potentially transforming the landscape of medical diagnostics.
Reference
J. Lee, X. Xu, D. Kim, H. Deng, T. Kuang, N. Lampen, X. Fang, J. Gateno, P. Yan, "Optimizing In-Context Learning for Large Language Models to Diagnose Mandibular Deformities," Informatics and Health, vol. 3, issue 1, pp. 92-100, March 2026.


