Bone Age Estimation Through Hand X-Ray Analysis with Visual Transformer Model

Published in The International Conference on Medicine and Artificial Intelligence in Health, 2025

Abstract

Bone age estimation is essential in pediatric healthcare for assessing growth and diagnosing developmental disorders. Traditional methods, such as the Greulich-Pyle atlas, are time-consuming and prone to inter-observer variability. This study proposes an automated approach using a Vision Transformer (ViT) model to improve the accuracy and efficiency of bone age prediction from hand X-ray images. The model was trained on the public Atlas dataset, which includes 1,390 left-hand X-rays of individuals aged from infancy to 18 years. All images were resized to 512×512 pixels, and, to address data imbalance and enhance robustness, the dataset was augmented to 7,393 samples using transformations such as rotation, flipping, and brightness adjustment. The ViT model was trained with the Adam optimizer and a mean squared error (MSE) loss, and it achieved a mean absolute error (MAE) of 3.2 months. Predictions within ±3 months of the actual age were counted as accurate, giving a tolerance-based accuracy of 92%. This tolerance-based metric is intended to reflect clinical applicability. The ViT model outperformed conventional CNN approaches, demonstrating the strength of transformer-based architectures in capturing complex spatial patterns in medical images. These results support the use of ViT as a reliable and scalable tool for automated bone age assessment in pediatric care.
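
The two evaluation metrics named in the abstract, MAE and tolerance-based accuracy, can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: the function names, the sample values, and the choice to treat an error of exactly 3 months as accurate are all assumptions made for the example.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the same unit as the inputs (months here)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def tolerance_accuracy(y_true, y_pred, tolerance_months=3.0):
    """Fraction of predictions whose absolute error is within the tolerance.

    Assumption: the boundary is inclusive, so an error of exactly
    `tolerance_months` counts as accurate.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance_months)
    return hits / len(y_true)


# Made-up values for illustration only; these are not data from the study.
actual_months = [24.0, 60.0, 96.0, 132.0, 180.0]
predicted_months = [26.0, 59.0, 101.0, 130.0, 183.0]

mae = mean_absolute_error(actual_months, predicted_months)
acc = tolerance_accuracy(actual_months, predicted_months)
print(f"MAE: {mae:.1f} months, accuracy within 3 months: {acc:.0%}")
```

The two metrics answer different questions: MAE summarizes the average size of the error, while tolerance-based accuracy reports how often a prediction falls inside a fixed error band, so a single large miss affects the former more than the latter.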

Certificate

Poster

Recommended citation: Author(s). (2025). "Bone Age Estimation Through Hand X-Ray Analysis with Visual Transformer Model." The International Conference on Medicine and Artificial Intelligence in Health. https://www.researchgate.net/publication/396397607_Bone_Age_Estimation_Through_Hand_X-Ray_Analysis_with_Visual_Transformer_Model