How existing machine learning models for DDoS detection differ in performance and accuracy when applied to synthetic versus real-world network traffic datasets
DOI:
https://doi.org/10.11113/oiji2025.13n2.349

Keywords:
DDoS, Cloud, Security, Real-World Dataset, Synthetic Dataset

Abstract
Machine learning–based DDoS detection systems frequently report exceptionally high performance, often exceeding 98–99% accuracy. However, such results are predominantly derived from synthetic, laboratory-generated datasets that fail to capture the complexity, variability, and noise of real operational environments. This phenomenon is not unique to cybersecurity: similar patterns have been observed in applied health technologies such as remote blood pressure monitoring, where models trained on controlled clinical datasets show inflated performance yet struggle to generalize to real-world home monitoring conditions. This paper empirically demonstrates that multiple machine learning models achieve near-perfect performance when evaluated on controlled, laboratory-created DDoS datasets. On two widely adopted benchmark datasets, the evaluated models achieved accuracies close to 99%. However, when the same models were applied to a real-world dataset constructed from 28 months of unsolicited network traffic, accuracy declined to approximately 92%.
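The evaluation pattern the abstract describes can be sketched as follows. This is a minimal, hypothetical illustration only: neither dataset below is one of the paper's benchmarks or its 28-month real-world capture. It uses scikit-learn's `make_classification` as a stand-in, with low class overlap mimicking a clean laboratory dataset and higher label noise mimicking operational traffic, to show how the same model's measured accuracy depends on dataset difficulty.

```python
# Hypothetical sketch of the synthetic-vs-real evaluation gap described above.
# The datasets are toy stand-ins generated by scikit-learn, NOT the paper's data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def evaluate(label_noise: float, class_sep: float, seed: int = 0) -> float:
    """Train and test one classifier on a generated binary dataset.

    label_noise -- fraction of randomly flipped labels (noise proxy)
    class_sep   -- class separability (high = clean 'laboratory' data)
    """
    X, y = make_classification(n_samples=2000, n_features=20,
                               n_informative=10, flip_y=label_noise,
                               class_sep=class_sep, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))

# Clean, well-separated data: accuracy is near-perfect.
acc_clean = evaluate(label_noise=0.0, class_sep=2.0)
# Noisier, overlapping data: the same model scores noticeably lower.
acc_noisy = evaluate(label_noise=0.1, class_sep=0.5)
print(f"clean: {acc_clean:.3f}  noisy: {acc_noisy:.3f}")
```

The point of the sketch is methodological, not numerical: identical models and training procedures can yield very different accuracy figures depending purely on how closely the evaluation data resembles real traffic, which is why benchmark-only results should be read cautiously.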