Optimizing Machine Learning Models for Cardiovascular Drug Candidate Screening with XGBoost and Successive Halving

Authors

  • Teuku Rizky Noviandy, Department of Information Systems, Faculty of Engineering, Universitas Abulyatama, Aceh Besar 23372, Indonesia
  • Ryan Setiawan, Department of Information Systems, Faculty of Engineering, Universitas Abulyatama, Aceh Besar 23372, Indonesia
  • Rahmat Sufri, Department of Information Systems, Faculty of Engineering, Universitas Abulyatama, Aceh Besar 23372, Indonesia
  • Anisah Anisah, Department of Information Systems, Faculty of Engineering, Universitas Abulyatama, Aceh Besar 23372, Indonesia

DOI:

https://doi.org/10.11113/oiji2025.13n2.339

Keywords:

Classification, Drug discovery, Hyperparameter optimization, QSAR modeling, Virtual screening

Abstract

Thrombin, a key enzyme in the blood coagulation process, is an important target in the development of anticoagulant therapies. This study proposes an efficient and accurate quantitative structure–activity relationship (QSAR) modeling framework that combines the XGBoost algorithm with the Successive Halving (SH) method for hyperparameter optimization. A dataset of 3,145 compounds was collected from the ChEMBL database, and molecular descriptors were generated using the Mordred calculator. After preprocessing and feature selection, the SH-tuned XGBoost model achieved the highest performance, with an accuracy of 85.69%, an F1-score of 80.93%, and a ROC-AUC of 0.917. Compared to the baseline and Random Search-tuned models, the SH-tuned model achieved higher predictive accuracy and required substantially less training time. Confusion matrix analysis confirmed the model’s strong sensitivity and balanced classification of active and inactive compounds. These results illustrate the effectiveness of combining gradient boosting with efficient hyperparameter optimization for virtual screening. The proposed framework reduces computational cost without compromising model quality and can support early-stage drug discovery. Future extensions may include regression modeling, expanded datasets, and integration of biological or pharmacokinetic data to further improve performance and applicability.
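
To make the idea of Successive Halving concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a budget measured in boosting rounds and an elimination factor `eta` of 3; the function `toy_score` is a hypothetical stand-in for training an XGBoost classifier on the descriptor data and returning a validation score, and the hyperparameter names and ranges are illustrative assumptions only.

```python
import math
import random


def toy_score(config, budget):
    """Hypothetical stand-in for 'train with `budget` boosting rounds, return a validation score'.

    This is NOT a real model: the score simply peaks near learning_rate=0.1 and
    max_depth=6, and grows with the budget.
    """
    quality = (
        1.0
        - 0.3 * abs(math.log10(config["learning_rate"]) + 1.0)
        - 0.05 * abs(config["max_depth"] - 6)
    )
    return (1.0 - math.exp(-budget / 50.0)) * quality


def successive_halving(configs, evaluate, min_budget, max_budget, eta=3):
    """Keep the top 1/eta of the configurations each round while multiplying the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every remaining configuration at the current budget (higher is better).
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget = min(max_budget, budget * eta)
    return survivors[0]


# Sample 27 random configurations (assumed search space, for illustration only).
rng = random.Random(0)
candidates = [
    {"learning_rate": 10 ** rng.uniform(-3, 0), "max_depth": rng.randint(2, 10)}
    for _ in range(27)
]

# 27 -> 9 -> 3 -> 1 configurations, evaluated at budgets 10, 30, and 90.
best = successive_halving(candidates, toy_score, min_budget=10, max_budget=100)
print(best)
```

The saving comes from the schedule: most configurations are evaluated only at the smallest budget, and only the few survivors receive the larger budgets, whereas a plain random search would train every configuration at full budget.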

Published

2025-12-26

How to Cite

Noviandy, T. R., Setiawan, R., Sufri, R., & Anisah, A. (2025). Optimizing Machine Learning Models for Cardiovascular Drug Candidate Screening with XGBoost and Successive Halving. Open International Journal of Informatics, 13(2), 41–50. https://doi.org/10.11113/oiji2025.13n2.339