Development and Optimization of a Comprehensive Data Framework for Cervical Cancer Diagnosis

  • Farouk Sadi, Maryam Abacha American University of Niger, Maradi, Niger Republic

Abstract

Cervical cancer is a major public health problem worldwide. Persistent infection with high-risk types of human papillomavirus (HPV) is the primary cause of the disease. Timely and accurate diagnosis is essential for clinical intervention, but socio-economic and health care system-related factors often preclude effective screening. Current screening methods such as Pap smears and HPV tests are imperfect, are not accessible in all regions, and can produce false-positive or false-negative results. We aggregate primary health care data with secondary datasets obtained from platform sources to build a robust data framework for cervical cancer diagnosis. Extending the dataset in this way is intended to improve the generalizability of machine learning diagnostic models, which often struggle with problems common in real-world health care data, such as missing values, outliers, and differences in feature scale. These problems are handled through systematic data cleansing, including outlier detection, missing-data imputation, and data normalization, so that predictive modeling can proceed with less risk of misleading results. Five classification algorithms (Logistic Regression, SVM, KNN, Random Forest, and Decision Trees) were evaluated on accuracy, precision, recall, and F1 score. The findings showed that Logistic Regression, SVM, and KNN had the best performance metrics, achieving an accuracy of 86%, while the Random Forest and Decision Tree models had lower recall. This work illustrates that an optimized data framework can improve the detection of cervical pathology through machine learning. Although significant progress was made, continued work on improving sensitivity and on investigating hybrid modeling approaches remains crucial.
Further studies should concentrate on improving the performance of the weaker models and on using more sophisticated data-analytical methods, so that screening strategies provide timely and accurate diagnoses. Based on the results of this study, we also recommend performing exploratory data analysis to discover relevant predictive features and using domain knowledge to narrow the features of interest.
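
The cleansing steps and evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the study's implementation: the abstract does not say which imputation, outlier, or scaling methods were used, so median imputation, IQR-based clipping, and min-max scaling are assumptions chosen for illustration, and the sample values, labels, and function names are hypothetical.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical feature column with one missing entry and one extreme value.
ages = [25, 31, None, 42, 38, 29, 120]
cleaned = min_max_scale(clip_iqr(impute_median(ages)))

# Hypothetical labels and predictions, for illustration only.
metrics = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
```

Recall is reported separately from accuracy because, in a screening setting, a missed positive case (a false negative) lowers recall even when overall accuracy stays high.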


Keywords: Cervical Cancer, Data Framework, Machine Learning, Diagnostic Model, Data Integration, Electronic Health Records (EHR), Human Papillomavirus (HPV)

Published
2024-12-31
How to Cite
SADI, Farouk. Development and Optimization of a Comprehensive Data Framework for Cervical Cancer Diagnosis. NIU Journal of Social Sciences, [S.l.], v. 10, n. 4, p. 229-236, dec. 2024. ISSN 3007-1690. Available at: <https://kampalajournals.ac.ug/ojs/index.php/niujoss/article/view/2068>. Date accessed: 13 apr. 2026. doi: https://doi.org/10.58709/niujss.v10i4.2068.