Publication:
Datrel: A Noise-Tolerant Data Relocation Approach for Effective Synthetic Data Generation in Imbalanced Classifiers

dc.authorscopusid | 57194769905
dc.authorwosid | Sağlam, Fatih/Aaa-4146-2022
dc.contributor.author | Saglam, Fatih
dc.contributor.authorID | Sağlam, Fatih/0000-0002-2084-2008
dc.date.accessioned | 2025-12-11T01:08:34Z
dc.date.issued | 2025
dc.department | Ondokuz Mayıs Üniversitesi | en_US
dc.department-temp | [Saglam, Fatih] Ondokuz Mayis Univ, Fac Art & Sci, Dept Stat, Atakum, Samsun, Turkiye | en_US
dc.description | Sağlam, Fatih/0000-0002-2084-2008 | en_US
dc.description.abstract | Most machine learning algorithms tend to be biased towards the majority class when the class variable of a dataset has a skewed distribution. This is known as the class imbalance problem, and it is frequently encountered in real-life applications. One of the most prevalent methods for addressing class imbalance is data resampling, which generates or removes samples to balance the dataset. A well-known issue with oversampling is noise generation. Noise removal or hybrid resampling is used to deal with this noise; however, because these methods remove samples, they cause the imbalance to re-emerge. In this study, a data relocation approach named DatRel is proposed to address the noise generation problem of oversampling without causing imbalance. The proposed approach uses pure and proper class cover catch digraphs (P-CCCD) to determine dominant points and cover areas for the minority class. New samples produced by oversampling are then drawn toward the dominant points until they are covered. This process ensures that newly generated samples never overlap with a negative sample. The imbalance is not affected, since no sample is removed by undersampling. The proposed DatRel approach is applied to commonly used oversampling methods, namely SMOTE, ADASYN, and BLSMOTE. Moreover, the performance of DatRel is compared with that of noise filtering methods such as Tomeklink, ENN, NEATER, and NearMiss applied after SMOTE. Several baseline classification algorithms are employed, and comparisons are made using various metrics. Results on 49 imbalanced datasets show that DatRel improves classifier performance when combined with oversampling methods and compares favorably with other noise removal techniques in terms of AUC, BACC, F1, GMEAN, and MCC. | en_US
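The relocation idea described in the abstract can be illustrated with a minimal Python sketch. The sketch rests on several simplifying assumptions that are not the paper's exact P-CCCD construction:

- Each minority point gets a cover ball whose radius is its distance to the nearest majority point, so the open ball contains no majority point.
- Dominant points are chosen by a greedy set-cover heuristic.
- Each uncovered synthetic sample is pulled in a straight line toward the dominant point whose ball boundary is nearest, until the sample lies strictly inside that ball.

The function names `cover_radii`, `greedy_dominant_points` and `relocate`, as well as the shrink factor `alpha`, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def cover_radii(minority: Sequence[Point], majority: Sequence[Point]) -> List[float]:
    """Assumed pure cover: each radius is the distance to the nearest majority point.

    The open ball of that radius therefore contains no majority point.
    Assumes `majority` is non-empty.
    """
    return [min(math.dist(x, y) for y in majority) for x in minority]


def greedy_dominant_points(minority: Sequence[Point], radii: Sequence[float]) -> List[int]:
    """Hypothetical greedy set-cover stand-in for selecting dominant points."""

    def covers(i: int, j: int) -> bool:
        # Minority point j lies strictly inside the ball centred at minority point i.
        return math.dist(minority[i], minority[j]) < radii[i]

    uncovered = set(range(len(minority)))
    dominant: List[int] = []
    while uncovered:
        # Pick the ball that covers the most still-uncovered minority points.
        best = max(range(len(minority)), key=lambda i: sum(covers(i, j) for j in uncovered))
        newly = {j for j in uncovered if covers(best, j)}
        if not newly:
            # Only zero-radius points remain (they coincide with a majority point).
            break
        dominant.append(best)
        uncovered -= newly
    return dominant


def relocate(
    synthetic: Sequence[Point],
    minority: Sequence[Point],
    radii: Sequence[float],
    dominant: Sequence[int],
    alpha: float = 0.9,
) -> List[Point]:
    """Pull each uncovered synthetic sample toward a dominant point until it is covered.

    No sample is deleted, so the number of synthetic samples is unchanged.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    relocated: List[Point] = []
    for p in synthetic:
        if any(math.dist(p, minority[k]) < radii[k] for k in dominant):
            relocated.append(tuple(p))  # already inside a pure ball: keep as is
            continue
        # Assumption: use the dominant ball whose boundary is closest to p.
        k = min(dominant, key=lambda i: math.dist(p, minority[i]) - radii[i])
        c, r = minority[k], radii[k]
        # Here dist(p, c) >= r > 0, so the division is safe; the new point sits at alpha * r from c.
        scale = alpha * r / math.dist(p, c)
        relocated.append(tuple(ci + (pi - ci) * scale for pi, ci in zip(p, c)))
    return relocated


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0)]
    majority = [(3.0, 0.0), (0.0, 3.0)]
    radii = cover_radii(minority, majority)
    dominant = greedy_dominant_points(minority, radii)
    print(relocate([(4.0, 0.0), (0.5, 0.5)], minority, radii, dominant))
```

Consider a small example with minority points (0, 0) and (1, 0) and majority points (3, 0) and (0, 3). The sample at (4, 0) lies outside every cover ball, so it is moved along the segment toward (0, 0) and ends strictly inside that point's radius-3 ball. The sample at (0.5, 0.5) is already covered and stays where it is. Because samples are moved rather than removed, the class counts produced by the oversampler are preserved.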
dc.description.sponsorship | Scientific and Technological Research Council of Turkiye (TUBITAK) | en_US
dc.description.sponsorship | Open access funding provided by the Scientific and Technological Research Council of Turkiye (TUBITAK). This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. | en_US
dc.description.woscitationindex | Science Citation Index Expanded
dc.identifier.doi | 10.1007/s10994-025-06755-8
dc.identifier.issn | 0885-6125
dc.identifier.issn | 1573-0565
dc.identifier.issue | 5 | en_US
dc.identifier.scopus | 2-s2.0-105000871302
dc.identifier.scopusquality | Q1
dc.identifier.uri | https://doi.org/10.1007/s10994-025-06755-8
dc.identifier.uri | https://hdl.handle.net/20.500.12712/41576
dc.identifier.volume | 114 | en_US
dc.identifier.wos | WOS:001451819900002
dc.identifier.wosquality | Q2
dc.institutionauthor | Saglam, Fatih
dc.language.iso | en | en_US
dc.publisher | Springer | en_US
dc.relation.ispartof | Machine Learning | en_US
dc.relation.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member | en_US
dc.rights | info:eu-repo/semantics/openAccess | en_US
dc.subject | Data Relocation | en_US
dc.subject | Synthetic Data Generation | en_US
dc.subject | Class Imbalance | en_US
dc.subject | Oversampling | en_US
dc.subject | Noise-Tolerance | en_US
dc.title | Datrel: A Noise-Tolerant Data Relocation Approach for Effective Synthetic Data Generation in Imbalanced Classifiers | en_US
dc.type | Article | en_US
dspace.entity.type | Publication
