How to Handle Data Imbalance and Feature Selection Problems in CNN-Based Stock Price Forecasting

dc.authorscopusid: 57205617688
dc.authorscopusid: 22953804000
dc.authorwosid: Kiliç, Erdal/HJY-2853-2023
dc.authorwosid: Akşehir, Zinnet Duygu/GWU-7564-2022
dc.contributor.author: Aksehir, Zinnet Duygu
dc.contributor.author: Kilic, Erdal
dc.contributor.authorID: Kiliç, Erdal/0000-0003-1585-0991
dc.contributor.authorID: Akşehir, Zinnet Duygu/0000-0002-6834-6847
dc.date.accessioned: 2025-12-11T01:13:32Z
dc.date.issued: 2022
dc.department [en_US]: Ondokuz Mayıs Üniversitesi
dc.department-temp [en_US]: [Aksehir, Zinnet Duygu; Kilic, Erdal] Ondokuz Mayis Univ, Dept Comp Engn, TR-55139 Samsun, Turkey
dc.description [en_US]: Kiliç, Erdal/0000-0003-1585-0991; Akşehir, Zinnet Duygu/0000-0002-6834-6847
dc.description.abstract [en_US]: Stock market forecasting is a time series problem that aims to predict the future prices or price directions of an index or stock. Stock data are highly uncertain and influenced by many factors, so this goal is hard to achieve with traditional time series methods. In the literature, convolutional neural network (CNN) models have been applied to stock market forecasting with successful results. However, these models suffer from data imbalance caused by the labeling method and from feature selection problems. This study therefore proposes a new rule-based labeling algorithm and a new feature selection approach to address these issues. In addition, a CNN-based model that predicts the next day's trade action for stocks in the Dow 30 index was built to evaluate the effectiveness of the proposed labeling and feature selection approaches. Different image-based input variable sets were created from technical indicators and gold and oil price data to feed the CNN model. The prediction performance of the CNN models was compared with that of other studies in the literature. The experimental results showed that the CNN prediction model using the proposed feature selection and labeling approaches achieves 3-22% higher accuracy than the CNN-based models in other studies. The proposed labeling approach also handles the stock data imbalance problem more successfully than the data weighting approach of Chen and Huang. The proposed labeling algorithm reduced the ratio between the numbers of samples in the label classes from 15:1 to 1.8:1.
dc.description.woscitationindex: Science Citation Index Expanded
dc.identifier.doi: 10.1109/ACCESS.2022.3160797
dc.identifier.endpage [en_US]: 31305
dc.identifier.issn: 2169-3536
dc.identifier.scopus: 2-s2.0-85127081333
dc.identifier.scopusquality: Q1
dc.identifier.startpage [en_US]: 31297
dc.identifier.uri: https://doi.org/10.1109/ACCESS.2022.3160797
dc.identifier.uri: https://hdl.handle.net/20.500.12712/42137
dc.identifier.volume [en_US]: 10
dc.identifier.wos: WOS:000773269700001
dc.identifier.wosquality: Q2
dc.language.iso [en_US]: en
dc.publisher [en_US]: IEEE-Institute of Electrical and Electronics Engineers Inc
dc.relation.ispartof [en_US]: IEEE Access
dc.relation.publicationcategory [en_US]: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights [en_US]: info:eu-repo/semantics/openAccess
dc.subject [en_US]: Predictive Models
dc.subject [en_US]: Labeling
dc.subject [en_US]: Data Models
dc.subject [en_US]: Convolutional Neural Networks
dc.subject [en_US]: Biological System Modeling
dc.subject [en_US]: Stock Markets
dc.subject [en_US]: Forecasting
dc.subject [en_US]: CNN Model
dc.subject [en_US]: Feature Selection
dc.subject [en_US]: Stock Prediction
dc.title [en_US]: How to Handle Data Imbalance and Feature Selection Problems in CNN-Based Stock Price Forecasting
dc.type [en_US]: Article
dspace.entity.type: Publication
