Improving the Accuracy of House Price Prediction using Catboost Regression with Random Search Hyperparameter Tuning: A Comparative Analysis
DOI:
https://doi.org/10.26877/asset.v6i3.602
Keywords:
House price prediction, Catboost Regression, Hyperparameter tuning, Random search, King County dataset
Abstract
This study presents an approach to house price prediction that combines Catboost Regression with Random Search hyperparameter tuning. We applied these techniques to the King County dataset and compared the tuned model against a conventional linear regression baseline, evaluating both with R-squared and Mean Squared Error (MSE). The tuned Catboost model achieved a marked improvement in predictive accuracy over the baseline, demonstrating the efficacy of these techniques in real estate and property valuation.
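The workflow summarized in the abstract (a linear baseline, a nonlinear model tuned by random search, and evaluation with R-squared and MSE) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: it uses the Python standard library, synthetic data in place of the King County dataset, and a k-nearest-neighbour regressor as a stand-in for Catboost. All function names and the search space are hypothetical.

```python
import random

def mse(y_true, y_pred):
    """Mean Squared Error: average squared residual."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fit_linear(xs, ys):
    """Baseline: ordinary least squares on a single feature."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: y_bar + slope * (x - x_bar)

def fit_knn(xs, ys, k):
    """Stand-in nonlinear model (NOT Catboost): mean of the k nearest targets."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def random_search(param_space, train, valid, n_iter=10, seed=0):
    """Sample n_iter random configurations; keep the lowest validation MSE."""
    rng = random.Random(seed)
    (x_tr, y_tr), (x_va, y_va) = train, valid
    best_score, best_params, best_model = float("inf"), None, None
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        model = fit_knn(x_tr, y_tr, **params)
        score = mse(y_va, [model(x) for x in x_va])
        if score < best_score:
            best_score, best_params, best_model = score, params, model
    return best_params, best_model

def demo():
    # Synthetic nonlinear data (an assumption; not the King County dataset).
    xs = [i / 10 for i in range(-50, 51)]
    ys = [x * x for x in xs]
    train = (xs[0::2], ys[0::2])   # even-indexed points for fitting
    valid = (xs[1::2], ys[1::2])   # odd-indexed points for scoring
    baseline = fit_linear(*train)
    params, tuned = random_search({"k": [1, 3, 5, 9]}, train, valid)
    x_va, y_va = valid
    base_r2 = r_squared(y_va, [baseline(x) for x in x_va])
    tuned_r2 = r_squared(y_va, [tuned(x) for x in x_va])
    return base_r2, tuned_r2, params

if __name__ == "__main__":
    base_r2, tuned_r2, params = demo()
    print(f"baseline R^2={base_r2:.3f}  tuned R^2={tuned_r2:.3f}  params={params}")
```

For brevity the sketch scores the tuned model on the same validation split that guided the search, which gives an optimistic estimate. A real comparison would report metrics on a separate held-out test set or use cross-validation.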