Classification of Movie Recommendation on Netflix Using Random Forest Algorithm

DOI:

https://doi.org/10.26877/asset.v6i3.676

Keywords:

Decision Tree, Feature Selection, Random Forest, Netflix, Recommendation

Abstract

Netflix is one of the most popular streaming platforms in the world, offering many movies and shows across various genres and production countries. Netflix runs its own recommendation system for subscribers, driven by its data and algorithms. This research compares two classification methods, Decision Tree and Random Forest, and builds a recommendation system based on a Netflix dataset. Feature importance is used to select relevant features, and the effect of n_estimators (the number of trees) on classification is examined. Random Forest with 50 trees achieved the best accuracy: 96.84% before feature selection and 96.92% after. Decision Tree reached 95.64% accuracy before feature selection and 96.07% after. The number of trees also affects the accuracy of Random Forest classification. Based on this comparison, Random Forest with 50 trees and feature selection provides the best accuracy and is used to predict recommendations of similar movies and shows.
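The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes scikit-learn (suggested by the n_estimators parameter name), uses synthetic data from make_classification in place of the Netflix dataset, and keeps features whose importance is at or above the mean, a threshold chosen here only for illustration. The accuracies it prints will not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): the Netflix dataset is not reproduced here.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def fit_and_score(model, train_features, test_features):
    """Fit on the training split and return accuracy on the test split."""
    model.fit(train_features, y_train)
    return accuracy_score(y_test, model.predict(test_features))


# Rank features with a forest's impurity-based importances and keep those
# at or above the mean importance (the threshold is an assumption).
ranker = RandomForestClassifier(n_estimators=50, random_state=0)
ranker.fit(X_train, y_train)
importances = ranker.feature_importances_
selected = [i for i, w in enumerate(importances) if w >= importances.mean()]

feature_sets = {
    "all features": list(range(X.shape[1])),
    "selected features": selected,
}

results = {}
for label, cols in feature_sets.items():
    train_part, test_part = X_train[:, cols], X_test[:, cols]
    tree = DecisionTreeClassifier(random_state=0)
    results[("decision tree", label)] = fit_and_score(tree, train_part, test_part)
    # Vary the number of trees to see how n_estimators affects accuracy.
    for n_trees in (10, 50, 100):
        forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
        name = f"random forest ({n_trees} trees)"
        results[(name, label)] = fit_and_score(forest, train_part, test_part)

for (model_name, label), acc in results.items():
    print(f"{model_name:<28} {label:<18} accuracy={acc:.4f}")
```

Fixing random_state keeps the runs repeatable, so differences between rows come from the model and the feature set rather than from a different random split.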

Author Biographies

Christy Atika Sari, University of Dian Nuswantoro

Eko Hari Rachmawanto, University of Dian Nuswantoro

Published

2024-07-27

Section

Articles