Classification of Movie Recommendation on Netflix Using Random Forest Algorithm
DOI: https://doi.org/10.26877/asset.v6i3.676
Keywords: Decision Tree, Feature Selection, Random Forest, Netflix, Recommendation
Abstract
Netflix is one of the most popular streaming platforms in the world, offering many movies and shows across various genres and production countries. Netflix has its own recommendation system, which suggests titles to subscribers based on their data and its algorithms. This research aims to compare two classification methods, Decision Tree and Random Forest, and to build a recommendation system based on a Netflix dataset. This paper uses feature importance to select relevant features and examines how n_estimators (the number of trees) affects the classification. In this research, Random Forest with 50 tree estimators has the best accuracy, 96.84% before feature selection and 96.92% after feature selection, compared with the Decision Tree classification. By contrast, Decision Tree reaches only 95.64% accuracy before feature selection, increasing to 96.07% after feature selection. The number of tree estimators also affects the accuracy of the Random Forest classification. After comparing the results, Random Forest with 50 tree estimators using feature selection provides the best accuracy, and it is used to predict recommendations of similar movies and shows.
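The workflow summarised in this abstract can be illustrated with a minimal sketch. The steps are: train a Decision Tree and a Random Forest, rank features by importance, retrain on the selected features, and vary the number of trees. The sketch is dependency-free Python written under explicit assumptions. It is not the authors' code. The data are synthetic: six uniform columns, of which only columns 0 and 1 determine a binary label. These stand in for the encoded Netflix features, which the abstract does not list. The trees are CART-style with Gini impurity and a depth limit of 5, and the forest draws bootstrap samples with sqrt(p) candidate features per split. Features are kept when their impurity-based importance is at least the mean importance, a threshold the abstract does not specify. All function names (`fit_trees`, `run_experiment`, and so on) are hypothetical, and the accuracies it produces are not the paper's results.

```python
import math
import random
from collections import Counter


def gini(counts, n):
    """Gini impurity of a Counter of class counts covering n samples."""
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(X, y, features):
    """Best (gain, feature, threshold) for a 'value <= threshold' split."""
    n, parent = len(y), gini(Counter(y), len(y))
    best = (0.0, None, None)
    for f in features:
        order = sorted(range(n), key=lambda i: X[i][f])
        left, right = Counter(), Counter(y)
        for k in range(n - 1):  # move sorted samples to the left one at a time
            i = order[k]
            left[y[i]] += 1
            right[y[i]] -= 1
            a, b = X[i][f], X[order[k + 1]][f]
            if a == b:  # tied values cannot be separated
                continue
            nl, nr = k + 1, n - k - 1
            gain = parent - (nl * gini(left, nl) + nr * gini(right, nr)) / n
            if gain > best[0]:
                best = (gain, f, a)
    return best


def grow(X, y, depth, max_depth, n_feats, rng, imp):
    """CART-style tree: a node is a leaf label or (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    gain, f, t = best_split(X, y, rng.sample(range(len(X[0])), n_feats))
    if f is None:  # no split reduces impurity
        return majority
    imp[f] += gain * len(y)  # impurity decrease weighted by node size
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    kids = [grow([X[i] for i in s], [y[i] for i in s], depth + 1, max_depth, n_feats, rng, imp)
            for s in (left, right)]
    return (f, t, kids[0], kids[1])


def predict(node, row):
    while isinstance(node, tuple):  # descend until a leaf label is reached
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_trees(X, y, n_trees, random_forest=True, max_depth=5, seed=0):
    """random_forest=True: bootstrap rows and sqrt(p) candidate features per split.
    random_forest=False, n_trees=1: one plain decision tree on all rows and features."""
    rng, p = random.Random(seed), len(X[0])
    n_feats = max(1, int(math.sqrt(p))) if random_forest else p
    imp, trees = [0.0] * p, []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in y] if random_forest else list(range(len(y)))
        trees.append(grow([X[i] for i in idx], [y[i] for i in idx],
                          0, max_depth, n_feats, rng, imp))
    total = sum(imp) or 1.0
    return trees, [v / total for v in imp]  # normalised impurity-based importances


def accuracy(trees, X, y):
    votes = [Counter(predict(t, row) for t in trees).most_common(1)[0][0] for row in X]
    return sum(v == label for v, label in zip(votes, y)) / len(y)


def make_data(n, rng, p=6):
    # ASSUMPTION: synthetic stand-in for encoded title features; only columns 0 and 1 matter.
    X = [[rng.random() for _ in range(p)] for _ in range(n)]
    return X, [int(r[0] + 0.5 * r[1] > 0.75) for r in X]


def run_experiment(seed=0):
    rng = random.Random(seed)
    (X_tr, y_tr), (X_te, y_te) = make_data(240, rng), make_data(120, rng)
    forest, importances = fit_trees(X_tr, y_tr, 50)
    # ASSUMPTION: keep features whose importance is at least the mean importance.
    keep = [j for j, v in enumerate(importances) if v >= 1 / len(importances)]
    S_tr, S_te = ([[r[j] for j in keep] for r in rows] for rows in (X_tr, X_te))
    results = {
        "dt_before": accuracy(fit_trees(X_tr, y_tr, 1, random_forest=False)[0], X_te, y_te),
        "rf_before": accuracy(forest, X_te, y_te),
        "dt_after": accuracy(fit_trees(S_tr, y_tr, 1, random_forest=False)[0], S_te, y_te),
        "rf_after": accuracy(fit_trees(S_tr, y_tr, 50)[0], S_te, y_te),
        "n_estimators_sweep": {n: accuracy(fit_trees(X_tr, y_tr, n)[0], X_te, y_te)
                               for n in (1, 10, 50)},
    }
    return results, importances, keep


if __name__ == "__main__":
    print(run_experiment())
```

Running the script prints the before/after accuracies for both models, the accuracy for 1, 10 and 50 trees, the normalised importances, and the kept feature columns. If scikit-learn is available, the same steps correspond to `DecisionTreeClassifier`, `RandomForestClassifier(n_estimators=...)` and its `feature_importances_` attribute. They also correspond to `SelectFromModel`, whose default threshold for tree models is the mean importance. The hand-written version only keeps the sketch self-contained.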