Development and Evaluation of an IndoBERT-Based NLP Model for Automated Clickbait Detection
DOI:
https://doi.org/10.26877/asset.v8i1.2637
Keywords:
IndoBERT, NLP system design, clickbait detection, machine learning pipeline, model evaluation
Abstract
The rapid growth of digital news platforms necessitates reliable and automated systems for maintaining content quality at scale. This study presents the engineering and evaluation of an IndoBERT-based Natural Language Processing (NLP) framework for automated clickbait detection in Indonesian news headlines. The proposed framework is designed as an end-to-end text classification pipeline, incorporating data preprocessing, tokenization, fine-tuning of a pretrained IndoBERT model, and systematic performance evaluation. Experiments were conducted using the CLICK-ID dataset comprising 15,000 Indonesian news headlines, with an 80:20 stratified train–test split. The fine-tuned model achieved an accuracy of 0.83, with a precision of 0.82, recall of 0.77, and an F1-score of 0.79 for the clickbait class. Further evaluation using threshold-independent metrics yielded a ROC-AUC value of 0.89 and an average precision of 0.88, indicating strong discriminative capability under moderate class imbalance. Comparative analysis shows that the proposed approach outperforms prior CNN, Bi-LSTM, and ensemble-based methods evaluated on the same dataset. These results demonstrate that IndoBERT provides a robust foundation for engineering automated clickbait detection systems tailored to Indonesian-language news streams.
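The split and the class-level metrics mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `stratified_split`, `precision_recall_f1`, and `f1_from` are hypothetical, the toy labels are made up, and only the Python standard library is used. It shows one way to build an 80:20 stratified split and confirms that the reported precision (0.82) and recall (0.77) are consistent with the reported F1-score (0.79).

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class represented
    in the test part at roughly `test_fraction` of its size."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one class (here: clickbait = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up, not the CLICK-ID class distribution).
toy_labels = [0] * 60 + [1] * 40
train_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(test_idx))  # 80 20

# Consistency check on the figures reported in the abstract.
print(round(f1_from(0.82, 0.77), 2))  # 0.79
```

In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would normally replace the hand-written split; the version above only makes the idea explicit.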
License
Copyright (c) 2025 Advance Sustainable Science Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.



