Enhancing Pose-Based Sign Language Recognition: A Comparative Study of Preprocessing Strategies with GRU and LSTM

Authors

  • Toby Purbojo, Parahyangan Catholic University
  • Andreas Wijaya, Parahyangan Catholic University

DOI:

https://doi.org/10.26877/sj5scb03

Keywords:

Isolated sign language recognition, Preprocessing, Feature engineering, Machine learning, Gated Recurrent Unit, Long Short-Term Memory

Abstract

Recognizing isolated sign language gestures is difficult because signers differ in body proportions and pose landmarks are often missing. Many existing methods therefore generalize poorly across signers. To address this, we propose reference-based normalization, which reduces body-shape differences by normalizing body parts separately, namely the full body, arms, face, and hands. We evaluated the method with LSTM and GRU models on two datasets: a custom American Sign Language (ASL) dataset recorded by a single amateur signer, and the public WLASL dataset, which contains many signers. On the custom dataset, the highest accuracy (97.75%) was achieved by the LSTM with normalization applied only to the full body and hands, since the single signer was consistent across recordings. On WLASL, adding normalization for the arms and face improved accuracy by 3.10% for the LSTM and 0.77% for the GRU. The GRU model reached the best WLASL result (74.03%) with fewer parameters than competing state-of-the-art models. These findings show that reference-based normalization improves sign recognition performance and has potential for real-world use, particularly for recognizing signs within continuous sequences.
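
To make the idea concrete, the sketch below shows one plausible reading of reference-based normalization for a single frame of MediaPipe Holistic keypoints. The specific reference points (shoulder midpoint and width, face centroid, wrist) and the part groupings are illustrative assumptions, not the paper's exact configuration.

    import numpy as np

    # MediaPipe Holistic pose indices for the shoulders; using the shoulder
    # midpoint/width as the full-body reference is an assumption made here
    # for illustration only.
    LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12

    def normalize_part(landmarks, origin, scale):
        """Translate a landmark group to a reference origin and rescale it."""
        if scale < 1e-6:                  # guard against degenerate frames
            scale = 1.0
        return (landmarks - origin) / scale

    def normalize_frame(pose, face, left_hand, right_hand):
        """Per-part, reference-based normalization of one frame.

        pose: (33, 2), face: (468, 2), hands: (21, 2) arrays of (x, y).
        """
        mid = (pose[LEFT_SHOULDER] + pose[RIGHT_SHOULDER]) / 2.0
        width = np.linalg.norm(pose[LEFT_SHOULDER] - pose[RIGHT_SHOULDER])

        body_n = normalize_part(pose, mid, width)                 # full body
        face_n = normalize_part(face, face.mean(axis=0), width)   # face
        lh_n = normalize_part(left_hand, left_hand[0], width)     # left hand (wrist = index 0)
        rh_n = normalize_part(right_hand, right_hand[0], width)   # right hand

        return np.concatenate([body_n.ravel(), face_n.ravel(),
                               lh_n.ravel(), rh_n.ravel()])

The per-frame feature vectors are then stacked over time and fed to a recurrent classifier. A minimal Keras sketch follows; the layer width, sequence length, and class count are placeholder hyperparameters, not values reported in the paper (the feature size 1086 follows from the (33 + 468 + 21 + 21) × 2 coordinates above).

    import tensorflow as tf

    NUM_FRAMES, NUM_FEATURES, NUM_CLASSES = 64, 1086, 100  # illustrative sizes

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(NUM_FRAMES, NUM_FEATURES)),
        tf.keras.layers.Masking(mask_value=0.0),   # ignore zero-padded frames
        tf.keras.layers.GRU(128),                  # swap for LSTM(128) to compare
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

Normalizing each part against its own reference makes the features invariant to signer position and scale, which is the property the abstract credits for the cross-signer gains on WLASL.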

Author Biographies

  • Toby Purbojo, Parahyangan Catholic University

    Center of Mathematics and Society, Faculty of Science, Parahyangan Catholic University, Ciumbuleuit No.94, Bandung, 40141, West Java, Indonesia.

  • Andreas Wijaya, Parahyangan Catholic University

    Modeling and Simulation Laboratory, Faculty of Science, Parahyangan Catholic University, Ciumbuleuit No.94, Bandung, 40141, West Java, Indonesia.

Published

2025-04-30