Duc-Chinh Nguyen, Thai-Kim Dinh, Tran Tien-Tam, Do Tien Thanh, Oscal Tzyh-Chiang Chen
This paper develops a robust and efficient method for the classification of Vietnamese Sign Language gestures. The study leverages deep learning techniques, specifically a Graph Convolutional Network (GCN), to analyze hand skeletal points for gesture recognition. A custom Vietnamese Sign Language dataset (ViSL) of 33 characters and numbers is constructed, experiments are conducted to validate the model's performance, and the results are compared with those of existing architectures. The proposed approach integrates multiple streams of GCN based on the lightweight MobileNet architecture. The custom dataset is preprocessed with Mediapipe to extract key skeletal points, which form the input for the multi-stream GCN. Experiments evaluate the proposed model's accuracy against traditional architectures such as VGG and ViT. The experimental results highlight the proposed model's superior performance: it achieves a test accuracy of 99.94% on the custom ViSL dataset, and accuracies of 0.993 and 0.994 on the American Sign Language (ASL) and ASL MNIST datasets, respectively. The multi-stream GCN approach significantly outperformed traditional architectures in both accuracy and computational efficiency. This study demonstrates the effectiveness of multi-stream GCNs based on MobileNet for ViSL recognition, showcasing their potential for real-world applications.
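As a rough illustration of the preprocessing step described above, the sketch below uses MediaPipe's Hands solution to extract the 21 hand landmarks from an image and builds a normalized adjacency matrix from the hand-skeleton edges, the kind of graph input a GCN consumes. The function names and configuration are illustrative assumptions, not the paper's actual code, and the multi-stream GCN itself is not reproduced here.

```python
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands

def extract_hand_keypoints(image_path: str):
    """Return a (21, 3) array of normalized hand landmark coordinates,
    or None if no hand is detected. (Hypothetical helper for illustration.)"""
    image = cv2.imread(image_path)
    if image is None:
        return None
    # static_image_mode=True treats each input as an independent still image.
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    landmarks = results.multi_hand_landmarks[0].landmark
    return np.array([[lm.x, lm.y, lm.z] for lm in landmarks], dtype=np.float32)

def hand_adjacency():
    """Build a symmetrically normalized 21x21 adjacency matrix from
    MediaPipe's hand-skeleton connections, a standard GCN input that
    accompanies the per-node keypoint features."""
    A = np.eye(21, dtype=np.float32)  # self-loops
    for i, j in mp_hands.HAND_CONNECTIONS:
        A[i, j] = A[j, i] = 1.0
    # D^{-1/2} A D^{-1/2} normalization, as in standard GCN formulations.
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt
```

A node-feature matrix from `extract_hand_keypoints` together with the matrix from `hand_adjacency` would form one input stream; how the paper combines its multiple streams is not specified in the abstract.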
Rights and permissions

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.