Multilingual Speech Transcription System for Kazakh, Russian and English Languages

Authors

DOI:

https://doi.org/10.20508/95kg3z55

Keywords:

Multilingual Speech Transcription, Offline Speech Recognition, Language Detection, Audio Preprocessing, Digital Forensics

Abstract

This paper presents a multilingual speech transcription system for Kazakh, Russian, and English. Unlike existing solutions such as OpenAI Whisper and Kaldi-based offline models, the proposed system introduces three key innovations: (1) a specialized preprocessing pipeline optimized for Kazakh phonetic characteristics, (2) dynamic language detection with confidence scoring, and (3) fine-tuned acoustic models trained on a trilingual dataset of 450 hours. For Kazakh, the system achieves a Word Error Rate (WER) of 8.7% (95% CI: [8.1%, 9.3%]) compared with 11.3% for Whisper, a relative reduction of 23%, and it achieves a 15% improvement on code-switched utterances. Paired t-tests (p < 0.001) and 95% confidence intervals confirm that the proposed approach outperforms the baseline across all target languages. The system runs completely offline and is specifically optimized for Central Asian linguistic patterns. As a trade-off, the proposed system supports only 3 languages, whereas OpenAI Whisper supports 99 languages and has better zero‑shot performance.
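
The WER figures above can be read with a minimal sketch of how the metric and the relative reduction are commonly computed. This is an illustration of the standard word-level edit-distance definition, not the authors' evaluation code. The function names `word_error_rate` and `relative_wer_reduction` are hypothetical, and whitespace tokenization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Fraction by which the system lowers WER relative to the baseline."""
    return (baseline_wer - system_wer) / baseline_wer


# Kazakh figures quoted in the abstract: 11.3% (Whisper) vs. 8.7% (proposed).
kazakh_reduction = relative_wer_reduction(11.3, 8.7)
print(f"{kazakh_reduction:.1%}")
```

With the abstract's numbers, (11.3 − 8.7) / 11.3 rounds to 23%. This shows that the reported 23% is a relative improvement, while the absolute difference is 2.6 percentage points.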

Author Biographies

  • Leila Rzayeva, Research and Innovation Center “CyberTech”, Astana IT University, Astana, Kazakhstan

  • Nursultan Nyssanov, Research and Innovation Center “CyberTech”, Astana IT University, Astana, Kazakhstan

  • Zuleikha Syzdykova, Research and Innovation Center “CyberTech”, Astana IT University, Astana, Kazakhstan

  • Kuandyk Niyazaliyev, Research and Innovation Center “CyberTech”, Astana IT University, Astana, Kazakhstan

  • Alisher Batkuldin, Research and Innovation Center “CyberTech”, Astana IT University, Astana, Kazakhstan

  • Timur Grigoryev, Research and Innovation Center “CyberTech”, Astana IT University, Astana, Kazakhstan

Published

19.09.2025

Issue

Section

RESEARCH ARTICLES

How to Cite

Multilingual Speech Transcription System for Kazakh, Russian and English Languages. (2025). Artificial Intelligence Research and Applications, 1(3), 105-114. https://doi.org/10.20508/95kg3z55