By Khav SAROEUN, Software Engineer
Communication via voice message (instead of typing) is very popular in Cambodia. There are therefore many use cases for both speech-to-text (S2T) and text-to-speech in Khmer. This blog post details the framework we used to assess how well various S2T tools recognize and transcribe Khmer speech, using the Google Web Speech API as an example. We’ll walk through converting audio files into the correct format, automating the transcription process, and saving the output to a text file for analysis.
What We’ll Learn
- How to convert audio file formats to be compatible with speech-to-text processing.
- How to use Python for speech recognition and transcription with the Google Web Speech API.
- How to write the transcribed text to a file.
Tools & Libraries Required
- Python: The core language used for scripting the automation process.
- SpeechRecognition: A library that integrates with the Google Web Speech API to handle transcription of audio files.
- Pydub: A library for converting and manipulating audio files (e.g., .m4a to .wav format).
- FFmpeg: A multimedia framework for audio conversion. Download and install it, then add it to your system PATH environment variable (running ffmpeg -version should print the version if it is set up correctly).
- OS & Pathlib: These Python libraries handle file management and operations for smoother automation.
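If you would rather confirm the FFmpeg setup from Python than from the terminal, a small check like the one below can help. This is a minimal sketch and not part of the original scripts: the function name ffmpeg_version_line is our own, and it assumes the executable is simply called ffmpeg.

```python
import shutil
import subprocess


def ffmpeg_version_line(executable="ffmpeg"):
    """Return the first line of `<executable> -version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # check=False: we only want the text output, not an exception on a non-zero exit code
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    line = ffmpeg_version_line()
    print(line or "ffmpeg not found on PATH; install it or add its folder to PATH")
```

If this prints the "not found" message, Pydub will most likely fail to convert files until FFmpeg is installed or its folder is added to PATH.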
Step-by-Step Code Guide
Note that all code here has been simplified (hardcoded paths, etc.) for readability.
1. Prepare the Audio File and Convert it
In this step, we write a script that converts audio files to the format needed for transcription, using Pydub and FFmpeg together.
Most audio recordings are in .m4a, .ogg, .mp3, or .mp4 format, but the SpeechRecognition library we use to call the Google Web Speech API reads audio files only in WAV, AIFF, or FLAC format, so we convert each recording to .wav first. Here’s how to convert your file:
Install Pydub and FFmpeg:
pip install pydub
sudo apt-get install ffmpeg
Example code:
from pydub import AudioSegment
from pathlib import Path
entries = Path('/path/to/your/folder')
for entry in entries.iterdir():
    if entry.is_file() and entry.suffix == '.ogg':  # set your current file extension here
        voice_file_path = entry.resolve()
        wav_file_path = voice_file_path.with_suffix('.wav')
        try:
            # if successful, it will convert your audio file to a .wav file
            audio = AudioSegment.from_file(voice_file_path, format="ogg")
            audio.export(wav_file_path, format="wav")
        except Exception as e:
            print(f"Error processing {voice_file_path}: {e}")
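If a folder mixes several of the formats mentioned above, the single-extension check can be generalized. The sketch below is one possible way to do that, not the script we ran: SOURCE_SUFFIXES, conversion_jobs, and convert_all are names made up for illustration, and using the extension (without the dot) as Pydub's format hint is an assumption that may need adjusting for some containers.

```python
from pathlib import Path

# Hypothetical set of source extensions; adjust it to match your recordings.
SOURCE_SUFFIXES = {".m4a", ".ogg", ".mp3", ".mp4"}


def conversion_jobs(folder):
    """Return sorted (source_path, wav_path, format_hint) tuples for audio files in folder."""
    jobs = []
    for entry in sorted(Path(folder).iterdir()):
        suffix = entry.suffix.lower()
        if entry.is_file() and suffix in SOURCE_SUFFIXES:
            # Assumption: the extension without the leading dot is a usable format hint
            jobs.append((entry, entry.with_suffix(".wav"), suffix.lstrip(".")))
    return jobs


def convert_all(folder):
    """Convert every matching file to .wav next to the original (requires pydub and FFmpeg)."""
    from pydub import AudioSegment  # imported here so conversion_jobs works without pydub

    for source, target, format_hint in conversion_jobs(folder):
        try:
            AudioSegment.from_file(source, format=format_hint).export(target, format="wav")
        except Exception as error:
            print(f"Error processing {source}: {error}")
```

Calling convert_all('/path/to/your/folder') would then handle all listed extensions in one pass, while conversion_jobs can be used on its own to preview what would be converted.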
2. Transcribing Audio with SpeechRecognition in Khmer
This script transcribes audio recordings into text using the SpeechRecognition library, specifically for Khmer (km-KH), the official language of Cambodia. The library sends the audio to the Google Web Speech API, which supports Khmer input, and returns the recognized text.
Install SpeechRecognition (pathlib is part of the Python 3 standard library, so it needs no separate install):
pip install SpeechRecognition
import speech_recognition as sr
from pathlib import Path
recognizer = sr.Recognizer()
voice_file_path = Path("record/file.wav")
wav_file_path = voice_file_path.with_suffix(".wav")
with sr.AudioFile(str(wav_file_path)) as source:
    # Read the whole file into memory as audio data
    audio_data = recognizer.record(source)
# Transcribe with the Google Web Speech API, using Khmer language recognition
transcribed_text = recognizer.recognize_google(audio_data, language="km-KH")
# You can write the transcribed text into a file
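Writing the text to a file could look like the sketch below. The helper name append_transcript is hypothetical, and the default file name simply mirrors the one used in the full script. The explicit UTF-8 encoding matters because on some systems (for example Windows with a legacy code page) the default encoding cannot represent Khmer characters.

```python
from pathlib import Path


def append_transcript(text, output_path="results.txt"):
    """Append one transcription as a single UTF-8 line and return the output path."""
    output_path = Path(output_path)
    # Explicit UTF-8 so Khmer characters are written the same way on every platform
    with output_path.open("a", encoding="utf-8") as result_file:
        result_file.write(text.strip() + "\n")
    return output_path
```

With the snippet above, calling append_transcript(transcribed_text) after recognize_google would add one line per recording.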
3. Full code
The full script below transcribes Khmer (km-KH) audio recordings into text. First, it reads the audio files in a folder and converts each one to .wav. Then it uses SpeechRecognition to transcribe each recording. Finally, it writes the text to results.txt.
from pathlib import Path
import speech_recognition as sr
from pydub import AudioSegment
# Manually set the path to the ffmpeg executable (only needed if ffmpeg is not on your PATH)
AudioSegment.converter = r"C:\ffmpeg-7.0.2-full_build\ffmpeg-7.0.2-full_build\bin\ffmpeg.exe"
# Open the file to write the transcriptions
with open("results.txt", "a", encoding="utf-8") as result_file:
    entries = Path('/path/to/your/folder')
    for entry in entries.iterdir():
        # Here the recordings are already .wav; change the extension to .mp4, .m4a, .mp3, .ogg, etc. to match yours
        if entry.is_file() and entry.suffix == '.wav':
            voice_file_path = entry.resolve()
            wav_file_path = voice_file_path.with_suffix('.wav')
            try:
                # Convert the original audio file to wav
                audio = AudioSegment.from_file(voice_file_path, format="wav")  # match the format to the extension above
                audio.export(wav_file_path, format="wav")
                # Transcribe the converted wav file using Khmer language recognition
                recognizer = sr.Recognizer()
                with sr.AudioFile(str(wav_file_path)) as source:
                    audio_data = recognizer.record(source)
                    transcribed_text = recognizer.recognize_google(audio_data, language="km-KH")
                # Write the transcribed text to the file
                result_file.write(f"{transcribed_text}\n")
            except Exception as e:
                print(f"Error processing {voice_file_path}: {e}")
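One limitation of the script above is that iterdir() does not guarantee any particular order, so the lines in the output file cannot easily be matched back to recordings. For analysis, it can help to write the file name next to each transcript. The sketch below shows one way to do that under our own assumptions: transcribe_folder and google_khmer are hypothetical names, the output is tab-separated, and the transcription step is passed in as a function so the file handling can be tried without calling the API.

```python
import csv
from pathlib import Path


def transcribe_folder(folder, transcribe, output_path):
    """Write one `file name<TAB>transcript` row per .wav file, in sorted order.

    `transcribe` is any callable that takes a Path to a .wav file and returns text.
    Files that raise an exception get an empty transcript so one bad recording
    does not stop the run. Returns the list of file names that failed.
    """
    failed = []
    with open(output_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for wav_path in sorted(Path(folder).glob("*.wav")):
            try:
                text = transcribe(wav_path)
            except Exception as error:  # broad on purpose, mirroring the script above
                print(f"Error processing {wav_path}: {error}")
                failed.append(wav_path.name)
                text = ""
            writer.writerow([wav_path.name, text])
    return failed


def google_khmer(wav_path):
    """Transcribe one file with the same SpeechRecognition calls as the script above."""
    import speech_recognition as sr  # imported lazily; requires the SpeechRecognition package

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(wav_path)) as source:
        audio_data = recognizer.record(source)
    return recognizer.recognize_google(audio_data, language="km-KH")
```

A call such as transcribe_folder('/path/to/your/folder', google_khmer, 'results.tsv') would then produce one labeled row per recording and return the names of any files that could not be transcribed.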
Contents of results.txt:
តើខ្ញុំអាចទាក់ទងអ្នកដោយរបៀបណាប្រសិនបើខ្ញុំមានបញ្ហា
តើអ្នកកំពុងតែធ្វើអ្វីនៅក្នុងបន្ទប់នេះ
តើប្រភេទកីឡាអ្វីដែលអ្នកចូលចិត្តជាងគេ
Conclusion
By following the above steps, you can test the Web Speech API’s ability to understand spoken Khmer and automate converting the audio, transcribing it, and writing the output to results.txt. Keep in mind that transcription accuracy varies with the clarity of the audio and the amount of background noise, so the results may not be perfect.
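To put a number on "may not be perfect", each transcript can be compared against a reference transcription typed by a person. This post does not prescribe a scoring method, so the following is only a sketch of one common option, character error rate (CER): the edit distance between reference and transcript divided by the reference length. Khmer is usually written without spaces between words, which makes a character-level score easier to apply than a word-level one. The function names and the choice to ignore whitespace and zero-width spaces are our own assumptions.

```python
import unicodedata


def _strip_separators(text):
    """NFC-normalize and drop whitespace and zero-width spaces (an assumption of this sketch)."""
    normalized = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in normalized if not ch.isspace() and ch != "\u200b")


def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (insertions, deletions, substitutions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by reference length, counted in Unicode code points."""
    ref = _strip_separators(reference)
    hyp = _strip_separators(hypothesis)
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

A CER of 0 means the transcript matches the reference exactly. Higher values mean more character-level errors, and the score can exceed 1 when the transcript is much longer than the reference.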