Voice Recognition with VOSK
Installing VOSK
Installing VOSK is surprisingly simple if you’re using Python 3 and the latest Raspberry Pi OS. This is pretty much all we had to do:
#Python:
pip3 install vosk
#Dependencies:
sudo apt install libgfortran3
#Examples
pip3 install pyaudio
sudo apt install libportaudio2
git clone https://github.com/alphacep/vosk-api
#Download and unzip a model file, e.g. the small US-English model:
wget http://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
Then unzip the downloaded model ZIP file into the example folder and test:
#Unzip the model file and rename as “model” e.g.:
cd vosk-api/python/example/
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 model
#Ensure you have a working microphone configured; you can check by using:
arecord | aplay
#Run the test code
./test_microphone.py
Bluetooth Headset
Many Bluetooth headsets do not automatically add the microphone to the audio configuration file, so they are not immediately usable. If this is the case for your headset, then after selecting the headset in the audio menu:
- Edit .asoundrc, for example:
nano ~/.asoundrc
- Change the profile to “sco”
- Add the pcm.input section by copying the pcm.output section
- Add the capture.pcm section
- Save
- Test by using:
arecord | aplay
For example:
pcm.!default {
    type asym
    playback.pcm {
        type plug
        slave.pcm "output"
    }
    capture.pcm {
        type plug
        slave.pcm "input"
    }
}
pcm.output {
    type bluealsa
    device "2E:A1:C2:2C:B8:84"
    profile "sco"
}
pcm.input {
    type bluealsa
    device "2E:A1:C2:2C:B8:84"
    profile "sco"
}
ctl.!default {
    type bluealsa
}
Robot Test using Python
In order to test out the VOSK speech recognition on Chameleon, we used a very simple algorithm: wait for a voice command, then move for 0.5 seconds, stop and repeat.
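Because the recogniser emits a stream of partial results while you speak, the loop only acts when the recognised text changes, so the same command isn’t retriggered every buffer. A minimal sketch of that debouncing idea (the simulated result stream below is made up for illustration):

```python
def debounce(results):
    """Yield each non-empty recognised text only when it differs
    from the previously yielded one."""
    last = None
    for text in results:
        if text and text != last:
            last = text
            yield text

# Simulated partial/full results arriving from the recogniser
results = ["", "le", "left", "left", "", "stop", "stop"]
print(list(debounce(results)))  # ['le', 'left', 'stop']
```

Note that partial results can fire on fragments like “le” before the full word arrives; the dispatch table in the robot code simply ignores anything that isn’t a known command.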
Here’s our full test code:
from vosk import Model, KaldiRecognizer
import pyaudio
import json
import time
import redboard

# RedBoard motor control
rb = redboard.RedBoard()

def left():
    rb._set_motor_speed(0, 0.0)
    rb._set_motor_speed(1, 0.2)
    time.sleep(0.5)
    stop()

def right():
    rb._set_motor_speed(0, -0.2)
    rb._set_motor_speed(1, 0.0)
    time.sleep(0.5)
    stop()

def ahead():
    rb._set_motor_speed(0, -0.15)
    rb._set_motor_speed(1, 0.15)
    time.sleep(1.0)
    # Keep creeping forward slowly after the initial burst
    rb._set_motor_speed(0, -0.02)
    rb._set_motor_speed(1, 0.02)

def reverse():
    rb._set_motor_speed(0, 0.15)
    rb._set_motor_speed(1, -0.15)
    time.sleep(0.5)
    stop()

def stop():
    rb._set_motor_speed(0, 0.0)
    rb._set_motor_speed(1, 0.0)

def move_robot(cmd):
    # Dispatch table; 'top' is included as a common
    # misrecognition of 'stop'
    commands = {
        'left': left,
        'right': right,
        'go': ahead,
        'reverse': reverse,
        'stop': stop,
        'top': stop
    }
    commands.get(cmd, lambda: None)()

print("Loading model...")
model = Model("model")
rec = KaldiRecognizer(model, 16000)

print("Opening stream...")
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
                input=True, frames_per_buffer=4000)
stream.start_stream()
print("Ready")

last_text = "x"
while True:
    data = stream.read(4000, False)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        result_json = rec.Result()
        text = json.loads(result_json)['text']
    else:
        result_json = rec.PartialResult()
        text = json.loads(result_json)['partial']
    print(result_json)
    if last_text != text:
        last_text = text
        # Adjust the motors
        move_robot(text)
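KaldiRecognizer returns results as JSON strings: Result() gives a "text" field once an utterance is complete, while PartialResult() gives a "partial" field mid-utterance. As a small sketch, a single helper can pull the text out of either form (the sample JSON strings here are illustrative, not captured output):

```python
import json

def extract_text(result_json):
    """Return the recognised text from a full or partial VOSK result."""
    result = json.loads(result_json)
    # Full results carry 'text'; partial results carry 'partial'
    return result.get("text", result.get("partial", ""))

print(extract_text('{"text": "go"}'))      # go
print(extract_text('{"partial": "rev"}'))  # rev
```

Using one helper for both cases would let the main loop treat full and partial results uniformly.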
Test Run
Not bad! Only 1 minute 28 seconds. How fast can we now make it without stopping after each stage?